
Reference (Gold): statsmodels

Pytest summary for the statsmodels test suite

status      count
passed      17669
skipped       378
xfailed       142
total       18189
collected   18189

Failed tests:

test_discrete.py::TestLogitNewton::test_normalized_cov_params

test_discrete.py::TestLogitNewton::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestLogitBFGS::test_normalized_cov_params

test_discrete.py::TestLogitBFGS::test_normalized_cov_params
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_var.py::TestVARResults::test_fevd_cov

test_var.py::TestVARResults::test_fevd_cov
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_hannan_rissanen.py::test_invalid_orders

test_hannan_rissanen.py::test_invalid_orders
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_hannan_rissanen.py::test_initial_order

test_hannan_rissanen.py::test_initial_order
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Mixed_Approx::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestVAR1Mixed_Approx::test_simulation_smoothed_state_disturbance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_Approx::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_Approx::test_simulation_smoothed_state
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_Approx::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_Approx::test_simulation_smoothed_measurement_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_Approx::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_Approx::test_simulation_smoothed_state_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1_Approx::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestVAR1_Approx::test_simulation_smoothed_measurement_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Mixed_Approx::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestVAR1Mixed_Approx::test_simulation_smoothed_measurement_disturbance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1_Approx::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestVAR1_Approx::test_simulation_smoothed_state_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1_Approx::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestVAR1_Approx::test_simulation_smoothed_state
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Mixed_Approx::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestVAR1Mixed_Approx::test_simulation_smoothed_state
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestDFM_Approx::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestDFM_Approx::test_simulation_smoothed_state_disturbance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestDFM_Approx::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestDFM_Approx::test_simulation_smoothed_measurement_disturbance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestDFM_Approx::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestDFM_Approx::test_simulation_smoothed_state
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestDFMCollapsed_Approx::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestDFMCollapsed_Approx::test_simulation_smoothed_measurement_disturbance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestDFMCollapsed_Approx::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestDFMCollapsed_Approx::test_simulation_smoothed_state_disturbance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestDFMCollapsed_Approx::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestDFMCollapsed_Approx::test_simulation_smoothed_state
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Missing_Approx::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestVAR1Missing_Approx::test_simulation_smoothed_measurement_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Missing_Approx::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestVAR1Missing_Approx::test_simulation_smoothed_state_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Missing_Approx::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestVAR1Missing_Approx::test_simulation_smoothed_state
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Mixed_KFAS::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestVAR1Mixed_KFAS::test_simulation_smoothed_measurement_disturbance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Mixed_KFAS::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestVAR1Mixed_KFAS::test_simulation_smoothed_state_disturbance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestDFM_KFAS::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestDFM_KFAS::test_simulation_smoothed_state_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Mixed_KFAS::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestVAR1Mixed_KFAS::test_simulation_smoothed_state
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestDFM_KFAS::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestDFM_KFAS::test_simulation_smoothed_state
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1_KFAS::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestVAR1_KFAS::test_simulation_smoothed_state
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestDFM_KFAS::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestDFM_KFAS::test_simulation_smoothed_measurement_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Missing_KFAS::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestVAR1Missing_KFAS::test_simulation_smoothed_measurement_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Missing_KFAS::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestVAR1Missing_KFAS::test_simulation_smoothed_state_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1Missing_KFAS::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestVAR1Missing_KFAS::test_simulation_smoothed_state
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1_KFAS::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestVAR1_KFAS::test_simulation_smoothed_state_disturbance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1_KFAS::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestVAR1_KFAS::test_simulation_smoothed_measurement_disturbance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_KFAS::test_simulation_smoothed_measurement_disturbance

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_KFAS::test_simulation_smoothed_measurement_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_KFAS::test_simulation_smoothed_state

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_KFAS::test_simulation_smoothed_state
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_KFAS::test_simulation_smoothed_state_disturbance

test_exact_diffuse_filtering.py::TestVAR1MeasurementError_KFAS::test_simulation_smoothed_state_disturbance
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestMNLogitNewtonBaseZero::test_cov_params

test_discrete.py::TestMNLogitNewtonBaseZero::test_cov_params
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestMNLogitNewtonBaseZero::test_normalized_cov_params

test_discrete.py::TestMNLogitNewtonBaseZero::test_normalized_cov_params
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestMNLogitNewtonBaseZero::test_distr

test_discrete.py::TestMNLogitNewtonBaseZero::test_distr
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestMNLogitLBFGSBaseZero::test_normalized_cov_params

test_discrete.py::TestMNLogitLBFGSBaseZero::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestMNLogitLBFGSBaseZero::test_distr

test_discrete.py::TestMNLogitLBFGSBaseZero::test_distr
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestProbitNewton::test_normalized_cov_params

test_discrete.py::TestProbitNewton::test_normalized_cov_params
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestMNLogitLBFGSBaseZero::test_cov_params

test_discrete.py::TestMNLogitLBFGSBaseZero::test_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestProbitMinimizeDefault::test_normalized_cov_params

test_discrete.py::TestProbitMinimizeDefault::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestProbitBasinhopping::test_normalized_cov_params

test_discrete.py::TestProbitBasinhopping::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestProbitMinimizeAdditionalOptions::test_normalized_cov_params

test_discrete.py::TestProbitMinimizeAdditionalOptions::test_normalized_cov_params
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestProbitNM::test_normalized_cov_params

test_discrete.py::TestProbitNM::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestProbitPowell::test_normalized_cov_params

test_discrete.py::TestProbitPowell::test_normalized_cov_params
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestProbitNCG::test_normalized_cov_params

test_discrete.py::TestProbitNCG::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestProbitCG::test_normalized_cov_params

test_discrete.py::TestProbitCG::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestProbitMinimizeDogleg::test_normalized_cov_params

test_discrete.py::TestProbitMinimizeDogleg::test_normalized_cov_params
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestProbitBFGS::test_normalized_cov_params

test_discrete.py::TestProbitBFGS::test_normalized_cov_params
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestPoissonNewton::test_cov_params

test_discrete.py::TestPoissonNewton::test_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestPoissonNewton::test_normalized_cov_params

test_discrete.py::TestPoissonNewton::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialPNB2BFGS::test_normalized_cov_params

test_discrete.py::TestNegativeBinomialPNB2BFGS::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialPNB2BFGS::test_pvalues

test_discrete.py::TestNegativeBinomialPNB2BFGS::test_pvalues
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB2BFGS::test_normalized_cov_params

test_discrete.py::TestNegativeBinomialNB2BFGS::test_normalized_cov_params
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialPNB1BFGS::test_pvalues

test_discrete.py::TestNegativeBinomialPNB1BFGS::test_pvalues
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB2BFGS::test_pvalues

test_discrete.py::TestNegativeBinomialNB2BFGS::test_pvalues
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB2Newton::test_pvalues

test_discrete.py::TestNegativeBinomialNB2Newton::test_pvalues
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB2Newton::test_normalized_cov_params

test_discrete.py::TestNegativeBinomialNB2Newton::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialPNB1BFGS::test_normalized_cov_params

test_discrete.py::TestNegativeBinomialPNB1BFGS::test_normalized_cov_params
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialPNB2Newton::test_pvalues

test_discrete.py::TestNegativeBinomialPNB2Newton::test_pvalues
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialGeometricBFGS::test_pvalues

test_discrete.py::TestNegativeBinomialGeometricBFGS::test_pvalues
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialGeometricBFGS::test_normalized_cov_params

test_discrete.py::TestNegativeBinomialGeometricBFGS::test_normalized_cov_params
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB1Newton::test_normalized_cov_params

test_discrete.py::TestNegativeBinomialNB1Newton::test_normalized_cov_params
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialPNB2Newton::test_normalized_cov_params

test_discrete.py::TestNegativeBinomialPNB2Newton::test_normalized_cov_params
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB1Newton::test_predict_xb

test_discrete.py::TestNegativeBinomialNB1Newton::test_predict_xb
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB1BFGS::test_normalized_cov_params

test_discrete.py::TestNegativeBinomialNB1BFGS::test_normalized_cov_params
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_tsaplots.py::test_plot_month

test_tsaplots.py::test_plot_month
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB1BFGS::test_predict

test_discrete.py::TestNegativeBinomialNB1BFGS::test_predict
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB1Newton::test_predict

test_discrete.py::TestNegativeBinomialNB1Newton::test_predict
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB1Newton::test_pvalues

test_discrete.py::TestNegativeBinomialNB1Newton::test_pvalues
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB1BFGS::test_pvalues

test_discrete.py::TestNegativeBinomialNB1BFGS::test_pvalues
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialNB1BFGS::test_predict_xb

test_discrete.py::TestNegativeBinomialNB1BFGS::test_predict_xb
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialPNB1Newton::test_normalized_cov_params

test_discrete.py::TestNegativeBinomialPNB1Newton::test_normalized_cov_params
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_discrete.py::TestNegativeBinomialPNB1Newton::test_pvalues

test_discrete.py::TestNegativeBinomialPNB1Newton::test_pvalues
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotRandomNormalLocScaleDist::test_probplot_other_array

test_gofplots.py::TestProbPlotRandomNormalLocScaleDist::test_probplot_other_array
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotRandomNormalLocScaleDist::test_probplot_other_prbplt

test_gofplots.py::TestProbPlotRandomNormalLocScaleDist::test_probplot_other_prbplt
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotRandomNormalFullDist::test_probplot_other_array

test_gofplots.py::TestProbPlotRandomNormalFullDist::test_probplot_other_array
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotRandomNormalFullDist::test_probplot_other_prbplt

test_gofplots.py::TestProbPlotRandomNormalFullDist::test_probplot_other_prbplt
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotLongelyWithFit::test_probplot_other_prbplt

test_gofplots.py::TestProbPlotLongelyWithFit::test_probplot_other_prbplt
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotLongelyNoFit::test_probplot_other_array

test_gofplots.py::TestProbPlotLongelyNoFit::test_probplot_other_array
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotLongelyNoFit::test_probplot_other_prbplt

test_gofplots.py::TestProbPlotLongelyNoFit::test_probplot_other_prbplt
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotRandomNormalWithFit::test_probplot_other_array

test_gofplots.py::TestProbPlotRandomNormalWithFit::test_probplot_other_array
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotLongelyWithFit::test_probplot_other_array

test_gofplots.py::TestProbPlotLongelyWithFit::test_probplot_other_array
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotRandomNormalWithFit::test_probplot_other_prbplt

test_gofplots.py::TestProbPlotRandomNormalWithFit::test_probplot_other_prbplt
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotRandomNormalMinimal::test_probplot_other_array

test_gofplots.py::TestProbPlotRandomNormalMinimal::test_probplot_other_array
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gofplots.py::TestProbPlotRandomNormalMinimal::test_probplot_other_prbplt

test_gofplots.py::TestProbPlotRandomNormalMinimal::test_probplot_other_prbplt
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_tsa_indexes.py::test_nonfull_periodindex

test_tsa_indexes.py::test_nonfull_periodindex
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_innovations.py::test_brockwell_davis_example_524_variance

test_innovations.py::test_brockwell_davis_example_524_variance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_kernel_regression.py::TestKernelReg::test_mfx_nonlinear_ll_cvls

test_kernel_regression.py::TestKernelReg::test_mfx_nonlinear_ll_cvls
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_validation.py::TestArrayLike::test_dot[True]

test_validation.py::TestArrayLike::test_dot[True]
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_validation.py::TestArrayLike::test_dot[False]

test_validation.py::TestArrayLike::test_dot[False]
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_holtwinters.py::TestHoltWinters::test_forecast

test_holtwinters.py::TestHoltWinters::test_forecast
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_copula.py::TestFrankCopula::test_seed[qmc]

test_copula.py::TestFrankCopula::test_seed[qmc]
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_copula.py::TestGumbelCopula::test_seed[qmc]

test_copula.py::TestGumbelCopula::test_seed[qmc]
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_copula.py::TestClaytonCopula::test_seed[qmc]

test_copula.py::TestClaytonCopula::test_seed[qmc]
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMBinomial::test_fitted

test_gam.py::TestGAMBinomial::test_fitted
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMGamma::test_df

test_gam.py::TestGAMGamma::test_df
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMBinomial::test_df

test_gam.py::TestGAMBinomial::test_df
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMGamma::test_fitted

test_gam.py::TestGAMGamma::test_fitted
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMNegativeBinomial::test_predict

test_gam.py::TestGAMNegativeBinomial::test_predict
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

cls = 

    @classmethod
    def init(cls):
        nobs = cls.nobs
        y_true, x, exog = cls.y_true, cls.x, cls.exog
        if not hasattr(cls, 'scale'):
            scale = 1
        else:
            scale = cls.scale

        f = cls.family

        cls.mu_true = mu_true = f.link.inverse(y_true)

        np.random.seed(8765993)
        # Discrete distributions do not take `scale`.
        try:
>           y_obs = cls.rvs(mu_true, scale=scale, size=nobs)

statsmodels/sandbox/tests/test_gam.py:223: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'scale': 1, 'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() got an unexpected keyword argument 'scale'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

During handling of the above exception, another exception occurred:

cls = 

    @classmethod
    def setup_class(cls):
        super(TestGAMNegativeBinomial, cls).setup_class()  # initialize DGP

        cls.family = family.NegativeBinomial()
        cls.rvs = stats.nbinom.rvs

>       cls.init()

statsmodels/sandbox/tests/test_gam.py:325: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
statsmodels/sandbox/tests/test_gam.py:225: in init
    y_obs = cls.rvs(mu_true, size=nobs)
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() missing 1 required positional argument: 'p'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError
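
The traceback above reduces to a single scipy behaviour: stats.nbinom is a discrete distribution, so its rvs method rejects the scale keyword, and it requires two shape parameters (n and p), so the retry with only mu_true still fails. The following is a minimal illustrative sketch (not part of the test suite; the alpha-based (n, p) conversion at the end is just one common NB2-style parameterization chosen for the example):

    import numpy as np
    from scipy import stats

    mu = np.array([1.1, 2.3, 4.7])      # stand-in for the mu_true array above

    # Discrete distributions do not accept `scale`:
    try:
        stats.nbinom.rvs(mu, scale=1, size=mu.size)
    except TypeError as exc:
        print(exc)  # "... got an unexpected keyword argument 'scale'", as above

    # Retrying without `scale` still fails: nbinom needs both shape
    # parameters, n and p, and only one array was supplied:
    try:
        stats.nbinom.rvs(mu, size=mu.size)
    except TypeError as exc:
        print(exc)  # "... missing 1 required positional argument: 'p'", as above

    # Supplying both shape parameters succeeds, e.g. deriving (n, p) from a
    # mean mu and a dispersion alpha (one common NB2 parameterization):
    alpha = 1.0
    n = 1.0 / alpha
    p = n / (n + mu)
    print(stats.nbinom.rvs(n, p, size=mu.size))

The remaining TestGAMNegativeBinomial failures below show the same pair of TypeErrors from the same setup code.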

test_gam.py::TestGAMPoisson::test_df

test_gam.py::TestGAMPoisson::test_df
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMPoisson::test_fitted

test_gam.py::TestGAMPoisson::test_fitted
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMGaussianLogLink::test_params

test_gam.py::TestGAMGaussianLogLink::test_params
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMNegativeBinomial::test_prediction

test_gam.py::TestGAMNegativeBinomial::test_prediction
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

cls = 

    @classmethod
    def init(cls):
        nobs = cls.nobs
        y_true, x, exog = cls.y_true, cls.x, cls.exog
        if not hasattr(cls, 'scale'):
            scale = 1
        else:
            scale = cls.scale

        f = cls.family

        cls.mu_true = mu_true = f.link.inverse(y_true)

        np.random.seed(8765993)
        # Discrete distributions do not take `scale`.
        try:
>           y_obs = cls.rvs(mu_true, scale=scale, size=nobs)

statsmodels/sandbox/tests/test_gam.py:223: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'scale': 1, 'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() got an unexpected keyword argument 'scale'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

During handling of the above exception, another exception occurred:

cls = 

    @classmethod
    def setup_class(cls):
        super(TestGAMNegativeBinomial, cls).setup_class()  # initialize DGP

        cls.family = family.NegativeBinomial()
        cls.rvs = stats.nbinom.rvs

>       cls.init()

statsmodels/sandbox/tests/test_gam.py:325: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
statsmodels/sandbox/tests/test_gam.py:225: in init
    y_obs = cls.rvs(mu_true, size=nobs)
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() missing 1 required positional argument: 'p'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

test_gam.py::TestGAMGaussianLogLink::test_df

test_gam.py::TestGAMGaussianLogLink::test_df
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMGaussianLogLink::test_mu

test_gam.py::TestGAMGaussianLogLink::test_mu
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMGaussianLogLink::test_predict

test_gam.py::TestGAMGaussianLogLink::test_predict
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMGaussianLogLink::test_fitted

test_gam.py::TestGAMGaussianLogLink::test_fitted
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMGaussianLogLink::test_prediction

test_gam.py::TestGAMGaussianLogLink::test_prediction
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMNegativeBinomial::test_mu

test_gam.py::TestGAMNegativeBinomial::test_mu
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

cls = 

    @classmethod
    def init(cls):
        nobs = cls.nobs
        y_true, x, exog = cls.y_true, cls.x, cls.exog
        if not hasattr(cls, 'scale'):
            scale = 1
        else:
            scale = cls.scale

        f = cls.family

        cls.mu_true = mu_true = f.link.inverse(y_true)

        np.random.seed(8765993)
        # Discrete distributions do not take `scale`.
        try:
>           y_obs = cls.rvs(mu_true, scale=scale, size=nobs)

statsmodels/sandbox/tests/test_gam.py:223: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'scale': 1, 'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() got an unexpected keyword argument 'scale'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

During handling of the above exception, another exception occurred:

cls = 

    @classmethod
    def setup_class(cls):
        super(TestGAMNegativeBinomial, cls).setup_class()  # initialize DGP

        cls.family = family.NegativeBinomial()
        cls.rvs = stats.nbinom.rvs

>       cls.init()

statsmodels/sandbox/tests/test_gam.py:325: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
statsmodels/sandbox/tests/test_gam.py:225: in init
    y_obs = cls.rvs(mu_true, size=nobs)
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() missing 1 required positional argument: 'p'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

test_gam.py::TestGAMNegativeBinomial::test_params

test_gam.py::TestGAMNegativeBinomial::test_params
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

cls = 

    @classmethod
    def init(cls):
        nobs = cls.nobs
        y_true, x, exog = cls.y_true, cls.x, cls.exog
        if not hasattr(cls, 'scale'):
            scale = 1
        else:
            scale = cls.scale

        f = cls.family

        cls.mu_true = mu_true = f.link.inverse(y_true)

        np.random.seed(8765993)
        # Discrete distributions do not take `scale`.
        try:
>           y_obs = cls.rvs(mu_true, scale=scale, size=nobs)

statsmodels/sandbox/tests/test_gam.py:223: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'scale': 1, 'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() got an unexpected keyword argument 'scale'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

During handling of the above exception, another exception occurred:

cls = 

    @classmethod
    def setup_class(cls):
        super(TestGAMNegativeBinomial, cls).setup_class()  # initialize DGP

        cls.family = family.NegativeBinomial()
        cls.rvs = stats.nbinom.rvs

>       cls.init()

statsmodels/sandbox/tests/test_gam.py:325: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
statsmodels/sandbox/tests/test_gam.py:225: in init
    y_obs = cls.rvs(mu_true, size=nobs)
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() missing 1 required positional argument: 'p'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

test_gmm.py::TestGMMStOnestep::test_other

test_gmm.py::TestGMMStOnestep::test_other
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMNegativeBinomial::test_df

test_gam.py::TestGAMNegativeBinomial::test_df
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

cls = 

    @classmethod
    def init(cls):
        nobs = cls.nobs
        y_true, x, exog = cls.y_true, cls.x, cls.exog
        if not hasattr(cls, 'scale'):
            scale = 1
        else:
            scale = cls.scale

        f = cls.family

        cls.mu_true = mu_true = f.link.inverse(y_true)

        np.random.seed(8765993)
        # Discrete distributions do not take `scale`.
        try:
>           y_obs = cls.rvs(mu_true, scale=scale, size=nobs)

statsmodels/sandbox/tests/test_gam.py:223: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'scale': 1, 'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() got an unexpected keyword argument 'scale'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

During handling of the above exception, another exception occurred:

cls = 

    @classmethod
    def setup_class(cls):
        super(TestGAMNegativeBinomial, cls).setup_class()  # initialize DGP

        cls.family = family.NegativeBinomial()
        cls.rvs = stats.nbinom.rvs

>       cls.init()

statsmodels/sandbox/tests/test_gam.py:325: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
statsmodels/sandbox/tests/test_gam.py:225: in init
    y_obs = cls.rvs(mu_true, size=nobs)
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() missing 1 required positional argument: 'p'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

test_gmm.py::TestGMMStOneiter::test_other

test_gmm.py::TestGMMStOneiter::test_other
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestGAMNegativeBinomial::test_fitted

test_gam.py::TestGAMNegativeBinomial::test_fitted
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

cls = 

    @classmethod
    def init(cls):
        nobs = cls.nobs
        y_true, x, exog = cls.y_true, cls.x, cls.exog
        if not hasattr(cls, 'scale'):
            scale = 1
        else:
            scale = cls.scale

        f = cls.family

        cls.mu_true = mu_true = f.link.inverse(y_true)

        np.random.seed(8765993)
        # Discrete distributions do not take `scale`.
        try:
>           y_obs = cls.rvs(mu_true, scale=scale, size=nobs)

statsmodels/sandbox/tests/test_gam.py:223: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'scale': 1, 'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() got an unexpected keyword argument 'scale'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

During handling of the above exception, another exception occurred:

cls = 

    @classmethod
    def setup_class(cls):
        super(TestGAMNegativeBinomial, cls).setup_class()  # initialize DGP

        cls.family = family.NegativeBinomial()
        cls.rvs = stats.nbinom.rvs

>       cls.init()

statsmodels/sandbox/tests/test_gam.py:325: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
statsmodels/sandbox/tests/test_gam.py:225: in init
    y_obs = cls.rvs(mu_true, size=nobs)
.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:3343: in rvs
    return super().rvs(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = 
args = (array([  1.10246682,   1.19063782,   1.28242135,   1.37777111,
         1.47707059,   1.5812708 ,   1.69202247,   1.8...6.76689582,  63.20012622,  70.28035558,  78.06505544,
        86.6410784 ,  96.13812455, 106.74494298, 118.72889615]),)
kwds = {'size': 200}, discrete = True, rndm = None

    def rvs(self, *args, **kwds):
        """Random variates of given type.

        Parameters
        ----------
        arg1, arg2, arg3,... : array_like
            The shape parameter(s) for the distribution (see docstring of the
            instance object for more information).
        loc : array_like, optional
            Location parameter (default=0).
        scale : array_like, optional
            Scale parameter (default=1).
        size : int or tuple of ints, optional
            Defining number of random variates (default is 1).
        random_state : {None, int, `numpy.random.Generator`,
                        `numpy.random.RandomState`}, optional

            If `random_state` is None (or `np.random`), the
            `numpy.random.RandomState` singleton is used.
            If `random_state` is an int, a new ``RandomState`` instance is
            used, seeded with `random_state`.
            If `random_state` is already a ``Generator`` or ``RandomState``
            instance, that instance is used.

        Returns
        -------
        rvs : ndarray or scalar
            Random variates of given `size`.

        """
        discrete = kwds.pop('discrete', None)
        rndm = kwds.pop('random_state', None)
>       args, loc, scale, size = self._parse_args_rvs(*args, **kwds)
E       TypeError: _parse_args_rvs() missing 1 required positional argument: 'p'

.venv/lib/python3.10/site-packages/scipy/stats/_distn_infrastructure.py:1047: TypeError

test_gmm.py::TestGMMStOneiterNO_Nonlinear::test_other

test_gmm.py::TestGMMStOneiterNO_Nonlinear::test_other
[gw4] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_durbin_levinson.py::test_nonstationary_series_variance

test_durbin_levinson.py::test_nonstationary_series_variance
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gmm.py::TestGMMStOneiterNO_Linear::test_other

test_gmm.py::TestGMMStOneiterNO_Linear::test_other
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gmm.py::TestGMMStOneiterNO::test_other

test_gmm.py::TestGMMStOneiterNO::test_other
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gmm.py::TestGMMStOneiterOLS_Linear::test_other

test_gmm.py::TestGMMStOneiterOLS_Linear::test_other
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gmm.py::TestGMMStOnestepNO::test_other

test_gmm.py::TestGMMStOnestepNO::test_other
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_yule_walker.py::test_invalid_xfail

test_yule_walker.py::test_invalid_xfail
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_glm_weights.py::TestGlmPoissonPwNr::test_compare_optimizers

test_glm_weights.py::TestGlmPoissonPwNr::test_compare_optimizers
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_glm_weights.py::TestGlmPoissonPwNr::test_basic

test_glm_weights.py::TestGlmPoissonPwNr::test_basic
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_power.py::test_power_solver_warn

test_power.py::test_power_solver_warn
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gam.py::TestAdditiveModel::test_df

test_gam.py::TestAdditiveModel::test_df
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_tost.py::test_tost_transform_paired

test_tost.py::test_tost_transform_paired
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_exact_diffuse_filtering.py::test_irrelevant_state

test_exact_diffuse_filtering.py::test_irrelevant_state
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_kde.py::TestKDEWGauss::test_density

test_kde.py::TestKDEWGauss::test_density
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_kde.py::TestKDEWCos2::test_density

test_kde.py::TestKDEWCos2::test_density
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_kde.py::TestKDEWEpa::test_density

test_kde.py::TestKDEWEpa::test_density
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_kde.py::TestKDEWTri::test_density

test_kde.py::TestKDEWTri::test_density
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_kde.py::TestKDEWBiw::test_density

test_kde.py::TestKDEWBiw::test_density
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_kde.py::TestKDEWCos::test_density

test_kde.py::TestKDEWCos::test_density
[gw2] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_gmm.py::TestGMMOLS::test_other

test_gmm.py::TestGMMOLS::test_other
[gw3] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_kernels.py::TestCosine::test_smoothconf

test_kernels.py::TestCosine::test_smoothconf
[gw1] linux -- Python 3.10.12 /testbed/.venv/bin/python3

test_summary_old.py::test_regression_summary

test_summary_old.py::test_regression_summary
[gw0] linux -- Python 3.10.12 /testbed/.venv/bin/python3

Patch diff
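
Among the functions filled in by the patch below is fit_constrained_wrap
in statsmodels/base/_constraints.py, which its docstring describes as the
prototype for the fit_constrained method added to Poisson and GLM. A
minimal, hypothetical call pattern for that public method (the data and
the constraint string are invented purely for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    exog = sm.add_constant(rng.normal(size=(200, 2)))
    endog = rng.poisson(np.exp(exog @ [0.5, 0.2, -0.2]))

    # exog columns built from a plain ndarray are named 'const', 'x1', 'x2';
    # constrain the second slope to zero
    res = sm.Poisson(endog, exog).fit_constrained("x2 = 0")
    print(res.params)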

diff --git a/statsmodels/api.py b/statsmodels/api.py
index 00e64027b..b39e1e38a 100644
--- a/statsmodels/api.py
+++ b/statsmodels/api.py
@@ -1,25 +1,109 @@
-__all__ = ['BayesGaussMI', 'BinomialBayesMixedGLM', 'ConditionalLogit',
-    'ConditionalMNLogit', 'ConditionalPoisson', 'Factor', 'GEE', 'GLM',
-    'GLMGam', 'GLS', 'GLSAR', 'GeneralizedPoisson', 'HurdleCountModel',
-    'Logit', 'MANOVA', 'MI', 'MICE', 'MICEData', 'MNLogit', 'MixedLM',
-    'NegativeBinomial', 'NegativeBinomialP', 'NominalGEE', 'OLS',
-    'OrdinalGEE', 'PCA', 'PHReg', 'Poisson', 'PoissonBayesMixedGLM',
-    'ProbPlot', 'Probit', 'QuantReg', 'RLM', 'RecursiveLS', 'SurvfuncRight',
-    'TruncatedLFPoisson', 'TruncatedLFNegativeBinomialP', 'WLS',
-    'ZeroInflatedGeneralizedPoisson', 'ZeroInflatedNegativeBinomialP',
-    'ZeroInflatedPoisson', '__version__', 'add_constant', 'categorical',
-    'cov_struct', 'datasets', 'distributions', 'duration', 'emplike',
-    'families', 'formula', 'gam', 'genmod', 'graphics', 'iolib', 'load',
-    'load_pickle', 'multivariate', 'nonparametric', 'qqline', 'qqplot',
-    'qqplot_2samples', 'regression', 'robust', 'show_versions', 'stats',
-    'test', 'tools', 'tsa', 'webdoc', '__version_info__']
+# -*- coding: utf-8 -*-
+
+__all__ = [
+    "BayesGaussMI",
+    "BinomialBayesMixedGLM",
+    "ConditionalLogit",
+    "ConditionalMNLogit",
+    "ConditionalPoisson",
+    "Factor",
+    "GEE",
+    "GLM",
+    "GLMGam",
+    "GLS",
+    "GLSAR",
+    "GeneralizedPoisson",
+    "HurdleCountModel",
+    "Logit",
+    "MANOVA",
+    "MI",
+    "MICE",
+    "MICEData",
+    "MNLogit",
+    "MixedLM",
+    "NegativeBinomial",
+    "NegativeBinomialP",
+    "NominalGEE",
+    "OLS",
+    "OrdinalGEE",
+    "PCA",
+    "PHReg",
+    "Poisson",
+    "PoissonBayesMixedGLM",
+    "ProbPlot",
+    "Probit",
+    "QuantReg",
+    "RLM",
+    "RecursiveLS",
+    "SurvfuncRight",
+    "TruncatedLFPoisson",
+    "TruncatedLFNegativeBinomialP",
+    "WLS",
+    "ZeroInflatedGeneralizedPoisson",
+    "ZeroInflatedNegativeBinomialP",
+    "ZeroInflatedPoisson",
+    "__version__",
+    "add_constant",
+    "categorical",
+    "cov_struct",
+    "datasets",
+    "distributions",
+    "duration",
+    "emplike",
+    "families",
+    "formula",
+    "gam",
+    "genmod",
+    "graphics",
+    "iolib",
+    "load",
+    "load_pickle",
+    "multivariate",
+    "nonparametric",
+    "qqline",
+    "qqplot",
+    "qqplot_2samples",
+    "regression",
+    "robust",
+    "show_versions",
+    "stats",
+    "test",
+    "tools",
+    "tsa",
+    "webdoc",
+    "__version_info__"
+]
+
+
 from . import datasets, distributions, iolib, regression, robust, tools
 from .__init__ import test
-from statsmodels._version import version as __version__, version_tuple as __version_info__
-from .discrete.conditional_models import ConditionalLogit, ConditionalMNLogit, ConditionalPoisson
-from .discrete.count_model import ZeroInflatedGeneralizedPoisson, ZeroInflatedNegativeBinomialP, ZeroInflatedPoisson
-from .discrete.discrete_model import GeneralizedPoisson, Logit, MNLogit, NegativeBinomial, NegativeBinomialP, Poisson, Probit
-from .discrete.truncated_model import TruncatedLFPoisson, TruncatedLFNegativeBinomialP, HurdleCountModel
+from statsmodels._version import (
+    version as __version__, version_tuple as __version_info__
+)
+from .discrete.conditional_models import (
+    ConditionalLogit,
+    ConditionalMNLogit,
+    ConditionalPoisson,
+)
+from .discrete.count_model import (
+    ZeroInflatedGeneralizedPoisson,
+    ZeroInflatedNegativeBinomialP,
+    ZeroInflatedPoisson,
+)
+from .discrete.discrete_model import (
+    GeneralizedPoisson,
+    Logit,
+    MNLogit,
+    NegativeBinomial,
+    NegativeBinomialP,
+    Poisson,
+    Probit,
+)
+from .discrete.truncated_model import (
+    TruncatedLFPoisson,
+    TruncatedLFNegativeBinomialP,
+    HurdleCountModel,
+    )
 from .duration import api as duration
 from .duration.hazard_regression import PHReg
 from .duration.survfunc import SurvfuncRight
@@ -28,7 +112,16 @@ from .formula import api as formula
 from .gam import api as gam
 from .gam.generalized_additive_model import GLMGam
 from .genmod import api as genmod
-from .genmod.api import GEE, GLM, BinomialBayesMixedGLM, NominalGEE, OrdinalGEE, PoissonBayesMixedGLM, cov_struct, families
+from .genmod.api import (
+    GEE,
+    GLM,
+    BinomialBayesMixedGLM,
+    NominalGEE,
+    OrdinalGEE,
+    PoissonBayesMixedGLM,
+    cov_struct,
+    families,
+)
 from .graphics import api as graphics
 from .graphics.gofplots import ProbPlot, qqline, qqplot, qqplot_2samples
 from .imputation.bayes_mi import MI, BayesGaussMI
@@ -49,4 +142,5 @@ from .tools.print_version import show_versions
 from .tools.tools import add_constant, categorical
 from .tools.web import webdoc
 from .tsa import api as tsa
+
 load = load_pickle
diff --git a/statsmodels/base/_constraints.py b/statsmodels/base/_constraints.py
index ff57dbc65..22c36713c 100644
--- a/statsmodels/base/_constraints.py
+++ b/statsmodels/base/_constraints.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Thu May 15 16:36:05 2014

@@ -5,6 +6,7 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np


@@ -30,15 +32,19 @@ class LinearConstraints:

     """

-    def __init__(self, constraint_matrix, constraint_values, variable_names,
-        **kwds):
+    def __init__(self, constraint_matrix, constraint_values,
+                 variable_names, **kwds):
+
         self.constraint_matrix = constraint_matrix
         self.constraint_values = constraint_values
         self.variable_names = variable_names
+
+        # alias for patsy compatibility
         self.coefs = constraint_matrix
         self.constants = constraint_values
+
         self.__dict__.update(kwds)
-        self.tuple = self.constraint_matrix, self.constraint_values
+        self.tuple = (self.constraint_matrix, self.constraint_values)

     def __iter__(self):
         yield from self.tuple
@@ -47,14 +53,14 @@ class LinearConstraints:
         return self.tuple[idx]

     def __str__(self):
-
         def prod_string(v, name):
             v = np.abs(v)
             if v != 1:
-                ss = str(v) + ' * ' + name
+                ss = str(v) + " * " + name
             else:
                 ss = name
             return ss
+
         constraints_strings = []
         for r, q in zip(*self):
             ss = []
@@ -62,11 +68,12 @@ class LinearConstraints:
                 if v != 0 and ss == []:
                     ss += prod_string(v, name)
                 elif v > 0:
-                    ss += ' + ' + prod_string(v, name)
+                    ss += " + " + prod_string(v, name)
                 elif v < 0:
-                    ss += ' - ' + prod_string(np.abs(v), name)
-            ss += ' = ' + str(q.item())
+                    ss += " - " + prod_string(np.abs(v), name)
+            ss += " = " + str(q.item())
             constraints_strings.append(''.join(ss))
+
         return '\n'.join(constraints_strings)

     @classmethod
@@ -84,7 +91,7 @@ class LinearConstraints:
         instance of this class

         """
-        pass
+        return cls(lc.coefs, lc.constants, lc.variable_names)


 class TransformRestriction:
@@ -125,25 +132,39 @@ class TransformRestriction:
     """

     def __init__(self, R, q=None):
+
+        # The calculations are based on Stata manual for makecns
         R = self.R = np.atleast_2d(R)
         if q is not None:
             q = self.q = np.asarray(q)
+
         k_constr, k_vars = R.shape
         self.k_constr, self.k_vars = k_constr, k_vars
         self.k_unconstr = k_vars - k_constr
+
         m = np.eye(k_vars) - R.T.dot(np.linalg.pinv(R).T)
         evals, evecs = np.linalg.eigh(m)
+
+        # This normalizes the transformation so the largest element is 1.
+        # It makes it easier to interpret simple restrictions, e.g. b1 + b2 = 0
+        # TODO: make this work, there is something wrong, does not round-trip
+        #       need to adjust constant
+        #evecs_maxabs = np.max(np.abs(evecs), 0)
+        #evecs = evecs / evecs_maxabs
+
         self.evals = evals
-        self.evecs = evecs
+        self.evecs = evecs # temporarily attach as attribute
         L = self.L = evecs[:, :k_constr]
         self.transf_mat = evecs[:, k_constr:]
+
         if q is not None:
+            # use solve instead of inv
+            #self.constant = q.T.dot(np.linalg.inv(L.T.dot(R.T)).dot(L.T))
             try:
                 self.constant = q.T.dot(np.linalg.solve(L.T.dot(R.T), L.T))
             except np.linalg.linalg.LinAlgError as e:
-                raise ValueError(
-                    'possibly inconsistent constraints. error generated by\n%r'
-                     % (e,))
+                raise ValueError('possibly inconsistent constraints. error '
+                                 'generated by\n%r' % (e, ))
         else:
             self.constant = 0

@@ -165,7 +186,8 @@ class TransformRestriction:
         If the restriction is not homogeneous, i.e. q is not equal to zero,
         then this is an affine transform.
         """
-        pass
+        params_reduced = np.asarray(params_reduced)
+        return self.transf_mat.dot(params_reduced.T).T + self.constant

     def reduce(self, params):
         """transform from the full to the reduced parameter space
@@ -183,7 +205,8 @@ class TransformRestriction:
         This transform can be applied to the original parameters as well
         as to the data. If params is 2-d, then each row is transformed.
         """
-        pass
+        params = np.asarray(params)
+        return params.dot(self.transf_mat)


 def transform_params_constraint(params, Sinv, R, q):
@@ -218,11 +241,16 @@ def transform_params_constraint(params, Sinv, R, q):
     My guess is that this is the point in the subspace that satisfies
     the constraint that has minimum Mahalanobis distance. Proof ?
     """
-    pass
+
+    rsr = R.dot(Sinv).dot(R.T)
+
+    reduction = Sinv.dot(R.T).dot(np.linalg.solve(rsr, R.dot(params) - q))
+    return params - reduction


 def fit_constrained(model, constraint_matrix, constraint_values,
-    start_params=None, fit_kwds=None):
+                    start_params=None, fit_kwds=None):
+    # note: self is model instance
     """fit model subject to linear equality constraints

     The constraints are of the form   `R params = q`
@@ -278,7 +306,39 @@ def fit_constrained(model, constraint_matrix, constraint_values,

     Requires a model that implement an offset option.
     """
-    pass
+    self = model   # internal alias, used for methods
+    if fit_kwds is None:
+        fit_kwds = {}
+
+    R, q = constraint_matrix, constraint_values
+    endog, exog = self.endog, self.exog
+
+    transf = TransformRestriction(R, q)
+
+    exogp_st = transf.reduce(exog)
+
+    offset = exog.dot(transf.constant.squeeze())
+    if hasattr(self, 'offset'):
+        offset += self.offset
+
+    if start_params is not None:
+        start_params =  transf.reduce(start_params)
+
+    #need copy, because we do not want to change it, we do not need deepcopy
+    import copy
+    init_kwds = copy.copy(self._get_init_kwds())
+
+    # TODO: refactor to combine with above or offset_all
+    if 'offset' in init_kwds:
+        del init_kwds['offset']
+
+    # using offset as keywords is not supported in all modules
+    mod_constr = self.__class__(endog, exogp_st, offset=offset, **init_kwds)
+    res_constr = mod_constr.fit(start_params=start_params, **fit_kwds)
+    params_orig = transf.expand(res_constr.params).squeeze()
+    cov_params = transf.transf_mat.dot(res_constr.cov_params()).dot(transf.transf_mat.T)
+
+    return params_orig, cov_params, res_constr


 def fit_constrained_wrap(model, constraints, start_params=None, **fit_kwds):
@@ -294,4 +354,41 @@ def fit_constrained_wrap(model, constraints, start_params=None, **fit_kwds):
     This is the prototype for the fit_constrained method that has been added
     to Poisson and GLM.
     """
-    pass
+
+    self = model  # alias for use as method
+
+    #constraints = (R, q)
+    # TODO: temporary trailing underscore to not overwrite the monkey
+    #       patched version
+    # TODO: decide whether to move the imports
+    from patsy import DesignInfo
+    # we need this import if we copy it to a different module
+    #from statsmodels.base._constraints import fit_constrained
+
+    # same pattern as in base.LikelihoodModel.t_test
+    lc = DesignInfo(self.exog_names).linear_constraint(constraints)
+    R, q = lc.coefs, lc.constants
+
+    # TODO: add start_params option, need access to tranformation
+    #       fit_constrained needs to do the transformation
+    params, cov, res_constr = fit_constrained(self, R, q,
+                                              start_params=start_params,
+                                              fit_kwds=fit_kwds)
+    #create dummy results Instance, TODO: wire up properly
+    res = self.fit(start_params=params, maxiter=0,
+                   warn_convergence=False)  # we get a wrapper back
+    res._results.params = params
+    res._results.cov_params_default = cov
+    cov_type = fit_kwds.get('cov_type', 'nonrobust')
+    if cov_type == 'nonrobust':
+        res._results.normalized_cov_params = cov / res_constr.scale
+    else:
+        res._results.normalized_cov_params = None
+
+    k_constr = len(q)
+    res._results.df_resid += k_constr
+    res._results.df_model -= k_constr
+    res._results.constraints = LinearConstraints.from_patsy(lc)
+    res._results.k_constr = k_constr
+    res._results.results_constrained = res_constr
+    return res
diff --git a/statsmodels/base/_parameter_inference.py b/statsmodels/base/_parameter_inference.py
index 1e9a26c00..2a160b924 100644
--- a/statsmodels/base/_parameter_inference.py
+++ b/statsmodels/base/_parameter_inference.py
@@ -1,15 +1,18 @@
+# -*- coding: utf-8 -*-
 """
 Created on Wed May 30 15:11:09 2018

 @author: josef
 """
+
 import numpy as np
 from scipy import stats


+# this is a copy from stats._diagnostic_other to avoid circular imports
 def _lm_robust(score, constraint_matrix, score_deriv_inv, cov_score,
-    cov_params=None):
-    """general formula for score/LM test
+               cov_params=None):
+    '''general formula for score/LM test

     generalized score or lagrange multiplier test for implicit constraints

@@ -48,13 +51,35 @@ def _lm_robust(score, constraint_matrix, score_deriv_inv, cov_score,
     Notes
     -----

-    """
-    pass
-
-
-def score_test(self, exog_extra=None, params_constrained=None, hypothesis=
-    'joint', cov_type=None, cov_kwds=None, k_constraints=None, r_matrix=
-    None, scale=None, observed=True):
+    '''
+    # shorthand alias
+    R, Ainv, B, V = constraint_matrix, score_deriv_inv, cov_score, cov_params
+
+    k_constraints = np.linalg.matrix_rank(R)
+    tmp = R.dot(Ainv)
+    wscore = tmp.dot(score)  # C Ainv score
+
+    if B is None and V is None:
+        # only Ainv is given, so we assume information matrix identity holds
+        # computational short cut, should be same if Ainv == inv(B)
+        lm_stat = score.dot(Ainv.dot(score))
+    else:
+        # information matrix identity does not hold
+        if V is None:
+            inner = tmp.dot(B).dot(tmp.T)
+        else:
+            inner = R.dot(V).dot(R.T)
+
+        #lm_stat2 = wscore.dot(np.linalg.pinv(inner).dot(wscore))
+        # Let's assume inner is invertible, TODO: check if use case for pinv exists
+        lm_stat = wscore.dot(np.linalg.solve(inner, wscore))
+    pval = stats.chi2.sf(lm_stat, k_constraints)
+    return lm_stat, pval, k_constraints
+
+
+def score_test(self, exog_extra=None, params_constrained=None,
+               hypothesis='joint', cov_type=None, cov_kwds=None,
+               k_constraints=None, r_matrix=None, scale=None, observed=True):
     """score test for restrictions or for omitted variables

     Null Hypothesis : constraints are satisfied
@@ -140,25 +165,223 @@ def score_test(self, exog_extra=None, params_constrained=None, hypothesis=
     The covariance matrix of the score is the simple empirical covariance of
     score_obs without degrees of freedom correction.
     """
-    pass
-
-
-def _scorehess_extra(self, params=None, exog_extra=None, exog2_extra=None,
-    hess_kwds=None):
+    # TODO: we are computing unnecessary things for cov_type nonrobust
+    if hasattr(self, "_results"):
+        # use numpy if we have wrapper, not relevant if method
+        self = self._results
+    model = self.model
+    nobs = model.endog.shape[0]  # model.nobs
+    # discrete Poisson does not have nobs
+    if params_constrained is None:
+        params_constrained = self.params
+    cov_type = cov_type if cov_type is not None else self.cov_type
+
+    if observed is False:
+        hess_kwd = {'observed': False}
+    else:
+        hess_kwd = {}
+
+    if exog_extra is None:
+
+        if hasattr(self, 'constraints'):
+            if isinstance(self.constraints, tuple):
+                r_matrix = self.constraints[0]
+            else:
+                r_matrix = self.constraints.coefs
+            k_constraints = r_matrix.shape[0]
+
+        else:
+            if k_constraints is None:
+                raise ValueError('if exog_extra is None, then k_constraints '
+                                 'needs to be given')
+
+        # we need to use results scale as additional parameter
+        if scale is not None:
+            # we need to use results scale as additional parameter, gh #7840
+            score_kwd = {'scale': scale}
+            hess_kwd['scale'] = scale
+        else:
+            score_kwd = {}
+
+        # duplicate computation of score, might not be needed
+        score = model.score(params_constrained, **score_kwd)
+        score_obs = model.score_obs(params_constrained, **score_kwd)
+        hessian = model.hessian(params_constrained, **hess_kwd)
+
+    else:
+        if cov_type == 'V':
+            raise ValueError('if exog_extra is not None, then cov_type cannot '
+                             'be V')
+        if hasattr(self, 'constraints'):
+            raise NotImplementedError('if exog_extra is not None, then self '
+                                      'should not be a constrained fit result')
+
+        if isinstance(exog_extra, tuple):
+            sh = _scorehess_extra(self, params_constrained, *exog_extra,
+                                  hess_kwds=hess_kwd)
+            score_obs, hessian, k_constraints, r_matrix = sh
+            score = score_obs.sum(0)
+        else:
+            exog_extra = np.asarray(exog_extra)
+            k_constraints = 0
+            ex = np.column_stack((model.exog, exog_extra))
+            # this uses shape not matrix rank to determine k_constraints
+            # requires nonsingular (no added perfect collinearity)
+            k_constraints += ex.shape[1] - model.exog.shape[1]
+            # TODO use diag instead of full np.eye
+            r_matrix = np.eye(len(self.params) + k_constraints
+                              )[-k_constraints:]
+
+            score_factor = model.score_factor(params_constrained)
+            if score_factor.ndim == 1:
+                score_obs = (score_factor[:, None] * ex)
+            else:
+                sf = score_factor
+                score_obs = np.column_stack((sf[:, :1] * ex, sf[:, 1:]))
+            score = score_obs.sum(0)
+            hessian_factor = model.hessian_factor(params_constrained,
+                                                  **hess_kwd)
+            # see #4714
+            from statsmodels.genmod.generalized_linear_model import GLM
+            if isinstance(model, GLM):
+                hessian_factor *= -1
+            hessian = np.dot(ex.T * hessian_factor, ex)
+
+    if cov_type == 'nonrobust':
+        cov_score_test = -hessian
+    elif cov_type.upper() == 'HC0':
+        hinv = -np.linalg.inv(hessian)
+        cov_score = nobs * np.cov(score_obs.T)
+        # temporary to try out
+        lm = _lm_robust(score, r_matrix, hinv, cov_score, cov_params=None)
+        return lm
+        # alternative is to use only the center, but it is singular
+        # https://github.com/statsmodels/statsmodels/pull/2096#issuecomment-393646205
+        # cov_score_test_inv = cov_lm_robust(score, r_matrix, hinv,
+        #                                   cov_score, cov_params=None)
+    elif cov_type.upper() == 'V':
+        # TODO: this does not work, V in fit_constrained results is singular
+        # we need cov_params without the zeros in it
+        hinv = -np.linalg.inv(hessian)
+        cov_score = nobs * np.cov(score_obs.T)
+        V = self.cov_params_default
+        # temporary to try out
+        chi2stat = _lm_robust(score, r_matrix, hinv, cov_score, cov_params=V)
+        pval = stats.chi2.sf(chi2stat, k_constraints)
+        return chi2stat, pval
+    else:
+        msg = 'Only cov_type "nonrobust" and "HC0" are available.'
+        raise NotImplementedError(msg)
+
+    if hypothesis == 'joint':
+        chi2stat = score.dot(np.linalg.solve(cov_score_test, score[:, None]))
+        pval = stats.chi2.sf(chi2stat, k_constraints)
+        # return a stats results instance instead?  Contrast?
+        return chi2stat, pval, k_constraints
+    elif hypothesis == 'separate':
+        diff = score
+        bse = np.sqrt(np.diag(cov_score_test))
+        stat = diff / bse
+        pval = stats.norm.sf(np.abs(stat))*2
+        return stat, pval
+    else:
+        raise NotImplementedError('only hypothesis "joint" is available')
+
+
+def _scorehess_extra(self, params=None, exog_extra=None,
+                     exog2_extra=None, hess_kwds=None):
     """Experimental helper function for variable addition score test.

     This uses score and hessian factor at the params which should be the
     params of the restricted model.

     """
-    pass
+    if hess_kwds is None:
+        hess_kwds = {}
+    # this corresponds to a model methods, so we need only the model
+    model = self.model
+    # as long as we have results instance, we can take params from it
+    if params is None:
+        params = self.params
+
+    # get original exog from model, currently only if exactly 2
+    exog_o1, exog_o2 = model._get_exogs()
+
+    if exog_o2 is None:
+        # if extra params is scalar, as in NB, GPP
+        exog_o2 = np.ones((exog_o1.shape[0], 1))
+
+    k_mean = exog_o1.shape[1]
+    k_prec = exog_o2.shape[1]
+    if exog_extra is not None:
+        exog = np.column_stack((exog_o1, exog_extra))
+    else:
+        exog = exog_o1
+
+    if exog2_extra is not None:
+        exog2 = np.column_stack((exog_o2, exog2_extra))
+    else:
+        exog2 = exog_o2
+
+    k_mean_new = exog.shape[1]
+    k_prec_new = exog2.shape[1]
+    k_cm = k_mean_new - k_mean
+    k_cp = k_prec_new - k_prec
+    k_constraints = k_cm + k_cp
+
+    index_mean = np.arange(k_mean, k_mean_new)
+    index_prec = np.arange(k_mean_new + k_prec, k_mean_new + k_prec_new)
+
+    r_matrix = np.zeros((k_constraints, len(params) + k_constraints))
+    # print(exog.shape, exog2.shape)
+    # print(r_matrix.shape, k_cm, k_cp, k_mean_new, k_prec_new)
+    # print(index_mean, index_prec)
+    r_matrix[:k_cm, index_mean] = np.eye(k_cm)
+    r_matrix[k_cm: k_cm + k_cp, index_prec] = np.eye(k_cp)
+
+    if hasattr(model, "score_hessian_factor"):
+        sf, hf = model.score_hessian_factor(params, return_hessian=True,
+                                            **hess_kwds)
+    else:
+        sf = model.score_factor(params)
+        hf = model.hessian_factor(params, **hess_kwds)
+
+    sf1, sf2 = sf
+    hf11, hf12, hf22 = hf
+
+    # elementwise product for each row (observation)
+    d1 = sf1[:, None] * exog
+    d2 = sf2[:, None] * exog2
+    score_obs = np.column_stack((d1, d2))
+
+    # elementwise product for each row (observation)
+    d11 = (exog.T * hf11).dot(exog)
+    d12 = (exog.T * hf12).dot(exog2)
+    d22 = (exog2.T * hf22).dot(exog2)
+    hessian = np.block([[d11, d12], [d12.T, d22]])
+    return score_obs, hessian, k_constraints, r_matrix
+
+
+def im_ratio(results):
+    res = getattr(results, "_results", results)  # shortcut
+    hess = res.model.hessian(res.params)
+    if res.cov_type == "nonrobust":
+        score_obs = res.model.score_obs(res.params)
+        cov_score = score_obs.T @ score_obs
+        hessneg_inv = np.linalg.inv(-hess)
+        im_ratio = hessneg_inv @ cov_score
+    else:
+        im_ratio = res.cov_params() @ (-hess)
+    return im_ratio


 def tic(results):
     """Takeuchi information criterion for misspecified models

     """
-    pass
+    imr = getattr(results, "im_ratio", im_ratio(results))
+    tic = - 2 * results.llf + 2 * np.trace(imr)
+    return tic


 def gbic(results, gbicp=False):
@@ -171,4 +394,11 @@ def gbic(results, gbicp=False):
     Series B (Statistical Methodology) 76 (1): 141–67.

     """
-    pass
+    self = getattr(results, "_results", results)
+    k_params = self.df_model + 1
+    nobs = k_params + self.df_resid
+    imr = getattr(results, "im_ratio", im_ratio(results))
+    imr_logdet = np.linalg.slogdet(imr)[1]
+    gbic = -2 * self.llf + k_params * np.log(nobs) - imr_logdet  # LL equ. (20)
+    gbicp = gbic + np.trace(imr)  # LL equ. (23)
+    return gbic, gbicp
diff --git a/statsmodels/base/_penalized.py b/statsmodels/base/_penalized.py
index dc211ca13..00161cd5d 100644
--- a/statsmodels/base/_penalized.py
+++ b/statsmodels/base/_penalized.py
@@ -1,9 +1,11 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sun May 10 08:23:48 2015

 Author: Josef Perktold
 License: BSD-3
 """
+
 import numpy as np
 from ._penalties import NonePenalty
 from statsmodels.tools.numdiff import approx_fprime_cs, approx_fprime
@@ -29,57 +31,141 @@ class PenalizedMixin:
     """

     def __init__(self, *args, **kwds):
+
+        # pop extra kwds before calling super
         self.penal = kwds.pop('penal', None)
-        self.pen_weight = kwds.pop('pen_weight', None)
+        self.pen_weight =  kwds.pop('pen_weight', None)
+
         super(PenalizedMixin, self).__init__(*args, **kwds)
+
+        # TODO: define pen_weight as average pen_weight? i.e. per observation
+        # I would have preferred len(self.endog) * kwds.get('pen_weight', 1)
+        # or use pen_weight_factor in signature
         if self.pen_weight is None:
             self.pen_weight = len(self.endog)
+
         if self.penal is None:
+            # unpenalized by default
             self.penal = NonePenalty()
             self.pen_weight = 0
+
         self._init_keys.extend(['penal', 'pen_weight'])
         self._null_drop_keys = getattr(self, '_null_drop_keys', [])
         self._null_drop_keys.extend(['penal', 'pen_weight'])

+    def _handle_scale(self, params, scale=None, **kwds):
+
+        if scale is None:
+            # special handling for GLM
+            if hasattr(self, 'scaletype'):
+                mu = self.predict(params)
+                scale = self.estimate_scale(mu)
+            else:
+                scale = 1
+
+        return scale
+
     def loglike(self, params, pen_weight=None, **kwds):
         """
         Log-likelihood of model at params
         """
-        pass
+        if pen_weight is None:
+            pen_weight = self.pen_weight
+
+        llf = super(PenalizedMixin, self).loglike(params, **kwds)
+        if pen_weight != 0:
+            scale = self._handle_scale(params, **kwds)
+            llf -= 1/scale * pen_weight * self.penal.func(params)
+
+        return llf

     def loglikeobs(self, params, pen_weight=None, **kwds):
         """
         Log-likelihood of model observations at params
         """
-        pass
+        if pen_weight is None:
+            pen_weight = self.pen_weight
+
+        llf = super(PenalizedMixin, self).loglikeobs(params, **kwds)
+        nobs_llf = float(llf.shape[0])
+
+        if pen_weight != 0:
+            scale = self._handle_scale(params, **kwds)
+            llf -= 1/scale * pen_weight / nobs_llf * self.penal.func(params)
+
+        return llf

     def score_numdiff(self, params, pen_weight=None, method='fd', **kwds):
         """score based on finite difference derivative
         """
-        pass
+        if pen_weight is None:
+            pen_weight = self.pen_weight
+
+        loglike = lambda p: self.loglike(p, pen_weight=pen_weight, **kwds)
+
+        if method == 'cs':
+            return approx_fprime_cs(params, loglike)
+        elif method == 'fd':
+            return approx_fprime(params, loglike, centered=True)
+        else:
+            raise ValueError('method not recognized, should be "fd" or "cs"')

     def score(self, params, pen_weight=None, **kwds):
         """
         Gradient of model at params
         """
-        pass
+        if pen_weight is None:
+            pen_weight = self.pen_weight
+
+        sc = super(PenalizedMixin, self).score(params, **kwds)
+        if pen_weight != 0:
+            scale = self._handle_scale(params, **kwds)
+            sc -= 1/scale * pen_weight * self.penal.deriv(params)
+
+        return sc

     def score_obs(self, params, pen_weight=None, **kwds):
         """
         Gradient of model observations at params
         """
-        pass
+        if pen_weight is None:
+            pen_weight = self.pen_weight
+
+        sc = super(PenalizedMixin, self).score_obs(params, **kwds)
+        nobs_sc = float(sc.shape[0])
+        if pen_weight != 0:
+            scale = self._handle_scale(params, **kwds)
+            sc -= 1/scale * pen_weight / nobs_sc  * self.penal.deriv(params)
+
+        return sc

     def hessian_numdiff(self, params, pen_weight=None, **kwds):
         """hessian based on finite difference derivative
         """
-        pass
+        if pen_weight is None:
+            pen_weight = self.pen_weight
+        loglike = lambda p: self.loglike(p, pen_weight=pen_weight, **kwds)
+
+        from statsmodels.tools.numdiff import approx_hess
+        return approx_hess(params, loglike)

     def hessian(self, params, pen_weight=None, **kwds):
         """
         Hessian of model at params
         """
-        pass
+        if pen_weight is None:
+            pen_weight = self.pen_weight
+
+        hess = super(PenalizedMixin, self).hessian(params, **kwds)
+        if pen_weight != 0:
+            scale = self._handle_scale(params, **kwds)
+            h = self.penal.deriv2(params)
+            if h.ndim == 1:
+                hess -= 1/scale * np.diag(pen_weight * h)
+            else:
+                hess -= 1/scale * pen_weight * h
+
+        return hess

     def fit(self, method=None, trim=None, **kwds):
         """minimize negative penalized log-likelihood
@@ -101,4 +187,41 @@ class PenalizedMixin:
             Specifically, additional optimizer keywords and cov_type related
             keywords can be added.
         """
-        pass
+        # If method is None, then we choose a default method ourselves
+
+        # TODO: temporary hack, need extra fit kwds
+        # we need to rule out fit methods in a model that will not work with
+        # penalization
+        from statsmodels.gam.generalized_additive_model import GLMGam
+        from statsmodels.genmod.generalized_linear_model import GLM
+        # Only for fit methods supporting max_start_irls
+        if isinstance(self, (GLM, GLMGam)):
+            kwds.update({'max_start_irls': 0})
+
+        # currently we use `bfgs` by default
+        if method is None:
+            method = 'bfgs'
+
+        if trim is None:
+            trim = False
+
+        res = super(PenalizedMixin, self).fit(method=method, **kwds)
+
+        if trim is False:
+            # note boolean check for "is False", not "False_like"
+            return res
+        else:
+            if trim is True:
+                trim = 1e-4  # trim threshold
+            # TODO: make it penal function dependent
+            # temporary standin, only checked for Poisson and GLM,
+            # and is computationally inefficient
+            drop_index = np.nonzero(np.abs(res.params) < trim)[0]
+            keep_index = np.nonzero(np.abs(res.params) > trim)[0]
+
+            if drop_index.any():
+                # TODO: do we need to add results attributes?
+                res_aux = self._fit_zeros(keep_index, **kwds)
+                return res_aux
+            else:
+                return res
diff --git a/statsmodels/base/_penalties.py b/statsmodels/base/_penalties.py
index ba0df6528..df3872513 100644
--- a/statsmodels/base/_penalties.py
+++ b/statsmodels/base/_penalties.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 A collection of smooth penalty functions.

@@ -36,9 +37,9 @@ class Penalty:
     The class has a member called `alpha` that scales the weights.
     """

-    def __init__(self, weights=1.0):
+    def __init__(self, weights=1.):
         self.weights = weights
-        self.alpha = 1.0
+        self.alpha = 1.

     def func(self, params):
         """
@@ -54,7 +55,7 @@ class Penalty:
         A scalar penaty value; greater values imply greater
         penalization.
         """
-        pass
+        raise NotImplementedError

     def deriv(self, params):
         """
@@ -70,7 +71,7 @@ class Penalty:
         The gradient of the penalty with respect to each element in
         `params`.
         """
-        pass
+        raise NotImplementedError

     def _null_weights(self, params):
         """work around for Null model
@@ -79,7 +80,12 @@ class Penalty:
         as in DiscreteModels.
         TODO: check other models
         """
-        pass
+        if np.size(self.weights) > 1:
+            if len(params) == 1:
+                raise  # raise to identify models where this would be needed
+                return 0.
+
+        return self.weights


 class NonePenalty(Penalty):
@@ -93,15 +99,37 @@ class NonePenalty(Penalty):
             import warnings
             warnings.warn('keyword arguments are be ignored')

+    def func(self, params):
+        if params.ndim == 2:
+            return np.zeros(params.shape[1:])
+        else:
+            return 0
+
+    def deriv(self, params):
+        return np.zeros(params.shape)
+
+    def deriv2(self, params):
+        # returns diagonal of hessian
+        return np.zeros(params.shape[0])
+

 class L2(Penalty):
     """
     The L2 (ridge) penalty.
     """

-    def __init__(self, weights=1.0):
+    def __init__(self, weights=1.):
         super().__init__(weights)

+    def func(self, params):
+        return np.sum(self.weights * self.alpha * params**2)
+
+    def deriv(self, params):
+        return 2 * self.weights * self.alpha * params
+
+    def deriv2(self, params):
+        return 2 * self.weights * self.alpha * np.ones(len(params))
+

 class L2Univariate(Penalty):
     """
@@ -110,20 +138,43 @@ class L2Univariate(Penalty):

     def __init__(self, weights=None):
         if weights is None:
-            self.weights = 1.0
+            self.weights = 1.
         else:
             self.weights = weights

+    def func(self, params):
+        return self.weights * params**2
+
+    def deriv(self, params):
+        return 2 * self.weights * params
+
+    def deriv2(self, params):
+        return 2 * self.weights * np.ones(len(params))
+

 class PseudoHuber(Penalty):
     """
     The pseudo-Huber penalty.
     """

-    def __init__(self, dlt, weights=1.0):
+    def __init__(self, dlt, weights=1.):
         super().__init__(weights)
         self.dlt = dlt

+    def func(self, params):
+        v = np.sqrt(1 + (params / self.dlt)**2)
+        v -= 1
+        v *= self.dlt**2
+        return np.sum(self.weights * self.alpha * v, 0)
+
+    def deriv(self, params):
+        v = np.sqrt(1 + (params / self.dlt)**2)
+        return params * self.weights * self.alpha / v
+
+    def deriv2(self, params):
+        v = np.power(1 + (params / self.dlt)**2, -3/2)
+        return self.weights * self.alpha * v
+

 class SCAD(Penalty):
     """
@@ -166,11 +217,49 @@ class SCAD(Penalty):
     1348-1360.
     """

-    def __init__(self, tau, c=3.7, weights=1.0):
+    def __init__(self, tau, c=3.7, weights=1.):
         super().__init__(weights)
         self.tau = tau
         self.c = c

+    def func(self, params):
+
+        # 3 segments in absolute value
+        tau = self.tau
+        p_abs = np.atleast_1d(np.abs(params))
+        res = np.empty(p_abs.shape, p_abs.dtype)
+        res.fill(np.nan)
+        mask1 = p_abs < tau
+        mask3 = p_abs >= self.c * tau
+        res[mask1] = tau * p_abs[mask1]
+        mask2 = ~mask1 & ~mask3
+        p_abs2 = p_abs[mask2]
+        tmp = (p_abs2**2 - 2 * self.c * tau * p_abs2 + tau**2)
+        res[mask2] = -tmp / (2 * (self.c - 1))
+        res[mask3] = (self.c + 1) * tau**2 / 2.
+
+        return (self.weights * res).sum(0)
+
+    def deriv(self, params):
+
+        # 3 segments in absolute value
+        tau = self.tau
+        p = np.atleast_1d(params)
+        p_abs = np.abs(p)
+        p_sign = np.sign(p)
+        res = np.empty(p_abs.shape)
+        res.fill(np.nan)
+
+        mask1 = p_abs < tau
+        mask3 = p_abs >= self.c * tau
+        mask2 = ~mask1 & ~mask3
+        res[mask1] = p_sign[mask1] * tau
+        tmp = p_sign[mask2] * (p_abs[mask2] - self.c * tau)
+        res[mask2] = -tmp / (self.c - 1)
+        res[mask3] = 0
+
+        return self.weights * res
+
     def deriv2(self, params):
         """Second derivative of function

@@ -178,7 +267,19 @@ class SCAD(Penalty):
         Hessian. If the return is 1 dimensional, then it is the diagonal of
         the Hessian.
         """
-        pass
+
+        # 3 segments in absolute value
+        tau = self.tau
+        p = np.atleast_1d(params)
+        p_abs = np.abs(p)
+        res = np.zeros(p_abs.shape)
+
+        mask1 = p_abs < tau
+        mask3 = p_abs >= self.c * tau
+        mask2 = ~mask1 & ~mask3
+        res[mask2] = -1 / (self.c - 1)
+
+        return self.weights * res


 class SCADSmoothed(SCAD):
@@ -207,23 +308,95 @@ class SCADSmoothed(SCAD):
     all penalty classes.
     """

-    def __init__(self, tau, c=3.7, c0=None, weights=1.0, restriction=None):
+    def __init__(self, tau, c=3.7, c0=None, weights=1., restriction=None):
         super().__init__(tau, c=c, weights=weights)
         self.tau = tau
         self.c = c
         self.c0 = c0 if c0 is not None else tau * 0.1
         if self.c0 > tau:
             raise ValueError('c0 cannot be larger than tau')
+
+        # get coefficients for quadratic approximation
         c0 = self.c0
+        # need to temporarily override weights for call to super
         weights = self.weights
-        self.weights = 1.0
+        self.weights = 1.
         deriv_c0 = super(SCADSmoothed, self).deriv(c0)
         value_c0 = super(SCADSmoothed, self).func(c0)
         self.weights = weights
+
         self.aq1 = value_c0 - 0.5 * deriv_c0 * c0
         self.aq2 = 0.5 * deriv_c0 / c0
         self.restriction = restriction

+    def func(self, params):
+        # workaround for Null model
+        weights = self._null_weights(params)
+        # TODO: `and np.size(params) > 1` is hack for llnull, need better solution
+        if self.restriction is not None and np.size(params) > 1:
+            params = self.restriction.dot(params)
+        # need to temporarily override weights for call to super
+        # Note: we have the same problem with `restriction`
+        self_weights = self.weights
+        self.weights = 1.
+        value = super(SCADSmoothed, self).func(params[None, ...])
+        self.weights = self_weights
+
+        # shift down so func(0) == 0
+        value -= self.aq1
+        # change the segment corresponding to quadratic approximation
+        p_abs = np.atleast_1d(np.abs(params))
+        mask = p_abs < self.c0
+        p_abs_masked = p_abs[mask]
+        value[mask] = self.aq2 * p_abs_masked**2
+
+        return (weights * value).sum(0)
+
+    def deriv(self, params):
+        # workaround for Null model
+        weights = self._null_weights(params)
+        if self.restriction is not None and np.size(params) > 1:
+            params = self.restriction.dot(params)
+        # need to temporarily override weights for call to super
+        self_weights = self.weights
+        self.weights = 1.
+        value = super(SCADSmoothed, self).deriv(params)
+        self.weights = self_weights
+
+        # change the segment corresponding to quadratic approximation
+        p = np.atleast_1d(params)
+        mask = np.abs(p) < self.c0
+        value[mask] = 2 * self.aq2 * p[mask]
+
+        if self.restriction is not None and np.size(params) > 1:
+            return weights * value.dot(self.restriction)
+        else:
+            return weights * value
+
+    def deriv2(self, params):
+        # workaround for Null model
+        weights = self._null_weights(params)
+        if self.restriction is not None and np.size(params) > 1:
+            params = self.restriction.dot(params)
+        # need to temporarily override weights for call to super
+        self_weights = self.weights
+        self.weights = 1.
+        value = super(SCADSmoothed, self).deriv2(params)
+        self.weights = self_weights
+
+        # change the segment corresponding to quadratic approximation
+        p = np.atleast_1d(params)
+        mask = np.abs(p) < self.c0
+        value[mask] = 2 * self.aq2
+
+        if self.restriction is not None and np.size(params) > 1:
+            # note: super returns 1d array for diag, i.e. hessian_diag
+            # TODO: weights are missing
+            return (self.restriction.T * (weights * value)
+                    ).dot(self.restriction)
+        else:
+            return weights * value
+

 class ConstraintsPenalty:
     """
@@ -250,13 +423,16 @@ class ConstraintsPenalty:
     """

     def __init__(self, penalty, weights=None, restriction=None):
+
         self.penalty = penalty
         if weights is None:
-            self.weights = 1.0
+            self.weights = 1.
         else:
             self.weights = weights
+
         if restriction is not None:
             restriction = np.asarray(restriction)
+
         self.restriction = restriction

     def func(self, params):
@@ -272,7 +448,14 @@ class ConstraintsPenalty:
         deriv2 : ndarray
             value(s) of penalty function
         """
-        pass
+        # TODO: `and np.size(params) > 1` is hack for llnull, need better solution
+        # Is this still needed? it seems to work without
+        if self.restriction is not None:
+            params = self.restriction.dot(params)
+
+        value = self.penalty.func(params)
+
+        return (self.weights * value.T).T.sum(0)

     def deriv(self, params):
         """first derivative of penalty function w.r.t. params
@@ -287,7 +470,16 @@ class ConstraintsPenalty:
         deriv2 : ndarray
             array of first partial derivatives
         """
-        pass
+        if self.restriction is not None:
+            params = self.restriction.dot(params)
+
+        value = self.penalty.deriv(params)
+
+        if self.restriction is not None:
+            return self.weights * value.T.dot(self.restriction)
+        else:
+            return (self.weights * value.T)
+
     grad = deriv

     def deriv2(self, params):
@@ -303,7 +495,21 @@ class ConstraintsPenalty:
         deriv2 : ndarray, 2-D
             second derivative matrix
         """
-        pass
+
+        if self.restriction is not None:
+            params = self.restriction.dot(params)
+
+        value = self.penalty.deriv2(params)
+
+        if self.restriction is not None:
+            # note: univariate penalty returns 1d array for diag,
+            # i.e. hessian_diag
+            v = (self.restriction.T * value * self.weights)
+            value = v.dot(self.restriction)
+        else:
+            value = np.diag(self.weights * value)
+
+        return value


 class L2ConstraintsPenalty(ConstraintsPenalty):
@@ -311,16 +517,20 @@ class L2ConstraintsPenalty(ConstraintsPenalty):
     """

     def __init__(self, weights=None, restriction=None, sigma_prior=None):
+
         if sigma_prior is not None:
             raise NotImplementedError('sigma_prior is not implemented yet')
+
         penalty = L2Univariate()
+
         super(L2ConstraintsPenalty, self).__init__(penalty, weights=weights,
-            restriction=restriction)
+                                                  restriction=restriction)


 class CovariancePenalty:

     def __init__(self, weight):
+        # weight should be scalar
         self.weight = weight

     def func(self, mat, mat_inv):
@@ -336,7 +546,7 @@ class CovariancePenalty:
         -------
         A scalar penalty value
         """
-        pass
+        raise NotImplementedError

     def deriv(self, mat, mat_inv):
         """
@@ -353,7 +563,7 @@ class CovariancePenalty:
         with respect to each element in the lower triangle
         of `mat`.
         """
-        pass
+        raise NotImplementedError


 class PSD(CovariancePenalty):
@@ -362,3 +572,16 @@ class PSD(CovariancePenalty):
     approaches the boundary of the domain of symmetric, positive
     definite matrices.
     """
+
+    def func(self, mat, mat_inv):
+        try:
+            cy = np.linalg.cholesky(mat)
+        except np.linalg.LinAlgError:
+            return np.inf
+        return -2 * self.weight * np.sum(np.log(np.diag(cy)))
+
+    def deriv(self, mat, mat_inv):
+        cy = mat_inv.copy()
+        cy = 2*cy - np.diag(np.diag(cy))
+        i,j = np.tril_indices(mat.shape[0])
+        return -self.weight * cy[i,j]
diff --git a/statsmodels/base/_prediction_inference.py b/statsmodels/base/_prediction_inference.py
index bab5b7cc0..1730d9a6d 100644
--- a/statsmodels/base/_prediction_inference.py
+++ b/statsmodels/base/_prediction_inference.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri Dec 19 11:29:18 2014

@@ -5,17 +6,19 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from scipy import stats
 import pandas as pd


+# this is similar to ContrastResults after t_test, partially copied, adjusted
 class PredictionResultsBase:
     """Based class for get_prediction results
     """

-    def __init__(self, predicted, var_pred, func=None, deriv=None, df=None,
-        dist=None, row_labels=None, **kwds):
+    def __init__(self, predicted, var_pred, func=None, deriv=None,
+                 df=None, dist=None, row_labels=None, **kwds):
         self.predicted = predicted
         self.var_pred = var_pred
         self.func = func
@@ -23,18 +26,27 @@ class PredictionResultsBase:
         self.df = df
         self.row_labels = row_labels
         self.__dict__.update(kwds)
+
         if dist is None or dist == 'norm':
             self.dist = stats.norm
             self.dist_args = ()
         elif dist == 't':
             self.dist = stats.t
-            self.dist_args = self.df,
+            self.dist_args = (self.df,)
         else:
             self.dist = dist
             self.dist_args = ()

+    @property
+    def se(self):
+        return np.sqrt(self.var_pred)
+
+    @property
+    def tvalues(self):
+        return self.predicted / self.se
+
     def t_test(self, value=0, alternative='two-sided'):
-        """z- or t-test for hypothesis that mean is equal to value
+        '''z- or t-test for hypothesis that mean is equal to value

         Parameters
         ----------
@@ -52,13 +64,33 @@ class PredictionResultsBase:
             the attribute of the instance, specified in `__init__`. Default
             if not specified is the normal distribution.

-        """
-        pass
+        '''
+        # assumes symmetric distribution
+        stat = (self.predicted - value) / self.se
+
+        if alternative in ['two-sided', '2-sided', '2s']:
+            pvalue = self.dist.sf(np.abs(stat), *self.dist_args)*2
+        elif alternative in ['larger', 'l']:
+            pvalue = self.dist.sf(stat, *self.dist_args)
+        elif alternative in ['smaller', 's']:
+            pvalue = self.dist.cdf(stat, *self.dist_args)
+        else:
+            raise ValueError('invalid alternative')
+        return stat, pvalue

     def _conf_int_generic(self, center, se, alpha, dist_args=None):
         """internal function to avoid code duplication
         """
-        pass
+        if dist_args is None:
+            dist_args = ()
+
+        q = self.dist.ppf(1 - alpha / 2., *dist_args)
+        lower = center - q * se
+        upper = center + q * se
+        ci = np.column_stack((lower, upper))
+        # if we want to stack at a new last axis, for lower.ndim > 1
+        # np.concatenate((lower[..., None], upper[..., None]), axis=-1)
+        return ci

     def conf_int(self, *, alpha=0.05, **kwds):
         """Confidence interval for the predicted value.
@@ -79,7 +111,10 @@ class PredictionResultsBase:
             The array has the lower and the upper limit of the confidence
             interval in the columns.
         """
-        pass
+
+        ci = self._conf_int_generic(self.predicted, self.se, alpha,
+                                    dist_args=self.dist_args)
+        return ci

     def summary_frame(self, alpha=0.05):
         """Summary frame
@@ -94,13 +129,27 @@ class PredictionResultsBase:
         -------
         pandas DataFrame with columns 'predicted', 'se', 'ci_lower', 'ci_upper'
         """
-        pass
+        ci = self.conf_int(alpha=alpha)
+        to_include = {}
+        to_include['predicted'] = self.predicted
+        to_include['se'] = self.se
+        to_include['ci_lower'] = ci[:, 0]
+        to_include['ci_upper'] = ci[:, 1]
+
+        self.table = to_include
+        # pandas dict does not handle 2d_array
+        # data = np.column_stack(list(to_include.values()))
+        # names = ....
+        res = pd.DataFrame(to_include, index=self.row_labels,
+                           columns=to_include.keys())
+        return res


 class PredictionResultsMonotonic(PredictionResultsBase):

     def __init__(self, predicted, var_pred, linpred=None, linpred_se=None,
-        func=None, deriv=None, df=None, dist=None, row_labels=None):
+                 func=None, deriv=None, df=None, dist=None, row_labels=None):
+        # TODO: is var_resid used? drop from arguments?
         self.predicted = predicted
         self.var_pred = var_pred
         self.linpred = linpred
@@ -109,12 +158,13 @@ class PredictionResultsMonotonic(PredictionResultsBase):
         self.deriv = deriv
         self.df = df
         self.row_labels = row_labels
+
         if dist is None or dist == 'norm':
             self.dist = stats.norm
             self.dist_args = ()
         elif dist == 't':
             self.dist = stats.t
-            self.dist_args = self.df,
+            self.dist_args = (self.df,)
         else:
             self.dist = dist
             self.dist_args = ()
@@ -122,7 +172,16 @@ class PredictionResultsMonotonic(PredictionResultsBase):
     def _conf_int_generic(self, center, se, alpha, dist_args=None):
         """internal function to avoid code duplication
         """
-        pass
+        if dist_args is None:
+            dist_args = ()
+
+        q = self.dist.ppf(1 - alpha / 2., *dist_args)
+        lower = center - q * se
+        upper = center + q * se
+        ci = np.column_stack((lower, upper))
+        # if we want to stack at a new last axis, for lower.ndim > 1
+        # np.concatenate((lower[..., None], upper[..., None]), axis=-1)
+        return ci

     def conf_int(self, method='endpoint', alpha=0.05, **kwds):
         """Confidence interval for the predicted value.
@@ -151,7 +210,19 @@ class PredictionResultsMonotonic(PredictionResultsBase):
             The array has the lower and the upper limit of the confidence
             interval in the columns.
         """
-        pass
+        tmp = np.linspace(0, 1, 6)
+        # TODO: drop check?
+        is_linear = (self.func(tmp) == tmp).all()
+        if method == 'endpoint' and not is_linear:
+            ci_linear = self._conf_int_generic(self.linpred, self.linpred_se,
+                                               alpha,
+                                               dist_args=self.dist_args)
+            ci = self.func(ci_linear)
+        elif method == 'delta' or is_linear:
+            ci = self._conf_int_generic(self.predicted, self.se, alpha,
+                                        dist_args=self.dist_args)
+
+        return ci


 class PredictionResultsDelta(PredictionResultsBase):
@@ -159,8 +230,10 @@ class PredictionResultsDelta(PredictionResultsBase):
     """

     def __init__(self, results_delta, **kwds):
+
         predicted = results_delta.predicted()
         var_pred = results_delta.var()
+
         super().__init__(predicted, var_pred, **kwds)


@@ -172,8 +245,9 @@ class PredictionResultsMean(PredictionResultsBase):
     `_mean` post fix in the attribute names.
     """

-    def __init__(self, predicted_mean, var_pred_mean, var_resid=None, df=
-        None, dist=None, row_labels=None, linpred=None, link=None):
+    def __init__(self, predicted_mean, var_pred_mean, var_resid=None,
+                 df=None, dist=None, row_labels=None, linpred=None, link=None):
+        # TODO: is var_resid used? drop from arguments?
         self.predicted = predicted_mean
         self.var_pred = var_pred_mean
         self.df = df
@@ -181,16 +255,32 @@ class PredictionResultsMean(PredictionResultsBase):
         self.row_labels = row_labels
         self.linpred = linpred
         self.link = link
+
         if dist is None or dist == 'norm':
             self.dist = stats.norm
             self.dist_args = ()
         elif dist == 't':
             self.dist = stats.t
-            self.dist_args = self.df,
+            self.dist_args = (self.df,)
         else:
             self.dist = dist
             self.dist_args = ()

+    @property
+    def predicted_mean(self):
+        # alias for backwards compatibility
+        return self.predicted
+
+    @property
+    def var_pred_mean(self):
+        # alias for backwards compatibility
+        return self.var_pred
+
+    @property
+    def se_mean(self):
+        # alias for backwards compatibility
+        return self.se
+
     def conf_int(self, method='endpoint', alpha=0.05, **kwds):
         """Confidence interval for the predicted value.

@@ -218,7 +308,21 @@ class PredictionResultsMean(PredictionResultsBase):
             The array has the lower and the upper limit of the confidence
             interval in the columns.
         """
-        pass
+        tmp = np.linspace(0, 1, 6)
+        is_linear = (self.link.inverse(tmp) == tmp).all()
+        if method == 'endpoint' and not is_linear:
+            ci_linear = self.linpred.conf_int(alpha=alpha, obs=False)
+            ci = self.link.inverse(ci_linear)
+        elif method == 'delta' or is_linear:
+            se = self.se_mean
+            q = self.dist.ppf(1 - alpha / 2., *self.dist_args)
+            lower = self.predicted_mean - q * se
+            upper = self.predicted_mean + q * se
+            ci = np.column_stack((lower, upper))
+            # if we want to stack at a new last axis, for lower.ndim > 1
+            # np.concatenate((lower[..., None], upper[..., None]), axis=-1)
+
+        return ci

     def summary_frame(self, alpha=0.05):
         """Summary frame
@@ -234,7 +338,21 @@ class PredictionResultsMean(PredictionResultsBase):
         pandas DataFrame with columns
         'mean', 'mean_se', 'mean_ci_lower', 'mean_ci_upper'.
         """
-        pass
+        # TODO: finish and cleanup
+        ci_mean = self.conf_int(alpha=alpha)
+        to_include = {}
+        to_include['mean'] = self.predicted_mean
+        to_include['mean_se'] = self.se_mean
+        to_include['mean_ci_lower'] = ci_mean[:, 0]
+        to_include['mean_ci_upper'] = ci_mean[:, 1]
+
+        self.table = to_include
+        # pandas dict does not handle 2d_array
+        # data = np.column_stack(list(to_include.values()))
+        # names = ....
+        res = pd.DataFrame(to_include, index=self.row_labels,
+                           columns=to_include.keys())
+        return res


 def _get_exog_predict(self, exog=None, transform=True, row_labels=None):
@@ -262,11 +380,36 @@ def _get_exog_predict(self, exog=None, transform=True, row_labels=None):
     row_labels : list of str
         Labels or pandas index for rows of prediction
     """
-    pass
-

-def get_prediction_glm(self, exog=None, transform=True, row_labels=None,
-    linpred=None, link=None, pred_kwds=None):
+    # prepare exog and row_labels, based on base Results.predict
+    if transform and hasattr(self.model, 'formula') and exog is not None:
+        from patsy import dmatrix
+        if isinstance(exog, pd.Series):
+            exog = pd.DataFrame(exog)
+        exog = dmatrix(self.model.data.design_info, exog)
+
+    if exog is not None:
+        if row_labels is None:
+            row_labels = getattr(exog, 'index', None)
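+            # a plain list exposes ``index`` as a method, not as row labels;
+            # discard it in that case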
+            if callable(row_labels):
+                row_labels = None
+
+        exog = np.asarray(exog)
+        if exog.ndim == 1 and (self.model.exog.ndim == 1 or
+                               self.model.exog.shape[1] == 1):
+            exog = exog[:, None]
+        exog = np.atleast_2d(exog)  # needed in count model shape[1]
+    else:
+        exog = self.model.exog
+
+        if row_labels is None:
+            row_labels = getattr(self.model.data, 'row_labels', None)
+    return exog, row_labels
+
+
+def get_prediction_glm(self, exog=None, transform=True,
+                       row_labels=None, linpred=None, link=None,
+                       pred_kwds=None):
     """
     Compute prediction results for GLM compatible models.

@@ -301,11 +444,40 @@ def get_prediction_glm(self, exog=None, transform=True, row_labels=None,
         variance and can on demand calculate confidence intervals and summary
         tables for the prediction of the mean and of new observations.
     """
-    pass

+    # prepare exog and row_labels, based on base Results.predict
+    exog, row_labels = _get_exog_predict(
+        self,
+        exog=exog,
+        transform=transform,
+        row_labels=row_labels,
+        )
+
+    if pred_kwds is None:
+        pred_kwds = {}
+
+    predicted_mean = self.model.predict(self.params, exog, **pred_kwds)

-def get_prediction_linear(self, exog=None, transform=True, row_labels=None,
-    pred_kwds=None, index=None):
+    covb = self.cov_params()
+
+    link_deriv = self.model.family.link.inverse_deriv(linpred.predicted_mean)
+    var_pred_mean = link_deriv**2 * (exog * np.dot(covb, exog.T).T).sum(1)
+    var_resid = self.scale  # self.mse_resid / weights
+
+    # TODO: check that we have correct scale, Refactor scale #???
+    # special case for now:
+    if self.cov_type == 'fixed scale':
+        var_resid = self.cov_kwds['scale']
+
+    dist = ['norm', 't'][self.use_t]
+    return PredictionResultsMean(
+        predicted_mean, var_pred_mean, var_resid,
+        df=self.df_resid, dist=dist,
+        row_labels=row_labels, linpred=linpred, link=link)
+
+
+def get_prediction_linear(self, exog=None, transform=True,
+                          row_labels=None, pred_kwds=None, index=None):
     """
     Compute prediction results for linear prediction.

@@ -338,11 +510,42 @@ def get_prediction_linear(self, exog=None, transform=True, row_labels=None,
         variance and can on demand calculate confidence intervals and summary
         tables for the prediction.
     """
-    pass
-

-def get_prediction_monotonic(self, exog=None, transform=True, row_labels=
-    None, link=None, pred_kwds=None, index=None):
+    # prepare exog and row_labels, based on base Results.predict
+    exog, row_labels = _get_exog_predict(
+        self,
+        exog=exog,
+        transform=transform,
+        row_labels=row_labels,
+        )
+
+    if pred_kwds is None:
+        pred_kwds = {}
+
+    k1 = exog.shape[1]
+    if len(self.params) > k1:
+        # TODO: we allow endpoint transformation only for the first link
+        index = np.arange(k1)
+    else:
+        index = None
+    # get linear prediction and standard errors
+    covb = self.cov_params(column=index)
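+    # row-wise quadratic form: Var[x_i'b] = x_i' Cov(b) x_i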
+    var_pred = (exog * np.dot(covb, exog.T).T).sum(1)
+    pred_kwds_linear = pred_kwds.copy()
+    pred_kwds_linear["which"] = "linear"
+    predicted = self.model.predict(self.params, exog, **pred_kwds_linear)
+
+    dist = ['norm', 't'][self.use_t]
+    res = PredictionResultsBase(predicted, var_pred,
+                                df=self.df_resid, dist=dist,
+                                row_labels=row_labels
+                                )
+    return res
+
+
+def get_prediction_monotonic(self, exog=None, transform=True,
+                             row_labels=None, link=None,
+                             pred_kwds=None, index=None):
     """
     Compute prediction results when endpoint transformation is valid.

@@ -378,11 +581,53 @@ def get_prediction_monotonic(self, exog=None, transform=True, row_labels=
         variance and can on demand calculate confidence intervals and summary
         tables for the prediction.
     """
-    pass

-
-def get_prediction_delta(self, exog=None, which='mean', average=False,
-    agg_weights=None, transform=True, row_labels=None, pred_kwds=None):
+    # prepare exog and row_labels, based on base Results.predict
+    exog, row_labels = _get_exog_predict(
+        self,
+        exog=exog,
+        transform=transform,
+        row_labels=row_labels,
+        )
+
+    if pred_kwds is None:
+        pred_kwds = {}
+
+    if link is None:
+        link = self.model.family.link
+
+    func_deriv = link.inverse_deriv
+
+    # get linear prediction and standard errors
+    covb = self.cov_params(column=index)
+    linpred_var = (exog * np.dot(covb, exog.T).T).sum(1)
+    pred_kwds_linear = pred_kwds.copy()
+    pred_kwds_linear["which"] = "linear"
+    linpred = self.model.predict(self.params, exog, **pred_kwds_linear)
+
+    predicted = self.model.predict(self.params, exog, **pred_kwds)
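+    # delta method: Var[h(eta)] is approximately h'(eta)**2 * Var[eta]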
+    link_deriv = func_deriv(linpred)
+    var_pred = link_deriv**2 * linpred_var
+
+    dist = ['norm', 't'][self.use_t]
+    res = PredictionResultsMonotonic(predicted, var_pred,
+                                     df=self.df_resid, dist=dist,
+                                     row_labels=row_labels, linpred=linpred,
+                                     linpred_se=np.sqrt(linpred_var),
+                                     func=link.inverse, deriv=func_deriv)
+    return res
+
+
+def get_prediction_delta(
+        self,
+        exog=None,
+        which="mean",
+        average=False,
+        agg_weights=None,
+        transform=True,
+        row_labels=None,
+        pred_kwds=None
+        ):
     """
     compute prediction results

@@ -424,11 +669,35 @@ def get_prediction_delta(self, exog=None, which='mean', average=False,
         variance and can on demand calculate confidence intervals and summary
         tables for the prediction of the mean and of new observations.
     """
-    pass

+    # prepare exog and row_labels, based on base Results.predict
+    exog, row_labels = _get_exog_predict(
+        self,
+        exog=exog,
+        transform=transform,
+        row_labels=row_labels,
+        )
+    if agg_weights is None:
+        agg_weights = np.array(1.)
+
+    def f_pred(p):
+        """Prediction function as function of params
+        """
+        pred = self.model.predict(p, exog, which=which, **pred_kwds)
+        if average:
+        # using `.T` which should work if agg_weights is 1-dim
+            pred = (pred.T * agg_weights.T).mean(-1).T
+        return pred
+
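+    # numerical delta method: Wald-type inference for f_pred(params)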
+    nlpm = self._get_wald_nonlinear(f_pred)
+    # TODO: currently returns NonlinearDeltaCov
+    res = PredictionResultsDelta(nlpm)
+    return res

-def get_prediction(self, exog=None, transform=True, which='mean',
-    row_labels=None, average=False, agg_weights=None, pred_kwds=None):
+
+def get_prediction(self, exog=None, transform=True, which="mean",
+                   row_labels=None, average=False, agg_weights=None,
+                   pred_kwds=None):
     """
     Compute prediction results when endpoint transformation is valid.

@@ -481,11 +750,65 @@ def get_prediction(self, exog=None, transform=True, which='mean',
     -----
     Status: new in 0.14, experimental
     """
-    pass
-
-
-def params_transform_univariate(params, cov_params, link=None, transform=
-    None, row_labels=None):
+    use_endpoint = getattr(self.model, "_use_endpoint", True)
+    if pred_kwds is None:
+        pred_kwds = {}
+
+    if which == "linear":
+        res = get_prediction_linear(
+            self,
+            exog=exog,
+            transform=transform,
+            row_labels=row_labels,
+            pred_kwds=pred_kwds,
+            )
+
+    elif (which == "mean")and (use_endpoint is True) and (average is False):
+        # endpoint transformation
+        k1 = self.model.exog.shape[1]
+        if len(self.params) > k1:
+            # TODO: we allow endpoint transformation only for the first link
+            index = np.arange(k1)
+        else:
+            index = None
+
+        pred_kwds["which"] = which
+        # TODO: add link or ilink to all link based models (except zi)
+        link = getattr(self.model, "link", None)
+        if link is None:
+            # GLM
+            if hasattr(self.model, "family"):
+                link = getattr(self.model.family, "link", None)
+        if link is None:
+            # defaulting to log link for count models
+            import warnings
+            warnings.warn("using default log-link in get_prediction")
+            from statsmodels.genmod.families import links
+            link = links.Log()
+        res = get_prediction_monotonic(
+            self,
+            exog=exog,
+            transform=transform,
+            row_labels=row_labels,
+            link=link,
+            pred_kwds=pred_kwds,
+            index=index,
+            )
+
+    else:
+        # which is not mean or linear, or we need averaging
+        res = get_prediction_delta(
+            self,
+            exog=exog,
+            which=which,
+            average=average,
+            agg_weights=agg_weights,
+            pred_kwds=pred_kwds,
+            )
+
+    return res
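+# Illustrative usage sketch (hypothetical names; assumes ``res`` is a fitted
+# results instance of a model that supports this helper):
+#
+#     pred = res.get_prediction(exog_new)                # mean, endpoint CI
+#     pred_lin = res.get_prediction(exog_new, which="linear")
+#     pred.summary_frame(alpha=0.05)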
+
+
+def params_transform_univariate(params, cov_params, link=None, transform=None,
+                                row_labels=None):
     """
     results for univariate, nonlinear, monotonically transformed parameters

@@ -494,4 +817,30 @@ def params_transform_univariate(params, cov_params, link=None, transform=
     `exp(params)` in the case of Poisson or other models with exponential
     mean function.
     """
-    pass
+
+    from statsmodels.genmod.families import links
+    if link is None and transform is None:
+        link = links.Log()
+
+    if row_labels is None and hasattr(params, 'index'):
+        row_labels = params.index
+
+    params = np.asarray(params)
+
+    predicted_mean = link.inverse(params)
+    link_deriv = link.inverse_deriv(params)
+    var_pred_mean = link_deriv**2 * np.diag(cov_params)
+    # TODO: do we want covariance also, or just var/se
+
+    dist = stats.norm
+
+    # TODO: need ci for linear prediction, method of `lin_pred`
+    linpred = PredictionResultsMean(
+        params, np.diag(cov_params), dist=dist,
+        row_labels=row_labels, link=links.Identity())
+
+    res = PredictionResultsMean(
+        predicted_mean, var_pred_mean, dist=dist,
+        row_labels=row_labels, linpred=linpred, link=link)
+
+    return res
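+# Illustrative usage (hypothetical ``res`` is a fitted Poisson results
+# instance): rate ratios, i.e. exp(params), with delta-method standard errors,
+#
+#     rr = params_transform_univariate(res.params, res.cov_params())
+#     rr.summary_frame()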
diff --git a/statsmodels/base/_screening.py b/statsmodels/base/_screening.py
index 184b74744..562784215 100644
--- a/statsmodels/base/_screening.py
+++ b/statsmodels/base/_screening.py
@@ -1,11 +1,14 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sat May 19 15:53:21 2018

 Author: Josef Perktold
 License: BSD-3
 """
+
 from collections import defaultdict
 import numpy as np
+
 from statsmodels.base._penalties import SCADSmoothed


@@ -58,7 +61,6 @@ class ScreeningResults:
         'idx_nonzero' is based on the array that includes exog_keep, while
         'idx_exog' is the index based on the exog of the batch.
     """
-
     def __init__(self, screener, **kwds):
         self.screener = screener
         self.__dict__.update(**kwds)
@@ -140,22 +142,29 @@ class VariableScreening:
     """

     def __init__(self, model, pen_weight=None, use_weights=True, k_add=30,
-        k_max_add=30, threshold_trim=0.0001, k_max_included=20,
-        ranking_attr='resid_pearson', ranking_project=True):
+                 k_max_add=30, threshold_trim=1e-4, k_max_included=20,
+                 ranking_attr='resid_pearson', ranking_project=True):
+
         self.model = model
         self.model_class = model.__class__
         self.init_kwds = model._get_init_kwds()
+        # pen_weight and penal are explicitly included
+        # TODO: check what we want to do here
         self.init_kwds.pop('pen_weight', None)
         self.init_kwds.pop('penal', None)
+
         self.endog = model.endog
         self.exog_keep = model.exog
         self.k_keep = model.exog.shape[1]
         self.nobs = len(self.endog)
         self.penal = self._get_penal()
+
         if pen_weight is not None:
             self.pen_weight = pen_weight
         else:
             self.pen_weight = self.nobs * 10
+
+        # option for screening algorithm
         self.use_weights = use_weights
         self.k_add = k_add
         self.k_max_add = k_max_add
@@ -167,15 +176,45 @@ class VariableScreening:
     def _get_penal(self, weights=None):
         """create new Penalty instance
         """
-        pass
+        return SCADSmoothed(0.1, c0=0.0001, weights=weights)

     def ranking_measure(self, res_pen, exog, keep=None):
         """compute measure for ranking exog candidates for inclusion
         """
-        pass
+        endog = self.endog
+
+        if self.ranking_project:
+            assert res_pen.model.exog.shape[1] == len(keep)
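+            # residualize the candidate columns on the already included exog,
+            # so the ranking only uses information not yet explained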
+            ex_incl = res_pen.model.exog[:, keep]
+            exog = exog - ex_incl.dot(np.linalg.pinv(ex_incl).dot(exog))
+
+        if self.ranking_attr == 'predicted_poisson':
+            # I keep this for more experiments
+
+            # TODO: does it really help to change/trim params
+            # we are not reestimating with trimmed model
+            p = res_pen.params.copy()
+            if keep is not None:
+                p[~keep] = 0
+            predicted = res_pen.model.predict(p)
+            # this is currently hardcoded for Poisson
+            resid_factor = (endog - predicted) / np.sqrt(predicted)
+        elif self.ranking_attr[:6] == 'model.':
+            # use model method, this is intended for score_factor
+            attr = self.ranking_attr.split('.')[1]
+            resid_factor = getattr(res_pen.model, attr)(res_pen.params)
+            if resid_factor.ndim == 2:
+                # for score_factor when extra params are in model
+                resid_factor = resid_factor[:, 0]
+            mom_cond = np.abs(resid_factor.dot(exog))**2
+        else:
+            # use results attribute
+            resid_factor = getattr(res_pen, self.ranking_attr)
+            mom_cond = np.abs(resid_factor.dot(exog))**2
+        return mom_cond

     def screen_exog(self, exog, endog=None, maxiter=100, method='bfgs',
-        disp=False, fit_kwds=None):
+                    disp=False, fit_kwds=None):
         """screen and select variables (columns) in exog

         Parameters
@@ -203,7 +242,132 @@ class VariableScreening:
             exog, combined exog that are always kept plus exog_candidates.
             see ScreeningResults for a full description
         """
-        pass
+        model_class = self.model_class
+        if endog is None:
+            # allow a different endog than used in model
+            endog = self.endog
+        x0 = self.exog_keep
+        k_keep = self.k_keep
+        x1 = exog
+        k_current = x0.shape[1]
+        # TODO: remove the need for x, use x1 separately from x0
+        # needs change to idx to be based on x1 (candidate variables)
+        x = np.column_stack((x0, x1))
+        nobs, k_vars = x.shape
+        fkwds = fit_kwds if fit_kwds is not None else {}
+        fit_kwds = {'maxiter': 200, 'disp': False}
+        fit_kwds.update(fkwds)
+
+        history = defaultdict(list)
+        idx_nonzero = np.arange(k_keep, dtype=int)
+        keep = np.ones(k_keep, np.bool_)
+        idx_excl = np.arange(k_keep, k_vars)
+        mod_pen = model_class(endog, x0, **self.init_kwds)
+        # do not penalize initial estimate
+        mod_pen.pen_weight = 0
+        res_pen = mod_pen.fit(**fit_kwds)
+        start_params = res_pen.params
+        converged = False
+        idx_old = []
+        for it in range(maxiter):
+            # candidates for inclusion in next iteration
+            x1 = x[:, idx_excl]
+            mom_cond = self.ranking_measure(res_pen, x1, keep=keep)
+            assert len(mom_cond) == len(idx_excl)
+            mcs = np.sort(mom_cond)[::-1]
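+            # cap how many candidates can enter in this iteration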
+            idx_thr = min((self.k_max_add, k_current + self.k_add, len(mcs)))
+            threshold = mcs[idx_thr]
+            # indices of exog in current expansion model
+            idx = np.concatenate((idx_nonzero, idx_excl[mom_cond > threshold]))
+            start_params2 = np.zeros(len(idx))
+            start_params2[:len(start_params)] = start_params
+
+            if self.use_weights:
+                weights = np.ones(len(idx))
+                weights[:k_keep] = 0
+                # modify Penalty instance attached to self
+                # dangerous if res_pen is reused
+                self.penal.weights = weights
+            mod_pen = model_class(endog, x[:, idx], penal=self.penal,
+                                  pen_weight=self.pen_weight,
+                                  **self.init_kwds)
+
+            res_pen = mod_pen.fit(method=method,
+                                  start_params=start_params2,
+                                  warn_convergence=False, skip_hessian=True,
+                                  **fit_kwds)
+
+            keep = np.abs(res_pen.params) > self.threshold_trim
+            # use largest params to keep
+            if keep.sum() > self.k_max_included:
+                # TODO we can use now np.partition with partial sort
+                thresh_params = np.sort(np.abs(res_pen.params))[
+                                                        -self.k_max_included]
+                keep2 = np.abs(res_pen.params) > thresh_params
+                keep = np.logical_and(keep, keep2)
+
+            # Note: idx and keep are for current expansion model
+            # idx_nonzero has indices of selected variables in full exog
+            keep[:k_keep] = True  # always keep exog_keep
+            idx_nonzero = idx[keep]
+
+            if disp:
+                print(keep)
+                print(idx_nonzero)
+            # x0 is exog of currently selected model, not used in iteration
+            # x0 = x[:, idx_nonzero]
+            k_current = len(idx_nonzero)
+            start_params = res_pen.params[keep]
+
+            # use mask to get excluded indices
+            mask_excl = np.ones(k_vars, dtype=bool)
+            mask_excl[idx_nonzero] = False
+            idx_excl = np.nonzero(mask_excl)[0]
+            history['idx_nonzero'].append(idx_nonzero)
+            history['keep'].append(keep)
+            history['params_keep'].append(start_params)
+            history['idx_added'].append(idx)
+
+            if (len(idx_nonzero) == len(idx_old) and
+                    (idx_nonzero == idx_old).all()):
+                converged = True
+                break
+            idx_old = idx_nonzero
+
+        # final estimate
+        # check that we still have exog_keep
+        assert np.all(idx_nonzero[:k_keep] == np.arange(k_keep))
+        if self.use_weights:
+            weights = np.ones(len(idx_nonzero))
+            weights[:k_keep] = 0
+            # create new Penalty instance to avoid sharing attached penal
+            penal = self._get_penal(weights=weights)
+        else:
+            penal = self.penal
+        mod_final = model_class(endog, x[:, idx_nonzero],
+                                penal=penal,
+                                pen_weight=self.pen_weight,
+                                **self.init_kwds)
+
+        res_final = mod_final.fit(method=method,
+                                  start_params=start_params,
+                                  warn_convergence=False,
+                                  **fit_kwds)
+        # set exog_names for final model
+        xnames = ['var%4d' % ii for ii in idx_nonzero]
+        res_final.model.exog_names[k_keep:] = xnames[k_keep:]
+
+        res = ScreeningResults(
+            self,
+            results_pen=res_pen,
+            results_final=res_final,
+            idx_nonzero=idx_nonzero,
+            idx_exog=idx_nonzero[k_keep:] - k_keep,
+            idx_excl=idx_excl,
+            history=history,
+            converged=converged,
+            iterations=it + 1,  # it is 0-based
+        )
+        return res

     def screen_exog_iterator(self, exog_iterator):
         """
@@ -235,4 +399,34 @@ class VariableScreening:
             in the exog_iterator.
             see ScreeningResults for a full description
         """
-        pass
+        k_keep = self.k_keep
+        # res_batches = []
+        res_idx = []
+        exog_winner = []
+        exog_idx = []
+        for ex in exog_iterator:
+            res_screen = self.screen_exog(ex, maxiter=20)
+            # avoid storing res_screen, only for debugging
+            # res_batches.append(res_screen)
+            res_idx.append(res_screen.idx_nonzero)
+            exog_winner.append(ex[:, res_screen.idx_nonzero[k_keep:] - k_keep])
+            exog_idx.append(res_screen.idx_nonzero[k_keep:] - k_keep)
+
+        exog_winner = np.column_stack(exog_winner)
+        res_screen_final = self.screen_exog(exog_winner, maxiter=20)
+
+        exog_winner_names = ['var%d_%d' % (bidx, idx)
+                             for bidx, batch in enumerate(exog_idx)
+                             for idx in batch]
+
+        idx_full = [(bidx, idx)
+                    for bidx, batch in enumerate(exog_idx)
+                    for idx in batch]
+        ex_final_idx = res_screen_final.idx_nonzero[k_keep:] - k_keep
+        final_names = np.array(exog_winner_names)[ex_final_idx]
+        res_screen_final.idx_nonzero_batches = np.array(idx_full)[ex_final_idx]
+        res_screen_final.exog_final_names = final_names
+        history = {'idx_nonzero': res_idx,
+                   'idx_exog': exog_idx}
+        res_screen_final.history_batches = history
+        return res_screen_final
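+# Illustrative usage sketch (assumption: the user defines a penalized model
+# class, e.g. ``class PoissonPenalized(PenalizedMixin, Poisson)``):
+#
+#     screener = VariableScreening(PoissonPenalized(y, x_keep))
+#     res = screener.screen_exog(x_candidates)
+#     res.idx_nonzero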
diff --git a/statsmodels/base/covtype.py b/statsmodels/base/covtype.py
index 52619df72..cb26fd0c0 100644
--- a/statsmodels/base/covtype.py
+++ b/statsmodels/base/covtype.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Mon Aug 04 08:00:16 2014

@@ -5,33 +6,43 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 from statsmodels.compat.python import lzip
+
 import numpy as np
-descriptions = {'HC0':
-    'Standard Errors are heteroscedasticity robust (HC0)', 'HC1':
-    'Standard Errors are heteroscedasticity robust (HC1)', 'HC2':
-    'Standard Errors are heteroscedasticity robust (HC2)', 'HC3':
-    'Standard Errors are heteroscedasticity robust (HC3)', 'HAC':
-    'Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using {maxlags} lags and {correction} small sample correction'
-    , 'fixed_scale': 'Standard Errors are based on fixed scale', 'cluster':
-    'Standard Errors are robust to cluster correlation (cluster)',
-    'HAC-Panel':
-    'Standard Errors are robust to cluster correlation (HAC-Panel)',
-    'HAC-Groupsum':
-    'Driscoll and Kraay Standard Errors are robust to cluster correlation (HAC-Groupsum)'
-    , 'none': 'Covariance matrix not calculated.', 'approx':
-    'Covariance matrix calculated using numerical ({approx_type}) differentiation.'
-    , 'OPG':
-    'Covariance matrix calculated using the outer product of gradients ({approx_type}).'
-    , 'OIM':
-    'Covariance matrix calculated using the observed information matrix ({approx_type}) described in Harvey (1989).'
-    , 'robust':
-    'Quasi-maximum likelihood covariance matrix used for robustness to some misspecifications; calculated using numerical ({approx_type}) differentiation.'
-    , 'robust-OIM':
-    'Quasi-maximum likelihood covariance matrix used for robustness to some misspecifications; calculated using the observed information matrix ({approx_type}) described in Harvey (1989).'
-    , 'robust-approx':
-    'Quasi-maximum likelihood covariance matrix used for robustness to some misspecifications; calculated using numerical ({approx_type}) differentiation.'
-    }
+
+descriptions = {
+    'HC0': 'Standard Errors are heteroscedasticity robust (HC0)',
+    'HC1': 'Standard Errors are heteroscedasticity robust (HC1)',
+    'HC2': 'Standard Errors are heteroscedasticity robust (HC2)',
+    'HC3': 'Standard Errors are heteroscedasticity robust (HC3)',
+    'HAC': 'Standard Errors are heteroscedasticity and autocorrelation '
+           'robust (HAC) using {maxlags} lags and '
+           '{correction} small sample correction',
+    'fixed_scale': 'Standard Errors are based on fixed scale',
+    'cluster': 'Standard Errors are robust to cluster correlation (cluster)',
+    'HAC-Panel': 'Standard Errors are robust to '
+                 'cluster correlation (HAC-Panel)',
+    'HAC-Groupsum': 'Driscoll and Kraay Standard Errors are robust to '
+                    'cluster correlation (HAC-Groupsum)',
+    'none': 'Covariance matrix not calculated.',
+    'approx': 'Covariance matrix calculated using numerical ({approx_type}) '
+              'differentiation.',
+    'OPG': 'Covariance matrix calculated using the outer product of '
+           'gradients ({approx_type}).',
+    'OIM': 'Covariance matrix calculated using the observed information '
+           'matrix ({approx_type}) described in Harvey (1989).',
+    'robust': 'Quasi-maximum likelihood covariance matrix used for '
+              'robustness to some misspecifications; calculated using '
+              'numerical ({approx_type}) differentiation.',
+    'robust-OIM': 'Quasi-maximum likelihood covariance matrix used for '
+                  'robustness to some misspecifications; calculated using the '
+                  'observed information matrix ({approx_type}) described in '
+                  'Harvey (1989).',
+    'robust-approx': 'Quasi-maximum likelihood covariance matrix used for '
+                     'robustness to some misspecifications; calculated using '
+                     'numerical ({approx_type}) differentiation.',
+}


 def normalize_cov_type(cov_type):
@@ -46,7 +57,11 @@ def normalize_cov_type(cov_type):
     -------
     normalized_cov_type : str
     """
-    pass
+    if cov_type == 'nw-panel':
+        cov_type = 'hac-panel'
+    if cov_type == 'nw-groupsum':
+        cov_type = 'hac-groupsum'
+    return cov_type


 def get_robustcov_results(self, cov_type='HC1', use_t=None, **kwds):
@@ -178,4 +193,177 @@ def get_robustcov_results(self, cov_type='HC1', use_t=None, **kwds):
     .. todo:: Currently there is no check for extra or misspelled keywords,
          except in the case of cov_type `HCx`
     """
-    pass
+
+    import statsmodels.stats.sandwich_covariance as sw
+
+    cov_type = normalize_cov_type(cov_type)
+
+    if 'kernel' in kwds:
+        kwds['weights_func'] = kwds.pop('kernel')
+    if 'weights_func' in kwds and not callable(kwds['weights_func']):
+        kwds['weights_func'] = sw.kernel_dict[kwds['weights_func']]
+
+    # pop because HCx raises if any kwds
+    sc_factor = kwds.pop('scaling_factor', None)
+
+    # TODO: make separate function that returns a robust cov plus info
+    use_self = kwds.pop('use_self', False)
+    if use_self:
+        res = self
+    else:
+        # this does not work for most models; use the raw instance
+        # returned by fit instead
+        res = self.__class__(self.model, self.params,
+                   normalized_cov_params=self.normalized_cov_params,
+                   scale=self.scale)
+
+    res.cov_type = cov_type
+    # use_t might already be defined by the class, and already set
+    if use_t is None:
+        use_t = self.use_t
+    res.cov_kwds = {'use_t': use_t}  # store for information
+    res.use_t = use_t
+
+    adjust_df = False
+    if cov_type in ['cluster', 'hac-panel', 'hac-groupsum']:
+        df_correction = kwds.get('df_correction', None)
+        # TODO: check also use_correction, do I need all combinations?
+        if df_correction is not False:  # i.e. in [None, True]
+            # user did not explicitly set it to False
+            adjust_df = True
+
+    res.cov_kwds['adjust_df'] = adjust_df
+
+    # verify and set kwds, and calculate cov
+    # TODO: this should be outsourced in a function so we can reuse it in
+    #       other models
+    # TODO: make it DRYer   repeated code for checking kwds
+    if cov_type.upper() in ('HC0', 'HC1', 'HC2', 'HC3'):
+        if kwds:
+            raise ValueError('heteroscedasticity robust covariance '
+                             'does not use keywords')
+        res.cov_kwds['description'] = descriptions[cov_type.upper()]
+
+        res.cov_params_default = getattr(self, 'cov_' + cov_type.upper(), None)
+        if res.cov_params_default is None:
+            # results classes that do not have cov_HCx attribute
+            res.cov_params_default = sw.cov_white_simple(self,
+                                                         use_correction=False)
+    elif cov_type.lower() == 'hac':
+        maxlags = kwds['maxlags']   # required?, default in cov_hac_simple
+        res.cov_kwds['maxlags'] = maxlags
+        weights_func = kwds.get('weights_func', sw.weights_bartlett)
+        res.cov_kwds['weights_func'] = weights_func
+        use_correction = kwds.get('use_correction', False)
+        res.cov_kwds['use_correction'] = use_correction
+        res.cov_kwds['description'] = descriptions['HAC'].format(
+            maxlags=maxlags, correction=['without', 'with'][use_correction])
+
+        res.cov_params_default = sw.cov_hac_simple(self, nlags=maxlags,
+                                             weights_func=weights_func,
+                                             use_correction=use_correction)
+    elif cov_type.lower() == 'cluster':
+        # cluster robust standard errors, one- or two-way
+        groups = kwds['groups']
+        if not hasattr(groups, 'shape'):
+            groups = np.asarray(groups).T
+
+        if groups.ndim >= 2:
+            groups = groups.squeeze()
+
+        res.cov_kwds['groups'] = groups
+        use_correction = kwds.get('use_correction', True)
+        res.cov_kwds['use_correction'] = use_correction
+        if groups.ndim == 1:
+            if adjust_df:
+                # need to find number of groups
+                # duplicate work
+                self.n_groups = n_groups = len(np.unique(groups))
+            res.cov_params_default = sw.cov_cluster(self, groups,
+                                             use_correction=use_correction)
+
+        elif groups.ndim == 2:
+            if hasattr(groups, 'values'):
+                groups = groups.values
+
+            if adjust_df:
+                # need to find number of groups
+                # duplicate work
+                n_groups0 = len(np.unique(groups[:, 0]))
+                n_groups1 = len(np.unique(groups[:, 1]))
+                self.n_groups = (n_groups0, n_groups1)
+                n_groups = min(n_groups0, n_groups1)  # use for adjust_df
+
+            # Note: sw.cov_cluster_2groups has 3 returns
+            res.cov_params_default = sw.cov_cluster_2groups(self, groups,
+                                         use_correction=use_correction)[0]
+        else:
+            raise ValueError('only two groups are supported')
+        res.cov_kwds['description'] = descriptions['cluster']
+
+    elif cov_type.lower() == 'hac-panel':
+        # cluster robust standard errors
+        res.cov_kwds['time'] = time = kwds.get('time', None)
+        res.cov_kwds['groups'] = groups = kwds.get('groups', None)
+        #TODO: nlags is currently required
+        #nlags = kwds.get('nlags', True)
+        #res.cov_kwds['nlags'] = nlags
+        #TODO: `nlags` or `maxlags`
+        res.cov_kwds['maxlags'] = maxlags = kwds['maxlags']
+        use_correction = kwds.get('use_correction', 'hac')
+        res.cov_kwds['use_correction'] = use_correction
+        weights_func = kwds.get('weights_func', sw.weights_bartlett)
+        res.cov_kwds['weights_func'] = weights_func
+        # TODO: clumsy time index in cov_nw_panel
+        if groups is not None:
+            groups = np.asarray(groups)
+            tt = (np.nonzero(groups[:-1] != groups[1:])[0] + 1).tolist()
+            nobs_ = len(groups)
+        elif time is not None:
+            # TODO: clumsy time index in cov_nw_panel
+            time = np.asarray(time)
+            tt = (np.nonzero(time[1:] < time[:-1])[0] + 1).tolist()
+            nobs_ = len(time)
+        else:
+            raise ValueError('either time or groups needs to be given')
+        groupidx = lzip([0] + tt, tt + [nobs_])
+        self.n_groups = n_groups = len(groupidx)
+        res.cov_params_default = sw.cov_nw_panel(self, maxlags, groupidx,
+                                            weights_func=weights_func,
+                                            use_correction=use_correction)
+        res.cov_kwds['description'] = descriptions['HAC-Panel']
+
+    elif cov_type.lower() == 'hac-groupsum':
+        # Driscoll-Kraay standard errors
+        res.cov_kwds['time'] = time = kwds['time']
+        #TODO: nlags is currently required
+        #nlags = kwds.get('nlags', True)
+        #res.cov_kwds['nlags'] = nlags
+        #TODO: `nlags` or `maxlags`
+        res.cov_kwds['maxlags'] = maxlags = kwds['maxlags']
+        use_correction = kwds.get('use_correction', 'cluster')
+        res.cov_kwds['use_correction'] = use_correction
+        weights_func = kwds.get('weights_func', sw.weights_bartlett)
+        res.cov_kwds['weights_func'] = weights_func
+        if adjust_df:
+            # need to find number of groups
+            tt = (np.nonzero(time[1:] < time[:-1])[0] + 1)
+            self.n_groups = n_groups = len(tt) + 1
+        res.cov_params_default = sw.cov_nw_groupsum(self, maxlags, time,
+                                        weights_func=weights_func,
+                                        use_correction=use_correction)
+        res.cov_kwds['description'] = descriptions['HAC-Groupsum']
+    else:
+        raise ValueError('cov_type not recognized. See docstring for ' +
+                         'available options and spelling')
+
+    # generic optional factor to scale covariance
+
+    res.cov_kwds['scaling_factor'] = sc_factor
+    if sc_factor is not None:
+        res.cov_params_default *= sc_factor
+
+    if adjust_df:
+        # Note: df_resid is used for scale and others, add new attribute
+        res.df_resid_inference = n_groups - 1
+
+    return res
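+# Illustrative usage (hypothetical names): this is usually reached through
+# ``results.get_robustcov_results`` or ``model.fit(cov_type=...)``, e.g.
+#
+#     res_hc3 = ols_results.get_robustcov_results(cov_type="HC3")
+#     res_clu = ols_results.get_robustcov_results(cov_type="cluster",
+#                                                 groups=firm_ids)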
diff --git a/statsmodels/base/data.py b/statsmodels/base/data.py
index 23a4bcd15..bed63b830 100644
--- a/statsmodels/base/data.py
+++ b/statsmodels/base/data.py
@@ -3,21 +3,35 @@ Base tools for handling various kinds of data structures, attaching metadata to
 results, and doing data cleaning
 """
 from __future__ import annotations
+
 from statsmodels.compat.python import lmap
+
 from functools import reduce
+
 import numpy as np
 from pandas import DataFrame, Series, isnull, MultiIndex
+
 import statsmodels.tools.data as data_util
 from statsmodels.tools.decorators import cache_readonly, cache_writable
 from statsmodels.tools.sm_exceptions import MissingDataError


+def _asarray_2dcolumns(x):
+    if np.asarray(x).ndim > 1 and np.asarray(x).squeeze().ndim == 1:
+        return
+
+
 def _asarray_2d_null_rows(x):
     """
     Makes sure input is an array and is 2d. Makes sure output is 2d. True
     indicates a null in the rows of 2d x.
     """
-    pass
+    # need asarray because isnull does not account for array_like input
+    x = np.asarray(x)
+    if x.ndim == 1:
+        x = x[:, None]
+    return np.any(isnull(x), axis=1)[:, None]


 def _nan_rows(*arrs):
@@ -26,7 +40,15 @@ def _nan_rows(*arrs):
     of the _2d_ arrays in arrs are NaNs. Inputs can be any mixture of Series,
     DataFrames or array_like.
     """
-    pass
+    if len(arrs) == 1:
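+        # pad with an all-False dummy so the reduce() below actually applies
+        # the pairwise null-row check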
+        arrs += ([[False]],)
+
+    def _nan_row_maybe_two_inputs(x, y):
+        # check for dtype bc dataframe has dtypes
+        x_is_boolean_array = hasattr(x, 'dtype') and x.dtype == bool and x
+        return np.logical_or(_asarray_2d_null_rows(x),
+                             (x_is_boolean_array | _asarray_2d_null_rows(y)))
+    return reduce(_nan_row_maybe_two_inputs, arrs).squeeze()


 class ModelData:
@@ -37,8 +59,8 @@ class ModelData:
     _param_names = None
     _cov_names = None

-    def __init__(self, endog, exog=None, missing='none', hasconst=None, **
-        kwargs):
+    def __init__(self, endog, exog=None, missing='none', hasconst=None,
+                 **kwargs):
         if data_util._is_recarray(endog) or data_util._is_recarray(exog):
             from statsmodels.tools.sm_exceptions import recarray_exception
             raise NotImplementedError(recarray_exception)
@@ -47,19 +69,20 @@ class ModelData:
         if 'formula' in kwargs:
             self.formula = kwargs.pop('formula')
         if missing != 'none':
-            arrays, nan_idx = self.handle_missing(endog, exog, missing, **
-                kwargs)
+            arrays, nan_idx = self.handle_missing(endog, exog, missing,
+                                                  **kwargs)
             self.missing_row_idx = nan_idx
-            self.__dict__.update(arrays)
+            self.__dict__.update(arrays)  # attach all the data arrays
             self.orig_endog = self.endog
             self.orig_exog = self.exog
             self.endog, self.exog = self._convert_endog_exog(self.endog,
-                self.exog)
+                                                             self.exog)
         else:
-            self.__dict__.update(kwargs)
+            self.__dict__.update(kwargs)  # attach the extra arrays anyway
             self.orig_endog = endog
             self.orig_exog = exog
             self.endog, self.exog = self._convert_endog_exog(endog, exog)
+
         self.const_idx = None
         self.k_constant = 0
         self._handle_constant(hasconst)
@@ -69,40 +92,272 @@ class ModelData:
     def __getstate__(self):
         from copy import copy
         d = copy(self.__dict__)
-        if 'design_info' in d:
-            del d['design_info']
-            d['restore_design_info'] = True
+        if "design_info" in d:
+            del d["design_info"]
+            d["restore_design_info"] = True
         return d

     def __setstate__(self, d):
-        if 'restore_design_info' in d:
+        if "restore_design_info" in d:
+            # NOTE: there may be a more performant way to do this
             from patsy import dmatrices, PatsyError
             exc = []
             try:
                 data = d['frame']
             except KeyError:
                 data = d['orig_endog'].join(d['orig_exog'])
-            for depth in [2, 3, 1, 0, 4]:
+
+            # the order is a guess at the most likely eval_env depth
+            for depth in [2, 3, 1, 0, 4]:
                 try:
-                    _, design = dmatrices(d['formula'], data, eval_env=
-                        depth, return_type='dataframe')
+                    _, design = dmatrices(d['formula'], data, eval_env=depth,
+                                          return_type='dataframe')
                     break
                 except (NameError, PatsyError) as e:
-                    exc.append(e)
+                    exc.append(e)   # why do I need a reference from outside except block
                     pass
             else:
                 raise exc[-1]
+
             self.design_info = design.design_info
-            del d['restore_design_info']
+            del d["restore_design_info"]
         self.__dict__.update(d)

+    def _handle_constant(self, hasconst):
+        if hasconst is False or self.exog is None:
+            self.k_constant = 0
+            self.const_idx = None
+        else:
+            # detect where the constant is
+            check_implicit = False
+            exog_max = np.max(self.exog, axis=0)
+            if not np.isfinite(exog_max).all():
+                raise MissingDataError('exog contains inf or nans')
+            exog_min = np.min(self.exog, axis=0)
+            const_idx = np.where(exog_max == exog_min)[0].squeeze()
+            self.k_constant = const_idx.size
+
+            if self.k_constant == 1:
+                if self.exog[:, const_idx].mean() != 0:
+                    self.const_idx = int(const_idx)
+                else:
+                    # we only have a zero column and no other constant
+                    check_implicit = True
+            elif self.k_constant > 1:
+                # we have more than one constant column
+                # look for ones
+                values = []  # keep values if we need != 0
+                for idx in const_idx:
+                    value = self.exog[:, idx].mean()
+                    if value == 1:
+                        self.k_constant = 1
+                        self.const_idx = int(idx)
+                        break
+                    values.append(value)
+                else:
+                    # we did not break, no column of ones
+                    pos = (np.array(values) != 0)
+                    if pos.any():
+                        # take the first nonzero column
+                        self.k_constant = 1
+                        self.const_idx = int(const_idx[pos.argmax()])
+                    else:
+                        # only zero columns
+                        check_implicit = True
+            elif self.k_constant == 0:
+                check_implicit = True
+            else:
+                # should not be here
+                pass
+
+            if check_implicit and not hasconst:
+                # look for implicit constant
+                # Compute rank of augmented matrix
+                augmented_exog = np.column_stack(
+                            (np.ones(self.exog.shape[0]), self.exog))
+                rank_augm = np.linalg.matrix_rank(augmented_exog)
+                rank_orig = np.linalg.matrix_rank(self.exog)
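+                # equal ranks mean a constant is already spanned by exog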
+                self.k_constant = int(rank_orig == rank_augm)
+                self.const_idx = None
+            elif hasconst:
+                # Ensure k_constant is 1 any time hasconst is True
+                # even if one is not found
+                self.k_constant = 1
+
+    @classmethod
+    def _drop_nans(cls, x, nan_mask):
+        return x[nan_mask]
+
+    @classmethod
+    def _drop_nans_2d(cls, x, nan_mask):
+        return x[nan_mask][:, nan_mask]
+
     @classmethod
     def handle_missing(cls, endog, exog, missing, **kwargs):
         """
         This returns a dictionary with keys endog, exog and the keys of
         kwargs. It preserves Nones.
         """
-        pass
+        none_array_names = []
+
+        # patsy's already dropped NaNs in y/X
+        missing_idx = kwargs.pop('missing_idx', None)
+
+        if missing_idx is not None:
+            # y, X already handled by patsy. add back in later.
+            combined = ()
+            combined_names = []
+            if exog is None:
+                none_array_names += ['exog']
+        elif exog is not None:
+            combined = (endog, exog)
+            combined_names = ['endog', 'exog']
+        else:
+            combined = (endog,)
+            combined_names = ['endog']
+            none_array_names += ['exog']
+
+        # deal with other arrays
+        combined_2d = ()
+        combined_2d_names = []
+        if len(kwargs):
+            for key, value_array in kwargs.items():
+                if value_array is None or np.ndim(value_array) == 0:
+                    none_array_names += [key]
+                    continue
+                # grab 1d arrays
+                if value_array.ndim == 1:
+                    combined += (np.asarray(value_array),)
+                    combined_names += [key]
+                elif value_array.squeeze().ndim == 1:
+                    combined += (np.asarray(value_array),)
+                    combined_names += [key]
+
+                # grab 2d arrays that are _assumed_ to be symmetric
+                elif value_array.ndim == 2:
+                    combined_2d += (np.asarray(value_array),)
+                    combined_2d_names += [key]
+                else:
+                    raise ValueError("Arrays with more than 2 dimensions "
+                                     "are not yet handled")
+
+        if missing_idx is not None:
+            nan_mask = missing_idx
+            updated_row_mask = None
+            if combined:  # there were extra arrays not handled by patsy
+                combined_nans = _nan_rows(*combined)
+                if combined_nans.shape[0] != nan_mask.shape[0]:
+                    raise ValueError("Shape mismatch between endog/exog "
+                                     "and extra arrays given to model.")
+                # for going back and updated endog/exog
+                updated_row_mask = combined_nans[~nan_mask]
+                nan_mask |= combined_nans  # for updating extra arrays only
+            if combined_2d:
+                combined_2d_nans = _nan_rows(combined_2d)
+                if combined_2d_nans.shape[0] != nan_mask.shape[0]:
+                    raise ValueError("Shape mismatch between endog/exog "
+                                     "and extra 2d arrays given to model.")
+                if updated_row_mask is not None:
+                    updated_row_mask |= combined_2d_nans[~nan_mask]
+                else:
+                    updated_row_mask = combined_2d_nans[~nan_mask]
+                nan_mask |= combined_2d_nans
+
+        else:
+            nan_mask = _nan_rows(*combined)
+            if combined_2d:
+                nan_mask = _nan_rows(*(nan_mask[:, None],) + combined_2d)
+
+        if not np.any(nan_mask):  # no missing do not do anything
+            combined = dict(zip(combined_names, combined))
+            if combined_2d:
+                combined.update(dict(zip(combined_2d_names, combined_2d)))
+            if none_array_names:
+                combined.update({k: kwargs.get(k, None)
+                                 for k in none_array_names})
+
+            if missing_idx is not None:
+                combined.update({'endog': endog})
+                if exog is not None:
+                    combined.update({'exog': exog})
+
+            return combined, []
+
+        elif missing == 'raise':
+            raise MissingDataError("NaNs were encountered in the data")
+
+        elif missing == 'drop':
+            nan_mask = ~nan_mask
+            drop_nans = lambda x: cls._drop_nans(x, nan_mask)
+            drop_nans_2d = lambda x: cls._drop_nans_2d(x, nan_mask)
+            combined = dict(zip(combined_names, lmap(drop_nans, combined)))
+
+            if missing_idx is not None:
+                if updated_row_mask is not None:
+                    updated_row_mask = ~updated_row_mask
+                    # update endog/exog with this new information
+                    endog = cls._drop_nans(endog, updated_row_mask)
+                    if exog is not None:
+                        exog = cls._drop_nans(exog, updated_row_mask)
+
+                combined.update({'endog': endog})
+                if exog is not None:
+                    combined.update({'exog': exog})
+
+            if combined_2d:
+                combined.update(dict(zip(combined_2d_names,
+                                         lmap(drop_nans_2d, combined_2d))))
+            if none_array_names:
+                combined.update({k: kwargs.get(k, None)
+                                 for k in none_array_names})
+
+            return combined, np.where(~nan_mask)[0].tolist()
+        else:
+            raise ValueError("missing option %s not understood" % missing)
+
+    def _convert_endog_exog(self, endog, exog):
+
+        # for consistent outputs if endog is (n,1)
+        yarr = self._get_yarr(endog)
+        xarr = None
+        if exog is not None:
+            xarr = self._get_xarr(exog)
+            if xarr.ndim == 1:
+                xarr = xarr[:, None]
+            if xarr.ndim != 2:
+                raise ValueError("exog is not 1d or 2d")
+
+        return yarr, xarr
+
+    @cache_writable()
+    def ynames(self):
+        endog = self.orig_endog
+        ynames = self._get_names(endog)
+        if not ynames:
+            ynames = _make_endog_names(self.endog)
+
+        if len(ynames) == 1:
+            return ynames[0]
+        else:
+            return list(ynames)
+
+    @cache_writable()
+    def xnames(self) -> list[str] | None:
+        exog = self.orig_exog
+        if exog is not None:
+            xnames = self._get_names(exog)
+            if not xnames:
+                xnames = _make_exog_names(self.exog)
+            return list(xnames)
+        return None
+
+    @property
+    def param_names(self):
+        # for handling names of 'extra' parameters in summary, etc.
+        return self._param_names or self.xnames
+
+    @param_names.setter
+    def param_names(self, values):
+        self._param_names = values

     @property
     def cov_names(self):
@@ -114,11 +369,130 @@ class ModelData:

         If not set, returns param_names
         """
-        pass
+        # for handling names of covariance names in multidimensional models
+        if self._cov_names is not None:
+            return self._cov_names
+        return self.param_names
+
+    @cov_names.setter
+    def cov_names(self, value):
+        # for handling names of covariance names in multidimensional models
+        self._cov_names = value
+
+    @cache_readonly
+    def row_labels(self):
+        exog = self.orig_exog
+        if exog is not None:
+            row_labels = self._get_row_labels(exog)
+        else:
+            endog = self.orig_endog
+            row_labels = self._get_row_labels(endog)
+        return row_labels
+
+    def _get_row_labels(self, arr):
+        return None
+
+    def _get_names(self, arr):
+        if isinstance(arr, DataFrame):
+            if isinstance(arr.columns, MultiIndex):
+                # Flatten MultiIndexes into "simple" column names
+                return ['_'.join((level for level in c if level))
+                        for c in arr.columns]
+            else:
+                return list(arr.columns)
+        elif isinstance(arr, Series):
+            if arr.name:
+                return [arr.name]
+            else:
+                return
+        else:
+            try:
+                return arr.dtype.names
+            except AttributeError:
+                pass
+
+        return None
+
+    def _get_yarr(self, endog):
+        if data_util._is_structured_ndarray(endog):
+            endog = data_util.struct_to_ndarray(endog)
+        endog = np.asarray(endog)
+        if len(endog) == 1:  # never squeeze to a scalar
+            if endog.ndim == 1:
+                return endog
+            elif endog.ndim > 1:
+                return np.asarray([endog.squeeze()])
+
+        return endog.squeeze()
+
+    def _get_xarr(self, exog):
+        if data_util._is_structured_ndarray(exog):
+            exog = data_util.struct_to_ndarray(exog)
+        return np.asarray(exog)
+
+    def _check_integrity(self):
+        if self.exog is not None:
+            if len(self.exog) != len(self.endog):
+                raise ValueError("endog and exog matrices are different sizes")
+
+    def wrap_output(self, obj, how='columns', names=None):
+        if how == 'columns':
+            return self.attach_columns(obj)
+        elif how == 'rows':
+            return self.attach_rows(obj)
+        elif how == 'cov':
+            return self.attach_cov(obj)
+        elif how == 'dates':
+            return self.attach_dates(obj)
+        elif how == 'columns_eq':
+            return self.attach_columns_eq(obj)
+        elif how == 'cov_eq':
+            return self.attach_cov_eq(obj)
+        elif how == 'generic_columns':
+            return self.attach_generic_columns(obj, names)
+        elif how == 'generic_columns_2d':
+            return self.attach_generic_columns_2d(obj, names)
+        elif how == 'ynames':
+            return self.attach_ynames(obj)
+        elif how == 'multivariate_confint':
+            return self.attach_mv_confint(obj)
+        else:
+            return obj
+
+    def attach_columns(self, result):
+        return result
+
+    def attach_columns_eq(self, result):
+        return result
+
+    def attach_cov(self, result):
+        return result
+
+    def attach_cov_eq(self, result):
+        return result
+
+    def attach_rows(self, result):
+        return result
+
+    def attach_dates(self, result):
+        return result
+
+    def attach_mv_confint(self, result):
+        return result
+
+    def attach_generic_columns(self, result, *args, **kwargs):
+        return result
+
+    def attach_generic_columns_2d(self, result, *args, **kwargs):
+        return result
+
+    def attach_ynames(self, result):
+        return result


 class PatsyData(ModelData):
-    pass
+    def _get_names(self, arr):
+        return arr.design_info.column_names


 class PandasData(ModelData):
@@ -127,9 +501,176 @@ class PandasData(ModelData):
     results
     """

+    def _convert_endog_exog(self, endog, exog=None):
+        #TODO: remove this when we handle dtype systematically
+        endog = np.asarray(endog)
+        exog = exog if exog is None else np.asarray(exog)
+        if endog.dtype == object or exog is not None and exog.dtype == object:
+            raise ValueError("Pandas data cast to numpy dtype of object. "
+                             "Check input data with np.asarray(data).")
+        return super(PandasData, self)._convert_endog_exog(endog, exog)
+
+    @classmethod
+    def _drop_nans(cls, x, nan_mask):
+        if isinstance(x, (Series, DataFrame)):
+            return x.loc[nan_mask]
+        else:  # extra arguments could be plain ndarrays
+            return super(PandasData, cls)._drop_nans(x, nan_mask)
+
+    @classmethod
+    def _drop_nans_2d(cls, x, nan_mask):
+        if isinstance(x, (Series, DataFrame)):
+            return x.loc[nan_mask].loc[:, nan_mask]
+        else:  # extra arguments could be plain ndarrays
+            return super(PandasData, cls)._drop_nans_2d(x, nan_mask)
+
+    def _check_integrity(self):
+        endog, exog = self.orig_endog, self.orig_exog
+        # exog can be None and we could be upcasting one or the other
+        if (exog is not None and
+                (hasattr(endog, 'index') and hasattr(exog, 'index')) and
+                not self.orig_endog.index.equals(self.orig_exog.index)):
+            raise ValueError("The indices for endog and exog are not aligned")
+        super(PandasData, self)._check_integrity()
+
+    def _get_row_labels(self, arr):
+        try:
+            return arr.index
+        except AttributeError:
+            # if we've gotten here it's because endog is pandas and
+            # exog is not, so just return the row labels from endog
+            return self.orig_endog.index
+
+    def attach_generic_columns(self, result, names):
+        # get the attribute to use
+        column_names = getattr(self, names, None)
+        return Series(result, index=column_names)
+
+    def attach_generic_columns_2d(self, result, rownames, colnames=None):
+        colnames = colnames or rownames
+        rownames = getattr(self, rownames, None)
+        colnames = getattr(self, colnames, None)
+        return DataFrame(result, index=rownames, columns=colnames)
+
+    def attach_columns(self, result):
+        # this can either be a 1d array or a scalar
+        # do not squeeze because it might be a 2d row array
+        # if it needs a squeeze, the bug is elsewhere
+        if result.ndim <= 1:
+            return Series(result, index=self.param_names)
+        else:  # for e.g., confidence intervals
+            return DataFrame(result, index=self.param_names)
+
+    def attach_columns_eq(self, result):
+        return DataFrame(result, index=self.xnames, columns=self.ynames)
+
+    def attach_cov(self, result):
+        return DataFrame(result, index=self.cov_names, columns=self.cov_names)
+
+    def attach_cov_eq(self, result):
+        return DataFrame(result, index=self.ynames, columns=self.ynames)
+
+    def attach_rows(self, result):
+        # assumes that if len(row_labels) > len(result), it is because the
+        # result was truncated at the front, e.g. for AR lags
+        squeezed = result.squeeze()
+        k_endog = np.array(self.ynames, ndmin=1).shape[0]
+        if k_endog > 1 and squeezed.shape == (k_endog,):
+            squeezed = squeezed[None, :]
+        # May be zero-dim, for example in the case of forecast one step in tsa
+        if squeezed.ndim < 2:
+            out = Series(squeezed)
+        else:
+            out = DataFrame(result)
+            out.columns = self.ynames
+        out.index = self.row_labels[-len(result):]
+        return out
+
+    def attach_dates(self, result):
+        squeezed = result.squeeze()
+        k_endog = np.array(self.ynames, ndmin=1).shape[0]
+        if k_endog > 1 and squeezed.shape == (k_endog,):
+            squeezed = np.asarray(squeezed)[None, :]
+        # May be zero-dim, for example in the case of forecast one step in tsa
+        if squeezed.ndim < 2:
+            return Series(squeezed, index=self.predict_dates)
+        else:
+            return DataFrame(np.asarray(result),
+                             index=self.predict_dates,
+                             columns=self.ynames)
+
+    def attach_mv_confint(self, result):
+        return DataFrame(result.reshape((-1, 2)),
+                         index=self.cov_names,
+                         columns=['lower', 'upper'])
+
+    def attach_ynames(self, result):
+        squeezed = result.squeeze()
+        # May be zero-dim, for example in the case of forecast one step in tsa
+        if squeezed.ndim < 2:
+            return Series(squeezed, name=self.ynames)
+        else:
+            return DataFrame(result, columns=self.ynames)
+
+
+def _make_endog_names(endog):
+    if endog.ndim == 1 or endog.shape[1] == 1:
+        ynames = ['y']
+    else:  # for VAR
+        ynames = ['y%d' % (i+1) for i in range(endog.shape[1])]
+
+    return ynames
+
+
+def _make_exog_names(exog):
+    exog_var = exog.var(0)
+    if (exog_var == 0).any():
+        # assumes one constant in first or last position
+        # avoid exception if more than one constant
+        const_idx = exog_var.argmin()
+        exog_names = ['x%d' % i for i in range(1, exog.shape[1])]
+        exog_names.insert(const_idx, 'const')
+    else:
+        exog_names = ['x%d' % i for i in range(1, exog.shape[1]+1)]
+
+    return exog_names
+
+
+def handle_missing(endog, exog=None, missing='none', **kwargs):
+    klass = handle_data_class_factory(endog, exog)
+    if missing == 'none':
+        ret_dict = dict(endog=endog, exog=exog)
+        ret_dict.update(kwargs)
+        return ret_dict, None
+    return klass.handle_missing(endog, exog, missing=missing, **kwargs)
+

 def handle_data_class_factory(endog, exog):
     """
     Given inputs
     """
-    pass
+    if data_util._is_using_ndarray_type(endog, exog):
+        klass = ModelData
+    elif data_util._is_using_pandas(endog, exog):
+        klass = PandasData
+    elif data_util._is_using_patsy(endog, exog):
+        klass = PatsyData
+    # keep this check last
+    elif data_util._is_using_ndarray(endog, exog):
+        klass = ModelData
+    else:
+        raise ValueError('unrecognized data structures: %s / %s' %
+                         (type(endog), type(exog)))
+    return klass
+
+
+def handle_data(endog, exog, missing='none', hasconst=None, **kwargs):
+    # deal with lists and tuples up-front
+    if isinstance(endog, (list, tuple)):
+        endog = np.asarray(endog)
+    if isinstance(exog, (list, tuple)):
+        exog = np.asarray(exog)
+
+    klass = handle_data_class_factory(endog, exog)
+    return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
+                 **kwargs)
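
For orientation, a minimal sketch of how the dispatch above behaves, assuming the internal entry point statsmodels.base.data.handle_data keeps the signature shown in this patch (pandas inputs go to PandasData, plain arrays to ModelData):

    import numpy as np
    import pandas as pd
    from statsmodels.base.data import handle_data  # internal API shown above

    y = pd.Series(np.arange(10.0), name="y")
    X = pd.DataFrame({"x1": np.arange(10.0), "x2": np.ones(10)})

    # pandas inputs dispatch to PandasData, plain arrays to ModelData
    pdata = handle_data(y, X)
    adata = handle_data(np.asarray(y), np.asarray(X))
    print(type(pdata).__name__, type(adata).__name__)
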
diff --git a/statsmodels/base/distributed_estimation.py b/statsmodels/base/distributed_estimation.py
index fd302da59..f6f6ac9af 100644
--- a/statsmodels/base/distributed_estimation.py
+++ b/statsmodels/base/distributed_estimation.py
@@ -1,8 +1,10 @@
 from statsmodels.base.elastic_net import RegularizedResults
-from statsmodels.stats.regularized_covariance import _calc_nodewise_row, _calc_nodewise_weight, _calc_approx_inv_cov
+from statsmodels.stats.regularized_covariance import _calc_nodewise_row, \
+    _calc_nodewise_weight, _calc_approx_inv_cov
 from statsmodels.base.model import LikelihoodModelResults
 from statsmodels.regression.linear_model import OLS
 import numpy as np
+
 """
 Distributed estimation routines. Currently, we support several
 methods of distribution
@@ -88,7 +90,12 @@ def _est_regularized_naive(mod, pnum, partitions, fit_kwds=None):
     -------
     An array of the parameters for the regularized fit
     """
-    pass
+
+    if fit_kwds is None:
+        raise ValueError("_est_regularized_naive currently " +
+                         "requires that fit_kwds not be None.")
+
+    return mod.fit_regularized(**fit_kwds).params


 def _est_unregularized_naive(mod, pnum, partitions, fit_kwds=None):
@@ -109,7 +116,12 @@ def _est_unregularized_naive(mod, pnum, partitions, fit_kwds=None):
     -------
     An array of the parameters for the fit
     """
-    pass
+
+    if fit_kwds is None:
+        raise ValueError("_est_unregularized_naive currently " +
+                         "requires that fit_kwds not be None.")
+
+    return mod.fit(**fit_kwds).params


 def _join_naive(params_l, threshold=0):
@@ -123,7 +135,18 @@ def _join_naive(params_l, threshold=0):
     threshold : scalar
         The threshold at which the coefficients will be cut.
     """
-    pass
+
+    p = len(params_l[0])
+    partitions = len(params_l)
+
+    params_mn = np.zeros(p)
+    for params in params_l:
+        params_mn += params
+    params_mn /= partitions
+
+    params_mn[np.abs(params_mn) < threshold] = 0
+
+    return params_mn


 def _calc_grad(mod, params, alpha, L1_wt, score_kwds):
@@ -163,7 +186,10 @@ def _calc_grad(mod, params, alpha, L1_wt, score_kwds):

     X^T(y - X^T params)
     """
-    pass
+
+    grad = -mod.score(np.asarray(params), **score_kwds)
+    grad += alpha * (1 - L1_wt)
+    return grad


 def _calc_wdesign_mat(mod, params, hess_kwds):
@@ -184,11 +210,13 @@ def _calc_wdesign_mat(mod, params, hess_kwds):
     An array-like object, updated design matrix, same dimension
     as mod.exog
     """
-    pass
+
+    rhess = np.sqrt(mod.hessian_factor(np.asarray(params), **hess_kwds))
+    return rhess[:, None] * mod.exog


 def _est_regularized_debiased(mod, mnum, partitions, fit_kwds=None,
-    score_kwds=None, hess_kwds=None):
+                              score_kwds=None, hess_kwds=None):
     """estimates the regularized fitted parameters, is the default
     estimation_method for class DistributedModel.

@@ -215,7 +243,41 @@ def _est_regularized_debiased(mod, mnum, partitions, fit_kwds=None,
         A list of array like objects for nodewise_row
         A list of array like objects for nodewise_weight
     """
-    pass
+
+    score_kwds = {} if score_kwds is None else score_kwds
+    hess_kwds = {} if hess_kwds is None else hess_kwds
+
+    if fit_kwds is None:
+        raise ValueError("_est_regularized_debiased currently " +
+                         "requires that fit_kwds not be None.")
+    else:
+        alpha = fit_kwds["alpha"]
+
+    if "L1_wt" in fit_kwds:
+        L1_wt = fit_kwds["L1_wt"]
+    else:
+        L1_wt = 1
+
+    nobs, p = mod.exog.shape
+    p_part = int(np.ceil((1. * p) / partitions))
+
+    params = mod.fit_regularized(**fit_kwds).params
+    grad = _calc_grad(mod, params, alpha, L1_wt, score_kwds) / nobs
+
+    wexog = _calc_wdesign_mat(mod, params, hess_kwds)
+
+    nodewise_row_l = []
+    nodewise_weight_l = []
+    for idx in range(mnum * p_part, min((mnum + 1) * p_part, p)):
+
+        nodewise_row = _calc_nodewise_row(wexog, idx, alpha)
+        nodewise_row_l.append(nodewise_row)
+
+        nodewise_weight = _calc_nodewise_weight(wexog, nodewise_row, idx,
+                                                alpha)
+        nodewise_weight_l.append(nodewise_weight)
+
+    return params, grad, nodewise_row_l, nodewise_weight_l


 def _join_debiased(results_l, threshold=0):
@@ -230,10 +292,41 @@ def _join_debiased(results_l, threshold=0):
     threshold : scalar
         The threshold at which the coefficients will be cut.
     """
-    pass

+    p = len(results_l[0][0])
+    partitions = len(results_l)
+
+    params_mn = np.zeros(p)
+    grad_mn = np.zeros(p)
+
+    nodewise_row_l = []
+    nodewise_weight_l = []
+
+    for r in results_l:
+
+        params_mn += r[0]
+        grad_mn += r[1]
+
+        nodewise_row_l.extend(r[2])
+        nodewise_weight_l.extend(r[3])
+
+    nodewise_row_l = np.array(nodewise_row_l)
+    nodewise_weight_l = np.array(nodewise_weight_l)

-def _helper_fit_partition(self, pnum, endog, exog, fit_kwds, init_kwds_e={}):
+    params_mn /= partitions
+    grad_mn *= -1. / partitions
+
+    approx_inv_cov = _calc_approx_inv_cov(nodewise_row_l, nodewise_weight_l)
+
+    debiased_params = params_mn + approx_inv_cov.dot(grad_mn)
+
+    debiased_params[np.abs(debiased_params) < threshold] = 0
+
+    return debiased_params
+
+
+def _helper_fit_partition(self, pnum, endog, exog, fit_kwds,
+                          init_kwds_e={}):
     """handles the model fitting for each machine. NOTE: this
     is primarily handled outside of DistributedModel because
     joblib cannot handle class methods.
@@ -258,7 +351,15 @@ def _helper_fit_partition(self, pnum, endog, exog, fit_kwds, init_kwds_e={}):
     estimation_method result.  For the default,
     _est_regularized_debiased, a tuple.
     """
-    pass
+
+    temp_init_kwds = self.init_kwds.copy()
+    temp_init_kwds.update(init_kwds_e)
+
+    model = self.model_class(endog, exog, **temp_init_kwds)
+    results = self.estimation_method(model, pnum, self.partitions,
+                                     fit_kwds=fit_kwds,
+                                     **self.estimation_kwds)
+    return results


 class DistributedModel:
@@ -325,45 +426,55 @@ class DistributedModel:
     --------
     """

-    def __init__(self, partitions, model_class=None, init_kwds=None,
-        estimation_method=None, estimation_kwds=None, join_method=None,
-        join_kwds=None, results_class=None, results_kwds=None):
+    def __init__(self, partitions, model_class=None,
+                 init_kwds=None, estimation_method=None,
+                 estimation_kwds=None, join_method=None, join_kwds=None,
+                 results_class=None, results_kwds=None):
+
         self.partitions = partitions
+
         if model_class is None:
             self.model_class = OLS
         else:
             self.model_class = model_class
+
         if init_kwds is None:
             self.init_kwds = {}
         else:
             self.init_kwds = init_kwds
+
         if estimation_method is None:
             self.estimation_method = _est_regularized_debiased
         else:
             self.estimation_method = estimation_method
+
         if estimation_kwds is None:
             self.estimation_kwds = {}
         else:
             self.estimation_kwds = estimation_kwds
+
         if join_method is None:
             self.join_method = _join_debiased
         else:
             self.join_method = join_method
+
         if join_kwds is None:
             self.join_kwds = {}
         else:
             self.join_kwds = join_kwds
+
         if results_class is None:
             self.results_class = RegularizedResults
         else:
             self.results_class = results_class
+
         if results_kwds is None:
             self.results_kwds = {}
         else:
             self.results_kwds = results_kwds

-    def fit(self, data_generator, fit_kwds=None, parallel_method=
-        'sequential', parallel_backend=None, init_kwds_generator=None):
+    def fit(self, data_generator, fit_kwds=None, parallel_method="sequential",
+            parallel_backend=None, init_kwds_generator=None):
         """Performs the distributed estimation using the corresponding
         DistributedModel

@@ -391,10 +502,36 @@ class DistributedModel:
         join_method result.  For the default, _join_debiased, it returns a
         p length array.
         """
-        pass

-    def fit_sequential(self, data_generator, fit_kwds, init_kwds_generator=None
-        ):
+        if fit_kwds is None:
+            fit_kwds = {}
+
+        if parallel_method == "sequential":
+            results_l = self.fit_sequential(data_generator, fit_kwds,
+                                            init_kwds_generator)
+
+        elif parallel_method == "joblib":
+            results_l = self.fit_joblib(data_generator, fit_kwds,
+                                        parallel_backend,
+                                        init_kwds_generator)
+
+        else:
+            raise ValueError("parallel_method: %s is currently not supported"
+                             % parallel_method)
+
+        params = self.join_method(results_l, **self.join_kwds)
+
+        # NOTE that currently, the dummy result model that is initialized
+        # here does not use any init_kwds from the init_kwds_generator even
+        # if it is provided.  It is possible to imagine an edge case where
+        # this might be a problem but given that the results model instance
+        # does not correspond to any data partition this seems reasonable.
+        res_mod = self.model_class([0], [0], **self.init_kwds)
+
+        return self.results_class(res_mod, params, **self.results_kwds)
+
+    def fit_sequential(self, data_generator, fit_kwds,
+                       init_kwds_generator=None):
         """Sequentially performs the distributed estimation using
         the corresponding DistributedModel

@@ -416,10 +553,32 @@ class DistributedModel:
         join_method result.  For the default, _join_debiased, it returns a
         p length array.
         """
-        pass
+
+        results_l = []
+
+        if init_kwds_generator is None:
+
+            for pnum, (endog, exog) in enumerate(data_generator):
+
+                results = _helper_fit_partition(self, pnum, endog, exog,
+                                                fit_kwds)
+                results_l.append(results)
+
+        else:
+
+            tup_gen = enumerate(zip(data_generator,
+                                    init_kwds_generator))
+
+            for pnum, ((endog, exog), init_kwds_e) in tup_gen:
+
+                results = _helper_fit_partition(self, pnum, endog, exog,
+                                                fit_kwds, init_kwds_e)
+                results_l.append(results)
+
+        return results_l

     def fit_joblib(self, data_generator, fit_kwds, parallel_backend,
-        init_kwds_generator=None):
+                   init_kwds_generator=None):
         """Performs the distributed estimation in parallel using joblib

         Parameters
@@ -443,7 +602,36 @@ class DistributedModel:
         join_method result.  For the default, _join_debiased, it returns a
         p length array.
         """
-        pass
+
+        from statsmodels.tools.parallel import parallel_func
+
+        par, f, n_jobs = parallel_func(_helper_fit_partition, self.partitions)
+
+        if parallel_backend is None and init_kwds_generator is None:
+            results_l = par(f(self, pnum, endog, exog, fit_kwds)
+                            for pnum, (endog, exog)
+                            in enumerate(data_generator))
+
+        elif parallel_backend is not None and init_kwds_generator is None:
+            with parallel_backend:
+                results_l = par(f(self, pnum, endog, exog, fit_kwds)
+                                for pnum, (endog, exog)
+                                in enumerate(data_generator))
+
+        elif parallel_backend is None and init_kwds_generator is not None:
+            tup_gen = enumerate(zip(data_generator, init_kwds_generator))
+            results_l = par(f(self, pnum, endog, exog, fit_kwds, init_kwds)
+                            for pnum, ((endog, exog), init_kwds)
+                            in tup_gen)
+
+        elif parallel_backend is not None and init_kwds_generator is not None:
+            tup_gen = enumerate(zip(data_generator, init_kwds_generator))
+            with parallel_backend:
+                results_l = par(f(self, pnum, endog, exog, fit_kwds, init_kwds)
+                                for pnum, ((endog, exog), init_kwds)
+                                in tup_gen)
+
+        return results_l


 class DistributedResults(LikelihoodModelResults):
@@ -485,4 +673,5 @@ class DistributedResults(LikelihoodModelResults):
             prediction : ndarray, pandas.Series or pandas.DataFrame
             See self.model.predict
         """
-        pass
+
+        return self.model.predict(self.params, exog, *args, **kwargs)
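
A rough usage sketch of the machinery above, with synthetic data and the defaults (OLS model class, debiased regularized estimation); the generator and partition count are illustrative only:

    import numpy as np
    from statsmodels.base.distributed_estimation import DistributedModel

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    y = X @ np.array([1.0, 0.0, -1.0, 0.0, 0.5]) + rng.standard_normal(200)

    def data_generator(n_partitions=2):
        # yield (endog, exog) chunks, one per machine/partition
        for idx in np.array_split(np.arange(len(y)), n_partitions):
            yield y[idx], X[idx]

    dmod = DistributedModel(partitions=2)
    # fit_kwds must carry "alpha" for the default regularized estimators
    res = dmod.fit(data_generator(), fit_kwds={"alpha": 0.1})
    print(res.params)
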
diff --git a/statsmodels/base/elastic_net.py b/statsmodels/base/elastic_net.py
index eb5d21a7b..4c6dbd454 100644
--- a/statsmodels/base/elastic_net.py
+++ b/statsmodels/base/elastic_net.py
@@ -2,6 +2,7 @@ import numpy as np
 from statsmodels.base.model import Results
 import statsmodels.base.wrapper as wrap
 from statsmodels.tools.decorators import cache_readonly
+
 """
 Elastic net regularization.

@@ -36,13 +37,32 @@ def _gen_npfuncs(k, L1_wt, alpha, loglike_kwds, score_kwds, hess_kwds):
     ``x`` is a point in the parameter space and ``model`` is an
     arbitrary statsmodels regression model.
     """
-    pass
+
+    def nploglike(params, model):
+        nobs = model.nobs
+        pen_llf = alpha[k] * (1 - L1_wt) * np.sum(params**2) / 2
+        llf = model.loglike(np.r_[params], **loglike_kwds)
+        return - llf / nobs + pen_llf
+
+    def npscore(params, model):
+        nobs = model.nobs
+        pen_grad = alpha[k] * (1 - L1_wt) * params
+        gr = -model.score(np.r_[params], **score_kwds)[0] / nobs
+        return gr + pen_grad
+
+    def nphess(params, model):
+        nobs = model.nobs
+        pen_hess = alpha[k] * (1 - L1_wt)
+        h = -model.hessian(np.r_[params], **hess_kwds)[0, 0] / nobs + pen_hess
+        return h
+
+    return nploglike, npscore, nphess


-def fit_elasticnet(model, method='coord_descent', maxiter=100, alpha=0.0,
-    L1_wt=1.0, start_params=None, cnvrg_tol=1e-07, zero_tol=1e-08, refit=
-    False, check_step=True, loglike_kwds=None, score_kwds=None, hess_kwds=None
-    ):
+def fit_elasticnet(model, method="coord_descent", maxiter=100,
+                   alpha=0., L1_wt=1., start_params=None, cnvrg_tol=1e-7,
+                   zero_tol=1e-8, refit=False, check_step=True,
+                   loglike_kwds=None, score_kwds=None, hess_kwds=None):
     """
     Return an elastic net regularized fit to a regression model.

@@ -112,10 +132,145 @@ def fit_elasticnet(model, method='coord_descent', maxiter=100, alpha=0.0,
     then repeatedly optimize the L1 penalized version of this function
     along coordinate axes.
     """
-    pass

-
-def _opt_1d(func, grad, hess, model, start, L1_wt, tol, check_step=True):
+    k_exog = model.exog.shape[1]
+
+    loglike_kwds = {} if loglike_kwds is None else loglike_kwds
+    score_kwds = {} if score_kwds is None else score_kwds
+    hess_kwds = {} if hess_kwds is None else hess_kwds
+
+    if np.isscalar(alpha):
+        alpha = alpha * np.ones(k_exog)
+
+    # Define starting params
+    if start_params is None:
+        params = np.zeros(k_exog)
+    else:
+        params = start_params.copy()
+
+    btol = 1e-4
+    params_zero = np.zeros(len(params), dtype=bool)
+
+    init_args = model._get_init_kwds()
+    # we do not need a copy of init_args; _get_init_kwds returns a new dict
+    init_args['hasconst'] = False
+    model_offset = init_args.pop('offset', None)
+    if 'exposure' in init_args and init_args['exposure'] is not None:
+        if model_offset is None:
+            model_offset = np.log(init_args.pop('exposure'))
+        else:
+            model_offset += np.log(init_args.pop('exposure'))
+
+    fgh_list = [
+        _gen_npfuncs(k, L1_wt, alpha, loglike_kwds, score_kwds, hess_kwds)
+        for k in range(k_exog)]
+
+    converged = False
+
+    for itr in range(maxiter):
+
+        # Sweep through the parameters
+        params_save = params.copy()
+        for k in range(k_exog):
+
+            # Under the active set method, if a parameter becomes
+            # zero we do not try to change it again.
+            # TODO : give the user the option to switch this off
+            if params_zero[k]:
+                continue
+
+            # Set the offset to account for the variables that are
+            # being held fixed in the current coordinate
+            # optimization.
+            params0 = params.copy()
+            params0[k] = 0
+            offset = np.dot(model.exog, params0)
+            if model_offset is not None:
+                offset += model_offset
+
+            # Create a one-variable model for optimization.
+            model_1var = model.__class__(
+                model.endog, model.exog[:, k], offset=offset, **init_args)
+
+            # Do the one-dimensional optimization.
+            func, grad, hess = fgh_list[k]
+            params[k] = _opt_1d(
+                func, grad, hess, model_1var, params[k], alpha[k]*L1_wt,
+                tol=btol, check_step=check_step)
+
+            # Update the active set
+            if itr > 0 and np.abs(params[k]) < zero_tol:
+                params_zero[k] = True
+                params[k] = 0.
+
+        # Check for convergence
+        pchange = np.max(np.abs(params - params_save))
+        if pchange < cnvrg_tol:
+            converged = True
+            break
+
+    # Set approximate zero coefficients to be exactly zero
+    params[np.abs(params) < zero_tol] = 0
+
+    if not refit:
+        results = RegularizedResults(model, params)
+        results.converged = converged
+        return RegularizedResultsWrapper(results)
+
+    # Fit the reduced model to get standard errors and other
+    # post-estimation results.
+    ii = np.flatnonzero(params)
+    cov = np.zeros((k_exog, k_exog))
+    init_args = dict([(k, getattr(model, k, None)) for k in model._init_keys])
+    if len(ii) > 0:
+        model1 = model.__class__(
+            model.endog, model.exog[:, ii], **init_args)
+        rslt = model1.fit()
+        params[ii] = rslt.params
+        cov[np.ix_(ii, ii)] = rslt.normalized_cov_params
+    else:
+        # Hack: no variables were selected but we need to run fit in
+        # order to get the correct results class.  So just fit a model
+        # with one variable.
+        model1 = model.__class__(model.endog, model.exog[:, 0], **init_args)
+        rslt = model1.fit(maxiter=0)
+
+    # fit may return a results or a results wrapper
+    if issubclass(rslt.__class__, wrap.ResultsWrapper):
+        klass = rslt._results.__class__
+    else:
+        klass = rslt.__class__
+
+    # Not all models have a scale
+    if hasattr(rslt, 'scale'):
+        scale = rslt.scale
+    else:
+        scale = 1.
+
+    # The degrees of freedom should reflect the number of parameters
+    # in the refit model, not including the zeros that are displayed
+    # to indicate which variables were dropped.  See issue #1723 for
+    # discussion about setting df parameters in model and results
+    # classes.
+    p, q = model.df_model, model.df_resid
+    model.df_model = len(ii)
+    model.df_resid = model.nobs - model.df_model
+
+    # Assuming a standard signature for creating results classes.
+    refit = klass(model, params, cov, scale=scale)
+    refit.regularized = True
+    refit.converged = converged
+    refit.method = method
+    refit.fit_history = {'iteration': itr + 1}
+
+    # Restore df in model class, see issue #1723 for discussion.
+    model.df_model, model.df_resid = p, q
+
+    return refit
+
+
+def _opt_1d(func, grad, hess, model, start, L1_wt, tol,
+            check_step=True):
     """
     One-dimensional helper for elastic net.

@@ -155,7 +310,49 @@ def _opt_1d(func, grad, hess, model, start, L1_wt, tol, check_step=True):
     -------
     The argmin of the objective function.
     """
-    pass
+
+    # Overview:
+    # We want to minimize L(x) + L1_wt*abs(x), where L() is a smooth
+    # loss function that includes the log-likelihood and L2 penalty.
+    # This is a 1-dimensional optimization.  If L(x) is exactly
+    # quadratic we can solve for the argmin exactly.  Otherwise we
+    # approximate L(x) with a quadratic function Q(x) and try to use
+    # the minimizer of Q(x) + L1_wt*abs(x).  But if this yields an
+    # uphill step for the actual target function L(x) + L1_wt*abs(x),
+    # then we fall back to an expensive line search.  The line search
+    # is never needed for OLS.
+
+    x = start
+    f = func(x, model)
+    b = grad(x, model)
+    c = hess(x, model)
+    d = b - c*x
+
+    # The optimum is achieved by hard thresholding to zero
+    if L1_wt > np.abs(d):
+        return 0.
+
+    # x + h is the minimizer of the Q(x) + L1_wt*abs(x)
+    if d >= 0:
+        h = (L1_wt - b) / c
+    elif d < 0:
+        h = -(L1_wt + b) / c
+    else:
+        return np.nan
+
+    # If the new point is not uphill for the target function, take it
+    # and return.  This check is a bit expensive and unnecessary for
+    # OLS.
+    if not check_step:
+        return x + h
+    f1 = func(x + h, model) + L1_wt*np.abs(x + h)
+    if f1 <= f + L1_wt*np.abs(x) + 1e-10:
+        return x + h
+
+    # Fallback for models where the loss is not quadratic
+    from scipy.optimize import brent
+    x_opt = brent(func, args=(model,), brack=(x-1, x+1), tol=tol)
+    return x_opt


 class RegularizedResults(Results):
@@ -169,7 +366,6 @@ class RegularizedResults(Results):
     params : ndarray
         The estimated (regularized) parameters.
     """
-
     def __init__(self, model, params):
         super(RegularizedResults, self).__init__(model, params)

@@ -178,12 +374,15 @@ class RegularizedResults(Results):
         """
         The predicted values from the model at the estimated parameters.
         """
-        pass
+        return self.model.predict(self.params)


 class RegularizedResultsWrapper(wrap.ResultsWrapper):
-    _attrs = {'params': 'columns', 'resid': 'rows', 'fittedvalues': 'rows'}
+    _attrs = {
+        'params': 'columns',
+        'resid': 'rows',
+        'fittedvalues': 'rows',
+    }
     _wrap_attrs = _attrs
-
-
-wrap.populate_wrapper(RegularizedResultsWrapper, RegularizedResults)
+wrap.populate_wrapper(RegularizedResultsWrapper,  # noqa:E305
+                      RegularizedResults)
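
The coordinate-descent routine above backs OLS.fit_regularized(method="elastic_net"); a small sketch with synthetic data, assuming the usual statsmodels.api entry points:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    X = sm.add_constant(rng.standard_normal((100, 3)))
    y = X @ np.array([0.5, 1.0, 0.0, -1.0]) + rng.standard_normal(100)

    # L1_wt=1 is the lasso limit, L1_wt=0 is ridge; values in between mix both
    res = sm.OLS(y, X).fit_regularized(method="elastic_net",
                                       alpha=0.1, L1_wt=0.5)
    print(res.params)
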
diff --git a/statsmodels/base/l1_cvxopt.py b/statsmodels/base/l1_cvxopt.py
index 94608ccba..11ea2a3ab 100644
--- a/statsmodels/base/l1_cvxopt.py
+++ b/statsmodels/base/l1_cvxopt.py
@@ -5,8 +5,9 @@ import numpy as np
 import statsmodels.base.l1_solvers_common as l1_solvers_common


-def fit_l1_cvxopt_cp(f, score, start_params, args, kwargs, disp=False,
-    maxiter=100, callback=None, retall=False, full_output=False, hess=None):
+def fit_l1_cvxopt_cp(
+        f, score, start_params, args, kwargs, disp=False, maxiter=100,
+        callback=None, retall=False, full_output=False, hess=None):
     """
     Solve the l1 regularized problem using cvxopt.solvers.cp

@@ -52,28 +53,135 @@ def fit_l1_cvxopt_cp(f, score, start_params, args, kwargs, disp=False,
         number of iterative refinement steps when solving KKT equations
         (default: 1).
     """
-    pass
+    from cvxopt import solvers, matrix
+
+    start_params = np.array(start_params).ravel('F')
+
+    ## Extract arguments
+    # k_params is total number of covariates, possibly including a leading constant.
+    k_params = len(start_params)
+    # The start point
+    x0 = np.append(start_params, np.fabs(start_params))
+    x0 = matrix(x0, (2 * k_params, 1))
+    # The regularization parameter
+    alpha = np.array(kwargs['alpha_rescaled']).ravel('F')
+    # Make sure it's a vector
+    alpha = alpha * np.ones(k_params)
+    assert alpha.min() >= 0
+
+    ## Wrap up functions for cvxopt
+    f_0 = lambda x: _objective_func(f, x, k_params, alpha, *args)
+    Df = lambda x: _fprime(score, x, k_params, alpha)
+    G = _get_G(k_params)  # Inequality constraint matrix, Gx \leq h
+    h = matrix(0.0, (2 * k_params, 1))  # RHS in inequality constraint
+    H = lambda x, z: _hessian_wrapper(hess, x, z, k_params)
+
+    ## Define the optimization function
+    def F(x=None, z=None):
+        if x is None:
+            return 0, x0
+        elif z is None:
+            return f_0(x), Df(x)
+        else:
+            return f_0(x), Df(x), H(x, z)
+
+    ## Convert optimization settings to cvxopt form
+    solvers.options['show_progress'] = disp
+    solvers.options['maxiters'] = maxiter
+    if 'abstol' in kwargs:
+        solvers.options['abstol'] = kwargs['abstol']
+    if 'reltol' in kwargs:
+        solvers.options['reltol'] = kwargs['reltol']
+    if 'feastol' in kwargs:
+        solvers.options['feastol'] = kwargs['feastol']
+    if 'refinement' in kwargs:
+        solvers.options['refinement'] = kwargs['refinement']
+
+    ### Call the optimizer
+    results = solvers.cp(F, G, h)
+    x = np.asarray(results['x']).ravel()
+    params = x[:k_params]
+
+    ### Post-process
+    # QC
+    qc_tol = kwargs['qc_tol']
+    qc_verbose = kwargs['qc_verbose']
+    passed = l1_solvers_common.qc_results(
+        params, alpha, score, qc_tol, qc_verbose)
+    # Possibly trim
+    trim_mode = kwargs['trim_mode']
+    size_trim_tol = kwargs['size_trim_tol']
+    auto_trim_tol = kwargs['auto_trim_tol']
+    params, trimmed = l1_solvers_common.do_trim_params(
+        params, k_params, alpha, score, passed, trim_mode, size_trim_tol,
+        auto_trim_tol)
+
+    ### Pack up return values for statsmodels
+    # TODO These retvals are returned as mle_retvals...but the fit was not ML
+    if full_output:
+        fopt = f_0(x)
+        gopt = float('nan')  # Objective is non-differentiable
+        hopt = float('nan')
+        iterations = float('nan')
+        converged = (results['status'] == 'optimal')
+        warnflag = results['status']
+        retvals = {
+            'fopt': fopt, 'converged': converged, 'iterations': iterations,
+            'gopt': gopt, 'hopt': hopt, 'trimmed': trimmed,
+            'warnflag': warnflag}
+    else:
+        x = np.array(results['x']).ravel()
+        params = x[:k_params]
+
+    ### Return results
+    if full_output:
+        return params, retvals
+    else:
+        return params


 def _objective_func(f, x, k_params, alpha, *args):
     """
     The regularized objective function.
     """
-    pass
+    from cvxopt import matrix
+
+    x_arr = np.asarray(x)
+    params = x_arr[:k_params].ravel()
+    u = x_arr[k_params:]
+    # Call the numpy version
+    objective_func_arr = f(params, *args) + (alpha * u).sum()
+    # Return
+    return matrix(objective_func_arr)


 def _fprime(score, x, k_params, alpha):
     """
     The regularized derivative.
     """
-    pass
+    from cvxopt import matrix
+
+    x_arr = np.asarray(x)
+    params = x_arr[:k_params].ravel()
+    # Call the numpy version
+    # The derivative just appends a vector of constants
+    fprime_arr = np.append(score(params), alpha)
+    # Return
+    return matrix(fprime_arr, (1, 2 * k_params))


 def _get_G(k_params):
     """
     The linear inequality constraint matrix.
     """
-    pass
+    from cvxopt import matrix
+
+    I = np.eye(k_params)  # noqa:E741
+    A = np.concatenate((-I, -I), axis=1)
+    B = np.concatenate((I, -I), axis=1)
+    C = np.concatenate((A, B), axis=0)
+    # Return
+    return matrix(C)


 def _hessian_wrapper(hess, x, z, k_params):
@@ -83,4 +191,13 @@ def _hessian_wrapper(hess, x, z, k_params):
     cvxopt wants the hessian of the objective function and the constraints.
         Since our constraints are linear, this part is all zeros.
     """
-    pass
+    from cvxopt import matrix
+
+    x_arr = np.asarray(x)
+    params = x_arr[:k_params].ravel()
+    zh_x = np.asarray(z[0]) * hess(params)
+    zero_mat = np.zeros(zh_x.shape)
+    A = np.concatenate((zh_x, zero_mat), axis=1)
+    B = np.concatenate((zero_mat, zero_mat), axis=1)
+    zh_x_ext = np.concatenate((A, B), axis=0)
+    return matrix(zh_x_ext, (2 * k_params, 2 * k_params))
diff --git a/statsmodels/base/l1_slsqp.py b/statsmodels/base/l1_slsqp.py
index 8c61550d1..2f96f40c8 100644
--- a/statsmodels/base/l1_slsqp.py
+++ b/statsmodels/base/l1_slsqp.py
@@ -7,8 +7,9 @@ from scipy.optimize import fmin_slsqp
 import statsmodels.base.l1_solvers_common as l1_solvers_common


-def fit_l1_slsqp(f, score, start_params, args, kwargs, disp=False, maxiter=
-    1000, callback=None, retall=False, full_output=False, hess=None):
+def fit_l1_slsqp(
+        f, score, start_params, args, kwargs, disp=False, maxiter=1000,
+        callback=None, retall=False, full_output=False, hess=None):
     """
     Solve the l1 regularized problem using scipy.optimize.fmin_slsqp().

@@ -47,32 +48,121 @@ def fit_l1_slsqp(f, score, start_params, args, kwargs, disp=False, maxiter=
     acc : float (default 1e-6)
         Requested accuracy as used by slsqp
     """
-    pass
+    start_params = np.array(start_params).ravel('F')
+
+    ### Extract values
+    # k_params is total number of covariates,
+    # possibly including a leading constant.
+    k_params = len(start_params)
+    # The start point
+    x0 = np.append(start_params, np.fabs(start_params))
+    # alpha is the regularization parameter
+    alpha = np.array(kwargs['alpha_rescaled']).ravel('F')
+    # Make sure it's a vector
+    alpha = alpha * np.ones(k_params)
+    assert alpha.min() >= 0
+    # Convert display parameters to scipy.optimize form
+    disp_slsqp = _get_disp_slsqp(disp, retall)
+    # Set/retrieve the desired accuracy
+    acc = kwargs.setdefault('acc', 1e-10)
+
+    ### Wrap up for use in fmin_slsqp
+    func = lambda x_full: _objective_func(f, x_full, k_params, alpha, *args)
+    f_ieqcons_wrap = lambda x_full: _f_ieqcons(x_full, k_params)
+    fprime_wrap = lambda x_full: _fprime(score, x_full, k_params, alpha)
+    fprime_ieqcons_wrap = lambda x_full: _fprime_ieqcons(x_full, k_params)
+
+    ### Call the solver
+    results = fmin_slsqp(
+        func, x0, f_ieqcons=f_ieqcons_wrap, fprime=fprime_wrap, acc=acc,
+        iter=maxiter, disp=disp_slsqp, full_output=full_output,
+        fprime_ieqcons=fprime_ieqcons_wrap)
+    params = np.asarray(results[0][:k_params])
+
+    ### Post-process
+    # QC
+    qc_tol = kwargs['qc_tol']
+    qc_verbose = kwargs['qc_verbose']
+    passed = l1_solvers_common.qc_results(
+        params, alpha, score, qc_tol, qc_verbose)
+    # Possibly trim
+    trim_mode = kwargs['trim_mode']
+    size_trim_tol = kwargs['size_trim_tol']
+    auto_trim_tol = kwargs['auto_trim_tol']
+    params, trimmed = l1_solvers_common.do_trim_params(
+        params, k_params, alpha, score, passed, trim_mode, size_trim_tol,
+        auto_trim_tol)
+
+    ### Pack up return values for statsmodels optimizers
+    # TODO These retvals are returned as mle_retvals...but the fit was not ML.
+    # This could be confusing someday.
+    if full_output:
+        x_full, fx, its, imode, smode = results
+        fopt = func(np.asarray(x_full))
+        converged = (imode == 0)
+        warnflag = str(imode) + ' ' + smode
+        iterations = its
+        gopt = float('nan')     # Objective is non-differentiable
+        hopt = float('nan')
+        retvals = {
+            'fopt': fopt, 'converged': converged, 'iterations': iterations,
+            'gopt': gopt, 'hopt': hopt, 'trimmed': trimmed,
+            'warnflag': warnflag}
+
+    ### Return
+    if full_output:
+        return params, retvals
+    else:
+        return params
+
+
+def _get_disp_slsqp(disp, retall):
+    if disp or retall:
+        if disp:
+            disp_slsqp = 1
+        if retall:
+            disp_slsqp = 2
+    else:
+        disp_slsqp = 0
+    return disp_slsqp


 def _objective_func(f, x_full, k_params, alpha, *args):
     """
     The regularized objective function
     """
-    pass
+    x_params = x_full[:k_params]
+    x_added = x_full[k_params:]
+    ## Return
+    return f(x_params, *args) + (alpha * x_added).sum()


 def _fprime(score, x_full, k_params, alpha):
     """
     The regularized derivative
     """
-    pass
+    x_params = x_full[:k_params]
+    # The derivative just appends a vector of constants
+    return np.append(score(x_params), alpha)


 def _f_ieqcons(x_full, k_params):
     """
     The inequality constraints.
     """
-    pass
+    x_params = x_full[:k_params]
+    x_added = x_full[k_params:]
+    # All entries in this vector must be \geq 0 in a feasible solution
+    return np.append(x_params + x_added, x_added - x_params)


 def _fprime_ieqcons(x_full, k_params):
     """
     Derivative of the inequality constraints
     """
-    pass
+    I = np.eye(k_params)  # noqa:E741
+    A = np.concatenate((I, I), axis=1)
+    B = np.concatenate((-I, I), axis=1)
+    C = np.concatenate((A, B), axis=0)
+    ## Return
+    return C
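
Both l1 solvers above are reached via fit_regularized on the discrete models; a sketch of the SLSQP path (method="l1"), while the cvxopt path would use method="l1_cvxopt_cp" and requires cvxopt to be installed:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    X = sm.add_constant(rng.standard_normal((500, 3)))
    linpred = X @ np.array([0.3, 1.0, 0.0, -1.0])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-linpred)))

    # method="l1" routes through fit_l1_slsqp; larger alpha zeroes more coefficients
    res = sm.Logit(y, X).fit_regularized(method="l1", alpha=1.0, disp=0)
    print(res.params)
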
diff --git a/statsmodels/base/l1_solvers_common.py b/statsmodels/base/l1_solvers_common.py
index d7c52c500..ecdd68eff 100644
--- a/statsmodels/base/l1_solvers_common.py
+++ b/statsmodels/base/l1_solvers_common.py
@@ -1,7 +1,9 @@
 """
 Holds common functions for l1 solvers.
 """
+
 import numpy as np
+
 from statsmodels.tools.sm_exceptions import ConvergenceWarning


@@ -38,11 +40,58 @@ def qc_results(params, alpha, score, qc_tol, qc_verbose=False):
     ------
     Warning message if QC check fails.
     """
-    pass
+    ## Check for fatal errors
+    assert not np.isnan(params).max()
+    assert (params == params.ravel('F')).min(), \
+        "params should have already been 1-d"
+
+    ## Start the theory compliance check
+    fprime = score(params)
+    k_params = len(params)
+
+    passed_array = np.array([True] * k_params)
+    for i in range(k_params):
+        if alpha[i] > 0:
+            # If |fprime| is too big, then something went wrong
+            if (abs(fprime[i]) - alpha[i]) / alpha[i] > qc_tol:
+                passed_array[i] = False
+    qc_dict = dict(
+        fprime=fprime, alpha=alpha, params=params, passed_array=passed_array)
+    passed = passed_array.min()
+    if not passed:
+        num_failed = (~passed_array).sum()
+        message = 'QC check did not pass for %d out of %d parameters' % (
+            num_failed, k_params)
+        message += '\nTry increasing solver accuracy or number of iterations'\
+            ', decreasing alpha, or switch solvers'
+        if qc_verbose:
+            message += _get_verbose_addon(qc_dict)
+
+        import warnings
+        warnings.warn(message, ConvergenceWarning)
+
+    return passed
+
+
+def _get_verbose_addon(qc_dict):
+    alpha = qc_dict['alpha']
+    params = qc_dict['params']
+    fprime = qc_dict['fprime']
+    passed_array = qc_dict['passed_array']
+
+    addon = '\n------ verbose QC printout -----------------'
+    addon += '\n------ Recall the problem was rescaled by 1 / nobs ---'
+    addon += '\n|%-10s|%-10s|%-10s|%-10s|' % (
+        'passed', 'alpha', 'fprime', 'param')
+    addon += '\n--------------------------------------------'
+    for i in range(len(alpha)):
+        addon += '\n|%-10s|%-10.3e|%-10.3e|%-10.3e|' % (
+                passed_array[i], alpha[i], fprime[i], params[i])
+    return addon


 def do_trim_params(params, k_params, alpha, score, passed, trim_mode,
-    size_trim_tol, auto_trim_tol):
+        size_trim_tol, auto_trim_tol):
     """
     Trims (set to zero) params that are zero at the theoretical minimum.
     Uses heuristics to account for the solver not actually finding the minimum.
@@ -83,4 +132,32 @@ def do_trim_params(params, k_params, alpha, score, passed, trim_mode,
     trimmed : ndarray of booleans
         trimmed[i] == True if the ith parameter was trimmed.
     """
-    pass
+    ## Trim the small params
+    trimmed = [False] * k_params
+
+    if trim_mode == 'off':
+        trimmed = np.array([False] * k_params)
+    elif trim_mode == 'auto' and not passed:
+        import warnings
+        msg = "Could not trim params automatically due to failed QC check. " \
+              "Trimming using trim_mode == 'size' will still work."
+        warnings.warn(msg, ConvergenceWarning)
+        trimmed = np.array([False] * k_params)
+    elif trim_mode == 'auto' and passed:
+        fprime = score(params)
+        for i in range(k_params):
+            if alpha[i] != 0:
+                if (alpha[i] - abs(fprime[i])) / alpha[i] > auto_trim_tol:
+                    params[i] = 0.0
+                    trimmed[i] = True
+    elif trim_mode == 'size':
+        for i in range(k_params):
+            if alpha[i] != 0:
+                if abs(params[i]) < size_trim_tol:
+                    params[i] = 0.0
+                    trimmed[i] = True
+    else:
+        raise ValueError(
+            "trim_mode == %s, which is not recognized" % (trim_mode))
+
+    return params, np.asarray(trimmed)
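
The QC and trimming helpers above are controlled by the trim_mode, size_trim_tol and auto_trim_tol arguments of fit_regularized; a sketch of the size-based rule with synthetic data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    X = sm.add_constant(rng.standard_normal((300, 4)))
    y = rng.binomial(1, 0.5, size=300)

    # with trim_mode="size", any |coefficient| below size_trim_tol is set to zero
    res = sm.Logit(y, X).fit_regularized(method="l1", alpha=0.5, disp=0,
                                         trim_mode="size", size_trim_tol=1e-3)
    print(res.params)
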
diff --git a/statsmodels/base/model.py b/statsmodels/base/model.py
index 5cd0d18a3..578a7f8a8 100644
--- a/statsmodels/base/model.py
+++ b/statsmodels/base/model.py
@@ -1,22 +1,39 @@
 from __future__ import annotations
+
 from statsmodels.compat.python import lzip
+
 from functools import reduce
 import warnings
+
 import numpy as np
 import pandas as pd
 from scipy import stats
+
 from statsmodels.base.data import handle_data
 from statsmodels.base.optimizer import Optimizer
 import statsmodels.base.wrapper as wrap
 from statsmodels.formula import handle_formula_data
-from statsmodels.stats.contrast import ContrastResults, WaldTestResults, t_test_pairwise
+from statsmodels.stats.contrast import (
+    ContrastResults,
+    WaldTestResults,
+    t_test_pairwise,
+)
 from statsmodels.tools.data import _is_using_pandas
-from statsmodels.tools.decorators import cache_readonly, cached_data, cached_value
+from statsmodels.tools.decorators import (
+    cache_readonly,
+    cached_data,
+    cached_value,
+)
 from statsmodels.tools.numdiff import approx_fprime
-from statsmodels.tools.sm_exceptions import HessianInversionWarning, ValueWarning
+from statsmodels.tools.sm_exceptions import (
+    HessianInversionWarning,
+    ValueWarning,
+)
 from statsmodels.tools.tools import nan_dot, recipr
 from statsmodels.tools.validation import bool_like
+
 ERROR_INIT_KWARGS = False
+
 _model_params_doc = """Parameters
     ----------
     endog : array_like
@@ -26,7 +43,9 @@ _model_params_doc = """Parameters
         is the number of regressors. An intercept is not included by default
         and should be added by the user. See
         :func:`statsmodels.tools.add_constant`."""
-_missing_param_doc = """missing : str
+
+_missing_param_doc = """\
+missing : str
         Available options are 'none', 'drop', and 'raise'. If 'none', no nan
         checking is done. If 'drop', any observations with nans are dropped.
         If 'raise', an error is raised. Default is 'none'."""
@@ -42,8 +61,7 @@ _extra_param_doc = """


 class Model:
-    __doc__ = (
-        """
+    __doc__ = """
     A (predictive) statistical model. Intended to be subclassed not used.

     %(params_doc)s
@@ -59,24 +77,32 @@ class Model:
     `endog` and `exog` are references to any data provided.  So if the data is
     already stored in numpy arrays and it is changed then `endog` and `exog`
     will change as well.
-    """
-         % {'params_doc': _model_params_doc, 'extra_params_doc': 
-        _missing_param_doc + _extra_param_doc})
+    """ % {'params_doc': _model_params_doc,
+           'extra_params_doc': _missing_param_doc + _extra_param_doc}
+
+    # Maximum number of endogenous variables when using a formula
+    # Default is 1, which is more common. Override in models when needed
+    # Set to None to skip check
     _formula_max_endog = 1
-    _kwargs_allowed = ['missing', 'missing_idx', 'formula', 'design_info',
-        'hasconst']
+    # kwargs that are generically allowed, maybe not supported in all models
+    _kwargs_allowed = [
+        "missing", 'missing_idx', 'formula', 'design_info', "hasconst",
+        ]

     def __init__(self, endog, exog=None, **kwargs):
         missing = kwargs.pop('missing', 'none')
         hasconst = kwargs.pop('hasconst', None)
-        self.data = self._handle_data(endog, exog, missing, hasconst, **kwargs)
+        self.data = self._handle_data(endog, exog, missing, hasconst,
+                                      **kwargs)
         self.k_constant = self.data.k_constant
         self.exog = self.data.exog
         self.endog = self.data.endog
         self._data_attr = []
         self._data_attr.extend(['exog', 'endog', 'data.exog', 'data.endog'])
-        if 'formula' not in kwargs:
+        if 'formula' not in kwargs:  # will not be able to unpickle without these
             self._data_attr.extend(['data.orig_endog', 'data.orig_exog'])
+        # store keys for extras if we need to recreate model instance
+        # we do not need 'missing', maybe we need 'hasconst'
         self._init_keys = list(kwargs.keys())
         if hasconst is not None:
             self._init_keys.append('hasconst')
@@ -84,11 +110,43 @@ class Model:
     def _get_init_kwds(self):
         """return dictionary with extra keys used in model.__init__
         """
-        pass
+        kwds = dict(((key, getattr(self, key, None))
+                     for key in self._init_keys))
+
+        return kwds
+
+    def _check_kwargs(self, kwargs, keys_extra=None, error=ERROR_INIT_KWARGS):
+
+        kwargs_allowed = [
+            "missing", 'missing_idx', 'formula', 'design_info', "hasconst",
+            ]
+        if keys_extra:
+            kwargs_allowed.extend(keys_extra)
+
+        kwargs_invalid = [i for i in kwargs if i not in kwargs_allowed]
+        if kwargs_invalid:
+            msg = "unknown kwargs " + repr(kwargs_invalid)
+            if error is False:
+                warnings.warn(msg, ValueWarning)
+            else:
+                raise ValueError(msg)
+
+    def _handle_data(self, endog, exog, missing, hasconst, **kwargs):
+        data = handle_data(endog, exog, missing, hasconst, **kwargs)
+        # kwargs arrays could have changed, easier to just attach here
+        for key in kwargs:
+            if key in ['design_info', 'formula']:  # leave attached to data
+                continue
+            # pop so we do not start keeping all these twice or references
+            try:
+                setattr(self, key, data.__dict__.pop(key))
+            except KeyError:  # panel already pops keys in data handling
+                pass
+        return data

     @classmethod
-    def from_formula(cls, formula, data, subset=None, drop_cols=None, *args,
-        **kwargs):
+    def from_formula(cls, formula, data, subset=None, drop_cols=None,
+                     *args, **kwargs):
         """
         Create a Model from a formula and dataframe.

@@ -126,27 +184,73 @@ class Model:
         args and kwargs are passed on to the model instantiation. E.g.,
         a numpy structured or rec array, a dictionary, or a pandas DataFrame.
         """
-        pass
+        # TODO: provide a docs template for args/kwargs from child models
+        # TODO: subset could use syntax. issue #469.
+        if subset is not None:
+            data = data.loc[subset]
+        eval_env = kwargs.pop('eval_env', None)
+        if eval_env is None:
+            eval_env = 2
+        elif eval_env == -1:
+            from patsy import EvalEnvironment
+            eval_env = EvalEnvironment({})
+        elif isinstance(eval_env, int):
+            eval_env += 1  # we're going down the stack again
+        missing = kwargs.get('missing', 'drop')
+        if missing == 'none':  # with patsy it's drop or raise. let's raise.
+            missing = 'raise'
+
+        tmp = handle_formula_data(data, None, formula, depth=eval_env,
+                                  missing=missing)
+        ((endog, exog), missing_idx, design_info) = tmp
+        max_endog = cls._formula_max_endog
+        if (max_endog is not None and
+                endog.ndim > 1 and endog.shape[1] > max_endog):
+            raise ValueError('endog has evaluated to an array with multiple '
+                             'columns that has shape {0}. This occurs when '
+                             'the variable converted to endog is non-numeric'
+                             ' (e.g., bool or str).'.format(endog.shape))
+        if drop_cols is not None and len(drop_cols) > 0:
+            cols = [x for x in exog.columns if x not in drop_cols]
+            if len(cols) < len(exog.columns):
+                exog = exog[cols]
+                cols = list(design_info.term_names)
+                for col in drop_cols:
+                    try:
+                        cols.remove(col)
+                    except ValueError:
+                        pass  # OK if not present
+                design_info = design_info.subset(cols)
+
+        kwargs.update({'missing_idx': missing_idx,
+                       'missing': missing,
+                       'formula': formula,  # attach formula for unpickling
+                       'design_info': design_info})
+        mod = cls(endog, exog, *args, **kwargs)
+        mod.formula = formula
+        # since we got a dataframe, attach the original
+        mod.data.frame = data
+        return mod

     @property
     def endog_names(self):
         """
         Names of endogenous variables.
         """
-        pass
+        return self.data.ynames

     @property
-    def exog_names(self) ->(list[str] | None):
+    def exog_names(self) -> list[str] | None:
         """
         Names of exogenous variables.
         """
-        pass
+        return self.data.xnames

     def fit(self):
         """
         Fit a model to data.
         """
-        pass
+        raise NotImplementedError

     def predict(self, params, exog=None, *args, **kwargs):
         """
@@ -154,7 +258,7 @@ class Model:

         This is a placeholder intended to be overwritten by individual models.
         """
-        pass
+        raise NotImplementedError


 class LikelihoodModel(Model):
@@ -176,6 +280,9 @@ class LikelihoodModel(Model):
         """
         pass

+    # TODO: if the intent is to re-initialize the model with new data then this
+    # method needs to take inputs...
+
     def loglike(self, params):
         """
         Log-likelihood of model.
@@ -189,7 +296,7 @@ class LikelihoodModel(Model):
         -----
         Must be overridden by subclasses.
         """
-        pass
+        raise NotImplementedError

     def score(self, params):
         """
@@ -207,7 +314,7 @@ class LikelihoodModel(Model):
         ndarray
             The score vector evaluated at the parameters.
         """
-        pass
+        raise NotImplementedError

     def information(self, params):
         """
@@ -220,7 +327,7 @@ class LikelihoodModel(Model):
         params : ndarray
             The model parameters.
         """
-        pass
+        raise NotImplementedError

     def hessian(self, params):
         """
@@ -236,11 +343,11 @@ class LikelihoodModel(Model):
         ndarray
             The hessian evaluated at the parameters.
         """
-        pass
+        raise NotImplementedError

     def fit(self, start_params=None, method='newton', maxiter=100,
-        full_output=True, disp=True, fargs=(), callback=None, retall=False,
-        skip_hessian=False, **kwargs):
+            full_output=True, disp=True, fargs=(), callback=None, retall=False,
+            skip_hessian=False, **kwargs):
         """
         Fit method for likelihood based models

@@ -404,10 +511,108 @@ class LikelihoodModel(Model):
                     documentation of `scipy.optimize.minimize`.
                     If no method is specified, then BFGS is used.
         """
-        pass
+        Hinv = None  # JP error if full_output=0, Hinv not defined
+
+        if start_params is None:
+            if hasattr(self, 'start_params'):
+                start_params = self.start_params
+            elif self.exog is not None:
+                # fails for shape (K,)?
+                start_params = [0.0] * self.exog.shape[1]
+            else:
+                raise ValueError("If exog is None, then start_params should "
+                                 "be specified")
+
+        # TODO: separate args from non-arg-taking score and hessian, i.e.,
+        # user-supplied and numerically evaluated estimates; fprime does not
+        # take args in most (any?) of the optimize functions
+
+        nobs = self.endog.shape[0]
+        # f = lambda params, *args: -self.loglike(params, *args) / nobs
+
+        def f(params, *args):
+            return -self.loglike(params, *args) / nobs
+
+        if method == 'newton':
+            # TODO: why are score and hess positive?
+            def score(params, *args):
+                return self.score(params, *args) / nobs
+
+            def hess(params, *args):
+                return self.hessian(params, *args) / nobs
+        else:
+            def score(params, *args):
+                return -self.score(params, *args) / nobs
+
+            def hess(params, *args):
+                return -self.hessian(params, *args) / nobs
+
+        warn_convergence = kwargs.pop('warn_convergence', True)
+
+        # Remove covariance args before calling fit to allow strict checking
+        if 'cov_type' in kwargs:
+            cov_kwds = kwargs.get('cov_kwds', {})
+            kwds = {'cov_type': kwargs['cov_type'], 'cov_kwds': cov_kwds}
+            if cov_kwds:
+                del kwargs["cov_kwds"]
+            del kwargs["cov_type"]
+        else:
+            kwds = {}
+        if 'use_t' in kwargs:
+            kwds['use_t'] = kwargs['use_t']
+            del kwargs["use_t"]
+
+        optimizer = Optimizer()
+        xopt, retvals, optim_settings = optimizer._fit(f, score, start_params,
+                                                       fargs, kwargs,
+                                                       hessian=hess,
+                                                       method=method,
+                                                       disp=disp,
+                                                       maxiter=maxiter,
+                                                       callback=callback,
+                                                       retall=retall,
+                                                       full_output=full_output)
+        # Restore cov_type, cov_kwds and use_t
+        optim_settings.update(kwds)
+        # NOTE: this is for fit_regularized and should be generalized
+        cov_params_func = kwargs.setdefault('cov_params_func', None)
+        if cov_params_func:
+            Hinv = cov_params_func(self, xopt, retvals)
+        elif method == 'newton' and full_output:
+            Hinv = np.linalg.inv(-retvals['Hessian']) / nobs
+        elif not skip_hessian:
+            H = -1 * self.hessian(xopt)
+            invertible = False
+            if np.all(np.isfinite(H)):
+                eigvals, eigvecs = np.linalg.eigh(H)
+                if np.min(eigvals) > 0:
+                    invertible = True
+
+            if invertible:
+                Hinv = eigvecs.dot(np.diag(1.0 / eigvals)).dot(eigvecs.T)
+                Hinv = np.asfortranarray((Hinv + Hinv.T) / 2.0)
+            else:
+                warnings.warn('Inverting hessian failed, no bse or cov_params '
+                              'available', HessianInversionWarning)
+                Hinv = None
+
+        # TODO: add Hessian approximation and change the above if needed
+        mlefit = LikelihoodModelResults(self, xopt, Hinv, scale=1., **kwds)
+
+        # TODO: hardcode scale?
+        mlefit.mle_retvals = retvals
+        if isinstance(retvals, dict):
+            if warn_convergence and not retvals['converged']:
+                from statsmodels.tools.sm_exceptions import ConvergenceWarning
+                warnings.warn("Maximum Likelihood optimization failed to "
+                              "converge. Check mle_retvals",
+                              ConvergenceWarning)
+
+        mlefit.mle_settings = optim_settings
+        return mlefit

     def _fit_zeros(self, keep_index=None, start_params=None,
-        return_auxiliary=False, k_params=None, **fit_kwds):
+                   return_auxiliary=False, k_params=None, **fit_kwds):
         """experimental, fit the model subject to zero constraints

         Intended for internal use cases until we know what we need.
@@ -440,7 +645,112 @@ class LikelihoodModel(Model):
         -------
         results : Results instance
         """
-        pass
+        # we need to append index of extra params to keep_index as in
+        # NegativeBinomial
+        if hasattr(self, 'k_extra') and self.k_extra > 0:
+            # we cannot change the original, TODO: should we add keep_index_params?
+            keep_index = np.array(keep_index, copy=True)
+            k = self.exog.shape[1]
+            extra_index = np.arange(k, k + self.k_extra)
+            keep_index_p = np.concatenate((keep_index, extra_index))
+        else:
+            keep_index_p = keep_index
+
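+        # overall approach: refit using only the kept exog columns, then embed
+        # the constrained params and covariance back into full-size (k_params)
+        # arrays with zeros in the dropped positions
+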
+        # not all models support start_params; when given, pass the kept
+        # subset via fit_kwds
+        if start_params is not None:
+            fit_kwds['start_params'] = start_params[keep_index_p]
+            k_params = len(start_params)
+            # ignore k_params in this case, or verify consistency?
+
+        # build auxiliary model and fit
+        init_kwds = self._get_init_kwds()
+        mod_constr = self.__class__(self.endog, self.exog[:, keep_index],
+                                    **init_kwds)
+        res_constr = mod_constr.fit(**fit_kwds)
+        # switch name, only need keep_index for params below
+        keep_index = keep_index_p
+
+        if k_params is None:
+            k_params = self.exog.shape[1]
+            k_params += getattr(self, 'k_extra', 0)
+
+        params_full = np.zeros(k_params)
+        params_full[keep_index] = res_constr.params
+
+        # create dummy results instance, TODO: wire up properly
+        # TODO: this could be moved into separate private method if needed
+        # discrete L1 fit_regularized doesn't reestimate AFAICS
+        # RLM does not have method, disp nor warn_convergence keywords
+        # OLS, WLS swallow extra kwds with **kwargs, but do not have method='nm'
+        try:
+            # Note: adding full_output=False causes exceptions
+            res = self.fit(maxiter=0, disp=0, method='nm', skip_hessian=True,
+                           warn_convergence=False, start_params=params_full)
+            # we get a wrapper back
+        except (TypeError, ValueError):
+            res = self.fit()
+
+        # Warning: make sure we are not just changing the wrapper instead of
+        # results #2400
+        # TODO: do we need to change res._results.scale in some models?
+        if hasattr(res_constr.model, 'scale'):
+            # Note: res.model is self
+            # GLM problem, see #2399,
+            # TODO: remove from model if not needed anymore
+            res.model.scale = res._results.scale = res_constr.model.scale
+
+        if hasattr(res_constr, 'mle_retvals'):
+            res._results.mle_retvals = res_constr.mle_retvals
+            # not available for non-scipy optimization, e.g. GLM IRLS
+            # TODO: what retvals should be required?
+            # res.mle_retvals['fcall'] = res_constr.mle_retvals.get('fcall', np.nan)
+            # res.mle_retvals['iterations'] = res_constr.mle_retvals.get(
+            #                                                 'iterations', np.nan)
+            # res.mle_retvals['converged'] = res_constr.mle_retvals['converged']
+        # overwrite all mle_settings
+        if hasattr(res_constr, 'mle_settings'):
+            res._results.mle_settings = res_constr.mle_settings
+
+        res._results.params = params_full
+        if (not hasattr(res._results, 'normalized_cov_params') or
+                res._results.normalized_cov_params is None):
+            res._results.normalized_cov_params = np.zeros((k_params, k_params))
+        else:
+            res._results.normalized_cov_params[...] = 0
+
+        # fancy indexing requires integer array
+        keep_index = np.array(keep_index)
+        res._results.normalized_cov_params[keep_index[:, None], keep_index] = \
+            res_constr.normalized_cov_params
+        k_constr = res_constr.df_resid - res._results.df_resid
+        if hasattr(res_constr, 'cov_params_default'):
+            res._results.cov_params_default = np.zeros((k_params, k_params))
+            res._results.cov_params_default[keep_index[:, None], keep_index] = \
+                res_constr.cov_params_default
+        if hasattr(res_constr, 'cov_type'):
+            res._results.cov_type = res_constr.cov_type
+            res._results.cov_kwds = res_constr.cov_kwds
+
+        res._results.keep_index = keep_index
+        res._results.df_resid = res_constr.df_resid
+        res._results.df_model = res_constr.df_model
+
+        res._results.k_constr = k_constr
+        res._results.results_constrained = res_constr
+
+        # special temporary workaround for RLM
+        # need to be able to override robust covariances
+        if hasattr(res.model, 'M'):
+            del res._results._cache['resid']
+            del res._results._cache['fittedvalues']
+            del res._results._cache['sresid']
+            cov = res._results._cache['bcov_scaled']
+            # inplace adjustment
+            cov[...] = 0
+            cov[keep_index[:, None], keep_index] = res_constr.bcov_scaled
+            res._results.cov_params_default = cov
+
+        return res

     def _fit_collinear(self, atol=1e-14, rtol=1e-13, **kwds):
         """experimental, fit of the model without collinear variables
@@ -450,9 +760,19 @@ class LikelihoodModel(Model):
         Options will be added in future, when the supporting functions
         to identify collinear variables become available.
         """
-        pass
+
+        # ------ copied from PR #2380 remove when merged
+        x = self.exog
+        tol = atol + rtol * x.var(0)
+        r = np.linalg.qr(x, mode='r')
+        mask = np.abs(r.diagonal()) < np.sqrt(tol)
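+        # a (near-)zero diagonal entry of R flags a column of exog that is
+        # (numerically) a linear combination of the preceding columns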
+        # TODO add to results instance
+        # idx_collinear = np.where(mask)[0]
+        idx_keep = np.where(~mask)[0]
+        return self._fit_zeros(keep_index=idx_keep, **kwds)


+# TODO: the below is unfinished
 class GenericLikelihoodModel(LikelihoodModel):
     """
     Allows the fitting of any likelihood function via maximum likelihood.
@@ -499,31 +819,78 @@ class GenericLikelihoodModel(LikelihoodModel):
     import numpy as np
     np.allclose(res.params, probit_res.params)
     """
-
-    def __init__(self, endog, exog=None, loglike=None, score=None, hessian=
-        None, missing='none', extra_params_names=None, **kwds):
+    def __init__(self, endog, exog=None, loglike=None, score=None,
+                 hessian=None, missing='none', extra_params_names=None,
+                 **kwds):
+        # let them be none in case user wants to use inheritance
         if loglike is not None:
             self.loglike = loglike
         if score is not None:
             self.score = score
         if hessian is not None:
             self.hessian = hessian
-        hasconst = kwds.pop('hasconst', None)
+
+        hasconst = kwds.pop("hasconst", None)
         self.__dict__.update(kwds)
-        super(GenericLikelihoodModel, self).__init__(endog, exog, missing=
-            missing, hasconst=hasconst, **kwds)
+
+        # TODO: data structures?
+
+        # TODO temporary solution, force approx normal
+        # self.df_model = 9999
+        # somewhere: CacheWriteWarning: 'df_model' cannot be overwritten
+        super(GenericLikelihoodModel, self).__init__(endog, exog,
+                                                     missing=missing,
+                                                     hasconst=hasconst,
+                                                     **kwds
+                                                     )
+
+        # this will not work for ru2nmnl, maybe np.ndim of a dict?
         if exog is not None:
-            self.nparams = exog.shape[1] if np.ndim(exog) == 2 else 1
+            self.nparams = (exog.shape[1] if np.ndim(exog) == 2 else 1)
+
         if extra_params_names is not None:
             self._set_extra_params_names(extra_params_names)

+    def _set_extra_params_names(self, extra_params_names):
+        # check param_names
+        if extra_params_names is not None:
+            if self.exog is not None:
+                self.exog_names.extend(extra_params_names)
+            else:
+                self.data.xnames = extra_params_names
+
+            self.k_extra = len(extra_params_names)
+            if hasattr(self, "df_resid"):
+                self.df_resid -= self.k_extra
+
+        self.nparams = len(self.exog_names)
+
+    # this is redundant and not used when subclassing
     def initialize(self):
         """
         Initialize (possibly re-initialize) a Model instance. For
         instance, the design matrix of a linear model may change
         and some things must be recomputed.
         """
-        pass
+        if not self.score:  # right now score is not optional
+            self.score = lambda x: approx_fprime(x, self.loglike)
+            if not self.hessian:
+                pass
+        else:   # can use approx_hess_p if we have a gradient
+            if not self.hessian:
+                pass
+        # Initialize is called by
+        # statsmodels.model.LikelihoodModel.__init__
+        # and should contain any preprocessing that needs to be done for a model
+        if self.exog is not None:
+            # assume constant
+            er = np.linalg.matrix_rank(self.exog)
+            self.df_model = float(er - 1)
+            self.df_resid = float(self.exog.shape[0] - er)
+        else:
+            self.df_model = np.nan
+            self.df_resid = np.nan
+        super(GenericLikelihoodModel, self).initialize()

     def expandparams(self, params):
         """
@@ -551,19 +918,21 @@ class GenericLikelihoodModel(LikelihoodModel):
         this could also be replaced by a more general parameter
         transformation.
         """
-        pass
+        paramsfull = self.fixed_params.copy()
+        paramsfull[self.fixed_paramsmask] = params
+        return paramsfull

     def reduceparams(self, params):
         """Reduce parameters"""
-        pass
+        return params[self.fixed_paramsmask]

     def loglike(self, params):
         """Log-likelihood of model at params"""
-        pass
+        return self.loglikeobs(params).sum(0)

     def nloglike(self, params):
         """Negative log-likelihood of model at params"""
-        pass
+        return -self.loglikeobs(params).sum(0)

     def loglikeobs(self, params):
         """
@@ -579,26 +948,33 @@ class GenericLikelihoodModel(LikelihoodModel):
         loglike : array_like
             The log likelihood of the model evaluated at `params`.
         """
-        pass
+        return -self.nloglikeobs(params)

     def score(self, params):
         """
         Gradient of log-likelihood evaluated at params
         """
-        pass
+        kwds = {}
+        kwds.setdefault('centered', True)
+        return approx_fprime(params, self.loglike, **kwds).ravel()

     def score_obs(self, params, **kwds):
         """
         Jacobian/Gradient of log-likelihood evaluated at params for each
         observation.
         """
-        pass
+        # kwds.setdefault('epsilon', 1e-4)
+        kwds.setdefault('centered', True)
+        return approx_fprime(params, self.loglikeobs, **kwds)

     def hessian(self, params):
         """
         Hessian of log-likelihood evaluated at params
         """
-        pass
+        from statsmodels.tools.numdiff import approx_hess
+
+        # need options for hess (epsilon)
+        return approx_hess(params, self.loglike)

     def hessian_factor(self, params, scale=None, observed=True):
         """Weights for calculating Hessian
@@ -621,7 +997,44 @@ class GenericLikelihoodModel(LikelihoodModel):
             A 1d weight vector used in the calculation of the Hessian.
             The hessian is obtained by `(exog.T * hessian_factor).dot(exog)`
         """
-        pass
+
+        raise NotImplementedError
+
+    def fit(self, start_params=None, method='nm', maxiter=500, full_output=1,
+            disp=1, callback=None, retall=0, **kwargs):
+
+        if start_params is None:
+            if hasattr(self, 'start_params'):
+                start_params = self.start_params
+            else:
+                start_params = 0.1 * np.ones(self.nparams)
+
+        if "cov_type" not in kwargs:
+            # this will add default cov_type name and description
+            kwargs["cov_type"] = 'nonrobust'
+
+        fit_method = super(GenericLikelihoodModel, self).fit
+        mlefit = fit_method(start_params=start_params,
+                            method=method, maxiter=maxiter,
+                            full_output=full_output,
+                            disp=disp, callback=callback, **kwargs)
+
+        results_class = getattr(self, 'results_class',
+                                GenericLikelihoodModelResults)
+        genericmlefit = results_class(self, mlefit)
+
+        # amend param names
+        exog_names = [] if (self.exog_names is None) else self.exog_names
+        k_miss = len(exog_names) - len(mlefit.params)
+        if not k_miss == 0:
+            if k_miss < 0:
+                self._set_extra_params_names(['par%d' % i
+                                              for i in range(-k_miss)])
+            else:
+                # I do not want to raise after we have already fit()
+                warnings.warn('more exog_names than parameters', ValueWarning)
+
+        return genericmlefit
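+
+    # Illustrative usage sketch (hypothetical subclass, not part of the API):
+    # supplying only ``nloglikeobs`` is enough, since ``score`` and ``hessian``
+    # fall back to the numerical approximations defined above, e.g.
+    #
+    #     class MyPoisson(GenericLikelihoodModel):
+    #         def nloglikeobs(self, params):
+    #             mu = np.exp(self.exog @ params)
+    #             # negative loglike per observation, up to an additive constant
+    #             return mu - self.endog * np.log(mu)
+    #
+    #     res = MyPoisson(y, X).fit()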


 class Results:
@@ -635,11 +1048,11 @@ class Results:
     params : ndarray
         parameter estimates from the fit model
     """
-
     def __init__(self, model, params, **kwd):
         self.__dict__.update(kwd)
         self.initialize(model, params, **kwd)
         self._data_attr = []
+        # Variables to clear from cache
         self._data_in_cache = ['fittedvalues', 'resid', 'wresid']

     def initialize(self, model, params, **kwargs):
@@ -655,7 +1068,62 @@ class Results:
         **kwargs
             Any additional keyword arguments required to initialize the model.
         """
-        pass
+        self.params = params
+        self.model = model
+        if hasattr(model, 'k_constant'):
+            self.k_constant = model.k_constant
+
+    def _transform_predict_exog(self, exog, transform=True):
+
+        is_pandas = _is_using_pandas(exog, None)
+        exog_index = None
+        if is_pandas:
+            if exog.ndim == 2 or self.params.size == 1:
+                exog_index = exog.index
+            else:
+                exog_index = [exog.index.name]
+
+        if transform and hasattr(self.model, 'formula') and (exog is not None):
+            # allow both location of design_info, see #7043
+            design_info = (getattr(self.model, "design_info", None) or
+                           self.model.data.design_info)
+            from patsy import dmatrix
+            if isinstance(exog, pd.Series):
+                # we are guessing whether it should be column or row
+                if (hasattr(exog, 'name') and isinstance(exog.name, str) and
+                        exog.name in design_info.describe()):
+                    # assume we need one column
+                    exog = pd.DataFrame(exog)
+                else:
+                    # assume we need a row
+                    exog = pd.DataFrame(exog).T
+                exog_index = exog.index
+            orig_exog_len = len(exog)
+            is_dict = isinstance(exog, dict)
+            try:
+                exog = dmatrix(design_info, exog, return_type="dataframe")
+            except Exception as exc:
+                msg = ('predict requires that you use a DataFrame when '
+                       'predicting from a model\nthat was created using the '
+                       'formula api.'
+                       '\n\nThe original error message returned by patsy is:\n'
+                       '{0}'.format(str(exc)))
+                raise exc.__class__(msg)
+            if orig_exog_len > len(exog) and not is_dict:
+                if exog_index is None:
+                    warnings.warn('nan values have been dropped', ValueWarning)
+                else:
+                    exog = exog.reindex(exog_index)
+            exog_index = exog.index
+
+        if exog is not None:
+            exog = np.asarray(exog)
+            if exog.ndim == 1 and (self.model.exog.ndim == 1 or
+                                   self.model.exog.shape[1] == 1):
+                exog = exog[:, None]
+            exog = np.atleast_2d(exog)  # needed in count models (uses shape[1])
+
+        return exog, exog_index

     def predict(self, exog=None, transform=True, *args, **kwargs):
         """
@@ -702,7 +1170,20 @@ class Results:
         Row indices as in pandas data frames are supported, and added to the
         returned prediction.
         """
-        pass
+        exog, exog_index = self._transform_predict_exog(exog,
+                                                        transform=transform)
+
+        predict_results = self.model.predict(self.params, exog, *args,
+                                             **kwargs)
+
+        if exog_index is not None and not hasattr(predict_results,
+                                                  'predicted_values'):
+            if predict_results.ndim == 1:
+                return pd.Series(predict_results, index=exog_index)
+            else:
+                return pd.DataFrame(predict_results, index=exog_index)
+        else:
+            return predict_results

     def summary(self):
         """
@@ -710,9 +1191,10 @@ class Results:

         Not implemented
         """
-        pass
+        raise NotImplementedError


+# TODO: public method?
 class LikelihoodModelResults(Results):
     """
     Class to contain results from likelihood models
@@ -868,65 +1350,111 @@ class LikelihoodModelResults(Results):
                 Results at each iteration.
         """

-    def __init__(self, model, params, normalized_cov_params=None, scale=1.0,
-        **kwargs):
+    # by default we use normal distribution
+    # can be overwritten by instances or subclasses
+
+    def __init__(self, model, params, normalized_cov_params=None, scale=1.,
+                 **kwargs):
         super(LikelihoodModelResults, self).__init__(model, params)
         self.normalized_cov_params = normalized_cov_params
         self.scale = scale
         self._use_t = False
+        # robust covariance
+        # We put cov_type in kwargs so subclasses can decide in fit whether to
+        # use this generic implementation
         if 'use_t' in kwargs:
             use_t = kwargs['use_t']
             self.use_t = use_t if use_t is not None else False
         if 'cov_type' in kwargs:
             cov_type = kwargs.get('cov_type', 'nonrobust')
             cov_kwds = kwargs.get('cov_kwds', {})
+
             if cov_type == 'nonrobust':
                 self.cov_type = 'nonrobust'
-                self.cov_kwds = {'description': 
-                    'Standard Errors assume that the ' +
-                    'covariance matrix of the errors is correctly ' +
-                    'specified.'}
+                self.cov_kwds = {'description': 'Standard Errors assume that the ' +
+                                 'covariance matrix of the errors is correctly ' +
+                                 'specified.'}
             else:
                 from statsmodels.base.covtype import get_robustcov_results
                 if cov_kwds is None:
                     cov_kwds = {}
                 use_t = self.use_t
-                get_robustcov_results(self, cov_type=cov_type, use_self=
-                    True, use_t=use_t, **cov_kwds)
+                # TODO: we should not need use_t in get_robustcov_results
+                get_robustcov_results(self, cov_type=cov_type, use_self=True,
+                                      use_t=use_t, **cov_kwds)

     def normalized_cov_params(self):
         """See specific model class docstring"""
-        pass
-
+        raise NotImplementedError
+
+    def _get_robustcov_results(self, cov_type='nonrobust', use_self=True,
+                               use_t=None, **cov_kwds):
+        if use_self is False:
+            raise ValueError("use_self should have been removed long ago.  "
+                             "See GH#4401")
+        from statsmodels.base.covtype import get_robustcov_results
+        if cov_kwds is None:
+            cov_kwds = {}
+
+        if cov_type == 'nonrobust':
+            self.cov_type = 'nonrobust'
+            self.cov_kwds = {'description': 'Standard Errors assume that the ' +
+                             'covariance matrix of the errors is correctly ' +
+                             'specified.'}
+        else:
+            # TODO: we should not need use_t in get_robustcov_results
+            get_robustcov_results(self, cov_type=cov_type, use_self=True,
+                                  use_t=use_t, **cov_kwds)
+
     @property
     def use_t(self):
         """Flag indicating to use the Student's distribution in inference."""
-        pass
+        return self._use_t
+
+    @use_t.setter
+    def use_t(self, value):
+        self._use_t = bool(value)

     @cached_value
     def llf(self):
         """Log-likelihood of model"""
-        pass
+        return self.model.loglike(self.params)

     @cached_value
     def bse(self):
         """The standard errors of the parameter estimates."""
-        pass
+        # Issue 3299
+        if ((not hasattr(self, 'cov_params_default')) and
+                (self.normalized_cov_params is None)):
+            bse_ = np.empty(len(self.params))
+            bse_[:] = np.nan
+        else:
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore", RuntimeWarning)
+                bse_ = np.sqrt(np.diag(self.cov_params()))
+        return bse_

     @cached_value
     def tvalues(self):
         """
         Return the t-statistic for a given parameter estimate.
         """
-        pass
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", RuntimeWarning)
+            return self.params / self.bse

     @cached_value
     def pvalues(self):
         """The two-tailed p values for the t-stats of the params."""
-        pass
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", RuntimeWarning)
+            if self.use_t:
+                df_resid = getattr(self, 'df_resid_inference', self.df_resid)
+                return stats.t.sf(np.abs(self.tvalues), df_resid) * 2
+            else:
+                return stats.norm.sf(np.abs(self.tvalues)) * 2

     def cov_params(self, r_matrix=None, column=None, scale=None, cov_p=None,
-        other=None):
+                   other=None):
         """
         Compute the variance/covariance matrix.

@@ -976,8 +1504,50 @@ class LikelihoodModelResults(Results):

         ``(scale) * (X.T X)^(-1)[column][:,column]`` if column is 1d
         """
-        pass
+        if (hasattr(self, 'mle_settings') and
+                self.mle_settings['optimizer'] in ['l1', 'l1_cvxopt_cp']):
+            dot_fun = nan_dot
+        else:
+            dot_fun = np.dot
+
+        if (cov_p is None and self.normalized_cov_params is None and
+                not hasattr(self, 'cov_params_default')):
+            raise ValueError('need covariance of parameters for computing '
+                             '(unnormalized) covariances')
+        if column is not None and (r_matrix is not None or other is not None):
+            raise ValueError('Column should be specified without other '
+                             'arguments.')
+        if other is not None and r_matrix is None:
+            raise ValueError('other can only be specified with r_matrix')
+
+        if cov_p is None:
+            if hasattr(self, 'cov_params_default'):
+                cov_p = self.cov_params_default
+            else:
+                if scale is None:
+                    scale = self.scale
+                cov_p = self.normalized_cov_params * scale
+
+        if column is not None:
+            column = np.asarray(column)
+            if column.shape == ():
+                return cov_p[column, column]
+            else:
+                return cov_p[column[:, None], column]
+        elif r_matrix is not None:
+            r_matrix = np.asarray(r_matrix)
+            if r_matrix.shape == ():
+                raise ValueError("r_matrix should be 1d or 2d")
+            if other is None:
+                other = r_matrix
+            else:
+                other = np.asarray(other)
+            tmp = dot_fun(r_matrix, dot_fun(cov_p, np.transpose(other)))
+            return tmp
+        else:  # if r_matrix is None and column is None:
+            return cov_p

+    # TODO: make sure this works as needed for GLMs
     def t_test(self, r_matrix, cov_p=None, use_t=None):
         """
         Compute a t-test for each linear hypothesis of the form Rb = q.
@@ -1067,7 +1637,58 @@ class LikelihoodModelResults(Results):
         c2             1.0001      0.249      0.000      1.000       0.437       1.563
         ==============================================================================
         """
-        pass
+        from patsy import DesignInfo
+        use_t = bool_like(use_t, "use_t", strict=True, optional=True)
+        if self.params.ndim == 2:
+            names = ['y{}_{}'.format(i[0], i[1])
+                     for i in self.model.data.cov_names]
+        else:
+            names = self.model.data.cov_names
+        LC = DesignInfo(names).linear_constraint(r_matrix)
+        r_matrix, q_matrix = LC.coefs, LC.constants
+        num_ttests = r_matrix.shape[0]
+        num_params = r_matrix.shape[1]
+
+        if (cov_p is None and self.normalized_cov_params is None and
+                not hasattr(self, 'cov_params_default')):
+            raise ValueError('Need covariance of parameters for computing '
+                             'T statistics')
+        params = self.params.ravel(order="F")
+        if num_params != params.shape[0]:
+            raise ValueError('r_matrix and params are not aligned')
+        if q_matrix is None:
+            q_matrix = np.zeros(num_ttests)
+        else:
+            q_matrix = np.asarray(q_matrix)
+            q_matrix = q_matrix.squeeze()
+        if q_matrix.size > 1:
+            if q_matrix.shape[0] != num_ttests:
+                raise ValueError("r_matrix and q_matrix must have the same "
+                                 "number of rows")
+
+        if use_t is None:
+            # default to the instance's use_t setting if not specified
+            use_t = (hasattr(self, 'use_t') and self.use_t)
+
+        _effect = np.dot(r_matrix, params)
+
+        # Perform the test
+        if num_ttests > 1:
+            _sd = np.sqrt(np.diag(self.cov_params(
+                r_matrix=r_matrix, cov_p=cov_p)))
+        else:
+            _sd = np.sqrt(self.cov_params(r_matrix=r_matrix, cov_p=cov_p))
+        _t = (_effect - q_matrix) * recipr(_sd)
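+        # the statistic is (Rb - q) / se for each constraint; it is referred
+        # to the t or the standard normal distribution below depending on use_t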
+
+        df_resid = getattr(self, 'df_resid_inference', self.df_resid)
+
+        if use_t:
+            return ContrastResults(effect=_effect, t=_t, sd=_sd,
+                                   df_denom=df_resid)
+        else:
+            return ContrastResults(effect=_effect, statistic=_t, sd=_sd,
+                                   df_denom=df_resid,
+                                   distribution='norm')

     def f_test(self, r_matrix, cov_p=None, invcov=None):
         """
@@ -1162,10 +1783,12 @@ class LikelihoodModelResults(Results):
         >>> print(f_test)
         <F test: F=array([[ 144.17976065]]), p=6.322026217355609e-08, df_denom=9, df_num=3>
         """
-        pass
+        res = self.wald_test(r_matrix, cov_p=cov_p, invcov=invcov, use_f=True, scalar=True)
+        return res

-    def wald_test(self, r_matrix, cov_p=None, invcov=None, use_f=None,
-        df_constraints=None, scalar=None):
+    # TODO: untested for GLMs?
+    def wald_test(self, r_matrix, cov_p=None, invcov=None,
+                  use_f=None, df_constraints=None, scalar=None):
         """
         Compute a Wald-test for a joint linear hypothesis.

@@ -1227,10 +1850,87 @@ class LikelihoodModelResults(Results):
         design matrix of the model. There can be problems in non-OLS models
         where the rank of the covariance of the noise is not full.
         """
-        pass
+        use_f = bool_like(use_f, "use_f", strict=True, optional=True)
+        scalar = bool_like(scalar, "scalar", strict=True, optional=True)
+        if use_f is None:
+            # default use_f to the instance's use_t setting if not specified
+            use_f = (hasattr(self, 'use_t') and self.use_t)
+
+        from patsy import DesignInfo
+        if self.params.ndim == 2:
+            names = ['y{}_{}'.format(i[0], i[1])
+                     for i in self.model.data.cov_names]
+        else:
+            names = self.model.data.cov_names
+        params = self.params.ravel(order="F")
+        LC = DesignInfo(names).linear_constraint(r_matrix)
+        r_matrix, q_matrix = LC.coefs, LC.constants
+
+        if (self.normalized_cov_params is None and cov_p is None and
+                invcov is None and not hasattr(self, 'cov_params_default')):
+            raise ValueError('need covariance of parameters for computing '
+                             'F statistics')
+
+        cparams = np.dot(r_matrix, params[:, None])
+        J = float(r_matrix.shape[0])  # number of restrictions
+
+        if q_matrix is None:
+            q_matrix = np.zeros(J)
+        else:
+            q_matrix = np.asarray(q_matrix)
+        if q_matrix.ndim == 1:
+            q_matrix = q_matrix[:, None]
+            if q_matrix.shape[0] != J:
+                raise ValueError("r_matrix and q_matrix must have the same "
+                                 "number of rows")
+        Rbq = cparams - q_matrix
+        if invcov is None:
+            cov_p = self.cov_params(r_matrix=r_matrix, cov_p=cov_p)
+            if np.isnan(cov_p).max():
+                raise ValueError("r_matrix performs f_test for using "
+                                 "dimensions that are asymptotically "
+                                 "non-normal")
+            invcov = np.linalg.pinv(cov_p)
+            J_ = np.linalg.matrix_rank(cov_p)
+            if J_ < J:
+                warnings.warn('covariance of constraints does not have full '
+                              'rank. The number of constraints is %d, but '
+                              'rank is %d' % (J, J_), ValueWarning)
+                J = J_
+
+        # TODO streamline computation, we do not need to compute J if given
+        if df_constraints is not None:
+            # let caller override J by df_constraint
+            J = df_constraints
+
+        if (hasattr(self, 'mle_settings') and
+                self.mle_settings['optimizer'] in ['l1', 'l1_cvxopt_cp']):
+            F = nan_dot(nan_dot(Rbq.T, invcov), Rbq)
+        else:
+            F = np.dot(np.dot(Rbq.T, invcov), Rbq)
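+        # Wald statistic: (Rb - q)' [R cov(b) R']^{-1} (Rb - q), divided by J
+        # below when the F form is requested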
+
+        df_resid = getattr(self, 'df_resid_inference', self.df_resid)
+        if scalar is None:
+            warnings.warn(
+                "The behavior of wald_test will change after 0.14 to returning "
+                "scalar test statistic values. To get the future behavior now, "
+                "set scalar to True. To silence this message while retaining "
+                "the legacy behavior, set scalar to False.",
+                FutureWarning
+            )
+            scalar = False
+        if scalar and F.size == 1:
+            F = float(np.squeeze(F))
+        if use_f:
+            F /= J
+            return ContrastResults(F=F, df_denom=df_resid,
+                                   df_num=J)  # invcov.shape[0]
+        else:
+            return ContrastResults(chi2=F, df_denom=J, statistic=F,
+                                   distribution='chi2', distargs=(J,))

     def wald_test_terms(self, skip_single=False, extra_constraints=None,
-        combine_terms=None, scalar=None):
+                        combine_terms=None, scalar=None):
         """
         Compute a sequence of Wald tests for terms over multiple columns.

@@ -1277,8 +1977,10 @@ class LikelihoodModelResults(Results):
         C(Weight, Sum)                    12.432445  3.99943118767e-05              2        51
         C(Duration, Sum):C(Weight, Sum)    0.176002      0.83912310946              2        51

-        >>> res_poi = Poisson.from_formula("Days ~ C(Weight) * C(Duration)",                                            data).fit(cov_type='HC0')
-        >>> wt = res_poi.wald_test_terms(skip_single=False,                                          combine_terms=['Duration', 'Weight'])
+        >>> res_poi = Poisson.from_formula("Days ~ C(Weight) * C(Duration)", \
+                                           data).fit(cov_type='HC0')
+        >>> wt = res_poi.wald_test_terms(skip_single=False, \
+                                         combine_terms=['Duration', 'Weight'])
         >>> print(wt)
                                     chi2             P>chi2  df constraint
         Intercept              15.695625  7.43960374424e-05              1
@@ -1288,10 +1990,89 @@ class LikelihoodModelResults(Results):
         Duration               11.187849     0.010752286833              3
         Weight                 30.263368  4.32586407145e-06              4
         """
-        pass
+        # lazy import
+        from collections import defaultdict
+
+        result = self
+        if extra_constraints is None:
+            extra_constraints = []
+        if combine_terms is None:
+            combine_terms = []
+        design_info = getattr(result.model.data, 'design_info', None)
+
+        if design_info is None and extra_constraints is None:
+            raise ValueError('no constraints, nothing to do')
+
+        identity = np.eye(len(result.params))
+        constraints = []
+        combined = defaultdict(list)
+        if design_info is not None:
+            for term in design_info.terms:
+                cols = design_info.slice(term)
+                name = term.name()
+                constraint_matrix = identity[cols]
+
+                # check if in combined
+                for cname in combine_terms:
+                    if cname in name:
+                        combined[cname].append(constraint_matrix)
+
+                k_constraint = constraint_matrix.shape[0]
+                if skip_single:
+                    if k_constraint == 1:
+                        continue
+
+                constraints.append((name, constraint_matrix))
+
+            combined_constraints = []
+            for cname in combine_terms:
+                combined_constraints.append((cname, np.vstack(combined[cname])))
+        else:
+            # check by exog/params names if there is no formula info
+            for col, name in enumerate(result.model.exog_names):
+                constraint_matrix = np.atleast_2d(identity[col])
+
+                # check if in combined
+                for cname in combine_terms:
+                    if cname in name:
+                        combined[cname].append(constraint_matrix)
+
+                if skip_single:
+                    continue
+
+                constraints.append((name, constraint_matrix))
+
+            combined_constraints = []
+            for cname in combine_terms:
+                combined_constraints.append((cname, np.vstack(combined[cname])))
+
+        use_t = result.use_t
+        distribution = ['chi2', 'F'][use_t]
+
+        res_wald = []
+        index = []
+        for name, constraint in constraints + combined_constraints + extra_constraints:
+            wt = result.wald_test(constraint, scalar=scalar)
+            row = [wt.statistic, wt.pvalue, constraint.shape[0]]
+            if use_t:
+                row.append(wt.df_denom)
+            res_wald.append(row)
+            index.append(name)
+
+        # distribution-neutral names
+        col_names = ['statistic', 'pvalue', 'df_constraint']
+        if use_t:
+            col_names.append('df_denom')
+        # TODO: maybe move DataFrame creation to results class
+        from pandas import DataFrame
+        table = DataFrame(res_wald, index=index, columns=col_names)
+        res = WaldTestResults(None, distribution, None, table=table)
+        # TODO: remove temp again, added for testing
+        res.temp = constraints + combined_constraints + extra_constraints
+        return res

     def t_test_pairwise(self, term_name, method='hs', alpha=0.05,
-        factor_labels=None):
+                        factor_labels=None):
         """
         Perform pairwise t_test with multiple testing corrected p-values.

@@ -1347,7 +2128,9 @@ class LikelihoodModelResults(Results):
         3-1         1.763307   0.000002      True
         3-2         1.130992   0.010212      True
         """
-        pass
+        res = t_test_pairwise(self, term_name, method=method, alpha=alpha,
+                              factor_labels=factor_labels)
+        return res

     def _get_wald_nonlinear(self, func, deriv=None):
         """Experimental method for nonlinear prediction and tests
@@ -1369,9 +2152,14 @@ class LikelihoodModelResults(Results):
             calculate the results for the prediction or tests

         """
-        pass
+        from statsmodels.stats._delta_method import NonlinearDeltaCov
+        func_args = None  # TODO: not yet implemented, maybe skip - use partial
+        nl = NonlinearDeltaCov(func, self.params, self.cov_params(),
+                               deriv=deriv, func_args=func_args)
+
+        return nl

-    def conf_int(self, alpha=0.05, cols=None):
+    def conf_int(self, alpha=.05, cols=None):
         """
         Construct confidence interval for the fitted parameters.

@@ -1423,7 +2211,32 @@ class LikelihoodModelResults(Results):
         array([[-0.1115811 ,  0.03994274],
                [-3.12506664, -0.91539297]])
         """
-        pass
+        bse = self.bse
+
+        if self.use_t:
+            dist = stats.t
+            df_resid = getattr(self, 'df_resid_inference', self.df_resid)
+            q = dist.ppf(1 - alpha / 2, df_resid)
+        else:
+            dist = stats.norm
+            q = dist.ppf(1 - alpha / 2)
+
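+        # two-sided (1 - alpha) confidence interval: params +/- q * bse, with
+        # q from the t or standard normal distribution depending on use_t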
+        params = self.params
+        lower = params - q * bse
+        upper = params + q * bse
+        if cols is not None:
+            warnings.warn(
+                "cols is deprecated and will be removed after 0.14 is "
+                "released. cols only works when inputs are NumPy arrays and "
+                "will fail when using pandas Series or DataFrames as input. "
+                "Subsets of confidence intervals can be selected using slices "
+                "of the full confidence interval array.",
+                FutureWarning
+            )
+            cols = np.asarray(cols)
+            lower = lower[cols]
+            upper = upper[cols]
+        return np.asarray(lzip(lower, upper))

     def save(self, fname, remove_data=False):
         """
@@ -1444,7 +2257,13 @@ class LikelihoodModelResults(Results):
         If remove_data is true and the model result does not implement a
         remove_data method then this will raise an exception.
         """
-        pass
+
+        from statsmodels.iolib.smpickle import save_pickle
+
+        if remove_data:
+            self.remove_data()
+
+        save_pickle(self, fname)

     @classmethod
     def load(cls, fname):
@@ -1467,7 +2286,9 @@ class LikelihoodModelResults(Results):
         Results
             The unpickled results instance.
         """
-        pass
+
+        from statsmodels.iolib.smpickle import load_pickle
+        return load_pickle(fname)

     def remove_data(self):
         """
@@ -1502,18 +2323,68 @@ class LikelihoodModelResults(Results):
         result._data_attr_model : arrays attached to the model
             instance but not to the results instance
         """
-        pass
+        cls = self.__class__
+        # Note: we cannot just use `getattr(cls, x)` or `getattr(self, x)`
+        # because of redirection involved with property-like accessors
+        cls_attrs = {}
+        for name in dir(cls):
+            try:
+                attr = object.__getattribute__(cls, name)
+            except AttributeError:
+                pass
+            else:
+                cls_attrs[name] = attr
+        data_attrs = [x for x in cls_attrs
+                      if isinstance(cls_attrs[x], cached_data)]
+        for name in data_attrs:
+            self._cache[name] = None
+
+        def wipe(obj, att):
+            # get to last element in attribute path
+            p = att.split('.')
+            att_ = p.pop(-1)
+            try:
+                obj_ = reduce(getattr, [obj] + p)
+                if hasattr(obj_, att_):
+                    setattr(obj_, att_, None)
+            except AttributeError:
+                pass
+
+        model_only = ['model.' + i for i in getattr(self, "_data_attr_model", [])]
+        model_attr = ['model.' + i for i in self.model._data_attr]
+        for att in self._data_attr + model_attr + model_only:
+            if att in data_attrs:
+                # these have been handled above, and trying to call wipe
+                # would raise an Exception anyway, so skip these
+                continue
+            wipe(self, att)
+
+        for key in self._data_in_cache:
+            try:
+                self._cache[key] = None
+            except (AttributeError, KeyError):
+                pass


 class LikelihoodResultsWrapper(wrap.ResultsWrapper):
-    _attrs = {'params': 'columns', 'bse': 'columns', 'pvalues': 'columns',
-        'tvalues': 'columns', 'resid': 'rows', 'fittedvalues': 'rows',
-        'normalized_cov_params': 'cov'}
-    _wrap_attrs = _attrs
-    _wrap_methods = {'cov_params': 'cov', 'conf_int': 'columns'}
+    _attrs = {
+        'params': 'columns',
+        'bse': 'columns',
+        'pvalues': 'columns',
+        'tvalues': 'columns',
+        'resid': 'rows',
+        'fittedvalues': 'rows',
+        'normalized_cov_params': 'cov',
+    }

+    _wrap_attrs = _attrs
+    _wrap_methods = {
+        'cov_params': 'cov',
+        'conf_int': 'columns'
+    }

-wrap.populate_wrapper(LikelihoodResultsWrapper, LikelihoodModelResults)
+wrap.populate_wrapper(LikelihoodResultsWrapper,  # noqa:E305
+                      LikelihoodModelResults)


 class ResultMixin:
@@ -1521,29 +2392,42 @@ class ResultMixin:
     @cache_readonly
     def df_modelwc(self):
         """Model WC"""
-        pass
+        # collect different ways of defining the number of parameters, used for
+        # aic, bic
+        k_extra = getattr(self.model, "k_extra", 0)
+        if hasattr(self, 'df_model'):
+            if hasattr(self, 'k_constant'):
+                hasconst = self.k_constant
+            elif hasattr(self, 'hasconst'):
+                hasconst = self.hasconst
+            else:
+                # default assumption
+                hasconst = 1
+            return self.df_model + hasconst + k_extra
+        else:
+            return self.params.size

     @cache_readonly
     def aic(self):
         """Akaike information criterion"""
-        pass
+        return -2 * self.llf + 2 * (self.df_modelwc)

     @cache_readonly
     def bic(self):
         """Bayesian information criterion"""
-        pass
+        return -2 * self.llf + np.log(self.nobs) * (self.df_modelwc)

     @cache_readonly
     def score_obsv(self):
         """cached Jacobian of log-likelihood
         """
-        pass
+        return self.model.score_obs(self.params)

     @cache_readonly
     def hessv(self):
         """cached Hessian of log-likelihood
         """
-        pass
+        return self.model.hessian(self.params)

     @cache_readonly
     def covjac(self):
@@ -1551,7 +2435,12 @@ class ResultMixin:
         covariance of parameters based on outer product of jacobian of
         log-likelihood
         """
-        pass
+        #  if not hasattr(self, '_results'):
+        #      raise ValueError('need to call fit first')
+        #      #self.fit()
+        #  self.jacv = jacv = self.jac(self._results.params)
+        jacv = self.score_obsv
+        return np.linalg.inv(np.dot(jacv.T, jacv))

     @cache_readonly
     def covjhj(self):
@@ -1561,19 +2450,23 @@ class ResultMixin:

         name should be covhjh
         """
-        pass
+        jacv = self.score_obsv
+        hessv = self.hessv
+        hessinv = np.linalg.inv(hessv)
+        #  self.hessinv = hessin = self.cov_params()
+        return np.dot(hessinv, np.dot(np.dot(jacv.T, jacv), hessinv))

     @cache_readonly
     def bsejhj(self):
         """standard deviation of parameter estimates based on covHJH
         """
-        pass
+        return np.sqrt(np.diag(self.covjhj))

     @cache_readonly
     def bsejac(self):
         """standard deviation of parameter estimates based on covjac
         """
-        pass
+        return np.sqrt(np.diag(self.covjac))

     def bootstrap(self, nrep=100, method='nm', disp=0, store=1):
         """simple bootstrap to get mean and variance of estimator
@@ -1610,7 +2503,30 @@ class ResultMixin:
         This will be moved to apply only to models with independently
         distributed observations.
         """
-        pass
+        results = []
+        hascloneattr = hasattr(self.model, 'cloneattr')
+        for i in range(nrep):
+            rvsind = np.random.randint(self.nobs, size=self.nobs)
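+            # case resampling: nobs observation indices drawn with replacement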
+            # this needs to set startparam and get other defining attributes
+            # need a clone method on model
+            if self.exog is not None:
+                exog_resamp = self.exog[rvsind, :]
+            else:
+                exog_resamp = None
+            # build auxiliary model and fit
+            init_kwds = self.model._get_init_kwds()
+            fitmod = self.model.__class__(self.endog[rvsind],
+                                          exog=exog_resamp, **init_kwds)
+            if hascloneattr:
+                for attr in self.model.cloneattr:
+                    setattr(fitmod, attr, getattr(self.model, attr))
+
+            fitres = fitmod.fit(method=method, disp=disp)
+            results.append(fitres.params)
+        results = np.array(results)
+        if store:
+            self.bootstrap_results = results
+        return results.mean(0), results.std(0), results

     def get_nlfun(self, fun):
         """
@@ -1618,25 +2534,35 @@ class ResultMixin:

         This is not Implemented
         """
-        pass
+        # I think this is supposed to get the delta method that is currently
+        # in miscmodels count (as part of Poisson example)
+        raise NotImplementedError


-class _LLRMixin:
+class _LLRMixin():
     """Mixin class for Null model and likelihood ratio
     """
+    # methods copied from DiscreteResults, adjusted pseudo R2

-    def pseudo_rsquared(self, kind='mcf'):
+    def pseudo_rsquared(self, kind="mcf"):
         """
         McFadden's pseudo-R-squared. `1 - (llf / llnull)`
         """
-        pass
+        kind = kind.lower()
+        if kind.startswith("mcf"):
+            prsq = 1 - self.llf / self.llnull
+        elif kind.startswith("cox") or kind in ["cs", "lr"]:
+            prsq = 1 - np.exp((self.llnull - self.llf) * (2 / self.nobs))
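+            # equivalently 1 - exp(-LR / nobs) with LR = 2 * (llf - llnull)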
+        else:
+            raise ValueError("only McFadden and Cox-Snell are available")
+        return prsq

     @cache_readonly
     def llr(self):
         """
         Likelihood ratio chi-squared statistic; `-2*(llnull - llf)`
         """
-        pass
+        return -2*(self.llnull - self.llf)

     @cache_readonly
     def llr_pvalue(self):
@@ -1645,7 +2571,13 @@ class _LLRMixin:
         statistic greater than llr.  llr has a chi-squared distribution
         with degrees of freedom `df_model`.
         """
-        pass
+        # see also RegressionModel compare_lr_test
+        llr = self.llr
+        df_full = self.df_resid
+        df_restr = self.df_resid_null
+        lrdf = (df_restr - df_full)
+        self.df_lr_null = lrdf
+        return stats.distributions.chi2.sf(llr, lrdf)

     def set_null_options(self, llnull=None, attach_results=True, **kwargs):
         """
@@ -1673,14 +2605,68 @@ class _LLRMixin:
         -----
         Modifies attributes of this instance, and so has no return.
         """
-        pass
+        # reset cache; note that we need to add here anything that depends on
+        # llnull or the null model. If something is missing, then the
+        # attribute might be incorrect.
+        self._cache.pop('llnull', None)
+        self._cache.pop('llr', None)
+        self._cache.pop('llr_pvalue', None)
+        self._cache.pop('prsquared', None)
+        if hasattr(self, 'res_null'):
+            del self.res_null
+
+        if llnull is not None:
+            self._cache['llnull'] = llnull
+        self._attach_nullmodel = attach_results
+        self._optim_kwds_null = kwargs

     @cache_readonly
     def llnull(self):
         """
         Value of the constant-only loglikelihood
         """
-        pass
+        model = self.model
+        kwds = model._get_init_kwds().copy()
+        for key in getattr(model, '_null_drop_keys', []):
+            del kwds[key]
+        # TODO: what parameters to pass to fit?
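+        # the null model re-uses the model class and init keywords with a
+        # constant-only design (an intercept column of ones)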
+        mod_null = model.__class__(model.endog, np.ones(self.nobs), **kwds)
+        # TODO: consider catching and warning on convergence failure?
+        # in the meantime, try hard to converge. see
+        # TestPoissonConstrained1a.test_smoke
+
+        optim_kwds = getattr(self, '_optim_kwds_null', {}).copy()
+
+        if 'start_params' in optim_kwds:
+            # user provided
+            sp_null = optim_kwds.pop('start_params')
+        elif hasattr(model, '_get_start_params_null'):
+            # get moment estimates if available
+            sp_null = model._get_start_params_null()
+        else:
+            sp_null = None
+
+        opt_kwds = dict(method='bfgs', warn_convergence=False, maxiter=10000,
+                        disp=0)
+        opt_kwds.update(optim_kwds)
+
+        if optim_kwds:
+            res_null = mod_null.fit(start_params=sp_null, **opt_kwds)
+        else:
+            # this should be a reasonably robust default across versions
+            res_null = mod_null.fit(start_params=sp_null, method='nm',
+                                    warn_convergence=False,
+                                    maxiter=10000, disp=0)
+            res_null = mod_null.fit(start_params=res_null.params, method='bfgs',
+                                    warn_convergence=False,
+                                    maxiter=10000, disp=0)
+
+        if getattr(self, '_attach_nullmodel', False) is not False:
+            self.res_null = res_null
+
+        self.k_null = len(res_null.params)
+        self.df_resid_null = res_null.df_resid
+        return res_null.llf


 class GenericLikelihoodModelResults(LikelihoodModelResults, ResultMixin):
@@ -1735,29 +2721,48 @@ class GenericLikelihoodModelResults(LikelihoodModelResults, ResultMixin):
         self.endog = model.endog
         self.exog = model.exog
         self.nobs = model.endog.shape[0]
-        k_extra = getattr(self.model, 'k_extra', 0)
+
+        # TODO: possibly move to model.fit()
+        #       and outsource together with patching names
+        k_extra = getattr(self.model, "k_extra", 0)
         if hasattr(model, 'df_model') and not np.isnan(model.df_model):
             self.df_model = model.df_model
         else:
             df_model = len(mlefit.params) - self.model.k_constant - k_extra
             self.df_model = df_model
+            # retrofitting the model, used in t_test TODO: check design
             self.model.df_model = df_model
+
         if hasattr(model, 'df_resid') and not np.isnan(model.df_resid):
             self.df_resid = model.df_resid
         else:
             self.df_resid = self.endog.shape[0] - self.df_model - k_extra
+            # retrofitting the model, used in t_test TODO: check design
             self.model.df_resid = self.df_resid
+
         self._cache = {}
         self.__dict__.update(mlefit.__dict__)
+
         k_params = len(mlefit.params)
+        # checks mainly for adding new models or subclassing
+
         if self.df_model + self.model.k_constant + k_extra != k_params:
-            warnings.warn(
-                'df_model + k_constant + k_extra differs from k_params')
-        if self.df_resid != self.nobs - k_params:
-            warnings.warn('df_resid differs from nobs - k_params')
+            warnings.warn("df_model + k_constant + k_extra "
+                          "differs from k_params")

-    def get_prediction(self, exog=None, which='mean', transform=True,
-        row_labels=None, average=False, agg_weights=None, **kwargs):
+        if self.df_resid != self.nobs - k_params:
+            warnings.warn("df_resid differs from nobs - k_params")
+
+    def get_prediction(
+            self,
+            exog=None,
+            which="mean",
+            transform=True,
+            row_labels=None,
+            average=False,
+            agg_weights=None,
+            **kwargs
+            ):
         """
         Compute prediction results when endpoint transformation is valid.

@@ -1805,9 +2810,23 @@ class GenericLikelihoodModelResults(LikelihoodModelResults, ResultMixin):
         -----
         Status: new in 0.14, experimental
         """
-        pass
+        from statsmodels.base._prediction_inference import get_prediction
+
+        pred_kwds = kwargs
+
+        res = get_prediction(
+            self,
+            exog=exog,
+            which=which,
+            transform=transform,
+            row_labels=row_labels,
+            average=average,
+            agg_weights=agg_weights,
+            pred_kwds=pred_kwds
+            )
+        return res

-    def summary(self, yname=None, xname=None, title=None, alpha=0.05):
+    def summary(self, yname=None, xname=None, title=None, alpha=.05):
         """Summarize the Regression Results

         Parameters
@@ -1833,4 +2852,31 @@ class GenericLikelihoodModelResults(LikelihoodModelResults, ResultMixin):
         --------
         statsmodels.iolib.summary.Summary : class to hold summary results
         """
-        pass
+
+        top_left = [('Dep. Variable:', None),
+                    ('Model:', None),
+                    ('Method:', ['Maximum Likelihood']),
+                    ('Date:', None),
+                    ('Time:', None),
+                    ('No. Observations:', None),
+                    ('Df Residuals:', None),
+                    ('Df Model:', None),
+                    ]
+
+        top_right = [('Log-Likelihood:', None),
+                     ('AIC:', ["%#8.4g" % self.aic]),
+                     ('BIC:', ["%#8.4g" % self.bic])
+                     ]
+
+        if title is None:
+            title = self.model.__class__.__name__ + ' ' + "Results"
+
+        # create summary table instance
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right,
+                             yname=yname, xname=xname, title=title)
+        smry.add_table_params(self, yname=yname, xname=xname, alpha=alpha,
+                              use_t=self.use_t)
+
+        return smry
diff --git a/statsmodels/base/optimizer.py b/statsmodels/base/optimizer.py
index a04f5f5aa..d9a3c19cd 100644
--- a/statsmodels/base/optimizer.py
+++ b/statsmodels/base/optimizer.py
@@ -3,17 +3,38 @@ Functions that are general enough to use for any model fitting. The idea is
 to untie these from LikelihoodModel so that they may be re-used generally.
 """
 from __future__ import annotations
+
 from typing import Any, Sequence
 import numpy as np
 from scipy import optimize
 from statsmodels.compat.scipy import SP_LT_15, SP_LT_17


-class Optimizer:
+def check_kwargs(kwargs: dict[str, Any], allowed: Sequence[str], method: str):
+    extra = set(list(kwargs.keys())).difference(list(allowed))
+    if extra:
+        import warnings
+
+        warnings.warn(
+            "Keyword arguments have been passed to the optimizer that have "
+            "no effect. The list of allowed keyword arguments for method "
+            f"{method} is: {', '.join(allowed)}. The list of unsupported "
+            f"keyword arguments passed include: {', '.join(extra)}. After "
+            "release 0.14, this will raise.",
+            FutureWarning
+        )
+
+
+def _check_method(method, methods):
+    if method not in methods:
+        message = "Unknown fit method %s" % method
+        raise ValueError(message)

+
+class Optimizer:
     def _fit(self, objective, gradient, start_params, fargs, kwargs,
-        hessian=None, method='newton', maxiter=100, full_output=True, disp=
-        True, callback=None, retall=False):
+             hessian=None, method='newton', maxiter=100, full_output=True,
+             disp=True, callback=None, retall=False):
         """
         Fit function for any model with an objective function.

@@ -190,7 +211,46 @@ class Optimizer:
                     documentation of `scipy.optimize.minimize`.
                     If no method is specified, then BFGS is used.
         """
-        pass
+        # TODO: generalize the regularization stuff
+        # Extract kwargs specific to fit_regularized calling fit
+        extra_fit_funcs = kwargs.get('extra_fit_funcs', dict())
+
+        methods = ['newton', 'nm', 'bfgs', 'lbfgs', 'powell', 'cg', 'ncg',
+                   'basinhopping', 'minimize']
+        methods += extra_fit_funcs.keys()
+        method = method.lower()
+        _check_method(method, methods)
+
+        fit_funcs = {
+            'newton': _fit_newton,
+            'nm': _fit_nm,  # Nelder-Mead
+            'bfgs': _fit_bfgs,
+            'lbfgs': _fit_lbfgs,
+            'cg': _fit_cg,
+            'ncg': _fit_ncg,
+            'powell': _fit_powell,
+            'basinhopping': _fit_basinhopping,
+            'minimize': _fit_minimize  # wrapper for scipy.optimize.minimize
+        }
+
+        # NOTE: fit_regularized checks the methods for these but it should be
+        #      moved up probably
+        if extra_fit_funcs:
+            fit_funcs.update(extra_fit_funcs)
+
+        func = fit_funcs[method]
+        xopt, retvals = func(objective, gradient, start_params, fargs, kwargs,
+                             disp=disp, maxiter=maxiter, callback=callback,
+                             retall=retall, full_output=full_output,
+                             hess=hessian)
+
+        optim_settings = {'optimizer': method, 'start_params': start_params,
+                          'maxiter': maxiter, 'full_output': full_output,
+                          'disp': disp, 'fargs': fargs, 'callback': callback,
+                          'retall': retall, "extra_fit_funcs": extra_fit_funcs}
+        optim_settings.update(kwargs)
+        # set as attributes or return?
+        return xopt, retvals, optim_settings

     def _fit_constrained(self, params):
         """
@@ -205,11 +265,23 @@ class Optimizer:
         model_instance.add_constraint("x1 + x2 = 2")
         result = model_instance.fit()
         """
-        pass
+        raise NotImplementedError
+
+    def _fit_regularized(self, params):
+        # TODO: code will not necessarily be general here. 3 options.
+        # 1) setup for scipy.optimize.fmin_slsqp
+        # 2) setup for cvxopt
+        # 3) setup for openopt
+        raise NotImplementedError
+
+
+########################################
+# Helper functions to fit


-def _fit_minimize(f, score, start_params, fargs, kwargs, disp=True, maxiter
-    =100, callback=None, retall=False, full_output=True, hess=None):
+def _fit_minimize(f, score, start_params, fargs, kwargs, disp=True,
+                  maxiter=100, callback=None, retall=False,
+                  full_output=True, hess=None):
     """
     Fit using scipy minimize, where kwarg `min_method` defines the algorithm.

@@ -254,12 +326,62 @@ def _fit_minimize(f, score, start_params, fargs, kwargs, disp=True, maxiter
         information returned from the solver used. If it is False, this is
         None.
     """
-    pass
+    kwargs.setdefault('min_method', 'BFGS')
+
+    # prepare options dict for minimize
+    filter_opts = ['extra_fit_funcs', 'niter', 'min_method', 'tol', 'bounds', 'constraints']
+    options = {k: v for k, v in kwargs.items() if k not in filter_opts}
+    options['disp'] = disp
+    options['maxiter'] = maxiter
+
+    # Use Hessian/Jacobian only if they're required by the method
+    no_hess = ['Nelder-Mead', 'Powell', 'CG', 'BFGS', 'COBYLA', 'SLSQP']
+    no_jac = ['Nelder-Mead', 'Powell', 'COBYLA']
+    if kwargs['min_method'] in no_hess:
+        hess = None
+    if kwargs['min_method'] in no_jac:
+        score = None
+
+    # Use bounds/constraints only if they're allowed by the method
+    has_bounds = ['L-BFGS-B', 'TNC', 'SLSQP', 'trust-constr']
+    # Added in SP 1.5
+    if not SP_LT_15:
+        has_bounds += ['Powell']
+    # Added in SP 1.7
+    if not SP_LT_17:
+        has_bounds += ['Nelder-Mead']
+    has_constraints = ['COBYLA', 'SLSQP', 'trust-constr']
+
+    if 'bounds' in kwargs.keys() and kwargs['min_method'] in has_bounds:
+        bounds = kwargs['bounds']
+    else:
+        bounds = None
+
+    if 'constraints' in kwargs.keys() and kwargs['min_method'] in has_constraints:
+        constraints = kwargs['constraints']
+    else:
+        constraints = ()

+    res = optimize.minimize(f, start_params, args=fargs, method=kwargs['min_method'],
+                            jac=score, hess=hess, bounds=bounds, constraints=constraints,
+                            callback=callback, options=options)

-def _fit_newton(f, score, start_params, fargs, kwargs, disp=True, maxiter=
-    100, callback=None, retall=False, full_output=True, hess=None,
-    ridge_factor=1e-10):
+    xopt = res.x
+    retvals = None
+    if full_output:
+        nit = getattr(res, 'nit', np.nan)  # scipy 0.14 compat
+        retvals = {'fopt': res.fun, 'iterations': nit,
+                   'fcalls': res.nfev, 'warnflag': res.status,
+                   'converged': res.success}
+        if retall:
+            retvals.update({'allvecs': res.values()})
+
+    return xopt, retvals
+
+
+def _fit_newton(f, score, start_params, fargs, kwargs, disp=True,
+                maxiter=100, callback=None, retall=False,
+                full_output=True, hess=None, ridge_factor=1e-10):
     """
     Fit using Newton-Raphson algorithm.

@@ -306,11 +428,64 @@ def _fit_newton(f, score, start_params, fargs, kwargs, disp=True, maxiter=
         information returned from the solver used. If it is False, this is
         None.
     """
-    pass
+    check_kwargs(kwargs, ("tol", "ridge_factor"), "newton")
+    tol = kwargs.setdefault('tol', 1e-8)
+    ridge_factor = kwargs.setdefault('ridge_factor', 1e-10)
+    iterations = 0
+    oldparams = np.inf
+    newparams = np.asarray(start_params)
+    if retall:
+        history = [oldparams, newparams]
+    while (iterations < maxiter and np.any(np.abs(newparams -
+                                                  oldparams) > tol)):
+        H = np.asarray(hess(newparams))
+        # regularize Hessian, not clear what ridge factor should be
+        # keyword option with absolute default 1e-10, see #1847
+        if not np.all(ridge_factor == 0):
+            H[np.diag_indices(H.shape[0])] += ridge_factor
+        oldparams = newparams
+        newparams = oldparams - np.linalg.solve(H, score(oldparams))
+        if retall:
+            history.append(newparams)
+        if callback is not None:
+            callback(newparams)
+        iterations += 1
+    fval = f(newparams, *fargs)  # the objective (negative log-likelihood) at the solution
+    if iterations == maxiter:
+        warnflag = 1
+        if disp:
+            print("Warning: Maximum number of iterations has been "
+                  "exceeded.")
+            print("         Current function value: %f" % fval)
+            print("         Iterations: %d" % iterations)
+    else:
+        warnflag = 0
+        if disp:
+            print("Optimization terminated successfully.")
+            print("         Current function value: %f" % fval)
+            print("         Iterations %d" % iterations)
+    if full_output:
+        (xopt, fopt, niter,
+         gopt, hopt) = (newparams, f(newparams, *fargs),
+                        iterations, score(newparams),
+                        hess(newparams))
+        converged = not warnflag
+        retvals = {'fopt': fopt, 'iterations': niter, 'score': gopt,
+                   'Hessian': hopt, 'warnflag': warnflag,
+                   'converged': converged}
+        if retall:
+            retvals.update({'allvecs': history})
+
+    else:
+        xopt = newparams
+        retvals = None
+
+    return xopt, retvals
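
Self-contained sketch (hedged; toy data, illustrative names) of the ridge-regularized
Newton update performed above, for a one-parameter Poisson rate with log link:

import numpy as np

y = np.array([3.0, 1.0, 4.0, 2.0, 2.0])
score = lambda p: np.array([len(y) * np.exp(p[0]) - y.sum()])    # d(-loglike)/dp
hess = lambda p: np.array([[len(y) * np.exp(p[0])]])             # d2(-loglike)/dp2

params, ridge, tol = np.array([0.0]), 1e-10, 1e-8
for _ in range(100):
    H = hess(params)
    H[np.diag_indices(H.shape[0])] += ridge      # same regularization as above
    step = np.linalg.solve(H, score(params))
    params = params - step
    if np.all(np.abs(step) <= tol):
        break
print(params[0], np.log(y.mean()))               # both approximately log(2.4)
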


-def _fit_bfgs(f, score, start_params, fargs, kwargs, disp=True, maxiter=100,
-    callback=None, retall=False, full_output=True, hess=None):
+def _fit_bfgs(f, score, start_params, fargs, kwargs, disp=True,
+              maxiter=100, callback=None, retall=False,
+              full_output=True, hess=None):
     """
     Fit using Broyden-Fletcher-Goldfarb-Shannon algorithm.

@@ -355,11 +530,35 @@ def _fit_bfgs(f, score, start_params, fargs, kwargs, disp=True, maxiter=100,
         information returned from the solver used. If it is False, this is
         None.
     """
-    pass
+    check_kwargs(kwargs, ("gtol", "norm", "epsilon"), "bfgs")
+    gtol = kwargs.setdefault('gtol', 1.0000000000000001e-05)
+    norm = kwargs.setdefault('norm', np.Inf)
+    epsilon = kwargs.setdefault('epsilon', 1.4901161193847656e-08)
+    retvals = optimize.fmin_bfgs(f, start_params, score, args=fargs,
+                                 gtol=gtol, norm=norm, epsilon=epsilon,
+                                 maxiter=maxiter, full_output=full_output,
+                                 disp=disp, retall=retall, callback=callback)
+    if full_output:
+        if not retall:
+            xopt, fopt, gopt, Hinv, fcalls, gcalls, warnflag = retvals
+        else:
+            (xopt, fopt, gopt, Hinv, fcalls,
+             gcalls, warnflag, allvecs) = retvals
+        converged = not warnflag
+        retvals = {'fopt': fopt, 'gopt': gopt, 'Hinv': Hinv,
+                   'fcalls': fcalls, 'gcalls': gcalls, 'warnflag':
+                       warnflag, 'converged': converged}
+        if retall:
+            retvals.update({'allvecs': allvecs})
+    else:
+        xopt = retvals
+        retvals = None

+    return xopt, retvals

-def _fit_lbfgs(f, score, start_params, fargs, kwargs, disp=True, maxiter=
-    100, callback=None, retall=False, full_output=True, hess=None):
+
+def _fit_lbfgs(f, score, start_params, fargs, kwargs, disp=True, maxiter=100,
+               callback=None, retall=False, full_output=True, hess=None):
     """
     Fit using Limited-memory Broyden-Fletcher-Goldfarb-Shannon algorithm.

@@ -410,11 +609,82 @@ def _fit_lbfgs(f, score, start_params, fargs, kwargs, disp=True, maxiter=
     its gradient with respect to the parameters do not have notationally
     consistent sign.
     """
-    pass
+    check_kwargs(
+        kwargs,
+        ("m", "pgtol", "factr", "maxfun", "epsilon", "approx_grad", "bounds", "loglike_and_score", "iprint"),
+        "lbfgs"
+    )
+    # Use unconstrained optimization by default.
+    bounds = kwargs.setdefault('bounds', [(None, None)] * len(start_params))
+    kwargs.setdefault('iprint', 0)
+
+    # Pass the following keyword argument names through to fmin_l_bfgs_b
+    # if they are present in kwargs, otherwise use the fmin_l_bfgs_b
+    # default values.
+    names = ('m', 'pgtol', 'factr', 'maxfun', 'epsilon', 'approx_grad')
+    extra_kwargs = dict((x, kwargs[x]) for x in names if x in kwargs)
+
+    # Extract values for the options related to the gradient.
+    approx_grad = kwargs.get('approx_grad', False)
+    loglike_and_score = kwargs.get('loglike_and_score', None)
+    epsilon = kwargs.get('epsilon', None)
+
+    # If approx_grad is requested, any user-supplied score function is ignored.
+    if approx_grad:
+        score = None
+
+    # Choose among three options for dealing with the gradient (the gradient
+    # of a log likelihood function with respect to its parameters
+    # is more specifically called the score in statistics terminology).
+    # The first option is to use the finite-differences
+    # approximation that is built into the fmin_l_bfgs_b optimizer.
+    # The second option is to use the provided score function.
+    # The third option is to use the score component of a provided
+    # function that simultaneously evaluates the log likelihood and score.
+    if epsilon and not approx_grad:
+        raise ValueError('a finite-differences epsilon was provided '
+                         'even though we are not using approx_grad')
+    if approx_grad and loglike_and_score:
+        raise ValueError('gradient approximation was requested '
+                         'even though an analytic loglike_and_score function '
+                         'was given')
+    if loglike_and_score:
+        func = lambda p, *a: tuple(-x for x in loglike_and_score(p, *a))
+    elif score:
+        func = f
+        extra_kwargs['fprime'] = score
+    elif approx_grad:
+        func = f

+    retvals = optimize.fmin_l_bfgs_b(func, start_params, maxiter=maxiter,
+                                     callback=callback, args=fargs,
+                                     bounds=bounds, disp=disp,
+                                     **extra_kwargs)

-def _fit_nm(f, score, start_params, fargs, kwargs, disp=True, maxiter=100,
-    callback=None, retall=False, full_output=True, hess=None):
+    if full_output:
+        xopt, fopt, d = retvals
+        # The warnflag is
+        # 0 if converged
+        # 1 if too many function evaluations or too many iterations
+        # 2 if stopped for another reason, given in d['task']
+        warnflag = d['warnflag']
+        converged = (warnflag == 0)
+        gopt = d['grad']
+        fcalls = d['funcalls']
+        iterations = d['nit']
+        retvals = {'fopt': fopt, 'gopt': gopt, 'fcalls': fcalls,
+                   'warnflag': warnflag, 'converged': converged,
+                   'iterations': iterations}
+    else:
+        xopt = retvals[0]
+        retvals = None
+
+    return xopt, retvals
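
The three gradient modes handled above can be reproduced directly with
scipy.optimize.fmin_l_bfgs_b on a toy quadratic (hedged, illustrative sketch):

import numpy as np
from scipy import optimize

target = np.array([1.0, -2.0])
f = lambda p: float(np.sum((p - target) ** 2))
g = lambda p: 2.0 * (p - target)
x0 = np.zeros(2)

xa, fa, _ = optimize.fmin_l_bfgs_b(f, x0, approx_grad=True)      # finite differences
xb, fb, _ = optimize.fmin_l_bfgs_b(f, x0, fprime=g)              # separate score
xc, fc, _ = optimize.fmin_l_bfgs_b(lambda p: (f(p), g(p)), x0)   # value-and-gradient
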
+
+
+def _fit_nm(f, score, start_params, fargs, kwargs, disp=True,
+            maxiter=100, callback=None, retall=False,
+            full_output=True, hess=None):
     """
     Fit using Nelder-Mead algorithm.

@@ -459,11 +729,35 @@ def _fit_nm(f, score, start_params, fargs, kwargs, disp=True, maxiter=100,
         information returned from the solver used. If it is False, this is
         None.
     """
-    pass
+    check_kwargs(kwargs, ("xtol", "ftol", "maxfun"), "nm")
+    xtol = kwargs.setdefault('xtol', 0.0001)
+    ftol = kwargs.setdefault('ftol', 0.0001)
+    maxfun = kwargs.setdefault('maxfun', None)
+    retvals = optimize.fmin(f, start_params, args=fargs, xtol=xtol,
+                            ftol=ftol, maxiter=maxiter, maxfun=maxfun,
+                            full_output=full_output, disp=disp, retall=retall,
+                            callback=callback)
+    if full_output:
+        if not retall:
+            xopt, fopt, niter, fcalls, warnflag = retvals
+        else:
+            xopt, fopt, niter, fcalls, warnflag, allvecs = retvals
+        converged = not warnflag
+        retvals = {'fopt': fopt, 'iterations': niter,
+                   'fcalls': fcalls, 'warnflag': warnflag,
+                   'converged': converged}
+        if retall:
+            retvals.update({'allvecs': allvecs})
+    else:
+        xopt = retvals
+        retvals = None

+    return xopt, retvals

-def _fit_cg(f, score, start_params, fargs, kwargs, disp=True, maxiter=100,
-    callback=None, retall=False, full_output=True, hess=None):
+
+def _fit_cg(f, score, start_params, fargs, kwargs, disp=True,
+            maxiter=100, callback=None, retall=False,
+            full_output=True, hess=None):
     """
     Fit using Conjugate Gradient algorithm.

@@ -508,11 +802,35 @@ def _fit_cg(f, score, start_params, fargs, kwargs, disp=True, maxiter=100,
         information returned from the solver used. If it is False, this is
         None.
     """
-    pass
+    check_kwargs(kwargs, ("gtol", "norm", "epsilon"), "cg")
+    gtol = kwargs.setdefault('gtol', 1.0000000000000001e-05)
+    norm = kwargs.setdefault('norm', np.Inf)
+    epsilon = kwargs.setdefault('epsilon', 1.4901161193847656e-08)
+    retvals = optimize.fmin_cg(f, start_params, score, gtol=gtol, norm=norm,
+                               epsilon=epsilon, maxiter=maxiter,
+                               full_output=full_output, disp=disp,
+                               retall=retall, callback=callback)
+    if full_output:
+        if not retall:
+            xopt, fopt, fcalls, gcalls, warnflag = retvals
+        else:
+            xopt, fopt, fcalls, gcalls, warnflag, allvecs = retvals
+        converged = not warnflag
+        retvals = {'fopt': fopt, 'fcalls': fcalls, 'gcalls': gcalls,
+                   'warnflag': warnflag, 'converged': converged}
+        if retall:
+            retvals.update({'allvecs': allvecs})
+
+    else:
+        xopt = retvals
+        retvals = None

+    return xopt, retvals

-def _fit_ncg(f, score, start_params, fargs, kwargs, disp=True, maxiter=100,
-    callback=None, retall=False, full_output=True, hess=None):
+
+def _fit_ncg(f, score, start_params, fargs, kwargs, disp=True,
+             maxiter=100, callback=None, retall=False,
+             full_output=True, hess=None):
     """
     Fit using Newton Conjugate Gradient algorithm.

@@ -557,11 +875,37 @@ def _fit_ncg(f, score, start_params, fargs, kwargs, disp=True, maxiter=100,
         information returned from the solver used. If it is False, this is
         None.
     """
-    pass
+    check_kwargs(kwargs, ("fhess_p", "avextol", "epsilon"), "ncg")
+    fhess_p = kwargs.setdefault('fhess_p', None)
+    avextol = kwargs.setdefault('avextol', 1.0000000000000001e-05)
+    epsilon = kwargs.setdefault('epsilon', 1.4901161193847656e-08)
+    retvals = optimize.fmin_ncg(f, start_params, score, fhess_p=fhess_p,
+                                fhess=hess, args=fargs, avextol=avextol,
+                                epsilon=epsilon, maxiter=maxiter,
+                                full_output=full_output, disp=disp,
+                                retall=retall, callback=callback)
+    if full_output:
+        if not retall:
+            xopt, fopt, fcalls, gcalls, hcalls, warnflag = retvals
+        else:
+            xopt, fopt, fcalls, gcalls, hcalls, warnflag, allvecs = \
+                retvals
+        converged = not warnflag
+        retvals = {'fopt': fopt, 'fcalls': fcalls, 'gcalls': gcalls,
+                   'hcalls': hcalls, 'warnflag': warnflag,
+                   'converged': converged}
+        if retall:
+            retvals.update({'allvecs': allvecs})
+    else:
+        xopt = retvals
+        retvals = None
+
+    return xopt, retvals


-def _fit_powell(f, score, start_params, fargs, kwargs, disp=True, maxiter=
-    100, callback=None, retall=False, full_output=True, hess=None):
+def _fit_powell(f, score, start_params, fargs, kwargs, disp=True,
+                maxiter=100, callback=None, retall=False,
+                full_output=True, hess=None):
     """
     Fit using Powell's conjugate direction algorithm.

@@ -606,11 +950,38 @@ def _fit_powell(f, score, start_params, fargs, kwargs, disp=True, maxiter=
         information returned from the solver used. If it is False, this is
         None.
     """
-    pass
+    check_kwargs(kwargs, ("xtol", "ftol", "maxfun", "start_direc"), "powell")
+    xtol = kwargs.setdefault('xtol', 0.0001)
+    ftol = kwargs.setdefault('ftol', 0.0001)
+    maxfun = kwargs.setdefault('maxfun', None)
+    start_direc = kwargs.setdefault('start_direc', None)
+    retvals = optimize.fmin_powell(f, start_params, args=fargs, xtol=xtol,
+                                   ftol=ftol, maxiter=maxiter, maxfun=maxfun,
+                                   full_output=full_output, disp=disp,
+                                   retall=retall, callback=callback,
+                                   direc=start_direc)
+    if full_output:
+        if not retall:
+            xopt, fopt, direc, niter, fcalls, warnflag = retvals
+        else:
+            xopt, fopt, direc, niter, fcalls, warnflag, allvecs = \
+                retvals
+        converged = not warnflag
+        retvals = {'fopt': fopt, 'direc': direc, 'iterations': niter,
+                   'fcalls': fcalls, 'warnflag': warnflag,
+                   'converged': converged}
+        if retall:
+            retvals.update({'allvecs': allvecs})
+    else:
+        xopt = retvals
+        retvals = None
+
+    return xopt, retvals


 def _fit_basinhopping(f, score, start_params, fargs, kwargs, disp=True,
-    maxiter=100, callback=None, retall=False, full_output=True, hess=None):
+                      maxiter=100, callback=None, retall=False,
+                      full_output=True, hess=None):
     """
     Fit using Basin-hopping algorithm.

@@ -655,4 +1026,40 @@ def _fit_basinhopping(f, score, start_params, fargs, kwargs, disp=True,
         information returned from the solver used. If it is False, this is
         None.
     """
-    pass
+    check_kwargs(
+        kwargs,
+        ("niter", "niter_success", "T", "stepsize", "interval", "minimizer", "seed"),
+        "basinhopping"
+    )
+    kwargs = {k: v for k, v in kwargs.items()}
+    niter = kwargs.setdefault('niter', 100)
+    niter_success = kwargs.setdefault('niter_success', None)
+    T = kwargs.setdefault('T', 1.0)
+    stepsize = kwargs.setdefault('stepsize', 0.5)
+    interval = kwargs.setdefault('interval', 50)
+    seed = kwargs.get("seed")
+    minimizer_kwargs = kwargs.get('minimizer', {})
+    minimizer_kwargs['args'] = fargs
+    minimizer_kwargs['jac'] = score
+    method = minimizer_kwargs.get('method', None)
+    if method and method != 'L-BFGS-B':  # l_bfgs_b does not take a hessian
+        minimizer_kwargs['hess'] = hess
+
+    retvals = optimize.basinhopping(f, start_params,
+                                    minimizer_kwargs=minimizer_kwargs,
+                                    niter=niter, niter_success=niter_success,
+                                    T=T, stepsize=stepsize, disp=disp,
+                                    callback=callback, interval=interval,
+                                    seed=seed)
+    xopt = retvals.x
+    if full_output:
+        retvals = {
+            'fopt': retvals.fun,
+            'iterations': retvals.nit,
+            'fcalls': retvals.nfev,
+            'converged': 'completed successfully' in retvals.message[0]
+        }
+    else:
+        retvals = None
+
+    return xopt, retvals
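
How this plumbing surfaces to users (hedged sketch; dataset and options are
illustrative): method-specific keywords pass through Optimizer._fit into the matching
_fit_* helper, and check_kwargs warns about keywords the chosen method cannot use.

import warnings
import statsmodels.api as sm

data = sm.datasets.spector.load_pandas()
model = sm.Logit(data.endog, sm.add_constant(data.exog))

# 'gtol' is an allowed keyword for method="bfgs"
res = model.fit(method="bfgs", maxiter=100, gtol=1e-8, disp=False)

# 'xtol' belongs to 'nm'/'powell', so check_kwargs emits a FutureWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.fit(method="bfgs", maxiter=100, xtol=1e-8, disp=False)
assert any(issubclass(w.category, FutureWarning) for w in caught)
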
diff --git a/statsmodels/base/transform.py b/statsmodels/base/transform.py
index b2da0c8db..d18d9c9e6 100644
--- a/statsmodels/base/transform.py
+++ b/statsmodels/base/transform.py
@@ -49,7 +49,23 @@ class BoxCox:
         Box, G. E. P., and D. R. Cox. 1964. "An Analysis of Transformations".
         `Journal of the Royal Statistical Society`. 26 (2): 211-252.
         """
-        pass
+        x = np.asarray(x)
+
+        if np.any(x <= 0):
+            raise ValueError("Non-positive x.")
+
+        if lmbda is None:
+            lmbda = self._est_lambda(x,
+                                     method=method,
+                                     **kwargs)
+
+        # if lambda is numerically close to zero, use the log transform
+        if np.isclose(lmbda, 0.):
+            y = np.log(x)
+        else:
+            y = (np.power(x, lmbda) - 1.) / lmbda
+
+        return y, lmbda

     def untransform_boxcox(self, x, lmbda, method='naive'):
         """
@@ -75,7 +91,18 @@ class BoxCox:
         y : array_like
             The untransformed series.
         """
-        pass
+        method = method.lower()
+        x = np.asarray(x)
+
+        if method == 'naive':
+            if np.isclose(lmbda, 0.):
+                y = np.exp(x)
+            else:
+                y = np.power(lmbda * x + 1, 1. / lmbda)
+        else:
+            raise ValueError("Method '{0}' not understood.".format(method))
+
+        return y

     def _est_lambda(self, x, bounds=(-1, 2), method='guerrero', **kwargs):
         """
@@ -104,10 +131,25 @@ class BoxCox:
         lmbda : float
             The lambda parameter.
         """
-        pass
+        method = method.lower()
+
+        if len(bounds) != 2:
+            raise ValueError("Bounds of length {0} not understood."
+                             .format(len(bounds)))
+        elif bounds[0] >= bounds[1]:
+            raise ValueError("Lower bound exceeds upper bound.")
+
+        if method == 'guerrero':
+            lmbda = self._guerrero_cv(x, bounds=bounds, **kwargs)
+        elif method == 'loglik':
+            lmbda = self._loglik_boxcox(x, bounds=bounds, **kwargs)
+        else:
+            raise ValueError("Method '{0}' not understood.".format(method))
+
+        return lmbda

-    def _guerrero_cv(self, x, bounds, window_length=4, scale='sd', options=
-        {'maxiter': 25}):
+    def _guerrero_cv(self, x, bounds, window_length=4, scale='sd',
+                     options={'maxiter': 25}):
         """
         Computes lambda using guerrero's coefficient of variation. If no
         seasonality is present in the data, window_length is set to 4 (as
@@ -132,7 +174,31 @@ class BoxCox:
         options : dict
             The options (as a dict) to be passed to the optimizer.
         """
-        pass
+        nobs = len(x)
+        groups = int(nobs / window_length)
+
+        # remove the first n < window_length observations from consideration.
+        grouped_data = np.reshape(x[nobs - (groups * window_length): nobs],
+                                  (groups, window_length))
+        mean = np.mean(grouped_data, 1)
+
+        scale = scale.lower()
+        if scale == 'sd':
+            dispersion = np.std(grouped_data, 1, ddof=1)
+        elif scale == 'mad':
+            dispersion = mad(grouped_data, axis=1)
+        else:
+            raise ValueError("Scale '{0}' not understood.".format(scale))
+
+        def optim(lmbda):
+            rat = np.divide(dispersion, np.power(mean, 1 - lmbda))  # eq 6, p 40
+            return np.std(rat, ddof=1) / np.mean(rat)
+
+        res = minimize_scalar(optim,
+                              bounds=bounds,
+                              method='bounded',
+                              options=options)
+        return res.x

     def _loglik_boxcox(self, x, bounds, options={'maxiter': 25}):
         """
@@ -146,4 +212,15 @@ class BoxCox:
         options : dict
             The options (as a dict) to be passed to the optimizer.
         """
-        pass
+        sum_x = np.sum(np.log(x))
+        nobs = len(x)
+
+        def optim(lmbda):
+            y, lmbda = self.transform_boxcox(x, lmbda)
+            return (1 - lmbda) * sum_x + (nobs / 2.) * np.log(np.var(y))
+
+        res = minimize_scalar(optim,
+                              bounds=bounds,
+                              method='bounded',
+                              options=options)
+        return res.x
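
Round-trip sketch for the Box-Cox helpers above (hedged; the data are synthetic and
strictly positive, as transform_boxcox requires):

import numpy as np
from statsmodels.base.transform import BoxCox

rng = np.random.default_rng(42)
x = np.exp(rng.normal(loc=1.0, scale=0.25, size=120))   # strictly positive series

bc = BoxCox()
y, lmbda = bc.transform_boxcox(x, method="guerrero", window_length=4)
x_back = bc.untransform_boxcox(y, lmbda, method="naive")
assert np.allclose(x, x_back)
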
diff --git a/statsmodels/base/wrapper.py b/statsmodels/base/wrapper.py
index 9dd67c3db..b1ba8ee23 100644
--- a/statsmodels/base/wrapper.py
+++ b/statsmodels/base/wrapper.py
@@ -20,14 +20,17 @@ class ResultsWrapper:

     def __getattribute__(self, attr):
         get = lambda name: object.__getattribute__(self, name)
+
         try:
             results = get('_results')
         except AttributeError:
             pass
+
         try:
             return get(attr)
         except AttributeError:
             pass
+
         obj = getattr(results, attr)
         data = results.model.data
         how = self._wrap_attrs.get(attr)
@@ -35,12 +38,15 @@ class ResultsWrapper:
             obj = data.wrap_output(obj, how[0], *how[1:])
         elif how:
             obj = data.wrap_output(obj, how=how)
+
         return obj

     def __getstate__(self):
+        # print 'pickling wrapper', self.__dict__
         return self.__dict__

     def __setstate__(self, dict_):
+        # print 'unpickling wrapper', dict_
         self.__dict__.update(dict_)

     def save(self, fname, remove_data=False):
@@ -57,7 +63,12 @@ class ResultsWrapper:
             pickling. See the remove_data method.
             In some cases not all arrays will be set to None.
         """
-        pass
+        from statsmodels.iolib.smpickle import save_pickle
+
+        if remove_data:
+            self.remove_data()
+
+        save_pickle(self, fname)

     @classmethod
     def load(cls, fname):
@@ -80,4 +91,42 @@ class ResultsWrapper:
         Results
             The unpickled results instance.
         """
-        pass
+        from statsmodels.iolib.smpickle import load_pickle
+        return load_pickle(fname)
+
+
+def union_dicts(*dicts):
+    result = {}
+    for d in dicts:
+        result.update(d)
+    return result
+
+
+def make_wrapper(func, how):
+    @functools.wraps(func)
+    def wrapper(self, *args, **kwargs):
+        results = object.__getattribute__(self, '_results')
+        data = results.model.data
+        if how and isinstance(how, tuple):
+            obj = data.wrap_output(func(results, *args, **kwargs), how[0], how[1:])
+        elif how:
+            obj = data.wrap_output(func(results, *args, **kwargs), how)
+        return obj
+
+    sig = inspect.signature(func)
+    formatted = str(sig)
+
+    doc = dedent(wrapper.__doc__) if wrapper.__doc__ else ''
+    wrapper.__doc__ = "\n%s%s\n%s" % (func.__name__, formatted, doc)
+
+    return wrapper
+
+
+def populate_wrapper(klass, wrapping):
+    for meth, how in klass._wrap_methods.items():
+        if not hasattr(wrapping, meth):
+            continue
+
+        func = getattr(wrapping, meth)
+        wrapper = make_wrapper(func, how)
+        setattr(klass, meth, wrapper)
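
The restored save/load delegate to statsmodels' pickle helpers, while
make_wrapper/populate_wrapper re-attach pandas metadata to wrapped methods.
A hedged round-trip sketch (file path illustrative):

import os
import tempfile
import numpy as np
import statsmodels.api as sm

x = sm.add_constant(np.arange(10.0))
y = 2.0 + 3.0 * np.arange(10.0) + np.random.default_rng(0).normal(size=10)
res = sm.OLS(y, x).fit()

path = os.path.join(tempfile.gettempdir(), "ols_results.pkl")
res.save(path, remove_data=True)   # ResultsWrapper.save -> save_pickle
loaded = sm.load(path)             # load -> load_pickle
print(loaded.params)
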
diff --git a/statsmodels/compat/_scipy_multivariate_t.py b/statsmodels/compat/_scipy_multivariate_t.py
index 1465536a1..2eb1e5ff2 100644
--- a/statsmodels/compat/_scipy_multivariate_t.py
+++ b/statsmodels/compat/_scipy_multivariate_t.py
@@ -1,13 +1,24 @@
+# flake8: noqa: E501
+#
+# Author: Joris Vankerschaver 2013
+#
+
 import numpy as np
 import scipy.linalg
 from scipy._lib import doccer
 from scipy.special import gammaln
+
 from scipy._lib._util import check_random_state
+
 from scipy.stats import mvn
+
 _LOG_2PI = np.log(2 * np.pi)
 _LOG_2 = np.log(2)
 _LOG_PI = np.log(np.pi)
-_doc_random_state = """random_state : {None, int, np.random.RandomState, np.random.Generator}, optional
+
+
+_doc_random_state = """\
+random_state : {None, int, np.random.RandomState, np.random.Generator}, optional
     Used for drawing random variates.
     If `seed` is `None` the `~np.random.RandomState` singleton is used.
     If `seed` is an int, a new ``RandomState`` instance is used, seeded
@@ -24,7 +35,10 @@ def _squeeze_output(out):
     if necessary.

     """
-    pass
+    out = out.squeeze()
+    if out.ndim == 0:
+        out = out[()]
+    return out


 def _eigvalsh_to_eps(spectrum, cond=None, rcond=None):
@@ -52,10 +66,17 @@ def _eigvalsh_to_eps(spectrum, cond=None, rcond=None):
         Magnitude cutoff for numerical negligibility.

     """
-    pass
+    if rcond is not None:
+        cond = rcond
+    if cond in [None, -1]:
+        t = spectrum.dtype.char.lower()
+        factor = {'f': 1E3, 'd': 1E6}
+        cond = factor[t] * np.finfo(t).eps
+    eps = cond * np.max(abs(spectrum))
+    return eps


-def _pinv_1d(v, eps=1e-05):
+def _pinv_1d(v, eps=1e-5):
     """
     A helper function for computing the pseudoinverse.

@@ -72,7 +93,7 @@ def _pinv_1d(v, eps=1e-05):
         A vector of pseudo-inverted numbers.

     """
-    pass
+    return np.array([0 if abs(x) <= eps else 1/x for x in v], dtype=float)


 class _PSD:
@@ -115,9 +136,13 @@ class _PSD:

     """

-    def __init__(self, M, cond=None, rcond=None, lower=True, check_finite=
-        True, allow_singular=True):
+    def __init__(self, M, cond=None, rcond=None, lower=True,
+                 check_finite=True, allow_singular=True):
+        # Compute the symmetric eigendecomposition.
+        # Note that eigh takes care of array conversion, chkfinite,
+        # and assertion that the matrix is square.
         s, u = scipy.linalg.eigh(M, lower=lower, check_finite=check_finite)
+
         eps = _eigvalsh_to_eps(s, cond, rcond)
         if np.min(s) < -eps:
             raise ValueError('the input matrix must be positive semidefinite')
@@ -126,11 +151,21 @@ class _PSD:
             raise np.linalg.LinAlgError('singular matrix')
         s_pinv = _pinv_1d(s, eps)
         U = np.multiply(u, np.sqrt(s_pinv))
+
+        # Initialize the eagerly precomputed attributes.
         self.rank = len(d)
         self.U = U
         self.log_pdet = np.sum(np.log(d))
+
+        # Initialize an attribute to be lazily computed.
         self._pinv = None

+    @property
+    def pinv(self):
+        if self._pinv is None:
+            self._pinv = np.dot(self.U, self.U.T)
+        return self._pinv
+

 class multi_rv_generic:
     """
@@ -138,7 +173,6 @@ class multi_rv_generic:
     distributions.

     """
-
     def __init__(self, seed=None):
         super(multi_rv_generic, self).__init__()
         self._random_state = check_random_state(seed)
@@ -156,7 +190,17 @@ class multi_rv_generic:
         If an int, use a new RandomState instance seeded with seed.

         """
-        pass
+        return self._random_state
+
+    @random_state.setter
+    def random_state(self, seed):
+        self._random_state = check_random_state(seed)
+
+    def _get_random_state(self, random_state):
+        if random_state is not None:
+            return check_random_state(random_state)
+        else:
+            return self._random_state


 class multi_rv_frozen:
@@ -164,34 +208,52 @@ class multi_rv_frozen:
     Class which encapsulates common functionality between all frozen
     multivariate distributions.
     """
+    @property
+    def random_state(self):
+        return self._dist._random_state

+    @random_state.setter
+    def random_state(self, seed):
+        self._dist._random_state = check_random_state(seed)

-_mvn_doc_default_callparams = """mean : array_like, optional
+
+_mvn_doc_default_callparams = """\
+mean : array_like, optional
     Mean of the distribution (default zero)
 cov : array_like, optional
     Covariance matrix of the distribution (default one)
 allow_singular : bool, optional
     Whether to allow a singular covariance matrix.  (Default: False)
 """
-_mvn_doc_callparams_note = """Setting the parameter `mean` to `None` is equivalent to having `mean`
+
+_mvn_doc_callparams_note = \
+    """Setting the parameter `mean` to `None` is equivalent to having `mean`
     be the zero-vector. The parameter `cov` can be a scalar, in which case
     the covariance matrix is the identity times that value, a vector of
     diagonal entries for the covariance matrix, or a two-dimensional
     array_like.
     """
-_mvn_doc_frozen_callparams = ''
-_mvn_doc_frozen_callparams_note = (
-    'See class definition for a detailed description of parameters.')
-mvn_docdict_params = {'_mvn_doc_default_callparams':
-    _mvn_doc_default_callparams, '_mvn_doc_callparams_note':
-    _mvn_doc_callparams_note, '_doc_random_state': _doc_random_state}
-mvn_docdict_noparams = {'_mvn_doc_default_callparams':
-    _mvn_doc_frozen_callparams, '_mvn_doc_callparams_note':
-    _mvn_doc_frozen_callparams_note, '_doc_random_state': _doc_random_state}
+
+_mvn_doc_frozen_callparams = ""
+
+_mvn_doc_frozen_callparams_note = \
+    """See class definition for a detailed description of parameters."""
+
+mvn_docdict_params = {
+    '_mvn_doc_default_callparams': _mvn_doc_default_callparams,
+    '_mvn_doc_callparams_note': _mvn_doc_callparams_note,
+    '_doc_random_state': _doc_random_state
+}
+
+mvn_docdict_noparams = {
+    '_mvn_doc_default_callparams': _mvn_doc_frozen_callparams,
+    '_mvn_doc_callparams_note': _mvn_doc_frozen_callparams_note,
+    '_doc_random_state': _doc_random_state
+}


 class multivariate_normal_gen(multi_rv_generic):
-    """
+    r"""
     A multivariate normal random variable.

     The `mean` keyword specifies the mean. The `cov` keyword specifies the
@@ -240,10 +302,10 @@ class multivariate_normal_gen(multi_rv_generic):

     .. math::

-        f(x) = \\frac{1}{\\sqrt{(2 \\pi)^k \\det \\Sigma}}
-               \\exp\\left( -\\frac{1}{2} (x - \\mu)^T \\Sigma^{-1} (x - \\mu) \\right),
+        f(x) = \frac{1}{\sqrt{(2 \pi)^k \det \Sigma}}
+               \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right),

-    where :math:`\\mu` is the mean, :math:`\\Sigma` the covariance matrix,
+    where :math:`\mu` is the mean, :math:`\Sigma` the covariance matrix,
     and :math:`k` is the dimension of the space where :math:`x` takes values.

     .. versionadded:: 0.14.0
@@ -286,8 +348,9 @@ class multivariate_normal_gen(multi_rv_generic):
         See `multivariate_normal_frozen` for more information.

         """
-        return multivariate_normal_frozen(mean, cov, allow_singular=
-            allow_singular, seed=seed)
+        return multivariate_normal_frozen(mean, cov,
+                                          allow_singular=allow_singular,
+                                          seed=seed)

     def _process_parameters(self, dim, mean, cov):
         """
@@ -295,7 +358,62 @@ class multivariate_normal_gen(multi_rv_generic):
         mean and covariance are full vector resp. matrix.

         """
-        pass
+
+        # Try to infer dimensionality
+        if dim is None:
+            if mean is None:
+                if cov is None:
+                    dim = 1
+                else:
+                    cov = np.asarray(cov, dtype=float)
+                    if cov.ndim < 2:
+                        dim = 1
+                    else:
+                        dim = cov.shape[0]
+            else:
+                mean = np.asarray(mean, dtype=float)
+                dim = mean.size
+        else:
+            if not np.isscalar(dim):
+                raise ValueError("Dimension of random variable must be "
+                                 "a scalar.")
+
+        # Check input sizes and return full arrays for mean and cov if
+        # necessary
+        if mean is None:
+            mean = np.zeros(dim)
+        mean = np.asarray(mean, dtype=float)
+
+        if cov is None:
+            cov = 1.0
+        cov = np.asarray(cov, dtype=float)
+
+        if dim == 1:
+            mean.shape = (1,)
+            cov.shape = (1, 1)
+
+        if mean.ndim != 1 or mean.shape[0] != dim:
+            raise ValueError("Array 'mean' must be a vector of length %d." %
+                             dim)
+        if cov.ndim == 0:
+            cov = cov * np.eye(dim)
+        elif cov.ndim == 1:
+            cov = np.diag(cov)
+        elif cov.ndim == 2 and cov.shape != (dim, dim):
+            rows, cols = cov.shape
+            if rows != cols:
+                msg = ("Array 'cov' must be square if it is two dimensional,"
+                       " but cov.shape = %s." % str(cov.shape))
+            else:
+                msg = ("Dimension mismatch: array 'cov' is of shape %s,"
+                       " but 'mean' is a vector of length %d.")
+                msg = msg % (str(cov.shape), len(mean))
+            raise ValueError(msg)
+        elif cov.ndim > 2:
+            raise ValueError("Array 'cov' must be at most two-dimensional,"
+                             " but cov.ndim = %d" % cov.ndim)
+
+        return dim, mean, cov

     def _process_quantiles(self, x, dim):
         """
@@ -303,7 +421,17 @@ class multivariate_normal_gen(multi_rv_generic):
         each data point.

         """
-        pass
+        x = np.asarray(x, dtype=float)
+
+        if x.ndim == 0:
+            x = x[np.newaxis]
+        elif x.ndim == 1:
+            if dim == 1:
+                x = x[:, np.newaxis]
+            else:
+                x = x[np.newaxis, :]
+
+        return x

     def _logpdf(self, x, mean, prec_U, log_det_cov, rank):
         """
@@ -328,7 +456,9 @@ class multivariate_normal_gen(multi_rv_generic):
         called directly; use 'logpdf' instead.

         """
-        pass
+        dev = x - mean
+        maha = np.sum(np.square(np.dot(dev, prec_U)), axis=-1)
+        return -0.5 * (rank * _LOG_2PI + log_det_cov + maha)

     def logpdf(self, x, mean=None, cov=1, allow_singular=False):
         """
@@ -350,7 +480,11 @@ class multivariate_normal_gen(multi_rv_generic):
         %(_mvn_doc_callparams_note)s

         """
-        pass
+        dim, mean, cov = self._process_parameters(None, mean, cov)
+        x = self._process_quantiles(x, dim)
+        psd = _PSD(cov, allow_singular=allow_singular)
+        out = self._logpdf(x, mean, psd.U, psd.log_pdet, psd.rank)
+        return _squeeze_output(out)

     def pdf(self, x, mean=None, cov=1, allow_singular=False):
         """
@@ -372,7 +506,11 @@ class multivariate_normal_gen(multi_rv_generic):
         %(_mvn_doc_callparams_note)s

         """
-        pass
+        dim, mean, cov = self._process_parameters(None, mean, cov)
+        x = self._process_quantiles(x, dim)
+        psd = _PSD(cov, allow_singular=allow_singular)
+        out = np.exp(self._logpdf(x, mean, psd.U, psd.log_pdet, psd.rank))
+        return _squeeze_output(out)

     def _cdf(self, x, mean, cov, maxpts, abseps, releps):
         """
@@ -399,10 +537,15 @@ class multivariate_normal_gen(multi_rv_generic):
         .. versionadded:: 1.0.0

         """
-        pass
+        lower = np.full(mean.shape, -np.inf)
+        # mvnun expects 1-d arguments, so process points sequentially
+        func1d = lambda x_slice: mvn.mvnun(lower, x_slice, mean, cov,
+                                           maxpts, abseps, releps)[0]
+        out = np.apply_along_axis(func1d, -1, x)
+        return _squeeze_output(out)

     def logcdf(self, x, mean=None, cov=1, allow_singular=False, maxpts=None,
-        abseps=1e-05, releps=1e-05):
+               abseps=1e-5, releps=1e-5):
         """
         Log of the multivariate normal cumulative distribution function.

@@ -431,10 +574,17 @@ class multivariate_normal_gen(multi_rv_generic):
         .. versionadded:: 1.0.0

         """
-        pass
+        dim, mean, cov = self._process_parameters(None, mean, cov)
+        x = self._process_quantiles(x, dim)
+        # Use _PSD to check covariance matrix
+        _PSD(cov, allow_singular=allow_singular)
+        if not maxpts:
+            maxpts = 1000000 * dim
+        out = np.log(self._cdf(x, mean, cov, maxpts, abseps, releps))
+        return out

     def cdf(self, x, mean=None, cov=1, allow_singular=False, maxpts=None,
-        abseps=1e-05, releps=1e-05):
+            abseps=1e-5, releps=1e-5):
         """
         Multivariate normal cumulative distribution function.

@@ -463,7 +613,14 @@ class multivariate_normal_gen(multi_rv_generic):
         .. versionadded:: 1.0.0

         """
-        pass
+        dim, mean, cov = self._process_parameters(None, mean, cov)
+        x = self._process_quantiles(x, dim)
+        # Use _PSD to check covariance matrix
+        _PSD(cov, allow_singular=allow_singular)
+        if not maxpts:
+            maxpts = 1000000 * dim
+        out = self._cdf(x, mean, cov, maxpts, abseps, releps)
+        return out

     def rvs(self, mean=None, cov=1, size=1, random_state=None):
         """
@@ -487,7 +644,11 @@ class multivariate_normal_gen(multi_rv_generic):
         %(_mvn_doc_callparams_note)s

         """
-        pass
+        dim, mean, cov = self._process_parameters(None, mean, cov)
+
+        random_state = self._get_random_state(random_state)
+        out = random_state.multivariate_normal(mean, cov, size)
+        return _squeeze_output(out)

     def entropy(self, mean=None, cov=1):
         """
@@ -507,16 +668,17 @@ class multivariate_normal_gen(multi_rv_generic):
         %(_mvn_doc_callparams_note)s

         """
-        pass
+        dim, mean, cov = self._process_parameters(None, mean, cov)
+        _, logdet = np.linalg.slogdet(2 * np.pi * np.e * cov)
+        return 0.5 * logdet


 multivariate_normal = multivariate_normal_gen()


 class multivariate_normal_frozen(multi_rv_frozen):
-
     def __init__(self, mean=None, cov=1, allow_singular=False, seed=None,
-        maxpts=None, abseps=1e-05, releps=1e-05):
+                 maxpts=None, abseps=1e-5, releps=1e-5):
         """
         Create a frozen multivariate normal distribution.

@@ -562,8 +724,8 @@ class multivariate_normal_frozen(multi_rv_frozen):

         """
         self._dist = multivariate_normal_gen(seed)
-        self.dim, self.mean, self.cov = self._dist._process_parameters(None,
-            mean, cov)
+        self.dim, self.mean, self.cov = self._dist._process_parameters(
+                                                            None, mean, cov)
         self.cov_info = _PSD(self.cov, allow_singular=allow_singular)
         if not maxpts:
             maxpts = 1000000 * self.dim
@@ -571,6 +733,27 @@ class multivariate_normal_frozen(multi_rv_frozen):
         self.abseps = abseps
         self.releps = releps

+    def logpdf(self, x):
+        x = self._dist._process_quantiles(x, self.dim)
+        out = self._dist._logpdf(x, self.mean, self.cov_info.U,
+                                 self.cov_info.log_pdet, self.cov_info.rank)
+        return _squeeze_output(out)
+
+    def pdf(self, x):
+        return np.exp(self.logpdf(x))
+
+    def logcdf(self, x):
+        return np.log(self.cdf(x))
+
+    def cdf(self, x):
+        x = self._dist._process_quantiles(x, self.dim)
+        out = self._dist._cdf(x, self.mean, self.cov, self.maxpts, self.abseps,
+                              self.releps)
+        return _squeeze_output(out)
+
+    def rvs(self, size=1, random_state=None):
+        return self._dist.rvs(self.mean, self.cov, size, random_state)
+
     def entropy(self):
         """
         Computes the differential entropy of the multivariate normal.
@@ -581,10 +764,13 @@ class multivariate_normal_frozen(multi_rv_frozen):
             Entropy of the multivariate normal distribution

         """
-        pass
+        log_pdet = self.cov_info.log_pdet
+        rank = self.cov_info.rank
+        return 0.5 * (rank * (_LOG_2PI + 1) + log_pdet)


-_mvt_doc_default_callparams = """
+_mvt_doc_default_callparams = \
+"""
 loc : array_like, optional
     Location of the distribution. (default ``0``)
 shape : array_like, optional
@@ -595,23 +781,32 @@ df : float, optional
 allow_singular : bool, optional
     Whether to allow a singular matrix. (default ``False``)
 """
-_mvt_doc_callparams_note = """Setting the parameter `loc` to ``None`` is equivalent to having `loc`
+
+_mvt_doc_callparams_note = \
+"""Setting the parameter `loc` to ``None`` is equivalent to having `loc`
 be the zero-vector. The parameter `shape` can be a scalar, in which case
 the shape matrix is the identity times that value, a vector of
 diagonal entries for the shape matrix, or a two-dimensional array_like.
 """
-_mvt_doc_frozen_callparams_note = (
-    'See class definition for a detailed description of parameters.')
-mvt_docdict_params = {'_mvt_doc_default_callparams':
-    _mvt_doc_default_callparams, '_mvt_doc_callparams_note':
-    _mvt_doc_callparams_note, '_doc_random_state': _doc_random_state}
-mvt_docdict_noparams = {'_mvt_doc_default_callparams': '',
+
+_mvt_doc_frozen_callparams_note = \
+"""See class definition for a detailed description of parameters."""
+
+mvt_docdict_params = {
+    '_mvt_doc_default_callparams': _mvt_doc_default_callparams,
+    '_mvt_doc_callparams_note': _mvt_doc_callparams_note,
+    '_doc_random_state': _doc_random_state
+}
+
+mvt_docdict_noparams = {
+    '_mvt_doc_default_callparams': "",
     '_mvt_doc_callparams_note': _mvt_doc_frozen_callparams_note,
-    '_doc_random_state': _doc_random_state}
+    '_doc_random_state': _doc_random_state
+}


 class multivariate_t_gen(multi_rv_generic):
-    """
+    r"""
     A multivariate t-distributed random variable.

     The `loc` parameter specifies the location. The `shape` parameter specifies
@@ -650,15 +845,15 @@ class multivariate_t_gen(multi_rv_generic):

     .. math::

-        f(x) = \\frac{\\Gamma(\\nu + p)/2}{\\Gamma(\\nu/2)\\nu^{p/2}\\pi^{p/2}|\\Sigma|^{1/2}}
-               \\exp\\left[1 + \\frac{1}{\\nu} (\\mathbf{x} - \\boldsymbol{\\mu})^{\\top}
-               \\boldsymbol{\\Sigma}^{-1}
-               (\\mathbf{x} - \\boldsymbol{\\mu}) \\right]^{-(\\nu + p)/2},
+        f(x) = \frac{\Gamma((\nu + p)/2)}{\Gamma(\nu/2)\nu^{p/2}\pi^{p/2}|\Sigma|^{1/2}}
+               \left[1 + \frac{1}{\nu} (\mathbf{x} - \boldsymbol{\mu})^{\top}
+               \boldsymbol{\Sigma}^{-1}
+               (\mathbf{x} - \boldsymbol{\mu}) \right]^{-(\nu + p)/2},

-    where :math:`p` is the dimension of :math:`\\mathbf{x}`,
-    :math:`\\boldsymbol{\\mu}` is the :math:`p`-dimensional location,
-    :math:`\\boldsymbol{\\Sigma}` the :math:`p \\times p`-dimensional shape
-    matrix, and :math:`\\nu` is the degrees of freedom.
+    where :math:`p` is the dimension of :math:`\mathbf{x}`,
+    :math:`\boldsymbol{\mu}` is the :math:`p`-dimensional location,
+    :math:`\boldsymbol{\Sigma}` the :math:`p \times p`-dimensional shape
+    matrix, and :math:`\nu` is the degrees of freedom.

     .. versionadded:: 1.6.0

@@ -688,8 +883,8 @@ class multivariate_t_gen(multi_rv_generic):
         self.__doc__ = doccer.docformat(self.__doc__, mvt_docdict_params)
         self._random_state = check_random_state(seed)

-    def __call__(self, loc=None, shape=1, df=1, allow_singular=False, seed=None
-        ):
+    def __call__(self, loc=None, shape=1, df=1, allow_singular=False,
+                 seed=None):
         """
         Create a frozen multivariate t-distribution. See
         `multivariate_t_frozen` for parameters.
@@ -697,9 +892,10 @@ class multivariate_t_gen(multi_rv_generic):
         """
         if df == np.inf:
             return multivariate_normal_frozen(mean=loc, cov=shape,
-                allow_singular=allow_singular, seed=seed)
+                                              allow_singular=allow_singular,
+                                              seed=seed)
         return multivariate_t_frozen(loc=loc, shape=shape, df=df,
-            allow_singular=allow_singular, seed=seed)
+                                     allow_singular=allow_singular, seed=seed)

     def pdf(self, x, loc=None, shape=1, df=1, allow_singular=False):
         """
@@ -726,7 +922,12 @@ class multivariate_t_gen(multi_rv_generic):
         array([0.00075713])

         """
-        pass
+        dim, loc, shape, df = self._process_parameters(loc, shape, df)
+        x = self._process_quantiles(x, dim)
+        shape_info = _PSD(shape, allow_singular=allow_singular)
+        logpdf = self._logpdf(x, loc, shape_info.U, shape_info.log_pdet, df,
+                              dim, shape_info.rank)
+        return np.exp(logpdf)

     def logpdf(self, x, loc=None, shape=1, df=1):
         """
@@ -758,7 +959,11 @@ class multivariate_t_gen(multi_rv_generic):
         pdf : Probability density function.

         """
-        pass
+        dim, loc, shape, df = self._process_parameters(loc, shape, df)
+        x = self._process_quantiles(x, dim)
+        shape_info = _PSD(shape)
+        return self._logpdf(x, loc, shape_info.U, shape_info.log_pdet, df, dim,
+                            shape_info.rank)

     def _logpdf(self, x, loc, prec_U, log_pdet, df, dim, rank):
         """Utility method `pdf`, `logpdf` for parameters.
@@ -788,7 +993,20 @@ class multivariate_t_gen(multi_rv_generic):
         directly; use 'logpdf' instead.

         """
-        pass
+        if df == np.inf:
+            return multivariate_normal._logpdf(x, loc, prec_U, log_pdet, rank)
+
+        dev = x - loc
+        maha = np.square(np.dot(dev, prec_U)).sum(axis=-1)
+
+        t = 0.5 * (df + dim)
+        A = gammaln(t)
+        B = gammaln(0.5 * df)
+        C = dim/2. * np.log(df * np.pi)
+        D = 0.5 * log_pdet
+        E = -t * np.log(1 + (1./df) * maha)
+
+        return _squeeze_output(A - B - C - D + E)

     def rvs(self, loc=None, shape=1, df=1, size=1, random_state=None):
         """
@@ -818,7 +1036,25 @@ class multivariate_t_gen(multi_rv_generic):
         array([[0.93477495, 3.00408716]])

         """
-        pass
+        # For implementation details, see equation (3):
+        #
+        #    Hofert, "On Sampling from the Multivariate t Distribution", 2013
+        #     http://rjournal.github.io/archive/2013-2/hofert.pdf
+        #
+        dim, loc, shape, df = self._process_parameters(loc, shape, df)
+        if random_state is not None:
+            rng = check_random_state(random_state)
+        else:
+            rng = self._random_state
+
+        if np.isinf(df):
+            x = np.ones(size)
+        else:
+            x = rng.chisquare(df, size=size) / df
+
+        z = rng.multivariate_normal(np.zeros(dim), shape, size=size)
+        samples = loc + z / np.sqrt(x)[:, None]
+        return _squeeze_output(samples)

     def _process_quantiles(self, x, dim):
         """
@@ -826,7 +1062,15 @@ class multivariate_t_gen(multi_rv_generic):
         each data point.

         """
-        pass
+        x = np.asarray(x, dtype=float)
+        if x.ndim == 0:
+            x = x[np.newaxis]
+        elif x.ndim == 1:
+            if dim == 1:
+                x = x[:, np.newaxis]
+            else:
+                x = x[np.newaxis, :]
+        return x

     def _process_parameters(self, loc, shape, df):
         """
@@ -834,13 +1078,66 @@ class multivariate_t_gen(multi_rv_generic):
         defaults, and ensure compatible dimensions.

         """
-        pass
+        if loc is None and shape is None:
+            loc = np.asarray(0, dtype=float)
+            shape = np.asarray(1, dtype=float)
+            dim = 1
+        elif loc is None:
+            shape = np.asarray(shape, dtype=float)
+            if shape.ndim < 2:
+                dim = 1
+            else:
+                dim = shape.shape[0]
+            loc = np.zeros(dim)
+        elif shape is None:
+            loc = np.asarray(loc, dtype=float)
+            dim = loc.size
+            shape = np.eye(dim)
+        else:
+            shape = np.asarray(shape, dtype=float)
+            loc = np.asarray(loc, dtype=float)
+            dim = loc.size
+
+        if dim == 1:
+            loc.shape = (1,)
+            shape.shape = (1, 1)
+
+        if loc.ndim != 1 or loc.shape[0] != dim:
+            raise ValueError("Array 'loc' must be a vector of length %d." %
+                             dim)
+        if shape.ndim == 0:
+            shape = shape * np.eye(dim)
+        elif shape.ndim == 1:
+            shape = np.diag(shape)
+        elif shape.ndim == 2 and shape.shape != (dim, dim):
+            rows, cols = shape.shape
+            if rows != cols:
+                msg = ("Array 'cov' must be square if it is two dimensional,"
+                       " but cov.shape = %s." % str(shape.shape))
+            else:
+                msg = ("Dimension mismatch: array 'cov' is of shape %s,"
+                       " but 'loc' is a vector of length %d.")
+                msg = msg % (str(shape.shape), len(loc))
+            raise ValueError(msg)
+        elif shape.ndim > 2:
+            raise ValueError("Array 'cov' must be at most two-dimensional,"
+                             " but cov.ndim = %d" % shape.ndim)
+
+        # Process degrees of freedom.
+        if df is None:
+            df = 1
+        elif df <= 0:
+            raise ValueError("'df' must be greater than zero.")
+        elif np.isnan(df):
+            raise ValueError("'df' is 'nan' but must be greater than zero or 'np.inf'.")
+
+        return dim, loc, shape, df


 class multivariate_t_frozen(multi_rv_frozen):

-    def __init__(self, loc=None, shape=1, df=1, allow_singular=False, seed=None
-        ):
+    def __init__(self, loc=None, shape=1, df=1, allow_singular=False,
+                 seed=None):
         """
         Create a frozen multivariate t distribution.

@@ -865,11 +1162,32 @@ class multivariate_t_frozen(multi_rv_frozen):
         self.dim, self.loc, self.shape, self.df = dim, loc, shape, df
         self.shape_info = _PSD(shape, allow_singular=allow_singular)

+    def logpdf(self, x):
+        x = self._dist._process_quantiles(x, self.dim)
+        U = self.shape_info.U
+        log_pdet = self.shape_info.log_pdet
+        return self._dist._logpdf(x, self.loc, U, log_pdet, self.df, self.dim,
+                                  self.shape_info.rank)
+
+    def pdf(self, x):
+        return np.exp(self.logpdf(x))
+
+    def rvs(self, size=1, random_state=None):
+        return self._dist.rvs(loc=self.loc,
+                              shape=self.shape,
+                              df=self.df,
+                              size=size,
+                              random_state=random_state)
+

 multivariate_t = multivariate_t_gen()
+
+
+# Set frozen generator docstrings from corresponding docstrings in
+# multivariate_t_gen and fill in default strings in class docstrings
 for name in ['logpdf', 'pdf', 'rvs']:
     method = multivariate_t_gen.__dict__[name]
     method_frozen = multivariate_t_frozen.__dict__[name]
     method_frozen.__doc__ = doccer.docformat(method.__doc__,
-        mvt_docdict_noparams)
+                                             mvt_docdict_noparams)
     method.__doc__ = doccer.docformat(method.__doc__, mvt_docdict_params)
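
For orientation, the rvs hunk above implements the chi-square mixture construction from Hofert (2013), eq. (3): draw w ~ chi2(df)/df and z ~ N(0, shape), then return loc + z / sqrt(w). A minimal standalone sketch of that recipe (illustrative only; it uses numpy's default_rng rather than the class's random_state plumbing):

    import numpy as np

    rng = np.random.default_rng(0)
    loc, shape, df, size = np.zeros(2), np.eye(2), 3.0, 5

    w = rng.chisquare(df, size=size) / df                       # mixing variable
    z = rng.multivariate_normal(np.zeros(2), shape, size=size)  # Gaussian part
    samples = loc + z / np.sqrt(w)[:, None]                     # shape (5, 2)
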
diff --git a/statsmodels/compat/numpy.py b/statsmodels/compat/numpy.py
index 6ede7a8f5..d8c844ac7 100644
--- a/statsmodels/compat/numpy.py
+++ b/statsmodels/compat/numpy.py
@@ -39,11 +39,20 @@ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 """
 from packaging.version import Version, parse
+
 import numpy as np
-__all__ = ['NP_LT_123', 'NP_LT_114', 'lstsq', 'np_matrix_rank', 'np_new_unique'
-    ]
-NP_LT_114 = parse(np.__version__) < Version('1.13.99')
-NP_LT_123 = parse(np.__version__) < Version('1.22.99')
+
+__all__ = [
+    "NP_LT_123",
+    "NP_LT_114",
+    "lstsq",
+    "np_matrix_rank",
+    "np_new_unique",
+]
+
+NP_LT_114 = parse(np.__version__) < Version("1.13.99")
+NP_LT_123 = parse(np.__version__) < Version("1.22.99")
+
 np_matrix_rank = np.linalg.matrix_rank
 np_new_unique = np.unique

@@ -53,4 +62,6 @@ def lstsq(a, b, rcond=None):
     Shim that allows modern rcond setting with backward compat for NumPY
     earlier than 1.14
     """
-    pass
+    if NP_LT_114 and rcond is None:
+        rcond = -1
+    return np.linalg.lstsq(a, b, rcond=rcond)
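
The lstsq shim above only exists to smooth over the rcond default change introduced in NumPy 1.14; on modern NumPy it is a pass-through. A hedged usage sketch:

    import numpy as np
    from statsmodels.compat.numpy import lstsq

    a = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0, 2.0])
    # Same return signature as np.linalg.lstsq: solution, residuals, rank,
    # singular values; rcond=None is mapped to -1 only on NumPy < 1.14.
    coef, resid, rank, sval = lstsq(a, b, rcond=None)
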
diff --git a/statsmodels/compat/pandas.py b/statsmodels/compat/pandas.py
index f0edd9d56..3af29c946 100644
--- a/statsmodels/compat/pandas.py
+++ b/statsmodels/compat/pandas.py
@@ -1,40 +1,74 @@
 from typing import Optional
+
 import numpy as np
 from packaging.version import Version, parse
 import pandas as pd
-from pandas.util._decorators import Appender, Substitution, cache_readonly, deprecate_kwarg
-__all__ = ['assert_frame_equal', 'assert_index_equal',
-    'assert_series_equal', 'data_klasses', 'frequencies',
-    'is_numeric_dtype', 'testing', 'cache_readonly', 'deprecate_kwarg',
-    'Appender', 'Substitution', 'is_int_index', 'is_float_index',
-    'make_dataframe', 'to_numpy', 'PD_LT_1_0_0', 'get_cached_func',
-    'get_cached_doc', 'call_cached_func', 'PD_LT_1_4', 'PD_LT_2',
-    'MONTH_END', 'QUARTER_END', 'YEAR_END', 'FUTURE_STACK']
+from pandas.util._decorators import (
+    Appender,
+    Substitution,
+    cache_readonly,
+    deprecate_kwarg,
+)
+
+__all__ = [
+    "assert_frame_equal",
+    "assert_index_equal",
+    "assert_series_equal",
+    "data_klasses",
+    "frequencies",
+    "is_numeric_dtype",
+    "testing",
+    "cache_readonly",
+    "deprecate_kwarg",
+    "Appender",
+    "Substitution",
+    "is_int_index",
+    "is_float_index",
+    "make_dataframe",
+    "to_numpy",
+    "PD_LT_1_0_0",
+    "get_cached_func",
+    "get_cached_doc",
+    "call_cached_func",
+    "PD_LT_1_4",
+    "PD_LT_2",
+    "MONTH_END",
+    "QUARTER_END",
+    "YEAR_END",
+    "FUTURE_STACK",
+]
+
 version = parse(pd.__version__)
-PD_LT_2_2_0 = version < Version('2.1.99')
-PD_LT_2_1_0 = version < Version('2.0.99')
-PD_LT_1_0_0 = version < Version('0.99.0')
-PD_LT_1_4 = version < Version('1.3.99')
-PD_LT_2 = version < Version('1.9.99')
+
+PD_LT_2_2_0 = version < Version("2.1.99")
+PD_LT_2_1_0 = version < Version("2.0.99")
+PD_LT_1_0_0 = version < Version("0.99.0")
+PD_LT_1_4 = version < Version("1.3.99")
+PD_LT_2 = version < Version("1.9.99")
+
 try:
     from pandas.api.types import is_numeric_dtype
 except ImportError:
     from pandas.core.common import is_numeric_dtype
+
 try:
     from pandas.tseries import offsets as frequencies
 except ImportError:
     from pandas.tseries import frequencies
-data_klasses = pd.Series, pd.DataFrame
+
+data_klasses = (pd.Series, pd.DataFrame)
+
 try:
     import pandas.testing as testing
 except ImportError:
     import pandas.util.testing as testing
+
 assert_frame_equal = testing.assert_frame_equal
 assert_index_equal = testing.assert_index_equal
 assert_series_equal = testing.assert_series_equal


-def is_int_index(index: pd.Index) ->bool:
+def is_int_index(index: pd.Index) -> bool:
     """
     Check if an index is integral

@@ -48,10 +82,14 @@ def is_int_index(index: pd.Index) ->bool:
     bool
         True if is an index with a standard integral type
     """
-    pass
+    return (
+        isinstance(index, pd.Index)
+        and isinstance(index.dtype, np.dtype)
+        and np.issubdtype(index.dtype, np.integer)
+    )


-def is_float_index(index: pd.Index) ->bool:
+def is_float_index(index: pd.Index) -> bool:
     """
     Check if an index is floating

@@ -65,7 +103,11 @@ def is_float_index(index: pd.Index) ->bool:
     bool
         True if an index with a standard numpy floating dtype
     """
-    pass
+    return (
+        isinstance(index, pd.Index)
+        and isinstance(index.dtype, np.dtype)
+        and np.issubdtype(index.dtype, np.floating)
+    )


 try:
@@ -73,20 +115,39 @@ try:
 except ImportError:
     import string

-    def rands_array(nchars, size, dtype='O'):
+    def rands_array(nchars, size, dtype="O"):
         """
         Generate an array of byte strings.
         """
-        pass
+        rands_chars = np.array(
+            list(string.ascii_letters + string.digits), dtype=(np.str_, 1)
+        )
+        retval = (
+            np.random.choice(rands_chars, size=nchars * np.prod(size))
+            .view((np.str_, nchars))
+            .reshape(size)
+        )
+        if dtype is None:
+            return retval
+        else:
+            return retval.astype(dtype)

     def make_dataframe():
         """
         Simple verion of pandas._testing.makeDataFrame
         """
-        pass
+        n = 30
+        k = 4
+        index = pd.Index(rands_array(nchars=10, size=n), name=None)
+        data = {
+            c: pd.Series(np.random.randn(n), index=index)
+            for c in string.ascii_uppercase[:k]
+        }

+        return pd.DataFrame(data)

-def to_numpy(po: pd.DataFrame) ->np.ndarray:
+
+def to_numpy(po: pd.DataFrame) -> np.ndarray:
     """
     Workaround legacy pandas lacking to_numpy

@@ -99,10 +160,29 @@ def to_numpy(po: pd.DataFrame) ->np.ndarray:
     ndarray
         A numpy array
     """
-    pass
+    try:
+        return po.to_numpy()
+    except AttributeError:
+        return po.values
+
+
+def get_cached_func(cached_prop):
+    try:
+        return cached_prop.fget
+    except AttributeError:
+        return cached_prop.func
+
+
+def call_cached_func(cached_prop, *args, **kwargs):
+    f = get_cached_func(cached_prop)
+    return f(*args, **kwargs)
+
+
+def get_cached_doc(cached_prop) -> Optional[str]:
+    return get_cached_func(cached_prop).__doc__


-MONTH_END = 'M' if PD_LT_2_2_0 else 'ME'
-QUARTER_END = 'Q' if PD_LT_2_2_0 else 'QE'
-YEAR_END = 'Y' if PD_LT_2_2_0 else 'YE'
-FUTURE_STACK = {} if PD_LT_2_1_0 else {'future_stack': True}
+MONTH_END = "M" if PD_LT_2_2_0 else "ME"
+QUARTER_END = "Q" if PD_LT_2_2_0 else "QE"
+YEAR_END = "Y" if PD_LT_2_2_0 else "YE"
+FUTURE_STACK = {} if PD_LT_2_1_0 else {"future_stack": True}
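
The two index helpers added above require a plain numpy dtype, so pandas extension dtypes (nullable Int64 and friends) do not count as integral or floating. A few illustrative checks, assuming only pandas:

    import pandas as pd
    from statsmodels.compat.pandas import is_float_index, is_int_index

    assert is_int_index(pd.Index([1, 2, 3]))                   # numpy int64 index
    assert is_float_index(pd.Index([1.0, 2.0, 3.0]))           # numpy float64 index
    assert not is_int_index(pd.Index(["a", "b"]))              # object dtype
    assert not is_int_index(pd.Index([1, 2], dtype="Int64"))   # extension dtype
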
diff --git a/statsmodels/compat/patsy.py b/statsmodels/compat/patsy.py
index c49c73793..9d762b425 100644
--- a/statsmodels/compat/patsy.py
+++ b/statsmodels/compat/patsy.py
@@ -1,3 +1,16 @@
 from statsmodels.compat.pandas import PD_LT_2
+
 import pandas as pd
 import patsy.util
+
+
+def _safe_is_pandas_categorical_dtype(dt):
+    if PD_LT_2:
+        return pd.api.types.is_categorical_dtype(dt)
+    return isinstance(dt, pd.CategoricalDtype)
+
+
+def monkey_patch_cat_dtype():
+    patsy.util.safe_is_pandas_categorical_dtype = (
+        _safe_is_pandas_categorical_dtype
+    )
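
monkey_patch_cat_dtype above swaps patsy's categorical-dtype check for one that avoids the deprecated pd.api.types.is_categorical_dtype on pandas 2.x. A sketch of how it would be applied (presumably once, before patsy formulas are evaluated):

    from statsmodels.compat.patsy import monkey_patch_cat_dtype

    # Replaces patsy.util.safe_is_pandas_categorical_dtype with the
    # pandas-2-safe implementation defined in the hunk above.
    monkey_patch_cat_dtype()
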
diff --git a/statsmodels/compat/platform.py b/statsmodels/compat/platform.py
index 5f4b314b3..715b5b4f6 100644
--- a/statsmodels/compat/platform.py
+++ b/statsmodels/compat/platform.py
@@ -1,10 +1,18 @@
 import os
 import sys
-__all__ = ['PLATFORM_OSX', 'PLATFORM_WIN', 'PLATFORM_WIN32', 'PLATFORM_32',
-    'PLATFORM_LINUX', 'PLATFORM_LINUX32']
-PLATFORM_OSX = sys.platform == 'darwin'
-PLATFORM_WIN = sys.platform in ('win32', 'cygwin') or os.name == 'nt'
+
+__all__ = [
+    "PLATFORM_OSX",
+    "PLATFORM_WIN",
+    "PLATFORM_WIN32",
+    "PLATFORM_32",
+    "PLATFORM_LINUX",
+    "PLATFORM_LINUX32",
+]
+
+PLATFORM_OSX = sys.platform == "darwin"
+PLATFORM_WIN = sys.platform in ("win32", "cygwin") or os.name == "nt"
 PLATFORM_WIN32 = PLATFORM_WIN and sys.maxsize < 2 ** 33
-PLATFORM_LINUX = sys.platform[:5] == 'linux'
+PLATFORM_LINUX = sys.platform[:5] == "linux"
 PLATFORM_32 = sys.maxsize < 2 ** 33
 PLATFORM_LINUX32 = PLATFORM_32 and PLATFORM_LINUX
diff --git a/statsmodels/compat/pytest.py b/statsmodels/compat/pytest.py
index eb5d3515c..10e4dc210 100644
--- a/statsmodels/compat/pytest.py
+++ b/statsmodels/compat/pytest.py
@@ -1,13 +1,15 @@
 from __future__ import annotations
+
 from typing import Tuple, Type, Union
 import warnings
+
 from _pytest.recwarn import WarningsChecker
 from pytest import warns
-__all__ = ['pytest_warns']

+__all__ = ["pytest_warns"]

-class NoWarningsChecker:

+class NoWarningsChecker:
     def __init__(self):
         self.cw = warnings.catch_warnings(record=True)
         self.rec = []
@@ -18,15 +20,17 @@ class NoWarningsChecker:
     def __exit__(self, type, value, traceback):
         if self.rec:
             warnings = [w.category.__name__ for w in self.rec]
-            joined = '\n'.join(warnings)
+            joined = "\n".join(warnings)
             raise AssertionError(
-                f"""Function is marked as not warning but the following warnings were found: 
-{joined}"""
-                )
+                "Function is marked as not warning but the following "
+                "warnings were found: \n"
+                f"{joined}"
+            )


-def pytest_warns(warning: (Type[Warning] | Tuple[Type[Warning], ...] | None)
-    ) ->Union[WarningsChecker, NoWarningsChecker]:
+def pytest_warns(
+    warning: Type[Warning] | Tuple[Type[Warning], ...] | None
+) -> Union[WarningsChecker, NoWarningsChecker]:
     """

     Parameters
@@ -39,4 +43,9 @@ def pytest_warns(warning: (Type[Warning] | Tuple[Type[Warning], ...] | None)
     cm

     """
-    pass
+    if warning is None:
+        return NoWarningsChecker()
+    else:
+        assert warning is not None
+
+        return warns(warning)
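
pytest_warns above is a thin dispatcher: None means "assert that nothing warns" (via NoWarningsChecker), anything else defers to pytest.warns. A short usage sketch:

    import warnings
    from statsmodels.compat.pytest import pytest_warns

    with pytest_warns(UserWarning):
        warnings.warn("about to change", UserWarning)

    with pytest_warns(None):
        pass  # AssertionError on exit if any warning had been recorded
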
diff --git a/statsmodels/compat/python.py b/statsmodels/compat/python.py
index 79ebf95b5..71a49fd42 100644
--- a/statsmodels/compat/python.py
+++ b/statsmodels/compat/python.py
@@ -3,15 +3,64 @@ Compatibility tools for differences between Python 2 and 3
 """
 import sys
 from typing import TYPE_CHECKING
+
 PY37 = sys.version_info[:2] == (3, 7)
-asunicode = lambda x, _: str(x)
-__all__ = ['asunicode', 'asstr', 'asbytes', 'Literal', 'lmap', 'lzip',
-    'lrange', 'lfilter', 'with_metaclass']
+
+asunicode = lambda x, _: str(x)  # noqa:E731
+
+
+__all__ = [
+    "asunicode",
+    "asstr",
+    "asbytes",
+    "Literal",
+    "lmap",
+    "lzip",
+    "lrange",
+    "lfilter",
+    "with_metaclass",
+]
+
+
+def asbytes(s):
+    if isinstance(s, bytes):
+        return s
+    return s.encode("latin1")
+
+
+def asstr(s):
+    if isinstance(s, str):
+        return s
+    return s.decode("latin1")
+
+
+# list-producing versions of the major Python iterating functions
+def lrange(*args, **kwargs):
+    return list(range(*args, **kwargs))
+
+
+def lzip(*args, **kwargs):
+    return list(zip(*args, **kwargs))
+
+
+def lmap(*args, **kwargs):
+    return list(map(*args, **kwargs))
+
+
+def lfilter(*args, **kwargs):
+    return list(filter(*args, **kwargs))


 def with_metaclass(meta, *bases):
     """Create a base class with a metaclass."""
-    pass
+    # This requires a bit of explanation: the basic idea is to make a dummy
+    # metaclass for one level of class instantiation that replaces itself with
+    # the actual metaclass.
+    class metaclass(meta):
+        def __new__(cls, name, this_bases, d):
+            return meta(name, bases, d)
+
+    return type.__new__(metaclass, "temporary_class", (), {})


 if sys.version_info >= (3, 8):
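
with_metaclass above is the classic six-style shim: a throwaway metaclass builds the real class on first use, so a metaclass can be attached without Python-3-only syntax. A small check of the behaviour:

    from statsmodels.compat.python import with_metaclass

    class Meta(type):
        pass

    class Base:
        pass

    class Derived(with_metaclass(Meta, Base)):
        pass

    assert type(Derived) is Meta       # metaclass applied
    assert issubclass(Derived, Base)   # bases preserved
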
diff --git a/statsmodels/compat/scipy.py b/statsmodels/compat/scipy.py
index 319f7b040..7be54da7d 100644
--- a/statsmodels/compat/scipy.py
+++ b/statsmodels/compat/scipy.py
@@ -1,12 +1,14 @@
 from packaging.version import Version, parse
+
 import numpy as np
 import scipy
+
 SP_VERSION = parse(scipy.__version__)
-SP_LT_15 = SP_VERSION < Version('1.4.99')
+SP_LT_15 = SP_VERSION < Version("1.4.99")
 SCIPY_GT_14 = not SP_LT_15
-SP_LT_16 = SP_VERSION < Version('1.5.99')
-SP_LT_17 = SP_VERSION < Version('1.6.99')
-SP_LT_19 = SP_VERSION < Version('1.8.99')
+SP_LT_16 = SP_VERSION < Version("1.5.99")
+SP_LT_17 = SP_VERSION < Version("1.6.99")
+SP_LT_19 = SP_VERSION < Version("1.8.99")


 def _next_regular(target):
@@ -18,15 +20,55 @@ def _next_regular(target):

     Target must be a positive integer.
     """
-    pass
+    if target <= 6:
+        return target
+
+    # Quickly check if it's already a power of 2
+    if not (target & (target - 1)):
+        return target
+
+    match = float("inf")  # Anything found will be smaller
+    p5 = 1
+    while p5 < target:
+        p35 = p5
+        while p35 < target:
+            # Ceiling integer division, avoiding conversion to float
+            # (quotient = ceil(target / p35))
+            quotient = -(-target // p35)
+            # Quickly find next power of 2 >= quotient
+            p2 = 2 ** ((quotient - 1).bit_length())
+
+            N = p2 * p35
+            if N == target:
+                return N
+            elif N < match:
+                match = N
+            p35 *= 3
+            if p35 == target:
+                return p35
+        if p35 < match:
+            match = p35
+        p5 *= 5
+        if p5 == target:
+            return p5
+    if p5 < match:
+        match = p5
+    return match


 def _valarray(shape, value=np.nan, typecode=None):
     """Return an array of all value."""
-    pass
+
+    out = np.ones(shape, dtype=bool) * value
+    if typecode is not None:
+        out = out.astype(typecode)
+    if not isinstance(out, np.ndarray):
+        out = np.asarray(out)
+    return out


 if SP_LT_16:
-    from ._scipy_multivariate_t import multivariate_t
+    # copied from scipy, added to scipy in 1.6.0
+    from ._scipy_multivariate_t import multivariate_t  # noqa: F401
 else:
-    from scipy.stats import multivariate_t
+    from scipy.stats import multivariate_t  # noqa: F401
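
_next_regular above returns the smallest 5-smooth ("regular") number at or above the target, handy for choosing FFT-friendly sizes. A few spot checks of the implementation:

    from statsmodels.compat.scipy import _next_regular

    assert _next_regular(1) == 1         # small targets are returned as-is
    assert _next_regular(97) == 100      # 2**2 * 5**2
    assert _next_regular(509) == 512     # padded up to a power of two
    assert _next_regular(1000) == 1000   # 2**3 * 5**3 is already regular
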
diff --git a/statsmodels/datasets/anes96/data.py b/statsmodels/datasets/anes96/data.py
index 4bd107bbf..9c0ac541a 100644
--- a/statsmodels/datasets/anes96/data.py
+++ b/statsmodels/datasets/anes96/data.py
@@ -1,17 +1,22 @@
 """American National Election Survey 1996"""
 from numpy import log
+
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
+
+COPYRIGHT = """This is public domain."""
 TITLE = __doc__
 SOURCE = """
 http://www.electionstudies.org/

 The American National Election Studies.
 """
-DESCRSHORT = (
-    'This data is a subset of the American National Election Studies of 1996.')
+
+DESCRSHORT = """This data is a subset of the American National Election Studies of 1996."""
+
 DESCRLONG = DESCRSHORT
+
 NOTE = """::

     Number of observations - 944
@@ -92,7 +97,8 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=5, exog_idx=[10, 2, 6, 7, 8])


 def load():
@@ -103,4 +109,11 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    data = du.load_csv(__file__, 'anes96.csv', sep=r'\s')
+    data = du.strip_column_names(data)
+    data['logpopul'] = log(data['popul'] + .1)
+    return data.astype(float)
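
The anes96 module above follows the loader pattern shared by every dataset file in this patch: _get_data reads and cleans the CSV, load_pandas wraps it through du.process_pandas (or du.Dataset), and load simply delegates. A typical call, shown once here rather than repeated for each dataset below:

    import statsmodels.api as sm

    ds = sm.datasets.anes96.load_pandas()
    print(ds.endog.shape)   # (944,)   -- column 5 of the cleaned frame
    print(ds.exog.shape)    # (944, 5) -- columns [10, 2, 6, 7, 8]
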
diff --git a/statsmodels/datasets/cancer/data.py b/statsmodels/datasets/cancer/data.py
index 14c044749..8e9f0b6da 100644
--- a/statsmodels/datasets/cancer/data.py
+++ b/statsmodels/datasets/cancer/data.py
@@ -1,16 +1,22 @@
 """Breast Cancer Data"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = '???'
-TITLE = 'Breast Cancer Data'
-SOURCE = """
+
+COPYRIGHT   = """???"""
+TITLE       = """Breast Cancer Data"""
+SOURCE      = """
 This is the breast cancer data used in Owen's empirical likelihood.  It is taken from
 Rice, J.A. Mathematical Statistics and Data Analysis.
 http://www.cengage.com/statistics/discipline_content/dataLibrary.html
 """
-DESCRSHORT = 'Breast Cancer and county population'
-DESCRLONG = 'The number of breast cancer observances in various counties'
-NOTE = """::
+
+DESCRSHORT  = """Breast Cancer and county population"""
+
+DESCRLONG   = """The number of breast cancer observances in various counties"""
+
+#suggested notes
+NOTE        = """::

     Number of observations: 301
     Number of variables: 2
@@ -22,6 +28,11 @@ NOTE = """::
 """


+def load_pandas():
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0, exog_idx=None)
+
+
 def load():
     """
     Load the data and return a Dataset class instance.
@@ -31,4 +42,8 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    return du.load_csv(__file__, 'cancer.csv', convert_float=True)
diff --git a/statsmodels/datasets/ccard/data.py b/statsmodels/datasets/ccard/data.py
index a4fec40d3..3734b1f3c 100644
--- a/statsmodels/datasets/ccard/data.py
+++ b/statsmodels/datasets/ccard/data.py
@@ -1,20 +1,25 @@
 """Bill Greene's credit scoring data."""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = """Used with express permission of the original author, who
+
+COPYRIGHT   = """Used with express permission of the original author, who
 retains all rights."""
-TITLE = __doc__
-SOURCE = """
+TITLE       = __doc__
+SOURCE      = """
 William Greene's `Econometric Analysis`

 More information can be found at the web site of the text:
 http://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm
 """
-DESCRSHORT = "William Greene's credit scoring data"
-DESCRLONG = """More information on this data can be found on the
+
+DESCRSHORT  = """William Greene's credit scoring data"""
+
+DESCRLONG   = """More information on this data can be found on the
 homepage for Greene's `Econometric Analysis`. See source.
 """
-NOTE = """::
+
+NOTE        = """::

     Number of observations - 72
     Number of variables - 5
@@ -31,7 +36,8 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0)


 def load():
@@ -42,4 +48,8 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    return du.load_csv(__file__, 'ccard.csv', convert_float=True)
diff --git a/statsmodels/datasets/china_smoking/data.py b/statsmodels/datasets/china_smoking/data.py
index b99d1aac1..268962fa2 100644
--- a/statsmodels/datasets/china_smoking/data.py
+++ b/statsmodels/datasets/china_smoking/data.py
@@ -1,17 +1,22 @@
 """Smoking and lung cancer in eight cities in China."""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'Intern. J. Epidemiol. (1992)'
-TITLE = __doc__
-SOURCE = """
+
+COPYRIGHT   = """Intern. J. Epidemiol. (1992)"""
+TITLE       = __doc__
+SOURCE      = """
 Transcribed from Z. Liu, Smoking and Lung Cancer Incidence in China,
 Intern. J. Epidemiol., 21:197-201, (1992).
 """
-DESCRSHORT = 'Co-occurrence of lung cancer and smoking in 8 Chinese cities.'
-DESCRLONG = """This is a series of 8 2x2 contingency tables showing the co-occurrence
+
+DESCRSHORT  = """Co-occurrence of lung cancer and smoking in 8 Chinese cities."""
+
+DESCRLONG   = """This is a series of 8 2x2 contingency tables showing the co-occurrence
 of lung cancer and smoking in 8 Chinese cities.
 """
-NOTE = """::
+
+NOTE        = """::

     Number of Observations - 8
     Number of Variables - 3
@@ -32,7 +37,11 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    raw_data = du.load_csv(__file__, 'china_smoking.csv')
+    data = raw_data.set_index('Location')
+    dset = du.Dataset(data=data, title="Smoking and lung cancer in Chinese regions")
+    dset.raw_data = raw_data
+    return dset


 def load():
@@ -44,4 +53,4 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
diff --git a/statsmodels/datasets/co2/data.py b/statsmodels/datasets/co2/data.py
index b3013dde3..482421abe 100644
--- a/statsmodels/datasets/co2/data.py
+++ b/statsmodels/datasets/co2/data.py
@@ -1,10 +1,13 @@
 """Mauna Loa Weekly Atmospheric CO2 Data"""
 import pandas as pd
+
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
-TITLE = 'Mauna Loa Weekly Atmospheric CO2 Data'
-SOURCE = """
+
+COPYRIGHT   = """This is public domain."""
+TITLE       = """Mauna Loa Weekly Atmospheric CO2 Data"""
+SOURCE      = """
 Data obtained from http://cdiac.ornl.gov/trends/co2/sio-keel-flask/sio-keel-flaskmlo_c.html

 Obtained on 3/15/2014.
@@ -13,16 +16,18 @@ Citation:

 Keeling, C.D. and T.P. Whorf. 2004. Atmospheric CO2 concentrations derived from flask air samples at sites in the SIO network. In Trends: A Compendium of Data on Global Change. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tennessee, U.S.A.
 """
-DESCRSHORT = (
-    'Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory, Hawaii, U.S.A.'
-    )
-DESCRLONG = """
+
+DESCRSHORT  = """Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory, Hawaii, U.S.A."""
+
+DESCRLONG   = """
 Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory, Hawaii, U.S.A.

 Period of Record: March 1958 - December 2001

 Methods: An Applied Physics Corporation (APC) nondispersive infrared gas analyzer was used to obtain atmospheric CO2 concentrations, based on continuous data (four measurements per hour) from atop intake lines on several towers. Steady data periods of not less than six hours per day are required; if no such six-hour periods are available on any given day, then no data are used that day. Weekly averages were calculated for most weeks throughout the approximately 44 years of record. The continuous data for year 2000 is compared with flask data from the same site in the graphics section."""
-NOTE = """::
+
+#suggested notes
+NOTE        = """::

     Number of observations: 2225
     Number of variables: 2
@@ -35,6 +40,14 @@ NOTE = """::
 """


+def load_pandas():
+    data = _get_data()
+    index = pd.date_range(start=str(data['date'][0]), periods=len(data), freq='W-SAT')
+    dataset = data[['co2']]
+    dataset.index = index
+    return du.Dataset(data=dataset, names=list(data.columns))
+
+
 def load():
     """
     Load the data and return a Dataset class instance.
@@ -44,4 +57,7 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+def _get_data():
+    return du.load_csv(__file__, 'co2.csv')
diff --git a/statsmodels/datasets/committee/data.py b/statsmodels/datasets/committee/data.py
index 1329456bc..0572aefa3 100644
--- a/statsmodels/datasets/committee/data.py
+++ b/statsmodels/datasets/committee/data.py
@@ -1,16 +1,20 @@
 """First 100 days of the US House of Representatives 1995"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = """Used with express permission from the original author,
+
+COPYRIGHT   = """Used with express permission from the original author,
 who retains all rights."""
-TITLE = __doc__
-SOURCE = """
+TITLE       = __doc__
+SOURCE      = """
 Jeff Gill's `Generalized Linear Models: A Unifited Approach`

 http://jgill.wustl.edu/research/books.html
 """
-DESCRSHORT = 'Number of bill assignments in the 104th House in 1995'
-DESCRLONG = """The example in Gill, seeks to explain the number of bill
+
+DESCRSHORT  = """Number of bill assignments in the 104th House in 1995"""
+
+DESCRLONG   = """The example in Gill, seeks to explain the number of bill
 assignments in the first 100 days of the US' 104th House of Representatives.
 The response variable is the number of bill assignments in the first 100 days
 over 20 Committees.  The explanatory variables in the example are the number of
@@ -22,6 +26,7 @@ the number of subcommittees and the log of the staff size.

 The data returned by load are not cleaned to represent the above example.
 """
+
 NOTE = """::

     Number of Observations - 20
@@ -42,6 +47,11 @@ NOTE = """::
 """


+def load_pandas():
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0)
+
+
 def load():
     """Load the committee data and returns a data class.

@@ -50,4 +60,10 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    data = du.load_csv(__file__, 'committee.csv')
+    data = data.iloc[:, 1:7].astype(float)
+    return data
diff --git a/statsmodels/datasets/copper/data.py b/statsmodels/datasets/copper/data.py
index 0059db136..abbce4ea7 100644
--- a/statsmodels/datasets/copper/data.py
+++ b/statsmodels/datasets/copper/data.py
@@ -1,16 +1,20 @@
 """World Copper Prices 1951-1975 dataset."""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = """Used with express permission from the original author,
+
+COPYRIGHT   = """Used with express permission from the original author,
 who retains all rights."""
-TITLE = 'World Copper Market 1951-1975 Dataset'
-SOURCE = """
+TITLE       = "World Copper Market 1951-1975 Dataset"
+SOURCE      = """
 Jeff Gill's `Generalized Linear Models: A Unified Approach`

 http://jgill.wustl.edu/research/books.html
 """
-DESCRSHORT = 'World Copper Market 1951-1975'
-DESCRLONG = """This data describes the world copper market from 1951 through 1975.  In an
+
+DESCRSHORT  = """World Copper Market 1951-1975"""
+
+DESCRLONG   = """This data describes the world copper market from 1951 through 1975.  In an
 example, in Gill, the outcome variable (of a 2 stage estimation) is the world
 consumption of copper for the 25 years.  The explanatory variables are the
 world consumption of copper in 1000 metric tons, the constant dollar adjusted
@@ -18,6 +22,7 @@ price of copper, the price of a substitute, aluminum, an index of real per
 capita income base 1970, an annual measure of manufacturer inventory change,
 and a time trend.
 """
+
 NOTE = """
 Number of Observations - 25

@@ -36,6 +41,12 @@ Years are included in the data file though not returned by load.
 """


+def _get_data():
+    data = du.load_csv(__file__, 'copper.csv')
+    data = data.iloc[:, 1:7]
+    return data.astype(float)
+
+
 def load_pandas():
     """
     Load the copper data and returns a Dataset class.
@@ -45,7 +56,8 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0)


 def load():
@@ -57,4 +69,4 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
diff --git a/statsmodels/datasets/cpunish/data.py b/statsmodels/datasets/cpunish/data.py
index cfdd5debf..b3e1b8f37 100644
--- a/statsmodels/datasets/cpunish/data.py
+++ b/statsmodels/datasets/cpunish/data.py
@@ -1,16 +1,20 @@
 """US Capital Punishment dataset."""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = """Used with express permission from the original author,
+
+COPYRIGHT   = """Used with express permission from the original author,
 who retains all rights."""
-TITLE = __doc__
-SOURCE = """
+TITLE       = __doc__
+SOURCE      = """
 Jeff Gill's `Generalized Linear Models: A Unified Approach`

 http://jgill.wustl.edu/research/books.html
 """
-DESCRSHORT = 'Number of state executions in 1997'
-DESCRLONG = """This data describes the number of times capital punishment is implemented
+
+DESCRSHORT  = """Number of state executions in 1997"""
+
+DESCRLONG   = """This data describes the number of times capital punishment is implemented
 at the state level for the year 1997.  The outcome variable is the number of
 executions.  There were executions in 17 states.
 Included in the data are explanatory variables for median per capita income
@@ -20,7 +24,8 @@ crimes per 100,000 residents for 1996, a dummy variable indicating
 whether the state is in the South, and (an estimate of) the proportion
 of the population with a college degree of some kind.
 """
-NOTE = """::
+
+NOTE        = """::

     Number of Observations - 17
     Number of Variables - 7
@@ -48,7 +53,8 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0)


 def load():
@@ -60,4 +66,10 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    data = du.load_csv(__file__, 'cpunish.csv')
+    data = data.iloc[:, 1:8].astype(float)
+    return data
diff --git a/statsmodels/datasets/danish_data/data.py b/statsmodels/datasets/danish_data/data.py
index 2060dbb96..b5ce11908 100644
--- a/statsmodels/datasets/danish_data/data.py
+++ b/statsmodels/datasets/danish_data/data.py
@@ -1,8 +1,11 @@
 """Danish Money Demand Data"""
 import pandas as pd
+
 from statsmodels.datasets import utils as du
-__docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
+
+__docformat__ = "restructuredtext"
+
+COPYRIGHT = """This is public domain."""
 TITLE = __doc__
 SOURCE = """
 Danish data used in S. Johansen and K. Juselius.  For estimating
@@ -13,8 +16,11 @@ estimating a money demand function::
         for Money, Oxford Bulletin of Economics and Statistics, 52, 2,
         169-210.
 """
-DESCRSHORT = 'Danish Money Demand Data'
+
+DESCRSHORT = """Danish Money Demand Data"""
+
 DESCRLONG = DESCRSHORT
+
 NOTE = """::
     Number of Observations - 55

@@ -30,6 +36,12 @@ NOTE = """::
 """


+def load_pandas():
+    data = _get_data()
+    data.index.freq = "QS-JAN"
+    return du.Dataset(data=data, names=list(data.columns))
+
+
 def load():
     """
     Load the US macro data and return a Dataset class.
@@ -43,11 +55,22 @@ def load():
     -----
     The Dataset instance does not contain endog and exog attributes.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    data = du.load_csv(__file__, "data.csv")
+    for i, val in enumerate(data.period):
+        parts = val.split("Q")
+        month = (int(parts[1]) - 1) * 3 + 1
+
+        data.loc[data.index[i], "period"] = f"{parts[0]}-{month:02d}-01"
+    data["period"] = pd.to_datetime(data.period)
+    return data.set_index("period").astype(float)


-variable_names = ['lrm', 'lry', 'lpy', 'ibo', 'ide']
+variable_names = ["lrm", "lry", "lpy", "ibo", "ide"]


 def __str__():
-    return 'danish_data'
+    return "danish_data"
diff --git a/statsmodels/datasets/elec_equip/data.py b/statsmodels/datasets/elec_equip/data.py
index c60eba66e..b00c4639b 100644
--- a/statsmodels/datasets/elec_equip/data.py
+++ b/statsmodels/datasets/elec_equip/data.py
@@ -1,15 +1,22 @@
 """Euro area 18 - Total Turnover Index, Manufacture of electrical equipment"""
 import os
+
 import pandas as pd
+
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
+
+COPYRIGHT = """This is public domain."""
 TITLE = __doc__
 SOURCE = """
 Data are from the Statistical Office of the European Commission (Eurostat)
 """
-DESCRSHORT = 'EU Manufacture of electrical equipment'
+
+DESCRSHORT = """EU Manufacture of electrical equipment"""
+
 DESCRLONG = DESCRSHORT
+
 NOTE = """::
     Variable name definitions::

@@ -22,6 +29,11 @@ NOTE = """::
 """


+def load_pandas():
+    data = _get_data()
+    return du.Dataset(data=data, names=list(data.columns))
+
+
 def load():
     """
     Load the EU Electrical Equipment manufacturing data into a Dataset class
@@ -35,11 +47,18 @@ def load():
     -----
     The Dataset instance does not contain endog and exog attributes.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    curr_dir = os.path.split(os.path.abspath(__file__))[0]
+    data = pd.read_csv(os.path.join(curr_dir, 'elec_equip.csv'))
+    data.index = pd.to_datetime(data.pop('DATE'))
+    return data


-variable_names = ['elec_equip']
+variable_names = ["elec_equip"]


 def __str__():
-    return 'elec_equip'
+    return "elec_equip"
diff --git a/statsmodels/datasets/elnino/data.py b/statsmodels/datasets/elnino/data.py
index f7adb53e5..b5231b8fa 100644
--- a/statsmodels/datasets/elnino/data.py
+++ b/statsmodels/datasets/elnino/data.py
@@ -1,20 +1,27 @@
 """El Nino dataset, 1950 - 2010"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This data is in the public domain.'
-TITLE = 'El Nino - Sea Surface Temperatures'
-SOURCE = """
+
+COPYRIGHT   = """This data is in the public domain."""
+
+TITLE       = """El Nino - Sea Surface Temperatures"""
+
+SOURCE      = """
 National Oceanic and Atmospheric Administration's National Weather Service

 ERSST.V3B dataset, Nino 1+2
 http://www.cpc.ncep.noaa.gov/data/indices/
 """
-DESCRSHORT = 'Averaged monthly sea surface temperature - Pacific Ocean.'
-DESCRLONG = """This data contains the averaged monthly sea surface
+
+DESCRSHORT  = """Averaged monthly sea surface temperature - Pacific Ocean."""
+
+DESCRLONG   = """This data contains the averaged monthly sea surface
 temperature in degrees Celcius of the Pacific Ocean, between 0-10 degrees South
 and 90-80 degrees West, from 1950 to 2010.  This dataset was obtained from
 NOAA.
 """
+
 NOTE = """::

     Number of Observations - 61 x 12
@@ -28,6 +35,12 @@ NOTE = """::
 """


+def load_pandas():
+    data = _get_data()
+    dataset = du.Dataset(data=data, names=list(data.columns))
+    return dataset
+
+
 def load():
     """
     Load the El Nino data and return a Dataset class.
@@ -41,4 +54,8 @@ def load():
     -----
     The elnino Dataset instance does not contain endog and exog attributes.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    return du.load_csv(__file__, 'elnino.csv', convert_float=True)
diff --git a/statsmodels/datasets/engel/data.py b/statsmodels/datasets/engel/data.py
index c335ae185..2660d83e0 100644
--- a/statsmodels/datasets/engel/data.py
+++ b/statsmodels/datasets/engel/data.py
@@ -1,9 +1,11 @@
 """Name of dataset."""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
-TITLE = 'Engel (1857) food expenditure data'
-SOURCE = """
+
+COPYRIGHT   = """This is public domain."""
+TITLE       = """Engel (1857) food expenditure data"""
+SOURCE      = """
 This dataset was used in Koenker and Bassett (1982) and distributed alongside
 the ``quantreg`` package for R.

@@ -13,11 +15,13 @@ Regression Quantiles; Econometrica 50, 43-61.
 Roger Koenker (2012). quantreg: Quantile Regression. R package version 4.94.
 http://CRAN.R-project.org/package=quantreg
 """
-DESCRSHORT = 'Engel food expenditure data.'
-DESCRLONG = (
-    'Data on income and food expenditure for 235 working class households in 1857 Belgium.'
-    )
-NOTE = """::
+
+DESCRSHORT  = """Engel food expenditure data."""
+
+DESCRLONG   = """Data on income and food expenditure for 235 working class households in 1857 Belgium."""
+
+#suggested notes
+NOTE        = """::

     Number of observations: 235
     Number of variables: 2
@@ -26,7 +30,6 @@ NOTE = """::
         foodexp - annual household food expenditure (Belgian francs)
 """

-
 def load():
     """
     Load the data and return a Dataset class instance.
@@ -36,4 +39,13 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def load_pandas():
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0, exog_idx=None)
+
+
+def _get_data():
+    return du.load_csv(__file__, 'engel.csv')
diff --git a/statsmodels/datasets/fair/data.py b/statsmodels/datasets/fair/data.py
index 1c5fd2e8d..98b0211e3 100644
--- a/statsmodels/datasets/fair/data.py
+++ b/statsmodels/datasets/fair/data.py
@@ -1,20 +1,26 @@
 """Fair's Extramarital Affairs Data"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'Included with permission of the author.'
-TITLE = 'Affairs dataset'
-SOURCE = """
+
+COPYRIGHT   = """Included with permission of the author."""
+TITLE       = """Affairs dataset"""
+SOURCE      = """
 Fair, Ray. 1978. "A Theory of Extramarital Affairs," `Journal of Political
 Economy`, February, 45-61.

 The data is available at http://fairmodel.econ.yale.edu/rayfair/pdf/2011b.htm
 """
-DESCRSHORT = 'Extramarital affair data.'
-DESCRLONG = """Extramarital affair data used to explain the allocation
+
+DESCRSHORT  = """Extramarital affair data."""
+
+DESCRLONG   = """Extramarital affair data used to explain the allocation
 of an individual's time among work, time spent with a spouse, and time
 spent with a paramour. The data is used as an example of regression
 with censored data."""
-NOTE = """::
+
+#suggested notes
+NOTE        = """::

     Number of observations: 6366
     Number of variables: 9
@@ -53,4 +59,13 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def load_pandas():
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=8, exog_idx=None)
+
+
+def _get_data():
+    return du.load_csv(__file__, 'fair.csv', convert_float=True)
diff --git a/statsmodels/datasets/fertility/data.py b/statsmodels/datasets/fertility/data.py
index 71e69e798..9749f6f2c 100644
--- a/statsmodels/datasets/fertility/data.py
+++ b/statsmodels/datasets/fertility/data.py
@@ -1,11 +1,11 @@
 """World Bank Fertility Data."""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = (
-    'This data is distributed according to the World Bank terms of use. See SOURCE.'
-    )
-TITLE = 'World Bank Fertility Data'
-SOURCE = """
+
+COPYRIGHT   = """This data is distributed according to the World Bank terms of use. See SOURCE."""
+TITLE       = """World Bank Fertility Data"""
+SOURCE      = """
 This data has been acquired from

 The World Bank: Fertility rate, total (births per woman): World Development Indicators
@@ -25,11 +25,13 @@ The World Bank Terms of Use can be found at the following URL

 http://go.worldbank.org/OJC02YMLA0
 """
-DESCRSHORT = (
-    'Total fertility rate represents the number of children that would be born to a woman if she were to live to the end of her childbearing years and bear children in accordance with current age-specific fertility rates.'
-    )
-DESCRLONG = DESCRSHORT
-NOTE = """
+
+DESCRSHORT  = """Total fertility rate represents the number of children that would be born to a woman if she were to live to the end of her childbearing years and bear children in accordance with current age-specific fertility rates."""
+
+DESCRLONG   = DESCRSHORT
+
+#suggested notes
+NOTE        = """
 ::

     This is panel data in wide-format
@@ -54,4 +56,13 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def load_pandas():
+    data = _get_data()
+    return du.Dataset(data=data)
+
+
+def _get_data():
+    return du.load_csv(__file__, 'fertility.csv')
diff --git a/statsmodels/datasets/grunfeld/data.py b/statsmodels/datasets/grunfeld/data.py
index 136259b24..b1649373a 100644
--- a/statsmodels/datasets/grunfeld/data.py
+++ b/statsmodels/datasets/grunfeld/data.py
@@ -1,10 +1,13 @@
 """Grunfeld (1950) Investment Data"""
 import pandas as pd
+
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
-TITLE = __doc__
-SOURCE = """This is the Grunfeld (1950) Investment Data.
+
+COPYRIGHT   = """This is public domain."""
+TITLE       = __doc__
+SOURCE      = """This is the Grunfeld (1950) Investment Data.

 The source for the data was the original 11-firm data set from Grunfeld's Ph.D.
 thesis recreated by Kleiber and Zeileis (2008) "The Grunfeld Data at 50".
@@ -14,9 +17,12 @@ http://statmath.wu-wien.ac.at/~zeileis/grunfeld/
 For a note on the many versions of the Grunfeld data circulating see:
 http://www.stanford.edu/~clint/bench/grunfeld.htm
 """
-DESCRSHORT = 'Grunfeld (1950) Investment Data for 11 U.S. Firms.'
-DESCRLONG = DESCRSHORT
-NOTE = """::
+
+DESCRSHORT  = """Grunfeld (1950) Investment Data for 11 U.S. Firms."""
+
+DESCRLONG   = DESCRSHORT
+
+NOTE        = """::

     Number of observations - 220 (20 years for 11 firms)

@@ -36,7 +42,6 @@ NOTE = """::
     string categorical variable.
 """

-
 def load():
     """
     Loads the Grunfeld data and returns a Dataset class.
@@ -51,8 +56,7 @@ def load():
     raw_data has the firm variable expanded to dummy variables for each
     firm (ie., there is no reference dummy)
     """
-    pass
-
+    return load_pandas()

 def load_pandas():
     """
@@ -68,4 +72,14 @@ def load_pandas():
     raw_data has the firm variable expanded to dummy variables for each
     firm (ie., there is no reference dummy)
     """
-    pass
+    data = _get_data()
+    data.year = data.year.astype(float)
+    raw_data = pd.get_dummies(data)
+    ds = du.process_pandas(data, endog_idx=0)
+    ds.raw_data = raw_data
+    return ds
+
+
+def _get_data():
+    data = du.load_csv(__file__, 'grunfeld.csv')
+    return data
diff --git a/statsmodels/datasets/heart/data.py b/statsmodels/datasets/heart/data.py
index b923eb785..5f061ee96 100644
--- a/statsmodels/datasets/heart/data.py
+++ b/statsmodels/datasets/heart/data.py
@@ -1,14 +1,21 @@
 """Heart Transplant Data, Miller 1976"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = '???'
-TITLE = 'Transplant Survival Data'
-SOURCE = """Miller, R. (1976). Least squares regression with censored data. Biometrica, 63 (3). 449-464.
+
+COPYRIGHT   = """???"""
+
+TITLE       = """Transplant Survival Data"""
+
+SOURCE      = """Miller, R. (1976). Least squares regression with censored data. Biometrica, 63 (3). 449-464.

 """
-DESCRSHORT = 'Survival times after receiving a heart transplant'
-DESCRLONG = """This data contains the survival time after receiving a heart transplant, the age of the patient and whether or not the survival time was censored.
+
+DESCRSHORT  = """Survival times after receiving a heart transplant"""
+
+DESCRLONG   = """This data contains the survival time after receiving a heart transplant, the age of the patient and whether or not the survival time was censored.
 """
+
 NOTE = """::

     Number of Observations - 69
@@ -31,4 +38,16 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def load_pandas():
+    data = _get_data()
+    dataset = du.process_pandas(data, endog_idx=0, exog_idx=None)
+    dataset.censors = dataset.exog.iloc[:, 0]
+    dataset.exog = dataset.exog.iloc[:, 1]
+    return dataset
+
+
+def _get_data():
+    return du.load_csv(__file__, 'heart.csv')
diff --git a/statsmodels/datasets/interest_inflation/data.py b/statsmodels/datasets/interest_inflation/data.py
index 900903a1b..a5d43fa46 100644
--- a/statsmodels/datasets/interest_inflation/data.py
+++ b/statsmodels/datasets/interest_inflation/data.py
@@ -1,15 +1,21 @@
 """(West) German interest and inflation rate 1972-1998"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = '...'
+
+COPYRIGHT = """..."""  # TODO
 TITLE = __doc__
 SOURCE = """
 http://www.jmulti.de/download/datasets/e6.dat
 """
-DESCRSHORT = '(West) German interest and inflation rate 1972Q2 - 1998Q4'
+
+DESCRSHORT = """(West) German interest and inflation rate 1972Q2 - 1998Q4"""
+
 DESCRLONG = """West German (until 1990) / German (afterwards) interest and
 inflation rate 1972Q2 - 1998Q4
 """
+
+
 NOTE = """::
     Number of Observations - 107

@@ -22,8 +28,9 @@ NOTE = """::
         Dp        - Delta log gdp deflator
         R         - nominal long term interest rate
 """
-variable_names = ['Dp', 'R']
-first_season = 1
+
+variable_names = ["Dp", "R"]
+first_season = 1  # 1 stands for: first observation in Q2 (0 would mean Q1)


 def load():
@@ -40,8 +47,18 @@ def load():
     The interest_inflation Dataset instance does not contain endog and exog
     attributes.
     """
-    pass
+    return load_pandas()
+
+
+def load_pandas():
+    data = _get_data()
+    names = data.columns
+    dataset = du.Dataset(data=data, names=names)
+    return dataset
+

+def _get_data():
+    return du.load_csv(__file__, 'E6.csv', convert_float=True)

 def __str__():
-    return 'e6'
+    return "e6"
diff --git a/statsmodels/datasets/longley/data.py b/statsmodels/datasets/longley/data.py
index cf3e0a9e0..797797484 100644
--- a/statsmodels/datasets/longley/data.py
+++ b/statsmodels/datasets/longley/data.py
@@ -1,9 +1,11 @@
 """Longley dataset"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
-TITLE = __doc__
-SOURCE = """
+
+COPYRIGHT   = """This is public domain."""
+TITLE       = __doc__
+SOURCE      = """
 The classic 1967 Longley Data

 http://www.itl.nist.gov/div898/strd/lls/data/Longley.shtml
@@ -14,11 +16,14 @@ http://www.itl.nist.gov/div898/strd/lls/data/Longley.shtml
         Electronic Comptuer from the Point of View of the User."  Journal of
         the American Statistical Association.  62.319, 819-41.
 """
-DESCRSHORT = ''
-DESCRLONG = """The Longley dataset contains various US macroeconomic
+
+DESCRSHORT  = """"""
+
+DESCRLONG   = """The Longley dataset contains various US macroeconomic
 variables that are known to be highly collinear.  It has been used to appraise
 the accuracy of least squares routines."""
-NOTE = """::
+
+NOTE        = """::

     Number of Observations - 16

@@ -36,6 +41,7 @@ NOTE = """::
 """


+
 def load():
     """
     Load the Longley data and return a Dataset class.
@@ -45,7 +51,7 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()


 def load_pandas():
@@ -57,4 +63,11 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0)
+
+
+def _get_data():
+    data = du.load_csv(__file__, 'longley.csv')
+    data = data.iloc[:, [1, 2, 3, 4, 5, 6, 7]].astype(float)
+    return data
diff --git a/statsmodels/datasets/macrodata/data.py b/statsmodels/datasets/macrodata/data.py
index 99c6589a9..ba77526b9 100644
--- a/statsmodels/datasets/macrodata/data.py
+++ b/statsmodels/datasets/macrodata/data.py
@@ -1,9 +1,11 @@
 """United States Macroeconomic data"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
-TITLE = __doc__
-SOURCE = """
+
+COPYRIGHT   = """This is public domain."""
+TITLE       = __doc__
+SOURCE      = """
 Compiled by Skipper Seabold. All data are from the Federal Reserve Bank of St.
 Louis [1] except the unemployment rate which was taken from the National
 Bureau of Labor Statistics [2]. ::
@@ -15,9 +17,12 @@ Bureau of Labor Statistics [2]. ::
     [2] Data Source: Bureau of Labor Statistics, U.S. Department of Labor;
         http://www.bls.gov/data/; accessed December 15, 2009.
 """
-DESCRSHORT = 'US Macroeconomic Data for 1959Q1 - 2009Q3'
-DESCRLONG = DESCRSHORT
-NOTE = """::
+
+DESCRSHORT  = """US Macroeconomic Data for 1959Q1 - 2009Q3"""
+
+DESCRLONG   = DESCRSHORT
+
+NOTE        = """::
     Number of Observations - 203

     Number of Variables - 14
@@ -50,6 +55,11 @@ NOTE = """::
 """


+def load_pandas():
+    data = _get_data()
+    return du.Dataset(data=data, names=list(data.columns))
+
+
 def load():
     """
     Load the US macro data and return a Dataset class.
@@ -63,11 +73,15 @@ def load():
     -----
     The macrodata Dataset instance does not contain endog and exog attributes.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    return du.load_csv(__file__, 'macrodata.csv').astype(float)


-variable_names = ['realcons', 'realgdp', 'realinv']
+variable_names = ["realcons", "realgdp", "realinv"]


 def __str__():
-    return 'macrodata'
+    return "macrodata"
diff --git a/statsmodels/datasets/modechoice/data.py b/statsmodels/datasets/modechoice/data.py
index 8d06163b5..23a35b8ab 100644
--- a/statsmodels/datasets/modechoice/data.py
+++ b/statsmodels/datasets/modechoice/data.py
@@ -1,7 +1,9 @@
 """Travel Mode Choice"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
+
+COPYRIGHT = """This is public domain."""
 TITLE = __doc__
 SOURCE = """
 Greene, W.H. and D. Hensher (1997) Multinomial logit and discrete choice models
@@ -11,8 +13,10 @@ Download from on-line complements to Greene, W.H. (2011) Econometric Analysis,
 Prentice Hall, 7th Edition (data table F18-2)
 http://people.stern.nyu.edu/wgreene/Text/Edition7/TableF18-2.csv
 """
-DESCRSHORT = (
-    'Data used to study travel mode choice between Australian cities\n')
+
+DESCRSHORT = """Data used to study travel mode choice between Australian cities
+"""
+
 DESCRLONG = """The data, collected as part of a 1987 intercity mode choice
 study, are a sub-sample of 210 non-business trips between Sydney, Canberra and
 Melbourne in which the traveler chooses a mode from four alternatives (plane,
@@ -21,6 +25,7 @@ over-sampling of the less popular modes (plane, train and bus) and under-samplin
 of the more popular mode, car. The level of service data was derived from highway
 and transport networks in Sydney, Melbourne, non-metropolitan N.S.W. and Victoria,
 including the Australian Capital Territory."""
+
 NOTE = """::

     Number of observations: 840 Observations On 4 Modes for 210 Individuals.
@@ -55,7 +60,7 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()


 def load_pandas():
@@ -67,4 +72,9 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx = 2, exog_idx=[3,4,5,6,7,8])
+
+
+def _get_data():
+    return du.load_csv(__file__, 'modechoice.csv', sep=';', convert_float=True)
diff --git a/statsmodels/datasets/nile/data.py b/statsmodels/datasets/nile/data.py
index 4108e99e0..85f3aa626 100644
--- a/statsmodels/datasets/nile/data.py
+++ b/statsmodels/datasets/nile/data.py
@@ -1,19 +1,26 @@
 """Nile River Flows."""
 import pandas as pd
+
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
-TITLE = 'Nile River flows at Ashwan 1871-1970'
-SOURCE = """
+
+COPYRIGHT   = """This is public domain."""
+TITLE       = """Nile River flows at Ashwan 1871-1970"""
+SOURCE      = """
 This data is first analyzed in:

     Cobb, G. W. 1978. "The Problem of the Nile: Conditional Solution to a
         Changepoint Problem." *Biometrika*. 65.2, 243-51.
 """
-DESCRSHORT = """This dataset contains measurements on the annual flow of
+
+DESCRSHORT  = """This dataset contains measurements on the annual flow of
 the Nile as measured at Ashwan for 100 years from 1871-1970."""
-DESCRLONG = DESCRSHORT + ' There is an apparent changepoint near 1898.'
-NOTE = """::
+
+DESCRLONG   = DESCRSHORT + " There is an apparent changepoint near 1898."
+
+#suggested notes
+NOTE        = """::

     Number of observations: 100
     Number of variables: 2
@@ -33,4 +40,16 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def load_pandas():
+    data = _get_data()
+    # TODO: time series
+    endog = pd.Series(data['volume'], index=data['year'].astype(int))
+    dataset = du.Dataset(data=data, names=list(data.columns), endog=endog, endog_name='volume')
+    return dataset
+
+
+def _get_data():
+    return du.load_csv(__file__, 'nile.csv').astype(float)
diff --git a/statsmodels/datasets/randhie/data.py b/statsmodels/datasets/randhie/data.py
index 9a52adddf..a82c4679a 100644
--- a/statsmodels/datasets/randhie/data.py
+++ b/statsmodels/datasets/randhie/data.py
@@ -1,9 +1,11 @@
 """RAND Health Insurance Experiment Data"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is in the public domain.'
-TITLE = __doc__
-SOURCE = """
+
+COPYRIGHT   = """This is in the public domain."""
+TITLE       = __doc__
+SOURCE      = """
 The data was collected by the RAND corporation as part of the Health
 Insurance Experiment (HIE).

@@ -20,9 +22,12 @@ See randhie/src for the original data and description.  The data included
 here contains only a subset of the original data.  The data varies slightly
 compared to that reported in Cameron and Trivedi.
 """
-DESCRSHORT = 'The RAND Co. Health Insurance Experiment Data'
-DESCRLONG = ''
-NOTE = """::
+
+DESCRSHORT  = """The RAND Co. Health Insurance Experiment Data"""
+
+DESCRLONG   = """"""
+
+NOTE        = """::

     Number of observations - 20,190
     Number of variables - 10
@@ -56,7 +61,7 @@ def load():
     endog - response variable, mdvis
     exog - design
     """
-    pass
+    return load_pandas()


 def load_pandas():
@@ -73,4 +78,8 @@ def load_pandas():
     endog - response variable, mdvis
     exog - design
     """
-    pass
+    return du.process_pandas(_get_data(), endog_idx=0)
+
+
+def _get_data():
+    return du.load_csv(__file__, 'randhie.csv')
diff --git a/statsmodels/datasets/scotland/data.py b/statsmodels/datasets/scotland/data.py
index 442f58ed0..1408d941f 100644
--- a/statsmodels/datasets/scotland/data.py
+++ b/statsmodels/datasets/scotland/data.py
@@ -1,16 +1,19 @@
 """Taxation Powers Vote for the Scottish Parliament 1997 dataset."""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = """Used with express permission from the original author,
+
+COPYRIGHT   = """Used with express permission from the original author,
 who retains all rights."""
-TITLE = 'Taxation Powers Vote for the Scottish Parliament 1997'
-SOURCE = """
+TITLE       = "Taxation Powers Vote for the Scottish Parliament 1997"
+SOURCE      = """
 Jeff Gill's `Generalized Linear Models: A Unified Approach`

 http://jgill.wustl.edu/research/books.html
 """
-DESCRSHORT = "Taxation Powers' Yes Vote for Scottish Parliamanet-1997"
-DESCRLONG = """
+DESCRSHORT  = """Taxation Powers' Yes Vote for Scottish Parliamanet-1997"""
+
+DESCRLONG   = """
 This data is based on the example in Gill and describes the proportion of
 voters who voted Yes to grant the Scottish Parliament taxation powers.
 The data are divided into 32 council districts.  This example's explanatory
@@ -24,7 +27,8 @@ between female unemployment and the council tax.
 The original source files and variable information are included in
 /scotland/src/
 """
-NOTE = """::
+
+NOTE        = """::

     Number of Observations - 32 (1 for each Scottish district)

@@ -58,7 +62,7 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()


 def load_pandas():
@@ -70,4 +74,11 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0)
+
+
+def _get_data():
+    data = du.load_csv(__file__, 'scotvote.csv')
+    data = data.iloc[:, 1:9]
+    return data.astype(float)
diff --git a/statsmodels/datasets/spector/data.py b/statsmodels/datasets/spector/data.py
index db96c6cc9..b187c4e1d 100644
--- a/statsmodels/datasets/spector/data.py
+++ b/statsmodels/datasets/spector/data.py
@@ -1,19 +1,24 @@
 """Spector and Mazzeo (1980) - Program Effectiveness Data"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = """Used with express permission of the original author, who
+
+COPYRIGHT   = """Used with express permission of the original author, who
 retains all rights. """
-TITLE = __doc__
-SOURCE = """
+TITLE       = __doc__
+SOURCE      = """
 http://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm

 The raw data was downloaded from Bill Greene's Econometric Analysis web site,
 though permission was obtained from the original researcher, Dr. Lee Spector,
 Professor of Economics, Ball State University."""
-DESCRSHORT = """Experimental data on the effectiveness of the personalized
+
+DESCRSHORT  = """Experimental data on the effectiveness of the personalized
 system of instruction (PSI) program"""
-DESCRLONG = DESCRSHORT
-NOTE = """::
+
+DESCRLONG   = DESCRSHORT
+
+NOTE        = """::

     Number of Observations - 32

@@ -38,7 +43,7 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()


 def load_pandas():
@@ -50,4 +55,12 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=3)
+
+
+def _get_data():
+    data = du.load_csv(__file__, 'spector.csv', sep=r'\s')
+    data = du.strip_column_names(data)
+    data = data.iloc[:, [1, 2, 3, 4]]
+    return data.astype(float)
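
Usage sketch for the spector loader (with endog_idx=3 the response is GRADE and the
remaining columns GPA, TUCE, PSI become exog); the usual downstream use is a binary Logit:

    import statsmodels.api as sm

    data = sm.datasets.spector.load_pandas()
    res = sm.Logit(data.endog, sm.add_constant(data.exog)).fit(disp=0)
    print(res.params)
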
diff --git a/statsmodels/datasets/stackloss/data.py b/statsmodels/datasets/stackloss/data.py
index 4a29df5ef..e9f057744 100644
--- a/statsmodels/datasets/stackloss/data.py
+++ b/statsmodels/datasets/stackloss/data.py
@@ -1,17 +1,22 @@
 """Stack loss data"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain. '
-TITLE = __doc__
-SOURCE = """
+
+COPYRIGHT   = """This is public domain. """
+TITLE       = __doc__
+SOURCE      = """
 Brownlee, K. A. (1965), "Statistical Theory and Methodology in
 Science and Engineering", 2nd edition, New York:Wiley.
 """
-DESCRSHORT = 'Stack loss plant data of Brownlee (1965)'
-DESCRLONG = """The stack loss plant data of Brownlee (1965) contains
+
+DESCRSHORT  = """Stack loss plant data of Brownlee (1965)"""
+
+DESCRLONG   = """The stack loss plant data of Brownlee (1965) contains
 21 days of measurements from a plant's oxidation of ammonia to nitric acid.
 The nitric oxide pollutants are captured in an absorption tower."""
-NOTE = """::
+
+NOTE        = """::

     Number of Observations - 21

@@ -36,8 +41,7 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
-
+    return load_pandas()

 def load_pandas():
     """
@@ -48,4 +52,9 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0)
+
+
+def _get_data():
+    return du.load_csv(__file__, 'stackloss.csv').astype(float)
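
Usage sketch for the stackloss loader; this dataset is commonly fed to robust linear
models, e.g. (illustrative only):

    import statsmodels.api as sm

    data = sm.datasets.stackloss.load_pandas()
    res = sm.RLM(data.endog, sm.add_constant(data.exog),
                 M=sm.robust.norms.HuberT()).fit()
    print(res.params)
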
diff --git a/statsmodels/datasets/star98/data.py b/statsmodels/datasets/star98/data.py
index f8cc70bcc..6f13c0f1e 100644
--- a/statsmodels/datasets/star98/data.py
+++ b/statsmodels/datasets/star98/data.py
@@ -1,16 +1,19 @@
 """Star98 Educational Testing dataset."""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = """Used with express permission from the original author,
+
+COPYRIGHT   = """Used with express permission from the original author,
 who retains all rights."""
-TITLE = 'Star98 Educational Dataset'
-SOURCE = """
+TITLE       = "Star98 Educational Dataset"
+SOURCE      = """
 Jeff Gill's `Generalized Linear Models: A Unified Approach`

 http://jgill.wustl.edu/research/books.html
 """
-DESCRSHORT = 'Math scores for 303 student with 10 explanatory factors'
-DESCRLONG = """
+DESCRSHORT  = """Math scores for 303 student with 10 explanatory factors"""
+
+DESCRLONG   = """
 This data is on the California education policy and outcomes (STAR program
 results for 1998.  The data measured standardized testing by the California
 Department of Education that required evaluation of 2nd - 11th grade students
@@ -21,7 +24,8 @@ over the national median value on the mathematics exam.

 The data used in this example is only a subset of the original source.
 """
-NOTE = """::
+
+NOTE        = """::

     Number of Observations - 303 (counties in California).

@@ -62,6 +66,7 @@ NOTE = """::
 """


+
 def load():
     """
     Load the star98 data and returns a Dataset class instance.
@@ -71,4 +76,26 @@ def load():
     Load instance:
         a class of the data with array attrbutes 'endog' and 'exog'
     """
-    pass
+    return load_pandas()
+
+
+def load_pandas():
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=['NABOVE', 'NBELOW'])
+
+
+def _get_data():
+    data = du.load_csv(__file__, 'star98.csv')
+    names = ["NABOVE","NBELOW","LOWINC","PERASIAN","PERBLACK","PERHISP",
+            "PERMINTE","AVYRSEXP","AVSALK","PERSPENK","PTRATIO","PCTAF",
+            "PCTCHRT","PCTYRRND","PERMINTE_AVYRSEXP","PERMINTE_AVSAL",
+            "AVYRSEXP_AVSAL","PERSPEN_PTRATIO","PERSPEN_PCTAF","PTRATIO_PCTAF",
+            "PERMINTE_AVYRSEXP_AVSAL","PERSPEN_PTRATIO_PCTAF"]
+    data.columns = names
+    nabove = data['NABOVE'].copy()
+    nbelow = data['NBELOW'].copy()
+
+    data['NABOVE'] = nbelow  # successes
+    data['NBELOW'] = nabove - nbelow  # now failures
+
+    return data
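
Because _get_data swaps the columns, endog holds (successes, failures) counts per county,
which is the two-column form a binomial GLM expects; a minimal sketch:

    import statsmodels.api as sm

    star = sm.datasets.star98.load_pandas()
    res = sm.GLM(star.endog, sm.add_constant(star.exog),
                 family=sm.families.Binomial()).fit()
    print(res.params.head())
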
diff --git a/statsmodels/datasets/statecrime/data.py b/statsmodels/datasets/statecrime/data.py
index ed68c7aa4..7d5530b8f 100644
--- a/statsmodels/datasets/statecrime/data.py
+++ b/statsmodels/datasets/statecrime/data.py
@@ -1,14 +1,20 @@
 """Statewide Crime Data"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'Public domain.'
-TITLE = 'Statewide Crime Data 2009'
-SOURCE = """
+
+COPYRIGHT   = """Public domain."""
+TITLE       = """Statewide Crime Data 2009"""
+SOURCE      = """
 All data is for 2009 and was obtained from the American Statistical Abstracts except as indicated below.
 """
-DESCRSHORT = 'State crime data 2009'
-DESCRLONG = DESCRSHORT
-NOTE = """::
+
+DESCRSHORT  = """State crime data 2009"""
+
+DESCRLONG   = DESCRSHORT
+
+#suggested notes
+NOTE        = """::

     Number of observations: 51
     Number of variables: 8
@@ -47,6 +53,11 @@ NOTE = """::
         Areas are area of 50,000 or more people."""


+def load_pandas():
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=2, exog_idx=[7, 4, 3, 5], index_idx=0)
+
+
 def load():
     """
     Load the statecrime data and return a Dataset class instance.
@@ -56,4 +67,8 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    return du.load_csv(__file__, 'statecrime.csv')
diff --git a/statsmodels/datasets/strikes/data.py b/statsmodels/datasets/strikes/data.py
index 3fb429f9f..4eaa15134 100644
--- a/statsmodels/datasets/strikes/data.py
+++ b/statsmodels/datasets/strikes/data.py
@@ -1,9 +1,11 @@
 """U.S. Strike Duration Data"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This is public domain.'
-TITLE = __doc__
-SOURCE = """
+
+COPYRIGHT   = """This is public domain."""
+TITLE       = __doc__
+SOURCE      = """
 This is a subset of the data used in Kennan (1985). It was originally
 published by the Bureau of Labor Statistics.

@@ -12,13 +14,17 @@ published by the Bureau of Labor Statistics.
     Kennan, J. 1985. "The duration of contract strikes in US manufacturing.
         `Journal of Econometrics` 28.1, 5-28.
 """
-DESCRSHORT = """Contains data on the length of strikes in US manufacturing and
+
+DESCRSHORT  = """Contains data on the length of strikes in US manufacturing and
 unanticipated industrial production."""
-DESCRLONG = """Contains data on the length of strikes in US manufacturing and
+
+DESCRLONG   = """Contains data on the length of strikes in US manufacturing and
 unanticipated industrial production. The data is a subset of the data originally
 used by Kennan. The data here is data for the months of June only to avoid
 seasonal issues."""
-NOTE = """::
+
+#suggested notes
+NOTE        = """::

     Number of observations - 62

@@ -31,6 +37,7 @@ NOTE = """::
 """


+
 def load_pandas():
     """
     Load the strikes data and return a Dataset class instance.
@@ -40,7 +47,8 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0)


 def load():
@@ -52,4 +60,8 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    return du.load_csv(__file__, 'strikes.csv').astype(float)
diff --git a/statsmodels/datasets/sunspots/data.py b/statsmodels/datasets/sunspots/data.py
index 950a6a6df..3164373f5 100644
--- a/statsmodels/datasets/sunspots/data.py
+++ b/statsmodels/datasets/sunspots/data.py
@@ -1,18 +1,23 @@
 """Yearly sunspots data 1700-2008"""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'This data is public domain.'
-TITLE = __doc__
-SOURCE = """
+
+COPYRIGHT   = """This data is public domain."""
+TITLE       = __doc__
+SOURCE      = """
 http://www.ngdc.noaa.gov/stp/solar/solarda3.html

 The original dataset contains monthly data on sunspot activity in the file
 ./src/sunspots_yearly.dat.  There is also sunspots_monthly.dat.
 """
-DESCRSHORT = """Yearly (1700-2008) data on sunspots from the National
+
+DESCRSHORT  = """Yearly (1700-2008) data on sunspots from the National
 Geophysical Data Center."""
-DESCRLONG = DESCRSHORT
-NOTE = """::
+
+DESCRLONG   = DESCRSHORT
+
+NOTE        = """::

     Number of Observations - 309 (Annual 1700 - 2008)
     Number of Variables - 1
@@ -24,6 +29,15 @@ NOTE = """::
 """


+def load_pandas():
+    data = _get_data()
+    # TODO: time series
+    endog = data.set_index(data.YEAR).SUNACTIVITY
+    dataset = du.Dataset(data=data, names=list(data.columns),
+                         endog=endog, endog_name='volume')
+    return dataset
+
+
 def load():
     """
     Load the yearly sunspot data and returns a data class.
@@ -39,4 +53,8 @@ def load():
     data, raw_data, and endog are all the same variable.  There is no exog
     attribute defined.
     """
-    pass
+    return load_pandas()
+
+
+def _get_data():
+    return du.load_csv(__file__, 'sunspots.csv').astype(float)
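
Usage sketch for the sunspots loader (per load_pandas above, endog is the SUNACTIVITY
series indexed by YEAR):

    import statsmodels.api as sm

    sun = sm.datasets.sunspots.load_pandas()
    print(sun.endog.tail())
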
diff --git a/statsmodels/datasets/template_data.py b/statsmodels/datasets/template_data.py
index b5f7aa02f..312b9eb45 100644
--- a/statsmodels/datasets/template_data.py
+++ b/statsmodels/datasets/template_data.py
@@ -1,16 +1,22 @@
 """Name of dataset."""
 from statsmodels.datasets import utils as du
+
 __docformat__ = 'restructuredtext'
-COPYRIGHT = 'E.g., This is public domain.'
-TITLE = 'Title of the dataset'
-SOURCE = """
+
+COPYRIGHT   = """E.g., This is public domain."""
+TITLE       = """Title of the dataset"""
+SOURCE      = """
 This section should provide a link to the original dataset if possible and
 attribution and correspondance information for the dataset's original author
 if so desired.
 """
-DESCRSHORT = 'A short description.'
-DESCRLONG = 'A longer description of the dataset.'
-NOTE = """
+
+DESCRSHORT  = """A short description."""
+
+DESCRLONG   = """A longer description of the dataset."""
+
+#suggested notes
+NOTE        = """
 ::

     Number of observations:
@@ -30,7 +36,7 @@ def load():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    return load_pandas()


 def load_pandas():
@@ -42,4 +48,9 @@ def load_pandas():
     Dataset
         See DATASET_PROPOSAL.txt for more information.
     """
-    pass
+    data = _get_data()
+    return du.process_pandas(data, endog_idx=0)
+
+
+def _get_data():
+    return du.load_csv(__file__, 'DatasetName.csv')
diff --git a/statsmodels/datasets/utils.py b/statsmodels/datasets/utils.py
index df046902a..09972f7ea 100644
--- a/statsmodels/datasets/utils.py
+++ b/statsmodels/datasets/utils.py
@@ -1,4 +1,5 @@
 from statsmodels.compat.python import lrange
+
 from io import StringIO
 from os import environ, makedirs
 from os.path import abspath, dirname, exists, expanduser, join
@@ -6,6 +7,7 @@ import shutil
 from urllib.error import HTTPError, URLError
 from urllib.parse import urljoin
 from urllib.request import urlopen
+
 import numpy as np
 from pandas import Index, read_csv, read_stata

@@ -37,19 +39,23 @@ def webuse(data, baseurl='https://www.stata-press.com/data/r11/', as_df=True):
     Make sure baseurl has trailing forward slash. Does not do any
     error checking in response URLs.
     """
-    pass
+    url = urljoin(baseurl, data+'.dta')
+    return read_stata(url)


 class Dataset(dict):
-
     def __init__(self, **kw):
+        # define some default attributes, so pylint can find them
         self.endog = None
         self.exog = None
         self.data = None
         self.names = None
+
         dict.__init__(self, kw)
         self.__dict__ = self
-        try:
+        # Some datasets have string variables. If you want a raw_data
+        # attribute you must create this in the dataset's load function.
+        try:  # some datasets have string variables
             self.raw_data = self.data.astype(float)
         except:
             pass
@@ -58,12 +64,70 @@ class Dataset(dict):
         return str(self.__class__)


+def process_pandas(data, endog_idx=0, exog_idx=None, index_idx=None):
+    names = data.columns
+
+    if isinstance(endog_idx, int):
+        endog_name = names[endog_idx]
+        endog = data[endog_name].copy()
+        if exog_idx is None:
+            exog = data.drop([endog_name], axis=1)
+        else:
+            exog = data[names[exog_idx]].copy()
+    else:
+        endog = data.loc[:, endog_idx].copy()
+        endog_name = list(endog.columns)
+        if exog_idx is None:
+            exog = data.drop(endog_name, axis=1)
+        elif isinstance(exog_idx, int):
+            exog = data[names[exog_idx]].copy()
+        else:
+            exog = data[names[exog_idx]].copy()
+
+    if index_idx is not None:  # NOTE: will have to be improved for dates
+        index = Index(data.iloc[:, index_idx])
+        endog.index = index
+        exog.index = index.copy()
+        data = data.set_index(names[index_idx])
+
+    exog_name = list(exog.columns)
+    dataset = Dataset(data=data, names=list(names), endog=endog,
+                      exog=exog, endog_name=endog_name, exog_name=exog_name)
+    return dataset
+
+
 def _maybe_reset_index(data):
     """
     All the Rdatasets have the integer row.labels from R if there is no
     real index. Strip this for a zero-based index
     """
-    pass
+    if data.index.equals(Index(lrange(1, len(data) + 1))):
+        data = data.reset_index(drop=True)
+    return data
+
+
+def _get_cache(cache):
+    if cache is False:
+        # do not do any caching or load from cache
+        cache = None
+    elif cache is True:  # use default dir for cache
+        cache = get_data_home(None)
+    else:
+        cache = get_data_home(cache)
+    return cache
+
+
+def _cache_it(data, cache_path):
+    import zlib
+    with open(cache_path, "wb") as zf:
+        zf.write(zlib.compress(data))
+
+
+def _open_cache(cache_path):
+    import zlib
+    # return as bytes object encoded in utf-8 for cross-compat of cached
+    with open(cache_path, 'rb') as zf:
+        return zlib.decompress(zf.read())


 def _urlopen_cached(url, cache):
@@ -72,10 +136,63 @@ def _urlopen_cached(url, cache):
     downloads the data and cache is not None then it will put the downloaded
     data in the cache path.
     """
-    pass
-
+    from_cache = False
+    if cache is not None:
+        file_name = url.split("://")[-1].replace('/', ',')
+        file_name = file_name.split('.')
+        if len(file_name) > 1:
+            file_name[-2] += '-v2'
+        else:
+            file_name[0] += '-v2'
+        file_name = '.'.join(file_name) + ".zip"
+        cache_path = join(cache, file_name)
+        try:
+            data = _open_cache(cache_path)
+            from_cache = True
+        except:
+            pass

-def get_rdataset(dataname, package='datasets', cache=False):
+    # not using the cache or did not find it in cache
+    if not from_cache:
+        data = urlopen(url, timeout=3).read()
+        if cache is not None:  # then put it in the cache
+            _cache_it(data, cache_path)
+    return data, from_cache
+
+
+def _get_data(base_url, dataname, cache, extension="csv"):
+    url = base_url + (dataname + ".%s") % extension
+    try:
+        data, from_cache = _urlopen_cached(url, cache)
+    except HTTPError as err:
+        if '404' in str(err):
+            raise ValueError("Dataset %s was not found." % dataname)
+        else:
+            raise err
+
+    data = data.decode('utf-8', 'strict')
+    return StringIO(data), from_cache
+
+
+def _get_dataset_meta(dataname, package, cache):
+    # get the index, you'll probably want this cached because you have
+    # to download info about all the data to get info about any of the data...
+    index_url = ("https://raw.githubusercontent.com/vincentarelbundock/"
+                 "Rdatasets/master/datasets.csv")
+    data, _ = _urlopen_cached(index_url, cache)
+    data = data.decode('utf-8', 'strict')
+    index = read_csv(StringIO(data))
+    idx = np.logical_and(index.Item == dataname, index.Package == package)
+    if not idx.any():
+        raise ValueError(
+            f"Item {dataname} from Package {package} was not found. Check "
+            f"the CSV file at {index_url} to verify the Item and Package."
+        )
+    dataset_meta = index.loc[idx]
+    return dataset_meta["Title"].iloc[0]
+
+
+def get_rdataset(dataname, package="datasets", cache=False):
     """download and return R dataset

     Parameters
@@ -111,7 +228,23 @@ def get_rdataset(dataname, package='datasets', cache=False):
     is checked to see if the data should be downloaded again or not. If the
     dataset is in the cache, it's used.
     """
-    pass
+    # NOTE: use raw github bc html site might not be most up to date
+    data_base_url = ("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/"
+                     "master/csv/"+package+"/")
+    docs_base_url = ("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/"
+                     "master/doc/"+package+"/rst/")
+    cache = _get_cache(cache)
+    data, from_cache = _get_data(data_base_url, dataname, cache)
+    data = read_csv(data, index_col=0)
+    data = _maybe_reset_index(data)
+
+    title = _get_dataset_meta(dataname, package, cache)
+    doc, _ = _get_data(docs_base_url, dataname, cache, "rst")
+
+    return Dataset(data=data, __doc__=doc.read(), package=package, title=title,
+                   from_cache=from_cache)
+
+# The functions below were taken from sklearn


 def get_data_home(data_home=None):
@@ -129,17 +262,29 @@ def get_data_home(data_home=None):

     If the folder does not already exist, it is automatically created.
     """
-    pass
+    if data_home is None:
+        data_home = environ.get('STATSMODELS_DATA',
+                                join('~', 'statsmodels_data'))
+    data_home = expanduser(data_home)
+    if not exists(data_home):
+        makedirs(data_home)
+    return data_home


 def clear_data_home(data_home=None):
     """Delete all the content of the data home cache."""
-    pass
+    data_home = get_data_home(data_home)
+    shutil.rmtree(data_home)


 def check_internet(url=None):
     """Check if internet is available"""
-    pass
+    url = "https://github.com" if url is None else url
+    try:
+        urlopen(url)
+    except URLError:
+        return False
+    return True


 def strip_column_names(df):
@@ -160,9 +305,28 @@ def strip_column_names(df):
     -----
     In-place modification
     """
-    pass
+    columns = []
+    for c in df:
+        if c.startswith('\'') and c.endswith('\''):
+            c = c[1:-1]
+        elif c.startswith('\''):
+            c = c[1:]
+        elif c.endswith('\''):
+            c = c[:-1]
+        columns.append(c)
+    df.columns = columns
+    return df


 def load_csv(base_file, csv_name, sep=',', convert_float=False):
     """Standard simple csv loader"""
-    pass
+    filepath = dirname(abspath(base_file))
+    filename = join(filepath, csv_name)
+    engine = 'python' if sep != ',' else 'c'
+    float_precision = {}
+    if engine == 'c':
+        float_precision = {'float_precision': 'high'}
+    data = read_csv(filename, sep=sep, engine=engine, **float_precision)
+    if convert_float:
+        data = data.astype(float)
+    return data
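
The helpers above download CSV/rst pairs from the vincentarelbundock/Rdatasets mirror and
optionally cache them under the statsmodels data home. A hedged sketch (needs network
access; Duncan/carData is just one item from that index):

    from statsmodels.datasets.utils import get_data_home, get_rdataset

    duncan = get_rdataset("Duncan", "carData", cache=True)
    print(duncan.title)
    print(duncan.data.head())
    print(get_data_home())  # cache location, defaults to ~/statsmodels_data
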
diff --git a/statsmodels/discrete/_diagnostics_count.py b/statsmodels/discrete/_diagnostics_count.py
index dde4c4222..3de5993f6 100644
--- a/statsmodels/discrete/_diagnostics_count.py
+++ b/statsmodels/discrete/_diagnostics_count.py
@@ -1,11 +1,15 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri Sep 15 12:53:45 2017

 Author: Josef Perktold
 """
+
 import numpy as np
 from scipy import stats
+
 import pandas as pd
+
 from statsmodels.stats.base import HolderTuple
 from statsmodels.discrete.discrete_model import Poisson
 from statsmodels.regression.linear_model import OLS
@@ -52,11 +56,27 @@ def _combine_bins(edge_index, x):
     >>> dia.combine_bins([0,1,3], np.arange(4))
     (array([0, 3]), array([1, 2]))
     """
-    pass
-
-
-def plot_probs(freq, probs_predicted, label='predicted', upp_xlim=None, fig
-    =None):
+    x = np.asarray(x)
+    if x.ndim == 1:
+        is_1d = True
+        x = x[None, :]
+    else:
+        is_1d = False
+    xli = []
+    kli = []
+    for bin_idx in range(len(edge_index) - 1):
+        i, j = edge_index[bin_idx : bin_idx + 2]
+        xli.append(x[:, i:j].sum(1))
+        kli.append(j - i)
+
+    x_new = np.column_stack(xli)
+    if is_1d:
+        x_new = x_new.squeeze()
+    return x_new, np.asarray(kli)
+
+
+def plot_probs(freq, probs_predicted, label='predicted', upp_xlim=None,
+               fig=None):
     """diagnostic plots for comparing two lists of discrete probabilities

     Parameters
@@ -84,7 +104,38 @@ def plot_probs(freq, probs_predicted, label='predicted', upp_xlim=None, fig
         The figure contains 3 subplot with probabilities, cumulative
         probabilities and a PP-plot
     """
-    pass
+
+    if isinstance(label, list):
+        label0, label1 = label
+    else:
+        label0, label1 = 'freq', label
+
+    if fig is None:
+        import matplotlib.pyplot as plt
+        fig = plt.figure(figsize=(8,12))
+    ax1 = fig.add_subplot(311)
+    ax1.plot(freq, '-o', label=label0)
+    ax1.plot(probs_predicted, '-d', label=label1)
+    if upp_xlim is not None:
+        ax1.set_xlim(0, upp_xlim)
+    ax1.legend()
+    ax1.set_title('probabilities')
+
+    ax2 = fig.add_subplot(312)
+    ax2.plot(np.cumsum(freq), '-o', label=label0)
+    ax2.plot(np.cumsum(probs_predicted), '-d', label=label1)
+    if upp_xlim is not None:
+        ax2.set_xlim(0, upp_xlim)
+    ax2.legend()
+    ax2.set_title('cumulative probabilities')
+
+    ax3 = fig.add_subplot(313)
+    ax3.plot(np.cumsum(probs_predicted), np.cumsum(freq), 'o')
+    ax3.plot(np.arange(len(freq)) / len(freq), np.arange(len(freq)) / len(freq))
+    ax3.set_title('PP-plot')
+    ax3.set_xlabel(label1)
+    ax3.set_ylabel(label0)
+    return fig


 def test_chisquare_prob(results, probs, bin_edges=None, method=None):
@@ -140,14 +191,58 @@ def test_chisquare_prob(results, probs, bin_edges=None, method=None):
     .. [3] Manjón, M., and O. Martínez. 2014. “The Chi-Squared Goodness-of-Fit
            Test for Count-Data Models.” Stata Journal 14 (4): 798–816.
     """
-    pass
+    res = results
+    score_obs = results.model.score_obs(results.params)
+    d_ind = (res.model.endog[:, None] == np.arange(probs.shape[1])).astype(int)
+    if bin_edges is not None:
+        d_ind_bins, k_bins = _combine_bins(bin_edges, d_ind)
+        probs_bins, k_bins = _combine_bins(bin_edges, probs)
+        k_bins = probs_bins.shape[-1]
+    else:
+        d_ind_bins, k_bins = d_ind, d_ind.shape[1]
+        probs_bins = probs
+    diff1 = d_ind_bins - probs_bins
+    # diff2 = (1 - d_ind.sum(1)) - (1 - probs_bins.sum(1))
+    x_aux = np.column_stack((score_obs, diff1[:, :-1]))  # diff2))
+    nobs = x_aux.shape[0]
+    res_aux = OLS(np.ones(nobs), x_aux).fit()
+
+    chi2_stat = nobs * (1 - res_aux.ssr / res_aux.uncentered_tss)
+    df = res_aux.model.rank - score_obs.shape[1]
+    if df < k_bins - 1:
+        # not a problem in general, but it can be for OPG version
+        import warnings
+        # TODO: Warning shows up in Monte Carlo loop, skip for now
+        warnings.warn('auxiliary model is rank deficient')
+
+    statistic = chi2_stat
+    pvalue = stats.chi2.sf(chi2_stat, df)
+
+    res = HolderTuple(
+        statistic=statistic,
+        pvalue=pvalue,
+        df=df,
+        diff1=diff1,
+        res_aux=res_aux,
+        distribution="chi2",
+        )
+    return res


 class DispersionResults(HolderTuple):
-    pass
+
+    def summary_frame(self):
+        frame = pd.DataFrame({
+            "statistic": self.statistic,
+            "pvalue": self.pvalue,
+            "method": self.method,
+            "alternative": self.alternative
+            })
+
+        return frame


-def test_poisson_dispersion(results, method='all', _old=False):
+def test_poisson_dispersion(results, method="all", _old=False):
     """Score/LM type tests for Poisson variance assumptions

     Null Hypothesis is
@@ -178,12 +273,92 @@ def test_poisson_dispersion(results, method='all', _old=False):
         summary_frame method that returns the results as pandas DataFrame.

     """
-    pass

-
-def _test_poisson_dispersion_generic(results, exog_new_test,
-    exog_new_control=None, include_score=False, use_endog=True, cov_type=
-    'HC3', cov_kwds=None, use_t=False):
+    if method not in ["all"]:
+        raise ValueError(f'unknown method "{method}"')
+
+    if hasattr(results, '_results'):
+        results = results._results
+
+    endog = results.model.endog
+    nobs = endog.shape[0]  # TODO: use attribute, may need to be added
+    fitted = results.predict()
+    # fitted = results.fittedvalues  # discrete has linear prediction
+    # this assumes Poisson
+    resid2 = results.resid_response**2
+    var_resid_endog = (resid2 - endog)
+    var_resid_fitted = (resid2 - fitted)
+    std1 = np.sqrt(2 * (fitted**2).sum())
+
+    var_resid_endog_sum = var_resid_endog.sum()
+    dean_a = var_resid_fitted.sum() / std1
+    dean_b = var_resid_endog_sum / std1
+    dean_c = (var_resid_endog / fitted).sum() / np.sqrt(2 * nobs)
+
+    pval_dean_a = 2 * stats.norm.sf(np.abs(dean_a))
+    pval_dean_b = 2 * stats.norm.sf(np.abs(dean_b))
+    pval_dean_c = 2 * stats.norm.sf(np.abs(dean_c))
+
+    results_all = [[dean_a, pval_dean_a],
+                   [dean_b, pval_dean_b],
+                   [dean_c, pval_dean_c]]
+    description = [['Dean A', 'mu (1 + a mu)'],
+                   ['Dean B', 'mu (1 + a mu)'],
+                   ['Dean C', 'mu (1 + a)']]
+
+    # Cameron Trivedi auxiliary regression page 78 count book 1989
+    endog_v = var_resid_endog / fitted
+    res_ols_nb2 = OLS(endog_v, fitted).fit(use_t=False)
+    stat_ols_nb2 = res_ols_nb2.tvalues[0]
+    pval_ols_nb2 = res_ols_nb2.pvalues[0]
+    results_all.append([stat_ols_nb2, pval_ols_nb2])
+    description.append(['CT nb2', 'mu (1 + a mu)'])
+
+    res_ols_nb1 = OLS(endog_v, np.ones(len(endog_v))).fit(use_t=False)
+    stat_ols_nb1 = res_ols_nb1.tvalues[0]
+    pval_ols_nb1 = res_ols_nb1.pvalues[0]
+    results_all.append([stat_ols_nb1, pval_ols_nb1])
+    description.append(['CT nb1', 'mu (1 + a)'])
+
+    endog_v = var_resid_endog / fitted
+    res_ols_nb2 = OLS(endog_v, fitted).fit(cov_type='HC3', use_t=False)
+    stat_ols_hc1_nb2 = res_ols_nb2.tvalues[0]
+    pval_ols_hc1_nb2 = res_ols_nb2.pvalues[0]
+    results_all.append([stat_ols_hc1_nb2, pval_ols_hc1_nb2])
+    description.append(['CT nb2 HC3', 'mu (1 + a mu)'])
+
+    res_ols_nb1 = OLS(endog_v, np.ones(len(endog_v))).fit(cov_type='HC3',
+                                                          use_t=False)
+    stat_ols_hc1_nb1 = res_ols_nb1.tvalues[0]
+    pval_ols_hc1_nb1 = res_ols_nb1.pvalues[0]
+    results_all.append([stat_ols_hc1_nb1, pval_ols_hc1_nb1])
+    description.append(['CT nb1 HC3', 'mu (1 + a)'])
+
+    results_all = np.array(results_all)
+    if _old:
+        # for backwards compatibility in 0.14, remove in later versions
+        return results_all, description
+    else:
+        res = DispersionResults(
+            statistic=results_all[:, 0],
+            pvalue=results_all[:, 1],
+            method=[i[0] for i in description],
+            alternative=[i[1] for i in description],
+            name="Poisson Dispersion Test"
+            )
+        return res
+
+
+def _test_poisson_dispersion_generic(
+        results,
+        exog_new_test,
+        exog_new_control=None,
+        include_score=False,
+        use_endog=True,
+        cov_type='HC3',
+        cov_kwds=None,
+        use_t=False
+        ):
     """A variable addition test for the variance function

     This uses an artificial regression to calculate a variant of an LM or
@@ -193,7 +368,57 @@ def _test_poisson_dispersion_generic(results, exog_new_test,

     Warning: insufficiently tested, especially for options
     """
-    pass
+
+    if hasattr(results, '_results'):
+        results = results._results
+
+    endog = results.model.endog
+    nobs = endog.shape[0]   # TODO: use attribute, may need to be added
+    # fitted = results.fittedvalues  # generic has linpred as fittedvalues
+    fitted = results.predict()
+    resid2 = results.resid_response**2
+    # the following assumes Poisson
+    if use_endog:
+        var_resid = (resid2 - endog)
+    else:
+        var_resid = (resid2 - fitted)
+
+    endog_v = var_resid / fitted
+
+    k_constraints = exog_new_test.shape[1]
+    ex_list = [exog_new_test]
+    if include_score:
+        score_obs = results.model.score_obs(results.params)
+        ex_list.append(score_obs)
+
+    if exog_new_control is not None:
+        ex_list.append(exog_new_control)
+
+    if len(ex_list) > 1:
+        ex = np.column_stack(ex_list)
+        use_wald = True
+    else:
+        ex = ex_list[0]  # no control variables in exog
+        use_wald = False
+
+    res_ols = OLS(endog_v, ex).fit(cov_type=cov_type, cov_kwds=cov_kwds,
+                                   use_t=use_t)
+
+    if use_wald:
+        # we have controls and need to test coefficients
+        k_vars = ex.shape[1]
+        constraints = np.eye(k_constraints, k_vars)
+        ht = res_ols.wald_test(constraints)
+        stat_ols = ht.statistic
+        pval_ols = ht.pvalue
+    else:
+        # we do not have controls and can use overall fit
+        nobs = endog_v.shape[0]
+        rsquared_noncentered = 1 - res_ols.ssr/res_ols.uncentered_tss
+        stat_ols = nobs * rsquared_noncentered
+        pval_ols = stats.chi2.sf(stat_ols, k_constraints)
+
+    return stat_ols, pval_ols


 def test_poisson_zeroinflation_jh(results_poisson, exog_infl=None):
@@ -245,7 +470,45 @@ def test_poisson_zeroinflation_jh(results_poisson, exog_infl=None):
            Poisson Models.” Computational Statistics & Data Analysis 40 (1):
            75–96. https://doi.org/10.1016/S0167-9473(01)00104-9.
     """
-    pass
+    if not isinstance(results_poisson.model, Poisson):
+        # GLM Poisson would be also valid, not tried
+        import warnings
+        warnings.warn('Test is only valid if model is Poisson')
+
+    nobs = results_poisson.model.endog.shape[0]
+
+    if exog_infl is None:
+        exog_infl = np.ones((nobs, 1))
+
+
+    endog = results_poisson.model.endog
+    exog = results_poisson.model.exog
+
+    mu = results_poisson.predict()
+    prob_zero = np.exp(-mu)
+
+    cov_poi = results_poisson.cov_params()
+    cross_derivative = (exog_infl.T * (-mu)).dot(exog).T
+    cov_infl = (exog_infl.T * ((1 - prob_zero) / prob_zero)).dot(exog_infl)
+    score_obs_infl = exog_infl * (((endog == 0) - prob_zero) / prob_zero)[:,None]
+    #score_obs_infl = exog_infl * ((endog == 0) * (1 - prob_zero) / prob_zero - (endog>0))[:,None] #same
+    score_infl = score_obs_infl.sum(0)
+    cov_score_infl = cov_infl - cross_derivative.T.dot(cov_poi).dot(cross_derivative)
+    cov_score_infl_inv = np.linalg.pinv(cov_score_infl)
+
+    statistic = score_infl.dot(cov_score_infl_inv).dot(score_infl)
+    df2 = np.linalg.matrix_rank(cov_score_infl)  # more general, maybe not needed
+    df = exog_infl.shape[1]
+    pvalue = stats.chi2.sf(statistic, df)
+
+    res = HolderTuple(
+        statistic=statistic,
+        pvalue=pvalue,
+        df=df,
+        rank_score=df2,
+        distribution="chi2",
+        )
+    return res


 def test_poisson_zeroinflation_broek(results_poisson):
@@ -267,7 +530,31 @@ def test_poisson_zeroinflation_broek(results_poisson):
            https://doi.org/10.2307/2532959.

     """
-    pass
+
+    mu = results_poisson.predict()
+    prob_zero = np.exp(-mu)
+    endog = results_poisson.model.endog
+    # nobs = len(endog)
+    # score =  ((endog == 0) / prob_zero).sum() - nobs
+    # var_score = (1 / prob_zero).sum() - nobs - endog.sum()
+    score = (((endog == 0) - prob_zero) / prob_zero).sum()
+    var_score = ((1 - prob_zero) / prob_zero).sum() - endog.sum()
+    statistic = score / np.sqrt(var_score)
+    pvalue_two = 2 * stats.norm.sf(np.abs(statistic))
+    pvalue_upp = stats.norm.sf(statistic)
+    pvalue_low = stats.norm.cdf(statistic)
+
+    res = HolderTuple(
+        statistic=statistic,
+        pvalue=pvalue_two,
+        pvalue_smaller=pvalue_upp,
+        pvalue_larger=pvalue_low,
+        chi2=statistic**2,
+        pvalue_chi2=stats.chi2.sf(statistic**2, 1),
+        df_chi2=1,
+        distribution="normal",
+        )
+    return res


 def test_poisson_zeros(results):
@@ -289,4 +576,31 @@ def test_poisson_zeros(results):
            https://doi.org/10.1177/0962280217749991.

     """
-    pass
+    x = results.model.exog
+    mean = results.predict()
+    prob0 = np.exp(-mean)
+    counts = (results.model.endog == 0).astype(int)
+    diff = counts.sum() - prob0.sum()
+    var1 = prob0 @ (1 - prob0)
+    pm = prob0 * mean
+    c = np.linalg.inv(x.T * mean @ x)
+    pmx = pm @ x
+    var2 = pmx @ c @ pmx
+    var = var1 - var2
+    statistic = diff / np.sqrt(var)
+
+    pvalue_two = 2 * stats.norm.sf(np.abs(statistic))
+    pvalue_upp = stats.norm.sf(statistic)
+    pvalue_low = stats.norm.cdf(statistic)
+
+    res = HolderTuple(
+        statistic=statistic,
+        pvalue=pvalue_two,
+        pvalue_smaller=pvalue_upp,
+        pvalue_larger=pvalue_low,
+        chi2=statistic**2,
+        pvalue_chi2=stats.chi2.sf(statistic**2, 1),
+        df_chi2=1,
+        distribution="normal",
+        )
+    return res
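
A minimal synthetic-data sketch of the dispersion test restored above (the import path is
the private module patched in this file):

    import numpy as np
    from statsmodels.discrete.discrete_model import Poisson
    from statsmodels.discrete._diagnostics_count import test_poisson_dispersion

    rng = np.random.default_rng(12345)
    exog = np.column_stack([np.ones(500), rng.normal(size=500)])
    endog = rng.poisson(np.exp(exog @ np.r_[0.5, 0.3]))
    res = Poisson(endog, exog).fit(disp=0)
    print(test_poisson_dispersion(res).summary_frame())
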
diff --git a/statsmodels/discrete/conditional_models.py b/statsmodels/discrete/conditional_models.py
index ee642cef8..cbed6c23c 100644
--- a/statsmodels/discrete/conditional_models.py
+++ b/statsmodels/discrete/conditional_models.py
@@ -1,11 +1,13 @@
 """
 Conditional logistic, Poisson, and multinomial logit regression
 """
+
 import numpy as np
 import statsmodels.base.model as base
 import statsmodels.regression.linear_model as lm
 import statsmodels.base.wrapper as wrap
-from statsmodels.discrete.discrete_model import MultinomialResults, MultinomialResultsWrapper
+from statsmodels.discrete.discrete_model import (MultinomialResults,
+      MultinomialResultsWrapper)
 import collections
 import warnings
 import itertools
@@ -14,33 +16,40 @@ import itertools
 class _ConditionalModel(base.LikelihoodModel):

     def __init__(self, endog, exog, missing='none', **kwargs):
-        if 'groups' not in kwargs:
+
+        if "groups" not in kwargs:
             raise ValueError("'groups' is a required argument")
-        groups = kwargs['groups']
+        groups = kwargs["groups"]
+
         if groups.size != endog.size:
             msg = "'endog' and 'groups' should have the same dimensions"
             raise ValueError(msg)
+
         if exog.shape[0] != endog.size:
-            msg = (
-                "The leading dimension of 'exog' should equal the length of 'endog'"
-                )
+            msg = "The leading dimension of 'exog' should equal the length of 'endog'"
             raise ValueError(msg)
-        super(_ConditionalModel, self).__init__(endog, exog, missing=
-            missing, **kwargs)
+
+        super(_ConditionalModel, self).__init__(
+            endog, exog, missing=missing, **kwargs)
+
         if self.data.const_idx is not None:
-            msg = (
-                'Conditional models should not have an intercept in the ' +
-                'design matrix')
+            msg = ("Conditional models should not have an intercept in the " +
+                  "design matrix")
             raise ValueError(msg)
+
         exog = self.exog
         self.k_params = exog.shape[1]
+
+        # Get the row indices for each group
         row_ix = {}
         for i, g in enumerate(groups):
             if g not in row_ix:
                 row_ix[g] = []
             row_ix[g].append(i)
+
+        # Split the data into groups and remove groups with no variation
         endog, exog = np.asarray(endog), np.asarray(exog)
-        offset = kwargs.get('offset')
+        offset = kwargs.get("offset")
         self._endog_grp = []
         self._exog_grp = []
         self._groupsize = []
@@ -64,23 +73,73 @@ class _ConditionalModel(base.LikelihoodModel):
             self._groupsize.append(len(y))
             self._exog_grp.append(exog[ix, :])
             self._sumy.append(np.sum(y))
+
         if drops[0] > 0:
-            msg = ('Dropped %d groups and %d observations for having ' +
-                'no within-group variance') % tuple(drops)
+            msg = ("Dropped %d groups and %d observations for having " +
+                   "no within-group variance") % tuple(drops)
             warnings.warn(msg)
+
+        # This can be pre-computed
         if offset is not None:
             self._endofs = []
             for k, ofs in enumerate(self._offset_grp):
                 self._endofs.append(np.dot(self._endog_grp[k], ofs))
+
+        # Number of groups
         self._n_groups = len(self._endog_grp)
+
+        # These are the sufficient statistics
         self._xy = []
         self._n1 = []
         for g in range(self._n_groups):
             self._xy.append(np.dot(self._endog_grp[g], self._exog_grp[g]))
             self._n1.append(np.sum(self._endog_grp[g]))

-    def fit_regularized(self, method='elastic_net', alpha=0.0, start_params
-        =None, refit=False, **kwargs):
+    def hessian(self, params):
+
+        from statsmodels.tools.numdiff import approx_fprime
+        hess = approx_fprime(params, self.score)
+        hess = np.atleast_2d(hess)
+        return hess
+
+    def fit(self,
+            start_params=None,
+            method='BFGS',
+            maxiter=100,
+            full_output=True,
+            disp=False,
+            fargs=(),
+            callback=None,
+            retall=False,
+            skip_hessian=False,
+            **kwargs):
+
+        rslt = super(_ConditionalModel, self).fit(
+            start_params=start_params,
+            method=method,
+            maxiter=maxiter,
+            full_output=full_output,
+            disp=disp,
+            skip_hessian=skip_hessian)
+
+        crslt = ConditionalResults(self, rslt.params, rslt.cov_params(), 1)
+        crslt.method = method
+        crslt.nobs = self.nobs
+        crslt.n_groups = self._n_groups
+        crslt._group_stats = [
+            "%d" % min(self._groupsize),
+            "%d" % max(self._groupsize),
+            "%.1f" % np.mean(self._groupsize)
+        ]
+        rslt = ConditionalResultsWrapper(crslt)
+        return rslt
+
+    def fit_regularized(self,
+                        method="elastic_net",
+                        alpha=0.,
+                        start_params=None,
+                        refit=False,
+                        **kwargs):
         """
         Return a regularized fit to a linear regression model.

@@ -107,7 +166,48 @@ class _ConditionalModel(base.LikelihoodModel):
         Results
             A results instance.
         """
-        pass
+
+        from statsmodels.base.elastic_net import fit_elasticnet
+
+        if method != "elastic_net":
+            raise ValueError("method for fit_regularized must be elastic_net")
+
+        defaults = {"maxiter": 50, "L1_wt": 1, "cnvrg_tol": 1e-10,
+                    "zero_tol": 1e-10}
+        defaults.update(kwargs)
+
+        return fit_elasticnet(self, method=method,
+                              alpha=alpha,
+                              start_params=start_params,
+                              refit=refit,
+                              **defaults)
+
+    # Override to allow groups to be passed as a variable name.
+    @classmethod
+    def from_formula(cls,
+                     formula,
+                     data,
+                     subset=None,
+                     drop_cols=None,
+                     *args,
+                     **kwargs):
+
+        try:
+            groups = kwargs["groups"]
+            del kwargs["groups"]
+        except KeyError:
+            raise ValueError("'groups' is a required argument")
+
+        if isinstance(groups, str):
+            groups = data[groups]
+
+        if "0+" not in formula.replace(" ", ""):
+            warnings.warn("Conditional models should not include an intercept")
+
+        model = super(_ConditionalModel, cls).from_formula(
+            formula, data=data, groups=groups, *args, **kwargs)
+
+        return model


 class ConditionalLogit(_ConditionalModel):
@@ -131,12 +231,121 @@ class ConditionalLogit(_ConditionalModel):
     """

     def __init__(self, endog, exog, missing='none', **kwargs):
-        super(ConditionalLogit, self).__init__(endog, exog, missing=missing,
-            **kwargs)
+
+        super(ConditionalLogit, self).__init__(
+            endog, exog, missing=missing, **kwargs)
+
         if np.any(np.unique(self.endog) != np.r_[0, 1]):
-            msg = 'endog must be coded as 0, 1'
+            msg = "endog must be coded as 0, 1"
             raise ValueError(msg)
+
         self.K = self.exog.shape[1]
+        # i.e. self.k_params, for compatibility with MNLogit
+
+    def loglike(self, params):
+
+        ll = 0
+        for g in range(len(self._endog_grp)):
+            ll += self.loglike_grp(g, params)
+
+        return ll
+
+    def score(self, params):
+
+        score = 0
+        for g in range(self._n_groups):
+            score += self.score_grp(g, params)
+
+        return score
+
+    def _denom(self, grp, params, ofs=None):
+
+        if ofs is None:
+            ofs = 0
+
+        exb = np.exp(np.dot(self._exog_grp[grp], params) + ofs)
+
+        # In the recursions, f may be called multiple times with the
+        # same arguments, so we memoize the results.
+        memo = {}
+
+        def f(t, k):
+            if t < k:
+                return 0
+            if k == 0:
+                return 1
+
+            try:
+                return memo[(t, k)]
+            except KeyError:
+                pass
+
+            v = f(t - 1, k) + f(t - 1, k - 1) * exb[t - 1]
+            memo[(t, k)] = v
+
+            return v
+
+        return f(self._groupsize[grp], self._n1[grp])
+
+    def _denom_grad(self, grp, params, ofs=None):
+
+        if ofs is None:
+            ofs = 0
+
+        ex = self._exog_grp[grp]
+        exb = np.exp(np.dot(ex, params) + ofs)
+
+        # s may be called multiple times in the recursions with the
+        # same arguments, so memoize the results.
+        memo = {}
+
+        def s(t, k):
+
+            if t < k:
+                return 0, np.zeros(self.k_params)
+            if k == 0:
+                return 1, 0
+
+            try:
+                return memo[(t, k)]
+            except KeyError:
+                pass
+
+            h = exb[t - 1]
+            a, b = s(t - 1, k)
+            c, e = s(t - 1, k - 1)
+            d = c * h * ex[t - 1, :]
+
+            u, v = a + c * h, b + d + e * h
+            memo[(t, k)] = (u, v)
+
+            return u, v
+
+        return s(self._groupsize[grp], self._n1[grp])
+
+    def loglike_grp(self, grp, params):
+
+        ofs = None
+        if hasattr(self, 'offset'):
+            ofs = self._offset_grp[grp]
+
+        llg = np.dot(self._xy[grp], params)
+
+        if ofs is not None:
+            llg += self._endofs[grp]
+
+        llg -= np.log(self._denom(grp, params, ofs))
+
+        return llg
+
+    def score_grp(self, grp, params):
+
+        ofs = 0
+        if hasattr(self, 'offset'):
+            ofs = self._offset_grp[grp]
+
+        d, h = self._denom_grad(grp, params, ofs)
+        return self._xy[grp] - h / d


 class ConditionalPoisson(_ConditionalModel):
@@ -158,14 +367,60 @@ class ConditionalPoisson(_ConditionalModel):
         Codes defining the groups. This is a required keyword parameter.
     """

+    def loglike(self, params):

-class ConditionalResults(base.LikelihoodModelResults):
+        ofs = None
+        if hasattr(self, 'offset'):
+            ofs = self._offset_grp
+
+        ll = 0.0

+        for i in range(len(self._endog_grp)):
+
+            xb = np.dot(self._exog_grp[i], params)
+            if ofs is not None:
+                xb += ofs[i]
+            exb = np.exp(xb)
+            y = self._endog_grp[i]
+            ll += np.dot(y, xb)
+            s = exb.sum()
+            ll -= self._sumy[i] * np.log(s)
+
+        return ll
+
+    def score(self, params):
+
+        ofs = None
+        if hasattr(self, 'offset'):
+            ofs = self._offset_grp
+
+        score = 0.0
+
+        for i in range(len(self._endog_grp)):
+
+            x = self._exog_grp[i]
+            xb = np.dot(x, params)
+            if ofs is not None:
+                xb += ofs[i]
+            exb = np.exp(xb)
+            s = exb.sum()
+            y = self._endog_grp[i]
+            score += np.dot(y, x)
+            score -= self._sumy[i] * np.dot(exb, x) / s
+
+        return score
+
+
+class ConditionalResults(base.LikelihoodModelResults):
     def __init__(self, model, params, normalized_cov_params, scale):
-        super(ConditionalResults, self).__init__(model, params,
-            normalized_cov_params=normalized_cov_params, scale=scale)

-    def summary(self, yname=None, xname=None, title=None, alpha=0.05):
+        super(ConditionalResults, self).__init__(
+            model,
+            params,
+            normalized_cov_params=normalized_cov_params,
+            scale=scale)
+
+    def summary(self, yname=None, xname=None, title=None, alpha=.05):
         """
         Summarize the fitted model.

@@ -193,8 +448,41 @@ class ConditionalResults(base.LikelihoodModelResults):
         statsmodels.iolib.summary.Summary : class to hold summary
             results
         """
-        pass

+        top_left = [
+            ('Dep. Variable:', None),
+            ('Model:', None),
+            ('Log-Likelihood:', None),
+            ('Method:', [self.method]),
+            ('Date:', None),
+            ('Time:', None),
+        ]
+
+        top_right = [
+            ('No. Observations:', None),
+            ('No. groups:', [self.n_groups]),
+            ('Min group size:', [self._group_stats[0]]),
+            ('Max group size:', [self._group_stats[1]]),
+            ('Mean group size:', [self._group_stats[2]]),
+        ]
+
+        if title is None:
+            title = "Conditional Logit Model Regression Results"
+
+        # create summary tables
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        smry.add_table_2cols(
+            self,
+            gleft=top_left,
+            gright=top_right,  # [],
+            yname=yname,
+            xname=xname,
+            title=title)
+        smry.add_table_params(
+            self, yname=yname, xname=xname, alpha=alpha, use_t=self.use_t)
+
+        return smry

 class ConditionalMNLogit(_ConditionalModel):
     """
@@ -222,18 +510,24 @@ class ConditionalMNLogit(_ConditionalModel):
     """

     def __init__(self, endog, exog, missing='none', **kwargs):
-        super(ConditionalMNLogit, self).__init__(endog, exog, missing=
-            missing, **kwargs)
+
+        super(ConditionalMNLogit, self).__init__(
+            endog, exog, missing=missing, **kwargs)
+
+        # endog must be integers
         self.endog = self.endog.astype(int)
+
         self.k_cat = self.endog.max() + 1
         self.df_model = (self.k_cat - 1) * self.exog.shape[1]
         self.df_resid = self.nobs - self.df_model
         self._ynames_map = {j: str(j) for j in range(self.k_cat)}
-        self.J = self.k_cat
-        self.K = self.exog.shape[1]
+        self.J = self.k_cat  # Unfortunate name, needed for results
+        self.K = self.exog.shape[1]  # for compatibility with MNLogit
+
         if self.endog.min() < 0:
-            msg = 'endog may not contain negative values'
+            msg = "endog may not contain negative values"
             raise ValueError(msg)
+
         grx = collections.defaultdict(list)
         for k, v in enumerate(self.groups):
             grx[v].append(k)
@@ -241,6 +535,97 @@ class ConditionalMNLogit(_ConditionalModel):
         self._group_labels.sort()
         self._grp_ix = [grx[k] for k in self._group_labels]

+    def fit(self,
+            start_params=None,
+            method='BFGS',
+            maxiter=100,
+            full_output=True,
+            disp=False,
+            fargs=(),
+            callback=None,
+            retall=False,
+            skip_hessian=False,
+            **kwargs):
+
+        if start_params is None:
+            q = self.exog.shape[1]
+            c = self.k_cat - 1
+            start_params = np.random.normal(size=q * c)
+
+        # Do not call super(...).fit because it cannot handle the 2d-params.
+        rslt = base.LikelihoodModel.fit(
+            self,
+            start_params=start_params,
+            method=method,
+            maxiter=maxiter,
+            full_output=full_output,
+            disp=disp,
+            skip_hessian=skip_hessian)
+
+        rslt.params = rslt.params.reshape((self.exog.shape[1], -1))
+        rslt = MultinomialResults(self, rslt)
+
+        # Not clear what the null likelihood should be, there is no intercept
+        # so the null model is not clearly defined.  This is needed for summary
+        # to work.
+        rslt.set_null_options(llnull=np.nan)
+
+        return MultinomialResultsWrapper(rslt)
+
+    def loglike(self, params):
+
+        q = self.exog.shape[1]
+        c = self.k_cat - 1
+
+        pmat = params.reshape((q, c))
+        pmat = np.concatenate((np.zeros((q, 1)), pmat), axis=1)
+        lpr = np.dot(self.exog, pmat)
+
+        ll = 0.0
+        for ii in self._grp_ix:
+            x = lpr[ii, :]
+            jj = np.arange(x.shape[0], dtype=int)
+            y = self.endog[ii]
+            denom = 0.0
+            for p in itertools.permutations(y):
+                denom += np.exp(x[(jj, p)].sum())
+            ll += x[(jj, y)].sum() - np.log(denom)
+
+        return ll
+
+
+    def score(self, params):
+
+        q = self.exog.shape[1]
+        c = self.k_cat - 1
+
+        pmat = params.reshape((q, c))
+        pmat = np.concatenate((np.zeros((q, 1)), pmat), axis=1)
+        lpr = np.dot(self.exog, pmat)
+
+        grad = np.zeros((q, c))
+        for ii in self._grp_ix:
+            x = lpr[ii, :]
+            jj = np.arange(x.shape[0], dtype=int)
+            y = self.endog[ii]
+            denom = 0.0
+            denomg = np.zeros((q, c))
+            for p in itertools.permutations(y):
+                v = np.exp(x[(jj, p)].sum())
+                denom += v
+                for i, r in enumerate(p):
+                    if r != 0:
+                        denomg[:, r - 1] += v * self.exog[ii[i], :]
+
+            for i, r in enumerate(y):
+                if r != 0:
+                    grad[:, r - 1] += self.exog[ii[i], :]
+
+            grad -= denomg / denom
+
+        return grad.flatten()
+
+

 class ConditionalResultsWrapper(lm.RegressionResultsWrapper):
     pass
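
A minimal synthetic-data sketch of ConditionalLogit as implemented above (groups is a
required keyword and the design matrix must not contain an intercept):

    import numpy as np
    from statsmodels.discrete.conditional_models import ConditionalLogit

    rng = np.random.default_rng(0)
    groups = np.repeat(np.arange(50), 4)          # 50 groups of size 4
    exog = rng.normal(size=(groups.size, 2))      # no constant column
    endog = (exog[:, 0] + rng.normal(size=groups.size) > 0).astype(int)
    res = ConditionalLogit(endog, exog, groups=groups).fit()
    print(res.summary())
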
diff --git a/statsmodels/discrete/count_model.py b/statsmodels/discrete/count_model.py
index 78efef4ca..e4b2ef304 100644
--- a/statsmodels/discrete/count_model.py
+++ b/statsmodels/discrete/count_model.py
@@ -1,16 +1,25 @@
-__all__ = ['ZeroInflatedPoisson', 'ZeroInflatedGeneralizedPoisson',
-    'ZeroInflatedNegativeBinomialP']
+__all__ = ["ZeroInflatedPoisson", "ZeroInflatedGeneralizedPoisson",
+           "ZeroInflatedNegativeBinomialP"]
+
 import warnings
 import numpy as np
 import statsmodels.base.model as base
 import statsmodels.base.wrapper as wrap
 import statsmodels.regression.linear_model as lm
-from statsmodels.discrete.discrete_model import DiscreteModel, CountModel, Poisson, Logit, CountResults, L1CountResults, Probit, _discrete_results_docs, _validate_l1_method, GeneralizedPoisson, NegativeBinomialP
+from statsmodels.discrete.discrete_model import (DiscreteModel, CountModel,
+                                                 Poisson, Logit, CountResults,
+                                                 L1CountResults, Probit,
+                                                 _discrete_results_docs,
+                                                 _validate_l1_method,
+                                                 GeneralizedPoisson,
+                                                 NegativeBinomialP)
 from statsmodels.distributions import zipoisson, zigenpoisson, zinegbin
 from statsmodels.tools.numdiff import approx_fprime, approx_hess
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.sm_exceptions import ConvergenceWarning
 from statsmodels.compat.pandas import Appender
+
+
 _doc_zi_params = """
     exog_infl : array_like or None
         Explanatory variables for the binary inflation model, i.e. for
@@ -26,8 +35,7 @@ _doc_zi_params = """


 class GenericZeroInflated(CountModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Generic Zero Inflated Model

     %(params)s
@@ -41,56 +49,62 @@ class GenericZeroInflated(CountModel):
         A reference to the exogenous design.
     exog_infl : ndarray
         A reference to the zero-inflated exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        _doc_zi_params + base._missing_param_doc})
+    """ % {'params' : base._model_params_doc,
+           'extra_params' : _doc_zi_params + base._missing_param_doc}
+
+    def __init__(self, endog, exog, exog_infl=None, offset=None,
+                 inflation='logit', exposure=None, missing='none', **kwargs):
+        super(GenericZeroInflated, self).__init__(endog, exog, offset=offset,
+                                                  exposure=exposure,
+                                                  missing=missing, **kwargs)

-    def __init__(self, endog, exog, exog_infl=None, offset=None, inflation=
-        'logit', exposure=None, missing='none', **kwargs):
-        super(GenericZeroInflated, self).__init__(endog, exog, offset=
-            offset, exposure=exposure, missing=missing, **kwargs)
         if exog_infl is None:
             self.k_inflate = 1
             self._no_exog_infl = True
-            self.exog_infl = np.ones((endog.size, self.k_inflate), dtype=np
-                .float64)
+            self.exog_infl = np.ones((endog.size, self.k_inflate),
+                                     dtype=np.float64)
         else:
             self.exog_infl = exog_infl
             self.k_inflate = exog_infl.shape[1]
             self._no_exog_infl = False
+
         if len(exog.shape) == 1:
             self.k_exog = 1
         else:
             self.k_exog = exog.shape[1]
+
         self.infl = inflation
         if inflation == 'logit':
-            self.model_infl = Logit(np.zeros(self.exog_infl.shape[0]), self
-                .exog_infl)
+            self.model_infl = Logit(np.zeros(self.exog_infl.shape[0]),
+                                    self.exog_infl)
             self._hessian_inflate = self._hessian_logit
         elif inflation == 'probit':
             self.model_infl = Probit(np.zeros(self.exog_infl.shape[0]),
-                self.exog_infl)
+                                    self.exog_infl)
             self._hessian_inflate = self._hessian_probit
+
         else:
-            raise ValueError('inflation == %s, which is not handled' %
-                inflation)
+            raise ValueError("inflation == %s, which is not handled"
+                             % inflation)
+
         self.inflation = inflation
         self.k_extra = self.k_inflate
+
         if len(self.exog) != len(self.exog_infl):
-            raise ValueError(
-                'exog and exog_infl have different number ofobservation. `missing` handling is not supported'
-                )
-        infl_names = [('inflate_%s' % i) for i in self.model_infl.data.
-            param_names]
+            raise ValueError('exog and exog_infl have different number of '
+                             'observations. `missing` handling is not supported')
+
+        infl_names = ['inflate_%s' % i for i in self.model_infl.data.param_names]
         self.exog_names[:] = infl_names + list(self.exog_names)
         self.exog_infl = np.asarray(self.exog_infl, dtype=np.float64)
+
         self._init_keys.extend(['exog_infl', 'inflation'])
         self._null_drop_keys = ['exog_infl']

     def _get_exogs(self):
         """list of exogs, for internal use in post-estimation
         """
-        pass
+        return (self.exog, self.exog_infl)

     def loglike(self, params):
         """
@@ -113,7 +127,7 @@ class GenericZeroInflated(CountModel):
             \\sum_{y_{i}>0}(\\ln(1-w_{i})+L_{main\\_model})
             where P - pdf of main model, L - loglike function of main model.
         """
-        pass
+        return np.sum(self.loglikeobs(params))

     def loglikeobs(self, params):
         """
@@ -138,7 +152,86 @@ class GenericZeroInflated(CountModel):

         for observations :math:`i=1,...,n`
         """
-        pass
+        params_infl = params[:self.k_inflate]
+        params_main = params[self.k_inflate:]
+
+        y = self.endog
+        w = self.model_infl.predict(params_infl)
+
+        w = np.clip(w, np.finfo(float).eps, 1 - np.finfo(float).eps)
+        llf_main = self.model_main.loglikeobs(params_main)
+        zero_idx = np.nonzero(y == 0)[0]
+        nonzero_idx = np.nonzero(y)[0]
+
+        llf = np.zeros_like(y, dtype=np.float64)
+        llf[zero_idx] = (np.log(w[zero_idx] +
+            (1 - w[zero_idx]) * np.exp(llf_main[zero_idx])))
+        llf[nonzero_idx] = np.log(1 - w[nonzero_idx]) + llf_main[nonzero_idx]
+
+        return llf
+
+    @Appender(DiscreteModel.fit.__doc__)
+    def fit(self, start_params=None, method='bfgs', maxiter=35,
+            full_output=1, disp=1, callback=None,
+            cov_type='nonrobust', cov_kwds=None, use_t=None, **kwargs):
+        if start_params is None:
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            start_params = self._get_start_params()
+
+        if callback is None:
+            # work around perfect separation callback #3895
+            callback = lambda *x: x
+
+        mlefit = super(GenericZeroInflated, self).fit(start_params=start_params,
+                       maxiter=maxiter, disp=disp, method=method,
+                       full_output=full_output, callback=callback,
+                       **kwargs)
+
+        zipfit = self.result_class(self, mlefit._results)
+        result = self.result_class_wrapper(zipfit)
+
+        if cov_kwds is None:
+            cov_kwds = {}
+
+        result._get_robustcov_results(cov_type=cov_type,
+                                      use_self=True, use_t=use_t, **cov_kwds)
+        return result
+
+    @Appender(DiscreteModel.fit_regularized.__doc__)
+    def fit_regularized(self, start_params=None, method='l1',
+            maxiter='defined_by_method', full_output=1, disp=1, callback=None,
+            alpha=0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=1e-4,
+            qc_tol=0.03, **kwargs):
+
+        _validate_l1_method(method)
+
+        if np.size(alpha) == 1 and alpha != 0:
+            k_params = self.k_exog + self.k_inflate
+            alpha = alpha * np.ones(k_params)
+
+        extra = self.k_extra - self.k_inflate
+        alpha_p = alpha[:-(self.k_extra - extra)] if (self.k_extra
+            and np.size(alpha) > 1) else alpha
+        if start_params is None:
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            start_params = self.model_main.fit_regularized(
+                start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=0, callback=callback,
+                alpha=alpha_p, trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs).params
+            start_params = np.append(np.ones(self.k_inflate), start_params)
+        cntfit = super(CountModel, self).fit_regularized(
+                start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=disp, callback=callback,
+                alpha=alpha, trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs)
+
+        discretefit = self.result_class_reg(self, cntfit)
+        return self.result_class_reg_wrapper(discretefit)

     def score_obs(self, params):
         """
@@ -155,6 +248,86 @@ class GenericZeroInflated(CountModel):
             The score vector of the model, i.e. the first derivative of the
             loglikelihood function, evaluated at `params`
         """
+        params_infl = params[:self.k_inflate]
+        params_main = params[self.k_inflate:]
+
+        y = self.endog
+        w = self.model_infl.predict(params_infl)
+        w = np.clip(w, np.finfo(float).eps, 1 - np.finfo(float).eps)
+        score_main = self.model_main.score_obs(params_main)
+        llf_main = self.model_main.loglikeobs(params_main)
+        llf = self.loglikeobs(params)
+        zero_idx = np.nonzero(y == 0)[0]
+        nonzero_idx = np.nonzero(y)[0]
+
+        mu = self.model_main.predict(params_main)
+
+        # TODO: need to allow for complex to use CS numerical derivatives
+        dldp = np.zeros((self.exog.shape[0], self.k_exog), dtype=np.float64)
+        dldw = np.zeros_like(self.exog_infl, dtype=np.float64)
+
+        dldp[zero_idx,:] = (score_main[zero_idx].T *
+                     (1 - (w[zero_idx]) / np.exp(llf[zero_idx]))).T
+        dldp[nonzero_idx,:] = score_main[nonzero_idx]
+
+        if self.inflation == 'logit':
+            dldw[zero_idx,:] =  (self.exog_infl[zero_idx].T * w[zero_idx] *
+                                 (1 - w[zero_idx]) *
+                                 (1 - np.exp(llf_main[zero_idx])) /
+                                  np.exp(llf[zero_idx])).T
+            dldw[nonzero_idx,:] = -(self.exog_infl[nonzero_idx].T *
+                                    w[nonzero_idx]).T
+        elif self.inflation == 'probit':
+            return approx_fprime(params, self.loglikeobs)
+
+        return np.hstack((dldw, dldp))
+
+    def score(self, params):
+        return self.score_obs(params).sum(0)
+
+    def _hessian_main(self, params):
+        pass
+
+    def _hessian_logit(self, params):
+        params_infl = params[:self.k_inflate]
+        params_main = params[self.k_inflate:]
+
+        y = self.endog
+        w = self.model_infl.predict(params_infl)
+        w = np.clip(w, np.finfo(float).eps, 1 - np.finfo(float).eps)
+        score_main = self.model_main.score_obs(params_main)
+        llf_main = self.model_main.loglikeobs(params_main)
+        llf = self.loglikeobs(params)
+        zero_idx = np.nonzero(y == 0)[0]
+        nonzero_idx = np.nonzero(y)[0]
+
+        hess_arr = np.zeros((self.k_inflate, self.k_exog + self.k_inflate))
+
+        pmf = np.exp(llf)
+
+        #d2l/dw2
+        for i in range(self.k_inflate):
+            for j in range(i, -1, -1):
+                hess_arr[i, j] = ((
+                    self.exog_infl[zero_idx, i] * self.exog_infl[zero_idx, j] *
+                    (w[zero_idx] * (1 - w[zero_idx]) * ((1 -
+                    np.exp(llf_main[zero_idx])) * (1 - 2 * w[zero_idx]) *
+                    np.exp(llf[zero_idx]) - (w[zero_idx] - w[zero_idx]**2) *
+                    (1 - np.exp(llf_main[zero_idx]))**2) /
+                    pmf[zero_idx]**2)).sum() -
+                    (self.exog_infl[nonzero_idx, i] * self.exog_infl[nonzero_idx, j] *
+                    w[nonzero_idx] * (1 - w[nonzero_idx])).sum())
+
+        #d2l/dpdw
+        for i in range(self.k_inflate):
+            for j in range(self.k_exog):
+                hess_arr[i, j + self.k_inflate] = -(score_main[zero_idx, j] *
+                    w[zero_idx] * (1 - w[zero_idx]) *
+                    self.exog_infl[zero_idx, i] / pmf[zero_idx]).sum()
+
+        return hess_arr
+
+    def _hessian_probit(self, params):
         pass

     def hessian(self, params):
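The `loglikeobs`, `score_obs` and hessian pieces added above all derive from the same per-observation mixture log-likelihood: log(w + (1 - w) exp(llf_main)) for zero counts and log(1 - w) + llf_main otherwise. A minimal sketch (not part of the diff), assuming a Poisson main model and taking the inflation probability `w` and main-model mean `mu` as given:

```python
import numpy as np
from scipy import stats

def zi_loglikeobs(y, mu, w):
    # w: inflation probability, mu: main-model (Poisson) mean
    w = np.clip(w, np.finfo(float).eps, 1 - np.finfo(float).eps)
    llf_main = stats.poisson.logpmf(y, mu)
    return np.where(y == 0,
                    np.log(w + (1 - w) * np.exp(llf_main)),
                    np.log(1 - w) + llf_main)

print(zi_loglikeobs(np.array([0, 3]), mu=np.array([1.5, 2.0]), w=0.3))
```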
@@ -175,10 +348,26 @@ class GenericZeroInflated(CountModel):
         Notes
         -----
         """
-        pass
+        hess_arr_main = self._hessian_main(params)
+        hess_arr_infl = self._hessian_inflate(params)
+
+        if hess_arr_main is None or hess_arr_infl is None:
+            return approx_hess(params, self.loglike)
+
+        dim = self.k_exog + self.k_inflate
+
+        hess_arr = np.zeros((dim, dim))
+
+        hess_arr[:self.k_inflate,:] = hess_arr_infl
+        hess_arr[self.k_inflate:,self.k_inflate:] = hess_arr_main
+
+        tri_idx = np.triu_indices(self.k_exog + self.k_inflate, k=1)
+        hess_arr[tri_idx] = hess_arr.T[tri_idx]
+
+        return hess_arr

     def predict(self, params, exog=None, exog_infl=None, exposure=None,
-        offset=None, which='mean', y_values=None):
+                offset=None, which='mean', y_values=None):
         """
         Predict expected response or other statistic given exogenous variables.

@@ -226,18 +415,103 @@ class GenericZeroInflated(CountModel):
             Values of the random variable endog at which pmf is evaluated.
             Only used if ``which="prob"``
         """
-        pass
+        no_exog = False
+        if exog is None:
+            no_exog = True
+            exog = self.exog
+
+        if exog_infl is None:
+            if no_exog:
+                exog_infl = self.exog_infl
+            else:
+                if self._no_exog_infl:
+                    exog_infl = np.ones((len(exog), 1))
+        else:
+            exog_infl = np.asarray(exog_infl)
+            if exog_infl.ndim == 1 and self.k_inflate == 1:
+                exog_infl = exog_infl[:, None]
+
+        if exposure is None:
+            if no_exog:
+                exposure = getattr(self, 'exposure', 0)
+            else:
+                exposure = 0
+        else:
+            exposure = np.log(exposure)
+
+        if offset is None:
+            if no_exog:
+                offset = getattr(self, 'offset', 0)
+            else:
+                offset = 0
+
+        params_infl = params[:self.k_inflate]
+        params_main = params[self.k_inflate:]
+
+        prob_main = 1 - self.model_infl.predict(params_infl, exog_infl)
+
+        lin_pred = np.dot(exog, params_main[:self.exog.shape[1]]) + exposure + offset
+
+        # Refactor: This is pretty hacky,
+        # there should be an appropriate predict method in model_main
+        # this is just prob(y=0 | model_main)
+        tmp_exog = self.model_main.exog
+        tmp_endog = self.model_main.endog
+        tmp_offset = getattr(self.model_main, 'offset', False)
+        tmp_exposure = getattr(self.model_main, 'exposure', False)
+        self.model_main.exog = exog
+        self.model_main.endog = np.zeros((exog.shape[0]))
+        self.model_main.offset = offset
+        self.model_main.exposure = exposure
+        llf = self.model_main.loglikeobs(params_main)
+        self.model_main.exog = tmp_exog
+        self.model_main.endog = tmp_endog
+        # tmp_offset might be an array with elementwise equality testing
+        #if np.size(tmp_offset) == 1 and tmp_offset[0] == 'no':
+        if tmp_offset is False:
+            del self.model_main.offset
+        else:
+            self.model_main.offset = tmp_offset
+        #if np.size(tmp_exposure) == 1 and tmp_exposure[0] == 'no':
+        if tmp_exposure is False:
+            del self.model_main.exposure
+        else:
+            self.model_main.exposure = tmp_exposure
+        # end hack
+
+        prob_zero = (1 - prob_main) + prob_main * np.exp(llf)
+
+        if which == 'mean':
+            return prob_main * np.exp(lin_pred)
+        elif which == 'mean-main':
+            return np.exp(lin_pred)
+        elif which == 'linear':
+            return lin_pred
+        elif which == 'mean-nonzero':
+            return prob_main * np.exp(lin_pred) / (1 - prob_zero)
+        elif which == 'prob-zero':
+            return prob_zero
+        elif which == 'prob-main':
+            return prob_main
+        elif which == 'var':
+            mu = np.exp(lin_pred)
+            return self._predict_var(params, mu, 1 - prob_main)
+        elif which == 'prob':
+            return self._predict_prob(params, exog, exog_infl, exposure,
+                                      offset, y_values=y_values)
+        else:
+            raise ValueError('which = %s is not available' % which)

     def _derivative_predict(self, params, exog=None, transform='dydx'):
         """NotImplemented
         """
-        pass
+        raise NotImplementedError

-    def _derivative_exog(self, params, exog=None, transform='dydx',
-        dummy_idx=None, count_idx=None):
+    def _derivative_exog(self, params, exog=None, transform="dydx",
+                         dummy_idx=None, count_idx=None):
         """NotImplemented
         """
-        pass
+        raise NotImplementedError

     def _deriv_mean_dparams(self, params):
         """
@@ -253,7 +527,21 @@ class GenericZeroInflated(CountModel):
         The value of the derivative of the expected endog with respect
         to the parameter vector.
         """
-        pass
+        params_infl = params[:self.k_inflate]
+        params_main = params[self.k_inflate:]
+
+        w = self.model_infl.predict(params_infl)
+        w = np.clip(w, np.finfo(float).eps, 1 - np.finfo(float).eps)
+        mu = self.model_main.predict(params_main)
+
+        score_infl = self.model_infl._deriv_mean_dparams(params_infl)
+        score_main = self.model_main._deriv_mean_dparams(params_main)
+
+        dmat_infl = - mu[:, None] * score_infl
+        dmat_main = (1 - w[:, None]) * score_main
+
+        dmat = np.column_stack((dmat_infl, dmat_main))
+        return dmat

     def _deriv_score_obs_dendog(self, params):
         """derivative of score_obs w.r.t. endog
@@ -268,12 +556,30 @@ class GenericZeroInflated(CountModel):
         derivative : ndarray_2d
             The derivative of the score_obs with respect to endog.
         """
-        pass
+        raise NotImplementedError
+
+        # The below currently does not work, discontinuity at zero
+        # see https://github.com/statsmodels/statsmodels/pull/7951#issuecomment-996355875  # noqa
+        from statsmodels.tools.numdiff import _approx_fprime_scalar
+        endog_original = self.endog
+
+        def f(y):
+            if y.ndim == 2 and y.shape[1] == 1:
+                y = y[:, 0]
+            self.endog = y
+            self.model_main.endog = y
+            sf = self.score_obs(params)
+            self.endog = endog_original
+            self.model_main.endog = endog_original
+            return sf
+
+        ds = _approx_fprime_scalar(self.endog[:, None], f, epsilon=1e-2)
+
+        return ds


 class ZeroInflatedPoisson(GenericZeroInflated):
-    __doc__ = (
-        """
+    __doc__ = """
     Poisson Zero Inflated Model

     %(params)s
@@ -287,23 +593,75 @@ class ZeroInflatedPoisson(GenericZeroInflated):
         A reference to the exogenous design.
     exog_infl : ndarray
         A reference to the zero-inflated exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        _doc_zi_params + base._missing_param_doc})
-
-    def __init__(self, endog, exog, exog_infl=None, offset=None, exposure=
-        None, inflation='logit', missing='none', **kwargs):
-        super(ZeroInflatedPoisson, self).__init__(endog, exog, offset=
-            offset, inflation=inflation, exog_infl=exog_infl, exposure=
-            exposure, missing=missing, **kwargs)
+    """ % {'params' : base._model_params_doc,
+           'extra_params' : _doc_zi_params + base._missing_param_doc}
+
+    def __init__(self, endog, exog, exog_infl=None, offset=None, exposure=None,
+                 inflation='logit', missing='none', **kwargs):
+        super(ZeroInflatedPoisson, self).__init__(endog, exog, offset=offset,
+                                                  inflation=inflation,
+                                                  exog_infl=exog_infl,
+                                                  exposure=exposure,
+                                                  missing=missing, **kwargs)
         self.model_main = Poisson(self.endog, self.exog, offset=offset,
-            exposure=exposure)
+                                  exposure=exposure)
         self.distribution = zipoisson
         self.result_class = ZeroInflatedPoissonResults
         self.result_class_wrapper = ZeroInflatedPoissonResultsWrapper
         self.result_class_reg = L1ZeroInflatedPoissonResults
         self.result_class_reg_wrapper = L1ZeroInflatedPoissonResultsWrapper

+    def _hessian_main(self, params):
+        params_infl = params[:self.k_inflate]
+        params_main = params[self.k_inflate:]
+
+        y = self.endog
+        w = self.model_infl.predict(params_infl)
+        w = np.clip(w, np.finfo(float).eps, 1 - np.finfo(float).eps)
+        score = self.score(params)
+        zero_idx = np.nonzero(y == 0)[0]
+        nonzero_idx = np.nonzero(y)[0]
+
+        mu = self.model_main.predict(params_main)
+
+        hess_arr = np.zeros((self.k_exog, self.k_exog))
+
+        coeff = (1 + w[zero_idx] * (np.exp(mu[zero_idx]) - 1))
+
+        #d2l/dp2
+        for i in range(self.k_exog):
+            for j in range(i, -1, -1):
+                hess_arr[i, j] = ((
+                    self.exog[zero_idx, i] * self.exog[zero_idx, j] *
+                    mu[zero_idx] * (w[zero_idx] - 1) * (1 / coeff -
+                    w[zero_idx] * mu[zero_idx] * np.exp(mu[zero_idx]) /
+                    coeff**2)).sum() - (mu[nonzero_idx] * self.exog[nonzero_idx, i] *
+                    self.exog[nonzero_idx, j]).sum())
+
+        return hess_arr
+
+    def _predict_prob(self, params, exog, exog_infl, exposure, offset,
+                      y_values=None):
+        params_infl = params[:self.k_inflate]
+        params_main = params[self.k_inflate:]
+
+        if y_values is None:
+            y_values = np.atleast_2d(np.arange(0, np.max(self.endog)+1))
+
+        if len(exog_infl.shape) < 2:
+            transform = True
+            w = np.atleast_2d(
+                self.model_infl.predict(params_infl, exog_infl))[:, None]
+        else:
+            transform = False
+            w = self.model_infl.predict(params_infl, exog_infl)[:, None]
+
+        w = np.clip(w, np.finfo(float).eps, 1 - np.finfo(float).eps)
+        mu = self.model_main.predict(params_main, exog,
+            offset=offset)[:, None]
+        result = self.distribution.pmf(y_values, mu, w)
+        return result[0] if transform else result
+
     def _predict_var(self, params, mu, prob_infl):
         """predict values for conditional variance V(endog | exog)

@@ -321,10 +679,19 @@ class ZeroInflatedPoisson(GenericZeroInflated):
         -------
         Predicted conditional variance.
         """
-        pass
-
-    def get_distribution(self, params, exog=None, exog_infl=None, exposure=
-        None, offset=None):
+        w = prob_infl
+        var_ = (1 - w) * mu * (1 + w * mu)
+        return var_
+
+    def _get_start_params(self):
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", category=ConvergenceWarning)
+            start_params = self.model_main.fit(disp=0, method="nm").params
+        start_params = np.append(np.ones(self.k_inflate) * 0.1, start_params)
+        return start_params
+
+    def get_distribution(self, params, exog=None, exog_infl=None,
+                         exposure=None, offset=None):
         """Get frozen instance of distribution based on predicted parameters.

         Parameters
@@ -355,12 +722,18 @@ class ZeroInflatedPoisson(GenericZeroInflated):
         -------
         Instance of frozen scipy distribution subclass.
         """
-        pass
+        mu = self.predict(params, exog=exog, exog_infl=exog_infl,
+                          exposure=exposure, offset=offset, which="mean-main")
+        w = self.predict(params, exog=exog, exog_infl=exog_infl,
+                         exposure=exposure, offset=offset, which="prob-main")
+
+        # distr = self.distribution(mu[:, None], 1 - w[:, None])
+        distr = self.distribution(mu, 1 - w)
+        return distr


 class ZeroInflatedGeneralizedPoisson(GenericZeroInflated):
-    __doc__ = (
-        """
+    __doc__ = """
     Zero Inflated Generalized Poisson Model

     %(params)s
@@ -376,32 +749,59 @@ class ZeroInflatedGeneralizedPoisson(GenericZeroInflated):
         A reference to the zero-inflated exogenous design.
     p : scalar
         P denotes parametrizations for ZIGP regression.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        _doc_zi_params +
-        """p : float
+    """ % {'params' : base._model_params_doc,
+           'extra_params' : _doc_zi_params +
+           """p : float
         dispersion power parameter for the GeneralizedPoisson model.  p=1 for
         ZIGP-1 and p=2 for ZIGP-2. Default is p=2
-    """
-         + base._missing_param_doc})
+    """ + base._missing_param_doc}

-    def __init__(self, endog, exog, exog_infl=None, offset=None, exposure=
-        None, inflation='logit', p=2, missing='none', **kwargs):
+    def __init__(self, endog, exog, exog_infl=None, offset=None, exposure=None,
+                 inflation='logit', p=2, missing='none', **kwargs):
         super(ZeroInflatedGeneralizedPoisson, self).__init__(endog, exog,
-            offset=offset, inflation=inflation, exog_infl=exog_infl,
-            exposure=exposure, missing=missing, **kwargs)
-        self.model_main = GeneralizedPoisson(self.endog, self.exog, offset=
-            offset, exposure=exposure, p=p)
+                                                  offset=offset,
+                                                  inflation=inflation,
+                                                  exog_infl=exog_infl,
+                                                  exposure=exposure,
+                                                  missing=missing, **kwargs)
+        self.model_main = GeneralizedPoisson(self.endog, self.exog,
+            offset=offset, exposure=exposure, p=p)
         self.distribution = zigenpoisson
         self.k_exog += 1
         self.k_extra += 1
-        self.exog_names.append('alpha')
+        self.exog_names.append("alpha")
         self.result_class = ZeroInflatedGeneralizedPoissonResults
-        self.result_class_wrapper = (
-            ZeroInflatedGeneralizedPoissonResultsWrapper)
+        self.result_class_wrapper = ZeroInflatedGeneralizedPoissonResultsWrapper
         self.result_class_reg = L1ZeroInflatedGeneralizedPoissonResults
-        self.result_class_reg_wrapper = (
-            L1ZeroInflatedGeneralizedPoissonResultsWrapper)
+        self.result_class_reg_wrapper = L1ZeroInflatedGeneralizedPoissonResultsWrapper
+
+    def _get_init_kwds(self):
+        kwds = super(ZeroInflatedGeneralizedPoisson, self)._get_init_kwds()
+        kwds['p'] = self.model_main.parameterization + 1
+        return kwds
+
+    def _predict_prob(self, params, exog, exog_infl, exposure, offset,
+                      y_values=None):
+        params_infl = params[:self.k_inflate]
+        params_main = params[self.k_inflate:]
+
+        p = self.model_main.parameterization + 1
+        if y_values is None:
+            y_values = np.atleast_2d(np.arange(0, np.max(self.endog)+1))
+
+        if len(exog_infl.shape) < 2:
+            transform = True
+            w = np.atleast_2d(
+                self.model_infl.predict(params_infl, exog_infl))[:, None]
+        else:
+            transform = False
+            w = self.model_infl.predict(params_infl, exog_infl)[:, None]
+
+        w[w == 1.] = np.nextafter(1, 0)
+        mu = self.model_main.predict(params_main, exog,
+            exposure=exposure, offset=offset)[:, None]
+        result = self.distribution.pmf(y_values, mu, params_main[-1], p, w)
+        return result[0] if transform else result

     def _predict_var(self, params, mu, prob_infl):
         """predict values for conditional variance V(endog | exog)
@@ -420,12 +820,36 @@ class ZeroInflatedGeneralizedPoisson(GenericZeroInflated):
         -------
         Predicted conditional variance.
         """
-        pass
+        alpha = params[-1]
+        w = prob_infl
+        p = self.model_main.parameterization
+        var_ = (1 - w) * mu * ((1 + alpha * mu**p)**2 + w * mu)
+        return var_
+
+    def _get_start_params(self):
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", category=ConvergenceWarning)
+            start_params = ZeroInflatedPoisson(self.endog, self.exog,
+                exog_infl=self.exog_infl).fit(disp=0).params
+        start_params = np.append(start_params, 0.1)
+        return start_params
+
+    @Appender(ZeroInflatedPoisson.get_distribution.__doc__)
+    def get_distribution(self, params, exog=None, exog_infl=None,
+                         exposure=None, offset=None):
+
+        p = self.model_main.parameterization + 1
+        mu = self.predict(params, exog=exog, exog_infl=exog_infl,
+                          exposure=exposure, offset=offset, which="mean-main")
+        w = self.predict(params, exog=exog, exog_infl=exog_infl,
+                         exposure=exposure, offset=offset, which="prob-main")
+        # distr = self.distribution(mu[:, None], params[-1], p, 1 - w[:, None])
+        distr = self.distribution(mu, params[-1], p, 1 - w)
+        return distr


 class ZeroInflatedNegativeBinomialP(GenericZeroInflated):
-    __doc__ = (
-        """
+    __doc__ = """
     Zero Inflated Generalized Negative Binomial Model

     %(params)s
@@ -442,31 +866,59 @@ class ZeroInflatedNegativeBinomialP(GenericZeroInflated):
     p : scalar
         P denotes parametrizations for ZINB regression. p=1 for ZINB-1 and
     p=2 for ZINB-2. Default is p=2
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        _doc_zi_params +
-        """p : float
+    """ % {'params' : base._model_params_doc,
+           'extra_params' : _doc_zi_params +
+           """p : float
         dispersion power parameter for the NegativeBinomialP model.  p=1 for
         ZINB-1 and p=2 for ZINM-2. Default is p=2
-    """
-         + base._missing_param_doc})
+    """ + base._missing_param_doc}

-    def __init__(self, endog, exog, exog_infl=None, offset=None, exposure=
-        None, inflation='logit', p=2, missing='none', **kwargs):
+    def __init__(self, endog, exog, exog_infl=None, offset=None, exposure=None,
+                 inflation='logit', p=2, missing='none', **kwargs):
         super(ZeroInflatedNegativeBinomialP, self).__init__(endog, exog,
-            offset=offset, inflation=inflation, exog_infl=exog_infl,
-            exposure=exposure, missing=missing, **kwargs)
-        self.model_main = NegativeBinomialP(self.endog, self.exog, offset=
-            offset, exposure=exposure, p=p)
+                                                  offset=offset,
+                                                  inflation=inflation,
+                                                  exog_infl=exog_infl,
+                                                  exposure=exposure,
+                                                  missing=missing, **kwargs)
+        self.model_main = NegativeBinomialP(self.endog, self.exog,
+            offset=offset, exposure=exposure, p=p)
         self.distribution = zinegbin
         self.k_exog += 1
         self.k_extra += 1
-        self.exog_names.append('alpha')
+        self.exog_names.append("alpha")
         self.result_class = ZeroInflatedNegativeBinomialResults
         self.result_class_wrapper = ZeroInflatedNegativeBinomialResultsWrapper
         self.result_class_reg = L1ZeroInflatedNegativeBinomialResults
-        self.result_class_reg_wrapper = (
-            L1ZeroInflatedNegativeBinomialResultsWrapper)
+        self.result_class_reg_wrapper = L1ZeroInflatedNegativeBinomialResultsWrapper
+
+    def _get_init_kwds(self):
+        kwds = super(ZeroInflatedNegativeBinomialP, self)._get_init_kwds()
+        kwds['p'] = self.model_main.parameterization
+        return kwds
+
+    def _predict_prob(self, params, exog, exog_infl, exposure, offset,
+                      y_values=None):
+        params_infl = params[:self.k_inflate]
+        params_main = params[self.k_inflate:]
+
+        p = self.model_main.parameterization
+        if y_values is None:
+            y_values = np.arange(0, np.max(self.endog)+1)
+
+        if len(exog_infl.shape) < 2:
+            transform = True
+            w = np.atleast_2d(
+                self.model_infl.predict(params_infl, exog_infl))[:, None]
+        else:
+            transform = False
+            w = self.model_infl.predict(params_infl, exog_infl)[:, None]
+
+        w = np.clip(w, np.finfo(float).eps, 1 - np.finfo(float).eps)
+        mu = self.model_main.predict(params_main, exog,
+            exposure=exposure, offset=offset)[:, None]
+        result = self.distribution.pmf(y_values, mu, params_main[-1], p, w)
+        return result[0] if transform else result

     def _predict_var(self, params, mu, prob_infl):
         """predict values for conditional variance V(endog | exog)
@@ -485,11 +937,56 @@ class ZeroInflatedNegativeBinomialP(GenericZeroInflated):
         -------
         Predicted conditional variance.
         """
-        pass
+        alpha = params[-1]
+        w = prob_infl
+        p = self.model_main.parameterization
+        var_ = (1 - w) * mu * (1 + alpha * mu**(p - 1) + w * mu)
+        return var_
+
+    def _get_start_params(self):
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", category=ConvergenceWarning)
+            start_params = self.model_main.fit(disp=0, method='nm').params
+        start_params = np.append(np.zeros(self.k_inflate), start_params)
+        return start_params
+
+    @Appender(ZeroInflatedPoisson.get_distribution.__doc__)
+    def get_distribution(self, params, exog=None, exog_infl=None,
+                         exposure=None, offset=None):
+
+        p = self.model_main.parameterization
+        mu = self.predict(params, exog=exog, exog_infl=exog_infl,
+                          exposure=exposure, offset=offset, which="mean-main")
+        w = self.predict(params, exog=exog, exog_infl=exog_infl,
+                         exposure=exposure, offset=offset, which="prob-main")
+
+        # distr = self.distribution(mu[:, None], params[-1], p, 1 - w[:, None])
+        distr = self.distribution(mu, params[-1], p, 1 - w)
+        return distr


 class ZeroInflatedResults(CountResults):

+    def get_prediction(self, exog=None, exog_infl=None, exposure=None,
+                       offset=None, which='mean', average=False,
+                       agg_weights=None, y_values=None,
+                       transform=True, row_labels=None):
+
+        import statsmodels.base._prediction_inference as pred
+
+        pred_kwds = {
+            'exog_infl': exog_infl,
+            'exposure': exposure,
+            'offset': offset,
+            'y_values': y_values,
+            }
+
+        res = pred.get_prediction_delta(self, exog=exog, which=which,
+                                        average=average,
+                                        agg_weights=agg_weights,
+                                        pred_kwds=pred_kwds)
+        return res
+
     def get_influence(self):
         """
         Influence and outlier measures
@@ -520,20 +1017,29 @@ class ZeroInflatedResults(CountResults):
         for OLS. This is a measure for exog outliers but does not take
         specific features of the model into account.
         """
-        pass
+        # same as super in DiscreteResults, only added for docstring
+        from statsmodels.stats.outliers_influence import MLEInfluence
+        return MLEInfluence(self)


 class ZeroInflatedPoissonResults(ZeroInflatedResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for Zero Inflated Poisson', 'extra_attr': ''}
-
-    def get_margeff(self, at='overall', method='dydx', atexog=None, dummy=
-        False, count=False):
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for Zero Inflated Poisson",
+        "extra_attr": ""}
+
+    @cache_readonly
+    def _dispersion_factor(self):
+        mu = self.predict(which='linear')
+        w = 1 - self.predict() / np.exp(self.predict(which='linear'))
+        return (1 + w * np.exp(mu))
+
+    def get_margeff(self, at='overall', method='dydx', atexog=None,
+                    dummy=False, count=False):
         """Get marginal effects of the fitted model.

         Not yet implemented for Zero Inflated Models
         """
-        pass
+        raise NotImplementedError("not yet implemented for zero inflation")


 class L1ZeroInflatedPoissonResults(L1CountResults, ZeroInflatedPoissonResults):
@@ -542,88 +1048,93 @@ class L1ZeroInflatedPoissonResults(L1CountResults, ZeroInflatedPoissonResults):

 class ZeroInflatedPoissonResultsWrapper(lm.RegressionResultsWrapper):
     pass
-
-
 wrap.populate_wrapper(ZeroInflatedPoissonResultsWrapper,
-    ZeroInflatedPoissonResults)
+                      ZeroInflatedPoissonResults)


 class L1ZeroInflatedPoissonResultsWrapper(lm.RegressionResultsWrapper):
     pass
-
-
 wrap.populate_wrapper(L1ZeroInflatedPoissonResultsWrapper,
-    L1ZeroInflatedPoissonResults)
+                      L1ZeroInflatedPoissonResults)


 class ZeroInflatedGeneralizedPoissonResults(ZeroInflatedResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for Zero Inflated Generalized Poisson',
-        'extra_attr': ''}
-
-    def get_margeff(self, at='overall', method='dydx', atexog=None, dummy=
-        False, count=False):
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for Zero Inflated Generalized Poisson",
+        "extra_attr": ""}
+
+    @cache_readonly
+    def _dispersion_factor(self):
+        p = self.model.model_main.parameterization
+        alpha = self.params[self.model.k_inflate:][-1]
+        mu = np.exp(self.predict(which='linear'))
+        w = 1 - self.predict() / mu
+        return ((1 + alpha * mu**p)**2 + w * mu)
+
+    def get_margeff(self, at='overall', method='dydx', atexog=None,
+                    dummy=False, count=False):
         """Get marginal effects of the fitted model.

         Not yet implemented for Zero Inflated Models
         """
-        pass
+        raise NotImplementedError("not yet implemented for zero inflation")


 class L1ZeroInflatedGeneralizedPoissonResults(L1CountResults,
-    ZeroInflatedGeneralizedPoissonResults):
+        ZeroInflatedGeneralizedPoissonResults):
     pass


-class ZeroInflatedGeneralizedPoissonResultsWrapper(lm.RegressionResultsWrapper
-    ):
+class ZeroInflatedGeneralizedPoissonResultsWrapper(
+        lm.RegressionResultsWrapper):
     pass
-
-
 wrap.populate_wrapper(ZeroInflatedGeneralizedPoissonResultsWrapper,
-    ZeroInflatedGeneralizedPoissonResults)
+                      ZeroInflatedGeneralizedPoissonResults)


-class L1ZeroInflatedGeneralizedPoissonResultsWrapper(lm.
-    RegressionResultsWrapper):
+class L1ZeroInflatedGeneralizedPoissonResultsWrapper(
+        lm.RegressionResultsWrapper):
     pass
-
-
 wrap.populate_wrapper(L1ZeroInflatedGeneralizedPoissonResultsWrapper,
-    L1ZeroInflatedGeneralizedPoissonResults)
+                      L1ZeroInflatedGeneralizedPoissonResults)


 class ZeroInflatedNegativeBinomialResults(ZeroInflatedResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for Zero Inflated Generalized Negative Binomial',
-        'extra_attr': ''}
-
-    def get_margeff(self, at='overall', method='dydx', atexog=None, dummy=
-        False, count=False):
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for Zero Inflated Generalized Negative Binomial",
+        "extra_attr": ""}
+
+    @cache_readonly
+    def _dispersion_factor(self):
+        p = self.model.model_main.parameterization
+        alpha = self.params[self.model.k_inflate:][-1]
+        mu = np.exp(self.predict(which='linear'))
+        w = 1 - self.predict() / mu
+        return (1 + alpha * mu**(p-1) + w * mu)
+
+    def get_margeff(self, at='overall', method='dydx', atexog=None,
+            dummy=False, count=False):
         """Get marginal effects of the fitted model.

         Not yet implemented for Zero Inflated Models
         """
-        pass
+        raise NotImplementedError("not yet implemented for zero inflation")


 class L1ZeroInflatedNegativeBinomialResults(L1CountResults,
-    ZeroInflatedNegativeBinomialResults):
+        ZeroInflatedNegativeBinomialResults):
     pass


-class ZeroInflatedNegativeBinomialResultsWrapper(lm.RegressionResultsWrapper):
+class ZeroInflatedNegativeBinomialResultsWrapper(
+        lm.RegressionResultsWrapper):
     pass
-
-
 wrap.populate_wrapper(ZeroInflatedNegativeBinomialResultsWrapper,
-    ZeroInflatedNegativeBinomialResults)
+                      ZeroInflatedNegativeBinomialResults)


-class L1ZeroInflatedNegativeBinomialResultsWrapper(lm.RegressionResultsWrapper
-    ):
+class L1ZeroInflatedNegativeBinomialResultsWrapper(
+        lm.RegressionResultsWrapper):
     pass
-
-
 wrap.populate_wrapper(L1ZeroInflatedNegativeBinomialResultsWrapper,
-    L1ZeroInflatedNegativeBinomialResults)
+                      L1ZeroInflatedNegativeBinomialResults)
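A usage sketch of the `predict`/`which` interface implemented in count_model.py above. This is illustrative only, not part of the diff; the simulated data and parameter values are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = sm.add_constant(rng.normal(size=n))
mu = np.exp(x @ np.array([0.5, 0.3]))
y = rng.poisson(mu) * (rng.uniform(size=n) > 0.3)   # roughly 30% structural zeros

mod = sm.ZeroInflatedPoisson(y, x, inflation='logit')   # constant-only inflation part
res = mod.fit(disp=0)

mean = res.predict(which='mean')                         # E[y | x], zeros included
p_zero = res.predict(which='prob-zero')                  # P(y = 0 | x)
pmf = res.predict(which='prob', y_values=np.arange(5))   # per-count probabilities
```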
diff --git a/statsmodels/discrete/diagnostic.py b/statsmodels/discrete/diagnostic.py
index 7f940b0cf..a970d311b 100644
--- a/statsmodels/discrete/diagnostic.py
+++ b/statsmodels/discrete/diagnostic.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Wed Nov 18 15:17:58 2020

@@ -5,11 +6,25 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import warnings
+
 import numpy as np
+
 from statsmodels.tools.decorators import cache_readonly
-from statsmodels.stats.diagnostic_gen import test_chisquare_binning
-from statsmodels.discrete._diagnostics_count import test_poisson_dispersion, test_poisson_zeroinflation_jh, test_poisson_zeroinflation_broek, test_poisson_zeros, test_chisquare_prob, plot_probs
+
+from statsmodels.stats.diagnostic_gen import (
+    test_chisquare_binning
+    )
+from statsmodels.discrete._diagnostics_count import (
+    test_poisson_dispersion,
+    # _test_poisson_dispersion_generic,
+    test_poisson_zeroinflation_jh,
+    test_poisson_zeroinflation_broek,
+    test_poisson_zeros,
+    test_chisquare_prob,
+    plot_probs
+    )


 class CountDiagnostic:
@@ -30,6 +45,14 @@ class CountDiagnostic:
         self.results = results
         self.y_max = y_max

+    @cache_readonly
+    def probs_predicted(self):
+        if self.y_max is not None:
+            kwds = {"y_values": np.arange(self.y_max + 1)}
+        else:
+            kwds = {}
+        return self.results.predict(which="prob", **kwds)
+
     def test_chisquare_prob(self, bin_edges=None, method=None):
         """Moment test for binned probabilites using OPG.

@@ -80,12 +103,27 @@ class CountDiagnostic:
         Prob(y_i = k | x) are aggregated over observations ``i``.

         """
-        pass
-
-    def plot_probs(self, label='predicted', upp_xlim=None, fig=None):
+        kwds = {}
+        if bin_edges is not None:
+            # TODO: verify upper bound, we drop last bin (may be open, inf)
+            kwds["y_values"] = np.arange(bin_edges[-2] + 1)
+        probs = self.results.predict(which="prob", **kwds)
+        res = test_chisquare_prob(self.results, probs, bin_edges=bin_edges,
+                                  method=method)
+        return res
+
+    def plot_probs(self, label='predicted', upp_xlim=None,
+                   fig=None):
         """Plot observed versus predicted frequencies for entire sample.
         """
-        pass
+        probs_predicted = self.probs_predicted.sum(0)
+        k_probs = len(probs_predicted)
+        freq = np.bincount(self.results.model.endog.astype(int),
+                           minlength=k_probs)[:k_probs]
+        fig = plot_probs(freq, probs_predicted,
+                         label=label, upp_xlim=upp_xlim,
+                         fig=fig)
+        return fig


 class PoissonDiagnostic(CountDiagnostic):
@@ -99,6 +137,9 @@ class PoissonDiagnostic(CountDiagnostic):

     """

+    def __init__(self, results, y_max=None):
+        # delegate to CountDiagnostic so that both results and y_max are set
+        super().__init__(results, y_max=y_max)
+
     def test_dispersion(self):
         """Test for excess (over or under) dispersion in Poisson.

@@ -106,9 +147,10 @@ class PoissonDiagnostic(CountDiagnostic):
         -------
         dispersion results
         """
-        pass
+        res = test_poisson_dispersion(self.results)
+        return res

-    def test_poisson_zeroinflation(self, method='prob', exog_infl=None):
+    def test_poisson_zeroinflation(self, method="prob", exog_infl=None):
         """Test for excess zeros, zero inflation or deflation.

         Parameters
@@ -146,10 +188,25 @@ class PoissonDiagnostic(CountDiagnostic):
         conditional means of the estimated Poisson distribution are large.
         In these cases, p-values will not be accurate.
         """
-        pass
+        if method == "prob":
+            if exog_infl is not None:
+                warnings.warn('exog_infl is only used if method = "broek"')
+            res = test_poisson_zeros(self.results)
+        elif method == "broek":
+            if exog_infl is None:
+                res = test_poisson_zeroinflation_broek(self.results)
+            else:
+                exog_infl = np.asarray(exog_infl)
+                if exog_infl.ndim == 1:
+                    exog_infl = exog_infl[:, None]
+                res = test_poisson_zeroinflation_jh(self.results,
+                                                    exog_infl=exog_infl)
+
+        return res

     def _chisquare_binned(self, sort_var=None, bins=10, k_max=None, df=None,
-        sort_method='quicksort', frac_upp=0.1, alpha_nc=0.05):
+                          sort_method="quicksort", frac_upp=0.1,
+                          alpha_nc=0.05):
         """Hosmer-Lemeshow style test for count data.

         Note, this does not take into account that parameters are estimated.
@@ -162,4 +219,33 @@ class PoissonDiagnostic(CountDiagnostic):
         of observations sorted according the ``sort_var``.

         """
-        pass
+
+        if sort_var is None:
+            sort_var = self.results.predict(which="linear")
+
+        endog = self.results.model.endog
+        # not sure yet how this is supposed to work
+        # max_count = endog.max * 2
+        # no option for max count in predict
+        # counts = (endog == np.arange(max_count)).astype(int)
+        expected = self.results.predict(which="prob")
+        counts = (endog[:, None] == np.arange(expected.shape[1])).astype(int)
+
+        # truncate upper tail
+        if k_max is None:
+            nobs = len(endog)
+            icumcounts_sum = nobs - counts.sum(0).cumsum(0)
+            k_max = np.argmax(icumcounts_sum < nobs * frac_upp) - 1
+        expected = expected[:, :k_max]
+        counts = counts[:, :k_max]
+        # we should correct for or include truncated upper bin
+        # inplace modification, we cannot reuse expected and counts anymore
+        expected[:, -1] += 1 - expected.sum(1)
+        counts[:, -1] += 1 - counts.sum(1)
+
+        # TODO: what's the correct df, same as for multinomial/ordered ?
+        res = test_chisquare_binning(counts, expected, sort_var=sort_var,
+                                     bins=bins, df=df, ordered=True,
+                                     sort_method=sort_method,
+                                     alpha_nc=alpha_nc)
+        return res
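A sketch of how the diagnostic methods added above are driven from a fitted Poisson model. Illustrative only and not part of the diff; the data are simulated, not taken from the test suite.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.diagnostic import PoissonDiagnostic

rng = np.random.default_rng(123)
n = 500
x = sm.add_constant(rng.normal(size=n))
y = rng.poisson(np.exp(x @ np.array([0.2, 0.4])))

res = sm.Poisson(y, x).fit(disp=0)
dia = PoissonDiagnostic(res)

print(dia.test_dispersion())                          # over-/under-dispersion tests
print(dia.test_poisson_zeroinflation(method="prob"))  # test for excess zeros
fig = dia.plot_probs()                                # observed vs predicted counts (needs matplotlib)
```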
diff --git a/statsmodels/discrete/discrete_margins.py b/statsmodels/discrete/discrete_margins.py
index 091443394..dc7be4cea 100644
--- a/statsmodels/discrete/discrete_margins.py
+++ b/statsmodels/discrete/discrete_margins.py
@@ -1,30 +1,43 @@
+# Splitting out marginal effects to see if they can be generalized
+
 from statsmodels.compat.python import lzip
 import numpy as np
 from scipy.stats import norm
 from statsmodels.tools.decorators import cache_readonly

+#### margeff helper functions ####
+#NOTE: todo marginal effects for group 2
+# group 2 oprobit, ologit, gologit, mlogit, biprobit

 def _check_margeff_args(at, method):
     """
     Checks valid options for margeff
     """
-    pass
-
+    if at not in ['overall','mean','median','zero','all']:
+        raise ValueError("%s not a valid option for `at`." % at)
+    if method not in ['dydx','eyex','dyex','eydx']:
+        raise ValueError("method is not understood.  Got %s" % method)

 def _check_discrete_args(at, method):
     """
     Checks the arguments for margeff if the exogenous variables are discrete.
     """
-    pass
-
+    if method in ['dyex','eyex']:
+        raise ValueError("%s not allowed for discrete variables" % method)
+    if at in ['median', 'zero']:
+        raise ValueError("%s not allowed for discrete variables" % at)

 def _get_const_index(exog):
     """
     Returns a boolean array of non-constant column indices in exog and
     an scalar array of where the constant is or None
     """
-    pass
-
+    effects_idx = exog.var(0) != 0
+    if np.any(~effects_idx):
+        const_idx = np.where(~effects_idx)[0]
+    else:
+        const_idx = None
+    return effects_idx, const_idx

 def _isdummy(X):
     """
@@ -43,8 +56,25 @@ def _isdummy(X):
     >>> ind
     array([0, 3, 4])
     """
-    pass
-
+    X = np.asarray(X)
+    if X.ndim > 1:
+        ind = np.zeros(X.shape[1]).astype(bool)
+    max = (np.max(X, axis=0) == 1)
+    min = (np.min(X, axis=0) == 0)
+    remainder = np.all(X % 1. == 0, axis=0)
+    ind = min & max & remainder
+    if X.ndim == 1:
+        ind = np.asarray([ind])
+    return np.where(ind)[0]
+
+def _get_dummy_index(X, const_idx):
+    dummy_ind = _isdummy(X)
+    dummy = True
+
+    if dummy_ind.size == 0: # do not waste your time
+        dummy = False
+        dummy_ind = None # this gets passed to stand err func
+    return dummy_ind, dummy

 def _iscount(X):
     """
@@ -63,33 +93,108 @@ def _iscount(X):
     >>> ind
     array([0, 3, 4])
     """
-    pass
-
+    X = np.asarray(X)
+    remainder = np.logical_and(np.logical_and(np.all(X % 1. == 0, axis = 0),
+                               X.var(0) != 0), np.all(X >= 0, axis=0))
+    dummy = _isdummy(X)
+    remainder = np.where(remainder)[0].tolist()
+    for idx in dummy:
+        remainder.remove(idx)
+    return np.array(remainder)
+
+def _get_count_index(X, const_idx):
+    count_ind = _iscount(X)
+    count = True
+
+    if count_ind.size == 0: # do not waste your time
+        count = False
+        count_ind = None # for stand err func
+    return count_ind, count
+
+def _get_margeff_exog(exog, at, atexog, ind):
+    if atexog is not None: # user supplied
+        if isinstance(atexog, dict):
+            # assumes values are singular or of len(exog)
+            for key in atexog:
+                exog[:,key] = atexog[key]
+        elif isinstance(atexog, np.ndarray): #TODO: handle DataFrames
+            if atexog.ndim == 1:
+                k_vars = len(atexog)
+            else:
+                k_vars = atexog.shape[1]
+            if k_vars != exog.shape[1]:
+                raise ValueError("atexog does not have the same number "
+                                 "of variables as exog")
+            exog = atexog
+
+    #NOTE: we should fill in atexog after we process at
+    if at == 'mean':
+        exog = np.atleast_2d(exog.mean(0))
+    elif at == 'median':
+        exog = np.atleast_2d(np.median(exog, axis=0))
+    elif at == 'zero':
+        exog = np.zeros((1,exog.shape[1]))
+        exog[0,~ind] = 1
+    return exog

 def _get_count_effects(effects, exog, count_ind, method, model, params):
     """
     If there's a count variable, the predicted difference is taken by
     subtracting one and adding one to exog then averaging the difference
     """
-    pass
-
+    # this is the index for the effect and the index for count col in exog
+    for i in count_ind:
+        exog0 = exog.copy()
+        exog0[:, i] -= 1
+        effect0 = model.predict(params, exog0)
+        exog0[:, i] += 2
+        effect1 = model.predict(params, exog0)
+        #NOTE: done by analogy with dummy effects but untested bc
+        # stata does not handle both count and eydx anywhere
+        if 'ey' in method:
+            effect0 = np.log(effect0)
+            effect1 = np.log(effect1)
+        effects[:, i] = ((effect1 - effect0)/2)
+    return effects

 def _get_dummy_effects(effects, exog, dummy_ind, method, model, params):
     """
     If there's a dummy variable, the predicted difference is taken at
     0 and 1
     """
-    pass
-
+    # this is the index for the effect and the index for dummy col in exog
+    for i in dummy_ind:
+        exog0 = exog.copy() # only copy once, can we avoid a copy?
+        exog0[:,i] = 0
+        effect0 = model.predict(params, exog0)
+        #fittedvalues0 = np.dot(exog0,params)
+        exog0[:,i] = 1
+        effect1 = model.predict(params, exog0)
+        if 'ey' in method:
+            effect0 = np.log(effect0)
+            effect1 = np.log(effect1)
+        effects[:, i] = (effect1 - effect0)
+    return effects
+
+def _effects_at(effects, at):
+    if at == 'all':
+        effects = effects
+    elif at == 'overall':
+        effects = effects.mean(0)
+    else:
+        effects = effects[0,:]
+    return effects

 def _margeff_cov_params_dummy(model, cov_margins, params, exog, dummy_ind,
-    method, J):
-    """
+        method, J):
+    r"""
     Returns the Jacobian for discrete regressors for use in margeff_cov_params.

     For discrete regressors the marginal effect is

-    \\Delta F = F(XB) | d = 1 - F(XB) | d = 0
+    \Delta F = F(XB) | d = 1 - F(XB) | d = 0

     The row of the Jacobian for this variable is given by

@@ -97,17 +202,32 @@ def _margeff_cov_params_dummy(model, cov_margins, params, exog, dummy_ind,

     Where F is the default prediction of the model.
     """
-    pass
-
+    for i in dummy_ind:
+        exog0 = exog.copy()
+        exog1 = exog.copy()
+        exog0[:,i] = 0
+        exog1[:,i] = 1
+        dfdb0 = model._derivative_predict(params, exog0, method)
+        dfdb1 = model._derivative_predict(params, exog1, method)
+        dfdb = (dfdb1 - dfdb0)
+        if dfdb.ndim >= 2: # for overall
+            dfdb = dfdb.mean(0)
+        if J > 1:
+            K = dfdb.shape[1] // (J-1)
+            cov_margins[i::K, :] = dfdb
+        else:
+            # dfdb could be too short if there are extra params, k_extra > 0
+            cov_margins[i, :len(dfdb)] = dfdb # how each F changes with change in B
+    return cov_margins

 def _margeff_cov_params_count(model, cov_margins, params, exog, count_ind,
-    method, J):
-    """
+                             method, J):
+    r"""
     Returns the Jacobian for discrete regressors for use in margeff_cov_params.

     For discrete regressors the marginal effect is

-    \\Delta F = F(XB) | d += 1 - F(XB) | d -= 1
+    \Delta F = F(XB) | d += 1 - F(XB) | d -= 1

     The row of the Jacobian for this variable is given by

@@ -115,11 +235,25 @@ def _margeff_cov_params_count(model, cov_margins, params, exog, count_ind,

     where F is the default prediction for the model.
     """
-    pass
-
+    for i in count_ind:
+        exog0 = exog.copy()
+        exog0[:,i] -= 1
+        dfdb0 = model._derivative_predict(params, exog0, method)
+        exog0[:,i] += 2
+        dfdb1 = model._derivative_predict(params, exog0, method)
+        dfdb = (dfdb1 - dfdb0)
+        if dfdb.ndim >= 2: # for overall
+            dfdb = dfdb.mean(0) / 2
+        if J > 1:
+            K = dfdb.shape[1] // (J-1)
+            cov_margins[i::K, :] = dfdb
+        else:
+            # dfdb could be too short if there are extra params, k_extra > 0
+            cov_margins[i, :len(dfdb)] = dfdb # how each F changes with change in B
+    return cov_margins

 def margeff_cov_params(model, params, exog, cov_params, at, derivative,
-    dummy_ind, count_ind, method, J):
+                       dummy_ind, count_ind, method, J):
     """
     Computes the variance-covariance of marginal effects by the delta method.

@@ -169,23 +303,62 @@ def margeff_cov_params(model, params, exog, cov_params, at, derivative,
     The outer Jacobians are computed via numerical differentiation if
     derivative is a function.
     """
-    pass
-
+    if callable(derivative):
+        from statsmodels.tools.numdiff import approx_fprime_cs
+        params = params.ravel('F')  # for Multinomial
+        try:
+            jacobian_mat = approx_fprime_cs(params, derivative,
+                                            args=(exog,method))
+        except TypeError:  # norm.cdf does not take complex values
+            from statsmodels.tools.numdiff import approx_fprime
+            jacobian_mat = approx_fprime(params, derivative,
+                                            args=(exog,method))
+        if at == 'overall':
+            jacobian_mat = np.mean(jacobian_mat, axis=1)
+        else:
+            jacobian_mat = jacobian_mat.squeeze()  # exog was 2d row vector
+        if dummy_ind is not None:
+            jacobian_mat = _margeff_cov_params_dummy(model, jacobian_mat,
+                                params, exog, dummy_ind, method, J)
+        if count_ind is not None:
+            jacobian_mat = _margeff_cov_params_count(model, jacobian_mat,
+                                params, exog, count_ind, method, J)
+    else:
+        jacobian_mat = derivative
+
+    #NOTE: this will not go through for at == 'all'
+    return np.dot(np.dot(jacobian_mat, cov_params), jacobian_mat.T)

 def margeff_cov_with_se(model, params, exog, cov_params, at, derivative,
-    dummy_ind, count_ind, method, J):
+                        dummy_ind, count_ind, method, J):
     """
     See margeff_cov_params.

     Same function but returns both the covariance of the marginal effects
     and their standard errors.
     """
-    pass
+    cov_me = margeff_cov_params(model, params, exog, cov_params, at,
+                                              derivative, dummy_ind,
+                                              count_ind, method, J)
+    return cov_me, np.sqrt(np.diag(cov_me))
+
+
+def margeff():
+    raise NotImplementedError
+
+

+def _check_at_is_all(method):
+    if method['at'] == 'all':
+        raise ValueError("Only margeff are available when `at` is "
+                         "'all'. Please input specific points if you would "
+                         "like to do inference.")

-_transform_names = dict(dydx='dy/dx', eyex='d(lny)/d(lnx)', dyex=
-    'dy/d(lnx)', eydx='d(lny)/dx')

+_transform_names = dict(dydx='dy/dx',
+                        eyex='d(lny)/d(lnx)',
+                        dyex='dy/d(lnx)',
+                        eydx='d(lny)/dx')

 class Margins:
     """
@@ -194,15 +367,46 @@ class Margins:
     This is just a sketch of what we may want out of a general margins class.
     I (SS) need to look at details of other models.
     """
-
     def __init__(self, results, get_margeff, derivative, dist=None,
-        margeff_args=()):
+                       margeff_args=()):
         self._cache = {}
         self.results = results
         self.dist = dist
         self.get_margeff(margeff_args)

+    def _reset(self):
+        self._cache = {}
+
+    def get_margeff(self, *args, **kwargs):
+        self._reset()
+        self.margeff = self.get_margeff(*args)
+
+    @cache_readonly
+    def tvalues(self):
+        raise NotImplementedError
+
+    @cache_readonly
+    def cov_margins(self):
+        raise NotImplementedError
+
+    @cache_readonly
+    def margins_se(self):
+        raise NotImplementedError
+
+    def summary_frame(self):
+        raise NotImplementedError
+
+    @cache_readonly
+    def pvalues(self):
+        raise NotImplementedError
+
+    def conf_int(self, alpha=.05):
+        raise NotImplementedError

+    def summary(self, alpha=.05):
+        raise NotImplementedError
+
+#class DiscreteMargins(Margins):
 class DiscreteMargins:
     """Get marginal effects of a Discrete Choice model.

@@ -217,13 +421,20 @@ class DiscreteMargins:
         Keyword args are passed to `get_margeff`. This is the same as
         results.get_margeff. See there for more information.
     """
-
     def __init__(self, results, args, kwargs={}):
         self._cache = {}
         self.results = results
         self.get_margeff(*args, **kwargs)

-    def summary_frame(self, alpha=0.05):
+    def _reset(self):
+        self._cache = {}
+
+    @cache_readonly
+    def tvalues(self):
+        _check_at_is_all(self.margeff_options)
+        return self.margeff / self.margeff_se
+
+    def summary_frame(self, alpha=.05):
         """
         Returns a DataFrame summarizing the marginal effects.

@@ -243,9 +454,47 @@ class DiscreteMargins:
         The dataframe is created on each call and not cached, as are the
         tables built in `summary()`
         """
-        pass
-
-    def conf_int(self, alpha=0.05):
+        _check_at_is_all(self.margeff_options)
+        results = self.results
+        model = self.results.model
+        from pandas import DataFrame, MultiIndex
+        names = [_transform_names[self.margeff_options['method']],
+                                  'Std. Err.', 'z', 'Pr(>|z|)',
+                                  'Conf. Int. Low', 'Conf. Int. Hi.']
+        ind = self.results.model.exog.var(0) != 0 # True if not a constant
+        exog_names = self.results.model.exog_names
+        k_extra = getattr(model, 'k_extra', 0)
+        if k_extra > 0:
+            exog_names = exog_names[:-k_extra]
+        var_names = [name for i,name in enumerate(exog_names) if ind[i]]
+
+        if self.margeff.ndim == 2:
+            # MNLogit case
+            ci = self.conf_int(alpha)
+            table = np.column_stack([i.ravel("F") for i in
+                        [self.margeff, self.margeff_se, self.tvalues,
+                         self.pvalues, ci[:, 0, :], ci[:, 1, :]]])
+
+            _, yname_list = results._get_endog_name(model.endog_names,
+                                                        None, all=True)
+            ynames = np.repeat(yname_list, len(var_names))
+            xnames = np.tile(var_names, len(yname_list))
+            index = MultiIndex.from_tuples(list(zip(ynames, xnames)),
+                                           names=['endog', 'exog'])
+        else:
+            table = np.column_stack((self.margeff, self.margeff_se, self.tvalues,
+                                     self.pvalues, self.conf_int(alpha)))
+            index=var_names
+
+        return DataFrame(table, columns=names, index=index)
+
+
+    @cache_readonly
+    def pvalues(self):
+        _check_at_is_all(self.margeff_options)
+        return norm.sf(np.abs(self.tvalues)) * 2
+
+    def conf_int(self, alpha=.05):
         """
         Returns the confidence intervals of the marginal effects

@@ -261,9 +510,14 @@ class DiscreteMargins:
             An array with lower, upper confidence intervals for the marginal
             effects.
         """
-        pass
-
-    def summary(self, alpha=0.05):
+        _check_at_is_all(self.margeff_options)
+        me_se = self.margeff_se
+        q = norm.ppf(1 - alpha / 2)
+        lower = self.margeff - q * me_se
+        upper = self.margeff + q * me_se
+        return np.asarray(lzip(lower, upper))
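
The interval is the usual normal approximation, margeff plus or minus z_(1-alpha/2) times its standard error. A standalone sketch with made-up numbers:

    import numpy as np
    from scipy.stats import norm

    margeff = np.array([0.12, -0.05])
    margeff_se = np.array([0.03, 0.02])
    q = norm.ppf(1 - 0.05 / 2)                   # ~1.96 for alpha = 0.05
    ci = np.column_stack([margeff - q * margeff_se,
                          margeff + q * margeff_se])
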
+
+    def summary(self, alpha=.05):
         """
         Returns a summary table for marginal effects

@@ -278,10 +532,75 @@ class DiscreteMargins:
         Summary : SummaryTable
             A SummaryTable instance
         """
-        pass
-
-    def get_margeff(self, at='overall', method='dydx', atexog=None, dummy=
-        False, count=False):
+        _check_at_is_all(self.margeff_options)
+        results = self.results
+        model = results.model
+        title = model.__class__.__name__ + " Marginal Effects"
+        method = self.margeff_options['method']
+        top_left = [('Dep. Variable:', [model.endog_names]),
+                ('Method:', [method]),
+                ('At:', [self.margeff_options['at']]),]
+
+        from statsmodels.iolib.summary import (Summary, summary_params,
+                                                table_extend)
+        exog_names = model.exog_names[:] # copy
+        smry = Summary()
+
+        # TODO: sigh, we really need to hold on to this in _data...
+        _, const_idx = _get_const_index(model.exog)
+        if const_idx is not None:
+            exog_names.pop(const_idx[0])
+        if getattr(model, 'k_extra', 0) > 0:
+            exog_names = exog_names[:-model.k_extra]
+
+        J = int(getattr(model, "J", 1))
+        if J > 1:
+            yname, yname_list = results._get_endog_name(model.endog_names,
+                                                None, all=True)
+        else:
+            yname = model.endog_names
+            yname_list = [yname]
+
+        smry.add_table_2cols(self, gleft=top_left, gright=[],
+                yname=yname, xname=exog_names, title=title)
+
+        # NOTE: add_table_params is not general enough yet for margeff
+        # could use a refactor with getattr instead of hard-coded params
+        # tvalues etc.
+        table = []
+        conf_int = self.conf_int(alpha)
+        margeff = self.margeff
+        margeff_se = self.margeff_se
+        tvalues = self.tvalues
+        pvalues = self.pvalues
+        if J > 1:
+            for eq in range(J):
+                restup = (results, margeff[:,eq], margeff_se[:,eq],
+                          tvalues[:,eq], pvalues[:,eq], conf_int[:,:,eq])
+                tble = summary_params(restup, yname=yname_list[eq],
+                              xname=exog_names, alpha=alpha, use_t=False,
+                              skip_header=True)
+                tble.title = yname_list[eq]
+                # overwrite coef with method name
+                header = ['', _transform_names[method], 'std err', 'z',
+                        'P>|z|', '[' + str(alpha/2), str(1-alpha/2) + ']']
+                tble.insert_header_row(0, header)
+                table.append(tble)
+
+            table = table_extend(table, keep_headers=True)
+        else:
+            restup = (results, margeff, margeff_se, tvalues, pvalues, conf_int)
+            table = summary_params(restup, yname=yname, xname=exog_names,
+                    alpha=alpha, use_t=False, skip_header=True)
+            header = ['', _transform_names[method], 'std err', 'z',
+                        'P>|z|', '[' + str(alpha/2), str(1-alpha/2) + ']']
+            table.insert_header_row(0, header)
+
+        smry.tables.append(table)
+        return smry
+
+    def get_margeff(self, at='overall', method='dydx', atexog=None,
+                          dummy=False, count=False):
         """Get marginal effects of the fitted model.

         Parameters
@@ -340,4 +659,74 @@ class DiscreteMargins:
         When using after Poisson, returns the expected number of events
         per period, assuming that the model is loglinear.
         """
-        pass
+        self._reset() # always reset the cache when this is called
+        #TODO: if at is not all or overall, we can also put atexog values
+        # in summary table head
+        method = method.lower()
+        at = at.lower()
+        _check_margeff_args(at, method)
+        self.margeff_options = dict(method=method, at=at)
+        results = self.results
+        model = results.model
+        params = results.params
+        exog = model.exog.copy() # copy because values are changed
+        effects_idx, const_idx =  _get_const_index(exog)
+
+        if dummy:
+            _check_discrete_args(at, method)
+            dummy_idx, dummy = _get_dummy_index(exog, const_idx)
+        else:
+            dummy_idx = None
+
+        if count:
+            _check_discrete_args(at, method)
+            count_idx, count = _get_count_index(exog, const_idx)
+        else:
+            count_idx = None
+
+        # attach dummy_idx and count_idx
+        self.dummy_idx = dummy_idx
+        self.count_idx = count_idx
+
+        # get the exogenous variables
+        exog = _get_margeff_exog(exog, at, atexog, effects_idx)
+
+        # get base marginal effects, handled by sub-classes
+        effects = model._derivative_exog(params, exog, method,
+                                                    dummy_idx, count_idx)
+
+        J = getattr(model, 'J', 1)
+        effects_idx = np.tile(effects_idx, J) # adjust for multi-equation.
+
+        effects = _effects_at(effects, at)
+
+        if at == 'all':
+            if J > 1:
+                K = model.K - np.any(~effects_idx) # subtract constant
+                self.margeff = effects[:, effects_idx].reshape(-1, K, J,
+                                                                order='F')
+            else:
+                self.margeff = effects[:, effects_idx]
+        else:
+            # Set standard error of the marginal effects by Delta method.
+            margeff_cov, margeff_se = margeff_cov_with_se(model, params, exog,
+                                                results.cov_params(), at,
+                                                model._derivative_exog,
+                                                dummy_idx, count_idx,
+                                                method, J)
+
+            # reshape for multi-equation
+            if J > 1:
+                K = model.K - np.any(~effects_idx) # subtract constant
+                self.margeff = effects[effects_idx].reshape(K, J, order='F')
+                self.margeff_se = margeff_se[effects_idx].reshape(K, J,
+                                                                  order='F')
+                self.margeff_cov = margeff_cov[effects_idx][:, effects_idx]
+            else:
+                # do not care about at constant
+                # hack truncate effects_idx again if necessary
+                # if eyex, then effects is truncated to be without extra params
+                effects_idx = effects_idx[:len(effects)]
+                self.margeff_cov = margeff_cov[effects_idx][:, effects_idx]
+                self.margeff_se = margeff_se[effects_idx]
+                self.margeff = effects[effects_idx]
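
An end-to-end usage sketch of the API restored in this file, assuming a standard statsmodels install (the dataset choice is arbitrary):

    import statsmodels.api as sm

    spector = sm.datasets.spector.load_pandas()
    exog = sm.add_constant(spector.exog)
    res = sm.Logit(spector.endog, exog).fit(disp=0)

    mfx = res.get_margeff(at='overall', method='dydx')   # DiscreteMargins
    print(mfx.summary())                                  # table built by summary()
    frame = mfx.summary_frame()                           # margeff, std err, z, p, CI
    ci = mfx.conf_int(alpha=0.05)
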
diff --git a/statsmodels/discrete/discrete_model.py b/statsmodels/discrete/discrete_model.py
index f7c210c5e..1dc42f132 100644
--- a/statsmodels/discrete/discrete_model.py
+++ b/statsmodels/discrete/discrete_model.py
@@ -15,16 +15,20 @@ G.S. Madalla. `Limited-Dependent and Qualitative Variables in Econometrics`.

 W. Greene. `Econometric Analysis`. Prentice Hall, 5th. edition. 2003.
 """
-__all__ = ['Poisson', 'Logit', 'Probit', 'MNLogit', 'NegativeBinomial',
-    'GeneralizedPoisson', 'NegativeBinomialP', 'CountModel']
+__all__ = ["Poisson", "Logit", "Probit", "MNLogit", "NegativeBinomial",
+           "GeneralizedPoisson", "NegativeBinomialP", "CountModel"]
+
 from statsmodels.compat.pandas import Appender
+
 import warnings
+
 import numpy as np
 from pandas import MultiIndex, get_dummies
 from scipy import special, stats
 from scipy.special import digamma, gammaln, loggamma, polygamma
 from scipy.stats import nbinom
-from statsmodels.base.data import handle_data
+
+from statsmodels.base.data import handle_data  # for mnlogit
 from statsmodels.base.l1_slsqp import fit_l1_slsqp
 import statsmodels.base.model as base
 import statsmodels.base.wrapper as wrap
@@ -36,15 +40,33 @@ import statsmodels.regression.linear_model as lm
 from statsmodels.tools import data as data_tools, tools
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.numdiff import approx_fprime_cs
-from statsmodels.tools.sm_exceptions import PerfectSeparationError, PerfectSeparationWarning, SpecificationWarning
+from statsmodels.tools.sm_exceptions import (
+    PerfectSeparationError,
+    PerfectSeparationWarning,
+    SpecificationWarning,
+    )
+
+
 try:
-    import cvxopt
+    import cvxopt  # noqa:F401
     have_cvxopt = True
 except ImportError:
     have_cvxopt = False
+
+
+# TODO: When we eventually get user-settable precision, we need to change
+#       this
 FLOAT_EPS = np.finfo(float).eps
+
+# Limit for exponentials to avoid overflow
 EXP_UPPER_LIMIT = np.log(np.finfo(np.float64).max) - 1.0
-_discrete_models_docs = '\n'
+
+# TODO: add options for the parameter covariance/variance
+#       ie., OIM, EIM, and BHHH see Green 21.4
+
+_discrete_models_docs = """
+"""
+
 _discrete_results_docs = """
     %(one_line_description)s

@@ -67,11 +89,13 @@ _discrete_results_docs = """
     llf : float
         Value of the loglikelihood
     %(extra_attr)s"""
+
 _l1_results_attr = """    nnz_params : int
         The number of nonzero parameters in the model.  Train with
         trim_params == True or else numerical error will distort this.
     trimmed : bool array
         trimmed[i] == True if the ith parameter was trimmed from the model."""
+
 _get_start_params_null_docs = """
 Compute one-step moment estimator for null (constant-only) model

@@ -83,6 +107,7 @@ params : ndarray
     parameter estimate based one one-step moment matching

 """
+
 _check_rank_doc = """
     check_rank : bool
         Check exog rank to determine model degrees of freedom. Default is
@@ -91,6 +116,39 @@ _check_rank_doc = """
     """


+# helper for MNLogit (will be generally useful later)
+def _numpy_to_dummies(endog):
+    if endog.ndim == 2 and endog.dtype.kind not in ["S", "O"]:
+        endog_dummies = endog
+        ynames = range(endog.shape[1])
+    else:
+        dummies = get_dummies(endog, drop_first=False)
+        ynames = {i: dummies.columns[i] for i in range(dummies.shape[1])}
+        endog_dummies = np.asarray(dummies, dtype=float)
+
+        return endog_dummies, ynames
+
+    return endog_dummies, ynames
+
+
+def _pandas_to_dummies(endog):
+    if endog.ndim == 2:
+        if endog.shape[1] == 1:
+            yname = endog.columns[0]
+            endog_dummies = get_dummies(endog.iloc[:, 0])
+        else:  # assume already dummies
+            yname = 'y'
+            endog_dummies = endog
+    else:
+        yname = endog.name
+        if yname is None:
+            yname = 'y'
+        endog_dummies = get_dummies(endog)
+    ynames = endog_dummies.columns.tolist()
+
+    return endog_dummies, ynames, yname
+
+
 def _validate_l1_method(method):
     """
     As of 0.10.0, the supported values for `method` in `fit_regularized`
@@ -105,7 +163,12 @@ def _validate_l1_method(method):
     ------
     ValueError
     """
-    pass
+    if method not in ['l1', 'l1_cvxopt_cp']:
+        raise ValueError('`method` = {method} is not supported, use either '
+                         '"l1" or "l1_cvxopt_cp"'.format(method=method))
+
+
+#### Private Model Classes ####


 class DiscreteModel(base.LikelihoodModel):
@@ -120,7 +183,7 @@ class DiscreteModel(base.LikelihoodModel):
     def __init__(self, endog, exog, check_rank=True, **kwargs):
         self._check_rank = check_rank
         super().__init__(endog, exog, **kwargs)
-        self.raise_on_perfect_prediction = False
+        self.raise_on_perfect_prediction = False  # keep for backwards compat
         self.k_extra = 0

     def initialize(self):
@@ -129,35 +192,69 @@ class DiscreteModel(base.LikelihoodModel):
         statsmodels.model.LikelihoodModel.__init__
         and should contain any preprocessing that needs to be done for a model.
         """
-        pass
+        if self._check_rank:
+            # assumes constant
+            rank = tools.matrix_rank(self.exog, method="qr")
+        else:
+            # If rank check is skipped, assume full
+            rank = self.exog.shape[1]
+        self.df_model = float(rank - 1)
+        self.df_resid = float(self.exog.shape[0] - rank)

     def cdf(self, X):
         """
         The cumulative distribution function of the model.
         """
-        pass
+        raise NotImplementedError

     def pdf(self, X):
         """
         The probability density (mass) function of the model.
         """
-        pass
+        raise NotImplementedError
+
+    def _check_perfect_pred(self, params, *args):
+        endog = self.endog
+        fittedvalues = self.predict(params)
+        if np.allclose(fittedvalues - endog, 0):
+            if self.raise_on_perfect_prediction:
+                # backwards compatibility for attr raise_on_perfect_prediction
+                msg = "Perfect separation detected, results not available"
+                raise PerfectSeparationError(msg)
+            else:
+                msg = ("Perfect separation or prediction detected, "
+                       "parameter may not be identified")
+                warnings.warn(msg, category=PerfectSeparationWarning)
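
A sketch of the behavior this callback implements (illustrative, not from the patch): when a parameter vector predicts every observation exactly, a PerfectSeparationWarning is emitted rather than the old PerfectSeparationError. The callback is exercised directly here with a deliberately steep parameter vector.

    import warnings
    import numpy as np
    import statsmodels.api as sm

    x = np.arange(10.0)
    y = (x > 4).astype(float)                  # y perfectly determined by x
    X = sm.add_constant(x)
    mod = sm.Logit(y, X)
    steep = np.array([-450.0, 100.0])          # cutoff at x = 4.5, probabilities ~ 0/1
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        mod._check_perfect_pred(steep)         # the callback installed by fit()
    print([type(w.message).__name__ for w in caught])   # ['PerfectSeparationWarning']
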

     @Appender(base.LikelihoodModel.fit.__doc__)
     def fit(self, start_params=None, method='newton', maxiter=35,
-        full_output=1, disp=1, callback=None, **kwargs):
+            full_output=1, disp=1, callback=None, **kwargs):
         """
         Fit the model using maximum likelihood.

         The rest of the docstring is from
         statsmodels.base.model.LikelihoodModel.fit
         """
-        pass
+        if callback is None:
+            callback = self._check_perfect_pred
+        else:
+            pass  # TODO: make a function factory to have multiple call-backs

-    def fit_regularized(self, start_params=None, method='l1', maxiter=
-        'defined_by_method', full_output=1, disp=True, callback=None, alpha
-        =0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001,
-        qc_tol=0.03, qc_verbose=False, **kwargs):
+        mlefit = super().fit(start_params=start_params,
+                             method=method,
+                             maxiter=maxiter,
+                             full_output=full_output,
+                             disp=disp,
+                             callback=callback,
+                             **kwargs)
+
+        return mlefit  # It is up to subclasses to wrap results
+
+    def fit_regularized(self, start_params=None, method='l1',
+                        maxiter='defined_by_method', full_output=1, disp=True,
+                        callback=None, alpha=0, trim_mode='auto',
+                        auto_trim_tol=0.01, size_trim_tol=1e-4, qc_tol=0.03,
+                        qc_verbose=False, **kwargs):
         """
         Fit the model using a regularized maximum likelihood.

@@ -259,7 +356,60 @@ class DiscreteModel(base.LikelihoodModel):
         (i) :math:`|\\partial_k L| = \\alpha_k`  and  :math:`\\beta_k \\neq 0`
         (ii) :math:`|\\partial_k L| \\leq \\alpha_k`  and  :math:`\\beta_k = 0`
         """
-        pass
+        _validate_l1_method(method)
+        # Set attributes based on method
+        cov_params_func = self.cov_params_func_l1
+
+        ### Bundle up extra kwargs for the dictionary kwargs.  These are
+        ### passed through super(...).fit() as kwargs and unpacked at
+        ### appropriate times
+        alpha = np.array(alpha)
+        assert alpha.min() >= 0
+        try:
+            kwargs['alpha'] = alpha
+        except TypeError:
+            kwargs = dict(alpha=alpha)
+        kwargs['alpha_rescaled'] = kwargs['alpha'] / float(self.endog.shape[0])
+        kwargs['trim_mode'] = trim_mode
+        kwargs['size_trim_tol'] = size_trim_tol
+        kwargs['auto_trim_tol'] = auto_trim_tol
+        kwargs['qc_tol'] = qc_tol
+        kwargs['qc_verbose'] = qc_verbose
+
+        ### Define default keyword arguments to be passed to super(...).fit()
+        if maxiter == 'defined_by_method':
+            if method == 'l1':
+                maxiter = 1000
+            elif method == 'l1_cvxopt_cp':
+                maxiter = 70
+
+        ## Parameters to pass to super(...).fit()
+        # For the 'extra' parameters, pass all that are available,
+        # even if we know (at this point) we will only use one.
+        extra_fit_funcs = {'l1': fit_l1_slsqp}
+        if have_cvxopt and method == 'l1_cvxopt_cp':
+            from statsmodels.base.l1_cvxopt import fit_l1_cvxopt_cp
+            extra_fit_funcs['l1_cvxopt_cp'] = fit_l1_cvxopt_cp
+        elif method.lower() == 'l1_cvxopt_cp':
+            raise ValueError("Cannot use l1_cvxopt_cp as cvxopt "
+                             "was not found (install it, or use method='l1' instead)")
+
+        if callback is None:
+            callback = self._check_perfect_pred
+        else:
+            pass  # make a function factory to have multiple call-backs
+
+        mlefit = super().fit(start_params=start_params,
+                             method=method,
+                             maxiter=maxiter,
+                             full_output=full_output,
+                             disp=disp,
+                             callback=callback,
+                             extra_fit_funcs=extra_fit_funcs,
+                             cov_params_func=cov_params_func,
+                             **kwargs)
+
+        return mlefit  # up to subclasses to wrap results

     def cov_params_func_l1(self, likelihood_model, xopt, retvals):
         """
@@ -270,47 +420,73 @@ class DiscreteModel(base.LikelihoodModel):
         Returns a full cov_params matrix, with entries corresponding
         to zero'd values set to np.nan.
         """
-        pass
+        H = likelihood_model.hessian(xopt)
+        trimmed = retvals['trimmed']
+        nz_idx = np.nonzero(~trimmed)[0]
+        nnz_params = (~trimmed).sum()
+        if nnz_params > 0:
+            H_restricted = H[nz_idx[:, None], nz_idx]
+            # Covariance estimate for the nonzero params
+            H_restricted_inv = np.linalg.inv(-H_restricted)
+        else:
+            H_restricted_inv = np.zeros(0)
+
+        cov_params = np.nan * np.ones(H.shape)
+        cov_params[nz_idx[:, None], nz_idx] = H_restricted_inv
+
+        return cov_params

-    def predict(self, params, exog=None, which='mean', linear=None):
+    def predict(self, params, exog=None, which="mean", linear=None):
         """
         Predict response variable of a model given exogenous variables.
         """
-        pass
+        raise NotImplementedError

-    def _derivative_exog(self, params, exog=None, dummy_idx=None, count_idx
-        =None):
+    def _derivative_exog(self, params, exog=None, dummy_idx=None,
+                         count_idx=None):
         """
         This should implement the derivative of the non-linear function
         """
-        pass
+        raise NotImplementedError

     def _derivative_exog_helper(self, margeff, params, exog, dummy_idx,
-        count_idx, transform):
+                                count_idx, transform):
         """
         Helper for _derivative_exog to wrap results appropriately
         """
-        pass
+        from .discrete_margins import _get_count_effects, _get_dummy_effects
+
+        if count_idx is not None:
+            margeff = _get_count_effects(margeff, exog, count_idx, transform,
+                                         self, params)
+        if dummy_idx is not None:
+            margeff = _get_dummy_effects(margeff, exog, dummy_idx, transform,
+                                         self, params)
+
+        return margeff


 class BinaryModel(DiscreteModel):
     _continuous_ok = False

     def __init__(self, endog, exog, offset=None, check_rank=True, **kwargs):
+        # unconditional check, requires no extra kwargs added by subclasses
         self._check_kwargs(kwargs)
         super().__init__(endog, exog, offset=offset, check_rank=check_rank,
-            **kwargs)
+                         **kwargs)
         if not issubclass(self.__class__, MultinomialModel):
             if not np.all((self.endog >= 0) & (self.endog <= 1)):
-                raise ValueError('endog must be in the unit interval.')
+                raise ValueError("endog must be in the unit interval.")
+
         if offset is None:
             delattr(self, 'offset')
-            if not self._continuous_ok and np.any(self.endog != np.round(
-                self.endog)):
-                raise ValueError('endog must be binary, either 0 or 1')

-    def predict(self, params, exog=None, which='mean', linear=None, offset=None
-        ):
+            if (not self._continuous_ok and
+                    np.any(self.endog != np.round(self.endog))):
+                raise ValueError("endog must be binary, either 0 or 1")
+
+    def predict(self, params, exog=None, which="mean", linear=None,
+                offset=None):
         """
         Predict response variable of a model given exogenous variables.

@@ -349,11 +525,69 @@ class BinaryModel(DiscreteModel):
         array
             Fitted values at exog.
         """
-        pass
+        if linear is not None:
+            msg = 'linear keyword is deprecated, use which="linear"'
+            warnings.warn(msg, FutureWarning)
+            if linear is True:
+                which = "linear"
+
+        # Use fit offset if appropriate
+        if offset is None and exog is None and hasattr(self, 'offset'):
+            offset = self.offset
+        elif offset is None:
+            offset = 0.
+
+        if exog is None:
+            exog = self.exog
+
+        linpred = np.dot(exog, params) + offset
+
+        if which == "mean":
+            return self.cdf(linpred)
+        elif which == "linear":
+            return linpred
+        if which == "var":
+            mu = self.cdf(linpred)
+            var_ = mu * (1 - mu)
+            return var_
+        else:
+            raise ValueError('`which` must be "mean", "linear" or "var".')
+
+    @Appender(DiscreteModel.fit_regularized.__doc__)
+    def fit_regularized(self, start_params=None, method='l1',
+            maxiter='defined_by_method', full_output=1, disp=1, callback=None,
+            alpha=0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=1e-4,
+            qc_tol=0.03, **kwargs):
+
+        _validate_l1_method(method)
+
+        bnryfit = super().fit_regularized(start_params=start_params,
+                                          method=method,
+                                          maxiter=maxiter,
+                                          full_output=full_output,
+                                          disp=disp,
+                                          callback=callback,
+                                          alpha=alpha,
+                                          trim_mode=trim_mode,
+                                          auto_trim_tol=auto_trim_tol,
+                                          size_trim_tol=size_trim_tol,
+                                          qc_tol=qc_tol,
+                                          **kwargs)
+
+        discretefit = L1BinaryResults(self, bnryfit)
+        return L1BinaryResultsWrapper(discretefit)
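
A usage sketch of the L1 path wired up here (synthetic data; how many coefficients end up trimmed to exactly zero depends on `alpha` and `trim_mode`):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    X = sm.add_constant(rng.normal(size=(200, 5)))
    beta = np.array([0.5, 1.0, 0.0, 0.0, 0.0, -0.5])
    y = (rng.uniform(size=200) < 1 / (1 + np.exp(-X @ beta))).astype(float)

    res = sm.Logit(y, X).fit_regularized(method='l1', alpha=10.0, disp=0)
    print(res.params)        # weak coefficients shrunk toward (or trimmed to) zero
    print(res.trimmed)       # boolean mask of trimmed parameters
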
+
+    def fit_constrained(self, constraints, start_params=None, **fit_kwds):
+
+        res = fit_constrained_wrap(self, constraints, start_params=None,
+                                   **fit_kwds)
+        return res
+
     fit_constrained.__doc__ = fit_constrained_wrap.__doc__

     def _derivative_predict(self, params, exog=None, transform='dydx',
-        offset=None):
+                            offset=None):
         """
         For computing marginal effects standard errors.

@@ -364,10 +598,16 @@ class BinaryModel(DiscreteModel):
         Transform can be 'dydx' or 'eydx'. Checking is done in margeff
         computations for appropriate transform.
         """
-        pass
+        if exog is None:
+            exog = self.exog
+        linpred = self.predict(params, exog, offset=offset, which="linear")
+        dF = self.pdf(linpred)[:,None] * exog
+        if 'ey' in transform:
+            dF /= self.predict(params, exog, offset=offset)[:,None]
+        return dF
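
For a Logit model the expression above, pdf(x'beta) * x, reduces to Lambda(1 - Lambda) * x, which can be checked directly (illustrative sketch, synthetic data):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    X = sm.add_constant(rng.normal(size=(50, 2)))
    beta = np.array([0.2, 0.5, -0.3])
    y = (rng.uniform(size=50) < 1 / (1 + np.exp(-X @ beta))).astype(float)
    mod = sm.Logit(y, X)

    prob = 1 / (1 + np.exp(-(X @ beta)))
    by_hand = (prob * (1 - prob))[:, None] * X          # logistic pdf times x
    print(np.allclose(mod._derivative_predict(beta, X, 'dydx'), by_hand))  # True
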

     def _derivative_exog(self, params, exog=None, transform='dydx',
-        dummy_idx=None, count_idx=None, offset=None):
+                         dummy_idx=None, count_idx=None, offset=None):
         """
         For computing marginal effects returns dF(XB) / dX where F(.) is
         the predicted probabilities
@@ -377,7 +617,22 @@ class BinaryModel(DiscreteModel):
         Not all of these make sense in the presence of discrete regressors,
         but checks are done in the results in get_margeff.
         """
-        pass
+        # Note: this form should be appropriate for
+        #   group 1 probit, logit, logistic, cloglog, heckprob, xtprobit
+        if exog is None:
+            exog = self.exog
+
+        linpred = self.predict(params, exog, offset=offset, which="linear")
+        margeff = np.dot(self.pdf(linpred)[:,None],
+                         params[None,:])
+
+        if 'ex' in transform:
+            margeff *= exog
+        if 'ey' in transform:
+            margeff /= self.predict(params, exog)[:, None]
+
+        return self._derivative_exog_helper(margeff, params, exog,
+                                            dummy_idx, count_idx, transform)

     def _deriv_mean_dparams(self, params):
         """
@@ -393,7 +648,11 @@ class BinaryModel(DiscreteModel):
         The value of the derivative of the expected endog with respect
         to the parameter vector.
         """
-        pass
+        link = self.link
+        lin_pred = self.predict(params, which="linear")
+        idl = link.inverse_deriv(lin_pred)
+        dmat = self.exog * idl[:, None]
+        return dmat

     def get_distribution(self, params, exog=None, offset=None):
         """Get frozen instance of distribution based on predicted parameters.
@@ -422,18 +681,57 @@ class BinaryModel(DiscreteModel):
         -------
         Instance of frozen scipy distribution.
         """
-        pass
+        mu = self.predict(params, exog=exog, offset=offset)
+        # distr = stats.bernoulli(mu[:, None])
+        distr = stats.bernoulli(mu)
+        return distr


 class MultinomialModel(BinaryModel):

+    def _handle_data(self, endog, exog, missing, hasconst, **kwargs):
+        if data_tools._is_using_ndarray_type(endog, None):
+            endog_dummies, ynames = _numpy_to_dummies(endog)
+            yname = 'y'
+        elif data_tools._is_using_pandas(endog, None):
+            endog_dummies, ynames, yname = _pandas_to_dummies(endog)
+        else:
+            endog = np.asarray(endog)
+            endog_dummies, ynames = _numpy_to_dummies(endog)
+            yname = 'y'
+
+        if not isinstance(ynames, dict):
+            ynames = dict(zip(range(endog_dummies.shape[1]), ynames))
+
+        self._ynames_map = ynames
+        data = handle_data(endog_dummies, exog, missing, hasconst, **kwargs)
+        data.ynames = yname  # overwrite this to single endog name
+        data.orig_endog = endog
+        self.wendog = data.endog
+
+        # repeating from upstream...
+        for key in kwargs:
+            if key in ['design_info', 'formula']:  # leave attached to data
+                continue
+            try:
+                setattr(self, key, data.__dict__.pop(key))
+            except KeyError:
+                pass
+        return data
+
     def initialize(self):
         """
         Preprocesses the data for MNLogit.
         """
-        pass
+        super().initialize()
+        # This is also a "whiten" method in other models (eg regression)
+        self.endog = self.endog.argmax(1)  # turn it into an array of col idx
+        self.J = self.wendog.shape[1]
+        self.K = self.exog.shape[1]
+        self.df_model *= (self.J-1)  # for each J - 1 equation.
+        self.df_resid = self.exog.shape[0] - self.df_model - (self.J-1)

-    def predict(self, params, exog=None, which='mean', linear=None):
+    def predict(self, params, exog=None, which="mean", linear=None):
         """
         Predict response variable of a model given exogenous variables.

@@ -476,7 +774,58 @@ class MultinomialModel(BinaryModel):
         Column 0 is the base case, the rest conform to the rows of params
         shifted up one for the base case.
         """
-        pass
+        if linear is not None:
+            msg = 'linear keyword is deprecated, use which="linear"'
+            warnings.warn(msg, FutureWarning)
+            if linear is True:
+                which = "linear"
+
+        if exog is None: # do here to accommodate user-given exog
+            exog = self.exog
+        if exog.ndim == 1:
+            exog = exog[None]
+
+        pred = super().predict(params, exog, which=which)
+        if which == "linear":
+            pred = np.column_stack((np.zeros(len(exog)), pred))
+        return pred
+
+    @Appender(DiscreteModel.fit.__doc__)
+    def fit(self, start_params=None, method='newton', maxiter=35,
+            full_output=1, disp=1, callback=None, **kwargs):
+        if start_params is None:
+            start_params = np.zeros((self.K * (self.J-1)))
+        else:
+            start_params = np.asarray(start_params)
+
+        if callback is None:
+            # placeholder until check_perfect_pred
+            callback = lambda x, *args : None
+        # skip calling super to handle results from LikelihoodModel
+        mnfit = base.LikelihoodModel.fit(self, start_params = start_params,
+                method=method, maxiter=maxiter, full_output=full_output,
+                disp=disp, callback=callback, **kwargs)
+        mnfit.params = mnfit.params.reshape(self.K, -1, order='F')
+        mnfit = MultinomialResults(self, mnfit)
+        return MultinomialResultsWrapper(mnfit)
+
+    @Appender(DiscreteModel.fit_regularized.__doc__)
+    def fit_regularized(self, start_params=None, method='l1',
+            maxiter='defined_by_method', full_output=1, disp=1, callback=None,
+            alpha=0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=1e-4,
+            qc_tol=0.03, **kwargs):
+        if start_params is None:
+            start_params = np.zeros((self.K * (self.J-1)))
+        else:
+            start_params = np.asarray(start_params)
+        mnfit = DiscreteModel.fit_regularized(
+                self, start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=disp, callback=callback,
+                alpha=alpha, trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs)
+        mnfit.params = mnfit.params.reshape(self.K, -1, order='F')
+        mnfit = L1MultinomialResults(self, mnfit)
+        return L1MultinomialResultsWrapper(mnfit)

     def _derivative_predict(self, params, exog=None, transform='dydx'):
         """
@@ -491,10 +840,37 @@ class MultinomialModel(BinaryModel):
         Transform can be 'dydx' or 'eydx'. Checking is done in margeff
         computations for appropriate transform.
         """
-        pass
+        if exog is None:
+            exog = self.exog
+        if params.ndim == 1: # will get flattened from approx_fprime
+            params = params.reshape(self.K, self.J-1, order='F')
+
+        eXB = np.exp(np.dot(exog, params))
+        sum_eXB = (1 + eXB.sum(1))[:,None]
+        J = int(self.J)
+        K = int(self.K)
+        repeat_eXB = np.repeat(eXB, J, axis=1)
+        X = np.tile(exog, J-1)
+        # this is the derivative wrt the base level
+        F0 = -repeat_eXB * X / sum_eXB ** 2
+        # this is the derivative wrt the other levels when
+        # dF_j / dParams_j (ie., own equation)
+        #NOTE: this computes too much, any easy way to cut down?
+        F1 = eXB.T[:,:,None]*X * (sum_eXB - repeat_eXB) / (sum_eXB**2)
+        F1 = F1.transpose((1,0,2)) # put the nobs index first
+
+        # other equation index
+        other_idx = ~np.kron(np.eye(J-1), np.ones(K)).astype(bool)
+        F1[:, other_idx] = (-eXB.T[:,:,None]*X*repeat_eXB / \
+                           (sum_eXB**2)).transpose((1,0,2))[:, other_idx]
+        dFdX = np.concatenate((F0[:, None,:], F1), axis=1)
+
+        if 'ey' in transform:
+            dFdX /= self.predict(params, exog)[:, :, None]
+        return dFdX

     def _derivative_exog(self, params, exog=None, transform='dydx',
-        dummy_idx=None, count_idx=None):
+                         dummy_idx=None, count_idx=None):
         """
         For computing marginal effects returns dF(XB) / dX where F(.) is
         the predicted probabilities
@@ -514,21 +890,50 @@ class MultinomialModel(BinaryModel):
         margeff.reshape(nobs, K, J, order='F).mean(0) and the marginal effects
         for choice J are in column J
         """
-        pass
+        J = int(self.J)  # number of alternative choices
+        K = int(self.K)  # number of variables
+        # Note: this form should be appropriate for
+        #   group 1 probit, logit, logistic, cloglog, heckprob, xtprobit
+        if exog is None:
+            exog = self.exog
+        if params.ndim == 1:  # will get flattened from approx_fprime
+            params = params.reshape(K, J-1, order='F')
+
+        zeroparams = np.c_[np.zeros(K), params]  # add base in
+
+        cdf = self.cdf(np.dot(exog, params))
+
+        # TODO: meaningful interpretation for `iterm`?
+        iterm = np.array([cdf[:, [i]] * zeroparams[:, i]
+                          for i in range(int(J))]).sum(0)
+
+        margeff = np.array([cdf[:, [j]] * (zeroparams[:, j] - iterm)
+                            for j in range(J)])
+
+        # swap the axes to make sure margeff are in order nobs, K, J
+        margeff = np.transpose(margeff, (1, 2, 0))
+
+        if 'ex' in transform:
+            margeff *= exog
+        if 'ey' in transform:
+            margeff /= self.predict(params, exog)[:,None,:]
+
+        margeff = self._derivative_exog_helper(margeff, params, exog,
+                                               dummy_idx, count_idx, transform)
+        return margeff.reshape(len(exog), -1, order='F')

     def get_distribution(self, params, exog=None, offset=None):
         """get frozen instance of distribution
         """
-        pass
+        raise NotImplementedError


 class CountModel(DiscreteModel):
-
-    def __init__(self, endog, exog, offset=None, exposure=None, missing=
-        'none', check_rank=True, **kwargs):
+    def __init__(self, endog, exog, offset=None, exposure=None, missing='none',
+                 check_rank=True, **kwargs):
         self._check_kwargs(kwargs)
-        super().__init__(endog, exog, check_rank, missing=missing, offset=
-            offset, exposure=exposure, **kwargs)
+        super().__init__(endog, exog, check_rank, missing=missing,
+                         offset=offset, exposure=exposure, **kwargs)
         if exposure is not None:
             self.exposure = np.asarray(self.exposure)
             self.exposure = np.log(self.exposure)
@@ -539,13 +944,57 @@ class CountModel(DiscreteModel):
             delattr(self, 'offset')
         if exposure is None:
             delattr(self, 'exposure')
+
+        # promote dtype to float64 if needed
         dt = np.promote_types(self.endog.dtype, np.float64)
         self.endog = np.asarray(self.endog, dt)
         dt = np.promote_types(self.exog.dtype, np.float64)
         self.exog = np.asarray(self.exog, dt)

-    def predict(self, params, exog=None, exposure=None, offset=None, which=
-        'mean', linear=None):
+
+    def _check_inputs(self, offset, exposure, endog):
+        if offset is not None and offset.shape[0] != endog.shape[0]:
+            raise ValueError("offset is not the same length as endog")
+
+        if exposure is not None and exposure.shape[0] != endog.shape[0]:
+            raise ValueError("exposure is not the same length as endog")
+
+    def _get_init_kwds(self):
+        # this is a temporary fixup because exposure has been transformed
+        # see #1609
+        kwds = super()._get_init_kwds()
+        if 'exposure' in kwds and kwds['exposure'] is not None:
+            kwds['exposure'] = np.exp(kwds['exposure'])
+        return kwds
+
+    def _get_predict_arrays(self, exog=None, offset=None, exposure=None):
+
+        # convert extras if not None
+        if exposure is not None:
+            exposure = np.log(np.asarray(exposure))
+        if offset is not None:
+            offset = np.asarray(offset)
+
+        # get defaults
+        if exog is None:
+            # prediction is in-sample
+            exog = self.exog
+            if exposure is None:
+                exposure = getattr(self, 'exposure', 0)
+            if offset is None:
+                offset = getattr(self, 'offset', 0)
+        else:
+            # user specified
+            exog = np.asarray(exog)
+            if exposure is None:
+                exposure = 0
+            if offset is None:
+                offset = 0
+
+        return exog, offset, exposure
+
+    def predict(self, params, exog=None, exposure=None, offset=None,
+                which='mean', linear=None):
         """
         Predict response variable of a count model given exogenous variables

@@ -594,7 +1043,39 @@ class CountModel(DiscreteModel):
         If exposure is specified, then it will be logged by the method.
         The user does not need to log it first.
         """
-        pass
+        if linear is not None:
+            msg = 'linear keyword is deprecated, use which="linear"'
+            warnings.warn(msg, FutureWarning)
+            if linear is True:
+                which = "linear"
+
+        # the following is copied from GLM predict (without family/link check)
+        # Use fit offset if appropriate
+        if offset is None and exog is None and hasattr(self, 'offset'):
+            offset = self.offset
+        elif offset is None:
+            offset = 0.
+
+        # Use fit exposure if appropriate
+        if exposure is None and exog is None and hasattr(self, 'exposure'):
+            # Already logged
+            exposure = self.exposure
+        elif exposure is None:
+            exposure = 0.
+        else:
+            exposure = np.log(exposure)
+
+        if exog is None:
+            exog = self.exog
+
+        fitted = np.dot(exog, params[:exog.shape[1]])
+        linpred = fitted + exposure + offset
+        if which == "mean":
+            return np.exp(linpred)
+        elif which.startswith("lin"):
+            return linpred
+        else:
+            raise ValueError('keyword which has to be "mean" or "linear"')
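
The offset/exposure handling above means the predicted mean is exposure * exp(x'beta + offset). A quick numerical check on synthetic data (illustrative sketch):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    X = sm.add_constant(rng.normal(size=(50, 1)))
    exposure = rng.uniform(0.5, 2.0, size=50)
    beta = np.array([0.1, 0.3])
    y = rng.poisson(exposure * np.exp(X @ beta))

    mod = sm.Poisson(y, X, exposure=exposure)            # exposure is logged internally
    print(np.allclose(mod.predict(beta), exposure * np.exp(X @ beta)))   # True
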

     def _derivative_predict(self, params, exog=None, transform='dydx'):
         """
@@ -607,10 +1088,16 @@ class CountModel(DiscreteModel):
         Transform can be 'dydx' or 'eydx'. Checking is done in margeff
         computations for appropriate transform.
         """
-        pass
+        if exog is None:
+            exog = self.exog
+        #NOTE: this handles offset and exposure
+        dF = self.predict(params, exog)[:,None] * exog
+        if 'ey' in transform:
+            dF /= self.predict(params, exog)[:,None]
+        return dF

-    def _derivative_exog(self, params, exog=None, transform='dydx',
-        dummy_idx=None, count_idx=None):
+    def _derivative_exog(self, params, exog=None, transform="dydx",
+                         dummy_idx=None, count_idx=None):
         """
         For computing marginal effects. These are the marginal effects
         d F(XB) / dX
@@ -622,7 +1109,19 @@ class CountModel(DiscreteModel):
         Not all of these make sense in the presence of discrete regressors,
         but checks are done in the results in get_margeff.
         """
-        pass
+        # group 3 poisson, nbreg, zip, zinb
+        if exog is None:
+            exog = self.exog
+        k_extra = getattr(self, 'k_extra', 0)
+        params_exog = params if k_extra == 0 else params[:-k_extra]
+        margeff = self.predict(params, exog)[:,None] * params_exog[None,:]
+        if 'ex' in transform:
+            margeff *= exog
+        if 'ey' in transform:
+            margeff /= self.predict(params, exog)[:,None]
+
+        return self._derivative_exog_helper(margeff, params, exog,
+                                            dummy_idx, count_idx, transform)

     def _deriv_mean_dparams(self, params):
         """
@@ -638,12 +1137,59 @@ class CountModel(DiscreteModel):
         The value of the derivative of the expected endog with respect
         to the parameter vector.
         """
-        pass
+        from statsmodels.genmod.families import links
+        link = links.Log()
+        lin_pred = self.predict(params, which="linear")
+        idl = link.inverse_deriv(lin_pred)
+        dmat = self.exog * idl[:, None]
+        if self.k_extra > 0:
+            dmat_extra = np.zeros((dmat.shape[0], self.k_extra))
+            dmat = np.column_stack((dmat, dmat_extra))
+        return dmat
+
+
+    @Appender(DiscreteModel.fit.__doc__)
+    def fit(self, start_params=None, method='newton', maxiter=35,
+            full_output=1, disp=1, callback=None, **kwargs):
+        cntfit = super().fit(start_params=start_params,
+                             method=method,
+                             maxiter=maxiter,
+                             full_output=full_output,
+                             disp=disp,
+                             callback=callback,
+                             **kwargs)
+        discretefit = CountResults(self, cntfit)
+        return CountResultsWrapper(discretefit)
+
+    @Appender(DiscreteModel.fit_regularized.__doc__)
+    def fit_regularized(self, start_params=None, method='l1',
+            maxiter='defined_by_method', full_output=1, disp=1, callback=None,
+            alpha=0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=1e-4,
+            qc_tol=0.03, **kwargs):
+
+        _validate_l1_method(method)
+
+        cntfit = super().fit_regularized(start_params=start_params,
+                                         method=method,
+                                         maxiter=maxiter,
+                                         full_output=full_output,
+                                         disp=disp,
+                                         callback=callback,
+                                         alpha=alpha,
+                                         trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
+                                         size_trim_tol=size_trim_tol,
+                                         qc_tol=qc_tol,
+                                         **kwargs)
+
+        discretefit = L1CountResults(self, cntfit)
+        return L1CountResultsWrapper(discretefit)
+
+
+# Public Model Classes


 class Poisson(CountModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Poisson Model

     %(params)s
@@ -655,15 +1201,19 @@ class Poisson(CountModel):
         A reference to the endogenous response variable
     exog : ndarray
         A reference to the exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.
-        """
-         + base._missing_param_doc + _check_rank_doc})
+        """ + base._missing_param_doc + _check_rank_doc}
+
+    @cache_readonly
+    def family(self):
+        from statsmodels.genmod import families
+        return families.Poisson()

     def cdf(self, X):
         """
@@ -690,7 +1240,8 @@ class Poisson(CountModel):

         The parameter `X` is :math:`X\\beta` in the above formula.
         """
-        pass
+        y = self.endog
+        return stats.poisson.cdf(y, np.exp(X))

     def pdf(self, X):
         """
@@ -719,7 +1270,8 @@ class Poisson(CountModel):

         The parameter `X` is :math:`x_{i}\\beta` in the above formula.
         """
-        pass
+        y = self.endog
+        return np.exp(stats.poisson.logpmf(y, np.exp(X)))

     def loglike(self, params):
         """
@@ -740,7 +1292,15 @@ class Poisson(CountModel):
         -----
         .. math:: \\ln L=\\sum_{i=1}^{n}\\left[-\\lambda_{i}+y_{i}x_{i}^{\\prime}\\beta-\\ln y_{i}!\\right]
         """
-        pass
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+        XB = np.dot(self.exog, params) + offset + exposure
+        endog = self.endog
+        return np.sum(
+            -np.exp(np.clip(XB, None, EXP_UPPER_LIMIT))
+            + endog * XB
+            - gammaln(endog + 1)
+        )
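
The sum above is the standard Poisson loglikelihood and, absent offset, exposure and clipping, agrees with scipy's logpmf. A quick sanity check on synthetic data (illustrative sketch):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(3)
    X = np.column_stack([np.ones(100), rng.normal(size=100)])
    beta = np.array([0.3, 0.5])
    lam = np.exp(X @ beta)
    y = rng.poisson(lam)

    mod = sm.Poisson(y, X)
    print(np.allclose(mod.loglike(beta), stats.poisson.logpmf(y, lam).sum()))  # True
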

     def loglikeobs(self, params):
         """
@@ -763,7 +1323,62 @@ class Poisson(CountModel):

         for observations :math:`i=1,...,n`
         """
-        pass
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+        XB = np.dot(self.exog, params) + offset + exposure
+        endog = self.endog
+        #np.sum(stats.poisson.logpmf(endog, np.exp(XB)))
+        return -np.exp(XB) +  endog*XB - gammaln(endog+1)
+
+    @Appender(_get_start_params_null_docs)
+    def _get_start_params_null(self):
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+        const = (self.endog / np.exp(offset + exposure)).mean()
+        params = [np.log(const)]
+        return params
+
+    @Appender(DiscreteModel.fit.__doc__)
+    def fit(self, start_params=None, method='newton', maxiter=35,
+            full_output=1, disp=1, callback=None, **kwargs):
+
+        if start_params is None and self.data.const_idx is not None:
+            # k_params or k_exog not available?
+            start_params = 0.001 * np.ones(self.exog.shape[1])
+            start_params[self.data.const_idx] = self._get_start_params_null()[0]
+
+        kwds = {}
+        if kwargs.get('cov_type') is not None:
+            kwds['cov_type'] = kwargs.get('cov_type')
+            kwds['cov_kwds'] = kwargs.get('cov_kwds', {})
+
+        cntfit = super(CountModel, self).fit(start_params=start_params,
+                                             method=method,
+                                             maxiter=maxiter,
+                                             full_output=full_output,
+                                             disp=disp,
+                                             callback=callback,
+                                             **kwargs)
+
+        discretefit = PoissonResults(self, cntfit, **kwds)
+        return PoissonResultsWrapper(discretefit)
+
+    @Appender(DiscreteModel.fit_regularized.__doc__)
+    def fit_regularized(self, start_params=None, method='l1',
+            maxiter='defined_by_method', full_output=1, disp=1, callback=None,
+            alpha=0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=1e-4,
+            qc_tol=0.03, **kwargs):
+
+        _validate_l1_method(method)
+
+        cntfit = super(CountModel, self).fit_regularized(
+                start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=disp, callback=callback,
+                alpha=alpha, trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs)
+
+        discretefit = L1PoissonResults(self, cntfit)
+        return L1PoissonResultsWrapper(discretefit)

     def fit_constrained(self, constraints, start_params=None, **fit_kwds):
         """fit the model subject to linear equality constraints
@@ -794,7 +1409,45 @@ class Poisson(CountModel):
         -------
         results : Results instance
         """
-        pass
+
+        #constraints = (R, q)
+        # TODO: temporary trailing underscore to not overwrite the monkey
+        #       patched version
+        # TODO: decide whether to move the imports
+        from patsy import DesignInfo
+        from statsmodels.base._constraints import (fit_constrained,
+                                                   LinearConstraints)
+
+        # same pattern as in base.LikelihoodModel.t_test
+        lc = DesignInfo(self.exog_names).linear_constraint(constraints)
+        R, q = lc.coefs, lc.constants
+
+        # TODO: add start_params option, need access to transformation
+        #       fit_constrained needs to do the transformation
+        params, cov, res_constr = fit_constrained(self, R, q,
+                                                  start_params=start_params,
+                                                  fit_kwds=fit_kwds)
+        #create dummy results Instance, TODO: wire up properly
+        res = self.fit(maxiter=0, method='nm', disp=0,
+                       warn_convergence=False) # we get a wrapper back
+        res.mle_retvals['fcall'] = res_constr.mle_retvals.get('fcall', np.nan)
+        res.mle_retvals['iterations'] = res_constr.mle_retvals.get(
+                                                        'iterations', np.nan)
+        res.mle_retvals['converged'] = res_constr.mle_retvals['converged']
+        res._results.params = params
+        res._results.cov_params_default = cov
+        cov_type = fit_kwds.get('cov_type', 'nonrobust')
+        if cov_type != 'nonrobust':
+            res._results.normalized_cov_params = cov # assume scale=1
+        else:
+            res._results.normalized_cov_params = None
+        k_constr = len(q)
+        res._results.df_resid += k_constr
+        res._results.df_model -= k_constr
+        res._results.constraints = LinearConstraints.from_patsy(lc)
+        res._results.k_constr = k_constr
+        res._results.results_constrained = res_constr
+        return res

     def score(self, params):
         """
@@ -819,7 +1472,11 @@ class Poisson(CountModel):

         .. math:: \\ln\\lambda_{i}=x_{i}\\beta
         """
-        pass
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+        X = self.exog
+        L = np.exp(np.dot(X,params) + offset + exposure)
+        return np.dot(self.endog - L, X)

     def score_obs(self, params):
         """
@@ -845,7 +1502,11 @@ class Poisson(CountModel):

         .. math:: \\ln\\lambda_{i}=x_{i}\\beta
         """
-        pass
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+        X = self.exog
+        L = np.exp(np.dot(X,params) + offset + exposure)
+        return (self.endog - L)[:,None] * X

     def score_factor(self, params):
         """
@@ -871,7 +1532,11 @@ class Poisson(CountModel):

         .. math:: \\ln\\lambda_{i}=x_{i}\\beta
         """
-        pass
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+        X = self.exog
+        L = np.exp(np.dot(X,params) + offset + exposure)
+        return (self.endog - L)

     def hessian(self, params):
         """
@@ -896,7 +1561,11 @@ class Poisson(CountModel):

         .. math:: \\ln\\lambda_{i}=x_{i}\\beta
         """
-        pass
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+        X = self.exog
+        L = np.exp(np.dot(X,params) + exposure + offset)
+        return -np.dot(L*X.T, X)

     def hessian_factor(self, params):
         """
@@ -921,7 +1590,11 @@ class Poisson(CountModel):

         .. math:: \\ln\\lambda_{i}=x_{i}\\beta
         """
-        pass
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+        X = self.exog
+        L = np.exp(np.dot(X,params) + exposure + offset)
+        return -L

     def _deriv_score_obs_dendog(self, params, scale=None):
         """derivative of score_obs w.r.t. endog
@@ -942,10 +1615,10 @@ class Poisson(CountModel):
             can be written as `score_factor0[:, None] * exog` where
             `score_factor0` is the score_factor without the residual.
         """
-        pass
+        return self.exog

-    def predict(self, params, exog=None, exposure=None, offset=None, which=
-        'mean', linear=None, y_values=None):
+    def predict(self, params, exog=None, exposure=None, offset=None,
+                which='mean', linear=None, y_values=None):
         """
         Predict response variable of a model given exogenous variables.

@@ -1001,21 +1674,54 @@ class Poisson(CountModel):
             Values of the random variable endog at which pmf is evaluated.
             Only used if ``which="prob"``
         """
-        pass
+        # Note docstring is reused by other count models
+
+        if linear is not None:
+            msg = 'linear keyword is deprecated, use which="linear"'
+            warnings.warn(msg, FutureWarning)
+            if linear is True:
+                which = "linear"
+
+        if which.startswith("lin"):
+            which = "linear"
+        if which in ["mean", "linear"]:
+            return super().predict(params, exog=exog, exposure=exposure,
+                                   offset=offset,
+                                   which=which, linear=linear)
+        # TODO: add full set of which
+        elif which == "var":
+            mu = self.predict(params, exog=exog,
+                              exposure=exposure, offset=offset,
+                              )
+            return mu
+        elif which == "prob":
+            if y_values is not None:
+                y_values = np.atleast_2d(y_values)
+            else:
+                y_values = np.atleast_2d(
+                    np.arange(0, np.max(self.endog) + 1))
+            mu = self.predict(params, exog=exog,
+                              exposure=exposure, offset=offset,
+                              )[:, None]
+            # uses broadcasting
+            return stats.poisson._pmf(y_values, mu)
+        else:
+            raise ValueError('Value of the `which` option is not recognized')

     def _prob_nonzero(self, mu, params=None):
         """Probability that count is not zero

         internal use in Censored model, will be refactored or removed
         """
-        pass
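+        # P(Y > 0) = 1 - exp(-mu); expm1 keeps precision for small mu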
+        prob_nz = - np.expm1(-mu)
+        return prob_nz

     def _var(self, mu, params=None):
         """variance implied by the distribution

         internal use, will be refactored or removed
         """
-        pass
+        return mu

     def get_distribution(self, params, exog=None, exposure=None, offset=None):
         """Get frozen instance of distribution based on predicted parameters.
@@ -1044,12 +1750,13 @@ class Poisson(CountModel):
         -------
         Instance of frozen scipy distribution subclass.
         """
-        pass
+        mu = self.predict(params, exog=exog, exposure=exposure, offset=offset)
+        distr = stats.poisson(mu)
+        return distr


 class GeneralizedPoisson(CountModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Generalized Poisson Model

     %(params)s
@@ -1061,9 +1768,9 @@ class GeneralizedPoisson(CountModel):
         A reference to the endogenous response variable
     exog : ndarray
         A reference to the exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+               """
     p : scalar
         P denotes parameterizations for GP regression. p=1 for GP-1 and
         p=2 for GP-2. Default is p=1.
@@ -1071,18 +1778,30 @@ class GeneralizedPoisson(CountModel):
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
-        equal to 1."""
-         + base._missing_param_doc + _check_rank_doc})
-
-    def __init__(self, endog, exog, p=1, offset=None, exposure=None,
-        missing='none', check_rank=True, **kwargs):
-        super().__init__(endog, exog, offset=offset, exposure=exposure,
-            missing=missing, check_rank=check_rank, **kwargs)
+        equal to 1.""" + base._missing_param_doc + _check_rank_doc}
+
+    def __init__(self, endog, exog, p=1, offset=None,
+                 exposure=None, missing='none', check_rank=True, **kwargs):
+        super().__init__(endog,
+                         exog,
+                         offset=offset,
+                         exposure=exposure,
+                         missing=missing,
+                         check_rank=check_rank,
+                         **kwargs)
         self.parameterization = p - 1
         self.exog_names.append('alpha')
         self.k_extra = 1
         self._transparams = False

+    def _get_init_kwds(self):
+        kwds = super()._get_init_kwds()
+        kwds['p'] = self.parameterization + 1
+        return kwds
+
+    def _get_exogs(self):
+        return (self.exog, None)
+
     def loglike(self, params):
         """
         Loglikelihood of Generalized Poisson model
@@ -1105,7 +1824,7 @@ class GeneralizedPoisson(CountModel):
             ln(y_{i}!)-\\frac{\\mu_{i}+\\alpha*\\mu_{i}^{p-1}*y_{i}}{1+\\alpha*
             \\mu_{i}^{p-1}}\\right]
         """
-        pass
+        return np.sum(self.loglikeobs(params))

     def loglikeobs(self, params):
         """
@@ -1131,7 +1850,210 @@ class GeneralizedPoisson(CountModel):

         for observations :math:`i=1,...,n`
         """
-        pass
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+        params = params[:-1]
+        p = self.parameterization
+        endog = self.endog
+        mu = self.predict(params)
+        mu_p = np.power(mu, p)
+        a1 = 1 + alpha * mu_p
+        a2 = mu + (a1 - 1) * endog
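+        # guard against log(0) and division by zero for extreme parameter values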
+        a1 = np.maximum(1e-20, a1)
+        a2 = np.maximum(1e-20, a2)
+        return (np.log(mu) + (endog - 1) * np.log(a2) - endog *
+                np.log(a1) - gammaln(endog + 1) - a2 / a1)
+
+    @Appender(_get_start_params_null_docs)
+    def _get_start_params_null(self):
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+
+        const = (self.endog / np.exp(offset + exposure)).mean()
+        params = [np.log(const)]
+        mu = const * np.exp(offset + exposure)
+        resid = self.endog - mu
+        a = self._estimate_dispersion(mu, resid, df_resid=resid.shape[0] - 1)
+        params.append(a)
+
+        return np.array(params)
+
+    def _estimate_dispersion(self, mu, resid, df_resid=None):
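+        # rough moment-type estimate of the dispersion alpha, used only for start values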
+        q = self.parameterization
+        if df_resid is None:
+            df_resid = resid.shape[0]
+        a = ((np.abs(resid) / np.sqrt(mu) - 1) * mu**(-q)).sum() / df_resid
+        return a
+
+    @Appender(
+        """
+        use_transparams : bool
+            This parameter enables an internal transformation to impose
+            non-negativity. True to enable. Default is False.
+            use_transparams=True imposes the no-underdispersion (alpha > 0)
+            constraint. If use_transparams=True and method is "newton" or
+            "ncg", the transformation is ignored.
+        """)
+    @Appender(DiscreteModel.fit.__doc__)
+    def fit(self, start_params=None, method='bfgs', maxiter=35,
+            full_output=1, disp=1, callback=None, use_transparams=False,
+            cov_type='nonrobust', cov_kwds=None, use_t=None, optim_kwds_prelim=None,
+            **kwargs):
+        if use_transparams and method not in ['newton', 'ncg']:
+            self._transparams = True
+        else:
+            if use_transparams:
+                warnings.warn('Parameter "use_transparams" is ignored',
+                              RuntimeWarning)
+            self._transparams = False
+
+        if start_params is None:
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            kwds_prelim = {'disp': 0, 'skip_hessian': True,
+                           'warn_convergence': False}
+            if optim_kwds_prelim is not None:
+                kwds_prelim.update(optim_kwds_prelim)
+            mod_poi = Poisson(self.endog, self.exog, offset=offset)
+            with warnings.catch_warnings():
+                warnings.simplefilter("always")
+                res_poi = mod_poi.fit(**kwds_prelim)
+            start_params = res_poi.params
+            a = self._estimate_dispersion(res_poi.predict(), res_poi.resid,
+                                          df_resid=res_poi.df_resid)
+            start_params = np.append(start_params, max(-0.1, a))
+
+        if callback is None:
+            # work around perfect separation callback #3895
+            callback = lambda *x: x
+
+        mlefit = super().fit(start_params=start_params,
+                             maxiter=maxiter,
+                             method=method,
+                             disp=disp,
+                             full_output=full_output,
+                             callback=callback,
+                             **kwargs)
+        if optim_kwds_prelim is not None:
+            mlefit.mle_settings["optim_kwds_prelim"] = optim_kwds_prelim
+        if use_transparams and method not in ["newton", "ncg"]:
+            self._transparams = False
+            mlefit._results.params[-1] = np.exp(mlefit._results.params[-1])
+
+        gpfit = GeneralizedPoissonResults(self, mlefit._results)
+        result = GeneralizedPoissonResultsWrapper(gpfit)
+
+        if cov_kwds is None:
+            cov_kwds = {}
+
+        result._get_robustcov_results(cov_type=cov_type,
+                                      use_self=True, use_t=use_t, **cov_kwds)
+        return result
+
+    @Appender(DiscreteModel.fit_regularized.__doc__)
+    def fit_regularized(self, start_params=None, method='l1',
+            maxiter='defined_by_method', full_output=1, disp=1, callback=None,
+            alpha=0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=1e-4,
+            qc_tol=0.03, **kwargs):
+
+        _validate_l1_method(method)
+
+        if np.size(alpha) == 1 and alpha != 0:
+            k_params = self.exog.shape[1] + self.k_extra
+            alpha = alpha * np.ones(k_params)
+            alpha[-1] = 0
+
+        alpha_p = alpha[:-1] if (self.k_extra and np.size(alpha) > 1) else alpha
+        self._transparams = False
+        if start_params is None:
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            mod_poi = Poisson(self.endog, self.exog, offset=offset)
+            with warnings.catch_warnings():
+                warnings.simplefilter("always")
+                start_params = mod_poi.fit_regularized(
+                    start_params=start_params, method=method, maxiter=maxiter,
+                    full_output=full_output, disp=0, callback=callback,
+                    alpha=alpha_p, trim_mode=trim_mode,
+                    auto_trim_tol=auto_trim_tol, size_trim_tol=size_trim_tol,
+                    qc_tol=qc_tol, **kwargs).params
+            start_params = np.append(start_params, 0.1)
+
+        cntfit = super(CountModel, self).fit_regularized(
+                start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=disp, callback=callback,
+                alpha=alpha, trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs)
+
+        discretefit = L1GeneralizedPoissonResults(self, cntfit)
+        return L1GeneralizedPoissonResultsWrapper(discretefit)
+
+    def score_obs(self, params):
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+
+        params = params[:-1]
+        p = self.parameterization
+        exog = self.exog
+        y = self.endog[:,None]
+        mu = self.predict(params)[:,None]
+        mu_p = np.power(mu, p)
+        a1 = 1 + alpha * mu_p
+        a2 = mu + alpha * mu_p * y
+        a3 = alpha * p * mu ** (p - 1)
+        a4 = a3 * y
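+        # chain rule for the log link: d mu / d beta = mu * x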
+        dmudb = mu * exog
+
+        dalpha = (mu_p * (y * ((y - 1) / a2 - 2 / a1) + a2 / a1**2))
+        dparams = dmudb * (-a4 / a1 +
+                           a3 * a2 / (a1 ** 2) +
+                           (1 + a4) * ((y - 1) / a2 - 1 / a1) +
+                           1 / mu)
+
+        return np.concatenate((dparams, np.atleast_2d(dalpha)),
+                              axis=1)
+
+    def score(self, params):
+        score = np.sum(self.score_obs(params), axis=0)
+        if self._transparams:
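+            # '==' below is a comparison with no effect; the ln(alpha) score is returned unadjusted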
+            score[-1] == score[-1] ** 2
+            return score
+        else:
+            return score
+
+    def score_factor(self, params, endog=None):
+        params = np.asarray(params)
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+
+        params = params[:-1]
+        p = self.parameterization
+        y = self.endog if endog is None else endog
+
+        mu = self.predict(params)
+        mu_p = np.power(mu, p)
+        a1 = 1 + alpha * mu_p
+        a2 = mu + alpha * mu_p * y
+        a3 = alpha * p * mu ** (p - 1)
+        a4 = a3 * y
+        dmudb = mu
+
+        dalpha = (mu_p * (y * ((y - 1) / a2 - 2 / a1) + a2 / a1**2))
+        dparams = dmudb * (-a4 / a1 +
+                           a3 * a2 / (a1 ** 2) +
+                           (1 + a4) * ((y - 1) / a2 - 1 / a1) +
+                           1 / mu)
+
+        return dparams, dalpha

     def _score_p(self, params):
         """
@@ -1148,7 +2070,21 @@ class GeneralizedPoisson(CountModel):
             dldp is first derivative of the loglikelihood function,
         evaluated at `p-parameter`.
         """
-        pass
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+        params = params[:-1]
+        p = self.parameterization
+        y = self.endog[:,None]
+        mu = self.predict(params)[:,None]
+        mu_p = np.power(mu, p)
+        a1 = 1 + alpha * mu_p
+        a2 = mu + alpha * mu_p * y
+
+        dp = np.sum((np.log(mu) * ((a2 - mu) * ((y - 1) / a2 - 2 / a1) +
+                                   (a1 - 1) * a2 / a1 ** 2)))
+        return dp

     def hessian(self, params):
         """
@@ -1165,7 +2101,66 @@ class GeneralizedPoisson(CountModel):
             The Hessian, second derivative of loglikelihood function,
             evaluated at `params`
         """
-        pass
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+
+        params = params[:-1]
+        p = self.parameterization
+        exog = self.exog
+        y = self.endog[:,None]
+        mu = self.predict(params)[:,None]
+        mu_p = np.power(mu, p)
+        a1 = 1 + alpha * mu_p
+        a2 = mu + alpha * mu_p * y
+        a3 = alpha * p * mu ** (p - 1)
+        a4 = a3 * y
+        a5 = p * mu ** (p - 1)
+        dmudb = mu * exog
+
+        # for dl/dparams dparams
+        dim = exog.shape[1]
+        hess_arr = np.empty((dim+1,dim+1))
+
+        for i in range(dim):
+            for j in range(i + 1):
+                hess_val = np.sum(mu * exog[:,i,None] * exog[:,j,None] *
+                    (mu * (a3 * a4 / a1**2 -
+                           2 * a3**2 * a2 / a1**3 +
+                           2 * a3 * (a4 + 1) / a1**2 -
+                           a4 * p / (mu * a1) +
+                           a3 * p * a2 / (mu * a1**2) +
+                           (y - 1) * a4 * (p - 1) / (a2 * mu) -
+                           (y - 1) * (1 + a4)**2 / a2**2 -
+                           a4 * (p - 1) / (a1 * mu)) +
+                     ((y - 1) * (1 + a4) / a2 -
+                      (1 + a4) / a1)), axis=0)
+                hess_arr[i, j] = np.squeeze(hess_val)
+        tri_idx = np.triu_indices(dim, k=1)
+        hess_arr[tri_idx] = hess_arr.T[tri_idx]
+
+        # for dl/dparams dalpha
+        dldpda = np.sum((2 * a4 * mu_p / a1**2 -
+                         2 * a3 * mu_p * a2 / a1**3 -
+                         mu_p * y * (y - 1) * (1 + a4) / a2**2 +
+                         mu_p * (1 + a4) / a1**2 +
+                         a5 * y * (y - 1) / a2 -
+                         2 * a5 * y / a1 +
+                         a5 * a2 / a1**2) * dmudb,
+                        axis=0)
+
+        hess_arr[-1,:-1] = dldpda
+        hess_arr[:-1,-1] = dldpda
+
+        # for dl/dalpha dalpha
+        dldada = mu_p**2 * (3 * y / a1**2 -
+                            (y / a2)**2. * (y - 1) -
+                            2 * a2 / a1**3)
+
+        hess_arr[-1,-1] = dldada.sum()
+
+        return hess_arr

     def hessian_factor(self, params):
         """
@@ -1188,7 +2183,95 @@ class GeneralizedPoisson(CountModel):
             parameter.

         """
-        pass
+        params = np.asarray(params)
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+
+        params = params[:-1]
+        p = self.parameterization
+        y = self.endog
+        mu = self.predict(params)
+        mu_p = np.power(mu, p)
+        a1 = 1 + alpha * mu_p
+        a2 = mu + alpha * mu_p * y
+        a3 = alpha * p * mu ** (p - 1)
+        a4 = a3 * y
+        a5 = p * mu ** (p - 1)
+        dmudb = mu
+
+        dbb = mu * (
+             mu * (a3 * a4 / a1**2 -
+                   2 * a3**2 * a2 / a1**3 +
+                   2 * a3 * (a4 + 1) / a1**2 -
+                   a4 * p / (mu * a1) +
+                   a3 * p * a2 / (mu * a1**2) +
+                   a4 / (mu * a1) -
+                   a3 * a2 / (mu * a1**2) +
+                   (y - 1) * a4 * (p - 1) / (a2 * mu) -
+                   (y - 1) * (1 + a4)**2 / a2**2 -
+                   a4 * (p - 1) / (a1 * mu) -
+                   1 / mu**2) +
+             (-a4 / a1 +
+              a3 * a2 / a1**2 +
+              (y - 1) * (1 + a4) / a2 -
+              (1 + a4) / a1 +
+              1 / mu))
+
+        # for dl/dlinpred dalpha
+        dba = ((2 * a4 * mu_p / a1**2 -
+                         2 * a3 * mu_p * a2 / a1**3 -
+                         mu_p * y * (y - 1) * (1 + a4) / a2**2 +
+                         mu_p * (1 + a4) / a1**2 +
+                         a5 * y * (y - 1) / a2 -
+                         2 * a5 * y / a1 +
+                         a5 * a2 / a1**2) * dmudb)
+
+        # for dl/dalpha dalpha
+        daa = mu_p**2 * (3 * y / a1**2 -
+                            (y / a2)**2. * (y - 1) -
+                            2 * a2 / a1**3)
+
+        return dbb, dba, daa
+
+    @Appender(Poisson.predict.__doc__)
+    def predict(self, params, exog=None, exposure=None, offset=None,
+                which='mean', y_values=None):
+
+        if exog is None:
+            exog = self.exog
+
+        if exposure is None:
+            exposure = getattr(self, 'exposure', 0)
+        elif exposure != 0:
+            exposure = np.log(exposure)
+
+        if offset is None:
+            offset = getattr(self, 'offset', 0)
+
+        fitted = np.dot(exog, params[:exog.shape[1]])
+        linpred = fitted + exposure + offset
+
+        if which == 'mean':
+            return np.exp(linpred)
+        elif which == 'linear':
+            return linpred
+        elif which == 'var':
+            mean = np.exp(linpred)
+            alpha = params[-1]
+            pm1 = self.parameterization  # `p - 1` in GPP
+            var_ = mean * (1 + alpha * mean**pm1)**2
+            return var_
+        elif which == 'prob':
+            if y_values is None:
+                y_values = np.atleast_2d(np.arange(0, np.max(self.endog)+1))
+            mu = self.predict(params, exog=exog, exposure=exposure,
+                              offset=offset)[:, None]
+            return genpoisson_p.pmf(y_values, mu, params[-1],
+                                    self.parameterization + 1)
+        else:
+            raise ValueError('keyword \'which\' not recognized')

     def _deriv_score_obs_dendog(self, params):
         """derivative of score_obs w.r.t. endog
@@ -1203,32 +2286,56 @@ class GeneralizedPoisson(CountModel):
         derivative : ndarray_2d
             The derivative of the score_obs with respect to endog.
         """
-        pass
+        # code duplication with NegativeBinomialP
+        from statsmodels.tools.numdiff import _approx_fprime_cs_scalar
+
+        def f(y):
+            if y.ndim == 2 and y.shape[1] == 1:
+                y = y[:, 0]
+            sf = self.score_factor(params, endog=y)
+            return np.column_stack(sf)
+
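+        # numerical (complex-step) derivative of the score factor w.r.t. endog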
+        dsf = _approx_fprime_cs_scalar(self.endog[:, None], f)
+        # deriv is 2d vector
+        d1 = dsf[:, :1] * self.exog
+        d2 = dsf[:, 1:2]
+
+        return np.column_stack((d1, d2))

     def _var(self, mu, params=None):
         """variance implied by the distribution

         internal use, will be refactored or removed
         """
-        pass
+        alpha = params[-1]
+        pm1 = self.parameterization  # `p-1` in GPP
+        var_ = mu * (1 + alpha * mu**pm1)**2
+        return var_

     def _prob_nonzero(self, mu, params):
         """Probability that count is not zero

         internal use in Censored model, will be refactored or removed
         """
-        pass
+        alpha = params[-1]
+        pm1 = self.parameterization  # p-1 in GPP
+        prob_zero = np.exp(- mu / (1 + alpha * mu**pm1))
+        prob_nz = 1 - prob_zero
+        return prob_nz

     @Appender(Poisson.get_distribution.__doc__)
     def get_distribution(self, params, exog=None, exposure=None, offset=None):
         """get frozen instance of distribution
         """
-        pass
+        mu = self.predict(params, exog=exog, exposure=exposure, offset=offset)
+        p = self.parameterization + 1
+        # distr = genpoisson_p(mu[:, None], params[-1], p)
+        distr = genpoisson_p(mu, params[-1], p)
+        return distr


 class Logit(BinaryModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Logit Model

     %(params)s
@@ -1242,11 +2349,17 @@ class Logit(BinaryModel):
         A reference to the endogenous response variable
     exog : ndarray
         A reference to the exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': base.
-        _missing_param_doc + _check_rank_doc})
+    """ % {'params': base._model_params_doc,
+           'extra_params': base._missing_param_doc + _check_rank_doc}
+
     _continuous_ok = True

+    @cache_readonly
+    def link(self):
+        from statsmodels.genmod.families import links
+        link = links.Logit()
+        return link
+
     def cdf(self, X):
         """
         The logistic cumulative distribution function
@@ -1268,7 +2381,8 @@ class Logit(BinaryModel):
                   \\text{Prob}\\left(Y=1|x\\right)=
                   \\frac{e^{x^{\\prime}\\beta}}{1+e^{x^{\\prime}\\beta}}
         """
-        pass
+        X = np.asarray(X)
+        return 1/(1+np.exp(-X))

     def pdf(self, X):
         """
@@ -1291,7 +2405,13 @@ class Logit(BinaryModel):

         .. math:: \\lambda\\left(x^{\\prime}\\beta\\right)=\\frac{e^{-x^{\\prime}\\beta}}{\\left(1+e^{-x^{\\prime}\\beta}\\right)^{2}}
         """
-        pass
+        X = np.asarray(X)
+        return np.exp(-X)/(1+np.exp(-X))**2
+
+    @cache_readonly
+    def family(self):
+        from statsmodels.genmod import families
+        return families.Binomial()

     def loglike(self, params):
         """
@@ -1318,7 +2438,9 @@ class Logit(BinaryModel):
         Where :math:`q=2y-1`. This simplification comes from the fact that the
         logistic distribution is symmetric.
         """
-        pass
+        q = 2*self.endog - 1
+        linpred = self.predict(params, which="linear")
+        return np.sum(np.log(self.cdf(q * linpred)))

     def loglikeobs(self, params):
         """
@@ -1347,7 +2469,9 @@ class Logit(BinaryModel):
         where :math:`q=2y-1`. This simplification comes from the fact that the
         logistic distribution is symmetric.
         """
-        pass
+        q = 2*self.endog - 1
+        linpred = self.predict(params, which="linear")
+        return np.log(self.cdf(q * linpred))

     def score(self, params):
         """
@@ -1368,7 +2492,11 @@ class Logit(BinaryModel):
         -----
         .. math:: \\frac{\\partial\\ln L}{\\partial\\beta}=\\sum_{i=1}^{n}\\left(y_{i}-\\Lambda_{i}\\right)x_{i}
         """
-        pass
+
+        y = self.endog
+        X = self.exog
+        fitted = self.predict(params)
+        return np.dot(y - fitted, X)

     def score_obs(self, params):
         """
@@ -1391,7 +2519,11 @@ class Logit(BinaryModel):

         for observations :math:`i=1,...,n`
         """
-        pass
+
+        y = self.endog
+        X = self.exog
+        fitted = self.predict(params)
+        return (y - fitted)[:,None] * X

     def score_factor(self, params):
         """
@@ -1418,7 +2550,9 @@ class Logit(BinaryModel):

         .. math:: \\ln\\lambda_{i}=x_{i}\\beta
         """
-        pass
+        y = self.endog
+        fitted = self.predict(params)
+        return (y - fitted)

     def hessian(self, params):
         """
@@ -1439,7 +2573,9 @@ class Logit(BinaryModel):
         -----
         .. math:: \\frac{\\partial^{2}\\ln L}{\\partial\\beta\\partial\\beta^{\\prime}}=-\\sum_{i}\\Lambda_{i}\\left(1-\\Lambda_{i}\\right)x_{i}x_{i}^{\\prime}
         """
-        pass
+        X = self.exog
+        L = self.predict(params)
+        return -np.dot(L*(1-L)*X.T,X)

     def hessian_factor(self, params):
         """
@@ -1456,7 +2592,22 @@ class Logit(BinaryModel):
             The Hessian factor, second derivative of loglikelihood function
             with respect to the linear predictor evaluated at `params`
         """
-        pass
+        L = self.predict(params)
+        return -L * (1 - L)
+
+    @Appender(DiscreteModel.fit.__doc__)
+    def fit(self, start_params=None, method='newton', maxiter=35,
+            full_output=1, disp=1, callback=None, **kwargs):
+        bnryfit = super().fit(start_params=start_params,
+                              method=method,
+                              maxiter=maxiter,
+                              full_output=full_output,
+                              disp=disp,
+                              callback=callback,
+                              **kwargs)
+
+        discretefit = LogitResults(self, bnryfit)
+        return BinaryResultsWrapper(discretefit)

     def _deriv_score_obs_dendog(self, params):
         """derivative of score_obs w.r.t. endog
@@ -1473,12 +2624,11 @@ class Logit(BinaryModel):
             can be written as `score_factor0[:, None] * exog` where
             `score_factor0` is the score_factor without the residual.
         """
-        pass
+        return self.exog


 class Probit(BinaryModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Probit Model

     %(params)s
@@ -1492,9 +2642,14 @@ class Probit(BinaryModel):
         A reference to the endogenous response variable
     exog : ndarray
         A reference to the exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': base.
-        _missing_param_doc + _check_rank_doc})
+    """ % {'params': base._model_params_doc,
+           'extra_params': base._missing_param_doc + _check_rank_doc}
+
+    @cache_readonly
+    def link(self):
+        from statsmodels.genmod.families import links
+        link = links.Probit()
+        return link

     def cdf(self, X):
         """
@@ -1514,7 +2669,7 @@ class Probit(BinaryModel):
         -----
         This function is just an alias for scipy.stats.norm.cdf
         """
-        pass
+        return stats.norm._cdf(X)

     def pdf(self, X):
         """
@@ -1534,7 +2689,9 @@ class Probit(BinaryModel):
         -----
         This function is just an alias for scipy.stats.norm.pdf
         """
-        pass
+        X = np.asarray(X)
+        return stats.norm._pdf(X)

     def loglike(self, params):
         """
@@ -1558,7 +2715,10 @@ class Probit(BinaryModel):
         Where :math:`q=2y-1`. This simplification comes from the fact that the
         normal distribution is symmetric.
         """
-        pass
+
+        q = 2*self.endog - 1
+        linpred = self.predict(params, which="linear")
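+        # clip the cdf away from 0 so the log cannot return -inf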
+        return np.sum(np.log(np.clip(self.cdf(q * linpred), FLOAT_EPS, 1)))

     def loglikeobs(self, params):
         """
@@ -1584,7 +2744,11 @@ class Probit(BinaryModel):
         where :math:`q=2y-1`. This simplification comes from the fact that the
         normal distribution is symmetric.
         """
-        pass
+
+        q = 2*self.endog - 1
+        linpred = self.predict(params, which="linear")
+        return np.log(np.clip(self.cdf(q*linpred), FLOAT_EPS, 1))

     def score(self, params):
         """
@@ -1608,7 +2772,13 @@ class Probit(BinaryModel):
         Where :math:`q=2y-1`. This simplification comes from the fact that the
         normal distribution is symmetric.
         """
-        pass
+        y = self.endog
+        X = self.exog
+        XB = self.predict(params, which="linear")
+        q = 2*y - 1
+        # clip to get rid of invalid divide complaint
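+        # L is the probit generalized residual q * phi(q*XB) / Phi(q*XB)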
+        L = q*self.pdf(q*XB)/np.clip(self.cdf(q*XB), FLOAT_EPS, 1 - FLOAT_EPS)
+        return np.dot(L,X)

     def score_obs(self, params):
         """
@@ -1634,7 +2804,13 @@ class Probit(BinaryModel):
         Where :math:`q=2y-1`. This simplification comes from the fact that the
         normal distribution is symmetric.
         """
-        pass
+        y = self.endog
+        X = self.exog
+        XB = self.predict(params, which="linear")
+        q = 2*y - 1
+        # clip to get rid of invalid divide complaint
+        L = q*self.pdf(q*XB)/np.clip(self.cdf(q*XB), FLOAT_EPS, 1 - FLOAT_EPS)
+        return L[:,None] * X

     def score_factor(self, params):
         """
@@ -1660,7 +2836,13 @@ class Probit(BinaryModel):
         Where :math:`q=2y-1`. This simplification comes from the fact that the
         normal distribution is symmetric.
         """
-        pass
+        y = self.endog
+        XB = self.predict(params, which="linear")
+        q = 2*y - 1
+        # clip to get rid of invalid divide complaint
+        L = q*self.pdf(q*XB)/np.clip(self.cdf(q*XB), FLOAT_EPS, 1 - FLOAT_EPS)
+        return L

     def hessian(self, params):
         """
@@ -1687,7 +2869,11 @@ class Probit(BinaryModel):

         and :math:`q=2y-1`
         """
-        pass
+        X = self.exog
+        XB = self.predict(params, which="linear")
+        q = 2*self.endog - 1
+        L = q*self.pdf(q*XB)/self.cdf(q*XB)
+        return np.dot(-L*(L+XB)*X.T,X)

     def hessian_factor(self, params):
         """
@@ -1714,7 +2900,23 @@ class Probit(BinaryModel):

         and :math:`q=2y-1`
         """
-        pass
+        XB = self.predict(params, which="linear")
+        q = 2 * self.endog - 1
+        L = q * self.pdf(q * XB) / self.cdf(q * XB)
+        return -L * (L + XB)
+
+    @Appender(DiscreteModel.fit.__doc__)
+    def fit(self, start_params=None, method='newton', maxiter=35,
+            full_output=1, disp=1, callback=None, **kwargs):
+        bnryfit = super().fit(start_params=start_params,
+                              method=method,
+                              maxiter=maxiter,
+                              full_output=full_output,
+                              disp=disp,
+                              callback=callback,
+                              **kwargs)
+        discretefit = ProbitResults(self, bnryfit)
+        return BinaryResultsWrapper(discretefit)

     def _deriv_score_obs_dendog(self, params):
         """derivative of score_obs w.r.t. endog
@@ -1731,12 +2933,18 @@ class Probit(BinaryModel):
             can be written as `score_factor0[:, None] * exog` where
             `score_factor0` is the score_factor without the residual.
         """
-        pass
+
+        linpred = self.predict(params, which="linear")
+
+        pdf_ = self.pdf(linpred)
+        # clip to get rid of invalid divide complaint
+        cdf_ = np.clip(self.cdf(linpred), FLOAT_EPS, 1 - FLOAT_EPS)
+        deriv = pdf_ / cdf_ / (1 - cdf_)  # deriv factor
+        return deriv[:, None] * self.exog


 class MNLogit(MultinomialModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Multinomial Logit Model

     Parameters
@@ -1776,24 +2984,26 @@ class MNLogit(MultinomialModel):
     Notes
     -----
     See developer notes for further information on `MNLogit` internals.
-    """
-         % {'extra_params': base._missing_param_doc + _check_rank_doc})
+    """ % {'extra_params': base._missing_param_doc + _check_rank_doc}

     def __init__(self, endog, exog, check_rank=True, **kwargs):
         super().__init__(endog, exog, check_rank=check_rank, **kwargs)
+
+        # Override cov_names since multivariate model
         yname = self.endog_names
         ynames = self._ynames_map
         ynames = MultinomialResults._maybe_convert_ynames_int(ynames)
+        # use range below to ensure sortedness
         ynames = [ynames[key] for key in range(int(self.J))]
-        idx = MultiIndex.from_product((ynames[1:], self.data.xnames), names
-            =(yname, None))
+        idx = MultiIndex.from_product((ynames[1:], self.data.xnames),
+                                      names=(yname, None))
         self.data.cov_names = idx

     def pdf(self, eXB):
         """
         NotImplemented
         """
-        pass
+        raise NotImplementedError

     def cdf(self, X):
         """
@@ -1814,7 +3024,8 @@ class MNLogit(MultinomialModel):
         In the multinomial logit model.
         .. math:: \\frac{\\exp\\left(\\beta_{j}^{\\prime}x_{i}\\right)}{\\sum_{k=0}^{J}\\exp\\left(\\beta_{k}^{\\prime}x_{i}\\right)}
         """
-        pass
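+        # prepend the reference category, whose linear predictor is fixed at 0 (exp(0) = 1)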
+        eXB = np.column_stack((np.ones(len(X)), np.exp(X)))
+        return eXB/eXB.sum(1)[:,None]

     def loglike(self, params):
         """
@@ -1843,7 +3054,10 @@ class MNLogit(MultinomialModel):
         where :math:`d_{ij}=1` if individual `i` chose alternative `j` and 0
         if not.
         """
-        pass
+        params = params.reshape(self.K, -1, order='F')
+        d = self.wendog
+        logprob = np.log(self.cdf(np.dot(self.exog,params)))
+        return np.sum(d * logprob)

     def loglikeobs(self, params):
         """
@@ -1874,7 +3088,10 @@ class MNLogit(MultinomialModel):
         where :math:`d_{ij}=1` if individual `i` chose alternative `j` and 0
         if not.
         """
-        pass
+        params = params.reshape(self.K, -1, order='F')
+        d = self.wendog
+        logprob = np.log(self.cdf(np.dot(self.exog,params)))
+        return d * logprob

     def score(self, params):
         """
@@ -1901,7 +3118,11 @@ class MNLogit(MultinomialModel):
         In the multinomial model the score matrix is K x J-1 but is returned
         as a flattened array to work with the solvers.
         """
-        pass
+        params = params.reshape(self.K, -1, order='F')
+        firstterm = self.wendog[:,1:] - self.cdf(np.dot(self.exog,
+                                                  params))[:,1:]
+        #NOTE: might need to switch terms if params is reshaped
+        return np.dot(firstterm.T, self.exog).flatten()

     def loglike_and_score(self, params):
         """
@@ -1910,7 +3131,12 @@ class MNLogit(MultinomialModel):
         Note that both of these returned quantities will need to be negated
         before being minimized by the maximum likelihood fitting machinery.
         """
-        pass
+        params = params.reshape(self.K, -1, order='F')
+        cdf_dot_exog_params = self.cdf(np.dot(self.exog, params))
+        loglike_value = np.sum(self.wendog * np.log(cdf_dot_exog_params))
+        firstterm = self.wendog[:, 1:] - cdf_dot_exog_params[:, 1:]
+        score_array = np.dot(firstterm.T, self.exog).flatten()
+        return loglike_value, score_array

     def score_obs(self, params):
         """
@@ -1937,7 +3163,11 @@ class MNLogit(MultinomialModel):
         as a flattened array. The Jacobian has the observations in rows and
         the flattened array of derivatives in columns.
         """
-        pass
+        params = params.reshape(self.K, -1, order='F')
+        firstterm = self.wendog[:,1:] - self.cdf(np.dot(self.exog,
+                                                  params))[:,1:]
+        #NOTE: might need to switch terms if params is reshaped
+        return (firstterm[:,:,None] * self.exog[:,None,:]).reshape(self.exog.shape[0], -1)

     def hessian(self, params):
         """
@@ -1968,12 +3198,87 @@ class MNLogit(MultinomialModel):
         This implementation does not take advantage of the symmetry of
         the Hessian and could probably be refactored for speed.
         """
-        pass
+        params = params.reshape(self.K, -1, order='F')
+        X = self.exog
+        pr = self.cdf(np.dot(X,params))
+        partials = []
+        J = self.J
+        K = self.K
+        for i in range(J-1):
+            for j in range(J-1): # this loop assumes we drop the first col.
+                if i == j:
+                    partials.append(
+                        -np.dot(((pr[:,i+1]*(1-pr[:,j+1]))[:,None]*X).T,X))
+                else:
+                    partials.append(-np.dot(((pr[:,i+1]*-pr[:,j+1])[:,None]*X).T,X))
+        H = np.array(partials)
+        # the developer's notes on multinomial should clear this math up
+        H = np.transpose(H.reshape(J-1, J-1, K, K), (0, 2, 1, 3)).reshape((J-1)*K, (J-1)*K)
+        return H
+
+
+# TODO: Weibull could be replaced by a survival analysis function
+# like Stata's streg (the Cox model as well)
+#class Weibull(DiscreteModel):
+#    """
+#    Binary choice Weibull model
+#
+#    Notes
+#    ------
+#    This is unfinished and untested.
+#    """
+##TODO: add analytic hessian for Weibull
+#    def initialize(self):
+#        pass
+#
+#    def cdf(self, X):
+#        """
+#        Gumbell (Log Weibull) cumulative distribution function
+#        """
+##        return np.exp(-np.exp(-X))
+#        return stats.gumbel_r.cdf(X)
+#        # these two are equivalent.
+#        # Greene table and discussion is incorrect.
+#
+#    def pdf(self, X):
+#        """
+#        Gumbell (LogWeibull) probability distribution function
+#        """
+#        return stats.gumbel_r.pdf(X)
+#
+#    def loglike(self, params):
+#        """
+#        Loglikelihood of Weibull distribution
+#        """
+#        X = self.exog
+#        cdf = self.cdf(np.dot(X,params))
+#        y = self.endog
+#        return np.sum(y*np.log(cdf) + (1-y)*np.log(1-cdf))
+#
+#    def score(self, params):
+#        y = self.endog
+#        X = self.exog
+#        F = self.cdf(np.dot(X,params))
+#        f = self.pdf(np.dot(X,params))
+#        term = (y*f/F + (1 - y)*-f/(1-F))
+#        return np.dot(term,X)
+#
+#    def hessian(self, params):
+#        hess = nd.Jacobian(self.score)
+#        return hess(params)
+#
+#    def fit(self, start_params=None, method='newton', maxiter=35, tol=1e-08):
+## The example had problems with all zero start values, Hessian = 0
+#        if start_params is None:
+#            start_params = OLS(self.endog, self.exog).fit().params
+#        mlefit = super(Weibull, self).fit(start_params=start_params,
+#                method=method, maxiter=maxiter, tol=tol)
+#        return mlefit
+#


 class NegativeBinomial(CountModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Negative Binomial Model

     %(params)s
@@ -1992,9 +3297,9 @@ class NegativeBinomial(CountModel):
         for count data". Economics Letters. Volume 99, Number 3, pp.585-590.
     Hilbe, J.M. 2011. "Negative binomial regression". Cambridge University
         Press.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """loglike_method : str
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """loglike_method : str
         Log-likelihood type. 'nb2','nb1', or 'geometric'.
         Fitted value :math:`\\mu`
         Heterogeneity parameter :math:`\\alpha`
@@ -2007,13 +3312,17 @@ class NegativeBinomial(CountModel):
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.
-    """
-         + base._missing_param_doc + _check_rank_doc})
+    """ + base._missing_param_doc + _check_rank_doc}

     def __init__(self, endog, exog, loglike_method='nb2', offset=None,
-        exposure=None, missing='none', check_rank=True, **kwargs):
-        super().__init__(endog, exog, offset=offset, exposure=exposure,
-            missing=missing, check_rank=check_rank, **kwargs)
+                 exposure=None, missing='none', check_rank=True, **kwargs):
+        super().__init__(endog,
+                         exog,
+                         offset=offset,
+                         exposure=exposure,
+                         missing=missing,
+                         check_rank=check_rank,
+                         **kwargs)
         self.loglike_method = loglike_method
         self._initialize()
         if loglike_method in ['nb2', 'nb1']:
@@ -2021,10 +3330,32 @@ class NegativeBinomial(CountModel):
             self.k_extra = 1
         else:
             self.k_extra = 0
+        # store keys for extras if we need to recreate model instance
+        # we need to append keys that do not go to super
         self._init_keys.append('loglike_method')

+    def _initialize(self):
+        if self.loglike_method == 'nb2':
+            self.hessian = self._hessian_nb2
+            self.score = self._score_nbin
+            self.loglikeobs = self._ll_nb2
+            self._transparams = True  # transform lnalpha -> alpha in fit
+        elif self.loglike_method == 'nb1':
+            self.hessian = self._hessian_nb1
+            self.score = self._score_nb1
+            self.loglikeobs = self._ll_nb1
+            self._transparams = True  # transform lnalpha -> alpha in fit
+        elif self.loglike_method == 'geometric':
+            self.hessian = self._hessian_geom
+            self.score = self._score_geom
+            self.loglikeobs = self._ll_geometric
+        else:
+            raise ValueError('Likelihood type must be "nb1", "nb2" '
+                             'or "geometric"')
+
+    # Workaround to pickle instance methods
     def __getstate__(self):
-        odict = self.__dict__.copy()
+        odict = self.__dict__.copy()  # copy the dict since we change it
         del odict['hessian']
         del odict['score']
         del odict['loglikeobs']
@@ -2034,8 +3365,40 @@ class NegativeBinomial(CountModel):
         self.__dict__.update(indict)
         self._initialize()

+    def _ll_nbin(self, params, alpha, Q=0):
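+        # loggamma accepts complex arguments, needed for complex-step differentiation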
+        if np.any(np.iscomplex(params)) or np.iscomplex(alpha):
+            gamma_ln = loggamma
+        else:
+            gamma_ln = gammaln
+        endog = self.endog
+        mu = self.predict(params)
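+        # size = mu**Q / alpha: Q=0 gives NB2, Q=1 gives NB1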
+        size = 1/alpha * mu**Q
+        prob = size/(size+mu)
+        coeff = (gamma_ln(size+endog) - gamma_ln(endog+1) -
+                 gamma_ln(size))
+        llf = coeff + size*np.log(prob) + endog*np.log(1-prob)
+        return llf
+
+    def _ll_nb2(self, params):
+        if self._transparams:  # got lnalpha during fit
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+        return self._ll_nbin(params[:-1], alpha, Q=0)
+
+    def _ll_nb1(self, params):
+        if self._transparams:  # got lnalpha during fit
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+        return self._ll_nbin(params[:-1], alpha, Q=1)
+
+    def _ll_geometric(self, params):
+        # the geometric model is NB2 with alpha fixed at 1 (size = 1)
+        return self._ll_nbin(params, 1, 0)
+
     def loglike(self, params):
-        """
+        r"""
         Loglikelihood for negative binomial model

         Parameters
@@ -2053,50 +3416,428 @@ class NegativeBinomial(CountModel):
         Notes
         -----
         Following notation in Greene (2008), with negative binomial
-        heterogeneity parameter :math:`\\alpha`:
+        heterogeneity parameter :math:`\alpha`:

         .. math::

-           \\lambda_i &= exp(X\\beta) \\\\
-           \\theta &= 1 / \\alpha \\\\
-           g_i &= \\theta \\lambda_i^Q \\\\
-           w_i &= g_i/(g_i + \\lambda_i) \\\\
-           r_i &= \\theta / (\\theta+\\lambda_i) \\\\
-           ln \\mathcal{L}_i &= ln \\Gamma(y_i+g_i) - ln \\Gamma(1+y_i) + g_iln (r_i) + y_i ln(1-r_i)
+           \lambda_i &= exp(X\beta) \\
+           \theta &= 1 / \alpha \\
+           g_i &= \theta \lambda_i^Q \\
+           w_i &= g_i/(g_i + \lambda_i) \\
+           r_i &= \theta / (\theta+\lambda_i) \\
+           ln \mathcal{L}_i &= ln \Gamma(y_i+g_i) - ln \Gamma(1+y_i) + g_iln (r_i) + y_i ln(1-r_i)

         where :math:`Q=0` for NB2 and geometric and :math:`Q=1` for NB1.
-        For the geometric, :math:`\\alpha=0` as well.
+        For the geometric, :math:`\alpha=0` as well.
         """
-        pass
+        llf = np.sum(self.loglikeobs(params))
+        return llf
+
+    def _score_geom(self, params):
+        exog = self.exog
+        y = self.endog[:, None]
+        mu = self.predict(params)[:, None]
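+        # geometric score per observation: x_i * (y_i - mu_i) / (1 + mu_i)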
+        dparams = exog * (y-mu)/(mu+1)
+        return dparams.sum(0)

     def _score_nbin(self, params, Q=0):
         """
         Score vector for NB2 model
         """
-        pass
+        if self._transparams: # lnalpha came in during fit
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+        params = params[:-1]
+        exog = self.exog
+        y = self.endog[:,None]
+        mu = self.predict(params)[:,None]
+        a1 = 1/alpha * mu**Q
+        prob = a1 / (a1 + mu)  # a1 aka "size" in _ll_nbin
+        if Q == 1:  # nb1
+            # Q == 1 --> a1 = mu / alpha --> prob = 1 / (alpha + 1)
+            dgpart = digamma(y + a1) - digamma(a1)
+            dparams = exog * a1 * (np.log(prob) +
+                       dgpart)
+            dalpha = ((alpha * (y - mu * np.log(prob) -
+                              mu*(dgpart + 1)) -
+                       mu * (np.log(prob) +
+                           dgpart))/
+                       (alpha**2*(alpha + 1))).sum()
+
+        elif Q == 0:  # nb2
+            dgpart = digamma(y + a1) - digamma(a1)
+            dparams = exog*a1 * (y-mu)/(mu+a1)
+            da1 = -alpha**-2
+            dalpha = (dgpart + np.log(a1)
+                        - np.log(a1+mu) - (y-mu)/(a1+mu)).sum() * da1
+
+        #multiply above by constant outside sum to reduce rounding error
+        if self._transparams:
+            return np.r_[dparams.sum(0), dalpha*alpha]
+        else:
+            return np.r_[dparams.sum(0), dalpha]
+
+    def _score_nb1(self, params):
+        return self._score_nbin(params, Q=1)
+
+    def _hessian_geom(self, params):
+        exog = self.exog
+        y = self.endog[:,None]
+        mu = self.predict(params)[:,None]
+
+        # for dl/dparams dparams
+        dim = exog.shape[1]
+        hess_arr = np.empty((dim, dim))
+        const_arr = mu*(1+y)/(mu+1)**2
+        for i in range(dim):
+            for j in range(dim):
+                if j > i:
+                    continue
+                hess_arr[i,j] = np.squeeze(
+                    np.sum(-exog[:,i,None] * exog[:,j,None] * const_arr,
+                           axis=0
+                           )
+                )
+        tri_idx = np.triu_indices(dim, k=1)
+        hess_arr[tri_idx] = hess_arr.T[tri_idx]
+        return hess_arr

     def _hessian_nb1(self, params):
         """
         Hessian of NB1 model.
         """
-        pass
+        if self._transparams: # lnalpha came in during fit
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+
+        params = params[:-1]
+        exog = self.exog
+        y = self.endog[:,None]
+        mu = self.predict(params)[:,None]
+
+        a1 = mu/alpha
+        dgpart = digamma(y + a1) - digamma(a1)
+        prob = 1 / (1 + alpha)  # equiv: a1 / (a1 + mu)
+
+        # for dl/dparams dparams
+        dim = exog.shape[1]
+        hess_arr = np.empty((dim+1,dim+1))
+        #const_arr = a1*mu*(a1+y)/(mu+a1)**2
+        # not all of dparams
+        dparams = exog / alpha * (np.log(prob) +
+                                  dgpart)
+
+        dmudb = exog*mu
+        xmu_alpha = exog * a1
+        trigamma = (special.polygamma(1, a1 + y) -
+                    special.polygamma(1, a1))
+        for i in range(dim):
+            for j in range(dim):
+                if j > i:
+                    continue
+                hess_arr[i,j] = np.squeeze(
+                    np.sum(
+                        dparams[:,i,None] * dmudb[:,j,None] +
+                        xmu_alpha[:,i,None] * xmu_alpha[:,j,None] * trigamma,
+                        axis=0
+                    )
+                )
+        tri_idx = np.triu_indices(dim, k=1)
+        hess_arr[tri_idx] = hess_arr.T[tri_idx]
+
+        # for dl/dparams dalpha
+        # da1 = -alpha**-2
+        dldpda = np.sum(-a1 * dparams + exog * a1 *
+                        (-trigamma*mu/alpha**2 - prob), axis=0)
+
+        hess_arr[-1,:-1] = dldpda
+        hess_arr[:-1,-1] = dldpda
+
+        log_alpha = np.log(prob)
+        alpha3 = alpha**3
+        alpha2 = alpha**2
+        mu2 = mu**2
+        dada = ((alpha3*mu*(2*log_alpha + 2*dgpart + 3) -
+                 2*alpha3*y +
+                 4*alpha2*mu*(log_alpha + dgpart) +
+                 alpha2 * (2*mu - y) +
+                 2*alpha*mu2*trigamma + mu2 * trigamma + alpha2 * mu2 * trigamma +
+                 2*alpha*mu*(log_alpha + dgpart)
+                 )/(alpha**4*(alpha2 + 2*alpha + 1)))
+        hess_arr[-1,-1] = dada.sum()
+
+        return hess_arr

     def _hessian_nb2(self, params):
         """
         Hessian of NB2 model.
         """
-        pass
+        if self._transparams: # lnalpha came in during fit
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+        a1 = 1/alpha
+        params = params[:-1]
+
+        exog = self.exog
+        y = self.endog[:,None]
+        mu = self.predict(params)[:,None]
+        prob = a1 / (a1 + mu)
+        dgpart = digamma(a1 + y) - digamma(a1)
+
+        # for dl/dparams dparams
+        dim = exog.shape[1]
+        hess_arr = np.empty((dim+1,dim+1))
+        const_arr = a1*mu*(a1+y)/(mu+a1)**2
+        for i in range(dim):
+            for j in range(dim):
+                if j > i:
+                    continue
+                hess_arr[i,j] = np.sum(-exog[:,i,None] * exog[:,j,None] *
+                                       const_arr, axis=0).squeeze()
+        tri_idx = np.triu_indices(dim, k=1)
+        hess_arr[tri_idx] = hess_arr.T[tri_idx]
+
+        # for dl/dparams dalpha
+        da1 = -alpha**-2
+        dldpda = -np.sum(mu*exog*(y-mu)*a1**2/(mu+a1)**2 , axis=0)
+        hess_arr[-1,:-1] = dldpda
+        hess_arr[:-1,-1] = dldpda
+
+        # for dl/dalpha dalpha
+        #NOTE: polygamma(1,x) is the trigamma function
+        da2 = 2*alpha**-3
+        dalpha = da1 * (dgpart +
+                    np.log(prob) - (y - mu)/(a1+mu))
+        dada = (da2 * dalpha/da1 + da1**2 * (special.polygamma(1, a1+y) -
+                    special.polygamma(1, a1) + 1/a1 - 1/(a1 + mu) +
+                    (y - mu)/(mu + a1)**2)).sum()
+        hess_arr[-1,-1] = dada
+
+        return hess_arr
+
+    # TODO: replace this with the analytic score_obs; where is it used?
+    def score_obs(self, params):
+        sc = approx_fprime_cs(params, self.loglikeobs)
+        return sc
+
+    @Appender(Poisson.predict.__doc__)
+    def predict(self, params, exog=None, exposure=None, offset=None,
+                which='mean', linear=None, y_values=None):
+
+        if linear is not None:
+            msg = 'linear keyword is deprecated, use which="linear"'
+            warnings.warn(msg, FutureWarning)
+            if linear is True:
+                which = "linear"
+
+        # avoid duplicate computation for get-distribution
+        if which == "prob":
+            distr = self.get_distribution(
+                params,
+                exog=exog,
+                exposure=exposure,
+                offset=offset
+                )
+            if y_values is None:
+                y_values = np.arange(0, np.max(self.endog) + 1)
+            else:
+                y_values = np.asarray(y_values)
+
+            assert y_values.ndim == 1
+            y_values = y_values[..., None]
+            return distr.pmf(y_values).T
+
+        exog, offset, exposure = self._get_predict_arrays(
+            exog=exog,
+            offset=offset,
+            exposure=exposure
+            )
+
+        fitted = np.dot(exog, params[:exog.shape[1]])
+        linpred = fitted + exposure + offset
+        if which == "mean":
+            return np.exp(linpred)
+        elif which.startswith("lin"):
+            return linpred
+        elif which == "var":
+            mu = np.exp(linpred)
+            if self.loglike_method == 'geometric':
+                var_ = mu * (1 + mu)
+            else:
+                if self.loglike_method == 'nb2':
+                    p = 2
+                elif self.loglike_method == 'nb1':
+                    p = 1
+                alpha = params[-1]
+                var_ = mu * (1 + alpha * mu**(p - 1))
+            return var_
+        else:
+            raise ValueError('keyword "which" has to be "mean", "linear", '
+                             '"var" or "prob"')
+
+    @Appender(_get_start_params_null_docs)
+    def _get_start_params_null(self):
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+        const = (self.endog / np.exp(offset + exposure)).mean()
+        params = [np.log(const)]
+        mu = const * np.exp(offset + exposure)
+        resid = self.endog - mu
+        a = self._estimate_dispersion(mu, resid, df_resid=resid.shape[0] - 1)
+        params.append(a)
+        return np.array(params)
+
+    def _estimate_dispersion(self, mu, resid, df_resid=None):
+        if df_resid is None:
+            df_resid = resid.shape[0]
+        if self.loglike_method == 'nb2':
+            #params.append(np.linalg.pinv(mu[:,None]).dot(resid**2 / mu - 1))
+            a = ((resid**2 / mu - 1) / mu).sum() / df_resid
+        else: #self.loglike_method == 'nb1':
+            a = (resid**2 / mu - 1).sum() / df_resid
+        return a
+
+    def fit(self, start_params=None, method='bfgs', maxiter=35,
+            full_output=1, disp=1, callback=None,
+            cov_type='nonrobust', cov_kwds=None, use_t=None,
+            optim_kwds_prelim=None, **kwargs):
+
+        # Note: do not let super handle robust covariance because it has
+        # transformed params
+        self._transparams = False # always define attribute
+        if self.loglike_method.startswith('nb') and method not in ['newton',
+                                                                   'ncg']:
+            self._transparams = True # in case same Model instance is refit
+        elif self.loglike_method.startswith('nb'): # method is newton/ncg
+            self._transparams = False # because we need to step in alpha space
+
+        if start_params is None:
+            # Use poisson fit as first guess.
+            #TODO, Warning: this assumes exposure is logged
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            kwds_prelim = {'disp': 0, 'skip_hessian': True, 'warn_convergence': False}
+            if optim_kwds_prelim is not None:
+                kwds_prelim.update(optim_kwds_prelim)
+            mod_poi = Poisson(self.endog, self.exog, offset=offset)
+            with warnings.catch_warnings():
+                warnings.simplefilter("always")
+                res_poi = mod_poi.fit(**kwds_prelim)
+            start_params = res_poi.params
+            if self.loglike_method.startswith('nb'):
+                a = self._estimate_dispersion(res_poi.predict(), res_poi.resid,
+                                              df_resid=res_poi.df_resid)
+                start_params = np.append(start_params, max(0.05, a))
+        else:
+            if self._transparams is True:
+                # transform user provided start_params dispersion, see #3918
+                start_params = np.array(start_params, copy=True)
+                start_params[-1] = np.log(start_params[-1])
+
+        if callback is None:
+            # work around perfect separation callback #3895
+            callback = lambda *x: x
+
+        mlefit = super().fit(start_params=start_params,
+                             maxiter=maxiter, method=method, disp=disp,
+                             full_output=full_output, callback=callback,
+                             **kwargs)
+        if optim_kwds_prelim is not None:
+            mlefit.mle_settings["optim_kwds_prelim"] = optim_kwds_prelim
+        # TODO: Fix NBin _check_perfect_pred
+        if self.loglike_method.startswith('nb'):
+            # mlefit is a wrapped count-model results instance
+            self._transparams = False # do not need to transform anymore now
+            # change from lnalpha to alpha
+            if method not in ["newton", "ncg"]:
+                mlefit._results.params[-1] = np.exp(mlefit._results.params[-1])
+
+            nbinfit = NegativeBinomialResults(self, mlefit._results)
+            result = NegativeBinomialResultsWrapper(nbinfit)
+        else:
+            result = mlefit
+
+        if cov_kwds is None:
+            cov_kwds = {}  #TODO: make this unnecessary ?
+        result._get_robustcov_results(cov_type=cov_type, use_self=True, use_t=use_t, **cov_kwds)
+        return result
+
+
+    def fit_regularized(self, start_params=None, method='l1',
+            maxiter='defined_by_method', full_output=1, disp=1, callback=None,
+            alpha=0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=1e-4,
+            qc_tol=0.03, **kwargs):
+
+        _validate_l1_method(method)
+
+        if self.loglike_method.startswith('nb') and (np.size(alpha) == 1 and
+                                                     alpha != 0):
+            # do not penalize alpha if alpha is scalar
+            k_params = self.exog.shape[1] + self.k_extra
+            alpha = alpha * np.ones(k_params)
+            alpha[-1] = 0
+
+        # alpha for regularized poisson to get starting values
+        alpha_p = alpha[:-1] if (self.k_extra and np.size(alpha) > 1) else alpha
+
+        self._transparams = False
+        if start_params is None:
+            # Use poisson fit as first guess.
+            #TODO, Warning: this assumes exposure is logged
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            mod_poi = Poisson(self.endog, self.exog, offset=offset)
+            with warnings.catch_warnings():
+                warnings.simplefilter("always")
+                start_params = mod_poi.fit_regularized(
+                    start_params=start_params, method=method, maxiter=maxiter,
+                    full_output=full_output, disp=0, callback=callback,
+                    alpha=alpha_p, trim_mode=trim_mode,
+                    auto_trim_tol=auto_trim_tol, size_trim_tol=size_trim_tol,
+                    qc_tol=qc_tol, **kwargs).params
+            if self.loglike_method.startswith('nb'):
+                start_params = np.append(start_params, 0.1)
+
+        cntfit = super(CountModel, self).fit_regularized(
+                start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=disp, callback=callback,
+                alpha=alpha, trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs)
+
+        discretefit = L1NegativeBinomialResults(self, cntfit)
+        return L1NegativeBinomialResultsWrapper(discretefit)

     @Appender(Poisson.get_distribution.__doc__)
     def get_distribution(self, params, exog=None, exposure=None, offset=None):
         """get frozen instance of distribution
         """
-        pass
+        mu = self.predict(params, exog=exog, exposure=exposure, offset=offset)
+        if self.loglike_method == 'geometric':
+            # distr = stats.geom(1 / (1 + mu[:, None]), loc=-1)
+            distr = stats.geom(1 / (1 + mu), loc=-1)
+        else:
+            if self.loglike_method == 'nb2':
+                p = 2
+            elif self.loglike_method == 'nb1':
+                p = 1
+
+            alpha = params[-1]
+            q = 2 - p
+            size = 1. / alpha * mu**q
+            prob = size / (size + mu)
+            # distr = nbinom(size[:, None], prob[:, None])
+            distr = nbinom(size, prob)
+
+        return distr
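
For the geometric branch above, scipy's geom with loc=-1 shifts the support
to {0, 1, 2, ...}; a quick sketch (illustrative value of mu) confirming that
the implied mean and variance are mu and mu * (1 + mu):

    from scipy import stats

    mu = 2.5                                   # illustrative fitted mean
    distr = stats.geom(1 / (1 + mu), loc=-1)
    print(distr.mean(), mu)                    # equal
    print(distr.var(), mu * (1 + mu))          # equal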


 class NegativeBinomialP(CountModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Generalized Negative Binomial (NB-P) Model

     %(params)s
@@ -2111,9 +3852,9 @@ class NegativeBinomialP(CountModel):
     p : scalar
         P denotes parameterizations for NB-P regression. p=1 for NB-1 and
         p=2 for NB-2. Default is p=2.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """p : scalar
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+               """p : scalar
         P denotes parameterizations for NB regression. p=1 for NB-1 and
         p=2 for NB-2. Default is p=2.
     offset : array_like
@@ -2121,18 +3862,31 @@ class NegativeBinomialP(CountModel):
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.
-        """
-         + base._missing_param_doc + _check_rank_doc})
-
-    def __init__(self, endog, exog, p=2, offset=None, exposure=None,
-        missing='none', check_rank=True, **kwargs):
-        super().__init__(endog, exog, offset=offset, exposure=exposure,
-            missing=missing, check_rank=check_rank, **kwargs)
+        """ + base._missing_param_doc + _check_rank_doc}
+
+    def __init__(self, endog, exog, p=2, offset=None,
+                 exposure=None, missing='none', check_rank=True,
+                 **kwargs):
+        super().__init__(endog,
+                         exog,
+                         offset=offset,
+                         exposure=exposure,
+                         missing=missing,
+                         check_rank=check_rank,
+                         **kwargs)
         self.parameterization = p
         self.exog_names.append('alpha')
         self.k_extra = 1
         self._transparams = False

+    def _get_init_kwds(self):
+        kwds = super()._get_init_kwds()
+        kwds['p'] = self.parameterization
+        return kwds
+
+    def _get_exogs(self):
+        return (self.exog, None)
+
     def loglike(self, params):
         """
         Loglikelihood of Generalized Negative Binomial (NB-P) model
@@ -2148,7 +3902,7 @@ class NegativeBinomialP(CountModel):
             The log-likelihood function of the model evaluated at `params`.
             See notes.
         """
-        pass
+        return np.sum(self.loglikeobs(params))

     def loglikeobs(self, params):
         """
@@ -2165,7 +3919,25 @@ class NegativeBinomialP(CountModel):
             The log likelihood for each observation of the model evaluated
             at `params`. See Notes
         """
-        pass
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+
+        params = params[:-1]
+        p = self.parameterization
+        y = self.endog
+
+        mu = self.predict(params)
+        mu_p = mu**(2 - p)
+        a1 = mu_p / alpha
+        a2 = mu + a1
+
+        llf = (gammaln(y + a1) - gammaln(y + 1) - gammaln(a1) +
+               a1 * np.log(a1) + y * np.log(mu) -
+               (y + a1) * np.log(a2))
+
+        return llf
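
The expression above is the NB-P log-pmf written via gammaln; with
a1 = mu**(2 - p) / alpha it agrees with scipy's nbinom.logpmf(y, a1, a1 / (a1 + mu)).
A small numerical cross-check with illustrative values:

    import numpy as np
    from scipy import stats
    from scipy.special import gammaln

    y, mu, alpha, p = 4, 2.5, 0.8, 2.0         # illustrative values
    a1 = mu**(2 - p) / alpha
    a2 = mu + a1
    llf = (gammaln(y + a1) - gammaln(y + 1) - gammaln(a1) +
           a1 * np.log(a1) + y * np.log(mu) - (y + a1) * np.log(a2))
    print(llf, stats.nbinom.logpmf(y, a1, a1 / a2))   # agree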

     def score_obs(self, params):
         """
@@ -2182,7 +3954,34 @@ class NegativeBinomialP(CountModel):
             The score vector of the model, i.e. the first derivative of the
             loglikelihood function, evaluated at `params`
         """
-        pass
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+
+        params = params[:-1]
+        p = 2 - self.parameterization
+        y = self.endog
+
+        mu = self.predict(params)
+        mu_p = mu**p
+        a1 = mu_p / alpha
+        a2 = mu + a1
+        a3 = y + a1
+        a4 = p * a1 / mu
+
+        dgpart = digamma(a3) - digamma(a1)
+        dgterm = dgpart + np.log(a1 / a2) + 1 - a3 / a2
+        # TODO: better name/interpretation for dgterm?
+
+        dparams = (a4 * dgterm -
+                   a3 / a2 +
+                   y / mu)
+        dparams = (self.exog.T * mu * dparams).T
+        dalpha = -a1 / alpha * dgterm
+
+        return np.concatenate((dparams, np.atleast_2d(dalpha).T),
+                              axis=1)

     def score(self, params):
         """
@@ -2199,7 +3998,12 @@ class NegativeBinomialP(CountModel):
             The score vector of the model, i.e. the first derivative of the
             loglikelihood function, evaluated at `params`
         """
-        pass
+        score = np.sum(self.score_obs(params), axis=0)
+        # Note: no chain-rule adjustment is applied to the dispersion entry
+        # when ``_transparams`` is True; the score is returned for the
+        # untransformed (alpha) parameterization.
+        return score
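
The analytic score can be cross-checked against a numerical gradient of
loglike; a sketch on simulated data (names illustrative), using statsmodels'
numdiff helper:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.tools.numdiff import approx_fprime

    rng = np.random.default_rng(2)
    x = sm.add_constant(rng.normal(size=(200, 1)))
    y = rng.poisson(np.exp(x @ np.array([0.2, 0.4])))

    mod = sm.NegativeBinomialP(y, x, p=2)
    params = np.array([0.2, 0.4, 0.5])          # [const, slope, alpha]
    print(mod.score(params))
    print(approx_fprime(params, mod.loglike))   # numerically close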

     def score_factor(self, params, endog=None):
         """
@@ -2216,7 +4020,34 @@ class NegativeBinomialP(CountModel):
             The score vector of the model, i.e. the first derivative of the
             loglikelihood function, evaluated at `params`
         """
-        pass
+        params = np.asarray(params)
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+
+        params = params[:-1]
+        p = 2 - self.parameterization
+        y = self.endog if endog is None else endog
+
+        mu = self.predict(params)
+        mu_p = mu**p
+        a1 = mu_p / alpha
+        a2 = mu + a1
+        a3 = y + a1
+        a4 = p * a1 / mu
+
+        dgpart = digamma(a3) - digamma(a1)
+
+        dparams = ((a4 * dgpart -
+                   a3 / a2) +
+                   y / mu + a4 * (1 - a3 / a2 + np.log(a1 / a2)))
+        dparams = (mu * dparams).T
+        dalpha = (-a1 / alpha * (dgpart +
+                                 np.log(a1 / a2) +
+                                 1 - a3 / a2))
+
+        return dparams, dalpha

     def hessian(self, params):
         """
@@ -2232,7 +4063,66 @@ class NegativeBinomialP(CountModel):
         hessian : ndarray, 2-D
             The hessian matrix of the model.
         """
-        pass
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+        params = params[:-1]
+
+        p = 2 - self.parameterization
+        y = self.endog
+        exog = self.exog
+        mu = self.predict(params)
+
+        mu_p = mu**p
+        a1 = mu_p / alpha
+        a2 = mu + a1
+        a3 = y + a1
+        a4 = p * a1 / mu
+
+        prob = a1 / a2
+        lprob = np.log(prob)
+        dgpart = digamma(a3) - digamma(a1)
+        pgpart = polygamma(1, a3) - polygamma(1, a1)
+
+        dim = exog.shape[1]
+        hess_arr = np.zeros((dim + 1, dim + 1))
+
+        coeff = mu**2 * (((1 + a4)**2 * a3 / a2**2 -
+                          a3 / a2 * (p - 1) * a4 / mu -
+                          y / mu**2 -
+                          2 * a4 * (1 + a4) / a2 +
+                          p * a4 / mu * (lprob + dgpart + 2) -
+                          a4 / mu * (lprob + dgpart + 1) +
+                          a4**2 * pgpart) +
+                         (-(1 + a4) * a3 / a2 +
+                          y / mu +
+                          a4 * (lprob + dgpart + 1)) / mu)
+
+        for i in range(dim):
+            hess_arr[i, :-1] = np.sum(
+                self.exog[:, :].T * self.exog[:, i] * coeff, axis=1)
+
+        hess_arr[-1, :-1] = (self.exog[:, :].T * mu * a1 *
+                ((1 + a4) * (1 - a3 / a2) / a2 -
+                 p * (lprob + dgpart + 2) / mu +
+                 p / mu * (a3 + p * a1) / a2 -
+                 a4 * pgpart) / alpha).sum(axis=1)
+
+        da2 = (a1 * (2 * lprob +
+                     2 * dgpart + 3 -
+                     2 * a3 / a2
+                     + a1 * pgpart
+                     - 2 * prob +
+                     prob * a3 / a2) / alpha**2)
+
+        hess_arr[-1, -1] = da2.sum()
+
+        tri_idx = np.triu_indices(dim + 1, k=1)
+        hess_arr[tri_idx] = hess_arr.T[tri_idx]
+
+        return hess_arr
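
Likewise, the analytic Hessian can be compared with a numerical one; a short
sketch on simulated data (names illustrative):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.tools.numdiff import approx_hess

    rng = np.random.default_rng(3)
    x = sm.add_constant(rng.normal(size=(200, 1)))
    y = rng.poisson(np.exp(x @ np.array([0.1, 0.3])))

    mod = sm.NegativeBinomialP(y, x, p=2)
    params = np.array([0.1, 0.3, 0.4])
    diff = mod.hessian(params) - approx_hess(params, mod.loglike)
    print(np.max(np.abs(diff)))                 # differences should be small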

     def hessian_factor(self, params):
         """
@@ -2248,13 +4138,80 @@ class NegativeBinomialP(CountModel):
         hessian : ndarray, 2-D
             The hessian matrix of the model.
         """
-        pass
+        params = np.asarray(params)
+        if self._transparams:
+            alpha = np.exp(params[-1])
+        else:
+            alpha = params[-1]
+        params = params[:-1]
+
+        p = 2 - self.parameterization
+        y = self.endog
+        mu = self.predict(params)
+
+        mu_p = mu**p
+        a1 = mu_p / alpha
+        a2 = mu + a1
+        a3 = y + a1
+        a4 = p * a1 / mu
+        a5 = a4 * p / mu
+
+        dgpart = digamma(a3) - digamma(a1)
+
+        coeff = mu**2 * (((1 + a4)**2 * a3 / a2**2 -
+                          a3 * (a5 - a4 / mu) / a2 -
+                          y / mu**2 -
+                          2 * a4 * (1 + a4) / a2 +
+                          a5 * (np.log(a1) - np.log(a2) + dgpart + 2) -
+                          a4 * (np.log(a1) - np.log(a2) + dgpart + 1) / mu -
+                          a4**2 * (polygamma(1, a1) - polygamma(1, a3))) +
+                         (-(1 + a4) * a3 / a2 +
+                          y / mu +
+                          a4 * (np.log(a1) - np.log(a2) + dgpart + 1)) / mu)
+
+        hfbb = coeff
+
+        hfba = (mu * a1 *
+                ((1 + a4) * (1 - a3 / a2) / a2 -
+                 p * (np.log(a1 / a2) + dgpart + 2) / mu +
+                 p * (a3 / mu + a4) / a2 +
+                 a4 * (polygamma(1, a1) - polygamma(1, a3))) / alpha)
+
+        hfaa = (a1 * (2 * np.log(a1 / a2) +
+                     2 * dgpart + 3 -
+                     2 * a3 / a2 - a1 * polygamma(1, a1) +
+                     a1 * polygamma(1, a3) - 2 * a1 / a2 +
+                     a1 * a3 / a2**2) / alpha**2)
+
+        return hfbb, hfba, hfaa
+
+    @Appender(_get_start_params_null_docs)
+    def _get_start_params_null(self):
+        offset = getattr(self, "offset", 0)
+        exposure = getattr(self, "exposure", 0)
+
+        const = (self.endog / np.exp(offset + exposure)).mean()
+        params = [np.log(const)]
+        mu = const * np.exp(offset + exposure)
+        resid = self.endog - mu
+        a = self._estimate_dispersion(mu, resid, df_resid=resid.shape[0] - 1)
+        params.append(a)
+
+        return np.array(params)
+
+    def _estimate_dispersion(self, mu, resid, df_resid=None):
+        q = self.parameterization - 1
+        if df_resid is None:
+            df_resid = resid.shape[0]
+        a = ((resid**2 / mu - 1) * mu**(-q)).sum() / df_resid
+        return a
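
The estimator above follows from the moment condition
Var(y | x) = mu + alpha * mu**p, so averaging (resid**2 / mu - 1) * mu**(-(p - 1))
recovers alpha. A self-contained sketch with simulated gamma-Poisson draws
(values illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    p, alpha_true = 2, 0.6
    mu = np.full(5000, 3.0)
    size = mu**(2 - p) / alpha_true
    y = rng.poisson(rng.gamma(size, mu / size))  # Var(y) = mu + alpha * mu**p
    resid = y - mu
    q = p - 1
    alpha_hat = ((resid**2 / mu - 1) * mu**(-q)).sum() / (len(y) - 1)
    print(alpha_hat)                             # roughly 0.6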

     @Appender(DiscreteModel.fit.__doc__)
-    def fit(self, start_params=None, method='bfgs', maxiter=35, full_output
-        =1, disp=1, callback=None, use_transparams=False, cov_type=
-        'nonrobust', cov_kwds=None, use_t=None, optim_kwds_prelim=None, **
-        kwargs):
+    def fit(self, start_params=None, method='bfgs', maxiter=35,
+            full_output=1, disp=1, callback=None, use_transparams=False,
+            cov_type='nonrobust', cov_kwds=None, use_t=None,
+            optim_kwds_prelim=None, **kwargs):
+        # TODO: Fix doc string
         """
         use_transparams : bool
             This parameter enable internal transformation to impose
@@ -2263,7 +4220,139 @@ class NegativeBinomialP(CountModel):
             constraint. In case use_transparams=True and method="newton" or
             "ncg" transformation is ignored.
         """
-        pass
+        if use_transparams and method not in ['newton', 'ncg']:
+            self._transparams = True
+        else:
+            if use_transparams:
+                warnings.warn('Parameter "use_transparams" is ignored',
+                              RuntimeWarning)
+            self._transparams = False
+        if start_params is None:
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            kwds_prelim = {'disp': 0, 'skip_hessian': True, 'warn_convergence': False}
+            if optim_kwds_prelim is not None:
+                kwds_prelim.update(optim_kwds_prelim)
+            mod_poi = Poisson(self.endog, self.exog, offset=offset)
+            with warnings.catch_warnings():
+                warnings.simplefilter("always")
+                res_poi = mod_poi.fit(**kwds_prelim)
+            start_params = res_poi.params
+            a = self._estimate_dispersion(res_poi.predict(), res_poi.resid,
+                                          df_resid=res_poi.df_resid)
+            start_params = np.append(start_params, max(0.05, a))
+
+        if callback is None:
+            # work around perfect separation callback #3895
+            callback = lambda *x: x
+
+        mlefit = super(NegativeBinomialP, self).fit(start_params=start_params,
+                        maxiter=maxiter, method=method, disp=disp,
+                        full_output=full_output, callback=callback,
+                        **kwargs)
+        if optim_kwds_prelim is not None:
+            mlefit.mle_settings["optim_kwds_prelim"] = optim_kwds_prelim
+        if use_transparams and method not in ["newton", "ncg"]:
+            self._transparams = False
+            mlefit._results.params[-1] = np.exp(mlefit._results.params[-1])
+
+        nbinfit = NegativeBinomialPResults(self, mlefit._results)
+        result = NegativeBinomialPResultsWrapper(nbinfit)
+
+        if cov_kwds is None:
+            cov_kwds = {}
+        result._get_robustcov_results(cov_type=cov_type,
+                                    use_self=True, use_t=use_t, **cov_kwds)
+        return result
+
+    @Appender(DiscreteModel.fit_regularized.__doc__)
+    def fit_regularized(self, start_params=None, method='l1',
+            maxiter='defined_by_method', full_output=1, disp=1, callback=None,
+            alpha=0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=1e-4,
+            qc_tol=0.03, **kwargs):
+
+        _validate_l1_method(method)
+
+        if np.size(alpha) == 1 and alpha != 0:
+            k_params = self.exog.shape[1] + self.k_extra
+            alpha = alpha * np.ones(k_params)
+            alpha[-1] = 0
+
+        alpha_p = alpha[:-1] if (self.k_extra and np.size(alpha) > 1) else alpha
+
+        self._transparams = False
+        if start_params is None:
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            mod_poi = Poisson(self.endog, self.exog, offset=offset)
+            with warnings.catch_warnings():
+                warnings.simplefilter("always")
+                start_params = mod_poi.fit_regularized(
+                    start_params=start_params, method=method, maxiter=maxiter,
+                    full_output=full_output, disp=0, callback=callback,
+                    alpha=alpha_p, trim_mode=trim_mode,
+                    auto_trim_tol=auto_trim_tol, size_trim_tol=size_trim_tol,
+                    qc_tol=qc_tol, **kwargs).params
+            start_params = np.append(start_params, 0.1)
+
+        cntfit = super(CountModel, self).fit_regularized(
+                start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=disp, callback=callback,
+                alpha=alpha, trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs)
+
+        discretefit = L1NegativeBinomialResults(self, cntfit)
+
+        return L1NegativeBinomialResultsWrapper(discretefit)
+
+    @Appender(Poisson.predict.__doc__)
+    def predict(self, params, exog=None, exposure=None, offset=None,
+                which='mean', y_values=None):
+
+        if exog is None:
+            exog = self.exog
+
+        if exposure is None:
+            exposure = getattr(self, 'exposure', 0)
+        elif exposure != 0:
+            exposure = np.log(exposure)
+
+        if offset is None:
+            offset = getattr(self, 'offset', 0)
+
+        fitted = np.dot(exog, params[:exog.shape[1]])
+        linpred = fitted + exposure + offset
+
+        if which == 'mean':
+            return np.exp(linpred)
+        elif which == 'linear':
+            return linpred
+        elif which == 'var':
+            mean = np.exp(linpred)
+            alpha = params[-1]
+            p = self.parameterization  # no `-1` as in GPP
+            var_ = mean * (1 + alpha * mean**(p - 1))
+            return var_
+        elif which == 'prob':
+            if y_values is None:
+                y_values = np.atleast_2d(np.arange(0, np.max(self.endog)+1))
+
+            mu = self.predict(params, exog, exposure, offset)
+            size, prob = self.convert_params(params, mu)
+            return nbinom.pmf(y_values, size[:, None], prob[:, None])
+        else:
+            raise ValueError('keyword "which" = %s not recognized' % which)
+
+    def convert_params(self, params, mu):
+        alpha = params[-1]
+        p = 2 - self.parameterization
+
+        size = 1. / alpha * mu**p
+        prob = size / (size + mu)
+
+        return (size, prob)
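
convert_params maps (mu, alpha, p) onto scipy's (size, prob) parameterization;
a quick sketch (illustrative NB-1 style values) confirming the implied mean mu
and variance mu + alpha * mu**p:

    import numpy as np
    from scipy import stats

    mu, alpha, p = 2.0, 0.5, 1.0
    size = 1.0 / alpha * mu**(2 - p)
    prob = size / (size + mu)
    d = stats.nbinom(size, prob)
    print(d.mean(), mu)                          # equal
    print(d.var(), mu + alpha * mu**p)           # equal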

     def _deriv_score_obs_dendog(self, params):
         """derivative of score_obs w.r.t. endog
@@ -2278,61 +4367,92 @@ class NegativeBinomialP(CountModel):
         derivative : ndarray_2d
             The derivative of the score_obs with respect to endog.
         """
-        pass
+        from statsmodels.tools.numdiff import _approx_fprime_cs_scalar
+
+        def f(y):
+            if y.ndim == 2 and y.shape[1] == 1:
+                y = y[:, 0]
+            sf = self.score_factor(params, endog=y)
+            return np.column_stack(sf)
+
+        dsf = _approx_fprime_cs_scalar(self.endog[:, None], f)
+        # deriv is 2d vector
+        d1 = dsf[:, :1] * self.exog
+        d2 = dsf[:, 1:2]
+
+        return np.column_stack((d1, d2))

     def _var(self, mu, params=None):
         """variance implied by the distribution

         internal use, will be refactored or removed
         """
-        pass
+        alpha = params[-1]
+        p = self.parameterization  # no `-1` as in GPP
+        var_ = mu * (1 + alpha * mu**(p - 1))
+        return var_

     def _prob_nonzero(self, mu, params):
         """Probability that count is not zero

         internal use in Censored model, will be refactored or removed
         """
-        pass
+        alpha = params[-1]
+        p = self.parameterization
+        prob_nz = 1 - (1 + alpha * mu**(p-1))**(- 1 / alpha)
+        return prob_nz

     @Appender(Poisson.get_distribution.__doc__)
     def get_distribution(self, params, exog=None, exposure=None, offset=None):
         """get frozen instance of distribution
         """
-        pass
+        mu = self.predict(params, exog=exog, exposure=exposure, offset=offset)
+        size, prob = self.convert_params(params, mu)
+        # distr = nbinom(size[:, None], prob[:, None])
+        distr = nbinom(size, prob)
+        return distr


+### Results Class ###
+
 class DiscreteResults(base.LikelihoodModelResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for the discrete dependent variable models.',
-        'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {"one_line_description" :
+        "A results class for the discrete dependent variable models.",
+        "extra_attr" : ""}

     def __init__(self, model, mlefit, cov_type='nonrobust', cov_kwds=None,
-        use_t=None):
+                 use_t=None):
+        #super(DiscreteResults, self).__init__(model, params,
+        #        np.linalg.inv(-hessian), scale=1.)
         self.model = model
-        self.method = 'MLE'
+        self.method = "MLE"
         self.df_model = model.df_model
         self.df_resid = model.df_resid
         self._cache = {}
         self.nobs = model.exog.shape[0]
         self.__dict__.update(mlefit.__dict__)
-        self.converged = mlefit.mle_retvals['converged']
+        self.converged = mlefit.mle_retvals["converged"]
+
         if not hasattr(self, 'cov_type'):
+            # do this only if super, i.e. mlefit did not already add cov_type
+            # robust covariance
             if use_t is not None:
                 self.use_t = use_t
             if cov_type == 'nonrobust':
                 self.cov_type = 'nonrobust'
-                self.cov_kwds = {'description': 
-                    'Standard Errors assume that the ' +
-                    'covariance matrix of the errors is correctly ' +
-                    'specified.'}
+                self.cov_kwds = {'description' : 'Standard Errors assume that the ' +
+                                 'covariance matrix of the errors is correctly ' +
+                                 'specified.'}
             else:
                 if cov_kwds is None:
                     cov_kwds = {}
                 from statsmodels.base.covtype import get_robustcov_results
-                get_robustcov_results(self, cov_type=cov_type, use_self=
-                    True, **cov_kwds)
+                get_robustcov_results(self, cov_type=cov_type, use_self=True,
+                                           **cov_kwds)
+

     def __getstate__(self):
+        # remove unpicklable methods
         mle_settings = getattr(self, 'mle_settings', None)
         if mle_settings is not None:
             if 'callback' in mle_settings:
@@ -2346,14 +4466,14 @@ class DiscreteResults(base.LikelihoodModelResults):
         """
         McFadden's pseudo-R-squared. `1 - (llf / llnull)`
         """
-        pass
+        return 1 - self.llf/self.llnull

     @cache_readonly
     def llr(self):
         """
         Likelihood ratio chi-squared statistic; `-2*(llnull - llf)`
         """
-        pass
+        return -2*(self.llnull - self.llf)

     @cache_readonly
     def llr_pvalue(self):
@@ -2362,7 +4482,7 @@ class DiscreteResults(base.LikelihoodModelResults):
         statistic greater than llr.  llr has a chi-squared distribution
         with degrees of freedom `df_model`.
         """
-        pass
+        return stats.distributions.chi2.sf(self.llr, self.df_model)

     def set_null_options(self, llnull=None, attach_results=True, **kwargs):
         """
@@ -2390,21 +4510,73 @@ class DiscreteResults(base.LikelihoodModelResults):
         -----
         Modifies attributes of this instance, and so has no return.
         """
-        pass
+        # reset cache; note: we need to add here anything that depends on
+        # llnull or the null model. If something is missing, then the
+        # attribute might be incorrect.
+        self._cache.pop('llnull', None)
+        self._cache.pop('llr', None)
+        self._cache.pop('llr_pvalue', None)
+        self._cache.pop('prsquared', None)
+        if hasattr(self, 'res_null'):
+            del self.res_null
+
+        if llnull is not None:
+            self._cache['llnull'] = llnull
+        self._attach_nullmodel = attach_results
+        self._optim_kwds_null = kwargs

     @cache_readonly
     def llnull(self):
         """
         Value of the constant-only loglikelihood
         """
-        pass
+        model = self.model
+        kwds = model._get_init_kwds().copy()
+        for key in getattr(model, '_null_drop_keys', []):
+            del kwds[key]
+        # TODO: what parameters to pass to fit?
+        mod_null = model.__class__(model.endog, np.ones(self.nobs), **kwds)
+        # TODO: consider catching and warning on convergence failure?
+        # in the meantime, try hard to converge. see
+        # TestPoissonConstrained1a.test_smoke
+
+        optim_kwds = getattr(self, '_optim_kwds_null', {}).copy()
+
+        if 'start_params' in optim_kwds:
+            # user provided
+            sp_null = optim_kwds.pop('start_params')
+        elif hasattr(model, '_get_start_params_null'):
+            # get moment estimates if available
+            sp_null = model._get_start_params_null()
+        else:
+            sp_null = None
+
+        opt_kwds = dict(method='bfgs', warn_convergence=False, maxiter=10000,
+                        disp=0)
+        opt_kwds.update(optim_kwds)
+
+        if optim_kwds:
+            res_null = mod_null.fit(start_params=sp_null, **opt_kwds)
+        else:
+            # this should be a reasonably robust default across versions
+            res_null = mod_null.fit(start_params=sp_null, method='nm',
+                                    warn_convergence=False,
+                                    maxiter=10000, disp=0)
+            res_null = mod_null.fit(start_params=res_null.params, method='bfgs',
+                                    warn_convergence=False,
+                                    maxiter=10000, disp=0)
+
+        if getattr(self, '_attach_nullmodel', False) is not False:
+            self.res_null = res_null
+
+        return res_null.llf
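
Because llnull refits a constant-only model, set_null_options (defined above)
can be used to control that fit or to supply llnull directly; a sketch on
simulated data (names illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = sm.add_constant(rng.normal(size=(300, 1)))
    y = rng.poisson(np.exp(x @ np.array([0.2, 0.5])))

    res = sm.Poisson(y, x).fit(disp=0)
    res.set_null_options(attach_results=True, method="bfgs", maxiter=500)
    print(res.llnull, res.prsquared)     # triggers the constant-only fit
    print(res.res_null.params)           # attached null-model results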

     @cache_readonly
     def fittedvalues(self):
         """
         Linear predictor XB.
         """
-        pass
+        return np.dot(self.model.exog, self.params[:self.model.exog.shape[1]])

     @cache_readonly
     def resid_response(self):
@@ -2412,7 +4584,7 @@ class DiscreteResults(base.LikelihoodModelResults):
         Response residuals. The response residuals are defined as
         `endog - fittedvalues`
         """
-        pass
+        return self.model.endog - self.predict()

     @cache_readonly
     def resid_pearson(self):
@@ -2420,7 +4592,8 @@ class DiscreteResults(base.LikelihoodModelResults):
         Pearson residuals defined as response residuals divided by standard
         deviation implied by the model.
         """
-        pass
+        var_ = self.predict(which="var")
+        return self.resid_response / np.sqrt(var_)

     @cache_readonly
     def aic(self):
@@ -2428,7 +4601,8 @@ class DiscreteResults(base.LikelihoodModelResults):
         Akaike information criterion.  `-2*(llf - p)` where `p` is the number
         of regressors including the intercept.
         """
-        pass
+        k_extra = getattr(self.model, 'k_extra', 0)
+        return -2*(self.llf - (self.df_model + 1 + k_extra))

     @cache_readonly
     def bic(self):
@@ -2436,7 +4610,12 @@ class DiscreteResults(base.LikelihoodModelResults):
         Bayesian information criterion. `-2*llf + ln(nobs)*p` where `p` is the
         number of regressors including the intercept.
         """
-        pass
+        k_extra = getattr(self.model, 'k_extra', 0)
+        return -2*self.llf + np.log(self.nobs)*(self.df_model + 1 + k_extra)
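
Both criteria count df_model + 1 parameters plus any extra parameters such as
a dispersion term; a quick consistency check for a Poisson fit (no extra
parameters), on simulated data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    x = sm.add_constant(rng.normal(size=(250, 1)))
    y = rng.poisson(np.exp(x @ np.array([0.3, 0.2])))

    res = sm.Poisson(y, x).fit(disp=0)
    k = res.df_model + 1
    print(res.aic, -2 * (res.llf - k))                      # match
    print(res.bic, -2 * res.llf + np.log(res.nobs) * k)     # match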
+
+    @cache_readonly
+    def im_ratio(self):
+        return pinfer.im_ratio(self)

     def info_criteria(self, crit, dk_params=0):
         """Return an information criterion for the model.
@@ -2462,12 +4641,42 @@ class DiscreteResults(base.LikelihoodModelResults):
         Burnham KP, Anderson KR (2002). Model Selection and Multimodel
         Inference; Springer New York.
         """
-        pass
+        crit = crit.lower()
+        k_extra = getattr(self.model, 'k_extra', 0)
+        k_params = self.df_model + 1 + k_extra + dk_params
+
+        if crit == "aic":
+            return -2 * self.llf + 2 * k_params
+        elif crit == "bic":
+            nobs = self.df_model + self.df_resid + 1
+            bic = -2*self.llf + k_params*np.log(nobs)
+            return bic
+        elif crit == "tic":
+            return pinfer.tic(self)
+        elif crit == "gbic":
+            return pinfer.gbic(self)
+        else:
+            raise ValueError("Name of information criterion not recognized.")
+
+    def score_test(self, exog_extra=None, params_constrained=None,
+                   hypothesis='joint', cov_type=None, cov_kwds=None,
+                   k_constraints=None, observed=True):
+
+        res = pinfer.score_test(self, exog_extra=exog_extra,
+                                params_constrained=params_constrained,
+                                hypothesis=hypothesis,
+                                cov_type=cov_type, cov_kwds=cov_kwds,
+                                k_constraints=k_constraints,
+                                observed=observed)
+        return res
+
     score_test.__doc__ = pinfer.score_test.__doc__

-    def get_prediction(self, exog=None, transform=True, which='mean',
-        linear=None, row_labels=None, average=False, agg_weights=None,
-        y_values=None, **kwargs):
+    def get_prediction(self, exog=None,
+                       transform=True, which="mean", linear=None,
+                       row_labels=None, average=False,
+                       agg_weights=None, y_values=None,
+                       **kwargs):
         """
         Compute prediction results when endpoint transformation is valid.

@@ -2530,10 +4739,48 @@ class DiscreteResults(base.LikelihoodModelResults):
         -----
         Status: new in 0.14, experimental
         """
-        pass

-    def get_margeff(self, at='overall', method='dydx', atexog=None, dummy=
-        False, count=False):
+        if linear is True:
+            # compatibility with old keyword
+            which = "linear"
+
+        pred_kwds = kwargs
+        # y_values is explicit so we can add it to the docstring
+        if y_values is not None:
+            pred_kwds["y_values"] = y_values
+
+        res = pred.get_prediction(
+            self,
+            exog=exog,
+            which=which,
+            transform=transform,
+            row_labels=row_labels,
+            average=average,
+            agg_weights=agg_weights,
+            pred_kwds=pred_kwds
+            )
+        return res
+
+    def get_distribution(self, exog=None, transform=True, **kwargs):
+
+        exog, _ = self._transform_predict_exog(exog, transform=transform)
+        if exog is not None:
+            exog = np.asarray(exog)
+        distr = self.model.get_distribution(self.params,
+                                            exog=exog,
+                                            **kwargs
+                                            )
+        return distr
+
+    def _get_endog_name(self, yname, yname_list):
+        if yname is None:
+            yname = self.model.endog_names
+        if yname_list is None:
+            yname_list = self.model.endog_names
+        return yname, yname_list
+
+    def get_margeff(self, at='overall', method='dydx', atexog=None,
+            dummy=False, count=False):
         """Get marginal effects of the fitted model.

         Parameters
@@ -2604,7 +4851,10 @@ class DiscreteResults(base.LikelihoodModelResults):
         When using after Poisson, returns the expected number of events per
         period, assuming that the model is loglinear.
         """
-        pass
+        if getattr(self.model, "offset", None) is not None:
+            raise NotImplementedError("Margins with offset are not available.")
+        from statsmodels.discrete.discrete_margins import DiscreteMargins
+        return DiscreteMargins(self, (at, method, atexog, dummy, count))

     def get_influence(self):
         """
@@ -2620,10 +4870,11 @@ class DiscreteResults(base.LikelihoodModelResults):
         --------
         statsmodels.stats.outliers_influence.MLEInfluence
         """
-        pass
+        from statsmodels.stats.outliers_influence import MLEInfluence
+        return MLEInfluence(self)

-    def summary(self, yname=None, xname=None, title=None, alpha=0.05,
-        yname_list=None):
+    def summary(self, yname=None, xname=None, title=None, alpha=.05,
+                yname_list=None):
         """
         Summarize the Regression Results.

@@ -2650,10 +4901,51 @@ class DiscreteResults(base.LikelihoodModelResults):
         --------
         statsmodels.iolib.summary.Summary : Class that hold summary results.
         """
-        pass

-    def summary2(self, yname=None, xname=None, title=None, alpha=0.05,
-        float_format='%.4f'):
+        top_left = [('Dep. Variable:', None),
+                     ('Model:', [self.model.__class__.__name__]),
+                     ('Method:', [self.method]),
+                     ('Date:', None),
+                     ('Time:', None),
+                     ('converged:', ["%s" % self.mle_retvals['converged']]),
+                    ]
+
+        top_right = [('No. Observations:', None),
+                     ('Df Residuals:', None),
+                     ('Df Model:', None),
+                     ('Pseudo R-squ.:', ["%#6.4g" % self.prsquared]),
+                     ('Log-Likelihood:', None),
+                     ('LL-Null:', ["%#8.5g" % self.llnull]),
+                     ('LLR p-value:', ["%#6.4g" % self.llr_pvalue])
+                     ]
+
+        if hasattr(self, 'cov_type'):
+            top_left.append(('Covariance Type:', [self.cov_type]))
+
+        if title is None:
+            title = self.model.__class__.__name__ + ' ' + "Regression Results"
+
+        # boiler plate
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        yname, yname_list = self._get_endog_name(yname, yname_list)
+
+        # for top of table
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right,
+                             yname=yname, xname=xname, title=title)
+
+        # for parameters, etc
+        smry.add_table_params(self, yname=yname_list, xname=xname, alpha=alpha,
+                              use_t=self.use_t)
+
+        if hasattr(self, 'constraints'):
+            smry.add_extra_txt(['Model has been estimated subject to linear '
+                                'equality constraints.'])
+
+        return smry
+
+    def summary2(self, yname=None, xname=None, title=None, alpha=.05,
+                 float_format="%.4f"):
         """
         Experimental function to summarize regression results.

@@ -2682,12 +4974,22 @@ class DiscreteResults(base.LikelihoodModelResults):
         --------
         statsmodels.iolib.summary2.Summary : Class that holds summary results.
         """
-        pass
+        from statsmodels.iolib import summary2
+        smry = summary2.Summary()
+        smry.add_base(results=self, alpha=alpha, float_format=float_format,
+                      xname=xname, yname=yname, title=title)
+
+        if hasattr(self, 'constraints'):
+            smry.add_text('Model has been estimated subject to linear '
+                          'equality constraints.')
+
+        return smry


 class CountResults(DiscreteResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for count data', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for count data",
+        "extra_attr": ""}

     @cache_readonly
     def resid(self):
@@ -2703,7 +5005,7 @@ class CountResults(DiscreteResults):
         where :math:`p = \\exp(X\\beta)`. Any exposure and offset variables
         are also handled.
         """
-        pass
+        return self.model.endog - self.predict()

     def get_diagnostic(self, y_max=None):
         """
@@ -2721,53 +5023,82 @@ class CountResults(DiscreteResults):
         --------
         statsmodels.statsmodels.discrete.diagnostic.CountDiagnostic
         """
-        pass
+        from statsmodels.discrete.diagnostic import CountDiagnostic
+        return CountDiagnostic(self, y_max=y_max)


 class NegativeBinomialResults(CountResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for NegativeBinomial 1 and 2', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for NegativeBinomial 1 and 2",
+        "extra_attr": ""}

     @cache_readonly
     def lnalpha(self):
         """Natural log of alpha"""
-        pass
+        return np.log(self.params[-1])

     @cache_readonly
     def lnalpha_std_err(self):
         """Natural log of standardized error"""
-        pass
+        return self.bse[-1] / self.params[-1]
+
+    @cache_readonly
+    def aic(self):
+        # + 1 because we estimate alpha
+        k_extra = getattr(self.model, 'k_extra', 0)
+        return -2*(self.llf - (self.df_model + self.k_constant + k_extra))
+
+    @cache_readonly
+    def bic(self):
+        # + 1 because we estimate alpha
+        k_extra = getattr(self.model, 'k_extra', 0)
+        return -2*self.llf + np.log(self.nobs)*(self.df_model +
+                                                self.k_constant + k_extra)


 class NegativeBinomialPResults(NegativeBinomialResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for NegativeBinomialP', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for NegativeBinomialP",
+        "extra_attr": ""}


 class GeneralizedPoissonResults(NegativeBinomialResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for Generalized Poisson', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for Generalized Poisson",
+        "extra_attr": ""}
+
+    @cache_readonly
+    def _dispersion_factor(self):
+        p = getattr(self.model, 'parameterization', 0)
+        mu = self.predict()
+        return (1 + self.params[-1] * mu**p)**2


 class L1CountResults(DiscreteResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for count data fit by l1 regularization',
-        'extra_attr': _l1_results_attr}
+    __doc__ = _discrete_results_docs % {"one_line_description" :
+            "A results class for count data fit by l1 regularization",
+            "extra_attr" : _l1_results_attr}

     def __init__(self, model, cntfit):
         super(L1CountResults, self).__init__(model, cntfit)
+        # self.trimmed is a boolean array with T/F telling whether or not that
+        # entry in params has been zeroed out.
         self.trimmed = cntfit.mle_retvals['trimmed']
         self.nnz_params = (~self.trimmed).sum()
+
+        # Set degrees of freedom.  In doing so,
+        # adjust for extra parameter in NegativeBinomial nb1 and nb2
+        # extra parameter is not included in df_model
         k_extra = getattr(self.model, 'k_extra', 0)
+
         self.df_model = self.nnz_params - 1 - k_extra
-        self.df_resid = float(self.model.endog.shape[0] - self.nnz_params
-            ) + k_extra
+        self.df_resid = float(self.model.endog.shape[0] - self.nnz_params) + k_extra


 class PoissonResults(CountResults):

     def predict_prob(self, n=None, exog=None, exposure=None, offset=None,
-        transform=True):
+                     transform=True):
         """
         Return predicted probability of each count level for each observation

@@ -2786,7 +5117,14 @@ class PoissonResults(CountResults):
             observation is 0, column 1 is the probability that each
             observation is 1, etc.
         """
-        pass
+        if n is not None:
+            counts = np.atleast_2d(n)
+        else:
+            counts = np.atleast_2d(np.arange(0, np.max(self.model.endog)+1))
+        mu = self.predict(exog=exog, exposure=exposure, offset=offset,
+                          transform=transform, which="mean")[:,None]
+        # uses broadcasting
+        return stats.poisson.pmf(counts, mu)
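
The broadcasting above simply evaluates the Poisson pmf at every count for
each fitted mean; an equivalent manual computation on simulated data (names
illustrative):

    import numpy as np
    from scipy import stats
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    x = sm.add_constant(rng.normal(size=(200, 1)))
    y = rng.poisson(np.exp(x @ np.array([0.1, 0.4])))

    res = sm.Poisson(y, x).fit(disp=0)
    probs = res.predict_prob()                    # (nobs, max(y) + 1)
    manual = stats.poisson.pmf(np.arange(y.max() + 1), res.predict()[:, None])
    print(np.allclose(probs, manual))             # True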

     @property
     def resid_pearson(self):
@@ -2804,7 +5142,9 @@ class PoissonResults(CountResults):

         For now :math:`M_j` is always set to 1.
         """
-        pass
+        # Pearson residuals
+        p = self.predict()  # fittedvalues is still linear
+        return (self.model.endog - p)/np.sqrt(p)

     def get_influence(self):
         """
@@ -2820,7 +5160,8 @@ class PoissonResults(CountResults):
         --------
         statsmodels.stats.outliers_influence.MLEInfluence
         """
-        pass
+        from statsmodels.stats.outliers_influence import MLEInfluence
+        return MLEInfluence(self)

     def get_diagnostic(self, y_max=None):
         """
@@ -2838,32 +5179,28 @@ class PoissonResults(CountResults):
         --------
         statsmodels.statsmodels.discrete.diagnostic.PoissonDiagnostic
         """
-        pass
+        from statsmodels.discrete.diagnostic import (
+            PoissonDiagnostic)
+        return PoissonDiagnostic(self, y_max=y_max)


 class L1PoissonResults(L1CountResults, PoissonResults):
     pass

-
 class L1NegativeBinomialResults(L1CountResults, NegativeBinomialResults):
     pass

-
 class L1GeneralizedPoissonResults(L1CountResults, GeneralizedPoissonResults):
     pass

-
 class OrderedResults(DiscreteResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for ordered discrete data.', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {"one_line_description" : "A results class for ordered discrete data." , "extra_attr" : ""}
     pass

-
 class BinaryResults(DiscreteResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for binary data', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {"one_line_description" : "A results class for binary data", "extra_attr" : ""}

-    def pred_table(self, threshold=0.5):
+    def pred_table(self, threshold=.5):
         """
         Prediction table

@@ -2878,7 +5215,41 @@ class BinaryResults(DiscreteResults):
         pred_table[i,j] refers to the number of times "i" was observed and
         the model predicted "j". Correct predictions are along the diagonal.
         """
-        pass
+        model = self.model
+        actual = model.endog
+        pred = np.array(self.predict() > threshold, dtype=float)
+        bins = np.array([0, 0.5, 1])
+        return np.histogram2d(actual, pred, bins=bins)[0]
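
pred_table is a 2x2 cross-tabulation of observed outcomes against thresholded
predictions; the same table can be rebuilt directly (simulated data, names
illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    x = sm.add_constant(rng.normal(size=(400, 1)))
    p_true = 1 / (1 + np.exp(-(x @ np.array([0.2, 1.0]))))
    y = (rng.uniform(size=400) < p_true).astype(float)

    res = sm.Logit(y, x).fit(disp=0)
    pred = (res.predict() > 0.5).astype(float)
    manual = np.histogram2d(y, pred, bins=np.array([0, 0.5, 1]))[0]
    print(np.array_equal(res.pred_table(threshold=0.5), manual))   # True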
+
+    @Appender(DiscreteResults.summary.__doc__)
+    def summary(self, yname=None, xname=None, title=None, alpha=.05,
+                yname_list=None):
+        smry = super(BinaryResults, self).summary(yname, xname, title, alpha,
+                                                  yname_list)
+        fittedvalues = self.model.cdf(self.fittedvalues)
+        absprederror = np.abs(self.model.endog - fittedvalues)
+        predclose_sum = (absprederror < 1e-4).sum()
+        predclose_frac = predclose_sum / len(fittedvalues)
+
+        # add warnings/notes
+        etext = []
+        if predclose_sum == len(fittedvalues):  # TODO: nobs?
+            wstr = "Complete Separation: The results show that there is"
+            wstr += "complete separation or perfect prediction.\n"
+            wstr += "In this case the Maximum Likelihood Estimator does "
+            wstr += "not exist and the parameters\n"
+            wstr += "are not identified."
+            etext.append(wstr)
+        elif predclose_frac > 0.1:  # TODO: get better diagnosis
+            wstr = "Possibly complete quasi-separation: A fraction "
+            wstr += "%4.2f of observations can be\n" % predclose_frac
+            wstr += "perfectly predicted. This might indicate that there "
+            wstr += "is complete\nquasi-separation. In this case some "
+            wstr += "parameters will not be identified."
+            etext.append(wstr)
+        if etext:
+            smry.add_extra_txt(etext)
+        return smry

     @cache_readonly
     def resid_dev(self):
@@ -2898,7 +5269,20 @@ class BinaryResults(DiscreteResults):

         For now :math:`M_j` is always set to 1.
         """
-        pass
+        #These are the deviance residuals
+        #model = self.model
+        endog = self.model.endog
+        #exog = model.exog
+        # M = # of individuals that share a covariate pattern
+        # so M[i] = 2 for i = two share a covariate pattern
+        M = 1
+        p = self.predict()
+        #Y_0 = np.where(exog == 0)
+        #Y_M = np.where(exog == M)
+        #NOTE: Common covariate patterns are not yet handled
+        res = -(1-endog)*np.sqrt(2*M*np.abs(np.log(1-p))) + \
+                endog*np.sqrt(2*M*np.abs(np.log(p)))
+        return res

     @cache_readonly
     def resid_pearson(self):
@@ -2916,7 +5300,16 @@ class BinaryResults(DiscreteResults):

         For now :math:`M_j` is always set to 1.
         """
-        pass
+        # Pearson residuals
+        #model = self.model
+        endog = self.model.endog
+        #exog = model.exog
+        # M = # of individuals that share a covariate pattern
+        # so M[i] = 2 for i = two share a covariate pattern
+        # use unique row pattern?
+        M = 1
+        p = self.predict()
+        return (endog - M*p)/np.sqrt(M*p*(1-p))

     @cache_readonly
     def resid_response(self):
@@ -2931,12 +5324,13 @@ class BinaryResults(DiscreteResults):

         where :math:`p=cdf(X\\beta)`.
         """
-        pass
+        return self.model.endog - self.predict()


 class LogitResults(BinaryResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for Logit Model', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for Logit Model",
+        "extra_attr": ""}

     @cache_readonly
     def resid_generalized(self):
@@ -2952,7 +5346,8 @@ class LogitResults(BinaryResults):
         where :math:`p=cdf(X\\beta)`. This is the same as the `resid_response`
         for the Logit model.
         """
-        pass
+        # Generalized residuals
+        return self.model.endog - self.predict()

     def get_influence(self):
         """
@@ -2968,12 +5363,14 @@ class LogitResults(BinaryResults):
         --------
         statsmodels.stats.outliers_influence.MLEInfluence
         """
-        pass
+        from statsmodels.stats.outliers_influence import MLEInfluence
+        return MLEInfluence(self)


 class ProbitResults(BinaryResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for Probit Model', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for Probit Model",
+        "extra_attr": ""}

     @cache_readonly
     def resid_generalized(self):
@@ -2986,16 +5383,22 @@ class ProbitResults(BinaryResults):

         .. math:: y\\frac{\\phi(X\\beta)}{\\Phi(X\\beta)}-(1-y)\\frac{\\phi(X\\beta)}{1-\\Phi(X\\beta)}
         """
-        pass
-
+        # generalized residuals
+        model = self.model
+        endog = model.endog
+        XB = self.predict(which="linear")
+        pdf = model.pdf(XB)
+        cdf = model.cdf(XB)
+        return endog * pdf/cdf - (1-endog)*pdf/(1-cdf)

 class L1BinaryResults(BinaryResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'Results instance for binary data fit by l1 regularization',
-        'extra_attr': _l1_results_attr}
-
+    __doc__ = _discrete_results_docs % {"one_line_description" :
+    "Results instance for binary data fit by l1 regularization",
+    "extra_attr" : _l1_results_attr}
     def __init__(self, model, bnryfit):
         super(L1BinaryResults, self).__init__(model, bnryfit)
+        # self.trimmed is a boolean array with T/F telling whether or not that
+        # entry in params has been zeroed out.
         self.trimmed = bnryfit.mle_retvals['trimmed']
         self.nnz_params = (~self.trimmed).sum()
         self.df_model = self.nnz_params - 1
@@ -3003,19 +5406,53 @@ class L1BinaryResults(BinaryResults):


 class MultinomialResults(DiscreteResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for multinomial data', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {"one_line_description" :
+            "A results class for multinomial data", "extra_attr" : ""}

     def __init__(self, model, mlefit):
         super(MultinomialResults, self).__init__(model, mlefit)
         self.J = model.J
         self.K = model.K

+    @staticmethod
+    def _maybe_convert_ynames_int(ynames):
+        # see if they're integers
+        issue_warning = False
+        msg = ('endog contains values that are not int-like. Using string '
+               'representation of each value. Use integer-valued endog to '
+               'suppress this warning.')
+        for i in ynames:
+            try:
+                if ynames[i] % 1 == 0:
+                    ynames[i] = str(int(ynames[i]))
+                else:
+                    issue_warning = True
+                    ynames[i] = str(ynames[i])
+            except TypeError:
+                ynames[i] = str(ynames[i])
+        if issue_warning:
+            warnings.warn(msg, SpecificationWarning)
+
+        return ynames
+
     def _get_endog_name(self, yname, yname_list, all=False):
         """
         If all is False, the first variable name is dropped
         """
-        pass
+        model = self.model
+        if yname is None:
+            yname = model.endog_names
+        if yname_list is None:
+            ynames = model._ynames_map
+            ynames = self._maybe_convert_ynames_int(ynames)
+            # use range below to ensure sortedness
+            ynames = [ynames[key] for key in range(int(model.J))]
+            ynames = ['='.join([yname, name]) for name in ynames]
+            if not all:
+                yname_list = ynames[1:] # assumes first variable is dropped
+            else:
+                yname_list = ynames
+        return yname, yname_list

     def pred_table(self):
         """
@@ -3026,12 +5463,38 @@ class MultinomialResults(DiscreteResults):
         pred_table[i,j] refers to the number of times "i" was observed and
         the model predicted "j". Correct predictions are along the diagonal.
         """
-        pass
+        ju = self.model.J - 1  # highest index
+        # these are the actual, predicted indices
+        #idx = lzip(self.model.endog, self.predict().argmax(1))
+        bins = np.concatenate(([0], np.linspace(0.5, ju - 0.5, ju), [ju]))
+        return np.histogram2d(self.model.endog, self.predict().argmax(1),
+                              bins=bins)[0]
+
+    @cache_readonly
+    def bse(self):
+        bse = np.sqrt(np.diag(self.cov_params()))
+        return bse.reshape(self.params.shape, order='F')
+
+    @cache_readonly
+    def aic(self):
+        return -2*(self.llf - (self.df_model+self.model.J-1))
+
+    @cache_readonly
+    def bic(self):
+        return -2*self.llf + np.log(self.nobs)*(self.df_model+self.model.J-1)
+
+    def conf_int(self, alpha=.05, cols=None):
+        confint = super(DiscreteResults, self).conf_int(alpha=alpha,
+                                                            cols=cols)
+        return confint.transpose(2,0,1)

     def get_prediction(self):
         """Not implemented for Multinomial
         """
-        pass
+        raise NotImplementedError
+
+    def margeff(self):
+        raise NotImplementedError("Use get_margeff instead")

     @cache_readonly
     def resid_misclassified(self):
@@ -3051,9 +5514,11 @@ class MultinomialResults(DiscreteResults):
         predicted probability is the same as that of the observed variable
         and 1 otherwise.
         """
-        pass
+        # it's 0 or 1 - 0 for correct prediction and 1 for a missed one
+        return (self.model.wendog.argmax(1) !=
+                self.predict().argmax(1)).astype(float)

-    def summary2(self, alpha=0.05, float_format='%.4f'):
+    def summary2(self, alpha=0.05, float_format="%.4f"):
         """Experimental function to summarize regression results

         Parameters
@@ -3073,22 +5538,48 @@ class MultinomialResults(DiscreteResults):
         --------
         statsmodels.iolib.summary2.Summary : class to hold summary results
         """
-        pass

+        from statsmodels.iolib import summary2
+        smry = summary2.Summary()
+        smry.add_dict(summary2.summary_model(self))
+        # One data frame per value of endog
+        eqn = self.params.shape[1]
+        confint = self.conf_int(alpha)
+        for i in range(eqn):
+            coefs = summary2.summary_params((self, self.params[:, i],
+                                             self.bse[:, i],
+                                             self.tvalues[:, i],
+                                             self.pvalues[:, i],
+                                             confint[i]),
+                                            alpha=alpha)
+            # Header must show value of endog
+            level_str = self.model.endog_names + ' = ' + str(i)
+            coefs[level_str] = coefs.index
+            coefs = coefs.iloc[:, [-1, 0, 1, 2, 3, 4, 5]]
+            smry.add_df(coefs, index=False, header=True,
+                        float_format=float_format)
+            smry.add_title(results=self)
+        return smry

-class L1MultinomialResults(MultinomialResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for multinomial data fit by l1 regularization',
-        'extra_attr': _l1_results_attr}

+class L1MultinomialResults(MultinomialResults):
+    __doc__ = _discrete_results_docs % {"one_line_description" :
+        "A results class for multinomial data fit by l1 regularization",
+        "extra_attr" : _l1_results_attr}
     def __init__(self, model, mlefit):
         super(L1MultinomialResults, self).__init__(model, mlefit)
+        # self.trimmed is a boolean array with T/F telling whether or not that
+        # entry in params has been zeroed out.
         self.trimmed = mlefit.mle_retvals['trimmed']
         self.nnz_params = (~self.trimmed).sum()
+
+        # Note: J-1 constants
         self.df_model = self.nnz_params - (self.model.J - 1)
         self.df_resid = float(self.model.endog.shape[0] - self.nnz_params)


+#### Results Wrappers ####
+
 class OrderedResultsWrapper(lm.RegressionResultsWrapper):
     pass

@@ -3107,15 +5598,16 @@ class NegativeBinomialResultsWrapper(lm.RegressionResultsWrapper):
     pass


-wrap.populate_wrapper(NegativeBinomialResultsWrapper, NegativeBinomialResults)
+wrap.populate_wrapper(NegativeBinomialResultsWrapper,
+                      NegativeBinomialResults)


 class NegativeBinomialPResultsWrapper(lm.RegressionResultsWrapper):
     pass


-wrap.populate_wrapper(NegativeBinomialPResultsWrapper, NegativeBinomialPResults
-    )
+wrap.populate_wrapper(NegativeBinomialPResultsWrapper,
+                      NegativeBinomialPResults)


 class GeneralizedPoissonResultsWrapper(lm.RegressionResultsWrapper):
@@ -3123,7 +5615,7 @@ class GeneralizedPoissonResultsWrapper(lm.RegressionResultsWrapper):


 wrap.populate_wrapper(GeneralizedPoissonResultsWrapper,
-    GeneralizedPoissonResults)
+                      GeneralizedPoissonResults)


 class PoissonResultsWrapper(lm.RegressionResultsWrapper):
@@ -3149,7 +5641,7 @@ class L1NegativeBinomialResultsWrapper(lm.RegressionResultsWrapper):


 wrap.populate_wrapper(L1NegativeBinomialResultsWrapper,
-    L1NegativeBinomialResults)
+                      L1NegativeBinomialResults)


 class L1GeneralizedPoissonResultsWrapper(lm.RegressionResultsWrapper):
@@ -3157,14 +5649,17 @@ class L1GeneralizedPoissonResultsWrapper(lm.RegressionResultsWrapper):


 wrap.populate_wrapper(L1GeneralizedPoissonResultsWrapper,
-    L1GeneralizedPoissonResults)
+                      L1GeneralizedPoissonResults)


 class BinaryResultsWrapper(lm.RegressionResultsWrapper):
-    _attrs = {'resid_dev': 'rows', 'resid_generalized': 'rows',
-        'resid_pearson': 'rows', 'resid_response': 'rows'}
+    _attrs = {"resid_dev": "rows",
+              "resid_generalized": "rows",
+              "resid_pearson": "rows",
+              "resid_response": "rows"
+              }
     _wrap_attrs = wrap.union_dicts(lm.RegressionResultsWrapper._wrap_attrs,
-        _attrs)
+                                   _attrs)


 wrap.populate_wrapper(BinaryResultsWrapper, BinaryResults)
@@ -3178,12 +5673,12 @@ wrap.populate_wrapper(L1BinaryResultsWrapper, L1BinaryResults)


 class MultinomialResultsWrapper(lm.RegressionResultsWrapper):
-    _attrs = {'resid_misclassified': 'rows'}
+    _attrs = {"resid_misclassified": "rows"}
     _wrap_attrs = wrap.union_dicts(lm.RegressionResultsWrapper._wrap_attrs,
-        _attrs)
+                                   _attrs)
     _methods = {'conf_int': 'multivariate_confint'}
-    _wrap_methods = wrap.union_dicts(lm.RegressionResultsWrapper.
-        _wrap_methods, _methods)
+    _wrap_methods = wrap.union_dicts(lm.RegressionResultsWrapper._wrap_methods,
+                                     _methods)


 wrap.populate_wrapper(MultinomialResultsWrapper, MultinomialResults)
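
The following is a minimal, hypothetical sketch (not part of the patch above) that exercises the MultinomialResults methods restored here, namely pred_table, bse and conf_int, on simulated data. It relies only on the public sm.MNLogit API; the data and coefficient values are made up for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = sm.add_constant(rng.normal(size=(n, 2)))
beta = np.array([[0.2, -0.2], [1.0, -0.5], [-0.5, 1.0]])   # K x (J-1) coefficients
eta = np.column_stack([np.zeros(n), x @ beta])             # base category has linear predictor 0
p = np.exp(eta) / np.exp(eta).sum(1, keepdims=True)
y = np.array([rng.choice(3, p=pi) for pi in p])

res = sm.MNLogit(y, x).fit(disp=0)
print(res.pred_table())      # J x J table: rows = observed, columns = predicted
print(res.bse.shape)         # standard errors reshaped to match res.params
print(res.conf_int().shape)  # one (K, 2) block of intervals per non-base category
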
diff --git a/statsmodels/discrete/truncated_model.py b/statsmodels/discrete/truncated_model.py
index 69a517b1d..24f97b9ef 100644
--- a/statsmodels/discrete/truncated_model.py
+++ b/statsmodels/discrete/truncated_model.py
@@ -1,13 +1,27 @@
 from __future__ import division
-__all__ = ['TruncatedLFPoisson', 'TruncatedLFNegativeBinomialP',
-    'HurdleCountModel']
+
+__all__ = ["TruncatedLFPoisson", "TruncatedLFNegativeBinomialP",
+           "HurdleCountModel"]
+
 import warnings
 import numpy as np
 import statsmodels.base.model as base
 import statsmodels.base.wrapper as wrap
 import statsmodels.regression.linear_model as lm
-from statsmodels.distributions.discrete import truncatedpoisson, truncatednegbin
-from statsmodels.discrete.discrete_model import DiscreteModel, CountModel, CountResults, L1CountResults, Poisson, NegativeBinomialP, GeneralizedPoisson, _discrete_results_docs
+from statsmodels.distributions.discrete import (
+    truncatedpoisson,
+    truncatednegbin,
+    )
+from statsmodels.discrete.discrete_model import (
+    DiscreteModel,
+    CountModel,
+    CountResults,
+    L1CountResults,
+    Poisson,
+    NegativeBinomialP,
+    GeneralizedPoisson,
+    _discrete_results_docs,
+    )
 from statsmodels.tools.numdiff import approx_hess
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.sm_exceptions import ConvergenceWarning
@@ -15,8 +29,7 @@ from copy import deepcopy


 class TruncatedLFGeneric(CountModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Generic Truncated model for count data

     .. versionadded:: 0.14.0
@@ -33,21 +46,26 @@ class TruncatedLFGeneric(CountModel):
     truncation : int, optional
         Truncation parameter that specifies the truncation point; the
         distribution has pmf(k) = 0 for k <= truncation.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.

-    """
-         + base._missing_param_doc})
-
-    def __init__(self, endog, exog, truncation=0, offset=None, exposure=
-        None, missing='none', **kwargs):
-        super(TruncatedLFGeneric, self).__init__(endog, exog, offset=offset,
-            exposure=exposure, missing=missing, **kwargs)
+    """ + base._missing_param_doc}
+
+    def __init__(self, endog, exog, truncation=0, offset=None,
+                 exposure=None, missing='none', **kwargs):
+        super(TruncatedLFGeneric, self).__init__(
+            endog,
+            exog,
+            offset=offset,
+            exposure=exposure,
+            missing=missing,
+            **kwargs
+            )
         mask = self.endog > truncation
         self.exog = self.exog[mask]
         self.endog = self.endog[mask]
@@ -55,8 +73,10 @@ class TruncatedLFGeneric(CountModel):
             self.offset = self.offset[mask]
         if exposure is not None:
             self.exposure = self.exposure[mask]
+
         self.trunc = truncation
-        self.truncation = truncation
+        self.truncation = truncation  # needed for recreating model
+        # We cannot set the correct df_resid here, not enough information
         self._init_keys.extend(['truncation'])
         self._null_drop_keys = []

@@ -79,7 +99,7 @@ class TruncatedLFGeneric(CountModel):
         -----

         """
-        pass
+        return np.sum(self.loglikeobs(params))

     def loglikeobs(self, params):
         """
@@ -100,7 +120,28 @@ class TruncatedLFGeneric(CountModel):
         -----

         """
-        pass
+        llf_main = self.model_main.loglikeobs(params)
+
+        yt = self.trunc + 1
+
+        # equivalent ways to compute truncation probability
+        # pmf0 = np.zeros_like(self.endog, dtype=np.float64)
+        # for i in range(self.trunc + 1):
+        #     model = self.model_main.__class__(np.ones_like(self.endog) * i,
+        #                                       self.exog)
+        #     pmf0 += np.exp(model.loglikeobs(params))
+        #
+        # pmf1 = self.model_main.predict(
+        #     params, which="prob", y_values=np.arange(yt)).sum(-1)
+
+        pmf = self.predict(
+            params, which="prob-base", y_values=np.arange(yt)).sum(-1)
+
+        llf = llf_main - np.log(1 - pmf)
+        # assert np.allclose(pmf0, pmf)
+        # assert np.allclose(pmf1, pmf)
+
+        return llf

     def score_obs(self, params):
         """
@@ -117,7 +158,25 @@ class TruncatedLFGeneric(CountModel):
             The score vector of the model, i.e. the first derivative of the
             loglikelihood function, evaluated at `params`
         """
-        pass
+        score_main = self.model_main.score_obs(params)
+
+        pmf = np.zeros_like(self.endog, dtype=np.float64)
+        # TODO: can we rewrite to following without creating new models
+        score_trunc = np.zeros_like(score_main, dtype=np.float64)
+        for i in range(self.trunc + 1):
+            model = self.model_main.__class__(
+                np.ones_like(self.endog) * i,
+                self.exog,
+                offset=getattr(self, "offset", None),
+                exposure=getattr(self, "exposure", None),
+                )
+            pmf_i = np.exp(model.loglikeobs(params))
+            score_trunc += (model.score_obs(params).T * pmf_i).T
+            pmf += pmf_i
+
+        dparams = score_main + (score_trunc.T / (1 - pmf)).T
+
+        return dparams

     def score(self, params):
         """
@@ -134,8 +193,84 @@ class TruncatedLFGeneric(CountModel):
             The score vector of the model, i.e. the first derivative of the
             loglikelihood function, evaluated at `params`
         """
-        pass
+        return self.score_obs(params).sum(0)
+
+    def fit(self, start_params=None, method='bfgs', maxiter=35,
+            full_output=1, disp=1, callback=None,
+            cov_type='nonrobust', cov_kwds=None, use_t=None, **kwargs):
+        if start_params is None:
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            model = self.model_main.__class__(self.endog, self.exog,
+                                              offset=offset)
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore", category=ConvergenceWarning)
+                start_params = model.fit(disp=0).params
+
+        # TODO: check how we can do this in __init__
+        k_params = self.df_model + 1 + self.k_extra
+        self.df_resid = self.endog.shape[0] - k_params
+
+        mlefit = super(TruncatedLFGeneric, self).fit(
+            start_params=start_params,
+            method=method,
+            maxiter=maxiter,
+            disp=disp,
+            full_output=full_output,
+            callback=lambda x: x,
+            **kwargs
+            )
+
+        zipfit = self.result_class(self, mlefit._results)
+        result = self.result_class_wrapper(zipfit)
+
+        if cov_kwds is None:
+            cov_kwds = {}
+
+        result._get_robustcov_results(cov_type=cov_type,
+                                      use_self=True, use_t=use_t, **cov_kwds)
+        return result
+
     fit.__doc__ = DiscreteModel.fit.__doc__
+
+    def fit_regularized(
+            self, start_params=None, method='l1',
+            maxiter='defined_by_method', full_output=1, disp=1, callback=None,
+            alpha=0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=1e-4,
+            qc_tol=0.03, **kwargs):
+
+        if np.size(alpha) == 1 and alpha != 0:
+            k_params = self.exog.shape[1]
+            alpha = alpha * np.ones(k_params)
+
+        alpha_p = alpha
+        if start_params is None:
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            model = self.model_main.__class__(self.endog, self.exog,
+                                              offset=offset)
+            start_params = model.fit_regularized(
+                start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=0, callback=callback,
+                alpha=alpha_p, trim_mode=trim_mode,
+                auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs).params
+        cntfit = super(CountModel, self).fit_regularized(
+                start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=disp, callback=callback,
+                alpha=alpha, trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs)
+
+        if method in ['l1', 'l1_cvxopt_cp']:
+            discretefit = self.result_class_reg(self, cntfit)
+        else:
+            raise TypeError(
+                    "argument method == %s, which is not handled" % method)
+
+        return self.result_class_reg_wrapper(discretefit)
+
     fit_regularized.__doc__ = DiscreteModel.fit_regularized.__doc__

     def hessian(self, params):
@@ -156,10 +291,10 @@ class TruncatedLFGeneric(CountModel):
         Notes
         -----
         """
-        pass
+        return approx_hess(params, self.loglike)

-    def predict(self, params, exog=None, exposure=None, offset=None, which=
-        'mean', y_values=None):
+    def predict(self, params, exog=None, exposure=None, offset=None,
+                which='mean', y_values=None):
         """
         Predict response variable or other statistic given exogenous variables.

@@ -216,12 +351,88 @@ class TruncatedLFGeneric(CountModel):
         If exposure is specified, then it will be logged by the method.
         The user does not need to log it first.
         """
-        pass
+        exog, offset, exposure = self._get_predict_arrays(
+            exog=exog,
+            offset=offset,
+            exposure=exposure
+            )
+
+        fitted = np.dot(exog, params[:exog.shape[1]])
+        linpred = fitted + exposure + offset
+
+        if which == 'mean':
+            mu = np.exp(linpred)
+            if self.truncation == 0:
+                prob_main = self.model_main._prob_nonzero(mu, params)
+                return mu / prob_main
+            elif self.truncation == -1:
+                return mu
+            elif self.truncation > 0:
+                counts = np.atleast_2d(np.arange(0, self.truncation + 1))
+                # next is same as in prob-main below
+                probs = self.model_main.predict(
+                    params, exog=exog, exposure=np.exp(exposure),
+                    offset=offset, which="prob", y_values=counts)
+                prob_tregion = probs.sum(1)
+                mean_tregion = (np.arange(self.truncation + 1) * probs).sum(1)
+                mean = (mu - mean_tregion) / (1 - prob_tregion)
+                return mean
+            else:
+                raise ValueError("unsupported self.truncation")
+        elif which == 'linear':
+            return linpred
+        elif which == 'mean-main':
+            return np.exp(linpred)
+        elif which == 'prob':
+            if y_values is not None:
+                counts = np.atleast_2d(y_values)
+            else:
+                counts = np.atleast_2d(np.arange(0, np.max(self.endog)+1))
+            mu = np.exp(linpred)[:, None]
+            if self.k_extra == 0:
+                # poisson, no extra params
+                probs = self.model_dist.pmf(counts, mu, self.trunc)
+            elif self.k_extra == 1:
+                p = self.model_main.parameterization
+                probs = self.model_dist.pmf(counts, mu, params[-1],
+                                            p, self.trunc)
+            else:
+                raise ValueError("k_extra is not 0 or 1")
+            return probs
+        elif which == 'prob-base':
+            if y_values is not None:
+                counts = np.asarray(y_values)
+            else:
+                counts = np.arange(0, np.max(self.endog)+1)
+
+            probs = self.model_main.predict(
+                params, exog=exog, exposure=np.exp(exposure),
+                offset=offset, which="prob", y_values=counts)
+            return probs
+        elif which == 'var':
+            mu = np.exp(linpred)
+            counts = np.atleast_2d(np.arange(0, self.truncation + 1))
+            # next is same as in prob-main below
+            probs = self.model_main.predict(
+                params, exog=exog, exposure=np.exp(exposure),
+                offset=offset, which="prob", y_values=counts)
+            prob_tregion = probs.sum(1)
+            mean_tregion = (np.arange(self.truncation + 1) * probs).sum(1)
+            mean = (mu - mean_tregion) / (1 - prob_tregion)
+            mnc2_tregion = (np.arange(self.truncation + 1)**2 *
+                            probs).sum(1)
+            vm = self.model_main._var(mu, params)
+            # uncentered 2nd moment
+            mnc2 = (mu**2 + vm - mnc2_tregion) / (1 - prob_tregion)
+            v = mnc2 - mean**2
+            return v
+        else:
+            raise ValueError(
+                "argument which == %s not handled" % which)


 class TruncatedLFPoisson(TruncatedLFGeneric):
-    __doc__ = (
-        """
+    __doc__ = """
     Truncated Poisson model for count data

     .. versionadded:: 0.14.0
@@ -238,25 +449,33 @@ class TruncatedLFPoisson(TruncatedLFGeneric):
     truncation : int, optional
         Truncation parameter that specifies the truncation point; the
         distribution has pmf(k) = 0 for k <= truncation.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.

-    """
-         + base._missing_param_doc})
-
-    def __init__(self, endog, exog, offset=None, exposure=None, truncation=
-        0, missing='none', **kwargs):
-        super(TruncatedLFPoisson, self).__init__(endog, exog, offset=offset,
-            exposure=exposure, truncation=truncation, missing=missing, **kwargs
+    """ + base._missing_param_doc}
+
+    def __init__(self, endog, exog, offset=None, exposure=None,
+                 truncation=0, missing='none', **kwargs):
+        super(TruncatedLFPoisson, self).__init__(
+            endog,
+            exog,
+            offset=offset,
+            exposure=exposure,
+            truncation=truncation,
+            missing=missing,
+            **kwargs
             )
-        self.model_main = Poisson(self.endog, self.exog, exposure=getattr(
-            self, 'exposure', None), offset=getattr(self, 'offset', None))
+        self.model_main = Poisson(self.endog, self.exog,
+                                  exposure=getattr(self, "exposure", None),
+                                  offset=getattr(self, "offset", None),
+                                  )
         self.model_dist = truncatedpoisson
+
         self.result_class = TruncatedLFPoissonResults
         self.result_class_wrapper = TruncatedLFGenericResultsWrapper
         self.result_class_reg = L1TruncatedLFGenericResults
@@ -279,12 +498,14 @@ class TruncatedLFPoisson(TruncatedLFGeneric):
         -------
         Predicted conditional variance.
         """
-        pass
+        w = (1 - np.exp(-mu))  # prob of no truncation, 1 - P(y=0)
+        m = mu / w
+        var_ = m - (1 - w) * m**2
+        return m, var_


 class TruncatedLFNegativeBinomialP(TruncatedLFGeneric):
-    __doc__ = (
-        """
+    __doc__ = """
     Truncated Generalized Negative Binomial model for count data

     .. versionadded:: 0.14.0
@@ -301,28 +522,38 @@ class TruncatedLFNegativeBinomialP(TruncatedLFGeneric):
     truncation : int, optional
         Truncation parameter that specifies the truncation point; the
         distribution has pmf(k) = 0 for k <= truncation.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.

-    """
-         + base._missing_param_doc})
-
-    def __init__(self, endog, exog, offset=None, exposure=None, truncation=
-        0, p=2, missing='none', **kwargs):
-        super(TruncatedLFNegativeBinomialP, self).__init__(endog, exog,
-            offset=offset, exposure=exposure, truncation=truncation,
-            missing=missing, **kwargs)
-        self.model_main = NegativeBinomialP(self.endog, self.exog, exposure
-            =getattr(self, 'exposure', None), offset=getattr(self, 'offset',
-            None), p=p)
+    """ + base._missing_param_doc}
+
+    def __init__(self, endog, exog, offset=None, exposure=None,
+                 truncation=0, p=2, missing='none', **kwargs):
+        super(TruncatedLFNegativeBinomialP, self).__init__(
+            endog,
+            exog,
+            offset=offset,
+            exposure=exposure,
+            truncation=truncation,
+            missing=missing,
+            **kwargs
+            )
+        self.model_main = NegativeBinomialP(
+            self.endog,
+            self.exog,
+            exposure=getattr(self, "exposure", None),
+            offset=getattr(self, "offset", None),
+            p=p
+            )
         self.k_extra = self.model_main.k_extra
         self.exog_names.extend(self.model_main.exog_names[-self.k_extra:])
         self.model_dist = truncatednegbin
+
         self.result_class = TruncatedNegativeBinomialResults
         self.result_class_wrapper = TruncatedLFGenericResultsWrapper
         self.result_class_reg = L1TruncatedLFGenericResults
@@ -345,12 +576,22 @@ class TruncatedLFNegativeBinomialP(TruncatedLFGeneric):
         -------
         Predicted conditional variance.
         """
-        pass
+        # note: prob_zero and vm are distribution specific, rest is generic
+        # when mean of base model is mu
+        alpha = params[-1]
+        p = self.model_main.parameterization
+        prob_zero = (1 + alpha * mu**(p-1))**(- 1 / alpha)
+        w = 1 - prob_zero  # prob of no truncation, 1 - P(y=0)
+        m = mu / w
+        vm = mu * (1 + alpha * mu**(p-1))  # variance of NBP
+        # uncentered 2nd moment is vm + mu**2
+        mnc2 = (mu**2 + vm) / w  # uses mnc2_tregion = 0
+        var_ = mnc2 - m**2
+        return m, var_


 class TruncatedLFGeneralizedPoisson(TruncatedLFGeneric):
-    __doc__ = (
-        """
+    __doc__ = """
     Truncated Generalized Poisson model for count data

     .. versionadded:: 0.14.0
@@ -367,37 +608,46 @@ class TruncatedLFGeneralizedPoisson(TruncatedLFGeneric):
     truncation : int, optional
         Truncation parameter that specifies the truncation point; the
         distribution has pmf(k) = 0 for k <= truncation.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.

-    """
-         + base._missing_param_doc})
-
-    def __init__(self, endog, exog, offset=None, exposure=None, truncation=
-        0, p=2, missing='none', **kwargs):
-        super(TruncatedLFGeneralizedPoisson, self).__init__(endog, exog,
-            offset=offset, exposure=exposure, truncation=truncation,
-            missing=missing, **kwargs)
-        self.model_main = GeneralizedPoisson(self.endog, self.exog,
-            exposure=getattr(self, 'exposure', None), offset=getattr(self,
-            'offset', None), p=p)
+    """ + base._missing_param_doc}
+
+    def __init__(self, endog, exog, offset=None, exposure=None,
+                 truncation=0, p=2, missing='none', **kwargs):
+        super(TruncatedLFGeneralizedPoisson, self).__init__(
+            endog,
+            exog,
+            offset=offset,
+            exposure=exposure,
+            truncation=truncation,
+            missing=missing,
+            **kwargs
+            )
+        self.model_main = GeneralizedPoisson(
+            self.endog,
+            self.exog,
+            exposure=getattr(self, "exposure", None),
+            offset=getattr(self, "offset", None),
+            p=p
+            )
         self.k_extra = self.model_main.k_extra
         self.exog_names.extend(self.model_main.exog_names[-self.k_extra:])
         self.model_dist = None
         self.result_class = TruncatedNegativeBinomialResults
+
         self.result_class_wrapper = TruncatedLFGenericResultsWrapper
         self.result_class_reg = L1TruncatedLFGenericResults
         self.result_class_reg_wrapper = L1TruncatedLFGenericResultsWrapper


 class _RCensoredGeneric(CountModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Generic right Censored model for count data

     %(params)s
@@ -409,23 +659,28 @@ class _RCensoredGeneric(CountModel):
         A reference to the endogenous response variable
     exog : array
         A reference to the exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.

-    """
-         + base._missing_param_doc})
+    """ + base._missing_param_doc}

-    def __init__(self, endog, exog, offset=None, exposure=None, missing=
-        'none', **kwargs):
+    def __init__(self, endog, exog, offset=None, exposure=None,
+                 missing='none', **kwargs):
         self.zero_idx = np.nonzero(endog == 0)[0]
         self.nonzero_idx = np.nonzero(endog)[0]
-        super(_RCensoredGeneric, self).__init__(endog, exog, offset=offset,
-            exposure=exposure, missing=missing, **kwargs)
+        super(_RCensoredGeneric, self).__init__(
+            endog,
+            exog,
+            offset=offset,
+            exposure=exposure,
+            missing=missing,
+            **kwargs
+            )

     def loglike(self, params):
         """
@@ -446,7 +701,7 @@ class _RCensoredGeneric(CountModel):
         -----

         """
-        pass
+        return np.sum(self.loglikeobs(params))

     def loglikeobs(self, params):
         """
@@ -467,7 +722,14 @@ class _RCensoredGeneric(CountModel):
         -----

         """
-        pass
+        llf_main = self.model_main.loglikeobs(params)
+
+        llf = np.concatenate(
+            (llf_main[self.zero_idx],
+             np.log(1 - np.exp(llf_main[self.nonzero_idx])))
+            )
+
+        return llf

     def score_obs(self, params):
         """
@@ -484,7 +746,17 @@ class _RCensoredGeneric(CountModel):
             The score vector of the model, i.e. the first derivative of the
             loglikelihood function, evaluated at `params`
         """
-        pass
+        score_main = self.model_main.score_obs(params)
+        llf_main = self.model_main.loglikeobs(params)
+
+        score = np.concatenate((
+            score_main[self.zero_idx],
+            (score_main[self.nonzero_idx].T *
+             -np.exp(llf_main[self.nonzero_idx]) /
+             (1 - np.exp(llf_main[self.nonzero_idx]))).T
+            ))
+
+        return score

     def score(self, params):
         """
@@ -501,8 +773,79 @@ class _RCensoredGeneric(CountModel):
             The score vector of the model, i.e. the first derivative of the
             loglikelihood function, evaluated at `params`
         """
-        pass
+        return self.score_obs(params).sum(0)
+
+    def fit(self, start_params=None, method='bfgs', maxiter=35,
+            full_output=1, disp=1, callback=None,
+            cov_type='nonrobust', cov_kwds=None, use_t=None, **kwargs):
+        if start_params is None:
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            model = self.model_main.__class__(self.endog, self.exog,
+                                              offset=offset)
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore", category=ConvergenceWarning)
+                start_params = model.fit(disp=0).params
+        mlefit = super(_RCensoredGeneric, self).fit(
+            start_params=start_params,
+            method=method,
+            maxiter=maxiter,
+            disp=disp,
+            full_output=full_output,
+            callback=lambda x: x,
+            **kwargs
+            )
+
+        zipfit = self.result_class(self, mlefit._results)
+        result = self.result_class_wrapper(zipfit)
+
+        if cov_kwds is None:
+            cov_kwds = {}
+
+        result._get_robustcov_results(cov_type=cov_type,
+                                      use_self=True, use_t=use_t, **cov_kwds)
+        return result
+
     fit.__doc__ = DiscreteModel.fit.__doc__
+
+    def fit_regularized(
+            self, start_params=None, method='l1',
+            maxiter='defined_by_method', full_output=1, disp=1, callback=None,
+            alpha=0, trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=1e-4,
+            qc_tol=0.03, **kwargs):
+
+        if np.size(alpha) == 1 and alpha != 0:
+            k_params = self.exog.shape[1]
+            alpha = alpha * np.ones(k_params)
+
+        alpha_p = alpha
+        if start_params is None:
+            offset = getattr(self, "offset", 0) + getattr(self, "exposure", 0)
+            if np.size(offset) == 1 and offset == 0:
+                offset = None
+            model = self.model_main.__class__(self.endog, self.exog,
+                                              offset=offset)
+            start_params = model.fit_regularized(
+                start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=0, callback=callback,
+                alpha=alpha_p, trim_mode=trim_mode,
+                auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs).params
+        cntfit = super(CountModel, self).fit_regularized(
+                start_params=start_params, method=method, maxiter=maxiter,
+                full_output=full_output, disp=disp, callback=callback,
+                alpha=alpha, trim_mode=trim_mode, auto_trim_tol=auto_trim_tol,
+                size_trim_tol=size_trim_tol, qc_tol=qc_tol, **kwargs)
+
+        if method in ['l1', 'l1_cvxopt_cp']:
+            discretefit = self.result_class_reg(self, cntfit)
+        else:
+            raise TypeError(
+                    "argument method == %s, which is not handled" % method)
+
+        return self.result_class_reg_wrapper(discretefit)
+
     fit_regularized.__doc__ = DiscreteModel.fit_regularized.__doc__

     def hessian(self, params):
@@ -523,12 +866,11 @@ class _RCensoredGeneric(CountModel):
         Notes
         -----
         """
-        pass
+        return approx_hess(params, self.loglike)


 class _RCensoredPoisson(_RCensoredGeneric):
-    __doc__ = (
-        """
+    __doc__ = """
     Censored Poisson model for count data

     %(params)s
@@ -540,21 +882,21 @@ class _RCensoredPoisson(_RCensoredGeneric):
         A reference to the endogenous response variable
     exog : array
         A reference to the exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.

-    """
-         + base._missing_param_doc})
+    """ + base._missing_param_doc}

-    def __init__(self, endog, exog, offset=None, exposure=None, missing=
-        'none', **kwargs):
+    def __init__(self, endog, exog, offset=None,
+                 exposure=None, missing='none', **kwargs):
         super(_RCensoredPoisson, self).__init__(endog, exog, offset=offset,
-            exposure=exposure, missing=missing, **kwargs)
+                                                exposure=exposure,
+                                                missing=missing, **kwargs)
         self.model_main = Poisson(np.zeros_like(self.endog), self.exog)
         self.model_dist = None
         self.result_class = TruncatedLFGenericResults
@@ -564,8 +906,7 @@ class _RCensoredPoisson(_RCensoredGeneric):


 class _RCensoredGeneralizedPoisson(_RCensoredGeneric):
-    __doc__ = (
-        """
+    __doc__ = """
     Censored Generalized Poisson model for count data

     %(params)s
@@ -577,23 +918,24 @@ class _RCensoredGeneralizedPoisson(_RCensoredGeneric):
         A reference to the endogenous response variable
     exog : array
         A reference to the exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.

-    """
-         + base._missing_param_doc})
+    """ + base._missing_param_doc}

-    def __init__(self, endog, exog, offset=None, p=2, exposure=None,
-        missing='none', **kwargs):
-        super(_RCensoredGeneralizedPoisson, self).__init__(endog, exog,
-            offset=offset, exposure=exposure, missing=missing, **kwargs)
-        self.model_main = GeneralizedPoisson(np.zeros_like(self.endog),
-            self.exog)
+    def __init__(self, endog, exog, offset=None, p=2,
+                 exposure=None, missing='none', **kwargs):
+        super(_RCensoredGeneralizedPoisson, self).__init__(
+            endog, exog, offset=offset, exposure=exposure,
+            missing=missing, **kwargs)
+
+        self.model_main = GeneralizedPoisson(
+            np.zeros_like(self.endog), self.exog)
         self.model_dist = None
         self.result_class = TruncatedLFGenericResults
         self.result_class_wrapper = TruncatedLFGenericResultsWrapper
@@ -602,8 +944,7 @@ class _RCensoredGeneralizedPoisson(_RCensoredGeneric):


 class _RCensoredNegativeBinomialP(_RCensoredGeneric):
-    __doc__ = (
-        """
+    __doc__ = """
     Censored Negative Binomial model for count data

     %(params)s
@@ -615,23 +956,30 @@ class _RCensoredNegativeBinomialP(_RCensoredGeneric):
         A reference to the endogenous response variable
     exog : array
         A reference to the exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.

-    """
-         + base._missing_param_doc})
-
-    def __init__(self, endog, exog, offset=None, p=2, exposure=None,
-        missing='none', **kwargs):
-        super(_RCensoredNegativeBinomialP, self).__init__(endog, exog,
-            offset=offset, exposure=exposure, missing=missing, **kwargs)
-        self.model_main = NegativeBinomialP(np.zeros_like(self.endog), self
-            .exog, p=p)
+    """ + base._missing_param_doc}
+
+    def __init__(self, endog, exog, offset=None, p=2,
+                 exposure=None, missing='none', **kwargs):
+        super(_RCensoredNegativeBinomialP, self).__init__(
+            endog,
+            exog,
+            offset=offset,
+            exposure=exposure,
+            missing=missing,
+            **kwargs
+            )
+        self.model_main = NegativeBinomialP(np.zeros_like(self.endog),
+                                            self.exog,
+                                            p=p
+                                            )
         self.model_dist = None
         self.result_class = TruncatedLFGenericResults
         self.result_class_wrapper = TruncatedLFGenericResultsWrapper
@@ -640,8 +988,7 @@ class _RCensoredNegativeBinomialP(_RCensoredGeneric):


 class _RCensored(_RCensoredGeneric):
-    __doc__ = (
-        """
+    __doc__ = """
     Censored model for count data

     %(params)s
@@ -653,27 +1000,34 @@ class _RCensored(_RCensoredGeneric):
         A reference to the endogenous response variable
     exog : array
         A reference to the exogenous design.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
         equal to 1.

-    """
-         + base._missing_param_doc})
-
-    def __init__(self, endog, exog, model=Poisson, distribution=
-        truncatedpoisson, offset=None, exposure=None, missing='none', **kwargs
-        ):
-        super(_RCensored, self).__init__(endog, exog, offset=offset,
-            exposure=exposure, missing=missing, **kwargs)
+    """ + base._missing_param_doc}
+
+    def __init__(self, endog, exog, model=Poisson,
+                 distribution=truncatedpoisson, offset=None,
+                 exposure=None, missing='none', **kwargs):
+        super(_RCensored, self).__init__(
+            endog,
+            exog,
+            offset=offset,
+            exposure=exposure,
+            missing=missing,
+            **kwargs
+            )
         self.model_main = model(np.zeros_like(self.endog), self.exog)
         self.model_dist = distribution
+        # fix k_extra and exog_names
         self.k_extra = k_extra = self.model_main.k_extra
         if k_extra > 0:
             self.exog_names.extend(self.model_main.exog_names[-k_extra:])
+
         self.result_class = TruncatedLFGenericResults
         self.result_class_wrapper = TruncatedLFGenericResultsWrapper
         self.result_class_reg = L1TruncatedLFGenericResults
@@ -684,12 +1038,12 @@ class _RCensored(_RCensoredGeneric):

         internal use in Censored model, will be refactored or removed
         """
-        pass
+        prob_nz = self.model_main._prob_nonzero(mu, params)
+        return prob_nz


 class HurdleCountModel(CountModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Hurdle model for count data

     .. versionadded:: 0.14.0
@@ -713,9 +1067,9 @@ class HurdleCountModel(CountModel):
     pzero : scalar
         Define parameterization parameter for the zero hurdle model family.
         Used when zerodist='negbin'.
-    """
-         % {'params': base._model_params_doc, 'extra_params': 
-        """offset : array_like
+    """ % {'params': base._model_params_doc,
+           'extra_params':
+           """offset : array_like
         Offset is added to the linear prediction with coefficient equal to 1.
     exposure : array_like
         Log(exposure) is added to the linear prediction with coefficient
@@ -732,24 +1086,53 @@ class HurdleCountModel(CountModel):
     ----------
     not yet

-    """
-         + base._missing_param_doc})
+    """ + base._missing_param_doc}
+
+    def __init__(self, endog, exog, offset=None,
+                 dist="poisson", zerodist="poisson",
+                 p=2, pzero=2,
+                 exposure=None, missing='none', **kwargs):

-    def __init__(self, endog, exog, offset=None, dist='poisson', zerodist=
-        'poisson', p=2, pzero=2, exposure=None, missing='none', **kwargs):
-        if offset is not None or exposure is not None:
-            msg = 'Offset and exposure are not yet implemented'
+        if (offset is not None) or (exposure is not None):
+            msg = "Offset and exposure are not yet implemented"
             raise NotImplementedError(msg)
-        super(HurdleCountModel, self).__init__(endog, exog, offset=offset,
-            exposure=exposure, missing=missing, **kwargs)
+        super(HurdleCountModel, self).__init__(
+            endog,
+            exog,
+            offset=offset,
+            exposure=exposure,
+            missing=missing,
+            **kwargs
+            )
         self.k_extra1 = 0
         self.k_extra2 = 0
+
         self._initialize(dist, zerodist, p, pzero)
         self.result_class = HurdleCountResults
         self.result_class_wrapper = HurdleCountResultsWrapper
         self.result_class_reg = L1HurdleCountResults
         self.result_class_reg_wrapper = L1HurdleCountResultsWrapper

+    def _initialize(self, dist, zerodist, p, pzero):
+        if (dist not in ["poisson", "negbin"] or
+                zerodist not in ["poisson", "negbin"]):
+            raise NotImplementedError('dist and zerodist must be "poisson" '
+                                      'or "negbin"')
+
+        if zerodist == "poisson":
+            self.model1 = _RCensored(self.endog, self.exog, model=Poisson)
+        elif zerodist == "negbin":
+            self.model1 = _RCensored(self.endog, self.exog,
+                                     model=NegativeBinomialP)
+            self.k_extra1 += 1
+
+        if dist == "poisson":
+            self.model2 = TruncatedLFPoisson(self.endog, self.exog)
+        elif dist == "negbin":
+            self.model2 = TruncatedLFNegativeBinomialP(self.endog, self.exog,
+                                                       p=p)
+            self.k_extra2 += 1
+
     def loglike(self, params):
         """
         Loglikelihood of Generic Hurdle model
@@ -769,11 +1152,69 @@ class HurdleCountModel(CountModel):
         -----

         """
-        pass
+        k = int((len(params) - self.k_extra1 - self.k_extra2) / 2
+                ) + self.k_extra1
+        return (self.model1.loglike(params[:k]) +
+                self.model2.loglike(params[k:]))
+
+    def fit(self, start_params=None, method='bfgs', maxiter=35,
+            full_output=1, disp=1, callback=None,
+            cov_type='nonrobust', cov_kwds=None, use_t=None, **kwargs):
+
+        if cov_type != "nonrobust":
+            raise ValueError("robust cov_type currently not supported")
+
+        results1 = self.model1.fit(
+            start_params=start_params,
+            method=method, maxiter=maxiter, disp=disp,
+            full_output=full_output, callback=lambda x: x,
+            **kwargs
+            )
+
+        results2 = self.model2.fit(
+            start_params=start_params,
+            method=method, maxiter=maxiter, disp=disp,
+            full_output=full_output, callback=lambda x: x,
+            **kwargs
+            )
+
+        result = deepcopy(results1)
+        result._results.model = self
+        result.mle_retvals['converged'] = [results1.mle_retvals['converged'],
+                                           results2.mle_retvals['converged']]
+        result._results.params = np.append(results1._results.params,
+                                           results2._results.params)
+        # TODO: the following should be in __init__ or initialize
+        result._results.df_model += results2._results.df_model
+        # this looks wrong: the attribute does not exist, so getattr always returns 0
+        self.k_extra1 += getattr(results1._results, "k_extra", 0)
+        self.k_extra2 += getattr(results2._results, "k_extra", 0)
+        self.k_extra = (self.k_extra1 + self.k_extra2 + 1)
+        xnames1 = ["zm_" + name for name in self.model1.exog_names]
+        self.exog_names[:] = xnames1 + self.model2.exog_names
+
+        # fix up cov_params;
+        # we could use normalized cov_params directly, but it may not be used
+        from scipy.linalg import block_diag
+        result._results.normalized_cov_params = None
+        try:
+            cov1 = results1._results.cov_params()
+            cov2 = results2._results.cov_params()
+            result._results.normalized_cov_params = block_diag(cov1, cov2)
+        except ValueError as e:
+            if "need covariance" not in str(e):
+                # could be some other problem
+                raise
+
+        modelfit = self.result_class(self, result._results, results1, results2)
+        result = self.result_class_wrapper(modelfit)
+
+        return result
+
     fit.__doc__ = DiscreteModel.fit.__doc__

-    def predict(self, params, exog=None, exposure=None, offset=None, which=
-        'mean', y_values=None):
+    def predict(self, params, exog=None, exposure=None,
+                offset=None, which='mean', y_values=None):
         """
         Predict response variable or other statistic given exogenous variables.

@@ -837,22 +1278,109 @@ class HurdleCountModel(CountModel):
         number of zeros compared to the count model. If it is smaller than one,
         then the number of zeros is deflated.
         """
-        pass
+        which = which.lower()  # make it case insensitive
+        no_exog = True if exog is None else False
+        exog, offset, exposure = self._get_predict_arrays(
+            exog=exog,
+            offset=offset,
+            exposure=exposure
+            )
+
+        exog_zero = None  # not yet
+        if exog_zero is None:
+            if no_exog:
+                exog_zero = self.exog
+            else:
+                exog_zero = exog
+
+        k_zeros = int((len(params) - self.k_extra1 - self.k_extra2) / 2
+                      ) + self.k_extra1
+        params_zero = params[:k_zeros]
+        params_main = params[k_zeros:]
+
+        lin_pred = (np.dot(exog, params_main[:self.exog.shape[1]]) +
+                    exposure + offset)
+
+        # this currently is mean_main, offset, exposure for zero part ?
+        mu1 = self.model1.predict(params_zero, exog=exog)
+        # prob that count model applies y>0 from zero model predict
+        prob_main = self.model1.model_main._prob_nonzero(mu1, params_zero)
+        prob_zero = (1 - prob_main)
+
+        mu2 = np.exp(lin_pred)
+        prob_ntrunc = self.model2.model_main._prob_nonzero(mu2, params_main)
+
+        if which == 'mean':
+            return prob_main * np.exp(lin_pred) / prob_ntrunc
+        elif which == 'mean-main':
+            return np.exp(lin_pred)
+        elif which == 'linear':
+            return lin_pred
+        elif which == 'mean-nonzero':
+            return np.exp(lin_pred) / prob_ntrunc
+        elif which == 'prob-zero':
+            return prob_zero
+        elif which == 'prob-main':
+            return prob_main
+        elif which == 'prob-trunc':
+            return 1 - prob_ntrunc
+        # not yet supported
+        elif which == 'var':
+            # generic computation using results from submodels
+            mu = np.exp(lin_pred)
+            mt, vt = self.model2._predict_mom_trunc0(params_main, mu)
+            var_ = prob_main * vt + prob_main * (1 - prob_main) * mt**2
+            return var_
+        elif which == 'prob':
+            probs_main = self.model2.predict(
+                params_main, exog, np.exp(exposure), offset, which="prob",
+                y_values=y_values)
+            probs_main *= prob_main[:, None]
+            probs_main[:, 0] = prob_zero
+            return probs_main
+        else:
+            raise ValueError('which = %s is not available' % which)


 class TruncatedLFGenericResults(CountResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for Generic Truncated', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for Generic Truncated",
+        "extra_attr": ""}


 class TruncatedLFPoissonResults(TruncatedLFGenericResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for Truncated Poisson', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for Truncated Poisson",
+        "extra_attr": ""}
+
+    @cache_readonly
+    def _dispersion_factor(self):
+        if self.model.trunc != 0:
+            msg = "dispersion is only available for zero-truncation"
+            raise NotImplementedError(msg)
+
+        mu = np.exp(self.predict(which='linear'))
+
+        return (1 - mu / (np.exp(mu) - 1))


 class TruncatedNegativeBinomialResults(TruncatedLFGenericResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for Truncated Negative Binomial', 'extra_attr': ''}
+    __doc__ = _discrete_results_docs % {
+        "one_line_description":
+            "A results class for Truncated Negative Binomial",
+        "extra_attr": ""}
+
+    @cache_readonly
+    def _dispersion_factor(self):
+        if self.model.trunc != 0:
+            msg = "dispersion is only available for zero-truncation"
+            raise NotImplementedError(msg)
+
+        alpha = self.params[-1]
+        p = self.model.model_main.parameterization
+        mu = np.exp(self.predict(which='linear'))
+
+        return (1 - alpha * mu**(p-1) / (np.exp(mu**(p-1)) - 1))


 class L1TruncatedLFGenericResults(L1CountResults, TruncatedLFGenericResults):
@@ -864,7 +1392,7 @@ class TruncatedLFGenericResultsWrapper(lm.RegressionResultsWrapper):


 wrap.populate_wrapper(TruncatedLFGenericResultsWrapper,
-    TruncatedLFGenericResults)
+                      TruncatedLFGenericResults)


 class L1TruncatedLFGenericResultsWrapper(lm.RegressionResultsWrapper):
@@ -872,21 +1400,37 @@ class L1TruncatedLFGenericResultsWrapper(lm.RegressionResultsWrapper):


 wrap.populate_wrapper(L1TruncatedLFGenericResultsWrapper,
-    L1TruncatedLFGenericResults)
+                      L1TruncatedLFGenericResults)


 class HurdleCountResults(CountResults):
-    __doc__ = _discrete_results_docs % {'one_line_description':
-        'A results class for Hurdle model', 'extra_attr': ''}
-
-    def __init__(self, model, mlefit, results_zero, results_count, cov_type
-        ='nonrobust', cov_kwds=None, use_t=None):
-        super(HurdleCountResults, self).__init__(model, mlefit, cov_type=
-            cov_type, cov_kwds=cov_kwds, use_t=use_t)
+    __doc__ = _discrete_results_docs % {
+        "one_line_description": "A results class for Hurdle model",
+        "extra_attr": ""}
+
+    def __init__(self, model, mlefit, results_zero, results_count,
+                 cov_type='nonrobust', cov_kwds=None, use_t=None):
+        super(HurdleCountResults, self).__init__(
+            model,
+            mlefit,
+            cov_type=cov_type,
+            cov_kwds=cov_kwds,
+            use_t=use_t,
+            )
         self.results_zero = results_zero
         self.results_count = results_count
+        # TODO: this is to fix df_resid, should be automatic but is not
         self.df_resid = self.model.endog.shape[0] - len(self.params)

+    @cache_readonly
+    def llnull(self):
+        return (self.results_zero._results.llnull +
+                self.results_count._results.llnull)
+
+    @cache_readonly
+    def bse(self):
+        return np.append(self.results_zero.bse, self.results_count.bse)
+

 class L1HurdleCountResults(L1CountResults, HurdleCountResults):
     pass
@@ -896,11 +1440,13 @@ class HurdleCountResultsWrapper(lm.RegressionResultsWrapper):
     pass


-wrap.populate_wrapper(HurdleCountResultsWrapper, HurdleCountResults)
+wrap.populate_wrapper(HurdleCountResultsWrapper,
+                      HurdleCountResults)


 class L1HurdleCountResultsWrapper(lm.RegressionResultsWrapper):
     pass


-wrap.populate_wrapper(L1HurdleCountResultsWrapper, L1HurdleCountResults)
+wrap.populate_wrapper(L1HurdleCountResultsWrapper,
+                      L1HurdleCountResults)
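
To make the division of labor in these new models concrete, here is a small, hypothetical usage sketch on simulated data; it is not part of the patch. It uses only the constructors exported in __all__ and the which= options implemented in predict above. The identity it checks for the zero-truncated Poisson, E[y | y > 0] = mu / (1 - exp(-mu)), is the same relation used in _predict_mom_trunc0 and in the 'mean' branch of TruncatedLFGeneric.predict.

import numpy as np
from statsmodels.discrete.truncated_model import (
    HurdleCountModel,
    TruncatedLFPoisson,
)

rng = np.random.default_rng(123)
n = 1000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(x @ np.array([0.5, 0.3])))

# Zero-truncated Poisson: the constructor drops y <= truncation (0 by default),
# mirroring the mask applied in TruncatedLFGeneric.__init__.
res_tp = TruncatedLFPoisson(y, x).fit(disp=0)
mu = res_tp.predict(which="mean-main")        # untruncated Poisson mean
mean_pos = res_tp.predict(which="mean")       # E[y | y > 0]
print(np.allclose(mean_pos, mu / (1 - np.exp(-mu))))   # True

# Hurdle model: a censored-at-zero part decides P(y > 0), a zero-truncated
# count part describes the positive counts.
res_h = HurdleCountModel(y, x, dist="poisson", zerodist="poisson").fit(disp=0)
print(res_h.predict(which="prob-zero")[:5])   # modelled P(y = 0) per observation
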
diff --git a/statsmodels/distributions/bernstein.py b/statsmodels/distributions/bernstein.py
index 282eed001..2bb3b85d1 100644
--- a/statsmodels/distributions/bernstein.py
+++ b/statsmodels/distributions/bernstein.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Wed Feb 17 15:35:23 2021

@@ -5,10 +6,14 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.tools.decorators import cache_readonly
-from statsmodels.distributions.tools import _Grid, cdf2prob_grid, prob2cdf_grid, _eval_bernstein_dd, _eval_bernstein_2d, _eval_bernstein_1d
+from statsmodels.distributions.tools import (
+        _Grid, cdf2prob_grid, prob2cdf_grid,
+        _eval_bernstein_dd, _eval_bernstein_2d, _eval_bernstein_1d)


 class BernsteinDistribution:
@@ -35,7 +40,7 @@ class BernsteinDistribution:
         self.cdf_grid = cdf_grid = np.asarray(cdf_grid)
         self.k_dim = cdf_grid.ndim
         self.k_grid = cdf_grid.shape
-        self.k_grid_product = np.prod([(i - 1) for i in self.k_grid])
+        self.k_grid_product = np.prod([i-1 for i in self.k_grid])
         self._grid = _Grid(self.k_grid)

     @classmethod
@@ -58,7 +63,29 @@ class BernsteinDistribution:
         -------
         Instance of a Bernstein distribution
         """
-        pass
+        data = np.asarray(data)
+        if np.any(data < 0) or np.any(data > 1):
+            raise ValueError("data needs to be in [0, 1]")
+
+        if data.ndim == 1:
+            data = data[:, None]
+
+        k_dim = data.shape[1]
+        if np.size(k_bins) == 1:
+            k_bins = [k_bins] * k_dim
+        bins = [np.linspace(-1 / ni, 1, ni + 2) for ni in k_bins]
+        c, e = np.histogramdd(data, bins=bins, density=False)
+        # TODO: check when we have zero observations, which bin?
+        # check bins start at 0 except the leading bin
+        assert all([ei[1] == 0 for ei in e])
+        c /= len(data)
+
+        cdf_grid = prob2cdf_grid(c)
+        return cls(cdf_grid)
+
+    @cache_readonly
+    def prob_grid(self):
+        return cdf2prob_grid(self.cdf_grid, prepend=None)

     def cdf(self, x):
         """cdf values evaluated at x.
@@ -83,7 +110,11 @@ class BernsteinDistribution:
         currently the bernstein polynomials will be evaluated in a fully
         vectorized computation.
         """
-        pass
+        x = np.asarray(x)
+        if x.ndim == 1 and self.k_dim == 1:
+            x = x[:, None]
+        cdf_ = _eval_bernstein_dd(x, self.cdf_grid)
+        return cdf_

     def pdf(self, x):
         """pdf values evaluated at x.
@@ -108,7 +139,12 @@ class BernsteinDistribution:
         currently the bernstein polynomials will be evaluated in a fully
         vectorized computation.
         """
-        pass
+        x = np.asarray(x)
+        if x.ndim == 1 and self.k_dim == 1:
+            x = x[:, None]
+        # TODO: check usage of k_grid_product. Should this go into eval?
+        pdf_ = self.k_grid_product * _eval_bernstein_dd(x, self.prob_grid)
+        return pdf_

     def get_marginal(self, idx):
         """Get marginal BernsteinDistribution.
@@ -123,7 +159,19 @@ class BernsteinDistribution:
         -------
         BernsteinDistribution instance for the marginal distribution.
         """
-        pass
+
+        # univariate
+        if self.k_dim == 1:
+            return self
+
+        sl = [-1] * self.k_dim
+        if np.shape(idx) == ():
+            idx = [idx]
+        for ii in idx:
+            sl[ii] = slice(None, None, None)
+        cdf_m = self.cdf_grid[tuple(sl)]
+        bpd_marginal = BernsteinDistribution(cdf_m)
+        return bpd_marginal

     def rvs(self, nobs):
         """Generate random numbers from distribution.
@@ -133,12 +181,47 @@ class BernsteinDistribution:
         nobs : int
             Number of random observations to generate.
         """
-        pass
+        rvs_mnl = np.random.multinomial(nobs, self.prob_grid.flatten())
+        k_comp = self.k_dim
+        rvs_m = []
+        for i in range(len(rvs_mnl)):
+            if rvs_mnl[i] != 0:
+                idx = np.unravel_index(i, self.prob_grid.shape)
+                rvsi = []
+                for j in range(k_comp):
+                    n = self.k_grid[j]
+                    xgi = self._grid.x_marginal[j][idx[j]]
+                    # Note: x_marginal starts at 0
+                    #       x_marginal ends with 1 but that is not used by idx
+                    rvsi.append(stats.beta.rvs(n * xgi + 1, n * (1-xgi) + 0,
+                                               size=rvs_mnl[i]))
+                rvs_m.append(np.column_stack(rvsi))
+
+        rvsm = np.concatenate(rvs_m)
+        return rvsm


 class BernsteinDistributionBV(BernsteinDistribution):
-    pass
+
+    def cdf(self, x):
+        cdf_ = _eval_bernstein_2d(x, self.cdf_grid)
+        return cdf_
+
+    def pdf(self, x):
+        # TODO: check usage of k_grid_product. Should this go into eval?
+        pdf_ = self.k_grid_product * _eval_bernstein_2d(x, self.prob_grid)
+        return pdf_


 class BernsteinDistributionUV(BernsteinDistribution):
-    pass
+
+    def cdf(self, x, method="binom"):
+
+        cdf_ = _eval_bernstein_1d(x, self.cdf_grid, method=method)
+        return cdf_
+
+    def pdf(self, x, method="binom"):
+        # TODO: check usage of k_grid_product. Should this go into eval?
+        pdf_ = self.k_grid_product * _eval_bernstein_1d(x, self.prob_grid,
+                                                        method=method)
+        return pdf_
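
Example (not part of the patch): a minimal sketch of the fit/evaluate cycle implemented above, with illustrative values; the data must already lie in [0, 1].

import numpy as np
from statsmodels.distributions.bernstein import BernsteinDistribution

rng = np.random.default_rng(0)
u = rng.uniform(size=(500, 2))            # bivariate data on the unit square

bpd = BernsteinDistribution.from_data(u, 10)   # 10 bins per dimension
print(bpd.cdf(np.array([[0.5, 0.5]])))    # roughly 0.25 for independent uniforms
print(bpd.pdf(np.array([[0.5, 0.5]])))    # roughly 1 for independent uniforms
marg = bpd.get_marginal(0)                # univariate marginal distribution
sample = bpd.rvs(100)                     # (100, 2) sample from the smoothed copula
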
diff --git a/statsmodels/distributions/copula/_special.py b/statsmodels/distributions/copula/_special.py
index 25ef1c6c7..cf19a599a 100644
--- a/statsmodels/distributions/copula/_special.py
+++ b/statsmodels/distributions/copula/_special.py
@@ -4,19 +4,23 @@ Special functions for copulas not available in scipy

 Created on Jan. 27, 2023
 """
+
 import numpy as np
 from scipy.special import factorial


-class Sterling1:
+class Sterling1():
     """Stirling numbers of the first kind
     """
+    # based on
+    # https://rosettacode.org/wiki/Stirling_numbers_of_the_first_kind#Python

     def __init__(self):
         self._cache = {}

     def __call__(self, n, k):
-        key = str(n) + ',' + str(k)
+        key = str(n) + "," + str(k)
+
         if key in self._cache.keys():
             return self._cache[key]
         if n == k == 0:
@@ -32,26 +36,29 @@ class Sterling1:
     def clear_cache(self):
         """clear cache of Sterling numbers
         """
-        pass
+        self._cache = {}


 sterling1 = Sterling1()


-class Sterling2:
+class Sterling2():
     """Stirling numbers of the second kind
     """
+    # based on
+    # https://rosettacode.org/wiki/Stirling_numbers_of_the_second_kind#Python

     def __init__(self):
         self._cache = {}

     def __call__(self, n, k):
-        key = str(n) + ',' + str(k)
+        key = str(n) + "," + str(k)
+
         if key in self._cache.keys():
             return self._cache[key]
         if n == k == 0:
             return 1
-        if n > 0 and k == 0 or n == 0 and k > 0:
+        if (n > 0 and k == 0) or (n == 0 and k > 0):
             return 0
         if n == k:
             return 1
@@ -64,7 +71,7 @@ class Sterling2:
     def clear_cache(self):
         """clear cache of Sterling numbers
         """
-        pass
+        self._cache = {}


 sterling2 = Sterling2()
@@ -75,7 +82,7 @@ def li3(z):

     Li(-3, z)
     """
-    pass
+    return z * (1 + 4 * z + z**2) / (1 - z)**4


 def li4(z):
@@ -83,7 +90,7 @@ def li4(z):

     Li(-4, z)
     """
-    pass
+    return z * (1 + z) * (1 + 10 * z + z**2) / (1 - z)**5


 def lin(n, z):
@@ -93,4 +100,11 @@ def lin(n, z):

     https://en.wikipedia.org/wiki/Polylogarithm#Particular_values
     """
-    pass
+    if np.size(z) > 1:
+        z = np.array(z)[..., None]
+
+    k = np.arange(n+1)
+    st2 = np.array([sterling2(n + 1, ki + 1) for ki in k])
+    res = (-1)**(n+1) * np.sum(factorial(k) * st2 * (-1 / (1 - z))**(k+1),
+                               axis=-1)
+    return res
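
Example (not part of the patch): a quick consistency check of the helpers above; `lin(n, z)` should reproduce the closed forms `li3` and `li4` for the negative-order polylogarithm. The value of `z` is illustrative.

import numpy as np
from statsmodels.distributions.copula._special import li3, li4, lin, sterling2

z = 0.3
print(sterling2(5, 3))                  # Stirling number of the second kind: 25
print(np.allclose(lin(3, z), li3(z)))   # True
print(np.allclose(lin(4, z), li4(z)))   # True
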
diff --git a/statsmodels/distributions/copula/api.py b/statsmodels/distributions/copula/api.py
index 61f37ae75..b8727bf7e 100644
--- a/statsmodels/distributions/copula/api.py
+++ b/statsmodels/distributions/copula/api.py
@@ -1,11 +1,32 @@
-from statsmodels.distributions.copula.copulas import CopulaDistribution
-from statsmodels.distributions.copula.archimedean import ArchimedeanCopula, FrankCopula, ClaytonCopula, GumbelCopula
+from statsmodels.distributions.copula.copulas import (
+    CopulaDistribution)
+
+from statsmodels.distributions.copula.archimedean import (
+    ArchimedeanCopula, FrankCopula, ClaytonCopula, GumbelCopula)
 import statsmodels.distributions.copula.transforms as transforms
-from statsmodels.distributions.copula.elliptical import GaussianCopula, StudentTCopula
-from statsmodels.distributions.copula.extreme_value import ExtremeValueCopula
+
+from statsmodels.distributions.copula.elliptical import (
+    GaussianCopula, StudentTCopula)
+
+from statsmodels.distributions.copula.extreme_value import (
+    ExtremeValueCopula)
 import statsmodels.distributions.copula.depfunc_ev as depfunc_ev
-from statsmodels.distributions.copula.other_copulas import IndependenceCopula, rvs_kernel
-__all__ = ['ArchimedeanCopula', 'ClaytonCopula', 'CopulaDistribution',
-    'ExtremeValueCopula', 'FrankCopula', 'GaussianCopula', 'GumbelCopula',
-    'IndependenceCopula', 'StudentTCopula', 'depfunc_ev', 'transforms',
-    'rvs_kernel']
+
+from statsmodels.distributions.copula.other_copulas import (
+    IndependenceCopula, rvs_kernel)
+
+
+__all__ = [
+    "ArchimedeanCopula",
+    "ClaytonCopula",
+    "CopulaDistribution",
+    "ExtremeValueCopula",
+    "FrankCopula",
+    "GaussianCopula",
+    "GumbelCopula",
+    "IndependenceCopula",
+    "StudentTCopula",
+    "depfunc_ev",
+    "transforms",
+    "rvs_kernel"
+]
diff --git a/statsmodels/distributions/copula/archimedean.py b/statsmodels/distributions/copula/archimedean.py
index abc4addcb..81eadedc3 100644
--- a/statsmodels/distributions/copula/archimedean.py
+++ b/statsmodels/distributions/copula/archimedean.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri Jan 29 19:19:45 2021

@@ -6,19 +7,36 @@ License: BSD-3

 """
 import sys
+
 import numpy as np
 from scipy import stats, integrate, optimize
+
 from . import transforms
 from .copulas import Copula
 from statsmodels.tools.rng_qrng import check_random_state


+def _debye(alpha):
+    # EPSILON = np.finfo(np.float32).eps
+    EPSILON = np.finfo(np.float64).eps * 100
+
+    def integrand(t):
+        return np.squeeze(t / (np.exp(t) - 1))
+    _alpha = np.squeeze(alpha)
+    debye_value = integrate.quad(integrand, EPSILON, _alpha)[0] / _alpha
+    return debye_value
+
+
 def _debyem1_expansion(x):
     """Debye function minus 1, Taylor series approximation around zero

     function is not used
     """
-    pass
+    x = np.asarray(x)
+    # Expansion derived using Wolfram alpha
+    dm1 = (-x/4 + x**2/36 - x**4/3600 + x**6/211680 - x**8/10886400 +
+           x**10/526901760 - x**12 * 691/16999766784000)
+    return dm1


 def tau_frank(theta):
@@ -35,7 +53,23 @@ def tau_frank(theta):
     -------
     tau : float, tau for given theta
     """
-    pass
+
+    if theta <= 1:
+        tau = _tau_frank_expansion(theta)
+    else:
+        debye_value = _debye(theta)
+        tau = 1 + 4 * (debye_value - 1) / theta
+
+    return tau
+
+
+def _tau_frank_expansion(x):
+    x = np.asarray(x)
+    # expansion derived using wolfram alpha
+    # agrees better with R copula for x<=1, maybe even for larger theta
+    tau = (x/9 - x**3/900 + x**5/52920 - x**7/2721600 + x**9/131725440 -
+           x**11 * 691/4249941696000)
+    return tau


 class ArchimedeanCopula(Copula):
@@ -61,103 +95,385 @@ class ArchimedeanCopula(Copula):
         self.transform = transform
         self.k_args = 1

+    def _handle_args(self, args):
+        # TODO: how do we handle non-tuple args? Do we allow single values?
+        # Model fit might give an args that can be empty
+        if isinstance(args, np.ndarray):
+            args = tuple(args)  # handles empty arrays, unpacks otherwise
+        if not isinstance(args, tuple):
+            # could still be a scalar or numpy scalar
+            args = (args,)
+        if len(args) == 0 or args == (None,):
+            # second condition because we converted None to tuple
+            args = self.args
+
+        return args
+
+    def _handle_u(self, u):
+        u = np.asarray(u)
+        if u.shape[-1] != self.k_dim:
+            import warnings
+            warnings.warn("u has different dimension than k_dim. "
+                          "This will raise exception in future versions",
+                          FutureWarning)
+
+        return u
+
     def cdf(self, u, args=()):
         """Evaluate cdf of Archimedean copula."""
-        pass
+        args = self._handle_args(args)
+        u = self._handle_u(u)
+        axis = -1
+        phi = self.transform.evaluate
+        phi_inv = self.transform.inverse
+        cdfv = phi_inv(phi(u, *args).sum(axis), *args)
+        # clip numerical noise
+        out = cdfv if isinstance(cdfv, np.ndarray) else None
+        cdfv = np.clip(cdfv, 0., 1., out=out)  # inplace if possible
+        return cdfv

     def pdf(self, u, args=()):
         """Evaluate pdf of Archimedean copula."""
-        pass
+        u = self._handle_u(u)
+        args = self._handle_args(args)
+        axis = -1
+
+        phi_d1 = self.transform.deriv
+        if u.shape[-1] == 2:
+            psi_d = self.transform.deriv2_inverse
+        elif u.shape[-1] == 3:
+            psi_d = self.transform.deriv3_inverse
+        elif u.shape[-1] == 4:
+            psi_d = self.transform.deriv4_inverse
+        else:
+            # will raise NotImplementedError if not available
+            k = u.shape[-1]
+
+            def psi_d(*args):
+                return self.transform.derivk_inverse(k, *args)
+
+        psi = self.transform.evaluate(u, *args).sum(axis)
+
+        pdfv = np.prod(phi_d1(u, *args), axis)
+        pdfv *= (psi_d(psi, *args))
+
+        # use abs, I'm not sure yet about where to add signs
+        return np.abs(pdfv)

     def logpdf(self, u, args=()):
         """Evaluate log pdf of multivariate Archimedean copula."""
-        pass
+
+        u = self._handle_u(u)
+        args = self._handle_args(args)
+        axis = -1
+
+        phi_d1 = self.transform.deriv
+        if u.shape[-1] == 2:
+            psi_d = self.transform.deriv2_inverse
+        elif u.shape[-1] == 3:
+            psi_d = self.transform.deriv3_inverse
+        elif u.shape[-1] == 4:
+            psi_d = self.transform.deriv4_inverse
+        else:
+            # will raise NotImplementedError if not available
+            k = u.shape[-1]
+
+            def psi_d(*args):
+                return self.transform.derivk_inverse(k, *args)
+
+        psi = self.transform.evaluate(u, *args).sum(axis)
+
+        # I need np.abs because derivatives are negative,
+        # is this correct for mv?
+        logpdfv = np.sum(np.log(np.abs(phi_d1(u, *args))), axis)
+        logpdfv += np.log(np.abs(psi_d(psi, *args)))
+
+        return logpdfv
+
+    def _arg_from_tau(self, tau):
+        # for generic compat
+        return self.theta_from_tau(tau)


 class ClaytonCopula(ArchimedeanCopula):
-    """Clayton copula.
+    r"""Clayton copula.

     Dependence is greater in the negative tail than in the positive.

     .. math::

-        C_\\theta(u,v) = \\left[ \\max\\left\\{ u^{-\\theta} + v^{-\\theta} -1 ;
-        0 \\right\\} \\right]^{-1/\\theta}
+        C_\theta(u,v) = \left[ \max\left\{ u^{-\theta} + v^{-\theta} -1 ;
+        0 \right\} \right]^{-1/\theta}

-    with :math:`\\theta\\in[-1,\\infty)\\backslash\\{0\\}`.
+    with :math:`\theta\in[-1,\infty)\backslash\{0\}`.

     """

     def __init__(self, theta=None, k_dim=2):
         if theta is not None:
-            args = theta,
+            args = (theta,)
         else:
             args = ()
         super().__init__(transforms.TransfClayton(), args=args, k_dim=k_dim)
+
         if theta is not None:
             if theta <= -1 or theta == 0:
                 raise ValueError('Theta must be > -1 and !=0')
         self.theta = theta

+    def rvs(self, nobs=1, args=(), random_state=None):
+        rng = check_random_state(random_state)
+        th, = self._handle_args(args)
+        x = rng.random((nobs, self.k_dim))
+        v = stats.gamma(1. / th).rvs(size=(nobs, 1), random_state=rng)
+        if self.k_dim != 2:
+            rv = (1 - np.log(x) / v) ** (-1. / th)
+        else:
+            rv = self.transform.inverse(- np.log(x) / v, th)
+        return rv
+
+    def pdf(self, u, args=()):
+        u = self._handle_u(u)
+        th, = self._handle_args(args)
+        if u.shape[-1] == 2:
+            a = (th + 1) * np.prod(u, axis=-1) ** -(th + 1)
+            b = np.sum(u ** -th, axis=-1) - 1
+            c = -(2 * th + 1) / th
+            return a * b ** c
+        else:
+            return super().pdf(u, args)
+
+    def logpdf(self, u, args=()):
+        # we skip Archimedean logpdf, that uses numdiff
+        return super().logpdf(u, args=args)
+
+    def cdf(self, u, args=()):
+        u = self._handle_u(u)
+        th, = self._handle_args(args)
+        d = u.shape[-1]  # self.k_dim
+        return (np.sum(u ** (-th), axis=-1) - d + 1) ** (-1.0 / th)
+
+    def tau(self, theta=None):
+        # Joe 2014 p. 168
+        if theta is None:
+            theta = self.theta
+
+        return theta / (theta + 2)
+
+    def theta_from_tau(self, tau):
+        return 2 * tau / (1 - tau)
+

 class FrankCopula(ArchimedeanCopula):
-    """Frank copula.
+    r"""Frank copula.

     Dependence is symmetric.

     .. math::

-        C_\\theta(\\mathbf{u}) = -\\frac{1}{\\theta} \\log \\left[ 1-
-        \\frac{ \\prod_j (1-\\exp(- \\theta u_j)) }{ (1 - \\exp(-\\theta)-1)^{d -
-        1} } \\right]
+        C_\theta(\mathbf{u}) = -\frac{1}{\theta} \log \left[ 1-
+        \frac{ \prod_j (1-\exp(- \theta u_j)) }{ (1 - \exp(-\theta)-1)^{d -
+        1} } \right]

-    with :math:`\\theta\\in \\mathbb{R}\\backslash\\{0\\}, \\mathbf{u} \\in [0, 1]^d`.
+    with :math:`\theta\in \mathbb{R}\backslash\{0\}, \mathbf{u} \in [0, 1]^d`.

     """

     def __init__(self, theta=None, k_dim=2):
         if theta is not None:
-            args = theta,
+            args = (theta,)
         else:
             args = ()
         super().__init__(transforms.TransfFrank(), args=args, k_dim=k_dim)
+
         if theta is not None:
             if theta == 0:
                 raise ValueError('Theta must be !=0')
         self.theta = theta

+    def rvs(self, nobs=1, args=(), random_state=None):
+        rng = check_random_state(random_state)
+        th, = self._handle_args(args)
+        x = rng.random((nobs, self.k_dim))
+        v = stats.logser.rvs(1. - np.exp(-th),
+                             size=(nobs, 1), random_state=rng)
+
+        return -1. / th * np.log(1. + np.exp(-(-np.log(x) / v))
+                                 * (np.exp(-th) - 1.))
+
+    # explicit BV formulas copied from Joe 1997 p. 141
+    # todo: check expm1 and log1p for improved numerical precision
+
+    def pdf(self, u, args=()):
+        u = self._handle_u(u)
+        th, = self._handle_args(args)
+        if u.shape[-1] != 2:
+            return super().pdf(u, th)
+
+        g_ = np.exp(-th * np.sum(u, axis=-1)) - 1
+        g1 = np.exp(-th) - 1
+
+        num = -th * g1 * (1 + g_)
+        aux = np.prod(np.exp(-th * u) - 1, axis=-1) + g1
+        den = aux ** 2
+        return num / den
+
+    def cdf(self, u, args=()):
+        u = self._handle_u(u)
+        th, = self._handle_args(args)
+        dim = u.shape[-1]
+
+        num = np.prod(1 - np.exp(- th * u), axis=-1)
+        den = (1 - np.exp(-th)) ** (dim - 1)
+
+        return -1.0 / th * np.log(1 - num / den)
+
+    def logpdf(self, u, args=()):
+        u = self._handle_u(u)
+        th, = self._handle_args(args)
+        if u.shape[-1] == 2:
+            # bivariate case
+            u1, u2 = u[..., 0], u[..., 1]
+            b = 1 - np.exp(-th)
+            pdf = np.log(th * b) - th * (u1 + u2)
+            pdf -= 2 * np.log(b - (1 - np.exp(- th * u1)) *
+                              (1 - np.exp(- th * u2)))
+            return pdf
+        else:
+            # for now use generic from base Copula class, log(self.pdf(...))
+            # we skip Archimedean logpdf, that uses numdiff
+            return super().logpdf(u, args)
+
     def cdfcond_2g1(self, u, args=()):
         """Conditional cdf of second component given the value of first.
         """
-        pass
+        u = self._handle_u(u)
+        th, = self._handle_args(args)
+        if u.shape[-1] == 2:
+            # bivariate case
+            u1, u2 = u[..., 0], u[..., 1]
+            cdfc = np.exp(- th * u1)
+            cdfc /= np.expm1(-th) / np.expm1(- th * u2) + np.expm1(- th * u1)
+            return cdfc
+        else:
+            raise NotImplementedError("u needs to be bivariate (2 columns)")

     def ppfcond_2g1(self, q, u1, args=()):
         """Conditional pdf of second component given the value of first.
         """
-        pass
+        u1 = np.asarray(u1)
+        th, = self._handle_args(args)
+        if u1.shape[-1] == 1:
+            # bivariate case, conditional on value of first variable
+            ppfc = - np.log(1 + np.expm1(- th) /
+                            ((1 / q - 1) * np.exp(-th * u1) + 1)) / th
+
+            return ppfc
+        else:
+            raise NotImplementedError("u needs to be bivariate (2 columns)")
+
+    def tau(self, theta=None):
+        # Joe 2014 p. 166
+        if theta is None:
+            theta = self.theta
+
+        return tau_frank(theta)
+
+    def theta_from_tau(self, tau):
+        MIN_FLOAT_LOG = np.log(sys.float_info.min)
+        MAX_FLOAT_LOG = np.log(sys.float_info.max)
+
+        def _theta_from_tau(alpha):
+            return self.tau(theta=alpha) - tau
+
+        # avoid start=1, because break in tau approximation method
+        start = 0.5 if tau < 0.11 else 2
+
+        result = optimize.least_squares(_theta_from_tau, start, bounds=(
+            MIN_FLOAT_LOG, MAX_FLOAT_LOG))
+        theta = result.x[0]
+        return theta


 class GumbelCopula(ArchimedeanCopula):
-    """Gumbel copula.
+    r"""Gumbel copula.

     Dependence is greater in the positive tail than in the negative.

     .. math::

-        C_\\theta(u,v) = \\exp\\!\\left[ -\\left( (-\\log(u))^\\theta +
-        (-\\log(v))^\\theta \\right)^{1/\\theta} \\right]
+        C_\theta(u,v) = \exp\!\left[ -\left( (-\log(u))^\theta +
+        (-\log(v))^\theta \right)^{1/\theta} \right]

-    with :math:`\\theta\\in[1,\\infty)`.
+    with :math:`\theta\in[1,\infty)`.

     """

     def __init__(self, theta=None, k_dim=2):
         if theta is not None:
-            args = theta,
+            args = (theta,)
         else:
             args = ()
         super().__init__(transforms.TransfGumbel(), args=args, k_dim=k_dim)
+
         if theta is not None:
             if theta <= 1:
                 raise ValueError('Theta must be > 1')
         self.theta = theta
+
+    def rvs(self, nobs=1, args=(), random_state=None):
+        rng = check_random_state(random_state)
+        th, = self._handle_args(args)
+        x = rng.random((nobs, self.k_dim))
+        v = stats.levy_stable.rvs(
+            1. / th, 1., 0,
+            np.cos(np.pi / (2 * th)) ** th,
+            size=(nobs, 1), random_state=rng
+        )
+
+        if self.k_dim != 2:
+            rv = np.exp(-(-np.log(x) / v) ** (1. / th))
+        else:
+            rv = self.transform.inverse(- np.log(x) / v, th)
+        return rv
+
+    def pdf(self, u, args=()):
+        u = self._handle_u(u)
+        th, = self._handle_args(args)
+        if u.shape[-1] == 2:
+            xy = -np.log(u)
+            xy_theta = xy ** th
+
+            sum_xy_theta = np.sum(xy_theta, axis=-1)
+            sum_xy_theta_theta = sum_xy_theta ** (1.0 / th)
+
+            a = np.exp(-sum_xy_theta_theta)
+            b = sum_xy_theta_theta + th - 1.0
+            c = sum_xy_theta ** (1.0 / th - 2)
+            d = np.prod(xy, axis=-1) ** (th - 1.0)
+            e = np.prod(u, axis=-1) ** (- 1.0)
+
+            return a * b * c * d * e
+        else:
+            return super().pdf(u, args)
+
+    def cdf(self, u, args=()):
+        u = self._handle_u(u)
+        th, = self._handle_args(args)
+        h = np.sum((-np.log(u)) ** th, axis=-1)
+        cdf = np.exp(-h ** (1.0 / th))
+        return cdf
+
+    def logpdf(self, u, args=()):
+        # we skip Archimedean logpdf, that uses numdiff
+        return super().logpdf(u, args=args)
+
+    def tau(self, theta=None):
+        # Joe 2014 p. 172
+        if theta is None:
+            theta = self.theta
+
+        return (theta - 1) / theta
+
+    def theta_from_tau(self, tau):
+        return 1 / (1 - tau)
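
Example (not part of the patch): the methods filled in above give closed-form Kendall's tau for Clayton and Gumbel and a series/Debye-based tau for Frank; a short sketch with illustrative parameter values.

import numpy as np
from statsmodels.distributions.copula.api import (
    ClaytonCopula, FrankCopula, GumbelCopula)

clayton = ClaytonCopula(theta=2.0)
print(clayton.tau())                    # theta / (theta + 2) = 0.5
print(clayton.theta_from_tau(0.5))      # inverts back to 2.0

frank = FrankCopula(theta=3.0)
u = np.array([[0.3, 0.7], [0.6, 0.2]])
print(frank.cdf(u))                     # closed-form bivariate cdf
print(frank.pdf(u))                     # explicit bivariate density

gumbel = GumbelCopula(theta=1.5)
print(gumbel.rvs(nobs=5, random_state=123))   # (5, 2) points in the unit square
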
diff --git a/statsmodels/distributions/copula/copulas.py b/statsmodels/distributions/copula/copulas.py
index 44d98e3d2..2f6b97bfa 100644
--- a/statsmodels/distributions/copula/copulas.py
+++ b/statsmodels/distributions/copula/copulas.py
@@ -11,8 +11,10 @@ copulas. The Annals of Statistics, 37(5), pp.2990-3022.

 """
 from abc import ABC, abstractmethod
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.graphics import utils


@@ -34,9 +36,11 @@ class CopulaDistribution:
     Status: experimental, argument handling may still change

     """
-
     def __init__(self, copula, marginals, cop_args=()):
+
         self.copula = copula
+
+        # no checking done on marginals
         self.marginals = marginals
         self.cop_args = cop_args
         self.k_vars = len(marginals)
@@ -86,7 +90,18 @@ class CopulaDistribution:
         --------
         statsmodels.tools.rng_qrng.check_random_state
         """
-        pass
+        if cop_args is None:
+            cop_args = self.cop_args
+        if marg_args is None:
+            marg_args = [()] * self.k_vars
+
+        sample = self.copula.rvs(nobs=nobs, args=cop_args,
+                                 random_state=random_state)
+
+        for i, dist in enumerate(self.marginals):
+            sample[:, i] = dist.ppf(0.5 + (1 - 1e-10) * (sample[:, i] - 0.5),
+                                    *marg_args[i])
+        return sample

     def cdf(self, y, cop_args=None, marg_args=None):
         """CDF of copula distribution.
@@ -113,7 +128,20 @@ class CopulaDistribution:
         cdf values

         """
-        pass
+        y = np.asarray(y)
+        if cop_args is None:
+            cop_args = self.cop_args
+        if marg_args is None:
+            marg_args = [()] * y.shape[-1]
+
+        cdf_marg = []
+        for i in range(self.k_vars):
+            cdf_marg.append(self.marginals[i].cdf(y[..., i], *marg_args[i]))
+
+        u = np.column_stack(cdf_marg)
+        if y.ndim == 1:
+            u = u.squeeze()
+        return self.copula.cdf(u, cop_args)

     def pdf(self, y, cop_args=None, marg_args=None):
         """PDF of copula distribution.
@@ -139,7 +167,7 @@ class CopulaDistribution:
         -------
         pdf values
         """
-        pass
+        return np.exp(self.logpdf(y, cop_args=cop_args, marg_args=marg_args))

     def logpdf(self, y, cop_args=None, marg_args=None):
         """Log-pdf of copula distribution.
@@ -166,30 +194,47 @@ class CopulaDistribution:
         log-pdf values

         """
-        pass
+        y = np.asarray(y)
+        if cop_args is None:
+            cop_args = self.cop_args
+        if marg_args is None:
+            marg_args = tuple([()] * y.shape[-1])
+
+        lpdf = 0.0
+        cdf_marg = []
+        for i in range(self.k_vars):
+            lpdf += self.marginals[i].logpdf(y[..., i], *marg_args[i])
+            cdf_marg.append(self.marginals[i].cdf(y[..., i], *marg_args[i]))
+
+        u = np.column_stack(cdf_marg)
+        if y.ndim == 1:
+            u = u.squeeze()
+
+        lpdf += self.copula.logpdf(u, cop_args)
+        return lpdf


 class Copula(ABC):
-    """A generic Copula class meant for subclassing.
+    r"""A generic Copula class meant for subclassing.

     Notes
     -----
-    A function :math:`\\phi` on :math:`[0, \\infty]` is the Laplace-Stieltjes
-    transform of a distribution function if and only if :math:`\\phi` is
-    completely monotone and :math:`\\phi(0) = 1` [2]_.
+    A function :math:`\phi` on :math:`[0, \infty]` is the Laplace-Stieltjes
+    transform of a distribution function if and only if :math:`\phi` is
+    completely monotone and :math:`\phi(0) = 1` [2]_.

     The following algorithm for sampling a ``d``-dimensional exchangeable
-    Archimedean copula with generator :math:`\\phi` is due to Marshall, Olkin
-    (1988) [1]_, where :math:`LS^{−1}(\\phi)` denotes the inverse
-    Laplace-Stieltjes transform of :math:`\\phi`.
+    Archimedean copula with generator :math:`\phi` is due to Marshall, Olkin
+    (1988) [1]_, where :math:`LS^{−1}(\phi)` denotes the inverse
+    Laplace-Stieltjes transform of :math:`\phi`.

     From a mixture representation with respect to :math:`F`, the following
     algorithm may be derived for sampling Archimedean copulas, see [1]_.

-    1. Sample :math:`V \\sim F = LS^{−1}(\\phi)`.
-    2. Sample i.i.d. :math:`X_i \\sim U[0,1], i \\in \\{1,...,d\\}`.
-    3. Return:math:`(U_1,..., U_d)`, where :math:`U_i = \\phi(−\\log(X_i)/V), i
-       \\in \\{1, ...,d\\}`.
+    1. Sample :math:`V \sim F = LS^{−1}(\phi)`.
+    2. Sample i.i.d. :math:`X_i \sim U[0,1], i \in \{1,...,d\}`.
+    3. Return :math:`(U_1,..., U_d)`, where :math:`U_i = \phi(−\log(X_i)/V), i
+       \in \{1, ...,d\}`.

     Detailed properties of each copula can be found in [3]_.

@@ -252,7 +297,7 @@ class Copula(ABC):
         --------
         statsmodels.tools.rng_qrng.check_random_state
         """
-        pass
+        raise NotImplementedError

     @abstractmethod
     def pdf(self, u, args=()):
@@ -274,7 +319,6 @@ class Copula(ABC):
         pdf : ndarray, (nobs, k_dim)
             Copula pdf evaluated at points ``u``.
         """
-        pass

     def logpdf(self, u, args=()):
         """Log of copula pdf, loglikelihood.
@@ -295,7 +339,7 @@ class Copula(ABC):
         cdf : ndarray, (nobs, k_dim)
             Copula log-pdf evaluated at points ``u``.
         """
-        pass
+        return np.log(self.pdf(u, *args))

     @abstractmethod
     def cdf(self, u, args=()):
@@ -317,7 +361,6 @@ class Copula(ABC):
         cdf : ndarray, (nobs, k_dim)
             Copula cdf evaluated at points ``u``.
         """
-        pass

     def plot_scatter(self, sample=None, nobs=500, random_state=None, ax=None):
         """Sample the copula and plot.
@@ -354,7 +397,18 @@ class Copula(ABC):
         --------
         statsmodels.tools.rng_qrng.check_random_state
         """
-        pass
+        if self.k_dim != 2:
+            raise ValueError("Can only plot 2-dimensional Copula.")
+
+        if sample is None:
+            sample = self.rvs(nobs=nobs, random_state=random_state)
+
+        fig, ax = utils.create_mpl_ax(ax)
+        ax.scatter(sample[:, 0], sample[:, 1])
+        ax.set_xlabel('u')
+        ax.set_ylabel('v')
+
+        return fig, sample

     def plot_pdf(self, ticks_nbr=10, ax=None):
         """Plot the PDF.
@@ -374,7 +428,40 @@ class Copula(ABC):
             `ax` is connected.

         """
-        pass
+        from matplotlib import pyplot as plt
+        if self.k_dim != 2:
+            import warnings
+            warnings.warn("Plotting 2-dimensional Copula.")
+
+        n_samples = 100
+
+        eps = 1e-4
+        uu, vv = np.meshgrid(np.linspace(eps, 1 - eps, n_samples),
+                             np.linspace(eps, 1 - eps, n_samples))
+        points = np.vstack([uu.ravel(), vv.ravel()]).T
+
+        data = self.pdf(points).T.reshape(uu.shape)
+        min_ = np.nanpercentile(data, 5)
+        max_ = np.nanpercentile(data, 95)
+
+        fig, ax = utils.create_mpl_ax(ax)
+
+        vticks = np.linspace(min_, max_, num=ticks_nbr)
+        range_cbar = [min_, max_]
+        cs = ax.contourf(uu, vv, data, vticks,
+                         antialiased=True, vmin=range_cbar[0],
+                         vmax=range_cbar[1])
+
+        ax.set_xlabel("u")
+        ax.set_ylabel("v")
+        ax.set_xlim(0, 1)
+        ax.set_ylim(0, 1)
+        ax.set_aspect('equal')
+        cbar = plt.colorbar(cs, ticks=vticks)
+        cbar.set_label('p')
+        fig.tight_layout()
+
+        return fig

     def tau_simulated(self, nobs=1024, random_state=None):
         """Kendall's tau based on simulated samples.
@@ -385,7 +472,8 @@ class Copula(ABC):
             Kendall's tau.

         """
-        pass
+        x = self.rvs(nobs, random_state=random_state)
+        return stats.kendalltau(x[:, 0], x[:, 1])[0]

     def fit_corr_param(self, data):
         """Copula correlation parameter using Kendall's tau of sample data.
@@ -402,7 +490,16 @@ class Copula(ABC):
             pearson correlation in elliptical.
             If k_dim > 2, then average tau is used.
         """
-        pass
+        x = np.asarray(data)
+
+        if x.shape[1] == 2:
+            tau = stats.kendalltau(x[:, 0], x[:, 1])[0]
+        else:
+            k = self.k_dim
+            taus = [stats.kendalltau(x[..., i], x[..., j])[0]
+                    for i in range(k) for j in range(i+1, k)]
+            tau = np.mean(taus)
+        return self._arg_from_tau(tau)

     def _arg_from_tau(self, tau):
         """Compute correlation parameter from tau.
@@ -419,4 +516,4 @@ class Copula(ABC):
             pearson correlation in elliptical.

         """
-        pass
+        raise NotImplementedError
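
Example (not part of the patch): `CopulaDistribution` couples a copula with per-margin scipy distributions; `rvs` maps copula draws through the marginal ppf, and `logpdf` adds the marginal log-densities to the copula log-density. A minimal sketch with illustrative values:

import numpy as np
from scipy import stats
from statsmodels.distributions.copula.api import (
    CopulaDistribution, GumbelCopula)

joint = CopulaDistribution(GumbelCopula(theta=2.0), [stats.norm, stats.norm])

sample = joint.rvs(nobs=500, random_state=123)   # (500, 2) on the original scale
point = np.array([0.5, -0.2])
print(joint.cdf(point))     # copula evaluated at the marginal cdf values
print(joint.logpdf(point))  # marginal log-pdfs plus copula log-pdf
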
diff --git a/statsmodels/distributions/copula/depfunc_ev.py b/statsmodels/distributions/copula/depfunc_ev.py
index 3bc766d3c..928a850c3 100644
--- a/statsmodels/distributions/copula/depfunc_ev.py
+++ b/statsmodels/distributions/copula/depfunc_ev.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """ Pickand's dependence functions as generators for EV-copulas


@@ -7,6 +8,7 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from scipy import stats
 from statsmodels.tools.numdiff import _approx_fprime_cs_scalar, approx_hess
@@ -17,53 +19,133 @@ class PickandDependence:
     def __call__(self, *args, **kwargs):
         return self.evaluate(*args, **kwargs)

+    def evaluate(self, t, *args):
+        raise NotImplementedError
+
     def deriv(self, t, *args):
         """First derivative of the dependence function

         implemented through numerical differentiation
         """
-        pass
+        t = np.atleast_1d(t)
+        return _approx_fprime_cs_scalar(t, self.evaluate)

     def deriv2(self, t, *args):
         """Second derivative of the dependence function

         implemented through numerical differentiation
         """
-        pass
+        if np.size(t) == 1:
+            d2 = approx_hess([t], self.evaluate, args=args)[0]
+        else:
+            d2 = np.array([approx_hess([ti], self.evaluate, args=args)[0, 0]
+                           for ti in t])
+        return d2


 class AsymLogistic(PickandDependence):
-    """asymmetric logistic model of Tawn 1988
+    '''asymmetric logistic model of Tawn 1988

     special case: a1=a2=1 : Gumbel

     restrictions:
      - theta in (0,1]
      - a1, a2 in [0,1]
-    """
+    '''
     k_args = 3

+    def _check_args(self, a1, a2, theta):
+        condth = (theta > 0) and (theta <= 1)
+        conda1 = (a1 >= 0) and (a1 <= 1)
+        conda2 = (a2 >= 0) and (a2 <= 1)
+        return condth and conda1 and conda2
+
+    def evaluate(self, t, a1, a2, theta):
+
+        # if not np.all(_check_args(a1, a2, theta)):
+        #    raise ValueError('invalid args')
+
+        transf = (1 - a2) * (1-t)
+        transf += (1 - a1) * t
+        transf += ((a1 * t)**(1./theta) + (a2 * (1-t))**(1./theta))**theta
+
+        return transf
+
+    def deriv(self, t, a1, a2, theta):
+        b = theta
+
+        d1 = ((a1 * (a1 * t)**(1/b - 1) - a2 * (a2 * (1 - t))**(1/b - 1)) *
+              ((a1 * t)**(1/b) + (a2 * (1 - t))**(1/b))**(b - 1) - a1 + a2)
+        return d1
+
+    def deriv2(self, t, a1, a2, theta):
+        b = theta
+        d2 = ((1 - b) * (a1 * t)**(1/b) * (a2 * (1 - t))**(1/b) *
+              ((a1 * t)**(1/b) + (a2 * (1 - t))**(1/b))**(b - 2)
+              )/(b * (1 - t)**2 * t**2)
+        return d2
+

 transform_tawn = AsymLogistic()


 class AsymNegLogistic(PickandDependence):
-    """asymmetric negative logistic model of Joe 1990
+    '''asymmetric negative logistic model of Joe 1990

     special case:  a1=a2=1 : symmetric negative logistic of Galambos 1978

     restrictions:
      - theta in (0,inf)
      - a1, a2 in (0,1]
-    """
+    '''
     k_args = 3

+    def _check_args(self, a1, a2, theta):
+        condth = (theta > 0)
+        conda1 = (a1 > 0) and (a1 <= 1)
+        conda2 = (a2 > 0) and (a2 <= 1)
+        return condth and conda1 and conda2
+
+    def evaluate(self, t, a1, a2, theta):
+        # if not np.all(self._check_args(a1, a2, theta)):
+        #     raise ValueError('invalid args')
+
+        a1, a2 = a2, a1
+        transf = 1 - ((a1 * (1-t))**(-1./theta) +
+                      (a2 * t)**(-1./theta))**(-theta)
+        return transf
+
+    def deriv(self, t, a1, a2, theta):
+        a1, a2 = a2, a1
+        m1 = -1 / theta
+        m2 = m1 - 1
+
+        # (a1^(-1/θ) (1 - t)^(-1/θ - 1) - a2^(-1/θ) t^(-1/θ - 1))*
+        # (a1^(-1/θ) (1 - t)^(-1/θ) + (a2 t)^(-1/θ))^(-θ - 1)
+
+        d1 = (a1**m1 * (1 - t)**m2 - a2**m1 * t**m2) * (
+                (a1 * (1 - t))**m1 + (a2 * t)**m1)**(-theta - 1)
+        return d1
+
+    def deriv2(self, t, a1, a2, theta):
+        b = theta
+        a1, a2 = a2, a1
+        a1tp = (a1 * (1 - t))**(1/b)
+        a2tp = (a2 * t)**(1/b)
+        a1tn = (a1 * (1 - t))**(-1/b)
+        a2tn = (a2 * t)**(-1/b)
+
+        t1 = (b + 1) * a2tp * a1tp * (a1tn + a2tn)**(-b)
+        t2 = b * (1 - t)**2 * t**2 * (a1tp + a2tp)**2
+        d2 = t1 / t2
+        return d2
+

 transform_joe = AsymNegLogistic()


 class AsymMixed(PickandDependence):
-    """asymmetric mixed model of Tawn 1988
+    '''asymmetric mixed model of Tawn 1988

     special case:  k=0, theta in [0,1] : symmetric mixed model of
         Tiago de Oliveira 1980
@@ -73,50 +155,175 @@ class AsymMixed(PickandDependence):
      - theta + 3*k > 0
      - theta + k <= 1
      - theta + 2*k <= 1
-    """
+    '''
     k_args = 2

+    def _check_args(self, theta, k):
+        condth = (theta >= 0)
+        cond1 = (theta + 3*k > 0) and (theta + k <= 1) and (theta + 2*k <= 1)
+        return condth & cond1
+
+    def evaluate(self, t, theta, k):
+        transf = 1 - (theta + k) * t + theta * t*t + k * t**3
+        return transf
+
+    def deriv(self, t, theta, k):
+        d_dt = - (theta + k) + 2 * theta * t + 3 * k * t**2
+        return d_dt
+
+    def deriv2(self, t, theta, k):
+        d2_dt2 = 2 * theta + 6 * k * t
+        return d2_dt2

+
+# backwards compatibility for now
 transform_tawn2 = AsymMixed()


 class AsymBiLogistic(PickandDependence):
-    """bilogistic model of Coles and Tawn 1994, Joe, Smith and Weissman 1992
+    '''bilogistic model of Coles and Tawn 1994, Joe, Smith and Weissman 1992

     restrictions:
      - (beta, delta) in (0,1)^2 or
      - (beta, delta) in (-inf,0)^2

     not vectorized because of numerical integration
-    """
+    '''
     k_args = 2

+    def _check_args(self, beta, delta):
+        cond1 = (beta > 0) and (beta <= 1) and (delta > 0) and (delta <= 1)
+        cond2 = (beta < 0) and (delta < 0)
+        return cond1 | cond2
+
+    def evaluate(self, t, beta, delta):
+        # if not np.all(_check_args(beta, delta)):
+        #    raise ValueError('invalid args')
+
+        def _integrant(w):
+            term1 = (1 - beta) * np.power(w, -beta) * (1-t)
+            term2 = (1 - delta) * np.power(1-w, -delta) * t
+            return np.maximum(term1, term2)
+
+        from scipy.integrate import quad
+        transf = quad(_integrant, 0, 1)[0]
+        return transf
+

 transform_bilogistic = AsymBiLogistic()


 class HR(PickandDependence):
-    """model of Huesler Reiss 1989
+    '''model of Huesler Reiss 1989

     special case:  a1=a2=1 : symmetric negative logistic of Galambos 1978

     restrictions:
      - lambda in (0,inf)
-    """
+    '''
     k_args = 1

+    def _check_args(self, lamda):
+        cond = (lamda > 0)
+        return cond
+
+    def evaluate(self, t, lamda):
+        # if not np.all(self._check_args(lamda)):
+        #    raise ValueError('invalid args')
+
+        term = np.log((1. - t) / t) * 0.5 / lamda
+
+        from scipy.stats import norm
+        # use special if I want to avoid stats import
+        transf = ((1 - t) * norm._cdf(lamda + term) +
+                  t * norm._cdf(lamda - term))
+        return transf
+
+    def _derivs(self, t, lamda, order=(1, 2)):
+        if not isinstance(order, (int, np.integer)):
+            if (1 in order) and (2 in order):
+                order = -1
+            else:
+                raise ValueError("order should be 1, 2, or (1,2)")
+
+        dn = 1 / np.sqrt(2 * np.pi)
+        a = lamda
+        g = np.log((1. - t) / t) * 0.5 / a
+        gd1 = 1 / (2 * a * (t - 1) * t)
+        gd2 = (0.5 - t) / (a * ((1 - t) * t)**2)
+        # f = stats.norm.cdf(t)
+        # fd1 = np.exp(-t**2 / 2) / sqrt(2 * np.pi)  # stats.norm.pdf(t)
+        # fd2 = fd1 * t
+        tp = a + g
+        fp = stats.norm.cdf(tp)
+        fd1p = np.exp(-tp**2 / 2) * dn  # stats.norm.pdf(t)
+        fd2p = -fd1p * tp
+        tn = a - g
+        fn = stats.norm.cdf(tn)
+        fd1n = np.exp(-tn**2 / 2) * dn  # stats.norm.pdf(t)
+        fd2n = -fd1n * tn
+
+        if order in (1, -1):
+            # d1 = g'(t) (-t f'(a - g(t)) - (t - 1) f'(a + g(t))) + f(a - g(t))
+            #      - f(a + g(t))
+            d1 = gd1 * (-t * fd1n - (t - 1) * fd1p) + fn - fp
+        if order in (2, -1):
+            # d2 = g'(t)^2 (t f''(a - g(t)) - (t - 1) f''(a + g(t))) +
+            #     (-(t - 1) g''(t) - 2 g'(t)) f'(a + g(t)) -
+            #     (t g''(t) + 2 g'(t)) f'(a - g(t))
+            d2 = (gd1**2 * (t * fd2n - (t - 1) * fd2p) +
+                  (-(t - 1) * gd2 - 2 * gd1) * fd1p -
+                  (t * gd2 + 2 * gd1) * fd1n
+                  )
+
+        if order == 1:
+            return d1
+        elif order == 2:
+            return d2
+        elif order == -1:
+            return (d1, d2)
+
+    def deriv(self, t, lamda):
+        return self._derivs(t, lamda, 1)
+
+    def deriv2(self, t, lamda):
+        return self._derivs(t, lamda, 2)
+

 transform_hr = HR()


+# def transform_tev(t, rho, df):
 class TEV(PickandDependence):
-    """t-EV model of Demarta and McNeil 2005
+    '''t-EV model of Demarta and McNeil 2005

     restrictions:
      - rho in (-1,1)
      - x > 0
-    """
+    '''
     k_args = 2

+    def _check_args(self, rho, df):
+        x = df  # alias, Genest and Segers use chi, copula package uses df
+        cond1 = (x > 0)
+        cond2 = (rho > 0) and (rho < 1)
+        return cond1 and cond2
+
+    def evaluate(self, t, rho, df):
+        x = df  # alias, Genest and Segers use chi, copula package uses df
+        # if not np.all(self, _check_args(rho, x)):
+        #    raise ValueError('invalid args')
+
+        from scipy.stats import t as stats_t
+        # use special if I want to avoid stats import
+
+        term1 = (np.power(t/(1.-t), 1./x) - rho)  # for t
+        term2 = (np.power((1.-t)/t, 1./x) - rho)  # for 1-t
+        term0 = np.sqrt(1. + x) / np.sqrt(1 - rho*rho)
+        z1 = term0 * term1
+        z2 = term0 * term2
+        transf = t * stats_t._cdf(z1, x+1) + (1 - t) * stats_t._cdf(z2, x+1)
+        return transf
+

 transform_tev = TEV()
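
Example (not part of the patch): a Pickand dependence function A(t) satisfies max(t, 1 - t) <= A(t) <= 1 on [0, 1]; the module-level instances can be called directly, and the analytic derivatives above override the numerical fallbacks. Parameter values below are illustrative.

import numpy as np
from statsmodels.distributions.copula.depfunc_ev import (
    transform_hr, transform_tawn)

t = np.array([0.25, 0.5, 0.75])
a1, a2, theta = 0.8, 0.9, 0.5                   # valid asymmetric logistic parameters
print(transform_tawn(t, a1, a2, theta))         # __call__ dispatches to evaluate
print(transform_tawn.deriv(t, a1, a2, theta))   # analytic first derivative
print(transform_hr(t, 1.0))                     # Huesler-Reiss with lambda = 1
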
diff --git a/statsmodels/distributions/copula/elliptical.py b/statsmodels/distributions/copula/elliptical.py
index b455f5a29..fe012dfb8 100644
--- a/statsmodels/distributions/copula/elliptical.py
+++ b/statsmodels/distributions/copula/elliptical.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri Jan 29 19:19:45 2021

@@ -8,7 +9,9 @@ License: BSD-3
 """
 import numpy as np
 from scipy import stats
+# scipy compat:
 from statsmodels.compat.scipy import multivariate_t
+
 from statsmodels.distributions.copula.copulas import Copula


@@ -28,6 +31,30 @@ class EllipticalCopula(Copula):
     copulas.

     """
+    def _handle_args(self, args):
+        if args != () and args is not None:
+            msg = ("Methods in elliptical copulas use copula parameters in"
+                   " attributes. `arg` in the method is ignored")
+            raise ValueError(msg)
+        else:
+            return args
+
+    def rvs(self, nobs=1, args=(), random_state=None):
+        self._handle_args(args)
+        x = self.distr_mv.rvs(size=nobs, random_state=random_state)
+        return self.distr_uv.cdf(x)
+
+    def pdf(self, u, args=()):
+        self._handle_args(args)
+        ppf = self.distr_uv.ppf(u)
+        mv_pdf_ppf = self.distr_mv.pdf(ppf)
+
+        return mv_pdf_ppf / np.prod(self.distr_uv.pdf(ppf), axis=-1)
+
+    def cdf(self, u, args=()):
+        self._handle_args(args)
+        ppf = self.distr_uv.ppf(u)
+        return self.distr_mv.cdf(ppf)

     def tau(self, corr=None):
         """Bivariate kendall's tau based on correlation coefficient.
@@ -43,7 +70,12 @@ class EllipticalCopula(Copula):
         Kendall's tau that corresponds to pearson correlation in the
         elliptical copula.
         """
-        pass
+        if corr is None:
+            corr = self.corr
+        if corr.shape == (2, 2):
+            corr = corr[0, 1]
+        rho = 2 * np.arcsin(corr) / np.pi
+        return rho

     def corr_from_tau(self, tau):
         """Pearson correlation from kendall's tau.
@@ -58,7 +90,8 @@ class EllipticalCopula(Copula):
         Pearson correlation coefficient for given tau in elliptical
         copula. This can be used as parameter for an elliptical copula.
         """
-        pass
+        corr = np.sin(tau * np.pi / 2)
+        return corr

     def fit_corr_param(self, data):
         """Copula correlation parameter using Kendall's tau of sample data.
@@ -75,26 +108,38 @@ class EllipticalCopula(Copula):
             pearson correlation in elliptical.
             If k_dim > 2, then average tau is used.
         """
-        pass
+        x = np.asarray(data)
+
+        if x.shape[1] == 2:
+            tau = stats.kendalltau(x[:, 0], x[:, 1])[0]
+        else:
+            k = self.k_dim
+            tau = np.eye(k)
+            for i in range(k):
+                for j in range(i+1, k):
+                    tau_ij = stats.kendalltau(x[..., i], x[..., j])[0]
+                    tau[i, j] = tau[j, i] = tau_ij
+
+        return self._arg_from_tau(tau)


 class GaussianCopula(EllipticalCopula):
-    """Gaussian copula.
+    r"""Gaussian copula.

     It is constructed from a multivariate normal distribution over
-    :math:`\\mathbb{R}^d` by using the probability integral transform.
+    :math:`\mathbb{R}^d` by using the probability integral transform.

-    For a given correlation matrix :math:`R \\in[-1, 1]^{d \\times d}`,
+    For a given correlation matrix :math:`R \in[-1, 1]^{d \times d}`,
     the Gaussian copula with parameter matrix :math:`R` can be written
     as:

     .. math::

-        C_R^{\\text{Gauss}}(u) = \\Phi_R\\left(\\Phi^{-1}(u_1),\\dots,
-        \\Phi^{-1}(u_d) \\right),
+        C_R^{\text{Gauss}}(u) = \Phi_R\left(\Phi^{-1}(u_1),\dots,
+        \Phi^{-1}(u_d) \right),

-    where :math:`\\Phi^{-1}` is the inverse cumulative distribution function
-    of a standard normal and :math:`\\Phi_R` is the joint cumulative
+    where :math:`\Phi^{-1}` is the inverse cumulative distribution function
+    of a standard normal and :math:`\Phi_R` is the joint cumulative
     distribution function of a multivariate normal distribution with mean
     vector zero and covariance matrix equal to the correlation
     matrix :math:`R`.
@@ -136,12 +181,13 @@ class GaussianCopula(EllipticalCopula):
         if corr is None:
             corr = np.eye(k_dim)
         elif k_dim == 2 and np.size(corr) == 1:
-            corr = np.array([[1.0, corr], [corr, 1.0]])
+            corr = np.array([[1., corr], [corr, 1.]])
+
         self.corr = np.asarray(corr)
-        self.args = self.corr,
+        self.args = (self.corr,)
         self.distr_uv = stats.norm
-        self.distr_mv = stats.multivariate_normal(cov=corr, allow_singular=
-            allow_singular)
+        self.distr_mv = stats.multivariate_normal(
+            cov=corr, allow_singular=allow_singular)

     def dependence_tail(self, corr=None):
         """
@@ -160,7 +206,12 @@ class GaussianCopula(EllipticalCopula):
         Lower and upper tail dependence coefficients of the copula with given
         Pearson correlation coefficient.
         """
-        pass
+
+        return 0, 0
+
+    def _arg_from_tau(self, tau):
+        # for generic compat
+        return self.corr_from_tau(tau)


 class StudentTCopula(EllipticalCopula):
@@ -198,13 +249,21 @@ class StudentTCopula(EllipticalCopula):
         if corr is None:
             corr = np.eye(k_dim)
         elif k_dim == 2 and np.size(corr) == 1:
-            corr = np.array([[1.0, corr], [corr, 1.0]])
+            corr = np.array([[1., corr], [corr, 1.]])
+
         self.df = df
         self.corr = np.asarray(corr)
-        self.args = corr, df
+        self.args = (corr, df)
+        # both uv and mv are frozen distributions
         self.distr_uv = stats.t(df=df)
         self.distr_mv = multivariate_t(shape=corr, df=df)

+    def cdf(self, u, args=()):
+        raise NotImplementedError("CDF not available in closed form.")
+        # ppf = self.distr_uv.ppf(u)
+        # mvt = MVT([0, 0], self.corr, self.df)
+        # return mvt.cdf(ppf)
+
     def spearmans_rho(self, corr=None):
         """
         Bivariate Spearman's rho based on correlation coefficient.
@@ -222,7 +281,13 @@ class StudentTCopula(EllipticalCopula):
         Spearman's rho that corresponds to pearson correlation in the
         elliptical copula.
         """
-        pass
+        if corr is None:
+            corr = self.corr
+        if corr.shape == (2, 2):
+            corr = corr[0, 1]
+
+        tau = 6 * np.arcsin(corr / 2) / np.pi
+        return tau

     def dependence_tail(self, corr=None):
         """
@@ -241,4 +306,18 @@ class StudentTCopula(EllipticalCopula):
         Lower and upper tail dependence coefficients of the copula with given
         Pearson correlation coefficient.
         """
-        pass
+        if corr is None:
+            corr = self.corr
+        if corr.shape == (2, 2):
+            corr = corr[0, 1]
+
+        df = self.df
+        t = - np.sqrt((df + 1) * (1 - corr) / (1 + corr))
+        # Note self.distr_uv is frozen, df cannot change, use stats.t instead
+        lam = 2 * stats.t.cdf(t, df + 1)
+        return lam, lam
+
+    def _arg_from_tau(self, tau):
+        # for generic compat
+        # this does not provide an estimate of df
+        return self.corr_from_tau(tau)
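
Example (not part of the patch): for elliptical copulas, Kendall's tau and the Pearson parameter are linked by tau = 2 * arcsin(rho) / pi, which the round trip below reproduces; the Student t copula has symmetric, strictly positive tail dependence while the Gaussian copula has none. Parameter values are illustrative.

import numpy as np
from statsmodels.distributions.copula.api import GaussianCopula, StudentTCopula

gc = GaussianCopula(corr=0.5)
tau = gc.tau()                       # 2 * arcsin(0.5) / pi = 1/3
print(tau, gc.corr_from_tau(tau))    # recovers 0.5
print(gc.dependence_tail())          # (0, 0)

tc = StudentTCopula(corr=0.5, df=4)
print(tc.spearmans_rho())            # 6 * arcsin(corr / 2) / pi
print(tc.dependence_tail())          # equal lower and upper tail coefficients
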
diff --git a/statsmodels/distributions/copula/extreme_value.py b/statsmodels/distributions/copula/extreme_value.py
index 3de573445..b1d2a9365 100644
--- a/statsmodels/distributions/copula/extreme_value.py
+++ b/statsmodels/distributions/copula/extreme_value.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """ Extreme Value Copulas
 Created on Fri Jan 29 19:19:45 2021

@@ -5,14 +6,16 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from .copulas import Copula


 def copula_bv_ev(u, transform, args=()):
-    """generic bivariate extreme value copula
-    """
-    pass
+    '''generic bivariate extreme value copula
+    '''
+    u, v = u
+    return np.exp(np.log(u * v) * (transform(np.log(u)/np.log(u*v), *args)))


 class ExtremeValueCopula(Copula):
@@ -54,7 +57,19 @@ class ExtremeValueCopula(Copula):
         self.k_args = transform.k_args
         self.args = args
         if k_dim != 2:
-            raise ValueError('Only bivariate EV copulas are available.')
+            raise ValueError("Only bivariate EV copulas are available.")
+
+    def _handle_args(self, args):
+        # TODO: how do we handle non-tuple args? Do we allow single values?
+        # Model fit might give an args that can be empty
+        if isinstance(args, np.ndarray):
+            args = tuple(args)  # handles empty arrays, unpacks otherwise
+        if args == () or args is None:
+            args = self.args
+        if not isinstance(args, tuple):
+            args = (args,)
+
+        return args

     def cdf(self, u, args=()):
         """Evaluate cdf of bivariate extreme value copula.
@@ -74,7 +89,12 @@ class ExtremeValueCopula(Copula):
         -------
         CDF values at evaluation points.
         """
-        pass
+        # currently only Bivariate
+        u, v = np.asarray(u).T
+        args = self._handle_args(args)
+        cdfv = np.exp(np.log(u * v) *
+                      self.transform(np.log(u)/np.log(u*v), *args))
+        return cdfv

     def pdf(self, u, args=()):
         """Evaluate pdf of bivariate extreme value copula.
@@ -94,7 +114,20 @@ class ExtremeValueCopula(Copula):
         -------
         PDF values at evaluation points.
         """
-        pass
+        tr = self.transform
+        u1, u2 = np.asarray(u).T
+        args = self._handle_args(args)
+
+        log_u12 = np.log(u1 * u2)
+        t = np.log(u1) / log_u12
+        cdf = self.cdf(u, args)
+        dep = tr(t, *args)
+        d1 = tr.deriv(t, *args)
+        d2 = tr.deriv2(t, *args)
+        pdf_ = cdf / (u1 * u2) * ((dep + (1 - t) * d1) * (dep - t * d1) -
+                                  d2 * (1 - t) * t / log_u12)
+
+        return pdf_

     def logpdf(self, u, args=()):
         """Evaluate log-pdf of bivariate extreme value copula.
@@ -114,7 +147,7 @@ class ExtremeValueCopula(Copula):
         -------
         Log-pdf values at evaluation points.
         """
-        pass
+        return np.log(self.pdf(u, args=args))

     def conditional_2g1(self, u, args=()):
         """conditional distribution
@@ -125,4 +158,7 @@ class ExtremeValueCopula(Copula):

         where t = np.log(v)/np.log(u*v)
         """
-        pass
+        raise NotImplementedError
+
+    def fit_corr_param(self, data):
+        raise NotImplementedError
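
Example (not part of the patch): an extreme value copula is parameterized by a Pickand dependence function; pairing the class above with one of the transforms from depfunc_ev gives a working bivariate copula. The constructor call assumes the signature (transform, args=(), k_dim=2) suggested by the __init__ body shown in this hunk, and the parameters are illustrative.

import numpy as np
from statsmodels.distributions.copula.api import ExtremeValueCopula, depfunc_ev

ev = ExtremeValueCopula(depfunc_ev.transform_tawn, args=(0.8, 0.9, 0.5))
u = np.array([[0.3, 0.7], [0.6, 0.2]])
print(ev.cdf(u))   # exp(log(u*v) * A(log(u) / log(u*v)))
print(ev.pdf(u))   # uses A, A' and A'' from the dependence function
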
diff --git a/statsmodels/distributions/copula/other_copulas.py b/statsmodels/distributions/copula/other_copulas.py
index ed0d6080e..4435a3b1c 100644
--- a/statsmodels/distributions/copula/other_copulas.py
+++ b/statsmodels/distributions/copula/other_copulas.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri Jan 29 19:19:45 2021

@@ -7,6 +8,7 @@ License: BSD-3
 """
 import numpy as np
 from scipy import stats
+
 from statsmodels.tools.rng_qrng import check_random_state
 from statsmodels.distributions.copula.copulas import Copula

@@ -18,7 +20,7 @@ class IndependenceCopula(Copula):

     .. math::

-        C_ heta(u,v) = uv
+        C_\theta(u,v) = uv

     Parameters
     ----------
@@ -33,10 +35,35 @@ class IndependenceCopula(Copula):
     copulas.

     """
-
     def __init__(self, k_dim=2):
         super().__init__(k_dim=k_dim)

+    def _handle_args(self, args):
+        if args != () and args is not None:
+            msg = ("Independence copula does not use copula parameters.")
+            raise ValueError(msg)
+        else:
+            return args
+
+    def rvs(self, nobs=1, args=(), random_state=None):
+        self._handle_args(args)
+        rng = check_random_state(random_state)
+        x = rng.random((nobs, self.k_dim))
+        return x
+
+    def pdf(self, u, args=()):
+        u = np.asarray(u)
+        return np.ones(u.shape[:-1])
+
+    def cdf(self, u, args=()):
+        return np.prod(u, axis=-1)
+
+    def tau(self):
+        return 0
+
+    def plot_pdf(self, *args):
+        raise NotImplementedError("PDF is constant over the domain.")
+

 def rvs_kernel(sample, size, bw=1, k_func=None, return_extras=False):
     """Random sampling from empirical copula using Beta distribution
@@ -71,4 +98,26 @@ def rvs_kernel(sample, size, bw=1, k_func=None, return_extras=False):
     -----
     Status: experimental, API will change.
     """
-    pass
+    # vectorized for observations
+    n = sample.shape[0]
+    if k_func is None:
+        kfunc = _kernel_rvs_beta1
+    else:
+        kfunc = k_func
+    idx = np.random.randint(0, n, size=size)
+    xi = sample[idx]
+    krvs = np.column_stack([kfunc(xii, bw) for xii in xi.T])
+
+    if return_extras:
+        return krvs, idx, xi
+    else:
+        return krvs
+
+
+def _kernel_rvs_beta(x, bw):
+    # Beta kernel for density, pdf, estimation
+    return stats.beta.rvs(x / bw + 1, (1 - x) / bw + 1, size=x.shape)
+
+
+def _kernel_rvs_beta1(x, bw):
+    # Beta kernel for density, pdf, estimation
+    # Kiriliouk, Segers, Tsukahara 2020 arxiv, using bandwidth 1/nobs of the sample
+    return stats.beta.rvs(x / bw, (1 - x) / bw + 1)
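
Example (not part of the patch): the independence copula has cdf equal to the product of its arguments and a flat density; `rvs_kernel` resamples an existing copula sample through a beta kernel. A short sketch; the bandwidth bw=0.01 is an illustrative choice on the order of 1/nobs.

import numpy as np
from statsmodels.distributions.copula.api import (
    GaussianCopula, IndependenceCopula, rvs_kernel)

ic = IndependenceCopula()
u = np.array([[0.3, 0.7], [0.6, 0.2]])
print(ic.cdf(u))    # [0.21, 0.12]
print(ic.pdf(u))    # [1., 1.]

sample = GaussianCopula(corr=0.7).rvs(nobs=200, random_state=123)
new = rvs_kernel(sample, size=50, bw=0.01)
print(new.shape)    # (50, 2)
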
diff --git a/statsmodels/distributions/copula/transforms.py b/statsmodels/distributions/copula/transforms.py
index d25c063a7..9ccfcc4c9 100644
--- a/statsmodels/distributions/copula/transforms.py
+++ b/statsmodels/distributions/copula/transforms.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """ Transformation Classes as generators for Archimedean copulas


@@ -8,6 +9,7 @@ License: BSD-3

 """
 import warnings
+
 import numpy as np
 from scipy.special import expm1, gamma

@@ -17,23 +19,198 @@ class Transforms:
     def __init__(self):
         pass

+    def deriv2_inverse(self, phi, args):
+        t = self.inverse(phi, args)
+        phi_d1 = self.deriv(t, args)
+        phi_d2 = self.deriv2(t, args)
+        return np.abs(phi_d2 / phi_d1**3)
+
+    def derivk_inverse(self, k, phi, theta):
+        raise NotImplementedError("not yet implemented")
+

 class TransfFrank(Transforms):
-    pass
+
+    def evaluate(self, t, theta):
+        t = np.asarray(t)
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", RuntimeWarning)
+            val = -(np.log(-expm1(-theta*t)) - np.log(-expm1(-theta)))
+        return val
+        # return - np.log(expm1(-theta*t) / expm1(-theta))
+
+    def inverse(self, phi, theta):
+        phi = np.asarray(phi)
+        return -np.log1p(np.exp(-phi) * expm1(-theta)) / theta
+
+    def deriv(self, t, theta):
+        t = np.asarray(t)
+        tmp = np.exp(-t*theta)
+        return -theta * tmp/(tmp - 1)
+
+    def deriv2(self, t, theta):
+        t = np.asarray(t)
+        tmp = np.exp(theta * t)
+        d2 = - theta**2 * tmp / (tmp - 1)**2
+        return d2
+
+    def deriv2_inverse(self, phi, theta):
+
+        et = np.exp(theta)
+        ept = np.exp(phi + theta)
+        d2 = (et - 1) * ept / (theta * (ept - et + 1)**2)
+        return d2
+
+    def deriv3_inverse(self, phi, theta):
+        et = np.exp(theta)
+        ept = np.exp(phi + theta)
+        d3 = -(((et - 1) * ept * (ept + et - 1)) /
+               (theta * (ept - et + 1)**3))
+        return d3
+
+    def deriv4_inverse(self, phi, theta):
+        et = np.exp(theta)
+        ept = np.exp(phi + theta)
+        p = phi
+        b = theta
+        d4 = ((et - 1) * ept *
+              (-4 * ept + np.exp(2 * (p + b)) + 4 * np.exp(p + 2 * b) -
+               2 * et + np.exp(2 * b) + 1)
+              ) / (b * (ept - et + 1)**4)
+
+        return d4
+
+    def is_completly_monotonic(self, theta):
+        # range of theta for which it is a copula for d > 2 (more than 2 rvs)
+        return (theta > 0) & (theta < 1)


 class TransfClayton(Transforms):
-    pass
+
+    def _checkargs(self, theta):
+        return theta > 0
+
+    def evaluate(self, t, theta):
+        return np.power(t, -theta) - 1.
+
+    def inverse(self, phi, theta):
+        return np.power(1 + phi, -1/theta)
+
+    def deriv(self, t, theta):
+        return -theta * np.power(t, -theta-1)
+
+    def deriv2(self, t, theta):
+        return theta * (theta + 1) * np.power(t, -theta-2)
+
+    def deriv_inverse(self, phi, theta):
+        return -(1 + phi)**(-(theta + 1) / theta) / theta
+
+    def deriv2_inverse(self, phi, theta):
+        return ((theta + 1) * (1 + phi)**(-1 / theta - 2)) / theta**2
+
+    def deriv3_inverse(self, phi, theta):
+        th = theta  # shorthand
+        d3 = -((1 + th) * (1 + 2 * th) / th**3 * (1 + phi)**(-1 / th - 3))
+        return d3
+
+    def deriv4_inverse(self, phi, theta):
+        th = theta  # shorthand
+        d4 = ((1 + th) * (1 + 2 * th) * (1 + 3 * th) / th**4
+              ) * (1 + phi)**(-1 / th - 4)
+        return d4
+
+    def derivk_inverse(self, k, phi, theta):
+        thi = 1 / theta  # shorthand
+        d4 = (-1)**k * gamma(k + thi) / gamma(thi) * (1 + phi)**(-(k + thi))
+        return d4
+
+    def is_completly_monotonic(self, theta):
+        return theta > 0


 class TransfGumbel(Transforms):
-    """
+    '''
     requires theta >=1
-    """
+    '''
+
+    def _checkargs(self, theta):
+        return theta >= 1
+
+    def evaluate(self, t, theta):
+        return np.power(-np.log(t), theta)
+
+    def inverse(self, phi, theta):
+        return np.exp(-np.power(phi, 1. / theta))
+
+    def deriv(self, t, theta):
+        return - theta * (-np.log(t))**(theta - 1) / t
+
+    def deriv2(self, t, theta):
+        tmp1 = np.log(t)
+        d2 = (theta*(-1)**(1 + theta) * tmp1**(theta-1) * (1 - theta) +
+              theta*(-1)**(1 + theta)*tmp1**theta)/(t**2*tmp1)
+        # d2 = (theta * tmp1**(-1 + theta) * (1 - theta) + theta * tmp1**theta
+        #       ) / (t**2 * tmp1)
+
+        return d2
+
+    def deriv2_inverse(self, phi, theta):
+        th = theta  # shorthand
+        d2 = ((phi**(2 / th) + (th - 1) * phi**(1 / th))) / (phi**2 * th**2)
+        d2 *= np.exp(-phi**(1 / th))
+        return d2
+
+    def deriv3_inverse(self, phi, theta):
+        p = phi  # shorthand
+        b = theta
+        d3 = (-p**(3 / b) + (3 - 3 * b) * p**(2 / b) +
+              ((3 - 2 * b) * b - 1) * p**(1 / b)
+              ) / (p * b)**3
+        d3 *= np.exp(-p**(1 / b))
+        return d3
+
+    def deriv4_inverse(self, phi, theta):
+        p = phi  # shorthand
+        b = theta
+        d4 = (((6 * b**3 - 11 * b**2 + 6. * b - 1) * p**(1 / b) +
+               (11 * b**2 - 18 * b + 7) * p**(2 / b) +
+               (6 * (b - 1)) * p**(3 / b) +
+               p**(4 / b))
+              )/(p * b)**4
+
+        d4 *= np.exp(-p**(1 / b))
+        return d4
+
+    def is_completly_monotonic(self, theta):
+        return theta > 1


 class TransfIndep(Transforms):
-    pass
+
+    def evaluate(self, t, *args):
+        t = np.asarray(t)
+        return -np.log(t)
+
+    def inverse(self, phi, *args):
+        phi = np.asarray(phi)
+        return np.exp(-phi)
+
+    def deriv(self, t, *args):
+        t = np.asarray(t)
+        return - 1./t
+
+    def deriv2(self, t, *args):
+        t = np.asarray(t)
+        return 1. / t**2
+
+    def deriv2_inverse(self, phi, *args):
+        return np.exp(-phi)
+
+    def deriv3_inverse(self, phi, *args):
+        return -np.exp(-phi)
+
+    def deriv4_inverse(self, phi, *args):
+        return np.exp(-phi)


 class _TransfPower(Transforms):
@@ -46,3 +223,17 @@ class _TransfPower(Transforms):

     def __init__(self, transform):
         self.transform = transform
+
+    def evaluate(self, t, alpha, beta, *tr_args):
+        t = np.asarray(t)
+
+        phi = np.power(self.transform.evaluate(np.power(t, alpha), *tr_args),
+                       beta)
+        return phi
+
+    def inverse(self, phi, alpha, beta, *tr_args):
+        phi = np.asarray(phi)
+        transf = self.transform
+        phi_inv = np.power(transf.evaluate(np.power(phi, 1. / beta), *tr_args),
+                           1. / alpha)
+        return phi_inv
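
The Archimedean generators implemented above satisfy inverse(evaluate(t, theta), theta) == t, and the closed-form derivatives can be cross-checked against finite differences. A minimal editor's sketch for the Clayton generator, not part of the patch:

    import numpy as np
    from statsmodels.distributions.copula.transforms import TransfClayton

    tr = TransfClayton()
    theta = 1.5
    t = np.linspace(0.05, 0.95, 5)

    # generator and inverse generator are mutual inverses
    print(np.allclose(tr.inverse(tr.evaluate(t, theta), theta), t))   # True

    # analytic first derivative vs. a central finite difference
    h = 1e-6
    num_d1 = (tr.evaluate(t + h, theta) - tr.evaluate(t - h, theta)) / (2 * h)
    print(np.allclose(tr.deriv(t, theta), num_d1, rtol=1e-5))         # True
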
diff --git a/statsmodels/distributions/discrete.py b/statsmodels/distributions/discrete.py
index 26e3ef62c..84a941c9e 100644
--- a/statsmodels/distributions/discrete.py
+++ b/statsmodels/distributions/discrete.py
@@ -1,62 +1,226 @@
 import numpy as np
+
 from scipy.stats import rv_discrete, poisson, nbinom
 from scipy.special import gammaln
 from scipy._lib._util import _lazywhere
+
 from statsmodels.base.model import GenericLikelihoodModel


 class genpoisson_p_gen(rv_discrete):
-    """Generalized Poisson distribution
-    """
+    '''Generalized Poisson distribution
+    '''
+    def _argcheck(self, mu, alpha, p):
+        return (mu >= 0) & (alpha==alpha) & (p > 0)
+
+    def _logpmf(self, x, mu, alpha, p):
+        mu_p = mu ** (p - 1.)
+        a1 = np.maximum(np.nextafter(0, 1), 1 + alpha * mu_p)
+        a2 = np.maximum(np.nextafter(0, 1), mu + (a1 - 1.) * x)
+        logpmf_ = np.log(mu) + (x - 1.) * np.log(a2)
+        logpmf_ -=  x * np.log(a1) + gammaln(x + 1.) + a2 / a1
+        return logpmf_
+
+    def _pmf(self, x, mu, alpha, p):
+        return np.exp(self._logpmf(x, mu, alpha, p))
+
+    def mean(self, mu, alpha, p):
+        return mu

+    def var(self, mu, alpha, p):
+        dispersion_factor = (1 + alpha * mu**(p - 1))**2
+        var = dispersion_factor * mu
+        return var

-genpoisson_p = genpoisson_p_gen(name='genpoisson_p', longname=
-    'Generalized Poisson')
+
+genpoisson_p = genpoisson_p_gen(name='genpoisson_p',
+                                longname='Generalized Poisson')


 class zipoisson_gen(rv_discrete):
-    """Zero Inflated Poisson distribution
-    """
+    '''Zero Inflated Poisson distribution
+    '''
+    def _argcheck(self, mu, w):
+        return (mu > 0) & (w >= 0) & (w<=1)
+
+    def _logpmf(self, x, mu, w):
+        return _lazywhere(x != 0, (x, mu, w),
+                          (lambda x, mu, w: np.log(1. - w) + x * np.log(mu) -
+                          gammaln(x + 1.) - mu),
+                          np.log(w + (1. - w) * np.exp(-mu)))
+
+    def _pmf(self, x, mu, w):
+        return np.exp(self._logpmf(x, mu, w))
+
+    def _cdf(self, x, mu, w):
+        # construct cdf from standard poisson's cdf and the w inflation of zero
+        return w + poisson(mu=mu).cdf(x) * (1 - w)

+    def _ppf(self, q, mu, w):
+        # translate and stretch q to remove the zero inflation
+        q_mod = (q - w) / (1 - w)
+        x = poisson(mu=mu).ppf(q_mod)
+        # set to zero if in the zi range
+        x[q < w] = 0
+        return x

-zipoisson = zipoisson_gen(name='zipoisson', longname='Zero Inflated Poisson')
+    def mean(self, mu, w):
+        return (1 - w) * mu

+    def var(self, mu, w):
+        dispersion_factor = 1 + w * mu
+        var = (dispersion_factor * self.mean(mu, w))
+        return var
+
+    def _moment(self, n, mu, w):
+        return (1 - w) * poisson.moment(n, mu)
+
+
+zipoisson = zipoisson_gen(name='zipoisson',
+                          longname='Zero Inflated Poisson')

 class zigeneralizedpoisson_gen(rv_discrete):
-    """Zero Inflated Generalized Poisson distribution
-    """
+    '''Zero Inflated Generalized Poisson distribution
+    '''
+    def _argcheck(self, mu, alpha, p, w):
+        return (mu > 0) & (w >= 0) & (w<=1)

+    def _logpmf(self, x, mu, alpha, p, w):
+        return _lazywhere(x != 0, (x, mu, alpha, p, w),
+                          (lambda x, mu, alpha, p, w: np.log(1. - w) +
+                          genpoisson_p.logpmf(x, mu, alpha, p)),
+                          np.log(w + (1. - w) *
+                          genpoisson_p.pmf(x, mu, alpha, p)))

-zigenpoisson = zigeneralizedpoisson_gen(name='zigenpoisson', longname=
-    'Zero Inflated Generalized Poisson')
+    def _pmf(self, x, mu, alpha, p, w):
+        return np.exp(self._logpmf(x, mu, alpha, p, w))

+    def mean(self, mu, alpha, p, w):
+        return (1 - w) * mu
+
+    def var(self, mu, alpha, p, w):
+        p = p - 1
+        dispersion_factor = (1 + alpha * mu ** p) ** 2 + w * mu
+        var = (dispersion_factor * self.mean(mu, alpha, p, w))
+        return var

-class zinegativebinomial_gen(rv_discrete):
-    """Zero Inflated Generalized Negative Binomial distribution
-    """

+zigenpoisson = zigeneralizedpoisson_gen(
+    name='zigenpoisson',
+    longname='Zero Inflated Generalized Poisson')

-zinegbin = zinegativebinomial_gen(name='zinegbin', longname=
-    'Zero Inflated Generalized Negative Binomial')
+
+class zinegativebinomial_gen(rv_discrete):
+    '''Zero Inflated Generalized Negative Binomial distribution
+    '''
+    def _argcheck(self, mu, alpha, p, w):
+        return (mu > 0) & (w >= 0) & (w<=1)
+
+    def _logpmf(self, x, mu, alpha, p, w):
+        s, p = self.convert_params(mu, alpha, p)
+        return _lazywhere(x != 0, (x, s, p, w),
+                          (lambda x, s, p, w: np.log(1. - w) +
+                          nbinom.logpmf(x, s, p)),
+                          np.log(w + (1. - w) *
+                          nbinom.pmf(x, s, p)))
+
+    def _pmf(self, x, mu, alpha, p, w):
+        return np.exp(self._logpmf(x, mu, alpha, p, w))
+
+    def _cdf(self, x, mu, alpha, p, w):
+        s, p = self.convert_params(mu, alpha, p)
+        # construct cdf from standard negative binomial cdf
+        # and the w inflation of zero
+        return w + nbinom.cdf(x, s, p) * (1 - w)
+
+    def _ppf(self, q, mu, alpha, p, w):
+        s, p = self.convert_params(mu, alpha, p)
+        # translate and stretch q to remove the zero inflation
+        q_mod = (q - w) / (1 - w)
+        x = nbinom.ppf(q_mod, s, p)
+        # set to zero if in the zi range
+        x[q < w] = 0
+        return x
+
+    def mean(self, mu, alpha, p, w):
+        return (1 - w) * mu
+
+    def var(self, mu, alpha, p, w):
+        dispersion_factor = 1 + alpha * mu ** (p - 1) + w * mu
+        var = (dispersion_factor * self.mean(mu, alpha, p, w))
+        return var
+
+    def _moment(self, n, mu, alpha, p, w):
+        s, p = self.convert_params(mu, alpha, p)
+        return (1 - w) * nbinom.moment(n, s, p)
+
+    def convert_params(self, mu, alpha, p):
+        size = 1. / alpha * mu**(2-p)
+        prob = size / (size + mu)
+        return (size, prob)
+
+zinegbin = zinegativebinomial_gen(name='zinegbin',
+    longname='Zero Inflated Generalized Negative Binomial')


 class truncatedpoisson_gen(rv_discrete):
-    """Truncated Poisson discrete random variable
-    """
+    '''Truncated Poisson discrete random variable
+    '''
+    # TODO: need cdf, and rvs

+    def _argcheck(self, mu, truncation):
+        # setting the support bound here does not work:
+        # a vector bound breaks some generic methods
+        # self.a = truncation + 1 # max(truncation + 1, 0)
+        return (mu >= 0) & (truncation >= -1)

-truncatedpoisson = truncatedpoisson_gen(name='truncatedpoisson', longname=
-    'Truncated Poisson')
+    def _get_support(self, mu, truncation):
+        return truncation + 1, self.b

+    def _logpmf(self, x, mu, truncation):
+        pmf = 0
+        for i in range(int(np.max(truncation)) + 1):
+            pmf += poisson.pmf(i, mu)
+
+        logpmf_ = poisson.logpmf(x, mu) - np.log(1 - pmf)
+        #logpmf_[x < truncation + 1] = - np.inf
+        return logpmf_
+
+    def _pmf(self, x, mu, truncation):
+        return np.exp(self._logpmf(x, mu, truncation))
+
+truncatedpoisson = truncatedpoisson_gen(name='truncatedpoisson',
+                                        longname='Truncated Poisson')

 class truncatednegbin_gen(rv_discrete):
-    """Truncated Generalized Negative Binomial (NB-P) discrete random variable
-    """
+    '''Truncated Generalized Negative Binomial (NB-P) discrete random variable
+    '''
+    def _argcheck(self, mu, alpha, p, truncation):
+        return (mu >= 0) & (truncation >= -1)
+
+    def _get_support(self, mu, alpha, p, truncation):
+        return truncation + 1, self.b
+
+    def _logpmf(self, x, mu, alpha, p, truncation):
+        size, prob = self.convert_params(mu, alpha, p)
+        pmf = 0
+        for i in range(int(np.max(truncation)) + 1):
+            pmf += nbinom.pmf(i, size, prob)

+        logpmf_ = nbinom.logpmf(x, size, prob) - np.log(1 - pmf)
+        # logpmf_[x < truncation + 1] = - np.inf
+        return logpmf_

-truncatednegbin = truncatednegbin_gen(name='truncatednegbin', longname=
-    'Truncated Generalized Negative Binomial')
+    def _pmf(self, x, mu, alpha, p, truncation):
+        return np.exp(self._logpmf(x, mu, alpha, p, truncation))

+    def convert_params(self, mu, alpha, p):
+        size = 1. / alpha * mu**(2-p)
+        prob = size / (size + mu)
+        return (size, prob)
+
+truncatednegbin = truncatednegbin_gen(name='truncatednegbin',
+    longname='Truncated Generalized Negative Binomial')

 class DiscretizedCount(rv_discrete):
     """Count distribution based on discretized distribution
@@ -106,25 +270,100 @@ class DiscretizedCount(rv_discrete):
     """

     def __new__(cls, *args, **kwds):
+        # rv_discrete.__new__ does not allow `kwds`, skip it
+        # only does dispatch to multinomial
         return super(rv_discrete, cls).__new__(cls)

     def __init__(self, distr, d_offset=0, add_scale=True, **kwds):
+        # kwds are extras in rv_discrete
         self.distr = distr
         self.d_offset = d_offset
         self._ctor_param = distr._ctor_param
         self.add_scale = add_scale
         if distr.shapes is not None:
-            self.k_shapes = len(distr.shapes.split(','))
+            self.k_shapes = len(distr.shapes.split(","))
             if add_scale:
-                kwds.update({'shapes': distr.shapes + ', s'})
+                kwds.update({"shapes": distr.shapes + ", s"})
                 self.k_shapes += 1
-        elif add_scale:
-            kwds.update({'shapes': 's'})
-            self.k_shapes = 1
         else:
-            self.k_shapes = 0
+            # no shape parameters in underlying distribution
+            if add_scale:
+                kwds.update({"shapes": "s"})
+                self.k_shapes = 1
+            else:
+                self.k_shapes = 0
+
         super().__init__(**kwds)

+    def _updated_ctor_param(self):
+        dic = super()._updated_ctor_param()
+        dic["distr"] = self.distr
+        return dic
+
+    def _unpack_args(self, args):
+        if self.add_scale:
+            scale = args[-1]
+            args = args[:-1]
+        else:
+            scale = 1
+        return args, scale
+
+    def _rvs(self, *args, size=None, random_state=None):
+        args, scale = self._unpack_args(args)
+        if size is None:
+            size = getattr(self, "_size", 1)
+        rv = np.trunc(self.distr.rvs(*args, scale=scale, size=size,
+                                     random_state=random_state) +
+                      self.d_offset)
+        return rv
+
+    def _pmf(self, x, *args):
+        distr = self.distr
+        if self.d_offset != 0:
+            x = x + self.d_offset
+
+        args, scale = self._unpack_args(args)
+
+        p = (distr.sf(x, *args, scale=scale) -
+             distr.sf(x + 1, *args, scale=scale))
+        return p
+
+    def _cdf(self, x, *args):
+        distr = self.distr
+        args, scale = self._unpack_args(args)
+        if self.d_offset != 0:
+            x = x + self.d_offset
+        p = distr.cdf(x + 1, *args, scale=scale)
+        return p
+
+    def _sf(self, x, *args):
+        distr = self.distr
+        args, scale = self._unpack_args(args)
+        if self.d_offset != 0:
+            x = x + self.d_offset
+        p = distr.sf(x + 1, *args, scale=scale)
+        return p
+
+    def _ppf(self, p, *args):
+        distr = self.distr
+        args, scale = self._unpack_args(args)
+
+        qc = distr.ppf(p, *args, scale=scale)
+        if self.d_offset != 0:
+            qc = qc + self.d_offset
+        q = np.floor(qc * (1 - 1e-15))
+        return q
+
+    def _isf(self, p, *args):
+        distr = self.distr
+        args, scale = self._unpack_args(args)
+
+        qc = distr.isf(p, *args, scale=scale)
+        if self.d_offset != 0:
+            qc = qc + self.d_offset
+        q = np.floor(qc * (1 - 1e-15))
+        return q
+

 class DiscretizedModel(GenericLikelihoodModel):
     """experimental model to fit discretized distribution
@@ -159,20 +398,43 @@ class DiscretizedModel(GenericLikelihoodModel):
     >>> probs = res.predict(which="probs", k_max=5)

     """
-
     def __init__(self, endog, exog=None, distr=None):
         if exog is not None:
-            raise ValueError('exog is not supported')
+            raise ValueError("exog is not supported")
+
         super().__init__(endog, exog, distr=distr)
         self._init_keys.append('distr')
         self.df_resid = len(endog) - distr.k_shapes
         self.df_model = 0
-        self.k_extra = distr.k_shapes
+        self.k_extra = distr.k_shapes  # no constant subtracted
         self.k_constant = 0
-        self.nparams = distr.k_shapes
+        self.nparams = distr.k_shapes  # needed for start_params
         self.start_params = 0.5 * np.ones(self.nparams)

+    def loglike(self, params):
+
+        # this does not allow exog yet,
+        # model `params` are also distribution `args`
+        # For regression model this needs to be replaced by a conversion method
+        args = params
+        ll = np.log(self.distr._pmf(self.endog, *args))
+        return ll.sum()
+
+    def predict(self, params, exog=None, which=None, k_max=20):
+
+        if exog is not None:
+            raise ValueError("exog is not supported")
+
+        args = params
+        if which == "probs":
+            pr = self.distr.pmf(np.arange(k_max), *args)
+            return pr
+        else:
+            raise ValueError('only which="probs" is currently implemented')
+
     def get_distr(self, params):
         """frozen distribution instance of the discrete distribution.
         """
-        pass
+        args = params
+        distr = self.distr(*args)
+        return distr
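
As a quick consistency check on the zero-inflated distributions above, the zero-inflated Poisson collapses to the ordinary Poisson when the inflation weight w is zero. Editor's sketch, not part of the patch:

    import numpy as np
    from scipy import stats
    from statsmodels.distributions.discrete import zipoisson

    x = np.arange(10)
    mu = 3.0
    print(np.allclose(zipoisson.pmf(x, mu, 0.0), stats.poisson.pmf(x, mu)))  # True

    # with 20% inflation the extra mass sits at zero and the mean shrinks
    w = 0.2
    print(zipoisson.pmf(0, mu, w))   # w + (1 - w) * exp(-mu)
    print(zipoisson.mean(mu, w))     # (1 - w) * mu = 2.4
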
diff --git a/statsmodels/distributions/edgeworth.py b/statsmodels/distributions/edgeworth.py
index 6e33bc44d..e09df29f1 100644
--- a/statsmodels/distributions/edgeworth.py
+++ b/statsmodels/distributions/edgeworth.py
@@ -1,12 +1,23 @@
+
 import warnings
+
 import numpy as np
 from numpy.polynomial.hermite_e import HermiteE
 from scipy.special import factorial
 from scipy.stats import rv_continuous
 import scipy.special as special
-_faa_di_bruno_cache = {(1): [[(1, 1)]], (2): [[(1, 2)], [(2, 1)]], (3): [[(
-    1, 3)], [(2, 1), (1, 1)], [(3, 1)]], (4): [[(1, 4)], [(1, 2), (2, 1)],
-    [(2, 2)], [(3, 1), (1, 1)], [(4, 1)]]}
+
+# TODO:
+# * actually solve (31) of Blinnikov & Moessner
+# * numerical stability: multiply factorials in logspace?
+# * ppf & friends: Cornish & Fisher series, or tabulate/solve
+
+
+_faa_di_bruno_cache = {
+        1: [[(1, 1)]],
+        2: [[(1, 2)], [(2, 1)]],
+        3: [[(1, 3)], [(2, 1), (1, 1)], [(3, 1)]],
+        4: [[(1, 4)], [(1, 2), (2, 1)], [(2, 2)], [(3, 1), (1, 1)], [(4, 1)]]}


 def _faa_di_bruno_partitions(n):
@@ -33,7 +44,14 @@ def _faa_di_bruno_partitions(n):
     >>> for p in _faa_di_bruno_partitions(4):
     ...     assert 4 == sum(m * k for (m, k) in p)
     """
-    pass
+    if n < 1:
+        raise ValueError("Expected a positive integer; got %s instead" % n)
+    try:
+        return _faa_di_bruno_cache[n]
+    except KeyError:
+        # TODO: higher order terms
+        # solve Eq. (31) from Blinninkov & Moessner here
+        raise NotImplementedError('Higher order terms not yet implemented.')


 def cumulant_from_moments(momt, n):
@@ -53,10 +71,32 @@ def cumulant_from_moments(momt, n):
     kappa : float
         n-th cumulant.
     """
-    pass
-
-
-_norm_pdf_C = np.sqrt(2 * np.pi)
+    if n < 1:
+        raise ValueError("Expected a positive integer. Got %s instead." % n)
+    if len(momt) < n:
+        raise ValueError("%s-th cumulant requires %s moments, "
+                         "only got %s." % (n, n, len(momt)))
+    kappa = 0.
+    for p in _faa_di_bruno_partitions(n):
+        r = sum(k for (m, k) in p)
+        term = (-1)**(r - 1) * factorial(r - 1)
+        for (m, k) in p:
+            term *= np.power(momt[m - 1] / factorial(m), k) / factorial(k)
+        kappa += term
+    kappa *= factorial(n)
+    return kappa
+
+## copied from scipy.stats.distributions to avoid the overhead of
+## the public methods
+_norm_pdf_C = np.sqrt(2*np.pi)
+def _norm_pdf(x):
+    return np.exp(-x**2/2.0) / _norm_pdf_C
+
+def _norm_cdf(x):
+    return special.ndtr(x)
+
+def _norm_sf(x):
+    return special.ndtr(-x)


 class ExpandedNormal(rv_continuous):
@@ -110,20 +150,55 @@ class ExpandedNormal(rv_continuous):
     .. [*] S. Blinnikov and R. Moessner, Expansions for nearly Gaussian
         distributions, Astron. Astrophys. Suppl. Ser. 130, 193 (1998)
     """
-
     def __init__(self, cum, name='Edgeworth expanded normal', **kwds):
         if len(cum) < 2:
-            raise ValueError('At least two cumulants are needed.')
+            raise ValueError("At least two cumulants are needed.")
         self._coef, self._mu, self._sigma = self._compute_coefs_pdf(cum)
         self._herm_pdf = HermiteE(self._coef)
         if self._coef.size > 2:
             self._herm_cdf = HermiteE(-self._coef[1:])
         else:
-            self._herm_cdf = lambda x: 0.0
+            self._herm_cdf = lambda x: 0.
+
+        # warn if pdf(x) < 0 for some values of x within 4 sigma
         r = np.real_if_close(self._herm_pdf.roots())
         r = (r - self._mu) / self._sigma
         if r[(np.imag(r) == 0) & (np.abs(r) < 4)].any():
             mesg = 'PDF has zeros at %s ' % r
             warnings.warn(mesg, RuntimeWarning)
-        kwds.update({'name': name, 'momtype': 0})
+
+        kwds.update({'name': name,
+                     'momtype': 0})   # use pdf, not ppf in self.moment()
         super(ExpandedNormal, self).__init__(**kwds)
+
+    def _pdf(self, x):
+        y = (x - self._mu) / self._sigma
+        return self._herm_pdf(y) * _norm_pdf(y) / self._sigma
+
+    def _cdf(self, x):
+        y = (x - self._mu) / self._sigma
+        return (_norm_cdf(y) +
+                self._herm_cdf(y) * _norm_pdf(y))
+
+    def _sf(self, x):
+        y = (x - self._mu) / self._sigma
+        return (_norm_sf(y) -
+                self._herm_cdf(y) * _norm_pdf(y))
+
+    def _compute_coefs_pdf(self, cum):
+        # scale cumulants by \sigma
+        mu, sigma = cum[0], np.sqrt(cum[1])
+        lam = np.asarray(cum)
+        for j, l in enumerate(lam):
+            lam[j] /= cum[1]**j
+
+        coef = np.zeros(lam.size * 3 - 5)
+        coef[0] = 1.
+        for s in range(lam.size - 2):
+            for p in _faa_di_bruno_partitions(s+1):
+                term = sigma**(s+1)
+                for (m, k) in p:
+                    term *= np.power(lam[m+1] / factorial(m+2), k) / factorial(k)
+                r = sum(k for (m, k) in p)
+                coef[s + 1 + 2*r] += term
+        return coef, mu, sigma
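
With only the first two cumulants, the expansion above has a single Hermite coefficient and reduces to the matching normal distribution; cumulant_from_moments recovers cumulants from raw moments. Editor's sketch, not part of the patch:

    import numpy as np
    from scipy import stats
    from statsmodels.distributions.edgeworth import ExpandedNormal, cumulant_from_moments

    en = ExpandedNormal([1.0, 4.0])          # cumulants: mean 1, variance 4
    x = np.linspace(-5.0, 7.0, 7)
    print(np.allclose(en.pdf(x), stats.norm.pdf(x, loc=1.0, scale=2.0)))   # True

    # third cumulant of a standard normal from its raw moments E[X], E[X^2], E[X^3]
    print(cumulant_from_moments([0.0, 1.0, 0.0], 3))                       # 0.0
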
diff --git a/statsmodels/distributions/empirical_distribution.py b/statsmodels/distributions/empirical_distribution.py
index ef860c483..1afbb5a3e 100644
--- a/statsmodels/distributions/empirical_distribution.py
+++ b/statsmodels/distributions/empirical_distribution.py
@@ -5,8 +5,8 @@ import numpy as np
 from scipy.interpolate import interp1d


-def _conf_set(F, alpha=0.05):
-    """
+def _conf_set(F, alpha=.05):
+    r"""
     Constructs a Dvoretzky-Kiefer-Wolfowitz confidence band for the eCDF.

     Parameters
@@ -20,14 +20,18 @@ def _conf_set(F, alpha=0.05):
     -----
     Based on the DKW inequality.

-    .. math:: P \\left( \\sup_x \\left| F(x) - \\hat(F)_n(X) \\right| >
-       \\epsilon \\right) \\leq 2e^{-2n\\epsilon^2}
+    .. math:: P \left( \sup_x \left| F(x) - \hat(F)_n(X) \right| >
+       \epsilon \right) \leq 2e^{-2n\epsilon^2}

     References
     ----------
     Wasserman, L. 2006. `All of Nonparametric Statistics`. Springer.
     """
-    pass
+    nobs = len(F)
+    epsilon = np.sqrt(np.log(2./alpha) / (2 * nobs))
+    lower = np.clip(F - epsilon, 0, 1)
+    upper = np.clip(F + epsilon, 0, 1)
+    return lower, upper


 class StepFunction:
@@ -74,21 +78,26 @@ class StepFunction:
     3.0
     """

-    def __init__(self, x, y, ival=0.0, sorted=False, side='left'):
+    def __init__(self, x, y, ival=0., sorted=False, side='left'):  # noqa
+
         if side.lower() not in ['right', 'left']:
             msg = "side can take the values 'right' or 'left'"
             raise ValueError(msg)
         self.side = side
+
         _x = np.asarray(x)
         _y = np.asarray(y)
+
         if _x.shape != _y.shape:
-            msg = 'x and y do not have the same shape'
+            msg = "x and y do not have the same shape"
             raise ValueError(msg)
         if len(_x.shape) != 1:
             msg = 'x and y must be 1-dimensional'
             raise ValueError(msg)
+
         self.x = np.r_[-np.inf, _x]
         self.y = np.r_[ival, _y]
+
         if not sorted:
             asort = np.argsort(self.x)
             self.x = np.take(self.x, asort, 0)
@@ -96,6 +105,7 @@ class StepFunction:
         self.n = self.x.shape[0]

     def __call__(self, time):
+
         tind = np.searchsorted(self.x, time, self.side) - 1
         return self.y[tind]

@@ -126,13 +136,18 @@ class ECDF(StepFunction):
     >>> ecdf([3, 55, 0.5, 1.5])
     array([ 0.75,  1.  ,  0.  ,  0.25])
     """
-
     def __init__(self, x, side='right'):
         x = np.array(x, copy=True)
         x.sort()
         nobs = len(x)
-        y = np.linspace(1.0 / nobs, 1, nobs)
+        y = np.linspace(1./nobs, 1, nobs)
         super(ECDF, self).__init__(x, y, side=side, sorted=True)
+        # TODO: make `step` an arg and have a linear interpolation option?
+        # This is the path when `step` is True
+        # If `step` is False, a previous version of the code read
+        #  `return interp1d(x,y,drop_errors=False,fill_values=ival)`
+        # which would have raised a NameError if hit, so would need to be
+        # fixed.  See GH#5701.


 class ECDFDiscrete(StepFunction):
@@ -184,7 +199,6 @@ class ECDFDiscrete(StepFunction):
     >>> print(e1.y, e2.y)
     [0.  0.2 0.4 0.8 1. ] [0.  0.2 0.4 0.8 1. ]
     """
-
     def __init__(self, x, freq_weights=None, side='right'):
         if freq_weights is None:
             x, freq_weights = np.unique(x, return_counts=True)
@@ -207,4 +221,15 @@ def monotone_fn_inverter(fn, x, vectorized=True, **keywords):
     and a set of x values, return an linearly interpolated approximation
     to its inverse from its values on x.
     """
-    pass
+    x = np.asarray(x)
+    if vectorized:
+        y = fn(x, **keywords)
+    else:
+        y = []
+        for _x in x:
+            y.append(fn(_x, **keywords))
+        y = np.array(y)
+
+    a = np.argsort(y)
+
+    return interp1d(y[a], x[a])
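
The completed helpers mirror the ECDF docstring example; monotone_fn_inverter builds a linear-interpolation inverse from function values on a grid. Editor's sketch, not part of the patch:

    import numpy as np
    from statsmodels.distributions.empirical_distribution import (
        ECDF, monotone_fn_inverter)

    ecdf = ECDF([3, 3, 1, 4])
    print(ecdf([3, 55, 0.5, 1.5]))     # [0.75 1.   0.   0.25]

    # invert a monotone function from its values on a grid: x**3 at 8 gives ~2
    inv = monotone_fn_inverter(lambda x: x**3, np.linspace(0.0, 3.0, 301))
    print(float(inv(8.0)))             # 2.0
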
diff --git a/statsmodels/distributions/mixture_rvs.py b/statsmodels/distributions/mixture_rvs.py
index 62438d2a9..6110513de 100644
--- a/statsmodels/distributions/mixture_rvs.py
+++ b/statsmodels/distributions/mixture_rvs.py
@@ -1,7 +1,6 @@
 import numpy as np

-
-def _make_index(prob, size):
+def _make_index(prob,size):
     """
     Returns a boolean index for given probabilities.

@@ -11,8 +10,9 @@ def _make_index(prob, size):
     being True and a 25% chance of the second column being True. The
     columns are mutually exclusive.
     """
-    pass
-
+    rv = np.random.uniform(size=(size,1))
+    cumprob = np.cumsum(prob)
+    return np.logical_and(np.r_[0,cumprob[:-1]] <= rv, rv < cumprob)

 def mixture_rvs(prob, size, dist, kwargs=None):
     """
@@ -42,18 +42,42 @@ def mixture_rvs(prob, size, dist, kwargs=None):
     >>> Y = mixture_rvs(prob, 5000, dist=[stats.norm, stats.norm],
     ...                 kwargs = (dict(loc=-1,scale=.5),dict(loc=1,scale=.5)))
     """
-    pass
+    if len(prob) != len(dist):
+        raise ValueError("You must provide as many probabilities as distributions")
+    if not np.allclose(np.sum(prob), 1):
+        raise ValueError("prob does not sum to 1")
+
+    if kwargs is None:
+        kwargs = ({},)*len(prob)
+
+    idx = _make_index(prob,size)
+    sample = np.empty(size)
+    for i in range(len(prob)):
+        sample_idx = idx[...,i]
+        sample_size = sample_idx.sum()
+        loc = kwargs[i].get('loc',0)
+        scale = kwargs[i].get('scale',1)
+        args = kwargs[i].get('args',())
+        sample[sample_idx] = dist[i].rvs(*args, **dict(loc=loc,scale=scale,
+            size=sample_size))
+    return sample


 class MixtureDistribution:
-    """univariate mixture distribution
+    '''univariate mixture distribution

     for simple case for now (unbound support)
     does not yet inherit from scipy.stats.distributions

     adding pdf to mixture_rvs, some restrictions on broadcasting
     Currently it does not hold any state, all arguments included in each method.
-    """
+    '''
+
+    #def __init__(self, prob, size, dist, kwargs=None):
+
+    def rvs(self, prob, size, dist, kwargs=None):
+        return mixture_rvs(prob, size, dist, kwargs=kwargs)
+

     def pdf(self, x, prob, dist, kwargs=None):
         """
@@ -87,7 +111,23 @@ class MixtureDistribution:
         >>> Y = mixture.pdf(x, prob, dist=[stats.norm, stats.norm],
         ...                 kwargs = (dict(loc=-1,scale=.5),dict(loc=1,scale=.5)))
         """
-        pass
+        if len(prob) != len(dist):
+            raise ValueError("You must provide as many probabilities as distributions")
+        if not np.allclose(np.sum(prob), 1):
+            raise ValueError("prob does not sum to 1")
+
+        if kwargs is None:
+            kwargs = ({},)*len(prob)
+
+        for i in range(len(prob)):
+            loc = kwargs[i].get('loc',0)
+            scale = kwargs[i].get('scale',1)
+            args = kwargs[i].get('args',())
+            if i == 0:  #assume all broadcast the same as the first dist
+                pdf_ = prob[i] * dist[i].pdf(x, *args, loc=loc, scale=scale)
+            else:
+                pdf_ += prob[i] * dist[i].pdf(x, *args, loc=loc, scale=scale)
+        return pdf_

     def cdf(self, x, prob, dist, kwargs=None):
         """
@@ -123,7 +163,23 @@ class MixtureDistribution:
         >>> Y = mixture.pdf(x, prob, dist=[stats.norm, stats.norm],
         ...                 kwargs = (dict(loc=-1,scale=.5),dict(loc=1,scale=.5)))
         """
-        pass
+        if len(prob) != len(dist):
+            raise ValueError("You must provide as many probabilities as distributions")
+        if not np.allclose(np.sum(prob), 1):
+            raise ValueError("prob does not sum to 1")
+
+        if kwargs is None:
+            kwargs = ({},)*len(prob)
+
+        for i in range(len(prob)):
+            loc = kwargs[i].get('loc',0)
+            scale = kwargs[i].get('scale',1)
+            args = kwargs[i].get('args',())
+            if i == 0:  #assume all broadcast the same as the first dist
+                cdf_ = prob[i] * dist[i].cdf(x, *args, loc=loc, scale=scale)
+            else:
+                cdf_ += prob[i] * dist[i].cdf(x, *args, loc=loc, scale=scale)
+        return cdf_


 def mv_mixture_rvs(prob, size, dist, nvars, **kwargs):
@@ -161,24 +217,52 @@ def mv_mixture_rvs(prob, size, dist, nvars, **kwargs):
     mvn32 = mvd.MVNormal(mu2, cov3/2., 4)
     rvs = mix.mv_mixture_rvs([0.4, 0.6], 2000, [mvn3, mvn32], 3)
     """
-    pass
+    if len(prob) != len(dist):
+        raise ValueError("You must provide as many probabilities as distributions")
+    if not np.allclose(np.sum(prob), 1):
+        raise ValueError("prob does not sum to 1")
+
+    if kwargs is None:
+        kwargs = ({},)*len(prob)
+
+    idx = _make_index(prob,size)
+    sample = np.empty((size, nvars))
+    for i in range(len(prob)):
+        sample_idx = idx[...,i]
+        sample_size = sample_idx.sum()
+        #loc = kwargs[i].get('loc',0)
+        #scale = kwargs[i].get('scale',1)
+        #args = kwargs[i].get('args',())
+        # use int to avoid numpy bug with np.random.multivariate_normal
+        sample[sample_idx] = dist[i].rvs(size=int(sample_size))
+    return sample
+


 if __name__ == '__main__':
+
     from scipy import stats
-    obs_dist = mixture_rvs([0.25, 0.75], size=10000, dist=[stats.norm,
-        stats.beta], kwargs=(dict(loc=-1, scale=0.5), dict(loc=1, scale=1,
-        args=(1, 0.5))))
+
+    obs_dist = mixture_rvs([.25,.75], size=10000, dist=[stats.norm, stats.beta],
+                kwargs=(dict(loc=-1,scale=.5),dict(loc=1,scale=1,args=(1,.5))))
+
+
+
     nobs = 10000
     mix = MixtureDistribution()
-    mix_kwds = dict(loc=-1, scale=0.25), dict(loc=1, scale=0.75)
-    mrvs = mix.rvs([1 / 3.0, 2 / 3.0], size=nobs, dist=[stats.norm, stats.
-        norm], kwargs=mix_kwds)
-    grid = np.linspace(-4, 4, 100)
-    mpdf = mix.pdf(grid, [1 / 3.0, 2 / 3.0], dist=[stats.norm, stats.norm],
-        kwargs=mix_kwds)
-    mcdf = mix.cdf(grid, [1 / 3.0, 2 / 3.0], dist=[stats.norm, stats.norm],
-        kwargs=mix_kwds)
+##    mrvs = mixture_rvs([1/3.,2/3.], size=nobs, dist=[stats.norm, stats.norm],
+##                   kwargs = (dict(loc=-1,scale=.5),dict(loc=1,scale=.75)))
+
+    mix_kwds = (dict(loc=-1,scale=.25),dict(loc=1,scale=.75))
+    mrvs = mix.rvs([1/3.,2/3.], size=nobs, dist=[stats.norm, stats.norm],
+                   kwargs=mix_kwds)
+
+    grid = np.linspace(-4,4, 100)
+    mpdf = mix.pdf(grid, [1/3.,2/3.], dist=[stats.norm, stats.norm],
+                   kwargs=mix_kwds)
+    mcdf = mix.cdf(grid, [1/3.,2/3.], dist=[stats.norm, stats.norm],
+                   kwargs=mix_kwds)
+
     doplot = 1
     if doplot:
         import matplotlib.pyplot as plt
@@ -186,8 +270,10 @@ if __name__ == '__main__':
         plt.hist(mrvs, bins=50, normed=True, color='red')
         plt.title('histogram of sample and pdf')
         plt.plot(grid, mpdf, lw=2, color='black')
+
         plt.figure()
         plt.hist(mrvs, bins=50, normed=True, cumulative=True, color='red')
         plt.title('histogram of sample and pdf')
         plt.plot(grid, mcdf, lw=2, color='black')
+
         plt.show()
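
The mixture helpers in this file can be checked by summing the mixture pdf over a fine grid (the result should be close to one) and drawing a few variates. Editor's sketch, not part of the patch:

    import numpy as np
    from scipy import stats
    from statsmodels.distributions.mixture_rvs import MixtureDistribution

    mix = MixtureDistribution()
    prob = [1/3, 2/3]
    kwds = (dict(loc=-1, scale=0.25), dict(loc=1, scale=0.75))
    grid = np.linspace(-6, 6, 1201)
    pdf = mix.pdf(grid, prob, dist=[stats.norm, stats.norm], kwargs=kwds)
    print(pdf.sum() * (grid[1] - grid[0]))    # ~1.0 (Riemann sum of the density)
    print(mix.rvs(prob, 5, dist=[stats.norm, stats.norm], kwargs=kwds))
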
diff --git a/statsmodels/distributions/tools.py b/statsmodels/distributions/tools.py
index 45df6d9ca..25111bc7f 100644
--- a/statsmodels/distributions/tools.py
+++ b/statsmodels/distributions/tools.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Thu Feb 11 09:19:30 2021

@@ -6,9 +7,11 @@ License: BSD-3

 """
 import warnings
+
 import numpy as np
 from scipy import interpolate, stats

+# helper functions to work on a grid of cdf and pdf, histogram

 class _Grid:
     """Create Grid values and indices, grid in [0, 1]^d
@@ -41,13 +44,17 @@ class _Grid:

     def __init__(self, k_grid, eps=0):
         self.k_grid = k_grid
-        x_marginal = [(np.arange(ki) / (ki - 1)) for ki in k_grid]
-        idx_flat = np.column_stack(np.unravel_index(np.arange(np.prod(
-            k_grid)), k_grid)).astype(float)
+
+        x_marginal = [np.arange(ki) / (ki - 1) for ki in k_grid]
+
+        idx_flat = np.column_stack(
+                np.unravel_index(np.arange(np.prod(k_grid)), k_grid)
+                ).astype(float)
         x_flat = idx_flat / idx_flat.max(0)
         if eps != 0:
             x_marginal = [np.clip(xi, eps, 1 - eps) for xi in x_marginal]
             x_flat = np.clip(x_flat, eps, 1 - eps)
+
         self.x_marginal = x_marginal
         self.idx_flat = idx_flat
         self.x_flat = x_flat
@@ -66,7 +73,12 @@ def prob2cdf_grid(probs):
     cdf : ndarray
         Grid of cumulative probabilities with same shape as probs.
     """
-    pass
+    cdf = np.asarray(probs).copy()
+    k = cdf.ndim
+    for i in range(k):
+        cdf = cdf.cumsum(axis=i)
+
+    return cdf


 def cdf2prob_grid(cdf, prepend=0):
@@ -83,10 +95,17 @@ def cdf2prob_grid(cdf, prepend=0):
         Rectangular grid of cell probabilities.

     """
-    pass
+    if prepend is None:
+        prepend = np._NoValue
+    prob = np.asarray(cdf).copy()
+    k = prob.ndim
+    for i in range(k):
+        prob = np.diff(prob, prepend=prepend, axis=i)
+
+    return prob


-def average_grid(values, coords=None, _method='slicing'):
+def average_grid(values, coords=None, _method="slicing"):
     """Compute average for each cell in grid using endpoints

     Parameters
@@ -104,10 +123,37 @@ def average_grid(values, coords=None, _method='slicing'):
     -------
     Grid with averaged cell values.
     """
-    pass
+    k_dim = values.ndim
+    if _method == "slicing":
+        p = values.copy()
+
+        for d in range(k_dim):
+            # average (p[:-1] + p[1:]) / 2 over each axis
+            sl1 = [slice(None, None, None)] * k_dim
+            sl2 = [slice(None, None, None)] * k_dim
+            sl1[d] = slice(None, -1, None)
+            sl2[d] = slice(1, None, None)
+            sl1 = tuple(sl1)
+            sl2 = tuple(sl2)
+
+            p = (p[sl1] + p[sl2]) / 2
+
+    elif _method == "convolve":
+        from scipy import signal
+        p = signal.convolve(values, 0.5**k_dim * np.ones([2] * k_dim),
+                            mode="valid")
+
+    if coords is not None:
+        dx = np.array(1)
+        for d in range(k_dim):
+            dx = dx[..., None] * np.diff(coords[d])
+
+        p = p * dx

+    return p

-def nearest_matrix_margins(mat, maxiter=100, tol=1e-08):
+
+def nearest_matrix_margins(mat, maxiter=100, tol=1e-8):
     """nearest matrix with uniform margins

     Parameters
@@ -134,7 +180,32 @@ def nearest_matrix_margins(mat, maxiter=100, tol=1e-08):


     """
-    pass
+    pc = np.asarray(mat)
+    converged = False
+
+    for _ in range(maxiter):
+        pc0 = pc.copy()
+        for ax in range(pc.ndim):
+            axs = tuple([i for i in range(pc.ndim) if not i == ax])
+            pc0 /= pc.sum(axis=axs, keepdims=True)
+        pc = pc0
+        pc /= pc.sum()
+
+        # check convergence
+        mptps = []
+        for ax in range(pc.ndim):
+            axs = tuple([i for i in range(pc.ndim) if not i == ax])
+            marg = pc.sum(axis=axs, keepdims=False)
+            mptps.append(np.ptp(marg))
+        if max(mptps) < tol:
+            converged = True
+            break
+
+    if not converged:
+        from statsmodels.tools.sm_exceptions import ConvergenceWarning
+        warnings.warn("Iterations did not converge, maxiter reached",
+                      ConvergenceWarning)
+    return pc


 def _rankdata_no_ties(x):
@@ -148,7 +219,11 @@ def _rankdata_no_ties(x):
     scipy.stats.rankdata

     """
-    pass
+    nobs, k_vars = x.shape
+    ranks = np.ones((nobs, k_vars))
+    sidx = np.argsort(x, axis=0)
+    ranks[sidx, np.arange(k_vars)] = np.arange(1, nobs + 1)[:, None]
+    return ranks


 def frequencies_fromdata(data, k_bins, use_ranks=True):
@@ -179,7 +254,18 @@ def frequencies_fromdata(data, k_bins, use_ranks=True):
     This function is intended for internal use and will be generalized in
     future. API will change.
     """
-    pass
+    data = np.asarray(data)
+    k_dim = data.shape[-1]
+    k = k_bins + 1
+    g2 = _Grid([k] * k_dim, eps=0)
+    if use_ranks:
+        data = _rankdata_no_ties(data) / (data.shape[0] + 1)
+        # alternatives: scipy handles ties, but uses np.apply_along_axis
+        # rvs = stats.rankdata(rvs, axis=0) / (rvs.shape[0] + 1)
+        # rvs = (np.argsort(np.argsort(rvs, axis=0), axis=0) + 1
+        #                              ) / (rvs.shape[0] + 1)
+    freqr, _ = np.histogramdd(data, bins=g2.x_marginal)
+    return freqr


 def approx_copula_pdf(copula, k_bins=10, force_uniform=True, use_pdf=False):
@@ -217,10 +303,35 @@ def approx_copula_pdf(copula, k_bins=10, force_uniform=True, use_pdf=False):
     This function is intended for internal use and will be generalized in
     future. API will change.
     """
-    pass
-
-
-def _eval_bernstein_1d(x, fvals, method='binom'):
+    k_dim = copula.k_dim
+    k = k_bins + 1
+    ks = tuple([k] * k_dim)
+
+    if use_pdf:
+        g = _Grid([k] * k_dim, eps=0.1 / k_bins)
+        pdfg = copula.pdf(g.x_flat).reshape(*ks)
+        # correct for bin size
+        pdfg *= 1 / k**k_dim
+        ag = average_grid(pdfg)
+        if force_uniform:
+            pdf_grid = nearest_matrix_margins(ag, maxiter=100, tol=1e-8)
+        else:
+            pdf_grid = ag / ag.sum()
+    else:
+        g = _Grid([k] * k_dim, eps=1e-6)
+        cdfg = copula.cdf(g.x_flat).reshape(*ks)
+        # correct for bin size
+        pdf_grid = cdf2prob_grid(cdfg, prepend=None)
+        # TODO: check boundary approximation, eg. undefined at zero
+        # for now just normalize
+        pdf_grid /= pdf_grid.sum()
+
+    return pdf_grid
+
+
+# functions to evaluate bernstein polynomials
+
+def _eval_bernstein_1d(x, fvals, method="binom"):
     """Evaluate 1-dimensional bernstein polynomial given grid of values.

     experimental, comparing methods
@@ -245,7 +356,30 @@ def _eval_bernstein_1d(x, fvals, method='binom'):
     Bernstein polynomial at evaluation points, weighted sum of Bernstein
     polynomial basis.
     """
-    pass
+    k_terms = fvals.shape[-1]
+    xx = np.asarray(x)
+    k = np.arange(k_terms).astype(float)
+    n = k_terms - 1.
+
+    if method.lower() == "binom":
+        # Divide by 0 RuntimeWarning here
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", RuntimeWarning)
+            poly_base = stats.binom.pmf(k, n, xx[..., None])
+        bp_values = (fvals * poly_base).sum(-1)
+    elif method.lower() == "bpoly":
+        bpb = interpolate.BPoly(fvals[:, None], [0., 1])
+        bp_values = bpb(x)
+    elif method.lower() == "beta":
+        # Divide by 0 RuntimeWarning here
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", RuntimeWarning)
+            poly_base = stats.beta.pdf(xx[..., None], k + 1, n - k + 1) / (n + 1)
+        bp_values = (fvals * poly_base).sum(-1)
+    else:
+        raise ValueError("method not recogized")
+
+    return bp_values


 def _eval_bernstein_2d(x, fvals):
@@ -266,7 +400,25 @@ def _eval_bernstein_2d(x, fvals):
     Bernstein polynomial at evaluation points, weighted sum of Bernstein
     polynomial basis.
     """
-    pass
+    k_terms = fvals.shape
+    k_dim = fvals.ndim
+    if k_dim != 2:
+        raise ValueError("`fval` needs to be 2-dimensional")
+    xx = np.atleast_2d(x)
+    if xx.shape[1] != 2:
+        raise ValueError("x needs to be bivariate and have 2 columns")
+
+    x1, x2 = xx.T
+    n1, n2 = k_terms[0] - 1, k_terms[1] - 1
+    k1 = np.arange(k_terms[0]).astype(float)
+    k2 = np.arange(k_terms[1]).astype(float)
+
+    # we are building a nobs x n1 x n2 array
+    poly_base = (stats.binom.pmf(k1[None, :, None], n1, x1[:, None, None]) *
+                 stats.binom.pmf(k2[None, None, :], n2, x2[:, None, None]))
+    bp_values = (fvals * poly_base).sum(-1).sum(-1)
+
+    return bp_values


 def _eval_bernstein_dd(x, fvals):
@@ -287,10 +439,33 @@ def _eval_bernstein_dd(x, fvals):
     Bernstein polynomial at evaluation points, weighted sum of Bernstein
     polynomial basis.
     """
-    pass
+    k_terms = fvals.shape
+    k_dim = fvals.ndim
+    xx = np.atleast_2d(x)
+
+    # The following loop is a bit tricky:
+    # we add terms for each x and expand the dimension of poly_base in each
+    # iteration using broadcasting
+
+    poly_base = np.zeros(x.shape[0])
+    for i in range(k_dim):
+        ki = np.arange(k_terms[i]).astype(float)
+        for _ in range(i+1):
+            ki = ki[..., None]
+        ni = k_terms[i] - 1
+        xi = xx[:, i]
+        poly_base = poly_base[None, ...] + stats.binom._logpmf(ki, ni, xi)
+
+    poly_base = np.exp(poly_base)
+    bp_values = fvals.T[..., None] * poly_base
+
+    for i in range(k_dim):
+        bp_values = bp_values.sum(0)
+
+    return bp_values


-def _ecdf_mv(data, method='seq', use_ranks=True):
+def _ecdf_mv(data, method="seq", use_ranks=True):
     """
     Multivariate empirical distribution function, empirical copula

@@ -310,4 +485,21 @@ def _ecdf_mv(data, method='seq', use_ranks=True):
     computes the correct ecdf counts even in the case of ties.

     """
-    pass
+    x = np.asarray(data)
+    n = x.shape[0]
+    if use_ranks:
+        x = _rankdata_no_ties(x) / n
+    if method == "brute":
+        count = [((x <= x[i]).all(1)).sum() for i in range(n)]
+        count = np.asarray(count)
+    elif method.startswith("seq"):
+        sort_idx0 = np.argsort(x[:, 0])
+        x_s0 = x[sort_idx0]
+        x1 = x_s0[:, 1:]
+        count_smaller = [(x1[:i] <= x1[i]).all(1).sum() + 1 for i in range(n)]
+        count = np.empty(x.shape[0])
+        count[sort_idx0] = count_smaller
+    else:
+        raise ValueError("method not available")
+
+    return count, x
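
The grid helpers above invert each other when the cdf is differenced with a zero prepended, and nearest_matrix_margins rescales a positive table to uniform margins. Editor's sketch, not part of the patch:

    import numpy as np
    from statsmodels.distributions.tools import (
        prob2cdf_grid, cdf2prob_grid, nearest_matrix_margins)

    probs = np.array([[0.1, 0.2],
                      [0.3, 0.4]])
    cdf = prob2cdf_grid(probs)
    print(np.allclose(cdf2prob_grid(cdf), probs))   # True (default prepend=0)

    # iterative proportional fitting towards uniform margins
    pc = nearest_matrix_margins(np.array([[1.0, 2.0], [3.0, 4.0]]))
    print(pc.sum(axis=0), pc.sum(axis=1))           # both close to [0.5, 0.5]
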
diff --git a/statsmodels/duration/_kernel_estimates.py b/statsmodels/duration/_kernel_estimates.py
index 1d51c6a6b..d27928a72 100644
--- a/statsmodels/duration/_kernel_estimates.py
+++ b/statsmodels/duration/_kernel_estimates.py
@@ -2,7 +2,8 @@ import numpy as np
 from statsmodels.duration.hazard_regression import PHReg


-def _kernel_cumincidence(time, status, exog, kfunc, freq_weights, dimred=True):
+def _kernel_cumincidence(time, status, exog, kfunc, freq_weights,
+                         dimred=True):
     """
     Calculates cumulative incidence functions using kernels.

@@ -27,7 +28,92 @@ def _kernel_cumincidence(time, status, exog, kfunc, freq_weights, dimred=True):
         directly for calculating kernel weights without dimension
         reduction.
     """
-    pass
+
+    # Reorder so time is ascending
+    ii = np.argsort(time)
+    time = time[ii]
+    status = status[ii]
+    exog = exog[ii, :]
+    nobs = len(time)
+
+    # Convert the unique times to ranks (0, 1, 2, ...)
+    utime, rtime = np.unique(time, return_inverse=True)
+
+    # Last index where each unique time occurs.
+    ie = np.searchsorted(time, utime, side='right') - 1
+
+    ngrp = int(status.max())
+
+    # All-cause status
+    statusa = (status >= 1).astype(np.float64)
+
+    if freq_weights is not None:
+        freq_weights = freq_weights / freq_weights.sum()
+
+    ip = []
+    sp = [None] * nobs
+    n_risk = [None] * nobs
+    kd = [None] * nobs
+    for k in range(ngrp):
+        status0 = (status == k + 1).astype(np.float64)
+
+        # Dimension reduction step
+        if dimred:
+            sfe = PHReg(time, exog, status0).fit()
+            fitval_e = sfe.predict().predicted_values
+            sfc = PHReg(time, exog, 1 - status0).fit()
+            fitval_c = sfc.predict().predicted_values
+            exog2d = np.hstack((fitval_e[:, None], fitval_c[:, None]))
+            exog2d -= exog2d.mean(0)
+            exog2d /= exog2d.std(0)
+        else:
+            exog2d = exog
+
+        ip0 = 0
+        for i in range(nobs):
+
+            if k == 0:
+                kd1 = exog2d - exog2d[i, :]
+                kd1 = kfunc(kd1)
+                kd[i] = kd1
+
+            # Get the local all-causes survival function
+            if k == 0:
+                denom = np.cumsum(kd[i][::-1])[::-1]
+                num = kd[i] * statusa
+                rat = num / denom
+                tr = 1e-15
+                ii = np.flatnonzero((denom < tr) & (num < tr))
+                rat[ii] = 0
+                ratc = 1 - rat
+                ratc = np.clip(ratc, 1e-10, np.inf)
+                lrat = np.log(ratc)
+                prat = np.cumsum(lrat)[ie]
+                sf = np.exp(prat)
+                sp[i] = np.r_[1, sf[:-1]]
+                n_risk[i] = denom[ie]
+
+            # Number of cause-specific deaths at each unique time.
+            d0 = np.bincount(rtime, weights=status0*kd[i],
+                             minlength=len(utime))
+
+            # The cumulative incidence function probabilities.  Carry
+            # forward once the effective sample size drops below 1.
+            ip1 = np.cumsum(sp[i] * d0 / n_risk[i])
+            jj = len(ip1) - np.searchsorted(n_risk[i][::-1], 1)
+            if jj < len(ip1):
+                ip1[jj:] = ip1[jj - 1]
+            if freq_weights is None:
+                ip0 += ip1
+            else:
+                ip0 += freq_weights[i] * ip1
+
+        if freq_weights is None:
+            ip0 /= nobs
+
+        ip.append(ip0)
+
+    return utime, ip


 def _kernel_survfunc(time, status, exog, kfunc, freq_weights):
@@ -64,4 +150,57 @@ def _kernel_survfunc(time, status, exog, kfunc, freq_weights):
     doi:10.1214/009053604000000508.
     https://arxiv.org/pdf/math/0409180.pdf
     """
-    pass
+
+    # Dimension reduction step
+    sfe = PHReg(time, exog, status).fit()
+    fitval_e = sfe.predict().predicted_values
+    sfc = PHReg(time, exog, 1 - status).fit()
+    fitval_c = sfc.predict().predicted_values
+    exog2d = np.hstack((fitval_e[:, None], fitval_c[:, None]))
+
+    n = len(time)
+    ixd = np.flatnonzero(status == 1)
+
+    # For consistency with standard KM, only compute the survival
+    # function at the times of observed events.
+    utime = np.unique(time[ixd])
+
+    # Reorder everything so time is ascending
+    ii = np.argsort(time)
+    time = time[ii]
+    status = status[ii]
+    exog2d = exog2d[ii, :]
+
+    # Last index where each evaluation time occurs.
+    ie = np.searchsorted(time, utime, side='right') - 1
+
+    if freq_weights is not None:
+        freq_weights = freq_weights / freq_weights.sum()
+
+    sprob = 0.
+    for i in range(n):
+
+        kd = exog2d - exog2d[i, :]
+        kd = kfunc(kd)
+
+        denom = np.cumsum(kd[::-1])[::-1]
+        num = kd * status
+        rat = num / denom
+        tr = 1e-15
+        ii = np.flatnonzero((denom < tr) & (num < tr))
+        rat[ii] = 0
+        ratc = 1 - rat
+        ratc = np.clip(ratc, 1e-12, np.inf)
+        lrat = np.log(ratc)
+        prat = np.cumsum(lrat)[ie]
+        prat = np.exp(prat)
+
+        if freq_weights is None:
+            sprob += prat
+        else:
+            sprob += prat * freq_weights[i]
+
+    if freq_weights is None:
+        sprob /= n
+
+    return sprob, utime
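
Both kernel estimators above expect `kfunc` to map the matrix of covariate differences (one row per subject) to a one-dimensional array of nonnegative weights. A product Gaussian kernel satisfies that interface; the function and bandwidth names below are illustrative, this is an editor's sketch of the expected argument shape, not the kernel statsmodels itself uses:

    import numpy as np

    def gaussian_product_kernel(kd, bw=1.0):
        # kd has shape (nobs, k_covariates); return one weight per subject
        return np.exp(-0.5 * ((kd / bw) ** 2).sum(axis=1))

Such a function would be passed as the `kfunc` argument of `_kernel_survfunc` or `_kernel_cumincidence` alongside the time, status, and exog arrays.
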
diff --git a/statsmodels/duration/api.py b/statsmodels/duration/api.py
index 8fc3d8351..925c598d0 100644
--- a/statsmodels/duration/api.py
+++ b/statsmodels/duration/api.py
@@ -1,3 +1,4 @@
-__all__ = ['PHReg', 'SurvfuncRight', 'survdiff', 'CumIncidenceRight']
+__all__ = ["PHReg", "SurvfuncRight", "survdiff", "CumIncidenceRight"]
 from .hazard_regression import PHReg
-from .survfunc import SurvfuncRight, survdiff, CumIncidenceRight
+from .survfunc import (SurvfuncRight, survdiff,
+                       CumIncidenceRight)
diff --git a/statsmodels/duration/hazard_regression.py b/statsmodels/duration/hazard_regression.py
index 3f416a69f..ddfe1e465 100644
--- a/statsmodels/duration/hazard_regression.py
+++ b/statsmodels/duration/hazard_regression.py
@@ -15,10 +15,13 @@ hazards model.
 http://www.mwsug.org/proceedings/2006/stats/MWSUG-2006-SD08.pdf
 """
 import numpy as np
+
 from statsmodels.base import model
 import statsmodels.base.model as base
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.compat.pandas import Appender
+
+
 _predict_docstring = """
     Returns predicted values from the proportional hazards
     regression model.
@@ -66,9 +69,11 @@ _predict_docstring = """
     Types `surv` and `cumhaz` require estimation of the cumulative
     hazard function.
 """
+
 _predict_params_doc = """
     params : array_like
         The proportional hazards model parameters."""
+
 _predict_cov_params_docstring = """
     cov_params : array_like
         The covariance matrix of the estimated `params` vector,
@@ -76,10 +81,11 @@ _predict_cov_params_docstring = """
         otherwise optional."""


+
 class PHSurvivalTime:

-    def __init__(self, time, status, exog, strata=None, entry=None, offset=None
-        ):
+    def __init__(self, time, status, exog, strata=None, entry=None,
+                 offset=None):
         """
         Represent a collection of survival times with possible
         stratification and left truncation.
@@ -108,74 +114,162 @@ class PHSurvivalTime:
         offset : array_like
             An optional array of offsets
         """
+
+        # Default strata
         if strata is None:
             strata = np.zeros(len(time), dtype=np.int32)
+
+        # Default entry times
         if entry is None:
             entry = np.zeros(len(time))
+
+        # Parameter validity checks.
         self._check(time, status, strata, entry)
+
+        # Get the row indices for the cases in each stratum
         stu = np.unique(strata)
         sth = {x: [] for x in stu}
-        for i, k in enumerate(strata):
+        for i,k in enumerate(strata):
             sth[k].append(i)
         stratum_rows = [np.asarray(sth[k], dtype=np.int32) for k in stu]
         stratum_names = stu
-        ix = [i for i, ix in enumerate(stratum_rows) if status[ix].sum() > 0]
+
+        # Remove strata with no events
+        ix = [i for i,ix in enumerate(stratum_rows) if status[ix].sum() > 0]
         self.nstrat_orig = len(stratum_rows)
         stratum_rows = [stratum_rows[i] for i in ix]
         stratum_names = [stratum_names[i] for i in ix]
+
+        # The number of strata
         nstrat = len(stratum_rows)
         self.nstrat = nstrat
-        for stx, ix in enumerate(stratum_rows):
+
+        # Remove subjects whose entry time occurs after the last event
+        # in their stratum.
+        for stx,ix in enumerate(stratum_rows):
             last_failure = max(time[ix][status[ix] == 1])
-            ii = [i for i, t in enumerate(entry[ix]) if t <= last_failure]
+
+            # Stata uses < here, R uses <=
+            ii = [i for i,t in enumerate(entry[ix]) if
+                  t <= last_failure]
             stratum_rows[stx] = stratum_rows[stx][ii]
-        for stx, ix in enumerate(stratum_rows):
+
+        # Remove subjects who are censored before the first event in
+        # their stratum.
+        for stx,ix in enumerate(stratum_rows):
             first_failure = min(time[ix][status[ix] == 1])
-            ii = [i for i, t in enumerate(time[ix]) if t >= first_failure]
+
+            ii = [i for i,t in enumerate(time[ix]) if
+                  t >= first_failure]
             stratum_rows[stx] = stratum_rows[stx][ii]
-        for stx, ix in enumerate(stratum_rows):
+
+        # Order by time within each stratum
+        for stx,ix in enumerate(stratum_rows):
             ii = np.argsort(time[ix])
             stratum_rows[stx] = stratum_rows[stx][ii]
+
         if offset is not None:
             self.offset_s = []
             for stx in range(nstrat):
                 self.offset_s.append(offset[stratum_rows[stx]])
         else:
             self.offset_s = None
+
+        # Number of informative subjects
         self.n_obs = sum([len(ix) for ix in stratum_rows])
+
         self.stratum_rows = stratum_rows
         self.stratum_names = stratum_names
+
+        # Split everything by stratum
         self.time_s = self._split(time)
         self.exog_s = self._split(exog)
         self.status_s = self._split(status)
         self.entry_s = self._split(entry)
-        self.ufailt_ix, self.risk_enter, self.risk_exit, self.ufailt = [], [
-            ], [], []
+
+        # Precalculate some indices needed to fit Cox models.
+        # Distinct failure times within a stratum are always taken to
+        # be sorted in ascending order.
+        #
+        # ufailt_ix[stx][k] is a list of indices for subjects who fail
+        # at the k^th sorted unique failure time in stratum stx
+        #
+        # risk_enter[stx][k] is a list of indices for subjects who
+        # enter the risk set at the k^th sorted unique failure time in
+        # stratum stx
+        #
+        # risk_exit[stx][k] is a list of indices for subjects who exit
+        # the risk set at the k^th sorted unique failure time in
+        # stratum stx
+        self.ufailt_ix, self.risk_enter, self.risk_exit, self.ufailt =\
+            [], [], [], []
+
         for stx in range(self.nstrat):
+
+            # All failure times
             ift = np.flatnonzero(self.status_s[stx] == 1)
             ft = self.time_s[stx][ift]
+
+            # Unique failure times
             uft = np.unique(ft)
             nuft = len(uft)
-            uft_map = dict([(x, i) for i, x in enumerate(uft)])
+
+            # Indices of cases that fail at each unique failure time
+            #uft_map = {x:i for i,x in enumerate(uft)} # requires >=2.7
+            uft_map = dict([(x, i) for i,x in enumerate(uft)]) # 2.6
             uft_ix = [[] for k in range(nuft)]
-            for ix, ti in zip(ift, ft):
+            for ix,ti in zip(ift,ft):
                 uft_ix[uft_map[ti]].append(ix)
+
+            # Indices of cases (failed or censored) that enter the
+            # risk set at each unique failure time.
             risk_enter1 = [[] for k in range(nuft)]
-            for i, t in enumerate(self.time_s[stx]):
-                ix = np.searchsorted(uft, t, 'right') - 1
+            for i,t in enumerate(self.time_s[stx]):
+                ix = np.searchsorted(uft, t, "right") - 1
                 if ix >= 0:
                     risk_enter1[ix].append(i)
+
+            # Indices of cases (failed or censored) that exit the
+            # risk set at each unique failure time.
             risk_exit1 = [[] for k in range(nuft)]
-            for i, t in enumerate(self.entry_s[stx]):
+            for i,t in enumerate(self.entry_s[stx]):
                 ix = np.searchsorted(uft, t)
                 risk_exit1[ix].append(i)
+
             self.ufailt.append(uft)
-            self.ufailt_ix.append([np.asarray(x, dtype=np.int32) for x in
-                uft_ix])
-            self.risk_enter.append([np.asarray(x, dtype=np.int32) for x in
-                risk_enter1])
-            self.risk_exit.append([np.asarray(x, dtype=np.int32) for x in
-                risk_exit1])
+            self.ufailt_ix.append([np.asarray(x, dtype=np.int32)
+                                   for x in uft_ix])
+            self.risk_enter.append([np.asarray(x, dtype=np.int32)
+                                    for x in risk_enter1])
+            self.risk_exit.append([np.asarray(x, dtype=np.int32)
+                                   for x in risk_exit1])
+
+    def _split(self, x):
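+        # Split an array (by rows if 2d) into per-stratum pieces, using
+        # the row indices stored in self.stratum_rows.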
+        v = []
+        if x.ndim == 1:
+            for ix in self.stratum_rows:
+                v.append(x[ix])
+        else:
+            for ix in self.stratum_rows:
+                v.append(x[ix, :])
+        return v
+
+    def _check(self, time, status, strata, entry):
+        n1, n2, n3, n4 = len(time), len(status), len(strata),\
+            len(entry)
+        nv = [n1, n2, n3, n4]
+        if max(nv) != min(nv):
+            raise ValueError("endog, status, strata, and " +
+                             "entry must all have the same length")
+        if min(time) < 0:
+            raise ValueError("endog must be non-negative")
+        if min(entry) < 0:
+            raise ValueError("entry time must be non-negative")
+
+        # In Stata, this is entry >= time, in R it is >.
+        if np.any(entry > time):
+            raise ValueError("entry times may not occur " +
+                             "after event or censoring times")


 class PHReg(model.LikelihoodModel):
@@ -218,12 +312,21 @@ class PHReg(model.LikelihoodModel):
     of `exog` all must have the same length
     """

-    def __init__(self, endog, exog, status=None, entry=None, strata=None,
-        offset=None, ties='breslow', missing='drop', **kwargs):
+    def __init__(self, endog, exog, status=None, entry=None,
+                 strata=None, offset=None, ties='breslow',
+                 missing='drop', **kwargs):
+
+        # Default is no censoring
         if status is None:
             status = np.ones(len(endog))
-        super(PHReg, self).__init__(endog, exog, status=status, entry=entry,
-            strata=strata, offset=offset, missing=missing, **kwargs)
+
+        super(PHReg, self).__init__(endog, exog, status=status,
+                                    entry=entry, strata=strata,
+                                    offset=offset, missing=missing,
+                                    **kwargs)
+
+        # endog and exog are automatically converted, but these are
+        # not
         if self.status is not None:
             self.status = np.asarray(self.status)
         if self.entry is not None:
@@ -232,23 +335,31 @@ class PHReg(model.LikelihoodModel):
             self.strata = np.asarray(self.strata)
         if self.offset is not None:
             self.offset = np.asarray(self.offset)
-        self.surv = PHSurvivalTime(self.endog, self.status, self.exog, self
-            .strata, self.entry, self.offset)
+
+        self.surv = PHSurvivalTime(self.endog, self.status,
+                                    self.exog, self.strata,
+                                    self.entry, self.offset)
         self.nobs = len(self.endog)
         self.groups = None
+
+        # TODO: not used?
         self.missing = missing
-        self.df_resid = float(self.exog.shape[0] - np.linalg.matrix_rank(
-            self.exog))
+
+        self.df_resid = float(self.exog.shape[0] -
+                              np.linalg.matrix_rank(self.exog))
         self.df_model = float(np.linalg.matrix_rank(self.exog))
+
         ties = ties.lower()
-        if ties not in ('efron', 'breslow'):
-            raise ValueError('`ties` must be either `efron` or ' + '`breslow`')
+        if ties not in ("efron", "breslow"):
+            raise ValueError("`ties` must be either `efron` or " +
+                             "`breslow`")
+
         self.ties = ties

     @classmethod
-    def from_formula(cls, formula, data, status=None, entry=None, strata=
-        None, offset=None, subset=None, ties='breslow', missing='drop', *
-        args, **kwargs):
+    def from_formula(cls, formula, data, status=None, entry=None,
+                     strata=None, offset=None, subset=None,
+                     ties='breslow', missing='drop', *args, **kwargs):
         """
         Create a proportional hazards regression model from a formula
         and dataframe.
@@ -294,7 +405,32 @@ class PHReg(model.LikelihoodModel):
         -------
         model : PHReg model instance
         """
-        pass
+
+        # Allow array arguments to be passed by column name.
+        if isinstance(status, str):
+            status = data[status]
+        if isinstance(entry, str):
+            entry = data[entry]
+        if isinstance(strata, str):
+            strata = data[strata]
+        if isinstance(offset, str):
+            offset = data[offset]
+
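+        # The proportional hazards model has no intercept, so warn if the
+        # formula explicitly includes ('1') or suppresses ('0') one.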
+        import re
+        terms = re.split(r"[+\-~]", formula)
+        for term in terms:
+            term = term.strip()
+            if term in ("0", "1"):
+                import warnings
+                warnings.warn("PHReg formulas should not include any '0' or '1' terms")
+
+        mod = super(PHReg, cls).from_formula(formula, data,
+                    status=status, entry=entry, strata=strata,
+                    offset=offset, subset=subset, ties=ties,
+                    missing=missing, drop_cols=["Intercept"], *args,
+                    **kwargs)
+
+        return mod

     def fit(self, groups=None, **args):
         """
@@ -312,11 +448,34 @@ class PHReg(model.LikelihoodModel):
         PHRegResults
             Returns a results instance.
         """
-        pass

-    def fit_regularized(self, method='elastic_net', alpha=0.0, start_params
-        =None, refit=False, **kwargs):
-        """
+        # TODO process for missing values
+        if groups is not None:
+            if len(groups) != len(self.endog):
+                msg = ("len(groups) = %d and len(endog) = %d differ" %
+                       (len(groups), len(self.endog)))
+                raise ValueError(msg)
+            self.groups = np.asarray(groups)
+        else:
+            self.groups = None
+
+        if 'disp' not in args:
+            args['disp'] = False
+
+        fit_rslts = super(PHReg, self).fit(**args)
+
+        if self.groups is None:
+            cov_params = fit_rslts.cov_params()
+        else:
+            cov_params = self.robust_covariance(fit_rslts.params)
+
+        results = PHRegResults(self, fit_rslts.params, cov_params)
+
+        return results
+
+    def fit_regularized(self, method="elastic_net", alpha=0.,
+                        start_params=None, refit=False, **kwargs):
+        r"""
         Return a regularized fit to a linear regression model.

         Parameters
@@ -351,7 +510,7 @@ class PHReg(model.LikelihoodModel):

         .. math::

-            -loglike/n + alpha*((1-L1\\_wt)*|params|_2^2/2 + L1\\_wt*|params|_1)
+            -loglike/n + alpha*((1-L1\_wt)*|params|_2^2/2 + L1\_wt*|params|_1)

         where :math:`|*|_1` and :math:`|*|_2` are the L1 and L2 norms.

@@ -370,27 +529,54 @@ class PHReg(model.LikelihoodModel):
         zero_tol : float
             Coefficients below this threshold are treated as zero.
         """
-        pass
+
+        from statsmodels.base.elastic_net import fit_elasticnet
+
+        if method != "elastic_net":
+            raise ValueError("method for fit_regularized must be elastic_net")
+
+        defaults = {"maxiter" : 50, "L1_wt" : 1, "cnvrg_tol" : 1e-10,
+                    "zero_tol" : 1e-10}
+        defaults.update(kwargs)
+
+        return fit_elasticnet(self, method=method,
+                              alpha=alpha,
+                              start_params=start_params,
+                              refit=refit,
+                              **defaults)
+

     def loglike(self, params):
         """
         Returns the log partial likelihood function evaluated at
         `params`.
         """
-        pass
+
+        if self.ties == "breslow":
+            return self.breslow_loglike(params)
+        elif self.ties == "efron":
+            return self.efron_loglike(params)

     def score(self, params):
         """
         Returns the score function evaluated at `params`.
         """
-        pass
+
+        if self.ties == "breslow":
+            return self.breslow_gradient(params)
+        elif self.ties == "efron":
+            return self.efron_gradient(params)

     def hessian(self, params):
         """
         Returns the Hessian matrix of the log partial likelihood
         function evaluated at `params`.
         """
-        pass
+
+        if self.ties == "breslow":
+            return self.breslow_hessian(params)
+        else:
+            return self.efron_hessian(params)

     def breslow_loglike(self, params):
         """
@@ -398,7 +584,42 @@ class PHReg(model.LikelihoodModel):
         evaluated at `params`, using the Breslow method to handle tied
         times.
         """
-        pass
+
+        surv = self.surv
+
+        like = 0.
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            uft_ix = surv.ufailt_ix[stx]
+            exog_s = surv.exog_s[stx]
+            nuft = len(uft_ix)
+
+            linpred = np.dot(exog_s, params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            linpred -= linpred.max()
+            e_linpred = np.exp(linpred)
+
+            xp0 = 0.
+
+            # Iterate backward through the unique failure times.
+            for i in range(nuft)[::-1]:
+
+                # Update for new cases entering the risk set.
+                ix = surv.risk_enter[stx][i]
+                xp0 += e_linpred[ix].sum()
+
+                # Account for all cases that fail at this point.
+                ix = uft_ix[i]
+                like += (linpred[ix] - np.log(xp0)).sum()
+
+                # Update for cases leaving the risk set.
+                ix = surv.risk_exit[stx][i]
+                xp0 -= e_linpred[ix].sum()
+
+        return like

     def efron_loglike(self, params):
         """
@@ -406,28 +627,215 @@ class PHReg(model.LikelihoodModel):
         evaluated at `params`, using the Efron method to handle tied
         times.
         """
-        pass
+
+        surv = self.surv
+
+        like = 0.
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            # exog and linear predictor for this stratum
+            exog_s = surv.exog_s[stx]
+            linpred = np.dot(exog_s, params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            linpred -= linpred.max()
+            e_linpred = np.exp(linpred)
+
+            xp0 = 0.
+
+            # Iterate backward through the unique failure times.
+            uft_ix = surv.ufailt_ix[stx]
+            nuft = len(uft_ix)
+            for i in range(nuft)[::-1]:
+
+                # Update for new cases entering the risk set.
+                ix = surv.risk_enter[stx][i]
+                xp0 += e_linpred[ix].sum()
+                xp0f = e_linpred[uft_ix[i]].sum()
+
+                # Account for all cases that fail at this point.
+                ix = uft_ix[i]
+                like += linpred[ix].sum()
+
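+                # Efron correction for ties: the j^th of the m tied
+                # failures has a fraction j/m of the tied failures'
+                # risk mass removed from the denominator.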
+                m = len(ix)
+                J = np.arange(m, dtype=np.float64) / m
+                like -= np.log(xp0 - J*xp0f).sum()
+
+                # Update for cases leaving the risk set.
+                ix = surv.risk_exit[stx][i]
+                xp0 -= e_linpred[ix].sum()
+
+        return like

     def breslow_gradient(self, params):
         """
         Returns the gradient of the log partial likelihood, using the
         Breslow method to handle tied times.
         """
-        pass
+
+        surv = self.surv
+
+        grad = 0.
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            # Indices of subjects in the stratum
+            strat_ix = surv.stratum_rows[stx]
+
+            # Unique failure times in the stratum
+            uft_ix = surv.ufailt_ix[stx]
+            nuft = len(uft_ix)
+
+            # exog and linear predictor for the stratum
+            exog_s = surv.exog_s[stx]
+            linpred = np.dot(exog_s, params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            linpred -= linpred.max()
+            e_linpred = np.exp(linpred)
+
+            xp0, xp1 = 0., 0.
+
+            # Iterate backward through the unique failure times.
+            for i in range(nuft)[::-1]:
+
+                # Update for new cases entering the risk set.
+                ix = surv.risk_enter[stx][i]
+                if len(ix) > 0:
+                    v = exog_s[ix,:]
+                    xp0 += e_linpred[ix].sum()
+                    xp1 += (e_linpred[ix][:,None] * v).sum(0)
+
+                # Account for all cases that fail at this point.
+                ix = uft_ix[i]
+                grad += (exog_s[ix,:] - xp1 / xp0).sum(0)
+
+                # Update for cases leaving the risk set.
+                ix = surv.risk_exit[stx][i]
+                if len(ix) > 0:
+                    v = exog_s[ix,:]
+                    xp0 -= e_linpred[ix].sum()
+                    xp1 -= (e_linpred[ix][:,None] * v).sum(0)
+
+        return grad

     def efron_gradient(self, params):
         """
         Returns the gradient of the log partial likelihood evaluated
         at `params`, using the Efron method to handle tied times.
         """
-        pass
+
+        surv = self.surv
+
+        grad = 0.
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            # Indices of cases in the stratum
+            strat_ix = surv.stratum_rows[stx]
+
+            # exog and linear predictor of the stratum
+            exog_s = surv.exog_s[stx]
+            linpred = np.dot(exog_s, params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            linpred -= linpred.max()
+            e_linpred = np.exp(linpred)
+
+            xp0, xp1 = 0., 0.
+
+            # Iterate backward through the unique failure times.
+            uft_ix = surv.ufailt_ix[stx]
+            nuft = len(uft_ix)
+            for i in range(nuft)[::-1]:
+
+                # Update for new cases entering the risk set.
+                ix = surv.risk_enter[stx][i]
+                if len(ix) > 0:
+                    v = exog_s[ix,:]
+                    xp0 += e_linpred[ix].sum()
+                    xp1 += (e_linpred[ix][:,None] * v).sum(0)
+                ixf = uft_ix[i]
+                if len(ixf) > 0:
+                    v = exog_s[ixf,:]
+                    xp0f = e_linpred[ixf].sum()
+                    xp1f = (e_linpred[ixf][:,None] * v).sum(0)
+
+                    # Consider all cases that fail at this point.
+                    grad += v.sum(0)
+
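+                    # Efron correction: remove a fraction j/m of the
+                    # tied failures' contribution from both the
+                    # numerator and the denominator.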
+                    m = len(ixf)
+                    J = np.arange(m, dtype=np.float64) / m
+                    numer = xp1 - np.outer(J, xp1f)
+                    denom = xp0 - np.outer(J, xp0f)
+                    ratio = numer / denom
+                    rsum = ratio.sum(0)
+                    grad -= rsum
+
+                # Update for cases leaving the risk set.
+                ix = surv.risk_exit[stx][i]
+                if len(ix) > 0:
+                    v = exog_s[ix,:]
+                    xp0 -= e_linpred[ix].sum()
+                    xp1 -= (e_linpred[ix][:,None] * v).sum(0)
+
+        return grad

     def breslow_hessian(self, params):
         """
         Returns the Hessian of the log partial likelihood evaluated at
         `params`, using the Breslow method to handle tied times.
         """
-        pass
+
+        surv = self.surv
+
+        hess = 0.
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            uft_ix = surv.ufailt_ix[stx]
+            nuft = len(uft_ix)
+
+            exog_s = surv.exog_s[stx]
+
+            linpred = np.dot(exog_s, params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            linpred -= linpred.max()
+            e_linpred = np.exp(linpred)
+
+            xp0, xp1, xp2 = 0., 0., 0.
+
+            # Iterate backward through the unique failure times.
+            for i in range(nuft)[::-1]:
+
+                # Update for new cases entering the risk set.
+                ix = surv.risk_enter[stx][i]
+                if len(ix) > 0:
+                    xp0 += e_linpred[ix].sum()
+                    v = exog_s[ix,:]
+                    xp1 += (e_linpred[ix][:,None] * v).sum(0)
+                    elx = e_linpred[ix]
+                    xp2 += np.einsum("ij,ik,i->jk", v, v, elx)
+
+                # Account for all cases that fail at this point.
+                m = len(uft_ix[i])
+                hess += m*(xp2 / xp0 - np.outer(xp1, xp1) / xp0**2)
+
+                # Update for cases leaving the risk set.
+                ix = surv.risk_exit[stx][i]
+                if len(ix) > 0:
+                    xp0 -= e_linpred[ix].sum()
+                    v = exog_s[ix,:]
+                    xp1 -= (e_linpred[ix][:,None] * v).sum(0)
+                    elx = e_linpred[ix]
+                    xp2 -= np.einsum("ij,ik,i->jk", v, v, elx)
+        return -hess

     def efron_hessian(self, params):
         """
@@ -435,7 +843,65 @@ class PHReg(model.LikelihoodModel):
         evaluated at `params`, using the Efron method to handle tied
         times.
         """
-        pass
+
+        surv = self.surv
+
+        hess = 0.
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            exog_s = surv.exog_s[stx]
+
+            linpred = np.dot(exog_s, params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            linpred -= linpred.max()
+            e_linpred = np.exp(linpred)
+
+            xp0, xp1, xp2 = 0., 0., 0.
+
+            # Iterate backward through the unique failure times.
+            uft_ix = surv.ufailt_ix[stx]
+            nuft = len(uft_ix)
+            for i in range(nuft)[::-1]:
+
+                # Update for new cases entering the risk set.
+                ix = surv.risk_enter[stx][i]
+                if len(ix) > 0:
+                    xp0 += e_linpred[ix].sum()
+                    v = exog_s[ix,:]
+                    xp1 += (e_linpred[ix][:,None] * v).sum(0)
+                    elx = e_linpred[ix]
+                    xp2 += np.einsum("ij,ik,i->jk", v, v, elx)
+
+                ixf = uft_ix[i]
+                if len(ixf) > 0:
+                    v = exog_s[ixf,:]
+                    xp0f = e_linpred[ixf].sum()
+                    xp1f = (e_linpred[ixf][:,None] * v).sum(0)
+                    elx = e_linpred[ixf]
+                    xp2f = np.einsum("ij,ik,i->jk", v, v, elx)
+
+                # Account for all cases that fail at this point.
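+                # (Efron correction: the denominator c0 removes a
+                # fraction j/m of the tied failures' risk mass.)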
+                m = len(uft_ix[i])
+                J = np.arange(m, dtype=np.float64) / m
+                c0 = xp0 - J*xp0f
+                hess += xp2 * np.sum(1 / c0)
+                hess -= xp2f * np.sum(J / c0)
+                mat = (xp1[None, :] - np.outer(J, xp1f)) / c0[:, None]
+                hess -= np.einsum("ij,ik->jk", mat, mat)
+
+                # Update for cases leaving the risk set.
+                ix = surv.risk_exit[stx][i]
+                if len(ix) > 0:
+                    xp0 -= e_linpred[ix].sum()
+                    v = exog_s[ix,:]
+                    xp1 -= (e_linpred[ix][:,None] * v).sum(0)
+                    elx = e_linpred[ix]
+                    xp2 -= np.einsum("ij,ik,i->jk", v, v, elx)
+
+        return -hess

     def robust_covariance(self, params):
         """
@@ -459,7 +925,30 @@ class PHReg(model.LikelihoodModel):
         within which observations may be dependent.  The covariance
         matrix is calculated using the Huber-White "sandwich" approach.
         """
-        pass
+
+        if self.groups is None:
+            raise ValueError("`groups` must be specified to calculate the robust covariance matrix")
+
+        hess = self.hessian(params)
+
+        score_obs = self.score_residuals(params)
+
+        # Collapse
+        grads = {}
+        for i,g in enumerate(self.groups):
+            if g not in grads:
+                grads[g] = 0.
+            grads[g] += score_obs[i, :]
+        grads = np.asarray(list(grads.values()))
+
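+        # Sum of outer products of the per-group score sums; this is
+        # the "meat" of the sandwich covariance estimator.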
+        mat = grads[None, :, :]
+        mat = mat.T * mat
+        mat = mat.sum(1)
+
+        hess_inv = np.linalg.inv(hess)
+        cmat = np.dot(hess_inv, np.dot(mat, hess_inv))
+
+        return cmat

     def score_residuals(self, params):
         """
@@ -482,7 +971,70 @@ class PHReg(model.LikelihoodModel):
         Observations in a stratum with no observed events have undefined
         score residuals, and contain NaN in the returned matrix.
         """
-        pass
+
+        surv = self.surv
+
+        score_resid = np.zeros(self.exog.shape, dtype=np.float64)
+
+        # Use to set undefined values to NaN.
+        mask = np.zeros(self.exog.shape[0], dtype=np.int32)
+
+        w_avg = self.weighted_covariate_averages(params)
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            uft_ix = surv.ufailt_ix[stx]
+            exog_s = surv.exog_s[stx]
+            nuft = len(uft_ix)
+            strat_ix = surv.stratum_rows[stx]
+
+            xp0 = 0.
+
+            linpred = np.dot(exog_s, params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            linpred -= linpred.max()
+            e_linpred = np.exp(linpred)
+
+            at_risk_ix = set()
+
+            # Iterate backward through the unique failure times.
+            for i in range(nuft)[::-1]:
+
+                # Update for new cases entering the risk set.
+                ix = surv.risk_enter[stx][i]
+                at_risk_ix |= set(ix)
+                xp0 += e_linpred[ix].sum()
+
+                atr_ix = list(at_risk_ix)
+                leverage = exog_s[atr_ix, :] - w_avg[stx][i, :]
+
+                # Event indicators
+                d = np.zeros(exog_s.shape[0])
+                d[uft_ix[i]] = 1
+
+                # The increment in the cumulative hazard
+                dchaz = len(uft_ix[i]) / xp0
+
+                # Piece of the martingale residual
+                mrp = d[atr_ix] - e_linpred[atr_ix] * dchaz
+
+                # Update the score residuals
+                ii = strat_ix[atr_ix]
+                score_resid[ii,:] += leverage * mrp[:, None]
+                mask[ii] = 1
+
+                # Update for cases leaving the risk set.
+                ix = surv.risk_exit[stx][i]
+                at_risk_ix -= set(ix)
+                xp0 -= e_linpred[ix].sum()
+
+        jj = np.flatnonzero(mask == 0)
+        if len(jj) > 0:
+            score_resid[jj, :] = np.nan
+
+        return score_resid

     def weighted_covariate_averages(self, params):
         """
@@ -506,7 +1058,46 @@ class PHReg(model.LikelihoodModel):
         -----
         Used to calculate leverages and score residuals.
         """
-        pass
+
+        surv = self.surv
+
+        averages = []
+        xp0, xp1 = 0., 0.
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            uft_ix = surv.ufailt_ix[stx]
+            exog_s = surv.exog_s[stx]
+            nuft = len(uft_ix)
+
+            average_s = np.zeros((len(uft_ix), exog_s.shape[1]),
+                                  dtype=np.float64)
+
+            linpred = np.dot(exog_s, params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            linpred -= linpred.max()
+            e_linpred = np.exp(linpred)
+
+            # Iterate backward through the unique failure times.
+            for i in range(nuft)[::-1]:
+
+                # Update for new cases entering the risk set.
+                ix = surv.risk_enter[stx][i]
+                xp0 += e_linpred[ix].sum()
+                xp1 += np.dot(e_linpred[ix], exog_s[ix, :])
+
+                average_s[i, :] = xp1 / xp0
+
+                # Update for cases leaving the risk set.
+                ix = surv.risk_exit[stx][i]
+                xp0 -= e_linpred[ix].sum()
+                xp1 -= np.dot(e_linpred[ix], exog_s[ix, :])
+
+            averages.append(average_s)
+
+        return averages

     def baseline_cumulative_hazard(self, params):
         """
@@ -528,7 +1119,49 @@ class PHReg(model.LikelihoodModel):
         -----
         Uses the Nelson-Aalen estimator.
         """
-        pass
+
+        # TODO: some disagreements with R, not the same algorithm but
+        # hard to deduce what R is doing.  Our results are reasonable.
+
+        surv = self.surv
+        rslt = []
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            uft = surv.ufailt[stx]
+            uft_ix = surv.ufailt_ix[stx]
+            exog_s = surv.exog_s[stx]
+            nuft = len(uft_ix)
+
+            linpred = np.dot(exog_s, params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            e_linpred = np.exp(linpred)
+
+            xp0 = 0.
+            h0 = np.zeros(nuft, dtype=np.float64)
+
+            # Iterate backward through the unique failure times.
+            for i in range(nuft)[::-1]:
+
+                # Update for new cases entering the risk set.
+                ix = surv.risk_enter[stx][i]
+                xp0 += e_linpred[ix].sum()
+
+                # Account for all cases that fail at this point.
+                ix = uft_ix[i]
+                h0[i] = len(ix) / xp0
+
+                # Update for cases leaving the risk set.
+                ix = surv.risk_exit[stx][i]
+                xp0 -= e_linpred[ix].sum()
+
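+            # Cumulative hazard accumulated prior to each failure time,
+            # and the corresponding baseline survival function.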
+            cumhaz = np.cumsum(h0) - h0
+            current_strata_surv = np.exp(-cumhaz)
+            rslt.append([uft, cumhaz, current_strata_surv])
+
+        return rslt

     def baseline_cumulative_hazard_function(self, params):
         """
@@ -545,7 +1178,112 @@ class PHReg(model.LikelihoodModel):
         A dict mapping stratum names to the estimated baseline
         cumulative hazard function.
         """
-        pass
+
+        from scipy.interpolate import interp1d
+        surv = self.surv
+        base = self.baseline_cumulative_hazard(params)
+
+        cumhaz_f = {}
+        for stx in range(surv.nstrat):
+            time_h = base[stx][0]
+            cumhaz = base[stx][1]
+            time_h = np.r_[-np.inf, time_h, np.inf]
+            cumhaz = np.r_[cumhaz[0], cumhaz, cumhaz[-1]]
+            func = interp1d(time_h, cumhaz, kind='zero')
+            cumhaz_f[self.surv.stratum_names[stx]] = func
+
+        return cumhaz_f
+
+    @Appender(_predict_docstring % {
+        'params_doc': _predict_params_doc,
+        'cov_params_doc': _predict_cov_params_docstring})
+    def predict(self, params, exog=None, cov_params=None, endog=None,
+                strata=None, offset=None, pred_type="lhr", pred_only=False):
+
+        # This function breaks mediation, because it does not simply
+        # return the predicted values as an array.
+
+        pred_type = pred_type.lower()
+        if pred_type not in ["lhr", "hr", "surv", "cumhaz"]:
+            msg = "Type %s not allowed for prediction" % pred_type
+            raise ValueError(msg)
+
+        class bunch:
+            predicted_values = None
+            standard_errors = None
+        ret_val = bunch()
+
+        # Do not do anything with offset here because we want to allow
+        # different offsets to be specified even if exog is the model
+        # exog.
+        exog_provided = True
+        if exog is None:
+            exog = self.exog
+            exog_provided = False
+
+        lhr = np.dot(exog, params)
+        if offset is not None:
+            lhr += offset
+        # Never use self.offset unless we are also using self.exog
+        elif self.offset is not None and not exog_provided:
+            lhr += self.offset
+
+        # Handle lhr and hr prediction first, since they do not make
+        # use of the hazard function.
+
+        if pred_type == "lhr":
+            ret_val.predicted_values = lhr
+            if cov_params is not None:
+                mat = np.dot(exog, cov_params)
+                va = (mat * exog).sum(1)
+                ret_val.standard_errors = np.sqrt(va)
+            if pred_only:
+                return ret_val.predicted_values
+            return ret_val
+
+        hr = np.exp(lhr)
+
+        if pred_type == "hr":
+            ret_val.predicted_values = hr
+            if pred_only:
+                return ret_val.predicted_values
+            return ret_val
+
+        # Makes sure endog is defined
+        if endog is None and exog_provided:
+            msg = "If `exog` is provided `endog` must be provided."
+            raise ValueError(msg)
+        # Use model endog if using model exog
+        elif endog is None and not exog_provided:
+            endog = self.endog
+
+        # Make sure strata is defined
+        if strata is None:
+            if exog_provided and self.surv.nstrat > 1:
+                raise ValueError("`strata` must be provided")
+            if self.strata is None:
+                strata = [self.surv.stratum_names[0],] * len(endog)
+            else:
+                strata = self.strata
+
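+        # Evaluate each subject's cumulative hazard as the stratum-specific
+        # baseline cumulative hazard at `endog`, scaled by the hazard ratio.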
+        cumhaz = np.nan * np.ones(len(endog), dtype=np.float64)
+        stv = np.unique(strata)
+        bhaz = self.baseline_cumulative_hazard_function(params)
+        for stx in stv:
+            ix = np.flatnonzero(strata == stx)
+            func = bhaz[stx]
+            cumhaz[ix] = func(endog[ix]) * hr[ix]
+
+        if pred_type == "cumhaz":
+            ret_val.predicted_values = cumhaz
+
+        elif pred_type == "surv":
+            ret_val.predicted_values = np.exp(-cumhaz)
+
+        if pred_only:
+            return ret_val.predicted_values
+
+        return ret_val

     def get_distribution(self, params, scale=1.0, exog=None):
         """
@@ -572,11 +1310,72 @@ class PHReg(model.LikelihoodModel):
         of the survivor function that puts all mass on the observed
         failure times within a stratum.
         """
-        pass
+
+        surv = self.surv
+        bhaz = self.baseline_cumulative_hazard(params)
+
+        # The arguments to rv_discrete_float, first obtained by
+        # stratum
+        pk, xk = [], []
+
+        if exog is None:
+            exog_split = surv.exog_s
+        else:
+            exog_split = self.surv._split(exog)
+
+        for stx in range(self.surv.nstrat):
+
+            exog_s = exog_split[stx]
+
+            linpred = np.dot(exog_s, params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            e_linpred = np.exp(linpred)
+
+            # The unique failure times for this stratum (the support
+            # of the distribution).
+            pts = bhaz[stx][0]
+
+            # The individual cumulative hazards for everyone in this
+            # stratum.
+            ichaz = np.outer(e_linpred, bhaz[stx][1])
+
+            # The individual survival functions.
+            usurv = np.exp(-ichaz)
+            z = np.zeros((usurv.shape[0], 1))
+            usurv = np.concatenate((usurv, z), axis=1)
+
+            # The individual survival probability masses.
+            probs = -np.diff(usurv, 1)
+
+            pk.append(probs)
+            xk.append(np.outer(np.ones(probs.shape[0]), pts))
+
+        # Pad to make all strata have the same shape
+        mxc = max([x.shape[1] for x in xk])
+        for k in range(self.surv.nstrat):
+            if xk[k].shape[1] < mxc:
+                xk1 = np.zeros((xk[k].shape[0], mxc))
+                pk1 = np.zeros((pk[k].shape[0], mxc))
+                xk1[:, 0:xk[k].shape[1]] = xk[k]
+                pk1[:, 0:pk[k].shape[1]] = pk[k]
+                xk[k], pk[k] = xk1, pk1
+
+        # Put the support points and probabilities into single matrices
+        xka = np.nan * np.ones((len(self.endog), mxc))
+        pka = np.ones((len(self.endog), mxc), dtype=np.float64) / mxc
+        for stx in range(self.surv.nstrat):
+            ix = self.surv.stratum_rows[stx]
+            xka[ix, :] = xk[stx]
+            pka[ix, :] = pk[stx]
+
+        dist = rv_discrete_float(xka, pka)
+
+        return dist


 class PHRegResults(base.LikelihoodModelResults):
-    """
+    '''
     Class to contain results of fitting a Cox proportional hazards
     survival model.

@@ -602,29 +1401,33 @@ class PHRegResults(base.LikelihoodModelResults):
     See Also
     --------
     statsmodels.LikelihoodModelResults
-    """
+    '''
+
+    def __init__(self, model, params, cov_params, scale=1., covariance_type="naive"):
+
+        # There is no scale parameter, but we need it for
+        # meta-procedures that work with results.

-    def __init__(self, model, params, cov_params, scale=1.0,
-        covariance_type='naive'):
         self.covariance_type = covariance_type
         self.df_resid = model.df_resid
         self.df_model = model.df_model
-        super(PHRegResults, self).__init__(model, params, scale=1.0,
-            normalized_cov_params=cov_params)
+
+        super(PHRegResults, self).__init__(model, params, scale=1.,
+           normalized_cov_params=cov_params)

     @cache_readonly
     def standard_errors(self):
         """
         Returns the standard errors of the parameter estimates.
         """
-        pass
+        return np.sqrt(np.diag(self.cov_params()))

     @cache_readonly
     def bse(self):
         """
         Returns the standard errors of the parameter estimates.
         """
-        pass
+        return self.standard_errors

     def get_distribution(self):
         """
@@ -642,13 +1445,27 @@ class PHRegResults(base.LikelihoodModelResults):
         of the survivor function that puts all mass on the observed
         failure times within a stratum.
         """
-        pass
+
+        return self.model.get_distribution(self.params)
+
+    @Appender(_predict_docstring % {'params_doc': '', 'cov_params_doc': ''})
+    def predict(self, endog=None, exog=None, strata=None,
+                offset=None, transform=True, pred_type="lhr"):
+        return super(PHRegResults, self).predict(exog=exog,
+                                                 transform=transform,
+                                                 cov_params=self.cov_params(),
+                                                 endog=endog,
+                                                 strata=strata,
+                                                 offset=offset,
+                                                 pred_type=pred_type)

     def _group_stats(self, groups):
         """
         Descriptive statistics of the groups.
         """
-        pass
+        gsizes = np.unique(groups, return_counts=True)
+        gsizes = gsizes[1]
+        return gsizes.min(), gsizes.max(), gsizes.mean(), len(gsizes)

     @cache_readonly
     def weighted_covariate_averages(self):
@@ -656,14 +1473,14 @@ class PHRegResults(base.LikelihoodModelResults):
         The average covariate values within the at-risk set at each
         event time point, weighted by hazard.
         """
-        pass
+        return self.model.weighted_covariate_averages(self.params)

     @cache_readonly
     def score_residuals(self):
         """
         A matrix containing the score residuals.
         """
-        pass
+        return self.model.score_residuals(self.params)

     @cache_readonly
     def baseline_cumulative_hazard(self):
@@ -671,7 +1488,7 @@ class PHRegResults(base.LikelihoodModelResults):
         A list (corresponding to the strata) containing the baseline
         cumulative hazard function evaluated at the event points.
         """
-        pass
+        return self.model.baseline_cumulative_hazard(self.params)

     @cache_readonly
     def baseline_cumulative_hazard_function(self):
@@ -679,7 +1496,7 @@ class PHRegResults(base.LikelihoodModelResults):
         A list (corresponding to the strata) containing function
         objects that calculate the cumulative hazard function.
         """
-        pass
+        return self.model.baseline_cumulative_hazard_function(self.params)

     @cache_readonly
     def schoenfeld_residuals(self):
@@ -690,16 +1507,70 @@ class PHRegResults(base.LikelihoodModelResults):
         -----
         Schoenfeld residuals for censored observations are set to zero.
         """
-        pass
+
+        surv = self.model.surv
+        w_avg = self.weighted_covariate_averages
+
+        # Initialize at NaN since rows that belong to strata with no
+        # events have undefined residuals.
+        sch_resid = np.nan*np.ones(self.model.exog.shape, dtype=np.float64)
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            uft = surv.ufailt[stx]
+            exog_s = surv.exog_s[stx]
+            time_s = surv.time_s[stx]
+            strat_ix = surv.stratum_rows[stx]
+
+            ii = np.searchsorted(uft, time_s)
+
+            # These subjects are censored after the last event in
+            # their stratum, so have empty risk sets and undefined
+            # residuals.
+            jj = np.flatnonzero(ii < len(uft))
+
+            sch_resid[strat_ix[jj], :] = exog_s[jj, :] - w_avg[stx][ii[jj], :]
+
+        jj = np.flatnonzero(self.model.status == 0)
+        sch_resid[jj, :] = np.nan
+
+        return sch_resid

     @cache_readonly
     def martingale_residuals(self):
         """
         The martingale residuals.
         """
-        pass

-    def summary(self, yname=None, xname=None, title=None, alpha=0.05):
+        surv = self.model.surv
+
+        # Initialize at NaN since rows that belong to strata with no
+        # events have undefined residuals.
+        mart_resid = np.nan*np.ones(len(self.model.endog), dtype=np.float64)
+
+        cumhaz_f_list = self.baseline_cumulative_hazard_function
+
+        # Loop over strata
+        for stx in range(surv.nstrat):
+
+            cumhaz_f = cumhaz_f_list[stx]
+
+            exog_s = surv.exog_s[stx]
+            time_s = surv.time_s[stx]
+
+            linpred = np.dot(exog_s, self.params)
+            if surv.offset_s is not None:
+                linpred += surv.offset_s[stx]
+            e_linpred = np.exp(linpred)
+
+            ii = surv.stratum_rows[stx]
+            chaz = cumhaz_f(time_s)
+            mart_resid[ii] = self.model.status[ii] - e_linpred * chaz
+
+        return mart_resid
+
+    def summary(self, yname=None, xname=None, title=None, alpha=.05):
         """
         Summarize the proportional hazards regression results.

@@ -727,7 +1598,71 @@ class PHRegResults(base.LikelihoodModelResults):
         --------
         statsmodels.iolib.summary2.Summary : class to hold summary results
         """
-        pass
+
+        from statsmodels.iolib import summary2
+        smry = summary2.Summary()
+        float_format = "%8.3f"
+
+        info = {}
+        info["Model:"] = "PH Reg"
+        if yname is None:
+            yname = self.model.endog_names
+        info["Dependent variable:"] = yname
+        info["Ties:"] = self.model.ties.capitalize()
+        info["Sample size:"] = str(self.model.surv.n_obs)
+        info["Num. events:"] = str(int(sum(self.model.status)))
+
+        if self.model.groups is not None:
+            mn, mx, avg, num = self._group_stats(self.model.groups)
+            info["Num groups:"] = "%.0f" % num
+            info["Min group size:"] = "%.0f" % mn
+            info["Max group size:"] = "%.0f" % mx
+            info["Avg group size:"] = "%.1f" % avg
+
+        if self.model.strata is not None:
+            mn, mx, avg, num = self._group_stats(self.model.strata)
+            info["Num strata:"] = "%.0f" % num
+            info["Min stratum size:"] = "%.0f" % mn
+            info["Max stratum size:"] = "%.0f" % mx
+            info["Avg stratum size:"] = "%.1f" % avg
+
+        smry.add_dict(info, align='l', float_format=float_format)
+
+        param = summary2.summary_params(self, alpha=alpha)
+        param = param.rename(columns={"Coef.": "log HR",
+                                      "Std.Err.": "log HR SE"})
+        param.insert(2, "HR", np.exp(param["log HR"]))
+        a = "[%.3f" % (alpha / 2)
+        param.loc[:, a] = np.exp(param.loc[:, a])
+        a = "%.3f]" % (1 - alpha / 2)
+        param.loc[:, a] = np.exp(param.loc[:, a])
+        if xname is not None:
+            param.index = xname
+        smry.add_df(param, float_format=float_format)
+        smry.add_title(title=title, results=self)
+        smry.add_text("Confidence intervals are for the hazard ratios")
+
+        dstrat = self.model.surv.nstrat_orig - self.model.surv.nstrat
+        if dstrat > 0:
+            if dstrat == 1:
+                smry.add_text("1 stratum dropped for having no events")
+            else:
+                smry.add_text("%d strata dropped for having no events" % dstrat)
+
+        if self.model.entry is not None:
+            n_entry = sum(self.model.entry != 0)
+            if n_entry == 1:
+                smry.add_text("1 observation has a positive entry time")
+            else:
+                smry.add_text("%d observations have positive entry times" % n_entry)
+
+        if self.model.groups is not None:
+            smry.add_text("Standard errors account for dependence within groups")
+
+        if hasattr(self, "regularized"):
+            smry.add_text("Standard errors do not account for the regularization")
+
+        return smry


 class rv_discrete_float:
@@ -759,6 +1694,7 @@ class rv_discrete_float:
     """

     def __init__(self, xk, pk):
+
         self.xk = xk
         self.pk = pk
         self.cpk = np.cumsum(self.pk, axis=1)
@@ -775,7 +1711,13 @@ class rv_discrete_float:
         n : not used
             Present for signature compatibility
         """
-        pass
+
+        n = self.xk.shape[0]
+        u = np.random.uniform(size=n)
+
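+        # Inverse-CDF sampling: for each row, take the first support point
+        # whose cumulative probability reaches the uniform draw.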
+        ix = (self.cpk < u[:, None]).sum(1)
+        ii = np.arange(n, dtype=np.int32)
+        return self.xk[(ii,ix)]

     def mean(self):
         """
@@ -786,7 +1728,8 @@ class rv_discrete_float:
         `xk`, using the probabilities in the corresponding row of
         `pk`.
         """
-        pass
+
+        return (self.xk * self.pk).sum(1)

     def var(self):
         """
@@ -797,7 +1740,11 @@ class rv_discrete_float:
         `xk`, using the probabilities in the corresponding row of
         `pk`.
         """
-        pass
+
+        mn = self.mean()
+        xkc = self.xk - mn[:, None]
+
+        return (self.pk * xkc**2).sum(1)

     def std(self):
         """
@@ -808,4 +1755,5 @@ class rv_discrete_float:
         each row of `xk`, using the probabilities in the corresponding
         row of `pk`.
         """
-        pass
+
+        return np.sqrt(self.var())
diff --git a/statsmodels/duration/survfunc.py b/statsmodels/duration/survfunc.py
index d5b00a459..561e4ff03 100644
--- a/statsmodels/duration/survfunc.py
+++ b/statsmodels/duration/survfunc.py
@@ -4,20 +4,157 @@ from scipy.stats.distributions import chi2, norm
 from statsmodels.graphics import utils


-def _calc_survfunc_right(time, status, weights=None, entry=None, compress=
-    True, retall=True):
+def _calc_survfunc_right(time, status, weights=None, entry=None, compress=True,
+                         retall=True):
     """
     Calculate the survival function and its standard error for a single
     group.
     """
-    pass
+
+    # Convert the unique times to ranks (0, 1, 2, ...)
+    if entry is None:
+        utime, rtime = np.unique(time, return_inverse=True)
+    else:
+        tx = np.concatenate((time, entry))
+        utime, rtime = np.unique(tx, return_inverse=True)
+        rtime = rtime[0:len(time)]
+
+    # Number of deaths at each unique time.
+    ml = len(utime)
+    if weights is None:
+        d = np.bincount(rtime, weights=status, minlength=ml)
+    else:
+        d = np.bincount(rtime, weights=status*weights, minlength=ml)
+
+    # Size of risk set just prior to each event time.
+    if weights is None:
+        n = np.bincount(rtime, minlength=ml)
+    else:
+        n = np.bincount(rtime, weights=weights, minlength=ml)
+    if entry is not None:
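+        # Left truncation: the size of the risk set just prior to t is
+        # (number entered before t) - (number failed or censored before t).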
+        n = np.cumsum(n) - n
+        rentry = np.searchsorted(utime, entry, side='left')
+        if weights is None:
+            n0 = np.bincount(rentry, minlength=ml)
+        else:
+            n0 = np.bincount(rentry, weights=weights, minlength=ml)
+        n0 = np.cumsum(n0) - n0
+        n = n0 - n
+    else:
+        n = np.cumsum(n[::-1])[::-1]
+
+    # Only retain times where an event occurred.
+    if compress:
+        ii = np.flatnonzero(d > 0)
+        d = d[ii]
+        n = n[ii]
+        utime = utime[ii]
+
+    # The survival function probabilities.
+    sp = 1 - d / n.astype(np.float64)
+    ii = sp < 1e-16
+    sp[ii] = 1e-16
+    sp = np.log(sp)
+    sp = np.cumsum(sp)
+    sp = np.exp(sp)
+    sp[ii] = 0
+
+    if not retall:
+        return sp, utime, rtime, n, d
+
+    # Standard errors
+    if weights is None:
+        # Greenwood's formula
+        denom = n * (n - d)
+        denom = np.clip(denom, 1e-12, np.inf)
+        se = d / denom.astype(np.float64)
+        se[(n == d) | (n == 0)] = np.nan
+        se = np.cumsum(se)
+        se = np.sqrt(se)
+        locs = np.isfinite(se) | (sp != 0)
+        se[locs] *= sp[locs]
+        se[~locs] = np.nan
+    else:
+        # Tsiatis' (1981) formula
+        se = d / (n * n).astype(np.float64)
+        se = np.cumsum(se)
+        se = np.sqrt(se)
+
+    return sp, se, utime, rtime, n, d


 def _calc_incidence_right(time, status, weights=None):
     """
     Calculate the cumulative incidence function and its standard error.
     """
-    pass
+
+    # Calculate the all-cause survival function.
+    status0 = (status >= 1).astype(np.float64)
+    sp, utime, rtime, n, d = _calc_survfunc_right(time, status0, weights,
+                                                  compress=False, retall=False)
+
+    ngrp = int(status.max())
+
+    # Number of cause-specific deaths at each unique time.
+    d = []
+    for k in range(ngrp):
+        status0 = (status == k + 1).astype(np.float64)
+        if weights is None:
+            d0 = np.bincount(rtime, weights=status0, minlength=len(utime))
+        else:
+            d0 = np.bincount(rtime, weights=status0*weights,
+                             minlength=len(utime))
+        d.append(d0)
+
+    # The cumulative incidence function probabilities.
+    ip = []
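+    # All-cause survival just prior to each event time, divided by the
+    # size of the risk set.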
+    sp0 = np.r_[1, sp[:-1]] / n
+    for k in range(ngrp):
+        ip0 = np.cumsum(sp0 * d[k])
+        ip.append(ip0)
+
+    # The standard error of the cumulative incidence function.
+    if weights is not None:
+        return ip, None, utime
+    se = []
+    da = sum(d)
+    for k in range(ngrp):
+
+        ra = da / (n * (n - da))
+        v = ip[k]**2 * np.cumsum(ra)
+        v -= 2 * ip[k] * np.cumsum(ip[k] * ra)
+        v += np.cumsum(ip[k]**2 * ra)
+
+        ra = (n - d[k]) * d[k] / n
+        v += np.cumsum(sp0**2 * ra)
+
+        ra = sp0 * d[k] / n
+        v -= 2 * ip[k] * np.cumsum(ra)
+        v += 2 * np.cumsum(ip[k] * ra)
+
+        se.append(np.sqrt(v))
+
+    return ip, se, utime
+
+
+def _checkargs(time, status, entry, freq_weights, exog):
+
+    if len(time) != len(status):
+        raise ValueError("time and status must have the same length")
+
+    if entry is not None and (len(entry) != len(time)):
+        msg = "entry times and event times must have the same length"
+        raise ValueError(msg)
+
+    if entry is not None and np.any(entry >= time):
+        msg = "Entry times must not occur on or after event times"
+        raise ValueError(msg)
+
+    if freq_weights is not None and (len(freq_weights) != len(time)):
+        raise ValueError("weights, time and status must have the same length")
+
+    if exog is not None and (exog.shape[0] != len(time)):
+        raise ValueError("the rows of exog should align with time")


 class CumIncidenceRight:
@@ -95,29 +232,32 @@ class CumIncidenceRight:
     https://arxiv.org/pdf/math/0409180.pdf
     """

-    def __init__(self, time, status, title=None, freq_weights=None, exog=
-        None, bw_factor=1.0, dimred=True):
+    def __init__(self, time, status, title=None, freq_weights=None,
+                 exog=None, bw_factor=1., dimred=True):
+
         _checkargs(time, status, None, freq_weights, None)
         time = self.time = np.asarray(time)
         status = self.status = np.asarray(status)
         if freq_weights is not None:
             freq_weights = self.freq_weights = np.asarray(freq_weights)
+
         if exog is not None:
             from ._kernel_estimates import _kernel_cumincidence
             exog = self.exog = np.asarray(exog)
             nobs = exog.shape[0]
-            kw = nobs ** (-1 / 3.0) * bw_factor
-            kfunc = lambda x: np.exp(-x ** 2 / kw ** 2).sum(1)
-            x = _kernel_cumincidence(time, status, exog, kfunc,
-                freq_weights, dimred)
+            kw = nobs**(-1/3.0) * bw_factor
+            kfunc = lambda x: np.exp(-x**2 / kw**2).sum(1)
+            x = _kernel_cumincidence(time, status, exog, kfunc, freq_weights,
+                                     dimred)
             self.times = x[0]
             self.cinc = x[1]
             return
+
         x = _calc_incidence_right(time, status, freq_weights)
         self.cinc = x[0]
         self.cinc_se = x[1]
         self.times = x[2]
-        self.title = '' if not title else title
+        self.title = "" if not title else title


 class SurvfuncRight:
@@ -188,35 +328,40 @@ class SurvfuncRight:
     https://arxiv.org/pdf/math/0409180.pdf
     """

-    def __init__(self, time, status, entry=None, title=None, freq_weights=
-        None, exog=None, bw_factor=1.0):
+    def __init__(self, time, status, entry=None, title=None,
+                 freq_weights=None, exog=None, bw_factor=1.):
+
         _checkargs(time, status, entry, freq_weights, exog)
         time = self.time = np.asarray(time)
         status = self.status = np.asarray(status)
         if freq_weights is not None:
             freq_weights = self.freq_weights = np.asarray(freq_weights)
+
         if entry is not None:
             entry = self.entry = np.asarray(entry)
+
         if exog is not None:
             if entry is not None:
-                raise ValueError('exog and entry cannot both be present')
+                raise ValueError("exog and entry cannot both be present")
             from ._kernel_estimates import _kernel_survfunc
             exog = self.exog = np.asarray(exog)
             nobs = exog.shape[0]
-            kw = nobs ** (-1 / 3.0) * bw_factor
-            kfunc = lambda x: np.exp(-x ** 2 / kw ** 2).sum(1)
+            kw = nobs**(-1/3.0) * bw_factor
+            kfunc = lambda x: np.exp(-x**2 / kw**2).sum(1)
             x = _kernel_survfunc(time, status, exog, kfunc, freq_weights)
             self.surv_prob = x[0]
             self.surv_times = x[1]
             return
-        x = _calc_survfunc_right(time, status, weights=freq_weights, entry=
-            entry)
+
+        x = _calc_survfunc_right(time, status, weights=freq_weights,
+                                 entry=entry)
+
         self.surv_prob = x[0]
         self.surv_prob_se = x[1]
         self.surv_times = x[2]
         self.n_risk = x[4]
         self.n_events = x[5]
-        self.title = '' if not title else title
+        self.title = "" if not title else title

     def plot(self, ax=None):
         """
@@ -243,7 +388,8 @@ class SurvfuncRight:
         >>> li = ax.get_lines()
         >>> li[1].set_visible(False)
         """
-        pass
+
+        return plot_survfunc(self, ax)

     def quantile(self, p):
         """
@@ -257,7 +403,14 @@ class SurvfuncRight:

         Returns the estimated quantile.
         """
-        pass
+
+        # SAS uses a strict inequality here.
+        ii = np.flatnonzero(self.surv_prob < 1 - p)
+
+        if len(ii) == 0:
+            return np.nan
+
+        return self.surv_times[ii[0]]

     def quantile_ci(self, p, alpha=0.05, method='cloglog'):
         """
@@ -293,7 +446,43 @@ class SurvfuncRight:

           http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_lifetest_details03.htm
         """
-        pass
+
+        tr = norm.ppf(1 - alpha / 2)
+
+        method = method.lower()
+        if method == "cloglog":
+            g = lambda x: np.log(-np.log(x))
+            gprime = lambda x: -1 / (x * np.log(x))
+        elif method == "linear":
+            g = lambda x: x
+            gprime = lambda x: 1
+        elif method == "log":
+            g = np.log
+            gprime = lambda x: 1 / x
+        elif method == "logit":
+            g = lambda x: np.log(x / (1 - x))
+            gprime = lambda x: 1 / (x * (1 - x))
+        elif method == "asinsqrt":
+            g = lambda x: np.arcsin(np.sqrt(x))
+            gprime = lambda x: 1 / (2 * np.sqrt(x) * np.sqrt(1 - x))
+        else:
+            raise ValueError("unknown method")
+
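+        # Delta-method z-statistic for testing S(t) = 1 - p on the
+        # transformed scale; times with |r| <= tr form the confidence set.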
+        r = g(self.surv_prob) - g(1 - p)
+        r /= (gprime(self.surv_prob) * self.surv_prob_se)
+
+        ii = np.flatnonzero(np.abs(r) <= tr)
+        if len(ii) == 0:
+            return np.nan, np.nan
+
+        lb = self.surv_times[ii[0]]
+
+        if ii[-1] == len(self.surv_times) - 1:
+            ub = np.inf
+        else:
+            ub = self.surv_times[ii[-1] + 1]
+
+        return lb, ub

     def summary(self):
         """
@@ -302,9 +491,17 @@ class SurvfuncRight:
         The summary is a dataframe containing the unique event times,
         estimated survival function values, and related quantities.
         """
-        pass

-    def simultaneous_cb(self, alpha=0.05, method='hw', transform='log'):
+        df = pd.DataFrame(index=self.surv_times)
+        df.index.name = "Time"
+        df["Surv prob"] = self.surv_prob
+        df["Surv prob SE"] = self.surv_prob_se
+        df["num at risk"] = self.n_risk
+        df["num events"] = self.n_events
+
+        return df
+
+    def simultaneous_cb(self, alpha=0.05, method="hw", transform="log"):
         """
         Returns a simultaneous confidence band for the survival function.

@@ -333,11 +530,41 @@ class SurvfuncRight:
             The upper confidence limits corresponding to the points
             in `surv_times`.
         """
-        pass

-
-def survdiff(time, status, group, weight_type=None, strata=None, entry=None,
-    **kwargs):
+        method = method.lower()
+        if method != "hw":
+            msg = "only the Hall-Wellner (hw) method is implemented"
+            raise ValueError(msg)
+
+        if alpha != 0.05:
+            raise ValueError("alpha must be set to 0.05")
+
+        transform = transform.lower()
+        s2 = self.surv_prob_se**2 / self.surv_prob**2
+        nn = self.n_risk
+        if transform == "log":
+            denom = np.sqrt(nn) * np.log(self.surv_prob)
+            theta = 1.3581 * (1 + nn * s2) / denom
+            theta = np.exp(theta)
+            lcb = self.surv_prob**(1/theta)
+            ucb = self.surv_prob**theta
+        elif transform == "arcsin":
+            k = 1.3581
+            k *= (1 + nn * s2) / (2 * np.sqrt(nn))
+            k *= np.sqrt(self.surv_prob / (1 - self.surv_prob))
+            f = np.arcsin(np.sqrt(self.surv_prob))
+            v = np.clip(f - k, 0, np.inf)
+            lcb = np.sin(v)**2
+            v = np.clip(f + k, -np.inf, np.pi/2)
+            ucb = np.sin(v)**2
+        else:
+            raise ValueError("Unknown transform")
+
+        return lcb, ucb
+
+
+def survdiff(time, status, group, weight_type=None, strata=None,
+             entry=None, **kwargs):
     """
     Test for the equality of two survival distributions.

@@ -374,7 +601,139 @@ def survdiff(time, status, group, weight_type=None, strata=None, entry=None,
             statistic value
     pvalue : The p-value for the chi^2 test
     """
-    pass
+
+    time = np.asarray(time)
+    status = np.asarray(status)
+    group = np.asarray(group)
+
+    gr = np.unique(group)
+
+    if strata is None:
+        obs, var = _survdiff(time, status, group, weight_type, gr,
+                             entry, **kwargs)
+    else:
+        strata = np.asarray(strata)
+        stu = np.unique(strata)
+        obs, var = 0., 0.
+        for st in stu:
+            # could be more efficient?
+            ii = (strata == st)
+            obs1, var1 = _survdiff(time[ii], status[ii], group[ii],
+                                   weight_type, gr, entry, **kwargs)
+            obs += obs1
+            var += var1
+
+    chisq = obs.dot(np.linalg.solve(var, obs))  # (O - E).T * V^(-1) * (O - E)
+    pvalue = 1 - chi2.cdf(chisq, len(gr)-1)
+
+    return chisq, pvalue
+
+
+def _survdiff(time, status, group, weight_type, gr, entry=None,
+              **kwargs):
+    # logrank test for one stratum
+    # calculations based on https://web.stanford.edu/~lutian/coursepdf/unit6.pdf
+    # formula for variance better to take from https://web.stanford.edu/~lutian/coursepdf/survweek3.pdf
+
+    # Get the unique times.
+    if entry is None:
+        utimes, rtimes = np.unique(time, return_inverse=True)
+    else:
+        utimes, rtimes = np.unique(np.concatenate((time, entry)),
+                                   return_inverse=True)
+        rtimes = rtimes[0:len(time)]
+
+    # Split entry times by group if present (should use pandas groupby)
+    tse = [(gr_i, None) for gr_i in gr]
+    if entry is not None:
+        for k, _ in enumerate(gr):
+            ii = (group == gr[k])
+            entry1 = entry[ii]
+            tse[k] = (gr[k], entry1)
+
+    # Event count and risk set size at each time point, per group and overall.
+    # TODO: should use Pandas groupby
+    nrisk, obsv = [], []
+    ml = len(utimes)
+    for g, entry0 in tse:
+
+        mk = (group == g)
+        n = np.bincount(rtimes, weights=mk, minlength=ml)
+
+        ob = np.bincount(rtimes, weights=status*mk, minlength=ml)
+        obsv.append(ob)
+
+        if entry is not None:
+            n = np.cumsum(n) - n
+            rentry = np.searchsorted(utimes, entry0, side='left')
+            n0 = np.bincount(rentry, minlength=ml)
+            n0 = np.cumsum(n0) - n0
+            nr = n0 - n
+        else:
+            nr = np.cumsum(n[::-1])[::-1]
+
+        nrisk.append(nr)
+
+    obs = sum(obsv)
+    nrisk_tot = sum(nrisk)
+    ix = np.flatnonzero(nrisk_tot > 1)
+
+    weights = None
+    if weight_type is not None:
+        weight_type = weight_type.lower()
+        if weight_type == "gb":
+            weights = nrisk_tot
+        elif weight_type == "tw":
+            weights = np.sqrt(nrisk_tot)
+        elif weight_type == "fh":
+            if "fh_p" not in kwargs:
+                msg = "weight_type type 'fh' requires specification of fh_p"
+                raise ValueError(msg)
+            fh_p = kwargs["fh_p"]
+            # Calculate the survivor function directly to avoid the
+            # overhead of creating a SurvfuncRight object
+            sp = 1 - obs / nrisk_tot.astype(np.float64)
+            sp = np.log(sp)
+            sp = np.cumsum(sp)
+            sp = np.exp(sp)
+            weights = sp**fh_p
+            weights = np.roll(weights, 1)
+            weights[0] = 1
+        else:
+            raise ValueError("weight_type not implemented")
+
+    dfs = len(gr) - 1
+    # Each row of r is the time series of at-risk proportions for one group.
+    r = np.vstack(nrisk) / np.clip(nrisk_tot, 1e-10, np.inf)[None, :]
+
+    # The variance of event counts in each group.
+    groups_oe = []
+    groups_var = []
+
+    var_denom = nrisk_tot - 1
+    var_denom = np.clip(var_denom, 1e-10, np.inf)
+
+    # use the first group as a reference
+    for g in range(1, dfs+1):
+        # Difference between observed and expected numbers of events in group g
+        oe = obsv[g] - r[g]*obs
+
+        # build one row of the dfs x dfs variance matrix
+        # multidimensional analogue of r * (1 - r)
+        var_tensor_part = r[1:, :].T * (np.eye(1, dfs, g-1).ravel() - r[g, :, None])
+        var_scalar_part = obs * (nrisk_tot - obs) / var_denom
+        var = var_tensor_part * var_scalar_part[:, None]
+
+        if weights is not None:
+            oe = weights * oe
+            var = (weights**2)[:, None] * var
+
+        # sum over times and store
+        groups_oe.append(oe[ix].sum())
+        groups_var.append(var[ix].sum(axis=0))
+
+    obs_vec = np.hstack(groups_oe)
+    var_mat = np.vstack(groups_var)
+
+    return obs_vec, var_mat


 def plot_survfunc(survfuncs, ax=None):
@@ -416,4 +775,43 @@ def plot_survfunc(survfuncs, ax=None):
     >>> ha[0].set_color('purple')
     >>> ha[1].set_color('orange')
     """
-    pass
+
+    fig, ax = utils.create_mpl_ax(ax)
+
+    # If we have only a single survival function to plot, put it into
+    # a list.
+    if isinstance(survfuncs, SurvfuncRight):
+        survfuncs = [survfuncs]
+
+    for gx, sf in enumerate(survfuncs):
+
+        # The estimated survival function does not include a point at
+        # time 0, include it here for plotting.
+        surv_times = np.concatenate(([0], sf.surv_times))
+        surv_prob = np.concatenate(([1], sf.surv_prob))
+
+        # If the final times are censoring times they are not included
+        # in the survival function so we add them here
+        mxt = max(sf.time)
+        if mxt > surv_times[-1]:
+            surv_times = np.concatenate((surv_times, [mxt]))
+            surv_prob = np.concatenate((surv_prob, [surv_prob[-1]]))
+
+        label = getattr(sf, "title", "Group %d" % (gx + 1))
+
+        li, = ax.step(surv_times, surv_prob, '-', label=label, lw=2,
+                      where='post')
+
+        # Plot the censored points.
+        ii = np.flatnonzero(np.logical_not(sf.status))
+        ti = np.unique(sf.time[ii])
+        jj = np.searchsorted(surv_times, ti) - 1
+        sp = surv_prob[jj]
+        ax.plot(ti, sp, '+', ms=12, color=li.get_color(),
+                label=label + " points")
+
+    ax.set_ylim(0, 1.01)
+
+    return fig
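
A minimal usage sketch of the SurvfuncRight / survdiff API restored above. The data,
seed, and group labels below are synthetic and purely illustrative; only the imports
and method names come from the statsmodels public API.

    import numpy as np
    from statsmodels.duration.survfunc import SurvfuncRight, survdiff

    rng = np.random.default_rng(0)
    time = rng.exponential(10.0, size=200)        # observed or censoring times
    status = rng.integers(0, 2, size=200)         # 1 = event observed, 0 = right-censored
    group = rng.integers(0, 2, size=200)          # two groups to compare

    sf = SurvfuncRight(time, status)
    print(sf.quantile(0.5))                       # estimated median survival time
    print(sf.quantile_ci(0.5, method="cloglog"))  # CI for the median
    print(sf.summary().head())                    # KM estimates at the event times

    chisq, pval = survdiff(time, status, group)   # log-rank test across groups
    print(chisq, pval)
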
diff --git a/statsmodels/emplike/aft_el.py b/statsmodels/emplike/aft_el.py
index 307868b17..898f186b6 100644
--- a/statsmodels/emplike/aft_el.py
+++ b/statsmodels/emplike/aft_el.py
@@ -28,12 +28,16 @@ Statistics. 14:3, 643-656.

 """
 import warnings
+
 import numpy as np
+#from elregress import ElReg
 from scipy import optimize
 from scipy.stats import chi2
+
 from statsmodels.regression.linear_model import OLS, WLS
 from statsmodels.tools import add_constant
 from statsmodels.tools.sm_exceptions import IterationLimitWarning
+
 from .descriptive import _OptFuncts


@@ -53,7 +57,6 @@ class OptAFT(_OptFuncts):
         Uses the modified Em algorithm of Zhou 2005 to maximize the
         likelihood of a parameter vector.
     """
-
     def __init__(self):
         pass

@@ -75,12 +78,20 @@ class OptAFT(_OptFuncts):
             -2 times the log likelihood of the nuisance parameters and the
             hypothesized value of the parameter(s) of interest.
         """
-        pass
+        test_params = test_vals.reshape(self.model.nvar, 1)
+        est_vect = self.model.uncens_exog * (self.model.uncens_endog -
+                                            np.dot(self.model.uncens_exog,
+                                                         test_params))
+        eta_star = self._modif_newton(np.zeros(self.model.nvar), est_vect,
+                                         self.model._fit_weights)
+        denom = np.sum(self.model._fit_weights) + np.dot(eta_star, est_vect.T)
+        self.new_weights = self.model._fit_weights / denom
+        return -1 * np.sum(np.log(self.new_weights))

     def _EM_test(self, nuisance_params, params=None, param_nums=None,
-        b0_vals=None, F=None, survidx=None, uncens_nobs=None, numcensbelow=
-        None, km=None, uncensored=None, censored=None, maxiter=None, ftol=None
-        ):
+                 b0_vals=None, F=None, survidx=None, uncens_nobs=None,
+                 numcensbelow=None, km=None, uncensored=None, censored=None,
+                 maxiter=None, ftol=None):
         """
         Uses EM algorithm to compute the maximum likelihood of a test

@@ -102,7 +113,46 @@ class OptAFT(_OptFuncts):
         -----
         Optional parameters are provided by the test_beta function.
         """
-        pass
+        iters = 0
+        params[param_nums] = b0_vals
+
+        nuis_param_index = np.int_(np.delete(np.arange(self.model.nvar),
+                                           param_nums))
+        params[nuis_param_index] = nuisance_params
+        to_test = params.reshape(self.model.nvar, 1)
+        opt_res = np.inf
+        diff = np.inf
+        while iters < maxiter and diff > ftol:
+            F = F.flatten()
+            death = np.cumsum(F[::-1])
+            survivalprob = death[::-1]
+            surv_point_mat = np.dot(F.reshape(-1, 1),
+                                1. / survivalprob[survidx].reshape(1, - 1))
+            surv_point_mat = add_constant(surv_point_mat)
+            summed_wts = np.cumsum(surv_point_mat, axis=1)
+            wts = summed_wts[np.int_(np.arange(uncens_nobs)),
+                             numcensbelow[uncensored]]
+            # ^E step
+            # See Zhou 2005, section 3.
+            self.model._fit_weights = wts
+            new_opt_res = self._opt_wtd_nuis_regress(to_test)
+                # ^ Uncensored weights' contribution to likelihood value.
+            F = self.new_weights
+                # ^ M step
+            diff = np.abs(new_opt_res - opt_res)
+            opt_res = new_opt_res
+            iters = iters + 1
+        death = np.cumsum(F.flatten()[::-1])
+        survivalprob = death[::-1]
+        llike = -opt_res + np.sum(np.log(survivalprob[survidx]))
+        wtd_km = km.flatten() / np.sum(km)
+        survivalmax = np.cumsum(wtd_km[::-1])[::-1]
+        llikemax = np.sum(np.log(wtd_km[uncensored])) + \
+          np.sum(np.log(survivalmax[censored]))
+        if iters == maxiter:
+            warnings.warn('The EM reached the maximum number of iterations',
+                          IterationLimitWarning)
+        return -2 * (llike - llikemax)

     def _ci_limits_beta(self, b0, param_num=None):
         """
@@ -116,7 +166,7 @@ class OptAFT(_OptFuncts):
         param_num : int
             Parameter index of b0
         """
-        pass
+        return self.test_beta([b0], [param_num])[0] - self.r0


 class emplikeAFT:
@@ -174,7 +224,6 @@ class emplikeAFT:
     The last observation is assumed to be uncensored which makes
     estimation and inference possible.
     """
-
     def __init__(self, endog, exog, censors):
         self.nobs = np.shape(exog)[0]
         self.endog = endog.reshape(self.nobs, 1)
@@ -185,12 +234,13 @@ class emplikeAFT:
         self.endog = self.endog[idx]
         self.exog = self.exog[idx]
         self.censors = self.censors[idx]
-        self.censors[-1] = 1
+        self.censors[-1] = 1  # last observation is treated as uncensored (see Notes)
         self.uncens_nobs = int(np.sum(self.censors))
         mask = self.censors.ravel().astype(bool)
         self.uncens_endog = self.endog[mask, :].reshape(-1, 1)
         self.uncens_exog = self.exog[mask, :]

+
     def _is_tied(self, endog, censors):
         """
         Indicates whether an observation takes the same value as the next
@@ -209,7 +259,13 @@ class emplikeAFT:
             ties[i]=1 if endog[i]==endog[i+1] and
             censors[i]==censors[i+1]
         """
-        pass
+        nobs = int(self.nobs)
+        endog_idx = endog[np.arange(nobs - 1)] == (
+            endog[np.arange(nobs - 1) + 1])
+        censors_idx = censors[np.arange(nobs - 1)] == (
+            censors[np.arange(nobs - 1) + 1])
+        indic_ties = endog_idx * censors_idx  # Both true
+        return np.int_(indic_ties)

     def _km_w_ties(self, tie_indic, untied_km):
         """
@@ -223,7 +279,22 @@ class emplikeAFT:
         untied_km: 1d array
             Km estimates at each observation assuming no ties.
         """
-        pass
+        # TODO: Vectorize, even though it is only 1 pass through for any
+        # function call
+        num_same = 1
+        idx_nums = []
+        for obs_num in np.arange(int(self.nobs - 1))[::-1]:
+            if tie_indic[obs_num] == 1:
+                idx_nums.append(obs_num)
+                num_same = num_same + 1
+                untied_km[obs_num] = untied_km[obs_num + 1]
+            elif tie_indic[obs_num] == 0 and num_same > 1:
+                idx_nums.append(max(idx_nums) + 1)
+                idx_nums = np.asarray(idx_nums)
+                untied_km[idx_nums] = untied_km[idx_nums]
+                num_same = 1
+                idx_nums = []
+        return untied_km.reshape(self.nobs, 1)

     def _make_km(self, endog, censors):
         """
@@ -248,7 +319,15 @@ class emplikeAFT:
         the data. If a censored observation and an uncensored observation have
         the same value, it is assumed that the uncensored one happened first.
         """
-        pass
+        nobs = self.nobs
+        num = (nobs - (np.arange(nobs) + 1.))
+        denom = ((nobs - (np.arange(nobs) + 1.) + 1.))
+        km = (num / denom).reshape(nobs, 1)
+        km = km ** np.abs(censors - 1.)
+        km = np.cumprod(km)  # If no ties, this is kaplan-meier
+        tied = self._is_tied(endog, censors)
+        wtd_km = self._km_w_ties(tied, km)
+        return (censors / wtd_km).reshape(nobs, 1)

     def fit(self):
         """
@@ -268,11 +347,15 @@ class emplikeAFT:
         -----
         To avoid dividing by zero, max(endog) is assumed to be uncensored.
         """
-        pass
+        return AFTResults(self)

+    def predict(self, params, endog=None):
+        if endog is None:
+            endog = self.endog
+        return np.dot(endog, params)

-class AFTResults(OptAFT):

+class AFTResults(OptAFT):
     def __init__(self, model):
         self.model = model

@@ -294,10 +377,15 @@ class AFTResults(OptAFT):
         -----
         To avoid dividing by zero, max(endog) is assumed to be uncensored.
         """
-        pass
-
-    def test_beta(self, b0_vals, param_nums, ftol=10 ** -5, maxiter=30,
-        print_weights=1):
+        self.model.modif_censors = np.copy(self.model.censors)
+        self.model.modif_censors[-1] = 1
+        wts = self.model._make_km(self.model.endog, self.model.modif_censors)
+        res = WLS(self.model.endog, self.model.exog, wts).fit()
+        params = res.params
+        return params
+
+    def test_beta(self, b0_vals, param_nums, ftol=10 ** -5, maxiter=30,
+                  print_weights=1):
         """
         Returns the profile log likelihood for regression parameters
         'param_num' at 'b0_vals.'
@@ -361,9 +449,48 @@ class AFTResults(OptAFT):
         >>> res
         (4.623487775078047, 0.031537049752572731)
         """
-        pass
-
-    def ci_beta(self, param_num, beta_high, beta_low, sig=0.05):
+        censors = self.model.censors
+        endog = self.model.endog
+        exog = self.model.exog
+        uncensored = (censors == 1).flatten()
+        censored = (censors == 0).flatten()
+        uncens_endog = endog[uncensored]
+        uncens_exog = exog[uncensored, :]
+        reg_model = OLS(uncens_endog, uncens_exog).fit()
+        llr, pval, new_weights = reg_model.el_test(b0_vals, param_nums,
+                                      return_weights=True)  # Needs to be changed
+        km = self.model._make_km(endog, censors).flatten()  # when merged
+        uncens_nobs = self.model.uncens_nobs
+        F = np.asarray(new_weights).reshape(uncens_nobs)
+        # Step 0 ^
+        params = self.params()
+        survidx = np.where(censors == 0)
+        survidx = survidx[0] - np.arange(len(survidx[0]))
+        numcensbelow = np.int_(np.cumsum(1 - censors))
+        if len(param_nums) == len(params):
+            llr = self._EM_test([], F=F, params=params,
+                                param_nums=param_nums,
+                                b0_vals=b0_vals, survidx=survidx,
+                                uncens_nobs=uncens_nobs,
+                                numcensbelow=numcensbelow, km=km,
+                                uncensored=uncensored, censored=censored,
+                                ftol=ftol, maxiter=25)
+            return llr, chi2.sf(llr, self.model.nvar)
+        else:
+            x0 = np.delete(params, param_nums)
+            try:
+                res = optimize.fmin(self._EM_test, x0,
+                                   (params, param_nums, b0_vals, F, survidx,
+                                    uncens_nobs, numcensbelow, km, uncensored,
+                                    censored, maxiter, ftol), full_output=1,
+                                    disp=0)
+
+                llr = res[1]
+                return llr, chi2.sf(llr, len(param_nums))
+            except np.linalg.LinAlgError:
+                return np.inf, 0
+
+    def ci_beta(self, param_num, beta_high, beta_low, sig=.05):
         """
         Returns the confidence interval for a regression
         parameter in the AFT model.
@@ -403,4 +530,10 @@ class AFTResults(OptAFT):
         If the user desires to verify the success of the optimization,
         it is recommended to test the limits using test_beta.
         """
-        pass
+        params = self.params()
+        self.r0 = chi2.ppf(1 - sig, 1)
+        ll = optimize.brentq(self._ci_limits_beta, beta_low,
+                             params[param_num], (param_num))
+        ul = optimize.brentq(self._ci_limits_beta,
+                             params[param_num], beta_high, (param_num))
+        return ll, ul
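
A minimal usage sketch of the emplikeAFT estimator completed above, assuming synthetic
right-censored data; the seed, coefficients, and censoring pattern are made up.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    nobs = 100
    x = sm.add_constant(rng.normal(size=(nobs, 1)))
    y = x @ np.array([1.0, 0.5]) + rng.normal(size=nobs)
    censors = rng.integers(0, 2, size=nobs)        # 1 = uncensored, 0 = censored

    res = sm.emplike.emplikeAFT(y, x, censors).fit()
    print(res.params())                            # Koul-Susarla-Van Ryzin estimates
    print(res.test_beta([0.5], [1]))               # EL test for the slope
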
diff --git a/statsmodels/emplike/api.py b/statsmodels/emplike/api.py
index 41d9b504f..6a8b5f5eb 100644
--- a/statsmodels/emplike/api.py
+++ b/statsmodels/emplike/api.py
@@ -2,8 +2,11 @@
 API for empirical likelihood

 """
-__all__ = ['DescStat', 'DescStatUV', 'DescStatMV', 'ELOriginRegress',
-    'ANOVA', 'emplikeAFT']
+__all__ = [
+    "DescStat", "DescStatUV", "DescStatMV",
+    "ELOriginRegress", "ANOVA", "emplikeAFT"
+]
+
 from .descriptive import DescStat, DescStatUV, DescStatMV
 from .originregress import ELOriginRegress
 from .elanova import ANOVA
diff --git a/statsmodels/emplike/descriptive.py b/statsmodels/emplike/descriptive.py
index 29e422b65..c1b1b3b40 100644
--- a/statsmodels/emplike/descriptive.py
+++ b/statsmodels/emplike/descriptive.py
@@ -38,7 +38,12 @@ def DescStat(endog):
         If k=1, the function returns a univariate instance, DescStatUV.
         If k>1, the function returns a multivariate instance, DescStatMV.
     """
-    pass
+    if endog.ndim == 1:
+        endog = endog.reshape(len(endog), 1)
+    if endog.shape[1] == 1:
+        return DescStatUV(endog)
+    if endog.shape[1] > 1:
+        return DescStatMV(endog)


 class _OptFuncts:
@@ -89,7 +94,14 @@ class _OptFuncts:
         The function value is not used in optimization and the optimal value
         is disregarded when computing the log likelihood ratio.
         """
-        pass
+        data_star = np.log(weights) + (np.sum(weights) +\
+                                       np.dot(est_vect, eta))
+        idx = data_star < 1. / nobs
+        not_idx = ~idx
+        nx = nobs * data_star[idx]
+        data_star[idx] = np.log(1. / nobs) - 1.5 + nx * (2. - nx / 2)
+        data_star[not_idx] = np.log(data_star[not_idx])
+        return data_star

     def _hess(self, eta, est_vect, weights, nobs):
         """
@@ -112,7 +124,14 @@ class _OptFuncts:
         hess : m x m array
             Weighted hessian used in _wtd_modif_newton
         """
-        pass
+        #eta = np.squeeze(eta)
+        data_star_doub_prime = np.sum(weights) + np.dot(est_vect, eta)
+        idx = data_star_doub_prime < 1. / nobs
+        not_idx = ~idx
+        data_star_doub_prime[idx] = - nobs ** 2
+        data_star_doub_prime[not_idx] = - (data_star_doub_prime[not_idx]) ** -2
+        wtd_dsdp = weights * data_star_doub_prime
+        return np.dot(est_vect.T, wtd_dsdp[:, None] * est_vect)

     def _grad(self, eta, est_vect, weights, nobs):
         """
@@ -135,9 +154,15 @@ class _OptFuncts:
         gradient : ndarray (m,1)
             The gradient used in _wtd_modif_newton
         """
-        pass
-
-    def _modif_newton(self, eta, est_vect, weights):
+        #eta = np.squeeze(eta)
+        data_star_prime = np.sum(weights) + np.dot(est_vect, eta)
+        idx = data_star_prime < 1. / nobs
+        not_idx = ~idx
+        data_star_prime[idx] = nobs * (2 - nobs * data_star_prime[idx])
+        data_star_prime[not_idx] = 1. / data_star_prime[not_idx]
+        return np.dot(weights * data_star_prime, est_vect)
+
+    def _modif_newton(self,  eta, est_vect, weights):
         """
         Modified Newton's method for maximizing the log 'star' equation.  This
         function calls _fit_newton to find the optimal values of eta.
@@ -158,7 +183,15 @@ class _OptFuncts:
         params : 1xm array
             Lagrange multiplier that maximizes the log-likelihood
         """
-        pass
+        nobs = len(est_vect)
+        f = lambda x0: - np.sum(self._log_star(x0, est_vect, weights, nobs))
+        grad = lambda x0: - self._grad(x0, est_vect, weights, nobs)
+        hess = lambda x0: - self._hess(x0, est_vect, weights, nobs)
+        kwds = {'tol': 1e-8}
+        eta = eta.squeeze()
+        res = _fit_newton(f, grad, eta, (), kwds, hess=hess, maxiter=50, \
+                              disp=0)
+        return res[0]

     def _find_eta(self, eta):
         """
@@ -175,7 +208,8 @@ class _OptFuncts:
         llr : float
             n times the log likelihood value for a given value of eta
         """
-        pass
+        return np.sum((self.endog - self.mu0) / \
+              (1. + eta * (self.endog - self.mu0)))

     def _ci_limits_mu(self, mu):
         """
@@ -193,7 +227,7 @@ class _OptFuncts:
             The difference between the log likelihood value of mu0 and
             a specified value.
         """
-        pass
+        return self.test_mean(mu)[0] - self.r0

     def _find_gamma(self, gamma):
         """
@@ -213,7 +247,10 @@ class _OptFuncts:
             The difference between the log-likelihood when the Lagrange
             multiplier is gamma and a pre-specified value
         """
-        pass
+        denom = np.sum((self.endog - gamma) ** -1)
+        new_weights = (self.endog - gamma) ** -1 / denom
+        return -2 * np.sum(np.log(self.nobs * new_weights)) - \
+            self.r0

     def _opt_var(self, nuisance_mu, pval=False):
         """
@@ -231,7 +268,22 @@ class _OptFuncts:
             Log likelihood of a pre-specified variance holding the nuisance
             parameter constant
         """
-        pass
+        endog = self.endog
+        nobs = self.nobs
+        sig_data = ((endog - nuisance_mu) ** 2 \
+                    - self.sig2_0)
+        mu_data = (endog - nuisance_mu)
+        est_vect = np.column_stack((mu_data, sig_data))
+        eta_star = self._modif_newton(np.array([1. / nobs,
+                                               1. / nobs]), est_vect,
+                                                np.ones(nobs) * (1. / nobs))
+
+        denom = 1 + np.dot(eta_star, est_vect.T)
+        self.new_weights = 1. / nobs * 1. / denom
+        llr = np.sum(np.log(nobs * self.new_weights))
+        if pval:  # Used for contour plotting
+            return chi2.sf(-2 * llr, 1)
+        return -2 * llr

     def _ci_limits_var(self, var):
         """
@@ -250,7 +302,7 @@ class _OptFuncts:
             The difference between the log likelihood ratio at var_test and a
             pre-specified value.
         """
-        pass
+        return self.test_var(var)[0] - self.r0

     def _opt_skew(self, nuis_params):
         """
@@ -268,7 +320,21 @@ class _OptFuncts:
             The log likelihood ratio of a pre-specified skewness holding
             the nuisance parameters constant.
         """
-        pass
+        endog = self.endog
+        nobs = self.nobs
+        mu_data = endog - nuis_params[0]
+        sig_data = ((endog - nuis_params[0]) ** 2) - nuis_params[1]
+        skew_data = ((((endog - nuis_params[0]) ** 3) /
+                    (nuis_params[1] ** 1.5))) - self.skew0
+        est_vect = np.column_stack((mu_data, sig_data, skew_data))
+        eta_star = self._modif_newton(np.array([1. / nobs,
+                                               1. / nobs,
+                                               1. / nobs]), est_vect,
+                                               np.ones(nobs) * (1. / nobs))
+        denom = 1. + np.dot(eta_star, est_vect.T)
+        self.new_weights = 1. / nobs * 1. / denom
+        llr = np.sum(np.log(nobs * self.new_weights))
+        return -2 * llr

     def _opt_kurt(self, nuis_params):
         """
@@ -286,7 +352,21 @@ class _OptFuncts:
             The log likelihood ratio of a pre-specified kurtosis holding the
             nuisance parameters constant
         """
-        pass
+        endog = self.endog
+        nobs = self.nobs
+        mu_data = endog - nuis_params[0]
+        sig_data = ((endog - nuis_params[0]) ** 2) - nuis_params[1]
+        kurt_data = (((((endog - nuis_params[0]) ** 4) / \
+                    (nuis_params[1] ** 2))) - 3) - self.kurt0
+        est_vect = np.column_stack((mu_data, sig_data, kurt_data))
+        eta_star = self._modif_newton(np.array([1. / nobs,
+                                               1. / nobs,
+                                               1. / nobs]), est_vect,
+                                               np.ones(nobs) * (1. / nobs))
+        denom = 1 + np.dot(eta_star, est_vect.T)
+        self.new_weights = 1. / nobs * 1. / denom
+        llr = np.sum(np.log(nobs * self.new_weights))
+        return -2 * llr

     def _opt_skew_kurt(self, nuis_params):
         """
@@ -304,7 +384,24 @@ class _OptFuncts:
             The log likelihood ratio of a pre-specified skewness and
             kurtosis holding the nuisance parameters constant.
         """
-        pass
+        endog = self.endog
+        nobs = self.nobs
+        mu_data = endog - nuis_params[0]
+        sig_data = ((endog - nuis_params[0]) ** 2) - nuis_params[1]
+        skew_data = ((((endog - nuis_params[0]) ** 3) / \
+                    (nuis_params[1] ** 1.5))) - self.skew0
+        kurt_data = (((((endog - nuis_params[0]) ** 4) / \
+                    (nuis_params[1] ** 2))) - 3) - self.kurt0
+        est_vect = np.column_stack((mu_data, sig_data, skew_data, kurt_data))
+        eta_star = self._modif_newton(np.array([1. / nobs,
+                                               1. / nobs,
+                                               1. / nobs,
+                                               1. / nobs]), est_vect,
+                                               np.ones(nobs) * (1. / nobs))
+        denom = 1. + np.dot(eta_star, est_vect.T)
+        self.new_weights = 1. / nobs * 1. / denom
+        llr = np.sum(np.log(nobs * self.new_weights))
+        return -2 * llr

     def _ci_limits_skew(self, skew):
         """
@@ -319,7 +416,7 @@ class _OptFuncts:
             The difference between the log likelihood ratio at skew and a
             pre-specified value.
         """
-        pass
+        return self.test_skew(skew)[0] - self.r0

     def _ci_limits_kurt(self, kurt):
         """
@@ -334,7 +431,7 @@ class _OptFuncts:
             The difference between the log likelihood ratio at kurt and a
             pre-specified value.
         """
-        pass
+        return self.test_kurt(kurt)[0] - self.r0

     def _opt_correl(self, nuis_params, corr0, endog, nobs, x0, weights0):
         """
@@ -349,7 +446,21 @@ class _OptFuncts:
             The log-likelihood of the correlation coefficient holding nuisance
             parameters constant
         """
-        pass
+        mu1_data, mu2_data = (endog - nuis_params[::2]).T
+        sig1_data = mu1_data ** 2 - nuis_params[1]
+        sig2_data = mu2_data ** 2 - nuis_params[3]
+        correl_data = ((mu1_data * mu2_data) - corr0 *
+                    (nuis_params[1] * nuis_params[3]) ** .5)
+        est_vect = np.column_stack((mu1_data, sig1_data,
+                                    mu2_data, sig2_data, correl_data))
+        eta_star = self._modif_newton(x0, est_vect, weights0)
+        denom = 1. + np.dot(est_vect, eta_star)
+        self.new_weights = 1. / nobs * 1. / denom
+        llr = np.sum(np.log(nobs * self.new_weights))
+        return -2 * llr
+
+    def _ci_limits_corr(self, corr):
+        return self.test_corr(corr)[0] - self.r0


 class DescStatUV(_OptFuncts):
@@ -395,10 +506,21 @@ class DescStatUV(_OptFuncts):
         test_results : tuple
             The log-likelihood ratio and p-value of mu0
         """
-        pass
-
-    def ci_mean(self, sig=0.05, method='gamma', epsilon=10 ** -8, gamma_low
-        =-10 ** 10, gamma_high=10 ** 10):
+        self.mu0 = mu0
+        endog = self.endog
+        nobs = self.nobs
+        eta_min = (1. - (1. / nobs)) / (self.mu0 - max(endog))
+        eta_max = (1. - (1. / nobs)) / (self.mu0 - min(endog))
+        eta_star = optimize.brentq(self._find_eta, eta_min, eta_max)
+        new_weights = (1. / nobs) * 1. / (1. + eta_star * (endog - self.mu0))
+        llr = -2 * np.sum(np.log(nobs * new_weights))
+        if return_weights:
+            return llr, chi2.sf(llr, 1), new_weights
+        else:
+            return llr, chi2.sf(llr, 1)
+
+    def ci_mean(self, sig=.05, method='gamma', epsilon=10 ** -8,
+                 gamma_low=-10 ** 10, gamma_high=10 ** 10):
         """
         Returns the confidence interval for the mean.

@@ -450,7 +572,32 @@ class DescStatUV(_OptFuncts):
         Interval : tuple
             Confidence interval for the mean
         """
-        pass
+        endog = self.endog
+        sig = 1 - sig
+        if method == 'nested-brent':
+            self.r0 = chi2.ppf(sig, 1)
+            middle = np.mean(endog)
+            epsilon_u = (max(endog) - np.mean(endog)) * epsilon
+            epsilon_l = (np.mean(endog) - min(endog)) * epsilon
+            ulim = optimize.brentq(self._ci_limits_mu, middle,
+                max(endog) - epsilon_u)
+            llim = optimize.brentq(self._ci_limits_mu, middle,
+                min(endog) + epsilon_l)
+            return llim, ulim
+
+        if method == 'gamma':
+            self.r0 = chi2.ppf(sig, 1)
+            gamma_star_l = optimize.brentq(self._find_gamma, gamma_low,
+                min(endog) - epsilon)
+            gamma_star_u = optimize.brentq(self._find_gamma, \
+                         max(endog) + epsilon, gamma_high)
+            weights_low = ((endog - gamma_star_l) ** -1) / \
+                np.sum((endog - gamma_star_l) ** -1)
+            weights_high = ((endog - gamma_star_u) ** -1) / \
+                np.sum((endog - gamma_star_u) ** -1)
+            mu_low = np.sum(weights_low * endog)
+            mu_high = np.sum(weights_high * endog)
+            return mu_low,  mu_high

     def test_var(self, sig2_0, return_weights=False):
         """
@@ -479,9 +626,18 @@ class DescStatUV(_OptFuncts):
         >>> el_analysis = sm.emplike.DescStat(random_numbers)
         >>> hyp_test = el_analysis.test_var(9500)
         """
-        pass
-
-    def ci_var(self, lower_bound=None, upper_bound=None, sig=0.05):
+        self.sig2_0 = sig2_0
+        mu_max = max(self.endog)
+        mu_min = min(self.endog)
+        llr = optimize.fminbound(self._opt_var, mu_min, mu_max, \
+                                 full_output=1)[1]
+        p_val = chi2.sf(llr, 1)
+        if return_weights:
+            return llr, p_val, self.new_weights.T
+        else:
+            return llr, p_val
+
+    def ci_var(self, lower_bound=None, upper_bound=None, sig=.05):
         """
         Returns the confidence interval for the variance.

@@ -524,10 +680,21 @@ class DescStatUV(_OptFuncts):
         different signs, consider lowering lower_bound and raising
         upper_bound.
         """
-        pass
+        endog = self.endog
+        if upper_bound is None:
+            upper_bound = ((self.nobs - 1) * endog.var()) / \
+              (chi2.ppf(.0001, self.nobs - 1))
+        if lower_bound is None:
+            lower_bound = ((self.nobs - 1) * endog.var()) / \
+              (chi2.ppf(.9999, self.nobs - 1))
+        self.r0 = chi2.ppf(1 - sig, 1)
+        llim = optimize.brentq(self._ci_limits_var, lower_bound, endog.var())
+        ulim = optimize.brentq(self._ci_limits_var, endog.var(), upper_bound)
+        return llim, ulim

     def plot_contour(self, mu_low, mu_high, var_low, var_high, mu_step,
-        var_step, levs=[0.2, 0.1, 0.05, 0.01, 0.001]):
+                        var_step,
+                        levs=[.2, .1, .05, .01, .001]):
         """
         Returns a plot of the confidence region for a univariate
         mean and variance.
@@ -561,7 +728,19 @@ class DescStatUV(_OptFuncts):
         Figure
             The contour plot
         """
-        pass
+        fig, ax = utils.create_mpl_ax()
+        ax.set_ylabel('Variance')
+        ax.set_xlabel('Mean')
+        mu_vect = list(np.arange(mu_low, mu_high, mu_step))
+        var_vect = list(np.arange(var_low, var_high, var_step))
+        z = []
+        for sig0 in var_vect:
+            self.sig2_0 = sig0
+            for mu0 in mu_vect:
+                z.append(self._opt_var(mu0, pval=True))
+        z = np.asarray(z).reshape(len(var_vect), len(mu_vect))
+        ax.contour(mu_vect, var_vect, z, levels=levs)
+        return fig

     def test_skew(self, skew0, return_weights=False):
         """
@@ -582,7 +761,16 @@ class DescStatUV(_OptFuncts):
         test_results : tuple
             The log-likelihood ratio and p_value of skew0
         """
-        pass
+        self.skew0 = skew0
+        start_nuisance = np.array([self.endog.mean(),
+                                       self.endog.var()])
+
+        llr = optimize.fmin_powell(self._opt_skew, start_nuisance,
+                                     full_output=1, disp=0)[1]
+        p_val = chi2.sf(llr, 1)
+        if return_weights:
+            return llr, p_val,  self.new_weights.T
+        return llr, p_val

     def test_kurt(self, kurt0, return_weights=False):
         """
@@ -603,7 +791,16 @@ class DescStatUV(_OptFuncts):
         test_results : tuple
             The log-likelihood ratio and p-value of kurt0
         """
-        pass
+        self.kurt0 = kurt0
+        start_nuisance = np.array([self.endog.mean(),
+                                       self.endog.var()])
+
+        llr = optimize.fmin_powell(self._opt_kurt, start_nuisance,
+                                     full_output=1, disp=0)[1]
+        p_val = chi2.sf(llr, 1)
+        if return_weights:
+            return llr, p_val, self.new_weights.T
+        return llr, p_val

     def test_joint_skew_kurt(self, skew0, kurt0, return_weights=False):
         """
@@ -626,9 +823,19 @@ class DescStatUV(_OptFuncts):
         test_results : tuple
             The log-likelihood ratio and p-value  of the joint hypothesis test.
         """
-        pass
-
-    def ci_skew(self, sig=0.05, upper_bound=None, lower_bound=None):
+        self.skew0 = skew0
+        self.kurt0 = kurt0
+        start_nuisance = np.array([self.endog.mean(),
+                                       self.endog.var()])
+
+        llr = optimize.fmin_powell(self._opt_skew_kurt, start_nuisance,
+                                     full_output=1, disp=0)[1]
+        p_val = chi2.sf(llr, 2)
+        if return_weights:
+            return llr, p_val, self.new_weights.T
+        return llr, p_val
+
+    def ci_skew(self, sig=.05, upper_bound=None, lower_bound=None):
         """
         Returns the confidence interval for skewness.

@@ -655,9 +862,24 @@ class DescStatUV(_OptFuncts):
         If the optimizer raises "f(a) and f(b) must have different signs",
         consider expanding the lower and upper bounds.
         """
-        pass
-
-    def ci_kurt(self, sig=0.05, upper_bound=None, lower_bound=None):
+        nobs = self.nobs
+        endog = self.endog
+        if upper_bound is None:
+            upper_bound = skew(endog) + \
+            2.5 * ((6. * nobs * (nobs - 1.)) / \
+              ((nobs - 2.) * (nobs + 1.) * \
+               (nobs + 3.))) ** .5
+        if lower_bound is None:
+            lower_bound = skew(endog) - \
+            2.5 * ((6. * nobs * (nobs - 1.)) / \
+              ((nobs - 2.) * (nobs + 1.) * \
+               (nobs + 3.))) ** .5
+        self.r0 = chi2.ppf(1 - sig, 1)
+        llim = optimize.brentq(self._ci_limits_skew, lower_bound, skew(endog))
+        ulim = optimize.brentq(self._ci_limits_skew, skew(endog), upper_bound)
+        return llim, ulim
+
+    def ci_kurt(self, sig=.05, upper_bound=None, lower_bound=None):
         """
         Returns the confidence interval for kurtosis.

@@ -689,7 +911,28 @@ class DescStatUV(_OptFuncts):
         If the optimizer raises "f(a) and f(b) must have different signs",
         consider expanding the bounds.
         """
-        pass
+        endog = self.endog
+        nobs = self.nobs
+        if upper_bound is None:
+            upper_bound = kurtosis(endog) + \
+            (2.5 * (2. * ((6. * nobs * (nobs - 1.)) / \
+              ((nobs - 2.) * (nobs + 1.) * \
+               (nobs + 3.))) ** .5) * \
+               (((nobs ** 2.) - 1.) / ((nobs - 3.) *\
+                 (nobs + 5.))) ** .5)
+        if lower_bound is None:
+            lower_bound = kurtosis(endog) - \
+            (2.5 * (2. * ((6. * nobs * (nobs - 1.)) / \
+              ((nobs - 2.) * (nobs + 1.) * \
+               (nobs + 3.))) ** .5) * \
+               (((nobs ** 2.) - 1.) / ((nobs - 3.) *\
+                 (nobs + 5.))) ** .5)
+        self.r0 = chi2.ppf(1 - sig, 1)
+        llim = optimize.brentq(self._ci_limits_kurt, lower_bound, \
+                             kurtosis(endog))
+        ulim = optimize.brentq(self._ci_limits_kurt, kurtosis(endog), \
+                             upper_bound)
+        return llim, ulim


 class DescStatMV(_OptFuncts):
@@ -734,11 +977,30 @@ class DescStatMV(_OptFuncts):
         test_results : tuple
             The log-likelihood ratio and p-value for mu_array
         """
-        pass
-
-    def mv_mean_contour(self, mu1_low, mu1_upp, mu2_low, mu2_upp, step1,
-        step2, levs=(0.001, 0.01, 0.05, 0.1, 0.2), var1_name=None,
-        var2_name=None, plot_dta=False):
+        endog = self.endog
+        nobs = self.nobs
+        if len(mu_array) != endog.shape[1]:
+            raise ValueError('mu_array must have the same number of '
+                             'elements as the columns of the data.')
+        mu_array = mu_array.reshape(1, endog.shape[1])
+        means = np.ones((endog.shape[0], endog.shape[1]))
+        means = mu_array * means
+        est_vect = endog - means
+        start_vals = 1. / nobs * np.ones(endog.shape[1])
+        eta_star = self._modif_newton(start_vals, est_vect,
+                                      np.ones(nobs) * (1. / nobs))
+        denom = 1 + np.dot(eta_star, est_vect.T)
+        self.new_weights = 1 / nobs * 1 / denom
+        llr = -2 * np.sum(np.log(nobs * self.new_weights))
+        p_val = chi2.sf(llr, mu_array.shape[1])
+        if return_weights:
+            return llr, p_val,  self.new_weights.T
+        else:
+            return llr, p_val
+
+    def mv_mean_contour(self, mu1_low, mu1_upp, mu2_low, mu2_upp, step1, step2,
+                        levs=(.001, .01, .05, .1, .2), var1_name=None,
+                        var2_name=None, plot_dta=False):
         """
         Creates a confidence region plot for the mean of bivariate data

@@ -783,7 +1045,30 @@ class DescStatMV(_OptFuncts):
         >>> contourp = el_analysis.mv_mean_contour(-2, 2, -2, 2, .1, .1)
         >>> contourp.show()
         """
-        pass
+        if self.endog.shape[1] != 2:
+            raise ValueError('Data must contain exactly two variables')
+        fig, ax = utils.create_mpl_ax()
+        if var2_name is None:
+            ax.set_ylabel('Variable 2')
+        else:
+            ax.set_ylabel(var2_name)
+        if var1_name is None:
+            ax.set_xlabel('Variable 1')
+        else:
+            ax.set_xlabel(var1_name)
+        x = np.arange(mu1_low, mu1_upp, step1)
+        y = np.arange(mu2_low, mu2_upp, step2)
+        pairs = itertools.product(x, y)
+        z = []
+        for i in pairs:
+            z.append(self.mv_test_mean(np.asarray(i))[0])
+        X, Y = np.meshgrid(x, y)
+        z = np.asarray(z)
+        z = z.reshape(X.shape[1], Y.shape[0])
+        ax.contour(x, y, z.T, levels=levs)
+        if plot_dta:
+            ax.plot(self.endog[:, 0], self.endog[:, 1], 'bo')
+        return fig

     def test_corr(self, corr0, return_weights=0):
         """
@@ -799,9 +1084,26 @@ class DescStatMV(_OptFuncts):
             If true, returns the weights that maximize
             the log-likelihood at the hypothesized value
         """
-        pass
-
-    def ci_corr(self, sig=0.05, upper_bound=None, lower_bound=None):
+        nobs = self.nobs
+        endog = self.endog
+        if endog.shape[1] != 2:
+            raise NotImplementedError('Correlation matrix not yet implemented')
+        nuis0 = np.array([endog[:, 0].mean(),
+                              endog[:, 0].var(),
+                              endog[:, 1].mean(),
+                              endog[:, 1].var()])
+
+        x0 = np.zeros(5)
+        weights0 = np.array([1. / nobs] * int(nobs))
+        args = (corr0, endog, nobs, x0, weights0)
+        llr = optimize.fmin(self._opt_correl, nuis0, args=args,
+                                     full_output=1, disp=0)[1]
+        p_val = chi2.sf(llr, 1)
+        if return_weights:
+            return llr, p_val, self.new_weights.T
+        return llr, p_val
+
+    def ci_corr(self, sig=.05, upper_bound=None, lower_bound=None):
         """
         Returns the confidence intervals for the correlation coefficient

@@ -823,4 +1125,20 @@ class DescStatMV(_OptFuncts):
         interval : tuple
             Confidence interval for the correlation
         """
-        pass
+        endog = self.endog
+        nobs = self.nobs
+        self.r0 = chi2.ppf(1 - sig, 1)
+        point_est = np.corrcoef(endog[:, 0], endog[:, 1])[0, 1]
+        if upper_bound is None:
+            upper_bound = min(.999, point_est + \
+                          2.5 * ((1. - point_est ** 2.) / \
+                          (nobs - 2.)) ** .5)
+
+        if lower_bound is None:
+            lower_bound = max(- .999, point_est - \
+                          2.5 * (np.sqrt((1. - point_est ** 2.) / \
+                          (nobs - 2.))))
+
+        llim = optimize.brenth(self._ci_limits_corr, lower_bound, point_est)
+        ulim = optimize.brenth(self._ci_limits_corr, point_est, upper_bound)
+        return llim, ulim
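
A minimal usage sketch of the DescStat dispatcher and the univariate / multivariate EL
methods restored above; the data and hypothesized values are synthetic and illustrative.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.normal(loc=5.0, scale=2.0, size=300)

    el = sm.emplike.DescStat(x)                    # univariate -> DescStatUV
    print(el.test_mean(5.0))                       # (llr, p-value)
    print(el.ci_mean())                            # EL confidence interval for the mean
    print(el.test_var(4.0))

    xy = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=300)
    el2 = sm.emplike.DescStat(xy)                  # bivariate -> DescStatMV
    print(el2.test_corr(0.5))
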
diff --git a/statsmodels/emplike/elanova.py b/statsmodels/emplike/elanova.py
index 60995bb96..7783ae312 100644
--- a/statsmodels/emplike/elanova.py
+++ b/statsmodels/emplike/elanova.py
@@ -21,7 +21,6 @@ class _ANOVAOpt(_OptFuncts):
     Class containing functions that are optimized over when
     conducting ANOVA.
     """
-
     def _opt_common_mu(self, mu):
         """
         Optimizes the likelihood under the null hypothesis that all groups have
@@ -37,7 +36,23 @@ class _ANOVAOpt(_OptFuncts):
         llr : float
             -2 times the llr ratio, which is the test statistic.
         """
-        pass
+        nobs = self.nobs
+        endog = self.endog
+        num_groups = self.num_groups
+        endog_asarray = np.zeros((nobs, num_groups))
+        obs_num = 0
+        for arr_num in range(len(endog)):
+            new_obs_num = obs_num + len(endog[arr_num])
+            endog_asarray[obs_num: new_obs_num, arr_num] = endog[arr_num] - \
+              mu
+            obs_num = new_obs_num
+        est_vect = endog_asarray
+        wts = np.ones(est_vect.shape[0]) * (1. / (est_vect.shape[0]))
+        eta_star = self._modif_newton(np.zeros(num_groups), est_vect, wts)
+        denom = 1. + np.dot(eta_star, est_vect.T)
+        self.new_weights = 1. / nobs * 1. / denom
+        llr = np.sum(np.log(nobs * self.new_weights))
+        return -2 * llr


 class ANOVA(_ANOVAOpt):
@@ -87,4 +102,20 @@ class ANOVA(_ANOVAOpt):
         res: tuple
             The log-likelihood, p-value and estimate for the common mean.
         """
-        pass
+        if mu is not None:
+            llr = self._opt_common_mu(mu)
+            pval = 1 - chi2.cdf(llr, self.num_groups - 1)
+            if return_weights:
+                return llr, pval, mu, self.new_weights
+            else:
+                return llr, pval, mu
+        else:
+            res = optimize.fmin_powell(self._opt_common_mu, mu_start,
+                                       full_output=1, disp=False)
+            llr = res[1]
+            mu_common = float(np.squeeze(res[0]))
+            pval = 1 - chi2.cdf(llr, self.num_groups - 1)
+            if return_weights:
+                return llr, pval, mu_common, self.new_weights
+            else:
+                return llr, pval, mu_common
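
A minimal usage sketch of the EL ANOVA test implemented above, assuming two synthetic
groups; the group sizes, means, and seed are made up.

    import numpy as np
    from statsmodels.emplike.elanova import ANOVA

    rng = np.random.default_rng(3)
    g1 = rng.normal(0.0, 1.0, size=40)
    g2 = rng.normal(0.2, 1.0, size=50)

    llr, pval, common_mean = ANOVA([g1, g2]).compute_ANOVA()
    print(llr, pval, common_mean)
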
diff --git a/statsmodels/emplike/elregress.py b/statsmodels/emplike/elregress.py
index d5b449c31..d5c82f34d 100644
--- a/statsmodels/emplike/elregress.py
+++ b/statsmodels/emplike/elregress.py
@@ -16,6 +16,7 @@ import numpy as np
 from statsmodels.emplike.descriptive import _OptFuncts


+
 class _ELRegOpts(_OptFuncts):
     """

@@ -28,13 +29,13 @@ class _ELRegOpts(_OptFuncts):
     OLSResults : Results instance
         A fitted OLS result.
     """
-
     def __init__(self):
         pass

-    def _opt_nuis_regress(self, nuisance_params, param_nums=None, endog=
-        None, exog=None, nobs=None, nvar=None, params=None, b0_vals=None,
-        stochastic_exog=None):
+    def _opt_nuis_regress(self, nuisance_params, param_nums=None,
+                          endog=None, exog=None,
+                          nobs=None, nvar=None, params=None, b0_vals=None,
+                          stochastic_exog=None):
         """
         A function that is optimized over nuisance parameters to conduct a
         hypothesis test for the parameters of interest.
@@ -50,4 +51,37 @@ class _ELRegOpts(_OptFuncts):
             -2 x the log-likelihood of the nuisance parameters and the
             hypothesized value of the parameter(s) of interest.
         """
-        pass
+        params[param_nums] = b0_vals
+        nuis_param_index = np.int_(np.delete(np.arange(nvar),
+                                             param_nums))
+        params[nuis_param_index] = nuisance_params
+        new_params = params.reshape(nvar, 1)
+        self.new_params = new_params
+        est_vect = exog * \
+          (endog - np.squeeze(np.dot(exog, new_params))).reshape(int(nobs), 1)
+        if not stochastic_exog:
+            exog_means = np.mean(exog, axis=0)[1:]
+            exog_mom2 = (np.sum(exog * exog, axis=0))[1:]\
+                          / nobs
+            mean_est_vect = exog[:, 1:] - exog_means
+            mom2_est_vect = (exog * exog)[:, 1:] - exog_mom2
+            regressor_est_vect = np.concatenate((mean_est_vect, mom2_est_vect),
+                                                axis=1)
+            est_vect = np.concatenate((est_vect, regressor_est_vect),
+                                           axis=1)
+
+        wts = np.ones(int(nobs)) * (1. / nobs)
+        x0 = np.zeros(est_vect.shape[1]).reshape(-1, 1)
+        try:
+            eta_star = self._modif_newton(x0, est_vect, wts)
+            denom = 1. + np.dot(eta_star, est_vect.T)
+            self.new_weights = 1. / nobs * 1. / denom
+            # the following commented out code is to verify weights
+            # see open issue #1845
+            #self.new_weights /= self.new_weights.sum()
+            #if not np.allclose(self.new_weights.sum(), 1., rtol=0, atol=1e-10):
+            #    raise RuntimeError('weights do not sum to 1')
+            llr = np.sum(np.log(nobs * self.new_weights))
+            return -2 * llr
+        except np.linalg.LinAlgError:
+            return np.inf
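
_opt_nuis_regress is internal; it is driven by OLSResults.el_test. A minimal sketch of
that public entry point on synthetic data (seed and coefficients are made up).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = sm.add_constant(rng.normal(size=(150, 1)))
    y = x @ np.array([1.0, 2.0]) + rng.normal(size=150)

    res = sm.OLS(y, x).fit()
    print(res.el_test(np.array([2.0]), np.array([1])))  # (llr, p-value) for slope = 2
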
diff --git a/statsmodels/emplike/originregress.py b/statsmodels/emplike/originregress.py
index 4a5faa4f7..28956a5e3 100644
--- a/statsmodels/emplike/originregress.py
+++ b/statsmodels/emplike/originregress.py
@@ -19,7 +19,9 @@ Owen, A.B. (2001). Empirical Likelihood.  Chapman and Hall. p. 82.
 import numpy as np
 from scipy import optimize
 from scipy.stats import chi2
+
 from statsmodels.regression.linear_model import OLS, RegressionResults
+# When descriptive merged, this will be changed
 from statsmodels.tools.tools import add_constant


@@ -50,7 +52,6 @@ class ELOriginRegress:
     nvar : float
         Number of exogenous regressors.
     """
-
     def __init__(self, endog, exog):
         self.endog = endog
         self.exog = exog
@@ -58,7 +59,7 @@ class ELOriginRegress:
         try:
             self.nvar = float(exog.shape[1])
         except IndexError:
-            self.nvar = 1.0
+            self.nvar = 1.

     def fit(self):
         """
@@ -69,7 +70,20 @@ class ELOriginRegress:
         Results : class
             Empirical likelihood regression class.
         """
-        pass
+        exog_with = add_constant(self.exog, prepend=True)
+        restricted_model = OLS(self.endog, exog_with)
+        restricted_fit = restricted_model.fit()
+        restricted_el = restricted_fit.el_test(
+            np.array([0]), np.array([0]), ret_params=1)
+        params = np.squeeze(restricted_el[3])
+        beta_hat_llr = restricted_el[0]
+        llf = np.sum(np.log(restricted_el[2]))
+        return OriginResults(restricted_model, params, beta_hat_llr, llf)
+
+    def predict(self, params, exog=None):
+        if exog is None:
+            exog = self.exog
+        return np.dot(add_constant(exog, prepend=True), params)


 class OriginResults(RegressionResults):
@@ -136,15 +150,13 @@ class OriginResults(RegressionResults):
     >>> fitted.conf_int()
     TypeError: unsupported operand type(s) for *: 'instancemethod' and 'float'
     """
-
     def __init__(self, model, params, est_llr, llf_el):
         self.model = model
         self.params = np.squeeze(params)
         self.llr = est_llr
         self.llf_el = llf_el
-
-    def el_test(self, b0_vals, param_nums, method='nm', stochastic_exog=1,
-        return_weights=0):
+    def el_test(self, b0_vals, param_nums, method='nm',
+                stochastic_exog=1, return_weights=0):
         """
         Returns the llr and p-value for a hypothesized parameter value
         for a regression that goes through the origin.
@@ -180,10 +192,22 @@ class OriginResults(RegressionResults):
         res : tuple
             pvalue and likelihood ratio.
         """
-        pass
-
-    def conf_int_el(self, param_num, upper_bound=None, lower_bound=None,
-        sig=0.05, method='nm', stochastic_exog=True):
+        b0_vals = np.hstack((0, b0_vals))
+        param_nums = np.hstack((0, param_nums))
+        test_res = self.model.fit().el_test(b0_vals, param_nums, method=method,
+                                  stochastic_exog=stochastic_exog,
+                                  return_weights=return_weights)
+        llr_test = test_res[0]
+        llr_res = llr_test - self.llr
+        pval = chi2.sf(llr_res, self.model.exog.shape[1] - 1)
+        if return_weights:
+            return llr_res, pval, test_res[2]
+        else:
+            return llr_res, pval
+
+    def conf_int_el(self, param_num, upper_bound=None,
+                       lower_bound=None, sig=.05, method='nm',
+                       stochastic_exog=True):
         """
         Returns the confidence interval for a regression parameter when the
         regression is forced through the origin.
@@ -213,4 +237,23 @@ class OriginResults(RegressionResults):
         ci: tuple
             The confidence interval for the parameter 'param_num'.
         """
-        pass
+        r0 = chi2.ppf(1 - sig, 1)
+        param_num = np.array([param_num])
+        if upper_bound is None:
+            ci = np.asarray(self.model.fit().conf_int(.0001))
+            upper_bound = (np.squeeze(ci[param_num])[1])
+        if lower_bound is None:
+            ci = np.asarray(self.model.fit().conf_int(.0001))
+            lower_bound = (np.squeeze(ci[param_num])[0])
+
+        def f(b0):
+            b0 = np.array([b0])
+            val = self.el_test(
+                b0, param_num, method=method, stochastic_exog=stochastic_exog
+            )
+            return val[0] - r0
+
+        _param = np.squeeze(self.params[param_num])
+        lowerl = optimize.brentq(f, np.squeeze(lower_bound), _param)
+        upperl = optimize.brentq(f, _param, np.squeeze(upper_bound))
+        return (lowerl, upperl)
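
A minimal usage sketch of ELOriginRegress as completed above, assuming a synthetic
no-intercept model; the slope and seed are made up.

    import numpy as np
    from statsmodels.emplike.originregress import ELOriginRegress

    rng = np.random.default_rng(4)
    x = rng.normal(size=(100, 1))
    y = 2.0 * x.ravel() + rng.normal(size=100)     # true model has no intercept

    fit = ELOriginRegress(y, x).fit()
    print(fit.params)                              # [0, slope]; intercept forced to 0
    print(fit.el_test(np.array([2.0]), np.array([1])))
    print(fit.conf_int_el(1))
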
diff --git a/statsmodels/examples/es_misc_poisson2.py b/statsmodels/examples/es_misc_poisson2.py
index 6c48b0219..a4d1494c8 100644
--- a/statsmodels/examples/es_misc_poisson2.py
+++ b/statsmodels/examples/es_misc_poisson2.py
@@ -1,51 +1,62 @@
+
 import numpy as np
+
 import statsmodels.api as sm
-from statsmodels.miscmodels.count import PoissonGMLE, PoissonOffsetGMLE, PoissonZiGMLE
+from statsmodels.miscmodels.count import (PoissonGMLE, PoissonOffsetGMLE,
+                                          PoissonZiGMLE)
 from statsmodels.discrete.discrete_model import Poisson
-DEC = 3


+DEC = 3
+
 class Dummy:
     pass

-
 self = Dummy()
+
+# generate artificial data
 np.random.seed(98765678)
 nobs = 200
-rvs = np.random.randn(nobs, 6)
+rvs = np.random.randn(nobs,6)
 data_exog = rvs
 data_exog = sm.add_constant(data_exog, prepend=False)
-xbeta = 1 + 0.1 * rvs.sum(1)
+xbeta = 1 + 0.1*rvs.sum(1)
 data_endog = np.random.poisson(np.exp(xbeta))
+
+#estimate discretemod.Poisson as benchmark
 res_discrete = Poisson(data_endog, data_exog).fit()
+
 mod_glm = sm.GLM(data_endog, data_exog, family=sm.families.Poisson())
 res_glm = mod_glm.fit()
+
+#estimate generic MLE
 self.mod = PoissonGMLE(data_endog, data_exog)
 res = self.mod.fit()
-offset = res.params[0] * data_exog[:, 0]
-mod1 = PoissonOffsetGMLE(data_endog, data_exog[:, 1:], offset=offset)
-start_params = np.ones(6) / 2.0
+offset = res.params[0] * data_exog[:,0]  #1d ???
+
+mod1 = PoissonOffsetGMLE(data_endog, data_exog[:,1:], offset=offset)
+start_params = np.ones(6)/2.
 start_params = res.params[1:]
-res1 = mod1.fit(start_params=start_params, method='nm', maxiter=1000,
-    maxfun=1000)
+res1 = mod1.fit(start_params=start_params, method='nm', maxiter=1000, maxfun=1000)
+
 print('mod2')
-mod2 = PoissonZiGMLE(data_endog, data_exog[:, 1:], offset=offset)
-start_params = np.r_[np.ones(6) / 2.0, 10]
-start_params = np.r_[res.params[1:], 20.0]
-res2 = mod2.fit(start_params=start_params, method='bfgs', maxiter=1000,
-    maxfun=2000)
+mod2 = PoissonZiGMLE(data_endog, data_exog[:,1:], offset=offset)
+start_params = np.r_[np.ones(6)/2.,10]
+start_params = np.r_[res.params[1:], 20.] #-100]
+res2 = mod2.fit(start_params=start_params, method='bfgs', maxiter=1000, maxfun=2000)
+
 print('mod3')
 mod3 = PoissonZiGMLE(data_endog, data_exog, offset=None)
-start_params = np.r_[np.ones(7) / 2.0, 10]
-start_params = np.r_[res.params, 20.0]
-res3 = mod3.fit(start_params=start_params, method='nm', maxiter=1000,
-    maxfun=2000)
+start_params = np.r_[np.ones(7)/2.,10]
+start_params = np.r_[res.params, 20.]
+res3 = mod3.fit(start_params=start_params, method='nm', maxiter=1000, maxfun=2000)
+
 print('mod4')
 data_endog2 = np.r_[data_endog, np.zeros(nobs)]
 data_exog2 = np.r_[data_exog, data_exog]
+
 mod4 = PoissonZiGMLE(data_endog2, data_exog2, offset=None)
-start_params = np.r_[np.ones(7) / 2.0, 10]
-start_params = np.r_[res.params, 0.0]
-res4 = mod4.fit(start_params=start_params, method='nm', maxiter=1000,
-    maxfun=1000)
+start_params = np.r_[np.ones(7)/2.,10]
+start_params = np.r_[res.params, 0.]
+res4 = mod4.fit(start_params=start_params, method='nm', maxiter=1000, maxfun=1000)
 print(res4.summary())
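
Editor's note: a small sketch (not part of the patch) of the pattern this script checks, namely that the generic-MLE Poisson reproduces the discrete Poisson estimates. Seed, sample size and variable names are illustrative.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.miscmodels.count import PoissonGMLE
    from statsmodels.discrete.discrete_model import Poisson

    np.random.seed(12345)
    exog = sm.add_constant(np.random.standard_normal((500, 2)), prepend=False)
    endog = np.random.poisson(np.exp(0.5 + 0.1 * exog[:, :2].sum(1)))

    # Same parameterization, so the two fits should agree closely.
    res_gmle = PoissonGMLE(endog, exog).fit(disp=0)
    res_disc = Poisson(endog, exog).fit(disp=0)
    print(np.max(np.abs(res_gmle.params - res_disc.params)))
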
diff --git a/statsmodels/examples/ex_arch_canada.py b/statsmodels/examples/ex_arch_canada.py
index a636c5b39..af5ff8c00 100644
--- a/statsmodels/examples/ex_arch_canada.py
+++ b/statsmodels/examples/ex_arch_canada.py
@@ -1,12 +1,16 @@
+# -*- coding: utf-8 -*-
 """

 Created on Sat Dec 24 07:31:47 2011

 Author: Josef Perktold
 """
+
 import numpy as np
 import statsmodels.stats.diagnostic as dia
-canada_raw = """     405.36646642737       929.610513893698        7.52999999999884        386.136109062605
+
+canada_raw = '''\
+     405.36646642737       929.610513893698        7.52999999999884        386.136109062605
     404.639833965913       929.803984550587        7.69999999999709        388.135759111711
     403.814883043744       930.318387567177        7.47000000000116        390.540112911955
     404.215773188006       931.427687420772         7.2699999999968        393.963817246136
@@ -89,53 +93,66 @@ canada_raw = """     405.36646642737        929.610513893698        7.52999999999884
     416.867407108435       960.362493080892        6.80000000000291        469.134788222928
     417.610399060359       960.783379042937        6.69999999999709        469.336419672322
     418.002980476361       961.029029939624        6.93000000000029        470.011666329664
-    417.266680178544       961.765709811429        6.87000000000262        469.647234439539"""
-canada = np.array(canada_raw.split(), float).reshape(-1, 4)
-k = 2
-resarch2 = dia.acorr_lm((canada[:, k] - canada[:, k].mean()) ** 2, maxlag=2,
-    autolag=None, store=1)
+    417.266680178544       961.765709811429        6.87000000000262        469.647234439539'''
+
+canada = np.array(canada_raw.split(), float).reshape(-1,4)
+k=2
+resarch2 = dia.acorr_lm((canada[:,k]-canada[:,k].mean())**2, maxlag=2, autolag=None, store=1)
 print(resarch2)
-resarch5 = dia.acorr_lm(canada[:, k] ** 2, maxlag=12, autolag=None, store=1)
-ss = """        ARCH LM-test; Null hypothesis: no ARCH effects
+resarch5 = dia.acorr_lm(canada[:,k]**2, maxlag=12, autolag=None, store=1)
+
+ss = '''\
+        ARCH LM-test; Null hypothesis: no ARCH effects

 Chi-squared = %(chi)-8.4f df = %(df)-4d p-value = %(pval)8.4g
-"""
+'''
 resarch = resarch5
 print()
-print(ss % dict(chi=resarch[2], df=resarch[-1].resols.df_model, pval=
-    resarch[3]))
-"""
+print(ss % dict(chi=resarch[2], df=resarch[-1].resols.df_model, pval=resarch[3]))
+
+
+#R:FinTS: ArchTest(as.vector(Canada[,3]), lag=5)
+'''
         ARCH LM-test; Null hypothesis: no ARCH effects

 data:  as.vector(Canada[, 3])
 Chi-squared = 78.878, df = 5, p-value = 1.443e-15
-"""
-"""
+'''
+
+#from ss above
+'''
         ARCH LM-test; Null hypothesis: no ARCH effects

 Chi-squared = 78.849   df = 5    p-value = 1.461e-15
-"""
-"""
+'''
+
+#k=2
+#R
+'''
         ARCH LM-test; Null hypothesis: no ARCH effects

 data:  as.vector(Canada[, 4])
 Chi-squared = 74.6028, df = 5, p-value = 1.121e-14
-"""
-"""
+'''
+#mine
+'''
         ARCH LM-test; Null hypothesis: no ARCH effects

 Chi-squared = 74.6028  df = 5    p-value = 1.126e-14
-"""
-"""
+'''
+
+'''
 > ArchTest(as.vector(Canada[,4]), lag=12)

         ARCH LM-test; Null hypothesis: no ARCH effects

 data:  as.vector(Canada[, 4])
 Chi-squared = 69.6359, df = 12, p-value = 3.747e-10
-"""
-"""
+'''
+
+#mine:
+'''
         ARCH LM-test; Null hypothesis: no ARCH effects

 Chi-squared = 69.6359  df = 12   p-value = 3.747e-10
-"""
+'''
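
Editor's note: a compact sketch (not part of the patch) of the ARCH LM test used above. The lag count is passed positionally because its keyword name has changed across statsmodels releases (maxlag in this script, nlags in newer versions); data and seed are illustrative.

    import numpy as np
    import statsmodels.stats.diagnostic as dia

    np.random.seed(7)
    x = np.random.standard_normal(250)

    # ARCH LM test: regress the centered, squared series on its own lags.
    # The first two return values are the LM statistic and its chi2 p-value.
    res = dia.acorr_lm((x - x.mean()) ** 2, 5)
    print(res[0], res[1])
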
diff --git a/statsmodels/examples/ex_emplike_1.py b/statsmodels/examples/ex_emplike_1.py
index 5fab62ec8..73ea897c1 100644
--- a/statsmodels/examples/ex_emplike_1.py
+++ b/statsmodels/examples/ex_emplike_1.py
@@ -4,24 +4,52 @@ inference for descriptive statistics.  If matplotlib is installed
 it also generates plots.

 """
+
 import numpy as np
 import statsmodels.api as sm
 print('Welcome to El')
-np.random.seed(634)
+np.random.seed(634)  # No significance of the seed.
+# Let's first generate some univariate data.
 univariate = np.random.standard_normal(30)
+
+# Now let's play with it
+# Initiate an empirical likelihood descriptive statistics instance
 eldescriptive = sm.emplike.DescStat(univariate)
-eldescriptive_mean = eldescriptive.endog.mean()
+
+# Empirical likelihood is (typically) a  method of inference,
+# not estimation.  Therefore, there is no attribute eldescriptive.mean
+# However, we can check the mean:
+eldescriptive_mean = eldescriptive.endog.mean()  #.42
+
+#Let's conduct a hypothesis test to see if the mean is 0
 print('Hypothesis test results for the mean:')
 print(eldescriptive.test_mean(0))
-eldescriptive_var = eldescriptive.endog.var()
+
+
+# The first value is the -2 * log-likelihood ratio, which is distributed
+# chi2.  The second value is the p-value.
+
+# Let's see what the variance is:
+eldescriptive_var = eldescriptive.endog.var()  # 1.01
+
+#Let's test if the variance is 1:
 print('Hypothesis test results for the variance:')
 print(eldescriptive.test_var(1))
+
+# Let's test if Skewness and Kurtosis are 0
 print('Hypothesis test results for Skewness:')
 print(eldescriptive.test_skew(0))
 print('Hypothesis test results for the Kurtosis:')
 print(eldescriptive.test_kurt(0))
+# Note that the skewness and Kurtosis take longer.  This is because
+# we have to optimize over the nuisance parameters (mean, variance).
+
+# We can also test for the joint skewness and kurtosis
 print(' Joint Skewness-Kurtosis test')
 eldescriptive.test_joint_skew_kurt(0, 0)
+
+
+# Let's try and get some confidence intervals
 print('Confidence interval for the mean')
 print(eldescriptive.ci_mean())
 print('Confidence interval for the variance')
@@ -30,17 +58,41 @@ print('Confidence interval for skewness')
 print(eldescriptive.ci_skew())
 print('Confidence interval for kurtosis')
 print(eldescriptive.ci_kurt())
-mean_variance_contour = eldescriptive.plot_contour(-0.5, 1.2, 0.2, 2.5, 
-    0.05, 0.05)
+
+
+# if matplotlib is installed, we can get a contour plot for the mean
+# and variance.
+mean_variance_contour = eldescriptive.plot_contour(-.5, 1.2, .2, 2.5, .05, .05)
+# This returns a figure instance.  Just type mean_var_contour.show()
+# to see the plot.
+
+# Once you close the plot, we can start some multivariate analysis.
+
 x1 = np.random.exponential(2, (30, 1))
 x2 = 2 * x1 + np.random.chisquare(4, (30, 1))
 mv_data = np.concatenate((x1, x2), axis=1)
 mv_elmodel = sm.emplike.DescStat(mv_data)
+# For multivariate data, the only methods are mv_test_mean,
+# mv mean contour and ci_corr and test_corr.
+
+# Let's test the hypothesis that x1 has a mean of 2 and x2 has a mean of 7
 print('Multivaraite mean hypothesis test')
 print(mv_elmodel.mv_test_mean(np.array([2, 7])))
+
+# Now let's get the confidence interval for correlation
 print('Correlation Coefficient CI')
 print(mv_elmodel.ci_corr())
+# Note how this took much longer than previous functions.  That is
+# because the function is optimizing over 4 nuisance parameters.
+# We can also do a hypothesis test for correlation
 print('Hypothesis test for correlation')
-print(mv_elmodel.test_corr(0.7))
-means_contour = mv_elmodel.mv_mean_contour(1, 3, 6, 9, 0.15, 0.15, plot_dta=1)
-means_contour2 = mv_elmodel.mv_mean_contour(1, 3, 6, 9, 0.05, 0.05, plot_dta=0)
+print(mv_elmodel.test_corr(.7))
+
+# Finally, let's create a contour plot for the means of the data
+means_contour = mv_elmodel.mv_mean_contour(1, 3, 6,9, .15,.15, plot_dta=1)
+# This also returns a fig so we can type mean_contour.show() to see the figure
+# Sometimes, the data is very dispersed and we would like to see the confidence
+# intervals without the plotted data.  Let's see the difference when we set
+# plot_dta=0
+
+means_contour2 = mv_elmodel.mv_mean_contour(1, 3, 6,9, .05,.05, plot_dta=0)
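
Editor's note: a short sketch (not part of the patch) distilling the univariate pattern demonstrated above: test_mean returns the -2 log-likelihood ratio and its chi2(1) p-value, and ci_mean inverts that test. Seed and data are illustrative.

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    np.random.seed(1)
    desc = sm.emplike.DescStat(np.random.standard_normal(60))

    llr, pval = desc.test_mean(0)
    print(np.isclose(pval, chi2.sf(llr, 1)))   # p-value is chi2(1).sf of the ratio
    print(desc.ci_mean())                      # interval where the test does not reject
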
diff --git a/statsmodels/examples/ex_emplike_2.py b/statsmodels/examples/ex_emplike_2.py
index bcd750c61..af6306c9e 100644
--- a/statsmodels/examples/ex_emplike_2.py
+++ b/statsmodels/examples/ex_emplike_2.py
@@ -2,50 +2,89 @@
 This script is a basic tutorial on how to conduct empirical
 likelihood estimation and inference in linear regression models.
 """
+
 import numpy as np
 import statsmodels.api as sm
-np.random.seed(100)
+
+# Let's generate some regression data
+np.random.seed(100)  # no significance of the seed
 X = np.random.standard_normal((40, 3))
 X = sm.add_constant(X)
-beta = np.arange(1, 5)
+beta = np.arange(1,5)
 y = np.dot(X, beta) + np.random.standard_normal(40)
+# There are no distributional assumptions on the error.  I just chose
+# normal errors to demonstrate.
+
 print('Lets play with EL Regression')
+
+
+# In a model with an intercept, access EL inference through OLS results.
+
+
 elmodel = sm.OLS(y, X)
 fitted = elmodel.fit()
+
+
+# Let's test if the intercept is 0
 print('Intercept test')
 test0_1 = fitted.el_test(np.array([0]), np.array([0]))
 print(test0_1)
+#  Let's test if beta3 is 4
 print('beta3 test')
 test1 = fitted.el_test(np.array([4]), np.array([3]))
 print(test1)
+#  Let's test the hypothesis that beta3=4 and beta2=3
 print('joint beta test')
 test2 = fitted.el_test(np.array([3, 4]), np.array([2, 3]))
 print(test2)
+
+#  Let's get the confidence intervals for the parameters
 print('Confidence Interval for Beta1')
 ci_beta1 = fitted.conf_int_el(1)
 print(ci_beta1)
+
+# Of course, we can still see the rest of the RegressionResults
 print('R-squared')
 print(fitted.rsquared)
 print('Params')
 print(fitted.params)
+
+#  Now let's check out regression through the origin
 print('Origin Regression')
 originx = np.random.standard_normal((30, 3))
 originbeta = np.array([[1], [2], [3]])
 originy = np.dot(originx, originbeta) + np.random.standard_normal((30, 1))
+
 originmodel = sm.emplike.ELOriginRegress(originy, originx)
+#  Since in this case, parameter estimates are different than in OLS,
+#  we need to fit the model.
+
 originfit = originmodel.fit()
+
+
 print('The fitted parameters')
 print(originfit.params)
 print('The MSE')
 print(originfit.mse_model)
 print('The R-squared')
 print(originfit.rsquared)
+
+# Note that the first element of param is 0 and there are 4 params.  That is
+# because the first param is the intercept term.  This is noted in the
+# documentation.
+
+#  Now that the model is fitted, we can do some inference.
+
 print('Test beta1 =1')
 test_beta1 = originfit.el_test([1], [1])
 print(test_beta1)
+
+#  A confidence interval for Beta1.
 print('confidence interval for beta1')
 ci_beta2 = originfit.conf_int_el(1)
 print(ci_beta2)
+
+# Finally, since we initiated an EL model, normal inference is not available
 try:
     originfit.conf_int()
 except:
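
Editor's note: a brief sketch (not part of the patch) of the OLS empirical-likelihood calls walked through above: a joint el_test on the two slopes and an EL confidence interval for one of them. Data, seed and names are illustrative.

    import numpy as np
    import statsmodels.api as sm

    np.random.seed(2)
    X = sm.add_constant(np.random.standard_normal((40, 2)))
    y = X @ np.array([1.0, 2.0, -1.0]) + np.random.standard_normal(40)

    fit = sm.OLS(y, X).fit()
    # Joint test that both slopes equal their true values: (llr, p-value).
    print(fit.el_test(np.array([2.0, -1.0]), np.array([1, 2])))
    # EL confidence interval for the first slope (parameter index 1).
    print(fit.conf_int_el(1))
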
diff --git a/statsmodels/examples/ex_emplike_3.py b/statsmodels/examples/ex_emplike_3.py
index 654619b3e..f9fb7e82a 100644
--- a/statsmodels/examples/ex_emplike_3.py
+++ b/statsmodels/examples/ex_emplike_3.py
@@ -5,16 +5,24 @@ inference in an accelerated failure time model using empirical likelihood.
 We will be using the Stanford Heart Transplant data

 """
+
 import numpy as np
+
 import statsmodels.api as sm
+
 data = sm.datasets.heart.load()
-model = sm.emplike.emplikeAFT(np.log10(data.endog), sm.add_constant(data.
-    exog), data.censors)
+# Note this data has endog, exog and censors
+# We will take the log (base 10) of the endogenous survival times
+
+model = sm.emplike.emplikeAFT(np.log10(data.endog),
+                              sm.add_constant(data.exog), data.censors)
+
+# We need to fit the model to get the parameters
 fitted = model.fit()
 print(fitted.params())
-test1 = fitted.test_beta([4], [0])
+test1 = fitted.test_beta([4],[0])  # Test that the intercept is 4
 print(test1)
-test2 = fitted.test_beta([-0.05], [1])
+test2 = fitted.test_beta([-.05], [1]) # Test that the slope is -.05
 print(test2)
-ci_beta1 = fitted.ci_beta(1, 0.1, -0.1)
+ci_beta1 = fitted.ci_beta(1, .1, -.1)
 print(ci_beta1)
diff --git a/statsmodels/examples/ex_feasible_gls_het.py b/statsmodels/examples/ex_feasible_gls_het.py
index bdec80dd1..90517160c 100644
--- a/statsmodels/examples/ex_feasible_gls_het.py
+++ b/statsmodels/examples/ex_feasible_gls_het.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Examples for linear model with heteroscedasticity estimated by feasible GLS

 These are examples to check the results during development.
@@ -19,24 +20,32 @@ include a constant and in the second case I include some of the same
 regressors as in the main equation.

 """
+
 import numpy as np
 from numpy.testing import assert_almost_equal
+
 from statsmodels.regression.linear_model import OLS
 from statsmodels.regression.feasible_gls import GLSHet, GLSHet2
+
 examples = ['ex1']
+
 if 'ex1' in examples:
+    #from tut_ols_wls
     nsample = 1000
     sig = 0.5
     x1 = np.linspace(0, 20, nsample)
-    X = np.c_[x1, (x1 - 5) ** 2, np.ones(nsample)]
-    np.random.seed(0)
-    beta = [0.5, -0.015, 1.0]
+    X = np.c_[x1, (x1-5)**2, np.ones(nsample)]
+    np.random.seed(0)#9876789) #9876543)
+    beta = [0.5, -0.015, 1.]
     y_true2 = np.dot(X, beta)
     w = np.ones(nsample)
-    w[nsample * 6 // 10:] = 4
-    y2 = y_true2 + sig * np.sqrt(w) * np.random.normal(size=nsample)
-    X2 = X[:, [0, 2]]
+    w[nsample*6//10:] = 4  #Note this is the squared value
+    #y2[:nsample*6/10] = y_true2[:nsample*6/10] + sig*1. * np.random.normal(size=nsample*6/10)
+    #y2[nsample*6/10:] = y_true2[nsample*6/10:] + sig*4. * np.random.normal(size=nsample*4/10)
+    y2 = y_true2 + sig*np.sqrt(w)* np.random.normal(size=nsample)
+    X2 = X[:,[0,2]]
     X2 = X
+
     res_ols = OLS(y2, X2).fit()
     print('OLS beta estimates')
     print(res_ols.params)
@@ -53,15 +62,23 @@ if 'ex1' in examples:
     print(res0.params)
     print('WLS stddev of beta')
     print(res1.bse)
-    print(res1.model.weights / res1.model.weights.max())
-    assert_almost_equal(res1.model.weights / res1.model.weights.max(), 1.0 /
-        w, 14)
+    #compare with previous version GLSHet2, refactoring check
+    #assert_almost_equal(res1.params, np.array([ 0.37642521,  1.51447662]))
+    #this fails ???  more iterations? different starting weights?
+
+
+    print(res1.model.weights/res1.model.weights.max())
+    #why is the error so small in the estimated weights ?
+    assert_almost_equal(res1.model.weights/res1.model.weights.max(), 1./w, 14)
     print('residual regression params')
     print(res1.results_residual_regression.params)
     print('scale of model ?')
     print(res1.scale)
     print('unweighted residual variance, note unweighted mean is not zero')
     print(res1.resid.var())
+    #Note weighted mean is zero:
+    #(res1.model.weights * res1.resid).mean()
+
     doplots = False
     if doplots:
         import matplotlib.pyplot as plt
@@ -71,21 +88,27 @@ if 'ex1' in examples:
         plt.plot(x1, res1.fittedvalues, 'r-', label='fwls')
         plt.plot(x1, res_ols.fittedvalues, '--', label='ols')
         plt.legend()
-    z = (w[:, None] == np.unique(w)).astype(float)
+
+    #z = (w[:,None] == [1,4]).astype(float) #dummy variable
+    z = (w[:,None] == np.unique(w)).astype(float) #dummy variable
     mod2 = GLSHet(y2, X2, exog_var=z)
     res2 = mod2.iterative_fit(2)
     print(res2.params)
+
     import statsmodels.api as sm
     z = sm.add_constant(w)
     mod3 = GLSHet(y2, X2, exog_var=z)
     res3 = mod3.iterative_fit(8)
     print(res3.params)
     print("np.array(res3.model.history['ols_params'])")
+
     print(np.array(res3.model.history['ols_params']))
     print("np.array(res3.model.history['self_params'])")
     print(np.array(res3.model.history['self_params']))
-    print(np.unique(res2.model.weights))
+
+    print(np.unique(res2.model.weights)) #for discrete z only, only a few uniques
     print(np.unique(res3.model.weights))
+
     if doplots:
         plt.figure()
         plt.plot(x1, y2, 'o')
@@ -95,4 +118,6 @@ if 'ex1' in examples:
         plt.plot(x1, res3.fittedvalues, '-', label='fwls3')
         plt.plot(x1, res_ols.fittedvalues, '--', label='ols')
         plt.legend()
+
+
         plt.show()
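
Editor's note: a condensed sketch (not part of the patch) of the GLSHet pattern in this script: the error variance is modeled on exog_var and the mean equation is re-estimated with the implied weights. Simulated data and names are illustrative.

    import numpy as np
    from statsmodels.regression.feasible_gls import GLSHet

    np.random.seed(3)
    nobs = 400
    x1 = np.linspace(0, 20, nobs)
    X = np.column_stack((x1, np.ones(nobs)))
    w = np.ones(nobs)
    w[nobs // 2:] = 4.0   # error variance jumps in the second half of the sample
    y = X @ np.array([0.5, 1.0]) + np.sqrt(w) * np.random.standard_normal(nobs)

    # exog_var holds the regressors of the variance equation; here one dummy
    # per variance regime, as in the example above.
    z = (w[:, None] == np.unique(w)).astype(float)
    res = GLSHet(y, X, exog_var=z).iterative_fit(2)
    print(res.params)
    print(np.unique(res.model.weights))
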
diff --git a/statsmodels/examples/ex_feasible_gls_het_0.py b/statsmodels/examples/ex_feasible_gls_het_0.py
index b505f4aa6..971fd28a5 100644
--- a/statsmodels/examples/ex_feasible_gls_het_0.py
+++ b/statsmodels/examples/ex_feasible_gls_het_0.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Examples for linear model with heteroscedasticity estimated by feasible GLS

 These are examples to check the results during development.
@@ -14,36 +15,53 @@ Created on Wed Dec 21 12:28:17 2011
 Author: Josef Perktold

 """
+
 import numpy as np
 from numpy.testing import assert_almost_equal
+
 from statsmodels.regression.linear_model import OLS
 from statsmodels.regression.feasible_gls import GLSHet, GLSHet2
 from statsmodels.tools.tools import add_constant
+
 examples = ['ex1']
+
 if 'ex1' in examples:
-    nsample = 300
+    nsample = 300  #different pattern last graph with 100 or 200 or 500
     sig = 0.5
-    np.random.seed(9876789)
+
+    np.random.seed(9876789) #9876543)
     X = np.random.randn(nsample, 3)
-    X = np.column_stack((np.ones((nsample, 1)), X))
-    beta = [1, 0.5, -0.5, 1.0]
+    X = np.column_stack((np.ones((nsample,1)), X))
+    beta = [1, 0.5, -0.5, 1.]
     y_true2 = np.dot(X, beta)
+
+
     x1 = np.linspace(0, 1, nsample)
-    gamma = np.array([1, 3.0])
+    gamma = np.array([1, 3.])
+    #with slope 3 instead of two, I get negative weights, Not correct
+    #   - was misspecified, but the negative weights are still possible with identity link
+    #gamma /= gamma.sum()   #normalize assuming x1.max is 1
     z_true = add_constant(x1)
+
     winv = np.dot(z_true, gamma)
-    het_params = sig ** 2 * np.array([1, 3.0])
-    sig2_het = sig ** 2 * winv
-    weights_dgp = 1 / winv
-    weights_dgp /= weights_dgp.max()
+    het_params = sig**2 * np.array([1, 3.])  # for squared
+    sig2_het = sig**2 * winv
+
+    weights_dgp = 1/winv
+    weights_dgp /= weights_dgp.max()  #should be already normalized - NOT check normalization
+    #y2[:nsample*6/10] = y_true2[:nsample*6/10] + sig*1. * np.random.normal(size=nsample*6/10)
     z0 = np.zeros(nsample)
-    z0[nsample * 5 // 10:] = 1
+    z0[(nsample * 5)//10:] = 1   #dummy for 2 halfs of sample
     z0 = add_constant(z0)
+
     z1 = add_constant(x1)
+
     noise = np.sqrt(sig2_het) * np.random.normal(size=nsample)
     y2 = y_true2 + noise
-    X2 = X[:, [0, 2]]
-    X2 = X
+
+    X2 = X[:,[0,2]]  #misspecified, missing regressor in main equation
+    X2 = X  #correctly specified
+
     res_ols = OLS(y2, X2).fit()
     print('OLS beta estimates')
     print(res_ols.params)
@@ -60,16 +78,24 @@ if 'ex1' in examples:
     print(res0.params)
     print('WLS stddev of beta')
     print(res1.bse)
-    print(res1.model.weights / res1.model.weights.max())
-    assert_almost_equal(res1.model.weights / res1.model.weights.max(),
-        weights_dgp, 14)
+    #compare with previous version GLSHet2, refactoring check
+    #assert_almost_equal(res1.params, np.array([ 0.37642521,  1.51447662]))
+    #this fails ???  more iterations? different starting weights?
+
+
+    print(res1.model.weights/res1.model.weights.max())
+    #why is the error so small in the estimated weights ?
+    assert_almost_equal(res1.model.weights/res1.model.weights.max(), weights_dgp, 14)
     print('residual regression params')
     print(res1.results_residual_regression.params)
     print('scale of model ?')
     print(res1.scale)
     print('unweighted residual variance, note unweighted mean is not zero')
     print(res1.resid.var())
-    doplots = True
+    #Note weighted mean is zero:
+    #(res1.model.weights * res1.resid).mean()
+
+    doplots = True #False
     if doplots:
         import matplotlib.pyplot as plt
         plt.figure()
@@ -78,25 +104,37 @@ if 'ex1' in examples:
         plt.plot(x1, res1.fittedvalues, 'r-', label='fwls')
         plt.plot(x1, res_ols.fittedvalues, '--', label='ols')
         plt.legend()
+
+    #the next only works if w has finite support, discrete/categorical
+    #z = (w[:,None] == [1,4]).astype(float) #dummy variable
+    #z = (w0[:,None] == np.unique(w0)).astype(float) #dummy variable
+    #changed z0 contains dummy and constant
     mod2 = GLSHet(y2, X2, exog_var=z0)
     res2 = mod2.iterative_fit(3)
     print(res2.params)
+
     import statsmodels.api as sm
-    z = sm.add_constant(x1 / x1.max())
-    mod3 = GLSHet(y2, X2, exog_var=z1)
+    #z = sm.add_constant(w, prepend=True)
+    z = sm.add_constant(x1/x1.max())
+    mod3 = GLSHet(y2, X2, exog_var=z1)#, link=sm.families.links.Log())
     res3 = mod3.iterative_fit(20)
-    error_var_3 = res3.mse_resid / res3.model.weights
+    error_var_3 = res3.mse_resid/res3.model.weights
     print(res3.params)
     print("np.array(res3.model.history['ols_params'])")
+
     print(np.array(res3.model.history['ols_params']))
     print("np.array(res3.model.history['self_params'])")
     print(np.array(res3.model.history['self_params']))
-    print(np.unique(res2.model.weights))
+
+    #Models 2 and 3 are equivalent with different parameterization of Z
+    print(np.unique(res2.model.weights)) #for discrete z only, only a few uniques
     print(np.unique(res3.model.weights))
+
     print(res3.summary())
     print('\n\nResults of estimation of weights')
     print('--------------------------------')
     print(res3.results_residual_regression.summary())
+
     if doplots:
         plt.figure()
         plt.plot(x1, y2, 'o')
@@ -106,17 +144,23 @@ if 'ex1' in examples:
         plt.plot(x1, res3.fittedvalues, '-', label='fwls3')
         plt.plot(x1, res_ols.fittedvalues, '--', label='ols')
         plt.legend()
+
         plt.figure()
         plt.ylim(0, 5)
-        res_e2 = OLS(noise ** 2, z).fit()
-        plt.plot(noise ** 2, 'bo', alpha=0.5, label='dgp error**2')
+        res_e2 = OLS(noise**2, z).fit()
+        plt.plot(noise**2, 'bo', alpha=0.5, label='dgp error**2')
         plt.plot(res_e2.fittedvalues, lw=2, label='ols for noise**2')
+        #plt.plot(res3.model.weights, label='GLSHet weights')
         plt.plot(error_var_3, lw=2, label='GLSHet error var')
-        plt.plot(res3.resid ** 2, 'ro', alpha=0.5, label='resid squared')
-        plt.plot(sig ** 2 * winv, lw=2, label='DGP error var')
+        plt.plot(res3.resid**2, 'ro', alpha=0.5, label='resid squared')
+        #plt.plot(weights_dgp, label='DGP weights')
+        plt.plot(sig**2 * winv, lw=2, label='DGP error var')
         plt.legend()
+
+
         plt.show()
-    """Note these are close but maybe biased because of skewed distribution
+
+    '''Note these are close but maybe biased because of skewed distribution
     >>> res3.mse_resid/res3.model.weights[-10:]
     array([ 1.03115871,  1.03268209,  1.03420547,  1.03572885,  1.03725223,
             1.03877561,  1.04029899,  1.04182237,  1.04334575,  1.04486913])
@@ -126,4 +170,4 @@ if 'ex1' in examples:
     >>> sig**2 * w[-10:]
     array([ 0.98647295,  0.98797595,  0.98947896,  0.99098196,  0.99248497,
             0.99398798,  0.99549098,  0.99699399,  0.99849699,  1.        ])
-        """
+        '''
diff --git a/statsmodels/examples/ex_generic_mle.py b/statsmodels/examples/ex_generic_mle.py
index 2257abf7e..1979e8d7f 100644
--- a/statsmodels/examples/ex_generic_mle.py
+++ b/statsmodels/examples/ex_generic_mle.py
@@ -1,97 +1,148 @@
+
 from functools import partial
+
 import numpy as np
 from scipy import stats
+
 import statsmodels.api as sm
 from statsmodels.base.model import GenericLikelihoodModel
 from statsmodels.tools.numdiff import approx_fprime, approx_hess
+
 data = sm.datasets.spector.load()
 data.exog = sm.add_constant(data.exog, prepend=False)
+# in this dir
+
 probit_mod = sm.Probit(data.endog, data.exog)
 probit_res = probit_mod.fit()
 loglike = probit_mod.loglike
 score = probit_mod.score
-mod = GenericLikelihoodModel(data.endog, data.exog * 2, loglike, score)
-res = mod.fit(method='nm', maxiter=500)
-
+mod = GenericLikelihoodModel(data.endog, data.exog*2, loglike, score)
+res = mod.fit(method="nm", maxiter = 500)

 def probitloglike(params, endog, exog):
     """
     Log likelihood for the probit
     """
-    pass
+    q = 2*endog - 1
+    X = exog
+    return np.add.reduce(stats.norm.logcdf(q*np.dot(X,params)))


 model_loglike = partial(probitloglike, endog=data.endog, exog=data.exog)
 mod = GenericLikelihoodModel(data.endog, data.exog, loglike=model_loglike)
-res = mod.fit(method='nm', maxiter=500)
+res = mod.fit(method="nm", maxiter=500)
 print(res)
-np.allclose(res.params, probit_res.params, rtol=0.0001)
+
+
+np.allclose(res.params, probit_res.params, rtol=1e-4)
 print(res.params, probit_res.params)
+
+#datal = sm.datasets.longley.load()
 datal = sm.datasets.ccard.load()
 datal.exog = sm.add_constant(datal.exog, prepend=False)
+# Instance of GenericLikelihood model does not work directly, because loglike
+# cannot get access to data in self.endog, self.exog
+
 nobs = 5000
-rvs = np.random.randn(nobs, 6)
-datal.exog = rvs[:, :-1]
+rvs = np.random.randn(nobs,6)
+datal.exog = rvs[:,:-1]
 datal.exog = sm.add_constant(datal.exog, prepend=False)
 datal.endog = 1 + rvs.sum(1)
+
 show_error = False
-show_error2 = 1
+show_error2 = 1#False
 if show_error:
+    def loglike_norm_xb(self, params):
+        beta = params[:-1]
+        sigma = params[-1]
+        xb = np.dot(self.exog, beta)
+        return stats.norm.logpdf(self.endog, loc=xb, scale=sigma)
+
     mod_norm = GenericLikelihoodModel(datal.endog, datal.exog, loglike_norm_xb)
-    res_norm = mod_norm.fit(method='nm', maxiter=500)
+    res_norm = mod_norm.fit(method="nm", maxiter = 500)
+
     print(res_norm.params)
+
 if show_error2:
-    model_loglike3 = partial(loglike_norm_xb, endog=datal.endog, exog=datal
-        .exog)
+    def loglike_norm_xb(params, endog, exog):
+        beta = params[:-1]
+        sigma = params[-1]
+        #print exog.shape, beta.shape
+        xb = np.dot(exog, beta)
+        #print xb.shape, stats.norm.logpdf(endog, loc=xb, scale=sigma).shape
+        return stats.norm.logpdf(endog, loc=xb, scale=sigma).sum()
+
+    model_loglike3 = partial(loglike_norm_xb,
+                             endog=datal.endog, exog=datal.exog)
     mod_norm = GenericLikelihoodModel(datal.endog, datal.exog, model_loglike3)
-    res_norm = mod_norm.fit(start_params=np.ones(datal.exog.shape[1] + 1),
-        method='nm', maxiter=5000)
-    print(res_norm.params)
+    res_norm = mod_norm.fit(start_params=np.ones(datal.exog.shape[1]+1),
+                            method="nm", maxiter = 5000)

+    print(res_norm.params)

 class MygMLE(GenericLikelihoodModel):
-    pass
+    # just for testing
+    def loglike(self, params):
+        beta = params[:-1]
+        sigma = params[-1]
+        xb = np.dot(self.exog, beta)
+        return stats.norm.logpdf(self.endog, loc=xb, scale=sigma).sum()

+    def loglikeobs(self, params):
+        beta = params[:-1]
+        sigma = params[-1]
+        xb = np.dot(self.exog, beta)
+        return stats.norm.logpdf(self.endog, loc=xb, scale=sigma)

 mod_norm2 = MygMLE(datal.endog, datal.exog)
-res_norm2 = mod_norm2.fit(start_params=[1.0] * datal.exog.shape[1] + [1],
-    method='nm', maxiter=500)
+#res_norm = mod_norm.fit(start_params=np.ones(datal.exog.shape[1]+1), method="nm", maxiter = 500)
+res_norm2 = mod_norm2.fit(start_params=[1.]*datal.exog.shape[1]+[1], method="nm", maxiter = 500)
 np.allclose(res_norm.params, res_norm2.params)
 print(res_norm2.params)
+
 res2 = sm.OLS(datal.endog, datal.exog).fit()
 start_params = np.hstack((res2.params, np.sqrt(res2.mse_resid)))
-res_norm3 = mod_norm2.fit(start_params=start_params, method='nm', maxiter=
-    500, retall=0)
+res_norm3 = mod_norm2.fit(start_params=start_params, method="nm", maxiter = 500,
+                          retall=0)
 print(start_params)
 print(res_norm3.params)
 print(res2.bse)
 print(res_norm3.bse)
 print('llf', res2.llf, res_norm3.llf)
-bse = np.sqrt(np.diag(np.linalg.inv(res_norm3.model.hessian(res_norm3.params)))
-    )
+
+bse = np.sqrt(np.diag(np.linalg.inv(res_norm3.model.hessian(res_norm3.params))))
 res_norm3.model.score(res_norm3.params)
-res_bfgs = mod_norm2.fit(start_params=start_params, method='bfgs', fprime=
-    None, maxiter=500, retall=0)
-hb = -approx_hess(res_norm3.params, mod_norm2.loglike, epsilon=-0.0001)
-hf = -approx_hess(res_norm3.params, mod_norm2.loglike, epsilon=0.0001)
-hh = (hf + hb) / 2.0
+
+#fprime in fit option cannot be overwritten, set to None, when score is defined
+# exception is fixed, but I do not think score was supposed to be called
+
+res_bfgs = mod_norm2.fit(start_params=start_params, method="bfgs", fprime=None,
+                         maxiter=500, retall=0)
+
+hb=-approx_hess(res_norm3.params, mod_norm2.loglike, epsilon=-1e-4)
+hf=-approx_hess(res_norm3.params, mod_norm2.loglike, epsilon=1e-4)
+hh = (hf+hb)/2.
 print(np.linalg.eigh(hh))
-grad = -approx_fprime(res_norm3.params, mod_norm2.loglike, epsilon=-0.0001)
+
+grad = -approx_fprime(res_norm3.params, mod_norm2.loglike, epsilon=-1e-4)
 print(grad)
-gradb = -approx_fprime(res_norm3.params, mod_norm2.loglike, epsilon=-0.0001)
-gradf = -approx_fprime(res_norm3.params, mod_norm2.loglike, epsilon=0.0001)
-print((gradb + gradf) / 2.0)
+gradb = -approx_fprime(res_norm3.params, mod_norm2.loglike, epsilon=-1e-4)
+gradf = -approx_fprime(res_norm3.params, mod_norm2.loglike, epsilon=1e-4)
+print((gradb+gradf)/2.)
+
 print(res_norm3.model.score(res_norm3.params))
 print(res_norm3.model.score(start_params))
-mod_norm2.loglike(start_params / 2.0)
-print(np.linalg.inv(-1 * mod_norm2.hessian(res_norm3.params)))
+mod_norm2.loglike(start_params/2.)
+print(np.linalg.inv(-1*mod_norm2.hessian(res_norm3.params)))
 print(np.sqrt(np.diag(res_bfgs.cov_params())))
 print(res_norm3.bse)
-print('MLE - OLS parameter estimates')
+
+print("MLE - OLS parameter estimates")
 print(res_norm3.params[:-1] - res2.params)
-print('bse diff in percent')
-print(res_norm3.bse[:-1] / res2.bse * 100.0 - 100)
-"""
+print("bse diff in percent")
+print((res_norm3.bse[:-1] / res2.bse)*100. - 100)
+
+'''
 Optimization terminated successfully.
          Current function value: 12.818804
          Iterations 6
@@ -296,11 +347,14 @@ array([   5.51471653,   80.36595035,    7.46933695,   82.92232357,
    22.91695494]

 Is scale a misnomer, actually scale squared, i.e. variance of error term ?
-"""
+'''
+
 print(res_norm3.model.score_obs(res_norm3.params).shape)
+
 jac = res_norm3.model.score_obs(res_norm3.params)
-print(np.sqrt(np.diag(np.dot(jac.T, jac))) / start_params)
+print(np.sqrt(np.diag(np.dot(jac.T, jac)))/start_params)
 jac2 = res_norm3.model.score_obs(res_norm3.params, centered=True)
+
 print(np.sqrt(np.diag(np.linalg.inv(np.dot(jac.T, jac)))))
 print(res_norm3.bse)
 print(res2.bse)
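
Editor's note: a self-contained sketch (not part of the patch) of the subclassing pattern this script exercises: define loglike on GenericLikelihoodModel and let numerical differentiation supply scores and standard errors. Class name, seed and data are illustrative.

    import numpy as np
    from scipy import stats
    import statsmodels.api as sm
    from statsmodels.base.model import GenericLikelihoodModel

    class NormalXb(GenericLikelihoodModel):
        # Linear model with normal errors; params = (beta, sigma).
        def loglike(self, params):
            beta, sigma = params[:-1], params[-1]
            xb = np.dot(self.exog, beta)
            return stats.norm.logpdf(self.endog, loc=xb, scale=sigma).sum()

    np.random.seed(4)
    X = sm.add_constant(np.random.standard_normal((200, 2)))
    y = X @ np.array([1.0, 0.5, -0.5]) + np.random.standard_normal(200)

    ols = sm.OLS(y, X).fit()
    start = np.r_[ols.params, np.sqrt(ols.mse_resid)]
    res = NormalXb(y, X).fit(start_params=start, method="nm", maxiter=500, disp=0)
    print(res.params[:-1] - ols.params)   # the MLE betas should be close to OLS
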
diff --git a/statsmodels/examples/ex_generic_mle_t.py b/statsmodels/examples/ex_generic_mle_t.py
index 1042095dc..443e95008 100644
--- a/statsmodels/examples/ex_generic_mle_t.py
+++ b/statsmodels/examples/ex_generic_mle_t.py
@@ -1,20 +1,34 @@
+# -*- coding: utf-8 -*-
 """
 Created on Wed Jul 28 08:28:04 2010

 Author: josef-pktd
 """
+
+
 import numpy as np
+
 from scipy import special
 import statsmodels.api as sm
 from statsmodels.base.model import GenericLikelihoodModel
 from statsmodels.tools.numdiff import approx_hess
+
+#redefine some shortcuts
 np_log = np.log
 np_pi = np.pi
 sps_gamln = special.gammaln


+def maxabs(arr1, arr2):
+    return np.max(np.abs(arr1 - arr2))
+
+def maxabsrel(arr1, arr2):
+    return np.max(np.abs(arr2 / arr1 - 1))
+
+
+
 class MyT(GenericLikelihoodModel):
-    """Maximum Likelihood Estimation of Poisson Model
+    '''Maximum Likelihood Estimation of Poisson Model

     This is an example for generic MLE which has the same
     statistical model as discretemod.Poisson.
@@ -24,8 +38,12 @@ class MyT(GenericLikelihoodModel):
     and all resulting statistics are based on numerical
     differentiation.

-    """
+    '''

+    def loglike(self, params):
+        return -self.nloglikeobs(params).sum(0)
+
+    # copied from discretemod.Poisson
     def nloglikeobs(self, params):
         """
         Loglikelihood of Poisson model
@@ -43,26 +61,44 @@ class MyT(GenericLikelihoodModel):
         -----
         .. math:: \\ln L=\\sum_{i=1}^{n}\\left[-\\lambda_{i}+y_{i}x_{i}^{\\prime}\\beta-\\ln y_{i}!\\right]
         """
-        pass
+        #print len(params),
+        beta = params[:-2]
+        df = params[-2]
+        scale = params[-1]
+        loc = np.dot(self.exog, beta)
+        endog = self.endog
+        x = (endog - loc)/scale
+        #next part is stats.t._logpdf
+        lPx = sps_gamln((df+1)/2) - sps_gamln(df/2.)
+        lPx -= 0.5*np_log(df*np_pi) + (df+1)/2.*np_log(1+(x**2)/df)
+        lPx -= np_log(scale)  # correction for scale
+        return -lPx


+#Example:
 np.random.seed(98765678)
 nobs = 1000
-rvs = np.random.randn(nobs, 5)
+rvs = np.random.randn(nobs,5)
 data_exog = sm.add_constant(rvs, prepend=False)
-xbeta = 0.9 + 0.1 * rvs.sum(1)
-data_endog = xbeta + 0.1 * np.random.standard_t(5, size=nobs)
+xbeta = 0.9 + 0.1*rvs.sum(1)
+data_endog = xbeta + 0.1*np.random.standard_t(5, size=nobs)
+#print data_endog
+
 modp = MyT(data_endog, data_exog)
-modp.start_value = np.ones(data_exog.shape[1] + 2)
+modp.start_value = np.ones(data_exog.shape[1]+2)
 modp.start_value[-2] = 10
 modp.start_params = modp.start_value
-resp = modp.fit(start_params=modp.start_value)
+resp = modp.fit(start_params = modp.start_value)
 print(resp.params)
 print(resp.bse)
-hb = -approx_hess(modp.start_value, modp.loglike, epsilon=-0.0001)
+
+
+hb=-approx_hess(modp.start_value, modp.loglike, epsilon=-1e-4)
 tmp = modp.loglike(modp.start_value)
 print(tmp.shape)
-"""
+
+
+'''
 >>> tmp = modp.loglike(modp.start_value)
 8
 >>> tmp.shape
@@ -85,8 +121,9 @@ print(tmp.shape)
 >>> xbeta.shape
 (100,)
 >>>
-"""
-"""
+'''
+
+'''
 repr(start_params) array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
 Optimization terminated successfully.
          Current function value: 91.897859
@@ -220,4 +257,4 @@ array([  1.58253308e-01,   1.73188603e-01,   1.77357447e-01,
          2.06707494e-02,  -1.31174789e-01,   8.79915580e-01,
          6.47663840e+03,   6.73457641e+02])
 >>>
-"""
+'''
diff --git a/statsmodels/examples/ex_generic_mle_tdist.py b/statsmodels/examples/ex_generic_mle_tdist.py
index d850037a2..f578111b8 100644
--- a/statsmodels/examples/ex_generic_mle_tdist.py
+++ b/statsmodels/examples/ex_generic_mle_tdist.py
@@ -1,22 +1,40 @@
+# -*- coding: utf-8 -*-
 """
 Created on Wed Jul 28 08:28:04 2010

 Author: josef-pktd
 """
 import numpy as np
+
 from scipy import stats, special, optimize
 import statsmodels.api as sm
 from statsmodels.base.model import GenericLikelihoodModel
 from statsmodels.tools.numdiff import approx_hess
-import statsmodels.sandbox.distributions.sppatch
+
+#import for kstest based estimation
+#should be replace
+# FIXME: importing these patches scipy distribution classes in-place.
+#  Do not do this.
+import statsmodels.sandbox.distributions.sppatch  # noqa:F401
+
+
+#redefine some shortcuts
 np_log = np.log
 np_pi = np.pi
 sps_gamln = special.gammaln
-store_params = []


+def maxabs(arr1, arr2):
+    return np.max(np.abs(arr1 - arr2))
+
+def maxabsrel(arr1, arr2):
+    return np.max(np.abs(arr2 / arr1 - 1))
+
+#global
+store_params = []
+
 class MyT(GenericLikelihoodModel):
-    """Maximum Likelihood Estimation of Linear Model with t-distributed errors
+    '''Maximum Likelihood Estimation of Linear Model with t-distributed errors

     This is an example for generic MLE which has the same
     statistical model as discretemod.Poisson.
@@ -26,8 +44,13 @@ class MyT(GenericLikelihoodModel):
     and all resulting statistics are based on numerical
     differentiation.

-    """
+    '''
+

+    def loglike(self, params):
+        return -self.nloglikeobs(params).sum(0)
+
+    # copied from discretemod.Poisson
     def nloglikeobs(self, params):
         """
         Loglikelihood of Poisson model
@@ -45,34 +68,59 @@ class MyT(GenericLikelihoodModel):
         -----
         .. math:: \\ln L=\\sum_{i=1}^{n}\\left[-\\lambda_{i}+y_{i}x_{i}^{\\prime}\\beta-\\ln y_{i}!\\right]
         """
-        pass
-
-
+        #print len(params),
+        store_params.append(params)
+        if self.fixed_params is not None:
+            #print 'using fixed'
+            params = self.expandparams(params)
+
+        beta = params[:-2]
+        df = params[-2]
+        scale = params[-1]
+        loc = np.dot(self.exog, beta)
+        endog = self.endog
+        x = (endog - loc)/scale
+        #next part is stats.t._logpdf
+        lPx = sps_gamln((df+1)/2) - sps_gamln(df/2.)
+        lPx -= 0.5*np_log(df*np_pi) + (df+1)/2.*np_log(1+(x**2)/df)
+        lPx -= np_log(scale)  # correction for scale
+        return -lPx
+
+
+#Example:
 np.random.seed(98765678)
 nobs = 1000
 nvars = 6
 df = 5
-rvs = np.random.randn(nobs, nvars - 1)
+rvs = np.random.randn(nobs, nvars-1)
 data_exog = sm.add_constant(rvs, prepend=False)
-xbeta = 0.9 + 0.1 * rvs.sum(1)
-data_endog = xbeta + 0.1 * np.random.standard_t(df, size=nobs)
+xbeta = 0.9 + 0.1*rvs.sum(1)
+data_endog = xbeta + 0.1*np.random.standard_t(df, size=nobs)
 print(data_endog.var())
+
 res_ols = sm.OLS(data_endog, data_exog).fit()
 print(res_ols.scale)
 print(np.sqrt(res_ols.scale))
 print(res_ols.params)
 kurt = stats.kurtosis(res_ols.resid)
-df_fromkurt = 6.0 / kurt + 4
+df_fromkurt = 6./kurt + 4
 print(stats.t.stats(df_fromkurt, moments='mvsk'))
 print(stats.t.stats(df, moments='mvsk'))
+
 modp = MyT(data_endog, data_exog)
-start_value = 0.1 * np.ones(data_exog.shape[1] + 2)
+start_value = 0.1*np.ones(data_exog.shape[1]+2)
+#start_value = np.zeros(data_exog.shape[1]+2)
+#start_value[:nvars] = sm.OLS(data_endog, data_exog).fit().params
 start_value[:nvars] = res_ols.params
-start_value[-2] = df_fromkurt
-start_value[-1] = np.sqrt(res_ols.scale)
+start_value[-2] = df_fromkurt #10
+start_value[-1] = np.sqrt(res_ols.scale) #0.5
 modp.start_params = start_value
+
+#adding fixed parameters
+
 fixdf = np.nan * np.zeros(modp.start_params.shape)
 fixdf[-2] = 100
+
 fixone = 0
 if fixone:
     modp.fixed_params = fixdf
@@ -81,63 +129,162 @@ if fixone:
 else:
     modp.fixed_params = None
     modp.fixed_paramsmask = None
-resp = modp.fit(start_params=modp.start_params, disp=1, method='nm')
-print("""
-estimation results t-dist""")
+
+
+resp = modp.fit(start_params = modp.start_params, disp=1, method='nm')#'newton')
+#resp = modp.fit(start_params = modp.start_params, disp=1, method='newton')
+print('\nestimation results t-dist')
 print(resp.params)
 print(resp.bse)
-resp2 = modp.fit(start_params=resp.params, method='Newton')
+resp2 = modp.fit(start_params = resp.params, method='Newton')
 print('using Newton')
 print(resp2.params)
 print(resp2.bse)
-hb = -approx_hess(modp.start_params, modp.loglike, epsilon=-0.0001)
+
+
+hb=-approx_hess(modp.start_params, modp.loglike, epsilon=-1e-4)
 tmp = modp.loglike(modp.start_params)
 print(tmp.shape)
-pp = np.array(store_params)
+#np.linalg.eigh(np.linalg.inv(hb))[0]
+
+pp=np.array(store_params)
 print(pp.min(0))
 print(pp.max(0))


+
+
+##################### Example: Pareto
+# estimating scale does not work yet, a bug somewhere ?
+# fit_ks works well, but no bse or other result statistics yet
+
+
 class MyPareto(GenericLikelihoodModel):
-    """Maximum Likelihood Estimation pareto distribution
+    '''Maximum Likelihood Estimation pareto distribution

     first version: iid case, with constant parameters
-    """
+    '''
+
+    #copied from stats.distribution
+    def pdf(self, x, b):
+        return b * x**(-b-1)
+
+    def loglike(self, params):
+        return -self.nloglikeobs(params).sum(0)
+
+    def nloglikeobs(self, params):
+        #print params.shape
+        if self.fixed_params is not None:
+            #print 'using fixed'
+            params = self.expandparams(params)
+        b = params[0]
+        loc = params[1]
+        scale = params[2]
+        #loc = np.dot(self.exog, beta)
+        endog = self.endog
+        x = (endog - loc)/scale
+        logpdf = np_log(b) - (b+1.)*np_log(x)  #use np_log(1 + x) for Pareto II
+        logpdf -= np.log(scale)
+        #lb = loc + scale
+        #logpdf[endog<lb] = -inf
+        #import pdb; pdb.set_trace()
+        logpdf[x<1] = -10000 #-np.inf
+        return -logpdf

     def fit_ks(self):
-        """fit Pareto with nested optimization
+        '''fit Pareto with nested optimization

         originally published on stackoverflow
         this does not trim lower values during ks optimization

-        """
-        pass
+        '''
+        rvs = self.endog
+        rvsmin = rvs.min()
+        fixdf = np.nan * np.ones(3)
+        self.fixed_params = fixdf
+        self.fixed_paramsmask = np.isnan(fixdf)
+
+        def pareto_ks(loc, rvs):
+            #start_scale = rvs.min() - loc # not used yet
+            #est = self.fit_fr(rvs, 1., frozen=[np.nan, loc, np.nan])
+            self.fixed_params[1] = loc
+            est = self.fit(start_params=self.start_params[self.fixed_paramsmask]).params
+            #est = self.fit(start_params=self.start_params, method='nm').params
+            args = (est[0], loc, est[1])
+            return stats.kstest(rvs,'pareto',args)[0]
+
+        locest = optimize.fmin(pareto_ks, rvsmin - 1.5, (rvs,))
+        est = stats.pareto.fit_fr(rvs, 0., frozen=[np.nan, locest, np.nan])
+        args = (est[0], locest[0], est[1])
+        return args
+

     def fit_ks1_trim(self):
-        """fit Pareto with nested optimization
+        '''fit Pareto with nested optimization

         originally published on stackoverflow

-        """
-        pass
+        '''
+        self.nobs = self.endog.shape[0]
+        rvs = np.sort(self.endog)
+        rvsmin = rvs.min()
+
+        def pareto_ks(loc, rvs):
+            #start_scale = rvs.min() - loc # not used yet
+            est = stats.pareto.fit_fr(rvs, frozen=[np.nan, loc, np.nan])
+            args = (est[0], loc, est[1])
+            return stats.kstest(rvs,'pareto',args)[0]
+
+        #locest = optimize.fmin(pareto_ks, rvsmin*0.7, (rvs,))
+        maxind = min(np.floor(self.nobs*0.95).astype(int), self.nobs-10)
+        res = []
+        for trimidx in range(self.nobs//2, maxind):
+            xmin = loc = rvs[trimidx]
+            res.append([trimidx, pareto_ks(loc-1e-10, rvs[trimidx:])])
+        res = np.array(res)
+        bestidx = res[np.argmin(res[:,1]),0].astype(int)
+        print(bestidx)
+        locest = rvs[bestidx]
+
+        est = stats.pareto.fit_fr(rvs[bestidx:], 1., frozen=[np.nan, locest, np.nan])
+        args = (est[0], locest, est[1])
+        return args

     def fit_ks1(self):
-        """fit Pareto with nested optimization
+        '''fit Pareto with nested optimization

         originally published on stackoverflow

-        """
-        pass
+        '''
+        rvs = self.endog
+        rvsmin = rvs.min()
+
+        def pareto_ks(loc, rvs):
+            #start_scale = rvs.min() - loc # not used yet
+            est = stats.pareto.fit_fr(rvs, 1., frozen=[np.nan, loc, np.nan])
+            args = (est[0], loc, est[1])
+            return stats.kstest(rvs,'pareto',args)[0]

+        #locest = optimize.fmin(pareto_ks, rvsmin*0.7, (rvs,))
+        locest = optimize.fmin(pareto_ks, rvsmin - 1.5, (rvs,))
+        est = stats.pareto.fit_fr(rvs, 1., frozen=[np.nan, locest, np.nan])
+        args = (est[0], locest[0], est[1])
+        return args

+#y = stats.pareto.rvs(1, loc=10, scale=2, size=nobs)
 y = stats.pareto.rvs(1, loc=0, scale=2, size=nobs)
-par_start_params = np.array([1.0, 9.0, 2.0])
+
+
+par_start_params = np.array([1., 9., 2.])
+
 mod_par = MyPareto(y)
-mod_par.start_params = np.array([1.0, 10.0, 2.0])
-mod_par.start_params = np.array([1.0, -9.0, 2.0])
+mod_par.start_params = np.array([1., 10., 2.])
+mod_par.start_params = np.array([1., -9., 2.])
 mod_par.fixed_params = None
+
 fixdf = np.nan * np.ones(mod_par.start_params.shape)
 fixdf[1] = 9.9
+#fixdf[2] = 2.
 fixone = 0
 if fixone:
     mod_par.fixed_params = fixdf
@@ -152,24 +299,38 @@ else:
     mod_par.df_model = 3
     mod_par.df_resid = mod_par.endog.shape[0] - mod_par.df_model
     mod_par.data.xnames = ['shape', 'loc', 'scale']
-res_par = mod_par.fit(start_params=mod_par.start_params, method='nm',
-    maxfun=10000, maxiter=5000)
+
+res_par = mod_par.fit(start_params=mod_par.start_params, method='nm', maxfun=10000, maxiter=5000)
+#res_par2 = mod_par.fit(start_params=res_par.params, method='newton', maxfun=10000, maxiter=5000)
+
 res_parks = mod_par.fit_ks1()
+
 print(res_par.params)
+#print res_par2.params
 print(res_parks)
+
 print(res_par.params[1:].sum(), sum(res_parks[1:]), mod_par.endog.min())
+
+#start new model, so we do not get two result instances with the same model instance
 mod_par = MyPareto(y)
 mod_par.fixed_params = fixdf
 mod_par.fixed_paramsmask = np.isnan(fixdf)
 mod_par.df_model = mod_par.fixed_paramsmask.sum()
 mod_par.df_resid = mod_par.endog.shape[0] - mod_par.df_model
-mod_par.data.xnames = [name for name, incl in zip(['shape', 'loc', 'scale'],
-    mod_par.fixed_paramsmask) if incl]
+#mod_par.data.xnames = np.array(['shape', 'loc', 'scale'])[mod_par.fixed_paramsmask].tolist() # works also
+mod_par.data.xnames = [name for (name, incl) in zip(['shape', 'loc', 'scale'], mod_par.fixed_paramsmask) if incl]
+
 res_par3 = mod_par.start_params = par_start_params[mod_par.fixed_paramsmask]
 res5 = mod_par.fit(start_params=mod_par.start_params)
+##res_parks2 = mod_par.fit_ks()
+##
+##res_parkst = mod_par.fit_ks1_trim()
+##print res_parkst
+
 print(res5.summary())
-print(res5.t_test([[1, 0]]))
-"""
+print(res5.t_test([[1,0]]))
+
+'''
 0.0686702747648
 0.0164150896481
 0.128121386381
@@ -241,14 +402,111 @@ array([ 1.,  2.])
 >>> mod_par.loglikeobs(np.array([1., 10., 2.]))[0]
 -0.087533156771285828
 >>>
-"""
-"""
+'''
+
+'''
 >>> mod_par.nloglikeobs(np.array([1., 10., 2.]))[0]
 0.86821349410251691
 >>> np.log(stats.pareto.pdf(y,1.,10.,2.)).sum()
 -2627.9403758026938
-"""
-"""
+'''
+
+
+#'''
+#0.0686702747648
+#0.0164150896481
+#0.128121386381
+#[ 0.10370428  0.09921315  0.09676723  0.10457413  0.10201618  0.89964496]
+#(array(0.0), array(1.4552599885729827), array(0.0), array(2.5072143354058203))
+#(array(0.0), array(1.6666666666666667), array(0.0), array(6.0))
+#repr(start_params) array([ 0.10370428,  0.09921315,  0.09676723,  0.10457413,  0.10201618,
+#        0.89964496,  6.39309417,  0.12812139])
+#Optimization terminated successfully.
+#         Current function value: -679.951339
+#         Iterations: 398
+#         Function evaluations: 609
+#
+#estimation results t-dist
+#[ 0.10400826  0.10111893  0.09725133  0.10507788  0.10086163  0.8996041
+#  4.72131318  0.09825355]
+#[ 0.00365493  0.00356149  0.00349329  0.00362333  0.003732    0.00362716
+#  0.72325227  0.00388822]
+#repr(start_params) array([ 0.10400826,  0.10111893,  0.09725133,  0.10507788,  0.10086163,
+#        0.8996041 ,  4.72131318,  0.09825355])
+#Optimization terminated successfully.
+#         Current function value: -679.950443
+#         Iterations 3
+#using Newton
+#[ 0.10395383  0.10106762  0.09720665  0.10503384  0.10080599  0.89954546
+#  4.70918964  0.09815885]
+#[ 0.00365299  0.00355968  0.00349147  0.00362166  0.00373015  0.00362533
+#  0.72014669  0.00388436]
+#()
+#[ 0.09992709  0.09786601  0.09387356  0.10229919  0.09756623  0.85466272
+#  4.60459182  0.09661986]
+#[ 0.11308292  0.10828401  0.1028508   0.11268895  0.10934726  0.94462721
+#  7.15412655  0.13452746]
+#repr(start_params) array([ 1.,  2.])
+#Warning: Maximum number of function evaluations has been exceeded.
+#repr(start_params) array([  3.06504406e+302,   3.29325579e+303])
+#
+#>>> mod_par.fixed_params
+#array([ NaN,  10.,  NaN])
+#>>> mod_par.start_params
+#array([ 1.,  2.])
+#
+#
+#>>> stats.pareto.fit_fr(y, 1., frozen=[np.nan, 10., np.nan])
+#array([ 1.0346268 ,  2.00184808])
+#
+#>>> stats.pareto.fit_fr(y, frozen=[np.nan, 10., np.nan])
+#array([ 1.03463526,  2.00184809])
+#>>> stats.pareto.pdf(y, 1.03463526, 10, 2.00184809).sum()
+#173.33947284555239
+#
+#>>> mod_par.loglike((1.03463526, 10, 2.00184809))
+#-962.21623668859741
+#>>> np.log(stats.pareto.pdf(y, 1.03463526, 10, 2.00184809)).sum()
+#-inf
+#>>> np.log(stats.pareto.pdf(y, 1.03463526, 9, 2.00184809)).sum()
+#-3074.5947476137271
+#>>> np.log(stats.pareto.pdf(y, 1.03463526, 10., 2.00184809)).sum()
+#-inf
+#>>> np.log(stats.pareto.pdf(y, 1.03463526, 9.9, 2.00184809)).sum()
+#-2677.3867091635661
+#>>> y.min()
+#12.001848089426717
+#>>> np.log(stats.pareto.pdf(y, 1.03463526, loc=9.9, scale=2.00184809)).sum()
+#-2677.3867091635661
+#>>> np.log(stats.pareto.pdf(y, 1.03463526, loc=10., scale=2.00184809)).sum()
+#-inf
+#>>> stats.pareto.logpdf(y, 1.03463526, loc=10., scale=2.00184809).sum()
+#-inf
+#>>> stats.pareto.logpdf(y, 1.03463526, loc=9.99, scale=2.00184809).sum()
+#-2631.6120098202355
+#>>> mod_par.loglike((1.03463526, 9.99, 2.00184809))
+#-963.2513896113644
+#>>> maxabs(y, mod_par.endog)
+#0.0
+#
+#>>> stats.pareto.a
+#1.0
+#
+#>>> b, loc, scale = (1.03463526, 9.99, 2.00184809)
+#>>> (1-loc)/scale
+#-4.4908502522786327
+#
+#>>> lb = scale + loc
+#>>> lb
+#11.991848090000001
+#>>> (lb-loc)/scale == 1
+#False
+#>>> (lb-loc)/scale
+#1.0000000000000004
+#>>>
+#'''
+
+'''
 repr(start_params) array([  1.,  10.,   2.])
 Optimization terminated successfully.
          Current function value: 2626.436870
@@ -265,8 +523,9 @@ Optimization terminated successfully.
 >>> y.min()
 12.001848089426717

-"""
-"""
+'''
+
+'''
 0.0686702747648
 0.0164150896481
 0.128121386381
@@ -570,4 +829,4 @@ array([ 1.07716265,  1.18977526,  1.07093   ,  1.05157081,  1.15991232,
         4.96781682]), <a list of 10 Patch objects>)
 >>> plt.show()

-"""
+'''
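
Editor's note: the Pareto section above relies on the monkey-patched stats.pareto.fit_fr from the sandbox sppatch module, which the patch itself flags as something not to do. As a plain alternative (not the author's method), scipy's standard fit can freeze the location with floc; seed and values below are illustrative.

    import numpy as np
    from scipy import stats

    np.random.seed(5)
    y = stats.pareto.rvs(1.0, loc=0.0, scale=2.0, size=1000)

    # Fit shape and scale with the location frozen at 0.
    b, loc, scale = stats.pareto.fit(y, floc=0.0)
    print(b, loc, scale)
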
diff --git a/statsmodels/examples/ex_grangercausality.py b/statsmodels/examples/ex_grangercausality.py
index 57f17163f..aca2d7c5b 100644
--- a/statsmodels/examples/ex_grangercausality.py
+++ b/statsmodels/examples/ex_grangercausality.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """

 Created on Sat Jul 06 15:44:57 2013
@@ -7,26 +8,32 @@ Author: Josef Perktold
 import numpy as np
 from numpy.testing import assert_almost_equal
 from statsmodels.datasets import macrodata
+
 import statsmodels.tsa.stattools as tsa_stats
+
+# some example data
 mdata = macrodata.load_pandas().data
-mdata = mdata[['realgdp', 'realcons']].values
+mdata = mdata[['realgdp','realcons']].values
 data = mdata
 data = np.diff(np.log(data), axis=0)
-r_result = [0.243097, 0.7844328, 195, 2]
-gr = tsa_stats.grangercausalitytests(data[:, 1::-1], 2, verbose=False)
+
+#R: lmtest:grangertest
+r_result = [0.243097, 0.7844328, 195, 2]  #f_test
+gr = tsa_stats.grangercausalitytests(data[:,1::-1], 2, verbose=False)
 assert_almost_equal(r_result, gr[2][0]['ssr_ftest'], decimal=7)
-assert_almost_equal(gr[2][0]['params_ftest'], gr[2][0]['ssr_ftest'], decimal=7)
+assert_almost_equal(gr[2][0]['params_ftest'], gr[2][0]['ssr_ftest'],
+                    decimal=7)
+
 lag = 2
-print("""
-Test Results for %d lags""" % lag)
+print('\nTest Results for %d lags' % lag)
 print()
-print('\n'.join([('%-20s statistic: %f6.4   p-value: %f6.4' % (k, res[0],
-    res[1])) for k, res in gr[lag][0].items()]))
-print("""
- Results for auxiliary restricted regression with two lags""")
+print('\n'.join(['%-20s statistic: %f6.4   p-value: %f6.4' % (k, res[0], res[1])
+                 for k, res in gr[lag][0].items()]))
+
+print('\n Results for auxiliary restricted regression with two lags')
 print()
 print(gr[lag][1][0].summary())
-print("""
- Results for auxiliary unrestricted regression with two lags""")
+
+print('\n Results for auxiliary unrestricted regression with two lags')
 print()
 print(gr[lag][1][1].summary())
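
Editor's note: a small sketch (not part of the patch) of how the grangercausalitytests result is indexed, since the lookup gr[lag][0]['ssr_ftest'] above is easy to misread. Data, seed and names are illustrative.

    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    np.random.seed(6)
    x = np.random.standard_normal(200)
    y = np.r_[0.0, 0.5 * x[:-1]] + np.random.standard_normal(200)

    # The test asks whether the second column Granger-causes the first.
    # Results are keyed by lag; each value holds a dict of tests and the
    # pair of (restricted, unrestricted) auxiliary OLS results.
    gr = grangercausalitytests(np.column_stack((y, x)), 2, verbose=False)
    fstat, pval, df_denom, n_lags = gr[2][0]['ssr_ftest']
    print(fstat, pval)
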
diff --git a/statsmodels/examples/ex_inter_rater.py b/statsmodels/examples/ex_inter_rater.py
index 5a3c06610..63a7f964c 100644
--- a/statsmodels/examples/ex_inter_rater.py
+++ b/statsmodels/examples/ex_inter_rater.py
@@ -1,13 +1,18 @@
+# -*- coding: utf-8 -*-
 """

 Created on Mon Dec 10 08:54:02 2012

 Author: Josef Perktold
 """
+
 import numpy as np
+
 from statsmodels.stats.inter_rater import fleiss_kappa, cohens_kappa
-table0 = np.asarray(
-    """1   0   0   0   0   14  1.000
+
+
+table0 = np.asarray('''\
+1  0   0   0   0   14  1.000
 2  0   2   6   4   2   0.253
 3  0   0   3   5   6   0.308
 4  0   3   9   2   0   0.440
@@ -16,42 +21,64 @@ table0 = np.asarray(
 7  3   2   6   3   0   0.242
 8  2   5   3   2   2   0.176
 9  6   5   2   1   0   0.286
-10     0   2   2   3   7   0.286"""
-    .split(), float).reshape(10, -1)
-Total = np.asarray('20 \t28 \t39 \t21 \t32'.split('\t'), int)
-Pj = np.asarray('0.143 \t0.200 \t0.279 \t0.150 \t0.229'.split('\t'), float)
-kappa_wp = 0.21
+10     0   2   2   3   7   0.286'''.split(), float).reshape(10,-1)
+
+
+Total = np.asarray("20     28  39  21  32".split('\t'), int)
+Pj = np.asarray("0.143     0.200   0.279   0.150   0.229".split('\t'), float)
+kappa_wp = 0.210
 table1 = table0[:, 1:-1]
+
+
 print(fleiss_kappa(table1))
-table4 = np.array([[20, 5], [10, 15]])
-print('res', cohens_kappa(table4), 0.4)
+table4 = np.array([[20,5], [10, 15]])
+print('res', cohens_kappa(table4), 0.4) #wikipedia
+
 table5 = np.array([[45, 15], [25, 15]])
-print('res', cohens_kappa(table5), 0.1304)
+print('res', cohens_kappa(table5), 0.1304) #wikipedia
+
 table6 = np.array([[25, 35], [5, 35]])
-print('res', cohens_kappa(table6), 0.2593)
-print('res', cohens_kappa(table6, weights=np.arange(2)), 0.2593)
-t7 = np.array([[16, 18, 28], [10, 27, 13], [28, 20, 24]])
+print('res', cohens_kappa(table6), 0.2593)  #wikipedia
+print('res', cohens_kappa(table6, weights=np.arange(2)), 0.2593)  #wikipedia
+t7 = np.array([[16, 18, 28],
+               [10, 27, 13],
+               [28, 20, 24]])
 print(cohens_kappa(t7, weights=[0, 1, 2]))
+
 table8 = np.array([[25, 35], [5, 35]])
 print('res', cohens_kappa(table8))
-"""
+
+#SAS example from http://www.john-uebersax.com/stat/saskappa.htm
+'''
    Statistic          Value       ASE     95% Confidence Limits
    ------------------------------------------------------------
    Simple Kappa      0.3333    0.0814       0.1738       0.4929
    Weighted Kappa    0.2895    0.0756       0.1414       0.4376
-"""
-t9 = [[0, 0, 0], [5, 16, 3], [8, 12, 28]]
+'''
+t9 = [[0,  0,  0],
+      [5, 16,  3],
+      [8, 12, 28]]
 res9 = cohens_kappa(t9)
 print('res', res9)
 print('res', cohens_kappa(t9, weights=[0, 1, 2]))
+
+
+#check max kappa, constructed by hand, same marginals
 table6a = np.array([[30, 30], [0, 40]])
 res = cohens_kappa(table6a)
 assert res.kappa == res.kappa_max
+#print np.divide(*cohens_kappa(table6)[:2])
 print(res.kappa / res.kappa_max)
-table10 = [[0, 4, 1], [0, 8, 0], [0, 1, 5]]
+
+
+table10 = [[0, 4, 1],
+           [0, 8, 0],
+           [0, 1, 5]]
 res10 = cohens_kappa(table10)
 print('res10', res10)
-"""SAS result for table10
+
+
+'''SAS result for table10

                   Simple Kappa Coefficient
               --------------------------------
@@ -80,4 +107,4 @@ print('res10', res10)
               Z                         3.2971
               One-sided Pr >  Z         0.0005
               Two-sided Pr > |Z|        0.0010
-"""
+'''
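
Editor's note: a compact sketch (not part of the patch) of the two kappa helpers used above. cohens_kappa takes a square two-rater contingency table and returns a results bunch (kappa, kappa_max, standard errors, ...); fleiss_kappa takes a subjects-by-categories table of rating counts. The tables below are taken from the script above.

    import numpy as np
    from statsmodels.stats.inter_rater import cohens_kappa, fleiss_kappa

    table = np.array([[20, 5], [10, 15]])
    res = cohens_kappa(table)
    print(res.kappa)                  # about 0.4, the Wikipedia example cited above
    print(res.kappa / res.kappa_max)

    counts = np.array([[0, 2, 6, 4, 2],      # each row: one subject,
                       [0, 0, 3, 5, 6],      # counts over 5 categories,
                       [0, 3, 9, 2, 0]])     # 14 raters per subject
    print(fleiss_kappa(counts))
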
diff --git a/statsmodels/examples/ex_kde_confint.py b/statsmodels/examples/ex_kde_confint.py
index 6f221fe00..e2905d18b 100644
--- a/statsmodels/examples/ex_kde_confint.py
+++ b/statsmodels/examples/ex_kde_confint.py
@@ -1,45 +1,68 @@
+# -*- coding: utf-8 -*-
 """

 Created on Mon Dec 16 11:02:59 2013

 Author: Josef Perktold
 """
+
 import numpy as np
 from scipy import stats
 import matplotlib.pyplot as plt
 import statsmodels.nonparametric.api as npar
 from statsmodels.sandbox.nonparametric import kernels
 from statsmodels.distributions.mixture_rvs import mixture_rvs
+
+# example from test_kde.py mixture of two normal distributions
 np.random.seed(12345)
-x = mixture_rvs([0.25, 0.75], size=200, dist=[stats.norm, stats.norm],
-    kwargs=(dict(loc=-1, scale=0.5), dict(loc=1, scale=0.5)))
-x.sort()
+x = mixture_rvs([.25,.75], size=200, dist=[stats.norm, stats.norm],
+                kwargs = (dict(loc=-1, scale=.5),dict(loc=1, scale=.5)))
+
+x.sort() # not needed
+
 kde = npar.KDEUnivariate(x)
 kde.fit('gau')
 ci = kde.kernel.density_confint(kde.density, len(x))
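+# pointwise confidence interval for the density estimate (lower and upper columns)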
+
 fig = plt.figure()
 ax = fig.add_subplot(1, 1, 1)
+
 ax.hist(x, bins=15, density=True, alpha=0.25)
+
 ax.plot(kde.support, kde.density, lw=2, color='red')
-ax.fill_between(kde.support, ci[:, 0], ci[:, 1], color='grey', alpha='0.7')
+ax.fill_between(kde.support, ci[:,0], ci[:,1],
+                    color='grey', alpha=0.7)
 ax.set_title('Kernel Density Gaussian (bw = %4.2f)' % kde.bw)
+
+
+# use all kernels directly
+
 x_grid = np.linspace(np.min(x), np.max(x), 51)
 x_grid = np.linspace(-3, 3, 51)
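+# the second grid overrides the first: evaluation points are fixed to [-3, 3]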
+
 kernel_names = ['Biweight', 'Cosine', 'Epanechnikov', 'Gaussian',
-    'Triangular', 'Triweight']
+                'Triangular', 'Triweight', #'Uniform',
+                ]
+
 fig = plt.figure()
 for ii, kn in enumerate(kernel_names):
-    ax = fig.add_subplot(2, 3, ii + 1)
+    ax = fig.add_subplot(2, 3, ii+1)   # without uniform
+
     ax.hist(x, bins=10, density=True, alpha=0.25)
+
+    # reduce bandwidth for Gaussian and Uniform, which are too large in this example
     if kn in ['Gaussian', 'Uniform']:
-        args = 0.5,
+        args = (0.5,)
     else:
         args = ()
     kernel = getattr(kernels, kn)(*args)
+
     kde_grid = [kernel.density(x, xi) for xi in x_grid]
     confint_grid = kernel.density_confint(kde_grid, len(x))
+
     ax.plot(x_grid, kde_grid, lw=2, color='red', label=kn)
-    ax.fill_between(x_grid, confint_grid[:, 0], confint_grid[:, 1], color=
-        'grey', alpha='0.7')
+    ax.fill_between(x_grid, confint_grid[:,0], confint_grid[:,1],
+                    color='grey', alpha=0.7)
     ax.legend(loc='upper left')
+
 plt.show()
diff --git a/statsmodels/examples/ex_kde_normalreference.py b/statsmodels/examples/ex_kde_normalreference.py
index 9413762dc..fc9376acc 100644
--- a/statsmodels/examples/ex_kde_normalreference.py
+++ b/statsmodels/examples/ex_kde_normalreference.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Author: Padarn Wilson

@@ -5,28 +6,53 @@ Performance of normal reference plug-in estimator vs silverman. Sample is drawn
 from a mixture of gaussians. Distribution has been chosen to be reasonably close
 to normal.
 """
+
 import numpy as np
 from scipy import stats
 import matplotlib.pyplot as plt
 import statsmodels.nonparametric.api as npar
 from statsmodels.distributions.mixture_rvs import mixture_rvs
+
+# example from test_kde.py mixture of two normal distributions
 np.random.seed(12345)
-x = mixture_rvs([0.1, 0.9], size=200, dist=[stats.norm, stats.norm], kwargs
-    =(dict(loc=0, scale=0.5), dict(loc=1, scale=0.5)))
+x = mixture_rvs([.1, .9], size=200, dist=[stats.norm, stats.norm],
+                kwargs=(dict(loc=0, scale=.5), dict(loc=1, scale=.5)))
+
 kde = npar.KDEUnivariate(x)
-kernel_names = ['Gaussian', 'Epanechnikov', 'Biweight', 'Triangular',
-    'Triweight', 'Cosine']
-kernel_switch = ['gau', 'epa', 'tri', 'biw', 'triw', 'cos']
+
+
+kernel_names = ['Gaussian', 'Epanechnikov', 'Biweight',
+                'Triangular', 'Triweight', 'Cosine'
+                ]
+
+# bandwidth codes ordered to match kernel_names above ('biw' = Biweight, 'tri' = Triangular)
+kernel_switch = ['gau', 'epa', 'biw', 'tri',
+                 'triw', 'cos'
+                 ]
+
+
+def true_pdf(x):
+    pdf = 0.1 * stats.norm.pdf(x, loc=0, scale=0.5)
+    pdf += 0.9 * stats.norm.pdf(x, loc=1, scale=0.5)
+    return pdf
+
 fig = plt.figure()
 for ii, kn in enumerate(kernel_switch):
-    ax = fig.add_subplot(2, 3, ii + 1)
+
+    ax = fig.add_subplot(2, 3, ii + 1)   # without uniform
+
     ax.hist(x, bins=20, density=True, alpha=0.25)
+
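+    # fit twice: Silverman rule-of-thumb bandwidth vs normal-reference plug-in bandwidth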
     kde.fit(kernel=kn, bw='silverman', fft=False)
     ax.plot(kde.support, kde.density)
+
     kde.fit(kernel=kn, bw='normal_reference', fft=False)
     ax.plot(kde.support, kde.density)
+
     ax.plot(kde.support, true_pdf(kde.support), color='black', linestyle='--')
+
     ax.set_title(kernel_names[ii])
+
+
 ax.legend(['silverman', 'normal reference', 'true pdf'], loc='lower right')
 ax.set_title('200 points')
 plt.show()
diff --git a/statsmodels/examples/ex_kernel_regression.py b/statsmodels/examples/ex_kernel_regression.py
index b4ba36a16..229318fee 100644
--- a/statsmodels/examples/ex_kernel_regression.py
+++ b/statsmodels/examples/ex_kernel_regression.py
@@ -1,39 +1,63 @@
+# -*- coding: utf-8 -*-
 """

 Created on Wed Jan 02 09:17:40 2013

 Author: Josef Perktold based on test file by George Panterov
 """
+
 import numpy as np
 import numpy.testing as npt
 import matplotlib.pyplot as plt
+
 import statsmodels.nonparametric.api as nparam
-italy_gdp = [8.556, 12.262, 9.587, 8.119, 5.537, 6.796, 8.638, 6.483, 6.212,
-    5.111, 6.001, 7.027, 4.616, 3.922, 4.688, 3.957, 3.159, 3.763, 3.829, 
-    5.242, 6.275, 8.518, 11.542, 9.348, 8.02, 5.527, 6.865, 8.666, 6.672, 
-    6.289, 5.286, 6.271, 7.94, 4.72, 4.357, 4.672, 3.883, 3.065, 3.489, 
-    3.635, 5.443, 6.302, 9.054, 12.485, 9.896, 8.33, 6.161, 7.055, 8.717, 6.95]
-italy_year = [1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 
-    1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1952,
-    1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952,
-    1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952, 1953, 1953, 1953, 1953,
-    1953, 1953, 1953, 1953]
+#import statsmodels.api as sm
+#nparam = sm.nonparametric
+
+
+
+italy_gdp = \
+        [8.556, 12.262, 9.587, 8.119, 5.537, 6.796, 8.638,
+         6.483, 6.212, 5.111, 6.001, 7.027, 4.616, 3.922,
+         4.688, 3.957, 3.159, 3.763, 3.829, 5.242, 6.275,
+         8.518, 11.542, 9.348, 8.02, 5.527, 6.865, 8.666,
+         6.672, 6.289, 5.286, 6.271, 7.94, 4.72, 4.357,
+         4.672, 3.883, 3.065, 3.489, 3.635, 5.443, 6.302,
+         9.054, 12.485, 9.896, 8.33, 6.161, 7.055, 8.717,
+         6.95]
+
+italy_year = \
+        [1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951,
+       1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1952,
+       1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952,
+       1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952, 1952, 1953, 1953,
+       1953, 1953, 1953, 1953, 1953, 1953]
+
 italy_year = np.asarray(italy_year, float)
-model = nparam.KernelReg(endog=[italy_gdp], exog=[italy_year], reg_type=
-    'lc', var_type='o', bw='cv_ls')
+
+model = nparam.KernelReg(endog=[italy_gdp],
+                         exog=[italy_year], reg_type='lc',
+                         var_type='o', bw='cv_ls')
+
 sm_bw = model.bw
 R_bw = 0.1390096
+
 sm_mean, sm_mfx = model.fit()
 sm_mean2 = sm_mean[0:5]
 sm_mfx = sm_mfx[0:5]
 R_mean = 6.190486
+
 sm_R2 = model.r_squared()
 R_R2 = 0.1435323
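+# reference values for the consistency checks below (the R_ prefix marks results obtained in R)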
-npt.assert_allclose(sm_bw, R_bw, atol=0.01)
-npt.assert_allclose(sm_mean2, R_mean, atol=0.01)
-npt.assert_allclose(sm_R2, R_R2, atol=0.01)
+
+npt.assert_allclose(sm_bw, R_bw, atol=1e-2)
+npt.assert_allclose(sm_mean2, R_mean, atol=1e-2)
+npt.assert_allclose(sm_R2, R_R2, atol=1e-2)
+
+
 fig = plt.figure()
-ax = fig.add_subplot(1, 1, 1)
+ax = fig.add_subplot(1,1,1)
 ax.plot(italy_year, italy_gdp, 'o')
 ax.plot(italy_year, sm_mean, '-')
+
 plt.show()
diff --git a/statsmodels/examples/ex_kernel_regression2.py b/statsmodels/examples/ex_kernel_regression2.py
index 56aa80c41..b26e530c1 100644
--- a/statsmodels/examples/ex_kernel_regression2.py
+++ b/statsmodels/examples/ex_kernel_regression2.py
@@ -1,38 +1,55 @@
+# -*- coding: utf-8 -*-
 """

 Created on Wed Jan 02 13:43:44 2013

 Author: Josef Perktold
 """
+
 import numpy as np
 import statsmodels.nonparametric.api as nparam
+
 if __name__ == '__main__':
+
     np.random.seed(500)
     nobs = [250, 1000][0]
     sig_fac = 1
     x = np.random.uniform(-2, 2, size=nobs)
     x.sort()
-    y_true = np.sin(x * 5) / x + 2 * x
-    y = y_true + sig_fac * np.sqrt(np.abs(3 + x)) * np.random.normal(size=nobs)
-    model = nparam.KernelReg(endog=[y], exog=[x], reg_type='lc', var_type=
-        'c', bw='cv_ls', defaults=nparam.EstimatorSettings(efficient=True))
+    y_true = np.sin(x*5)/x + 2*x
+    y = y_true + sig_fac * (np.sqrt(np.abs(3+x))) * np.random.normal(size=nobs)
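+    # heteroskedastic noise: the error standard deviation grows with sqrt(|3 + x|)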
+
+    model = nparam.KernelReg(endog=[y],
+                             exog=[x], reg_type='lc',
+                             var_type='c', bw='cv_ls',
+                             defaults=nparam.EstimatorSettings(efficient=True))
+
     sm_bw = model.bw
+
     sm_mean, sm_mfx = model.fit()
-    model1 = nparam.KernelReg(endog=[y], exog=[x], reg_type='lc', var_type=
-        'c', bw='cv_ls')
+
+    model1 = nparam.KernelReg(endog=[y],
+                             exog=[x], reg_type='lc',
+                             var_type='c', bw='cv_ls')
     mean1, mfx1 = model1.fit()
-    model2 = nparam.KernelReg(endog=[y], exog=[x], reg_type='ll', var_type=
-        'c', bw='cv_ls')
+
+    model2 = nparam.KernelReg(endog=[y],
+                             exog=[x], reg_type='ll',
+                             var_type='c', bw='cv_ls')
+
     mean2, mfx2 = model2.fit()
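+    # model and model1 use local constant ('lc') regression, model2 local linear ('ll');
+    # model additionally uses EstimatorSettings(efficient=True) for the bandwidth search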
+
     print(model.bw)
     print(model1.bw)
     print(model2.bw)
+
     import matplotlib.pyplot as plt
     fig = plt.figure()
-    ax = fig.add_subplot(1, 1, 1)
+    ax = fig.add_subplot(1,1,1)
     ax.plot(x, y, 'o', alpha=0.5)
     ax.plot(x, y_true, lw=2, label='DGP mean')
     ax.plot(x, sm_mean, lw=2, label='kernel mean')
     ax.plot(x, mean2, lw=2, label='kernel mean')
     ax.legend()
+
     plt.show()
diff --git a/statsmodels/examples/ex_kernel_regression3.py b/statsmodels/examples/ex_kernel_regression3.py
index 754841907..ad5fa630f 100644
--- a/statsmodels/examples/ex_kernel_regression3.py
+++ b/statsmodels/examples/ex_kernel_regression3.py
@@ -1,20 +1,24 @@
+# -*- coding: utf-8 -*-
 """script to try out Censored kernel regression

 Created on Wed Jan 02 13:43:44 2013

 Author: Josef Perktold
 """
+
 import numpy as np
 import statsmodels.nonparametric.api as nparam
+
 if __name__ == '__main__':
+
     np.random.seed(500)
     nobs = [250, 1000][0]
     sig_fac = 1
     x = np.random.uniform(-2, 2, size=nobs)
     x.sort()
-    x2 = x ** 2 + 0.02 * np.random.normal(size=nobs)
-    y_true = np.sin(x * 5) / x + 2 * x - 3 * x2
-    y = y_true + sig_fac * np.sqrt(np.abs(3 + x)) * np.random.normal(size=nobs)
+    x2 = x**2 + 0.02 * np.random.normal(size=nobs)
+    y_true = np.sin(x*5)/x + 2*x - 3 * x2
+    y = y_true + sig_fac * (np.sqrt(np.abs(3+x))) * np.random.normal(size=nobs)
     cens_side = ['left', 'right', 'random'][2]
     if cens_side == 'left':
         c_val = 0.5
@@ -25,26 +29,47 @@ if __name__ == '__main__':
     elif cens_side == 'random':
         c_val = 3.5 + 3 * np.random.randn(nobs)
         y_cens = np.minimum(y, c_val)
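+        # random right-censoring: each observation gets its own censoring threshold c_val[i]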
-    model = nparam.KernelCensoredReg(endog=[y_cens], exog=[x, x2], reg_type
-        ='ll', var_type='cc', bw='aic', censor_val=c_val[:, None])
+
+    model = nparam.KernelCensoredReg(endog=[y_cens],
+                                     #exog=[np.column_stack((x, x**2))], reg_type='lc',
+                                     exog=[x, x2], reg_type='ll',
+                                     var_type='cc', bw='aic', #'cv_ls', #[0.23, 434697.22], #'cv_ls',
+                                     censor_val=c_val[:,None]
+                                     #defaults=nparam.EstimatorSettings(efficient=True)
+                                     )
+
     sm_bw = model.bw
+
     sm_mean, sm_mfx = model.fit()
-    model2 = nparam.KernelReg(endog=[y_cens], exog=[x, x2], reg_type='ll',
-        var_type='cc', bw='aic')
+
+#    model1 = nparam.KernelReg(endog=[y],
+#                             exog=[x], reg_type='lc',
+#                             var_type='c', bw='cv_ls')
+#    mean1, mfx1 = model1.fit()
+
+    model2 = nparam.KernelReg(endog=[y_cens],
+                              exog=[x, x2], reg_type='ll',
+                              var_type='cc', bw='aic')#, 'cv_ls'
+
     mean2, mfx2 = model2.fit()
+
     print(model.bw)
+    #print model1.bw
     print(model2.bw)
+
     ix = np.argsort(y_cens)
     ix_rev = np.zeros(nobs, int)
     ix_rev[ix] = np.arange(nobs)
     ix_rev = model.sortix_rev
+
     import matplotlib.pyplot as plt
     fig = plt.figure()
-    ax = fig.add_subplot(1, 1, 1)
+    ax = fig.add_subplot(1,1,1)
     ax.plot(x, y, 'o', alpha=0.5)
     ax.plot(x, y_cens, 'o', alpha=0.5)
     ax.plot(x, y_true, lw=2, label='DGP mean')
     ax.plot(x, sm_mean[ix_rev], lw=2, label='model 0 mean')
     ax.plot(x, mean2, lw=2, label='model 2 mean')
     ax.legend()
+
     plt.show()
diff --git a/statsmodels/examples/ex_kernel_regression_censored2.py b/statsmodels/examples/ex_kernel_regression_censored2.py
index 29a647c7e..6a9540eb5 100644
--- a/statsmodels/examples/ex_kernel_regression_censored2.py
+++ b/statsmodels/examples/ex_kernel_regression_censored2.py
@@ -1,27 +1,37 @@
+# -*- coding: utf-8 -*-
 """script to check KernelCensoredReg based on test file

 Created on Thu Jan 03 20:20:47 2013

 Author: Josef Perktold
 """
+
 import numpy as np
 import statsmodels.nonparametric.api as nparam
+
 if __name__ == '__main__':
+    #example from test file
     nobs = 200
     np.random.seed(1234)
-    C1 = np.random.normal(size=(nobs,))
-    C2 = np.random.normal(2, 1, size=(nobs,))
-    noise = 0.1 * np.random.normal(size=(nobs,))
-    y = 0.3 + 1.2 * C1 - 0.9 * C2 + noise
-    y[y > 0] = 0
-    model = nparam.KernelCensoredReg(endog=[y], exog=[C1, C2], reg_type=
-        'll', var_type='cc', bw='cv_ls', censor_val=0)
+    C1 = np.random.normal(size=(nobs, ))
+    C2 = np.random.normal(2, 1, size=(nobs, ))
+    noise = 0.1 * np.random.normal(size=(nobs, ))
+    y = 0.3 +1.2 * C1 - 0.9 * C2 + noise
+    y[y>0] = 0  # censor the data
+    model = nparam.KernelCensoredReg(endog=[y], exog=[C1, C2],
+                                     reg_type='ll', var_type='cc',
+                                     bw='cv_ls', censor_val=0)
     sm_mean, sm_mfx = model.fit()
+
     import matplotlib.pyplot as plt
     fig = plt.figure()
-    ax = fig.add_subplot(1, 1, 1)
+    ax = fig.add_subplot(1,1,1)
     sortidx = np.argsort(y)
     ax.plot(y[sortidx], 'o', alpha=0.5)
+    #ax.plot(x, y_cens, 'o', alpha=0.5)
+    #ax.plot(x, y_true, lw=2, label='DGP mean')
     ax.plot(sm_mean[sortidx], lw=2, label='model 0 mean')
+    #ax.plot(x, mean2, lw=2, label='model 2 mean')
     ax.legend()
+
     plt.show()
diff --git a/statsmodels/examples/ex_kernel_regression_dgp.py b/statsmodels/examples/ex_kernel_regression_dgp.py
index ad2de6958..d2176fd84 100644
--- a/statsmodels/examples/ex_kernel_regression_dgp.py
+++ b/statsmodels/examples/ex_kernel_regression_dgp.py
@@ -1,31 +1,45 @@
+# -*- coding: utf-8 -*-
 """

 Created on Sun Jan 06 09:50:54 2013

 Author: Josef Perktold
 """
+
+
 if __name__ == '__main__':
+
     import numpy as np
     import matplotlib.pyplot as plt
     from statsmodels.nonparametric.api import KernelReg
     import statsmodels.sandbox.nonparametric.dgp_examples as dgp
+
+
     seed = np.random.randint(999999)
     seed = 430973
     print(seed)
     np.random.seed(seed)
-    funcs = [dgp.UnivariateFanGijbels1(), dgp.UnivariateFanGijbels2(), dgp.
-        UnivariateFanGijbels1EU(), dgp.UnivariateFunc1()]
+
+    funcs = [dgp.UnivariateFanGijbels1(),
+             dgp.UnivariateFanGijbels2(),
+             dgp.UnivariateFanGijbels1EU(),
+             #dgp.UnivariateFanGijbels2(distr_x=stats.uniform(-2, 4))
+             dgp.UnivariateFunc1()
+             ]
+
     res = []
     fig = plt.figure()
-    for i, func in enumerate(funcs):
+    for i,func in enumerate(funcs):
+        #f = func()
         f = func
-        model = KernelReg(endog=[f.y], exog=[f.x], reg_type='ll', var_type=
-            'c', bw='cv_ls')
+        model = KernelReg(endog=[f.y], exog=[f.x], reg_type='ll',
+                          var_type='c', bw='cv_ls')
         mean, mfx = model.fit()
-        ax = fig.add_subplot(2, 2, i + 1)
+        ax = fig.add_subplot(2, 2, i+1)
         f.plot(ax=ax)
         ax.plot(f.x, mean, color='r', lw=2, label='est. mean')
         ax.legend(loc='upper left')
         res.append((model, mean, mfx))
+
     fig.suptitle('Kernel Regression')
     fig.show()
diff --git a/statsmodels/examples/ex_kernel_regression_sigtest.py b/statsmodels/examples/ex_kernel_regression_sigtest.py
index e3c856758..6ddb6aded 100644
--- a/statsmodels/examples/ex_kernel_regression_sigtest.py
+++ b/statsmodels/examples/ex_kernel_regression_sigtest.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Kernel Regression and Significance Test

 Warning: SLOW, 11 minutes on my computer
@@ -31,25 +32,39 @@ bootstrap critical values
 times: 8.34599995613 20.6909999847 666.373999834

 """
+
 import time
+
 import numpy as np
 import statsmodels.nonparametric.api as nparam
 import statsmodels.nonparametric.kernel_regression as smkr
+
 if __name__ == '__main__':
     t0 = time.time()
+    #example from test file
     nobs = 200
     np.random.seed(1234)
-    C1 = np.random.normal(size=(nobs,))
-    C2 = np.random.normal(2, 1, size=(nobs,))
-    noise = np.random.normal(size=(nobs,))
-    Y = 0.3 + 1.2 * C1 - 0.9 * C2 + noise
-    model = nparam.KernelReg(endog=[Y], exog=[C1, C2], reg_type='lc',
-        var_type='cc', bw='aic')
+    C1 = np.random.normal(size=(nobs, ))
+    C2 = np.random.normal(2, 1, size=(nobs, ))
+    noise = np.random.normal(size=(nobs, ))
+    Y = 0.3 +1.2 * C1 - 0.9 * C2 + noise
+    #self.write2file('RegData.csv', (Y, C1, C2))
+
+    #CODE TO PRODUCE BANDWIDTH ESTIMATION IN R
+    #library(np)
+    #data <- read.csv('RegData.csv', header=FALSE)
+    #bw <- npregbw(formula=data$V1 ~ data$V2 + data$V3,
+    #                bwmethod='cv.aic', regtype='lc')
+    model = nparam.KernelReg(endog=[Y], exog=[C1, C2],
+                             reg_type='lc', var_type='cc', bw='aic')
     mean, marg = model.fit()
+    #R_bw = [0.4017893, 0.4943397]  # Bandwidth obtained in R
     bw_expected = [0.3987821, 0.50933458]
+    #npt.assert_allclose(model.bw, bw_expected, rtol=1e-3)
     print('bw')
     print(model.bw)
     print(bw_expected)
+
     print('\nsig_test - default')
     print(model.sig_test([1], nboot=100))
     t1 = time.time()
@@ -62,6 +77,7 @@ if __name__ == '__main__':
     bsort0 = np.sort(res0.t_dist)
     nrep0 = len(bsort0)
     print(bsort0[(probs * nrep0).astype(int)])
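+    # empirical quantiles of the bootstrap distribution at the probability levels in probs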
+
     t2 = time.time()
     print('\nsig_test - pivot=True, nboot=200, nested_res=50')
     res1 = smkr.TestRegCoefC(model, [1], pivot=True, nboot=200, nested_res=50)
@@ -74,4 +90,18 @@ if __name__ == '__main__':
     nrep1 = len(bsort1)
     print(bsort1[(probs * nrep1).astype(int)])
     t3 = time.time()
-    print('times:', t1 - t0, t2 - t1, t3 - t2)
+
+    print('times:', t1-t0, t2-t1, t3-t2)
+
+
+#    import matplotlib.pyplot as plt
+#    fig = plt.figure()
+#    ax = fig.add_subplot(1,1,1)
+#    ax.plot(x, y, 'o', alpha=0.5)
+#    ax.plot(x, y_cens, 'o', alpha=0.5)
+#    ax.plot(x, y_true, lw=2, label='DGP mean')
+#    ax.plot(x, sm_mean, lw=2, label='model 0 mean')
+#    ax.plot(x, mean2, lw=2, label='model 2 mean')
+#    ax.legend()
+#
+#    plt.show()
diff --git a/statsmodels/examples/ex_kernel_semilinear_dgp.py b/statsmodels/examples/ex_kernel_semilinear_dgp.py
index 9275cb61e..08d54d7e2 100644
--- a/statsmodels/examples/ex_kernel_semilinear_dgp.py
+++ b/statsmodels/examples/ex_kernel_semilinear_dgp.py
@@ -1,51 +1,71 @@
+# -*- coding: utf-8 -*-
 """

 Created on Sun Jan 06 09:50:54 2013

 Author: Josef Perktold
 """
+
+
+
 if __name__ == '__main__':
+
     import numpy as np
     import matplotlib.pyplot as plt
+    #from statsmodels.nonparametric.api import KernelReg
     import statsmodels.sandbox.nonparametric.kernel_extras as smke
     import statsmodels.sandbox.nonparametric.dgp_examples as dgp

-
     class UnivariateFunc1a(dgp.UnivariateFunc1):
-        pass
+
+        def het_scale(self, x):
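+            # override the DGP noise scale with a constant 0.5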
+            return 0.5
+
     seed = np.random.randint(999999)
-    seed = 648456
+    #seed = 430973
+    #seed = 47829
+    seed = 648456 #good seed for het_scale = 0.5
     print(seed)
     np.random.seed(seed)
+
     nobs, k_vars = 300, 3
     x = np.random.uniform(-2, 2, size=(nobs, k_vars))
-    xb = x.sum(1) / 3
+    xb = x.sum(1) / 3  #beta = [1,1,1]
+
     k_vars_lin = 2
     x2 = np.random.uniform(-2, 2, size=(nobs, k_vars_lin))
-    funcs = [UnivariateFunc1a(x=xb)]
+
+    funcs = [#dgp.UnivariateFanGijbels1(),
+             #dgp.UnivariateFanGijbels2(),
+             #dgp.UnivariateFanGijbels1EU(),
+             #dgp.UnivariateFanGijbels2(distr_x=stats.uniform(-2, 4))
+             UnivariateFunc1a(x=xb)
+             ]
+
     res = []
     fig = plt.figure()
-    for i, func in enumerate(funcs):
+    for i,func in enumerate(funcs):
+        #f = func()
         f = func
         y = f.y + x2.sum(1)
         model = smke.SemiLinear(y, x2, x, 'ccc', k_vars_lin)
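+        # partially linear model: x2 enters linearly, x enters nonparametrically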
         mean, mfx = model.fit()
-        ax = fig.add_subplot(1, 1, i + 1)
+        ax = fig.add_subplot(1, 1, i+1)
         f.plot(ax=ax)
         xb_est = np.dot(model.exog, model.b)
-        sortidx = np.argsort(xb_est)
-        ax.plot(f.x[sortidx], mean[sortidx], 'o', color='r', lw=2, label=
-            'est. mean')
+        sortidx = np.argsort(xb_est) #f.x)
+        ax.plot(f.x[sortidx], mean[sortidx], 'o', color='r', lw=2, label='est. mean')
+#        ax.plot(f.x, mean0, color='g', lw=2, label='est. mean')
         ax.legend(loc='upper left')
         res.append((model, mean, mfx))
+
     print('beta', model.b)
-    print('scale - est', (y - (xb_est + mean)).std())
-    print('scale - dgp realised, true', (y - (f.y_true + x2.sum(1))).std(),
-        2 * f.het_scale(1))
+    print('scale - est', (y - (xb_est+mean)).std())
+    print('scale - dgp realised, true', (y - (f.y_true + x2.sum(1))).std(), \
+                                        2 * f.het_scale(1))
     fittedvalues = xb_est + mean
     resid = np.squeeze(model.endog) - fittedvalues
-    print('corrcoef(fittedvalues, resid)', np.corrcoef(fittedvalues, resid)
-        [0, 1])
+    print('corrcoef(fittedvalues, resid)', np.corrcoef(fittedvalues, resid)[0,1])
     print('variance of components, var and as fraction of var(y)')
     print('fitted values', fittedvalues.var(), fittedvalues.var() / y.var())
     print('linear       ', xb_est.var(), xb_est.var() / y.var())
@@ -55,48 +75,51 @@ if __name__ == '__main__':
     print(np.cov(fittedvalues, resid) / model.endog.var(ddof=1))
     print('sum', (np.cov(fittedvalues, resid) / model.endog.var(ddof=1)).sum())
     print('\ncovariance decomposition, xb, m, resid as fraction of var(y)')
-    print(np.cov(np.column_stack((xb_est, mean, resid)), rowvar=False) /
-        model.endog.var(ddof=1))
+    print(np.cov(np.column_stack((xb_est, mean, resid)), rowvar=False) / model.endog.var(ddof=1))
+
     fig.suptitle('Kernel Regression')
     fig.show()
+
     alpha = 0.7
     fig = plt.figure()
     ax = fig.add_subplot(1, 1, 1)
-    ax.plot(f.x[sortidx], f.y[sortidx], 'o', color='b', lw=2, alpha=alpha,
-        label='observed')
-    ax.plot(f.x[sortidx], f.y_true[sortidx], 'o', color='g', lw=2, alpha=
-        alpha, label='dgp. mean')
-    ax.plot(f.x[sortidx], mean[sortidx], 'o', color='r', lw=2, alpha=alpha,
-        label='est. mean')
+    ax.plot(f.x[sortidx], f.y[sortidx], 'o', color='b', lw=2, alpha=alpha, label='observed')
+    ax.plot(f.x[sortidx], f.y_true[sortidx], 'o', color='g', lw=2, alpha=alpha, label='dgp. mean')
+    ax.plot(f.x[sortidx], mean[sortidx], 'o', color='r', lw=2, alpha=alpha, label='est. mean')
     ax.legend(loc='upper left')
+
     sortidx = np.argsort(xb_est + mean)
     fig = plt.figure()
     ax = fig.add_subplot(1, 1, 1)
-    ax.plot(f.x[sortidx], y[sortidx], 'o', color='b', lw=2, alpha=alpha,
-        label='observed')
-    ax.plot(f.x[sortidx], f.y_true[sortidx], 'o', color='g', lw=2, alpha=
-        alpha, label='dgp. mean')
-    ax.plot(f.x[sortidx], (xb_est + mean)[sortidx], 'o', color='r', lw=2,
-        alpha=alpha, label='est. mean')
+    ax.plot(f.x[sortidx], y[sortidx], 'o', color='b', lw=2, alpha=alpha, label='observed')
+    ax.plot(f.x[sortidx], f.y_true[sortidx], 'o', color='g', lw=2, alpha=alpha, label='dgp. mean')
+    ax.plot(f.x[sortidx], (xb_est + mean)[sortidx], 'o', color='r', lw=2, alpha=alpha, label='est. mean')
     ax.legend(loc='upper left')
     ax.set_title('Semilinear Model - observed and total fitted')
+
     fig = plt.figure()
+#    ax = fig.add_subplot(1, 2, 1)
+#    ax.plot(f.x, f.y, 'o', color='b', lw=2, alpha=alpha, label='observed')
+#    ax.plot(f.x, f.y_true, 'o', color='g', lw=2, alpha=alpha, label='dgp. mean')
+#    ax.plot(f.x, mean, 'o', color='r', lw=2, alpha=alpha, label='est. mean')
+#    ax.legend(loc='upper left')
     sortidx0 = np.argsort(xb)
     ax = fig.add_subplot(1, 2, 1)
     ax.plot(f.y[sortidx0], 'o', color='b', lw=2, alpha=alpha, label='observed')
-    ax.plot(f.y_true[sortidx0], 'o', color='g', lw=2, alpha=alpha, label=
-        'dgp. mean')
-    ax.plot(mean[sortidx0], 'o', color='r', lw=2, alpha=alpha, label=
-        'est. mean')
+    ax.plot(f.y_true[sortidx0], 'o', color='g', lw=2, alpha=alpha, label='dgp. mean')
+    ax.plot(mean[sortidx0], 'o', color='r', lw=2, alpha=alpha, label='est. mean')
     ax.legend(loc='upper left')
     ax.set_title('Single Index Model (sorted by true xb)')
+
     ax = fig.add_subplot(1, 2, 2)
     ax.plot(y - xb_est, 'o', color='b', lw=2, alpha=alpha, label='observed')
     ax.plot(f.y_true, 'o', color='g', lw=2, alpha=alpha, label='dgp. mean')
     ax.plot(mean, 'o', color='r', lw=2, alpha=alpha, label='est. mean')
     ax.legend(loc='upper left')
     ax.set_title('Single Index Model (nonparametric)')
+
     plt.figure()
-    plt.plot(y, xb_est + mean, '.')
+    plt.plot(y, xb_est+mean, '.')
     plt.title('observed versus fitted values')
+
     plt.show()
diff --git a/statsmodels/examples/ex_kernel_singleindex_dgp.py b/statsmodels/examples/ex_kernel_singleindex_dgp.py
index 63928f944..d7b37cc9f 100644
--- a/statsmodels/examples/ex_kernel_singleindex_dgp.py
+++ b/statsmodels/examples/ex_kernel_singleindex_dgp.py
@@ -1,68 +1,92 @@
+# -*- coding: utf-8 -*-
 """

 Created on Sun Jan 06 09:50:54 2013

 Author: Josef Perktold
 """
+
+
+
 if __name__ == '__main__':
+
     import numpy as np
     import matplotlib.pyplot as plt
+    #from statsmodels.nonparametric.api import KernelReg
     import statsmodels.sandbox.nonparametric.kernel_extras as smke
     import statsmodels.sandbox.nonparametric.dgp_examples as dgp

-
     class UnivariateFunc1a(dgp.UnivariateFunc1):
-        pass
+
+        def het_scale(self, x):
+            return 0.5
+
     seed = np.random.randint(999999)
-    seed = 648456
+    #seed = 430973
+    #seed = 47829
+    seed = 648456 #good seed for het_scale = 0.5
     print(seed)
     np.random.seed(seed)
+
     nobs, k_vars = 300, 3
     x = np.random.uniform(-2, 2, size=(nobs, k_vars))
-    xb = x.sum(1) / 3
-    funcs = [UnivariateFunc1a(x=xb)]
+    xb = x.sum(1) / 3  #beta = [1,1,1]
+
+    funcs = [#dgp.UnivariateFanGijbels1(),
+             #dgp.UnivariateFanGijbels2(),
+             #dgp.UnivariateFanGijbels1EU(),
+             #dgp.UnivariateFanGijbels2(distr_x=stats.uniform(-2, 4))
+             UnivariateFunc1a(x=xb)
+             ]
+
     res = []
     fig = plt.figure()
-    for i, func in enumerate(funcs):
+    for i,func in enumerate(funcs):
+        #f = func()
         f = func
-        model = smke.SingleIndexModel(endog=[f.y], exog=x, var_type='ccc')
+#        mod0 = smke.SingleIndexModel(endog=[f.y], exog=[xb], #reg_type='ll',
+#                          var_type='c')#, bw='cv_ls')
+#        mean0, mfx0 = mod0.fit()
+        model = smke.SingleIndexModel(endog=[f.y], exog=x, #reg_type='ll',
+                          var_type='ccc')#, bw='cv_ls')
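+        # single-index model: E[y|x] = g(x'b), with the index coefficients b and the link g both estimated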
         mean, mfx = model.fit()
-        ax = fig.add_subplot(1, 1, i + 1)
+        ax = fig.add_subplot(1, 1, i+1)
         f.plot(ax=ax)
         xb_est = np.dot(model.exog, model.b)
-        sortidx = np.argsort(xb_est)
-        ax.plot(f.x[sortidx], mean[sortidx], 'o', color='r', lw=2, label=
-            'est. mean')
+        sortidx = np.argsort(xb_est) #f.x)
+        ax.plot(f.x[sortidx], mean[sortidx], 'o', color='r', lw=2, label='est. mean')
+#        ax.plot(f.x, mean0, color='g', lw=2, label='est. mean')
         ax.legend(loc='upper left')
         res.append((model, mean, mfx))
+
     fig.suptitle('Kernel Regression')
     fig.show()
+
     alpha = 0.7
     fig = plt.figure()
     ax = fig.add_subplot(1, 1, 1)
-    ax.plot(f.x[sortidx], f.y[sortidx], 'o', color='b', lw=2, alpha=alpha,
-        label='observed')
-    ax.plot(f.x[sortidx], f.y_true[sortidx], 'o', color='g', lw=2, alpha=
-        alpha, label='dgp. mean')
-    ax.plot(f.x[sortidx], mean[sortidx], 'o', color='r', lw=2, alpha=alpha,
-        label='est. mean')
+    ax.plot(f.x[sortidx], f.y[sortidx], 'o', color='b', lw=2, alpha=alpha, label='observed')
+    ax.plot(f.x[sortidx], f.y_true[sortidx], 'o', color='g', lw=2, alpha=alpha, label='dgp. mean')
+    ax.plot(f.x[sortidx], mean[sortidx], 'o', color='r', lw=2, alpha=alpha, label='est. mean')
     ax.legend(loc='upper left')
+
     fig = plt.figure()
+#    ax = fig.add_subplot(1, 2, 1)
+#    ax.plot(f.x, f.y, 'o', color='b', lw=2, alpha=alpha, label='observed')
+#    ax.plot(f.x, f.y_true, 'o', color='g', lw=2, alpha=alpha, label='dgp. mean')
+#    ax.plot(f.x, mean, 'o', color='r', lw=2, alpha=alpha, label='est. mean')
+#    ax.legend(loc='upper left')
     sortidx0 = np.argsort(xb)
     ax = fig.add_subplot(1, 2, 1)
     ax.plot(f.y[sortidx0], 'o', color='b', lw=2, alpha=alpha, label='observed')
-    ax.plot(f.y_true[sortidx0], 'o', color='g', lw=2, alpha=alpha, label=
-        'dgp. mean')
-    ax.plot(mean[sortidx0], 'o', color='r', lw=2, alpha=alpha, label=
-        'est. mean')
+    ax.plot(f.y_true[sortidx0], 'o', color='g', lw=2, alpha=alpha, label='dgp. mean')
+    ax.plot(mean[sortidx0], 'o', color='r', lw=2, alpha=alpha, label='est. mean')
     ax.legend(loc='upper left')
     ax.set_title('Single Index Model (sorted by true xb)')
     ax = fig.add_subplot(1, 2, 2)
     ax.plot(f.y[sortidx], 'o', color='b', lw=2, alpha=alpha, label='observed')
-    ax.plot(f.y_true[sortidx], 'o', color='g', lw=2, alpha=alpha, label=
-        'dgp. mean')
-    ax.plot(mean[sortidx], 'o', color='r', lw=2, alpha=alpha, label='est. mean'
-        )
+    ax.plot(f.y_true[sortidx], 'o', color='g', lw=2, alpha=alpha, label='dgp. mean')
+    ax.plot(mean[sortidx], 'o', color='r', lw=2, alpha=alpha, label='est. mean')
     ax.legend(loc='upper left')
     ax.set_title('Single Index Model (sorted by estimated xb)')
     plt.show()
diff --git a/statsmodels/examples/ex_kernel_test_functional.py b/statsmodels/examples/ex_kernel_test_functional.py
index 9da0e56e5..9207d055f 100644
--- a/statsmodels/examples/ex_kernel_test_functional.py
+++ b/statsmodels/examples/ex_kernel_test_functional.py
@@ -1,53 +1,69 @@
+# -*- coding: utf-8 -*-
 """

 Created on Tue Jan 08 19:03:20 2013

 Author: Josef Perktold
 """
+
+
+
 if __name__ == '__main__':
+
     import numpy as np
+
     from statsmodels.regression.linear_model import OLS
+    #from statsmodels.nonparametric.api import KernelReg
     import statsmodels.sandbox.nonparametric.kernel_extras as smke
+
     seed = np.random.randint(999999)
+    #seed = 661176
     print(seed)
     np.random.seed(seed)
-    sig_e = 0.5
+
+    sig_e = 0.5 #0.1
     nobs, k_vars = 200, 1
     x = np.random.uniform(-2, 2, size=(nobs, k_vars))
     x.sort()
+
     order = 3
-    exog = x ** np.arange(order + 1)
-    beta = np.array([1, 1, 0.1, 0.0])[:order + 1]
+    exog = x**np.arange(order + 1)
+    beta = np.array([1, 1, 0.1, 0.0])[:order+1] # 1. / np.arange(1, order + 2)
     y_true = np.dot(exog, beta)
     y = y_true + sig_e * np.random.normal(size=nobs)
     endog = y
+
     print('DGP')
     print('nobs=%d, beta=%r, sig_e=%3.1f' % (nobs, beta, sig_e))
-    mod_ols = OLS(endog, exog[:, :2])
+
+    mod_ols = OLS(endog, exog[:,:2])
     res_ols = mod_ols.fit()
-    tst = smke.TestFForm(endog, exog[:, :2], bw=[0.01, 0.45], var_type='cc',
-        fform=lambda x, p: mod_ols.predict(p, x), estimator=lambda y, x:
-        OLS(y, x).fit().params, nboot=1000)
+    #'cv_ls'[1000, 0.5][0.01, 0.45]
+    tst = smke.TestFForm(endog, exog[:,:2], bw=[0.01, 0.45], var_type='cc',
+                         fform=lambda x,p: mod_ols.predict(p,x),
+                         estimator=lambda y,x: OLS(y,x).fit().params,
+                         nboot=1000)
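+    # nonparametric test of the linear functional form; the null distribution of the
+    # statistic is approximated by nboot bootstrap replications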
+
     print('bw', tst.bw)
     print('tst.test_stat', tst.test_stat)
     print(tst.sig)
     print('tst.boots_results mean, min, max', (tst.boots_results.mean(),
-        tst.boots_results.min(), tst.boots_results.max()))
-    print('lower tail bootstrap p-value', (tst.boots_results < tst.
-        test_stat).mean())
-    print('upper tail bootstrap p-value', (tst.boots_results >= tst.
-        test_stat).mean())
+                                               tst.boots_results.min(),
+                                               tst.boots_results.max()))
+    print('lower tail bootstrap p-value', (tst.boots_results < tst.test_stat).mean())
+    print('upper tail bootstrap p-value', (tst.boots_results >= tst.test_stat).mean())
     from scipy import stats
-    print('aymp.normal p-value (2-sided)', stats.norm.sf(np.abs(tst.
-        test_stat)) * 2)
+    print('aymp.normal p-value (2-sided)', stats.norm.sf(np.abs(tst.test_stat))*2)
     print('aymp.normal p-value (upper)', stats.norm.sf(tst.test_stat))
-    do_plot = True
+
+    do_plot=True
     if do_plot:
         import matplotlib.pyplot as plt
         plt.figure()
         plt.plot(x, y, '.')
         plt.plot(x, res_ols.fittedvalues)
         plt.title('OLS fit')
+
         plt.figure()
         plt.hist(tst.boots_results.ravel(), bins=20)
         plt.title('bootstrap histogram of test statistic')
diff --git a/statsmodels/examples/ex_kernel_test_functional_li_wang.py b/statsmodels/examples/ex_kernel_test_functional_li_wang.py
index 9c3ac2532..29454a3c3 100644
--- a/statsmodels/examples/ex_kernel_test_functional_li_wang.py
+++ b/statsmodels/examples/ex_kernel_test_functional_li_wang.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Example TestFForm with Li Wang DGP1

 Created on Tue Jan 08 19:03:20 2013
@@ -57,63 +58,82 @@ aymp.normal p-value (upper) 0.306629578855


 """
+
+
+
 if __name__ == '__main__':
+
     import time
+
     import numpy as np
     from scipy import stats
+
     from statsmodels.regression.linear_model import OLS
+    #from statsmodels.nonparametric.api import KernelReg
     import statsmodels.sandbox.nonparametric.kernel_extras as smke
+
     seed = np.random.randint(999999)
+    #seed = 661176
     print(seed)
     np.random.seed(seed)
-    sig_e = 0.1
+
+    sig_e = 0.1 #0.5 #0.1
     nobs, k_vars = 100, 1
+
     t0 = time.time()
+
     b_res = []
     for i in range(100):
         x = np.random.uniform(0, 1, size=(nobs, k_vars))
         x.sort(0)
+
         order = 2
-        exog = x ** np.arange(1, order + 1)
-        beta = np.array([2, -0.2])[:order + 1 - 1]
+        exog = x**np.arange(1, order + 1)
+        beta = np.array([2, -0.2])[:order+1-1] # 1. / np.arange(1, order + 2)
         y_true = np.dot(exog, beta)
         y = y_true + sig_e * np.random.normal(size=nobs)
         endog = y
-        mod_ols = OLS(endog, exog[:, :1])
-        bw_lw = [1.0 / np.sqrt(12.0) * nobs ** -0.2] * 2
-        tst = smke.TestFForm(endog, exog[:, :1], bw=bw_lw, var_type='c',
-            fform=lambda x, p: mod_ols.predict(p, x), estimator=lambda y, x:
-            OLS(y, x).fit().params, nboot=399)
-        b_res.append([tst.test_stat, stats.norm.sf(tst.test_stat), (tst.
-            boots_results > tst.test_stat).mean()])
+
+        mod_ols = OLS(endog, exog[:,:1])
+        #res_ols = mod_ols.fit()
+        #'cv_ls'[1000, 0.5]
+        bw_lw = [1./np.sqrt(12.) * nobs**(-0.2)]*2  #(-1. / 5.)
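+        # fixed rule-of-thumb bandwidth: std of U(0, 1) (= 1/sqrt(12)) scaled by nobs**(-1/5)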
+        tst = smke.TestFForm(endog, exog[:,:1], bw=bw_lw, var_type='c',
+                             fform=lambda x,p: mod_ols.predict(p,x),
+                             estimator=lambda y,x: OLS(y,x).fit().params,
+                             nboot=399)
+        b_res.append([tst.test_stat,
+                      stats.norm.sf(tst.test_stat),
+                      (tst.boots_results > tst.test_stat).mean()])
     t1 = time.time()
     b_res = np.asarray(b_res)
-    print('time', (t1 - t0) / 60.0)
+
+    print('time', (t1 - t0) / 60.)
     print(b_res.mean(0))
     print(b_res.std(0))
     print('reject at [0.2, 0.1, 0.05] (row 1: normal, row 2: bootstrap)')
-    print((b_res[:, 1:, None] >= [0.2, 0.1, 0.05]).mean(0))
+    print((b_res[:,1:,None] >= [0.2, 0.1, 0.05]).mean(0))
+
     print('bw', tst.bw)
     print('tst.test_stat', tst.test_stat)
     print(tst.sig)
-    print('tst.boots_results min, max', tst.boots_results.min(), tst.
-        boots_results.max())
-    print('lower tail bootstrap p-value', (tst.boots_results < tst.
-        test_stat).mean())
-    print('upper tail bootstrap p-value', (tst.boots_results >= tst.
-        test_stat).mean())
+    print('tst.boots_results min, max', tst.boots_results.min(), tst.boots_results.max())
+    print('lower tail bootstrap p-value', (tst.boots_results < tst.test_stat).mean())
+    print('upper tail bootstrap p-value', (tst.boots_results >= tst.test_stat).mean())
     from scipy import stats
-    print('aymp.normal p-value (2-sided)', stats.norm.sf(np.abs(tst.
-        test_stat)) * 2)
+    print('aymp.normal p-value (2-sided)', stats.norm.sf(np.abs(tst.test_stat))*2)
     print('aymp.normal p-value (upper)', stats.norm.sf(tst.test_stat))
+
     res_ols = mod_ols.fit()
-    do_plot = True
+
+    do_plot=True
     if do_plot:
         import matplotlib.pyplot as plt
         plt.figure()
         plt.plot(x, y, '.')
         plt.plot(x, res_ols.fittedvalues)
         plt.title('OLS fit')
+
         plt.figure()
         plt.hist(tst.boots_results.ravel(), bins=20)
         plt.title('bootstrap histogram of test statistic')
diff --git a/statsmodels/examples/ex_lowess.py b/statsmodels/examples/ex_lowess.py
index 0a1c1a532..492452921 100644
--- a/statsmodels/examples/ex_lowess.py
+++ b/statsmodels/examples/ex_lowess.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Mon Oct 31 15:26:06 2011

@@ -5,37 +6,70 @@ Author: Chris Jordan Squire

 extracted from test suite by josef-pktd
 """
+
 import os
+
 import numpy as np
 import matplotlib.pyplot as plt
 import statsmodels.api as sm
 import statsmodels.nonparametric.tests.results
+
+# this is just to check direct import
 import statsmodels.nonparametric.smoothers_lowess
 statsmodels.nonparametric.smoothers_lowess.lowess
+
 lowess = sm.nonparametric.lowess
-x = np.arange(20.0)
-noise = np.array([-0.76741118, -0.30754369, 0.39950921, -0.46352422, -
-    1.67081778, 0.6595567, 0.66367639, -2.04388585, 0.8123281, 1.45977518, 
-    1.21428038, 1.29296866, 0.78028477, -0.2402853, -0.21721302, 0.24549405,
-    0.25987014, -0.90709034, -1.45688216, -0.31780505])
+
+x = np.arange(20.)
+
+#standard normal noise
+noise = np.array([-0.76741118, -0.30754369,
+                    0.39950921, -0.46352422, -1.67081778,
+                    0.6595567 ,  0.66367639, -2.04388585,
+                    0.8123281 ,  1.45977518,
+                    1.21428038,  1.29296866,  0.78028477,
+                    -0.2402853 , -0.21721302,
+                    0.24549405,  0.25987014, -0.90709034,
+                    -1.45688216, -0.31780505])
 y = x + noise
-expected_lowess = np.array([[0.0, -0.58337912], [1.0, 0.61951246], [2.0, 
-    1.82221628], [3.0, 3.02536876], [4.0, 4.22667951], [5.0, 5.42387723], [
-    6.0, 6.60834945], [7.0, 7.7797691], [8.0, 8.91824348], [9.0, 9.94997506
-    ], [10.0, 10.89697569], [11.0, 11.78746276], [12.0, 12.62356492], [13.0,
-    13.41538492], [14.0, 14.15745254], [15.0, 14.92343948], [16.0, 
-    15.70019862], [17.0, 16.48167846], [18.0, 17.26380699], [19.0, 18.0466769]]
-    )
+
+expected_lowess = np.array([[  0.        ,  -0.58337912],
+                           [  1.        ,   0.61951246],
+                           [  2.        ,   1.82221628],
+                           [  3.        ,   3.02536876],
+                           [  4.        ,   4.22667951],
+                           [  5.        ,   5.42387723],
+                           [  6.        ,   6.60834945],
+                           [  7.        ,   7.7797691 ],
+                           [  8.        ,   8.91824348],
+                           [  9.        ,   9.94997506],
+                           [ 10.        ,  10.89697569],
+                           [ 11.        ,  11.78746276],
+                           [ 12.        ,  12.62356492],
+                           [ 13.        ,  13.41538492],
+                           [ 14.        ,  14.15745254],
+                           [ 15.        ,  14.92343948],
+                           [ 16.        ,  15.70019862],
+                           [ 17.        ,  16.48167846],
+                           [ 18.        ,  17.26380699],
+                           [ 19.        ,  18.0466769 ]])
+
 actual_lowess = lowess(y, x)
 print(actual_lowess)
-print(np.max(np.abs(actual_lowess - expected_lowess)))
+print(np.max(np.abs(actual_lowess-expected_lowess)))
+
 plt.plot(y, 'o')
-plt.plot(actual_lowess[:, 1])
-plt.plot(expected_lowess[:, 1])
+plt.plot(actual_lowess[:,1])
+plt.plot(expected_lowess[:,1])
+
 rpath = os.path.split(statsmodels.nonparametric.tests.results.__file__)[0]
 rfile = os.path.join(rpath, 'test_lowess_frac.csv')
-test_data = np.genfromtxt(open(rfile, 'rb'), delimiter=',', names=True)
+test_data = np.genfromtxt(open(rfile, 'rb'),
+                          delimiter=',', names=True)
 expected_lowess_23 = np.array([test_data['x'], test_data['out_2_3']]).T
 expected_lowess_15 = np.array([test_data['x'], test_data['out_1_5']]).T
-actual_lowess_23 = lowess(test_data['y'], test_data['x'], frac=2.0 / 3)
-actual_lowess_15 = lowess(test_data['y'], test_data['x'], frac=1.0 / 5)
+
+actual_lowess_23 = lowess(test_data['y'], test_data['x'] ,frac = 2./3)
+actual_lowess_15 = lowess(test_data['y'], test_data['x'] ,frac = 1./5)
+
+#plt.show()
diff --git a/statsmodels/examples/ex_misc_tarma.py b/statsmodels/examples/ex_misc_tarma.py
index 861832c24..9b5b426b9 100644
--- a/statsmodels/examples/ex_misc_tarma.py
+++ b/statsmodels/examples/ex_misc_tarma.py
@@ -1,55 +1,75 @@
+# -*- coding: utf-8 -*-
 """

 Created on Wed Jul 03 23:01:44 2013

 Author: Josef Perktold
 """
+
 import numpy as np
 import matplotlib.pyplot as plt
+
 from statsmodels.tsa.arima_process import arma_generate_sample, ArmaProcess
 from statsmodels.miscmodels.tmodel import TArma
 from statsmodels.tsa.arima_model import ARMA
 from statsmodels.tsa.arma_mle import Arma
+
 nobs = 500
 ar = [1, -0.6, -0.1]
 ma = [1, 0.7]
 dist = lambda n: np.random.standard_t(3, size=n)
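+# innovations are drawn from a t distribution with 3 degrees of freedom (heavy tails)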
 np.random.seed(8659567)
-x = arma_generate_sample(ar, ma, nobs, scale=1, distrvs=dist, burnin=500)
+x = arma_generate_sample(ar, ma, nobs, scale=1, distrvs=dist,
+                         burnin=500)
+
 mod = TArma(x)
-order = 2, 1
+order = (2, 1)
 res = mod.fit(order=order)
 res2 = mod.fit_mle(order=order, start_params=np.r_[res[0], 5, 1], method='nm')
+
 print(res[0])
 proc = ArmaProcess.from_coeffs(res[0][:order[0]], res[0][:order[1]])
+
 print(ar, ma)
 proc.nobs = nobs
+# TODO: bug nobs is None, not needed ?, used in ArmaProcess.__repr__
 print(proc.ar, proc.ma)
+
 print(proc.ar_roots(), proc.ma_roots())
+
 modn = Arma(x)
 resn = modn.fit_mle(order=order)
+
 moda = ARMA(x, order=order)
-resa = moda.fit(trend='nc')
-print("""
-parameter estimates""")
+resa = moda.fit( trend='nc')
+
+print('\nparameter estimates')
 print('ls  ', res[0])
 print('norm', resn.params)
 print('t   ', res2.params)
 print('A   ', resa.params)
-print("""
-standard deviation of parameter estimates""")
+
+print('\nstandard deviation of parameter estimates')
+#print 'ls  ', res[0]  #TODO: not available yet
 print('norm', resn.bse)
 print('t   ', res2.bse)
 print('A   ', resa.bse)
 print('A/t-1', resa.bse / res2.bse[:3] - 1)
+
 print('other bse')
 print(resn.bsejac)
 print(resn.bsejhj)
 print(res2.bsejac)
 print(res2.bsejhj)
+
 print(res2.t_test(np.eye(len(res2.params))))
+
+# TArma has no fittedvalues and resid
+# TODO: check if lag is correct or if fitted `x-resid` is shifted
 resid = res2.model.geterrors(res2.params)
-fv = res[2]['fvec']
+fv = res[2]['fvec']  #resid returned from leastsq?
+
 plt.plot(x, 'o', alpha=0.5)
-plt.plot(x - resid)
-plt.plot(x - fv)
+plt.plot(x-resid)
+plt.plot(x-fv)
+#plt.show()
diff --git a/statsmodels/examples/ex_misc_tmodel.py b/statsmodels/examples/ex_misc_tmodel.py
index 55c85aa60..7a089e8ea 100644
--- a/statsmodels/examples/ex_misc_tmodel.py
+++ b/statsmodels/examples/ex_misc_tmodel.py
@@ -1,38 +1,50 @@
+
 import numpy as np
+
 from scipy import stats
 import statsmodels.api as sm
 from statsmodels.miscmodels import TLinearModel
 from statsmodels.tools.numdiff import approx_hess
+
+#Example:
+#np.random.seed(98765678)
 nobs = 50
 nvars = 6
 df = 3
-rvs = np.random.randn(nobs, nvars - 1)
+rvs = np.random.randn(nobs, nvars-1)
 data_exog = sm.add_constant(rvs, prepend=False)
-xbeta = 0.9 + 0.1 * rvs.sum(1)
-data_endog = xbeta + 0.1 * np.random.standard_t(df, size=nobs)
+xbeta = 0.9 + 0.1*rvs.sum(1)
+data_endog = xbeta + 0.1*np.random.standard_t(df, size=nobs)
 print('variance of endog:', data_endog.var())
-print('true parameters:', [0.1] * nvars + [0.9])
+print('true parameters:', [0.1]*nvars + [0.9])
+
 res_ols = sm.OLS(data_endog, data_exog).fit()
-print("""
-Results with ols""")
+print('\nResults with ols')
 print('----------------')
 print(res_ols.scale)
 print(np.sqrt(res_ols.scale))
 print(res_ols.params)
 print(res_ols.bse)
 kurt = stats.kurtosis(res_ols.resid)
-df_fromkurt = 6.0 / kurt + 4
+df_fromkurt = 6./kurt + 4
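+# method of moments: excess kurtosis of t(df) is 6/(df - 4), so df = 6/kurt + 4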
 print('df_fromkurt from ols residuals', df_fromkurt)
 print(stats.t.stats(df_fromkurt, moments='mvsk'))
 print(stats.t.stats(df, moments='mvsk'))
+
 modp = TLinearModel(data_endog, data_exog)
-start_value = 0.1 * np.ones(data_exog.shape[1] + 2)
+start_value = 0.1*np.ones(data_exog.shape[1]+2)
+#start_value = np.zeros(data_exog.shape[1]+2)
+#start_value[:nvars] = sm.OLS(data_endog, data_exog).fit().params
 start_value[:nvars] = res_ols.params
-start_value[-2] = df_fromkurt
-start_value[-1] = np.sqrt(res_ols.scale)
+start_value[-2] = df_fromkurt #10
+start_value[-1] = np.sqrt(res_ols.scale) #0.5
 modp.start_params = start_value
+
+#adding fixed parameters
+
 fixdf = np.nan * np.zeros(modp.start_params.shape)
 fixdf[-2] = 5
+
 fixone = 0
 if fixone:
     modp.fixed_params = fixdf
@@ -41,20 +53,30 @@ if fixone:
 else:
     modp.fixed_params = None
     modp.fixed_paramsmask = None
-print("""
-Results with TLinearModel""")
+
+
+print('\nResults with TLinearModel')
 print('-------------------------')
-resp = modp.fit(start_params=modp.start_params, disp=1, method='nm', maxfun
-    =10000, maxiter=5000)
+resp = modp.fit(start_params = modp.start_params, disp=1, method='nm',
+                maxfun=10000, maxiter=5000)#'newton')
+#resp = modp.fit(start_params = modp.start_params, disp=1, method='newton')
+
 print('using Nelder-Mead')
 print(resp.params)
 print(resp.bse)
-resp2 = modp.fit(start_params=resp.params, method='Newton')
+resp2 = modp.fit(start_params = resp.params, method='Newton')
 print('using Newton')
 print(resp2.params)
 print(resp2.bse)
-hb = -approx_hess(modp.start_params, modp.loglike, epsilon=-0.0001)
+
+
+hb=-approx_hess(modp.start_params, modp.loglike, epsilon=-1e-4)
 tmp = modp.loglike(modp.start_params)
 print(tmp.shape)
 print('eigenvalues of numerical Hessian')
 print(np.linalg.eigh(np.linalg.inv(hb))[0])
+
+#store_params is only available in original test script
+##pp=np.array(store_params)
+##print pp.min(0)
+##print pp.max(0)
diff --git a/statsmodels/examples/ex_multivar_kde.py b/statsmodels/examples/ex_multivar_kde.py
index 9cf0767b9..fa154bca3 100644
--- a/statsmodels/examples/ex_multivar_kde.py
+++ b/statsmodels/examples/ex_multivar_kde.py
@@ -8,32 +8,46 @@ author: George Panterov
 import numpy as np
 import matplotlib.pyplot as plt
 from matplotlib import cm
+
 import statsmodels.api as sm
+
 if __name__ == '__main__':
     np.random.seed(123456)
+
+    # generate the data
     nobs = 500
     BW = 'cv_ml'
+
     mu1 = [3, 4]
     mu2 = [6, 1]
     cov1 = np.asarray([[1, 0.7], [0.7, 1]])
     cov2 = np.asarray([[1, -0.7], [-0.7, 1]])
+
     ix = np.random.uniform(size=nobs) > 0.5
     V = np.random.multivariate_normal(mu1, cov1, size=nobs)
     V[ix, :] = np.random.multivariate_normal(mu2, cov2, size=nobs)[ix, :]
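+    # 50/50 mixture of two bivariate normals with opposite correlation (+0.7 and -0.7)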
+
     x = V[:, 0]
     y = V[:, 1]
-    dens = sm.nonparametric.KDEMultivariate(data=[x, y], var_type='cc', bw=
-        BW, defaults=sm.nonparametric.EstimatorSettings(efficient=True))
+
+    dens = sm.nonparametric.KDEMultivariate(data=[x, y], var_type='cc', bw=BW,
+                                            defaults=sm.nonparametric.EstimatorSettings(efficient=True))
+
     supportx = np.linspace(min(x), max(x), 60)
     supporty = np.linspace(min(y), max(y), 60)
     X, Y = np.meshgrid(supportx, supporty)
+
     edat = np.column_stack([X.ravel(), Y.ravel()])
     Z = dens.pdf(edat).reshape(X.shape)
+
+    # plot
     fig = plt.figure(1)
     ax = fig.gca(projection='3d')
     surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet,
-        linewidth=0, antialiased=False)
+            linewidth=0, antialiased=False)
+
     fig.colorbar(surf, shrink=0.5, aspect=5)
     plt.figure(2)
     plt.imshow(Z)
+
     plt.show()
diff --git a/statsmodels/examples/ex_nearest_corr.py b/statsmodels/examples/ex_nearest_corr.py
index e6633d291..b98a44444 100644
--- a/statsmodels/examples/ex_nearest_corr.py
+++ b/statsmodels/examples/ex_nearest_corr.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Find near positive definite correlation and covariance matrices

 Created on Sun Aug 19 15:25:07 2012
@@ -14,25 +15,43 @@ As distance measure for how close the change in the matrix is, we consider
 the sum of squared differences (Frobenius norm without taking the square root)

 """
+
 import numpy as np
-from statsmodels.stats.correlation_tools import corr_nearest, corr_clipped
+from statsmodels.stats.correlation_tools import (
+                 corr_nearest, corr_clipped)
+
 examples = ['all']
+
 if 'all' in examples:
+    # x0 is positive definite
     x0 = np.array([[1, -0.2, -0.9], [-0.2, 1, -0.2], [-0.9, -0.2, 1]])
+    # x has negative eigenvalues, not definite
     x = np.array([[1, -0.9, -0.9], [-0.9, 1, -0.9], [-0.9, -0.9, 1]])
+    #x = np.array([[1, 0.2, 0.2], [0.2, 1, 0.2], [0.2, 0.2, 1]])
+
     n_fact = 2
+
     print('evals original', np.linalg.eigvalsh(x))
     y = corr_nearest(x, n_fact=100)
     print('evals nearest', np.linalg.eigvalsh(y))
     print(y)
+
     y = corr_nearest(x, n_fact=100, threshold=1e-16)
     print('evals nearest', np.linalg.eigvalsh(y))
     print(y)
+
     y = corr_clipped(x, threshold=1e-16)
     print('evals clipped', np.linalg.eigvalsh(y))
     print(y)
+
     np.set_printoptions(precision=4)
     print('\nMini Monte Carlo')
+    # we are simulating a uniformly distributed symmetric matrix
+    #     and find close positive definite matrix
+    # original can be far away from positive definite,
+    #     then original and converted matrices can be far apart in norm
+    # results are printed for visual inspection of different cases
+
     k_vars = 5
     diag_idx = np.arange(k_vars)
     for ii in range(10):
@@ -40,29 +59,41 @@ if 'all' in examples:
         x = np.random.uniform(-1, 1, size=(k_vars, k_vars))
         x = (x + x.T) * 0.5
         x[diag_idx, diag_idx] = 1
+        #x_std = np.sqrt(np.diag(x))
+        #x = x / x_std / x_std[:,None]
         print()
         print(np.sort(np.linalg.eigvals(x)), 'original')
+
         yn = corr_nearest(x, threshold=1e-12, n_fact=200)
-        print(np.sort(np.linalg.eigvals(yn)), ((yn - x) ** 2).sum(), 'nearest')
+        print(np.sort(np.linalg.eigvals(yn)), ((yn - x)**2).sum(), 'nearest')
+
         yc = corr_clipped(x, threshold=1e-12)
-        print(np.sort(np.linalg.eigvals(yc)), ((yc - x) ** 2).sum(), 'clipped')
+        print(np.sort(np.linalg.eigvals(yc)), ((yc - x)**2).sum(), 'clipped')
+
     import time
     t0 = time.time()
     for _ in range(100):
         corr_nearest(x, threshold=1e-15, n_fact=100)
+
     t1 = time.time()
     for _ in range(1000):
         corr_clipped(x, threshold=1e-15)
     t2 = time.time()
+
     print('\ntime (nearest, clipped):', t1 - t0, t2 - t1)
+
 if 'all' in examples:
-    x2 = np.array([1, 0.477, 0.644, 0.478, 0.651, 0.826, 0.477, 1, 0.516, 
-        0.233, 0.682, 0.75, 0.644, 0.516, 1, 0.599, 0.581, 0.742, 0.478, 
-        0.233, 0.599, 1, 0.741, 0.8, 0.651, 0.682, 0.581, 0.741, 1, 0.798, 
-        0.826, 0.75, 0.742, 0.8, 0.798, 1]).reshape(6, 6)
+    # example for test case against R
+    x2 = np.array([ 1,     0.477, 0.644, 0.478, 0.651, 0.826,
+                   0.477, 1,     0.516, 0.233, 0.682, 0.75,
+                   0.644, 0.516, 1,     0.599, 0.581, 0.742,
+                   0.478, 0.233, 0.599, 1,     0.741, 0.8,
+                   0.651, 0.682, 0.581, 0.741, 1,     0.798,
+                   0.826, 0.75,  0.742, 0.8,   0.798, 1]).reshape(6,6)
+
     y1 = corr_nearest(x2, threshold=1e-15, n_fact=200)
     y2 = corr_clipped(x2, threshold=1e-15)
     print('\nmatrix 2')
     print(np.sort(np.linalg.eigvals(x2)), 'original')
-    print(np.sort(np.linalg.eigvals(y1)), ((y1 - x2) ** 2).sum(), 'nearest')
-    print(np.sort(np.linalg.eigvals(y1)), ((y2 - x2) ** 2).sum(), 'clipped')
+    print(np.sort(np.linalg.eigvals(y1)), ((y1 - x2)**2).sum(), 'nearest')
+    print(np.sort(np.linalg.eigvals(y2)), ((y2 - x2)**2).sum(), 'clipped')
diff --git a/statsmodels/examples/ex_ols_robustcov.py b/statsmodels/examples/ex_ols_robustcov.py
index 89e0671d4..e14c25eb7 100644
--- a/statsmodels/examples/ex_ols_robustcov.py
+++ b/statsmodels/examples/ex_ols_robustcov.py
@@ -1,43 +1,62 @@
+
 import numpy as np
+
 from statsmodels.datasets import macrodata
 from statsmodels.regression.linear_model import OLS
 from statsmodels.tools.tools import add_constant
+
 d2 = macrodata.load().data
-g_gdp = 400 * np.diff(np.log(d2['realgdp']))
-g_inv = 400 * np.diff(np.log(d2['realinv']))
+g_gdp = 400*np.diff(np.log(d2['realgdp']))
+g_inv = 400*np.diff(np.log(d2['realinv']))
 exogg = add_constant(np.c_[g_gdp, d2['realint'][:-1]], prepend=False)
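+# annualized quarterly growth rates (400 * diff(log)); regress investment growth on
+# GDP growth and the real interest rate, with the constant added last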
 res_olsg = OLS(g_inv, exogg).fit()
+
+
+
 print(res_olsg.summary())
 res_hc0 = res_olsg.get_robustcov_results('HC1')
 print('\n\n')
 print(res_hc0.summary())
 print('\n\n')
-res_hac4 = res_olsg.get_robustcov_results('HAC', maxlags=4, use_correction=True
-    )
+res_hac4 = res_olsg.get_robustcov_results('HAC', maxlags=4, use_correction=True)
 print(res_hac4.summary())
+
+
 print('\n\n')
 tt = res_hac4.t_test(np.eye(len(res_hac4.params)))
 print(tt.summary())
 print('\n\n')
 print(tt.summary_frame())
+
 res_hac4.use_t = False
+
 print('\n\n')
 tt = res_hac4.t_test(np.eye(len(res_hac4.params)))
 print(tt.summary())
 print('\n\n')
 print(tt.summary_frame())
+
 print(vars(res_hac4.f_test(np.eye(len(res_hac4.params))[:-1])))
+
 print(vars(res_hac4.wald_test(np.eye(len(res_hac4.params))[:-1], use_f=True)))
 print(vars(res_hac4.wald_test(np.eye(len(res_hac4.params))[:-1], use_f=False)))
+
+# new cov_type can be set in fit method of model
+
 mod_olsg = OLS(g_inv, exogg)
-res_hac4b = mod_olsg.fit(cov_type='HAC', cov_kwds=dict(maxlags=4,
-    use_correction=True))
+res_hac4b = mod_olsg.fit(cov_type='HAC',
+                         cov_kwds=dict(maxlags=4, use_correction=True))
 print(res_hac4b.summary())
+
 res_hc1b = mod_olsg.fit(cov_type='HC1')
 print(res_hc1b.summary())
-res_hc1c = mod_olsg.fit(cov_type='HC1', cov_kwds={'use_t': True})
+
+# force t-distribution
+res_hc1c = mod_olsg.fit(cov_type='HC1', cov_kwds={'use_t':True})
 print(res_hc1c.summary())
-decade = (d2['year'][1:] // 10).astype(int)
-res_clu = mod_olsg.fit(cov_type='cluster', cov_kwds={'groups': decade,
-    'use_t': True})
+
+# force t-distribution
+decade = (d2['year'][1:] // 10).astype(int)  # just make up a group variable
+res_clu = mod_olsg.fit(cov_type='cluster',
+                       cov_kwds={'groups':decade, 'use_t':True})
 print(res_clu.summary())
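
For reference, the cov_type/cov_kwds pattern used above carries over unchanged to other data; a small self-contained sketch on simulated heteroscedastic data (names and seed are illustrative only):

    import numpy as np
    import statsmodels.api as sm

    np.random.seed(12345)
    x = sm.add_constant(np.random.randn(200, 2))
    y = x @ np.array([1.0, 0.5, -0.5]) + np.random.randn(200) * (1 + np.abs(x[:, 1]))

    mod = sm.OLS(y, x)
    res_nonrobust = mod.fit()                                   # classical OLS se
    res_hc1 = mod.fit(cov_type='HC1')                           # heteroscedasticity robust
    res_hac = mod.fit(cov_type='HAC', cov_kwds={'maxlags': 4})  # HAC / Newey-West
    for res in (res_nonrobust, res_hc1, res_hac):
        print(res.cov_type, res.bse)
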
diff --git a/statsmodels/examples/ex_ordered_model.py b/statsmodels/examples/ex_ordered_model.py
index 39802f0fc..a03f25a48 100644
--- a/statsmodels/examples/ex_ordered_model.py
+++ b/statsmodels/examples/ex_ordered_model.py
@@ -1,57 +1,97 @@
+# -*- coding: utf-8 -*-
 """
 Created on Mon Aug 24 11:17:06 2015

 Author: Josef Perktold
 License: BSD-3
 """
+
 import numpy as np
 from scipy import stats
 import pandas
+
 from statsmodels.miscmodels.ordinal_model import OrderedModel
+
 nobs, k_vars = 1000, 3
 x = np.random.randn(nobs, k_vars)
+# x = np.column_stack((np.ones(nobs), x))
+# #constant will be in integration limits
 xb = x.dot(np.ones(k_vars))
 y_latent = xb + np.random.randn(nobs)
 y = np.round(np.clip(y_latent, -2.4, 2.4)).astype(int) + 2
+
 print(np.unique(y))
 print(np.bincount(y))
+
 mod = OrderedModel(y, x)
+# start_params = np.ones(k_vars + 4)
+# start_params = np.concatenate((np.ones(k_vars), np.arange(4)))
 start_ppf = stats.norm.ppf((np.bincount(y) / len(y)).cumsum())
-start_threshold = np.concatenate((start_ppf[:1], np.log(np.diff(start_ppf[:
-    -1]))))
+start_threshold = np.concatenate((start_ppf[:1],
+                                  np.log(np.diff(start_ppf[:-1]))))
 start_params = np.concatenate((np.zeros(k_vars), start_threshold))
 res = mod.fit(start_params=start_params, maxiter=5000, maxfun=5000)
 print(res.params)
+# res = mod.fit(start_params=res.params, method='bfgs')
 res = mod.fit(start_params=start_params, method='bfgs')
+
 print(res.params)
 print(np.exp(res.params[-(mod.k_levels - 1):]).cumsum())
+# print(res.summary())
+
 predicted = res.model.predict(res.params)
 pred_choice = predicted.argmax(1)
 print('Fraction of correct choice predictions')
 print((y == pred_choice).mean())
-print("""
-comparing bincount""")
+
+print('\ncomparing bincount')
 print(np.bincount(res.model.predict(res.params).argmax(1)))
 print(np.bincount(res.model.endog))
+
 res_log = OrderedModel(y, x, distr='logit').fit(method='bfgs')
 pred_choice_log = res_log.predict().argmax(1)
 print((y == pred_choice_log).mean())
 print(res_log.summary())
-dataf = pandas.read_stata('M:\\josef_new\\scripts\\ologit_ucla.dta')
-res_log2 = OrderedModel(np.asarray(dataf['apply']), np.asarray(dataf[[
-    'pared', 'public', 'gpa']], float), distr='logit').fit(method='bfgs')
-res_log3 = OrderedModel(dataf['apply'].values.codes, np.asarray(dataf[[
-    'pared', 'public', 'gpa']], float), distr='logit').fit(method='bfgs')
+
+# example from UCLA Stats pages
+# http://www.ats.ucla.edu/stat/stata/dae/ologit.htm
+# requires downloaded dataset ologit.dta
+
+dataf = pandas.read_stata(r"M:\josef_new\scripts\ologit_ucla.dta")
+
+# this works but sorts category levels alphabetically
+res_log2 = OrderedModel(np.asarray(dataf['apply']),
+                        np.asarray(dataf[['pared', 'public', 'gpa']], float),
+                        distr='logit').fit(method='bfgs')
+
+# this replicates the UCLA example except
+# for different parameterization of par2
+res_log3 = OrderedModel(dataf['apply'].values.codes,
+                        np.asarray(dataf[['pared', 'public', 'gpa']], float),
+                        distr='logit').fit(method='bfgs')
+
 print(res_log3.summary())
-print(OrderedModel(dataf['apply'].values.codes, np.asarray(dataf[['pared',
-    'public', 'gpa']], float), distr='probit').fit(method='bfgs').summary())
+
+# with ordered probit - not on UCLA page
+print(
+    OrderedModel(dataf['apply'].values.codes,
+                 np.asarray(dataf[['pared', 'public', 'gpa']], float),
+                 distr='probit').fit(method='bfgs').summary())


+# example with a custom distribution - not on UCLA page
+# definition of the SciPy dist
 class CLogLog(stats.rv_continuous):
-    pass
+    def _ppf(self, q):
+        return np.log(-np.log(1 - q))
+
+    def _cdf(self, x):
+        return 1 - np.exp(-np.exp(x))


 cloglog = CLogLog()
-res_cloglog = OrderedModel(dataf['apply'], dataf[['pared', 'public', 'gpa']
-    ], distr=cloglog).fit(method='bfgs', disp=False)
+
+res_cloglog = OrderedModel(dataf['apply'],
+                           dataf[['pared', 'public', 'gpa']],
+                           distr=cloglog).fit(method='bfgs', disp=False)
 print(res_cloglog.summary())
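
One point worth noting about the predictions used above: OrderedModel returns one probability column per category, so rows sum to one and argmax gives the modal category. A small sketch on the same kind of simulated data (seed and sizes are arbitrary):

    import numpy as np
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    np.random.seed(0)
    x = np.random.randn(500, 3)
    y_latent = x.sum(1) + np.random.randn(500)
    y = np.round(np.clip(y_latent, -2.4, 2.4)).astype(int) + 2   # integer codes 0..4

    res = OrderedModel(y, x, distr='logit').fit(method='bfgs', disp=False)
    probs = res.predict()                 # shape (nobs, k_levels)
    print(np.allclose(probs.sum(1), 1))   # each row is a probability distribution
    print((probs.argmax(1) == y).mean())  # fraction of correct modal predictions
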
diff --git a/statsmodels/examples/ex_outliers_influence.py b/statsmodels/examples/ex_outliers_influence.py
index e15469be2..9d2ced8b8 100644
--- a/statsmodels/examples/ex_outliers_influence.py
+++ b/statsmodels/examples/ex_outliers_influence.py
@@ -1,9 +1,15 @@
+
 import numpy as np
+
 import statsmodels.stats.outliers_influence as oi
+
+
 if __name__ == '__main__':
+
     import statsmodels.api as sm
-    data = np.array(
-        """    64 57  8
+
+    data = np.array('''\
+    64 57  8
     71 59 10
     53 49  6
     67 62 11
@@ -14,62 +20,90 @@ if __name__ == '__main__':
     56 42 10
     51 42  6
     76 61 12
-    68 57  9"""
-        .split(), float).reshape(-1, 3)
+    68 57  9'''.split(), float).reshape(-1,3)
     varnames = 'weight height age'.split()
-    endog = data[:, 0]
-    exog = sm.add_constant(data[:, 2])
+
+    endog = data[:,0]
+    exog = sm.add_constant(data[:,2])
+
+
     res_ols = sm.OLS(endog, exog).fit()
+
     hh = (res_ols.model.exog * res_ols.model.pinv_wexog.T).sum(1)
     x = res_ols.model.exog
-    hh_check = np.diag(np.dot(x, np.dot(res_ols.model.normalized_cov_params,
-        x.T)))
+    hh_check = np.diag(np.dot(x, np.dot(res_ols.model.normalized_cov_params, x.T)))
+
     from numpy.testing import assert_almost_equal
     assert_almost_equal(hh, hh_check, decimal=13)
-    res = res_ols
-    resid_press = res.resid / (1 - hh)
+
+    res = res_ols #alias
+
+    #http://en.wikipedia.org/wiki/PRESS_statistic
+    #predicted residuals, leave one out predicted residuals
+    resid_press = res.resid / (1-hh)
     ess_press = np.dot(resid_press, resid_press)
-    sigma2_est = np.sqrt(res.mse_resid)
+
+    sigma2_est = res.mse_resid  #error variance; can be replaced by different estimators of sigma
     sigma_est = np.sqrt(sigma2_est)
     resid_studentized = res.resid / sigma_est / np.sqrt(1 - hh)
+    #http://en.wikipedia.org/wiki/DFFITS:
     dffits = resid_studentized * np.sqrt(hh / (1 - hh))
+
     nobs, k_vars = res.model.exog.shape
-    dffits_threshold = 2 * np.sqrt(k_vars / nobs)
+    #Belsley, Kuh and Welsch (1980) suggest a threshold for abs(DFFITS)
+    dffits_threshold = 2 * np.sqrt(k_vars/nobs)
+
     res_ols.df_modelwc = res_ols.df_model + 1
     n_params = res.model.exog.shape[1]
-    cooks_d = res.resid ** 2 / sigma2_est / res_ols.df_modelwc * hh / (1 - hh
-        ) ** 2
-    cooks_d2 = resid_studentized ** 2 / res_ols.df_modelwc * hh / (1 - hh)
+    #http://en.wikipedia.org/wiki/Cook%27s_distance
+    cooks_d = res.resid**2 / sigma2_est / res_ols.df_modelwc * hh / (1 - hh)**2
+    #or
+    #Eubank p.93, 94
+    cooks_d2 = resid_studentized**2 / res_ols.df_modelwc * hh / (1 - hh)
+    #threshold if normal, also Wikipedia
     from scipy import stats
     alpha = 0.1
-    print(stats.f.isf(1 - alpha, n_params, res.df_resid))
+    #df looks wrong
+    print(stats.f.isf(1-alpha, n_params, res.df_resid))
     print(stats.f.sf(cooks_d, n_params, res.df_resid))
+
+
     print('Cooks Distance')
     print(cooks_d)
     print(cooks_d2)
+
     doplot = 0
     if doplot:
         import matplotlib.pyplot as plt
         fig = plt.figure()
-        ax = fig.add_subplot(3, 1, 2)
+#        ax = fig.add_subplot(3,1,1)
+#        plt.plot(andrew_results.weights, 'o', label='rlm weights')
+#        plt.legend(loc='lower left')
+        ax = fig.add_subplot(3,1,2)
         plt.plot(cooks_d, 'o', label="Cook's distance")
         plt.legend(loc='upper left')
-        ax2 = fig.add_subplot(3, 1, 3)
+        ax2 = fig.add_subplot(3,1,3)
         plt.plot(resid_studentized, 'o', label='studentized_resid')
         plt.plot(dffits, 'o', label='DFFITS')
         leg = plt.legend(loc='lower left', fancybox=True)
-        leg.get_frame().set_alpha(0.5)
-        ltext = leg.get_texts()
-        plt.setp(ltext, fontsize='small')
+        leg.get_frame().set_alpha(0.5) #, fontsize='small')
+        ltext = leg.get_texts() # all the text.Text instances in the legend
+        plt.setp(ltext, fontsize='small') # the legend text fontsize
+
+
     print(oi.reset_ramsey(res, degree=3))
+
+    #note, constant in last column
     for i in range(1):
         print(oi.variance_inflation_factor(res.model.exog, i))
+
     infl = oi.OLSInfluence(res_ols)
     print(infl.resid_studentized_external)
     print(infl.resid_studentized_internal)
     print(infl.summary_table())
     print(oi.summary_table(res, alpha=0.05)[0])
-"""
+
+'''
 >>> res.resid
 array([  4.28571429,   4.        ,   0.57142857,  -3.64285714,
         -4.71428571,   1.92857143,  10.        ,  -6.35714286,
@@ -84,4 +118,4 @@ array([  4.76635514,   4.53333333,   0.8       ,  -4.56315789,
        -12.46666667,  -2.        ,   2.58227848,   5.06880734])
 >>> infl.ess_press
 465.98646628086374
-"""
+'''
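
The hand-rolled leverage, PRESS and Cook's distance computed above are all available from OLSInfluence directly, which makes a convenient cross-check; a short sketch on simulated data (the exact numbers are of course different from the weight/height example):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import OLSInfluence

    np.random.seed(0)
    exog = sm.add_constant(np.random.randn(30))
    endog = exog @ np.array([1.0, 2.0]) + np.random.randn(30)
    res = sm.OLS(endog, exog).fit()
    infl = OLSInfluence(res)

    hh = (res.model.exog * res.model.pinv_wexog.T).sum(1)   # leverage as in the example above
    print(np.allclose(hh, infl.hat_matrix_diag))            # same quantity
    cooks_d, pvals = infl.cooks_distance                    # should match the cooks_d2 formula
    print(cooks_d[:5])
    print(infl.ess_press)                                   # PRESS statistic
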
diff --git a/statsmodels/examples/ex_pandas.py b/statsmodels/examples/ex_pandas.py
index aacd5ea16..09190fbab 100644
--- a/statsmodels/examples/ex_pandas.py
+++ b/statsmodels/examples/ex_pandas.py
@@ -1,26 +1,41 @@
+# -*- coding: utf-8 -*-
 """Examples using Pandas

 """
+
+
 from statsmodels.compat.pandas import frequencies
+
 from datetime import datetime
+
 import matplotlib as mpl
 import matplotlib.pyplot as plt
 import numpy as np
 from pandas import DataFrame, Series
+
 import statsmodels.api as sm
 import statsmodels.tsa.api as tsa
 from statsmodels.tsa.arima_process import arma_generate_sample
+
 data = sm.datasets.stackloss.load()
 X = DataFrame(data.exog, columns=data.exog_name)
-X['intercept'] = 1.0
+X['intercept'] = 1.
 Y = Series(data.endog)
+
+#Example: OLS
 model = sm.OLS(Y, X)
 results = model.fit()
 print(results.summary())
+
 print(results.params)
 print(results.cov_params())
+
 infl = results.get_influence()
 print(infl.summary_table())
+
+#raise
+
+#Example RLM
 huber_t = sm.RLM(Y, X, M=sm.robust.norms.HuberT())
 hub_results = huber_t.fit()
 print(hub_results.params)
@@ -31,42 +46,86 @@ print(hub_results.summary())
 def plot_acf_multiple(ys, lags=20):
     """
     """
-    pass
+    from statsmodels.tsa.stattools import acf
+
+    # hack
+    old_size = mpl.rcParams['font.size']
+    mpl.rcParams['font.size'] = 8
+
+    plt.figure(figsize=(10, 10))
+    xs = np.arange(lags + 1)
+
+    acorr = np.apply_along_axis(lambda x: acf(x, nlags=lags), 0, ys)
+
+    k = acorr.shape[1]
+    for i in range(k):
+        ax = plt.subplot(k, 1, i + 1)
+        ax.vlines(xs, [0], acorr[:, i])
+
+        ax.axhline(0, color='k')
+        ax.set_ylim([-1, 1])

+        # hack?
+        ax.set_xlim([-1, xs[-1] + 1])
+
+    mpl.rcParams['font.size'] = old_size
+
+#Example TSA descriptive

 data = sm.datasets.macrodata.load()
 mdata = data.data
 df = DataFrame.from_records(mdata)
 quarter_end = frequencies.BQuarterEnd()
-df.index = [quarter_end.rollforward(datetime(int(y), int(q) * 3, 1)) for y,
-    q in zip(df.pop('year'), df.pop('quarter'))]
+df.index = [quarter_end.rollforward(datetime(int(y), int(q) * 3, 1))
+            for y, q in zip(df.pop('year'), df.pop('quarter'))]
 logged = np.log(df.loc[:, ['m1', 'realgdp', 'cpi']])
 logged.plot(subplots=True)
+
 log_difference = logged.diff().dropna()
 plot_acf_multiple(log_difference.values)
+
+#Example TSA VAR
+
 model = tsa.VAR(log_difference, freq='BQ')
 print(model.select_order())
+
 res = model.fit(2)
 print(res.summary())
 print(res.is_stable())
+
 irf = res.irf(20)
 irf.plot()
+
 fevd = res.fevd()
 fevd.plot()
+
 print(res.test_whiteness())
 print(res.test_causality('m1', 'realgdp'))
 print(res.test_normality())
-arparams = np.array([0.75, -0.25])
-maparams = np.array([0.65, 0.35])
+
+
+#Example TSA ARMA
+
+
+# Generate some data from an ARMA process
+arparams = np.array([.75, -.25])
+maparams = np.array([.65, .35])
+# The conventions of the arma_generate function require that we specify a
+# 1 for the zero-lag of the AR and MA parameters and that the AR parameters
+# be negated.
 arparams = np.r_[1, -arparams]
 maparam = np.r_[1, maparams]
 nobs = 250
 y = arma_generate_sample(arparams, maparams, nobs)
 plt.figure()
 plt.plot(y)
+
+# Now, optionally, we can add some dates information. For this example,
+# we'll use a pandas time series.
 dates = sm.tsa.datetools.dates_from_range('1980m1', length=nobs)
 y = Series(y, index=dates)
 arma_mod = sm.tsa.ARMA(y, order=(2, 2), freq='M')
 arma_res = arma_mod.fit(trend='nc', disp=-1)
 print(arma_res.params)
+
 plt.show()
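
One caveat: sm.tsa.ARMA at the end of this example has been removed in recent statsmodels releases; the current replacement is the ARIMA class with d=0. A rough equivalent of the last block, as a sketch (same ARMA(2, 2) parameters, no dates for brevity):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.tsa.arima_process import arma_generate_sample

    np.random.seed(12345)
    ar = np.r_[1, -np.array([0.75, -0.25])]   # lag-polynomial convention: leading 1, negated AR
    ma = np.r_[1, np.array([0.65, 0.35])]
    y = arma_generate_sample(ar, ma, 250)

    res = ARIMA(y, order=(2, 0, 2), trend='n').fit()   # ARMA(2, 2) == ARIMA(2, 0, 2), no trend
    print(res.params)
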
diff --git a/statsmodels/examples/ex_pareto_plot.py b/statsmodels/examples/ex_pareto_plot.py
index 729579c3b..6a35b66ac 100644
--- a/statsmodels/examples/ex_pareto_plot.py
+++ b/statsmodels/examples/ex_pareto_plot.py
@@ -1,22 +1,34 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sun Aug 01 19:20:16 2010

 Author: josef-pktd
 """
+
+
 import numpy as np
 from scipy import stats
 import matplotlib.pyplot as plt
+
 nobs = 1000
 r = stats.pareto.rvs(1, size=nobs)
-rhisto, e = np.histogram(np.clip(r, 0, 1000), bins=50)
+
+#rhisto = np.histogram(r, bins=20)
+rhisto, e = np.histogram(np.clip(r, 0 , 1000), bins=50)
 plt.figure()
-plt.loglog(e[:-1] + np.diff(e) / 2, rhisto, '-o')
+plt.loglog(e[:-1]+np.diff(e)/2, rhisto, '-o')
 plt.figure()
-plt.loglog(e[:-1] + np.diff(e) / 2, nobs - rhisto.cumsum(), '-o')
+plt.loglog(e[:-1]+np.diff(e)/2, nobs-rhisto.cumsum(), '-o')
+##plt.figure()
+##plt.plot(e[:-1]+np.diff(e)/2, rhisto.cumsum(), '-o')
+##plt.figure()
+##plt.semilogx(e[:-1]+np.diff(e)/2, nobs-rhisto.cumsum(), '-o')
+
 rsind = np.argsort(r)
 rs = r[rsind]
-rsf = nobs - rsind.argsort()
+rsf = nobs-rsind.argsort()
 plt.figure()
-plt.loglog(rs, nobs - np.arange(nobs), '-o')
-print(stats.linregress(np.log(rs), np.log(nobs - np.arange(nobs))))
+plt.loglog(rs, nobs-np.arange(nobs), '-o')
+print(stats.linregress(np.log(rs), np.log(nobs-np.arange(nobs))))
+
 plt.show()
diff --git a/statsmodels/examples/ex_predict_results.py b/statsmodels/examples/ex_predict_results.py
index c56a81c15..6d18d5027 100644
--- a/statsmodels/examples/ex_predict_results.py
+++ b/statsmodels/examples/ex_predict_results.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sat Dec 20 12:01:13 2014

@@ -5,61 +6,88 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from numpy.testing import assert_allclose
 from statsmodels.regression.linear_model import WLS
+
 from statsmodels.tools.tools import add_constant
 from statsmodels.sandbox.regression.predstd import wls_prediction_std
 from statsmodels.regression._prediction import get_prediction
 from statsmodels.genmod._prediction import params_transform_univariate
 from statsmodels.genmod.generalized_linear_model import GLM
 from statsmodels.genmod.families import links
+
+
+# from example wls.py
+
 nsample = 50
 x = np.linspace(0, 20, nsample)
-X = np.column_stack((x, (x - 5) ** 2))
+X = np.column_stack((x, (x - 5)**2))
 X = add_constant(X)
-beta = [5.0, 0.5, -0.01]
+beta = [5., 0.5, -0.01]
 sig = 0.5
 w = np.ones(nsample)
-w[nsample * 6 / 10:] = 3
+w[nsample * 6 // 10:] = 3  # integer index needed under Python 3
 y_true = np.dot(X, beta)
 e = np.random.normal(size=nsample)
 y = y_true + sig * w * e
-X = X[:, [0, 1]]
-mod_wls = WLS(y, X, weights=1.0 / w)
+X = X[:,[0,1]]
+
+
+# ### WLS knowing the true variance ratio of heteroscedasticity
+
+mod_wls = WLS(y, X, weights=1./w)
 res_wls = mod_wls.fit()
+
+
+
 prstd, iv_l, iv_u = wls_prediction_std(res_wls)
 pred_res = get_prediction(res_wls)
 ci = pred_res.conf_int(obs=True)
+
 assert_allclose(pred_res.se_obs, prstd, rtol=1e-13)
 assert_allclose(ci, np.column_stack((iv_l, iv_u)), rtol=1e-13)
+
 print(pred_res.summary_frame().head())
+
 pred_res2 = res_wls.get_prediction()
 ci2 = pred_res2.conf_int(obs=True)
+
 assert_allclose(pred_res2.se_obs, prstd, rtol=1e-13)
 assert_allclose(ci2, np.column_stack((iv_l, iv_u)), rtol=1e-13)
+
 print(pred_res2.summary_frame().head())
+
 res_wls_n = mod_wls.fit(use_t=False)
 pred_wls_n = res_wls_n.get_prediction()
 print(pred_wls_n.summary_frame().head())
+
+
 w_sqrt = np.sqrt(w)
-mod_glm = GLM(y / w_sqrt, X / w_sqrt[:, None])
+mod_glm = GLM(y/w_sqrt, X/w_sqrt[:,None])
 res_glm = mod_glm.fit()
 pred_glm = res_glm.get_prediction()
 print(pred_glm.summary_frame().head())
+
 res_glm_t = mod_glm.fit(use_t=True)
 pred_glm_t = res_glm_t.get_prediction()
 print(pred_glm_t.summary_frame().head())
+
 rates = params_transform_univariate(res_glm.params, res_glm.cov_params())
-print("""
-Rates exp(params)""")
+print('\nRates exp(params)')
 print(rates.summary_frame())
-rates2 = np.column_stack((np.exp(res_glm.params), res_glm.bse * np.exp(
-    res_glm.params), np.exp(res_glm.conf_int())))
+
+rates2 = np.column_stack((np.exp(res_glm.params),
+                          res_glm.bse * np.exp(res_glm.params),
+                          np.exp(res_glm.conf_int())))
 assert_allclose(rates.summary_frame().values, rates2, rtol=1e-13)
-pt = params_transform_univariate(res_glm.params, res_glm.cov_params(), link
-    =links.Identity())
+
+
+# with identity transform
+pt = params_transform_univariate(res_glm.params, res_glm.cov_params(), link=links.Identity())
 print(pt.tvalues)
+
 assert_allclose(pt.tvalues, res_glm.tvalues, rtol=1e-13)
 assert_allclose(pt.se_mean, res_glm.bse, rtol=1e-13)
 ptt = pt.t_test()
diff --git a/statsmodels/examples/ex_proportion.py b/statsmodels/examples/ex_proportion.py
index e14f7553c..956133456 100644
--- a/statsmodels/examples/ex_proportion.py
+++ b/statsmodels/examples/ex_proportion.py
@@ -1,16 +1,24 @@
+# -*- coding: utf-8 -*-
 """

 Created on Sun Apr 21 07:59:26 2013

 Author: Josef Perktold
 """
+
 from statsmodels.compat.python import lmap
 import numpy as np
 import matplotlib.pyplot as plt
+
 import statsmodels.stats.proportion as sms
 import statsmodels.stats.weightstats as smw
+
 from numpy.testing import assert_almost_equal
-ss = """1 blue  fair   23  1 blue  red     7  1 blue  medium 24
+
+
+# Region, Eyes, Hair, Count
+ss = '''\
+1 blue  fair   23  1 blue  red     7  1 blue  medium 24
 1 blue  dark   11  1 green fair   19  1 green red     7
 1 green medium 18  1 green dark   14  1 brown fair   34
 1 brown red     5  1 brown medium 41  1 brown dark   40
@@ -18,16 +26,23 @@ ss = """1 blue  fair   23  1 blue  red     7  1 blue  medium 24
 2 blue  medium 44  2 blue  dark   40  2 blue  black   6
 2 green fair   50  2 green red    31  2 green medium 37
 2 green dark   23  2 brown fair   56  2 brown red    42
-2 brown medium 53  2 brown dark   54  2 brown black  13"""
-dta0 = np.array(ss.split()).reshape(-1, 4)
-dta = np.array(lmap(tuple, dta0.tolist()), dtype=[('Region', int), ('Eyes',
-    'S6'), ('Hair', 'S6'), ('Count', int)])
-xfair = np.repeat([1, 0], [228, 762 - 228])
+2 brown medium 53  2 brown dark   54  2 brown black  13'''
+
+dta0 = np.array(ss.split()).reshape(-1,4)
+dta = np.array(lmap(tuple, dta0.tolist()), dtype=[('Region', int), ('Eyes', 'S6'), ('Hair', 'S6'), ('Count', int)])
+
+xfair = np.repeat([1,0], [228, 762-228])
+
+# comparing to SAS last output at
+# http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_sect028.htm
+# confidence interval for tost
 ci01 = smw.confint_ztest(xfair, alpha=0.1)
-assert_almost_equal(ci01, [0.2719, 0.3265], 4)
+assert_almost_equal(ci01,  [0.2719, 0.3265], 4)
 res = smw.ztost(xfair, 0.18, 0.38)
+
 assert_almost_equal(res[1][0], 7.1865, 4)
 assert_almost_equal(res[2][0], -4.8701, 4)
+
 nn = np.arange(200, 351)
 pow_z = sms.power_ztost_prop(0.5, 0.72, nn, 0.6, alpha=0.05)
 pow_bin = sms.power_ztost_prop(0.5, 0.72, nn, 0.6, alpha=0.05, dist='binom')
@@ -37,4 +52,5 @@ plt.legend(loc='lower right')
 plt.title('Proportion Equivalence Test: Power as function of sample size')
 plt.xlabel('Number of Observations')
 plt.ylabel('Power')
+
 plt.show()
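
The z-interval checked against the SAS reference above can also be obtained from the raw counts with proportion_confint; a small cross-check sketch (228 successes out of 762 observations, 90% level):

    from statsmodels.stats.proportion import proportion_confint

    low, upp = proportion_confint(228, 762, alpha=0.1, method='normal')
    print(low, upp)   # approximately (0.2719, 0.3265), in line with the interval above
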
diff --git a/statsmodels/examples/ex_regressionplots.py b/statsmodels/examples/ex_regressionplots.py
index 2a920665d..64df3fc47 100644
--- a/statsmodels/examples/ex_regressionplots.py
+++ b/statsmodels/examples/ex_regressionplots.py
@@ -1,95 +1,141 @@
+# -*- coding: utf-8 -*-
 """Examples for Regression Plots

 Author: Josef Perktold

 """
+
 import numpy as np
 import statsmodels.api as sm
 import matplotlib.pyplot as plt
+
 from statsmodels.sandbox.regression.predstd import wls_prediction_std
 import statsmodels.graphics.regressionplots as smrp
 from statsmodels.graphics.tests.test_regressionplots import TestPlot
+
+#example from tut.ols with changes
+#fix a seed for these examples
 np.random.seed(9876789)
+
+# OLS non-linear curve but linear in parameters
+# ---------------------------------------------
+
 nsample = 100
 sig = 0.5
 x1 = np.linspace(0, 20, nsample)
-x2 = 5 + 3 * np.random.randn(nsample)
-X = np.c_[x1, x2, np.sin(0.5 * x1), (x2 - 5) ** 2, np.ones(nsample)]
-beta = [0.5, 0.5, 1, -0.04, 5.0]
+x2 = 5 + 3* np.random.randn(nsample)
+X = np.c_[x1, x2, np.sin(0.5*x1), (x2-5)**2, np.ones(nsample)]
+beta = [0.5, 0.5, 1, -0.04, 5.]
 y_true = np.dot(X, beta)
 y = y_true + sig * np.random.normal(size=nsample)
+
+#estimate only linear function, misspecified because of non-linear terms
 exog0 = sm.add_constant(np.c_[x1, x2], prepend=False)
+
+#    plt.figure()
+#    plt.plot(x1, y, 'o', x1, y_true, 'b-')
+
 res = sm.OLS(y, exog0).fit()
-plot_old = 0
+#print res.params
+#print res.bse
+
+
+plot_old = 0 #True
 if plot_old:
+
+    #current bug predict requires call to model.results
+    #print res.model.predict
     prstd, iv_l, iv_u = wls_prediction_std(res)
     plt.plot(x1, res.fittedvalues, 'r-o')
     plt.plot(x1, iv_u, 'r--')
     plt.plot(x1, iv_l, 'r--')
     plt.title('blue: true,   red: OLS')
+
     plt.figure()
     plt.plot(res.resid, 'o')
     plt.title('Residuals')
+
     fig2 = plt.figure()
-    ax = fig2.add_subplot(2, 1, 1)
+    ax = fig2.add_subplot(2,1,1)
+    #namestr = ' for %s' % self.name if self.name else ''
     plt.plot(x1, res.resid, 'o')
-    ax.set_title('residuals versus exog')
-    ax = fig2.add_subplot(2, 1, 2)
+    ax.set_title('residuals versus exog')# + namestr)
+    ax = fig2.add_subplot(2,1,2)
     plt.plot(x2, res.resid, 'o')
+
     fig3 = plt.figure()
-    ax = fig3.add_subplot(2, 1, 1)
+    ax = fig3.add_subplot(2,1,1)
+    #namestr = ' for %s' % self.name if self.name else ''
     plt.plot(x1, res.fittedvalues, 'o')
-    ax.set_title('Fitted values versus exog')
-    ax = fig3.add_subplot(2, 1, 2)
+    ax.set_title('Fitted values versus exog')# + namestr)
+    ax = fig3.add_subplot(2,1,2)
     plt.plot(x2, res.fittedvalues, 'o')
+
     fig4 = plt.figure()
-    ax = fig4.add_subplot(2, 1, 1)
+    ax = fig4.add_subplot(2,1,1)
+    #namestr = ' for %s' % self.name if self.name else ''
     plt.plot(x1, res.fittedvalues + res.resid, 'o')
-    ax.set_title('Fitted values plus residuals versus exog')
-    ax = fig4.add_subplot(2, 1, 2)
+    ax.set_title('Fitted values plus residuals versus exog')# + namestr)
+    ax = fig4.add_subplot(2,1,2)
     plt.plot(x2, res.fittedvalues + res.resid, 'o')
+
+    # see http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/partregr.htm
     fig5 = plt.figure()
-    ax = fig5.add_subplot(2, 1, 1)
-    res1a = sm.OLS(y, exog0[:, [0, 2]]).fit()
-    res1b = sm.OLS(x1, exog0[:, [0, 2]]).fit()
+    ax = fig5.add_subplot(2,1,1)
+    #namestr = ' for %s' % self.name if self.name else ''
+    res1a = sm.OLS(y, exog0[:,[0,2]]).fit()
+    res1b = sm.OLS(x1, exog0[:,[0,2]]).fit()
     plt.plot(res1b.resid, res1a.resid, 'o')
     res1c = sm.OLS(res1a.resid, res1b.resid).fit()
     plt.plot(res1b.resid, res1c.fittedvalues, '-')
-    ax.set_title('Partial Regression plot')
-    ax = fig5.add_subplot(2, 1, 2)
-    res2a = sm.OLS(y, exog0[:, [0, 1]]).fit()
-    res2b = sm.OLS(x2, exog0[:, [0, 1]]).fit()
+    ax.set_title('Partial Regression plot')# + namestr)
+    ax = fig5.add_subplot(2,1,2)
+    #plt.plot(x2, res.fittedvalues + res.resid, 'o')
+    res2a = sm.OLS(y, exog0[:,[0,1]]).fit()
+    res2b = sm.OLS(x2, exog0[:,[0,1]]).fit()
     plt.plot(res2b.resid, res2a.resid, 'o')
     res2c = sm.OLS(res2a.resid, res2b.resid).fit()
     plt.plot(res2b.resid, res2c.fittedvalues, '-')
+
+    # see http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/ccpr.htm
     fig6 = plt.figure()
-    ax = fig6.add_subplot(2, 1, 1)
-    x1beta = x1 * res.params[1]
-    x2beta = x2 * res.params[2]
+    ax = fig6.add_subplot(2,1,1)
+    #namestr = ' for %s' % self.name if self.name else ''
+    x1beta = x1*res.params[1]
+    x2beta = x2*res.params[2]
     plt.plot(x1, x1beta + res.resid, 'o')
     plt.plot(x1, x1beta, '-')
-    ax.set_title('X_i beta_i plus residuals versus exog (CCPR)')
-    ax = fig6.add_subplot(2, 1, 2)
+    ax.set_title('X_i beta_i plus residuals versus exog (CCPR)')# + namestr)
+    ax = fig6.add_subplot(2,1,2)
     plt.plot(x2, x2beta + res.resid, 'o')
     plt.plot(x2, x2beta, '-')
+
+
+    #print res.summary()
+
 doplots = 1
 if doplots:
     fig1 = smrp.plot_fit(res, 0, y_true=None)
     smrp.plot_fit(res, 1, y_true=None)
-    smrp.plot_partregress_grid(res, exog_idx=[0, 1])
+    smrp.plot_partregress_grid(res, exog_idx=[0,1])
     smrp.plot_regress_exog(res, exog_idx=0)
     smrp.plot_ccpr(res, exog_idx=0)
-    smrp.plot_ccpr_grid(res, exog_idx=[0, 1])
+    smrp.plot_ccpr_grid(res, exog_idx=[0,1])
+
 tp = TestPlot()
 tp.test_plot_fit()
-fig1 = smrp.plot_partregress_grid(res, exog_idx=[0, 1])
+
+fig1 = smrp.plot_partregress_grid(res, exog_idx=[0,1])
+#add lowess
 ax = fig1.axes[0]
 y0 = ax.get_lines()[0]._y
 x0 = ax.get_lines()[0]._x
 lres = sm.nonparametric.lowess(y0, x0, frac=0.2)
-ax.plot(lres[:, 0], lres[:, 1], 'r', lw=1.5)
+ax.plot(lres[:,0], lres[:,1], 'r', lw=1.5)
 ax = fig1.axes[1]
 y0 = ax.get_lines()[0]._y
 x0 = ax.get_lines()[0]._x
 lres = sm.nonparametric.lowess(y0, x0, frac=0.2)
-ax.plot(lres[:, 0], lres[:, 1], 'r', lw=1.5)
+ax.plot(lres[:,0], lres[:,1], 'r', lw=1.5)
+
+#plt.show()
diff --git a/statsmodels/examples/ex_rootfinding.py b/statsmodels/examples/ex_rootfinding.py
index 31545de80..3eebf1d02 100644
--- a/statsmodels/examples/ex_rootfinding.py
+++ b/statsmodels/examples/ex_rootfinding.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """

 Created on Sat Mar 23 13:35:51 2013
@@ -6,48 +7,102 @@ Author: Josef Perktold
 """
 import numpy as np
 from statsmodels.tools.rootfinding import brentq_expanding
-DEBUG = False
+
+
+# Warning: module-level global; changing this affects every user of the module
+#import statsmodels.tools.rootfinding as smroots
+#smroots.DEBUG = True
+
+DEBUG = False #True
+
+
+def func(x, a):
+    f = (x - a)**3
+    if DEBUG:
+        print('evaluating at %g, fval = %f' % (x, f))
+    return f
+
+
+def func_nan(x, a, b):
+    x = np.atleast_1d(x)
+    f = (x - 1.*a)**3
+    f[x < b] = np.nan
+    if DEBUG:
+        print('evaluating at %f, fval = %f' % (x, f))
+    return f
+
+
+
+def funcn(x, a):
+    f = -(x - a)**3
+    if DEBUG:
+        print('evaluating at %g, fval = %g' % (x, f))
+    return f
+
+
+def func2(x, a):
+    f = (x - a)**3
+    print('evaluating at %g, fval = %f' % (x, f))
+    return f
+
 if __name__ == '__main__':
     run_all = False
     if run_all:
         print(brentq_expanding(func, args=(0,), increasing=True))
+
         print(brentq_expanding(funcn, args=(0,), increasing=False))
         print(brentq_expanding(funcn, args=(-50,), increasing=False))
+
         print(brentq_expanding(func, args=(20,)))
         print(brentq_expanding(funcn, args=(20,)))
         print(brentq_expanding(func, args=(500000,)))
+
+        # one bound
         print(brentq_expanding(func, args=(500000,), low=10000))
         print(brentq_expanding(func, args=(-50000,), upp=-1000))
+
         print(brentq_expanding(funcn, args=(500000,), low=10000))
         print(brentq_expanding(funcn, args=(-50000,), upp=-1000))
+
+        # both bounds
+        # hits maxiter in brentq if bounds too wide
         print(brentq_expanding(func, args=(500000,), low=300000, upp=700000))
-        print(brentq_expanding(func, args=(-50000,), low=-70000, upp=-1000))
+        print(brentq_expanding(func, args=(-50000,), low= -70000, upp=-1000))
         print(brentq_expanding(funcn, args=(500000,), low=300000, upp=700000))
-        print(brentq_expanding(funcn, args=(-50000,), low=-70000, upp=-10000))
-        print(brentq_expanding(func, args=(1.234e+30,), xtol=10000000000.0,
-            increasing=True, maxiter_bq=200))
+        print(brentq_expanding(funcn, args=(-50000,), low= -70000, upp=-10000))
+
+        print(brentq_expanding(func, args=(1.234e30,), xtol=1e10,
+                               increasing=True, maxiter_bq=200))
+
+
     print(brentq_expanding(func, args=(-50000,), start_low=-10000))
     try:
         print(brentq_expanding(func, args=(-500,), start_upp=-100))
     except ValueError:
         print('raised ValueError start_upp needs to be positive')
-    """ it still works
+
+    ''' it still works
     raise ValueError('start_upp needs to be positive')
     -499.999996336
-    """
-    """ this does not work
+    '''
+    ''' this does not work
     >>> print(brentq_expanding(func, args=(-500,), start_upp=-1000)
     raise ValueError('start_upp needs to be positive')
     OverflowError: (34, 'Result too large')
-    """
+    '''
+
     try:
-        print(brentq_expanding(funcn, args=(-50000,), low=-40000, upp=-10000))
+        print(brentq_expanding(funcn, args=(-50000,), low= -40000, upp=-10000))
     except Exception as e:
         print(e)
+
     val, info = brentq_expanding(func, args=(500,), full_output=True)
     print(val)
     print(vars(info))
-    print(brentq_expanding(func_nan, args=(20, 0), increasing=True))
-    print(brentq_expanding(func_nan, args=(20, 0)))
-    print(brentq_expanding(func_nan, args=(-20, 0), increasing=True))
-    print(brentq_expanding(func_nan, args=(-20, 0)))
+
+    #
+    print(brentq_expanding(func_nan, args=(20,0), increasing=True))
+    print(brentq_expanding(func_nan, args=(20,0)))
+    # In the next call 0 is the minimum; below it the function is nan
+    print(brentq_expanding(func_nan, args=(-20,0), increasing=True))
+    print(brentq_expanding(func_nan, args=(-20,0)))
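
For context, brentq_expanding automates the bracket search that scipy.optimize.brentq otherwise needs up front; a minimal comparison on the same cubic (bracket endpoints chosen arbitrarily):

    from scipy.optimize import brentq
    from statsmodels.tools.rootfinding import brentq_expanding

    def f(x):
        return (x - 20.0) ** 3

    print(brentq(f, -1e4, 1e4))   # needs an explicit bracket containing the root
    print(brentq_expanding(f))    # finds and expands the bracket automatically
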
diff --git a/statsmodels/examples/ex_sandwich.py b/statsmodels/examples/ex_sandwich.py
index 6bb255df5..711ba1af2 100644
--- a/statsmodels/examples/ex_sandwich.py
+++ b/statsmodels/examples/ex_sandwich.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """examples for sandwich estimators of covariance

 Author: Josef Perktold
@@ -6,53 +7,73 @@ Author: Josef Perktold
 from statsmodels.compat.python import lzip
 import numpy as np
 from numpy.testing import assert_almost_equal
+
 import statsmodels.api as sm
+
 import statsmodels.stats.sandwich_covariance as sw
+
+
 nobs = 100
-kvars = 4
-x = np.random.randn(nobs, kvars - 1)
+kvars = 4 #including constant
+x = np.random.randn(nobs, kvars-1)
 exog = sm.add_constant(x)
 params_true = np.ones(kvars)
 y_true = np.dot(exog, params_true)
-sigma = 0.1 + np.exp(exog[:, -1])
+sigma = 0.1 + np.exp(exog[:,-1])
 endog = y_true + sigma * np.random.randn(nobs)
+
 self = sm.OLS(endog, exog).fit()
+
 print(self.HC3_se)
 print(sw.se_cov(sw.cov_hc3(self)))
+#test standalone refactoring
 assert_almost_equal(sw.se_cov(sw.cov_hc0(self)), self.HC0_se, 15)
 assert_almost_equal(sw.se_cov(sw.cov_hc1(self)), self.HC1_se, 15)
 assert_almost_equal(sw.se_cov(sw.cov_hc2(self)), self.HC2_se, 15)
 assert_almost_equal(sw.se_cov(sw.cov_hc3(self)), self.HC3_se, 15)
 print(self.HC0_se)
 print(sw.se_cov(sw.cov_hac_simple(self, nlags=0, use_correction=False)))
+#test White as HAC with nlags=0, same as nlags=1 ?
 bse_hac0 = sw.se_cov(sw.cov_hac_simple(self, nlags=0, use_correction=False))
 assert_almost_equal(bse_hac0, self.HC0_se, 15)
 print(bse_hac0)
+#test White as HAC with nlags=0, same as nlags=1 ?
 bse_hac0c = sw.se_cov(sw.cov_hac_simple(self, nlags=0, use_correction=True))
 assert_almost_equal(bse_hac0c, self.HC1_se, 15)
+
 bse_w = sw.se_cov(sw.cov_white_simple(self, use_correction=False))
 print(bse_w)
+#test White
 assert_almost_equal(bse_w, self.HC0_se, 15)
+
 bse_wc = sw.se_cov(sw.cov_white_simple(self, use_correction=True))
 print(bse_wc)
+#test White
 assert_almost_equal(bse_wc, self.HC1_se, 15)
+
+
 groups = np.repeat(np.arange(5), 20)
+
 idx = np.nonzero(np.diff(groups))[0].tolist()
-groupidx = lzip([0] + idx, idx + [len(groups)])
+groupidx = lzip([0]+idx, idx+[len(groups)])
 ngroups = len(groupidx)
+
 print(sw.se_cov(sw.cov_cluster(self, groups)))
-print(sw.se_cov(sw.cov_cluster(self, np.ones(len(endog), int),
-    use_correction=False)))
+#two strange-looking corner cases, possible BUG?
+print(sw.se_cov(sw.cov_cluster(self, np.ones(len(endog), int), use_correction=False)))
 print(sw.se_cov(sw.cov_crosssection_0(self, np.arange(len(endog)))))
-groups = np.repeat(np.arange(50), 100 // 50)
+#these results are close to simple (no group) white, 50 groups 2 obs each
+groups = np.repeat(np.arange(50), 100//50)
 print(sw.se_cov(sw.cov_cluster(self, groups)))
-groups = np.repeat(np.arange(2), 100 // 2)
+#2 groups with 50 obs each, what was the interpretation again?
+groups = np.repeat(np.arange(2), 100//2)
 print(sw.se_cov(sw.cov_cluster(self, groups)))
-"""http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt"""
-"""
+
+"http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt"
+'''
 test <- read.table(
       url(paste("http://www.kellogg.northwestern.edu/",
             "faculty/petersen/htm/papers/se/",
             "test_data.txt",sep="")),
     col.names=c("firmid", "year", "x", "y"))
-"""
+'''
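
The standalone sandwich functions exercised above are also reachable through the fit interface, which gives an easy consistency check; a sketch with the same kind of generated data (seed arbitrary):

    import numpy as np
    import statsmodels.api as sm
    import statsmodels.stats.sandwich_covariance as sw

    np.random.seed(0)
    nobs = 100
    exog = sm.add_constant(np.random.randn(nobs, 3))
    endog = exog @ np.ones(4) + np.random.randn(nobs)
    groups = np.repeat(np.arange(5), nobs // 5)

    res = sm.OLS(endog, exog).fit()
    bse_manual = sw.se_cov(sw.cov_cluster(res, groups))
    res_clu = sm.OLS(endog, exog).fit(cov_type='cluster', cov_kwds={'groups': groups})
    print(bse_manual)
    print(res_clu.bse)   # should agree; both apply the same small-sample correction by default
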
diff --git a/statsmodels/examples/ex_sandwich2.py b/statsmodels/examples/ex_sandwich2.py
index f4a07c2fa..224bca7ee 100644
--- a/statsmodels/examples/ex_sandwich2.py
+++ b/statsmodels/examples/ex_sandwich2.py
@@ -1,45 +1,73 @@
+# -*- coding: utf-8 -*-
 """Cluster robust standard errors for OLS

 Created on Fri Dec 16 12:52:13 2011
 Author: Josef Perktold
 """
 from urllib.request import urlretrieve
+
 import numpy as np
 from numpy.testing import assert_almost_equal
+
 import statsmodels.api as sm
 import statsmodels.stats.sandwich_covariance as sw
+
+#http://www.ats.ucla.edu/stat/stata/seminars/svy_stata_intro/srs.dta
+
 import statsmodels.iolib.foreign as dta
+
 try:
-    srs = dta.genfromdta('srs.dta')
+    srs = dta.genfromdta("srs.dta")
     print('using local file')
 except IOError:
-    urlretrieve(
-        'http://www.ats.ucla.edu/stat/stata/seminars/svy_stata_intro/srs.dta',
-        'srs.dta')
+    urlretrieve('http://www.ats.ucla.edu/stat/stata/seminars/svy_stata_intro/srs.dta', 'srs.dta')
     print('downloading file')
-    srs = dta.genfromdta('srs.dta')
+    srs = dta.genfromdta("srs.dta")
+#    from statsmodels.datasets import webuse
+#    srs = webuse('srs', 'http://www.ats.ucla.edu/stat/stata/seminars/svy_stata_intro/')
+#    #does currently not cache file
+
 y = srs['api00']
+#older numpy does not reorder
+#x = srs[['growth', 'emer', 'yr_rnd']].view(float).reshape(len(y), -1)
+#force sequence
 x = np.column_stack([srs[ii] for ii in ['growth', 'emer', 'yr_rnd']])
 group = srs['dnum']
-xx = sm.add_constant(x, prepend=False)
-mask = (xx != -999.0).all(1)
+
+#xx = sm.add_constant(x, prepend=True)
+xx = sm.add_constant(x, prepend=False) #const at end for Stata compatibility
+
+#remove nan observation
+mask = (xx!=-999.0).all(1)   #nan code in dta file
 mask.shape
 y = y[mask]
 xx = xx[mask]
 group = group[mask]
+
+#run OLS
+
 res_srs = sm.OLS(y, xx).fit()
 print('params    ', res_srs.params)
 print('bse_OLS   ', res_srs.bse)
+
+#get cluster robust standard errors and compare with STATA
+
 cov_cr = sw.cov_cluster(res_srs, group.astype(int))
 bse_cr = sw.se_cov(cov_cr)
 print('bse_rob   ', bse_cr)
-res_stata = np.rec.array([('growth', '|', -0.1027121, 0.2291703, -0.45, 
-    0.655, -0.5548352, 0.3494111), ('emer', '|', -5.444932, 0.7293969, -
-    7.46, 0.0, -6.883938, -4.005927), ('yr_rnd', '|', -51.07569, 22.83615, 
-    -2.24, 0.027, -96.12844, -6.022935), ('_cons', '|', 740.3981, 13.46076,
-    55.0, 0.0, 713.8418, 766.9544)], dtype=[('exogname', '|S6'), ('del',
-    '|S1'), ('params', 'float'), ('bse', 'float'), ('tvalues', 'float'), (
-    'pvalues', 'float'), ('cilow', 'float'), ('ciupp', 'float')])
+
+res_stata = np.rec.array(
+    [('growth', '|', -0.1027121, 0.22917029999999999, -0.45000000000000001, 0.65500000000000003, -0.55483519999999997, 0.34941109999999997),
+     ('emer', '|', -5.4449319999999997, 0.72939690000000001, -7.46, 0.0, -6.8839379999999997, -4.0059269999999998),
+     ('yr_rnd', '|', -51.075690000000002, 22.83615, -2.2400000000000002, 0.027, -96.128439999999998, -6.0229350000000004),
+     ('_cons', '|', 740.3981, 13.460760000000001, 55.0, 0.0, 713.84180000000003, 766.95439999999996)],
+    dtype=[('exogname', '|S6'), ('del', '|S1'), ('params', 'float'),
+           ('bse', 'float'), ('tvalues', 'float'), ('pvalues', 'float'),
+           ('cilow', 'float'), ('ciupp', 'float')])
+
 print('diff Stata', bse_cr - res_stata.bse)
 assert_almost_equal(bse_cr, res_stata.bse, decimal=6)
-print('reldiff to OLS', bse_cr / res_srs.bse - 1)
+
+#We see that in this case the robust standard errors of the parameter estimates
+#are larger than those of OLS by 8 to 35 %
+print('reldiff to OLS', bse_cr/res_srs.bse - 1)
diff --git a/statsmodels/examples/ex_sandwich3.py b/statsmodels/examples/ex_sandwich3.py
index fff6b4cb1..25cd567f2 100644
--- a/statsmodels/examples/ex_sandwich3.py
+++ b/statsmodels/examples/ex_sandwich3.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Cluster Robust Standard Errors with Two Clusters

 Created on Sat Dec 17 08:39:16 2011
@@ -5,39 +6,58 @@ Created on Sat Dec 17 08:39:16 2011
 Author: Josef Perktold
 """
 from urllib.request import urlretrieve
+
 import numpy as np
 from numpy.testing import assert_almost_equal
+
 import statsmodels.api as sm
+
 import statsmodels.stats.sandwich_covariance as sw
+
+#requires Petersen's test_data
+#http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt
 try:
-    pet = np.genfromtxt('test_data.txt')
+    pet = np.genfromtxt("test_data.txt")
     print('using local file')
 except IOError:
-    urlretrieve(
-        'http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt'
-        , 'test_data.txt')
+    urlretrieve('http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.txt',
+                       'test_data.txt')
     print('downloading file')
-    pet = np.genfromtxt('test_data.txt')
-endog = pet[:, -1]
-group = pet[:, 0].astype(int)
-time = pet[:, 1].astype(int)
-exog = sm.add_constant(pet[:, 2])
+    pet = np.genfromtxt("test_data.txt")
+
+
+endog = pet[:,-1]
+group = pet[:,0].astype(int)
+time = pet[:,1].astype(int)
+exog = sm.add_constant(pet[:,2])
 res = sm.OLS(endog, exog).fit()
+
 cov01, covg, covt = sw.cov_cluster_2groups(res, group, group2=time)
+
+#Reference numbers from Petersen
+#http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.htm
+
 bse_petw = [0.0284, 0.0284]
-bse_pet0 = [0.067, 0.0506]
-bse_pet1 = [0.0234, 0.0334]
-bse_pet01 = [0.0651, 0.0536]
+bse_pet0 = [0.0670, 0.0506]
+bse_pet1 = [0.0234, 0.0334]  #year
+bse_pet01 = [0.0651, 0.0536]  #firm and year
+
 bse_0 = sw.se_cov(covg)
 bse_1 = sw.se_cov(covt)
 bse_01 = sw.se_cov(cov01)
+
 print('OLS            ', res.bse)
 print('het HC0        ', res.HC0_se, bse_petw - res.HC0_se)
 print('het firm       ', bse_0, bse_0 - bse_pet0)
 print('het year       ', bse_1, bse_1 - bse_pet1)
 print('het firm & year', bse_01, bse_01 - bse_pet01)
+
 print('relative difference standard error het firm & year to OLS')
 print('               ', bse_01 / res.bse)
+
+#From the last line we see that the cluster and year robust standard errors
+#are approximately twice those of OLS
+
 assert_almost_equal(bse_petw, res.HC0_se, decimal=4)
 assert_almost_equal(bse_0, bse_pet0, decimal=4)
 assert_almost_equal(bse_1, bse_pet1, decimal=4)
diff --git a/statsmodels/examples/ex_scatter_ellipse.py b/statsmodels/examples/ex_scatter_ellipse.py
index 98f9608ac..1735f6505 100644
--- a/statsmodels/examples/ex_scatter_ellipse.py
+++ b/statsmodels/examples/ex_scatter_ellipse.py
@@ -1,25 +1,35 @@
-"""example for grid of scatter plots with probability ellipses
+'''example for grid of scatter plots with probability ellipses


 Author: Josef Perktold
 License: BSD-3
-"""
+'''
+
+
 from statsmodels.compat.python import lrange
 import numpy as np
 import matplotlib.pyplot as plt
+
 from statsmodels.graphics.plot_grids import scatter_ellipse
+
+
 nvars = 6
-mmean = np.arange(1.0, nvars + 1) / nvars * 1.5
+mmean = np.arange(1.,nvars+1)/nvars * 1.5
 rho = 0.5
+#dcorr = rho*np.ones((nvars, nvars)) + (1-rho)*np.eye(nvars)
 r = np.random.uniform(-0.99, 0.99, size=(nvars, nvars))
-r = (r + r.T) / 2.0
+##from scipy import stats
+##r = stats.rdist.rvs(1, size=(nvars, nvars))
+r = (r + r.T) / 2.
 assert np.allclose(r, r.T)
 mcorr = r
 mcorr[lrange(nvars), lrange(nvars)] = 1
-mstd = np.arange(1.0, nvars + 1) / nvars
+#dcorr = np.array([[1, 0.5, 0.1],[0.5, 1, -0.2], [0.1, -0.2, 1]])
+mstd = np.arange(1.,nvars+1)/nvars
 mcov = mcorr * np.outer(mstd, mstd)
 evals = np.linalg.eigvalsh(mcov)
-assert evals.min > 0
+assert evals.min() > 0 #assert positive definite
+
 nobs = 100
 data = np.random.multivariate_normal(mmean, mcov, size=nobs)
 dmean = data.mean(0)
@@ -29,6 +39,12 @@ print(dcov)
 dcorr = np.corrcoef(data, rowvar=0)
 dcorr[np.triu_indices(nvars)] = 0
 print(dcorr)
-varnames = [('var%d' % i) for i in range(nvars)]
+
+#default
+#fig = scatter_ellipse(data, level=[0.5, 0.75, 0.95])
+#used for checking
+#fig = scatter_ellipse(data, level=[0.5, 0.75, 0.95], add_titles=True, keep_ticks=True)
+#check varnames
+varnames = ['var%d' % i for i in range(nvars)]
 fig = scatter_ellipse(data, level=0.9, varnames=varnames)
 plt.show()
diff --git a/statsmodels/examples/ex_univar_kde.py b/statsmodels/examples/ex_univar_kde.py
index 4ae5ae7d4..35e46f290 100644
--- a/statsmodels/examples/ex_univar_kde.py
+++ b/statsmodels/examples/ex_univar_kde.py
@@ -12,56 +12,72 @@ Produces six different plots for each distribution
 6) Poisson

 """
+
+
 import numpy as np
 import scipy.stats as stats
 import matplotlib.pyplot as plt
 import statsmodels.api as sm
+
 KDEMultivariate = sm.nonparametric.KDEMultivariate
+
+
 np.random.seed(123456)
+
+# Beta distribution
+
+# Parameters
 a = 2
 b = 5
 nobs = 250
+
 support = np.random.beta(a, b, size=nobs)
 rv = stats.beta(a, b)
 ix = np.argsort(support)
-dens_normal = KDEMultivariate(data=[support], var_type='c', bw=
-    'normal_reference')
+
+dens_normal = KDEMultivariate(data=[support], var_type='c', bw='normal_reference')
 dens_cvls = KDEMultivariate(data=[support], var_type='c', bw='cv_ls')
 dens_cvml = KDEMultivariate(data=[support], var_type='c', bw='cv_ml')
+
 plt.figure(1)
 plt.plot(support[ix], rv.pdf(support[ix]), label='Actual')
 plt.plot(support[ix], dens_normal.pdf()[ix], label='Scott')
 plt.plot(support[ix], dens_cvls.pdf()[ix], label='CV_LS')
 plt.plot(support[ix], dens_cvml.pdf()[ix], label='CV_ML')
-plt.title(
-    'Nonparametric Estimation of the Density of Beta Distributed Random Variable'
-    )
+plt.title("Nonparametric Estimation of the Density of Beta Distributed " \
+          "Random Variable")
 plt.legend(('Actual', 'Scott', 'CV_LS', 'CV_ML'))
+
+# f distribution
 df = 100
 dn = 100
 nobs = 250
+
 support = np.random.f(dn, df, size=nobs)
 rv = stats.f(df, dn)
 ix = np.argsort(support)
-dens_normal = KDEMultivariate(data=[support], var_type='c', bw=
-    'normal_reference')
+
+dens_normal = KDEMultivariate(data=[support], var_type='c', bw='normal_reference')
 dens_cvls = KDEMultivariate(data=[support], var_type='c', bw='cv_ls')
 dens_cvml = KDEMultivariate(data=[support], var_type='c', bw='cv_ml')
+
 plt.figure(2)
 plt.plot(support[ix], rv.pdf(support[ix]), label='Actual')
 plt.plot(support[ix], dens_normal.pdf()[ix], label='Scott')
 plt.plot(support[ix], dens_cvls.pdf()[ix], label='CV_LS')
 plt.plot(support[ix], dens_cvml.pdf()[ix], label='CV_ML')
-plt.title(
-    'Nonparametric Estimation of the Density of f Distributed Random Variable')
+plt.title("Nonparametric Estimation of the Density of f Distributed " \
+          "Random Variable")
 plt.legend(('Actual', 'Scott', 'CV_LS', 'CV_ML'))
+
+# Pareto distribution
 a = 2
 nobs = 150
 support = np.random.pareto(a, size=nobs)
 rv = stats.pareto(a)
 ix = np.argsort(support)
-dens_normal = KDEMultivariate(data=[support], var_type='c', bw=
-    'normal_reference')
+
+dens_normal = KDEMultivariate(data=[support], var_type='c', bw='normal_reference')
 dens_cvls = KDEMultivariate(data=[support], var_type='c', bw='cv_ls')
 dens_cvml = KDEMultivariate(data=[support], var_type='c', bw='cv_ml')
 plt.figure(3)
@@ -69,63 +85,71 @@ plt.plot(support[ix], rv.pdf(support[ix]), label='Actual')
 plt.plot(support[ix], dens_normal.pdf()[ix], label='Scott')
 plt.plot(support[ix], dens_cvls.pdf()[ix], label='CV_LS')
 plt.plot(support[ix], dens_cvml.pdf()[ix], label='CV_ML')
-plt.title(
-    'Nonparametric Estimation of the Density of Pareto Distributed Random Variable'
-    )
+plt.title("Nonparametric Estimation of the Density of Pareto " \
+          "Distributed Random Variable")
 plt.legend(('Actual', 'Scott', 'CV_LS', 'CV_ML'))
+
+# Laplace Distribution
 mu = 0
 s = 1
 nobs = 250
+
 support = np.random.laplace(mu, s, size=nobs)
 rv = stats.laplace(mu, s)
 ix = np.argsort(support)
-dens_normal = KDEMultivariate(data=[support], var_type='c', bw=
-    'normal_reference')
+
+dens_normal = KDEMultivariate(data=[support], var_type='c', bw='normal_reference')
 dens_cvls = KDEMultivariate(data=[support], var_type='c', bw='cv_ls')
 dens_cvml = KDEMultivariate(data=[support], var_type='c', bw='cv_ml')
+
 plt.figure(4)
 plt.plot(support[ix], rv.pdf(support[ix]), label='Actual')
 plt.plot(support[ix], dens_normal.pdf()[ix], label='Scott')
 plt.plot(support[ix], dens_cvls.pdf()[ix], label='CV_LS')
 plt.plot(support[ix], dens_cvml.pdf()[ix], label='CV_ML')
-plt.title(
-    'Nonparametric Estimation of the Density of Laplace Distributed Random Variable'
-    )
+plt.title("Nonparametric Estimation of the Density of Laplace " \
+          "Distributed Random Variable")
 plt.legend(('Actual', 'Scott', 'CV_LS', 'CV_ML'))
+
+# Weibull Distribution
 a = 1
 nobs = 250
+
 support = np.random.weibull(a, size=nobs)
 rv = stats.weibull_min(a)
+
 ix = np.argsort(support)
-dens_normal = KDEMultivariate(data=[support], var_type='c', bw=
-    'normal_reference')
+dens_normal = KDEMultivariate(data=[support], var_type='c', bw='normal_reference')
 dens_cvls = KDEMultivariate(data=[support], var_type='c', bw='cv_ls')
 dens_cvml = KDEMultivariate(data=[support], var_type='c', bw='cv_ml')
+
 plt.figure(5)
 plt.plot(support[ix], rv.pdf(support[ix]), label='Actual')
 plt.plot(support[ix], dens_normal.pdf()[ix], label='Scott')
 plt.plot(support[ix], dens_cvls.pdf()[ix], label='CV_LS')
 plt.plot(support[ix], dens_cvml.pdf()[ix], label='CV_ML')
-plt.title(
-    'Nonparametric Estimation of the Density of Weibull Distributed Random Variable'
-    )
+plt.title("Nonparametric Estimation of the Density of Weibull " \
+          "Distributed Random Variable")
 plt.legend(('Actual', 'Scott', 'CV_LS', 'CV_ML'))
+
+# Poisson Distribution
 a = 2
 nobs = 250
 support = np.random.poisson(a, size=nobs)
 rv = stats.poisson(a)
+
 ix = np.argsort(support)
-dens_normal = KDEMultivariate(data=[support], var_type='o', bw=
-    'normal_reference')
+dens_normal = KDEMultivariate(data=[support], var_type='o', bw='normal_reference')
 dens_cvls = KDEMultivariate(data=[support], var_type='o', bw='cv_ls')
 dens_cvml = KDEMultivariate(data=[support], var_type='o', bw='cv_ml')
+
 plt.figure(6)
 plt.plot(support[ix], rv.pmf(support[ix]), label='Actual')
 plt.plot(support[ix], dens_normal.pdf()[ix], label='Scott')
 plt.plot(support[ix], dens_cvls.pdf()[ix], label='CV_LS')
 plt.plot(support[ix], dens_cvml.pdf()[ix], label='CV_ML')
-plt.title(
-    'Nonparametric Estimation of the Density of Poisson Distributed Random Variable'
-    )
+plt.title("Nonparametric Estimation of the Density of Poisson " \
+          "Distributed Random Variable")
 plt.legend(('Actual', 'Scott', 'CV_LS', 'CV_ML'))
+
 plt.show()
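
For a single continuous variable, KDEUnivariate is a lighter-weight alternative to KDEMultivariate with cross-validated bandwidths; a small sketch on the beta sample from the top of this example:

    import numpy as np
    import statsmodels.api as sm

    np.random.seed(123456)
    support = np.random.beta(2, 5, size=250)

    kde = sm.nonparametric.KDEUnivariate(support)
    kde.fit()                                   # gaussian kernel, rule-of-thumb bandwidth
    print(kde.bw)
    print(kde.support[:3], kde.density[:3])     # density evaluated on an equally spaced grid
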
diff --git a/statsmodels/examples/ex_wald_anova.py b/statsmodels/examples/ex_wald_anova.py
index 8477e21f0..134e87920 100644
--- a/statsmodels/examples/ex_wald_anova.py
+++ b/statsmodels/examples/ex_wald_anova.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Example for wald_test for terms - `wald_anova`

 Created on Mon Dec 15 11:19:23 2014
@@ -6,37 +7,42 @@ Author: Josef Perktold
 License: BSD-3

 """
-import numpy as np
+
+import numpy as np  # noqa:F401 --> needed for patsy
+
 from statsmodels.formula.api import ols, glm, poisson
 from statsmodels.discrete.discrete_model import Poisson
+
 import statsmodels.stats.tests.test_anova as ttmod
 from statsmodels.discrete.discrete_model import NegativeBinomial
+
+
 test = ttmod.TestAnova3()
 test.setup_class()
-data = test.data.drop([0, 1, 2])
-res_ols = ols('np.log(Days+1) ~ C(Duration, Sum)*C(Weight, Sum)', data).fit(
-    use_t=False)
-res_glm = glm('np.log(Days+1) ~ C(Duration, Sum)*C(Weight, Sum)', data).fit()
-res_poi = Poisson.from_formula('Days ~ C(Weight) * C(Duration)', data).fit(
-    cov_type='HC0')
-res_poi_2 = poisson('Days ~ C(Weight) + C(Duration)', data).fit(cov_type='HC0')
+
+data = test.data.drop([0,1,2])
+res_ols = ols("np.log(Days+1) ~ C(Duration, Sum)*C(Weight, Sum)", data).fit(use_t=False)
+
+res_glm = glm("np.log(Days+1) ~ C(Duration, Sum)*C(Weight, Sum)",
+                        data).fit()
+
+res_poi = Poisson.from_formula("Days ~ C(Weight) * C(Duration)", data).fit(cov_type='HC0')
+res_poi_2 = poisson("Days ~ C(Weight) + C(Duration)", data).fit(cov_type='HC0')
+
 print('\nOLS')
 print(res_ols.wald_test_terms())
 print('\nGLM')
-print(res_glm.wald_test_terms(skip_single=False, combine_terms=['Duration',
-    'Weight']))
+print(res_glm.wald_test_terms(skip_single=False, combine_terms=['Duration', 'Weight']))
 print('\nPoisson 1')
-print(res_poi.wald_test_terms(skip_single=False, combine_terms=['Duration',
-    'Weight']))
+print(res_poi.wald_test_terms(skip_single=False, combine_terms=['Duration', 'Weight']))
 print('\nPoisson 2')
 print(res_poi_2.wald_test_terms(skip_single=False))
-res_nb2 = NegativeBinomial.from_formula('Days ~ C(Weight) * C(Duration)', data
-    ).fit()
-print("""
-Negative Binomial nb2""")
+
+res_nb2 = NegativeBinomial.from_formula("Days ~ C(Weight) * C(Duration)", data).fit()
+print('\nNegative Binomial nb2')
 print(res_nb2.wald_test_terms(skip_single=False))
-res_nb1 = NegativeBinomial.from_formula('Days ~ C(Weight) * C(Duration)',
-    data, loglike_method='nb1').fit(cov_type='HC0')
-print("""
-Negative Binomial nb2""")
+
+res_nb1 = NegativeBinomial.from_formula("Days ~ C(Weight) * C(Duration)", data,
+                                        loglike_method='nb1').fit(cov_type='HC0')
+print('\nNegative Binomial nb2')
 print(res_nb1.wald_test_terms(skip_single=False))
diff --git a/statsmodels/examples/example_discrete_mnl.py b/statsmodels/examples/example_discrete_mnl.py
index c9b663fa5..47ad26021 100644
--- a/statsmodels/examples/example_discrete_mnl.py
+++ b/statsmodels/examples/example_discrete_mnl.py
@@ -1,40 +1,63 @@
 """Example: statsmodels.discretemod
 """
+
 from statsmodels.compat.python import lrange
+
 import numpy as np
+
 import statsmodels.api as sm
-from statsmodels.iolib.summary import summary_params_2d, summary_params_2dflat, table_extend
+from statsmodels.iolib.summary import (
+    summary_params_2d,
+    summary_params_2dflat,
+    table_extend,
+)
+
 anes_data = sm.datasets.anes96.load()
 anes_exog = anes_data.exog
 anes_exog = sm.add_constant(anes_exog, prepend=False)
 mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
 mlogit_res = mlogit_mod.fit()
+
+# The default method for the fit is Newton-Raphson
+# However, you can use other solvers
 mlogit_res = mlogit_mod.fit(method='bfgs', maxiter=100)
-exog_names = [anes_data.exog_name[i] for i in [0, 2] + lrange(5, 8)] + ['const'
-    ]
-endog_names = [(anes_data.endog_name + '_%d' % i) for i in np.unique(
-    mlogit_res.model.endog)[1:]]
+# The below needs a lot of iterations to get it right?
+#TODO: Add a technical note on algorithms
+#mlogit_res = mlogit_mod.fit(method='ncg') # this takes forever
+
+
+exog_names = [anes_data.exog_name[i] for i in [0, 2]+lrange(5,8)] + ['const']
+endog_names = [anes_data.endog_name+'_%d' % i for i in np.unique(mlogit_res.model.endog)[1:]]
 print('\n\nMultinomial')
-print(summary_params_2d(mlogit_res, extras=['bse', 'tvalues'], endog_names=
-    endog_names, exog_names=exog_names))
-tables, table_all = summary_params_2dflat(mlogit_res, endog_names=
-    endog_names, exog_names=exog_names, keep_headers=True)
-tables, table_all = summary_params_2dflat(mlogit_res, endog_names=
-    endog_names, exog_names=exog_names, keep_headers=False)
+print(summary_params_2d(mlogit_res, extras=['bse','tvalues'],
+                         endog_names=endog_names, exog_names=exog_names))
+tables, table_all = summary_params_2dflat(mlogit_res,
+                                          endog_names=endog_names,
+                                          exog_names=exog_names,
+                                          keep_headers=True)
+tables, table_all = summary_params_2dflat(mlogit_res,
+                                          endog_names=endog_names,
+                                          exog_names=exog_names,
+                                          keep_headers=False)
 print('\n\n')
 print(table_all)
 print('\n\n')
-print('\n'.join(str(t) for t in tables))
+print('\n'.join((str(t) for t in tables)))
+
 at = table_extend(tables)
 print(at)
+
 print('\n\n')
 print(mlogit_res.summary())
 print(mlogit_res.summary(yname='PID'))
-endog_names = [(anes_data.endog_name + '=%d' % i) for i in np.unique(
-    mlogit_res.model.endog)[1:]]
-print(mlogit_res.summary(yname='PID', yname_list=endog_names, xname=exog_names)
-    )
-""" #trying pickle
+#the following is supposed to raise ValueError
+#mlogit_res.summary(yname=['PID'])
+
+endog_names = [anes_data.endog_name+'=%d' % i for i in np.unique(mlogit_res.model.endog)[1:]]
+print(mlogit_res.summary(yname='PID', yname_list=endog_names, xname=exog_names))
+
+
+''' #trying pickle
 import pickle

 #copy.deepcopy(mlogit_res)  #raises exception: AttributeError: 'ResettableCache' object has no attribute '_resetdict'
@@ -44,4 +67,4 @@ mnl_res.cov_params()
 #mnl_res.model.exog = None
 pickle.dump(mnl_res, open('mnl_res.dump', 'w'))
 mnl_res_l = pickle.load(open('mnl_res.dump', 'r'))
-"""
+'''
diff --git a/statsmodels/examples/example_enhanced_boxplots.py b/statsmodels/examples/example_enhanced_boxplots.py
index 5ff783607..6b70a95c3 100644
--- a/statsmodels/examples/example_enhanced_boxplots.py
+++ b/statsmodels/examples/example_enhanced_boxplots.py
@@ -1,61 +1,98 @@
+
 import numpy as np
 import matplotlib.pyplot as plt
+
 import statsmodels.api as sm
+
+
+# Necessary to make horizontal axis labels fit
 plt.rcParams['figure.subplot.bottom'] = 0.23
+
 data = sm.datasets.anes96.load_pandas()
 party_ID = np.arange(7)
-labels = ['Strong Democrat', 'Weak Democrat', 'Independent-Democrat',
-    'Independent-Independent', 'Independent-Republican', 'Weak Republican',
-    'Strong Republican']
+labels = ["Strong Democrat", "Weak Democrat", "Independent-Democrat",
+          "Independent-Independent", "Independent-Republican",
+          "Weak Republican", "Strong Republican"]
+
+# Group age by party ID.
 age = [data.exog['age'][data.endog == id] for id in party_ID]
+
+
+# Create a violin plot.
 fig = plt.figure()
 ax = fig.add_subplot(111)
-sm.graphics.violinplot(age, ax=ax, labels=labels, plot_opts={'cutoff_val': 
-    5, 'cutoff_type': 'abs', 'label_fontsize': 'small', 'label_rotation': 30})
-ax.set_xlabel('Party identification of respondent.')
-ax.set_ylabel('Age')
+
+sm.graphics.violinplot(age, ax=ax, labels=labels,
+                       plot_opts={'cutoff_val':5, 'cutoff_type':'abs',
+                                  'label_fontsize':'small',
+                                  'label_rotation':30})
+
+ax.set_xlabel("Party identification of respondent.")
+ax.set_ylabel("Age")
 ax.set_title("US national election '96 - Age & Party Identification")
+
+
+# Create a bean plot.
 fig2 = plt.figure()
 ax = fig2.add_subplot(111)
-sm.graphics.beanplot(age, ax=ax, labels=labels, plot_opts={'cutoff_val': 5,
-    'cutoff_type': 'abs', 'label_fontsize': 'small', 'label_rotation': 30})
-ax.set_xlabel('Party identification of respondent.')
-ax.set_ylabel('Age')
+
+sm.graphics.beanplot(age, ax=ax, labels=labels,
+                    plot_opts={'cutoff_val':5, 'cutoff_type':'abs',
+                               'label_fontsize':'small',
+                               'label_rotation':30})
+
+ax.set_xlabel("Party identification of respondent.")
+ax.set_ylabel("Age")
 ax.set_title("US national election '96 - Age & Party Identification")
+
+
+# Create a jitter plot.
 fig3 = plt.figure()
 ax = fig3.add_subplot(111)
-plot_opts = {'cutoff_val': 5, 'cutoff_type': 'abs', 'label_fontsize':
-    'small', 'label_rotation': 30, 'violin_fc': (0.8, 0.8, 0.8),
-    'jitter_marker': '.', 'jitter_marker_size': 3, 'bean_color': '#FF6F00',
-    'bean_mean_color': '#009D91'}
-sm.graphics.beanplot(age, ax=ax, labels=labels, jitter=True, plot_opts=
-    plot_opts)
-ax.set_xlabel('Party identification of respondent.')
-ax.set_ylabel('Age')
+
+plot_opts={'cutoff_val':5, 'cutoff_type':'abs', 'label_fontsize':'small',
+           'label_rotation':30, 'violin_fc':(0.8, 0.8, 0.8),
+           'jitter_marker':'.', 'jitter_marker_size':3, 'bean_color':'#FF6F00',
+           'bean_mean_color':'#009D91'}
+sm.graphics.beanplot(age, ax=ax, labels=labels, jitter=True,
+                    plot_opts=plot_opts)
+
+ax.set_xlabel("Party identification of respondent.")
+ax.set_ylabel("Age")
 ax.set_title("US national election '96 - Age & Party Identification")
-ix = data.exog['income'] < 16
+
+
+# Create an asymmetrical jitter plot.
+ix = data.exog['income'] < 16  # incomes < $30k
 age = data.exog['age'][ix]
 endog = data.endog[ix]
 age_lower_income = [age[endog == id] for id in party_ID]
-ix = data.exog['income'] >= 20
+
+ix = data.exog['income'] >= 20  # incomes > $50k
 age = data.exog['age'][ix]
 endog = data.endog[ix]
 age_higher_income = [age[endog == id] for id in party_ID]
+
 fig = plt.figure()
 ax = fig.add_subplot(111)
-plot_opts['violin_fc'] = 0.5, 0.5, 0.5
+
+plot_opts['violin_fc'] = (0.5, 0.5, 0.5)
 plot_opts['bean_show_mean'] = False
 plot_opts['bean_show_median'] = False
-plot_opts['bean_legend_text'] = 'Income < \\$30k'
+plot_opts['bean_legend_text'] = r'Income < \$30k'
 plot_opts['cutoff_val'] = 10
 sm.graphics.beanplot(age_lower_income, ax=ax, labels=labels, side='left',
-    jitter=True, plot_opts=plot_opts)
-plot_opts['violin_fc'] = 0.7, 0.7, 0.7
+                     jitter=True, plot_opts=plot_opts)
+plot_opts['violin_fc'] = (0.7, 0.7, 0.7)
 plot_opts['bean_color'] = '#009D91'
-plot_opts['bean_legend_text'] = 'Income > \\$50k'
+plot_opts['bean_legend_text'] = r'Income > \$50k'
 sm.graphics.beanplot(age_higher_income, ax=ax, labels=labels, side='right',
-    jitter=True, plot_opts=plot_opts)
-ax.set_xlabel('Party identification of respondent.')
-ax.set_ylabel('Age')
+                     jitter=True, plot_opts=plot_opts)
+
+ax.set_xlabel("Party identification of respondent.")
+ax.set_ylabel("Age")
 ax.set_title("US national election '96 - Age & Party Identification")
+
+
+# Show all plots.
 plt.show()
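For reference, a minimal, self-contained sketch of the same sm.graphics.violinplot call on synthetic data; the three groups, their locations and the labels are made up purely for illustration and are not part of the example file:

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    # three made-up groups standing in for the age-by-party grouping above
    groups = [rng.normal(loc, 5.0, size=200) for loc in (35, 45, 55)]

    fig, ax = plt.subplots()
    sm.graphics.violinplot(groups, ax=ax, labels=['low', 'mid', 'high'],
                           plot_opts={'cutoff_val': 5, 'cutoff_type': 'abs'})
    ax.set_ylabel('Age')
    plt.show()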
diff --git a/statsmodels/examples/example_functional_plots.py b/statsmodels/examples/example_functional_plots.py
index 4b137eda8..abb047090 100644
--- a/statsmodels/examples/example_functional_plots.py
+++ b/statsmodels/examples/example_functional_plots.py
@@ -1,30 +1,52 @@
-"""Functional boxplots and rainbow plots
+'''Functional boxplots and rainbow plots

 see docstrings for an explanation


 Author: Ralf Gommers

-"""
+'''
+
 import matplotlib.pyplot as plt
 import numpy as np
+
 import statsmodels.api as sm
+
+#Load the El Nino dataset.  Consists of 60 years worth of Pacific Ocean sea
+#surface temperature data.
+
 data = sm.datasets.elnino.load()
+
+#Create a functional boxplot:
+
+#We see that the years 1982-83 and 1997-98 are outliers; these are
+#the years where El Nino (a climate pattern characterized by warming
+#up of the sea surface and higher air pressures) occurred with unusual
+#intensity.
+
 fig = plt.figure()
 ax = fig.add_subplot(111)
-res = sm.graphics.fboxplot(data.raw_data[:, 1:], wfactor=2.58, labels=data.
-    raw_data[:, 0].astype(int), ax=ax)
-ax.set_xlabel('Month of the year')
-ax.set_ylabel('Sea surface temperature (C)')
+res = sm.graphics.fboxplot(data.raw_data[:, 1:], wfactor=2.58,
+                           labels=data.raw_data[:, 0].astype(int),
+                           ax=ax)
+
+ax.set_xlabel("Month of the year")
+ax.set_ylabel("Sea surface temperature (C)")
 ax.set_xticks(np.arange(13, step=3) - 1)
-ax.set_xticklabels(['', 'Mar', 'Jun', 'Sep', 'Dec'])
+ax.set_xticklabels(["", "Mar", "Jun", "Sep", "Dec"])
 ax.set_xlim([-0.2, 11.2])
+
+
+
+#Create a rainbow plot:
+
 fig = plt.figure()
 ax = fig.add_subplot(111)
 res = sm.graphics.rainbowplot(data.raw_data[:, 1:], ax=ax)
-ax.set_xlabel('Month of the year')
-ax.set_ylabel('Sea surface temperature (C)')
+
+ax.set_xlabel("Month of the year")
+ax.set_ylabel("Sea surface temperature (C)")
 ax.set_xticks(np.arange(13, step=3) - 1)
-ax.set_xticklabels(['', 'Mar', 'Jun', 'Sep', 'Dec'])
+ax.set_xticklabels(["", "Mar", "Jun", "Sep", "Dec"])
 ax.set_xlim([-0.2, 11.2])
 plt.show()
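A minimal sketch of sm.graphics.fboxplot on synthetic curves may help make the return values concrete; the sine-plus-noise data and the planted outlier below are invented and not part of the El Nino example:

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    t = np.linspace(0, 2 * np.pi, 50)
    curves = np.sin(t) + 0.2 * rng.standard_normal((40, t.size))  # 40 noisy curves
    curves[0] += 2.0                                              # plant one outlier

    fig, ax = plt.subplots()
    _, depth, ix_depth, ix_outliers = sm.graphics.fboxplot(curves, wfactor=2.58, ax=ax)
    print("flagged as outlying:", ix_outliers)
    plt.show()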
diff --git a/statsmodels/examples/example_kde.py b/statsmodels/examples/example_kde.py
index c68b1750e..0067a7b35 100644
--- a/statsmodels/examples/example_kde.py
+++ b/statsmodels/examples/example_kde.py
@@ -1,15 +1,31 @@
+
 import numpy as np
 from scipy import stats
 from statsmodels.distributions.mixture_rvs import mixture_rvs
 from statsmodels.nonparametric.kde import kdensityfft
 from statsmodels.nonparametric import bandwidths
 import matplotlib.pyplot as plt
+
+
 np.random.seed(12345)
-obs_dist = mixture_rvs([0.25, 0.75], size=10000, dist=[stats.norm, stats.
-    norm], kwargs=(dict(loc=-1, scale=0.5), dict(loc=1, scale=0.5)))
-f_hat, grid, bw = kdensityfft(obs_dist, kernel='gauss', bw='scott')
+obs_dist = mixture_rvs([.25,.75], size=10000, dist=[stats.norm, stats.norm],
+                kwargs = (dict(loc=-1,scale=.5),dict(loc=1,scale=.5)))
+#.. obs_dist = mixture_rvs([.25,.75], size=10000, dist=[stats.norm, stats.beta],
+#..            kwargs = (dict(loc=-1,scale=.5),dict(loc=1,scale=1,args=(1,.5))))
+
+
+f_hat, grid, bw = kdensityfft(obs_dist, kernel="gauss", bw="scott")
+
+# Check the plot
+
 plt.figure()
 plt.hist(obs_dist, bins=50, normed=True, color='red')
 plt.plot(grid, f_hat, lw=2, color='black')
 plt.show()
+
+# do some timings
+# get bw first because they're not streamlined
 bw = bandwidths.bw_scott(obs_dist)
+
+#.. timeit kdensity(obs_dist, kernel="gauss", bw=bw, gridsize=2**10)
+#.. timeit kdensityfft(obs_dist, kernel="gauss", bw=bw, gridsize=2**10)
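The commented-out timeit lines above can be reproduced with the standard-library timeit module; a rough sketch (timings will of course vary by machine; 'gau' is the Gaussian kernel key):

    import timeit
    import numpy as np
    from scipy import stats
    from statsmodels.distributions.mixture_rvs import mixture_rvs
    from statsmodels.nonparametric import bandwidths
    from statsmodels.nonparametric.kde import kdensity, kdensityfft

    np.random.seed(12345)
    obs_dist = mixture_rvs([.25, .75], size=10000, dist=[stats.norm, stats.norm],
                           kwargs=(dict(loc=-1, scale=.5), dict(loc=1, scale=.5)))
    bw = bandwidths.bw_scott(obs_dist)  # compute the bandwidth once, as above

    for func in (kdensity, kdensityfft):
        secs = timeit.timeit(
            lambda: func(obs_dist, kernel="gau", bw=bw, gridsize=2**10), number=5)
        print(func.__name__, secs / 5)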
diff --git a/statsmodels/examples/example_ols_minimal_comp.py b/statsmodels/examples/example_ols_minimal_comp.py
index 76812d14e..d8178693b 100644
--- a/statsmodels/examples/example_ols_minimal_comp.py
+++ b/statsmodels/examples/example_ols_minimal_comp.py
@@ -3,22 +3,28 @@
 add example for new compare methods

 """
+
 import numpy as np
 import statsmodels.api as sm
+
 np.random.seed(765367)
 nsample = 100
-x = np.linspace(0, 10, 100)
-X = sm.add_constant(np.column_stack((x, x ** 2)))
+x = np.linspace(0,10, 100)
+X = sm.add_constant(np.column_stack((x, x**2)))
 beta = np.array([10, 1, 0.01])
 y = np.dot(X, beta) + np.random.normal(size=nsample)
+
 results = sm.OLS(y, X).fit()
 print(results.summary())
-results2 = sm.OLS(y, X[:, :2]).fit()
+
+results2 = sm.OLS(y, X[:,:2]).fit()
 print(results.compare_f_test(results2))
-print(results.f_test([0, 0, 1]))
+print(results.f_test([0,0,1]))
+
 print(results.compare_lr_test(results2))
-"""
+
+'''
 (1.841903749875428, 0.1778775592033047)
 <F test: F=array([[ 1.84190375]]), p=[[ 0.17787756]], df_denom=97, df_num=1>
 (1.8810663357027693, 0.17021300121753191, 1.0)
-"""
+'''
diff --git a/statsmodels/examples/example_rpy.py b/statsmodels/examples/example_rpy.py
index 91d4abbc1..5fb290f2e 100644
--- a/statsmodels/examples/example_rpy.py
+++ b/statsmodels/examples/example_rpy.py
@@ -1,4 +1,4 @@
-"""Just two examples for using rpy
+'''Just two examples for using rpy

 These examples are mainly for developers.

@@ -15,28 +15,33 @@ it does not work for all types of R models.

 There are also R scripts included with most of the datasets to run
 some basic models for comparisons of results to statsmodels.
-"""
+'''
+
 from rpy import r
+
 import statsmodels.api as sm
+
 examples = [1, 2]
+
 if 1 in examples:
     data = sm.datasets.longley.load()
-    y, x = data.endog, sm.add_constant(data.exog, prepend=False)
-    des_cols = [('x.%d' % (i + 1)) for i in range(x.shape[1])]
+    y,x = data.endog, sm.add_constant(data.exog, prepend=False)
+    des_cols = ['x.%d' % (i+1) for i in range(x.shape[1])]
     formula = r('y~%s-1' % '+'.join(des_cols))
     frame = r.data_frame(y=y, x=x)
     results = r.lm(formula, data=frame)
     print(list(results.keys()))
     print(results['coefficients'])
+
 if 2 in examples:
     data2 = sm.datasets.star98.load()
-    y2, x2 = data2.endog, sm.add_constant(data2.exog, prepend=False)
-    y2 = y2[:, 0] / y2.sum(axis=1)
-    des_cols2 = [('x.%d' % (i + 1)) for i in range(x2.shape[1])]
+    y2,x2 = data2.endog, sm.add_constant(data2.exog, prepend=False)
+    y2 = y2[:,0]/y2.sum(axis=1)
+    des_cols2 = ['x.%d' % (i+1) for i in range(x2.shape[1])]
     formula2 = r('y~%s-1' % '+'.join(des_cols2))
     frame2 = r.data_frame(y=y2, x=x2)
     results2 = r.glm(formula2, data=frame2, family='binomial')
-    params_est = [results2['coefficients'][k] for k in sorted(results2[
-        'coefficients'])]
+    params_est = [results2['coefficients'][k] for k
+                    in sorted(results2['coefficients'])]
     print(params_est)
-    print(', '.join(['%13.10f'] * 21) % tuple(params_est))
+    print(', '.join(['%13.10f']*21) % tuple(params_est))
diff --git a/statsmodels/examples/koul_and_mc.py b/statsmodels/examples/koul_and_mc.py
index e80f144d1..57c85319e 100644
--- a/statsmodels/examples/koul_and_mc.py
+++ b/statsmodels/examples/koul_and_mc.py
@@ -1,5 +1,9 @@
 import statsmodels.api as sm
 import numpy as np
+
+##################
+#Monte Carlo test#
+##################
 modrand1 = np.random.RandomState(5676576)
 modrand2 = np.random.RandomState(1543543)
 modrand3 = np.random.RandomState(5738276)
@@ -10,14 +14,21 @@ y = np.dot(X, beta)
 params = []
 for i in range(10000):
     yhat = y + modrand2.standard_normal((1000, 1))
-    cens_times = 50 + modrand3.standard_normal((1000, 1)) * 5
+    cens_times = 50 + (modrand3.standard_normal((1000, 1)) * 5)
     yhat_observed = np.minimum(yhat, cens_times)
     censors = np.int_(yhat < cens_times)
     model = sm.emplike.emplikeAFT(yhat_observed, X, censors)
     new_params = model.fit().params
     params.append(new_params)
-mc_est = np.mean(params, axis=0)
+
+mc_est = np.mean(params, axis=0)  # Gives MC parameter estimate
+
+##################
+#Koul replication#
+##################
+
 koul_data = np.genfromtxt('/home/justin/rverify.csv', delimiter=';')
+# ^ Change path to where file is located.
 koul_y = np.log10(koul_data[:, 0])
 koul_x = sm.add_constant(koul_data[:, 2])
 koul_censors = koul_data[:, 1]
diff --git a/statsmodels/examples/l1_demo/demo.py b/statsmodels/examples/l1_demo/demo.py
index b04f09e3c..a31eae2a2 100644
--- a/statsmodels/examples/l1_demo/demo.py
+++ b/statsmodels/examples/l1_demo/demo.py
@@ -3,6 +3,8 @@ import statsmodels.api as sm
 import scipy as sp
 from scipy import linalg
 from scipy import stats
+
+
 docstr = """
 Demonstrates l1 regularization for likelihood models.
 Use different models by setting mode = mnlogit, logit, or probit.
@@ -37,13 +39,83 @@ def main():
     """
     Provides a CLI for the demo.
     """
-    pass
+    usage = "usage: %prog [options] mode"
+    usage += '\n'+docstr
+    parser = OptionParser(usage=usage)
+    # base_alpha
+    parser.add_option("-a", "--base_alpha",
+            help="Size of regularization param (the param actully used will "\
+                    "automatically scale with data size in this demo) "\
+                    "[default: %default]",
+            dest='base_alpha', action='store', type='float', default=0.01)
+    # num_samples
+    parser.add_option("-N", "--num_samples",
+            help="Number of data points to generate for fit "\
+                    "[default: %default]",
+            dest='N', action='store', type='int', default=500)
+    # get_l1_slsqp_results
+    parser.add_option("--get_l1_slsqp_results",
+            help="Do an l1 fit using slsqp. [default: %default]", \
+            action="store_true",dest='get_l1_slsqp_results', default=False)
+    # get_l1_cvxopt_results
+    parser.add_option("--get_l1_cvxopt_results",
+            help="Do an l1 fit using cvxopt. [default: %default]", \
+            action="store_true",dest='get_l1_cvxopt_results', default=False)
+    # num_nonconst_covariates
+    parser.add_option("--num_nonconst_covariates",
+            help="Number of covariates that are not constant "\
+                    "(a constant will be prepended) [default: %default]",
+                    dest='num_nonconst_covariates', action='store',
+                    type='int', default=10)
+    # noise_level
+    parser.add_option("--noise_level",
+            help="Level of the noise relative to signal [default: %default]",
+                    dest='noise_level', action='store', type='float',
+                    default=0.2)
+    # cor_length
+    parser.add_option("--cor_length",
+            help="Correlation length of the (Gaussian) independent variables"\
+                    "[default: %default]",
+                    dest='cor_length', action='store', type='float',
+                    default=2)
+    # num_zero_params
+    parser.add_option("--num_zero_params",
+            help="Number of parameters equal to zero for every target in "\
+                    "logistic regression examples.  [default: %default]",
+                    dest='num_zero_params', action='store', type='int',
+                    default=8)
+    # num_targets
+    parser.add_option("-J", "--num_targets",
+            help="Number of choices for the endogenous response in "\
+                    "multinomial logit example [default: %default]",
+                    dest='num_targets', action='store', type='int', default=3)
+    # print_summaries
+    parser.add_option("-s", "--print_summaries",
+            help="Print the full fit summary. [default: %default]", \
+            action="store_true",dest='print_summaries', default=False)
+    # save_arrays
+    parser.add_option("--save_arrays",
+            help="Save exog/endog/true_params to disk for future use. "\
+                    "[default: %default]",
+                    action="store_true",dest='save_arrays', default=False)
+    # load_old_arrays
+    parser.add_option("--load_old_arrays",
+            help="Load exog/endog/true_params arrays from disk.  "\
+                    "[default: %default]",
+                    action="store_true",dest='load_old_arrays', default=False)
+
+    (options, args) = parser.parse_args()
+
+    assert len(args) == 1
+    mode = args[0].lower()
+
+    run_demo(mode, **options.__dict__)


 def run_demo(mode, base_alpha=0.01, N=500, get_l1_slsqp_results=False,
-    get_l1_cvxopt_results=False, num_nonconst_covariates=10, noise_level=
-    0.2, cor_length=2, num_zero_params=8, num_targets=3, print_summaries=
-    False, save_arrays=False, load_old_arrays=False):
+        get_l1_cvxopt_results=False, num_nonconst_covariates=10,
+        noise_level=0.2, cor_length=2, num_zero_params=8, num_targets=3,
+        print_summaries=False, save_arrays=False, load_old_arrays=False):
     """
     Run the demo and print results.

@@ -80,31 +152,131 @@ def run_demo(mode, base_alpha=0.01, N=500, get_l1_slsqp_results=False,
     load_old_arrays
         Load exog/endog/true_params arrays from disk.
     """
-    pass
+    if mode != 'mnlogit':
+        print("Setting num_targets to 2 since mode != 'mnlogit'")
+        num_targets = 2
+    models = {
+            'logit': sm.Logit, 'mnlogit': sm.MNLogit, 'probit': sm.Probit}
+    endog_funcs = {
+            'logit': get_logit_endog, 'mnlogit': get_logit_endog,
+            'probit': get_probit_endog}
+    # The regularization parameter
+    # Here we scale it with N for simplicity.  In practice, you should
+    # use cross validation to pick alpha
+    alpha = base_alpha * N * sp.ones((num_nonconst_covariates+1, num_targets-1))
+    alpha[0,:] = 0  # Do not regularize the intercept
+
+    #### Make the data and model
+    exog = get_exog(N, num_nonconst_covariates, cor_length)
+    exog = sm.add_constant(exog)
+    true_params = sp.rand(num_nonconst_covariates+1, num_targets-1)
+    if num_zero_params:
+        true_params[-num_zero_params:, :] = 0
+    endog = endog_funcs[mode](true_params, exog, noise_level)
+
+    endog, exog, true_params = save_andor_load_arrays(
+            endog, exog, true_params, save_arrays, load_old_arrays)
+    model = models[mode](endog, exog)
+
+    #### Get the results and print
+    results = run_solvers(model, true_params, alpha,
+            get_l1_slsqp_results, get_l1_cvxopt_results, print_summaries)
+
+    summary_str = get_summary_str(results, true_params, get_l1_slsqp_results,
+            get_l1_cvxopt_results, print_summaries)
+
+    print(summary_str)


 def run_solvers(model, true_params, alpha, get_l1_slsqp_results,
-    get_l1_cvxopt_results, print_summaries):
+        get_l1_cvxopt_results, print_summaries):
     """
     Runs the solvers using the specified settings and returns a result string.
     Works the same for any l1 penalized likelihood model.
     """
-    pass
+    results = {}
+    #### Train the models
+    # Get ML results
+    results['results_ML'] = model.fit(method='newton')
+    # Get l1 results
+    start_params = results['results_ML'].params.ravel(order='F')
+    if get_l1_slsqp_results:
+        results['results_l1_slsqp'] = model.fit_regularized(
+                method='l1', alpha=alpha, maxiter=1000,
+                start_params=start_params, retall=True)
+    if get_l1_cvxopt_results:
+        results['results_l1_cvxopt_cp'] = model.fit_regularized(
+                method='l1_cvxopt_cp', alpha=alpha, maxiter=50,
+                start_params=start_params, retall=True, feastol=1e-5)
+
+    return results


 def get_summary_str(results, true_params, get_l1_slsqp_results,
-    get_l1_cvxopt_results, print_summaries):
+        get_l1_cvxopt_results, print_summaries):
     """
     Gets a string summarizing the results.
     """
-    pass
+    #### Extract specific results
+    results_ML = results['results_ML']
+    RMSE_ML = get_RMSE(results_ML, true_params)
+    if get_l1_slsqp_results:
+        results_l1_slsqp = results['results_l1_slsqp']
+    if get_l1_cvxopt_results:
+        results_l1_cvxopt_cp = results['results_l1_cvxopt_cp']
+
+    #### Format summaries
+    # Short summary
+    print_str = '\n\n=========== Short Error Summary ============'
+    print_str += '\n\n The maximum likelihood fit RMS error = %.4f' % RMSE_ML
+    if get_l1_slsqp_results:
+        RMSE_l1_slsqp = get_RMSE(results_l1_slsqp, true_params)
+        print_str += '\n The l1_slsqp fit RMS error = %.4f' % RMSE_l1_slsqp
+    if get_l1_cvxopt_results:
+        RMSE_l1_cvxopt_cp = get_RMSE(results_l1_cvxopt_cp, true_params)
+        print_str += '\n The l1_cvxopt_cp fit RMS error = %.4f' % RMSE_l1_cvxopt_cp
+    # Parameters
+    print_str += '\n\n\n============== Parameters ================='
+    print_str += "\n\nTrue parameters: \n%s" % true_params
+    # Full summary
+    if print_summaries:
+        print_str += '\n' + results_ML.summary().as_text()
+        if get_l1_slsqp_results:
+            print_str += '\n' + results_l1_slsqp.summary().as_text()
+        if get_l1_cvxopt_results:
+            print_str += '\n' + results_l1_cvxopt_cp.summary().as_text()
+    else:
+        print_str += '\n\nThe maximum likelihood params are \n%s' % results_ML.params
+        if get_l1_slsqp_results:
+            print_str += '\n\nThe l1_slsqp params are \n%s' % results_l1_slsqp.params
+        if get_l1_cvxopt_results:
+            print_str += '\n\nThe l1_cvxopt_cp params are \n%s' % \
+                    results_l1_cvxopt_cp.params
+    # Return
+    return print_str
+
+
+def save_andor_load_arrays(
+        endog, exog, true_params, save_arrays, load_old_arrays):
+    if save_arrays:
+        sp.save('endog.npy', endog)
+        sp.save('exog.npy', exog)
+        sp.save('true_params.npy', true_params)
+    if load_old_arrays:
+        endog = sp.load('endog.npy')
+        exog = sp.load('exog.npy')
+        true_params = sp.load('true_params.npy')
+    return endog, exog, true_params


 def get_RMSE(results, true_params):
     """
     Gets the (normalized) root mean square error.
     """
-    pass
+    diff = results.params.reshape(true_params.shape) - true_params
+    raw_RMSE = sp.sqrt(((diff)**2).sum())
+    param_norm = sp.sqrt((true_params**2).sum())
+    return raw_RMSE / param_norm


 def get_logit_endog(true_params, exog, noise_level):
@@ -112,7 +284,21 @@ def get_logit_endog(true_params, exog, noise_level):
     Gets an endogenous response that is consistent with the true_params,
         perturbed by noise at noise_level.
     """
-    pass
+    N = exog.shape[0]
+    ### Create the probability of entering the different classes,
+    ### given exog and true_params
+    Xdotparams = sp.dot(exog, true_params)
+    noise = noise_level * sp.randn(*Xdotparams.shape)
+    eXB = sp.column_stack((sp.ones(len(Xdotparams)), sp.exp(Xdotparams)))
+    class_probabilities = eXB / eXB.sum(1)[:, None]
+
+    ### Create the endog
+    cdf = class_probabilities.cumsum(axis=1)
+    endog = sp.zeros(N)
+    for i in range(N):
+        endog[i] = sp.searchsorted(cdf[i, :], sp.rand())
+
+    return endog


 def get_probit_endog(true_params, exog, noise_level):
@@ -120,7 +306,19 @@ def get_probit_endog(true_params, exog, noise_level):
     Gets an endogenous response that is consistent with the true_params,
         perturbed by noise at noise_level.
     """
-    pass
+    N = exog.shape[0]
+    ### Create the probability of entering the different classes,
+    ### given exog and true_params
+    Xdotparams = sp.dot(exog, true_params)
+    noise = noise_level * sp.randn(*Xdotparams.shape)
+
+    ### Create the endog
+    cdf = stats.norm._cdf(-Xdotparams)
+    endog = sp.zeros(N)
+    for i in range(N):
+        endog[i] = sp.searchsorted(cdf[i, :], sp.rand())
+
+    return endog


 def get_exog(N, num_nonconst_covariates, cor_length):
@@ -135,7 +333,20 @@ def get_exog(N, num_nonconst_covariates, cor_length):
     BEWARE:  With very long correlation lengths, you often get a singular KKT
         matrix (during the l1_cvxopt_cp fit)
     """
-    pass
+    ## Create the noiseless exog
+    uncorrelated_exog = sp.randn(N, num_nonconst_covariates)
+    if cor_length == 0:
+        exog = uncorrelated_exog
+    else:
+        cov_matrix = sp.zeros((num_nonconst_covariates, num_nonconst_covariates))
+        j = sp.arange(num_nonconst_covariates)
+        for i in range(num_nonconst_covariates):
+            cov_matrix[i,:] = sp.exp(-sp.fabs(i-j) / cor_length)
+        chol = linalg.cholesky(cov_matrix)  # cov_matrix = sp.dot(chol.T, chol)
+        exog = sp.dot(uncorrelated_exog, chol)
+    ## Return
+    return exog
+


 if __name__ == '__main__':
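Besides the CLI restored above, run_demo can also be called directly; a minimal sketch, assuming (hypothetically) that this file is importable as demo:

    from demo import run_demo  # hypothetical import name for this example file

    # small run: logit model, l1 fit via slsqp, other settings left at their defaults
    run_demo('logit', N=500, num_nonconst_covariates=10,
             get_l1_slsqp_results=True, print_summaries=False)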
diff --git a/statsmodels/examples/l1_demo/short_demo.py b/statsmodels/examples/l1_demo/short_demo.py
index b227c1682..35379fafb 100644
--- a/statsmodels/examples/l1_demo/short_demo.py
+++ b/statsmodels/examples/l1_demo/short_demo.py
@@ -19,49 +19,91 @@ The l1_cvxopt_cp solver is part of CVXOPT and this package needs to be
 """
 import matplotlib.pyplot as plt
 import numpy as np
+
 import statsmodels.api as sm
+
+## Load the data from Spector and Mazzeo (1980)
 spector_data = sm.datasets.spector.load()
 spector_data.exog = sm.add_constant(spector_data.exog)
 N = len(spector_data.endog)
 K = spector_data.exog.shape[1]
+
+### Logit Model
 logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
+## Standard logistic regression
 logit_res = logit_mod.fit()
+
+## Regularized regression
+
+# Set the regularization parameter to something reasonable
 alpha = 0.05 * N * np.ones(K)
-logit_l1_res = logit_mod.fit_regularized(method='l1', alpha=alpha, acc=1e-06)
-logit_l1_cvxopt_res = logit_mod.fit_regularized(method='l1_cvxopt_cp',
-    alpha=alpha)
-print('============ Results for Logit =================')
-print('ML results')
+
+# Use l1, which solves via a built-in (scipy.optimize) solver
+logit_l1_res = logit_mod.fit_regularized(method='l1', alpha=alpha, acc=1e-6)
+
+# Use l1_cvxopt_cp, which solves with a CVXOPT solver
+logit_l1_cvxopt_res = logit_mod.fit_regularized(
+        method='l1_cvxopt_cp', alpha=alpha)
+
+## Print results
+print("============ Results for Logit =================")
+print("ML results")
 print(logit_res.summary())
-print('l1 results')
+print("l1 results")
 print(logit_l1_res.summary())
 print(logit_l1_cvxopt_res.summary())
+
+### Multinomial Logit Example using American National Election Studies Data
 anes_data = sm.datasets.anes96.load()
 anes_exog = anes_data.exog
 anes_exog = sm.add_constant(anes_exog, prepend=False)
 mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
 mlogit_res = mlogit_mod.fit()
+
+## Set the regularization parameter.
 alpha = 10 * np.ones((mlogit_mod.J - 1, mlogit_mod.K))
-alpha[-1, :] = 0
+
+# Do not regularize the constant
+alpha[-1,:] = 0
 mlogit_l1_res = mlogit_mod.fit_regularized(method='l1', alpha=alpha)
 print(mlogit_l1_res.params)
-print('============ Results for MNLogit =================')
-print('ML results')
+
+#mlogit_l1_res = mlogit_mod.fit_regularized(
+#        method='l1_cvxopt_cp', alpha=alpha, abstol=1e-10, trim_tol=1e-6)
+#print mlogit_l1_res.params
+
+## Print results
+print("============ Results for MNLogit =================")
+print("ML results")
 print(mlogit_res.summary())
-print('l1 results')
+print("l1 results")
 print(mlogit_l1_res.summary())
+#
+#
+#### Logit example with many params, sweeping alpha
 spector_data = sm.datasets.spector.load()
 X = spector_data.exog
 Y = spector_data.endog
-N = 50
+
+## Fit
+N = 50  # number of points to solve at
 K = X.shape[1]
 logit_mod = sm.Logit(Y, X)
-coeff = np.zeros((N, K))
+coeff = np.zeros((N, K))  # Holds the coefficients
 alphas = 1 / np.logspace(-0.5, 2, N)
+
+## Sweep alpha and store the coefficients
+# QC check does not always pass with the default options.
+# Use the options QC_verbose=True and disp=True
+# to see what is happening.  It just barely does not pass, so I decreased
+# acc and increased QC_tol to make it pass
 for n, alpha in enumerate(alphas):
-    logit_res = logit_mod.fit_regularized(method='l1', alpha=alpha,
-        trim_mode='off', QC_tol=0.1, disp=False, QC_verbose=True, acc=1e-15)
-    coeff[n, :] = logit_res.params
+    logit_res = logit_mod.fit_regularized(
+        method='l1', alpha=alpha, trim_mode='off', QC_tol=0.1, disp=False,
+        QC_verbose=True, acc=1e-15)
+    coeff[n,:] = logit_res.params
+
+## Plot
 plt.figure(1)
 plt.clf()
 plt.grid()
@@ -69,6 +111,6 @@ plt.title('Regularization Path')
 plt.xlabel('alpha')
 plt.ylabel('Parameter value')
 for i in range(K):
-    plt.plot(alphas, coeff[:, i], label='X' + str(i), lw=3)
+    plt.plot(alphas, coeff[:,i], label='X'+str(i), lw=3)
 plt.legend(loc='best')
 plt.show()
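One small follow-up to the sweep above: since the l1 fit can trim coefficients to exactly zero, it is often useful to report which ones survived. A minimal sketch, reusing the logit_l1_res object fitted earlier in this script:

    import numpy as np

    kept = np.abs(np.asarray(logit_l1_res.params)) > 1e-8  # mask of surviving coefficients
    print("nonzero coefficients:", int(kept.sum()), "of", kept.size)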
diff --git a/statsmodels/examples/l1_demo/sklearn_compare.py b/statsmodels/examples/l1_demo/sklearn_compare.py
index 9d0898420..a1ab6c9de 100644
--- a/statsmodels/examples/l1_demo/sklearn_compare.py
+++ b/statsmodels/examples/l1_demo/sklearn_compare.py
@@ -17,47 +17,72 @@ The results "prove" that the regularization paths are the same.  Note that
     are NOT monotonic.  As a result, the paths do not match up perfectly.
 """
 from statsmodels.compat.python import lrange
+
 import matplotlib.pyplot as plt
 import numpy as np
 from sklearn import linear_model
+
 import statsmodels.api as sm
+
+## Decide which dataset to use
+# Use either spector or anes96
 use_spector = False
+
+#### Load data
+## The Spector and Mazzeo (1980) data from statsmodels
 if use_spector:
     spector_data = sm.datasets.spector.load()
     X = spector_data.exog
     Y = spector_data.endog
 else:
     raise Exception(
-        'The anes96 dataset is now loaded in as a short version that cannot be used here'
-        )
+        "The anes96 dataset is now loaded in as a short version that cannot "\
+        "be used here")
     anes96_data = sm.datasets.anes96.load_pandas()
     Y = anes96_data.exog.vote
-N = 200
+
+#### Fit and plot results
+N = 200  # number of points to solve at
 K = X.shape[1]
+
+## statsmodels
 logit_mod = sm.Logit(Y, X)
-sm_coeff = np.zeros((N, K))
+sm_coeff = np.zeros((N, K))  # Holds the coefficients
 if use_spector:
-    alphas = 1 / np.logspace(-1, 2, N)
+    alphas = 1 / np.logspace(-1, 2, N)  # for spector_data
 else:
-    alphas = 1 / np.logspace(-3, 2, N)
+    alphas = 1 / np.logspace(-3, 2, N)  # for anes96_data
 for n, alpha in enumerate(alphas):
-    logit_res = logit_mod.fit_regularized(method='l1', alpha=alpha, disp=
-        False, trim_mode='off')
-    sm_coeff[n, :] = logit_res.params
+    logit_res = logit_mod.fit_regularized(
+            method='l1', alpha=alpha, disp=False, trim_mode='off')
+    sm_coeff[n,:] = logit_res.params
+## Sklearn
 sk_coeff = np.zeros((N, K))
 if use_spector:
     Cs = np.logspace(-0.45, 2, N)
 else:
     Cs = np.logspace(-2.6, 0, N)
 for n, C in enumerate(Cs):
-    clf = linear_model.LogisticRegression(C=C, penalty='l1', fit_intercept=
-        False)
+    clf = linear_model.LogisticRegression(
+            C=C, penalty='l1', fit_intercept=False)
     clf.fit(X, Y)
     sk_coeff[n, :] = clf.coef_
-sk_special_X = np.fabs(sk_coeff[:, 2])
-sm_special_X = np.fabs(sm_coeff[:, 2])
+
+## Get the reparametrization of sm_coeff that makes the paths equal
+# Do this by finding one single re-parameterization of the second coefficient
+# that makes the path for the second coefficient (almost) identical.  This
+# same parameterization will work for the other two coefficients since
+# the regularization coefficients (in sk and sm) are related by a constant.
+#
+# special_X is chosen since this coefficient becomes non-zero before the
+# other two...and is relatively monotonic...with both datasets.
+sk_special_X = np.fabs(sk_coeff[:,2])
+sm_special_X = np.fabs(sm_coeff[:,2])
 s = np.zeros(N)
+# Note that sk_special_X will not always be perfectly sorted...
 s = np.searchsorted(sk_special_X, sm_special_X)
+
+## Plot
 plt.figure(2)
 plt.clf()
 plt.grid()
@@ -67,12 +92,12 @@ plt.title('Regularization Paths')
 colors = ['b', 'r', 'k', 'g', 'm', 'c', 'y']
 for coeff, name in [(sm_coeff, 'sm'), (sk_coeff, 'sk')]:
     if name == 'sk':
-        ltype = 'x'
-        t = lrange(N)
+        ltype = 'x'  # linetype
+        t = lrange(N)  # The 'time' parameter
     else:
         ltype = 'o'
         t = s
     for i in range(K):
-        plt.plot(t, coeff[:, i], ltype + colors[i], label=name + '-X' + str(i))
+        plt.plot(t, coeff[:,i], ltype+colors[i], label=name+'-X'+str(i))
 plt.legend(loc='best')
 plt.show()
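A minimal sketch of the alignment trick described in the comments: the two |coefficient| paths below are made-up numbers, only meant to show how np.searchsorted produces the re-indexing s used above:

    import numpy as np

    sk_path = np.array([0.00, 0.05, 0.20, 0.45, 0.80])  # made-up sklearn |coef| path
    sm_path = np.array([0.10, 0.30, 0.60])              # made-up statsmodels |coef| path
    s = np.searchsorted(sk_path, sm_path)               # where each sm value falls in the sk path
    print(s)                                            # -> [2 3 4]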
diff --git a/statsmodels/examples/run_all.py b/statsmodels/examples/run_all.py
index 64c529e37..aab31e69e 100644
--- a/statsmodels/examples/run_all.py
+++ b/statsmodels/examples/run_all.py
@@ -1,4 +1,4 @@
-"""run all examples to make sure we do not get an exception
+'''run all examples to make sure we do not get an exception

 Note:
 If an example contains plt.show(), then all plot windows have to be closed
@@ -6,47 +6,64 @@ manually, at least in my setup.

 uncomment plt.show() to show all plot windows

-"""
+'''
 from statsmodels.compat.python import lzip, input
-import matplotlib.pyplot as plt
+import matplotlib.pyplot as plt #matplotlib is required for many examples
+
 stop_on_error = True
+
+
 filelist = ['example_glsar.py', 'example_wls.py', 'example_gls.py',
-    'example_glm.py', 'example_ols_tftest.py', 'example_ols.py',
-    'example_ols_minimal.py', 'example_rlm.py', 'example_discrete.py',
-    'example_predict.py', 'example_ols_table.py', 'tut_ols.py',
-    'tut_ols_rlm.py', 'tut_ols_wls.py']
+            'example_glm.py', 'example_ols_tftest.py', #'example_rpy.py',
+            'example_ols.py', 'example_ols_minimal.py', 'example_rlm.py',
+            'example_discrete.py', 'example_predict.py',
+            'example_ols_table.py',
+            'tut_ols.py', 'tut_ols_rlm.py', 'tut_ols_wls.py']
+
 use_glob = True
 if use_glob:
     import glob
     filelist = glob.glob('*.py')
+
 print(lzip(range(len(filelist)), filelist))
+
 for fname in ['run_all.py', 'example_rpy.py']:
     filelist.remove(fname)
+
+#filelist = filelist[15:]
+
+
+
+#temporarily disable show
 plt_show = plt.show
+def noop(*args):
+    pass
 plt.show = noop
-cont = input(
-    """Are you sure you want to run all of the examples?
+
+cont = input("""Are you sure you want to run all of the examples?
 This is done mainly to check that they are up to date.
-(y/n) >>> """
-    )
+(y/n) >>> """)
 has_errors = []
 if 'y' in cont.lower():
     for run_all_f in filelist:
         try:
-            print('\n\nExecuting example file', run_all_f)
-            print('-----------------------' + '-' * len(run_all_f))
-            with open(run_all_f, encoding='utf-8') as f:
+            print("\n\nExecuting example file", run_all_f)
+            print("-----------------------" + "-"*len(run_all_f))
+            with open(run_all_f, encoding="utf-8") as f:
                 exec(f.read())
         except:
-            print('**********************' + '*' * len(run_all_f))
-            print('ERROR in example file', run_all_f)
-            print('**********************' + '*' * len(run_all_f))
+            #f might be overwritten in the executed file
+            print("**********************" + "*"*len(run_all_f))
+            print("ERROR in example file", run_all_f)
+            print("**********************" + "*"*len(run_all_f))
             has_errors.append(run_all_f)
             if stop_on_error:
                 raise
-print("""
-Modules that raised exception:""")
+
+print('\nModules that raised exception:')
 print(has_errors)
+
+#reenable show after closing windows
 plt.close('all')
 plt.show = plt_show
 plt.show()
diff --git a/statsmodels/examples/try_2regress.py b/statsmodels/examples/try_2regress.py
index 491f253ca..536f1acd1 100644
--- a/statsmodels/examples/try_2regress.py
+++ b/statsmodels/examples/try_2regress.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """F test for null hypothesis that coefficients in two regressions are the same

 see discussion in http://mail.scipy.org/pipermail/scipy-user/2010-March/024851.html
@@ -5,41 +6,57 @@ see discussion in http://mail.scipy.org/pipermail/scipy-user/2010-March/024851.h
 Created on Thu Mar 25 22:56:45 2010
 Author: josef-pktd
 """
+
 import numpy as np
 from numpy.testing import assert_almost_equal
 import statsmodels.api as sm
+
 np.random.seed(87654589)
-nobs = 10
+
+nobs = 10 #100
 x1 = np.random.randn(nobs)
-y1 = 10 + 15 * x1 + 2 * np.random.randn(nobs)
+y1 = 10 + 15*x1 + 2*np.random.randn(nobs)
+
 x1 = sm.add_constant(x1, prepend=False)
-assert_almost_equal(x1, np.vander(x1[:, 0], 2), 16)
+assert_almost_equal(x1, np.vander(x1[:,0],2), 16)
 res1 = sm.OLS(y1, x1).fit()
 print(res1.params)
-print(np.polyfit(x1[:, 0], y1, 1))
-assert_almost_equal(res1.params, np.polyfit(x1[:, 0], y1, 1), 14)
-print(res1.summary(xname=['x1', 'const1']))
+print(np.polyfit(x1[:,0], y1, 1))
+assert_almost_equal(res1.params, np.polyfit(x1[:,0], y1, 1), 14)
+print(res1.summary(xname=['x1','const1']))
+
+#regression 2
 x2 = np.random.randn(nobs)
-y2 = 19 + 17 * x2 + 2 * np.random.randn(nobs)
+y2 = 19 + 17*x2 + 2*np.random.randn(nobs)
+#y2 = 10 + 15*x2 + 2*np.random.randn(nobs)  # if H0 is true
+
 x2 = sm.add_constant(x2, prepend=False)
-assert_almost_equal(x2, np.vander(x2[:, 0], 2), 16)
+assert_almost_equal(x2, np.vander(x2[:,0],2), 16)
+
 res2 = sm.OLS(y2, x2).fit()
 print(res2.params)
-print(np.polyfit(x2[:, 0], y2, 1))
-assert_almost_equal(res2.params, np.polyfit(x2[:, 0], y2, 1), 14)
-print(res2.summary(xname=['x2', 'const2']))
-x = np.concatenate((x1, x2), 0)
-y = np.concatenate((y1, y2))
-dummy = np.arange(2 * nobs) > nobs - 1
-x = np.column_stack((x, x * dummy[:, None]))
+print(np.polyfit(x2[:,0], y2, 1))
+assert_almost_equal(res2.params, np.polyfit(x2[:,0], y2, 1), 14)
+print(res2.summary(xname=['x2','const2']))
+
+
+# joint regression
+
+x = np.concatenate((x1,x2),0)
+y = np.concatenate((y1,y2))
+dummy = np.arange(2*nobs)>nobs-1
+x = np.column_stack((x,x*dummy[:,None]))
+
 res = sm.OLS(y, x).fit()
-print(res.summary(xname=['x', 'const', 'x2', 'const2']))
-print("""
-F test for equal coefficients in 2 regression equations""")
-print(res.f_test([[0, 0, 1, 0], [0, 0, 0, 1]]))
-print("""
-checking coefficients individual versus joint""")
+print(res.summary(xname=['x','const','x2','const2']))
+
+print('\nF test for equal coefficients in 2 regression equations')
+#effect of dummy times second regression is zero
+#is equivalent to 3rd and 4th coefficient are both zero
+print(res.f_test([[0,0,1,0],[0,0,0,1]]))
+
+print('\nchecking coefficients individual versus joint')
 print(res1.params, res2.params)
-print(res.params[:2], res.params[:2] + res.params[2:])
+print(res.params[:2], res.params[:2]+res.params[2:])
 assert_almost_equal(res1.params, res.params[:2], 13)
-assert_almost_equal(res2.params, res.params[:2] + res.params[2:], 13)
+assert_almost_equal(res2.params, res.params[:2]+res.params[2:], 13)
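The F test above is the classic pooled-versus-separate (Chow-type) comparison; a minimal sketch of the same hypothesis via nested models, reusing the y, x, x1 and x2 arrays defined in this script:

    import numpy as np
    import statsmodels.api as sm

    res_unrestricted = sm.OLS(y, x).fit()                  # separate coefficients per sample
    res_restricted = sm.OLS(y, np.vstack((x1, x2))).fit()  # common coefficients
    # compare_f_test returns (F statistic, p-value, df difference); the F statistic
    # should match res.f_test([[0, 0, 1, 0], [0, 0, 0, 1]]) above
    print(res_unrestricted.compare_f_test(res_restricted))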
diff --git a/statsmodels/examples/try_fit_constrained.py b/statsmodels/examples/try_fit_constrained.py
index 7acc44ab4..970570237 100644
--- a/statsmodels/examples/try_fit_constrained.py
+++ b/statsmodels/examples/try_fit_constrained.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri May 30 22:56:57 2014

@@ -5,65 +6,114 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from numpy.testing import assert_allclose, assert_raises
-from statsmodels.base._constraints import TransformRestriction, fit_constrained, transform_params_constraint
+
+from statsmodels.base._constraints import (
+    TransformRestriction,
+    fit_constrained,
+    transform_params_constraint,
+)
+
 if __name__ == '__main__':
+
+
+
     R = np.array([[1, 1, 0, 0, 0], [0, 0, 1, -1, 0]])
+
+
+
     k_constr, k_vars = R.shape
+
     m = np.eye(k_vars) - R.T.dot(np.linalg.pinv(R).T)
     evals, evecs = np.linalg.eigh(m)
+
     L = evecs[:, :k_constr]
     T = evecs[:, k_constr:]
+
     print(T.T.dot(np.eye(k_vars)))
+
     tr = np.column_stack((T, R.T))
+
     q = [2, 0]
     tr0 = TransformRestriction(R, q)
-    p_reduced = [1, 1, 1]
+
+    p_reduced = [1,1,1]
+    #round trip test
     assert_allclose(tr0.reduce(tr0.expand(p_reduced)), p_reduced, rtol=1e-14)
+
+
     p = tr0.expand(p_reduced)
     assert_allclose(R.dot(p), q, rtol=1e-14)
+
+    # inconsistent restrictions
+    #Ri = np.array([[1, 1, 0, 0, 0], [0, 0, 1, -1, 0], [0, 0, 1, -2, 0]])
     R = np.array([[1, 1, 0, 0, 0], [0, 0, 1, -1, 0], [0, 0, 1, 0, -1]])
     q = np.zeros(R.shape[0])
-    tr1 = TransformRestriction(R, q)
-    p = tr1.expand([1, 1])
+    tr1 = TransformRestriction(R, q)  # bug raises error with q
+    p = tr1.expand([1,1])
+
+    # inconsistent restrictions that has a solution with p3=0
     Ri = np.array([[1, 1, 0, 0, 0], [0, 0, 1, -1, 0], [0, 0, 1, -2, 0]])
     tri = TransformRestriction(Ri, [0, 1, 1])
-    p = tri.expand([1, 1])
-    print(p[[2, 3]])
+    # Note: the only way this can hold is if variable 3 is zero
+    p = tri.expand([1,1])
+    print(p[[2,3]])
+    #array([  1.00000000e+00,  -1.34692639e-17])
+
+    # inconsistent without any possible solution
     Ri2 = np.array([[0, 0, 0, 1, 0], [0, 0, 1, -1, 0], [0, 0, 1, -2, 0]])
     q = [1, 1]
+    #tri2 = TransformRestriction(Ri2, q)
+    #p = tri.expand([1,1])
     assert_raises(ValueError, TransformRestriction, Ri2, q)
+    # L does not have full row rank, calculating constant fails with Singular Matrix
+
+    # transform data xr = T x
     np.random.seed(1)
     x = np.random.randn(10, 5)
     xr = tr1.reduce(x)
+    # roundtrip
     x2 = tr1.expand(xr)
+    # this does not hold ? do not use constant? do not need it anyway ?
+    #assert_allclose(x2, x, rtol=1e-14)
+
+
     from patsy import DesignInfo
+
     names = 'a b c d'.split()
     LC = DesignInfo(names).linear_constraint('a + b = 0')
-    LC = DesignInfo(names).linear_constraint(['a + b = 0', 'a + 2*c = 1',
-        'b-a', 'c-a', 'd-a'])
+    LC = DesignInfo(names).linear_constraint(['a + b = 0', 'a + 2*c = 1', 'b-a', 'c-a', 'd-a'])
+    #LC = DesignInfo(self.model.exog_names).linear_constraint(r_matrix)
     r_matrix, q_matrix = LC.coefs, LC.constants
+
     np.random.seed(123)
     nobs = 20
     x = 1 + np.random.randn(nobs, 4)
     exog = np.column_stack((np.ones(nobs), x))
     endog = exog.sum(1) + np.random.randn(nobs)
+
     from statsmodels.regression.linear_model import OLS
     res2 = OLS(endog, exog).fit()
-    transf = TransformRestriction([[0, 0, 0, 1, 1]], res2.params[-2:].sum())
+    #transf = TransformRestriction(np.eye(exog.shape[1])[:2], res2.params[:2] / 2)
+    transf = TransformRestriction([[0, 0, 0,1,1]], res2.params[-2:].sum())
     exog_st = transf.reduce(exog)
     res1 = OLS(endog, exog_st).fit()
+    # need to correct for constant/offset in the optimization
     res1 = OLS(endog - exog.dot(transf.constant.squeeze()), exog_st).fit()
     params = transf.expand(res1.params).squeeze()
     assert_allclose(params, res2.params, rtol=1e-13)
     print(res2.params)
     print(params)
     print(res1.params)
+
     res3_ols = OLS(endog - exog[:, -1], exog[:, :-2]).fit()
-    transf3 = TransformRestriction([[0, 0, 0, 1, 0], [0, 0, 0, 0, 1]], [0, 1])
+    #transf = TransformRestriction(np.eye(exog.shape[1])[:2], res2.params[:2] / 2)
+    transf3 = TransformRestriction([[0, 0, 0, 1, 0],[0, 0, 0, 0, 1]], [0, 1])
     exog3_st = transf3.reduce(exog)
     res3 = OLS(endog, exog3_st).fit()
+    # need to correct for constant/offset in the optimization
     res3 = OLS(endog - exog.dot(transf3.constant.squeeze()), exog3_st).fit()
     params = transf3.expand(res3.params).squeeze()
     assert_allclose(params[:-2], res3_ols.params, rtol=1e-13)
@@ -71,56 +121,68 @@ if __name__ == '__main__':
     print(params)
     print(res3_ols.params)
     print(res3_ols.bse)
-    cov_params3 = transf3.transf_mat.dot(res3.cov_params()).dot(transf3.
-        transf_mat.T)
+    # the following raises `ValueError: cannot test a constant constraint`
+    #tt = res3.t_test(transf3.transf_mat, transf3.constant.squeeze())
+    #print tt.sd
+    cov_params3 = transf3.transf_mat.dot(res3.cov_params()).dot(transf3.transf_mat.T)
     bse3 = np.sqrt(np.diag(cov_params3))
     print(bse3)
-    tp = transform_params_constraint(res2.params, res2.
-        normalized_cov_params, transf3.R, transf3.q)
-    tp = transform_params_constraint(res2.params, res2.cov_params(),
-        transf3.R, transf3.q)
+
+    tp = transform_params_constraint(res2.params, res2.normalized_cov_params,
+                                     transf3.R, transf3.q)
+    tp = transform_params_constraint(res2.params, res2.cov_params(), transf3.R, transf3.q)
+
     import statsmodels.api as sm
     rand_data = sm.datasets.randhie.load()
     rand_exog = rand_data.exog.view(float).reshape(len(rand_data.exog), -1)
     rand_exog = sm.add_constant(rand_exog, prepend=False)
+
+
+    # Fit Poisson model:
     poisson_mod0 = sm.Poisson(rand_data.endog, rand_exog)
-    poisson_res0 = poisson_mod0.fit(method='newton')
+    poisson_res0 = poisson_mod0.fit(method="newton")
+
     R = np.zeros((2, 10))
     R[0, -2] = 1
     R[1, -1] = 1
     transfp = TransformRestriction(R, [0, 1])
     poisson_mod = sm.Poisson(rand_data.endog, rand_exog[:, :-2])
-    poisson_res = poisson_mod.fit(method='newton', offset=rand_exog.dot(
-        transfp.constant.squeeze()))
+    # note wrong offset, why did I put offset in fit ? it's ignored
+    poisson_res = poisson_mod.fit(method="newton", offset=rand_exog.dot(transfp.constant.squeeze()))
+
     exogp_st = transfp.reduce(rand_exog)
     poisson_modr = sm.Poisson(rand_data.endog, exogp_st)
-    poisson_resr = poisson_modr.fit(method='newton')
+    poisson_resr = poisson_modr.fit(method="newton")
     paramsp = transfp.expand(poisson_resr.params).squeeze()
     print('\nPoisson')
     print(paramsp)
     print(poisson_res.params)
+    # error because I do not use the unconstrained basic model
+#    tp = transform_params_constraint(poisson_res.params, poisson_res.cov_params(), transfp.R, transfp.q)
+#    cov_params3 = transf3.transf_mat.dot(res3.cov_params()).dot(transf3.transf_mat.T)
+#    bse3 = np.sqrt(np.diag(cov_params3))
+
+
     poisson_mod0 = sm.Poisson(rand_data.endog, rand_exog)
-    poisson_res0 = poisson_mod0.fit(method='newton')
-    tp = transform_params_constraint(poisson_res0.params, poisson_res0.
-        cov_params(), transfp.R, transfp.q)
-    cov_params3 = transf3.transf_mat.dot(res3.cov_params()).dot(transf3.
-        transf_mat.T)
+    poisson_res0 = poisson_mod0.fit(method="newton")
+    tp = transform_params_constraint(poisson_res0.params, poisson_res0.cov_params(), transfp.R, transfp.q)
+    cov_params3 = transf3.transf_mat.dot(res3.cov_params()).dot(transf3.transf_mat.T)
     bse3 = np.sqrt(np.diag(cov_params3))
-    poisson_mod = sm.Poisson(rand_data.endog, rand_exog[:, :-2], offset=
-        rand_exog[:, -1])
-    poisson_res = poisson_mod.fit(method='newton')
+
+    # try again same example as it was intended
+
+    poisson_mod = sm.Poisson(rand_data.endog, rand_exog[:, :-2], offset=rand_exog[:, -1])
+    poisson_res = poisson_mod.fit(method="newton")
+
     exogp_st = transfp.reduce(rand_exog)
-    poisson_modr = sm.Poisson(rand_data.endog, exogp_st, offset=rand_exog.
-        dot(transfp.constant.squeeze()))
-    poisson_resr = poisson_modr.fit(method='newton')
+    poisson_modr = sm.Poisson(rand_data.endog, exogp_st, offset=rand_exog.dot(transfp.constant.squeeze()))
+    poisson_resr = poisson_modr.fit(method="newton")
     paramsp = transfp.expand(poisson_resr.params).squeeze()
     print('\nPoisson')
     print(paramsp)
     print(poisson_resr.params)
-    tp = transform_params_constraint(poisson_res0.params, poisson_res0.
-        cov_params(), transfp.R, transfp.q)
-    cov_paramsp = transfp.transf_mat.dot(poisson_resr.cov_params()).dot(transfp
-        .transf_mat.T)
+    tp = transform_params_constraint(poisson_res0.params, poisson_res0.cov_params(), transfp.R, transfp.q)
+    cov_paramsp = transfp.transf_mat.dot(poisson_resr.cov_params()).dot(transfp.transf_mat.T)
     bsep = np.sqrt(np.diag(cov_paramsp))
     print(bsep)
     p, cov, res_r = fit_constrained(poisson_mod0, transfp.R, transfp.q)
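For readers who want a compact version of the OLS pattern exercised above (reduce the design, correct for the affine part, fit, expand back), here is a minimal sketch with one made-up restriction b1 + b2 = 1 on a three-column design; all names are local to the sketch:

    import numpy as np
    from statsmodels.base._constraints import TransformRestriction
    from statsmodels.regression.linear_model import OLS

    np.random.seed(0)
    exog = np.column_stack((np.ones(50), np.random.randn(50, 2)))
    endog = exog.dot([1.0, 0.7, 0.3]) + 0.1 * np.random.randn(50)

    R, q = np.array([[0.0, 1.0, 1.0]]), np.array([1.0])  # restriction: b1 + b2 = 1
    transf = TransformRestriction(R, q)

    exog_st = transf.reduce(exog)                         # reduced design
    offset = exog.dot(transf.constant.squeeze())          # affine part implied by q
    res_red = OLS(endog - offset, exog_st).fit()
    params = transf.expand(res_red.params).squeeze()      # back in the full parameter space
    print(params, R.dot(params))                          # R.dot(params) should be ~1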
diff --git a/statsmodels/examples/try_gee.py b/statsmodels/examples/try_gee.py
index 906b3bda4..22af6bbab 100644
--- a/statsmodels/examples/try_gee.py
+++ b/statsmodels/examples/try_gee.py
@@ -1,61 +1,82 @@
+# -*- coding: utf-8 -*-
 """

 Created on Thu Jul 18 14:57:46 2013

 Author: Josef Perktold
 """
+
 import numpy as np
+
 from statsmodels.genmod.generalized_estimating_equations import GEE, GEEMargins
+
 from statsmodels.genmod.families import Gaussian
+
 from statsmodels.genmod.tests import gee_gaussian_simulation_check as gees
-da, va = gees.gen_gendat_ar0(0.6)()
+
+da,va = gees.gen_gendat_ar0(0.6)()
 ga = Gaussian()
-lhs = np.array([[0.0, 1, 1, 0, 0]])
-rhs = np.r_[0.0,]
+lhs = np.array([[0., 1, 1, 0, 0],])
+rhs = np.r_[0.,]
+
 example = []
 if 'constraint' in example:
-    md = GEE(da.endog, da.exog, da.group, da.time, ga, va, constraint=(lhs,
-        rhs))
+    md = GEE(da.endog, da.exog, da.group, da.time, ga, va,
+                     constraint=(lhs, rhs))
     mdf = md.fit()
     print(mdf.summary())
-md2 = GEE(da.endog, da.exog, da.group, da.time, ga, va, constraint=None)
+
+
+md2 = GEE(da.endog, da.exog, da.group, da.time, ga, va,
+                 constraint=None)
 mdf2 = md2.fit()
 print('\n\n')
 print(mdf2.summary())
+
+
 mdf2.use_t = False
 mdf2.df_resid = np.diff(mdf2.model.exog.shape)
 tt2 = mdf2.t_test(np.eye(len(mdf2.params)))
+# need main to get wald_test
+#print mdf2.wald_test(np.eye(len(mdf2.params))[1:])
+
 mdf2.predict(da.exog.mean(0), offset=0)
+# -0.10867809062890971
+
 marg2 = GEEMargins(mdf2, ())
 print(marg2.summary())
+
+
 mdf_nc = md2.fit(cov_type='naive')
 mdf_bc = md2.fit(cov_type='bias_reduced')
+
 mdf_nc.use_t = False
 mdf_nc.df_resid = np.diff(mdf2.model.exog.shape)
 mdf_bc.use_t = False
 mdf_bc.df_resid = np.diff(mdf2.model.exog.shape)
+
 tt_nc = mdf_nc.t_test(np.eye(len(mdf2.params)))
 tt_bc = mdf_bc.t_test(np.eye(len(mdf2.params)))
+
 print('\nttest robust')
 print(tt2)
 print('\nttest naive')
 print(tt_nc)
-print("""
-ttest bias corrected""")
+print('\nttest bias corrected')
 print(tt_bc)
-print("""
-bse after fit option """)
+
+print("\nbse after fit option ")
 bse = np.column_stack((mdf2.bse, mdf2.bse, mdf_nc.bse, mdf_bc.bse))
 print(bse)
-print("""
-implemented `standard_errors`""")
-bse2 = np.column_stack((mdf2.bse, mdf2.standard_errors(), mdf2.
-    standard_errors(covariance_type='naive'), mdf2.standard_errors(
-    covariance_type='bias_reduced')))
+
+print("\nimplemented `standard_errors`")
+bse2 = np.column_stack((mdf2.bse, mdf2.standard_errors(),
+                                 mdf2.standard_errors(covariance_type='naive'),
+                                 mdf2.standard_errors(covariance_type='bias_reduced')))
 print(bse2)
-print('bse and `standard_errors` agree:', np.allclose(bse, bse2))
-print("""
-implied standard errors in t_test""")
+print("bse and `standard_errors` agree:", np.allclose(bse, bse2))
+
+print("\nimplied standard errors in t_test")
 bse1 = np.column_stack((mdf2.bse, tt2.sd, tt_nc.sd, tt_bc.sd))
 print(bse1)
-print('t_test uses correct cov_params:', np.allclose(bse1, bse2))
+print("t_test uses correct cov_params:", np.allclose(bse1, bse2))
diff --git a/statsmodels/examples/try_gof_chisquare.py b/statsmodels/examples/try_gof_chisquare.py
index c839e1fbb..b08695a56 100644
--- a/statsmodels/examples/try_gof_chisquare.py
+++ b/statsmodels/examples/try_gof_chisquare.py
@@ -1,76 +1,100 @@
+# -*- coding: utf-8 -*-
 """

 Created on Thu Feb 28 15:37:53 2013

 Author: Josef Perktold
 """
+
 import numpy as np
 from scipy import stats
-from statsmodels.stats.gof import chisquare, chisquare_power, chisquare_effectsize
+from statsmodels.stats.gof import (chisquare, chisquare_power,
+                                  chisquare_effectsize)
+
 from numpy.testing import assert_almost_equal
+
+
 nobs = 30000
 n_bins = 5
-probs = 1.0 / np.arange(2, n_bins + 2)
+probs = 1./np.arange(2, n_bins + 2)
 probs /= probs.sum()
+#nicer
 probs = np.round(probs, 2)
 probs[-1] = 1 - probs[:-1].sum()
-print('probs', probs)
+print("probs", probs)
 probs_d = probs.copy()
 delta = 0.01
 probs_d[0] += delta
 probs_d[1] -= delta
 probs_cs = probs.cumsum()
-rvs = np.argmax(np.random.rand(nobs, 1) < probs_cs, 1)
+#rvs = np.random.multinomial(n_bins, probs, size=10)
+#rvs = np.round(np.random.randn(10), 2)
+rvs = np.argmax(np.random.rand(nobs,1) < probs_cs, 1)
 print(probs)
-print(np.bincount(rvs) * (1.0 / nobs))
+print(np.bincount(rvs) * (1. / nobs))
+
+
 freq = np.bincount(rvs)
-print(stats.chisquare(freq, nobs * probs))
-print('null', chisquare(freq, nobs * probs))
-print('delta', chisquare(freq, nobs * probs_d))
-chisq_null, pval_null = chisquare(freq, nobs * probs)
-d_null = ((freq / float(nobs) - probs) ** 2 / probs).sum()
+print(stats.chisquare(freq, nobs*probs))
+print('null', chisquare(freq, nobs*probs))
+print('delta', chisquare(freq, nobs*probs_d))
+chisq_null, pval_null = chisquare(freq, nobs*probs)
+
+# effect size ?
+d_null = ((freq / float(nobs) - probs)**2 / probs).sum()
 print(d_null)
-d_delta = ((freq / float(nobs) - probs_d) ** 2 / probs_d).sum()
+d_delta = ((freq / float(nobs) - probs_d)**2 / probs_d).sum()
 print(d_delta)
-d_null_alt = ((probs - probs_d) ** 2 / probs_d).sum()
+d_null_alt = ((probs - probs_d)**2 / probs_d).sum()
 print(d_null_alt)
-print("""
-chisquare with value""")
-chisq, pval = chisquare(freq, nobs * probs_d)
+
+print('\nchisquare with value')
+chisq, pval = chisquare(freq, nobs*probs_d)
 print(stats.ncx2.sf(chisq_null, n_bins, 0.001 * nobs))
 print(stats.ncx2.sf(chisq, n_bins, 0.001 * nobs))
 print(stats.ncx2.sf(chisq, n_bins, d_delta * nobs))
-print(chisquare(freq, nobs * probs_d, value=np.sqrt(d_delta)))
-print(chisquare(freq, nobs * probs_d, value=np.sqrt(chisq / nobs)))
+print(chisquare(freq, nobs*probs_d, value=np.sqrt(d_delta)))
+print(chisquare(freq, nobs*probs_d, value=np.sqrt(chisq / nobs)))
 print()
-assert_almost_equal(stats.chi2.sf(d_delta * nobs, n_bins - 1), chisquare(
-    freq, nobs * probs_d)[1], decimal=13)
+
+assert_almost_equal(stats.chi2.sf(d_delta * nobs, n_bins - 1),
+                    chisquare(freq, nobs*probs_d)[1], decimal=13)
+
 crit = stats.chi2.isf(0.05, n_bins - 1)
-power = stats.ncx2.sf(crit, n_bins - 1, 0.001 ** 2 * nobs)
+power = stats.ncx2.sf(crit, n_bins-1, 0.001**2 * nobs)
+#> library(pwr)
+#> tr = pwr.chisq.test(w =0.001, N =30000 , df = 5-1, sig.level = 0.05, power = NULL)
 assert_almost_equal(power, 0.05147563, decimal=7)
 effect_size = 0.001
 power = chisquare_power(effect_size, nobs, n_bins, alpha=0.05)
 assert_almost_equal(power, 0.05147563, decimal=7)
-print(chisquare(freq, nobs * probs, value=0, ddof=0))
-d_null_alt = ((probs - probs_d) ** 2 / probs).sum()
-print(chisquare(freq, nobs * probs, value=np.sqrt(d_null_alt), ddof=0))
+print(chisquare(freq, nobs*probs, value=0, ddof=0))
+d_null_alt = ((probs - probs_d)**2 / probs).sum()
+print(chisquare(freq, nobs*probs, value=np.sqrt(d_null_alt), ddof=0))
+
+
+#Monte Carlo to check correct size and power of test
+
 d_delta_r = chisquare_effectsize(probs, probs_d)
 n_rep = 10000
 nobs = 3000
 res_boots = np.zeros((n_rep, 6))
 for i in range(n_rep):
-    rvs = np.argmax(np.random.rand(nobs, 1) < probs_cs, 1)
+    rvs = np.argmax(np.random.rand(nobs,1) < probs_cs, 1)
     freq = np.bincount(rvs)
-    res1 = chisquare(freq, nobs * probs)
-    res2 = chisquare(freq, nobs * probs_d)
-    res3 = chisquare(freq, nobs * probs_d, value=d_delta_r)
+    res1 = chisquare(freq, nobs*probs)
+    res2 = chisquare(freq, nobs*probs_d)
+    res3 = chisquare(freq, nobs*probs_d, value=d_delta_r)
     res_boots[i] = [res1[0], res2[0], res3[0], res1[1], res2[1], res3[1]]
+
 alpha = np.array([0.01, 0.05, 0.1, 0.25, 0.5])
-chi2_power = chisquare_power(chisquare_effectsize(probs, probs_d), 3000,
-    n_bins, alpha=[0.01, 0.05, 0.1, 0.25, 0.5])
+chi2_power = chisquare_power(chisquare_effectsize(probs, probs_d), 3000, n_bins,
+                             alpha=[0.01, 0.05, 0.1, 0.25, 0.5])
 print((res_boots[:, 3:] < 0.05).mean(0))
 reject_freq = (res_boots[:, 3:, None] < alpha).mean(0)
 reject = (res_boots[:, 3:, None] < alpha).sum(0)
+
 desired = np.column_stack((alpha, chi2_power, alpha)).T
+
 print('relative difference Monte Carlo rejection and expected (in %)')
 print((reject_freq / desired - 1) * 100)
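The Monte Carlo check above leans on the noncentral chi-square approximation; as a minimal sketch (with a made-up effect size and sample size), chisquare_power is essentially this formula:

    from scipy import stats
    from statsmodels.stats.gof import chisquare_power

    w, nobs, n_bins, alpha = 0.1, 500, 5, 0.05            # made-up values
    crit = stats.chi2.isf(alpha, n_bins - 1)              # critical value of the central chi2
    power_manual = stats.ncx2.sf(crit, n_bins - 1, nobs * w**2)
    print(power_manual, chisquare_power(w, nobs, n_bins, alpha=alpha))  # should agree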
diff --git a/statsmodels/examples/try_polytrend.py b/statsmodels/examples/try_polytrend.py
index 98f3eb926..873e36734 100644
--- a/statsmodels/examples/try_polytrend.py
+++ b/statsmodels/examples/try_polytrend.py
@@ -1,39 +1,64 @@
+
+
 import matplotlib.pyplot as plt
 import numpy as np
 from scipy import special
+
 import statsmodels.api as sm
 from statsmodels.datasets.macrodata import data
+
+#import statsmodels.linear_model.regression as smreg
+
+
 dta = data.load()
 gdp = np.log(dta.data['realgdp'])
+
+
 maxorder = 20
 polybase = special.chebyt
 polybase = special.legendre
-t = np.linspace(-1, 1, len(gdp))
+
+t = np.linspace(-1,1,len(gdp))
+
 exog = np.column_stack([polybase(i)(t) for i in range(maxorder)])
-fitted = [sm.OLS(gdp, exog[:, :maxr]).fit().fittedvalues for maxr in range(
-    2, maxorder)]
-print((np.corrcoef(exog[:, 1:6], rowvar=0) * 10000).astype(int))
+
+fitted = [sm.OLS(gdp, exog[:, :maxr]).fit().fittedvalues for maxr in
+          range(2,maxorder)]
+
+print((np.corrcoef(exog[:,1:6], rowvar=0)*10000).astype(int))
+
+
 plt.figure()
 plt.plot(gdp, 'o')
-for i in range(maxorder - 2):
+for i in range(maxorder-2):
     plt.plot(fitted[i])
+
 plt.figure()
-for i in range(maxorder - 4, maxorder - 2):
+#plt.plot(gdp, 'o')
+for i in range(maxorder-4, maxorder-2):
+    #plt.figure()
     plt.plot(gdp - fitted[i])
-    plt.title(str(i + 2))
+    plt.title(str(i+2))
+
 plt.figure()
 plt.plot(gdp, '.')
 plt.plot(fitted[-1], lw=2, color='r')
 plt.plot(fitted[0], lw=2, color='g')
 plt.title('GDP and Polynomial Trend')
+
 plt.figure()
 plt.plot(gdp - fitted[-1], lw=2, color='r')
 plt.plot(gdp - fitted[0], lw=2, color='g')
-plt.title(
-    'Residual GDP minus Polynomial Trend (green: linear, red: legendre(20))')
-ex2 = t[:, None] ** np.arange(6)
-q2, r2 = np.linalg.qr(ex2, mode='full')
-np.max(np.abs(np.dot(q2.T, q2) - np.eye(6)))
+plt.title('Residual GDP minus Polynomial Trend (green: linear, red: legendre(20))')
+
+
+#orthonormalize an exog using QR
+
+ex2 = t[:,None]**np.arange(6)  #np.vander has columns reversed
+q2,r2 = np.linalg.qr(ex2, mode='full')
+np.max(np.abs(np.dot(q2.T, q2)-np.eye(6)))
 plt.figure()
 plt.plot(q2, lw=2)
+
+
 plt.show()
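
A small self-contained check of the QR orthonormalization idea used at the end of this example (basis size here is arbitrary):

    import numpy as np

    t = np.linspace(-1, 1, 50)
    ex = t[:, None] ** np.arange(4)              # monomial basis 1, t, t**2, t**3
    q, r = np.linalg.qr(ex)                      # columns of q are orthonormal
    print(np.max(np.abs(q.T @ q - np.eye(4))))   # ~ machine precision
    # OLS fitted values are unchanged because q spans the same column space
    y = np.sin(np.pi * t)
    fit_ex = ex @ np.linalg.lstsq(ex, y, rcond=None)[0]
    fit_q = q @ np.linalg.lstsq(q, y, rcond=None)[0]
    print(np.max(np.abs(fit_ex - fit_q)))        # ~ machine precision
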
diff --git a/statsmodels/examples/try_power.py b/statsmodels/examples/try_power.py
index ff2753630..1a0524e8c 100644
--- a/statsmodels/examples/try_power.py
+++ b/statsmodels/examples/try_power.py
@@ -1,43 +1,69 @@
+# -*- coding: utf-8 -*-
 """

 Created on Sat Mar 02 14:38:17 2013

 Author: Josef Perktold
 """
+
+
 import statsmodels.stats.power as smp
-sigma = 1
-d = 0.3
-nobs = 80
-alpha = 0.05
-print(smp.normal_power(d, nobs / 2, 0.05))
+
+sigma=1
+d=0.3
+nobs=80
+alpha=0.05
+print(smp.normal_power(d, nobs/2, 0.05))
 print(smp.NormalIndPower().power(d, nobs, 0.05))
-print(smp.NormalIndPower().solve_power(effect_size=0.3, nobs1=80, alpha=
-    0.05, power=None))
+print(smp.NormalIndPower().solve_power(effect_size=0.3, nobs1=80, alpha=0.05, power=None))
 print(0.475100870572638, 'R')
-norm_pow = smp.normal_power(-0.01, nobs / 2, 0.05)
+
+norm_pow = smp.normal_power(-0.01, nobs/2, 0.05)
 norm_pow_R = 0.05045832927039234
+#value from R: >pwr.2p.test(h=0.01,n=80,sig.level=0.05,alternative="two.sided")
 print('norm_pow', norm_pow, norm_pow - norm_pow_R)
-norm_pow = smp.NormalIndPower().power(0.01, nobs, 0.05, alternative='larger')
+
+norm_pow = smp.NormalIndPower().power(0.01, nobs, 0.05, alternative="larger")
 norm_pow_R = 0.056869534873146124
+#value from R: >pwr.2p.test(h=0.01,n=80,sig.level=0.05,alternative="greater")
 print('norm_pow', norm_pow, norm_pow - norm_pow_R)
-norm_pow = smp.NormalIndPower().power(-0.01, nobs, 0.05, alternative='larger')
+
+# Note: negative effect size is same as switching one-sided alternative
+# TODO: should I switch to larger/smaller instead of "one-sided" options
+norm_pow = smp.NormalIndPower().power(-0.01, nobs, 0.05, alternative="larger")
 norm_pow_R = 0.0438089705093578
+#value from R: >pwr.2p.test(h=0.01,n=80,sig.level=0.05,alternative="less")
 print('norm_pow', norm_pow, norm_pow - norm_pow_R)
+
+
+#Note: I use n_bins and ddof instead of df
+# pwr.chisq.test(w=0.289,df=(4-1),N=100,sig.level=0.05)
 chi2_pow = smp.GofChisquarePower().power(0.289, 100, 4, 0.05)
 chi2_pow_R = 0.675077657003721
 print('chi2_pow', chi2_pow, chi2_pow - chi2_pow_R)
+
 chi2_pow = smp.GofChisquarePower().power(0.01, 100, 4, 0.05)
 chi2_pow_R = 0.0505845519208533
 print('chi2_pow', chi2_pow, chi2_pow - chi2_pow_R)
+
 chi2_pow = smp.GofChisquarePower().power(2, 100, 4, 0.05)
 chi2_pow_R = 1
 print('chi2_pow', chi2_pow, chi2_pow - chi2_pow_R)
+
 chi2_pow = smp.GofChisquarePower().power(0.9, 100, 4, 0.05)
 chi2_pow_R = 0.999999999919477
 print('chi2_pow', chi2_pow, chi2_pow - chi2_pow_R, 'lower precision ?')
+
 chi2_pow = smp.GofChisquarePower().power(0.8, 100, 4, 0.05)
 chi2_pow_R = 0.999999968205591
 print('chi2_pow', chi2_pow, chi2_pow - chi2_pow_R)
+
+def cohen_es(*args, **kwds):
+    print("You better check what's a meaningful effect size for your question.")
+
+
+#BUG: after fixing 2.sided option, 2 rejection areas
 tt_pow = smp.TTestPower().power(effect_size=0.01, nobs=nobs, alpha=0.05)
 tt_pow_R = 0.05089485285965
+# value from> pwr.t.test(d=0.01,n=80,sig.level=0.05,type="one.sample",alternative="two.sided")
 print('tt_pow', tt_pow, tt_pow - tt_pow_R)
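
The z-test comparisons against R's pwr package above rest on the standard normal-approximation power formula; a rough sketch of the two-sided case, assuming equal group sizes:

    import numpy as np
    from scipy import stats

    def z_power_two_sided(effect_size, nobs, alpha=0.05):
        # noncentrality of the test statistic is effect_size * sqrt(nobs)
        crit = stats.norm.isf(alpha / 2)
        nc = effect_size * np.sqrt(nobs)
        return stats.norm.sf(crit - nc) + stats.norm.cdf(-crit - nc)

    # two independent samples of n1 observations each act like nobs = n1 / 2 here
    print(z_power_two_sided(0.3, 80 / 2))
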
diff --git a/statsmodels/examples/try_power2.py b/statsmodels/examples/try_power2.py
index dd8b31c37..83b50eb8f 100644
--- a/statsmodels/examples/try_power2.py
+++ b/statsmodels/examples/try_power2.py
@@ -1,63 +1,57 @@
+# -*- coding: utf-8 -*-
 """

 Created on Wed Mar 13 13:06:14 2013

 Author: Josef Perktold
 """
+
 from statsmodels.stats.power import TTestPower, TTestIndPower, tt_solve_power
+
 if __name__ == '__main__':
     effect_size, alpha, power = 0.5, 0.05, 0.8
+
     ttest_pow = TTestPower()
     print('\nroundtrip - root with respect to all variables')
-    print("""
-       calculated, desired""")
-    nobs_p = ttest_pow.solve_power(effect_size=effect_size, nobs=None,
-        alpha=alpha, power=power)
+    print('\n       calculated, desired')
+
+    nobs_p = ttest_pow.solve_power(effect_size=effect_size, nobs=None, alpha=alpha, power=power)
     print('nobs  ', nobs_p)
-    print('effect', ttest_pow.solve_power(effect_size=None, nobs=nobs_p,
-        alpha=alpha, power=power), effect_size)
-    print('alpha ', ttest_pow.solve_power(effect_size=effect_size, nobs=
-        nobs_p, alpha=None, power=power), alpha)
-    print('power  ', ttest_pow.solve_power(effect_size=effect_size, nobs=
-        nobs_p, alpha=alpha, power=None), power)
+    print('effect', ttest_pow.solve_power(effect_size=None, nobs=nobs_p, alpha=alpha, power=power), effect_size)
+
+    print('alpha ', ttest_pow.solve_power(effect_size=effect_size, nobs=nobs_p, alpha=None, power=power), alpha)
+    print('power  ', ttest_pow.solve_power(effect_size=effect_size, nobs=nobs_p, alpha=alpha, power=None), power)
+
     print('\nroundtrip - root with respect to all variables')
-    print("""
-       calculated, desired""")
-    print('nobs  ', tt_solve_power(effect_size=effect_size, nobs=None,
-        alpha=alpha, power=power), nobs_p)
-    print('effect', tt_solve_power(effect_size=None, nobs=nobs_p, alpha=
-        alpha, power=power), effect_size)
-    print('alpha ', tt_solve_power(effect_size=effect_size, nobs=nobs_p,
-        alpha=None, power=power), alpha)
-    print('power  ', tt_solve_power(effect_size=effect_size, nobs=nobs_p,
-        alpha=alpha, power=None), power)
+    print('\n       calculated, desired')
+
+    print('nobs  ', tt_solve_power(effect_size=effect_size, nobs=None, alpha=alpha, power=power), nobs_p)
+    print('effect', tt_solve_power(effect_size=None, nobs=nobs_p, alpha=alpha, power=power), effect_size)
+
+    print('alpha ', tt_solve_power(effect_size=effect_size, nobs=nobs_p, alpha=None, power=power), alpha)
+    print('power  ', tt_solve_power(effect_size=effect_size, nobs=nobs_p, alpha=alpha, power=None), power)
+
     print('\none sided')
-    nobs_p1 = tt_solve_power(effect_size=effect_size, nobs=None, alpha=
-        alpha, power=power, alternative='larger')
+    nobs_p1 = tt_solve_power(effect_size=effect_size, nobs=None, alpha=alpha, power=power, alternative='larger')
     print('nobs  ', nobs_p1)
-    print('effect', tt_solve_power(effect_size=None, nobs=nobs_p1, alpha=
-        alpha, power=power, alternative='larger'), effect_size)
-    print('alpha ', tt_solve_power(effect_size=effect_size, nobs=nobs_p1,
-        alpha=None, power=power, alternative='larger'), alpha)
-    print('power  ', tt_solve_power(effect_size=effect_size, nobs=nobs_p1,
-        alpha=alpha, power=None, alternative='larger'), power)
+    print('effect', tt_solve_power(effect_size=None, nobs=nobs_p1, alpha=alpha, power=power, alternative='larger'), effect_size)
+    print('alpha ', tt_solve_power(effect_size=effect_size, nobs=nobs_p1, alpha=None, power=power, alternative='larger'), alpha)
+    print('power  ', tt_solve_power(effect_size=effect_size, nobs=nobs_p1, alpha=alpha, power=None, alternative='larger'), power)
+
+    #start_ttp = dict(effect_size=0.01, nobs1=10., alpha=0.15, power=0.6)
+
     ttind_solve_power = TTestIndPower().solve_power
+
     print('\nroundtrip - root with respect to all variables')
-    print("""
-       calculated, desired""")
-    nobs_p2 = ttind_solve_power(effect_size=effect_size, nobs1=None, alpha=
-        alpha, power=power)
+    print('\n       calculated, desired')
+
+    nobs_p2 = ttind_solve_power(effect_size=effect_size, nobs1=None, alpha=alpha, power=power)
     print('nobs  ', nobs_p2)
-    print('effect', ttind_solve_power(effect_size=None, nobs1=nobs_p2,
-        alpha=alpha, power=power), effect_size)
-    print('alpha ', ttind_solve_power(effect_size=effect_size, nobs1=
-        nobs_p2, alpha=None, power=power), alpha)
-    print('power  ', ttind_solve_power(effect_size=effect_size, nobs1=
-        nobs_p2, alpha=alpha, power=None), power)
-    print('ratio  ', ttind_solve_power(effect_size=effect_size, nobs1=
-        nobs_p2, alpha=alpha, power=power, ratio=None), 1)
+    print('effect', ttind_solve_power(effect_size=None, nobs1=nobs_p2, alpha=alpha, power=power), effect_size)
+    print('alpha ', ttind_solve_power(effect_size=effect_size, nobs1=nobs_p2, alpha=None, power=power), alpha)
+    print('power  ', ttind_solve_power(effect_size=effect_size, nobs1=nobs_p2, alpha=alpha, power=None), power)
+    print('ratio  ', ttind_solve_power(effect_size=effect_size, nobs1=nobs_p2, alpha=alpha, power=power, ratio=None), 1)
+
     print('\ncheck ratio')
-    print('smaller power', ttind_solve_power(effect_size=effect_size, nobs1
-        =nobs_p2, alpha=alpha, power=0.7, ratio=None), '< 1')
-    print('larger power ', ttind_solve_power(effect_size=effect_size, nobs1
-        =nobs_p2, alpha=alpha, power=0.9, ratio=None), '> 1')
+    print('smaller power', ttind_solve_power(effect_size=effect_size, nobs1=nobs_p2, alpha=alpha, power=0.7, ratio=None), '< 1')
+    print('larger power ', ttind_solve_power(effect_size=effect_size, nobs1=nobs_p2, alpha=alpha, power=0.9, ratio=None), '> 1')
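
For reference, the solve_power convention exercised throughout this example: whichever argument is passed as None is treated as the unknown and solved for numerically. A minimal sketch:

    from statsmodels.stats.power import TTestIndPower

    n1 = TTestIndPower().solve_power(effect_size=0.5, nobs1=None,
                                     alpha=0.05, power=0.8)
    print(n1)   # roughly 64 observations per group for a two-sided test
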
diff --git a/statsmodels/examples/try_tukey_hsd.py b/statsmodels/examples/try_tukey_hsd.py
index 50e8b2c67..29b60583b 100644
--- a/statsmodels/examples/try_tukey_hsd.py
+++ b/statsmodels/examples/try_tukey_hsd.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """

 Created on Wed Mar 28 15:34:18 2012
@@ -5,13 +6,18 @@ Created on Wed Mar 28 15:34:18 2012
 Author: Josef Perktold
 """
 from io import StringIO
+
 import numpy as np
 from numpy.testing import assert_almost_equal, assert_equal
 from scipy import stats
+
 from statsmodels.stats.libqsturng import qsturng
 from statsmodels.stats.multicomp import tukeyhsd
 import statsmodels.stats.multicomp as multi
-ss = """  43.9  1   1
+
+
+ss = '''\
+  43.9  1   1
   39.0  1   2
   46.7  1   3
   43.8  1   4
@@ -50,8 +56,11 @@ ss = """  43.9  1   1
   43.2  4   7
   38.7  4   8
   40.9  4   9
-  39.7  4  10"""
-ss2 = """1     mental               2
+  39.7  4  10'''
+
+#idx   Treatment StressReduction
+ss2 = '''\
+1     mental               2
 2     mental               2
 3     mental               3
 4     mental               4
@@ -80,8 +89,10 @@ ss2 = """1     mental               2
 27   medical               3
 28   medical               1
 29   medical               3
-30   medical               1"""
-ss3 = """1 24.5
+30   medical               1'''
+
+ss3 = '''\
+1 24.5
 1 23.5
 1 26.4
 1 27.1
@@ -95,60 +106,74 @@ ss3 = """1 24.5
 3 28.3
 3 24.3
 3 26.2
-3 27.8"""
-cylinders = np.array([8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 8, 8, 8, 8,
-    8, 8, 4, 6, 6, 6, 4, 4, 4, 4, 4, 4, 6, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8,
-    8, 6, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 4, 4, 4, 4, 4, 8, 4, 6, 6, 8, 8,
-    8, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6, 6, 4,
-    6, 4, 4, 4, 4, 4, 4, 4, 4])
-cyl_labels = np.array(['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA',
-    'USA', 'USA', 'USA', 'France', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA',
-    'USA', 'USA', 'USA', 'Japan', 'USA', 'USA', 'USA', 'Japan', 'Germany',
-    'France', 'Germany', 'Sweden', 'Germany', 'USA', 'USA', 'USA', 'USA',
-    'USA', 'Germany', 'USA', 'USA', 'France', 'USA', 'USA', 'USA', 'USA',
-    'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Japan', 'USA',
-    'USA', 'USA', 'USA', 'Germany', 'Japan', 'Japan', 'USA', 'Sweden',
-    'USA', 'France', 'Japan', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA',
-    'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Japan',
-    'Japan', 'USA', 'USA', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan',
-    'Japan', 'USA', 'USA', 'USA', 'USA', 'Japan', 'USA', 'USA', 'USA',
-    'Germany', 'USA', 'USA', 'USA'])
-dta = np.recfromtxt(StringIO(ss), names=('Rust', 'Brand', 'Replication'))
-dta2 = np.recfromtxt(StringIO(ss2), names=('idx', 'Treatment',
-    'StressReduction'))
-dta3 = np.recfromtxt(StringIO(ss3), names=('Brand', 'Relief'))
+3 27.8'''
+
+cylinders = np.array([8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 6, 6, 6, 4, 4,
+                    4, 4, 4, 4, 6, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 6, 6, 6, 6, 4, 4, 4, 4, 6, 6,
+                    6, 6, 4, 4, 4, 4, 4, 8, 4, 6, 6, 8, 8, 8, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
+                    4, 4, 4, 4, 4, 4, 4, 6, 6, 4, 6, 4, 4, 4, 4, 4, 4, 4, 4])
+cyl_labels = np.array(['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'France',
+    'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Japan', 'USA', 'USA', 'USA', 'Japan',
+    'Germany', 'France', 'Germany', 'Sweden', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany',
+    'USA', 'USA', 'France', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany',
+    'Japan', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Japan', 'Japan', 'USA', 'Sweden', 'USA', 'France',
+    'Japan', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA',
+    'Germany', 'Japan', 'Japan', 'USA', 'USA', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'USA',
+    'USA', 'USA', 'USA', 'Japan', 'USA', 'USA', 'USA', 'Germany', 'USA', 'USA', 'USA'])
+
+dta = np.recfromtxt(StringIO(ss), names=("Rust","Brand","Replication"))
+dta2 = np.recfromtxt(StringIO(ss2), names = ("idx", "Treatment", "StressReduction"))
+dta3 = np.recfromtxt(StringIO(ss3), names = ("Brand", "Relief"))
+
+#print tukeyhsd(dta['Brand'], dta['Rust'])
+
+def get_thsd(mci):
+    var_ = np.var(mci.groupstats.groupdemean(), ddof=len(mci.groupsunique))
+    means = mci.groupstats.groupmean
+    nobs = mci.groupstats.groupnobs
+    resi = tukeyhsd(means, nobs, var_, df=None, alpha=0.05, q_crit=qsturng(0.95, len(means), (nobs-1).sum()))
+    print(resi[4])
+    var2 = (mci.groupstats.groupvarwithin() * (nobs - 1)).sum() \
+                                                        / (nobs - 1).sum()
+    assert_almost_equal(var_, var2, decimal=14)
+    return resi
+
 mc = multi.MultiComparison(dta['Rust'], dta['Brand'])
 res = mc.tukeyhsd()
 print(res)
+
 mc2 = multi.MultiComparison(dta2['StressReduction'], dta2['Treatment'])
 res2 = mc2.tukeyhsd()
 print(res2)
-mc2s = multi.MultiComparison(dta2['StressReduction'][3:29], dta2[
-    'Treatment'][3:29])
+
+mc2s = multi.MultiComparison(dta2['StressReduction'][3:29], dta2['Treatment'][3:29])
 res2s = mc2s.tukeyhsd()
 print(res2s)
 res2s_001 = mc2s.tukeyhsd(alpha=0.01)
-tukeyhsd2s = np.array([1.888889, 0.8888889, -1, 0.2658549, -0.5908785, -
-    2.587133, 3.511923, 2.368656, 0.5871331, 0.002837638, 0.150456, 0.1266072]
-    ).reshape(3, 4, order='F')
-assert_almost_equal(res2s_001.confint, tukeyhsd2s[:, 1:3], decimal=3)
+#R result
+tukeyhsd2s = np.array([1.888889,0.8888889,-1,0.2658549,-0.5908785,-2.587133,3.511923,2.368656,0.5871331,0.002837638,0.150456,0.1266072]).reshape(3,4, order='F')
+assert_almost_equal(res2s_001.confint, tukeyhsd2s[:,1:3], decimal=3)
+
 mc3 = multi.MultiComparison(dta3['Relief'], dta3['Brand'])
 res3 = mc3.tukeyhsd()
 print(res3)
-tukeyhsd4 = multi.MultiComparison(cylinders, cyl_labels, group_order=[
-    'Sweden', 'Japan', 'Germany', 'France', 'USA'])
+
+tukeyhsd4 = multi.MultiComparison(cylinders, cyl_labels, group_order=["Sweden", "Japan", "Germany", "France", "USA"])
 res4 = tukeyhsd4.tukeyhsd()
 print(res4)
 try:
     import matplotlib.pyplot as plt
-    fig = res4.plot_simultaneous('USA')
+    fig = res4.plot_simultaneous("USA")
     plt.show()
 except Exception as e:
     print(e)
+
 for mci in [mc, mc2, mc3]:
     get_thsd(mci)
+
 print(mc2.allpairtest(stats.ttest_ind, method='b')[0])
-"""same as SAS:
+
+'''same as SAS:
 >>> np.var(mci.groupstats.groupdemean(), ddof=3)
 4.6773333333333351
 >>> var_ = np.var(mci.groupstats.groupdemean(), ddof=3)
@@ -160,8 +185,10 @@ array([[ 0.95263648,  8.24736352],
 array([[ 0.95098508,  8.24901492],
        [-3.38901492,  3.90901492],
        [-7.98901492, -0.69098508]])
-"""
-ss5 = """Comparisons significant at the 0.05 level are indicated by ***.
+'''
+
+ss5 = '''\
+Comparisons significant at the 0.05 level are indicated by ***.
 BRAND
 Comparison Difference
 Between
@@ -171,18 +198,21 @@ Means Simultaneous 95% Confidence Limits   Sign.
 3 - 2  -4.340  -7.989  -0.691  ***
 3 - 1  0.260   -3.389  3.909    -
 1 - 2  -4.600  -8.249  -0.951  ***
-1 - 3  -0.260  -3.909  3.389   """
-ss5 = """2 - 3 4.340   0.691   7.989   ***
+1 - 3  -0.260  -3.909  3.389   '''
+
+ss5 = '''\
+2 - 3  4.340   0.691   7.989   ***
 2 - 1  4.600   0.951   8.249   ***
 3 - 2  -4.340  -7.989  -0.691  ***
 3 - 1  0.260   -3.389  3.909    -
 1 - 2  -4.600  -8.249  -0.951  ***
-1 - 3  -0.260  -3.909  3.389   """
-dta5 = np.recfromtxt(StringIO(ss5), names=('pair', 'mean', 'lower', 'upper',
-    'sig'), delimiter='\t')
-sas_ = dta5[[1, 3, 2]]
+1 - 3  -0.260  -3.909  3.389   '''
+
+dta5 = np.recfromtxt(StringIO(ss5), names = ('pair', 'mean', 'lower', 'upper', 'sig'), delimiter='\t')
+
+sas_ = dta5[[1,3,2]]
 confint1 = res3.confint
-confint2 = sas_[['lower', 'upper']].view(float).reshape((3, 2))
+confint2 = sas_[['lower','upper']].view(float).reshape((3,2))
 assert_almost_equal(confint1, confint2, decimal=2)
 reject1 = res3.reject
 reject2 = sas_['sig'] == '***'
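
The same kind of comparison can be run through the one-call wrapper; a minimal sketch with made-up data:

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(0)
    y = np.r_[rng.normal(0.0, 1, 30), rng.normal(0.8, 1, 30), rng.normal(0.1, 1, 30)]
    groups = np.repeat(['a', 'b', 'c'], 30)
    print(pairwise_tukeyhsd(y, groups, alpha=0.05))   # mean diffs, CIs, reject flags
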
diff --git a/statsmodels/examples/tsa/ar1cholesky.py b/statsmodels/examples/tsa/ar1cholesky.py
index b6b5c997e..21038de52 100644
--- a/statsmodels/examples/tsa/ar1cholesky.py
+++ b/statsmodels/examples/tsa/ar1cholesky.py
@@ -1,32 +1,42 @@
+# -*- coding: utf-8 -*-
 """
 Created on Thu Oct 21 15:42:18 2010

 Author: josef-pktd
 """
+
 import numpy as np
 from scipy import linalg

-
-def tiny2zero(x, eps=1e-15):
-    """replace abs values smaller than eps by zero, makes copy
-    """
-    pass
+def tiny2zero(x, eps = 1e-15):
+    '''replace abs values smaller than eps by zero, makes copy
+    '''
+    mask = np.abs(x.copy()) <  eps
+    x[mask] = 0
+    return x


 nobs = 5
-autocov = 0.8 ** np.arange(nobs)
-autocov = np.array([3.0, 2.0, 1.0, 0.4, 0.12, 0.016, -0.0112, 0.016, -
-    0.0112, -0.01216, -0.007488, -0.0035584]) / 3.0
+autocov = 0.8**np.arange(nobs)
+#from statsmodels.tsa import arima_process as ap
+#autocov = ap.arma_acf([1, -0.8, 0.2], [1])[:10]
+autocov = np.array([ 3.,  2.,  1.,  0.4,  0.12,  0.016, -0.0112,
+        0.016    , -0.0112   , -0.01216  , -0.007488 , -0.0035584])/3.
 autocov = autocov[:nobs]
 sigma = linalg.toeplitz(autocov)
 sigmainv = linalg.inv(sigma)
+
 c = linalg.cholesky(sigma, lower=True)
 ci = linalg.cholesky(sigmainv, lower=True)
+
 print(sigma)
-print(tiny2zero(ci / ci.max()))
-"""this is the text book transformation"""
-print('coefficient for first observation', np.sqrt(1 - autocov[1] ** 2))
-ci2 = ci[::-1, ::-1].T
-print(tiny2zero(ci2 / ci2.max()))
-print(np.dot(ci / ci.max(), np.ones(nobs)))
-print(np.dot(ci2 / ci2.max(), np.ones(nobs)))
+print(tiny2zero(ci/ci.max()))
+
+"this is the text book transformation"
+print('coefficient for first observation', np.sqrt(1-autocov[1]**2))
+ci2 = ci[::-1,::-1].T
+print(tiny2zero(ci2/ci2.max()))
+
+print(np.dot(ci/ci.max(), np.ones(nobs)))
+
+print(np.dot(ci2/ci2.max(), np.ones(nobs)))
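
With a pure AR(1) autocovariance the flipped Cholesky factor reproduces the textbook whitening exactly; a minimal sketch with an illustrative rho:

    import numpy as np
    from scipy import linalg

    rho, nobs = 0.8, 5
    sigma = linalg.toeplitz(rho ** np.arange(nobs))   # AR(1) correlation matrix
    ci = linalg.cholesky(linalg.inv(sigma), lower=True)
    P = ci[::-1, ::-1].T
    P = P / P.max()
    # textbook transform: sqrt(1 - rho**2) * y[0], then y[t] - rho * y[t-1]
    print(np.round(P, 3))            # first row ~ [0.6, 0, ...], later rows follow the [-0.8, 1] pattern
    print(np.sqrt(1 - rho ** 2))     # 0.6, the coefficient for the first observation
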
diff --git a/statsmodels/examples/tsa/arma_plots.py b/statsmodels/examples/tsa/arma_plots.py
index 571e7f191..e4ab6f79a 100644
--- a/statsmodels/examples/tsa/arma_plots.py
+++ b/statsmodels/examples/tsa/arma_plots.py
@@ -1,45 +1,77 @@
-"""Plot acf and pacf for some ARMA(1,1)
+'''Plot acf and pacf for some ARMA(1,1)
+
+'''
+

-"""
 import numpy as np
 import matplotlib.pyplot as plt
 import matplotlib.ticker as mticker
+
 import statsmodels.tsa.arima_process as tsp
 from statsmodels.graphics.tsaplots import plotacf
+
 np.set_printoptions(precision=2)
-arcoefs = [0.9, 0.0, -0.5]
-macoefs = [0.9, 0.0, -0.5]
+
+
+arcoefs = [0.9, 0., -0.5] #[0.9, 0.5, 0.1, 0., -0.5]
+macoefs = [0.9, 0., -0.5] #[0.9, 0.5, 0.1, 0., -0.5]
 nsample = 1000
 nburnin = 1000
 sig = 1
+
 fig = plt.figure(figsize=(8, 13))
-fig.suptitle('ARMA: Autocorrelation (left) and Partial Autocorrelation (right)'
-    )
+fig.suptitle('ARMA: Autocorrelation (left) and Partial Autocorrelation (right)')
 subplotcount = 1
 nrows = 4
 for arcoef in arcoefs[:-1]:
     for macoef in macoefs[:-1]:
-        ar = np.r_[1.0, -arcoef]
-        ma = np.r_[1.0, macoef]
+        ar = np.r_[1., -arcoef]
+        ma = np.r_[1.,  macoef]
+
+        #from statsmodels.sandbox.tsa.fftarma import ArmaFft as FftArmaProcess
+        #y = tsp.arma_generate_sample(ar,ma,nsample, sig, burnin)
+        #armaprocess = FftArmaProcess(ar, ma, nsample) #TODO: make n optional
+        #armaprocess.plot4()
         armaprocess = tsp.ArmaProcess(ar, ma)
         acf = armaprocess.acf(20)[:20]
         pacf = armaprocess.pacf(20)[:20]
         ax = fig.add_subplot(nrows, 2, subplotcount)
         plotacf(acf, ax=ax)
-        ax.text(0.7, 0.6, 'ar =%s \nma=%s' % (ar, ma), transform=ax.
-            transAxes, horizontalalignment='left', size='xx-small')
-        ax.set_xlim(-1, 20)
-        subplotcount += 1
+##        ax.set_title('Autocorrelation \nar=%s, ma=%rs' % (ar, ma),
+##                     size='xx-small')
+        ax.text(0.7, 0.6, 'ar =%s \nma=%s' % (ar, ma),
+                transform=ax.transAxes,
+                horizontalalignment='left', #'right',
+                size='xx-small')
+        ax.set_xlim(-1,20)
+        subplotcount +=1
         ax = fig.add_subplot(nrows, 2, subplotcount)
         plotacf(pacf, ax=ax)
-        ax.text(0.7, 0.6, 'ar =%s \nma=%s' % (ar, ma), transform=ax.
-            transAxes, horizontalalignment='left', size='xx-small')
-        ax.set_xlim(-1, 20)
-        subplotcount += 1
+##        ax.set_title('Partial Autocorrelation \nar=%s, ma=%rs' % (ar, ma),
+##                     size='xx-small')
+        ax.text(0.7, 0.6, 'ar =%s \nma=%s' % (ar, ma),
+                transform=ax.transAxes,
+                horizontalalignment='left', #'right',
+                size='xx-small')
+        ax.set_xlim(-1,20)
+        subplotcount +=1
+
 axs = fig.axes
+### turn of the 2nd column y tick labels
+##for ax in axs[1::2]:#[:,1].flat:
+##   for label in ax.get_yticklabels(): label.set_visible(False)
+
+# turn off all but the bottom xtick labels
 for ax in axs[:-2]:
     for label in ax.get_xticklabels():
         label.set_visible(False)
-for ax in axs:
-    ax.yaxis.set_major_locator(mticker.MaxNLocator(3))
+
+
+# use a MaxNLocator on the first column y axis if you have a bunch of
+# rows to avoid bunching; example below uses at most 3 ticks
+for ax in axs: #[::2]:#[:,1].flat:
+    ax.yaxis.set_major_locator( mticker.MaxNLocator(3 ))
+
+
+
 plt.show()
diff --git a/statsmodels/examples/tsa/compare_arma.py b/statsmodels/examples/tsa/compare_arma.py
index 7eef98153..fb42f371c 100644
--- a/statsmodels/examples/tsa/compare_arma.py
+++ b/statsmodels/examples/tsa/compare_arma.py
@@ -2,66 +2,76 @@ from time import time
 from statsmodels.tsa.arma_mle import Arma
 from statsmodels.tsa.api import ARMA
 import numpy as np
-print('Battle of the dueling ARMAs')
-y_arma22 = np.loadtxt(
-    'C:\\Josef\\eclipsegworkspace\\statsmodels-josef-experimental-gsoc\\scikits\\statsmodels\\tsa\\y_arma22.txt'
-    )
+
+print("Battle of the dueling ARMAs")
+
+y_arma22 = np.loadtxt(r'C:\Josef\eclipsegworkspace\statsmodels-josef-experimental-gsoc\scikits\statsmodels\tsa\y_arma22.txt')
+
 arma1 = Arma(y_arma22)
 arma2 = ARMA(y_arma22)
-print('The actual results from gretl exact mle are')
-params_mle = np.array([0.82699, -0.333986, 0.0362419, -0.792825])
+
+print("The actual results from gretl exact mle are")
+params_mle = np.array([.826990, -.333986, .0362419, -.792825])
 sigma_mle = 1.094011
 llf_mle = -1510.233
-print('params: ', params_mle)
-print('sigma: ', sigma_mle)
-print('llf: ', llf_mle)
-print('The actual results from gretl css are')
-params_css = np.array([0.82481, -0.337077, 0.0407222, -0.789792])
+print("params: ", params_mle)
+print("sigma: ", sigma_mle)
+print("llf: ", llf_mle)
+print("The actual results from gretl css are")
+params_css = np.array([.824810, -.337077, .0407222, -.789792])
 sigma_css = 1.095688
 llf_css = -1507.301
+
 results = []
-results += ['gretl exact mle', params_mle, sigma_mle, llf_mle]
-results += ['gretl css', params_css, sigma_css, llf_css]
+results += ["gretl exact mle", params_mle, sigma_mle, llf_mle]
+results += ["gretl css", params_css, sigma_css, llf_css]
+
 t0 = time()
-print('Exact MLE - Kalman filter version using l_bfgs_b')
-arma2.fit(order=(2, 2), trend='nc')
+print("Exact MLE - Kalman filter version using l_bfgs_b")
+arma2.fit(order=(2,2), trend='nc')
 t1 = time()
-print('params: ', arma2.params)
-print('sigma: ', arma2.sigma2 ** 0.5)
+print("params: ", arma2.params)
+print("sigma: ", arma2.sigma2**.5)
 arma2.llf = arma2.loglike(arma2._invtransparams(arma2.params))
-results += ['exact mle kalmanf', arma2.params, arma2.sigma2 ** 0.5, arma2.llf]
-print('time used:', t1 - t0)
-t1 = time()
-print('CSS MLE - ARMA Class')
-arma2.fit(order=(2, 2), trend='nc', method='css')
-t2 = time()
+results += ["exact mle kalmanf", arma2.params, arma2.sigma2**.5, arma2.llf]
+print('time used:', t1-t0)
+
+t1=time()
+print("CSS MLE - ARMA Class")
+arma2.fit(order=(2,2), trend='nc', method="css")
+t2=time()
 arma2.llf = arma2.loglike_css(arma2._invtransparams(arma2.params))
-print('params: ', arma2.params)
-print('sigma: ', arma2.sigma2 ** 0.5)
-results += ['css kalmanf', arma2.params, arma2.sigma2 ** 0.5, arma2.llf]
-print('time used:', t2 - t1)
-print('Arma.fit_mle results')
+print("params: ", arma2.params)
+print("sigma: ", arma2.sigma2**.5)
+results += ["css kalmanf", arma2.params, arma2.sigma2**.5, arma2.llf]
+print('time used:', t2-t1)
+
+print("Arma.fit_mle results")
+# have to set nar and nma manually
 arma1.nar = 2
 arma1.nma = 2
-t2 = time()
+t2=time()
 ret = arma1.fit_mle()
-t3 = time()
-print('params, first 4, sigma, last 1 ', ret.params)
-results += ['Arma.fit_mle ', ret.params[:4], ret.params[-1], ret.llf]
-print('time used:', t3 - t2)
-print('Arma.fit method = "ls"')
-t3 = time()
-ret2 = arma1.fit(order=(2, 0, 2), method='ls')
-t4 = time()
+t3=time()
+print("params, first 4, sigma, last 1 ", ret.params)
+results += ["Arma.fit_mle ", ret.params[:4], ret.params[-1], ret.llf]
+print('time used:', t3-t2)
+
+print("Arma.fit method = \"ls\"")
+t3=time()
+ret2 = arma1.fit(order=(2,0,2), method="ls")
+t4=time()
 print(ret2[0])
-results += ['Arma.fit ls', ret2[0]]
-print('time used:', t4 - t3)
-print('Arma.fit method = "CLS"')
-t4 = time()
-ret3 = arma1.fit(order=(2, 0, 2), method='None')
-t5 = time()
+results += ["Arma.fit ls", ret2[0]]
+print('time used:', t4-t3)
+
+print("Arma.fit method = \"CLS\"")
+t4=time()
+ret3 = arma1.fit(order=(2,0,2), method="None")
+t5=time()
 print(ret3)
-results += ['Arma.fit other', ret3[0]]
-print('time used:', t5 - t4)
+results += ["Arma.fit other", ret3[0]]
+print('time used:', t5-t4)
+
 for i in results:
     print(i)
diff --git a/statsmodels/examples/tsa/ex_arma_all.py b/statsmodels/examples/tsa/ex_arma_all.py
index 657ecb3e2..b60920358 100644
--- a/statsmodels/examples/tsa/ex_arma_all.py
+++ b/statsmodels/examples/tsa/ex_arma_all.py
@@ -1,3 +1,5 @@
+
+
 import numpy as np
 from numpy.testing import assert_almost_equal
 import matplotlib.pyplot as plt
@@ -7,43 +9,65 @@ from statsmodels.tsa.arma_mle import Arma
 from statsmodels.tsa.arima_model import ARMA
 from statsmodels.tsa.arima_process import arma_generate_sample
 from statsmodels.miscmodels.tmodel import TArma
-x = fa.ArmaFft([1, -0.5], [1.0, 0.4], 40).generate_sample(size=200, burnin=1000
-    )
+
+
+x = fa.ArmaFft([1, -0.5], [1., 0.4], 40).generate_sample(size=200, burnin=1000)
 d = TsaDescriptive(x)
 d.plot4()
-d.fit((1, 1), trend='nc')
+
+#d.fit(order=(1,1))
+d.fit((1,1), trend='nc')
 print(d.res.params)
+
 modc = Arma(x)
-resls = modc.fit(order=(1, 1))
+resls = modc.fit(order=(1,1))
 print(resls[0])
-rescm = modc.fit_mle(order=(1, 1), start_params=[-0.4, 0.4, 1.0])
+rescm = modc.fit_mle(order=(1,1), start_params=[-0.4,0.4, 1.])
 print(rescm.params)
+
+#decimal 1 corresponds to threshold of 5% difference
 assert_almost_equal(resls[0] / d.res.params, 1, decimal=1)
 assert_almost_equal(rescm.params[:-1] / d.res.params, 1, decimal=1)
+#copied to tsa.tests
+
 plt.figure()
 plt.plot(x, 'b-o')
 plt.plot(modc.predicted(), 'r-')
 plt.figure()
 plt.plot(modc.error_estimate)
+#plt.show()
+
+
 modct = TArma(x)
-reslst = modc.fit(order=(1, 1))
+reslst = modc.fit(order=(1,1))
 print(reslst[0])
-rescmt = modct.fit_mle(order=(1, 1), start_params=[-0.4, 0.4, 10, 1.0],
-    maxiter=500, maxfun=500)
+rescmt = modct.fit_mle(order=(1,1), start_params=[-0.4,0.4, 10, 1.],maxiter=500,
+                       maxfun=500)
 print(rescmt.params)
+
+
 mkf = ARMA(x)
-rkf = mkf.fit((1, 1), trend='nc')
+##rkf = mkf.fit((1,1))
+##rkf.params
+rkf = mkf.fit((1,1), trend='nc')
 print(rkf.params)
+
 np.random.seed(12345)
-y_arma22 = arma_generate_sample([1.0, -0.85, 0.35, -0.1], [1, 0.25, -0.7],
-    nsample=1000)
+y_arma22 = arma_generate_sample([1.,-.85,.35, -0.1],[1,.25,-.7], nsample=1000)
+##arma22 = ARMA(y_arma22)
+##res22 = arma22.fit(trend = 'n', order=(2,2))
+##print 'kf ',res22.params
+##res22css = arma22.fit(method='css',trend = 'n', order=(2,2))
+##print 'css', res22css.params
 mod22 = Arma(y_arma22)
-resls22 = mod22.fit(order=(2, 2))
+resls22 = mod22.fit(order=(2,2))
 print('ls ', resls22[0])
-resmle22 = mod22.fit_mle(order=(2, 2), maxfun=2000)
+resmle22 = mod22.fit_mle(order=(2,2), maxfun=2000)
 print('mle', resmle22.params)
+
 f = mod22.forecast()
 f3 = mod22.forecast3(start=900)[-20:]
+
 print(y_arma22[-10:])
 print(f[-20:])
 print(f3[-109:-90])
diff --git a/statsmodels/examples/tsa/ex_coint.py b/statsmodels/examples/tsa/ex_coint.py
index 06862a9f5..06ee1454b 100644
--- a/statsmodels/examples/tsa/ex_coint.py
+++ b/statsmodels/examples/tsa/ex_coint.py
@@ -1,3 +1,8 @@
+
 from statsmodels.tsa.tests.test_stattools import TestCoint_t
+
+
+#test whether t-test for cointegration equals that produced by Stata
+
 tst = TestCoint_t()
 print(tst.test_tstat())
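
A direct call of the cointegration test itself, for comparison; a minimal sketch with simulated series:

    import numpy as np
    from statsmodels.tsa.stattools import coint

    rng = np.random.default_rng(0)
    x = np.cumsum(rng.normal(size=500))        # random walk
    y = 0.5 * x + rng.normal(size=500)         # cointegrated with x
    t_stat, p_value, crit_values = coint(y, x)
    print(t_stat, p_value)                     # small p-value: reject "no cointegration"
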
diff --git a/statsmodels/examples/tsa/ex_var.py b/statsmodels/examples/tsa/ex_var.py
index 2a04bdbac..d5cef7f84 100644
--- a/statsmodels/examples/tsa/ex_var.py
+++ b/statsmodels/examples/tsa/ex_var.py
@@ -1,26 +1,46 @@
+
 import numpy as np
+
 import statsmodels.api as sm
 from statsmodels.tsa.api import VAR
+
+# some example data
 mdata = sm.datasets.macrodata.load().data
-mdata = mdata[['realgdp', 'realcons', 'realinv']]
+mdata = mdata[['realgdp','realcons','realinv']]
 names = mdata.dtype.names
-data = mdata.view((float, 3))
-use_growthrate = False
+data = mdata.view((float,3))
+
+use_growthrate = False #True #False
 if use_growthrate:
     data = 100 * 4 * np.diff(np.log(data), axis=0)
+
 model = VAR(data, names=names)
 res = model.fit(4)
+
 nobs_all = data.shape[0]
-fc_in = np.array([np.squeeze(res.forecast(model.y[t - 20:t], 1)) for t in
-    range(nobs_all - 6, nobs_all)])
+
+#in-sample 1-step ahead forecasts
+fc_in = np.array([np.squeeze(res.forecast(model.y[t-20:t], 1))
+                  for t in range(nobs_all-6,nobs_all)])
+
 print(fc_in - res.fittedvalues[-6:])
-fc_out = np.array([np.squeeze(VAR(data[:t]).fit(2).forecast(data[t - 20:t],
-    1)) for t in range(nobs_all - 6, nobs_all)])
-print(fc_out - data[nobs_all - 6:nobs_all])
+
+#out-of-sample 1-step ahead forecasts
+fc_out = np.array([np.squeeze(VAR(data[:t]).fit(2).forecast(data[t-20:t], 1))
+                   for t in range(nobs_all-6,nobs_all)])
+
+print(fc_out - data[nobs_all-6:nobs_all])
 print(fc_out - res.fittedvalues[-6:])
+
+
+#out-of-sample h-step ahead forecasts
 h = 2
-fc_out = np.array([VAR(data[:t]).fit(2).forecast(data[t - 20:t], h)[-1] for
-    t in range(nobs_all - 6 - h + 1, nobs_all - h + 1)])
-print(fc_out - data[nobs_all - 6:nobs_all])
+fc_out = np.array([VAR(data[:t]).fit(2).forecast(data[t-20:t], h)[-1]
+                   for t in range(nobs_all-6-h+1,nobs_all-h+1)])
+
+print(fc_out - data[nobs_all-6:nobs_all])  #out-of-sample forecast error
 print(fc_out - res.fittedvalues[-6:])
+
+#import matplotlib.pyplot as plt
 res.plot_forecast(20)
+#plt.show()
diff --git a/statsmodels/examples/tsa/ex_var_reorder.py b/statsmodels/examples/tsa/ex_var_reorder.py
index 19f8cdfbe..98f3fb7ca 100644
--- a/statsmodels/examples/tsa/ex_var_reorder.py
+++ b/statsmodels/examples/tsa/ex_var_reorder.py
@@ -1,3 +1,5 @@
+
 from statsmodels.tsa.vector_ar.tests.test_var import TestVARResults
+
 test_VAR = TestVARResults()
 test_VAR.test_reorder()
diff --git a/statsmodels/examples/tsa/lagpolynomial.py b/statsmodels/examples/tsa/lagpolynomial.py
index f91d2a8c5..63637a8fa 100644
--- a/statsmodels/examples/tsa/lagpolynomial.py
+++ b/statsmodels/examples/tsa/lagpolynomial.py
@@ -1,28 +1,46 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri Oct 22 08:13:38 2010

 Author: josef-pktd
 License: BSD (3-clause)
 """
+
 import numpy as np
 from numpy import polynomial as npp


 class LagPolynomial(npp.Polynomial):

+    #def __init__(self, maxlag):
+
+    def pad(self, maxlag):
+        return LagPolynomial(np.r_[self.coef, np.zeros(maxlag-len(self.coef))])
+
+    def padflip(self, maxlag):
+        return LagPolynomial(np.r_[self.coef, np.zeros(maxlag-len(self.coef))][::-1])
+
     def flip(self):
-        """reverse polynomial coefficients
-        """
-        pass
+        '''reverse polynomial coefficients
+        '''
+        return LagPolynomial(self.coef[::-1])

     def div(self, other, maxlag=None):
-        """padded division, pads numerator with zeros to maxlag
-        """
-        pass
+        '''padded division, pads numerator with zeros to maxlag
+        '''
+        if maxlag is None:
+            maxlag = max(len(self.coef), len(other.coef)) + 1
+        return (self.padflip(maxlag) / other.flip()).flip()
+
+    def filter(self, arr):
+        return (self * arr).coef[:-len(self.coef)]  #trim to end
+


 ar = LagPolynomial([1, -0.8])
 arpad = ar.pad(10)
+
 ma = LagPolynomial([1, 0.1])
 mapad = ma.pad(10)
+
 unit = LagPolynomial([1])
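
The padded division implemented above is one route to the MA(inf) weights of an ARMA process; statsmodels exposes the same result directly, a minimal sketch:

    import numpy as np
    from statsmodels.tsa.arima_process import arma2ma

    ar = np.array([1, -0.8])           # 1 - 0.8 L
    ma = np.array([1, 0.1])            # 1 + 0.1 L
    print(arma2ma(ar, ma, 10))         # psi weights of ma(L) / ar(L)
    # by hand: psi_0 = 1, psi_j = 0.9 * 0.8**(j - 1) for j >= 1
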
diff --git a/statsmodels/examples/tsa/try_ar.py b/statsmodels/examples/tsa/try_ar.py
index 7bfab6097..dd8e086ff 100644
--- a/statsmodels/examples/tsa/try_ar.py
+++ b/statsmodels/examples/tsa/try_ar.py
@@ -1,14 +1,15 @@
+# -*- coding: utf-8 -*-
 """
 Created on Thu Oct 21 21:45:24 2010

 Author: josef-pktd
 """
+
 import numpy as np
 from scipy import signal

-
 def armaloop(arcoefs, macoefs, x):
-    """get arma recursion in simple loop
+    '''get arma recursion in simple loop

     for simplicity assumes that ma polynomial is not longer than the ar-polynomial

@@ -30,26 +31,49 @@ def armaloop(arcoefs, macoefs, x):
     -----
     Except for the treatment of initial observations this is the same as using
     scipy.signal.lfilter, which is much faster. Written for testing only
-    """
-    pass
+    '''
+    arcoefs_r = np.asarray(arcoefs)
+    macoefs_r = np.asarray(macoefs)
+    x = np.asarray(x)
+    nobs = x.shape[0]
+    #assume ar longer than ma
+    arlag = arcoefs_r.shape[0]
+    malag = macoefs_r.shape[0]
+    maxlag = max(arlag, malag)
+    print(maxlag)
+    y = np.zeros(x.shape, float)
+    e = np.zeros(x.shape, float)
+    y[:maxlag] = x[:maxlag]
+
+    #if malag > arlag:
+    for t in range(arlag, maxlag):
+        y[t] = (x[t-arlag:t] * arcoefs_r).sum(0) + (e[:t] * macoefs_r[:t]).sum(0)
+        e[t] = x[t] - y[t]
+
+    for t in range(maxlag, nobs):
+        #wrong broadcasting, 1d only
+        y[t] = (x[t-arlag:t] * arcoefs_r).sum(0) + (e[t-malag:t] * macoefs_r).sum(0)
+        e[t] = x[t] - y[t]

+    return y, e

-arcoefs, macoefs = -np.array([1, -0.8, 0.2])[1:], np.array([1.0, 0.5, 0.1])[1:]
+arcoefs, macoefs = -np.array([1, -0.8, 0.2])[1:], np.array([1., 0.5, 0.1])[1:]
 print(armaloop(arcoefs, macoefs, np.ones(10)))
 print(armaloop([0.8], [], np.ones(10)))
-print(armaloop([0.8], [], np.arange(2, 10)))
-y, e = armaloop([0.1], [0.8], np.arange(2, 10))
+print(armaloop([0.8], [], np.arange(2,10)))
+y, e = armaloop([0.1], [0.8], np.arange(2,10))
 print(e)
-print(signal.lfilter(np.array([1, -0.1]), np.array([1.0, 0.8]), np.arange(2,
-    10)))
+print(signal.lfilter(np.array([1, -0.1]), np.array([1., 0.8]), np.arange(2,10)))
+
 y, e = armaloop([], [0.8], np.ones(10))
 print(e)
-print(signal.lfilter(np.array([1, -0.0]), np.array([1.0, 0.8]), np.ones(10)))
-ic = signal.lfiltic(np.array([1, -0.1]), np.array([1.0, 0.8]), np.ones([0]),
-    np.array([1]))
-print(signal.lfilter(np.array([1, -0.1]), np.array([1.0, 0.8]), np.ones(10),
-    zi=ic))
-zi = signal.lfilter_zi(np.array([1, -0.8, 0.2]), np.array([1.0, 0, 0]))
-print(signal.lfilter(np.array([1, -0.1]), np.array([1.0, 0.8]), np.ones(10),
-    zi=zi))
-print(signal.filtfilt(np.array([1, -0.8]), np.array([1.0]), np.ones(10)))
+print(signal.lfilter(np.array([1, -0.]), np.array([1., 0.8]), np.ones(10)))
+
+ic=signal.lfiltic(np.array([1, -0.1]), np.array([1., 0.8]), np.ones([0]), np.array([1]))
+print(signal.lfilter(np.array([1, -0.1]), np.array([1., 0.8]), np.ones(10), zi=ic))
+
+zi = signal.lfilter_zi(np.array([1, -0.8, 0.2]), np.array([1., 0, 0]))
+print(signal.lfilter(np.array([1, -0.1]), np.array([1., 0.8]), np.ones(10), zi=zi))
+print(signal.filtfilt(np.array([1, -0.8]), np.array([1.]), np.ones(10)))
+
+#todo write examples/test across different versions
diff --git a/statsmodels/examples/tut_ols_ancova.py b/statsmodels/examples/tut_ols_ancova.py
index 20cc8a0a5..60fd0d71b 100644
--- a/statsmodels/examples/tut_ols_ancova.py
+++ b/statsmodels/examples/tut_ols_ancova.py
@@ -1,4 +1,4 @@
-"""Examples OLS
+'''Examples OLS

 Note: uncomment plt.show() to display graphs

@@ -31,27 +31,51 @@ Estimate the model

 strongly rejected because differences in intercept are very large

-"""
+'''
+
 import numpy as np
 import statsmodels.api as sm
 import matplotlib.pyplot as plt
 from statsmodels.sandbox.regression.predstd import wls_prediction_std
+
+#fix a seed for these examples
 np.random.seed(98765789)
+
+#OLS with dummy variables, similar to ANCOVA
+#-------------------------------------------
+
+#construct simulated example:
+#3 groups common slope but different intercepts
+
 nsample = 50
 x1 = np.linspace(0, 20, nsample)
-sig = 1.0
+sig = 1.
+#suppose observations from 3 groups
 xg = np.zeros(nsample, int)
 xg[20:40] = 1
 xg[40:] = 2
-dummy = (xg[:, None] == np.unique(xg)).astype(float)
-X = np.c_[x1, dummy[:, 1:], np.ones(nsample)]
-beta = [1.0, 3, -3, 10]
+#print xg
+dummy = (xg[:,None] == np.unique(xg)).astype(float)
+#use group 0 as benchmark
+X = np.c_[x1, dummy[:,1:], np.ones(nsample)]
+beta = [1., 3, -3, 10]
 y_true = np.dot(X, beta)
 y = y_true + sig * np.random.normal(size=nsample)
+
+#estimate
+#~~~~~~~~
+
 res2 = sm.OLS(y, X).fit()
+#print "estimated parameters: x d1-d0 d2-d0 constant"
 print(res2.params)
+#print "standard deviation of parameter estimates"
 print(res2.bse)
 prstd, iv_l, iv_u = wls_prediction_std(res2)
+#print res.summary()
+
+#plot
+#~~~~
+
 plt.figure()
 plt.plot(x1, y, 'o', x1, y_true, 'b-')
 plt.plot(x1, res2.fittedvalues, 'r--.')
@@ -59,6 +83,15 @@ plt.plot(x1, iv_u, 'r--')
 plt.plot(x1, iv_l, 'r--')
 plt.title('3 groups: different intercepts, common slope; blue: true, red: OLS')
 plt.show()
-R = [[0, 1, 0, 0], [0, 0, 1, 0]]
-print('Test hypothesis that all groups have same intercept')
+
+
+#Test hypothesis that all groups have same intercept
+#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+R = [[0, 1, 0, 0],
+     [0, 0, 1, 0]]
+
+# F test joint hypothesis R * beta = 0
+# i.e. coefficient on both dummy variables equal zero
+print("Test hypothesis that all groups have same intercept")
 print(res2.f_test(R))
diff --git a/statsmodels/examples/tut_ols_rlm_short.py b/statsmodels/examples/tut_ols_rlm_short.py
index f5abc3f58..107ad7208 100644
--- a/statsmodels/examples/tut_ols_rlm_short.py
+++ b/statsmodels/examples/tut_ols_rlm_short.py
@@ -1,4 +1,4 @@
-"""Examples: comparing OLS and RLM
+'''Examples: comparing OLS and RLM

 robust estimators and outliers

@@ -6,37 +6,57 @@ RLM is less influenced by outliers than OLS and has estimated slope
 closer to true slope and not tilted like OLS.

 Note: uncomment plt.show() to display graphs
-"""
+'''
+
 import numpy as np
+#from scipy import stats
 import statsmodels.api as sm
 import matplotlib.pyplot as plt
 from statsmodels.sandbox.regression.predstd import wls_prediction_std
+
+#fix a seed for these examples
 np.random.seed(98765789)
+
 nsample = 50
 x1 = np.linspace(0, 20, nsample)
 X = np.c_[x1, np.ones(nsample)]
-sig = 0.3
-beta = [0.5, 5.0]
+
+sig = 0.3   # smaller error variance makes OLS<->RLM contrast bigger
+beta = [0.5, 5.]
 y_true2 = np.dot(X, beta)
-y2 = y_true2 + sig * 1.0 * np.random.normal(size=nsample)
-y2[[39, 41, 43, 45, 48]] -= 5
+y2 = y_true2 + sig*1. * np.random.normal(size=nsample)
+y2[[39,41,43,45,48]] -= 5   # add some outliers (10% of nsample)
+
+
+# Example: estimate linear function (true is linear)
+
 plt.figure()
 plt.plot(x1, y2, 'o', x1, y_true2, 'b-')
+
+
 res2 = sm.OLS(y2, X).fit()
-print('OLS: parameter estimates: slope, constant')
+print("OLS: parameter estimates: slope, constant")
 print(res2.params)
-print('standard deviation of parameter estimates')
+print("standard deviation of parameter estimates")
 print(res2.bse)
 prstd, iv_l, iv_u = wls_prediction_std(res2)
 plt.plot(x1, res2.fittedvalues, 'r-')
 plt.plot(x1, iv_u, 'r--')
 plt.plot(x1, iv_l, 'r--')
+
+
+#compare with robust estimator
+
 resrlm2 = sm.RLM(y2, X).fit()
-print("""
-RLM: parameter estimates: slope, constant""")
+print("\nRLM: parameter estimates: slope, constant")
 print(resrlm2.params)
-print('standard deviation of parameter estimates')
+print("standard deviation of parameter estimates")
 print(resrlm2.bse)
 plt.plot(x1, resrlm2.fittedvalues, 'g.-')
 plt.title('Data with Outliers; blue: true, red: OLS, green: RLM')
+
+
+# see also help(sm.RLM.fit) for more options and
+# module sm.robust.scale for scale options
+
 plt.show()
diff --git a/statsmodels/formula/api.py b/statsmodels/formula/api.py
index d246756c6..58d8782f5 100644
--- a/statsmodels/formula/api.py
+++ b/statsmodels/formula/api.py
@@ -8,6 +8,7 @@ import statsmodels.regression.quantile_regression as qr_
 import statsmodels.duration.hazard_regression as hr_
 import statsmodels.genmod.generalized_estimating_equations as gee_
 import statsmodels.gam.generalized_additive_model as gam_
+
 gls = lm_.GLS.from_formula
 wls = lm_.WLS.from_formula
 ols = lm_.OLS.from_formula
@@ -29,8 +30,29 @@ glmgam = gam_.GLMGam.from_formula
 conditional_logit = dcm_.ConditionalLogit.from_formula
 conditional_mnlogit = dcm_.ConditionalMNLogit.from_formula
 conditional_poisson = dcm_.ConditionalPoisson.from_formula
+
 del lm_, dm_, mlm_, glm_, roblm_, qr_, hr_, gee_, gam_, dcm_
-__all__ = ['conditional_logit', 'conditional_mnlogit',
-    'conditional_poisson', 'gee', 'glm', 'glmgam', 'gls', 'glsar', 'logit',
-    'mixedlm', 'mnlogit', 'negativebinomial', 'nominal_gee', 'ols',
-    'ordinal_gee', 'phreg', 'poisson', 'probit', 'quantreg', 'rlm', 'wls']
+
+__all__ = [
+    "conditional_logit",
+    "conditional_mnlogit",
+    "conditional_poisson",
+    "gee",
+    "glm",
+    "glmgam",
+    "gls",
+    "glsar",
+    "logit",
+    "mixedlm",
+    "mnlogit",
+    "negativebinomial",
+    "nominal_gee",
+    "ols",
+    "ordinal_gee",
+    "phreg",
+    "poisson",
+    "probit",
+    "quantreg",
+    "rlm",
+    "wls",
+]
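
Typical usage of these formula entry points, a minimal sketch with a made-up DataFrame:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    df = pd.DataFrame({'x': rng.normal(size=100)})
    df['y'] = 1.0 + 2.0 * df['x'] + 0.5 * rng.normal(size=100)
    res = smf.ols('y ~ x', data=df).fit()
    print(res.params)   # Intercept and x close to 1.0 and 2.0
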
diff --git a/statsmodels/formula/formulatools.py b/statsmodels/formula/formulatools.py
index 9c2461beb..1ebeb3fef 100644
--- a/statsmodels/formula/formulatools.py
+++ b/statsmodels/formula/formulatools.py
@@ -1,11 +1,24 @@
 import statsmodels.tools.data as data_util
 from patsy import dmatrices, NAAction
 import numpy as np
+
+# if users want to pass in a different formula framework, they can
+# add their handler here. how to do it interactively?
+
+# this is a mutable object, so editing it should show up in the below
 formula_handler = {}


 class NAAction(NAAction):
-    pass
+    # monkey-patch so we can handle missing values in 'extra' arrays later
+    def _handle_NA_drop(self, values, is_NAs, origins):
+        total_mask = np.zeros(is_NAs[0].shape[0], dtype=bool)
+        for is_NA in is_NAs:
+            total_mask |= is_NA
+        good_mask = ~total_mask
+        self.missing_mask = total_mask
+        # "..." to handle 1- versus 2-dim indexing
+        return [v[good_mask, ...] for v in values]


 def handle_formula_data(Y, X, formula, depth=0, missing='drop'):
@@ -32,24 +45,67 @@ def handle_formula_data(Y, X, formula, depth=0, missing='drop'):
     exog : array_like
         Should preserve the input type of Y,X. Could be None.
     """
-    pass
+    # half ass attempt to handle other formula objects
+    if isinstance(formula, tuple(formula_handler.keys())):
+        return formula_handler[type(formula)]
+
+    na_action = NAAction(on_NA=missing)
+
+    if X is not None:
+        if data_util._is_using_pandas(Y, X):
+            result = dmatrices(formula, (Y, X), depth,
+                               return_type='dataframe', NA_action=na_action)
+        else:
+            result = dmatrices(formula, (Y, X), depth,
+                               return_type='dataframe', NA_action=na_action)
+    else:
+        if data_util._is_using_pandas(Y, None):
+            result = dmatrices(formula, Y, depth, return_type='dataframe',
+                               NA_action=na_action)
+        else:
+            result = dmatrices(formula, Y, depth, return_type='dataframe',
+                               NA_action=na_action)
+
+    # if missing == 'raise' there's not missing_mask
+    missing_mask = getattr(na_action, 'missing_mask', None)
+    if not np.any(missing_mask):
+        missing_mask = None
+    if len(result) > 1:  # have RHS design
+        design_info = result[1].design_info  # detach it from DataFrame
+    else:
+        design_info = None
+    # NOTE: is there ever a case where we'd need LHS design_info?
+    return result, missing_mask, design_info


 def _remove_intercept_patsy(terms):
     """
     Remove intercept from Patsy terms.
     """
-    pass
+    from patsy.desc import INTERCEPT
+    if INTERCEPT in terms:
+        terms.remove(INTERCEPT)
+    return terms
+
+
+def _has_intercept(design_info):
+    from patsy.desc import INTERCEPT
+    return INTERCEPT in design_info.terms


 def _intercept_idx(design_info):
     """
     Returns boolean array index indicating which column holds the intercept.
     """
-    pass
+    from patsy.desc import INTERCEPT
+    from numpy import array
+    return array([INTERCEPT == i for i in design_info.terms])


 def make_hypotheses_matrices(model_results, test_formula):
     """
     """
-    pass
+    from patsy.constraint import linear_constraint
+    exog_names = model_results.model.exog_names
+    LC = linear_constraint(test_formula, exog_names)
+    return LC
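
The NAAction subclass above extends patsy's 'drop' handling so that the dropped-row mask is retained; the underlying drop behaviour itself, a minimal sketch:

    import numpy as np
    import pandas as pd
    from patsy import dmatrices

    df = pd.DataFrame({'y': [1.0, 2.0, np.nan, 4.0],
                       'x': [1.0, np.nan, 3.0, 4.0]})
    y, X = dmatrices('y ~ x', df, return_type='dataframe', NA_action='drop')
    print(len(y), len(X))   # 2 2: rows containing any NaN are dropped from both sides
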
diff --git a/statsmodels/gam/api.py b/statsmodels/gam/api.py
index 5a7d9f6f8..c1f7c5cf7 100644
--- a/statsmodels/gam/api.py
+++ b/statsmodels/gam/api.py
@@ -1,4 +1,5 @@
 from .generalized_additive_model import GLMGam
 from .gam_cross_validation.gam_cross_validation import MultivariateGAMCVPath
 from .smooth_basis import BSplines, CyclicCubicSplines
-__all__ = ['BSplines', 'CyclicCubicSplines', 'GLMGam', 'MultivariateGAMCVPath']
+
+__all__ = ["BSplines", "CyclicCubicSplines", "GLMGam", "MultivariateGAMCVPath"]
diff --git a/statsmodels/gam/gam_cross_validation/cross_validators.py b/statsmodels/gam/gam_cross_validation/cross_validators.py
index 54a0f2b05..2551e4ae0 100644
--- a/statsmodels/gam/gam_cross_validation/cross_validators.py
+++ b/statsmodels/gam/gam_cross_validation/cross_validators.py
@@ -1,9 +1,11 @@
+# -*- coding: utf-8 -*-
 """
 Cross-validation iterators for GAM

 Author: Luca Puggini

 """
+
 from abc import ABCMeta, abstractmethod
 from statsmodels.compat.python import with_metaclass
 import numpy as np
@@ -14,10 +16,13 @@ class BaseCrossValidator(with_metaclass(ABCMeta)):
     The BaseCrossValidator class is a base class for all the iterators that
     split the data in train and test as for example KFolds or LeavePOut
     """
-
     def __init__(self):
         pass

+    @abstractmethod
+    def split(self):
+        pass
+

 class KFold(BaseCrossValidator):
     """
@@ -46,4 +51,17 @@ class KFold(BaseCrossValidator):
     def split(self, X, y=None, label=None):
         """yield index split into train and test sets
         """
-        pass
+        # TODO: X and y are redundant, we only need nobs
+
+        nobs = X.shape[0]
+        index = np.array(range(nobs))
+
+        if self.shuffle:
+            np.random.shuffle(index)
+
+        folds = np.array_split(index, self.k_folds)
+        for fold in folds:
+            test_index = np.zeros(nobs, dtype=bool)
+            test_index[fold] = True
+            train_index = np.logical_not(test_index)
+            yield train_index, test_index
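
A minimal usage sketch of this iterator, assuming the constructor takes k_folds and shuffle as its docstring suggests:

    import numpy as np
    from statsmodels.gam.gam_cross_validation.cross_validators import KFold

    X = np.arange(20.0).reshape(10, 2)
    kf = KFold(k_folds=5, shuffle=False)        # assumed signature
    for train_index, test_index in kf.split(X):
        # boolean masks over the 10 rows; each fold holds out 2 of them
        print(test_index.sum(), train_index.sum())
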
diff --git a/statsmodels/gam/gam_cross_validation/gam_cross_validation.py b/statsmodels/gam/gam_cross_validation/gam_cross_validation.py
index dd19b78ff..17329cc19 100644
--- a/statsmodels/gam/gam_cross_validation/gam_cross_validation.py
+++ b/statsmodels/gam/gam_cross_validation/gam_cross_validation.py
@@ -1,14 +1,17 @@
+# -*- coding: utf-8 -*-
 """
 Cross-validation classes for GAM

 Author: Luca Puggini

 """
+
 from abc import ABCMeta, abstractmethod
 from statsmodels.compat.python import with_metaclass
 import itertools
 import numpy as np
-from statsmodels.gam.smooth_basis import GenericSmoothers, UnivariateGenericSmoother
+from statsmodels.gam.smooth_basis import (GenericSmoothers,
+                                          UnivariateGenericSmoother)


 class BaseCV(with_metaclass(ABCMeta)):
@@ -22,8 +25,27 @@ class BaseCV(with_metaclass(ABCMeta)):
         self.cv_iterator = cv_iterator
         self.exog = exog
         self.endog = endog
-        self.train_test_cv_indices = self.cv_iterator.split(self.exog, self
-            .endog, label=None)
+        # TODO: cv_iterator.split only needs nobs from endog or exog
+        self.train_test_cv_indices = self.cv_iterator.split(self.exog,
+                                                            self.endog,
+                                                            label=None)
+
+    def fit(self, **kwargs):
+        # kwargs are the input values for the fit method of the
+        # cross-validated object
+
+        cv_err = []
+
+        for train_index, test_index in self.train_test_cv_indices:
+            cv_err.append(self._error(train_index, test_index, **kwargs))
+
+        return np.array(cv_err)
+
+    @abstractmethod
+    def _error(self, train_index, test_index, **kwargs):
+        # train the model on the train set
+        #   and returns the error on the test set
+        pass


 def _split_train_test_smoothers(x, smoother, train_index, test_index):
@@ -31,11 +53,41 @@ def _split_train_test_smoothers(x, smoother, train_index, test_index):

     Note: this does not take exog_linear into account
     """
-    pass
+    train_smoothers = []
+    test_smoothers = []
+    for smoother in smoother.smoothers:
+        train_basis = smoother.basis[train_index]
+        train_der_basis = smoother.der_basis[train_index]
+        train_der2_basis = smoother.der2_basis[train_index]
+        train_cov_der2 = smoother.cov_der2
+        # TODO: Double check this part. cov_der2 is calculated with all data
+        train_x = smoother.x[train_index]
+
+        train_smoothers.append(
+            UnivariateGenericSmoother(
+                train_x, train_basis, train_der_basis, train_der2_basis,
+                train_cov_der2, smoother.variable_name + ' train'))
+
+        test_basis = smoother.basis[test_index]
+        test_der_basis = smoother.der_basis[test_index]
+        test_cov_der2 = smoother.cov_der2
+        # TODO: Double check this part. cov_der2 is calculated with all data
+        test_x = smoother.x[test_index]
+
+        test_smoothers.append(
+            UnivariateGenericSmoother(
+                test_x, test_basis, test_der_basis, train_der2_basis,
+                test_cov_der2, smoother.variable_name + ' test'))
+
+    train_multivariate_smoothers = GenericSmoothers(x[train_index],
+                                                    train_smoothers)
+    test_multivariate_smoothers = GenericSmoothers(x[test_index],
+                                                   test_smoothers)
+
+    return train_multivariate_smoothers, test_multivariate_smoothers


 class MultivariateGAMCV(BaseCV):
-
     def __init__(self, smoother, alphas, gam, cost, endog, exog, cv_iterator):
         self.cost = cost
         self.gam = gam
@@ -43,8 +95,35 @@ class MultivariateGAMCV(BaseCV):
         self.exog_linear = exog
         self.alphas = alphas
         self.cv_iterator = cv_iterator
-        super(MultivariateGAMCV, self).__init__(cv_iterator, endog, self.
-            smoother.basis)
+        # TODO: super does not do anything with endog, exog, except get nobs
+        # refactor to clean up what where `exog` and `exog_linear` is attached
+        super(MultivariateGAMCV, self).__init__(cv_iterator,
+                                                endog,
+                                                # exog,  # not used in super
+                                                self.smoother.basis)
+
+    def _error(self, train_index, test_index, **kwargs):
+        train_smoother, test_smoother = _split_train_test_smoothers(
+            self.smoother.x, self.smoother, train_index, test_index)
+
+        endog_train = self.endog[train_index]
+        endog_test = self.endog[test_index]
+        if self.exog_linear is not None:
+            exog_linear_train = self.exog_linear[train_index]
+            exog_linear_test = self.exog_linear[test_index]
+        else:
+            exog_linear_train = None
+            exog_linear_test = None
+
+        gam = self.gam(endog_train, exog=exog_linear_train,
+                       smoother=train_smoother, alpha=self.alphas)
+        gam_res = gam.fit(**kwargs)
+        # exog_linear_test and test_smoother.basis will be column_stacked
+        #     but not transformed in predict
+        endog_est = gam_res.predict(exog_linear_test, test_smoother.basis,
+                                    transform=False)
+
+        return self.cost(endog_test, endog_est)


 class BasePenaltiesPathCV(with_metaclass(ABCMeta)):
@@ -62,6 +141,24 @@ class BasePenaltiesPathCV(with_metaclass(ABCMeta)):
         self.cv_error = None
         self.cv_std = None

+    def plot_path(self):
+        from statsmodels.graphics.utils import _import_mpl
+        plt = _import_mpl()
+        plt.plot(self.alphas, self.cv_error, c='black')
+        plt.plot(self.alphas, self.cv_error + 1.96 * self.cv_std,
+                 c='blue')
+        plt.plot(self.alphas, self.cv_error - 1.96 * self.cv_std,
+                 c='blue')
+
+        plt.plot(self.alphas, self.cv_error, 'o', c='black')
+        plt.plot(self.alphas, self.cv_error + 1.96 * self.cv_std, 'o',
+                 c='blue')
+        plt.plot(self.alphas, self.cv_error - 1.96 * self.cv_std, 'o',
+                 c='blue')
+
+        return
+        # TODO add return
+

 class MultivariateGAMCVPath:
     """k-fold cross-validation for GAM
@@ -92,6 +189,22 @@ class MultivariateGAMCVPath:
         self.endog = endog
         self.exog = exog
         self.cv_iterator = cv_iterator
-        self.cv_error = np.zeros(shape=len(self.alphas_grid))
-        self.cv_std = np.zeros(shape=len(self.alphas_grid))
+        self.cv_error = np.zeros(shape=(len(self.alphas_grid, )))
+        self.cv_std = np.zeros(shape=(len(self.alphas_grid, )))
         self.alpha_cv = None
+
+    def fit(self, **kwargs):
+        for i, alphas_i in enumerate(self.alphas_grid):
+            gam_cv = MultivariateGAMCV(smoother=self.smoother,
+                                       alphas=alphas_i,
+                                       gam=self.gam,
+                                       cost=self.cost,
+                                       endog=self.endog,
+                                       exog=self.exog,
+                                       cv_iterator=self.cv_iterator)
+            cv_err = gam_cv.fit(**kwargs)
+            self.cv_error[i] = cv_err.mean()
+            self.cv_std[i] = cv_err.std()
+
+        self.alpha_cv = self.alphas_grid[np.argmin(self.cv_error)]
+        return self
diff --git a/statsmodels/gam/gam_penalties.py b/statsmodels/gam/gam_penalties.py
index 2f63b3ddb..283b4def4 100644
--- a/statsmodels/gam/gam_penalties.py
+++ b/statsmodels/gam/gam_penalties.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Penalty classes for Generalized Additive Models

@@ -5,6 +6,7 @@ Author: Luca Puggini
 Author: Josef Perktold

 """
+
 import numpy as np
 from scipy.linalg import block_diag
 from statsmodels.base._penalties import Penalty
@@ -52,7 +54,11 @@ class UnivariateGamPenalty(Penalty):
         func : float
             value of the penalty evaluated at params
         """
-        pass
+        if alpha is None:
+            alpha = self.alpha
+
+        f = params.dot(self.univariate_smoother.cov_der2.dot(params))
+        return alpha * f / self.nobs

     def deriv(self, params, alpha=None):
         """evaluate derivative of penalty with respect to params
@@ -69,7 +75,12 @@ class UnivariateGamPenalty(Penalty):
         deriv : ndarray
             derivative, gradient of the penalty with respect to params
         """
-        pass
+        if alpha is None:
+            alpha = self.alpha
+
+        d = 2 * alpha * np.dot(self.univariate_smoother.cov_der2, params)
+        d /= self.nobs
+        return d

     def deriv2(self, params, alpha=None):
         """evaluate second derivative of penalty with respect to params
@@ -86,7 +97,12 @@ class UnivariateGamPenalty(Penalty):
         deriv2 : ndarray, 2-Dim
             second derivative, hessian of the penalty with respect to params
         """
-        pass
+        if alpha is None:
+            alpha = self.alpha
+
+        d2 = 2 * alpha * self.univariate_smoother.cov_der2
+        d2 /= self.nobs
+        return d2

     def penalty_matrix(self, alpha=None):
         """penalty matrix for the smooth term of a GAM
@@ -104,7 +120,10 @@ class UnivariateGamPenalty(Penalty):
             smooth terms, i.e. the number of parameters for this smooth
             term in the regression model
         """
-        pass
+        if alpha is None:
+            alpha = self.alpha
+
+        return alpha * self.univariate_smoother.cov_der2


 class MultivariateGamPenalty(Penalty):
@@ -139,13 +158,15 @@ class MultivariateGamPenalty(Penalty):
     k_params : total number of parameters in the regression model
     """

-    def __init__(self, multivariate_smoother, alpha, weights=None, start_idx=0
-        ):
+    def __init__(self, multivariate_smoother, alpha, weights=None,
+                 start_idx=0):
+
         if len(multivariate_smoother.smoothers) != len(alpha):
-            msg = (
-                'all the input values should be of the same length. len(smoothers)=%d, len(alphas)=%d'
-                 % (len(multivariate_smoother.smoothers), len(alpha)))
+            msg = ('all the input values should be of the same length.'
+                   ' len(smoothers)=%d, len(alphas)=%d') % (
+                   len(multivariate_smoother.smoothers), len(alpha))
             raise ValueError(msg)
+
         self.multivariate_smoother = multivariate_smoother
         self.dim_basis = self.multivariate_smoother.dim_basis
         self.k_variables = self.multivariate_smoother.k_variables
@@ -153,22 +174,31 @@ class MultivariateGamPenalty(Penalty):
         self.alpha = alpha
         self.start_idx = start_idx
         self.k_params = start_idx + self.dim_basis
+
+        # TODO: Review this,
         if weights is None:
-            self.weights = [(1.0) for _ in range(self.k_variables)]
+            # weights should have the same total length as params,
+            # but each component can also be a scalar
+            self.weights = [1. for _ in range(self.k_variables)]
         else:
             import warnings
             warnings.warn('weights is currently ignored')
             self.weights = weights
-        self.mask = [np.zeros(self.k_params, dtype=bool) for _ in range(
-            self.k_variables)]
+
+        self.mask = [np.zeros(self.k_params, dtype=bool)
+                     for _ in range(self.k_variables)]
         param_count = start_idx
         for i, smoother in enumerate(self.multivariate_smoother.smoothers):
-            self.mask[i][param_count:param_count + smoother.dim_basis] = True
+            # mask[i] is a boolean vector of length k_params; the entries
+            # corresponding to the i-th smooth term are set to True.
+            self.mask[i][param_count: param_count + smoother.dim_basis] = True
             param_count += smoother.dim_basis
+
         self.gp = []
         for i in range(self.k_variables):
-            gp = UnivariateGamPenalty(self.multivariate_smoother.smoothers[
-                i], weights=self.weights[i], alpha=self.alpha[i])
+            gp = UnivariateGamPenalty(self.multivariate_smoother.smoothers[i],
+                                      weights=self.weights[i],
+                                      alpha=self.alpha[i])
             self.gp.append(gp)

     def func(self, params, alpha=None):
@@ -186,7 +216,15 @@ class MultivariateGamPenalty(Penalty):
         func : float
             value of the penalty evaluated at params
         """
-        pass
+        if alpha is None:
+            alpha = [None] * self.k_variables
+
+        cost = 0
+        for i in range(self.k_variables):
+            params_i = params[self.mask[i]]
+            cost += self.gp[i].func(params_i, alpha=alpha[i])
+
+        return cost

     def deriv(self, params, alpha=None):
         """evaluate derivative of penalty with respect to params
@@ -203,7 +241,15 @@ class MultivariateGamPenalty(Penalty):
         deriv : ndarray
             derivative, gradient of the penalty with respect to params
         """
-        pass
+        if alpha is None:
+            alpha = [None] * self.k_variables
+
+        grad = [np.zeros(self.start_idx)]
+        for i in range(self.k_variables):
+            params_i = params[self.mask[i]]
+            grad.append(self.gp[i].deriv(params_i, alpha=alpha[i]))
+
+        return np.concatenate(grad)

     def deriv2(self, params, alpha=None):
         """evaluate second derivative of penalty with respect to params
@@ -220,7 +266,15 @@ class MultivariateGamPenalty(Penalty):
         deriv2 : ndarray, 2-Dim
             second derivative, hessian of the penalty with respect to params
         """
-        pass
+        if alpha is None:
+            alpha = [None] * self.k_variables
+
+        deriv2 = [np.zeros((self.start_idx, self.start_idx))]
+        for i in range(self.k_variables):
+            params_i = params[self.mask[i]]
+            deriv2.append(self.gp[i].deriv2(params_i, alpha=alpha[i]))
+
+        return block_diag(*deriv2)

     def penalty_matrix(self, alpha=None):
         """penalty matrix for generalized additive model
@@ -243,4 +297,11 @@ class MultivariateGamPenalty(Penalty):
         used as positional arguments. The order of keywords might change.
         We might need to add a ``params`` keyword if the need arises.
         """
-        pass
+        if alpha is None:
+            alpha = self.alpha
+
+        s_all = [np.zeros((self.start_idx, self.start_idx))]
+        for i in range(self.k_variables):
+            s_all.append(self.gp[i].penalty_matrix(alpha=alpha[i]))
+
+        return block_diag(*s_all)
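
A small numeric check of the penalty classes above: `func` evaluates the
quadratic form alpha * params' cov_der2 params / nobs for each smooth term and
`deriv` is its gradient. A sketch with made-up data and alpha values:

    import numpy as np
    from statsmodels.gam.smooth_basis import BSplines
    from statsmodels.gam.gam_penalties import MultivariateGamPenalty

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 1, size=(100, 2))
    smoother = BSplines(x, df=[6, 8], degree=[3, 3])
    alphas = [1.5, 0.5]
    penalty = MultivariateGamPenalty(smoother, alpha=alphas)

    beta = rng.normal(size=smoother.dim_basis)

    # assemble the same quadratic form by hand from the per-term matrices
    manual = 0.0
    for a, s, m in zip(alphas, smoother.smoothers, penalty.mask):
        b = beta[m]
        manual += a * b.dot(s.cov_der2).dot(b) / s.nobs
    assert np.allclose(penalty.func(beta), manual)

    # the analytic gradient agrees with a finite-difference approximation
    eps = 1e-6
    fd = np.array([(penalty.func(beta + eps * e) - penalty.func(beta - eps * e))
                   / (2 * eps) for e in np.eye(beta.size)])
    assert np.allclose(penalty.deriv(beta), fd, atol=1e-6)
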
diff --git a/statsmodels/gam/generalized_additive_model.py b/statsmodels/gam/generalized_additive_model.py
index 8da3ea8e3..5f6a7cd32 100644
--- a/statsmodels/gam/generalized_additive_model.py
+++ b/statsmodels/gam/generalized_additive_model.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Generalized Additive Models

@@ -6,22 +7,31 @@ Author: Josef Perktold

 created on 08/07/2015
 """
+
 from collections.abc import Iterable
-import copy
+import copy  # check if needed when dropping python 2.7
+
 import numpy as np
 from scipy import optimize
 import pandas as pd
+
 import statsmodels.base.wrapper as wrap
+
 from statsmodels.discrete.discrete_model import Logit
-from statsmodels.genmod.generalized_linear_model import GLM, GLMResults, GLMResultsWrapper, _check_convergence
+from statsmodels.genmod.generalized_linear_model import (
+    GLM, GLMResults, GLMResultsWrapper, _check_convergence)
 import statsmodels.regression.linear_model as lm
-from statsmodels.tools.sm_exceptions import PerfectSeparationError, ValueWarning
+# import statsmodels.regression._tools as reg_tools  # TODO: use this for pirls
+from statsmodels.tools.sm_exceptions import (PerfectSeparationError,
+                                             ValueWarning)
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.data import _is_using_pandas
 from statsmodels.tools.linalg import matrix_sqrt
+
 from statsmodels.base._penalized import PenalizedMixin
 from statsmodels.gam.gam_penalties import MultivariateGamPenalty
-from statsmodels.gam.gam_cross_validation.gam_cross_validation import MultivariateGAMCVPath
+from statsmodels.gam.gam_cross_validation.gam_cross_validation import (
+    MultivariateGAMCVPath)
 from statsmodels.gam.gam_cross_validation.cross_validators import KFold


@@ -31,7 +41,44 @@ def _transform_predict_exog(model, exog, design_info=None):
     Note: this is copied from base.model.Results.predict and converted to
     standalone function with additional options.
     """
-    pass
+
+    is_pandas = _is_using_pandas(exog, None)
+
+    exog_index = exog.index if is_pandas else None
+
+    if design_info is None:
+        design_info = getattr(model.data, 'design_info', None)
+
+    if design_info is not None and (exog is not None):
+        from patsy import dmatrix
+        if isinstance(exog, pd.Series):
+            # we are guessing whether it should be column or row
+            if (hasattr(exog, 'name') and isinstance(exog.name, str) and
+                    exog.name in design_info.describe()):
+                # assume we need one column
+                exog = pd.DataFrame(exog)
+            else:
+                # assume we need a row
+                exog = pd.DataFrame(exog).T
+        orig_exog_len = len(exog)
+        is_dict = isinstance(exog, dict)
+        exog = dmatrix(design_info, exog, return_type="dataframe")
+        if orig_exog_len > len(exog) and not is_dict:
+            import warnings
+            if exog_index is None:
+                warnings.warn('nan values have been dropped', ValueWarning)
+            else:
+                exog = exog.reindex(exog_index)
+        exog_index = exog.index
+
+    if exog is not None:
+        exog = np.asarray(exog)
+        if exog.ndim == 1 and (model.exog.ndim == 1 or
+                               model.exog.shape[1] == 1):
+            exog = exog[:, None]
+        exog = np.atleast_2d(exog)  # needed in count model shape[1]
+
+    return exog, exog_index


 class GLMGamResults(GLMResults):
@@ -67,22 +114,30 @@ class GLMGamResults(GLMResults):
     """

     def __init__(self, model, params, normalized_cov_params, scale, **kwds):
+
+        # this is a messy way to compute edf and update scale
+        # need several attributes to compute edf
         self.model = model
         self.params = params
         self.normalized_cov_params = normalized_cov_params
         self.scale = scale
         edf = self.edf.sum()
-        self.df_model = edf - 1
+        self.df_model = edf - 1  # assume constant
+        # need to use nobs or wnobs attribute
         self.df_resid = self.model.endog.shape[0] - edf
+
+        # we are setting the model df for the case when super is using it
+        # df in model will be incorrect state when alpha/pen_weight changes
         self.model.df_model = self.df_model
         self.model.df_resid = self.df_resid
         mu = self.fittedvalues
         self.scale = scale = self.model.estimate_scale(mu)
         super(GLMGamResults, self).__init__(model, params,
-            normalized_cov_params, scale, **kwds)
+                                            normalized_cov_params, scale,
+                                            **kwds)

-    def _tranform_predict_exog(self, exog=None, exog_smooth=None, transform
-        =True):
+    def _tranform_predict_exog(self, exog=None, exog_smooth=None,
+                               transform=True):
         """Transform original explanatory variables for prediction

         Parameters
@@ -103,7 +158,37 @@ class GLMGamResults(GLMResults):
         exog_transformed : ndarray
             design matrix for the prediction
         """
-        pass
+        if exog_smooth is not None:
+            exog_smooth = np.asarray(exog_smooth)
+        exog_index = None
+        if transform is False:
+            # the following allows that either or both exog are not None
+            if exog_smooth is None:
+                # exog could be None or array
+                ex = exog
+            else:
+                if exog is None:
+                    ex = exog_smooth
+                else:
+                    ex = np.column_stack((exog, exog_smooth))
+        else:
+            # transform exog_linear if needed
+            if exog is not None and hasattr(self.model, 'design_info_linear'):
+                exog, exog_index = _transform_predict_exog(
+                    self.model, exog, self.model.design_info_linear)
+
+            # create smooth basis
+            if exog_smooth is not None:
+                ex_smooth = self.model.smoother.transform(exog_smooth)
+                if exog is None:
+                    ex = ex_smooth
+                else:
+                    # TODO: there might be problems if exog_smooth is 1-D
+                    ex = np.column_stack((exog, ex_smooth))
+            else:
+                ex = exog
+
+        return ex, exog_index

     def predict(self, exog=None, exog_smooth=None, transform=True, **kwargs):
         """"
@@ -127,10 +212,23 @@ class GLMGamResults(GLMResults):
         prediction : ndarray, pandas.Series or pandas.DataFrame
             predicted values
         """
-        pass
+        ex, exog_index = self._tranform_predict_exog(exog=exog,
+                                                     exog_smooth=exog_smooth,
+                                                     transform=transform)
+        predict_results = super(GLMGamResults, self).predict(ex,
+                                                             transform=False,
+                                                             **kwargs)
+        if exog_index is not None and not hasattr(
+                predict_results, 'predicted_values'):
+            if predict_results.ndim == 1:
+                return pd.Series(predict_results, index=exog_index)
+            else:
+                return pd.DataFrame(predict_results, index=exog_index)
+        else:
+            return predict_results

     def get_prediction(self, exog=None, exog_smooth=None, transform=True,
-        **kwargs):
+                       **kwargs):
         """compute prediction results

         Parameters
@@ -154,7 +252,11 @@ class GLMGamResults(GLMResults):
             summary tables for the prediction of the mean and of new
             observations.
         """
-        pass
+        ex, exog_index = self._tranform_predict_exog(exog=exog,
+                                                     exog_smooth=exog_smooth,
+                                                     transform=transform)
+        return super(GLMGamResults, self).get_prediction(ex, transform=False,
+                                                         **kwargs)

     def partial_values(self, smooth_index, include_constant=True):
         """contribution of a smooth term to the linear prediction
@@ -180,10 +282,33 @@ class GLMGamResults(GLMResults):
         se_pred : nd_array
             standard error of linear prediction
         """
-        pass
+        variable = smooth_index
+        smoother = self.model.smoother
+        mask = smoother.mask[variable]
+
+        start_idx = self.model.k_exog_linear
+        idx = start_idx + np.nonzero(mask)[0]
+
+        # smoother has only smooth parts, not exog_linear
+        exog_part = smoother.basis[:, mask]
+
+        const_idx = self.model.data.const_idx
+        if include_constant and const_idx is not None:
+            idx = np.concatenate(([const_idx], idx))
+            exog_part = self.model.exog[:, idx]
+
+        linpred = np.dot(exog_part, self.params[idx])
+        # select the submatrix corresponding to a single variable
+        partial_cov_params = self.cov_params(column=idx)
+
+        covb = partial_cov_params
+        var = (exog_part * np.dot(covb, exog_part.T).T).sum(1)
+        se = np.sqrt(var)
+
+        return linpred, se

     def plot_partial(self, smooth_index, plot_se=True, cpr=False,
-        include_constant=True, ax=None):
+                     include_constant=True, ax=None):
         """plot the contribution of a smooth term to the linear prediction

         Parameters
@@ -210,7 +335,36 @@ class GLMGamResults(GLMResults):
             If `ax` is None, the created figure. Otherwise, the Figure to which
             `ax` is connected.
         """
-        pass
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_ax
+        _import_mpl()
+
+        variable = smooth_index
+        y_est, se = self.partial_values(variable,
+                                        include_constant=include_constant)
+        smoother = self.model.smoother
+        x = smoother.smoothers[variable].x
+        sort_index = np.argsort(x)
+        x = x[sort_index]
+        y_est = y_est[sort_index]
+        se = se[sort_index]
+
+        fig, ax = create_mpl_ax(ax)
+
+        if cpr:
+            # TODO: resid_response does not make sense with nonlinear link
+            # use resid_working ?
+            residual = self.resid_working[sort_index]
+            cpr_ = y_est + residual
+            ax.scatter(x, cpr_, s=4)
+
+        ax.plot(x, y_est, c='blue', lw=2)
+        if plot_se:
+            ax.plot(x, y_est + 1.96 * se, '-', c='blue')
+            ax.plot(x, y_est - 1.96 * se, '-', c='blue')
+
+        ax.set_xlabel(smoother.smoothers[variable].variable_name)
+
+        return fig

     def test_significance(self, smooth_index):
         """hypothesis test that a smooth component is zero.
@@ -228,7 +382,20 @@ class GLMGamResults(GLMResults):
         wald_test : ContrastResults instance
             the results instance created by `wald_test`
         """
-        pass
+
+        variable = smooth_index
+        smoother = self.model.smoother
+        start_idx = self.model.k_exog_linear
+
+        k_params = len(self.params)
+        # a bit messy, we need first index plus length of smooth term
+        mask = smoother.mask[variable]
+        k_constraints = mask.sum()
+        idx = start_idx + np.nonzero(mask)[0][0]
+        constraints = np.eye(k_constraints, k_params, idx)
+        df_constraints = self.edf[idx: idx + k_constraints].sum()
+
+        return self.wald_test(constraints, df_constraints=df_constraints)

     def get_hat_matrix_diag(self, observed=True, _axis=1):
         """
@@ -255,7 +422,42 @@ class GLMGamResults(GLMResults):
             The diagonal of the hat matrix computed from the observed
             or expected hessian.
         """
-        pass
+        weights = self.model.hessian_factor(self.params, scale=self.scale,
+                                            observed=observed)
+        wexog = np.sqrt(weights)[:, None] * self.model.exog
+
+        # we can use inverse hessian directly instead of computing it from
+        # WLS/IRLS as in GLM
+
+        # TODO: does `normalized_cov_params * scale` work in all cases?
+        # this avoids recomputing hessian, check when used for other models.
+        hess_inv = self.normalized_cov_params * self.scale
+        # this is in GLM equivalent to the more generic and direct
+        # hess_inv = np.linalg.inv(-self.model.hessian(self.params))
+        hd = (wexog * hess_inv.dot(wexog.T).T).sum(axis=_axis)
+        return hd
+
+    @cache_readonly
+    def edf(self):
+        return self.get_hat_matrix_diag(_axis=0)
+
+    @cache_readonly
+    def hat_matrix_trace(self):
+        return self.hat_matrix_diag.sum()
+
+    @cache_readonly
+    def hat_matrix_diag(self):
+        return self.get_hat_matrix_diag(observed=True)
+
+    @cache_readonly
+    def gcv(self):
+        return self.scale / (1. - self.hat_matrix_trace / self.nobs)**2
+
+    @cache_readonly
+    def cv(self):
+        cv_ = ((self.resid_pearson / (1. - self.hat_matrix_diag))**2).sum()
+        cv_ /= self.nobs
+        return cv_


 class GLMGamResultsWrapper(GLMResultsWrapper):
@@ -307,17 +509,24 @@ class GLMGam(PenalizedMixin, GLM):
     User specified var or freq weights are most likely also not correct for
     all results.)
     """
+
     _results_class = GLMGamResults
     _results_class_wrapper = GLMGamResultsWrapper

-    def __init__(self, endog, exog=None, smoother=None, alpha=0, family=
-        None, offset=None, exposure=None, missing='none', **kwargs):
+    def __init__(self, endog, exog=None, smoother=None, alpha=0, family=None,
+                 offset=None, exposure=None, missing='none', **kwargs):
+
+        # TODO: check usage of hasconst
         hasconst = kwargs.get('hasconst', None)
         xnames_linear = None
         if hasattr(exog, 'design_info'):
             self.design_info_linear = exog.design_info
             xnames_linear = self.design_info_linear.column_names
+
         is_pandas = _is_using_pandas(exog, None)
+
+        # TODO: handle data is experimental, see #5469
+        # This is a bit wasteful because we need to call `handle_data` twice
         self.data_linear = self._handle_data(endog, exog, missing, hasconst)
         if xnames_linear is None:
             xnames_linear = self.data_linear.xnames
@@ -328,30 +537,48 @@ class GLMGam(PenalizedMixin, GLM):
             exog_linear = None
             k_exog_linear = 0
         self.k_exog_linear = k_exog_linear
+        # We need exog_linear for k-fold cross validation
+        # TODO: alternative is to take columns from combined exog
         self.exog_linear = exog_linear
+
         self.smoother = smoother
         self.k_smooths = smoother.k_variables
         self.alpha = self._check_alpha(alpha)
         penal = MultivariateGamPenalty(smoother, alpha=self.alpha,
-            start_idx=k_exog_linear)
+                                       start_idx=k_exog_linear)
         kwargs.pop('penal', None)
         if exog_linear is not None:
             exog = np.column_stack((exog_linear, smoother.basis))
         else:
             exog = smoother.basis
+
+        # TODO: check: xnames_linear will be None instead of empty list
+        #       if no exog_linear
+        # can smoother be empty ? I guess not allowed.
         if xnames_linear is None:
             xnames_linear = []
         xnames = xnames_linear + self.smoother.col_names
+
         if is_pandas and exog_linear is not None:
+            # use a DataFrame so we can get a PandasData instance for wrapping
             exog = pd.DataFrame(exog, index=self.data_linear.row_labels,
-                columns=xnames)
+                                columns=xnames)
+
         super(GLMGam, self).__init__(endog, exog=exog, family=family,
-            offset=offset, exposure=exposure, penal=penal, missing=missing,
-            **kwargs)
+                                     offset=offset, exposure=exposure,
+                                     penal=penal, missing=missing, **kwargs)
+
         if not is_pandas:
+            # set exog names if not given by pandas DataFrame
             self.exog_names[:] = xnames
+
+        # TODO: the generic data handling might attach the design_info from the
+        #       linear part, but this is incorrect for the full model and
+        #       causes problems in wald_test_terms
+
         if hasattr(self.data, 'design_info'):
             del self.data.design_info
+        # formula also might be attached which causes problems in predict
         if hasattr(self, 'formula'):
             self.formula_linear = self.formula
             self.formula = None
@@ -371,11 +598,16 @@ class GLMGam(PenalizedMixin, GLM):
             penalization weight, list with length equal to the number of
             smooth terms
         """
-        pass
-
-    def fit(self, start_params=None, maxiter=1000, method='pirls', tol=
-        1e-08, scale=None, cov_type='nonrobust', cov_kwds=None, use_t=None,
-        full_output=True, disp=False, max_start_irls=3, **kwargs):
+        if not isinstance(alpha, Iterable):
+            alpha = [alpha] * len(self.smoother.smoothers)
+        elif not isinstance(alpha, list):
+            # we want alpha to be a list
+            alpha = list(alpha)
+        return alpha
+
+    def fit(self, start_params=None, maxiter=1000, method='pirls', tol=1e-8,
+            scale=None, cov_type='nonrobust', cov_kwds=None, use_t=None,
+            full_output=True, disp=False, max_start_irls=3, **kwargs):
         """estimate parameters and create instance of GLMGamResults class

         Parameters
@@ -390,17 +622,143 @@ class GLMGam(PenalizedMixin, GLM):
         -------
         res : instance of wrapped GLMGamResults
         """
-        pass
+        # TODO: temporary hack to remove attribute
+        # formula might also be attached, which is inherited from from_formula
+        # and causes problems in predict
+        if hasattr(self, 'formula'):
+            self.formula_linear = self.formula
+            del self.formula

-    def _fit_pirls(self, alpha, start_params=None, maxiter=100, tol=1e-08,
-        scale=None, cov_type='nonrobust', cov_kwds=None, use_t=None,
-        weights=None):
+        # TODO: alpha not allowed yet, but is in `_fit_pirls`
+        # alpha = self._check_alpha()
+
+        if method.lower() in ['pirls', 'irls']:
+            res = self._fit_pirls(self.alpha, start_params=start_params,
+                                  maxiter=maxiter, tol=tol, scale=scale,
+                                  cov_type=cov_type, cov_kwds=cov_kwds,
+                                  use_t=use_t, **kwargs)
+        else:
+            if max_start_irls > 0 and (start_params is None):
+                res = self._fit_pirls(self.alpha, start_params=start_params,
+                                      maxiter=max_start_irls, tol=tol,
+                                      scale=scale,
+                                      cov_type=cov_type, cov_kwds=cov_kwds,
+                                      use_t=use_t, **kwargs)
+                start_params = res.params
+                del res
+            res = super(GLMGam, self).fit(start_params=start_params,
+                                          maxiter=maxiter, method=method,
+                                          tol=tol, scale=scale,
+                                          cov_type=cov_type, cov_kwds=cov_kwds,
+                                          use_t=use_t,
+                                          full_output=full_output, disp=disp,
+                                          max_start_irls=0,
+                                          **kwargs)
+        return res
+
+    # pag 165 4.3 # pag 136 PIRLS
+    def _fit_pirls(self, alpha, start_params=None, maxiter=100, tol=1e-8,
+                   scale=None, cov_type='nonrobust', cov_kwds=None, use_t=None,
+                   weights=None):
         """fit model with penalized reweighted least squares
         """
-        pass
+        # TODO: this currently modifies several attributes
+        # self.scale, self.scaletype, self.mu, self.weights
+        # self.data_weights,
+        # and possibly self._offset_exposure
+        # several of those might not be necessary, e.g. mu and weights
+
+        # alpha = alpha * len(y) * self.scale / 100
+        # TODO: we need to rescale alpha
+        endog = self.endog
+        wlsexog = self.exog  # smoother.basis
+        spl_s = self.penal.penalty_matrix(alpha=alpha)
+
+        nobs, n_columns = wlsexog.shape
+
+        # TODO what are these values?
+        if weights is None:
+            self.data_weights = np.array([1.] * nobs)
+        else:
+            self.data_weights = weights
+
+        if not hasattr(self, '_offset_exposure'):
+            self._offset_exposure = 0
+
+        self.scaletype = scale
+        # TODO: check default scale types
+        # self.scaletype = 'dev'
+        # during iteration
+        self.scale = 1
+
+        if start_params is None:
+            mu = self.family.starting_mu(endog)
+            lin_pred = self.family.predict(mu)
+        else:
+            lin_pred = np.dot(wlsexog, start_params) + self._offset_exposure
+            mu = self.family.fitted(lin_pred)
+        dev = self.family.deviance(endog, mu)
+
+        history = dict(params=[None, start_params], deviance=[np.inf, dev])
+        converged = False
+        criterion = history['deviance']
+        # This special case is used to get the likelihood for a specific
+        # params vector.
+        if maxiter == 0:
+            mu = self.family.fitted(lin_pred)
+            self.scale = self.estimate_scale(mu)
+            wls_results = lm.RegressionResults(self, start_params, None)
+            iteration = 0
+
+        for iteration in range(maxiter):
+
+            # TODO: is this equivalent to point 1 of page 136:
+            # w = 1 / (V(mu) * g'(mu))  ?
+            self.weights = self.data_weights * self.family.weights(mu)
+
+            # TODO: is this equivalent to point 1 of page 136:
+            # z = g(mu)(y - mu) + X beta  ?
+            wlsendog = (lin_pred + self.family.link.deriv(mu) * (endog - mu)
+                        - self._offset_exposure)
+
+            # this defines the augmented matrix point 2a on page 136
+            wls_results = penalized_wls(wlsendog, wlsexog, spl_s, self.weights)
+            lin_pred = np.dot(wlsexog, wls_results.params).ravel()
+            lin_pred += self._offset_exposure
+            mu = self.family.fitted(lin_pred)
+
+            # We do not need to update scale in GLM/LEF models
+            # We might need it in dispersion models.
+            # self.scale = self.estimate_scale(mu)
+            history = self._update_history(wls_results, mu, history)
+
+            if endog.squeeze().ndim == 1 and np.allclose(mu - endog, 0):
+                msg = "Perfect separation detected, results not available"
+                raise PerfectSeparationError(msg)
+
+            # TODO need atol, rtol
+            # args of _check_convergence: (criterion, iteration, atol, rtol)
+            converged = _check_convergence(criterion, iteration, tol, 0)
+            if converged:
+                break
+        self.mu = mu
+        self.scale = self.estimate_scale(mu)
+        glm_results = GLMGamResults(self, wls_results.params,
+                                    wls_results.normalized_cov_params,
+                                    self.scale,
+                                    cov_type=cov_type, cov_kwds=cov_kwds,
+                                    use_t=use_t)
+
+        glm_results.method = "PIRLS"
+        history['iteration'] = iteration + 1
+        glm_results.fit_history = history
+        glm_results.converged = converged
+
+        return GLMGamResultsWrapper(glm_results)

     def select_penweight(self, criterion='aic', start_params=None,
-        start_model_params=None, method='basinhopping', **fit_kwds):
+                         start_model_params=None,
+                         method='basinhopping', **fit_kwds):
         """find alpha by minimizing results criterion

         The objective for the minimization can be results attributes like
@@ -456,10 +814,61 @@ class GLMGam(PenalizedMixin, GLM):
         is a better way to find a global optimum. API (e.g. type of return)
         might also change.
         """
-        pass
+        # copy attributes that are changed, so we can reset them
+        scale_keep = self.scale
+        scaletype_keep = self.scaletype
+        # TODO: use .copy() method when available for all types
+        alpha_keep = copy.copy(self.alpha)
+
+        if start_params is None:
+            start_params = np.zeros(self.k_smooths)
+        else:
+            start_params = np.log(1e-20 + start_params)
+
+        history = {}
+        history['alpha'] = []
+        history['params'] = [start_model_params]
+        history['criterion'] = []
+
+        def fun(p):
+            a = np.exp(p)
+            res_ = self._fit_pirls(start_params=history['params'][-1],
+                                   alpha=a)
+            history['alpha'].append(a)
+            history['params'].append(np.asarray(res_.params))
+            return getattr(res_, criterion)
+
+        if method == 'nm':
+            kwds = dict(full_output=True, maxiter=1000, maxfun=2000)
+            kwds.update(fit_kwds)
+            fit_res = optimize.fmin(fun, start_params, **kwds)
+            opt = fit_res[0]
+        elif method == 'basinhopping':
+            kwds = dict(minimizer_kwargs={'method': 'Nelder-Mead',
+                        'options': {'maxiter': 100, 'maxfev': 500}},
+                        niter=10)
+            kwds.update(fit_kwds)
+            fit_res = optimize.basinhopping(fun, start_params, **kwds)
+            opt = fit_res.x
+        elif method == 'minimize':
+            fit_res = optimize.minimize(fun, start_params, **fit_kwds)
+            opt = fit_res.x
+        else:
+            raise ValueError('method not recognized')

-    def select_penweight_kfold(self, alphas=None, cv_iterator=None, cost=
-        None, k_folds=5, k_grid=11):
+        del history['params'][0]  # remove the model start_params
+
+        alpha = np.exp(opt)
+
+        # reset attributes that have or might have changed
+        self.scale = scale_keep
+        self.scaletype = scaletype_keep
+        self.alpha = alpha_keep
+
+        return alpha, fit_res, history
+
+    def select_penweight_kfold(self, alphas=None, cv_iterator=None, cost=None,
+                               k_folds=5, k_grid=11):
         """find alphas by k-fold cross-validation

         Warning: This estimates ``k_folds`` models for each point in the
@@ -491,7 +900,24 @@ class GLMGam(PenalizedMixin, GLM):
         The default alphas are defined as
         ``alphas = [np.logspace(0, 7, k_grid) for _ in range(k_smooths)]``
         """
-        pass
+
+        if cost is None:
+            def cost(x1, x2):
+                return np.linalg.norm(x1 - x2) / len(x1)
+
+        if alphas is None:
+            alphas = [np.logspace(0, 7, k_grid) for _ in range(self.k_smooths)]
+
+        if cv_iterator is None:
+            cv_iterator = KFold(k_folds=k_folds, shuffle=True)
+
+        gam_cv = MultivariateGAMCVPath(smoother=self.smoother, alphas=alphas,
+                                       gam=GLMGam, cost=cost, endog=self.endog,
+                                       exog=self.exog_linear,
+                                       cv_iterator=cv_iterator)
+        gam_cv_res = gam_cv.fit()
+
+        return gam_cv_res.alpha_cv, gam_cv_res


 class LogitGam(PenalizedMixin, Logit):
@@ -504,16 +930,17 @@ class LogitGam(PenalizedMixin, Logit):

     not verified yet.
     """
-
     def __init__(self, endog, smoother, alpha, *args, **kwargs):
         if not isinstance(alpha, Iterable):
             alpha = np.array([alpha] * len(smoother.smoothers))
+
         self.smoother = smoother
         self.alpha = alpha
-        self.pen_weight = 1
+        self.pen_weight = 1  # TODO: pen weight should not be defined here!!
         penal = MultivariateGamPenalty(smoother, alpha=alpha)
-        super(LogitGam, self).__init__(endog, smoother.basis, *args, penal=
-            penal, **kwargs)
+
+        super(LogitGam, self).__init__(endog, smoother.basis, penal=penal,
+                                       *args, **kwargs)


 def penalized_wls(endog, exog, penalty_matrix, weights):
@@ -535,7 +962,18 @@ def penalized_wls(endog, exog, penalty_matrix, weights):
     -------
     results : Results instance of WLS
     """
-    pass
+    y, x, s = endog, exog, penalty_matrix
+    # TODO: I do not understand why I need 2 * s
+    aug_y, aug_x, aug_weights = make_augmented_matrix(y, x, 2 * s, weights)
+    wls_results = lm.WLS(aug_y, aug_x, aug_weights).fit()
+    # TODO: use MinimalWLS during iterations, less overhead
+    # However, MinimalWLS does not return normalized_cov_params
+    #   which we need at the end of the iterations
+    # call would be
+    # wls_results = reg_tools._MinimalWLS(aug_y, aug_x, aug_weights).fit()
+    wls_results.params = wls_results.params.ravel()
+
+    return wls_results


 def make_augmented_matrix(endog, exog, penalty_matrix, weights):
@@ -561,4 +999,19 @@ def make_augmented_matrix(endog, exog, penalty_matrix, weights):
     weights_aug : ndarray
         augmented weights for WLS
     """
-    pass
+    y, x, s, = endog, exog, penalty_matrix
+    nobs = x.shape[0]
+
+    # TODO: needs full because of broadcasting with weights
+    # check what weights should be doing
+    rs = matrix_sqrt(s)
+    x1 = np.vstack([x, rs])  # augmented x
+    n_samples_x1 = x1.shape[0]
+
+    y1 = np.array([0.] * n_samples_x1)  # augmented y
+    y1[:nobs] = y
+
+    id1 = np.array([1.] * rs.shape[0])
+    w1 = np.concatenate([weights, id1])
+
+    return y1, x1, w1
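
An end-to-end sketch of the estimation path implemented above: build the
spline basis, fit by PIRLS (the default), then inspect a smooth term. The
data and the df/alpha values are illustrative assumptions only:

    import numpy as np
    from statsmodels.gam.api import GLMGam, BSplines

    rng = np.random.default_rng(2)
    n = 300
    x = rng.uniform(-1, 1, size=(n, 2))
    y = x[:, 0] ** 2 + np.sin(3 * x[:, 1]) + rng.normal(scale=0.2, size=n)

    smoother = BSplines(x, df=[8, 8], degree=[3, 3])
    model = GLMGam(y, smoother=smoother, alpha=[5.0, 5.0])
    res = model.fit()                     # method='pirls' by default

    print(res.test_significance(0))       # Wald test for the first smooth term
    linpred, se = res.partial_values(0)   # partial fit and its standard error
    # res.plot_partial(0, cpr=True) plots the component plus residuals

    # choose the penalization weights by minimizing a results criterion
    # (this runs many PIRLS fits and can take a while)
    alpha_opt, opt_res, history = model.select_penweight(criterion='aic')
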
diff --git a/statsmodels/gam/smooth_basis.py b/statsmodels/gam/smooth_basis.py
index 510beab83..7fe7d90ae 100644
--- a/statsmodels/gam/smooth_basis.py
+++ b/statsmodels/gam/smooth_basis.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Spline and other smoother classes for Generalized Additive Models

@@ -6,22 +7,130 @@ Author: Josef Perktold

 Created on Fri Jun  5 16:32:00 2015
 """
+
+# import useful only for development
 from abc import ABCMeta, abstractmethod
 from statsmodels.compat.python import with_metaclass
+
 import numpy as np
 import pandas as pd
 from patsy import dmatrix
 from patsy.mgcv_cubic_splines import _get_all_sorted_knots
+
 from statsmodels.tools.linalg import transf_constraints


+# Obtain b splines from patsy
+
+def _equally_spaced_knots(x, df):
+    n_knots = df - 2
+    x_min = x.min()
+    x_max = x.max()
+    knots = np.linspace(x_min, x_max, n_knots)
+    return knots
+
+
+def _R_compat_quantile(x, probs):
+    # return np.percentile(x, 100 * np.asarray(probs))
+    probs = np.asarray(probs)
+    quantiles = np.asarray([np.percentile(x, 100 * prob)
+                            for prob in probs.ravel(order="C")])
+    return quantiles.reshape(probs.shape, order="C")
+
+
+# FIXME: is this copy/pasted?  If so, why do we need it?  If not, get
+#  rid of the try/except for scipy import
+# from patsy splines.py
+def _eval_bspline_basis(x, knots, degree, deriv='all', include_intercept=True):
+    try:
+        from scipy.interpolate import splev
+    except ImportError:
+        raise ImportError("spline functionality requires scipy")
+    # 'knots' are assumed to be already pre-processed. E.g. usually you
+    # want to include duplicate copies of boundary knots; you should do
+    # that *before* calling this constructor.
+    knots = np.atleast_1d(np.asarray(knots, dtype=float))
+    assert knots.ndim == 1
+    knots.sort()
+    degree = int(degree)
+    x = np.atleast_1d(x)
+    if x.ndim == 2 and x.shape[1] == 1:
+        x = x[:, 0]
+    assert x.ndim == 1
+    # XX FIXME: when points fall outside of the boundaries, splev and R seem
+    # to handle them differently. I do not know why yet. So until we understand
+    # this and decide what to do with it, I'm going to play it safe and
+    # disallow such points.
+    if np.min(x) < np.min(knots) or np.max(x) > np.max(knots):
+        raise NotImplementedError("some data points fall outside the "
+                                  "outermost knots, and I'm not sure how "
+                                  "to handle them. (Patches accepted!)")
+    # Thanks to Charles Harris for explaining splev. It's not well
+    # documented, but basically it computes an arbitrary b-spline basis
+    # given knots and degree on some specified points (or derivatives
+    # thereof, but we do not use that functionality), and then returns some
+    # linear combination of these basis functions. To get out the basis
+    # functions themselves, we use linear combinations like [1, 0, 0], [0,
+    # 1, 0], [0, 0, 1].
+    # NB: This probably makes it rather inefficient (though I have not checked
+    # to be sure -- maybe the fortran code actually skips computing the basis
+    # function for coefficients that are zero).
+    # Note: the order of a spline is the same as its degree + 1.
+    # Note: there are (len(knots) - order) basis functions.
+
+    k_const = 1 - int(include_intercept)
+    n_bases = len(knots) - (degree + 1) - k_const
+    if deriv in ['all', 0]:
+        basis = np.empty((x.shape[0], n_bases), dtype=float)
+        ret = basis
+    if deriv in ['all', 1]:
+        der1_basis = np.empty((x.shape[0], n_bases), dtype=float)
+        ret = der1_basis
+    if deriv in ['all', 2]:
+        der2_basis = np.empty((x.shape[0], n_bases), dtype=float)
+        ret = der2_basis
+
+    for i in range(n_bases):
+        coefs = np.zeros((n_bases + k_const,))
+        # we are skipping the first column of the basis to drop constant
+        coefs[i + k_const] = 1
+        ii = i
+        if deriv in ['all', 0]:
+            basis[:, ii] = splev(x, (knots, coefs, degree))
+        if deriv in ['all', 1]:
+            der1_basis[:, ii] = splev(x, (knots, coefs, degree), der=1)
+        if deriv in ['all', 2]:
+            der2_basis[:, ii] = splev(x, (knots, coefs, degree), der=2)
+
+    if deriv == 'all':
+        return basis, der1_basis, der2_basis
+    else:
+        return ret
+
+
+def compute_all_knots(x, df, degree):
+    order = degree + 1
+    n_inner_knots = df - order
+    lower_bound = np.min(x)
+    upper_bound = np.max(x)
+    knot_quantiles = np.linspace(0, 1, n_inner_knots + 2)[1:-1]
+    inner_knots = _R_compat_quantile(x, knot_quantiles)
+    all_knots = np.concatenate(([lower_bound, upper_bound] * order,
+                                inner_knots))
+    return all_knots, lower_bound, upper_bound, inner_knots
+
+
 def make_bsplines_basis(x, df, degree):
-    """ make a spline basis for x """
-    pass
+    ''' make a spline basis for x '''
+
+    all_knots, _, _, _ = compute_all_knots(x, df, degree)
+    basis, der_basis, der2_basis = _eval_bspline_basis(x, all_knots, degree)
+    return basis, der_basis, der2_basis


-def get_knots_bsplines(x=None, df=None, knots=None, degree=3, spacing=
-    'quantile', lower_bound=None, upper_bound=None, all_knots=None):
+def get_knots_bsplines(x=None, df=None, knots=None, degree=3,
+                       spacing='quantile', lower_bound=None,
+                       upper_bound=None, all_knots=None):
     """knots for use in B-splines

     There are two main options for the knot placement
@@ -32,7 +141,86 @@ def get_knots_bsplines(x=None, df=None, knots=None, degree=3, spacing=
     The first corresponds to splines as used by patsy. the second is the
     knot spacing for P-Splines.
     """
-    pass
+    # based on patsy memorize_finish
+    if all_knots is not None:
+        return all_knots
+
+    x_min = x.min()
+    x_max = x.max()
+
+    if degree < 0:
+        raise ValueError("degree must be greater than 0 (not %r)"
+                         % (degree,))
+    if int(degree) != degree:
+        raise ValueError("degree must be an integer (not %r)"
+                         % (degree,))
+
+    # These are guaranteed to all be 1d vectors by the code above
+    # x = np.concatenate(tmp["xs"])
+    if df is None and knots is None:
+        raise ValueError("must specify either df or knots")
+    order = degree + 1
+    if df is not None:
+        n_inner_knots = df - order
+        if n_inner_knots < 0:
+            raise ValueError("df=%r is too small for degree=%r; must be >= %s"
+                             % (df, degree,
+                                # We know that n_inner_knots is negative;
+                                # if df were that much larger, it would
+                                # have been zero, and things would work.
+                                df - n_inner_knots))
+        if knots is not None:
+            if len(knots) != n_inner_knots:
+                raise ValueError("df=%s with degree=%r implies %s knots, "
+                                 "but %s knots were provided"
+                                 % (df, degree,
+                                    n_inner_knots, len(knots)))
+        elif spacing == 'quantile':
+            # Need to compute inner knots
+            knot_quantiles = np.linspace(0, 1, n_inner_knots + 2)[1:-1]
+            inner_knots = _R_compat_quantile(x, knot_quantiles)
+        elif spacing == 'equal':
+            # Need to compute inner knots
+            grid = np.linspace(0, 1, n_inner_knots + 2)[1:-1]
+            inner_knots = x_min + grid * (x_max - x_min)
+            diff_knots = inner_knots[1] - inner_knots[0]
+        else:
+            raise ValueError("incorrect option for spacing")
+    if knots is not None:
+        inner_knots = knots
+    if lower_bound is None:
+        lower_bound = np.min(x)
+    if upper_bound is None:
+        upper_bound = np.max(x)
+
+    if lower_bound > upper_bound:
+        raise ValueError("lower_bound > upper_bound (%r > %r)"
+                         % (lower_bound, upper_bound))
+    inner_knots = np.asarray(inner_knots)
+    if inner_knots.ndim > 1:
+        raise ValueError("knots must be 1 dimensional")
+    if np.any(inner_knots < lower_bound):
+        raise ValueError("some knot values (%s) fall below lower bound "
+                         "(%r)"
+                         % (inner_knots[inner_knots < lower_bound],
+                            lower_bound))
+    if np.any(inner_knots > upper_bound):
+        raise ValueError("some knot values (%s) fall above upper bound "
+                         "(%r)"
+                         % (inner_knots[inner_knots > upper_bound],
+                            upper_bound))
+
+    if spacing == "equal":
+        diffs = np.arange(1, order + 1) * diff_knots
+        lower_knots = inner_knots[0] - diffs[::-1]
+        upper_knots = inner_knots[-1] + diffs
+        all_knots = np.concatenate((lower_knots, inner_knots, upper_knots))
+    else:
+        all_knots = np.concatenate(([lower_bound, upper_bound] * order,
+                                    inner_knots))
+    all_knots.sort()
+
+    return all_knots


 def _get_integration_points(knots, k_points=3):
@@ -40,11 +228,17 @@ def _get_integration_points(knots, k_points=3):

     inserts k_points between each two consecutive knots
     """
-    pass
+    k_points = k_points + 1
+    knots = np.unique(knots)
+    dxi = np.arange(k_points) / k_points
+    dxk = np.diff(knots)
+    dx = dxk[:, None] * dxi
+    x = np.concatenate(((knots[:-1, None] + dx).ravel(), [knots[-1]]))
+    return x


-def get_covder2(smoother, k_points=3, integration_points=None, skip_ctransf
-    =False, deriv=2):
+def get_covder2(smoother, k_points=3, integration_points=None,
+                skip_ctransf=False, deriv=2):
     """
     Approximate integral of cross product of second derivative of smoother

@@ -52,37 +246,88 @@ def get_covder2(smoother, k_points=3, integration_points=None, skip_ctransf
     integral of the smoother derivative cross-product at knots plus k_points
     in between knots.
     """
-    pass
-
-
+    try:
+        from scipy.integrate import simpson
+    except ImportError:
+        # Remove after SciPy 1.7 is the minimum version
+        from scipy.integrate import simps as simpson
+    knots = smoother.knots
+    if integration_points is None:
+        x = _get_integration_points(knots, k_points=k_points)
+    else:
+        x = integration_points
+    d2 = smoother.transform(x, deriv=deriv, skip_ctransf=skip_ctransf)
+    covd2 = simpson(d2[:, :, None] * d2[:, None, :], x=x, axis=0)
+    return covd2
+
+
+# TODO: this function should be deleted
 def make_poly_basis(x, degree, intercept=True):
-    """
+    '''
     given a vector x returns poly=(1, x, x^2, ..., x^degree)
     and its first and second derivative
-    """
-    pass
-
+    '''
+
+    if intercept:
+        start = 0
+    else:
+        start = 1
+
+    nobs = len(x)
+    basis = np.zeros(shape=(nobs, degree + 1 - start))
+    der_basis = np.zeros(shape=(nobs, degree + 1 - start))
+    der2_basis = np.zeros(shape=(nobs, degree + 1 - start))
+
+    for i in range(start, degree + 1):
+        basis[:, i - start] = x ** i
+        der_basis[:, i - start] = i * x ** (i - 1)
+        der2_basis[:, i - start] = i * (i - 1) * x ** (i - 2)
+
+    return basis, der_basis, der2_basis
+
+
+# TODO: try to include other kinds of splines from patsy
+# x = np.linspace(0, 1, 30)
+# df = 10
+# degree = 3
+# from patsy.mgcv_cubic_splines import cc, cr, te
+# all_knots, lower, upper, inner  = compute_all_knots(x, df, degree)
+# result = cc(x, df=df, knots=all_knots, lower_bound=lower, upper_bound=upper,
+#             constraints=None)
+#
+# import matplotlib.pyplot as plt
+#
+# result = np.array(result)
+# print(result.shape)
+# plt.plot(result.T)
+# plt.show()

 class UnivariateGamSmoother(with_metaclass(ABCMeta)):
     """Base Class for single smooth component
     """
-
     def __init__(self, x, constraints=None, variable_name='x'):
         self.x = x
         self.constraints = constraints
         self.variable_name = variable_name
         self.nobs, self.k_variables = len(x), 1
+
         base4 = self._smooth_basis_for_single_variable()
         if constraints == 'center':
             constraints = base4[0].mean(0)[None, :]
+
         if constraints is not None and not isinstance(constraints, str):
             ctransf = transf_constraints(constraints)
             self.ctransf = ctransf
-        elif not hasattr(self, 'ctransf'):
-            self.ctransf = None
+        else:
+            # subclasses might set ctransf directly
+            # only used if constraints is None
+            if not hasattr(self, 'ctransf'):
+                self.ctransf = None
+
         self.basis, self.der_basis, self.der2_basis, self.cov_der2 = base4
         if self.ctransf is not None:
             ctransf = self.ctransf
+            # transform attributes that are not None
             if base4[0] is not None:
                 self.basis = base4[0].dot(ctransf)
             if base4[1] is not None:
@@ -91,40 +336,63 @@ class UnivariateGamSmoother(with_metaclass(ABCMeta)):
                 self.der2_basis = base4[2].dot(ctransf)
             if base4[3] is not None:
                 self.cov_der2 = ctransf.T.dot(base4[3]).dot(ctransf)
+
         self.dim_basis = self.basis.shape[1]
-        self.col_names = [(self.variable_name + '_s' + str(i)) for i in
-            range(self.dim_basis)]
+        self.col_names = [self.variable_name + "_s" + str(i)
+                          for i in range(self.dim_basis)]
+
+    @abstractmethod
+    def _smooth_basis_for_single_variable(self):
+        return


 class UnivariateGenericSmoother(UnivariateGamSmoother):
     """Generic single smooth component
     """
-
     def __init__(self, x, basis, der_basis, der2_basis, cov_der2,
-        variable_name='x'):
+                 variable_name='x'):
         self.basis = basis
         self.der_basis = der_basis
         self.der2_basis = der2_basis
         self.cov_der2 = cov_der2
-        super(UnivariateGenericSmoother, self).__init__(x, variable_name=
-            variable_name)
+
+        super(UnivariateGenericSmoother, self).__init__(
+            x, variable_name=variable_name)
+
+    def _smooth_basis_for_single_variable(self):
+        return self.basis, self.der_basis, self.der2_basis, self.cov_der2


 class UnivariatePolynomialSmoother(UnivariateGamSmoother):
     """polynomial single smooth component
     """
-
     def __init__(self, x, degree, variable_name='x'):
         self.degree = degree
-        super(UnivariatePolynomialSmoother, self).__init__(x, variable_name
-            =variable_name)
+        super(UnivariatePolynomialSmoother, self).__init__(
+            x, variable_name=variable_name)

     def _smooth_basis_for_single_variable(self):
+        # TODO: unclear description
         """
         given a vector x returns poly=(1, x, x^2, ..., x^degree)
         and its first and second derivative
         """
-        pass
+
+        basis = np.zeros(shape=(self.nobs, self.degree))
+        der_basis = np.zeros(shape=(self.nobs, self.degree))
+        der2_basis = np.zeros(shape=(self.nobs, self.degree))
+        for i in range(self.degree):
+            dg = i + 1
+            basis[:, i] = self.x ** dg
+            der_basis[:, i] = dg * self.x ** (dg - 1)
+            if dg > 1:
+                der2_basis[:, i] = dg * (dg - 1) * self.x ** (dg - 2)
+            else:
+                der2_basis[:, i] = 0
+
+        cov_der2 = np.dot(der2_basis.T, der2_basis)
+
+        return basis, der_basis, der2_basis, cov_der2


 class UnivariateBSplines(UnivariateGamSmoother):
@@ -178,16 +446,28 @@ class UnivariateBSplines(UnivariateGamSmoother):
           If all knots are provided, then those will be taken as given and
           all other options will be ignored.
     """
-
     def __init__(self, x, df, degree=3, include_intercept=False,
-        constraints=None, variable_name='x', covder2_kwds=None, **knot_kwds):
+                 constraints=None, variable_name='x',
+                 covder2_kwds=None, **knot_kwds):
         self.degree = degree
         self.df = df
         self.include_intercept = include_intercept
         self.knots = get_knots_bsplines(x, degree=degree, df=df, **knot_kwds)
-        self.covder2_kwds = covder2_kwds if covder2_kwds is not None else {}
-        super(UnivariateBSplines, self).__init__(x, constraints=constraints,
-            variable_name=variable_name)
+        self.covder2_kwds = (covder2_kwds if covder2_kwds is not None
+                             else {})
+        super(UnivariateBSplines, self).__init__(
+            x, constraints=constraints, variable_name=variable_name)
+
+    def _smooth_basis_for_single_variable(self):
+        basis, der_basis, der2_basis = _eval_bspline_basis(
+            self.x, self.knots, self.degree,
+            include_intercept=self.include_intercept)
+        # cov_der2 = np.dot(der2_basis.T, der2_basis)
+
+        cov_der2 = get_covder2(self, skip_ctransf=True,
+                               **self.covder2_kwds)
+
+        return basis, der_basis, der2_basis, cov_der2

     def transform(self, x_new, deriv=0, skip_ctransf=False):
         """create the spline basis for new observations
@@ -211,7 +491,18 @@ class UnivariateBSplines(UnivariateGamSmoother):
         basis : ndarray
             design matrix for the spline basis for given ``x_new``
         """
-        pass
+
+        if x_new is None:
+            x_new = self.x
+        exog = _eval_bspline_basis(x_new, self.knots, self.degree,
+                                   deriv=deriv,
+                                   include_intercept=self.include_intercept)
+
+        # ctransf does not exist yet when cov_der2 is computed
+        ctransf = getattr(self, 'ctransf', None)
+        if ctransf is not None and not skip_ctransf:
+            exog = exog.dot(self.ctransf)
+        return exog


 class UnivariateCubicSplines(UnivariateGamSmoother):
@@ -221,14 +512,101 @@ class UnivariateCubicSplines(UnivariateGamSmoother):
     """

     def __init__(self, x, df, constraints=None, transform='domain',
-        variable_name='x'):
+                 variable_name='x'):
+
         self.degree = 3
         self.df = df
         self.transform_data_method = transform
+
         self.x = x = self.transform_data(x, initialize=True)
         self.knots = _equally_spaced_knots(x, df)
-        super(UnivariateCubicSplines, self).__init__(x, constraints=
-            constraints, variable_name=variable_name)
+        super(UnivariateCubicSplines, self).__init__(
+            x, constraints=constraints, variable_name=variable_name)
+
+    def transform_data(self, x, initialize=False):
+        tm = self.transform_data_method
+        if tm is None:
+            return x
+
+        if initialize is True:
+            if tm == 'domain':
+                self.domain_low = x.min(0)
+                self.domain_upp = x.max(0)
+            elif isinstance(tm, tuple):
+                self.domain_low = tm[0]
+                self.domain_upp = tm[1]
+                self.transform_data_method = 'domain'
+            else:
+                raise ValueError("transform should be None, 'domain' "
+                                 "or a tuple")
+            self.domain_diff = self.domain_upp - self.domain_low
+
+        if self.transform_data_method == 'domain':
+            x = (x - self.domain_low) / self.domain_diff
+            return x
+        else:
+            raise ValueError("incorrect transform_data_method")
+
+    def _smooth_basis_for_single_variable(self):
+
+        basis = self._splines_x()[:, :-1]
+        # demean except for constant, does not affect derivatives
+        if not self.constraints == 'none':
+            self.transf_mean = basis[:, 1:].mean(0)
+            basis[:, 1:] -= self.transf_mean
+        else:
+            self.transf_mean = np.zeros(basis.shape[1])
+        s = self._splines_s()[:-1, :-1]
+        if not self.constraints == 'none':
+            ctransf = np.diag(1/np.max(np.abs(basis), axis=0))
+        else:
+            ctransf = np.eye(basis.shape[1])
+        # use np.eye to avoid rescaling
+        # ctransf = np.eye(basis.shape[1])
+
+        if self.constraints == 'no-const':
+            ctransf = ctransf[1:]
+
+        self.ctransf = ctransf
+
+        return basis, None, None, s
+
+    def _rk(self, x, z):
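+        # Reproducing kernel R(x, z) for a cubic spline on [0, 1]; see Wood,
+        # Generalized Additive Models, cubic regression splines.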
+        p1 = ((z - 1 / 2) ** 2 - 1 / 12) * ((x - 1 / 2) ** 2 - 1 / 12) / 4
+        p2 = ((np.abs(z - x) - 1 / 2) ** 4 -
+              1 / 2 * (np.abs(z - x) - 1 / 2) ** 2 +
+              7 / 240) / 24.
+        return p1 - p2
+
+    def _splines_x(self, x=None):
+        if x is None:
+            x = self.x
+        n_columns = len(self.knots) + 2
+        nobs = x.shape[0]
+        basis = np.ones(shape=(nobs, n_columns))
+        basis[:, 1] = x
+        # for loop equivalent to outer(x, xk, fun=rk)
+        for i, xi in enumerate(x):
+            for j, xkj in enumerate(self.knots):
+                s_ij = self._rk(xi, xkj)
+                basis[i, j + 2] = s_ij
+        return basis
+
+    def _splines_s(self):
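+        # Penalty matrix with s[i + 2, j + 2] = R(knot_i, knot_j); the
+        # constant and linear columns are left unpenalized.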
+        q = len(self.knots) + 2
+        s = np.zeros(shape=(q, q))
+        for i, x1 in enumerate(self.knots):
+            for j, x2 in enumerate(self.knots):
+                s[i + 2, j + 2] = self._rk(x1, x2)
+        return s
+
+    def transform(self, x_new):
+        x_new = self.transform_data(x_new, initialize=False)
+        exog = self._splines_x(x_new)
+        exog[:, 1:] -= self.transf_mean
+        if self.ctransf is not None:
+            exog = exog.dot(self.ctransf)
+        return exog


 class UnivariateCubicCyclicSplines(UnivariateGamSmoother):
@@ -258,14 +636,27 @@ class UnivariateCubicCyclicSplines(UnivariateGamSmoother):
         The name for the underlying explanatory variable, x, used in for
         creating the column and parameter names for the basis functions.
     """
-
     def __init__(self, x, df, constraints=None, variable_name='x'):
         self.degree = 3
         self.df = df
         self.x = x
         self.knots = _equally_spaced_knots(x, df)
-        super(UnivariateCubicCyclicSplines, self).__init__(x, constraints=
-            constraints, variable_name=variable_name)
+        super(UnivariateCubicCyclicSplines, self).__init__(
+            x, constraints=constraints, variable_name=variable_name)
+
+    def _smooth_basis_for_single_variable(self):
+        basis = dmatrix("cc(x, df=" + str(self.df) + ") - 1", {"x": self.x})
+        self.design_info = basis.design_info
+        n_inner_knots = self.df - 2 + 1  # +n_constraints
+        # TODO: from CubicRegressionSplines class
+        all_knots = _get_all_sorted_knots(self.x, n_inner_knots=n_inner_knots,
+                                          inner_knots=None,
+                                          lower_bound=None, upper_bound=None)
+
+        b, d = self._get_b_and_d(all_knots)
+        s = self._get_s(b, d)
+
+        return basis, None, None, s

     def _get_b_and_d(self, knots):
         """Returns mapping of cyclic cubic spline values to 2nd derivatives.
@@ -292,49 +683,91 @@ class UnivariateCubicCyclicSplines(UnivariateGamSmoother):
         -----
         The penalty matrix is equal to ``s = d.T.dot(b^-1).dot(d)``
         """
-        pass
+        h = knots[1:] - knots[:-1]
+        n = knots.size - 1
+
+        # b and d are defined such that the penalty matrix is equivalent to:
+        # s = d.T.dot(b^-1).dot(d)
+        # see in particular page 146 of Wood's book
+        b = np.zeros((n, n))  # the b matrix on page 146 of Wood's book
+        d = np.zeros((n, n))  # the d matrix on page 146 of Wood's book
+
+        b[0, 0] = (h[n - 1] + h[0]) / 3.
+        b[0, n - 1] = h[n - 1] / 6.
+        b[n - 1, 0] = h[n - 1] / 6.
+
+        d[0, 0] = -1. / h[0] - 1. / h[n - 1]
+        d[0, n - 1] = 1. / h[n - 1]
+        d[n - 1, 0] = 1. / h[n - 1]
+
+        for i in range(1, n):
+            b[i, i] = (h[i - 1] + h[i]) / 3.
+            b[i, i - 1] = h[i - 1] / 6.
+            b[i - 1, i] = h[i - 1] / 6.
+
+            d[i, i] = -1. / h[i - 1] - 1. / h[i]
+            d[i, i - 1] = 1. / h[i - 1]
+            d[i - 1, i] = 1. / h[i - 1]
+
+        return b, d
+
+    def _get_s(self, b, d):
+        return d.T.dot(np.linalg.inv(b)).dot(d)
+
+    def transform(self, x_new):
+        exog = dmatrix(self.design_info, {"x": x_new})
+        if self.ctransf is not None:
+            exog = exog.dot(self.ctransf)
+        return exog


 class AdditiveGamSmoother(with_metaclass(ABCMeta)):
     """Base class for additive smooth components
     """
+    def __init__(self, x, variable_names=None, include_intercept=False,
+                 **kwargs):

-    def __init__(self, x, variable_names=None, include_intercept=False, **
-        kwargs):
+        # get pandas names before using asarray
         if isinstance(x, pd.DataFrame):
             data_names = x.columns.tolist()
         elif isinstance(x, pd.Series):
             data_names = [x.name]
         else:
             data_names = None
+
         x = np.asarray(x)
+
         if x.ndim == 1:
             self.x = x.copy()
-            self.x.shape = len(x), 1
+            self.x.shape = (len(x), 1)
         else:
             self.x = x
+
         self.nobs, self.k_variables = self.x.shape
         if isinstance(include_intercept, bool):
             self.include_intercept = [include_intercept] * self.k_variables
         else:
             self.include_intercept = include_intercept
+
         if variable_names is None:
             if data_names is not None:
                 self.variable_names = data_names
             else:
-                self.variable_names = [('x' + str(i)) for i in range(self.
-                    k_variables)]
+                self.variable_names = ['x' + str(i)
+                                       for i in range(self.k_variables)]
         else:
             self.variable_names = variable_names
+
         self.smoothers = self._make_smoothers_list()
-        self.basis = np.hstack(list(smoother.basis for smoother in self.
-            smoothers))
+        self.basis = np.hstack(list(smoother.basis
+                               for smoother in self.smoothers))
         self.dim_basis = self.basis.shape[1]
-        self.penalty_matrices = [smoother.cov_der2 for smoother in self.
-            smoothers]
+        self.penalty_matrices = [smoother.cov_der2
+                                 for smoother in self.smoothers]
         self.col_names = []
         for smoother in self.smoothers:
             self.col_names.extend(smoother.col_names)
+
         self.mask = []
         last_column = 0
         for smoother in self.smoothers:
@@ -343,6 +776,10 @@ class AdditiveGamSmoother(with_metaclass(ABCMeta)):
             last_column = last_column + smoother.dim_basis
             self.mask.append(mask)

+    @abstractmethod
+    def _make_smoothers_list(self):
+        pass
+
     def transform(self, x_new):
         """create the spline basis for new observations

@@ -359,26 +796,41 @@ class AdditiveGamSmoother(with_metaclass(ABCMeta)):
         basis : ndarray
             design matrix for the spline basis for given ``x_new``.
         """
-        pass
+        if x_new.ndim == 1 and self.k_variables == 1:
+            x_new = x_new.reshape(-1, 1)
+        exog = np.hstack(list(self.smoothers[i].transform(x_new[:, i])
+                         for i in range(self.k_variables)))
+        return exog


 class GenericSmoothers(AdditiveGamSmoother):
     """generic class for additive smooth components for GAM
     """
-
     def __init__(self, x, smoothers):
         self.smoothers = smoothers
         super(GenericSmoothers, self).__init__(x, variable_names=None)

+    def _make_smoothers_list(self):
+        return self.smoothers
+

 class PolynomialSmoother(AdditiveGamSmoother):
     """additive polynomial components for GAM
     """
-
     def __init__(self, x, degrees, variable_names=None):
         self.degrees = degrees
-        super(PolynomialSmoother, self).__init__(x, variable_names=
-            variable_names)
+        super(PolynomialSmoother, self).__init__(x,
+                                                 variable_names=variable_names)
+
+    def _make_smoothers_list(self):
+        smoothers = []
+        for v in range(self.k_variables):
+            uv_smoother = UnivariatePolynomialSmoother(
+                self.x[:, v],
+                degree=self.degrees[v],
+                variable_name=self.variable_names[v])
+            smoothers.append(uv_smoother)
+        return smoothers


 class BSplines(AdditiveGamSmoother):
@@ -462,9 +914,8 @@ class BSplines(AdditiveGamSmoother):
     ``include_intercept`` will be automatically set to True to avoid
     dropping an additional column.
     """
-
-    def __init__(self, x, df, degree, include_intercept=False, constraints=
-        None, variable_names=None, knot_kwds=None):
+    def __init__(self, x, df, degree, include_intercept=False,
+                 constraints=None, variable_names=None, knot_kwds=None):
         if isinstance(degree, int):
             self.degrees = np.array([degree], dtype=int)
         else:
@@ -474,11 +925,27 @@ class BSplines(AdditiveGamSmoother):
         else:
             self.dfs = df
         self.knot_kwds = knot_kwds
+        # TODO: move attaching constraints to super call
         self.constraints = constraints
         if constraints == 'center':
             include_intercept = True
-        super(BSplines, self).__init__(x, include_intercept=
-            include_intercept, variable_names=variable_names)
+
+        super(BSplines, self).__init__(x, include_intercept=include_intercept,
+                                       variable_names=variable_names)
+
+    def _make_smoothers_list(self):
+        smoothers = []
+        for v in range(self.k_variables):
+            kwds = self.knot_kwds[v] if self.knot_kwds else {}
+            uv_smoother = UnivariateBSplines(
+                self.x[:, v],
+                df=self.dfs[v], degree=self.degrees[v],
+                include_intercept=self.include_intercept[v],
+                constraints=self.constraints,
+                variable_name=self.variable_names[v], **kwds)
+            smoothers.append(uv_smoother)
+
+        return smoothers


 class CubicSplines(AdditiveGamSmoother):
@@ -487,14 +954,25 @@ class CubicSplines(AdditiveGamSmoother):
     Note, these splines do NOT use the same spline basis as
     ``Cubic Regression Splines``.
     """
-
     def __init__(self, x, df, constraints='center', transform='domain',
-        variable_names=None):
+                 variable_names=None):
         self.dfs = df
         self.constraints = constraints
         self.transform = transform
         super(CubicSplines, self).__init__(x, constraints=constraints,
-            variable_names=variable_names)
+                                           variable_names=variable_names)
+
+    def _make_smoothers_list(self):
+        smoothers = []
+        for v in range(self.k_variables):
+            uv_smoother = UnivariateCubicSplines(
+                            self.x[:, v], df=self.dfs[v],
+                            constraints=self.constraints,
+                            transform=self.transform,
+                            variable_name=self.variable_names[v])
+            smoothers.append(uv_smoother)
+
+        return smoothers


 class CyclicCubicSplines(AdditiveGamSmoother):
@@ -518,9 +996,72 @@ class CyclicCubicSplines(AdditiveGamSmoother):
         creating the column and parameter names for the basis functions.
         If ``x`` is a pandas object, then the names will be taken from it.
     """
-
     def __init__(self, x, df, constraints=None, variable_names=None):
         self.dfs = df
+        # TODO: move attaching constraints to super call
         self.constraints = constraints
-        super(CyclicCubicSplines, self).__init__(x, variable_names=
-            variable_names)
+        super(CyclicCubicSplines, self).__init__(x,
+                                                 variable_names=variable_names)
+
+    def _make_smoothers_list(self):
+        smoothers = []
+        for v in range(self.k_variables):
+            uv_smoother = UnivariateCubicCyclicSplines(
+                self.x[:, v],
+                df=self.dfs[v], constraints=self.constraints,
+                variable_name=self.variable_names[v])
+            smoothers.append(uv_smoother)
+
+        return smoothers
+
+# class CubicRegressionSplines(BaseCubicSplines):
+#     # TODO: this class is still not tested
+#
+#     def __init__(self, x, df=10):
+#         import warnings
+#         warnings.warn("This class is still not tested and it is probably"
+#                       " not working properly. "
+#                       "I suggest to use another smoother", Warning)
+#
+#         super(CubicRegressionSplines, self).__init__(x, df)
+#
+#         self.basis = dmatrix("cc(x, df=" + str(df) + ") - 1", {"x": x})
+#         n_inner_knots = df - 2 + 1 # +n_constraints
+#         # TODO: According to CubicRegressionSplines class this should be
+#         #  n_inner_knots = df - 2
+#         all_knots = _get_all_sorted_knots(x, n_inner_knots=n_inner_knots,
+#                                           inner_knots=None,
+#                                           lower_bound=None, upper_bound=None)
+#
+#         b, d = self._get_b_and_d(all_knots)
+#         self.s = self._get_s(b, d)
+#
+#         self.dim_basis = self.basis.shape[1]
+#
+#     def _get_b_and_d(self, knots):
+#
+#         h = knots[1:] - knots[:-1]
+#         n = knots.size - 1
+#
+#         # b and d are defined such that the penalty matrix is equivalent to:
+#         # s = d.T.dot(b^-1).dot(d)
+#         # see in particular page 146 of Wood's book
+#         b = np.zeros((n, n)) # the b matrix on page 146 of Wood's book
+#         d = np.zeros((n, n)) # the d matrix on page 146 of Wood's book
+#
+#         for i in range(n-2):
+#             d[i, i] = 1/h[i]
+#             d[i, i+1] = -1/h[i] - 1/h[i+1]
+#             d[i, i+2] = 1/h[i+1]
+#
+#             b[i, i] = (h[i] + h[i+1])/3
+#
+#         for i in range(n-3):
+#             b[i, i+1] = h[i+1]/6
+#             b[i+1, i] = h[i+1]/6
+#
+#         return b, d
+#
+#     def _get_s(self, b, d):
+#
+#         return d.T.dot(np.linalg.pinv(b)).dot(d)
diff --git a/statsmodels/genmod/_tweedie_compound_poisson.py b/statsmodels/genmod/_tweedie_compound_poisson.py
index e657ada73..1b393c09f 100644
--- a/statsmodels/genmod/_tweedie_compound_poisson.py
+++ b/statsmodels/genmod/_tweedie_compound_poisson.py
@@ -19,10 +19,74 @@ Smyth G.K. and Jørgensen B. 2002. Fitting Tweedie's compound Poisson model to
 import numpy as np
 from scipy._lib._util import _lazywhere
 from scipy.special import gammaln
+
+
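+# Canonical parameter theta(mu) of the Tweedie family; equals log(mu) at p=1.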
+def _theta(mu, p):
+    return np.where(p == 1, np.log(mu), mu ** (1 - p) / (1 - p))
+
+
+def _alpha(p):
+    return (2 - p) / (1 - p)
+
+
+def _logWj(y, j, p, phi):
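+    # Log of the j-th term W_j in the series expansion of the Tweedie
+    # density (1 < p < 2).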
+    alpha = _alpha(p)
+    logz = (-alpha * np.log(y) + alpha * np.log(p - 1) - (1 - alpha) *
+            np.log(phi) - np.log(2 - p))
+    return (j * logz - gammaln(1 + j) - gammaln(-alpha * j))
+
+
+def kappa(mu, p):
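+    # Cumulant function of the Tweedie family, expressed in terms of the mean.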
+    return mu ** (2 - p) / (2 - p)
+
+
+@np.vectorize
+def _sumw(y, j_l, j_u, logWmax, p, phi):
+    j = np.arange(j_l, j_u + 1)
+    sumw = np.sum(np.exp(_logWj(y, j, p, phi) - logWmax))
+    return sumw
+
+
+def logW(y, p, phi):
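+    # Sum the series terms W_j only over indices where they are numerically
+    # non-negligible relative to the largest term (near j = jmax).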
+    alpha = _alpha(p)
+    jmax = y ** (2 - p) / ((2 - p) * phi)
+    logWmax = np.array((1 - alpha) * jmax)
+    tol = logWmax - 37  # exp(-37) is below double-precision accuracy
+    j = np.ceil(jmax)
+    while (_logWj(y, np.ceil(j), p, phi) > tol).any():
+        j = np.where(_logWj(y, j, p, phi) > tol, j + 1, j)
+    j_u = j
+    j = np.floor(jmax)
+    j = np.where(j > 1, j, 1)
+    while (_logWj(y, j, p, phi) > tol).any() and (j > 1).any():
+        j = np.where(_logWj(y, j, p, phi) > tol, j - 1, 1)
+    j_l = j
+    sumw = _sumw(y, j_l, j_u, logWmax, p, phi)
+    return logWmax + np.log(sumw)
+
+
+def density_at_zero(y, mu, p, phi):
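+    # Point mass at zero of the compound Poisson distribution:
+    # P(Y = 0) = exp(-mu**(2 - p) / (phi * (2 - p))).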
+    return np.exp(-(mu ** (2 - p)) / (phi * (2 - p)))
+
+
+def density_otherwise(y, mu, p, phi):
+    theta = _theta(mu, p)
+    logd = logW(y, p, phi) - np.log(y) + (1 / phi * (y * theta - kappa(mu, p)))
+    return np.exp(logd)
+
+
+def series_density(y, mu, p, phi):
+    density = _lazywhere(np.array(y) > 0,
+                         (y, mu, p, phi),
+                         f=density_otherwise,
+                         f2=density_at_zero)
+    return density
+
+
 if __name__ == '__main__':
     from scipy import stats
-    n = stats.poisson.rvs(0.1, size=10000000)
-    y = stats.gamma.rvs(0.1, scale=30000, size=10000000)
+    n = stats.poisson.rvs(.1, size=10000000)
+    y = stats.gamma.rvs(.1, scale=30000, size=10000000)
     y = n * y
     mu = stats.gamma.rvs(10, scale=30, size=10000000)
     import time
diff --git a/statsmodels/genmod/api.py b/statsmodels/genmod/api.py
index e3ab9997b..ddf61d72f 100644
--- a/statsmodels/genmod/api.py
+++ b/statsmodels/genmod/api.py
@@ -1,5 +1,8 @@
-__all__ = ['GLM', 'GEE', 'OrdinalGEE', 'NominalGEE',
-    'BinomialBayesMixedGLM', 'PoissonBayesMixedGLM', 'families', 'cov_struct']
+__all__ = [
+    "GLM", "GEE", "OrdinalGEE", "NominalGEE",
+    "BinomialBayesMixedGLM", "PoissonBayesMixedGLM",
+    "families", "cov_struct"
+]
 from .generalized_linear_model import GLM
 from .generalized_estimating_equations import GEE, OrdinalGEE, NominalGEE
 from .bayes_mixed_glm import BinomialBayesMixedGLM, PoissonBayesMixedGLM
diff --git a/statsmodels/genmod/bayes_mixed_glm.py b/statsmodels/genmod/bayes_mixed_glm.py
index fa6d9ff15..d7e2dbdae 100644
--- a/statsmodels/genmod/bayes_mixed_glm.py
+++ b/statsmodels/genmod/bayes_mixed_glm.py
@@ -1,4 +1,4 @@
-"""
+r"""
 Bayesian inference for generalized linear mixed models.

 Currently only families without additional scale or shape parameters
@@ -45,6 +45,7 @@ is independent Gaussian (random effect realizations are independent
 within and between values of the `ident` array).  The model
 :math:`p(y | vc, fep)` depends on the specific GLM being fit.
 """
+
 import numpy as np
 from scipy.optimize import minimize
 from scipy import sparse
@@ -54,14 +55,22 @@ from statsmodels.genmod import families
 import pandas as pd
 import warnings
 import patsy
-glw = [[0.2955242247147529, -0.1488743389816312], [0.2955242247147529, 
-    0.1488743389816312], [0.2692667193099963, -0.4333953941292472], [
-    0.2692667193099963, 0.4333953941292472], [0.219086362515982, -
-    0.6794095682990244], [0.219086362515982, 0.6794095682990244], [
-    0.1494513491505806, -0.8650633666889845], [0.1494513491505806, 
-    0.8650633666889845], [0.0666713443086881, -0.9739065285171717], [
-    0.0666713443086881, 0.9739065285171717]]
-_init_doc = """
+
+# 10-point Gauss-Legendre quadrature weights and nodes on [-1, 1]
+glw = [
+    [0.2955242247147529, -0.1488743389816312],
+    [0.2955242247147529, 0.1488743389816312],
+    [0.2692667193099963, -0.4333953941292472],
+    [0.2692667193099963, 0.4333953941292472],
+    [0.2190863625159820, -0.6794095682990244],
+    [0.2190863625159820, 0.6794095682990244],
+    [0.1494513491505806, -0.8650633666889845],
+    [0.1494513491505806, 0.8650633666889845],
+    [0.0666713443086881, -0.9739065285171717],
+    [0.0666713443086881, 0.9739065285171717],
+]
+
+_init_doc = r"""
     Generalized Linear Mixed Model with Bayesian estimation

     The class implements the Laplace approximation to the posterior
@@ -152,6 +161,9 @@ _init_doc = """
     models with binary outcomes
     https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866838/
     """
+
+# The code in the example should be identical to what appears in
+# the test_doc_examples unit test
 _logit_example = """
     A binomial (logistic) random effects model with random intercepts
     for villages and random slopes for each year within each village:
@@ -161,6 +173,9 @@ _logit_example = """
                    'y ~ year_cen', random, data)
     >>> result = model.fit_vb()
 """
+
+# The code in the example should be identical to what appears in
+# the test_doc_examples unit test
 _poisson_example = """
     A Poisson random effects model with random intercepts for villages
     and random slopes for each year within each village:
@@ -173,64 +188,94 @@ _poisson_example = """


 class _BayesMixedGLM(base.Model):
+    def __init__(self,
+                 endog,
+                 exog,
+                 exog_vc=None,
+                 ident=None,
+                 family=None,
+                 vcp_p=1,
+                 fe_p=2,
+                 fep_names=None,
+                 vcp_names=None,
+                 vc_names=None,
+                 **kwargs):

-    def __init__(self, endog, exog, exog_vc=None, ident=None, family=None,
-        vcp_p=1, fe_p=2, fep_names=None, vcp_names=None, vc_names=None, **
-        kwargs):
         if exog.ndim == 1:
             if isinstance(exog, np.ndarray):
                 exog = exog[:, None]
             else:
                 exog = pd.DataFrame(exog)
+
         if exog.ndim != 2:
             msg = "'exog' must have one or two columns"
             raise ValueError(msg)
+
         if exog_vc.ndim == 1:
             if isinstance(exog_vc, np.ndarray):
                 exog_vc = exog_vc[:, None]
             else:
                 exog_vc = pd.DataFrame(exog_vc)
+
         if exog_vc.ndim != 2:
             msg = "'exog_vc' must have one or two columns"
             raise ValueError(msg)
+
         ident = np.asarray(ident)
         if ident.ndim != 1:
-            msg = 'ident must be a one-dimensional array'
+            msg = "ident must be a one-dimensional array"
             raise ValueError(msg)
+
         if len(ident) != exog_vc.shape[1]:
-            msg = 'len(ident) should match the number of columns of exog_vc'
+            msg = "len(ident) should match the number of columns of exog_vc"
             raise ValueError(msg)
+
         if not np.issubdtype(ident.dtype, np.integer):
-            msg = 'ident must have an integer dtype'
+            msg = "ident must have an integer dtype"
             raise ValueError(msg)
+
+        # Get the fixed effects parameter names
         if fep_names is None:
-            if hasattr(exog, 'columns'):
+            if hasattr(exog, "columns"):
                 fep_names = exog.columns.tolist()
             else:
-                fep_names = [('FE_%d' % (k + 1)) for k in range(exog.shape[1])]
+                fep_names = ["FE_%d" % (k + 1) for k in range(exog.shape[1])]
+
+        # Get the variance parameter names
         if vcp_names is None:
-            vcp_names = [('VC_%d' % (k + 1)) for k in range(int(max(ident)) +
-                1)]
-        elif len(vcp_names) != len(set(ident)):
-            msg = 'The lengths of vcp_names and ident should be the same'
-            raise ValueError(msg)
+            vcp_names = ["VC_%d" % (k + 1) for k in range(int(max(ident)) + 1)]
+        else:
+            if len(vcp_names) != len(set(ident)):
+                msg = "The lengths of vcp_names and ident should be the same"
+                raise ValueError(msg)
+
         if not sparse.issparse(exog_vc):
             exog_vc = sparse.csr_matrix(exog_vc)
+
         ident = ident.astype(int)
         vcp_p = float(vcp_p)
         fe_p = float(fe_p)
+
+        # Number of fixed effects parameters
         if exog is None:
             k_fep = 0
         else:
             k_fep = exog.shape[1]
+
+        # Number of variance component structure parameters and
+        # variance component realizations.
         if exog_vc is None:
             k_vc = 0
             k_vcp = 0
         else:
             k_vc = exog_vc.shape[1]
             k_vcp = max(ident) + 1
+
+        # element-wise square; the sparse .power(2) method would be cleaner
+        # but is not available in older scipy
         exog_vc2 = exog_vc.multiply(exog_vc)
+
         super(_BayesMixedGLM, self).__init__(endog, exog, **kwargs)
+
         self.exog_vc = exog_vc
         self.exog_vc2 = exog_vc2
         self.ident = ident
@@ -247,6 +292,25 @@ class _BayesMixedGLM(base.Model):
         if vc_names is not None:
             self.names += vc_names

+    def _unpack(self, vec):
+
+        ii = 0
+
+        # Fixed effects parameters
+        fep = vec[:ii + self.k_fep]
+        ii += self.k_fep
+
+        # Variance component structure parameters (standard
+        # deviations).  These are on the log scale.  The standard
+        # deviation for random effect j is exp(vcp[ident[j]]).
+        vcp = vec[ii:ii + self.k_vcp]
+        ii += self.k_vcp
+
+        # Random effect realizations
+        vc = vec[ii:]
+
+        return fep, vcp, vc
+
     def logposterior(self, params):
         """
         The overall log-density: log p(y, fe, vc, vcp).
@@ -254,17 +318,99 @@ class _BayesMixedGLM(base.Model):
         This differs by an additive constant from the log posterior
         log p(fe, vc, vcp | y).
         """
-        pass
+
+        fep, vcp, vc = self._unpack(params)
+
+        # Contributions from p(y | x, vc)
+        lp = 0
+        if self.k_fep > 0:
+            lp += np.dot(self.exog, fep)
+        if self.k_vc > 0:
+            lp += self.exog_vc.dot(vc)
+
+        mu = self.family.link.inverse(lp)
+        ll = self.family.loglike(self.endog, mu)
+
+        if self.k_vc > 0:
+
+            # Contributions from p(vc | vcp)
+            vcp0 = vcp[self.ident]
+            s = np.exp(vcp0)
+            ll -= 0.5 * np.sum(vc**2 / s**2) + np.sum(vcp0)
+
+            # Contributions from p(vc)
+            ll -= 0.5 * np.sum(vcp**2 / self.vcp_p**2)
+
+        # Contributions from p(fep)
+        if self.k_fep > 0:
+            ll -= 0.5 * np.sum(fep**2 / self.fe_p**2)
+
+        return ll

     def logposterior_grad(self, params):
         """
         The gradient of the log posterior.
         """
-        pass
+
+        fep, vcp, vc = self._unpack(params)
+
+        lp = 0
+        if self.k_fep > 0:
+            lp += np.dot(self.exog, fep)
+        if self.k_vc > 0:
+            lp += self.exog_vc.dot(vc)
+
+        mu = self.family.link.inverse(lp)
+
+        score_factor = (self.endog - mu) / self.family.link.deriv(mu)
+        score_factor /= self.family.variance(mu)
+
+        te = [None, None, None]
+
+        # Contributions from p(y | x, z, vc)
+        if self.k_fep > 0:
+            te[0] = np.dot(score_factor, self.exog)
+        if self.k_vc > 0:
+            te[2] = self.exog_vc.transpose().dot(score_factor)
+
+        if self.k_vc > 0:
+            # Contributions from p(vc | vcp)
+            # vcp0 = vcp[self.ident]
+            # s = np.exp(vcp0)
+            # ll -= 0.5 * np.sum(vc**2 / s**2) + np.sum(vcp0)
+            vcp0 = vcp[self.ident]
+            s = np.exp(vcp0)
+            u = vc**2 / s**2 - 1
+            te[1] = np.bincount(self.ident, weights=u)
+            te[2] -= vc / s**2
+
+            # Contributions from p(vcp)
+            # ll -= 0.5 * np.sum(vcp**2 / self.vcp_p**2)
+            te[1] -= vcp / self.vcp_p**2
+
+        # Contributions from p(fep)
+        if self.k_fep > 0:
+            te[0] -= fep / self.fe_p**2
+
+        te = [x for x in te if x is not None]
+
+        return np.concatenate(te)
+
+    def _get_start(self):
+        start_fep = np.zeros(self.k_fep)
+        start_vcp = np.ones(self.k_vcp)
+        start_vc = np.random.normal(size=self.k_vc)
+        start = np.concatenate((start_fep, start_vcp, start_vc))
+        return start

     @classmethod
-    def from_formula(cls, formula, vc_formulas, data, family=None, vcp_p=1,
-        fe_p=2):
+    def from_formula(cls,
+                     formula,
+                     vc_formulas,
+                     data,
+                     family=None,
+                     vcp_p=1,
+                     fe_p=2):
         """
         Fit a BayesMixedGLM using a formula.

@@ -289,9 +435,37 @@ class _BayesMixedGLM(base.Model):
         fe_p : float
             The prior standard deviation for the fixed effects parameters.
         """
-        pass

-    def fit(self, method='BFGS', minim_opts=None):
+        ident = []
+        exog_vc = []
+        vcp_names = []
+        j = 0
+        for na, fml in vc_formulas.items():
+            mat = patsy.dmatrix(fml, data, return_type='dataframe')
+            exog_vc.append(mat)
+            vcp_names.append(na)
+            ident.append(j * np.ones(mat.shape[1], dtype=np.int_))
+            j += 1
+        exog_vc = pd.concat(exog_vc, axis=1)
+        vc_names = exog_vc.columns.tolist()
+
+        ident = np.concatenate(ident)
+
+        model = super(_BayesMixedGLM, cls).from_formula(
+            formula,
+            data=data,
+            family=family,
+            subset=None,
+            exog_vc=exog_vc,
+            ident=ident,
+            vc_names=vc_names,
+            vcp_names=vcp_names,
+            fe_p=fe_p,
+            vcp_p=vcp_p)
+
+        return model
+
+    def fit(self, method="BFGS", minim_opts=None):
         """
         fit is equivalent to fit_map.

@@ -299,9 +473,9 @@ class _BayesMixedGLM(base.Model):

         Use `fit_vb` to fit the model using variational Bayes.
         """
-        pass
+        return self.fit_map(method, minim_opts)

-    def fit_map(self, method='BFGS', minim_opts=None, scale_fe=False):
+    def fit_map(self, method="BFGS", minim_opts=None, scale_fe=False):
         """
         Construct the Laplace approximation to the posterior distribution.

@@ -321,7 +495,43 @@ class _BayesMixedGLM(base.Model):
         -------
         BayesMixedGLMResults instance.
         """
-        pass
+
+        if scale_fe:
+            mn = self.exog.mean(0)
+            sc = self.exog.std(0)
+            self._exog_save = self.exog
+            self.exog = self.exog.copy()
+            ixs = np.flatnonzero(sc > 1e-8)
+            self.exog[:, ixs] -= mn[ixs]
+            self.exog[:, ixs] /= sc[ixs]
+
+        def fun(params):
+            return -self.logposterior(params)
+
+        def grad(params):
+            return -self.logposterior_grad(params)
+
+        start = self._get_start()
+
+        r = minimize(fun, start, method=method, jac=grad, options=minim_opts)
+        if not r.success:
+            msg = ("Laplace fitting did not converge, |gradient|=%.6f" %
+                   np.sqrt(np.sum(r.jac**2)))
+            warnings.warn(msg)
+
+        from statsmodels.tools.numdiff import approx_fprime
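+        # The Hessian of the negative log posterior is the Jacobian of its
+        # gradient; its inverse is the Laplace approximation covariance.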
+        hess = approx_fprime(r.x, grad)
+        cov = np.linalg.inv(hess)
+
+        params = r.x
+
+        if scale_fe:
+            self.exog = self._exog_save
+            del self._exog_save
+            params[ixs] /= sc[ixs]
+            cov[ixs, :][:, ixs] /= np.outer(sc[ixs], sc[ixs])
+
+        return BayesMixedGLMResults(self, params, cov, optim_retvals=r)

     def predict(self, params, exog=None, linear=False):
         """
@@ -343,7 +553,17 @@ class _BayesMixedGLM(base.Model):
         -------
         A 1-dimensional array of predicted values
         """
-        pass
+
+        if exog is None:
+            exog = self.exog
+
+        q = exog.shape[1]
+        pr = np.dot(exog, params[0:q])
+
+        if not linear:
+            pr = self.family.link.inverse(pr)
+
+        return pr


 class _VariationalBayesMixedGLM:
@@ -351,11 +571,27 @@ class _VariationalBayesMixedGLM:
     A mixin providing generic (not family-specific) methods for
     variational Bayes mean field fitting.
     """
+
+    # Integration range (from -rng to +rng).  The integrals are with
+    # respect to a standard Gaussian distribution so (-5, 5) will be
+    # sufficient in many cases.
     rng = 5
+
     verbose = False

-    def vb_elbo_base(self, h, tm, fep_mean, vcp_mean, vc_mean, fep_sd,
-        vcp_sd, vc_sd):
+    # Returns the mean and variance of the linear predictor under the
+    # given distribution parameters.
+    def _lp_stats(self, fep_mean, fep_sd, vc_mean, vc_sd):
+
+        tm = np.dot(self.exog, fep_mean)
+        tv = np.dot(self.exog**2, fep_sd**2)
+        tm += self.exog_vc.dot(vc_mean)
+        tv += self.exog_vc2.dot(vc_sd**2)
+
+        return tm, tv
+
+    def vb_elbo_base(self, h, tm, fep_mean, vcp_mean, vc_mean, fep_sd, vcp_sd,
+                     vc_sd):
         """
         Returns the evidence lower bound (ELBO) for the model.

@@ -372,19 +608,92 @@ class _VariationalBayesMixedGLM:
             can be achieved for any GLM with a canonical link
             function.
         """
-        pass

-    def vb_elbo_grad_base(self, h, tm, tv, fep_mean, vcp_mean, vc_mean,
-        fep_sd, vcp_sd, vc_sd):
+        # p(y | vc) contributions
+        iv = 0
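+        # Gauss-Legendre approximation of the expectation of h(z) under a
+        # standard normal distribution, integrating over [-rng, rng].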
+        for w in glw:
+            z = self.rng * w[1]
+            iv += w[0] * h(z) * np.exp(-z**2 / 2)
+        iv /= np.sqrt(2 * np.pi)
+        iv *= self.rng
+        iv += self.endog * tm
+        iv = iv.sum()
+
+        # p(vc | vcp) * p(vcp) * p(fep) contributions
+        iv += self._elbo_common(fep_mean, fep_sd, vcp_mean, vcp_sd, vc_mean,
+                                vc_sd)
+
+        r = (iv + np.sum(np.log(fep_sd)) + np.sum(np.log(vcp_sd)) + np.sum(
+            np.log(vc_sd)))
+
+        return r
+
+    def vb_elbo_grad_base(self, h, tm, tv, fep_mean, vcp_mean, vc_mean, fep_sd,
+                          vcp_sd, vc_sd):
         """
         Return the gradient of the ELBO function.

         See vb_elbo_base for parameters.
         """
-        pass

-    def fit_vb(self, mean=None, sd=None, fit_method='BFGS', minim_opts=None,
-        scale_fe=False, verbose=False):
+        fep_mean_grad = 0.
+        fep_sd_grad = 0.
+        vcp_mean_grad = 0.
+        vcp_sd_grad = 0.
+        vc_mean_grad = 0.
+        vc_sd_grad = 0.
+
+        # p(y | vc) contributions
+        for w in glw:
+            z = self.rng * w[1]
+            u = h(z) * np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
+            r = u / np.sqrt(tv)
+            fep_mean_grad += w[0] * np.dot(u, self.exog)
+            vc_mean_grad += w[0] * self.exog_vc.transpose().dot(u)
+            fep_sd_grad += w[0] * z * np.dot(r, self.exog**2 * fep_sd)
+            v = self.exog_vc2.multiply(vc_sd).transpose().dot(r)
+            v = np.squeeze(np.asarray(v))
+            vc_sd_grad += w[0] * z * v
+
+        fep_mean_grad *= self.rng
+        vc_mean_grad *= self.rng
+        fep_sd_grad *= self.rng
+        vc_sd_grad *= self.rng
+        fep_mean_grad += np.dot(self.endog, self.exog)
+        vc_mean_grad += self.exog_vc.transpose().dot(self.endog)
+
+        (fep_mean_grad_i, fep_sd_grad_i, vcp_mean_grad_i, vcp_sd_grad_i,
+         vc_mean_grad_i, vc_sd_grad_i) = self._elbo_grad_common(
+             fep_mean, fep_sd, vcp_mean, vcp_sd, vc_mean, vc_sd)
+
+        fep_mean_grad += fep_mean_grad_i
+        fep_sd_grad += fep_sd_grad_i
+        vcp_mean_grad += vcp_mean_grad_i
+        vcp_sd_grad += vcp_sd_grad_i
+        vc_mean_grad += vc_mean_grad_i
+        vc_sd_grad += vc_sd_grad_i
+
+        fep_sd_grad += 1 / fep_sd
+        vcp_sd_grad += 1 / vcp_sd
+        vc_sd_grad += 1 / vc_sd
+
+        mean_grad = np.concatenate((fep_mean_grad, vcp_mean_grad,
+                                    vc_mean_grad))
+        sd_grad = np.concatenate((fep_sd_grad, vcp_sd_grad, vc_sd_grad))
+
+        if self.verbose:
+            print(
+                "|G|=%f" % np.sqrt(np.sum(mean_grad**2) + np.sum(sd_grad**2)))
+
+        return mean_grad, sd_grad
+
+    def fit_vb(self,
+               mean=None,
+               sd=None,
+               fit_method="BFGS",
+               minim_opts=None,
+               scale_fe=False,
+               verbose=False):
         """
         Fit a model using the variational Bayes mean field approximation.

@@ -425,7 +734,121 @@ class _VariationalBayesMixedGLM:
         review for Statisticians
         https://arxiv.org/pdf/1601.00670.pdf
         """
-        pass
+
+        self.verbose = verbose
+
+        if scale_fe:
+            mn = self.exog.mean(0)
+            sc = self.exog.std(0)
+            self._exog_save = self.exog
+            self.exog = self.exog.copy()
+            ixs = np.flatnonzero(sc > 1e-8)
+            self.exog[:, ixs] -= mn[ixs]
+            self.exog[:, ixs] /= sc[ixs]
+
+        n = self.k_fep + self.k_vcp + self.k_vc
+        ml = self.k_fep + self.k_vcp + self.k_vc
+        if mean is None:
+            m = np.zeros(n)
+        else:
+            if len(mean) != ml:
+                raise ValueError(
+                    "mean has incorrect length, %d != %d" % (len(mean), ml))
+            m = mean.copy()
+        if sd is None:
+            s = -0.5 + 0.1 * np.random.normal(size=n)
+        else:
+            if len(sd) != ml:
+                raise ValueError(
+                    "sd has incorrect length, %d != %d" % (len(sd), ml))
+
+            # s is parametrized on the log-scale internally when
+            # optimizing the ELBO function (this is transparent to the
+            # caller)
+            s = np.log(sd)
+
+        # Do not allow the variance parameter starting mean values to
+        # be too small.
+        i1, i2 = self.k_fep, self.k_fep + self.k_vcp
+        m[i1:i2] = np.where(m[i1:i2] < -1, -1, m[i1:i2])
+
+        # Do not allow the posterior standard deviation starting values
+        # to be too small.
+        s = np.where(s < -1, -1, s)
+
+        def elbo(x):
+            n = len(x) // 2
+            return -self.vb_elbo(x[:n], np.exp(x[n:]))
+
+        def elbo_grad(x):
+            n = len(x) // 2
+            gm, gs = self.vb_elbo_grad(x[:n], np.exp(x[n:]))
+            gs *= np.exp(x[n:])
+            return -np.concatenate((gm, gs))
+
+        start = np.concatenate((m, s))
+        mm = minimize(
+            elbo, start, jac=elbo_grad, method=fit_method, options=minim_opts)
+        if not mm.success:
+            warnings.warn("VB fitting did not converge")
+
+        n = len(mm.x) // 2
+        params = mm.x[0:n]
+        va = np.exp(2 * mm.x[n:])
+
+        if scale_fe:
+            self.exog = self._exog_save
+            del self._exog_save
+            params[ixs] /= sc[ixs]
+            va[ixs] /= sc[ixs]**2
+
+        return BayesMixedGLMResults(self, params, va, mm)
+
+    # Handle terms in the ELBO that are common to all models.
+    def _elbo_common(self, fep_mean, fep_sd, vcp_mean, vcp_sd, vc_mean, vc_sd):
+
+        iv = 0
+
+        # p(vc | vcp) contributions
+        m = vcp_mean[self.ident]
+        s = vcp_sd[self.ident]
+        iv -= np.sum((vc_mean**2 + vc_sd**2) * np.exp(2 * (s**2 - m))) / 2
+        iv -= np.sum(m)
+
+        # p(vcp) contributions
+        iv -= 0.5 * (vcp_mean**2 + vcp_sd**2).sum() / self.vcp_p**2
+
+        # p(b) contributions
+        iv -= 0.5 * (fep_mean**2 + fep_sd**2).sum() / self.fe_p**2
+
+        return iv
+
+    def _elbo_grad_common(self, fep_mean, fep_sd, vcp_mean, vcp_sd, vc_mean,
+                          vc_sd):
+
+        # p(vc | vcp) contributions
+        m = vcp_mean[self.ident]
+        s = vcp_sd[self.ident]
+        u = vc_mean**2 + vc_sd**2
+        ve = np.exp(2 * (s**2 - m))
+        dm = u * ve - 1
+        ds = -2 * u * ve * s
+        vcp_mean_grad = np.bincount(self.ident, weights=dm)
+        vcp_sd_grad = np.bincount(self.ident, weights=ds)
+
+        vc_mean_grad = -vc_mean.copy() * ve
+        vc_sd_grad = -vc_sd.copy() * ve
+
+        # p(vcp) contributions
+        vcp_mean_grad -= vcp_mean / self.vcp_p**2
+        vcp_sd_grad -= vcp_sd / self.vcp_p**2
+
+        # p(b) contributions
+        fep_mean_grad = -fep_mean.copy() / self.fe_p**2
+        fep_sd_grad = -fep_sd.copy() / self.fe_p**2
+
+        return (fep_mean_grad, fep_sd_grad, vcp_mean_grad, vcp_sd_grad,
+                vc_mean_grad, vc_sd_grad)


 class BayesMixedGLMResults:
@@ -451,11 +874,14 @@ class BayesMixedGLMResults:
     """

     def __init__(self, model, params, cov_params, optim_retvals=None):
+
         self.model = model
         self.params = params
         self._cov_params = cov_params
         self.optim_retvals = optim_retvals
-        self.fe_mean, self.vcp_mean, self.vc_mean = model._unpack(params)
+
+        self.fe_mean, self.vcp_mean, self.vc_mean = (model._unpack(params))
+
         if cov_params.ndim == 2:
             cp = np.diag(cov_params)
         else:
@@ -465,6 +891,60 @@ class BayesMixedGLMResults:
         self.vcp_sd = np.sqrt(self.vcp_sd)
         self.vc_sd = np.sqrt(self.vc_sd)

+    def cov_params(self):
+
+        if hasattr(self.model.data, "frame"):
+            # Return the covariance matrix as a dataframe or series
+            na = (self.model.fep_names + self.model.vcp_names +
+                  self.model.vc_names)
+            if self._cov_params.ndim == 2:
+                return pd.DataFrame(self._cov_params, index=na, columns=na)
+            else:
+                return pd.Series(self._cov_params, index=na)
+
+        # Return the covariance matrix as a ndarray
+        return self._cov_params
+
+    def summary(self):
+
+        df = pd.DataFrame()
+        m = self.model.k_fep + self.model.k_vcp
+        df["Type"] = (["M" for k in range(self.model.k_fep)] +
+                      ["V" for k in range(self.model.k_vcp)])
+
+        df["Post. Mean"] = self.params[0:m]
+
+        if self._cov_params.ndim == 2:
+            v = np.diag(self._cov_params)[0:m]
+            df["Post. SD"] = np.sqrt(v)
+        else:
+            df["Post. SD"] = np.sqrt(self._cov_params[0:m])
+
+        # Convert variance parameters to natural scale
+        df["SD"] = np.exp(df["Post. Mean"])
+        df["SD (LB)"] = np.exp(df["Post. Mean"] - 2 * df["Post. SD"])
+        df["SD (UB)"] = np.exp(df["Post. Mean"] + 2 * df["Post. SD"])
+        df["SD"] = ["%.3f" % x for x in df.SD]
+        df["SD (LB)"] = ["%.3f" % x for x in df["SD (LB)"]]
+        df["SD (UB)"] = ["%.3f" % x for x in df["SD (UB)"]]
+        df.loc[df.index < self.model.k_fep, "SD"] = ""
+        df.loc[df.index < self.model.k_fep, "SD (LB)"] = ""
+        df.loc[df.index < self.model.k_fep, "SD (UB)"] = ""
+
+        df.index = self.model.fep_names + self.model.vcp_names
+
+        summ = summary2.Summary()
+        summ.add_title(self.model.family.__class__.__name__ +
+                       " Mixed GLM Results")
+        summ.add_df(df)
+
+        summ.add_text("Parameter types are mean structure (M) and "
+                      "variance structure (V)")
+        summ.add_text("Variance parameters are modeled as log "
+                      "standard deviations")
+
+        return summ
+
     def random_effects(self, term=None):
         """
         Posterior mean and standard deviation of random effects.
@@ -483,7 +963,24 @@ class BayesMixedGLMResults:
         Data frame of posterior means and posterior standard
         deviations of random effects.
         """
-        pass
+
+        z = self.vc_mean
+        s = self.vc_sd
+        na = self.model.vc_names
+
+        if term is not None:
+            termix = self.model.vcp_names.index(term)
+            ii = np.flatnonzero(self.model.ident == termix)
+            z = z[ii]
+            s = s[ii]
+            na = [na[i] for i in ii]
+
+        x = pd.DataFrame({"Mean": z, "SD": s})
+
+        if na is not None:
+            x.index = na
+
+        return x

     def predict(self, exog=None, linear=False):
         """
@@ -502,53 +999,190 @@ class BayesMixedGLMResults:
         -------
         A one-dimensional array of fitted values.
         """
-        pass
+
+        return self.model.predict(self.params, exog, linear)


 class BinomialBayesMixedGLM(_VariationalBayesMixedGLM, _BayesMixedGLM):
+
     __doc__ = _init_doc.format(example=_logit_example)

-    def __init__(self, endog, exog, exog_vc, ident, vcp_p=1, fe_p=2,
-        fep_names=None, vcp_names=None, vc_names=None):
-        super(BinomialBayesMixedGLM, self).__init__(endog, exog, exog_vc=
-            exog_vc, ident=ident, vcp_p=vcp_p, fe_p=fe_p, family=families.
-            Binomial(), fep_names=fep_names, vcp_names=vcp_names, vc_names=
-            vc_names)
+    def __init__(self,
+                 endog,
+                 exog,
+                 exog_vc,
+                 ident,
+                 vcp_p=1,
+                 fe_p=2,
+                 fep_names=None,
+                 vcp_names=None,
+                 vc_names=None):
+
+        super(BinomialBayesMixedGLM, self).__init__(
+            endog,
+            exog,
+            exog_vc=exog_vc,
+            ident=ident,
+            vcp_p=vcp_p,
+            fe_p=fe_p,
+            family=families.Binomial(),
+            fep_names=fep_names,
+            vcp_names=vcp_names,
+            vc_names=vc_names)
+
         if not np.all(np.unique(endog) == np.r_[0, 1]):
-            msg = 'endog values must be 0 and 1, and not all identical'
+            msg = "endog values must be 0 and 1, and not all identical"
             raise ValueError(msg)

+    @classmethod
+    def from_formula(cls, formula, vc_formulas, data, vcp_p=1, fe_p=2):
+
+        fam = families.Binomial()
+        x = _BayesMixedGLM.from_formula(
+            formula, vc_formulas, data, family=fam, vcp_p=vcp_p, fe_p=fe_p)
+
+        # Copy over to the intended class structure
+        mod = BinomialBayesMixedGLM(
+            x.endog,
+            x.exog,
+            exog_vc=x.exog_vc,
+            ident=x.ident,
+            vcp_p=x.vcp_p,
+            fe_p=x.fe_p,
+            fep_names=x.fep_names,
+            vcp_names=x.vcp_names,
+            vc_names=x.vc_names)
+        mod.data = x.data
+
+        return mod
+
     def vb_elbo(self, vb_mean, vb_sd):
         """
         Returns the evidence lower bound (ELBO) for the model.
         """
-        pass
+
+        fep_mean, vcp_mean, vc_mean = self._unpack(vb_mean)
+        fep_sd, vcp_sd, vc_sd = self._unpack(vb_sd)
+        tm, tv = self._lp_stats(fep_mean, fep_sd, vc_mean, vc_sd)
+
+        def h(z):
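+            # Negative log-normalizer of the Bernoulli likelihood at the
+            # sampled linear predictor.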
+            return -np.log(1 + np.exp(tm + np.sqrt(tv) * z))
+
+        return self.vb_elbo_base(h, tm, fep_mean, vcp_mean, vc_mean, fep_sd,
+                                 vcp_sd, vc_sd)

     def vb_elbo_grad(self, vb_mean, vb_sd):
         """
         Returns the gradient of the model's evidence lower bound (ELBO).
         """
-        pass
+
+        fep_mean, vcp_mean, vc_mean = self._unpack(vb_mean)
+        fep_sd, vcp_sd, vc_sd = self._unpack(vb_sd)
+        tm, tv = self._lp_stats(fep_mean, fep_sd, vc_mean, vc_sd)
+
+        def h(z):
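+            # Numerically stable logistic (sigmoid) evaluation, split on the
+            # sign of u to avoid overflow in exp.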
+            u = tm + np.sqrt(tv) * z
+            x = np.zeros_like(u)
+            ii = np.flatnonzero(u > 0)
+            uu = u[ii]
+            x[ii] = 1 / (1 + np.exp(-uu))
+            ii = np.flatnonzero(u <= 0)
+            uu = u[ii]
+            x[ii] = np.exp(uu) / (1 + np.exp(uu))
+            return -x
+
+        return self.vb_elbo_grad_base(h, tm, tv, fep_mean, vcp_mean, vc_mean,
+                                      fep_sd, vcp_sd, vc_sd)


 class PoissonBayesMixedGLM(_VariationalBayesMixedGLM, _BayesMixedGLM):
+
     __doc__ = _init_doc.format(example=_poisson_example)

-    def __init__(self, endog, exog, exog_vc, ident, vcp_p=1, fe_p=2,
-        fep_names=None, vcp_names=None, vc_names=None):
-        super(PoissonBayesMixedGLM, self).__init__(endog=endog, exog=exog,
-            exog_vc=exog_vc, ident=ident, vcp_p=vcp_p, fe_p=fe_p, family=
-            families.Poisson(), fep_names=fep_names, vcp_names=vcp_names,
+    def __init__(self,
+                 endog,
+                 exog,
+                 exog_vc,
+                 ident,
+                 vcp_p=1,
+                 fe_p=2,
+                 fep_names=None,
+                 vcp_names=None,
+                 vc_names=None):
+
+        super(PoissonBayesMixedGLM, self).__init__(
+            endog=endog,
+            exog=exog,
+            exog_vc=exog_vc,
+            ident=ident,
+            vcp_p=vcp_p,
+            fe_p=fe_p,
+            family=families.Poisson(),
+            fep_names=fep_names,
+            vcp_names=vcp_names,
             vc_names=vc_names)

+    @classmethod
+    def from_formula(cls,
+                     formula,
+                     vc_formulas,
+                     data,
+                     vcp_p=1,
+                     fe_p=2,
+                     vcp_names=None,
+                     vc_names=None):
+
+        fam = families.Poisson()
+        x = _BayesMixedGLM.from_formula(
+            formula,
+            vc_formulas,
+            data,
+            family=fam,
+            vcp_p=vcp_p,
+            fe_p=fe_p)
+
+        # Copy over to the intended class structure
+        mod = PoissonBayesMixedGLM(
+            endog=x.endog,
+            exog=x.exog,
+            exog_vc=x.exog_vc,
+            ident=x.ident,
+            vcp_p=x.vcp_p,
+            fe_p=x.fe_p,
+            fep_names=x.fep_names,
+            vcp_names=x.vcp_names,
+            vc_names=x.vc_names)
+        mod.data = x.data
+
+        return mod
+
     def vb_elbo(self, vb_mean, vb_sd):
         """
         Returns the evidence lower bound (ELBO) for the model.
         """
-        pass
+
+        fep_mean, vcp_mean, vc_mean = self._unpack(vb_mean)
+        fep_sd, vcp_sd, vc_sd = self._unpack(vb_sd)
+        tm, tv = self._lp_stats(fep_mean, fep_sd, vc_mean, vc_sd)
+
+        def h(z):
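+            # Negative Poisson cumulant function exp(lp), evaluated at the
+            # sampled linear predictor.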
+            return -np.exp(tm + np.sqrt(tv) * z)
+
+        return self.vb_elbo_base(h, tm, fep_mean, vcp_mean, vc_mean, fep_sd,
+                                 vcp_sd, vc_sd)

     def vb_elbo_grad(self, vb_mean, vb_sd):
         """
         Returns the gradient of the model's evidence lower bound (ELBO).
         """
-        pass
+
+        fep_mean, vcp_mean, vc_mean = self._unpack(vb_mean)
+        fep_sd, vcp_sd, vc_sd = self._unpack(vb_sd)
+        tm, tv = self._lp_stats(fep_mean, fep_sd, vc_mean, vc_sd)
+
+        def h(z):
+            y = -np.exp(tm + np.sqrt(tv) * z)
+            return y
+
+        return self.vb_elbo_grad_base(h, tm, tv, fep_mean, vcp_mean, vc_mean,
+                                      fep_sd, vcp_sd, vc_sd)
diff --git a/statsmodels/genmod/cov_struct.py b/statsmodels/genmod/cov_struct.py
index 21490fdde..f188d64ea 100644
--- a/statsmodels/genmod/cov_struct.py
+++ b/statsmodels/genmod/cov_struct.py
@@ -7,13 +7,20 @@ docs:
 http://www.stata.com/manuals13/xtxtgee.pdf
 """
 from statsmodels.compat.pandas import Appender
+
 from collections import defaultdict
 import warnings
+
 import numpy as np
 import pandas as pd
 from scipy import linalg as spl
+
 from statsmodels.stats.correlation_tools import cov_nearest
-from statsmodels.tools.sm_exceptions import ConvergenceWarning, NotImplementedWarning, OutputWarning
+from statsmodels.tools.sm_exceptions import (
+    ConvergenceWarning,
+    NotImplementedWarning,
+    OutputWarning,
+)
 from statsmodels.tools.validation import bool_like


@@ -33,9 +40,17 @@ class CovStruct:
     the identity correlation matrix.
     """

-    def __init__(self, cov_nearest_method='clipped'):
+    def __init__(self, cov_nearest_method="clipped"):
+
+        # Parameters describing the dependency structure
         self.dep_params = None
+
+        # Keep track of the number of times that the covariance was
+        # adjusted.
         self.cov_adjust = []
+
+        # Method for projecting the covariance matrix if it is not
+        # PSD.
         self.cov_nearest_method = cov_nearest_method

     def initialize(self, model):
@@ -48,7 +63,7 @@ class CovStruct:
         model : GEE class
             A reference to the parent GEE class instance.
         """
-        pass
+        self.model = model

     def update(self, params):
         """
@@ -60,7 +75,7 @@ class CovStruct:
         params : array_like
             Working values for the regression parameters.
         """
-        pass
+        raise NotImplementedError

     def covariance_matrix(self, endog_expval, index):
         """
@@ -84,7 +99,7 @@ class CovStruct:
             True if M is a correlation matrix, False if M is a
             covariance matrix
         """
-        pass
+        raise NotImplementedError

     def covariance_matrix_solve(self, expval, index, stdev, rhs):
         """
@@ -132,14 +147,49 @@ class CovStruct:
         subclasses to optimize the linear algebra according to the
         structure of the covariance matrix.
         """
-        pass
+
+        vmat, is_cor = self.covariance_matrix(expval, index)
+        if is_cor:
+            vmat *= np.outer(stdev, stdev)
+
+        # Factor the covariance matrix.  If the factorization fails,
+        # attempt to condition it into a factorizable matrix.
+        threshold = 1e-2
+        success = False
+        cov_adjust = 0
+        for itr in range(20):
+            try:
+                vco = spl.cho_factor(vmat)
+                success = True
+                break
+            except np.linalg.LinAlgError:
+                vmat = cov_nearest(vmat, method=self.cov_nearest_method,
+                                   threshold=threshold)
+                threshold *= 2
+                cov_adjust += 1
+                msg = "At least one covariance matrix was not PSD "
+                msg += "and required projection."
+                warnings.warn(msg)
+
+        self.cov_adjust.append(cov_adjust)
+
+        # Last resort if we still cannot factor the covariance matrix.
+        if not success:
+            warnings.warn(
+                "Unable to condition covariance matrix to an SPD "
+                "matrix using cov_nearest", ConvergenceWarning)
+            vmat = np.diag(np.diag(vmat))
+            vco = spl.cho_factor(vmat)
+
+        soln = [spl.cho_solve(vco, x) for x in rhs]
+        return soln

     def summary(self):
         """
         Returns a text summary of the current estimate of the
         dependence structure.
         """
-        pass
+        raise NotImplementedError


 class Independence(CovStruct):
@@ -147,6 +197,30 @@ class Independence(CovStruct):
     An independence working dependence structure.
     """

+    @Appender(CovStruct.update.__doc__)
+    def update(self, params):
+        # Nothing to update
+        return
+
+    @Appender(CovStruct.covariance_matrix.__doc__)
+    def covariance_matrix(self, expval, index):
+        dim = len(expval)
+        return np.eye(dim, dtype=np.float64), True
+
+    @Appender(CovStruct.covariance_matrix_solve.__doc__)
+    def covariance_matrix_solve(self, expval, index, stdev, rhs):
+        v = stdev ** 2
+        rslt = []
+        for x in rhs:
+            if x.ndim == 1:
+                rslt.append(x / v)
+            else:
+                rslt.append(x / v[:, None])
+        return rslt
+
+    def summary(self):
+        return ("Observations within a cluster are modeled "
+                "as being independent.")

 class Unstructured(CovStruct):
     """
@@ -159,9 +233,79 @@ class Unstructured(CovStruct):
     by each observed value.
     """

-    def __init__(self, cov_nearest_method='clipped'):
+    def __init__(self, cov_nearest_method="clipped"):
+
         super(Unstructured, self).__init__(cov_nearest_method)

+    def initialize(self, model):
+
+        self.model = model
+
+        import numbers
+        if not issubclass(self.model.time.dtype.type, numbers.Integral):
+            msg = "time must be provided and must have integer dtype"
+            raise ValueError(msg)
+
+        q = self.model.time[:, 0].max() + 1
+
+        self.dep_params = np.eye(q)
+
+    @Appender(CovStruct.covariance_matrix.__doc__)
+    def covariance_matrix(self, endog_expval, index):
+
+        if hasattr(self.model, "time"):
+            time_li = self.model.time_li
+            ix = time_li[index][:, 0]
+            return self.dep_params[np.ix_(ix, ix)], True
+
+        return self.dep_params, True
+
+    @Appender(CovStruct.update.__doc__)
+    def update(self, params):
+
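+        # Pool outer products of Pearson residuals, aligned by the integer
+        # time index, to estimate one correlation per pair of time points.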
+        endog = self.model.endog_li
+        nobs = self.model.nobs
+        varfunc = self.model.family.variance
+        cached_means = self.model.cached_means
+        has_weights = self.model.weights is not None
+        weights_li = self.model.weights
+
+        time_li = self.model.time_li
+        q = self.model.time.max() + 1
+        csum = np.zeros((q, q))
+        wsum = 0.
+        cov = np.zeros((q, q))
+
+        scale = 0.
+        for i in range(self.model.num_group):
+
+            # Get the Pearson residuals
+            expval, _ = cached_means[i]
+            stdev = np.sqrt(varfunc(expval))
+            resid = (endog[i] - expval) / stdev
+
+            ix = time_li[i][:, 0]
+            m = np.outer(resid, resid)
+            ssr = np.sum(np.diag(m))
+
+            w = weights_li[i] if has_weights else 1.
+            csum[np.ix_(ix, ix)] += w
+            wsum += w * len(ix)
+            cov[np.ix_(ix, ix)] += w * m
+            scale += w * ssr
+        ddof = self.model.ddof_scale
+        scale /= wsum * (nobs - ddof) / float(nobs)
+        cov /= (csum - ddof)
+
+        sd = np.sqrt(np.diag(cov))
+        cov /= np.outer(sd, sd)
+
+        self.dep_params = cov
+
+    def summary(self):
+        print("Estimated covariance structure:")
+        print(self.dep_params)
+

 class Exchangeable(CovStruct):
     """
@@ -169,8 +313,83 @@ class Exchangeable(CovStruct):
     """

     def __init__(self):
+
         super(Exchangeable, self).__init__()
-        self.dep_params = 0.0
+
+        # The correlation between any two values in the same cluster
+        self.dep_params = 0.
+
+    @Appender(CovStruct.update.__doc__)
+    def update(self, params):
+
+        endog = self.model.endog_li
+
+        nobs = self.model.nobs
+
+        varfunc = self.model.family.variance
+
+        cached_means = self.model.cached_means
+
+        has_weights = self.model.weights is not None
+        weights_li = self.model.weights
+
+        residsq_sum, scale = 0, 0
+        fsum1, fsum2, n_pairs = 0., 0., 0.
+        for i in range(self.model.num_group):
+            expval, _ = cached_means[i]
+            stdev = np.sqrt(varfunc(expval))
+            resid = (endog[i] - expval) / stdev
+            f = weights_li[i] if has_weights else 1.
+
+            ssr = np.sum(resid * resid)
+            scale += f * ssr
+            fsum1 += f * len(endog[i])
+
+            residsq_sum += f * (resid.sum() ** 2 - ssr) / 2
+            ngrp = len(resid)
+            npr = 0.5 * ngrp * (ngrp - 1)
+            fsum2 += f * npr
+            n_pairs += npr
+
+        ddof = self.model.ddof_scale
+        scale /= (fsum1 * (nobs - ddof) / float(nobs))
+        residsq_sum /= scale
+        self.dep_params = residsq_sum / \
+            (fsum2 * (n_pairs - ddof) / float(n_pairs))
+
+    @Appender(CovStruct.covariance_matrix.__doc__)
+    def covariance_matrix(self, expval, index):
+        dim = len(expval)
+        dp = self.dep_params * np.ones((dim, dim), dtype=np.float64)
+        np.fill_diagonal(dp, 1)
+        return dp, True
+
+    @Appender(CovStruct.covariance_matrix_solve.__doc__)
+    def covariance_matrix_solve(self, expval, index, stdev, rhs):
+
+        k = len(expval)
+        c = self.dep_params / (1. - self.dep_params)
+        c /= 1. + self.dep_params * (k - 1)
+
+        rslt = []
+        for x in rhs:
+            if x.ndim == 1:
+                x1 = x / stdev
+                y = x1 / (1. - self.dep_params)
+                y -= c * sum(x1)
+                y /= stdev
+            else:
+                x1 = x / stdev[:, None]
+                y = x1 / (1. - self.dep_params)
+                y -= c * x1.sum(0)
+                y /= stdev[:, None]
+            rslt.append(y)
+
+        return rslt
+
+    def summary(self):
+        return ("The correlation between two observations in the " +
+                "same cluster is %.3f" % self.dep_params)
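A small numerical sanity check (illustrative, with arbitrary cluster size and correlation) that the closed-form solve above agrees with a dense solve against the exchangeable correlation matrix:

    import numpy as np

    k, rho = 5, 0.3
    R = rho * np.ones((k, k))
    np.fill_diagonal(R, 1.0)
    x = np.arange(1.0, k + 1)

    # Mirrors covariance_matrix_solve with unit stdev
    c = rho / (1.0 - rho) / (1.0 + rho * (k - 1))
    y = x / (1.0 - rho) - c * x.sum()

    assert np.allclose(y, np.linalg.solve(R, x))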


 class Nested(CovStruct):
@@ -226,14 +445,139 @@ class Nested(CovStruct):
         variables indicating which variance components are associated
         with the corresponding element of QY.
         """
-        pass
+
+        super(Nested, self).initialize(model)
+
+        if self.model.weights is not None:
+            warnings.warn("weights not implemented for nested cov_struct, "
+                          "using unweighted covariance estimate",
+                          NotImplementedWarning)
+
+        # A bit of processing of the nest data
+        id_matrix = np.asarray(self.model.dep_data)
+        if id_matrix.ndim == 1:
+            id_matrix = id_matrix[:, None]
+        self.id_matrix = id_matrix
+
+        endog = self.model.endog_li
+        designx, ilabels = [], []
+
+        # The number of layers of nesting
+        n_nest = self.id_matrix.shape[1]
+
+        for i in range(self.model.num_group):
+            ngrp = len(endog[i])
+            glab = self.model.group_labels[i]
+            rix = self.model.group_indices[glab]
+
+            # Determine the number of common variance components
+            # shared by each pair of observations.
+            ix1, ix2 = np.tril_indices(ngrp, -1)
+            ncm = (self.id_matrix[rix[ix1], :] ==
+                   self.id_matrix[rix[ix2], :]).sum(1)
+
+            # This is used to construct the working correlation
+            # matrix.
+            ilabel = np.zeros((ngrp, ngrp), dtype=np.int32)
+            ilabel[(ix1, ix2)] = ncm + 1
+            ilabel[(ix2, ix1)] = ncm + 1
+            ilabels.append(ilabel)
+
+            # This is used to estimate the variance components.
+            dsx = np.zeros((len(ix1), n_nest + 1), dtype=np.float64)
+            dsx[:, 0] = 1
+            for k in np.unique(ncm):
+                ii = np.flatnonzero(ncm == k)
+                dsx[ii, 1:k + 1] = 1
+            designx.append(dsx)
+
+        self.designx = np.concatenate(designx, axis=0)
+        self.ilabels = ilabels
+
+        svd = np.linalg.svd(self.designx, 0)
+        self.designx_u = svd[0]
+        self.designx_s = svd[1]
+        self.designx_v = svd[2].T
+
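To make the pairwise bookkeeping in ``initialize`` concrete, a small illustration (hypothetical labels) of how the number of shared nesting labels is counted for each within-group pair:

    import numpy as np

    id_matrix = np.array([[0], [0], [1], [1]])  # one nesting variable, 4 observations
    ix1, ix2 = np.tril_indices(4, -1)
    ncm = (id_matrix[ix1, :] == id_matrix[ix2, :]).sum(1)
    # Pairs within the same sub-group share one component, all others none:
    # ncm -> array([1, 0, 0, 0, 0, 1])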
+    @Appender(CovStruct.update.__doc__)
+    def update(self, params):
+
+        endog = self.model.endog_li
+
+        nobs = self.model.nobs
+        dim = len(params)
+
+        if self.designx is None:
+            self._compute_design(self.model)
+
+        cached_means = self.model.cached_means
+
+        varfunc = self.model.family.variance
+
+        dvmat = []
+        scale = 0.
+        for i in range(self.model.num_group):
+
+            expval, _ = cached_means[i]
+
+            stdev = np.sqrt(varfunc(expval))
+            resid = (endog[i] - expval) / stdev
+
+            ix1, ix2 = np.tril_indices(len(resid), -1)
+            dvmat.append(resid[ix1] * resid[ix2])
+
+            scale += np.sum(resid ** 2)
+
+        dvmat = np.concatenate(dvmat)
+        scale /= (nobs - dim)
+
+        # Use least squares regression to estimate the variance
+        # components
+        vcomp_coeff = np.dot(self.designx_v, np.dot(self.designx_u.T,
+                                                    dvmat) / self.designx_s)
+
+        self.vcomp_coeff = np.clip(vcomp_coeff, 0, np.inf)
+        self.scale = scale
+
+        self.dep_params = self.vcomp_coeff.copy()
+
+    @Appender(CovStruct.covariance_matrix.__doc__)
+    def covariance_matrix(self, expval, index):
+
+        dim = len(expval)
+
+        # First iteration
+        if self.dep_params is None:
+            return np.eye(dim, dtype=np.float64), True
+
+        ilabel = self.ilabels[index]
+
+        c = np.r_[self.scale, np.cumsum(self.vcomp_coeff)]
+        vmat = c[ilabel]
+        vmat /= self.scale
+        return vmat, True

     def summary(self):
         """
         Returns a summary string describing the state of the
         dependence structure.
         """
-        pass
+
+        dep_names = ["Groups"]
+        if hasattr(self.model, "_dep_data_names"):
+            dep_names.extend(self.model._dep_data_names)
+        else:
+            dep_names.extend(["Component %d:" % (k + 1)
+                              for k in range(len(self.vcomp_coeff) - 1)])
+        if hasattr(self.model, "_groups_name"):
+            dep_names[0] = self.model._groups_name
+        dep_names.append("Residual")
+
+        vc = self.vcomp_coeff.tolist()
+        vc.append(self.scale - np.sum(vc))
+
+        smry = pd.DataFrame({"Variance": vc}, index=dep_names)
+
+        return smry


 class Stationary(CovStruct):
@@ -255,15 +599,152 @@ class Stationary(CovStruct):
     """

     def __init__(self, max_lag=1, grid=None):
+
         super(Stationary, self).__init__()
-        grid = bool_like(grid, 'grid', optional=True)
+        grid = bool_like(grid, "grid", optional=True)
         if grid is None:
-            warnings.warn('grid=True will become default in a future version',
-                FutureWarning)
+            warnings.warn(
+                "grid=True will become default in a future version",
+                FutureWarning
+            )
+
         self.max_lag = max_lag
         self.grid = bool(grid)
         self.dep_params = np.zeros(max_lag + 1)

+    def initialize(self, model):
+
+        super(Stationary, self).initialize(model)
+
+        # Time used as an index needs to be integer type.
+        if not self.grid:
+            time = self.model.time[:, 0].astype(np.int32)
+            self.time = self.model.cluster_list(time)
+
+    @Appender(CovStruct.update.__doc__)
+    def update(self, params):
+
+        if self.grid:
+            self.update_grid(params)
+        else:
+            self.update_nogrid(params)
+
+    def update_grid(self, params):
+
+        endog = self.model.endog_li
+        cached_means = self.model.cached_means
+        varfunc = self.model.family.variance
+
+        dep_params = np.zeros(self.max_lag + 1)
+        for i in range(self.model.num_group):
+
+            expval, _ = cached_means[i]
+            stdev = np.sqrt(varfunc(expval))
+            resid = (endog[i] - expval) / stdev
+
+            dep_params[0] += np.sum(resid * resid) / len(resid)
+            for j in range(1, self.max_lag + 1):
+                v = resid[j:]
+                dep_params[j] += np.sum(resid[0:-j] * v) / len(v)
+
+        dep_params /= dep_params[0]
+        self.dep_params = dep_params
+
+    def update_nogrid(self, params):
+
+        endog = self.model.endog_li
+        cached_means = self.model.cached_means
+        varfunc = self.model.family.variance
+
+        dep_params = np.zeros(self.max_lag + 1)
+        dn = np.zeros(self.max_lag + 1)
+        resid_ssq = 0
+        resid_ssq_n = 0
+        for i in range(self.model.num_group):
+
+            expval, _ = cached_means[i]
+            stdev = np.sqrt(varfunc(expval))
+            resid = (endog[i] - expval) / stdev
+
+            j1, j2 = np.tril_indices(len(expval), -1)
+            dx = np.abs(self.time[i][j1] - self.time[i][j2])
+            ii = np.flatnonzero(dx <= self.max_lag)
+            j1 = j1[ii]
+            j2 = j2[ii]
+            dx = dx[ii]
+
+            vs = np.bincount(dx, weights=resid[j1] * resid[j2],
+                             minlength=self.max_lag + 1)
+            vd = np.bincount(dx, minlength=self.max_lag + 1)
+
+            resid_ssq += np.sum(resid**2)
+            resid_ssq_n += len(resid)
+
+            ii = np.flatnonzero(vd > 0)
+            if len(ii) > 0:
+                dn[ii] += 1
+                dep_params[ii] += vs[ii] / vd[ii]
+
+        i0 = np.flatnonzero(dn > 0)
+        dep_params[i0] /= dn[i0]
+        resid_msq = resid_ssq / resid_ssq_n
+        dep_params /= resid_msq
+        self.dep_params = dep_params
+
+    @Appender(CovStruct.covariance_matrix.__doc__)
+    def covariance_matrix(self, endog_expval, index):
+
+        if self.grid:
+            return self.covariance_matrix_grid(endog_expval, index)
+
+        j1, j2 = np.tril_indices(len(endog_expval), -1)
+        dx = np.abs(self.time[index][j1] - self.time[index][j2])
+        ii = np.flatnonzero(dx <= self.max_lag)
+        j1 = j1[ii]
+        j2 = j2[ii]
+        dx = dx[ii]
+
+        cmat = np.eye(len(endog_expval))
+        cmat[j1, j2] = self.dep_params[dx]
+        cmat[j2, j1] = self.dep_params[dx]
+
+        return cmat, True
+
+    def covariance_matrix_grid(self, endog_expval, index):
+
+        from scipy.linalg import toeplitz
+        r = np.zeros(len(endog_expval))
+        r[0] = 1
+        r[1:self.max_lag + 1] = self.dep_params[1:]
+        return toeplitz(r), True
+
+    @Appender(CovStruct.covariance_matrix_solve.__doc__)
+    def covariance_matrix_solve(self, expval, index, stdev, rhs):
+
+        if not self.grid:
+            return super(Stationary, self).covariance_matrix_solve(
+                expval, index, stdev, rhs)
+
+        from statsmodels.tools.linalg import stationary_solve
+        r = np.zeros(len(expval))
+        r[0:self.max_lag] = self.dep_params[1:]
+
+        rslt = []
+        for x in rhs:
+            if x.ndim == 1:
+                y = x / stdev
+                rslt.append(stationary_solve(r, y) / stdev)
+            else:
+                y = x / stdev[:, None]
+                rslt.append(stationary_solve(r, y) / stdev[:, None])
+
+        return rslt
+
+    def summary(self):
+
+        lag = np.arange(self.max_lag + 1)
+        return pd.DataFrame({"Lag": lag, "Cov": self.dep_params})
+
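For the grid case, the working correlation returned by ``covariance_matrix_grid`` is a banded Toeplitz matrix; a minimal sketch with arbitrary lag correlations:

    import numpy as np
    from scipy.linalg import toeplitz

    max_lag, n = 2, 6
    dep_params = np.array([1.0, 0.45, 0.20])  # correlations at lags 0..max_lag
    r = np.zeros(n)
    r[0] = 1.0
    r[1:max_lag + 1] = dep_params[1:]
    cmat = toeplitz(r)  # corr(i, j) = dep_params[|i - j|], zero beyond max_lag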

 class Autoregressive(CovStruct):
     """
@@ -302,19 +783,213 @@ class Autoregressive(CovStruct):
     """

     def __init__(self, dist_func=None, grid=None):
+
         super(Autoregressive, self).__init__()
-        grid = bool_like(grid, 'grid', optional=True)
+        grid = bool_like(grid, "grid", optional=True)
+        # The function for determining distances based on time
         if dist_func is None:
             self.dist_func = lambda x, y: np.abs(x - y).sum()
         else:
             self.dist_func = dist_func
+
         if grid is None:
-            warnings.warn('grid=True will become default in a future version',
-                FutureWarning)
+            warnings.warn(
+                "grid=True will become default in a future version",
+                FutureWarning
+            )
         self.grid = bool(grid)
         if not self.grid:
             self.designx = None
-        self.dep_params = 0.0
+
+        # The autocorrelation parameter
+        self.dep_params = 0.
+
+    @Appender(CovStruct.update.__doc__)
+    def update(self, params):
+
+        if self.model.weights is not None:
+            warnings.warn("weights not implemented for autoregressive "
+                          "cov_struct, using unweighted covariance estimate",
+                          NotImplementedWarning)
+
+        if self.grid:
+            self._update_grid(params)
+        else:
+            self._update_nogrid(params)
+
+    def _update_grid(self, params):
+
+        cached_means = self.model.cached_means
+        scale = self.model.estimate_scale()
+        varfunc = self.model.family.variance
+        endog = self.model.endog_li
+
+        lag0, lag1 = 0.0, 0.0
+        for i in range(self.model.num_group):
+
+            expval, _ = cached_means[i]
+            stdev = np.sqrt(scale * varfunc(expval))
+            resid = (endog[i] - expval) / stdev
+
+            n = len(resid)
+            if n > 1:
+                lag1 += np.sum(resid[0:-1] * resid[1:]) / (n - 1)
+                lag0 += np.sum(resid**2) / n
+
+        self.dep_params = lag1 / lag0
+
+    def _update_nogrid(self, params):
+
+        endog = self.model.endog_li
+        time = self.model.time_li
+
+        # Only need to compute this once
+        if self.designx is not None:
+            designx = self.designx
+        else:
+            designx = []
+            for i in range(self.model.num_group):
+
+                ngrp = len(endog[i])
+                if ngrp == 0:
+                    continue
+
+                # Loop over pairs of observations within a cluster
+                for j1 in range(ngrp):
+                    for j2 in range(j1):
+                        designx.append(self.dist_func(time[i][j1, :],
+                                                      time[i][j2, :]))
+
+            designx = np.array(designx)
+            self.designx = designx
+
+        scale = self.model.estimate_scale()
+        varfunc = self.model.family.variance
+        cached_means = self.model.cached_means
+
+        # Weights
+        var = 1. - self.dep_params ** (2 * designx)
+        var /= 1. - self.dep_params ** 2
+        wts = 1. / var
+        wts /= wts.sum()
+
+        residmat = []
+        for i in range(self.model.num_group):
+
+            expval, _ = cached_means[i]
+            stdev = np.sqrt(scale * varfunc(expval))
+            resid = (endog[i] - expval) / stdev
+
+            ngrp = len(resid)
+            for j1 in range(ngrp):
+                for j2 in range(j1):
+                    residmat.append([resid[j1], resid[j2]])
+
+        residmat = np.array(residmat)
+
+        # Need to minimize this
+        def fitfunc(a):
+            dif = residmat[:, 0] - (a ** designx) * residmat[:, 1]
+            return np.dot(dif ** 2, wts)
+
+        # Left bracket point
+        b_lft, f_lft = 0., fitfunc(0.)
+
+        # Center bracket point
+        b_ctr, f_ctr = 0.5, fitfunc(0.5)
+        while f_ctr > f_lft:
+            b_ctr /= 2
+            f_ctr = fitfunc(b_ctr)
+            if b_ctr < 1e-8:
+                self.dep_params = 0
+                return
+
+        # Right bracket point
+        b_rgt, f_rgt = 0.75, fitfunc(0.75)
+        while f_rgt < f_ctr:
+            b_rgt = b_rgt + (1. - b_rgt) / 2
+            f_rgt = fitfunc(b_rgt)
+            if b_rgt > 1. - 1e-6:
+                raise ValueError(
+                    "Autoregressive: unable to find right bracket")
+
+        from scipy.optimize import brent
+        self.dep_params = brent(fitfunc, brack=[b_lft, b_ctr, b_rgt])
+
+    @Appender(CovStruct.covariance_matrix.__doc__)
+    def covariance_matrix(self, endog_expval, index):
+        ngrp = len(endog_expval)
+        if self.dep_params == 0:
+            return np.eye(ngrp, dtype=np.float64), True
+        idx = np.arange(ngrp)
+        cmat = self.dep_params ** np.abs(idx[:, None] - idx[None, :])
+        return cmat, True
+
+    @Appender(CovStruct.covariance_matrix_solve.__doc__)
+    def covariance_matrix_solve(self, expval, index, stdev, rhs):
+        # The inverse of an AR(1) covariance matrix is tri-diagonal.
+
+        k = len(expval)
+        r = self.dep_params
+        soln = []
+
+        # RHS has 1 row
+        if k == 1:
+            return [x / stdev ** 2 for x in rhs]
+
+        # RHS has 2 rows
+        if k == 2:
+            mat = np.array([[1, -r], [-r, 1]])
+            mat /= (1. - r ** 2)
+            for x in rhs:
+                if x.ndim == 1:
+                    x1 = x / stdev
+                else:
+                    x1 = x / stdev[:, None]
+                x1 = np.dot(mat, x1)
+                if x.ndim == 1:
+                    x1 /= stdev
+                else:
+                    x1 /= stdev[:, None]
+                soln.append(x1)
+            return soln
+
+        # RHS has >= 3 rows: values c0, c1, c2 defined below give
+        # the inverse.  c0 is on the diagonal, except for the first
+        # and last position.  c1 is on the first and last position of
+        # the diagonal.  c2 is on the sub/super diagonal.
+        c0 = (1. + r ** 2) / (1. - r ** 2)
+        c1 = 1. / (1. - r ** 2)
+        c2 = -r / (1. - r ** 2)
+        soln = []
+        for x in rhs:
+            flatten = False
+            if x.ndim == 1:
+                x = x[:, None]
+                flatten = True
+            x1 = x / stdev[:, None]
+
+            z0 = np.zeros((1, x1.shape[1]))
+            rhs1 = np.concatenate((x1[1:, :], z0), axis=0)
+            rhs2 = np.concatenate((z0, x1[0:-1, :]), axis=0)
+
+            y = c0 * x1 + c2 * rhs1 + c2 * rhs2
+            y[0, :] = c1 * x1[0, :] + c2 * x1[1, :]
+            y[-1, :] = c1 * x1[-1, :] + c2 * x1[-2, :]
+
+            y /= stdev[:, None]
+
+            if flatten:
+                y = np.squeeze(y)
+
+            soln.append(y)
+
+        return soln
+
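An illustrative check (arbitrary dimension and autocorrelation) that the tridiagonal coefficients c0, c1 and c2 used above reproduce the inverse of an AR(1) correlation matrix:

    import numpy as np

    k, r = 6, 0.4
    idx = np.arange(k)
    R = r ** np.abs(idx[:, None] - idx[None, :])

    c0 = (1 + r ** 2) / (1 - r ** 2)
    c1 = 1 / (1 - r ** 2)
    c2 = -r / (1 - r ** 2)
    Rinv = np.diag(np.r_[c1, np.repeat(c0, k - 2), c1])
    Rinv += np.diag(np.repeat(c2, k - 1), 1) + np.diag(np.repeat(c2, k - 1), -1)

    assert np.allclose(Rinv, np.linalg.inv(R))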
+    def summary(self):
+
+        return ("Autoregressive(1) dependence parameter: %.3f\n" %
+                self.dep_params)


 class CategoricalCovStruct(CovStruct):
@@ -332,6 +1007,24 @@ class CategoricalCovStruct(CovStruct):
         value.
     """

+    def initialize(self, model):
+
+        super(CategoricalCovStruct, self).initialize(model)
+
+        self.nlevel = len(model.endog_values)
+        self._ncut = self.nlevel - 1
+
+        from numpy.lib.stride_tricks import as_strided
+        b = np.dtype(np.int64).itemsize
+
+        ibd = []
+        for v in model.endog_li:
+            jj = np.arange(0, len(v) + 1, self._ncut, dtype=np.int64)
+            jj = as_strided(jj, shape=(len(jj) - 1, 2), strides=(b, b))
+            ibd.append(jj)
+
+        self.ibd = ibd
+

 class GlobalOddsRatio(CategoricalCovStruct):
     """
@@ -365,7 +1058,41 @@ class GlobalOddsRatio(CategoricalCovStruct):
     def __init__(self, endog_type):
         super(GlobalOddsRatio, self).__init__()
         self.endog_type = endog_type
-        self.dep_params = 0.0
+        self.dep_params = 0.
+
+    def initialize(self, model):
+
+        super(GlobalOddsRatio, self).initialize(model)
+
+        if self.model.weights is not None:
+            warnings.warn("weights not implemented for GlobalOddsRatio "
+                          "cov_struct, using unweighted covariance estimate",
+                          NotImplementedWarning)
+
+        # Need to restrict to between-subject pairs
+        cpp = []
+        for v in model.endog_li:
+
+            # Number of subjects in this group
+            m = int(len(v) / self._ncut)
+            i1, i2 = np.tril_indices(m, -1)
+
+            cpp1 = {}
+            for k1 in range(self._ncut):
+                for k2 in range(k1 + 1):
+                    jj = np.zeros((len(i1), 2), dtype=np.int64)
+                    jj[:, 0] = i1 * self._ncut + k1
+                    jj[:, 1] = i2 * self._ncut + k2
+                    cpp1[(k2, k1)] = jj
+
+            cpp.append(cpp1)
+
+        self.cpp = cpp
+
+        # Initialize the dependence parameters
+        self.crude_or = self.observed_crude_oddsratio()
+        if self.model.update_dep:
+            self.dep_params = self.crude_or

     def pooled_odds_ratio(self, tables):
         """
@@ -374,7 +1101,32 @@ class GlobalOddsRatio(CategoricalCovStruct):
         The pooled odds ratio is the inverse variance weighted average
         of the sample odds ratios of the tables.
         """
-        pass
+
+        if len(tables) == 0:
+            return 1.
+
+        # Get the sampled odds ratios and variances
+        log_oddsratio, var = [], []
+        for table in tables:
+            lor = np.log(table[1, 1]) + np.log(table[0, 0]) -\
+                np.log(table[0, 1]) - np.log(table[1, 0])
+            log_oddsratio.append(lor)
+            var.append((1 / table.astype(np.float64)).sum())
+
+        # Calculate the inverse variance weighted average
+        wts = [1 / v for v in var]
+        wtsum = sum(wts)
+        wts = [w / wtsum for w in wts]
+        log_pooled_or = sum([w * e for w, e in zip(wts, log_oddsratio)])
+
+        return np.exp(log_pooled_or)
+
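A worked sketch of the inverse variance weighting performed by ``pooled_odds_ratio``, using two hypothetical 2x2 tables:

    import numpy as np

    tables = [np.array([[10., 5.], [4., 20.]]),
              np.array([[8., 6.], [3., 15.]])]
    lors = [np.log(t[1, 1] * t[0, 0] / (t[0, 1] * t[1, 0])) for t in tables]
    variances = [(1.0 / t).sum() for t in tables]
    wts = np.array([1.0 / v for v in variances])
    wts /= wts.sum()
    pooled_or = np.exp(np.dot(wts, lors))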
+    @Appender(CovStruct.covariance_matrix.__doc__)
+    def covariance_matrix(self, expected_value, index):
+
+        vmat = self.get_eyy(expected_value, index)
+        vmat -= np.outer(expected_value, expected_value)
+        return vmat, False

     def observed_crude_oddsratio(self):
         """
@@ -385,7 +1137,34 @@ class GlobalOddsRatio(CategoricalCovStruct):
         odds ratios.  Since the covariate effects are ignored, this OR
         will generally be greater than the stratified OR.
         """
-        pass
+
+        cpp = self.cpp
+        endog = self.model.endog_li
+
+        # Storage for the contingency tables for each (c,c')
+        tables = {}
+        for ii in cpp[0].keys():
+            tables[ii] = np.zeros((2, 2), dtype=np.float64)
+
+        # Get the observed crude OR
+        for i in range(len(endog)):
+
+            # The observed joint values for the current cluster
+            yvec = endog[i]
+            endog_11 = np.outer(yvec, yvec)
+            endog_10 = np.outer(yvec, 1. - yvec)
+            endog_01 = np.outer(1. - yvec, yvec)
+            endog_00 = np.outer(1. - yvec, 1. - yvec)
+
+            cpp1 = cpp[i]
+            for ky in cpp1.keys():
+                ix = cpp1[ky]
+                tables[ky][1, 1] += endog_11[ix[:, 0], ix[:, 1]].sum()
+                tables[ky][1, 0] += endog_10[ix[:, 0], ix[:, 1]].sum()
+                tables[ky][0, 1] += endog_01[ix[:, 0], ix[:, 1]].sum()
+                tables[ky][0, 0] += endog_00[ix[:, 0], ix[:, 1]].sum()
+
+        return self.pooled_odds_ratio(list(tables.values()))

     def get_eyy(self, endog_expval, index):
         """
@@ -393,7 +1172,31 @@ class GlobalOddsRatio(CategoricalCovStruct):
         that endog[i] = 1 and endog[j] = 1, based on the marginal
         probabilities of endog and the global odds ratio `current_or`.
         """
-        pass
+
+        current_or = self.dep_params
+        ibd = self.ibd[index]
+
+        # The between-observation joint probabilities
+        if current_or == 1.0:
+            vmat = np.outer(endog_expval, endog_expval)
+        else:
+            psum = endog_expval[:, None] + endog_expval[None, :]
+            pprod = endog_expval[:, None] * endog_expval[None, :]
+            pfac = np.sqrt((1. + psum * (current_or - 1.)) ** 2 +
+                           4 * current_or * (1. - current_or) * pprod)
+            vmat = 1. + psum * (current_or - 1.) - pfac
+            vmat /= 2. * (current_or - 1)
+
+        # Fix E[YY'] for elements that belong to same observation
+        for bdl in ibd:
+            evy = endog_expval[bdl[0]:bdl[1]]
+            if self.endog_type == "ordinal":
+                vmat[bdl[0]:bdl[1], bdl[0]:bdl[1]] =\
+                    np.minimum.outer(evy, evy)
+            else:
+                vmat[bdl[0]:bdl[1], bdl[0]:bdl[1]] = np.diag(evy)
+
+        return vmat
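For scalar marginal probabilities the expression in ``get_eyy`` is the Plackett formula; an illustrative check (arbitrary marginals and odds ratio) that the recovered joint probability reproduces the assumed global odds ratio:

    import numpy as np

    p1, p2, psi = 0.3, 0.6, 2.5
    psum, pprod = p1 + p2, p1 * p2
    pfac = np.sqrt((1 + psum * (psi - 1)) ** 2 + 4 * psi * (1 - psi) * pprod)
    p11 = (1 + psum * (psi - 1) - pfac) / (2 * (psi - 1))

    p10, p01 = p1 - p11, p2 - p11
    p00 = 1 - p11 - p10 - p01
    assert np.isclose(p11 * p00 / (p10 * p01), psi)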

     @Appender(CovStruct.update.__doc__)
     def update(self, params):
@@ -401,7 +1204,46 @@ class GlobalOddsRatio(CategoricalCovStruct):
         Update the global odds ratio based on the current value of
         params.
         """
-        pass
+
+        cpp = self.cpp
+        cached_means = self.model.cached_means
+
+        # This will happen if all the clusters have only
+        # one observation
+        if len(cpp[0]) == 0:
+            return
+
+        tables = {}
+        for ii in cpp[0]:
+            tables[ii] = np.zeros((2, 2), dtype=np.float64)
+
+        for i in range(self.model.num_group):
+
+            endog_expval, _ = cached_means[i]
+
+            emat_11 = self.get_eyy(endog_expval, i)
+            emat_10 = endog_expval[:, None] - emat_11
+            emat_01 = -emat_11 + endog_expval
+            emat_00 = 1. - (emat_11 + emat_10 + emat_01)
+
+            cpp1 = cpp[i]
+            for ky in cpp1.keys():
+                ix = cpp1[ky]
+                tables[ky][1, 1] += emat_11[ix[:, 0], ix[:, 1]].sum()
+                tables[ky][1, 0] += emat_10[ix[:, 0], ix[:, 1]].sum()
+                tables[ky][0, 1] += emat_01[ix[:, 0], ix[:, 1]].sum()
+                tables[ky][0, 0] += emat_00[ix[:, 0], ix[:, 1]].sum()
+
+        cor_expval = self.pooled_odds_ratio(list(tables.values()))
+
+        self.dep_params *= self.crude_or / cor_expval
+        if not np.isfinite(self.dep_params):
+            self.dep_params = 1.
+            warnings.warn("dep_params became inf, resetting to 1",
+                          ConvergenceWarning)
+
+    def summary(self):
+        return "Global odds ratio: %.3f\n" % self.dep_params


 class OrdinalIndependence(CategoricalCovStruct):
@@ -416,6 +1258,23 @@ class OrdinalIndependence(CategoricalCovStruct):
     There are no parameters to estimate in this covariance structure.
     """

+    def covariance_matrix(self, expected_value, index):
+
+        ibd = self.ibd[index]
+        n = len(expected_value)
+        vmat = np.zeros((n, n))
+
+        for bdl in ibd:
+            ev = expected_value[bdl[0]:bdl[1]]
+            vmat[bdl[0]:bdl[1], bdl[0]:bdl[1]] =\
+                np.minimum.outer(ev, ev) - np.outer(ev, ev)
+
+        return vmat, False
+
+    # Nothing to update
+    def update(self, params):
+        pass
+

 class NominalIndependence(CategoricalCovStruct):
     """
@@ -429,6 +1288,23 @@ class NominalIndependence(CategoricalCovStruct):
     There are no parameters to estimate in this covariance structure.
     """

+    def covariance_matrix(self, expected_value, index):
+
+        ibd = self.ibd[index]
+        n = len(expected_value)
+        vmat = np.zeros((n, n))
+
+        for bdl in ibd:
+            ev = expected_value[bdl[0]:bdl[1]]
+            vmat[bdl[0]:bdl[1], bdl[0]:bdl[1]] =\
+                np.diag(ev) - np.outer(ev, ev)
+
+        return vmat, False
+
+    # Nothing to update
+    def update(self, params):
+        pass
+

 class Equivalence(CovStruct):
     """
@@ -494,19 +1370,25 @@ class Equivalence(CovStruct):
     """

     def __init__(self, pairs=None, labels=None, return_cov=False):
+
         super(Equivalence, self).__init__()
-        if pairs is None and labels is None:
+
+        if (pairs is None) and (labels is None):
             raise ValueError(
-                'Equivalence cov_struct requires either `pairs` or `labels`')
-        if pairs is not None and labels is not None:
+                "Equivalence cov_struct requires either `pairs` or `labels`")
+
+        if (pairs is not None) and (labels is not None):
             raise ValueError(
-                'Equivalence cov_struct accepts only one of `pairs` and `labels`'
-                )
+                "Equivalence cov_struct accepts only one of `pairs` "
+                "and `labels`")
+
         if pairs is not None:
             import copy
             self.pairs = copy.deepcopy(pairs)
+
         if labels is not None:
             self.labels = np.asarray(labels)
+
         self.return_cov = return_cov
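A minimal construction sketch (hypothetical labels) of the ``labels`` form of specification; observations whose label pairs match are assigned a common covariance parameter:

    import numpy as np
    import statsmodels.api as sm

    # Two equivalence classes of observations, e.g. paired measurements
    labels = np.r_[np.zeros(10, dtype=int), np.ones(10, dtype=int)]
    cov_struct = sm.cov_struct.Equivalence(labels=labels, return_cov=True)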

     def _make_pairs(self, i, j):
@@ -516,4 +1398,165 @@ class Equivalence(CovStruct):
         The arrays i and j must be one-dimensional containing non-negative
         integers.
         """
-        pass
+
+        mat = np.zeros((len(i) * len(j), 2), dtype=np.int32)
+
+        # Create the pairs and order them
+        f = np.ones(len(j))
+        mat[:, 0] = np.kron(f, i).astype(np.int32)
+        f = np.ones(len(i))
+        mat[:, 1] = np.kron(j, f).astype(np.int32)
+        mat.sort(1)
+
+        # Remove repeated rows
+        try:
+            dtype = np.dtype((np.void, mat.dtype.itemsize * mat.shape[1]))
+            bmat = np.ascontiguousarray(mat).view(dtype)
+            _, idx = np.unique(bmat, return_index=True)
+        except TypeError:
+            # workaround for old numpy that cannot call unique with complex
+            # dtypes
+            rs = np.random.RandomState(4234)
+            bmat = np.dot(mat, rs.uniform(size=mat.shape[1]))
+            _, idx = np.unique(bmat, return_index=True)
+        mat = mat[idx, :]
+
+        return mat[:, 0], mat[:, 1]
+
+    def _pairs_from_labels(self):
+
+        from collections import defaultdict
+        pairs = defaultdict(lambda: defaultdict(lambda: None))
+
+        model = self.model
+
+        df = pd.DataFrame({"labels": self.labels, "groups": model.groups})
+        gb = df.groupby(["groups", "labels"])
+
+        ulabels = np.unique(self.labels)
+
+        for g_ix, g_lb in enumerate(model.group_labels):
+
+            # Loop over label pairs
+            for lx1 in range(len(ulabels)):
+                for lx2 in range(lx1 + 1):
+
+                    lb1 = ulabels[lx1]
+                    lb2 = ulabels[lx2]
+
+                    try:
+                        i1 = gb.groups[(g_lb, lb1)]
+                        i2 = gb.groups[(g_lb, lb2)]
+                    except KeyError:
+                        continue
+
+                    i1, i2 = self._make_pairs(i1, i2)
+
+                    clabel = str(lb1) + "/" + str(lb2)
+
+                    # Variance parameters belong in their own equiv class.
+                    jj = np.flatnonzero(i1 == i2)
+                    if len(jj) > 0:
+                        clabelv = clabel + "/v"
+                        pairs[g_lb][clabelv] = (i1[jj], i2[jj])
+
+                    # Covariance parameters
+                    jj = np.flatnonzero(i1 != i2)
+                    if len(jj) > 0:
+                        i1 = i1[jj]
+                        i2 = i2[jj]
+                        pairs[g_lb][clabel] = (i1, i2)
+
+        self.pairs = pairs
+
+    def initialize(self, model):
+
+        super(Equivalence, self).initialize(model)
+
+        if self.model.weights is not None:
+            warnings.warn("weights not implemented for equivalence "
+                          "cov_struct, using unweighted covariance estimate",
+                          NotImplementedWarning)
+
+        if not hasattr(self, 'pairs'):
+            self._pairs_from_labels()
+
+        # Initialize so that any equivalence class containing a
+        # variance parameter has value 1.
+        self.dep_params = defaultdict(lambda: 0.)
+        self._var_classes = set()
+        for gp in self.model.group_labels:
+            for lb in self.pairs[gp]:
+                j1, j2 = self.pairs[gp][lb]
+                if np.any(j1 == j2):
+                    if not np.all(j1 == j2):
+                        warnings.warn(
+                            "equivalence class contains both variance "
+                            "and covariance parameters", OutputWarning)
+                    self._var_classes.add(lb)
+                    self.dep_params[lb] = 1
+
+        # Need to start indexing at 0 within each group.
+        # rx maps old indices to new indices
+        rx = -1 * np.ones(len(self.model.endog), dtype=np.int32)
+        for g_ix, g_lb in enumerate(self.model.group_labels):
+            ii = self.model.group_indices[g_lb]
+            rx[ii] = np.arange(len(ii), dtype=np.int32)
+
+        # Reindex
+        for gp in self.model.group_labels:
+            for lb in self.pairs[gp].keys():
+                a, b = self.pairs[gp][lb]
+                self.pairs[gp][lb] = (rx[a], rx[b])
+
+    @Appender(CovStruct.update.__doc__)
+    def update(self, params):
+
+        endog = self.model.endog_li
+        varfunc = self.model.family.variance
+        cached_means = self.model.cached_means
+        dep_params = defaultdict(lambda: [0., 0., 0.])
+        n_pairs = defaultdict(lambda: 0)
+        dim = len(params)
+
+        for k, gp in enumerate(self.model.group_labels):
+            expval, _ = cached_means[k]
+            stdev = np.sqrt(varfunc(expval))
+            resid = (endog[k] - expval) / stdev
+            for lb in self.pairs[gp].keys():
+                if (not self.return_cov) and lb in self._var_classes:
+                    continue
+                jj = self.pairs[gp][lb]
+                dep_params[lb][0] += np.sum(resid[jj[0]] * resid[jj[1]])
+                if not self.return_cov:
+                    dep_params[lb][1] += np.sum(resid[jj[0]] ** 2)
+                    dep_params[lb][2] += np.sum(resid[jj[1]] ** 2)
+                n_pairs[lb] += len(jj[0])
+
+        if self.return_cov:
+            for lb in dep_params.keys():
+                dep_params[lb] = dep_params[lb][0] / (n_pairs[lb] - dim)
+        else:
+            for lb in dep_params.keys():
+                den = np.sqrt(dep_params[lb][1] * dep_params[lb][2])
+                dep_params[lb] = dep_params[lb][0] / den
+            for lb in self._var_classes:
+                dep_params[lb] = 1.
+
+        self.dep_params = dep_params
+        self.n_pairs = n_pairs
+
+    @Appender(CovStruct.covariance_matrix.__doc__)
+    def covariance_matrix(self, expval, index):
+        dim = len(expval)
+        cmat = np.zeros((dim, dim))
+        g_lb = self.model.group_labels[index]
+
+        for lb in self.pairs[g_lb].keys():
+            j1, j2 = self.pairs[g_lb][lb]
+            cmat[j1, j2] = self.dep_params[lb]
+
+        cmat = cmat + cmat.T
+        np.fill_diagonal(cmat, cmat.diagonal() / 2)
+
+        return cmat, not self.return_cov
diff --git a/statsmodels/genmod/families/family.py b/statsmodels/genmod/families/family.py
index 8b1330e19..662cdccbb 100644
--- a/statsmodels/genmod/families/family.py
+++ b/statsmodels/genmod/families/family.py
@@ -1,13 +1,24 @@
-"""
+'''
 The one parameter exponential family distributions used by GLM.
-"""
+'''
+# TODO: quasi, quasibinomial, quasipoisson
+# see
+# http://www.biostat.jhsph.edu/~qli/biostatistics_r_doc/library/stats/html/family.html
+# for comparison to R, and McCullagh and Nelder
+
+
 import inspect
 import warnings
+
 import numpy as np
 from scipy import special, stats
+
 from statsmodels.compat.scipy import SP_LT_17
-from statsmodels.tools.sm_exceptions import ValueWarning
+from statsmodels.tools.sm_exceptions import (
+    ValueWarning,
+    )
 from . import links as L, varfuncs as V
+
 FLOAT_EPS = np.finfo(float).eps


@@ -32,6 +43,7 @@ class Family:
     --------
     :ref:`links` : Further details on links.
     """
+    # TODO: change these class attributes, use valid somewhere...
     valid = [-np.inf, np.inf]
     links = []

@@ -47,27 +59,45 @@ class Family:
         appropriate links for each family but note that not all of these are
         currently available.
         """
-        pass
+        # TODO: change the links class attribute in the families to hold
+        # meaningful information instead of a list of links instances such as
+        # [<statsmodels.family.links.Log object at 0x9a4240c>,
+        #  <statsmodels.family.links.Power object at 0x9a423ec>,
+        #  <statsmodels.family.links.Power object at 0x9a4236c>]
+        # for Poisson...
+        self._link = link
+        if self._check_link:
+            if not isinstance(link, L.Link):
+                raise TypeError("The input should be a valid Link object.")
+            if hasattr(self, "links"):
+                validlink = max([isinstance(link, _) for _ in self.links])
+                if not validlink:
+                    msg = "Invalid link for family, should be in %s. (got %s)"
+                    raise ValueError(msg % (repr(self.links), link))

     def _getlink(self):
         """
         Helper method to get the link for a family.
         """
-        pass
-    link = property(_getlink, _setlink, doc='Link function for family')
+        return self._link
+
+    # link property for each family is a pointer to link instance
+    link = property(_getlink, _setlink, doc="Link function for family")

     def __init__(self, link, variance, check_link=True):
         self._check_link = check_link
         if inspect.isclass(link):
             warnmssg = (
-                'Calling Family(..) with a link class is not allowed. Use an instance of a link class instead.'
-                )
+                "Calling Family(..) with a link class is not allowed. Use an "
+                "instance of a link class instead."
+            )
             raise TypeError(warnmssg)
+
         self.link = link
         self.variance = variance

     def starting_mu(self, y):
-        """
+        r"""
         Starting value for mu in the IRLS algorithm.

         Parameters
@@ -84,14 +114,14 @@ class Family:
         -----
         .. math::

-           \\mu_0 = (Y + \\overline{Y})/2
+           \mu_0 = (Y + \overline{Y})/2

         Only the Binomial family takes a different initial value.
         """
-        pass
+        return (y + y.mean())/2.

     def weights(self, mu):
-        """
+        r"""
         Weights for IRLS steps

         Parameters
@@ -108,13 +138,12 @@ class Family:
         -----
         .. math::

-           w = 1 / (g'(\\mu)^2  * Var(\\mu))
+           w = 1 / (g'(\mu)^2  * Var(\mu))
         """
-        pass
+        return 1. / (self.link.deriv(mu)**2 * self.variance(mu))

-    def deviance(self, endog, mu, var_weights=1.0, freq_weights=1.0, scale=1.0
-        ):
-        """
+    def deviance(self, endog, mu, var_weights=1., freq_weights=1., scale=1.):
+        r"""
         The deviance function evaluated at (endog, mu, var_weights,
         freq_weights, scale) for the distribution.

@@ -144,8 +173,8 @@ class Family:

         .. math::

-           D = 2\\sum_i (freq\\_weights_i * var\\_weights *
-           (llf(endog_i, endog_i) - llf(endog_i, \\mu_i)))
+           D = 2\sum_i (freq\_weights_i * var\_weights *
+           (llf(endog_i, endog_i) - llf(endog_i, \mu_i)))

         where y is the endogenous variable. The deviance functions are
         analytically defined for each family.
@@ -153,12 +182,13 @@ class Family:
         Internally, we calculate deviance as:

         .. math::
-           D = \\sum_i freq\\_weights_i * var\\_weights * resid\\_dev_i  / scale
+           D = \sum_i freq\_weights_i * var\_weights * resid\_dev_i  / scale
         """
-        pass
+        resid_dev = self._resid_dev(endog, mu)
+        return np.sum(resid_dev * freq_weights * var_weights / scale)

-    def resid_dev(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def resid_dev(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The deviance residuals

         Parameters
@@ -183,23 +213,25 @@ class Family:
         observation i to the deviance as

         .. math::
-           resid\\_dev_i = sign(y_i-\\mu_i) \\sqrt{D_i}
+           resid\_dev_i = sign(y_i-\mu_i) \sqrt{D_i}

         D_i is calculated from the _resid_dev method in each family.
         Distribution-specific documentation of the calculation is available
         there.
         """
-        pass
+        resid_dev = self._resid_dev(endog, mu)
+        resid_dev *= var_weights / scale
+        return np.sign(endog - mu) * np.sqrt(np.clip(resid_dev, 0., np.inf))

     def fitted(self, lin_pred):
-        """
+        r"""
         Fitted values based on linear predictors lin_pred.

         Parameters
         ----------
         lin_pred : ndarray
             Values of the linear predictor of the model.
-            :math:`X \\cdot \\beta` in a classical linear model.
+            :math:`X \cdot \beta` in a classical linear model.

         Returns
         -------
@@ -207,7 +239,8 @@ class Family:
             The mean response variables given by the inverse of the link
             function.
         """
-        pass
+        fits = self.link.inverse(lin_pred)
+        return fits

     def predict(self, mu):
         """
@@ -224,10 +257,10 @@ class Family:
             Linear predictors based on the mean response variables.  The value
             of the link function at the given mu.
         """
-        pass
+        return self.link(mu)

-    def loglike_obs(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def loglike_obs(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The log-likelihood function for each observation in terms of the fitted
         mean response for the distribution.

@@ -255,10 +288,10 @@ class Family:
         both ``loglike(endog, endog)`` and ``loglike(endog, mu)`` to get the
         log-likelihood ratio.
         """
-        pass
+        raise NotImplementedError

-    def loglike(self, endog, mu, var_weights=1.0, freq_weights=1.0, scale=1.0):
-        """
+    def loglike(self, endog, mu, var_weights=1., freq_weights=1., scale=1.):
+        r"""
         The log-likelihood function in terms of the fitted mean response.

         Parameters
@@ -285,17 +318,18 @@ class Family:
         Where :math:`ll_i` is the by-observation log-likelihood:

         .. math::
-           ll = \\sum(ll_i * freq\\_weights_i)
+           ll = \sum(ll_i * freq\_weights_i)

         ``ll_i`` is defined for each family. endog and mu are not restricted
         to ``endog`` and ``mu`` respectively.  For instance, you could call
         both ``loglike(endog, endog)`` and ``loglike(endog, mu)`` to get the
         log-likelihood ratio.
         """
-        pass
+        ll_obs = self.loglike_obs(endog, mu, var_weights, scale)
+        return np.sum(ll_obs * freq_weights)

-    def resid_anscombe(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def resid_anscombe(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The Anscombe residuals

         Parameters
@@ -320,15 +354,15 @@ class Family:
         Anscombe residuals are defined by

         .. math::
-           resid\\_anscombe_i = \\frac{A(y)-A(\\mu)}{A'(\\mu)\\sqrt{Var[\\mu]}} *
-           \\sqrt(var\\_weights)
+           resid\_anscombe_i = \frac{A(y)-A(\mu)}{A'(\mu)\sqrt{Var[\mu]}} *
+           \sqrt(var\_weights)

-        where :math:`A'(y)=v(y)^{-\\frac{1}{3}}` and :math:`v(\\mu)` is the
-        variance function :math:`Var[y]=\\frac{\\phi}{w}v(mu)`.
+        where :math:`A'(y)=v(y)^{-\frac{1}{3}}` and :math:`v(\mu)` is the
+        variance function :math:`Var[y]=\frac{\phi}{w}v(mu)`.
         The transformation :math:`A(y)` makes the residuals more normally
         distributed.
         """
-        pass
+        raise NotImplementedError

     def _clean(self, x):
         """
@@ -340,7 +374,7 @@ class Family:
         possible that other families might need a check for validity of the
         domain.
         """
-        pass
+        return np.clip(x, FLOAT_EPS, np.inf)


 class Poisson(Family):
@@ -374,16 +408,19 @@ class Poisson(Family):
     links = [L.Log, L.Identity, L.Sqrt]
     variance = V.mu
     valid = [0, np.inf]
-    safe_links = [L.Log]
+    safe_links = [L.Log, ]

     def __init__(self, link=None, check_link=True):
         if link is None:
             link = L.Log()
-        super(Poisson, self).__init__(link=link, variance=Poisson.variance,
-            check_link=check_link)
+        super(Poisson, self).__init__(
+            link=link,
+            variance=Poisson.variance,
+            check_link=check_link
+            )

     def _resid_dev(self, endog, mu):
-        """
+        r"""
         Poisson deviance residuals

         Parameters
@@ -402,13 +439,15 @@ class Poisson(Family):
         -----
         .. math::

-           resid\\_dev_i = 2 * (endog_i * \\ln(endog_i / \\mu_i) -
-           (endog_i - \\mu_i))
+           resid\_dev_i = 2 * (endog_i * \ln(endog_i / \mu_i) -
+           (endog_i - \mu_i))
         """
-        pass
+        endog_mu = self._clean(endog / mu)
+        resid_dev = endog * np.log(endog_mu) - (endog - mu)
+        return 2 * resid_dev

-    def loglike_obs(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def loglike_obs(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The log-likelihood function for each observation in terms of the fitted
         mean response for the Poisson distribution.

@@ -432,13 +471,14 @@ class Poisson(Family):
         Notes
         -----
         .. math::
-            ll_i = var\\_weights_i / scale * (endog_i * \\ln(\\mu_i) - \\mu_i -
-            \\ln \\Gamma(endog_i + 1))
+            ll_i = var\_weights_i / scale * (endog_i * \ln(\mu_i) - \mu_i -
+            \ln \Gamma(endog_i + 1))
         """
-        pass
+        return var_weights / scale * (endog * np.log(mu) - mu -
+                                      special.gammaln(endog + 1))
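An illustrative check that, with unit weights and scale, the per-observation expression above is the Poisson log-pmf:

    import numpy as np
    from scipy import special, stats

    y = np.array([0., 1., 3.])
    mu = np.array([0.5, 1.2, 2.5])
    ll = y * np.log(mu) - mu - special.gammaln(y + 1)
    assert np.allclose(ll, stats.poisson.logpmf(y, mu))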

-    def resid_anscombe(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def resid_anscombe(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The Anscombe residuals

         Parameters
@@ -462,13 +502,16 @@ class Poisson(Family):
         -----
         .. math::

-           resid\\_anscombe_i = (3/2) * (endog_i^{2/3} - \\mu_i^{2/3}) /
-           \\mu_i^{1/6} * \\sqrt(var\\_weights)
+           resid\_anscombe_i = (3/2) * (endog_i^{2/3} - \mu_i^{2/3}) /
+           \mu_i^{1/6} * \sqrt(var\_weights)
         """
-        pass
+        resid = ((3 / 2.) * (endog**(2 / 3.) - mu**(2 / 3.)) /
+                 (mu ** (1 / 6.) * scale ** 0.5))
+        resid *= np.sqrt(var_weights)
+        return resid

-    def get_distribution(self, mu, scale=1.0, var_weights=1.0):
-        """
+    def get_distribution(self, mu, scale=1., var_weights=1.):
+        r"""
         Frozen Poisson distribution instance for given parameters

         Parameters
@@ -486,7 +529,8 @@ class Poisson(Family):
         distribution instance

         """
-        pass
+
+        return stats.poisson(mu)


 class Gaussian(Family):
@@ -517,6 +561,7 @@ class Gaussian(Family):
     statsmodels.genmod.families.family.Family : Parent class for all links.
     :ref:`links` : Further details on links.
     """
+
     links = [L.Log, L.Identity, L.InversePower]
     variance = V.constant
     safe_links = links
@@ -524,11 +569,14 @@ class Gaussian(Family):
     def __init__(self, link=None, check_link=True):
         if link is None:
             link = L.Identity()
-        super(Gaussian, self).__init__(link=link, variance=Gaussian.
-            variance, check_link=check_link)
+        super(Gaussian, self).__init__(
+            link=link,
+            variance=Gaussian.variance,
+            check_link=check_link
+            )

     def _resid_dev(self, endog, mu):
-        """
+        r"""
         Gaussian deviance residuals

         Parameters
@@ -547,12 +595,12 @@ class Gaussian(Family):
         -----
         .. math::

-           resid\\_dev_i = (endog_i - \\mu_i) ** 2
+           resid\_dev_i = (endog_i - \mu_i) ** 2
         """
-        pass
+        return (endog - mu) ** 2

-    def loglike_obs(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def loglike_obs(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The log-likelihood function for each observation in terms of the fitted
         mean response for the Gaussian distribution.

@@ -580,26 +628,29 @@ class Gaussian(Family):

         .. math::

-           llf = -nobs / 2 * (\\log(SSR) + (1 + \\log(2 \\pi / nobs)))
+           llf = -nobs / 2 * (\log(SSR) + (1 + \log(2 \pi / nobs)))

         where

         .. math::

-           SSR = \\sum_i (Y_i - g^{-1}(\\mu_i))^2
+           SSR = \sum_i (Y_i - g^{-1}(\mu_i))^2

         If the link is not the identity link then the loglikelihood
         function is defined as

         .. math::

-           ll_i = -1 / 2 \\sum_i  * var\\_weights * ((Y_i - mu_i)^2 / scale +
-                                                \\log(2 * \\pi * scale))
+           ll_i = -1 / 2 \sum_i  * var\_weights * ((Y_i - mu_i)^2 / scale +
+                                                \log(2 * \pi * scale))
         """
-        pass
+        ll_obs = -var_weights * (endog - mu) ** 2 / scale
+        ll_obs += -np.log(scale / var_weights) - np.log(2 * np.pi)
+        ll_obs /= 2
+        return ll_obs
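Similarly, with unit ``var_weights`` the per-observation value above matches the normal log-density with variance ``scale`` (illustrative values):

    import numpy as np
    from scipy import stats

    y = np.array([0.2, -1.0, 2.3])
    mu = np.array([0.0, -0.5, 2.0])
    scale = 1.7
    ll = (-(y - mu) ** 2 / scale - np.log(scale) - np.log(2 * np.pi)) / 2
    assert np.allclose(ll, stats.norm.logpdf(y, loc=mu, scale=np.sqrt(scale)))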

-    def resid_anscombe(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def resid_anscombe(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The Anscombe residuals

         Parameters
@@ -626,13 +677,15 @@ class Gaussian(Family):

         .. math::

-           resid\\_anscombe_i = (Y_i - \\mu_i) / \\sqrt{scale} *
-           \\sqrt(var\\_weights)
+           resid\_anscombe_i = (Y_i - \mu_i) / \sqrt{scale} *
+           \sqrt(var\_weights)
         """
-        pass
+        resid = (endog - mu) / scale ** 0.5
+        resid *= np.sqrt(var_weights)
+        return resid

-    def get_distribution(self, mu, scale, var_weights=1.0):
-        """
+    def get_distribution(self, mu, scale, var_weights=1.):
+        r"""
         Frozen Gaussian distribution instance for given parameters

         Parameters
@@ -649,7 +702,9 @@ class Gaussian(Family):
         distribution instance

         """
-        pass
+
+        scale_n = scale / var_weights
+        return stats.norm(loc=mu, scale=np.sqrt(scale_n))


 class Gamma(Family):
@@ -682,16 +737,19 @@ class Gamma(Family):
     """
     links = [L.Log, L.Identity, L.InversePower]
     variance = V.mu_squared
-    safe_links = [L.Log]
+    safe_links = [L.Log, ]

     def __init__(self, link=None, check_link=True):
         if link is None:
             link = L.InversePower()
-        super(Gamma, self).__init__(link=link, variance=Gamma.variance,
-            check_link=check_link)
+        super(Gamma, self).__init__(
+            link=link,
+            variance=Gamma.variance,
+            check_link=check_link
+            )

     def _resid_dev(self, endog, mu):
-        """
+        r"""
         Gamma deviance residuals

         Parameters
@@ -710,13 +768,15 @@ class Gamma(Family):
         -----
         .. math::

-           resid\\_dev_i = 2 * ((endog_i - \\mu_i) / \\mu_i -
-           \\log(endog_i / \\mu_i))
+           resid\_dev_i = 2 * ((endog_i - \mu_i) / \mu_i -
+           \log(endog_i / \mu_i))
         """
-        pass
+        endog_mu = self._clean(endog / mu)
+        resid_dev = -np.log(endog_mu) + (endog - mu) / mu
+        return 2 * resid_dev

-    def loglike_obs(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def loglike_obs(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The log-likelihood function for each observation in terms of the fitted
         mean response for the Gamma distribution.

@@ -741,14 +801,23 @@ class Gamma(Family):
         -----
         .. math::

-           ll_i = var\\_weights_i / scale * (\\ln(var\\_weights_i * endog_i /
-           (scale * \\mu_i)) - (var\\_weights_i * endog_i) /
-           (scale * \\mu_i)) - \\ln \\Gamma(var\\_weights_i / scale) - \\ln(\\mu_i)
+           ll_i = var\_weights_i / scale * (\ln(var\_weights_i * endog_i /
+           (scale * \mu_i)) - (var\_weights_i * endog_i) /
+           (scale * \mu_i)) - \ln \Gamma(var\_weights_i / scale) - \ln(\mu_i)
         """
-        pass
+        endog_mu = self._clean(endog / mu)
+        weight_scale = var_weights / scale
+        ll_obs = weight_scale * np.log(weight_scale * endog_mu)
+        ll_obs -= weight_scale * endog_mu
+        ll_obs -= special.gammaln(weight_scale) + np.log(endog)
+        return ll_obs

-    def resid_anscombe(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+        # in Stata scale is set to equal 1 for reporting llf
+        # in R it's the dispersion, though there is a loss of precision vs.
+        # our results due to an assumed difference in implementation
+
+    def resid_anscombe(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The Anscombe residuals

         Parameters
@@ -772,13 +841,15 @@ class Gamma(Family):
         -----
         .. math::

-           resid\\_anscombe_i = 3 * (endog_i^{1/3} - \\mu_i^{1/3}) / \\mu_i^{1/3}
-           / \\sqrt{scale} * \\sqrt(var\\_weights)
+           resid\_anscombe_i = 3 * (endog_i^{1/3} - \mu_i^{1/3}) / \mu_i^{1/3}
+           / \sqrt{scale} * \sqrt(var\_weights)
         """
-        pass
+        resid = 3 * (endog**(1/3.) - mu**(1/3.)) / mu**(1/3.) / scale ** 0.5
+        resid *= np.sqrt(var_weights)
+        return resid

-    def get_distribution(self, mu, scale, var_weights=1.0):
-        """
+    def get_distribution(self, mu, scale, var_weights=1.):
+        r"""
         Frozen Gamma distribution instance for given parameters

         Parameters
@@ -795,7 +866,11 @@ class Gamma(Family):
         distribution instance

         """
-        pass
+        # combine var_weights with scale
+        scale_ = scale / var_weights
+        shape = 1 / scale_
+        scale_g = mu * scale_
+        return stats.gamma(shape, scale=scale_g)


 class Binomial(Family):
@@ -838,27 +913,37 @@ class Binomial(Family):
     successes, with parameter `var_weights` containing the
     number of trials for each row.
     """
-    links = [L.Logit, L.Probit, L.Cauchy, L.Log, L.LogC, L.CLogLog, L.
-        LogLog, L.Identity]
-    variance = V.binary
+
+    links = [L.Logit, L.Probit, L.Cauchy, L.Log, L.LogC, L.CLogLog, L.LogLog,
+             L.Identity]
+    variance = V.binary  # this is not used below in an effort to include n
+
+    # Other safe links, e.g. cloglog and probit are subclasses
     safe_links = [L.Logit, L.CDFLink]

-    def __init__(self, link=None, check_link=True):
+    def __init__(self, link=None, check_link=True):  # , n=1.):
         if link is None:
             link = L.Logit()
+        # TODO: it *should* work for a constant n>1 actually, if freq_weights
+        # is equal to n
         self.n = 1
-        super(Binomial, self).__init__(link=link, variance=V.Binomial(n=
-            self.n), check_link=check_link)
+        # overwritten by initialize if needed but always used to initialize
+        # variance since endog is assumed/forced to be (0,1)
+        super(Binomial, self).__init__(
+            link=link,
+            variance=V.Binomial(n=self.n),
+            check_link=check_link
+            )

     def starting_mu(self, y):
-        """
+        r"""
         The starting values for the IRLS algorithm for the Binomial family.
-        A good choice for the binomial family is :math:`\\mu_0 = (Y_i + 0.5)/2`
+        A good choice for the binomial family is :math:`\mu_0 = (Y_i + 0.5)/2`
         """
-        pass
+        return (y + .5)/2

     def initialize(self, endog, freq_weights):
-        """
+        '''
         Initialize the response variable.

         Parameters
@@ -876,11 +961,23 @@ class Binomial(Family):
         (successes, failures) and
         successes/(success + failures) is returned.  And n is set to
         successes + failures.
-        """
-        pass
+        '''
+        # if not np.all(np.asarray(freq_weights) == 1):
+        #     self.variance = V.Binomial(n=freq_weights)
+        if endog.ndim > 1 and endog.shape[1] > 2:
+            raise ValueError('endog has more than 2 columns. The Binomial '
+                             'link supports either a single response variable '
+                             'or a paired response variable.')
+        elif endog.ndim > 1 and endog.shape[1] > 1:
+            y = endog[:, 0]
+            # overwrite self.freq_weights for deviance below
+            self.n = endog.sum(1)
+            return y*1./self.n, self.n
+        else:
+            return endog, np.ones(endog.shape[0])

     def _resid_dev(self, endog, mu):
-        """
+        r"""
         Binomial deviance residuals

         Parameters
@@ -899,13 +996,16 @@ class Binomial(Family):
         -----
         .. math::

-           resid\\_dev_i = 2 * n * (endog_i * \\ln(endog_i /\\mu_i) +
-           (1 - endog_i) * \\ln((1 - endog_i) / (1 - \\mu_i)))
+           resid\_dev_i = 2 * n * (endog_i * \ln(endog_i /\mu_i) +
+           (1 - endog_i) * \ln((1 - endog_i) / (1 - \mu_i)))
         """
-        pass
+        endog_mu = self._clean(endog / (mu + 1e-20))
+        n_endog_mu = self._clean((1. - endog) / (1. - mu + 1e-20))
+        resid_dev = endog * np.log(endog_mu) + (1 - endog) * np.log(n_endog_mu)
+        return 2 * self.n * resid_dev

-    def loglike_obs(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def loglike_obs(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The log-likelihood function for each observation in terms of the fitted
         mean response for the Binomial distribution.

@@ -932,25 +1032,32 @@ class Binomial(Family):

         .. math::

-         ll_i = \\sum_i (y_i * \\log(\\mu_i/(1-\\mu_i)) + \\log(1-\\mu_i)) *
-               var\\_weights_i
+         ll_i = \sum_i (y_i * \log(\mu_i/(1-\mu_i)) + \log(1-\mu_i)) *
+               var\_weights_i

         If the endogenous variable is binomial:

         .. math::

-           ll_i = \\sum_i var\\_weights_i * (\\ln \\Gamma(n+1) -
-                  \\ln \\Gamma(y_i + 1) - \\ln \\Gamma(n_i - y_i +1) + y_i *
-                  \\log(\\mu_i / (n_i - \\mu_i)) + n * \\log(1 - \\mu_i/n_i))
+           ll_i = \sum_i var\_weights_i * (\ln \Gamma(n+1) -
+                  \ln \Gamma(y_i + 1) - \ln \Gamma(n_i - y_i +1) + y_i *
+                  \log(\mu_i / (n_i - \mu_i)) + n * \log(1 - \mu_i/n_i))

         where :math:`y_i = Y_i * n_i` with :math:`Y_i` and :math:`n_i` as
         defined in Binomial initialize.  This simply makes :math:`y_i` the
         original number of successes.
         """
-        pass
+        n = self.n     # Number of trials
+        y = endog * n  # Number of successes

-    def resid_anscombe(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+        # note that mu is still in (0,1), i.e. not converted back
+        return (
+            special.gammaln(n + 1) - special.gammaln(y + 1) -
+            special.gammaln(n - y + 1) + y * np.log(mu / (1 - mu + 1e-20)) +
+            n * np.log(1 - mu + 1e-20)) * var_weights
+
+    def resid_anscombe(self, endog, mu, var_weights=1., scale=1.):
+        r'''
         The Anscombe residuals

         Parameters
@@ -974,8 +1081,8 @@ class Binomial(Family):
         -----
         .. math::

-            n^{2/3}*(cox\\_snell(endog)-cox\\_snell(mu)) /
-            (mu*(1-mu/n)*scale^3)^{1/6} * \\sqrt(var\\_weights)
+            n^{2/3}*(cox\_snell(endog)-cox\_snell(mu)) /
+            (mu*(1-mu/n)*scale^3)^{1/6} * \sqrt(var\_weights)

         where cox_snell is defined as
         cox_snell(x) = betainc(2/3., 2/3., x)*beta(2/3., 2/3.)
@@ -986,7 +1093,7 @@ class Binomial(Family):
         The name 'cox_snell' is idiosyncratic and is simply used for
         convenience following the approach suggested in Cox and Snell (1968).
         Further note that
-        :math:`cox\\_snell(x) = \\frac{3}{2}*x^{2/3} *
+        :math:`cox\_snell(x) = \frac{3}{2}*x^{2/3} *
         hyp2f1(2/3.,1/3.,5/3.,x)`
         where hyp2f1 is the hypergeometric 2f1 function.  The Anscombe
         residuals are sometimes defined in the literature using the
@@ -999,11 +1106,21 @@ class Binomial(Family):

         Cox, DR and Snell, EJ. (1968) "A General Definition of Residuals."
             Journal of the Royal Statistical Society B. 30, 248-75.
-        """
-        pass
+        '''
+        endog = endog * self.n  # convert back to successes
+        mu = mu * self.n  # convert back to successes

-    def get_distribution(self, mu, scale=1.0, var_weights=1.0, n_trials=1):
-        """
+        def cox_snell(x):
+            return special.betainc(2/3., 2/3., x) * special.beta(2/3., 2/3.)
+
+        resid = (self.n ** (2/3.) * (cox_snell(endog * 1. / self.n) -
+                                     cox_snell(mu * 1. / self.n)) /
+                 (mu * (1 - mu * 1. / self.n) * scale ** 3) ** (1 / 6.))
+        resid *= np.sqrt(var_weights)
+        return resid
+
+    def get_distribution(self, mu, scale=1., var_weights=1., n_trials=1):
+        r"""
         Frozen Binomial distribution instance for given parameters

         Parameters
@@ -1024,7 +1141,8 @@ class Binomial(Family):
         distribution instance

         """
-        pass
+
+        return stats.binom(n=n_trials, p=mu)


 class InverseGaussian(Family):
@@ -1061,18 +1179,22 @@ class InverseGaussian(Family):
     The inverse Gaussian distribution is sometimes referred to in the
     literature as the Wald distribution.
     """
+
     links = [L.InverseSquared, L.InversePower, L.Identity, L.Log]
     variance = V.mu_cubed
-    safe_links = [L.InverseSquared, L.Log]
+    safe_links = [L.InverseSquared, L.Log, ]

     def __init__(self, link=None, check_link=True):
         if link is None:
             link = L.InverseSquared()
-        super(InverseGaussian, self).__init__(link=link, variance=
-            InverseGaussian.variance, check_link=check_link)
+        super(InverseGaussian, self).__init__(
+            link=link,
+            variance=InverseGaussian.variance,
+            check_link=check_link
+            )

     def _resid_dev(self, endog, mu):
-        """
+        r"""
         Inverse Gaussian deviance residuals

         Parameters
@@ -1091,12 +1213,12 @@ class InverseGaussian(Family):
         -----
         .. math::

-           resid\\_dev_i = 1 / (endog_i * \\mu_i^2) * (endog_i - \\mu_i)^2
+           resid\_dev_i = 1 / (endog_i * \mu_i^2) * (endog_i - \mu_i)^2
         """
-        pass
+        return 1. / (endog * mu ** 2) * (endog - mu) ** 2

-    def loglike_obs(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def loglike_obs(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The log-likelihood function for each observation in terms of the fitted
         mean response for the Inverse Gaussian distribution.

@@ -1121,14 +1243,17 @@ class InverseGaussian(Family):
         -----
         .. math::

-           ll_i = -1/2 * (var\\_weights_i * (endog_i - \\mu_i)^2 /
-           (scale * endog_i * \\mu_i^2) + \\ln(scale * \\endog_i^3 /
-           var\\_weights_i) - \\ln(2 * \\pi))
+           ll_i = -1/2 * (var\_weights_i * (endog_i - \mu_i)^2 /
+           (scale * endog_i * \mu_i^2) + \ln(scale * endog_i^3 /
+           var\_weights_i) - \ln(2 * \pi))
         """
-        pass
+        ll_obs = -var_weights * (endog - mu) ** 2 / (scale * endog * mu ** 2)
+        ll_obs += -np.log(scale * endog ** 3 / var_weights) - np.log(2 * np.pi)
+        ll_obs /= 2
+        return ll_obs

-    def resid_anscombe(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def resid_anscombe(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The Anscombe residuals

         Parameters
@@ -1153,13 +1278,15 @@ class InverseGaussian(Family):
         -----
         .. math::

-           resid\\_anscombe_i = \\log(Y_i / \\mu_i) / \\sqrt{\\mu_i * scale} *
-           \\sqrt(var\\_weights)
+           resid\_anscombe_i = \log(Y_i / \mu_i) / \sqrt{\mu_i * scale} *
+           \sqrt(var\_weights)
         """
-        pass
+        resid = np.log(endog / mu) / np.sqrt(mu * scale)
+        resid *= np.sqrt(var_weights)
+        return resid

-    def get_distribution(self, mu, scale, var_weights=1.0):
-        """
+    def get_distribution(self, mu, scale, var_weights=1.):
+        r"""
         Frozen Inverse Gaussian distribution instance for given parameters

         Parameters
@@ -1176,11 +1303,14 @@ class InverseGaussian(Family):
         distribution instance

         """
-        pass
+        # combine var_weights with scale
+        scale_ = scale / var_weights
+        mu_ig = mu * scale_
+        return stats.invgauss(mu_ig, scale=1 / scale_)


 class NegativeBinomial(Family):
-    """
+    r"""
     Negative Binomial exponential family (corresponds to NB2).

     Parameters
@@ -1215,33 +1345,38 @@ class NegativeBinomial(Family):
     -----
     Power link functions are not yet supported.

-    Parameterization for :math:`y=0, 1, 2, \\ldots` is
+    Parameterization for :math:`y=0, 1, 2, \ldots` is

     .. math::

-       f(y) = \\frac{\\Gamma(y+\\frac{1}{\\alpha})}{y!\\Gamma(\\frac{1}{\\alpha})}
-              \\left(\\frac{1}{1+\\alpha\\mu}\\right)^{\\frac{1}{\\alpha}}
-              \\left(\\frac{\\alpha\\mu}{1+\\alpha\\mu}\\right)^y
+       f(y) = \frac{\Gamma(y+\frac{1}{\alpha})}{y!\Gamma(\frac{1}{\alpha})}
+              \left(\frac{1}{1+\alpha\mu}\right)^{\frac{1}{\alpha}}
+              \left(\frac{\alpha\mu}{1+\alpha\mu}\right)^y

-    with :math:`E[Y]=\\mu\\,` and :math:`Var[Y]=\\mu+\\alpha\\mu^2`.
+    with :math:`E[Y]=\mu\,` and :math:`Var[Y]=\mu+\alpha\mu^2`.
     """
     links = [L.Log, L.CLogLog, L.Identity, L.NegativeBinomial, L.Power]
+    # TODO: add the ability to use the power links with an if test
+    # similar to below
     variance = V.nbinom
-    safe_links = [L.Log]
-
-    def __init__(self, link=None, alpha=1.0, check_link=True):
-        self.alpha = 1.0 * alpha
-        if alpha is self.__init__.__defaults__[1]:
-            warnings.warn(
-                f'Negative binomial dispersion parameter alpha not set. Using default value alpha={alpha}.'
-                , ValueWarning)
+    safe_links = [L.Log, ]
+
+    def __init__(self, link=None, alpha=1., check_link=True):
+        self.alpha = 1. * alpha  # make it at least float
+        if alpha is self.__init__.__defaults__[1]:  # `is` is intentional
+            warnings.warn("Negative binomial dispersion parameter alpha not "
+                          f"set. Using default value alpha={alpha}.",
+                          ValueWarning)
         if link is None:
             link = L.Log()
-        super(NegativeBinomial, self).__init__(link=link, variance=V.
-            NegativeBinomial(alpha=self.alpha), check_link=check_link)
+        super(NegativeBinomial, self).__init__(
+            link=link,
+            variance=V.NegativeBinomial(alpha=self.alpha),
+            check_link=check_link
+            )

     def _resid_dev(self, endog, mu):
-        """
+        r"""
         Negative Binomial deviance residuals

         Parameters
@@ -1260,14 +1395,19 @@ class NegativeBinomial(Family):
         -----
         .. math::

-            resid_dev_i = 2 * (endog_i * \\ln(endog_i /
-            \\mu_i) - (endog_i + 1 / \\alpha) * \\ln((endog_i + 1 / \\alpha) /
-            (\\mu_i + 1 / \\alpha)))
+            resid_dev_i = 2 * (endog_i * \ln(endog_i /
+            \mu_i) - (endog_i + 1 / \alpha) * \ln((endog_i + 1 / \alpha) /
+            (\mu_i + 1 / \alpha)))
         """
-        pass
+        endog_mu = self._clean(endog / mu)
+        endog_alpha = endog + 1 / self.alpha
+        mu_alpha = mu + 1 / self.alpha
+        resid_dev = endog * np.log(endog_mu)
+        resid_dev -= endog_alpha * np.log(endog_alpha / mu_alpha)
+        return 2 * resid_dev

-    def loglike_obs(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def loglike_obs(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The log-likelihood function for each observation in terms of the fitted
         mean response for the Negative Binomial distribution.

@@ -1294,16 +1434,16 @@ class NegativeBinomial(Family):

         .. math::

-           llf = \\sum_i var\\_weights_i / scale * (Y_i * \\log{(\\alpha * \\mu_i /
-                 (1 + \\alpha * \\mu_i))} - \\log{(1 + \\alpha * \\mu_i)}/
-                 \\alpha + Constant)
+           llf = \sum_i var\_weights_i / scale * (Y_i * \log{(\alpha * \mu_i /
+                 (1 + \alpha * \mu_i))} - \log{(1 + \alpha * \mu_i)}/
+                 \alpha + Constant)

         where :math:`Constant` is defined as:

         .. math::

-           Constant = \\ln \\Gamma{(Y_i + 1/ \\alpha )} - \\ln \\Gamma(Y_i + 1) -
-                      \\ln \\Gamma{(1/ \\alpha )}
+           Constant = \ln \Gamma{(Y_i + 1/ \alpha )} - \ln \Gamma(Y_i + 1) -
+                      \ln \Gamma{(1/ \alpha )}

         constant = (special.gammaln(endog + 1 / self.alpha) -
                     special.gammaln(endog+1)-special.gammaln(1/self.alpha))
@@ -1311,10 +1451,15 @@ class NegativeBinomial(Family):
                 np.log(1 + self.alpha * mu) / self.alpha +
                 constant) * var_weights / scale
         """
-        pass
+        ll_obs = endog * np.log(self.alpha * mu)
+        ll_obs -= (endog + 1 / self.alpha) * np.log(1 + self.alpha * mu)
+        ll_obs += special.gammaln(endog + 1 / self.alpha)
+        ll_obs -= special.gammaln(1 / self.alpha)
+        ll_obs -= special.gammaln(endog + 1)
+        return var_weights / scale * ll_obs

-    def resid_anscombe(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+    def resid_anscombe(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The Anscombe residuals

         Parameters
@@ -1337,23 +1482,31 @@ class NegativeBinomial(Family):
         Notes
         -----
         Anscombe residuals for Negative Binomial are the same as for Binomial
-        upon setting :math:`n=-\\frac{1}{\\alpha}`. Due to the negative value of
-        :math:`-\\alpha*Y` the representation with the hypergeometric function
+        upon setting :math:`n=-\frac{1}{\alpha}`. Due to the negative value of
+        :math:`-\alpha*Y` the representation with the hypergeometric function
         :math:`H2F1(x) =  hyp2f1(2/3.,1/3.,5/3.,x)` is advantageous

         .. math::

-            resid\\_anscombe_i = \\frac{3}{2} *
-            (Y_i^(2/3)*H2F1(-\\alpha*Y_i) - \\mu_i^(2/3)*H2F1(-\\alpha*\\mu_i))
-            / (\\mu_i * (1+\\alpha*\\mu_i) * scale^3)^(1/6) * \\sqrt(var\\_weights)
+            resid\_anscombe_i = \frac{3}{2} *
+            (Y_i^(2/3)*H2F1(-\alpha*Y_i) - \mu_i^(2/3)*H2F1(-\alpha*\mu_i))
+            / (\mu_i * (1+\alpha*\mu_i) * scale^3)^(1/6) * \sqrt(var\_weights)

         Note that for the (unregularized) Beta function, one has
         :math:`Beta(z,a,b) = z^a/a * H2F1(a,1-b,a+1,z)`
         """
-        pass
+        def hyp2f1(x):
+            return special.hyp2f1(2 / 3., 1 / 3., 5 / 3., x)

-    def get_distribution(self, mu, scale=1.0, var_weights=1.0):
-        """
+        resid = (3 / 2. * (endog ** (2 / 3.) * hyp2f1(-self.alpha * endog) -
+                           mu ** (2 / 3.) * hyp2f1(-self.alpha * mu)) /
+                 (mu * (1 + self.alpha * mu) *
+                 scale ** 3) ** (1 / 6.))
+        resid *= np.sqrt(var_weights)
+        return resid
+
+    def get_distribution(self, mu, scale=1., var_weights=1.):
+        r"""
         Frozen NegativeBinomial distribution instance for given parameters

         Parameters
@@ -1371,7 +1524,9 @@ class NegativeBinomial(Family):
         distribution instance

         """
-        pass
+        size = 1. / self.alpha
+        prob = size / (size + mu)
+        return stats.nbinom(size, prob)


 class Tweedie(Family):
@@ -1422,20 +1577,22 @@ class Tweedie(Family):
     variance = V.Power(power=1.5)
     safe_links = [L.Log, L.Power]

-    def __init__(self, link=None, var_power=1.0, eql=False, check_link=True):
+    def __init__(self, link=None, var_power=1., eql=False, check_link=True):
         self.var_power = var_power
         self.eql = eql
         if eql and (var_power < 1 or var_power > 2):
-            raise ValueError(
-                'Tweedie: if EQL=True then var_power must fall between 1 and 2'
-                )
+            raise ValueError("Tweedie: if EQL=True then var_power must fall "
+                             "between 1 and 2")
         if link is None:
             link = L.Log()
-        super(Tweedie, self).__init__(link=link, variance=V.Power(power=
-            var_power * 1.0), check_link=check_link)
+        super(Tweedie, self).__init__(
+            link=link,
+            variance=V.Power(power=var_power * 1.),
+            check_link=check_link
+            )

     def _resid_dev(self, endog, mu):
-        """
+        r"""
         Tweedie deviance residuals

         Parameters
@@ -1456,13 +1613,13 @@ class Tweedie(Family):

         .. math::

-            dev_i = \\mu_i
+            dev_i = \mu_i

         when :math:`endog_i = 0` and

         .. math::

-            dev_i = endog_i * \\log(endog_i / \\mu_i) + (\\mu_i - endog_i)
+            dev_i = endog_i * \log(endog_i / \mu_i) + (\mu_i - endog_i)

         otherwise.

@@ -1470,26 +1627,37 @@ class Tweedie(Family):

         .. math::

-            dev_i =  (endog_i - \\mu_i) / \\mu_i - \\log(endog_i / \\mu_i)
+            dev_i =  (endog_i - \mu_i) / \mu_i - \log(endog_i / \mu_i)

         For all other p,

         .. math::

             dev_i = endog_i^{2 - p} / ((1 - p) * (2 - p)) -
-                    endog_i * \\mu_i^{1 - p} / (1 - p) + \\mu_i^{2 - p} /
+                    endog_i * \mu_i^{1 - p} / (1 - p) + \mu_i^{2 - p} /
                     (2 - p)

         The deviance residual is then

         .. math::

-            resid\\_dev_i = 2 * dev_i
-        """
-        pass
-
-    def loglike_obs(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+            resid\_dev_i = 2 * dev_i
+        """
+        p = self.var_power
+        if p == 1:
+            dev = np.where(endog == 0,
+                           mu,
+                           endog * np.log(endog / mu) + (mu - endog))
+        elif p == 2:
+            endog1 = self._clean(endog)
+            dev = ((endog - mu) / mu) - np.log(endog1 / mu)
+        else:
+            dev = (endog ** (2 - p) / ((1 - p) * (2 - p)) -
+                   endog * mu ** (1-p) / (1 - p) + mu ** (2 - p) / (2 - p))
+        return 2 * dev
+
+    def loglike_obs(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The log-likelihood function for each observation in terms of the fitted
         mean response for the Tweedie distribution.

@@ -1525,10 +1693,77 @@ class Tweedie(Family):
         JA Nelder, D Pregibon (1987).  An extended quasi-likelihood function.
         Biometrika 74:2, pp 221-232.  https://www.jstor.org/stable/2336136
         """
-        pass
-
-    def resid_anscombe(self, endog, mu, var_weights=1.0, scale=1.0):
-        """
+        p = self.var_power
+        endog = np.atleast_1d(endog)
+        if p == 1:
+            return Poisson().loglike_obs(
+                endog=endog,
+                mu=mu,
+                var_weights=var_weights,
+                scale=scale
+            )
+        elif p == 2:
+            return Gamma().loglike_obs(
+                endog=endog,
+                mu=mu,
+                var_weights=var_weights,
+                scale=scale
+            )
+
+        if not self.eql:
+            if p < 1 or p > 2:
+                # We have not yet implemented the actual likelihood
+                return np.nan
+
+            # scipy compat bessel_wright added in 1.7
+            if SP_LT_17:
+                # old return was nan
+                return np.nan
+
+            # See: Dunn, Smyth (2004) "Series evaluation of Tweedie
+            # exponential dispersion model densities"
+            # pdf(y, mu, p, phi) = f(y, theta, phi)
+            # = c(y, phi) * exp(1/phi (y theta - kappa(theta)))
+            # kappa = cumulant function
+            # theta = function of expectation mu and power p
+            # alpha = (2-p)/(1-p)
+            # phi = scale
+            # for 1<p<2:
+            # c(y, phi) = 1/y * wright_bessel(a, b, x)
+            # a = -alpha
+            # b = 0
+            # x = (p-1)**alpha/(2-p) / y**alpha / phi**(1-alpha)
+            scale = scale / var_weights
+            theta = mu ** (1 - p) / (1 - p)
+            kappa = mu ** (2 - p) / (2 - p)
+            alpha = (2 - p) / (1 - p)
+
+            ll_obs = (endog * theta - kappa) / scale
+            idx = endog > 0
+            if np.any(idx):
+                if not np.isscalar(endog):
+                    endog = endog[idx]
+                if not np.isscalar(scale):
+                    scale = scale[idx]
+                x = ((p - 1) * scale / endog) ** alpha
+                x /= (2 - p) * scale
+                wb = special.wright_bessel(-alpha, 0, x)
+                ll_obs[idx] += np.log(1/endog * wb)
+            return ll_obs
+        else:
+            # Equations 4 of Kaas
+            llf = np.log(2 * np.pi * scale) + p * np.log(endog)
+            llf -= np.log(var_weights)
+            llf /= -2
+            u = (endog ** (2 - p)
+                 - (2 - p) * endog * mu ** (1 - p)
+                 + (1 - p) * mu ** (2 - p))
+            u *= var_weights / (scale * (1 - p) * (2 - p))
+
+        return llf - u
+
+    def resid_anscombe(self, endog, mu, var_weights=1., scale=1.):
+        r"""
         The Anscombe residuals

         Parameters
@@ -1554,8 +1789,8 @@ class Tweedie(Family):

         .. math::

-            resid\\_anscombe_i = \\log(endog_i / \\mu_i) / \\sqrt{\\mu_i * scale} *
-            \\sqrt(var\\_weights)
+            resid\_anscombe_i = \log(endog_i / \mu_i) / \sqrt{\mu_i * scale} *
+            \sqrt(var\_weights)

         Otherwise,

@@ -1565,7 +1800,14 @@ class Tweedie(Family):

         .. math::

-            resid\\_anscombe_i = (1 / c) * (endog_i^c - \\mu_i^c) / \\mu_i^{p / 6}
-            / \\sqrt{scale} * \\sqrt(var\\_weights)
-        """
-        pass
+            resid\_anscombe_i = (1 / c) * (endog_i^c - \mu_i^c) / \mu_i^{p / 6}
+            / \sqrt{scale} * \sqrt(var\_weights)
+        """
+        if self.var_power == 3:
+            resid = np.log(endog / mu) / np.sqrt(mu * scale)
+        else:
+            c = (3. - self.var_power) / 3.
+            resid = ((1. / c) * (endog ** c - mu ** c) /
+                     mu ** (self.var_power / 6.)) / scale ** 0.5
+        resid *= np.sqrt(var_weights)
+        return resid
diff --git a/statsmodels/genmod/families/links.py b/statsmodels/genmod/families/links.py
index 5326065c4..99b461d67 100644
--- a/statsmodels/genmod/families/links.py
+++ b/statsmodels/genmod/families/links.py
@@ -1,12 +1,23 @@
 """
 Defines the link functions to be used with GLM and GEE families.
 """
+
 import numpy as np
 import scipy.stats
 import warnings
+
 FLOAT_EPS = np.finfo(float).eps


+def _link_deprecation_warning(old, new):
+    warnings.warn(
+        f"The {old} link alias is deprecated. Use {new} instead. The {old} "
+        f"link alias will be removed after the 0.15.0 release.",
+        FutureWarning
+    )
+    # raise
+
+
 class Link:
     """
     A generic link function for one-parameter exponential family.
@@ -45,7 +56,7 @@ class Link:
         g^(-1)(z) : ndarray
             The value of the inverse of the link function g^(-1)(z) = p
         """
-        pass
+        return NotImplementedError

     def deriv(self, p):
         """
@@ -60,14 +71,15 @@ class Link:
         g'(p) : ndarray
             The value of the derivative of the link function g'(p)
         """
-        pass
+        return NotImplementedError

     def deriv2(self, p):
         """Second derivative of the link function g''(p)

         implemented through numerical differentiation
         """
-        pass
+        from statsmodels.tools.numdiff import _approx_fprime_cs_scalar
+        return _approx_fprime_cs_scalar(p, self.deriv)

     def inverse_deriv(self, z):
         """
@@ -88,7 +100,7 @@ class Link:
         This reference implementation gives the correct result but is
         inefficient, so it can be overridden in subclasses.
         """
-        pass
+        return 1 / self.deriv(self.inverse(z))

     def inverse_deriv2(self, z):
         """
@@ -110,7 +122,8 @@ class Link:
         This reference implementation gives the correct result but is
         inefficient, so it can be overridden in subclasses.
         """
-        pass
+        iz = self.inverse(z)
+        return -self.deriv2(iz) / self.deriv(iz) ** 3


 class Logit(Link):
@@ -140,7 +153,7 @@ class Logit(Link):
         pclip : ndarray
             Clipped probabilities
         """
-        pass
+        return np.clip(p, FLOAT_EPS, 1. - FLOAT_EPS)

     def __call__(self, p):
         """
@@ -161,7 +174,7 @@ class Logit(Link):
         g(p) = log(p / (1 - p))
         """
         p = self._clean(p)
-        return np.log(p / (1.0 - p))
+        return np.log(p / (1. - p))

     def inverse(self, z):
         """
@@ -181,7 +194,9 @@ class Logit(Link):
         -----
         g^(-1)(z) = exp(z)/(1+exp(z))
         """
-        pass
+        z = np.asarray(z)
+        t = np.exp(-z)
+        return 1. / (1. + t)

     def deriv(self, p):
         """
@@ -204,7 +219,8 @@ class Logit(Link):
         Alias for `Logit`:
         logit = Logit()
         """
-        pass
+        p = self._clean(p)
+        return 1. / (p * (1 - p))

     def inverse_deriv(self, z):
         """
@@ -220,7 +236,8 @@ class Logit(Link):
         g'^(-1)(z) : ndarray
             The value of the derivative of the inverse of the logit function
         """
-        pass
+        t = np.exp(z)
+        return t / (1 + t) ** 2

     def deriv2(self, p):
         """
@@ -236,7 +253,8 @@ class Logit(Link):
         g''(z) : ndarray
             The value of the second derivative of the logit function
         """
-        pass
+        v = p * (1 - p)
+        return (2 * p - 1) / v ** 2


 class Power(Link):
@@ -257,7 +275,7 @@ class Power(Link):
     Identity = Power(power=1.)
     """

-    def __init__(self, power=1.0):
+    def __init__(self, power=1.):
         self.power = power

     def __call__(self, p):
@@ -301,7 +319,10 @@ class Power(Link):
         -----
         g^(-1)(z`) = `z`**(1/`power`)
         """
-        pass
+        if self.power == 1:
+            return z
+        else:
+            return np.power(z, 1. / self.power)

     def deriv(self, p):
         """
@@ -321,7 +342,10 @@ class Power(Link):
         -----
         g'(`p`) = `power` * `p`**(`power` - 1)
         """
-        pass
+        if self.power == 1:
+            return np.ones_like(p)
+        else:
+            return self.power * np.power(p, self.power - 1)

     def deriv2(self, p):
         """
@@ -341,7 +365,10 @@ class Power(Link):
         -----
         g''(`p`) = `power` * (`power` - 1) * `p`**(`power` - 2)
         """
-        pass
+        if self.power == 1:
+            return np.zeros_like(p)
+        else:
+            return self.power * (self.power - 1) * np.power(p, self.power - 2)

     def inverse_deriv(self, z):
         """
@@ -358,7 +385,10 @@ class Power(Link):
             The value of the derivative of the inverse of the power transform
         function
         """
-        pass
+        if self.power == 1:
+            return np.ones_like(z)
+        else:
+            return np.power(z, (1 - self.power) / self.power) / self.power

     def inverse_deriv2(self, z):
         """
@@ -375,7 +405,11 @@ class Power(Link):
             The value of the derivative of the inverse of the power transform
         function
         """
-        pass
+        if self.power == 1:
+            return np.zeros_like(z)
+        else:
+            return ((1 - self.power) *
+                    np.power(z, (1 - 2*self.power)/self.power) / self.power**2)


 class InversePower(Power):
@@ -390,7 +424,7 @@ class InversePower(Power):
     """

     def __init__(self):
-        super().__init__(power=-1.0)
+        super().__init__(power=-1.)


 class Sqrt(Power):
@@ -405,22 +439,22 @@ class Sqrt(Power):
     """

     def __init__(self):
-        super().__init__(power=0.5)
+        super().__init__(power=.5)


 class InverseSquared(Power):
-    """
+    r"""
     The inverse squared transform

     Notes
     -----
-    g(`p`) = 1/(`p`\\*\\*2)
+    g(`p`) = 1/(`p`\*\*2)

     Alias of statsmodels.family.links.Power(power=2.)
     """

     def __init__(self):
-        super().__init__(power=-2.0)
+        super().__init__(power=-2.)


 class Identity(Power):
@@ -435,7 +469,7 @@ class Identity(Power):
     """

     def __init__(self):
-        super().__init__(power=1.0)
+        super().__init__(power=1.)


 class Log(Link):
@@ -448,6 +482,9 @@ class Log(Link):
     machine epsilon so that p is in (0, inf). log is an alias of Log.
     """

+    def _clean(self, x):
+        return np.clip(x, FLOAT_EPS, np.inf)
+
     def __call__(self, p, **extra):
         """
         Log transform link function
@@ -487,7 +524,7 @@ class Log(Link):
         -----
         g^{-1}(z) = exp(z)
         """
-        pass
+        return np.exp(z)

     def deriv(self, p):
         """
@@ -507,7 +544,8 @@ class Log(Link):
         -----
         g'(x) = 1/x
         """
-        pass
+        p = self._clean(p)
+        return 1. / p

     def deriv2(self, p):
         """
@@ -527,7 +565,8 @@ class Log(Link):
         -----
         g''(x) = -1/x^2
         """
-        pass
+        p = self._clean(p)
+        return -1. / p ** 2

     def inverse_deriv(self, z):
         """
@@ -544,7 +583,7 @@ class Log(Link):
             The value of the derivative of the inverse of the log function,
             the exponential function
         """
-        pass
+        return np.exp(z)


 class LogC(Link):
@@ -557,6 +596,9 @@ class LogC(Link):
     machine epsilon so that p is in (0,1). logc is an alias of LogC.
     """

+    def _clean(self, x):
+        return np.clip(x, FLOAT_EPS, 1. - FLOAT_EPS)
+
     def __call__(self, p, **extra):
         """
         Log-complement transform link function
@@ -596,7 +638,7 @@ class LogC(Link):
         -----
         g^{-1}(z) = 1 - exp(z)
         """
-        pass
+        return 1 - np.exp(z)

     def deriv(self, p):
         """
@@ -616,7 +658,8 @@ class LogC(Link):
         -----
         g'(x) = -1/(1 - x)
         """
-        pass
+        p = self._clean(p)
+        return -1. / (1. - p)

     def deriv2(self, p):
         """
@@ -636,7 +679,8 @@ class LogC(Link):
         -----
         g''(x) = -(-1/(1 - x))^2
         """
-        pass
+        p = self._clean(p)
+        return -1 * np.power(-1. / (1. - p), 2)

     def inverse_deriv(self, z):
         """
@@ -654,7 +698,7 @@ class LogC(Link):
             The value of the derivative of the inverse of the log-complement
             function.
         """
-        pass
+        return -np.exp(z)

     def inverse_deriv2(self, z):
         """
@@ -671,9 +715,10 @@ class LogC(Link):
             The value of the second derivative of the inverse of the
             log-complement function.
         """
-        pass
+        return -np.exp(z)


+# TODO: the CDFLink is untested
 class CDFLink(Logit):
     """
     Use the CDF of a scipy.stats distribution
@@ -733,7 +778,7 @@ class CDFLink(Logit):
         -----
         g^(-1)(`z`) = `dbn`.cdf(`z`)
         """
-        pass
+        return self.dbn.cdf(z)

     def deriv(self, p):
         """
@@ -753,7 +798,8 @@ class CDFLink(Logit):
         -----
         g'(`p`) = 1./ `dbn`.pdf(`dbn`.ppf(`p`))
         """
-        pass
+        p = self._clean(p)
+        return 1. / self.dbn.pdf(self.dbn.ppf(p))

     def deriv2(self, p):
         """
@@ -761,7 +807,9 @@ class CDFLink(Logit):

         implemented through numerical differentiation
         """
-        pass
+        p = self._clean(p)
+        linpred = self.dbn.ppf(p)
+        return - self.inverse_deriv2(linpred) / self.dbn.pdf(linpred) ** 3

     def deriv2_numdiff(self, p):
         """
@@ -769,7 +817,10 @@ class CDFLink(Logit):

         implemented through numerical differentiation
         """
-        pass
+        from statsmodels.tools.numdiff import _approx_fprime_scalar
+        p = np.atleast_1d(p)
+        # Note: special function for norm.ppf does not support complex
+        return _approx_fprime_scalar(p, self.deriv, centered=True)

     def inverse_deriv(self, z):
         """
@@ -786,7 +837,7 @@ class CDFLink(Logit):
             The value of the derivative of the inverse of the logit function.
             This is just the pdf in a CDFLink,
         """
-        pass
+        return self.dbn.pdf(z)

     def inverse_deriv2(self, z):
         """
@@ -809,7 +860,11 @@ class CDFLink(Logit):

         The inherited method is implemented through numerical differentiation.
         """
-        pass
+        from statsmodels.tools.numdiff import _approx_fprime_scalar
+        z = np.atleast_1d(z)
+
+        # Note: special function for norm.ppf does not support complex
+        return _approx_fprime_scalar(z, self.inverse_deriv, centered=True)


 class Probit(CDFLink):
@@ -830,14 +885,16 @@ class Probit(CDFLink):
         This is the derivative of the pdf in a CDFLink

         """
-        pass
+        return - z * self.dbn.pdf(z)

     def deriv2(self, p):
         """
         Second derivative of the link function g''(p)

         """
-        pass
+        p = self._clean(p)
+        linpred = self.dbn.ppf(p)
+        return linpred / self.dbn.pdf(linpred) ** 2


 class Cauchy(CDFLink):
@@ -868,7 +925,13 @@ class Cauchy(CDFLink):
         g''(p) : ndarray
             Value of the second derivative of Cauchy link function at `p`
         """
-        pass
+        p = self._clean(p)
+        a = np.pi * (p - 0.5)
+        d2 = 2 * np.pi ** 2 * np.sin(a) / np.cos(a) ** 3
+        return d2
+
+    def inverse_deriv2(self, z):
+        return - 2 * z / (np.pi * (z ** 2 + 1) ** 2)


 class CLogLog(Logit):
@@ -923,7 +986,7 @@ class CLogLog(Logit):
         -----
         g^(-1)(`z`) = 1-exp(-exp(`z`))
         """
-        pass
+        return 1 - np.exp(-np.exp(z))

     def deriv(self, p):
         """
@@ -943,7 +1006,8 @@ class CLogLog(Logit):
         -----
         g'(p) = 1 / ((p-1)*log(1-p))
         """
-        pass
+        p = self._clean(p)
+        return 1. / ((p - 1) * (np.log(1 - p)))

     def deriv2(self, p):
         """
@@ -959,7 +1023,11 @@ class CLogLog(Logit):
         g''(p) : ndarray
             The second derivative of the CLogLog link function
         """
-        pass
+        p = self._clean(p)
+        fl = np.log(1 - p)
+        d2 = -1 / ((1 - p) ** 2 * fl)
+        d2 *= 1 + 1 / fl
+        return d2

     def inverse_deriv(self, z):
         """
@@ -975,7 +1043,7 @@ class CLogLog(Logit):
         g^(-1)'(z) : ndarray
             The derivative of the inverse of the CLogLog link function
         """
-        pass
+        return np.exp(z - np.exp(z))


 class LogLog(Logit):
@@ -1026,7 +1094,7 @@ class LogLog(Logit):
         -----
         g^(-1)(`z`) = exp(-exp(-`z`))
         """
-        pass
+        return np.exp(-np.exp(-z))

     def deriv(self, p):
         """
@@ -1046,7 +1114,8 @@ class LogLog(Logit):
         -----
         g'(p) = - 1 /(p * log(p))
         """
-        pass
+        p = self._clean(p)
+        return -1. / (p * (np.log(p)))

     def deriv2(self, p):
         """
@@ -1062,7 +1131,9 @@ class LogLog(Logit):
         g''(p) : ndarray
             The second derivative of the LogLog link function
         """
-        pass
+        p = self._clean(p)
+        d2 = (1 + np.log(p)) / (p * (np.log(p))) ** 2
+        return d2

     def inverse_deriv(self, z):
         """
@@ -1078,7 +1149,7 @@ class LogLog(Logit):
         g^(-1)'(z) : ndarray
             The derivative of the inverse of the LogLog link function
         """
-        pass
+        return np.exp(-np.exp(-z) - z)

     def inverse_deriv2(self, z):
         """
@@ -1094,7 +1165,7 @@ class LogLog(Logit):
         g^(-1)''(z) : ndarray
             The second derivative of the inverse of the LogLog link function
         """
-        pass
+        return self.inverse_deriv(z) * (np.exp(-z) - 1)


 class NegativeBinomial(Link):
@@ -1109,9 +1180,12 @@ class NegativeBinomial(Link):
         Permissible values are usually assumed to be in (.01, 2).
     """

-    def __init__(self, alpha=1.0):
+    def __init__(self, alpha=1.):
         self.alpha = alpha

+    def _clean(self, x):
+        return np.clip(x, FLOAT_EPS, np.inf)
+
     def __call__(self, p):
         """
         Negative Binomial transform link function
@@ -1151,7 +1225,7 @@ class NegativeBinomial(Link):
         -----
         g^(-1)(z) = exp(z)/(alpha*(1-exp(z)))
         """
-        pass
+        return -1 / (self.alpha * (1 - np.exp(-z)))

     def deriv(self, p):
         """
@@ -1171,7 +1245,7 @@ class NegativeBinomial(Link):
         -----
         g'(x) = 1/(x+alpha*x^2)
         """
-        pass
+        return 1 / (p + self.alpha * p ** 2)

     def deriv2(self, p):
         """
@@ -1192,7 +1266,9 @@ class NegativeBinomial(Link):
         -----
         g''(x) = -(1+2*alpha*x)/(x+alpha*x^2)^2
         """
-        pass
+        numer = -(1 + 2 * self.alpha * p)
+        denom = (p + self.alpha * p ** 2) ** 2
+        return numer / denom

     def inverse_deriv(self, z):
         """
@@ -1209,9 +1285,11 @@ class NegativeBinomial(Link):
             The value of the derivative of the inverse of the negative
             binomial link
         """
-        pass
+        t = np.exp(z)
+        return t / (self.alpha * (1 - t) ** 2)


+# TODO: Deprecated aliases, remove after 0.15
 class logit(Logit):
     """
     Alias of Logit
@@ -1412,6 +1490,6 @@ class nbinom(NegativeBinomial):
     nbinom = NegativeBinomial(alpha=1.)
     """

-    def __init__(self, alpha=1.0):
+    def __init__(self, alpha=1.):
         _link_deprecation_warning('nbinom', 'NegativeBinomial')
         super().__init__(alpha=alpha)
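
With the link methods above filled in, the analytic derivatives can be cross-checked against the generic reference implementations defined on Link. A small sketch along the same lines, again not part of the patch, with illustrative inputs:

    import numpy as np
    from statsmodels.genmod.families import links as L

    z = np.linspace(-3, 3, 7)
    logit = L.Logit()
    p = logit.inverse(z)

    # round trip, and the identity g'^{-1}(z) = 1 / g'(g^{-1}(z))
    assert np.allclose(logit(p), z)
    assert np.allclose(logit.inverse_deriv(z), 1 / logit.deriv(p))

    # Probit overrides inverse_deriv2 analytically (-z * pdf(z)); it
    # should match the generic -g''(p) / g'(p)**3 reference on Link
    probit = L.Probit()
    assert np.allclose(probit.inverse_deriv2(z),
                       L.Link.inverse_deriv2(probit, z))
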
diff --git a/statsmodels/genmod/families/varfuncs.py b/statsmodels/genmod/families/varfuncs.py
index 35a70523b..5b5336aab 100644
--- a/statsmodels/genmod/families/varfuncs.py
+++ b/statsmodels/genmod/families/varfuncs.py
@@ -47,7 +47,7 @@ class VarianceFunction:
         """
         Derivative of the variance function v'(mu)
         """
-        pass
+        return np.zeros_like(mu)


 constant = VarianceFunction()
@@ -84,7 +84,7 @@ class Power:
     mu_cubed = Power(power=3)
     """

-    def __init__(self, power=1.0):
+    def __init__(self, power=1.):
         self.power = power

     def __call__(self, mu):
@@ -109,7 +109,11 @@ class Power:

         May be undefined at zero.
         """
-        pass
+
+        der = self.power * np.fabs(mu) ** (self.power - 1)
+        ii = np.flatnonzero(mu < 0)
+        der[ii] *= -1
+        return der


 mu = Power()
@@ -171,6 +175,9 @@ class Binomial:
     def __init__(self, n=1):
         self.n = n

+    def _clean(self, p):
+        return np.clip(p, FLOAT_EPS, 1 - FLOAT_EPS)
+
     def __call__(self, mu):
         """
         Binomial variance function
@@ -188,11 +195,12 @@ class Binomial:
         p = self._clean(mu / self.n)
         return p * (1 - p) * self.n

+    # TODO: inherit from super
     def deriv(self, mu):
         """
         Derivative of the variance function v'(mu)
         """
-        pass
+        return 1 - 2*mu


 binary = Binomial()
@@ -206,7 +214,7 @@ This is an alias of Binomial(n=1)


 class NegativeBinomial:
-    """
+    '''
     Negative binomial variance function

     Parameters
@@ -231,11 +239,14 @@ class NegativeBinomial:

     A private method _clean trims the data by machine epsilon so that p is
     in (0,inf)
-    """
+    '''

-    def __init__(self, alpha=1.0):
+    def __init__(self, alpha=1.):
         self.alpha = alpha

+    def _clean(self, p):
+        return np.clip(p, FLOAT_EPS, np.inf)
+
     def __call__(self, mu):
         """
         Negative binomial variance function
@@ -251,13 +262,15 @@ class NegativeBinomial:
             variance = mu + alpha*mu**2
         """
         p = self._clean(mu)
-        return p + self.alpha * p ** 2
+        return p + self.alpha*p**2

     def deriv(self, mu):
         """
         Derivative of the negative binomial variance function.
         """
-        pass
+
+        p = self._clean(mu)
+        return 1 + 2 * self.alpha * p


 nbinom = NegativeBinomial()
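
The variance-function derivatives restored above can be exercised the same way. A short sketch with illustrative inputs, not part of the patch:

    import numpy as np
    from statsmodels.genmod.families import varfuncs as V

    mu = np.array([0.5, 2.0, 10.0])
    nb = V.NegativeBinomial(alpha=1.5)
    assert np.allclose(nb(mu), mu + 1.5 * mu ** 2)      # v(mu) = mu + alpha*mu**2
    assert np.allclose(nb.deriv(mu), 1 + 2 * 1.5 * mu)  # v'(mu) = 1 + 2*alpha*mu

    # Power.deriv restores the sign via np.fabs, so it also holds for
    # negative means, e.g. v(mu) = mu**2 gives v'(mu) = 2*mu
    m = np.array([-2.0, -0.5, 1.0, 3.0])
    assert np.allclose(V.Power(power=2).deriv(m), 2 * m)
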
diff --git a/statsmodels/genmod/generalized_estimating_equations.py b/statsmodels/genmod/generalized_estimating_equations.py
index 5201bc1b9..fe6c2945c 100644
--- a/statsmodels/genmod/generalized_estimating_equations.py
+++ b/statsmodels/genmod/generalized_estimating_equations.py
@@ -24,6 +24,7 @@ improved small-sample properties.  Biometrics. 2001 Mar;57(1):126-34.
 """
 from statsmodels.compat.python import lzip
 from statsmodels.compat.pandas import Appender
+
 import numpy as np
 from scipy import stats
 import pandas as pd
@@ -31,17 +32,31 @@ import patsy
 from collections import defaultdict
 from statsmodels.tools.decorators import cache_readonly
 import statsmodels.base.model as base
+# used for wrapper:
 import statsmodels.regression.linear_model as lm
 import statsmodels.base.wrapper as wrap
+
 from statsmodels.genmod import families
 from statsmodels.genmod.generalized_linear_model import GLM, GLMResults
 from statsmodels.genmod import cov_struct as cov_structs
+
 import statsmodels.genmod.families.varfuncs as varfuncs
 from statsmodels.genmod.families.links import Link
-from statsmodels.tools.sm_exceptions import ConvergenceWarning, DomainWarning, IterationLimitWarning, ValueWarning
+
+from statsmodels.tools.sm_exceptions import (ConvergenceWarning,
+                                             DomainWarning,
+                                             IterationLimitWarning,
+                                             ValueWarning)
 import warnings
-from statsmodels.graphics._regressionplots_doc import _plot_added_variable_doc, _plot_partial_residuals_doc, _plot_ceres_residuals_doc
-from statsmodels.discrete.discrete_margins import _get_margeff_exog, _check_margeff_args, _effects_at, margeff_cov_with_se, _check_at_is_all, _transform_names, _check_discrete_args, _get_dummy_index, _get_count_index
+
+from statsmodels.graphics._regressionplots_doc import (
+    _plot_added_variable_doc,
+    _plot_partial_residuals_doc,
+    _plot_ceres_residuals_doc)
+from statsmodels.discrete.discrete_margins import (
+    _get_margeff_exog, _check_margeff_args, _effects_at, margeff_cov_with_se,
+    _check_at_is_all, _transform_names, _check_discrete_args,
+    _get_dummy_index, _get_count_index)


 class ParameterConstraint:
@@ -64,22 +79,40 @@ class ParameterConstraint:
         exog : ndarray
           The n x p exognenous data for the full model.
         """
+
+        # In case a row or column vector is passed (patsy linear
+        # constraints passes a column vector).
         rhs = np.atleast_1d(rhs.squeeze())
+
         if rhs.ndim > 1:
-            raise ValueError(
-                'The right hand side of the constraint must be a vector.')
+            raise ValueError("The right hand side of the constraint "
+                             "must be a vector.")
+
         if len(rhs) != lhs.shape[0]:
-            raise ValueError(
-                'The number of rows of the left hand side constraint matrix L must equal the length of the right hand side constraint vector R.'
-                )
+            raise ValueError("The number of rows of the left hand "
+                             "side constraint matrix L must equal "
+                             "the length of the right hand side "
+                             "constraint vector R.")
+
         self.lhs = lhs
         self.rhs = rhs
+
+        # The columns of lhs0 are an orthogonal basis for the
+        # orthogonal complement to row(lhs), the columns of lhs1 are
+        # an orthogonal basis for row(lhs).  The columns of lhsf =
+        # [lhs0, lhs1] are mutually orthogonal.
         lhs_u, lhs_s, lhs_vt = np.linalg.svd(lhs.T, full_matrices=1)
         self.lhs0 = lhs_u[:, len(lhs_s):]
         self.lhs1 = lhs_u[:, 0:len(lhs_s)]
         self.lhsf = np.hstack((self.lhs0, self.lhs1))
-        self.param0 = np.dot(self.lhs1, np.dot(lhs_vt, self.rhs) / lhs_s)
+
+        # param0 is one solution to the underdetermined system
+        # L * param = R.
+        self.param0 = np.dot(self.lhs1, np.dot(lhs_vt, self.rhs) /
+                             lhs_s)
+
         self._offset_increment = np.dot(exog, self.param0)
+
         self.orig_exog = exog
         self.exog_fulltrans = np.dot(exog, self.lhsf)

@@ -93,7 +126,8 @@ class ParameterConstraint:
         exog : array_like
            The exogeneous data for the model.
         """
-        pass
+
+        return self._offset_increment

     def reduced_exog(self):
         """
@@ -105,28 +139,30 @@ class ParameterConstraint:
         exog : array_like
            The exogeneous data for the model.
         """
-        pass
+        return self.exog_fulltrans[:, 0:self.lhs0.shape[1]]

     def restore_exog(self):
         """
         Returns the full exog matrix before it was reduced to
         satisfy the constraint.
         """
-        pass
+        return self.orig_exog

     def unpack_param(self, params):
         """
         Converts the parameter vector `params` from reduced to full
         coordinates.
         """
-        pass
+
+        return self.param0 + np.dot(self.lhs0, params)

     def unpack_cov(self, bcov):
         """
         Converts the covariance matrix `bcov` from reduced to full
         coordinates.
         """
-        pass
+
+        return np.dot(self.lhs0, np.dot(bcov, self.lhs0.T))


 _gee_init_doc = """
@@ -221,20 +257,28 @@ _gee_init_doc = """
     --------
     %(example)s
 """
+
 _gee_nointercept = """
     The nominal and ordinal GEE models should not have an intercept
     (either implicit or explicit).  Use "0 + " in a formula to
     suppress the intercept.
 """
-_gee_family_doc = """        The default is Gaussian.  To specify the binomial
+
+_gee_family_doc = """\
+        The default is Gaussian.  To specify the binomial
         distribution use `family=sm.families.Binomial()`. Each family
         can take a link instance as an argument.  See
         statsmodels.genmod.families.family for more information."""
-_gee_ordinal_family_doc = """        The only family supported is `Binomial`.  The default `Logit`
+
+_gee_ordinal_family_doc = """\
+        The only family supported is `Binomial`.  The default `Logit`
         link may be replaced with `probit` if desired."""
-_gee_nominal_family_doc = """        The default value `None` uses a multinomial logit family
+
+_gee_nominal_family_doc = """\
+        The default value `None` uses a multinomial logit family
         specifically designed for use with GEE.  Setting this
         argument to a non-default value is not currently supported."""
+
 _gee_fit_doc = """
     Fits a marginal regression model using generalized estimating
     equations (GEE).
@@ -292,6 +336,7 @@ _gee_fit_doc = """
     `params_niter` to a value greater than 1, since the mean
     structure parameters converge in one step.
 """
+
 _gee_results_doc = """
     Attributes
     ----------
@@ -334,6 +379,7 @@ _gee_results_doc = """
     bse : ndarray
         The standard errors of the fitted GEE parameters.
 """
+
 _gee_example = """
     Logistic regression with autoregressive working dependence:

@@ -366,6 +412,7 @@ _gee_example = """
     >>> result = model.fit()
     >>> print(result.summary())
 """
+
 _gee_ordinal_example = """
     Fit an ordinal regression model using GEE, with "global
     odds ratio" dependence:
@@ -384,6 +431,7 @@ _gee_ordinal_example = """
     >>> result = model.fit()
     >>> print(result.summary())
 """
+
 _gee_nominal_example = """
     Fit a nominal regression model using GEE:

@@ -412,29 +460,59 @@ _gee_nominal_example = """
 """


+def _check_args(endog, exog, groups, time, offset, exposure):
+
+    if endog.size != exog.shape[0]:
+        raise ValueError("Leading dimension of 'exog' should match "
+                         "length of 'endog'")
+
+    if groups.size != endog.size:
+        raise ValueError("'groups' and 'endog' should have the same size")
+
+    if time is not None and (time.size != endog.size):
+        raise ValueError("'time' and 'endog' should have the same size")
+
+    if offset is not None and (offset.size != endog.size):
+        raise ValueError("'offset and 'endog' should have the same size")
+
+    if exposure is not None and (exposure.size != endog.size):
+        raise ValueError("'exposure' and 'endog' should have the same size")
+
+
 class GEE(GLM):
+
     __doc__ = (
-        '    Marginal Regression Model using Generalized Estimating Equations.\n'
-         + _gee_init_doc % {'extra_params': base._missing_param_doc,
-        'family_doc': _gee_family_doc, 'example': _gee_example, 'notes': ''})
+        "    Marginal Regression Model using Generalized Estimating "
+        "Equations.\n" + _gee_init_doc %
+        {'extra_params': base._missing_param_doc,
+         'family_doc': _gee_family_doc,
+         'example': _gee_example,
+         'notes': ""})
+
     cached_means = None

     def __init__(self, endog, exog, groups, time=None, family=None,
-        cov_struct=None, missing='none', offset=None, exposure=None,
-        dep_data=None, constraint=None, update_dep=True, weights=None, **kwargs
-        ):
+                 cov_struct=None, missing='none', offset=None,
+                 exposure=None, dep_data=None, constraint=None,
+                 update_dep=True, weights=None, **kwargs):
+
         if type(self) is GEE:
             self._check_kwargs(kwargs)
         if family is not None:
             if not isinstance(family.link, tuple(family.safe_links)):
-                msg = (
-                    'The {0} link function does not respect the domain of the {1} family.'
-                    )
+                msg = ("The {0} link function does not respect the "
+                       "domain of the {1} family.")
                 warnings.warn(msg.format(family.link.__class__.__name__,
-                    family.__class__.__name__), DomainWarning)
-        groups = np.asarray(groups)
-        if 'missing_idx' in kwargs and kwargs['missing_idx'] is not None:
-            ii = ~kwargs['missing_idx']
+                                         family.__class__.__name__),
+                              DomainWarning)
+
+        groups = np.asarray(groups)  # in case groups is pandas
+
+        if "missing_idx" in kwargs and kwargs["missing_idx"] is not None:
+            # If here, we are entering from super.from_formula; missing
+            # has already been dropped from endog and exog, but not from
+            # the other variables.
+            ii = ~kwargs["missing_idx"]
             groups = groups[ii]
             if time is not None:
                 time = time[ii]
@@ -442,94 +520,267 @@ class GEE(GLM):
                 offset = offset[ii]
             if exposure is not None:
                 exposure = exposure[ii]
-            del kwargs['missing_idx']
+            del kwargs["missing_idx"]
+
         self.missing = missing
         self.dep_data = dep_data
         self.constraint = constraint
         self.update_dep = update_dep
+
         self._fit_history = defaultdict(list)
-        super(GEE, self).__init__(endog, exog, groups=groups, time=time,
-            offset=offset, exposure=exposure, weights=weights, dep_data=
-            dep_data, missing=missing, family=family, **kwargs)
-        _check_args(self.endog, self.exog, self.groups, self.time, getattr(
-            self, 'offset', None), getattr(self, 'exposure', None))
-        self._init_keys.extend(['update_dep', 'constraint', 'family',
-            'cov_struct'])
+
+        # Pass groups, time, offset, and dep_data so they are
+        # processed for missing data along with endog and exog.
+        # Calling super creates self.exog, self.endog, etc. as
+        # ndarrays and the original exog, endog, etc. are
+        # self.data.endog, etc.
+        super(GEE, self).__init__(endog, exog, groups=groups,
+                                  time=time, offset=offset,
+                                  exposure=exposure, weights=weights,
+                                  dep_data=dep_data, missing=missing,
+                                  family=family, **kwargs)
+
+        _check_args(
+            self.endog,
+            self.exog,
+            self.groups,
+            self.time,
+            getattr(self, "offset", None),
+            getattr(self, "exposure", None),
+        )
+
+        self._init_keys.extend(["update_dep", "constraint", "family",
+                                "cov_struct"])
+        # remove keys added by super that are not supported
         try:
-            self._init_keys.remove('freq_weights')
-            self._init_keys.remove('var_weights')
+            self._init_keys.remove("freq_weights")
+            self._init_keys.remove("var_weights")
         except ValueError:
             pass
+
+        # Handle the family argument
         if family is None:
             family = families.Gaussian()
-        elif not issubclass(family.__class__, families.Family):
-            raise ValueError('GEE: `family` must be a genmod family instance')
+        else:
+            if not issubclass(family.__class__, families.Family):
+                raise ValueError("GEE: `family` must be a genmod "
+                                 "family instance")
         self.family = family
+
+        # Handle the cov_struct argument
         if cov_struct is None:
             cov_struct = cov_structs.Independence()
-        elif not issubclass(cov_struct.__class__, cov_structs.CovStruct):
-            raise ValueError(
-                'GEE: `cov_struct` must be a genmod cov_struct instance')
+        else:
+            if not issubclass(cov_struct.__class__, cov_structs.CovStruct):
+                raise ValueError("GEE: `cov_struct` must be a genmod "
+                                 "cov_struct instance")
+
         self.cov_struct = cov_struct
+
+        # Handle the constraint
         self.constraint = None
         if constraint is not None:
             if len(constraint) != 2:
-                raise ValueError('GEE: `constraint` must be a 2-tuple.')
+                raise ValueError("GEE: `constraint` must be a 2-tuple.")
             if constraint[0].shape[1] != self.exog.shape[1]:
                 raise ValueError(
-                    'GEE: the left hand side of the constraint must have the same number of columns as the exog matrix.'
-                    )
-            self.constraint = ParameterConstraint(constraint[0], constraint
-                [1], self.exog)
+                    "GEE: the left hand side of the constraint must have "
+                    "the same number of columns as the exog matrix.")
+            self.constraint = ParameterConstraint(constraint[0],
+                                                  constraint[1],
+                                                  self.exog)
+
             if self._offset_exposure is not None:
                 self._offset_exposure += self.constraint.offset_increment()
             else:
-                self._offset_exposure = self.constraint.offset_increment(
-                    ).copy()
+                self._offset_exposure = (
+                    self.constraint.offset_increment().copy())
             self.exog = self.constraint.reduced_exog()
+
+        # Create list of row indices for each group
         group_labels, ix = np.unique(self.groups, return_inverse=True)
-        se = pd.Series(index=np.arange(len(ix)), dtype='int')
+        se = pd.Series(index=np.arange(len(ix)), dtype="int")
         gb = se.groupby(ix).groups
         dk = [(lb, np.asarray(gb[k])) for k, lb in enumerate(group_labels)]
         self.group_indices = dict(dk)
         self.group_labels = group_labels
+
+        # Convert the data to the internal representation, which is a
+        # list of arrays, corresponding to the groups.
         self.endog_li = self.cluster_list(self.endog)
         self.exog_li = self.cluster_list(self.exog)
+
         if self.weights is not None:
             self.weights_li = self.cluster_list(self.weights)
+
         self.num_group = len(self.endog_li)
+
+        # Time defaults to a 1d grid with equal spacing
         if self.time is not None:
             if self.time.ndim == 1:
                 self.time = self.time[:, None]
             self.time_li = self.cluster_list(self.time)
         else:
-            self.time_li = [np.arange(len(y), dtype=np.float64)[:, None] for
-                y in self.endog_li]
+            self.time_li = \
+                [np.arange(len(y), dtype=np.float64)[:, None]
+                 for y in self.endog_li]
             self.time = np.concatenate(self.time_li)
-        if self._offset_exposure is None or np.isscalar(self._offset_exposure
-            ) and self._offset_exposure == 0.0:
+
+        if (self._offset_exposure is None or
+            (np.isscalar(self._offset_exposure) and
+             self._offset_exposure == 0.)):
             self.offset_li = None
         else:
             self.offset_li = self.cluster_list(self._offset_exposure)
         if constraint is not None:
-            self.constraint.exog_fulltrans_li = self.cluster_list(self.
-                constraint.exog_fulltrans)
+            self.constraint.exog_fulltrans_li = \
+                self.cluster_list(self.constraint.exog_fulltrans)
+
         self.family = family
+
         self.cov_struct.initialize(self)
+
+        # Total sample size
         group_ns = [len(y) for y in self.endog_li]
         self.nobs = sum(group_ns)
-        self.df_model = self.exog.shape[1] - 1
+        # The following are column based, not rank based; see GH #1928
+        self.df_model = self.exog.shape[1] - 1  # assumes constant
         self.df_resid = self.nobs - self.exog.shape[1]
+
+        # Skip the covariance updates if all groups have a single
+        # observation (reduces to fitting a GLM).
         maxgroup = max([len(x) for x in self.endog_li])
         if maxgroup == 1:
             self.update_dep = False

+    # Override to allow groups and time to be passed as variable
+    # names.
+    @classmethod
+    def from_formula(cls, formula, groups, data, subset=None,
+                     time=None, offset=None, exposure=None,
+                     *args, **kwargs):
+        """
+        Create a GEE model instance from a formula and dataframe.
+
+        Parameters
+        ----------
+        formula : str or generic Formula object
+            The formula specifying the model
+        groups : array_like or string
+            Array of grouping labels.  If a string, this is the name
+            of a variable in `data` that contains the grouping labels.
+        data : array_like
+            The data for the model.
+        subset : array_like
+            An array-like object of booleans, integers, or index
+            values that indicate the subset of the data to use when
+            fitting the model.
+        time : array_like or string
+            The time values, used for dependence structures involving
+            distances between observations.  If a string, this is the
+            name of a variable in `data` that contains the time
+            values.
+        offset : array_like or string
+            The offset values, added to the linear predictor.  If a
+            string, this is the name of a variable in `data` that
+            contains the offset values.
+        exposure : array_like or string
+            The exposure values, only used if the link function is the
+            logarithm function, in which case the log of `exposure`
+            is added to the offset (if any).  If a string, this is the
+            name of a variable in `data` that contains the exposure
+            values.
+        %(missing_param_doc)s
+        args : extra arguments
+            These are passed to the model
+        kwargs : extra keyword arguments
+            These are passed to the model with two exceptions. `dep_data`
+            is processed as described below.  The ``eval_env`` keyword is
+            passed to patsy. It can be either a
+            :class:`patsy:patsy.EvalEnvironment` object or an integer
+            indicating the depth of the namespace to use. For example, the
+            default ``eval_env=0`` uses the calling namespace.
+            If you wish to use a "clean" environment set ``eval_env=-1``.
+
+        Optional arguments
+        ------------------
+        dep_data : str or array_like
+            Data used for estimating the dependence structure.  See
+            specific dependence structure classes (e.g. Nested) for
+            details.  If `dep_data` is a string, it is interpreted as
+            a formula that is applied to `data`. If it is an array, it
+            must be an array of strings corresponding to column names in
+            `data`.  Otherwise it must be an array-like with the same
+            number of rows as data.
+
+        Returns
+        -------
+        model : GEE model instance
+
+        Notes
+        -----
+        `data` must define __getitem__ with the keys in the formula
+        terms; e.g., a numpy structured or rec array, a dictionary, or
+        a pandas DataFrame.  `args` and `kwargs` are passed on to the
+        model instantiation.
+        """ % {'missing_param_doc': base._missing_param_doc}
+
+        groups_name = "Groups"
+        if isinstance(groups, str):
+            groups_name = groups
+            groups = data[groups]
+
+        if isinstance(time, str):
+            time = data[time]
+
+        if isinstance(offset, str):
+            offset = data[offset]
+
+        if isinstance(exposure, str):
+            exposure = data[exposure]
+
+        dep_data = kwargs.get("dep_data")
+        dep_data_names = None
+        if dep_data is not None:
+            if isinstance(dep_data, str):
+                dep_data = patsy.dmatrix(dep_data, data,
+                                         return_type='dataframe')
+                dep_data_names = dep_data.columns.tolist()
+            else:
+                dep_data_names = list(dep_data)
+                dep_data = data[dep_data]
+            kwargs["dep_data"] = np.asarray(dep_data)
+
+        family = None
+        if "family" in kwargs:
+            family = kwargs["family"]
+            del kwargs["family"]
+
+        model = super(GEE, cls).from_formula(formula, data=data, subset=subset,
+                                             groups=groups, time=time,
+                                             offset=offset,
+                                             exposure=exposure,
+                                             family=family,
+                                             *args, **kwargs)
+
+        if dep_data_names is not None:
+            model._dep_data_names = dep_data_names
+        model._groups_name = groups_name
+
+        return model
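
A minimal usage sketch of the formula interface above, assuming a hypothetical pandas DataFrame `df` with columns `y`, `age`, `trt` and cluster labels in `subject`:

    >>> import statsmodels.api as sm
    >>> fam = sm.families.Poisson()
    >>> cov = sm.cov_struct.Exchangeable()
    >>> model = sm.GEE.from_formula("y ~ age + trt", groups="subject",
    ...                             data=df, family=fam, cov_struct=cov)
    >>> result = model.fit()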
+
     def cluster_list(self, array):
         """
         Returns `array` split into subarrays corresponding to the
         cluster structure.
         """
-        pass
+
+        if array.ndim == 1:
+            return [np.array(array[self.group_indices[k]])
+                    for k in self.group_labels]
+        else:
+            return [np.array(array[self.group_indices[k], :])
+                    for k in self.group_labels]

     def compare_score_test(self, submodel):
         """
@@ -562,13 +813,159 @@ class GEE(GLM):
         test in GEE".
         http://www.sph.umn.edu/faculty1/wp-content/uploads/2012/11/rr2002-013.pdf
         """
-        pass
+
+        # Since the model has not been fit, its scaletype has not been
+        # set.  So give it the scaletype of the submodel.
+        self.scaletype = submodel.model.scaletype
+
+        # Check consistency between model and submodel (not a comprehensive
+        # check)
+        submod = submodel.model
+        if self.exog.shape[0] != submod.exog.shape[0]:
+            msg = "Model and submodel have different numbers of cases."
+            raise ValueError(msg)
+        if self.exog.shape[1] == submod.exog.shape[1]:
+            msg = "Model and submodel have the same number of variables"
+            warnings.warn(msg)
+        if not isinstance(self.family, type(submod.family)):
+            msg = "Model and submodel have different GLM families."
+            warnings.warn(msg)
+        if not isinstance(self.cov_struct, type(submod.cov_struct)):
+            warnings.warn("Model and submodel have different GEE covariance "
+                          "structures.")
+        if not np.equal(self.weights, submod.weights).all():
+            msg = "Model and submodel should have the same weights."
+            warnings.warn(msg)
+
+        # Get the positions of the submodel variables in the
+        # parent model
+        qm, qc = _score_test_submodel(self, submodel.model)
+        if qm is None:
+            msg = "The provided model is not a submodel."
+            raise ValueError(msg)
+
+        # Embed the submodel params into a params vector for the
+        # parent model
+        params_ex = np.dot(qm, submodel.params)
+
+        # Attempt to preserve the state of the parent model
+        cov_struct_save = self.cov_struct
+        import copy
+        cached_means_save = copy.deepcopy(self.cached_means)
+
+        # Get the score vector of the submodel params in
+        # the parent model
+        self.cov_struct = submodel.cov_struct
+        self.update_cached_means(params_ex)
+        _, score = self._update_mean_params()
+        if score is None:
+            msg = "Singular matrix encountered in GEE score test"
+            warnings.warn(msg, ConvergenceWarning)
+            return None
+
+        if not hasattr(self, "ddof_scale"):
+            self.ddof_scale = self.exog.shape[1]
+
+        if not hasattr(self, "scaling_factor"):
+            self.scaling_factor = 1
+
+        _, ncov1, cmat = self._covmat()
+        score2 = np.dot(qc.T, score)
+
+        try:
+            amat = np.linalg.inv(ncov1)
+        except np.linalg.LinAlgError:
+            amat = np.linalg.pinv(ncov1)
+
+        bmat_11 = np.dot(qm.T, np.dot(cmat, qm))
+        bmat_22 = np.dot(qc.T, np.dot(cmat, qc))
+        bmat_12 = np.dot(qm.T, np.dot(cmat, qc))
+
+        amat_11 = np.dot(qm.T, np.dot(amat, qm))
+        amat_12 = np.dot(qm.T, np.dot(amat, qc))
+
+        try:
+            ab = np.linalg.solve(amat_11, bmat_12)
+        except np.linalg.LinAlgError:
+            ab = np.dot(np.linalg.pinv(amat_11), bmat_12)
+
+        score_cov = bmat_22 - np.dot(amat_12.T, ab)
+
+        try:
+            aa = np.linalg.solve(amat_11, amat_12)
+        except np.linalg.LinAlgError:
+            aa = np.dot(np.linalg.pinv(amat_11), amat_12)
+
+        score_cov -= np.dot(bmat_12.T, aa)
+
+        try:
+            ab = np.linalg.solve(amat_11, bmat_11)
+        except np.linalg.LinAlgError:
+            ab = np.dot(np.linalg.pinv(amat_11), bmat_11)
+
+        try:
+            aa = np.linalg.solve(amat_11, amat_12)
+        except np.linalg.LinAlgError:
+            aa = np.dot(np.linalg.pinv(amat_11), amat_12)
+
+        score_cov += np.dot(amat_12.T, np.dot(ab, aa))
+
+        # Attempt to restore state
+        self.cov_struct = cov_struct_save
+        self.cached_means = cached_means_save
+
+        from scipy.stats.distributions import chi2
+        try:
+            sc2 = np.linalg.solve(score_cov, score2)
+        except np.linalg.LinAlgError:
+            sc2 = np.dot(np.linalg.pinv(score_cov), score2)
+        score_statistic = np.dot(score2, sc2)
+        score_df = len(score2)
+        score_pvalue = 1 - chi2.cdf(score_statistic, score_df)
+        return {"statistic": score_statistic,
+                "df": score_df,
+                "p-value": score_pvalue}

     def estimate_scale(self):
         """
         Estimate the dispersion/scale.
         """
-        pass
+
+        if self.scaletype is None:
+            if isinstance(self.family, (families.Binomial, families.Poisson,
+                                        families.NegativeBinomial,
+                                        _Multinomial)):
+                return 1.
+        elif isinstance(self.scaletype, float):
+            return np.array(self.scaletype)
+
+        endog = self.endog_li
+        cached_means = self.cached_means
+        nobs = self.nobs
+        varfunc = self.family.variance
+
+        scale = 0.
+        fsum = 0.
+        for i in range(self.num_group):
+
+            if len(endog[i]) == 0:
+                continue
+
+            expval, _ = cached_means[i]
+            sdev = np.sqrt(varfunc(expval))
+            resid = (endog[i] - expval) / sdev
+
+            if self.weights is not None:
+                f = self.weights_li[i]
+                scale += np.sum(f * (resid ** 2))
+                fsum += f.sum()
+            else:
+                scale += np.sum(resid ** 2)
+                fsum += len(resid)
+
+        scale /= (fsum * (nobs - self.ddof_scale) / float(nobs))
+
+        return scale

     def mean_deriv(self, exog, lin_pred):
         """
@@ -591,7 +988,10 @@ class GEE(GLM):
         If there is an offset or exposure, it should be added to
         `lin_pred` prior to calling this function.
         """
-        pass
+
+        idl = self.family.link.inverse_deriv(lin_pred)
+        dmat = exog * idl[:, None]
+        return dmat

     def mean_deriv_exog(self, exog, params, offset_exposure=None):
         """
@@ -611,7 +1011,14 @@ class GEE(GLM):
         -------
         The derivative of the expected endog with respect to exog.
         """
-        pass
+
+        lin_pred = np.dot(exog, params)
+        if offset_exposure is not None:
+            lin_pred += offset_exposure
+
+        idl = self.family.link.inverse_deriv(lin_pred)
+        dmat = np.outer(idl, params)
+        return dmat

     def _update_mean_params(self):
         """
@@ -626,7 +1033,49 @@ class GEE(GLM):
             multiply this vector by the scale parameter to
             incorporate the scale.
         """
-        pass
+
+        endog = self.endog_li
+        exog = self.exog_li
+        weights = getattr(self, "weights_li", None)
+
+        cached_means = self.cached_means
+
+        varfunc = self.family.variance
+
+        bmat, score = 0, 0
+        for i in range(self.num_group):
+
+            expval, lpr = cached_means[i]
+            resid = endog[i] - expval
+            dmat = self.mean_deriv(exog[i], lpr)
+            sdev = np.sqrt(varfunc(expval))
+
+            if weights is not None:
+                w = weights[i]
+                wresid = resid * w
+                wdmat = dmat * w[:, None]
+            else:
+                wresid = resid
+                wdmat = dmat
+
+            rslt = self.cov_struct.covariance_matrix_solve(
+                    expval, i, sdev, (wdmat, wresid))
+            if rslt is None:
+                return None, None
+            vinv_d, vinv_resid = tuple(rslt)
+
+            bmat += np.dot(dmat.T, vinv_d)
+            score += np.dot(dmat.T, vinv_resid)
+
+        try:
+            update = np.linalg.solve(bmat, score)
+        except np.linalg.LinAlgError:
+            update = np.dot(np.linalg.pinv(bmat), score)
+
+        self._fit_history["cov_adjust"].append(
+            self.cov_struct.cov_adjust)
+
+        return update, score

     def update_cached_means(self, mean_params):
         """
@@ -635,7 +1084,26 @@ class GEE(GLM):
         called every time the regression parameters are changed, to
         keep the cached means up to date.
         """
-        pass
+
+        endog = self.endog_li
+        exog = self.exog_li
+        offset = self.offset_li
+
+        linkinv = self.family.link.inverse
+
+        self.cached_means = []
+
+        for i in range(self.num_group):
+
+            if len(endog[i]) == 0:
+                continue
+
+            lpr = np.dot(exog[i], mean_params)
+            if offset is not None:
+                lpr += offset[i]
+            expval = linkinv(lpr)
+
+            self.cached_means.append((expval, lpr))

     def _covmat(self):
         """
@@ -656,11 +1124,317 @@ class GEE(GLM):
            The center matrix of the sandwich expression, used in
            obtaining score test results.
         """
-        pass
+
+        endog = self.endog_li
+        exog = self.exog_li
+        weights = getattr(self, "weights_li", None)
+        varfunc = self.family.variance
+        cached_means = self.cached_means
+
+        # Calculate the naive (model-based) and robust (sandwich)
+        # covariances.
+        bmat, cmat = 0, 0
+        for i in range(self.num_group):
+
+            expval, lpr = cached_means[i]
+            resid = endog[i] - expval
+            dmat = self.mean_deriv(exog[i], lpr)
+            sdev = np.sqrt(varfunc(expval))
+
+            if weights is not None:
+                w = weights[i]
+                wresid = resid * w
+                wdmat = dmat * w[:, None]
+            else:
+                wresid = resid
+                wdmat = dmat
+
+            rslt = self.cov_struct.covariance_matrix_solve(
+                expval, i, sdev, (wdmat, wresid))
+            if rslt is None:
+                return None, None, None
+            vinv_d, vinv_resid = tuple(rslt)
+
+            bmat += np.dot(dmat.T, vinv_d)
+            dvinv_resid = np.dot(dmat.T, vinv_resid)
+            cmat += np.outer(dvinv_resid, dvinv_resid)
+
+        scale = self.estimate_scale()
+
+        try:
+            bmati = np.linalg.inv(bmat)
+        except np.linalg.LinAlgError:
+            bmati = np.linalg.pinv(bmat)
+
+        cov_naive = bmati * scale
+        cov_robust = np.dot(bmati, np.dot(cmat, bmati))
+
+        cov_naive *= self.scaling_factor
+        cov_robust *= self.scaling_factor
+        return cov_robust, cov_naive, cmat
+
+    # Calculate the bias-corrected sandwich estimate of Mancl and
+    # DeRouen.
+    def _bc_covmat(self, cov_naive):
+
+        cov_naive = cov_naive / self.scaling_factor
+        endog = self.endog_li
+        exog = self.exog_li
+        varfunc = self.family.variance
+        cached_means = self.cached_means
+        scale = self.estimate_scale()
+
+        bcm = 0
+        for i in range(self.num_group):
+
+            expval, lpr = cached_means[i]
+            resid = endog[i] - expval
+            dmat = self.mean_deriv(exog[i], lpr)
+            sdev = np.sqrt(varfunc(expval))
+
+            rslt = self.cov_struct.covariance_matrix_solve(
+                expval, i, sdev, (dmat,))
+            if rslt is None:
+                return None
+            vinv_d = rslt[0]
+            vinv_d /= scale
+
+            hmat = np.dot(vinv_d, cov_naive)
+            hmat = np.dot(hmat, dmat.T).T
+
+            f = self.weights_li[i] if self.weights is not None else 1.
+
+            aresid = np.linalg.solve(np.eye(len(resid)) - hmat, resid)
+            rslt = self.cov_struct.covariance_matrix_solve(
+                expval, i, sdev, (aresid,))
+            if rslt is None:
+                return None
+            srt = rslt[0]
+            srt = f * np.dot(dmat.T, srt) / scale
+            bcm += np.outer(srt, srt)
+
+        cov_robust_bc = np.dot(cov_naive, np.dot(bcm, cov_naive))
+        cov_robust_bc *= self.scaling_factor
+
+        return cov_robust_bc
+
+    def _starting_params(self):
+
+        if np.isscalar(self._offset_exposure):
+            offset = None
+        else:
+            offset = self._offset_exposure
+
+        model = GLM(self.endog, self.exog, family=self.family,
+                    offset=offset, freq_weights=self.weights)
+        result = model.fit()
+        return result.params
+
+    @Appender(_gee_fit_doc)
+    def fit(self, maxiter=60, ctol=1e-6, start_params=None,
+            params_niter=1, first_dep_update=0,
+            cov_type='robust', ddof_scale=None, scaling_factor=1.,
+            scale=None):
+
+        self.scaletype = scale
+
+        # Subtract this number from the total sample size when
+        # normalizing the scale parameter estimate.
+        if ddof_scale is None:
+            self.ddof_scale = self.exog.shape[1]
+        else:
+            if not ddof_scale >= 0:
+                raise ValueError(
+                    "ddof_scale must be a non-negative number or None")
+            self.ddof_scale = ddof_scale
+
+        self.scaling_factor = scaling_factor
+
+        self._fit_history = defaultdict(list)
+
+        if self.weights is not None and cov_type == 'naive':
+            raise ValueError("when using weights, cov_type may not be naive")
+
+        if start_params is None:
+            mean_params = self._starting_params()
+        else:
+            start_params = np.asarray(start_params)
+            mean_params = start_params.copy()
+
+        self.update_cached_means(mean_params)
+
+        del_params = -1.
+        num_assoc_updates = 0
+        for itr in range(maxiter):
+
+            update, score = self._update_mean_params()
+            if update is None:
+                warnings.warn("Singular matrix encountered in GEE update",
+                              ConvergenceWarning)
+                break
+            mean_params += update
+            self.update_cached_means(mean_params)
+
+            # L2 norm of the change in mean structure parameters at
+            # this iteration.
+            del_params = np.sqrt(np.sum(score ** 2))
+
+            self._fit_history['params'].append(mean_params.copy())
+            self._fit_history['score'].append(score)
+            self._fit_history['dep_params'].append(
+                self.cov_struct.dep_params)
+
+            # Do not exit until the association parameters have been
+            # updated at least once.
+            if (del_params < ctol and
+                    (num_assoc_updates > 0 or self.update_dep is False)):
+                break
+
+            # Update the dependence structure
+            if (self.update_dep and (itr % params_niter) == 0
+                    and (itr >= first_dep_update)):
+                self._update_assoc(mean_params)
+                num_assoc_updates += 1
+
+        if del_params >= ctol:
+            warnings.warn("Iteration limit reached prior to convergence",
+                          IterationLimitWarning)
+
+        if mean_params is None:
+            warnings.warn("Unable to estimate GEE parameters.",
+                          ConvergenceWarning)
+            return None
+
+        bcov, ncov, _ = self._covmat()
+        if bcov is None:
+            warnings.warn("Estimated covariance structure for GEE "
+                          "estimates is singular", ConvergenceWarning)
+            return None
+        bc_cov = None
+        if cov_type == "bias_reduced":
+            bc_cov = self._bc_covmat(ncov)
+
+        if self.constraint is not None:
+            x = mean_params.copy()
+            mean_params, bcov = self._handle_constraint(mean_params, bcov)
+            if mean_params is None:
+                warnings.warn("Unable to estimate constrained GEE "
+                              "parameters.", ConvergenceWarning)
+                return None
+
+            y, ncov = self._handle_constraint(x, ncov)
+            if y is None:
+                warnings.warn("Unable to estimate constrained GEE "
+                              "parameters.", ConvergenceWarning)
+                return None
+
+            if bc_cov is not None:
+                y, bc_cov = self._handle_constraint(x, bc_cov)
+                if y is None:
+                    warnings.warn("Unable to estimate constrained GEE "
+                                  "parameters.", ConvergenceWarning)
+                    return None
+
+        scale = self.estimate_scale()
+
+        # kwargs to add to results instance, need to be available in __init__
+        res_kwds = dict(cov_type=cov_type,
+                        cov_robust=bcov,
+                        cov_naive=ncov,
+                        cov_robust_bc=bc_cov)
+
+        # The superclass constructor will multiply the covariance
+        # matrix argument bcov by scale, which we do not want, so we
+        # divide bcov by the scale parameter here
+        results = GEEResults(self, mean_params, bcov / scale, scale,
+                             cov_type=cov_type, use_t=False,
+                             attr_kwds=res_kwds)
+
+        # attributes not needed during results__init__
+        results.fit_history = self._fit_history
+        self.fit_history = defaultdict(list)
+        results.score_norm = del_params
+        results.converged = (del_params < ctol)
+        results.cov_struct = self.cov_struct
+        results.params_niter = params_niter
+        results.first_dep_update = first_dep_update
+        results.ctol = ctol
+        results.maxiter = maxiter
+
+        # These will be copied over to subclasses when upgrading.
+        results._props = ["cov_type", "use_t",
+                          "cov_params_default", "cov_robust",
+                          "cov_naive", "cov_robust_bc",
+                          "fit_history",
+                          "score_norm", "converged", "cov_struct",
+                          "params_niter", "first_dep_update", "ctol",
+                          "maxiter"]
+
+        return GEEResultsWrapper(results)
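
A short sketch of fitting with the bias-reduced (Mancl and DeRouen) covariance, assuming `model` is an already constructed GEE instance:

    >>> result = model.fit(maxiter=100, cov_type="bias_reduced")
    >>> print(result.summary())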
+
+    def _update_regularized(self, params, pen_wt, scad_param, eps):
+
+        sn, hm = 0, 0
+
+        for i in range(self.num_group):
+
+            expval, _ = self.cached_means[i]
+            resid = self.endog_li[i] - expval
+            sdev = np.sqrt(self.family.variance(expval))
+
+            ex = self.exog_li[i] * sdev[:, None]**2
+            rslt = self.cov_struct.covariance_matrix_solve(
+                           expval, i, sdev, (resid, ex))
+            sn0 = rslt[0]
+            sn += np.dot(ex.T, sn0)
+            hm0 = rslt[1]
+            hm += np.dot(ex.T, hm0)
+
+        # Wang et al. divide sn here by num_group, but that
+        # seems to be incorrect
+
+        ap = np.abs(params)
+        clipped = np.clip(scad_param * pen_wt - ap, 0, np.inf)
+        en = pen_wt * clipped * (ap > pen_wt)
+        en /= (scad_param - 1) * pen_wt
+        en += pen_wt * (ap <= pen_wt)
+        en /= eps + ap
+
+        hm.flat[::hm.shape[0] + 1] += self.num_group * en
+        sn -= self.num_group * en * params
+        try:
+            update = np.linalg.solve(hm, sn)
+        except np.linalg.LinAlgError:
+            update = np.dot(np.linalg.pinv(hm), sn)
+            msg = "Encountered singularity in regularized GEE update"
+            warnings.warn(msg)
+        hm *= self.estimate_scale()
+
+        return update, hm
+
+    def _regularized_covmat(self, mean_params):
+
+        self.update_cached_means(mean_params)
+
+        ma = 0
+
+        for i in range(self.num_group):
+
+            expval, _ = self.cached_means[i]
+            resid = self.endog_li[i] - expval
+            sdev = np.sqrt(self.family.variance(expval))
+
+            ex = self.exog_li[i] * sdev[:, None]**2
+            rslt = self.cov_struct.covariance_matrix_solve(
+                           expval, i, sdev, (resid,))
+            ma0 = np.dot(ex.T, rslt[0])
+            ma += np.outer(ma0, ma0)
+
+        return ma

     def fit_regularized(self, pen_wt, scad_param=3.7, maxiter=100,
-        ddof_scale=None, update_assoc=5, ctol=1e-05, ztol=0.001, eps=1e-06,
-        scale=None):
+                        ddof_scale=None, update_assoc=5,
+                        ctol=1e-5, ztol=1e-3, eps=1e-6, scale=None):
         """
         Regularized estimation for GEE.

@@ -713,7 +1487,67 @@ class GEE(GLM):
         https://www.ncbi.nlm.nih.gov/pubmed/21955051
         http://users.stat.umn.edu/~wangx346/research/GEE_selection.pdf
         """
-        pass
+
+        self.scaletype = scale
+
+        mean_params = np.zeros(self.exog.shape[1])
+        self.update_cached_means(mean_params)
+        converged = False
+        fit_history = defaultdict(list)
+
+        # Subtract this number from the total sample size when
+        # normalizing the scale parameter estimate.
+        if ddof_scale is None:
+            self.ddof_scale = self.exog.shape[1]
+        else:
+            if not ddof_scale >= 0:
+                raise ValueError(
+                    "ddof_scale must be a non-negative number or None")
+            self.ddof_scale = ddof_scale
+
+        # Keep this private for now.  In some cases the early steps are
+        # very small so it seems necessary to ensure a certain minimum
+        # number of iterations before testing for convergence.
+        miniter = 20
+
+        for itr in range(maxiter):
+
+            update, hm = self._update_regularized(
+                              mean_params, pen_wt, scad_param, eps)
+            if update is None:
+                msg = "Singular matrix encountered in regularized GEE update"
+                warnings.warn(msg, ConvergenceWarning)
+                break
+            if itr > miniter and np.sqrt(np.sum(update**2)) < ctol:
+                converged = True
+                break
+            mean_params += update
+            fit_history['params'].append(mean_params.copy())
+            self.update_cached_means(mean_params)
+
+            if itr != 0 and (itr % update_assoc == 0):
+                self._update_assoc(mean_params)
+
+        if not converged:
+            msg = "GEE.fit_regularized did not converge"
+            warnings.warn(msg)
+
+        mean_params[np.abs(mean_params) < ztol] = 0
+
+        self._update_assoc(mean_params)
+        ma = self._regularized_covmat(mean_params)
+        cov = np.linalg.solve(hm, ma)
+        cov = np.linalg.solve(hm, cov.T)
+
+        # kwargs to add to results instance, need to be available in __init__
+        res_kwds = dict(cov_type="robust", cov_robust=cov)
+
+        scale = self.estimate_scale()
+        rslt = GEEResults(self, mean_params, cov, scale,
+                          regularized=True, attr_kwds=res_kwds)
+        rslt.fit_history = fit_history
+
+        return GEEResultsWrapper(rslt)

     def _handle_constraint(self, mean_params, bcov):
         """
@@ -736,16 +1570,72 @@ class GEE(GLM):
             The input covariance matrix bcov, expanded to the
             coordinate system of the full model
         """
-        pass
+
+        # The number of variables in the full model
+        red_p = len(mean_params)
+        full_p = self.constraint.lhs.shape[1]
+        mean_params0 = np.r_[mean_params, np.zeros(full_p - red_p)]
+
+        # Get the score vector under the full model.
+        save_exog_li = self.exog_li
+        self.exog_li = self.constraint.exog_fulltrans_li
+        import copy
+        save_cached_means = copy.deepcopy(self.cached_means)
+        self.update_cached_means(mean_params0)
+        _, score = self._update_mean_params()
+
+        if score is None:
+            warnings.warn("Singular matrix encountered in GEE score test",
+                          ConvergenceWarning)
+            return None, None
+
+        _, ncov1, cmat = self._covmat()
+        scale = self.estimate_scale()
+        cmat = cmat / scale ** 2
+        score2 = score[red_p:] / scale
+        amat = np.linalg.inv(ncov1)
+
+        bmat_11 = cmat[0:red_p, 0:red_p]
+        bmat_22 = cmat[red_p:, red_p:]
+        bmat_12 = cmat[0:red_p, red_p:]
+        amat_11 = amat[0:red_p, 0:red_p]
+        amat_12 = amat[0:red_p, red_p:]
+
+        score_cov = bmat_22 - np.dot(amat_12.T,
+                                     np.linalg.solve(amat_11, bmat_12))
+        score_cov -= np.dot(bmat_12.T,
+                            np.linalg.solve(amat_11, amat_12))
+        score_cov += np.dot(amat_12.T,
+                            np.dot(np.linalg.solve(amat_11, bmat_11),
+                                   np.linalg.solve(amat_11, amat_12)))
+
+        from scipy.stats.distributions import chi2
+        score_statistic = np.dot(score2,
+                                 np.linalg.solve(score_cov, score2))
+        score_df = len(score2)
+        score_pvalue = 1 - chi2.cdf(score_statistic, score_df)
+        self.score_test_results = {"statistic": score_statistic,
+                                   "df": score_df,
+                                   "p-value": score_pvalue}
+
+        mean_params = self.constraint.unpack_param(mean_params)
+        bcov = self.constraint.unpack_cov(bcov)
+
+        self.exog_li = save_exog_li
+        self.cached_means = save_cached_means
+        self.exog = self.constraint.restore_exog()
+
+        return mean_params, bcov

     def _update_assoc(self, params):
         """
         Update the association parameters
         """
-        pass
+
+        self.cov_struct.update(params)

     def _derivative_exog(self, params, exog=None, transform='dydx',
-        dummy_idx=None, count_idx=None):
+                         dummy_idx=None, count_idx=None):
         """
         For computing marginal effects, returns dF(XB) / dX where F(.)
         is the fitted mean.
@@ -755,7 +1645,30 @@ class GEE(GLM):
         Not all of these make sense in the presence of discrete regressors,
         but checks are done in the results in get_margeff.
         """
-        pass
+        # This form should be appropriate for group 1 probit, logit,
+        # logistic, cloglog, heckprob, xtprobit.
+        offset_exposure = None
+        if exog is None:
+            exog = self.exog
+            offset_exposure = self._offset_exposure
+
+        margeff = self.mean_deriv_exog(exog, params, offset_exposure)
+
+        if 'ex' in transform:
+            margeff *= exog
+        if 'ey' in transform:
+            margeff /= self.predict(params, exog)[:, None]
+        if count_idx is not None:
+            from statsmodels.discrete.discrete_margins import (
+                _get_count_effects)
+            margeff = _get_count_effects(margeff, exog, count_idx, transform,
+                                         self, params)
+        if dummy_idx is not None:
+            from statsmodels.discrete.discrete_margins import (
+                _get_dummy_effects)
+            margeff = _get_dummy_effects(margeff, exog, dummy_idx, transform,
+                                         self, params)
+        return margeff

     def qic(self, params, scale, cov_params, n_step=1000):
         """
@@ -808,52 +1721,101 @@ class GEE(GLM):
         .. [*] W. Pan (2001).  Akaike's information criterion in generalized
                estimating equations.  Biometrics (57) 1.
         """
-        pass
+
+        varfunc = self.family.variance
+
+        means = []
+        omega = 0.0
+        # omega^-1 is the model-based covariance assuming independence
+
+        for i in range(self.num_group):
+            expval, lpr = self.cached_means[i]
+            means.append(expval)
+            dmat = self.mean_deriv(self.exog_li[i], lpr)
+            omega += np.dot(dmat.T, dmat) / scale
+
+        means = np.concatenate(means)
+
+        # The quasi-likelihood, use change of variables so the integration is
+        # from -1 to 1.
+        endog_li = np.concatenate(self.endog_li)
+        du = means - endog_li
+        qv = np.empty(n_step)
+        xv = np.linspace(-0.99999, 1, n_step)
+        for i, g in enumerate(xv):
+            u = endog_li + (g + 1) * du / 2.0
+            vu = varfunc(u)
+            qv[i] = -np.sum(du**2 * (g + 1) / vu)
+        qv /= (4 * scale)
+
+        try:
+            from scipy.integrate import trapezoid
+        except ImportError:
+            # Remove after minimum is SciPy 1.7
+            from scipy.integrate import trapz as trapezoid
+        ql = trapezoid(qv, dx=xv[1] - xv[0])
+
+        qicu = -2 * ql + 2 * self.exog.shape[1]
+        qic = -2 * ql + 2 * np.trace(np.dot(omega, cov_params))
+
+        return ql, qic, qicu


 class GEEResults(GLMResults):
+
     __doc__ = (
-        'This class summarizes the fit of a marginal regression model using GEE.\n'
-         + _gee_results_doc)
+        "This class summarizes the fit of a marginal regression model "
+        "using GEE.\n" + _gee_results_doc)

-    def __init__(self, model, params, cov_params, scale, cov_type='robust',
-        use_t=False, regularized=False, **kwds):
-        super(GEEResults, self).__init__(model, params,
-            normalized_cov_params=cov_params, scale=scale)
+    def __init__(self, model, params, cov_params, scale,
+                 cov_type='robust', use_t=False, regularized=False,
+                 **kwds):
+
+        super(GEEResults, self).__init__(
+            model, params, normalized_cov_params=cov_params,
+            scale=scale)
+
+        # not added by super
         self.df_resid = model.df_resid
         self.df_model = model.df_model
         self.family = model.family
+
         attr_kwds = kwds.pop('attr_kwds', {})
         self.__dict__.update(attr_kwds)
-        if not (hasattr(self, 'cov_type') and hasattr(self,
-            'cov_params_default')):
-            self.cov_type = cov_type
+
+        # we do not do this if the cov_type has already been set
+        # subclasses can set it through attr_kwds
+        if not (hasattr(self, 'cov_type') and
+                hasattr(self, 'cov_params_default')):
+            self.cov_type = cov_type  # keep alias
             covariance_type = self.cov_type.lower()
-            allowed_covariances = ['robust', 'naive', 'bias_reduced']
+            allowed_covariances = ["robust", "naive", "bias_reduced"]
             if covariance_type not in allowed_covariances:
-                msg = 'GEE: `cov_type` must be one of ' + ', '.join(
-                    allowed_covariances)
+                msg = ("GEE: `cov_type` must be one of " +
+                       ", ".join(allowed_covariances))
                 raise ValueError(msg)
-            if cov_type == 'robust':
+
+            if cov_type == "robust":
                 cov = self.cov_robust
-            elif cov_type == 'naive':
+            elif cov_type == "naive":
                 cov = self.cov_naive
-            elif cov_type == 'bias_reduced':
+            elif cov_type == "bias_reduced":
                 cov = self.cov_robust_bc
+
             self.cov_params_default = cov
-        elif self.cov_type != cov_type:
-            raise ValueError(
-                'cov_type in argument is different from already attached cov_type'
-                )
+        else:
+            if self.cov_type != cov_type:
+                raise ValueError('cov_type in argument is different from '
+                                 'already attached cov_type')

     @cache_readonly
     def resid(self):
         """
         The response residuals.
         """
-        pass
+        return self.resid_response

-    def standard_errors(self, cov_type='robust'):
+    def standard_errors(self, cov_type="robust"):
         """
         This is a convenience function that returns the standard
         errors for any covariance type.  The value of `bse` is the
@@ -867,7 +1829,29 @@ class GEEResults(GLMResults):
             the covariance used to compute standard errors.  Defaults
             to "robust".
         """
-        pass
+
+        # Check covariance_type
+        covariance_type = cov_type.lower()
+        allowed_covariances = ["robust", "naive", "bias_reduced"]
+        if covariance_type not in allowed_covariances:
+            msg = ("GEE: `covariance_type` must be one of " +
+                   ", ".join(allowed_covariances))
+            raise ValueError(msg)
+
+        if covariance_type == "robust":
+            return np.sqrt(np.diag(self.cov_robust))
+        elif covariance_type == "naive":
+            return np.sqrt(np.diag(self.cov_naive))
+        elif covariance_type == "bias_reduced":
+            if self.cov_robust_bc is None:
+                raise ValueError(
+                    "GEE: `bias_reduced` covariance not available")
+            return np.sqrt(np.diag(self.cov_robust_bc))
+
+    # Need to override to allow for different covariance types.
+    @cache_readonly
+    def bse(self):
+        return self.standard_errors(self.cov_type)
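
For example, with `result` a fitted GEEResults instance, robust and naive standard errors are always available; bias-reduced ones only if requested at fit time:

    >>> result.bse                                 # uses the cov_type chosen in fit
    >>> result.standard_errors(cov_type="naive")   # model-based standard errors
    >>> result.standard_errors(cov_type="robust")  # sandwich standard errors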

     def score_test(self):
         """
@@ -892,7 +1876,13 @@ class GEEResults(GLMResults):
         test in GEE".
         http://www.sph.umn.edu/faculty1/wp-content/uploads/2012/11/rr2002-013.pdf
         """
-        pass
+
+        if not hasattr(self.model, "score_test_results"):
+            msg = "score_test on results instance only available when "
+            msg += " model was fit with constraints"
+            raise ValueError(msg)
+
+        return self.model.score_test_results
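
A hedged sketch of the constrained fit that makes `score_test` available, assuming hypothetical arrays `y`, `x` (three columns here), and `g`; the constraint is the 2-tuple `(L, R)` encoding the restriction L @ params = R:

    >>> import numpy as np
    >>> import statsmodels.api as sm
    >>> L = np.array([[0., 1., -1.]])   # test equality of the 2nd and 3rd coefficients
    >>> R = np.array([0.])
    >>> result = sm.GEE(y, x, groups=g, constraint=(L, R)).fit()
    >>> result.score_test()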

     @cache_readonly
     def resid_split(self):
@@ -901,14 +1891,22 @@ class GEEResults(GLMResults):
         values from the model.  The residuals are returned as a list
         of arrays containing the residuals for each cluster.
         """
-        pass
+        sresid = []
+        for v in self.model.group_labels:
+            ii = self.model.group_indices[v]
+            sresid.append(self.resid[ii])
+        return sresid

     @cache_readonly
     def resid_centered(self):
         """
         Returns the residuals centered within each group.
         """
-        pass
+        cresid = self.resid.copy()
+        for v in self.model.group_labels:
+            ii = self.model.group_indices[v]
+            cresid[ii] -= cresid[ii].mean()
+        return cresid

     @cache_readonly
     def resid_centered_split(self):
@@ -917,7 +1915,11 @@ class GEEResults(GLMResults):
         residuals are returned as a list of arrays containing the
         centered residuals for each cluster.
         """
-        pass
+        sresid = []
+        for v in self.model.group_labels:
+            ii = self.model.group_indices[v]
+            sresid.append(self.centered_resid[ii])
+        return sresid

     def qic(self, scale=None, n_step=1000):
         """
@@ -925,12 +1927,58 @@ class GEEResults(GLMResults):

         See GEE.qic for documentation.
         """
-        pass
+
+        # It is easy to forget to set the scale parameter.  Sometimes
+        # this is intentional, so we warn.
+        if scale is None:
+            warnings.warn("QIC values obtained using scale=None are not "
+                          "appropriate for comparing models")
+
+        if scale is None:
+            scale = self.scale
+
+        _, qic, qicu = self.model.qic(self.params, scale,
+                                      self.cov_params(),
+                                      n_step=n_step)
+
+        return qic, qicu
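
A sketch, assuming `result` is a fitted GEE result; fixing the scale (e.g. at 1 for Poisson or Binomial models) makes the QIC values comparable across models:

    >>> qic, qicu = result.qic(scale=1.0)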
+
+    # FIXME: alias to be removed, temporary backwards compatibility
     split_resid = resid_split
     centered_resid = resid_centered
     split_centered_resid = resid_centered_split

-    def conf_int(self, alpha=0.05, cols=None, cov_type=None):
+    @Appender(_plot_added_variable_doc % {'extra_params_doc': ''})
+    def plot_added_variable(self, focus_exog, resid_type=None,
+                            use_glm_weights=True, fit_kwargs=None,
+                            ax=None):
+
+        from statsmodels.graphics.regressionplots import plot_added_variable
+
+        fig = plot_added_variable(self, focus_exog,
+                                  resid_type=resid_type,
+                                  use_glm_weights=use_glm_weights,
+                                  fit_kwargs=fit_kwargs, ax=ax)
+
+        return fig
+
+    @Appender(_plot_partial_residuals_doc % {'extra_params_doc': ''})
+    def plot_partial_residuals(self, focus_exog, ax=None):
+
+        from statsmodels.graphics.regressionplots import plot_partial_residuals
+
+        return plot_partial_residuals(self, focus_exog, ax=ax)
+
+    @Appender(_plot_ceres_residuals_doc % {'extra_params_doc': ''})
+    def plot_ceres_residuals(self, focus_exog, frac=0.66, cond_means=None,
+                             ax=None):
+
+        from statsmodels.graphics.regressionplots import plot_ceres_residuals
+
+        return plot_ceres_residuals(self, focus_exog, frac,
+                                    cond_means=cond_means, ax=ax)
+
+    def conf_int(self, alpha=.05, cols=None, cov_type=None):
         """
         Returns confidence intervals for the fitted parameters.

@@ -950,9 +1998,27 @@ class GEEResults(GLMResults):
         -----
         The confidence interval is based on the Gaussian distribution.
         """
-        pass
+        # super does not allow specifying cov_type, and method is not
+        # implemented.
+        # FIXME: remove this method here
+        if cov_type is None:
+            bse = self.bse
+        else:
+            bse = self.standard_errors(cov_type=cov_type)
+        params = self.params
+        dist = stats.norm
+        q = dist.ppf(1 - alpha / 2)
+
+        if cols is None:
+            lower = self.params - q * bse
+            upper = self.params + q * bse
+        else:
+            cols = np.asarray(cols)
+            lower = params[cols] - q * bse[cols]
+            upper = params[cols] + q * bse[cols]
+        return np.asarray(lzip(lower, upper))
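
For example, Gaussian-based intervals under the model-based covariance, with `result` a fitted GEEResults instance:

    >>> result.conf_int(alpha=0.05, cov_type="naive")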

-    def summary(self, yname=None, xname=None, title=None, alpha=0.05):
+    def summary(self, yname=None, xname=None, title=None, alpha=.05):
         """
         Summarize the GEE regression results

@@ -985,10 +2051,72 @@ class GEEResults(GLMResults):
         --------
         statsmodels.iolib.summary.Summary : class to hold summary results
         """
-        pass

-    def get_margeff(self, at='overall', method='dydx', atexog=None, dummy=
-        False, count=False):
+        top_left = [('Dep. Variable:', None),
+                    ('Model:', None),
+                    ('Method:', ['Generalized']),
+                    ('', ['Estimating Equations']),
+                    ('Family:', [self.model.family.__class__.__name__]),
+                    ('Dependence structure:',
+                     [self.model.cov_struct.__class__.__name__]),
+                    ('Date:', None),
+                    ('Covariance type: ', [self.cov_type, ])
+                    ]
+
+        NY = [len(y) for y in self.model.endog_li]
+
+        top_right = [('No. Observations:', [sum(NY)]),
+                     ('No. clusters:', [len(self.model.endog_li)]),
+                     ('Min. cluster size:', [min(NY)]),
+                     ('Max. cluster size:', [max(NY)]),
+                     ('Mean cluster size:', ["%.1f" % np.mean(NY)]),
+                     ('Num. iterations:', ['%d' %
+                                           len(self.fit_history['params'])]),
+                     ('Scale:', ["%.3f" % self.scale]),
+                     ('Time:', None),
+                     ]
+
+        # The skew of the residuals
+        skew1 = stats.skew(self.resid)
+        kurt1 = stats.kurtosis(self.resid)
+        skew2 = stats.skew(self.centered_resid)
+        kurt2 = stats.kurtosis(self.centered_resid)
+
+        diagn_left = [('Skew:', ["%12.4f" % skew1]),
+                      ('Centered skew:', ["%12.4f" % skew2])]
+
+        diagn_right = [('Kurtosis:', ["%12.4f" % kurt1]),
+                       ('Centered kurtosis:', ["%12.4f" % kurt2])
+                       ]
+
+        if title is None:
+            title = self.model.__class__.__name__ + ' ' +\
+                "Regression Results"
+
+        # Override the exog variable names if xname is provided as an
+        # argument.
+        if xname is None:
+            xname = self.model.exog_names
+
+        if yname is None:
+            yname = self.model.endog_names
+
+        # Create summary table instance
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right,
+                             yname=yname, xname=xname,
+                             title=title)
+        smry.add_table_params(self, yname=yname, xname=xname,
+                              alpha=alpha, use_t=False)
+        smry.add_table_2cols(self, gleft=diagn_left,
+                             gright=diagn_right, yname=yname,
+                             xname=xname, title="")
+
+        return smry
+
+    def get_margeff(self, at='overall', method='dydx', atexog=None,
+                    dummy=False, count=False):
         """Get marginal effects of the fitted model.

         Parameters
@@ -1047,9 +2175,15 @@ class GEEResults(GLMResults):
         When using after Poisson, returns the expected number of events
         per period, assuming that the model is loglinear.
         """
-        pass

-    def plot_isotropic_dependence(self, ax=None, xpoints=10, min_n=50):
+        if self.model.constraint is not None:
+            warnings.warn("marginal effects ignore constraints",
+                          ValueWarning)
+
+        return GEEMargins(self, (at, method, atexog, dummy, count))
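
A one-line sketch, assuming `result` is a fitted (unconstrained) GEE result; the return value is a GEEMargins instance:

    >>> marg = result.get_margeff(at="overall", method="dydx")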
+
+    def plot_isotropic_dependence(self, ax=None, xpoints=10,
+                                  min_n=50):
         """
         Create a plot of the pairwise products of within-group
         residuals against the corresponding time differences.  This
@@ -1070,9 +2204,56 @@ class GEEResults(GLMResults):
             The minimum sample size in a bin for the mean residual
             product to be included on the plot.
         """
-        pass

-    def sensitivity_params(self, dep_params_first, dep_params_last, num_steps):
+        from statsmodels.graphics import utils as gutils
+
+        resid = self.model.cluster_list(self.resid)
+        time = self.model.cluster_list(self.model.time)
+
+        # All within-group pairwise time distances (xdt) and the
+        # corresponding products of scaled residuals (xre).
+        xre, xdt = [], []
+        for re, ti in zip(resid, time):
+            ix = np.tril_indices(re.shape[0], 0)
+            re = re[ix[0]] * re[ix[1]] / self.scale ** 2
+            xre.append(re)
+            dists = np.sqrt(((ti[ix[0], :] - ti[ix[1], :]) ** 2).sum(1))
+            xdt.append(dists)
+
+        xre = np.concatenate(xre)
+        xdt = np.concatenate(xdt)
+
+        if ax is None:
+            fig, ax = gutils.create_mpl_ax(ax)
+        else:
+            fig = ax.get_figure()
+
+        # Convert to a correlation
+        ii = np.flatnonzero(xdt == 0)
+        v0 = np.mean(xre[ii])
+        xre /= v0
+
+        # Use the simple average to smooth, since fancier smoothers
+        # that trim and downweight outliers give biased results (we
+        # need the actual mean of a skewed distribution).
+        if np.isscalar(xpoints):
+            xpoints = np.linspace(0, max(xdt), xpoints)
+        dg = np.digitize(xdt, xpoints)
+        dgu = np.unique(dg)
+        hist = np.asarray([np.sum(dg == k) for k in dgu])
+        ii = np.flatnonzero(hist >= min_n)
+        dgu = dgu[ii]
+        dgy = np.asarray([np.mean(xre[dg == k]) for k in dgu])
+        dgx = np.asarray([np.mean(xdt[dg == k]) for k in dgu])
+
+        ax.plot(dgx, dgy, '-', color='orange', lw=5)
+        ax.set_xlabel("Time difference")
+        ax.set_ylabel("Product of scaled residuals")
+
+        return fig
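
A sketch of producing and saving the diagnostic plot, assuming `result` is a fitted GEE result and matplotlib is available:

    >>> fig = result.plot_isotropic_dependence()
    >>> fig.savefig("isotropic_dependence.png")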
+
+    def sensitivity_params(self, dep_params_first,
+                           dep_params_last, num_steps):
         """
         Refits the GEE model using a sequence of values for the
         dependence parameters.
@@ -1091,52 +2272,189 @@ class GEEResults(GLMResults):
         results : array_like
             The GEEResults objects resulting from the fits.
         """
-        pass
+
+        model = self.model
+
+        import copy
+        cov_struct = copy.deepcopy(self.model.cov_struct)
+
+        # We are fixing the dependence structure in each run.
+        update_dep = model.update_dep
+        model.update_dep = False
+
+        dep_params = []
+        results = []
+        for x in np.linspace(0, 1, num_steps):
+
+            dp = x * dep_params_last + (1 - x) * dep_params_first
+            dep_params.append(dp)
+
+            model.cov_struct = copy.deepcopy(cov_struct)
+            model.cov_struct.dep_params = dp
+            rslt = model.fit(start_params=self.params,
+                             ctol=self.ctol,
+                             params_niter=self.params_niter,
+                             first_dep_update=self.first_dep_update,
+                             cov_type=self.cov_type)
+            results.append(rslt)
+
+        model.update_dep = update_dep
+
+        return results
+
+    # FIXME: alias to be removed, temporary backwards compatibility
     params_sensitivity = sensitivity_params
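
A sketch of a sensitivity analysis over the exchangeable correlation parameter, assuming `result` was fit with an Exchangeable covariance structure (whose `dep_params` is a scalar):

    >>> fits = result.sensitivity_params(0.0, 0.8, 5)
    >>> [f.params for f in fits]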


 class GEEResultsWrapper(lm.RegressionResultsWrapper):
-    _attrs = {'centered_resid': 'rows'}
+    _attrs = {
+        'centered_resid': 'rows',
+    }
     _wrap_attrs = wrap.union_dicts(lm.RegressionResultsWrapper._wrap_attrs,
-        _attrs)
-
-
-wrap.populate_wrapper(GEEResultsWrapper, GEEResults)
+                                   _attrs)
+wrap.populate_wrapper(GEEResultsWrapper, GEEResults)  # noqa:E305


 class OrdinalGEE(GEE):
-    __doc__ = ('    Ordinal Response Marginal Regression Model using GEE\n' +
+
+    __doc__ = (
+        "    Ordinal Response Marginal Regression Model using GEE\n" +
         _gee_init_doc % {'extra_params': base._missing_param_doc,
-        'family_doc': _gee_ordinal_family_doc, 'example':
-        _gee_ordinal_example, 'notes': _gee_nointercept})
+                         'family_doc': _gee_ordinal_family_doc,
+                         'example': _gee_ordinal_example,
+                         'notes': _gee_nointercept})

     def __init__(self, endog, exog, groups, time=None, family=None,
-        cov_struct=None, missing='none', offset=None, dep_data=None,
-        constraint=None, **kwargs):
+                 cov_struct=None, missing='none', offset=None,
+                 dep_data=None, constraint=None, **kwargs):
+
         if family is None:
             family = families.Binomial()
-        elif not isinstance(family, families.Binomial):
-            raise ValueError('ordinal GEE must use a Binomial family')
+        else:
+            if not isinstance(family, families.Binomial):
+                raise ValueError("ordinal GEE must use a Binomial family")
+
         if cov_struct is None:
             cov_struct = cov_structs.OrdinalIndependence()
-        endog, exog, groups, time, offset = self.setup_ordinal(endog, exog,
-            groups, time, offset)
-        super(OrdinalGEE, self).__init__(endog, exog, groups, time, family,
-            cov_struct, missing, offset, dep_data, constraint)
+
+        endog, exog, groups, time, offset = self.setup_ordinal(
+            endog, exog, groups, time, offset)
+
+        super(OrdinalGEE, self).__init__(endog, exog, groups, time,
+                                         family, cov_struct, missing,
+                                         offset, dep_data, constraint)

     def setup_ordinal(self, endog, exog, groups, time, offset):
         """
         Restructure ordinal data as binary indicators so that they can
         be analyzed using Generalized Estimating Equations.
         """
-        pass
+
+        self.endog_orig = endog.copy()
+        self.exog_orig = exog.copy()
+        self.groups_orig = groups.copy()
+        if offset is not None:
+            self.offset_orig = offset.copy()
+        else:
+            self.offset_orig = None
+            offset = np.zeros(len(endog))
+        if time is not None:
+            self.time_orig = time.copy()
+        else:
+            self.time_orig = None
+            time = np.zeros((len(endog), 1))
+
+        exog = np.asarray(exog)
+        endog = np.asarray(endog)
+        groups = np.asarray(groups)
+        time = np.asarray(time)
+        offset = np.asarray(offset)
+
+        # The unique outcomes, except the greatest one.
+        self.endog_values = np.unique(endog)
+        endog_cuts = self.endog_values[0:-1]
+        ncut = len(endog_cuts)
+
+        nrows = ncut * len(endog)
+        exog_out = np.zeros((nrows, exog.shape[1]),
+                            dtype=np.float64)
+        endog_out = np.zeros(nrows, dtype=np.float64)
+        intercepts = np.zeros((nrows, ncut), dtype=np.float64)
+        groups_out = np.zeros(nrows, dtype=groups.dtype)
+        time_out = np.zeros((nrows, time.shape[1]),
+                            dtype=np.float64)
+        offset_out = np.zeros(nrows, dtype=np.float64)
+
+        jrow = 0
+        zipper = zip(exog, endog, groups, time, offset)
+        for (exog_row, endog_value, group_value, time_value,
+             offset_value) in zipper:
+
+            # Loop over thresholds for the indicators
+            for thresh_ix, thresh in enumerate(endog_cuts):
+
+                exog_out[jrow, :] = exog_row
+                endog_out[jrow] = int(np.squeeze(endog_value > thresh))
+                intercepts[jrow, thresh_ix] = 1
+                groups_out[jrow] = group_value
+                time_out[jrow] = time_value
+                offset_out[jrow] = offset_value
+                jrow += 1
+
+        exog_out = np.concatenate((intercepts, exog_out), axis=1)
+
+        # exog column names, including intercepts
+        xnames = ["I(y>%.1f)" % v for v in endog_cuts]
+        if type(self.exog_orig) is pd.DataFrame:
+            xnames.extend(self.exog_orig.columns)
+        else:
+            xnames.extend(["x%d" % k for k in range(1, exog.shape[1] + 1)])
+        exog_out = pd.DataFrame(exog_out, columns=xnames)
+
+        # Preserve the endog name if there is one
+        if type(self.endog_orig) is pd.Series:
+            endog_out = pd.Series(endog_out, name=self.endog_orig.name)
+
+        return endog_out, exog_out, groups_out, time_out, offset_out
+
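For orientation, a minimal sketch of the indicator expansion performed by setup_ordinal (made-up values, not part of the patch): each observation contributes one binary row per threshold.

    import numpy as np

    endog = np.array([1, 2, 3])
    cuts = np.unique(endog)[:-1]                   # thresholds [1, 2]
    indicators = (endog[:, None] > cuts[None, :]).astype(float).ravel()
    # -> [0., 0., 1., 0., 1., 1.]: every original row yields len(cuts)
    #    binary rows, matching the I(y > c) columns constructed above.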
+    def _starting_params(self):
+        exposure = getattr(self, "exposure", None)
+        model = GEE(self.endog, self.exog, self.groups,
+                    time=self.time, family=families.Binomial(),
+                    offset=self.offset, exposure=exposure)
+        result = model.fit()
+        return result.params
+
+    @Appender(_gee_fit_doc)
+    def fit(self, maxiter=60, ctol=1e-6, start_params=None,
+            params_niter=1, first_dep_update=0,
+            cov_type='robust'):
+
+        rslt = super(OrdinalGEE, self).fit(maxiter, ctol, start_params,
+                                           params_niter, first_dep_update,
+                                           cov_type=cov_type)
+
+        rslt = rslt._results   # use unwrapped instance
+        res_kwds = dict(((k, getattr(rslt, k)) for k in rslt._props))
+        # Convert the GEEResults to an OrdinalGEEResults
+        ord_rslt = OrdinalGEEResults(self, rslt.params,
+                                     rslt.cov_params() / rslt.scale,
+                                     rslt.scale,
+                                     cov_type=cov_type,
+                                     attr_kwds=res_kwds)
+        # for k in rslt._props:
+        #    setattr(ord_rslt, k, getattr(rslt, k))
+        # TODO: document or delete
+
+        return OrdinalGEEResultsWrapper(ord_rslt)
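A minimal end-to-end usage sketch of the ordinal model (synthetic data; variable names are illustrative and not part of the patch). Note that, per the class notes, the design matrix carries no intercept column.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    x = pd.DataFrame(rng.normal(size=(n, 2)), columns=["x1", "x2"])
    groups = np.repeat(np.arange(n // 4), 4)
    # Three ordered response levels derived from x1 plus noise.
    y = pd.Series(np.digitize(x["x1"] + rng.normal(size=n), [-0.5, 0.5]))

    res = sm.OrdinalGEE(y, x, groups).fit()
    print(res.summary())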


 class OrdinalGEEResults(GEEResults):
+
     __doc__ = (
-        """This class summarizes the fit of a marginal regression modelfor an ordinal response using GEE.
-"""
-         + _gee_results_doc)
+        "This class summarizes the fit of a marginal regression model"
+        "for an ordinal response using GEE.\n"
+        + _gee_results_doc)

     def plot_distribution(self, ax=None, exog_values=None):
         """
@@ -1167,7 +2485,62 @@ class OrdinalGEEResults(GEEResults):
         >>> ev = [{"sex": 1}, {"sex": 0}]
         >>> rslt.plot_distribution(exog_values=ev)
         """
-        pass
+
+        from statsmodels.graphics import utils as gutils
+
+        if ax is None:
+            fig, ax = gutils.create_mpl_ax(ax)
+        else:
+            fig = ax.get_figure()
+
+        # If no covariate patterns are specified, create one with all
+        # variables set to their mean values.
+        if exog_values is None:
+            exog_values = [{}, ]
+
+        exog_means = self.model.exog.mean(0)
+        ix_icept = [i for i, x in enumerate(self.model.exog_names) if
+                    x.startswith("I(")]
+
+        for ev in exog_values:
+
+            for k in ev.keys():
+                if k not in self.model.exog_names:
+                    raise ValueError("%s is not a variable in the model"
+                                     % k)
+
+            # Get the fitted probability for each level, at the given
+            # covariate values.
+            pr = []
+            for j in ix_icept:
+
+                xp = np.zeros_like(self.params)
+                xp[j] = 1.
+                for i, vn in enumerate(self.model.exog_names):
+                    if i in ix_icept:
+                        continue
+                    # User-specified value
+                    if vn in ev:
+                        xp[i] = ev[vn]
+                    # Mean value
+                    else:
+                        xp[i] = exog_means[i]
+
+                p = 1 / (1 + np.exp(-np.dot(xp, self.params)))
+                pr.append(p)
+
+            pr.insert(0, 1)
+            pr.append(0)
+            pr = np.asarray(pr)
+            prd = -np.diff(pr)
+
+            ax.plot(self.model.endog_values, prd, 'o-')
+
+        ax.set_xlabel("Response value")
+        ax.set_ylabel("Probability")
+        ax.set_ylim(0, 1)
+
+        return fig


 def _score_test_submodel(par, sub):
@@ -1195,41 +2568,146 @@ def _score_test_submodel(par, sub):
     -----
     Returns None, None if the provided submodel is not actually a submodel.
     """
-    pass

+    x1 = par.exog
+    x2 = sub.exog

-class OrdinalGEEResultsWrapper(GEEResultsWrapper):
-    pass
+    u, s, vt = np.linalg.svd(x1, 0)
+    v = vt.T
+
+    # Get the orthogonal complement of col(x2) in col(x1).
+    a, _ = np.linalg.qr(x2)
+    a = u - np.dot(a, np.dot(a.T, u))
+    x2c, sb, _ = np.linalg.svd(a, 0)
+    x2c = x2c[:, sb > 1e-12]
+
+    # x1 * qm = x2
+    ii = np.flatnonzero(np.abs(s) > 1e-12)
+    qm = np.dot(v[:, ii], np.dot(u[:, ii].T, x2) / s[ii, None])
+
+    e = np.max(np.abs(x2 - np.dot(x1, qm)))
+    if e > 1e-8:
+        return None, None

+    # x1 * qc = x2c
+    qc = np.dot(v[:, ii], np.dot(u[:, ii].T, x2c) / s[ii, None])

-wrap.populate_wrapper(OrdinalGEEResultsWrapper, OrdinalGEEResults)
+    return qm, qc
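A small sanity sketch of the relations the helper enforces (x1 @ qm recovers x2, and x1 @ qc spans its orthogonal complement), using hypothetical stand-in objects that only carry the exog attribute the function reads:

    from types import SimpleNamespace
    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=(30, 4))
    par = SimpleNamespace(exog=x1)
    sub = SimpleNamespace(exog=x1[:, :2])          # a genuine submodel of par

    qm, qc = _score_test_submodel(par, sub)
    assert np.allclose(par.exog @ qm, sub.exog)    # x1 * qm = x2
    assert np.allclose(sub.exog.T @ (par.exog @ qc), 0, atol=1e-8)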
+
+
+class OrdinalGEEResultsWrapper(GEEResultsWrapper):
+    pass
+wrap.populate_wrapper(OrdinalGEEResultsWrapper, OrdinalGEEResults)  # noqa:E305


 class NominalGEE(GEE):
+
     __doc__ = (
-        '    Nominal Response Marginal Regression Model using GEE.\n' + 
+        "    Nominal Response Marginal Regression Model using GEE.\n" +
         _gee_init_doc % {'extra_params': base._missing_param_doc,
-        'family_doc': _gee_nominal_family_doc, 'example':
-        _gee_nominal_example, 'notes': _gee_nointercept})
+                         'family_doc': _gee_nominal_family_doc,
+                         'example': _gee_nominal_example,
+                         'notes': _gee_nointercept})

     def __init__(self, endog, exog, groups, time=None, family=None,
-        cov_struct=None, missing='none', offset=None, dep_data=None,
-        constraint=None, **kwargs):
-        endog, exog, groups, time, offset = self.setup_nominal(endog, exog,
-            groups, time, offset)
+                 cov_struct=None, missing='none', offset=None,
+                 dep_data=None, constraint=None, **kwargs):
+
+        endog, exog, groups, time, offset = self.setup_nominal(
+            endog, exog, groups, time, offset)
+
         if family is None:
             family = _Multinomial(self.ncut + 1)
+
         if cov_struct is None:
             cov_struct = cov_structs.NominalIndependence()
-        super(NominalGEE, self).__init__(endog, exog, groups, time, family,
-            cov_struct, missing, offset, dep_data, constraint)
+
+        super(NominalGEE, self).__init__(
+            endog, exog, groups, time, family, cov_struct, missing,
+            offset, dep_data, constraint)
+
+    def _starting_params(self):
+        exposure = getattr(self, "exposure", None)
+        model = GEE(self.endog, self.exog, self.groups,
+                    time=self.time, family=families.Binomial(),
+                    offset=self.offset, exposure=exposure)
+        result = model.fit()
+        return result.params

     def setup_nominal(self, endog, exog, groups, time, offset):
         """
         Restructure nominal data as binary indicators so that they can
         be analyzed using Generalized Estimating Equations.
         """
-        pass
+
+        self.endog_orig = endog.copy()
+        self.exog_orig = exog.copy()
+        self.groups_orig = groups.copy()
+        if offset is not None:
+            self.offset_orig = offset.copy()
+        else:
+            self.offset_orig = None
+            offset = np.zeros(len(endog))
+        if time is not None:
+            self.time_orig = time.copy()
+        else:
+            self.time_orig = None
+            time = np.zeros((len(endog), 1))
+
+        exog = np.asarray(exog)
+        endog = np.asarray(endog)
+        groups = np.asarray(groups)
+        time = np.asarray(time)
+        offset = np.asarray(offset)
+
+        # The unique outcomes, except the greatest one.
+        self.endog_values = np.unique(endog)
+        endog_cuts = self.endog_values[0:-1]
+        ncut = len(endog_cuts)
+        self.ncut = ncut
+
+        nrows = len(endog_cuts) * exog.shape[0]
+        ncols = len(endog_cuts) * exog.shape[1]
+        exog_out = np.zeros((nrows, ncols), dtype=np.float64)
+        endog_out = np.zeros(nrows, dtype=np.float64)
+        groups_out = np.zeros(nrows, dtype=np.float64)
+        time_out = np.zeros((nrows, time.shape[1]),
+                            dtype=np.float64)
+        offset_out = np.zeros(nrows, dtype=np.float64)
+
+        jrow = 0
+        zipper = zip(exog, endog, groups, time, offset)
+        for (exog_row, endog_value, group_value, time_value,
+             offset_value) in zipper:
+
+            # Loop over thresholds for the indicators
+            for thresh_ix, thresh in enumerate(endog_cuts):
+
+                u = np.zeros(len(endog_cuts), dtype=np.float64)
+                u[thresh_ix] = 1
+                exog_out[jrow, :] = np.kron(u, exog_row)
+                endog_out[jrow] = (int(endog_value == thresh))
+                groups_out[jrow] = group_value
+                time_out[jrow] = time_value
+                offset_out[jrow] = offset_value
+                jrow += 1
+
+        # exog names
+        if isinstance(self.exog_orig, pd.DataFrame):
+            xnames_in = self.exog_orig.columns
+        else:
+            xnames_in = ["x%d" % k for k in range(1, exog.shape[1] + 1)]
+        xnames = []
+        for tr in endog_cuts:
+            xnames.extend(["%s[%.1f]" % (v, tr) for v in xnames_in])
+        exog_out = pd.DataFrame(exog_out, columns=xnames)
+
+        # Preserve endog name if there is one
+        if isinstance(self.endog_orig, pd.Series):
+            endog_out = pd.Series(endog_out, name=self.endog_orig.name)
+
+        return endog_out, exog_out, groups_out, time_out, offset_out

     def mean_deriv(self, exog, lin_pred):
         """
@@ -1249,7 +2727,30 @@ class NominalGEE(GEE):
         The derivative of the expected endog with respect to the
         parameters.
         """
-        pass
+
+        expval = np.exp(lin_pred)
+
+        # Reshape so that each row contains all the indicators
+        # corresponding to one multinomial observation.
+        expval_m = np.reshape(expval, (len(expval) // self.ncut,
+                                       self.ncut))
+
+        # The normalizing constant for the multinomial probabilities.
+        denom = 1 + expval_m.sum(1)
+        denom = np.kron(denom, np.ones(self.ncut, dtype=np.float64))
+
+        # The multinomial probabilities
+        mprob = expval / denom
+
+        # First term of the derivative: denom * expval' / denom^2 =
+        # expval' / denom.
+        dmat = mprob[:, None] * exog
+
+        # Second term of the derivative: -expval * denom' / denom^2
+        ddenom = expval[:, None] * exog
+        dmat -= mprob[:, None] * ddenom / denom[:, None]
+
+        return dmat

     def mean_deriv_exog(self, exog, params, offset_exposure=None):
         """
@@ -1274,14 +2775,75 @@ class NominalGEE(GEE):
         -----
         offset_exposure must be set at None for the multinomial family.
         """
-        pass
+
+        if offset_exposure is not None:
+            warnings.warn("Offset/exposure ignored for the multinomial family",
+                          ValueWarning)
+
+        lpr = np.dot(exog, params)
+        expval = np.exp(lpr)
+
+        expval_m = np.reshape(expval, (len(expval) // self.ncut,
+                                       self.ncut))
+
+        denom = 1 + expval_m.sum(1)
+        denom = np.kron(denom, np.ones(self.ncut, dtype=np.float64))
+
+        bmat0 = np.outer(np.ones(exog.shape[0]), params)
+
+        # Masking matrix
+        qmat = []
+        for j in range(self.ncut):
+            ee = np.zeros(self.ncut, dtype=np.float64)
+            ee[j] = 1
+            qmat.append(np.kron(ee, np.ones(len(params) // self.ncut)))
+        qmat = np.array(qmat)
+        qmat = np.kron(np.ones((exog.shape[0] // self.ncut, 1)), qmat)
+        bmat = bmat0 * qmat
+
+        dmat = expval[:, None] * bmat / denom[:, None]
+
+        expval_mb = np.kron(expval_m, np.ones((self.ncut, 1)))
+        expval_mb = np.kron(expval_mb, np.ones((1, self.ncut)))
+
+        dmat -= expval[:, None] * (bmat * expval_mb) / denom[:, None] ** 2
+
+        return dmat
+
+    @Appender(_gee_fit_doc)
+    def fit(self, maxiter=60, ctol=1e-6, start_params=None,
+            params_niter=1, first_dep_update=0,
+            cov_type='robust'):
+
+        rslt = super(NominalGEE, self).fit(maxiter, ctol, start_params,
+                                           params_niter, first_dep_update,
+                                           cov_type=cov_type)
+        if rslt is None:
+            warnings.warn("GEE updates did not converge",
+                          ConvergenceWarning)
+            return None
+
+        rslt = rslt._results   # use unwrapped instance
+        res_kwds = dict(((k, getattr(rslt, k)) for k in rslt._props))
+        # Convert the GEEResults to a NominalGEEResults
+        nom_rslt = NominalGEEResults(self, rslt.params,
+                                     rslt.cov_params() / rslt.scale,
+                                     rslt.scale,
+                                     cov_type=cov_type,
+                                     attr_kwds=res_kwds)
+        # TODO: document or delete
+        # for k in rslt._props:
+        #    setattr(nom_rslt, k, getattr(rslt, k))
+
+        return NominalGEEResultsWrapper(nom_rslt)
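As with the ordinal model, a minimal usage sketch for the nominal case (synthetic data; names illustrative, not part of the patch):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 200
    x = pd.DataFrame(rng.normal(size=(n, 2)), columns=["x1", "x2"])
    groups = np.repeat(np.arange(n // 4), 4)
    y = rng.integers(0, 3, size=n)                 # three unordered categories

    res = sm.NominalGEE(y, x, groups).fit()
    print(res.summary())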


 class NominalGEEResults(GEEResults):
+
     __doc__ = (
-        """This class summarizes the fit of a marginal regression modelfor a nominal response using GEE.
-"""
-         + _gee_results_doc)
+        "This class summarizes the fit of a marginal regression model"
+        "for a nominal response using GEE.\n"
+        + _gee_results_doc)

     def plot_distribution(self, ax=None, exog_values=None):
         """
@@ -1312,14 +2874,60 @@ class NominalGEEResults(GEEResults):
         >>> ex = [{"sex": 1}, {"sex": 0}]
         >>> rslt.plot_distribution(exog_values=ex)
         """
-        pass

+        from statsmodels.graphics import utils as gutils

-class NominalGEEResultsWrapper(GEEResultsWrapper):
-    pass
+        if ax is None:
+            fig, ax = gutils.create_mpl_ax(ax)
+        else:
+            fig = ax.get_figure()
+
+        # If no covariate patterns are specified, create one with all
+        # variables set to their mean values.
+        if exog_values is None:
+            exog_values = [{}, ]
+
+        link = self.model.family.link.inverse
+        ncut = self.model.family.ncut

+        k = int(self.model.exog.shape[1] / ncut)
+        exog_means = self.model.exog.mean(0)[0:k]
+        exog_names = self.model.exog_names[0:k]
+        exog_names = [x.split("[")[0] for x in exog_names]

-wrap.populate_wrapper(NominalGEEResultsWrapper, NominalGEEResults)
+        params = np.reshape(self.params,
+                            (ncut, len(self.params) // ncut))
+
+        for ev in exog_values:
+
+            exog = exog_means.copy()
+
+            for k in ev.keys():
+                if k not in exog_names:
+                    raise ValueError("%s is not a variable in the model"
+                                     % k)
+
+                ii = exog_names.index(k)
+                exog[ii] = ev[k]
+
+            lpr = np.dot(params, exog)
+            pr = link(lpr)
+            pr = np.r_[pr, 1 - pr.sum()]
+
+            ax.plot(self.model.endog_values, pr, 'o-')
+
+        ax.set_xlabel("Response value")
+        ax.set_ylabel("Probability")
+        ax.set_xticks(self.model.endog_values)
+        ax.set_xticklabels(self.model.endog_values)
+        ax.set_ylim(0, 1)
+
+        return fig
+
+
+class NominalGEEResultsWrapper(GEEResultsWrapper):
+    pass
+wrap.populate_wrapper(NominalGEEResultsWrapper, NominalGEEResults)  # noqa:E305


 class _MultinomialLogit(Link):
@@ -1359,7 +2967,16 @@ class _MultinomialLogit(Link):
         prob : ndarray
             Probabilities, or expected values
         """
-        pass
+
+        expval = np.exp(lpr)
+
+        denom = 1 + np.reshape(expval, (len(expval) // self.ncut,
+                                        self.ncut)).sum(1)
+        denom = np.kron(denom, np.ones(self.ncut, dtype=np.float64))
+
+        prob = expval / denom
+
+        return prob
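A tiny numeric sketch of the pseudo-link (made-up linear predictors; the constructor taking the number of thresholds is assumed from the surrounding code): each observation's block of ncut probabilities, together with the implicit reference category, forms a proper distribution.

    import numpy as np

    link = _MultinomialLogit(2)                # two thresholds per observation
    lpr = np.array([0.5, -0.25, 1.0, 0.0])     # 2 observations x 2 cuts
    prob = link.inverse(lpr).reshape(-1, 2)
    baseline = 1 - prob.sum(1)                 # probability of the reference level
    assert np.all(baseline > 0)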


 class _Multinomial(families.Family):
@@ -1367,9 +2984,10 @@ class _Multinomial(families.Family):
     Pseudo-link function for fitting nominal multinomial models with
     GEE.  Not for use outside the GEE class.
     """
-    links = [_MultinomialLogit]
+
+    links = [_MultinomialLogit, ]
     variance = varfuncs.binary
-    safe_links = [_MultinomialLogit]
+    safe_links = [_MultinomialLogit, ]

     def __init__(self, nlevels, check_link=True):
         """
@@ -1382,6 +3000,10 @@ class _Multinomial(families.Family):
         self._check_link = check_link
         self.initialize(nlevels)

+    def initialize(self, nlevels):
+        self.ncut = nlevels - 1
+        self.link = _MultinomialLogit(self.ncut)
+

 class GEEMargins:
     """
@@ -1404,7 +3026,15 @@ class GEEMargins:
         self.results = results
         self.get_margeff(*args, **kwargs)

-    def summary_frame(self, alpha=0.05):
+    def _reset(self):
+        self._cache = {}
+
+    @cache_readonly
+    def tvalues(self):
+        _check_at_is_all(self.margeff_options)
+        return self.margeff / self.margeff_se
+
+    def summary_frame(self, alpha=.05):
         """
         Returns a DataFrame summarizing the marginal effects.

@@ -1419,9 +3049,24 @@ class GEEMargins:
         frame : DataFrames
             A DataFrame summarizing the marginal effects.
         """
-        pass
+        _check_at_is_all(self.margeff_options)
+        from pandas import DataFrame
+        names = [_transform_names[self.margeff_options['method']],
+                 'Std. Err.', 'z', 'Pr(>|z|)',
+                 'Conf. Int. Low', 'Conf. Int. Hi.']
+        ind = self.results.model.exog.var(0) != 0  # True if not a constant
+        exog_names = self.results.model.exog_names
+        var_names = [name for i, name in enumerate(exog_names) if ind[i]]
+        table = np.column_stack((self.margeff, self.margeff_se, self.tvalues,
+                                 self.pvalues, self.conf_int(alpha)))
+        return DataFrame(table, columns=names, index=var_names)

-    def conf_int(self, alpha=0.05):
+    @cache_readonly
+    def pvalues(self):
+        _check_at_is_all(self.margeff_options)
+        return stats.norm.sf(np.abs(self.tvalues)) * 2
+
+    def conf_int(self, alpha=.05):
         """
         Returns the confidence intervals of the marginal effects

@@ -1437,9 +3082,14 @@ class GEEMargins:
             An array with lower, upper confidence intervals for the marginal
             effects.
         """
-        pass
+        _check_at_is_all(self.margeff_options)
+        me_se = self.margeff_se
+        q = stats.norm.ppf(1 - alpha / 2)
+        lower = self.margeff - q * me_se
+        upper = self.margeff + q * me_se
+        return np.asarray(lzip(lower, upper))

-    def summary(self, alpha=0.05):
+    def summary(self, alpha=.05):
         """
         Returns a summary table for marginal effects

@@ -1454,4 +3104,120 @@ class GEEMargins:
         Summary : SummaryTable
             A SummaryTable instance
         """
-        pass
+        _check_at_is_all(self.margeff_options)
+        results = self.results
+        model = results.model
+        title = model.__class__.__name__ + " Marginal Effects"
+        method = self.margeff_options['method']
+        top_left = [('Dep. Variable:', [model.endog_names]),
+                    ('Method:', [method]),
+                    ('At:', [self.margeff_options['at']]), ]
+
+        from statsmodels.iolib.summary import (Summary, summary_params,
+                                               table_extend)
+        exog_names = model.exog_names[:]  # copy
+        smry = Summary()
+
+        const_idx = model.data.const_idx
+        if const_idx is not None:
+            exog_names.pop(const_idx)
+
+        J = int(getattr(model, "J", 1))
+        if J > 1:
+            yname, yname_list = results._get_endog_name(model.endog_names,
+                                                        None, all=True)
+        else:
+            yname = model.endog_names
+            yname_list = [yname]
+
+        smry.add_table_2cols(self, gleft=top_left, gright=[],
+                             yname=yname, xname=exog_names, title=title)
+
+        # NOTE: add_table_params is not general enough yet for margeff
+        # could use a refactor with getattr instead of hard-coded params
+        # tvalues etc.
+        table = []
+        conf_int = self.conf_int(alpha)
+        margeff = self.margeff
+        margeff_se = self.margeff_se
+        tvalues = self.tvalues
+        pvalues = self.pvalues
+        if J > 1:
+            for eq in range(J):
+                restup = (results, margeff[:, eq], margeff_se[:, eq],
+                          tvalues[:, eq], pvalues[:, eq], conf_int[:, :, eq])
+                tble = summary_params(restup, yname=yname_list[eq],
+                                      xname=exog_names, alpha=alpha,
+                                      use_t=False,
+                                      skip_header=True)
+                tble.title = yname_list[eq]
+                # overwrite coef with method name
+                header = ['', _transform_names[method], 'std err', 'z',
+                          'P>|z|',
+                          '[%3.1f%% Conf. Int.]' % (100 - alpha * 100)]
+                tble.insert_header_row(0, header)
+                # from IPython.core.debugger import Pdb; Pdb().set_trace()
+                table.append(tble)
+
+            table = table_extend(table, keep_headers=True)
+        else:
+            restup = (results, margeff, margeff_se, tvalues, pvalues, conf_int)
+            table = summary_params(restup, yname=yname, xname=exog_names,
+                                   alpha=alpha, use_t=False, skip_header=True)
+            header = ['', _transform_names[method], 'std err', 'z',
+                      'P>|z|', '[%3.1f%% Conf. Int.]' % (100 - alpha * 100)]
+            table.insert_header_row(0, header)
+
+        smry.tables.append(table)
+        return smry
+
+    def get_margeff(self, at='overall', method='dydx', atexog=None,
+                    dummy=False, count=False):
+
+        self._reset()  # always reset the cache when this is called
+        # TODO: if at is not all or overall, we can also put atexog values
+        # in summary table head
+        method = method.lower()
+        at = at.lower()
+        _check_margeff_args(at, method)
+        self.margeff_options = dict(method=method, at=at)
+        results = self.results
+        model = results.model
+        params = results.params
+        exog = model.exog.copy()  # copy because values are changed
+        effects_idx = exog.var(0) != 0
+        const_idx = model.data.const_idx
+
+        if dummy:
+            _check_discrete_args(at, method)
+            dummy_idx, dummy = _get_dummy_index(exog, const_idx)
+        else:
+            dummy_idx = None
+
+        if count:
+            _check_discrete_args(at, method)
+            count_idx, count = _get_count_index(exog, const_idx)
+        else:
+            count_idx = None
+
+        # get the exogenous variables
+        exog = _get_margeff_exog(exog, at, atexog, effects_idx)
+
+        # get base marginal effects, handled by sub-classes
+        effects = model._derivative_exog(params, exog, method,
+                                         dummy_idx, count_idx)
+        effects = _effects_at(effects, at)
+
+        if at == 'all':
+            self.margeff = effects[:, effects_idx]
+        else:
+            # Set standard error of the marginal effects by Delta method.
+            margeff_cov, margeff_se = margeff_cov_with_se(
+                model, params, exog, results.cov_params(), at,
+                model._derivative_exog, dummy_idx, count_idx,
+                method, 1)
+
+            # do not care about at constant
+            self.margeff_cov = margeff_cov[effects_idx][:, effects_idx]
+            self.margeff_se = margeff_se[effects_idx]
+            self.margeff = effects[effects_idx]
diff --git a/statsmodels/genmod/generalized_linear_model.py b/statsmodels/genmod/generalized_linear_model.py
index 73e6e39d9..849e7c61a 100644
--- a/statsmodels/genmod/generalized_linear_model.py
+++ b/statsmodels/genmod/generalized_linear_model.py
@@ -18,36 +18,71 @@ McCullagh, P. and Nelder, J.A.  1989.  "Generalized Linear Models." 2nd ed.
     Chapman & Hall, Boca Rotan.
 """
 from statsmodels.compat.pandas import Appender
+
 import warnings
+
 import numpy as np
 from numpy.linalg.linalg import LinAlgError
+
 import statsmodels.base.model as base
 import statsmodels.base.wrapper as wrap
+
 from statsmodels.base import _prediction_inference as pred
 from statsmodels.base._prediction_inference import PredictionResultsMean
 import statsmodels.base._parameter_inference as pinfer
-from statsmodels.graphics._regressionplots_doc import _plot_added_variable_doc, _plot_ceres_residuals_doc, _plot_partial_residuals_doc
+
+from statsmodels.graphics._regressionplots_doc import (
+    _plot_added_variable_doc,
+    _plot_ceres_residuals_doc,
+    _plot_partial_residuals_doc,
+)
 import statsmodels.regression._tools as reg_tools
 import statsmodels.regression.linear_model as lm
-from statsmodels.tools.decorators import cache_readonly, cached_data, cached_value
+from statsmodels.tools.decorators import (
+    cache_readonly,
+    cached_data,
+    cached_value,
+)
 from statsmodels.tools.docstring import Docstring
-from statsmodels.tools.sm_exceptions import DomainWarning, HessianInversionWarning, PerfectSeparationWarning
+from statsmodels.tools.sm_exceptions import (
+    DomainWarning,
+    HessianInversionWarning,
+    PerfectSeparationWarning,
+)
 from statsmodels.tools.validation import float_like
+
+# need import in module instead of lazily to copy `__doc__`
 from . import families
+
+
 __all__ = ['GLM', 'PredictionResultsMean']


+def _check_convergence(criterion, iteration, atol, rtol):
+    return np.allclose(criterion[iteration], criterion[iteration + 1],
+                       atol=atol, rtol=rtol)
+
+
+# Remove after 0.13 when bic changes to bic llf
 class _ModuleVariable:
     _value = None

+    @property
+    def use_bic_llf(self):
+        return self._value
+
+    def set_use_bic_llf(self, val):
+        if val not in (True, False, None):
+            raise ValueError("Must be True, False or None")
+        self._value = bool(val) if val is not None else val
+

 _use_bic_helper = _ModuleVariable()
 SET_USE_BIC_LLF = _use_bic_helper.set_use_bic_llf


 class GLM(base.LikelihoodModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Generalized Linear Models

     GLM inherits from statsmodels.base.model.LikelihoodModel
@@ -256,70 +291,164 @@ class GLM(base.LikelihoodModel):
     interpretation. The loglikelihood is not correctly specified in this case,
     and statistics based on it, such as AIC or likelihood ratio tests, are not
     appropriate.
-    """
-         % {'extra_params': base._missing_param_doc})
+    """ % {'extra_params': base._missing_param_doc}
+    # Maximum number of endogenous variables when using a formula
     _formula_max_endog = 2

-    def __init__(self, endog, exog, family=None, offset=None, exposure=None,
-        freq_weights=None, var_weights=None, missing='none', **kwargs):
+    def __init__(self, endog, exog, family=None, offset=None,
+                 exposure=None, freq_weights=None, var_weights=None,
+                 missing='none', **kwargs):
+
         if type(self) is GLM:
             self._check_kwargs(kwargs, ['n_trials'])
-        if family is not None and not isinstance(family.link, tuple(family.
-            safe_links)):
-            warnings.warn(
-                f'The {type(family.link).__name__} link function does not respect the domain of the {type(family).__name__} family.'
-                , DomainWarning)
+
+        if (family is not None) and not isinstance(family.link,
+                                                   tuple(family.safe_links)):
+
+            warnings.warn((f"The {type(family.link).__name__} link function "
+                           "does not respect the domain of the "
+                           f"{type(family).__name__} family."),
+                          DomainWarning)
+
         if exposure is not None:
             exposure = np.log(exposure)
-        if offset is not None:
+        if offset is not None:  # this should probably be done upstream
             offset = np.asarray(offset)
+
         if freq_weights is not None:
             freq_weights = np.asarray(freq_weights)
         if var_weights is not None:
             var_weights = np.asarray(var_weights)
+
         self.freq_weights = freq_weights
         self.var_weights = var_weights
-        super(GLM, self).__init__(endog, exog, missing=missing, offset=
-            offset, exposure=exposure, freq_weights=freq_weights,
-            var_weights=var_weights, **kwargs)
+
+        super(GLM, self).__init__(endog, exog, missing=missing,
+                                  offset=offset, exposure=exposure,
+                                  freq_weights=freq_weights,
+                                  var_weights=var_weights, **kwargs)
         self._check_inputs(family, self.offset, self.exposure, self.endog,
-            self.freq_weights, self.var_weights)
+                           self.freq_weights, self.var_weights)
         if offset is None:
             delattr(self, 'offset')
         if exposure is None:
             delattr(self, 'exposure')
+
         self.nobs = self.endog.shape[0]
+
+        # things to remove_data
         self._data_attr.extend(['weights', 'mu', 'freq_weights',
-            'var_weights', 'iweights', '_offset_exposure', 'n_trials'])
+                                'var_weights', 'iweights', '_offset_exposure',
+                                'n_trials'])
+        # register kwds for __init__, offset and exposure are added by super
         self._init_keys.append('family')
+
         self._setup_binomial()
+        # internal usage for recreating a model
         if 'n_trials' in kwargs:
             self.n_trials = kwargs['n_trials']
-        offset_exposure = 0.0
+
+        # Construct a combined offset/exposure term.  Note that
+        # exposure has already been logged if present.
+        offset_exposure = 0.
         if hasattr(self, 'offset'):
             offset_exposure = self.offset
         if hasattr(self, 'exposure'):
             offset_exposure = offset_exposure + self.exposure
         self._offset_exposure = offset_exposure
+
         self.scaletype = None

     def initialize(self):
         """
         Initialize a generalized linear model.
         """
-        pass
+        self.df_model = np.linalg.matrix_rank(self.exog) - 1
+
+        if (self.freq_weights is not None) and \
+           (self.freq_weights.shape[0] == self.endog.shape[0]):
+            self.wnobs = self.freq_weights.sum()
+            self.df_resid = self.wnobs - self.df_model - 1
+        else:
+            self.wnobs = self.exog.shape[0]
+            self.df_resid = self.exog.shape[0] - self.df_model - 1
+
+    def _check_inputs(self, family, offset, exposure, endog, freq_weights,
+                      var_weights):

-    def loglike_mu(self, mu, scale=1.0):
+        # Default family is Gaussian
+        if family is None:
+            family = families.Gaussian()
+        self.family = family
+
+        if exposure is not None:
+            if not isinstance(self.family.link, families.links.Log):
+                raise ValueError("exposure can only be used with the log "
+                                 "link function")
+            elif exposure.shape[0] != endog.shape[0]:
+                raise ValueError("exposure is not the same length as endog")
+
+        if offset is not None:
+            if offset.shape[0] != endog.shape[0]:
+                raise ValueError("offset is not the same length as endog")
+
+        if freq_weights is not None:
+            if freq_weights.shape[0] != endog.shape[0]:
+                raise ValueError("freq weights not the same length as endog")
+            if len(freq_weights.shape) > 1:
+                raise ValueError("freq weights has too many dimensions")
+
+        # internal flag to store whether freq_weights were not None
+        self._has_freq_weights = (self.freq_weights is not None)
+        if self.freq_weights is None:
+            self.freq_weights = np.ones((endog.shape[0]))
+            # TODO: check do we want to keep None as sentinel for freq_weights
+
+        if np.shape(self.freq_weights) == () and self.freq_weights > 1:
+            self.freq_weights = (self.freq_weights *
+                                 np.ones((endog.shape[0])))
+
+        if var_weights is not None:
+            if var_weights.shape[0] != endog.shape[0]:
+                raise ValueError("var weights not the same length as endog")
+            if len(var_weights.shape) > 1:
+                raise ValueError("var weights has too many dimensions")
+
+        # internal flag to store whether var_weights were not None
+        self._has_var_weights = (var_weights is not None)
+        if var_weights is None:
+            self.var_weights = np.ones((endog.shape[0]))
+            # TODO: check do we want to keep None as sentinel for var_weights
+        self.iweights = np.asarray(self.freq_weights * self.var_weights)
+
+    def _get_init_kwds(self):
+        # this is a temporary fixup because exposure has been transformed
+        # see #1609, copied from discrete_model.CountModel
+        kwds = super(GLM, self)._get_init_kwds()
+        if 'exposure' in kwds and kwds['exposure'] is not None:
+            kwds['exposure'] = np.exp(kwds['exposure'])
+        return kwds
+
+    def loglike_mu(self, mu, scale=1.):
         """
         Evaluate the log-likelihood for a generalized linear model.
         """
-        pass
+        scale = float_like(scale, "scale")
+        return self.family.loglike(self.endog, mu, self.var_weights,
+                                   self.freq_weights, scale)

     def loglike(self, params, scale=None):
         """
         Evaluate the log-likelihood for a generalized linear model.
         """
-        pass
+        scale = float_like(scale, "scale", optional=True)
+        lin_pred = np.dot(self.exog, params) + self._offset_exposure
+        expval = self.family.link.inverse(lin_pred)
+        if scale is None:
+            scale = self.estimate_scale(expval)
+        llf = self.family.loglike(self.endog, expval, self.var_weights,
+                                  self.freq_weights, scale)
+        return llf

     def score_obs(self, params, scale=None):
         """score first derivative of the loglikelihood for each observation.
@@ -339,7 +468,9 @@ class GLM(base.LikelihoodModel):
             The first derivative of the loglikelihood function evaluated at
             params for each observation.
         """
-        pass
+        scale = float_like(scale, "scale", optional=True)
+        score_factor = self.score_factor(params, scale=scale)
+        return score_factor[:, None] * self.exog

     def score(self, params, scale=None):
         """score, first derivative of the loglikelihood function
@@ -359,7 +490,9 @@ class GLM(base.LikelihoodModel):
             The first derivative of the loglikelihood function calculated as
             the sum of `score_obs`
         """
-        pass
+        scale = float_like(scale, "scale", optional=True)
+        score_factor = self.score_factor(params, scale=scale)
+        return np.dot(score_factor, self.exog)

     def score_factor(self, params, scale=None):
         """weights for score for each observation
@@ -381,7 +514,19 @@ class GLM(base.LikelihoodModel):
             A 1d weight vector used in the calculation of the score_obs.
             The score_obs are obtained by `score_factor[:, None] * exog`
         """
-        pass
+        scale = float_like(scale, "scale", optional=True)
+        mu = self.predict(params)
+        if scale is None:
+            scale = self.estimate_scale(mu)
+
+        score_factor = (self.endog - mu) / self.family.link.deriv(mu)
+        score_factor /= self.family.variance(mu)
+        score_factor *= self.iweights * self.n_trials
+
+        if not scale == 1:
+            score_factor /= scale
+
+        return score_factor

     def hessian_factor(self, params, scale=None, observed=True):
         """Weights for calculating Hessian
@@ -404,7 +549,42 @@ class GLM(base.LikelihoodModel):
             A 1d weight vector used in the calculation of the Hessian.
             The hessian is obtained by `(exog.T * hessian_factor).dot(exog)`
         """
-        pass
+
+        # calculating eim_factor
+        mu = self.predict(params)
+        if scale is None:
+            scale = self.estimate_scale(mu)
+
+        eim_factor = 1 / (self.family.link.deriv(mu)**2 *
+                          self.family.variance(mu))
+        eim_factor *= self.iweights * self.n_trials
+
+        if not observed:
+            if not scale == 1:
+                eim_factor /= scale
+            return eim_factor
+
+        # calculating oim_factor, eim_factor is with scale=1
+
+        score_factor = self.score_factor(params, scale=1.)
+        if eim_factor.ndim > 1 or score_factor.ndim > 1:
+            raise RuntimeError('something wrong')
+
+        tmp = self.family.variance(mu) * self.family.link.deriv2(mu)
+        tmp += self.family.variance.deriv(mu) * self.family.link.deriv(mu)
+
+        tmp = score_factor * tmp
+        # correct for duplicate iweights in oim_factor and score_factor
+        tmp /= self.iweights * self.n_trials
+        oim_factor = eim_factor * (1 + tmp)
+
+        if tmp.ndim > 1:
+            raise RuntimeError('something wrong')
+
+        if not scale == 1:
+            oim_factor /= scale
+
+        return oim_factor

     def hessian(self, params, scale=None, observed=None):
         """Hessian, second derivative of loglikelihood function
@@ -426,30 +606,74 @@ class GLM(base.LikelihoodModel):
         hessian : ndarray
             Hessian, i.e. observed information, or expected information matrix.
         """
-        pass
+        if observed is None:
+            if getattr(self, '_optim_hessian', None) == 'eim':
+                observed = False
+            else:
+                observed = True
+        scale = float_like(scale, "scale", optional=True)
+        tmp = getattr(self, '_tmp_like_exog', np.empty_like(self.exog, dtype=float))
+
+        factor = self.hessian_factor(params, scale=scale, observed=observed)
+        np.multiply(self.exog.T, factor, out=tmp.T)
+        return -tmp.T.dot(self.exog)

     def information(self, params, scale=None):
         """
         Fisher information matrix.
         """
-        pass
+        scale = float_like(scale, "scale", optional=True)
+        return self.hessian(params, scale=scale, observed=False)

-    def _derivative_exog(self, params, exog=None, transform='dydx',
-        dummy_idx=None, count_idx=None, offset=None, exposure=None):
+    def _derivative_exog(self, params, exog=None, transform="dydx",
+                         dummy_idx=None, count_idx=None,
+                         offset=None, exposure=None):
         """
         Derivative of mean, expected endog with respect to the parameters
         """
-        pass
+        if exog is None:
+            exog = self.exog
+        if (offset is not None) or (exposure is not None):
+            raise NotImplementedError("offset and exposure not supported")
+
+        lin_pred = self.predict(params, exog, which="linear",
+                                offset=offset, exposure=exposure)
+
+        k_extra = getattr(self, 'k_extra', 0)
+        params_exog = params if k_extra == 0 else params[:-k_extra]
+
+        margeff = (self.family.link.inverse_deriv(lin_pred)[:, None] *
+                   params_exog)
+        if 'ex' in transform:
+            margeff *= exog
+        if 'ey' in transform:
+            mean = self.family.link.inverse(lin_pred)
+            margeff /= mean[:, None]
+
+        return self._derivative_exog_helper(margeff, params, exog,
+                                            dummy_idx, count_idx, transform)

     def _derivative_exog_helper(self, margeff, params, exog, dummy_idx,
-        count_idx, transform):
+                                count_idx, transform):
         """
         Helper for _derivative_exog to wrap results appropriately
         """
-        pass
+        from statsmodels.discrete.discrete_margins import (
+            _get_count_effects,
+            _get_dummy_effects,
+            )
+
+        if count_idx is not None:
+            margeff = _get_count_effects(margeff, exog, count_idx, transform,
+                                         self, params)
+        if dummy_idx is not None:
+            margeff = _get_dummy_effects(margeff, exog, dummy_idx, transform,
+                                         self, params)
+
+        return margeff

     def _derivative_predict(self, params, exog=None, transform='dydx',
-        offset=None, exposure=None):
+                            offset=None, exposure=None):
         """
         Derivative of the expected endog with respect to the parameters.

@@ -468,7 +692,22 @@ class GLM(base.LikelihoodModel):
         The value of the derivative of the expected endog with respect
         to the parameter vector.
         """
-        pass
+        # core part is same as derivative_mean_params
+        # additionally handles exog and transform
+        if exog is None:
+            exog = self.exog
+        if (offset is not None) or (exposure is not None) or (
+                getattr(self, 'offset', None) is not None):
+            raise NotImplementedError("offset and exposure not supported")
+
+        lin_pred = self.predict(params, exog=exog, which="linear")
+        idl = self.family.link.inverse_deriv(lin_pred)
+        dmat = exog * idl[:, None]
+        if 'ey' in transform:
+            mean = self.family.link.inverse(lin_pred)
+            dmat /= mean[:, None]
+
+        return dmat

     def _deriv_mean_dparams(self, params):
         """
@@ -484,7 +723,10 @@ class GLM(base.LikelihoodModel):
         The value of the derivative of the expected endog with respect
         to the parameter vector.
         """
-        pass
+        lin_pred = self.predict(params, which="linear")
+        idl = self.family.link.inverse_deriv(lin_pred)
+        dmat = self.exog * idl[:, None]
+        return dmat

     def _deriv_score_obs_dendog(self, params, scale=None):
         """derivative of score_obs w.r.t. endog
@@ -505,10 +747,22 @@ class GLM(base.LikelihoodModel):
             can is given by `score_factor0[:, None] * exog` where
             `score_factor0` is the score_factor without the residual.
         """
-        pass
+        scale = float_like(scale, "scale", optional=True)
+        mu = self.predict(params)
+        if scale is None:
+            scale = self.estimate_scale(mu)
+
+        score_factor = 1 / self.family.link.deriv(mu)
+        score_factor /= self.family.variance(mu)
+        score_factor *= self.iweights * self.n_trials
+
+        if not scale == 1:
+            score_factor /= scale

-    def score_test(self, params_constrained, k_constraints=None, exog_extra
-        =None, observed=True):
+        return score_factor[:, None] * self.exog
+
+    def score_test(self, params_constrained, k_constraints=None,
+                   exog_extra=None, observed=True):
         """score test for restrictions or for omitted variables

         The covariance matrix for the score is based on the Hessian, i.e.
@@ -548,13 +802,47 @@ class GLM(base.LikelihoodModel):
         -----
         not yet verified for case with scale not equal to 1.
         """
-        pass
+
+        if exog_extra is None:
+            if k_constraints is None:
+                raise ValueError('if exog_extra is None, then k_constraints'
+                                 'needs to be given')
+
+            score = self.score(params_constrained)
+            hessian = self.hessian(params_constrained, observed=observed)
+
+        else:
+            # exog_extra = np.asarray(exog_extra)
+            if k_constraints is None:
+                k_constraints = 0
+
+            ex = np.column_stack((self.exog, exog_extra))
+            k_constraints += ex.shape[1] - self.exog.shape[1]
+
+            score_factor = self.score_factor(params_constrained)
+            score = (score_factor[:, None] * ex).sum(0)
+            hessian_factor = self.hessian_factor(params_constrained,
+                                                 observed=observed)
+            hessian = -np.dot(ex.T * hessian_factor, ex)
+
+        from scipy import stats
+
+        # TODO check sign, why minus?
+        chi2stat = -score.dot(np.linalg.solve(hessian, score[:, None]))
+        pval = stats.chi2.sf(chi2stat, k_constraints)
+        # return a stats results instance instead?  Contrast?
+        return chi2stat, pval, k_constraints
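A hedged usage sketch of the omitted-variables form of the score test (synthetic Poisson data; names illustrative, not part of the patch):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=(500, 2))
    y = rng.poisson(np.exp(0.3 * x[:, 0]))

    # Fit the restricted model, then ask whether the omitted column belongs in it.
    restricted = sm.GLM(y, sm.add_constant(x[:, :1]), family=sm.families.Poisson())
    res = restricted.fit()
    chi2, pval, df = restricted.score_test(res.params, exog_extra=x[:, 1:])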

     def _update_history(self, tmp_result, mu, history):
         """
         Helper method to update history during iterative fit.
         """
-        pass
+        history['params'].append(tmp_result.params)
+        history['deviance'].append(self.family.deviance(self.endog, mu,
+                                                        self.var_weights,
+                                                        self.freq_weights,
+                                                        self.scale))
+        return history

     def estimate_scale(self, mu):
         """
@@ -581,9 +869,35 @@ class GLM(base.LikelihoodModel):
         --------
         statsmodels.genmod.generalized_linear_model.GLM.fit
         """
-        pass
+        if not self.scaletype:
+            if isinstance(self.family, (families.Binomial, families.Poisson,
+                                        families.NegativeBinomial)):
+                return 1.
+            else:
+                return self._estimate_x2_scale(mu)
+
+        if isinstance(self.scaletype, float):
+            return np.array(self.scaletype)
+
+        if isinstance(self.scaletype, str):
+            if self.scaletype.lower() == 'x2':
+                return self._estimate_x2_scale(mu)
+            elif self.scaletype.lower() == 'dev':
+                return (self.family.deviance(self.endog, mu, self.var_weights,
+                                             self.freq_weights, 1.) /
+                        (self.df_resid))
+            else:
+                raise ValueError("Scale %s with type %s not understood" %
+                                 (self.scaletype, type(self.scaletype)))
+        else:
+            raise ValueError("Scale %s with type %s not understood" %
+                             (self.scaletype, type(self.scaletype)))

-    def estimate_tweedie_power(self, mu, method='brentq', low=1.01, high=5.0):
+    def _estimate_x2_scale(self, mu):
+        resid = np.power(self.endog - mu, 2) * self.iweights
+        return np.sum(resid / self.family.variance(mu)) / self.df_resid
+
+    def estimate_tweedie_power(self, mu, method='brentq', low=1.01, high=5.):
         """
         Tweedie specific function to estimate scale and the variance parameter.
         The variance parameter is also referred to as p, xi, or shape.
@@ -607,10 +921,22 @@ class GLM(base.LikelihoodModel):
         power : float
             The estimated shape or power.
         """
-        pass
+        if method == 'brentq':
+            from scipy.optimize import brentq
+
+            def psi_p(power, mu):
+                scale = ((self.iweights * (self.endog - mu) ** 2 /
+                          (mu ** power)).sum() / self.df_resid)
+                return (np.sum(self.iweights * ((self.endog - mu) ** 2 /
+                               (scale * (mu ** power)) - 1) *
+                               np.log(mu)) / self.freq_weights.sum())
+            power = brentq(psi_p, low, high, args=(mu))
+        else:
+            raise NotImplementedError('Only brentq can currently be used')
+        return power

-    def predict(self, params, exog=None, exposure=None, offset=None, which=
-        'mean', linear=None):
+    def predict(self, params, exog=None, exposure=None, offset=None,
+                which="mean", linear=None):
         """
         Return predicted values for a design matrix

@@ -654,10 +980,50 @@ class GLM(base.LikelihoodModel):

         Exposure values must be strictly positive.
         """
-        pass
+        if linear is not None:
+            msg = 'linear keyword is deprecated, use which="linear"'
+            warnings.warn(msg, FutureWarning)
+            if linear is True:
+                which = "linear"
+
+        # Use fit offset if appropriate
+        if offset is None and exog is None and hasattr(self, 'offset'):
+            offset = self.offset
+        elif offset is None:
+            offset = 0.
+
+        if exposure is not None and not isinstance(self.family.link,
+                                                   families.links.Log):
+            raise ValueError("exposure can only be used with the log link "
+                             "function")
+
+        # Use fit exposure if appropriate
+        if exposure is None and exog is None and hasattr(self, 'exposure'):
+            # Already logged
+            exposure = self.exposure
+        elif exposure is None:
+            exposure = 0.
+        else:
+            exposure = np.log(np.asarray(exposure))
+
+        if exog is None:
+            exog = self.exog
+
+        linpred = np.dot(exog, params) + offset + exposure
+
+        if which == "mean":
+            return self.family.fitted(linpred)
+        elif which == "linear":
+            return linpred
+        elif which == "var_unscaled":
+            mean = self.family.fitted(linpred)
+            var_ = self.family.variance(mean)
+            return var_
+        else:
+            raise ValueError(f'The which value "{which}" is not recognized')
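A brief sketch (synthetic data, illustrative names) of how the "linear" and "mean" prediction types relate through the inverse link:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.normal(size=(100, 1)))
    y = rng.poisson(np.exp(0.5 + 0.3 * x[:, 1]))
    res = sm.GLM(y, x, family=sm.families.Poisson()).fit()

    lin = res.model.predict(res.params, which="linear")
    mean = res.model.predict(res.params, which="mean")
    assert np.allclose(mean, np.exp(lin))      # log link: mean = exp(linear predictor)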

     def get_distribution(self, params, scale=None, exog=None, exposure=None,
-        offset=None, var_weights=1.0, n_trials=1.0):
+                         offset=None, var_weights=1., n_trials=1.):
         """
         Return a instance of the predictive distribution.

@@ -694,11 +1060,38 @@ class GLM(base.LikelihoodModel):
         to fit the model.  If any other value is used for ``n``, misleading
         results will be produced.
         """
-        pass
+        scale = float_like(scale, "scale", optional=True)
+        # use scale=1, independent of QMLE scale for discrete
+        if isinstance(self.family, (families.Binomial, families.Poisson,
+                                    families.NegativeBinomial)):
+            scale = 1.
+
+        mu = self.predict(params, exog, exposure, offset, which="mean")
+
+        kwds = {}
+        if (np.any(n_trials != 1) and
+                isinstance(self.family, families.Binomial)):

-    def fit(self, start_params=None, maxiter=100, method='IRLS', tol=1e-08,
-        scale=None, cov_type='nonrobust', cov_kwds=None, use_t=None,
-        full_output=True, disp=False, max_start_irls=3, **kwargs):
+            kwds["n_trials"] = n_trials
+
+        distr = self.family.get_distribution(mu, scale,
+                                             var_weights=var_weights, **kwds)
+        return distr
+
+    def _setup_binomial(self):
+        # this checks what kind of data is given for Binomial.
+        # family will need a reference to endog if this is to be removed from
+        # preprocessing
+        self.n_trials = np.ones((self.endog.shape[0]))  # For binomial
+        if isinstance(self.family, families.Binomial):
+            tmp = self.family.initialize(self.endog, self.freq_weights)
+            self.endog = tmp[0]
+            self.n_trials = tmp[1]
+            self._init_keys.append('n_trials')
+
+    def fit(self, start_params=None, maxiter=100, method='IRLS', tol=1e-8,
+            scale=None, cov_type='nonrobust', cov_kwds=None, use_t=None,
+            full_output=True, disp=False, max_start_irls=3, **kwargs):
         """
         Fits a generalized linear model for a given family.

@@ -777,28 +1170,208 @@ class GLM(base.LikelihoodModel):
         instance of the IRLS iteration is attached to the results instance
         as `results_wls` attribute.
         """
-        pass
-
-    def _fit_gradient(self, start_params=None, method='newton', maxiter=100,
-        tol=1e-08, full_output=True, disp=True, scale=None, cov_type=
-        'nonrobust', cov_kwds=None, use_t=None, max_start_irls=3, **kwargs):
+        if isinstance(scale, str):
+            scale = scale.lower()
+            if scale not in ("x2", "dev"):
+                raise ValueError(
+                    "scale must be either X2 or dev when a string."
+                )
+        elif scale is not None:
+            # GH-6627
+            try:
+                scale = float(scale)
+            except Exception as exc:
+                raise type(exc)(
+                    "scale must be a float if given and no a string."
+                )
+        self.scaletype = scale
+
+        if method.lower() == "irls":
+            if cov_type.lower() == 'eim':
+                cov_type = 'nonrobust'
+            return self._fit_irls(start_params=start_params, maxiter=maxiter,
+                                  tol=tol, scale=scale, cov_type=cov_type,
+                                  cov_kwds=cov_kwds, use_t=use_t, **kwargs)
+        else:
+            self._optim_hessian = kwargs.get('optim_hessian')
+            if self._optim_hessian is not None:
+                del kwargs['optim_hessian']
+            self._tmp_like_exog = np.empty_like(self.exog, dtype=float)
+            fit_ = self._fit_gradient(start_params=start_params,
+                                      method=method,
+                                      maxiter=maxiter,
+                                      tol=tol, scale=scale,
+                                      full_output=full_output,
+                                      disp=disp, cov_type=cov_type,
+                                      cov_kwds=cov_kwds, use_t=use_t,
+                                      max_start_irls=max_start_irls,
+                                      **kwargs)
+            del self._optim_hessian
+            del self._tmp_like_exog
+            return fit_
+
+    def _fit_gradient(self, start_params=None, method="newton",
+                      maxiter=100, tol=1e-8, full_output=True,
+                      disp=True, scale=None, cov_type='nonrobust',
+                      cov_kwds=None, use_t=None, max_start_irls=3,
+                      **kwargs):
         """
         Fits a generalized linear model for a given family iteratively
         using the scipy gradient optimizers.
         """
-        pass

-    def _fit_irls(self, start_params=None, maxiter=100, tol=1e-08, scale=
-        None, cov_type='nonrobust', cov_kwds=None, use_t=None, **kwargs):
+        # fix scale during optimization, see #4616
+        scaletype = self.scaletype
+        self.scaletype = 1.
+
+        if (max_start_irls > 0) and (start_params is None):
+            irls_rslt = self._fit_irls(start_params=start_params,
+                                       maxiter=max_start_irls,
+                                       tol=tol, scale=1., cov_type='nonrobust',
+                                       cov_kwds=None, use_t=None,
+                                       **kwargs)
+            start_params = irls_rslt.params
+            del irls_rslt
+        rslt = super(GLM, self).fit(start_params=start_params,
+                                    maxiter=maxiter, full_output=full_output,
+                                    method=method, disp=disp, **kwargs)
+
+        # reset scaletype to original
+        self.scaletype = scaletype
+
+        mu = self.predict(rslt.params)
+        scale = self.estimate_scale(mu)
+
+        if rslt.normalized_cov_params is None:
+            cov_p = None
+        else:
+            cov_p = rslt.normalized_cov_params / scale
+
+        if cov_type.lower() == 'eim':
+            oim = False
+            cov_type = 'nonrobust'
+        else:
+            oim = True
+
+        try:
+            cov_p = np.linalg.inv(-self.hessian(rslt.params, observed=oim)) / scale
+        except LinAlgError:
+            warnings.warn('Inverting hessian failed, no bse or cov_params '
+                          'available', HessianInversionWarning)
+            cov_p = None
+
+        results_class = getattr(self, '_results_class', GLMResults)
+        results_class_wrapper = getattr(self, '_results_class_wrapper', GLMResultsWrapper)
+        glm_results = results_class(self, rslt.params,
+                                    cov_p,
+                                    scale,
+                                    cov_type=cov_type, cov_kwds=cov_kwds,
+                                    use_t=use_t)
+
+        # TODO: iteration count is not always available
+        history = {'iteration': 0}
+        if full_output:
+            glm_results.mle_retvals = rslt.mle_retvals
+            if 'iterations' in rslt.mle_retvals:
+                history['iteration'] = rslt.mle_retvals['iterations']
+        glm_results.method = method
+        glm_results.fit_history = history
+
+        return results_class_wrapper(glm_results)
+
+    def _fit_irls(self, start_params=None, maxiter=100, tol=1e-8,
+                  scale=None, cov_type='nonrobust', cov_kwds=None,
+                  use_t=None, **kwargs):
         """
         Fits a generalized linear model for a given family using
         iteratively reweighted least squares (IRLS).
         """
-        pass
-
-    def fit_regularized(self, method='elastic_net', alpha=0.0, start_params
-        =None, refit=False, opt_method='bfgs', **kwargs):
-        """
+        attach_wls = kwargs.pop('attach_wls', False)
+        atol = kwargs.get('atol')
+        rtol = kwargs.get('rtol', 0.)
+        tol_criterion = kwargs.get('tol_criterion', 'deviance')
+        wls_method = kwargs.get('wls_method', 'lstsq')
+        atol = tol if atol is None else atol
+
+        endog = self.endog
+        wlsexog = self.exog
+        if start_params is None:
+            start_params = np.zeros(self.exog.shape[1])
+            mu = self.family.starting_mu(self.endog)
+            lin_pred = self.family.predict(mu)
+        else:
+            lin_pred = np.dot(wlsexog, start_params) + self._offset_exposure
+            mu = self.family.fitted(lin_pred)
+        self.scale = self.estimate_scale(mu)
+        dev = self.family.deviance(self.endog, mu, self.var_weights,
+                                   self.freq_weights, self.scale)
+        if np.isnan(dev):
+            raise ValueError("The first guess on the deviance function "
+                             "returned a nan.  This could be a boundary "
+                             " problem and should be reported.")
+
+        # first guess on the deviance is assumed to be scaled by 1.
+        # params are none to start, so they line up with the deviance
+        history = dict(params=[np.inf, start_params], deviance=[np.inf, dev])
+        converged = False
+        criterion = history[tol_criterion]
+        # This special case is used to get the likelihood for a specific
+        # params vector.
+        if maxiter == 0:
+            mu = self.family.fitted(lin_pred)
+            self.scale = self.estimate_scale(mu)
+            wls_results = lm.RegressionResults(self, start_params, None)
+            iteration = 0
+        for iteration in range(maxiter):
+            self.weights = (self.iweights * self.n_trials *
+                            self.family.weights(mu))
+            wlsendog = (lin_pred + self.family.link.deriv(mu) * (self.endog-mu)
+                        - self._offset_exposure)
+            wls_mod = reg_tools._MinimalWLS(wlsendog, wlsexog,
+                                            self.weights, check_endog=True,
+                                            check_weights=True)
+            wls_results = wls_mod.fit(method=wls_method)
+            lin_pred = np.dot(self.exog, wls_results.params)
+            lin_pred += self._offset_exposure
+            mu = self.family.fitted(lin_pred)
+            history = self._update_history(wls_results, mu, history)
+            self.scale = self.estimate_scale(mu)
+            if endog.squeeze().ndim == 1 and np.allclose(mu - endog, 0):
+                msg = ("Perfect separation or prediction detected, "
+                       "parameter may not be identified")
+                warnings.warn(msg, category=PerfectSeparationWarning)
+            converged = _check_convergence(criterion, iteration + 1, atol,
+                                           rtol)
+            if converged:
+                break
+        self.mu = mu
+
+        if maxiter > 0:  # Only if iterative used
+            wls_method2 = 'pinv' if wls_method == 'lstsq' else wls_method
+            wls_model = lm.WLS(wlsendog, wlsexog, self.weights)
+            wls_results = wls_model.fit(method=wls_method2)
+
+        glm_results = GLMResults(self, wls_results.params,
+                                 wls_results.normalized_cov_params,
+                                 self.scale,
+                                 cov_type=cov_type, cov_kwds=cov_kwds,
+                                 use_t=use_t)
+
+        glm_results.method = "IRLS"
+        glm_results.mle_settings = {}
+        glm_results.mle_settings['wls_method'] = wls_method
+        glm_results.mle_settings['optimizer'] = glm_results.method
+        if (maxiter > 0) and (attach_wls is True):
+            glm_results.results_wls = wls_results
+        history['iteration'] = iteration + 1
+        glm_results.fit_history = history
+        glm_results.converged = converged
+        return GLMResultsWrapper(glm_results)
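`_fit_irls` pulls its solver and convergence options (`atol`, `rtol`, `tol_criterion`, `wls_method`, `attach_wls`) out of the extra keyword arguments forwarded by `fit`. A hedged sketch of how those keywords are passed, on the same kind of synthetic Poisson setup as above:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(200, 2)))
    y = rng.poisson(np.exp(X @ [0.5, 0.3, -0.2]))
    mod = sm.GLM(y, X, family=sm.families.Poisson())

    # tighten the deviance-based stopping rule and keep the final WLS fit
    res = mod.fit(atol=1e-10, rtol=0., tol_criterion="deviance",
                  wls_method="lstsq", attach_wls=True)
    print(res.converged, res.fit_history["iteration"])
    wls_res = res.results_wls   # attached because attach_wls=True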
+
+    def fit_regularized(self, method="elastic_net", alpha=0.,
+                        start_params=None, refit=False,
+                        opt_method="bfgs", **kwargs):
+        r"""
         Return a regularized fit to a linear regression model.

         Parameters
@@ -835,7 +1408,7 @@ class GLM(base.LikelihoodModel):

         .. math::

-            -loglike/n + alpha*((1-L1\\_wt)*|params|_2^2/2 + L1\\_wt*|params|_1)
+            -loglike/n + alpha*((1-L1\_wt)*|params|_2^2/2 + L1\_wt*|params|_1)

         where :math:`|*|_1` and :math:`|*|_2` are the L1 and L2 norms.

@@ -855,7 +1428,73 @@ class GLM(base.LikelihoodModel):
         zero_tol : float
             Coefficients below this threshold are treated as zero.
         """
-        pass
+
+        if kwargs.get("L1_wt", 1) == 0:
+            return self._fit_ridge(alpha, start_params, opt_method)
+
+        from statsmodels.base.elastic_net import fit_elasticnet
+
+        if method != "elastic_net":
+            raise ValueError("method for fit_regularized must be elastic_net")
+
+        defaults = {"maxiter": 50, "L1_wt": 1, "cnvrg_tol": 1e-10,
+                    "zero_tol": 1e-10}
+        defaults.update(kwargs)
+
+        llkw = kwargs.get("loglike_kwds", {})
+        sckw = kwargs.get("score_kwds", {})
+        hekw = kwargs.get("hess_kwds", {})
+        llkw["scale"] = 1
+        sckw["scale"] = 1
+        hekw["scale"] = 1
+        defaults["loglike_kwds"] = llkw
+        defaults["score_kwds"] = sckw
+        defaults["hess_kwds"] = hekw
+
+        result = fit_elasticnet(self, method=method,
+                                alpha=alpha,
+                                start_params=start_params,
+                                refit=refit,
+                                **defaults)
+
+        self.mu = self.predict(result.params)
+        self.scale = self.estimate_scale(self.mu)
+
+        if not result.converged:
+            warnings.warn("Elastic net fitting did not converge")
+
+        return result
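`fit_regularized` hands everything to `fit_elasticnet` unless `L1_wt == 0`, in which case the pure-ridge helper defined next takes over. A small sketch on synthetic logistic data (the penalty weight 0.05 is an arbitrary illustrative value):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(300, 5)))
    p = 1 / (1 + np.exp(-(X @ [0.0, 1.0, 0.0, 0.0, -1.0, 0.0])))
    y = rng.binomial(1, p)

    mod = sm.GLM(y, X, family=sm.families.Binomial())
    res_lasso = mod.fit_regularized(alpha=0.05, L1_wt=1.0)  # elastic-net / lasso branch
    res_ridge = mod.fit_regularized(alpha=0.05, L1_wt=0.0)  # _fit_ridge branch
    print(res_lasso.params)
    print(res_ridge.params)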
+
+    def _fit_ridge(self, alpha, start_params, method):
+
+        if start_params is None:
+            start_params = np.zeros(self.exog.shape[1])
+
+        def fun(x):
+            return -(self.loglike(x) / self.nobs - np.sum(alpha * x**2) / 2)
+
+        def grad(x):
+            return -(self.score(x) / self.nobs - alpha * x)
+
+        from scipy.optimize import minimize
+
+        from statsmodels.base.elastic_net import (
+            RegularizedResults,
+            RegularizedResultsWrapper,
+        )
+
+        mr = minimize(fun, start_params, jac=grad, method=method)
+        params = mr.x
+
+        if not mr.success:
+            ngrad = np.sqrt(np.sum(mr.jac**2))
+            msg = "GLM ridge optimization may have failed, |grad|=%f" % ngrad
+            warnings.warn(msg)
+
+        results = RegularizedResults(self, params)
+        results = RegularizedResultsWrapper(results)
+
+        return results

     def fit_constrained(self, constraints, start_params=None, **fit_kwds):
         """fit the model subject to linear equality constraints
@@ -887,11 +1526,44 @@ class GLM(base.LikelihoodModel):
         -------
         results : Results instance
         """
-        pass
+
+        from patsy import DesignInfo
+
+        from statsmodels.base._constraints import (
+            LinearConstraints,
+            fit_constrained,
+        )
+
+        # same pattern as in base.LikelihoodModel.t_test
+        lc = DesignInfo(self.exog_names).linear_constraint(constraints)
+        R, q = lc.coefs, lc.constants
+
+        # TODO: add start_params option, need access to transformation
+        #       fit_constrained needs to do the transformation
+        params, cov, res_constr = fit_constrained(self, R, q,
+                                                  start_params=start_params,
+                                                  fit_kwds=fit_kwds)
+        # create dummy results Instance, TODO: wire up properly
+        res = self.fit(start_params=params, maxiter=0)  # we get a wrapper back
+        res._results.params = params
+        res._results.cov_params_default = cov
+        cov_type = fit_kwds.get('cov_type', 'nonrobust')
+        if cov_type != 'nonrobust':
+            res._results.normalized_cov_params = cov / res_constr.scale
+        else:
+            res._results.normalized_cov_params = None
+        res._results.scale = res_constr.scale
+        k_constr = len(q)
+        res._results.df_resid += k_constr
+        res._results.df_model -= k_constr
+        res._results.constraints = LinearConstraints.from_patsy(lc)
+        res._results.k_constr = k_constr
+        res._results.results_constrained = res_constr
+        return res
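`fit_constrained` uses the same string constraint syntax as `t_test`, interpreted against `exog_names`. A brief sketch (with plain NumPy arrays the default names are `const`, `x1`, `x2`, ...):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(200, 3)))
    y = rng.poisson(np.exp(X @ [0.2, 0.5, 0.5, 0.0]))

    mod = sm.GLM(y, X, family=sm.families.Poisson())
    res = mod.fit_constrained("x1 = x2")   # force the two slope coefficients to be equal
    print(res.params)
    print(res.constraints)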


 get_prediction_doc = Docstring(pred.get_prediction_glm.__doc__)
-get_prediction_doc.remove_parameters('pred_kwds')
+get_prediction_doc.remove_parameters("pred_kwds")


 class GLMResults(base.LikelihoodModelResults):
@@ -933,9 +1605,12 @@ class GLMResults(base.LikelihoodModelResults):
     """

     def __init__(self, model, params, normalized_cov_params, scale,
-        cov_type='nonrobust', cov_kwds=None, use_t=None):
-        super(GLMResults, self).__init__(model, params,
-            normalized_cov_params=normalized_cov_params, scale=scale)
+                 cov_type='nonrobust', cov_kwds=None, use_t=None):
+        super(GLMResults, self).__init__(
+                model,
+                params,
+                normalized_cov_params=normalized_cov_params,
+                scale=scale)
         self.family = model.family
         self._endog = model.endog
         self.nobs = model.endog.shape[0]
@@ -949,36 +1624,48 @@ class GLMResults(base.LikelihoodModelResults):
         self.df_resid = model.df_resid
         self.df_model = model.df_model
         self._cache = {}
+        # are these intermediate results needed or can we just
+        # call the model's attributes?
+
+        # for remove data and pickle without large arrays
         self._data_attr.extend(['results_constrained', '_freq_weights',
-            '_var_weights', '_iweights'])
+                                '_var_weights', '_iweights'])
         self._data_in_cache.extend(['null', 'mu'])
         self._data_attr_model = getattr(self, '_data_attr_model', [])
         self._data_attr_model.append('mu')
+
+        # robust covariance
         from statsmodels.base.covtype import get_robustcov_results
         if use_t is None:
-            self.use_t = False
+            self.use_t = False    # TODO: class default
         else:
             self.use_t = use_t
-        ct = cov_type == 'nonrobust' or cov_type.upper().startswith('HC')
+
+        # temporary warning
+        ct = (cov_type == 'nonrobust') or (cov_type.upper().startswith('HC'))
         if self.model._has_freq_weights and not ct:
+
             from statsmodels.tools.sm_exceptions import SpecificationWarning
             warnings.warn('cov_type not fully supported with freq_weights',
-                SpecificationWarning)
+                          SpecificationWarning)
+
         if self.model._has_var_weights and not ct:
+
             from statsmodels.tools.sm_exceptions import SpecificationWarning
             warnings.warn('cov_type not fully supported with var_weights',
-                SpecificationWarning)
+                          SpecificationWarning)
+
         if cov_type == 'nonrobust':
             self.cov_type = 'nonrobust'
-            self.cov_kwds = {'description': 
-                'Standard Errors assume that the' +
-                ' covariance matrix of the errors is correctly ' + 'specified.'
-                }
+            self.cov_kwds = {'description': 'Standard Errors assume that the' +
+                             ' covariance matrix of the errors is correctly ' +
+                             'specified.'}
+
         else:
             if cov_kwds is None:
                 cov_kwds = {}
             get_robustcov_results(self, cov_type=cov_type, use_self=True,
-                use_t=use_t, **cov_kwds)
+                                  use_t=use_t, **cov_kwds)

     @cached_data
     def resid_response(self):
@@ -986,7 +1673,7 @@ class GLMResults(base.LikelihoodModelResults):
         Response residuals.  The response residuals are defined as
         `endog` - `fittedvalues`
         """
-        pass
+        return self._n_trials * (self._endog-self.mu)

     @cached_data
     def resid_pearson(self):
@@ -996,7 +1683,9 @@ class GLMResults(base.LikelihoodModelResults):
         specific variance function.  See statsmodels.families.family and
         statsmodels.families.varfuncs for more information.
         """
-        pass
+        return (np.sqrt(self._n_trials) * (self._endog-self.mu) *
+                np.sqrt(self._var_weights) /
+                np.sqrt(self.family.variance(self.mu)))

     @cached_data
     def resid_working(self):
@@ -1005,7 +1694,10 @@ class GLMResults(base.LikelihoodModelResults):
         `resid_response`/link'(`mu`).  See statsmodels.family.links for the
         derivatives of the link functions.  They are defined analytically.
         """
-        pass
+        # Isn't self.resid_response already adjusted by _n_trials?
+        val = (self.resid_response * self.family.link.deriv(self.mu))
+        val *= self._n_trials
+        return val

     @cached_data
     def resid_anscombe(self):
@@ -1014,7 +1706,7 @@ class GLMResults(base.LikelihoodModelResults):
         specific Anscombe residuals. Currently, the unscaled residuals are
         provided. In a future version, the scaled residuals will be provided.
         """
-        pass
+        return self.resid_anscombe_scaled

     @cached_data
     def resid_anscombe_scaled(self):
@@ -1022,7 +1714,9 @@ class GLMResults(base.LikelihoodModelResults):
         Scaled Anscombe residuals.  See statsmodels.families.family for
         distribution-specific Anscombe residuals.
         """
-        pass
+        return self.family.resid_anscombe(self._endog, self.fittedvalues,
+                                          var_weights=self._var_weights,
+                                          scale=self.scale)

     @cached_data
     def resid_anscombe_unscaled(self):
@@ -1030,7 +1724,9 @@ class GLMResults(base.LikelihoodModelResults):
         Unscaled Anscombe residuals.  See statsmodels.families.family for
         distribution-specific Anscombe residuals.
         """
-        pass
+        return self.family.resid_anscombe(self._endog, self.fittedvalues,
+                                          var_weights=self._var_weights,
+                                          scale=1.)

     @cached_data
     def resid_deviance(self):
@@ -1038,7 +1734,10 @@ class GLMResults(base.LikelihoodModelResults):
         Deviance residuals.  See statsmodels.families.family for distribution-
         specific deviance residuals.
         """
-        pass
+        dev = self.family.resid_dev(self._endog, self.fittedvalues,
+                                    var_weights=self._var_weights,
+                                    scale=1.)
+        return dev

     @cached_value
     def pearson_chi2(self):
@@ -1046,7 +1745,10 @@ class GLMResults(base.LikelihoodModelResults):
         Pearson's Chi-Squared statistic is defined as the sum of the squares
         of the Pearson residuals.
         """
-        pass
+        chisq = (self._endog - self.mu)**2 / self.family.variance(self.mu)
+        chisq *= self._iweights * self._n_trials
+        chisqsum = np.sum(chisq)
+        return chisqsum

     @cached_data
     def fittedvalues(self):
@@ -1058,21 +1760,43 @@ class GLMResults(base.LikelihoodModelResults):
         obtained by multiplying the design matrix by the coefficient
         vector.
         """
-        pass
+        return self.mu

     @cached_data
     def mu(self):
         """
         See GLM docstring.
         """
-        pass
+        return self.model.predict(self.params)

     @cache_readonly
     def null(self):
         """
         Fitted values of the null model
         """
-        pass
+        endog = self._endog
+        model = self.model
+        exog = np.ones((len(endog), 1))
+
+        kwargs = model._get_init_kwds().copy()
+        kwargs.pop('family')
+
+        for key in getattr(model, '_null_drop_keys', []):
+            del kwargs[key]
+        start_params = np.atleast_1d(self.family.link(endog.mean()))
+        oe = self.model._offset_exposure
+        if not (np.size(oe) == 1 and oe == 0):
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore", DomainWarning)
+                mod = GLM(endog, exog, family=self.family, **kwargs)
+                fitted = mod.fit(start_params=start_params).fittedvalues
+        else:
+            # correct if fitted is identical across observations
+            wls_model = lm.WLS(endog, exog,
+                               weights=self._iweights * self._n_trials)
+            fitted = wls_model.fit().fittedvalues
+
+        return fitted

     @cache_readonly
     def deviance(self):
@@ -1080,20 +1804,25 @@ class GLMResults(base.LikelihoodModelResults):
         See statsmodels.families.family for the distribution-specific deviance
         functions.
         """
-        pass
+        return self.family.deviance(self._endog, self.mu, self._var_weights,
+                                    self._freq_weights)

     @cache_readonly
     def null_deviance(self):
         """The value of the deviance function for the model fit with a constant
         as the only regressor."""
-        pass
+        return self.family.deviance(self._endog, self.null, self._var_weights,
+                                    self._freq_weights)

     @cache_readonly
     def llnull(self):
         """
         Log-likelihood of the model fit with a constant as the only regressor
         """
-        pass
+        return self.family.loglike(self._endog, self.null,
+                                   var_weights=self._var_weights,
+                                   freq_weights=self._freq_weights,
+                                   scale=self.scale)

     def llf_scaled(self, scale=None):
         """
@@ -1102,7 +1831,24 @@ class GLMResults(base.LikelihoodModelResults):
         case with linear link, the concentrated log-likelihood is
         returned.
         """
-        pass
+
+        _modelfamily = self.family
+        if scale is None:
+            if (isinstance(self.family, families.Gaussian) and
+                    isinstance(self.family.link, families.links.Power) and
+                    (self.family.link.power == 1.)):
+                # Scale for the concentrated Gaussian log likelihood
+                # (profile log likelihood with the scale parameter
+                # profiled out).
+                scale = (np.power(self._endog - self.mu, 2) * self._iweights).sum()
+                scale /= self.model.wnobs
+            else:
+                scale = self.scale
+        val = _modelfamily.loglike(self._endog, self.mu,
+                                   var_weights=self._var_weights,
+                                   freq_weights=self._freq_weights,
+                                   scale=scale)
+        return val

     @cached_value
     def llf(self):
@@ -1114,9 +1860,9 @@ class GLMResults(base.LikelihoodModelResults):
         otherwise it uses the non-concentrated log-likelihood evaluated
         at the estimated scale.
         """
-        pass
+        return self.llf_scaled()

-    def pseudo_rsquared(self, kind='cs'):
+    def pseudo_rsquared(self, kind="cs"):
         """
         Pseudo R-squared

@@ -1138,7 +1884,14 @@ class GLMResults(base.LikelihoodModelResults):
         float
             Pseudo R-squared
         """
-        pass
+        kind = kind.lower()
+        if kind.startswith("mcf"):
+            prsq = 1 - self.llf / self.llnull
+        elif kind.startswith("cox") or kind in ["cs", "lr"]:
+            prsq = 1 - np.exp((self.llnull - self.llf) * (2 / self.nobs))
+        else:
+            raise ValueError("only McFadden and Cox-Snell are available")
+        return prsq
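Both definitions can be reproduced by hand from `llf`, `llnull` and `nobs`; a quick consistency check, assuming `res` is any fitted GLMResults instance (for example from the sketches above):

    import numpy as np

    mcf = 1 - res.llf / res.llnull                              # McFadden
    cs = 1 - np.exp((res.llnull - res.llf) * (2 / res.nobs))    # Cox-Snell / LR
    assert np.isclose(mcf, res.pseudo_rsquared(kind="mcfadden"))
    assert np.isclose(cs, res.pseudo_rsquared(kind="cs"))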

     @cached_value
     def aic(self):
@@ -1146,7 +1899,7 @@ class GLMResults(base.LikelihoodModelResults):
         Akaike Information Criterion
         -2 * `llf` + 2 * (`df_model` + 1)
         """
-        pass
+        return self.info_criteria("aic")

     @property
     def bic(self):
@@ -1166,7 +1919,23 @@ class GLMResults(base.LikelihoodModelResults):
         The log-likelihood version is defined
         -2 * `llf` + (`df_model` + 1)*log(n)
         """
-        pass
+        if _use_bic_helper.use_bic_llf not in (True, False):
+            warnings.warn(
+                "The bic value is computed using the deviance formula. After "
+                "0.13 this will change to the log-likelihood based formula. "
+                "This change has no impact on the relative rank of models "
+                "compared using BIC. You can directly access the "
+                "log-likelihood version using the `bic_llf` attribute. You "
+                "can suppress this message by calling "
+                "statsmodels.genmod.generalized_linear_model.SET_USE_BIC_LLF "
+                "with True to get the LLF-based version now or False to retain"
+                "the deviance version.",
+                FutureWarning
+            )
+        if bool(_use_bic_helper.use_bic_llf):
+            return self.bic_llf
+
+        return self.bic_deviance
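The FutureWarning above can be avoided by choosing the BIC flavour explicitly with the module-level switch named in the message, or by reading either attribute directly (again assuming a fitted results instance `res`):

    from statsmodels.genmod.generalized_linear_model import SET_USE_BIC_LLF

    SET_USE_BIC_LLF(True)                   # res.bic now returns res.bic_llf
    SET_USE_BIC_LLF(False)                  # res.bic now returns res.bic_deviance
    print(res.bic_llf, res.bic_deviance)    # or read either flavour directly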

     @cached_value
     def bic_deviance(self):
@@ -1176,7 +1945,9 @@ class GLMResults(base.LikelihoodModelResults):
         Based on the deviance,
         `deviance` - `df_resid` * log(`nobs`)
         """
-        pass
+        return (self.deviance -
+                (self.model.wnobs - self.df_model - 1) *
+                np.log(self.model.wnobs))

     @cached_value
     def bic_llf(self):
@@ -1186,7 +1957,7 @@ class GLMResults(base.LikelihoodModelResults):
         Based on the log-likelihood,
         -2 * `llf` + log(n) * (`df_model` + 1)
         """
-        pass
+        return self.info_criteria("bic")

     def info_criteria(self, crit, scale=None, dk_params=0):
         """Return an information criterion for the model.
@@ -1225,11 +1996,32 @@ class GLMResults(base.LikelihoodModelResults):
         Burnham KP, Anderson KR (2002). Model Selection and Multimodel
         Inference; Springer New York.
         """
-        pass
-
+        crit = crit.lower()
+        k_params = self.df_model + 1 + dk_params
+
+        if crit == "aic":
+            return -2 * self.llf + 2 * k_params
+        elif crit == "bic":
+            nobs = self.df_model + self.df_resid + 1
+            bic = -2*self.llf + k_params*np.log(nobs)
+            return bic
+        elif crit == "qaic":
+            f = self.model.family
+            fl = (families.Poisson, families.NegativeBinomial,
+                  families.Binomial)
+            if not isinstance(f, fl):
+                msg = "QAIC is only valid for Binomial, Poisson and "
+                msg += "Negative Binomial families."
+                warnings.warn(msg)
+            llf = self.llf_scaled(scale=1)
+            return -2 * llf/scale + 2 * k_params
+
+    # now explicit docs, old and new behavior, copied from generic classes
+    # @Appender(str(get_prediction_doc))
     def get_prediction(self, exog=None, exposure=None, offset=None,
-        transform=True, which=None, linear=None, average=False, agg_weights
-        =None, row_labels=None):
+                       transform=True, which=None, linear=None,
+                       average=False, agg_weights=None,
+                       row_labels=None):
         """
     Compute prediction results for GLM compatible models.

@@ -1315,7 +2107,74 @@ class GLMResults(base.LikelihoodModelResults):
     compatible prediction results class will be removed.

     """
-        pass
+
+        import statsmodels.regression._prediction as linpred
+
+        pred_kwds = {'exposure': exposure, 'offset': offset, 'which': 'linear'}
+
+        if which is None:
+            # two calls to get_prediction would duplicate exog generation with patsy
+            res_linpred = linpred.get_prediction(self, exog=exog,
+                                                 transform=transform,
+                                                 row_labels=row_labels,
+                                                 pred_kwds=pred_kwds)
+
+            pred_kwds['which'] = 'mean'
+            res = pred.get_prediction_glm(self, exog=exog, transform=transform,
+                                          row_labels=row_labels,
+                                          linpred=res_linpred,
+                                          link=self.model.family.link,
+                                          pred_kwds=pred_kwds)
+        else:
+            # new generic version, if 'which' is specified
+
+            pred_kwds = {'exposure': exposure, 'offset': offset}
+            # not yet, only applies to count families
+            # y_values is explicit so we can add it to the docstring
+            # if y_values is not None:
+            #    pred_kwds["y_values"] = y_values
+
+            res = pred.get_prediction(
+                self,
+                exog=exog,
+                which=which,
+                transform=transform,
+                row_labels=row_labels,
+                average=average,
+                agg_weights=agg_weights,
+                pred_kwds=pred_kwds
+                )
+
+        return res
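A short usage sketch of the two code paths (the legacy mean/linear combination when `which` is None, the generic predictive machinery otherwise) on synthetic Poisson data; `summary_frame` is the usual way to inspect either result:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = sm.add_constant(rng.normal(size=(100, 2)))
    y = rng.poisson(np.exp(X @ [0.1, 0.4, -0.3]))
    res = sm.GLM(y, X, family=sm.families.Poisson()).fit()

    pred_mean = res.get_prediction(X[:5])                 # which=None: legacy GLM path
    print(pred_mean.summary_frame())                      # mean, se, confidence interval
    pred_lin = res.get_prediction(X[:5], which="linear")  # generic path
    print(pred_lin.summary_frame())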
+
+    @Appender(pinfer.score_test.__doc__)
+    def score_test(self, exog_extra=None, params_constrained=None,
+                   hypothesis='joint', cov_type=None, cov_kwds=None,
+                   k_constraints=None, observed=True):
+
+        if self.model._has_freq_weights is True:
+            warnings.warn("score test has not been verified with freq_weights",
+                          UserWarning)
+        if self.model._has_var_weights is True:
+            warnings.warn("score test has not been verified with var_weights",
+                          UserWarning)
+
+        # We need to temporarily change model.df_resid for scale computation
+        # TODO: find a nicer way. gh #7840
+        mod_df_resid = self.model.df_resid
+        self.model.df_resid = self.df_resid
+        if k_constraints is not None:
+            self.model.df_resid += k_constraints
+        res = pinfer.score_test(self, exog_extra=exog_extra,
+                                params_constrained=params_constrained,
+                                hypothesis=hypothesis,
+                                cov_type=cov_type, cov_kwds=cov_kwds,
+                                k_constraints=k_constraints,
+                                scale=None,
+                                observed=observed)
+
+        self.model.df_resid = mod_df_resid
+        return res

     def get_hat_matrix_diag(self, observed=True):
         """
@@ -1334,7 +2193,11 @@ class GLMResults(base.LikelihoodModelResults):
             The diagonal of the hat matrix computed from the observed
             or expected hessian.
         """
-        pass
+        weights = self.model.hessian_factor(self.params, observed=observed)
+        wexog = np.sqrt(weights)[:, None] * self.model.exog
+
+        hd = (wexog * np.linalg.pinv(wexog).T).sum(1)
+        return hd

     def get_influence(self, observed=True):
         """
@@ -1357,10 +2220,22 @@ class GLMResults(base.LikelihoodModelResults):
         --------
         statsmodels.stats.outliers_influence.GLMInfluence
         """
-        pass
+        from statsmodels.stats.outliers_influence import GLMInfluence
+
+        weights = self.model.hessian_factor(self.params, observed=observed)
+        weights_sqrt = np.sqrt(weights)
+        wexog = weights_sqrt[:, None] * self.model.exog
+        wendog = weights_sqrt * self.model.endog
+
+        # using get_hat_matrix_diag has duplicated computation
+        hat_matrix_diag = self.get_hat_matrix_diag(observed=observed)
+        infl = GLMInfluence(self, endog=wendog, exog=wexog,
+                            resid=self.resid_pearson / np.sqrt(self.scale),
+                            hat_matrix_diag=hat_matrix_diag)
+        return infl

-    def get_distribution(self, exog=None, exposure=None, offset=None,
-        var_weights=1.0, n_trials=1.0):
+    def get_distribution(self, exog=None, exposure=None,
+                         offset=None, var_weights=1., n_trials=1.):
         """
         Return a instance of the predictive distribution.

@@ -1395,10 +2270,33 @@ class GLMResults(base.LikelihoodModelResults):
         to fit the model.  If any other value is used for ``n``, misleading
         results will be produced.
         """
-        pass
+        # Note: this is mostly a copy of GLM.get_distribution
+        # calling results.predict here avoids the exog check and transform

-    def get_margeff(self, at='overall', method='dydx', atexog=None, dummy=
-        False, count=False):
+        if isinstance(self.model.family, (families.Binomial, families.Poisson,
+                                          families.NegativeBinomial)):
+            # use scale=1, independent of QMLE scale for discrete
+            scale = 1.
+            if self.scale != 1.:
+                msg = "using scale=1, no exess dispersion in distribution"
+                warnings.warn(msg, UserWarning)
+        else:
+            scale = self.scale
+
+        mu = self.predict(exog, exposure, offset, which="mean")
+
+        kwds = {}
+        if (np.any(n_trials != 1) and
+                isinstance(self.model.family, families.Binomial)):
+
+            kwds["n_trials"] = n_trials
+
+        distr = self.model.family.get_distribution(
+            mu, scale, var_weights=var_weights, **kwds)
+        return distr
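For the discrete families the returned object is a frozen scipy distribution evaluated at the fitted means with `scale=1`. Continuing with a fitted Poisson results instance `res` as in the previous sketch:

    distr = res.get_distribution()     # frozen scipy.stats.poisson at the fitted means
    print(distr.mean()[:5])            # equals res.fittedvalues[:5]
    print(distr.pmf(0)[:5])            # per-observation P(y = 0)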
+
+    def get_margeff(self, at='overall', method='dydx', atexog=None,
+                    dummy=False, count=False):
         """Get marginal effects of the fitted model.

         Warning: offset, exposure and weights (var_weights and freq_weights)
@@ -1476,9 +2374,59 @@ class GLMResults(base.LikelihoodModelResults):
         handling of freq_weights for average effect "overall" might change.

         """
-        pass
+        if getattr(self.model, "offset", None) is not None:
+            raise NotImplementedError("Margins with offset are not available.")
+        if (np.any(self.model.var_weights != 1) or
+                np.any(self.model.freq_weights != 1)):
+            warnings.warn("weights are not taken into account by margeff")
+        from statsmodels.discrete.discrete_margins import DiscreteMargins
+        return DiscreteMargins(self, (at, method, atexog, dummy, count))
+
+    @Appender(base.LikelihoodModelResults.remove_data.__doc__)
+    def remove_data(self):
+        # GLM has alias/reference in result instance
+        self._data_attr.extend([i for i in self.model._data_attr
+                                if '_data.' not in i])
+        super(self.__class__, self).remove_data()
+
+        # TODO: what are these in results?
+        self._endog = None
+        self._freq_weights = None
+        self._var_weights = None
+        self._iweights = None
+        self._n_trials = None
+
+    @Appender(_plot_added_variable_doc % {'extra_params_doc': ''})
+    def plot_added_variable(self, focus_exog, resid_type=None,
+                            use_glm_weights=True, fit_kwargs=None,
+                            ax=None):
+
+        from statsmodels.graphics.regressionplots import plot_added_variable

-    def summary(self, yname=None, xname=None, title=None, alpha=0.05):
+        fig = plot_added_variable(self, focus_exog,
+                                  resid_type=resid_type,
+                                  use_glm_weights=use_glm_weights,
+                                  fit_kwargs=fit_kwargs, ax=ax)
+
+        return fig
+
+    @Appender(_plot_partial_residuals_doc % {'extra_params_doc': ''})
+    def plot_partial_residuals(self, focus_exog, ax=None):
+
+        from statsmodels.graphics.regressionplots import plot_partial_residuals
+
+        return plot_partial_residuals(self, focus_exog, ax=ax)
+
+    @Appender(_plot_ceres_residuals_doc % {'extra_params_doc': ''})
+    def plot_ceres_residuals(self, focus_exog, frac=0.66, cond_means=None,
+                             ax=None):
+
+        from statsmodels.graphics.regressionplots import plot_ceres_residuals
+
+        return plot_ceres_residuals(self, focus_exog, frac,
+                                    cond_means=cond_means, ax=ax)
+
+    def summary(self, yname=None, xname=None, title=None, alpha=.05):
         """
         Summarize the Regression Results

@@ -1506,10 +2454,54 @@ class GLMResults(base.LikelihoodModelResults):
         --------
         statsmodels.iolib.summary.Summary : class to hold summary results
         """
-        pass

-    def summary2(self, yname=None, xname=None, title=None, alpha=0.05,
-        float_format='%.4f'):
+        top_left = [('Dep. Variable:', None),
+                    ('Model:', None),
+                    ('Model Family:', [self.family.__class__.__name__]),
+                    ('Link Function:', [self.family.link.__class__.__name__]),
+                    ('Method:', [self.method]),
+                    ('Date:', None),
+                    ('Time:', None),
+                    ('No. Iterations:',
+                     ["%d" % self.fit_history['iteration']]),
+                    ]
+
+        try:
+            prsquared = self.pseudo_rsquared(kind="cs")
+        except ValueError:
+            prsquared = np.nan
+
+        top_right = [('No. Observations:', None),
+                     ('Df Residuals:', None),
+                     ('Df Model:', None),
+                     ('Scale:', ["%#8.5g" % self.scale]),
+                     ('Log-Likelihood:', None),
+                     ('Deviance:', ["%#8.5g" % self.deviance]),
+                     ('Pearson chi2:', ["%#6.3g" % self.pearson_chi2]),
+                     ('Pseudo R-squ. (CS):', ["%#6.4g" % prsquared])
+                     ]
+
+        if hasattr(self, 'cov_type'):
+            top_left.append(('Covariance Type:', [self.cov_type]))
+
+        if title is None:
+            title = "Generalized Linear Model Regression Results"
+
+        # create summary tables
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right,
+                             yname=yname, xname=xname, title=title)
+        smry.add_table_params(self, yname=yname, xname=xname, alpha=alpha,
+                              use_t=self.use_t)
+
+        if hasattr(self, 'constraints'):
+            smry.add_extra_txt(['Model has been estimated subject to linear '
+                                'equality constraints.'])
+        return smry
+
+    def summary2(self, yname=None, xname=None, title=None, alpha=.05,
+                 float_format="%.4f"):
         """Experimental summary for regression Results

         Parameters
@@ -1538,23 +2530,43 @@ class GLMResults(base.LikelihoodModelResults):
         --------
         statsmodels.iolib.summary2.Summary : class to hold summary results
         """
-        pass
+        self.method = 'IRLS'
+        from statsmodels.iolib import summary2
+        smry = summary2.Summary()
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", FutureWarning)
+            smry.add_base(results=self, alpha=alpha, float_format=float_format,
+                          xname=xname, yname=yname, title=title)
+        if hasattr(self, 'constraints'):
+            smry.add_text('Model has been estimated subject to linear '
+                          'equality constraints.')
+
+        return smry


 class GLMResultsWrapper(lm.RegressionResultsWrapper):
-    _attrs = {'resid_anscombe': 'rows', 'resid_deviance': 'rows',
-        'resid_pearson': 'rows', 'resid_response': 'rows', 'resid_working':
-        'rows'}
+    _attrs = {
+        'resid_anscombe': 'rows',
+        'resid_deviance': 'rows',
+        'resid_pearson': 'rows',
+        'resid_response': 'rows',
+        'resid_working': 'rows'
+    }
     _wrap_attrs = wrap.union_dicts(lm.RegressionResultsWrapper._wrap_attrs,
-        _attrs)
+                                   _attrs)


 wrap.populate_wrapper(GLMResultsWrapper, GLMResults)
-if __name__ == '__main__':
+
+if __name__ == "__main__":
     from statsmodels.datasets import longley
     data = longley.load()
+    # data.exog = add_constant(data.exog)
     GLMmod = GLM(data.endog, data.exog).fit()
     GLMT = GLMmod.summary(returns='tables')
+    # GLMT[0].extend_right(GLMT[1])
+    # print(GLMT[0])
+    # print(GLMT[2])
     GLMTp = GLMmod.summary(title='Test GLM')
     """
 From Stata
diff --git a/statsmodels/genmod/qif.py b/statsmodels/genmod/qif.py
index fb31399c8..07bba77f0 100644
--- a/statsmodels/genmod/qif.py
+++ b/statsmodels/genmod/qif.py
@@ -28,7 +28,7 @@ class QIFCovariance:
         Returns the term'th basis matrix, which is a dim x dim
         matrix.
         """
-        pass
+        raise NotImplementedError


 class QIFIndependence(QIFCovariance):
@@ -43,6 +43,12 @@ class QIFIndependence(QIFCovariance):
     def __init__(self):
         self.num_terms = 1

+    def mat(self, dim, term):
+        if term == 0:
+            return np.eye(dim)
+        else:
+            return None
+

 class QIFExchangeable(QIFCovariance):
     """
@@ -52,6 +58,14 @@ class QIFExchangeable(QIFCovariance):
     def __init__(self):
         self.num_terms = 2

+    def mat(self, dim, term):
+        if term == 0:
+            return np.eye(dim)
+        elif term == 1:
+            return np.ones((dim, dim))
+        else:
+            return None
+

 class QIFAutoregressive(QIFCovariance):
     """
@@ -61,6 +75,28 @@ class QIFAutoregressive(QIFCovariance):
     def __init__(self):
         self.num_terms = 3

+    def mat(self, dim, term):
+
+        if dim < 3:
+            msg = ("Groups must have size at least 3 for " +
+                   "autoregressive covariance.")
+            raise ValueError(msg)
+
+        if term == 0:
+            return np.eye(dim)
+        elif term == 1:
+            mat = np.zeros((dim, dim))
+            mat.flat[1::(dim+1)] = 1
+            mat += mat.T
+            return mat
+        elif term == 2:
+            mat = np.zeros((dim, dim))
+            mat[0, 0] = 1
+            mat[dim-1, dim-1] = 1
+            return mat
+        else:
+            return None
+

 class QIF(base.Model):
     """
@@ -90,31 +126,55 @@ class QIF(base.Model):
     www.jstor.org/stable/2673612
     """

-    def __init__(self, endog, exog, groups, family=None, cov_struct=None,
-        missing='none', **kwargs):
+    def __init__(self, endog, exog, groups, family=None,
+                 cov_struct=None, missing='none', **kwargs):
+
+        # Handle the family argument
         if family is None:
             family = families.Gaussian()
-        elif not issubclass(family.__class__, families.Family):
-            raise ValueError('QIF: `family` must be a genmod family instance')
+        else:
+            if not issubclass(family.__class__, families.Family):
+                raise ValueError("QIF: `family` must be a genmod "
+                                 "family instance")
         self.family = family
+
         self._fit_history = defaultdict(list)
+
+        # Handle the cov_struct argument
         if cov_struct is None:
             cov_struct = QIFIndependence()
-        elif not isinstance(cov_struct, QIFCovariance):
-            raise ValueError(
-                'QIF: `cov_struct` must be a QIFCovariance instance')
+        else:
+            if not isinstance(cov_struct, QIFCovariance):
+                raise ValueError(
+                    "QIF: `cov_struct` must be a QIFCovariance instance")
         self.cov_struct = cov_struct
+
         groups = np.asarray(groups)
-        super(QIF, self).__init__(endog, exog, groups=groups, missing=
-            missing, **kwargs)
+
+        super(QIF, self).__init__(endog, exog, groups=groups,
+                                  missing=missing, **kwargs)
+
         self.group_names = list(set(groups))
         self.nobs = len(self.endog)
+
         groups_ix = defaultdict(list)
         for i, g in enumerate(groups):
             groups_ix[g].append(i)
         self.groups_ix = [groups_ix[na] for na in self.group_names]
+
         self._check_args(groups)

+    def _check_args(self, groups):
+
+        if len(groups) != len(self.endog):
+            msg = "QIF: groups and endog should have the same length"
+            raise ValueError(msg)
+
+        if len(self.endog) != self.exog.shape[0]:
+            msg = ("QIF: the length of endog should be equal to the "
+                   "number of rows of exog.")
+            raise ValueError(msg)
+
     def objective(self, params):
         """
         Calculate the gradient of the QIF objective function.
@@ -132,7 +192,92 @@ class QIF(base.Model):
             The gradients of each estimating equation with
             respect to the parameter.
         """
-        pass
+
+        endog = self.endog
+        exog = self.exog
+        lpr = np.dot(exog, params)
+        mean = self.family.link.inverse(lpr)
+        va = self.family.variance(mean)
+
+        # Mean derivative
+        idl = self.family.link.inverse_deriv(lpr)
+        idl2 = self.family.link.inverse_deriv2(lpr)
+        vd = self.family.variance.deriv(mean)
+
+        m = self.cov_struct.num_terms
+        p = exog.shape[1]
+
+        d = p * m
+        gn = np.zeros(d)
+        gi = np.zeros(d)
+        gi_deriv = np.zeros((d, p))
+        gn_deriv = np.zeros((d, p))
+        cn_deriv = [0] * p
+        cmat = np.zeros((d, d))
+
+        fastvar = self.family.variance is varfuncs.constant
+        fastlink = isinstance(
+            self.family.link,
+            # TODO: Remove links.identity after deprecation final
+            (links.Identity, links.identity)
+        )
+
+        for ix in self.groups_ix:
+            sd = np.sqrt(va[ix])
+            resid = endog[ix] - mean[ix]
+            sresid = resid / sd
+            deriv = exog[ix, :] * idl[ix, None]
+
+            jj = 0
+            for j in range(m):
+                # The derivative of each term in (5) of Qu et al.
+                # There are four terms involving beta in a product.
+                # Iterated application of the product rule gives
+                # the gradient as a sum of four terms.
+                c = self.cov_struct.mat(len(ix), j)
+                crs1 = np.dot(c, sresid) / sd
+                gi[jj:jj+p] = np.dot(deriv.T, crs1)
+                crs2 = np.dot(c, -deriv / sd[:, None]) / sd[:, None]
+                gi_deriv[jj:jj+p, :] = np.dot(deriv.T, crs2)
+                if not (fastlink and fastvar):
+                    for k in range(p):
+                        m1 = np.dot(exog[ix, :].T,
+                                    idl2[ix] * exog[ix, k] * crs1)
+                        if not fastvar:
+                            vx = -0.5 * vd[ix] * deriv[:, k] / va[ix]**1.5
+                            m2 = np.dot(deriv.T, vx * np.dot(c, sresid))
+                            m3 = np.dot(deriv.T, np.dot(c, vx * resid) / sd)
+                        else:
+                            m2, m3 = 0, 0
+                        gi_deriv[jj:jj+p, k] += m1 + m2 + m3
+                jj += p
+
+            for j in range(p):
+                u = np.outer(gi, gi_deriv[:, j])
+                cn_deriv[j] += u + u.T
+
+            gn += gi
+            gn_deriv += gi_deriv
+
+            cmat += np.outer(gi, gi)
+
+        ngrp = len(self.groups_ix)
+        gn /= ngrp
+        gn_deriv /= ngrp
+        cmat /= ngrp**2
+
+        qif = np.dot(gn, np.linalg.solve(cmat, gn))
+
+        gcg = np.zeros(p)
+        for j in range(p):
+            cn_deriv[j] /= len(self.groups_ix)**2
+            u = np.linalg.solve(cmat, cn_deriv[j]).T
+            u = np.linalg.solve(cmat, u)
+            gcg[j] = np.dot(gn, np.dot(u, gn))
+
+        grad = 2 * np.dot(gn_deriv.T, np.linalg.solve(cmat, gn)) - gcg
+
+        return qif, grad, cmat, gn, gn_deriv

     def estimate_scale(self, params):
         """
@@ -141,10 +286,25 @@ class QIF(base.Model):
         The scale parameter for binomial and Poisson families is
         fixed at 1, otherwise it is estimated from the data.
         """
-        pass
+
+        if isinstance(self.family, (families.Binomial, families.Poisson)):
+            return 1.
+
+        if hasattr(self, "ddof_scale"):
+            ddof_scale = self.ddof_scale
+        else:
+            ddof_scale = self.exog.shape[1]
+
+        lpr = np.dot(self.exog, params)
+        mean = self.family.link.inverse(lpr)
+        resid = self.endog - mean
+        scale = np.sum(resid**2) / (self.nobs - ddof_scale)
+
+        return scale

     @classmethod
-    def from_formula(cls, formula, groups, data, subset=None, *args, **kwargs):
+    def from_formula(cls, formula, groups, data, subset=None,
+                     *args, **kwargs):
         """
         Create a QIF model instance from a formula and dataframe.

@@ -166,10 +326,18 @@ class QIF(base.Model):
         -------
         model : QIF model instance
         """
-        pass

-    def fit(self, maxiter=100, start_params=None, tol=1e-06, gtol=0.0001,
-        ddof_scale=None):
+        if isinstance(groups, str):
+            groups = data[groups]
+
+        model = super(QIF, cls).from_formula(
+                   formula, data=data, subset=subset,
+                   groups=groups, *args, **kwargs)
+
+        return model
+
+    def fit(self, maxiter=100, start_params=None, tol=1e-6, gtol=1e-4,
+            ddof_scale=None):
         """
         Fit a GLM to correlated data using QIF.

@@ -191,15 +359,59 @@ class QIF(base.Model):
         -------
         QIFResults object
         """
-        pass
+
+        if ddof_scale is None:
+            self.ddof_scale = self.exog.shape[1]
+        else:
+            self.ddof_scale = ddof_scale
+
+        if start_params is None:
+            model = GLM(self.endog, self.exog, family=self.family)
+            result = model.fit()
+            params = result.params
+        else:
+            params = start_params
+
+        for _ in range(maxiter):
+
+            qif, grad, cmat, _, gn_deriv = self.objective(params)
+
+            gnorm = np.sqrt(np.sum(grad * grad))
+            self._fit_history["qif"].append(qif)
+            self._fit_history["gradnorm"].append(gnorm)
+
+            if gnorm < gtol:
+                break
+
+            cjac = 2 * np.dot(gn_deriv.T, np.linalg.solve(cmat, gn_deriv))
+            step = np.linalg.solve(cjac, grad)
+
+            snorm = np.sqrt(np.sum(step * step))
+            self._fit_history["stepnorm"].append(snorm)
+            if snorm < tol:
+                break
+            params -= step
+
+        vcov = np.dot(gn_deriv.T, np.linalg.solve(cmat, gn_deriv))
+        vcov = np.linalg.inv(vcov)
+        scale = self.estimate_scale(params)
+
+        rslt = QIFResults(self, params, vcov / scale, scale)
+        rslt.fit_history = self._fit_history
+        self._fit_history = defaultdict(list)
+
+        return QIFResultsWrapper(rslt)
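A minimal QIF sketch on synthetic clustered Gaussian data (the exchangeable structure, group sizes and coefficients are illustrative assumptions):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.genmod.qif import QIF, QIFExchangeable

    rng = np.random.default_rng(0)
    n_groups, group_size = 50, 4
    groups = np.repeat(np.arange(n_groups), group_size)
    X = sm.add_constant(rng.normal(size=(n_groups * group_size, 2)))
    u = np.repeat(rng.normal(size=n_groups), group_size)          # shared group effect
    y = X @ [1.0, 0.5, -0.5] + u + rng.normal(size=len(groups))

    mod = QIF(y, X, groups=groups, cov_struct=QIFExchangeable())
    res = mod.fit()
    print(res.summary())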


 class QIFResults(base.LikelihoodModelResults):
     """Results class for QIF Regression"""
+    def __init__(self, model, params, cov_params, scale,
+                 use_t=False, **kwds):
+
+        super(QIFResults, self).__init__(
+            model, params, normalized_cov_params=cov_params,
+            scale=scale)

-    def __init__(self, model, params, cov_params, scale, use_t=False, **kwds):
-        super(QIFResults, self).__init__(model, params,
-            normalized_cov_params=cov_params, scale=scale)
         self.qif, _, _, _, _ = self.model.objective(params)

     @cache_readonly
@@ -207,23 +419,32 @@ class QIFResults(base.LikelihoodModelResults):
         """
         An AIC-like statistic for models fit using QIF.
         """
-        pass
+        if isinstance(self.model.cov_struct, QIFIndependence):
+            msg = "AIC not available with QIFIndependence covariance"
+            raise ValueError(msg)
+        df = self.model.exog.shape[1]
+        return self.qif + 2*df

     @cache_readonly
     def bic(self):
         """
         A BIC-like statistic for models fit using QIF.
         """
-        pass
+        if isinstance(self.model.cov_struct, QIFIndependence):
+            msg = "BIC not available with QIFIndependence covariance"
+            raise ValueError(msg)
+        df = self.model.exog.shape[1]
+        return self.qif + np.log(self.model.nobs)*df

     @cache_readonly
     def fittedvalues(self):
         """
         Returns the fitted values from the model.
         """
-        pass
+        return self.model.family.link.inverse(
+                np.dot(self.model.exog, self.params))

-    def summary(self, yname=None, xname=None, title=None, alpha=0.05):
+    def summary(self, yname=None, xname=None, title=None, alpha=.05):
         """
         Summarize the QIF regression results

@@ -251,7 +472,48 @@ class QIFResults(base.LikelihoodModelResults):
         --------
         statsmodels.iolib.summary.Summary : class to hold summary results
         """
-        pass
+
+        top_left = [('Dep. Variable:', None),
+                    ('Method:', ['QIF']),
+                    ('Family:', [self.model.family.__class__.__name__]),
+                    ('Covariance structure:',
+                     [self.model.cov_struct.__class__.__name__]),
+                    ('Date:', None),
+                    ('Time:', None),
+                    ]
+
+        NY = [len(y) for y in self.model.groups_ix]
+
+        top_right = [('No. Observations:', [sum(NY)]),
+                     ('No. clusters:', [len(NY)]),
+                     ('Min. cluster size:', [min(NY)]),
+                     ('Max. cluster size:', [max(NY)]),
+                     ('Mean cluster size:', ["%.1f" % np.mean(NY)]),
+                     ('Scale:', ["%.3f" % self.scale]),
+                     ]
+
+        if title is None:
+            title = self.model.__class__.__name__ + ' ' +\
+                "Regression Results"
+
+        # Override the exog variable names if xname is provided as an
+        # argument.
+        if xname is None:
+            xname = self.model.exog_names
+
+        if yname is None:
+            yname = self.model.endog_names
+
+        # Create summary table instance
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right,
+                             yname=yname, xname=xname,
+                             title=title)
+        smry.add_table_params(self, yname=yname, xname=xname,
+                              alpha=alpha, use_t=False)
+
+        return smry


 class QIFResultsWrapper(lm.RegressionResultsWrapper):
diff --git a/statsmodels/graphics/_regressionplots_doc.py b/statsmodels/graphics/_regressionplots_doc.py
index 039a57b8e..59bac854a 100644
--- a/statsmodels/graphics/_regressionplots_doc.py
+++ b/statsmodels/graphics/_regressionplots_doc.py
@@ -1,4 +1,5 @@
-_plot_added_variable_doc = """    Create an added variable plot for a fitted regression model.
+_plot_added_variable_doc = """\
+    Create an added variable plot for a fitted regression model.

     Parameters
     ----------
@@ -24,7 +25,9 @@ _plot_added_variable_doc = """    Create an added variable plot for a fitted reg
     Figure
         A matplotlib figure instance.
 """
-_plot_partial_residuals_doc = """    Create a partial residual, or 'component plus residual' plot for a
+
+_plot_partial_residuals_doc = """\
+    Create a partial residual, or 'component plus residual' plot for a
     fitted regression model.

     Parameters
@@ -40,7 +43,9 @@ _plot_partial_residuals_doc = """    Create a partial residual, or 'component pl
     Figure
         A matplotlib figure instance.
 """
-_plot_ceres_residuals_doc = """    Conditional Expectation Partial Residuals (CERES) plot.
+
+_plot_ceres_residuals_doc = """\
+    Conditional Expectation Partial Residuals (CERES) plot.

     Produce a CERES plot for a fitted regression model.

@@ -115,7 +120,10 @@ _plot_ceres_residuals_doc = """    Conditional Expectation Partial Residuals (CE

     .. plot:: plots/graphics_regression_ceres_residuals.py
 """
-_plot_influence_doc = """    Plot of influence in regression. Plots studentized resids vs. leverage.
+
+
+_plot_influence_doc = """\
+    Plot of influence in regression. Plots studentized resids vs. leverage.

     Parameters
     ----------
@@ -170,7 +178,10 @@ _plot_influence_doc = """    Plot of influence in regression. Plots studentized

     .. plot:: plots/graphics_regression_influence.py
     """
-_plot_leverage_resid2_doc = """    Plot leverage statistics vs. normalized residuals squared
+
+
+_plot_leverage_resid2_doc = """\
+    Plot leverage statistics vs. normalized residuals squared

     Parameters
     ----------
diff --git a/statsmodels/graphics/agreement.py b/statsmodels/graphics/agreement.py
index 7f4657a30..3de2cb6e3 100644
--- a/statsmodels/graphics/agreement.py
+++ b/statsmodels/graphics/agreement.py
@@ -1,15 +1,17 @@
-"""
+'''
 Bland-Altman mean-difference plots

 Author: Joses Ho
 License: BSD-3
-"""
+'''
+
 import numpy as np
+
 from . import utils


 def mean_diff_plot(m1, m2, sd_limit=1.96, ax=None, scatter_kwds=None,
-    mean_line_kwds=None, limit_lines_kwds=None):
+                   mean_line_kwds=None, limit_lines_kwds=None):
     """
     Construct a Tukey/Bland-Altman Mean Difference Plot.

@@ -81,4 +83,72 @@ def mean_diff_plot(m1, m2, sd_limit=1.96, ax=None, scatter_kwds=None,

     .. plot:: plots/graphics-mean_diff_plot.py
     """
-    pass
+    fig, ax = utils.create_mpl_ax(ax)
+
+    if len(m1) != len(m2):
+        raise ValueError('m1 does not have the same length as m2.')
+    if sd_limit < 0:
+        raise ValueError('sd_limit ({}) is less than 0.'.format(sd_limit))
+
+    means = np.mean([m1, m2], axis=0)
+    diffs = m1 - m2
+    mean_diff = np.mean(diffs)
+    std_diff = np.std(diffs, axis=0)
+
+    scatter_kwds = scatter_kwds or {}
+    if 's' not in scatter_kwds:
+        scatter_kwds['s'] = 20
+    mean_line_kwds = mean_line_kwds or {}
+    limit_lines_kwds = limit_lines_kwds or {}
+    for kwds in [mean_line_kwds, limit_lines_kwds]:
+        if 'color' not in kwds:
+            kwds['color'] = 'gray'
+        if 'linewidth' not in kwds:
+            kwds['linewidth'] = 1
+    if 'linestyle' not in mean_line_kwds:
+        mean_line_kwds['linestyle'] = '--'
+    if 'linestyle' not in limit_lines_kwds:
+        limit_lines_kwds['linestyle'] = ':'
+
+    ax.scatter(means, diffs, **scatter_kwds) # Plot the means against the diffs.
+    ax.axhline(mean_diff, **mean_line_kwds)  # draw mean line.
+
+    # Annotate mean line with mean difference.
+    ax.annotate('mean diff:\n{}'.format(np.round(mean_diff, 2)),
+                xy=(0.99, 0.5),
+                horizontalalignment='right',
+                verticalalignment='center',
+                fontsize=14,
+                xycoords='axes fraction')
+
+    if sd_limit > 0:
+        half_ylim = (1.5 * sd_limit) * std_diff
+        ax.set_ylim(mean_diff - half_ylim,
+                    mean_diff + half_ylim)
+        limit_of_agreement = sd_limit * std_diff
+        lower = mean_diff - limit_of_agreement
+        upper = mean_diff + limit_of_agreement
+        for j, lim in enumerate([lower, upper]):
+            ax.axhline(lim, **limit_lines_kwds)
+        ax.annotate(f'-{sd_limit} SD: {lower:0.2g}',
+                    xy=(0.99, 0.07),
+                    horizontalalignment='right',
+                    verticalalignment='bottom',
+                    fontsize=14,
+                    xycoords='axes fraction')
+        ax.annotate(f'+{sd_limit} SD: {upper:0.2g}',
+                    xy=(0.99, 0.92),
+                    horizontalalignment='right',
+                    fontsize=14,
+                    xycoords='axes fraction')
+
+    elif sd_limit == 0:
+        half_ylim = 3 * std_diff
+        ax.set_ylim(mean_diff - half_ylim,
+                    mean_diff + half_ylim)
+
+    ax.set_ylabel('Difference', fontsize=15)
+    ax.set_xlabel('Means', fontsize=15)
+    ax.tick_params(labelsize=13)
+    fig.tight_layout()
+    return fig
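A short usage sketch for the restored mean_diff_plot, assuming two simulated raters (illustrative only, not part of the patch):

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.agreement import mean_diff_plot

    rng = np.random.default_rng(1)
    m1 = rng.normal(loc=10.0, scale=2.0, size=100)   # rater 1
    m2 = m1 + rng.normal(scale=0.5, size=100)        # rater 2, small disagreement

    # Scatter of differences vs. means, with the mean line and +/-1.96 SD limits.
    fig = mean_diff_plot(m1, m2, sd_limit=1.96)
    plt.show()
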
diff --git a/statsmodels/graphics/api.py b/statsmodels/graphics/api.py
index c7d7f7d8f..b5ae388f2 100644
--- a/statsmodels/graphics/api.py
+++ b/statsmodels/graphics/api.py
@@ -6,10 +6,38 @@ from .factorplots import interaction_plot
 from .functional import fboxplot, hdrboxplot, rainbowplot
 from .gofplots import qqplot
 from .plottools import rainbow
-from .regressionplots import abline_plot, influence_plot, plot_ccpr, plot_ccpr_grid, plot_fit, plot_leverage_resid2, plot_partregress, plot_partregress_grid, plot_regress_exog
-__all__ = ['abline_plot', 'beanplot', 'fboxplot', 'hdrboxplot',
-    'influence_plot', 'interaction_plot', 'mean_diff_plot', 'plot_ccpr',
-    'plot_ccpr_grid', 'plot_corr', 'plot_corr_grid', 'plot_fit',
-    'plot_leverage_resid2', 'plot_partregress', 'plot_partregress_grid',
-    'plot_regress_exog', 'qqplot', 'rainbow', 'rainbowplot', 'tsa',
-    'violinplot']
+from .regressionplots import (
+    abline_plot,
+    influence_plot,
+    plot_ccpr,
+    plot_ccpr_grid,
+    plot_fit,
+    plot_leverage_resid2,
+    plot_partregress,
+    plot_partregress_grid,
+    plot_regress_exog,
+)
+
+__all__ = [
+    "abline_plot",
+    "beanplot",
+    "fboxplot",
+    "hdrboxplot",
+    "influence_plot",
+    "interaction_plot",
+    "mean_diff_plot",
+    "plot_ccpr",
+    "plot_ccpr_grid",
+    "plot_corr",
+    "plot_corr_grid",
+    "plot_fit",
+    "plot_leverage_resid2",
+    "plot_partregress",
+    "plot_partregress_grid",
+    "plot_regress_exog",
+    "qqplot",
+    "rainbow",
+    "rainbowplot",
+    "tsa",
+    "violinplot",
+]
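The reorganised imports keep the flat namespace intact; for example (illustrative only):

    from statsmodels.graphics.api import interaction_plot, plot_corr, qqplot
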
diff --git a/statsmodels/graphics/boxplots.py b/statsmodels/graphics/boxplots.py
index 5865e43ea..2e142a2eb 100644
--- a/statsmodels/graphics/boxplots.py
+++ b/statsmodels/graphics/boxplots.py
@@ -1,12 +1,18 @@
 """Variations on boxplots."""
+
+# Author: Ralf Gommers
+# Based on code by Flavio Coelho and Teemu Ikonen.
+
 import numpy as np
 from scipy.stats import gaussian_kde
+
 from . import utils
+
 __all__ = ['violinplot', 'beanplot']


 def violinplot(data, ax=None, labels=None, positions=None, side='both',
-    show_boxplot=True, plot_opts=None):
+               show_boxplot=True, plot_opts=None):
     """
     Make a violin plot of each dataset in the `data` sequence.

@@ -117,21 +123,114 @@ def violinplot(data, ax=None, labels=None, positions=None, side='both',

     .. plot:: plots/graphics_boxplot_violinplot.py
     """
-    pass
+    plot_opts = {} if plot_opts is None else plot_opts
+    if max([np.size(arr) for arr in data]) == 0:
+        msg = "No Data to make Violin: Try again!"
+        raise ValueError(msg)
+
+    fig, ax = utils.create_mpl_ax(ax)
+
+    data = list(map(np.asarray, data))
+    if positions is None:
+        positions = np.arange(len(data)) + 1
+
+    # Determine available horizontal space for each individual violin.
+    pos_span = np.max(positions) - np.min(positions)
+    width = np.min([0.15 * np.max([pos_span, 1.]),
+                    plot_opts.get('violin_width', 0.8) / 2.])
+
+    # Plot violins.
+    for pos_data, pos in zip(data, positions):
+        _single_violin(ax, pos, pos_data, width, side, plot_opts)
+
+    if show_boxplot:
+        ax.boxplot(data, notch=1, positions=positions, vert=1)
+
+    # Set ticks and tick labels of horizontal axis.
+    _set_ticks_labels(ax, data, labels, positions, plot_opts)
+
+    return fig


 def _single_violin(ax, pos, pos_data, width, side, plot_opts):
     """"""
-    pass
+    bw_factor = plot_opts.get('bw_factor', None)
+
+    def _violin_range(pos_data, plot_opts):
+        """Return array with correct range, with which violins can be plotted."""
+        cutoff = plot_opts.get('cutoff', False)
+        cutoff_type = plot_opts.get('cutoff_type', 'std')
+        cutoff_val = plot_opts.get('cutoff_val', 1.5)
+
+        s = 0.0
+        if not cutoff:
+            if cutoff_type == 'std':
+                s = cutoff_val * np.std(pos_data)
+            else:
+                s = cutoff_val
+
+        x_lower = kde.dataset.min() - s
+        x_upper = kde.dataset.max() + s
+        return np.linspace(x_lower, x_upper, 100)
+
+    pos_data = np.asarray(pos_data)
+    # Kernel density estimate for data at this position.
+    kde = gaussian_kde(pos_data, bw_method=bw_factor)
+
+    # Create violin for pos, scaled to the available space.
+    xvals = _violin_range(pos_data, plot_opts)
+    violin = kde.evaluate(xvals)
+    violin = width * violin / violin.max()
+
+    if side == 'both':
+        envelope_l, envelope_r = (-violin + pos, violin + pos)
+    elif side == 'right':
+        envelope_l, envelope_r = (pos, violin + pos)
+    elif side == 'left':
+        envelope_l, envelope_r = (-violin + pos, pos)
+    else:
+        msg = "`side` parameter should be one of {'left', 'right', 'both'}."
+        raise ValueError(msg)
+
+    # Draw the violin.
+    ax.fill_betweenx(xvals, envelope_l, envelope_r,
+                     facecolor=plot_opts.get('violin_fc', '#66c2a5'),
+                     edgecolor=plot_opts.get('violin_ec', 'k'),
+                     lw=plot_opts.get('violin_lw', 1),
+                     alpha=plot_opts.get('violin_alpha', 0.5))
+
+    return xvals, violin


 def _set_ticks_labels(ax, data, labels, positions, plot_opts):
     """Set ticks and labels on horizontal axis."""
-    pass
+
+    # Set xticks and limits.
+    ax.set_xlim([np.min(positions) - 0.5, np.max(positions) + 0.5])
+    ax.set_xticks(positions)
+
+    label_fontsize = plot_opts.get('label_fontsize')
+    label_rotation = plot_opts.get('label_rotation')
+    if label_fontsize or label_rotation:
+        from matplotlib.artist import setp
+
+    if labels is not None:
+        if not len(labels) == len(data):
+            msg = "Length of `labels` should equal length of `data`."
+            raise ValueError(msg)
+
+        xticknames = ax.set_xticklabels(labels)
+        if label_fontsize:
+            setp(xticknames, fontsize=label_fontsize)
+
+        if label_rotation:
+            setp(xticknames, rotation=label_rotation)
+
+    return


 def beanplot(data, ax=None, labels=None, positions=None, side='both',
-    jitter=False, plot_opts={}):
+             jitter=False, plot_opts={}):
     """
     Bean plot of each dataset in a sequence.

@@ -230,14 +329,91 @@ def beanplot(data, ax=None, labels=None, positions=None, side='both',

     .. plot:: plots/graphics_boxplot_beanplot.py
     """
-    pass
+    fig, ax = utils.create_mpl_ax(ax)
+
+    data = list(map(np.asarray, data))
+    if positions is None:
+        positions = np.arange(len(data)) + 1
+
+    # Determine available horizontal space for each individual violin.
+    pos_span = np.max(positions) - np.min(positions)
+    violin_width = np.min([0.15 * np.max([pos_span, 1.]),
+                           plot_opts.get('violin_width', 0.8) / 2.])
+    bean_width = np.min([0.15 * np.max([pos_span, 1.]),
+                         plot_opts.get('bean_size', 0.5) / 2.])
+    bean_mean_width = np.min([0.15 * np.max([pos_span, 1.]),
+                              plot_opts.get('bean_mean_size', 0.5) / 2.])
+
+    legend_txt = plot_opts.get('bean_legend_text', None)
+    for pos_data, pos in zip(data, positions):
+        # Draw violins.
+        xvals, violin = _single_violin(ax, pos, pos_data, violin_width, side,
+                                       plot_opts)
+
+        if jitter:
+            # Draw data points at random coordinates within violin envelope.
+            jitter_coord = pos + _jitter_envelope(pos_data, xvals, violin, side)
+            ax.plot(jitter_coord, pos_data, ls='',
+                    marker=plot_opts.get('jitter_marker', 'o'),
+                    ms=plot_opts.get('jitter_marker_size', 4),
+                    mec=plot_opts.get('bean_color', 'k'),
+                    mew=1, mfc=plot_opts.get('jitter_fc', 'none'),
+                    label=legend_txt)
+        else:
+            # Draw bean lines.
+            ax.hlines(pos_data, pos - bean_width, pos + bean_width,
+                      lw=plot_opts.get('bean_lw', 0.5),
+                      color=plot_opts.get('bean_color', 'k'),
+                      label=legend_txt)
+
+        # Show legend if required.
+        if legend_txt is not None:
+            _show_legend(ax)
+            legend_txt = None  # ensure we get one entry per call to beanplot
+
+        # Draw mean line.
+        if plot_opts.get('bean_show_mean', True):
+            ax.hlines(np.mean(pos_data), pos - bean_mean_width, pos + bean_mean_width,
+                      lw=plot_opts.get('bean_mean_lw', 2.),
+                      color=plot_opts.get('bean_mean_color', 'b'))
+
+        # Draw median marker.
+        if plot_opts.get('bean_show_median', True):
+            ax.plot(pos, np.median(pos_data),
+                    marker=plot_opts.get('bean_median_marker', '+'),
+                    color=plot_opts.get('bean_median_color', 'r'))
+
+    # Set ticks and tick labels of horizontal axis.
+    _set_ticks_labels(ax, data, labels, positions, plot_opts)
+
+    return fig


 def _jitter_envelope(pos_data, xvals, violin, side):
     """Determine envelope for jitter markers."""
-    pass
+    if side == 'both':
+        low, high = (-1., 1.)
+    elif side == 'right':
+        low, high = (0, 1.)
+    elif side == 'left':
+        low, high = (-1., 0)
+    else:
+        raise ValueError("`side` input incorrect: %s" % side)
+
+    jitter_envelope = np.interp(pos_data, xvals, violin)
+    jitter_coord = jitter_envelope * np.random.uniform(low=low, high=high,
+                                                       size=pos_data.size)
+
+    return jitter_coord


 def _show_legend(ax):
     """Utility function to show legend."""
-    pass
+    leg = ax.legend(loc=1, shadow=True, fancybox=True, labelspacing=0.2,
+                    borderpad=0.15)
+    ltext = leg.get_texts()
+    llines = leg.get_lines()
+    frame = leg.get_frame()
+
+    from matplotlib.artist import setp
+    setp(ltext, fontsize='small')
+    setp(llines, linewidth=1)
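A hypothetical sketch of the restored violinplot on synthetic data; the plot_opts keys come from the docstrings above:

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.boxplots import violinplot

    rng = np.random.default_rng(2)
    data = [rng.normal(loc=mu, size=200) for mu in (0.0, 1.0, 2.5)]
    labels = ['a', 'b', 'c']

    fig, ax = plt.subplots()
    # Violin envelopes with an embedded boxplot; tick labels rotated via plot_opts.
    violinplot(data, ax=ax, labels=labels,
               plot_opts={'violin_fc': '#66c2a5', 'label_rotation': 30})
    plt.show()

beanplot accepts the same data and labels, adding per-observation bean lines (or jittered markers) plus mean and median markers.
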
diff --git a/statsmodels/graphics/correlation.py b/statsmodels/graphics/correlation.py
index b2cc19973..43746673f 100644
--- a/statsmodels/graphics/correlation.py
+++ b/statsmodels/graphics/correlation.py
@@ -1,4 +1,4 @@
-"""correlation plots
+'''correlation plots

 Author: Josef Perktold
 License: BSD-3
@@ -6,13 +6,14 @@ License: BSD-3
 example for usage with different options in
 statsmodels/sandbox/examples/thirdparty/ex_ratereturn.py

-"""
+'''
 import numpy as np
+
 from . import utils


 def plot_corr(dcorr, xnames=None, ynames=None, title=None, normcolor=False,
-    ax=None, cmap='RdYlBu_r'):
+              ax=None, cmap='RdYlBu_r'):
     """Plot correlation of many variables in a tight color grid.

     Parameters
@@ -60,11 +61,69 @@ def plot_corr(dcorr, xnames=None, ynames=None, title=None, normcolor=False,

     .. plot:: plots/graphics_correlation_plot_corr.py
     """
-    pass
-
-
-def plot_corr_grid(dcorrs, titles=None, ncols=None, normcolor=False, xnames
-    =None, ynames=None, fig=None, cmap='RdYlBu_r'):
+    if ax is None:
+        create_colorbar = True
+    else:
+        create_colorbar = False
+
+    fig, ax = utils.create_mpl_ax(ax)
+
+    nvars = dcorr.shape[0]
+
+    if ynames is None:
+        ynames = xnames
+    if title is None:
+        title = 'Correlation Matrix'
+    if isinstance(normcolor, tuple):
+        vmin, vmax = normcolor
+    elif normcolor:
+        vmin, vmax = -1.0, 1.0
+    else:
+        vmin, vmax = None, None
+
+    axim = ax.imshow(dcorr, cmap=cmap, interpolation='nearest',
+                     extent=(0,nvars,0,nvars), vmin=vmin, vmax=vmax)
+
+    # create list of label positions
+    labelPos = np.arange(0, nvars) + 0.5
+
+    if isinstance(ynames, list) and len(ynames) == 0:
+        ax.set_yticks([])
+    elif ynames is not None:
+        ax.set_yticks(labelPos)
+        ax.set_yticks(labelPos[:-1]+0.5, minor=True)
+        ax.set_yticklabels(ynames[::-1], fontsize='small',
+                           horizontalalignment='right')
+
+    if isinstance(xnames, list) and len(xnames) == 0:
+        ax.set_xticks([])
+    elif xnames is not None:
+        ax.set_xticks(labelPos)
+        ax.set_xticks(labelPos[:-1]+0.5, minor=True)
+        ax.set_xticklabels(xnames, fontsize='small', rotation=45,
+                           horizontalalignment='right')
+
+    if title != '':
+        ax.set_title(title)
+
+    if create_colorbar:
+        fig.colorbar(axim, use_gridspec=True)
+    fig.tight_layout()
+
+    ax.tick_params(which='minor', length=0)
+    ax.tick_params(direction='out', top=False, right=False)
+    try:
+        ax.grid(True, which='minor', linestyle='-', color='w', lw=1)
+    except AttributeError:
+        # Seems to fail for axes created with AxesGrid.  MPL bug?
+        pass
+
+    return fig
+
+
+def plot_corr_grid(dcorrs, titles=None, ncols=None, normcolor=False, xnames=None,
+                   ynames=None, fig=None, cmap='RdYlBu_r'):
     """
     Create a grid of correlation plots.

@@ -123,4 +182,40 @@ def plot_corr_grid(dcorrs, titles=None, ncols=None, normcolor=False, xnames

     .. plot:: plots/graphics_correlation_plot_corr_grid.py
     """
-    pass
+    if ynames is None:
+        ynames = xnames
+
+    if not titles:
+        titles = ['']*len(dcorrs)
+
+    n_plots = len(dcorrs)
+    if ncols is not None:
+        nrows = int(np.ceil(n_plots / float(ncols)))
+    else:
+        # Determine number of rows and columns, square if possible, otherwise
+        # prefer a wide (more columns) over a high layout.
+        if n_plots < 4:
+            nrows, ncols = 1, n_plots
+        else:
+            nrows = int(np.sqrt(n_plots))
+            ncols = int(np.ceil(n_plots / float(nrows)))
+
+    # Create a figure with the correct size
+    aspect = min(ncols / float(nrows), 1.8)
+    vsize = np.sqrt(nrows) * 5
+    fig = utils.create_mpl_fig(fig, figsize=(vsize * aspect + 1, vsize))
+
+    for i, c in enumerate(dcorrs):
+        ax = fig.add_subplot(nrows, ncols, i+1)
+        # Only plot labels on the bottom row and the left column.
+        _xnames = xnames if nrows * ncols - (i+1) < ncols else []
+        _ynames = ynames if (i+1) % ncols == 1 else []
+        plot_corr(c, xnames=_xnames, ynames=_ynames, title=titles[i],
+                  normcolor=normcolor, ax=ax, cmap=cmap)
+
+    # Adjust figure margins and add a colorbar
+    fig.subplots_adjust(bottom=0.1, left=0.09, right=0.9, top=0.9)
+    cax = fig.add_axes([0.92, 0.1, 0.025, 0.8])
+    fig.colorbar(fig.axes[0].images[0], cax=cax)
+
+    return fig
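A hypothetical example for the restored plot_corr, using a random correlation matrix (illustrative only):

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.correlation import plot_corr

    rng = np.random.default_rng(3)
    x = rng.normal(size=(500, 6))
    dcorr = np.corrcoef(x, rowvar=False)        # 6 x 6 correlation matrix
    names = ['v%d' % i for i in range(6)]

    # normcolor=True pins the colormap to [-1, 1]; ynames defaults to xnames.
    fig = plot_corr(dcorr, xnames=names, normcolor=True)
    plt.show()
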
diff --git a/statsmodels/graphics/dotplots.py b/statsmodels/graphics/dotplots.py
index e97d77fa2..da96b0b3a 100644
--- a/statsmodels/graphics/dotplots.py
+++ b/statsmodels/graphics/dotplots.py
@@ -1,12 +1,15 @@
 import numpy as np
+
 from . import utils


-def dot_plot(points, intervals=None, lines=None, sections=None, styles=None,
-    marker_props=None, line_props=None, split_names=None, section_order=
-    None, line_order=None, stacked=False, styles_order=None, striped=False,
-    horizontal=True, show_names='both', fmt_left_name=None, fmt_right_name=
-    None, show_section_titles=None, ax=None):
+def dot_plot(points, intervals=None, lines=None, sections=None,
+             styles=None, marker_props=None, line_props=None,
+             split_names=None, section_order=None, line_order=None,
+             stacked=False, styles_order=None, striped=False,
+             horizontal=True, show_names="both",
+             fmt_left_name=None, fmt_right_name=None,
+             show_section_titles=None, ax=None):
     """
     Dot plotting (also known as forest and blobbogram).

@@ -118,4 +121,369 @@ def dot_plot(points, intervals=None, lines=None, sections=None, styles=None,

     >>> dot_plot(points=point_values, lines=label_values)
     """
-    pass
+
+    import matplotlib.transforms as transforms
+
+    fig, ax = utils.create_mpl_ax(ax)
+
+    # Convert to numpy arrays if that is not what we are given.
+    points = np.asarray(points)
+    asarray_or_none = lambda x: None if x is None else np.asarray(x)
+    intervals = asarray_or_none(intervals)
+    lines = asarray_or_none(lines)
+    sections = asarray_or_none(sections)
+    styles = asarray_or_none(styles)
+
+    # Total number of points
+    npoint = len(points)
+
+    # Set default line values if needed
+    if lines is None:
+        lines = np.arange(npoint)
+
+    # Set default section values if needed
+    if sections is None:
+        sections = np.zeros(npoint)
+
+    # Set default style values if needed
+    if styles is None:
+        styles = np.zeros(npoint)
+
+    # The vertical space (in inches) for a section title
+    section_title_space = 0.5
+
+    # The number of sections
+    nsect = len(set(sections))
+    if section_order is not None:
+        nsect = len(set(section_order))
+
+    # The number of section titles
+    if show_section_titles is False:
+        draw_section_titles = False
+        nsect_title = 0
+    elif show_section_titles is True:
+        draw_section_titles = True
+        nsect_title = nsect
+    else:
+        draw_section_titles = nsect > 1
+        nsect_title = nsect if nsect > 1 else 0
+
+    # The total vertical space devoted to section titles.
+    section_space_total = section_title_space * nsect_title
+
+    # Add a bit of room so that points that fall at the axis limits
+    # are not cut in half.
+    ax.set_xmargin(0.02)
+    ax.set_ymargin(0.02)
+
+    if section_order is None:
+        lines0 = list(set(sections))
+        lines0.sort()
+    else:
+        lines0 = section_order
+
+    if line_order is None:
+        lines1 = list(set(lines))
+        lines1.sort()
+    else:
+        lines1 = line_order
+
+    # A map from (section,line) codes to index positions.
+    lines_map = {}
+    for i in range(npoint):
+        if section_order is not None and sections[i] not in section_order:
+            continue
+        if line_order is not None and lines[i] not in line_order:
+            continue
+        ky = (sections[i], lines[i])
+        if ky not in lines_map:
+            lines_map[ky] = []
+        lines_map[ky].append(i)
+
+    # Get the size of the axes on the parent figure in inches
+    bbox = ax.get_window_extent().transformed(
+        fig.dpi_scale_trans.inverted())
+    awidth, aheight = bbox.width, bbox.height
+
+    # The number of lines in the plot.
+    nrows = len(lines_map)
+
+    # The positions of the lowest and highest guideline in axes
+    # coordinates (for horizontal dotplots), or the leftmost and
+    # rightmost guidelines (for vertical dotplots).
+    bottom, top = 0, 1
+
+    if horizontal:
+        # x coordinate is data, y coordinate is axes
+        trans = transforms.blended_transform_factory(ax.transData,
+                                                     ax.transAxes)
+    else:
+        # x coordinate is axes, y coordinate is data
+        trans = transforms.blended_transform_factory(ax.transAxes,
+                                                     ax.transData)
+
+    # Space used for a section title, in axes coordinates
+    title_space_axes = section_title_space / aheight
+
+    # Space between lines
+    if horizontal:
+        dpos = (top - bottom - nsect_title*title_space_axes) /\
+            float(nrows)
+    else:
+        dpos = (top - bottom) / float(nrows)
+
+    # Determine the spacing for stacked points
+    if styles_order is not None:
+        style_codes = styles_order
+    else:
+        style_codes = list(set(styles))
+        style_codes.sort()
+    # Order is top to bottom for horizontal plots, so need to
+    # flip.
+    if horizontal:
+        style_codes = style_codes[::-1]
+    # nval is the maximum number of points on one line.
+    nval = len(style_codes)
+    if nval > 1:
+        stackd = dpos / (2.5*(float(nval)-1))
+    else:
+        stackd = 0.
+
+    # Map from style code to its integer position
+    style_codes_map = {x: style_codes.index(x) for x in style_codes}
+
+    # Setup default marker styles
+    colors = ["r", "g", "b", "y", "k", "purple", "orange"]
+    if marker_props is None:
+        marker_props = {x: {} for x in style_codes}
+    for j in range(nval):
+        sc = style_codes[j]
+        if "color" not in marker_props[sc]:
+            marker_props[sc]["color"] = colors[j % len(colors)]
+        if "marker" not in marker_props[sc]:
+            marker_props[sc]["marker"] = "o"
+        if "ms" not in marker_props[sc]:
+            marker_props[sc]["ms"] = 10 if stackd == 0 else 6
+
+    # Setup default line styles
+    if line_props is None:
+        line_props = {x: {} for x in style_codes}
+    for j in range(nval):
+        sc = style_codes[j]
+        if "color" not in line_props[sc]:
+            line_props[sc]["color"] = "grey"
+        if "linewidth" not in line_props[sc]:
+            line_props[sc]["linewidth"] = 2 if stackd > 0 else 8
+
+    if horizontal:
+        # The vertical position of the first line.
+        pos = top - dpos/2 if nsect == 1 else top
+    else:
+        # The horizontal position of the first line.
+        pos = bottom + dpos/2
+
+    # Points that have already been labeled
+    labeled = set()
+
+    # Positions of the y axis grid lines
+    ticks = []
+
+    # Loop through the sections
+    for k0 in lines0:
+
+        # Draw a section title
+        if draw_section_titles:
+
+            if horizontal:
+
+                y0 = pos + dpos/2 if k0 == lines0[0] else pos
+
+                ax.fill_between((0, 1), (y0,y0),
+                                (pos-0.7*title_space_axes,
+                                 pos-0.7*title_space_axes),
+                                color='darkgrey',
+                                transform=ax.transAxes,
+                                zorder=1)
+
+                txt = ax.text(0.5, pos - 0.35*title_space_axes, k0,
+                              horizontalalignment='center',
+                              verticalalignment='center',
+                              transform=ax.transAxes)
+                txt.set_fontweight("bold")
+                pos -= title_space_axes
+
+            else:
+
+                m = len([k for k in lines_map if k[0] == k0])
+
+                ax.fill_between((pos-dpos/2+0.01,
+                                 pos+(m-1)*dpos+dpos/2-0.01),
+                                (1.01,1.01), (1.06,1.06),
+                                color='darkgrey',
+                                transform=ax.transAxes,
+                                zorder=1, clip_on=False)
+
+                txt = ax.text(pos + (m-1)*dpos/2, 1.02, k0,
+                              horizontalalignment='center',
+                              verticalalignment='bottom',
+                              transform=ax.transAxes)
+                txt.set_fontweight("bold")
+
+        jrow = 0
+        for k1 in lines1:
+
+            # No data to plot
+            if (k0, k1) not in lines_map:
+                continue
+
+            # Draw the guideline
+            if horizontal:
+                ax.axhline(pos, color='grey')
+            else:
+                ax.axvline(pos, color='grey')
+
+            # Set up the labels
+            if split_names is not None:
+                us = k1.split(split_names)
+                if len(us) >= 2:
+                    left_label, right_label = us[0], us[1]
+                else:
+                    left_label, right_label = k1, None
+            else:
+                left_label, right_label = k1, None
+
+            if fmt_left_name is not None:
+                left_label = fmt_left_name(left_label)
+
+            if fmt_right_name is not None:
+                right_label = fmt_right_name(right_label)
+
+            # Draw the stripe
+            if striped and jrow % 2 == 0:
+                if horizontal:
+                    ax.fill_between((0, 1), (pos-dpos/2, pos-dpos/2),
+                                    (pos+dpos/2, pos+dpos/2),
+                                    color='lightgrey',
+                                    transform=ax.transAxes,
+                                    zorder=0)
+                else:
+                    ax.fill_between((pos-dpos/2, pos+dpos/2),
+                                    (0, 0), (1, 1),
+                                    color='lightgrey',
+                                    transform=ax.transAxes,
+                                    zorder=0)
+
+            jrow += 1
+
+            # Draw the left margin label
+            if show_names.lower() in ("left", "both"):
+                if horizontal:
+                    ax.text(-0.1/awidth, pos, left_label,
+                            horizontalalignment="right",
+                            verticalalignment='center',
+                            transform=ax.transAxes,
+                            family='monospace')
+                else:
+                    ax.text(pos, -0.1/aheight, left_label,
+                            horizontalalignment="center",
+                            verticalalignment='top',
+                            transform=ax.transAxes,
+                            family='monospace')
+
+            # Draw the right margin label
+            if show_names.lower() in ("right", "both"):
+                if right_label is not None:
+                    if horizontal:
+                        ax.text(1 + 0.1/awidth, pos, right_label,
+                                horizontalalignment="left",
+                                verticalalignment='center',
+                                transform=ax.transAxes,
+                                family='monospace')
+                    else:
+                        ax.text(pos, 1 + 0.1/aheight, right_label,
+                                horizontalalignment="center",
+                                verticalalignment='bottom',
+                                transform=ax.transAxes,
+                                family='monospace')
+
+            # Save the vertical position so that we can place the
+            # tick marks
+            ticks.append(pos)
+
+            # Loop over the points in one line
+            for ji, jp in enumerate(lines_map[(k0, k1)]):
+
+                # Calculate the vertical offset
+                yo = 0
+                if stacked:
+                    yo = -dpos/5 + style_codes_map[styles[jp]]*stackd
+
+                pt = points[jp]
+
+                # Plot the interval
+                if intervals is not None:
+
+                    # Symmetric interval
+                    if np.isscalar(intervals[jp]):
+                        lcb, ucb = pt - intervals[jp],\
+                            pt + intervals[jp]
+
+                    # Nonsymmetric interval
+                    else:
+                        lcb, ucb = pt - intervals[jp][0],\
+                            pt + intervals[jp][1]
+
+                    # Draw the interval
+                    if horizontal:
+                        ax.plot([lcb, ucb], [pos+yo, pos+yo], '-',
+                                transform=trans,
+                                **line_props[styles[jp]])
+                    else:
+                        ax.plot([pos+yo, pos+yo], [lcb, ucb], '-',
+                                transform=trans,
+                                **line_props[styles[jp]])
+
+
+                # Plot the point
+                sl = styles[jp]
+                sll = sl if sl not in labeled else None
+                labeled.add(sl)
+                if horizontal:
+                    ax.plot([pt,], [pos+yo,], ls='None',
+                            transform=trans, label=sll,
+                            **marker_props[sl])
+                else:
+                    ax.plot([pos+yo,], [pt,], ls='None',
+                            transform=trans, label=sll,
+                            **marker_props[sl])
+
+            if horizontal:
+                pos -= dpos
+            else:
+                pos += dpos
+
+    # Set up the axis
+    if horizontal:
+        ax.xaxis.set_ticks_position("bottom")
+        ax.yaxis.set_ticks_position("none")
+        ax.set_yticklabels([])
+        ax.spines['left'].set_color('none')
+        ax.spines['right'].set_color('none')
+        ax.spines['top'].set_color('none')
+        ax.spines['bottom'].set_position(('axes', -0.1/aheight))
+        ax.set_ylim(0, 1)
+        ax.yaxis.set_ticks(ticks)
+        ax.autoscale_view(scaley=False, tight=True)
+    else:
+        ax.yaxis.set_ticks_position("left")
+        ax.xaxis.set_ticks_position("none")
+        ax.set_xticklabels([])
+        ax.spines['bottom'].set_color('none')
+        ax.spines['right'].set_color('none')
+        ax.spines['top'].set_color('none')
+        ax.spines['left'].set_position(('axes', -0.1/awidth))
+        ax.set_xlim(0, 1)
+        ax.xaxis.set_ticks(ticks)
+        ax.autoscale_view(scalex=False, tight=True)
+
+    return fig
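A minimal sketch of the restored dot_plot with symmetric intervals; all values below are invented for illustration:

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.dotplots import dot_plot

    points = np.array([0.2, 0.5, 0.1, 0.8])
    intervals = np.array([0.05, 0.1, 0.08, 0.12])   # symmetric half-widths
    lines = ['alpha', 'beta', 'gamma', 'delta']     # one guideline per label

    # One horizontal guideline per label, striped background on alternate rows.
    fig = dot_plot(points, intervals=intervals, lines=lines, striped=True)
    plt.show()
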
diff --git a/statsmodels/graphics/factorplots.py b/statsmodels/graphics/factorplots.py
index df84238b4..c5b70115c 100644
--- a/statsmodels/graphics/factorplots.py
+++ b/statsmodels/graphics/factorplots.py
@@ -1,15 +1,19 @@
+# -*- coding: utf-8 -*-
 """
 Authors:    Josef Perktold, Skipper Seabold, Denis A. Engemann
 """
 from statsmodels.compat.python import lrange
+
 import numpy as np
+
 from statsmodels.graphics.plottools import rainbow
 import statsmodels.graphics.utils as utils


-def interaction_plot(x, trace, response, func='mean', ax=None, plottype='b',
-    xlabel=None, ylabel=None, colors=None, markers=None, linestyles=None,
-    legendloc='best', legendtitle=None, **kwargs):
+def interaction_plot(x, trace, response, func="mean", ax=None, plottype='b',
+                     xlabel=None, ylabel=None, colors=None, markers=None,
+                     linestyles=None, legendloc='best', legendtitle=None,
+                     **kwargs):
     """
     Interaction plot for factor level statistics.

@@ -87,7 +91,72 @@ def interaction_plot(x, trace, response, func='mean', ax=None, plottype='b',
        import matplotlib.pyplot as plt
        #plt.show()
     """
-    pass
+
+    from pandas import DataFrame
+    fig, ax = utils.create_mpl_ax(ax)
+
+    response_name = ylabel or getattr(response, 'name', 'response')
+    func_name = getattr(func, "__name__", str(func))
+    ylabel = '%s of %s' % (func_name, response_name)
+    xlabel = xlabel or getattr(x, 'name', 'X')
+    legendtitle = legendtitle or getattr(trace, 'name', 'Trace')
+
+    ax.set_ylabel(ylabel)
+    ax.set_xlabel(xlabel)
+
+    x_values = x_levels = None
+    if isinstance(x[0], str):
+        x_levels = list(np.unique(x))
+        x_values = lrange(len(x_levels))
+        x = _recode(x, dict(zip(x_levels, x_values)))
+
+    data = DataFrame(dict(x=x, trace=trace, response=response))
+    plot_data = data.groupby(['trace', 'x']).aggregate(func).reset_index()
+
+    # return data
+    # check plot args
+    n_trace = len(plot_data['trace'].unique())
+
+    linestyles = ['-'] * n_trace if linestyles is None else linestyles
+    markers = ['.'] * n_trace if markers is None else markers
+    colors = rainbow(n_trace) if colors is None else colors
+
+    if len(linestyles) != n_trace:
+        raise ValueError("Must be a linestyle for each trace level")
+    if len(markers) != n_trace:
+        raise ValueError("Must be a marker for each trace level")
+    if len(colors) != n_trace:
+        raise ValueError("Must be a color for each trace level")
+
+    if plottype == 'both' or plottype == 'b':
+        for i, (values, group) in enumerate(plot_data.groupby('trace')):
+            # trace label
+            label = str(group['trace'].values[0])
+            ax.plot(group['x'], group['response'], color=colors[i],
+                    marker=markers[i], label=label,
+                    linestyle=linestyles[i], **kwargs)
+    elif plottype == 'line' or plottype == 'l':
+        for i, (values, group) in enumerate(plot_data.groupby('trace')):
+            # trace label
+            label = str(group['trace'].values[0])
+            ax.plot(group['x'], group['response'], color=colors[i],
+                    label=label, linestyle=linestyles[i], **kwargs)
+    elif plottype == 'scatter' or plottype == 's':
+        for i, (values, group) in enumerate(plot_data.groupby('trace')):
+            # trace label
+            label = str(group['trace'].values[0])
+            ax.scatter(group['x'], group['response'], color=colors[i],
+                       label=label, marker=markers[i], **kwargs)
+
+    else:
+        raise ValueError("Plot type %s not understood" % plottype)
+    ax.legend(loc=legendloc, title=legendtitle)
+    ax.margins(.1)
+
+    if all([x_levels, x_values]):
+        ax.set_xticks(x_values)
+        ax.set_xticklabels(x_levels)
+    return fig


 def _recode(x, levels):
@@ -105,4 +174,32 @@ def _recode(x, levels):
     -------
     out : instance numpy.ndarray
     """
-    pass
+    from pandas import Series
+    name = None
+    index = None
+
+    if isinstance(x, Series):
+        name = x.name
+        index = x.index
+        x = x.values
+
+    if x.dtype.type not in [np.str_, np.object_]:
+        raise ValueError('This is not a categorial factor.'
+                         ' Array of str type required.')
+
+    elif not isinstance(levels, dict):
+        raise ValueError('This is not a valid value for levels.'
+                         ' Dict required.')
+
+    elif not (np.unique(x) == np.unique(list(levels.keys()))).all():
+        raise ValueError('The levels do not match the array values.')
+
+    else:
+        out = np.empty(x.shape[0], dtype=int)
+        for level, coding in levels.items():
+            out[x == level] = coding
+
+        if name:
+            out = Series(out, name=name, index=index)
+
+        return out
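A hypothetical two-factor example for the restored interaction_plot, with a simulated response:

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.factorplots import interaction_plot

    rng = np.random.default_rng(4)
    x = np.repeat(['low', 'high'], 30)                  # factor on the x axis
    trace = np.tile(np.repeat(['A', 'B', 'C'], 10), 2)  # trace factor
    response = rng.normal(size=60) + (x == 'high') + 0.5 * (trace == 'C')

    # One line per trace level; markers must match the number of trace levels.
    fig = interaction_plot(x, trace, response, func='mean',
                           markers=['D', '^', 'o'])
    plt.show()
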
diff --git a/statsmodels/graphics/functional.py b/statsmodels/graphics/functional.py
index d59cc1e25..9b601f224 100644
--- a/statsmodels/graphics/functional.py
+++ b/statsmodels/graphics/functional.py
@@ -1,10 +1,13 @@
 """Module for functional boxplots."""
 from statsmodels.compat.numpy import NP_LT_123
+
 import numpy as np
 from scipy.special import comb
+
 from statsmodels.graphics.utils import _import_mpl
 from statsmodels.multivariate.pca import PCA
 from statsmodels.nonparametric.kernel_density import KDEMultivariate
+
 try:
     from scipy.optimize import brute, differential_evolution, fmin
     have_de_optim = True
@@ -13,7 +16,9 @@ except ImportError:
     have_de_optim = False
 import itertools
 from multiprocessing import Pool
+
 from . import utils
+
 __all__ = ['hdrboxplot', 'fboxplot', 'rainbowplot', 'banddepth']


@@ -24,23 +29,16 @@ class HdrResults:
         self.__dict__.update(kwds)

     def __repr__(self):
-        msg = (
-            """HDR boxplot summary:
--> median:
-{}
--> 50% HDR (max, min):
-{}
--> 90% HDR (max, min):
-{}
--> Extra quantiles (max, min):
-{}
--> Outliers:
-{}
--> Outliers indices:
-{}
-"""
-            .format(self.median, self.hdr_50, self.hdr_90, self.
-            extra_quantiles, self.outliers, self.outliers_idx))
+        msg = ("HDR boxplot summary:\n"
+               "-> median:\n{}\n"
+               "-> 50% HDR (max, min):\n{}\n"
+               "-> 90% HDR (max, min):\n{}\n"
+               "-> Extra quantiles (max, min):\n{}\n"
+               "-> Outliers:\n{}\n"
+               "-> Outliers indices:\n{}\n"
+               ).format(self.median, self.hdr_50, self.hdr_90,
+                        self.extra_quantiles, self.outliers, self.outliers_idx)
+
         return msg


@@ -67,7 +65,11 @@ def _inverse_transform(pca, data):
     projection : ndarray
         nobs by nvar array of the projection onto ncomp factors
     """
-    pass
+    factors = pca.factors
+    pca.factors = data.reshape(-1, factors.shape[1])
+    projection = pca.project()
+    pca.factors = factors
+    return projection


 def _curve_constrained(x, idx, sign, band, pca, ks_gaussian):
@@ -95,7 +97,13 @@ def _curve_constrained(x, idx, sign, band, pca, ks_gaussian):
     value : float
         Curve value at `idx`.
     """
-    pass
+    x = x.reshape(1, -1)
+    pdf = ks_gaussian.pdf(x)
+    if band[0] < pdf < band[1]:
+        value = sign * _inverse_transform(pca, x)[0][idx]
+    else:
+        value = 1E6
+    return value


 def _min_max_band(args):
@@ -124,11 +132,28 @@ def _min_max_band(args):
     band : tuple of float
         ``(max, min)`` curve values at `idx`
     """
-    pass
-
-
-def hdrboxplot(data, ncomp=2, alpha=None, threshold=0.95, bw=None, xdata=
-    None, labels=None, ax=None, use_brute=False, seed=None):
+    idx, (band, pca, bounds, ks_gaussian, use_brute, seed) = args
+    if have_de_optim and not use_brute:
+        max_ = differential_evolution(_curve_constrained, bounds=bounds,
+                                      args=(idx, -1, band, pca, ks_gaussian),
+                                      maxiter=7, seed=seed).x
+        min_ = differential_evolution(_curve_constrained, bounds=bounds,
+                                      args=(idx, 1, band, pca, ks_gaussian),
+                                      maxiter=7, seed=seed).x
+    else:
+        max_ = brute(_curve_constrained, ranges=bounds, finish=fmin,
+                     args=(idx, -1, band, pca, ks_gaussian))
+
+        min_ = brute(_curve_constrained, ranges=bounds, finish=fmin,
+                     args=(idx, 1, band, pca, ks_gaussian))
+
+    band = (_inverse_transform(pca, max_)[0][idx],
+            _inverse_transform(pca, min_)[0][idx])
+    return band
+
+
+def hdrboxplot(data, ncomp=2, alpha=None, threshold=0.95, bw=None,
+               xdata=None, labels=None, ax=None, use_brute=False, seed=None):
     """
     High Density Region boxplot

@@ -270,11 +295,185 @@ def hdrboxplot(data, ncomp=2, alpha=None, threshold=0.95, bw=None, xdata=

     .. plot:: plots/graphics_functional_hdrboxplot.py
     """
-    pass
+    fig, ax = utils.create_mpl_ax(ax)
+
+    if labels is None:
+        # For use with pandas, get the labels
+        if hasattr(data, 'index'):
+            labels = data.index
+        else:
+            labels = np.arange(len(data))
+
+    data = np.asarray(data)
+    if xdata is None:
+        xdata = np.arange(data.shape[1])
+
+    n_samples, dim = data.shape
+    # PCA and bivariate plot
+    pca = PCA(data, ncomp=ncomp)
+    data_r = pca.factors
+
+    # Create gaussian kernel
+    ks_gaussian = KDEMultivariate(data_r, bw=bw,
+                                  var_type='c' * data_r.shape[1])
+
+    # Boundaries of the n-variate space
+    bounds = np.array([data_r.min(axis=0), data_r.max(axis=0)]).T
+
+    # Compute contour line of pvalue linked to a given probability level
+    if alpha is None:
+        alpha = [threshold, 0.9, 0.5]
+    else:
+        alpha.extend([threshold, 0.9, 0.5])
+        alpha = list(set(alpha))
+    alpha.sort(reverse=True)
+
+    n_quantiles = len(alpha)
+    pdf_r = ks_gaussian.pdf(data_r).flatten()
+    if NP_LT_123:
+        pvalues = [np.percentile(pdf_r, (1 - alpha[i]) * 100,
+                                 interpolation='linear')
+                   for i in range(n_quantiles)]
+    else:
+        pvalues = [np.percentile(pdf_r, (1 - alpha[i]) * 100,
+                                 method='midpoint')
+                   for i in range(n_quantiles)]
+
+    # Find mean, outliers curves
+    if have_de_optim and not use_brute:
+        median = differential_evolution(lambda x: - ks_gaussian.pdf(x),
+                                        bounds=bounds, maxiter=5, seed=seed).x
+    else:
+        median = brute(lambda x: - ks_gaussian.pdf(x),
+                       ranges=bounds, finish=fmin)
+
+    outliers_idx = np.where(pdf_r < pvalues[alpha.index(threshold)])[0]
+    labels_outlier = [labels[i] for i in outliers_idx]
+    outliers = data[outliers_idx]
+
+    # Find HDR given some quantiles
+
+    def _band_quantiles(band, use_brute=use_brute, seed=seed):
+        """
+        Find extreme curves for a quantile band.
+
+        From the `band` of quantiles, the associated PDF extrema values
+        are computed. If `min_alpha` is not provided (single quantile value),
+        `max_pdf` is set to `1E6` in order not to constrain the problem on high
+        values.
+
+        An optimization is performed per component in order to find the min and
+        max curves. This is done by comparing the PDF value of a given curve
+        with the band PDF.
+
+        Parameters
+        ----------
+        band : array_like
+            alpha values ``(max_alpha, min_alpha)`` ex: ``[0.9, 0.5]``
+        use_brute : bool
+            Use the brute force optimizer instead of the default differential
+            evolution to find the curves. Default is False.
+        seed : {None, int, np.random.RandomState}
+            Seed value to pass to scipy.optimize.differential_evolution. Can
+            be an integer or RandomState instance. If None, then the default
+            RandomState provided by np.random is used.
+
+
+        Returns
+        -------
+        band_quantiles : list of 1-D array
+            ``(max_quantile, min_quantile)`` (2, n_features)
+        """
+        min_pdf = pvalues[alpha.index(band[0])]
+        try:
+            max_pdf = pvalues[alpha.index(band[1])]
+        except IndexError:
+            max_pdf = 1E6
+        band = [min_pdf, max_pdf]
+
+        pool = Pool()
+        data = zip(range(dim), itertools.repeat((band, pca,
+                                                 bounds, ks_gaussian,
+                                                 use_brute, seed)))
+        band_quantiles = pool.map(_min_max_band, data)
+        pool.terminate()
+        pool.close()
+
+        band_quantiles = list(zip(*band_quantiles))
+
+        return band_quantiles
+
+    extra_alpha = [i for i in alpha
+                   if 0.5 != i and 0.9 != i and threshold != i]
+    if len(extra_alpha) > 0:
+        extra_quantiles = []
+        for x in extra_alpha:
+            for y in _band_quantiles([x], use_brute=use_brute, seed=seed):
+                extra_quantiles.append(y)
+    else:
+        extra_quantiles = []
+
+    # Inverse transform from n-variate plot back to the dataset's shape
+    median = _inverse_transform(pca, median)[0]
+    hdr_90 = _band_quantiles([0.9, 0.5], use_brute=use_brute, seed=seed)
+    hdr_50 = _band_quantiles([0.5], use_brute=use_brute, seed=seed)
+
+    hdr_res = HdrResults({
+                            "median": median,
+                            "hdr_50": hdr_50,
+                            "hdr_90": hdr_90,
+                            "extra_quantiles": extra_quantiles,
+                            "outliers": outliers,
+                            "outliers_idx": outliers_idx
+                         })
+
+    # Plots
+    ax.plot(np.array([xdata] * n_samples).T, data.T,
+            c='c', alpha=.1, label=None)
+    ax.plot(xdata, median, c='k', label='Median')
+    fill_betweens = []
+    fill_betweens.append(ax.fill_between(xdata, *hdr_50, color='gray',
+                                         alpha=.4,  label='50% HDR'))
+    fill_betweens.append(ax.fill_between(xdata, *hdr_90, color='gray',
+                                         alpha=.3, label='90% HDR'))
+
+    if len(extra_quantiles) != 0:
+        ax.plot(np.array([xdata] * len(extra_quantiles)).T,
+                np.array(extra_quantiles).T,
+                c='y', ls='-.', alpha=.4, label='Extra quantiles')
+
+    if len(outliers) != 0:
+        for ii, outlier in enumerate(outliers):
+            if labels_outlier is None:
+                label = 'Outliers'
+            else:
+                label = str(labels_outlier[ii])
+            ax.plot(xdata, outlier, ls='--', alpha=0.7, label=label)
+
+    handles, labels = ax.get_legend_handles_labels()
+
+    # Proxy artist for fill_between legend entry
+    # See https://matplotlib.org/1.3.1/users/legend_guide.html
+    plt = _import_mpl()
+    for label, fill_between in zip(['50% HDR', '90% HDR'], fill_betweens):
+        p = plt.Rectangle((0, 0), 1, 1,
+                          fc=fill_between.get_facecolor()[0])
+        handles.append(p)
+        labels.append(label)
+
+    by_label = dict(zip(labels, handles))
+    if len(outliers) != 0:
+        by_label.pop('Median')
+        by_label.pop('50% HDR')
+        by_label.pop('90% HDR')
+
+    ax.legend(by_label.values(), by_label.keys(), loc='best')
+
+    return fig, hdr_res


 def fboxplot(data, xdata=None, labels=None, depth=None, method='MBD',
-    wfactor=1.5, ax=None, plot_opts=None):
+             wfactor=1.5, ax=None, plot_opts=None):
     """
     Plot functional boxplot.

@@ -391,11 +590,85 @@ def fboxplot(data, xdata=None, labels=None, depth=None, method='MBD',

     .. plot:: plots/graphics_functional_fboxplot.py
     """
-    pass
-
-
-def rainbowplot(data, xdata=None, depth=None, method='MBD', ax=None, cmap=None
-    ):
+    fig, ax = utils.create_mpl_ax(ax)
+
+    plot_opts = {} if plot_opts is None else plot_opts
+    if plot_opts.get('cmap_outliers') is None:
+        from matplotlib.cm import rainbow_r
+        plot_opts['cmap_outliers'] = rainbow_r
+
+    data = np.asarray(data)
+    if xdata is None:
+        xdata = np.arange(data.shape[1])
+
+    # Calculate band depth if required.
+    if depth is None:
+        if method not in ['MBD', 'BD2']:
+            raise ValueError("Unknown value for parameter `method`.")
+
+        depth = banddepth(data, method=method)
+    else:
+        if depth.size != data.shape[0]:
+            raise ValueError("Provided `depth` array is not of correct size.")
+
+    # Inner area is 25%-75% region of band-depth ordered curves.
+    ix_depth = np.argsort(depth)[::-1]
+    median_curve = data[ix_depth[0], :]
+    ix_IQR = data.shape[0] // 2
+    lower = data[ix_depth[0:ix_IQR], :].min(axis=0)
+    upper = data[ix_depth[0:ix_IQR], :].max(axis=0)
+
+    # Determine region for outlier detection
+    inner_median = np.median(data[ix_depth[0:ix_IQR], :], axis=0)
+    lower_fence = inner_median - (inner_median - lower) * wfactor
+    upper_fence = inner_median + (upper - inner_median) * wfactor
+
+    # Find outliers.
+    ix_outliers = []
+    ix_nonout = []
+    for ii in range(data.shape[0]):
+        if (np.any(data[ii, :] > upper_fence) or
+                np.any(data[ii, :] < lower_fence)):
+            ix_outliers.append(ii)
+        else:
+            ix_nonout.append(ii)
+
+    ix_outliers = np.asarray(ix_outliers)
+
+    # Plot envelope of all non-outlying data
+    lower_nonout = data[ix_nonout, :].min(axis=0)
+    upper_nonout = data[ix_nonout, :].max(axis=0)
+    ax.fill_between(xdata, lower_nonout, upper_nonout,
+                    color=plot_opts.get('c_outer', (0.75, 0.75, 0.75)))
+
+    # Plot central 50% region
+    ax.fill_between(xdata, lower, upper,
+                    color=plot_opts.get('c_inner', (0.5, 0.5, 0.5)))
+
+    # Plot median curve
+    ax.plot(xdata, median_curve, color=plot_opts.get('c_median', 'k'),
+            lw=plot_opts.get('lw_median', 2))
+
+    # Plot outliers
+    cmap = plot_opts.get('cmap_outliers')
+    for ii, ix in enumerate(ix_outliers):
+        label = str(labels[ix]) if labels is not None else None
+        ax.plot(xdata, data[ix, :],
+                color=cmap(float(ii) / (len(ix_outliers)-1)), label=label,
+                lw=plot_opts.get('lw_outliers', 1))
+
+    if plot_opts.get('draw_nonout', False):
+        for ix in ix_nonout:
+            ax.plot(xdata, data[ix, :], 'k-', lw=0.5)
+
+    if labels is not None:
+        ax.legend()
+
+    return fig, depth, ix_depth, ix_outliers
+
+
+def rainbowplot(data, xdata=None, depth=None, method='MBD', ax=None,
+                cmap=None):
     """
     Create a rainbow plot for a set of curves.

@@ -466,7 +739,38 @@ def rainbowplot(data, xdata=None, depth=None, method='MBD', ax=None, cmap=None

     .. plot:: plots/graphics_functional_rainbowplot.py
     """
-    pass
+    fig, ax = utils.create_mpl_ax(ax)
+
+    if cmap is None:
+        from matplotlib.cm import rainbow_r
+        cmap = rainbow_r
+
+    data = np.asarray(data)
+    if xdata is None:
+        xdata = np.arange(data.shape[1])
+
+    # Calculate band depth if required.
+    if depth is None:
+        if method not in ['MBD', 'BD2']:
+            raise ValueError("Unknown value for parameter `method`.")
+
+        depth = banddepth(data, method=method)
+    else:
+        if depth.size != data.shape[0]:
+            raise ValueError("Provided `depth` array is not of correct size.")
+
+    ix_depth = np.argsort(depth)[::-1]
+
+    # Plot all curves, colored by depth
+    num_curves = data.shape[0]
+    for ii in range(num_curves):
+        ax.plot(xdata, data[ix_depth[ii], :], c=cmap(ii / (num_curves - 1.)))
+
+    # Plot the median curve
+    median_curve = data[ix_depth[0], :]
+    ax.plot(xdata, median_curve, 'k-', lw=2)
+
+    return fig


 def banddepth(data, method='MBD'):
@@ -523,4 +827,27 @@ def banddepth(data, method='MBD'):
            million curves be ranked?", Journal for the Rapid Dissemination
            of Statistics Research, vol. 1, pp. 68-74, 2012.
     """
-    pass
+    n, p = data.shape
+    rv = np.argsort(data, axis=0)
+    rmat = np.argsort(rv, axis=0) + 1
+
+    # band depth
+    def _fbd2():
+        down = np.min(rmat, axis=1) - 1
+        up = n - np.max(rmat, axis=1)
+        return (up * down + n - 1) / comb(n, 2)
+
+    # modified band depth
+    def _fmbd():
+        down = rmat - 1
+        up = n - rmat
+        return ((np.sum(up * down, axis=1) / p) + n - 1) / comb(n, 2)
+
+    if method == 'BD2':
+        depth = _fbd2()
+    elif method == 'MBD':
+        depth = _fmbd()
+    else:
+        raise ValueError("Unknown input value for parameter `method`.")
+
+    return depth
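A minimal sketch combining the restored banddepth and rainbowplot on a synthetic curve set (illustrative only):

    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.graphics.functional import banddepth, rainbowplot

    rng = np.random.default_rng(5)
    xdata = np.linspace(0, 1, 50)
    data = np.sin(2 * np.pi * xdata) + rng.normal(scale=0.3, size=(40, 50))

    depth = banddepth(data, method='MBD')      # modified band depth per curve
    fig = rainbowplot(data, xdata=xdata, depth=depth)
    plt.show()
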
diff --git a/statsmodels/graphics/gofplots.py b/statsmodels/graphics/gofplots.py
index e60b313dc..54a915893 100644
--- a/statsmodels/graphics/gofplots.py
+++ b/statsmodels/graphics/gofplots.py
@@ -1,12 +1,16 @@
 from statsmodels.compat.python import lzip
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.distributions import ECDF
 from statsmodels.regression.linear_model import OLS
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.tools import add_constant
+
 from . import utils
-__all__ = ['qqplot', 'qqplot_2samples', 'qqline', 'ProbPlot']
+
+__all__ = ["qqplot", "qqplot_2samples", "qqline", "ProbPlot"]


 class ProbPlot:
@@ -162,26 +166,39 @@ class ProbPlot:
     .. plot:: plots/graphics_gofplots_qqplot.py
     """

-    def __init__(self, data, dist=stats.norm, fit=False, distargs=(), a=0,
-        loc=0, scale=1):
+    def __init__(
+        self,
+        data,
+        dist=stats.norm,
+        fit=False,
+        distargs=(),
+        a=0,
+        loc=0,
+        scale=1,
+    ):
+
         self.data = data
         self.a = a
         self.nobs = data.shape[0]
         self.distargs = distargs
         self.fit = fit
+
         self._is_frozen = isinstance(dist, stats.distributions.rv_frozen)
-        if self._is_frozen and (fit or loc != 0 or scale != 1 or distargs != ()
-            ):
+        if self._is_frozen and (
+            fit or loc != 0 or scale != 1 or distargs != ()
+        ):
             raise ValueError(
-                'Frozen distributions cannot be combined with fit, loc, scale or distargs.'
-                )
+                "Frozen distributions cannot be combined with fit, loc, scale"
+                " or distargs."
+            )
+        # properties
         self._cache = {}
         if self._is_frozen:
             self.dist = dist
             dist_gen = dist.dist
             shapes = dist_gen.shapes
             if shapes is not None:
-                shape_args = tuple(map(str.strip, shapes.split(',')))
+                shape_args = tuple(map(str.strip, shapes.split(",")))
             else:
                 shape_args = ()
             numargs = len(shape_args)
@@ -189,11 +206,11 @@ class ProbPlot:
             if len(args) >= numargs + 1:
                 self.loc = args[numargs]
             else:
-                self.loc = dist.kwds.get('loc', loc)
+                self.loc = dist.kwds.get("loc", loc)
             if len(args) >= numargs + 2:
                 self.scale = args[numargs + 1]
             else:
-                self.scale = dist.kwds.get('scale', scale)
+                self.scale = dist.kwds.get("scale", scale)
             fit_params = []
             for i, arg in enumerate(shape_args):
                 if arg in dist.kwds:
@@ -214,13 +231,15 @@ class ProbPlot:
             try:
                 self.dist = dist(*distargs, **dict(loc=loc, scale=scale))
             except Exception:
-                distargs = ', '.join([str(da) for da in distargs])
-                cmd = 'dist({distargs}, loc={loc}, scale={scale})'
+                distargs = ", ".join([str(da) for da in distargs])
+                cmd = "dist({distargs}, loc={loc}, scale={scale})"
                 cmd = cmd.format(distargs=distargs, loc=loc, scale=scale)
                 raise TypeError(
-                    """Initializing the distribution failed.  This can occur if distargs contains loc or scale. The distribution initialization command is:
-{cmd}"""
-                    .format(cmd=cmd))
+                    "Initializing the distribution failed.  This "
+                    "can occur if distargs contains loc or scale. "
+                    "The distribution initialization command "
+                    "is:\n{cmd}".format(cmd=cmd)
+                )
             self.loc = loc
             self.scale = scale
             self.fit_params = np.r_[distargs, loc, scale]
@@ -233,30 +252,57 @@ class ProbPlot:
     @cache_readonly
     def theoretical_percentiles(self):
         """Theoretical percentiles"""
-        pass
+        return plotting_pos(self.nobs, self.a)

     @cache_readonly
     def theoretical_quantiles(self):
         """Theoretical quantiles"""
-        pass
+        try:
+            return self.dist.ppf(self.theoretical_percentiles)
+        except TypeError:
+            msg = "%s requires more parameters to compute ppf".format(
+                self.dist.name,
+            )
+            raise TypeError(msg)
+        except Exception as exc:
+            msg = "failed to compute the ppf of {0}".format(self.dist.name)
+            raise type(exc)(msg)

     @cache_readonly
     def sorted_data(self):
         """sorted data"""
-        pass
+        sorted_data = np.array(self.data, copy=True)
+        sorted_data.sort()
+        return sorted_data

     @cache_readonly
     def sample_quantiles(self):
         """sample quantiles"""
-        pass
+        if self.fit and self.loc != 0 and self.scale != 1:
+            return (self.sorted_data - self.loc) / self.scale
+        else:
+            return self.sorted_data

     @cache_readonly
     def sample_percentiles(self):
         """Sample percentiles"""
-        pass
-
-    def ppplot(self, xlabel=None, ylabel=None, line=None, other=None, ax=
-        None, **plotkwargs):
+        _check_for(self.dist, "cdf")
+        if self._is_frozen:
+            return self.dist.cdf(self.sorted_data)
+        quantiles = (self.sorted_data - self.fit_params[-2]) / self.fit_params[
+            -1
+        ]
+        return self.dist.cdf(quantiles)
+
+    def ppplot(
+        self,
+        xlabel=None,
+        ylabel=None,
+        line=None,
+        other=None,
+        ax=None,
+        **plotkwargs,
+    ):
         """
         Plot of the percentiles of x versus the percentiles of a distribution.

@@ -300,10 +346,55 @@ class ProbPlot:
             If `ax` is None, the created figure.  Otherwise the figure to which
             `ax` is connected.
         """
-        pass
+        if other is not None:
+            check_other = isinstance(other, ProbPlot)
+            if not check_other:
+                other = ProbPlot(other)
+
+            p_x = self.theoretical_percentiles
+            ecdf_x = ECDF(other.sample_quantiles)(self.sample_quantiles)
+
+            fig, ax = _do_plot(
+                p_x, ecdf_x, self.dist, ax=ax, line=line, **plotkwargs
+            )

-    def qqplot(self, xlabel=None, ylabel=None, line=None, other=None, ax=
-        None, swap: bool=False, **plotkwargs):
+            if xlabel is None:
+                xlabel = "Probabilities of 2nd Sample"
+            if ylabel is None:
+                ylabel = "Probabilities of 1st Sample"
+
+        else:
+            fig, ax = _do_plot(
+                self.theoretical_percentiles,
+                self.sample_percentiles,
+                self.dist,
+                ax=ax,
+                line=line,
+                **plotkwargs,
+            )
+            if xlabel is None:
+                xlabel = "Theoretical Probabilities"
+            if ylabel is None:
+                ylabel = "Sample Probabilities"
+
+        ax.set_xlabel(xlabel)
+        ax.set_ylabel(ylabel)
+
+        ax.set_xlim([0.0, 1.0])
+        ax.set_ylim([0.0, 1.0])
+
+        return fig
+
+    def qqplot(
+        self,
+        xlabel=None,
+        ylabel=None,
+        line=None,
+        other=None,
+        ax=None,
+        swap: bool = False,
+        **plotkwargs,
+    ):
         """
         Plot of the quantiles of x versus the quantiles/ppf of a distribution.

@@ -353,10 +444,63 @@ class ProbPlot:
             If `ax` is None, the created figure.  Otherwise the figure to which
             `ax` is connected.
         """
-        pass
+        if other is not None:
+            check_other = isinstance(other, ProbPlot)
+            if not check_other:
+                other = ProbPlot(other)
+
+            s_self = self.sample_quantiles
+            s_other = other.sample_quantiles
+
+            if len(s_self) > len(s_other):
+                raise ValueError(
+                    "Sample size of `other` must be equal or "
+                    + "larger than this `ProbPlot` instance"
+                )
+            elif len(s_self) < len(s_other):
+                # Use quantiles of the smaller set and interpolate quantiles of
+                # the larger data set
+                p = plotting_pos(self.nobs, self.a)
+                s_other = stats.mstats.mquantiles(s_other, p)
+            fig, ax = _do_plot(
+                s_other, s_self, self.dist, ax=ax, line=line, **plotkwargs
+            )
+
+            if xlabel is None:
+                xlabel = "Quantiles of 2nd Sample"
+            if ylabel is None:
+                ylabel = "Quantiles of 1st Sample"
+            if swap:
+                xlabel, ylabel = ylabel, xlabel

-    def probplot(self, xlabel=None, ylabel=None, line=None, exceed=False,
-        ax=None, **plotkwargs):
+        else:
+            fig, ax = _do_plot(
+                self.theoretical_quantiles,
+                self.sample_quantiles,
+                self.dist,
+                ax=ax,
+                line=line,
+                **plotkwargs,
+            )
+            if xlabel is None:
+                xlabel = "Theoretical Quantiles"
+            if ylabel is None:
+                ylabel = "Sample Quantiles"
+
+        ax.set_xlabel(xlabel)
+        ax.set_ylabel(ylabel)
+
+        return fig
+
+    def probplot(
+        self,
+        xlabel=None,
+        ylabel=None,
+        line=None,
+        exceed=False,
+        ax=None,
+        **plotkwargs,
+    ):
         """
         Plot of unscaled quantiles of x against the prob of a distribution.

@@ -400,11 +544,52 @@ class ProbPlot:
             If `ax` is None, the created figure.  Otherwise the figure to which
             `ax` is connected.
         """
-        pass
-
+        if exceed:
+            fig, ax = _do_plot(
+                self.theoretical_quantiles[::-1],
+                self.sorted_data,
+                self.dist,
+                ax=ax,
+                line=line,
+                **plotkwargs,
+            )
+            if xlabel is None:
+                xlabel = "Probability of Exceedance (%)"

-def qqplot(data, dist=stats.norm, distargs=(), a=0, loc=0, scale=1, fit=
-    False, line=None, ax=None, **plotkwargs):
+        else:
+            fig, ax = _do_plot(
+                self.theoretical_quantiles,
+                self.sorted_data,
+                self.dist,
+                ax=ax,
+                line=line,
+                **plotkwargs,
+            )
+            if xlabel is None:
+                xlabel = "Non-exceedance Probability (%)"
+
+        if ylabel is None:
+            ylabel = "Sample Quantiles"
+
+        ax.set_xlabel(xlabel)
+        ax.set_ylabel(ylabel)
+        _fmt_probplot_axis(ax, self.dist, self.nobs)
+
+        return fig
+
+
+def qqplot(
+    data,
+    dist=stats.norm,
+    distargs=(),
+    a=0,
+    loc=0,
+    scale=1,
+    fit=False,
+    line=None,
+    ax=None,
+    **plotkwargs,
+):
     """
     Q-Q plot of the quantiles of x versus the quantiles/ppf of a distribution.

@@ -500,11 +685,16 @@ def qqplot(data, dist=stats.norm, distargs=(), a=0, loc=0, scale=1, fit=

     .. plot:: plots/graphics_gofplots_qqplot.py
     """
-    pass
+    probplot = ProbPlot(
+        data, dist=dist, distargs=distargs, fit=fit, a=a, loc=loc, scale=scale
+    )
+    fig = probplot.qqplot(ax=ax, line=line, **plotkwargs)
+    return fig


-def qqplot_2samples(data1, data2, xlabel=None, ylabel=None, line=None, ax=None
-    ):
+def qqplot_2samples(
+    data1, data2, xlabel=None, ylabel=None, line=None, ax=None
+):
     """
     Q-Q Plot of two samples' quantiles.

@@ -579,10 +769,29 @@ def qqplot_2samples(data1, data2, xlabel=None, ylabel=None, line=None, ax=None
     >>> fig = qqplot_2samples(pp_x, pp_y, xlabel=None, ylabel=None,
     ...                       line=None, ax=None)
     """
-    pass
-
-
-def qqline(ax, line, x=None, y=None, dist=None, fmt='r-', **lineoptions):
+    if not isinstance(data1, ProbPlot):
+        data1 = ProbPlot(data1)
+
+    if not isinstance(data2, ProbPlot):
+        data2 = ProbPlot(data2)
+    if data2.data.shape[0] > data1.data.shape[0]:
+        fig = data1.qqplot(
+            xlabel=ylabel, ylabel=xlabel, line=line, other=data2, ax=ax
+        )
+    else:
+        fig = data2.qqplot(
+            xlabel=ylabel,
+            ylabel=xlabel,
+            line=line,
+            other=data1,
+            ax=ax,
+            swap=True,
+        )
+
+    return fig
+
+
+def qqline(ax, line, x=None, y=None, dist=None, fmt="r-", **lineoptions):
     """
     Plot a reference line for a qqplot.

@@ -639,9 +848,78 @@ def qqline(ax, line, x=None, y=None, dist=None, fmt='r-', **lineoptions):

     .. plot:: plots/graphics_gofplots_qqplot_qqline.py
     """
-    pass
-
-
+    lineoptions = lineoptions.copy()
+    for ls in ("-", "--", "-.", ":"):
+        if ls in fmt:
+            lineoptions.setdefault("linestyle", ls)
+            fmt = fmt.replace(ls, "")
+            break
+    for marker in (
+        ".",
+        ",",
+        "o",
+        "v",
+        "^",
+        "<",
+        ">",
+        "1",
+        "2",
+        "3",
+        "4",
+        "8",
+        "s",
+        "p",
+        "P",
+        "*",
+        "h",
+        "H",
+        "+",
+        "x",
+        "X",
+        "D",
+        "d",
+        "|",
+        "_",
+    ):
+        if marker in fmt:
+            lineoptions.setdefault("marker", marker)
+            fmt = fmt.replace(marker, "")
+            break
+    if fmt:
+        lineoptions.setdefault("color", fmt)
+
+    if line == "45":
+        end_pts = lzip(ax.get_xlim(), ax.get_ylim())
+        end_pts[0] = min(end_pts[0])
+        end_pts[1] = max(end_pts[1])
+        ax.plot(end_pts, end_pts, **lineoptions)
+        ax.set_xlim(end_pts)
+        ax.set_ylim(end_pts)
+        return  # does this have any side effects?
+    if x is None or y is None:
+        raise ValueError("If line is not 45, x and y cannot be None.")
+    x = np.array(x)
+    y = np.array(y)
+    if line == "r":
+        # could use ax.lines[0].get_xdata(), get_ydata(),
+        # but don't know axes are "clean"
+        y = OLS(y, add_constant(x)).fit().fittedvalues
+        ax.plot(x, y, **lineoptions)
+    elif line == "s":
+        m, b = np.std(y), np.mean(y)
+        ref_line = x * m + b
+        ax.plot(x, ref_line, **lineoptions)
+    elif line == "q":
+        _check_for(dist, "ppf")
+        q25 = stats.scoreatpercentile(y, 25)
+        q75 = stats.scoreatpercentile(y, 75)
+        theoretical_quartiles = dist.ppf([0.25, 0.75])
+        m = (q75 - q25) / np.diff(theoretical_quartiles)
+        b = q25 - m * theoretical_quartiles[0]
+        ax.plot(x, m * x + b, **lineoptions)
+
+
+# about 10x faster than plotting_position in sandbox and mstats
 def plotting_pos(nobs, a=0.0, b=None):
     """
     Generates sequence of plotting positions
@@ -672,7 +950,8 @@ def plotting_pos(nobs, a=0.0, b=None):
     scipy.stats.mstats.plotting_positions
         Additional information on alpha and beta
     """
-    pass
+    b = a if b is None else b
+    return (np.arange(1.0, nobs + 1) - a) / (nobs + 1 - a - b)


 def _fmt_probplot_axis(ax, dist, nobs):
@@ -694,11 +973,30 @@ def _fmt_probplot_axis(ax, dist, nobs):
     -------
     There is no return value. This operates on `ax` in place
     """
-    pass
-
-
-def _do_plot(x, y, dist=None, line=None, ax=None, fmt='b', step=False, **kwargs
-    ):
+    _check_for(dist, "ppf")
+    axis_probs = np.linspace(10, 90, 9, dtype=float)
+    small = np.array([1.0, 2, 5])
+    axis_probs = np.r_[small, axis_probs, 100 - small[::-1]]
+    if nobs >= 50:
+        axis_probs = np.r_[small / 10, axis_probs, 100 - small[::-1] / 10]
+    if nobs >= 500:
+        axis_probs = np.r_[small / 100, axis_probs, 100 - small[::-1] / 100]
+    axis_probs /= 100.0
+    axis_qntls = dist.ppf(axis_probs)
+    ax.set_xticks(axis_qntls)
+    ax.set_xticklabels(
+        [str(lbl) for lbl in (axis_probs * 100)],
+        rotation=45,
+        rotation_mode="anchor",
+        horizontalalignment="right",
+        verticalalignment="center",
+    )
+    ax.set_xlim([axis_qntls.min(), axis_qntls.max()])
+
+
+def _do_plot(
+    x, y, dist=None, line=None, ax=None, fmt="b", step=False, **kwargs
+):
     """
     Boiler plate plotting function for the `ppplot`, `qqplot`, and
     `probplot` methods of the `ProbPlot` class
@@ -728,4 +1026,33 @@ def _do_plot(x, y, dist=None, line=None, ax=None, fmt='b', step=False, **kwargs
     ax : AxesSubplot
         The original axes if provided.  Otherwise a new instance.
     """
-    pass
+    plot_style = {
+        "marker": "o",
+        "markerfacecolor": "C0",
+        "markeredgecolor": "C0",
+        "linestyle": "none",
+    }
+
+    plot_style.update(**kwargs)
+    where = plot_style.pop("where", "pre")
+
+    fig, ax = utils.create_mpl_ax(ax)
+    ax.set_xmargin(0.02)
+
+    if step:
+        ax.step(x, y, fmt, where=where, **plot_style)
+    else:
+        ax.plot(x, y, fmt, **plot_style)
+    if line:
+        if line not in ["r", "q", "45", "s"]:
+            msg = "%s option for line not understood" % line
+            raise ValueError(msg)
+
+        qqline(ax, line, x=x, y=y, dist=dist)
+
+    return fig, ax
+
+
+def _check_for(dist, attr="ppf"):
+    if not hasattr(dist, attr):
+        raise AttributeError(f"distribution must have a {attr} method")
diff --git a/statsmodels/graphics/mosaicplot.py b/statsmodels/graphics/mosaicplot.py
index a150237ab..cc7f1610b 100644
--- a/statsmodels/graphics/mosaicplot.py
+++ b/statsmodels/graphics/mosaicplot.py
@@ -5,13 +5,19 @@ and informative way.

 see the docstring of the mosaic function for more informations.
 """
+# Author: Enrico Giampieri - 21 Jan 2013
+
 from statsmodels.compat.python import lrange, lzip
+
 from itertools import product
+
 import numpy as np
 from numpy import array, cumsum, iterable, r_
 from pandas import DataFrame
+
 from statsmodels.graphics import utils
-__all__ = ['mosaic']
+
+__all__ = ["mosaic"]


 def _normalize_split(proportion):
@@ -19,7 +25,29 @@ def _normalize_split(proportion):
     return a list of proportions of the available space given the division
     if only a number is given, it will assume a split in two pieces
     """
-    pass
+    if not iterable(proportion):
+        if proportion == 0:
+            proportion = array([0.0, 1.0])
+        elif proportion >= 1:
+            proportion = array([1.0, 0.0])
+        elif proportion < 0:
+            raise ValueError("proportions should be positive,"
+                              "given value: {}".format(proportion))
+        else:
+            proportion = array([proportion, 1.0 - proportion])
+    proportion = np.asarray(proportion, dtype=float)
+    if np.any(proportion < 0):
+        raise ValueError("proportions should be positive,"
+                          "given value: {}".format(proportion))
+    if np.allclose(proportion, 0):
+        raise ValueError("at least one proportion should be "
+                          "greater than zero".format(proportion))
+    # ok, data are meaningful, so go on
+    if len(proportion) < 2:
+        return array([0.0, 1.0])
+    left = r_[0, cumsum(proportion)]
+    left /= left[-1] * 1.0
+    return left


 def _split_rect(x, y, width, height, proportion, horizontal=True, gap=0.05):
@@ -30,7 +58,35 @@ def _split_rect(x, y, width, height, proportion, horizontal=True, gap=0.05):
     a gap of 1 correspond to a plot that is half void and the remaining half
     space is proportionally divided among the pieces.
     """
-    pass
+    x, y, w, h = float(x), float(y), float(width), float(height)
+    if (w < 0) or (h < 0):
+        raise ValueError("dimension of the square less than"
+                          "zero w={} h=()".format(w, h))
+    proportions = _normalize_split(proportion)
+
+    # extract the starting point and the dimension of each subdivision
+    # in respect to the unit square
+    starting = proportions[:-1]
+    amplitude = proportions[1:] - starting
+
+    # how much each extrema is going to be displaced due to gaps
+    starting += gap * np.arange(len(proportions) - 1)
+
+    # how much the squares plus the gaps are extended
+    extension = starting[-1] + amplitude[-1] - starting[0]
+
+    # normalize everything for fit again in the original dimension
+    starting /= extension
+    amplitude /= extension
+
+    # bring everything to the original square
+    starting = (x if horizontal else y) + starting * (w if horizontal else h)
+    amplitude = amplitude * (w if horizontal else h)
+
+    # create each 4-tuple for each new block
+    results = [(s, y, a, h) if horizontal else (x, s, w, a)
+                for s, a in zip(starting, amplitude)]
+    return results


 def _reduce_dict(count_dict, partial_key):
@@ -38,7 +94,9 @@ def _reduce_dict(count_dict, partial_key):
     Make partial sum on a counter dict.
     Given a match for the beginning of the category, it will sum each value.
     """
-    pass
+    L = len(partial_key)
+    count = sum(v for k, v in count_dict.items() if k[:L] == partial_key)
+    return count


 def _key_splitting(rect_dict, keys, values, key_subset, horizontal, gap):
@@ -48,14 +106,28 @@ def _key_splitting(rect_dict, keys, values, key_subset, horizontal, gap):
     as long as the key start with the tuple key_subset.  The other keys are
     returned without modification.
     """
-    pass
+    result = {}
+    L = len(key_subset)
+    for name, (x, y, w, h) in rect_dict.items():
+        if key_subset == name[:L]:
+            # split base on the values given
+            divisions = _split_rect(x, y, w, h, values, horizontal, gap)
+            for key, rect in zip(keys, divisions):
+                result[name + (key,)] = rect
+        else:
+            result[name] = (x, y, w, h)
+    return result


 def _tuplify(obj):
     """convert an object in a tuple of strings (even if it is not iterable,
     like a single integer number, but keep the string healthy)
     """
-    pass
+    if np.iterable(obj) and not isinstance(obj, str):
+        res = tuple(str(o) for o in obj)
+    else:
+        res = (str(obj),)
+    return res


 def _categories_level(keys):
@@ -63,7 +135,11 @@ def _categories_level(keys):
     return each level of each category
     [[key_1_level_1,key_2_level_1],[key_1_level_2,key_2_level_2]]
     """
-    pass
+    res = []
+    for i in zip(*(keys)):
+        tuplefied = _tuplify(i)
+        res.append(list(dict([(j, None) for j in tuplefied])))
+    return res


 def _hierarchical_split(count_dict, horizontal=True, gap=0.05):
@@ -106,12 +182,46 @@ def _hierarchical_split(count_dict, horizontal=True, gap=0.05):
             2 - width of the rectangle
             3 - height of the rectangle
     """
-    pass
+    # this is the unit square that we are going to divide
+    base_rect = dict([(tuple(), (0, 0, 1, 1))])
+    # get the list of each possible value for each level
+    categories_levels = _categories_level(list(count_dict.keys()))
+    L = len(categories_levels)
+
+    # recreate the gaps vector starting from an int
+    if not np.iterable(gap):
+        gap = [gap / 1.5 ** idx for idx in range(L)]
+    # extend if it's too short
+    if len(gap) < L:
+        last = gap[-1]
+        gap = list(gap) + [last / 1.5 ** idx for idx in range(L)]
+    # trim if it's too long
+    gap = gap[:L]
+    # put the count dictionary in order for the keys
+    # this will allow some code simplification
+    count_ordered = dict([(k, count_dict[k])
+                        for k in list(product(*categories_levels))])
+    for cat_idx, cat_enum in enumerate(categories_levels):
+        # get the partial key up to the actual level
+        base_keys = list(product(*categories_levels[:cat_idx]))
+        for key in base_keys:
+            # for each partial and each value calculate how many
+            # observations we have in the counting dictionary
+            part_count = [_reduce_dict(count_ordered, key + (partial,))
+                            for partial in cat_enum]
+            # reduce the gap for subsequent levels
+            new_gap = gap[cat_idx]
+            # split the given subkeys in the rectangle dictionary
+            base_rect = _key_splitting(base_rect, cat_enum, part_count, key,
+                                       horizontal, new_gap)
+        horizontal = not horizontal
+    return base_rect


 def _single_hsv_to_rgb(hsv):
     """Transform a color from the hsv space to the rgb."""
-    pass
+    from matplotlib.colors import hsv_to_rgb
+    return hsv_to_rgb(array(hsv).reshape(1, 1, 3)).reshape(3)


 def _create_default_properties(data):
@@ -122,7 +232,43 @@ def _create_default_properties(data):
     decoration on the rectangle.  Does not manage more than four
     level of categories
     """
-    pass
+    categories_levels = _categories_level(list(data.keys()))
+    Nlevels = len(categories_levels)
+    # first level, the hue
+    L = len(categories_levels[0])
+    # hue = np.linspace(1.0, 0.0, L+1)[:-1]
+    hue = np.linspace(0.0, 1.0, L + 2)[:-2]
+    # second level, the saturation
+    L = len(categories_levels[1]) if Nlevels > 1 else 1
+    saturation = np.linspace(0.5, 1.0, L + 1)[:-1]
+    # third level, the value
+    L = len(categories_levels[2]) if Nlevels > 2 else 1
+    value = np.linspace(0.5, 1.0, L + 1)[:-1]
+    # fourth level, the hatch
+    L = len(categories_levels[3]) if Nlevels > 3 else 1
+    hatch = ['', '/', '-', '|', '+'][:L + 1]
+    # convert in list and merge with the levels
+    hue = lzip(list(hue), categories_levels[0])
+    saturation = lzip(list(saturation),
+                     categories_levels[1] if Nlevels > 1 else [''])
+    value = lzip(list(value),
+                     categories_levels[2] if Nlevels > 2 else [''])
+    hatch = lzip(list(hatch),
+                     categories_levels[3] if Nlevels > 3 else [''])
+    # create the properties dictionary
+    properties = {}
+    for h, s, v, t in product(hue, saturation, value, hatch):
+        hv, hn = h
+        sv, sn = s
+        vv, vn = v
+        tv, tn = t
+        level = (hn,) + ((sn,) if sn else tuple())
+        level = level + ((vn,) if vn else tuple())
+        level = level + ((tn,) if tn else tuple())
+        hsv = array([hv, sv, vv])
+        prop = {'color': _single_hsv_to_rgb(hsv), 'hatch': tv, 'lw': 0}
+        properties[level] = prop
+    return properties


 def _normalize_data(data, index):
@@ -135,21 +281,106 @@ def _normalize_data(data, index):
         3 - everything that can be converted to a numpy array
         4 - pandas.DataFrame (via the _normalize_dataframe function)
     """
-    pass
+    # if data is a dataframe we need to take a completely new road
+    # before coming back here. Use the hasattr to avoid importing
+    # pandas explicitly
+    if hasattr(data, 'pivot') and hasattr(data, 'groupby'):
+        data = _normalize_dataframe(data, index)
+        index = None
+    # can it be used as a dictionary?
+    try:
+        items = list(data.items())
+    except AttributeError:
+        # ok, I cannot use the data as a dictionary
+        # Try to convert it to a numpy array, or die trying
+        data = np.asarray(data)
+        temp = {}
+        for idx in np.ndindex(data.shape):
+            name = tuple(i for i in idx)
+            temp[name] = data[idx]
+        data = temp
+        items = list(data.items())
+    # make all the keys a tuple, even if simple numbers
+    data = dict([_tuplify(k), v] for k, v in items)
+    categories_levels = _categories_level(list(data.keys()))
+    # fill the void in the counting dictionary
+    indexes = product(*categories_levels)
+    contingency = dict([(k, data.get(k, 0)) for k in indexes])
+    data = contingency
+    # reorder the keys order according to the one specified by the user
+    # or if the index is None convert it into a simple list
+    # right now it does not do any check, but can be modified in the future
+    index = lrange(len(categories_levels)) if index is None else index
+    contingency = {}
+    for key, value in data.items():
+        new_key = tuple(key[i] for i in index)
+        contingency[new_key] = value
+    data = contingency
+    return data


 def _normalize_dataframe(dataframe, index):
     """Take a pandas DataFrame and count the element present in the
     given columns, return a hierarchical index on those columns
     """
-    pass
+    # groupby the given keys, extract the same columns and count the elements,
+    # then collapse them with a mean
+    data = dataframe[index].dropna()
+    grouped = data.groupby(index, sort=False, observed=False)
+    counted = grouped[index].count()
+    averaged = counted.mean(axis=1)
+    # Fill empty missing with 0, see GH5639
+    averaged = averaged.fillna(0.0)
+    return averaged


 def _statistical_coloring(data):
     """evaluate colors from the indipendence properties of the matrix
     It will encounter problem if one category has all zeros
     """
-    pass
+    data = _normalize_data(data, None)
+    categories_levels = _categories_level(list(data.keys()))
+    Nlevels = len(categories_levels)
+    total = 1.0 * sum(v for v in data.values())
+    # count the proportion of observation
+    # for each level that has the given name
+    # at each level
+    levels_count = []
+    for level_idx in range(Nlevels):
+        proportion = {}
+        for level in categories_levels[level_idx]:
+            proportion[level] = 0.0
+            for key, value in data.items():
+                if level == key[level_idx]:
+                    proportion[level] += value
+            proportion[level] /= total
+        levels_count.append(proportion)
+    # for each key I obtain the expected value
+    # and its standard deviation from a binomial distribution
+    # under the hypothesis of independence
+    expected = {}
+    for key, value in data.items():
+        base = 1.0
+        for i, k in enumerate(key):
+            base *= levels_count[i][k]
+        expected[key] = base * total, np.sqrt(total * base * (1.0 - base))
+    # now we have the standard deviation of distance from the
+    # expected value for each tile. We create the colors from this
+    sigmas = dict((k, (data[k] - m) / s) for k, (m, s) in expected.items())
+    props = {}
+    for key, dev in sigmas.items():
+        red = 0.0 if dev < 0 else (dev / (1 + dev))
+        blue = 0.0 if dev > 0 else (dev / (-1 + dev))
+        green = (1.0 - red - blue) / 2.0
+        hatch = 'x' if dev > 2 else 'o' if dev < -2 else ''
+        props[key] = {'color': [red, green, blue], 'hatch': hatch}
+    return props
+
+
+def _get_position(x, w, h, W):
+    if W == 0:
+        return x
+    return (x + w / 2.0) * w * h / W


 def _create_labels(rects, horizontal, ax, rotation):
@@ -160,12 +391,85 @@ def _create_labels(rects, horizontal, ax, rotation):
     ax: the axis on which the label should be applied
     rotation: the rotation list for each side
     """
-    pass
+    categories = _categories_level(list(rects.keys()))
+    if len(categories) > 4:
+        msg = ("maximum of 4 level supported for axes labeling... and 4"
+               "is already a lot of levels, are you sure you need them all?")
+        raise ValueError(msg)
+    labels = {}
+    # keep it fixed as it will be used a lot of times
+    items = list(rects.items())
+    vertical = not horizontal
+
+    #get the axis ticks and labels locator to put the correct values!
+    ax2 = ax.twinx()
+    ax3 = ax.twiny()
+    #this is the order of execution for horizontal disposition
+    ticks_pos = [ax.set_xticks, ax.set_yticks, ax3.set_xticks, ax2.set_yticks]
+    ticks_lab = [ax.set_xticklabels, ax.set_yticklabels,
+                 ax3.set_xticklabels, ax2.set_yticklabels]
+    #for the vertical one, rotate it by one
+    if vertical:
+        ticks_pos = ticks_pos[1:] + ticks_pos[:1]
+        ticks_lab = ticks_lab[1:] + ticks_lab[:1]
+    #clean them
+    for pos, lab in zip(ticks_pos, ticks_lab):
+        pos([])
+        lab([])
+    # for each level, for each value in the level, take the mean of all
+    # the sublevels that correspond to that partial key
+    for level_idx, level in enumerate(categories):
+        #this dictionary keep the labels only for this level
+        level_ticks = dict()
+        for value in level:
+            # to which level should it refer to get the preceding
+            # values of labels? it's rather a tricky question...
+            # this depends on the side; it's a rather crude approach,
+            # but I couldn't think of a more general way...
+            if horizontal:
+                if level_idx == 3:
+                    index_select = [-1, -1, -1]
+                else:
+                    index_select = [+0, -1, -1]
+            else:
+                if level_idx == 3:
+                    index_select = [+0, -1, +0]
+                else:
+                    index_select = [-1, -1, -1]
+            #now I create the base key name and append the current value
+            #It will search on all the rects to find the corresponding one
+            #and use them to evaluate the mean position
+            basekey = tuple(categories[i][index_select[i]]
+                            for i in range(level_idx))
+            basekey = basekey + (value,)
+            subset = dict((k, v) for k, v in items
+                          if basekey == k[:level_idx + 1])
+            # now I extract the center of all the tiles and take a mean
+            # of these centers weighted by the area of each tile
+            # this should give me the (more or less) correct position
+            # of the center of the category
+
+            vals = list(subset.values())
+            W = sum(w * h for (x, y, w, h) in vals)
+            x_lab = sum(_get_position(x, w, h, W) for (x, y, w, h) in vals)
+            y_lab = sum(_get_position(y, h, w, W) for (x, y, w, h) in vals)
+            # now, based on the ordering, select which position to keep
+            # needs to be written in a more general form, or are 4 levels enough?
+            #should give also the horizontal and vertical alignment
+            side = (level_idx + vertical) % 4
+            level_ticks[value] = y_lab if side % 2 else x_lab
+        #now we add the labels of this level to the correct axis
+
+        ticks_pos[level_idx](list(level_ticks.values()))
+        ticks_lab[level_idx](list(level_ticks.keys()),
+                             rotation=rotation[level_idx])
+    return labels


 def mosaic(data, index=None, ax=None, horizontal=True, gap=0.005,
-    properties=lambda key: None, labelizer=None, title='', statistic=False,
-    axes_label=True, label_rotation=0.0):
+           properties=lambda key: None, labelizer=None,
+           title='', statistic=False, axes_label=True,
+           label_rotation=0.0):
     """Create a mosaic plot from a contingency table.

     It allows to visualize multivariate categorical data in a rigorous
@@ -313,4 +617,51 @@ def mosaic(data, index=None, ax=None, horizontal=True, gap=0.005,

     .. plot :: plots/graphics_mosaicplot_mosaic.py
     """
-    pass
+    if isinstance(data, DataFrame) and index is None:
+        raise ValueError("You must pass an index if data is a DataFrame."
+                         " See examples.")
+
+    from matplotlib.patches import Rectangle
+
+    #from pylab import Rectangle
+    fig, ax = utils.create_mpl_ax(ax)
+    # normalize the data to a dict with tuple of strings as keys
+    data = _normalize_data(data, index)
+    # split the graph into different areas
+    rects = _hierarchical_split(data, horizontal=horizontal, gap=gap)
+    # if there is no specified way to create the labels
+    # create a default one
+    if labelizer is None:
+        labelizer = lambda k: "\n".join(k)
+    if statistic:
+        default_props = _statistical_coloring(data)
+    else:
+        default_props = _create_default_properties(data)
+    if isinstance(properties, dict):
+        color_dict = properties
+        properties = lambda key: color_dict.get(key, None)
+    for k, v in rects.items():
+        # create each rectangle and put a label on it
+        x, y, w, h = v
+        conf = properties(k)
+        props = conf if conf else default_props[k]
+        text = labelizer(k)
+        Rect = Rectangle((x, y), w, h, label=text, **props)
+        ax.add_patch(Rect)
+        ax.text(x + w / 2, y + h / 2, text, ha='center',
+                 va='center', size='smaller')
+    # creating the labels on the axis
+    # or clearing them
+    if axes_label:
+        if np.iterable(label_rotation):
+            rotation = label_rotation
+        else:
+            rotation = [label_rotation] * 4
+        labels = _create_labels(rects, horizontal, ax, rotation)
+    else:
+        ax.set_xticks([])
+        ax.set_xticklabels([])
+        ax.set_yticks([])
+        ax.set_yticklabels([])
+    ax.set_title(title)
+    return fig, rects
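A short, illustrative call of the mosaic() function implemented above (a sketch, not part of the patch; the category names and counts are invented):

    from statsmodels.graphics.mosaicplot import mosaic

    # contingency table as a dict: keys are category tuples, values are counts
    data = {('a', 'x'): 10, ('a', 'y'): 5, ('b', 'x'): 7, ('b', 'y'): 11}
    fig, rects = mosaic(data, gap=0.01, title='toy contingency table')

    # with a DataFrame, the grouping columns must be passed via `index`,
    # e.g. mosaic(df, index=['gender', 'smoker'])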
diff --git a/statsmodels/graphics/plot_grids.py b/statsmodels/graphics/plot_grids.py
index 4767e3dc2..e8fdcd55a 100644
--- a/statsmodels/graphics/plot_grids.py
+++ b/statsmodels/graphics/plot_grids.py
@@ -1,4 +1,4 @@
-"""create scatterplot with confidence ellipsis
+'''create scatterplot with confidence ellipsis

 Author: Josef Perktold
 License: BSD-3
@@ -8,20 +8,38 @@ TODO: update script to use sharex, sharey, and visible=False
     for sharex I need to have the ax of the last_row when editing the earlier
     rows. Or you axes_grid1, imagegrid
     http://matplotlib.sourceforge.net/mpl_toolkits/axes_grid/users/overview.html
-"""
+'''
+
+
 import numpy as np
 from scipy import stats
+
 from . import utils
+
 __all__ = ['scatter_ellipse']


 def _make_ellipse(mean, cov, ax, level=0.95, color=None):
     """Support function for scatter_ellipse."""
-    pass
+    from matplotlib.patches import Ellipse
+
+    v, w = np.linalg.eigh(cov)
+    u = w[0] / np.linalg.norm(w[0])
+    angle = np.arctan(u[1]/u[0])
+    angle = 180 * angle / np.pi # convert to degrees
+    v = 2 * np.sqrt(v * stats.chi2.ppf(level, 2)) #get size corresponding to level
+    ell = Ellipse(mean[:2], v[0], v[1], 180 + angle, facecolor='none',
+                  edgecolor=color,
+                  #ls='dashed',  #for debugging
+                  lw=1.5)
+    ell.set_clip_box(ax.bbox)
+    ell.set_alpha(0.5)
+    ax.add_artist(ell)


 def scatter_ellipse(data, level=0.9, varnames=None, ell_kwds=None,
-    plot_kwds=None, add_titles=False, keep_ticks=False, fig=None):
+                    plot_kwds=None, add_titles=False, keep_ticks=False,
+                    fig=None):
     """Create a grid of scatter plots with confidence ellipses.

     ell_kwds, plot_kdes not used yet
@@ -69,4 +87,91 @@ def scatter_ellipse(data, level=0.9, varnames=None, ell_kwds=None,

     .. plot:: plots/graphics_plot_grids_scatter_ellipse.py
     """
-    pass
+    fig = utils.create_mpl_fig(fig)
+    import matplotlib.ticker as mticker
+
+    data = np.asanyarray(data)  #needs mean and cov
+    nvars = data.shape[1]
+    if varnames is None:
+        #assuming single digit, nvars<=10  else use 'var%2d'
+        varnames = ['var%d' % i for i in range(nvars)]
+
+    plot_kwds_ = dict(ls='none', marker='.', color='k', alpha=0.5)
+    if plot_kwds:
+        plot_kwds_.update(plot_kwds)
+
+    ell_kwds_ = dict(color='k')
+    if ell_kwds:
+        ell_kwds_.update(ell_kwds)
+
+    dmean = data.mean(0)
+    dcov = np.cov(data, rowvar=0)
+
+    for i in range(1, nvars):
+        #print '---'
+        ax_last=None
+        for j in range(i):
+            #print i,j, i*(nvars-1)+j+1
+            ax = fig.add_subplot(nvars-1, nvars-1, (i-1)*(nvars-1)+j+1)
+##                                 #sharey=ax_last) #sharey does not allow empty ticks?
+##            if j == 0:
+##                print 'new ax_last', j
+##                ax_last = ax
+##                ax.set_ylabel(varnames[i])
+            #TODO: make sure we have same xlim and ylim
+
+            formatter = mticker.FormatStrFormatter('% 3.1f')
+            ax.yaxis.set_major_formatter(formatter)
+            ax.xaxis.set_major_formatter(formatter)
+
+            idx = np.array([j,i])
+            ax.plot(*data[:,idx].T, **plot_kwds_)
+
+            if np.isscalar(level):
+                level = [level]
+            for alpha in level:
+                _make_ellipse(dmean[idx], dcov[idx[:,None], idx], ax, level=alpha,
+                         **ell_kwds_)
+
+            if add_titles:
+                ax.set_title('%s-%s' % (varnames[i], varnames[j]))
+            if not ax.is_first_col():
+                if not keep_ticks:
+                    ax.set_yticks([])
+                else:
+                    ax.yaxis.set_major_locator(mticker.MaxNLocator(3))
+            else:
+                ax.set_ylabel(varnames[i])
+            if ax.is_last_row():
+                ax.set_xlabel(varnames[j])
+            else:
+                if not keep_ticks:
+                    ax.set_xticks([])
+                else:
+                    ax.xaxis.set_major_locator(mticker.MaxNLocator(3))
+
+            dcorr = np.corrcoef(data, rowvar=0)
+            dc = dcorr[idx[:,None], idx]
+            xlim = ax.get_xlim()
+            ylim = ax.get_ylim()
+##            xt = xlim[0] + 0.1 * (xlim[1] - xlim[0])
+##            yt = ylim[0] + 0.1 * (ylim[1] - ylim[0])
+##            if dc[1,0] < 0 :
+##                yt = ylim[0] + 0.1 * (ylim[1] - ylim[0])
+##            else:
+##                yt = ylim[1] - 0.2 * (ylim[1] - ylim[0])
+            yrangeq = ylim[0] + 0.4 * (ylim[1] - ylim[0])
+            if dc[1,0] < -0.25 or (dc[1,0] < 0.25 and dmean[idx][1] > yrangeq):
+                yt = ylim[0] + 0.1 * (ylim[1] - ylim[0])
+            else:
+                yt = ylim[1] - 0.2 * (ylim[1] - ylim[0])
+            xt = xlim[0] + 0.1 * (xlim[1] - xlim[0])
+            ax.text(xt, yt, '$\\rho=%0.2f$'% dc[1,0])
+
+    for ax in fig.axes:
+        if ax.is_last_row(): # or ax.is_first_col():
+            ax.xaxis.set_major_locator(mticker.MaxNLocator(3))
+        if ax.is_first_col():
+            ax.yaxis.set_major_locator(mticker.MaxNLocator(3))
+
+    return fig
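A minimal sketch of calling scatter_ellipse as implemented above (not part of the patch; the data below are simulated for illustration):

    import numpy as np
    from statsmodels.graphics.plot_grids import scatter_ellipse

    rng = np.random.default_rng(0)
    cov = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
    data = rng.multivariate_normal([0, 1, 2], cov, size=200)

    # one ellipse per confidence level, per pair of variables
    fig = scatter_ellipse(data, level=[0.5, 0.9],
                          varnames=['x0', 'x1', 'x2'], add_titles=True)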
diff --git a/statsmodels/graphics/plottools.py b/statsmodels/graphics/plottools.py
index ddd3d2981..516b85606 100644
--- a/statsmodels/graphics/plottools.py
+++ b/statsmodels/graphics/plottools.py
@@ -20,4 +20,8 @@ def rainbow(n):
     Converts from HSV coordinates (0, 1, 1) to (1, 1, 1) to RGB. Based on
     the Sage function of the same name.
     """
-    pass
+    from matplotlib import colors
+    R = np.ones((1,n,3))
+    R[0,:,0] = np.linspace(0, 1, n, endpoint=False)
+    #Note: could iterate and use colorsys.hsv_to_rgb
+    return colors.hsv_to_rgb(R).squeeze()
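A quick check of the rainbow helper restored above (a sketch, not part of the patch):

    import numpy as np
    from statsmodels.graphics.plottools import rainbow

    colors = rainbow(4)          # four RGB triples with hues 0, 1/4, 1/2, 3/4
    print(colors.shape)          # (4, 3)
    print(np.round(colors, 2))   # first row is pure red: [1. 0. 0.]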
diff --git a/statsmodels/graphics/regressionplots.py b/statsmodels/graphics/regressionplots.py
index 3e67e7757..1d42508e0 100644
--- a/statsmodels/graphics/regressionplots.py
+++ b/statsmodels/graphics/regressionplots.py
@@ -1,4 +1,4 @@
-"""Partial Regression plot and residual plots to find misspecification
+'''Partial Regression plot and residual plots to find misspecification


 Author: Josef Perktold
@@ -9,12 +9,14 @@ update
 2011-06-05 : start to convert example to usable functions
 2011-10-27 : docstrings

-"""
+'''
 from statsmodels.compat.pandas import Appender
 from statsmodels.compat.python import lrange, lzip
+
 import numpy as np
 import pandas as pd
 from patsy import dmatrix
+
 from statsmodels.genmod.generalized_estimating_equations import GEE
 from statsmodels.genmod.generalized_linear_model import GLM
 from statsmodels.graphics import utils
@@ -22,15 +24,29 @@ from statsmodels.nonparametric.smoothers_lowess import lowess
 from statsmodels.regression.linear_model import GLS, OLS, WLS
 from statsmodels.sandbox.regression.predstd import wls_prediction_std
 from statsmodels.tools.tools import maybe_unwrap_results
-from ._regressionplots_doc import _plot_added_variable_doc, _plot_ceres_residuals_doc, _plot_influence_doc, _plot_leverage_resid2_doc, _plot_partial_residuals_doc
+
+from ._regressionplots_doc import (
+    _plot_added_variable_doc,
+    _plot_ceres_residuals_doc,
+    _plot_influence_doc,
+    _plot_leverage_resid2_doc,
+    _plot_partial_residuals_doc,
+)
+
 __all__ = ['plot_fit', 'plot_regress_exog', 'plot_partregress', 'plot_ccpr',
-    'plot_regress_exog', 'plot_partregress_grid', 'plot_ccpr_grid',
-    'add_lowess', 'abline_plot', 'influence_plot', 'plot_leverage_resid2',
-    'added_variable_resids', 'partial_resids', 'ceres_resids',
-    'plot_added_variable', 'plot_partial_residuals', 'plot_ceres_residuals']
+           'plot_regress_exog', 'plot_partregress_grid', 'plot_ccpr_grid',
+           'add_lowess', 'abline_plot', 'influence_plot',
+           'plot_leverage_resid2', 'added_variable_resids',
+           'partial_resids', 'ceres_resids', 'plot_added_variable',
+           'plot_partial_residuals', 'plot_ceres_residuals']
+
+#TODO: consider moving to influence module
+def _high_leverage(results):
+    #TODO: replace 1 with k_constant
+    return 2. * (results.df_model + 1)/results.nobs


-def add_lowess(ax, lines_idx=0, frac=0.2, **lowess_kwargs):
+def add_lowess(ax, lines_idx=0, frac=.2, **lowess_kwargs):
     """
     Add Lowess line to a plot.

@@ -51,7 +67,11 @@ def add_lowess(ax, lines_idx=0, frac=0.2, **lowess_kwargs):
     Figure
         The figure that holds the instance.
     """
-    pass
+    y0 = ax.get_lines()[lines_idx]._y
+    x0 = ax.get_lines()[lines_idx]._x
+    lres = lowess(y0, x0, frac=frac, **lowess_kwargs)
+    ax.plot(lres[:, 0], lres[:, 1], 'r', lw=1.5)
+    return ax.figure


 def plot_fit(results, exog_idx, y_true=None, ax=None, vlines=True, **kwargs):
@@ -116,7 +136,38 @@ def plot_fit(results, exog_idx, y_true=None, ax=None, vlines=True, **kwargs):

     .. plot:: plots/graphics_plot_fit_ex.py
     """
-    pass
+
+    fig, ax = utils.create_mpl_ax(ax)
+
+    exog_name, exog_idx = utils.maybe_name_or_idx(exog_idx, results.model)
+    results = maybe_unwrap_results(results)
+
+    #maybe add option for wendog, wexog
+    y = results.model.endog
+    x1 = results.model.exog[:, exog_idx]
+    x1_argsort = np.argsort(x1)
+    y = y[x1_argsort]
+    x1 = x1[x1_argsort]
+
+    ax.plot(x1, y, 'bo', label=results.model.endog_names)
+    if y_true is not None:
+        ax.plot(x1, y_true[x1_argsort], 'b-', label='True values')
+    title = 'Fitted values versus %s' % exog_name
+
+    ax.plot(x1, results.fittedvalues[x1_argsort], 'D', color='r',
+            label='fitted', **kwargs)
+    if vlines is True:
+        _, iv_l, iv_u = wls_prediction_std(results)
+        ax.vlines(x1, iv_l[x1_argsort], iv_u[x1_argsort], linewidth=1,
+                  color='k', alpha=.7)
+    #ax.fill_between(x1, iv_l[x1_argsort], iv_u[x1_argsort], alpha=0.1,
+    #                    color='k')
+    ax.set_title(title)
+    ax.set_xlabel(exog_name)
+    ax.set_ylabel(results.model.endog_names)
+    ax.legend(loc='best', numpoints=1)
+
+    return fig


 def plot_regress_exog(results, exog_idx, fig=None):
@@ -165,7 +216,59 @@ def plot_regress_exog(results, exog_idx, fig=None):

     .. plot:: plots/graphics_regression_regress_exog.py
     """
-    pass
+
+    fig = utils.create_mpl_fig(fig)
+
+    exog_name, exog_idx = utils.maybe_name_or_idx(exog_idx, results.model)
+    results = maybe_unwrap_results(results)
+
+    #maybe add option for wendog, wexog
+    y_name = results.model.endog_names
+    x1 = results.model.exog[:, exog_idx]
+    prstd, iv_l, iv_u = wls_prediction_std(results)
+
+    ax = fig.add_subplot(2, 2, 1)
+    ax.plot(x1, results.model.endog, 'o', color='b', alpha=0.9, label=y_name)
+    ax.plot(x1, results.fittedvalues, 'D', color='r', label='fitted',
+            alpha=.5)
+    ax.vlines(x1, iv_l, iv_u, linewidth=1, color='k', alpha=.7)
+    ax.set_title('Y and Fitted vs. X', fontsize='large')
+    ax.set_xlabel(exog_name)
+    ax.set_ylabel(y_name)
+    ax.legend(loc='best')
+
+    ax = fig.add_subplot(2, 2, 2)
+    ax.plot(x1, results.resid, 'o')
+    ax.axhline(y=0, color='black')
+    ax.set_title('Residuals versus %s' % exog_name, fontsize='large')
+    ax.set_xlabel(exog_name)
+    ax.set_ylabel("resid")
+
+    ax = fig.add_subplot(2, 2, 3)
+    exog_noti = np.ones(results.model.exog.shape[1], bool)
+    exog_noti[exog_idx] = False
+    exog_others = results.model.exog[:, exog_noti]
+    from pandas import Series
+    fig = plot_partregress(results.model.data.orig_endog,
+                           Series(x1, name=exog_name,
+                                  index=results.model.data.row_labels),
+                           exog_others, obs_labels=False, ax=ax)
+    ax.set_title('Partial regression plot', fontsize='large')
+    #ax.set_ylabel("Fitted values")
+    #ax.set_xlabel(exog_name)
+
+    ax = fig.add_subplot(2, 2, 4)
+    fig = plot_ccpr(results, exog_idx, ax=ax)
+    ax.set_title('CCPR Plot', fontsize='large')
+    #ax.set_xlabel(exog_name)
+    #ax.set_ylabel("Fitted values + resids")
+
+    fig.suptitle('Regression Plots for %s' % exog_name, fontsize="large")
+
+    fig.tight_layout()
+
+    fig.subplots_adjust(top=.90)
+    return fig


 def _partial_regression(endog, exog_i, exog_others):
@@ -189,12 +292,17 @@ def _partial_regression(endog, exog_i, exog_others):
          results from regression of endog on exog_others and of exog_i on
          exog_others
     """
-    pass
+    #FIXME: This function does not appear to be used.
+    res1a = OLS(endog, exog_others).fit()
+    res1b = OLS(exog_i, exog_others).fit()
+    res1c = OLS(res1a.resid, res1b.resid).fit()
+
+    return res1c, (res1a, res1b)


-def plot_partregress(endog, exog_i, exog_others, data=None, title_kwargs={},
-    obs_labels=True, label_kwargs={}, ax=None, ret_coords=False, eval_env=1,
-    **kwargs):
+def plot_partregress(endog, exog_i, exog_others, data=None,
+                     title_kwargs={}, obs_labels=True, label_kwargs={},
+                     ax=None, ret_coords=False, eval_env=1, **kwargs):
     """Plot partial regression for a single regressor.

     Parameters
@@ -277,7 +385,86 @@ def plot_partregress(endog, exog_i, exog_others, data=None, title_kwargs={},
     More detailed examples can be found in the Regression Plots notebook
     on the examples page.
     """
-    pass
+    #NOTE: there is no interaction between possible missing data and
+    #obs_labels yet, so this will need to be tweaked a bit for this case
+    fig, ax = utils.create_mpl_ax(ax)
+
+    # strings, use patsy to transform to data
+    if isinstance(endog, str):
+        endog = dmatrix(endog + "-1", data, eval_env=eval_env)
+
+    if isinstance(exog_others, str):
+        RHS = dmatrix(exog_others, data, eval_env=eval_env)
+    elif isinstance(exog_others, list):
+        RHS = "+".join(exog_others)
+        RHS = dmatrix(RHS, data, eval_env=eval_env)
+    else:
+        RHS = exog_others
+    RHS_isempty = False
+    if isinstance(RHS, np.ndarray) and RHS.size == 0:
+        RHS_isempty = True
+    elif isinstance(RHS, pd.DataFrame) and RHS.empty:
+        RHS_isempty = True
+    if isinstance(exog_i, str):
+        exog_i = dmatrix(exog_i + "-1", data, eval_env=eval_env)
+
+    # all arrays or pandas-like
+
+    if RHS_isempty:
+        endog = np.asarray(endog)
+        exog_i = np.asarray(exog_i)
+        ax.plot(endog, exog_i, 'o', **kwargs)
+        fitted_line = OLS(endog, exog_i).fit()
+        x_axis_endog_name = 'x' if isinstance(exog_i, np.ndarray) else exog_i.name
+        y_axis_endog_name = 'y' if isinstance(endog, np.ndarray) else endog.design_info.column_names[0]
+    else:
+        res_yaxis = OLS(endog, RHS).fit()
+        res_xaxis = OLS(exog_i, RHS).fit()
+        xaxis_resid = res_xaxis.resid
+        yaxis_resid = res_yaxis.resid
+        x_axis_endog_name = res_xaxis.model.endog_names
+        y_axis_endog_name = res_yaxis.model.endog_names
+        ax.plot(xaxis_resid, yaxis_resid, 'o', **kwargs)
+        fitted_line = OLS(yaxis_resid, xaxis_resid).fit()
+
+    fig = abline_plot(0, np.asarray(fitted_line.params)[0], color='k', ax=ax)
+
+    if x_axis_endog_name == 'y':  # for no names regression will just get a y
+        x_axis_endog_name = 'x'  # this is misleading, so use x
+    ax.set_xlabel("e(%s | X)" % x_axis_endog_name)
+    ax.set_ylabel("e(%s | X)" % y_axis_endog_name)
+    ax.set_title('Partial Regression Plot', **title_kwargs)
+
+    # NOTE: if we want to get super fancy, we could annotate if a point is
+    # clicked using this widget
+    # http://stackoverflow.com/questions/4652439/
+    # is-there-a-matplotlib-equivalent-of-matlabs-datacursormode/
+    # 4674445#4674445
+    if obs_labels is True:
+        if data is not None:
+            obs_labels = data.index
+        elif hasattr(exog_i, "index"):
+            obs_labels = exog_i.index
+        else:
+            obs_labels = res_xaxis.model.data.row_labels
+        #NOTE: row_labels can be None.
+        #Maybe we should fix this to never be the case.
+        if obs_labels is None:
+            obs_labels = lrange(len(exog_i))
+
+    if obs_labels is not False:  # could be array_like
+        if len(obs_labels) != len(exog_i):
+            raise ValueError("obs_labels does not match length of exog_i")
+        label_kwargs.update(dict(ha="center", va="bottom"))
+        ax = utils.annotate_axes(lrange(len(obs_labels)), obs_labels,
+                                 lzip(res_xaxis.resid, res_yaxis.resid),
+                                 [(0, 5)] * len(obs_labels), "x-large", ax=ax,
+                                 **label_kwargs)
+
+    if ret_coords:
+        return fig, (res_xaxis.resid, res_yaxis.resid)
+    else:
+        return fig


 def plot_partregress_grid(results, exog_idx=None, grid=None, fig=None):
@@ -342,7 +529,44 @@ def plot_partregress_grid(results, exog_idx=None, grid=None, fig=None):

     .. plot:: plots/graphics_regression_partregress_grid.py
     """
-    pass
+    import pandas
+    fig = utils.create_mpl_fig(fig)
+
+    exog_name, exog_idx = utils.maybe_name_or_idx(exog_idx, results.model)
+
+    # TODO: maybe add option for using wendog, wexog instead
+    y = pandas.Series(results.model.endog, name=results.model.endog_names)
+    exog = results.model.exog
+
+    k_vars = exog.shape[1]
+    # this function does not make sense if k_vars=1
+
+    nrows = (len(exog_idx) + 1) // 2
+    ncols = 1 if nrows == len(exog_idx) else 2
+    if grid is not None:
+        nrows, ncols = grid
+    title_kwargs = {}
+    if ncols > 1:
+        title_kwargs = {"fontdict": {"fontsize": 'small'}}
+
+    # for indexing purposes
+    other_names = np.array(results.model.exog_names)
+    for i, idx in enumerate(exog_idx):
+        others = lrange(k_vars)
+        others.pop(idx)
+        exog_others = pandas.DataFrame(exog[:, others],
+                                       columns=other_names[others])
+        ax = fig.add_subplot(nrows, ncols, i + 1)
+        plot_partregress(y, pandas.Series(exog[:, idx],
+                                          name=other_names[idx]),
+                         exog_others, ax=ax, title_kwargs=title_kwargs,
+                         obs_labels=False)
+        ax.set_title("")
+
+    fig.suptitle("Partial Regression Plot", fontsize="large")
+    fig.tight_layout()
+    fig.subplots_adjust(top=.95)
+
+    return fig


 def plot_ccpr(results, exog_idx, ax=None):
@@ -407,7 +631,25 @@ def plot_ccpr(results, exog_idx, ax=None):

     .. plot:: plots/graphics_regression_ccpr.py
     """
-    pass
+    fig, ax = utils.create_mpl_ax(ax)
+
+    exog_name, exog_idx = utils.maybe_name_or_idx(exog_idx, results.model)
+    results = maybe_unwrap_results(results)
+
+    x1 = results.model.exog[:, exog_idx]
+    #namestr = ' for %s' % self.name if self.name else ''
+    x1beta = x1*results.params[exog_idx]
+    ax.plot(x1, x1beta + results.resid, 'o')
+    from statsmodels.tools.tools import add_constant
+    mod = OLS(x1beta, add_constant(x1)).fit()
+    params = mod.params
+    fig = abline_plot(*params, **dict(ax=ax))
+    #ax.plot(x1, x1beta, '-')
+    ax.set_title('Component and component plus residual plot')
+    ax.set_ylabel("Residual + %s*beta_%d" % (exog_name, exog_idx))
+    ax.set_xlabel("%s" % exog_name)
+
+    return fig


 def plot_ccpr_grid(results, exog_idx=None, grid=None, fig=None):
@@ -473,11 +715,40 @@ def plot_ccpr_grid(results, exog_idx=None, grid=None, fig=None):

     .. plot:: plots/graphics_regression_ccpr_grid.py
     """
-    pass
+    fig = utils.create_mpl_fig(fig)
+
+    exog_name, exog_idx = utils.maybe_name_or_idx(exog_idx, results.model)
+
+    if grid is not None:
+        nrows, ncols = grid
+    else:
+        if len(exog_idx) > 2:
+            nrows = int(np.ceil(len(exog_idx)/2.))
+            ncols = 2
+        else:
+            nrows = len(exog_idx)
+            ncols = 1
+
+    seen_constant = 0
+    for i, idx in enumerate(exog_idx):
+        if results.model.exog[:, idx].var() == 0:
+            seen_constant = 1
+            continue
+
+        ax = fig.add_subplot(nrows, ncols, i+1-seen_constant)
+        fig = plot_ccpr(results, exog_idx=idx, ax=ax)
+        ax.set_title("")
+
+    fig.suptitle("Component-Component Plus Residual Plot", fontsize="large")
+
+    fig.tight_layout()
+
+    fig.subplots_adjust(top=.95)
+    return fig


 def abline_plot(intercept=None, slope=None, horiz=None, vert=None,
-    model_results=None, ax=None, **kwargs):
+                model_results=None, ax=None, **kwargs):
     """
     Plot a line given an intercept and slope.

@@ -522,7 +793,274 @@ def abline_plot(intercept=None, slope=None, horiz=None, vert=None,

     .. plot:: plots/graphics_regression_abline.py
     """
-    pass
+    if ax is not None:  # get axis limits first thing, do not change these
+        x = ax.get_xlim()
+    else:
+        x = None
+
+    fig, ax = utils.create_mpl_ax(ax)
+
+    if model_results:
+        intercept, slope = model_results.params
+        if x is None:
+            x = [model_results.model.exog[:, 1].min(),
+                 model_results.model.exog[:, 1].max()]
+    else:
+        if not (intercept is not None and slope is not None):
+            raise ValueError("specify slope and intercepty or model_results")
+        if x is None:
+            x = ax.get_xlim()
+
+    data_y = [x[0]*slope+intercept, x[1]*slope+intercept]
+    ax.set_xlim(x)
+    #ax.set_ylim(y)
+
+    from matplotlib.lines import Line2D
+
+    class ABLine2D(Line2D):
+        def __init__(self, *args, **kwargs):
+            super(ABLine2D, self).__init__(*args, **kwargs)
+            self.id_xlim_callback = None
+            self.id_ylim_callback = None
+
+        def remove(self):
+            ax = self.axes
+            if self.id_xlim_callback:
+                ax.callbacks.disconnect(self.id_xlim_callback)
+            if self.id_ylim_callback:
+                ax.callbacks.disconnect(self.id_ylim_callback)
+            super(ABLine2D, self).remove()
+
+        def update_datalim(self, ax):
+            ax.set_autoscale_on(False)
+            children = ax.get_children()
+            ablines = [child for child in children if child is self]
+            abline = ablines[0]
+            x = ax.get_xlim()
+            y = [x[0] * slope + intercept, x[1] * slope + intercept]
+            abline.set_data(x, y)
+            ax.figure.canvas.draw()
+
+    # TODO: how to intercept something like a margins call and adjust?
+    line = ABLine2D(x, data_y, **kwargs)
+    ax.add_line(line)
+    line.id_xlim_callback = ax.callbacks.connect('xlim_changed', line.update_datalim)
+    line.id_ylim_callback = ax.callbacks.connect('ylim_changed', line.update_datalim)
+
+    if horiz:
+        ax.axhline(horiz)
+    if vert:
+        ax.axvline(vert)
+    return fig
+
+
+@Appender(_plot_influence_doc.format(**{
+    'extra_params_doc': "results: object\n"
+                        "        Results for a fitted regression model.\n"
+                        "    influence: instance\n"
+                        "        The instance of Influence for model."}))
+def _influence_plot(results, influence, external=True, alpha=.05,
+                    criterion="cooks", size=48, plot_alpha=.75, ax=None,
+                    leverage=None, resid=None,
+                    **kwargs):
+    # leverage and resid kwds are used only internally for MLEInfluence
+    infl = influence
+    fig, ax = utils.create_mpl_ax(ax)
+
+    if criterion.lower().startswith('coo'):
+        psize = infl.cooks_distance[0]
+    elif criterion.lower().startswith('dff'):
+        psize = np.abs(infl.dffits[0])
+    else:
+        raise ValueError("Criterion %s not understood" % criterion)
+
+    # scale the variables
+    #TODO: what is the correct scaling and the assumption here?
+    #we want plots to be comparable across different plots
+    #so we would need to use the expected distribution of criterion probably
+    old_range = np.ptp(psize)
+    new_range = size**2 - 8**2
+
+    psize = (psize - psize.min()) * new_range/old_range + 8**2
+
+    if leverage is None:
+        leverage = infl.hat_matrix_diag
+    if resid is None:
+        ylabel = "Studentized Residuals"
+        if external:
+            resid = infl.resid_studentized_external
+        else:
+            resid = infl.resid_studentized
+    else:
+        resid = np.asarray(resid)
+        ylabel = "Residuals"
+
+    from scipy import stats
+
+    cutoff = stats.t.ppf(1.-alpha/2, results.df_resid)
+    large_resid = np.abs(resid) > cutoff
+    large_leverage = leverage > _high_leverage(results)
+    large_points = np.logical_or(large_resid, large_leverage)
+
+    ax.scatter(leverage, resid, s=psize, alpha=plot_alpha)
+
+    # add point labels
+    labels = results.model.data.row_labels
+    if labels is None:
+        labels = lrange(len(resid))
+    ax = utils.annotate_axes(np.where(large_points)[0], labels,
+                             lzip(leverage, resid),
+                             lzip(-(psize/2)**.5, (psize/2)**.5), "x-large",
+                             ax)
+
+    # TODO: make configurable or let people do it ex-post?
+    font = {"fontsize": 16, "color": "black"}
+    ax.set_ylabel(ylabel, **font)
+    ax.set_xlabel("Leverage", **font)
+    ax.set_title("Influence Plot", **font)
+    return fig
+
+
+@Appender(_plot_influence_doc.format(**{
+    'extra_params_doc': "results : Results\n"
+                        "        Results for a fitted regression model."}))
+def influence_plot(results, external=True, alpha=.05, criterion="cooks",
+                   size=48, plot_alpha=.75, ax=None, **kwargs):
+
+    infl = results.get_influence()
+    res = _influence_plot(results, infl, external=external, alpha=alpha,
+                          criterion=criterion, size=size,
+                          plot_alpha=plot_alpha, ax=ax, **kwargs)
+    return res
+
+
+@Appender(_plot_leverage_resid2_doc.format(**{
+    'extra_params_doc': "results: object\n"
+                        "    Results for a fitted regression model\n"
+                        "influence: instance\n"
+                        "    instance of Influence for model"}))
+def _plot_leverage_resid2(results, influence, alpha=.05, ax=None,
+                         **kwargs):
+
+    from scipy.stats import norm, zscore
+    fig, ax = utils.create_mpl_ax(ax)
+
+    infl = influence
+    leverage = infl.hat_matrix_diag
+    resid = zscore(infl.resid)
+    ax.plot(resid**2, leverage, 'o', **kwargs)
+    ax.set_xlabel("Normalized residuals**2")
+    ax.set_ylabel("Leverage")
+    ax.set_title("Leverage vs. Normalized residuals squared")
+
+    large_leverage = leverage > _high_leverage(results)
+    #norm or t here if standardized?
+    cutoff = norm.ppf(1.-alpha/2)
+    large_resid = np.abs(resid) > cutoff
+    labels = results.model.data.row_labels
+    if labels is None:
+        labels = lrange(int(results.nobs))
+    index = np.where(np.logical_or(large_leverage, large_resid))[0]
+    ax = utils.annotate_axes(index, labels, lzip(resid**2, leverage),
+                             [(0, 5)]*int(results.nobs), "large",
+                             ax=ax, ha="center", va="bottom")
+    ax.margins(.075, .075)
+    return fig
+
+
+@Appender(_plot_leverage_resid2_doc.format(**{
+    'extra_params_doc': "results : object\n"
+                        "    Results for a fitted regression model"}))
+def plot_leverage_resid2(results, alpha=.05, ax=None, **kwargs):
+
+    infl = results.get_influence()
+    return _plot_leverage_resid2(results, infl, alpha=alpha, ax=ax, **kwargs)
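
A short sketch exercising the two influence diagnostics implemented above on one synthetic OLS fit; the data and seed are illustrative assumptions.

import numpy as np
import statsmodels.api as sm
from statsmodels.graphics.regressionplots import (
    influence_plot, plot_leverage_resid2)

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(50, 2)))
y = X @ [1.0, 0.5, -0.3] + rng.normal(size=50)
res = sm.OLS(y, X).fit()

fig1 = influence_plot(res, criterion="cooks")   # bubble area ~ Cook's distance
fig2 = plot_leverage_resid2(res, alpha=0.05)    # leverage vs. squared residuals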
+
+
+
+@Appender(_plot_added_variable_doc % {
+    'extra_params_doc': "results : object\n"
+                        "    Results for a fitted regression model"})
+def plot_added_variable(results, focus_exog, resid_type=None,
+                        use_glm_weights=True, fit_kwargs=None, ax=None):
+
+    model = results.model
+
+    fig, ax = utils.create_mpl_ax(ax)
+
+    endog_resid, focus_exog_resid =\
+                 added_variable_resids(results, focus_exog,
+                                       resid_type=resid_type,
+                                       use_glm_weights=use_glm_weights,
+                                       fit_kwargs=fit_kwargs)
+
+    ax.plot(focus_exog_resid, endog_resid, 'o', alpha=0.6)
+
+    ax.set_title('Added variable plot', fontsize='large')
+
+    if isinstance(focus_exog, str):
+        xname = focus_exog
+    else:
+        xname = model.exog_names[focus_exog]
+    ax.set_xlabel(xname, size=15)
+    ax.set_ylabel(model.endog_names + " residuals", size=15)
+
+    return fig
+
+
+@Appender(_plot_partial_residuals_doc % {
+    'extra_params_doc': "results : object\n"
+                        "    Results for a fitted regression model"})
+def plot_partial_residuals(results, focus_exog, ax=None):
+    # Docstring attached below
+
+    model = results.model
+
+    focus_exog, focus_col = utils.maybe_name_or_idx(focus_exog, model)
+
+    pr = partial_resids(results, focus_exog)
+    focus_exog_vals = results.model.exog[:, focus_col]
+
+    fig, ax = utils.create_mpl_ax(ax)
+    ax.plot(focus_exog_vals, pr, 'o', alpha=0.6)
+
+    ax.set_title('Partial residuals plot', fontsize='large')
+
+    if isinstance(focus_exog, str):
+        xname = focus_exog
+    else:
+        xname = model.exog_names[focus_exog]
+    ax.set_xlabel(xname, size=15)
+    ax.set_ylabel("Component plus residual", size=15)
+
+    return fig
+
+
+@Appender(_plot_ceres_residuals_doc % {
+    'extra_params_doc': "results : Results\n"
+                        "        Results instance of a fitted regression "
+                        "model."})
+def plot_ceres_residuals(results, focus_exog, frac=0.66, cond_means=None,
+                         ax=None):
+
+    model = results.model
+
+    focus_exog, focus_col = utils.maybe_name_or_idx(focus_exog, model)
+
+    presid = ceres_resids(results, focus_exog, frac=frac,
+                          cond_means=cond_means)
+
+    focus_exog_vals = model.exog[:, focus_col]
+
+    fig, ax = utils.create_mpl_ax(ax)
+    ax.plot(focus_exog_vals, presid, 'o', alpha=0.6)
+
+    ax.set_title('CERES residuals plot', fontsize='large')
+
+    ax.set_xlabel(focus_exog, size=15)
+    ax.set_ylabel("Component plus residual", size=15)
+
+    return fig


 def ceres_resids(results, focus_exog, frac=0.66, cond_means=None):
@@ -558,8 +1096,61 @@ def ceres_resids(results, focus_exog, frac=0.66, cond_means=None):

     Currently only supports GLM, GEE, and OLS models.
     """
-    pass

+    model = results.model
+
+    if not isinstance(model, (GLM, GEE, OLS)):
+        raise ValueError("ceres residuals not available for %s" %
+                         model.__class__.__name__)
+
+    focus_exog, focus_col = utils.maybe_name_or_idx(focus_exog, model)
+
+    # Indices of non-focus columns
+    ix_nf = range(len(results.params))
+    ix_nf = list(ix_nf)
+    ix_nf.pop(focus_col)
+    nnf = len(ix_nf)
+
+    # Estimate the conditional means if not provided.
+    if cond_means is None:
+
+        # Below we calculate E[x | focus] where x is each column other
+        # than the focus column.  We do not want the intercept when we do
+        # this so we remove it here.
+        pexog = model.exog[:, ix_nf]
+        pexog -= pexog.mean(0)
+        u, s, vt = np.linalg.svd(pexog, 0)
+        ii = np.flatnonzero(s > 1e-6)
+        pexog = u[:, ii]
+
+        fcol = model.exog[:, focus_col]
+        cond_means = np.empty((len(fcol), pexog.shape[1]))
+        for j in range(pexog.shape[1]):
+
+            # Estimate E[x_j | focus] by a lowess fit of column j on the
+            # focus column (the intercept was already removed above).
+            y0 = pexog[:, j]
+
+            cf = lowess(y0, fcol, frac=frac, return_sorted=False)
+
+            cond_means[:, j] = cf
+
+    new_exog = np.concatenate((model.exog[:, ix_nf], cond_means), axis=1)
+
+    # Refit the model using the adjusted exog values
+    klass = model.__class__
+    init_kwargs = model._get_init_kwds()
+    new_model = klass(model.endog, new_exog, **init_kwargs)
+    new_result = new_model.fit()
+
+    # The partial residual, with respect to l(x2) (notation of Cook 1998)
+    presid = model.endog - new_result.fittedvalues
+    if isinstance(model, (GLM, GEE)):
+        presid *= model.family.link.deriv(new_result.fittedvalues)
+    if new_exog.shape[1] > nnf:
+        presid += np.dot(new_exog[:, nnf:], new_result.params[nnf:])
+
+    return presid

 def partial_resids(results, focus_exog):
     """
@@ -584,11 +1175,33 @@ def partial_resids(results, focus_exog):
     generalized linear models.  Journal of the American Statistical
     Association, 93:442.
     """
-    pass

+    # TODO: could be a method of results
+    # TODO: see Cook et al (1998) for a more general definition
+
+    # The calculation follows equation (8) from Cook's paper.
+    model = results.model
+    resid = model.endog - results.predict()
+
+    if isinstance(model, (GLM, GEE)):
+        resid *= model.family.link.deriv(results.fittedvalues)
+    elif isinstance(model, (OLS, GLS, WLS)):
+        pass # No need to do anything
+    else:
+        raise ValueError("Partial residuals for '%s' not implemented."
+                         % type(model))
+
+    if type(focus_exog) is str:
+        focus_col = model.exog_names.index(focus_exog)
+    else:
+        focus_col = focus_exog
+
+    focus_val = results.params[focus_col] * model.exog[:, focus_col]
+
+    return focus_val + resid
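
Written out, the value returned above is the partial residual of Cook (1998), eq. (8): for a focus column $x_j$ with fitted coefficient $\hat\beta_j$,

$$ r_{\mathrm{partial},\,j} = \hat\beta_j\, x_j + g'(\hat\mu)\,(y - \hat\mu), $$

where the link-derivative factor $g'(\hat\mu)$ applies only to GLM/GEE fits and equals 1 for OLS/GLS/WLS.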

 def added_variable_resids(results, focus_exog, resid_type=None,
-    use_glm_weights=True, fit_kwargs=None):
+                          use_glm_weights=True, fit_kwargs=None):
     """
     Residualize the endog variable and a 'focus' exog variable in a
     regression model with respect to the other exog variables.
@@ -627,4 +1240,57 @@ def added_variable_resids(results, focus_exog, resid_type=None,

     Currently only GLM, GEE, and OLS models are supported.
     """
-    pass
+
+    model = results.model
+    if not isinstance(model, (GEE, GLM, OLS)):
+        raise ValueError("model type %s not supported for added variable residuals" %
+                         model.__class__.__name__)
+
+    exog = model.exog
+    endog = model.endog
+
+    focus_exog, focus_col = utils.maybe_name_or_idx(focus_exog, model)
+
+    focus_exog_vals = exog[:, focus_col]
+
+    # Default residuals
+    if resid_type is None:
+        if isinstance(model, (GEE, GLM)):
+            resid_type = "resid_deviance"
+        else:
+            resid_type = "resid"
+
+    ii = range(exog.shape[1])
+    ii = list(ii)
+    ii.pop(focus_col)
+    reduced_exog = exog[:, ii]
+    start_params = results.params[ii]
+
+    klass = model.__class__
+
+    kwargs = model._get_init_kwds()
+    new_model = klass(endog, reduced_exog, **kwargs)
+    args = {"start_params": start_params}
+    if fit_kwargs is not None:
+        args.update(fit_kwargs)
+    new_result = new_model.fit(**args)
+    if not getattr(new_result, "converged", True):
+        raise ValueError("fit did not converge when calculating added variable residuals")
+
+    try:
+        endog_resid = getattr(new_result, resid_type)
+    except AttributeError:
+        raise ValueError("'%s' residual type not available" % resid_type)
+
+    import statsmodels.regression.linear_model as lm
+
+    if isinstance(model, (GLM, GEE)) and use_glm_weights:
+        weights = model.family.weights(results.fittedvalues)
+        if hasattr(model, "data_weights"):
+            weights = weights * model.data_weights
+        lm_results = lm.WLS(focus_exog_vals, reduced_exog, weights).fit()
+    else:
+        lm_results = lm.OLS(focus_exog_vals, reduced_exog).fit()
+    focus_exog_resid = lm_results.resid
+
+    return endog_resid, focus_exog_resid
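
Before moving to tsaplots, a sketch tying the residualizing helpers above to their plot wrappers (plot_added_variable, plot_partial_residuals, plot_ceres_residuals) on a small Poisson GLM; the data, seed, and focus column are illustrative assumptions.

import numpy as np
import statsmodels.api as sm
from statsmodels.graphics.regressionplots import (
    plot_added_variable, plot_partial_residuals, plot_ceres_residuals)

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = rng.poisson(np.exp(X @ [0.1, 0.3, -0.2, 0.5]))
res = sm.GLM(y, X, family=sm.families.Poisson()).fit()

fig1 = plot_added_variable(res, focus_exog=1)       # uses added_variable_resids
fig2 = plot_partial_residuals(res, focus_exog=1)    # uses partial_resids
fig3 = plot_ceres_residuals(res, focus_exog=1)      # uses ceres_resids
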
diff --git a/statsmodels/graphics/tsaplots.py b/statsmodels/graphics/tsaplots.py
index 2eca95ce8..c3358cd28 100644
--- a/statsmodels/graphics/tsaplots.py
+++ b/statsmodels/graphics/tsaplots.py
@@ -1,18 +1,103 @@
 """Correlation plot functions."""
 from statsmodels.compat.pandas import deprecate_kwarg
+
 import calendar
+
 import numpy as np
 import pandas as pd
+
 from statsmodels.graphics import utils
 from statsmodels.tools.validation import array_like
 from statsmodels.tsa.stattools import acf, pacf, ccf


-@deprecate_kwarg('unbiased', 'adjusted')
-def plot_acf(x, ax=None, lags=None, *, alpha=0.05, use_vlines=True,
-    adjusted=False, fft=False, missing='none', title='Autocorrelation',
-    zero=True, auto_ylims=False, bartlett_confint=True, vlines_kwargs=None,
-    **kwargs):
+def _prepare_data_corr_plot(x, lags, zero):
+    zero = bool(zero)
+    irregular = False if zero else True
+    if lags is None:
+        # GH 4663 - use a sensible default value
+        nobs = x.shape[0]
+        lim = min(int(np.ceil(10 * np.log10(nobs))), nobs // 2)
+        lags = np.arange(not zero, lim + 1)
+    elif np.isscalar(lags):
+        lags = np.arange(not zero, int(lags) + 1)  # +1 for zero lag
+    else:
+        irregular = True
+        lags = np.asanyarray(lags).astype(int)
+    nlags = lags.max(0)
+
+    return lags, nlags, irregular
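
A quick worked check of the default-lag rule above; the sample size of 250 is an arbitrary illustration.

import numpy as np

nobs = 250
lim = min(int(np.ceil(10 * np.log10(nobs))), nobs // 2)   # ceil(23.98) = 24
lags = np.arange(0, lim + 1)   # with zero=True: lags 0, 1, ..., 24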
+
+
+def _plot_corr(
+    ax,
+    title,
+    acf_x,
+    confint,
+    lags,
+    irregular,
+    use_vlines,
+    vlines_kwargs,
+    auto_ylims=False,
+    skip_lag0_confint=True,
+    **kwargs,
+):
+    if irregular:
+        acf_x = acf_x[lags]
+        if confint is not None:
+            confint = confint[lags]
+
+    if use_vlines:
+        ax.vlines(lags, [0], acf_x, **vlines_kwargs)
+        ax.axhline(**kwargs)
+
+    kwargs.setdefault("marker", "o")
+    kwargs.setdefault("markersize", 5)
+    if "ls" not in kwargs:
+        # gh-2369
+        kwargs.setdefault("linestyle", "None")
+    ax.margins(0.05)
+    ax.plot(lags, acf_x, **kwargs)
+    ax.set_title(title)
+
+    ax.set_ylim(-1, 1)
+    if auto_ylims:
+        ax.set_ylim(
+            1.25 * np.minimum(min(acf_x), min(confint[:, 0] - acf_x)),
+            1.25 * np.maximum(max(acf_x), max(confint[:, 1] - acf_x)),
+        )
+
+    if confint is not None:
+        if skip_lag0_confint and lags[0] == 0:
+            lags = lags[1:]
+            confint = confint[1:]
+            acf_x = acf_x[1:]
+        lags = lags.astype(float)
+        lags[np.argmin(lags)] -= 0.5
+        lags[np.argmax(lags)] += 0.5
+        ax.fill_between(
+            lags, confint[:, 0] - acf_x, confint[:, 1] - acf_x, alpha=0.25
+        )
+
+
+@deprecate_kwarg("unbiased", "adjusted")
+def plot_acf(
+    x,
+    ax=None,
+    lags=None,
+    *,
+    alpha=0.05,
+    use_vlines=True,
+    adjusted=False,
+    fft=False,
+    missing="none",
+    title="Autocorrelation",
+    zero=True,
+    auto_ylims=False,
+    bartlett_confint=True,
+    vlines_kwargs=None,
+    **kwargs,
+):
     """
     Plot the autocorrelation function

@@ -122,12 +207,53 @@ def plot_acf(x, ax=None, lags=None, *, alpha=0.05, use_vlines=True,

     .. plot:: plots/graphics_tsa_plot_acf.py
     """
-    pass
-
-
-def plot_pacf(x, ax=None, lags=None, alpha=0.05, method='ywm', use_vlines=
-    True, title='Partial Autocorrelation', zero=True, vlines_kwargs=None,
-    **kwargs):
+    fig, ax = utils.create_mpl_ax(ax)
+
+    lags, nlags, irregular = _prepare_data_corr_plot(x, lags, zero)
+    vlines_kwargs = {} if vlines_kwargs is None else vlines_kwargs
+
+    confint = None
+    # acf has different return type based on alpha
+    acf_x = acf(
+        x,
+        nlags=nlags,
+        alpha=alpha,
+        fft=fft,
+        bartlett_confint=bartlett_confint,
+        adjusted=adjusted,
+        missing=missing,
+    )
+    if alpha is not None:
+        acf_x, confint = acf_x[:2]
+
+    _plot_corr(
+        ax,
+        title,
+        acf_x,
+        confint,
+        lags,
+        irregular,
+        use_vlines,
+        vlines_kwargs,
+        auto_ylims=auto_ylims,
+        **kwargs,
+    )
+
+    return fig
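
A minimal sketch of plot_acf (and plot_pacf, defined just below) on a simulated AR(1) series; the data and seed are illustrative assumptions.

import numpy as np
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + rng.normal()

fig = plot_acf(x, lags=30)                 # Bartlett confidence bands by default
fig = plot_pacf(x, lags=30, method="ywm")  # "ywm" is the default method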
+
+
+def plot_pacf(
+    x,
+    ax=None,
+    lags=None,
+    alpha=0.05,
+    method="ywm",
+    use_vlines=True,
+    title="Partial Autocorrelation",
+    zero=True,
+    vlines_kwargs=None,
+    **kwargs,
+):
     """
     Plot the partial autocorrelation function

@@ -218,12 +344,47 @@ def plot_pacf(x, ax=None, lags=None, alpha=0.05, method='ywm', use_vlines=

     .. plot:: plots/graphics_tsa_plot_pacf.py
     """
-    pass
-
-
-def plot_ccf(x, y, *, ax=None, lags=None, negative_lags=False, alpha=0.05,
-    use_vlines=True, adjusted=False, fft=False, title='Cross-correlation',
-    auto_ylims=False, vlines_kwargs=None, **kwargs):
+    fig, ax = utils.create_mpl_ax(ax)
+    vlines_kwargs = {} if vlines_kwargs is None else vlines_kwargs
+    lags, nlags, irregular = _prepare_data_corr_plot(x, lags, zero)
+
+    confint = None
+    if alpha is None:
+        acf_x = pacf(x, nlags=nlags, alpha=alpha, method=method)
+    else:
+        acf_x, confint = pacf(x, nlags=nlags, alpha=alpha, method=method)
+
+    _plot_corr(
+        ax,
+        title,
+        acf_x,
+        confint,
+        lags,
+        irregular,
+        use_vlines,
+        vlines_kwargs,
+        **kwargs,
+    )
+
+    return fig
+
+
+def plot_ccf(
+        x,
+        y,
+        *,
+        ax=None,
+        lags=None,
+        negative_lags=False,
+        alpha=0.05,
+        use_vlines=True,
+        adjusted=False,
+        fft=False,
+        title="Cross-correlation",
+        auto_ylims=False,
+        vlines_kwargs=None,
+        **kwargs,
+):
     """
     Plot the cross-correlation function

@@ -289,13 +450,58 @@ def plot_ccf(x, y, *, ax=None, lags=None, negative_lags=False, alpha=0.05,
     >>> sm.graphics.tsa.plot_ccf(diffed["unemp"], diffed["infl"])
     >>> plt.show()
     """
-    pass
-
-
-def plot_accf_grid(x, *, varnames=None, fig=None, lags=None, negative_lags=
-    True, alpha=0.05, use_vlines=True, adjusted=False, fft=False, missing=
-    'none', zero=True, auto_ylims=False, bartlett_confint=False,
-    vlines_kwargs=None, **kwargs):
+    fig, ax = utils.create_mpl_ax(ax)
+
+    lags, nlags, irregular = _prepare_data_corr_plot(x, lags, True)
+    vlines_kwargs = {} if vlines_kwargs is None else vlines_kwargs
+
+    if negative_lags:
+        lags = -lags
+
+    ccf_res = ccf(
+        x, y, adjusted=adjusted, fft=fft, alpha=alpha, nlags=nlags + 1
+    )
+    if alpha is not None:
+        ccf_xy, confint = ccf_res
+    else:
+        ccf_xy = ccf_res
+        confint = None
+
+    _plot_corr(
+        ax,
+        title,
+        ccf_xy,
+        confint,
+        lags,
+        irregular,
+        use_vlines,
+        vlines_kwargs,
+        auto_ylims=auto_ylims,
+        skip_lag0_confint=False,
+        **kwargs,
+    )
+
+    return fig
+
+
+def plot_accf_grid(
+        x,
+        *,
+        varnames=None,
+        fig=None,
+        lags=None,
+        negative_lags=True,
+        alpha=0.05,
+        use_vlines=True,
+        adjusted=False,
+        fft=False,
+        missing="none",
+        zero=True,
+        auto_ylims=False,
+        bartlett_confint=False,
+        vlines_kwargs=None,
+        **kwargs,
+):
     """
     Plot auto/cross-correlation grid

@@ -375,7 +581,65 @@ def plot_accf_grid(x, *, varnames=None, fig=None, lags=None, negative_lags=
     >>> sm.graphics.tsa.plot_accf_grid(diffed[["unemp", "infl"]])
     >>> plt.show()
     """
-    pass
+    from statsmodels.tools.data import _is_using_pandas
+
+    array_like(x, "x", ndim=2)
+    m = x.shape[1]
+
+    fig = utils.create_mpl_fig(fig)
+    gs = fig.add_gridspec(m, m)
+
+    if _is_using_pandas(x, None):
+        varnames = varnames or list(x.columns)
+
+        def get_var(i):
+            return x.iloc[:, i]
+    else:
+        varnames = varnames or [f'x[{i}]' for i in range(m)]
+
+        x = np.asarray(x)
+
+        def get_var(i):
+            return x[:, i]
+
+    for i in range(m):
+        for j in range(m):
+            ax = fig.add_subplot(gs[i, j])
+            if i == j:
+                plot_acf(
+                    get_var(i),
+                    ax=ax,
+                    title=f'ACF({varnames[i]})',
+                    lags=lags,
+                    alpha=alpha,
+                    use_vlines=use_vlines,
+                    adjusted=adjusted,
+                    fft=fft,
+                    missing=missing,
+                    zero=zero,
+                    auto_ylims=auto_ylims,
+                    bartlett_confint=bartlett_confint,
+                    vlines_kwargs=vlines_kwargs,
+                    **kwargs,
+                )
+            else:
+                plot_ccf(
+                    get_var(i),
+                    get_var(j),
+                    ax=ax,
+                    title=f'CCF({varnames[i]}, {varnames[j]})',
+                    lags=lags,
+                    negative_lags=negative_lags and i > j,
+                    alpha=alpha,
+                    use_vlines=use_vlines,
+                    adjusted=adjusted,
+                    fft=fft,
+                    auto_ylims=auto_ylims,
+                    vlines_kwargs=vlines_kwargs,
+                    **kwargs,
+                )
+
+    return fig


 def seasonal_plot(grouped_x, xticklabels, ylabel=None, ax=None):
@@ -396,7 +660,26 @@ def seasonal_plot(grouped_x, xticklabels, ylabel=None, ax=None):
         If given, this subplot is used to plot in instead of a new figure being
         created.
     """
-    pass
+    fig, ax = utils.create_mpl_ax(ax)
+    start = 0
+    ticks = []
+    for season, df in grouped_x:
+        df = df.copy()  # or sort balks for series. may be better way
+        df = df.sort_index()
+        nobs = len(df)
+        x_plot = np.arange(start, start + nobs)
+        ticks.append(x_plot.mean())
+        ax.plot(x_plot, df.values, "k")
+        ax.hlines(
+            df.values.mean(), x_plot[0], x_plot[-1], colors="r", linewidth=3
+        )
+        start += nobs
+
+    ax.set_xticks(ticks)
+    ax.set_xticklabels(xticklabels)
+    ax.set_ylabel(ylabel)
+    ax.margins(0.1, 0.05)
+    return fig


 def month_plot(x, dates=None, ylabel=None, ax=None):
@@ -437,7 +720,19 @@ def month_plot(x, dates=None, ylabel=None, ax=None):

     .. plot:: plots/graphics_tsa_month_plot.py
     """
-    pass
+
+    if dates is None:
+        from statsmodels.tools.data import _check_period_index
+
+        _check_period_index(x, freq="M")
+    else:
+        x = pd.Series(x, index=pd.PeriodIndex(dates, freq="M"))
+
+    # there's no zero month
+    xticklabels = list(calendar.month_abbr)[1:]
+    return seasonal_plot(
+        x.groupby(lambda y: y.month), xticklabels, ylabel=ylabel, ax=ax
+    )
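
A brief sketch of month_plot on a synthetic monthly series with a PeriodIndex (required when dates is None); the data are illustrative assumptions.

import numpy as np
import pandas as pd
from statsmodels.graphics.tsaplots import month_plot

idx = pd.period_range("2000-01", periods=120, freq="M")
m = np.asarray(idx.month)
vals = 10 + 3 * np.sin(2 * np.pi * m / 12)
vals = vals + np.random.default_rng(0).normal(size=120)
fig = month_plot(pd.Series(vals, index=idx))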


 def quarter_plot(x, dates=None, ylabel=None, ax=None):
@@ -478,11 +773,29 @@ def quarter_plot(x, dates=None, ylabel=None, ax=None):

     .. plot:: plots/graphics_tsa_quarter_plot.py
     """
-    pass
+
+    if dates is None:
+        from statsmodels.tools.data import _check_period_index
+
+        _check_period_index(x, freq="Q")
+    else:
+        x = pd.Series(x, index=pd.PeriodIndex(dates, freq="Q"))
+
+    xticklabels = ["q1", "q2", "q3", "q4"]
+    return seasonal_plot(
+        x.groupby(lambda y: y.quarter), xticklabels, ylabel=ylabel, ax=ax
+    )


-def plot_predict(result, start=None, end=None, dynamic=False, alpha=0.05,
-    ax=None, **predict_kwargs):
+def plot_predict(
+    result,
+    start=None,
+    end=None,
+    dynamic=False,
+    alpha=0.05,
+    ax=None,
+    **predict_kwargs,
+):
     """

     Parameters
@@ -523,4 +836,38 @@ def plot_predict(result, start=None, end=None, dynamic=False, alpha=0.05,
     Figure
         matplotlib Figure containing the prediction plot
     """
-    pass
+    from statsmodels.graphics.utils import _import_mpl, create_mpl_ax
+
+    _ = _import_mpl()
+    fig, ax = create_mpl_ax(ax)
+    from statsmodels.tsa.base.prediction import PredictionResults
+
+    # use predict so you set dates
+    pred: PredictionResults = result.get_prediction(
+        start=start, end=end, dynamic=dynamic, **predict_kwargs
+    )
+    mean = pred.predicted_mean
+    if isinstance(mean, (pd.Series, pd.DataFrame)):
+        x = mean.index
+        mean.plot(ax=ax, label="forecast")
+    else:
+        x = np.arange(mean.shape[0])
+        ax.plot(x, mean)
+
+    if alpha is not None:
+        label = f"{1-alpha:.0%} confidence interval"
+        ci = pred.conf_int(alpha)
+        conf_int = np.asarray(ci)
+
+        ax.fill_between(
+            x,
+            conf_int[:, 0],
+            conf_int[:, 1],
+            color="gray",
+            alpha=0.5,
+            label=label,
+        )
+
+    ax.legend(loc="best")
+
+    return fig
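
A sketch of plot_predict with an ARIMA result; any results object exposing get_prediction should work the same way. The simulated random-walk data are illustrative assumptions.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_predict

rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.normal(size=200)),
              index=pd.date_range("2000-01-01", periods=200, freq="D"))
res = ARIMA(y, order=(1, 1, 0)).fit()
fig = plot_predict(res, start=150, end=230, alpha=0.05)  # band = 95% interval
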
diff --git a/statsmodels/graphics/tukeyplot.py b/statsmodels/graphics/tukeyplot.py
index da5ce94ee..55a3c69af 100644
--- a/statsmodels/graphics/tukeyplot.py
+++ b/statsmodels/graphics/tukeyplot.py
@@ -1,8 +1,75 @@
 import matplotlib.lines as lines
 import matplotlib.pyplot as plt
 import numpy as np
-results = np.array([[-10.04391794, 26.34391794], [-21.45225794, 14.93557794
-    ], [5.61441206, 42.00224794], [-13.40225794, 22.98557794], [-
-    29.60225794, 6.78557794], [-2.53558794, 33.85224794], [-21.55225794, 
-    14.83557794], [8.87275206, 45.26058794], [-10.14391794, 26.24391794], [
-    -37.21058794, -0.82275206]])
+
+
+def tukeyplot(results, dim=None, yticklabels=None):
+    npairs = len(results)
+
+    fig = plt.figure()
+    fsp = fig.add_subplot(111)
+    fsp.axis([-50,50,0.5,10.5])
+    fsp.set_title('95 % family-wise confidence level')
+    fsp.title.set_y(1.025)
+    fsp.set_yticks(np.arange(1,11))
+    fsp.set_yticklabels(['V-T','V-S','T-S','V-P','T-P','S-P','V-M',
+                         'T-M','S-M','P-M'])
+    #fsp.yaxis.set_major_locator(mticker.MaxNLocator(npairs))
+    fsp.yaxis.grid(True, linestyle='-', color='gray')
+    fsp.set_xlabel('Differences in mean levels of Var', labelpad=8)
+    fsp.xaxis.tick_bottom()
+    fsp.yaxis.tick_left()
+
+    xticklines = fsp.get_xticklines()
+    for xtickline in xticklines:
+        xtickline.set_marker(lines.TICKDOWN)
+        xtickline.set_markersize(10)
+
+    xlabels = fsp.get_xticklabels()
+    for xlabel in xlabels:
+        xlabel.set_y(-.04)
+
+    yticklines = fsp.get_yticklines()
+    for ytickline in yticklines:
+        ytickline.set_marker(lines.TICKLEFT)
+        ytickline.set_markersize(10)
+
+    ylabels = fsp.get_yticklabels()
+    for ylabel in ylabels:
+        ylabel.set_x(-.04)
+
+    for pair in range(npairs):
+        data = .5+results[pair]/100.
+        #fsp.axhline(y=npairs-pair, xmin=data[0], xmax=data[1], linewidth=1.25,
+        fsp.axhline(y=npairs-pair, xmin=data.mean(), xmax=data[1], linewidth=1.25,
+            color='blue', marker="|",  markevery=1)
+
+        fsp.axhline(y=npairs-pair, xmin=data[0], xmax=data.mean(), linewidth=1.25,
+            color='blue', marker="|", markevery=1)
+
+    #for pair in range(npairs):
+    #    data = .5+results[pair]/100.
+    #    data = results[pair]
+    #    data = np.r_[data[0],data.mean(),data[1]]
+    #    l = plt.plot(data, [npairs-pair]*len(data), color='black',
+    #                linewidth=.5, marker="|", markevery=1)
+
+    fsp.axvline(x=0, linestyle="--", color='black')
+
+    fig.subplots_adjust(bottom=.125)
+
+
+
+results = np.array([[-10.04391794,  26.34391794],
+      [-21.45225794,  14.93557794],
+      [  5.61441206,  42.00224794],
+      [-13.40225794,  22.98557794],
+      [-29.60225794,   6.78557794],
+      [ -2.53558794,  33.85224794],
+      [-21.55225794,  14.83557794],
+      [  8.87275206,  45.26058794],
+      [-10.14391794,  26.24391794],
+      [-37.21058794,  -0.82275206]])
+
+
+#plt.show()
diff --git a/statsmodels/graphics/utils.py b/statsmodels/graphics/utils.py
index fc6424ee5..a8bbe99e1 100644
--- a/statsmodels/graphics/utils.py
+++ b/statsmodels/graphics/utils.py
@@ -1,11 +1,17 @@
 """Helper functions for graphics with Matplotlib."""
 from statsmodels.compat.python import lrange
+
 __all__ = ['create_mpl_ax', 'create_mpl_fig']


 def _import_mpl():
     """This function is not needed outside this utils module."""
-    pass
+    try:
+        import matplotlib.pyplot as plt
+    except ImportError:
+        raise ImportError("Matplotlib is not found.")
+
+    return plt


 def create_mpl_ax(ax=None):
@@ -44,7 +50,14 @@ def create_mpl_ax(ax=None):
     >>> from statsmodels.graphics import utils
     >>> fig, ax = utils.create_mpl_ax(ax)
     """
-    pass
+    if ax is None:
+        plt = _import_mpl()
+        fig = plt.figure()
+        ax = fig.add_subplot(111)
+    else:
+        fig = ax.figure
+
+    return fig, ax


 def create_mpl_fig(fig=None, figsize=None):
@@ -69,7 +82,11 @@ def create_mpl_fig(fig=None, figsize=None):
     --------
     create_mpl_ax
     """
-    pass
+    if fig is None:
+        plt = _import_mpl()
+        fig = plt.figure(figsize=figsize)
+
+    return fig


 def maybe_name_or_idx(idx, model):
@@ -77,7 +94,24 @@ def maybe_name_or_idx(idx, model):
     Give a name or an integer and return the name and integer location of the
     column in a design matrix.
     """
-    pass
+    if idx is None:
+        idx = lrange(model.exog.shape[1])
+    if isinstance(idx, int):
+        exog_name = model.exog_names[idx]
+        exog_idx = idx
+    # anticipate index as list and recurse
+    elif isinstance(idx, (tuple, list)):
+        exog_name = []
+        exog_idx = []
+        for item in idx:
+            exog_name_item, exog_idx_item = maybe_name_or_idx(item, model)
+            exog_name.append(exog_name_item)
+            exog_idx.append(exog_idx_item)
+    else: # assume we've got a string variable
+        exog_name = idx
+        exog_idx = model.exog_names.index(idx)
+
+    return exog_name, exog_idx


 def get_data_names(series_or_dataframe):
@@ -85,7 +119,18 @@ def get_data_names(series_or_dataframe):
     Input can be an array or pandas-like. Will handle 1d array-like but not
     2d. Returns a str for 1d data or a list of strings for 2d data.
     """
-    pass
+    names = getattr(series_or_dataframe, 'name', None)
+    if not names:
+        names = getattr(series_or_dataframe, 'columns', None)
+    if not names:
+        shape = getattr(series_or_dataframe, 'shape', [1])
+        nvars = 1 if len(shape) == 1 else series_or_dataframe.shape[1]
+        names = ["X%d" for _ in range(nvars)]
+        if nvars == 1:
+            names = names[0]
+    else:
+        names = names.tolist()
+    return names


 def annotate_axes(index, labels, points, offset_points, size, ax, **kwargs):
@@ -93,4 +138,10 @@ def annotate_axes(index, labels, points, offset_points, size, ax, **kwargs):
     Annotate Axes with labels, points, offset_points according to the
     given index.
     """
-    pass
+    for i in index:
+        label = labels[i]
+        point = points[i]
+        offset = offset_points[i]
+        ax.annotate(label, point, xytext=offset, textcoords="offset points",
+                    size=size, **kwargs)
+    return ax
diff --git a/statsmodels/imputation/bayes_mi.py b/statsmodels/imputation/bayes_mi.py
index a63136857..524df261a 100644
--- a/statsmodels/imputation/bayes_mi.py
+++ b/statsmodels/imputation/bayes_mi.py
@@ -53,25 +53,32 @@ class BayesGaussMI:
     """

     def __init__(self, data, mean_prior=None, cov_prior=None, cov_prior_df=1):
+
         self.exog_names = None
         if type(data) is pd.DataFrame:
             self.exog_names = data.columns
-        data = np.require(data, requirements='W')
+
+        data = np.require(data, requirements="W")
         self.data = data
         self._data = data
         self.mask = np.isnan(data)
         self.nobs = self.mask.shape[0]
         self.nvar = self.mask.shape[1]
+
+        # Identify all distinct missing data patterns
         z = 1 + np.log(1 + np.arange(self.mask.shape[1]))
         c = np.dot(self.mask, z)
         rowmap = {}
         for i, v in enumerate(c):
             if v == 0:
+                # No missing values
                 continue
             if v not in rowmap:
                 rowmap[v] = []
             rowmap[v].append(i)
         self.patterns = [np.asarray(v) for v in rowmap.values()]
+
+        # Simple starting values for mean and covariance
         p = self._data.shape[1]
         self.cov = np.eye(p)
         mean = []
@@ -79,29 +86,69 @@ class BayesGaussMI:
             v = self._data[:, i]
             v = v[np.isfinite(v)]
             if len(v) == 0:
-                msg = 'Column %d has no observed values' % i
+                msg = "Column %d has no observed values" % i
                 raise ValueError(msg)
             mean.append(v.mean())
         self.mean = np.asarray(mean)
+
+        # Default covariance matrix of the (Gaussian) mean prior
         if mean_prior is None:
             mean_prior = np.eye(p)
         self.mean_prior = mean_prior
+
+        # Default center matrix of the (inverse Wishart) covariance prior
         if cov_prior is None:
             cov_prior = np.eye(p)
         self.cov_prior = cov_prior
+
+        # Degrees of freedom for the (inverse Wishart) covariance prior
         self.cov_prior_df = cov_prior_df

     def update(self):
         """
         Cycle through all Gibbs updates.
         """
-        pass
+
+        self.update_data()
+
+        # Need to update data first
+        self.update_mean()
+        self.update_cov()

     def update_data(self):
         """
         Gibbs update of the missing data values.
         """
-        pass
+
+        for ix in self.patterns:
+
+            i = ix[0]
+            ix_miss = np.flatnonzero(self.mask[i, :])
+            ix_obs = np.flatnonzero(~self.mask[i, :])
+
+            mm = self.mean[ix_miss]
+            mo = self.mean[ix_obs]
+
+            voo = self.cov[ix_obs, :][:, ix_obs]
+            vmm = self.cov[ix_miss, :][:, ix_miss]
+            vmo = self.cov[ix_miss, :][:, ix_obs]
+
+            r = self._data[ix, :][:, ix_obs] - mo
+            cm = mm + np.dot(vmo, np.linalg.solve(voo, r.T)).T
+            cv = vmm - np.dot(vmo, np.linalg.solve(voo, vmo.T))
+
+            cs = np.linalg.cholesky(cv)
+            u = np.random.normal(size=(len(ix), len(ix_miss)))
+            self._data[np.ix_(ix, ix_miss)] = cm + np.dot(u, cs.T)
+
+        # Set the user-visible data set.
+        if self.exog_names is not None:
+            self.data = pd.DataFrame(
+                           self._data,
+                           columns=self.exog_names,
+                           copy=False)
+        else:
+            self.data = self._data

     def update_mean(self):
         """
@@ -109,7 +156,20 @@ class BayesGaussMI:

         Do not call until update_data has been called once.
         """
-        pass
+        # https://stats.stackexchange.com/questions/28744/multivariate-normal-posterior
+
+        # Posterior covariance matrix of the mean
+        cm = np.linalg.solve(self.cov/self.nobs + self.mean_prior,
+                             self.mean_prior / self.nobs)
+        cm = np.dot(self.cov, cm)
+
+        # Posterior mean of the mean
+        vm = np.linalg.solve(self.cov, self._data.sum(0))
+        vm = np.dot(cm, vm)
+
+        # Sample
+        r = np.linalg.cholesky(cm)
+        self.mean = vm + np.dot(r, np.random.normal(0, 1, self.nvar))

     def update_cov(self):
         """
@@ -117,7 +177,17 @@ class BayesGaussMI:

         Do not call until update_data has been called once.
         """
-        pass
+        # https://stats.stackexchange.com/questions/50844/estimating-the-covariance-posterior-distribution-of-a-multivariate-gaussian
+
+        r = self._data - self.mean
+        gr = np.dot(r.T, r)
+        a = gr + self.cov_prior
+        df = int(np.ceil(self.nobs + self.cov_prior_df))
+
+        r = np.linalg.cholesky(np.linalg.inv(a))
+        x = np.dot(np.random.normal(size=(df, self.nvar)), r.T)
+        ma = np.dot(x.T, x)
+        self.cov = np.linalg.inv(ma)
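
A short sketch of running the Gibbs updates directly; the synthetic data and missingness rate are illustrative (in practice the sampler is usually driven through the MI class below).

import numpy as np
from statsmodels.imputation.bayes_mi import BayesGaussMI

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))
data[rng.random(size=(100, 3)) < 0.1] = np.nan   # ~10% missing at random

bm = BayesGaussMI(data)
for _ in range(20):
    bm.update()          # update_data, then update_mean, then update_cov
print(bm.mean)           # current draw of the posterior mean
print(bm.cov)            # current draw of the posterior covariance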


 class MI:
@@ -167,39 +237,49 @@ class MI:
     """

     def __init__(self, imp, model, model_args_fn=None, model_kwds_fn=None,
-        formula=None, fit_args=None, fit_kwds=None, xfunc=None, burn=100,
-        nrep=20, skip=10):
+                 formula=None, fit_args=None, fit_kwds=None, xfunc=None,
+                 burn=100, nrep=20, skip=10):
+
+        # The imputer
         self.imp = imp
+
+        # The number of imputed data sets to skip between each imputed
+        # data set that is used in the analysis.
         self.skip = skip
+
+        # The model class
         self.model = model
         self.formula = formula
-        if model_args_fn is None:

+        if model_args_fn is None:
             def f(x):
                 return []
             model_args_fn = f
         self.model_args_fn = model_args_fn
-        if model_kwds_fn is None:

+        if model_kwds_fn is None:
             def f(x):
                 return {}
             model_kwds_fn = f
         self.model_kwds_fn = model_kwds_fn
-        if fit_args is None:

+        if fit_args is None:
             def f(x):
                 return []
             fit_args = f
         self.fit_args = fit_args
-        if fit_kwds is None:

+        if fit_kwds is None:
             def f(x):
                 return {}
             fit_kwds = f
         self.fit_kwds = fit_kwds
+
         self.xfunc = xfunc
         self.nrep = nrep
         self.skip = skip
+
+        # Burn-in
         for k in range(burn):
             imp.update()

@@ -219,7 +299,70 @@ class MI:
         -------
         A MIResults object.
         """
-        pass
+
+        par, cov = [], []
+        all_results = []
+
+        for k in range(self.nrep):
+
+            for k in range(self.skip+1):
+                self.imp.update()
+
+            da = self.imp.data
+
+            if self.xfunc is not None:
+                da = self.xfunc(da)
+
+            if self.formula is None:
+                model = self.model(*self.model_args_fn(da),
+                                   **self.model_kwds_fn(da))
+            else:
+                model = self.model.from_formula(
+                          self.formula, *self.model_args_fn(da),
+                          **self.model_kwds_fn(da))
+
+            result = model.fit(*self.fit_args(da), **self.fit_kwds(da))
+
+            if results_cb is not None:
+                all_results.append(results_cb(result))
+
+            par.append(np.asarray(result.params.copy()))
+            cov.append(np.asarray(result.cov_params().copy()))
+
+        params, cov_params, fmi = self._combine(par, cov)
+
+        r = MIResults(self, model, params, cov_params)
+        r.fmi = fmi
+
+        r.results = all_results
+
+        return r
+
+    def _combine(self, par, cov):
+        # Helper function to apply "Rubin's combining rule"
+
+        par = np.asarray(par)
+
+        # Number of imputations
+        m = par.shape[0]
+
+        # Point estimate
+        params = par.mean(0)
+
+        # Within-imputation covariance
+        wcov = sum(cov) / len(cov)
+
+        # Between-imputation covariance
+        bcov = np.cov(par.T)
+        bcov = np.atleast_2d(bcov)
+
+        # Overall covariance
+        covp = wcov + (1 + 1/float(m))*bcov
+
+        # Fraction of missing information
+        fmi = (1 + 1/float(m)) * np.diag(bcov) / np.diag(covp)
+
+        return params, covp, fmi
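
In notation, `_combine` is Rubin's combining rule: with $m$ imputations giving estimates $\hat\theta_i$ and covariance matrices $W_i$,

$$ \bar\theta = \frac{1}{m}\sum_{i=1}^{m}\hat\theta_i,\qquad \bar W = \frac{1}{m}\sum_{i=1}^{m} W_i,\qquad B = \frac{1}{m-1}\sum_{i=1}^{m}(\hat\theta_i-\bar\theta)(\hat\theta_i-\bar\theta)^{\top}, $$
$$ T = \bar W + \Bigl(1+\tfrac{1}{m}\Bigr)B,\qquad \mathrm{FMI}_j = \frac{(1+1/m)\,B_{jj}}{T_{jj}}. $$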


 class MIResults(LikelihoodModelResults):
@@ -241,11 +384,12 @@ class MIResults(LikelihoodModelResults):
     """

     def __init__(self, mi, model, params, normalized_cov_params):
+
         super(MIResults, self).__init__(model, params, normalized_cov_params)
         self.mi = mi
         self._model = model

-    def summary(self, title=None, alpha=0.05):
+    def summary(self, title=None, alpha=.05):
         """
         Summarize the results of running multiple imputation.

@@ -263,4 +407,25 @@ class MIResults(LikelihoodModelResults):
             This holds the summary tables and text, which can be
             printed or converted to various output formats.
         """
-        pass
+
+        from statsmodels.iolib import summary2
+
+        smry = summary2.Summary()
+        float_format = "%8.3f"
+
+        info = {}
+        info["Method:"] = "MI"
+        info["Model:"] = self.mi.model.__name__
+        info["Dependent variable:"] = self._model.endog_names
+        info["Sample size:"] = "%d" % self.mi.imp.data.shape[0]
+        info["Num. imputations"] = "%d" % self.mi.nrep
+
+        smry.add_dict(info, align='l', float_format=float_format)
+
+        param = summary2.summary_params(self, alpha=alpha)
+        param["FMI"] = self.fmi
+
+        smry.add_df(param, float_format=float_format)
+        smry.add_title(title=title, results=self)
+
+        return smry
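
A sketch of the full pipeline: a BayesGaussMI imputer feeding OLS fits that are pooled by MI; the data, seed, and model_args helper are illustrative assumptions.

import numpy as np
import statsmodels.api as sm
from statsmodels.imputation.bayes_mi import BayesGaussMI, MI

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
y = 1 + x @ [0.5, -0.25] + rng.normal(size=200)
data = np.column_stack([y, x])
data[rng.random(data.shape) < 0.1] = np.nan

def model_args(d):
    # map each imputed array to the (endog, exog) positional args of OLS
    return (d[:, 0], sm.add_constant(d[:, 1:]))

mi = MI(BayesGaussMI(data), sm.OLS, model_args_fn=model_args, burn=50, nrep=10)
print(mi.fit().summary())
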
diff --git a/statsmodels/imputation/mice.py b/statsmodels/imputation/mice.py
index 3635d4447..960d7818b 100644
--- a/statsmodels/imputation/mice.py
+++ b/statsmodels/imputation/mice.py
@@ -114,12 +114,15 @@ A Gelman et al.: 'Multiple Imputation with Diagnostics (mi) in R:
 Opening Windows into the Black Box', Journal of Statistical Software,
 2009.
 """
+
 import pandas as pd
 import numpy as np
 import patsy
 from statsmodels.base.model import LikelihoodModelResults
 from statsmodels.regression.linear_model import OLS
 from collections import defaultdict
+
+
 _mice_data_example_1 = """
     >>> imp = mice.MICEData(data)
     >>> imp.set_imputer('x1', formula='x2 + np.square(x2) + x3')
@@ -132,14 +135,14 @@ class PatsyFormula:
     """
     A simple wrapper for a string to be interpreted as a Patsy formula.
     """
-
     def __init__(self, formula):
-        self.formula = '0 + ' + formula
+        self.formula = "0 + " + formula


 class MICEData:
-    __doc__ = (
-        """    Wrap a data set to allow missing data handling with MICE.
+
+    __doc__ = """\
+    Wrap a data set to allow missing data handling with MICE.

     Parameters
     ----------
@@ -176,42 +179,72 @@ class MICEData:
     `data`.  The variable named `x1` has a conditional mean structure
     that includes an additional term for x2^2.
     %(_mice_data_example_1)s
-    """
-         % {'_mice_data_example_1': _mice_data_example_1})
+    """ % {'_mice_data_example_1': _mice_data_example_1}
+
+    def __init__(self, data, perturbation_method='gaussian',
+                 k_pmm=20, history_callback=None):

-    def __init__(self, data, perturbation_method='gaussian', k_pmm=20,
-        history_callback=None):
         if data.columns.dtype != np.dtype('O'):
-            msg = 'MICEData data column names should be string type'
+            msg = "MICEData data column names should be string type"
             raise ValueError(msg)
+
         self.regularized = dict()
+
+        # Drop observations where all variables are missing.  This
+        # also has the effect of copying the data frame.
         self.data = data.dropna(how='all').reset_index(drop=True)
+
         self.history_callback = history_callback
         self.history = []
         self.predict_kwds = {}
-        self.perturbation_method = defaultdict(lambda : perturbation_method)
+
+        # Assign the same perturbation method for all variables.
+        # Can be overridden when calling 'set_imputer'.
+        self.perturbation_method = defaultdict(lambda:
+                                               perturbation_method)
+
+        # Map from variable name to indices of observed/missing
+        # values.
         self.ix_obs = {}
         self.ix_miss = {}
         for col in self.data.columns:
             ix_obs, ix_miss = self._split_indices(self.data[col])
             self.ix_obs[col] = ix_obs
             self.ix_miss[col] = ix_miss
+
+        # Most recent model instance and results instance for each variable.
         self.models = {}
         self.results = {}
+
+        # Map from variable names to the conditional formula.
         self.conditional_formula = {}
+
+        # Map from variable names to init/fit args of the conditional
+        # models.
         self.init_kwds = defaultdict(dict)
         self.fit_kwds = defaultdict(dict)
+
+        # Map from variable names to the model class.
         self.model_class = {}
+
+        # Map from variable names to most recent params update.
         self.params = {}
+
+        # Set default imputers.
         for vname in data.columns:
             self.set_imputer(vname)
+
+        # The order in which variables are imputed in each cycle.
+        # Impute variables with the fewest missing values first.
         vnames = list(data.columns)
         nmiss = [len(self.ix_miss[v]) for v in vnames]
         nmiss = np.asarray(nmiss)
         ii = np.argsort(nmiss)
         ii = ii[sum(nmiss == 0):]
         self._cycle_order = [vnames[i] for i in ii]
+
         self._initial_imputation()
+
         self.k_pmm = k_pmm

     def next_sample(self):
@@ -232,7 +265,9 @@ class MICEData:
         The returned value is a reference to the data attribute of
         the class and should be copied before making any changes.
         """
-        pass
+
+        self.update_all(1)
+        return self.data
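
A sketch of drawing successive imputed data sets; the DataFrame and the missingness rule are illustrative (column names must be strings, as checked in __init__).

import numpy as np
import pandas as pd
from statsmodels.imputation.mice import MICEData

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
df[df.abs() > 1.8] = np.nan          # introduce some missing values

imp = MICEData(df)
for _ in range(3):
    imputed = imp.next_sample().copy()   # copy: a reference to imp.data is returned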

     def _initial_imputation(self):
         """
@@ -241,11 +276,27 @@ class MICEData:
         For each variable, missing values are imputed as the observed
         value that is closest to the mean over all observed values.
         """
-        pass
+        # Changed for pandas 2.0 copy-on-write behavior to use a single
+        # in-place fill
+        imp_values = {}
+        for col in self.data.columns:
+            di = self.data[col] - self.data[col].mean()
+            di = np.abs(di)
+            ix = di.idxmin()
+            imp_values[col] = self.data[col].loc[ix]
+        self.data.fillna(imp_values, inplace=True)
+
+    def _split_indices(self, vec):
+        null = pd.isnull(vec)
+        ix_obs = np.flatnonzero(~null)
+        ix_miss = np.flatnonzero(null)
+        if len(ix_obs) == 0:
+            raise ValueError("variable to be imputed has no observed values")
+        return ix_obs, ix_miss

     def set_imputer(self, endog_name, formula=None, model_class=None,
-        init_kwds=None, fit_kwds=None, predict_kwds=None, k_pmm=20,
-        perturbation_method=None, regularized=False):
+                    init_kwds=None, fit_kwds=None, predict_kwds=None,
+                    k_pmm=20, perturbation_method=None, regularized=False):
         """
         Specify the imputation process for a single variable.

@@ -290,7 +341,35 @@ class MICEData:
               that returns a square array-like object.
             * The model must have a `predict` method.
         """
-        pass
+
+        if formula is None:
+            main_effects = [x for x in self.data.columns
+                            if x != endog_name]
+            fml = endog_name + " ~ " + " + ".join(main_effects)
+            self.conditional_formula[endog_name] = fml
+        else:
+            fml = endog_name + " ~ " + formula
+            self.conditional_formula[endog_name] = fml
+
+        if model_class is None:
+            self.model_class[endog_name] = OLS
+        else:
+            self.model_class[endog_name] = model_class
+
+        if init_kwds is not None:
+            self.init_kwds[endog_name] = init_kwds
+
+        if fit_kwds is not None:
+            self.fit_kwds[endog_name] = fit_kwds
+
+        if predict_kwds is not None:
+            self.predict_kwds[endog_name] = predict_kwds
+
+        if perturbation_method is not None:
+            self.perturbation_method[endog_name] = perturbation_method
+
+        self.k_pmm = k_pmm
+        self.regularized[endog_name] = regularized

     def _store_changes(self, col, vals):
         """
@@ -303,7 +382,10 @@ class MICEData:
         vals : ndarray
             Array of imputed values to use for filling-in missing values.
         """
-        pass
+
+        ix = self.ix_miss[col]
+        if len(ix) > 0:
+            self.data.iloc[ix, self.data.columns.get_loc(col)] = np.atleast_1d(vals)

     def update_all(self, n_iter=1):
         """
@@ -319,7 +401,14 @@ class MICEData:
         -----
         The imputed values are stored in the class attribute `self.data`.
         """
-        pass
+
+        for k in range(n_iter):
+            for vname in self._cycle_order:
+                self.update(vname)
+
+        if self.history_callback is not None:
+            hv = self.history_callback(self)
+            self.history.append(hv)

     def get_split_data(self, vname):
         """
@@ -347,7 +436,45 @@ class MICEData:
             The fit keyword arguments for `vname`, processed through Patsy
             as required.
         """
-        pass
+
+        formula = self.conditional_formula[vname]
+        endog, exog = patsy.dmatrices(formula, self.data,
+                                      return_type="dataframe")
+
+        # Rows with observed endog
+        ixo = self.ix_obs[vname]
+        endog_obs = np.require(endog.iloc[ixo], requirements="W")
+        exog_obs = np.require(exog.iloc[ixo, :], requirements="W")
+
+        # Rows with missing endog
+        ixm = self.ix_miss[vname]
+        exog_miss = np.require(exog.iloc[ixm, :], requirements="W")
+
+        predict_obs_kwds = {}
+        if vname in self.predict_kwds:
+            kwds = self.predict_kwds[vname]
+            predict_obs_kwds = self._process_kwds(kwds, ixo)
+
+        predict_miss_kwds = {}
+        if vname in self.predict_kwds:
+            kwds = self.predict_kwds[vname]
+            predict_miss_kwds = self._process_kwds(kwds, ixm)
+
+        return (endog_obs, exog_obs, exog_miss, predict_obs_kwds,
+                predict_miss_kwds)
+
+    def _process_kwds(self, kwds, ix):
+        kwds = kwds.copy()
+        for k in kwds:
+            v = kwds[k]
+            if isinstance(v, PatsyFormula):
+                mat = patsy.dmatrix(v.formula, self.data,
+                                    return_type="dataframe")
+                mat = np.require(mat, requirements="W")[ix, :]
+                if mat.shape[1] == 1:
+                    mat = mat[:, 0]
+                kwds[k] = mat
+        return kwds

     def get_fitting_data(self, vname):
         """
@@ -378,11 +505,27 @@ class MICEData:
             The fit keyword arguments for `vname`, processed through Patsy
             as required.
         """
-        pass

-    def plot_missing_pattern(self, ax=None, row_order='pattern',
-        column_order='pattern', hide_complete_rows=False,
-        hide_complete_columns=False, color_row_patterns=True):
+        # Rows with observed endog
+        ix = self.ix_obs[vname]
+
+        formula = self.conditional_formula[vname]
+        endog, exog = patsy.dmatrices(formula, self.data,
+                                      return_type="dataframe")
+
+        endog = np.require(endog.iloc[ix, 0], requirements="W")
+        exog = np.require(exog.iloc[ix, :], requirements="W")
+
+        init_kwds = self._process_kwds(self.init_kwds[vname], ix)
+        fit_kwds = self._process_kwds(self.fit_kwds[vname], ix)
+
+        return endog, exog, init_kwds, fit_kwds
+
+    def plot_missing_pattern(self, ax=None, row_order="pattern",
+                             column_order="pattern",
+                             hide_complete_rows=False,
+                             hide_complete_columns=False,
+                             color_row_patterns=True):
         """
         Generate an image showing the missing data pattern.

@@ -408,10 +551,82 @@ class MICEData:
         -------
         A figure containing a plot of the missing data pattern.
         """
-        pass

-    def plot_bivariate(self, col1_name, col2_name, lowess_args=None,
-        lowess_min_n=40, jitter=None, plot_points=True, ax=None):
+        # Create an indicator matrix for missing values.
+        miss = np.zeros(self.data.shape)
+        cols = self.data.columns
+        for j, col in enumerate(cols):
+            ix = self.ix_miss[col]
+            miss[ix, j] = 1
+
+        # Order the columns as requested
+        if column_order == "proportion":
+            ix = np.argsort(miss.mean(0))
+        elif column_order == "pattern":
+            cv = np.cov(miss.T)
+            u, s, vt = np.linalg.svd(cv, 0)
+            ix = np.argsort(vt[0, :])
+        elif column_order == "raw":
+            ix = np.arange(len(cols))
+        else:
+            raise ValueError(
+                column_order + " is not an allowed value for `column_order`.")
+        miss = miss[:, ix]
+        cols = [cols[i] for i in ix]
+
+        # Order the rows as requested
+        if row_order == "proportion":
+            ix = np.argsort(miss.mean(1))
+        elif row_order == "pattern":
+            x = 2**np.arange(miss.shape[1])
+            rky = np.dot(miss, x)
+            ix = np.argsort(rky)
+        elif row_order == "raw":
+            ix = np.arange(miss.shape[0])
+        else:
+            raise ValueError(
+                row_order + " is not an allowed value for `row_order`.")
+        miss = miss[ix, :]
+
+        if hide_complete_rows:
+            ix = np.flatnonzero((miss == 1).any(1))
+            miss = miss[ix, :]
+
+        if hide_complete_columns:
+            ix = np.flatnonzero((miss == 1).any(0))
+            miss = miss[:, ix]
+            cols = [cols[i] for i in ix]
+
+        from statsmodels.graphics import utils as gutils
+        from matplotlib.colors import LinearSegmentedColormap
+
+        if ax is None:
+            fig, ax = gutils.create_mpl_ax(ax)
+        else:
+            fig = ax.get_figure()
+
+        if color_row_patterns:
+            x = 2**np.arange(miss.shape[1])
+            rky = np.dot(miss, x)
+            _, rcol = np.unique(rky, return_inverse=True)
+            miss *= 1 + rcol[:, None]
+            ax.imshow(miss, aspect="auto", interpolation="nearest",
+                      cmap='gist_ncar_r')
+        else:
+            cmap = LinearSegmentedColormap.from_list("_",
+                                                     ["white", "darkgrey"])
+            ax.imshow(miss, aspect="auto", interpolation="nearest",
+                      cmap=cmap)
+
+        ax.set_ylabel("Cases")
+        ax.set_xticks(range(len(cols)))
+        ax.set_xticklabels(cols, rotation=90)
+
+        return fig
+
+    def plot_bivariate(self, col1_name, col2_name,
+                       lowess_args=None, lowess_min_n=40,
+                       jitter=None, plot_points=True, ax=None):
         """
         Plot observed and imputed values for two variables.

@@ -444,10 +659,85 @@ class MICEData:
         -------
         The matplotlib figure on which the plot is drawn.
         """
-        pass

-    def plot_fit_obs(self, col_name, lowess_args=None, lowess_min_n=40,
-        jitter=None, plot_points=True, ax=None):
+        from statsmodels.graphics import utils as gutils
+        from statsmodels.nonparametric.smoothers_lowess import lowess
+
+        if lowess_args is None:
+            lowess_args = {}
+
+        if ax is None:
+            fig, ax = gutils.create_mpl_ax(ax)
+        else:
+            fig = ax.get_figure()
+
+        ax.set_position([0.1, 0.1, 0.7, 0.8])
+
+        ix1i = self.ix_miss[col1_name]
+        ix1o = self.ix_obs[col1_name]
+        ix2i = self.ix_miss[col2_name]
+        ix2o = self.ix_obs[col2_name]
+
+        ix_ii = np.intersect1d(ix1i, ix2i)
+        ix_io = np.intersect1d(ix1i, ix2o)
+        ix_oi = np.intersect1d(ix1o, ix2i)
+        ix_oo = np.intersect1d(ix1o, ix2o)
+
+        vec1 = np.require(self.data[col1_name], requirements="W")
+        vec2 = np.require(self.data[col2_name], requirements="W")
+
+        if jitter is not None:
+            if np.isscalar(jitter):
+                jitter = (jitter, jitter)
+            vec1 += jitter[0] * np.random.normal(size=len(vec1))
+            vec2 += jitter[1] * np.random.normal(size=len(vec2))
+
+        # Plot the points
+        keys = ['oo', 'io', 'oi', 'ii']
+        lak = {'i': 'imp', 'o': 'obs'}
+        ixs = {'ii': ix_ii, 'io': ix_io, 'oi': ix_oi, 'oo': ix_oo}
+        color = {'oo': 'grey', 'ii': 'red', 'io': 'orange',
+                 'oi': 'lime'}
+        if plot_points:
+            for ky in keys:
+                ix = ixs[ky]
+                lab = lak[ky[0]] + "/" + lak[ky[1]]
+                ax.plot(vec1[ix], vec2[ix], 'o', color=color[ky],
+                        label=lab, alpha=0.6)
+
+        # Plot the lowess fits
+        for ky in keys:
+            ix = ixs[ky]
+            if len(ix) < lowess_min_n:
+                continue
+            if ky in lowess_args:
+                la = lowess_args[ky]
+            else:
+                la = {}
+            ix = ixs[ky]
+            lfit = lowess(vec2[ix], vec1[ix], **la)
+            if plot_points:
+                ax.plot(lfit[:, 0], lfit[:, 1], '-', color=color[ky],
+                        alpha=0.6, lw=4)
+            else:
+                lab = lak[ky[0]] + "/" + lak[ky[1]]
+                ax.plot(lfit[:, 0], lfit[:, 1], '-', color=color[ky],
+                        alpha=0.6, lw=4, label=lab)
+
+        ha, la = ax.get_legend_handles_labels()
+        pad = 0.0001 if plot_points else 0.5
+        leg = fig.legend(ha, la, loc='center right', numpoints=1,
+                         handletextpad=pad)
+        leg.draw_frame(False)
+
+        ax.set_xlabel(col1_name)
+        ax.set_ylabel(col2_name)
+
+        return fig
+
+    def plot_fit_obs(self, col_name, lowess_args=None,
+                     lowess_min_n=40, jitter=None,
+                     plot_points=True, ax=None):
         """
         Plot fitted versus imputed or observed values as a scatterplot.

@@ -474,10 +764,75 @@ class MICEData:
         -------
         The matplotlib figure on which the plot is drawn.
         """
-        pass
+
+        from statsmodels.graphics import utils as gutils
+        from statsmodels.nonparametric.smoothers_lowess import lowess
+
+        if lowess_args is None:
+            lowess_args = {}
+
+        if ax is None:
+            fig, ax = gutils.create_mpl_ax(ax)
+        else:
+            fig = ax.get_figure()
+
+        ax.set_position([0.1, 0.1, 0.7, 0.8])
+
+        ixi = self.ix_miss[col_name]
+        ixo = self.ix_obs[col_name]
+
+        vec1 = np.require(self.data[col_name], requirements="W")
+
+        # Fitted values
+        formula = self.conditional_formula[col_name]
+        endog, exog = patsy.dmatrices(formula, self.data,
+                                      return_type="dataframe")
+        results = self.results[col_name]
+        vec2 = results.predict(exog=exog)
+        vec2 = self._get_predicted(vec2)
+
+        if jitter is not None:
+            if np.isscalar(jitter):
+                jitter = (jitter, jitter)
+            vec1 += jitter[0] * np.random.normal(size=len(vec1))
+            vec2 += jitter[1] * np.random.normal(size=len(vec2))
+
+        # Plot the points
+        keys = ['o', 'i']
+        ixs = {'o': ixo, 'i': ixi}
+        lak = {'o': 'obs', 'i': 'imp'}
+        color = {'o': 'orange', 'i': 'lime'}
+        if plot_points:
+            for ky in keys:
+                ix = ixs[ky]
+                ax.plot(vec1[ix], vec2[ix], 'o', color=color[ky],
+                        label=lak[ky], alpha=0.6)
+
+        # Plot the lowess fits
+        for ky in keys:
+            ix = ixs[ky]
+            if len(ix) < lowess_min_n:
+                continue
+            if ky in lowess_args:
+                la = lowess_args[ky]
+            else:
+                la = {}
+            ix = ixs[ky]
+            lfit = lowess(vec2[ix], vec1[ix], **la)
+            ax.plot(lfit[:, 0], lfit[:, 1], '-', color=color[ky],
+                    alpha=0.6, lw=4, label=lak[ky])
+
+        ha, la = ax.get_legend_handles_labels()
+        leg = fig.legend(ha, la, loc='center right', numpoints=1)
+        leg.draw_frame(False)
+
+        ax.set_xlabel(col_name + " observed or imputed")
+        ax.set_ylabel(col_name + " fitted")
+
+        return fig

     def plot_imputed_hist(self, col_name, ax=None, imp_hist_args=None,
-        obs_hist_args=None, all_hist_args=None):
+                          obs_hist_args=None, all_hist_args=None):
         """
         Display imputed values for one variable as a histogram.

@@ -502,13 +857,97 @@ class MICEData:
         -------
         The matplotlib figure on which the histograms were drawn.
         """
-        pass
+
+        from statsmodels.graphics import utils as gutils
+
+        if imp_hist_args is None:
+            imp_hist_args = {}
+        if obs_hist_args is None:
+            obs_hist_args = {}
+        if all_hist_args is None:
+            all_hist_args = {}
+
+        if ax is None:
+            fig, ax = gutils.create_mpl_ax(ax)
+        else:
+            fig = ax.get_figure()
+
+        ax.set_position([0.1, 0.1, 0.7, 0.8])
+
+        ixm = self.ix_miss[col_name]
+        ixo = self.ix_obs[col_name]
+
+        imp = self.data[col_name].iloc[ixm]
+        obs = self.data[col_name].iloc[ixo]
+
+        for di in imp_hist_args, obs_hist_args, all_hist_args:
+            if 'histtype' not in di:
+                di['histtype'] = 'step'
+
+        ha, la = [], []
+        if len(imp) > 0:
+            h = ax.hist(np.asarray(imp), **imp_hist_args)
+            ha.append(h[-1][0])
+            la.append("Imp")
+        h1 = ax.hist(np.asarray(obs), **obs_hist_args)
+        h2 = ax.hist(np.asarray(self.data[col_name]), **all_hist_args)
+        ha.extend([h1[-1][0], h2[-1][0]])
+        la.extend(["Obs", "All"])
+
+        leg = fig.legend(ha, la, loc='center right', numpoints=1)
+        leg.draw_frame(False)
+
+        ax.set_xlabel(col_name)
+        ax.set_ylabel("Frequency")
+
+        return fig
+
+    # Try to identify any auxiliary arrays (e.g. status vector in
+    # PHReg) that need to be bootstrapped along with exog and endog.
+    def _boot_kwds(self, kwds, rix):
+
+        for k in kwds:
+            v = kwds[k]
+
+            # This is only relevant for ndarrays
+            if not isinstance(v, np.ndarray):
+                continue
+
+            # Handle 1d vectors
+            if (v.ndim == 1) and (v.shape[0] == len(rix)):
+                kwds[k] = v[rix]
+
+            # Handle 2d arrays
+            if (v.ndim == 2) and (v.shape[0] == len(rix)):
+                kwds[k] = v[rix, :]
+
+        return kwds

     def _perturb_bootstrap(self, vname):
         """
         Perturbs the model's parameters using a bootstrap.
         """
-        pass
+
+        endog, exog, init_kwds, fit_kwds = self.get_fitting_data(vname)
+
+        m = len(endog)
+        rix = np.random.randint(0, m, m)
+        endog = endog[rix]
+        exog = exog[rix, :]
+
+        init_kwds = self._boot_kwds(init_kwds, rix)
+        fit_kwds = self._boot_kwds(fit_kwds, rix)
+
+        klass = self.model_class[vname]
+        self.models[vname] = klass(endog, exog, **init_kwds)
+
+        if vname in self.regularized and self.regularized[vname]:
+            self.results[vname] = (
+                self.models[vname].fit_regularized(**fit_kwds))
+        else:
+            self.results[vname] = self.models[vname].fit(**fit_kwds)
+
+        self.params[vname] = self.results[vname].params

     def _perturb_gaussian(self, vname):
         """
@@ -518,7 +957,30 @@ class MICEData:
         parameter estimates is used to define the mean and covariance
         structure of the perturbation distribution.
         """
-        pass
+
+        endog, exog, init_kwds, fit_kwds = self.get_fitting_data(vname)
+
+        klass = self.model_class[vname]
+        self.models[vname] = klass(endog, exog, **init_kwds)
+        self.results[vname] = self.models[vname].fit(**fit_kwds)
+
+        cov = self.results[vname].cov_params()
+        mu = self.results[vname].params
+        self.params[vname] = np.random.multivariate_normal(mean=mu, cov=cov)
+
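The Gaussian perturbation above is simply a draw from the asymptotic distribution N(params, cov_params) of the fitted imputation model. A self-contained sketch of the same step, using a hypothetical OLS fit in place of the imputation model (made-up data):

    import numpy as np
    import statsmodels.api as sm

    # made-up data; any fitted statsmodels results object would work here
    x = sm.add_constant(np.random.standard_normal((100, 2)))
    y = x @ [1.0, 0.5, -0.5] + np.random.standard_normal(100)
    res = sm.OLS(y, x).fit()

    # one perturbed parameter vector, as drawn in _perturb_gaussian
    draw = np.random.multivariate_normal(mean=res.params, cov=res.cov_params())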
+    def perturb_params(self, vname):
+
+        if self.perturbation_method[vname] == "gaussian":
+            self._perturb_gaussian(vname)
+        elif self.perturbation_method[vname] == "boot":
+            self._perturb_bootstrap(vname)
+        else:
+            raise ValueError("unknown perturbation method")
+
+    def impute(self, vname):
+        # Wrap this in case we later add additional imputation
+        # methods.
+        self.impute_pmm(vname)

     def update(self, vname):
         """
@@ -532,7 +994,22 @@ class MICEData:
         vname : str
             The name of the variable to be updated.
         """
-        pass
+
+        self.perturb_params(vname)
+        self.impute(vname)
+
+    # work-around for inconsistent predict return values
+    def _get_predicted(self, obj):
+
+        if isinstance(obj, np.ndarray):
+            return obj
+        elif isinstance(obj, pd.Series):
+            return obj.values
+        elif hasattr(obj, 'predicted_values'):
+            return obj.predicted_values
+        else:
+            raise ValueError(
+                "cannot obtain predicted values from %s" % obj.__class__)

     def impute_pmm(self, vname):
         """
@@ -543,7 +1020,59 @@ class MICEData:
         The `perturb_params` method must be called first to define the
         model.
         """
-        pass
+
+        k_pmm = self.k_pmm
+
+        endog_obs, exog_obs, exog_miss, predict_obs_kwds, predict_miss_kwds = (
+            self.get_split_data(vname))
+
+        # Predict imputed variable for both missing and non-missing
+        # observations
+        model = self.models[vname]
+        pendog_obs = model.predict(self.params[vname], exog_obs,
+                                   **predict_obs_kwds)
+        pendog_miss = model.predict(self.params[vname], exog_miss,
+                                    **predict_miss_kwds)
+
+        pendog_obs = self._get_predicted(pendog_obs)
+        pendog_miss = self._get_predicted(pendog_miss)
+
+        # Jointly sort the observed and predicted endog values for the
+        # cases with observed values.
+        ii = np.argsort(pendog_obs)
+        endog_obs = endog_obs[ii]
+        pendog_obs = pendog_obs[ii]
+
+        # Find the closest match to the predicted endog values for
+        # cases with missing endog values.
+        ix = np.searchsorted(pendog_obs, pendog_miss)
+
+        # Get the indices for the closest k_pmm values on
+        # either side of the closest index.
+        ixm = ix[:, None] + np.arange(-k_pmm, k_pmm)[None, :]
+
+        # Account for boundary effects
+        msk = np.nonzero((ixm < 0) | (ixm > len(endog_obs) - 1))
+        ixm = np.clip(ixm, 0, len(endog_obs) - 1)
+
+        # Get the distances
+        dx = pendog_miss[:, None] - pendog_obs[ixm]
+        dx = np.abs(dx)
+        dx[msk] = np.inf
+
+        # Closest positions in ix, row-wise.
+        dxi = np.argsort(dx, 1)[:, 0:k_pmm]
+
+        # Choose a column for each row.
+        ir = np.random.randint(0, k_pmm, len(pendog_miss))
+
+        # Unwind the indices
+        jj = np.arange(dxi.shape[0])
+        ix = dxi[(jj, ir)]
+        iz = ixm[(jj, ix)]
+
+        imputed_miss = np.array(endog_obs[iz]).squeeze()
+        self._store_changes(vname, imputed_miss)
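The vectorized matching above implements predictive mean matching: for each missing case, locate the `k_pmm` observed cases whose predictions are closest to the missing case's prediction and copy the observed endog value from one of them at random. A compact, loop-based sketch of the same idea (illustrative names only, not the exact code above):

    import numpy as np

    def pmm_draw(pred_obs, endog_obs, pred_miss, k_pmm=20, rng=None):
        """Draw one donor value per missing case by predictive mean matching."""
        rng = np.random.default_rng() if rng is None else rng
        imputed = np.empty(len(pred_miss))
        for i, p in enumerate(pred_miss):
            # distances from this prediction to every observed-case prediction
            dist = np.abs(pred_obs - p)
            donors = np.argsort(dist)[:k_pmm]   # the k nearest observed cases
            imputed[i] = endog_obs[rng.choice(donors)]
        return imputed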


 _mice_example_1 = """
@@ -555,6 +1084,7 @@ _mice_example_1 = """

     .. literalinclude:: ../plots/mice_example_1.txt
     """
+
 _mice_example_2 = """
     >>> imp = mice.MICEData(data)
     >>> fml = 'y ~ x1 + x2 + x3 + x4'
@@ -567,8 +1097,9 @@ _mice_example_2 = """


 class MICE:
-    __doc__ = (
-        """    Multiple Imputation with Chained Equations.
+
+    __doc__ = """\
+    Multiple Imputation with Chained Equations.

     This class can be used to fit most statsmodels models to data sets
     with missing values using the 'multiple imputation with chained
@@ -603,17 +1134,18 @@ class MICE:
     Obtain a sequence of fitted analysis models without combining
     to obtain summary::
     %(mice_example_2)s
-    """
-         % {'mice_example_1': _mice_example_1, 'mice_example_2':
-        _mice_example_2})
+    """ % {'mice_example_1': _mice_example_1,
+           'mice_example_2': _mice_example_2}

     def __init__(self, model_formula, model_class, data, n_skip=3,
-        init_kwds=None, fit_kwds=None):
+                 init_kwds=None, fit_kwds=None):
+
         self.model_formula = model_formula
         self.model_class = model_class
         self.n_skip = n_skip
         self.data = data
         self.results_list = []
+
         self.init_kwds = init_kwds if init_kwds is not None else {}
         self.fit_kwds = fit_kwds if fit_kwds is not None else {}

@@ -642,7 +1174,21 @@ class MICE:
         fitting the analysis model is repeated `n_skip + 1` times and
         the analysis model parameters from the final fit are returned.
         """
-        pass
+
+        # Impute missing values
+        self.data.update_all(self.n_skip + 1)
+        start_params = None
+        if len(self.results_list) > 0:
+            start_params = self.results_list[-1].params
+
+        # Fit the analysis model.
+        model = self.model_class.from_formula(self.model_formula,
+                                              self.data.data,
+                                              **self.init_kwds)
+        self.fit_kwds.update({"start_params": start_params})
+        result = model.fit(**self.fit_kwds)
+
+        return result

     def fit(self, n_burnin=10, n_imputations=10):
         """
@@ -655,7 +1201,18 @@ class MICE:
         n_imputations : int
             The number of data sets to impute
         """
-        pass
+
+        # Run without fitting the analysis model
+        self.data.update_all(n_burnin)
+
+        for j in range(n_imputations):
+            result = self.next_sample()
+            self.results_list.append(result)
+
+        self.endog_names = result.model.endog_names
+        self.exog_names = result.model.exog_names
+
+        return self.combine()

     def combine(self):
         """
@@ -667,15 +1224,56 @@ class MICE:

         Returns a MICEResults instance.
         """
-        pass
+
+        # Extract a few things from the models that were fit to
+        # imputed data sets.
+        params_list = []
+        cov_within = 0.
+        scale_list = []
+        for results in self.results_list:
+            results_uw = results._results
+            params_list.append(results_uw.params)
+            cov_within += results_uw.cov_params()
+            scale_list.append(results.scale)
+        params_list = np.asarray(params_list)
+        scale_list = np.asarray(scale_list)
+
+        # The estimated parameters for the MICE analysis
+        params = params_list.mean(0)
+
+        # The average of the within-imputation covariances
+        cov_within /= len(self.results_list)
+
+        # The between-imputation covariance
+        cov_between = np.cov(params_list.T)
+
+        # The estimated covariance matrix for the MICE analysis
+        f = 1 + 1 / float(len(self.results_list))
+        cov_params = cov_within + f * cov_between
+
+        # Fraction of missing information
+        fmi = f * np.diag(cov_between) / np.diag(cov_params)
+
+        # Set up a results instance
+        scale = np.mean(scale_list)
+        results = MICEResults(self, params, cov_params / scale)
+        results.scale = scale
+        results.frac_miss_info = fmi
+        results.exog_names = self.exog_names
+        results.endog_names = self.endog_names
+        results.model_class = self.model_class
+
+        return results
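`combine` applies Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and the pooled covariance adds the average within-imputation covariance to (1 + 1/m) times the between-imputation covariance, which also yields the fraction of missing information. A small numeric sketch with made-up numbers (the scale division used above for MICEResults is omitted here):

    import numpy as np

    params_list = np.array([[1.02, 0.48],   # estimates from m = 3 imputed data sets
                            [0.97, 0.55],
                            [1.05, 0.50]])
    cov_within = np.array([[0.040, 0.000],  # average within-imputation covariance
                           [0.000, 0.010]])

    m = params_list.shape[0]
    params = params_list.mean(0)                        # pooled estimates
    cov_between = np.cov(params_list.T)                 # between-imputation covariance
    cov_params = cov_within + (1 + 1 / m) * cov_between
    fmi = (1 + 1 / m) * np.diag(cov_between) / np.diag(cov_params)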


 class MICEResults(LikelihoodModelResults):

     def __init__(self, model, params, normalized_cov_params):
-        super(MICEResults, self).__init__(model, params, normalized_cov_params)

-    def summary(self, title=None, alpha=0.05):
+        super(MICEResults, self).__init__(model, params,
+                                          normalized_cov_params)
+
+    def summary(self, title=None, alpha=.05):
         """
         Summarize the results of running MICE.

@@ -693,4 +1291,26 @@ class MICEResults(LikelihoodModelResults):
             This holds the summary tables and text, which can be
             printed or converted to various output formats.
         """
-        pass
+
+        from statsmodels.iolib import summary2
+
+        smry = summary2.Summary()
+        float_format = "%8.3f"
+
+        info = {}
+        info["Method:"] = "MICE"
+        info["Model:"] = self.model_class.__name__
+        info["Dependent variable:"] = self.endog_names
+        info["Sample size:"] = "%d" % self.model.data.data.shape[0]
+        info["Scale"] = "%.2f" % self.scale
+        info["Num. imputations"] = "%d" % len(self.model.results_list)
+
+        smry.add_dict(info, align='l', float_format=float_format)
+
+        param = summary2.summary_params(self, alpha=alpha)
+        param["FMI"] = self.frac_miss_info
+
+        smry.add_df(param, float_format=float_format)
+        smry.add_title(title=title, results=self)
+
+        return smry
diff --git a/statsmodels/imputation/ros.py b/statsmodels/imputation/ros.py
index 40c1316bc..942b135a4 100644
--- a/statsmodels/imputation/ros.py
+++ b/statsmodels/imputation/ros.py
@@ -11,7 +11,9 @@ Company: Geosyntec Consultants (Portland, OR)
 Date: 2016-06-14

 """
+
 import warnings
+
 import numpy as np
 import pandas as pd
 from scipy import stats
@@ -45,11 +47,25 @@ def _ros_sort(df, observations, censorship, warn=False):
         The sorted dataframe with all columns dropped except the
         observation and censorship columns.
     """
-    pass
+
+    # separate uncensored data from censored data
+    censored = df[df[censorship]].sort_values(observations, axis=0)
+    uncensored = df[~df[censorship]].sort_values(observations, axis=0)
+
+    if censored[observations].max() > uncensored[observations].max():
+        censored = censored[censored[observations] <= uncensored[observations].max()]
+
+        if warn:
+            msg = ("Dropping censored observations greater than "
+                   "the max uncensored observation.")
+            warnings.warn(msg)
+
+    combined = pd.concat([censored, uncensored], axis=0)
+    return combined[[observations, censorship]].reset_index(drop=True)


 def cohn_numbers(df, observations, censorship):
-    """
+    r"""
     Computes the Cohn numbers for the detection limits in the dataset.

     The Cohn Numbers are:
@@ -60,10 +76,10 @@ def cohn_numbers(df, observations, censorship):
           the jth threshold.
         - :math:`C_j =` the number of censored observations at the jth
           threshold.
-        - :math:`\\mathrm{PE}_j =` the probability of exceeding the jth
+        - :math:`\mathrm{PE}_j =` the probability of exceeding the jth
           threshold
-        - :math:`\\mathrm{DL}_j =` the unique, sorted detection limits
-        - :math:`\\mathrm{DL}_{j+1} = \\mathrm{DL}_j` shifted down a
+        - :math:`\mathrm{DL}_j =` the unique, sorted detection limits
+        - :math:`\mathrm{DL}_{j+1} = \mathrm{DL}_j` shifted down a
           single index (row)

     Parameters
@@ -84,7 +100,103 @@ def cohn_numbers(df, observations, censorship):
     -------
     cohn : DataFrame
     """
-    pass
+
+    def nuncen_above(row):
+        """ A, the number of uncensored obs above the given threshold.
+        """
+
+        # index of observations above the lower_dl DL
+        above = df[observations] >= row['lower_dl']
+
+        # index of observations below the upper_dl DL
+        below = df[observations] < row['upper_dl']
+
+        # index of non-detect observations
+        detect = ~df[censorship]
+
+        # return the number of observations where all conditions are True
+        return df[above & below & detect].shape[0]
+
+    def nobs_below(row):
+        """ B, the number of observations (cen & uncen) below the given
+        threshold
+        """
+
+        # index of data less than the lower_dl DL
+        less_than = df[observations] < row['lower_dl']
+
+        # index of data less than or equal to the lower_dl DL
+        less_thanequal = df[observations] <= row['lower_dl']
+
+        # index of detects, non-detects
+        uncensored = ~df[censorship]
+        censored = df[censorship]
+
+        # number observations less than or equal to lower_dl DL and non-detect
+        LTE_censored = df[less_thanequal & censored].shape[0]
+
+        # number of observations less than lower_dl DL and detected
+        LT_uncensored = df[less_than & uncensored].shape[0]
+
+        # return the sum
+        return LTE_censored + LT_uncensored
+
+    def ncen_equal(row):
+        """ C, the number of censored observations at the given
+        threshold.
+        """
+
+        censored_index = df[censorship]
+        censored_data = df[observations][censored_index]
+        censored_below = censored_data == row['lower_dl']
+        return censored_below.sum()
+
+    def set_upper_limit(cohn):
+        """ Sets the upper_dl DL for each row of the Cohn dataframe. """
+        if cohn.shape[0] > 1:
+            return cohn['lower_dl'].shift(-1).fillna(value=np.inf)
+        else:
+            return [np.inf]
+
+    def compute_PE(A, B):
+        """ Computes the probability of excedance for each row of the
+        Cohn dataframe. """
+        N = len(A)
+        PE = np.empty(N, dtype='float64')
+        PE[-1] = 0.0
+        for j in range(N-2, -1, -1):
+            PE[j] = PE[j+1] + (1 - PE[j+1]) * A[j] / (A[j] + B[j])
+
+        return PE
+
+    # unique, sorted detection limits
+    censored_data = df[censorship]
+    DLs = pd.unique(df.loc[censored_data, observations])
+    DLs.sort()
+
+    # if there is an observation smaller than the minimum detection limit,
+    # add that value to the array
+    if DLs.shape[0] > 0:
+        if df[observations].min() < DLs.min():
+            DLs = np.hstack([df[observations].min(), DLs])
+
+        # create a dataframe
+        # (edited for pandas 0.14 compatibility; see commit 63f162e
+        #  when `pipe` and `assign` are available)
+        cohn = pd.DataFrame(DLs, columns=['lower_dl'])
+        cohn.loc[:, 'upper_dl'] = set_upper_limit(cohn)
+        cohn.loc[:, 'nuncen_above'] = cohn.apply(nuncen_above, axis=1)
+        cohn.loc[:, 'nobs_below'] = cohn.apply(nobs_below, axis=1)
+        cohn.loc[:, 'ncen_equal'] = cohn.apply(ncen_equal, axis=1)
+        cohn = cohn.reindex(range(DLs.shape[0] + 1))
+        cohn.loc[:, 'prob_exceedance'] = compute_PE(cohn['nuncen_above'], cohn['nobs_below'])
+
+    else:
+        dl_cols = ['lower_dl', 'upper_dl', 'nuncen_above',
+                   'nobs_below', 'ncen_equal', 'prob_exceedance']
+        cohn = pd.DataFrame(np.empty((0, len(dl_cols))), columns=dl_cols)
+
+    return cohn
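The exceedance probabilities in `compute_PE` satisfy the backward recursion PE_j = PE_{j+1} + (1 - PE_{j+1}) * A_j / (A_j + B_j), with PE fixed at zero beyond the largest detection limit. A quick check with hypothetical counts:

    import numpy as np

    A = np.array([4, 3, 0])   # uncensored obs between successive limits (hypothetical)
    B = np.array([2, 6, 9])   # obs (censored and uncensored) below each limit (hypothetical)

    PE = np.zeros(len(A))     # PE[-1] stays 0
    for j in range(len(A) - 2, -1, -1):
        PE[j] = PE[j + 1] + (1 - PE[j + 1]) * A[j] / (A[j] + B[j])
    # PE is approximately [0.778, 0.333, 0.0]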


 def _detection_limit_index(obs, cohn):
@@ -111,7 +223,14 @@ def _detection_limit_index(obs, cohn):
     --------
     cohn_numbers
     """
-    pass
+
+    if cohn.shape[0] > 0:
+        index, = np.where(cohn['lower_dl'] <= obs)
+        det_limit_index = index[-1]
+    else:
+        det_limit_index = 0
+
+    return det_limit_index


 def _ros_group_rank(df, dl_idx, censorship):
@@ -140,7 +259,16 @@ def _ros_group_rank(df, dl_idx, censorship):
     ranks : ndarray
         Array of ranks for the dataset.
     """
-    pass
+
+    # (edited for pandas 0.14 compatibility; see commit 63f162e
+    #  when `pipe` and `assign` are available)
+    ranks = df.copy()
+    ranks.loc[:, 'rank'] = 1
+    ranks = (
+        ranks.groupby(by=[dl_idx, censorship])['rank']
+             .transform(lambda g: g.cumsum())
+    )
+    return ranks


 def _ros_plot_pos(row, censorship, cohn):
@@ -172,7 +300,18 @@ def _ros_plot_pos(row, censorship, cohn):
     --------
     cohn_numbers
     """
-    pass
+
+    DL_index = row['det_limit_index']
+    rank = row['rank']
+    censored = row[censorship]
+
+    dl_1 = cohn.iloc[DL_index]
+    dl_2 = cohn.iloc[DL_index + 1]
+    if censored:
+        return (1 - dl_1['prob_exceedance']) * rank / (dl_1['ncen_equal']+1)
+    else:
+        return (1 - dl_1['prob_exceedance']) + (dl_1['prob_exceedance'] - dl_2['prob_exceedance']) * \
+                rank / (dl_1['nuncen_above']+1)


 def _norm_plot_pos(observations):
@@ -188,7 +327,8 @@ def _norm_plot_pos(observations):
     -------
     plotting_position : array of floats
     """
-    pass
+    ppos, sorted_res = stats.probplot(observations, fit=False)
+    return stats.norm.cdf(ppos)


 def plotting_positions(df, censorship, cohn):
@@ -218,7 +358,16 @@ def plotting_positions(df, censorship, cohn):
     --------
     cohn_numbers
     """
-    pass
+
+    plot_pos = df.apply(lambda r: _ros_plot_pos(r, censorship, cohn), axis=1)
+
+    # correctly sort the plotting positions of the ND data:
+    ND_plotpos = plot_pos[df[censorship]]
+    ND_plotpos_arr = np.require(ND_plotpos, requirements="W")
+    ND_plotpos_arr.sort()
+    plot_pos.loc[df[censorship].index[df[censorship]]] = ND_plotpos_arr
+
+    return plot_pos


 def _impute(df, observations, censorship, transform_in, transform_out):
@@ -253,7 +402,27 @@ def _impute(df, observations, censorship, transform_in, transform_out):
         only where the original observations were censored, and the original
         observations everywhere else.
     """
-    pass
+
+    # detect/non-detect selectors
+    uncensored_mask = ~df[censorship]
+    censored_mask = df[censorship]
+
+    # fit a line to the logs of the detected data
+    fit_params = stats.linregress(
+        df['Zprelim'][uncensored_mask],
+        transform_in(df[observations][uncensored_mask])
+    )
+
+    # pull out the slope and intercept for use later
+    slope, intercept = fit_params[:2]
+
+    # model the data based on the best-fit curve
+    # (edited for pandas 0.14 compatibility; see commit 63f162e
+    #  when `pipe` and `assign` are available)
+    df.loc[:, 'estimated'] = transform_out(slope * df['Zprelim'][censored_mask] + intercept)
+    df.loc[:, 'final'] = np.where(df[censorship], df['estimated'], df[observations])
+
+    return df


 def _do_ros(df, observations, censorship, transform_in, transform_out):
@@ -291,12 +460,25 @@ def _do_ros(df, observations, censorship, transform_in, transform_out):
         only where the original observations were censored, and the original
         observations everywhere else.
     """
-    pass
+
+    # compute the Cohn numbers
+    cohn = cohn_numbers(df, observations=observations, censorship=censorship)
+
+    # (edited for pandas 0.14 compatibility; see commit 63f162e
+    #  when `pipe` and `assign` are available)
+    modeled = _ros_sort(df, observations=observations, censorship=censorship)
+    modeled.loc[:, 'det_limit_index'] = modeled[observations].apply(_detection_limit_index, args=(cohn,))
+    modeled.loc[:, 'rank'] = _ros_group_rank(modeled, 'det_limit_index', censorship)
+    modeled.loc[:, 'plot_pos'] = plotting_positions(modeled, censorship, cohn)
+    modeled.loc[:, 'Zprelim'] = stats.norm.ppf(modeled['plot_pos'])
+
+    return _impute(modeled, observations, censorship, transform_in, transform_out)


 def impute_ros(observations, censorship, df=None, min_uncensored=2,
-    max_fraction_censored=0.8, substitution_fraction=0.5, transform_in=np.
-    log, transform_out=np.exp, as_array=True):
+               max_fraction_censored=0.8, substitution_fraction=0.5,
+               transform_in=np.log, transform_out=np.exp,
+               as_array=True):
     """
     Impute censored dataset using Regression on Order Statistics (ROS).

@@ -359,4 +541,42 @@ def impute_ros(observations, censorship, df=None, min_uncensored=2,
     -----
     This function requires pandas 0.14 or more recent.
     """
-    pass
+
+    # process arrays into a dataframe, if necessary
+    if df is None:
+        df = pd.DataFrame({'obs': observations, 'cen': censorship})
+        observations = 'obs'
+        censorship = 'cen'
+
+    # basic counts/metrics of the dataset
+    N_observations = df.shape[0]
+    N_censored = df[censorship].astype(int).sum()
+    N_uncensored = N_observations - N_censored
+    fraction_censored = N_censored / N_observations
+
+    # add plotting positions if there are no censored values
+    # (edited for pandas 0.14 compatibility; see commit 63f162e
+    #  when `pipe` and `assign` are available)
+    if N_censored == 0:
+        output = df[[observations, censorship]].copy()
+        output.loc[:, 'final'] = df[observations]
+
+    # substitute w/ fraction of the DLs if there's insufficient
+    # uncensored data
+    # (edited for pandas 0.14 compatibility; see commit 63f162e
+    #  when `pipe` and `assign` are available)
+    elif (N_uncensored < min_uncensored) or (fraction_censored > max_fraction_censored):
+        output = df[[observations, censorship]].copy()
+        output.loc[:, 'final'] = df[observations]
+        output.loc[df[censorship], 'final'] *= substitution_fraction
+
+    # normal ROS stuff
+    else:
+        output = _do_ros(df, observations, censorship, transform_in, transform_out)
+
+    # convert to an array if necessary
+    if as_array:
+        output = output['final'].values
+
+    return output
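A minimal usage sketch for the public entry point above, with made-up measurements; `censorship` is True for non-detects reported at their detection limit:

    >>> import numpy as np
    >>> from statsmodels.imputation import ros
    >>> obs = np.array([0.5, 0.5, 1.0, 1.2, 1.9, 2.4, 3.1, 5.0])
    >>> cen = np.array([True, True, False, False, False, False, False, False])
    >>> imputed = ros.impute_ros(obs, cen)   # ndarray of imputed/observed values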
diff --git a/statsmodels/iolib/api.py b/statsmodels/iolib/api.py
index 8362963c8..70f8bea05 100644
--- a/statsmodels/iolib/api.py
+++ b/statsmodels/iolib/api.py
@@ -1,4 +1,7 @@
-__all__ = ['SimpleTable', 'savetxt', 'csv2st', 'save_pickle', 'load_pickle']
+__all__ = [
+    "SimpleTable", "savetxt", "csv2st",
+    "save_pickle", "load_pickle"
+]
 from .foreign import savetxt
 from .table import SimpleTable, csv2st
 from .smpickle import save_pickle, load_pickle
diff --git a/statsmodels/iolib/foreign.py b/statsmodels/iolib/foreign.py
index aebf4b004..11f9d856c 100644
--- a/statsmodels/iolib/foreign.py
+++ b/statsmodels/iolib/foreign.py
@@ -6,6 +6,7 @@ See Also
 numpy.lib.io
 """
 import numpy as np
+
 from statsmodels.iolib.openfile import get_file_obj


@@ -96,4 +97,44 @@ def savetxt(fname, X, names=None, fmt='%.18e', delimiter=' '):
     >>> savetxt('test.out', (x,y,z))   # x,y,z equal sized 1D arrays
     >>> savetxt('test.out', x, fmt='%1.4e')   # use exponential notation
     """
-    pass
+
+    with get_file_obj(fname, 'w') as fh:
+        X = np.asarray(X)
+
+        # Handle 1-dimensional arrays
+        if X.ndim == 1:
+            # Common case -- 1d array of numbers
+            if X.dtype.names is None:
+                X = np.atleast_2d(X).T
+                ncol = 1
+
+            # Complex dtype -- each field indicates a separate column
+            else:
+                ncol = len(X.dtype.descr)
+        else:
+            ncol = X.shape[1]
+
+        # `fmt` can be a string with multiple insertion points or a list of formats.
+        # E.g. '%10.5f\t%10d' or ('%10.5f', '%10d')
+        if isinstance(fmt, (list, tuple)):
+            if len(fmt) != ncol:
+                raise AttributeError('fmt has wrong shape.  %s' % str(fmt))
+            format = delimiter.join(fmt)
+        elif isinstance(fmt, str):
+            if fmt.count('%') == 1:
+                fmt = [fmt, ]*ncol
+                format = delimiter.join(fmt)
+            elif fmt.count('%') != ncol:
+                raise AttributeError('fmt has wrong number of %% formats.  %s'
+                                     % fmt)
+            else:
+                format = fmt
+
+        # handle names
+        if names is None and X.dtype.names:
+            names = X.dtype.names
+        if names is not None:
+            fh.write(delimiter.join(names) + '\n')
+
+        for row in X:
+            fh.write(format % tuple(row) + '\n')
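A short usage sketch for the list-of-formats path (made-up arrays; a header line with the column names is written when `names` is given):

    >>> import numpy as np
    >>> from statsmodels.iolib.foreign import savetxt
    >>> x = np.arange(5.0)
    >>> y = x ** 2
    >>> savetxt('test.out', np.column_stack((x, y)),
    ...         names=['x', 'y'], fmt=['%8.3f', '%8.3f'], delimiter=',')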
diff --git a/statsmodels/iolib/openfile.py b/statsmodels/iolib/openfile.py
index a28924fcc..2da66e77b 100644
--- a/statsmodels/iolib/openfile.py
+++ b/statsmodels/iolib/openfile.py
@@ -2,6 +2,7 @@
 Handle file opening for read/write
 """
 from pathlib import Path
+
 from numpy.lib._iotools import _is_string_like


@@ -26,7 +27,16 @@ class EmptyContextManager:
         return getattr(self._obj, name)


-def get_file_obj(fname, mode='r', encoding=None):
+def _open(fname, mode, encoding):
+    if fname.endswith(".gz"):
+        import gzip
+
+        return gzip.open(fname, mode, encoding=encoding)
+    else:
+        return open(fname, mode, encoding=encoding)
+
+
+def get_file_obj(fname, mode="r", encoding=None):
     """
     Light wrapper to handle strings, path objects and let files (anything else)
     pass through.
@@ -48,4 +58,22 @@ def get_file_obj(fname, mode='r', encoding=None):
     already a file-like object, the returned context manager *will not
     close the file*.
     """
-    pass
+
+    if _is_string_like(fname):
+        fname = Path(fname)
+    if isinstance(fname, Path):
+        return fname.open(mode=mode, encoding=encoding)
+    elif hasattr(fname, "open"):
+        return fname.open(mode=mode, encoding=encoding)
+    try:
+        return open(fname, mode, encoding=encoding)
+    except TypeError:
+        try:
+            # Make sure the object has the write methods
+            if "r" in mode:
+                fname.read
+            if "w" in mode or "a" in mode:
+                fname.write
+        except AttributeError:
+            raise ValueError("fname must be a string or a file-like object")
+        return EmptyContextManager(fname)
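A behaviour sketch: string and Path inputs are opened (and closed when the context exits), while file-like objects are wrapped in EmptyContextManager and left open, as the docstring notes. The file name here is illustrative only:

    >>> from io import StringIO
    >>> from statsmodels.iolib.openfile import get_file_obj
    >>> with get_file_obj('example.txt', 'w', encoding='utf-8') as fh:
    ...     fh.write('opened from a path, closed on exit\n')
    >>> buf = StringIO()
    >>> with get_file_obj(buf, 'w') as fh:
    ...     fh.write('passed through; buf remains open\n')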
diff --git a/statsmodels/iolib/smpickle.py b/statsmodels/iolib/smpickle.py
index d92ec645d..20a27dc28 100644
--- a/statsmodels/iolib/smpickle.py
+++ b/statsmodels/iolib/smpickle.py
@@ -11,7 +11,10 @@ def save_pickle(obj, fname):
     fname : {str, pathlib.Path}
         Filename to pickle to
     """
-    pass
+    import pickle
+
+    with get_file_obj(fname, "wb") as fout:
+        pickle.dump(obj, fout, protocol=-1)


 def load_pickle(fname):
@@ -33,4 +36,7 @@ def load_pickle(fname):
     -----
     This method can be used to load *both* models and results.
     """
-    pass
+    import pickle
+
+    with get_file_obj(fname, "rb") as fin:
+        return pickle.load(fin)
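Round-trip sketch (the same pair works for fitted models and results, per the note above; the file name is illustrative):

    >>> from statsmodels.iolib.smpickle import save_pickle, load_pickle
    >>> save_pickle({'a': 1, 'b': 2}, 'example.pkl')
    >>> load_pickle('example.pkl')   # -> {'a': 1, 'b': 2}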
diff --git a/statsmodels/iolib/stata_summary_examples.py b/statsmodels/iolib/stata_summary_examples.py
index 27bfacf26..817ddb460 100644
--- a/statsmodels/iolib/stata_summary_examples.py
+++ b/statsmodels/iolib/stata_summary_examples.py
@@ -1,3 +1,4 @@
+
 """. regress totemp gnpdefl gnp unemp armed pop year

       Source |       SS       df       MS              Number of obs =      16
@@ -19,6 +20,9 @@
        _cons |   -3482258   890420.3    -3.91   0.004     -5496529    -1467987
 ------------------------------------------------------------------------------
 """
+
+
+#From Stata using Longley dataset as in the test and example for GLM
 """
 . glm totemp gnpdefl gnp unemp armed pop year

@@ -49,6 +53,9 @@ Log likelihood   = -109.6174355                    BIC             =  836399.2
        _cons |   -3482258   890420.3    -3.91   0.000     -5227450    -1737066
 ------------------------------------------------------------------------------
 """
+
+#RLM Example
+
 """
 . rreg stackloss airflow watertemp acidconc

diff --git a/statsmodels/iolib/summary.py b/statsmodels/iolib/summary.py
index 9bdf3035a..88f9f364c 100644
--- a/statsmodels/iolib/summary.py
+++ b/statsmodels/iolib/summary.py
@@ -1,13 +1,40 @@
 from statsmodels.compat.python import lmap, lrange, lzip
+
 import copy
 from itertools import zip_longest
 import time
+
 import numpy as np
+
 from statsmodels.iolib.table import SimpleTable
-from statsmodels.iolib.tableformatting import fmt_2, fmt_2cols, fmt_params, gen_fmt
+from statsmodels.iolib.tableformatting import (
+    fmt_2,
+    fmt_2cols,
+    fmt_params,
+    gen_fmt,
+)
+
 from .summary2 import _model_types


+def forg(x, prec=3):
+    x = np.squeeze(x)
+    if prec == 3:
+        # for 3 decimals
+        if (abs(x) >= 1e4) or (abs(x) < 1e-4):
+            return '%9.3g' % x
+        else:
+            return '%9.3f' % x
+    elif prec == 4:
+        if (abs(x) >= 1e4) or (abs(x) < 1e-4):
+            return '%10.4g' % x
+        else:
+            return '%10.4f' % x
+    else:
+        raise ValueError("`prec` argument must be either 3 or 4, not {prec}"
+                         .format(prec=prec))
+
+
 def d_or_f(x, width=6):
     """convert number to string with either integer of float formatting

@@ -25,11 +52,17 @@ def d_or_f(x, width=6):
     str : str
         number as formatted string
     """
-    pass
+    if np.isnan(x):
+        return (width - 3) * ' ' + 'NaN'
+
+    if x // 1 == x:
+        return "%#6d" % x
+    else:
+        return "%#8.2f" % x


-def summary(self, yname=None, xname=None, title=0, alpha=0.05, returns=
-    'text', model_info=None):
+def summary(self, yname=None, xname=None, title=0, alpha=.05,
+            returns='text', model_info=None):
     """
     Parameters
     ----------
@@ -83,30 +116,278 @@ def summary(self, yname=None, xname=None, title=0, alpha=0.05, returns=
     -----
     conf_int calculated from normal dist.
     """
-    pass
+    if title == 0:
+        title = _model_types[self.model.__class__.__name__]
+
+    if xname is not None and len(xname) != len(self.params):
+        # GH 2298
+        raise ValueError('User supplied xnames must have the same number of '
+                         'entries as the number of model parameters '
+                         '({0})'.format(len(self.params)))
+
+    yname, xname = _getnames(self, yname, xname)
+
+    time_now = time.localtime()
+    time_of_day = [time.strftime("%H:%M:%S", time_now)]
+    date = time.strftime("%a, %d %b %Y", time_now)
+    modeltype = self.model.__class__.__name__
+    nobs = self.nobs
+    df_model = self.df_model
+    df_resid = self.df_resid
+
+    #General part of the summary table, Applicable to all? models
+    #------------------------------------------------------------
+    # TODO: define this generically, overwrite in model classes
+    #replace definition of stubs data by single list
+    #e.g.
+    gen_left = [('Model type:', [modeltype]),
+                ('Date:', [date]),
+                ('Dependent Variable:', yname),  # TODO: What happens with multiple names?
+                ('df model', [df_model])
+                ]
+    gen_stubs_left, gen_data_left = zip_longest(*gen_left) #transpose row col
+
+    gen_title = title
+    gen_header = None
+    gen_table_left = SimpleTable(gen_data_left,
+                                 gen_header,
+                                 gen_stubs_left,
+                                 title=gen_title,
+                                 txt_fmt=gen_fmt
+                                 )
+
+    gen_stubs_right = ('Method:',
+                       'Time:',
+                       'Number of Obs:',
+                       'df resid')
+    gen_data_right = ([modeltype], #was dist family need to look at more
+                      time_of_day,
+                      [nobs],
+                      [df_resid]
+                      )
+    gen_table_right = SimpleTable(gen_data_right,
+                                  gen_header,
+                                  gen_stubs_right,
+                                  title=gen_title,
+                                  txt_fmt=gen_fmt
+                                  )
+    gen_table_left.extend_right(gen_table_right)
+    general_table = gen_table_left
+
+    # Parameters part of the summary table
+    # ------------------------------------
+    # Note: this is not necessary since we standardized names,
+    #  only t versus normal
+    tstats = {'OLS': self.t(),
+              'GLS': self.t(),
+              'GLSAR': self.t(),
+              'WLS': self.t(),
+              'RLM': self.t(),
+              'GLM': self.t()}
+    prob_stats = {'OLS': self.pvalues,
+                  'GLS': self.pvalues,
+                  'GLSAR': self.pvalues,
+                  'WLS': self.pvalues,
+                  'RLM': self.pvalues,
+                  'GLM': self.pvalues
+                  }
+    # Dictionary to store the header names for the parameter part of the
+    # summary table. look up by modeltype
+    alp = str((1-alpha)*100)+'%'
+    param_header = {
+         'OLS'   : ['coef', 'std err', 't', 'P>|t|', alp + ' Conf. Interval'],
+         'GLS'   : ['coef', 'std err', 't', 'P>|t|', alp + ' Conf. Interval'],
+         'GLSAR' : ['coef', 'std err', 't', 'P>|t|', alp + ' Conf. Interval'],
+         'WLS'   : ['coef', 'std err', 't', 'P>|t|', alp + ' Conf. Interval'],
+         'GLM'   : ['coef', 'std err', 't', 'P>|t|', alp + ' Conf. Interval'], #glm uses t-distribution
+         'RLM'   : ['coef', 'std err', 'z', 'P>|z|', alp + ' Conf. Interval']  #check z
+                   }
+    params_stubs = xname
+    params = self.params
+    conf_int = self.conf_int(alpha)
+    std_err = self.bse
+    exog_len = lrange(len(xname))
+    tstat = tstats[modeltype]
+    prob_stat = prob_stats[modeltype]
+
+    # SimpleTable should be able to handle the formatting
+    params_data = lzip(["%#6.4g" % (params[i]) for i in exog_len],
+                       ["%#6.4f" % (std_err[i]) for i in exog_len],
+                       ["%#6.4f" % (tstat[i]) for i in exog_len],
+                       ["%#6.4f" % (prob_stat[i]) for i in exog_len],
+                       ["(%#5g, %#5g)" % tuple(conf_int[i]) for i in exog_len])
+    parameter_table = SimpleTable(params_data,
+                                  param_header[modeltype],
+                                  params_stubs,
+                                  title=None,
+                                  txt_fmt=fmt_2
+                                  )
+
+    #special table
+    #-------------
+    #TODO: exists in linear_model, what about other models
+    #residual diagnostics
+
+    #output options
+    #--------------
+    #TODO: JP the rest needs to be fixed, similar to summary in linear_model
+
+    def ols_printer():
+        """
+        print summary table for ols models
+        """
+        table = str(general_table)+'\n'+str(parameter_table)
+        return table
+
+    def glm_printer():
+        table = str(general_table)+'\n'+str(parameter_table)
+        return table
+
+    printers = {'OLS': ols_printer, 'GLM': glm_printer}
+
+    if returns == 'print':
+        try:
+            return printers[modeltype]()
+        except KeyError:
+            return printers['OLS']()


 def _getnames(self, yname=None, xname=None):
-    """extract names from model or construct names
-    """
-    pass
+    '''extract names from model or construct names
+    '''
+    if yname is None:
+        if getattr(self.model, 'endog_names', None) is not None:
+            yname = self.model.endog_names
+        else:
+            yname = 'y'

+    if xname is None:
+        if getattr(self.model, 'exog_names', None) is not None:
+            xname = self.model.exog_names
+        else:
+            xname = ['var_%d' % i for i in range(len(self.params))]

-def summary_top(results, title=None, gleft=None, gright=None, yname=None,
-    xname=None):
-    """generate top table(s)
+    return yname, xname


-    TODO: this still uses predefined model_methods
-    ? allow gleft, gright to be 1 element tuples instead of filling with None?
+def summary_top(results, title=None, gleft=None, gright=None, yname=None, xname=None):
+    '''generate top table(s)

-    """
-    pass

+    TODO: this still uses predefined model_methods
+    ? allow gleft, gright to be 1 element tuples instead of filling with None?

-def summary_params(results, yname=None, xname=None, alpha=0.05, use_t=True,
-    skip_header=False, title=None):
-    """create a summary table for the parameters
+    '''
+    #change of names ?
+    gen_left, gen_right = gleft, gright
+
+    # time and names are always included
+    time_now = time.localtime()
+    time_of_day = [time.strftime("%H:%M:%S", time_now)]
+    date = time.strftime("%a, %d %b %Y", time_now)
+
+    yname, xname = _getnames(results, yname=yname, xname=xname)
+
+    # create dictionary with default
+    # use lambdas because some values raise exception if they are not available
+    default_items = dict([
+          ('Dependent Variable:', lambda: [yname]),
+          ('Dep. Variable:', lambda: [yname]),
+          ('Model:', lambda: [results.model.__class__.__name__]),
+          ('Date:', lambda: [date]),
+          ('Time:', lambda: time_of_day),
+          ('Number of Obs:', lambda: [results.nobs]),
+          ('No. Observations:', lambda: [d_or_f(results.nobs)]),
+          ('Df Model:', lambda: [d_or_f(results.df_model)]),
+          ('Df Residuals:', lambda: [d_or_f(results.df_resid)]),
+          ('Log-Likelihood:', lambda: ["%#8.5g" % results.llf])  # does not exist for RLM - exception
+    ])
+
+    if title is None:
+        title = results.model.__class__.__name__ + ' Regression Results'
+
+    if gen_left is None:
+        # default: General part of the summary table, Applicable to all? models
+        gen_left = [('Dep. Variable:', None),
+                    ('Model type:', None),
+                    ('Date:', None),
+                    ('No. Observations:', None),
+                    ('Df model:', None),
+                    ('Df resid:', None)]
+
+        try:
+            llf = results.llf  # noqa: F841
+            gen_left.append(('Log-Likelihood', None))
+        except: # AttributeError, NotImplementedError
+            pass
+
+        gen_right = []
+
+    gen_title = title
+    gen_header = None
+
+    # replace missing (None) values with default values
+    gen_left_ = []
+    for item, value in gen_left:
+        if value is None:
+            value = default_items[item]()  # let KeyErrors raise exception
+        gen_left_.append((item, value))
+    gen_left = gen_left_
+
+    if gen_right:
+        gen_right_ = []
+        for item, value in gen_right:
+            if value is None:
+                value = default_items[item]()  # let KeyErrors raise exception
+            gen_right_.append((item, value))
+        gen_right = gen_right_
+
+    # check nothing was missed
+    missing_values = [k for k,v in gen_left + gen_right if v is None]
+    assert missing_values == [], missing_values
+
+    # pad both tables to equal number of rows
+    if gen_right:
+        if len(gen_right) < len(gen_left):
+            # fill up with blank lines to same length
+            gen_right += [(' ', ' ')] * (len(gen_left) - len(gen_right))
+        elif len(gen_right) > len(gen_left):
+            # fill up with blank lines to same length, just to keep it symmetric
+            gen_left += [(' ', ' ')] * (len(gen_right) - len(gen_left))
+
+        # padding in SimpleTable does not work like I want
+        #force extra spacing and exact string length in right table
+        gen_right = [('%-21s' % ('  '+k), v) for k,v in gen_right]
+        gen_stubs_right, gen_data_right = zip_longest(*gen_right) #transpose row col
+        gen_table_right = SimpleTable(gen_data_right,
+                                      gen_header,
+                                      gen_stubs_right,
+                                      title=gen_title,
+                                      txt_fmt=fmt_2cols
+                                      )
+    else:
+        gen_table_right = []  #because .extend_right seems works with []
+
+    #moved below so that we can pad if needed to match length of gen_right
+    #transpose rows and columns, `unzip`
+    gen_stubs_left, gen_data_left = zip_longest(*gen_left) #transpose row col
+
+    gen_table_left = SimpleTable(gen_data_left,
+                                 gen_header,
+                                 gen_stubs_left,
+                                 title=gen_title,
+                                 txt_fmt=fmt_2cols
+                                 )
+
+    gen_table_left.extend_right(gen_table_right)
+    general_table = gen_table_left
+
+    return general_table
+
+
+def summary_params(results, yname=None, xname=None, alpha=.05, use_t=True,
+                   skip_header=False, title=None):
+    '''create a summary table for the parameters

     Parameters
     ----------
@@ -129,12 +410,69 @@ def summary_params(results, yname=None, xname=None, alpha=0.05, use_t=True,
     Returns
     -------
     params_table : SimpleTable instance
-    """
-    pass
-
-
-def summary_params_frame(results, yname=None, xname=None, alpha=0.05, use_t
-    =True):
+    '''
+
+    # Parameters part of the summary table
+    # ------------------------------------
+    # Note: this is not necessary since we standardized names,
+    #   only t versus normal
+
+    if isinstance(results, tuple):
+        # for multivariate endog
+        # TODO: check whether I do not want to refactor this
+        #we need to give parameter alpha to conf_int
+        results, params, std_err, tvalues, pvalues, conf_int = results
+    else:
+        params = np.asarray(results.params)
+        std_err = np.asarray(results.bse)
+        tvalues = np.asarray(results.tvalues)  # is this sometimes called zvalues
+        pvalues = np.asarray(results.pvalues)
+        conf_int = np.asarray(results.conf_int(alpha))
+    if params.size == 0:
+        return SimpleTable([['No Model Parameters']])
+    # Dictionary to store the header names for the parameter part of the
+    # summary table. look up by modeltype
+    if use_t:
+        param_header = ['coef', 'std err', 't', 'P>|t|',
+                        '[' + str(alpha/2), str(1-alpha/2) + ']']
+    else:
+        param_header = ['coef', 'std err', 'z', 'P>|z|',
+                        '[' + str(alpha/2), str(1-alpha/2) + ']']
+
+    if skip_header:
+        param_header = None
+
+    _, xname = _getnames(results, yname=yname, xname=xname)
+
+    if len(xname) != len(params):
+        raise ValueError('xnames and params do not have the same length')
+
+    params_stubs = xname
+
+    exog_idx = lrange(len(xname))
+    params = np.asarray(params)
+    std_err = np.asarray(std_err)
+    tvalues = np.asarray(tvalues)
+    pvalues = np.asarray(pvalues)
+    conf_int = np.asarray(conf_int)
+    params_data = lzip([forg(params[i], prec=4) for i in exog_idx],
+                       [forg(std_err[i]) for i in exog_idx],
+                       [forg(tvalues[i]) for i in exog_idx],
+                       ["%#6.3f" % (pvalues[i]) for i in exog_idx],
+                       [forg(conf_int[i,0]) for i in exog_idx],
+                       [forg(conf_int[i,1]) for i in exog_idx])
+    parameter_table = SimpleTable(params_data,
+                                  param_header,
+                                  params_stubs,
+                                  title=title,
+                                  txt_fmt=fmt_params
+                                  )
+
+    return parameter_table
+
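A usage sketch for `summary_params`, assuming a fitted OLS result on made-up data:

    >>> import numpy as np
    >>> import statsmodels.api as sm
    >>> from statsmodels.iolib.summary import summary_params
    >>> x = sm.add_constant(np.random.standard_normal((50, 2)))
    >>> y = x @ [1.0, 2.0, -1.0] + np.random.standard_normal(50)
    >>> res = sm.OLS(y, x).fit()
    >>> print(summary_params(res, alpha=0.05, use_t=True))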
+
+def summary_params_frame(results, yname=None, xname=None, alpha=.05,
+                         use_t=True):
     """
     Create a summary table for the parameters

@@ -160,11 +498,42 @@ def summary_params_frame(results, yname=None, xname=None, alpha=0.05, use_t
     -------
     params_table : SimpleTable instance
     """
-    pass

-
-def summary_params_2d(result, extras=None, endog_names=None, exog_names=
-    None, title=None):
+    # Parameters part of the summary table
+    # ------------------------------------
+    # Note: this is not necessary since we standardized names,
+    #   only t versus normal
+
+    if isinstance(results, tuple):
+        # for multivariate endog
+        # TODO: check whether I do not want to refactor this
+        #we need to give parameter alpha to conf_int
+        results, params, std_err, tvalues, pvalues, conf_int = results
+    else:
+        params = results.params
+        std_err = results.bse
+        tvalues = results.tvalues  #is this sometimes called zvalues
+        pvalues = results.pvalues
+        conf_int = results.conf_int(alpha)
+
+    # Dictionary to store the header names for the parameter part of the
+    # summary table. look up by modeltype
+    if use_t:
+        param_header = ['coef', 'std err', 't', 'P>|t|',
+                        'Conf. Int. Low', 'Conf. Int. Upp.']
+    else:
+        param_header = ['coef', 'std err', 'z', 'P>|z|',
+                        'Conf. Int. Low', 'Conf. Int. Upp.']
+
+    _, xname = _getnames(results, yname=yname, xname=xname)
+
+    from pandas import DataFrame
+    table = np.column_stack((params, std_err, tvalues, pvalues, conf_int))
+    return DataFrame(table, columns=param_header, index=xname)
+
+
+def summary_params_2d(result, extras=None, endog_names=None, exog_names=None,
+                      title=None):
     """create summary table of regression parameters with several equations

     This allows interleaving of parameters with bse and/or tvalues
@@ -192,11 +561,40 @@ def summary_params_2d(result, extras=None, endog_names=None, exog_names=
         array

     """
-    pass
-
-
-def summary_params_2dflat(result, endog_names=None, exog_names=None, alpha=
-    0.05, use_t=True, keep_headers=True, endog_cols=False):
+    if endog_names is None:
+        # TODO: note the [1:] is specific to current MNLogit
+        endog_names = ['endog_%d' % i for i in
+                       np.unique(result.model.endog)[1:]]
+    if exog_names is None:
+        exog_names = ['var%d' % i for i in range(len(result.params))]
+
+    # TODO: check formatting options with different values
+    res_params = [[forg(item, prec=4) for item in row] for row in result.params]
+    if extras:
+        extras_list = [[['%10s' % ('(' + forg(v, prec=3).strip() + ')')
+                         for v in col]
+                        for col in getattr(result, what)]
+                       for what in extras
+                       ]
+        data = lzip(res_params, *extras_list)
+        data = [i for j in data for i in j]  #flatten
+        stubs = lzip(endog_names, *[['']*len(endog_names)]*len(extras))
+        stubs = [i for j in stubs for i in j] #flatten
+    else:
+        data = res_params
+        stubs = endog_names
+
+    txt_fmt = copy.deepcopy(fmt_params)
+    txt_fmt["data_fmts"] = ["%s"]*result.params.shape[1]
+
+    return SimpleTable(data, headers=exog_names,
+                             stubs=stubs,
+                             title=title,
+                             txt_fmt=txt_fmt)
+
+
+def summary_params_2dflat(result, endog_names=None, exog_names=None, alpha=0.05,
+                          use_t=True, keep_headers=True, endog_cols=False):
     """summary table for parameters that are 2d, e.g. multi-equation models

     Parameters
@@ -229,7 +627,50 @@ def summary_params_2dflat(result, endog_names=None, exog_names=None, alpha=
         array

     """
-    pass
+
+    res = result
+    params = res.params
+    if params.ndim == 2:  # we've got multiple equations
+        n_equ = params.shape[1]
+        if len(endog_names) != params.shape[1]:
+            raise ValueError('endog_names has wrong length')
+    else:
+        if len(endog_names) != len(params):
+            raise ValueError('endog_names has wrong length')
+        n_equ = 1
+
+    #VAR does not have conf_int
+    #params = res.params.T # this is a convention for multi-eq models
+
+    # check that we have the right length of names
+    if not isinstance(endog_names, list):
+        # TODO: this might be specific to multinomial logit type, move?
+        if endog_names is None:
+            endog_basename = 'endog'
+        else:
+            endog_basename = endog_names
+        # TODO: note, the [1:] is specific to current MNLogit
+        endog_names = res.model.endog_names[1:]
+
+    tables = []
+    for eq in range(n_equ):
+        restup = (res, res.params[:,eq], res.bse[:,eq], res.tvalues[:,eq],
+                  res.pvalues[:,eq], res.conf_int(alpha)[eq])
+
+        skiph = False
+        tble = summary_params(restup, yname=endog_names[eq],
+                              xname=exog_names, alpha=alpha, use_t=use_t,
+                              skip_header=skiph)
+
+        tables.append(tble)
+
+    # add titles, they will be moved to header lines in table_extend
+    for i in range(len(endog_names)):
+        tables[i].title = endog_names[i]
+
+    table_all = table_extend(tables, keep_headers=keep_headers)
+
+    return tables, table_all


 def table_extend(tables, keep_headers=True):
@@ -251,7 +692,52 @@ def table_extend(tables, keep_headers=True):
         merged tables as a single SimpleTable instance

     """
-    pass
+    from copy import deepcopy
+    for ii, t in enumerate(tables[:]): #[1:]:
+        t = deepcopy(t)
+
+        #move title to first cell of header
+        # TODO: check if we have multiline headers
+        if t[0].datatype == 'header':
+            t[0][0].data = t.title
+            t[0][0]._datatype = None
+            t[0][0].row = t[0][1].row
+            if not keep_headers and (ii > 0):
+                for c in t[0][1:]:
+                    c.data = ''
+
+        # add separating line and extend tables
+        if ii == 0:
+            table_all = t
+        else:
+            r1 = table_all[-1]
+            r1.add_format('txt', row_dec_below='-')
+            table_all.extend(t)
+
+    table_all.title = None
+    return table_all
+
+
+def summary_return(tables, return_fmt='text'):
+    # join table parts then print
+    if return_fmt == 'text':
+        strdrop = lambda x: str(x).rsplit('\n', 1)[0]
+        # convert to string drop last line
+        return '\n'.join(lmap(strdrop, tables[:-1]) + [str(tables[-1])])
+    elif return_fmt == 'tables':
+        return tables
+    elif return_fmt == 'csv':
+        return '\n'.join(x.as_csv() for x in tables)
+    elif return_fmt == 'latex':
+        # TODO: insert \hline after updating SimpleTable
+        table = copy.deepcopy(tables[0])
+        for part in tables[1:]:
+            table.extend(part)
+        return table.as_latex_tabular()
+    elif return_fmt == 'html':
+        return "\n".join(table.as_html() for table in tables)
+    else:
+        raise ValueError('available output formats are text, tables, csv, latex, html')


 class Summary:
@@ -270,7 +756,6 @@ class Summary:
         extra lines that are added to the text output, used for warnings
         and explanations.
     """
-
     def __init__(self):
         self.tables = []
         self.extra_txt = None
@@ -283,14 +768,14 @@ class Summary:

     def _repr_html_(self):
         """Display as HTML in IPython notebook."""
-        pass
+        return self.as_html()

     def _repr_latex_(self):
         """Display as LaTeX when converting IPython notebook to PDF."""
-        pass
+        return self.as_latex()

-    def add_table_2cols(self, res, title=None, gleft=None, gright=None,
-        yname=None, xname=None):
+    def add_table_2cols(self, res,  title=None, gleft=None, gright=None,
+                        yname=None, xname=None):
         """
         Add a double table, 2 tables with one column merged horizontally

@@ -312,10 +797,13 @@ class Summary:
             optional names for the exogenous variables, default is "var_xx".
             Must match the number of parameters in the model.
         """
-        pass

-    def add_table_params(self, res, yname=None, xname=None, alpha=0.05,
-        use_t=True):
+        table = summary_top(res, title=title, gleft=gleft, gright=gright,
+                            yname=yname, xname=xname)
+        self.tables.append(table)
+
+    def add_table_params(self, res, yname=None, xname=None, alpha=.05,
+                         use_t=True):
         """create and add a table for the parameter estimates

         Parameters
@@ -338,7 +826,16 @@ class Summary:
         None : table is attached

         """
-        pass
+        if res.params.ndim == 1:
+            table = summary_params(res, yname=yname, xname=xname, alpha=alpha,
+                                   use_t=use_t)
+        elif res.params.ndim == 2:
+            _, table = summary_params_2dflat(res, endog_names=yname,
+                                             exog_names=xname,
+                                             alpha=alpha, use_t=use_t)
+        else:
+            raise ValueError('params has to be 1d or 2d')
+        self.tables.append(table)

     def add_extra_txt(self, etext):
         """add additional text that will be added at the end in text format
@@ -349,7 +846,7 @@ class Summary:
             string with lines that are added to the text output.

         """
-        pass
+        self.extra_txt = '\n'.join(etext)

     def as_text(self):
         """return tables as string
@@ -360,7 +857,10 @@ class Summary:
             summary tables and extra text as one string

         """
-        pass
+        txt = summary_return(self.tables, return_fmt='text')
+        if self.extra_txt is not None:
+            txt = txt + '\n\n' + self.extra_txt
+        return txt

     def as_latex(self):
         """return tables as string
@@ -377,7 +877,10 @@ class Summary:
         tables.

         """
-        pass
+        latex = summary_return(self.tables, return_fmt='latex')
+        if self.extra_txt is not None:
+            latex = latex + '\n\n' + self.extra_txt.replace('\n', ' \\newline\n ')
+        return latex

     def as_csv(self):
         """return tables as string
@@ -388,7 +891,10 @@ class Summary:
             concatenated summary tables in comma delimited format

         """
-        pass
+        csv = summary_return(self.tables, return_fmt='csv')
+        if self.extra_txt is not None:
+            csv = csv + '\n\n' + self.extra_txt
+        return csv

     def as_html(self):
         """return tables as string
@@ -399,4 +905,7 @@ class Summary:
             concatenated summary tables in HTML format

         """
-        pass
+        html = summary_return(self.tables, return_fmt='html')
+        if self.extra_txt is not None:
+            html = html + '<br/><br/>' + self.extra_txt.replace('\n', '<br/>')
+        return html
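These restored export methods are what the Summary object returned by a results instance's .summary() call ends up using. A quick usage sketch with synthetic data (the values are made up purely for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.standard_normal((100, 2)))
    y = x @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(100)

    smry = sm.OLS(y, x).fit().summary()   # a statsmodels.iolib.summary.Summary instance
    print(smry.as_text())                 # stacked text tables plus any extra_txt notes
    latex = smry.as_latex()               # concatenated tabulars; needs booktabs when compiled
    html = smry.as_html()                 # also what _repr_html_ returns in notebooks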
diff --git a/statsmodels/iolib/summary2.py b/statsmodels/iolib/summary2.py
index a40869d90..d874096a2 100644
--- a/statsmodels/iolib/summary2.py
+++ b/statsmodels/iolib/summary2.py
@@ -1,17 +1,19 @@
 from statsmodels.compat.pandas import FUTURE_STACK
 from statsmodels.compat.python import lzip
+
 import datetime
 from functools import reduce
 import re
 import textwrap
+
 import numpy as np
 import pandas as pd
+
 from .table import SimpleTable
 from .tableformatting import fmt_latex, fmt_txt


 class Summary:
-
     def __init__(self):
         self.tables = []
         self.settings = []
@@ -27,14 +29,14 @@ class Summary:

     def _repr_html_(self):
         """Display as HTML in IPython notebook."""
-        pass
+        return self.as_html()

     def _repr_latex_(self):
-        """Display as LaTeX when converting IPython notebook to PDF."""
-        pass
+        """Display as LaTeX when converting IPython notebook to PDF."""
+        return self.as_latex()

     def add_df(self, df, index=True, header=True, float_format='%.4f',
-        align='r'):
+               align='r'):
         """
         Add the contents of a DataFrame to summary table

@@ -50,9 +52,13 @@ class Summary:
         align : str
             Data alignment (l/c/r)
         """
-        pass

-    def add_array(self, array, align='r', float_format='%.4f'):
+        settings = {'index': index, 'header': header,
+                    'float_format': float_format, 'align': align}
+        self.tables.append(df)
+        self.settings.append(settings)
+
+    def add_array(self, array, align='r', float_format="%.4f"):
         """Add the contents of a Numpy array to summary table

         Parameters
@@ -63,9 +69,12 @@ class Summary:
         align : str
             Data alignment (l/c/r)
         """
-        pass

-    def add_dict(self, d, ncols=2, align='l', float_format='%.4f'):
+        table = pd.DataFrame(array)
+        self.add_df(table, index=False, header=False,
+                    float_format=float_format, align=align)
+
+    def add_dict(self, d, ncols=2, align='l', float_format="%.4f"):
         """Add the contents of a Dict to summary table

         Parameters
@@ -80,13 +89,24 @@ class Summary:
         float_format : str
             Formatting to float data columns
         """
-        pass
+
+        keys = [_formatter(x, float_format) for x in d.keys()]
+        vals = [_formatter(x, float_format) for x in d.values()]
+        data = np.array(lzip(keys, vals))
+
+        if data.shape[0] % ncols != 0:
+            pad = ncols - (data.shape[0] % ncols)
+            data = np.vstack([data, np.array(pad * [['', '']])])
+
+        data = np.split(data, ncols)
+        data = reduce(lambda x, y: np.hstack([x, y]), data)
+        self.add_array(data, align=align)

     def add_text(self, string):
         """Append a note to the bottom of the summary table. In ASCII tables,
         the note will be wrapped to table width. Notes are not indented.
         """
-        pass
+        self.extra_txt.append(string)

     def add_title(self, title=None, results=None):
         """Insert a title on top of the summary table. If a string is provided
@@ -94,10 +114,19 @@ class Summary:
         provided but a results instance is provided, statsmodels attempts
         to construct a useful title automatically.
         """
-        pass
-
-    def add_base(self, results, alpha=0.05, float_format='%.4f', title=None,
-        xname=None, yname=None):
+        if isinstance(title, str):
+            self.title = title
+        else:
+            if results is not None:
+                model = results.model.__class__.__name__
+                if model in _model_types:
+                    model = _model_types[model]
+                self.title = 'Results: ' + model
+            else:
+                self.title = ''
+
+    def add_base(self, results, alpha=0.05, float_format="%.4f", title=None,
+                 xname=None, yname=None):
         """Try to construct a basic summary instance.

         Parameters
@@ -114,17 +143,71 @@ class Summary:
         yname : str
             Name of the dependent variable (optional)
         """
-        pass
+
+        param = summary_params(results, alpha=alpha, use_t=results.use_t)
+        info = summary_model(results)
+        if xname is not None:
+            param.index = xname
+        if yname is not None:
+            info['Dependent Variable:'] = yname
+        self.add_dict(info, align='l')
+        self.add_df(param, float_format=float_format)
+        self.add_title(title=title, results=results)

     def as_text(self):
         """Generate ASCII Summary Table
         """
-        pass
+
+        tables = self.tables
+        settings = self.settings
+        title = self.title
+        extra_txt = self.extra_txt
+
+        pad_col, pad_index, widest = _measure_tables(tables, settings)
+
+        rule_equal = widest * '='
+
+        simple_tables = _simple_tables(tables, settings, pad_col, pad_index)
+        tab = [x.as_text() for x in simple_tables]
+
+        tab = '\n'.join(tab)
+        tab = tab.split('\n')
+        tab[0] = rule_equal
+        tab.append(rule_equal)
+        tab = '\n'.join(tab)
+
+        if title is not None:
+            if len(title) < widest:
+                title = ' ' * int(widest / 2 - len(title) / 2) + title
+        else:
+            title = ''
+
+        txt = [textwrap.wrap(x, widest) for x in extra_txt]
+        txt = ['\n'.join(x) for x in txt]
+        txt = '\n'.join(txt)
+
+        out = '\n'.join([title, tab, txt])
+
+        return out

     def as_html(self):
         """Generate HTML Summary Table
         """
-        pass
+
+        tables = self.tables
+        settings = self.settings
+
+        simple_tables = _simple_tables(tables, settings)
+        tab = [x.as_html() for x in simple_tables]
+        tab = '\n'.join(tab)
+
+        temp_txt = [st.replace('\n', '<br/>\n') for st in self.extra_txt]
+        txt = '<br/>\n'.join(temp_txt)
+
+        out = '<br/>\n'.join([tab, txt])
+
+        return out

     def as_latex(self, label=''):
         """Generate LaTeX Summary Table
@@ -135,7 +218,35 @@ class Summary:
             Label of the summary table that can be referenced
             in a latex document (optional)
         """
-        pass
+        tables = self.tables
+        settings = self.settings
+        title = self.title
+
+        if title is not None:
+            title = '\\caption{' + title + '}'
+        else:
+            title = '\\caption{}'
+
+        label = '\\label{' + label + '}'
+
+        simple_tables = _simple_tables(tables, settings)
+        tab = [x.as_latex_tabular() for x in simple_tables]
+        tab = '\n\n'.join(tab)
+
+        to_replace = ('\\\\hline\\n\\\\hline\\n\\\\'
+                      'end{tabular}\\n\\\\begin{tabular}{.*}\\n')
+
+        if self._merge_latex:
+            # create single tabular object for summary_col
+            tab = re.sub(to_replace, r'\\midrule\n', tab)
+
+        non_captioned = '\\begin{table}', title, label, tab, '\\end{table}'
+        non_captioned = '\n'.join(non_captioned)
+
+        txt = ' \\newline \n'.join(self.extra_txt)
+        out = non_captioned + '\n\\bigskip\n' + txt
+
+        return out


 def _measure_tables(tables, settings):
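Because add_df, add_dict, add_title and add_text are now implemented, a summary2.Summary can also be assembled by hand from plain pandas objects. A minimal sketch (the table contents are invented for illustration):

    import pandas as pd
    from statsmodels.iolib.summary2 import Summary

    smry = Summary()
    smry.add_title('Hand-built table')
    smry.add_dict({'Model:': 'demo', 'N:': '100'})
    smry.add_df(pd.DataFrame({'coef': [1.00, -0.50], 'se': [0.20, 0.10]},
                             index=['x1', 'x2']))
    smry.add_text('Illustrative values only.')
    print(smry.as_text())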
@@ -144,25 +255,88 @@ def _measure_tables(tables, settings):
     width of the largest table. Then, we add a few spaces to the first
     column to pad the rest.
     """
-    pass
+
+    simple_tables = _simple_tables(tables, settings)
+    tab = [x.as_text() for x in simple_tables]
+
+    length = [len(x.splitlines()[0]) for x in tab]
+    len_max = max(length)
+    pad_sep = []
+    pad_index = []
+
+    for i in range(len(tab)):
+        nsep = max(tables[i].shape[1] - 1, 1)
+        pad = int((len_max - length[i]) / nsep)
+        pad_sep.append(pad)
+        len_new = length[i] + nsep * pad
+        pad_index.append(len_max - len_new)
+
+    return pad_sep, pad_index, max(length)


-_model_types = {'OLS': 'Ordinary least squares', 'GLS':
-    'Generalized least squares', 'GLSAR':
-    'Generalized least squares with AR(p)', 'WLS': 'Weighted least squares',
-    'RLM': 'Robust linear model', 'NBin': 'Negative binomial model', 'GLM':
-    'Generalized linear model'}
+# Useful stuff  # TODO: be more specific
+_model_types = {'OLS': 'Ordinary least squares',
+                'GLS': 'Generalized least squares',
+                'GLSAR': 'Generalized least squares with AR(p)',
+                'WLS': 'Weighted least squares',
+                'RLM': 'Robust linear model',
+                'NBin': 'Negative binomial model',
+                'GLM': 'Generalized linear model'
+                }


 def summary_model(results):
     """
     Create a dict with information about the model
     """
-    pass

-
-def summary_params(results, yname=None, xname=None, alpha=0.05, use_t=True,
-    skip_header=False, float_format='%.4f'):
+    def time_now(*args, **kwds):
+        now = datetime.datetime.now()
+        return now.strftime('%Y-%m-%d %H:%M')
+
+    info = {}
+    info['Model:'] = lambda x: x.model.__class__.__name__
+    info['Model Family:'] = lambda x: x.family.__class__.__name__
+    info['Link Function:'] = lambda x: x.family.link.__class__.__name__
+    info['Dependent Variable:'] = lambda x: x.model.endog_names
+    info['Date:'] = time_now
+    info['No. Observations:'] = lambda x: "%#6d" % x.nobs
+    info['Df Model:'] = lambda x: "%#6d" % x.df_model
+    info['Df Residuals:'] = lambda x: "%#6d" % x.df_resid
+    info['Converged:'] = lambda x: x.mle_retvals['converged']
+    info['No. Iterations:'] = lambda x: x.mle_retvals['iterations']
+    info['Method:'] = lambda x: x.method
+    info['Norm:'] = lambda x: x.fit_options['norm']
+    info['Scale Est.:'] = lambda x: x.fit_options['scale_est']
+    info['Cov. Type:'] = lambda x: x.fit_options['cov']
+
+    rsquared_type = '' if results.k_constant else ' (uncentered)'
+    info['R-squared' + rsquared_type + ':'] = lambda x: "%#8.3f" % x.rsquared
+    info['Adj. R-squared' + rsquared_type + ':'] = lambda x: "%#8.3f" % x.rsquared_adj  # noqa:E501
+    info['Pseudo R-squared:'] = lambda x: "%#8.3f" % x.prsquared
+    info['AIC:'] = lambda x: "%8.4f" % x.aic
+    info['BIC:'] = lambda x: "%8.4f" % x.bic
+    info['Log-Likelihood:'] = lambda x: "%#8.5g" % x.llf
+    info['LL-Null:'] = lambda x: "%#8.5g" % x.llnull
+    info['LLR p-value:'] = lambda x: "%#8.5g" % x.llr_pvalue
+    info['Deviance:'] = lambda x: "%#8.5g" % x.deviance
+    info['Pearson chi2:'] = lambda x: "%#6.3g" % x.pearson_chi2
+    info['F-statistic:'] = lambda x: "%#8.4g" % x.fvalue
+    info['Prob (F-statistic):'] = lambda x: "%#6.3g" % x.f_pvalue
+    info['Scale:'] = lambda x: "%#8.5g" % x.scale
+    out = {}
+    for key, func in info.items():
+        try:
+            out[key] = func(results)
+        except (AttributeError, KeyError, NotImplementedError):
+            # NOTE: some models do not have loglike defined (RLM),
+            #   so raise NotImplementedError
+            pass
+    return out
+
+
+def summary_params(results, yname=None, xname=None, alpha=.05, use_t=True,
+                   skip_header=False, float_format="%.4f"):
     """create a summary table of parameters from results instance

     Parameters
@@ -189,23 +363,115 @@ def summary_params(results, yname=None, xname=None, alpha=0.05, use_t=True,
     -------
     params_table : SimpleTable instance
     """
-    pass
-

+    if isinstance(results, tuple):
+        results, params, bse, tvalues, pvalues, conf_int = results
+    else:
+        params = results.params
+        bse = results.bse
+        tvalues = results.tvalues
+        pvalues = results.pvalues
+        conf_int = results.conf_int(alpha)
+
+    data = np.array([params, bse, tvalues, pvalues]).T
+    data = np.hstack([data, conf_int])
+    data = pd.DataFrame(data)
+
+    if use_t:
+        data.columns = ['Coef.', 'Std.Err.', 't', 'P>|t|',
+                        '[' + str(alpha / 2), str(1 - alpha / 2) + ']']
+    else:
+        data.columns = ['Coef.', 'Std.Err.', 'z', 'P>|z|',
+                        '[' + str(alpha / 2), str(1 - alpha / 2) + ']']
+
+    if not xname:
+        try:
+            data.index = results.model.data.param_names
+        except AttributeError:
+            data.index = results.model.exog_names
+    else:
+        data.index = xname
+
+    return data
+
+
+# Vertical summary instance for multiple models
 def _col_params(result, float_format='%.4f', stars=True, include_r2=False):
     """Stack coefficients and standard errors in single column
     """
-    pass
+
+    # Extract parameters
+    res = summary_params(result)
+    # Format float
+    for col in res.columns[:2]:
+        res[col] = res[col].apply(lambda x: float_format % x)
+    # Std.Errors in parentheses
+    res.iloc[:, 1] = '(' + res.iloc[:, 1] + ')'
+    # Significance stars
+    if stars:
+        idx = res.iloc[:, 3] < .1
+        res.loc[idx, res.columns[0]] = res.loc[idx, res.columns[0]] + '*'
+        idx = res.iloc[:, 3] < .05
+        res.loc[idx, res.columns[0]] = res.loc[idx, res.columns[0]] + '*'
+        idx = res.iloc[:, 3] < .01
+        res.loc[idx, res.columns[0]] = res.loc[idx, res.columns[0]] + '*'
+    # Stack Coefs and Std.Errors
+    res = res.iloc[:, :2]
+    res = res.stack(**FUTURE_STACK)
+
+    # Add R-squared
+    if include_r2:
+        rsquared = getattr(result, 'rsquared', np.nan)
+        rsquared_adj = getattr(result, 'rsquared_adj', np.nan)
+        r2 = pd.Series({('R-squared', ""): rsquared,
+                        ('R-squared Adj.', ""): rsquared_adj})
+
+        if r2.notnull().any():
+            r2 = r2.apply(lambda x: float_format % x)
+            res = pd.concat([res, r2], axis=0)
+
+    res = pd.DataFrame(res)
+    res.columns = [str(result.model.endog_names)]
+    return res


 def _col_info(result, info_dict=None):
     """Stack model info in a column
     """
-    pass
+
+    if info_dict is None:
+        info_dict = {}
+    out = []
+    index = []
+    for i in info_dict:
+        if isinstance(info_dict[i], dict):
+            # this is a specific model info_dict, but not for this result...
+            continue
+        try:
+            out.append(info_dict[i](result))
+        except AttributeError:
+            out.append('')
+        index.append(i)
+    out = pd.DataFrame({str(result.model.endog_names): out}, index=index)
+    return out
+
+
+def _make_unique(list_of_names):
+    if len(set(list_of_names)) == len(list_of_names):
+        return list_of_names
+    # pandas does not like it if multiple columns have the same names
+    from collections import defaultdict
+    name_counter = defaultdict(str)
+    header = []
+    for _name in list_of_names:
+        name_counter[_name] += "I"
+        header.append(_name + " " + name_counter[_name])
+    return header


 def summary_col(results, float_format='%.4f', model_names=(), stars=False,
-    info_dict=None, regressor_order=(), drop_omitted=False, include_r2=True):
+                info_dict=None, regressor_order=(), drop_omitted=False,
+                include_r2=True):
     """
     Summarize multiple results instances side-by-side (coefs and SEs)

@@ -239,4 +505,144 @@ def summary_col(results, float_format='%.4f', model_names=(), stars=False,
     include_r2 : bool, optional
         Includes R2 and adjusted R2 in the summary table.
     """
-    pass
+
+    if not isinstance(results, list):
+        results = [results]
+
+    cols = [_col_params(x, stars=stars, float_format=float_format,
+                        include_r2=include_r2) for x in results]
+
+    # Unique column names (pandas has problems merging otherwise)
+    if model_names:
+        colnames = _make_unique(model_names)
+    else:
+        colnames = _make_unique([x.columns[0] for x in cols])
+    for i in range(len(cols)):
+        cols[i].columns = [colnames[i]]
+
+    def merg(x, y):
+        return x.merge(y, how='outer', right_index=True, left_index=True)
+
+    # Changes due to how pandas 2.2.0 handles merge
+    index = list(cols[0].index)
+    for col in cols[1:]:
+        for key in col.index:
+            if key not in index:
+                index.append(key)
+    for special in (('R-squared', ''), ('R-squared Adj.', '')):
+        if special in index:
+            index.remove(special)
+            index.insert(len(index), special)
+
+    summ = reduce(merg, cols)
+    summ = summ.reindex(index)
+
+    if regressor_order:
+        varnames = summ.index.get_level_values(0).tolist()
+        vc = pd.Series(varnames).value_counts()
+        varnames = vc.loc[vc == 2].index.tolist()
+        ordered = [x for x in regressor_order if x in varnames]
+        unordered = [x for x in varnames if x not in regressor_order]
+        new_order = ordered + unordered
+        other = [x for x in summ.index.get_level_values(0)
+                 if x not in new_order]
+        new_order += other
+        if drop_omitted:
+            for uo in unordered:
+                new_order.remove(uo)
+        summ = summ.reindex(new_order, level=0)
+
+    idx = []
+    index = summ.index.get_level_values(0)
+    for i in range(0, index.shape[0], 2):
+        idx.append(index[i])
+        if (i + 1) < index.shape[0] and (index[i] == index[i + 1]):
+            idx.append("")
+        else:
+            idx.append(index[i + 1])
+    summ.index = idx
+
+    # add infos about the models.
+    if info_dict:
+        cols = [_col_info(x, info_dict.get(x.model.__class__.__name__,
+                                           info_dict)) for x in results]
+    else:
+        cols = [_col_info(x, getattr(x, "default_model_infos", None)) for x in
+                results]
+    # use unique column names, otherwise the merge will not succeed
+    for df, name in zip(cols, _make_unique([df.columns[0] for df in cols])):
+        df.columns = [name]
+
+    info = reduce(merg, cols)
+    dat = pd.DataFrame(np.vstack([summ, info]))  # pd.concat would be cleaner, but errors here
+    dat.columns = summ.columns
+    dat.index = pd.Index(summ.index.tolist() + info.index.tolist())
+    summ = dat
+
+    summ = summ.fillna('')
+
+    smry = Summary()
+    smry._merge_latex = True
+    smry.add_df(summ, header=True, align='l')
+    smry.add_text('Standard errors in parentheses.')
+    if stars:
+        smry.add_text('* p<.1, ** p<.05, ***p<.01')
+
+    return smry
+
+
+def _formatter(element, float_format='%.4f'):
+    try:
+        out = float_format % element
+    except (ValueError, TypeError):
+        out = str(element)
+    return out.strip()
+
+
+def _df_to_simpletable(df, align='r', float_format="%.4f", header=True,
+                       index=True, table_dec_above='-', table_dec_below=None,
+                       header_dec_below='-', pad_col=0, pad_index=0):
+    dat = df.copy()
+    try:
+        dat = dat.map(lambda x: _formatter(x, float_format))
+    except AttributeError:
+        dat = dat.applymap(lambda x: _formatter(x, float_format))
+    if header:
+        headers = [str(x) for x in dat.columns.tolist()]
+    else:
+        headers = None
+    if index:
+        stubs = [str(x) + int(pad_index) * ' ' for x in dat.index.tolist()]
+    else:
+        dat.iloc[:, 0] = [str(x) + int(pad_index) * ' '
+                          for x in dat.iloc[:, 0]]
+        stubs = None
+    st = SimpleTable(np.array(dat), headers=headers, stubs=stubs,
+                     ltx_fmt=fmt_latex, txt_fmt=fmt_txt)
+    st.output_formats['latex']['data_aligns'] = align
+    st.output_formats['latex']['header_align'] = align
+    st.output_formats['txt']['data_aligns'] = align
+    st.output_formats['txt']['table_dec_above'] = table_dec_above
+    st.output_formats['txt']['table_dec_below'] = table_dec_below
+    st.output_formats['txt']['header_dec_below'] = header_dec_below
+    st.output_formats['txt']['colsep'] = ' ' * int(pad_col + 1)
+    return st
+
+
+def _simple_tables(tables, settings, pad_col=None, pad_index=None):
+    simple_tables = []
+    float_format = settings[0]['float_format'] if settings else '%.4f'
+    if pad_col is None:
+        pad_col = [0] * len(tables)
+    if pad_index is None:
+        pad_index = [0] * len(tables)
+    for i, v in enumerate(tables):
+        index = settings[i]['index']
+        header = settings[i]['header']
+        align = settings[i]['align']
+        simple_tables.append(_df_to_simpletable(v, align=align,
+                                                float_format=float_format,
+                                                header=header, index=index,
+                                                pad_col=pad_col[i],
+                                                pad_index=pad_index[i]))
+    return simple_tables
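With summary_col filled in, several fitted models can be compared side by side in one table. A short sketch with synthetic data (model names and coefficients are illustrative only):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.iolib.summary2 import summary_col

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.standard_normal((200, 2)))
    y1 = x @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(200)
    y2 = x @ np.array([0.2, 1.5, 0.0]) + rng.standard_normal(200)

    res1 = sm.OLS(y1, x).fit()
    res2 = sm.OLS(y2, x).fit()

    tbl = summary_col([res1, res2], stars=True, float_format='%.3f',
                      model_names=['m1', 'm2'])
    print(tbl.as_text())   # coefficients with SEs in parentheses; R-squared rows at the bottom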
diff --git a/statsmodels/iolib/table.py b/statsmodels/iolib/table.py
index a4eb14c0e..3293b5d60 100644
--- a/statsmodels/iolib/table.py
+++ b/statsmodels/iolib/table.py
@@ -81,7 +81,9 @@ Potential problems for Python 3
 :change: 2010-05-02 eliminate newlines that came before and after table
 :change: 2010-05-06 add `label_cells` to `SimpleTable`
 """
+
 from statsmodels.compat.python import lmap, lrange
+
 from itertools import cycle, zip_longest
 import csv

@@ -94,7 +96,29 @@ def csv2st(csvfile, headers=False, stubs=False, title=None):
     The first column may contain stubs: set stubs=True.
     Can also supply headers and stubs as tuples of strings.
     """
-    pass
+    rows = list()
+    with open(csvfile, 'r', encoding="utf-8") as fh:
+        reader = csv.reader(fh)
+        if headers is True:
+            headers = next(reader)
+        elif headers is False:
+            headers = ()
+        if stubs is True:
+            stubs = list()
+            for row in reader:
+                if row:
+                    stubs.append(row[0])
+                    rows.append(row[1:])
+        else:  # no stubs, or stubs provided
+            for row in reader:
+                if row:
+                    rows.append(row)
+        if stubs is False:
+            stubs = ()
+    ncols = len(rows[0])
+    if any(len(row) != ncols for row in rows):
+        raise IOError('All rows of CSV file must have same length.')
+    return SimpleTable(data=rows, headers=headers, stubs=stubs)


 class SimpleTable(list):
@@ -123,10 +147,9 @@ class SimpleTable(list):
         with open('c:/temp/temp.tex','w') as fh:
             fh.write( tbl.as_latex_tabular() )
     """
-
-    def __init__(self, data, headers=None, stubs=None, title='', datatypes=
-        None, csv_fmt=None, txt_fmt=None, ltx_fmt=None, html_fmt=None,
-        celltype=None, rowtype=None, **fmt_dict):
+    def __init__(self, data, headers=None, stubs=None, title='',
+                 datatypes=None, csv_fmt=None, txt_fmt=None, ltx_fmt=None,
+                 html_fmt=None, celltype=None, rowtype=None, **fmt_dict):
         """
         Parameters
         ----------
@@ -159,23 +182,31 @@ class SimpleTable(list):
         self._datatypes = datatypes
         if self._datatypes is None:
             self._datatypes = [] if len(data) == 0 else lrange(len(data[0]))
+        # start with default formatting
         self._txt_fmt = default_txt_fmt.copy()
         self._latex_fmt = default_latex_fmt.copy()
         self._csv_fmt = default_csv_fmt.copy()
         self._html_fmt = default_html_fmt.copy()
+        # substitute any general user specified formatting
+        # :note: these will be overridden by output specific arguments
         self._csv_fmt.update(fmt_dict)
         self._txt_fmt.update(fmt_dict)
         self._latex_fmt.update(fmt_dict)
         self._html_fmt.update(fmt_dict)
+        # substitute any output-type specific formatting
         self._csv_fmt.update(csv_fmt or dict())
         self._txt_fmt.update(txt_fmt or dict())
         self._latex_fmt.update(ltx_fmt or dict())
         self._html_fmt.update(html_fmt or dict())
-        self.output_formats = dict(txt=self._txt_fmt, csv=self._csv_fmt,
-            html=self._html_fmt, latex=self._latex_fmt)
+        self.output_formats = dict(
+            txt=self._txt_fmt,
+            csv=self._csv_fmt,
+            html=self._html_fmt,
+            latex=self._latex_fmt
+        )
         self._Cell = celltype or Cell
         self._Row = rowtype or Row
-        rows = self._data2rows(data)
+        rows = self._data2rows(data)  # a list of Row instances
         list.__init__(self, rows)
         self._add_headers_stubs(headers, stubs)
         self._colwidths = dict()
@@ -186,6 +217,12 @@ class SimpleTable(list):
     def __repr__(self):
         return str(type(self))

+    def _repr_html_(self, **fmt_dict):
+        return self.as_html(**fmt_dict)
+
+    def _repr_latex_(self, center=True, **fmt_dict):
+        return self.as_latex_tabular(center, **fmt_dict)
+
     def _add_headers_stubs(self, headers, stubs):
         """Return None.  Adds headers and stubs to table,
         if these were provided at initialization.
@@ -198,58 +235,160 @@ class SimpleTable(list):

         :note: a header row does not receive a stub!
         """
-        pass
+        if headers:
+            self.insert_header_row(0, headers, dec_below='header_dec_below')
+        if stubs:
+            self.insert_stubs(0, stubs)

     def insert(self, idx, row, datatype=None):
         """Return None.  Insert a row into a table.
         """
-        pass
+        if datatype is None:
+            try:
+                datatype = row.datatype
+            except AttributeError:
+                pass
+        row = self._Row(row, datatype=datatype, table=self)
+        list.insert(self, idx, row)

     def insert_header_row(self, rownum, headers, dec_below='header_dec_below'):
         """Return None.  Insert a row of headers,
         where ``headers`` is a sequence of strings.
         (The strings may contain newlines, to indicated multiline headers.)
         """
-        pass
+        header_rows = [header.split('\n') for header in headers]
+        # rows in reverse order
+        rows = list(zip_longest(*header_rows, fillvalue=''))
+        rows.reverse()
+        for i, row in enumerate(rows):
+            self.insert(rownum, row, datatype='header')
+            if i == 0:
+                self[rownum].dec_below = dec_below
+            else:
+                self[rownum].dec_below = None

     def insert_stubs(self, loc, stubs):
         """Return None.  Insert column of stubs at column `loc`.
         If there is a header row, it gets an empty cell.
         So ``len(stubs)`` should equal the number of non-header rows.
         """
-        pass
+        _Cell = self._Cell
+        stubs = iter(stubs)
+        for row in self:
+            if row.datatype == 'header':
+                empty_cell = _Cell('', datatype='empty')
+                row.insert(loc, empty_cell)
+            else:
+                try:
+                    row.insert_stub(loc, next(stubs))
+                except StopIteration:
+                    raise ValueError('length of stubs must match table length')

     def _data2rows(self, raw_data):
         """Return list of Row,
         the raw data as rows of cells.
         """
-        pass
+
+        _Cell = self._Cell
+        _Row = self._Row
+        rows = []
+        for datarow in raw_data:
+            dtypes = cycle(self._datatypes)
+            newrow = _Row(datarow, datatype='data', table=self, celltype=_Cell)
+            for cell in newrow:
+                cell.datatype = next(dtypes)
+                cell.row = newrow  # a cell knows its row
+            rows.append(newrow)
+
+        return rows

     def pad(self, s, width, align):
         """DEPRECATED: just use the pad function"""
-        pass
+        return pad(s, width, align)

     def _get_colwidths(self, output_format, **fmt_dict):
         """Return list, the calculated widths of each column."""
-        pass
+        output_format = get_output_format(output_format)
+        fmt = self.output_formats[output_format].copy()
+        fmt.update(fmt_dict)
+        ncols = max(len(row) for row in self)
+        request = fmt.get('colwidths')
+        if request == 0:  # no extra space desired (e.g., CSV)
+            return [0] * ncols
+        elif request is None:  # no request; widths are determined by content below
+            request = [0] * ncols
+        elif isinstance(request, int):
+            request = [request] * ncols
+        elif len(request) < ncols:
+            request = [request[i % len(request)] for i in range(ncols)]
+        min_widths = []
+        for col in zip(*self):
+            maxwidth = max(len(c.format(0, output_format, **fmt)) for c in col)
+            min_widths.append(maxwidth)
+        result = lmap(max, min_widths, request)
+        return result

     def get_colwidths(self, output_format, **fmt_dict):
         """Return list, the widths of each column."""
-        pass
+        call_args = [output_format]
+        for k, v in sorted(fmt_dict.items()):
+            if isinstance(v, list):
+                call_args.append((k, tuple(v)))
+            elif isinstance(v, dict):
+                call_args.append((k, tuple(sorted(v.items()))))
+            else:
+                call_args.append((k, v))
+        key = tuple(call_args)
+        try:
+            return self._colwidths[key]
+        except KeyError:
+            self._colwidths[key] = self._get_colwidths(output_format,
+                                                       **fmt_dict)
+            return self._colwidths[key]

     def _get_fmt(self, output_format, **fmt_dict):
         """Return dict, the formatting options.
         """
-        pass
+        output_format = get_output_format(output_format)
+        # first get the default formatting
+        try:
+            fmt = self.output_formats[output_format].copy()
+        except KeyError:
+            raise ValueError('Unknown format: %s' % output_format)
+        # then, add formatting specific to this call
+        fmt.update(fmt_dict)
+        return fmt

     def as_csv(self, **fmt_dict):
         """Return string, the table in CSV format.
         Currently only supports comma separator."""
-        pass
+        # fetch the format, which may just be default_csv_format
+        fmt = self._get_fmt('csv', **fmt_dict)
+        return self.as_text(**fmt)

     def as_text(self, **fmt_dict):
         """Return string, the table as text."""
-        pass
+        # fetch the text format, override with fmt_dict
+        fmt = self._get_fmt('txt', **fmt_dict)
+        # get rows formatted as strings
+        formatted_rows = [row.as_string('text', **fmt) for row in self]
+        rowlen = len(formatted_rows[-1])  # do not use header row
+
+        # place decoration above the table body, if desired
+        table_dec_above = fmt.get('table_dec_above', '=')
+        if table_dec_above:
+            formatted_rows.insert(0, table_dec_above * rowlen)
+        # next place a title at the very top, if desired
+        # :note: user can include a newlines at end of title if desired
+        title = self.title
+        if title:
+            title = pad(self.title, rowlen, fmt.get('title_align', 'c'))
+            formatted_rows.insert(0, title)
+        # add decoration below the table, if desired
+        table_dec_below = fmt.get('table_dec_below', '-')
+        if table_dec_below:
+            formatted_rows.append(table_dec_below * rowlen)
+        return '\n'.join(formatted_rows)

     def as_html(self, **fmt_dict):
         """Return string.
@@ -257,12 +396,63 @@ class SimpleTable(list):
         An HTML table formatter must accept as arguments
         a table and a format dictionary.
         """
-        pass
+        # fetch the text format, override with fmt_dict
+        fmt = self._get_fmt('html', **fmt_dict)
+        formatted_rows = ['<table class="simpletable">']
+        if self.title:
+            title = '<caption>%s</caption>' % self.title
+            formatted_rows.append(title)
+        formatted_rows.extend(row.as_string('html', **fmt) for row in self)
+        formatted_rows.append('</table>')
+        return '\n'.join(formatted_rows)

     def as_latex_tabular(self, center=True, **fmt_dict):
-        """Return string, the table as a LaTeX tabular environment.
-        Note: will require the booktabs package."""
-        pass
+        """Return string, the table as a LaTeX tabular environment.
+        Note: will require the booktabs package."""
+        # fetch the text format, override with fmt_dict
+        fmt = self._get_fmt('latex', **fmt_dict)
+
+        formatted_rows = []
+        if center:
+            formatted_rows.append(r'\begin{center}')
+
+        table_dec_above = fmt['table_dec_above'] or ''
+        table_dec_below = fmt['table_dec_below'] or ''
+
+        prev_aligns = None
+        last = None
+        for row in self + [last]:
+            if row == last:
+                aligns = None
+            else:
+                aligns = row.get_aligns('latex', **fmt)
+
+            if aligns != prev_aligns:
+                # When the number/type of columns changes...
+                if prev_aligns:
+                    # ... if there is a tabular to close, close it...
+                    formatted_rows.append(table_dec_below)
+                    formatted_rows.append(r'\end{tabular}')
+                if aligns:
+                    # ... and if there are more lines, open a new one:
+                    formatted_rows.append(r'\begin{tabular}{%s}' % aligns)
+                    if not prev_aligns:
+                        # (with a nice line if it's the top of the whole table)
+                        formatted_rows.append(table_dec_above)
+            if row != last:
+                formatted_rows.append(
+                    row.as_string(output_format='latex', **fmt))
+            prev_aligns = aligns
+        # tabular does not support caption, but make it available for
+        # figure environment
+        if self.title:
+            title = r'%%\caption{%s}' % self.title
+            formatted_rows.append(title)
+        if center:
+            formatted_rows.append(r'\end{center}')
+
+        # Replace $$ due to bug in GH 5444
+        return '\n'.join(formatted_rows).replace('$$', ' ')

     def extend_right(self, table):
         """Return None.
@@ -275,29 +465,43 @@ class SimpleTable(list):
         only if the two tables have the same number of columns,
         but that is not enforced.
         """
-        pass
+        for row1, row2 in zip(self, table):
+            row1.extend(row2)

     def label_cells(self, func):
         """Return None.  Labels cells based on `func`.
         If ``func(cell) is None`` then its datatype is
         not changed; otherwise it is set to ``func(cell)``.
         """
-        pass
+        for row in self:
+            for cell in row:
+                label = func(cell)
+                if label is not None:
+                    cell.datatype = label
+
+    @property
+    def data(self):
+        return [row.data for row in self]


 def pad(s, width, align):
     """Return string padded with spaces,
     based on alignment parameter."""
-    pass
+    if align == 'l':
+        s = s.ljust(width)
+    elif align == 'r':
+        s = s.rjust(width)
+    else:
+        s = s.center(width)
+    return s


 class Row(list):
     """Provides a table row as a list of cells.
     A row can belong to a SimpleTable, but does not have to.
     """
-
     def __init__(self, seq, datatype='data', table=None, celltype=None,
-        dec_below='row_dec_below', **fmt_dict):
+                 dec_below='row_dec_below', **fmt_dict):
         """
         Parameters
         ----------
@@ -318,7 +522,7 @@ class Row(list):
                 celltype = table._Cell
         self._Cell = celltype
         self._fmt = fmt_dict
-        self.special_fmts = dict()
+        self.special_fmts = dict()  # special formatting for any output format
         self.dec_below = dec_below
         list.__init__(self, (celltype(cell, row=self) for cell in seq))

@@ -328,23 +532,48 @@ class Row(list):
         for the specified output format.
         Example: myrow.add_format('txt', row_dec_below='+-')
         """
-        pass
+        output_format = get_output_format(output_format)
+        if output_format not in self.special_fmts:
+            self.special_fmts[output_format] = dict()
+        self.special_fmts[output_format].update(fmt_dict)

     def insert_stub(self, loc, stub):
         """Return None.  Inserts a stub cell
         in the row at `loc`.
         """
-        pass
+        _Cell = self._Cell
+        if not isinstance(stub, _Cell):
+            stub = _Cell(stub, datatype='stub', row=self)
+        self.insert(loc, stub)

     def _get_fmt(self, output_format, **fmt_dict):
         """Return dict, the formatting options.
         """
-        pass
+        output_format = get_output_format(output_format)
+        # first get the default formatting
+        try:
+            fmt = default_fmts[output_format].copy()
+        except KeyError:
+            raise ValueError('Unknown format: %s' % output_format)
+        # second get table specific formatting (if possible)
+        try:
+            fmt.update(self.table.output_formats[output_format])
+        except AttributeError:
+            pass
+        # finally, add formatting for this row and this call
+        fmt.update(self._fmt)
+        fmt.update(fmt_dict)
+        special_fmt = self.special_fmts.get(output_format, None)
+        if special_fmt is not None:
+            fmt.update(special_fmt)
+        return fmt

     def get_aligns(self, output_format, **fmt_dict):
         """Return string, sequence of column alignments.
         Ensure comformable data_aligns in `fmt_dict`."""
-        pass
+        fmt = self._get_fmt(output_format, **fmt_dict)
+        return ''.join(cell.alignment(output_format, **fmt) for cell in self)

     def as_string(self, output_format='txt', **fmt_dict):
         """Return string: the formatted row.
@@ -354,21 +583,61 @@ class Row(list):
         a row (self) and an output format,
         one of ('html', 'txt', 'csv', 'latex').
         """
-        pass
+        fmt = self._get_fmt(output_format, **fmt_dict)
+
+        # get column widths
+        try:
+            colwidths = self.table.get_colwidths(output_format, **fmt)
+        except AttributeError:
+            colwidths = fmt.get('colwidths')
+        if colwidths is None:
+            colwidths = (0,) * len(self)
+
+        colsep = fmt['colsep']
+        row_pre = fmt.get('row_pre', '')
+        row_post = fmt.get('row_post', '')
+        formatted_cells = []
+        for cell, width in zip(self, colwidths):
+            content = cell.format(width, output_format=output_format, **fmt)
+            formatted_cells.append(content)
+        formatted_row = row_pre + colsep.join(formatted_cells) + row_post
+        formatted_row = self._decorate_below(formatted_row, output_format,
+                                             **fmt)
+        return formatted_row

     def _decorate_below(self, row_as_string, output_format, **fmt_dict):
         """This really only makes sense for the text and latex output formats.
         """
-        pass
+        dec_below = fmt_dict.get(self.dec_below, None)
+        if dec_below is None:
+            result = row_as_string
+        else:
+            output_format = get_output_format(output_format)
+            if output_format == 'txt':
+                row0len = len(row_as_string)
+                dec_len = len(dec_below)
+                repeat, addon = divmod(row0len, dec_len)
+                result = row_as_string + "\n" + (dec_below * repeat +
+                                                 dec_below[:addon])
+            elif output_format == 'latex':
+                result = row_as_string + "\n" + dec_below
+            else:
+                raise ValueError("I cannot decorate a %s header." %
+                                 output_format)
+        return result
+
+    @property
+    def data(self):
+        return [cell.data for cell in self]


 class Cell:
     """Provides a table cell.
     A cell can belong to a Row, but does not have to.
     """
-
     def __init__(self, data='', datatype=None, row=None, **fmt_dict):
         if isinstance(data, Cell):
+            # might have passed a Cell instance
             self.data = data.data
             self._datatype = data.datatype
             self._fmt = data._fmt
@@ -385,7 +654,52 @@ class Cell:
     def _get_fmt(self, output_format, **fmt_dict):
         """Return dict, the formatting options.
         """
-        pass
+        output_format = get_output_format(output_format)
+        # first get the default formatting
+        try:
+            fmt = default_fmts[output_format].copy()
+        except KeyError:
+            raise ValueError('Unknown format: %s' % output_format)
+        # then get any table specific formatting
+        try:
+            fmt.update(self.row.table.output_formats[output_format])
+        except AttributeError:
+            pass
+        # then get any row specific formatting
+        try:
+            fmt.update(self.row._fmt)
+        except AttributeError:
+            pass
+        # finally add formatting for this instance and call
+        fmt.update(self._fmt)
+        fmt.update(fmt_dict)
+        return fmt
+
+    def alignment(self, output_format, **fmt_dict):
+        fmt = self._get_fmt(output_format, **fmt_dict)
+        datatype = self.datatype
+        data_aligns = fmt.get('data_aligns', 'c')
+        if isinstance(datatype, int):
+            align = data_aligns[datatype % len(data_aligns)]
+        elif datatype == 'stub':
+            # still support deprecated `stubs_align`
+            align = fmt.get('stubs_align') or fmt.get('stub_align', 'l')
+        elif datatype in fmt:
+            label_align = '%s_align' % datatype
+            align = fmt.get(label_align, 'c')
+        else:
+            raise ValueError('Unknown cell datatype: %s' % datatype)
+        return align
+
+    @staticmethod
+    def _latex_escape(data, fmt, output_format):
+        if output_format != 'latex':
+            return data
+        if "replacements" in fmt:
+            if isinstance(data, str):
+                for repl in sorted(fmt["replacements"]):
+                    data = data.replace(repl, fmt["replacements"][repl])
+        return data

     def format(self, width, output_format='txt', **fmt_dict):
         """Return string.
@@ -397,10 +711,55 @@ class Cell:
         It will generally respond to the datatype,
         one of (int, 'header', 'stub').
         """
-        pass
+        fmt = self._get_fmt(output_format, **fmt_dict)
+
+        data = self.data
+        datatype = self.datatype
+        data_fmts = fmt.get('data_fmts')
+        if data_fmts is None:
+            # check: allow for deprecated use of data_fmt
+            data_fmt = fmt.get('data_fmt')
+            if data_fmt is None:
+                data_fmt = '%s'
+            data_fmts = [data_fmt]
+        if isinstance(datatype, int):
+            datatype = datatype % len(data_fmts)  # constrain to indexes
+            data_fmt = data_fmts[datatype]
+            if isinstance(data_fmt, str):
+                content = data_fmt % (data,)
+            elif callable(data_fmt):
+                content = data_fmt(data)
+            else:
+                raise TypeError("Must be a string or a callable")
+            if datatype == 0:
+                content = self._latex_escape(content, fmt, output_format)
+        elif datatype in fmt:
+            data = self._latex_escape(data, fmt, output_format)
+
+            dfmt = fmt.get(datatype)
+            try:
+                content = dfmt % (data,)
+            except TypeError:  # dfmt is not a substitution string
+                content = dfmt
+        else:
+            raise ValueError('Unknown cell datatype: %s' % datatype)
+        align = self.alignment(output_format, **fmt)
+        return pad(content, width, align)
+
+    def get_datatype(self):
+        if self._datatype is None:
+            dtype = self.row.datatype
+        else:
+            dtype = self._datatype
+        return dtype
+
+    def set_datatype(self, val):
+        # TODO: add checking
+        self._datatype = val
     datatype = property(get_datatype, set_datatype)


+# begin: default formats for SimpleTable
 """ Some formatting suggestions:

 - if you want rows to have no extra spacing,
@@ -416,34 +775,153 @@ class Cell:
         colwidths = 14,
         data_aligns = "r",
 """
-default_txt_fmt = dict(fmt='txt', table_dec_above='=', table_dec_below='-',
-    title_align='c', row_pre='', row_post='', header_dec_below='-',
-    row_dec_below=None, colwidths=None, colsep=' ', data_aligns='r',
-    data_fmts=['%s'], stub_align='l', header_align='c', header_fmt='%s',
-    stub_fmt='%s', header='%s', stub='%s', empty_cell='', empty='', missing
-    ='--')
-default_csv_fmt = dict(fmt='csv', table_dec_above=None, table_dec_below=
-    None, row_pre='', row_post='', header_dec_below=None, row_dec_below=
-    None, title_align='', data_aligns='l', colwidths=None, colsep=',',
-    data_fmt='%s', data_fmts=['%s'], stub_align='l', header_align='c',
-    header_fmt='"%s"', stub_fmt='"%s"', empty_cell='', header='%s', stub=
-    '%s', empty='', missing='--')
-default_html_fmt = dict(table_dec_above=None, table_dec_below=None,
-    header_dec_below=None, row_dec_below=None, title_align='c', colwidths=
-    None, colsep=' ', row_pre='<tr>\n  ', row_post='\n</tr>', data_aligns=
-    'c', data_fmts=['<td>%s</td>'], data_fmt='<td>%s</td>', stub_align='l',
-    header_align='c', header_fmt='<th>%s</th>', stub_fmt='<th>%s</th>',
-    empty_cell='<td></td>', header='<th>%s</th>', stub='<th>%s</th>', empty
-    ='<td></td>', missing='<td>--</td>')
-default_latex_fmt = dict(fmt='ltx', table_dec_above='\\toprule',
-    table_dec_below='\\bottomrule', header_dec_below='\\midrule',
-    row_dec_below=None, strip_backslash=True, row_post='  \\\\',
-    data_aligns='c', colwidths=None, colsep=' & ', data_fmts=['%s'],
-    data_fmt='%s', stub_align='l', header_align='c', empty_align='l',
-    header_fmt='\\textbf{%s}', stub_fmt='\\textbf{%s}', empty_cell='',
-    header='\\textbf{%s}', stub='\\textbf{%s}', empty='', missing='--',
-    replacements={'#': '\\#', '$': '\\$', '%': '\\%', '&': '\\&', '>':
-    '$>$', '_': '\\_', '|': '$|$'})
-default_fmts = dict(html=default_html_fmt, txt=default_txt_fmt, latex=
-    default_latex_fmt, csv=default_csv_fmt)
-output_format_translations = dict(htm='html', text='txt', ltx='latex')
+default_txt_fmt = dict(
+    fmt='txt',
+    # basic table formatting
+    table_dec_above='=',
+    table_dec_below='-',
+    title_align='c',
+    # basic row formatting
+    row_pre='',
+    row_post='',
+    header_dec_below='-',
+    row_dec_below=None,
+    colwidths=None,
+    colsep=' ',
+    data_aligns="r",  # GH 1477
+    # data formats
+    # data_fmt="%s",  #deprecated; use data_fmts
+    data_fmts=["%s"],
+    # labeled alignments
+    # stubs_align='l',   #deprecated; use data_fmts
+    stub_align='l',
+    header_align='c',
+    # labeled formats
+    header_fmt='%s',  # deprecated; just use 'header'
+    stub_fmt='%s',  # deprecated; just use 'stub'
+    header='%s',
+    stub='%s',
+    empty_cell='',  # deprecated; just use 'empty'
+    empty='',
+    missing='--',
+)
+
+default_csv_fmt = dict(
+    fmt='csv',
+    table_dec_above=None,  # '',
+    table_dec_below=None,  # '',
+    # basic row formatting
+    row_pre='',
+    row_post='',
+    header_dec_below=None,  # '',
+    row_dec_below=None,
+    title_align='',
+    data_aligns="l",
+    colwidths=None,
+    colsep=',',
+    # data formats
+    data_fmt='%s',  # deprecated; use data_fmts
+    data_fmts=['%s'],
+    # labeled alignments
+    # stubs_align='l',   # deprecated; use data_fmts
+    stub_align="l",
+    header_align='c',
+    # labeled formats
+    header_fmt='"%s"',  # deprecated; just use 'header'
+    stub_fmt='"%s"',  # deprecated; just use 'stub'
+    empty_cell='',  # deprecated; just use 'empty'
+    header='%s',
+    stub='%s',
+    empty='',
+    missing='--',
+)
+
+default_html_fmt = dict(
+    # basic table formatting
+    table_dec_above=None,
+    table_dec_below=None,
+    header_dec_below=None,
+    row_dec_below=None,
+    title_align='c',
+    # basic row formatting
+    colwidths=None,
+    colsep=' ',
+    row_pre='<tr>\n  ',
+    row_post='\n</tr>',
+    data_aligns="c",
+    # data formats
+    data_fmts=['<td>%s</td>'],
+    data_fmt="<td>%s</td>",  # deprecated; use data_fmts
+    # labeled alignments
+    # stubs_align='l',   #deprecated; use data_fmts
+    stub_align='l',
+    header_align='c',
+    # labeled formats
+    header_fmt='<th>%s</th>',  # deprecated; just use `header`
+    stub_fmt='<th>%s</th>',  # deprecated; just use `stub`
+    empty_cell='<td></td>',  # deprecated; just use `empty`
+    header='<th>%s</th>',
+    stub='<th>%s</th>',
+    empty='<td></td>',
+    missing='<td>--</td>',
+)
+
+default_latex_fmt = dict(
+    fmt='ltx',
+    # basic table formatting
+    table_dec_above=r'\toprule',
+    table_dec_below=r'\bottomrule',
+    header_dec_below=r'\midrule',
+    row_dec_below=None,
+    strip_backslash=True,  # NotImplemented
+    # row formatting
+    row_post=r'  \\',
+    data_aligns='c',
+    colwidths=None,
+    colsep=' & ',
+    # data formats
+    data_fmts=['%s'],
+    data_fmt='%s',  # deprecated; use data_fmts
+    # labeled alignments
+    # stubs_align='l',   # deprecated; use data_fmts
+    stub_align='l',
+    header_align='c',
+    empty_align='l',
+    # labeled formats
+    header_fmt=r'\textbf{%s}',  # deprecated; just use 'header'
+    stub_fmt=r'\textbf{%s}',  # deprecated; just use 'stub'
+    empty_cell='',  # deprecated; just use 'empty'
+    header=r'\textbf{%s}',
+    stub=r'\textbf{%s}',
+    empty='',
+    missing='--',
+    # replacements will be processed in lexicographical order
+    replacements={"#": r"\#",
+                  "$": r"\$",
+                  "%": r"\%",
+                  "&": r"\&",
+                  ">": r"$>$",
+                  "_": r"\_",
+                  "|": r"$|$"}
+)
+
+default_fmts = dict(
+    html=default_html_fmt,
+    txt=default_txt_fmt,
+    latex=default_latex_fmt,
+    csv=default_csv_fmt
+)
+output_format_translations = dict(
+    htm='html',
+    text='txt',
+    ltx='latex'
+)
+
+
+def get_output_format(output_format):
+    if output_format not in ('html', 'txt', 'latex', 'csv'):
+        try:
+            output_format = output_format_translations[output_format]
+        except KeyError:
+            raise ValueError('unknown output format %s' % output_format)
+    return output_format
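The SimpleTable machinery restored above (headers, stubs, per-column data_fmts, and the text/CSV/HTML/LaTeX renderers) can be exercised on its own. A small sketch with made-up data:

    from statsmodels.iolib.table import SimpleTable

    data = [['cat', 64, 3.2],
            ['dog', 12, 8.1]]
    headers = ['animal', 'count', 'mean']
    stubs = ['row 1', 'row 2']

    tbl = SimpleTable(data, headers=headers, stubs=stubs, title='Example',
                      txt_fmt=dict(data_fmts=['%s', '%d', '%.1f']))
    print(tbl.as_text())            # '=' rule above and '-' rule below, per default_txt_fmt
    print(tbl.as_csv())
    latex = tbl.as_latex_tabular()  # requires the booktabs package when compiled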
diff --git a/statsmodels/iolib/tableformatting.py b/statsmodels/iolib/tableformatting.py
index b21d135ec..2b2e96fb8 100644
--- a/statsmodels/iolib/tableformatting.py
+++ b/statsmodels/iolib/tableformatting.py
@@ -3,47 +3,150 @@ Summary Table formating
 This is here to help keep the formating consistent across the different models
 """
 import copy
-gen_fmt = {'data_fmts': ['%s', '%s', '%s', '%s', '%s'], 'empty_cell': '',
-    'colwidths': 7, 'colsep': '   ', 'row_pre': '  ', 'row_post': '  ',
-    'table_dec_above': '": ', 'table_dec_below': None, 'header_dec_below':
-    None, 'header_fmt': '%s', 'stub_fmt': '%s', 'title_align': 'c',
-    'header_align': 'r', 'data_aligns': 'r', 'stubs_align': 'l', 'fmt': 'txt'}
-fmt_1_right = {'data_fmts': ['%s', '%s', '%s', '%s', '%s'], 'empty_cell':
-    '', 'colwidths': 16, 'colsep': '   ', 'row_pre': '', 'row_post': '',
-    'table_dec_above': '": ', 'table_dec_below': None, 'header_dec_below':
-    None, 'header_fmt': '%s', 'stub_fmt': '%s', 'title_align': 'c',
-    'header_align': 'r', 'data_aligns': 'r', 'stubs_align': 'l', 'fmt': 'txt'}
-fmt_2 = {'data_fmts': ['%s', '%s', '%s', '%s'], 'empty_cell': '',
-    'colwidths': 10, 'colsep': ' ', 'row_pre': '  ', 'row_post': '   ',
-    'table_dec_above': '": ', 'table_dec_below': '": ', 'header_dec_below':
-    '-', 'header_fmt': '%s', 'stub_fmt': '%s', 'title_align': 'c',
-    'header_align': 'r', 'data_aligns': 'r', 'stubs_align': 'l', 'fmt': 'txt'}
-fmt_base = {'data_fmts': ['%s', '%s', '%s', '%s', '%s'], 'empty_cell': '',
-    'colwidths': 10, 'colsep': ' ', 'row_pre': '', 'row_post': '',
-    'table_dec_above': '=', 'table_dec_below': '=', 'header_dec_below': '-',
-    'header_fmt': '%s', 'stub_fmt': '%s', 'title_align': 'c',
-    'header_align': 'r', 'data_aligns': 'r', 'stubs_align': 'l', 'fmt': 'txt'}
+
+gen_fmt = {
+    "data_fmts": ["%s", "%s", "%s", "%s", "%s"],
+    "empty_cell": '',
+    "colwidths": 7,
+    "colsep": '   ',
+    "row_pre": '  ',
+    "row_post": '  ',
+    "table_dec_above": '": ',
+    "table_dec_below": None,
+    "header_dec_below": None,
+    "header_fmt": '%s',
+    "stub_fmt": '%s',
+    "title_align": 'c',
+    "header_align": 'r',
+    "data_aligns": "r",
+    "stubs_align": "l",
+    "fmt": 'txt'
+}
+
+# Note: table_1l_fmt overrides the formatting below unless it is not
+# appended to table_1l
+fmt_1_right = {
+    "data_fmts": ["%s", "%s", "%s", "%s", "%s"],
+    "empty_cell": '',
+    "colwidths": 16,
+    "colsep": '   ',
+    "row_pre": '',
+    "row_post": '',
+    "table_dec_above": '": ',
+    "table_dec_below": None,
+    "header_dec_below": None,
+    "header_fmt": '%s',
+    "stub_fmt": '%s',
+    "title_align": 'c',
+    "header_align": 'r',
+    "data_aligns": "r",
+    "stubs_align": "l",
+    "fmt": 'txt'
+}
+
+fmt_2 = {
+    "data_fmts": ["%s", "%s", "%s", "%s"],
+    "empty_cell": '',
+    "colwidths": 10,
+    "colsep": ' ',
+    "row_pre": '  ',
+    "row_post": '   ',
+    "table_dec_above": '": ',
+    "table_dec_below": '": ',
+    "header_dec_below": '-',
+    "header_fmt": '%s',
+    "stub_fmt": '%s',
+    "title_align": 'c',
+    "header_align": 'r',
+    "data_aligns": 'r',
+    "stubs_align": 'l',
+    "fmt": 'txt'
+}
+
+
+# new version  # TODO: as of when?  compared to what?  is old version needed?
+fmt_base = {
+    "data_fmts": ["%s", "%s", "%s", "%s", "%s"],
+    "empty_cell": '',
+    "colwidths": 10,
+    "colsep": ' ',
+    "row_pre": '',
+    "row_post": '',
+    "table_dec_above": '=',
+    "table_dec_below": '=',  # TODO need '=' at the last subtable
+    "header_dec_below": '-',
+    "header_fmt": '%s',
+    "stub_fmt": '%s',
+    "title_align": 'c',
+    "header_align": 'r',
+    "data_aligns": 'r',
+    "stubs_align": 'l',
+    "fmt": 'txt'
+}
+
 fmt_2cols = copy.deepcopy(fmt_base)
-fmt2 = {'data_fmts': ['%18s', '-%19s', '%18s', '%19s'], 'colsep': ' ',
-    'colwidths': 18, 'stub_fmt': '-%21s'}
+
+fmt2 = {
+    "data_fmts": ["%18s", "-%19s", "%18s", "%19s"],  # TODO: TODO: what?
+    "colsep": ' ',
+    "colwidths": 18,
+    "stub_fmt": '-%21s',
+}
 fmt_2cols.update(fmt2)
+
 fmt_params = copy.deepcopy(fmt_base)
-fmt3 = {'data_fmts': ['%s', '%s', '%8s', '%s', '%11s', '%11s']}
+
+fmt3 = {
+    "data_fmts": ["%s", "%s", "%8s", "%s", "%11s", "%11s"],
+}
 fmt_params.update(fmt3)
+
 """
 Summary Table formating
 This is here to help keep the formating consistent across the different models
 """
-fmt_latex = {'colsep': ' & ', 'colwidths': None, 'data_aligns': 'r',
-    'data_fmt': '%s', 'data_fmts': ['%s'], 'empty': '', 'empty_cell': '',
-    'fmt': 'ltx', 'header': '%s', 'header_align': 'c', 'header_dec_below':
-    '\\hline', 'header_fmt': '%s', 'missing': '--', 'row_dec_below': None,
-    'row_post': '  \\\\', 'strip_backslash': True, 'stub': '%s',
-    'stub_align': 'l', 'stub_fmt': '%s', 'table_dec_above': '\\hline',
+fmt_latex = {
+    'colsep': ' & ',
+    'colwidths': None,
+    'data_aligns': 'r',
+    'data_fmt': '%s',
+    'data_fmts': ['%s'],
+    'empty': '',
+    'empty_cell': '',
+    'fmt': 'ltx',
+    'header': '%s',
+    'header_align': 'c',
+    'header_dec_below': '\\hline',
+    'header_fmt': '%s',
+    'missing': '--',
+    'row_dec_below': None,
+    'row_post': '  \\\\',
+    'strip_backslash': True,
+    'stub': '%s',
+    'stub_align': 'l',
+    'stub_fmt': '%s',
+    'table_dec_above': '\\hline',
     'table_dec_below': '\\hline'}
-fmt_txt = {'colsep': ' ', 'colwidths': None, 'data_aligns': 'r',
-    'data_fmts': ['%s'], 'empty': '', 'empty_cell': '', 'fmt': 'txt',
-    'header': '%s', 'header_align': 'c', 'header_dec_below': '-',
-    'header_fmt': '%s', 'missing': '--', 'row_dec_below': None, 'row_post':
-    '', 'row_pre': '', 'stub': '%s', 'stub_align': 'l', 'stub_fmt': '%s',
-    'table_dec_above': '-', 'table_dec_below': None, 'title_align': 'c'}
+
+fmt_txt = {
+    'colsep': ' ',
+    'colwidths': None,
+    'data_aligns': 'r',
+    'data_fmts': ['%s'],
+    'empty': '',
+    'empty_cell': '',
+    'fmt': 'txt',
+    'header': '%s',
+    'header_align': 'c',
+    'header_dec_below': '-',
+    'header_fmt': '%s',
+    'missing': '--',
+    'row_dec_below': None,
+    'row_post': '',
+    'row_pre': '',
+    'stub': '%s',
+    'stub_align': 'l',
+    'stub_fmt': '%s',
+    'table_dec_above': '-',
+    'table_dec_below': None,
+    'title_align': 'c'}
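
These dictionaries are consumed by the table builder in `statsmodels.iolib.table`; a minimal usage sketch (assuming `fmt_params` from this file is importable and that `SimpleTable` accepts it via `txt_fmt`, as in upstream statsmodels):

    from statsmodels.iolib.table import SimpleTable
    from statsmodels.iolib.tableformatting import fmt_params

    data = [["1.2345", "0.1000"], ["-0.5000", "0.2500"]]
    headers = ["coef", "std err"]
    stubs = ["x1", "x2"]

    # render the same cells with the parameter-table text format defined above
    tbl = SimpleTable(data, headers, stubs, txt_fmt=fmt_params)
    print(tbl)                     # fixed-width text table
    print(tbl.as_latex_tabular())  # LaTeX tabular rendering
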
diff --git a/statsmodels/miscmodels/api.py b/statsmodels/miscmodels/api.py
index 2bd91c15f..167aaa6a5 100644
--- a/statsmodels/miscmodels/api.py
+++ b/statsmodels/miscmodels/api.py
@@ -1,3 +1,5 @@
-__all__ = ['TLinearModel', 'PoissonGMLE', 'PoissonOffsetGMLE', 'PoissonZiGMLE']
+__all__ = ["TLinearModel", "PoissonGMLE", "PoissonOffsetGMLE", "PoissonZiGMLE"]
 from .tmodel import TLinearModel
-from .count import PoissonGMLE, PoissonOffsetGMLE, PoissonZiGMLE
+from .count import (PoissonGMLE, PoissonOffsetGMLE, PoissonZiGMLE,
+                    #NonlinearDeltaCov
+                    )
diff --git a/statsmodels/miscmodels/count.py b/statsmodels/miscmodels/count.py
index 363471ae9..d93dec55d 100644
--- a/statsmodels/miscmodels/count.py
+++ b/statsmodels/miscmodels/count.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Mon Jul 26 08:34:59 2010

@@ -33,8 +34,16 @@ from scipy.special import factorial
 from statsmodels.base.model import GenericLikelihoodModel


+def maxabs(arr1, arr2):
+    return np.max(np.abs(arr1 - arr2))
+
+def maxabsrel(arr1, arr2):
+    return np.max(np.abs(arr2 / arr1 - 1))
+
+
+
 class PoissonGMLE(GenericLikelihoodModel):
-    """Maximum Likelihood Estimation of Poisson Model
+    '''Maximum Likelihood Estimation of Poisson Model

     This is an example for generic MLE which has the same
     statistical model as discretemod.Poisson.
@@ -44,8 +53,9 @@ class PoissonGMLE(GenericLikelihoodModel):
     and all resulting statistics are based on numerical
     differentiation.

-    """
+    '''

+    # copied from discretemod.Poisson
     def nloglikeobs(self, params):
         """
         Loglikelihood of Poisson model
@@ -63,16 +73,28 @@ class PoissonGMLE(GenericLikelihoodModel):
         -----
         .. math:: \\ln L=\\sum_{i=1}^{n}\\left[-\\lambda_{i}+y_{i}x_{i}^{\\prime}\\beta-\\ln y_{i}!\\right]
         """
-        pass
+        XB = np.dot(self.exog, params)
+        endog = self.endog
+        return np.exp(XB) -  endog*XB + np.log(factorial(endog))

     def predict_distribution(self, exog):
-        """return frozen scipy.stats distribution with mu at estimated prediction
-        """
-        pass
+        '''return frozen scipy.stats distribution with mu at estimated prediction
+        '''
+        if not hasattr(self, "result"):
+            # TODO: why would this be ValueError instead of AttributeError?
+            # TODO: Why even make this a Model attribute in the first place?
+            #  It belongs on the Results class
+            raise ValueError
+        else:
+            result = self.result
+            params = result.params
+            mu = np.exp(np.dot(exog, params))
+            return stats.poisson(mu, loc=0)
+


 class PoissonOffsetGMLE(GenericLikelihoodModel):
-    """Maximum Likelihood Estimation of Poisson Model
+    '''Maximum Likelihood Estimation of Poisson Model

     This is an example for generic MLE which has the same
     statistical model as discretemod.Poisson but adds offset
@@ -82,18 +104,24 @@ class PoissonOffsetGMLE(GenericLikelihoodModel):
     and all resulting statistics are based on numerical
     differentiation.

-    """
+    '''

     def __init__(self, endog, exog=None, offset=None, missing='none', **kwds):
+        # let them be none in case user wants to use inheritance
         if offset is not None:
             if offset.ndim == 1:
-                offset = offset[:, None]
+                offset = offset[:,None] #need column
             self.offset = offset.ravel()
         else:
-            self.offset = 0.0
-        super(PoissonOffsetGMLE, self).__init__(endog, exog, missing=
-            missing, **kwds)
+            self.offset = 0.
+        super(PoissonOffsetGMLE, self).__init__(endog, exog, missing=missing,
+                **kwds)
+
+#this was added temporarily for bug-hunting, but should not be needed
+#    def loglike(self, params):
+#        return -self.nloglikeobs(params).sum(0)

+    # original copied from discretemod.Poisson
     def nloglikeobs(self, params):
         """
         Loglikelihood of Poisson model
@@ -111,11 +139,14 @@ class PoissonOffsetGMLE(GenericLikelihoodModel):
         -----
         .. math:: \\ln L=\\sum_{i=1}^{n}\\left[-\\lambda_{i}+y_{i}x_{i}^{\\prime}\\beta-\\ln y_{i}!\\right]
         """
-        pass

+        XB = self.offset + np.dot(self.exog, params)
+        endog = self.endog
+        nloglik = np.exp(XB) -  endog*XB + np.log(factorial(endog))
+        return nloglik

 class PoissonZiGMLE(GenericLikelihoodModel):
-    """Maximum Likelihood Estimation of Poisson Model
+    '''Maximum Likelihood Estimation of Poisson Model

     This is an example for generic MLE which has the same statistical model
     as discretemod.Poisson but adds offset and zero-inflation.
@@ -127,25 +158,35 @@ class PoissonZiGMLE(GenericLikelihoodModel):

     There are numerical problems if there is no zero-inflation.

-    """
+    '''

     def __init__(self, endog, exog=None, offset=None, missing='none', **kwds):
+        # let them be none in case user wants to use inheritance
         self.k_extra = 1
         super(PoissonZiGMLE, self).__init__(endog, exog, missing=missing,
-            extra_params_names=['zi'], **kwds)
+                extra_params_names=["zi"], **kwds)
         if offset is not None:
             if offset.ndim == 1:
-                offset = offset[:, None]
-            self.offset = offset.ravel()
+                offset = offset[:,None] #need column
+            self.offset = offset.ravel()  #which way?
         else:
-            self.offset = 0.0
+            self.offset = 0.
+
+        #TODO: it's not standard pattern to use default exog
         if exog is None:
-            self.exog = np.ones((self.nobs, 1))
+            self.exog = np.ones((self.nobs,1))
         self.nparams = self.exog.shape[1]
+        #what's the shape in regression for exog if only constant
         self.start_params = np.hstack((np.ones(self.nparams), 0))
+        # need to add zi params to nparams
         self.nparams += 1
         self.cloneattr = ['start_params']
+        # needed for t_test and summary
+        # Note: not added to super __init__, which also adjusts df_resid
+        # self.exog_names.append('zi')
+

+    # original copied from discretemod.Poisson
     def nloglikeobs(self, params):
         """
         Loglikelihood of Poisson model
@@ -163,4 +204,13 @@ class PoissonZiGMLE(GenericLikelihoodModel):
         -----
         .. math:: \\ln L=\\sum_{i=1}^{n}\\left[-\\lambda_{i}+y_{i}x_{i}^{\\prime}\\beta-\\ln y_{i}!\\right]
         """
-        pass
+        beta = params[:-1]
+        gamm = 1 / (1 + np.exp(params[-1]))  #check this
+        # replace with np.dot(self.exogZ, gamma)
+        # print(np.shape(self.offset), self.exog.shape, beta.shape)
+        XB = self.offset + np.dot(self.exog, beta)
+        endog = self.endog
+        nloglik = -np.log(1-gamm) + np.exp(XB) -  endog*XB + np.log(factorial(endog))
+        nloglik[endog==0] = - np.log(gamm + np.exp(-nloglik[endog==0]))
+
+        return nloglik
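
A minimal fitting sketch for the generic Poisson MLE above (not part of the patch; it assumes `PoissonGMLE` behaves like `discretemod.Poisson`, as its docstring states):

    import numpy as np
    from statsmodels.miscmodels.count import PoissonGMLE

    rng = np.random.default_rng(12345)
    nobs = 500
    exog = np.column_stack((np.ones(nobs), rng.uniform(-1, 1, size=nobs)))
    beta = np.array([0.5, 1.0])
    endog = rng.poisson(np.exp(exog @ beta))

    # numerical MLE via GenericLikelihoodModel; estimates should be close to beta
    res = PoissonGMLE(endog, exog).fit(start_params=np.zeros(2), disp=0)
    print(res.params)
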
diff --git a/statsmodels/miscmodels/nonlinls.py b/statsmodels/miscmodels/nonlinls.py
index 298f58f6b..126433a01 100644
--- a/statsmodels/miscmodels/nonlinls.py
+++ b/statsmodels/miscmodels/nonlinls.py
@@ -1,24 +1,58 @@
-"""Non-linear least squares
+'''Non-linear least squares



 Author: Josef Perktold based on scipy.optimize.curve_fit

-"""
+'''
 import numpy as np
 from scipy import optimize
+
 from statsmodels.base.model import Model


 class Results:
-    """just a dummy placeholder for now
+    '''just a dummy placeholder for now
     most results from RegressionResults can be used here
-    """
+    '''
     pass


-class NonlinearLS(Model):
-    """Base class for estimation of a non-linear model with least squares
+##def getjaccov(retval, n):
+##    '''calculate something and raw covariance matrix from return of optimize.leastsq
+##
+##    I cannot figure out how to recover the Jacobian, or whether it is even
+##    possible
+##
+##    this is a partial copy of scipy.optimize.leastsq
+##    '''
+##    info = retval[-1]
+##    #n = len(x0)  #nparams, where do I get this
+##    cov_x = None
+##    if info in [1,2,3,4]:
+##        from numpy.dual import inv
+##        from numpy.linalg import LinAlgError
+##        perm = np.take(np.eye(n), retval[1]['ipvt']-1,0)
+##        r = np.triu(np.transpose(retval[1]['fjac'])[:n,:])
+##        R = np.dot(r, perm)
+##        try:
+##            cov_x = inv(np.dot(np.transpose(R),R))
+##        except LinAlgError:
+##            print 'cov_x not available'
+##            pass
+##        return r, R, cov_x
+##
+##def _general_function(params, xdata, ydata, function):
+##    return function(xdata, *params) - ydata
+##
+##def _weighted_general_function(params, xdata, ydata, function, weights):
+##    return weights * (function(xdata, *params) - ydata)
+##
+
+
+
+class NonlinearLS(Model):  #or subclass a model
+    r'''Base class for estimation of a non-linear model with least squares

     This class is supposed to be subclassed, and the subclass has to provide a method
     `_predict` that defines the non-linear function `f(params) that is predicting the endogenous
@@ -28,7 +62,7 @@ class NonlinearLS(Model):

     and the estimator minimizes the sum of squares of the estimated error.

-    :math: min_parmas \\sum (y - f(params))**2
+    :math: min_params \sum (y - f(params))**2

     f has to return the prediction for each observation. Exogenous or explanatory variables
     should be accessed as attributes of the class instance, and can be given as arguments
@@ -74,56 +108,198 @@ class NonlinearLS(Model):
     myres.tvalues


-    """
-
+    '''
+    #NOTE: This needs to call super for data checking
     def __init__(self, endog=None, exog=None, weights=None, sigma=None,
-        missing='none'):
+            missing='none'):
         self.endog = endog
         self.exog = exog
         if sigma is not None:
             sigma = np.asarray(sigma)
             if sigma.ndim < 2:
                 self.sigma = sigma
-                self.weights = 1.0 / sigma
+                self.weights = 1./sigma
             else:
                 raise ValueError('correlated errors are not handled yet')
         else:
             self.weights = None

-    def fit_minimal(self, start_value, **kwargs):
-        """minimal fitting with no extra calculations"""
+    def predict(self, exog, params=None):
+        #copied from GLS, Model has different signature
+        return self._predict(params)
+
+
+    def _predict(self, params):
         pass

+    def start_value(self):
+        return None
+
+    def geterrors(self, params, weights=None):
+        if weights is None:
+            if self.weights is None:
+                return self.endog - self._predict(params)
+            else:
+                weights = self.weights
+        return weights * (self.endog - self._predict(params))
+
+    def errorsumsquares(self, params):
+        return (self.geterrors(params)**2).sum()
+
+
+    def fit(self, start_value=None, nparams=None, **kw):
+        #if hasattr(self, 'start_value'):
+        #I added start_value even if it's empty, not sure about it
+        #but it makes a visible placeholder
+
+        if start_value is not None:
+            p0 = start_value
+        else:
+            #nesting so that start_value is only calculated if it is needed
+            p0 = self.start_value()
+            if p0 is not None:
+                pass
+            elif nparams is not None:
+                p0 = 0.1 * np.ones(nparams)
+            else:
+                raise ValueError('need information about start values for '
+                                 'optimization')
+
+        func = self.geterrors
+        res = optimize.leastsq(func, p0, full_output=1, **kw)
+        (popt, pcov, infodict, errmsg, ier) = res
+
+        if ier not in [1,2,3,4]:
+            msg = "Optimal parameters not found: " + errmsg
+            raise RuntimeError(msg)
+
+        err = infodict['fvec']
+
+        ydata = self.endog
+        if (len(ydata) > len(p0)) and pcov is not None:
+            #this can use the returned errors instead of recalculating
+
+            s_sq = (err**2).sum()/(len(ydata)-len(p0))
+            pcov = pcov * s_sq
+        else:
+            pcov = None
+
+        self.df_resid = len(ydata)-len(p0)
+        self.df_model = len(p0)
+        fitres = Results()
+        fitres.params = popt
+        fitres.pcov = pcov
+        fitres.rawres = res
+        self.wendog = self.endog  #add weights
+        self.wexog = self.jac_predict(popt)
+        pinv_wexog = np.linalg.pinv(self.wexog)
+        self.normalized_cov_params = np.dot(pinv_wexog,
+                                         np.transpose(pinv_wexog))
+
+        #TODO: check effect of `weights` on result statistics
+        #I think they are correctly included in cov_params
+        #maybe not anymore, I'm not using pcov of leastsq
+        #direct calculation with jac_predict misses the weights
+
+##        if not weights is None
+##            fitres.wexogw = self.weights * self.jacpredict(popt)
+        from statsmodels.regression import RegressionResults
+        results = RegressionResults
+
+        beta = popt
+        lfit = RegressionResults(self, beta,
+                       normalized_cov_params=self.normalized_cov_params)
+
+        lfit.fitres = fitres   #mainly for testing
+        self._results = lfit
+        return lfit
+
+    def fit_minimal(self, start_value, **kwargs):
+        '''minimal fitting with no extra calculations'''
+        func = self.geterrors
+        res = optimize.leastsq(func, start_value, full_output=0, **kwargs)
+        return res
+
     def fit_random(self, ntries=10, rvs_generator=None, nparams=None):
-        """fit with random starting values
+        '''fit with random starting values

         this could be replaced with a global fitter

-        """
-        pass
+        '''
+
+        if nparams is None:
+            nparams = self.nparams
+        if rvs_generator is None:
+            rvs = np.random.uniform(low=-10, high=10, size=(ntries, nparams))
+        else:
+            rvs = rvs_generator(size=(ntries, nparams))
+
+        results = np.array([np.r_[self.fit_minimal(rv),  rv] for rv in rvs])
+        # select best results and check how many solutions are within 1e-6 of best
+        #not sure what leastsq returns
+        return results

     def jac_predict(self, params):
-        """jacobian of prediction function using complex step derivative
+        '''jacobian of prediction function using complex step derivative

         This assumes that the predict function does not use complex variable
         but is designed to do so.

-        """
-        pass
+        '''
+        from statsmodels.tools.numdiff import approx_fprime_cs
+
+        jaccs_err = approx_fprime_cs(params, self._predict)
+        return jaccs_err


 class Myfunc(NonlinearLS):
-    pass
+
+    #predict model.Model has a different signature
+##    def predict(self, params, exog=None):
+##        if not exog is None:
+##            x = exog
+##        else:
+##            x = self.exog
+##        a, b, c = params
+##        return a*np.exp(-b*x) + c
+
+    def _predict(self, params):
+        x = self.exog
+        a, b, c = params
+        return a*np.exp(-b*x) + c
+
+
+


 if __name__ == '__main__':
-    x = np.linspace(0, 4, 50)
+    def func0(x, a, b, c):
+        return a*np.exp(-b*x) + c
+
+    def func(params, x):
+        a, b, c = params
+        return a*np.exp(-b*x) + c
+
+    def error(params, x, y):
+        return y - func(params, x)
+
+    def error2(params, x, y):
+        return (y - func(params, x))**2
+
+
+
+
+    x = np.linspace(0,4,50)
     params = np.array([2.5, 1.3, 0.5])
     y0 = func(params, x)
-    y = y0 + 0.2 * np.random.normal(size=len(x))
+    y = y0 + 0.2*np.random.normal(size=len(x))
+
     res = optimize.leastsq(error, params, args=(x, y), full_output=True)
+##    r, R, c = getjaccov(res[1:], 3)
+
     mod = Myfunc(y, x)
     resmy = mod.fit(nparams=3)
+
     cf_params, cf_pcov = optimize.curve_fit(func0, x, y)
     cf_bse = np.sqrt(np.diag(cf_pcov))
     print(res[0])
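
The `jac_predict` method relies on complex-step differentiation; a minimal standalone sketch of that technique (using `approx_fprime_cs`, which the method imports):

    import numpy as np
    from statsmodels.tools.numdiff import approx_fprime_cs

    x = np.linspace(0, 4, 5)

    def predict(params):
        # must also accept complex-valued params for the complex-step trick
        a, b, c = params
        return a * np.exp(-b * x) + c

    jac = approx_fprime_cs(np.array([2.5, 1.3, 0.5]), predict)
    print(jac.shape)  # (5, 3): one row per observation, one column per parameter
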
diff --git a/statsmodels/miscmodels/ordinal_model.py b/statsmodels/miscmodels/ordinal_model.py
index 1f090c9b6..0eaca91d9 100644
--- a/statsmodels/miscmodels/ordinal_model.py
+++ b/statsmodels/miscmodels/ordinal_model.py
@@ -1,17 +1,28 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sat Aug 22 20:24:42 2015

 Author: Josef Perktold
 License: BSD-3
 """
+
 import warnings
+
 from statsmodels.compat.pandas import Appender
+
 import numpy as np
 import pandas as pd
 from pandas.api.types import CategoricalDtype
 from scipy import stats
-from statsmodels.base.model import Model, LikelihoodModel, GenericLikelihoodModel, GenericLikelihoodModelResults
+
+from statsmodels.base.model import (
+    Model,
+    LikelihoodModel,
+    GenericLikelihoodModel,
+    GenericLikelihoodModelResults,
+)
 import statsmodels.base.wrapper as wrap
+# for results wrapper:
 import statsmodels.regression.linear_model as lm
 from statsmodels.tools.decorators import cache_readonly

@@ -104,39 +115,52 @@ class OrderedModel(GenericLikelihoodModel):
     _formula_max_endog = np.inf

     def __init__(self, endog, exog, offset=None, distr='probit', **kwds):
+
         if distr == 'probit':
             self.distr = stats.norm
         elif distr == 'logit':
             self.distr = stats.logistic
         else:
             self.distr = distr
+
         if offset is not None:
             offset = np.asarray(offset)
+
         self.offset = offset
+
         endog, labels, is_pandas = self._check_inputs(endog, exog)
+
         super(OrderedModel, self).__init__(endog, exog, **kwds)
-        k_levels = None
+        k_levels = None  # initialize
         if not is_pandas:
             if self.endog.ndim == 1:
                 unique, index = np.unique(self.endog, return_inverse=True)
                 self.endog = index
                 labels = unique
                 if np.isnan(labels).any():
-                    msg = (
-                        'NaN in dependent variable detected. Missing values need to be removed.'
-                        )
+                    msg = ("NaN in dependent variable detected. "
+                           "Missing values need to be removed.")
                     raise ValueError(msg)
             elif self.endog.ndim == 2:
-                if not hasattr(self, 'design_info'):
-                    raise ValueError('2-dim endog not supported')
+                if not hasattr(self, "design_info"):
+                    raise ValueError("2-dim endog not supported")
+                # this branch is currently only in support of from_formula
+                # we need to initialize k_levels correctly for df_resid
                 k_levels = self.endog.shape[1]
                 labels = []
+                # Note: Doing the following here would break from_formula
+                # self.endog = self.endog.argmax(1)
+
         if self.k_constant > 0:
-            raise ValueError('There should not be a constant in the model')
+            raise ValueError("There should not be a constant in the model")
+
         self._initialize_labels(labels, k_levels=k_levels)
+
+        # adjust df
         self.k_extra = self.k_levels - 1
         self.df_model = self.k_vars
         self.df_resid = self.nobs - (self.k_vars + self.k_extra)
+
         self.results_class = OrderedResults

     def _check_inputs(self, endog, exog):
@@ -167,9 +191,88 @@ class OrderedModel(GenericLikelihoodModel):
             Series and False otherwise.

         """
-        pass
+
+        if not isinstance(self.distr, stats.rv_continuous):
+            msg = (
+                f"{self.distr.name} is not a scipy.stats distribution."
+            )
+            warnings.warn(msg)
+
+        labels = None
+        is_pandas = False
+        if isinstance(endog, pd.Series):
+            if isinstance(endog.dtypes, CategoricalDtype):
+                if not endog.dtype.ordered:
+                    warnings.warn("the endog has ordered == False, "
+                                  "risk of capturing a wrong order for the "
+                                  "categories. ordered == True preferred.",
+                                  Warning)
+
+                endog_name = endog.name
+                labels = endog.values.categories
+                endog = endog.cat.codes
+                if endog.min() == -1:  # means there is a missing value
+                    raise ValueError("missing values in categorical endog are "
+                                     "not supported")
+                endog.name = endog_name
+                is_pandas = True
+
+        return endog, labels, is_pandas
+
+    def _initialize_labels(self, labels, k_levels=None):
+        self.labels = labels
+        if k_levels is None:
+            self.k_levels = len(labels)
+        else:
+            self.k_levels = k_levels
+
+        if self.exog is not None:
+            self.nobs, self.k_vars = self.exog.shape
+        else:  # no exog in model
+            self.nobs, self.k_vars = self.endog.shape[0], 0
+
+        threshold_names = [str(x) + '/' + str(y)
+                           for x, y in zip(labels[:-1], labels[1:])]
+
+        # from GenericLikelihoodModel.fit
+        if self.exog is not None:
+            # avoid extending several times
+            if len(self.exog_names) > self.k_vars:
+                raise RuntimeError("something wrong with exog_names, too long")
+            self.exog_names.extend(threshold_names)
+        else:
+            self.data.xnames = threshold_names
+
+    @classmethod
+    def from_formula(cls, formula, data, subset=None, drop_cols=None,
+                     *args, **kwargs):
+
+        # we want an explicit Intercept in the model that we can remove
+        # Removing constant with "0 +" or "- 1" does not work for categ. exog
+
+        endog_name = formula.split("~")[0].strip()
+        original_endog = data[endog_name]
+
+        model = super(OrderedModel, cls).from_formula(
+            formula, data=data, drop_cols=["Intercept"], *args, **kwargs)
+
+        if model.endog.ndim == 2:
+            if not (isinstance(original_endog.dtype, CategoricalDtype)
+                    and original_endog.dtype.ordered):
+                msg = ("Only ordered pandas Categorical are supported as "
+                       "endog in formulas")
+                raise ValueError(msg)
+
+            labels = original_endog.values.categories
+            model._initialize_labels(labels)
+            model.endog = model.endog.argmax(1)
+            model.data.ynames = endog_name
+
+        return model
+
     from_formula.__func__.__doc__ = Model.from_formula.__doc__

+
     def cdf(self, x):
         """Cdf evaluated at x.

@@ -184,7 +287,7 @@ class OrderedModel(GenericLikelihoodModel):
         Value of the cumulative distribution function of the underlying latent
         variable evaluated at x.
         """
-        pass
+        return self.distr.cdf(x)

     def pdf(self, x):
         """Pdf evaluated at x
@@ -200,7 +303,7 @@ class OrderedModel(GenericLikelihoodModel):
         Value of the probability density function of the underlying latent
         variable evaluated at x.
         """
-        pass
+        return self.distr.pdf(x)

     def prob(self, low, upp):
         """Interval probability.
@@ -222,7 +325,7 @@ class OrderedModel(GenericLikelihoodModel):
             Probability that value falls in interval (low, upp]

         """
-        pass
+        return np.maximum(self.cdf(upp) - self.cdf(low), 0)

     def transform_threshold_params(self, params):
         """transformation of the parameters in the optimization
@@ -242,7 +345,11 @@ class OrderedModel(GenericLikelihoodModel):
             Thresh are the thresholds or cutoff constants for the intervals.

         """
-        pass
+        th_params = params[-(self.k_levels - 1):]
+        thresh = np.concatenate((th_params[:1],
+                                 np.exp(th_params[1:]))).cumsum()
+        thresh = np.concatenate(([-np.inf], thresh, [np.inf]))
+        return thresh

     def transform_reverse_threshold_params(self, params):
         """obtain transformed thresholds from original thresholds or cutoffs
@@ -262,9 +369,11 @@ class OrderedModel(GenericLikelihoodModel):
             Transformed parameters can be any real number without restrictions.

         """
-        pass
+        thresh_params = np.concatenate((params[:1],
+                                        np.log(np.diff(params[:-1]))))
+        return thresh_params

-    def predict(self, params, exog=None, offset=None, which='prob'):
+    def predict(self, params, exog=None, offset=None, which="prob"):
         """
         Predicted probabilities for each level of the ordinal endog.

@@ -300,7 +409,23 @@ class OrderedModel(GenericLikelihoodModel):
             latent variable is returned. In this case, the return is
             one-dimensional.
         """
-        pass
+        # note, exog and offset handling is in linpred
+
+        thresh = self.transform_threshold_params(params)
+        xb = self._linpred(params, exog=exog, offset=offset)
+        if which == "linpred":
+            return xb
+        xb = xb[:, None]
+        low = thresh[:-1] - xb
+        upp = thresh[1:] - xb
+        if which == "prob":
+            prob = self.prob(low, upp)
+            return prob
+        elif which in ["cum", "cumprob"]:
+            cumprob = self.cdf(upp)
+            return cumprob
+        else:
+            raise ValueError("`which` is not available")

     def _linpred(self, params, exog=None, offset=None):
         """Linear prediction of latent variable `x b + offset`.
@@ -325,7 +450,26 @@ class OrderedModel(GenericLikelihoodModel):
             If exog and offset are None, then the predicted values are zero.

         """
-        pass
+        if exog is None:
+            exog = self.exog
+            if offset is None:
+                offset = self.offset
+        else:
+            if offset is None:
+                offset = 0
+
+        if offset is not None:
+            offset = np.asarray(offset)
+
+        if exog is not None:
+            _exog = np.asarray(exog)
+            _params = np.asarray(params)
+            linpred = _exog.dot(_params[:-(self.k_levels - 1)])
+        else:  # means self.exog is also None
+            linpred = np.zeros(self.nobs)
+        if offset is not None:
+            linpred += offset
+        return linpred

     def _bounds(self, params):
         """Integration bounds for the observation specific interval.
@@ -357,7 +501,19 @@ class OrderedModel(GenericLikelihoodModel):
             1-dim with length nobs.

         """
-        pass
+        thresh = self.transform_threshold_params(params)
+
+        thresh_i_low = thresh[self.endog]
+        thresh_i_upp = thresh[self.endog + 1]
+        xb = self._linpred(params)
+        low = thresh_i_low - xb
+        upp = thresh_i_upp - xb
+        return low, upp
+
+    @Appender(GenericLikelihoodModel.loglike.__doc__)
+    def loglike(self, params):
+
+        return self.loglikeobs(params).sum()

     def loglikeobs(self, params):
         """
@@ -374,7 +530,9 @@ class OrderedModel(GenericLikelihoodModel):
             The log likelihood for each observation of the model evaluated
             at ``params``.
         """
-        pass
+        low, upp = self._bounds(params)
+        prob = self.prob(low, upp)
+        return np.log(prob + 1e-20)

     def score_obs_(self, params):
         """score, first derivative of loglike for each observations
@@ -383,7 +541,30 @@ class OrderedModel(GenericLikelihoodModel):
         exog parameters, but not with respect to threshold parameters.

         """
-        pass
+        low, upp = self._bounds(params)
+
+        prob = self.prob(low, upp)
+        pdf_upp = self.pdf(upp)
+        pdf_low = self.pdf(low)
+
+        # TODO the following doesn't work yet because of the incremental exp
+        # parameterization. The following was written based on Greene for the
+        # simple non-incremental parameterization.
+        # k = self.k_levels - 1
+        # idx = self.endog
+        # score_factor = np.zeros((self.nobs, k + 1 + 2)) #+2 avoids idx bounds
+        #
+        # rows = np.arange(self.nobs)
+        # shift = 1
+        # score_factor[rows, shift + idx-1] = -pdf_low
+        # score_factor[rows, shift + idx] = pdf_upp
+        # score_factor[:, 0] = pdf_upp - pdf_low
+        score_factor = (pdf_upp - pdf_low)[:, None]
+        score_factor /= prob[:, None]
+
+        so = np.column_stack((-score_factor[:, :1] * self.exog,
+                              score_factor[:, 1:]))
+        return so

     @property
     def start_params(self):
@@ -393,7 +574,31 @@ class OrderedModel(GenericLikelihoodModel):
         transformed to the exponential increments parameterization.
         The parameters for explanatory variables are set to zero.
         """
-        pass
+        # start params based on model without exog
+        freq = np.bincount(self.endog) / len(self.endog)
+        start_ppf = self.distr.ppf(np.clip(freq.cumsum(), 0, 1))
+        start_threshold = self.transform_reverse_threshold_params(start_ppf)
+        start_params = np.concatenate((np.zeros(self.k_vars), start_threshold))
+        return start_params
+
+    @Appender(LikelihoodModel.fit.__doc__)
+    def fit(self, start_params=None, method='nm', maxiter=500, full_output=1,
+            disp=1, callback=None, retall=0, **kwargs):
+
+        fit_method = super(OrderedModel, self).fit
+        mlefit = fit_method(start_params=start_params,
+                            method=method, maxiter=maxiter,
+                            full_output=full_output,
+                            disp=disp, callback=callback, **kwargs)
+        # use the proper result class
+        ordmlefit = OrderedResults(self, mlefit)
+
+        # TODO: temporary, needs better fix, modelwc adds 1 by default
+        ordmlefit.hasconst = 0
+
+        result = OrderedResultsWrapper(ordmlefit)
+
+        return result


 class OrderedResults(GenericLikelihoodModelResults):
@@ -409,28 +614,40 @@ class OrderedResults(GenericLikelihoodModelResults):
         returns pandas DataFrame

         """
-        pass
+        # todo: add category labels
+        categories = np.arange(self.model.k_levels)
+        observed = pd.Categorical(self.model.endog,
+                                  categories=categories, ordered=True)
+        predicted = pd.Categorical(self.predict().argmax(1),
+                                   categories=categories, ordered=True)
+        table = pd.crosstab(predicted,
+                            observed.astype(int),
+                            margins=True,
+                            dropna=False).T.fillna(0)
+        return table

     @cache_readonly
     def llnull(self):
         """
         Value of the loglikelihood of model without explanatory variables
         """
-        pass
+        params_null = self.model.start_params
+        return self.model.loglike(params_null)

+    # next 3 are copied from discrete
     @cache_readonly
     def prsquared(self):
         """
         McFadden's pseudo-R-squared. `1 - (llf / llnull)`
         """
-        pass
+        return 1 - self.llf/self.llnull

     @cache_readonly
     def llr(self):
         """
         Likelihood ratio chi-squared statistic; `-2*(llnull - llf)`
         """
-        pass
+        return -2*(self.llnull - self.llf)

     @cache_readonly
     def llr_pvalue(self):
@@ -439,7 +656,8 @@ class OrderedResults(GenericLikelihoodModelResults):
         statistic greater than llr.  llr has a chi-squared distribution
         with degrees of freedom `df_model`.
         """
-        pass
+        # number of restrictions is number of exog
+        return stats.distributions.chi2.sf(self.llr, self.model.k_vars)

     @cache_readonly
     def resid_prob(self):
@@ -459,7 +677,12 @@ class OrderedResults(GenericLikelihoodModelResults):
         Biometrika. 99: 473–480

         """
-        pass
+        from statsmodels.stats.diagnostic_gen import prob_larger_ordinal_choice
+        endog = self.model.endog
+        fitted = self.predict()
+        r = prob_larger_ordinal_choice(fitted)[1]
+        resid_prob = r[np.arange(endog.shape[0]), endog]
+        return resid_prob


 class OrderedResultsWrapper(lm.RegressionResultsWrapper):
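
The incremental-exp threshold parameterization used by `transform_threshold_params` can be checked in isolation; a minimal sketch mirroring the two transforms above (hypothetical standalone helpers, not the model methods themselves):

    import numpy as np

    def to_thresholds(th_params):
        # first cutoff is free; later cutoffs add exp(.) increments -> increasing
        thresh = np.concatenate((th_params[:1], np.exp(th_params[1:]))).cumsum()
        return np.concatenate(([-np.inf], thresh, [np.inf]))

    def to_params(inner_thresholds):
        # inverse map for the finite cutoffs
        return np.concatenate((inner_thresholds[:1],
                               np.log(np.diff(inner_thresholds))))

    th_params = np.array([-1.0, 0.5, 0.2])
    thresh = to_thresholds(th_params)
    print(thresh)                   # strictly increasing, padded with +/- inf
    print(to_params(thresh[1:-1]))  # recovers [-1.0, 0.5, 0.2]
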
diff --git a/statsmodels/miscmodels/tmodel.py b/statsmodels/miscmodels/tmodel.py
index b8a030b68..969798d94 100644
--- a/statsmodels/miscmodels/tmodel.py
+++ b/statsmodels/miscmodels/tmodel.py
@@ -32,17 +32,23 @@ TODO


 """
+#mostly copied from the examples directory written for trying out generic mle.
+
 import numpy as np
 from scipy import special, stats
+
 from statsmodels.base.model import GenericLikelihoodModel
 from statsmodels.tsa.arma_mle import Arma
+
+
+#redefine some shortcuts
 np_log = np.log
 np_pi = np.pi
 sps_gamln = special.gammaln


 class TLinearModel(GenericLikelihoodModel):
-    """Maximum Likelihood Estimation of Linear Model with t-distributed errors
+    '''Maximum Likelihood Estimation of Linear Model with t-distributed errors

     This is an example for generic MLE.

@@ -51,7 +57,67 @@ class TLinearModel(GenericLikelihoodModel):
     and all resulting statistics are based on numerical
     differentiation.

-    """
+    '''
+
+    def initialize(self):
+        print("running Tmodel initialize")
+        # TODO: here or in __init__
+        self.k_vars = self.exog.shape[1]
+        if not hasattr(self, 'fix_df'):
+            self.fix_df = False
+
+        if self.fix_df is False:
+            # df will be estimated, no parameter restrictions
+            self.fixed_params = None
+            self.fixed_paramsmask = None
+            self.k_params = self.exog.shape[1] + 2
+            extra_params_names = ['df', 'scale']
+        else:
+            # df fixed
+            self.k_params = self.exog.shape[1] + 1
+            fixdf = np.nan * np.zeros(self.exog.shape[1] + 2)
+            fixdf[-2] = self.fix_df
+            self.fixed_params = fixdf
+            self.fixed_paramsmask = np.isnan(fixdf)
+            extra_params_names = ['scale']
+
+        super(TLinearModel, self).initialize()
+
+        # Note: this needs to be after super initialize
+        # super initialize sets default df_resid,
+        #_set_extra_params_names adjusts it
+        self._set_extra_params_names(extra_params_names)
+        self._set_start_params()
+
+
+    def _set_start_params(self, start_params=None, use_kurtosis=False):
+        if start_params is not None:
+            self.start_params = start_params
+        else:
+            from statsmodels.regression.linear_model import OLS
+            res_ols = OLS(self.endog, self.exog).fit()
+            start_params = 0.1*np.ones(self.k_params)
+            start_params[:self.k_vars] = res_ols.params
+
+            if self.fix_df is False:
+
+                if use_kurtosis:
+                    kurt = stats.kurtosis(res_ols.resid)
+                    df = 6./kurt + 4
+                else:
+                    df = 5
+
+                start_params[-2] = df
+                #TODO adjust scale for df
+                start_params[-1] = np.sqrt(res_ols.scale)
+
+            self.start_params = start_params
+
+
+
+
+    def loglike(self, params):
+        return -self.nloglikeobs(params).sum(0)

     def nloglikeobs(self, params):
         """
@@ -80,11 +146,32 @@ class TLinearModel(GenericLikelihoodModel):
         self.fixed_params and self.expandparams can be used to fix some
         parameters. (I doubt this has been tested in this model.)
         """
-        pass
+        #print len(params),
+        #store_params.append(params)
+        if self.fixed_params is not None:
+            #print 'using fixed'
+            params = self.expandparams(params)
+
+        beta = params[:-2]
+        df = params[-2]
+        scale = np.abs(params[-1])  #TODO check behavior around zero
+        loc = np.dot(self.exog, beta)
+        endog = self.endog
+        x = (endog - loc)/scale
+        #next part is stats.t._logpdf
+        lPx = sps_gamln((df+1)/2) - sps_gamln(df/2.)
+        lPx -= 0.5*np_log(df*np_pi) + (df+1)/2.*np_log(1+(x**2)/df)
+        lPx -= np_log(scale)  # correction for scale
+        return -lPx
+
+    def predict(self, params, exog=None):
+        if exog is None:
+            exog = self.exog
+        return np.dot(exog, params[:self.exog.shape[1]])


 class TArma(Arma):
-    """Univariate Arma Model with t-distributed errors
+    '''Univariate Arma Model with t-distributed errors

     This inherit all methods except loglike from tsa.arma_mle.Arma

@@ -98,8 +185,13 @@ class TArma(Arma):
     This might be replaced by a standardized t-distribution with scale**2
     equal to variance

-    """
+    '''
+
+    def loglike(self, params):
+        return -self.nloglikeobs(params).sum(0)

+
+    #add for Jacobian calculation  bsejac in GenericMLE, copied from loglike
     def nloglikeobs(self, params):
         """
         Loglikelihood for arma model for each observation, t-distribute
@@ -109,4 +201,31 @@ class TArma(Arma):
         The ancillary parameter is assumed to be the last element of
         the params vector
         """
-        pass
+
+        errorsest = self.geterrors(params[:-2])
+        #sigma2 = np.maximum(params[-1]**2, 1e-6)  #do I need this
+        #axis = 0
+        #nobs = len(errorsest)
+
+        df = params[-2]
+        scale = np.abs(params[-1])
+        llike  = - stats.t._logpdf(errorsest/scale, df) + np_log(scale)
+        return llike
+
+    #TODO rename fit_mle -> fit, fit -> fit_ls
+    def fit_mle(self, order, start_params=None, method='nm', maxiter=5000,
+            tol=1e-08, **kwds):
+        nar, nma = order
+        if start_params is not None:
+            if len(start_params) != nar + nma + 2:
+                raise ValueError('start_params needs sum(order) + 2 elements')
+        else:
+            start_params = np.concatenate((0.05*np.ones(nar + nma), [5, 1]))
+
+
+        res = super(TArma, self).fit_mle(order=order,
+                                         start_params=start_params,
+                                         method=method, maxiter=maxiter,
+                                         tol=tol, **kwds)
+
+        return res
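
A minimal fitting sketch for the t-error linear model above (not from the patch; the parameter layout is `[beta, df, scale]` as implied by `nloglikeobs`):

    import numpy as np
    from statsmodels.miscmodels.tmodel import TLinearModel

    rng = np.random.default_rng(0)
    nobs = 300
    exog = np.column_stack((np.ones(nobs), rng.normal(size=nobs)))
    endog = exog @ [1.0, 2.0] + 0.5 * rng.standard_t(df=5, size=nobs)

    # start_params are set automatically in initialize() via an OLS fit
    res = TLinearModel(endog, exog).fit(disp=0)
    print(res.params)  # [const, slope, df, scale]
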
diff --git a/statsmodels/miscmodels/try_mlecov.py b/statsmodels/miscmodels/try_mlecov.py
index 48bc4067b..cb6ac842d 100644
--- a/statsmodels/miscmodels/try_mlecov.py
+++ b/statsmodels/miscmodels/try_mlecov.py
@@ -1,64 +1,133 @@
-"""Multivariate Normal Model with full covariance matrix
+'''Multivariate Normal Model with full covariance matrix

 toeplitz structure is not exploited, need cholesky or inv for toeplitz

 Author: josef-pktd
-"""
+'''
+
 import numpy as np
 from scipy import linalg
 from scipy.linalg import toeplitz
+
 from statsmodels.base.model import GenericLikelihoodModel
 from statsmodels.datasets import sunspots
-from statsmodels.tsa.arima_process import ArmaProcess, arma_acovf, arma_generate_sample
+from statsmodels.tsa.arima_process import (
+    ArmaProcess,
+    arma_acovf,
+    arma_generate_sample,
+)


 def mvn_loglike_sum(x, sigma):
-    """loglike multivariate normal
+    '''loglike multivariate normal

     copied from GLS and adjusted names
     not sure why this differes from mvn_loglike
-    """
-    pass
-
+    '''
+    nobs = len(x)
+    nobs2 = nobs / 2.0
+    SSR = (x**2).sum()
+    llf = -np.log(SSR) * nobs2      # concentrated likelihood
+    llf -= (1+np.log(np.pi/nobs2))*nobs2  # with likelihood constant
+    if np.any(sigma) and sigma.ndim == 2:
+        # FIXME: robust-enough check?  unneeded if _det_sigma gets defined
+        llf -= .5*np.log(np.linalg.det(sigma))
+    return llf

 def mvn_loglike(x, sigma):
-    """loglike multivariate normal
+    '''loglike multivariate normal

     assumes x is 1d, (nobs,) and sigma is 2d (nobs, nobs)

     brute force from formula
     no checking of correct inputs
     use of inv and log-det should be replace with something more efficient
-    """
-    pass
-
+    '''
+    #see numpy thread
+    #Sturla: sqmahal = (cx*cho_solve(cho_factor(S),cx.T).T).sum(axis=1)
+    sigmainv = linalg.inv(sigma)
+    logdetsigma = np.log(np.linalg.det(sigma))
+    nobs = len(x)
+
+    llf = - np.dot(x, np.dot(sigmainv, x))
+    llf -= nobs * np.log(2 * np.pi)
+    llf -= logdetsigma
+    llf *= 0.5
+    return llf

 def mvn_loglike_chol(x, sigma):
-    """loglike multivariate normal
+    '''loglike multivariate normal

     assumes x is 1d, (nobs,) and sigma is 2d (nobs, nobs)

     brute force from formula
     no checking of correct inputs
     use of inv and log-det should be replace with something more efficient
-    """
-    pass
-
+    '''
+    #see numpy thread
+    #Sturla: sqmahal = (cx*cho_solve(cho_factor(S),cx.T).T).sum(axis=1)
+    sigmainv = np.linalg.inv(sigma)
+    cholsigmainv = np.linalg.cholesky(sigmainv).T
+    x_whitened = np.dot(cholsigmainv, x)
+
+    logdetsigma = np.log(np.linalg.det(sigma))
+    nobs = len(x)
+    from scipy import stats
+    print('scipy.stats')
+    print(np.log(stats.norm.pdf(x_whitened)).sum())
+
+    llf = - np.dot(x_whitened.T, x_whitened)
+    llf -= nobs * np.log(2 * np.pi)
+    llf -= logdetsigma
+    llf *= 0.5
+    return llf, logdetsigma, 2 * np.sum(np.log(np.diagonal(cholsigmainv)))
+#0.5 * np.dot(x_whitened.T, x_whitened) + nobs * np.log(2 * np.pi) + logdetsigma)

 def mvn_nloglike_obs(x, sigma):
-    """loglike multivariate normal
+    '''loglike multivariate normal

     assumes x is 1d, (nobs,) and sigma is 2d (nobs, nobs)

     brute force from formula
     no checking of correct inputs
     use of inv and log-det should be replace with something more efficient
-    """
-    pass
+    '''
+    #see numpy thread
+    #Sturla: sqmahal = (cx*cho_solve(cho_factor(S),cx.T).T).sum(axis=1)
+
+    #Still wasteful to calculate pinv first
+    sigmainv = np.linalg.inv(sigma)
+    cholsigmainv = np.linalg.cholesky(sigmainv).T
+    #2 * np.sum(np.log(np.diagonal(np.linalg.cholesky(A)))) #Dag mailinglist
+    # logdet not needed ???
+    #logdetsigma = 2 * np.sum(np.log(np.diagonal(cholsigmainv)))
+    x_whitened = np.dot(cholsigmainv, x)
+
+    #sigmainv = linalg.cholesky(sigma)
+    logdetsigma = np.log(np.linalg.det(sigma))
+
+    sigma2 = 1. # error variance is included in sigma
+
+    llike  =  0.5 * (np.log(sigma2) - 2.* np.log(np.diagonal(cholsigmainv))
+                          + (x_whitened**2)/sigma2
+                          +  np.log(2*np.pi))

+    return llike
+
+
+def invertibleroots(ma):
+    proc = ArmaProcess(ma=ma)
+    return proc.invertroots(retnew=False)
+
+
+def getpoly(self, params):
+    ar = np.r_[[1], -params[:self.nar]]
+    ma = np.r_[[1], params[-self.nma:]]
+    import numpy.polynomial as poly
+    return poly.Polynomial(ar), poly.Polynomial(ma)

 class MLEGLS(GenericLikelihoodModel):
-    """ARMA model with exact loglikelhood for short time series
+    '''ARMA model with exact loglikelhood for short time series

     Inverts (nobs, nobs) matrix, use only for nobs <= 200 or so.

@@ -71,35 +140,82 @@ class MLEGLS(GenericLikelihoodModel):
     This might be missing the error variance. Does it assume error is
        distributed N(0,1)
     Maybe extend to mean handling, or assume it is already removed.
-    """
+    '''
+

     def _params2cov(self, params, nobs):
-        """get autocovariance matrix from ARMA regression parameter
+        '''get autocovariance matrix from ARMA regression parameter

         ar parameters are assumed to have rhs parameterization

-        """
-        pass
+        '''
+        ar = np.r_[[1], -params[:self.nar]]
+        ma = np.r_[[1], params[-self.nma:]]
+        # print('ar', ar)
+        # print('ma', ma)
+        # print('nobs', nobs)
+        autocov = arma_acovf(ar, ma, nobs=nobs)
+        # print('arma_acovf(%r, %r, nobs=%d)' % (ar, ma, nobs))
+        # print(autocov.shape)
+        # something was strange; fixed in arma_acovf
+        autocov = autocov[:nobs]
+        sigma = toeplitz(autocov)
+        return sigma
+
+    def loglike(self, params):
+        sig = self._params2cov(params[:-1], self.nobs)
+        sig = sig * params[-1]**2
+        loglik = mvn_loglike(self.endog, sig)
+        return loglik
+
+    def fit_invertible(self, *args, **kwds):
+        res = self.fit(*args, **kwds)
+        ma = np.r_[[1], res.params[self.nar: self.nar+self.nma]]
+        mainv, wasinvertible = invertibleroots(ma)
+        if not wasinvertible:
+            start_params = res.params.copy()
+            start_params[self.nar: self.nar+self.nma] = mainv[1:]
+            #need to add args kwds
+            res = self.fit(start_params=start_params)
+        return res
+


 if __name__ == '__main__':
     nobs = 50
     ar = [1.0, -0.8, 0.1]
-    ma = [1.0, 0.1, 0.2]
+    ma = [1.0,  0.1,  0.2]
+    #ma = [1]
     np.random.seed(9875789)
-    y = arma_generate_sample(ar, ma, nobs, 2)
-    y -= y.mean()
+    y = arma_generate_sample(ar,ma,nobs,2)
+    y -= y.mean() #I have not checked treatment of mean yet, so remove
     mod = MLEGLS(y)
-    mod.nar, mod.nma = 2, 2
+    mod.nar, mod.nma = 2, 2   #needs to be added, no init method
     mod.nobs = len(y)
-    res = mod.fit(start_params=[0.1, -0.8, 0.2, 0.1, 1.0])
+    res = mod.fit(start_params=[0.1, -0.8, 0.2, 0.1, 1.])
     print('DGP', ar, ma)
     print(res.params)
     from statsmodels.regression import yule_walker
     print(yule_walker(y, 2))
+    # resi = mod.fit_invertible(start_params=[0.1, 0, 0.2, 0, 0.5])
+    # print(resi.params)
+
     arpoly, mapoly = getpoly(mod, res.params[:-1])
+
     data = sunspots.load()
-    sigma = mod._params2cov(res.params[:-1], nobs) * res.params[-1] ** 2
+    #ys = data.endog[-100:]
+##    ys = data.endog[12:]-data.endog[:-12]
+##    ys -= ys.mean()
+##    mods = MLEGLS(ys)
+##    mods.nar, mods.nma = 13, 1   #needs to be added, no init method
+##    mods.nobs = len(ys)
+##    ress = mods.fit(start_params=np.r_[0.4, np.zeros(12), [0.2, 5.]],maxiter=200)
+##    print(ress.params
+##    import matplotlib.pyplot as plt
+##    plt.plot(data.endog[1])
+##    #plt.show()
+
+    sigma = mod._params2cov(res.params[:-1], nobs) * res.params[-1]**2
     print(mvn_loglike(y, sigma))
     llo = mvn_nloglike_obs(y, sigma)
     print(llo.sum(), llo.shape)
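
A quick sanity check of `mvn_loglike` above (not in the patch): for a zero-mean vector it should match scipy's multivariate-normal log-density.

    import numpy as np
    from scipy import stats
    from scipy.linalg import toeplitz
    from statsmodels.miscmodels.try_mlecov import mvn_loglike

    rng = np.random.default_rng(1)
    sigma = toeplitz(0.5 ** np.arange(5))   # AR(1)-like covariance
    x = rng.normal(size=5)

    print(mvn_loglike(x, sigma))
    print(stats.multivariate_normal.logpdf(x, mean=np.zeros(5), cov=sigma))
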
diff --git a/statsmodels/multivariate/api.py b/statsmodels/multivariate/api.py
index 7788ba413..3572a0760 100644
--- a/statsmodels/multivariate/api.py
+++ b/statsmodels/multivariate/api.py
@@ -1,5 +1,8 @@
-__all__ = ['PCA', 'MANOVA', 'Factor', 'FactorResults', 'CanCorr',
-    'factor_rotation']
+__all__ = [
+    "PCA", "MANOVA", "Factor", "FactorResults", "CanCorr",
+    "factor_rotation"
+]
+
 from .pca import PCA
 from .manova import MANOVA
 from .factor import Factor, FactorResults
diff --git a/statsmodels/multivariate/cancorr.py b/statsmodels/multivariate/cancorr.py
index 13c51c221..408d9548c 100644
--- a/statsmodels/multivariate/cancorr.py
+++ b/statsmodels/multivariate/cancorr.py
@@ -1,3 +1,5 @@
+# -*- coding: utf-8 -*-
+
 """Canonical correlation analysis

 author: Yichuan Liu
@@ -6,6 +8,7 @@ import numpy as np
 from numpy.linalg import svd
 import scipy
 import pandas as pd
+
 from statsmodels.base.model import Model
 from statsmodels.iolib import summary2
 from .multivariate_ols import multivariate_stats
@@ -41,15 +44,13 @@ class CanCorr(Model):
     .. [*] http://numerical.recipes/whp/notes/CanonCorrBySVD.pdf
     .. [*] http://www.csun.edu/~ata20315/psy524/docs/Psy524%20Lecture%208%20CC.pdf
     .. [*] http://www.mathematica-journal.com/2014/06/canonical-correlation-analysis/
-    """
-
-    def __init__(self, endog, exog, tolerance=1e-08, missing='none',
-        hasconst=None, **kwargs):
+    """  # noqa:E501
+    def __init__(self, endog, exog, tolerance=1e-8, missing='none', hasconst=None, **kwargs):
         super(CanCorr, self).__init__(endog, exog, missing=missing,
-            hasconst=hasconst, **kwargs)
+                                      hasconst=hasconst, **kwargs)
         self._fit(tolerance)

-    def _fit(self, tolerance=1e-08):
+    def _fit(self, tolerance=1e-8):
         """Fit the model

         A ValueError is raised if there are singular values smaller than the
@@ -60,7 +61,36 @@ class CanCorr(Model):
         tolerance : float
             eigenvalue tolerance, values smaller than which is considered 0
         """
-        pass
+        nobs, k_yvar = self.endog.shape
+        nobs, k_xvar = self.exog.shape
+        k = np.min([k_yvar, k_xvar])
+
+        x = np.array(self.exog)
+        x = x - x.mean(0)
+        y = np.array(self.endog)
+        y = y - y.mean(0)
+
+        ux, sx, vx = svd(x, 0)
+        # vx_ds = vx.T divided by sx
+        vx_ds = vx.T
+        mask = sx > tolerance
+        if mask.sum() < len(mask):
+            raise ValueError('exog is collinear.')
+        vx_ds[:, mask] /= sx[mask]
+        uy, sy, vy = svd(y, 0)
+        # vy_ds = vy.T divided by sy
+        vy_ds = vy.T
+        mask = sy > tolerance
+        if mask.sum() < len(mask):
+            raise ValueError('endog is collinear.')
+        vy_ds[:, mask] /= sy[mask]
+        u, s, v = svd(ux.T.dot(uy), 0)
+
+        # Correct any roundoff
+        self.cancorr = np.array([max(0, min(s[i], 1)) for i in range(len(s))])
+
+        self.x_cancoef = vx_ds.dot(u[:, :k])
+        self.y_cancoef = vy_ds.dot(v.T[:, :k])

     def corr_test(self):
         """Approximate F test
@@ -73,7 +103,51 @@ class CanCorr(Model):
         -------
         CanCorrTestResults instance
         """
-        pass
+        nobs, k_yvar = self.endog.shape
+        nobs, k_xvar = self.exog.shape
+        eigenvals = np.power(self.cancorr, 2)
+        stats = pd.DataFrame(columns=['Canonical Correlation', "Wilks' lambda",
+                                      'Num DF', 'Den DF', 'F Value', 'Pr > F'],
+                             index=list(range(len(eigenvals) - 1, -1, -1)))
+        prod = 1
+        for i in range(len(eigenvals) - 1, -1, -1):
+            prod *= 1 - eigenvals[i]
+            p = k_yvar - i
+            q = k_xvar - i
+            r = (nobs - k_yvar - 1) - (p - q + 1) / 2
+            u = (p * q - 2) / 4
+            df1 = p * q
+            if p ** 2 + q ** 2 - 5 > 0:
+                t = np.sqrt(((p * q) ** 2 - 4) / (p ** 2 + q ** 2 - 5))
+            else:
+                t = 1
+            df2 = r * t - 2 * u
+            lmd = np.power(prod, 1 / t)
+            F = (1 - lmd) / lmd * df2 / df1
+            stats.loc[i, 'Canonical Correlation'] = self.cancorr[i]
+            stats.loc[i, "Wilks' lambda"] = prod
+            stats.loc[i, 'Num DF'] = df1
+            stats.loc[i, 'Den DF'] = df2
+            stats.loc[i, 'F Value'] = F
+            pval = scipy.stats.f.sf(F, df1, df2)
+            stats.loc[i, 'Pr > F'] = pval
+            '''
+            # Wilk's Chi square test of each canonical correlation
+            df = (p - i + 1) * (q - i + 1)
+            chi2 = a * np.log(prod)
+            pval = stats.chi2.sf(chi2, df)
+            stats.loc[i, 'Canonical correlation'] = self.cancorr[i]
+            stats.loc[i, 'Chi-square'] = chi2
+            stats.loc[i, 'DF'] = df
+            stats.loc[i, 'Pr > ChiSq'] = pval
+            '''
+        ind = stats.index.values[::-1]
+        stats = stats.loc[ind, :]
+
+        # Multivariate tests (remember x has mean removed)
+        stats_mv = multivariate_stats(eigenvals,
+                                      k_yvar, k_xvar, nobs - k_xvar - 1)
+        return CanCorrTestResults(stats, stats_mv)


 class CanCorrTestResults:
@@ -87,10 +161,18 @@ class CanCorrTestResults:
     stats_mv : DataFrame
         Contain the multivariate statistical tests results
     """
-
     def __init__(self, stats, stats_mv):
         self.stats = stats
         self.stats_mv = stats_mv

     def __str__(self):
         return self.summary().__str__()
+
+    def summary(self):
+        summ = summary2.Summary()
+        summ.add_title('Cancorr results')
+        summ.add_df(self.stats)
+        summ.add_dict({'': ''})
+        summ.add_dict({'Multivariate Statistics and F Approximations': ''})
+        summ.add_df(self.stats_mv)
+        return summ
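
The restored CanCorr._fit above centers each block, orthonormalizes it with a
thin SVD, and reads the canonical correlations off the singular values of the
cross-product of the two orthonormal bases. A standalone NumPy sketch of that
computation (illustrative data, not part of the patch):

import numpy as np

rng = np.random.default_rng(0)
nobs = 200
x = rng.standard_normal((nobs, 3))                                    # "exog" block
y = x @ rng.standard_normal((3, 2)) + rng.standard_normal((nobs, 2))  # "endog" block

# center both blocks, orthonormalize each via a thin SVD, then the singular
# values of ux.T @ uy are the canonical correlations
xc = x - x.mean(0)
yc = y - y.mean(0)
ux = np.linalg.svd(xc, full_matrices=False)[0]
uy = np.linalg.svd(yc, full_matrices=False)[0]
cancorr = np.clip(np.linalg.svd(ux.T @ uy, compute_uv=False), 0, 1)
print(cancorr)

# The same quantities through the patched class (assuming the patched
# statsmodels is importable):
#   from statsmodels.multivariate.cancorr import CanCorr
#   print(CanCorr(y, x).corr_test().summary())
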
diff --git a/statsmodels/multivariate/factor.py b/statsmodels/multivariate/factor.py
index 5dcb809b5..0b23f400a 100644
--- a/statsmodels/multivariate/factor.py
+++ b/statsmodels/multivariate/factor.py
@@ -1,14 +1,53 @@
+# -*- coding: utf-8 -*-
+
 import warnings
+
 import numpy as np
 from numpy.linalg import eigh, inv, norm, matrix_rank
 import pandas as pd
 from scipy.optimize import minimize
+
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.base.model import Model
 from statsmodels.iolib import summary2
 from statsmodels.graphics.utils import _import_mpl
+
 from .factor_rotation import rotate_factors, promax
-_opt_defaults = {'gtol': 1e-07}
+
+
+_opt_defaults = {'gtol': 1e-7}
+
+
+def _check_args_1(endog, n_factor, corr, nobs):
+
+    msg = "Either endog or corr must be provided."
+    if endog is not None and corr is not None:
+        raise ValueError(msg)
+    if endog is None and corr is None:
+        raise ValueError(msg)
+
+    if n_factor <= 0:
+        raise ValueError('n_factor must be larger than 0! %d < 0' %
+                         (n_factor))
+
+    if nobs is not None and endog is not None:
+        warnings.warn("nobs is ignored when endog is provided")
+
+
+def _check_args_2(endog, n_factor, corr, nobs, k_endog):
+
+    if n_factor > k_endog:
+        raise ValueError('n_factor cannot be greater than the number'
+                         ' of variables! %d > %d' %
+                         (n_factor, k_endog))
+
+    if np.max(np.abs(np.diag(corr) - 1)) > 1e-10:
+        raise ValueError("corr must be a correlation matrix")
+
+    if corr.shape[0] != corr.shape[1]:
+        raise ValueError('Correlation matrix corr must be a square '
+                         '(rows %d != cols %d)' % corr.shape)


 class Factor(Model):
@@ -65,13 +104,14 @@ class Factor(Model):
     .. [*] J Bai, K Li (2012).  Statistical analysis of factor models of high
        dimension.  Annals of Statistics. https://arxiv.org/pdf/1205.6617.pdf
     """
+    def __init__(self, endog=None, n_factor=1, corr=None, method='pa',
+                 smc=True, endog_names=None, nobs=None, missing='drop'):

-    def __init__(self, endog=None, n_factor=1, corr=None, method='pa', smc=
-        True, endog_names=None, nobs=None, missing='drop'):
         _check_args_1(endog, n_factor, corr, nobs)
+
         if endog is not None:
             super(Factor, self).__init__(endog, exog=None, missing=missing)
-            endog = self.endog
+            endog = self.endog   # after preprocessing like missing, asarray
             k_endog = endog.shape[1]
             nobs = endog.shape[0]
             corr = self.corr = np.corrcoef(endog, rowvar=0)
@@ -80,9 +120,11 @@ class Factor(Model):
             k_endog = self.corr.shape[0]
             self.endog = None
         else:
-            msg = 'Either endog or corr must be provided.'
+            msg = "Either endog or corr must be provided."
             raise ValueError(msg)
+
         _check_args_2(endog, n_factor, corr, nobs, k_endog)
+
         self.n_factor = n_factor
         self.loadings = None
         self.communality = None
@@ -92,6 +134,7 @@ class Factor(Model):
         self.method = method
         self.corr = corr
         self.k_endog = k_endog
+
         if endog_names is None:
             if hasattr(corr, 'index'):
                 endog_names = corr.index
@@ -102,10 +145,33 @@ class Factor(Model):
     @property
     def endog_names(self):
         """Names of endogenous variables"""
-        pass
+        if self._endog_names is not None:
+            return self._endog_names
+        else:
+            if self.endog is not None:
+                return self.data.ynames
+            else:
+                d = 0
+                n = self.corr.shape[0] - 1
+                while n > 0:
+                    d += 1
+                    n //= 10
+                return [('var%0' + str(d) + 'd') % i
+                        for i in range(self.corr.shape[0])]
+
+    @endog_names.setter
+    def endog_names(self, value):
+        # Check validity of endog_names:
+        if value is not None:
+            if len(value) != self.corr.shape[0]:
+                raise ValueError('The length of `endog_names` must '
+                                 'equal the number of variables.')
+            self._endog_names = np.asarray(value)
+        else:
+            self._endog_names = None

-    def fit(self, maxiter=50, tol=1e-08, start=None, opt_method='BFGS', opt
-        =None, em_iter=3):
+    def fit(self, maxiter=50, tol=1e-8, start=None, opt_method='BFGS',
+            opt=None, em_iter=3):
         """
         Estimate factor model parameters.

@@ -131,9 +197,16 @@ class Factor(Model):
         FactorResults
             Results class instance.
         """
-        pass
+        method = self.method.lower()
+        if method == 'pa':
+            return self._fit_pa(maxiter=maxiter, tol=tol)
+        elif method == 'ml':
+            return self._fit_ml(start, em_iter, opt_method, opt)
+        else:
+            msg = "Unknown factor extraction approach '%s'" % self.method
+            raise ValueError(msg)

-    def _fit_pa(self, maxiter=50, tol=1e-08):
+    def _fit_pa(self, maxiter=50, tol=1e-8):
         """
         Extract factors using the iterative principal axis method

@@ -149,7 +222,73 @@ class Factor(Model):
         -------
         results : FactorResults instance
         """
-        pass
+
+        R = self.corr.copy()  # inplace modification below
+
+        # Parameter validation
+        self.n_comp = matrix_rank(R)
+        if self.n_factor > self.n_comp:
+            raise ValueError('n_factor must be smaller or equal to the rank'
+                             ' of endog! %d > %d' %
+                             (self.n_factor, self.n_comp))
+        if maxiter <= 0:
+            raise ValueError('n_max_iter must be larger than 0! %d < 0' %
+                             (maxiter))
+        if tol <= 0 or tol > 0.01:
+            raise ValueError('tolerance must be larger than 0 and smaller than'
+                             ' 0.01! Got %f instead' % (tol))
+
+        #  Initial communality estimation
+        if self.smc:
+            c = 1 - 1 / np.diag(inv(R))
+        else:
+            c = np.ones(len(R))
+
+        # Iterative communality estimation
+        eigenvals = None
+        for i in range(maxiter):
+            # Get eigenvalues/eigenvectors of R with diag replaced by
+            # communality
+            for j in range(len(R)):
+                R[j, j] = c[j]
+            L, V = eigh(R, UPLO='U')
+            c_last = np.array(c)
+            ind = np.argsort(L)
+            ind = ind[::-1]
+            L = L[ind]
+            n_pos = (L > 0).sum()
+            V = V[:, ind]
+            eigenvals = np.array(L)
+
+            # Select eigenvectors with positive eigenvalues
+            n = np.min([n_pos, self.n_factor])
+            sL = np.diag(np.sqrt(L[:n]))
+            V = V[:, :n]
+
+            # Calculate new loadings and communality
+            A = V.dot(sL)
+            c = np.power(A, 2).sum(axis=1)
+            if norm(c_last - c) < tol:
+                break
+
+        self.eigenvals = eigenvals
+        self.communality = c
+        self.uniqueness = 1 - c
+        self.loadings = A
+        return FactorResults(self)
+
+    # Unpacks the model parameters from a flat vector, used for ML
+    # estimation.  The first k_endog elements of par are the square
+    # roots of the uniquenesses.  The remaining elements are the
+    # factor loadings, packed one factor at a time.
+    def _unpack(self, par):
+        return (par[0:self.k_endog]**2,
+                np.reshape(par[self.k_endog:], (-1, self.k_endog)).T)
+
+    # Packs the model parameters into a flat parameter, used for ML
+    # estimation.
+    def _pack(self, load, uniq):
+        return np.concatenate((np.sqrt(uniq), load.T.flat))

     def loglike(self, par):
         """
@@ -168,7 +307,32 @@ class Factor(Model):
         float
             The value of the log-likelihood evaluated at par.
         """
-        pass
+
+        if type(par) is np.ndarray:
+            uniq, load = self._unpack(par)
+        else:
+            load, uniq = par[0], par[1]
+
+        loadu = load / uniq[:, None]
+        lul = np.dot(load.T, loadu)
+
+        # log|GG' + S|
+        # Using matrix determinant lemma:
+        # |GG' + S| = |I + G'S^{-1}G|*|S|
+        lul.flat[::lul.shape[0]+1] += 1
+        _, ld = np.linalg.slogdet(lul)
+        v = np.sum(np.log(uniq)) + ld
+
+        # tr((GG' + S)^{-1}C)
+        # Using Sherman-Morrison-Woodbury
+        w = np.sum(1 / uniq)
+        b = np.dot(load.T, self.corr / uniq[:, None])
+        b = np.linalg.solve(lul, b)
+        b = np.dot(loadu, b)
+        w -= np.trace(b)
+
+        # Scaled log-likelihood
+        return -(v + w) / (2*self.k_endog)
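
The log-likelihood above avoids forming the full k_endog x k_endog implied
covariance by combining the matrix determinant lemma with the
Sherman-Morrison-Woodbury identity. A small NumPy check of the determinant
identity it relies on (illustrative only, not part of the patch):

import numpy as np

rng = np.random.default_rng(1)
load = rng.standard_normal((5, 2))      # factor loadings G
uniq = rng.uniform(0.2, 1.0, size=5)    # uniquenesses, diag(S)

# |GG' + S| = |I + G'S^{-1}G| * |S|
lhs = np.linalg.slogdet(load @ load.T + np.diag(uniq))[1]
rhs = (np.linalg.slogdet(np.eye(2) + load.T @ (load / uniq[:, None]))[1]
       + np.log(uniq).sum())
assert np.isclose(lhs, rhs)
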

     def score(self, par):
         """
@@ -187,22 +351,142 @@ class Factor(Model):
         ndarray
             The score function evaluated at par.
         """
-        pass

+        if type(par) is np.ndarray:
+            uniq, load = self._unpack(par)
+        else:
+            load, uniq = par[0], par[1]
+
+        # Center term of SMW
+        loadu = load / uniq[:, None]
+        c = np.dot(load.T, loadu)
+        c.flat[::c.shape[0]+1] += 1
+        d = np.linalg.solve(c, load.T)
+
+        # Precompute these terms
+        lud = np.dot(loadu, d)
+        cu = (self.corr / uniq) / uniq[:, None]
+        r = np.dot(cu, load)
+        lul = np.dot(lud.T, load)
+        luz = np.dot(cu, lul)
+
+        # First term
+        du = 2*np.sqrt(uniq) * (1/uniq - (d * load.T).sum(0) / uniq**2)
+        dl = 2*(loadu - np.dot(lud, loadu))
+
+        # Second term
+        h = np.dot(lud, cu)
+        f = np.dot(h, lud.T)
+        du -= 2*np.sqrt(uniq) * (np.diag(cu) - 2*np.diag(h) + np.diag(f))
+        dl -= 2*r
+        dl += 2*np.dot(lud, r)
+        dl += 2*luz
+        dl -= 2*np.dot(lud, luz)
+
+        # Cannot use _pack because we are working with the square root
+        # uniquenesses directly.
+        return -np.concatenate((du, dl.T.flat)) / (2*self.k_endog)
+
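
A quick way to validate the analytic gradient above is a central
finite-difference comparison against loglike. A sketch assuming the patched
statsmodels is importable; the data and the parameter vector (square-root
uniquenesses followed by the flattened loadings, as in _pack) are illustrative:

import numpy as np
from statsmodels.multivariate.factor import Factor

rng = np.random.default_rng(2)
endog = rng.standard_normal((100, 4))
mod = Factor(endog, n_factor=2, method='ml')

par = np.concatenate([np.full(4, np.sqrt(0.5)),
                      0.1 * rng.standard_normal(4 * 2)])
eps = 1e-6
fd = np.array([(mod.loglike(par + eps * e) - mod.loglike(par - eps * e)) / (2 * eps)
               for e in np.eye(par.size)])
print(np.abs(mod.score(par) - fd).max())   # tiny: analytic and numerical gradients agree
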
+    # Maximum likelihood factor analysis.
     def _fit_ml(self, start, em_iter, opt_method, opt):
         """estimate Factor model using Maximum Likelihood
         """
-        pass
+
+        # Starting values
+        if start is None:
+            load, uniq = self._fit_ml_em(em_iter)
+            start = self._pack(load, uniq)
+        elif len(start) == 2:
+            if len(start[1]) != start[0].shape[0]:
+                msg = "Starting values have incompatible dimensions"
+                raise ValueError(msg)
+            start = self._pack(start[0], start[1])
+        else:
+            raise ValueError("Invalid starting values")
+
+        def nloglike(par):
+            return -self.loglike(par)
+
+        def nscore(par):
+            return -self.score(par)
+
+        # Do the optimization
+        if opt is None:
+            opt = _opt_defaults
+        r = minimize(nloglike, start, jac=nscore, method=opt_method,
+                     options=opt)
+        if not r.success:
+            warnings.warn("Fitting did not converge")
+        par = r.x
+        uniq, load = self._unpack(par)
+
+        if uniq.min() < 1e-10:
+            warnings.warn("Some uniquenesses are nearly zero")
+
+        # Rotate solution to satisfy IC3 of Bai and Li
+        load = self._rotate(load, uniq)
+
+        self.uniqueness = uniq
+        self.communality = 1 - uniq
+        self.loadings = load
+        self.mle_retvals = r
+
+        return FactorResults(self)

     def _fit_ml_em(self, iter, random_state=None):
         """estimate Factor model using EM algorithm
         """
-        pass
+        # Starting values
+        if random_state is None:
+            random_state = np.random.RandomState(3427)
+        load = 0.1 * random_state.standard_normal(size=(self.k_endog, self.n_factor))
+        uniq = 0.5 * np.ones(self.k_endog)
+
+        for k in range(iter):
+
+            loadu = load / uniq[:, None]
+
+            f = np.dot(load.T, loadu)
+            f.flat[::f.shape[0]+1] += 1
+
+            r = np.linalg.solve(f, loadu.T)
+            q = np.dot(loadu.T, load)
+            h = np.dot(r, load)
+
+            c = load - np.dot(load, h)
+            c /= uniq[:, None]
+
+            g = np.dot(q, r)
+            e = np.dot(g, self.corr)
+            d = np.dot(loadu.T, self.corr) - e
+
+            a = np.dot(d, c)
+            a -= np.dot(load.T, c)
+            a.flat[::a.shape[0]+1] += 1
+
+            b = np.dot(self.corr, c)
+
+            load = np.linalg.solve(a, b.T).T
+            uniq = np.diag(self.corr) - (load * d.T).sum(1)
+
+        return load, uniq

     def _rotate(self, load, uniq):
         """rotate loadings for MLE
         """
-        pass
+        # Rotations used in ML estimation.
+        load, s, _ = np.linalg.svd(load, 0)
+        load *= s
+
+        if self.nobs is None:
+            nobs = 1
+        else:
+            nobs = self.nobs
+
+        cm = np.dot(load.T, load / uniq[:, None]) / nobs
+        _, f = np.linalg.eig(cm)
+        load = np.dot(load, f)
+        return load


 class FactorResults:
@@ -253,13 +537,13 @@ class FactorResults:
     Status: experimental, Some refactoring will be necessary when new
         features are added.
     """
-
     def __init__(self, factor):
         self.model = factor
         self.endog_names = factor.endog_names
         self.loadings_no_rot = factor.loadings
-        if hasattr(factor, 'eigenvals'):
+        if hasattr(factor, "eigenvals"):
             self.eigenvals = factor.eigenvals
+
         self.communality = factor.communality
         self.uniqueness = factor.uniqueness
         self.rotation_method = None
@@ -267,13 +551,17 @@ class FactorResults:
         self.n_comp = factor.loadings.shape[1]
         self.nobs = factor.nobs
         self._factor = factor
-        if hasattr(factor, 'mle_retvals'):
+        if hasattr(factor, "mle_retvals"):
             self.mle_retvals = factor.mle_retvals
+
         p, k = self.loadings_no_rot.shape
-        self.df = ((p - k) ** 2 - (p + k)) // 2
+        self.df = ((p - k)**2 - (p + k)) // 2
+
+        # no rotation, overwritten in `rotate`
         self.loadings = factor.loadings
         self.rotation_matrix = np.eye(self.n_comp)

+
     def __str__(self):
         return self.summary().__str__()

@@ -303,7 +591,24 @@ class FactorResults:
         --------
         factor_rotation : subpackage that implements rotation methods
         """
-        pass
+        self.rotation_method = method
+        if method not in ['varimax', 'quartimax', 'biquartimax',
+                          'equamax', 'oblimin', 'parsimax', 'parsimony',
+                          'biquartimin', 'promax']:
+            raise ValueError('Unknown rotation method %s' % (method))
+
+        if method in ['varimax', 'quartimax', 'biquartimax', 'equamax',
+                      'parsimax', 'parsimony', 'biquartimin']:
+            self.loadings, T = rotate_factors(self.loadings_no_rot, method)
+        elif method == 'oblimin':
+            self.loadings, T = rotate_factors(self.loadings_no_rot,
+                                              'quartimin')
+        elif method == 'promax':
+            self.loadings, T = promax(self.loadings_no_rot)
+        else:
+            raise ValueError('rotation method not recognized')
+
+        self.rotation_matrix = T

     def _corr_factors(self):
         """correlation of factors implied by rotation
@@ -318,7 +623,9 @@ class FactorResults:
             correlation matrix of rotated factors, assuming initial factors are
             orthogonal
         """
-        pass
+        T = self.rotation_matrix
+        corr_f = T.T.dot(T)
+        return corr_f

     def factor_score_params(self, method='bartlett'):
         """
@@ -349,7 +656,34 @@ class FactorResults:
         --------
         statsmodels.multivariate.factor.FactorResults.factor_scoring
         """
-        pass
+        L = self.loadings
+        T = self.rotation_matrix.T
+        #TODO: check row versus column convention for T
+        uni = 1 - self.communality #self.uniqueness
+
+        if method == 'bartlett':
+            s_mat = np.linalg.inv(L.T.dot(L/(uni[:,None]))).dot((L.T / uni)).T
+        elif method.startswith('reg'):
+            corr = self.model.corr
+            corr_f = self._corr_factors()
+            # if orthogonal then corr_f is just eye
+            s_mat = corr_f.dot(L.T.dot(np.linalg.inv(corr))).T
+        elif method == 'ols':
+            # not verified
+            corr = self.model.corr
+            corr_f = self._corr_factors()
+            s_mat = corr_f.dot(np.linalg.pinv(L)).T
+        elif method == 'gls':
+            # not verified
+            #s_mat = np.linalg.inv(1*np.eye(L.shape[1]) + L.T.dot(L/(uni[:,None])))
+            corr = self.model.corr
+            corr_f = self._corr_factors()
+            s_mat = np.linalg.inv(np.linalg.inv(corr_f) + L.T.dot(L/(uni[:,None])))
+            s_mat = s_mat.dot(L.T / uni).T
+        else:
+            raise ValueError('method not available, use "bartlett" ' +
+                             'or "regression"')
+        return s_mat

     def factor_scoring(self, endog=None, method='bartlett', transform=True):
         """
@@ -385,14 +719,68 @@ class FactorResults:
         --------
         statsmodels.multivariate.factor.FactorResults.factor_score_params
         """
-        pass
+
+        if transform is False and endog is not None:
+            # no transformation in this case
+            endog = np.asarray(endog)
+        else:
+            # we need to standardize with the original mean and scale
+            if self.model.endog is not None:
+                m = self.model.endog.mean(0)
+                s = self.model.endog.std(ddof=1, axis=0)
+                if endog is None:
+                    endog = self.model.endog
+                else:
+                    endog = np.asarray(endog)
+            else:
+                raise ValueError('If transform is True, then `endog` needs ' +
+                                 'to be available in the Factor instance.')
+
+            endog = (endog - m) / s
+
+        s_mat = self.factor_score_params(method=method)
+        factors = endog.dot(s_mat)
+        return factors

     def summary(self):
         """Summary"""
-        pass
+        summ = summary2.Summary()
+        summ.add_title('Factor analysis results')
+        loadings_no_rot = pd.DataFrame(
+            self.loadings_no_rot,
+            columns=["factor %d" % (i)
+                     for i in range(self.loadings_no_rot.shape[1])],
+            index=self.endog_names
+        )
+        if hasattr(self, "eigenvals"):
+            # eigenvals not available for ML method
+            eigenvals = pd.DataFrame(
+                [self.eigenvals], columns=self.endog_names, index=[''])
+            summ.add_dict({'': 'Eigenvalues'})
+            summ.add_df(eigenvals)
+        communality = pd.DataFrame([self.communality],
+                                   columns=self.endog_names, index=[''])
+        summ.add_dict({'': ''})
+        summ.add_dict({'': 'Communality'})
+        summ.add_df(communality)
+        summ.add_dict({'': ''})
+        summ.add_dict({'': 'Pre-rotated loadings'})
+        summ.add_df(loadings_no_rot)
+        summ.add_dict({'': ''})
+        if self.rotation_method is not None:
+            loadings = pd.DataFrame(
+                self.loadings,
+                columns=["factor %d" % (i)
+                         for i in range(self.loadings.shape[1])],
+                index=self.endog_names
+            )
+            summ.add_dict({'': '%s rotated loadings' % (self.rotation_method)})
+            summ.add_df(loadings)
+        return summ

     def get_loadings_frame(self, style='display', sort_=True, threshold=0.3,
-        highlight_max=True, color_max='yellow', decimals=None):
+                           highlight_max=True, color_max='yellow',
+                           decimals=None):
         """get loadings matrix as DataFrame or pandas Styler

         Parameters
@@ -448,7 +836,82 @@ class FactorResults:
         ...                                threshold=0.3)
         >>> print(lds.to_latex())
         """
-        pass
+
+        loadings_df = pd.DataFrame(
+                self.loadings,
+                columns=["factor %d" % (i)
+                         for i in range(self.loadings.shape[1])],
+                index=self.endog_names
+                )
+
+        if style not in ['raw', 'display', 'strings']:
+            msg = "style has to be one of 'raw', 'display', 'strings'"
+            raise ValueError(msg)
+
+        if style == 'raw':
+            return loadings_df
+
+        # add sorting and some formatting
+        if sort_ is True:
+            loadings_df2 = loadings_df.copy()
+            n_f = len(loadings_df2)
+            high = np.abs(loadings_df2.values).argmax(1)
+            loadings_df2['high'] = high
+            loadings_df2['largest'] = np.abs(loadings_df.values[np.arange(n_f), high])
+            loadings_df2.sort_values(by=['high', 'largest'], ascending=[True, False], inplace=True)
+            loadings_df = loadings_df2.drop(['high', 'largest'], axis=1)
+
+        if style == 'display':
+            sty = None
+            if threshold > 0:
+                def color_white_small(val):
+                    """
+                    Takes a scalar and returns a string with
+                    the css property `'color: white'` for small values, black otherwise.
+
+                    takes threshold from outer scope
+                    """
+                    color = 'white' if np.abs(val) < threshold else 'black'
+                    return 'color: %s' % color
+                try:
+                    sty = loadings_df.style.map(color_white_small)
+                except AttributeError:
+                    # Deprecated in pandas 2.1
+                    sty = loadings_df.style.applymap(color_white_small)
+
+            if highlight_max is True:
+                def highlight_max(s):
+                    '''
+                    highlight the maximum in a Series yellow.
+                    '''
+                    s = np.abs(s)
+                    is_max = s == s.max()
+                    return ['background-color: '+ color_max if v else '' for v in is_max]
+
+                if sty is None:
+                    sty = loadings_df.style
+
+                sty = sty.apply(highlight_max, axis=1)
+
+            if decimals is not None:
+                if sty is None:
+                    sty = loadings_df.style
+
+                sty.format("{:.%sf}" % decimals)
+
+            if sty is None:
+                return loadings_df
+            else:
+                return sty
+
+        if style == 'strings':
+            ld = loadings_df
+            if decimals is not None:
+                ld = ld.round(decimals)
+            ld = ld.astype(str)
+            if threshold > 0:
+                ld[loadings_df.abs() < threshold] = ''
+            return ld

     def plot_scree(self, ncomp=None):
         """
@@ -465,7 +928,9 @@ class FactorResults:
         Figure
             Handle to the figure.
         """
-        pass
+        _import_mpl()
+        from .plots import plot_scree
+        return plot_scree(self.eigenvals, self.n_comp, ncomp)

     def plot_loadings(self, loading_pairs=None, plot_prerotated=False):
         """
@@ -485,14 +950,31 @@ class FactorResults:
         -------
         figs : a list of figure handles
         """
-        pass
+        _import_mpl()
+        from .plots import plot_loadings
+
+        if self.rotation_method is None:
+            plot_prerotated = True
+        loadings = self.loadings_no_rot if plot_prerotated else self.loadings
+        if plot_prerotated:
+            title = 'Prerotated Factor Pattern'
+        else:
+            title = '%s Rotated Factor Pattern' % (self.rotation_method)
+        var_explained = self.eigenvals / self.n_comp * 100
+
+        return plot_loadings(loadings, loading_pairs=loading_pairs,
+                             title=title, row_names=self.endog_names,
+                             percent_variance=var_explained)

     @cache_readonly
     def fitted_cov(self):
         """
         Returns the fitted covariance matrix.
         """
-        pass
+
+        c = np.dot(self.loadings, self.loadings.T)
+        c.flat[::c.shape[0]+1] += self.uniqueness
+        return c

     @cache_readonly
     def uniq_stderr(self, kurt=0):
@@ -517,7 +999,17 @@ class FactorResults:
         The standard errors are only applicable to the original,
         unrotated maximum likelihood solution.
         """
-        pass
+
+        if self.fa_method.lower() != "ml":
+            msg = "Standard errors only available under ML estimation"
+            raise ValueError(msg)
+
+        if self.nobs is None:
+            msg = "nobs is required to obtain standard errors."
+            raise ValueError(msg)
+
+        v = self.uniqueness**2 * (2 + kurt)
+        return np.sqrt(v / self.nobs)

     @cache_readonly
     def load_stderr(self):
@@ -534,4 +1026,14 @@ class FactorResults:
         The standard errors are only applicable to the original,
         unrotated maximum likelihood solution.
         """
-        pass
+
+        if self.fa_method.lower() != "ml":
+            msg = "Standard errors only available under ML estimation"
+            raise ValueError(msg)
+
+        if self.nobs is None:
+            msg = "nobs is required to obtain standard errors."
+            raise ValueError(msg)
+
+        v = np.outer(self.uniqueness, np.ones(self.loadings.shape[1]))
+        return np.sqrt(v / self.nobs)
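
Taken together, the restored factor.py supports the principal-axis and ML
estimation paths plus rotation and scoring. A minimal end-to-end sketch with
synthetic data (illustrative; assumes the patched statsmodels is importable):

import numpy as np
from statsmodels.multivariate.factor import Factor

rng = np.random.default_rng(12345)
true_load = np.array([[0.9, 0.0],
                      [0.8, 0.1],
                      [0.1, 0.7],
                      [0.0, 0.8]])
scores = rng.standard_normal((500, 2))
endog = scores @ true_load.T + 0.5 * rng.standard_normal((500, 4))

res = Factor(endog, n_factor=2, method='pa').fit()
res.rotate('varimax')                               # rotates the loadings in place
print(res.get_loadings_frame(style='raw').round(2))
print(res.factor_scoring(method='bartlett')[:3])    # factor scores, first rows
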
diff --git a/statsmodels/multivariate/factor_rotation/_analytic_rotation.py b/statsmodels/multivariate/factor_rotation/_analytic_rotation.py
index cb43ea231..5425a3bc8 100644
--- a/statsmodels/multivariate/factor_rotation/_analytic_rotation.py
+++ b/statsmodels/multivariate/factor_rotation/_analytic_rotation.py
@@ -1,17 +1,19 @@
+# -*- coding: utf-8 -*-
 """
 This file contains analytic implementations of rotation methods.
 """
+
 import numpy as np
 import scipy as sp


 def target_rotation(A, H, full_rank=False):
-    """
+    r"""
     Analytically performs orthogonal rotations towards a target matrix,
     i.e., we minimize:

     .. math::
-        \\phi(L) =\\frac{1}{2}\\|AT-H\\|^2.
+        \phi(L) =\frac{1}{2}\|AT-H\|^2.

     where :math:`T` is an orthogonal matrix. This problem is also known as
     an orthogonal Procrustes problem.
@@ -20,14 +22,14 @@ def target_rotation(A, H, full_rank=False):
     solution :math:`T` is given by:

     .. math::
-        T = (A^*HH^*A)^{-\\frac{1}{2}}A^*H,
+        T = (A^*HH^*A)^{-\frac{1}{2}}A^*H,

     see Green (1952). In other cases the solution is given by :math:`T = UV`,
     where :math:`U` and :math:`V` result from the singular value decomposition
     of :math:`A^*H`:

     .. math::
-        A^*H = U\\Sigma V,
+        A^*H = U\Sigma V,

     see Schonemann (1966).

@@ -54,15 +56,21 @@ def target_rotation(A, H, full_rank=False):

     [3] Gower, Dijksterhuis (2004) - Procrustes problems
     """
-    pass
+    ATH = A.T.dot(H)
+    if full_rank or np.linalg.matrix_rank(ATH) == A.shape[1]:
+        T = sp.linalg.fractional_matrix_power(ATH.dot(ATH.T), -1/2).dot(ATH)
+    else:
+        U, D, V = np.linalg.svd(ATH, full_matrices=False)
+        T = U.dot(V)
+    return T


 def procrustes(A, H):
-    """
+    r"""
     Analytically solves the following Procrustes problem:

     .. math::
-        \\phi(L) =\\frac{1}{2}\\|AT-H\\|^2.
+        \phi(L) =\frac{1}{2}\|AT-H\|^2.

     (With no further conditions on :math:`H`)

@@ -70,7 +78,7 @@ def procrustes(A, H):
     solution :math:`T` is given by:

     .. math::
-        T = (A^*HH^*A)^{-\\frac{1}{2}}A^*H,
+        T = (A^*HH^*A)^{-\frac{1}{2}}A^*H,

     see Navarra, Simoncini (2010).

@@ -92,11 +100,11 @@ def procrustes(A, H):
     [1] Navarra, Simoncini (2010) - A guide to empirical orthogonal functions
     for climate data analysis
     """
-    pass
+    return np.linalg.inv(A.T.dot(A)).dot(A.T).dot(H)


 def promax(A, k=2):
-    """
+    r"""
     Performs promax rotation of the matrix :math:`A`.

     This method was not very clear to me from the literature, this
@@ -131,4 +139,15 @@ def promax(A, k=2):
     [2] Navarra, Simoncini (2010) - A guide to empirical orthogonal functions
     for climate data analysis
     """
-    pass
+    assert k > 0
+    # define rotation target using varimax rotation:
+    from ._wrappers import rotate_factors
+    V, T = rotate_factors(A, 'varimax')
+    H = np.abs(V)**k/V
+    # solve procrustes problem
+    S = procrustes(A, H)  # np.linalg.inv(A.T.dot(A)).dot(A.T).dot(H);
+    # normalize
+    d = np.sqrt(np.diag(np.linalg.inv(S.T.dot(S))))
+    D = np.diag(d)
+    T = np.linalg.inv(S.dot(D)).T
+    return A.dot(T), T
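
When the target is an exact rotation of A, the orthogonal Procrustes solution
used by target_rotation (T = UV from the SVD of A^*H) recovers that rotation.
A pure-NumPy sketch of this property (illustrative, not part of the patch):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 3))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
H = A @ R                                   # target built from a known rotation

U, s, Vt = np.linalg.svd(A.T @ H, full_matrices=False)
T = U @ Vt
print(np.allclose(T, R))                    # True: the rotation is recovered
print(np.allclose(T.T @ T, np.eye(3)))      # True: T is orthogonal
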
diff --git a/statsmodels/multivariate/factor_rotation/_gpa_rotation.py b/statsmodels/multivariate/factor_rotation/_gpa_rotation.py
index 89217acb5..1c5fd23cb 100644
--- a/statsmodels/multivariate/factor_rotation/_gpa_rotation.py
+++ b/statsmodels/multivariate/factor_rotation/_gpa_rotation.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 This file contains a Python version of the gradient projection rotation
 algorithms (GPA) developed by Bernaards, C.A. and Jennrich, R.I.
@@ -20,14 +21,15 @@ Psychometrika, 67, 7-19.

 [5] http://www.stat.ucla.edu/research/gpa/GPderfree.txt
 """
+
 import numpy as np


-def GPA(A, ff=None, vgQ=None, T=None, max_tries=501, rotation_method=
-    'orthogonal', tol=1e-05):
-    """
+def GPA(A, ff=None, vgQ=None, T=None, max_tries=501,
+        rotation_method='orthogonal', tol=1e-5):
+    r"""
     The gradient projection algorithm (GPA) minimizes a target function
-    :math:`\\phi(L)`, where :math:`L` is a matrix with rotated factors.
+    :math:`\phi(L)`, where :math:`L` is a matrix with rotated factors.

     For orthogonal rotation methods :math:`L=AT`, where :math:`T` is an
     orthogonal matrix. For oblique rotation matrices :math:`L=A(T^*)^{-1}`,
@@ -42,12 +44,12 @@ def GPA(A, ff=None, vgQ=None, T=None, max_tries=501, rotation_method=
     T : numpy matrix (default identity matrix)
         initial guess of rotation matrix
     ff : function (default None)
-        criterion :math:`\\phi` to optimize. Should have A, T, L as keyword
+        criterion :math:`\phi` to optimize. Should have A, T, L as keyword
         arguments
         and mapping to a float. Only used (and required) if vgQ is not
         provided.
     vgQ : function (default None)
-        criterion :math:`\\phi` to optimize and its derivative. Should have
+        criterion :math:`\phi` to optimize and its derivative. Should have
          A, T, L as keyword arguments and mapping to a tuple containing a
         float and vector. Can be omitted if ff is provided.
     max_tries : int (default 501)
@@ -58,48 +60,145 @@ def GPA(A, ff=None, vgQ=None, T=None, max_tries=501, rotation_method=
         stop criterion, algorithm stops if Frobenius norm of gradient is
         smaller then tol
     """
-    pass
+    # pre processing
+    if rotation_method not in ['orthogonal', 'oblique']:
+        raise ValueError('rotation_method should be one of '
+                         '{orthogonal, oblique}')
+    if vgQ is None:
+        if ff is None:
+            raise ValueError('ff should be provided if vgQ is not')
+        derivative_free = True
+        Gff = lambda x: Gf(x, lambda y: ff(T=y, A=A, L=None))
+    else:
+        derivative_free = False
+    if T is None:
+        T = np.eye(A.shape[1])
+    # pre processing for iteration
+    al = 1
+    table = []
+    # pre processing for iteration: initialize f and G
+    if derivative_free:
+        f = ff(T=T, A=A, L=None)
+        G = Gff(T)
+    elif rotation_method == 'orthogonal':  # and not derivative_free
+        L = A.dot(T)
+        f, Gq = vgQ(L=L)
+        G = (A.T).dot(Gq)
+    else:  # i.e. rotation_method == 'oblique' and not derivative_free
+        Ti = np.linalg.inv(T)
+        L = A.dot(Ti.T)
+        f, Gq = vgQ(L=L)
+        G = -((L.T).dot(Gq).dot(Ti)).T
+    # iteration
+    for i_try in range(0, max_tries):
+        # determine Gp
+        if rotation_method == 'orthogonal':
+            M = (T.T).dot(G)
+            S = (M + M.T)/2
+            Gp = G - T.dot(S)
+        else:  # i.e. if rotation_method == 'oblique':
+            Gp = G-T.dot(np.diag(np.sum(T*G, axis=0)))
+        s = np.linalg.norm(Gp, 'fro')
+        table.append([i_try, f, np.log10(s), al])
+        # if we are close stop
+        if s < tol:
+            break
+        # update T
+        al = 2*al
+        for i in range(11):
+            # determine Tt
+            X = T - al*Gp
+            if rotation_method == 'orthogonal':
+                U, D, V = np.linalg.svd(X, full_matrices=False)
+                Tt = U.dot(V)
+            else:  # i.e. if rotation_method == 'oblique':
+                v = 1/np.sqrt(np.sum(X**2, axis=0))
+                Tt = X.dot(np.diag(v))
+            # calculate objective using Tt
+            if derivative_free:
+                ft = ff(T=Tt, A=A, L=None)
+            elif rotation_method == 'orthogonal':  # and not derivative_free
+                L = A.dot(Tt)
+                ft, Gq = vgQ(L=L)
+            else:  # i.e. rotation_method == 'oblique' and not derivative_free
+                Ti = np.linalg.inv(Tt)
+                L = A.dot(Ti.T)
+                ft, Gq = vgQ(L=L)
+            # if sufficient improvement in objective -> use this T
+            if ft < f-.5*s**2*al:
+                break
+            al = al/2
+        # post processing for next iteration
+        T = Tt
+        f = ft
+        if derivative_free:
+            G = Gff(T)
+        elif rotation_method == 'orthogonal':  # and not derivative_free
+            G = (A.T).dot(Gq)
+        else:  # i.e. rotation_method == 'oblique' and not derivative_free
+            G = -((L.T).dot(Gq).dot(Ti)).T
+    # post processing
+    Th = T
+    Lh = rotateA(A, T, rotation_method=rotation_method)
+    Phi = (T.T).dot(T)
+    return Lh, Phi, Th, table


 def Gf(T, ff):
     """
     Subroutine for the gradient of f using numerical derivatives.
     """
-    pass
+    k = T.shape[0]
+    ep = 1e-4
+    G = np.zeros((k, k))
+    for r in range(k):
+        for s in range(k):
+            dT = np.zeros((k, k))
+            dT[r, s] = ep
+            G[r, s] = (ff(T+dT)-ff(T-dT))/(2*ep)
+    return G


 def rotateA(A, T, rotation_method='orthogonal'):
-    """
+    r"""
     For orthogonal rotation methods :math:`L=AT`, where :math:`T` is an
     orthogonal matrix. For oblique rotation matrices :math:`L=A(T^*)^{-1}`,
     where :math:`T` is a normal matrix, i.e., :math:`TT^*=T^*T`. Oblique
     rotations relax the orthogonality constraint in order to gain simplicity
     in the interpretation.
     """
-    pass
-
-
-def oblimin_objective(L=None, A=None, T=None, gamma=0, rotation_method=
-    'orthogonal', return_gradient=True):
-    """
+    if rotation_method == 'orthogonal':
+        L = A.dot(T)
+    elif rotation_method == 'oblique':
+        L = A.dot(np.linalg.inv(T.T))
+    else:  # unrecognized rotation_method
+        raise ValueError('rotation_method should be one of '
+                         '{orthogonal, oblique}')
+    return L
+
+
+def oblimin_objective(L=None, A=None, T=None, gamma=0,
+                      rotation_method='orthogonal',
+                      return_gradient=True):
+    r"""
     Objective function for the oblimin family for orthogonal or
     oblique rotation which minimizes:

     .. math::
-        \\phi(L) = \\frac{1}{4}(L\\circ L,(I-\\gamma C)(L\\circ L)N),
+        \phi(L) = \frac{1}{4}(L\circ L,(I-\gamma C)(L\circ L)N),

-    where :math:`L` is a :math:`p\\times k` matrix, :math:`N` is
-    :math:`k\\times k`
+    where :math:`L` is a :math:`p\times k` matrix, :math:`N` is
+    :math:`k\times k`
     matrix with zeros on the diagonal and ones elsewhere, :math:`C` is a
-    :math:`p\\times p` matrix with elements equal to :math:`1/p`,
-    :math:`(X,Y)=\\operatorname{Tr}(X^*Y)` is the Frobenius norm and
-    :math:`\\circ`
+    :math:`p\times p` matrix with elements equal to :math:`1/p`,
+    :math:`(X,Y)=\operatorname{Tr}(X^*Y)` is the Frobenius norm and
+    :math:`\circ`
     is the element-wise product or Hadamard product.

     The gradient is given by

     .. math::
-        L\\circ\\left[(I-\\gamma C) (L \\circ L)N\\right].
+        L\circ\left[(I-\gamma C) (L \circ L)N\right].

     Either :math:`L` should be provided or :math:`A` and :math:`T` should be
     provided.
@@ -117,17 +216,17 @@ def oblimin_objective(L=None, A=None, T=None, gamma=0, rotation_method=

     where :math:`T` is a normal matrix.

-    The oblimin family is parametrized by the parameter :math:`\\gamma`. For
+    The oblimin family is parametrized by the parameter :math:`\gamma`. For
     orthogonal rotations:

-    * :math:`\\gamma=0` corresponds to quartimax,
-    * :math:`\\gamma=\\frac{1}{2}` corresponds to biquartimax,
-    * :math:`\\gamma=1` corresponds to varimax,
-    * :math:`\\gamma=\\frac{1}{p}` corresponds to equamax.
+    * :math:`\gamma=0` corresponds to quartimax,
+    * :math:`\gamma=\frac{1}{2}` corresponds to biquartimax,
+    * :math:`\gamma=1` corresponds to varimax,
+    * :math:`\gamma=\frac{1}{p}` corresponds to equamax.
     For oblique rotations:

-    * :math:`\\gamma=0` corresponds to quartimin,
-    * :math:`\\gamma=\\frac{1}{2}` corresponds to biquartimin.
+    * :math:`\gamma=0` corresponds to quartimin,
+    * :math:`\gamma=\frac{1}{2}` corresponds to biquartimin.

     Parameters
     ----------
@@ -144,22 +243,38 @@ def oblimin_objective(L=None, A=None, T=None, gamma=0, rotation_method=
     return_gradient : bool (default True)
         toggles return of gradient
     """
-    pass
+    if L is None:
+        assert A is not None and T is not None
+        L = rotateA(A, T, rotation_method=rotation_method)
+    p, k = L.shape
+    L2 = L**2
+    N = np.ones((k, k))-np.eye(k)
+    if np.isclose(gamma, 0):
+        X = L2.dot(N)
+    else:
+        C = np.ones((p, p))/p
+        X = (np.eye(p) - gamma*C).dot(L2).dot(N)
+    phi = np.sum(L2*X)/4
+    if return_gradient:
+        Gphi = L*X
+        return phi, Gphi
+    else:
+        return phi


 def orthomax_objective(L=None, A=None, T=None, gamma=0, return_gradient=True):
-    """
+    r"""
     Objective function for the orthomax family for orthogonal
     rotation which minimizes the following objective:

     .. math::
-        \\phi(L) = -\\frac{1}{4}(L\\circ L,(I-\\gamma C)(L\\circ L)),
+        \phi(L) = -\frac{1}{4}(L\circ L,(I-\gamma C)(L\circ L)),

-    where :math:`0\\leq\\gamma\\leq1`, :math:`L` is a :math:`p\\times k` matrix,
-    :math:`C` is a  :math:`p\\times p` matrix with elements equal to
+    where :math:`0\leq\gamma\leq1`, :math:`L` is a :math:`p\times k` matrix,
+    :math:`C` is a  :math:`p\times p` matrix with elements equal to
     :math:`1/p`,
-    :math:`(X,Y)=\\operatorname{Tr}(X^*Y)` is the Frobenius norm and
-    :math:`\\circ` is the element-wise product or Hadamard product.
+    :math:`(X,Y)=\operatorname{Tr}(X^*Y)` is the Frobenius norm and
+    :math:`\circ` is the element-wise product or Hadamard product.

     Either :math:`L` should be provided or :math:`A` and :math:`T` should be
     provided.
@@ -171,12 +286,12 @@ def orthomax_objective(L=None, A=None, T=None, gamma=0, return_gradient=True):

     where :math:`T` is an orthogonal matrix.

-    The orthomax family is parametrized by the parameter :math:`\\gamma`:
+    The orthomax family is parametrized by the parameter :math:`\gamma`:

-    * :math:`\\gamma=0` corresponds to quartimax,
-    * :math:`\\gamma=\\frac{1}{2}` corresponds to biquartimax,
-    * :math:`\\gamma=1` corresponds to varimax,
-    * :math:`\\gamma=\\frac{1}{p}` corresponds to equamax.
+    * :math:`\gamma=0` corresponds to quartimax,
+    * :math:`\gamma=\frac{1}{2}` corresponds to biquartimax,
+    * :math:`\gamma=1` corresponds to varimax,
+    * :math:`\gamma=\frac{1}{p}` corresponds to equamax.

     Parameters
     ----------
@@ -191,32 +306,49 @@ def orthomax_objective(L=None, A=None, T=None, gamma=0, return_gradient=True):
     return_gradient : bool (default True)
         toggles return of gradient
     """
-    pass
-
-
-def CF_objective(L=None, A=None, T=None, kappa=0, rotation_method=
-    'orthogonal', return_gradient=True):
-    """
+    assert 0 <= gamma <= 1, "Gamma should be between 0 and 1"
+    if L is None:
+        assert A is not None and T is not None
+        L = rotateA(A, T, rotation_method='orthogonal')
+    p, k = L.shape
+    L2 = L**2
+    if np.isclose(gamma, 0):
+        X = L2
+    else:
+        C = np.ones((p, p))/p
+        X = (np.eye(p)-gamma*C).dot(L2)
+    phi = -np.sum(L2*X)/4
+    if return_gradient:
+        Gphi = -L*X
+        return phi, Gphi
+    else:
+        return phi
+
+
+def CF_objective(L=None, A=None, T=None, kappa=0,
+                 rotation_method='orthogonal',
+                 return_gradient=True):
+    r"""
     Objective function for the Crawford-Ferguson family for orthogonal
     and oblique rotation which minimizes the following objective:

     .. math::
-        \\phi(L) =\\frac{1-\\kappa}{4} (L\\circ L,(L\\circ L)N)
-                  -\\frac{1}{4}(L\\circ L,M(L\\circ L)),
+        \phi(L) =\frac{1-\kappa}{4} (L\circ L,(L\circ L)N)
+                  -\frac{1}{4}(L\circ L,M(L\circ L)),

-    where :math:`0\\leq\\kappa\\leq1`, :math:`L` is a :math:`p\\times k` matrix,
-    :math:`N` is :math:`k\\times k` matrix with zeros on the diagonal and ones
+    where :math:`0\leq\kappa\leq1`, :math:`L` is a :math:`p\times k` matrix,
+    :math:`N` is :math:`k\times k` matrix with zeros on the diagonal and ones
     elsewhere,
-    :math:`M` is :math:`p\\times p` matrix with zeros on the diagonal and ones
+    :math:`M` is :math:`p\times p` matrix with zeros on the diagonal and ones
     elsewhere
-    :math:`(X,Y)=\\operatorname{Tr}(X^*Y)` is the Frobenius norm and
-    :math:`\\circ` is the element-wise product or Hadamard product.
+    :math:`(X,Y)=\operatorname{Tr}(X^*Y)` is the Frobenius norm and
+    :math:`\circ` is the element-wise product or Hadamard product.

     The gradient is given by

     .. math::
-       d\\phi(L) = (1-\\kappa) L\\circ\\left[(L\\circ L)N\\right]
-                   -\\kappa L\\circ \\left[M(L\\circ L)\\right].
+       d\phi(L) = (1-\kappa) L\circ\left[(L\circ L)N\right]
+                   -\kappa L\circ \left[M(L\circ L)\right].

     Either :math:`L` should be provided or :math:`A` and :math:`T` should be
     provided.
@@ -237,10 +369,10 @@ def CF_objective(L=None, A=None, T=None, kappa=0, rotation_method=
     For orthogonal rotations the oblimin (and orthomax) family of rotations is
     equivalent to the Crawford-Ferguson family. To be more precise:

-    * :math:`\\kappa=0` corresponds to quartimax,
-    * :math:`\\kappa=\\frac{1}{p}` corresponds to variamx,
-    * :math:`\\kappa=\\frac{k-1}{p+k-2}` corresponds to parsimax,
-    * :math:`\\kappa=1` corresponds to factor parsimony.
+    * :math:`\kappa=0` corresponds to quartimax,
+    * :math:`\kappa=\frac{1}{p}` corresponds to varimax,
+    * :math:`\kappa=\frac{k-1}{p+k-2}` corresponds to parsimax,
+    * :math:`\kappa=1` corresponds to factor parsimony.

     Parameters
     ----------
@@ -257,21 +389,42 @@ def CF_objective(L=None, A=None, T=None, kappa=0, rotation_method=
     return_gradient : bool (default True)
         toggles return of gradient
     """
-    pass
+    assert 0 <= kappa <= 1, "Kappa should be between 0 and 1"
+    if L is None:
+        assert A is not None and T is not None
+        L = rotateA(A, T, rotation_method=rotation_method)
+    p, k = L.shape
+    L2 = L**2
+    X = None
+    if not np.isclose(kappa, 1):
+        N = np.ones((k, k)) - np.eye(k)
+        X = (1 - kappa)*L2.dot(N)
+    if not np.isclose(kappa, 0):
+        M = np.ones((p, p)) - np.eye(p)
+        if X is None:
+            X = kappa*M.dot(L2)
+        else:
+            X += kappa*M.dot(L2)
+    phi = np.sum(L2 * X) / 4
+    if return_gradient:
+        Gphi = L*X
+        return phi, Gphi
+    else:
+        return phi


 def vgQ_target(H, L=None, A=None, T=None, rotation_method='orthogonal'):
-    """
+    r"""
     Subroutine for the value of vgQ using orthogonal or oblique rotation
     towards a target matrix, i.e., we minimize:

     .. math::
-        \\phi(L) =\\frac{1}{2}\\|L-H\\|^2
+        \phi(L) =\frac{1}{2}\|L-H\|^2

     and the gradient is given by

     .. math::
-        d\\phi(L)=L-H.
+        d\phi(L)=L-H.

     Either :math:`L` should be provided or :math:`A` and :math:`T` should be
     provided.
@@ -302,16 +455,21 @@ def vgQ_target(H, L=None, A=None, T=None, rotation_method='orthogonal'):
     rotation_method : str
         should be one of {orthogonal, oblique}
     """
-    pass
+    if L is None:
+        assert A is not None and T is not None
+        L = rotateA(A, T, rotation_method=rotation_method)
+    q = np.linalg.norm(L-H, 'fro')**2
+    Gq = 2*(L-H)
+    return q, Gq


 def ff_target(H, L=None, A=None, T=None, rotation_method='orthogonal'):
-    """
+    r"""
     Subroutine for the value of f using (orthogonal or oblique) rotation
     towards a target matrix, i.e., we minimize:

     .. math::
-        \\phi(L) =\\frac{1}{2}\\|L-H\\|^2.
+        \phi(L) =\frac{1}{2}\|L-H\|^2.

     Either :math:`L` should be provided or :math:`A` and :math:`T` should be
     provided. For orthogonal rotations :math:`L` satisfies
@@ -340,23 +498,26 @@ def ff_target(H, L=None, A=None, T=None, rotation_method='orthogonal'):
     rotation_method : str
         should be one of {orthogonal, oblique}
     """
-    pass
+    if L is None:
+        assert A is not None and T is not None
+        L = rotateA(A, T, rotation_method=rotation_method)
+    return np.linalg.norm(L-H, 'fro')**2


 def vgQ_partial_target(H, W=None, L=None, A=None, T=None):
-    """
+    r"""
     Subroutine for the value of vgQ using orthogonal rotation towards a partial
     target matrix, i.e., we minimize:

     .. math::
-        \\phi(L) =\\frac{1}{2}\\|W\\circ(L-H)\\|^2,
+        \phi(L) =\frac{1}{2}\|W\circ(L-H)\|^2,

-    where :math:`\\circ` is the element-wise product or Hadamard product and
+    where :math:`\circ` is the element-wise product or Hadamard product and
     :math:`W` is a matrix whose entries can only be one or zero. The gradient
     is given by

     .. math::
-        d\\phi(L)=W\\circ(L-H).
+        d\phi(L)=W\circ(L-H).

     Either :math:`L` should be provided or :math:`A` and :math:`T` should be
     provided.
@@ -381,18 +542,25 @@ def vgQ_partial_target(H, W=None, L=None, A=None, T=None):
     T : numpy matrix (default None)
         rotation matrix
     """
-    pass
+    if W is None:
+        return vgQ_target(H, L=L, A=A, T=T)
+    if L is None:
+        assert A is not None and T is not None
+        L = rotateA(A, T, rotation_method='orthogonal')
+    q = np.linalg.norm(W*(L-H), 'fro')**2
+    Gq = 2*W*(L-H)
+    return q, Gq


 def ff_partial_target(H, W=None, L=None, A=None, T=None):
-    """
+    r"""
     Subroutine for the value of f using orthogonal rotation towards a partial
     target matrix, i.e., we minimize:

     .. math::
-        \\phi(L) =\\frac{1}{2}\\|W\\circ(L-H)\\|^2,
+        \phi(L) =\frac{1}{2}\|W\circ(L-H)\|^2,

-    where :math:`\\circ` is the element-wise product or Hadamard product and
+    where :math:`\circ` is the element-wise product or Hadamard product and
     :math:`W` is a matrix whose entries can only be one or zero. Either
     :math:`L` should be provided or :math:`A` and :math:`T` should be provided.

@@ -416,4 +584,10 @@ def ff_partial_target(H, W=None, L=None, A=None, T=None):
     T : numpy matrix (default None)
         rotation matrix
     """
-    pass
+    if W is None:
+        return ff_target(H, L=L, A=A, T=T)
+    if L is None:
+        assert A is not None and T is not None
+        L = rotateA(A, T, rotation_method='orthogonal')
+    q = np.linalg.norm(W*(L-H), 'fro')**2
+    return q
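
With the target criterion and a target that is an exact orthogonal rotation of
A, the gradient projection routine above should drive the objective to
(numerically) zero. A sketch using the module path from the diff header; the
data and rotation are illustrative:

import numpy as np
from statsmodels.multivariate.factor_rotation._gpa_rotation import GPA, vgQ_target

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 2))
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
H = A @ R                                    # target reachable by an exact rotation

vgQ = lambda L=None, A=None, T=None: vgQ_target(H, L=L, A=A, T=T)
L, Phi, T, table = GPA(A, vgQ=vgQ, rotation_method='orthogonal')
print(np.abs(L - H).max())                   # close to zero
print(np.abs(T.T @ T - np.eye(2)).max())     # T remains orthogonal
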
diff --git a/statsmodels/multivariate/factor_rotation/_wrappers.py b/statsmodels/multivariate/factor_rotation/_wrappers.py
index 38915470c..e5a5e4b57 100644
--- a/statsmodels/multivariate/factor_rotation/_wrappers.py
+++ b/statsmodels/multivariate/factor_rotation/_wrappers.py
@@ -1,13 +1,17 @@
+# -*- coding: utf-8 -*-
+
+
 from ._analytic_rotation import target_rotation
 from ._gpa_rotation import oblimin_objective, orthomax_objective, CF_objective
 from ._gpa_rotation import ff_partial_target, ff_target
 from ._gpa_rotation import vgQ_partial_target, vgQ_target
 from ._gpa_rotation import rotateA, GPA
+
 __all__ = []


 def rotate_factors(A, method, *method_args, **algorithm_kwargs):
-    """
+    r"""
     Subroutine for orthogonal and oblique rotation of the matrix :math:`A`.
     For orthogonal rotations :math:`A` is rotated to :math:`L` according to

@@ -70,31 +74,31 @@ def rotate_factors(A, method, *method_args, **algorithm_kwargs):

     Below,

-        * :math:`L` is a :math:`p\\times k` matrix;
-        * :math:`N` is :math:`k\\times k` matrix with zeros on the diagonal and ones
+        * :math:`L` is a :math:`p\times k` matrix;
+        * :math:`N` is :math:`k\times k` matrix with zeros on the diagonal and ones
           elsewhere;
-        * :math:`M` is :math:`p\\times p` matrix with zeros on the diagonal and ones
+        * :math:`M` is :math:`p\times p` matrix with zeros on the diagonal and ones
           elsewhere;
-        * :math:`C` is a :math:`p\\times p` matrix with elements equal to
+        * :math:`C` is a :math:`p\times p` matrix with elements equal to
           :math:`1/p`;
-        * :math:`(X,Y)=\\operatorname{Tr}(X^*Y)` is the Frobenius norm;
-        * :math:`\\circ` is the element-wise product or Hadamard product.
+        * :math:`(X,Y)=\operatorname{Tr}(X^*Y)` is the Frobenius norm;
+        * :math:`\circ` is the element-wise product or Hadamard product.

     oblimin : orthogonal or oblique rotation that minimizes
         .. math::
-            \\phi(L) = \\frac{1}{4}(L\\circ L,(I-\\gamma C)(L\\circ L)N).
+            \phi(L) = \frac{1}{4}(L\circ L,(I-\gamma C)(L\circ L)N).

         For orthogonal rotations:

-        * :math:`\\gamma=0` corresponds to quartimax,
-        * :math:`\\gamma=\\frac{1}{2}` corresponds to biquartimax,
-        * :math:`\\gamma=1` corresponds to varimax,
-        * :math:`\\gamma=\\frac{1}{p}` corresponds to equamax.
+        * :math:`\gamma=0` corresponds to quartimax,
+        * :math:`\gamma=\frac{1}{2}` corresponds to biquartimax,
+        * :math:`\gamma=1` corresponds to varimax,
+        * :math:`\gamma=\frac{1}{p}` corresponds to equamax.

         For oblique rotations:

-        * :math:`\\gamma=0` corresponds to quartimin,
-        * :math:`\\gamma=\\frac{1}{2}` corresponds to biquartimin.
+        * :math:`\gamma=0` corresponds to quartimin,
+        * :math:`\gamma=\frac{1}{2}` corresponds to biquartimin.

         method_args:

@@ -106,16 +110,16 @@ def rotate_factors(A, method, *method_args, **algorithm_kwargs):
     orthomax : orthogonal rotation that minimizes

         .. math::
-            \\phi(L) = -\\frac{1}{4}(L\\circ L,(I-\\gamma C)(L\\circ L)),
+            \phi(L) = -\frac{1}{4}(L\circ L,(I-\gamma C)(L\circ L)),

-        where :math:`0\\leq\\gamma\\leq1`. The orthomax family is equivalent to
+        where :math:`0\leq\gamma\leq1`. The orthomax family is equivalent to
         the oblimin family (when restricted to orthogonal rotations).
         Furthermore,

-        * :math:`\\gamma=0` corresponds to quartimax,
-        * :math:`\\gamma=\\frac{1}{2}` corresponds to biquartimax,
-        * :math:`\\gamma=1` corresponds to varimax,
-        * :math:`\\gamma=\\frac{1}{p}` corresponds to equamax.
+        * :math:`\gamma=0` corresponds to quartimax,
+        * :math:`\gamma=\frac{1}{2}` corresponds to biquartimax,
+        * :math:`\gamma=1` corresponds to varimax,
+        * :math:`\gamma=\frac{1}{p}` corresponds to equamax.

         method_args:

@@ -127,18 +131,18 @@ def rotate_factors(A, method, *method_args, **algorithm_kwargs):

         .. math::

-            \\phi(L) =\\frac{1-\\kappa}{4} (L\\circ L,(L\\circ L)N)
-                     -\\frac{1}{4}(L\\circ L,M(L\\circ L)),
+            \phi(L) =\frac{1-\kappa}{4} (L\circ L,(L\circ L)N)
+                     -\frac{1}{4}(L\circ L,M(L\circ L)),

-        where :math:`0\\leq\\kappa\\leq1`. For orthogonal rotations the oblimin
+        where :math:`0\leq\kappa\leq1`. For orthogonal rotations the oblimin
         (and orthomax) family of rotations is equivalent to the
         Crawford-Ferguson family.
         To be more precise:

-        * :math:`\\kappa=0` corresponds to quartimax,
-        * :math:`\\kappa=\\frac{1}{p}` corresponds to varimax,
-        * :math:`\\kappa=\\frac{k-1}{p+k-2}` corresponds to parsimax,
-        * :math:`\\kappa=1` corresponds to factor parsimony.
+        * :math:`\kappa=0` corresponds to quartimax,
+        * :math:`\kappa=\frac{1}{p}` corresponds to varimax,
+        * :math:`\kappa=\frac{k-1}{p+k-2}` corresponds to parsimax,
+        * :math:`\kappa=1` corresponds to factor parsimony.

         method_args:

@@ -148,29 +152,29 @@ def rotate_factors(A, method, *method_args, **algorithm_kwargs):
             should be one of {orthogonal, oblique}

     quartimax : orthogonal rotation method
-        minimizes the orthomax objective with :math:`\\gamma=0`
+        minimizes the orthomax objective with :math:`\gamma=0`

     biquartimax : orthogonal rotation method
-        minimizes the orthomax objective with :math:`\\gamma=\\frac{1}{2}`
+        minimizes the orthomax objective with :math:`\gamma=\frac{1}{2}`

     varimax : orthogonal rotation method
-        minimizes the orthomax objective with :math:`\\gamma=1`
+        minimizes the orthomax objective with :math:`\gamma=1`

     equamax : orthogonal rotation method
-        minimizes the orthomax objective with :math:`\\gamma=\\frac{1}{p}`
+        minimizes the orthomax objective with :math:`\gamma=\frac{1}{p}`

     parsimax : orthogonal rotation method
         minimizes the Crawford-Ferguson family objective with
-        :math:`\\kappa=\\frac{k-1}{p+k-2}`
+        :math:`\kappa=\frac{k-1}{p+k-2}`

     parsimony : orthogonal rotation method
-        minimizes the Crawford-Ferguson family objective with :math:`\\kappa=1`
+        minimizes the Crawford-Ferguson family objective with :math:`\kappa=1`

     quartimin : oblique rotation method
-        minimizes the oblimin objective with :math:`\\gamma=0`
+        minimizes the oblimin objective with :math:`\gamma=0`

     biquartimin : oblique rotation method
-        minimizes the oblimin objective with :math:`\\gamma=\\frac{1}{2}`
+        minimizes the oblimin objective with :math:`\gamma=\frac{1}{2}`

     target : orthogonal or oblique rotation that rotates towards a target

@@ -178,7 +182,7 @@ def rotate_factors(A, method, *method_args, **algorithm_kwargs):

         .. math::

-            \\phi(L) =\\frac{1}{2}\\|L-H\\|^2.
+            \phi(L) =\frac{1}{2}\|L-H\|^2.

         method_args:

@@ -198,7 +202,7 @@ def rotate_factors(A, method, *method_args, **algorithm_kwargs):

         .. math::

-            \\phi(L) =\\frac{1}{2}\\|W\\circ(L-H)\\|^2.
+            \phi(L) =\frac{1}{2}\|W\circ(L-H)\|^2.

         method_args:

@@ -217,4 +221,133 @@ def rotate_factors(A, method, *method_args, **algorithm_kwargs):
     >>> L, T = rotate_factors(A,'quartimin',0.5)
     >>> np.allclose(L,A.dot(np.linalg.inv(T.T)))
     """
-    pass
+    if 'algorithm' in algorithm_kwargs:
+        algorithm = algorithm_kwargs['algorithm']
+        algorithm_kwargs.pop('algorithm')
+    else:
+        algorithm = 'gpa'
+    assert 'rotation_method' not in algorithm_kwargs, (
+        'rotation_method cannot be provided as keyword argument')
+    L = None
+    T = None
+    ff = None
+    vgQ = None
+    p, k = A.shape
+    # set ff or vgQ to appropriate objective function, compute solution using
+    # recursion or analytically compute solution
+    if method == 'orthomax':
+        assert len(method_args) == 1, ('Only %s family parameter should be '
+                                       'provided' % method)
+        rotation_method = 'orthogonal'
+        gamma = method_args[0]
+        if algorithm == 'gpa':
+            vgQ = lambda L=None, A=None, T=None: orthomax_objective(
+                L=L, A=A, T=T, gamma=gamma, return_gradient=True)
+        elif algorithm == 'gpa_der_free':
+            ff = lambda L=None, A=None, T=None: orthomax_objective(
+                L=L, A=A, T=T, gamma=gamma, return_gradient=False)
+        else:
+            raise ValueError('Algorithm %s is not possible for %s '
+                             'rotation' % (algorithm, method))
+    elif method == 'oblimin':
+        assert len(method_args) == 2, ('Both %s family parameter and '
+                                       'rotation_method should be '
+                                       'provided' % method)
+        rotation_method = method_args[1]
+        assert rotation_method in ['orthogonal', 'oblique'], (
+            'rotation_method should be one of {orthogonal, oblique}')
+        gamma = method_args[0]
+        if algorithm == 'gpa':
+            vgQ = lambda L=None, A=None, T=None: oblimin_objective(
+                L=L, A=A, T=T, gamma=gamma, rotation_method=rotation_method,
+                return_gradient=True)
+        elif algorithm == 'gpa_der_free':
+            ff = lambda L=None, A=None, T=None: oblimin_objective(
+                L=L, A=A, T=T, gamma=gamma, rotation_method=rotation_method,
+                return_gradient=False)
+        else:
+            raise ValueError('Algorithm %s is not possible for %s '
+                             'rotation' % (algorithm, method))
+    elif method == 'CF':
+        assert len(method_args) == 2, ('Both %s family parameter and '
+                                       'rotation_method should be provided'
+                                       % method)
+        rotation_method = method_args[1]
+        assert rotation_method in ['orthogonal', 'oblique'], (
+            'rotation_method should be one of {orthogonal, oblique}')
+        kappa = method_args[0]
+        if algorithm == 'gpa':
+            vgQ = lambda L=None, A=None, T=None: CF_objective(
+                L=L, A=A, T=T, kappa=kappa, rotation_method=rotation_method,
+                return_gradient=True)
+        elif algorithm == 'gpa_der_free':
+            ff = lambda L=None, A=None, T=None: CF_objective(
+                L=L, A=A, T=T, kappa=kappa, rotation_method=rotation_method,
+                return_gradient=False)
+        else:
+            raise ValueError('Algorithm %s is not possible for %s '
+                             'rotation' % (algorithm, method))
+    elif method == 'quartimax':
+        return rotate_factors(A, 'orthomax', 0, **algorithm_kwargs)
+    elif method == 'biquartimax':
+        return rotate_factors(A, 'orthomax', 0.5, **algorithm_kwargs)
+    elif method == 'varimax':
+        return rotate_factors(A, 'orthomax', 1, **algorithm_kwargs)
+    elif method == 'equamax':
+        return rotate_factors(A, 'orthomax', 1/p, **algorithm_kwargs)
+    elif method == 'parsimax':
+        return rotate_factors(A, 'CF', (k-1)/(p+k-2),
+                              'orthogonal', **algorithm_kwargs)
+    elif method == 'parsimony':
+        return rotate_factors(A, 'CF', 1, 'orthogonal', **algorithm_kwargs)
+    elif method == 'quartimin':
+        return rotate_factors(A, 'oblimin', 0, 'oblique', **algorithm_kwargs)
+    elif method == 'biquartimin':
+        return rotate_factors(A, 'oblimin', 0.5, 'oblique', **algorithm_kwargs)
+    elif method == 'target':
+        assert len(method_args) == 2, (
+            'only the rotation target and orthogonal/oblique should be '
+            'provided for %s rotation' % method)
+        H = method_args[0]
+        rotation_method = method_args[1]
+        assert rotation_method in ['orthogonal', 'oblique'], (
+            'rotation_method should be one of {orthogonal, oblique}')
+        if algorithm == 'gpa':
+            vgQ = lambda L=None, A=None, T=None: vgQ_target(
+                H, L=L, A=A, T=T, rotation_method=rotation_method)
+        elif algorithm == 'gpa_der_free':
+            ff = lambda L=None, A=None, T=None: ff_target(
+                H, L=L, A=A, T=T, rotation_method=rotation_method)
+        elif algorithm == 'analytic':
+            assert rotation_method == 'orthogonal', (
+                'For analytic %s rotation only orthogonal rotation is '
+                'supported' % method)
+            T = target_rotation(A, H, **algorithm_kwargs)
+        else:
+            raise ValueError('Algorithm %s is not possible for %s rotation'
+                             % (algorithm, method))
+    elif method == 'partial_target':
+        assert len(method_args) == 2, ('2 additional arguments are expected '
+                                       'for %s rotation' % method)
+        H = method_args[0]
+        W = method_args[1]
+        rotation_method = 'orthogonal'
+        if algorithm == 'gpa':
+            vgQ = lambda L=None, A=None, T=None: vgQ_partial_target(
+                H, W=W, L=L, A=A, T=T)
+        elif algorithm == 'gpa_der_free':
+            ff = lambda L=None, A=None, T=None: ff_partial_target(
+                H, W=W, L=L, A=A, T=T)
+        else:
+            raise ValueError('Algorithm %s is not possible for %s '
+                             'rotation' % (algorithm, method))
+    else:
+        raise ValueError('Invalid method')
+    # compute L and T if not already done
+    if T is None:
+        L, phi, T, table = GPA(A, vgQ=vgQ, ff=ff,
+                               rotation_method=rotation_method,
+                               **algorithm_kwargs)
+    if L is None:
+        assert T is not None, 'Cannot compute L without T'
+        L = rotateA(A, T, rotation_method=rotation_method)
+    return L, T
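As a quick orientation to the wrapper restored above, a minimal usage sketch (the loadings matrix is synthetic illustration, not data from the test suite):

    import numpy as np
    from statsmodels.multivariate.factor_rotation import rotate_factors

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 2))      # p = 8 variables, k = 2 factors (made up)
    L, T = rotate_factors(A, 'varimax')  # dispatches to orthomax with gamma=1
    # For an orthogonal rotation the rotated loadings satisfy L = A @ T
    assert np.allclose(L, A.dot(T))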
diff --git a/statsmodels/multivariate/manova.py b/statsmodels/multivariate/manova.py
index 83af8ad8e..6cba8dc43 100644
--- a/statsmodels/multivariate/manova.py
+++ b/statsmodels/multivariate/manova.py
@@ -1,13 +1,17 @@
+# -*- coding: utf-8 -*-
+
 """Multivariate analysis of variance

 author: Yichuan Liu
 """
 import numpy as np
+
 from statsmodels.compat.pandas import Substitution
 from statsmodels.base.model import Model
 from .multivariate_ols import MultivariateTestResults
 from .multivariate_ols import _multivariate_ols_fit
 from .multivariate_ols import _multivariate_ols_test, _hypotheses_doc
+
 __docformat__ = 'restructuredtext en'


@@ -56,13 +60,16 @@ class MANOVA(Model):

     def __init__(self, endog, exog, missing='none', hasconst=None, **kwargs):
         if len(endog.shape) == 1 or endog.shape[1] == 1:
-            raise ValueError(
-                'There must be more than one dependent variable to fit MANOVA!'
-                )
-        super(MANOVA, self).__init__(endog, exog, missing=missing, hasconst
-            =hasconst, **kwargs)
+            raise ValueError('There must be more than one dependent variable'
+                             ' to fit MANOVA!')
+        super(MANOVA, self).__init__(endog, exog, missing=missing,
+                                     hasconst=hasconst, **kwargs)
         self._fittedmod = _multivariate_ols_fit(self.endog, self.exog)

+    def fit(self):
+        raise NotImplementedError('fit is not needed to use MANOVA. Call '
+                                  'mv_test directly on a MANOVA instance.')
+
     @Substitution(hypotheses_doc=_hypotheses_doc)
     def mv_test(self, hypotheses=None, skip_intercept_test=False):
         """
@@ -97,4 +104,26 @@ class MANOVA(Model):
         interface should be preferred when specifying a model since it
         provides knowledge about the model when specifying the hypotheses.
         """
-        pass
+        if hypotheses is None:
+            if (hasattr(self, 'data') and self.data is not None and
+                        hasattr(self.data, 'design_info')):
+                terms = self.data.design_info.term_name_slices
+                hypotheses = []
+                for key in terms:
+                    if skip_intercept_test and key == 'Intercept':
+                        continue
+                    L_contrast = np.eye(self.exog.shape[1])[terms[key], :]
+                    hypotheses.append([key, L_contrast, None])
+            else:
+                hypotheses = []
+                for i in range(self.exog.shape[1]):
+                    name = 'x%d' % (i)
+                    L = np.zeros([1, self.exog.shape[1]])
+                    L[0, i] = 1
+                    hypotheses.append([name, L, None])
+
+        results = _multivariate_ols_test(hypotheses, self._fittedmod,
+                                         self.exog_names, self.endog_names)
+
+        return MultivariateTestResults(results, self.endog_names,
+                                       self.exog_names)
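A minimal sketch of how the restored MANOVA.mv_test default-hypotheses branch is used; the data frame below is synthetic and the column names are purely illustrative:

    import numpy as np
    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    rng = np.random.default_rng(0)
    df = pd.DataFrame({'y1': rng.standard_normal(30),
                       'y2': rng.standard_normal(30),
                       'group': np.repeat(['a', 'b', 'c'], 10)})
    mod = MANOVA.from_formula('y1 + y2 ~ group', data=df)
    res = mod.mv_test()   # builds one contrast per design term (Intercept, group)
    print(res.summary())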
diff --git a/statsmodels/multivariate/multivariate_ols.py b/statsmodels/multivariate/multivariate_ols.py
index a5d9f9e9d..95f707336 100644
--- a/statsmodels/multivariate/multivariate_ols.py
+++ b/statsmodels/multivariate/multivariate_ols.py
@@ -1,3 +1,5 @@
+# -*- coding: utf-8 -*-
+
 """General linear model

 author: Yichuan Liu
@@ -7,11 +9,14 @@ from numpy.linalg import eigvals, inv, solve, matrix_rank, pinv, svd
 from scipy import stats
 import pandas as pd
 from patsy import DesignInfo
+
 from statsmodels.compat.pandas import Substitution
 from statsmodels.base.model import Model
 from statsmodels.iolib import summary2
 __docformat__ = 'restructuredtext en'
-_hypotheses_doc = """hypotheses : list[tuple]
+
+_hypotheses_doc = \
+"""hypotheses : list[tuple]
     Hypothesis `L*B*M = C` to be tested where B is the parameters in
     regression Y = X*B. Each element is a tuple of length 2, 3, or 4:

@@ -53,7 +58,7 @@ _hypotheses_doc = """hypotheses : list[tuple]
 """


-def _multivariate_ols_fit(endog, exog, method='svd', tolerance=1e-08):
+def _multivariate_ols_fit(endog, exog, method='svd', tolerance=1e-8):
     """
     Solve multivariate linear model y = x * params
     where y is dependent variables, x is independent variables
@@ -79,11 +84,49 @@ def _multivariate_ols_fit(endog, exog, method='svd', tolerance=1e-08):
     -----
     Status: experimental and incomplete
     """
-    pass
-
-
-def multivariate_stats(eigenvals, r_err_sscp, r_contrast, df_resid,
-    tolerance=1e-08):
+    y = endog
+    x = exog
+    nobs, k_endog = y.shape
+    nobs1, k_exog = x.shape
+    if nobs != nobs1:
+        raise ValueError('x(n=%d) and y(n=%d) should have the same number of '
+                         'rows!' % (nobs1, nobs))
+
+    # Calculate the matrices necessary for hypotheses testing
+    df_resid = nobs - k_exog
+    if method == 'pinv':
+        # Regression coefficients matrix
+        pinv_x = pinv(x)
+        params = pinv_x.dot(y)
+
+        # inverse of x'x
+        inv_cov = pinv_x.dot(pinv_x.T)
+        if matrix_rank(inv_cov, tol=tolerance) < k_exog:
+            raise ValueError('Covariance of x singular!')
+
+        # Sums of squares and cross-products of residuals
+        # Y'Y - (X * params)'(X * params)
+        t = x.dot(params)
+        sscpr = np.subtract(y.T.dot(y), t.T.dot(t))
+        return (params, df_resid, inv_cov, sscpr)
+    elif method == 'svd':
+        u, s, v = svd(x, 0)
+        if (s > tolerance).sum() < len(s):
+            raise ValueError('Covariance of x singular!')
+        invs = 1. / s
+
+        params = v.T.dot(np.diag(invs)).dot(u.T).dot(y)
+        inv_cov = v.T.dot(np.diag(np.power(invs, 2))).dot(v)
+        t = np.diag(s).dot(v).dot(params)
+        sscpr = np.subtract(y.T.dot(y), t.T.dot(t))
+        return (params, df_resid, inv_cov, sscpr)
+    else:
+        raise ValueError('%s is not a supported method!' % method)
+
+
+def multivariate_stats(eigenvals,
+                       r_err_sscp,
+                       r_contrast, df_resid, tolerance=1e-8):
     """
     For multivariate linear model Y = X * B
     Testing hypotheses
@@ -115,7 +158,109 @@ def multivariate_stats(eigenvals, r_err_sscp, r_contrast, df_resid,
     ----------
     .. [*] https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_introreg_sect012.htm
     """
-    pass
+    v = df_resid
+    p = r_err_sscp
+    q = r_contrast
+    s = np.min([p, q])
+    ind = eigenvals > tolerance
+    n_e = ind.sum()
+    eigv2 = eigenvals[ind]
+    eigv1 = np.array([i / (1 - i) for i in eigv2])
+    m = (np.abs(p - q) - 1) / 2
+    n = (v - p - 1) / 2
+
+    cols = ['Value', 'Num DF', 'Den DF', 'F Value', 'Pr > F']
+    index = ["Wilks' lambda", "Pillai's trace",
+             "Hotelling-Lawley trace", "Roy's greatest root"]
+    results = pd.DataFrame(columns=cols,
+                           index=index)
+
+    def fn(x):
+        return np.real([x])[0]
+
+    results.loc["Wilks' lambda", 'Value'] = fn(np.prod(1 - eigv2))
+
+    results.loc["Pillai's trace", 'Value'] = fn(eigv2.sum())
+
+    results.loc["Hotelling-Lawley trace", 'Value'] = fn(eigv1.sum())
+
+    results.loc["Roy's greatest root", 'Value'] = fn(eigv1.max())
+
+    r = v - (p - q + 1)/2
+    u = (p*q - 2) / 4
+    df1 = p * q
+    if p*p + q*q - 5 > 0:
+        t = np.sqrt((p*p*q*q - 4) / (p*p + q*q - 5))
+    else:
+        t = 1
+    df2 = r*t - 2*u
+    lmd = results.loc["Wilks' lambda", 'Value']
+    lmd = np.power(lmd, 1 / t)
+    F = (1 - lmd) / lmd * df2 / df1
+    results.loc["Wilks' lambda", 'Num DF'] = df1
+    results.loc["Wilks' lambda", 'Den DF'] = df2
+    results.loc["Wilks' lambda", 'F Value'] = F
+    pval = stats.f.sf(F, df1, df2)
+    results.loc["Wilks' lambda", 'Pr > F'] = pval
+
+    V = results.loc["Pillai's trace", 'Value']
+    df1 = s * (2*m + s + 1)
+    df2 = s * (2*n + s + 1)
+    F = df2 / df1 * V / (s - V)
+    results.loc["Pillai's trace", 'Num DF'] = df1
+    results.loc["Pillai's trace", 'Den DF'] = df2
+    results.loc["Pillai's trace", 'F Value'] = F
+    pval = stats.f.sf(F, df1, df2)
+    results.loc["Pillai's trace", 'Pr > F'] = pval
+
+    U = results.loc["Hotelling-Lawley trace", 'Value']
+    if n > 0:
+        b = (p + 2*n) * (q + 2*n) / 2 / (2*n + 1) / (n - 1)
+        df1 = p * q
+        df2 = 4 + (p*q + 2) / (b - 1)
+        c = (df2 - 2) / 2 / n
+        F = df2 / df1 * U / c
+    else:
+        df1 = s * (2*m + s + 1)
+        df2 = s * (s*n + 1)
+        F = df2 / df1 / s * U
+    results.loc["Hotelling-Lawley trace", 'Num DF'] = df1
+    results.loc["Hotelling-Lawley trace", 'Den DF'] = df2
+    results.loc["Hotelling-Lawley trace", 'F Value'] = F
+    pval = stats.f.sf(F, df1, df2)
+    results.loc["Hotelling-Lawley trace", 'Pr > F'] = pval
+
+    sigma = results.loc["Roy's greatest root", 'Value']
+    r = np.max([p, q])
+    df1 = r
+    df2 = v - r + q
+    F = df2 / df1 * sigma
+    results.loc["Roy's greatest root", 'Num DF'] = df1
+    results.loc["Roy's greatest root", 'Den DF'] = df2
+    results.loc["Roy's greatest root", 'F Value'] = F
+    pval = stats.f.sf(F, df1, df2)
+    results.loc["Roy's greatest root", 'Pr > F'] = pval
+    return results
+
+
+def _multivariate_ols_test(hypotheses, fit_results, exog_names,
+                            endog_names):
+    def fn(L, M, C):
+        # .. [1] https://support.sas.com/documentation/cdl/en/statug/63033
+        #        /HTML/default/viewer.htm#statug_introreg_sect012.htm
+        params, df_resid, inv_cov, sscpr = fit_results
+        # t1 = L * params * M - C
+        t1 = L.dot(params).dot(M) - C
+        # H = t1'(L(X'X)^-1 L')^-1 t1
+        t2 = L.dot(inv_cov).dot(L.T)
+        q = matrix_rank(t2)
+        H = t1.T.dot(inv(t2)).dot(t1)
+
+        # E = M'(Y'Y - B'(X'X)B)M
+        E = M.T.dot(sscpr).dot(M)
+        return E, H, q, df_resid
+
+    return _multivariate_test(hypotheses, exog_names, endog_names, fn)


 @Substitution(hypotheses_doc=_hypotheses_doc)
@@ -154,7 +299,70 @@ def _multivariate_test(hypotheses, exog_names, endog_names, fn):
     -------
     results : MANOVAResults
     """
-    pass
+
+    k_xvar = len(exog_names)
+    k_yvar = len(endog_names)
+    results = {}
+    for hypo in hypotheses:
+        if len(hypo) == 2:
+            name, L = hypo
+            M = None
+            C = None
+        elif len(hypo) == 3:
+            name, L, M = hypo
+            C = None
+        elif len(hypo) == 4:
+            name, L, M, C = hypo
+        else:
+            raise ValueError('hypotheses must be a tuple of length 2, 3 or 4.'
+                             ' len(hypotheses)=%d' % len(hypo))
+        if any(isinstance(j, str) for j in L):
+            L = DesignInfo(exog_names).linear_constraint(L).coefs
+        else:
+            if not isinstance(L, np.ndarray) or len(L.shape) != 2:
+                raise ValueError('Contrast matrix L must be a 2-d array!')
+            if L.shape[1] != k_xvar:
+                raise ValueError('Contrast matrix L should have the same '
+                                 'number of columns as exog! %d != %d' %
+                                 (L.shape[1], k_xvar))
+        if M is None:
+            M = np.eye(k_yvar)
+        elif any(isinstance(j, str) for j in M):
+            M = DesignInfo(endog_names).linear_constraint(M).coefs.T
+        else:
+            if M is not None:
+                if not isinstance(M, np.ndarray) or len(M.shape) != 2:
+                    raise ValueError('Transform matrix M must be a 2-d array!')
+                if M.shape[0] != k_yvar:
+                    raise ValueError('Transform matrix M should have the same '
+                                     'number of rows as the number of columns '
+                                     'of endog! %d != %d' %
+                                     (M.shape[0], k_yvar))
+        if C is None:
+            C = np.zeros([L.shape[0], M.shape[1]])
+        elif not isinstance(C, np.ndarray):
+            raise ValueError('Constant matrix C must be a 2-d array!')
+
+        if C.shape[0] != L.shape[0]:
+            raise ValueError('contrast L and constant C must have the same '
+                             'number of rows! %d!=%d'
+                             % (L.shape[0], C.shape[0]))
+        if C.shape[1] != M.shape[1]:
+            raise ValueError('transform M and constant C must have the same '
+                             'number of columns! %d!=%d'
+                             % (M.shape[1], C.shape[1]))
+        E, H, q, df_resid = fn(L, M, C)
+        EH = np.add(E, H)
+        p = matrix_rank(EH)
+
+        # eigenvalues of inv(E + H)H
+        eigv2 = np.sort(eigvals(solve(EH, H)))
+        stat_table = multivariate_stats(eigv2, p, q, df_resid)
+
+        results[name] = {'stat': stat_table, 'contrast_L': L,
+                         'transform_M': M, 'constant_C': C,
+                         'E': E, 'H': H}
+    return results


 class _MultivariateOLS(Model):
@@ -186,21 +394,24 @@ class _MultivariateOLS(Model):

     def __init__(self, endog, exog, missing='none', hasconst=None, **kwargs):
         if len(endog.shape) == 1 or endog.shape[1] == 1:
-            raise ValueError(
-                'There must be more than one dependent variable to fit multivariate OLS!'
-                )
+            raise ValueError('There must be more than one dependent variable'
+                             ' to fit multivariate OLS!')
         super(_MultivariateOLS, self).__init__(endog, exog, missing=missing,
-            hasconst=hasconst, **kwargs)
+                                               hasconst=hasconst, **kwargs)
+
+    def fit(self, method='svd'):
+        self._fittedmod = _multivariate_ols_fit(
+            self.endog, self.exog, method=method)
+        return _MultivariateOLSResults(self)


 class _MultivariateOLSResults:
     """
     _MultivariateOLS results class
     """
-
     def __init__(self, fitted_mv_ols):
-        if hasattr(fitted_mv_ols, 'data') and hasattr(fitted_mv_ols.data,
-            'design_info'):
+        if (hasattr(fitted_mv_ols, 'data') and
+                hasattr(fitted_mv_ols.data, 'design_info')):
             self.design_info = fitted_mv_ols.data.design_info
         else:
             self.design_info = None
@@ -240,7 +451,33 @@ class _MultivariateOLSResults:
         linear model y = x * params, `L` is the contrast matrix, `M` is the
         dependent variable transform matrix and C is the constant matrix.
         """
-        pass
+        k_xvar = len(self.exog_names)
+        if hypotheses is None:
+            if self.design_info is not None:
+                terms = self.design_info.term_name_slices
+                hypotheses = []
+                for key in terms:
+                    if skip_intercept_test and key == 'Intercept':
+                        continue
+                    L_contrast = np.eye(k_xvar)[terms[key], :]
+                    hypotheses.append([key, L_contrast, None])
+            else:
+                hypotheses = []
+                for i in range(k_xvar):
+                    name = 'x%d' % (i)
+                    L = np.zeros([1, k_xvar])
+                    L[0, i] = 1
+                    hypotheses.append([name, L, None])
+
+        results = _multivariate_ols_test(hypotheses, self._fittedmod,
+                                          self.exog_names, self.endog_names)
+
+        return MultivariateTestResults(results,
+                                       self.endog_names,
+                                       self.exog_names)
+
+    def summary(self):
+        raise NotImplementedError


 class MultivariateTestResults:
@@ -303,10 +540,18 @@ class MultivariateTestResults:
         """
         Return results as a multiindex dataframe
         """
-        pass
+        df = []
+        for key in self.results:
+            tmp = self.results[key]['stat'].copy()
+            tmp.loc[:, 'Effect'] = key
+            df.append(tmp.reset_index())
+        df = pd.concat(df, axis=0)
+        df = df.set_index(['Effect', 'index'])
+        df.index.set_names(['Effect', 'Statistic'], inplace=True)
+        return df

     def summary(self, show_contrast_L=False, show_transform_M=False,
-        show_constant_C=False):
+                show_constant_C=False):
         """
         Summary of test results

@@ -319,4 +564,29 @@ class MultivariateTestResults:
         show_constant_C : bool
             Whether to show the constant_C
         """
-        pass
+        summ = summary2.Summary()
+        summ.add_title('Multivariate linear model')
+        for key in self.results:
+            summ.add_dict({'': ''})
+            df = self.results[key]['stat'].copy()
+            df = df.reset_index()
+            c = list(df.columns)
+            c[0] = key
+            df.columns = c
+            df.index = ['', '', '', '']
+            summ.add_df(df)
+            if show_contrast_L:
+                summ.add_dict({key: ' contrast L='})
+                df = pd.DataFrame(self.results[key]['contrast_L'],
+                                  columns=self.exog_names)
+                summ.add_df(df)
+            if show_transform_M:
+                summ.add_dict({key: ' transform M='})
+                df = pd.DataFrame(self.results[key]['transform_M'],
+                                  index=self.endog_names)
+                summ.add_df(df)
+            if show_constant_C:
+                summ.add_dict({key: ' constant C='})
+                df = pd.DataFrame(self.results[key]['constant_C'])
+                summ.add_df(df)
+        return summ
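The statistics table produced by the restored multivariate_stats helper can also be exercised directly; a standalone sketch in which the eigenvalues of inv(E + H) @ H and the rank/df arguments are made up for illustration:

    import numpy as np
    from statsmodels.multivariate.multivariate_ols import multivariate_stats

    eigv = np.array([0.45, 0.10])   # eigenvalues of inv(E + H) H (made up)
    table = multivariate_stats(eigv, r_err_sscp=2, r_contrast=2, df_resid=27)
    print(table)   # Wilks' lambda, Pillai's trace, Hotelling-Lawley, Roy + F tests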
diff --git a/statsmodels/multivariate/pca.py b/statsmodels/multivariate/pca.py
index a3c9024b5..bed5a838e 100644
--- a/statsmodels/multivariate/pca.py
+++ b/statsmodels/multivariate/pca.py
@@ -3,10 +3,22 @@
 Author: josef-pktd
 Modified by Kevin Sheppard
 """
+
 import numpy as np
 import pandas as pd
-from statsmodels.tools.sm_exceptions import ValueWarning, EstimationWarning
-from statsmodels.tools.validation import string_like, array_like, bool_like, float_like, int_like
+
+from statsmodels.tools.sm_exceptions import (ValueWarning,
+                                             EstimationWarning)
+from statsmodels.tools.validation import (string_like,
+                                          array_like,
+                                          bool_like,
+                                          float_like,
+                                          int_like,
+                                          )
+
+
+def _norm(x):
+    return np.sqrt(np.sum(x * x))


 class PCA:
@@ -184,28 +196,33 @@ class PCA:
     """

     def __init__(self, data, ncomp=None, standardize=True, demean=True,
-        normalize=True, gls=False, weights=None, method='svd', missing=None,
-        tol=5e-08, max_iter=1000, tol_em=5e-08, max_em_iter=100,
-        svd_full_matrices=False):
+                 normalize=True, gls=False, weights=None, method='svd',
+                 missing=None, tol=5e-8, max_iter=1000, tol_em=5e-8,
+                 max_em_iter=100, svd_full_matrices=False):
         self._index = None
         self._columns = []
         if isinstance(data, pd.DataFrame):
             self._index = data.index
             self._columns = data.columns
-        self.data = array_like(data, 'data', ndim=2)
-        self._gls = bool_like(gls, 'gls')
-        self._normalize = bool_like(normalize, 'normalize')
-        self._svd_full_matrices = bool_like(svd_full_matrices, 'svd_fm')
-        self._tol = float_like(tol, 'tol')
+
+        self.data = array_like(data, "data", ndim=2)
+        # Store inputs
+        self._gls = bool_like(gls, "gls")
+        self._normalize = bool_like(normalize, "normalize")
+        self._svd_full_matrices = bool_like(svd_full_matrices, "svd_fm")
+        self._tol = float_like(tol, "tol")
         if not 0 < self._tol < 1:
             raise ValueError('tol must be strictly between 0 and 1')
-        self._max_iter = int_like(max_iter, 'int_like')
-        self._max_em_iter = int_like(max_em_iter, 'max_em_iter')
-        self._tol_em = float_like(tol_em, 'tol_em')
-        self._standardize = bool_like(standardize, 'standardize')
-        self._demean = bool_like(demean, 'demean')
+        self._max_iter = int_like(max_iter, "int_like")
+        self._max_em_iter = int_like(max_em_iter, "max_em_iter")
+        self._tol_em = float_like(tol_em, "tol_em")
+
+        # Prepare data
+        self._standardize = bool_like(standardize, "standardize")
+        self._demean = bool_like(demean, "demean")
+
         self._nobs, self._nvar = self.data.shape
-        weights = array_like(weights, 'weights', maxdim=1, optional=True)
+        weights = array_like(weights, "weights", maxdim=1, optional=True)
         if weights is None:
             weights = np.ones(self._nvar)
         else:
@@ -214,32 +231,44 @@ class PCA:
                 raise ValueError('weights should have nvar elements')
             weights = weights / np.sqrt((weights ** 2.0).mean())
         self.weights = weights
+
+        # Check ncomp against maximum
         min_dim = min(self._nobs, self._nvar)
         self._ncomp = min_dim if ncomp is None else ncomp
         if self._ncomp > min_dim:
             import warnings
-            warn = (
-                'The requested number of components is more than can be computed from data. The maximum number of components is the minimum of the number of observations or variables'
-                )
+
+            warn = 'The requested number of components is more than can be ' \
+                   'computed from data. The maximum number of components is ' \
+                   'the minimum of the number of observations or variables'
             warnings.warn(warn, ValueWarning)
             self._ncomp = min_dim
+
         self._method = method
+        # Workaround to avoid instance methods in __dict__
         if self._method not in ('eig', 'svd', 'nipals'):
             raise ValueError('method {0} is not known.'.format(method))
         if self._method == 'svd':
             self._svd_full_matrices = True
+
         self.rows = np.arange(self._nobs)
         self.cols = np.arange(self._nvar)
-        self._missing = string_like(missing, 'missing', optional=True)
+        # Handle missing
+        self._missing = string_like(missing, "missing", optional=True)
         self._adjusted_data = self.data
         self._adjust_missing()
+
+        # Update size
         self._nobs, self._nvar = self._adjusted_data.shape
         if self._ncomp == np.min(self.data.shape):
             self._ncomp = np.min(self._adjusted_data.shape)
         elif self._ncomp > np.min(self._adjusted_data.shape):
-            raise ValueError(
-                'When adjusting for missing values, user provided ncomp must be no larger than the smallest dimension of the missing-value-adjusted data size.'
-                )
+            raise ValueError('When adjusting for missing values, user '
+                             'provided ncomp must be no larger than the '
+                             'smallest dimension of the '
+                             'missing-value-adjusted data size.')
+
+        # Attributes and internal values
         self._tss = 0.0
         self._ess = None
         self.transformed_data = None
@@ -255,12 +284,17 @@ class PCA:
         self.projection = None
         self.rsquare = None
         self.ic = None
+
+        # Prepare data
         self.transformed_data = self._prepare_data()
+        # Perform the PCA
         self._pca()
         if gls:
             self._compute_gls_weights()
             self.transformed_data = self._prepare_data()
             self._pca()
+
+        # Final calculations
         self._compute_rsquare_and_ic()
         if self._index is not None:
             self._to_pandas()
@@ -269,19 +303,87 @@ class PCA:
         """
         Implements alternatives for handling missing values
         """
-        pass
+
+        def keep_col(x):
+            index = np.logical_not(np.any(np.isnan(x), 0))
+            return x[:, index], index
+
+        def keep_row(x):
+            index = np.logical_not(np.any(np.isnan(x), 1))
+            return x[index, :], index
+
+        if self._missing == 'drop-col':
+            self._adjusted_data, index = keep_col(self.data)
+            self.cols = np.where(index)[0]
+            self.weights = self.weights[index]
+        elif self._missing == 'drop-row':
+            self._adjusted_data, index = keep_row(self.data)
+            self.rows = np.where(index)[0]
+        elif self._missing == 'drop-min':
+            drop_col, drop_col_index = keep_col(self.data)
+            drop_col_size = drop_col.size
+
+            drop_row, drop_row_index = keep_row(self.data)
+            drop_row_size = drop_row.size
+
+            if drop_row_size > drop_col_size:
+                self._adjusted_data = drop_row
+                self.rows = np.where(drop_row_index)[0]
+            else:
+                self._adjusted_data = drop_col
+                self.weights = self.weights[drop_col_index]
+                self.cols = np.where(drop_col_index)[0]
+        elif self._missing == 'fill-em':
+            self._adjusted_data = self._fill_missing_em()
+        elif self._missing is None:
+            if not np.isfinite(self._adjusted_data).all():
+                raise ValueError("""\
+data contains non-finite values (inf, NaN). You should drop these values or
+use one of the methods for adjusting data for missing-values.""")
+        else:
+            raise ValueError('missing method is not known.')
+
+        if self._index is not None:
+            self._columns = self._columns[self.cols]
+            self._index = self._index[self.rows]
+
+        # Check adjusted data size
+        if self._adjusted_data.size == 0:
+            raise ValueError('Removal of missing values has eliminated '
+                             'all data.')

     def _compute_gls_weights(self):
         """
         Computes GLS weights based on percentage of data fit
         """
-        pass
+        projection = np.asarray(self.project(transform=False))
+        errors = self.transformed_data - projection
+        if self._ncomp == self._nvar:
+            raise ValueError('gls can only be used when ncomp < nvar '
+                             'so that residuals have non-zero variance')
+        var = (errors ** 2.0).mean(0)
+        weights = 1.0 / var
+        weights = weights / np.sqrt((weights ** 2.0).mean())
+        nvar = self._nvar
+        eff_series_perc = (1.0 / sum((weights / weights.sum()) ** 2.0)) / nvar
+        if eff_series_perc < 0.1:
+            eff_series = int(np.round(eff_series_perc * nvar))
+            import warnings
+
+            warn = f"""\
+Many series are being down weighted by GLS. Of the {nvar} series, the GLS
+estimates are based on only {eff_series} (effective) series."""
+            warnings.warn(warn, EstimationWarning)
+
+        self.weights = weights

     def _pca(self):
         """
         Main PCA routine
         """
-        pass
+        self._compute_eig()
+        self._compute_pca_from_eig()
+        self.projection = self.project()

     def __repr__(self):
         string = self.__str__()
@@ -312,7 +414,19 @@ class PCA:
         """
         Standardize or demean data.
         """
-        pass
+        adj_data = self._adjusted_data
+        if np.all(np.isnan(adj_data)):
+            return np.full(adj_data.shape[1], np.nan)
+
+        self._mu = np.nanmean(adj_data, axis=0)
+        self._sigma = np.sqrt(np.nanmean((adj_data - self._mu) ** 2.0, axis=0))
+        if self._standardize:
+            data = (adj_data - self._mu) / self._sigma
+        elif self._demean:
+            data = (adj_data - self._mu)
+        else:
+            data = adj_data
+        return data / np.sqrt(self.weights)

     def _compute_eig(self):
         """
@@ -320,42 +434,192 @@ class PCA:

         This is a workaround to avoid instance methods in __dict__
         """
-        pass
+        if self._method == 'eig':
+            return self._compute_using_eig()
+        elif self._method == 'svd':
+            return self._compute_using_svd()
+        else:  # self._method == 'nipals'
+            return self._compute_using_nipals()

     def _compute_using_svd(self):
         """SVD method to compute eigenvalues and eigenvecs"""
-        pass
+        x = self.transformed_data
+        u, s, v = np.linalg.svd(x, full_matrices=self._svd_full_matrices)
+        self.eigenvals = s ** 2.0
+        self.eigenvecs = v.T

     def _compute_using_eig(self):
         """
         Eigenvalue decomposition method to compute eigenvalues and eigenvectors
         """
-        pass
+        x = self.transformed_data
+        self.eigenvals, self.eigenvecs = np.linalg.eigh(x.T.dot(x))

     def _compute_using_nipals(self):
         """
         NIPALS implementation to compute small number of eigenvalues
         and eigenvectors
         """
-        pass
+        x = self.transformed_data
+        if self._ncomp > 1:
+            x = x + 0.0  # Copy
+
+        tol, max_iter, ncomp = self._tol, self._max_iter, self._ncomp
+        vals = np.zeros(self._ncomp)
+        vecs = np.zeros((self._nvar, self._ncomp))
+        for i in range(ncomp):
+            max_var_ind = np.argmax(x.var(0))
+            factor = x[:, [max_var_ind]]
+            _iter = 0
+            diff = 1.0
+            while diff > tol and _iter < max_iter:
+                vec = x.T.dot(factor) / (factor.T.dot(factor))
+                vec = vec / np.sqrt(vec.T.dot(vec))
+                factor_last = factor
+                factor = x.dot(vec) / (vec.T.dot(vec))
+                diff = _norm(factor - factor_last) / _norm(factor)
+                _iter += 1
+            vals[i] = (factor ** 2).sum()
+            vecs[:, [i]] = vec
+            if ncomp > 1:
+                x -= factor.dot(vec.T)
+
+        self.eigenvals = vals
+        self.eigenvecs = vecs

     def _fill_missing_em(self):
         """
         EM algorithm to fill missing values
         """
-        pass
+        non_missing = np.logical_not(np.isnan(self.data))
+
+        # If nothing missing, return without altering the data
+        if np.all(non_missing):
+            return self.data
+
+        # 1. Standardize data as needed
+        data = self.transformed_data = np.asarray(self._prepare_data())
+
+        ncomp = self._ncomp
+
+        # 2. Check for all nans
+        col_non_missing = np.sum(non_missing, 1)
+        row_non_missing = np.sum(non_missing, 0)
+        if np.any(col_non_missing < ncomp) or np.any(row_non_missing < ncomp):
+            raise ValueError('Implementation requires that all columns and '
+                             'all rows have at least ncomp non-missing values')
+        # 3. Get mask
+        mask = np.isnan(data)
+
+        # 4. Compute mean
+        mu = np.nanmean(data, 0)
+
+        # 5. Replace missing with mean
+        projection = np.ones((self._nobs, 1)) * mu
+        projection_masked = projection[mask]
+        data[mask] = projection_masked
+
+        # 6. Compute eigenvalues and fit
+        diff = 1.0
+        _iter = 0
+        while diff > self._tol_em and _iter < self._max_em_iter:
+            last_projection_masked = projection_masked
+            # Set transformed data to compute eigenvalues
+            self.transformed_data = data
+            # Call correct eig function here
+            self._compute_eig()
+            # Call function to compute factors and projection
+            self._compute_pca_from_eig()
+            projection = np.asarray(self.project(transform=False,
+                                                 unweight=False))
+            projection_masked = projection[mask]
+            data[mask] = projection_masked
+            delta = last_projection_masked - projection_masked
+            diff = _norm(delta) / _norm(projection_masked)
+            _iter += 1
+        # Must copy to avoid overwriting original data since replacing values
+        data = self._adjusted_data + 0.0
+        projection = np.asarray(self.project())
+        data[mask] = projection[mask]
+
+        return data

     def _compute_pca_from_eig(self):
         """
         Compute relevant statistics after eigenvalues have been computed
         """
-        pass
+        # Ensure sorted largest to smallest
+        vals, vecs = self.eigenvals, self.eigenvecs
+        indices = np.argsort(vals)
+        indices = indices[::-1]
+        vals = vals[indices]
+        vecs = vecs[:, indices]
+        if (vals <= 0).any():
+            # Discard and warn
+            num_good = vals.shape[0] - (vals <= 0).sum()
+            if num_good < self._ncomp:
+                import warnings
+
+                warnings.warn('Only {num:d} eigenvalues are positive.  '
+                              'This is the maximum number of components '
+                              'that can be extracted.'.format(num=num_good),
+                              EstimationWarning)
+
+                self._ncomp = num_good
+                vals[num_good:] = np.finfo(np.float64).tiny
+        # Use ncomp for the remaining calculations
+        vals = vals[:self._ncomp]
+        vecs = vecs[:, :self._ncomp]
+        self.eigenvals, self.eigenvecs = vals, vecs
+        # Select correct number of components to return
+        self.scores = self.factors = self.transformed_data.dot(vecs)
+        self.loadings = vecs
+        self.coeff = vecs.T
+        if self._normalize:
+            self.coeff = (self.coeff.T * np.sqrt(vals)).T
+            self.factors /= np.sqrt(vals)
+            self.scores = self.factors

     def _compute_rsquare_and_ic(self):
         """
         Final statistics to compute
         """
-        pass
+        # TSS and related calculations
+        # TODO: This needs careful testing, with and without weights,
+        #   gls, standardized and demean
+        weights = self.weights
+        ss_data = self.transformed_data * np.sqrt(weights)
+        self._tss_indiv = np.sum(ss_data ** 2, 0)
+        self._tss = np.sum(self._tss_indiv)
+        self._ess = np.zeros(self._ncomp + 1)
+        self._ess_indiv = np.zeros((self._ncomp + 1, self._nvar))
+        for i in range(self._ncomp + 1):
+            # Projection in the same space as transformed_data
+            projection = self.project(ncomp=i, transform=False, unweight=False)
+            indiv_rss = (projection ** 2).sum(axis=0)
+            rss = indiv_rss.sum()
+            self._ess[i] = self._tss - rss
+            self._ess_indiv[i, :] = self._tss_indiv - indiv_rss
+        self.rsquare = 1.0 - self._ess / self._tss
+        # Information Criteria
+        ess = self._ess
+        invalid = ess <= 0  # Prevent log issues of 0
+        if invalid.any():
+            last_obs = (np.where(invalid)[0]).min()
+            ess = ess[:last_obs]
+
+        log_ess = np.log(ess)
+        r = np.arange(ess.shape[0])
+
+        nobs, nvar = self._nobs, self._nvar
+        sum_to_prod = (nobs + nvar) / (nobs * nvar)
+        min_dim = min(nobs, nvar)
+        penalties = np.array([sum_to_prod * np.log(1.0 / sum_to_prod),
+                              sum_to_prod * np.log(min_dim),
+                              np.log(min_dim) / min_dim])
+        penalties = penalties[:, None]
+        ic = log_ess + r * penalties
+        self.ic = ic.T

     def project(self, ncomp=None, transform=True, unweight=True):
         """
@@ -382,16 +646,70 @@ class PCA:
         Notes
         -----
         """
-        pass
+        # Projection needs to be scaled/shifted based on inputs
+        ncomp = self._ncomp if ncomp is None else ncomp
+        if ncomp > self._ncomp:
+            raise ValueError('ncomp must be smaller than the number of '
+                             'components computed.')
+        factors = np.asarray(self.factors)
+        coeff = np.asarray(self.coeff)
+
+        projection = factors[:, :ncomp].dot(coeff[:ncomp, :])
+        if transform or unweight:
+            projection *= np.sqrt(self.weights)
+        if transform:
+            # Remove the weights, which do not depend on transformation
+            if self._standardize:
+                projection *= self._sigma
+            if self._standardize or self._demean:
+                projection += self._mu
+        if self._index is not None:
+            projection = pd.DataFrame(projection,
+                                      columns=self._columns,
+                                      index=self._index)
+        return projection

     def _to_pandas(self):
         """
         Returns pandas DataFrames for all values
         """
-        pass
-
-    def plot_scree(self, ncomp=None, log_scale=True, cumulative=False, ax=None
-        ):
+        index = self._index
+        # Principal Components
+        num_zeros = np.ceil(np.log10(self._ncomp))
+        comp_str = 'comp_{0:0' + str(int(num_zeros)) + 'd}'
+        cols = [comp_str.format(i) for i in range(self._ncomp)]
+        df = pd.DataFrame(self.factors, columns=cols, index=index)
+        self.scores = self.factors = df
+        # Projections
+        df = pd.DataFrame(self.projection,
+                          columns=self._columns,
+                          index=index)
+        self.projection = df
+        # Weights
+        df = pd.DataFrame(self.coeff, index=cols,
+                          columns=self._columns)
+        self.coeff = df
+        # Loadings
+        df = pd.DataFrame(self.loadings,
+                          index=self._columns, columns=cols)
+        self.loadings = df
+        # eigenvals
+        self.eigenvals = pd.Series(self.eigenvals)
+        self.eigenvals.name = 'eigenvals'
+        # eigenvecs
+        vec_str = comp_str.replace('comp', 'eigenvec')
+        cols = [vec_str.format(i) for i in range(self.eigenvecs.shape[1])]
+        self.eigenvecs = pd.DataFrame(self.eigenvecs, columns=cols)
+        # R2
+        self.rsquare = pd.Series(self.rsquare)
+        self.rsquare.index.name = 'ncomp'
+        self.rsquare.name = 'rsquare'
+        # IC
+        self.ic = pd.DataFrame(self.ic, columns=['IC_p1', 'IC_p2', 'IC_p3'])
+        self.ic.index.name = 'ncomp'
+
+    def plot_scree(self, ncomp=None, log_scale=True,
+                   cumulative=False, ax=None):
         """
         Plot of the ordered eigenvalues

@@ -414,7 +732,41 @@ class PCA:
         matplotlib.figure.Figure
             The handle to the figure.
         """
-        pass
+        import statsmodels.graphics.utils as gutils
+
+        fig, ax = gutils.create_mpl_ax(ax)
+
+        ncomp = self._ncomp if ncomp is None else ncomp
+        vals = np.asarray(self.eigenvals)
+        vals = vals[:self._ncomp]
+        if cumulative:
+            vals = np.cumsum(vals)
+
+        if log_scale:
+            ax.set_yscale('log')
+        ax.plot(np.arange(ncomp), vals[: ncomp], 'bo')
+        ax.autoscale(tight=True)
+        xlim = np.array(ax.get_xlim())
+        sp = xlim[1] - xlim[0]
+        xlim += 0.02 * np.array([-sp, sp])
+        ax.set_xlim(xlim)
+
+        ylim = np.array(ax.get_ylim())
+        scale = 0.02
+        if log_scale:
+            sp = np.log(ylim[1] / ylim[0])
+            ylim = np.exp(np.array([np.log(ylim[0]) - scale * sp,
+                                    np.log(ylim[1]) + scale * sp]))
+        else:
+            sp = ylim[1] - ylim[0]
+            ylim += scale * np.array([-sp, sp])
+        ax.set_ylim(ylim)
+        ax.set_title('Scree Plot')
+        ax.set_ylabel('Eigenvalue')
+        ax.set_xlabel('Component Number')
+        fig.tight_layout()
+
+        return fig

     def plot_rsquare(self, ncomp=None, ax=None):
         """
@@ -434,11 +786,26 @@ class PCA:
         matplotlib.figure.Figure
             The handle to the figure.
         """
-        pass
+        import statsmodels.graphics.utils as gutils
+
+        fig, ax = gutils.create_mpl_ax(ax)
+
+        ncomp = 10 if ncomp is None else ncomp
+        ncomp = min(ncomp, self._ncomp)
+        # R2s in rows, series in columns
+        r2s = 1.0 - self._ess_indiv / self._tss_indiv
+        r2s = r2s[1:]
+        r2s = r2s[:ncomp]
+        ax.boxplot(r2s.T)
+        ax.set_title('Individual Input $R^2$')
+        ax.set_ylabel('$R^2$')
+        ax.set_xlabel('Number of Included Principal Components')
+
+        return fig


 def pca(data, ncomp=None, standardize=True, demean=True, normalize=True,
-    gls=False, weights=None, method='svd'):
+        gls=False, weights=None, method='svd'):
     """
     Perform Principal Component Analysis (PCA).

@@ -499,4 +866,8 @@ def pca(data, ncomp=None, standardize=True, demean=True, normalize=True,
     This is a simple function wrapper around the PCA class. See PCA for
     more information and additional methods.
     """
-    pass
+    pc = PCA(data, ncomp=ncomp, standardize=standardize, demean=demean,
+             normalize=normalize, gls=gls, weights=weights, method=method)
+
+    return (pc.factors, pc.loadings, pc.projection, pc.rsquare, pc.ic,
+            pc.eigenvals, pc.eigenvecs)
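A hedged sketch of the restored PCA class and the pca() convenience wrapper on synthetic data (the shapes and options are chosen only for illustration):

    import numpy as np
    from statsmodels.multivariate.pca import PCA, pca

    rng = np.random.default_rng(0)
    x = rng.standard_normal((100, 5))
    pc = PCA(x, ncomp=2, standardize=True, method='svd')
    print(pc.factors.shape)   # (100, 2) component scores
    print(pc.rsquare)         # R^2 when keeping 0, 1 or 2 components

    # the functional wrapper returns the same pieces as a tuple
    factors, loadings, projection, rsq, ic, evals, evecs = pca(x, ncomp=2)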
diff --git a/statsmodels/multivariate/plots.py b/statsmodels/multivariate/plots.py
index 71686511b..b47ea0e84 100644
--- a/statsmodels/multivariate/plots.py
+++ b/statsmodels/multivariate/plots.py
@@ -23,11 +23,59 @@ def plot_scree(eigenvals, total_var, ncomp=None, x_label='factor'):
     Figure
         Handle to the figure.
     """
-    pass
+    fig = plt.figure()
+    ncomp = len(eigenvals) if ncomp is None else ncomp
+    vals = eigenvals
+    vals = vals[:ncomp]
+    #    vals = np.cumsum(vals)

+    ax = fig.add_subplot(121)
+    ax.plot(np.arange(ncomp), vals[: ncomp], 'b-o')
+    ax.autoscale(tight=True)
+    xlim = np.array(ax.get_xlim())
+    sp = xlim[1] - xlim[0]
+    xlim += 0.02 * np.array([-sp, sp])
+    ax.set_xticks(np.arange(ncomp))
+    ax.set_xlim(xlim)

-def plot_loadings(loadings, col_names=None, row_names=None, loading_pairs=
-    None, percent_variance=None, title='Factor patterns'):
+    ylim = np.array(ax.get_ylim())
+    scale = 0.02
+    sp = ylim[1] - ylim[0]
+    ylim += scale * np.array([-sp, sp])
+    ax.set_ylim(ylim)
+    ax.set_title('Scree Plot')
+    ax.set_ylabel('Eigenvalue')
+    ax.set_xlabel(x_label)
+
+    per_variance = vals / total_var
+    cumper_variance = np.cumsum(per_variance)
+    ax = fig.add_subplot(122)
+
+    ax.plot(np.arange(ncomp), per_variance[: ncomp], 'b-o')
+    ax.plot(np.arange(ncomp), cumper_variance[: ncomp], 'g--o')
+    ax.autoscale(tight=True)
+    xlim = np.array(ax.get_xlim())
+    sp = xlim[1] - xlim[0]
+    xlim += 0.02 * np.array([-sp, sp])
+    ax.set_xticks(np.arange(ncomp))
+    ax.set_xlim(xlim)
+
+    ylim = np.array(ax.get_ylim())
+    scale = 0.02
+    sp = ylim[1] - ylim[0]
+    ylim += scale * np.array([-sp, sp])
+    ax.set_ylim(ylim)
+    ax.set_title('Variance Explained')
+    ax.set_ylabel('Proportion')
+    ax.set_xlabel(x_label)
+    ax.legend(['Proportion', 'Cumulative'], loc=5)
+    fig.tight_layout()
+    return fig
+
+
+def plot_loadings(loadings, col_names=None, row_names=None,
+                  loading_pairs=None, percent_variance=None,
+                  title='Factor patterns'):
     """
     Plot factor loadings in 2-d plots

@@ -50,4 +98,43 @@ def plot_loadings(loadings, col_names=None, row_names=None, loading_pairs=
     -------
     figs : a list of figure handles
     """
-    pass
+    k_var, n_factor = loadings.shape
+    if loading_pairs is None:
+        loading_pairs = []
+        for i in range(n_factor):
+            for j in range(i + 1, n_factor):
+                loading_pairs.append([i, j])
+    if col_names is None:
+        col_names = ["factor %d" % i for i in range(n_factor)]
+    if row_names is None:
+        row_names = ["var %d" % i for i in range(k_var)]
+    figs = []
+    for item in loading_pairs:
+        i = item[0]
+        j = item[1]
+        fig = plt.figure(figsize=(7, 7))
+        figs.append(fig)
+        ax = fig.add_subplot(111)
+        for k in range(loadings.shape[0]):
+            plt.text(loadings[k, i], loadings[k, j],
+                     row_names[k], fontsize=12)
+        ax.plot(loadings[:, i], loadings[:, j], 'bo')
+        ax.set_title(title)
+        if percent_variance is not None:
+            x_str = '%s (%.1f%%)' % (col_names[i], percent_variance[i])
+            y_str = '%s (%.1f%%)' % (col_names[j], percent_variance[j])
+            ax.set_xlabel(x_str)
+            ax.set_ylabel(y_str)
+        else:
+            ax.set_xlabel(col_names[i])
+            ax.set_ylabel(col_names[j])
+        v = 1.05
+        xlim = np.array([-v, v])
+        ylim = np.array([-v, v])
+        ax.plot(xlim, [0, 0], 'k--')
+        ax.plot([0, 0], ylim, 'k--')
+        ax.set_aspect('equal', 'datalim')
+        ax.set_xlim(xlim)
+        ax.set_ylim(ylim)
+        fig.tight_layout()
+    return figs
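A small sketch of the two restored plotting helpers on synthetic eigenvalues and loadings (requires matplotlib; the numbers are illustrative only):

    import numpy as np
    from statsmodels.multivariate.plots import plot_scree, plot_loadings

    rng = np.random.default_rng(0)
    eigenvals = np.array([3.2, 1.1, 0.5, 0.2])
    fig = plot_scree(eigenvals, total_var=eigenvals.sum(), ncomp=3)

    loadings = rng.uniform(-1, 1, size=(6, 2))   # 6 variables, 2 factors
    figs = plot_loadings(loadings, title='Factor patterns')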
diff --git a/statsmodels/nonparametric/_kernel_base.py b/statsmodels/nonparametric/_kernel_base.py
index 2a28747ad..1526048ff 100644
--- a/statsmodels/nonparametric/_kernel_base.py
+++ b/statsmodels/nonparametric/_kernel_base.py
@@ -3,32 +3,47 @@ Module containing the base object for multivariate kernel density and
 regression, plus some utilities.
 """
 import copy
+
 import numpy as np
 from scipy import optimize
 from scipy.stats.mstats import mquantiles
+
 try:
     import joblib
     has_joblib = True
 except ImportError:
     has_joblib = False
+
 from . import kernels
-kernel_func = dict(wangryzin=kernels.wang_ryzin, aitchisonaitken=kernels.
-    aitchison_aitken, gaussian=kernels.gaussian, aitchison_aitken_reg=
-    kernels.aitchison_aitken_reg, wangryzin_reg=kernels.wang_ryzin_reg,
-    gauss_convolution=kernels.gaussian_convolution, wangryzin_convolution=
-    kernels.wang_ryzin_convolution, aitchisonaitken_convolution=kernels.
-    aitchison_aitken_convolution, gaussian_cdf=kernels.gaussian_cdf,
-    aitchisonaitken_cdf=kernels.aitchison_aitken_cdf, wangryzin_cdf=kernels
-    .wang_ryzin_cdf, d_gaussian=kernels.d_gaussian, tricube=kernels.tricube)
+
+
+kernel_func = dict(wangryzin=kernels.wang_ryzin,
+                   aitchisonaitken=kernels.aitchison_aitken,
+                   gaussian=kernels.gaussian,
+                   aitchison_aitken_reg=kernels.aitchison_aitken_reg,
+                   wangryzin_reg=kernels.wang_ryzin_reg,
+                   gauss_convolution=kernels.gaussian_convolution,
+                   wangryzin_convolution=kernels.wang_ryzin_convolution,
+                   aitchisonaitken_convolution=kernels.aitchison_aitken_convolution,
+                   gaussian_cdf=kernels.gaussian_cdf,
+                   aitchisonaitken_cdf=kernels.aitchison_aitken_cdf,
+                   wangryzin_cdf=kernels.wang_ryzin_cdf,
+                   d_gaussian=kernels.d_gaussian,
+                   tricube=kernels.tricube)


 def _compute_min_std_IQR(data):
     """Compute minimum of std and IQR for each variable."""
-    pass
+    s1 = np.std(data, axis=0)
+    q75 = mquantiles(data, 0.75, axis=0).data[0]
+    q25 = mquantiles(data, 0.25, axis=0).data[0]
+    s2 = (q75 - q25) / 1.349  # IQR
+    dispersion = np.minimum(s1, s2)
+    return dispersion


-def _compute_subset(class_type, data, bw, co, do, n_cvars, ix_ord, ix_unord,
-    n_sub, class_vars, randomize, bound):
+def _compute_subset(class_type, data, bw, co, do, n_cvars, ix_ord,
+                    ix_unord, n_sub, class_vars, randomize, bound):
     """"Compute bw on subset of data.

     Called from ``GenericKDE._compute_efficient_*``.
@@ -37,14 +52,54 @@ def _compute_subset(class_type, data, bw, co, do, n_cvars, ix_ord, ix_unord,
     -----
     Needs to be outside the class in order for joblib to be able to pickle it.
     """
-    pass
-
-
-class GenericKDE(object):
+    if randomize:
+        np.random.shuffle(data)
+        sub_data = data[:n_sub, :]
+    else:
+        sub_data = data[bound[0]:bound[1], :]
+
+    if class_type == 'KDEMultivariate':
+        from .kernel_density import KDEMultivariate
+        var_type = class_vars[0]
+        sub_model = KDEMultivariate(sub_data, var_type, bw=bw,
+                        defaults=EstimatorSettings(efficient=False))
+    elif class_type == 'KDEMultivariateConditional':
+        from .kernel_density import KDEMultivariateConditional
+        k_dep, dep_type, indep_type = class_vars
+        endog = sub_data[:, :k_dep]
+        exog = sub_data[:, k_dep:]
+        sub_model = KDEMultivariateConditional(endog, exog, dep_type,
+            indep_type, bw=bw, defaults=EstimatorSettings(efficient=False))
+    elif class_type == 'KernelReg':
+        from .kernel_regression import KernelReg
+        var_type, k_vars, reg_type = class_vars
+        endog = _adjust_shape(sub_data[:, 0], 1)
+        exog = _adjust_shape(sub_data[:, 1:], k_vars)
+        sub_model = KernelReg(endog=endog, exog=exog, reg_type=reg_type,
+                              var_type=var_type, bw=bw,
+                              defaults=EstimatorSettings(efficient=False))
+    else:
+        raise ValueError("class_type not recognized, should be one of " \
+                 "{KDEMultivariate, KDEMultivariateConditional, KernelReg}")
+
+    # Compute dispersion in next 4 lines
+    if class_type == 'KernelReg':
+        sub_data = sub_data[:, 1:]
+
+    dispersion = _compute_min_std_IQR(sub_data)
+
+    fct = dispersion * n_sub**(-1. / (n_cvars + co))
+    fct[ix_unord] = n_sub**(-2. / (n_cvars + do))
+    fct[ix_ord] = n_sub**(-2. / (n_cvars + do))
+    sample_scale_sub = sub_model.bw / fct  #TODO: check if correct
+    bw_sub = sub_model.bw
+    return sample_scale_sub, bw_sub
+
+
+class GenericKDE (object):
     """
     Base class for density estimation and regression KDE classes.
     """
-
     def _compute_bw(self, bw):
         """
         Computes the bandwidth of the data.
@@ -63,7 +118,25 @@ class GenericKDE(object):
         -----
         The default value for bw is 'normal_reference'.
         """
-        pass
+        if bw is None:
+            bw = 'normal_reference'
+
+        if not isinstance(bw, str):
+            self._bw_method = "user-specified"
+            res = np.asarray(bw)
+        else:
+            # The user specified a bandwidth selection method
+            self._bw_method = bw
+            # Workaround to avoid instance methods in __dict__
+            if bw == 'normal_reference':
+                bwfunc = self._normal_reference
+            elif bw == 'cv_ml':
+                bwfunc = self._cv_ml
+            else:  # bw == 'cv_ls'
+                bwfunc = self._cv_ls
+            res = bwfunc()
+
+        return res

     def _compute_dispersion(self, data):
         """
@@ -82,7 +155,7 @@ class GenericKDE(object):
         In the notes on bwscaling option in npreg, npudens, npcdens there is
         a discussion on the measure of dispersion
         """
-        pass
+        return _compute_min_std_IQR(data)

     def _get_class_vars_type(self):
         """Helper method to be able to pass needed vars to _compute_subset.
@@ -101,11 +174,78 @@ class GenericKDE(object):
         ----------
         See p.9 in socserv.mcmaster.ca/racine/np_faq.pdf
         """
-        pass
+
+        if bw is None:
+            self._bw_method = 'normal_reference'
+        if isinstance(bw, str):
+            self._bw_method = bw
+        else:
+            self._bw_method = "user-specified"
+            return bw
+
+        nobs = self.nobs
+        n_sub = self.n_sub
+        data = copy.deepcopy(self.data)
+        n_cvars = self.data_type.count('c')
+        co = 4  # 2*order of continuous kernel
+        do = 4  # 2*order of discrete kernel
+        _, ix_ord, ix_unord = _get_type_pos(self.data_type)
+
+        # Define bounds for slicing the data
+        if self.randomize:
+            # randomize chooses blocks of size n_sub, independent of nobs
+            bounds = [None] * self.n_res
+        else:
+            bounds = [(i * n_sub, (i+1) * n_sub) for i in range(nobs // n_sub)]
+            if nobs % n_sub > 0:
+                bounds.append((nobs - nobs % n_sub, nobs))
+
+        n_blocks = self.n_res if self.randomize else len(bounds)
+        sample_scale = np.empty((n_blocks, self.k_vars))
+        only_bw = np.empty((n_blocks, self.k_vars))
+
+        class_type, class_vars = self._get_class_vars_type()
+        if has_joblib:
+            # `res` is a list of tuples (sample_scale_sub, bw_sub)
+            res = joblib.Parallel(n_jobs=self.n_jobs)(
+                joblib.delayed(_compute_subset)(
+                    class_type, data, bw, co, do, n_cvars, ix_ord, ix_unord, \
+                    n_sub, class_vars, self.randomize, bounds[i]) \
+                for i in range(n_blocks))
+        else:
+            res = []
+            for i in range(n_blocks):
+                res.append(_compute_subset(class_type, data, bw, co, do,
+                                           n_cvars, ix_ord, ix_unord, n_sub,
+                                           class_vars, self.randomize,
+                                           bounds[i]))
+
+        for i in range(n_blocks):
+            sample_scale[i, :] = res[i][0]
+            only_bw[i, :] = res[i][1]
+
+        s = self._compute_dispersion(data)
+        order_func = np.median if self.return_median else np.mean
+        m_scale = order_func(sample_scale, axis=0)
+        # TODO: Check if 1/5 is correct in line below!
+        bw = m_scale * s * nobs**(-1. / (n_cvars + co))
+        bw[ix_ord] = m_scale[ix_ord] * nobs**(-2./ (n_cvars + do))
+        bw[ix_unord] = m_scale[ix_unord] * nobs**(-2./ (n_cvars + do))
+
+        if self.return_only_bw:
+            bw = np.median(only_bw, axis=0)
+
+        return bw

     def _set_defaults(self, defaults):
         """Sets the default values for the efficient estimation"""
-        pass
+        self.n_res = defaults.n_res
+        self.n_sub = defaults.n_sub
+        self.randomize = defaults.randomize
+        self.return_median = defaults.return_median
+        self.efficient = defaults.efficient
+        self.return_only_bw = defaults.return_only_bw
+        self.n_jobs = defaults.n_jobs

     def _normal_reference(self):
         """
@@ -121,17 +261,23 @@ class GenericKDE(object):
         where ``n`` is the number of observations and ``q`` is the number of
         variables.
         """
-        pass
+        X = np.std(self.data, axis=0)
+        return 1.06 * X * self.nobs ** (- 1. / (4 + self.data.shape[1]))

     def _set_bw_bounds(self, bw):
         """
         Sets bandwidth lower bound to effectively zero (1e-10), and for
         discrete values upper bound to 1.
         """
-        pass
+        bw[bw < 0] = 1e-10
+        _, ix_ord, ix_unord = _get_type_pos(self.data_type)
+        bw[ix_ord] = np.minimum(bw[ix_ord], 1.)
+        bw[ix_unord] = np.minimum(bw[ix_unord], 1.)
+
+        return bw

     def _cv_ml(self):
-        """
+        r"""
         Returns the cross validation maximum likelihood bandwidth parameter.

         Notes
@@ -141,23 +287,28 @@ class GenericKDE(object):
         Returns the bandwidth estimate that maximizes the leave-one-out
         likelihood.  The leave-one-out log likelihood function is:

-        .. math:: \\ln L=\\sum_{i=1}^{n}\\ln f_{-i}(X_{i})
+        .. math:: \ln L=\sum_{i=1}^{n}\ln f_{-i}(X_{i})

         The leave-one-out kernel estimator of :math:`f_{-i}` is:

-        .. math:: f_{-i}(X_{i})=\\frac{1}{(n-1)h}
-                        \\sum_{j=1,j\\neq i}K_{h}(X_{i},X_{j})
+        .. math:: f_{-i}(X_{i})=\frac{1}{(n-1)h}
+                        \sum_{j=1,j\neq i}K_{h}(X_{i},X_{j})

         where :math:`K_{h}` represents the Generalized product kernel
         estimator:

-        .. math:: K_{h}(X_{i},X_{j})=\\prod_{s=1}^
-                        {q}h_{s}^{-1}k\\left(\\frac{X_{is}-X_{js}}{h_{s}}\\right)
+        .. math:: K_{h}(X_{i},X_{j})=\prod_{s=1}^
+                        {q}h_{s}^{-1}k\left(\frac{X_{is}-X_{js}}{h_{s}}\right)
         """
-        pass
+        # the initial value for the optimization is the normal_reference
+        h0 = self._normal_reference()
+        bw = optimize.fmin(self.loo_likelihood, x0=h0, args=(np.log, ),
+                           maxiter=1e3, maxfun=1e3, disp=0, xtol=1e-3)
+        bw = self._set_bw_bounds(bw)  # bound bw if necessary
+        return bw

     def _cv_ls(self):
-        """
+        r"""
         Returns the cross-validation least squares bandwidth parameter(s).

         Notes
@@ -168,13 +319,20 @@ class GenericKDE(object):
         square error between the estimated and actual distribution.  The
         integrated mean square error (IMSE) is given by:

-        .. math:: \\int\\left[\\hat{f}(x)-f(x)\\right]^{2}dx
+        .. math:: \int\left[\hat{f}(x)-f(x)\right]^{2}dx

         This is the general formula for the IMSE.  The IMSE differs for
         conditional (``KDEMultivariateConditional``) and unconditional
         (``KDEMultivariate``) kernel density estimation.
         """
-        pass
+        h0 = self._normal_reference()
+        bw = optimize.fmin(self.imse, x0=h0, maxiter=1e3, maxfun=1e3, disp=0,
+                           xtol=1e-3)
+        bw = self._set_bw_bounds(bw)  # bound bw if necessary
+        return bw
+
+    def loo_likelihood(self):
+        raise NotImplementedError


 class EstimatorSettings:
@@ -224,15 +382,14 @@ class EstimatorSettings:
     >>> settings = EstimatorSettings(randomize=True, n_jobs=3)
     >>> k_dens = KDEMultivariate(data, var_type, defaults=settings)
     """
-
     def __init__(self, efficient=False, randomize=False, n_res=25, n_sub=50,
-        return_median=True, return_only_bw=False, n_jobs=-1):
+                 return_median=True, return_only_bw=False, n_jobs=-1):
         self.efficient = efficient
         self.randomize = randomize
         self.n_res = n_res
         self.n_sub = n_sub
         self.return_median = return_median
-        self.return_only_bw = return_only_bw
+        self.return_only_bw = return_only_bw  # TODO: remove this?
         self.n_jobs = n_jobs


@@ -257,27 +414,48 @@ class LeaveOneOut:
     A little lighter weight than sklearn LOO. We do not need test index.
     Also passes views on X, not the index.
     """
-
     def __init__(self, X):
         self.X = np.asarray(X)

     def __iter__(self):
         X = self.X
         nobs, k_vars = np.shape(X)
+
         for i in range(nobs):
             index = np.ones(nobs, dtype=bool)
             index[i] = False
             yield X[index, :]


-def _adjust_shape(dat, k_vars):
-    """ Returns an array of shape (nobs, k_vars) for use with `gpke`."""
-    pass
+def _get_type_pos(var_type):
+    ix_cont = np.array([c == 'c' for c in var_type])
+    ix_ord = np.array([c == 'o' for c in var_type])
+    ix_unord = np.array([c == 'u' for c in var_type])
+    return ix_cont, ix_ord, ix_unord


-def gpke(bw, data, data_predict, var_type, ckertype='gaussian', okertype=
-    'wangryzin', ukertype='aitchisonaitken', tosum=True):
-    """
+def _adjust_shape(dat, k_vars):
+    """ Returns an array of shape (nobs, k_vars) for use with `gpke`."""
+    dat = np.asarray(dat)
+    if dat.ndim > 2:
+        dat = np.squeeze(dat)
+    if dat.ndim == 1 and k_vars > 1:  # one obs many vars
+        nobs = 1
+    elif dat.ndim == 1 and k_vars == 1:  # one obs one var
+        nobs = len(dat)
+    else:
+        if np.shape(dat)[0] == k_vars and np.shape(dat)[1] != k_vars:
+            dat = dat.T
+
+        nobs = np.shape(dat)[0]  # ndim >1 so many obs many vars
+
+    dat = np.reshape(dat, (nobs, k_vars))
+    return dat
+
+
+def gpke(bw, data, data_predict, var_type, ckertype='gaussian',
+         okertype='wangryzin', ukertype='aitchisonaitken', tosum=True):
+    r"""
     Returns the non-normalized Generalized Product Kernel Estimator

     Parameters
@@ -309,14 +487,32 @@ def gpke(bw, data, data_predict, var_type, ckertype='gaussian', okertype=
     -----
     The formula for the multivariate kernel estimator for the pdf is:

-    .. math:: f(x)=\\frac{1}{nh_{1}...h_{q}}\\sum_{i=1}^
-                        {n}K\\left(\\frac{X_{i}-x}{h}\\right)
+    .. math:: f(x)=\frac{1}{nh_{1}...h_{q}}\sum_{i=1}^
+                        {n}K\left(\frac{X_{i}-x}{h}\right)

     where

-    .. math:: K\\left(\\frac{X_{i}-x}{h}\\right) =
-                k\\left( \\frac{X_{i1}-x_{1}}{h_{1}}\\right)\\times
-                k\\left( \\frac{X_{i2}-x_{2}}{h_{2}}\\right)\\times...\\times
-                k\\left(\\frac{X_{iq}-x_{q}}{h_{q}}\\right)
+    .. math:: K\left(\frac{X_{i}-x}{h}\right) =
+                k\left( \frac{X_{i1}-x_{1}}{h_{1}}\right)\times
+                k\left( \frac{X_{i2}-x_{2}}{h_{2}}\right)\times...\times
+                k\left(\frac{X_{iq}-x_{q}}{h_{q}}\right)
     """
-    pass
+    kertypes = dict(c=ckertype, o=okertype, u=ukertype)
+    #Kval = []
+    #for ii, vtype in enumerate(var_type):
+    #    func = kernel_func[kertypes[vtype]]
+    #    Kval.append(func(bw[ii], data[:, ii], data_predict[ii]))
+
+    #Kval = np.column_stack(Kval)
+
+    Kval = np.empty(data.shape)
+    for ii, vtype in enumerate(var_type):
+        func = kernel_func[kertypes[vtype]]
+        Kval[:, ii] = func(bw[ii], data[:, ii], data_predict[ii])
+
+    iscontinuous = np.array([c == 'c' for c in var_type])
+    dens = Kval.prod(axis=1) / np.prod(bw[iscontinuous])
+    if tosum:
+        return dens.sum(axis=0)
+    else:
+        return dens
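
As a quick illustration of how the private helpers restored above fit together, the sketch below evaluates a product-kernel density by hand. It is an assumption-laden example, not part of the patch: the data, seed, bandwidth values, and variable names are made up, and it only shows that dividing the `gpke` sum by the number of observations reproduces what `KDEMultivariate.pdf` does internally.

    # Sketch only: `data`, `bw`, and `point` are illustrative, not from the patch.
    import numpy as np
    from statsmodels.nonparametric._kernel_base import gpke, _adjust_shape

    rng = np.random.default_rng(0)
    data = _adjust_shape(rng.normal(size=(200, 2)), 2)   # (nobs, k_vars)
    bw = np.array([0.3, 0.3])                            # assumed bandwidths
    point = np.array([0.0, 0.0])

    # gpke returns the non-normalized sum of product kernels; dividing by
    # nobs gives the multivariate density estimate at `point`.
    pdf_at_point = gpke(bw, data=data, data_predict=point, var_type='cc') / len(data)
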
diff --git a/statsmodels/nonparametric/api.py b/statsmodels/nonparametric/api.py
index 015cd8089..67cf5ffcc 100644
--- a/statsmodels/nonparametric/api.py
+++ b/statsmodels/nonparametric/api.py
@@ -1,9 +1,15 @@
-__all__ = ['KDEUnivariate', 'KDEMultivariate', 'KDEMultivariateConditional',
-    'EstimatorSettings', 'KernelReg', 'KernelCensoredReg', 'lowess',
-    'bandwidths', 'pdf_kernel_asym', 'cdf_kernel_asym']
+__all__ = [
+    "KDEUnivariate",
+    "KDEMultivariate", "KDEMultivariateConditional", "EstimatorSettings",
+    "KernelReg", "KernelCensoredReg",
+    "lowess", "bandwidths",
+    "pdf_kernel_asym", "cdf_kernel_asym"
+]
 from .kde import KDEUnivariate
 from .smoothers_lowess import lowess
 from . import bandwidths
-from .kernel_density import KDEMultivariate, KDEMultivariateConditional, EstimatorSettings
+
+from .kernel_density import \
+    KDEMultivariate, KDEMultivariateConditional, EstimatorSettings
 from .kernel_regression import KernelReg, KernelCensoredReg
 from .kernels_asymmetric import pdf_kernel_asym, cdf_kernel_asym
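
For reference, the names re-exported above remain importable from the public module; a one-line sketch using only names listed in ``__all__``:

    from statsmodels.nonparametric.api import KDEUnivariate, KernelReg, lowess
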
diff --git a/statsmodels/nonparametric/bandwidths.py b/statsmodels/nonparametric/bandwidths.py
index 68129d1cf..0edd4f94c 100644
--- a/statsmodels/nonparametric/bandwidths.py
+++ b/statsmodels/nonparametric/bandwidths.py
@@ -1,5 +1,6 @@
 import numpy as np
 from scipy.stats import scoreatpercentile
+
 from statsmodels.compat.pandas import Substitution
 from statsmodels.sandbox.nonparametric import kernels

@@ -12,9 +13,17 @@ def _select_sigma(x, percentile=25):
     ----------
     Silverman (1986) p.47
     """
-    pass
+    # normalize = norm.ppf(.75) - norm.ppf(.25)
+    normalize = 1.349
+    IQR = (scoreatpercentile(x, 75) - scoreatpercentile(x, 25)) / normalize
+    std_dev = np.std(x, axis=0, ddof=1)
+    if IQR > 0:
+        return np.minimum(std_dev, IQR)
+    else:
+        return std_dev


+## Univariate Rule of Thumb Bandwidths ##
 def bw_scott(x, kernel=None):
     """
     Scott's Rule of Thumb
@@ -44,8 +53,9 @@ def bw_scott(x, kernel=None):
     Scott, D.W. (1992) Multivariate Density Estimation: Theory, Practice, and
         Visualization.
     """
-    pass
-
+    A = _select_sigma(x)
+    n = len(x)
+    return 1.059 * A * n ** (-0.2)

 def bw_silverman(x, kernel=None):
     """
@@ -75,7 +85,9 @@ def bw_silverman(x, kernel=None):

     Silverman, B.W. (1986) `Density Estimation.`
     """
-    pass
+    A = _select_sigma(x)
+    n = len(x)
+    return .9 * A * n ** (-0.2)


 def bw_normal_reference(x, kernel=None):
@@ -117,14 +129,27 @@ def bw_normal_reference(x, kernel=None):
     Silverman, B.W. (1986) `Density Estimation.`
     Hansen, B.E. (2009) `Lecture Notes on Nonparametrics.`
     """
-    pass
+    if kernel is None:
+        kernel = kernels.Gaussian()
+    C = kernel.normal_reference_constant
+    A = _select_sigma(x)
+    n = len(x)
+    return C * A * n ** (-0.2)
+
+## Plug-In Methods ##
+
+## Least Squares Cross-Validation ##

+## Helper Functions ##

-bandwidth_funcs = {'scott': bw_scott, 'silverman': bw_silverman,
-    'normal_reference': bw_normal_reference}
+bandwidth_funcs = {
+    "scott": bw_scott,
+    "silverman": bw_silverman,
+    "normal_reference": bw_normal_reference,
+}


-@Substitution(', '.join(sorted(bandwidth_funcs.keys())))
+@Substitution(", ".join(sorted(bandwidth_funcs.keys())))
 def select_bandwidth(x, bw, kernel):
     """
     Selects bandwidth for a selection rule bw
@@ -145,4 +170,15 @@ def select_bandwidth(x, bw, kernel):
     bw : float
         The estimate of the bandwidth
     """
-    pass
+    bw = bw.lower()
+    if bw not in bandwidth_funcs:
+        raise ValueError("Bandwidth %s not understood" % bw)
+    bandwidth = bandwidth_funcs[bw](x, kernel)
+    if np.any(bandwidth == 0):
+        # eventually this can fall back on another selection criterion.
+        err = "Selected KDE bandwidth is 0. Cannot estimate density. " \
+              "Either provide the bandwidth during initialization or use " \
+              "an alternative method."
+        raise RuntimeError(err)
+    else:
+        return bandwidth
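
A short sketch of the rule-of-thumb selectors implemented above, assuming a synthetic Gaussian sample; the Gaussian kernel object mirrors what ``bw_normal_reference`` instantiates by default, and the sample itself is an assumption for illustration.

    import numpy as np
    from statsmodels.nonparametric import bandwidths
    from statsmodels.sandbox.nonparametric import kernels

    x = np.random.standard_normal(500)
    bw_silver = bandwidths.bw_silverman(x)  # 0.9   * min(std, IQR/1.349) * n**(-1/5)
    bw_scott = bandwidths.bw_scott(x)       # 1.059 * min(std, IQR/1.349) * n**(-1/5)
    bw_nr = bandwidths.select_bandwidth(x, "normal_reference", kernels.Gaussian())
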
diff --git a/statsmodels/nonparametric/kde.py b/statsmodels/nonparametric/kde.py
index 32ceaef42..7a9c54fb1 100644
--- a/statsmodels/nonparametric/kde.py
+++ b/statsmodels/nonparametric/kde.py
@@ -13,18 +13,38 @@ Silverman, B.W.  Density Estimation for Statistics and Data Analysis.
 """
 import numpy as np
 from scipy import integrate, stats
+
 from statsmodels.sandbox.nonparametric import kernels
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.validation import array_like, float_like
+
 from . import bandwidths
 from .kdetools import forrt, revrt, silverman_transform
 from .linbin import fast_linbin
-kernel_switch = dict(gau=kernels.Gaussian, epa=kernels.Epanechnikov, uni=
-    kernels.Uniform, tri=kernels.Triangular, biw=kernels.Biweight, triw=
-    kernels.Triweight, cos=kernels.Cosine, cos2=kernels.Cosine2, tric=
-    kernels.Tricube)
+
+# Kernels Switch for estimators
+
+kernel_switch = dict(
+    gau=kernels.Gaussian,
+    epa=kernels.Epanechnikov,
+    uni=kernels.Uniform,
+    tri=kernels.Triangular,
+    biw=kernels.Biweight,
+    triw=kernels.Triweight,
+    cos=kernels.Cosine,
+    cos2=kernels.Cosine2,
+    tric=kernels.Tricube
+)


+def _checkisfit(self):
+    try:
+        self.density
+    except Exception:
+        raise ValueError("Call fit to fit the density first")
+
+
+# Kernel Density Estimator Class
 class KDEUnivariate:
     """
     Univariate Kernel Density Estimator.
@@ -63,10 +83,19 @@ class KDEUnivariate:
     """

     def __init__(self, endog):
-        self.endog = array_like(endog, 'endog', ndim=1, contiguous=True)
-
-    def fit(self, kernel='gau', bw='normal_reference', fft=True, weights=
-        None, gridsize=None, adjust=1, cut=3, clip=(-np.inf, np.inf)):
+        self.endog = array_like(endog, "endog", ndim=1, contiguous=True)
+
+    def fit(
+        self,
+        kernel="gau",
+        bw="normal_reference",
+        fft=True,
+        weights=None,
+        gridsize=None,
+        adjust=1,
+        cut=3,
+        clip=(-np.inf, np.inf),
+    ):
         """
         Attach the density estimate to the KDEUnivariate class.

@@ -120,7 +149,54 @@ class KDEUnivariate:
         KDEUnivariate
             The fitted instance.
         """
-        pass
+        if isinstance(bw, str):
+            self.bw_method = bw
+        else:
+            self.bw_method = "user-given"
+            if not callable(bw):
+                bw = float_like(bw, "bw")
+
+        endog = self.endog
+
+        if fft:
+            if kernel != "gau":
+                msg = "Only gaussian kernel is available for fft"
+                raise NotImplementedError(msg)
+            if weights is not None:
+                msg = "Weights are not implemented for fft"
+                raise NotImplementedError(msg)
+            density, grid, bw = kdensityfft(
+                endog,
+                kernel=kernel,
+                bw=bw,
+                adjust=adjust,
+                weights=weights,
+                gridsize=gridsize,
+                clip=clip,
+                cut=cut,
+            )
+        else:
+            density, grid, bw = kdensity(
+                endog,
+                kernel=kernel,
+                bw=bw,
+                adjust=adjust,
+                weights=weights,
+                gridsize=gridsize,
+                clip=clip,
+                cut=cut,
+            )
+        self.density = density
+        self.support = grid
+        self.bw = bw
+        self.kernel = kernel_switch[kernel](h=bw)  # we instantiate twice,
+        # should this be passed to funcs?
+        # put here to ensure empty cache after re-fit with new options
+        self.kernel.weights = weights
+        if weights is not None:
+            self.kernel.weights /= weights.sum()
+        self._cache = {}
+        return self

     @cache_readonly
     def cdf(self):
@@ -131,7 +207,25 @@ class KDEUnivariate:
         -----
         Will not work if fit has not been called.
         """
-        pass
+        _checkisfit(self)
+        kern = self.kernel
+        if kern.domain is None:  # TODO: test for grid point at domain bound
+            a, b = -np.inf, np.inf
+        else:
+            a, b = kern.domain
+
+        def func(x, s):
+            return np.squeeze(kern.density(s, x))
+
+        support = self.support
+        support = np.r_[a, support]
+        gridsize = len(support)
+        endog = self.endog
+        probs = [
+            integrate.quad(func, support[i - 1], support[i], args=endog)[0]
+            for i in range(1, gridsize)
+        ]
+        return np.cumsum(probs)

     @cache_readonly
     def cumhazard(self):
@@ -142,7 +236,8 @@ class KDEUnivariate:
         -----
         Will not work if fit has not been called.
         """
-        pass
+        _checkisfit(self)
+        return -np.log(self.sf)

     @cache_readonly
     def sf(self):
@@ -153,7 +248,8 @@ class KDEUnivariate:
         -----
         Will not work if fit has not been called.
         """
-        pass
+        _checkisfit(self)
+        return 1 - self.cdf

     @cache_readonly
     def entropy(self):
@@ -165,7 +261,21 @@ class KDEUnivariate:
         Will not work if fit has not been called. 1e-12 is added to each
         probability to ensure that log(0) is not called.
         """
-        pass
+        _checkisfit(self)
+
+        def entr(x, s):
+            pdf = kern.density(s, x)
+            return pdf * np.log(pdf + 1e-12)
+
+        kern = self.kernel
+
+        if kern.domain is not None:
+            a, b = self.domain
+        else:
+            a, b = -np.inf, np.inf
+        endog = self.endog
+        # TODO: below could run into integr problems, cf. stats.dist._entropy
+        return -integrate.quad(entr, a, b, args=(endog,))[0]

     @cache_readonly
     def icdf(self):
@@ -177,7 +287,9 @@ class KDEUnivariate:
         Will not work if fit has not been called. Uses
         `scipy.stats.mstats.mquantiles`.
         """
-        pass
+        _checkisfit(self)
+        gridsize = len(self.density)
+        return stats.mstats.mquantiles(self.endog, np.linspace(0, 1, gridsize))

     def evaluate(self, point):
         """
@@ -188,11 +300,22 @@ class KDEUnivariate:
         point : {float, ndarray}
             Point(s) at which to evaluate the density.
         """
-        pass
-
-
-def kdensity(x, kernel='gau', bw='normal_reference', weights=None, gridsize
-    =None, adjust=1, clip=(-np.inf, np.inf), cut=3, retgrid=True):
+        _checkisfit(self)
+        return self.kernel.density(self.endog, point)
+
+
+# Kernel Density Estimator Functions
+def kdensity(
+    x,
+    kernel="gau",
+    bw="normal_reference",
+    weights=None,
+    gridsize=None,
+    adjust=1,
+    clip=(-np.inf, np.inf),
+    cut=3,
+    retgrid=True,
+):
     """
     Rosenblatt-Parzen univariate kernel density estimator.

@@ -256,11 +379,88 @@ def kdensity(x, kernel='gau', bw='normal_reference', weights=None, gridsize
     Creates an intermediate (`gridsize` x `nobs`) array. Use FFT for a more
     computationally efficient version.
     """
-    pass
-
-
-def kdensityfft(x, kernel='gau', bw='normal_reference', weights=None,
-    gridsize=None, adjust=1, clip=(-np.inf, np.inf), cut=3, retgrid=True):
+    x = np.asarray(x)
+    if x.ndim == 1:
+        x = x[:, None]
+    clip_x = np.logical_and(x > clip[0], x < clip[1])
+    x = x[clip_x]
+
+    nobs = len(x)  # after trim
+
+    if gridsize is None:
+        gridsize = max(nobs, 50)  # do not need to resize if no FFT
+
+        # handle weights
+    if weights is None:
+        weights = np.ones(nobs)
+        q = nobs
+    else:
+        # ensure weights is a numpy array
+        weights = np.asarray(weights)
+
+        if len(weights) != len(clip_x):
+            msg = "The length of the weights must be the same as the given x."
+            raise ValueError(msg)
+        weights = weights[clip_x.squeeze()]
+        q = weights.sum()
+
+    # Get kernel object corresponding to selection
+    kern = kernel_switch[kernel]()
+
+    if callable(bw):
+        bw = float(bw(x, kern))
+        # user passed a callable custom bandwidth function
+    elif isinstance(bw, str):
+        bw = bandwidths.select_bandwidth(x, bw, kern)
+        # will cross-val fit this pattern?
+    else:
+        bw = float_like(bw, "bw")
+
+    bw *= adjust
+
+    a = np.min(x, axis=0) - cut * bw
+    b = np.max(x, axis=0) + cut * bw
+    grid = np.linspace(a, b, gridsize)
+
+    k = (
+        x.T - grid[:, None]
+    ) / bw  # uses broadcasting to make a gridsize x nobs
+
+    # set kernel bandwidth
+    kern.seth(bw)
+
+    # truncate to domain
+    if (
+        kern.domain is not None
+    ):  # will not work for piecewise kernels like parzen
+        z_lo, z_high = kern.domain
+        domain_mask = (k < z_lo) | (k > z_high)
+        k = kern(k)  # estimate density
+        k[domain_mask] = 0
+    else:
+        k = kern(k)  # estimate density
+
+    k[k < 0] = 0  # get rid of any negative values, do we need this?
+
+    dens = np.dot(k, weights) / (q * bw)
+
+    if retgrid:
+        return dens, grid, bw
+    else:
+        return dens, bw
+
+
+def kdensityfft(
+    x,
+    kernel="gau",
+    bw="normal_reference",
+    weights=None,
+    gridsize=None,
+    adjust=1,
+    clip=(-np.inf, np.inf),
+    cut=3,
+    retgrid=True,
+):
     """
     Rosenblatt-Parzen univariate kernel density estimator

@@ -343,4 +543,68 @@ def kdensityfft(x, kernel='gau', bw='normal_reference', weights=None,
         the Fast Fourier Transform. Journal of the Royal Statistical Society.
         Series C. 31.2, 93-9.
     """
-    pass
+    x = np.asarray(x)
+    # will not work for two columns.
+    x = x[np.logical_and(x > clip[0], x < clip[1])]
+
+    # Get kernel object corresponding to selection
+    kern = kernel_switch[kernel]()
+
+    if callable(bw):
+        bw = float(bw(x, kern))
+        # user passed a callable custom bandwidth function
+    elif isinstance(bw, str):
+        # if bw is None, select optimal bandwidth for kernel
+        bw = bandwidths.select_bandwidth(x, bw, kern)
+        # will cross-val fit this pattern?
+    else:
+        bw = float_like(bw, "bw")
+
+    bw *= adjust
+
+    nobs = len(x)  # after trim
+
+    # 1 Make grid and discretize the data
+    if gridsize is None:
+        gridsize = np.max((nobs, 512.0))
+    gridsize = 2 ** np.ceil(np.log2(gridsize))  # round to next power of 2
+
+    a = np.min(x) - cut * bw
+    b = np.max(x) + cut * bw
+    grid, delta = np.linspace(a, b, int(gridsize), retstep=True)
+    RANGE = b - a
+
+    # TODO: Fix this?
+    # This is the Silverman binning function, but I believe it's buggy (SS)
+    # weighting according to Silverman
+    #    count = counts(x,grid)
+    #    binned = np.zeros_like(grid)    #xi_{k} in Silverman
+    #    j = 0
+    #    for k in range(int(gridsize-1)):
+    #        if count[k]>0: # there are points of x in the grid here
+    #            Xingrid = x[j:j+count[k]] # get all these points
+    #            # get weights at grid[k],grid[k+1]
+    #            binned[k] += np.sum(grid[k+1]-Xingrid)
+    #            binned[k+1] += np.sum(Xingrid-grid[k])
+    #            j += count[k]
+    #    binned /= (nobs)*delta**2 # normalize binned to sum to 1/delta
+
+    # NOTE: THE ABOVE IS WRONG, JUST TRY WITH LINEAR BINNING
+    binned = fast_linbin(x, a, b, gridsize) / (delta * nobs)
+
+    # step 2 compute FFT of the weights, using Munro (1976) FFT convention
+    y = forrt(binned)
+
+    # step 3 and 4 for optimal bw compute zstar and the density estimate f
+    # do not have to redo the above if just changing bw, ie., for cross val
+
+    # NOTE: silverman_transform is the closed form solution of the FFT of the
+    # gaussian kernel. Not yet sure how to generalize it.
+    zstar = silverman_transform(bw, gridsize, RANGE) * y
+    # 3.49 in Silverman
+    # 3.50 w Gaussian kernel
+    f = revrt(zstar)
+    if retgrid:
+        return f, grid, bw
+    else:
+        return f, bw
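
A hedged usage sketch for the univariate estimator restored above, on synthetic data; the kernel and bandwidth choices are the documented defaults, and ``fft=True`` routes through ``kdensityfft``.

    import numpy as np
    from statsmodels.nonparametric.kde import KDEUnivariate

    x = np.random.standard_normal(1000)
    kde = KDEUnivariate(x)
    kde.fit(kernel="gau", bw="normal_reference", fft=True)  # FFT path
    grid, dens = kde.support, kde.density                   # estimate on the grid
    at_zero = kde.evaluate(0.0)                             # pointwise evaluation
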
diff --git a/statsmodels/nonparametric/kdetools.py b/statsmodels/nonparametric/kdetools.py
index 13d5274e8..70bb18ed5 100644
--- a/statsmodels/nonparametric/kdetools.py
+++ b/statsmodels/nonparametric/kdetools.py
@@ -1,19 +1,24 @@
+#### Convenience Functions to be moved to kerneltools ####
 import numpy as np

-
 def forrt(X, m=None):
     """
     RFFT with order like Munro (1976) FORTT routine.
     """
-    pass
-
+    if m is None:
+        m = len(X)
+    y = np.fft.rfft(X, m) / m
+    return np.r_[y.real, y[1:-1].imag]

 def revrt(X, m=None):
     """
     Inverse of forrt. Equivalent to Munro (1976) REVRT routine.
     """
-    pass
-
+    if m is None:
+        m = len(X)
+    i = int(m // 2 + 1)
+    y = X[:i] + np.r_[0, X[i:], 0] * 1j
+    return np.fft.irfft(y)*m

 def silverman_transform(bw, M, RANGE):
     """
@@ -23,8 +28,13 @@ def silverman_transform(bw, M, RANGE):
     -----
     Underflow is intentional as a dampener.
     """
-    pass
-
+    J = np.arange(M/2+1)
+    FAC1 = 2*(np.pi*bw/RANGE)**2
+    JFAC = J**2*FAC1
+    BC = 1 - 1. / 3 * (J * 1./M*np.pi)**2
+    FAC = np.exp(-JFAC)/BC
+    kern_est = np.r_[FAC, FAC[1:-1]]
+    return kern_est

 def counts(x, v):
     """
@@ -34,4 +44,12 @@ def counts(x, v):
     -----
     Using np.digitize and np.bincount
     """
-    pass
+    idx = np.digitize(x, v)
+    try: # numpy 1.6
+        return np.bincount(idx, minlength=len(v))
+    except:
+        bc = np.bincount(idx)
+        return np.r_[bc, np.zeros(len(v) - len(bc))]
+
+def kdesum(x, axis=0):
+    return np.asarray([np.sum(x[i] - x, axis) for i in range(len(x))])
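
A quick consistency sketch for the Munro-ordered FFT helpers above: by the definitions of ``forrt`` and ``revrt``, an even-length real input should round-trip up to floating-point error. The assertion below is an assumption checked against those definitions, not a test from the patch.

    import numpy as np
    from statsmodels.nonparametric.kdetools import forrt, revrt

    x = np.random.standard_normal(64)  # even length assumed
    packed = forrt(x)                  # real parts followed by interior imaginary parts
    assert np.allclose(revrt(packed), x)
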
diff --git a/statsmodels/nonparametric/kernel_density.py b/statsmodels/nonparametric/kernel_density.py
index 4221a95bb..4757e545a 100644
--- a/statsmodels/nonparametric/kernel_density.py
+++ b/statsmodels/nonparametric/kernel_density.py
@@ -27,11 +27,15 @@ References
         Models", 2006, Econometric Reviews 25, 523-544

 """
+# TODO: make default behavior efficient=True above a certain n_obs
 import numpy as np
+
 from . import kernels
-from ._kernel_base import GenericKDE, EstimatorSettings, gpke, LeaveOneOut, _adjust_shape
-__all__ = ['KDEMultivariate', 'KDEMultivariateConditional', 'EstimatorSettings'
-    ]
+from ._kernel_base import GenericKDE, EstimatorSettings, gpke, \
+    LeaveOneOut, _adjust_shape
+
+
+__all__ = ['KDEMultivariate', 'KDEMultivariateConditional', 'EstimatorSettings']


 class KDEMultivariate(GenericKDE):
@@ -94,7 +98,6 @@ class KDEMultivariate(GenericKDE):
     >>> dens_u.bw
     array([ 0.39967419,  0.38423292])
     """
-
     def __init__(self, data, var_type, bw=None, defaults=None):
         self.var_type = var_type
         self.k_vars = len(self.var_type)
@@ -102,9 +105,8 @@ class KDEMultivariate(GenericKDE):
         self.data_type = var_type
         self.nobs, self.k_vars = np.shape(self.data)
         if self.nobs <= self.k_vars:
-            raise ValueError(
-                'The number of observations must be larger than the number of variables.'
-                )
+            raise ValueError("The number of observations must be larger " \
+                             "than the number of variables.")
         defaults = EstimatorSettings() if defaults is None else defaults
         self._set_defaults(defaults)
         if not self.efficient:
@@ -114,15 +116,15 @@ class KDEMultivariate(GenericKDE):

     def __repr__(self):
         """Provide something sane to print."""
-        rpr = 'KDE instance\n'
-        rpr += 'Number of variables: k_vars = ' + str(self.k_vars) + '\n'
-        rpr += 'Number of samples:   nobs = ' + str(self.nobs) + '\n'
-        rpr += 'Variable types:      ' + self.var_type + '\n'
-        rpr += 'BW selection method: ' + self._bw_method + '\n'
+        rpr = "KDE instance\n"
+        rpr += "Number of variables: k_vars = " + str(self.k_vars) + "\n"
+        rpr += "Number of samples:   nobs = " + str(self.nobs) + "\n"
+        rpr += "Variable types:      " + self.var_type + "\n"
+        rpr += "BW selection method: " + self._bw_method + "\n"
         return rpr

     def loo_likelihood(self, bw, func=lambda x: x):
-        """
+        r"""
         Returns the leave-one-out likelihood function.

         The leave-one-out likelihood function for the unconditional KDE.
@@ -139,19 +141,26 @@ class KDEMultivariate(GenericKDE):
         -----
         The leave-one-out kernel estimator of :math:`f_{-i}` is:

-        .. math:: f_{-i}(X_{i})=\\frac{1}{(n-1)h}
-                    \\sum_{j=1,j\\neq i}K_{h}(X_{i},X_{j})
+        .. math:: f_{-i}(X_{i})=\frac{1}{(n-1)h}
+                    \sum_{j=1,j\neq i}K_{h}(X_{i},X_{j})

         where :math:`K_{h}` represents the generalized product kernel
         estimator:

         .. math:: K_{h}(X_{i},X_{j}) =
-            \\prod_{s=1}^{q}h_{s}^{-1}k\\left(\\frac{X_{is}-X_{js}}{h_{s}}\\right)
+            \prod_{s=1}^{q}h_{s}^{-1}k\left(\frac{X_{is}-X_{js}}{h_{s}}\right)
         """
-        pass
+        LOO = LeaveOneOut(self.data)
+        L = 0
+        for i, X_not_i in enumerate(LOO):
+            f_i = gpke(bw, data=-X_not_i, data_predict=-self.data[i, :],
+                       var_type=self.var_type)
+            L += func(f_i)
+
+        return -L

     def pdf(self, data_predict=None):
-        """
+        r"""
         Evaluate the probability density function.

         Parameters
@@ -170,12 +179,24 @@ class KDEMultivariate(GenericKDE):
         estimator:

         .. math:: K_{h}(X_{i},X_{j}) =
-            \\prod_{s=1}^{q}h_{s}^{-1}k\\left(\\frac{X_{is}-X_{js}}{h_{s}}\\right)
+            \prod_{s=1}^{q}h_{s}^{-1}k\left(\frac{X_{is}-X_{js}}{h_{s}}\right)
         """
-        pass
+        if data_predict is None:
+            data_predict = self.data
+        else:
+            data_predict = _adjust_shape(data_predict, self.k_vars)
+
+        pdf_est = []
+        for i in range(np.shape(data_predict)[0]):
+            pdf_est.append(gpke(self.bw, data=self.data,
+                                data_predict=data_predict[i, :],
+                                var_type=self.var_type) / self.nobs)
+
+        pdf_est = np.squeeze(pdf_est)
+        return pdf_est

     def cdf(self, data_predict=None):
-        """
+        r"""
         Evaluate the cumulative distribution function.

         Parameters
@@ -198,17 +219,32 @@ class KDEMultivariate(GenericKDE):

         .. math::

-            F(x^{c},x^{d})=n^{-1}\\sum_{i=1}^{n}\\left[G(\\frac{x^{c}-X_{i}}{h})\\sum_{u\\leq x^{d}}L(X_{i}^{d},x_{i}^{d}, \\lambda)\\right]
+            F(x^{c},x^{d})=n^{-1}\sum_{i=1}^{n}\left[G(\frac{x^{c}-X_{i}}{h})\sum_{u\leq x^{d}}L(X_{i}^{d},x_{i}^{d}, \lambda)\right]

         where G() is the product kernel CDF estimator for the continuous
         and L() for the discrete variables.

         Used bandwidth is ``self.bw``.
         """
-        pass
+        if data_predict is None:
+            data_predict = self.data
+        else:
+            data_predict = _adjust_shape(data_predict, self.k_vars)
+
+        cdf_est = []
+        for i in range(np.shape(data_predict)[0]):
+            cdf_est.append(gpke(self.bw, data=self.data,
+                                data_predict=data_predict[i, :],
+                                var_type=self.var_type,
+                                ckertype="gaussian_cdf",
+                                ukertype="aitchisonaitken_cdf",
+                                okertype='wangryzin_cdf') / self.nobs)
+
+        cdf_est = np.squeeze(cdf_est)
+        return cdf_est

     def imse(self, bw):
-        """
+        r"""
         Returns the Integrated Mean Square Error for the unconditional KDE.

         Parameters
@@ -228,11 +264,11 @@ class KDEMultivariate(GenericKDE):

         The formula for the cross-validation objective function is:

-        .. math:: CV=\\frac{1}{n^{2}}\\sum_{i=1}^{n}\\sum_{j=1}^{N}
-            \\bar{K}_{h}(X_{i},X_{j})-\\frac{2}{n(n-1)}\\sum_{i=1}^{n}
-            \\sum_{j=1,j\\neq i}^{N}K_{h}(X_{i},X_{j})
+        .. math:: CV=\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{N}
+            \bar{K}_{h}(X_{i},X_{j})-\frac{2}{n(n-1)}\sum_{i=1}^{n}
+            \sum_{j=1,j\neq i}^{N}K_{h}(X_{i},X_{j})

-        Where :math:`\\bar{K}_{h}` is the multivariate product convolution
+        Where :math:`\bar{K}_{h}` is the multivariate product convolution
         kernel (consult [2]_ for mixed data types).

         References
@@ -242,11 +278,64 @@ class KDEMultivariate(GenericKDE):
         .. [2] Racine, J., Li, Q. "Nonparametric Estimation of Distributions
                 with Categorical and Continuous Data." Working Paper. (2000)
         """
-        pass
+        #F = 0
+        #for i in range(self.nobs):
+        #    k_bar_sum = gpke(bw, data=-self.data,
+        #                     data_predict=-self.data[i, :],
+        #                     var_type=self.var_type,
+        #                     ckertype='gauss_convolution',
+        #                     okertype='wangryzin_convolution',
+        #                     ukertype='aitchisonaitken_convolution')
+        #    F += k_bar_sum
+        ## there is a + because loo_likelihood returns the negative
+        #return (F / self.nobs**2 + self.loo_likelihood(bw) * \
+        #        2 / ((self.nobs) * (self.nobs - 1)))
+
+        # The code below is equivalent to the commented-out code above.  It's
+        # about 20% faster due to some code being moved outside the for-loops
+        # and shared by gpke() and loo_likelihood().
+        F = 0
+        kertypes = dict(c=kernels.gaussian_convolution,
+                        o=kernels.wang_ryzin_convolution,
+                        u=kernels.aitchison_aitken_convolution)
+        nobs = self.nobs
+        data = -self.data
+        var_type = self.var_type
+        ix_cont = np.array([c == 'c' for c in var_type])
+        _bw_cont_product = bw[ix_cont].prod()
+        Kval = np.empty(data.shape)
+        for i in range(nobs):
+            for ii, vtype in enumerate(var_type):
+                Kval[:, ii] = kertypes[vtype](bw[ii],
+                                              data[:, ii],
+                                              data[i, ii])
+
+            dens = Kval.prod(axis=1) / _bw_cont_product
+            k_bar_sum = dens.sum(axis=0)
+            F += k_bar_sum  # sum of prod kernel over nobs
+
+        kertypes = dict(c=kernels.gaussian,
+                        o=kernels.wang_ryzin,
+                        u=kernels.aitchison_aitken)
+        LOO = LeaveOneOut(self.data)
+        L = 0   # leave-one-out likelihood
+        Kval = np.empty((data.shape[0]-1, data.shape[1]))
+        for i, X_not_i in enumerate(LOO):
+            for ii, vtype in enumerate(var_type):
+                Kval[:, ii] = kertypes[vtype](bw[ii],
+                                              -X_not_i[:, ii],
+                                              data[i, ii])
+            dens = Kval.prod(axis=1) / _bw_cont_product
+            L += dens.sum(axis=0)
+
+        # CV objective function, eq. (2.4) of Ref. [3]
+        return (F / nobs**2 - 2 * L / (nobs * (nobs - 1)))

     def _get_class_vars_type(self):
         """Helper method to be able to pass needed vars to _compute_subset."""
-        pass
+        class_type = 'KDEMultivariate'
+        class_vars = (self.var_type, )
+        return class_type, class_vars


 class KDEMultivariateConditional(GenericKDE):
@@ -315,7 +404,8 @@ class KDEMultivariateConditional(GenericKDE):
     array([ 0.41223484,  0.40976931])
     """

-    def __init__(self, endog, exog, dep_type, indep_type, bw, defaults=None):
+    def __init__(self, endog, exog, dep_type, indep_type, bw,
+                 defaults=None):
         self.dep_type = dep_type
         self.indep_type = indep_type
         self.data_type = dep_type + indep_type
@@ -335,15 +425,15 @@ class KDEMultivariateConditional(GenericKDE):

     def __repr__(self):
         """Provide something sane to print."""
-        rpr = 'KDEMultivariateConditional instance\n'
-        rpr += 'Number of independent variables: k_indep = ' + str(self.k_indep
-            ) + '\n'
-        rpr += 'Number of dependent variables: k_dep = ' + str(self.k_dep
-            ) + '\n'
-        rpr += 'Number of observations: nobs = ' + str(self.nobs) + '\n'
-        rpr += 'Independent variable types:      ' + self.indep_type + '\n'
-        rpr += 'Dependent variable types:      ' + self.dep_type + '\n'
-        rpr += 'BW selection method: ' + self._bw_method + '\n'
+        rpr = "KDEMultivariateConditional instance\n"
+        rpr += "Number of independent variables: k_indep = " + \
+               str(self.k_indep) + "\n"
+        rpr += "Number of dependent variables: k_dep = " + \
+               str(self.k_dep) + "\n"
+        rpr += "Number of observations: nobs = " + str(self.nobs) + "\n"
+        rpr += "Independent variable types:      " + self.indep_type + "\n"
+        rpr += "Dependent variable types:      " + self.dep_type + "\n"
+        rpr += "BW selection method: " + self._bw_method + "\n"
         return rpr

     def loo_likelihood(self, bw, func=lambda x: x):
@@ -371,10 +461,23 @@ class KDEMultivariateConditional(GenericKDE):
         Similar to ``KDE.loo_likelihood`, but substitute ``f(y|x)=f(x,y)/f(x)``
         for ``f(x)``.
         """
-        pass
+        yLOO = LeaveOneOut(self.data)
+        xLOO = LeaveOneOut(self.exog).__iter__()
+        L = 0
+        for i, Y_j in enumerate(yLOO):
+            X_not_i = next(xLOO)
+            f_yx = gpke(bw, data=-Y_j, data_predict=-self.data[i, :],
+                        var_type=(self.dep_type + self.indep_type))
+            f_x = gpke(bw[self.k_dep:], data=-X_not_i,
+                       data_predict=-self.exog[i, :],
+                       var_type=self.indep_type)
+            f_i = f_yx / f_x
+            L += func(f_i)
+
+        return -L

     def pdf(self, endog_predict=None, exog_predict=None):
-        """
+        r"""
         Evaluate the probability density function.

         Parameters
@@ -394,19 +497,39 @@ class KDEMultivariateConditional(GenericKDE):
         -----
         The formula for the conditional probability density is:

-        .. math:: f(y|x)=\\frac{f(x,y)}{f(x)}
+        .. math:: f(y|x)=\frac{f(x,y)}{f(x)}

         with

-        .. math:: f(x)=\\prod_{s=1}^{q}h_{s}^{-1}k
-                            \\left(\\frac{x_{is}-x_{js}}{h_{s}}\\right)
+        .. math:: f(x)=\prod_{s=1}^{q}h_{s}^{-1}k
+                            \left(\frac{x_{is}-x_{js}}{h_{s}}\right)

         where :math:`k` is the appropriate kernel for each variable.
         """
-        pass
+        if endog_predict is None:
+            endog_predict = self.endog
+        else:
+            endog_predict = _adjust_shape(endog_predict, self.k_dep)
+        if exog_predict is None:
+            exog_predict = self.exog
+        else:
+            exog_predict = _adjust_shape(exog_predict, self.k_indep)
+
+        pdf_est = []
+        data_predict = np.column_stack((endog_predict, exog_predict))
+        for i in range(np.shape(data_predict)[0]):
+            f_yx = gpke(self.bw, data=self.data,
+                        data_predict=data_predict[i, :],
+                        var_type=(self.dep_type + self.indep_type))
+            f_x = gpke(self.bw[self.k_dep:], data=self.exog,
+                       data_predict=exog_predict[i, :],
+                       var_type=self.indep_type)
+            pdf_est.append(f_yx / f_x)
+
+        return np.squeeze(pdf_est)

     def cdf(self, endog_predict=None, exog_predict=None):
-        """
+        r"""
         Cumulative distribution function for the conditional density.

         Parameters
@@ -432,7 +555,7 @@ class KDEMultivariateConditional(GenericKDE):

         .. math::

-            F(y|x)=\\frac{n^{-1}\\sum_{i=1}^{n}G(\\frac{y-Y_{i}}{h_{0}}) W_{h}(X_{i},x)}{\\widehat{\\mu}(x)}
+            F(y|x)=\frac{n^{-1}\sum_{i=1}^{n}G(\frac{y-Y_{i}}{h_{0}}) W_{h}(X_{i},x)}{\widehat{\mu}(x)}

         where G() is the product kernel CDF estimator for the dependent (y)
         variable(s) and W() is the product kernel CDF estimator for the
@@ -446,10 +569,39 @@ class KDEMultivariateConditional(GenericKDE):
                     distribution function." Journal of Nonparametric
                     Statistics (2008)
         """
-        pass
+        if endog_predict is None:
+            endog_predict = self.endog
+        else:
+            endog_predict = _adjust_shape(endog_predict, self.k_dep)
+        if exog_predict is None:
+            exog_predict = self.exog
+        else:
+            exog_predict = _adjust_shape(exog_predict, self.k_indep)
+
+        N_data_predict = np.shape(exog_predict)[0]
+        cdf_est = np.empty(N_data_predict)
+        for i in range(N_data_predict):
+            mu_x = gpke(self.bw[self.k_dep:], data=self.exog,
+                        data_predict=exog_predict[i, :],
+                        var_type=self.indep_type) / self.nobs
+            mu_x = np.squeeze(mu_x)
+            cdf_endog = gpke(self.bw[0:self.k_dep], data=self.endog,
+                             data_predict=endog_predict[i, :],
+                             var_type=self.dep_type,
+                             ckertype="gaussian_cdf",
+                             ukertype="aitchisonaitken_cdf",
+                             okertype='wangryzin_cdf', tosum=False)
+
+            cdf_exog = gpke(self.bw[self.k_dep:], data=self.exog,
+                            data_predict=exog_predict[i, :],
+                            var_type=self.indep_type, tosum=False)
+            S = (cdf_endog * cdf_exog).sum(axis=0)
+            cdf_est[i] = S / (self.nobs * mu_x)
+
+        return cdf_est

     def imse(self, bw):
-        """
+        r"""
         The integrated mean square error for the conditional KDE.

         Parameters
@@ -470,17 +622,17 @@ class KDEMultivariateConditional(GenericKDE):
         The formula for the cross-validation objective function for mixed
         variable types is:

-        .. math:: CV(h,\\lambda)=\\frac{1}{n}\\sum_{l=1}^{n}
-            \\frac{G_{-l}(X_{l})}{\\left[\\mu_{-l}(X_{l})\\right]^{2}}-
-            \\frac{2}{n}\\sum_{l=1}^{n}\\frac{f_{-l}(X_{l},Y_{l})}{\\mu_{-l}(X_{l})}
+        .. math:: CV(h,\lambda)=\frac{1}{n}\sum_{l=1}^{n}
+            \frac{G_{-l}(X_{l})}{\left[\mu_{-l}(X_{l})\right]^{2}}-
+            \frac{2}{n}\sum_{l=1}^{n}\frac{f_{-l}(X_{l},Y_{l})}{\mu_{-l}(X_{l})}

         where

-        .. math:: G_{-l}(X_{l}) = n^{-2}\\sum_{i\\neq l}\\sum_{j\\neq l}
+        .. math:: G_{-l}(X_{l}) = n^{-2}\sum_{i\neq l}\sum_{j\neq l}
                         K_{X_{i},X_{l}} K_{X_{j},X_{l}}K_{Y_{i},Y_{j}}^{(2)}

         where :math:`K_{X_{i},X_{l}}` is the multivariate product kernel and
-        :math:`\\mu_{-l}(X_{l})` is the leave-one-out estimator of the pdf.
+        :math:`\mu_{-l}(X_{l})` is the leave-one-out estimator of the pdf.

         :math:`K_{Y_{i},Y_{j}}^{(2)}` is the convolution kernel.

@@ -495,8 +647,41 @@ class KDEMultivariateConditional(GenericKDE):
         .. [2] Racine, J., Li, Q. "Nonparametric Estimation of Distributions
                 with Categorical and Continuous Data." Working Paper. (2000)
         """
-        pass
+        zLOO = LeaveOneOut(self.data)
+        CV = 0
+        nobs = float(self.nobs)
+        expander = np.ones((self.nobs - 1, 1))
+        for ii, Z in enumerate(zLOO):
+            X = Z[:, self.k_dep:]
+            Y = Z[:, :self.k_dep]
+            Ye_L = np.kron(Y, expander)
+            Ye_R = np.kron(expander, Y)
+            Xe_L = np.kron(X, expander)
+            Xe_R = np.kron(expander, X)
+            K_Xi_Xl = gpke(bw[self.k_dep:], data=Xe_L,
+                           data_predict=self.exog[ii, :],
+                           var_type=self.indep_type, tosum=False)
+            K_Xj_Xl = gpke(bw[self.k_dep:], data=Xe_R,
+                           data_predict=self.exog[ii, :],
+                           var_type=self.indep_type, tosum=False)
+            K2_Yi_Yj = gpke(bw[0:self.k_dep], data=Ye_L,
+                            data_predict=Ye_R, var_type=self.dep_type,
+                            ckertype='gauss_convolution',
+                            okertype='wangryzin_convolution',
+                            ukertype='aitchisonaitken_convolution',
+                            tosum=False)
+            G = (K_Xi_Xl * K_Xj_Xl * K2_Yi_Yj).sum() / nobs**2
+            f_X_Y = gpke(bw, data=-Z, data_predict=-self.data[ii, :],
+                         var_type=(self.dep_type + self.indep_type)) / nobs
+            m_x = gpke(bw[self.k_dep:], data=-X,
+                       data_predict=-self.exog[ii, :],
+                       var_type=self.indep_type) / nobs
+            CV += (G / m_x ** 2) - 2 * (f_X_Y / m_x)
+
+        return CV / nobs

     def _get_class_vars_type(self):
         """Helper method to be able to pass needed vars to _compute_subset."""
-        pass
+        class_type = 'KDEMultivariateConditional'
+        class_vars = (self.k_dep, self.dep_type, self.indep_type)
+        return class_type, class_vars
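
A hedged sketch exercising the two estimators restored in this file, with synthetic mixed data (one continuous and one ordered variable); the variable names, seed, and prediction points are illustrative assumptions.

    import numpy as np
    from statsmodels.nonparametric.kernel_density import (
        KDEMultivariate, KDEMultivariateConditional)

    rng = np.random.default_rng(12345)
    c = rng.normal(size=500)            # continuous
    o = rng.integers(0, 3, size=500)    # ordered, levels {0, 1, 2}

    dens = KDEMultivariate(data=[c, o], var_type='co', bw='normal_reference')
    pdf_at_sample = dens.pdf()          # density at the sample points
    cdf_at_point = dens.cdf([0.0, 1])   # cdf at a single mixed point

    cond = KDEMultivariateConditional(endog=[c], exog=[o], dep_type='c',
                                      indep_type='o', bw='normal_reference')
    cond_at_point = cond.pdf(endog_predict=[0.0], exog_predict=[1])
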
diff --git a/statsmodels/nonparametric/kernel_regression.py b/statsmodels/nonparametric/kernel_regression.py
index 93c94ef0a..a203adde5 100644
--- a/statsmodels/nonparametric/kernel_regression.py
+++ b/statsmodels/nonparametric/kernel_regression.py
@@ -27,11 +27,18 @@ References
         Models", 2006, Econometric Reviews 25, 523-544

 """
+
+# TODO: make default behavior efficient=True above a certain n_obs
 import copy
+
 import numpy as np
 from scipy import optimize
 from scipy.stats.mstats import mquantiles
-from ._kernel_base import GenericKDE, EstimatorSettings, gpke, LeaveOneOut, _get_type_pos, _adjust_shape, _compute_min_std_IQR, kernel_func
+
+from ._kernel_base import GenericKDE, EstimatorSettings, gpke, \
+    LeaveOneOut, _get_type_pos, _adjust_shape, _compute_min_std_IQR, kernel_func
+
+
 __all__ = ['KernelReg', 'KernelCensoredReg']


@@ -83,21 +90,20 @@ class KernelReg(GenericKDE):
     bw : array_like
         The bandwidth parameters.
     """
-
     def __init__(self, endog, exog, var_type, reg_type='ll', bw='cv_ls',
-        ckertype='gaussian', okertype='wangryzin', ukertype=
-        'aitchisonaitken', defaults=None):
+                 ckertype='gaussian', okertype='wangryzin',
+                 ukertype='aitchisonaitken', defaults=None):
         self.var_type = var_type
         self.data_type = var_type
         self.reg_type = reg_type
         self.ckertype = ckertype
         self.okertype = okertype
         self.ukertype = ukertype
-        if not (self.ckertype in kernel_func and self.ukertype in
-            kernel_func and self.okertype in kernel_func):
-            raise ValueError(
-                'user specified kernel must be a supported kernel from statsmodels.nonparametric.kernels.'
-                )
+        if not (self.ckertype in kernel_func and self.ukertype in kernel_func
+                and self.okertype in kernel_func):
+            raise ValueError('user specified kernel must be a supported '
+                             'kernel from statsmodels.nonparametric.kernels.')
+
         self.k_vars = len(self.var_type)
         self.endog = _adjust_shape(endog, 1)
         self.exog = _adjust_shape(exog, self.k_vars)
@@ -109,14 +115,34 @@ class KernelReg(GenericKDE):
         if not isinstance(bw, str):
             bw = np.asarray(bw)
             if len(bw) != self.k_vars:
-                raise ValueError(
-                    'bw must have the same dimension as the number of variables.'
-                    )
+                raise ValueError('bw must have the same dimension as the '
+                                 'number of variables.')
         if not self.efficient:
             self.bw = self._compute_reg_bw(bw)
         else:
             self.bw = self._compute_efficient(bw)

+    def _compute_reg_bw(self, bw):
+        if not isinstance(bw, str):
+            self._bw_method = "user-specified"
+            return np.asarray(bw)
+        else:
+            # The user specified a bandwidth selection method e.g. 'cv_ls'
+            self._bw_method = bw
+            # Workaround to avoid instance methods in __dict__
+            if bw == 'cv_ls':
+                res = self.cv_loo
+            else:  # bw == 'aic'
+                res = self.aic_hurvich
+            X = np.std(self.exog, axis=0)
+            h0 = 1.06 * X * \
+                 self.nobs ** (- 1. / (4 + np.size(self.exog, axis=1)))
+
+            func = self.est[self.reg_type]
+            bw_estimated = optimize.fmin(res, x0=h0, args=(func, ),
+                                         maxiter=1e3, maxfun=1e3, disp=0)
+            return bw_estimated
+
     def _est_loc_linear(self, bw, endog, exog, data_predict):
         """
         Local linear estimator of g(x) in the regression ``y = g(x) + e``.
@@ -142,7 +168,40 @@ class KernelReg(GenericKDE):
         See p. 81 in [1] and p.38 in [2] for the formulas.
         Unlike other methods, this one requires that `data_predict` be 1D.
         """
-        pass
+        nobs, k_vars = exog.shape
+        ker = gpke(bw, data=exog, data_predict=data_predict,
+                   var_type=self.var_type,
+                   ckertype=self.ckertype,
+                   ukertype=self.ukertype,
+                   okertype=self.okertype,
+                   tosum=False) / float(nobs)
+        # Create the matrix on p.492 in [7], after the multiplication w/ K_h,ij
+        # See also p. 38 in [2]
+        #ix_cont = np.arange(self.k_vars)  # Use all vars instead of continuous only
+        # Note: because ix_cont was defined here such that it selected all
+        # columns, I removed the indexing with it from exog/data_predict.
+
+        # Convert ker to a 2-D array to make matrix operations below work
+        ker = ker[:, np.newaxis]
+
+        M12 = exog - data_predict
+        M22 = np.dot(M12.T, M12 * ker)
+        M12 = (M12 * ker).sum(axis=0)
+        M = np.empty((k_vars + 1, k_vars + 1))
+        M[0, 0] = ker.sum()
+        M[0, 1:] = M12
+        M[1:, 0] = M12
+        M[1:, 1:] = M22
+
+        ker_endog = ker * endog
+        V = np.empty((k_vars + 1, 1))
+        V[0, 0] = ker_endog.sum()
+        V[1:, 0] = ((exog - data_predict) * ker_endog).sum(axis=0)
+
+        mean_mfx = np.dot(np.linalg.pinv(M), V)
+        mean = mean_mfx[0]
+        mfx = mean_mfx[1:, :]
+        return mean, mfx

     def _est_loc_constant(self, bw, endog, exog, data_predict):
         """
@@ -167,7 +226,31 @@ class KernelReg(GenericKDE):
         B_x : ndarray
             The marginal effects.
         """
-        pass
+        ker_x = gpke(bw, data=exog, data_predict=data_predict,
+                     var_type=self.var_type,
+                     ckertype=self.ckertype,
+                     ukertype=self.ukertype,
+                     okertype=self.okertype,
+                     tosum=False)
+        ker_x = np.reshape(ker_x, np.shape(endog))
+        G_numer = (ker_x * endog).sum(axis=0)
+        G_denom = ker_x.sum(axis=0)
+        G = G_numer / G_denom
+        nobs = exog.shape[0]
+        f_x = G_denom / float(nobs)
+        ker_xc = gpke(bw, data=exog, data_predict=data_predict,
+                      var_type=self.var_type,
+                      ckertype='d_gaussian',
+                      #okertype='wangryzin_reg',
+                      tosum=False)
+
+        ker_xc = ker_xc[:, np.newaxis]
+        d_mx = -(endog * ker_xc).sum(axis=0) / float(nobs) #* np.prod(bw[:, ix_cont]))
+        d_fx = -ker_xc.sum(axis=0) / float(nobs) #* np.prod(bw[:, ix_cont]))
+        B_x = d_mx / f_x - G * d_fx / f_x
+        B_x = (G_numer * d_fx - G_denom * d_mx) / (G_denom**2)
+        #B_x = (f_x * d_mx - m_x * d_fx) / (f_x ** 2)
+        return G, B_x

     def aic_hurvich(self, bw, func=None):
         """
@@ -189,10 +272,31 @@ class KernelReg(GenericKDE):
         ----------
         See ch.2 in [1] and p.35 in [2].
         """
-        pass
+        H = np.empty((self.nobs, self.nobs))
+        for j in range(self.nobs):
+            H[:, j] = gpke(bw, data=self.exog, data_predict=self.exog[j,:],
+                           ckertype=self.ckertype, ukertype=self.ukertype,
+                           okertype=self.okertype, var_type=self.var_type,
+                           tosum=False)
+
+        denom = H.sum(axis=1)
+        H = H / denom
+        gx = KernelReg(endog=self.endog, exog=self.exog, var_type=self.var_type,
+                       reg_type=self.reg_type, bw=bw,
+                       defaults=EstimatorSettings(efficient=False)).fit()[0]
+        gx = np.reshape(gx, (self.nobs, 1))
+        sigma = ((self.endog - gx)**2).sum(axis=0) / float(self.nobs)
+
+        frac = (1 + np.trace(H) / float(self.nobs)) / \
+               (1 - (np.trace(H) + 2) / float(self.nobs))
+        #siga = np.dot(self.endog.T, (I - H).T)
+        #sigb = np.dot((I - H), self.endog)
+        #sigma = np.dot(siga, sigb) / float(self.nobs)
+        aic = np.log(sigma) + frac
+        return aic

     def cv_loo(self, bw, func):
-        """
+        r"""
         The cross-validation function with leave-one-out estimator.

         Parameters
@@ -215,15 +319,25 @@ class KernelReg(GenericKDE):

         For details see p.35 in [2]

-        .. math:: CV(h)=n^{-1}\\sum_{i=1}^{n}(Y_{i}-g_{-i}(X_{i}))^{2}
+        .. math:: CV(h)=n^{-1}\sum_{i=1}^{n}(Y_{i}-g_{-i}(X_{i}))^{2}

         where :math:`g_{-i}(X_{i})` is the leave-one-out estimator of g(X)
         and :math:`h` is the vector of bandwidths
         """
-        pass
+        LOO_X = LeaveOneOut(self.exog)
+        LOO_Y = LeaveOneOut(self.endog).__iter__()
+        L = 0
+        for ii, X_not_i in enumerate(LOO_X):
+            Y = next(LOO_Y)
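+            # Negating both exog and the prediction point leaves the fitted
+            # mean unchanged, since the kernels depend only on X_i == x or
+            # on |X_i - x|.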
+            G = func(bw, endog=Y, exog=-X_not_i,
+                     data_predict=-self.exog[ii, :])[0]
+            L += (self.endog[ii] - G) ** 2
+
+        # Note: There might be a way to vectorize this. See p.72 in [1]
+        return L / self.nobs

     def r_squared(self):
-        """
+        r"""
         Returns the R-Squared for the nonparametric regression.

         Notes
@@ -231,14 +345,20 @@ class KernelReg(GenericKDE):
         For more details see p.45 in [2]
         The R-Squared is calculated by:

-        .. math:: R^{2}=\\frac{\\left[\\sum_{i=1}^{n}
-            (Y_{i}-\\bar{y})(\\hat{Y_{i}}-\\bar{y}\\right]^{2}}{\\sum_{i=1}^{n}
-            (Y_{i}-\\bar{y})^{2}\\sum_{i=1}^{n}(\\hat{Y_{i}}-\\bar{y})^{2}},
+        .. math:: R^{2}=\frac{\left[\sum_{i=1}^{n}
+            (Y_{i}-\bar{y})(\hat{Y_{i}}-\bar{y})\right]^{2}}{\sum_{i=1}^{n}
+            (Y_{i}-\bar{y})^{2}\sum_{i=1}^{n}(\hat{Y_{i}}-\bar{y})^{2}},

-        where :math:`\\hat{Y_{i}}` is the mean calculated in `fit` at the exog
+        where :math:`\hat{Y_{i}}` is the mean calculated in `fit` at the exog
         points.
         """
-        pass
+        Y = np.squeeze(self.endog)
+        Yhat = self.fit()[0]
+        Y_bar = np.mean(Yhat)
+        R2_numer = (((Y - Y_bar) * (Yhat - Y_bar)).sum())**2
+        R2_denom = ((Y - Y_bar)**2).sum(axis=0) * \
+                   ((Yhat - Y_bar)**2).sum(axis=0)
+        return R2_numer / R2_denom

     def fit(self, data_predict=None):
         """
@@ -257,7 +377,23 @@ class KernelReg(GenericKDE):
         mfx : ndarray
             The marginal effects, i.e. the partial derivatives of the mean.
         """
-        pass
+        func = self.est[self.reg_type]
+        if data_predict is None:
+            data_predict = self.exog
+        else:
+            data_predict = _adjust_shape(data_predict, self.k_vars)
+
+        N_data_predict = np.shape(data_predict)[0]
+        mean = np.empty((N_data_predict,))
+        mfx = np.empty((N_data_predict, self.k_vars))
+        for i in range(N_data_predict):
+            mean_mfx = func(self.bw, self.endog, self.exog,
+                            data_predict=data_predict[i, :])
+            mean[i] = np.squeeze(mean_mfx[0])
+            mfx_c = np.squeeze(mean_mfx[1])
+            mfx[i, :] = mfx_c
+
+        return mean, mfx

     def sig_test(self, var_pos, nboot=50, nested_res=25, pivot=False):
         """
@@ -278,21 +414,33 @@ class KernelReg(GenericKDE):
                - `***` : at 99% confidence level
                 - "Not Significant" : if not significant
         """
-        pass
+        var_pos = np.asarray(var_pos)
+        ix_cont, ix_ord, ix_unord = _get_type_pos(self.var_type)
+        if np.any(ix_cont[var_pos]):
+            if np.any(ix_ord[var_pos]) or np.any(ix_unord[var_pos]):
+                raise ValueError("Discrete variable in hypothesis. Must be continuous")
+
+            Sig = TestRegCoefC(self, var_pos, nboot, nested_res, pivot)
+        else:
+            Sig = TestRegCoefD(self, var_pos, nboot)
+
+        return Sig.sig

     def __repr__(self):
         """Provide something sane to print."""
-        rpr = 'KernelReg instance\n'
-        rpr += 'Number of variables: k_vars = ' + str(self.k_vars) + '\n'
-        rpr += 'Number of samples:   N = ' + str(self.nobs) + '\n'
-        rpr += 'Variable types:      ' + self.var_type + '\n'
-        rpr += 'BW selection method: ' + self._bw_method + '\n'
-        rpr += 'Estimator type: ' + self.reg_type + '\n'
+        rpr = "KernelReg instance\n"
+        rpr += "Number of variables: k_vars = " + str(self.k_vars) + "\n"
+        rpr += "Number of samples:   N = " + str(self.nobs) + "\n"
+        rpr += "Variable types:      " + self.var_type + "\n"
+        rpr += "BW selection method: " + self._bw_method + "\n"
+        rpr += "Estimator type: " + self.reg_type + "\n"
         return rpr

     def _get_class_vars_type(self):
         """Helper method to be able to pass needed vars to _compute_subset."""
-        pass
+        class_type = 'KernelReg'
+        class_vars = (self.var_type, self.k_vars, self.reg_type)
+        return class_type, class_vars

     def _compute_dispersion(self, data):
         """
@@ -306,7 +454,8 @@ class KernelReg(GenericKDE):
         In the notes on bwscaling option in npreg, npudens, npcdens there is
         a discussion on the measure of dispersion
         """
-        pass
+        data = data[:, 1:]
+        return _compute_min_std_IQR(data)


 class KernelCensoredReg(KernelReg):
@@ -355,21 +504,22 @@ class KernelCensoredReg(KernelReg):
     bw : array_like
         The bandwidth parameters
     """
-
     def __init__(self, endog, exog, var_type, reg_type, bw='cv_ls',
-        ckertype='gaussian', ukertype='aitchison_aitken_reg', okertype=
-        'wangryzin_reg', censor_val=0, defaults=None):
+                 ckertype='gaussian',
+                 ukertype='aitchison_aitken_reg',
+                 okertype='wangryzin_reg',
+                 censor_val=0, defaults=None):
         self.var_type = var_type
         self.data_type = var_type
         self.reg_type = reg_type
         self.ckertype = ckertype
         self.okertype = okertype
         self.ukertype = ukertype
-        if not (self.ckertype in kernel_func and self.ukertype in
-            kernel_func and self.okertype in kernel_func):
-            raise ValueError(
-                'user specified kernel must be a supported kernel from statsmodels.nonparametric.kernels.'
-                )
+        if not (self.ckertype in kernel_func and self.ukertype in kernel_func
+                and self.okertype in kernel_func):
+            raise ValueError('user specified kernel must be a supported '
+                             'kernel from statsmodels.nonparametric.kernels.')
+
         self.k_vars = len(self.var_type)
         self.endog = _adjust_shape(endog, 1)
         self.exog = _adjust_shape(exog, self.k_vars)
@@ -383,19 +533,38 @@ class KernelCensoredReg(KernelReg):
             self.censored(censor_val)
         else:
             self.W_in = np.ones((self.nobs, 1))
+
         if not self.efficient:
             self.bw = self._compute_reg_bw(bw)
         else:
             self.bw = self._compute_efficient(bw)

+    def censored(self, censor_val):
+        # see pp. 341-344 in [1]
+        self.d = (self.endog != censor_val) * 1.
+        ix = np.argsort(np.squeeze(self.endog))
+        self.sortix = ix
+        self.sortix_rev = np.zeros(ix.shape, int)
+        self.sortix_rev[ix] = np.arange(len(ix))
+        self.endog = np.squeeze(self.endog[ix])
+        self.endog = _adjust_shape(self.endog, 1)
+        self.exog = np.squeeze(self.exog[ix])
+        self.d = np.squeeze(self.d[ix])
+        self.W_in = np.empty((self.nobs, 1))
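+        # Kaplan-Meier-type weights on the endog-sorted sample: P accumulates
+        # the product term for observation i, and censored observations
+        # (d == 0) receive zero weight.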
+        for i in range(1, self.nobs + 1):
+            P = 1
+            for j in range(1, i):
+                P *= ((self.nobs - j) /
+                      (float(self.nobs) - j + 1)) ** self.d[j - 1]
+            self.W_in[i - 1, 0] = (P * self.d[i - 1] /
+                                   (float(self.nobs) - i + 1))
+
     def __repr__(self):
         """Provide something sane to print."""
-        rpr = 'KernelCensoredReg instance\n'
-        rpr += 'Number of variables: k_vars = ' + str(self.k_vars) + '\n'
-        rpr += 'Number of samples:   nobs = ' + str(self.nobs) + '\n'
-        rpr += 'Variable types:      ' + self.var_type + '\n'
-        rpr += 'BW selection method: ' + self._bw_method + '\n'
-        rpr += 'Estimator type: ' + self.reg_type + '\n'
+        rpr = "KernelCensoredReg instance\n"
+        rpr += "Number of variables: k_vars = " + str(self.k_vars) + "\n"
+        rpr += "Number of samples:   nobs = " + str(self.nobs) + "\n"
+        rpr += "Variable types:      " + self.var_type + "\n"
+        rpr += "BW selection method: " + self._bw_method + "\n"
+        rpr += "Estimator type: " + self.reg_type + "\n"
         return rpr

     def _est_loc_linear(self, bw, endog, exog, data_predict, W):
@@ -424,10 +593,40 @@ class KernelCensoredReg(KernelReg):
         See p. 81 in [1] and p.38 in [2] for the formulas
         Unlike other methods, this one requires that data_predict be 1D
         """
-        pass
+        nobs, k_vars = exog.shape
+        ker = gpke(bw, data=exog, data_predict=data_predict,
+                   var_type=self.var_type,
+                   ckertype=self.ckertype,
+                   ukertype=self.ukertype,
+                   okertype=self.okertype, tosum=False)
+        # Create the matrix on p.492 in [7], after the multiplication w/ K_h,ij
+        # See also p. 38 in [2]
+
+        # Convert ker to a 2-D array to make matrix operations below work
+        ker = W * ker[:, np.newaxis]
+
+        M12 = exog - data_predict
+        M22 = np.dot(M12.T, M12 * ker)
+        M12 = (M12 * ker).sum(axis=0)
+        M = np.empty((k_vars + 1, k_vars + 1))
+        M[0, 0] = ker.sum()
+        M[0, 1:] = M12
+        M[1:, 0] = M12
+        M[1:, 1:] = M22
+
+        ker_endog = ker * endog
+        V = np.empty((k_vars + 1, 1))
+        V[0, 0] = ker_endog.sum()
+        V[1:, 0] = ((exog - data_predict) * ker_endog).sum(axis=0)
+
+        mean_mfx = np.dot(np.linalg.pinv(M), V)
+        mean = mean_mfx[0]
+        mfx = mean_mfx[1:, :]
+        return mean, mfx

     def cv_loo(self, bw, func):
-        """
+        r"""
         The cross-validation function with leave-one-out
         estimator

@@ -453,18 +652,47 @@ class KernelCensoredReg(KernelReg):

         For details see p.35 in [2]

-        .. math:: CV(h)=n^{-1}\\sum_{i=1}^{n}(Y_{i}-g_{-i}(X_{i}))^{2}
+        .. math:: CV(h)=n^{-1}\sum_{i=1}^{n}(Y_{i}-g_{-i}(X_{i}))^{2}

         where :math:`g_{-i}(X_{i})` is the leave-one-out estimator of g(X)
         and :math:`h` is the vector of bandwidths
         """
-        pass
+        LOO_X = LeaveOneOut(self.exog)
+        LOO_Y = LeaveOneOut(self.endog).__iter__()
+        LOO_W = LeaveOneOut(self.W_in).__iter__()
+        L = 0
+        for ii, X_not_i in enumerate(LOO_X):
+            Y = next(LOO_Y)
+            w = next(LOO_W)
+            G = func(bw, endog=Y, exog=-X_not_i,
+                     data_predict=-self.exog[ii, :], W=w)[0]
+            L += (self.endog[ii] - G) ** 2
+
+        # Note: There might be a way to vectorize this. See p.72 in [1]
+        return L / self.nobs

     def fit(self, data_predict=None):
         """
         Returns the marginal effects at the data_predict points.
         """
-        pass
+        func = self.est[self.reg_type]
+        if data_predict is None:
+            data_predict = self.exog
+        else:
+            data_predict = _adjust_shape(data_predict, self.k_vars)
+
+        N_data_predict = np.shape(data_predict)[0]
+        mean = np.empty((N_data_predict,))
+        mfx = np.empty((N_data_predict, self.k_vars))
+        for i in range(N_data_predict):
+            mean_mfx = func(self.bw, self.endog, self.exog,
+                            data_predict=data_predict[i, :],
+                            W=self.W_in)
+            mean[i] = np.squeeze(mean_mfx[0])
+            mfx_c = np.squeeze(mean_mfx[1])
+            mfx[i, :] = mfx_c
+
+        return mean, mfx


 class TestRegCoefC:
@@ -517,9 +745,11 @@ class TestRegCoefC:

     Chapter 12 in [1].
     """
-
-    def __init__(self, model, test_vars, nboot=400, nested_res=400, pivot=False
-        ):
+    # Significance of continuous vars in nonparametric regression
+    # Racine: Consistent Significance Testing for Nonparametric Regression
+    # Journal of Business & Economics Statistics
+    def __init__(self, model, test_vars, nboot=400, nested_res=400,
+                 pivot=False):
         self.nboot = nboot
         self.nres = nested_res
         self.test_vars = test_vars
@@ -534,15 +764,36 @@ class TestRegCoefC:
         self.pivot = pivot
         self.run()

+    def run(self):
+        self.test_stat = self._compute_test_stat(self.endog, self.exog)
+        self.sig = self._compute_sig()
+
     def _compute_test_stat(self, Y, X):
         """
         Computes the test statistic.  See p.371 in [8].
         """
-        pass
+        lam = self._compute_lambda(Y, X)
+        t = lam
+        if self.pivot:
+            se_lam = self._compute_se_lambda(Y, X)
+            t = lam / float(se_lam)
+
+        return t

     def _compute_lambda(self, Y, X):
         """Computes only lambda -- the main part of the test statistic"""
-        pass
+        n = np.shape(X)[0]
+        Y = _adjust_shape(Y, 1)
+        X = _adjust_shape(X, self.k_vars)
+        b = KernelReg(Y, X, self.var_type, self.model.reg_type, self.bw,
+                      defaults=EstimatorSettings(efficient=False)).fit()[1]
+
+        b = b[:, self.test_vars]
+        b = np.reshape(b, (n, len(self.test_vars)))
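+        # lam averages the squared marginal effects of the tested variables;
+        # under the null of no effect these derivatives are close to zero.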
+        #fct = np.std(b)  # Pivot the statistic by dividing by SE
+        fct = 1.  # no pivot here; see _compute_test_stat for the pivoted version
+        lam = ((b / fct) ** 2).sum() / float(n)
+        return lam

     def _compute_se_lambda(self, Y, X):
         """
@@ -551,7 +802,16 @@ class TestRegCoefC:
         Bootstrapping works better with estimating pivotal statistics
         but slows down computation significantly.
         """
-        pass
+        n = np.shape(Y)[0]
+        lam = np.empty(shape=(self.nres,))
+        for i in range(self.nres):
+            ind = np.random.randint(0, n, size=(n, 1))
+            Y1 = Y[ind, 0]
+            X1 = X[ind, :]
+            lam[i] = self._compute_lambda(Y1, X1)
+
+        se_lambda = np.std(lam)
+        return se_lambda

     def _compute_sig(self):
         """
@@ -561,7 +821,34 @@ class TestRegCoefC:
         bootstrapping the sample.  The null hypothesis is rejected if the test
         statistic is larger than the 90, 95, 99 percentiles.
         """
-        pass
+        t_dist = np.empty(shape=(self.nboot, ))
+        Y = self.endog
+        X = copy.deepcopy(self.exog)
+        n = np.shape(Y)[0]
+
+        X[:, self.test_vars] = np.mean(X[:, self.test_vars], axis=0)
+        # Calculate the restricted mean. See p. 372 in [8]
+        M = KernelReg(Y, X, self.var_type, self.model.reg_type, self.bw,
+                      defaults=EstimatorSettings(efficient=False)).fit()[0]
+        M = np.reshape(M, (n, 1))
+        e = Y - M
+        e = e - np.mean(e)  # recenter residuals
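+        # Bootstrap under the null: resample the centered restricted-model
+        # residuals, add them back to the restricted mean M and recompute
+        # the test statistic for each bootstrap sample.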
+        for i in range(self.nboot):
+            ind = np.random.randint(0, n, size=(n, 1))
+            e_boot = e[ind, 0]
+            Y_boot = M + e_boot
+            t_dist[i] = self._compute_test_stat(Y_boot, self.exog)
+
+        self.t_dist = t_dist
+        sig = "Not Significant"
+        if self.test_stat > mquantiles(t_dist, 0.9):
+            sig = "*"
+        if self.test_stat > mquantiles(t_dist, 0.95):
+            sig = "**"
+        if self.test_stat > mquantiles(t_dist, 0.99):
+            sig = "***"
+
+        return sig


 class TestRegCoefD(TestRegCoefC):
@@ -604,15 +891,73 @@ class TestRegCoefD(TestRegCoefC):

     def _compute_test_stat(self, Y, X):
         """Computes the test statistic"""
-        pass
+
+        dom_x = np.sort(np.unique(self.exog[:, self.test_vars]))
+
+        n = np.shape(X)[0]
+        model = KernelReg(Y, X, self.var_type, self.model.reg_type, self.bw,
+                          defaults=EstimatorSettings(efficient=False))
+        X1 = copy.deepcopy(X)
+        X1[:, self.test_vars] = 0
+
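+        # Compare the conditional mean at each level of the tested discrete
+        # variable with the mean at the base level (0) and average the
+        # accumulated squared differences.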
+        m0 = model.fit(data_predict=X1)[0]
+        m0 = np.reshape(m0, (n, 1))
+        zvec = np.zeros((n, 1))
+        for i in dom_x[1:]:
+            X1[:, self.test_vars] = i
+            m1 = model.fit(data_predict=X1)[0]
+            m1 = np.reshape(m1, (n, 1))
+            zvec += (m1 - m0) ** 2
+
+        avg = zvec.sum(axis=0) / float(n)
+        return avg

     def _compute_sig(self):
         """Calculates the significance level of the variable tested"""
-        pass
+
+        m = self._est_cond_mean()
+        Y = self.endog
+        X = self.exog
+        n = np.shape(X)[0]
+        u = Y - m
+        u = u - np.mean(u)  # center
+        fct1 = (1 - 5**0.5) / 2.
+        fct2 = (1 + 5**0.5) / 2.
+        u1 = fct1 * u
+        u2 = fct2 * u
+        r = fct2 / (5 ** 0.5)
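+        # Two-point wild bootstrap: each centered residual is scaled by
+        # (1 - sqrt(5))/2 with probability r and by (1 + sqrt(5))/2 otherwise.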
+        I_dist = np.empty((self.nboot, 1))
+        for j in range(self.nboot):
+            u_boot = copy.deepcopy(u2)
+
+            prob = np.random.uniform(0, 1, size=(n, 1))
+            ind = prob < r
+            u_boot[ind] = u1[ind]
+            Y_boot = m + u_boot
+            I_dist[j] = self._compute_test_stat(Y_boot, X)
+
+        sig = "Not Significant"
+        if self.test_stat > mquantiles(I_dist, 0.9):
+            sig = "*"
+        if self.test_stat > mquantiles(I_dist, 0.95):
+            sig = "**"
+        if self.test_stat > mquantiles(I_dist, 0.99):
+            sig = "***"
+
+        return sig

     def _est_cond_mean(self):
         """
         Calculates the expected conditional mean
         m(X, Z=l) for all possible l
         """
-        pass
+        self.dom_x = np.sort(np.unique(self.exog[:, self.test_vars]))
+        X = copy.deepcopy(self.exog)
+        m = 0
+        for i in self.dom_x:
+            X[:, self.test_vars] = i
+            m += self.model.fit(data_predict=X)[0]
+
+        m = m / float(len(self.dom_x))
+        m = np.reshape(m, (np.shape(self.exog)[0], 1))
+        return m
diff --git a/statsmodels/nonparametric/kernels.py b/statsmodels/nonparametric/kernels.py
index 24ad6e2d3..346b1846e 100644
--- a/statsmodels/nonparametric/kernels.py
+++ b/statsmodels/nonparametric/kernels.py
@@ -10,12 +10,18 @@ kernel density estimation much easier.

 NOTE: As it is, this module does not interact with the existing API
 """
+
 import numpy as np
 from scipy.special import erf


+#TODO:
+# - make sure we only receive int input for wang-ryzin and aitchison-aitken
+# - Check for the scalar Xi case everywhere
+
+
 def aitchison_aitken(h, Xi, x, num_levels=None):
-    """
+    r"""
     The Aitchison-Aitken kernel, used for unordered discrete random variables.

     Parameters
@@ -39,7 +45,7 @@ def aitchison_aitken(h, Xi, x, num_levels=None):
     Notes
     -----
     See p.18 of [2]_ for details.  The value of the kernel L if :math:`X_{i}=x`
-    is :math:`1-\\lambda`, otherwise it is :math:`\\frac{\\lambda}{c-1}`.
+    is :math:`1-\lambda`, otherwise it is :math:`\frac{\lambda}{c-1}`.
     Here :math:`c` is the number of levels plus one of the RV.

     References
@@ -49,11 +55,18 @@ def aitchison_aitken(h, Xi, x, num_levels=None):
     .. [*] Racine, Jeff. "Nonparametric Econometrics: A Primer," Foundation
            and Trends in Econometrics: Vol 3: No 1, pp1-88., 2008.
     """
-    pass
+    Xi = Xi.reshape(Xi.size)  # seems needed in case Xi is scalar
+    if num_levels is None:
+        num_levels = np.asarray(np.unique(Xi).size)
+
+    kernel_value = np.ones(Xi.size) * h / (num_levels - 1)
+    idx = Xi == x
+    kernel_value[idx] = (idx * (1 - h))[idx]
+    return kernel_value


 def wang_ryzin(h, Xi, x):
-    """
+    r"""
     The Wang-Ryzin kernel, used for ordered discrete random variables.

     Parameters
@@ -73,8 +86,8 @@ def wang_ryzin(h, Xi, x):
     Notes
     -----
     See p. 19 in [1]_ for details.  The value of the kernel L if
-    :math:`X_{i}=x` is :math:`1-\\lambda`, otherwise it is
-    :math:`\\frac{1-\\lambda}{2}\\lambda^{|X_{i}-x|}`, where :math:`\\lambda` is
+    :math:`X_{i}=x` is :math:`1-\lambda`, otherwise it is
+    :math:`\frac{1-\lambda}{2}\lambda^{|X_{i}-x|}`, where :math:`\lambda` is
     the bandwidth.

     References
@@ -85,7 +98,11 @@ def wang_ryzin(h, Xi, x):
     .. [*] M.-C. Wang and J. van Ryzin, "A class of smooth estimators for
            discrete distributions", Biometrika, vol. 68, pp. 301-309, 1981.
     """
-    pass
+    Xi = Xi.reshape(Xi.size)  # seems needed in case Xi is scalar
+    kernel_value = 0.5 * (1 - h) * (h ** abs(Xi - x))
+    idx = Xi == x
+    kernel_value[idx] = (idx * (1 - h))[idx]
+    return kernel_value


 def gaussian(h, Xi, x):
@@ -105,7 +122,7 @@ def gaussian(h, Xi, x):
     kernel_value : ndarray, shape (nobs, K)
         The value of the kernel function at each training point for each var.
     """
-    pass
+    return (1. / np.sqrt(2 * np.pi)) * np.exp(-(Xi - x)**2 / (h**2 * 2.))


 def tricube(h, Xi, x):
@@ -125,12 +142,66 @@ def tricube(h, Xi, x):
     kernel_value : ndarray, shape (nobs, K)
         The value of the kernel function at each training point for each var.
     """
-    pass
+    u = (Xi - x) / h
+    u[np.abs(u) > 1] = 0
+    return (70. / 81) * (1 - np.abs(u)**3)**3


 def gaussian_convolution(h, Xi, x):
     """ Calculates the Gaussian Convolution Kernel """
-    pass
+    return (1. / np.sqrt(4 * np.pi)) * np.exp(- (Xi - x)**2 / (h**2 * 4.))
+
+
+def wang_ryzin_convolution(h, Xi, Xj):
+    # Analogue of the convolution case with the Gaussian kernel, although
+    # it is not exactly a convolution. Think of a better name.
+    ordered = np.zeros(Xi.size)
+    for x in np.unique(Xi):
+        ordered += wang_ryzin(h, Xi, x) * wang_ryzin(h, Xj, x)
+
+    return ordered
+
+
+def aitchison_aitken_convolution(h, Xi, Xj):
+    Xi_vals = np.unique(Xi)
+    ordered = np.zeros(Xi.size)
+    num_levels = Xi_vals.size
+    for x in Xi_vals:
+        ordered += aitchison_aitken(h, Xi, x, num_levels=num_levels) * \
+                   aitchison_aitken(h, Xj, x, num_levels=num_levels)
+
+    return ordered
+
+
+def gaussian_cdf(h, Xi, x):
+    return 0.5 * h * (1 + erf((x - Xi) / (h * np.sqrt(2))))
+
+
+def aitchison_aitken_cdf(h, Xi, x_u):
+    x_u = int(x_u)
+    Xi_vals = np.unique(Xi)
+    ordered = np.zeros(Xi.size)
+    num_levels = Xi_vals.size
+    for x in Xi_vals:
+        if x <= x_u:  #FIXME: why a comparison for unordered variables?
+            ordered += aitchison_aitken(h, Xi, x, num_levels=num_levels)
+
+    return ordered
+
+
+def wang_ryzin_cdf(h, Xi, x_u):
+    ordered = np.zeros(Xi.size)
+    for x in np.unique(Xi):
+        if x <= x_u:
+            ordered += wang_ryzin(h, Xi, x)
+
+    return ordered
+
+
+def d_gaussian(h, Xi, x):
+    # The derivative of the Gaussian Kernel
+    return 2 * (Xi - x) * gaussian(h, Xi, x) / h**2


 def aitchison_aitken_reg(h, Xi, x):
@@ -139,7 +210,11 @@ def aitchison_aitken_reg(h, Xi, x):

     Suggested by Li and Racine.
     """
-    pass
+    kernel_value = np.ones(Xi.size)
+    ix = Xi != x
+    inDom = ix * h
+    kernel_value[ix] = inDom[ix]
+    return kernel_value


 def wang_ryzin_reg(h, Xi, x):
@@ -148,4 +223,4 @@ def wang_ryzin_reg(h, Xi, x):

     Suggested by Li and Racine in [1] ch.4
     """
-    pass
+    return h ** abs(Xi - x)
diff --git a/statsmodels/nonparametric/kernels_asymmetric.py b/statsmodels/nonparametric/kernels_asymmetric.py
index d4d9ec41b..e6f769a93 100644
--- a/statsmodels/nonparametric/kernels_asymmetric.py
+++ b/statsmodels/nonparametric/kernels_asymmetric.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Asymmetric kernels for R+ and unit interval

 References
@@ -40,9 +41,12 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from scipy import special, stats
-doc_params = """Parameters
+
+doc_params = """\
+Parameters
     ----------
     x : array_like, float
         Points for which density is evaluated. ``x`` can be scalar or 1-dim.
@@ -91,7 +95,36 @@ def pdf_kernel_asym(x, sample, bw, kernel_type, weights=None, batch_size=10):
     pdf : float or ndarray
         Estimate of pdf at points x. ``pdf`` has the same size or shape as x.
     """
-    pass
+
+    if callable(kernel_type):
+        kfunc = kernel_type
+    else:
+        kfunc = kernel_dict_pdf[kernel_type]
+
+    batch_size = batch_size * 1000
+
+    if np.size(x) * len(sample) < batch_size:
+        # no batch-loop
+        if np.size(x) > 1:
+            x = np.asarray(x)[:, None]
+
+        pdfi = kfunc(x, sample, bw)
+        if weights is None:
+            pdf = pdfi.mean(-1)
+        else:
+            pdf = pdfi @ weights
+    else:
+        # batch, designed for 1-d x
+        if weights is None:
+            weights = np.ones(len(sample)) / len(sample)
+
+        k = batch_size // len(sample)
+        n = len(x) // k
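+        # each batch evaluates roughly k * len(sample) ~ batch_size kernel
+        # values, which bounds the size of the temporary kernel matrix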
+        x_split = np.array_split(x, n)
+        pdf = np.concatenate([(kfunc(xi[:, None], sample, bw) @ weights)
+                              for xi in x_split])
+
+    return pdf


 def cdf_kernel_asym(x, sample, bw, kernel_type, weights=None, batch_size=10):
@@ -129,11 +162,45 @@ def cdf_kernel_asym(x, sample, bw, kernel_type, weights=None, batch_size=10):
     cdf : float or ndarray
         Estimate of cdf at points x. ``cdf`` has the same size or shape as x.
     """
-    pass
+
+    if callable(kernel_type):
+        kfunc = kernel_type
+    else:
+        kfunc = kernel_dict_cdf[kernel_type]
+
+    batch_size = batch_size * 1000
+
+    if np.size(x) * len(sample) < batch_size:
+        # no batch-loop
+        if np.size(x) > 1:
+            x = np.asarray(x)[:, None]
+
+        cdfi = kfunc(x, sample, bw)
+        if weights is None:
+            cdf = cdfi.mean(-1)
+        else:
+            cdf = cdfi @ weights
+    else:
+        # batch, designed for 1-d x
+        if weights is None:
+            weights = np.ones(len(sample)) / len(sample)
+
+        k = batch_size // len(sample)
+        n = len(x) // k
+        x_split = np.array_split(x, n)
+        cdf = np.concatenate([(kfunc(xi[:, None], sample, bw) @ weights)
+                              for xi in x_split])
+
+    return cdf
+
+
+def kernel_pdf_beta(x, sample, bw):
+    # Beta kernel for density, pdf, estimation
+    return stats.beta.pdf(sample, x / bw + 1, (1 - x) / bw + 1)


-kernel_pdf_beta.__doc__ = (
-    """    Beta kernel for density, pdf, estimation.
+kernel_pdf_beta.__doc__ = """\
+    Beta kernel for density, pdf, estimation.

     {doc_params}

@@ -146,10 +213,16 @@ kernel_pdf_beta.__doc__ = (
     .. [2] Chen, Song Xi. 1999. “Beta Kernel Estimators for Density Functions.”
        Computational Statistics & Data Analysis 31 (2): 131–45.
        https://doi.org/10.1016/S0167-9473(99)00010-9.
-    """
-    .format(doc_params=doc_params))
-kernel_cdf_beta.__doc__ = (
-    """    Beta kernel for cumulative distribution, cdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_cdf_beta(x, sample, bw):
+    # Beta kernel for cumulative distribution, cdf, estimation
+    return stats.beta.sf(sample, x / bw + 1, (1 - x) / bw + 1)
+
+
+kernel_cdf_beta.__doc__ = """\
+    Beta kernel for cumulative distribution, cdf, estimation.

     {doc_params}

@@ -162,10 +235,48 @@ kernel_cdf_beta.__doc__ = (
     .. [2] Chen, Song Xi. 1999. “Beta Kernel Estimators for Density Functions.”
        Computational Statistics & Data Analysis 31 (2): 131–45.
        https://doi.org/10.1016/S0167-9473(99)00010-9.
-    """
-    .format(doc_params=doc_params))
-kernel_pdf_beta2.__doc__ = (
-    """    Beta kernel for density, pdf, estimation with boundary corrections.
+    """.format(doc_params=doc_params)
+
+
+def kernel_pdf_beta2(x, sample, bw):
+    # Beta kernel for density, pdf, estimation with boundary corrections
+
+    # a = 2 * bw**2 + 2.5 -
+    #     np.sqrt(4 * bw**4 + 6 * bw**2 + 2.25 - x**2 - x / bw)
+    # terms a1 and a2 are independent of x
+    a1 = 2 * bw**2 + 2.5
+    a2 = 4 * bw**4 + 6 * bw**2 + 2.25
+
+    if np.size(x) == 1:
+        # without vectorizing:
+        if x < 2 * bw:
+            a = a1 - np.sqrt(a2 - x**2 - x / bw)
+            pdf = stats.beta.pdf(sample, a, (1 - x) / bw)
+        elif x > (1 - 2 * bw):
+            x_ = 1 - x
+            a = a1 - np.sqrt(a2 - x_**2 - x_ / bw)
+            pdf = stats.beta.pdf(sample, x / bw, a)
+        else:
+            pdf = stats.beta.pdf(sample, x / bw, (1 - x) / bw)
+    else:
+        alpha = x / bw
+        beta = (1 - x) / bw
+
+        mask_low = x < 2 * bw
+        x_ = x[mask_low]
+        alpha[mask_low] = a1 - np.sqrt(a2 - x_**2 - x_ / bw)
+
+        mask_upp = x > (1 - 2 * bw)
+        x_ = 1 - x[mask_upp]
+        beta[mask_upp] = a1 - np.sqrt(a2 - x_**2 - x_ / bw)
+
+        pdf = stats.beta.pdf(sample, alpha, beta)
+
+    return pdf
+
+
+kernel_pdf_beta2.__doc__ = """\
+    Beta kernel for density, pdf, estimation with boundary corrections.

     {doc_params}

@@ -178,10 +289,48 @@ kernel_pdf_beta2.__doc__ = (
     .. [2] Chen, Song Xi. 1999. “Beta Kernel Estimators for Density Functions.”
        Computational Statistics & Data Analysis 31 (2): 131–45.
        https://doi.org/10.1016/S0167-9473(99)00010-9.
-    """
-    .format(doc_params=doc_params))
-kernel_cdf_beta2.__doc__ = (
-    """    Beta kernel for cdf estimation with boundary correction.
+    """.format(doc_params=doc_params)
+
+
+def kernel_cdf_beta2(x, sample, bw):
+    # Beta kernel for cdf estimation with boundary correction
+
+    # a = 2 * bw**2 + 2.5 -
+    #     np.sqrt(4 * bw**4 + 6 * bw**2 + 2.25 - x**2 - x / bw)
+    # terms a1 and a2 are independent of x
+    a1 = 2 * bw**2 + 2.5
+    a2 = 4 * bw**4 + 6 * bw**2 + 2.25
+
+    if np.size(x) == 1:
+        # without vectorizing:
+        if x < 2 * bw:
+            a = a1 - np.sqrt(a2 - x**2 - x / bw)
+            pdf = stats.beta.sf(sample, a, (1 - x) / bw)
+        elif x > (1 - 2 * bw):
+            x_ = 1 - x
+            a = a1 - np.sqrt(a2 - x_**2 - x_ / bw)
+            pdf = stats.beta.sf(sample, x / bw, a)
+        else:
+            pdf = stats.beta.sf(sample, x / bw, (1 - x) / bw)
+    else:
+        alpha = x / bw
+        beta = (1 - x) / bw
+        mask_low = x < 2 * bw
+
+        x_ = x[mask_low]
+        alpha[mask_low] = a1 - np.sqrt(a2 - x_**2 - x_ / bw)
+
+        mask_upp = x > (1 - 2 * bw)
+        x_ = 1 - x[mask_upp]
+        beta[mask_upp] = a1 - np.sqrt(a2 - x_**2 - x_ / bw)
+
+        pdf = stats.beta.sf(sample, alpha, beta)
+
+    return pdf
+
+
+kernel_cdf_beta2.__doc__ = """\
+    Beta kernel for cdf estimation with boundary correction.

     {doc_params}

@@ -194,10 +343,17 @@ kernel_cdf_beta2.__doc__ = (
     .. [2] Chen, Song Xi. 1999. “Beta Kernel Estimators for Density Functions.”
        Computational Statistics & Data Analysis 31 (2): 131–45.
        https://doi.org/10.1016/S0167-9473(99)00010-9.
-    """
-    .format(doc_params=doc_params))
-kernel_pdf_gamma.__doc__ = (
-    """    Gamma kernel for density, pdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_pdf_gamma(x, sample, bw):
+    # Gamma kernel for density, pdf, estimation
+    pdfi = stats.gamma.pdf(sample, x / bw + 1, scale=bw)
+    return pdfi
+
+
+kernel_pdf_gamma.__doc__ = """\
+    Gamma kernel for density, pdf, estimation.

     {doc_params}

@@ -211,10 +367,18 @@ kernel_pdf_gamma.__doc__ = (
        Gamma Kernels.”
        Annals of the Institute of Statistical Mathematics 52 (3): 471–80.
        https://doi.org/10.1023/A:1004165218295.
-    """
-    .format(doc_params=doc_params))
-kernel_cdf_gamma.__doc__ = (
-    """    Gamma kernel for cumulative distribution, cdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_cdf_gamma(x, sample, bw):
+    # Gamma kernel for density, pdf, estimation
+    # kernel cdf uses the survival function, but I don't know why.
+    cdfi = stats.gamma.sf(sample, x / bw + 1, scale=bw)
+    return cdfi
+
+
+kernel_cdf_gamma.__doc__ = """\
+    Gamma kernel for cumulative distribution, cdf, estimation.

     {doc_params}

@@ -228,8 +392,7 @@ kernel_cdf_gamma.__doc__ = (
        Gamma Kernels.”
        Annals of the Institute of Statistical Mathematics 52 (3): 471–80.
        https://doi.org/10.1023/A:1004165218295.
-    """
-    .format(doc_params=doc_params))
+    """.format(doc_params=doc_params)


 def _kernel_pdf_gamma(x, sample, bw):
@@ -241,7 +404,7 @@ def _kernel_pdf_gamma(x, sample, bw):
     neighborhood of zero boundary is small.

     """
-    pass
+    return stats.gamma.pdf(sample, x / bw, scale=bw)


 def _kernel_cdf_gamma(x, sample, bw):
@@ -253,11 +416,28 @@ def _kernel_cdf_gamma(x, sample, bw):
     neighborhood of zero boundary is small.

     """
-    pass
+    return stats.gamma.sf(sample, x / bw, scale=bw)
+
+
+def kernel_pdf_gamma2(x, sample, bw):
+    # Gamma kernel for density, pdf, estimation with boundary correction
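+    # In the boundary region x < 2*bw the shape parameter x/bw is replaced
+    # by (x/bw)**2 + 1 to reduce bias near the zero boundary.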
+    if np.size(x) == 1:
+        # without vectorizing, easier to read
+        if x < 2 * bw:
+            a = (x / bw)**2 + 1
+        else:
+            a = x / bw
+    else:
+        a = x / bw
+        mask = x < 2 * bw
+        a[mask] = a[mask]**2 + 1
+    pdf = stats.gamma.pdf(sample, a, scale=bw)

+    return pdf

-kernel_pdf_gamma2.__doc__ = (
-    """    Gamma kernel for density, pdf, estimation with boundary correction.
+
+kernel_pdf_gamma2.__doc__ = """\
+    Gamma kernel for density, pdf, estimation with boundary correction.

     {doc_params}

@@ -271,10 +451,28 @@ kernel_pdf_gamma2.__doc__ = (
        Gamma Kernels.”
        Annals of the Institute of Statistical Mathematics 52 (3): 471–80.
        https://doi.org/10.1023/A:1004165218295.
-    """
-    .format(doc_params=doc_params))
-kernel_cdf_gamma2.__doc__ = (
-    """    Gamma kernel for cdf estimation with boundary correction.
+    """.format(doc_params=doc_params)
+
+
+def kernel_cdf_gamma2(x, sample, bw):
+    # Gamma kernel for cdf estimation with boundary correction
+    if np.size(x) == 1:
+        # without vectorizing
+        if x < 2 * bw:
+            a = (x / bw)**2 + 1
+        else:
+            a = x / bw
+    else:
+        a = x / bw
+        mask = x < 2 * bw
+        a[mask] = a[mask]**2 + 1
+    pdf = stats.gamma.sf(sample, a, scale=bw)
+
+    return pdf
+
+
+kernel_cdf_gamma2.__doc__ = """\
+    Gamma kernel for cdf estimation with boundary correction.

     {doc_params}

@@ -288,10 +486,16 @@ kernel_cdf_gamma2.__doc__ = (
        Gamma Kernels.”
        Annals of the Institute of Statistical Mathematics 52 (3): 471–80.
        https://doi.org/10.1023/A:1004165218295.
-    """
-    .format(doc_params=doc_params))
-kernel_pdf_invgamma.__doc__ = (
-    """    Inverse gamma kernel for density, pdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_pdf_invgamma(x, sample, bw):
+    # Inverse gamma kernel for density, pdf, estimation
+    return stats.invgamma.pdf(sample, 1 / bw + 1, scale=x / bw)
+
+
+kernel_pdf_invgamma.__doc__ = """\
+    Inverse gamma kernel for density, pdf, estimation.

     Based on cdf kernel by Micheaux and Ouimet (2020)

@@ -302,10 +506,16 @@ kernel_pdf_invgamma.__doc__ = (
     .. [1] Micheaux, Pierre Lafaye de, and Frédéric Ouimet. 2020. “A Study of
        Seven Asymmetric Kernels for the Estimation of Cumulative Distribution
        Functions,” November. https://arxiv.org/abs/2011.14893v1.
-    """
-    .format(doc_params=doc_params))
-kernel_cdf_invgamma.__doc__ = (
-    """    Inverse gamma kernel for cumulative distribution, cdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_cdf_invgamma(x, sample, bw):
+    # Inverse gamma kernel for cumulative distribution, cdf, estimation
+    return stats.invgamma.sf(sample, 1 / bw + 1, scale=x / bw)
+
+
+kernel_cdf_invgamma.__doc__ = """\
+    Inverse gamma kernel for cumulative distribution, cdf, estimation.

     {doc_params}

@@ -314,10 +524,18 @@ kernel_cdf_invgamma.__doc__ = (
     .. [1] Micheaux, Pierre Lafaye de, and Frédéric Ouimet. 2020. “A Study of
        Seven Asymmetric Kernels for the Estimation of Cumulative Distribution
        Functions,” November. https://arxiv.org/abs/2011.14893v1.
-    """
-    .format(doc_params=doc_params))
-kernel_pdf_invgauss.__doc__ = (
-    """    Inverse gaussian kernel for density, pdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_pdf_invgauss(x, sample, bw):
+    # Inverse gaussian kernel for density, pdf, estimation
+    m = x
+    lam = 1 / bw
+    return stats.invgauss.pdf(sample, m / lam, scale=lam)
+
+
+kernel_pdf_invgauss.__doc__ = """\
+    Inverse gaussian kernel for density, pdf, estimation.

     {doc_params}

@@ -327,8 +545,7 @@ kernel_pdf_invgauss.__doc__ = (
        Inverse Gaussian Kernels.”
        Journal of Nonparametric Statistics 16 (1–2): 217–26.
        https://doi.org/10.1080/10485250310001624819.
-    """
-    .format(doc_params=doc_params))
+    """.format(doc_params=doc_params)


 def kernel_pdf_invgauss_(x, sample, bw):
@@ -336,11 +553,20 @@ def kernel_pdf_invgauss_(x, sample, bw):

     Scaillet 2004
     """
-    pass
+    pdf = (1 / np.sqrt(2 * np.pi * bw * sample**3) *
+           np.exp(- 1 / (2 * bw * x) * (sample / x - 2 + x / sample)))
+    return pdf.mean(-1)


-kernel_cdf_invgauss.__doc__ = (
-    """    Inverse gaussian kernel for cumulative distribution, cdf, estimation.
+def kernel_cdf_invgauss(x, sample, bw):
+    # Inverse gaussian kernel for cumulative distribution, cdf, estimation
+    m = x
+    lam = 1 / bw
+    return stats.invgauss.sf(sample, m / lam, scale=lam)
+
+
+kernel_cdf_invgauss.__doc__ = """\
+    Inverse gaussian kernel for cumulative distribution, cdf, estimation.

     {doc_params}

@@ -350,10 +576,21 @@ kernel_cdf_invgauss.__doc__ = (
        Inverse Gaussian Kernels.”
        Journal of Nonparametric Statistics 16 (1–2): 217–26.
        https://doi.org/10.1080/10485250310001624819.
-    """
-    .format(doc_params=doc_params))
-kernel_pdf_recipinvgauss.__doc__ = (
-    """    Reciprocal inverse gaussian kernel for density, pdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_pdf_recipinvgauss(x, sample, bw):
+    # Reciprocal inverse gaussian kernel for density, pdf, estimation
+
+    # need shape-scale parameterization for scipy
+    # references use m, lambda parameterization
+    m = 1 / (x - bw)
+    lam = 1 / bw
+    return stats.recipinvgauss.pdf(sample, m / lam, scale=1 / lam)
+
+
+kernel_pdf_recipinvgauss.__doc__ = """\
+    Reciprocal inverse gaussian kernel for density, pdf, estimation.

     {doc_params}

@@ -363,8 +600,7 @@ kernel_pdf_recipinvgauss.__doc__ = (
        Inverse Gaussian Kernels.”
        Journal of Nonparametric Statistics 16 (1–2): 217–26.
        https://doi.org/10.1080/10485250310001624819.
-    """
-    .format(doc_params=doc_params))
+    """.format(doc_params=doc_params)


 def kernel_pdf_recipinvgauss_(x, sample, bw):
@@ -372,11 +608,25 @@ def kernel_pdf_recipinvgauss_(x, sample, bw):

     Scaillet 2004
     """
-    pass

+    pdf = (1 / np.sqrt(2 * np.pi * bw * sample) *
+           np.exp(- (x - bw) / (2 * bw) * (sample / (x - bw) - 2 +
+                                           (x - bw) / sample)))
+    return pdf.mean(-1)
+
+
+def kernel_cdf_recipinvgauss(x, sample, bw):
+    # Reciprocal inverse gaussian kernel for cdf estimation
+
+    # need shape-scale parameterization for scipy
+    # references use m, lambda parameterization
+    m = 1 / (x - bw)
+    lam = 1 / bw
+    return stats.recipinvgauss.sf(sample, m / lam, scale=1 / lam)

-kernel_cdf_recipinvgauss.__doc__ = (
-    """    Reciprocal inverse gaussian kernel for cdf estimation.
+
+kernel_cdf_recipinvgauss.__doc__ = """\
+    Reciprocal inverse gaussian kernel for cdf estimation.

     {doc_params}

@@ -386,10 +636,16 @@ kernel_cdf_recipinvgauss.__doc__ = (
        Inverse Gaussian Kernels.”
        Journal of Nonparametric Statistics 16 (1–2): 217–26.
        https://doi.org/10.1080/10485250310001624819.
-    """
-    .format(doc_params=doc_params))
-kernel_pdf_bs.__doc__ = (
-    """    Birnbaum Saunders (normal) kernel for density, pdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_pdf_bs(x, sample, bw):
+    # Birnbaum Saunders (normal) kernel for density, pdf, estimation
+    return stats.fatiguelife.pdf(sample, bw, scale=x)
+
+
+kernel_pdf_bs.__doc__ = """\
+    Birnbaum Saunders (normal) kernel for density, pdf, estimation.

     {doc_params}

@@ -398,10 +654,16 @@ kernel_pdf_bs.__doc__ = (
     .. [1] Jin, Xiaodong, and Janusz Kawczak. 2003. “Birnbaum-Saunders and
        Lognormal Kernel Estimators for Modelling Durations in High Frequency
        Financial Data.” Annals of Economics and Finance 4: 103–24.
-    """
-    .format(doc_params=doc_params))
-kernel_cdf_bs.__doc__ = (
-    """    Birnbaum Saunders (normal) kernel for cdf estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_cdf_bs(x, sample, bw):
+    # Birnbaum Saunders (normal) kernel for cdf estimation
+    return stats.fatiguelife.sf(sample, bw, scale=x)
+
+
+kernel_cdf_bs.__doc__ = """\
+    Birnbaum Saunders (normal) kernel for cdf estimation.

     {doc_params}

@@ -413,10 +675,24 @@ kernel_cdf_bs.__doc__ = (
     .. [2] Mombeni, Habib Allah, B Masouri, and Mohammad Reza Akhoond. 2019.
        “Asymmetric Kernels for Boundary Modification in Distribution Function
        Estimation.” REVSTAT, 1–27.
-    """
-    .format(doc_params=doc_params))
-kernel_pdf_lognorm.__doc__ = (
-    """    Log-normal kernel for density, pdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_pdf_lognorm(x, sample, bw):
+    # Log-normal kernel for density, pdf, estimation
+
+    # need shape-scale parameterization for scipy
+    # not sure why JK picked this normalization, makes required bw small
+    # maybe we should skip this transformation and just use bw
+    # Funke and Kawka 2015 (table 1) use bw (or bw**2) corresponding to
+    #    variance of normal pdf
+    # bw = np.exp(bw_**2 / 4) - 1  # this is inverse transformation
+    bw_ = np.sqrt(4*np.log(1+bw))
+    return stats.lognorm.pdf(sample, bw_, scale=x)
+
+
+kernel_pdf_lognorm.__doc__ = """\
+    Log-normal kernel for density, pdf, estimation.

     {doc_params}

@@ -429,10 +705,24 @@ kernel_pdf_lognorm.__doc__ = (
     .. [1] Jin, Xiaodong, and Janusz Kawczak. 2003. “Birnbaum-Saunders and
        Lognormal Kernel Estimators for Modelling Durations in High Frequency
        Financial Data.” Annals of Economics and Finance 4: 103–24.
-    """
-    .format(doc_params=doc_params))
-kernel_cdf_lognorm.__doc__ = (
-    """    Log-normal kernel for cumulative distribution, cdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_cdf_lognorm(x, sample, bw):
+    # Log-normal kernel for cumulative distribution, cdf, estimation
+
+    # need shape-scale parameterization for scipy
+    # not sure why JK picked this normalization, makes required bw small
+    # maybe we should skip this transformation and just use bw
+    # Funke and Kawka 2015 (table 1) use bw (or bw**2) corresponding to
+    #    variance of normal pdf
+    # bw = np.exp(bw_**2 / 4) - 1  # this is inverse transformation
+    bw_ = np.sqrt(4*np.log(1+bw))
+    return stats.lognorm.sf(sample, bw_, scale=x)
+
+
+kernel_cdf_lognorm.__doc__ = """\
+    Log-normal kernel for cumulative distribution, cdf, estimation.

     {doc_params}

@@ -445,8 +735,7 @@ kernel_cdf_lognorm.__doc__ = (
     .. [1] Jin, Xiaodong, and Janusz Kawczak. 2003. “Birnbaum-Saunders and
        Lognormal Kernel Estimators for Modelling Durations in High Frequency
        Financial Data.” Annals of Economics and Finance 4: 103–24.
-    """
-    .format(doc_params=doc_params))
+    """.format(doc_params=doc_params)


 def kernel_pdf_lognorm_(x, sample, bw):
@@ -454,11 +743,23 @@ def kernel_pdf_lognorm_(x, sample, bw):

     Jin, Kawczak 2003
     """
-    pass
+    term = 8 * np.log(1 + bw)  # this is 2 * variance in normal pdf
+    pdf = (1 / np.sqrt(term * np.pi) / sample *
+           np.exp(- (np.log(x) - np.log(sample))**2 / term))
+    return pdf.mean(-1)
+
+
+def kernel_pdf_weibull(x, sample, bw):
+    # Weibull kernel for density, pdf, estimation
+
+    # need shape-scale parameterization for scipy
+    # references use m, lambda parameterization
+    return stats.weibull_min.pdf(sample, 1 / bw,
+                                 scale=x / special.gamma(1 + bw))


-kernel_pdf_weibull.__doc__ = (
-    """    Weibull kernel for density, pdf, estimation.
+kernel_pdf_weibull.__doc__ = """\
+    Weibull kernel for density, pdf, estimation.

     Based on cdf kernel by Mombeni et al. (2019)

@@ -469,10 +770,20 @@ kernel_pdf_weibull.__doc__ = (
     .. [1] Mombeni, Habib Allah, B Masouri, and Mohammad Reza Akhoond. 2019.
        “Asymmetric Kernels for Boundary Modification in Distribution Function
        Estimation.” REVSTAT, 1–27.
-    """
-    .format(doc_params=doc_params))
-kernel_cdf_weibull.__doc__ = (
-    """    Weibull kernel for cumulative distribution, cdf, estimation.
+    """.format(doc_params=doc_params)
+
+
+def kernel_cdf_weibull(x, sample, bw):
+    # Weibull kernel for cumulative distribution, cdf, estimation
+
+    # need shape-scale parameterization for scipy
+    # references use m, lambda parameterization
+    return stats.weibull_min.sf(sample, 1 / bw,
+                                scale=x / special.gamma(1 + bw))
+
+
+kernel_cdf_weibull.__doc__ = """\
+    Weibull kernel for cumulative distribution, cdf, estimation.

     {doc_params}

@@ -481,15 +792,34 @@ kernel_cdf_weibull.__doc__ = (
     .. [1] Mombeni, Habib Allah, B Masouri, and Mohammad Reza Akhoond. 2019.
        “Asymmetric Kernels for Boundary Modification in Distribution Function
        Estimation.” REVSTAT, 1–27.
-    """
-    .format(doc_params=doc_params))
-kernel_dict_cdf = {'beta': kernel_cdf_beta, 'beta2': kernel_cdf_beta2, 'bs':
-    kernel_cdf_bs, 'gamma': kernel_cdf_gamma, 'gamma2': kernel_cdf_gamma2,
-    'invgamma': kernel_cdf_invgamma, 'invgauss': kernel_cdf_invgauss,
-    'lognorm': kernel_cdf_lognorm, 'recipinvgauss':
-    kernel_cdf_recipinvgauss, 'weibull': kernel_cdf_weibull}
-kernel_dict_pdf = {'beta': kernel_pdf_beta, 'beta2': kernel_pdf_beta2, 'bs':
-    kernel_pdf_bs, 'gamma': kernel_pdf_gamma, 'gamma2': kernel_pdf_gamma2,
-    'invgamma': kernel_pdf_invgamma, 'invgauss': kernel_pdf_invgauss,
-    'lognorm': kernel_pdf_lognorm, 'recipinvgauss':
-    kernel_pdf_recipinvgauss, 'weibull': kernel_pdf_weibull}
+    """.format(doc_params=doc_params)
+
+
+# produced with
+# print("\n".join(['"%s": %s,' % (i.split("_")[-1], i) for i in dir(kern)
+#                  if "kernel" in i and not i.endswith("_")]))
+kernel_dict_cdf = {
+    "beta": kernel_cdf_beta,
+    "beta2": kernel_cdf_beta2,
+    "bs": kernel_cdf_bs,
+    "gamma": kernel_cdf_gamma,
+    "gamma2": kernel_cdf_gamma2,
+    "invgamma": kernel_cdf_invgamma,
+    "invgauss": kernel_cdf_invgauss,
+    "lognorm": kernel_cdf_lognorm,
+    "recipinvgauss": kernel_cdf_recipinvgauss,
+    "weibull": kernel_cdf_weibull,
+    }
+
+kernel_dict_pdf = {
+    "beta": kernel_pdf_beta,
+    "beta2": kernel_pdf_beta2,
+    "bs": kernel_pdf_bs,
+    "gamma": kernel_pdf_gamma,
+    "gamma2": kernel_pdf_gamma2,
+    "invgamma": kernel_pdf_invgamma,
+    "invgauss": kernel_pdf_invgauss,
+    "lognorm": kernel_pdf_lognorm,
+    "recipinvgauss": kernel_pdf_recipinvgauss,
+    "weibull": kernel_pdf_weibull,
+    }
diff --git a/statsmodels/nonparametric/smoothers_lowess.py b/statsmodels/nonparametric/smoothers_lowess.py
index 904893200..d8a025250 100644
--- a/statsmodels/nonparametric/smoothers_lowess.py
+++ b/statsmodels/nonparametric/smoothers_lowess.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Lowess - wrapper for cythonized extension

 Author : Chris Jordan-Squire
@@ -5,13 +6,13 @@ Author : Carl Vogel
 Author : Josef Perktold

 """
+
 import numpy as np
 from ._smoothers_lowess import lowess as _lowess

-
-def lowess(endog, exog, frac=2.0 / 3.0, it=3, delta=0.0, xvals=None,
-    is_sorted=False, missing='drop', return_sorted=True):
-    """LOWESS (Locally Weighted Scatterplot Smoothing)
+def lowess(endog, exog, frac=2.0/3.0, it=3, delta=0.0, xvals=None, is_sorted=False,
+           missing='drop', return_sorted=True):
+    '''LOWESS (Locally Weighted Scatterplot Smoothing)

     A lowess function that outputs smoothed estimates of endog
     at the given exog values from points (exog, endog)
@@ -132,5 +133,134 @@ def lowess(endog, exog, frac=2.0 / 3.0, it=3, delta=0.0, xvals=None,
     >>> z = lowess(y, x, frac= 1./3, it=0)
     >>> w = lowess(y, x, frac=1./3)

-    """
-    pass
+    '''
+
+    endog = np.asarray(endog, float)
+    exog = np.asarray(exog, float)
+
+    # Whether xvals argument was provided
+    given_xvals = (xvals is not None)
+
+    # Inputs should be vectors (1-D arrays) of the
+    # same length.
+    if exog.ndim != 1:
+        raise ValueError('exog must be a vector')
+    if endog.ndim != 1:
+        raise ValueError('endog must be a vector')
+    if endog.shape[0] != exog.shape[0] :
+        raise ValueError('exog and endog must have same length')
+
+    if xvals is not None:
+        xvals = np.ascontiguousarray(xvals)
+        if xvals.ndim != 1:
+            raise ValueError('exog_predict must be a vector')
+
+    if missing in ['drop', 'raise']:
+        mask_valid = (np.isfinite(exog) & np.isfinite(endog))
+        all_valid = np.all(mask_valid)
+        if all_valid:
+            y = endog
+            x = exog
+        else:
+            if missing == 'drop':
+                x = exog[mask_valid]
+                y = endog[mask_valid]
+            else:
+                raise ValueError('nan or inf found in data')
+    elif missing == 'none':
+        y = endog
+        x = exog
+        all_valid = True   # we assume it's true if missing='none'
+    else:
+        raise ValueError("missing can only be 'none', 'drop' or 'raise'")
+
+    if not is_sorted:
+        # Sort both inputs according to the ascending order of x values
+        sort_index = np.argsort(x)
+        x = np.array(x[sort_index])
+        y = np.array(y[sort_index])
+
+    if not given_xvals:
+        # If given no explicit x values, we use the x-values in the exog array
+        xvals = exog
+        xvalues = x
+
+        xvals_all_valid = all_valid
+        if missing == 'drop':
+            xvals_mask_valid = mask_valid
+    else:
+        if delta != 0.0:
+            raise ValueError("Cannot have non-zero 'delta' and 'xvals' values")
+            # TODO: allow this again
+        mask_valid = np.isfinite(xvals)
+        if missing == "raise":
+            raise ValueError("NaN values in xvals with missing='raise'")
+        elif missing == 'drop':
+            xvals_mask_valid = mask_valid
+
+        xvalues = xvals
+        xvals_all_valid = True if missing == "none" else np.all(mask_valid)
+        # With explicit xvals, we ignore 'return_sorted' and always
+        # use the order provided
+        return_sorted = False
+
+        if missing in ['drop', 'raise']:
+            xvals_mask_valid = np.isfinite(xvals)
+            xvals_all_valid = np.all(xvals_mask_valid)
+            if xvals_all_valid:
+                xvalues = xvals
+            else:
+                if missing == 'drop':
+                    xvalues = xvals[xvals_mask_valid]
+                else:
+                    raise ValueError("nan or inf found in xvals")
+
+        if not is_sorted:
+            sort_index = np.argsort(xvalues)
+            xvalues = np.array(xvalues[sort_index])
+        else:
+            xvals_all_valid = True
+    y = np.ascontiguousarray(y)
+    x = np.ascontiguousarray(x)
+    if not given_xvals:
+        # Run LOWESS on the data points
+        res, _ = _lowess(y, x, x, np.ones_like(x),
+                        frac=frac, it=it, delta=delta, given_xvals=False)
+    else:
+        # First run LOWESS on the data points to get the weights of the data points
+        # using it-1 iterations, last iter done next
+        if it > 0:
+            _, weights = _lowess(y, x, x, np.ones_like(x),
+                                frac=frac, it=it-1, delta=delta, given_xvals=False)
+        else:
+            weights = np.ones_like(x)
+        xvalues = np.ascontiguousarray(xvalues, dtype=float)
+        # Then run once more using those supplied weights at the points provided by xvals
+        # No extra iterations are performed here since weights are fixed
+        res, _ = _lowess(y, x, xvalues, weights,
+                        frac=frac, it=0, delta=delta, given_xvals=True)
+
+    _, yfitted = res.T
+
+    if return_sorted:
+        return res
+    else:
+
+        # rebuild yfitted with original indices
+        # a bit messy: y might have been selected twice
+        if not is_sorted:
+            yfitted_ = np.empty_like(xvalues)
+            yfitted_.fill(np.nan)
+            yfitted_[sort_index] = yfitted
+            yfitted = yfitted_
+
+        if not xvals_all_valid:
+            yfitted_ = np.empty_like(xvals)
+            yfitted_.fill(np.nan)
+            yfitted_[xvals_mask_valid] = yfitted
+            yfitted = yfitted_
+
+        # we do not need to return exog anymore
+        return yfitted
diff --git a/statsmodels/nonparametric/smoothers_lowess_old.py b/statsmodels/nonparametric/smoothers_lowess_old.py
index 571f1b201..9daaed5a4 100644
--- a/statsmodels/nonparametric/smoothers_lowess_old.py
+++ b/statsmodels/nonparametric/smoothers_lowess_old.py
@@ -11,7 +11,7 @@ import numpy as np
 from numpy.linalg import lstsq


-def lowess(endog, exog, frac=2.0 / 3, it=3):
+def lowess(endog, exog, frac=2./3, it=3):
     """
     LOWESS (Locally Weighted Scatterplot Smoothing)

@@ -91,7 +91,34 @@ def lowess(endog, exog, frac=2.0 / 3, it=3):
     >>> z = lowess(y, x, frac= 1./3, it=0)
     >>> w = lowess(y, x, frac=1./3)
     """
-    pass
+    x = exog
+
+    if exog.ndim != 1:
+        raise ValueError('exog must be a vector')
+    if endog.ndim != 1:
+        raise ValueError('endog must be a vector')
+    if endog.shape[0] != x.shape[0] :
+        raise ValueError('exog and endog must have same length')
+
+    n = exog.shape[0]
+    fitted = np.zeros(n)
+
+    k = int(frac * n)
+
+    index_array = np.argsort(exog)
+    x_copy = np.array(exog[index_array]) #, dtype ='float32')
+    y_copy = endog[index_array]
+
+    fitted, weights = _lowess_initial_fit(x_copy, y_copy, k, n)
+
+    for i in range(it):
+        _lowess_robustify_fit(x_copy, y_copy, fitted,
+                              weights, k, n)
+
+    out = np.array([x_copy, fitted]).T
+    out.shape = (n,2)
+
+    return out


 def _lowess_initial_fit(x_copy, y_copy, k, n):
@@ -120,7 +147,33 @@ def _lowess_initial_fit(x_copy, y_copy, k, n):
         x-values

    """
-    pass
+    weights = np.zeros((n,k), dtype = x_copy.dtype)
+    nn_indices = [0,k]
+
+    X = np.ones((k,2))
+    fitted = np.zeros(n)
+
+    for i in range(n):
+        #note: all _lowess functions are inplace, no return
+        left_width = x_copy[i] - x_copy[nn_indices[0]]
+        right_width = x_copy[nn_indices[1]-1] - x_copy[i]
+        width = max(left_width, right_width)
+        _lowess_wt_standardize(weights[i, :],
+                               x_copy[nn_indices[0]:nn_indices[1]],
+                               x_copy[i], width)
+        _lowess_tricube(weights[i,:])
+        weights[i,:] = np.sqrt(weights[i,:])
+
+        X[:,1] = x_copy[nn_indices[0]:nn_indices[1]]
+        y_i = weights[i,:] * y_copy[nn_indices[0]:nn_indices[1]]
+
+        beta = lstsq(weights[i,:].reshape(k,1) * X, y_i, rcond=-1)[0]
+        fitted[i] = beta[0] + beta[1]*x_copy[i]
+
+        _lowess_update_nn(x_copy, nn_indices, i+1)
+
+    return fitted, weights


 def _lowess_wt_standardize(weights, new_entries, x_copy_i, width):
@@ -143,7 +196,9 @@ def _lowess_wt_standardize(weights, new_entries, x_copy_i, width):
     -------
     Nothing. The modifications are made to weight in place.
     """
-    pass
+    weights[:] = new_entries
+    weights -= x_copy_i
+    weights /= width


 def _lowess_robustify_fit(x_copy, y_copy, fitted, weights, k, n):
@@ -174,10 +229,36 @@ def _lowess_robustify_fit(x_copy, y_copy, fitted, weights, k, n):
     -------
     Nothing. The fitted values are modified in place.
     """
-    pass
+    nn_indices = [0,k]
+    X = np.ones((k,2))
+
+    residual_weights = np.copy(y_copy)
+    residual_weights.shape = (n,)
+    residual_weights -= fitted
+    residual_weights = np.absolute(residual_weights)#, out=residual_weights)
+    s = np.median(residual_weights)
+    residual_weights /= (6*s)
+    too_big = residual_weights>=1
+    _lowess_bisquare(residual_weights)
+    residual_weights[too_big] = 0
+
+
+    for i in range(n):
+        total_weights = weights[i,:] * np.sqrt(residual_weights[nn_indices[0]:
+                                                        nn_indices[1]])
+
+        X[:,1] = x_copy[nn_indices[0]:nn_indices[1]]
+        y_i = total_weights * y_copy[nn_indices[0]:nn_indices[1]]
+        total_weights.shape = (k,1)
+
+        beta = lstsq(total_weights * X, y_i, rcond=-1)[0]
+
+        fitted[i] = beta[0] + beta[1] * x_copy[i]
+
+        _lowess_update_nn(x_copy, nn_indices, i+1)


-def _lowess_update_nn(x, cur_nn, i):
+def _lowess_update_nn(x, cur_nn,i):
     """
     Update the endpoints of the nearest neighbors to
     the ith point.
@@ -198,7 +279,17 @@ def _lowess_update_nn(x, cur_nn, i):
     -------
     Nothing. It modifies cur_nn in place.
     """
-    pass
+    while True:
+        if cur_nn[1]<x.size:
+            left_dist = x[i] - x[cur_nn[0]]
+            new_right_dist = x[cur_nn[1]] - x[i]
+            if new_right_dist < left_dist:
+                cur_nn[0] = cur_nn[0] + 1
+                cur_nn[1] = cur_nn[1] + 1
+            else:
+                break
+        else:
+            break


 def _lowess_tricube(t):
@@ -216,7 +307,12 @@ def _lowess_tricube(t):
     -------
     Nothing
     """
-    pass
+    #t = (1-np.abs(t)**3)**3
+    t[:] = np.absolute(t) #, out=t) #numpy version?
+    _lowess_mycube(t)
+    t[:] = np.negative(t) #, out = t)
+    t += 1
+    _lowess_mycube(t)


 def _lowess_mycube(t):
@@ -232,7 +328,9 @@ def _lowess_mycube(t):
     -------
     Nothing
     """
-    pass
+    #t **= 3
+    t2 = t*t
+    t *= t2


 def _lowess_bisquare(t):
@@ -249,4 +347,8 @@ def _lowess_bisquare(t):
     -------
     Nothing
     """
-    pass
+    #t = (1-t**2)**2
+    t *= t
+    t[:] = np.negative(t) #, out=t)
+    t += 1
+    t *= t
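The older pure-NumPy implementation filled in above follows the same recipe: an initial tricube-weighted local linear fit (`_lowess_initial_fit`) followed by `it` bisquare robustifying passes, with all helper functions operating in place. A sketch in the spirit of the module's own docstring example (synthetic data, illustrative only):

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess_old import lowess

    x = np.random.uniform(low=-2 * np.pi, high=2 * np.pi, size=500)
    y = np.sin(x) + np.random.normal(size=len(x))

    # one robustifying iteration on top of the initial fit; the result is an
    # (n, 2) array holding sorted x and the smoothed y
    out = lowess(y, x, frac=1. / 3, it=1)
    x_sorted, y_smoothed = out[:, 0], out[:, 1]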
diff --git a/statsmodels/othermod/api.py b/statsmodels/othermod/api.py
index 2133d479f..964b80e8d 100644
--- a/statsmodels/othermod/api.py
+++ b/statsmodels/othermod/api.py
@@ -1,2 +1,3 @@
 from .betareg import BetaModel
-__all__ = ['BetaModel']
+
+__all__ = ["BetaModel"]
diff --git a/statsmodels/othermod/betareg.py b/statsmodels/othermod/betareg.py
index 22e2195e0..ddf3fb345 100644
--- a/statsmodels/othermod/betareg.py
+++ b/statsmodels/othermod/betareg.py
@@ -1,3 +1,5 @@
+# -*- coding: utf-8 -*-
+
 u"""
 Beta regression for modeling rates and proportions.

@@ -11,14 +13,19 @@ Smithson, Michael, and Jay Verkuilen. "A better lemon squeezer?
 Maximum-likelihood regression with beta-distributed dependent variables."
 Psychological methods 11.1 (2006): 54.
 """
+
 import numpy as np
 from scipy.special import gammaln as lgamma
 import patsy
+
 import statsmodels.base.wrapper as wrap
 import statsmodels.regression.linear_model as lm
 from statsmodels.tools.decorators import cache_readonly
-from statsmodels.base.model import GenericLikelihoodModel, GenericLikelihoodModelResults, _LLRMixin
+from statsmodels.base.model import (
+    GenericLikelihoodModel, GenericLikelihoodModelResults, _LLRMixin)
 from statsmodels.genmod import families
+
+
 _init_example = """

     Beta regression with default of logit-link for exog and log-link
@@ -50,8 +57,7 @@ _init_example = """


 class BetaModel(GenericLikelihoodModel):
-    __doc__ = (
-        """Beta Regression.
+    __doc__ = """Beta Regression.

     The Model is parameterized by mean and precision. Both can depend on
     explanatory variables through link functions.
@@ -91,31 +97,39 @@ class BetaModel(GenericLikelihoodModel):
     --------
     :ref:`links`

-    """
-        .format(example=_init_example))
+    """.format(example=_init_example)
+
+    def __init__(self, endog, exog, exog_precision=None,
+                 link=families.links.Logit(),
+                 link_precision=families.links.Log(), **kwds):

-    def __init__(self, endog, exog, exog_precision=None, link=families.
-        links.Logit(), link_precision=families.links.Log(), **kwds):
         etmp = np.array(endog)
         assert np.all((0 < etmp) & (etmp < 1))
         if exog_precision is None:
             extra_names = ['precision']
             exog_precision = np.ones((len(endog), 1), dtype='f')
         else:
-            extra_names = [('precision-%s' % zc) for zc in (exog_precision.
-                columns if hasattr(exog_precision, 'columns') else range(1,
-                exog_precision.shape[1] + 1))]
+            extra_names = ['precision-%s' % zc for zc in
+                           (exog_precision.columns
+                            if hasattr(exog_precision, 'columns')
+                            else range(1, exog_precision.shape[1] + 1))]
+
         kwds['extra_params_names'] = extra_names
-        super(BetaModel, self).__init__(endog, exog, exog_precision=
-            exog_precision, **kwds)
+
+        super(BetaModel, self).__init__(endog, exog,
+                                        exog_precision=exog_precision,
+                                        **kwds)
         self.link = link
         self.link_precision = link_precision
+        # not needed, handled by super:
+        # self.exog_precision = exog_precision
+        # inherited df do not account for precision params
         self.nobs = self.endog.shape[0]
         self.k_extra = 1
         self.df_model = self.nparams - 2
         self.df_resid = self.nobs - self.nparams
         assert len(self.exog_precision) == len(self.endog)
-        self.hess_type = 'oim'
+        self.hess_type = "oim"
         if 'exog_precision' not in self._init_keys:
             self._init_keys.extend(['exog_precision'])
         self._init_keys.extend(['link', 'link_precision'])
@@ -125,7 +139,24 @@ class BetaModel(GenericLikelihoodModel):
         self.results_class = BetaResults
         self.results_class_wrapper = BetaResultsWrapper

-    def predict(self, params, exog=None, exog_precision=None, which='mean'):
+    @classmethod
+    def from_formula(cls, formula, data, exog_precision_formula=None,
+                     *args, **kwargs):
+        if exog_precision_formula is not None:
+            if 'subset' in kwargs:
+                d = data.ix[kwargs['subset']]
+                Z = patsy.dmatrix(exog_precision_formula, d)
+            else:
+                Z = patsy.dmatrix(exog_precision_formula, data)
+            kwargs['exog_precision'] = Z
+
+        return super(BetaModel, cls).from_formula(formula, data, *args,
+                                                  **kwargs)
+
+    def _get_exogs(self):
+        return (self.exog, self.exog_precision)
+
+    def predict(self, params, exog=None, exog_precision=None, which="mean"):
         """Predict values for mean or precision

         Parameters
@@ -147,7 +178,48 @@ class BetaModel(GenericLikelihoodModel):
         -------
         ndarray, predicted values
         """
-        pass
+        # compatibility with old names and misspelling
+        if which == "linpred":
+            which = "linear"
+        if which in ["linpred_precision", "linear_precision"]:
+            which = "linear-precision"
+
+        k_mean = self.exog.shape[1]
+        if which in ["mean",  "linear"]:
+            if exog is None:
+                exog = self.exog
+            params_mean = params[:k_mean]
+            # Zparams = params[k_mean:]
+            linpred = np.dot(exog, params_mean)
+            if which == "mean":
+                mu = self.link.inverse(linpred)
+                res = mu
+            else:
+                res = linpred
+
+        elif which in ["precision", "linear-precision"]:
+            if exog_precision is None:
+                exog_precision = self.exog_precision
+            params_prec = params[k_mean:]
+            linpred_prec = np.dot(exog_precision, params_prec)
+
+            if which == "precision":
+                phi = self.link_precision.inverse(linpred_prec)
+                res = phi
+            else:
+                res = linpred_prec
+
+        elif which == "var":
+            res = self._predict_var(
+                params,
+                exog=exog,
+                exog_precision=exog_precision
+                )
+
+        else:
+            raise ValueError('which = %s is not available' % which)
+
+        return res

     def _predict_precision(self, params, exog_precision=None):
         """Predict values for precision function for given exog_precision.
@@ -163,7 +235,15 @@ class BetaModel(GenericLikelihoodModel):
         -------
         Predicted precision.
         """
-        pass
+        if exog_precision is None:
+            exog_precision = self.exog_precision
+
+        k_mean = self.exog.shape[1]
+        params_precision = params[k_mean:]
+        linpred_prec = np.dot(exog_precision, params_precision)
+        phi = self.link_precision.inverse(linpred_prec)
+
+        return phi

     def _predict_var(self, params, exog=None, exog_precision=None):
         """predict values for conditional variance V(endog | exog)
@@ -181,7 +261,12 @@ class BetaModel(GenericLikelihoodModel):
         -------
         Predicted conditional variance.
         """
-        pass
+        mean = self.predict(params, exog=exog)
+        precision = self._predict_precision(params,
+                                            exog_precision=exog_precision)
+
+        var_endog = mean * (1 - mean) / (1 + precision)
+        return var_endog

     def loglikeobs(self, params):
         """
@@ -199,7 +284,7 @@ class BetaModel(GenericLikelihoodModel):
             The log likelihood for each observation of the model evaluated
             at `params`.
         """
-        pass
+        return self._llobs(self.endog, self.exog, self.exog_precision, params)

     def _llobs(self, endog, exog, exog_precision, params):
         """
@@ -223,7 +308,27 @@ class BetaModel(GenericLikelihoodModel):
             The log likelihood for each observation of the model evaluated
             at `params`.
         """
-        pass
+        y, X, Z = endog, exog, exog_precision
+        nz = Z.shape[1]
+
+        params_mean = params[:-nz]
+        params_prec = params[-nz:]
+        linpred = np.dot(X, params_mean)
+        linpred_prec = np.dot(Z, params_prec)
+
+        mu = self.link.inverse(linpred)
+        phi = self.link_precision.inverse(linpred_prec)
+
+        eps_lb = 1e-200
+        alpha = np.clip(mu * phi, eps_lb, np.inf)
+        beta = np.clip((1 - mu) * phi, eps_lb, np.inf)
+
+        ll = (lgamma(phi) - lgamma(alpha)
+              - lgamma(beta)
+              + (mu * phi - 1) * np.log(y)
+              + (((1 - mu) * phi) - 1) * np.log(1 - y))
+
+        return ll

     def score(self, params):
         """
@@ -241,7 +346,11 @@ class BetaModel(GenericLikelihoodModel):
         score : ndarray
             First derivative of loglikelihood function.
         """
-        pass
+        sf1, sf2 = self.score_factor(params)
+
+        d1 = np.dot(sf1, self.exog)
+        d2 = np.dot(sf2, self.exog_precision)
+        return np.concatenate((d1, d2))

     def _score_check(self, params):
         """Inherited score with finite differences
@@ -255,7 +364,7 @@ class BetaModel(GenericLikelihoodModel):
         -------
         score based on numerical derivatives
         """
-        pass
+        return super(BetaModel, self).score(params)

     def score_factor(self, params, endog=None):
         """Derivative of loglikelihood function w.r.t. linear predictors.
@@ -280,10 +389,39 @@ class BetaModel(GenericLikelihoodModel):
             - d2 = sf[:, 1:2] * exog_precision

         """
-        pass
+        from scipy import special
+        digamma = special.psi
+
+        y = self.endog if endog is None else endog
+        X, Z = self.exog, self.exog_precision
+        nz = Z.shape[1]
+        Xparams = params[:-nz]
+        Zparams = params[-nz:]
+
+        # NO LINKS
+        mu = self.link.inverse(np.dot(X, Xparams))
+        phi = self.link_precision.inverse(np.dot(Z, Zparams))
+
+        eps_lb = 1e-200  # lower bound for evaluating digamma, avoids -inf
+        alpha = np.clip(mu * phi, eps_lb, np.inf)
+        beta = np.clip((1 - mu) * phi, eps_lb, np.inf)
+
+        ystar = np.log(y / (1. - y))
+        dig_beta = digamma(beta)
+        mustar = digamma(alpha) - dig_beta
+        yt = np.log(1 - y)
+        mut = dig_beta - digamma(phi)

-    def score_hessian_factor(self, params, return_hessian=False, observed=True
-        ):
+        t = 1. / self.link.deriv(mu)
+        h = 1. / self.link_precision.deriv(phi)
+        #
+        sf1 = phi * t * (ystar - mustar)
+        sf2 = h * (mu * (ystar - mustar) + yt - mut)
+
+        return (sf1, sf2)
+
+    def score_hessian_factor(self, params, return_hessian=False,
+                             observed=True):
         """Derivatives of loglikelihood function w.r.t. linear predictors.

         This calculates score and hessian factors at the same time, because
@@ -309,7 +447,63 @@ class BetaModel(GenericLikelihoodModel):
             triangle of the Hessian matrix.
             TODO: check why there are minus
         """
-        pass
+        from scipy import special
+        digamma = special.psi
+
+        y, X, Z = self.endog, self.exog, self.exog_precision
+        nz = Z.shape[1]
+        Xparams = params[:-nz]
+        Zparams = params[-nz:]
+
+        # NO LINKS
+        mu = self.link.inverse(np.dot(X, Xparams))
+        phi = self.link_precision.inverse(np.dot(Z, Zparams))
+
+        # We need to prevent mu = 0 and (1-mu) = 0 in digamma call
+        eps_lb = 1e-200  # lower bound for evaluating digamma, avoids -inf
+        alpha = np.clip(mu * phi, eps_lb, np.inf)
+        beta = np.clip((1 - mu) * phi, eps_lb, np.inf)
+
+        ystar = np.log(y / (1. - y))
+        dig_beta = digamma(beta)
+        mustar = digamma(alpha) - dig_beta
+        yt = np.log(1 - y)
+        mut = dig_beta - digamma(phi)
+
+        t = 1. / self.link.deriv(mu)
+        h = 1. / self.link_precision.deriv(phi)
+
+        ymu_star = (ystar - mustar)
+        sf1 = phi * t * ymu_star
+        sf2 = h * (mu * ymu_star + yt - mut)
+
+        if return_hessian:
+            trigamma = lambda x: special.polygamma(1, x)  # noqa
+            trig_beta = trigamma(beta)
+            var_star = trigamma(alpha) + trig_beta
+            var_t = trig_beta - trigamma(phi)
+
+            c = - trig_beta
+            s = self.link.deriv2(mu)
+            q = self.link_precision.deriv2(phi)
+
+            jbb = (phi * t) * var_star
+            if observed:
+                jbb += s * t**2 * ymu_star
+
+            jbb *= t * phi
+
+            jbg = phi * t * h * (mu * var_star + c)
+            if observed:
+                jbg -= ymu_star * t * h
+
+            jgg = h**2 * (mu**2 * var_star + 2 * mu * c + var_t)
+            if observed:
+                jgg += (mu * ymu_star + yt - mut) * q * h**3    # **3 ?
+
+            return (sf1, sf2), (-jbb, -jbg, -jgg)
+        else:
+            return (sf1, sf2)

     def score_obs(self, params):
         """
@@ -326,7 +520,12 @@ class BetaModel(GenericLikelihoodModel):
             The first derivative of the loglikelihood function evaluated at
             params for each observation.
         """
-        pass
+        sf1, sf2 = self.score_factor(params)
+
+        # elementwise product for each row (observation)
+        d1 = sf1[:, None] * self.exog
+        d2 = sf2[:, None] * self.exog_precision
+        return np.column_stack((d1, d2))

     def hessian(self, params, observed=None):
         """Hessian, second derivative of loglikelihood function
@@ -344,12 +543,27 @@ class BetaModel(GenericLikelihoodModel):
         hessian : ndarray
             Hessian, i.e. observed information, or expected information matrix.
         """
-        pass
+        if self.hess_type == "eim":
+            observed = False
+        else:
+            observed = True
+        _, hf = self.score_hessian_factor(params, return_hessian=True,
+                                          observed=observed)
+
+        hf11, hf12, hf22 = hf
+
+        # elementwise product for each row (observation)
+        d11 = (self.exog.T * hf11).dot(self.exog)
+        d12 = (self.exog.T * hf12).dot(self.exog_precision)
+        d22 = (self.exog_precision.T * hf22).dot(self.exog_precision)
+        return np.block([[d11, d12], [d12.T, d22]])

     def hessian_factor(self, params, observed=True):
         """Derivatives of loglikelihood function w.r.t. linear predictors.
         """
-        pass
+        _, hf = self.score_hessian_factor(params, return_hessian=True,
+                                          observed=observed)
+        return hf

     def _start_params(self, niter=2, return_intermediate=False):
         """find starting values
@@ -379,10 +593,44 @@ class BetaModel(GenericLikelihoodModel):
         This calculates a few iteration of weighted least squares. This is not
         a full scoring algorithm.
         """
-        pass
-
-    def fit(self, start_params=None, maxiter=1000, disp=False, method=
-        'bfgs', **kwds):
+        # WLS of the mean equation uses the implied weights (inverse variance),
+        # WLS for the precision equations uses weights that only take
+        # account of the link transformation of the precision endog.
+        from statsmodels.regression.linear_model import OLS, WLS
+        res_m = OLS(self.link(self.endog), self.exog).fit()
+        fitted = self.link.inverse(res_m.fittedvalues)
+        resid = self.endog - fitted
+
+        prec_i = fitted * (1 - fitted) / np.maximum(np.abs(resid), 1e-2)**2 - 1
+        res_p = OLS(self.link_precision(prec_i), self.exog_precision).fit()
+        prec_fitted = self.link_precision.inverse(res_p.fittedvalues)
+        # sp = np.concatenate((res_m.params, res_p.params))
+
+        for _ in range(niter):
+            y_var_inv = (1 + prec_fitted) / (fitted * (1 - fitted))
+            # y_var = fitted * (1 - fitted) / (1 + prec_fitted)
+
+            ylink_var_inv = y_var_inv / self.link.deriv(fitted)**2
+            res_m2 = WLS(self.link(self.endog), self.exog,
+                         weights=ylink_var_inv).fit()
+            fitted = self.link.inverse(res_m2.fittedvalues)
+            resid2 = self.endog - fitted
+
+            prec_i2 = (fitted * (1 - fitted) /
+                       np.maximum(np.abs(resid2), 1e-2)**2 - 1)
+            w_p = 1. / self.link_precision.deriv(prec_fitted)**2
+            res_p2 = WLS(self.link_precision(prec_i2), self.exog_precision,
+                         weights=w_p).fit()
+            prec_fitted = self.link_precision.inverse(res_p2.fittedvalues)
+            sp2 = np.concatenate((res_m2.params, res_p2.params))
+
+        if return_intermediate:
+            return sp2, res_m2, res_p2
+
+        return sp2
+
+    def fit(self, start_params=None, maxiter=1000, disp=False,
+            method='bfgs', **kwds):
         """
         Fit the model by maximum likelihood.

@@ -404,7 +652,27 @@ class BetaModel(GenericLikelihoodModel):
         -------
         BetaResults instance.
         """
-        pass
+
+        if start_params is None:
+            start_params = self._start_params()
+#           # http://www.ime.usp.br/~sferrari/beta.pdf suggests starting phi
+#           # on page 8
+
+        if "cov_type" in kwds:
+            # this is a workaround because we cannot tell super to use eim
+            if kwds["cov_type"].lower() == "eim":
+                self.hess_type = "eim"
+                del kwds["cov_type"]
+        else:
+            self.hess_type = "oim"
+
+        res = super(BetaModel, self).fit(start_params=start_params,
+                                         maxiter=maxiter, method=method,
+                                         disp=disp, **kwds)
+        if not isinstance(res, BetaResultsWrapper):
+            # currently GenericLikelihoodModel does not add a wrapper
+            res = BetaResultsWrapper(res)
+        return res

     def _deriv_mean_dparams(self, params):
         """
@@ -422,7 +690,11 @@ class BetaModel(GenericLikelihoodModel):
         The value of the derivative of the expected endog with respect
         to the parameter vector.
         """
-        pass
+        link = self.link
+        lin_pred = self.predict(params, which="linear")
+        idl = link.inverse_deriv(lin_pred)
+        dmat = self.exog * idl[:, None]
+        return np.column_stack((dmat, np.zeros(self.exog_precision.shape)))

     def _deriv_score_obs_dendog(self, params):
         """derivative of score_obs w.r.t. endog
@@ -437,7 +709,20 @@ class BetaModel(GenericLikelihoodModel):
         derivative : ndarray_2d
             The derivative of the score_obs with respect to endog.
         """
-        pass
+        from statsmodels.tools.numdiff import _approx_fprime_cs_scalar
+
+        def f(y):
+            if y.ndim == 2 and y.shape[1] == 1:
+                y = y[:, 0]
+            sf = self.score_factor(params, endog=y)
+            return np.column_stack(sf)
+
+        dsf = _approx_fprime_cs_scalar(self.endog[:, None], f)
+        # deriv is 2d vector
+        d1 = dsf[:, :1] * self.exog
+        d2 = dsf[:, 1:2] * self.exog_precision
+
+        return np.column_stack((d1, d2))

     def get_distribution_params(self, params, exog=None, exog_precision=None):
         """
@@ -458,7 +743,10 @@ class BetaModel(GenericLikelihoodModel):
             Parameters for the scipy distribution to evaluate predictive
             distribution.
         """
-        pass
+        mean = self.predict(params, exog=exog)
+        precision = self.predict(params, exog_precision=exog_precision,
+                                 which="precision")
+        return precision * mean, precision * (1 - mean)

     def get_distribution(self, params, exog=None, exog_precision=None):
         """
@@ -493,7 +781,11 @@ class BetaModel(GenericLikelihoodModel):
         to fit the model.  If any other value is used for ``n``, misleading
         results will be produced.
         """
-        pass
+        from scipy import stats
+        args = self.get_distribution_params(params, exog=exog,
+                                            exog_precision=exog_precision)
+        distr = stats.beta(*args)
+        return distr


 class BetaResults(GenericLikelihoodModelResults, _LLRMixin):
@@ -503,25 +795,27 @@ class BetaResults(GenericLikelihoodModelResults, _LLRMixin):
     inherited methods might be appropriate in this case.
     """

+    # GenericLikelihoodModel does not define fittedvalues, residuals and similar
     @cache_readonly
     def fittedvalues(self):
         """In-sample predicted mean, conditional expectation."""
-        pass
+        return self.model.predict(self.params)

     @cache_readonly
     def fitted_precision(self):
         """In-sample predicted precision"""
-        pass
+        return self.model.predict(self.params, which="precision")

     @cache_readonly
     def resid(self):
         """Response residual"""
-        pass
+        return self.model.endog - self.fittedvalues

     @cache_readonly
     def resid_pearson(self):
         """Pearson standardize residual"""
-        pass
+        std = np.sqrt(self.model.predict(self.params, which="var"))
+        return self.resid / std

     @cache_readonly
     def prsquared(self):
@@ -529,10 +823,10 @@ class BetaResults(GenericLikelihoodModelResults, _LLRMixin):

         1 - exp((llnull - .llf) * (2 / nobs))
         """
-        pass
+        return self.pseudo_rsquared(kind="lr")

     def get_distribution_params(self, exog=None, exog_precision=None,
-        transform=True):
+                                transform=True):
         """
         Return distribution parameters converted from model prediction.

@@ -552,7 +846,10 @@ class BetaResults(GenericLikelihoodModelResults, _LLRMixin):
             Parameters for the scipy distribution to evaluate predictive
             distribution.
         """
-        pass
+        mean = self.predict(exog=exog, transform=transform)
+        precision = self.predict(exog_precision=exog_precision,
+                                 which="precision", transform=transform)
+        return precision * mean, precision * (1 - mean)

     def get_distribution(self, exog=None, exog_precision=None, transform=True):
         """
@@ -588,7 +885,13 @@ class BetaResults(GenericLikelihoodModelResults, _LLRMixin):
         to fit the model.  If any other value is used for ``n``, misleading
         results will be produced.
         """
-        pass
+        from scipy import stats
+        args = self.get_distribution_params(exog=exog,
+                                            exog_precision=exog_precision,
+                                            transform=transform)
+        args = (np.asarray(arg) for arg in args)
+        distr = stats.beta(*args)
+        return distr

     def get_influence(self):
         """
@@ -621,11 +924,16 @@ class BetaResults(GenericLikelihoodModelResults, _LLRMixin):
         todo

         """
-        pass
+        from statsmodels.stats.outliers_influence import MLEInfluence
+        return MLEInfluence(self)
+
+    def bootstrap(self, *args, **kwargs):
+        raise NotImplementedError


 class BetaResultsWrapper(lm.RegressionResultsWrapper):
     pass


-wrap.populate_wrapper(BetaResultsWrapper, BetaResults)
+wrap.populate_wrapper(BetaResultsWrapper,
+                      BetaResults)
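Taken together, the `BetaModel` additions above supply the full likelihood machinery: analytic score and Hessian factors, iterated-WLS starting values, and an OIM/EIM switch for the covariance. A minimal fitting sketch on synthetic data; the data-generating choices and variable names are illustrative only:

    import numpy as np
    from statsmodels.othermod.betareg import BetaModel

    rng = np.random.default_rng(12345)
    n = 500
    x = np.column_stack([np.ones(n), rng.normal(size=n)])   # mean design
    z = np.ones((n, 1))                                      # constant precision

    # outcomes must lie strictly inside (0, 1); clip to be safe
    mu = 1 / (1 + np.exp(-(0.5 + 0.8 * x[:, 1])))            # logit mean link
    phi = 10.0
    y = np.clip(rng.beta(mu * phi, (1 - mu) * phi), 1e-6, 1 - 1e-6)

    mod = BetaModel(y, x, exog_precision=z)
    res = mod.fit()                       # BFGS, start values from _start_params
    print(res.params)                     # mean params followed by precision params
    mean_hat = res.fittedvalues           # in-sample conditional mean
    phi_hat = mod.predict(res.params, which="precision")

With a formula interface, `exog_precision_formula` can be passed to `BetaModel.from_formula`, which builds the precision design matrix via patsy as in the classmethod above.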
diff --git a/statsmodels/regression/_prediction.py b/statsmodels/regression/_prediction.py
index 05275580a..9cd1a3878 100644
--- a/statsmodels/regression/_prediction.py
+++ b/statsmodels/regression/_prediction.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri Dec 19 11:29:18 2014

@@ -5,11 +6,13 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from scipy import stats
 import pandas as pd


+# this is similar to ContrastResults after t_test, copied and adjusted
 class PredictionResults:
     """
     Results class for predictions.
@@ -31,23 +34,47 @@ class PredictionResults:
         Row labels used in summary frame.
     """

-    def __init__(self, predicted_mean, var_pred_mean, var_resid, df=None,
-        dist=None, row_labels=None):
+    def __init__(self, predicted_mean, var_pred_mean, var_resid,
+                 df=None, dist=None, row_labels=None):
         self.predicted = predicted_mean
         self.var_pred = var_pred_mean
         self.df = df
         self.var_resid = var_resid
         self.row_labels = row_labels
+
         if dist is None or dist == 'norm':
             self.dist = stats.norm
             self.dist_args = ()
         elif dist == 't':
             self.dist = stats.t
-            self.dist_args = self.df,
+            self.dist_args = (self.df,)
         else:
             self.dist = dist
             self.dist_args = ()

+    @property
+    def se_obs(self):
+        return np.sqrt(self.var_pred_mean + self.var_resid)
+
+    @property
+    def se_mean(self):
+        return self.se
+
+    @property
+    def predicted_mean(self):
+        # alias for backwards compatibility
+        return self.predicted
+
+    @property
+    def var_pred_mean(self):
+        # alias for backwards compatibility
+        return self.var_pred
+
+    @property
+    def se(self):
+        # alias for backwards compatibility
+        return np.sqrt(self.var_pred_mean)
+
     def conf_int(self, obs=False, alpha=0.05):
         """
         Returns the confidence interval of the value, `effect` of the
@@ -67,11 +94,37 @@ class PredictionResults:
             The array has the lower and the upper limit of the confidence
             interval in the columns.
         """
-        pass
+
+        se = self.se_obs if obs else self.se_mean
+
+        q = self.dist.ppf(1 - alpha / 2., *self.dist_args)
+        lower = self.predicted_mean - q * se
+        upper = self.predicted_mean + q * se
+        return np.column_stack((lower, upper))
+
+    def summary_frame(self, alpha=0.05):
+        # TODO: finish and cleanup
+        ci_obs = self.conf_int(alpha=alpha, obs=True)  # need to split
+        ci_mean = self.conf_int(alpha=alpha, obs=False)
+        to_include = {}
+        to_include['mean'] = self.predicted_mean
+        to_include['mean_se'] = self.se_mean
+        to_include['mean_ci_lower'] = ci_mean[:, 0]
+        to_include['mean_ci_upper'] = ci_mean[:, 1]
+        to_include['obs_ci_lower'] = ci_obs[:, 0]
+        to_include['obs_ci_upper'] = ci_obs[:, 1]
+
+        self.table = to_include
+        # pandas dict does not handle 2d_array
+        # data = np.column_stack(list(to_include.values()))
+        # names = ....
+        res = pd.DataFrame(to_include, index=self.row_labels,
+                           columns=to_include.keys())
+        return res


 def get_prediction(self, exog=None, transform=True, weights=None,
-    row_labels=None, pred_kwds=None):
+                   row_labels=None, pred_kwds=None):
     """
     Compute prediction results.

@@ -103,4 +156,61 @@ def get_prediction(self, exog=None, transform=True, weights=None,
         variance and can on demand calculate confidence intervals and summary
         tables for the prediction of the mean and of new observations.
     """
-    pass
+
+    # prepare exog and row_labels, based on base Results.predict
+    if transform and hasattr(self.model, 'formula') and exog is not None:
+        from patsy import dmatrix
+        if isinstance(exog, pd.Series):
+            # GH-6509
+            exog = pd.DataFrame(exog)
+        exog = dmatrix(self.model.data.design_info, exog)
+
+    if exog is not None:
+        if row_labels is None:
+            row_labels = getattr(exog, 'index', None)
+            if callable(row_labels):
+                row_labels = None
+
+        exog = np.asarray(exog)
+        if exog.ndim == 1:
+            # params length determines whether exog is a row or a column vector
+            if self.params.shape[0] > 1:
+                exog = exog[None, :]
+            else:
+                exog = exog[:, None]
+        exog = np.atleast_2d(exog)  # needed in count model shape[1]
+    else:
+        exog = self.model.exog
+        if weights is None:
+            weights = getattr(self.model, 'weights', None)
+
+        if row_labels is None:
+            row_labels = getattr(self.model.data, 'row_labels', None)
+
+    # need to handle other arrays, TODO: is delegating to model possible ?
+    if weights is not None:
+        weights = np.asarray(weights)
+        if (weights.size > 1 and
+                (weights.ndim != 1 or weights.shape[0] == exog.shape[1])):
+            raise ValueError('weights has wrong shape')
+
+    if pred_kwds is None:
+        pred_kwds = {}
+    predicted_mean = self.model.predict(self.params, exog, **pred_kwds)
+
+    covb = self.cov_params()
+    var_pred_mean = (exog * np.dot(covb, exog.T).T).sum(1)
+    var_resid = self.scale  # self.mse_resid / weights
+
+    # TODO: check that we have correct scale, Refactor scale #???
+    # special case for now:
+    if self.cov_type == 'fixed scale':
+        var_resid = self.cov_kwds['scale']
+
+    if weights is not None:
+        var_resid /= weights
+
+    dist = ['norm', 't'][self.use_t]
+    return PredictionResults(predicted_mean, var_pred_mean, var_resid,
+                             df=self.df_resid, dist=dist,
+                             row_labels=row_labels)
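`get_prediction` above is the helper that linear-model results delegate to; the returned `PredictionResults` separates uncertainty about the mean (`se_mean`) from uncertainty about a new observation (`se_obs`, which adds the residual variance). A short sketch (synthetic data; names illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.normal(size=100))
    y = x @ np.array([1.0, 2.0]) + rng.normal(size=100)

    res = sm.OLS(y, x).fit()
    pred = res.get_prediction(x[:5])        # a PredictionResults as defined above
    print(pred.conf_int(alpha=0.05))        # interval for the mean
    print(pred.conf_int(obs=True))          # wider interval for a new observation
    print(pred.summary_frame(alpha=0.05))   # mean, se and both intervals as a DataFrame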
diff --git a/statsmodels/regression/_tools.py b/statsmodels/regression/_tools.py
index 264140e9b..13c987328 100644
--- a/statsmodels/regression/_tools.py
+++ b/statsmodels/regression/_tools.py
@@ -34,10 +34,11 @@ class _MinimalWLS:
     Does not perform any checks on the input data for type or shape
     compatibility
     """
+
     msg = 'NaN, inf or invalid value detected in {0}, estimation infeasible.'

     def __init__(self, endog, exog, weights=1.0, check_endog=False,
-        check_weights=False):
+                 check_weights=False):
         self.endog = endog
         self.exog = exog
         self.weights = weights
@@ -45,9 +46,11 @@ class _MinimalWLS:
         if check_weights:
             if not np.all(np.isfinite(w_half)):
                 raise ValueError(self.msg.format('weights'))
+
         if check_endog:
             if not np.all(np.isfinite(endog)):
                 raise ValueError(self.msg.format('endog'))
+
         self.wendog = w_half * endog
         if np.isscalar(weights):
             self.wexog = w_half * exog
@@ -88,7 +91,16 @@ class _MinimalWLS:
         --------
         statsmodels.regression.linear_model.WLS
         """
-        pass
+        if method == 'pinv':
+            pinv_wexog = np.linalg.pinv(self.wexog)
+            params = pinv_wexog.dot(self.wendog)
+        elif method == 'qr':
+            Q, R = np.linalg.qr(self.wexog)
+            params = np.linalg.solve(R, np.dot(Q.T, self.wendog))
+        else:
+            params, _, _, _ = np.linalg.lstsq(self.wexog, self.wendog,
+                                              rcond=-1)
+        return self.results(params)

     def results(self, params):
         """
@@ -102,4 +114,11 @@ class _MinimalWLS:
         Allows results to be constructed from either existing parameters or
         when estimated using using ``fit``
         """
-        pass
+        fitted_values = self.exog.dot(params)
+        resid = self.endog - fitted_values
+        wresid = self.wendog - self.wexog.dot(params)
+        df_resid = self.wexog.shape[0] - self.wexog.shape[1]
+        scale = np.dot(wresid, wresid) / df_resid
+
+        return Bunch(params=params, fittedvalues=fitted_values, resid=resid,
+                     model=self, scale=scale)
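`_MinimalWLS` is a private helper intended for iterative estimation loops where the overhead of building a full `WLS` model matters; input checks are skipped unless requested. A round-trip sketch, with the caveat that this is internal API:

    import numpy as np
    from statsmodels.regression._tools import _MinimalWLS

    rng = np.random.default_rng(1)
    exog = np.column_stack([np.ones(50), rng.normal(size=50)])
    endog = exog @ np.array([1.0, -0.5]) + rng.normal(size=50)
    weights = rng.uniform(0.5, 2.0, size=50)

    wls = _MinimalWLS(endog, exog, weights=weights, check_weights=True)
    res = wls.fit(method="pinv")   # Bunch with params, fittedvalues, resid, scale
    print(res.params, res.scale)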
diff --git a/statsmodels/regression/dimred.py b/statsmodels/regression/dimred.py
index b4085d462..5d401c181 100644
--- a/statsmodels/regression/dimred.py
+++ b/statsmodels/regression/dimred.py
@@ -1,6 +1,8 @@
 import warnings
+
 import numpy as np
 import pandas as pd
+
 from statsmodels.base import model
 import statsmodels.base.wrapper as wrap
 from statsmodels.tools.sm_exceptions import ConvergenceWarning
@@ -14,6 +16,23 @@ class _DimReductionRegression(model.Model):
     def __init__(self, endog, exog, **kwargs):
         super(_DimReductionRegression, self).__init__(endog, exog, **kwargs)

+    def _prep(self, n_slice):
+
+        # Sort the data by endog
+        ii = np.argsort(self.endog)
+        x = self.exog[ii, :]
+
+        # Whiten the data
+        x -= x.mean(0)
+        covx = np.dot(x.T, x) / x.shape[0]
+        covxr = np.linalg.cholesky(covx)
+        x = np.linalg.solve(covxr, x.T).T
+        self.wexog = x
+        self._covxr = covxr
+
+        # Split the data into slices
+        self._split_wexog = np.array_split(x, n_slice)
+

 class SlicedInverseReg(_DimReductionRegression):
     """
@@ -41,10 +60,107 @@ class SlicedInverseReg(_DimReductionRegression):
         slice_n : int, optional
             Target number of observations per slice
         """
-        pass
+
+        # Sample size per slice
+        if len(kwargs) > 0:
+            msg = "SIR.fit does not take any extra keyword arguments"
+            warnings.warn(msg)
+
+        # Number of slices
+        n_slice = self.exog.shape[0] // slice_n
+
+        self._prep(n_slice)
+
+        mn = [z.mean(0) for z in self._split_wexog]
+        n = [z.shape[0] for z in self._split_wexog]
+        mn = np.asarray(mn)
+        n = np.asarray(n)
+
+        # Estimate Cov E[X | Y=y]
+        mnc = np.dot(mn.T, n[:, None] * mn) / n.sum()
+
+        a, b = np.linalg.eigh(mnc)
+        jj = np.argsort(-a)
+        a = a[jj]
+        b = b[:, jj]
+        params = np.linalg.solve(self._covxr.T, b)
+
+        results = DimReductionResults(self, params, eigs=a)
+        return DimReductionResultsWrapper(results)
+
+    def _regularized_objective(self, A):
+        # The objective function for regularized SIR
+
+        p = self.k_vars
+        covx = self._covx
+        mn = self._slice_means
+        ph = self._slice_props
+        v = 0
+        A = np.reshape(A, (p, self.ndim))
+
+        # The penalty
+        for k in range(self.ndim):
+            u = np.dot(self.pen_mat, A[:, k])
+            v += np.sum(u * u)
+
+        # The SIR objective function
+        covxa = np.dot(covx, A)
+        q, _ = np.linalg.qr(covxa)
+        qd = np.dot(q, np.dot(q.T, mn.T))
+        qu = mn.T - qd
+        v += np.dot(ph, (qu * qu).sum(0))
+
+        return v
+
+    def _regularized_grad(self, A):
+        # The gradient of the objective function for regularized SIR
+
+        p = self.k_vars
+        ndim = self.ndim
+        covx = self._covx
+        n_slice = self.n_slice
+        mn = self._slice_means
+        ph = self._slice_props
+        A = A.reshape((p, ndim))
+
+        # Penalty gradient
+        gr = 2 * np.dot(self.pen_mat.T, np.dot(self.pen_mat, A))
+
+        A = A.reshape((p, ndim))
+        covxa = np.dot(covx, A)
+        covx2a = np.dot(covx, covxa)
+        Q = np.dot(covxa.T, covxa)
+        Qi = np.linalg.inv(Q)
+        jm = np.zeros((p, ndim))
+        qcv = np.linalg.solve(Q, covxa.T)
+
+        ft = [None] * (p * ndim)
+        for q in range(p):
+            for r in range(ndim):
+                jm *= 0
+                jm[q, r] = 1
+                umat = np.dot(covx2a.T, jm)
+                umat += umat.T
+                umat = -np.dot(Qi, np.dot(umat, Qi))
+                fmat = np.dot(np.dot(covx, jm), qcv)
+                fmat += np.dot(covxa, np.dot(umat, covxa.T))
+                fmat += np.dot(covxa, np.linalg.solve(Q, np.dot(jm.T, covx)))
+                ft[q*ndim + r] = fmat
+
+        ch = np.linalg.solve(Q, np.dot(covxa.T, mn.T))
+        cu = mn - np.dot(covxa, ch).T
+        for i in range(n_slice):
+            u = cu[i, :]
+            v = mn[i, :]
+            for q in range(p):
+                for r in range(ndim):
+                    f = np.dot(u, np.dot(ft[q*ndim + r], v))
+                    gr[q, r] -= 2 * ph[i] * f
+
+        return gr.ravel()

     def fit_regularized(self, ndim=1, pen_mat=None, slice_n=20, maxiter=100,
-        gtol=0.001, **kwargs):
+                        gtol=1e-3, **kwargs):
         """
         Estimate the EDR space using regularized SIR.

@@ -84,7 +200,65 @@ class SlicedInverseReg(_DimReductionRegression):
         analysis.  Statistics: a journal of theoretical and applied
         statistics 37(6) 475-488.
         """
-        pass
+
+        if len(kwargs) > 0:
+            msg = "SIR.fit_regularized does not take keyword arguments"
+            warnings.warn(msg)
+
+        if pen_mat is None:
+            raise ValueError("pen_mat is a required argument")
+
+        start_params = kwargs.get("start_params", None)
+
+        # Sample size per slice
+        slice_n = kwargs.get("slice_n", 20)
+
+        # Number of slices
+        n_slice = self.exog.shape[0] // slice_n
+
+        # Sort the data by endog
+        ii = np.argsort(self.endog)
+        x = self.exog[ii, :]
+        x -= x.mean(0)
+
+        covx = np.cov(x.T)
+
+        # Split the data into slices
+        split_exog = np.array_split(x, n_slice)
+
+        mn = [z.mean(0) for z in split_exog]
+        n = [z.shape[0] for z in split_exog]
+        mn = np.asarray(mn)
+        n = np.asarray(n)
+        self._slice_props = n / n.sum()
+        self.ndim = ndim
+        self.k_vars = covx.shape[0]
+        self.pen_mat = pen_mat
+        self._covx = covx
+        self.n_slice = n_slice
+        self._slice_means = mn
+
+        if start_params is None:
+            params = np.zeros((self.k_vars, ndim))
+            params[0:ndim, 0:ndim] = np.eye(ndim)
+            params = params
+        else:
+            if start_params.shape[1] != ndim:
+                msg = "Shape of start_params is not compatible with ndim"
+                raise ValueError(msg)
+            params = start_params
+
+        params, _, cnvrg = _grass_opt(params, self._regularized_objective,
+                                      self._regularized_grad, maxiter, gtol)
+
+        if not cnvrg:
+            g = self._regularized_grad(params.ravel())
+            gn = np.sqrt(np.dot(g, g))
+            msg = "SIR.fit_regularized did not converge, |g|=%f" % gn
+            warnings.warn(msg)
+
+        results = DimReductionResults(self, params, eigs=None)
+        return DimReductionResultsWrapper(results)


 class PrincipalHessianDirections(_DimReductionRegression):
@@ -126,7 +300,30 @@ class PrincipalHessianDirections(_DimReductionRegression):
         A results instance which can be used to access the estimated
         parameters.
         """
-        pass
+
+        resid = kwargs.get("resid", False)
+
+        y = self.endog - self.endog.mean()
+        x = self.exog - self.exog.mean(0)
+
+        if resid:
+            from statsmodels.regression.linear_model import OLS
+            r = OLS(y, x).fit()
+            y = r.resid
+
+        cm = np.einsum('i,ij,ik->jk', y, x, x)
+        cm /= len(y)
+
+        cx = np.cov(x.T)
+        cb = np.linalg.solve(cx, cm)
+
+        a, b = np.linalg.eig(cb)
+        jj = np.argsort(-np.abs(a))
+        a = a[jj]
+        params = b[:, jj]
+
+        results = DimReductionResults(self, params, eigs=a)
+        return DimReductionResultsWrapper(results)


 class SlicedAverageVarianceEstimation(_DimReductionRegression):
@@ -155,8 +352,9 @@ class SlicedAverageVarianceEstimation(_DimReductionRegression):

     def __init__(self, endog, exog, **kwargs):
         super(SAVE, self).__init__(endog, exog, **kwargs)
+
         self.bc = False
-        if 'bc' in kwargs and kwargs['bc'] is True:
+        if "bc" in kwargs and kwargs["bc"] is True:
             self.bc = True

     def fit(self, **kwargs):
@@ -168,7 +366,61 @@ class SlicedAverageVarianceEstimation(_DimReductionRegression):
         slice_n : int
             Number of observations per slice
         """
-        pass
+
+        # Sample size per slice
+        slice_n = kwargs.get("slice_n", 50)
+
+        # Number of slices
+        n_slice = self.exog.shape[0] // slice_n
+
+        self._prep(n_slice)
+
+        cv = [np.cov(z.T) for z in self._split_wexog]
+        ns = [z.shape[0] for z in self._split_wexog]
+
+        p = self.wexog.shape[1]
+
+        if not self.bc:
+            # Cook's original approach
+            vm = 0
+            for w, cvx in zip(ns, cv):
+                icv = np.eye(p) - cvx
+                vm += w * np.dot(icv, icv)
+            vm /= len(cv)
+        else:
+            # The bias-corrected approach of Li and Zhu
+
+            # \Lambda_n in Li, Zhu
+            av = 0
+            for c in cv:
+                av += np.dot(c, c)
+            av /= len(cv)
+
+            # V_n in Li, Zhu
+            vn = 0
+            for x in self._split_wexog:
+                r = x - x.mean(0)
+                for i in range(r.shape[0]):
+                    u = r[i, :]
+                    m = np.outer(u, u)
+                    vn += np.dot(m, m)
+            vn /= self.exog.shape[0]
+
+            c = np.mean(ns)
+            k1 = c * (c - 1) / ((c - 1)**2 + 1)
+            k2 = (c - 1) / ((c - 1)**2 + 1)
+            av2 = k1 * av - k2 * vn
+
+            vm = np.eye(p) - 2 * sum(cv) / len(cv) + av2
+
+        a, b = np.linalg.eigh(vm)
+        jj = np.argsort(-a)
+        a = a[jj]
+        b = b[:, jj]
+        params = np.linalg.solve(self._covxr.T, b)
+
+        results = DimReductionResults(self, params, eigs=a)
+        return DimReductionResultsWrapper(results)


 class DimReductionResults(model.Results):
@@ -185,16 +437,19 @@ class DimReductionResults(model.Results):
     """

     def __init__(self, model, params, eigs):
-        super(DimReductionResults, self).__init__(model, params)
+        super(DimReductionResults, self).__init__(
+              model, params)
         self.eigs = eigs


 class DimReductionResultsWrapper(wrap.ResultsWrapper):
-    _attrs = {'params': 'columns'}
+    _attrs = {
+        'params': 'columns',
+    }
     _wrap_attrs = _attrs

-
-wrap.populate_wrapper(DimReductionResultsWrapper, DimReductionResults)
+wrap.populate_wrapper(DimReductionResultsWrapper,  # noqa:E305
+                      DimReductionResults)


 def _grass_opt(params, fun, grad, maxiter, gtol):
@@ -234,7 +489,48 @@ def _grass_opt(params, fun, grad, maxiter, gtol):
     orthogonality constraints. SIAM J Matrix Anal Appl.
     http://math.mit.edu/~edelman/publications/geometry_of_algorithms.pdf
     """
-    pass
+
+    p, d = params.shape
+    params = params.ravel()
+
+    f0 = fun(params)
+    cnvrg = False
+
+    for _ in range(maxiter):
+
+        # Project the gradient to the tangent space
+        g = grad(params)
+        g -= np.dot(g, params) * params / np.dot(params, params)
+
+        if np.sqrt(np.sum(g * g)) < gtol:
+            cnvrg = True
+            break
+
+        gm = g.reshape((p, d))
+        u, s, vt = np.linalg.svd(gm, 0)
+
+        paramsm = params.reshape((p, d))
+        pa0 = np.dot(paramsm, vt.T)
+
+        def geo(t):
+            # Parameterize the geodesic path in the direction
+            # of the gradient as a function of a real value t.
+            pa = pa0 * np.cos(s * t) + u * np.sin(s * t)
+            return np.dot(pa, vt).ravel()
+
+        # Try to find a downhill step along the geodesic path.
+        step = 2.
+        while step > 1e-10:
+            pa = geo(-step)
+            f1 = fun(pa)
+            if f1 < f0:
+                params = pa
+                f0 = f1
+                break
+            step /= 2
+
+    params = params.reshape((p, d))
+    return params, f0, cnvrg


 class CovarianceReduction(_DimReductionRegression):
@@ -278,18 +574,24 @@ class CovarianceReduction(_DimReductionRegression):
     """

     def __init__(self, endog, exog, dim):
+
         super(CovarianceReduction, self).__init__(endog, exog)
+
         covs, ns = [], []
         df = pd.DataFrame(self.exog, index=self.endog)
         for _, v in df.groupby(df.index):
             covs.append(v.cov().values)
             ns.append(v.shape[0])
+
         self.nobs = len(endog)
+
+        # The marginal covariance
         covm = 0
         for i, _ in enumerate(covs):
             covm += covs[i] * ns[i]
         covm /= self.nobs
         self.covm = covm
+
         self.covs = covs
         self.ns = ns
         self.dim = dim
@@ -306,7 +608,20 @@ class CovarianceReduction(_DimReductionRegression):

         Returns the log-likelihood.
         """
-        pass
+
+        p = self.covm.shape[0]
+        proj = params.reshape((p, self.dim))
+
+        c = np.dot(proj.T, np.dot(self.covm, proj))
+        _, ldet = np.linalg.slogdet(c)
+        f = self.nobs * ldet / 2
+
+        for j, c in enumerate(self.covs):
+            c = np.dot(proj.T, np.dot(c, proj))
+            _, ldet = np.linalg.slogdet(c)
+            f -= self.ns[j] * ldet / 2
+
+        return f

     def score(self, params):
         """
@@ -320,9 +635,22 @@ class CovarianceReduction(_DimReductionRegression):

         Returns the score function evaluated at 'params'.
         """
-        pass

-    def fit(self, start_params=None, maxiter=200, gtol=0.0001):
+        p = self.covm.shape[0]
+        proj = params.reshape((p, self.dim))
+
+        c0 = np.dot(proj.T, np.dot(self.covm, proj))
+        cP = np.dot(self.covm, proj)
+        g = self.nobs * np.linalg.solve(c0, cP.T).T
+
+        for j, c in enumerate(self.covs):
+            c0 = np.dot(proj.T, np.dot(c, proj))
+            cP = np.dot(c, proj)
+            g -= self.ns[j] * np.linalg.solve(c0, cP.T).T
+
+        return g.ravel()
+
+    def fit(self, start_params=None, maxiter=200, gtol=1e-4):
         """
         Fit the covariance reduction model.

@@ -341,9 +669,36 @@ class CovarianceReduction(_DimReductionRegression):
         A results instance that can be used to access the
         fitted parameters.
         """
-        pass
-

+        p = self.covm.shape[0]
+        d = self.dim
+
+        # Starting value for params
+        if start_params is None:
+            params = np.zeros((p, d))
+            params[0:d, 0:d] = np.eye(d)
+            params = params
+        else:
+            params = start_params
+
+        # _grass_opt is designed for minimization, we are doing maximization
+        # here so everything needs to be flipped.
+        params, llf, cnvrg = _grass_opt(params, lambda x: -self.loglike(x),
+                                        lambda x: -self.score(x), maxiter,
+                                        gtol)
+        llf *= -1
+        if not cnvrg:
+            g = self.score(params.ravel())
+            gn = np.sqrt(np.sum(g * g))
+            msg = "CovReduce optimization did not converge, |g|=%f" % gn
+            warnings.warn(msg, ConvergenceWarning)
+
+        results = DimReductionResults(self, params, eigs=None)
+        results.llf = llf
+        return DimReductionResultsWrapper(results)
+
+
+# aliases for expert users
 SIR = SlicedInverseReg
 PHD = PrincipalHessianDirections
 SAVE = SlicedAverageVarianceEstimation
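The dimension-reduction estimators above share the `_prep` slicing/whitening step and `_grass_opt`, a gradient search along geodesics of the Grassmann manifold. A sketch of `SlicedInverseReg` on a single-index data-generating process (synthetic, illustrative):

    import numpy as np
    from statsmodels.regression.dimred import SlicedInverseReg

    rng = np.random.default_rng(2)
    n, p = 500, 5
    x = rng.normal(size=(n, p))
    beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
    # y depends on x only through the single index x @ beta
    y = np.exp(x @ beta) + rng.normal(scale=0.1, size=n)

    res = SlicedInverseReg(y, x).fit(slice_n=20)
    print(res.params[:, 0])   # leading estimated EDR direction (up to sign and scale)
    print(res.eigs)           # large leading eigenvalues flag informative directions

`SAVE` and `PHD` follow the same `fit` pattern; `fit_regularized` additionally requires a penalty matrix `pen_mat`.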
diff --git a/statsmodels/regression/feasible_gls.py b/statsmodels/regression/feasible_gls.py
index 4e1851318..dea412a09 100644
--- a/statsmodels/regression/feasible_gls.py
+++ b/statsmodels/regression/feasible_gls.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """

 Created on Tue Dec 20 20:24:20 2011
@@ -6,12 +7,20 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from statsmodels.regression.linear_model import OLS, GLS, WLS


+def atleast_2dcols(x):
+    x = np.asarray(x)
+    if x.ndim == 1:
+        x = x[:,None]
+    return x
+
+
 class GLSHet2(GLS):
-    """WLS with heteroscedasticity that depends on explanatory variables
+    '''WLS with heteroscedasticity that depends on explanatory variables

     note: mixing GLS sigma and weights for heteroscedasticity might not make
     sense
@@ -19,13 +28,30 @@ class GLSHet2(GLS):
     I think rewriting following the pattern of GLSAR is better
     stopping criteria: improve in GLSAR also, e.g. change in rho

-    """
+    '''
+

     def __init__(self, endog, exog, exog_var, sigma=None):
         self.exog_var = atleast_2dcols(exog_var)
         super(self.__class__, self).__init__(endog, exog, sigma=sigma)


+    def fit(self, lambd=1.):
+        #maybe iterate
+        #preliminary estimate
+        res_gls = GLS(self.endog, self.exog, sigma=self.sigma).fit()
+        res_resid = OLS(res_gls.resid**2, self.exog_var).fit()
+        #or  log-link
+        #res_resid = OLS(np.log(res_gls.resid**2), self.exog_var).fit()
+        #here I could use whiten and current instance instead of delegating
+        #but this is easier
+        #see pattern of GLSAR, calls self.initialize and self.fit
+        res_wls = WLS(self.endog, self.exog, weights=1./res_resid.fittedvalues).fit()
+
+        res_wls._results.results_residual_regression = res_resid
+        return res_wls
+
+
 class GLSHet(WLS):
     """
     A regression model with an estimated heteroscedasticity.
@@ -106,17 +132,17 @@ class GLSHet(WLS):

     TODO: test link option
     """
-
     def __init__(self, endog, exog, exog_var=None, weights=None, link=None):
         self.exog_var = atleast_2dcols(exog_var)
         if weights is None:
             weights = np.ones(endog.shape)
         if link is not None:
             self.link = link
-            self.linkinv = link.inverse
+            self.linkinv = link.inverse   #as defined in families.links
         else:
-            self.link = lambda x: x
+            self.link = lambda x: x  #no transformation
             self.linkinv = lambda x: x
+
         super(self.__class__, self).__init__(endog, exog, weights=weights)

     def iterative_fit(self, maxiter=3):
@@ -148,4 +174,31 @@ class GLSHet(WLS):
         calculation. Calling iterative_fit(maxiter) once does not do any
         redundant recalculations (whitening or calculating pinv_wexog).
         """
-        pass
+
+        import collections
+        self.history = collections.defaultdict(list) #not really necessary
+        res_resid = None  #if maxiter < 2 no updating
+        for i in range(maxiter):
+            #pinv_wexog is cached
+            if hasattr(self, 'pinv_wexog'):
+                del self.pinv_wexog
+            #self.initialize()
+            #print 'wls self',
+            results = self.fit()
+            self.history['self_params'].append(results.params)
+            if not i == maxiter-1:  #skip for last iteration, could break instead
+                #print 'ols',
+                self.results_old = results #for debugging
+                #estimate heteroscedasticity
+                res_resid = OLS(self.link(results.resid**2), self.exog_var).fit()
+                self.history['ols_params'].append(res_resid.params)
+                #update weights
+                self.weights = 1./self.linkinv(res_resid.fittedvalues)
+                self.weights /= self.weights.max()  #not required
+                self.weights[self.weights < 1e-14] = 1e-14  #clip
+                #print 'in iter', i, self.weights.var() #debug, do weights change
+                self.initialize()
+
+        #note results is the wrapper, results._results is the results instance
+        results._results.results_residual_regression = res_resid
+        return results
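`GLSHet.iterative_fit` above alternates a WLS fit of the mean equation with an OLS regression of the (link-transformed) squared residuals on `exog_var`, then reuses the inverse fitted variances as weights. A sketch with the variance driven by one regressor; the synthetic data and names are illustrative:

    import numpy as np
    from statsmodels.regression.feasible_gls import GLSHet

    rng = np.random.default_rng(3)
    n = 300
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    z = np.column_stack([np.ones(n), np.abs(x[:, 1])])    # drives the variance
    sigma = np.exp(0.5 * z[:, 1])
    y = x @ np.array([1.0, 2.0]) + sigma * rng.normal(size=n)

    mod = GLSHet(y, x, exog_var=z)
    res = mod.iterative_fit(maxiter=3)
    print(res.params)                                  # mean-equation coefficients
    print(res.results_residual_regression.params)      # auxiliary variance regression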
diff --git a/statsmodels/regression/linear_model.py b/statsmodels/regression/linear_model.py
index 3c5ba0fde..34166588e 100644
--- a/statsmodels/regression/linear_model.py
+++ b/statsmodels/regression/linear_model.py
@@ -1,3 +1,8 @@
+# TODO: Determine which tests are valid for GLSAR, and under what conditions
+# TODO: Fix issue with constant and GLS
+# TODO: GLS: add options Iterative GLS, for iterative fgls if sigma is None
+# TODO: GLS: default if sigma is none should be two-step GLS
+# TODO: Check nesting when performing model based tests, lr, wald, lm
 """
 This module implements standard regression models:

@@ -26,28 +31,42 @@ R. Davidson and J.G. MacKinnon.  "Econometric Theory and Methods," Oxford,
 W. Green.  "Econometric Analysis," 5th ed., Pearson, 2003.
 """
 from __future__ import annotations
+
 from statsmodels.compat.pandas import Appender
 from statsmodels.compat.python import lrange, lzip
+
 from typing import Literal, Sequence
 import warnings
+
 import numpy as np
 from scipy import optimize, stats
 from scipy.linalg import cholesky, toeplitz
 from scipy.linalg.lapack import dtrtri
+
 import statsmodels.base.model as base
 import statsmodels.base.wrapper as wrap
 from statsmodels.emplike.elregress import _ELRegOpts
+# need import in module instead of lazily to copy `__doc__`
 from statsmodels.regression._prediction import PredictionResults
 from statsmodels.tools.decorators import cache_readonly, cache_writable
-from statsmodels.tools.sm_exceptions import InvalidTestWarning, ValueWarning
+from statsmodels.tools.sm_exceptions import (
+    InvalidTestWarning,
+    ValueWarning,
+    )
 from statsmodels.tools.tools import pinv_extended
 from statsmodels.tools.typing import Float64Array
 from statsmodels.tools.validation import bool_like, float_like, string_like
+
 from . import _prediction as pred
+
 __docformat__ = 'restructuredtext en'
+
 __all__ = ['GLS', 'WLS', 'OLS', 'GLSAR', 'PredictionResults',
-    'RegressionResultsWrapper']
-_fit_regularized_doc = """
+           'RegressionResultsWrapper']
+
+
+_fit_regularized_doc =\
+        r"""
         Return a regularized fit to a linear regression model.

         Parameters
@@ -91,7 +110,7 @@ _fit_regularized_doc = """

         .. math::

-            0.5*RSS/n + alpha*((1-L1\\_wt)*|params|_2^2/2 + L1\\_wt*|params|_1)
+            0.5*RSS/n + alpha*((1-L1\_wt)*|params|_2^2/2 + L1\_wt*|params|_1)

         where RSS is the usual regression sum of squares, n is the
         sample size, and :math:`|*|_1` and :math:`|*|_2` are the L1 and L2
@@ -149,7 +168,28 @@ def _get_sigma(sigma, nobs):
     If sigma is None, returns None, None. Otherwise returns sigma,
     cholsigmainv.
     """
-    pass
+    if sigma is None:
+        return None, None
+    sigma = np.asarray(sigma).squeeze()
+    if sigma.ndim == 0:
+        sigma = np.repeat(sigma, nobs)
+    if sigma.ndim == 1:
+        if sigma.shape != (nobs,):
+            raise ValueError("Sigma must be a scalar, 1d of length %s or a 2d "
+                             "array of shape %s x %s" % (nobs, nobs, nobs))
+        cholsigmainv = 1/np.sqrt(sigma)
+    else:
+        if sigma.shape != (nobs, nobs):
+            raise ValueError("Sigma must be a scalar, 1d of length %s or a 2d "
+                             "array of shape %s x %s" % (nobs, nobs, nobs))
+        cholsigmainv, info = dtrtri(cholesky(sigma, lower=True),
+                                    lower=True, overwrite_c=True)
+        if info > 0:
+            raise np.linalg.LinAlgError('Cholesky decomposition of sigma '
+                                        'yields a singular matrix')
+        elif info < 0:
+            raise ValueError('Invalid input to dtrtri (info = %d)' % info)
+    return sigma, cholsigmainv
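A minimal usage sketch (made-up data) of the three `sigma` shapes this helper accepts via the GLS constructor: a scalar variance, a 1-d vector of per-observation variances, or a full n x n covariance matrix.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.standard_normal((50, 2)))
    y = x @ [1.0, 2.0, -1.0] + rng.standard_normal(50)

    sm.GLS(y, x, sigma=2.0)                    # scalar: constant error variance
    sm.GLS(y, x, sigma=np.linspace(1, 3, 50))  # 1-d: heteroscedastic variances
    rho = 0.5                                  # 2-d: AR(1)-style covariance matrix
    sigma = rho ** np.abs(np.subtract.outer(np.arange(50), np.arange(50)))
    sm.GLS(y, x, sigma=sigma)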


 class RegressionModel(base.LikelihoodModel):
@@ -158,7 +198,6 @@ class RegressionModel(base.LikelihoodModel):

     Intended for subclassing.
     """
-
     def __init__(self, endog, exog, **kwargs):
         super(RegressionModel, self).__init__(endog, exog, **kwargs)
         self.pinv_wexog: Float64Array | None = None
@@ -166,7 +205,14 @@ class RegressionModel(base.LikelihoodModel):

     def initialize(self):
         """Initialize model components."""
-        pass
+        self.wexog = self.whiten(self.exog)
+        self.wendog = self.whiten(self.endog)
+        # overwrite nobs from class Model:
+        self.nobs = float(self.wexog.shape[0])
+
+        self._df_model = None
+        self._df_resid = None
+        self.rank = None

     @property
     def df_model(self):
@@ -176,7 +222,15 @@ class RegressionModel(base.LikelihoodModel):
         The dof is defined as the rank of the regressor matrix minus 1 if a
         constant is included.
         """
-        pass
+        if self._df_model is None:
+            if self.rank is None:
+                self.rank = np.linalg.matrix_rank(self.exog)
+            self._df_model = float(self.rank - self.k_constant)
+        return self._df_model
+
+    @df_model.setter
+    def df_model(self, value):
+        self._df_model = value

     @property
     def df_resid(self):
@@ -186,7 +240,16 @@ class RegressionModel(base.LikelihoodModel):
         The dof is defined as the number of observations minus the rank of
         the regressor matrix.
         """
-        pass
+
+        if self._df_resid is None:
+            if self.rank is None:
+                self.rank = np.linalg.matrix_rank(self.exog)
+            self._df_resid = self.nobs - self.rank
+        return self._df_resid
+
+    @df_resid.setter
+    def df_resid(self, value):
+        self._df_resid = value

     def whiten(self, x):
         """
@@ -197,12 +260,27 @@ class RegressionModel(base.LikelihoodModel):
         x : array_like
             Data to be whitened.
         """
-        pass
-
-    def fit(self, method: Literal['pinv', 'qr']='pinv', cov_type: Literal[
-        'nonrobust', 'fixed scale', 'HC0', 'HC1', 'HC2', 'HC3', 'HAC',
-        'hac-panel', 'hac-groupsum', 'cluster']='nonrobust', cov_kwds=None,
-        use_t: (bool | None)=None, **kwargs):
+        raise NotImplementedError("Subclasses must implement.")
+
+    def fit(
+            self,
+            method: Literal["pinv", "qr"] = "pinv",
+            cov_type: Literal[
+                "nonrobust",
+                "fixed scale",
+                "HC0",
+                "HC1",
+                "HC2",
+                "HC3",
+                "HAC",
+                "hac-panel",
+                "hac-groupsum",
+                "cluster",
+            ] = "nonrobust",
+            cov_kwds=None,
+            use_t: bool | None = None,
+            **kwargs
+    ):
         """
         Full fit of the model.

@@ -249,7 +327,60 @@ class RegressionModel(base.LikelihoodModel):
         The fit method uses the pseudoinverse of the design/exogenous variables
         to solve the least squares minimization.
         """
-        pass
+        if method == "pinv":
+            if not (hasattr(self, 'pinv_wexog') and
+                    hasattr(self, 'normalized_cov_params') and
+                    hasattr(self, 'rank')):
+
+                self.pinv_wexog, singular_values = pinv_extended(self.wexog)
+                self.normalized_cov_params = np.dot(
+                    self.pinv_wexog, np.transpose(self.pinv_wexog))
+
+                # Cache these singular values for use later.
+                self.wexog_singular_values = singular_values
+                self.rank = np.linalg.matrix_rank(np.diag(singular_values))
+
+            beta = np.dot(self.pinv_wexog, self.wendog)
+
+        elif method == "qr":
+            if not (hasattr(self, 'exog_Q') and
+                    hasattr(self, 'exog_R') and
+                    hasattr(self, 'normalized_cov_params') and
+                    hasattr(self, 'rank')):
+                Q, R = np.linalg.qr(self.wexog)
+                self.exog_Q, self.exog_R = Q, R
+                self.normalized_cov_params = np.linalg.inv(np.dot(R.T, R))
+
+                # Cache singular values from R.
+                self.wexog_singular_values = np.linalg.svd(R, 0, 0)
+                self.rank = np.linalg.matrix_rank(R)
+            else:
+                Q, R = self.exog_Q, self.exog_R
+            # Needed for some covariance estimators, see GH #8157
+            self.pinv_wexog = np.linalg.pinv(self.wexog)
+            # used in ANOVA
+            self.effects = effects = np.dot(Q.T, self.wendog)
+            beta = np.linalg.solve(R, effects)
+        else:
+            raise ValueError('method has to be "pinv" or "qr"')
+
+        if self._df_model is None:
+            self._df_model = float(self.rank - self.k_constant)
+        if self._df_resid is None:
+            self.df_resid = self.nobs - self.rank
+
+        if isinstance(self, OLS):
+            lfit = OLSResults(
+                self, beta,
+                normalized_cov_params=self.normalized_cov_params,
+                cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t)
+        else:
+            lfit = RegressionResults(
+                self, beta,
+                normalized_cov_params=self.normalized_cov_params,
+                cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t,
+                **kwargs)
+        return RegressionResultsWrapper(lfit)
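A hedged sketch of this fit path: both solvers should agree on well-conditioned data, and `cov_type` selects the covariance estimator attached to the results (data below are made up).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.standard_normal((100, 3)))
    y = x @ [1.0, 0.5, -0.3, 2.0] + rng.standard_normal(100)

    res_pinv = sm.OLS(y, x).fit(method="pinv")              # default solver
    res_qr = sm.OLS(y, x).fit(method="qr", cov_type="HC3")  # QR + robust errors
    np.allclose(res_pinv.params, res_qr.params)             # True up to numerics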

     def predict(self, params, exog=None):
         """
@@ -271,7 +402,13 @@ class RegressionModel(base.LikelihoodModel):
         -----
         If the model has not yet been fit, params is not optional.
         """
-        pass
+        # JP: this does not look correct for GLMAR
+        # SS: it needs its own predict method
+
+        if exog is None:
+            exog = self.exog
+
+        return np.dot(exog, params)

     def get_distribution(self, params, scale, exog=None, dist_class=None):
         """
@@ -305,12 +442,16 @@ class RegressionModel(base.LikelihoodModel):
         the data set used to fit the model.  If any other value is
         used for ``n``, misleading results will be produced.
         """
-        pass
+        fit = self.predict(params, exog)
+        if dist_class is None:
+            from scipy.stats.distributions import norm
+            dist_class = norm
+        gen = dist_class(loc=fit, scale=np.sqrt(scale))
+        return gen
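An illustrative sketch (made-up data): the frozen distribution returned here can be used to draw simulated responses at the fitted parameters.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.standard_normal(200))
    y = 1.0 + 2.0 * x[:, 1] + rng.standard_normal(200)

    res = sm.OLS(y, x).fit()
    dist = res.model.get_distribution(res.params, scale=res.scale)
    y_sim = dist.rvs()   # one simulated sample of len(y) new responses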


 class GLS(RegressionModel):
-    __doc__ = (
-        """
+    __doc__ = r"""
     Generalized Least Squares

     %(params)s
@@ -340,7 +481,7 @@ class GLS(RegressionModel):
     nobs : float
         The number of observations n.
     normalized_cov_params : ndarray
-        p x p array :math:`(X^{T}\\Sigma^{-1}X)^{-1}`
+        p x p array :math:`(X^{T}\Sigma^{-1}X)^{-1}`
     results : RegressionResults instance
         A property that returns the RegressionResults class if fit.
     sigma : ndarray
@@ -383,17 +524,22 @@ class GLS(RegressionModel):
     >>> gls_model = sm.GLS(data.endog, data.exog, sigma=sigma)
     >>> gls_results = gls_model.fit()
     >>> print(gls_results.summary())
-    """
-         % {'params': base._model_params_doc, 'extra_params': base.
-        _missing_param_doc + base._extra_param_doc})
+    """ % {'params': base._model_params_doc,
+           'extra_params': base._missing_param_doc + base._extra_param_doc}

-    def __init__(self, endog, exog, sigma=None, missing='none', hasconst=
-        None, **kwargs):
+    def __init__(self, endog, exog, sigma=None, missing='none', hasconst=None,
+                 **kwargs):
         if type(self) is GLS:
             self._check_kwargs(kwargs)
+        # TODO: add options igls, for iterative fgls if sigma is None
+        # TODO: default if sigma is none should be two-step GLS
         sigma, cholsigmainv = _get_sigma(sigma, len(endog))
-        super(GLS, self).__init__(endog, exog, missing=missing, hasconst=
-            hasconst, sigma=sigma, cholsigmainv=cholsigmainv, **kwargs)
+
+        super(GLS, self).__init__(endog, exog, missing=missing,
+                                  hasconst=hasconst, sigma=sigma,
+                                  cholsigmainv=cholsigmainv, **kwargs)
+
+        # store attribute names for data arrays
         self._data_attr.extend(['sigma', 'cholsigmainv'])

     def whiten(self, x):
@@ -414,10 +560,19 @@ class GLS(RegressionModel):
         --------
         GLS : Fit a linear model using Generalized Least Squares.
         """
-        pass
+        x = np.asarray(x)
+        if self.sigma is None or self.sigma.shape == ():
+            return x
+        elif self.sigma.ndim == 1:
+            if x.ndim == 1:
+                return x * self.cholsigmainv
+            else:
+                return x * self.cholsigmainv[:, None]
+        else:
+            return np.dot(self.cholsigmainv, x)

     def loglike(self, params):
-        """
+        r"""
         Compute the value of the Gaussian log-likelihood function at params.

         Given the whitened design matrix, the log-likelihood is evaluated
@@ -437,14 +592,27 @@ class GLS(RegressionModel):
         -----
         The log-likelihood function for the normal distribution is

-        .. math:: -\\frac{n}{2}\\log\\left(\\left(Y-\\hat{Y}\\right)^{\\prime}
-                   \\left(Y-\\hat{Y}\\right)\\right)
-                  -\\frac{n}{2}\\left(1+\\log\\left(\\frac{2\\pi}{n}\\right)\\right)
-                  -\\frac{1}{2}\\log\\left(\\left|\\Sigma\\right|\\right)
+        .. math:: -\frac{n}{2}\log\left(\left(Y-\hat{Y}\right)^{\prime}
+                   \left(Y-\hat{Y}\right)\right)
+                  -\frac{n}{2}\left(1+\log\left(\frac{2\pi}{n}\right)\right)
+                  -\frac{1}{2}\log\left(\left|\Sigma\right|\right)

         Y and Y-hat are whitened.
         """
-        pass
+        # TODO: combine this with OLS/WLS loglike and add _det_sigma argument
+        nobs2 = self.nobs / 2.0
+        SSR = np.sum((self.wendog - np.dot(self.wexog, params))**2, axis=0)
+        llf = -np.log(SSR) * nobs2      # concentrated likelihood
+        llf -= (1+np.log(np.pi/nobs2))*nobs2  # with likelihood constant
+        if np.any(self.sigma):
+            # FIXME: robust-enough check? unneeded if _det_sigma gets defined
+            if self.sigma.ndim == 2:
+                det = np.linalg.slogdet(self.sigma)
+                llf -= .5*det[1]
+            else:
+                llf -= 0.5*np.sum(np.log(self.sigma))
+            # with error covariance matrix
+        return llf

     def hessian_factor(self, params, scale=None, observed=True):
         """
@@ -468,12 +636,49 @@ class GLS(RegressionModel):
             A 1d weight vector used in the calculation of the Hessian.
             The hessian is obtained by `(exog.T * hessian_factor).dot(exog)`.
         """
-        pass
+
+        if self.sigma is None or self.sigma.shape == ():
+            return np.ones(self.exog.shape[0])
+        elif self.sigma.ndim == 1:
+            return self.cholsigmainv
+        else:
+            return np.diag(self.cholsigmainv)
+
+    @Appender(_fit_regularized_doc)
+    def fit_regularized(self, method="elastic_net", alpha=0.,
+                        L1_wt=1., start_params=None, profile_scale=False,
+                        refit=False, **kwargs):
+        if not np.isscalar(alpha):
+            alpha = np.asarray(alpha)
+        # Need to adjust since RSS/n term in elastic net uses nominal
+        # n in denominator
+        if self.sigma is not None:
+            if self.sigma.ndim == 2:
+                var_obs = np.diag(self.sigma)
+            elif self.sigma.ndim == 1:
+                var_obs = self.sigma
+            else:
+                raise ValueError("sigma should be 1-dim or 2-dim")
+
+            alpha = alpha * np.sum(1 / var_obs) / len(self.endog)
+
+        rslt = OLS(self.wendog, self.wexog).fit_regularized(
+            method=method, alpha=alpha,
+            L1_wt=L1_wt,
+            start_params=start_params,
+            profile_scale=profile_scale,
+            refit=refit, **kwargs)
+
+        from statsmodels.base.elastic_net import (
+            RegularizedResults,
+            RegularizedResultsWrapper,
+        )
+        rrslt = RegularizedResults(self, rslt.params)
+        return RegularizedResultsWrapper(rrslt)


 class WLS(RegressionModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Weighted Least Squares

     The weights are presumed to be (proportional to) the inverse of
@@ -520,27 +725,28 @@ class WLS(RegressionModel):
      t=array([[ 2.0652652]]), p=array([[ 0.04690139]]), df_denom=5>
     >>> print(results.f_test([0, 1]))
     <F test: F=array([[ 0.12733784]]), p=[[ 0.73577409]], df_denom=5, df_num=1>
-    """
-         % {'params': base._model_params_doc, 'extra_params': base.
-        _missing_param_doc + base._extra_param_doc})
+    """ % {'params': base._model_params_doc,
+           'extra_params': base._missing_param_doc + base._extra_param_doc}

-    def __init__(self, endog, exog, weights=1.0, missing='none', hasconst=
-        None, **kwargs):
+    def __init__(self, endog, exog, weights=1., missing='none', hasconst=None,
+                 **kwargs):
         if type(self) is WLS:
             self._check_kwargs(kwargs)
         weights = np.array(weights)
         if weights.shape == ():
-            if missing == 'drop' and 'missing_idx' in kwargs and kwargs[
-                'missing_idx'] is not None:
+            if (missing == 'drop' and 'missing_idx' in kwargs and
+                    kwargs['missing_idx'] is not None):
+                # patsy may have truncated endog
                 weights = np.repeat(weights, len(kwargs['missing_idx']))
             else:
                 weights = np.repeat(weights, len(endog))
+        # handle case that endog might be of len == 1
         if len(weights) == 1:
             weights = np.array([weights.squeeze()])
         else:
             weights = weights.squeeze()
-        super(WLS, self).__init__(endog, exog, missing=missing, weights=
-            weights, hasconst=hasconst, **kwargs)
+        super(WLS, self).__init__(endog, exog, missing=missing,
+                                  weights=weights, hasconst=hasconst, **kwargs)
         nobs = self.exog.shape[0]
         weights = self.weights
         if weights.size != nobs and weights.shape[0] != nobs:
@@ -560,10 +766,15 @@ class WLS(RegressionModel):
         array_like
             The whitened values sqrt(weights)*X.
         """
-        pass
+
+        x = np.asarray(x)
+        if x.ndim == 1:
+            return x * np.sqrt(self.weights)
+        elif x.ndim == 2:
+            return np.sqrt(self.weights)[:, None] * x

     def loglike(self, params):
-        """
+        r"""
         Compute the value of the gaussian log-likelihood function at params.

         Given the whitened design matrix, the log-likelihood is evaluated
@@ -581,16 +792,21 @@ class WLS(RegressionModel):

         Notes
         -----
-        .. math:: -\\frac{n}{2}\\log SSR
-                  -\\frac{n}{2}\\left(1+\\log\\left(\\frac{2\\pi}{n}\\right)\\right)
-                  +\\frac{1}{2}\\log\\left(\\left|W\\right|\\right)
+        .. math:: -\frac{n}{2}\log SSR
+                  -\frac{n}{2}\left(1+\log\left(\frac{2\pi}{n}\right)\right)
+                  +\frac{1}{2}\log\left(\left|W\right|\right)

         where :math:`W` is a diagonal weight matrix,
-        :math:`\\left|W\\right|` is its determinant, and
-        :math:`SSR=\\left(Y-\\hat{Y}\\right)^\\prime W \\left(Y-\\hat{Y}\\right)` is
+        :math:`\left|W\right|` is its determinant, and
+        :math:`SSR=\left(Y-\hat{Y}\right)^\prime W \left(Y-\hat{Y}\right)` is
         the sum of the squared weighted residuals.
         """
-        pass
+        nobs2 = self.nobs / 2.0
+        SSR = np.sum((self.wendog - np.dot(self.wexog, params))**2, axis=0)
+        llf = -np.log(SSR) * nobs2      # concentrated likelihood
+        llf -= (1+np.log(np.pi/nobs2))*nobs2  # with constant
+        llf += 0.5 * np.sum(np.log(self.weights))
+        return llf

     def hessian_factor(self, params, scale=None, observed=True):
         """
@@ -614,12 +830,37 @@ class WLS(RegressionModel):
             A 1d weight vector used in the calculation of the Hessian.
             The hessian is obtained by `(exog.T * hessian_factor).dot(exog)`.
         """
-        pass
+
+        return self.weights
+
+    @Appender(_fit_regularized_doc)
+    def fit_regularized(self, method="elastic_net", alpha=0.,
+                        L1_wt=1., start_params=None, profile_scale=False,
+                        refit=False, **kwargs):
+        # Docstring attached below
+        if not np.isscalar(alpha):
+            alpha = np.asarray(alpha)
+        # Need to adjust since RSS/n in elastic net uses nominal n in
+        # denominator
+        alpha = alpha * np.sum(self.weights) / len(self.weights)
+
+        rslt = OLS(self.wendog, self.wexog).fit_regularized(
+            method=method, alpha=alpha,
+            L1_wt=L1_wt,
+            start_params=start_params,
+            profile_scale=profile_scale,
+            refit=refit, **kwargs)
+
+        from statsmodels.base.elastic_net import (
+            RegularizedResults,
+            RegularizedResultsWrapper,
+        )
+        rrslt = RegularizedResults(self, rslt.params)
+        return RegularizedResultsWrapper(rrslt)


 class OLS(WLS):
-    __doc__ = (
-        """
+    __doc__ = """
     Ordinary Least Squares

     %(params)s
@@ -670,23 +911,22 @@ class OLS(WLS):
     >>> print(results.f_test(np.identity(2)))
     <F test: F=array([[159.63031026]]), p=1.2607168903696672e-20,
      df_denom=43, df_num=2>
-    """
-         % {'params': base._model_params_doc, 'extra_params': base.
-        _missing_param_doc + base._extra_param_doc})
-
-    def __init__(self, endog, exog=None, missing='none', hasconst=None, **
-        kwargs):
-        if 'weights' in kwargs:
-            msg = (
-                'Weights are not supported in OLS and will be ignoredAn exception will be raised in the next version.'
-                )
+    """ % {'params': base._model_params_doc,
+           'extra_params': base._missing_param_doc + base._extra_param_doc}
+
+    def __init__(self, endog, exog=None, missing='none', hasconst=None,
+                 **kwargs):
+        if "weights" in kwargs:
+            msg = ("Weights are not supported in OLS and will be ignored"
+                   "An exception will be raised in the next version.")
             warnings.warn(msg, ValueWarning)
-        super(OLS, self).__init__(endog, exog, missing=missing, hasconst=
-            hasconst, **kwargs)
-        if 'weights' in self._init_keys:
-            self._init_keys.remove('weights')
+        super(OLS, self).__init__(endog, exog, missing=missing,
+                                  hasconst=hasconst, **kwargs)
+        if "weights" in self._init_keys:
+            self._init_keys.remove("weights")
+
         if type(self) is OLS:
-            self._check_kwargs(kwargs, ['offset'])
+            self._check_kwargs(kwargs, ["offset"])

     def loglike(self, params, scale=None):
         """
@@ -706,7 +946,19 @@ class OLS(WLS):
         float
             The likelihood function evaluated at params.
         """
-        pass
+        nobs2 = self.nobs / 2.0
+        nobs = float(self.nobs)
+        resid = self.endog - np.dot(self.exog, params)
+        if hasattr(self, 'offset'):
+            resid -= self.offset
+        ssr = np.sum(resid**2)
+        if scale is None:
+            # profile log likelihood
+            llf = -nobs2*np.log(2*np.pi) - nobs2*np.log(ssr / nobs) - nobs2
+        else:
+            # log-likelihood
+            llf = -nobs2 * np.log(2 * np.pi * scale) - ssr / (2*scale)
+        return llf

     def whiten(self, x):
         """
@@ -726,7 +978,7 @@ class OLS(WLS):
         --------
         OLS : Fit a linear model using Ordinary Least Squares.
         """
-        pass
+        return x

     def score(self, params, scale=None):
         """
@@ -751,7 +1003,28 @@ class OLS(WLS):
         ndarray
             The score vector.
         """
-        pass
+
+        if not hasattr(self, "_wexog_xprod"):
+            self._setup_score_hess()
+
+        xtxb = np.dot(self._wexog_xprod, params)
+        sdr = -self._wexog_x_wendog + xtxb
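+        # sdr = X'X b - X'y, i.e. one half of d(SSR)/d(params)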
+
+        if scale is None:
+            ssr = self._wendog_xprod - 2 * np.dot(self._wexog_x_wendog.T,
+                                                  params)
+            ssr += np.dot(params, xtxb)
+            return -self.nobs * sdr / ssr
+        else:
+            return -sdr / scale
+
+    def _setup_score_hess(self):
+        y = self.wendog
+        if hasattr(self, 'offset'):
+            y = y - self.offset
+        self._wendog_xprod = np.sum(y * y)
+        self._wexog_xprod = np.dot(self.wexog.T, self.wexog)
+        self._wexog_x_wendog = np.dot(self.wexog.T, y)

     def hessian(self, params, scale=None):
         """
@@ -771,7 +1044,21 @@ class OLS(WLS):
         ndarray
             The Hessian matrix.
         """
-        pass
+
+        if not hasattr(self, "_wexog_xprod"):
+            self._setup_score_hess()
+
+        xtxb = np.dot(self._wexog_xprod, params)
+
+        if scale is None:
+            ssr = self._wendog_xprod - 2 * np.dot(self._wexog_x_wendog.T,
+                                                  params)
+            ssr += np.dot(params, xtxb)
+            ssrp = -2*self._wexog_x_wendog + 2*xtxb
+            hm = self._wexog_xprod / ssr - np.outer(ssrp, ssrp) / ssr**2
+            return -self.nobs * hm / 2
+        else:
+            return -self._wexog_xprod / scale

     def hessian_factor(self, params, scale=None, observed=True):
         """
@@ -795,7 +1082,105 @@ class OLS(WLS):
             A 1d weight vector used in the calculation of the Hessian.
             The hessian is obtained by `(exog.T * hessian_factor).dot(exog)`.
         """
-        pass
+
+        return np.ones(self.exog.shape[0])
+
+    @Appender(_fit_regularized_doc)
+    def fit_regularized(self, method="elastic_net", alpha=0.,
+                        L1_wt=1., start_params=None, profile_scale=False,
+                        refit=False, **kwargs):
+
+        # In the future we could add support for other penalties, e.g. SCAD.
+        if method not in ("elastic_net", "sqrt_lasso"):
+            msg = "Unknown method '%s' for fit_regularized" % method
+            raise ValueError(msg)
+
+        # Set default parameters.
+        defaults = {"maxiter":  50, "cnvrg_tol": 1e-10,
+                    "zero_tol": 1e-8}
+        defaults.update(kwargs)
+
+        if method == "sqrt_lasso":
+            from statsmodels.base.elastic_net import (
+                RegularizedResults,
+                RegularizedResultsWrapper,
+            )
+            params = self._sqrt_lasso(alpha, refit, defaults["zero_tol"])
+            results = RegularizedResults(self, params)
+            return RegularizedResultsWrapper(results)
+
+        from statsmodels.base.elastic_net import fit_elasticnet
+
+        if L1_wt == 0:
+            return self._fit_ridge(alpha)
+
+        # If a scale parameter is passed in, the non-profile
+        # likelihood (residual sum of squares divided by -2) is used,
+        # otherwise the profile likelihood is used.
+        if profile_scale:
+            loglike_kwds = {}
+            score_kwds = {}
+            hess_kwds = {}
+        else:
+            loglike_kwds = {"scale": 1}
+            score_kwds = {"scale": 1}
+            hess_kwds = {"scale": 1}
+
+        return fit_elasticnet(self, method=method,
+                              alpha=alpha,
+                              L1_wt=L1_wt,
+                              start_params=start_params,
+                              loglike_kwds=loglike_kwds,
+                              score_kwds=score_kwds,
+                              hess_kwds=hess_kwds,
+                              refit=refit,
+                              check_step=False,
+                              **defaults)
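A hedged usage sketch of this method (elastic net is the default penalty; `sqrt_lasso` additionally requires cvxopt; data below are made up).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.standard_normal((200, 10))
    beta = np.zeros(10)
    beta[:3] = [1.0, -2.0, 0.5]
    y = x @ beta + rng.standard_normal(200)

    res_en = sm.OLS(y, x).fit_regularized(method="elastic_net",
                                          alpha=0.05, L1_wt=1.0)
    res_ridge = sm.OLS(y, x).fit_regularized(alpha=0.05, L1_wt=0.0)  # ridge path
    print(res_en.params)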
+
+    def _sqrt_lasso(self, alpha, refit, zero_tol):
+
+        try:
+            import cvxopt
+        except ImportError:
+            msg = 'sqrt_lasso fitting requires the cvxopt module'
+            raise ValueError(msg)
+
+        n = len(self.endog)
+        p = self.exog.shape[1]
+
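+        # Decision vector for the cone program is x = [t, b+, b-] with
+        # params = b+ - b-.  The objective c'x = t/sqrt(n) + (alpha/n)*||b||_1
+        # is minimized subject to ||endog - exog @ params||_2 <= t and
+        # b+, b- >= 0, which is the square-root lasso problem.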
+        h0 = cvxopt.matrix(0., (2*p+1, 1))
+        h1 = cvxopt.matrix(0., (n+1, 1))
+        h1[1:, 0] = cvxopt.matrix(self.endog, (n, 1))
+
+        G0 = cvxopt.spmatrix([], [], [], (2*p+1, 2*p+1))
+        for i in range(1, 2*p+1):
+            G0[i, i] = -1
+        G1 = cvxopt.matrix(0., (n+1, 2*p+1))
+        G1[0, 0] = -1
+        G1[1:, 1:p+1] = self.exog
+        G1[1:, p+1:] = -self.exog
+
+        c = cvxopt.matrix(alpha / n, (2*p + 1, 1))
+        c[0] = 1 / np.sqrt(n)
+
+        from cvxopt import solvers
+        solvers.options["show_progress"] = False
+
+        rslt = solvers.socp(c, Gl=G0, hl=h0, Gq=[G1], hq=[h1])
+        x = np.asarray(rslt['x']).flat
+        bp = x[1:p+1]
+        bn = x[p+1:]
+        params = bp - bn
+
+        if not refit:
+            return params
+
+        ii = np.flatnonzero(np.abs(params) > zero_tol)
+        rfr = OLS(self.endog, self.exog[:, ii]).fit()
+        params *= 0
+        params[ii] = rfr.params
+
+        return params

     def _fit_ridge(self, alpha):
         """
@@ -814,12 +1199,29 @@ class OLS(WLS):
         Equivalent to fit_regularized with L1_wt = 0 (but implemented
         more efficiently).
         """
-        pass
+
+        u, s, vt = np.linalg.svd(self.exog, 0)
+        v = vt.T
+        q = np.dot(u.T, self.endog) * s
+        s2 = s * s
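+        # Closed-form ridge solution via the SVD X = U diag(s) V':
+        # params = V diag(s / (s**2 + nobs * alpha)) U' y for scalar alpha;
+        # the branch below handles a per-coefficient alpha vector.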
+        if np.isscalar(alpha):
+            sd = s2 + alpha * self.nobs
+            params = q / sd
+            params = np.dot(v, params)
+        else:
+            alpha = np.asarray(alpha)
+            vtav = self.nobs * np.dot(vt, alpha[:, None] * v)
+            d = np.diag(vtav) + s2
+            np.fill_diagonal(vtav, d)
+            r = np.linalg.solve(vtav, q)
+            params = np.dot(v, r)
+
+        from statsmodels.base.elastic_net import RegularizedResults
+        return RegularizedResults(self, params)


 class GLSAR(GLS):
-    __doc__ = (
-        """
+    __doc__ = """
     Generalized Least Squares with AR covariance structure

     %(params)s
@@ -870,29 +1272,35 @@ class GLSAR(GLS):
     >>> res = model2.iterative_fit(maxiter=6)
     >>> model2.rho
     array([-0.60479146, -0.85841922])
-    """
-         % {'params': base._model_params_doc, 'extra_params': base.
-        _missing_param_doc + base._extra_param_doc})
+    """ % {'params': base._model_params_doc,
+           'extra_params': base._missing_param_doc + base._extra_param_doc}
+    # TODO: Complete docstring

-    def __init__(self, endog, exog=None, rho=1, missing='none', hasconst=
-        None, **kwargs):
+    def __init__(self, endog, exog=None, rho=1, missing='none', hasconst=None,
+                 **kwargs):
+        # this looks strange, interpreting rho as order if it is int
         if isinstance(rho, (int, np.integer)):
             self.order = int(rho)
             self.rho = np.zeros(self.order, np.float64)
         else:
             self.rho = np.squeeze(np.asarray(rho))
             if len(self.rho.shape) not in [0, 1]:
-                raise ValueError('AR parameters must be a scalar or a vector')
+                raise ValueError("AR parameters must be a scalar or a vector")
             if self.rho.shape == ():
-                self.rho.shape = 1,
+                self.rho.shape = (1,)
             self.order = self.rho.shape[0]
         if exog is None:
+            # JP this looks wrong, should be a regression on constant
+            # results for rho estimate now identical to yule-walker on y
+            # super(AR, self).__init__(endog, add_constant(endog))
             super(GLSAR, self).__init__(endog, np.ones((endog.shape[0], 1)),
-                missing=missing, hasconst=None, **kwargs)
+                                        missing=missing, hasconst=None,
+                                        **kwargs)
         else:
-            super(GLSAR, self).__init__(endog, exog, missing=missing, **kwargs)
+            super(GLSAR, self).__init__(endog, exog, missing=missing,
+                                        **kwargs)

-    def iterative_fit(self, maxiter=3, rtol=0.0001, **kwargs):
+    def iterative_fit(self, maxiter=3, rtol=1e-4, **kwargs):
         """
         Perform an iterative two-stage procedure to estimate a GLS model.

@@ -914,7 +1322,48 @@ class GLSAR(GLS):
         RegressionResults
             The results computed using an iterative fit.
         """
-        pass
+        # TODO: update this after going through example.
+        converged = False
+        i = -1  # need to initialize for maxiter < 1 (skip loop)
+        history = {'params': [], 'rho': [self.rho]}
+        for i in range(maxiter - 1):
+            if hasattr(self, 'pinv_wexog'):
+                del self.pinv_wexog
+            self.initialize()
+            results = self.fit()
+            history['params'].append(results.params)
+            if i == 0:
+                last = results.params
+            else:
+                diff = np.max(np.abs(last - results.params) / np.abs(last))
+                if diff < rtol:
+                    converged = True
+                    break
+                last = results.params
+            self.rho, _ = yule_walker(results.resid,
+                                      order=self.order, df=None)
+            history['rho'].append(self.rho)
+
+        # why not another call to self.initialize
+        # Use kwarg to insert history
+        if not converged and maxiter > 0:
+            # maxiter <= 0 just does OLS
+            if hasattr(self, 'pinv_wexog'):
+                del self.pinv_wexog
+            self.initialize()
+
+        # if converged then this is a duplicate fit, because we did not
+        # update rho
+        results = self.fit(history=history, **kwargs)
+        results.iter = i + 1
+        # add the last fit to history, unless it duplicates the converged fit
+        if not converged:
+            results.history['params'].append(results.params)
+            results.iter += 1
+
+        results.converged = converged
+
+        return results

     def whiten(self, x):
         """
@@ -932,11 +1381,18 @@ class GLSAR(GLS):
         ndarray
             The whitened data.
         """
-        pass
+        # TODO: notation for AR process
+        x = np.asarray(x, np.float64)
+        _x = x.copy()

+        # the following loops over the first axis,  works for 1d and nd
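+        # x*_t = x_t - rho_1 * x_{t-1} - ... - rho_p * x_{t-p};
+        # the first `order` observations are lost to the AR filter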
+        for i in range(self.order):
+            _x[(i + 1):] = _x[(i + 1):] - self.rho[i] * x[0:-(i + 1)]
+        return _x[self.order:]

-def yule_walker(x, order=1, method='adjusted', df=None, inv=False, demean=True
-    ):
+
+def yule_walker(x, order=1, method="adjusted", df=None, inv=False,
+                demean=True):
     """
     Estimate AR(p) parameters from a sequence using the Yule-Walker equations.

@@ -992,7 +1448,59 @@ def yule_walker(x, order=1, method='adjusted', df=None, inv=False, demean=True
     >>> sigma
     16.808022730464351
     """
-    pass
+    # TODO: define R better, look back at notes and technical notes on YW.
+    # First link here is useful
+    # http://www-stat.wharton.upenn.edu/~steele/Courses/956/ResourceDetails/YuleWalkerAndMore.htm
+
+    method = string_like(
+        method, "method", options=("adjusted", "unbiased", "mle")
+    )
+    if method == "unbiased":
+        warnings.warn(
+            "unbiased is deprecated in factor of adjusted to reflect that the "
+            "term is adjusting the sample size used in the autocovariance "
+            "calculation rather than estimating an unbiased autocovariance. "
+            "After release 0.13, using 'unbiased' will raise.",
+            FutureWarning,
+        )
+        method = "adjusted"
+
+    if method not in ("adjusted", "mle"):
+        raise ValueError("ACF estimation method must be 'adjusted' or 'MLE'")
+    x = np.array(x, dtype=np.float64)
+    if demean:
+        x -= x.mean()
+    n = df or x.shape[0]
+
+    # this handles df_resid, i.e., n - p
+    adj_needed = method == "adjusted"
+
+    if x.ndim > 1 and x.shape[1] != 1:
+        raise ValueError("expecting a vector to estimate AR parameters")
+    r = np.zeros(order+1, np.float64)
+    r[0] = (x ** 2).sum() / n
+    for k in range(1, order+1):
+        r[k] = (x[0:-k] * x[k:]).sum() / (n - k * adj_needed)
+    R = toeplitz(r[:-1])
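+    # Solve the Yule-Walker system R @ rho = r[1:], where R is the p x p
+    # Toeplitz matrix built from the autocovariances r[0], ..., r[p-1].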
+
+    try:
+        rho = np.linalg.solve(R, r[1:])
+    except np.linalg.LinAlgError as err:
+        if 'Singular matrix' in str(err):
+            warnings.warn("Matrix is singular. Using pinv.", ValueWarning)
+            rho = np.linalg.pinv(R) @ r[1:]
+        else:
+            raise
+
+    sigmasq = r[0] - (r[1:]*rho).sum()
+    if not np.isnan(sigmasq) and sigmasq > 0:
+        sigma = np.sqrt(sigmasq)
+    else:
+        sigma = np.nan
+    if inv:
+        return rho, sigma, np.linalg.inv(R)
+    else:
+        return rho, sigma


 def burg(endog, order=1, demean=True):
@@ -1041,11 +1549,24 @@ def burg(endog, order=1, demean=True):
     >>> sigma2
     271.2467306963966
     """
-    pass
+    # Avoid circular imports
+    from statsmodels.tsa.stattools import levinson_durbin_pacf, pacf_burg
+
+    endog = np.squeeze(np.asarray(endog))
+    if endog.ndim != 1:
+        raise ValueError('endog must be 1-d or squeezable to 1-d.')
+    order = int(order)
+    if order < 1:
+        raise ValueError('order must be a positive integer')
+    if demean:
+        endog = endog - endog.mean()
+    pacf, sigma = pacf_burg(endog, order, demean=demean)
+    ar, _ = levinson_durbin_pacf(pacf)
+    return ar, sigma[-1]


 class RegressionResults(base.LikelihoodModelResults):
-    """
+    r"""
     This class summarizes the fit of a linear regression model.

     It handles the output of contrasts, estimates of covariance, etc.
@@ -1094,40 +1615,47 @@ class RegressionResults(base.LikelihoodModelResults):
         criterion.  This is usually called Beta for the classical
         linear model.
     """
-    _cache = {}

-    def __init__(self, model, params, normalized_cov_params=None, scale=1.0,
-        cov_type='nonrobust', cov_kwds=None, use_t=None, **kwargs):
-        super(RegressionResults, self).__init__(model, params,
-            normalized_cov_params, scale)
+    _cache = {}  # needs to be a class attribute for scale setter?
+
+    def __init__(self, model, params, normalized_cov_params=None, scale=1.,
+                 cov_type='nonrobust', cov_kwds=None, use_t=None, **kwargs):
+        super(RegressionResults, self).__init__(
+            model, params, normalized_cov_params, scale)
+
         self._cache = {}
         if hasattr(model, 'wexog_singular_values'):
             self._wexog_singular_values = model.wexog_singular_values
         else:
             self._wexog_singular_values = None
+
         self.df_model = model.df_model
         self.df_resid = model.df_resid
+
         if cov_type == 'nonrobust':
             self.cov_type = 'nonrobust'
-            self.cov_kwds = {'description': 
-                'Standard Errors assume that the ' +
-                'covariance matrix of the errors is correctly ' + 'specified.'}
+            self.cov_kwds = {
+                'description': 'Standard Errors assume that the ' +
+                'covariance matrix of the errors is correctly ' +
+                'specified.'}
             if use_t is None:
-                use_t = True
+                use_t = True  # TODO: class default
             self.use_t = use_t
         else:
             if cov_kwds is None:
                 cov_kwds = {}
             if 'use_t' in cov_kwds:
+                # TODO: we want to get rid of 'use_t' in cov_kwds
                 use_t_2 = cov_kwds.pop('use_t')
                 if use_t is None:
                     use_t = use_t_2
+                # TODO: warn or not?
             self.get_robustcov_results(cov_type=cov_type, use_self=True,
-                use_t=use_t, **cov_kwds)
+                                       use_t=use_t, **cov_kwds)
         for key in kwargs:
             setattr(self, key, kwargs[key])

-    def conf_int(self, alpha=0.05, cols=None):
+    def conf_int(self, alpha=.05, cols=None):
         """
         Compute the confidence interval of the fitted parameters.

@@ -1148,30 +1676,35 @@ class RegressionResults(base.LikelihoodModelResults):
         -----
         The confidence interval is based on Student's t-distribution.
         """
-        pass
+        # keep method for docstring for now
+        ci = super(RegressionResults, self).conf_int(alpha=alpha, cols=cols)
+        return ci

     @cache_readonly
     def nobs(self):
         """Number of observations n."""
-        pass
+        return float(self.model.wexog.shape[0])

     @cache_readonly
     def fittedvalues(self):
         """The predicted values for the original (unwhitened) design."""
-        pass
+        return self.model.predict(self.params, self.model.exog)

     @cache_readonly
     def wresid(self):
         """
         The residuals of the transformed/whitened regressand and regressor(s).
         """
-        pass
+        return self.model.wendog - self.model.predict(
+            self.params, self.model.wexog)

     @cache_readonly
     def resid(self):
         """The residuals of the model."""
-        pass
+        return self.model.endog - self.model.predict(
+            self.params, self.model.exog)

+    # TODO: fix writable example
     @cache_writable()
     def scale(self):
         """
@@ -1180,17 +1713,35 @@ class RegressionResults(base.LikelihoodModelResults):
         The Default value is ssr/(n-p).  Note that the square root of `scale`
         is often called the standard error of the regression.
         """
-        pass
+        wresid = self.wresid
+        return np.dot(wresid, wresid) / self.df_resid

     @cache_readonly
     def ssr(self):
         """Sum of squared (whitened) residuals."""
-        pass
+        wresid = self.wresid
+        return np.dot(wresid, wresid)

     @cache_readonly
     def centered_tss(self):
         """The total (weighted) sum of squares centered about the mean."""
-        pass
+        model = self.model
+        weights = getattr(model, 'weights', None)
+        sigma = getattr(model, 'sigma', None)
+        if weights is not None:
+            mean = np.average(model.endog, weights=weights)
+            return np.sum(weights * (model.endog - mean)**2)
+        elif sigma is not None:
+            # Exactly matches WLS when sigma is diagonal
+            iota = np.ones_like(model.endog)
+            iota = model.whiten(iota)
+            mean = model.wendog.dot(iota) / iota.dot(iota)
+            err = model.endog - mean
+            err = model.whiten(err)
+            return np.sum(err**2)
+        else:
+            centered_endog = model.wendog - model.wendog.mean()
+            return np.dot(centered_endog, centered_endog)

     @cache_readonly
     def uncentered_tss(self):
@@ -1200,7 +1751,8 @@ class RegressionResults(base.LikelihoodModelResults):
         The sum of the squared values of the (whitened) endogenous response
         variable.
         """
-        pass
+        wendog = self.model.wendog
+        return np.dot(wendog, wendog)

     @cache_readonly
     def ess(self):
@@ -1211,7 +1763,11 @@ class RegressionResults(base.LikelihoodModelResults):
         sum of squared residuals. If there is no constant, the uncentered total
         sum of squares is used.
         """
-        pass
+
+        if self.k_constant:
+            return self.centered_tss - self.ssr
+        else:
+            return self.uncentered_tss - self.ssr

     @cache_readonly
     def rsquared(self):
@@ -1222,7 +1778,10 @@ class RegressionResults(base.LikelihoodModelResults):
         included in the model and 1 - `ssr`/`uncentered_tss` if the constant is
         omitted.
         """
-        pass
+        if self.k_constant:
+            return 1 - self.ssr/self.centered_tss
+        else:
+            return 1 - self.ssr/self.uncentered_tss

     @cache_readonly
     def rsquared_adj(self):
@@ -1233,7 +1792,8 @@ class RegressionResults(base.LikelihoodModelResults):
         if a constant is included and 1 - `nobs`/`df_resid` * (1-`rsquared`) if
         no constant is included.
         """
-        pass
+        return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
+                    * (1 - self.rsquared))

     @cache_readonly
     def mse_model(self):
@@ -1242,7 +1802,9 @@ class RegressionResults(base.LikelihoodModelResults):

         The explained sum of squares divided by the model degrees of freedom.
         """
-        pass
+        if np.all(self.df_model == 0.0):
+            return np.full_like(self.ess, np.nan)
+        return self.ess/self.df_model

     @cache_readonly
     def mse_resid(self):
@@ -1252,7 +1814,9 @@ class RegressionResults(base.LikelihoodModelResults):
         The sum of squared residuals divided by the residual degrees of
         freedom.
         """
-        pass
+        if np.all(self.df_resid == 0.0):
+            return np.full_like(self.ssr, np.nan)
+        return self.ssr/self.df_resid

     @cache_readonly
     def mse_total(self):
@@ -1262,7 +1826,12 @@ class RegressionResults(base.LikelihoodModelResults):
         The uncentered total sum of squares divided by the number of
         observations.
         """
-        pass
+        if np.all(self.df_resid + self.df_model == 0.0):
+            return np.full_like(self.centered_tss, np.nan)
+        if self.k_constant:
+            return self.centered_tss / (self.df_resid + self.df_model)
+        else:
+            return self.uncentered_tss / (self.df_resid + self.df_model)

     @cache_readonly
     def fvalue(self):
@@ -1274,37 +1843,65 @@ class RegressionResults(base.LikelihoodModelResults):
         Otherwise computed using a Wald-like quadratic form that tests whether
         all coefficients (excluding the constant) are zero.
         """
-        pass
+        if hasattr(self, 'cov_type') and self.cov_type != 'nonrobust':
+            # with heteroscedasticity or correlation robustness
+            k_params = self.normalized_cov_params.shape[0]
+            mat = np.eye(k_params)
+            const_idx = self.model.data.const_idx
+            # TODO: What if model includes implicit constant, e.g. all
+            #       dummies but no constant regressor?
+            # TODO: Restate as an LM test by projecting/orthogonalizing
+            #       against the constant?
+            if self.model.data.k_constant == 1:
+                # if constant is implicit, return nan see #2444
+                if const_idx is None:
+                    return np.nan
+
+                idx = lrange(k_params)
+                idx.pop(const_idx)
+                mat = mat[idx]  # remove constant
+                if mat.size == 0:  # see  #3642
+                    return np.nan
+            ft = self.f_test(mat)
+            # using backdoor to set another attribute that we already have
+            self._cache['f_pvalue'] = float(ft.pvalue)
+            return float(ft.fvalue)
+        else:
+            # for standard homoscedastic case
+            return self.mse_model/self.mse_resid

     @cache_readonly
     def f_pvalue(self):
         """The p-value of the F-statistic."""
-        pass
+        # Special case for df_model 0
+        if self.df_model == 0:
+            return np.full_like(self.fvalue, np.nan)
+        return stats.f.sf(self.fvalue, self.df_model, self.df_resid)

     @cache_readonly
     def bse(self):
         """The standard errors of the parameter estimates."""
-        pass
+        return np.sqrt(np.diag(self.cov_params()))

     @cache_readonly
     def aic(self):
-        """
+        r"""
         Akaike's information criteria.

-        For a model with a constant :math:`-2llf + 2(df\\_model + 1)`. For a
-        model without a constant :math:`-2llf + 2(df\\_model)`.
+        For a model with a constant :math:`-2llf + 2(df\_model + 1)`. For a
+        model without a constant :math:`-2llf + 2(df\_model)`.
         """
-        pass
+        return self.info_criteria("aic")

     @cache_readonly
     def bic(self):
-        """
+        r"""
         Bayes' information criteria.

-        For a model with a constant :math:`-2llf + \\log(n)(df\\_model+1)`.
-        For a model without a constant :math:`-2llf + \\log(n)(df\\_model)`.
+        For a model with a constant :math:`-2llf + \log(n)(df\_model+1)`.
+        For a model without a constant :math:`-2llf + \log(n)(df\_model)`.
         """
-        pass
+        return self.info_criteria("bic")

     def info_criteria(self, crit, dk_params=0):
         """Return an information criterion for the model.
@@ -1328,14 +1925,32 @@ class RegressionResults(base.LikelihoodModelResults):
         Burnham KP, Anderson KR (2002). Model Selection and Multimodel
         Inference; Springer New York.
         """
-        pass
+        crit = crit.lower()
+        k_params = self.df_model + self.k_constant + dk_params
+
+        if crit == "aic":
+            return -2 * self.llf + 2 * k_params
+        elif crit == "bic":
+            bic = -2*self.llf + np.log(self.nobs) * k_params
+            return bic
+        elif crit == "aicc":
+            from statsmodels.tools.eval_measures import aicc
+            return aicc(self.llf, self.nobs, k_params)
+        elif crit == "hqic":
+            from statsmodels.tools.eval_measures import hqic
+            return hqic(self.llf, self.nobs, k_params)
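An illustrative sketch (made-up data): `aic` and `bic` above delegate to this method, and the remaining criteria can be requested directly.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.standard_normal((50, 2)))
    y = x @ [1.0, 0.5, -0.5] + rng.standard_normal(50)
    res = sm.OLS(y, x).fit()

    res.aic == res.info_criteria("aic")   # should be True
    res.info_criteria("aicc")             # small-sample corrected AIC
    res.info_criteria("hqic")             # Hannan-Quinn criterion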

     @cache_readonly
     def eigenvals(self):
         """
         Return eigenvalues sorted in decreasing order.
         """
-        pass
+        if self._wexog_singular_values is not None:
+            eigvals = self._wexog_singular_values ** 2
+        else:
+            wx = self.model.wexog
+            eigvals = np.linalg.eigvalsh(wx.T @ wx)
+        return np.sort(eigvals)[::-1]

     @cache_readonly
     def condition_number(self):
@@ -1347,35 +1962,58 @@ class RegressionResults(base.LikelihoodModelResults):
         the ratio of the largest to smallest eigenvalue of the inner-product
         of the exogenous variables.
         """
-        pass
+        eigvals = self.eigenvals
+        return np.sqrt(eigvals[0]/eigvals[-1])
+
+    # TODO: make these properties reset bse
+    def _HCCM(self, scale):
+        H = np.dot(self.model.pinv_wexog,
+                   scale[:, None] * self.model.pinv_wexog.T)
+        return H
+
+    def _abat_diagonal(self, a, b):
+        # equivalent to np.diag(a @ b @ a.T)
+        return np.einsum('ij,ik,kj->i', a, a, b)

     @cache_readonly
     def cov_HC0(self):
         """
         Heteroscedasticity robust covariance matrix. See HC0_se.
         """
-        pass
+        self.het_scale = self.wresid**2
+        cov_HC0 = self._HCCM(self.het_scale)
+        return cov_HC0

     @cache_readonly
     def cov_HC1(self):
         """
         Heteroscedasticity robust covariance matrix. See HC1_se.
         """
-        pass
+        self.het_scale = self.nobs/(self.df_resid)*(self.wresid**2)
+        cov_HC1 = self._HCCM(self.het_scale)
+        return cov_HC1

     @cache_readonly
     def cov_HC2(self):
         """
         Heteroscedasticity robust covariance matrix. See HC2_se.
         """
-        pass
+        wexog = self.model.wexog
+        h = self._abat_diagonal(wexog, self.normalized_cov_params)
+        self.het_scale = self.wresid**2/(1-h)
+        cov_HC2 = self._HCCM(self.het_scale)
+        return cov_HC2

     @cache_readonly
     def cov_HC3(self):
         """
         Heteroscedasticity robust covariance matrix. See HC3_se.
         """
-        pass
+        wexog = self.model.wexog
+        h = self._abat_diagonal(wexog, self.normalized_cov_params)
+        self.het_scale = (self.wresid / (1 - h))**2
+        cov_HC3 = self._HCCM(self.het_scale)
+        return cov_HC3

     @cache_readonly
     def HC0_se(self):
@@ -1391,7 +2029,7 @@ class RegressionResults(base.LikelihoodModelResults):
         then have another attribute `het_scale`, which is in this case is just
         resid**2.
         """
-        pass
+        return np.sqrt(np.diag(self.cov_HC0))

     @cache_readonly
     def HC1_se(self):
@@ -1406,7 +2044,7 @@ class RegressionResults(base.LikelihoodModelResults):
         then have another attribute `het_scale`, which is in this case is
         n/(n-p)*resid**2.
         """
-        pass
+        return np.sqrt(np.diag(self.cov_HC1))

     @cache_readonly
     def HC2_se(self):
@@ -1422,7 +2060,7 @@ class RegressionResults(base.LikelihoodModelResults):
         then have another attribute `het_scale`, which is in this case is
         resid^(2)/(1-h_ii).
         """
-        pass
+        return np.sqrt(np.diag(self.cov_HC2))

     @cache_readonly
     def HC3_se(self):
@@ -1438,7 +2076,7 @@ class RegressionResults(base.LikelihoodModelResults):
         then have another attribute `het_scale`, which is in this case is
         resid^(2)/(1-h_ii)^(2).
         """
-        pass
+        return np.sqrt(np.diag(self.cov_HC3))

     @cache_readonly
     def resid_pearson(self):
@@ -1451,7 +2089,19 @@ class RegressionResults(base.LikelihoodModelResults):
             The array `wresid` normalized by the sqrt of the scale to have
             unit variance.
         """
-        pass
+
+        if not hasattr(self, 'resid'):
+            raise ValueError('Method requires residuals.')
+        eps = np.finfo(self.wresid.dtype).eps
+        if np.sqrt(self.scale) < 10 * eps * self.model.endog.mean():
+            # do not divide if scale is zero or close to numerical precision
+            warnings.warn(
+                "All residuals are 0, cannot compute normed residuals.",
+                RuntimeWarning
+            )
+            return self.wresid
+        else:
+            return self.wresid / np.sqrt(self.scale)

     def _is_nested(self, restricted):
         """
@@ -1474,7 +2124,23 @@ class RegressionResults(base.LikelihoodModelResults):
         model are spanned by the regressors in the larger model and
         the regressand is identical.
         """
-        pass
+
+        if self.model.nobs != restricted.model.nobs:
+            return False
+
+        full_rank = self.model.rank
+        restricted_rank = restricted.model.rank
+        if full_rank <= restricted_rank:
+            return False
+
+        restricted_exog = restricted.model.wexog
+        full_wresid = self.wresid
+
+        scores = restricted_exog * full_wresid[:, None]
+        score_l2 = np.sqrt(np.mean(scores.mean(0) ** 2))
+        # TODO: Could be improved, and may fail depending on scale of
+        # regressors
+        return np.allclose(score_l2, 0)

     def compare_lm_test(self, restricted, demean=True, use_lr=False):
         """
@@ -1515,7 +2181,56 @@ class RegressionResults(base.LikelihoodModelResults):
         the sum of squared errors, and so the scores should be close to zero,
         on average.
         """
-        pass
+        from numpy.linalg import inv
+
+        import statsmodels.stats.sandwich_covariance as sw
+
+        if not self._is_nested(restricted):
+            raise ValueError("Restricted model is not nested by full model.")
+
+        wresid = restricted.wresid
+        wexog = self.model.wexog
+        scores = wexog * wresid[:, None]
+
+        n = self.nobs
+        df_full = self.df_resid
+        df_restr = restricted.df_resid
+        df_diff = (df_restr - df_full)
+
+        s = scores.mean(axis=0)
+        if use_lr:
+            scores = wexog * self.wresid[:, None]
+            demean = False
+
+        if demean:
+            scores = scores - scores.mean(0)[None, :]
+        # Form matters here.  If homoskedastic, this can be sigma^2 (X'X)^-1.
+        # If heteroskedastic, then the form below is fine.
+        # If HAC, then the HAC estimator is needed.
+        # If clustered, the cluster estimator should be used.
+
+        cov_type = getattr(self, 'cov_type', 'nonrobust')
+        if cov_type == 'nonrobust':
+            sigma2 = np.mean(wresid**2)
+            xpx = np.dot(wexog.T, wexog) / n
+            s_inv = inv(sigma2 * xpx)
+        elif cov_type in ('HC0', 'HC1', 'HC2', 'HC3'):
+            s_inv = inv(np.dot(scores.T, scores) / n)
+        elif cov_type == 'HAC':
+            maxlags = self.cov_kwds['maxlags']
+            s_inv = inv(sw.S_hac_simple(scores, maxlags) / n)
+        elif cov_type == 'cluster':
+            # cluster robust standard errors
+            groups = self.cov_kwds['groups']
+            # TODO: Might need demean option in S_crosssection by group?
+            s_inv = inv(sw.S_crosssection(scores, groups))
+        else:
+            raise ValueError('Only nonrobust, HC, HAC and cluster are ' +
+                             'currently connected')
+
+        lm_value = n * (s @ s_inv @ s.T)
+        p_value = stats.chi2.sf(lm_value, df_diff)
+        return lm_value, p_value, df_diff

     def compare_f_test(self, restricted):
         """
@@ -1550,7 +2265,25 @@ class RegressionResults(base.LikelihoodModelResults):
         the assumption of homoscedasticity and no autocorrelation
         (sphericity).
         """
-        pass
+
+        has_robust1 = getattr(self, 'cov_type', 'nonrobust') != 'nonrobust'
+        has_robust2 = (getattr(restricted, 'cov_type', 'nonrobust') !=
+                       'nonrobust')
+
+        if has_robust1 or has_robust2:
+            warnings.warn('F test for comparison is likely invalid with ' +
+                          'robust covariance, proceeding anyway',
+                          InvalidTestWarning)
+
+        ssr_full = self.ssr
+        ssr_restr = restricted.ssr
+        df_full = self.df_resid
+        df_restr = restricted.df_resid
+
+        df_diff = (df_restr - df_full)
+        f_value = (ssr_restr - ssr_full) / df_diff / ssr_full * df_full
+        p_value = stats.f.sf(f_value, df_diff, df_full)
+        return f_value, p_value, df_diff

     def compare_lr_test(self, restricted, large_sample=False):
         """
@@ -1628,7 +2361,32 @@ class RegressionResults(base.LikelihoodModelResults):
         estimated using the same estimator as in the alternative
         model.
         """
-        pass
+        # TODO: put into separate function, needs tests
+
+        # See mailing list discussion October 17,
+
+        if large_sample:
+            return self.compare_lm_test(restricted, use_lr=True)
+
+        has_robust1 = (getattr(self, 'cov_type', 'nonrobust') != 'nonrobust')
+        has_robust2 = (
+            getattr(restricted, 'cov_type', 'nonrobust') != 'nonrobust')
+
+        if has_robust1 or has_robust2:
+            warnings.warn('Likelihood Ratio test is likely invalid with ' +
+                          'robust covariance, proceeding anyway',
+                          InvalidTestWarning)
+
+        llf_full = self.llf
+        llf_restr = restricted.llf
+        df_full = self.df_resid
+        df_restr = restricted.df_resid
+
+        lrdf = (df_restr - df_full)
+        lrstat = -2*(llf_restr - llf_full)
+        lr_pvalue = stats.chi2.sf(lrstat, lrdf)
+
+        return lrstat, lr_pvalue, lrdf
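A hedged sketch of the three nested-model comparisons defined above, restricting the full model by dropping one regressor (data and variable names are illustrative).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x_full = sm.add_constant(rng.standard_normal((200, 3)))
    y = x_full @ [1.0, 0.5, 0.0, 2.0] + rng.standard_normal(200)

    res_full = sm.OLS(y, x_full).fit()
    res_restr = sm.OLS(y, x_full[:, :3]).fit()   # drop the last column

    f_val, f_p, f_df = res_full.compare_f_test(res_restr)
    lr_val, lr_p, lr_df = res_full.compare_lr_test(res_restr)
    lm_val, lm_p, lm_df = res_full.compare_lm_test(res_restr)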

     def get_robustcov_results(self, cov_type='HC1', use_t=None, **kwargs):
         """
@@ -1755,11 +2513,193 @@ class RegressionResults(base.LikelihoodModelResults):
         .. todo:: Currently there is no check for extra or misspelled keywords,
              except in the case of cov_type `HCx`
         """
-        pass
+        from statsmodels.base.covtype import descriptions, normalize_cov_type
+        import statsmodels.stats.sandwich_covariance as sw
+
+        cov_type = normalize_cov_type(cov_type)

-    def summary(self, yname: (str | None)=None, xname: (Sequence[str] |
-        None)=None, title: (str | None)=None, alpha: float=0.05, slim: bool
-        =False):
+        if 'kernel' in kwargs:
+            kwargs['weights_func'] = kwargs.pop('kernel')
+        if 'weights_func' in kwargs and not callable(kwargs['weights_func']):
+            kwargs['weights_func'] = sw.kernel_dict[kwargs['weights_func']]
+
+        # TODO: make separate function that returns a robust cov plus info
+        use_self = kwargs.pop('use_self', False)
+        if use_self:
+            res = self
+        else:
+            res = self.__class__(
+                self.model, self.params,
+                normalized_cov_params=self.normalized_cov_params,
+                scale=self.scale)
+
+        res.cov_type = cov_type
+        # use_t might already be defined by the class, and already set
+        if use_t is None:
+            use_t = self.use_t
+        res.cov_kwds = {'use_t': use_t}  # store for information
+        res.use_t = use_t
+
+        adjust_df = False
+        if cov_type in ['cluster', 'hac-panel', 'hac-groupsum']:
+            df_correction = kwargs.get('df_correction', None)
+            # TODO: check also use_correction, do I need all combinations?
+            if df_correction is not False:  # i.e. in [None, True]:
+                # user did not explicitly set it to False
+                adjust_df = True
+
+        res.cov_kwds['adjust_df'] = adjust_df
+
+        # verify and set kwargs, and calculate cov
+        # TODO: this should be outsourced in a function so we can reuse it in
+        #       other models
+        # TODO: make it DRYer   repeated code for checking kwargs
+        if cov_type in ['fixed scale', 'fixed_scale']:
+            res.cov_kwds['description'] = descriptions['fixed_scale']
+
+            res.cov_kwds['scale'] = scale = kwargs.get('scale', 1.)
+            res.cov_params_default = scale * res.normalized_cov_params
+        elif cov_type.upper() in ('HC0', 'HC1', 'HC2', 'HC3'):
+            if kwargs:
+                raise ValueError('heteroscedasticity robust covariance '
+                                 'does not use keywords')
+            res.cov_kwds['description'] = descriptions[cov_type.upper()]
+            res.cov_params_default = getattr(self, 'cov_' + cov_type.upper())
+        elif cov_type.lower() == 'hac':
+            # TODO: check if required, default in cov_hac_simple
+            maxlags = kwargs['maxlags']
+            res.cov_kwds['maxlags'] = maxlags
+            weights_func = kwargs.get('weights_func', sw.weights_bartlett)
+            res.cov_kwds['weights_func'] = weights_func
+            use_correction = kwargs.get('use_correction', False)
+            res.cov_kwds['use_correction'] = use_correction
+            res.cov_kwds['description'] = descriptions['HAC'].format(
+                maxlags=maxlags,
+                correction=['without', 'with'][use_correction])
+
+            res.cov_params_default = sw.cov_hac_simple(
+                self, nlags=maxlags, weights_func=weights_func,
+                use_correction=use_correction)
+        elif cov_type.lower() == 'cluster':
+            # cluster robust standard errors, one- or two-way
+            groups = kwargs['groups']
+            if not hasattr(groups, 'shape'):
+                groups = np.asarray(groups).T
+
+            if groups.ndim >= 2:
+                groups = groups.squeeze()
+
+            res.cov_kwds['groups'] = groups
+            use_correction = kwargs.get('use_correction', True)
+            res.cov_kwds['use_correction'] = use_correction
+            if groups.ndim == 1:
+                if adjust_df:
+                    # need to find number of groups
+                    # duplicate work
+                    self.n_groups = n_groups = len(np.unique(groups))
+                res.cov_params_default = sw.cov_cluster(
+                    self, groups, use_correction=use_correction)
+
+            elif groups.ndim == 2:
+                if hasattr(groups, 'values'):
+                    groups = groups.values
+
+                if adjust_df:
+                    # need to find number of groups
+                    # duplicate work
+                    n_groups0 = len(np.unique(groups[:, 0]))
+                    n_groups1 = len(np.unique(groups[:, 1]))
+                    self.n_groups = (n_groups0, n_groups1)
+                    n_groups = min(n_groups0, n_groups1)  # use for adjust_df
+
+                # Note: sw.cov_cluster_2groups has 3 returns
+                res.cov_params_default = sw.cov_cluster_2groups(
+                    self, groups, use_correction=use_correction)[0]
+            else:
+                raise ValueError('only two groups are supported')
+            res.cov_kwds['description'] = descriptions['cluster']
+
+        elif cov_type.lower() == 'hac-panel':
+            # HAC within panel groups (panel-robust) standard errors
+            res.cov_kwds['time'] = time = kwargs.get('time', None)
+            res.cov_kwds['groups'] = groups = kwargs.get('groups', None)
+            # TODO: nlags is currently required
+            # nlags = kwargs.get('nlags', True)
+            # res.cov_kwds['nlags'] = nlags
+            # TODO: `nlags` or `maxlags`
+            res.cov_kwds['maxlags'] = maxlags = kwargs['maxlags']
+            use_correction = kwargs.get('use_correction', 'hac')
+            res.cov_kwds['use_correction'] = use_correction
+            weights_func = kwargs.get('weights_func', sw.weights_bartlett)
+            res.cov_kwds['weights_func'] = weights_func
+            if groups is not None:
+                groups = np.asarray(groups)
+                tt = (np.nonzero(groups[:-1] != groups[1:])[0] + 1).tolist()
+                nobs_ = len(groups)
+            elif time is not None:
+                time = np.asarray(time)
+                # TODO: clumsy time index in cov_nw_panel
+                tt = (np.nonzero(time[1:] < time[:-1])[0] + 1).tolist()
+                nobs_ = len(time)
+            else:
+                raise ValueError('either time or groups needs to be given')
+            groupidx = lzip([0] + tt, tt + [nobs_])
+            self.n_groups = n_groups = len(groupidx)
+            res.cov_params_default = sw.cov_nw_panel(
+                self,
+                maxlags,
+                groupidx,
+                weights_func=weights_func,
+                use_correction=use_correction
+            )
+            res.cov_kwds['description'] = descriptions['HAC-Panel']
+
+        elif cov_type.lower() == 'hac-groupsum':
+            # Driscoll-Kraay standard errors
+            res.cov_kwds['time'] = time = kwargs['time']
+            # TODO: nlags is currently required
+            # nlags = kwargs.get('nlags', True)
+            # res.cov_kwds['nlags'] = nlags
+            # TODO: `nlags` or `maxlags`
+            res.cov_kwds['maxlags'] = maxlags = kwargs['maxlags']
+            use_correction = kwargs.get('use_correction', 'cluster')
+            res.cov_kwds['use_correction'] = use_correction
+            weights_func = kwargs.get('weights_func', sw.weights_bartlett)
+            res.cov_kwds['weights_func'] = weights_func
+            if adjust_df:
+                # need to find number of groups
+                tt = (np.nonzero(time[1:] < time[:-1])[0] + 1)
+                self.n_groups = n_groups = len(tt) + 1
+            res.cov_params_default = sw.cov_nw_groupsum(
+                self, maxlags, time, weights_func=weights_func,
+                use_correction=use_correction)
+            res.cov_kwds['description'] = descriptions['HAC-Groupsum']
+        else:
+            raise ValueError('cov_type not recognized. See docstring for ' +
+                             'available options and spelling')
+
+        if adjust_df:
+            # Note: df_resid is used for scale and others, add new attribute
+            res.df_resid_inference = n_groups - 1
+
+        return res
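
A hedged usage sketch for the dispatcher above; the lag length and the
clustering variable are illustrative only:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = sm.add_constant(rng.standard_normal((120, 2)))
y = x @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(120)
res = sm.OLS(y, x).fit()

# Newey-West (HAC) covariance with a hand-picked lag length.
res_hac = res.get_robustcov_results(cov_type='HAC', maxlags=4)

# One-way cluster-robust covariance; groups is any length-nobs label array.
groups = np.repeat(np.arange(12), 10)
res_clu = res.get_robustcov_results(cov_type='cluster', groups=groups)
print(res_hac.bse, res_clu.bse)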
+
+    @Appender(pred.get_prediction.__doc__)
+    def get_prediction(self, exog=None, transform=True, weights=None,
+                       row_labels=None, **kwargs):
+
+        return pred.get_prediction(
+            self, exog=exog, transform=transform, weights=weights,
+            row_labels=row_labels, **kwargs)
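
Continuing from the OLS fit in the previous sketch, the prediction wrapper
returns an object with interval summaries:

pred = res.get_prediction(x[:5])
# Point predictions plus confidence and prediction intervals as a DataFrame.
print(pred.summary_frame(alpha=0.05))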
+
+    def summary(
+            self,
+            yname: str | None = None,
+            xname: Sequence[str] | None = None,
+            title: str | None = None,
+            alpha: float = 0.05,
+            slim: bool = False,
+    ):
         """
         Summarize the Regression Results.

@@ -1790,11 +2730,140 @@ class RegressionResults(base.LikelihoodModelResults):
         --------
         statsmodels.iolib.summary.Summary : A class that holds summary results.
         """
-        pass
-
-    def summary2(self, yname: (str | None)=None, xname: (Sequence[str] |
-        None)=None, title: (str | None)=None, alpha: float=0.05,
-        float_format: str='%.4f'):
+        from statsmodels.stats.stattools import (
+            durbin_watson,
+            jarque_bera,
+            omni_normtest,
+        )
+        alpha = float_like(alpha, "alpha", optional=False)
+        slim = bool_like(slim, "slim", optional=False, strict=True)
+
+        jb, jbpv, skew, kurtosis = jarque_bera(self.wresid)
+        omni, omnipv = omni_normtest(self.wresid)
+
+        eigvals = self.eigenvals
+        condno = self.condition_number
+
+        # TODO: Avoid adding attributes in non-__init__
+        self.diagn = dict(jb=jb, jbpv=jbpv, skew=skew, kurtosis=kurtosis,
+                          omni=omni, omnipv=omnipv, condno=condno,
+                          mineigval=eigvals[-1])
+
+        # TODO not used yet
+        # diagn_left_header = ['Models stats']
+        # diagn_right_header = ['Residual stats']
+
+        # TODO: requiring list/iterable is a bit annoying
+        #   need more control over formatting
+        # TODO: defaults do not work if not identically spelled
+
+        top_left = [('Dep. Variable:', None),
+                    ('Model:', None),
+                    ('Method:', ['Least Squares']),
+                    ('Date:', None),
+                    ('Time:', None),
+                    ('No. Observations:', None),
+                    ('Df Residuals:', None),
+                    ('Df Model:', None),
+                    ]
+
+        if hasattr(self, 'cov_type'):
+            top_left.append(('Covariance Type:', [self.cov_type]))
+
+        rsquared_type = '' if self.k_constant else ' (uncentered)'
+        top_right = [('R-squared' + rsquared_type + ':',
+                      ["%#8.3f" % self.rsquared]),
+                     ('Adj. R-squared' + rsquared_type + ':',
+                      ["%#8.3f" % self.rsquared_adj]),
+                     ('F-statistic:', ["%#8.4g" % self.fvalue]),
+                     ('Prob (F-statistic):', ["%#6.3g" % self.f_pvalue]),
+                     ('Log-Likelihood:', None),
+                     ('AIC:', ["%#8.4g" % self.aic]),
+                     ('BIC:', ["%#8.4g" % self.bic])
+                     ]
+
+        if slim:
+            slimlist = ['Dep. Variable:', 'Model:', 'No. Observations:',
+                        'Covariance Type:', 'R-squared:', 'Adj. R-squared:',
+                        'F-statistic:', 'Prob (F-statistic):']
+            diagn_left = diagn_right = []
+            top_left = [elem for elem in top_left if elem[0] in slimlist]
+            top_right = [elem for elem in top_right if elem[0] in slimlist]
+            top_right = top_right + \
+                [("", [])] * (len(top_left) - len(top_right))
+        else:
+            diagn_left = [('Omnibus:', ["%#6.3f" % omni]),
+                          ('Prob(Omnibus):', ["%#6.3f" % omnipv]),
+                          ('Skew:', ["%#6.3f" % skew]),
+                          ('Kurtosis:', ["%#6.3f" % kurtosis])
+                          ]
+
+            diagn_right = [('Durbin-Watson:',
+                            ["%#8.3f" % durbin_watson(self.wresid)]
+                            ),
+                           ('Jarque-Bera (JB):', ["%#8.3f" % jb]),
+                           ('Prob(JB):', ["%#8.3g" % jbpv]),
+                           ('Cond. No.', ["%#8.3g" % condno])
+                           ]
+
+        if title is None:
+            title = self.model.__class__.__name__ + ' ' + "Regression Results"
+
+        # create summary table instance
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right,
+                             yname=yname, xname=xname, title=title)
+        smry.add_table_params(self, yname=yname, xname=xname, alpha=alpha,
+                              use_t=self.use_t)
+        if not slim:
+            smry.add_table_2cols(self, gleft=diagn_left, gright=diagn_right,
+                                 yname=yname, xname=xname,
+                                 title="")
+
+        # add warnings/notes, added to text format only
+        etext = []
+        if not self.k_constant:
+            etext.append(
+                "R² is computed without centering (uncentered) since the "
+                "model does not contain a constant."
+            )
+        if hasattr(self, 'cov_type'):
+            etext.append(self.cov_kwds['description'])
+        if self.model.exog.shape[0] < self.model.exog.shape[1]:
+            wstr = "The input rank is higher than the number of observations."
+            etext.append(wstr)
+        if eigvals[-1] < 1e-10:
+            wstr = "The smallest eigenvalue is %6.3g. This might indicate "
+            wstr += "that there are\n"
+            wstr += "strong multicollinearity problems or that the design "
+            wstr += "matrix is singular."
+            wstr = wstr % eigvals[-1]
+            etext.append(wstr)
+        elif condno > 1000:  # TODO: what is recommended?
+            wstr = "The condition number is large, %6.3g. This might "
+            wstr += "indicate that there are\n"
+            wstr += "strong multicollinearity or other numerical "
+            wstr += "problems."
+            wstr = wstr % condno
+            etext.append(wstr)
+
+        if etext:
+            etext = ["[{0}] {1}".format(i + 1, text)
+                     for i, text in enumerate(etext)]
+            etext.insert(0, "Notes:")
+            smry.add_extra_txt(etext)
+
+        return smry
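
A minimal sketch of the new `slim` option (synthetic data, names
illustrative):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.standard_normal((100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(100)
res = sm.OLS(y, X).fit()
print(res.summary(slim=True))   # condensed header, no diagnostics table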
+
+    def summary2(
+            self,
+            yname: str | None = None,
+            xname: Sequence[str] | None = None,
+            title: str | None = None,
+            alpha: float = 0.05,
+            float_format: str = "%.4f",
+    ):
         """
         Experimental summary function to summarize the regression results.

@@ -1825,7 +2894,70 @@ class RegressionResults(base.LikelihoodModelResults):
         statsmodels.iolib.summary2.Summary
             A class that holds summary results.
         """
-        pass
+        # Diagnostics
+        from statsmodels.stats.stattools import (
+            durbin_watson,
+            jarque_bera,
+            omni_normtest,
+        )
+
+        jb, jbpv, skew, kurtosis = jarque_bera(self.wresid)
+        omni, omnipv = omni_normtest(self.wresid)
+        dw = durbin_watson(self.wresid)
+        eigvals = self.eigenvals
+        condno = self.condition_number
+        diagnostic = dict([
+            ('Omnibus:',  "%.3f" % omni),
+            ('Prob(Omnibus):', "%.3f" % omnipv),
+            ('Skew:', "%.3f" % skew),
+            ('Kurtosis:', "%.3f" % kurtosis),
+            ('Durbin-Watson:', "%.3f" % dw),
+            ('Jarque-Bera (JB):', "%.3f" % jb),
+            ('Prob(JB):', "%.3f" % jbpv),
+            ('Condition No.:', "%.0f" % condno)
+            ])
+
+        # Summary
+        from statsmodels.iolib import summary2
+        smry = summary2.Summary()
+        smry.add_base(results=self, alpha=alpha, float_format=float_format,
+                      xname=xname, yname=yname, title=title)
+        smry.add_dict(diagnostic)
+
+        etext = []
+
+        if not self.k_constant:
+            etext.append(
+                "R² is computed without centering (uncentered) since "
+                "the model does not contain a constant."
+            )
+        if hasattr(self, 'cov_type'):
+            etext.append(self.cov_kwds['description'])
+        if self.model.exog.shape[0] < self.model.exog.shape[1]:
+            wstr = "The input rank is higher than the number of observations."
+            etext.append(wstr)
+
+        # Warnings
+        if eigvals[-1] < 1e-10:
+            warn = ("The smallest eigenvalue is %6.3g. This might indicate "
+                    "that there are strong multicollinearity problems or "
+                    "that the design matrix is singular." % eigvals[-1])
+            etext.append(warn)
+        elif condno > 1000:
+            warn = ("The condition number is large, %6.3g. This might "
+                    "indicate that there are strong multicollinearity or "
+                    "other numerical problems." % condno)
+            etext.append(warn)
+
+        if etext:
+            etext = ["[{0}] {1}".format(i + 1, text)
+                     for i, text in enumerate(etext)]
+            etext.insert(0, "Notes:")
+
+        for line in etext:
+            smry.add_text(line)
+
+        return smry


 class OLSResults(RegressionResults):
@@ -1882,10 +3014,11 @@ class OLSResults(RegressionResults):
         statsmodels.stats.outliers_influence.OLSInfluence
             A class that exposes methods to examine observation influence.
         """
-        pass
+        from statsmodels.stats.outliers_influence import OLSInfluence
+        return OLSInfluence(self)
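
A short self-contained sketch of the influence accessor (illustrative data):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.standard_normal((60, 2)))
y = X @ np.array([1.0, 0.3, -0.7]) + rng.standard_normal(60)
infl = sm.OLS(y, X).fit().get_influence()
# Per-observation leverage, studentized residuals, Cook's distance, DFBETAS.
print(infl.summary_frame().head())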

-    def outlier_test(self, method='bonf', alpha=0.05, labels=None, order=
-        False, cutoff=None):
+    def outlier_test(self, method='bonf', alpha=.05, labels=None,
+                     order=False, cutoff=None):
         """
         Test observations for outliers according to method.

@@ -1931,10 +3064,12 @@ class OLSResults(RegressionResults):
         The unadjusted p-value is stats.t.sf(abs(resid), df) where
         df = df_resid - 1.
         """
-        pass
+        from statsmodels.stats.outliers_influence import outlier_test
+        return outlier_test(self, method, alpha, labels=labels,
+                            order=order, cutoff=cutoff)

     def el_test(self, b0_vals, param_nums, return_weights=0, ret_params=0,
-        method='nm', stochastic_exog=1):
+                method='nm', stochastic_exog=1):
         """
         Test single or joint hypotheses using Empirical Likelihood.

@@ -1983,10 +3118,47 @@ class OLSResults(RegressionResults):
         >>> fitted.el_test([0], [1])
         (27.248146353888796, 1.7894660442330235e-07)
         """
-        pass
+        params = np.copy(self.params)
+        opt_fun_inst = _ELRegOpts()  # to store weights
+        if len(param_nums) == len(params):
+            llr = opt_fun_inst._opt_nuis_regress(
+                [],
+                param_nums=param_nums,
+                endog=self.model.endog,
+                exog=self.model.exog,
+                nobs=self.model.nobs,
+                nvar=self.model.exog.shape[1],
+                params=params,
+                b0_vals=b0_vals,
+                stochastic_exog=stochastic_exog)
+            pval = 1 - stats.chi2.cdf(llr, len(param_nums))
+            if return_weights:
+                return llr, pval, opt_fun_inst.new_weights
+            else:
+                return llr, pval
+        x0 = np.delete(params, param_nums)
+        args = (param_nums, self.model.endog, self.model.exog,
+                self.model.nobs, self.model.exog.shape[1], params,
+                b0_vals, stochastic_exog)
+        if method == 'nm':
+            llr = optimize.fmin(opt_fun_inst._opt_nuis_regress, x0,
+                                maxfun=10000, maxiter=10000, full_output=1,
+                                disp=0, args=args)[1]
+        if method == 'powell':
+            llr = optimize.fmin_powell(opt_fun_inst._opt_nuis_regress, x0,
+                                       full_output=1, disp=0,
+                                       args=args)[1]
+
+        pval = 1 - stats.chi2.cdf(llr, len(param_nums))
+        if ret_params:
+            return llr, pval, opt_fun_inst.new_weights, opt_fun_inst.new_params
+        elif return_weights:
+            return llr, pval, opt_fun_inst.new_weights
+        else:
+            return llr, pval

-    def conf_int_el(self, param_num, sig=0.05, upper_bound=None,
-        lower_bound=None, method='nm', stochastic_exog=True):
+    def conf_int_el(self, param_num, sig=.05, upper_bound=None,
+                    lower_bound=None, method='nm', stochastic_exog=True):
         """
         Compute the confidence interval using Empirical Likelihood.

@@ -2048,19 +3220,50 @@ class OLSResults(RegressionResults):
         (>50), the starting parameters of the interior minimization need
         to be changed.
         """
-        pass
+        r0 = stats.chi2.ppf(1 - sig, 1)
+        if upper_bound is None:
+            upper_bound = self.conf_int(.01)[param_num][1]
+        if lower_bound is None:
+            lower_bound = self.conf_int(.01)[param_num][0]
+
+        def f(b0):
+            return self.el_test(np.array([b0]), np.array([param_num]),
+                                method=method,
+                                stochastic_exog=stochastic_exog)[0] - r0
+
+        lowerl = optimize.brenth(f, lower_bound,
+                                 self.params[param_num])
+        upperl = optimize.brenth(f, self.params[param_num],
+                                 upper_bound)
+        #  ^ Seems to be faster than brentq in most cases
+        return (lowerl, upperl)


 class RegressionResultsWrapper(wrap.ResultsWrapper):
-    _attrs = {'chisq': 'columns', 'sresid': 'rows', 'weights': 'rows',
-        'wresid': 'rows', 'bcov_unscaled': 'cov', 'bcov_scaled': 'cov',
-        'HC0_se': 'columns', 'HC1_se': 'columns', 'HC2_se': 'columns',
-        'HC3_se': 'columns', 'norm_resid': 'rows'}
-    _wrap_attrs = wrap.union_dicts(base.LikelihoodResultsWrapper._attrs, _attrs
-        )
+
+    _attrs = {
+        'chisq': 'columns',
+        'sresid': 'rows',
+        'weights': 'rows',
+        'wresid': 'rows',
+        'bcov_unscaled': 'cov',
+        'bcov_scaled': 'cov',
+        'HC0_se': 'columns',
+        'HC1_se': 'columns',
+        'HC2_se': 'columns',
+        'HC3_se': 'columns',
+        'norm_resid': 'rows',
+    }
+
+    _wrap_attrs = wrap.union_dicts(base.LikelihoodResultsWrapper._attrs,
+                                   _attrs)
+
     _methods = {}
-    _wrap_methods = wrap.union_dicts(base.LikelihoodResultsWrapper.
-        _wrap_methods, _methods)
+
+    _wrap_methods = wrap.union_dicts(
+                        base.LikelihoodResultsWrapper._wrap_methods,
+                        _methods)


-wrap.populate_wrapper(RegressionResultsWrapper, RegressionResults)
+wrap.populate_wrapper(RegressionResultsWrapper,
+                      RegressionResults)
diff --git a/statsmodels/regression/mixed_linear_model.py b/statsmodels/regression/mixed_linear_model.py
index 68d4e076a..65603dfba 100644
--- a/statsmodels/regression/mixed_linear_model.py
+++ b/statsmodels/regression/mixed_linear_model.py
@@ -143,26 +143,36 @@ Therefore, optimization methods requiring the Hessian matrix such as
 the Newton-Raphson algorithm cannot be used for model fitting.
 """
 import warnings
+
 import numpy as np
 import pandas as pd
 import patsy
 from scipy import sparse
 from scipy.stats.distributions import norm
+
 from statsmodels.base._penalties import Penalty
 import statsmodels.base.model as base
 from statsmodels.tools import data as data_tools
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.sm_exceptions import ConvergenceWarning
-_warn_cov_sing = 'The random effects covariance matrix is singular.'
+
+_warn_cov_sing = "The random effects covariance matrix is singular."


 def _dot(x, y):
     """
     Returns the dot product of the arrays, works for sparse and dense.
     """
-    pass
+
+    if isinstance(x, np.ndarray) and isinstance(y, np.ndarray):
+        return np.dot(x, y)
+    elif sparse.issparse(x):
+        return x.dot(y)
+    elif sparse.issparse(y):
+        return y.T.dot(x.T).T


+# From numpy, adapted to work with sparse and dense arrays.
 def _multi_dot_three(A, B, C):
     """
     Find best ordering for three arrays and do the multiplication.
@@ -170,7 +180,17 @@ def _multi_dot_three(A, B, C):
     Doing it manually instead of using dynamic programming is
     approximately 15 times faster.
     """
-    pass
+    # cost1 = cost((AB)C)
+    cost1 = (A.shape[0] * A.shape[1] * B.shape[1] +  # (AB)
+             A.shape[0] * B.shape[1] * C.shape[1])   # (--)C
+    # cost2 = cost(A(BC))
+    cost2 = (B.shape[0] * B.shape[1] * C.shape[1] +  # (BC)
+             A.shape[0] * A.shape[1] * C.shape[1])   # A(--)
+
+    if cost1 < cost2:
+        return _dot(_dot(A, B), C)
+    else:
+        return _dot(A, _dot(B, C))
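
A quick numeric illustration of the cost comparison above (shapes are
arbitrary; the private helper is imported only for illustration):

import numpy as np
from statsmodels.regression.mixed_linear_model import _multi_dot_three

A, B, C = np.ones((100, 10)), np.ones((10, 5)), np.ones((5, 50))
# cost1 = 100*10*5 + 100*5*50 = 30000, cost2 = 10*5*50 + 100*10*50 = 52500,
# so the product below is evaluated as (A @ B) @ C for these shapes.
result = _multi_dot_three(A, B, C)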


 def _dotsum(x, y):
@@ -178,7 +198,12 @@ def _dotsum(x, y):
     Returns sum(x * y), where '*' is the pointwise product, computed
     efficiently for dense and sparse matrices.
     """
-    pass
+
+    if sparse.issparse(x):
+        return x.multiply(y).sum()
+    else:
+        # This way usually avoids allocating a temporary.
+        return np.dot(x.ravel(), y.ravel())


 class VCSpec:
@@ -209,7 +234,18 @@ def _get_exog_re_names(self, exog_re):
     Passes through if given a list of names. Otherwise, gets pandas names
     or creates some generic variable names as needed.
     """
-    pass
+    if self.k_re == 0:
+        return []
+    if isinstance(exog_re, pd.DataFrame):
+        return exog_re.columns.tolist()
+    elif isinstance(exog_re, pd.Series) and exog_re.name is not None:
+        return [exog_re.name]
+    elif isinstance(exog_re, list):
+        return exog_re
+
+    # Default names
+    defnames = ["x_re{0:1d}".format(k + 1) for k in range(exog_re.shape[1])]
+    return defnames


 class MixedLMParams:
@@ -233,6 +269,7 @@ class MixedLMParams:
     """

     def __init__(self, k_fe, k_re, k_vc):
+
         self.k_fe = k_fe
         self.k_re = k_re
         self.k_re2 = k_re * (k_re + 1) // 2
@@ -265,11 +302,45 @@ class MixedLMParams:
         -------
         A MixedLMParams object.
         """
-        pass
+        k_re2 = int(k_re * (k_re + 1) / 2)
+
+        # The number of covariance parameters.
+        if has_fe:
+            k_vc = len(params) - k_fe - k_re2
+        else:
+            k_vc = len(params) - k_re2
+
+        pa = MixedLMParams(k_fe, k_re, k_vc)
+
+        cov_re = np.zeros((k_re, k_re))
+        ix = pa._ix
+        if has_fe:
+            pa.fe_params = params[0:k_fe]
+            cov_re[ix] = params[k_fe:k_fe+k_re2]
+        else:
+            pa.fe_params = np.zeros(k_fe)
+            cov_re[ix] = params[0:k_re2]
+
+        if use_sqrt:
+            cov_re = np.dot(cov_re, cov_re.T)
+        else:
+            cov_re = (cov_re + cov_re.T) - np.diag(np.diag(cov_re))
+
+        pa.cov_re = cov_re
+        if k_vc > 0:
+            if use_sqrt:
+                pa.vcomp = params[-k_vc:]**2
+            else:
+                pa.vcomp = params[-k_vc:]
+        else:
+            pa.vcomp = np.array([])
+
+        return pa
+
     from_packed = staticmethod(from_packed)

     def from_components(fe_params=None, cov_re=None, cov_re_sqrt=None,
-        vcomp=None):
+                        vcomp=None):
         """
         Create a MixedLMParams object from each parameter component.

@@ -292,14 +363,40 @@ class MixedLMParams:
         -------
         A MixedLMParams object.
         """
-        pass
+
+        if vcomp is None:
+            vcomp = np.empty(0)
+        if fe_params is None:
+            fe_params = np.empty(0)
+        if cov_re is None and cov_re_sqrt is None:
+            cov_re = np.empty((0, 0))
+
+        k_fe = len(fe_params)
+        k_vc = len(vcomp)
+        k_re = cov_re.shape[0] if cov_re is not None else cov_re_sqrt.shape[0]
+
+        pa = MixedLMParams(k_fe, k_re, k_vc)
+        pa.fe_params = fe_params
+        if cov_re_sqrt is not None:
+            pa.cov_re = np.dot(cov_re_sqrt, cov_re_sqrt.T)
+        elif cov_re is not None:
+            pa.cov_re = cov_re
+
+        pa.vcomp = vcomp
+
+        return pa
+
     from_components = staticmethod(from_components)

     def copy(self):
         """
         Returns a copy of the object.
         """
-        pass
+        obj = MixedLMParams(self.k_fe, self.k_re, self.k_vc)
+        obj.fe_params = self.fe_params.copy()
+        obj.cov_re = self.cov_re.copy()
+        obj.vcomp = self.vcomp.copy()
+        return obj

     def get_packed(self, use_sqrt, has_fe=False):
         """
@@ -315,16 +412,39 @@ class MixedLMParams:
             If True, the fixed effects parameters are included
             in the packed result, otherwise they are omitted.
         """
-        pass
+
+        if self.k_re > 0:
+            if use_sqrt:
+                try:
+                    L = np.linalg.cholesky(self.cov_re)
+                except np.linalg.LinAlgError:
+                    L = np.diag(np.sqrt(np.diag(self.cov_re)))
+                cpa = L[self._ix]
+            else:
+                cpa = self.cov_re[self._ix]
+        else:
+            cpa = np.zeros(0)
+
+        if use_sqrt:
+            vcomp = np.sqrt(self.vcomp)
+        else:
+            vcomp = self.vcomp
+
+        if has_fe:
+            pa = np.concatenate((self.fe_params, cpa, vcomp))
+        else:
+            pa = np.concatenate((cpa, vcomp))
+
+        return pa
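
A hedged round-trip sketch for the packing helpers above; dimensions and
values are illustrative, and `from_packed` is called positionally in the same
order used elsewhere in this module:

import numpy as np
from statsmodels.regression.mixed_linear_model import MixedLMParams

pa = MixedLMParams.from_components(
    fe_params=np.array([1.0, -0.5]),
    cov_re=np.array([[1.0, 0.2], [0.2, 0.8]]),
    vcomp=np.array([0.3]))

packed = pa.get_packed(use_sqrt=False, has_fe=True)
pa2 = MixedLMParams.from_packed(packed, 2, 2, False, has_fe=True)
assert np.allclose(pa2.cov_re, pa.cov_re)
assert np.allclose(pa2.vcomp, pa.vcomp)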


 def _smw_solver(s, A, AtA, Qi, di):
-    """
+    r"""
     Returns a solver for the linear system:

     .. math::

-        (sI + ABA^\\prime) y = x
+        (sI + ABA^\prime) y = x

     The returned function f satisfies f(x) = y as defined above.

@@ -338,7 +458,7 @@ def _smw_solver(s, A, AtA, Qi, di):
     A : ndarray
         p x q matrix, in general q << p, may be sparse.
     AtA : square ndarray
-        :math:`A^\\prime  A`, a q x q matrix.
+        :math:`A^\prime  A`, a q x q matrix.
     Qi : square symmetric ndarray
         The matrix `B` is q x q, where q = r + d.  `B` consists of a r
         x r diagonal block whose inverse is `Qi`, and a d x d diagonal
@@ -355,16 +475,48 @@ def _smw_solver(s, A, AtA, Qi, di):
     Uses Sherman-Morrison-Woodbury identity:
         https://en.wikipedia.org/wiki/Woodbury_matrix_identity
     """
-    pass
+
+    # Use SMW identity
+    qmat = AtA / s
+    m = Qi.shape[0]
+    qmat[0:m, 0:m] += Qi
+
+    if sparse.issparse(A):
+        qmat[m:, m:] += sparse.diags(di)
+
+        def solver(rhs):
+            ql = A.T.dot(rhs)
+            # Based on profiling, the next line can be the
+            # majority of the entire run time of fitting the model.
+            ql = sparse.linalg.spsolve(qmat, ql)
+            if ql.ndim < rhs.ndim:
+                # spsolve squeezes nx1 rhs
+                ql = ql[:, None]
+            ql = A.dot(ql)
+            return rhs / s - ql / s**2
+
+    else:
+        d = qmat.shape[0]
+        qmat.flat[m*(d+1)::d+1] += di
+        qmati = np.linalg.solve(qmat, A.T)
+
+        def solver(rhs):
+            # A is tall and qmati is wide, so we want
+            # A * (qmati * rhs) not (A * qmati) * rhs
+            ql = np.dot(qmati, rhs)
+            ql = np.dot(A, ql)
+            return rhs / s - ql / s**2
+
+    return solver
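
In the notation of the docstring, the identity implemented by both branches
of the solver is (editorial restatement):

.. math::

    (sI + ABA^\prime)^{-1} x =
        \frac{x}{s} - \frac{1}{s^2}
        A \left(B^{-1} + \frac{1}{s} A^\prime A\right)^{-1} A^\prime x

with `qmat` above playing the role of :math:`B^{-1} + A^\prime A / s`.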


 def _smw_logdet(s, A, AtA, Qi, di, B_logdet):
-    """
+    r"""
     Returns the log determinant of

     .. math::

-        sI + ABA^\\prime
+        sI + ABA^\prime

     Uses the matrix determinant lemma to accelerate the calculation.
     B is assumed to be positive definite, and s > 0, therefore the
@@ -377,7 +529,7 @@ def _smw_logdet(s, A, AtA, Qi, di, B_logdet):
     A : ndarray
         p x q matrix, in general q << p.
     AtA : square ndarray
-        :math:`A^\\prime  A`, a q x q matrix.
+        :math:`A^\prime  A`, a q x q matrix.
     Qi : square symmetric ndarray
         The matrix `B` is q x q, where q = r + d.  `B` consists of a r
         x r diagonal block whose inverse is `Qi`, and a d x d diagonal
@@ -396,7 +548,62 @@ def _smw_logdet(s, A, AtA, Qi, di, B_logdet):
     Uses the matrix determinant lemma:
         https://en.wikipedia.org/wiki/Matrix_determinant_lemma
     """
-    pass
+
+    p = A.shape[0]
+    ld = p * np.log(s)
+    qmat = AtA / s
+    m = Qi.shape[0]
+    qmat[0:m, 0:m] += Qi
+
+    if sparse.issparse(qmat):
+        qmat[m:, m:] += sparse.diags(di)
+
+        # There are faster but much more difficult ways to do this
+        # https://stackoverflow.com/questions/19107617
+        lu = sparse.linalg.splu(qmat)
+        dl = lu.L.diagonal().astype(np.complex128)
+        du = lu.U.diagonal().astype(np.complex128)
+        ld1 = np.log(dl).sum() + np.log(du).sum()
+        ld1 = ld1.real
+    else:
+        d = qmat.shape[0]
+        qmat.flat[m*(d+1)::d+1] += di
+        _, ld1 = np.linalg.slogdet(qmat)
+
+    return B_logdet + ld + ld1
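
Correspondingly, the value returned here follows the matrix determinant
lemma (editorial restatement in the same notation):

.. math::

    \log\det(sI + ABA^\prime) =
        \log\det(B) + p \log s +
        \log\det\left(B^{-1} + \frac{1}{s} A^\prime A\right)

where :math:`p` is the number of rows of :math:`A` and the last term is the
log determinant of `qmat`.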
+
+
+def _convert_vc(exog_vc):
+
+    vc_names = []
+    vc_colnames = []
+    vc_mats = []
+
+    # Get the groups in sorted order
+    groups = set()
+    for k, v in exog_vc.items():
+        groups |= set(v.keys())
+    groups = list(groups)
+    groups.sort()
+
+    for k, v in exog_vc.items():
+        vc_names.append(k)
+        colnames, mats = [], []
+        for g in groups:
+            try:
+                colnames.append(v[g].columns)
+            except AttributeError:
+                colnames.append([str(j) for j in range(v[g].shape[1])])
+            mats.append(v[g])
+        vc_colnames.append(colnames)
+        vc_mats.append(mats)
+
+    ii = np.argsort(vc_names)
+    vc_names = [vc_names[i] for i in ii]
+    vc_colnames = [vc_colnames[i] for i in ii]
+    vc_mats = [vc_mats[i] for i in ii]
+
+    return VCSpec(vc_names, vc_colnames, vc_mats)


 class MixedLM(base.LikelihoodModel):
@@ -487,37 +694,61 @@ class MixedLM(base.LikelihoodModel):
     >>> result = model.fit()
     """

-    def __init__(self, endog, exog, groups, exog_re=None, exog_vc=None,
-        use_sqrt=True, missing='none', **kwargs):
-        _allowed_kwargs = ['missing_idx', 'design_info', 'formula']
+    def __init__(self, endog, exog, groups, exog_re=None,
+                 exog_vc=None, use_sqrt=True, missing='none',
+                 **kwargs):
+
+        _allowed_kwargs = ["missing_idx", "design_info", "formula"]
         for x in kwargs.keys():
             if x not in _allowed_kwargs:
                 raise ValueError(
-                    'argument %s not permitted for MixedLM initialization' % x)
+                    "argument %s not permitted for MixedLM initialization" % x)
+
         self.use_sqrt = use_sqrt
+
+        # Some defaults
         self.reml = True
         self.fe_pen = None
         self.re_pen = None
+
         if isinstance(exog_vc, dict):
-            warnings.warn('Using deprecated variance components format')
+            warnings.warn("Using deprecated variance components format")
+            # Convert from old to new representation
             exog_vc = _convert_vc(exog_vc)
+
         if exog_vc is not None:
             self.k_vc = len(exog_vc.names)
             self.exog_vc = exog_vc
         else:
             self.k_vc = 0
             self.exog_vc = VCSpec([], [], [])
-        if exog is not None and data_tools._is_using_ndarray_type(exog, None
-            ) and exog.ndim == 1:
+
+        # If there is one covariate, it may be passed in as a column
+        # vector, convert these to 2d arrays.
+        # TODO: Can this be moved up in the class hierarchy?
+        #       yes, it should be done up the hierarchy
+        if (exog is not None and
+                data_tools._is_using_ndarray_type(exog, None) and
+                exog.ndim == 1):
             exog = exog[:, None]
-        if exog_re is not None and data_tools._is_using_ndarray_type(exog_re,
-            None) and exog_re.ndim == 1:
+        if (exog_re is not None and
+                data_tools._is_using_ndarray_type(exog_re, None) and
+                exog_re.ndim == 1):
             exog_re = exog_re[:, None]
-        super(MixedLM, self).__init__(endog, exog, groups=groups, exog_re=
-            exog_re, missing=missing, **kwargs)
-        self._init_keys.extend(['use_sqrt', 'exog_vc'])
+
+        # Calling super creates self.endog, etc. as ndarrays and the
+        # original exog, endog, etc. are self.data.endog, etc.
+        super(MixedLM, self).__init__(endog, exog, groups=groups,
+                                      exog_re=exog_re, missing=missing,
+                                      **kwargs)
+
+        self._init_keys.extend(["use_sqrt", "exog_vc"])
+
+        # Number of fixed effects parameters
         self.k_fe = exog.shape[1]
+
         if exog_re is None and len(self.exog_vc.names) == 0:
+            # Default random effects structure (random intercepts).
             self.k_re = 1
             self.k_re2 = 1
             self.exog_re = np.ones((len(endog), 1), dtype=np.float64)
@@ -526,23 +757,39 @@ class MixedLM(base.LikelihoodModel):
             self.data.param_names = self.exog_names + names
             self.data.exog_re_names = names
             self.data.exog_re_names_full = names
+
         elif exog_re is not None:
+            # Process exog_re the same way that exog is handled
+            # upstream
+            # TODO: this is wrong and should be handled upstream wholly
             self.data.exog_re = exog_re
             self.exog_re = np.asarray(exog_re)
             if self.exog_re.ndim == 1:
                 self.exog_re = self.exog_re[:, None]
+            # Model dimensions
+            # Number of random effect covariates
             self.k_re = self.exog_re.shape[1]
+            # Number of covariance parameters
             self.k_re2 = self.k_re * (self.k_re + 1) // 2
+
         else:
+            # All random effects are variance components
             self.k_re = 0
             self.k_re2 = 0
+
         if not self.data._param_names:
-            param_names, exog_re_names, exog_re_names_full = (self.
-                _make_param_names(exog_re))
+            # HACK: could have been set in from_formula already
+            # needs refactor
+            (param_names, exog_re_names,
+             exog_re_names_full) = self._make_param_names(exog_re)
             self.data.param_names = param_names
             self.data.exog_re_names = exog_re_names
             self.data.exog_re_names_full = exog_re_names_full
+
         self.k_params = self.k_fe + self.k_re2
+
+        # Convert the data to the internal representation, which is a
+        # list of arrays, corresponding to the groups.
         group_labels = list(set(groups))
         group_labels.sort()
         row_indices = dict((s, []) for s in group_labels)
@@ -551,25 +798,38 @@ class MixedLM(base.LikelihoodModel):
         self.row_indices = row_indices
         self.group_labels = group_labels
         self.n_groups = len(self.group_labels)
+
+        # Split the data by groups
         self.endog_li = self.group_list(self.endog)
         self.exog_li = self.group_list(self.exog)
         self.exog_re_li = self.group_list(self.exog_re)
+
+        # Precompute this.
         if self.exog_re is None:
             self.exog_re2_li = None
         else:
             self.exog_re2_li = [np.dot(x.T, x) for x in self.exog_re_li]
+
+        # The total number of observations, summed over all groups
         self.nobs = len(self.endog)
         self.n_totobs = self.nobs
+
+        # Set the fixed effects parameter names
         if self.exog_names is None:
-            self.exog_names = [('FE%d' % (k + 1)) for k in range(self.exog.
-                shape[1])]
+            self.exog_names = ["FE%d" % (k + 1) for k in
+                               range(self.exog.shape[1])]
+
+        # Precompute this
         self._aex_r = []
         self._aex_r2 = []
         for i in range(self.n_groups):
             a = self._augment_exog(i)
             self._aex_r.append(a)
+
             ma = _dot(a.T, a)
             self._aex_r2.append(ma)
+
+        # Precompute this
         self._lin, self._quad = self._reparam()

     def _make_param_names(self, exog_re):
@@ -578,11 +838,28 @@ class MixedLM(base.LikelihoodModel):
         effects variables, and the exogenous random effects variables with
         the interaction terms.
         """
-        pass
+        exog_names = list(self.exog_names)
+        exog_re_names = _get_exog_re_names(self, exog_re)
+        param_names = []
+
+        jj = self.k_fe
+        for i in range(len(exog_re_names)):
+            for j in range(i + 1):
+                if i == j:
+                    param_names.append(exog_re_names[i] + " Var")
+                else:
+                    param_names.append(exog_re_names[j] + " x " +
+                                       exog_re_names[i] + " Cov")
+                jj += 1
+
+        vc_names = [x + " Var" for x in self.exog_vc.names]
+
+        return exog_names + param_names + vc_names, exog_re_names, param_names

     @classmethod
     def from_formula(cls, formula, data, re_formula=None, vc_formula=None,
-        subset=None, use_sparse=False, missing='none', *args, **kwargs):
+                     subset=None, use_sparse=False, missing='none', *args,
+                     **kwargs):
         """
         Create a Model from a formula and dataframe.

@@ -656,7 +933,8 @@ class MixedLM(base.LikelihoodModel):
         different across the schools.

         >>> vc = {'classroom': '0 + C(classroom)'}
-        >>> MixedLM.from_formula('test_score ~ age', vc_formula=vc,                                   re_formula='1', groups='school', data=data)
+        >>> MixedLM.from_formula('test_score ~ age', vc_formula=vc, \
+                                  re_formula='1', groups='school', data=data)

         Now suppose we also have a previous test score called
         'pretest'.  If we want the relationship between pretest
@@ -664,16 +942,123 @@ class MixedLM(base.LikelihoodModel):
         specify a random slope for the pretest score

         >>> vc = {'classroom': '0 + C(classroom)', 'pretest': '0 + pretest'}
-        >>> MixedLM.from_formula('test_score ~ age + pretest', vc_formula=vc,                                   re_formula='1', groups='school', data=data)
+        >>> MixedLM.from_formula('test_score ~ age + pretest', vc_formula=vc, \
+                                  re_formula='1', groups='school', data=data)

         The following model is almost equivalent to the previous one,
         but here the classroom random intercept and pretest slope may
         be correlated.

         >>> vc = {'classroom': '0 + C(classroom)'}
-        >>> MixedLM.from_formula('test_score ~ age + pretest', vc_formula=vc,                                   re_formula='1 + pretest', groups='school',                                   data=data)
+        >>> MixedLM.from_formula('test_score ~ age + pretest', vc_formula=vc, \
+                                  re_formula='1 + pretest', groups='school', \
+                                  data=data)
         """
-        pass
+
+        if "groups" not in kwargs.keys():
+            raise AttributeError("'groups' is a required keyword argument " +
+                                 "in MixedLM.from_formula")
+        groups = kwargs["groups"]
+
+        # If `groups` is a variable name, retrieve the data for the
+        # groups variable.
+        group_name = "Group"
+        if isinstance(groups, str):
+            group_name = groups
+            groups = np.asarray(data[groups])
+        else:
+            groups = np.asarray(groups)
+        del kwargs["groups"]
+
+        # Bypass all upstream missing data handling to properly handle
+        # variance components
+        if missing == 'drop':
+            data, groups = _handle_missing(data, groups, formula, re_formula,
+                                           vc_formula)
+            missing = 'none'
+
+        if re_formula is not None:
+            if re_formula.strip() == "1":
+                # Work around Patsy bug, fixed by 0.3.
+                exog_re = np.ones((data.shape[0], 1))
+                exog_re_names = [group_name]
+            else:
+                eval_env = kwargs.get('eval_env', None)
+                if eval_env is None:
+                    eval_env = 1
+                elif eval_env == -1:
+                    from patsy import EvalEnvironment
+                    eval_env = EvalEnvironment({})
+                exog_re = patsy.dmatrix(re_formula, data, eval_env=eval_env)
+                exog_re_names = exog_re.design_info.column_names
+                exog_re_names = [x.replace("Intercept", group_name)
+                                 for x in exog_re_names]
+                exog_re = np.asarray(exog_re)
+            if exog_re.ndim == 1:
+                exog_re = exog_re[:, None]
+        else:
+            exog_re = None
+            if vc_formula is None:
+                exog_re_names = [group_name]
+            else:
+                exog_re_names = []
+
+        if vc_formula is not None:
+            eval_env = kwargs.get('eval_env', None)
+            if eval_env is None:
+                eval_env = 1
+            elif eval_env == -1:
+                from patsy import EvalEnvironment
+                eval_env = EvalEnvironment({})
+
+            vc_mats = []
+            vc_colnames = []
+            vc_names = []
+            gb = data.groupby(groups)
+            kylist = sorted(gb.groups.keys())
+            vcf = sorted(vc_formula.keys())
+            for vc_name in vcf:
+                md = patsy.ModelDesc.from_formula(vc_formula[vc_name])
+                vc_names.append(vc_name)
+                evc_mats, evc_colnames = [], []
+                for group_ix, group in enumerate(kylist):
+                    ii = gb.groups[group]
+                    mat = patsy.dmatrix(
+                             md,
+                             data.loc[ii, :],
+                             eval_env=eval_env,
+                             return_type='dataframe')
+                    evc_colnames.append(mat.columns.tolist())
+                    if use_sparse:
+                        evc_mats.append(sparse.csr_matrix(mat))
+                    else:
+                        evc_mats.append(np.asarray(mat))
+                vc_mats.append(evc_mats)
+                vc_colnames.append(evc_colnames)
+            exog_vc = VCSpec(vc_names, vc_colnames, vc_mats)
+        else:
+            exog_vc = VCSpec([], [], [])
+
+        kwargs["subset"] = None
+        kwargs["exog_re"] = exog_re
+        kwargs["exog_vc"] = exog_vc
+        kwargs["groups"] = groups
+        mod = super(MixedLM, cls).from_formula(
+            formula, data, *args, **kwargs)
+
+        # expand re names to account for pairs of RE
+        (param_names,
+         exog_re_names,
+         exog_re_names_full) = mod._make_param_names(exog_re_names)
+
+        mod.data.param_names = param_names
+        mod.data.exog_re_names = exog_re_names
+        mod.data.exog_re_names_full = exog_re_names_full
+
+        if vc_formula is not None:
+            mod.data.vcomp_names = mod.exog_vc.names
+
+        return mod

     def predict(self, params, exog=None):
         """
@@ -696,17 +1081,34 @@ class MixedLM(base.LikelihoodModel):
         An array of fitted values.  Note that these predicted values
         only reflect the fixed effects mean structure of the model.
         """
-        pass
+        if exog is None:
+            exog = self.exog
+
+        if isinstance(params, MixedLMParams):
+            params = params.fe_params
+        else:
+            params = params[0:self.k_fe]
+
+        return np.dot(exog, params)

     def group_list(self, array):
         """
         Returns `array` split into subarrays corresponding to the
         grouping structure.
         """
-        pass

-    def fit_regularized(self, start_params=None, method='l1', alpha=0, ceps
-        =0.0001, ptol=1e-06, maxit=200, **fit_kwargs):
+        if array is None:
+            return None
+
+        if array.ndim == 1:
+            return [np.array(array[self.row_indices[k]])
+                    for k in self.group_labels]
+        else:
+            return [np.array(array[self.row_indices[k], :])
+                    for k in self.group_labels]
+
+    def fit_regularized(self, start_params=None, method='l1', alpha=0,
+                        ceps=1e-4, ptol=1e-6, maxit=200, **fit_kwargs):
         """
         Fit a model in which the fixed effects parameters are
         penalized.  The dependence parameters are held fixed at their
@@ -759,7 +1161,111 @@ class MixedLM(base.LikelihoodModel):

         http://statweb.stanford.edu/~tibs/stat315a/Supplements/fuse.pdf
         """
-        pass
+
+        if isinstance(method, str) and (method.lower() != 'l1'):
+            raise ValueError("Invalid regularization method")
+
+        # If method is a smooth penalty just optimize directly.
+        if isinstance(method, Penalty):
+            # Scale the penalty weights by alpha
+            method.alpha = alpha
+            fit_kwargs.update({"fe_pen": method})
+            return self.fit(**fit_kwargs)
+
+        if np.isscalar(alpha):
+            alpha = alpha * np.ones(self.k_fe, dtype=np.float64)
+
+        # Fit the unpenalized model to get the dependence structure.
+        mdf = self.fit(**fit_kwargs)
+        fe_params = mdf.fe_params
+        cov_re = mdf.cov_re
+        vcomp = mdf.vcomp
+        scale = mdf.scale
+        try:
+            cov_re_inv = np.linalg.inv(cov_re)
+        except np.linalg.LinAlgError:
+            cov_re_inv = None
+
+        for itr in range(maxit):
+
+            fe_params_s = fe_params.copy()
+            for j in range(self.k_fe):
+
+                if abs(fe_params[j]) < ceps:
+                    continue
+
+                # The residuals
+                fe_params[j] = 0.
+                expval = np.dot(self.exog, fe_params)
+                resid_all = self.endog - expval
+
+                # The loss function has the form
+                # a*x^2 + b*x + pwt*|x|
+                a, b = 0., 0.
+                for group_ix, group in enumerate(self.group_labels):
+
+                    vc_var = self._expand_vcomp(vcomp, group_ix)
+
+                    exog = self.exog_li[group_ix]
+                    ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]
+
+                    resid = resid_all[self.row_indices[group]]
+                    solver = _smw_solver(scale, ex_r, ex2_r, cov_re_inv,
+                                         1 / vc_var)
+
+                    x = exog[:, j]
+                    u = solver(x)
+                    a += np.dot(u, x)
+                    b -= 2 * np.dot(u, resid)
+
+                pwt1 = alpha[j]
+                if b > pwt1:
+                    fe_params[j] = -(b - pwt1) / (2 * a)
+                elif b < -pwt1:
+                    fe_params[j] = -(b + pwt1) / (2 * a)
+
+            if np.abs(fe_params_s - fe_params).max() < ptol:
+                break
+
+        # Replace the fixed effects estimates with their penalized
+        # values, leave the dependence parameters in their unpenalized
+        # state.
+        params_prof = mdf.params.copy()
+        params_prof[0:self.k_fe] = fe_params
+
+        scale = self.get_scale(fe_params, mdf.cov_re_unscaled, mdf.vcomp)
+
+        # Get the Hessian including only the nonzero fixed effects,
+        # then blow back up to the full size after inverting.
+        hess, sing = self.hessian(params_prof)
+        if sing:
+            warnings.warn(_warn_cov_sing)
+
+        pcov = np.nan * np.ones_like(hess)
+        ii = np.abs(params_prof) > ceps
+        ii[self.k_fe:] = True
+        ii = np.flatnonzero(ii)
+        hess1 = hess[ii, :][:, ii]
+        pcov[np.ix_(ii, ii)] = np.linalg.inv(-hess1)
+
+        params_object = MixedLMParams.from_components(fe_params, cov_re=cov_re)
+
+        results = MixedLMResults(self, params_prof, pcov / scale)
+        results.params_object = params_object
+        results.fe_params = fe_params
+        results.cov_re = cov_re
+        results.vcomp = vcomp
+        results.scale = scale
+        results.cov_re_unscaled = mdf.cov_re_unscaled
+        results.method = mdf.method
+        results.converged = True
+        results.cov_pen = self.cov_pen
+        results.k_fe = self.k_fe
+        results.k_re = self.k_re
+        results.k_re2 = self.k_re2
+        results.k_vc = self.k_vc
+
+        return MixedLMResultsWrapper(results)
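
The coordinate update in the loop above minimizes a scalar function of the
form a*x**2 + b*x + w*|x|; for completeness, its closed-form minimizer
(editorial note, not part of the diff) is

.. math::

    \hat{x} =
    \begin{cases}
        -(b - w) / (2a) & \text{if } b > w \\
        -(b + w) / (2a) & \text{if } b < -w \\
        0 & \text{if } |b| \le w
    \end{cases}

which is the soft-thresholding rule applied to `fe_params[j]`, with
w = alpha[j].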

     def get_fe_params(self, cov_re, vcomp, tol=1e-10):
         """
@@ -782,7 +1288,64 @@ class MixedLM(base.LikelihoodModel):
         singular : bool
             True if the covariance is singular
         """
-        pass
+
+        if self.k_fe == 0:
+            return np.array([]), False
+
+        sing = False
+
+        if self.k_re == 0:
+            cov_re_inv = np.empty((0, 0))
+        else:
+            w, v = np.linalg.eigh(cov_re)
+            if w.min() < tol:
+                # Singular, use pseudo-inverse
+                sing = True
+                ii = np.flatnonzero(w >= tol)
+                if len(ii) == 0:
+                    cov_re_inv = np.zeros_like(cov_re)
+                else:
+                    vi = v[:, ii]
+                    wi = w[ii]
+                    cov_re_inv = np.dot(vi / wi, vi.T)
+            else:
+                cov_re_inv = np.linalg.inv(cov_re)
+
+        # Cache these quantities that do not change.
+        if not hasattr(self, "_endex_li"):
+            self._endex_li = []
+            for group_ix, _ in enumerate(self.group_labels):
+                mat = np.concatenate(
+                    (self.exog_li[group_ix],
+                     self.endog_li[group_ix][:, None]), axis=1)
+                self._endex_li.append(mat)
+
+        xtxy = 0.
+        for group_ix, group in enumerate(self.group_labels):
+            vc_var = self._expand_vcomp(vcomp, group_ix)
+            if vc_var.size > 0:
+                if vc_var.min() < tol:
+                    # Pseudo-inverse
+                    sing = True
+                    ii = np.flatnonzero(vc_var >= tol)
+                    vc_vari = np.zeros_like(vc_var)
+                    vc_vari[ii] = 1 / vc_var[ii]
+                else:
+                    vc_vari = 1 / vc_var
+            else:
+                vc_vari = np.empty(0)
+            exog = self.exog_li[group_ix]
+            ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]
+            solver = _smw_solver(1., ex_r, ex2_r, cov_re_inv, vc_vari)
+            u = solver(self._endex_li[group_ix])
+            xtxy += np.dot(exog.T, u)
+
+        if sing:
+            fe_params = np.dot(np.linalg.pinv(xtxy[:, 0:-1]), xtxy[:, -1])
+        else:
+            fe_params = np.linalg.solve(xtxy[:, 0:-1], xtxy[:, -1])
+
+        return fe_params, sing

     def _reparam(self):
         """
@@ -803,7 +1366,46 @@ class MixedLM(base.LikelihoodModel):
         covariance and square root transformed variance components),
         then P[i] = lin[i] * R + R' * quad[i] * R
         """
-        pass
+
+        k_fe, k_re, k_re2, k_vc = self.k_fe, self.k_re, self.k_re2, self.k_vc
+        k_tot = k_fe + k_re2 + k_vc
+        ix = np.tril_indices(self.k_re)
+
+        lin = []
+        for k in range(k_fe):
+            e = np.zeros(k_tot)
+            e[k] = 1
+            lin.append(e)
+        for k in range(k_re2):
+            lin.append(np.zeros(k_tot))
+        for k in range(k_vc):
+            lin.append(np.zeros(k_tot))
+
+        quad = []
+        # Quadratic terms for fixed effects.
+        for k in range(k_tot):
+            quad.append(np.zeros((k_tot, k_tot)))
+
+        # Quadratic terms for random effects covariance.
+        ii = np.tril_indices(k_re)
+        ix = [(a, b) for a, b in zip(ii[0], ii[1])]
+        for i1 in range(k_re2):
+            for i2 in range(k_re2):
+                ix1 = ix[i1]
+                ix2 = ix[i2]
+                if (ix1[1] == ix2[1]) and (ix1[0] <= ix2[0]):
+                    ii = (ix2[0], ix1[0])
+                    k = ix.index(ii)
+                    quad[k_fe+k][k_fe+i2, k_fe+i1] += 1
+        for k in range(k_tot):
+            quad[k] = 0.5*(quad[k] + quad[k].T)
+
+        # Quadratic terms for variance components.
+        km = k_fe + k_re2
+        for k in range(km, km+k_vc):
+            quad[k][k, k] = 1
+
+        return lin, quad

     def _expand_vcomp(self, vcomp, group_ix):
         """
@@ -820,7 +1422,17 @@ class MixedLM(base.LikelihoodModel):
         parameter is copied as many times as there are independent
         realizations of the variance component in the given group.
         """
-        pass
+        if len(vcomp) == 0:
+            return np.empty(0)
+        vc_var = []
+        for j in range(len(self.exog_vc.names)):
+            d = self.exog_vc.mats[j][group_ix].shape[1]
+            vc_var.append(vcomp[j] * np.ones(d))
+        if len(vc_var) > 0:
+            return np.concatenate(vc_var)
+        else:
+            # Cannot reach here?
+            return np.empty(0)

     def _augment_exog(self, group_ix):
         """
@@ -828,7 +1440,25 @@ class MixedLM(base.LikelihoodModel):
         for other random effects to obtain a single random effects
         exog matrix for a given group.
         """
-        pass
+        ex_r = self.exog_re_li[group_ix] if self.k_re > 0 else None
+        if self.k_vc == 0:
+            return ex_r
+
+        ex = [ex_r] if self.k_re > 0 else []
+        any_sparse = False
+        for j, _ in enumerate(self.exog_vc.names):
+            ex.append(self.exog_vc.mats[j][group_ix])
+            any_sparse |= sparse.issparse(ex[-1])
+        if any_sparse:
+            for j, x in enumerate(ex):
+                if not sparse.issparse(x):
+                    ex[j] = sparse.csr_matrix(x)
+            ex = sparse.hstack(ex)
+            ex = sparse.csr_matrix(ex)
+        else:
+            ex = np.concatenate(ex, axis=1)
+
+        return ex

     def loglike(self, params, profile_fe=True):
         """
@@ -855,7 +1485,89 @@ class MixedLM(base.LikelihoodModel):
         log-likelihood.  In addition, if `profile_fe` is true the
         fixed effects parameters are also profiled out.
         """
-        pass
+
+        if type(params) is not MixedLMParams:
+            params = MixedLMParams.from_packed(params, self.k_fe,
+                                               self.k_re, self.use_sqrt,
+                                               has_fe=False)
+
+        cov_re = params.cov_re
+        vcomp = params.vcomp
+
+        # Move to the profile set
+        if profile_fe:
+            fe_params, sing = self.get_fe_params(cov_re, vcomp)
+            if sing:
+                self._cov_sing += 1
+        else:
+            fe_params = params.fe_params
+
+        if self.k_re > 0:
+            try:
+                cov_re_inv = np.linalg.inv(cov_re)
+            except np.linalg.LinAlgError:
+                cov_re_inv = np.linalg.pinv(cov_re)
+                self._cov_sing += 1
+            _, cov_re_logdet = np.linalg.slogdet(cov_re)
+        else:
+            cov_re_inv = np.zeros((0, 0))
+            cov_re_logdet = 0
+
+        # The residuals
+        expval = np.dot(self.exog, fe_params)
+        resid_all = self.endog - expval
+
+        likeval = 0.
+
+        # Handle the covariance penalty
+        if (self.cov_pen is not None) and (self.k_re > 0):
+            likeval -= self.cov_pen.func(cov_re, cov_re_inv)
+
+        # Handle the fixed effects penalty
+        if (self.fe_pen is not None):
+            likeval -= self.fe_pen.func(fe_params)
+
+        xvx, qf = 0., 0.
+        for group_ix, group in enumerate(self.group_labels):
+
+            vc_var = self._expand_vcomp(vcomp, group_ix)
+            cov_aug_logdet = cov_re_logdet + np.sum(np.log(vc_var))
+
+            exog = self.exog_li[group_ix]
+            ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]
+            solver = _smw_solver(1., ex_r, ex2_r, cov_re_inv, 1 / vc_var)
+
+            resid = resid_all[self.row_indices[group]]
+
+            # Part 1 of the log likelihood (for both ML and REML)
+            ld = _smw_logdet(1., ex_r, ex2_r, cov_re_inv, 1 / vc_var,
+                             cov_aug_logdet)
+            likeval -= ld / 2.
+
+            # Part 2 of the log likelihood (for both ML and REML)
+            u = solver(resid)
+            qf += np.dot(resid, u)
+
+            # Adjustment for REML
+            if self.reml:
+                mat = solver(exog)
+                xvx += np.dot(exog.T, mat)
+
+        if self.reml:
+            likeval -= (self.n_totobs - self.k_fe) * np.log(qf) / 2.
+            _, ld = np.linalg.slogdet(xvx)
+            likeval -= ld / 2.
+            likeval -= (self.n_totobs - self.k_fe) * np.log(2 * np.pi) / 2.
+            likeval += ((self.n_totobs - self.k_fe) *
+                        np.log(self.n_totobs - self.k_fe) / 2.)
+            likeval -= (self.n_totobs - self.k_fe) / 2.
+        else:
+            likeval -= self.n_totobs * np.log(qf) / 2.
+            likeval -= self.n_totobs * np.log(2 * np.pi) / 2.
+            likeval += self.n_totobs * np.log(self.n_totobs) / 2.
+            likeval -= self.n_totobs / 2.
+
+        return likeval
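As an illustrative sketch (not part of the patch; the synthetic data and column names are invented), the profiled log-likelihood implemented above can be evaluated directly on a fitted model:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
groups = np.repeat(np.arange(20), 10)
x = rng.normal(size=200)
y = 1 + 2 * x + rng.normal(size=20)[groups] + rng.normal(size=200)
df = pd.DataFrame({"y": y, "x": x, "g": groups})

model = smf.mixedlm("y ~ x", df, groups=df["g"])
result = model.fit()
# The REML criterion at the estimated (unscaled) covariance parameters;
# profile_fe=True also profiles out the fixed effects via get_fe_params.
print(model.loglike(result.params_object, profile_fe=True))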

     def _gen_dV_dPar(self, ex_r, solver, group_ix, max_ix=None):
         """
@@ -874,7 +1586,29 @@ class MixedLM(base.LikelihoodModel):
             If not None, the generator ends when this index
             is reached.
         """
-        pass
+
+        axr = solver(ex_r)
+
+        # Regular random effects
+        jj = 0
+        for j1 in range(self.k_re):
+            for j2 in range(j1 + 1):
+                if max_ix is not None and jj > max_ix:
+                    return
+                # Need 2d
+                mat_l, mat_r = ex_r[:, j1:j1+1], ex_r[:, j2:j2+1]
+                vsl, vsr = axr[:, j1:j1+1], axr[:, j2:j2+1]
+                yield jj, mat_l, mat_r, vsl, vsr, j1 == j2
+                jj += 1
+
+        # Variance components
+        for j, _ in enumerate(self.exog_vc.names):
+            if max_ix is not None and jj > max_ix:
+                return
+            mat = self.exog_vc.mats[j][group_ix]
+            axmat = solver(mat)
+            yield jj, mat, mat, axmat, axmat, True
+            jj += 1

     def score(self, params, profile_fe=True):
         """
@@ -886,7 +1620,36 @@ class MixedLM(base.LikelihoodModel):
         the parameterization defined by this model instance's
         `use_sqrt` attribute.
         """
-        pass
+
+        if type(params) is not MixedLMParams:
+            params = MixedLMParams.from_packed(
+                params, self.k_fe, self.k_re, self.use_sqrt,
+                has_fe=False)
+
+        if profile_fe:
+            params.fe_params, sing = \
+                self.get_fe_params(params.cov_re, params.vcomp)
+
+            if sing:
+                msg = "Random effects covariance is singular"
+                warnings.warn(msg)
+
+        if self.use_sqrt:
+            score_fe, score_re, score_vc = self.score_sqrt(
+                params, calc_fe=not profile_fe)
+        else:
+            score_fe, score_re, score_vc = self.score_full(
+                params, calc_fe=not profile_fe)
+
+        if self._freepat is not None:
+            score_fe *= self._freepat.fe_params
+            score_re *= self._freepat.cov_re[self._freepat._ix]
+            score_vc *= self._freepat.vcomp
+
+        if profile_fe:
+            return np.concatenate((score_re, score_vc))
+        else:
+            return np.concatenate((score_fe, score_re, score_vc))
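A hedged sanity check of the analytic score (again on invented synthetic data) is to compare it with a finite-difference gradient of loglike in the packed parameterization:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.tools.numdiff import approx_fprime

rng = np.random.default_rng(1)
groups = np.repeat(np.arange(20), 10)
x = rng.normal(size=200)
y = 1 + 2 * x + rng.normal(size=20)[groups] + rng.normal(size=200)
df = pd.DataFrame({"y": y, "x": x, "g": groups})
model = smf.mixedlm("y ~ x", df, groups=df["g"])
result = model.fit()

packed = result.params_object.get_packed(use_sqrt=model.use_sqrt, has_fe=False)
analytic = model.score(packed)
numeric = approx_fprime(packed, lambda p: model.loglike(p, profile_fe=True))
print(np.max(np.abs(analytic - numeric)))  # should be close to zero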

     def score_full(self, params, calc_fe):
         """
@@ -926,7 +1689,123 @@ class MixedLM(base.LikelihoodModel):
         which `cov_re` is represented through its lower triangle
         (without taking the Cholesky square root).
         """
-        pass
+
+        fe_params = params.fe_params
+        cov_re = params.cov_re
+        vcomp = params.vcomp
+
+        try:
+            cov_re_inv = np.linalg.inv(cov_re)
+        except np.linalg.LinAlgError:
+            cov_re_inv = np.linalg.pinv(cov_re)
+            self._cov_sing += 1
+
+        score_fe = np.zeros(self.k_fe)
+        score_re = np.zeros(self.k_re2)
+        score_vc = np.zeros(self.k_vc)
+
+        # Handle the covariance penalty.
+        if self.cov_pen is not None:
+            score_re -= self.cov_pen.deriv(cov_re, cov_re_inv)
+
+        # Handle the fixed effects penalty.
+        if calc_fe and (self.fe_pen is not None):
+            score_fe -= self.fe_pen.deriv(fe_params)
+
+        # resid' V^{-1} resid, summed over the groups (a scalar)
+        rvir = 0.
+
+        # exog' V^{-1} resid, summed over the groups (a k_fe
+        # dimensional vector)
+        xtvir = 0.
+
+        # exog' V^{-1} exog, summed over the groups (a k_fe x k_fe
+        # matrix)
+        xtvix = 0.
+
+        # exog' V^{-1} dV/dQ_jj V^{-1} exog, where Q_jj is the jj^th
+        # covariance parameter.
+        xtax = [0., ] * (self.k_re2 + self.k_vc)
+
+        # Temporary related to the gradient of log |V|
+        dlv = np.zeros(self.k_re2 + self.k_vc)
+
+        # resid' V^{-1} dV/dQ_jj V^{-1} resid (a scalar)
+        rvavr = np.zeros(self.k_re2 + self.k_vc)
+
+        for group_ix, group in enumerate(self.group_labels):
+
+            vc_var = self._expand_vcomp(vcomp, group_ix)
+
+            exog = self.exog_li[group_ix]
+            ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]
+            solver = _smw_solver(1., ex_r, ex2_r, cov_re_inv, 1 / vc_var)
+
+            # The residuals
+            resid = self.endog_li[group_ix]
+            if self.k_fe > 0:
+                expval = np.dot(exog, fe_params)
+                resid = resid - expval
+
+            if self.reml:
+                viexog = solver(exog)
+                xtvix += np.dot(exog.T, viexog)
+
+            # Contributions to the covariance parameter gradient
+            vir = solver(resid)
+            for (jj, matl, matr, vsl, vsr, sym) in\
+                    self._gen_dV_dPar(ex_r, solver, group_ix):
+                dlv[jj] = _dotsum(matr, vsl)
+                if not sym:
+                    dlv[jj] += _dotsum(matl, vsr)
+
+                ul = _dot(vir, matl)
+                ur = ul.T if sym else _dot(matr.T, vir)
+                ulr = np.dot(ul, ur)
+                rvavr[jj] += ulr
+                if not sym:
+                    rvavr[jj] += ulr.T
+
+                if self.reml:
+                    ul = _dot(viexog.T, matl)
+                    ur = ul.T if sym else _dot(matr.T, viexog)
+                    ulr = np.dot(ul, ur)
+                    xtax[jj] += ulr
+                    if not sym:
+                        xtax[jj] += ulr.T
+
+            # Contribution of log|V| to the covariance parameter
+            # gradient.
+            if self.k_re > 0:
+                score_re -= 0.5 * dlv[0:self.k_re2]
+            if self.k_vc > 0:
+                score_vc -= 0.5 * dlv[self.k_re2:]
+
+            rvir += np.dot(resid, vir)
+
+            if calc_fe:
+                xtvir += np.dot(exog.T, vir)
+
+        fac = self.n_totobs
+        if self.reml:
+            fac -= self.k_fe
+
+        if calc_fe and self.k_fe > 0:
+            score_fe += fac * xtvir / rvir
+
+        if self.k_re > 0:
+            score_re += 0.5 * fac * rvavr[0:self.k_re2] / rvir
+        if self.k_vc > 0:
+            score_vc += 0.5 * fac * rvavr[self.k_re2:] / rvir
+
+        if self.reml:
+            xtvixi = np.linalg.inv(xtvix)
+            for j in range(self.k_re2):
+                score_re[j] += 0.5 * _dotsum(xtvixi.T, xtax[j])
+            for j in range(self.k_vc):
+                score_vc[j] += 0.5 * _dotsum(xtvixi.T, xtax[self.k_re2 + j])
+
+        return score_fe, score_re, score_vc

     def score_sqrt(self, params, calc_fe=True):
         """
@@ -958,7 +1837,20 @@ class MixedLM(base.LikelihoodModel):
             The score vector with respect to variance components
             parameters.
         """
-        pass
+
+        score_fe, score_re, score_vc = self.score_full(params, calc_fe=calc_fe)
+        params_vec = params.get_packed(use_sqrt=True, has_fe=True)
+
+        score_full = np.concatenate((score_fe, score_re, score_vc))
+        scr = 0.
+        for i in range(len(params_vec)):
+            v = self._lin[i] + 2 * np.dot(self._quad[i], params_vec)
+            scr += score_full[i] * v
+        score_fe = scr[0:self.k_fe]
+        score_re = scr[self.k_fe:self.k_fe + self.k_re2]
+        score_vc = scr[self.k_fe + self.k_re2:]
+
+        return score_fe, score_re, score_vc

     def hessian(self, params):
         """
@@ -984,7 +1876,157 @@ class MixedLM(base.LikelihoodModel):
             If True, the covariance matrix is singular and a
             pseudo-inverse is returned.
         """
-        pass
+
+        if type(params) is not MixedLMParams:
+            params = MixedLMParams.from_packed(params, self.k_fe, self.k_re,
+                                               use_sqrt=self.use_sqrt,
+                                               has_fe=True)
+
+        fe_params = params.fe_params
+        vcomp = params.vcomp
+        cov_re = params.cov_re
+        sing = False
+
+        if self.k_re > 0:
+            try:
+                cov_re_inv = np.linalg.inv(cov_re)
+            except np.linalg.LinAlgError:
+                cov_re_inv = np.linalg.pinv(cov_re)
+                sing = True
+        else:
+            cov_re_inv = np.empty((0, 0))
+
+        # Blocks for the fixed and random effects parameters.
+        hess_fe = 0.
+        hess_re = np.zeros((self.k_re2 + self.k_vc, self.k_re2 + self.k_vc))
+        hess_fere = np.zeros((self.k_re2 + self.k_vc, self.k_fe))
+
+        fac = self.n_totobs
+        if self.reml:
+            fac -= self.exog.shape[1]
+
+        rvir = 0.
+        xtvix = 0.
+        xtax = [0., ] * (self.k_re2 + self.k_vc)
+        m = self.k_re2 + self.k_vc
+        B = np.zeros(m)
+        D = np.zeros((m, m))
+        F = [[0.] * m for k in range(m)]
+        for group_ix, group in enumerate(self.group_labels):
+
+            vc_var = self._expand_vcomp(vcomp, group_ix)
+            vc_vari = np.zeros_like(vc_var)
+            ii = np.flatnonzero(vc_var >= 1e-10)
+            if len(ii) > 0:
+                vc_vari[ii] = 1 / vc_var[ii]
+            if len(ii) < len(vc_var):
+                sing = True
+
+            exog = self.exog_li[group_ix]
+            ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]
+            solver = _smw_solver(1., ex_r, ex2_r, cov_re_inv, vc_vari)
+
+            # The residuals
+            resid = self.endog_li[group_ix]
+            if self.k_fe > 0:
+                expval = np.dot(exog, fe_params)
+                resid = resid - expval
+
+            viexog = solver(exog)
+            xtvix += np.dot(exog.T, viexog)
+            vir = solver(resid)
+            rvir += np.dot(resid, vir)
+
+            for (jj1, matl1, matr1, vsl1, vsr1, sym1) in\
+                    self._gen_dV_dPar(ex_r, solver, group_ix):
+
+                ul = _dot(viexog.T, matl1)
+                ur = _dot(matr1.T, vir)
+                hess_fere[jj1, :] += np.dot(ul, ur)
+                if not sym1:
+                    ul = _dot(viexog.T, matr1)
+                    ur = _dot(matl1.T, vir)
+                    hess_fere[jj1, :] += np.dot(ul, ur)
+
+                if self.reml:
+                    ul = _dot(viexog.T, matl1)
+                    ur = ul if sym1 else np.dot(viexog.T, matr1)
+                    ulr = _dot(ul, ur.T)
+                    xtax[jj1] += ulr
+                    if not sym1:
+                        xtax[jj1] += ulr.T
+
+                ul = _dot(vir, matl1)
+                ur = ul if sym1 else _dot(vir, matr1)
+                B[jj1] += np.dot(ul, ur) * (1 if sym1 else 2)
+
+                # V^{-1} * dV/d_theta
+                E = [(vsl1, matr1)]
+                if not sym1:
+                    E.append((vsr1, matl1))
+
+                for (jj2, matl2, matr2, vsl2, vsr2, sym2) in\
+                        self._gen_dV_dPar(ex_r, solver, group_ix, jj1):
+
+                    re = sum([_multi_dot_three(matr2.T, x[0], x[1].T)
+                              for x in E])
+                    vt = 2 * _dot(_multi_dot_three(vir[None, :], matl2, re),
+                                  vir[:, None])
+
+                    if not sym2:
+                        le = sum([_multi_dot_three(matl2.T, x[0], x[1].T)
+                                  for x in E])
+                        vt += 2 * _dot(_multi_dot_three(
+                            vir[None, :], matr2, le), vir[:, None])
+
+                    D[jj1, jj2] += np.squeeze(vt)
+                    if jj1 != jj2:
+                        D[jj2, jj1] += np.squeeze(vt)
+
+                    rt = _dotsum(vsl2, re.T) / 2
+                    if not sym2:
+                        rt += _dotsum(vsr2, le.T) / 2
+
+                    hess_re[jj1, jj2] += rt
+                    if jj1 != jj2:
+                        hess_re[jj2, jj1] += rt
+
+                    if self.reml:
+                        ev = sum([_dot(x[0], _dot(x[1].T, viexog)) for x in E])
+                        u1 = _dot(viexog.T, matl2)
+                        u2 = _dot(matr2.T, ev)
+                        um = np.dot(u1, u2)
+                        F[jj1][jj2] += um + um.T
+                        if not sym2:
+                            u1 = np.dot(viexog.T, matr2)
+                            u2 = np.dot(matl2.T, ev)
+                            um = np.dot(u1, u2)
+                            F[jj1][jj2] += um + um.T
+
+        hess_fe -= fac * xtvix / rvir
+        hess_re = hess_re - 0.5 * fac * (D/rvir - np.outer(B, B) / rvir**2)
+        hess_fere = -fac * hess_fere / rvir
+
+        if self.reml:
+            QL = [np.linalg.solve(xtvix, x) for x in xtax]
+            for j1 in range(self.k_re2 + self.k_vc):
+                for j2 in range(j1 + 1):
+                    a = _dotsum(QL[j1].T, QL[j2])
+                    a -= np.trace(np.linalg.solve(xtvix, F[j1][j2]))
+                    a *= 0.5
+                    hess_re[j1, j2] += a
+                    if j1 > j2:
+                        hess_re[j2, j1] += a
+
+        # Put the blocks together to get the Hessian.
+        m = self.k_fe + self.k_re2 + self.k_vc
+        hess = np.zeros((m, m))
+        hess[0:self.k_fe, 0:self.k_fe] = hess_fe
+        hess[0:self.k_fe, self.k_fe:] = hess_fere.T
+        hess[self.k_fe:, 0:self.k_fe] = hess_fere
+        hess[self.k_fe:, self.k_fe:] = hess_re
+
+        return hess, sing

     def get_scale(self, fe_params, cov_re, vcomp):
         """
@@ -1005,11 +2047,42 @@ class MixedLM(base.LikelihoodModel):
         scale : float
             The estimated error variance.
         """
-        pass

-    def fit(self, start_params=None, reml=True, niter_sa=0, do_cg=True,
-        fe_pen=None, cov_pen=None, free=None, full_output=False, method=
-        None, **fit_kwargs):
+        try:
+            cov_re_inv = np.linalg.inv(cov_re)
+        except np.linalg.LinAlgError:
+            cov_re_inv = np.linalg.pinv(cov_re)
+            warnings.warn(_warn_cov_sing)
+
+        qf = 0.
+        for group_ix, group in enumerate(self.group_labels):
+
+            vc_var = self._expand_vcomp(vcomp, group_ix)
+
+            exog = self.exog_li[group_ix]
+            ex_r, ex2_r = self._aex_r[group_ix], self._aex_r2[group_ix]
+
+            solver = _smw_solver(1., ex_r, ex2_r, cov_re_inv, 1 / vc_var)
+
+            # The residuals
+            resid = self.endog_li[group_ix]
+            if self.k_fe > 0:
+                expval = np.dot(exog, fe_params)
+                resid = resid - expval
+
+            mat = solver(resid)
+            qf += np.dot(resid, mat)
+
+        if self.reml:
+            qf /= (self.n_totobs - self.k_fe)
+        else:
+            qf /= self.n_totobs
+
+        return qf
+
+    def fit(self, start_params=None, reml=True, niter_sa=0,
+            do_cg=True, fe_pen=None, cov_pen=None, free=None,
+            full_output=False, method=None, **fit_kwargs):
         """
         Fit a linear mixed model to the data.

@@ -1056,7 +2129,163 @@ class MixedLM(base.LikelihoodModel):
         -------
         A MixedLMResults instance.
         """
-        pass
+
+        _allowed_kwargs = ['gtol', 'maxiter', 'eps', 'maxcor', 'ftol',
+                           'tol', 'disp', 'maxls']
+        for x in fit_kwargs.keys():
+            if x not in _allowed_kwargs:
+                warnings.warn("Argument %s not used by MixedLM.fit" % x)
+
+        if method is None:
+            method = ['bfgs', 'lbfgs', 'cg']
+        elif isinstance(method, str):
+            method = [method]
+
+        for meth in method:
+            if meth.lower() in ["newton", "ncg"]:
+                raise ValueError(
+                    "method %s not available for MixedLM" % meth)
+
+        self.reml = reml
+        self.cov_pen = cov_pen
+        self.fe_pen = fe_pen
+        self._cov_sing = 0
+        self._freepat = free
+
+        if full_output:
+            hist = []
+        else:
+            hist = None
+
+        if start_params is None:
+            params = MixedLMParams(self.k_fe, self.k_re, self.k_vc)
+            params.fe_params = np.zeros(self.k_fe)
+            params.cov_re = np.eye(self.k_re)
+            params.vcomp = np.ones(self.k_vc)
+        else:
+            if isinstance(start_params, MixedLMParams):
+                params = start_params
+            else:
+                # It's a packed array
+                if len(start_params) == self.k_fe + self.k_re2 + self.k_vc:
+                    params = MixedLMParams.from_packed(
+                        start_params, self.k_fe, self.k_re, self.use_sqrt,
+                        has_fe=True)
+                elif len(start_params) == self.k_re2 + self.k_vc:
+                    params = MixedLMParams.from_packed(
+                        start_params, self.k_fe, self.k_re, self.use_sqrt,
+                        has_fe=False)
+                else:
+                    raise ValueError("invalid start_params")
+
+        if do_cg:
+            fit_kwargs["retall"] = hist is not None
+            if "disp" not in fit_kwargs:
+                fit_kwargs["disp"] = False
+            packed = params.get_packed(use_sqrt=self.use_sqrt, has_fe=False)
+
+            if niter_sa > 0:
+                warnings.warn("niter_sa is currently ignored")
+
+            # Try optimizing one or more times
+            for j in range(len(method)):
+                rslt = super(MixedLM, self).fit(start_params=packed,
+                                                skip_hessian=True,
+                                                method=method[j],
+                                                **fit_kwargs)
+                if rslt.mle_retvals['converged']:
+                    break
+                packed = rslt.params
+                if j + 1 < len(method):
+                    next_method = method[j + 1]
+                    warnings.warn(
+                        "Retrying MixedLM optimization with %s" % next_method,
+                        ConvergenceWarning)
+                else:
+                    msg = ("MixedLM optimization failed, " +
+                           "trying a different optimizer may help.")
+                    warnings.warn(msg, ConvergenceWarning)
+
+            # Keep the parameters from the final optimization attempt
+            params = np.atleast_1d(rslt.params)
+            if hist is not None:
+                hist.append(rslt.mle_retvals)
+
+        converged = rslt.mle_retvals['converged']
+        if not converged:
+            gn = self.score(rslt.params)
+            gn = np.sqrt(np.sum(gn**2))
+            msg = "Gradient optimization failed, |grad| = %f" % gn
+            warnings.warn(msg, ConvergenceWarning)
+
+        # Convert to the final parameterization (i.e. undo the square
+        # root transform of the covariance matrix, and the profiling
+        # over the error variance).
+        params = MixedLMParams.from_packed(
+            params, self.k_fe, self.k_re, use_sqrt=self.use_sqrt, has_fe=False)
+        cov_re_unscaled = params.cov_re
+        vcomp_unscaled = params.vcomp
+        fe_params, sing = self.get_fe_params(cov_re_unscaled, vcomp_unscaled)
+        params.fe_params = fe_params
+        scale = self.get_scale(fe_params, cov_re_unscaled, vcomp_unscaled)
+        cov_re = scale * cov_re_unscaled
+        vcomp = scale * vcomp_unscaled
+
+        f1 = (self.k_re > 0) and (np.min(np.abs(np.diag(cov_re))) < 0.01)
+        f2 = (self.k_vc > 0) and (np.min(np.abs(vcomp)) < 0.01)
+        if f1 or f2:
+            msg = "The MLE may be on the boundary of the parameter space."
+            warnings.warn(msg, ConvergenceWarning)
+
+        # Compute the Hessian at the MLE.  Note that this is the
+        # Hessian with respect to the random effects covariance matrix
+        # (not its square root).  It is used for obtaining standard
+        # errors, not for optimization.
+        hess, sing = self.hessian(params)
+        if sing:
+            warnings.warn(_warn_cov_sing)
+
+        hess_diag = np.diag(hess)
+        if free is not None:
+            pcov = np.zeros_like(hess)
+            pat = self._freepat.get_packed(use_sqrt=False, has_fe=True)
+            ii = np.flatnonzero(pat)
+            hess_diag = hess_diag[ii]
+            if len(ii) > 0:
+                hess1 = hess[np.ix_(ii, ii)]
+                pcov[np.ix_(ii, ii)] = np.linalg.inv(-hess1)
+        else:
+            pcov = np.linalg.inv(-hess)
+        if np.any(hess_diag >= 0):
+            msg = ("The Hessian matrix at the estimated parameter values " +
+                   "is not positive definite.")
+            warnings.warn(msg, ConvergenceWarning)
+
+        # Prepare a results class instance
+        params_packed = params.get_packed(use_sqrt=False, has_fe=True)
+        results = MixedLMResults(self, params_packed, pcov / scale)
+        results.params_object = params
+        results.fe_params = fe_params
+        results.cov_re = cov_re
+        results.vcomp = vcomp
+        results.scale = scale
+        results.cov_re_unscaled = cov_re_unscaled
+        results.method = "REML" if self.reml else "ML"
+        results.converged = converged
+        results.hist = hist
+        results.reml = self.reml
+        results.cov_pen = self.cov_pen
+        results.k_fe = self.k_fe
+        results.k_re = self.k_re
+        results.k_re2 = self.k_re2
+        results.k_vc = self.k_vc
+        results.use_sqrt = self.use_sqrt
+        results.freepat = self._freepat
+
+        return MixedLMResultsWrapper(results)
+
+    def get_distribution(self, params, scale, exog):
+        return _mixedlm_distribution(self, params, scale, exog)
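For orientation, a minimal end-to-end use of the fitting loop above through the formula interface (a sketch on synthetic data; the column names are illustrative):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
groups = np.repeat(np.arange(30), 8)
x = rng.normal(size=240)
y = 1 + 0.5 * x + rng.normal(size=30)[groups] + rng.normal(size=240)
df = pd.DataFrame({"y": y, "x": x, "g": groups})

md = smf.mixedlm("y ~ x", df, groups=df["g"])
mdf = md.fit(method=["lbfgs"])       # a list of optimizers is tried in order
print(mdf.summary())
print(mdf.random_effects[0])         # predicted random effects, first group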


 class _mixedlm_distribution:
@@ -1088,14 +2317,18 @@ class _mixedlm_distribution:
     """

     def __init__(self, model, params, scale, exog):
+
         self.model = model
         self.exog = exog if exog is not None else model.exog
-        po = MixedLMParams.from_packed(params, model.k_fe, model.k_re, 
-            False, True)
+
+        po = MixedLMParams.from_packed(
+                params, model.k_fe, model.k_re, False, True)
+
         self.fe_params = po.fe_params
         self.cov_re = scale * po.cov_re
         self.vcomp = scale * po.vcomp
         self.scale = scale
+
         group_idx = np.zeros(model.nobs, dtype=int)
         for k, g in enumerate(model.group_labels):
             group_idx[model.row_indices[g]] = k
@@ -1108,11 +2341,35 @@ class _mixedlm_distribution:

         The parameter n is ignored, but required by the interface
         """
-        pass
+
+        model = self.model
+
+        # Fixed effects
+        y = np.dot(self.exog, self.fe_params)
+
+        # Random effects
+        u = np.random.normal(size=(model.n_groups, model.k_re))
+        u = np.dot(u, np.linalg.cholesky(self.cov_re).T)
+        y += (u[self.group_idx, :] * model.exog_re).sum(1)
+
+        # Variance components
+        for j, _ in enumerate(model.exog_vc.names):
+            ex = model.exog_vc.mats[j]
+            v = self.vcomp[j]
+            for i, g in enumerate(model.group_labels):
+                exg = ex[i]
+                ii = model.row_indices[g]
+                u = np.random.normal(size=exg.shape[1])
+                y[ii] += np.sqrt(v) * np.dot(exg, u)
+
+        # Residual variance
+        y += np.sqrt(self.scale) * np.random.normal(size=len(y))
+
+        return y


 class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
-    """
+    '''
     Class to contain results of fitting a linear mixed effects model.

     MixedLMResults inherits from statsmodels.LikelihoodModelResults
@@ -1150,11 +2407,12 @@ class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
     See Also
     --------
     statsmodels.LikelihoodModelResults
-    """
+    '''

     def __init__(self, model, params, cov_params):
+
         super(MixedLMResults, self).__init__(model, params,
-            normalized_cov_params=cov_params)
+                                             normalized_cov_params=cov_params)
         self.nobs = self.model.nobs
         self.df_resid = self.nobs - np.linalg.matrix_rank(self.model.exog)

@@ -1166,7 +2424,21 @@ class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
         The fitted values reflect the mean structure specified by the
         fixed effects and the predicted random effects.
         """
-        pass
+        fit = np.dot(self.model.exog, self.fe_params)
+        re = self.random_effects
+        for group_ix, group in enumerate(self.model.group_labels):
+            ix = self.model.row_indices[group]
+
+            mat = []
+            if self.model.exog_re_li is not None:
+                mat.append(self.model.exog_re_li[group_ix])
+            for j in range(self.k_vc):
+                mat.append(self.model.exog_vc.mats[j][group_ix])
+            mat = np.concatenate(mat, axis=1)
+
+            fit[ix] += np.dot(mat, re[group])
+
+        return fit

     @cache_readonly
     def resid(self):
@@ -1176,7 +2448,7 @@ class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
         The residuals reflect the mean structure specified by the
         fixed effects and the predicted random effects.
         """
-        pass
+        return self.model.endog - self.fittedvalues

     @cache_readonly
     def bse_fe(self):
@@ -1184,7 +2456,8 @@ class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
         Returns the standard errors of the fixed effect regression
         coefficients.
         """
-        pass
+        p = self.model.exog.shape[1]
+        return np.sqrt(np.diag(self.cov_params())[0:p])

     @cache_readonly
     def bse_re(self):
@@ -1201,7 +2474,18 @@ class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
         standard errors may not give meaningful confidence intervals
         or p-values if used in the usual way.
         """
-        pass
+        p = self.model.exog.shape[1]
+        return np.sqrt(self.scale * np.diag(self.cov_params())[p:])
+
+    def _expand_re_names(self, group_ix):
+        names = list(self.model.data.exog_re_names)
+
+        for j, v in enumerate(self.model.exog_vc.names):
+            vg = self.model.exog_vc.colnames[j][group_ix]
+            na = ["%s[%s]" % (v, s) for s in vg]
+            names.extend(na)
+
+        return names

     @cache_readonly
     def random_effects(self):
@@ -1215,7 +2499,42 @@ class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
             conditional means of the random effects for the group
             given the data.
         """
-        pass
+        try:
+            cov_re_inv = np.linalg.inv(self.cov_re)
+        except np.linalg.LinAlgError:
+            raise ValueError("Cannot predict random effects from " +
+                             "singular covariance structure.")
+
+        vcomp = self.vcomp
+        k_re = self.k_re
+
+        ranef_dict = {}
+        for group_ix, group in enumerate(self.model.group_labels):
+
+            endog = self.model.endog_li[group_ix]
+            exog = self.model.exog_li[group_ix]
+            ex_r = self.model._aex_r[group_ix]
+            ex2_r = self.model._aex_r2[group_ix]
+            vc_var = self.model._expand_vcomp(vcomp, group_ix)
+
+            # Get the residuals relative to fixed effects
+            resid = endog
+            if self.k_fe > 0:
+                expval = np.dot(exog, self.fe_params)
+                resid = resid - expval
+
+            solver = _smw_solver(self.scale, ex_r, ex2_r, cov_re_inv,
+                                 1 / vc_var)
+            vir = solver(resid)
+
+            xtvir = _dot(ex_r.T, vir)
+
+            xtvir[0:k_re] = np.dot(self.cov_re, xtvir[0:k_re])
+            xtvir[k_re:] *= vc_var
+            ranef_dict[group] = pd.Series(
+                xtvir, index=self._expand_re_names(group_ix))
+
+        return ranef_dict

     @cache_readonly
     def random_effects_cov(self):
@@ -1230,8 +2549,45 @@ class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
             variable to the conditional covariance matrix of the
             random effects given the data.
         """
-        pass

+        try:
+            cov_re_inv = np.linalg.inv(self.cov_re)
+        except np.linalg.LinAlgError:
+            cov_re_inv = None
+
+        vcomp = self.vcomp
+
+        ranef_dict = {}
+        for group_ix in range(self.model.n_groups):
+
+            ex_r = self.model._aex_r[group_ix]
+            ex2_r = self.model._aex_r2[group_ix]
+            label = self.model.group_labels[group_ix]
+            vc_var = self.model._expand_vcomp(vcomp, group_ix)
+
+            solver = _smw_solver(self.scale, ex_r, ex2_r, cov_re_inv,
+                                 1 / vc_var)
+
+            n = ex_r.shape[0]
+            m = self.cov_re.shape[0]
+            mat1 = np.empty((n, m + len(vc_var)))
+            mat1[:, 0:m] = np.dot(ex_r[:, 0:m], self.cov_re)
+            mat1[:, m:] = np.dot(ex_r[:, m:], np.diag(vc_var))
+            mat2 = solver(mat1)
+            mat2 = np.dot(mat1.T, mat2)
+
+            v = -mat2
+            v[0:m, 0:m] += self.cov_re
+            ix = np.arange(m, v.shape[0])
+            v[ix, ix] += vc_var
+            na = self._expand_re_names(group_ix)
+            v = pd.DataFrame(v, index=na, columns=na)
+            ranef_dict[label] = v
+
+        return ranef_dict
+
+    # Need to override since t-tests are only used for fixed effects
+    # parameters.
     def t_test(self, r_matrix, use_t=None):
         """
         Compute a t-test for a each linear hypothesis of the form Rb = q
@@ -1259,10 +2615,18 @@ class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
             The available results have the same elements as the parameter table
             in `summary()`.
         """
-        pass
-
-    def summary(self, yname=None, xname_fe=None, xname_re=None, title=None,
-        alpha=0.05):
+        if r_matrix.shape[1] != self.k_fe:
+            raise ValueError("r_matrix for t-test should have %d columns"
+                             % self.k_fe)
+
+        d = self.k_re2 + self.k_vc
+        z0 = np.zeros((r_matrix.shape[0], d))
+        r_matrix = np.concatenate((r_matrix, z0), axis=1)
+        tst_rslt = super(MixedLMResults, self).t_test(r_matrix, use_t=use_t)
+        return tst_rslt
+
+    def summary(self, yname=None, xname_fe=None, xname_re=None,
+                title=None, alpha=.05):
         """
         Summarize the mixed model regression results.

@@ -1290,20 +2654,121 @@ class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
         --------
         statsmodels.iolib.summary2.Summary : class to hold summary results
         """
-        pass
+
+        from statsmodels.iolib import summary2
+        smry = summary2.Summary()
+
+        info = {}
+        info["Model:"] = "MixedLM"
+        if yname is None:
+            yname = self.model.endog_names
+
+        param_names = self.model.data.param_names[:]
+        k_fe_params = len(self.fe_params)
+        k_re_params = len(param_names) - len(self.fe_params)
+
+        if xname_fe is not None:
+            if len(xname_fe) != k_fe_params:
+                msg = "xname_fe should be a list of length %d" % k_fe_params
+                raise ValueError(msg)
+            param_names[:k_fe_params] = xname_fe
+
+        if xname_re is not None:
+            if len(xname_re) != k_re_params:
+                msg = "xname_re should be a list of length %d" % k_re_params
+                raise ValueError(msg)
+            param_names[k_fe_params:] = xname_re
+
+        info["No. Observations:"] = str(self.model.n_totobs)
+        info["No. Groups:"] = str(self.model.n_groups)
+
+        gs = np.array([len(x) for x in self.model.endog_li])
+        info["Min. group size:"] = "%.0f" % min(gs)
+        info["Max. group size:"] = "%.0f" % max(gs)
+        info["Mean group size:"] = "%.1f" % np.mean(gs)
+
+        info["Dependent Variable:"] = yname
+        info["Method:"] = self.method
+        info["Scale:"] = self.scale
+        info["Log-Likelihood:"] = self.llf
+        info["Converged:"] = "Yes" if self.converged else "No"
+        smry.add_dict(info)
+        smry.add_title("Mixed Linear Model Regression Results")
+
+        float_fmt = "%.3f"
+
+        sdf = np.nan * np.ones((self.k_fe + self.k_re2 + self.k_vc, 6))
+
+        # Coefficient estimates
+        sdf[0:self.k_fe, 0] = self.fe_params
+
+        # Standard errors
+        sdf[0:self.k_fe, 1] = np.sqrt(np.diag(self.cov_params()[0:self.k_fe]))
+
+        # Z-scores
+        sdf[0:self.k_fe, 2] = sdf[0:self.k_fe, 0] / sdf[0:self.k_fe, 1]
+
+        # p-values
+        sdf[0:self.k_fe, 3] = 2 * norm.cdf(-np.abs(sdf[0:self.k_fe, 2]))
+
+        # Confidence intervals
+        qm = -norm.ppf(alpha / 2)
+        sdf[0:self.k_fe, 4] = sdf[0:self.k_fe, 0] - qm * sdf[0:self.k_fe, 1]
+        sdf[0:self.k_fe, 5] = sdf[0:self.k_fe, 0] + qm * sdf[0:self.k_fe, 1]
+
+        # All random effects variances and covariances
+        jj = self.k_fe
+        for i in range(self.k_re):
+            for j in range(i + 1):
+                sdf[jj, 0] = self.cov_re[i, j]
+                sdf[jj, 1] = np.sqrt(self.scale) * self.bse[jj]
+                jj += 1
+
+        # Variance components
+        for i in range(self.k_vc):
+            sdf[jj, 0] = self.vcomp[i]
+            sdf[jj, 1] = np.sqrt(self.scale) * self.bse[jj]
+            jj += 1
+
+        sdf = pd.DataFrame(index=param_names, data=sdf)
+        sdf.columns = ['Coef.', 'Std.Err.', 'z', 'P>|z|',
+                       '[' + str(alpha/2), str(1-alpha/2) + ']']
+        for col in sdf.columns:
+            sdf[col] = [float_fmt % x if np.isfinite(x) else ""
+                        for x in sdf[col]]
+
+        smry.add_df(sdf, align='r')
+
+        return smry
+
+    @cache_readonly
+    def llf(self):
+        return self.model.loglike(self.params_object, profile_fe=False)

     @cache_readonly
     def aic(self):
         """Akaike information criterion"""
-        pass
+        if self.reml:
+            return np.nan
+        if self.freepat is not None:
+            df = self.freepat.get_packed(use_sqrt=False, has_fe=True).sum() + 1
+        else:
+            df = self.params.size + 1
+        return -2 * (self.llf - df)

     @cache_readonly
     def bic(self):
         """Bayesian information criterion"""
-        pass
+        if self.reml:
+            return np.nan
+        if self.freepat is not None:
+            df = self.freepat.get_packed(use_sqrt=False, has_fe=True).sum() + 1
+        else:
+            df = self.params.size + 1
+        return -2 * self.llf + np.log(self.nobs) * df

-    def profile_re(self, re_ix, vtype, num_low=5, dist_low=1.0, num_high=5,
-        dist_high=1.0, **fit_kwargs):
+    def profile_re(self, re_ix, vtype, num_low=5, dist_low=1., num_high=5,
+                   dist_high=1., **fit_kwargs):
         """
         Profile-likelihood inference for variance parameters.

@@ -1342,17 +2807,148 @@ class MixedLMResults(base.LikelihoodModelResults, base.ResultMixin):
         -----
         Only variance parameters can be profiled.
         """
-        pass
+
+        pmodel = self.model
+        k_fe = pmodel.k_fe
+        k_re = pmodel.k_re
+        k_vc = pmodel.k_vc
+        endog, exog = pmodel.endog, pmodel.exog
+
+        # Need to permute the columns of the random effects design
+        # matrix so that the profiled variable is in the first column.
+        if vtype == 're':
+            ix = np.arange(k_re)
+            ix[0] = re_ix
+            ix[re_ix] = 0
+            exog_re = pmodel.exog_re.copy()[:, ix]
+
+            # Permute the covariance structure to match the permuted
+            # design matrix.
+            params = self.params_object.copy()
+            cov_re_unscaled = params.cov_re
+            cov_re_unscaled = cov_re_unscaled[np.ix_(ix, ix)]
+            params.cov_re = cov_re_unscaled
+            ru0 = cov_re_unscaled[0, 0]
+
+            # Convert dist_low and dist_high to the profile
+            # parameterization
+            cov_re = self.scale * cov_re_unscaled
+            low = (cov_re[0, 0] - dist_low) / self.scale
+            high = (cov_re[0, 0] + dist_high) / self.scale
+
+        elif vtype == 'vc':
+            re_ix = self.model.exog_vc.names.index(re_ix)
+            params = self.params_object.copy()
+            vcomp = self.vcomp
+            low = (vcomp[re_ix] - dist_low) / self.scale
+            high = (vcomp[re_ix] + dist_high) / self.scale
+            ru0 = vcomp[re_ix] / self.scale
+
+        # Define the sequence of values to which the parameter of
+        # interest will be constrained.
+        if low <= 0:
+            raise ValueError("dist_low is too large and would result in a "
+                             "negative variance. Try a smaller value.")
+        left = np.linspace(low, ru0, num_low + 1)
+        right = np.linspace(ru0, high, num_high+1)[1:]
+        rvalues = np.concatenate((left, right))
+
+        # Indicators of which parameters are free and fixed.
+        free = MixedLMParams(k_fe, k_re, k_vc)
+        if self.freepat is None:
+            free.fe_params = np.ones(k_fe)
+            vcomp = np.ones(k_vc)
+            mat = np.ones((k_re, k_re))
+        else:
+            # If a freepat already has been specified, we add the
+            # constraint to it.
+            free.fe_params = self.freepat.fe_params
+            vcomp = self.freepat.vcomp
+            mat = self.freepat.cov_re
+            if vtype == 're':
+                mat = mat[np.ix_(ix, ix)]
+        if vtype == 're':
+            mat[0, 0] = 0
+        else:
+            vcomp[re_ix] = 0
+        free.cov_re = mat
+        free.vcomp = vcomp
+
+        klass = self.model.__class__
+        init_kwargs = pmodel._get_init_kwds()
+        if vtype == 're':
+            init_kwargs['exog_re'] = exog_re
+
+        likev = []
+        for x in rvalues:
+
+            model = klass(endog, exog, **init_kwargs)
+
+            if vtype == 're':
+                cov_re = params.cov_re.copy()
+                cov_re[0, 0] = x
+                params.cov_re = cov_re
+            else:
+                params.vcomp[re_ix] = x
+
+            # TODO should use fit_kwargs
+            rslt = model.fit(start_params=params, free=free,
+                             reml=self.reml, cov_pen=self.cov_pen,
+                             **fit_kwargs)._results
+            likev.append([x * rslt.scale, rslt.llf])
+
+        likev = np.asarray(likev)
+
+        return likev
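An illustrative call (synthetic data, with small offsets so the constrained variance stays positive):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
groups = np.repeat(np.arange(25), 8)
x = rng.normal(size=200)
y = 1 + x + rng.normal(size=25)[groups] + rng.normal(size=200)
df = pd.DataFrame({"y": y, "x": x, "g": groups})
result = smf.mixedlm("y ~ x", df, groups=df["g"]).fit()

# Column 0 holds the constrained random-intercept variance, column 1 the
# corresponding profile log-likelihood.
likev = result.profile_re(0, "re", num_low=3, dist_low=0.2,
                          num_high=3, dist_high=0.2)
print(likev)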


 class MixedLMResultsWrapper(base.LikelihoodResultsWrapper):
     _attrs = {'bse_re': ('generic_columns', 'exog_re_names_full'),
-        'fe_params': ('generic_columns', 'xnames'), 'bse_fe': (
-        'generic_columns', 'xnames'), 'cov_re': ('generic_columns_2d',
-        'exog_re_names'), 'cov_re_unscaled': ('generic_columns_2d',
-        'exog_re_names')}
+              'fe_params': ('generic_columns', 'xnames'),
+              'bse_fe': ('generic_columns', 'xnames'),
+              'cov_re': ('generic_columns_2d', 'exog_re_names'),
+              'cov_re_unscaled': ('generic_columns_2d', 'exog_re_names'),
+              }
     _upstream_attrs = base.LikelihoodResultsWrapper._wrap_attrs
     _wrap_attrs = base.wrap.union_dicts(_attrs, _upstream_attrs)
+
     _methods = {}
     _upstream_methods = base.LikelihoodResultsWrapper._wrap_methods
     _wrap_methods = base.wrap.union_dicts(_methods, _upstream_methods)
+
+
+def _handle_missing(data, groups, formula, re_formula, vc_formula):
+
+    tokens = set()
+
+    forms = [formula]
+    if re_formula is not None:
+        forms.append(re_formula)
+    if vc_formula is not None:
+        forms.extend(vc_formula.values())
+
+    from statsmodels.compat.python import asunicode
+
+    from io import StringIO
+    import tokenize
+    skiptoks = {"(", ")", "*", ":", "+", "-", "**", "/"}
+
+    for fml in forms:
+        # Unicode conversion is for Py2 compatibility
+        rl = StringIO(fml)
+
+        def rlu():
+            line = rl.readline()
+            return asunicode(line, 'ascii')
+        g = tokenize.generate_tokens(rlu)
+        for tok in g:
+            if tok.string not in skiptoks:
+                tokens.add(tok.string)
+    tokens = sorted(tokens & set(data.columns))
+
+    data = data[tokens]
+    ii = pd.notnull(data).all(1)
+    if type(groups) is not str:
+        ii &= pd.notnull(groups)
+
+    return data.loc[ii, :], groups[np.asarray(ii)]
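A small sketch of the behaviour _handle_missing supports, i.e. missing="drop" in the formula interface (synthetic data; names are illustrative):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
groups = np.repeat(np.arange(10), 8)
df = pd.DataFrame({"y": rng.normal(size=80) + rng.normal(size=10)[groups],
                   "x": rng.normal(size=80)})
df.loc[3, "x"] = np.nan                       # one incomplete row
md = smf.mixedlm("y ~ x", df, groups=groups, missing="drop")
print(md.n_totobs)                            # 79 rows remain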
diff --git a/statsmodels/regression/process_regression.py b/statsmodels/regression/process_regression.py
index 0bdf0adee..81b0bd3d0 100644
--- a/statsmodels/regression/process_regression.py
+++ b/statsmodels/regression/process_regression.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 This module implements maximum likelihood-based estimation (MLE) of
 Gaussian regression models for finite-dimensional observations made on
@@ -12,6 +13,7 @@ measures occur at arbitrary real-valued time points.
 The mean structure is specified as a linear model.  The covariance
 parameters depend on covariates via a link function.
 """
+
 import numpy as np
 import pandas as pd
 import patsy
@@ -25,13 +27,13 @@ import warnings


 class ProcessCovariance:
-    """
+    r"""
     A covariance model for a process indexed by a real parameter.

     An implementation of this class is based on a positive definite
     correlation function h that maps real numbers to the interval [0,
     1], such as the Gaussian (squared exponential) correlation
-    function :math:`\\exp(-x^2)`.  It also depends on a positive
+    function :math:`\exp(-x^2)`.  It also depends on a positive
     scaling function `s` and a positive smoothness function `u`.
     """

@@ -50,7 +52,7 @@ class ProcessCovariance:
             The smoothness parameters for the observation.  See class
             docstring for details.
         """
-        pass
+        raise NotImplementedError

     def jac(self, time, sc, sm):
         """
@@ -67,11 +69,11 @@ class ProcessCovariance:
             jsm[i] is the derivative of the covariance matrix
             with respect to the i^th smoothness parameter.
         """
-        pass
+        raise NotImplementedError


 class GaussianCovariance(ProcessCovariance):
-    """
+    r"""
     An implementation of ProcessCovariance using the Gaussian kernel.

     This class represents a parametric covariance model for a Gaussian
@@ -82,8 +84,8 @@ class GaussianCovariance(ProcessCovariance):

     .. math::

-      s[i] \\cdot s[j] \\cdot h(|time[i] - time[j]| / \\sqrt{(u[i] + u[j]) /
-      2}) \\cdot \\frac{u[i]^{1/4}u[j]^{1/4}}{\\sqrt{(u[i] + u[j])/2}}
+      s[i] \cdot s[j] \cdot h(|time[i] - time[j]| / \sqrt{(u[i] + u[j]) /
+      2}) \cdot \frac{u[i]^{1/4}u[j]^{1/4}}{\sqrt{(u[i] + u[j])/2}}

     The ProcessMLE class allows linear models with this covariance
     structure to be fit using maximum likelihood (ML). The mean and
@@ -106,6 +108,84 @@ class GaussianCovariance(ProcessCovariance):
         https://papers.nips.cc/paper/2350-nonstationary-covariance-functions-for-gaussian-process-regression.pdf
     """

+    def get_cov(self, time, sc, sm):
+
+        da = np.subtract.outer(time, time)
+        ds = np.add.outer(sm, sm) / 2
+
+        qmat = da * da / ds
+        cm = np.exp(-qmat / 2) / np.sqrt(ds)
+        cm *= np.outer(sm, sm)**0.25
+        cm *= np.outer(sc, sc)
+
+        return cm
+
+    def jac(self, time, sc, sm):
+
+        da = np.subtract.outer(time, time)
+        ds = np.add.outer(sm, sm) / 2
+        sds = np.sqrt(ds)
+        daa = da * da
+        qmat = daa / ds
+        p = len(time)
+        eqm = np.exp(-qmat / 2)
+        sm4 = np.outer(sm, sm)**0.25
+        cmx = eqm * sm4 / sds
+        dq0 = -daa / ds**2
+        di = np.zeros((p, p))
+        fi = np.zeros((p, p))
+        scc = np.outer(sc, sc)
+
+        # Derivatives with respect to the smoothing parameters.
+        jsm = []
+        for i, _ in enumerate(sm):
+            di *= 0
+            di[i, :] += 0.5
+            di[:, i] += 0.5
+            dbottom = 0.5 * di / sds
+            dtop = -0.5 * eqm * dq0 * di
+            b = dtop / sds - eqm * dbottom / ds
+            c = eqm / sds
+            v = 0.25 * sm**0.25 / sm[i]**0.75
+            fi *= 0
+            fi[i, :] = v
+            fi[:, i] = v
+            fi[i, i] = 0.5 / sm[i]**0.5
+            b = c * fi + b * sm4
+            b *= scc
+            jsm.append(b)
+
+        # Derivatives with respect to the scaling parameters.
+        jsc = []
+        for i in range(0, len(sc)):
+            b = np.zeros((p, p))
+            b[i, :] = cmx[i, :] * sc
+            b[:, i] += cmx[:, i] * sc
+            jsc.append(b)
+
+        return jsc, jsm
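A quick numerical check of the analytic Jacobian against finite differences of get_cov (the input values below are illustrative):

import numpy as np
from statsmodels.regression.process_regression import GaussianCovariance

gc = GaussianCovariance()
time = np.array([0.0, 0.5, 1.5])
sc = np.array([1.0, 1.2, 0.8])
sm = np.array([1.0, 0.9, 1.1])
jsc, jsm = gc.jac(time, sc, sm)

eps = 1e-7
sc1 = sc.copy()
sc1[0] += eps
fd = (gc.get_cov(time, sc1, sm) - gc.get_cov(time, sc, sm)) / eps
print(np.max(np.abs(fd - jsc[0])))   # difference should be tiny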
+
+
+def _check_args(endog, exog, exog_scale, exog_smooth, exog_noise, time,
+                groups):
+
+    v = [
+        len(endog),
+        exog.shape[0],
+        exog_scale.shape[0],
+        exog_smooth.shape[0],
+        len(time),
+        len(groups)
+    ]
+
+    if exog_noise is not None:
+        v.append(exog_noise.shape[0])
+
+    if min(v) != max(v):
+        msg = ("The leading dimensions of all array arguments " +
+               "must be equal.")
+        raise ValueError(msg)
+

 class ProcessMLE(base.LikelihoodModel):
     """
@@ -159,52 +239,215 @@ class ProcessMLE(base.LikelihoodModel):
         Defaults to GaussianCovariance.
     """

-    def __init__(self, endog, exog, exog_scale, exog_smooth, exog_noise,
-        time, groups, cov=None, **kwargs):
-        super(ProcessMLE, self).__init__(endog, exog, exog_scale=exog_scale,
-            exog_smooth=exog_smooth, exog_noise=exog_noise, time=time,
-            groups=groups, **kwargs)
+    def __init__(self,
+                 endog,
+                 exog,
+                 exog_scale,
+                 exog_smooth,
+                 exog_noise,
+                 time,
+                 groups,
+                 cov=None,
+                 **kwargs):
+
+        super(ProcessMLE, self).__init__(
+            endog,
+            exog,
+            exog_scale=exog_scale,
+            exog_smooth=exog_smooth,
+            exog_noise=exog_noise,
+            time=time,
+            groups=groups,
+            **kwargs)
+
         self._has_noise = exog_noise is not None
+
+        # Create parameter names
         xnames = []
-        if hasattr(exog, 'columns'):
+        if hasattr(exog, "columns"):
             xnames = list(exog.columns)
         else:
-            xnames = [('Mean%d' % j) for j in range(exog.shape[1])]
-        if hasattr(exog_scale, 'columns'):
+            xnames = ["Mean%d" % j for j in range(exog.shape[1])]
+
+        if hasattr(exog_scale, "columns"):
             xnames += list(exog_scale.columns)
         else:
-            xnames += [('Scale%d' % j) for j in range(exog_scale.shape[1])]
-        if hasattr(exog_smooth, 'columns'):
+            xnames += ["Scale%d" % j for j in range(exog_scale.shape[1])]
+
+        if hasattr(exog_smooth, "columns"):
             xnames += list(exog_smooth.columns)
         else:
-            xnames += [('Smooth%d' % j) for j in range(exog_smooth.shape[1])]
+            xnames += ["Smooth%d" % j for j in range(exog_smooth.shape[1])]
+
         if self._has_noise:
-            if hasattr(exog_noise, 'columns'):
+            if hasattr(exog_noise, "columns"):
+                # If pandas-like, get the actual column names
                 xnames += list(exog_noise.columns)
             else:
-                xnames += [('Noise%d' % j) for j in range(exog_noise.shape[1])]
+                # If numpy-like, create default names
+                xnames += ["Noise%d" % j for j in range(exog_noise.shape[1])]
+
         self.data.param_names = xnames
+
         if cov is None:
             cov = GaussianCovariance()
         self.cov = cov
-        _check_args(endog, exog, exog_scale, exog_smooth, exog_noise, time,
-            groups)
-        groups_ix = collections.defaultdict(lambda : [])
+
+        _check_args(endog, exog, exog_scale, exog_smooth, exog_noise,
+                    time, groups)
+
+        groups_ix = collections.defaultdict(lambda: [])
         for i, g in enumerate(groups):
             groups_ix[g].append(i)
         self._groups_ix = groups_ix
+
+        # Default, can be set in call to fit.
         self.verbose = False
+
         self.k_exog = self.exog.shape[1]
         self.k_scale = self.exog_scale.shape[1]
         self.k_smooth = self.exog_smooth.shape[1]
         if self._has_noise:
             self.k_noise = self.exog_noise.shape[1]

+    def _split_param_names(self):
+        xnames = self.data.param_names
+        q = 0
+        mean_names = xnames[q:q+self.k_exog]
+        q += self.k_exog
+        scale_names = xnames[q:q+self.k_scale]
+        q += self.k_scale
+        smooth_names = xnames[q:q+self.k_smooth]
+
+        if self._has_noise:
+            q += self.k_smooth
+            noise_names = xnames[q:q+self.k_noise]
+        else:
+            noise_names = []
+
+        return mean_names, scale_names, smooth_names, noise_names
+
+    @classmethod
+    def from_formula(cls,
+                     formula,
+                     data,
+                     subset=None,
+                     drop_cols=None,
+                     *args,
+                     **kwargs):
+
+        if "scale_formula" in kwargs:
+            scale_formula = kwargs["scale_formula"]
+        else:
+            raise ValueError("scale_formula is a required argument")
+
+        if "smooth_formula" in kwargs:
+            smooth_formula = kwargs["smooth_formula"]
+        else:
+            raise ValueError("smooth_formula is a required argument")
+
+        if "noise_formula" in kwargs:
+            noise_formula = kwargs["noise_formula"]
+        else:
+            noise_formula = None
+
+        if "time" in kwargs:
+            time = kwargs["time"]
+        else:
+            raise ValueError("time is a required argument")
+
+        if "groups" in kwargs:
+            groups = kwargs["groups"]
+        else:
+            raise ValueError("groups is a required argument")
+
+        if subset is not None:
+            warnings.warn("'subset' is ignored")
+
+        if drop_cols is not None:
+            warnings.warn("'drop_cols' is ignored")
+
+        if isinstance(time, str):
+            time = np.asarray(data[time])
+
+        if isinstance(groups, str):
+            groups = np.asarray(data[groups])
+
+        exog_scale = patsy.dmatrix(scale_formula, data)
+        scale_design_info = exog_scale.design_info
+        scale_names = scale_design_info.column_names
+        exog_scale = np.asarray(exog_scale)
+
+        exog_smooth = patsy.dmatrix(smooth_formula, data)
+        smooth_design_info = exog_smooth.design_info
+        smooth_names = smooth_design_info.column_names
+        exog_smooth = np.asarray(exog_smooth)
+
+        if noise_formula is not None:
+            exog_noise = patsy.dmatrix(noise_formula, data)
+            noise_design_info = exog_noise.design_info
+            noise_names = noise_design_info.column_names
+            exog_noise = np.asarray(exog_noise)
+        else:
+            exog_noise, noise_design_info, noise_names = None, None, []
+
+        mod = super(ProcessMLE, cls).from_formula(
+            formula,
+            data=data,
+            subset=None,
+            exog_scale=exog_scale,
+            exog_smooth=exog_smooth,
+            exog_noise=exog_noise,
+            time=time,
+            groups=groups)
+
+        mod.data.scale_design_info = scale_design_info
+        mod.data.smooth_design_info = smooth_design_info
+
+        if mod._has_noise:
+            mod.data.noise_design_info = noise_design_info
+
+        mod.data.param_names = (mod.exog_names + scale_names +
+                                smooth_names + noise_names)
+
+        return mod
+
     def unpack(self, z):
         """
         Split the packed parameter vector into blocks.
         """
-        pass
+
+        # Mean parameters
+        pm = self.exog.shape[1]
+        mnpar = z[0:pm]
+
+        # Standard deviation parameters
+        pv = self.exog_scale.shape[1]
+        scpar = z[pm:pm + pv]
+
+        # Smoothness parameters
+        ps = self.exog_smooth.shape[1]
+        smpar = z[pm + pv:pm + pv + ps]
+
+        # Observation white noise standard deviation.
+        # Empty if has_noise = False.
+        nopar = z[pm + pv + ps:]
+
+        return mnpar, scpar, smpar, nopar
+
+    def _get_start(self):
+
+        # Use OLS to get starting values for mean structure parameters
+        model = OLS(self.endog, self.exog)
+        result = model.fit()
+
+        m = self.exog_scale.shape[1] + self.exog_smooth.shape[1]
+
+        if self._has_noise:
+            m += self.exog_noise.shape[1]
+
+        return np.concatenate((result.params, np.zeros(m)))

     def loglike(self, params):
         """
@@ -224,7 +467,41 @@ class ProcessMLE(base.LikelihoodModel):
         The mean, scaling, and smoothing parameters are packed into
         a vector.  Use `unpack` to access the component vectors.
         """
-        pass
+
+        mnpar, scpar, smpar, nopar = self.unpack(params)
+
+        # Residuals
+        resid = self.endog - np.dot(self.exog, mnpar)
+
+        # Scaling parameters
+        sc = np.exp(np.dot(self.exog_scale, scpar))
+
+        # Smoothness parameters
+        sm = np.exp(np.dot(self.exog_smooth, smpar))
+
+        # White noise standard deviation
+        if self._has_noise:
+            no = np.exp(np.dot(self.exog_noise, nopar))
+
+        # Get the log-likelihood
+        ll = 0.
+        for _, ix in self._groups_ix.items():
+
+            # Get the covariance matrix for this person.
+            cm = self.cov.get_cov(self.time[ix], sc[ix], sm[ix])
+
+            # The variance of the additive noise, if present.
+            if self._has_noise:
+                cm.flat[::cm.shape[0] + 1] += no[ix]**2
+
+            re = resid[ix]
+            ll -= 0.5 * np.linalg.slogdet(cm)[1]
+            ll -= 0.5 * np.dot(re, np.linalg.solve(cm, re))
+
+        if self.verbose:
+            print("L=", ll)
+
+        return ll

     def score(self, params):
         """
@@ -244,9 +521,89 @@ class ProcessMLE(base.LikelihoodModel):
         The mean, scaling, and smoothing parameters are packed into
         a vector.  Use `unpack` to access the component vectors.
         """
-        pass

-    def fit(self, start_params=None, method=None, maxiter=None, **kwargs):
+        mnpar, scpar, smpar, nopar = self.unpack(params)
+        pm, pv, ps = len(mnpar), len(scpar), len(smpar)
+
+        # Residuals
+        resid = self.endog - np.dot(self.exog, mnpar)
+
+        # Scaling
+        sc = np.exp(np.dot(self.exog_scale, scpar))
+
+        # Smoothness
+        sm = np.exp(np.dot(self.exog_smooth, smpar))
+
+        # White noise standard deviation
+        if self._has_noise:
+            no = np.exp(np.dot(self.exog_noise, nopar))
+
+        # Get the log-likelihood
+        score = np.zeros(len(mnpar) + len(scpar) + len(smpar) + len(nopar))
+        for _, ix in self._groups_ix.items():
+
+            sc_i = sc[ix]
+            sm_i = sm[ix]
+            resid_i = resid[ix]
+            time_i = self.time[ix]
+            exog_i = self.exog[ix, :]
+            exog_scale_i = self.exog_scale[ix, :]
+            exog_smooth_i = self.exog_smooth[ix, :]
+
+            # Get the covariance matrix for this person.
+            cm = self.cov.get_cov(time_i, sc_i, sm_i)
+
+            if self._has_noise:
+                no_i = no[ix]
+                exog_noise_i = self.exog_noise[ix, :]
+                cm.flat[::cm.shape[0] + 1] += no[ix]**2
+
+            cmi = np.linalg.inv(cm)
+
+            jacv, jacs = self.cov.jac(time_i, sc_i, sm_i)
+
+            # The derivatives for the mean parameters.
+            dcr = np.linalg.solve(cm, resid_i)
+            score[0:pm] += np.dot(exog_i.T, dcr)
+
+            # The derivatives for the scaling parameters.
+            rx = np.outer(resid_i, resid_i)
+            qm = np.linalg.solve(cm, rx)
+            qm = 0.5 * np.linalg.solve(cm, qm.T)
+            scx = sc_i[:, None] * exog_scale_i
+            for i, _ in enumerate(ix):
+                jq = np.sum(jacv[i] * qm)
+                score[pm:pm + pv] += jq * scx[i, :]
+                score[pm:pm + pv] -= 0.5 * np.sum(jacv[i] * cmi) * scx[i, :]
+
+            # The derivatives for the smoothness parameters.
+            smx = sm_i[:, None] * exog_smooth_i
+            for i, _ in enumerate(ix):
+                jq = np.sum(jacs[i] * qm)
+                score[pm + pv:pm + pv + ps] += jq * smx[i, :]
+                score[pm + pv:pm + pv + ps] -= (
+                         0.5 * np.sum(jacs[i] * cmi) * smx[i, :])
+
+            # The derivatives with respect to the standard deviation parameters
+            if self._has_noise:
+                sno = no_i[:, None]**2 * exog_noise_i
+                score[pm + pv + ps:] -= np.dot(cmi.flat[::cm.shape[0] + 1],
+                                               sno)
+                bm = np.dot(cmi, np.dot(rx, cmi))
+                score[pm + pv + ps:] += np.dot(bm.flat[::bm.shape[0] + 1], sno)
+
+        if self.verbose:
+            print("|G|=", np.sqrt(np.sum(score * score)))
+
+        return score
+
+    def hessian(self, params):
+
+        hess = approx_fprime(params, self.score)
+        return hess
+
+    def fit(self, start_params=None, method=None, maxiter=None,
+            **kwargs):
         """
         Fit a grouped Gaussian process regression using MLE.

@@ -263,10 +620,75 @@ class ProcessMLE(base.LikelihoodModel):
         -------
         An instance of ProcessMLEResults.
         """
-        pass
+
+        if "verbose" in kwargs:
+            self.verbose = kwargs["verbose"]
+
+        minim_opts = {}
+        if "minim_opts" in kwargs:
+            minim_opts = kwargs["minim_opts"]
+
+        if start_params is None:
+            start_params = self._get_start()
+
+        if isinstance(method, str):
+            method = [method]
+        elif method is None:
+            method = ["powell", "bfgs"]
+
+        for j, meth in enumerate(method):
+
+            if meth not in ("powell",):
+                def jac(x):
+                    return -self.score(x)
+            else:
+                jac = None
+
+            if maxiter is not None:
+                if np.isscalar(maxiter):
+                    minim_opts["maxiter"] = maxiter
+                else:
+                    minim_opts["maxiter"] = maxiter[j % len(maxiter)]
+
+            f = minimize(
+                lambda x: -self.loglike(x),
+                method=meth,
+                x0=start_params,
+                jac=jac,
+                options=minim_opts)
+
+            if not f.success:
+                msg = "Fitting did not converge"
+                if jac is not None:
+                    msg += ", |gradient|=%.6f" % np.sqrt(np.sum(f.jac**2))
+                if j < len(method) - 1:
+                    msg += ", trying %s next..." % method[j+1]
+                warnings.warn(msg)
+
+            if np.isfinite(f.x).all():
+                start_params = f.x
+
+        hess = self.hessian(f.x)
+        try:
+            cov_params = -np.linalg.inv(hess)
+        except Exception:
+            cov_params = None
+
+        class rslt:
+            pass
+
+        r = rslt()
+        r.params = f.x
+        r.normalized_cov_params = cov_params
+        r.optim_retvals = f
+        r.scale = 1
+
+        rslt = ProcessMLEResults(self, r)
+
+        return rslt

     def covariance(self, time, scale_params, smooth_params, scale_data,
-        smooth_data):
+                   smooth_data):
         """
         Returns a Gaussian process covariance matrix.

@@ -303,7 +725,17 @@ class ProcessMLE(base.LikelihoodModel):
         The covariance is only for the Gaussian process and does not include
         the white noise variance.
         """
-        pass
+
+        if not hasattr(self.data, "scale_design_info"):
+            sca = np.dot(scale_data, scale_params)
+            smo = np.dot(smooth_data, smooth_params)
+        else:
+            sc = patsy.dmatrix(self.data.scale_design_info, scale_data)
+            sm = patsy.dmatrix(self.data.smooth_design_info, smooth_data)
+            sca = np.exp(np.dot(sc, scale_params))
+            smo = np.exp(np.dot(sm, smooth_params))
+
+        return self.cov.get_cov(time, sca, smo)

     def predict(self, params, exog=None, *args, **kwargs):
         """
@@ -318,7 +750,17 @@ class ProcessMLE(base.LikelihoodModel):
             The design matrix for the mean structure.  If not provided,
             the model's design matrix is used.
         """
-        pass
+
+        if exog is None:
+            exog = self.exog
+        elif hasattr(self.data, "design_info"):
+            # Run the provided data through the formula if present
+            exog = patsy.dmatrix(self.data.design_info, exog)
+
+        if len(params) > exog.shape[1]:
+            params = params[0:exog.shape[1]]
+
+        return np.dot(exog, params)


 class ProcessMLEResults(base.GenericLikelihoodModelResults):
@@ -327,20 +769,37 @@ class ProcessMLEResults(base.GenericLikelihoodModelResults):
     """

     def __init__(self, model, mlefit):
-        super(ProcessMLEResults, self).__init__(model, mlefit)
+
+        super(ProcessMLEResults, self).__init__(
+            model, mlefit)
+
         pa = model.unpack(mlefit.params)
+
         self.mean_params = pa[0]
         self.scale_params = pa[1]
         self.smooth_params = pa[2]
         self.no_params = pa[3]
+
         self.df_resid = model.endog.shape[0] - len(mlefit.params)
+
         self.k_exog = self.model.exog.shape[1]
         self.k_scale = self.model.exog_scale.shape[1]
         self.k_smooth = self.model.exog_smooth.shape[1]
+
         self._has_noise = model._has_noise
         if model._has_noise:
             self.k_noise = self.model.exog_noise.shape[1]

+    def predict(self, exog=None, transform=True, *args, **kwargs):
+
+        if not transform:
+            warnings.warn("'transform=False' is ignored in predict")
+
+        if len(args) > 0 or len(kwargs) > 0:
+            warnings.warn("extra arguments ignored in 'predict'")
+
+        return self.model.predict(self.params, exog)
+
     def covariance(self, time, scale, smooth):
         """
         Returns a fitted covariance matrix.
@@ -369,4 +828,65 @@ class ProcessMLEResults(base.GenericLikelihoodModelResults):
         Otherwise, `scale` and `smooth` should be data arrays whose
         columns align with the fitted scaling and smoothing parameters.
         """
-        pass
+
+        return self.model.covariance(time, self.scale_params,
+                                     self.smooth_params, scale, smooth)
+
+    def covariance_group(self, group):
+
+        # Check if the group exists, since _groups_ix is a
+        # DefaultDict use len instead of catching a KeyError.
+        ix = self.model._groups_ix[group]
+        if len(ix) == 0:
+            msg = "Group '%s' does not exist" % str(group)
+            raise ValueError(msg)
+
+        scale_data = self.model.exog_scale[ix, :]
+        smooth_data = self.model.exog_smooth[ix, :]
+
+        _, scale_names, smooth_names, _ = self.model._split_param_names()
+
+        scale_data = pd.DataFrame(scale_data, columns=scale_names)
+        smooth_data = pd.DataFrame(smooth_data, columns=smooth_names)
+        time = self.model.time[ix]
+
+        return self.model.covariance(time,
+                                     self.scale_params,
+                                     self.smooth_params,
+                                     scale_data,
+                                     smooth_data)
+
+    def summary(self, yname=None, xname=None, title=None, alpha=0.05):
+
+        df = pd.DataFrame()
+
+        typ = (["Mean"] * self.k_exog + ["Scale"] * self.k_scale +
+               ["Smooth"] * self.k_smooth)
+        if self._has_noise:
+            typ += ["SD"] * self.k_noise
+        df["Type"] = typ
+
+        df["coef"] = self.params
+
+        try:
+            df["std err"] = np.sqrt(np.diag(self.cov_params()))
+        except Exception:
+            df["std err"] = np.nan
+
+        from scipy.stats.distributions import norm
+        df["tvalues"] = df.coef / df["std err"]
+        df["P>|t|"] = 2 * norm.sf(np.abs(df.tvalues))
+
+        f = norm.ppf(1 - alpha / 2)
+        df["[%.3f" % (alpha / 2)] = df.coef - f * df["std err"]
+        df["%.3f]" % (1 - alpha / 2)] = df.coef + f * df["std err"]
+
+        df.index = self.model.data.param_names
+
+        summ = summary2.Summary()
+        if title is None:
+            title = "Gaussian process regression results"
+        summ.add_title(title)
+        summ.add_df(df)
+
+        return summ
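
Usage note: the hunks above restore the ProcessMLE fitting machinery
(from_formula, loglike, score, fit, covariance, summary). A minimal,
illustrative sketch of the restored formula interface follows; the synthetic
data frame, its column names, and the intercept-only scale/smooth formulas are
assumptions made for demonstration, not part of the patch.

    import numpy as np
    import pandas as pd
    from statsmodels.regression.process_regression import ProcessMLE

    # Hypothetical longitudinal data: 50 groups, 5 time points per group.
    rng = np.random.default_rng(0)
    n_groups, n_per = 50, 5
    df = pd.DataFrame({
        "groups": np.repeat(np.arange(n_groups), n_per),
        "time": np.tile(np.arange(n_per, dtype=float), n_groups),
        "x": rng.normal(size=n_groups * n_per),
    })
    df["y"] = 1.0 + 0.5 * df["x"] + rng.normal(size=len(df))

    # scale_formula, smooth_formula, time and groups are required keyword
    # arguments of from_formula; noise_formula is optional.
    model = ProcessMLE.from_formula(
        "y ~ x",
        data=df,
        scale_formula="1",
        smooth_formula="1",
        time="time",
        groups="groups",
    )
    result = model.fit()
    print(result.summary())
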
diff --git a/statsmodels/regression/quantile_regression.py b/statsmodels/regression/quantile_regression.py
index 9d8f78266..e36e7b364 100644
--- a/statsmodels/regression/quantile_regression.py
+++ b/statsmodels/regression/quantile_regression.py
@@ -1,4 +1,6 @@
-"""
+#!/usr/bin/env python
+
+'''
 Quantile regression model

 Model parameters are estimated using iterated reweighted least squares. The
@@ -13,19 +15,22 @@ University of Tehran, 2008 (shmohammadi@gmail.com), with some lines based on
 code written by James P. Lesage in Applied Econometrics Using MATLAB(1999).PP.
 73-4.  Translated to python with permission from original author by Christian
 Prinoth (christian at prinoth dot name).
-"""
+'''
+
 import numpy as np
 import warnings
 import scipy.stats as stats
 from numpy.linalg import pinv
 from scipy.stats import norm
 from statsmodels.tools.decorators import cache_readonly
-from statsmodels.regression.linear_model import RegressionModel, RegressionResults, RegressionResultsWrapper
-from statsmodels.tools.sm_exceptions import ConvergenceWarning, IterationLimitWarning
-
+from statsmodels.regression.linear_model import (RegressionModel,
+                                                 RegressionResults,
+                                                 RegressionResultsWrapper)
+from statsmodels.tools.sm_exceptions import (ConvergenceWarning,
+                                             IterationLimitWarning)

 class QuantReg(RegressionModel):
-    """Quantile Regression
+    '''Quantile Regression

     Estimate a quantile regression model using iterative reweighted least
     squares.
@@ -67,7 +72,7 @@ class QuantReg(RegressionModel):

     Keywords: Least Absolute Deviation(LAD) Regression, Quantile Regression,
     Regression, Robust Estimation.
-    """
+    '''

     def __init__(self, endog, exog, **kwargs):
         self._check_kwargs(kwargs)
@@ -77,10 +82,10 @@ class QuantReg(RegressionModel):
         """
         QuantReg model whitener does nothing: returns data.
         """
-        pass
+        return data

-    def fit(self, q=0.5, vcov='robust', kernel='epa', bandwidth='hsheather',
-        max_iter=1000, p_tol=1e-06, **kwargs):
+    def fit(self, q=.5, vcov='robust', kernel='epa', bandwidth='hsheather',
+            max_iter=1000, p_tol=1e-6, **kwargs):
         """
         Solve by Iterative Weighted Least Squares

@@ -111,24 +116,228 @@ class QuantReg(RegressionModel):
             - bofinger: Bofinger (1975)
             - chamberlain: Chamberlain (1994)
         """
-        pass
+
+        if q <= 0 or q >= 1:
+            raise Exception('q must be strictly between 0 and 1')
+
+        kern_names = ['biw', 'cos', 'epa', 'gau', 'par']
+        if kernel not in kern_names:
+            raise Exception("kernel must be one of " + ', '.join(kern_names))
+        else:
+            kernel = kernels[kernel]
+
+        if bandwidth == 'hsheather':
+            bandwidth = hall_sheather
+        elif bandwidth == 'bofinger':
+            bandwidth = bofinger
+        elif bandwidth == 'chamberlain':
+            bandwidth = chamberlain
+        else:
+            raise Exception("bandwidth must be in 'hsheather', 'bofinger', 'chamberlain'")
+
+        endog = self.endog
+        exog = self.exog
+        nobs = self.nobs
+        exog_rank = np.linalg.matrix_rank(self.exog)
+        self.rank = exog_rank
+        self.df_model = float(self.rank - self.k_constant)
+        self.df_resid = self.nobs - self.rank
+        n_iter = 0
+        xstar = exog
+
+        beta = np.ones(exog.shape[1])
+        # TODO: better start, initial beta is used only for convergence check
+
+        # Note the following does not work yet,
+        # the iteration loop always starts with OLS as initial beta
+        # if start_params is not None:
+        #    if len(start_params) != rank:
+        #       raise ValueError('start_params has wrong length')
+        #       beta = start_params
+        #    else:
+        #       # start with OLS
+        #       beta = np.dot(np.linalg.pinv(exog), endog)
+
+        diff = 10
+        cycle = False
+
+        history = dict(params = [], mse=[])
+        while n_iter < max_iter and diff > p_tol and not cycle:
+            n_iter += 1
+            beta0 = beta
+            xtx = np.dot(xstar.T, exog)
+            xty = np.dot(xstar.T, endog)
+            beta = np.dot(pinv(xtx), xty)
+            resid = endog - np.dot(exog, beta)
+
+            mask = np.abs(resid) < .000001
+            resid[mask] = ((resid[mask] >= 0) * 2 - 1) * .000001
+            resid = np.where(resid < 0, q * resid, (1-q) * resid)
+            resid = np.abs(resid)
+            xstar = exog / resid[:, np.newaxis]
+            diff = np.max(np.abs(beta - beta0))
+            history['params'].append(beta)
+            history['mse'].append(np.mean(resid*resid))
+
+            if (n_iter >= 300) and (n_iter % 100 == 0):
+                # check for convergence circle, should not happen
+                for ii in range(2, 10):
+                    if np.all(beta == history['params'][-ii]):
+                        cycle = True
+                        warnings.warn("Convergence cycle detected", ConvergenceWarning)
+                        break
+
+        if n_iter == max_iter:
+            warnings.warn("Maximum number of iterations (" + str(max_iter) +
+                          ") reached.", IterationLimitWarning)
+
+        e = endog - np.dot(exog, beta)
+        # Greene (2008, p.407) writes that Stata 6 uses this bandwidth:
+        # h = 0.9 * np.std(e) / (nobs**0.2)
+        # Instead, we calculate bandwidth as in Stata 12
+        iqre = stats.scoreatpercentile(e, 75) - stats.scoreatpercentile(e, 25)
+        h = bandwidth(nobs, q)
+        h = min(np.std(endog),
+                iqre / 1.34) * (norm.ppf(q + h) - norm.ppf(q - h))
+
+        fhat0 = 1. / (nobs * h) * np.sum(kernel(e / h))
+
+        if vcov == 'robust':
+            d = np.where(e > 0, (q/fhat0)**2, ((1-q)/fhat0)**2)
+            xtxi = pinv(np.dot(exog.T, exog))
+            xtdx = np.dot(exog.T * d[np.newaxis, :], exog)
+            vcov = xtxi @ xtdx @ xtxi
+        elif vcov == 'iid':
+            vcov = (1. / fhat0)**2 * q * (1 - q) * pinv(np.dot(exog.T, exog))
+        else:
+            raise Exception("vcov must be 'robust' or 'iid'")
+
+        lfit = QuantRegResults(self, beta, normalized_cov_params=vcov)
+
+        lfit.q = q
+        lfit.iterations = n_iter
+        lfit.sparsity = 1. / fhat0
+        lfit.bandwidth = h
+        lfit.history = history
+
+        return RegressionResultsWrapper(lfit)
+
+
+def _parzen(u):
+    z = np.where(np.abs(u) <= .5, 4./3 - 8. * u**2 + 8. * np.abs(u)**3,
+                 8. * (1 - np.abs(u))**3 / 3.)
+    z[np.abs(u) > 1] = 0
+    return z


 kernels = {}
-kernels['biw'] = lambda u: 15.0 / 16 * (1 - u ** 2) ** 2 * np.where(np.abs(
-    u) <= 1, 1, 0)
-kernels['cos'] = lambda u: np.where(np.abs(u) <= 0.5, 1 + np.cos(2 * np.pi *
-    u), 0)
-kernels['epa'] = lambda u: 3.0 / 4 * (1 - u ** 2) * np.where(np.abs(u) <= 1,
-    1, 0)
+kernels['biw'] = lambda u: 15. / 16 * (1 - u**2)**2 * np.where(np.abs(u) <= 1, 1, 0)
+kernels['cos'] = lambda u: np.where(np.abs(u) <= .5, 1 + np.cos(2 * np.pi * u), 0)
+kernels['epa'] = lambda u: 3. / 4 * (1-u**2) * np.where(np.abs(u) <= 1, 1, 0)
 kernels['gau'] = norm.pdf
 kernels['par'] = _parzen
+#kernels['bet'] = lambda u: np.where(np.abs(u) <= 1, .75 * (1 - u) * (1 + u), 0)
+#kernels['log'] = lambda u: logistic.pdf(u) * (1 - logistic.pdf(u))
+#kernels['tri'] = lambda u: np.where(np.abs(u) <= 1, 1 - np.abs(u), 0)
+#kernels['trw'] = lambda u: 35. / 32 * (1 - u**2)**3 * np.where(np.abs(u) <= 1, 1, 0)
+#kernels['uni'] = lambda u: 1. / 2 * np.where(np.abs(u) <= 1, 1, 0)


-class QuantRegResults(RegressionResults):
-    """Results instance for the QuantReg model"""
+def hall_sheather(n, q, alpha=.05):
+    z = norm.ppf(q)
+    num = 1.5 * norm.pdf(z)**2.
+    den = 2. * z**2. + 1.
+    h = n**(-1. / 3) * norm.ppf(1. - alpha / 2.)**(2./3) * (num / den)**(1./3)
+    return h
+

-    def summary(self, yname=None, xname=None, title=None, alpha=0.05):
+def bofinger(n, q):
+    num = 9. / 2 * norm.pdf(2 * norm.ppf(q))**4
+    den = (2 * norm.ppf(q)**2 + 1)**2
+    h = n**(-1. / 5) * (num / den)**(1. / 5)
+    return h
+
+
+def chamberlain(n, q, alpha=.05):
+    return norm.ppf(1 - alpha / 2) * np.sqrt(q*(1 - q) / n)
+
+
+class QuantRegResults(RegressionResults):
+    '''Results instance for the QuantReg model'''
+
+    @cache_readonly
+    def prsquared(self):
+        q = self.q
+        endog = self.model.endog
+        e = self.resid
+        e = np.where(e < 0, (1 - q) * e, q * e)
+        e = np.abs(e)
+        ered = endog - stats.scoreatpercentile(endog, q * 100)
+        ered = np.where(ered < 0, (1 - q) * ered, q * ered)
+        ered = np.abs(ered)
+        return 1 - np.sum(e) / np.sum(ered)
+
+    #@cache_readonly
+    def scale(self):
+        return 1.
+
+    @cache_readonly
+    def bic(self):
+        return np.nan
+
+    @cache_readonly
+    def aic(self):
+        return np.nan
+
+    @cache_readonly
+    def llf(self):
+        return np.nan
+
+    @cache_readonly
+    def rsquared(self):
+        return np.nan
+
+    @cache_readonly
+    def rsquared_adj(self):
+        return np.nan
+
+    @cache_readonly
+    def mse(self):
+        return np.nan
+
+    @cache_readonly
+    def mse_model(self):
+        return np.nan
+
+    @cache_readonly
+    def mse_total(self):
+        return np.nan
+
+    @cache_readonly
+    def centered_tss(self):
+        return np.nan
+
+    @cache_readonly
+    def uncentered_tss(self):
+        return np.nan
+
+    @cache_readonly
+    def HC0_se(self):
+        raise NotImplementedError
+
+    @cache_readonly
+    def HC1_se(self):
+        raise NotImplementedError
+
+    @cache_readonly
+    def HC2_se(self):
+        raise NotImplementedError
+
+    @cache_readonly
+    def HC3_se(self):
+        raise NotImplementedError
+
+    def summary(self, yname=None, xname=None, title=None, alpha=.05):
         """Summarize the Regression Results

         Parameters
@@ -155,4 +364,53 @@ class QuantRegResults(RegressionResults):
         --------
         statsmodels.iolib.summary.Summary : class to hold summary results
         """
-        pass
+        eigvals = self.eigenvals
+        condno = self.condition_number
+
+        top_left = [('Dep. Variable:', None),
+                    ('Model:', None),
+                    ('Method:', ['Least Squares']),
+                    ('Date:', None),
+                    ('Time:', None)
+                    ]
+
+        top_right = [('Pseudo R-squared:', ["%#8.4g" % self.prsquared]),
+                     ('Bandwidth:', ["%#8.4g" % self.bandwidth]),
+                     ('Sparsity:', ["%#8.4g" % self.sparsity]),
+                     ('No. Observations:', None),
+                     ('Df Residuals:', None),
+                     ('Df Model:', None)
+                     ]
+
+        if title is None:
+            title = self.model.__class__.__name__ + ' ' + "Regression Results"
+
+        # create summary table instance
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right,
+                             yname=yname, xname=xname, title=title)
+        smry.add_table_params(self, yname=yname, xname=xname, alpha=alpha,
+                              use_t=self.use_t)
+
+        # add warnings/notes, added to text format only
+        etext = []
+        if eigvals[-1] < 1e-10:
+            wstr = "The smallest eigenvalue is %6.3g. This might indicate "
+            wstr += "that there are\n"
+            wstr += "strong multicollinearity problems or that the design "
+            wstr += "matrix is singular."
+            wstr = wstr % eigvals[-1]
+            etext.append(wstr)
+        elif condno > 1000:  # TODO: what is recommended
+            wstr = "The condition number is large, %6.3g. This might "
+            wstr += "indicate that there are\n"
+            wstr += "strong multicollinearity or other numerical "
+            wstr += "problems."
+            wstr = wstr % condno
+            etext.append(wstr)
+
+        if etext:
+            smry.add_extra_txt(etext)
+
+        return smry
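
Usage note: the QuantReg.fit hunks above restore the IRLS loop, the bandwidth
rules (hall_sheather, bofinger, chamberlain) and the kernel-based sparsity and
covariance estimates. A minimal sketch with assumed synthetic data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=500)
    # Heteroskedastic noise, so different quantiles imply different fits.
    y = 1.0 + 0.5 * x + rng.normal(scale=0.5 + 0.2 * x)

    exog = sm.add_constant(x)
    model = sm.QuantReg(y, exog)

    # Defaults: robust sandwich covariance, 'epa' kernel, Hall-Sheather
    # bandwidth.
    res_median = model.fit(q=0.5)
    res_upper = model.fit(q=0.9)
    print(res_median.params, res_upper.params)
    print(res_median.summary())
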
diff --git a/statsmodels/regression/recursive_ls.py b/statsmodels/regression/recursive_ls.py
index 0dad99695..139ad530b 100644
--- a/statsmodels/regression/recursive_ls.py
+++ b/statsmodels/regression/recursive_ls.py
@@ -4,22 +4,31 @@ Recursive least squares model
 Author: Chad Fulton
 License: Simplified-BSD
 """
+
 import numpy as np
 import pandas as pd
+
 from statsmodels.compat.pandas import Appender
+
 from statsmodels.tools.data import _is_using_pandas
-from statsmodels.tsa.statespace.mlemodel import MLEModel, MLEResults, MLEResultsWrapper, PredictionResults, PredictionResultsWrapper
+from statsmodels.tsa.statespace.mlemodel import (
+    MLEModel, MLEResults, MLEResultsWrapper, PredictionResults,
+    PredictionResultsWrapper)
 from statsmodels.tsa.statespace.tools import concat
 from statsmodels.tools.tools import Bunch
 from statsmodels.tools.decorators import cache_readonly
 import statsmodels.base.wrapper as wrap
-_cusum_squares_scalars = np.array([[1.072983, 1.2238734, 1.3581015, 
-    1.5174271, 1.6276236], [-0.6698868, -0.6700069, -0.6701218, -0.6702672,
-    -0.6703724], [-0.5816458, -0.7351697, -0.8858694, -1.0847745, -1.2365861]])
+
+# Columns are alpha = 0.1, 0.05, 0.025, 0.01, 0.005
+_cusum_squares_scalars = np.array([
+    [1.0729830,   1.2238734,  1.3581015,  1.5174271,  1.6276236],
+    [-0.6698868, -0.6700069, -0.6701218, -0.6702672, -0.6703724],
+    [-0.5816458, -0.7351697, -0.8858694, -1.0847745, -1.2365861]
+])


 class RecursiveLS(MLEModel):
-    """
+    r"""
     Recursive least squares

     Parameters
@@ -51,20 +60,26 @@ class RecursiveLS(MLEModel):
        Time Series Analysis by State Space Methods: Second Edition.
        Oxford University Press.
     """
-
     def __init__(self, endog, exog, constraints=None, **kwargs):
+        # Standardize data
         endog_using_pandas = _is_using_pandas(endog, None)
         if not endog_using_pandas:
             endog = np.asanyarray(endog)
+
         exog_is_using_pandas = _is_using_pandas(exog, None)
         if not exog_is_using_pandas:
             exog = np.asarray(exog)
+
+        # Make sure we have 2-dimensional array
         if exog.ndim == 1:
             if not exog_is_using_pandas:
                 exog = exog[:, None]
             else:
                 exog = pd.DataFrame(exog)
+
         self.k_exog = exog.shape[1]
+
+        # Handle constraints
         self.k_constraints = 0
         self._r_matrix = self._q_matrix = None
         if constraints is not None:
@@ -75,34 +90,67 @@ class RecursiveLS(MLEModel):
             LC = DesignInfo(names).linear_constraint(constraints)
             self._r_matrix, self._q_matrix = LC.coefs, LC.constants
             self.k_constraints = self._r_matrix.shape[0]
+
             nobs = len(endog)
             constraint_endog = np.zeros((nobs, len(self._r_matrix)))
             if endog_using_pandas:
-                constraint_endog = pd.DataFrame(constraint_endog, index=
-                    endog.index)
+                constraint_endog = pd.DataFrame(constraint_endog,
+                                                index=endog.index)
                 endog = concat([endog, constraint_endog], axis=1)
+                # Complexity needed to handle multiple version of pandas
+                # Pandas >= 2 can use endog.iloc[:, 1:] = self._q_matrix.T
                 endog.iloc[:, 1:] = np.tile(self._q_matrix.T, (nobs, 1))
             else:
                 endog[:, 1:] = self._q_matrix[:, 0]
+
+        # Handle coefficient initialization
         kwargs.setdefault('initialization', 'diffuse')
+
+        # Remove some formula-specific kwargs
         formula_kwargs = ['missing', 'missing_idx', 'formula', 'design_info']
         for name in formula_kwargs:
             if name in kwargs:
                 del kwargs[name]
-        super(RecursiveLS, self).__init__(endog, k_states=self.k_exog, exog
-            =exog, **kwargs)
+
+        # Initialize the state space representation
+        super(RecursiveLS, self).__init__(
+            endog, k_states=self.k_exog, exog=exog, **kwargs)
+
+        # Use univariate filtering by default
         self.ssm.filter_univariate = True
+
+        # Concentrate the scale out of the likelihood function
         self.ssm.filter_concentrated = True
+
+        # Setup the state space representation
         self['design'] = np.zeros((self.k_endog, self.k_states, self.nobs))
         self['design', 0] = self.exog[:, :, None].T
         if self._r_matrix is not None:
             self['design', 1:, :] = self._r_matrix[:, :, None]
         self['transition'] = np.eye(self.k_states)
-        self['obs_cov', 0, 0] = 1.0
+
+        # Notice that the filter output does not depend on the measurement
+        # variance, so we set it here to 1
+        self['obs_cov', 0, 0] = 1.
         self['transition'] = np.eye(self.k_states)
+
+        # Linear constraints are technically imposed by adding "fake" endog
+        # variables that are used during filtering, but for all model- and
+        # results-based purposes we want k_endog = 1.
         if self._r_matrix is not None:
             self.k_endog = 1

+    @classmethod
+    def from_formula(cls, formula, data, subset=None, constraints=None):
+        return super(MLEModel, cls).from_formula(formula, data, subset,
+                                                 constraints=constraints)
+
+    def _validate_can_fix_params(self, param_names):
+        raise ValueError('Linear constraints on coefficients should be given'
+                         ' using the `constraints` argument in constructing'
+                         ' the model. Other parameter constraints are not'
+                         ' available in the recursive least squares model.')
+
     def fit(self):
         """
         Fits the model by application of the Kalman filter
@@ -111,7 +159,72 @@ class RecursiveLS(MLEModel):
         -------
         RecursiveLSResults
         """
-        pass
+        smoother_results = self.smooth(return_ssm=True)
+
+        with self.ssm.fixed_scale(smoother_results.scale):
+            res = self.smooth()
+
+        return res
+
+    def filter(self, return_ssm=False, **kwargs):
+        # Get the state space output
+        result = super(RecursiveLS, self).filter([], transformed=True,
+                                                 cov_type='none',
+                                                 return_ssm=True, **kwargs)
+
+        # Wrap in a results object
+        if not return_ssm:
+            params = result.filtered_state[:, -1]
+            cov_kwds = {
+                'custom_cov_type': 'nonrobust',
+                'custom_cov_params': result.filtered_state_cov[:, :, -1],
+                'custom_description': ('Parameters and covariance matrix'
+                                       ' estimates are RLS estimates'
+                                       ' conditional on the entire sample.')
+            }
+            result = RecursiveLSResultsWrapper(
+                RecursiveLSResults(self, params, result, cov_type='custom',
+                                   cov_kwds=cov_kwds)
+            )
+
+        return result
+
+    def smooth(self, return_ssm=False, **kwargs):
+        # Get the state space output
+        result = super(RecursiveLS, self).smooth([], transformed=True,
+                                                 cov_type='none',
+                                                 return_ssm=True, **kwargs)
+
+        # Wrap in a results object
+        if not return_ssm:
+            params = result.filtered_state[:, -1]
+            cov_kwds = {
+                'custom_cov_type': 'nonrobust',
+                'custom_cov_params': result.filtered_state_cov[:, :, -1],
+                'custom_description': ('Parameters and covariance matrix'
+                                       ' estimates are RLS estimates'
+                                       ' conditional on the entire sample.')
+            }
+            result = RecursiveLSResultsWrapper(
+                RecursiveLSResults(self, params, result, cov_type='custom',
+                                   cov_kwds=cov_kwds)
+            )
+
+        return result
+
+    @property
+    def endog_names(self):
+        endog_names = super(RecursiveLS, self).endog_names
+        return endog_names[0] if isinstance(endog_names, list) else endog_names
+
+    @property
+    def param_names(self):
+        return self.exog_names
+
+    @property
+    def start_params(self):
+        # Only parameter is the measurement disturbance standard deviation
+        return np.zeros(0)

     def update(self, params, **kwargs):
         """
@@ -157,20 +270,30 @@ class RecursiveLSResults(MLEResults):
     statsmodels.tsa.statespace.mlemodel.MLEResults
     """

-    def __init__(self, model, params, filter_results, cov_type='opg', **kwargs
-        ):
-        super(RecursiveLSResults, self).__init__(model, params,
-            filter_results, cov_type, **kwargs)
+    def __init__(self, model, params, filter_results, cov_type='opg',
+                 **kwargs):
+        super(RecursiveLSResults, self).__init__(
+            model, params, filter_results, cov_type, **kwargs)
+
+        # Since we are overriding params with things that are not MLE params,
+        # need to adjust df's
         q = max(self.loglikelihood_burn, self.k_diffuse_states)
         self.df_model = q - self.model.k_constraints
         self.df_resid = self.nobs_effective - self.df_model
+
+        # Save _init_kwds
         self._init_kwds = self.model._get_init_kwds()
-        self.specification = Bunch(**{'k_exog': self.model.k_exog,
+
+        # Save the model specification
+        self.specification = Bunch(**{
+            'k_exog': self.model.k_exog,
             'k_constraints': self.model.k_constraints})
+
+        # Adjust results to remove "faux" endog from the constraints
         if self.model._r_matrix is not None:
             for name in ['forecasts', 'forecasts_error',
-                'forecasts_error_cov', 'standardized_forecasts_error',
-                'forecasts_error_diffuse_cov']:
+                         'forecasts_error_cov', 'standardized_forecasts_error',
+                         'forecasts_error_diffuse_cov']:
                 setattr(self, name, getattr(self, name)[0:1])

     @property
@@ -194,11 +317,26 @@ class RecursiveLSResults(MLEResults):
             - `offset`: an integer giving the offset in the state vector where
                         this component begins
         """
-        pass
+        out = None
+        spec = self.specification
+        start = offset = 0
+        end = offset + spec.k_exog
+        out = Bunch(
+            filtered=self.filtered_state[start:end],
+            filtered_cov=self.filtered_state_cov[start:end, start:end],
+            smoothed=None, smoothed_cov=None,
+            offset=offset
+        )
+        if self.smoothed_state is not None:
+            out.smoothed = self.smoothed_state[start:end]
+        if self.smoothed_state_cov is not None:
+            out.smoothed_cov = (
+                self.smoothed_state_cov[start:end, start:end])
+        return out

     @cache_readonly
     def resid_recursive(self):
-        """
+        r"""
         Recursive residuals

         Returns
@@ -217,16 +355,17 @@ class RecursiveLSResults(MLEResults):
         multiply by the standard deviation.

         Harvey notes that in smaller samples, "although the second moment
-        of the :math:`\\tilde \\sigma_*^{-1} \\tilde v_t`'s is unity, the
+        of the :math:`\tilde \sigma_*^{-1} \tilde v_t`'s is unity, the
         variance is not necessarily equal to unity as the mean need not be
         equal to zero", and he defines an alternative version (which are
         not provided here).
         """
-        pass
+        return (self.filter_results.standardized_forecasts_error[0] *
+                self.scale**0.5)

     @cache_readonly
     def cusum(self):
-        """
+        r"""
         Cumulative sum of standardized recursive residuals statistics

         Returns
@@ -241,15 +380,15 @@ class RecursiveLSResults(MLEResults):

         .. math::

-            W_t = \\frac{1}{\\hat \\sigma} \\sum_{j=k+1}^t w_j
+            W_t = \frac{1}{\hat \sigma} \sum_{j=k+1}^t w_j

         where :math:`w_j` is the recursive residual at time :math:`j` and
-        :math:`\\hat \\sigma` is the estimate of the standard deviation
+        :math:`\hat \sigma` is the estimate of the standard deviation
         from the full sample.

         Excludes the first `k_exog` datapoints.

-        Due to differences in the way :math:`\\hat \\sigma` is calculated, the
+        Due to differences in the way :math:`\hat \sigma` is calculated, the
         output of this function differs slightly from the output in the
         R package strucchange and the Stata contributed .ado file cusum6. The
         calculation in this package is consistent with the description of
@@ -263,11 +402,13 @@ class RecursiveLSResults(MLEResults):
            Journal of the Royal Statistical Society.
            Series B (Methodological) 37 (2): 149-92.
         """
-        pass
+        d = max(self.nobs_diffuse, self.loglikelihood_burn)
+        return (np.cumsum(self.resid_recursive[d:]) /
+                np.std(self.resid_recursive[d:], ddof=1))

     @cache_readonly
     def cusum_squares(self):
-        """
+        r"""
         Cumulative sum of squares of standardized recursive residuals
         statistics

@@ -283,8 +424,8 @@ class RecursiveLSResults(MLEResults):

         .. math::

-            s_t = \\left ( \\sum_{j=k+1}^t w_j^2 \\right ) \\Bigg /
-                  \\left ( \\sum_{j=k+1}^T w_j^2 \\right )
+            s_t = \left ( \sum_{j=k+1}^t w_j^2 \right ) \Bigg /
+                  \left ( \sum_{j=k+1}^T w_j^2 \right )

         where :math:`w_j` is the recursive residual at time :math:`j`.

@@ -298,65 +439,117 @@ class RecursiveLSResults(MLEResults):
            Journal of the Royal Statistical Society.
            Series B (Methodological) 37 (2): 149-92.
         """
-        pass
+        d = max(self.nobs_diffuse, self.loglikelihood_burn)
+        numer = np.cumsum(self.resid_recursive[d:]**2)
+        denom = numer[-1]
+        return numer / denom

     @cache_readonly
     def llf_recursive_obs(self):
         """
         (float) Loglikelihood at observation, computed from recursive residuals
         """
-        pass
+        from scipy.stats import norm
+        return np.log(norm.pdf(self.resid_recursive, loc=0,
+                               scale=self.scale**0.5))

     @cache_readonly
     def llf_recursive(self):
         """
         (float) Loglikelihood defined by recursive residuals, equivalent to OLS
         """
-        pass
+        return np.sum(self.llf_recursive_obs)

     @cache_readonly
     def ssr(self):
         """ssr"""
-        pass
+        d = max(self.nobs_diffuse, self.loglikelihood_burn)
+        return (self.nobs - d) * self.filter_results.obs_cov[0, 0, 0]

     @cache_readonly
     def centered_tss(self):
         """Centered tss"""
-        pass
+        return np.sum((self.filter_results.endog[0] -
+                       np.mean(self.filter_results.endog))**2)

     @cache_readonly
     def uncentered_tss(self):
         """uncentered tss"""
-        pass
+        return np.sum((self.filter_results.endog[0])**2)

     @cache_readonly
     def ess(self):
         """ess"""
-        pass
+        if self.k_constant:
+            return self.centered_tss - self.ssr
+        else:
+            return self.uncentered_tss - self.ssr

     @cache_readonly
     def rsquared(self):
         """rsquared"""
-        pass
+        if self.k_constant:
+            return 1 - self.ssr / self.centered_tss
+        else:
+            return 1 - self.ssr / self.uncentered_tss

     @cache_readonly
     def mse_model(self):
         """mse_model"""
-        pass
+        return self.ess / self.df_model

     @cache_readonly
     def mse_resid(self):
         """mse_resid"""
-        pass
+        return self.ssr / self.df_resid

     @cache_readonly
     def mse_total(self):
         """mse_total"""
-        pass
+        if self.k_constant:
+            return self.centered_tss / (self.df_resid + self.df_model)
+        else:
+            return self.uncentered_tss / (self.df_resid + self.df_model)
+
+    @Appender(MLEResults.get_prediction.__doc__)
+    def get_prediction(self, start=None, end=None, dynamic=False,
+                       information_set='predicted', signal_only=False,
+                       index=None, **kwargs):
+        # Note: need to override this, because we currently do not support
+        # dynamic prediction or forecasts when there are constraints.
+        if start is None:
+            start = self.model._index[0]
+
+        # Handle start, end, dynamic
+        start, end, out_of_sample, prediction_index = (
+            self.model._get_prediction_index(start, end, index))
+
+        # Handle `dynamic`
+        if isinstance(dynamic, (bytes, str)):
+            dynamic, _, _ = self.model._get_index_loc(dynamic)
+
+        if self.model._r_matrix is not None and (out_of_sample or dynamic):
+            raise NotImplementedError('Cannot yet perform out-of-sample or'
+                                      ' dynamic prediction in models with'
+                                      ' constraints.')
+
+        # Perform the prediction
+        # This is a (k_endog x npredictions) array; do not want to squeeze in
+        # case of npredictions = 1
+        prediction_results = self.filter_results.predict(
+            start, end + out_of_sample + 1, dynamic, **kwargs)
+
+        # Return a new mlemodel.PredictionResults object
+        res_obj = PredictionResults(self, prediction_results,
+                                    information_set=information_set,
+                                    signal_only=signal_only,
+                                    row_labels=prediction_index)
+        return PredictionResultsWrapper(res_obj)

     def plot_recursive_coefficient(self, variables=0, alpha=0.05,
-        legend_loc='upper left', fig=None, figsize=None):
-        """
+                                   legend_loc='upper left', fig=None,
+                                   figsize=None):
+        r"""
         Plot the recursively estimated coefficients on a given variable

         Parameters
@@ -381,7 +574,78 @@ class RecursiveLSResults(MLEResults):
         -----
         All plots contain (1 - `alpha`) %  confidence intervals.
         """
-        pass
+        # Get variables
+        if isinstance(variables, (int, str)):
+            variables = [variables]
+        k_variables = len(variables)
+
+        # If a string was given for `variable`, try to get it from exog names
+        exog_names = self.model.exog_names
+        for i in range(k_variables):
+            variable = variables[i]
+            if isinstance(variable, str):
+                variables[i] = exog_names.index(variable)
+
+        # Create the plot
+        from scipy.stats import norm
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+        plt = _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+
+        for i in range(k_variables):
+            variable = variables[i]
+            ax = fig.add_subplot(k_variables, 1, i + 1)
+
+            # Get dates, if applicable
+            if hasattr(self.data, 'dates') and self.data.dates is not None:
+                dates = self.data.dates._mpl_repr()
+            else:
+                dates = np.arange(self.nobs)
+            d = max(self.nobs_diffuse, self.loglikelihood_burn)
+
+            # Plot the coefficient
+            coef = self.recursive_coefficients
+            ax.plot(dates[d:], coef.filtered[variable, d:],
+                    label='Recursive estimates: %s' % exog_names[variable])
+
+            # Legend
+            handles, labels = ax.get_legend_handles_labels()
+
+            # Get the critical value for confidence intervals
+            if alpha is not None:
+                critical_value = norm.ppf(1 - alpha / 2.)
+
+                # Plot confidence intervals
+                std_errors = np.sqrt(coef.filtered_cov[variable, variable, :])
+                ci_lower = (
+                    coef.filtered[variable] - critical_value * std_errors)
+                ci_upper = (
+                    coef.filtered[variable] + critical_value * std_errors)
+                ci_poly = ax.fill_between(
+                    dates[d:], ci_lower[d:], ci_upper[d:], alpha=0.2
+                )
+                ci_label = ('$%.3g \\%%$ confidence interval'
+                            % ((1 - alpha)*100))
+
+                # Only add CI to legend for the first plot
+                if i == 0:
+                    # Proxy artist for fill_between legend entry
+                    # See https://matplotlib.org/1.3.1/users/legend_guide.html
+                    p = plt.Rectangle((0, 0), 1, 1,
+                                      fc=ci_poly.get_facecolor()[0])
+
+                    handles.append(p)
+                    labels.append(ci_label)
+
+            ax.legend(handles, labels, loc=legend_loc)
+
+            # Remove xticks for all but the last plot
+            if i < k_variables - 1:
+                ax.xaxis.set_ticklabels([])
+
+        fig.tight_layout()
+
+        return fig

     def _cusum_significance_bounds(self, alpha, ddof=0, points=None):
         """
@@ -411,11 +675,30 @@ class RecursiveLSResults(MLEResults):
         three initial observations to get the initial OLS estimates, whereas
         we do not need to do that.
         """
-        pass
-
-    def plot_cusum(self, alpha=0.05, legend_loc='upper left', fig=None,
-        figsize=None):
-        """
+        # Get the constant associated with the significance level
+        if alpha == 0.01:
+            scalar = 1.143
+        elif alpha == 0.05:
+            scalar = 0.948
+        elif alpha == 0.10:
+            scalar = 0.950
+        else:
+            raise ValueError('Invalid significance level.')
+
+        # Get the points for the significance bound lines
+        d = max(self.nobs_diffuse, self.loglikelihood_burn)
+        tmp = (self.nobs - d - ddof)**0.5
+
+        def upper_line(x):
+            return scalar * tmp + 2 * scalar * (x - d) / tmp
+
+        if points is None:
+            points = np.array([d, self.nobs])
+        return -upper_line(points), upper_line(points)
+
+    def plot_cusum(self, alpha=0.05, legend_loc='upper left',
+                   fig=None, figsize=None):
+        r"""
         Plot the CUSUM statistic and significance bounds.

         Parameters
@@ -445,7 +728,32 @@ class RecursiveLSResults(MLEResults):
            Journal of the Royal Statistical Society.
            Series B (Methodological) 37 (2): 149-92.
         """
-        pass
+        # Create the plot
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+        _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+        ax = fig.add_subplot(1, 1, 1)
+
+        # Get dates, if applicable
+        if hasattr(self.data, 'dates') and self.data.dates is not None:
+            dates = self.data.dates._mpl_repr()
+        else:
+            dates = np.arange(self.nobs)
+        d = max(self.nobs_diffuse, self.loglikelihood_burn)
+
+        # Plot cusum series and reference line
+        ax.plot(dates[d:], self.cusum, label='CUSUM')
+        ax.hlines(0, dates[d], dates[-1], color='k', alpha=0.3)
+
+        # Plot significance bounds
+        lower_line, upper_line = self._cusum_significance_bounds(alpha)
+        ax.plot([dates[d], dates[-1]], upper_line, 'k--',
+                label='%d%% significance' % (alpha * 100))
+        ax.plot([dates[d], dates[-1]], lower_line, 'k--')
+
+        ax.legend(loc=legend_loc)
+
+        return fig

     def _cusum_squares_significance_bounds(self, alpha, points=None):
         """
@@ -462,11 +770,27 @@ class RecursiveLSResults(MLEResults):
         computing relatively good approximations for any number of
         observations.
         """
-        pass
-
-    def plot_cusum_squares(self, alpha=0.05, legend_loc='upper left', fig=
-        None, figsize=None):
-        """
+        # Get the approximate critical value associated with the significance
+        # level
+        d = max(self.nobs_diffuse, self.loglikelihood_burn)
+        n = 0.5 * (self.nobs - d) - 1
+        try:
+            ix = [0.1, 0.05, 0.025, 0.01, 0.005].index(alpha / 2)
+        except ValueError:
+            raise ValueError('Invalid significance level.')
+        scalars = _cusum_squares_scalars[:, ix]
+        crit = scalars[0] / n**0.5 + scalars[1] / n + scalars[2] / n**1.5
+
+        # Get the points for the significance bound lines
+        if points is None:
+            points = np.array([d, self.nobs])
+        line = (points - d) / (self.nobs - d)
+
+        return line - crit, line + crit
+
+    def plot_cusum_squares(self, alpha=0.05, legend_loc='upper left',
+                           fig=None, figsize=None):
+        r"""
         Plot the CUSUM of squares statistic and significance bounds.

         Parameters
@@ -503,14 +827,41 @@ class RecursiveLSResults(MLEResults):
            in Medium and Large Sized Samples."
            Oxford Bulletin of Economics and Statistics 56 (3): 355-65.
         """
-        pass
+        # Create the plot
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+        _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+        ax = fig.add_subplot(1, 1, 1)
+
+        # Get dates, if applicable
+        if hasattr(self.data, 'dates') and self.data.dates is not None:
+            dates = self.data.dates._mpl_repr()
+        else:
+            dates = np.arange(self.nobs)
+        d = max(self.nobs_diffuse, self.loglikelihood_burn)
+
+        # Plot cusum series and reference line
+        ax.plot(dates[d:], self.cusum_squares, label='CUSUM of squares')
+        ref_line = (np.arange(d, self.nobs) - d) / (self.nobs - d)
+        ax.plot(dates[d:], ref_line, 'k', alpha=0.3)
+
+        # Plot significance bounds
+        lower_line, upper_line = self._cusum_squares_significance_bounds(alpha)
+        ax.plot([dates[d], dates[-1]], upper_line, 'k--',
+                label='%d%% significance' % (alpha * 100))
+        ax.plot([dates[d], dates[-1]], lower_line, 'k--')
+
+        ax.legend(loc=legend_loc)
+
+        return fig


 class RecursiveLSResultsWrapper(MLEResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs, _attrs)
+    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs,
+                                   _attrs)
     _methods = {}
-    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods, _methods)
-
-
-wrap.populate_wrapper(RecursiveLSResultsWrapper, RecursiveLSResults)
+    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods,
+                                     _methods)
+wrap.populate_wrapper(RecursiveLSResultsWrapper,  # noqa:E305
+                      RecursiveLSResults)
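
Usage note: the restored RecursiveLS fit/filter/smooth methods and the CUSUM
diagnostics can be exercised as sketched below; the simulated regressors are
assumptions made for illustration, and the plots require matplotlib.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    nobs = 200
    x = rng.normal(size=nobs)
    y = 1.0 + 2.0 * x + rng.normal(size=nobs)

    exog = sm.add_constant(x)
    mod = sm.RecursiveLS(y, exog)
    res = mod.fit()

    print(res.params)      # full-sample coefficient estimates
    print(res.cusum[:5])   # CUSUM of standardized recursive residuals

    # Structural-stability plots restored by the patch above.
    fig_cusum = res.plot_cusum(alpha=0.05)
    fig_coef = res.plot_recursive_coefficient(variables=[0, 1], alpha=0.05)
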
diff --git a/statsmodels/regression/rolling.py b/statsmodels/regression/rolling.py
index 9826ab9d6..f7cc3eb75 100644
--- a/statsmodels/regression/rolling.py
+++ b/statsmodels/regression/rolling.py
@@ -8,29 +8,66 @@ Copyright (c) 2019 Kevin Sheppard
 License: 3-clause BSD
 """
 from statsmodels.compat.numpy import lstsq
-from statsmodels.compat.pandas import Appender, Substitution, cache_readonly, call_cached_func, get_cached_doc
+from statsmodels.compat.pandas import (
+    Appender,
+    Substitution,
+    cache_readonly,
+    call_cached_func,
+    get_cached_doc,
+)
+
 from collections import namedtuple
+
 import numpy as np
 from pandas import DataFrame, MultiIndex, Series
 from scipy import stats
+
 from statsmodels.base import model
 from statsmodels.base.model import LikelihoodModelResults, Model
-from statsmodels.regression.linear_model import RegressionModel, RegressionResults
+from statsmodels.regression.linear_model import (
+    RegressionModel,
+    RegressionResults,
+)
 from statsmodels.tools.validation import array_like, int_like, string_like
-RollingStore = namedtuple('RollingStore', ['params', 'ssr', 'llf', 'nobs',
-    's2', 'xpxi', 'xeex', 'centered_tss', 'uncentered_tss'])
-common_params = '\n'.join(map(strip4, model._model_params_doc.split('\n')))
-window_parameters = """window : int
+
+
+def strip4(line):
+    if line.startswith(" "):
+        return line[4:]
+    return line
+
+
+RollingStore = namedtuple(
+    "RollingStore",
+    [
+        "params",
+        "ssr",
+        "llf",
+        "nobs",
+        "s2",
+        "xpxi",
+        "xeex",
+        "centered_tss",
+        "uncentered_tss",
+    ],
+)
+
+common_params = "\n".join(map(strip4, model._model_params_doc.split("\n")))
+window_parameters = """\
+window : int
     Length of the rolling window. Must be strictly larger than the number
     of variables in the model.
 """
+
 weight_parameters = """
 weights : array_like, optional
     A 1d array of weights.  If you supply 1/W then the variables are
     pre- multiplied by 1/sqrt(W).  If no weights are supplied the
     default value is 1 and WLS results are the same as OLS.
 """
-_missing_param_doc = """min_nobs : {int, None}
+
+_missing_param_doc = """\
+min_nobs : {int, None}
     Minimum number of observations required to estimate a model when
     data are missing.  If None, the minimum depends on the number of
     regressors in the model. Must be smaller than window.
@@ -45,8 +82,11 @@ expanding : bool, default False
     an expanding scheme until ``window`` observations are available, after
     which rolling is used.
 """
+
+
 extra_base = _missing_param_doc
 extra_parameters = window_parameters + weight_parameters + extra_base
+
 _doc = """
 Rolling %(model_type)s Least Squares

@@ -88,51 +128,175 @@ expanding scheme until window observation, and the roll.
 """


-@Substitution(model_type='Weighted', model='WLS', parameters=common_params,
-    extra_parameters=extra_parameters)
+@Substitution(
+    model_type="Weighted",
+    model="WLS",
+    parameters=common_params,
+    extra_parameters=extra_parameters,
+)
 @Appender(_doc)
 class RollingWLS:
-
-    def __init__(self, endog, exog, window=None, *, weights=None, min_nobs=
-        None, missing='drop', expanding=False):
-        missing = string_like(missing, 'missing', options=('drop', 'raise',
-            'skip'))
-        temp_msng = 'drop' if missing != 'raise' else 'raise'
+    def __init__(
+        self,
+        endog,
+        exog,
+        window=None,
+        *,
+        weights=None,
+        min_nobs=None,
+        missing="drop",
+        expanding=False
+    ):
+        # Call Model.__init__ twice: the first pass uses constant detection,
+        # the second pass keeps all observations (missing="none")
+        missing = string_like(
+            missing, "missing", options=("drop", "raise", "skip")
+        )
+        temp_msng = "drop" if missing != "raise" else "raise"
         Model.__init__(self, endog, exog, missing=temp_msng, hasconst=None)
         k_const = self.k_constant
         const_idx = self.data.const_idx
-        Model.__init__(self, endog, exog, missing='none', hasconst=False)
+        Model.__init__(self, endog, exog, missing="none", hasconst=False)
         self.k_constant = k_const
         self.data.const_idx = const_idx
-        self._y = array_like(endog, 'endog')
+        self._y = array_like(endog, "endog")
         nobs = self._y.shape[0]
-        self._x = array_like(exog, 'endog', ndim=2, shape=(nobs, None))
-        window = int_like(window, 'window', optional=True)
-        weights = array_like(weights, 'weights', optional=True, shape=(nobs,))
+        self._x = array_like(exog, "endog", ndim=2, shape=(nobs, None))
+        window = int_like(window, "window", optional=True)
+        weights = array_like(weights, "weights", optional=True, shape=(nobs,))
         self._window = window if window is not None else self._y.shape[0]
         self._weighted = weights is not None
         self._weights = np.ones(nobs) if weights is None else weights
         w12 = np.sqrt(self._weights)
         self._wy = w12 * self._y
         self._wx = w12[:, None] * self._x
-        min_nobs = int_like(min_nobs, 'min_nobs', optional=True)
+
+        min_nobs = int_like(min_nobs, "min_nobs", optional=True)
         self._min_nobs = min_nobs if min_nobs is not None else self._x.shape[1]
         if self._min_nobs < self._x.shape[1] or self._min_nobs > self._window:
             raise ValueError(
-                'min_nobs must be larger than the number of regressors in the model and less than window'
-                )
+                "min_nobs must be larger than the number of "
+                "regressors in the model and less than window"
+            )
+
         self._expanding = expanding
+
         self._is_nan = np.zeros_like(self._y, dtype=bool)
         self._has_nan = self._find_nans()
         self.const_idx = self.data.const_idx
-        self._skip_missing = missing == 'skip'
+        self._skip_missing = missing == "skip"
+
+    def _handle_data(self, endog, exog, missing, hasconst, **kwargs):
+        return Model._handle_data(
+            self, endog, exog, missing, hasconst, **kwargs
+        )
+
+    def _find_nans(self):
+        nans = np.isnan(self._y)
+        nans |= np.any(np.isnan(self._x), axis=1)
+        nans |= np.isnan(self._weights)
+        self._is_nan[:] = nans
+        has_nan = np.cumsum(nans)
+        w = self._window
+        has_nan[w - 1 :] = has_nan[w - 1 :] - has_nan[: -(w - 1)]
+        if self._expanding:
+            has_nan[: self._min_nobs] = False
+        else:
+            has_nan[: w - 1] = False
+
+        return has_nan.astype(bool)
+
+    def _get_data(self, idx):
+        window = self._window
+        if idx >= window:
+            loc = slice(idx - window, idx)
+        else:
+            loc = slice(idx)
+        y = self._y[loc]
+        wy = self._wy[loc]
+        wx = self._wx[loc]
+        weights = self._weights[loc]
+        missing = self._is_nan[loc]
+        not_missing = ~missing
+        if np.any(missing):
+            y = y[not_missing]
+            wy = wy[not_missing]
+            wx = wx[not_missing]
+            weights = weights[not_missing]
+        return y, wy, wx, weights, not_missing
+
+    def _fit_single(self, idx, wxpwx, wxpwy, nobs, store, params_only, method):
+        if nobs < self._min_nobs:
+            return
+        try:
+            if (method == "inv") or not params_only:
+                wxpwxi = np.linalg.inv(wxpwx)
+            if method == "inv":
+                params = wxpwxi @ wxpwy
+            else:
+                _, wy, wx, _, _ = self._get_data(idx)
+                if method == "lstsq":
+                    params = lstsq(wx, wy)[0]
+                else:  # 'pinv'
+                    wxpwxiwxp = np.linalg.pinv(wx)
+                    params = wxpwxiwxp @ wy
+
+        except np.linalg.LinAlgError:
+            return
+        store.params[idx - 1] = params
+        if params_only:
+            return
+        y, wy, wx, weights, _ = self._get_data(idx)
+
+        wresid, ssr, llf = self._loglike(params, wy, wx, weights, nobs)
+        wxwresid = wx * wresid[:, None]
+        wxepwxe = wxwresid.T @ wxwresid
+        tot_params = wx.shape[1]
+        s2 = ssr / (nobs - tot_params)
+
+        centered_tss, uncentered_tss = self._sum_of_squares(y, wy, weights)
+
+        store.ssr[idx - 1] = ssr
+        store.llf[idx - 1] = llf
+        store.nobs[idx - 1] = nobs
+        store.s2[idx - 1] = s2
+        store.xpxi[idx - 1] = wxpwxi
+        store.xeex[idx - 1] = wxepwxe
+        store.centered_tss[idx - 1] = centered_tss
+        store.uncentered_tss[idx - 1] = uncentered_tss
+
+    def _loglike(self, params, wy, wx, weights, nobs):
+        nobs2 = nobs / 2.0
+        wresid = wy - wx @ params
+        ssr = np.sum(wresid ** 2, axis=0)
+        llf = -np.log(ssr) * nobs2  # concentrated likelihood
+        llf -= (1 + np.log(np.pi / nobs2)) * nobs2  # with constant
+        llf += 0.5 * np.sum(np.log(weights))
+        return wresid, ssr, llf
+
+    def _sum_of_squares(self, y, wy, weights):
+        mean = np.average(y, weights=weights)
+        centered_tss = np.sum(weights * (y - mean) ** 2)
+        uncentered_tss = np.dot(wy, wy)
+        return centered_tss, uncentered_tss

     def _reset(self, idx):
         """Compute xpx and xpy using a single dot product"""
-        pass
-
-    def fit(self, method='inv', cov_type='nonrobust', cov_kwds=None, reset=
-        None, use_t=False, params_only=False):
+        _, wy, wx, _, not_missing = self._get_data(idx)
+        nobs = not_missing.sum()
+        xpx = wx.T @ wx
+        xpy = wx.T @ wy
+        return xpx, xpy, nobs
+
+    def fit(
+        self,
+        method="inv",
+        cov_type="nonrobust",
+        cov_kwds=None,
+        reset=None,
+        use_t=False,
+        params_only=False,
+    ):
         """
         Estimate model parameters.

@@ -170,21 +334,132 @@ class RollingWLS:
         RollingRegressionResults
             Estimation results where all pre-sample values are nan-filled.
         """
-        pass
+        method = string_like(
+            method, "method", options=("inv", "lstsq", "pinv")
+        )
+        reset = int_like(reset, "reset", optional=True)
+        reset = self._y.shape[0] if reset is None else reset
+        if reset < 1:
+            raise ValueError("reset must be a positive integer")
+
+        nobs, k = self._x.shape
+        store = RollingStore(
+            params=np.full((nobs, k), np.nan),
+            ssr=np.full(nobs, np.nan),
+            llf=np.full(nobs, np.nan),
+            nobs=np.zeros(nobs, dtype=int),
+            s2=np.full(nobs, np.nan),
+            xpxi=np.full((nobs, k, k), np.nan),
+            xeex=np.full((nobs, k, k), np.nan),
+            centered_tss=np.full(nobs, np.nan),
+            uncentered_tss=np.full(nobs, np.nan),
+        )
+        w = self._window
+        first = self._min_nobs if self._expanding else w
+        xpx, xpy, nobs = self._reset(first)
+        if not (self._has_nan[first - 1] and self._skip_missing):
+            self._fit_single(first, xpx, xpy, nobs, store, params_only, method)
+        wx, wy = self._wx, self._wy
+        for i in range(first + 1, self._x.shape[0] + 1):
+            if self._has_nan[i - 1] and self._skip_missing:
+                continue
+            if i % reset == 0:
+                xpx, xpy, nobs = self._reset(i)
+            else:
+                if not self._is_nan[i - w - 1] and i > w:
+                    remove_x = wx[i - w - 1 : i - w]
+                    xpx -= remove_x.T @ remove_x
+                    xpy -= remove_x.T @ wy[i - w - 1 : i - w]
+                    nobs -= 1
+                if not self._is_nan[i - 1]:
+                    add_x = wx[i - 1 : i]
+                    xpx += add_x.T @ add_x
+                    xpy += add_x.T @ wy[i - 1 : i]
+                    nobs += 1
+
+            self._fit_single(i, xpx, xpy, nobs, store, params_only, method)
+
+        return RollingRegressionResults(
+            self, store, self.k_constant, use_t, cov_type
+        )
+
+    @classmethod
+    @Appender(Model.from_formula.__doc__)
+    def from_formula(
+        cls, formula, data, window, weights=None, subset=None, *args, **kwargs
+    ):
+        if subset is not None:
+            data = data.loc[subset]
+        eval_env = kwargs.pop("eval_env", None)
+        if eval_env is None:
+            eval_env = 2
+        elif eval_env == -1:
+            from patsy import EvalEnvironment
+
+            eval_env = EvalEnvironment({})
+        else:
+            eval_env += 1  # we're going down the stack again
+        missing = kwargs.get("missing", "skip")
+        from patsy import NAAction, dmatrices
+
+        na_action = NAAction(on_NA="raise", NA_types=[])
+        result = dmatrices(
+            formula,
+            data,
+            eval_env,
+            return_type="dataframe",
+            NA_action=na_action,
+        )
+
+        endog, exog = result
+        if (endog.ndim > 1 and endog.shape[1] > 1) or endog.ndim > 2:
+            raise ValueError(
+                "endog has evaluated to an array with multiple "
+                "columns that has shape {0}. This occurs when "
+                "the variable converted to endog is non-numeric"
+                " (e.g., bool or str).".format(endog.shape)
+            )
+
+        kwargs.update({"missing": missing, "window": window})
+        if weights is not None:
+            kwargs["weights"] = weights
+        mod = cls(endog, exog, *args, **kwargs)
+        mod.formula = formula
+        # since we got a dataframe, attach the original
+        mod.data.frame = data
+        return mod


 extra_parameters = window_parameters + extra_base


-@Substitution(model_type='Ordinary', model='OLS', parameters=common_params,
-    extra_parameters=extra_parameters)
+@Substitution(
+    model_type="Ordinary",
+    model="OLS",
+    parameters=common_params,
+    extra_parameters=extra_parameters,
+)
 @Appender(_doc)
 class RollingOLS(RollingWLS):
-
-    def __init__(self, endog, exog, window=None, *, min_nobs=None, missing=
-        'drop', expanding=False):
-        super().__init__(endog, exog, window, weights=None, min_nobs=
-            min_nobs, missing=missing, expanding=expanding)
+    def __init__(
+        self,
+        endog,
+        exog,
+        window=None,
+        *,
+        min_nobs=None,
+        missing="drop",
+        expanding=False
+    ):
+        super().__init__(
+            endog,
+            exog,
+            window,
+            weights=None,
+            min_nobs=min_nobs,
+            missing=missing,
+            expanding=expanding,
+        )


 class RollingRegressionResults:
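Aside (not part of the patch): the fit loop implemented above keeps the per-step cost low by updating X'X and X'y in place as the window slides, instead of refitting from scratch. A minimal NumPy sketch of that sliding update on synthetic data (all names here are illustrative only, not statsmodels internals):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, window = 200, 3, 30
    X = rng.standard_normal((n, k))
    y = X @ np.array([1.0, -0.5, 2.0]) + rng.standard_normal(n)

    # seed the accumulators with the first full window, then slide
    xpx = X[:window].T @ X[:window]
    xpy = X[:window].T @ y[:window]
    params = np.full((n, k), np.nan)
    params[window - 1] = np.linalg.solve(xpx, xpy)

    for i in range(window, n):
        drop = X[i - window:i - window + 1]   # row leaving the window
        add = X[i:i + 1]                      # row entering the window
        xpx += add.T @ add - drop.T @ drop
        xpy += add.T @ y[i:i + 1] - drop.T @ y[i - window:i - window + 1]
        params[i] = np.linalg.solve(xpx, xpy)

Each step updates the cross products in O(k**2) work rather than recomputing them in O(window * k**2); the `reset` option in the patch periodically rebuilds X'X from scratch (via `_reset`) to limit accumulated floating-point error.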
@@ -205,10 +480,12 @@ class RollingRegressionResults:
     cov_type : str
         Name of covariance estimator
     """
+
     _data_in_cache = tuple()

-    def __init__(self, model, store: RollingStore, k_constant, use_t, cov_type
-        ):
+    def __init__(
+        self, model, store: RollingStore, k_constant, use_t, cov_type
+    ):
         self.model = model
         self._params = store.params
         self._ssr = store.ssr
@@ -222,7 +499,7 @@ class RollingRegressionResults:
         self._k_constant = k_constant
         self._nvar = self._xpxi.shape[-1]
         if use_t is None:
-            use_t = cov_type == 'nonrobust'
+            use_t = cov_type == "nonrobust"
         self._use_t = use_t
         self._cov_type = cov_type
         self._use_pandas = self.model.data.row_labels is not None
@@ -231,17 +508,123 @@ class RollingRegressionResults:

     def _wrap(self, val):
         """Wrap output as pandas Series or DataFrames as needed"""
-        pass
+        if not self._use_pandas:
+            return val
+        col_names = self.model.data.param_names
+        row_names = self.model.data.row_labels
+        if val.ndim == 1:
+            return Series(val, index=row_names)
+        if val.ndim == 2:
+            return DataFrame(val, columns=col_names, index=row_names)
+        else:  # ndim == 3
+            mi = MultiIndex.from_product((row_names, col_names))
+            val = np.reshape(val, (-1, val.shape[-1]))
+            return DataFrame(val, columns=col_names, index=mi)
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.aic))
+    def aic(self):
+        return self._wrap(call_cached_func(RegressionResults.aic, self))
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.bic))
+    def bic(self):
+        with np.errstate(divide="ignore"):
+            return self._wrap(call_cached_func(RegressionResults.bic, self))
+
+    def info_criteria(self, crit, dk_params=0):
+        return self._wrap(
+            RegressionResults.info_criteria(self, crit, dk_params=dk_params)
+        )

     @cache_readonly
     def params(self):
         """Estimated model parameters"""
-        pass
+        return self._wrap(self._params)
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.ssr))
+    def ssr(self):
+        return self._wrap(self._ssr)
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.llf))
+    def llf(self):
+        return self._wrap(self._llf)
+
+    @cache_readonly
+    @Appender(RegressionModel.df_model.__doc__)
+    def df_model(self):
+        return self._nvar - self._k_constant

     @cache_readonly
     def k_constant(self):
         """Flag indicating whether the model contains a constant"""
-        pass
+        return self._k_constant
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.centered_tss))
+    def centered_tss(self):
+        return self._centered_tss
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.uncentered_tss))
+    def uncentered_tss(self):
+        return self._uncentered_tss
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.rsquared))
+    def rsquared(self):
+        return self._wrap(call_cached_func(RegressionResults.rsquared, self))
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.rsquared_adj))
+    def rsquared_adj(self):
+        return self._wrap(
+            call_cached_func(RegressionResults.rsquared_adj, self)
+        )
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.nobs))
+    def nobs(self):
+        return self._wrap(self._nobs)
+
+    @cache_readonly
+    @Appender(RegressionModel.df_resid.__doc__)
+    def df_resid(self):
+        return self._wrap(self._nobs - self.df_model - self._k_constant)
+
+    @cache_readonly
+    @Appender(RegressionResults.use_t.__doc__)
+    def use_t(self):
+        return self._use_t
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.ess))
+    def ess(self):
+        return self._wrap(call_cached_func(RegressionResults.ess, self))
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.mse_model))
+    def mse_model(self):
+        return self._wrap(call_cached_func(RegressionResults.mse_model, self))
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.mse_resid))
+    def mse_resid(self):
+        return self._wrap(call_cached_func(RegressionResults.mse_resid, self))
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.mse_total))
+    def mse_total(self):
+        return self._wrap(call_cached_func(RegressionResults.mse_total, self))
+
+    @cache_readonly
+    def _cov_params(self):
+        if self._cov_type == "nonrobust":
+            return self._s2[:, None, None] * self._xpxi
+        else:
+            return self._xpxi @ self._xepxe @ self._xpxi

     def cov_params(self):
         """
@@ -257,17 +640,128 @@ class RollingRegressionResults:
             key (observation, variable), so that the covariance for
             observation with index i is cov.loc[i].
         """
-        pass
+        return self._wrap(self._cov_params)
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.f_pvalue))
+    def f_pvalue(self):
+        with np.errstate(invalid="ignore"):
+            return self._wrap(
+                call_cached_func(RegressionResults.f_pvalue, self)
+            )
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.fvalue))
+    def fvalue(self):
+        if self._cov_type == "nonrobust":
+            return self.mse_model / self.mse_resid
+        else:
+            nobs = self._params.shape[0]
+            stat = np.full(nobs, np.nan)
+            k = self._params.shape[1]
+            r = np.eye(k)
+            locs = list(range(k))
+            if self.k_constant:
+                locs.pop(self.model.const_idx)
+            if not locs:
+                return stat
+            r = r[locs]
+            vcv = self._cov_params
+            rvcvr = r @ vcv @ r.T
+            p = self._params
+            for i in range(nobs):
+                rp = p[i : i + 1] @ r.T
+                denom = rp.shape[1]
+                inv_cov = np.linalg.inv(rvcvr[i])
+                stat[i] = np.squeeze(rp @ inv_cov @ rp.T) / denom
+            return stat
+
+    @cache_readonly
+    @Appender(get_cached_doc(RegressionResults.bse))
+    def bse(self):
+        with np.errstate(invalid="ignore"):
+            return self._wrap(np.sqrt(np.diagonal(self._cov_params, 0, 2)))
+
+    @cache_readonly
+    @Appender(get_cached_doc(LikelihoodModelResults.tvalues))
+    def tvalues(self):
+        with np.errstate(invalid="ignore"):
+            return self._wrap(
+                call_cached_func(LikelihoodModelResults.tvalues, self)
+            )
+
+    @cache_readonly
+    @Appender(get_cached_doc(LikelihoodModelResults.pvalues))
+    def pvalues(self):
+        if self.use_t:
+            df_resid = getattr(self, "df_resid_inference", self.df_resid)
+            df_resid = np.asarray(df_resid)[:, None]
+            with np.errstate(invalid="ignore"):
+                return stats.t.sf(np.abs(self.tvalues), df_resid) * 2
+        else:
+            with np.errstate(invalid="ignore"):
+                return stats.norm.sf(np.abs(self.tvalues)) * 2
+
+    def _conf_int(self, alpha, cols):
+        bse = np.asarray(self.bse)
+
+        if self.use_t:
+            dist = stats.t
+            df_resid = getattr(self, "df_resid_inference", self.df_resid)
+            df_resid = np.asarray(df_resid)[:, None]
+            q = dist.ppf(1 - alpha / 2, df_resid)
+        else:
+            dist = stats.norm
+            q = dist.ppf(1 - alpha / 2)
+
+        params = np.asarray(self.params)
+        lower = params - q * bse
+        upper = params + q * bse
+        if cols is not None:
+            cols = np.asarray(cols)
+            lower = lower[:, cols]
+            upper = upper[:, cols]
+        return np.asarray(list(zip(lower, upper)))
+
+    @Appender(LikelihoodModelResults.conf_int.__doc__)
+    def conf_int(self, alpha=0.05, cols=None):
+        ci = self._conf_int(alpha, cols)
+        if not self._use_pandas:
+            return ci
+        ci_names = ("lower", "upper")
+        row_names = self.model.data.row_labels
+        col_names = self.model.data.param_names
+        if cols is not None:
+            col_names = [col_names[i] for i in cols]
+        mi = MultiIndex.from_product((col_names, ci_names))
+        ci = np.reshape(np.swapaxes(ci, 1, 2), (ci.shape[0], -1))
+        return DataFrame(ci, columns=mi, index=row_names)

     @property
     def cov_type(self):
         """Name of covariance estimator"""
-        pass
+        return self._cov_type
+
+    @classmethod
+    @Appender(LikelihoodModelResults.load.__doc__)
+    def load(cls, fname):
+        return LikelihoodModelResults.load(fname)
+
     remove_data = LikelihoodModelResults.remove_data

-    def plot_recursive_coefficient(self, variables=None, alpha=0.05,
-        legend_loc='upper left', fig=None, figsize=None):
-        """
+    @Appender(LikelihoodModelResults.save.__doc__)
+    def save(self, fname, remove_data=False):
+        return LikelihoodModelResults.save(self, fname, remove_data)
+
+    def plot_recursive_coefficient(
+        self,
+        variables=None,
+        alpha=0.05,
+        legend_loc="upper left",
+        fig=None,
+        figsize=None,
+    ):
+        r"""
         Plot the recursively estimated coefficients on a given variable

         Parameters
@@ -294,4 +788,67 @@ class RollingRegressionResults:
         Figure
             The matplotlib Figure object.
         """
-        pass
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+
+        if alpha is not None:
+            ci = self._conf_int(alpha, None)
+
+        row_labels = self.model.data.row_labels
+        if row_labels is None:
+            row_labels = np.arange(self._params.shape[0])
+        k_variables = self._params.shape[1]
+        param_names = self.model.data.param_names
+        if variables is None:
+            variable_idx = list(range(k_variables))
+        else:
+            if isinstance(variables, (int, str)):
+                variables = [variables]
+            variable_idx = []
+            for i in range(len(variables)):
+                variable = variables[i]
+                if variable in param_names:
+                    variable_idx.append(param_names.index(variable))
+                elif isinstance(variable, int):
+                    variable_idx.append(variable)
+                else:
+                    msg = (
+                        "variable {0} is not an integer and was not found "
+                        "in the list of variable "
+                        "names: {1}".format(
+                            variables[i], ", ".join(param_names)
+                        )
+                    )
+                    raise ValueError(msg)
+
+        _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+
+        loc = 0
+        import pandas as pd
+
+        if isinstance(row_labels, pd.PeriodIndex):
+            row_labels = row_labels.to_timestamp()
+        row_labels = np.asarray(row_labels)
+        for i in variable_idx:
+            ax = fig.add_subplot(len(variable_idx), 1, loc + 1)
+            params = self._params[:, i]
+            valid = ~np.isnan(self._params[:, i])
+            row_lbl = row_labels[valid]
+            ax.plot(row_lbl, params[valid])
+            if alpha is not None:
+                this_ci = np.reshape(ci[:, :, i], (-1, 2))
+                if not np.all(np.isnan(this_ci)):
+                    ax.plot(
+                        row_lbl, this_ci[:, 0][valid], "k:", label="Lower CI"
+                    )
+                    ax.plot(
+                        row_lbl, this_ci[:, 1][valid], "k:", label="Upper CI"
+                    )
+                    if loc == 0:
+                        ax.legend(loc=legend_loc)
+            ax.set_xlim(row_lbl[0], row_lbl[-1])
+            ax.set_title(param_names[i])
+            loc += 1
+
+        fig.tight_layout()
+        return fig
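For orientation (not part of the patch): the public entry points this module provides are RollingWLS and RollingOLS. A small usage sketch on synthetic pandas data:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.regression.rolling import RollingOLS

    rng = np.random.default_rng(0)
    idx = pd.date_range("2000-01-01", periods=120, freq="D")
    x = pd.DataFrame({"x1": rng.standard_normal(120)}, index=idx)
    y = pd.Series(0.5 + 1.5 * x["x1"] + 0.3 * rng.standard_normal(120),
                  index=idx, name="y")

    res = RollingOLS(y, sm.add_constant(x), window=24).fit()
    print(res.params.tail())  # one row of coefficients per window end
    print(res.bse.tail())     # matching rolling standard errors

Rows before the first complete window are NaN-filled, matching the docstring of `fit` above.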
diff --git a/statsmodels/robust/norms.py b/statsmodels/robust/norms.py
index 3d2860112..a00e51971 100644
--- a/statsmodels/robust/norms.py
+++ b/statsmodels/robust/norms.py
@@ -1,5 +1,7 @@
 import numpy as np

+# TODO: add plots to weighting functions for online docs.
+

 def _cabs(x):
     """absolute value function that changes complex sign based on real sign
@@ -7,7 +9,8 @@ def _cabs(x):
     This could be useful for complex step derivatives of functions that
     need abs. Not yet used.
     """
-    pass
+    sign = (x.real >= 0) * 2 - 1
+    return sign * x


 class RobustNorm:
@@ -44,7 +47,7 @@ class RobustNorm:

         -2 loglike used in M-estimator
         """
-        pass
+        raise NotImplementedError

     def psi(self, z):
         """
@@ -54,7 +57,7 @@ class RobustNorm:

         psi = rho'
         """
-        pass
+        raise NotImplementedError

     def weights(self, z):
         """
@@ -64,7 +67,7 @@ class RobustNorm:

         psi(z) / z
         """
-        pass
+        raise NotImplementedError

     def psi_deriv(self, z):
         """
@@ -76,7 +79,7 @@ class RobustNorm:

         psi_derive = psi'
         """
-        pass
+        raise NotImplementedError

     def __call__(self, z):
         """
@@ -108,7 +111,8 @@ class LeastSquares(RobustNorm):
         rho : ndarray
             rho(z) = (1/2.)*z**2
         """
-        pass
+
+        return z**2 * 0.5

     def psi(self, z):
         """
@@ -126,7 +130,8 @@ class LeastSquares(RobustNorm):
         psi : ndarray
             psi(z) = z
         """
-        pass
+
+        return np.asarray(z)

     def weights(self, z):
         """
@@ -144,7 +149,9 @@ class LeastSquares(RobustNorm):
         weights : ndarray
             weights(z) = np.ones(z.shape)
         """
-        pass
+
+        z = np.asarray(z)
+        return np.ones(z.shape, np.float64)

     def psi_deriv(self, z):
         """
@@ -159,7 +166,7 @@ class LeastSquares(RobustNorm):
         -----
         Used to estimate the robust covariance matrix.
         """
-        pass
+        return np.ones(z.shape, np.float64)


 class HuberT(RobustNorm):
@@ -184,10 +191,11 @@ class HuberT(RobustNorm):
         """
         Huber's T is defined piecewise over the range for z
         """
-        pass
+        z = np.asarray(z)
+        return np.less_equal(np.abs(z), self.t)

     def rho(self, z):
-        """
+        r"""
         The robust criterion function for Huber's t.

         Parameters
@@ -198,14 +206,17 @@ class HuberT(RobustNorm):
         Returns
         -------
         rho : ndarray
-            rho(z) = .5*z**2            for \\|z\\| <= t
+            rho(z) = .5*z**2            for \|z\| <= t

-            rho(z) = \\|z\\|*t - .5*t**2    for \\|z\\| > t
+            rho(z) = \|z\|*t - .5*t**2    for \|z\| > t
         """
-        pass
+        z = np.asarray(z)
+        test = self._subset(z)
+        return (test * 0.5 * z**2 +
+                (1 - test) * (np.abs(z) * self.t - 0.5 * self.t**2))

     def psi(self, z):
-        """
+        r"""
         The psi function for Huber's t estimator

         The analytic derivative of rho
@@ -218,14 +229,16 @@ class HuberT(RobustNorm):
         Returns
         -------
         psi : ndarray
-            psi(z) = z      for \\|z\\| <= t
+            psi(z) = z      for \|z\| <= t

-            psi(z) = sign(z)*t for \\|z\\| > t
+            psi(z) = sign(z)*t for \|z\| > t
         """
-        pass
+        z = np.asarray(z)
+        test = self._subset(z)
+        return test * z + (1 - test) * self.t * np.sign(z)

     def weights(self, z):
-        """
+        r"""
         Huber's t weighting function for the IRLS algorithm

         The psi function scaled by z
@@ -238,11 +251,21 @@ class HuberT(RobustNorm):
         Returns
         -------
         weights : ndarray
-            weights(z) = 1          for \\|z\\| <= t
+            weights(z) = 1          for \|z\| <= t

-            weights(z) = t/\\|z\\|      for \\|z\\| > t
+            weights(z) = t/\|z\|      for \|z\| > t
         """
-        pass
+        z_isscalar = np.isscalar(z)
+        z = np.atleast_1d(z)
+
+        test = self._subset(z)
+        absz = np.abs(z)
+        absz[test] = 1.0
+        v = test + (1 - test) * self.t / absz
+
+        if z_isscalar:
+            v = v[0]
+        return v

     def psi_deriv(self, z):
         """
@@ -252,9 +275,10 @@ class HuberT(RobustNorm):
         -----
         Used to estimate the robust covariance matrix.
         """
-        pass
+        return np.less_equal(np.abs(z), self.t).astype(float)


+# TODO: untested, but looks right.  RamsayE not available in R or SAS?
 class RamsayE(RobustNorm):
     """
     Ramsay's Ea for M estimation.
@@ -270,11 +294,11 @@ class RamsayE(RobustNorm):
     statsmodels.robust.norms.RobustNorm
     """

-    def __init__(self, a=0.3):
+    def __init__(self, a=.3):
         self.a = a

     def rho(self, z):
-        """
+        r"""
         The robust criterion function for Ramsay's Ea.

         Parameters
@@ -285,12 +309,14 @@ class RamsayE(RobustNorm):
         Returns
         -------
         rho : ndarray
-            rho(z) = a**-2 * (1 - exp(-a*\\|z\\|)*(1 + a*\\|z\\|))
+            rho(z) = a**-2 * (1 - exp(-a*\|z\|)*(1 + a*\|z\|))
         """
-        pass
+        z = np.asarray(z)
+        return (1 - np.exp(-self.a * np.abs(z)) *
+                (1 + self.a * np.abs(z))) / self.a**2

     def psi(self, z):
-        """
+        r"""
         The psi function for Ramsay's Ea estimator

         The analytic derivative of rho
@@ -303,12 +329,13 @@ class RamsayE(RobustNorm):
         Returns
         -------
         psi : ndarray
-            psi(z) = z*exp(-a*\\|z\\|)
+            psi(z) = z*exp(-a*\|z\|)
         """
-        pass
+        z = np.asarray(z)
+        return z * np.exp(-self.a * np.abs(z))

     def weights(self, z):
-        """
+        r"""
         Ramsay's Ea weighting function for the IRLS algorithm

         The psi function scaled by z
@@ -321,9 +348,11 @@ class RamsayE(RobustNorm):
         Returns
         -------
         weights : ndarray
-            weights(z) = exp(-a*\\|z\\|)
+            weights(z) = exp(-a*\|z\|)
         """
-        pass
+
+        z = np.asarray(z)
+        return np.exp(-self.a * np.abs(z))

     def psi_deriv(self, z):
         """
@@ -333,7 +362,12 @@ class RamsayE(RobustNorm):
         -----
         Used to estimate the robust covariance matrix.
         """
-        pass
+        a = self.a
+        x = np.exp(-a * np.abs(z))
+        dx = -a * x * np.sign(z)
+        y = z
+        dy = 1
+        return x * dy + y * dx


 class AndrewWave(RobustNorm):
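Aside (illustration only): HuberT and RamsayE above express their piecewise definitions through a boolean mask from `_subset` rather than branching, blending the branches as test * branch_a + (1 - test) * branch_b so a whole array is handled in one vectorized pass. A tiny sketch of that pattern using the same Huber formulas:

    import numpy as np

    t = 1.345                       # Huber tuning constant (statsmodels' HuberT default)
    z = np.array([-3.0, -0.5, 0.0, 2.0])
    test = np.abs(z) <= t           # True where the quadratic branch applies

    rho = test * 0.5 * z**2 + (1 - test) * (t * np.abs(z) - 0.5 * t**2)
    psi = test * z + (1 - test) * t * np.sign(z)
    # identical to norms.HuberT(t=1.345).rho(z) and .psi(z)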
@@ -350,7 +384,6 @@ class AndrewWave(RobustNorm):
     --------
     statsmodels.robust.norms.RobustNorm
     """
-
     def __init__(self, a=1.339):
         self.a = a

@@ -358,10 +391,11 @@ class AndrewWave(RobustNorm):
         """
         Andrew's wave is defined piecewise over the range of z.
         """
-        pass
+        z = np.asarray(z)
+        return np.less_equal(np.abs(z), self.a * np.pi)

     def rho(self, z):
-        """
+        r"""
         The robust criterion function for Andrew's wave.

         Parameters
@@ -376,13 +410,18 @@ class AndrewWave(RobustNorm):

             .. math::

-                rho(z) & = a^2 *(1-cos(z/a)), |z| \\leq a\\pi \\\\
-                rho(z) & = 2a, |z|>q\\pi
+                rho(z) & = a^2 *(1-cos(z/a)), |z| \leq a\pi \\
+                rho(z) & = 2a, |z|>q\pi
         """
-        pass
+
+        a = self.a
+        z = np.asarray(z)
+        test = self._subset(z)
+        return (test * a**2 * (1 - np.cos(z / a)) +
+                (1 - test) * a**2 * 2)

     def psi(self, z):
-        """
+        r"""
         The psi function for Andrew's wave

         The analytic derivative of rho
@@ -395,14 +434,18 @@ class AndrewWave(RobustNorm):
         Returns
         -------
         psi : ndarray
-            psi(z) = a * sin(z/a)   for \\|z\\| <= a*pi
+            psi(z) = a * sin(z/a)   for \|z\| <= a*pi

-            psi(z) = 0              for \\|z\\| > a*pi
+            psi(z) = 0              for \|z\| > a*pi
         """
-        pass
+
+        a = self.a
+        z = np.asarray(z)
+        test = self._subset(z)
+        return test * a * np.sin(z / a)

     def weights(self, z):
-        """
+        r"""
         Andrew's wave weighting function for the IRLS algorithm

         The psi function scaled by z
@@ -415,11 +458,23 @@ class AndrewWave(RobustNorm):
         Returns
         -------
         weights : ndarray
-            weights(z) = sin(z/a) / (z/a)     for \\|z\\| <= a*pi
-
-            weights(z) = 0                    for \\|z\\| > a*pi
-        """
-        pass
+            weights(z) = sin(z/a) / (z/a)     for \|z\| <= a*pi
+
+            weights(z) = 0                    for \|z\| > a*pi
+        """
+        a = self.a
+        z = np.asarray(z)
+        test = self._subset(z)
+        ratio = z / a
+        small = np.abs(ratio) < np.finfo(np.double).eps
+        if np.any(small):
+            weights = np.ones_like(ratio)
+            large = ~small
+            ratio = ratio[large]
+            weights[large] = test[large] * np.sin(ratio) / ratio
+        else:
+            weights = test * np.sin(ratio) / ratio
+        return weights

     def psi_deriv(self, z):
         """
@@ -429,9 +484,12 @@ class AndrewWave(RobustNorm):
         -----
         Used to estimate the robust covariance matrix.
         """
-        pass

+        test = self._subset(z)
+        return test * np.cos(z / self.a)

+
+# TODO: this is untested
 class TrimmedMean(RobustNorm):
     """
     Trimmed mean function for M-estimation.
@@ -447,17 +505,19 @@ class TrimmedMean(RobustNorm):
     statsmodels.robust.norms.RobustNorm
     """

-    def __init__(self, c=2.0):
+    def __init__(self, c=2.):
         self.c = c

     def _subset(self, z):
         """
         Least trimmed mean is defined piecewise over the range of z.
         """
-        pass
+
+        z = np.asarray(z)
+        return np.less_equal(np.abs(z), self.c)

     def rho(self, z):
-        """
+        r"""
         The robust criterion function for least trimmed mean.

         Parameters
@@ -468,14 +528,17 @@ class TrimmedMean(RobustNorm):
         Returns
         -------
         rho : ndarray
-            rho(z) = (1/2.)*z**2    for \\|z\\| <= c
+            rho(z) = (1/2.)*z**2    for \|z\| <= c

-            rho(z) = (1/2.)*c**2              for \\|z\\| > c
+            rho(z) = (1/2.)*c**2              for \|z\| > c
         """
-        pass
+
+        z = np.asarray(z)
+        test = self._subset(z)
+        return test * z**2 * 0.5 + (1 - test) * self.c**2 * 0.5

     def psi(self, z):
-        """
+        r"""
         The psi function for least trimmed mean

         The analytic derivative of rho
@@ -488,14 +551,16 @@ class TrimmedMean(RobustNorm):
         Returns
         -------
         psi : ndarray
-            psi(z) = z              for \\|z\\| <= c
+            psi(z) = z              for \|z\| <= c

-            psi(z) = 0              for \\|z\\| > c
+            psi(z) = 0              for \|z\| > c
         """
-        pass
+        z = np.asarray(z)
+        test = self._subset(z)
+        return test * z

     def weights(self, z):
-        """
+        r"""
         Least trimmed mean weighting function for the IRLS algorithm

         The psi function scaled by z
@@ -508,11 +573,13 @@ class TrimmedMean(RobustNorm):
         Returns
         -------
         weights : ndarray
-            weights(z) = 1             for \\|z\\| <= c
+            weights(z) = 1             for \|z\| <= c

-            weights(z) = 0             for \\|z\\| > c
+            weights(z) = 0             for \|z\| > c
         """
-        pass
+        z = np.asarray(z)
+        test = self._subset(z)
+        return test

     def psi_deriv(self, z):
         """
@@ -522,7 +589,8 @@ class TrimmedMean(RobustNorm):
         -----
         Used to estimate the robust covariance matrix.
         """
-        pass
+        test = self._subset(z)
+        return test


 class Hampel(RobustNorm):
@@ -543,7 +611,7 @@ class Hampel(RobustNorm):
     statsmodels.robust.norms.RobustNorm
     """

-    def __init__(self, a=2.0, b=4.0, c=8.0):
+    def __init__(self, a=2., b=4., c=8.):
         self.a = a
         self.b = b
         self.c = c
@@ -552,10 +620,14 @@ class Hampel(RobustNorm):
         """
         Hampel's function is defined piecewise over the range of z
         """
-        pass
+        z = np.abs(np.asarray(z))
+        t1 = np.less_equal(z, self.a)
+        t2 = np.less_equal(z, self.b) * np.greater(z, self.a)
+        t3 = np.less_equal(z, self.c) * np.greater(z, self.b)
+        return t1, t2, t3

     def rho(self, z):
-        """
+        r"""
         The robust criterion function for Hampel's estimator

         Parameters
@@ -566,18 +638,37 @@ class Hampel(RobustNorm):
         Returns
         -------
         rho : ndarray
-            rho(z) = z**2 / 2                     for \\|z\\| <= a
+            rho(z) = z**2 / 2                     for \|z\| <= a

-            rho(z) = a*\\|z\\| - 1/2.*a**2               for a < \\|z\\| <= b
+            rho(z) = a*\|z\| - 1/2.*a**2               for a < \|z\| <= b

-            rho(z) = a*(c - \\|z\\|)**2 / (c - b) / 2    for b < \\|z\\| <= c
+            rho(z) = a*(c - \|z\|)**2 / (c - b) / 2    for b < \|z\| <= c

-            rho(z) = a*(b + c - a) / 2                 for \\|z\\| > c
+            rho(z) = a*(b + c - a) / 2                 for \|z\| > c
         """
-        pass
+        a, b, c = self.a, self.b, self.c
+
+        z_isscalar = np.isscalar(z)
+        z = np.atleast_1d(z)
+
+        t1, t2, t3 = self._subset(z)
+        t34 = ~(t1 | t2)
+        dt = np.promote_types(z.dtype, "float")
+        v = np.zeros(z.shape, dtype=dt)
+        z = np.abs(z)
+        v[t1] = z[t1]**2 * 0.5
+        # v[t2] = (a * (z[t2] - a) + a**2 * 0.5)
+        v[t2] = (a * z[t2] - a**2 * 0.5)
+        v[t3] = a * (c - z[t3])**2 / (c - b) * (-0.5)
+        v[t34] += a * (b + c - a) * 0.5
+
+        if z_isscalar:
+            v = v[0]
+
+        return v

     def psi(self, z):
-        """
+        r"""
         The psi function for Hampel's estimator

         The analytic derivative of rho
@@ -590,18 +681,35 @@ class Hampel(RobustNorm):
         Returns
         -------
         psi : ndarray
-            psi(z) = z                            for \\|z\\| <= a
+            psi(z) = z                            for \|z\| <= a

-            psi(z) = a*sign(z)                    for a < \\|z\\| <= b
+            psi(z) = a*sign(z)                    for a < \|z\| <= b

-            psi(z) = a*sign(z)*(c - \\|z\\|)/(c-b)    for b < \\|z\\| <= c
+            psi(z) = a*sign(z)*(c - \|z\|)/(c-b)    for b < \|z\| <= c

-            psi(z) = 0                            for \\|z\\| > c
+            psi(z) = 0                            for \|z\| > c
         """
-        pass
+        a, b, c = self.a, self.b, self.c
+
+        z_isscalar = np.isscalar(z)
+        z = np.atleast_1d(z)
+
+        t1, t2, t3 = self._subset(z)
+        dt = np.promote_types(z.dtype, "float")
+        v = np.zeros(z.shape, dtype=dt)
+        s = np.sign(z)
+        za = np.abs(z)
+
+        v[t1] = z[t1]
+        v[t2] = a * s[t2]
+        v[t3] = a * s[t3] * (c - za[t3]) / (c - b)
+
+        if z_isscalar:
+            v = v[0]
+        return v

     def weights(self, z):
-        """
+        r"""
         Hampel weighting function for the IRLS algorithm

         The psi function scaled by z
@@ -614,20 +722,52 @@ class Hampel(RobustNorm):
         Returns
         -------
         weights : ndarray
-            weights(z) = 1                                for \\|z\\| <= a
+            weights(z) = 1                                for \|z\| <= a

-            weights(z) = a/\\|z\\|                          for a < \\|z\\| <= b
+            weights(z) = a/\|z\|                          for a < \|z\| <= b

-            weights(z) = a*(c - \\|z\\|)/(\\|z\\|*(c-b))      for b < \\|z\\| <= c
+            weights(z) = a*(c - \|z\|)/(\|z\|*(c-b))      for b < \|z\| <= c

-            weights(z) = 0                                for \\|z\\| > c
+            weights(z) = 0                                for \|z\| > c
         """
-        pass
+        a, b, c = self.a, self.b, self.c
+
+        z_isscalar = np.isscalar(z)
+        z = np.atleast_1d(z)
+
+        t1, t2, t3 = self._subset(z)
+
+        dt = np.promote_types(z.dtype, "float")
+        v = np.zeros(z.shape, dtype=dt)
+        v[t1] = 1.0
+        abs_z = np.abs(z)
+        v[t2] = a / abs_z[t2]
+        abs_zt3 = abs_z[t3]
+        v[t3] = a * (c - abs_zt3) / (abs_zt3 * (c - b))
+
+        if z_isscalar:
+            v = v[0]
+        return v

     def psi_deriv(self, z):
         """Derivative of psi function, second derivative of rho function.
         """
-        pass
+        a, b, c = self.a, self.b, self.c
+
+        z_isscalar = np.isscalar(z)
+        z = np.atleast_1d(z)
+
+        t1, _, t3 = self._subset(z)
+
+        dt = np.promote_types(z.dtype, "float")
+        d = np.zeros(z.shape, dtype=dt)
+        d[t1] = 1.0
+        zt3 = z[t3]
+        d[t3] = -(a * np.sign(zt3) * zt3) / (np.abs(zt3) * (c - b))
+
+        if z_isscalar:
+            d = d[0]
+        return d


 class TukeyBiweight(RobustNorm):
@@ -653,10 +793,11 @@ class TukeyBiweight(RobustNorm):
         """
         Tukey's biweight is defined piecewise over the range of z
         """
-        pass
+        z = np.abs(np.asarray(z))
+        return np.less_equal(z, self.c)

     def rho(self, z):
-        """
+        r"""
         The robust criterion function for Tukey's biweight estimator

         Parameters
@@ -667,14 +808,16 @@ class TukeyBiweight(RobustNorm):
         Returns
         -------
         rho : ndarray
-            rho(z) = -(1 - (z/c)**2)**3 * c**2/6.   for \\|z\\| <= R
+            rho(z) = -(1 - (z/c)**2)**3 * c**2/6.   for \|z\| <= R

-            rho(z) = 0                              for \\|z\\| > R
+            rho(z) = 0                              for \|z\| > R
         """
-        pass
+        subset = self._subset(z)
+        factor = self.c**2 / 6.
+        return -(1 - (z / self.c)**2)**3 * subset * factor + factor

     def psi(self, z):
-        """
+        r"""
         The psi function for Tukey's biweight estimator

         The analytic derivative of rho
@@ -687,14 +830,17 @@ class TukeyBiweight(RobustNorm):
         Returns
         -------
         psi : ndarray
-            psi(z) = z*(1 - (z/c)**2)**2        for \\|z\\| <= R
+            psi(z) = z*(1 - (z/c)**2)**2        for \|z\| <= R

-            psi(z) = 0                           for \\|z\\| > R
+            psi(z) = 0                           for \|z\| > R
         """
-        pass
+
+        z = np.asarray(z)
+        subset = self._subset(z)
+        return z * (1 - (z / self.c)**2)**2 * subset

     def weights(self, z):
-        """
+        r"""
         Tukey's biweight weighting function for the IRLS algorithm

         The psi function scaled by z
@@ -707,11 +853,13 @@ class TukeyBiweight(RobustNorm):
         Returns
         -------
         weights : ndarray
-            psi(z) = (1 - (z/c)**2)**2          for \\|z\\| <= R
+            psi(z) = (1 - (z/c)**2)**2          for \|z\| <= R

-            psi(z) = 0                          for \\|z\\| > R
+            psi(z) = 0                          for \|z\| > R
         """
-        pass
+
+        subset = self._subset(z)
+        return (1 - (z / self.c)**2)**2 * subset

     def psi_deriv(self, z):
         """
@@ -721,7 +869,9 @@ class TukeyBiweight(RobustNorm):
         -----
         Used to estimate the robust covariance matrix.
         """
-        pass
+        subset = self._subset(z)
+        return subset * ((1 - (z/self.c)**2)**2
+                         - (4*z**2/self.c**2) * (1-(z/self.c)**2))


 class MQuantileNorm(RobustNorm):
@@ -778,6 +928,15 @@ class MQuantileNorm(RobustNorm):
         self.q = q
         self.base_norm = base_norm

+    def _get_q(self, z):
+
+        nobs = len(z)
+        mask_neg = (z < 0)  # if self.q < 0.5 else (z <= 0)  # maybe symmetric
+        qq = np.empty(nobs)
+        qq[mask_neg] = 1 - self.q
+        qq[~mask_neg] = self.q
+        return qq
+
     def rho(self, z):
         """
         The robust criterion function for MQuantileNorm.
@@ -791,7 +950,8 @@ class MQuantileNorm(RobustNorm):
         -------
         rho : ndarray
         """
-        pass
+        qq = self._get_q(z)
+        return qq * self.base_norm.rho(z)

     def psi(self, z):
         """
@@ -808,7 +968,8 @@ class MQuantileNorm(RobustNorm):
         -------
         psi : ndarray
         """
-        pass
+        qq = self._get_q(z)
+        return qq * self.base_norm.psi(z)

     def weights(self, z):
         """
@@ -825,10 +986,11 @@ class MQuantileNorm(RobustNorm):
         -------
         weights : ndarray
         """
-        pass
+        qq = self._get_q(z)
+        return qq * self.base_norm.weights(z)

     def psi_deriv(self, z):
-        """
+        '''
         The derivative of MQuantileNorm function

         Parameters
@@ -843,8 +1005,9 @@ class MQuantileNorm(RobustNorm):
         Notes
         -----
         Used to estimate the robust covariance matrix.
-        """
-        pass
+        '''
+        qq = self._get_q(z)
+        return qq * self.base_norm.psi_deriv(z)

     def __call__(self, z):
         """
@@ -853,8 +1016,8 @@ class MQuantileNorm(RobustNorm):
         return self.rho(z)


-def estimate_location(a, scale, norm=None, axis=0, initial=None, maxiter=30,
-    tol=1e-06):
+def estimate_location(a, scale, norm=None, axis=0, initial=None,
+                      maxiter=30, tol=1.0e-06):
     """
     M-estimator of location using self.norm and a current
     estimator of scale.
@@ -886,4 +1049,20 @@ def estimate_location(a, scale, norm=None, axis=0, initial=None, maxiter=30,
     mu : ndarray
         Estimate of location
     """
-    pass
+    if norm is None:
+        norm = HuberT()
+
+    if initial is None:
+        mu = np.median(a, axis)
+    else:
+        mu = initial
+
+    for _ in range(maxiter):
+        W = norm.weights((a-mu)/scale)
+        nmu = np.sum(W*a, axis) / np.sum(W, axis)
+        if np.all(np.less(np.abs(mu - nmu), scale * tol)):
+            return nmu
+        else:
+            mu = nmu
+    raise ValueError("location estimator failed to converge in %d iterations"
+                     % maxiter)
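A small usage sketch (synthetic data, not part of the patch) of the module-level helper implemented above: `estimate_location` iterates norm-based reweighting around the current center, so a handful of gross outliers barely move it, unlike the mean:

    import numpy as np
    from statsmodels.robust.norms import HuberT, estimate_location
    from statsmodels.robust.scale import mad

    rng = np.random.default_rng(0)
    a = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 25.0)])  # 5 outliers

    loc = estimate_location(a, scale=mad(a), norm=HuberT())
    print(np.mean(a), np.median(a), loc)  # the mean is dragged up, loc is not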
diff --git a/statsmodels/robust/robust_linear_model.py b/statsmodels/robust/robust_linear_model.py
index c3f3b6fc0..b3df91ff8 100644
--- a/statsmodels/robust/robust_linear_model.py
+++ b/statsmodels/robust/robust_linear_model.py
@@ -15,6 +15,7 @@ R Venables, B Ripley. 'Modern Applied Statistics in S'  Springer, New York,
 """
 import numpy as np
 import scipy.stats as stats
+
 import statsmodels.base.model as base
 import statsmodels.base.wrapper as wrap
 import statsmodels.regression._tools as reg_tools
@@ -23,12 +24,17 @@ import statsmodels.robust.norms as norms
 import statsmodels.robust.scale as scale
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.sm_exceptions import ConvergenceWarning
+
 __all__ = ['RLM']


+def _check_convergence(criterion, iteration, tol, maxiter):
+    cond = np.abs(criterion[iteration] - criterion[iteration - 1])
+    return not (np.any(cond > tol) and iteration < maxiter)
+
+
 class RLM(base.LikelihoodModel):
-    __doc__ = (
-        """
+    __doc__ = """
     Robust Linear Model

     Estimate a robust linear model via iteratively reweighted least squares
@@ -79,7 +85,8 @@ class RLM(base.LikelihoodModel):
     >>> import statsmodels.api as sm
     >>> data = sm.datasets.stackloss.load()
     >>> data.exog = sm.add_constant(data.exog)
-    >>> rlm_model = sm.RLM(data.endog, data.exog,                            M=sm.robust.norms.HuberT())
+    >>> rlm_model = sm.RLM(data.endog, data.exog, \
+                           M=sm.robust.norms.HuberT())

     >>> rlm_results = rlm_model.fit()
     >>> rlm_results.params
@@ -95,16 +102,17 @@ class RLM(base.LikelihoodModel):
     >>> rlm_hamp_hub = mod.fit(scale_est=sm.robust.scale.HuberScale())
     >>> rlm_hamp_hub.params
     array([  0.73175452,   1.25082038,  -0.14794399, -40.27122257])
-    """
-         % {'params': base._model_params_doc, 'extra_params': base.
-        _missing_param_doc})
+    """ % {'params': base._model_params_doc,
+           'extra_params': base._missing_param_doc}

-    def __init__(self, endog, exog, M=None, missing='none', **kwargs):
+    def __init__(self, endog, exog, M=None, missing='none',
+                 **kwargs):
         self._check_kwargs(kwargs)
         self.M = M if M is not None else norms.HuberT()
-        super(base.LikelihoodModel, self).__init__(endog, exog, missing=
-            missing, **kwargs)
+        super(base.LikelihoodModel, self).__init__(endog, exog,
+                                                   missing=missing, **kwargs)
         self._initialize()
+        # things to remove_data
         self._data_attr.extend(['weights', 'pinv_wexog'])

     def _initialize(self):
@@ -113,7 +121,19 @@ class RLM(base.LikelihoodModel):

         Resets the history and number of iterations.
         """
-        pass
+        self.pinv_wexog = np.linalg.pinv(self.exog)
+        self.normalized_cov_params = np.dot(self.pinv_wexog,
+                                            np.transpose(self.pinv_wexog))
+        self.df_resid = (float(self.exog.shape[0] -
+                               np.linalg.matrix_rank(self.exog)))
+        self.df_model = float(np.linalg.matrix_rank(self.exog) - 1)
+        self.nobs = float(self.endog.shape[0])
+
+    def score(self, params):
+        raise NotImplementedError
+
+    def information(self, params):
+        raise NotImplementedError

     def predict(self, params, exog=None):
         """
@@ -130,22 +150,49 @@ class RLM(base.LikelihoodModel):
         -------
         An array of fitted values
         """
-        pass
+        # copied from linear_model  # TODO: then is it needed?
+        if exog is None:
+            exog = self.exog
+        return np.dot(exog, params)
+
+    def loglike(self, params):
+        raise NotImplementedError

     def deviance(self, tmp_results):
         """
         Returns the (unnormalized) log-likelihood from the M estimator.
         """
-        pass
+        tmp_resid = self.endog - tmp_results.fittedvalues
+        return self.M(tmp_resid / tmp_results.scale).sum()
+
+    def _update_history(self, tmp_results, history, conv):
+        history['params'].append(tmp_results.params)
+        history['scale'].append(tmp_results.scale)
+        if conv == 'dev':
+            history['deviance'].append(self.deviance(tmp_results))
+        elif conv == 'sresid':
+            history['sresid'].append(tmp_results.resid / tmp_results.scale)
+        elif conv == 'weights':
+            history['weights'].append(tmp_results.model.weights)
+        return history

     def _estimate_scale(self, resid):
         """
         Estimates the scale based on the option provided to the fit method.
         """
-        pass
-
-    def fit(self, maxiter=50, tol=1e-08, scale_est='mad', init=None, cov=
-        'H1', update_scale=True, conv='dev', start_params=None):
+        if isinstance(self.scale_est, str):
+            if self.scale_est.lower() == 'mad':
+                return scale.mad(resid, center=0)
+            else:
+                raise ValueError("Option %s for scale_est not understood" %
+                                 self.scale_est)
+        elif isinstance(self.scale_est, scale.HuberScale):
+            return self.scale_est(self.df_resid, self.nobs, resid)
+        else:
+            return scale.scale_est(self, resid) ** 2
+
+    def fit(self, maxiter=50, tol=1e-8, scale_est='mad', init=None, cov='H1',
+            update_scale=True, conv='dev', start_params=None):
         """
         Fits the model using iteratively reweighted least squares.

@@ -193,7 +240,77 @@ class RLM(base.LikelihoodModel):
         results : statsmodels.rlm.RLMresults
             Results instance
         """
-        pass
+        if cov.upper() not in ["H1", "H2", "H3"]:
+            raise ValueError("Covariance matrix %s not understood" % cov)
+        else:
+            self.cov = cov.upper()
+        conv = conv.lower()
+        if conv not in ["weights", "coefs", "dev", "sresid"]:
+            raise ValueError("Convergence argument %s not understood" % conv)
+        self.scale_est = scale_est
+
+        if start_params is None:
+            wls_results = lm.WLS(self.endog, self.exog).fit()
+        else:
+            start_params = np.asarray(start_params, dtype=np.double).squeeze()
+            if (start_params.shape[0] != self.exog.shape[1] or
+                    start_params.ndim != 1):
+                raise ValueError('start_params must be a 1-d array with {0} '
+                                 'values'.format(self.exog.shape[1]))
+            fake_wls = reg_tools._MinimalWLS(self.endog, self.exog,
+                                             weights=np.ones_like(self.endog),
+                                             check_weights=False)
+            wls_results = fake_wls.results(start_params)
+
+        if not init:
+            self.scale = self._estimate_scale(wls_results.resid)
+
+        history = dict(params=[np.inf], scale=[])
+        if conv == 'coefs':
+            criterion = history['params']
+        elif conv == 'dev':
+            history.update(dict(deviance=[np.inf]))
+            criterion = history['deviance']
+        elif conv == 'sresid':
+            history.update(dict(sresid=[np.inf]))
+            criterion = history['sresid']
+        elif conv == 'weights':
+            history.update(dict(weights=[np.inf]))
+            criterion = history['weights']
+
+        # done one iteration so update
+        history = self._update_history(wls_results, history, conv)
+        iteration = 1
+        converged = 0
+        while not converged:
+            if self.scale == 0.0:
+                import warnings
+                warnings.warn('Estimated scale is 0.0 indicating that the most'
+                              ' recent iteration produced a perfect fit of the '
+                              'weighted data.', ConvergenceWarning)
+                break
+            self.weights = self.M.weights(wls_results.resid / self.scale)
+            wls_results = reg_tools._MinimalWLS(self.endog, self.exog,
+                                                weights=self.weights,
+                                                check_weights=True).fit()
+            if update_scale is True:
+                self.scale = self._estimate_scale(wls_results.resid)
+            history = self._update_history(wls_results, history, conv)
+            iteration += 1
+            converged = _check_convergence(criterion, iteration, tol, maxiter)
+        results = RLMResults(self, wls_results.params,
+                             self.normalized_cov_params, self.scale)
+
+        history['iteration'] = iteration
+        results.fit_history = history
+        results.fit_options = dict(cov=cov.upper(), scale_est=scale_est,
+                                   norm=self.M.__class__.__name__, conv=conv)
+        # norm is not changed in fit, no old state
+
+        # doing the next causes exception
+        # self.cov = self.scale_est = None #reset for additional fits
+        # iteration and history could contain wrong state with repeated fit
+        return RLMResultsWrapper(results)


 class RLMResults(base.LikelihoodModelResults):
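Aside (stripped-down illustration, not the statsmodels implementation): the `fit` method above is a standard IRLS loop: standardize the residuals by the current scale, map them through the norm's weight function, refit by weighted least squares, and stop once the convergence criterion stops moving. The same idea in plain NumPy, with only the 'coefs' criterion and no missing-data handling:

    import numpy as np
    from statsmodels.robust.norms import HuberT
    from statsmodels.robust.scale import mad

    def irls(y, X, norm=None, maxiter=50, tol=1e-8):
        norm = HuberT() if norm is None else norm
        params = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS start values
        for _ in range(maxiter):
            resid = y - X @ params
            s = mad(resid, center=0)                    # robust scale estimate
            w = norm.weights(resid / s)
            WX = X * w[:, None]
            new = np.linalg.solve(X.T @ WX, WX.T @ y)   # weighted normal equations
            if np.all(np.abs(new - params) < tol):
                return new
            params = new
        return params

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(100), rng.standard_normal(100)])
    y = X @ np.array([1.0, 2.0]) + rng.standard_normal(100)
    y[:5] += 15                     # contaminate a few observations
    print(irls(y, X))               # stays close to [1, 2] despite the outliers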
@@ -284,24 +401,131 @@ class RLMResults(base.LikelihoodModelResults):

     def __init__(self, model, params, normalized_cov_params, scale):
         super(RLMResults, self).__init__(model, params,
-            normalized_cov_params, scale)
+                                         normalized_cov_params, scale)
         self.model = model
         self.df_model = model.df_model
         self.df_resid = model.df_resid
         self.nobs = model.nobs
         self._cache = {}
+        # for remove_data
         self._data_in_cache.extend(['sresid'])
-        self.cov_params_default = self.bcov_scaled

-    def summary(self, yname=None, xname=None, title=0, alpha=0.05,
-        return_fmt='text'):
+        self.cov_params_default = self.bcov_scaled
+        # TODO: "pvals" should come from chisq on bse?
+
+    @cache_readonly
+    def fittedvalues(self):
+        return np.dot(self.model.exog, self.params)
+
+    @cache_readonly
+    def resid(self):
+        return self.model.endog - self.fittedvalues  # before bcov
+
+    @cache_readonly
+    def sresid(self):
+        if self.scale == 0.0:
+            sresid = self.resid.copy()
+            sresid[:] = 0.0
+            return sresid
+        return self.resid / self.scale
+
+    @cache_readonly
+    def bcov_unscaled(self):
+        return self.normalized_cov_params
+
+    @cache_readonly
+    def weights(self):
+        return self.model.weights
+
+    @cache_readonly
+    def bcov_scaled(self):
+        model = self.model
+        m = np.mean(model.M.psi_deriv(self.sresid))
+        var_psiprime = np.var(model.M.psi_deriv(self.sresid))
+        k = 1 + (self.df_model + 1) / self.nobs * var_psiprime / m ** 2
+
+        if model.cov == "H1":
+            ss_psi = np.sum(model.M.psi(self.sresid) ** 2)
+            s_psi_deriv = np.sum(model.M.psi_deriv(self.sresid))
+            return k ** 2 * (1 / self.df_resid * ss_psi * self.scale ** 2) /\
+                ((1 / self.nobs * s_psi_deriv) ** 2) *\
+                model.normalized_cov_params
+        else:
+            W = np.dot(model.M.psi_deriv(self.sresid) * model.exog.T,
+                       model.exog)
+            W_inv = np.linalg.inv(W)
+            # [W_jk]^-1 = [SUM(psi_deriv(Sr_i)*x_ij*x_jk)]^-1
+            # where Sr are the standardized residuals
+            if model.cov == "H2":
+                # These are correct, based on Huber (1973) 8.13
+                return k * (1 / self.df_resid) * np.sum(
+                    model.M.psi(self.sresid) ** 2) * self.scale ** 2 \
+                       / ((1 / self.nobs) *
+                          np.sum(model.M.psi_deriv(self.sresid))) * W_inv
+            elif model.cov == "H3":
+                return k ** -1 * 1 / self.df_resid * np.sum(
+                    model.M.psi(self.sresid) ** 2) * self.scale ** 2 \
+                       * np.dot(
+                    np.dot(W_inv, np.dot(model.exog.T, model.exog)),
+                    W_inv)
+
+    @cache_readonly
+    def pvalues(self):
+        return stats.norm.sf(np.abs(self.tvalues)) * 2
+
+    @cache_readonly
+    def bse(self):
+        return np.sqrt(np.diag(self.bcov_scaled))
+
+    @cache_readonly
+    def chisq(self):
+        return (self.params / self.bse) ** 2
+
+    def summary(self, yname=None, xname=None, title=0, alpha=.05,
+                return_fmt='text'):
         """
         This is for testing the new summary setup
         """
-        pass
-
-    def summary2(self, xname=None, yname=None, title=None, alpha=0.05,
-        float_format='%.4f'):
+        top_left = [('Dep. Variable:', None),
+                    ('Model:', None),
+                    ('Method:', ['IRLS']),
+                    ('Norm:', [self.fit_options['norm']]),
+                    ('Scale Est.:', [self.fit_options['scale_est']]),
+                    ('Cov Type:', [self.fit_options['cov']]),
+                    ('Date:', None),
+                    ('Time:', None),
+                    ('No. Iterations:', ["%d" % self.fit_history['iteration']])
+                    ]
+        top_right = [('No. Observations:', None),
+                     ('Df Residuals:', None),
+                     ('Df Model:', None)
+                     ]
+
+        if title is not None:
+            title = "Robust Linear Model Regression Results"
+
+        # boiler plate
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right,
+                             yname=yname, xname=xname, title=title)
+        smry.add_table_params(self, yname=yname, xname=xname, alpha=alpha,
+                              use_t=self.use_t)
+
+        # add warnings/notes, added to text format only
+        etext = []
+        wstr = ("If the model instance has been used for another fit with "
+                "different fit parameters, then the fit options might not be "
+                "the correct ones anymore.")
+        etext.append(wstr)
+
+        if etext:
+            smry.add_extra_txt(etext)
+
+        return smry
+
+    def summary2(self, xname=None, yname=None, title=None, alpha=.05,
+                 float_format="%.4f"):
         """Experimental summary function for regression results

         Parameters
@@ -330,11 +554,16 @@ class RLMResults(base.LikelihoodModelResults):
         --------
         statsmodels.iolib.summary2.Summary : class to hold summary results
         """
-        pass
+        from statsmodels.iolib import summary2
+        smry = summary2.Summary()
+        smry.add_base(results=self, alpha=alpha, float_format=float_format,
+                      xname=xname, yname=yname, title=title)
+
+        return smry


 class RLMResultsWrapper(lm.RegressionResultsWrapper):
     pass


-wrap.populate_wrapper(RLMResultsWrapper, RLMResults)
+wrap.populate_wrapper(RLMResultsWrapper, RLMResults)  # noqa:E305
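
Not part of the patch: a minimal usage sketch of the restored RLM.fit / RLMResults.summary path above. The data, norm, and fit options are illustrative and assume a standard statsmodels install.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
exog = sm.add_constant(rng.normal(size=(100, 2)))
endog = exog @ np.array([1.0, 2.0, -1.0]) + rng.standard_t(df=3, size=100)

# Huber's T norm, H2 covariance, convergence checked on the weights
res = sm.RLM(endog, exog, M=sm.robust.norms.HuberT()).fit(cov="H2", conv="weights")
print(res.params, res.bse)
print(res.summary())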
diff --git a/statsmodels/robust/scale.py b/statsmodels/robust/scale.py
index dca2ed321..1927d4db7 100644
--- a/statsmodels/robust/scale.py
+++ b/statsmodels/robust/scale.py
@@ -13,8 +13,10 @@ estimators of scale' Computational statistics. Physica, Heidelberg, 1992.
 """
 import numpy as np
 from scipy.stats import norm as Gaussian
+
 from statsmodels.tools import tools
 from statsmodels.tools.validation import array_like, float_like
+
 from . import norms
 from ._qn import _qn

@@ -42,7 +44,26 @@ def mad(a, c=Gaussian.ppf(3 / 4.0), axis=0, center=np.median):
     mad : float
         `mad` = median(abs(`a` - center))/`c`
     """
-    pass
+    a = array_like(a, "a", ndim=None)
+    c = float_like(c, "c")
+    if not a.size:
+        center_val = 0.0
+    elif callable(center):
+        if axis is not None:
+            center_val = np.apply_over_axes(center, a, axis)
+        else:
+            center_val = center(a.ravel())
+    else:
+        center_val = float_like(center, "center")
+    err = (np.abs(a - center_val)) / c
+    if not err.size:
+        if axis is None or err.ndim == 1:
+            return np.nan
+        else:
+            shape = list(err.shape)
+            shape.pop(axis)
+            return np.empty(shape)
+    return np.median(err, axis=axis)


 def iqr(a, c=Gaussian.ppf(3 / 4) - Gaussian.ppf(1 / 4), axis=0):
@@ -65,7 +86,16 @@ def iqr(a, c=Gaussian.ppf(3 / 4) - Gaussian.ppf(1 / 4), axis=0):
     -------
     The normalized interquartile range
     """
-    pass
+    a = array_like(a, "a", ndim=None)
+    c = float_like(c, "c")
+
+    if a.ndim == 0:
+        raise ValueError("a should have at least one dimension")
+    elif a.size == 0:
+        return np.nan
+    else:
+        quantiles = np.quantile(a, [0.25, 0.75], axis=axis)
+        return np.squeeze(np.diff(quantiles, axis=0) / c)


 def qn_scale(a, c=1 / (np.sqrt(2) * Gaussian.ppf(5 / 8)), axis=0):
@@ -95,7 +125,19 @@ def qn_scale(a, c=1 / (np.sqrt(2) * Gaussian.ppf(5 / 8)), axis=0):
     {float, ndarray}
         The Qn robust estimator of scale
     """
-    pass
+    a = array_like(
+        a, "a", ndim=None, dtype=np.float64, contiguous=True, order="C"
+    )
+    c = float_like(c, "c")
+    if a.ndim == 0:
+        raise ValueError("a should have at least one dimension")
+    elif a.size == 0:
+        return np.nan
+    else:
+        out = np.apply_along_axis(_qn, axis=axis, arr=a, c=c)
+        if out.ndim == 0:
+            return float(out)
+        return out


 def _qn_naive(a, c=1 / (np.sqrt(2) * Gaussian.ppf(5 / 8))):
@@ -116,7 +158,18 @@ def _qn_naive(a, c=1 / (np.sqrt(2) * Gaussian.ppf(5 / 8))):
     -------
     The Qn robust estimator of scale
     """
-    pass
+    a = np.squeeze(a)
+    n = a.shape[0]
+    if a.size == 0:
+        return np.nan
+    else:
+        h = int(n // 2 + 1)
+        k = int(h * (h - 1) / 2)
+        idx = np.triu_indices(n, k=1)
+        diffs = np.abs(a[idx[0]] - a[idx[1]])
+        output = np.partition(diffs, kth=k - 1)[k - 1]
+        output = c * output
+        return output


 class Huber:
@@ -150,7 +203,7 @@ class Huber:
     (array(3.2054980819923693), array(0.67365260010478967))
     """

-    def __init__(self, c=1.5, tol=1e-08, maxiter=30, norm=None):
+    def __init__(self, c=1.5, tol=1.0e-08, maxiter=30, norm=None):
         self.c = c
         self.maxiter = maxiter
         self.tol = tol
@@ -194,6 +247,7 @@ class Huber:
             n = a.shape[0]
             mu = mu
             est_mu = False
+
         if initscale is None:
             scale = mad(a, axis=axis)
         else:
@@ -212,15 +266,59 @@ class Huber:

         where estimate_location is an M-estimator and estimate_scale implements
         the check used in Section 5.5 of Venables & Ripley
-        """
-        pass
+        """  # noqa:E501
+        for _ in range(self.maxiter):
+            # Estimate the mean along a given axis
+            if est_mu:
+                if self.norm is None:
+                    # This is a one-step fixed-point estimator
+                    # if self.norm == norms.HuberT
+                    # It should be faster than using norms.HuberT
+                    nmu = (
+                        np.clip(
+                            a, mu - self.c * scale, mu + self.c * scale
+                        ).sum(axis)
+                        / a.shape[axis]
+                    )
+                else:
+                    nmu = norms.estimate_location(
+                        a, scale, self.norm, axis, mu, self.maxiter, self.tol
+                    )
+            else:
+                # Effectively, do nothing
+                nmu = mu.squeeze()
+            nmu = tools.unsqueeze(nmu, axis, a.shape)
+
+            subset = np.less_equal(np.abs((a - mu) / scale), self.c)
+            card = subset.sum(axis)
+
+            scale_num = np.sum(subset * (a - nmu) ** 2, axis)
+            scale_denom = n * self.gamma - (a.shape[axis] - card) * self.c ** 2
+            nscale = np.sqrt(scale_num / scale_denom)
+            nscale = tools.unsqueeze(nscale, axis, a.shape)
+
+            test1 = np.all(
+                np.less_equal(np.abs(scale - nscale), nscale * self.tol)
+            )
+            test2 = np.all(
+                np.less_equal(np.abs(mu - nmu), nscale * self.tol)
+            )
+            if not (test1 and test2):
+                mu = nmu
+                scale = nscale
+            else:
+                return nmu.squeeze(), nscale.squeeze()
+        raise ValueError(
+            "joint estimation of location and scale failed "
+            "to converge in %d iterations" % self.maxiter
+        )


 huber = Huber()


 class HuberScale:
-    """
+    r"""
     Huber's scaling for fitting robust linear models.

     Huber's scale is intended to be used as the scale estimate in the
@@ -248,8 +346,8 @@ class HuberScale:

     where the Huber function is

-    chi(x) = (x**2)/2       for \\|x\\| < d
-    chi(x) = (d**2)/2       for \\|x\\| >= d
+    chi(x) = (x**2)/2       for \|x\| < d
+    chi(x) = (d**2)/2       for \|x\| >= d

     and the Huber constant h = (n-p)/n*(d**2 + (1-d**2)*
     scipy.stats.norm.cdf(d) - .5 - d*sqrt(2*pi)*exp(-0.5*d**2)
@@ -261,9 +359,16 @@ class HuberScale:
         self.maxiter = maxiter

     def __call__(self, df_resid, nobs, resid):
-        h = df_resid / nobs * (self.d ** 2 + (1 - self.d ** 2) * Gaussian.
-            cdf(self.d) - 0.5 - self.d / np.sqrt(2 * np.pi) * np.exp(-0.5 *
-            self.d ** 2))
+        h = (
+            df_resid
+            / nobs
+            * (
+                self.d ** 2
+                + (1 - self.d ** 2) * Gaussian.cdf(self.d)
+                - 0.5
+                - self.d / (np.sqrt(2 * np.pi)) * np.exp(-0.5 * self.d ** 2)
+            )
+        )
         s = mad(resid)

         def subset(x):
@@ -271,15 +376,24 @@ class HuberScale:

         def chi(s):
             return subset(s) * (resid / s) ** 2 / 2 + (1 - subset(s)) * (
-                self.d ** 2 / 2)
+                self.d ** 2 / 2
+            )
+
         scalehist = [np.inf, s]
         niter = 1
-        while np.abs(scalehist[niter - 1] - scalehist[niter]
-            ) > self.tol and niter < self.maxiter:
-            nscale = np.sqrt(1 / (nobs * h) * np.sum(chi(scalehist[-1])) * 
-                scalehist[-1] ** 2)
+        while (
+            np.abs(scalehist[niter - 1] - scalehist[niter]) > self.tol
+            and niter < self.maxiter
+        ):
+            nscale = np.sqrt(
+                1
+                / (nobs * h)
+                * np.sum(chi(scalehist[-1]))
+                * scalehist[-1] ** 2
+            )
             scalehist.append(nscale)
             niter += 1
+            # TODO: raise on convergence failure?
         return scalehist[-1]


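Not part of the patch: a quick, illustrative sanity check of the restored robust scale estimators on a contaminated sample (the printed values are not from the test suite).

import numpy as np
from statsmodels.robust.scale import mad, iqr, qn_scale, huber

rng = np.random.default_rng(42)
x = rng.normal(scale=2.0, size=1000)
x[:20] += 50.0                  # gross outliers should barely move the estimates

print("mad:", mad(x))           # normalized median absolute deviation
print("iqr:", iqr(x))           # normalized interquartile range
print("qn :", qn_scale(x))      # Rousseeuw-Croux Qn estimator
loc, scale = huber(x)           # joint Huber location/scale (Huber's proposal 2)
print("huber:", loc, scale)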
diff --git a/statsmodels/sandbox/archive/linalg_covmat.py b/statsmodels/sandbox/archive/linalg_covmat.py
index 043a1dbd6..fc1c79c3d 100644
--- a/statsmodels/sandbox/archive/linalg_covmat.py
+++ b/statsmodels/sandbox/archive/linalg_covmat.py
@@ -1,23 +1,39 @@
 import math
 import numpy as np
 from scipy import linalg, stats, special
+
 from .linalg_decomp_1 import SvdArray
+
+
+#univariate standard normal distribution
+#following from scipy.stats.distributions with adjustments
 sqrt2pi = math.sqrt(2 * np.pi)
 logsqrt2pi = math.log(sqrt2pi)

-
 class StandardNormal:
-    """Distribution of vector x, with independent distribution N(0,1)
+    '''Distribution of vector x, with independent distribution N(0,1)

     this is the same as univariate normal for pdf and logpdf

     other methods not checked/adjusted yet

-    """
+    '''
+    def rvs(self, size):
+        return np.random.standard_normal(size)
+    def pdf(self, x):
+        return np.exp(-x**2 * 0.5) / sqrt2pi
+    def logpdf(self, x):
+        return -x**2 * 0.5 - logsqrt2pi
+    def _cdf(self, x):
+        return special.ndtr(x)
+    def _logcdf(self, x):
+        return np.log(special.ndtr(x))
+    def _ppf(self, q):
+        return special.ndtri(q)


 class AffineTransform:
-    """affine full rank transformation of a multivariate distribution
+    '''affine full rank transformation of a multivariate distribution

     no dimension checking, assumes everything broadcasts correctly
     first version without bound support
@@ -25,8 +41,7 @@ class AffineTransform:
     provides distribution of y given distribution of x
     y = const + tmat * x

-    """
-
+    '''
     def __init__(self, const, tmat, dist):
         self.const = const
         self.tmat = tmat
@@ -34,14 +49,36 @@ class AffineTransform:
         self.nrv = len(const)
         if not np.equal(self.nrv, tmat.shape).all():
             raise ValueError('dimension of const and tmat do not agree')
+
+        #replace the following with a linalgarray class
         self.tmatinv = linalg.inv(tmat)
         self.absdet = np.abs(np.linalg.det(self.tmat))
         self.logabsdet = np.log(np.abs(np.linalg.det(self.tmat)))
         self.dist

+    def rvs(self, size):
+        #size can only be integer not yet tuple
+        print((size,)+(self.nrv,))
+        return self.transform(self.dist.rvs(size=(size,)+(self.nrv,)))
+
+    def transform(self, x):
+        #return np.dot(self.tmat, x) + self.const
+        return np.dot(x, self.tmat) + self.const
+
+    def invtransform(self, y):
+        return np.dot(self.tmatinv, y - self.const)
+
+    def pdf(self, x):
+        return 1. / self.absdet * self.dist.pdf(self.invtransform(x))
+
+    def logpdf(self, x):
+        return - self.logabsdet + self.dist.logpdf(self.invtransform(x))
+
+
+

 class MultivariateNormalChol:
-    """multivariate normal distribution with cholesky decomposition of sigma
+    '''multivariate normal distribution with cholesky decomposition of sigma

     ignoring mean at the beginning, maybe

@@ -51,14 +88,42 @@ class MultivariateNormalChol:

     initially 1d is ok, 2d should work with iid in axis 0 and mvn in axis 1

-    """
+    '''

     def __init__(self, mean, sigma):
         self.mean = mean
         self.sigma = sigma
         self.sigmainv = sigmainv
         self.cholsigma = linalg.cholesky(sigma)
-        self.cholsigmainv = linalg.cholesky(sigmainv)[::-1, ::-1]
+        #the following makes it lower triangular with increasing time
+        self.cholsigmainv = linalg.cholesky(sigmainv)[::-1,::-1]
+        #todo: this might be a trick todo backward instead of forward filtering
+
+    def whiten(self, x):
+        return np.dot(self.cholsigmainv, x)
+
+    def logpdf_obs(self, x):
+        x = x - self.mean
+        x_whitened = self.whiten(x)
+
+        #sigmainv = linalg.cholesky(sigma)
+        logdetsigma = np.log(np.linalg.det(self.sigma))
+
+        sigma2 = 1. # error variance is included in sigma
+
+        llike  =  0.5 * (np.log(sigma2)
+                         - 2.* np.log(np.diagonal(self.cholsigmainv))
+                         + (x_whitened**2)/sigma2
+                         +  np.log(2*np.pi))
+
+        return llike
+
+    def logpdf(self, x):
+        return self.logpdf_obs(x).sum(-1)
+
+    def pdf(self, x):
+        return np.exp(self.logpdf(x))
+


 class MultivariateNormal:
@@ -68,82 +133,141 @@ class MultivariateNormal:
         self.sigma = SvdArray(sigma)


+
+
 def loglike_ar1(x, rho):
-    """loglikelihood of AR(1) process, as a test case
+    '''loglikelihood of AR(1) process, as a test case

     sigma_u partially hard coded

     Greene chapter 12 eq. (12-31)
-    """
-    pass
+    '''
+    x = np.asarray(x)
+    u = np.r_[x[0], x[1:] - rho * x[:-1]]
+    sigma_u2 = 2*(1-rho**2)
+    loglik = 0.5*(-(u**2).sum(0) / sigma_u2 + np.log(1-rho**2)
+                  - x.shape[0] * (np.log(2*np.pi) + np.log(sigma_u2)))
+    return loglik


 def ar2transform(x, arcoefs):
-    """
+    '''

     (Greene eq 12-30)
-    """
-    pass
+    '''
+    a1, a2 = arcoefs
+    y = np.zeros_like(x)
+    y[0] = np.sqrt((1+a2) * ((1-a2)**2 - a1**2) / (1-a2)) * x[0]
+    y[1] = np.sqrt(1-a2**2) * x[2] - a1 * np.sqrt(1-a1**2)/(1-a2) * x[1] #TODO:wrong index in x
+    y[2:] = x[2:] - a1 * x[1:-1] - a2 * x[:-2]
+    return y


 def mvn_loglike(x, sigma):
-    """loglike multivariate normal
+    '''loglike multivariate normal

     assumes x is 1d, (nobs,) and sigma is 2d (nobs, nobs)

     brute force from formula
     no checking of correct inputs
     use of inv and log-det should be replace with something more efficient
-    """
-    pass
-
+    '''
+    #see numpy thread
+    #Sturla: sqmahal = (cx*cho_solve(cho_factor(S),cx.T).T).sum(axis=1)
+    sigmainv = linalg.inv(sigma)
+    logdetsigma = np.log(np.linalg.det(sigma))
+    nobs = len(x)
+
+    llf = - np.dot(x, np.dot(sigmainv, x))
+    llf -= nobs * np.log(2 * np.pi)
+    llf -= logdetsigma
+    llf *= 0.5
+    return llf

 def mvn_nloglike_obs(x, sigma):
-    """loglike multivariate normal
+    '''loglike multivariate normal

     assumes x is 1d, (nobs,) and sigma is 2d (nobs, nobs)

     brute force from formula
     no checking of correct inputs
     use of inv and log-det should be replace with something more efficient
-    """
-    pass
+    '''
+    #see numpy thread
+    #Sturla: sqmahal = (cx*cho_solve(cho_factor(S),cx.T).T).sum(axis=1)

+    #Still wasteful to calculate pinv first
+    sigmainv = linalg.inv(sigma)
+    cholsigmainv = linalg.cholesky(sigmainv)
+    #2 * np.sum(np.log(np.diagonal(np.linalg.cholesky(A)))) #Dag mailinglist
+    # logdet not needed ???
+    #logdetsigma = 2 * np.sum(np.log(np.diagonal(cholsigmainv)))
+    x_whitened = np.dot(cholsigmainv, x)
+
+    #sigmainv = linalg.cholesky(sigma)
+    logdetsigma = np.log(np.linalg.det(sigma))
+
+    sigma2 = 1. # error variance is included in sigma
+
+    llike  =  0.5 * (np.log(sigma2) - 2.* np.log(np.diagonal(cholsigmainv))
+                          + (x_whitened**2)/sigma2
+                          +  np.log(2*np.pi))
+
+    return llike, (x_whitened**2)

 nobs = 10
 x = np.arange(nobs)
-autocov = 2 * 0.8 ** np.arange(nobs)
+autocov = 2*0.8**np.arange(nobs)# +0.01 * np.random.randn(nobs)
 sigma = linalg.toeplitz(autocov)
-cholsigma = linalg.cholesky(sigma).T
+#sigma = np.diag(1+np.random.randn(10)**2)
+
+cholsigma = linalg.cholesky(sigma).T#, lower=True)
+
 sigmainv = linalg.inv(sigma)
 cholsigmainv = linalg.cholesky(sigmainv)
+#2 * np.sum(np.log(np.diagonal(np.linalg.cholesky(A)))) #Dag mailinglist
+# logdet not needed ???
+#logdetsigma = 2 * np.sum(np.log(np.diagonal(cholsigmainv)))
 x_whitened = np.dot(cholsigmainv, x)
+
+#sigmainv = linalg.cholesky(sigma)
 logdetsigma = np.log(np.linalg.det(sigma))
-sigma2 = 1.0
-llike = 0.5 * (np.log(sigma2) - 2.0 * np.log(np.diagonal(cholsigmainv)) + 
-    x_whitened ** 2 / sigma2 + np.log(2 * np.pi))
+
+sigma2 = 1. # error variance is included in sigma
+
+llike  =  0.5 * (np.log(sigma2) - 2.* np.log(np.diagonal(cholsigmainv))
+                      + (x_whitened**2)/sigma2
+                      +  np.log(2*np.pi))
+
 ll, ls = mvn_nloglike_obs(x, sigma)
+#the following are all the same for diagonal sigma
 print(ll.sum(), 'll.sum()')
 print(llike.sum(), 'llike.sum()')
-print(np.log(stats.norm._pdf(x_whitened)).sum() - 0.5 * logdetsigma)
+print(np.log(stats.norm._pdf(x_whitened)).sum() - 0.5 * logdetsigma,)
 print('stats whitened')
-print(np.log(stats.norm.pdf(x, scale=np.sqrt(np.diag(sigma)))).sum())
+print(np.log(stats.norm.pdf(x,scale=np.sqrt(np.diag(sigma)))).sum(),)
 print('stats scaled')
-print(0.5 * (np.dot(linalg.cho_solve((linalg.cho_factor(sigma, lower=False)
-    [0].T, False), x.T), x) + nobs * np.log(2 * np.pi) - 2.0 * np.log(np.
-    diagonal(cholsigmainv)).sum()))
-print(0.5 * (np.dot(linalg.cho_solve((linalg.cho_factor(sigma)[0].T, False),
-    x.T), x) + nobs * np.log(2 * np.pi) - 2.0 * np.log(np.diagonal(
-    cholsigmainv)).sum()))
-print(0.5 * (np.dot(linalg.cho_solve(linalg.cho_factor(sigma), x.T), x) + 
-    nobs * np.log(2 * np.pi) - 2.0 * np.log(np.diagonal(cholsigmainv)).sum()))
+print(0.5*(np.dot(linalg.cho_solve((linalg.cho_factor(sigma, lower=False)[0].T,
+                                    False),x.T), x)
+           + nobs*np.log(2*np.pi)
+           - 2.* np.log(np.diagonal(cholsigmainv)).sum()))
+print(0.5*(np.dot(linalg.cho_solve((linalg.cho_factor(sigma)[0].T, False),x.T), x) + nobs*np.log(2*np.pi)- 2.* np.log(np.diagonal(cholsigmainv)).sum()))
+print(0.5*(np.dot(linalg.cho_solve(linalg.cho_factor(sigma),x.T), x) + nobs*np.log(2*np.pi)- 2.* np.log(np.diagonal(cholsigmainv)).sum()))
 print(mvn_loglike(x, sigma))
+
+
 normtransf = AffineTransform(np.zeros(nobs), cholsigma, StandardNormal())
 print(normtransf.logpdf(x_whitened).sum())
+#print(normtransf.rvs(5)
 print(loglike_ar1(x, 0.8))
+
 mch = MultivariateNormalChol(np.zeros(nobs), sigma)
 print(mch.logpdf(x))
+
+#from .linalg_decomp_1 import tiny2zero
+#print(tiny2zero(mch.cholsigmainv / mch.cholsigmainv[-1,-1])
+
 xw = mch.whiten(x)
-print('xSigmax', np.dot(xw, xw))
-print('xSigmax', np.dot(x, linalg.cho_solve(linalg.cho_factor(mch.sigma), x)))
-print('xSigmax', np.dot(x, linalg.cho_solve((mch.cholsigma, False), x)))
+print('xSigmax', np.dot(xw,xw))
+print('xSigmax', np.dot(x,linalg.cho_solve(linalg.cho_factor(mch.sigma),x)))
+print('xSigmax', np.dot(x,linalg.cho_solve((mch.cholsigma, False),x)))
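
Not part of the patch: the archive script above runs on import, so rather than importing it, here is a self-contained check of the brute-force multivariate-normal log-likelihood used in mvn_loglike, compared against scipy.

import numpy as np
from scipy import linalg, stats

nobs = 10
x = np.arange(nobs, dtype=float)
sigma = linalg.toeplitz(2 * 0.8 ** np.arange(nobs))

# 0.5 * (-x' Sigma^-1 x - nobs*log(2*pi) - log|Sigma|), as in mvn_loglike
sigmainv = linalg.inv(sigma)
llf = 0.5 * (-x @ sigmainv @ x - nobs * np.log(2 * np.pi)
             - np.log(np.linalg.det(sigma)))

print(llf)
print(stats.multivariate_normal(mean=np.zeros(nobs), cov=sigma).logpdf(x))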
diff --git a/statsmodels/sandbox/archive/linalg_decomp_1.py b/statsmodels/sandbox/archive/linalg_decomp_1.py
index e16ec7978..4726f17dd 100644
--- a/statsmodels/sandbox/archive/linalg_decomp_1.py
+++ b/statsmodels/sandbox/archive/linalg_decomp_1.py
@@ -1,4 +1,4 @@
-"""Recipes for more efficient work with linalg using classes
+'''Recipes for more efficient work with linalg using classes


 intended for use for multivariate normal and linear regression
@@ -20,22 +20,23 @@ maybe extend to sparse if some examples work out

 Author: josef-pktd
 Created on 2010-10-20
-"""
+'''
+
 import numpy as np
 from scipy import linalg
+
 from statsmodels.tools.decorators import cache_readonly


 class PlainMatrixArray:
-    """Class that defines linalg operation on an array
+    '''Class that defines linalg operation on an array

     simplest version as benchmark

     linear algebra recipes for multivariate normal and linear
     regression calculations

-    """
-
+    '''
     def __init__(self, data=None, sym=None):
         if data is not None:
             if sym is None:
@@ -45,72 +46,213 @@ class PlainMatrixArray:
                 raise ValueError('data and sym cannot be both given')
         elif sym is not None:
             self.m = np.asarray(sym)
-            self.x = np.eye(*self.m.shape)
+            self.x = np.eye(*self.m.shape) #default
+
         else:
             raise ValueError('either data or sym need to be given')

+    @cache_readonly
+    def minv(self):
+        return np.linalg.inv(self.m)
+
+    def m_y(self, y):
+        return np.dot(self.m, y)
+
+    def minv_y(self, y):
+        return np.dot(self.minv, y)
+
+    @cache_readonly
+    def mpinv(self):
+        return linalg.pinv(self.m)
+
+    @cache_readonly
+    def xpinv(self):
+        return linalg.pinv(self.x)
+
+    def yt_m_y(self, y):
+        return np.dot(y.T, np.dot(self.m, y))
+
+    def yt_minv_y(self, y):
+        return np.dot(y.T, np.dot(self.minv, y))
+
+    #next two are redundant
+    def y_m_yt(self, y):
+        return np.dot(y, np.dot(self.m, y.T))
+
+    def y_minv_yt(self, y):
+        return np.dot(y, np.dot(self.minv, y.T))
+
+    @cache_readonly
+    def mdet(self):
+        return linalg.det(self.m)
+
+    @cache_readonly
+    def mlogdet(self):
+        return np.log(linalg.det(self.m))
+
+    @cache_readonly
+    def meigh(self):
+        evals, evecs = linalg.eigh(self.m)
+        sortind = np.argsort(evals)[::-1]
+        return evals[sortind], evecs[:,sortind]
+
+    @cache_readonly
+    def mhalf(self):
+        evals, evecs = self.meigh
+        return np.dot(np.diag(evals**0.5), evecs.T)
+        #return np.dot(evecs, np.dot(np.diag(evals**0.5), evecs.T))
+        #return np.dot(evecs, 1./np.sqrt(evals) * evecs.T))
+
+    @cache_readonly
+    def minvhalf(self):
+        evals, evecs = self.meigh
+        return np.dot(evecs, 1./np.sqrt(evals) * evecs.T)
+
+

 class SvdArray(PlainMatrixArray):
-    """Class that defines linalg operation on an array
+    '''Class that defines linalg operation on an array

     svd version, where svd is taken on original data array, if
     or when it matters

     no spectral cutoff in first version
-    """
+    '''

     def __init__(self, data=None, sym=None):
         super(SvdArray, self).__init__(data=data, sym=sym)
+
         u, s, v = np.linalg.svd(self.x, full_matrices=1)
         self.u, self.s, self.v = u, s, v
         self.sdiag = linalg.diagsvd(s, *x.shape)
-        self.sinvdiag = linalg.diagsvd(1.0 / s, *x.shape)
+        self.sinvdiag = linalg.diagsvd(1./s, *x.shape)
+
+    def _sdiagpow(self, p):
+        return linalg.diagsvd(np.power(self.s, p), *x.shape)
+
+    @cache_readonly
+    def minv(self):
+        sinvv = np.dot(self.sinvdiag, self.v)
+        return np.dot(sinvv.T, sinvv)
+
+    @cache_readonly
+    def meigh(self):
+        evecs = self.v.T
+        evals = self.s**2
+        return evals, evecs
+
+    @cache_readonly
+    def mdet(self):
+        return self.meigh[0].prod()
+
+    @cache_readonly
+    def mlogdet(self):
+        return np.log(self.meigh[0]).sum()
+
+    @cache_readonly
+    def mhalf(self):
+        return np.dot(np.diag(self.s), self.v)
+
+    @cache_readonly
+    def xxthalf(self):
+        return np.dot(self.u, self.sdiag)
+
+    @cache_readonly
+    def xxtinvhalf(self):
+        return np.dot(self.u, self.sinvdiag)


 class CholArray(PlainMatrixArray):
-    """Class that defines linalg operation on an array
+    '''Class that defines linalg operation on an array

     cholesky version, where svd is taken on original data array, if
     or when it matters

     plan: use cholesky factor and cholesky solve
     nothing implemented yet
-    """
+    '''

     def __init__(self, data=None, sym=None):
         super(SvdArray, self).__init__(data=data, sym=sym)

+
     def yt_minv_y(self, y):
-        """xSigmainvx
+        '''xSigmainvx
         does not use stored cholesky yet
-        """
-        pass
+        '''
+        return np.dot(y, linalg.cho_solve(linalg.cho_factor(self.m), y))
+        #same as
+        #lower = False   #if cholesky(sigma) is used, default is upper
+        #np.dot(x,linalg.cho_solve((self.cholsigma, lower),x))
+
+
+
+def testcompare(m1, m2):
+    from numpy.testing import assert_almost_equal, assert_approx_equal
+    decimal = 12
+
+    #inv
+    assert_almost_equal(m1.minv, m2.minv, decimal=decimal)
+
+    #matrix half and invhalf
+    #fix sign in test, should this be standardized
+    s1 = np.sign(m1.mhalf.sum(1))[:,None]
+    s2 = np.sign(m2.mhalf.sum(1))[:,None]
+    scorr = s1/s2
+    assert_almost_equal(m1.mhalf, m2.mhalf * scorr, decimal=decimal)
+    assert_almost_equal(m1.minvhalf, m2.minvhalf, decimal=decimal)
+
+    #eigenvalues, eigenvectors
+    evals1, evecs1 = m1.meigh
+    evals2, evecs2 = m2.meigh
+    assert_almost_equal(evals1, evals2, decimal=decimal)
+    #normalization can be different: evecs in columns
+    s1 = np.sign(evecs1.sum(0))
+    s2 = np.sign(evecs2.sum(0))
+    scorr = s1/s2
+    assert_almost_equal(evecs1, evecs2 * scorr, decimal=decimal)
+
+    #determinant
+    assert_approx_equal(m1.mdet, m2.mdet, significant=13)
+    assert_approx_equal(m1.mlogdet, m2.mlogdet, significant=13)
+
+####### helper function for interactive work
+def tiny2zero(x, eps = 1e-15):
+    '''replace abs values smaller than eps by zero, makes copy
+    '''
+    mask = np.abs(x.copy()) <  eps
+    x[mask] = 0
+    return x
+
+def maxabs(x):
+    return np.max(np.abs(x))


-def tiny2zero(x, eps=1e-15):
-    """replace abs values smaller than eps by zero, makes copy
-    """
-    pass
+if __name__ == '__main__':


-if __name__ == '__main__':
     n = 5
     y = np.arange(n)
-    x = np.random.randn(100, n)
-    autocov = 2 * 0.8 ** np.arange(n) + 0.01 * np.random.randn(n)
+    x = np.random.randn(100,n)
+    autocov = 2*0.8**np.arange(n) +0.01 * np.random.randn(n)
     sigma = linalg.toeplitz(autocov)
+
     mat = PlainMatrixArray(sym=sigma)
     print(tiny2zero(mat.mhalf))
     mih = mat.minvhalf
-    print(tiny2zero(mih))
+    print(tiny2zero(mih)) #for nicer printing
+
     mat2 = PlainMatrixArray(data=x)
     print(maxabs(mat2.yt_minv_y(np.dot(x.T, x)) - mat2.m))
     print(tiny2zero(mat2.minv_y(mat2.m)))
+
     mat3 = SvdArray(data=x)
     print(mat3.meigh[0])
     print(mat2.meigh[0])
+
     testcompare(mat2, mat3)
-    """
+
+    '''
     m = np.dot(x.T, x)

     u,s,v = np.linalg.svd(x, full_matrices=1)
@@ -146,4 +288,4 @@ if __name__ == '__main__':
     1.1368683772161603e-013


-    """
+    '''
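
Not part of the patch: an illustrative use of the restored PlainMatrixArray helpers, assuming the sandbox archive package imports cleanly on your install.

import numpy as np
from scipy import linalg
from statsmodels.sandbox.archive.linalg_decomp_1 import PlainMatrixArray

sigma = linalg.toeplitz(2 * 0.8 ** np.arange(5))
mat = PlainMatrixArray(sym=sigma)

print(mat.mdet, mat.mlogdet)                          # det and log-det
print(np.allclose(mat.minv, linalg.inv(sigma)))       # cached inverse
print(np.allclose(mat.mhalf.T @ mat.mhalf, sigma))    # mhalf'mhalf recovers sigma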
diff --git a/statsmodels/sandbox/archive/tsa.py b/statsmodels/sandbox/archive/tsa.py
index c0ba88fa7..1d021dd08 100644
--- a/statsmodels/sandbox/archive/tsa.py
+++ b/statsmodels/sandbox/archive/tsa.py
@@ -1,4 +1,4 @@
-"""Collection of alternative implementations for time series analysis
+'''Collection of alternative implementations for time series analysis


 >>> signal.fftconvolve(x,x[::-1])[len(x)-1:len(x)+10]/x.shape[0]
@@ -18,12 +18,12 @@ array([  2.12286549e+00,   1.27450889e+00,   7.86898619e-02,
         -5.80017553e-01,  -5.74814915e-01,  -2.28006995e-01,
          9.39554926e-02,   2.00610244e-01,   1.32239575e-01,
          1.24504352e-03,  -8.81846018e-02])
-"""
+'''
 import numpy as np


 def acovf_fft(x, demean=True):
-    """autocovariance function with call to fftconvolve, biased
+    '''autocovariance function with call to fftconvolve, biased

     Parameters
     ----------
@@ -39,5 +39,11 @@ def acovf_fft(x, demean=True):

     might work for nd in parallel with time along axis 0

-    """
-    pass
+    '''
+    from scipy import signal
+    x = np.asarray(x)
+
+    if demean:
+        x = x - x.mean()
+
+    return signal.fftconvolve(x, x[::-1])[len(x)-1:len(x)+10] / x.shape[0]
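
Not part of the patch: with the return added above, acovf_fft can be checked against a direct biased autocovariance (the archive helper only returns the first few lags).

import numpy as np
from statsmodels.sandbox.archive.tsa import acovf_fft

rng = np.random.default_rng(1)
x = rng.normal(size=200)

xd = x - x.mean()
direct = np.correlate(xd, xd, mode="full")[len(x) - 1:len(x) + 10] / len(x)
print(np.allclose(acovf_fft(x), direct))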
diff --git a/statsmodels/sandbox/bspline.py b/statsmodels/sandbox/bspline.py
index 5fa39c82d..a48785db3 100644
--- a/statsmodels/sandbox/bspline.py
+++ b/statsmodels/sandbox/bspline.py
@@ -1,4 +1,4 @@
-"""
+'''
 Bspines and smoothing splines.

 General references:
@@ -13,12 +13,16 @@ General references:

     Hutchison, M. and Hoog, F. "Smoothing noisy data with spline functions."
     Numerische Mathematik, 47(1), 99-106.
-"""
+'''
+
 import numpy as np
 import numpy.linalg as L
+
 from scipy.linalg import solveh_banded
 from scipy.optimize import golden
-from models import _hbspline
+from models import _hbspline     #removed because this was segfaulting
+
+# Issue warning regarding heavy development status of this module
 import warnings
 _msg = """
 The bspline code is technology preview and requires significant work
@@ -39,7 +43,30 @@ def _band2array(a, lower=0, symmetric=False, hermitian=False):
        hermitian -- if True (and symmetric False), return the original
                     result plus its conjugate transposed
     """
-    pass
+
+    n = a.shape[1]
+    r = a.shape[0]
+    _a = 0
+
+    if not lower:
+        for j in range(r):
+            _b = np.diag(a[r-1-j],k=j)[j:(n+j),j:(n+j)]
+            _a += _b
+            if symmetric and j > 0:
+                _a += _b.T
+            elif hermitian and j > 0:
+                _a += _b.conjugate().T
+    else:
+        for j in range(r):
+            _b = np.diag(a[j],k=j)[0:n,0:n]
+            _a += _b
+            if symmetric and j > 0:
+                _a += _b.T
+            elif hermitian and j > 0:
+                _a += _b.conjugate().T
+        _a = _a.T
+
+    return _a


 def _upper2lower(ub):
@@ -53,8 +80,13 @@ def _upper2lower(ub):
        lb  -- a lower triangular banded matrix with same entries
               as ub
     """
-    pass

+    lb = np.zeros(ub.shape, ub.dtype)
+    nrow, ncol = ub.shape
+    for i in range(ub.shape[0]):
+        lb[i,0:(ncol-i)] = ub[nrow-1-i,i:ncol]
+        lb[i,(ncol-i):] = ub[nrow-1-i,0:i]
+    return lb

 def _lower2upper(lb):
     """
@@ -67,8 +99,13 @@ def _lower2upper(lb):
        ub  -- an upper triangular banded matrix with same entries
               as lb
     """
-    pass

+    ub = np.zeros(lb.shape, lb.dtype)
+    nrow, ncol = lb.shape
+    for i in range(lb.shape[0]):
+        ub[nrow-1-i,i:ncol] = lb[i,0:(ncol-i)]
+        ub[nrow-1-i,0:i] = lb[i,(ncol-i):]
+    return ub

 def _triangle2unit(tb, lower=0):
     """
@@ -88,7 +125,17 @@ def _triangle2unit(tb, lower=0):
                 else lower is True, b is lower triangular banded
                 and its columns have been divided by d.
     """
-    pass
+
+    if lower:
+        d = tb[0].copy()
+    else:
+        d = tb[-1].copy()
+
+    if lower:
+        return d, (tb / d)
+    else:
+        lnum = _upper2lower(tb)
+        return d, _lower2upper(lnum / d)


 def _trace_symbanded(a, b, lower=0):
@@ -104,7 +151,13 @@ def _trace_symbanded(a, b, lower=0):
     OUTPUTS: trace
        trace   -- trace(ab)
     """
-    pass
+
+    if lower:
+        t = _zero_triband(a * b, lower=1)
+        return t[0].sum() + 2 * t[1:].sum()
+    else:
+        t = _zero_triband(a * b, lower=0)
+        return t[-1].sum() + 2 * t[:-1].sum()


 def _zero_triband(a, lower=0):
@@ -115,11 +168,20 @@ def _zero_triband(a, lower=0):
        a   -- a real symmetric banded matrix (either upper or lower half)
        lower   -- if True, a is assumed to be the lower half
     """
-    pass
+
+    nrow, ncol = a.shape
+    if lower:
+        for i in range(nrow):
+            a[i, (ncol-i):] = 0.
+    else:
+        for i in range(nrow):
+            a[i, 0:i] = 0.
+    return a


 class BSpline:
-    """
+
+    '''

     Bsplines of a given order and specified knots.

@@ -142,28 +204,43 @@ class BSpline:
        x      -- an optional set of x values at which to evaluate the
                  Bspline to avoid extra evaluation in the __call__ method

-    """
+    '''
+    # FIXME: update parameter names, replace single character names
+    # FIXME: `order` should be actual spline order (implemented as order+1)
+    ## FIXME: update the use of spline order in extension code (evaluate is recursively called)
+    # FIXME: eliminate duplicate M and m attributes (m is order, M is related to tau size)

     def __init__(self, knots, order=4, M=None, coef=None, x=None):
+
         knots = np.squeeze(np.unique(np.asarray(knots)))
+
         if knots.ndim != 1:
             raise ValueError('expecting 1d array for knots')
+
         self.m = order
         if M is None:
             M = self.m
         self.M = M
-        self.tau = np.hstack([[knots[0]] * (self.M - 1), knots, [knots[-1]] *
-            (self.M - 1)])
+
+        self.tau = np.hstack([[knots[0]]*(self.M-1), knots, [knots[-1]]*(self.M-1)])
+
         self.K = knots.shape[0] - 2
         if coef is None:
-            self.coef = np.zeros(self.K + 2 * self.M - self.m, np.float64)
+            self.coef = np.zeros((self.K + 2 * self.M - self.m), np.float64)
         else:
             self.coef = np.squeeze(coef)
-            if self.coef.shape != self.K + 2 * self.M - self.m:
-                raise ValueError('coefficients of Bspline have incorrect shape'
-                    )
+            if self.coef.shape != (self.K + 2 * self.M - self.m):
+                raise ValueError('coefficients of Bspline have incorrect shape')
         if x is not None:
             self.x = x
+
+    def _setx(self, x):
+        self._x = x
+        self._basisx = self.basis(self._x)
+
+    def _getx(self):
+        return self._x
+
     x = property(_getx, _setx)

     def __call__(self, *args):
@@ -187,6 +264,7 @@ class BSpline:
            If self has no attribute x, an exception will be raised
            because self has no attribute _basisx.
         """
+
         if not args:
             b = self._basisx.T
         else:
@@ -208,7 +286,22 @@ class BSpline:
            y  -- value of d-th derivative of the i-th basis element
                  of the BSpline at specified x values
         """
-        pass
+
+        x = np.asarray(x, np.float64)
+        _shape = x.shape
+        if _shape == ():
+            x.shape = (1,)
+        x.shape = (np.product(_shape,axis=0),)
+        if i < self.tau.shape[0] - 1:
+            # TODO: OWNDATA flags...
+            v = _hbspline.evaluate(x, self.tau, self.m, d, i, i+1)
+        else:
+            return np.zeros(x.shape, np.float64)
+
+        if (i == self.tau.shape[0] - self.m):
+            v = np.where(np.equal(x, self.tau[-1]), 1, v)
+        v.shape = _shape
+        return v

     def basis(self, x, d=0, lower=None, upper=None):
         """
@@ -229,7 +322,35 @@ class BSpline:
            y  -- value of d-th derivative of the basis elements
                  of the BSpline at specified x values
         """
-        pass
+        x = np.asarray(x)
+        _shape = x.shape
+        if _shape == ():
+            x.shape = (1,)
+        x.shape = (np.product(_shape,axis=0),)
+
+        if upper is None:
+            upper = self.tau.shape[0] - self.m
+        if lower is None:
+            lower = 0
+        upper = min(upper, self.tau.shape[0] - self.m)
+        lower = max(0, lower)
+
+        d = np.asarray(d)
+        if d.shape == ():
+            v = _hbspline.evaluate(x, self.tau, self.m, int(d), lower, upper)
+        else:
+            if d.shape[0] != 2:
+                raise ValueError("if d is not an integer, expecting a jx2 \
+                   array with first row indicating order \
+                   of derivative, second row coefficient in front.")
+            v = 0
+            for i in range(d.shape[1]):
+                v += d[1,i] * _hbspline.evaluate(x, self.tau, self.m, d[0,i], lower, upper)
+
+        v.shape = (upper-lower,) + _shape
+        if upper == self.tau.shape[0] - self.m:
+            v[-1] = np.where(np.equal(x, self.tau[-1]), 1, v[-1])
+        return v

     def gram(self, d=0):
         """
@@ -263,24 +384,43 @@ class BSpline:
            gram -- the matrix of inner products of (derivatives)
                    of the BSpline elements
         """
-        pass

+        d = np.squeeze(d)
+        if np.asarray(d).shape == ():
+            self.g = _hbspline.gram(self.tau, self.m, int(d), int(d))
+        else:
+            d = np.asarray(d)
+            if d.shape[0] != 2:
+                raise ValueError("if d is not an integer, expecting a jx2 \
+                   array with first row indicating order \
+                   of derivative, second row coefficient in front.")
+            if d.shape == (2,):
+                d.shape = (2,1)
+            self.g = 0
+            for i in range(d.shape[1]):
+                for j in range(d.shape[1]):
+                    self.g += d[1,i]* d[1,j] * _hbspline.gram(self.tau, self.m, int(d[0,i]), int(d[0,j]))
+        self.g = self.g.T
+        self.d = d
+        return np.nan_to_num(self.g)

 class SmoothingSpline(BSpline):
-    penmax = 30.0
-    method = 'target_df'
+
+    penmax = 30.
+    method = "target_df"
     target_df = 5
-    default_pen = 0.001
+    default_pen = 1.0e-03
     optimize = True
-    """
+
+    '''
     A smoothing spline, which can be used to smooth scatterplots, i.e.
     a list of (x,y) tuples.

     See fit method for more information.

-    """
+    '''

-    def fit(self, y, x=None, weights=None, pen=0.0):
+    def fit(self, y, x=None, weights=None, pen=0.):
         """
         Fit the smoothing spline to a set of (x,y) pairs.

@@ -312,7 +452,83 @@ class SmoothingSpline(BSpline):
            Should add arbitrary derivative penalty instead of just
            second derivative.
         """
-        pass
+
+        banded = True
+
+        if x is None:
+            x = self._x
+            bt = self._basisx.copy()
+        else:
+            bt = self.basis(x)
+
+        if pen == 0.: # cannot use cholesky for singular matrices
+            banded = False
+
+        if x.shape != y.shape:
+            raise ValueError('x and y shape do not agree; by default x is \
+               the Bspline\'s internal knots')
+
+        if pen >= self.penmax:
+            pen = self.penmax
+
+
+        if weights is not None:
+            self.weights = weights
+        else:
+            self.weights = 1.
+
+        _w = np.sqrt(self.weights)
+        bt *= _w
+
+        # throw out rows with zeros (this happens at boundary points!)
+
+        mask = np.flatnonzero(1 - np.all(np.equal(bt, 0), axis=0))
+
+        bt = bt[:,mask]
+        y = y[mask]
+
+        self.df_total = y.shape[0]
+
+        bty = np.squeeze(np.dot(bt, _w * y))
+        self.N = y.shape[0]
+
+        if not banded:
+            self.btb = np.dot(bt, bt.T)
+            _g = _band2array(self.g, lower=1, symmetric=True)
+            self.coef, _, self.rank = L.lstsq(self.btb + pen*_g, bty)[0:3]
+            self.rank = min(self.rank, self.btb.shape[0])
+            del _g
+        else:
+            self.btb = np.zeros(self.g.shape, np.float64)
+            nband, nbasis = self.g.shape
+            for i in range(nbasis):
+                for k in range(min(nband, nbasis-i)):
+                    self.btb[k,i] = (bt[i] * bt[i+k]).sum()
+
+            bty.shape = (1,bty.shape[0])
+            self.pen = pen
+            self.chol, self.coef = solveh_banded(self.btb +
+                                                 pen*self.g,
+                                                 bty, lower=1)
+
+        self.coef = np.squeeze(self.coef)
+        self.resid = y * self.weights - np.dot(self.coef, bt)
+        self.pen = pen
+
+        del bty
+        del mask
+        del bt
+
+    def smooth(self, y, x=None, weights=None):
+
+        if self.method == "target_df":
+            if hasattr(self, 'pen'):
+                self.fit(y, x=x, weights=weights, pen=self.pen)
+            else:
+                self.fit_target_df(y, x=x, weights=weights, df=self.target_df)
+        elif self.method == "optimize_gcv":
+            self.fit_optimize_gcv(y, x=x, weights=weights)
+

     def gcv(self):
         """
@@ -323,7 +539,9 @@ class SmoothingSpline(BSpline):
         the method of generalized cross-validation."
         Numerische Mathematik, 31(4), 377-403.
         """
-        pass
+
+        norm_resid = (self.resid**2).sum()
+        return norm_resid / (self.df_total - self.trace())

     def df_resid(self):
         """
@@ -333,7 +551,8 @@ class SmoothingSpline(BSpline):

         where self.N is the number of observations of last fit.
         """
-        pass
+
+        return self.N - self.trace()

     def df_fit(self):
         """
@@ -341,7 +560,7 @@ class SmoothingSpline(BSpline):

         self.trace()
         """
-        pass
+        return self.trace()

     def trace(self):
         """
@@ -349,10 +568,16 @@ class SmoothingSpline(BSpline):

         TODO: addin a reference to Wahba, and whoever else I used.
         """
-        pass

-    def fit_target_df(self, y, x=None, df=None, weights=None, tol=0.001,
-        apen=0, bpen=0.001):
+        if self.pen > 0:
+            _invband = _hbspline.invband(self.chol.copy())
+            tr = _trace_symbanded(_invband, self.btb, lower=1)
+            return tr
+        else:
+            return self.rank
+
+    def fit_target_df(self, y, x=None, df=None, weights=None, tol=1.0e-03,
+                      apen=0, bpen=1.0e-03):
         """
         Fit smoothing spline with approximately df degrees of freedom
         used in the fit, i.e. so that self.trace() is approximately df.
@@ -376,10 +601,38 @@ class SmoothingSpline(BSpline):
            The smoothing spline is determined by self.coef,
            subsequent calls of __call__ will be the smoothing spline.
         """
-        pass

-    def fit_optimize_gcv(self, y, x=None, weights=None, tol=0.001, brack=(-
-        100, 20)):
+        df = df or self.target_df
+
+        olddf = y.shape[0] - self.m
+
+        if hasattr(self, "pen"):
+            self.fit(y, x=x, weights=weights, pen=self.pen)
+            curdf = self.trace()
+            if np.fabs(curdf - df) / df < tol:
+                return
+            if curdf > df:
+                apen, bpen = self.pen, 2 * self.pen
+            else:
+                apen, bpen = 0., self.pen
+
+        while True:
+
+            curpen = 0.5 * (apen + bpen)
+            self.fit(y, x=x, weights=weights, pen=curpen)
+            curdf = self.trace()
+            if curdf > df:
+                apen, bpen = curpen, 2 * curpen
+            else:
+                apen, bpen = apen, curpen
+            if apen >= self.penmax:
+                raise ValueError("penalty too large, try setting penmax \
+                   higher or decreasing df")
+            if np.fabs(curdf - df) / df < tol:
+                break
+
+    def fit_optimize_gcv(self, y, x=None, weights=None, tol=1.0e-03,
+                         brack=(-100,20)):
         """
         Fit smoothing spline trying to optimize GCV.

@@ -401,4 +654,10 @@ class SmoothingSpline(BSpline):
            The smoothing spline is determined by self.coef,
            subsequent calls of __call__ will be the smoothing spline.
         """
-        pass
+
+        def _gcv(pen, y, x):
+            self.fit(y, x=x, pen=np.exp(pen))
+            a = self.gcv()
+            return a
+
+        a = golden(_gcv, args=(y,x), brack=brack, tol=tol)
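
Not part of the patch: bspline.py still depends on the _hbspline extension, so instead of importing it, here is a self-contained sketch of the lower symmetric banded storage that _band2array and the solveh_banded call in SmoothingSpline.fit assume.

import numpy as np
from scipy.linalg import solveh_banded

n, nband = 6, 3                       # diagonal plus two sub-diagonals
rng = np.random.default_rng(0)

# diagonally dominant symmetric banded matrix (so it is positive definite)
full = np.diag(rng.uniform(2.0, 3.0, size=n))
for k in (1, 2):
    off = rng.uniform(-0.3, 0.3, size=n - k)
    full += np.diag(off, k) + np.diag(off, -k)

# lower banded form: row j holds the j-th sub-diagonal, left-aligned
ab = np.zeros((nband, n))
for j in range(nband):
    ab[j, :n - j] = np.diag(full, -j)

b = rng.normal(size=n)
print(np.allclose(solveh_banded(ab, b, lower=True), np.linalg.solve(full, b)))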
diff --git a/statsmodels/sandbox/datarich/factormodels.py b/statsmodels/sandbox/datarich/factormodels.py
index 3bac5ad8e..b1ac73938 100644
--- a/statsmodels/sandbox/datarich/factormodels.py
+++ b/statsmodels/sandbox/datarich/factormodels.py
@@ -1,39 +1,65 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sun Nov 14 08:21:41 2010

 Author: josef-pktd
 License: BSD (3-clause)
 """
+
 import numpy as np
 import statsmodels.api as sm
 from statsmodels.sandbox.tools import pca
 from statsmodels.sandbox.tools.cross_val import LeaveOneOut

+#converting example Principal Component Regression to a class
+#from sandbox/example_pca_regression.py
+

 class FactorModelUnivariate:
-    """
+    '''

     Todo:
     check treatment of const, make it optional ?
         add hasconst (0 or 1), needed when selecting nfact+hasconst
     options are arguments in calc_factors, should be more public instead
     cross-validation is slow for large number of observations
-    """
-
+    '''
     def __init__(self, endog, exog):
+        #do this in a superclass?
         self.endog = np.asarray(endog)
         self.exog = np.asarray(exog)

+
     def calc_factors(self, x=None, keepdim=0, addconst=True):
-        """get factor decomposition of exogenous variables
+        '''get factor decomposition of exogenous variables

         This uses principal component analysis to obtain the factors. The number
         of factors kept is the maximum that will be considered in the regression.
-        """
-        pass
+        '''
+        if x is None:
+            x = self.exog
+        else:
+            x = np.asarray(x)
+        xred, fact, evals, evecs  = pca(x, keepdim=keepdim, normalize=1)
+        self.exog_reduced = xred
+        #self.factors = fact
+        if addconst:
+            self.factors = sm.add_constant(fact, prepend=True)
+            self.hasconst = 1  #needs to be int
+        else:
+            self.factors = fact
+            self.hasconst = 0  #needs to be int
+
+        self.evals = evals
+        self.evecs = evecs
+
+    def fit_fixed_nfact(self, nfact):
+        if not hasattr(self, 'factors_wconst'):
+            self.calc_factors()
+        return sm.OLS(self.endog, self.factors[:,:nfact+1]).fit()

     def fit_find_nfact(self, maxfact=None, skip_crossval=True, cv_iter=None):
-        """estimate the model and selection criteria for up to maxfact factors
+        '''estimate the model and selection criteria for up to maxfact factors

         The selection criteria that are calculated are AIC, BIC, and R2_adj. and
         additionally cross-validation prediction error sum of squares if `skip_crossval`
@@ -48,37 +74,114 @@ class FactorModelUnivariate:



-        """
-        pass
+        '''
+        #print 'OLS on Factors'
+        if not hasattr(self, 'factors'):
+            self.calc_factors()
+
+        hasconst = self.hasconst
+        if maxfact is None:
+            maxfact = self.factors.shape[1] - hasconst
+
+        if (maxfact+hasconst) < 1:
+            raise ValueError('nothing to do, number of factors (incl. constant) should ' +
+                             'be at least 1')
+
+        #temporary safety
+        maxfact = min(maxfact, 10)
+
+        y0 = self.endog
+        results = []
+        #xred, fact, eva, eve  = pca(x0, keepdim=0, normalize=1)
+        for k in range(1, maxfact+hasconst):  # k now includes the constant
+            #xred, fact, eva, eve  = pca(x0, keepdim=k, normalize=1)
+            # this is faster and same result
+            fact = self.factors[:,:k]
+            res = sm.OLS(y0, fact).fit()
+        ##    print 'k =', k
+        ##    print res.params
+        ##    print 'aic:  ', res.aic
+        ##    print 'bic:  ', res.bic
+        ##    print 'llf:  ', res.llf
+        ##    print 'R2    ', res.rsquared
+        ##    print 'R2 adj', res.rsquared_adj
+
+            if not skip_crossval:
+                if cv_iter is None:
+                    cv_iter = LeaveOneOut(len(y0))
+                prederr2 = 0.
+                for inidx, outidx in cv_iter:
+                    res_l1o = sm.OLS(y0[inidx], fact[inidx,:]).fit()
+                    #print data.endog[outidx], res.model.predict(data.exog[outidx,:]),
+                    prederr2 += (y0[outidx] -
+                                 res_l1o.model.predict(res_l1o.params, fact[outidx,:]))**2.
+            else:
+                prederr2 = np.nan
+
+            results.append([k, res.aic, res.bic, res.rsquared_adj, prederr2])
+
+        self.results_find_nfact = results = np.array(results)
+        self.best_nfact = np.r_[(np.argmin(results[:,1:3],0), np.argmax(results[:,3],0),
+                     np.argmin(results[:,-1],0))]

     def summary_find_nfact(self):
-        """provides a summary for the selection of the number of factors
+        '''provides a summary for the selection of the number of factors

         Returns
         -------
         sumstr : str
             summary of the results for selecting the number of factors

-        """
-        pass
+        '''
+        if not hasattr(self, 'results_find_nfact'):
+            self.fit_find_nfact()
+
+
+        results = self.results_find_nfact
+        sumstr = ''
+        sumstr += '\n' + 'Best result for k, by AIC, BIC, R2_adj, L1O'
+#        best = np.r_[(np.argmin(results[:,1:3],0), np.argmax(results[:,3],0),
+#                     np.argmin(results[:,-1],0))]
+
+        sumstr += '\n' + ' '*19 + '%5d %4d %6d %5d' % tuple(self.best_nfact)
+
+        from statsmodels.iolib.table import SimpleTable
+
+        headers = 'k, AIC, BIC, R2_adj, L1O'.split(', ')
+        numformat = ['%6d'] + ['%10.3f']*4 #'%10.4f'
+        txt_fmt1 = dict(data_fmts = numformat)
+        tabl = SimpleTable(results, headers, None, txt_fmt=txt_fmt1)
+
+        sumstr += '\n' + "PCA regression on simulated data,"
+        sumstr += '\n' + "DGP: 2 factors and 4 explanatory variables"
+        sumstr += '\n' + tabl.__str__()
+        sumstr += '\n' + "Notes: k is number of components of PCA,"
+        sumstr += '\n' + "       constant is added additionally"
+        sumstr += '\n' + "       k=0 means regression on constant only"
+        sumstr += '\n' + "       L1O: sum of squared prediction errors for leave-one-out"
+        return sumstr


 if __name__ == '__main__':
+
     examples = [1]
     if 1 in examples:
         nobs = 500
-        f0 = np.c_[np.random.normal(size=(nobs, 2)), np.ones((nobs, 1))]
-        f2xcoef = np.c_[np.repeat(np.eye(2), 2, 0), np.arange(4)[::-1]].T
-        f2xcoef = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [
-            3.0, 2.0, 1.0, 0.0]])
-        f2xcoef = np.array([[0.1, 3.0, 1.0, 0.0], [0.0, 0.0, 1.5, 0.1], [
-            3.0, 2.0, 1.0, 0.0]])
+        f0 = np.c_[np.random.normal(size=(nobs,2)), np.ones((nobs,1))]
+        f2xcoef = np.c_[np.repeat(np.eye(2),2,0),np.arange(4)[::-1]].T
+        f2xcoef = np.array([[ 1.,  1.,  0.,  0.],
+                            [ 0.,  0.,  1.,  1.],
+                            [ 3.,  2.,  1.,  0.]])
+        f2xcoef = np.array([[ 0.1,  3.,  1.,    0.],
+                            [ 0.,  0.,  1.5,   0.1],
+                            [ 3.,  2.,  1.,    0.]])
         x0 = np.dot(f0, f2xcoef)
-        x0 += 0.1 * np.random.normal(size=x0.shape)
-        ytrue = np.dot(f0, [1.0, 1.0, 1.0])
-        y0 = ytrue + 0.1 * np.random.normal(size=ytrue.shape)
+        x0 += 0.1*np.random.normal(size=x0.shape)
+        ytrue = np.dot(f0,[1., 1., 1.])
+        y0 = ytrue + 0.1*np.random.normal(size=ytrue.shape)
+
         mod = FactorModelUnivariate(y0, x0)
         print(mod.summary_find_nfact())
-        print('with cross validation - slower')
+        print("with cross validation - slower")
         mod.fit_find_nfact(maxfact=None, skip_crossval=False, cv_iter=None)
         print(mod.summary_find_nfact())
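
Not part of the patch: the __main__ block above already exercises the class end to end; this only shows what the wrapped pca helper returns (call pattern taken from calc_factors).

import numpy as np
from statsmodels.sandbox.tools import pca

rng = np.random.default_rng(3)
x = rng.normal(size=(100, 4))

xreduced, factors, evals, evecs = pca(x, keepdim=0, normalize=1)
print(xreduced.shape, factors.shape)   # e.g. (100, 4) (100, 4) when all components are kept
print(evals)                           # component eigenvalues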
diff --git a/statsmodels/sandbox/descstats.py b/statsmodels/sandbox/descstats.py
index ee1ab949d..b6f895c5f 100644
--- a/statsmodels/sandbox/descstats.py
+++ b/statsmodels/sandbox/descstats.py
@@ -1,14 +1,22 @@
-"""
+'''
 Glue for returning descriptive statistics.
-"""
+'''
 import os
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.stats.descriptivestats import sign_test

+#############################################
+#
+#============================================
+#       Univariate Descriptive Statistics
+#============================================
+#

 def descstats(data, cols=None, axis=0):
-    """
+    '''
     Prints descriptive statistics for one or multiple variables.

     Parameters
@@ -26,19 +34,150 @@ def descstats(data, cols=None, axis=0):
     Examples
     --------
     >>> descstats(data.exog, cols=['x_1','x_2','x_3'])
-    """
-    pass
+    '''
+
+    x = np.array(data)  # or rather, the data we're interested in
+    if cols is None and x.ndim == 1:
+        # only promote a single 1d variable to a column vector; 2d data with
+        # several columns falls through to the multivariate branch below
+        x = x[:, None]
+
+    if x.shape[1] == 1:
+        desc = '''
+    ---------------------------------------------
+    Univariate Descriptive Statistics
+    ---------------------------------------------
+
+    Var. Name   %(name)12s
+    ----------
+    Obs.          %(nobs)22i  Range                  %(range)22s
+    Sum of Wts.   %(sum)22s  Coeff. of Variation     %(coeffvar)22.4g
+    Mode          %(mode)22.4g  Skewness                %(skewness)22.4g
+    Repeats       %(nmode)22i  Kurtosis                %(kurtosis)22.4g
+    Mean          %(mean)22.4g  Uncorrected SS          %(uss)22.4g
+    Median        %(median)22.4g  Corrected SS            %(ss)22.4g
+    Variance      %(variance)22.4g  Sum Observations        %(sobs)22.4g
+    Std. Dev.     %(stddev)22.4g
+    ''' % {'name': cols, 'sum': 'N/A', 'nobs': len(x), 'mode': \
+    stats.mode(x)[0][0], 'nmode': stats.mode(x)[1][0], \
+    'mean': x.mean(), 'median': np.median(x), 'range': \
+    '('+str(x.min())+', '+str(x.max())+')', 'variance': \
+    x.var(), 'stddev': x.std(), 'coeffvar': \
+    stats.variation(x), 'skewness': stats.skew(x), \
+    'kurtosis': stats.kurtosis(x), 'uss': np.sum(x**2, axis=0),\
+    'ss': np.sum((x-x.mean())**2, axis=0), 'sobs': np.sum(x)}
+
+        desc+= '''
+
+    Percentiles
+    -------------
+    1  %%          %12.4g
+    5  %%          %12.4g
+    10 %%          %12.4g
+    25 %%          %12.4g
+
+    50 %%          %12.4g
+
+    75 %%          %12.4g
+    90 %%          %12.4g
+    95 %%          %12.4g
+    99 %%          %12.4g
+    ''' % tuple([stats.scoreatpercentile(x,per) for per in (1,5,10,25,
+                50,75,90,95,99)])
+        t,p_t=stats.ttest_1samp(x,0)
+        M,p_M=sign_test(x)
+        S,p_S=stats.wilcoxon(np.squeeze(x))
+
+        desc+= '''

+    Tests of Location (H0: Mu0=0)
+    -----------------------------
+    Test                Statistic       Two-tailed probability
+    -----------------+-----------------------------------------
+    Student's t      |  t %7.5f   Pr > |t|   <%.4f
+    Sign             |  M %8.2f   Pr >= |M|  <%.4f
+    Signed Rank      |  S %8.2f   Pr >= |S|  <%.4f
+
+    ''' % (t,p_t,M,p_M,S,p_S)
+# Should this be part of 'descstats'?
+# In any event these should be split up, so that they can be called
+# individually and only returned together if someone calls summary
+# or something of the sort.
+
+    elif x.shape[1] > 1:
+        desc ='''
+    Var. Name   |     Obs.        Mean    Std. Dev.           Range
+    ------------+--------------------------------------------------------'''+\
+            os.linesep
+
+        for var in range(x.shape[1]):
+            xv = x[:, var]
+            kwargs = {
+                'name': var,
+                'obs': len(xv),
+                'mean': xv.mean(),
+                'stddev': xv.std(),
+                'range': '('+str(xv.min())+', '+str(xv.max())+')'+os.linesep
+                }
+            desc += ("%(name)15s %(obs)9i %(mean)12.4g %(stddev)12.4g "
+                     "%(range)20s" % kwargs)
+    else:
+        raise ValueError("data not understood")
+
+    return desc
+
+#if __name__=='__main__':
+# test descstats
+#    import os
+#    loc='http://eagle1.american.edu/~js2796a/data/handguns_data.csv'
+#    relpath=(load_dataset(loc))
+#    dta=np.recfromcsv(relpath)
+#    descstats(dta,['stpop'])
+#    raw_input('Hit enter for multivariate test')
+#    descstats(dta,['stpop','avginc','vio'])
+
+# with plain arrays
+#    import string2dummy as s2d
+#    dts=s2d.string2dummy(dta)
+#    ndts=np.vstack(dts[col] for col in dts.dtype.names)
+# observations in columns and data in rows
+# is easier for the call to stats
+
+# what to make of
+# ndts=np.column_stack(dts[col] for col in dts.dtype.names)
+# ntda=ntds.swapaxis(1,0)
+# ntda is ntds returns false?
+
+# or now we just have detailed information about the different strings
+# would this approach ever be inappropriate for a string typed variable
+# other than dates?
+#    descstats(ndts, [1])
+#    raw_input("Enter to try second part")
+#    descstats(ndts, [1,20,3])

 if __name__ == '__main__':
     import statsmodels.api as sm
     data = sm.datasets.longley.load()
     data.exog = sm.add_constant(data.exog, prepend=False)
     sum1 = descstats(data.exog)
-    sum1a = descstats(data.exog[:, :1])
+    sum1a = descstats(data.exog[:,:1])
+
+#    loc='http://eagle1.american.edu/~js2796a/data/handguns_data.csv'
+#    dta=np.recfromcsv(loc)
+#    summary2 = descstats(dta,['stpop'])
+#    summary3 =  descstats(dta,['stpop','avginc','vio'])
+#TODO: needs a by argument
+#    summary4 = descstats(dta) this fails
+# this is a bug
+# p = dta[['stpop']]
+# p.view(dtype = np.float, type = np.ndarray)
+# this works
+# p.view(dtype = np.int, type = np.ndarray)
+
+### This is *really* slow ###
     if os.path.isfile('./Econ724_PS_I_Data.csv'):
         data2 = np.recfromcsv('./Econ724_PS_I_Data.csv')
         sum2 = descstats(data2.ahe)
-        sum3 = descstats(np.column_stack((data2.ahe, data2.yrseduc)))
-        sum4 = descstats(np.column_stack([data2[_] for _ in data2.dtype.names])
-            )
+        sum3 = descstats(np.column_stack((data2.ahe,data2.yrseduc)))
+        sum4 = descstats(np.column_stack(([data2[_] for \
+                _ in data2.dtype.names])))
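The univariate branch of descstats combines three tests of location (Student's t, sign test, Wilcoxon signed rank). A minimal standalone sketch of the same calls on a 1-D sample, for reference:

import numpy as np
from scipy import stats
from statsmodels.stats.descriptivestats import sign_test

x = np.random.default_rng(0).normal(loc=0.5, size=50)
t, p_t = stats.ttest_1samp(x, 0)   # H0: mean equals 0
m, p_m = sign_test(x, mu0=0)       # H0: median equals 0
s, p_s = stats.wilcoxon(x)         # H0: distribution symmetric about 0
print(t, p_t, m, p_m, s, p_s)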
diff --git a/statsmodels/sandbox/distributions/estimators.py b/statsmodels/sandbox/distributions/estimators.py
index a3bb1508d..1dcb173bd 100644
--- a/statsmodels/sandbox/distributions/estimators.py
+++ b/statsmodels/sandbox/distributions/estimators.py
@@ -1,4 +1,4 @@
-"""estimate distribution parameters by various methods
+'''estimate distribution parameters by various methods
 method of moments or matching quantiles, and Maximum Likelihood estimation
 based on binned data and Maximum Product-of-Spacings

@@ -85,14 +85,17 @@ created : 2010-04-20
 changes:
 added Maximum Product-of-Spacings 2010-05-12

-"""
+'''
+
 import numpy as np
 from scipy import stats, optimize, special
-cache = {}

+cache = {}   #module global storage for temp results, not used

+
+# the next two use distfn from module scope - not anymore
 def gammamomentcond(distfn, params, mom2, quantile=None):
-    """estimate distribution parameters based method of moments (mean,
+    '''estimate distribution parameters based method of moments (mean,
     variance) for distributions with 1 shape parameter and fixed loc=0.

     Returns
@@ -103,12 +106,16 @@ def gammamomentcond(distfn, params, mom2, quantile=None):
     -----
     first test version, quantile argument not used

-    """
-    pass
-
+    '''
+    def cond(params):
+        alpha, scale = params
+        mom2s = distfn.stats(alpha, 0.,scale)
+        #quantil
+        return np.array(mom2)-mom2s
+    return cond

 def gammamomentcond2(distfn, params, mom2, quantile=None):
-    """estimate distribution parameters based method of moments (mean,
+    '''estimate distribution parameters based method of moments (mean,
     variance) for distributions with 1 shape parameter and fixed loc=0.

     Returns
@@ -122,12 +129,16 @@ def gammamomentcond2(distfn, params, mom2, quantile=None):

     The only difference from the previous function is the return type.

-    """
-    pass
+    '''
+    alpha, scale = params
+    mom2s = distfn.stats(alpha, 0.,scale)
+    return np.array(mom2)-mom2s


+
+######### fsolve does not move in small samples, fmin not very accurate
 def momentcondunbound(distfn, params, mom2, quantile=None):
-    """moment conditions for estimating distribution parameters using method
+    '''moment conditions for estimating distribution parameters using method
     of moments, uses mean, variance and one quantile for distributions
     with 1 shape parameter.

@@ -136,12 +147,20 @@ def momentcondunbound(distfn, params, mom2, quantile=None):
     difference : ndarray
         difference between theoretical and empirical moments and quantiles

-    """
-    pass
+    '''
+    shape, loc, scale = params
+    mom2diff = np.array(distfn.stats(shape, loc,scale)) - mom2
+    if quantile is not None:
+        pq, xq = quantile
+        #ppfdiff = distfn.ppf(pq, alpha)
+        cdfdiff = distfn.cdf(xq, shape, loc, scale) - pq
+        return np.concatenate([mom2diff, cdfdiff[:1]])
+    return mom2diff


+###### loc scale only
 def momentcondunboundls(distfn, params, mom2, quantile=None, shape=None):
-    """moment conditions for estimating loc and scale of a distribution
+    '''moment conditions for estimating loc and scale of a distribution
     with method of moments using either 2 quantiles or 2 moments (not both).

     Returns
@@ -149,12 +168,24 @@ def momentcondunboundls(distfn, params, mom2, quantile=None, shape=None):
     difference : ndarray
         difference between theoretical and empirical moments or quantiles

-    """
-    pass
+    '''
+    loc, scale = params
+    mom2diff = np.array(distfn.stats(shape, loc, scale)) - mom2
+    if quantile is not None:
+        pq, xq = quantile
+        #ppfdiff = distfn.ppf(pq, alpha)
+        cdfdiff = distfn.cdf(xq, shape, loc, scale) - pq
+        #return np.concatenate([mom2diff, cdfdiff[:1]])
+        return cdfdiff
+    return mom2diff
+
+

+######### try quantile GMM with identity weight matrix
+# (just a guess that's what it is)

 def momentcondquant(distfn, params, mom2, quantile=None, shape=None):
-    """moment conditions for estimating distribution parameters by matching
+    '''moment conditions for estimating distribution parameters by matching
     quantiles, defines as many moment conditions as quantiles.

     Returns
@@ -167,12 +198,44 @@ def momentcondquant(distfn, params, mom2, quantile=None, shape=None):
     This can be used for method of moments or for generalized method of
     moments.

-    """
-    pass
+    '''
+    #this check looks redundant/unused now
+    if len(params) == 2:
+        loc, scale = params
+    elif len(params) == 3:
+        shape, loc, scale = params
+    else:
+        #raise NotImplementedError
+        pass #see whether this might work, seems to work for beta with 2 shape args
+
+    #mom2diff = np.array(distfn.stats(*params)) - mom2
+    #if not quantile is None:
+    pq, xq = quantile
+    #ppfdiff = distfn.ppf(pq, alpha)
+    cdfdiff = distfn.cdf(xq, *params) - pq
+    #return np.concatenate([mom2diff, cdfdiff[:1]])
+    return cdfdiff
+    #return mom2diff
+
+def fitquantilesgmm(distfn, x, start=None, pquant=None, frozen=None):
+    if pquant is None:
+        pquant = np.array([0.01, 0.05,0.1,0.4,0.6,0.9,0.95,0.99])
+    if start is None:
+        if hasattr(distfn, '_fitstart'):
+            start = distfn._fitstart(x)
+        else:
+            start = [1]*distfn.numargs + [0.,1.]
+    #TODO: vectorize this:
+    xqs = [stats.scoreatpercentile(x, p) for p in pquant*100]
+    mom2s = None
+    parest = optimize.fmin(lambda params:np.sum(
+        momentcondquant(distfn, params, mom2s,(pquant,xqs), shape=None)**2), start)
+    return parest
+


 def fitbinned(distfn, freq, binedges, start, fixed=None):
-    """estimate parameters of distribution function for binned data using MLE
+    '''estimate parameters of distribution function for binned data using MLE

     Parameters
     ----------
@@ -196,13 +259,24 @@ def fitbinned(distfn, freq, binedges, start, fixed=None):

     added factorial

-    """
-    pass
+    '''
+    if fixed is not None:
+        raise NotImplementedError
+    nobs = np.sum(freq)
+    lnnobsfact = special.gammaln(nobs+1)
+
+    def nloglike(params):
+        '''negative loglikelihood function of binned data

+        corresponds to multinomial
+        '''
+        prob = np.diff(distfn.cdf(binedges, *params))
+        return -(lnnobsfact + np.sum(freq*np.log(prob)- special.gammaln(freq+1)))
+    return optimize.fmin(nloglike, start)

-def fitbinnedgmm(distfn, freq, binedges, start, fixed=None, weightsoptimal=True
-    ):
-    """estimate parameters of distribution function for binned data using GMM
+
+def fitbinnedgmm(distfn, freq, binedges, start, fixed=None, weightsoptimal=True):
+    '''estimate parameters of distribution function for binned data using GMM

     Parameters
     ----------
@@ -231,10 +305,28 @@ def fitbinnedgmm(distfn, freq, binedges, start, fixed=None, weightsoptimal=True

     added factorial

-    """
-    pass
-
-
+    '''
+    if fixed is not None:
+        raise NotImplementedError
+    nobs = np.sum(freq)
+    if weightsoptimal:
+        weights = freq/float(nobs)
+    else:
+        weights = np.ones(len(freq))
+    freqnormed = freq/float(nobs)
+    # skip turning weights into matrix diag(freq/float(nobs))
+
+    def gmmobjective(params):
+        '''negative loglikelihood function of binned data
+
+        corresponds to multinomial
+        '''
+        prob = np.diff(distfn.cdf(binedges, *params))
+        momcond = freqnormed - prob
+        return np.dot(momcond*weights, momcond)
+    return optimize.fmin(gmmobjective, start)
+
+#Addition from try_maxproductspacings:
 """Estimating Parameters of Log-Normal Distribution with Maximum
 Likelihood and Maximum Product-of-Spacings

@@ -245,9 +337,16 @@ Author: josef-pktd
 License: BSD
 """

+def hess_ndt(fun, pars, args, options):
+    import numdifftools as ndt
+    if not ('stepMax' in options or 'stepFix' in options):
+        options['stepMax'] = 1e-5
+    f = lambda params: fun(params, *args)
+    h = ndt.Hessian(f, **options)
+    return h(pars), h

 def logmps(params, xsorted, dist):
-    """calculate negative log of Product-of-Spacings
+    '''calculate negative log of Product-of-Spacings

     Parameters
     ----------
@@ -267,12 +366,13 @@ def logmps(params, xsorted, dist):
     Notes
     -----
     MPS definition from JKB page 233
-    """
-    pass
-
+    '''
+    xcdf = np.r_[0., dist.cdf(xsorted, *params), 1.]
+    D = np.diff(xcdf)
+    return -np.log(D).mean()

 def getstartparams(dist, data):
-    """get starting values for estimation of distribution parameters
+    '''get starting values for estimation of distribution parameters

     Parameters
     ----------
@@ -289,12 +389,19 @@ def getstartparams(dist, data):
         preliminary estimate or starting value for the parameters of
         the distribution given the data, including loc and scale

-    """
-    pass
-
+    '''
+    if hasattr(dist, 'fitstart'):
+        #x0 = getattr(dist, 'fitstart')(data)
+        x0 = dist.fitstart(data)
+    else:
+        if np.isfinite(dist.a):
+            x0 = np.r_[[1.]*dist.numargs, (data.min()-1), 1.]
+        else:
+            x0 = np.r_[[1.]*dist.numargs, (data.mean()-1), 1.]
+    return x0

 def fit_mps(dist, data, x0=None):
-    """Estimate distribution parameters with Maximum Product-of-Spacings
+    '''Estimate distribution parameters with Maximum Product-of-Spacings

     Parameters
     ----------
@@ -312,97 +419,145 @@ def fit_mps(dist, data, x0=None):
         including loc and scale


-    """
-    pass
+    '''
+    xsorted = np.sort(data)
+    if x0 is None:
+        x0 = getstartparams(dist, xsorted)
+    args = (xsorted, dist)
+    print(x0)
+    #print(args)
+    return optimize.fmin(logmps, x0, args=args)
+


 if __name__ == '__main__':
+
+    #Example: gamma - distribution
+    #-----------------------------
+
     print('\n\nExample: gamma Distribution')
-    print('---------------------------')
+    print(    '---------------------------')
+
     alpha = 2
     xq = [0.5, 4]
     pq = [0.1, 0.9]
     print(stats.gamma.ppf(pq, alpha))
     xq = stats.gamma.ppf(pq, alpha)
-    print(np.diff(stats.gamma.ppf(pq, np.linspace(0.01, 4, 10)[:, None]) *
-        xq[::-1]))
-    print(optimize.fsolve(lambda alpha: np.diff(stats.gamma.ppf(pq, alpha) *
-        xq[::-1]), 3.0))
+    print(np.diff((stats.gamma.ppf(pq, np.linspace(0.01,4,10)[:,None])*xq[::-1])))
+    #optimize.bisect(lambda alpha: np.diff((stats.gamma.ppf(pq, alpha)*xq[::-1])))
+    print(optimize.fsolve(lambda alpha: np.diff((stats.gamma.ppf(pq, alpha)*xq[::-1])), 3.))
+
     distfn = stats.gamma
-    mcond = gammamomentcond(distfn, [5.0, 10], mom2=stats.gamma.stats(alpha,
-        0.0, 1.0), quantile=None)
-    print(optimize.fsolve(mcond, [1.0, 2.0]))
-    mom2 = stats.gamma.stats(alpha, 0.0, 1.0)
-    print(optimize.fsolve(lambda params: gammamomentcond2(distfn, params,
-        mom2), [1.0, 2.0]))
-    grvs = stats.gamma.rvs(alpha, 0.0, 2.0, size=1000)
+    mcond = gammamomentcond(distfn, [5.,10], mom2=stats.gamma.stats(alpha, 0.,1.), quantile=None)
+    print(optimize.fsolve(mcond, [1.,2.]))
+    mom2 = stats.gamma.stats(alpha, 0.,1.)
+    print(optimize.fsolve(lambda params:gammamomentcond2(distfn, params, mom2), [1.,2.]))
+
+    grvs = stats.gamma.rvs(alpha, 0.,2., size=1000)
     mom2 = np.array([grvs.mean(), grvs.var()])
-    alphaestq = optimize.fsolve(lambda params: gammamomentcond2(distfn,
-        params, mom2), [1.0, 3.0])
+    alphaestq = optimize.fsolve(lambda params:gammamomentcond2(distfn, params, mom2), [1.,3.])
     print(alphaestq)
-    print('scale = ', xq / stats.gamma.ppf(pq, alphaestq))
+    print('scale = ', xq/stats.gamma.ppf(pq, alphaestq))
+
+
+    #Example beta - distribution
+    #---------------------------
+
+    #Warning: this example had cut-and-paste errors
+
     print('\n\nExample: beta Distribution')
-    print('--------------------------')
-    stats.distributions.beta_gen._fitstart = lambda self, data: (5, 5, 0, 1)
-    pq = np.array([0.01, 0.05, 0.1, 0.4, 0.6, 0.9, 0.95, 0.99])
-    rvsb = stats.beta.rvs(10, 15, size=2000)
+    print(    '--------------------------')
+
+    #monkey patching :
+##    if hasattr(stats.beta, '_fitstart'):
+##        del stats.beta._fitstart  #bug in _fitstart  #raises AttributeError: _fitstart
+    #stats.distributions.beta_gen._fitstart = lambda self, data : np.array([1,1,0,1])
+    #_fitstart seems to require a tuple
+    stats.distributions.beta_gen._fitstart = lambda self, data : (5,5,0,1)
+
+    pq = np.array([0.01, 0.05,0.1,0.4,0.6,0.9,0.95,0.99])
+    #rvsb = stats.beta.rvs(0.5,0.15,size=200)
+    rvsb = stats.beta.rvs(10,15,size=2000)
     print('true params', 10, 15, 0, 1)
     print(stats.beta.fit(rvsb))
-    xqsb = [stats.scoreatpercentile(rvsb, p) for p in pq * 100]
+    xqsb = [stats.scoreatpercentile(rvsb, p) for p in pq*100]
     mom2s = np.array([rvsb.mean(), rvsb.var()])
-    betaparest_gmmquantile = optimize.fmin(lambda params: np.sum(
-        momentcondquant(stats.beta, params, mom2s, (pq, xqsb), shape=None) **
-        2), [10, 10, 0.0, 1.0], maxiter=2000)
-    print('betaparest_gmmquantile', betaparest_gmmquantile)
+    betaparest_gmmquantile = optimize.fmin(lambda params:np.sum(momentcondquant(stats.beta, params, mom2s,(pq,xqsb), shape=None)**2),
+                                           [10,10, 0., 1.], maxiter=2000)
+    print('betaparest_gmmquantile',  betaparest_gmmquantile)
+    #result sensitive to initial condition
+
+
+    #Example t - distribution
+    #------------------------
+
     print('\n\nExample: t Distribution')
-    print('-----------------------')
+    print(    '-----------------------')
+
     nobs = 1000
     distfn = stats.t
-    pq = np.array([0.1, 0.9])
-    paramsdgp = 5, 0, 1
+    pq = np.array([0.1,0.9])
+    paramsdgp = (5, 0, 1)
     trvs = distfn.rvs(5, 0, 1, size=nobs)
-    xqs = [stats.scoreatpercentile(trvs, p) for p in pq * 100]
+    xqs = [stats.scoreatpercentile(trvs, p) for p in pq*100]
     mom2th = distfn.stats(*paramsdgp)
     mom2s = np.array([trvs.mean(), trvs.var()])
-    tparest_gmm3quantilefsolve = optimize.fsolve(lambda params:
-        momentcondunbound(distfn, params, mom2s, (pq, xqs)), [10, 1.0, 2.0])
+    tparest_gmm3quantilefsolve = optimize.fsolve(lambda params:momentcondunbound(distfn,params, mom2s,(pq,xqs)), [10,1.,2.])
     print('tparest_gmm3quantilefsolve', tparest_gmm3quantilefsolve)
-    tparest_gmm3quantile = optimize.fmin(lambda params: np.sum(
-        momentcondunbound(distfn, params, mom2s, (pq, xqs)) ** 2), [10, 1.0,
-        2.0])
+    tparest_gmm3quantile = optimize.fmin(lambda params:np.sum(momentcondunbound(distfn,params, mom2s,(pq,xqs))**2), [10,1.,2.])
     print('tparest_gmm3quantile', tparest_gmm3quantile)
     print(distfn.fit(trvs))
-    print(optimize.fsolve(lambda params: momentcondunboundls(distfn, params,
-        mom2s, shape=5), [1.0, 2.0]))
-    print(optimize.fmin(lambda params: np.sum(momentcondunboundls(distfn,
-        params, mom2s, shape=5) ** 2), [1.0, 2.0]))
+
+    ##
+
+    ##distfn = stats.t
+    ##pq = np.array([0.1,0.9])
+    ##paramsdgp = (5, 0, 1)
+    ##trvs = distfn.rvs(5, 0, 1, size=nobs)
+    ##xqs = [stats.scoreatpercentile(trvs, p) for p in pq*100]
+    ##mom2th = distfn.stats(*paramsdgp)
+    ##mom2s = np.array([trvs.mean(), trvs.var()])
+    print(optimize.fsolve(lambda params:momentcondunboundls(distfn, params, mom2s,shape=5), [1.,2.]))
+    print(optimize.fmin(lambda params:np.sum(momentcondunboundls(distfn, params, mom2s,shape=5)**2), [1.,2.]))
     print(distfn.fit(trvs))
-    print(optimize.fsolve(lambda params: momentcondunboundls(distfn, params,
-        mom2s, (pq, xqs), shape=5), [1.0, 2.0]))
-    pq = np.array([0.01, 0.05, 0.1, 0.4, 0.6, 0.9, 0.95, 0.99])
-    xqs = [stats.scoreatpercentile(trvs, p) for p in pq * 100]
-    tparest_gmmquantile = optimize.fmin(lambda params: np.sum(
-        momentcondquant(distfn, params, mom2s, (pq, xqs), shape=None) ** 2),
-        [10, 1.0, 2.0])
+    #loc, scale, based on quantiles
+    print(optimize.fsolve(lambda params:momentcondunboundls(distfn, params, mom2s,(pq,xqs),shape=5), [1.,2.]))
+
+    ##
+
+    pq = np.array([0.01, 0.05,0.1,0.4,0.6,0.9,0.95,0.99])
+    #paramsdgp = (5, 0, 1)
+    xqs = [stats.scoreatpercentile(trvs, p) for p in pq*100]
+    tparest_gmmquantile = optimize.fmin(lambda params:np.sum(momentcondquant(distfn, params, mom2s,(pq,xqs), shape=None)**2), [10, 1.,2.])
     print('tparest_gmmquantile', tparest_gmmquantile)
-    tparest_gmmquantile2 = fitquantilesgmm(distfn, trvs, start=[10, 1.0, 
-        2.0], pquant=None, frozen=None)
+    tparest_gmmquantile2 = fitquantilesgmm(distfn, trvs, start=[10, 1.,2.], pquant=None, frozen=None)
     print('tparest_gmmquantile2', tparest_gmmquantile2)
-    bt = stats.t.ppf(np.linspace(0, 1, 21), 5)
-    ft, bt = np.histogram(trvs, bins=bt)
+
+
+    ##
+
+
+    #use trvs from before
+    bt = stats.t.ppf(np.linspace(0,1,21),5)
+    ft,bt = np.histogram(trvs,bins=bt)
     print('fitbinned t-distribution')
     tparest_mlebinew = fitbinned(stats.t, ft, bt, [10, 0, 1])
     tparest_gmmbinewidentity = fitbinnedgmm(stats.t, ft, bt, [10, 0, 1])
-    tparest_gmmbinewoptimal = fitbinnedgmm(stats.t, ft, bt, [10, 0, 1],
-        weightsoptimal=False)
+    tparest_gmmbinewoptimal = fitbinnedgmm(stats.t, ft, bt, [10, 0, 1], weightsoptimal=False)
     print(paramsdgp)
-    ft2, bt2 = np.histogram(trvs, bins=50)
-    """fitbinned t-distribution"""
+
+    #Note: this can be used for chisquare test and then has correct asymptotic
+    #   distribution for a distribution with estimated parameters, find ref again
+    #TODO combine into test with binning included, check rule for number of bins
+
+    #bt2 = stats.t.ppf(np.linspace(trvs.,1,21),5)
+    ft2,bt2 = np.histogram(trvs,bins=50)
+    'fitbinned t-distribution'
     tparest_mlebinel = fitbinned(stats.t, ft2, bt2, [10, 0, 1])
     tparest_gmmbinelidentity = fitbinnedgmm(stats.t, ft2, bt2, [10, 0, 1])
-    tparest_gmmbineloptimal = fitbinnedgmm(stats.t, ft2, bt2, [10, 0, 1],
-        weightsoptimal=False)
+    tparest_gmmbineloptimal = fitbinnedgmm(stats.t, ft2, bt2, [10, 0, 1], weightsoptimal=False)
     tparest_mle = stats.t.fit(trvs)
+
     np.set_printoptions(precision=6)
     print('sample size', nobs)
     print('true (df, loc, scale)      ', paramsdgp)
@@ -419,7 +574,8 @@ if __name__ == '__main__':
     print('tparest_gmmquantileidentity', tparest_gmmquantile)
     print('tparest_gmm3quantilefsolve ', tparest_gmm3quantilefsolve)
     print('tparest_gmm3quantile       ', tparest_gmm3quantile)
-    """ example results:
+
+    ''' example results:
     standard error for df estimate looks large
     note: I do not impose that df is an integer (b/c not necessary)
     need Monte Carlo to check variance of estimators
@@ -440,48 +596,82 @@ if __name__ == '__main__':
     tparest_gmmquantileidentity [ 3.940797 -0.046469  1.002001]
     tparest_gmm3quantilefsolve  [ 10.   1.   2.]
     tparest_gmm3quantile        [ 6.376101 -0.029322  1.112403]
-    """
+    '''
+
+    #Example with Maximum Product of Spacings Estimation
+    #===================================================
+
+    #Example: Lognormal Distribution
+    #-------------------------------
+
+    #tough problem for MLE according to JKB
+    #but not sure for which parameters
+
     print('\n\nExample: Lognormal Distribution')
-    print('-------------------------------')
+    print(    '-------------------------------')
+
     sh = np.exp(10)
     sh = 0.01
     print(sh)
-    x = stats.lognorm.rvs(sh, loc=100, scale=10, size=200)
+    x = stats.lognorm.rvs(sh,loc=100, scale=10,size=200)
+
     print(x.min())
-    print(stats.lognorm.fit(x, 1.0, loc=x.min() - 1, scale=1))
+    print(stats.lognorm.fit(x,  1.,loc=x.min()-1,scale=1))
+
     xsorted = np.sort(x)
-    x0 = [1.0, x.min() - 1, 1]
-    args = xsorted, stats.lognorm
-    print(optimize.fmin(logmps, x0, args=args))
+
+    x0 = [1., x.min()-1, 1]
+    args = (xsorted, stats.lognorm)
+    print(optimize.fmin(logmps,x0,args=args))
+
+
+    #Example: Lomax, Pareto, Generalized Pareto Distributions
+    #--------------------------------------------------------
+
+    #partially a follow-up to the discussion about numpy.random.pareto
+    #Reference: JKB
+    #example Maximum Product of Spacings Estimation
+
+    # current results:
+    # does not look very good yet; sensitive to starting values
+    # Pareto and Generalized Pareto look like a tough estimation problem
+
     print('\n\nExample: Lomax, Pareto, Generalized Pareto Distributions')
-    print('--------------------------------------------------------')
+    print(    '--------------------------------------------------------')
+
     p2rvs = stats.genpareto.rvs(2, size=500)
+    #Note: is Lomax without +1; and classical Pareto with +1
     p2rvssorted = np.sort(p2rvs)
-    argsp = p2rvssorted, stats.pareto
-    x0p = [1.0, p2rvs.min() - 5, 1]
-    print(optimize.fmin(logmps, x0p, args=argsp))
+    argsp = (p2rvssorted, stats.pareto)
+    x0p = [1., p2rvs.min()-5, 1]
+    print(optimize.fmin(logmps,x0p,args=argsp))
     print(stats.pareto.fit(p2rvs, 0.5, loc=-20, scale=0.5))
     print('gpdparest_ mle', stats.genpareto.fit(p2rvs))
     parsgpd = fit_mps(stats.genpareto, p2rvs)
     print('gpdparest_ mps', parsgpd)
-    argsgpd = p2rvssorted, stats.genpareto
-    options = dict(stepFix=1e-07)
+    argsgpd = (p2rvssorted, stats.genpareto)
+    options = dict(stepFix=1e-7)
+    #hess_ndt(fun, pars, argsgpd, options)
+    #the results for the following look strange, maybe refactoring error
     he, h = hess_ndt(logmps, parsgpd, argsgpd, options)
     print(np.linalg.eigh(he)[0])
     f = lambda params: logmps(params, *argsgpd)
     print(f(parsgpd))
+    #add binned
     fp2, bp2 = np.histogram(p2rvs, bins=50)
-    """fitbinned t-distribution"""
+    'fitbinned genpareto distribution'
     gpdparest_mlebinel = fitbinned(stats.genpareto, fp2, bp2, x0p)
     gpdparest_gmmbinelidentity = fitbinnedgmm(stats.genpareto, fp2, bp2, x0p)
     print('gpdparest_mlebinel', gpdparest_mlebinel)
     print('gpdparest_gmmbinelidentity', gpdparest_gmmbinelidentity)
-    gpdparest_gmmquantile2 = fitquantilesgmm(stats.genpareto, p2rvs, start=
-        x0p, pquant=None, frozen=None)
+    gpdparest_gmmquantile2 = fitquantilesgmm(
+        stats.genpareto, p2rvs, start=x0p, pquant=None, frozen=None)
     print('gpdparest_gmmquantile2', gpdparest_gmmquantile2)
-    print(fitquantilesgmm(stats.genpareto, p2rvs, start=x0p, pquant=np.
-        linspace(0.01, 0.99, 10), frozen=None))
-    fp2, bp2 = np.histogram(p2rvs, bins=stats.genpareto(2).ppf(np.linspace(
-        0, 0.99, 10)))
+
+    print(fitquantilesgmm(stats.genpareto, p2rvs, start=x0p,
+                          pquant=np.linspace(0.01,0.99,10), frozen=None))
+    fp2, bp2 = np.histogram(
+        p2rvs,
+        bins=stats.genpareto(2).ppf(np.linspace(0,0.99,10)))
     print('fitbinnedgmm equal weight bins')
     print(fitbinnedgmm(stats.genpareto, fp2, bp2, x0p))
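logmps/fit_mps above implement Maximum Product-of-Spacings: minimize the negative mean log of the CDF spacings D_i = F(x_(i)) - F(x_(i-1)) over the sorted sample, with F(x_(0)) = 0 and F(x_(n+1)) = 1. A self-contained sketch of the same idea on a normal sample, using only scipy (illustrative, not part of the patch):

import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1234)
xsorted = np.sort(rng.normal(loc=3.0, scale=2.0, size=500))

def neg_log_mps(params, xsorted, dist):
    # spacings of the fitted CDF over the sorted sample, padded with 0 and 1
    cdf = np.r_[0.0, dist.cdf(xsorted, *params), 1.0]
    spacings = np.clip(np.diff(cdf), 1e-300, None)  # guard against ties
    return -np.log(spacings).mean()

# start from the moment estimates, in the spirit of getstartparams
x0 = [xsorted.mean(), xsorted.std()]
est = optimize.fmin(neg_log_mps, x0, args=(xsorted, stats.norm))
print(est)   # should be close to loc=3, scale=2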
diff --git a/statsmodels/sandbox/distributions/examples/ex_extras.py b/statsmodels/sandbox/distributions/examples/ex_extras.py
index c7b6e9616..9ab22c847 100644
--- a/statsmodels/sandbox/distributions/examples/ex_extras.py
+++ b/statsmodels/sandbox/distributions/examples/ex_extras.py
@@ -1,13 +1,125 @@
+# -*- coding: utf-8 -*-
 """

 Created on Wed Feb 19 12:39:49 2014

 Author: Josef Perktold
 """
+
 import numpy as np
 from scipy import stats
-from statsmodels.sandbox.distributions.extras import SkewNorm_gen, skewnorm, ACSkewT_gen, NormExpan_gen, pdf_moments, ExpTransf_gen, LogTransf_gen
+
+from statsmodels.sandbox.distributions.extras import (SkewNorm_gen, skewnorm,
+                                ACSkewT_gen,
+                                NormExpan_gen, pdf_moments,
+                                ExpTransf_gen, LogTransf_gen)
 from statsmodels.stats.moment_helpers import mc2mvsk, mnc2mc, mvsk2mnc
+
+
+def example_n():
+
+    print(skewnorm.pdf(1,0), stats.norm.pdf(1), skewnorm.pdf(1,0) - stats.norm.pdf(1))
+    print(skewnorm.pdf(1,1000), stats.chi.pdf(1,1), skewnorm.pdf(1,1000) - stats.chi.pdf(1,1))
+    print(skewnorm.pdf(-1,-1000), stats.chi.pdf(1,1), skewnorm.pdf(-1,-1000) - stats.chi.pdf(1,1))
+    rvs = skewnorm.rvs(0,size=500)
+    print('sample mean var: ', rvs.mean(), rvs.var())
+    print('theoretical mean var', skewnorm.stats(0))
+    rvs = skewnorm.rvs(5,size=500)
+    print('sample mean var: ', rvs.mean(), rvs.var())
+    print('theoretical mean var', skewnorm.stats(5))
+    print(skewnorm.cdf(1,0), stats.norm.cdf(1), skewnorm.cdf(1,0) - stats.norm.cdf(1))
+    print(skewnorm.cdf(1,1000), stats.chi.cdf(1,1), skewnorm.cdf(1,1000) - stats.chi.cdf(1,1))
+    print(skewnorm.sf(0.05,1000), stats.chi.sf(0.05,1), skewnorm.sf(0.05,1000) - stats.chi.sf(0.05,1))
+
+
+def example_T():
+    skewt = ACSkewT_gen()
+    rvs = skewt.rvs(10,0,size=500)
+    print('sample mean var: ', rvs.mean(), rvs.var())
+    print('theoretical mean var', skewt.stats(10,0))
+    print('t mean var', stats.t.stats(10))
+    print(skewt.stats(10,1000)) # -> folded t distribution, as alpha -> inf
+    rvs = np.abs(stats.t.rvs(10,size=1000))
+    print(rvs.mean(), rvs.var())
+
+
+
+def examples_normexpand():
+    skewnorm = SkewNorm_gen()
+    rvs = skewnorm.rvs(5,size=100)
+    normexpan = NormExpan_gen(rvs, mode='sample')
+
+    smvsk = stats.describe(rvs)[2:]
+    print('sample: mu,sig,sk,kur')
+    print(smvsk)
+
+    dmvsk = normexpan.stats(moments='mvsk')
+    print('normexpan: mu,sig,sk,kur')
+    print(dmvsk)
+    print('mvsk diff distribution - sample')
+    print(np.array(dmvsk) - np.array(smvsk))
+    print('normexpan attributes mvsk')
+    print(mc2mvsk(normexpan.cnt))
+    print(normexpan.mvsk)
+
+    mnc = mvsk2mnc(dmvsk)
+    mc = mnc2mc(mnc)
+    print('central moments')
+    print(mc)
+    print('non-central moments')
+    print(mnc)
+
+
+    pdffn = pdf_moments(mc)
+    print('\npdf approximation from moments')
+    print('pdf at', mc[0]-1,mc[0]+1)
+    print(pdffn([mc[0]-1,mc[0]+1]))
+    print(normexpan.pdf([mc[0]-1,mc[0]+1]))
+
+
+def examples_transf():
+    ##lognormal = ExpTransf(a=0.0, xa=-10.0, name = 'Log transformed normal')
+    ##print(lognormal.cdf(1))
+    ##print(stats.lognorm.cdf(1,1))
+    ##print(lognormal.stats())
+    ##print(stats.lognorm.stats(1))
+    ##print(lognormal.rvs(size=10))
+
+    print('Results for lognormal')
+    lognormalg = ExpTransf_gen(stats.norm, a=0, name = 'Log transformed normal general')
+    print(lognormalg.cdf(1))
+    print(stats.lognorm.cdf(1,1))
+    print(lognormalg.stats())
+    print(stats.lognorm.stats(1))
+    print(lognormalg.rvs(size=5))
+
+    ##print('Results for loggamma')
+    ##loggammag = ExpTransf_gen(stats.gamma)
+    ##print(loggammag._cdf(1,10))
+    ##print(stats.loggamma.cdf(1,10))
+
+    print('Results for expgamma')
+    loggammaexpg = LogTransf_gen(stats.gamma)
+    print(loggammaexpg._cdf(1,10))
+    print(stats.loggamma.cdf(1,10))
+    print(loggammaexpg._cdf(2,15))
+    print(stats.loggamma.cdf(2,15))
+
+
+    # this requires change in scipy.stats.distribution
+    #print(loggammaexpg.cdf(1,10))
+
+    print('Results for loglaplace')
+    loglaplaceg = LogTransf_gen(stats.laplace)
+    print(loglaplaceg._cdf(2))
+    print(stats.loglaplace.cdf(2,1))
+    loglaplaceexpg = ExpTransf_gen(stats.laplace)
+    print(loglaplaceexpg._cdf(2))
+    stats.loglaplace.cdf(3,3)
+    #0.98148148148148151
+    loglaplaceexpg._cdf(3,0,1./3)
+    #0.98148148148148151
+
 if __name__ == '__main__':
     example_n()
     example_T()
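examples_transf above compares ExpTransf_gen/LogTransf_gen with the named scipy distributions. The lognormal case rests on the identity that the exponential of a normal variate is lognormal, which can be checked with scipy alone (a small illustrative check, not part of the patch):

import numpy as np
from scipy import stats

# If X ~ Normal(mu, sigma), then Y = exp(X) ~ lognorm(s=sigma, scale=exp(mu)).
x = np.linspace(0.1, 5, 9)
print(stats.lognorm.cdf(x, s=1.0, scale=1.0))
print(stats.norm.cdf(np.log(x)))   # same values via the transformation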
diff --git a/statsmodels/sandbox/distributions/examples/ex_fitfr.py b/statsmodels/sandbox/distributions/examples/ex_fitfr.py
index 5ee42e544..5f00a1251 100644
--- a/statsmodels/sandbox/distributions/examples/ex_fitfr.py
+++ b/statsmodels/sandbox/distributions/examples/ex_fitfr.py
@@ -1,18 +1,28 @@
-"""Example for estimating distribution parameters when some are fixed.
+'''Example for estimating distribution parameters when some are fixed.

 This currently uses a patched version of the distributions: two methods are
 added to the continuous distributions, which has no side effects.
 It also adds bounds to vonmises, which changes its behavior for some
 methods.

-"""
+'''
+
 import numpy as np
 from scipy import stats
+# Note the following import attaches methods to scipy.stats.distributions
+#     and adds bounds to stats.vonmises
+# from statsmodels.sandbox.distributions import sppatch
+
+
 np.random.seed(12345)
 x = stats.gamma.rvs(2.5, loc=0, scale=1.2, size=200)
+
+#estimate all parameters
 print(stats.gamma.fit(x))
 print(stats.gamma.fit_fr(x, frozen=[np.nan, np.nan, np.nan]))
-print(stats.gamma.fit_fr(x, frozen=[np.nan, 0.0, 1.2]))
+#estimate shape parameter only
+print(stats.gamma.fit_fr(x, frozen=[np.nan, 0., 1.2]))
+
 np.random.seed(12345)
 x = stats.lognorm.rvs(2, loc=0, scale=2, size=200)
-print(stats.lognorm.fit_fr(x, frozen=[np.nan, 0.0, np.nan]))
+print(stats.lognorm.fit_fr(x, frozen=[np.nan, 0., np.nan]))
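fit_fr with a frozen vector is the sandbox mechanism for fixing some parameters during estimation. Recent scipy versions expose the same idea through the floc/fscale keywords of fit; a hedged equivalent of the gamma example above, assuming scipy's standard fit interface:

import numpy as np
from scipy import stats

np.random.seed(12345)
x = stats.gamma.rvs(2.5, loc=0, scale=1.2, size=200)

# estimate only the shape parameter, holding loc=0 and scale=1.2 fixed
shape, loc, scale = stats.gamma.fit(x, floc=0.0, fscale=1.2)
print(shape, loc, scale)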
diff --git a/statsmodels/sandbox/distributions/examples/ex_gof.py b/statsmodels/sandbox/distributions/examples/ex_gof.py
index 2304de8a0..13345fb7a 100644
--- a/statsmodels/sandbox/distributions/examples/ex_gof.py
+++ b/statsmodels/sandbox/distributions/examples/ex_gof.py
@@ -1,9 +1,11 @@
 from scipy import stats
 from statsmodels.stats import gof
-poissrvs = stats.poisson.rvs(0.6, size=200)
-freq, expfreq, histsupp = gof.gof_binning_discrete(poissrvs, stats.poisson,
-    (0.6,), nsupp=20)
-chi2val, pval = stats.chisquare(freq, expfreq)
+
+poissrvs = stats.poisson.rvs(0.6, size = 200)
+
+freq, expfreq, histsupp = gof.gof_binning_discrete(poissrvs, stats.poisson, (0.6,), nsupp=20)
+(chi2val, pval) = stats.chisquare(freq, expfreq)
 print(chi2val, pval)
+
 print(gof.gof_chisquare_discrete(stats.poisson, (0.6,), poissrvs, 0.05,
-    'Poisson'))
+                                     'Poisson'))
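gof_binning_discrete groups the discrete sample into bins and compares observed with expected frequencies under the hypothesized distribution. The underlying chi-square computation can be sketched directly with scipy (the bin layout here is illustrative and differs from gof_binning_discrete's automatic choice):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam = 0.6
rvs = rng.poisson(lam, size=200)

# observed and expected counts on 0, 1, 2 and a pooled tail bin for >= 3
observed = np.r_[[np.sum(rvs == k) for k in (0, 1, 2)], np.sum(rvs >= 3)]
probs = np.r_[stats.poisson.pmf([0, 1, 2], lam), stats.poisson.sf(2, lam)]
expected = probs * rvs.size

chi2val, pval = stats.chisquare(observed, expected)
print(chi2val, pval)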
diff --git a/statsmodels/sandbox/distributions/examples/ex_mvelliptical.py b/statsmodels/sandbox/distributions/examples/ex_mvelliptical.py
index 159ef1dbc..55801491e 100644
--- a/statsmodels/sandbox/distributions/examples/ex_mvelliptical.py
+++ b/statsmodels/sandbox/distributions/examples/ex_mvelliptical.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """examples for multivariate normal and t distributions


@@ -12,99 +13,151 @@ for comparison I used R mvtnorm version 0.9-96
 import numpy as np
 from numpy.testing import assert_array_almost_equal
 import matplotlib.pyplot as plt
+
 import statsmodels.api as sm
 import statsmodels.distributions.mixture_rvs as mix
 import statsmodels.sandbox.distributions.mv_normal as mvd
-cov3 = np.array([[1.0, 0.5, 0.75], [0.5, 1.5, 0.6], [0.75, 0.6, 2.0]])
+
+
+cov3 = np.array([[ 1.  ,  0.5 ,  0.75],
+                   [ 0.5 ,  1.5 ,  0.6 ],
+                   [ 0.75,  0.6 ,  2.  ]])
+
 mu = np.array([-1, 0.0, 2.0])
+
+#************** multivariate normal distribution ***************
+
 mvn3 = mvd.MVNormal(mu, cov3)
+
+#compare with random sample
 x = mvn3.rvs(size=1000000)
-xli = [[2.0, 1.0, 1.5], [0.0, 2.0, 1.5], [1.5, 1.0, 2.5], [0.0, 1.0, 1.5]]
-xliarr = np.asarray(xli).T[None, :, :]
+
+xli = [[2., 1., 1.5],
+       [0., 2., 1.5],
+       [1.5, 1., 2.5],
+       [0., 1., 1.5]]
+
+xliarr = np.asarray(xli).T[None,:, :]
+
+#from R session
+#pmvnorm(lower=-Inf,upper=(x[0,.]-mu)/sqrt(diag(cov3)),mean=rep(0,3),corr3)
 r_cdf = [0.3222292, 0.3414643, 0.5450594, 0.3116296]
 r_cdf_errors = [1.715116e-05, 1.590284e-05, 5.356471e-05, 3.567548e-05]
 n_cdf = [mvn3.cdf(a) for a in xli]
 assert_array_almost_equal(r_cdf, n_cdf, decimal=4)
+
 print(n_cdf)
 print('')
-print((x < np.array(xli[0])).all(-1).mean(0))
-print((x[..., None] < xliarr).all(1).mean(0))
-print(mvn3.expect_mc(lambda x: (x < xli[0]).all(-1), size=100000))
-print(mvn3.expect_mc(lambda x: (x[..., None] < xliarr).all(1), size=100000))
+print((x<np.array(xli[0])).all(-1).mean(0))
+print((x[...,None]<xliarr).all(1).mean(0))
+print(mvn3.expect_mc(lambda x: (x<xli[0]).all(-1), size=100000))
+print(mvn3.expect_mc(lambda x: (x[...,None]<xliarr).all(1), size=100000))
+
+#other methods
 mvn3n = mvn3.normalized()
+
 assert_array_almost_equal(mvn3n.cov, mvn3n.corr, decimal=15)
 assert_array_almost_equal(mvn3n.mean, np.zeros(3), decimal=15)
+
 xn = mvn3.normalize(x)
 xn_cov = np.cov(xn, rowvar=0)
 assert_array_almost_equal(mvn3n.cov, xn_cov, decimal=2)
 assert_array_almost_equal(np.zeros(3), xn.mean(0), decimal=2)
+
 mvn3n2 = mvn3.normalized2()
 assert_array_almost_equal(mvn3n.cov, mvn3n2.cov, decimal=2)
+#mistake: "normalized2" standardizes - FIXED
+#assert_array_almost_equal(np.eye(3), mvn3n2.cov, decimal=2)
+
 xs = mvn3.standardize(x)
 xs_cov = np.cov(xn, rowvar=0)
+#another mix-up: xs is normalized
+#assert_array_almost_equal(np.eye(3), xs_cov, decimal=2)
 assert_array_almost_equal(mvn3.corr, xs_cov, decimal=2)
 assert_array_almost_equal(np.zeros(3), xs.mean(0), decimal=2)
-mv2m = mvn3.marginal(np.array([0, 1]))
+
+mv2m = mvn3.marginal(np.array([0,1]))
 print(mv2m.mean)
 print(mv2m.cov)
-mv2c = mvn3.conditional(np.array([0, 1]), [0])
+
+mv2c = mvn3.conditional(np.array([0,1]), [0])
 print(mv2c.mean)
 print(mv2c.cov)
+
 mv2c = mvn3.conditional(np.array([0]), [0, 0])
 print(mv2c.mean)
 print(mv2c.cov)
-mod = sm.OLS(x[:, 0], sm.add_constant(x[:, 1:], prepend=True))
+
+mod = sm.OLS(x[:,0], sm.add_constant(x[:,1:], prepend=True))
 res = mod.fit()
-print(res.model.predict(np.array([1, 0, 0])))
+print(res.model.predict(np.array([1,0,0])))
 mv2c = mvn3.conditional(np.array([0]), [0, 0])
 print(mv2c.mean)
 mv2c = mvn3.conditional(np.array([0]), [1, 1])
-print(res.model.predict(np.array([1, 1, 1])))
+print(res.model.predict(np.array([1,1,1])))
 print(mv2c.mean)
+
+#the following wrong input does not raise an exception but produces wrong numbers
+#mv2c = mvn3.conditional(np.array([0]), [[1, 1],[2,2]])
+
+#************** multivariate t distribution ***************
+
 mvt3 = mvd.MVT(mu, cov3, 4)
 xt = mvt3.rvs(size=100000)
 assert_array_almost_equal(mvt3.cov, np.cov(xt, rowvar=0), decimal=1)
 mvt3s = mvt3.standardized()
 mvt3n = mvt3.normalized()
+
+#the following should be equal or correct up to numerical precision of float
 assert_array_almost_equal(mvt3.corr, mvt3n.sigma, decimal=15)
 assert_array_almost_equal(mvt3n.corr, mvt3n.sigma, decimal=15)
 assert_array_almost_equal(np.eye(3), mvt3s.sigma, decimal=15)
+
 xts = mvt3.standardize(xt)
 xts_cov = np.cov(xts, rowvar=0)
 xtn = mvt3.normalize(xt)
 xtn_cov = np.cov(xtn, rowvar=0)
 xtn_corr = np.corrcoef(xtn, rowvar=0)
+
 assert_array_almost_equal(mvt3n.mean, xtn.mean(0), decimal=2)
+#the following might fail sometimes (random test), add seed in tests
 assert_array_almost_equal(mvt3n.corr, xtn_corr, decimal=1)
+#watch out: cov is not the same as sigma for the t distribution; what's right here?
+#normalize by sigma or by cov? currently normalized by sigma
 assert_array_almost_equal(mvt3n.cov, xtn_cov, decimal=1)
 assert_array_almost_equal(mvt3s.cov, xts_cov, decimal=1)
+
 a = [0.0, 1.0, 1.5]
 mvt3_cdf0 = mvt3.cdf(a)
 print(mvt3_cdf0)
-print((xt < np.array(a)).all(-1).mean(0))
-print('R', 0.3026741)
-print('R', 0.3026855)
+print((xt<np.array(a)).all(-1).mean(0))
+print('R', 0.3026741) # "error": 0.0004832187
+print('R', 0.3026855) # error 3.444375e-06   with smaller abseps
 print('diff', mvt3_cdf0 - 0.3026855)
 a = [0.0, 0.5, 1.0]
 mvt3_cdf1 = mvt3.cdf(a)
 print(mvt3_cdf1)
-print((xt < np.array(a)).all(-1).mean(0))
-print('R', 0.1946621)
-print('R', 0.1946217)
+print((xt<np.array(a)).all(-1).mean(0))
+print('R', 0.1946621) # "error": 0.0002524817)
+print('R', 0.1946217) # "error:"2.748699e-06    with smaller abseps)
 print('diff', mvt3_cdf1 - 0.1946217)
+
 assert_array_almost_equal(mvt3_cdf0, 0.3026855, decimal=5)
 assert_array_almost_equal(mvt3_cdf1, 0.1946217, decimal=5)
+
 mu2 = np.array([4, 2.0, 2.0])
-mvn32 = mvd.MVNormal(mu2, cov3 / 2.0, 4)
+mvn32 = mvd.MVNormal(mu2, cov3/2., 4)
 md = mix.mv_mixture_rvs([0.4, 0.6], 5, [mvt3, mvt3n], 3)
 rvs = mix.mv_mixture_rvs([0.4, 0.6], 2000, [mvn3, mvn32], 3)
+#rvs2 = rvs[:,:2]
 fig = plt.figure()
 fig.add_subplot(2, 2, 1)
-plt.plot(rvs[:, 0], rvs[:, 1], '.', alpha=0.25)
+plt.plot(rvs[:,0], rvs[:,1], '.', alpha=0.25)
 plt.title('1 versus 0')
 fig.add_subplot(2, 2, 2)
-plt.plot(rvs[:, 0], rvs[:, 2], '.', alpha=0.25)
+plt.plot(rvs[:,0], rvs[:,2], '.', alpha=0.25)
 plt.title('2 versus 0')
 fig.add_subplot(2, 2, 3)
-plt.plot(rvs[:, 1], rvs[:, 2], '.', alpha=0.25)
+plt.plot(rvs[:,1], rvs[:,2], '.', alpha=0.25)
 plt.title('2 versus 1')
+#plt.show()
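The R mvtnorm reference values r_cdf above can also be cross-checked against scipy, assuming a scipy version whose multivariate_normal provides a cdf method (1.0 or later); a small illustrative check:

import numpy as np
from scipy import stats

cov3 = np.array([[1.0, 0.5, 0.75],
                 [0.5, 1.5, 0.6],
                 [0.75, 0.6, 2.0]])
mu = np.array([-1.0, 0.0, 2.0])
mvn = stats.multivariate_normal(mean=mu, cov=cov3)

xli = [[2.0, 1.0, 1.5], [0.0, 2.0, 1.5], [1.5, 1.0, 2.5], [0.0, 1.0, 1.5]]
print([mvn.cdf(np.asarray(a)) for a in xli])   # compare with r_cdf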
diff --git a/statsmodels/sandbox/distributions/examples/ex_transf2.py b/statsmodels/sandbox/distributions/examples/ex_transf2.py
index c0831a282..c187c3696 100644
--- a/statsmodels/sandbox/distributions/examples/ex_transf2.py
+++ b/statsmodels/sandbox/distributions/examples/ex_transf2.py
@@ -1,30 +1,104 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sun May 09 22:23:22 2010
 Author: josef-pktd
 License: BSD
 """
 import numpy as np
+
 from numpy.testing import assert_almost_equal
 from scipy import stats
-from statsmodels.sandbox.distributions.extras import ExpTransf_gen, LogTransf_gen, squarenormalg, absnormalg, negsquarenormalg, squaretg
+from statsmodels.sandbox.distributions.extras import (
+    ExpTransf_gen, LogTransf_gen,
+    squarenormalg, absnormalg, negsquarenormalg, squaretg)
+
+#define these as module globals
 l, s = 0.0, 1.0
 ppfq = [0.1, 0.5, 0.9]
 xx = [0.95, 1.0, 1.1]
 nxx = [-0.95, -1.0, -1.1]


+def test_loggamma():
+    #'Results for expgamma'
+    loggammaexpg = LogTransf_gen(stats.gamma)
+    cdftr = loggammaexpg._cdf(1,10)
+    cdfst = stats.loggamma.cdf(1,10)
+    assert_almost_equal(cdfst, cdftr, 14)
+
+    cdftr = loggammaexpg._cdf(2,15)
+    cdfst = stats.loggamma.cdf(2,15)
+    assert_almost_equal(cdfst, cdftr, 14)
+
+def test_loglaplace():
+    #if x is laplace then y = exp(x) is loglaplace
+    #parameters are tricky
+    #the stats.loglaplace parameter is the inverse scale of x
+    loglaplaceexpg = ExpTransf_gen(stats.laplace)
+
+    cdfst = stats.loglaplace.cdf(3,3)
+    #0.98148148148148151
+    #the parameters are shape, loc and scale of underlying laplace
+    cdftr = loglaplaceexpg._cdf(3,0,1./3)
+    assert_almost_equal(cdfst, cdftr, 14)
+
 class CheckDistEquivalence:
-    pass
+
+    #no args, kwds yet
+
+    def test_cdf(self):
+        #'\nsquare of standard normal random variable is chisquare with dof=1 distributed'
+        cdftr = self.dist.cdf(xx, *self.trargs, **self.trkwds)
+        sfctr = 1-self.dist.sf(xx, *self.trargs, **self.trkwds) #sf complement
+        cdfst = self.statsdist.cdf(xx, *self.stargs, **self.stkwds)
+        assert_almost_equal(cdfst, cdftr, 14)
+        assert_almost_equal(cdfst, sfctr, 14)
+
+    def test_pdf(self):
+        #'\nsquare of standard normal random variable is chisquare with dof=1 distributed'
+        pdftr = self.dist.pdf(xx, *self.trargs, **self.trkwds)
+        pdfst = self.statsdist.pdf(xx, *self.stargs, **self.stkwds)
+        assert_almost_equal(pdfst, pdftr, 13)
+
+    def test_ppf(self):
+        #'\nsquare of standard normal random variable is chisquare with dof=1 distributed'
+        ppftr = self.dist.ppf(ppfq, *self.trargs, **self.trkwds)
+        ppfst = self.statsdist.ppf(ppfq, *self.stargs, **self.stkwds)
+        assert_almost_equal(ppfst, ppftr, 13)
+
+    def test_rvs(self):
+        rvs = self.dist.rvs(*self.trargs, **{'size':100})
+        mean_s = rvs.mean(0)
+        mean_d, var_d = self.dist.stats(*self.trargs, **{'moments':'mv'})
+        if np.any(np.abs(mean_d) < 1):
+            assert_almost_equal(mean_d, mean_s, 1)
+        else:
+            assert_almost_equal(mean_s/mean_d, 1., 0) #tests 0.5 < mean ratio < 1.5
+
+    def test_stats(self):
+        trkwds = {'moments':'mvsk'}
+        trkwds.update(self.stkwds)
+        stkwds = {'moments':'mvsk'}
+        stkwds.update(self.stkwds)
+        mvsktr = np.array(self.dist.stats(*self.trargs, **trkwds))
+        mvskst = np.array(self.statsdist.stats(*self.stargs, **stkwds))
+        assert_almost_equal(mvskst[:2], mvsktr[:2], 8)
+        if np.any(np.abs(mvskst[2:]) < 1):
+            assert_almost_equal(mvskst[2:], mvsktr[2:], 1)
+        else:
+            assert_almost_equal(mvskst[2:]/mvsktr[2:], np.ones(2), 0)
+            #tests 0.5 < ratio < 1.5
+


 class TestLoggamma_1(CheckDistEquivalence):

     def __init__(self):
         self.dist = LogTransf_gen(stats.gamma)
-        self.trargs = 10,
+        self.trargs = (10,)
         self.trkwds = {}
         self.statsdist = stats.loggamma
-        self.stargs = 10,
+        self.stargs = (10,)
         self.stkwds = {}


@@ -35,10 +109,9 @@ class TestSquaredNormChi2_1(CheckDistEquivalence):
         self.trargs = ()
         self.trkwds = {}
         self.statsdist = stats.chi2
-        self.stargs = 1,
+        self.stargs = (1,)
         self.stkwds = {}

-
 class TestSquaredNormChi2_2(CheckDistEquivalence):

     def __init__(self):
@@ -46,10 +119,9 @@ class TestSquaredNormChi2_2(CheckDistEquivalence):
         self.trargs = ()
         self.trkwds = dict(loc=-10, scale=20)
         self.statsdist = stats.chi2
-        self.stargs = 1,
+        self.stargs = (1,)
         self.stkwds = dict(loc=-10, scale=20)

-
 class TestAbsNormHalfNorm(CheckDistEquivalence):

     def __init__(self):
@@ -60,131 +132,131 @@ class TestAbsNormHalfNorm(CheckDistEquivalence):
         self.stargs = ()
         self.stkwds = {}

-
 class TestSquaredTF(CheckDistEquivalence):

     def __init__(self):
         self.dist = squaretg
-        self.trargs = 10,
+        self.trargs = (10,)
         self.trkwds = {}
+
         self.statsdist = stats.f
-        self.stargs = 1, 10
+        self.stargs = (1,10)
         self.stkwds = {}

+def test_squared_normal_chi2():
+    #'\nsquare of standard normal random variable is chisquare with dof=1 distributed'
+    cdftr = squarenormalg.cdf(xx,loc=l, scale=s)
+    sfctr = 1-squarenormalg.sf(xx,loc=l, scale=s) #sf complement
+    cdfst = stats.chi2.cdf(xx,1)
+    assert_almost_equal(cdfst, cdftr, 14)
+    assert_almost_equal(cdfst, sfctr, 14)
+
+#    print('sqnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), squarenormalg.pdf(xx,loc=l, scale=s)
+#    print('chi2    pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.pdf(xx,1)
+#    print('sqnorm  ppf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), squarenormalg.ppf(ppfq,loc=l, scale=s)
+#    print('chi2    ppf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.ppf(ppfq,1)
+#    print('sqnorm  cdf with loc scale', squarenormalg.cdf(xx,loc=-10, scale=20)
+#    print('chi2    cdf with loc scale', stats.chi2.cdf(xx,1,loc=-10, scale=20)
+
+

 if __name__ == '__main__':
-    l, s = 0.0, 1.0
+
+    #Examples for Transf2_gen, u- or hump shaped transformation
+    #copied from transformtwo.py
+    l,s = 0.0, 1.0
     ppfq = [0.1, 0.5, 0.9]
     xx = [0.95, 1.0, 1.1]
     nxx = [-0.95, -1.0, -1.1]
     print()
-    print(
-        '\nsquare of standard normal random variable is chisquare with dof=1 distributed'
-        )
-    print('sqnorm  cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx),
-        squarenormalg.cdf(xx, loc=l, scale=s))
-    print('sqnorm 1-sf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), 1 -
-        squarenormalg.sf(xx, loc=l, scale=s))
-    print('chi2    cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.
-        cdf(xx, 1))
-    print('sqnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx),
-        squarenormalg.pdf(xx, loc=l, scale=s))
-    print('chi2    pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.
-        pdf(xx, 1))
-    print('sqnorm  ppf for (%3.2f, %3.2f, %3.2f):' % tuple(xx),
-        squarenormalg.ppf(ppfq, loc=l, scale=s))
-    print('chi2    ppf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.
-        ppf(ppfq, 1))
-    print('sqnorm  cdf with loc scale', squarenormalg.cdf(xx, loc=-10,
-        scale=20))
-    print('chi2    cdf with loc scale', stats.chi2.cdf(xx, 1, loc=-10,
-        scale=20))
-    print(
-        '\nabsolute value of standard normal random variable is foldnorm(0) and '
-        )
+    #print(invnormalg.__doc__
+    print('\nsquare of standard normal random variable is chisquare with dof=1 distributed')
+    print('sqnorm  cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), squarenormalg.cdf(xx,loc=l, scale=s))
+    print('sqnorm 1-sf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), 1-squarenormalg.sf(xx,loc=l, scale=s))
+    print('chi2    cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.cdf(xx,1))
+    print('sqnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), squarenormalg.pdf(xx,loc=l, scale=s))
+    print('chi2    pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.pdf(xx,1))
+    print('sqnorm  ppf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), squarenormalg.ppf(ppfq,loc=l, scale=s))
+    print('chi2    ppf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.ppf(ppfq,1))
+    print('sqnorm  cdf with loc scale', squarenormalg.cdf(xx,loc=-10, scale=20))
+    print('chi2    cdf with loc scale', stats.chi2.cdf(xx,1,loc=-10, scale=20))
+#    print('cdf for [0.5]:', squarenormalg.cdf(0.5,loc=l, scale=s))
+#    print('chi square distribution')
+#    print('chi2 pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.pdf(xx,1))
+#    print('cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.cdf(xx,1))
+
+    print('\nabsolute value of standard normal random variable is foldnorm(0) and ')
     print('halfnorm distributed:')
-    print('absnorm  cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), absnormalg
-        .cdf(xx, loc=l, scale=s))
-    print('absnorm 1-sf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), 1 -
-        absnormalg.sf(xx, loc=l, scale=s))
-    print('foldn    cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.
-        foldnorm.cdf(xx, 1e-05))
-    print('halfn    cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.
-        halfnorm.cdf(xx))
-    print('absnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), absnormalg
-        .pdf(xx, loc=l, scale=s))
-    print('foldn    pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.
-        foldnorm.pdf(xx, 1e-05))
-    print('halfn    pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.
-        halfnorm.pdf(xx))
-    print('absnorm  ppf for (%3.2f, %3.2f, %3.2f):' % tuple(ppfq),
-        absnormalg.ppf(ppfq, loc=l, scale=s))
-    print('foldn    ppf for (%3.2f, %3.2f, %3.2f):' % tuple(ppfq), stats.
-        foldnorm.ppf(ppfq, 1e-05))
-    print('halfn    ppf for (%3.2f, %3.2f, %3.2f):' % tuple(ppfq), stats.
-        halfnorm.ppf(ppfq))
+    print('absnorm  cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), absnormalg.cdf(xx,loc=l, scale=s))
+    print('absnorm 1-sf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), 1-absnormalg.sf(xx,loc=l, scale=s))
+    print('foldn    cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.foldnorm.cdf(xx,1e-5))
+    print('halfn    cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.halfnorm.cdf(xx))
+    print('absnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), absnormalg.pdf(xx,loc=l, scale=s))
+    print('foldn    pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.foldnorm.pdf(xx,1e-5))
+    print('halfn    pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.halfnorm.pdf(xx))
+    print('absnorm  ppf for (%3.2f, %3.2f, %3.2f):' % tuple(ppfq), absnormalg.ppf(ppfq,loc=l, scale=s))
+    print('foldn    ppf for (%3.2f, %3.2f, %3.2f):' % tuple(ppfq), stats.foldnorm.ppf(ppfq,1e-5))
+    print('halfn    ppf for (%3.2f, %3.2f, %3.2f):' % tuple(ppfq), stats.halfnorm.ppf(ppfq))
+#    print('cdf for [0.5]:', squarenormalg.cdf(0.5,loc=l, scale=s)
+#    print('chi square distribution'
+#    print('chi2 pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.pdf(xx,1)
+#    print('cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.cdf(xx,1)
+
     print('\nnegative square of standard normal random variable is')
     print('1-chisquare with dof=1 distributed')
     print('this is mainly for testing')
     print('the following should be outside of the support - returns nan')
-    print('nsqnorm  cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx),
-        negsquarenormalg.cdf(xx, loc=l, scale=s))
-    print('nsqnorm 1-sf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), 1 -
-        negsquarenormalg.sf(xx, loc=l, scale=s))
-    print('nsqnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx),
-        negsquarenormalg.pdf(xx, loc=l, scale=s))
-    print('nsqnorm  cdf for (%3.2f, %3.2f, %3.2f):' % tuple(nxx),
-        negsquarenormalg.cdf(nxx, loc=l, scale=s))
-    print('nsqnorm 1-sf for (%3.2f, %3.2f, %3.2f):' % tuple(nxx), 1 -
-        negsquarenormalg.sf(nxx, loc=l, scale=s))
-    print('chi2      sf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2
-        .sf(xx, 1))
-    print('nsqnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(nxx),
-        negsquarenormalg.pdf(nxx, loc=l, scale=s))
-    print('chi2     pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2
-        .pdf(xx, 1))
-    print('nsqnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(nxx),
-        negsquarenormalg.pdf(nxx, loc=l, scale=s))
+    print('nsqnorm  cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), negsquarenormalg.cdf(xx,loc=l, scale=s))
+    print('nsqnorm 1-sf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), 1-negsquarenormalg.sf(xx,loc=l, scale=s))
+    print('nsqnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), negsquarenormalg.pdf(xx,loc=l, scale=s))
+
+    print('nsqnorm  cdf for (%3.2f, %3.2f, %3.2f):' % tuple(nxx), negsquarenormalg.cdf(nxx,loc=l, scale=s))
+    print('nsqnorm 1-sf for (%3.2f, %3.2f, %3.2f):' % tuple(nxx), 1-negsquarenormalg.sf(nxx,loc=l, scale=s))
+    print('chi2      sf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.sf(xx,1))
+    print('nsqnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(nxx), negsquarenormalg.pdf(nxx,loc=l, scale=s))
+    print('chi2     pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.chi2.pdf(xx,1))
+    print('nsqnorm  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(nxx), negsquarenormalg.pdf(nxx,loc=l, scale=s))
+
+
+
     print('\nsquare of a t distributed random variable with dof=10 is')
     print('        F with dof=1,10 distributed')
-    print('sqt  cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), squaretg.cdf(
-        xx, 10))
-    print('sqt 1-sf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), 1 - squaretg.
-        sf(xx, 10))
-    print('f    cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.f.cdf(xx,
-        1, 10))
-    print('sqt  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), squaretg.pdf(
-        xx, 10))
-    print('f    pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.f.pdf(xx,
-        1, 10))
-    print('sqt  ppf for (%3.2f, %3.2f, %3.2f):' % tuple(ppfq), squaretg.ppf
-        (ppfq, 10))
-    print('f    ppf for (%3.2f, %3.2f, %3.2f):' % tuple(ppfq), stats.f.ppf(
-        ppfq, 1, 10))
-    print('sqt  cdf for 100:', squaretg.cdf(100, 10))
-    print('f    cdf for 100:', stats.f.cdf(100, 1, 10))
+    print('sqt  cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), squaretg.cdf(xx,10))
+    print('sqt 1-sf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), 1-squaretg.sf(xx,10))
+    print('f    cdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.f.cdf(xx,1,10))
+    print('sqt  pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), squaretg.pdf(xx,10))
+    print('f    pdf for (%3.2f, %3.2f, %3.2f):' % tuple(xx), stats.f.pdf(xx,1,10))
+    print('sqt  ppf for (%3.2f, %3.2f, %3.2f):' % tuple(ppfq), squaretg.ppf(ppfq,10))
+    print('f    ppf for (%3.2f, %3.2f, %3.2f):' % tuple(ppfq), stats.f.ppf(ppfq,1,10))
+    print('sqt  cdf for 100:', squaretg.cdf(100,10))
+    print('f    cdf for 100:', stats.f.cdf(100,1,10))
     print('sqt  stats:', squaretg.stats(10, moments='mvsk'))
-    print('f    stats:', stats.f.stats(1, 10, moments='mvsk'))
-    v1 = 1
-    v2 = 10
-    g1 = 2 * (v2 + 2 * v1 - 2.0) / (v2 - 6.0) * np.sqrt(2 * (v2 - 4.0) / (
-        v1 * (v2 + v1 - 2.0)))
-    g2 = 3 / (2.0 * v2 - 16) * (8 + g1 * g1 * (v2 - 6.0))
+    print('f    stats:', stats.f.stats(1,10, moments='mvsk'))
+    #Note the results differ for skew and kurtosis. I think the 3rd and 4th moments
+    #    in the scipy.stats.f distribution are incorrect.
+    # I corrected it now in stats.distributions.py in the bzr branch
+    v1=1
+    v2=10
+    g1 = 2*(v2+2*v1-2.)/(v2-6.)*np.sqrt(2*(v2-4.)/(v1*(v2+v1-2.)))
+    g2 = 3/(2.*v2-16)*(8+g1*g1*(v2-6.))
     print('corrected skew, kurtosis of f(1,10) is', g1, g2)
     print(squarenormalg.rvs())
-    print(squarenormalg.rvs(size=(2, 4)))
+    print(squarenormalg.rvs(size=(2,4)))
     print('sqt random variables')
-    print(stats.f.rvs(1, 10, size=4))
-    print(squaretg.rvs(10, size=4))
+    print(stats.f.rvs(1,10,size=4))
+    print(squaretg.rvs(10,size=4))
+
+    #a large number check:
     np.random.seed(464239857)
-    rvstsq = squaretg.rvs(10, size=100000)
-    squaretg.moment(4, 10)
-    (rvstsq ** 4).mean()
-    squaretg.moment(3, 10)
-    (rvstsq ** 3).mean()
+    rvstsq = squaretg.rvs(10,size=100000)
+    squaretg.moment(4,10)
+    (rvstsq**4).mean()
+    squaretg.moment(3,10)
+    (rvstsq**3).mean()
     squaretg.stats(10, moments='mvsk')
     stats.describe(rvstsq)
-    """
+
+    '''
     >>> np.random.seed(464239857)
     >>> rvstsq = squaretg.rvs(10,size=100000)
     >>> squaretg.moment(4,10)
@@ -199,15 +271,21 @@ if __name__ == '__main__':
     (array(1.2500000000000022), array(4.6874999999630909), array(5.7735026919777912), array(106.00000000170148))
     >>> stats.describe(rvstsq)
     (100000, (3.2953470738423724e-009, 92.649615690914473), 1.2534924690963247, 4.7741427958594098, 6.1562177957041895, 100.99331166052181)
-    """
-    dec = squaretg.ppf(np.linspace(0.0, 1, 11), 10)
-    freq, edges = np.histogram(rvstsq, bins=dec)
-    print(freq / float(len(rvstsq)))
+    '''
+    # checking the distribution
+    # fraction of observations in each decile
+    dec = squaretg.ppf(np.linspace(0.,1,11),10)
+    freq,edges = np.histogram(rvstsq, bins=dec)
+    print(freq/float(len(rvstsq)))
+
     import matplotlib.pyplot as plt
-    freq, edges, _ = plt.hist(rvstsq, bins=50, range=(0, 4), normed=True)
-    edges += (edges[1] - edges[0]) / 2.0
+    freq,edges,_ = plt.hist(rvstsq, bins=50, range=(0,4),normed=True)
+    edges += (edges[1]-edges[0])/2.0
     plt.plot(edges[:-1], squaretg.pdf(edges[:-1], 10), 'r')
-    """
+    #plt.show()
+    #plt.close()
+
+    '''
     >>> plt.plot(edges[:-1], squaretg.pdf(edges[:-1], 10), 'r')
     [<matplotlib.lines.Line2D object at 0x06EBFDB0>]
     >>> plt.fill(edges[4:8], squaretg.pdf(edges[4:8], 10), 'r')
@@ -231,6 +309,7 @@ if __name__ == '__main__':
     AttributeError: 'AxesSubplot' object has no attribute 'fill_between'
     >>> ax1.fill(edges[4:8], squaretg.pdf(edges[4:8], 10), 0, 'r')
     Traceback (most recent call last):
-    """
+    '''
+
     import pytest
     pytest.main([__file__, '-vvs', '-x', '--pdb'])
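
The relationship exercised above (the square of a t(df) random variable is F(1, df) distributed) can also be checked without the sandbox classes, using only scipy.stats; a minimal sketch, with sample size and evaluation grid chosen for illustration only:

    import numpy as np
    from scipy import stats

    np.random.seed(12345)
    df = 10
    t_rvs = stats.t.rvs(df, size=100000)
    sq = t_rvs ** 2                       # square of a t(df) random variable

    # compare the empirical CDF of t**2 with the F(1, df) CDF at a few points
    grid = np.array([0.5, 1.0, 2.0, 4.0])
    print((sq[:, None] <= grid).mean(axis=0))
    print(stats.f.cdf(grid, 1, df))       # should agree to about 2-3 decimals

    # a Kolmogorov-Smirnov test against F(1, df) should not reject
    print(stats.kstest(sq, 'f', args=(1, df)))
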
diff --git a/statsmodels/sandbox/distributions/examples/matchdist.py b/statsmodels/sandbox/distributions/examples/matchdist.py
index c8507d1a5..a2d4a7bf8 100644
--- a/statsmodels/sandbox/distributions/examples/matchdist.py
+++ b/statsmodels/sandbox/distributions/examples/matchdist.py
@@ -1,4 +1,4 @@
-"""given a 1D sample of observation, find a matching distribution
+'''given a 1D sample of observation, find a matching distribution

 * estimate maximum likelihood parameter for each distribution
 * rank estimated distribution by Kolmogorov-Smirnov and Anderson-Darling
@@ -14,26 +14,70 @@ TODO:
 * split estimation by support, add option and choose automatically
 *

-"""
+'''
 from scipy import stats
 import numpy as np
 import matplotlib.pyplot as plt
-targetdist = ['norm', 'alpha', 'anglit', 'arcsine', 'beta', 'betaprime',
-    'bradford', 'burr', 'fisk', 'cauchy', 'chi', 'chi2', 'cosine', 'dgamma',
-    'dweibull', 'erlang', 'expon', 'exponweib', 'exponpow', 'fatiguelife',
-    'foldcauchy', 'f', 'foldnorm', 'frechet_r', 'weibull_min', 'frechet_l',
-    'weibull_max', 'genlogistic', 'genpareto', 'genexpon', 'genextreme',
-    'gamma', 'gengamma', 'genhalflogistic', 'gompertz', 'gumbel_r',
-    'gumbel_l', 'halfcauchy', 'halflogistic', 'halfnorm', 'hypsecant',
-    'gausshyper', 'invgamma', 'invnorm', 'invweibull', 'johnsonsb',
-    'johnsonsu', 'laplace', 'levy', 'levy_l', 'logistic', 'loggamma',
-    'loglaplace', 'lognorm', 'gilbrat', 'maxwell', 'mielke', 'nakagami',
-    'ncx2', 'ncf', 't', 'nct', 'pareto', 'lomax', 'powerlaw',
-    'powerlognorm', 'powernorm', 'rdist', 'rayleigh', 'reciprocal', 'rice',
-    'recipinvgauss', 'semicircular', 'triang', 'truncexpon', 'truncnorm',
-    'tukeylambda', 'uniform', 'vonmises', 'wald', 'wrapcauchy', 'binom',
-    'bernoulli', 'nbinom', 'geom', 'hypergeom', 'logser', 'poisson',
-    'planck', 'boltzmann', 'randint', 'zipf', 'dlaplace']
+
+#stats.distributions.beta_gen._fitstart = lambda self, data : (5,5,0,1)
+
+def plothist(x,distfn, args, loc, scale, right=1):
+
+    plt.figure()
+    # the histogram of the data
+    n, bins, patches = plt.hist(x, 25, normed=1, facecolor='green', alpha=0.75)
+    maxheight = max([p.get_height() for p in patches])
+    print(maxheight)
+    axlim = list(plt.axis())
+    #print(axlim)
+    axlim[-1] = maxheight*1.05
+    #plt.axis(tuple(axlim))
+##    print(bins)
+##    print('args in plothist', args)
+    # add a 'best fit' line
+    #yt = stats.norm.pdf( bins, loc=loc, scale=scale)
+    yt = distfn.pdf( bins, loc=loc, scale=scale, *args)
+    yt[yt>maxheight]=maxheight
+    lt = plt.plot(bins, yt, 'r--', linewidth=1)
+    ys = stats.t.pdf( bins, 10,scale=10,)*right
+    ls = plt.plot(bins, ys, 'b-', linewidth=1)
+
+    plt.xlabel('Smarts')
+    plt.ylabel('Probability')
+    plt.title(r'$\mathrm{Testing: %s :}\ \mu=%f,\ \sigma=%f$' % (distfn.name,loc,scale))
+
+    #plt.axis([bins[0], bins[-1], 0, 0.134+0.05])
+
+    plt.grid(True)
+    plt.draw()
+    #plt.show()
+    #plt.close()
+
+
+
+
+
+#targetdist = ['norm','t','truncnorm','johnsonsu','johnsonsb',
+targetdist = ['norm','alpha', 'anglit', 'arcsine',
+           'beta', 'betaprime', 'bradford', 'burr', 'fisk', 'cauchy',
+           'chi', 'chi2', 'cosine', 'dgamma', 'dweibull', 'erlang',
+           'expon', 'exponweib', 'exponpow', 'fatiguelife', 'foldcauchy',
+           'f', 'foldnorm', 'frechet_r', 'weibull_min', 'frechet_l',
+           'weibull_max', 'genlogistic', 'genpareto', 'genexpon', 'genextreme',
+           'gamma', 'gengamma', 'genhalflogistic', 'gompertz', 'gumbel_r',
+           'gumbel_l', 'halfcauchy', 'halflogistic', 'halfnorm', 'hypsecant',
+           'gausshyper', 'invgamma', 'invnorm', 'invweibull', 'johnsonsb',
+           'johnsonsu', 'laplace', 'levy', 'levy_l',
+           'logistic', 'loggamma', 'loglaplace', 'lognorm', 'gilbrat',
+           'maxwell', 'mielke', 'nakagami', 'ncx2', 'ncf', 't',
+           'nct', 'pareto', 'lomax', 'powerlaw', 'powerlognorm', 'powernorm',
+           'rdist', 'rayleigh', 'reciprocal', 'rice', 'recipinvgauss',
+           'semicircular', 'triang', 'truncexpon', 'truncnorm',
+           'tukeylambda', 'uniform', 'vonmises', 'wald', 'wrapcauchy',
+
+           'binom', 'bernoulli', 'nbinom', 'geom', 'hypergeom', 'logser',
+           'poisson', 'planck', 'boltzmann', 'randint', 'zipf', 'dlaplace']
+
 left = []
 right = []
 finite = []
@@ -41,32 +85,42 @@ unbound = []
 other = []
 contdist = []
 discrete = []
-categ = {('open', 'open'): 'unbound', ('0', 'open'): 'right', ('open', '0'):
-    'left', ('finite', 'finite'): 'finite', ('oth', 'oth'): 'other'}
-categ = {('open', 'open'): unbound, ('0', 'open'): right, ('open', '0'):
-    left, ('finite', 'finite'): finite, ('oth', 'oth'): other}
-categ2 = {('open', '0'): ['frechet_l', 'weibull_max', 'levy_l'], ('finite',
-    'finite'): ['anglit', 'cosine', 'rdist', 'semicircular'], ('0', 'open'):
-    ['alpha', 'burr', 'fisk', 'chi', 'chi2', 'erlang', 'expon', 'exponweib',
-    'exponpow', 'fatiguelife', 'foldcauchy', 'f', 'foldnorm', 'frechet_r',
-    'weibull_min', 'genpareto', 'genexpon', 'gamma', 'gengamma',
-    'genhalflogistic', 'gompertz', 'halfcauchy', 'halflogistic', 'halfnorm',
-    'invgamma', 'invnorm', 'invweibull', 'levy', 'loglaplace', 'lognorm',
-    'gilbrat', 'maxwell', 'mielke', 'nakagami', 'ncx2', 'ncf', 'lomax',
-    'powerlognorm', 'rayleigh', 'rice', 'recipinvgauss', 'truncexpon',
-    'wald'], ('open', 'open'): ['cauchy', 'dgamma', 'dweibull',
-    'genlogistic', 'genextreme', 'gumbel_r', 'gumbel_l', 'hypsecant',
-    'johnsonsu', 'laplace', 'logistic', 'loggamma', 't', 'nct', 'powernorm',
-    'reciprocal', 'truncnorm', 'tukeylambda', 'vonmises'], ('0', 'finite'):
-    ['arcsine', 'beta', 'betaprime', 'bradford', 'gausshyper', 'johnsonsb',
-    'powerlaw', 'triang', 'uniform', 'wrapcauchy'], ('finite', 'open'): [
-    'pareto']}
+
+categ = {('open','open'):'unbound', ('0','open'):'right',('open','0',):'left',
+             ('finite','finite'):'finite',('oth','oth'):'other'}
+categ = {('open','open'):unbound, ('0','open'):right,('open','0',):left,
+             ('finite','finite'):finite,('oth','oth'):other}
+
+categ2 = {
+    ('open', '0') : ['frechet_l', 'weibull_max', 'levy_l'],
+    ('finite', 'finite') : ['anglit', 'cosine', 'rdist', 'semicircular'],
+    ('0', 'open') : ['alpha', 'burr', 'fisk', 'chi', 'chi2', 'erlang',
+                'expon', 'exponweib', 'exponpow', 'fatiguelife', 'foldcauchy', 'f',
+                'foldnorm', 'frechet_r', 'weibull_min', 'genpareto', 'genexpon',
+                'gamma', 'gengamma', 'genhalflogistic', 'gompertz', 'halfcauchy',
+                'halflogistic', 'halfnorm', 'invgamma', 'invnorm', 'invweibull',
+                'levy', 'loglaplace', 'lognorm', 'gilbrat', 'maxwell', 'mielke',
+                'nakagami', 'ncx2', 'ncf', 'lomax', 'powerlognorm', 'rayleigh',
+                'rice', 'recipinvgauss', 'truncexpon', 'wald'],
+    ('open', 'open') : ['cauchy', 'dgamma', 'dweibull', 'genlogistic', 'genextreme',
+                'gumbel_r', 'gumbel_l', 'hypsecant', 'johnsonsu', 'laplace',
+                'logistic', 'loggamma', 't', 'nct', 'powernorm', 'reciprocal',
+                'truncnorm', 'tukeylambda', 'vonmises'],
+    ('0', 'finite') : ['arcsine', 'beta', 'betaprime', 'bradford', 'gausshyper',
+                'johnsonsb', 'powerlaw', 'triang', 'uniform', 'wrapcauchy'],
+    ('finite', 'open') : ['pareto']
+    }
+
+#Note: weibull_max == frechet_l
+
 right_incorrect = ['genextreme']
-right_all = categ2['0', 'open'] + categ2['0', 'finite'] + categ2['finite',
-    'open'] + right_incorrect
+
+right_all = categ2[('0', 'open')] + categ2[('0', 'finite')] + categ2[('finite', 'open')]\
+            + right_incorrect
+
 for distname in targetdist:
-    distfn = getattr(stats, distname)
-    if hasattr(distfn, '_pdf'):
+    distfn = getattr(stats,distname)
+    if hasattr(distfn,'_pdf'):
         if np.isinf(distfn.a):
             low = 'open'
         elif distfn.a == 0:
@@ -80,14 +134,24 @@ for distname in targetdist:
         else:
             high = 'finite'
         contdist.append(distname)
-        categ.setdefault((low, high), []).append(distname)
+        categ.setdefault((low,high),[]).append(distname)
+
 not_good = ['genextreme', 'reciprocal', 'vonmises']
-targetdist = [f for f in categ['open', 'open'] if f not in not_good]
+# 'genextreme' is right (or left?), 'reciprocal' requires 0<a<b, 'vonmises' no a,b
+targetdist = [f for f in categ[('open', 'open')] if f not in not_good]
 not_good = ['wrapcauchy']
 not_good = ['vonmises']
-not_good = ['genexpon', 'vonmises']
+not_good = ['genexpon','vonmises']
+#'wrapcauchy' requires additional parameter (scale) in argcheck
 targetdist = [f for f in contdist if f not in not_good]
+#targetdist = contdist
+#targetdist = not_good
+#targetdist = ['t', 'f']
+#targetdist = ['norm','burr']
+
 if __name__ == '__main__':
+
+    #TODO: calculate correct tail probability for mixture
     prefix = 'run_conv500_1_'
     convol = 0.75
     n = 500
@@ -95,73 +159,89 @@ if __name__ == '__main__':
     dgp_scale = 10
     results = []
     for i in range(1):
-        rvs_orig = stats.t.rvs(dgp_arg, scale=dgp_scale, size=n * convol)
-        rvs_orig = np.hstack((rvs_orig, stats.halflogistic.rvs(loc=0.4,
-            scale=5.0, size=n * (1 - convol))))
+        rvs_orig = stats.t.rvs(dgp_arg,scale=dgp_scale,size=n*convol)
+        rvs_orig = np.hstack((rvs_orig,stats.halflogistic.rvs(loc=0.4, scale=5.0,size =n*(1-convol))))
         rvs_abs = np.absolute(rvs_orig)
-        rvs_pos = rvs_orig[rvs_orig > 0]
+        rvs_pos = rvs_orig[rvs_orig>0]
         rightfactor = 1
         rvs_right = rvs_pos
-        print('=' * 50)
+        print('='*50)
         print('samplesize = ', n)
         for distname in targetdist:
-            distfn = getattr(stats, distname)
+            distfn = getattr(stats,distname)
             if distname in right_all:
                 rvs = rvs_right
                 rind = rightfactor
+
             else:
                 rvs = rvs_orig
                 rind = 1
-            print('-' * 30)
+            print('-'*30)
             print('target = %s' % distname)
             sm = rvs.mean()
             sstd = np.sqrt(rvs.var())
-            ssupp = rvs.min(), rvs.max()
-            if distname in ['truncnorm', 'betaprime', 'reciprocal']:
-                par0 = sm - 2 * sstd, sm + 2 * sstd
-                par_est = tuple(distfn.fit(rvs, *par0, loc=sm, scale=sstd))
+            ssupp = (rvs.min(), rvs.max())
+            if distname in ['truncnorm','betaprime','reciprocal']:
+
+                par0 = (sm-2*sstd,sm+2*sstd)
+                par_est = tuple(distfn.fit(rvs,loc=sm,scale=sstd,*par0))
             elif distname == 'norm':
-                par_est = tuple(distfn.fit(rvs, loc=sm, scale=sstd))
+                par_est = tuple(distfn.fit(rvs,loc=sm,scale=sstd))
             elif distname == 'genextreme':
-                par_est = tuple(distfn.fit(rvs, -5, loc=sm, scale=sstd))
+                par_est = tuple(distfn.fit(rvs,-5,loc=sm,scale=sstd))
             elif distname == 'wrapcauchy':
-                par_est = tuple(distfn.fit(rvs, 0.5, loc=0, scale=sstd))
+                par_est = tuple(distfn.fit(rvs,0.5,loc=0,scale=sstd))
             elif distname == 'f':
-                par_est = tuple(distfn.fit(rvs, 10, 15, loc=0, scale=1))
+                par_est = tuple(distfn.fit(rvs,10,15,loc=0,scale=1))
+
             elif distname in right:
                 sm = rvs.mean()
                 sstd = np.sqrt(rvs.var())
-                par_est = tuple(distfn.fit(rvs, loc=0, scale=1))
+                par_est = tuple(distfn.fit(rvs,loc=0,scale=1))
             else:
                 sm = rvs.mean()
                 sstd = np.sqrt(rvs.var())
-                par_est = tuple(distfn.fit(rvs, loc=sm, scale=sstd))
+                par_est = tuple(distfn.fit(rvs,loc=sm,scale=sstd))
+
+
             print('fit', par_est)
             arg_est = par_est[:-2]
             loc_est = par_est[-2]
             scale_est = par_est[-1]
-            rvs_normed = (rvs - loc_est) / scale_est
-            ks_stat, ks_pval = stats.kstest(rvs_normed, distname, arg_est)
+            rvs_normed = (rvs-loc_est)/scale_est
+            ks_stat, ks_pval = stats.kstest(rvs_normed,distname, arg_est)
             print('kstest', ks_stat, ks_pval)
             quant = 0.1
-            crit = distfn.ppf(1 - quant * float(rind), *par_est, loc=
-                loc_est, scale=scale_est)
-            tail_prob = stats.t.sf(crit, dgp_arg, scale=dgp_scale)
+            crit = distfn.ppf(1-quant*float(rind), loc=loc_est, scale=scale_est,*par_est)
+            tail_prob = stats.t.sf(crit,dgp_arg,scale=dgp_scale)
             print('crit, prob', quant, crit, tail_prob)
-            results.append([distname, ks_stat, ks_pval, arg_est, loc_est,
-                scale_est, crit, tail_prob])
+            #if distname == 'norm':
+                #plothist(rvs,loc_est,scale_est)
+                #args = tuple()
+            results.append([distname,ks_stat, ks_pval,arg_est,loc_est,scale_est,crit,tail_prob ])
+            #plothist(rvs,distfn,arg_est,loc_est,scale_est)
+
+    #plothist(rvs,distfn,arg_est,loc_est,scale_est)
+    #plt.show()
+    #plt.close()
+    #TODO: collect results and compare tail quantiles
+
+
     from operator import itemgetter
-    res_sort = sorted(results, key=itemgetter(2))
-    res_sort.reverse()
+
+    res_sort = sorted(results, key = itemgetter(2))
+
+    res_sort.reverse()  #kstest statistic: smaller is better, pval larger is better
+
     print('number of distributions', len(res_sort))
     imagedir = 'matchresults'
     import os
     if not os.path.exists(imagedir):
         os.makedirs(imagedir)
-    for ii, di in enumerate(res_sort):
-        (distname, ks_stat, ks_pval, arg_est, loc_est, scale_est, crit,
-            tail_prob) = di[:]
-        distfn = getattr(stats, distname)
+
+    for ii,di in enumerate(res_sort):
+        distname,ks_stat, ks_pval,arg_est,loc_est,scale_est,crit,tail_prob = di[:]
+        distfn = getattr(stats,distname)
         if distname in right_all:
             rvs = rvs_right
             rind = rightfactor
@@ -170,8 +250,11 @@ if __name__ == '__main__':
             rvs = rvs_orig
             ri = ''
             rind = 1
-        print('%s ks-stat = %f, ks-pval = %f tail_prob = %f)' % (distname,
-            ks_stat, ks_pval, tail_prob))
-        plothist(rvs, distfn, arg_est, loc_est, scale_est, right=rind)
-        plt.savefig(os.path.join(imagedir, '%s%s%02d_%s.png' % (prefix, ri,
-            ii, distname)))
+        print('%s ks-stat = %f, ks-pval = %f tail_prob = %f)' % \
+              (distname, ks_stat, ks_pval, tail_prob))
+    ##    print('arg_est = %s, loc_est = %f scale_est = %f)' % \
+    ##          (repr(arg_est),loc_est,scale_est))
+        plothist(rvs,distfn,arg_est,loc_est,scale_est,right = rind)
+        plt.savefig(os.path.join(imagedir,'%s%s%02d_%s.png'% (prefix, ri,ii, distname)))
+    ##plt.show()
+    ##plt.close()
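
The loop in matchdist.py fits every candidate distribution by maximum likelihood and ranks the fits by the Kolmogorov-Smirnov statistic (smaller statistic and larger p-value is better); a condensed sketch of the same idea, restricted to a few well-behaved scipy distributions, with the candidate list and toy sample chosen only for illustration:

    import numpy as np
    from scipy import stats

    np.random.seed(0)
    sample = stats.t.rvs(5, scale=10, size=500)      # toy data

    results = []
    for name in ['norm', 'laplace', 'logistic', 'cauchy', 't']:
        distfn = getattr(stats, name)
        par = distfn.fit(sample)                     # MLE for shape/loc/scale
        arg, loc, scale = par[:-2], par[-2], par[-1]
        normed = (sample - loc) / scale
        ks_stat, ks_pval = stats.kstest(normed, name, arg)
        results.append((ks_stat, ks_pval, name))

    # rank the candidates: smaller KS statistic means a closer match
    for ks_stat, ks_pval, name in sorted(results, key=lambda r: r[0]):
        print('%-10s ks=%.4f pval=%.3f' % (name, ks_stat, ks_pval))

As in the original script, the p-values are computed after estimating the parameters from the same sample, so they serve only as a rough ranking device, not as exact test levels.
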
diff --git a/statsmodels/sandbox/distributions/extras.py b/statsmodels/sandbox/distributions/extras.py
index b810acb8d..311dace97 100644
--- a/statsmodels/sandbox/distributions/extras.py
+++ b/statsmodels/sandbox/distributions/extras.py
@@ -1,4 +1,4 @@
-"""Various extensions to distributions
+'''Various extensions to distributions

 * skew normal and skew t distribution by Azzalini, A. & Capitanio, A.
 * Gram-Charlier expansion distribution (using 4 moments),
@@ -30,8 +30,7 @@ Example
 >>> logtg = Transf_gen(stats.t, np.exp, np.log,
                 numargs = 1, a=0, name = 'lnnorm',
                 longname = 'Exp transformed normal',
-                extradoc = '
-distribution of y = exp(x), with x standard normal'
+                extradoc = '\ndistribution of y = exp(x), with x standard normal'
                 'precision for moment and stats is not very high, 2-3 decimals')
 >>> logtg.cdf(5, 6)
 0.92067704211191848
@@ -49,57 +48,197 @@ distribution of y = exp(x), with x standard normal'
 Author: josef-pktd
 License: BSD

-"""
+'''
+
 import numpy as np
 from numpy import poly1d, sqrt, exp
+
 import scipy
 from scipy import stats, special
 from scipy.stats import distributions
+
 from statsmodels.stats.moment_helpers import mvsk2mc, mc2mvsk
+
 try:
     from scipy.stats._mvn import mvndst
 except ImportError:
+    # Must be using SciPy <1.8.0 where this function was moved (it's not a
+    # public SciPy function, but we need it here)
     from scipy.stats.mvn import mvndst


+# note copied from distr_skewnorm_0.py
+
+
 class SkewNorm_gen(distributions.rv_continuous):
-    """univariate Skew-Normal distribution of Azzalini
+    '''univariate Skew-Normal distribution of Azzalini

     class follows scipy.stats.distributions pattern
     but with __init__


-    """
+    '''

     def __init__(self):
-        distributions.rv_continuous.__init__(self, name=
-            'Skew Normal distribution', shapes='alpha')
+        # super(SkewNorm_gen,self).__init__(
+        distributions.rv_continuous.__init__(self,
+                                             name='Skew Normal distribution', shapes='alpha',
+                                             # extradoc = ''' '''
+                                             )
+
+    def _argcheck(self, alpha):
+        return 1  # (alpha >= 0)
+
+    def _rvs(self, alpha):
+        # see http://azzalini.stat.unipd.it/SN/faq.html
+        delta = alpha / np.sqrt(1 + alpha ** 2)
+        u0 = stats.norm.rvs(size=self._size)
+        u1 = delta * u0 + np.sqrt(1 - delta ** 2) * stats.norm.rvs(size=self._size)
+        return np.where(u0 > 0, u1, -u1)
+
+    def _munp(self, n, alpha):
+        # use pdf integration with _mom0_sc if only _pdf is defined.
+        # default stats calculation uses ppf, which is much slower
+        return self._mom0_sc(n, alpha)
+
+    def _pdf(self, x, alpha):
+        # 2*normpdf(x)*normcdf(alpha*x)
+        return 2.0 / np.sqrt(2 * np.pi) * np.exp(-x ** 2 / 2.0) * special.ndtr(alpha * x)
+
+    def _stats_skip(self, x, alpha, moments='mvsk'):
+        # skip for now to force moment integration as check
+        pass


 skewnorm = SkewNorm_gen()
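
The _pdf above is the standard Azzalini skew-normal density 2*phi(x)*Phi(alpha*x); scipy.stats.skewnorm implements the same family, which gives a quick independent check of the formula (a sketch that does not rely on the sandbox class):

    import numpy as np
    from scipy import stats, special

    x = np.linspace(-3, 3, 7)
    alpha = 2.0

    # Azzalini skew-normal density, as coded in SkewNorm_gen._pdf
    pdf_manual = 2.0 / np.sqrt(2 * np.pi) * np.exp(-x ** 2 / 2.0) * special.ndtr(alpha * x)

    # scipy ships the same distribution as stats.skewnorm
    print(np.max(np.abs(pdf_manual - stats.skewnorm.pdf(x, alpha))))   # ~0
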


+# generated the same way as distributions in stats.distributions
 class SkewNorm2_gen(distributions.rv_continuous):
-    """univariate Skew-Normal distribution of Azzalini
+    '''univariate Skew-Normal distribution of Azzalini

     class follows scipy.stats.distributions pattern

-    """
+    '''
+
+    def _argcheck(self, alpha):
+        return 1  # where(alpha>=0, 1, 0)

+    def _pdf(self, x, alpha):
+        # 2*normpdf(x)*normcdf(alpha*x)
+        return 2.0 / np.sqrt(2 * np.pi) * np.exp(-x ** 2 / 2.0) * special.ndtr(alpha * x)

-skewnorm2 = SkewNorm2_gen(name='Skew Normal distribution', shapes='alpha')
+
+skewnorm2 = SkewNorm2_gen(name='Skew Normal distribution', shapes='alpha',
+                          # extradoc = '''  -inf < alpha < inf'''
+                          )


 class ACSkewT_gen(distributions.rv_continuous):
-    """univariate Skew-T distribution of Azzalini
+    '''univariate Skew-T distribution of Azzalini

     class follows scipy.stats.distributions pattern
     but with __init__
-    """
+    '''

     def __init__(self):
-        distributions.rv_continuous.__init__(self, name=
-            'Skew T distribution', shapes='df, alpha')
+        # super(SkewT_gen,self).__init__(
+        distributions.rv_continuous.__init__(self,
+                                             name='Skew T distribution', shapes='df, alpha',
+                                             )
+
+    #             extradoc = '''
+    # Skewed T distribution by Azzalini, A. & Capitanio, A. (2003)_
+    #
+    # the pdf is given by:
+    #  pdf(x) = 2.0 * t.pdf(x, df) * t.cdf(df+1, alpha*x*np.sqrt((1+df)/(x**2+df)))
+    #
+    # with alpha >=0
+    #
+    # Note: different from skewed t distribution by Hansen 1999
+    # .._
+    # Azzalini, A. & Capitanio, A. (2003), Distributions generated by perturbation of
+    # symmetry with emphasis on a multivariate skew-t distribution,
+    # appears in J.Roy.Statist.Soc, series B, vol.65, pp.367-389
+    #
+    # '''                               )
+
+    def _argcheck(self, df, alpha):
+        return (alpha == alpha) * (df > 0)
+
+    ##    def _arg_check(self, alpha):
+    ##        return np.where(alpha>=0, 0, 1)
+    ##    def _argcheck(self, alpha):
+    ##        return np.where(alpha>=0, 1, 0)
+
+    def _rvs(self, df, alpha):
+        # see http://azzalini.stat.unipd.it/SN/faq.html
+        # delta = alpha/np.sqrt(1+alpha**2)
+        V = stats.chi2.rvs(df, size=self._size)
+        z = skewnorm.rvs(alpha, size=self._size)
+        return z / np.sqrt(V / df)
+
+    def _munp(self, n, df, alpha):
+        # use pdf integration with _mom0_sc if only _pdf is defined.
+        # default stats calculation uses ppf
+        return self._mom0_sc(n, df, alpha)
+
+    def _pdf(self, x, df, alpha):
+        # 2*normpdf(x)*normcdf(alpha*x)
+        return 2.0 * distributions.t._pdf(x, df) * special.stdtr(df + 1, alpha * x * np.sqrt(
+            (1 + df) / (x ** 2 + df)))
+
+
+##
+##def mvsk2cm(*args):
+##    mu,sig,sk,kur = args
+##    # Get central moments
+##    cnt = [None]*4
+##    cnt[0] = mu
+##    cnt[1] = sig #*sig
+##    cnt[2] = sk * sig**1.5
+##    cnt[3] = (kur+3.0) * sig**2.0
+##    return cnt
+##
+##
+##def mvsk2m(args):
+##    mc, mc2, skew, kurt = args#= self._stats(*args,**mdict)
+##    mnc = mc
+##    mnc2 = mc2 + mc*mc
+##    mc3  = skew*(mc2**1.5) # 3rd central moment
+##    mnc3 = mc3+3*mc*mc2+mc**3 # 3rd non-central moment
+##    mc4  = (kurt+3.0)*(mc2**2.0) # 4th central moment
+##    mnc4 = mc4+4*mc*mc3+6*mc*mc*mc2+mc**4
+##    return (mc, mc2, mc3, mc4), (mnc, mnc2, mnc3, mnc4)
+##
+##def mc2mvsk(args):
+##    mc, mc2, mc3, mc4 = args
+##    skew = mc3 / mc2**1.5
+##    kurt = mc4 / mc2**2.0 - 3.0
+##    return (mc, mc2, skew, kurt)
+##
+##def m2mc(args):
+##    mnc, mnc2, mnc3, mnc4 = args
+##    mc = mnc
+##    mc2 = mnc2 - mnc*mnc
+##    #mc3  = skew*(mc2**1.5) # 3rd central moment
+##    mc3 = mnc3 - (3*mc*mc2+mc**3) # 3rd central moment
+##    #mc4  = (kurt+3.0)*(mc2**2.0) # 4th central moment
+##    mc4 = mnc4 - (4*mc*mc3+6*mc*mc*mc2+mc**4)
+##    return (mc, mc2, mc3, mc4)
+
+
+def _hermnorm(N):
+    # return the negatively normalized hermite polynomials up to order N-1
+    #  (inclusive)
+    #  using the recursive relationship
+    #  p_n+1 = p_n(x)' - x*p_n(x)
+    #   and p_0(x) = 1
+    plist = [None] * N
+    plist[0] = poly1d(1)
+    for n in range(1, N):
+        plist[n] = plist[n - 1].deriv() - poly1d([1, 0]) * plist[n - 1]
+    return plist


 def pdf_moments_st(cnt):
@@ -109,7 +248,38 @@ def pdf_moments_st(cnt):
     version of scipy.stats, any changes ?
     the scipy.stats version has a bug and returns normal distribution
     """
-    pass
+
+    N = len(cnt)
+    if N < 2:
+        raise ValueError("At least two moments must be given to "
+                         "approximate the pdf.")
+
+    totp = poly1d(1)
+    sig = sqrt(cnt[1])
+    mu = cnt[0]
+    if N > 2:
+        Dvals = _hermnorm(N + 1)
+    for k in range(3, N + 1):
+        # Find Ck
+        Ck = 0.0
+        for n in range((k - 3) / 2):
+            m = k - 2 * n
+            if m % 2:  # m is odd
+                momdiff = cnt[m - 1]
+            else:
+                momdiff = cnt[m - 1] - sig * sig * scipy.factorial2(m - 1)
+            Ck += Dvals[k][m] / sig ** m * momdiff
+        # Add to totp
+        raise SystemError
+        print(Dvals)
+        print(Ck)
+        totp = totp + Ck * Dvals[k]
+
+    def thisfunc(x):
+        xn = (x - mu) / sig
+        return totp(xn) * exp(-xn * xn / 2.0) / sqrt(2 * np.pi) / sig
+
+    return thisfunc, totp


 def pdf_mvsk(mvsk):
@@ -152,7 +322,28 @@ def pdf_mvsk(mvsk):
     Johnson N.L., S. Kotz, N. Balakrishnan: Continuous Univariate
     Distributions, Volume 1, 2nd ed., p.30
     """
-    pass
+    N = len(mvsk)
+    if N < 4:
+        raise ValueError("Four moments must be given to "
+                         "approximate the pdf.")
+
+    mu, mc2, skew, kurt = mvsk
+
+    totp = poly1d(1)
+    sig = sqrt(mc2)
+    if N > 2:
+        Dvals = _hermnorm(N + 1)
+        C3 = skew / 6.0
+        C4 = kurt / 24.0
+        # Note: Hermite polynomial for order 3 in _hermnorm is negative
+        # instead of positive
+        totp = totp - C3 * Dvals[3] + C4 * Dvals[4]
+
+    def pdffunc(x):
+        xn = (x - mu) / sig
+        return totp(xn) * np.exp(-xn * xn / 2.0) / np.sqrt(2 * np.pi) / sig
+
+    return pdffunc
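
pdf_mvsk builds a Gram-Charlier density from (mean, variance, skew, excess kurtosis); with zero skew and zero excess kurtosis it should collapse to the plain normal density, which gives a cheap sanity check (a sketch, assuming the sandbox module imports cleanly):

    import numpy as np
    from scipy import stats
    from statsmodels.sandbox.distributions.extras import pdf_mvsk

    x = np.linspace(-3, 3, 7)

    # no skew, no excess kurtosis: the expansion reduces to N(0, 1)
    pdf0 = pdf_mvsk([0.0, 1.0, 0.0, 0.0])
    print(np.max(np.abs(pdf0(x) - stats.norm.pdf(x))))   # ~0

    # mild skewness: the cubic correction term tilts the density
    pdf_skewed = pdf_mvsk([0.0, 1.0, 0.5, 0.0])
    print(pdf_skewed(x))
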


 def pdf_moments(cnt):
@@ -181,24 +372,82 @@ def pdf_moments(cnt):
     Johnson N.L., S. Kotz, N. Balakrishnan: Continuous Univariate
     Distributions, Volume 1, 2nd ed., p.30
     """
-    pass
+    N = len(cnt)
+    if N < 2:
+        raise ValueError("At least two moments must be given to "
+                         "approximate the pdf.")
+
+    mc, mc2, mc3, mc4 = cnt
+    skew = mc3 / mc2 ** 1.5
+    kurt = mc4 / mc2 ** 2.0 - 3.0  # Fisher kurtosis, excess kurtosis
+
+    totp = poly1d(1)
+    sig = sqrt(cnt[1])
+    mu = cnt[0]
+    if N > 2:
+        Dvals = _hermnorm(N + 1)
+        ##    for k in range(3,N+1):
+        ##        # Find Ck
+        ##        Ck = 0.0
+        ##        for n in range((k-3)/2):
+        ##            m = k-2*n
+        ##            if m % 2: # m is odd
+        ##                momdiff = cnt[m-1]
+        ##            else:
+        ##                momdiff = cnt[m-1] - sig*sig*scipy.factorial2(m-1)
+        ##            Ck += Dvals[k][m] / sig**m * momdiff
+        ##        # Add to totp
+        ##        raise
+        ##        print Dvals
+        ##        print Ck
+        ##        totp = totp +  Ck*Dvals[k]
+        C3 = skew / 6.0
+        C4 = kurt / 24.0
+        totp = totp - C3 * Dvals[3] + C4 * Dvals[4]
+
+    def thisfunc(x):
+        xn = (x - mu) / sig
+        return totp(xn) * np.exp(-xn * xn / 2.0) / np.sqrt(2 * np.pi) / sig
+
+    return thisfunc


 class NormExpan_gen(distributions.rv_continuous):
-    """Gram-Charlier Expansion of Normal distribution
+    '''Gram-Charlier Expansion of Normal distribution

     class follows scipy.stats.distributions pattern
     but with __init__

-    """
+    '''

     def __init__(self, args, **kwds):
-        distributions.rv_continuous.__init__(self, name=
-            'Normal Expansion distribution', shapes=' ')
+        # todo: replace with super call
+        distributions.rv_continuous.__init__(self,
+                                             name='Normal Expansion distribution', shapes=' ',
+                                             )
+        #     extradoc = '''
+        # The distribution is defined as the Gram-Charlier expansion of
+        # the normal distribution using the first four moments. The pdf
+        # is given by
+        #
+        # pdf(x) = (1+ skew/6.0 * H(xc,3) + kurt/24.0 * H(xc,4))*normpdf(xc)
+        #
+        # where xc = (x-mu)/sig is the standardized value of the random variable
+        # and H(xc,3) and H(xc,4) are Hermite polynomials
+        #
+        # Note: This distribution has to be parametrized during
+        # initialization and instantiation, and does not have a shape
+        # parameter after instantiation (similar to frozen distribution
+        # except for location and scale.) Location and scale can be used
+        # as with other distributions, however note, that they are relative
+        # to the initialized distribution.
+        # '''  )
+        # print args, kwds
         mode = kwds.get('mode', 'sample')
+
         if mode == 'sample':
             mu, sig, sk, kur = stats.describe(args)[2:]
-            self.mvsk = mu, sig, sk, kur
+            self.mvsk = (mu, sig, sk, kur)
             cnt = mvsk2mc((mu, sig, sk, kur))
         elif mode == 'mvsk':
             cnt = mvsk2mc(args)
@@ -208,11 +457,25 @@ class NormExpan_gen(distributions.rv_continuous):
             self.mvsk = mc2mvsk(cnt)
         else:
             raise ValueError("mode must be 'mvsk' or centmom")
+
         self.cnt = cnt
+        # self.mvsk = (mu,sig,sk,kur)
+        # self._pdf = pdf_moments(cnt)
         self._pdf = pdf_mvsk(self.mvsk)

+    def _munp(self, n):
+        # use pdf integration with _mom0_sc if only _pdf is defined.
+        # default stats calculation uses ppf
+        return self._mom0_sc(n)
+
+    def _stats_skip(self):
+        # skip for now to force numerical integration of pdf for testing
+        return self.mvsk
+
+
+## copied from nonlinear_transform_gen.py

-""" A class for the distribution of a non-linear monotonic transformation of a continuous random variable
+''' A class for the distribution of a non-linear monotonic transformation of a continuous random variable

 simplest usage:
 example: create log-gamma distribution, i.e. y = log(x),
@@ -247,56 +510,122 @@ Created on Tuesday, October 28, 2008, 12:40:37 PM
 Author: josef-pktd
 License: BSD

-"""
+'''
+
+
+def get_u_argskwargs(**kwargs):
+    # Todo: What's this? wrong spacing, used in Transf_gen TransfTwo_gen
+    u_kwargs = dict((k.replace('u_', '', 1), v) for k, v in kwargs.items()
+                    if k.startswith('u_'))
+    u_args = u_kwargs.pop('u_args', None)
+    return u_args, u_kwargs


 class Transf_gen(distributions.rv_continuous):
-    """a class for non-linear monotonic transformation of a continuous random variable
+    '''a class for non-linear monotonic transformation of a continuous random variable

-    """
+    '''

     def __init__(self, kls, func, funcinv, *args, **kwargs):
+        # print args
+        # print kwargs
+
         self.func = func
         self.funcinv = funcinv
+        # explicit for self.__dict__.update(kwargs)
+        # need to set numargs because inspection does not work
         self.numargs = kwargs.pop('numargs', 0)
+        # print self.numargs
         name = kwargs.pop('name', 'transfdist')
-        longname = kwargs.pop('longname', 'Non-linear transformed distribution'
-            )
+        longname = kwargs.pop('longname', 'Non-linear transformed distribution')
         extradoc = kwargs.pop('extradoc', None)
         a = kwargs.pop('a', -np.inf)
         b = kwargs.pop('b', np.inf)
         self.decr = kwargs.pop('decr', False)
+        # defines whether it is a decreasing (True)
+        #       or increasing (False) monotonic transformation
+
         self.u_args, self.u_kwargs = get_u_argskwargs(**kwargs)
-        self.kls = kls
-        super(Transf_gen, self).__init__(a=a, b=b, name=name, longname=longname
-            )
+        self.kls = kls  # (self.u_args, self.u_kwargs)
+        # possible to freeze the underlying distribution
+
+        super(Transf_gen, self).__init__(a=a, b=b, name=name,
+                                         longname=longname,
+                                         )
+        # extradoc = extradoc)
+
+    def _rvs(self, *args, **kwargs):
+        self.kls._size = self._size
+        return self.funcinv(self.kls._rvs(*args))
+
+    def _cdf(self, x, *args, **kwargs):
+        # print args
+        if not self.decr:
+            return self.kls._cdf(self.funcinv(x), *args, **kwargs)
+            # note scipy _cdf only take *args not *kwargs
+        else:
+            return 1.0 - self.kls._cdf(self.funcinv(x), *args, **kwargs)
+
+    def _ppf(self, q, *args, **kwargs):
+        if not self.decr:
+            return self.func(self.kls._ppf(q, *args, **kwargs))
+        else:
+            return self.func(self.kls._ppf(1 - q, *args, **kwargs))
+
+
+def inverse(x):
+    return np.divide(1.0, x)


 mux, stdx = 0.05, 0.1
 mux, stdx = 9.0, 1.0
-invdnormalg = Transf_gen(stats.norm, inversew, inversew_inv, decr=True,
-    numargs=0, name='discf', longname='normal-based discount factor',
-    extradoc=
-    """
-distribution of discount factor y=1/(1+x)) with x N(0.05,0.1**2)""")
-lognormalg = Transf_gen(stats.norm, np.exp, np.log, numargs=2, a=0, name=
-    'lnnorm', longname='Exp transformed normal')
+
+
+def inversew(x):
+    return 1.0 / (1 + mux + x * stdx)
+
+
+def inversew_inv(x):
+    return (1.0 / x - 1.0 - mux) / stdx  # .np.divide(1.0,x)-10
+
+
+def identit(x):
+    return x
+
+
+invdnormalg = Transf_gen(stats.norm, inversew, inversew_inv, decr=True,  # a=-np.inf,
+                         numargs=0, name='discf', longname='normal-based discount factor',
+                         extradoc='\ndistribution of discount factor y=1/(1+x)) with x N(0.05,0.1**2)')
+
+lognormalg = Transf_gen(stats.norm, np.exp, np.log,
+                        numargs=2, a=0, name='lnnorm',
+                        longname='Exp transformed normal',
+                        # extradoc = '\ndistribution of y = exp(x), with x standard normal'
+                        # 'precision for moment and stats is not very high, 2-3 decimals'
+                        )
+
 loggammaexpg = Transf_gen(stats.gamma, np.log, np.exp, numargs=1)
-"""univariate distribution of a non-linear monotonic transformation of a
+
+## copied form nonlinear_transform_short.py
+
+'''univariate distribution of a non-linear monotonic transformation of a
 random variable

-"""
+'''


 class ExpTransf_gen(distributions.rv_continuous):
-    """Distribution based on log/exp transformation
+    '''Distribution based on log/exp transformation

     the constructor can be called with a distribution class
     and generates the distribution of the transformed random variable

-    """
+    '''

     def __init__(self, kls, *args, **kwargs):
+        # print args
+        # print kwargs
+        # explicit for self.__dict__.update(kwargs)
         if 'numargs' in kwargs:
             self.numargs = kwargs['numargs']
         else:
@@ -312,16 +641,25 @@ class ExpTransf_gen(distributions.rv_continuous):
         super(ExpTransf_gen, self).__init__(a=0, name=name)
         self.kls = kls

+    def _cdf(self, x, *args):
+        pass
+        # print args
+        return self.kls.cdf(np.log(x), *args)
+
+    def _ppf(self, q, *args):
+        return np.exp(self.kls.ppf(q, *args))
+

 class LogTransf_gen(distributions.rv_continuous):
-    """Distribution based on log/exp transformation
+    '''Distribution based on log/exp transformation

     the constructor can be called with a distribution class
     and generates the distribution of the transformed random variable

-    """
+    '''

     def __init__(self, kls, *args, **kwargs):
+        # explicit for self.__dict__.update(kwargs)
         if 'numargs' in kwargs:
             self.numargs = kwargs['numargs']
         else:
@@ -334,16 +672,27 @@ class LogTransf_gen(distributions.rv_continuous):
             a = kwargs['a']
         else:
             a = 0
+
         super(LogTransf_gen, self).__init__(a=a, name=name)
         self.kls = kls

+    def _cdf(self, x, *args):
+        # print args
+        return self.kls._cdf(np.exp(x), *args)
+
+    def _ppf(self, q, *args):
+        return np.log(self.kls._ppf(q, *args))
+

-"""
+## copied from transformtwo.py
+
+'''
 Created on Apr 28, 2009

 @author: Josef Perktold
-"""
-""" A class for the distribution of a non-linear u-shaped or hump shaped transformation of a
+'''
+
+''' A class for the distribution of a non-linear u-shaped or hump shaped transformation of a
 continuous random variable

 This is a companion to the distributions of non-linear monotonic transformation to the case
@@ -371,11 +720,11 @@ TODO:

   * add _rvs as method, will be faster in many cases

-"""
+'''


 class TransfTwo_gen(distributions.rv_continuous):
-    """Distribution based on a non-monotonic (u- or hump-shaped transformation)
+    '''Distribution based on a non-monotonic (u- or hump-shaped transformation)

     the constructor can be called with a distribution class, and functions
     that define the non-linear transformation.
@@ -388,57 +737,205 @@ class TransfTwo_gen(distributions.rv_continuous):
     This can be used to generate distribution instances similar to the
     distributions in scipy.stats.

-    """
+    '''

+    # a class for non-linear non-monotonic transformation of a continuous random variable
     def __init__(self, kls, func, funcinvplus, funcinvminus, derivplus,
-        derivminus, *args, **kwargs):
+                 derivminus, *args, **kwargs):
+        # print args
+        # print kwargs
+
         self.func = func
         self.funcinvplus = funcinvplus
         self.funcinvminus = funcinvminus
         self.derivplus = derivplus
         self.derivminus = derivminus
+        # explicit for self.__dict__.update(kwargs)
+        # need to set numargs because inspection does not work
         self.numargs = kwargs.pop('numargs', 0)
+        # print self.numargs
         name = kwargs.pop('name', 'transfdist')
-        longname = kwargs.pop('longname', 'Non-linear transformed distribution'
-            )
+        longname = kwargs.pop('longname', 'Non-linear transformed distribution')
         extradoc = kwargs.pop('extradoc', None)
-        a = kwargs.pop('a', -np.inf)
-        b = kwargs.pop('b', np.inf)
+        a = kwargs.pop('a', -np.inf)  # attached to self in super
+        b = kwargs.pop('b', np.inf)  # self.a, self.b would be overwritten
         self.shape = kwargs.pop('shape', False)
+        # defines whether it is a `u` shaped or `hump' shaped
+        #       transformation
+
         self.u_args, self.u_kwargs = get_u_argskwargs(**kwargs)
-        self.kls = kls
-        super(TransfTwo_gen, self).__init__(a=a, b=b, name=name, shapes=kls
-            .shapes, longname=longname)
-        self._ctor_param.update(dict(kls=kls, func=func, funcinvplus=
-            funcinvplus, funcinvminus=funcinvminus, derivplus=derivplus,
-            derivminus=derivminus, shape=self.shape))
+        self.kls = kls  # (self.u_args, self.u_kwargs)
+        # possible to freeze the underlying distribution
+
+        super(TransfTwo_gen, self).__init__(a=a, b=b, name=name,
+                                            shapes=kls.shapes,
+                                            longname=longname,
+                                            # extradoc = extradoc
+                                            )
+
+        # add enough info for self.freeze() to be able to reconstruct the instance
+        self._ctor_param.update(
+            dict(kls=kls, func=func, funcinvplus=funcinvplus,
+                 funcinvminus=funcinvminus, derivplus=derivplus,
+                 derivminus=derivminus, shape=self.shape)
+        )
+
+    def _rvs(self, *args):
+        self.kls._size = self._size  # size attached to self, not function argument
+        return self.func(self.kls._rvs(*args))
+
+    def _pdf(self, x, *args, **kwargs):
+        # print args
+        if self.shape == 'u':
+            signpdf = 1
+        elif self.shape == 'hump':
+            signpdf = -1
+        else:
+            raise ValueError('shape can only be `u` or `hump`')
+
+        return signpdf * (self.derivplus(x) * self.kls._pdf(self.funcinvplus(x), *args, **kwargs) -
+                          self.derivminus(x) * self.kls._pdf(self.funcinvminus(x), *args,
+                                                             **kwargs))
+        # note scipy _cdf only take *args not *kwargs
+
+    def _cdf(self, x, *args, **kwargs):
+        # print args
+        if self.shape == 'u':
+            return self.kls._cdf(self.funcinvplus(x), *args, **kwargs) - \
+                self.kls._cdf(self.funcinvminus(x), *args, **kwargs)
+            # note scipy _cdf only take *args not *kwargs
+        else:
+            return 1.0 - self._sf(x, *args, **kwargs)
+
+    def _sf(self, x, *args, **kwargs):
+        # print args
+        if self.shape == 'hump':
+            return self.kls._cdf(self.funcinvplus(x), *args, **kwargs) - \
+                self.kls._cdf(self.funcinvminus(x), *args, **kwargs)
+            # note scipy _cdf only take *args not *kwargs
+        else:
+            return 1.0 - self._cdf(x, *args, **kwargs)
+
+    def _munp(self, n, *args, **kwargs):
+        args = [np.squeeze(arg) for arg in args]
+        out = np.squeeze(self._mom0_sc(n, *args))
+        if np.isscalar(out):
+            return float(out)
+        return out
+
+
+# ppf might not be possible in general case?
+# should be possible in symmetric case
+#    def _ppf(self, q, *args, **kwargs):
+#        if self.shape == 'u':
+#            return self.func(self.kls._ppf(q,*args, **kwargs))
+#        elif self.shape == 'hump':
+#            return self.func(self.kls._ppf(1-q,*args, **kwargs))

+# TODO: rename these functions to have unique names

 class SquareFunc:
-    """class to hold quadratic function with inverse function and derivative
+    '''class to hold quadratic function with inverse function and derivative

     using instance methods instead of class methods, if we want extension
     to parametrized function
-    """
+    '''
+
+    def inverseplus(self, x):
+        return np.sqrt(x)
+
+    def inverseminus(self, x):
+        return 0.0 - np.sqrt(x)
+
+    def derivplus(self, x):
+        return 0.5 / np.sqrt(x)
+
+    def derivminus(self, x):
+        return 0.0 - 0.5 / np.sqrt(x)
+
+    def squarefunc(self, x):
+        return np.power(x, 2)


 sqfunc = SquareFunc()
-squarenormalg = TransfTwo_gen(stats.norm, sqfunc.squarefunc, sqfunc.
-    inverseplus, sqfunc.inverseminus, sqfunc.derivplus, sqfunc.derivminus,
-    shape='u', a=0.0, b=np.inf, numargs=0, name='squarenorm', longname=
-    'squared normal distribution')
+
+squarenormalg = TransfTwo_gen(stats.norm, sqfunc.squarefunc, sqfunc.inverseplus,
+                              sqfunc.inverseminus, sqfunc.derivplus, sqfunc.derivminus,
+                              shape='u', a=0.0, b=np.inf,
+                              numargs=0, name='squarenorm', longname='squared normal distribution',
+                              # extradoc = '\ndistribution of the square of a normal random variable' +\
+                              #           ' y=x**2 with x N(0.0,1)'
+                              )
+# u_loc=l, u_scale=s)
 squaretg = TransfTwo_gen(stats.t, sqfunc.squarefunc, sqfunc.inverseplus,
-    sqfunc.inverseminus, sqfunc.derivplus, sqfunc.derivminus, shape='u', a=
-    0.0, b=np.inf, numargs=1, name='squarenorm', longname=
-    'squared t distribution')
-negsquarenormalg = TransfTwo_gen(stats.norm, negsquarefunc, inverseplus,
-    inverseminus, derivplus, derivminus, shape='hump', a=-np.inf, b=0.0,
-    numargs=0, name='negsquarenorm', longname=
-    'negative squared normal distribution')
+                         sqfunc.inverseminus, sqfunc.derivplus, sqfunc.derivminus,
+                         shape='u', a=0.0, b=np.inf,
+                         numargs=1, name='squarenorm', longname='squared t distribution',
+                         # extradoc = '\ndistribution of the square of a t random variable' +\
+                         #            ' y=x**2 with x t(dof,0.0,1)'
+                         )
+
+
+def inverseplus(x):
+    return np.sqrt(-x)
+
+
+def inverseminus(x):
+    return 0.0 - np.sqrt(-x)
+
+
+def derivplus(x):
+    return 0.0 - 0.5 / np.sqrt(-x)
+
+
+def derivminus(x):
+    return 0.5 / np.sqrt(-x)
+
+
+def negsquarefunc(x):
+    return -np.power(x, 2)
+
+
+negsquarenormalg = TransfTwo_gen(stats.norm, negsquarefunc, inverseplus, inverseminus,
+                                 derivplus, derivminus, shape='hump', a=-np.inf, b=0.0,
+                                 numargs=0, name='negsquarenorm',
+                                 longname='negative squared normal distribution',
+                                 # extradoc = '\ndistribution of the negative square of a normal random variable' +\
+                                 #            ' y=-x**2 with x N(0.0,1)'
+                                 )
+
+
+# u_loc=l, u_scale=s)
+
+def inverseplus(x):
+    return x
+
+
+def inverseminus(x):
+    return 0.0 - x
+
+
+def derivplus(x):
+    return 1.0
+
+
+def derivminus(x):
+    return 0.0 - 1.0
+
+
+def absfunc(x):
+    return np.abs(x)
+
+
 absnormalg = TransfTwo_gen(stats.norm, np.abs, inverseplus, inverseminus,
-    derivplus, derivminus, shape='u', a=0.0, b=np.inf, numargs=0, name=
-    'absnorm', longname='absolute of normal distribution')
-"""multivariate normal probabilities and cumulative distribution function
+                           derivplus, derivminus, shape='u', a=0.0, b=np.inf,
+                           numargs=0, name='absnorm', longname='absolute of normal distribution',
+                           # extradoc = '\ndistribution of the absolute value of a normal random variable' +\
+                           #            ' y=abs(x) with x N(0,1)'
+                           )
+
+# copied from mvncdf.py
+'''multivariate normal probabilities and cumulative distribution function
 a wrapper for scipy.stats._mvn.mvndst


@@ -515,15 +1012,16 @@ a wrapper for scipy.stats._mvn.mvndst
 (2e-016, 0.33333333333333337, 0)
 >>> mvndst([0.0,0.0],[0.0,0.0],[0,0],[0.99])
 (2e-016, 0.47747329317779391, 0)
-"""
-informcode = {(0): 'normal completion with ERROR < EPS', (1):
-    """completion with ERROR > EPS and MAXPTS function values used;
-                    increase MAXPTS to decrease ERROR;"""
-    , (2): 'N > 500 or N < 1'}
+'''
+
+informcode = {0: 'normal completion with ERROR < EPS',
+              1: '''completion with ERROR > EPS and MAXPTS function values used;
+                    increase MAXPTS to decrease ERROR;''',
+              2: 'N > 500 or N < 1'}


 def mvstdnormcdf(lower, upper, corrcoef, **kwds):
-    """standardized multivariate normal cumulative distribution function
+    '''standardized multivariate normal cumulative distribution function

     This is a wrapper for scipy.stats._mvn.mvndst which calculates
     a rectangular integral over a standardized multivariate normal
@@ -580,15 +1078,67 @@ def mvstdnormcdf(lower, upper, corrcoef, **kwds):
     something wrong completion with ERROR > EPS and MAXPTS function values used;
                         increase MAXPTS to decrease ERROR; 1.048330348e-006
     0.166666546218
-    >>> print(mvstdnormcdf([-np.inf,-np.inf,-100.0],[0.0,0.0,0.0], corr,                             maxpts=100000, abseps=1e-8))
+    >>> print(mvstdnormcdf([-np.inf,-np.inf,-100.0],[0.0,0.0,0.0], corr, \
+                            maxpts=100000, abseps=1e-8))
     0.166666588293

-    """
-    pass
+    '''
+    n = len(lower)
+    # do not know if converting to array is necessary,
+    # but it makes ndim check possible
+    lower = np.array(lower)
+    upper = np.array(upper)
+    corrcoef = np.array(corrcoef)
+
+    correl = np.zeros(int(n * (n - 1) / 2.0))  # dtype necessary?
+
+    if (lower.ndim != 1) or (upper.ndim != 1):
+        raise ValueError('can handle only 1D bounds')
+    if len(upper) != n:
+        raise ValueError('bounds have different lengths')
+    if n == 2 and corrcoef.size == 1:
+        correl = corrcoef
+        # print 'case scalar rho', n
+    elif corrcoef.ndim == 1 and len(corrcoef) == n * (n - 1) / 2.0:
+        # print 'case flat corr', corrcoeff.shape
+        correl = corrcoef
+    elif corrcoef.shape == (n, n):
+        # print 'case square corr',  correl.shape
+        correl = corrcoef[np.tril_indices(n, -1)]
+    #        for ii in range(n):
+    #            for jj in range(ii):
+    #                correl[ jj + ((ii-2)*(ii-1))/2] = corrcoef[ii,jj]
+    else:
+        raise ValueError('corrcoef has incorrect dimension')
+
+    if 'maxpts' not in kwds:
+        if n > 2:
+            kwds['maxpts'] = 10000 * n
+
+    lowinf = np.isneginf(lower)
+    uppinf = np.isposinf(upper)
+    infin = 2.0 * np.ones(n)
+
+    np.putmask(infin, lowinf, 0)  # infin.putmask(0,lowinf)
+    np.putmask(infin, uppinf, 1)  # infin.putmask(1,uppinf)
+    # this has to be last
+    np.putmask(infin, lowinf * uppinf, -1)
+
+    ##    #remove infs
+    ##    np.putmask(lower,lowinf,-100)# infin.putmask(0,lowinf)
+    ##    np.putmask(upper,uppinf,100) #infin.putmask(1,uppinf)
+
+    # print lower,',',upper,',',infin,',',correl
+    # print correl.shape
+    # print kwds.items()
+    error, cdfvalue, inform = mvndst(lower, upper, infin, correl, **kwds)
+    if inform:
+        print('something wrong', informcode[inform], error)
+    return cdfvalue


 def mvnormcdf(upper, mu, cov, lower=None, **kwds):
-    """multivariate normal cumulative distribution function
+    '''multivariate normal cumulative distribution function

     This is a wrapper for scipy.stats._mvn.mvndst which calculates
     a rectangular integral over a multivariate normal distribution.
@@ -625,5 +1175,21 @@ def mvnormcdf(upper, mu, cov, lower=None, **kwds):
     See Also
     --------
     mvstdnormcdf : location and scale standardized multivariate normal cdf
-    """
-    pass
+    '''
+
+    upper = np.array(upper)
+    if lower is None:
+        lower = -np.ones(upper.shape) * np.inf
+    else:
+        lower = np.array(lower)
+    cov = np.array(cov)
+    stdev = np.sqrt(np.diag(cov))  # standard deviation vector
+    # do I need to make sure stdev is float and not int?
+    # is this correct to normalize to corr?
+    lower = (lower - mu) / stdev
+    upper = (upper - mu) / stdev
+    divrow = np.atleast_2d(stdev)
+    corr = cov / divrow / divrow.T
+    # v/np.sqrt(np.atleast_2d(np.diag(covv)))/np.sqrt(np.atleast_2d(np.diag(covv))).T
+
+    return mvstdnormcdf(lower, upper, corr, **kwds)
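
mvnormcdf standardizes the bounds and the covariance and defers to mvstdnormcdf; a rough cross-check against scipy.stats.multivariate_normal, which evaluates the same probability when the lower bounds are -inf, might look like this (assuming the sandbox module imports cleanly against the installed scipy):

    import numpy as np
    from scipy import stats
    from statsmodels.sandbox.distributions.extras import mvnormcdf, mvstdnormcdf

    mu = np.array([0.0, 1.0])
    cov = np.array([[1.0, 0.5],
                    [0.5, 2.0]])
    upper = np.array([0.5, 2.0])

    # sandbox wrapper around the Fortran mvndst routine
    print(mvnormcdf(upper, mu, cov))

    # reference value from scipy's multivariate normal CDF
    print(stats.multivariate_normal(mean=mu, cov=cov).cdf(upper))

    # standardized version: P(-1 < z1 < 1, -1 < z2 < 1) with correlation 0.5
    print(mvstdnormcdf([-1.0, -1.0], [1.0, 1.0], [0.5]))
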
diff --git a/statsmodels/sandbox/distributions/genpareto.py b/statsmodels/sandbox/distributions/genpareto.py
index f996edcdd..796402006 100644
--- a/statsmodels/sandbox/distributions/genpareto.py
+++ b/statsmodels/sandbox/distributions/genpareto.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Thu Aug 12 14:59:03 2010

@@ -10,23 +11,68 @@ from scipy import stats
 from scipy.special import comb
 from scipy.stats.distributions import rv_continuous
 import matplotlib.pyplot as plt
+
 from numpy import where, inf
 from numpy import abs as np_abs


+## Generalized Pareto  with reversed sign of c as in literature
 class genpareto2_gen(rv_continuous):
-    pass
+    def _argcheck(self, c):
+        c = np.asarray(c)
+        self.b = where(c > 0, 1.0 / np_abs(c), inf)
+        return where(c == 0, 0, 1)
+
+    def _pdf(self, x, c):
+        Px = np.power(1 - c * x, -1.0 + 1.0 / c)
+        return Px
+
+    def _logpdf(self, x, c):
+        return (-1.0 + 1.0 / c) * np.log1p(-c * x)
+
+    def _cdf(self, x, c):
+        return 1.0 - np.power(1 - c * x, 1.0 / c)
+
+    def _ppf(self, q, c):
+        vals = -1.0 / c * (np.power(1 - q, c) - 1)
+        return vals
+
+    def _munp(self, n, c):
+        k = np.arange(0, n + 1)
+        val = (1.0 / c) ** n * np.sum(comb(n, k) * (-1) ** k / (1.0 + c * k), axis=0)
+        return where(c * n > -1, val, inf)
+
+    def _entropy(self, c):
+        if (c < 0):
+            return 1 - c
+        else:
+            self.b = 1.0 / c
+            return rv_continuous._entropy(self, c)


-genpareto2 = genpareto2_gen(a=0.0, name='genpareto', longname=
-    'A generalized Pareto', shapes='c')
+genpareto2 = genpareto2_gen(a=0.0, name='genpareto',
+                            longname="A generalized Pareto",
+                            shapes='c',
+                            #                           extradoc="""
+                            #
+                            # Generalized Pareto distribution
+                            #
+                            # genpareto2.pdf(x,c) = (1+c*x)**(-1-1/c)
+                            # for c != 0, and for x >= 0 for all c, and x < 1/abs(c) for c < 0.
+                            # """
+                            )
+
 shape, loc, scale = 0.5, 0, 1
 rv = np.arange(5)
 quant = [0.01, 0.1, 0.5, 0.9, 0.99]
-for method, x in [('pdf', rv), ('cdf', rv), ('sf', rv), ('ppf', quant), (
-    'isf', quant)]:
+for method, x in [('pdf', rv),
+                  ('cdf', rv),
+                  ('sf', rv),
+                  ('ppf', quant),
+                  ('isf', quant)]:
     print(getattr(genpareto2, method)(x, shape, loc, scale))
     print(getattr(stats.genpareto, method)(x, -shape, loc, scale))
+
 print(genpareto2.stats(shape, loc, scale, moments='mvsk'))
 print(stats.genpareto.stats(-shape, loc, scale, moments='mvsk'))
 print(genpareto2.entropy(shape, loc, scale))
@@ -34,40 +80,100 @@ print(stats.genpareto.entropy(-shape, loc, scale))


 def paramstopot(thresh, shape, scale):
-    """transform shape scale for peak over threshold
+    '''transform shape scale for peak over threshold

     y = x-u|x>u ~ GPD(k, sigma-k*u) if x ~ GPD(k, sigma)
     notation of de Zea Bermudez, Kotz
     k, sigma is shape, scale
-    """
-    pass
+    '''
+    return shape, scale - shape * thresh
+
+
+def paramsfrompot(thresh, shape, scalepot):
+    return shape, scalepot + shape * thresh
+
+
+def warnif(cond, msg):
+    if not cond:
+        print(msg, 'does not hold')


 def meanexcess(thresh, shape, scale):
-    """mean excess function of genpareto
+    '''mean excess function of genpareto

     assert are inequality conditions in de Zea Bermudez, Kotz
-    """
-    pass
+    '''
+    warnif(shape > -1, 'shape > -1')
+    warnif(thresh >= 0, 'thresh >= 0')  # make it weak inequality
+    warnif((scale - shape * thresh) > 0, '(scale - shape*thresh) > 0')
+    return (scale - shape * thresh) / (1 + shape)
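
The theoretical mean excess (scale - shape*thresh)/(1 + shape) is constant in the threshold exactly when shape = 0, i.e. in the exponential case; a quick empirical illustration of that special case, with sample size and thresholds chosen only for illustration:

    import numpy as np
    from scipy import stats

    np.random.seed(1)
    x = stats.expon.rvs(scale=10, size=200000)

    # empirical mean excess E[X - u | X > u]; for the exponential it stays
    # near the scale parameter (memorylessness), matching shape = 0 above
    for u in [0.0, 5.0, 10.0, 20.0]:
        print(u, (x[x > u] - u).mean())
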
+
+
+def meanexcess_plot(data, params=None, lidx=100, uidx=10, method='emp', plot=0):
+    if method == 'est':
+        # does not make much sense yet,
+        # estimate the parameters and use theoretical meanexcess
+        if params is None:
+            raise NotImplementedError
+        else:
+            pass  # estimate parames
+    elif method == 'emp':
+        # calculate meanexcess from data
+        datasorted = np.sort(data)
+        meanexcess = (datasorted[::-1].cumsum()) / np.arange(1, len(data) + 1) - datasorted[::-1]
+        meanexcess = meanexcess[::-1]
+        if plot:
+            plt.plot(datasorted[:-uidx], meanexcess[:-uidx])
+            if params is not None:
+                shape, scale = params
+                plt.plot(datasorted[:-uidx], (scale - datasorted[:-uidx] * shape) / (1. + shape))
+    return datasorted, meanexcess


 print(meanexcess(5, -0.5, 10))
 print(meanexcess(5, -2, 10))
+
 data = genpareto2.rvs(-0.75, scale=5, size=1000)
+# data = np.random.uniform(50, size=1000)
+# data = stats.norm.rvs(0, np.sqrt(50), size=1000)
+# data = stats.pareto.rvs(1.5, np.sqrt(50), size=1000)
 tmp = meanexcess_plot(data, params=(-0.75, 5), plot=1)
 print(tmp[1][-20:])
 print(tmp[0][-20:])
-ds, me, mc = meanexcess_emp(1.0 * np.arange(1, 10))
+
+
+# plt.show()
+
+def meanexcess_emp(data):
+    datasorted = np.sort(data).astype(float)
+    meanexcess = (datasorted[::-1].cumsum()) / np.arange(1, len(data) + 1) - datasorted[::-1]
+    meancont = (datasorted[::-1].cumsum()) / np.arange(1, len(data) + 1)
+    meanexcess = meanexcess[::-1]
+    return datasorted, meanexcess, meancont[::-1]
+
+
+def meanexcess_dist(self, lb, *args, **kwds):
+    # default function in expect is identity
+    # need args in call
+    if np.ndim(lb) == 0:
+        return self.expect(lb=lb, conditional=True)
+    else:
+        return np.array([self.expect(lb=lbb, conditional=True) for
+                         lbb in lb])
+
+
+ds, me, mc = meanexcess_emp(1. * np.arange(1, 10))
 print(ds)
 print(me)
 print(mc)
+
 print(meanexcess_dist(stats.norm, lb=0.5))
 print(meanexcess_dist(stats.norm, lb=[-np.inf, -0.5, 0, 0.5]))
 rvs = stats.norm.rvs(size=100000)
 rvs = rvs - rvs.mean()
-print(rvs.mean(), rvs[rvs > -0.5].mean(), rvs[rvs > 0].mean(), rvs[rvs > 
-    0.5].mean())
-"""
+print(rvs.mean(), rvs[rvs > -0.5].mean(), rvs[rvs > 0].mean(), rvs[rvs > 0.5].mean())
+
+'''
 [ 1.   0.5  0.   0.   0. ]
 [ 1.   0.5  0.   0.   0. ]
 [ 0.    0.75  1.    1.    1.  ]
@@ -140,4 +246,4 @@ array([  9.,  17.,  24.,  30.,  35.,  39.,  42.,  44.,  45.])
 >>> datasorted[::-1]
 array([ 9.,  8.,  7.,  6.,  5.,  4.,  3.,  2.,  1.])
 >>>
-"""
+'''
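
A quick check of the sign convention used above (a minimal sketch, scipy only):
genpareto2 with shape c uses the same cdf as scipy.stats.genpareto with shape -c,
which is also what the print comparisons in this script rely on.

    >>> import numpy as np
    >>> from scipy import stats
    >>> c, x = 0.5, np.array([0.0, 0.5, 1.0, 1.5])
    >>> print(np.allclose(stats.genpareto.cdf(x, -c), 1.0 - np.power(1 - c * x, 1.0 / c)))
    True
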
diff --git a/statsmodels/sandbox/distributions/gof_new.py b/statsmodels/sandbox/distributions/gof_new.py
index 349fcd9fe..ddde3d99e 100644
--- a/statsmodels/sandbox/distributions/gof_new.py
+++ b/statsmodels/sandbox/distributions/gof_new.py
@@ -1,4 +1,4 @@
-"""More Goodness of fit tests
+'''More Goodness of fit tests

 contains

@@ -16,14 +16,17 @@ parts based on ks_2samp and kstest from scipy.stats
 References
 ----------

-"""
+'''
 from statsmodels.compat.python import lmap
 import numpy as np
+
 from scipy.stats import distributions
+
 from statsmodels.tools.decorators import cache_readonly
-from scipy.special import kolmogorov as ksprob

+from scipy.special import kolmogorov as ksprob

+#from scipy.stats unchanged
 def ks_2samp(data1, data2):
     """
     Computes the Kolmogorov-Smirnov statistic on 2 samples.
@@ -96,11 +99,30 @@ def ks_2samp(data1, data2):
     >>> ks_2samp(rvs1,rvs4)
     (0.07999999999999996, 0.41126949729859719)
     """
-    pass
-
-
-def kstest(rvs, cdf, args=(), N=20, alternative='two_sided', mode='approx',
-    **kwds):
+    data1, data2 = lmap(np.asarray, (data1, data2))
+    n1 = data1.shape[0]
+    n2 = data2.shape[0]
+    data1 = np.sort(data1)
+    data2 = np.sort(data2)
+    data_all = np.concatenate([data1,data2])
+    #reminder: searchsorted inserts 2nd into 1st array
+    cdf1 = np.searchsorted(data1,data_all,side='right')/(1.0*n1)
+    cdf2 = (np.searchsorted(data2,data_all,side='right'))/(1.0*n2)
+    d = np.max(np.absolute(cdf1-cdf2))
+    #Note: d absolute not signed distance
+    en = np.sqrt(n1*n2/float(n1+n2))
+    try:
+        prob = ksprob((en+0.12+0.11/en)*d)
+    except Exception:
+        prob = 1.0
+    return d, prob
+
+
+
+#from scipy.stats unchanged
+def kstest(rvs, cdf, args=(), N=20, alternative = 'two_sided', mode='approx',**kwds):
     """
     Perform the Kolmogorov-Smirnov test for goodness of fit

@@ -215,23 +237,134 @@ def kstest(rvs, cdf, args=(), N=20, alternative='two_sided', mode='approx',
     >>> stats.kstest(stats.t.rvs(3,size=100),'norm')
     (0.131016895759829, 0.058826222555312224)
     """
-    pass
+    if isinstance(rvs, str):
+        #cdf = getattr(stats, rvs).cdf
+        if (not cdf) or (cdf == rvs):
+            cdf = getattr(distributions, rvs).cdf
+            rvs = getattr(distributions, rvs).rvs
+        else:
+            raise AttributeError('if rvs is string, cdf has to be the same distribution')
+
+
+    if isinstance(cdf, str):
+        cdf = getattr(distributions, cdf).cdf
+    if callable(rvs):
+        kwds = {'size':N}
+        vals = np.sort(rvs(*args,**kwds))
+    else:
+        vals = np.sort(rvs)
+        N = len(vals)
+    cdfvals = cdf(vals, *args)
+
+    if alternative in ['two_sided', 'greater']:
+        Dplus = (np.arange(1.0, N+1)/N - cdfvals).max()
+        if alternative == 'greater':
+            return Dplus, distributions.ksone.sf(Dplus,N)
+
+    if alternative in ['two_sided', 'less']:
+        Dmin = (cdfvals - np.arange(0.0, N)/N).max()
+        if alternative == 'less':
+            return Dmin, distributions.ksone.sf(Dmin,N)
+
+    if alternative == 'two_sided':
+        D = np.max([Dplus,Dmin])
+        if mode == 'asymp':
+            return D, distributions.kstwobign.sf(D*np.sqrt(N))
+        if mode == 'approx':
+            pval_two = distributions.kstwobign.sf(D*np.sqrt(N))
+            if N > 2666 or pval_two > 0.80 - N*0.3/1000.0 :
+                return D, distributions.kstwobign.sf(D*np.sqrt(N))
+            else:
+                return D, distributions.ksone.sf(D,N)*2
+
+#TODO: split into modification and pvalue functions separately ?
+#      for separate testing and combining different pieces

+def dplus_st70_upp(stat, nobs):
+    mod_factor = np.sqrt(nobs) + 0.12 + 0.11 / np.sqrt(nobs)
+    stat_modified = stat * mod_factor
+    pval = np.exp(-2 * stat_modified**2)
+    digits = np.sum(stat > np.array([0.82, 0.82, 1.00]))
+    #repeat low to get {0,2,3}
+    return stat_modified, pval, digits

 dminus_st70_upp = dplus_st70_upp
+
+
+def d_st70_upp(stat, nobs):
+    mod_factor = np.sqrt(nobs) + 0.12 + 0.11 / np.sqrt(nobs)
+    stat_modified = stat * mod_factor
+    pval = 2 * np.exp(-2 * stat_modified**2)
+    digits = np.sum(stat > np.array([0.91, 0.91, 1.08]))
+    #repeat low to get {0,2,3}
+    return stat_modified, pval, digits
+
+def v_st70_upp(stat, nobs):
+    mod_factor = np.sqrt(nobs) + 0.155 + 0.24 / np.sqrt(nobs)
+    #repeat low to get {0,2,3}
+    stat_modified = stat * mod_factor
+    zsqu = stat_modified**2
+    pval = (8 * zsqu - 2) * np.exp(-2 * zsqu)
+    digits = np.sum(stat > np.array([1.06, 1.06, 1.26]))
+    return stat_modified, pval, digits
+
+def wsqu_st70_upp(stat, nobs):
+    nobsinv = 1. / nobs
+    stat_modified = (stat - 0.4 * nobsinv + 0.6 * nobsinv**2) * (1 + nobsinv)
+    pval = 0.05 * np.exp(2.79 - 6 * stat_modified)
+    digits = np.nan  # some explanation in txt
+    #repeat low to get {0,2,3}
+    return stat_modified, pval, digits
+
+def usqu_st70_upp(stat, nobs):
+    nobsinv = 1. / nobs
+    stat_modified = (stat - 0.1 * nobsinv + 0.1 * nobsinv**2)
+    stat_modified *= (1 + 0.8 * nobsinv)
+    pval = 2 * np.exp(- 2 * stat_modified * np.pi**2)
+    digits = np.sum(stat > np.array([0.29, 0.29, 0.34]))
+    #repeat low to get {0,2,3}
+    return stat_modified, pval, digits
+
+def a_st70_upp(stat, nobs):
+    nobsinv = 1. / nobs
+    stat_modified = (stat - 0.7 * nobsinv + 0.9 * nobsinv**2)
+    stat_modified *= (1 + 1.23 * nobsinv)
+    pval = 1.273 * np.exp(- 2 * stat_modified / 2. * np.pi**2)
+    digits = np.sum(stat > np.array([0.11, 0.11, 0.452]))
+    #repeat low to get {0,2,3}
+    return stat_modified, pval, digits
+
+
+
 gof_pvals = {}
-gof_pvals['stephens70upp'] = {'d_plus': dplus_st70_upp, 'd_minus':
-    dplus_st70_upp, 'd': d_st70_upp, 'v': v_st70_upp, 'wsqu': wsqu_st70_upp,
-    'usqu': usqu_st70_upp, 'a': a_st70_upp}
-gof_pvals['scipy'] = {'d_plus': lambda Dplus, N: (Dplus, distributions.
-    ksone.sf(Dplus, N), np.nan), 'd_minus': lambda Dmin, N: (Dmin,
-    distributions.ksone.sf(Dmin, N), np.nan), 'd': lambda D, N: (D,
-    distributions.kstwobign.sf(D * np.sqrt(N)), np.nan)}
-gof_pvals['scipy_approx'] = {'d': pval_kstest_approx}

+gof_pvals['stephens70upp'] = {
+    'd_plus' : dplus_st70_upp,
+    'd_minus' : dplus_st70_upp,
+    'd' : d_st70_upp,
+    'v' : v_st70_upp,
+    'wsqu' : wsqu_st70_upp,
+    'usqu' : usqu_st70_upp,
+    'a' : a_st70_upp }
+
+def pval_kstest_approx(D, N):
+    pval_two = distributions.kstwobign.sf(D*np.sqrt(N))
+    if N > 2666 or pval_two > 0.80 - N*0.3/1000.0 :
+        return D, distributions.kstwobign.sf(D*np.sqrt(N)), np.nan
+    else:
+        return D, distributions.ksone.sf(D,N)*2, np.nan
+
+gof_pvals['scipy'] = {
+    'd_plus' : lambda Dplus, N: (Dplus, distributions.ksone.sf(Dplus, N), np.nan),
+    'd_minus' : lambda Dmin, N: (Dmin, distributions.ksone.sf(Dmin,N), np.nan),
+    'd' : lambda D, N: (D, distributions.kstwobign.sf(D*np.sqrt(N)), np.nan)
+    }
+
+gof_pvals['scipy_approx'] = {
+    'd' : pval_kstest_approx }

 class GOF:
-    """One Sample Goodness of Fit tests
+    '''One Sample Goodness of Fit tests

     includes Kolmogorov-Smirnov D, D+, D-, Kuiper V, Cramer-von Mises W^2, U^2 and
     Anderson-Darling A, A^2. The p-values for all tests except for A^2 are based on
@@ -249,58 +382,164 @@ class GOF:



-    """
+    '''
+
+
+

     def __init__(self, rvs, cdf, args=(), N=20):
         if isinstance(rvs, str):
-            if not cdf or cdf == rvs:
+            #cdf = getattr(stats, rvs).cdf
+            if (not cdf) or (cdf == rvs):
                 cdf = getattr(distributions, rvs).cdf
                 rvs = getattr(distributions, rvs).rvs
             else:
-                raise AttributeError(
-                    'if rvs is string, cdf has to be the same distribution')
+                raise AttributeError('if rvs is string, cdf has to be the same distribution')
+
+
         if isinstance(cdf, str):
             cdf = getattr(distributions, cdf).cdf
         if callable(rvs):
-            kwds = {'size': N}
-            vals = np.sort(rvs(*args, **kwds))
+            kwds = {'size':N}
+            vals = np.sort(rvs(*args,**kwds))
         else:
             vals = np.sort(rvs)
             N = len(vals)
         cdfvals = cdf(vals, *args)
+
         self.nobs = N
         self.vals_sorted = vals
         self.cdfvals = cdfvals

+
+
+    @cache_readonly
+    def d_plus(self):
+        nobs = self.nobs
+        cdfvals = self.cdfvals
+        return (np.arange(1.0, nobs+1)/nobs - cdfvals).max()
+
+    @cache_readonly
+    def d_minus(self):
+        nobs = self.nobs
+        cdfvals = self.cdfvals
+        return (cdfvals - np.arange(0.0, nobs)/nobs).max()
+
+    @cache_readonly
+    def d(self):
+        return np.max([self.d_plus, self.d_minus])
+
     @cache_readonly
     def v(self):
-        """Kuiper"""
-        pass
+        '''Kuiper'''
+        return self.d_plus + self.d_minus

     @cache_readonly
     def wsqu(self):
-        """Cramer von Mises"""
-        pass
+        '''Cramer von Mises'''
+        nobs = self.nobs
+        cdfvals = self.cdfvals
+        #use literal formula, TODO: simplify with arange(,,2)
+        wsqu = ((cdfvals - (2. * np.arange(1., nobs+1) - 1)/nobs/2.)**2).sum() \
+               + 1./nobs/12.
+        return wsqu
+
+    @cache_readonly
+    def usqu(self):
+        nobs = self.nobs
+        cdfvals = self.cdfvals
+        #use literal formula, TODO: simplify with arange(,,2)
+        usqu = self.wsqu - nobs * (cdfvals.mean() - 0.5)**2
+        return usqu
+
+    @cache_readonly
+    def a(self):
+        nobs = self.nobs
+        cdfvals = self.cdfvals
+
+        #one loop instead of large array
+        msum = 0
+        for j in range(1,nobs):
+            mj = cdfvals[j] - cdfvals[:j]
+            mask = (mj > 0.5)
+            mj[mask] = 1 - mj[mask]
+            msum += mj.sum()
+
+        a = nobs / 4. - 2. / nobs * msum
+        return a

     @cache_readonly
     def asqu(self):
-        """Stephens 1974, does not have p-value formula for A^2"""
-        pass
+        '''Stephens 1974, does not have p-value formula for A^2'''
+        nobs = self.nobs
+        cdfvals = self.cdfvals
+
+        asqu = -((2. * np.arange(1., nobs+1) - 1) *
+                (np.log(cdfvals) + np.log(1-cdfvals[::-1]) )).sum()/nobs - nobs
+
+        return asqu
+

     def get_test(self, testid='d', pvals='stephens70upp'):
-        """
+        '''return test statistic and p-value for the test given by testid
+
+        For pvals='stephens70upp' the return is ((stat_modified, pval, digits), stat),
+        for the scipy based p-values it is (stat, pval, digits).
+        '''
+        #print gof_pvals[pvals][testid]
+        stat = getattr(self, testid)
+        if pvals == 'stephens70upp':
+            return gof_pvals[pvals][testid](stat, self.nobs), stat
+        else:
+            return gof_pvals[pvals][testid](stat, self.nobs)
+

-        """
-        pass


+
+
+
+
+def gof_mc(randfn, distr, nobs=100):
+    #print '\nIs it correctly sized?'
+    from collections import defaultdict
+
+    results = defaultdict(list)
+    for i in range(1000):
+        rvs = randfn(nobs)
+        goft = GOF(rvs, distr)
+        for ti in all_gofs:
+            results[ti].append(goft.get_test(ti, 'stephens70upp')[0][1])
+
+    resarr = np.array([results[ti] for ti in all_gofs])
+    print('         ', '      '.join(all_gofs))
+    print('at 0.01:', (resarr < 0.01).mean(1))
+    print('at 0.05:', (resarr < 0.05).mean(1))
+    print('at 0.10:', (resarr < 0.1).mean(1))
+
 def asquare(cdfvals, axis=0):
-    """vectorized Anderson Darling A^2, Stephens 1974"""
-    pass
+    '''vectorized Anderson Darling A^2, Stephens 1974'''
+    ndim = len(cdfvals.shape)
+    nobs = cdfvals.shape[axis]
+    slice_reverse = [slice(None)] * ndim  #might make copy if not specific axis???
+    islice = [None] * ndim
+    islice[axis] = slice(None)
+    slice_reverse[axis] = slice(None, None, -1)
+    asqu = -((2. * np.arange(1., nobs+1)[tuple(islice)] - 1) *
+            (np.log(cdfvals) + np.log(1-cdfvals[tuple(slice_reverse)]))/nobs).sum(axis) \
+            - nobs

+    return asqu

+
+#class OneSGOFFittedVec:
+#    '''for vectorized fitting'''
+    # currently I use the bootstrap as function instead of full class
+
+    #note: kwds loc and scale are a pain
+    # I would need to overwrite rvs, fit and cdf depending on fixed parameters
+
+    #def bootstrap(self, distr, args=(), kwds={}, nobs=200, nrep=1000,
 def bootstrap(distr, args=(), nobs=200, nrep=100, value=None, batch_size=None):
-    """Monte Carlo (or parametric bootstrap) p-values for gof
+    '''Monte Carlo (or parametric bootstrap) p-values for gof

     currently hardcoded for A^2 only

@@ -311,12 +550,46 @@ def bootstrap(distr, args=(), nobs=200, nrep=100, value=None, batch_size=None):

     this works also with nrep=1

-    """
-    pass
+    '''
+    #signature similar to kstest ?
+    #delegate to fn ?
+
+    #rvs_kwds = {'size':(nobs, nrep)}
+    #rvs_kwds.update(kwds)
+
+
+    #it will be better to build a separate batch function that calls bootstrap
+    #keep batch if value is true, but batch iterate from outside if stat is returned
+    if batch_size is not None:
+        if value is None:
+            raise ValueError('using batching requires a value')
+        n_batch = int(np.ceil(nrep/float(batch_size)))
+        count = 0
+        for irep in range(n_batch):
+            rvs = distr.rvs(args, **{'size':(batch_size, nobs)})
+            params = distr.fit_vec(rvs, axis=1)
+            params = lmap(lambda x: np.expand_dims(x, 1), params)
+            cdfvals = np.sort(distr.cdf(rvs, params), axis=1)
+            stat = asquare(cdfvals, axis=1)
+            count += (stat >= value).sum()
+        return count / float(n_batch * batch_size)
+    else:
+        #rvs = distr.rvs(args, **kwds)  #extension to distribution kwds ?
+        rvs = distr.rvs(args, **{'size':(nrep, nobs)})
+        params = distr.fit_vec(rvs, axis=1)
+        params = lmap(lambda x: np.expand_dims(x, 1), params)
+        cdfvals = np.sort(distr.cdf(rvs, params), axis=1)
+        stat = asquare(cdfvals, axis=1)
+        if value is None:           #return all bootstrap results
+            stat_sorted = np.sort(stat)
+            return stat_sorted
+        else:                       #calculate and return specific p-value
+            return (stat >= value).mean()
+


 def bootstrap2(value, distr, args=(), nobs=200, nrep=100):
-    """Monte Carlo (or parametric bootstrap) p-values for gof
+    '''Monte Carlo (or parametric bootstrap) p-values for gof

     currently hardcoded for A^2 only

@@ -325,27 +598,60 @@ def bootstrap2(value, distr, args=(), nobs=200, nrep=100):

     rename function to less generic

-    """
-    pass
+    '''
+    #signature similar to kstest ?
+    #delegate to fn ?
+
+    #rvs_kwds = {'size':(nobs, nrep)}
+    #rvs_kwds.update(kwds)
+
+
+    count = 0
+    for irep in range(nrep):
+        #rvs = distr.rvs(args, **kwds)  #extension to distribution kwds ?
+        rvs = distr.rvs(args, **{'size':nobs})
+        params = distr.fit_vec(rvs)
+        cdfvals = np.sort(distr.cdf(rvs, params))
+        stat = asquare(cdfvals, axis=0)
+        count += (stat >= value)
+    return count * 1. / nrep


 class NewNorm:
-    """just a holder for modified distributions
-    """
+    '''just a holder for modified distributions
+    '''
+
+    def fit_vec(self, x, axis=0):
+        return x.mean(axis), x.std(axis)
+
+    def cdf(self, x, args):
+        return distributions.norm.cdf(x, loc=args[0], scale=args[1])
+
+    def rvs(self, args, size):
+        loc=args[0]
+        scale=args[1]
+        return loc + scale * distributions.norm.rvs(size=size)
+
+
+


 if __name__ == '__main__':
     from scipy import stats
+    #rvs = np.random.randn(1000)
     rvs = stats.t.rvs(3, size=200)
     print('scipy kstest')
     print(kstest(rvs, 'norm'))
     goft = GOF(rvs, 'norm')
     print(goft.get_test())
+
     all_gofs = ['d', 'd_plus', 'd_minus', 'v', 'wsqu', 'usqu', 'a']
     for ti in all_gofs:
         print(ti, goft.get_test(ti, 'stephens70upp'))
+
     print('\nIs it correctly sized?')
     from collections import defaultdict
+
     results = defaultdict(list)
     nobs = 200
     for i in range(100):
@@ -353,22 +659,28 @@ if __name__ == '__main__':
         goft = GOF(rvs, 'norm')
         for ti in all_gofs:
             results[ti].append(goft.get_test(ti, 'stephens70upp')[0][1])
+
     resarr = np.array([results[ti] for ti in all_gofs])
     print('         ', '      '.join(all_gofs))
     print('at 0.01:', (resarr < 0.01).mean(1))
     print('at 0.05:', (resarr < 0.05).mean(1))
     print('at 0.10:', (resarr < 0.1).mean(1))
+
     gof_mc(lambda nobs: stats.t.rvs(3, size=nobs), 'norm', nobs=200)
+
     nobs = 200
     nrep = 100
-    bt = bootstrap(NewNorm(), args=(0, 1), nobs=nobs, nrep=nrep, value=None)
+    bt = bootstrap(NewNorm(), args=(0,1), nobs=nobs, nrep=nrep, value=None)
     quantindex = np.floor(nrep * np.array([0.99, 0.95, 0.9])).astype(int)
     print(bt[quantindex])
-    """
+
+    #the bootstrap results match Stephens pretty well for nobs=100, but not so well for
+    #large (1000) or small (20) nobs
+    '''
     >>> np.array([15.0, 10.0, 5.0, 2.5, 1.0])/100.  #Stephens
     array([ 0.15 ,  0.1  ,  0.05 ,  0.025,  0.01 ])
     >>> nobs = 100
     >>> [bootstrap(NewNorm(), args=(0,1), nobs=nobs, nrep=10000, value=c/ (1 + 4./nobs - 25./nobs**2)) for c in [0.576, 0.656, 0.787, 0.918, 1.092]]
     [0.1545, 0.10009999999999999, 0.049000000000000002, 0.023, 0.0104]
     >>>
-    """
+    '''
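
The D statistics computed by the GOF class use the standard empirical-cdf formulas,
so they should agree with scipy's kstest; a minimal cross-check (a sketch, scipy only):

    >>> import numpy as np
    >>> from scipy import stats
    >>> vals = np.sort(stats.norm.rvs(size=30, random_state=123))
    >>> cdfvals = stats.norm.cdf(vals)
    >>> n = len(vals)
    >>> d_plus = (np.arange(1., n + 1) / n - cdfvals).max()
    >>> d_minus = (cdfvals - np.arange(0., n) / n).max()
    >>> print(np.isclose(max(d_plus, d_minus), stats.kstest(vals, 'norm')[0]))
    True
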
diff --git a/statsmodels/sandbox/distributions/multivariate.py b/statsmodels/sandbox/distributions/multivariate.py
index cd0aa83ad..fce612046 100644
--- a/statsmodels/sandbox/distributions/multivariate.py
+++ b/statsmodels/sandbox/distributions/multivariate.py
@@ -1,4 +1,4 @@
-"""Multivariate Distribution
+'''Multivariate Distribution

 Probability of a multivariate t distribution

@@ -13,23 +13,57 @@ License: BSD (3-clause)
 Reference:
 Genz and Bretz for formula

-"""
+'''
 import numpy as np
 from scipy import integrate, stats, special
 from scipy.stats import chi
+
 from .extras import mvstdnormcdf
+
 from numpy import exp as np_exp
 from numpy import log as np_log
 from scipy.special import gamma as sps_gamma
 from scipy.special import gammaln as sps_gammaln

-
 def chi2_pdf(self, x, df):
-    """pdf of chi-square distribution"""
-    pass
+    '''pdf of chi-square distribution'''
+    #from scipy.stats.distributions
+    Px = x**(df/2.0-1)*np.exp(-x/2.0)
+    Px /= special.gamma(df/2.0)* 2**(df/2.0)
+    return Px
+
+def chi_pdf(x, df):
+    tmp = (df-1.)*np_log(x) + (-x*x*0.5) - (df*0.5-1)*np_log(2.0) \
+          - sps_gammaln(df*0.5)
+    return np_exp(tmp)
+    #return x**(df-1.)*np_exp(-x*x*0.5)/(2.0)**(df*0.5-1)/sps_gamma(df*0.5)
+
+def chi_logpdf(x, df):
+    tmp = (df-1.)*np_log(x) + (-x*x*0.5) - (df*0.5-1)*np_log(2.0) \
+          - sps_gammaln(df*0.5)
+    return tmp
+
+def funbgh(s, a, b, R, df):
+    sqrt_df = np.sqrt(df+0.5)
+    ret = chi_logpdf(s,df)
+    ret += np_log(mvstdnormcdf(s*a/sqrt_df, s*b/sqrt_df, R,
+                                         maxpts=1000000, abseps=1e-6))
+    ret = np_exp(ret)
+    return ret
+
+def funbgh2(s, a, b, R, df):
+    n = len(a)
+    sqrt_df = np.sqrt(df)
+    #np.power(s, df-1) * np_exp(-s*s*0.5)
+    return np_exp((df-1)*np_log(s)-s*s*0.5) \
+           * mvstdnormcdf(s*a/sqrt_df, s*b/sqrt_df, R[np.tril_indices(n, -1)],
+                          maxpts=1000000, abseps=1e-4)
+
+def bghfactor(df):
+    return np.power(2.0, 1-df*0.5) / sps_gamma(df*0.5)


-def mvstdtprob(a, b, R, df, ieps=1e-05, quadkwds=None, mvstkwds=None):
+def mvstdtprob(a, b, R, df, ieps=1e-5, quadkwds=None, mvstkwds=None):
     """
     Probability of rectangular area of standard t distribution

@@ -41,11 +75,18 @@ def mvstdtprob(a, b, R, df, ieps=1e-05, quadkwds=None, mvstkwds=None):
     between the underlying multivariate normal probability calculations
     and the integration.
     """
-    pass
-
-
+    kwds = dict(args=(a, b, R, df), epsabs=1e-4, epsrel=1e-2, limit=150)
+    if quadkwds is not None:
+        kwds.update(quadkwds)
+    lower, upper = chi.ppf([ieps, 1 - ieps], df)
+    res, err = integrate.quad(funbgh2, lower, upper, **kwds)
+    prob = res * bghfactor(df)
+    return prob
+
+#written by Enzo Michelangeli, style changes by josef-pktd
+# Student's T random variable
 def multivariate_t_rvs(m, S, df=np.inf, n=1):
-    """generate random variables of multivariate t distribution
+    '''generate random variables of multivariate t distribution

     Parameters
     ----------
@@ -65,36 +106,54 @@ def multivariate_t_rvs(m, S, df=np.inf, n=1):
         random variable


-    """
-    pass
+    '''
+    m = np.asarray(m)
+    d = len(m)
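+    # construction: T = m + Z / sqrt(W/df) with Z ~ N(0, S) and W ~ chi2(df)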
+    if df == np.inf:
+        x = np.ones(n)
+    else:
+        x = np.random.chisquare(df, n)/df
+    z = np.random.multivariate_normal(np.zeros(d),S,(n,))
+    return m + z/np.sqrt(x)[:,None]   # same output format as random.multivariate_normal
+
+


 if __name__ == '__main__':
-    corr = np.asarray([[1.0, 0, 0.5], [0, 1, 0], [0.5, 0, 1]])
-    corr_indep = np.asarray([[1.0, 0, 0], [0, 1, 0], [0, 0, 1]])
-    corr_equal = np.asarray([[1.0, 0.5, 0.5], [0.5, 1, 0.5], [0.5, 0.5, 1]])
+    corr = np.asarray([[1.0, 0, 0.5],[0,1,0],[0.5,0,1]])
+    corr_indep = np.asarray([[1.0, 0, 0],[0,1,0],[0,0,1]])
+    corr_equal = np.asarray([[1.0, 0.5, 0.5],[0.5,1,0.5],[0.5,0.5,1]])
     R = corr_equal
-    a = np.array([-np.inf, -np.inf, -100.0])
-    a = np.array([-0.96, -0.96, -0.96])
-    b = np.array([0.0, 0.0, 0.0])
-    b = np.array([0.96, 0.96, 0.96])
+    a = np.array([-np.inf,-np.inf,-100.0])
+    a = np.array([-0.96,-0.96,-0.96])
+    b = np.array([0.0,0.0,0.0])
+    b = np.array([0.96,0.96, 0.96])
     a[:] = -1
     b[:] = 3
-    df = 10.0
+    df = 10.
     sqrt_df = np.sqrt(df)
-    print(mvstdnormcdf(a, b, corr, abseps=1e-06))
-    print((stats.t.cdf(b[0], df) - stats.t.cdf(a[0], df)) ** 3)
+    print(mvstdnormcdf(a, b, corr, abseps=1e-6))
+
+    #print integrate.quad(funbgh, 0, np.inf, args=(a,b,R,df))
+    print((stats.t.cdf(b[0], df) - stats.t.cdf(a[0], df))**3)
+
     s = 1
-    print(mvstdnormcdf(s * a / sqrt_df, s * b / sqrt_df, R))
-    df = 4
+    print(mvstdnormcdf(s*a/sqrt_df, s*b/sqrt_df, R))
+
+
+    df=4
     print(mvstdtprob(a, b, R, df))
-    S = np.array([[1.0, 0.5], [0.5, 1.0]])
-    print(multivariate_t_rvs([10.0, 20.0], S, 2, 5))
+
+    S = np.array([[1.,.5],[.5,1.]])
+    print(multivariate_t_rvs([10.,20.], S, 2, 5))
+
     nobs = 10000
-    rvst = multivariate_t_rvs([10.0, 20.0], S, 2, nobs)
-    print(np.sum((rvst < [10.0, 20.0]).all(1), 0) * 1.0 / nobs)
-    print(mvstdtprob(-np.inf * np.ones(2), np.zeros(2), R[:2, :2], 2))
-    """
+    rvst = multivariate_t_rvs([10.,20.], S, 2, nobs)
+    print(np.sum((rvst<[10.,20.]).all(1),0) * 1. / nobs)
+    print(mvstdtprob(-np.inf*np.ones(2), np.zeros(2), R[:2,:2], 2))
+
+
+    '''
         > lower <- -1
         > upper <- 3
         > df <- 4
@@ -109,4 +168,4 @@ if __name__ == '__main__':
         > (pt(upper, df) - pt(lower, df))**3
         [1] 0.4988254

-    """
+    '''
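
The integrand funbgh2 (rescaled by bghfactor) implements the chi-mixture
representation of the multivariate t probability referenced above (Genz and Bretz).
In one dimension this reduces to an identity that is easy to verify
(a minimal sketch, scipy only):

    >>> import numpy as np
    >>> from scipy import stats, integrate
    >>> a, b, df = -1.0, 3.0, 4
    >>> def f(s):
    ...     return stats.chi.pdf(s, df) * (stats.norm.cdf(b * s / np.sqrt(df))
    ...                                    - stats.norm.cdf(a * s / np.sqrt(df)))
    >>> val, err = integrate.quad(f, 0, np.inf)
    >>> print(np.isclose(val, stats.t.cdf(b, df) - stats.t.cdf(a, df)))
    True
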
diff --git a/statsmodels/sandbox/distributions/mv_measures.py b/statsmodels/sandbox/distributions/mv_measures.py
index d9fb59dd2..7074da810 100644
--- a/statsmodels/sandbox/distributions/mv_measures.py
+++ b/statsmodels/sandbox/distributions/mv_measures.py
@@ -1,4 +1,4 @@
-"""using multivariate dependence and divergence measures
+'''using multivariate dependence and divergence measures

 The standard correlation coefficient measures only linear dependence between
 random variables.
@@ -18,29 +18,61 @@ Phys. Rev. E 76, 026209 (2007)
 http://pre.aps.org/abstract/PRE/v76/i2/e026209


-"""
+'''
+
 import numpy as np
 from scipy import stats
 from scipy.stats import gaussian_kde
+
 import statsmodels.sandbox.infotheo as infotheo


 def mutualinfo_kde(y, x, normed=True):
-    """mutual information of two random variables estimated with kde
-
-    """
-    pass
-
+    '''mutual information of two random variables estimated with kde
+
+    '''
+    nobs = len(x)
+    if not len(y) == nobs:
+        raise ValueError('both data arrays need to have the same size')
+    x = np.asarray(x, float)
+    y = np.asarray(y, float)
+    yx = np.vstack((y,x))
+    kde_x = gaussian_kde(x)(x)
+    kde_y = gaussian_kde(y)(y)
+    kde_yx = gaussian_kde(yx)(yx)
+
+    mi_obs = np.log(kde_yx) - np.log(kde_x) - np.log(kde_y)
+    mi = mi_obs.sum() / nobs
+    if normed:
+        mi_normed = np.sqrt(1. - np.exp(-2 * mi))
+        return mi_normed
+    else:
+        return mi

 def mutualinfo_kde_2sample(y, x, normed=True):
-    """mutual information of two random variables estimated with kde
-
-    """
-    pass
-
+    '''mutual information of two random variables estimated with kde
+
+    '''
+    nobs = len(x)
+    x = np.asarray(x, float)
+    y = np.asarray(y, float)
+    #yx = np.vstack((y,x))
+    kde_x = gaussian_kde(x.T)(x.T)
+    kde_y = gaussian_kde(y.T)(x.T)
+    #kde_yx = gaussian_kde(yx)(yx)
+
+    mi_obs = np.log(kde_x) - np.log(kde_y)
+    if len(mi_obs) != nobs:
+        raise ValueError("Wrong number of observations")
+    mi = mi_obs.mean()
+    if normed:
+        mi_normed = np.sqrt(1. - np.exp(-2 * mi))
+        return mi_normed
+    else:
+        return mi

 def mutualinfo_binned(y, x, bins, normed=True):
-    """mutual information of two random variables estimated with kde
+    '''mutual information of two random variables estimated from binned data



@@ -50,52 +82,112 @@ def mutualinfo_binned(y, x, bins, normed=True):
     are expected to be in each bin under the assumption of independence. This
     follows roughly the description in Kahn et al. 2007

-    """
-    pass
+    '''
+    nobs = len(x)
+    if not len(y) == nobs:
+        raise ValueError('both data arrays need to have the same size')
+    x = np.asarray(x, float)
+    y = np.asarray(y, float)
+    #yx = np.vstack((y,x))
+
+
+##    fyx, binsy, binsx = np.histogram2d(y, x, bins=bins)
+##    fx, binsx_ = np.histogram(x, bins=binsx)
+##    fy, binsy_ = np.histogram(y, bins=binsy)
+
+    if bins == 'auto':
+        ys = np.sort(y)
+        xs = np.sort(x)
+        #quantiles = np.array([0,0.25, 0.4, 0.6, 0.75, 1])
+        qbin_sqr = np.sqrt(5./nobs)
+        quantiles = np.linspace(0, 1, int(1. / qbin_sqr))  # num of points must be an integer
+        quantile_index = ((nobs-1)*quantiles).astype(int)
+        #move edges so that they do not coincide with an observation
+        shift = 1e-6 + np.ones(quantiles.shape)
+        shift[0] -= 2*1e-6
+        binsy = ys[quantile_index] + shift
+        binsx = xs[quantile_index] + shift
+
+    elif np.size(bins) == 1:
+        binsy = bins
+        binsx = bins
+    elif (len(bins) == 2):
+        binsy, binsx = bins
+##        if np.size(bins[0]) == 1:
+##            binsx = bins[0]
+##        if np.size(bins[1]) == 1:
+##            binsx = bins[1]
+
+    fx, binsx = np.histogram(x, bins=binsx)
+    fy, binsy = np.histogram(y, bins=binsy)
+    fyx, binsy, binsx = np.histogram2d(y, x, bins=(binsy, binsx))
+
+    pyx = fyx * 1. / nobs
+    px = fx * 1. / nobs
+    py = fy * 1. / nobs
+
+
+    mi_obs = pyx * (np.log(pyx+1e-10) - np.log(py)[:,None] - np.log(px))
+    mi = mi_obs.sum()
+
+    if normed:
+        mi_normed = np.sqrt(1. - np.exp(-2 * mi))
+        return mi_normed, (pyx, py, px, binsy, binsx), mi_obs
+    else:
+        return mi


 if __name__ == '__main__':
     import statsmodels.api as sm
+
     funtype = ['linear', 'quadratic'][1]
     nobs = 200
-    sig = 2
-    x = np.sort(3 * np.random.randn(nobs))
+    sig = 2#5.
+    #x = np.linspace(-3, 3, nobs) + np.random.randn(nobs)
+    x = np.sort(3*np.random.randn(nobs))
     exog = sm.add_constant(x, prepend=True)
+    #y = 0 + np.log(1+x**2) + sig * np.random.randn(nobs)
     if funtype == 'quadratic':
-        y = 0 + x ** 2 + sig * np.random.randn(nobs)
+        y = 0 + x**2 + sig * np.random.randn(nobs)
     if funtype == 'linear':
         y = 0 + x + sig * np.random.randn(nobs)
+
     print('correlation')
-    print(np.corrcoef(y, x)[0, 1])
-    print('pearsonr', stats.pearsonr(y, x))
-    print('spearmanr', stats.spearmanr(y, x))
-    print('kendalltau', stats.kendalltau(y, x))
-    pxy, binsx, binsy = np.histogram2d(x, y, bins=5)
+    print(np.corrcoef(y,x)[0, 1])
+    print('pearsonr', stats.pearsonr(y,x))
+    print('spearmanr', stats.spearmanr(y,x))
+    print('kendalltau', stats.kendalltau(y,x))
+
+    pxy, binsx, binsy = np.histogram2d(x,y, bins=5)
     px, binsx_ = np.histogram(x, bins=binsx)
     py, binsy_ = np.histogram(y, bins=binsy)
-    print('mutualinfo', infotheo.mutualinfo(px * 1.0 / nobs, py * 1.0 /
-        nobs, 1e-15 + pxy * 1.0 / nobs, logbase=np.e))
-    print('mutualinfo_kde normed', mutualinfo_kde(y, x))
-    print('mutualinfo_kde       ', mutualinfo_kde(y, x, normed=False))
-    mi_normed, (pyx2, py2, px2, binsy2, binsx2), mi_obs = mutualinfo_binned(y,
-        x, 5, normed=True)
+    print('mutualinfo', infotheo.mutualinfo(px*1./nobs, py*1./nobs,
+                                            1e-15+pxy*1./nobs, logbase=np.e))
+
+    print('mutualinfo_kde normed', mutualinfo_kde(y,x))
+    print('mutualinfo_kde       ', mutualinfo_kde(y,x, normed=False))
+    mi_normed, (pyx2, py2, px2, binsy2, binsx2), mi_obs = \
+               mutualinfo_binned(y, x, 5, normed=True)
     print('mutualinfo_binned normed', mi_normed)
     print('mutualinfo_binned       ', mi_obs.sum())
-    mi_normed, (pyx2, py2, px2, binsy2, binsx2), mi_obs = mutualinfo_binned(y,
-        x, 'auto', normed=True)
+
+    mi_normed, (pyx2, py2, px2, binsy2, binsx2), mi_obs = \
+               mutualinfo_binned(y, x, 'auto', normed=True)
     print('auto')
     print('mutualinfo_binned normed', mi_normed)
     print('mutualinfo_binned       ', mi_obs.sum())
+
     ys = np.sort(y)
     xs = np.sort(x)
-    by = ys[((nobs - 1) * np.array([0, 0.25, 0.4, 0.6, 0.75, 1])).astype(int)]
-    bx = xs[((nobs - 1) * np.array([0, 0.25, 0.4, 0.6, 0.75, 1])).astype(int)]
-    mi_normed, (pyx2, py2, px2, binsy2, binsx2), mi_obs = mutualinfo_binned(y,
-        x, (by, bx), normed=True)
+    by = ys[((nobs-1)*np.array([0, 0.25, 0.4, 0.6, 0.75, 1])).astype(int)]
+    bx = xs[((nobs-1)*np.array([0, 0.25, 0.4, 0.6, 0.75, 1])).astype(int)]
+    mi_normed, (pyx2, py2, px2, binsy2, binsx2), mi_obs = \
+               mutualinfo_binned(y, x, (by,bx), normed=True)
     print('quantiles')
     print('mutualinfo_binned normed', mi_normed)
     print('mutualinfo_binned       ', mi_obs.sum())
-    doplot = 1
+
+    doplot = 1#False
     if doplot:
         import matplotlib.pyplot as plt
         plt.plot(x, y, 'o')
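
The normalization used throughout mv_measures, mi_normed = sqrt(1 - exp(-2*mi)),
maps mutual information back to a correlation-like scale: for a bivariate normal
with correlation rho, mi = -0.5*log(1 - rho**2), so the normed value recovers
abs(rho) (a small check with an assumed rho):

    >>> import numpy as np
    >>> rho = 0.7
    >>> mi = -0.5 * np.log(1 - rho**2)
    >>> print(np.round(np.sqrt(1 - np.exp(-2 * mi)), 6))
    0.7
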
diff --git a/statsmodels/sandbox/distributions/mv_normal.py b/statsmodels/sandbox/distributions/mv_normal.py
index 165a4e505..f3e3650ec 100644
--- a/statsmodels/sandbox/distributions/mv_normal.py
+++ b/statsmodels/sandbox/distributions/mv_normal.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Multivariate Normal and t distributions


@@ -145,12 +146,13 @@ What's currently there?
 """
 import numpy as np
 from scipy import special
+
 from statsmodels.sandbox.distributions.multivariate import mvstdtprob
 from .extras import mvnormcdf


 def expect_mc(dist, func=lambda x: 1, size=50000):
-    """calculate expected value of function by Monte Carlo integration
+    '''calculate expected value of function by Monte Carlo integration

     Parameters
     ----------
@@ -194,13 +196,15 @@ def expect_mc(dist, func=lambda x: 1, size=50000):
     array([ 0.09937,  0.10075])


-    """
-    pass
-
+    '''
+    def fun(x):
+        return func(x) # * dist.pdf(x)
+    rvs = dist.rvs(size=size)
+    return fun(rvs).mean(0)

-def expect_mc_bounds(dist, func=lambda x: 1, size=50000, lower=None, upper=
-    None, conditional=False, overfact=1.2):
-    """calculate expected value of function by Monte Carlo integration
+def expect_mc_bounds(dist, func=lambda x: 1, size=50000, lower=None, upper=None,
+                     conditional=False, overfact=1.2):
+    '''calculate expected value of function by Monte Carlo integration

     Parameters
     ----------
@@ -256,8 +260,46 @@ def expect_mc_bounds(dist, func=lambda x: 1, size=50000, lower=None, upper=
     [0.0, 1.0, 0.0, 3.0]


-    """
-    pass
+    '''
+    #call rvs once to find length of random vector
+    rvsdim = dist.rvs(size=1).shape[-1]
+    if lower is None:
+        lower = -np.inf * np.ones(rvsdim)
+    else:
+        lower = np.asarray(lower)
+    if upper is None:
+        upper = np.inf * np.ones(rvsdim)
+    else:
+        upper = np.asarray(upper)
+
+    def fun(x):
+        return func(x) # * dist.pdf(x)
+
+    rvsli = []
+    used = 0 #remain = size  #inplace changes size
+    total = 0
+    while True:
+        remain = size - used  #just a temp variable
+        rvs = dist.rvs(size=int(remain * overfact))
+        total += int(size * overfact)
+
+        rvsok = rvs[((rvs >= lower) & (rvs <= upper)).all(-1)]
+        #if rvsok.ndim == 1: #possible shape problems if only 1 random vector
+        rvsok = np.atleast_2d(rvsok)
+        used += rvsok.shape[0]
+
+        rvsli.append(rvsok)   #[:remain]) use extras instead
+        print(used)
+        if used >= size:
+            break
+    rvs = np.vstack(rvsli)
+    print(rvs.shape)
+    assert used == rvs.shape[0]  # safety check
+    mean_conditional = fun(rvs).mean(0)
+    if conditional:
+        return mean_conditional
+    else:
+        return mean_conditional * (used * 1. / total)


 def bivariate_normal(x, mu, cov):
@@ -268,19 +310,55 @@ def bivariate_normal(x, mu, cov):
     <http://mathworld.wolfram.com/BivariateNormalDistribution.html>`_
     at mathworld.
     """
-    pass
+    X, Y = np.transpose(x)
+    mux, muy = mu
+    sigmax, sigmaxy, tmp, sigmay = np.ravel(cov)
+    sigmax, sigmay = np.sqrt(sigmax), np.sqrt(sigmay)
+    Xmu = X-mux
+    Ymu = Y-muy
+
+    rho = sigmaxy/(sigmax*sigmay)
+    z = Xmu**2/sigmax**2 + Ymu**2/sigmay**2 - 2*rho*Xmu*Ymu/(sigmax*sigmay)
+    denom = 2*np.pi*sigmax*sigmay*np.sqrt(1-rho**2)
+    return np.exp( -z/(2*(1-rho**2))) / denom
+


 class BivariateNormal:

+
+    #TODO: make integration limits more flexible
+    #      or normalize before integration
+
     def __init__(self, mean, cov):
         self.mean = mu
         self.cov = cov
         self.sigmax, self.sigmaxy, tmp, self.sigmay = np.ravel(cov)
         self.nvars = 2

+    def rvs(self, size=1):
+        return np.random.multivariate_normal(self.mean, self.cov, size=size)
+
+    def pdf(self, x):
+        return bivariate_normal(x, self.mean, self.cov)
+
+    def logpdf(self, x):
+        #TODO: replace this
+        return np.log(self.pdf(x))
+
+    def cdf(self, x):
+        return self.expect(upper=x)
+
+    def expect(self, func=lambda x: 1, lower=(-10,-10), upper=(10,10)):
+        def fun(x, y):
+            x = np.column_stack((x,y))
+            return func(x) * self.pdf(x)
+        from scipy.integrate import dblquad
+        return dblquad(fun, lower[0], upper[0], lambda y: lower[1],
+                       lambda y: upper[1])
+
     def kl(self, other):
-        """Kullback-Leibler divergence between this and another distribution
+        '''Kullback-Leibler divergence between this and another distribution

         int f(x) (log f(x) - log g(x)) dx

@@ -290,20 +368,27 @@ class BivariateNormal:

         limits currently hardcoded

-        """
-        pass
+        '''
+        fun = lambda x : self.logpdf(x) - other.logpdf(x)
+        return self.expect(fun)

+    def kl_mc(self, other, size=500000):
+        fun = lambda x : self.logpdf(x) - other.logpdf(x)
+        rvs = self.rvs(size=size)
+        return fun(rvs).mean()

 class MVElliptical:
-    """Base Class for multivariate elliptical distributions, normal and t
+    '''Base Class for multivariate elliptical distributions, normal and t

     contains common initialization, and some common methods
     subclass needs to implement at least rvs and logpdf methods

-    """
+    '''
+    #getting common things between normal and t distribution
+

     def __init__(self, mean, sigma, *args, **kwds):
-        """initialize instance
+        '''initialize instance

         Parameters
         ----------
@@ -318,29 +403,39 @@ class MVElliptical:
         kwds : dict
             currently not used

-        """
+        '''
+
         self.extra_args = []
         self.mean = np.asarray(mean)
         self.sigma = sigma = np.asarray(sigma)
         sigma = np.squeeze(sigma)
         self.nvars = nvars = len(mean)
+        #self.covchol = np.linalg.cholesky(sigma)
+
+
+        #in the following sigma is original, self.sigma is full matrix
         if sigma.shape == ():
+            #iid
             self.sigma = np.eye(nvars) * sigma
             self.sigmainv = np.eye(nvars) / sigma
             self.cholsigmainv = np.eye(nvars) / np.sqrt(sigma)
-        elif sigma.ndim == 1 and len(sigma) == nvars:
+        elif (sigma.ndim == 1) and (len(sigma) == nvars):
+            #independent heteroskedastic
             self.sigma = np.diag(sigma)
-            self.sigmainv = np.diag(1.0 / sigma)
-            self.cholsigmainv = np.diag(1.0 / np.sqrt(sigma))
-        elif sigma.shape == (nvars, nvars):
+            self.sigmainv = np.diag(1. / sigma)
+            self.cholsigmainv = np.diag( 1. / np.sqrt(sigma))
+        elif sigma.shape == (nvars, nvars): #python tuple comparison
+            #general
             self.sigmainv = np.linalg.pinv(sigma)
             self.cholsigmainv = np.linalg.cholesky(self.sigmainv).T
         else:
             raise ValueError('sigma has invalid shape')
+
+        #store logdetsigma for logpdf
         self.logdetsigma = np.log(np.linalg.det(self.sigma))

     def rvs(self, size=1):
-        """random variable
+        '''random variable

         Parameters
         ----------
@@ -355,11 +450,11 @@ class MVElliptical:
             dimension


-        """
-        pass
+        '''
+        raise NotImplementedError

     def logpdf(self, x):
-        """logarithm of probability density function
+        '''logarithm of probability density function

         Parameters
         ----------
@@ -377,11 +472,13 @@ class MVElliptical:
         with multivariate normal vector in each row and iid across rows
         does not work now because of dot in whiten

-        """
-        pass
+        '''
+
+
+        raise NotImplementedError

     def cdf(self, x, **kwds):
-        """cumulative distribution function
+        '''cumulative distribution function

         Parameters
         ----------
@@ -396,13 +493,15 @@ class MVElliptical:
         cdf : float or array
             probability density value of each random vector

-        """
-        pass
+        '''
+        raise NotImplementedError
+

     def affine_transformed(self, shift, scale_matrix):
-        """affine transformation define in subclass because of distribution
-        specific restrictions"""
-        pass
+        '''affine transformation define in subclass because of distribution
+        specific restrictions'''
+        #implemented in subclass at least for now
+        raise NotImplementedError

     def whiten(self, x):
         """
@@ -427,10 +526,11 @@ class MVElliptical:
         --------
         standardize : subtract mean and rescale to standardized random variable.
         """
-        pass
+        x = np.asarray(x)
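+        # cholsigmainv satisfies cholsigmainv @ sigma @ cholsigmainv.T = identity,
+        # so whitened data have (approximately) unit covariance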
+        return np.dot(x, self.cholsigmainv.T)

     def pdf(self, x):
-        """probability density function
+        '''probability density function

         Parameters
         ----------
@@ -443,11 +543,11 @@ class MVElliptical:
         pdf : float or array
             probability density value of each random vector

-        """
-        pass
+        '''
+        return np.exp(self.logpdf(x))

     def standardize(self, x):
-        """standardize the random variable, i.e. subtract mean and whiten
+        '''standardize the random variable, i.e. subtract mean and whiten

         Parameters
         ----------
@@ -468,16 +568,17 @@ class MVElliptical:
         whiten : rescale random variable, standardize without subtracting mean.


-        """
-        pass
+        '''
+        return self.whiten(x - self.mean)

     def standardized(self):
-        """return new standardized MVNormal instance
-        """
-        pass
+        '''return new standardized MVNormal instance
+        '''
+        return self.affine_transformed(-self.mean, self.cholsigmainv)
+

     def normalize(self, x):
-        """normalize the random variable, i.e. subtract mean and rescale
+        '''normalize the random variable, i.e. subtract mean and rescale

         The distribution will have zero mean and sigma equal to correlation

@@ -500,46 +601,63 @@ class MVElliptical:
         whiten : rescale random variable, standardize without subtracting mean.


-        """
-        pass
+        '''
+        std_ = np.atleast_2d(self.std_sigma)
+        return (x - self.mean)/std_ #/std_.T

     def normalized(self, demeaned=True):
-        """return a normalized distribution where sigma=corr
+        '''return a normalized distribution where sigma=corr

         if demeaned is True, then mean will be set to zero

-        """
-        pass
+        '''
+        if demeaned:
+            mean_new = np.zeros_like(self.mean)
+        else:
+            mean_new = self.mean / self.std_sigma
+        sigma_new = self.corr
+        args = [getattr(self, ea) for ea in self.extra_args]
+        return self.__class__(mean_new, sigma_new, *args)

     def normalized2(self, demeaned=True):
-        """return a normalized distribution where sigma=corr
+        '''return a normalized distribution where sigma=corr



         second implementation for testing affine transformation
-        """
-        pass
+        '''
+        if demeaned:
+            shift = -self.mean
+        else:
+            shift = self.mean * (1. / self.std_sigma - 1.)
+        return self.affine_transformed(shift, np.diag(1. / self.std_sigma))
+        #the following "standardizes" cov instead
+        #return self.affine_transformed(shift, self.cholsigmainv)
+
+

     @property
     def std(self):
-        """standard deviation, square root of diagonal elements of cov
-        """
-        pass
+        '''standard deviation, square root of diagonal elements of cov
+        '''
+        return np.sqrt(np.diag(self.cov))

     @property
     def std_sigma(self):
-        """standard deviation, square root of diagonal elements of sigma
-        """
-        pass
+        '''standard deviation, square root of diagonal elements of sigma
+        '''
+        return np.sqrt(np.diag(self.sigma))
+

     @property
     def corr(self):
-        """correlation matrix"""
-        pass
+        '''correlation matrix'''
+        return self.cov / np.outer(self.std, self.std)
+
     expect_mc = expect_mc

     def marginal(self, indices):
-        """return marginal distribution for variables given by indices
+        '''return marginal distribution for variables given by indices

         this should be correct for normal and t distribution

@@ -555,12 +673,17 @@ class MVElliptical:
             contains the marginal distribution of the variables given in
             indices

-        """
-        pass
+        '''
+        indices = np.asarray(indices)
+        mean_new = self.mean[indices]
+        sigma_new = self.sigma[indices[:,None], indices]
+        args = [getattr(self, ea) for ea in self.extra_args]
+        return self.__class__(mean_new, sigma_new, *args)


+#parts taken from linear_model, but heavy adjustments
 class MVNormal0:
-    """Class for Multivariate Normal Distribution
+    '''Class for Multivariate Normal Distribution

     original full version, kept for testing, new version inherits from
     MVElliptical
@@ -568,26 +691,35 @@ class MVNormal0:
     uses Cholesky decomposition of covariance matrix for the transformation
     of the data

-    """
+    '''
+

     def __init__(self, mean, cov):
         self.mean = mean
         self.cov = cov = np.asarray(cov)
         cov = np.squeeze(cov)
         self.nvars = nvars = len(mean)
+
+
+        #in the following cov is original, self.cov is full matrix
         if cov.shape == ():
+            #iid
             self.cov = np.eye(nvars) * cov
             self.covinv = np.eye(nvars) / cov
             self.cholcovinv = np.eye(nvars) / np.sqrt(cov)
-        elif cov.ndim == 1 and len(cov) == nvars:
+        elif (cov.ndim == 1) and (len(cov) == nvars):
+            #independent heteroskedastic
             self.cov = np.diag(cov)
-            self.covinv = np.diag(1.0 / cov)
-            self.cholcovinv = np.diag(1.0 / np.sqrt(cov))
-        elif cov.shape == (nvars, nvars):
+            self.covinv = np.diag(1. / cov)
+            self.cholcovinv = np.diag( 1. / np.sqrt(cov))
+        elif cov.shape == (nvars, nvars): #python tuple comparison
+            #general
             self.covinv = np.linalg.pinv(cov)
             self.cholcovinv = np.linalg.cholesky(self.covinv).T
         else:
             raise ValueError('cov has invalid shape')
+
+        #store logdetcov for logpdf
         self.logdetcov = np.log(np.linalg.det(self.cov))

     def whiten(self, x):
@@ -613,10 +745,15 @@ class MVNormal0:
         --------
         standardize : subtract mean and rescale to standardized random variable.
         """
-        pass
+        x = np.asarray(x)
+        if np.any(self.cov):
+            #return np.dot(self.cholcovinv, x)
+            return np.dot(x, self.cholcovinv.T)
+        else:
+            return x

     def rvs(self, size=1):
-        """random variable
+        '''random variable

         Parameters
         ----------
@@ -634,11 +771,11 @@ class MVNormal0:
         -----
         uses numpy.random.multivariate_normal directly

-        """
-        pass
+        '''
+        return np.random.multivariate_normal(self.mean, self.cov, size=size)

     def pdf(self, x):
-        """probability density function
+        '''probability density function

         Parameters
         ----------
@@ -651,11 +788,12 @@ class MVNormal0:
         pdf : float or array
             probability density value of each random vector

-        """
-        pass
+        '''
+
+        return np.exp(self.logpdf(x))

     def logpdf(self, x):
-        """logarithm of probability density function
+        '''logarithm of probability density function

         Parameters
         ----------
@@ -673,22 +811,31 @@ class MVNormal0:
         with multivariate normal vector in each row and iid across rows
         does not work now because of dot in whiten

-        """
-        pass
+        '''
+        x = np.asarray(x)
+        x_whitened = self.whiten(x - self.mean)
+        SSR = np.sum(x_whitened**2, -1)
+        llf = -SSR
+        llf -= self.nvars * np.log(2. * np.pi)
+        llf -= self.logdetcov
+        llf *= 0.5
+        return llf
+
     expect_mc = expect_mc


 class MVNormal(MVElliptical):
-    """Class for Multivariate Normal Distribution
+    '''Class for Multivariate Normal Distribution

     uses Cholesky decomposition of covariance matrix for the transformation
     of the data

-    """
+    '''
     __name__ == 'Multivariate Normal Distribution'

+
     def rvs(self, size=1):
-        """random variable
+        '''random variable

         Parameters
         ----------
@@ -706,11 +853,11 @@ class MVNormal(MVElliptical):
         -----
         uses numpy.random.multivariate_normal directly

-        """
-        pass
+        '''
+        return np.random.multivariate_normal(self.mean, self.sigma, size=size)

     def logpdf(self, x):
-        """logarithm of probability density function
+        '''logarithm of probability density function

         Parameters
         ----------
@@ -728,11 +875,18 @@ class MVNormal(MVElliptical):
         with multivariate normal vector in each row and iid across rows
         does not work now because of dot in whiten

-        """
-        pass
+        '''
+        x = np.asarray(x)
+        x_whitened = self.whiten(x - self.mean)
+        SSR = np.sum(x_whitened**2, -1)
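+        # log N(x; mean, sigma) = -0.5 * (SSR + nvars*log(2*pi) + log(det(sigma)))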
+        llf = -SSR
+        llf -= self.nvars * np.log(2. * np.pi)
+        llf -= self.logdetsigma
+        llf *= 0.5
+        return llf

     def cdf(self, x, **kwds):
-        """cumulative distribution function
+        '''cumulative distribution function

         Parameters
         ----------
@@ -747,16 +901,18 @@ class MVNormal(MVElliptical):
         cdf : float or array
             probability density value of each random vector

-        """
-        pass
+        '''
+        #lower = -np.inf * np.ones_like(x)
+        #return mvstdnormcdf(lower, self.standardize(x), self.corr, **kwds)
+        return mvnormcdf(x, self.mean, self.cov, **kwds)

     @property
     def cov(self):
-        """covariance matrix"""
-        pass
+        '''covariance matrix'''
+        return self.sigma

     def affine_transformed(self, shift, scale_matrix):
-        """return distribution of an affine transform
+        '''return distribution of an affine transform

         for full rank scale_matrix only

@@ -788,21 +944,24 @@ class MVNormal(MVElliptical):

         currently only tested because it's called by standardized

-        """
-        pass
+        '''
+        B = scale_matrix  #tmp variable
+        mean_new = np.dot(B, self.mean) + shift
+        sigma_new = np.dot(np.dot(B, self.sigma), B.T)
+        return MVNormal(mean_new, sigma_new)

     def conditional(self, indices, values):
-        """return conditional distribution
+        r'''return conditional distribution

         indices are the variables to keep, the complement is the conditioning
         set
         values are the values of the conditioning variables

-        \\bar{\\mu} = \\mu_1 + \\Sigma_{12} \\Sigma_{22}^{-1} \\left( a - \\mu_2 \\right)
+        \bar{\mu} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} \left( a - \mu_2 \right)

         and covariance matrix

-        \\overline{\\Sigma} = \\Sigma_{11} - \\Sigma_{12} \\Sigma_{22}^{-1} \\Sigma_{21}.T
+        \overline{\Sigma} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.T

         Parameters
         ----------
@@ -819,20 +978,38 @@ class MVNormal(MVElliptical):
              values of the excluded variables.


-        """
-        pass
+        '''
+        #indices need to be nd arrays for broadcasting
+        keep = np.asarray(indices)
+        given = np.asarray([i for i in range(self.nvars) if i not in keep])
+        sigmakk = self.sigma[keep[:, None], keep]
+        sigmagg = self.sigma[given[:, None], given]
+        sigmakg = self.sigma[keep[:, None], given]
+        sigmagk = self.sigma[given[:, None], keep]
+
+
+        sigma_new = sigmakk - np.dot(sigmakg, np.linalg.solve(sigmagg, sigmagk))
+        mean_new = self.mean[keep] +  \
+            np.dot(sigmakg, np.linalg.solve(sigmagg, values-self.mean[given]))
+
+#        #or
+#        sig = np.linalg.solve(sigmagg, sigmagk).T
+#        mean_new = self.mean[keep] + np.dot(sigmakg, values-self.mean[given])
+#        sigma_new = sigmakk - np.dot(sigmakg, sig)
+        return MVNormal(mean_new, sigma_new)


+#redefine some shortcuts
 np_log = np.log
 np_pi = np.pi
 sps_gamln = special.gammaln

-
 class MVT(MVElliptical):
+
     __name__ == 'Multivariate Student T Distribution'

     def __init__(self, mean, sigma, df):
-        """initialize instance
+        '''initialize instance

         Parameters
         ----------
@@ -847,13 +1024,13 @@ class MVT(MVElliptical):
         kwds : dict
             currently not used

-        """
+        '''
         super(MVT, self).__init__(mean, sigma)
-        self.extra_args = ['df']
+        self.extra_args = ['df']  #overwrites extra_args of super
         self.df = df

     def rvs(self, size=1):
-        """random variables with Student T distribution
+        '''random variables with Student T distribution

         Parameters
         ----------
@@ -875,11 +1052,13 @@ class MVT(MVElliptical):
         does this require df>2 ?


-        """
-        pass
+        '''
+        from .multivariate import multivariate_t_rvs
+        return multivariate_t_rvs(self.mean, self.sigma, df=self.df, n=size)
+

     def logpdf(self, x):
-        """logarithm of probability density function
+        '''logarithm of probability density function

         Parameters
         ----------
@@ -892,11 +1071,25 @@ class MVT(MVElliptical):
         logpdf : float or array
             probability density value of each random vector

-        """
-        pass
+        '''
+
+        x = np.asarray(x)
+
+        df = self.df
+        nvars = self.nvars
+
+        x_whitened = self.whiten(x - self.mean) #should be float
+
+        llf = - nvars * np_log(df * np_pi)
+        llf -= self.logdetsigma
+        llf -= (df + nvars) * np_log(1 + np.sum(x_whitened**2,-1) / df)
+        llf *= 0.5
+        llf += sps_gamln((df + nvars) / 2.) - sps_gamln(df / 2.)
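+        # for nvars=1, mean=0 and sigma=1 this reduces to scipy.stats.t.logpdf(x, df)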
+
+        return llf

     def cdf(self, x, **kwds):
-        """cumulative distribution function
+        '''cumulative distribution function

         Parameters
         ----------
@@ -911,21 +1104,29 @@ class MVT(MVElliptical):
         cdf : float or array
             probability density value of each random vector

-        """
-        pass
+        '''
+        lower = -np.inf * np.ones_like(x)
+        #std_sigma = np.sqrt(np.diag(self.sigma))
+        upper = (x - self.mean)/self.std_sigma
+        return mvstdtprob(lower, upper, self.corr, self.df, **kwds)
+        #mvstdtcdf does not exist yet
+        #return mvstdtcdf(lower, x, self.corr, df, **kwds)

     @property
     def cov(self):
-        """covariance matrix
+        '''covariance matrix

         The covariance matrix for the t distribution does not exist for df<=2,
         and is equal to sigma * df/(df-2) for df>2

-        """
-        pass
+        '''
+        if self.df <= 2:
+            return np.nan * np.ones_like(self.sigma)
+        else:
+            return self.df / (self.df - 2.) * self.sigma

     def affine_transformed(self, shift, scale_matrix):
-        """return distribution of a full rank affine transform
+        '''return distribution of a full rank affine transform

         for full rank scale_matrix only

@@ -959,64 +1160,115 @@ class MVT(MVElliptical):
         where a is shift,
         B is full rank scale matrix with same dimension as sigma

-        """
-        pass
+        '''
+        #full rank method could also be in elliptical and called with super
+        #after the rank check
+        B = scale_matrix  #tmp variable as shorthand
+        if not B.shape == (self.nvars, self.nvars):
+            if (np.linalg.eigvals(B) <= 0).any():
+                raise ValueError('affine transform has to be full rank')

+        mean_new = np.dot(B, self.mean) + shift
+        sigma_new = np.dot(np.dot(B, self.sigma), B.T)
+        return MVT(mean_new, sigma_new, self.df)
+
+
+def quad2d(func=lambda x: 1, lower=(-10,-10), upper=(10,10)):
+    def fun(x, y):
+        x = np.column_stack((x,y))
+        return func(x)
+    from scipy.integrate import dblquad
+    return dblquad(fun, lower[0], upper[0], lambda y: lower[1],
+                   lambda y: upper[1])

 if __name__ == '__main__':
+
     from numpy.testing import assert_almost_equal, assert_array_almost_equal
+
     examples = ['mvn']
-    mu = 0, 0
+
+    mu = (0,0)
     covx = np.array([[1.0, 0.5], [0.5, 1.0]])
-    mu3 = [-1, 0.0, 2.0]
-    cov3 = np.array([[1.0, 0.5, 0.75], [0.5, 1.5, 0.6], [0.75, 0.6, 2.0]])
+    mu3 = [-1, 0., 2.]
+    cov3 = np.array([[ 1.  ,  0.5 ,  0.75],
+                     [ 0.5 ,  1.5 ,  0.6 ],
+                     [ 0.75,  0.6 ,  2.  ]])
+
+
     if 'mvn' in examples:
         bvn = BivariateNormal(mu, covx)
         rvs = bvn.rvs(size=1000)
         print(rvs.mean(0))
         print(np.cov(rvs, rowvar=0))
         print(bvn.expect())
-        print(bvn.cdf([0, 0]))
+        print(bvn.cdf([0,0]))
         bvn1 = BivariateNormal(mu, np.eye(2))
-        bvn2 = BivariateNormal(mu, 4 * np.eye(2))
-        fun = lambda x: np.log(bvn1.pdf(x)) - np.log(bvn.pdf(x))
+        bvn2 = BivariateNormal(mu, 4*np.eye(2))
+        fun = lambda x : np.log(bvn1.pdf(x)) - np.log(bvn.pdf(x))
         print(bvn1.expect(fun))
         print(bvn1.kl(bvn2), bvn1.kl_mc(bvn2))
         print(bvn2.kl(bvn1), bvn2.kl_mc(bvn1))
         print(bvn1.kl(bvn), bvn1.kl_mc(bvn))
         mvn = MVNormal(mu, covx)
-        mvn.pdf([0, 0])
-        mvn.pdf(np.zeros((2, 2)))
-        cov3 = np.array([[1.0, 0.5, 0.75], [0.5, 1.5, 0.6], [0.75, 0.6, 2.0]])
-        mu3 = [-1, 0.0, 2.0]
+        mvn.pdf([0,0])
+        mvn.pdf(np.zeros((2,2)))
+        #np.dot(mvn.cholcovinv.T, mvn.cholcovinv) - mvn.covinv
+
+        cov3 = np.array([[ 1.  ,  0.5 ,  0.75],
+                         [ 0.5 ,  1.5 ,  0.6 ],
+                         [ 0.75,  0.6 ,  2.  ]])
+        mu3 = [-1, 0., 2.]
         mvn3 = MVNormal(mu3, cov3)
-        mvn3.pdf((0.0, 2.0, 3.0))
-        mvn3.logpdf((0.0, 2.0, 3.0))
+        mvn3.pdf((0., 2., 3.))
+        mvn3.logpdf((0., 2., 3.))
+        #comparisons with R mvtnorm::dmvnorm
+        #decimal=14
+#        mvn3.logpdf(cov3) - [-7.667977543898155, -6.917977543898155, -5.167977543898155]
+#        #decimal 18
+#        mvn3.pdf(cov3) - [0.000467562492721686, 0.000989829804859273, 0.005696077243833402]
+#        #cheating new mean, same cov
+#        mvn3.mean = np.array([0,0,0])
+#        #decimal= 16
+#        mvn3.pdf(cov3) - [0.02914269740502042, 0.02269635555984291, 0.01767593948287269]
+
+        #as asserts
         r_val = [-7.667977543898155, -6.917977543898155, -5.167977543898155]
-        assert_array_almost_equal(mvn3.logpdf(cov3), r_val, decimal=14)
-        r_val = [0.000467562492721686, 0.000989829804859273, 
-            0.005696077243833402]
-        assert_array_almost_equal(mvn3.pdf(cov3), r_val, decimal=17)
-        mvn3c = MVNormal(np.array([0, 0, 0]), cov3)
+        assert_array_almost_equal( mvn3.logpdf(cov3), r_val, decimal = 14)
+        #decimal 18
+        r_val = [0.000467562492721686, 0.000989829804859273, 0.005696077243833402]
+        assert_array_almost_equal( mvn3.pdf(cov3), r_val, decimal = 17)
+        #cheating new mean, same cov, too dangerous, got wrong instance in tests
+        #mvn3.mean = np.array([0,0,0])
+        mvn3c = MVNormal(np.array([0,0,0]), cov3)
         r_val = [0.02914269740502042, 0.02269635555984291, 0.01767593948287269]
-        assert_array_almost_equal(mvn3c.pdf(cov3), r_val, decimal=16)
-        mvn3b = MVNormal((0, 0, 0), 1)
-        fun = lambda x: np.log(mvn3.pdf(x)) - np.log(mvn3b.pdf(x))
+        assert_array_almost_equal( mvn3c.pdf(cov3), r_val, decimal = 16)
+
+        mvn3b = MVNormal((0,0,0), 1)
+        fun = lambda x : np.log(mvn3.pdf(x)) - np.log(mvn3b.pdf(x))
         print(mvn3.expect_mc(fun))
         print(mvn3.expect_mc(fun, size=200000))
-    mvt = MVT((0, 0), 1, 5)
-    assert_almost_equal(mvt.logpdf(np.array([0.0, 0.0])), -
-        1.837877066409345, decimal=15)
-    assert_almost_equal(mvt.pdf(np.array([0.0, 0.0])), 0.1591549430918953,
-        decimal=15)
-    mvt.logpdf(np.array([1.0, 1.0])) - -3.01552989458359
-    mvt1 = MVT((0, 0), 1, 1)
-    mvt1.logpdf(np.array([1.0, 1.0])) - -3.48579549941151
+
+
+    mvt = MVT((0,0), 1, 5)
+    assert_almost_equal(mvt.logpdf(np.array([0.,0.])), -1.837877066409345,
+                        decimal=15)
+    assert_almost_equal(mvt.pdf(np.array([0.,0.])), 0.1591549430918953,
+                        decimal=15)
+
+    mvt.logpdf(np.array([1.,1.]))-(-3.01552989458359)
+
+    mvt1 = MVT((0,0), 1, 1)
+    mvt1.logpdf(np.array([1.,1.]))-(-3.48579549941151) #decimal=16
+
     rvs = mvt.rvs(100000)
     assert_almost_equal(np.cov(rvs, rowvar=0), mvt.cov, decimal=1)
+
     mvt31 = MVT(mu3, cov3, 1)
-    assert_almost_equal(mvt31.pdf(cov3), [0.0007276818698165781, 
-        0.0009980625182293658, 0.0027661422056214652], decimal=18)
+    assert_almost_equal(mvt31.pdf(cov3),
+        [0.0007276818698165781, 0.0009980625182293658, 0.0027661422056214652],
+        decimal=18)
+
     mvt = MVT(mu3, cov3, 3)
-    assert_almost_equal(mvt.pdf(cov3), [0.00086377742424741, 
-        0.001277510788307594, 0.004156314279452241], decimal=17)
+    assert_almost_equal(mvt.pdf(cov3),
+        [0.000863777424247410, 0.001277510788307594, 0.004156314279452241],
+        decimal=17)
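
Aside (not part of the patch): a minimal usage sketch of the restored MVT
methods, assuming the class lives in statsmodels.sandbox.distributions.mv_normal
as in the released package; the values in the comments are approximate.

    import numpy as np
    from statsmodels.sandbox.distributions.mv_normal import MVT

    mvt = MVT(mean=[0.0, 0.0], sigma=np.eye(2), df=5)
    print(mvt.logpdf(np.array([0.0, 0.0])))   # approx -1.8379, as in the assert above
    print(mvt.cov)                            # df / (df - 2) * sigma = (5/3) * I for df=5
    rvs = mvt.rvs(size=1000)                  # draws via multivariate_t_rvs
    print(rvs.shape)                          # (1000, 2)
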
diff --git a/statsmodels/sandbox/distributions/otherdist.py b/statsmodels/sandbox/distributions/otherdist.py
index 27dbe863a..3bc09ed52 100644
--- a/statsmodels/sandbox/distributions/otherdist.py
+++ b/statsmodels/sandbox/distributions/otherdist.py
@@ -1,4 +1,4 @@
-"""Parametric Mixture Distributions
+'''Parametric Mixture Distributions

 Created on Sat Jun 04 2011

@@ -18,13 +18,14 @@ pdf of Tobit model (?) - truncation with clipping
 Question: Metaclasses and class factories for generating new distributions from
 existing distributions by transformation, mixing, compounding

-"""
+'''
+
+
 import numpy as np
 from scipy import stats

-
 class ParametricMixtureD:
-    """mixtures with a discrete distribution
+    '''mixtures with a discrete distribution

     The mixing distribution is a discrete distribution like scipy.stats.poisson.
     All distribution in the mixture of the same type and parametrized
@@ -41,11 +42,10 @@ class ParametricMixtureD:
     initialization looks fragile for all possible cases of lower and upper
     bounds of the distributions.

-    """
-
+    '''
     def __init__(self, mixing_dist, base_dist, bd_args_func, bd_kwds_func,
-        cutoff=0.001):
-        """create a mixture distribution
+                 cutoff=1e-3):
+        '''create a mixture distribution

         Parameters
         ----------
@@ -70,27 +70,63 @@ class ParametricMixtureD:
             draws that are outside the truncated range are clipped, that is
             assigned to the highest or lowest value in the truncated support.

-        """
+        '''
         self.mixing_dist = mixing_dist
         self.base_dist = base_dist
+        #self.bd_args = bd_args
         if not np.isneginf(mixing_dist.dist.a):
             lower = mixing_dist.dist.a
         else:
-            lower = mixing_dist.ppf(0.0001)
+            lower = mixing_dist.ppf(1e-4)
         if not np.isposinf(mixing_dist.dist.b):
             upper = mixing_dist.dist.b
         else:
-            upper = mixing_dist.isf(0.0001)
+            upper = mixing_dist.isf(1e-4)
         self.ma = lower
         self.mb = upper
-        mixing_support = np.arange(lower, upper + 1)
+        mixing_support = np.arange(lower, upper+1)
         self.mixing_probs = mixing_dist.pmf(mixing_support)
+
         self.bd_args = bd_args_func(mixing_support)
         self.bd_kwds = bd_kwds_func(mixing_support)

+    def rvs(self, size=1):
+        mrvs = self.mixing_dist.rvs(size)
+        #TODO: check strange cases ? this assumes consecutive integers
+        mrvs_idx = (np.clip(mrvs, self.ma, self.mb) - self.ma).astype(int)
+
+        bd_args = tuple(md[mrvs_idx] for md in self.bd_args)
+        bd_kwds = dict((k, self.bd_kwds[k][mrvs_idx]) for k in self.bd_kwds)
+        kwds = {'size':size}
+        kwds.update(bd_kwds)
+        rvs = self.base_dist.rvs(*self.bd_args, **kwds)
+        return rvs, mrvs_idx
+
+
+
+
+
+    def pdf(self, x):
+        x = np.asarray(x)
+        if np.size(x) > 1:
+            x = x[...,None] #[None, ...]
+        bd_probs = self.base_dist.pdf(x, *self.bd_args, **self.bd_kwds)
+        prob = (bd_probs * self.mixing_probs).sum(-1)
+        return prob, bd_probs
+
+    def cdf(self, x):
+        x = np.asarray(x)
+        if np.size(x) > 1:
+            x = x[...,None] #[None, ...]
+        bd_probs = self.base_dist.cdf(x, *self.bd_args, **self.bd_kwds)
+        prob = (bd_probs * self.mixing_probs).sum(-1)
+        return prob, bd_probs
+
+
+#try:

 class ClippedContinuous:
-    """clipped continuous distribution with a masspoint at clip_lower
+    '''clipped continuous distribution with a masspoint at clip_lower


     Notes
@@ -113,52 +149,159 @@ class ClippedContinuous:
     We could add a check whether the values are in a small neighborhood, but
     it would be expensive (need to search and check all values).

-    """
+    '''

     def __init__(self, base_dist, clip_lower):
         self.base_dist = base_dist
         self.clip_lower = clip_lower

     def _get_clip_lower(self, kwds):
-        """helper method to get clip_lower from kwds or attribute
+        '''helper method to get clip_lower from kwds or attribute
+
+        '''
+        if 'clip_lower' not in kwds:
+            clip_lower = self.clip_lower
+        else:
+            clip_lower = kwds.pop('clip_lower')
+        return clip_lower, kwds
+
+    def rvs(self, *args, **kwds):
+        clip_lower, kwds = self._get_clip_lower(kwds)
+        rvs_ = self.base_dist.rvs(*args, **kwds)
+        #same as numpy.clip ?
+        rvs_[rvs_ < clip_lower] = clip_lower
+        return rvs_
+
+
+
+    def pdf(self, x, *args, **kwds):
+        x = np.atleast_1d(x)
+        if 'clip_lower' not in kwds:
+            clip_lower = self.clip_lower
+        else:
+            #allow clip_lower to be a possible parameter
+            clip_lower = kwds.pop('clip_lower')
+        pdf_raw = np.atleast_1d(self.base_dist.pdf(x, *args, **kwds))
+        clip_mask = (x == self.clip_lower)
+        if np.any(clip_mask):
+            clip_prob = self.base_dist.cdf(clip_lower, *args, **kwds)
+            pdf_raw[clip_mask] = clip_prob
+
+        #the following will be handled by sub-classing rv_continuous
+        pdf_raw[x < clip_lower] = 0
+
+        return pdf_raw
+
+    def cdf(self, x, *args, **kwds):
+        if 'clip_lower' not in kwds:
+            clip_lower = self.clip_lower
+        else:
+            #allow clip_lower to be a possible parameter
+            clip_lower = kwds.pop('clip_lower')
+        cdf_raw = self.base_dist.cdf(x, *args, **kwds)
+
+        #not needed if equality test is used
+##        clip_mask = (x == self.clip_lower)
+##        if np.any(clip_mask):
+##            clip_prob = self.base_dist.cdf(clip_lower, *args, **kwds)
+##            pdf_raw[clip_mask] = clip_prob
+
+        #the following will be handled by sub-classing rv_continuous
+        #if self.a is defined
+        cdf_raw[x < clip_lower] = 0
+
+        return cdf_raw
+
+    def sf(self, x, *args, **kwds):
+        if 'clip_lower' not in kwds:
+            clip_lower = self.clip_lower
+        else:
+            #allow clip_lower to be a possible parameter
+            clip_lower = kwds.pop('clip_lower')
+
+        sf_raw = self.base_dist.sf(x, *args, **kwds)
+        sf_raw[x <= clip_lower] = 1
+
+        return sf_raw
+
+
+    def ppf(self, x, *args, **kwds):
+        raise NotImplementedError
+
+    def plot(self, x, *args, **kwds):
+
+        clip_lower, kwds = self._get_clip_lower(kwds)
+        mass = self.pdf(clip_lower, *args, **kwds)
+        xr = np.concatenate(([clip_lower+1e-6], x[x>clip_lower]))
+        import matplotlib.pyplot as plt
+        #x = np.linspace(-4, 4, 21)
+        #plt.figure()
+        plt.xlim(clip_lower-0.1, x.max())
+        #remove duplicate calculation
+        xpdf = self.pdf(x, *args, **kwds)
+        plt.ylim(0, max(mass, xpdf.max())*1.1)
+        plt.plot(xr, self.pdf(xr, *args, **kwds))
+        #plt.vline(clip_lower, self.pdf(clip_lower, *args, **kwds))
+        plt.stem([clip_lower], [mass],
+                 linefmt='b-', markerfmt='bo', basefmt='r-')
+        return
+

-        """
-        pass


 if __name__ == '__main__':
+
     doplots = 1
-    mdist = stats.poisson(2.0)
+
+    #*********** Poisson-Normal Mixture
+    mdist = stats.poisson(2.)
     bdist = stats.norm
     bd_args_fn = lambda x: ()
-    bd_kwds_fn = lambda x: {'loc': x, 'scale': 0.1 * np.ones_like(x)}
+    #bd_kwds_fn = lambda x: {'loc': np.atleast_2d(10./(1+x))}
+    bd_kwds_fn = lambda x: {'loc': x, 'scale': 0.1*np.ones_like(x)} #10./(1+x)}
+
+
     pd = ParametricMixtureD(mdist, bdist, bd_args_fn, bd_kwds_fn)
     print(pd.pdf(1))
-    p, bp = pd.pdf(np.linspace(0, 20, 21))
-    pc, bpc = pd.cdf(np.linspace(0, 20, 21))
+    p, bp = pd.pdf(np.linspace(0,20,21))
+    pc, bpc = pd.cdf(np.linspace(0,20,21))
     print(pd.rvs())
     rvs, m = pd.rvs(size=1000)
+
+
     if doplots:
         import matplotlib.pyplot as plt
-        plt.hist(rvs, bins=100)
+        plt.hist(rvs, bins = 100)
         plt.title('poisson mixture of normal distributions')
+
+    #********** clipped normal distribution (Tobit)
+
     bdist = stats.norm
-    clip_lower_ = 0.0
+    clip_lower_ = 0. #-0.5
     cnorm = ClippedContinuous(bdist, clip_lower_)
-    x = np.linspace(1e-08, 4, 11)
+    x = np.linspace(1e-8, 4, 11)
     print(cnorm.pdf(x))
     print(cnorm.cdf(x))
+
     if doplots:
+        #plt.figure()
+        #cnorm.plot(x)
         plt.figure()
-        cnorm.plot(x=np.linspace(-1, 4, 51), loc=0.5, scale=np.sqrt(2))
+        cnorm.plot(x = np.linspace(-1, 4, 51), loc=0.5, scale=np.sqrt(2))
         plt.title('clipped normal distribution')
+
         fig = plt.figure()
-        for i, loc in enumerate([0.0, 0.5, 1.0, 2.0]):
-            fig.add_subplot(2, 2, i + 1)
-            cnorm.plot(x=np.linspace(-1, 4, 51), loc=loc, scale=np.sqrt(2))
+        for i, loc in enumerate([0., 0.5, 1.,2.]):
+            fig.add_subplot(2,2,i+1)
+            cnorm.plot(x = np.linspace(-1, 4, 51), loc=loc, scale=np.sqrt(2))
             plt.title('clipped normal, loc = %3.2f' % loc)
+
+
         loc = 1.5
         rvs = cnorm.rvs(loc=loc, size=2000)
         plt.figure()
         plt.hist(rvs, bins=50)
         plt.title('clipped normal rvs, loc = %3.2f' % loc)
+
+
+    #plt.show()
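
Aside (not part of the patch): a minimal sketch of ClippedContinuous mirroring
the clipped-normal (Tobit-style) example in the __main__ block above; the
import path is assumed to be statsmodels.sandbox.distributions.otherdist.

    import numpy as np
    from scipy import stats
    from statsmodels.sandbox.distributions.otherdist import ClippedContinuous

    cnorm = ClippedContinuous(stats.norm, clip_lower=0.0)
    x = np.linspace(1e-8, 4, 5)
    print(cnorm.pdf(x))                  # density values; x == clip_lower would get the mass point P(X <= 0)
    print(cnorm.cdf(x))                  # cdf of the clipped distribution
    rvs = cnorm.rvs(loc=1.5, size=10)    # draws clipped from below at 0
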
diff --git a/statsmodels/sandbox/distributions/quantize.py b/statsmodels/sandbox/distributions/quantize.py
index e25950c55..1d1c06a45 100644
--- a/statsmodels/sandbox/distributions/quantize.py
+++ b/statsmodels/sandbox/distributions/quantize.py
@@ -1,13 +1,12 @@
-"""Quantizing a continuous distribution in 2d
+'''Quantizing a continuous distribution in 2d

 Author: josef-pktd
-"""
+'''
 from statsmodels.compat.python import lmap
 import numpy as np

-
 def prob_bv_rectangle(lower, upper, cdf):
-    """helper function for probability of a rectangle in a bivariate distribution
+    '''helper function for probability of a rectangle in a bivariate distribution

     Parameters
     ----------
@@ -20,12 +19,15 @@ def prob_bv_rectangle(lower, upper, cdf):


     how does this generalize to more than 2 variates ?
-    """
-    pass
-
+    '''
+    probuu = cdf(*upper)
+    probul = cdf(upper[0], lower[1])
+    problu = cdf(lower[0], upper[1])
+    probll = cdf(*lower)
+    return probuu - probul - problu + probll

 def prob_mv_grid(bins, cdf, axis=-1):
-    """helper function for probability of a rectangle grid in a multivariate distribution
+    '''helper function for probability of a rectangle grid in a multivariate distribution

     how does this generalize to more than 2 variates ?

@@ -33,24 +35,58 @@ def prob_mv_grid(bins, cdf, axis=-1):
         tuple of bin edges, currently it is assumed that they broadcast
         correctly

-    """
-    pass
+    '''
+    if not isinstance(bins, np.ndarray):
+        bins = lmap(np.asarray, bins)
+        n_dim = len(bins)
+        bins_ = []
+        #broadcast if binedges are 1d
+        if all(lmap(np.ndim, bins) == np.ones(n_dim)):
+            for d in range(n_dim):
+                sl = [None]*n_dim
+                sl[d] = slice(None)
+                bins_.append(bins[d][sl])
+    else: #assume it is already correctly broadcasted
+        n_dim = bins.shape[0]
+        bins_ = bins
+
+    print(len(bins))
+    cdf_values = cdf(bins_)
+    probs = cdf_values.copy()
+    for d in range(n_dim):
+        probs = np.diff(probs, axis=d)
+
+    return probs


 def prob_quantize_cdf(binsx, binsy, cdf):
-    """quantize a continuous distribution given by a cdf
+    '''quantize a continuous distribution given by a cdf

     Parameters
     ----------
     binsx : array_like, 1d
         binedges

-    """
-    pass
-
+    '''
+    binsx = np.asarray(binsx)
+    binsy = np.asarray(binsy)
+    nx = len(binsx) - 1
+    ny = len(binsy) - 1
+    probs = np.nan * np.ones((nx, ny)) #np.empty(nx,ny)
+    cdf_values = cdf(binsx[:,None], binsy)
+    cdf_func = lambda x, y: cdf_values[x,y]
+    for xind in range(1, nx+1):
+        for yind in range(1, ny+1):
+            upper = (xind, yind)
+            lower = (xind-1, yind-1)
+            #print upper,lower,
+            probs[xind-1,yind-1] = prob_bv_rectangle(lower, upper, cdf_func)
+
+    assert not np.isnan(probs).any()
+    return probs

 def prob_quantize_cdf_old(binsx, binsy, cdf):
-    """quantize a continuous distribution given by a cdf
+    '''quantize a continuous distribution given by a cdf

     old version without precomputing cdf values

@@ -59,27 +95,47 @@ def prob_quantize_cdf_old(binsx, binsy, cdf):
     binsx : array_like, 1d
         binedges

-    """
-    pass
+    '''
+    binsx = np.asarray(binsx)
+    binsy = np.asarray(binsy)
+    nx = len(binsx) - 1
+    ny = len(binsy) - 1
+    probs = np.nan * np.ones((nx, ny)) #np.empty(nx,ny)
+    for xind in range(1, nx+1):
+        for yind in range(1, ny+1):
+            upper = (binsx[xind], binsy[yind])
+            lower = (binsx[xind-1], binsy[yind-1])
+            #print upper,lower,
+            probs[xind-1,yind-1] = prob_bv_rectangle(lower, upper, cdf)
+
+    assert not np.isnan(probs).any()
+    return probs
+
+


 if __name__ == '__main__':
     from numpy.testing import assert_almost_equal
-    unif_2d = lambda x, y: x * y
-    assert_almost_equal(prob_bv_rectangle([0, 0], [1, 0.5], unif_2d), 0.5, 14)
-    assert_almost_equal(prob_bv_rectangle([0, 0], [0.5, 0.5], unif_2d), 
-        0.25, 14)
-    arr1b = np.array([[0.05, 0.05, 0.05, 0.05], [0.05, 0.05, 0.05, 0.05], [
-        0.05, 0.05, 0.05, 0.05], [0.05, 0.05, 0.05, 0.05], [0.05, 0.05, 
-        0.05, 0.05]])
-    arr1a = prob_quantize_cdf(np.linspace(0, 1, 6), np.linspace(0, 1, 5),
-        unif_2d)
+    unif_2d = lambda x,y: x*y
+    assert_almost_equal(prob_bv_rectangle([0,0], [1,0.5], unif_2d), 0.5, 14)
+    assert_almost_equal(prob_bv_rectangle([0,0], [0.5,0.5], unif_2d), 0.25, 14)
+
+    arr1b = np.array([[ 0.05,  0.05,  0.05,  0.05],
+                       [ 0.05,  0.05,  0.05,  0.05],
+                       [ 0.05,  0.05,  0.05,  0.05],
+                       [ 0.05,  0.05,  0.05,  0.05],
+                       [ 0.05,  0.05,  0.05,  0.05]])
+
+    arr1a = prob_quantize_cdf(np.linspace(0,1,6), np.linspace(0,1,5), unif_2d)
     assert_almost_equal(arr1a, arr1b, 14)
-    arr2b = np.array([[0.25], [0.25], [0.25], [0.25]])
-    arr2a = prob_quantize_cdf(np.linspace(0, 1, 5), np.linspace(0, 1, 2),
-        unif_2d)
+
+    arr2b = np.array([[ 0.25],
+                      [ 0.25],
+                      [ 0.25],
+                      [ 0.25]])
+    arr2a = prob_quantize_cdf(np.linspace(0,1,5), np.linspace(0,1,2), unif_2d)
     assert_almost_equal(arr2a, arr2b, 14)
-    arr3b = np.array([[0.25, 0.25, 0.25, 0.25]])
-    arr3a = prob_quantize_cdf(np.linspace(0, 1, 2), np.linspace(0, 1, 5),
-        unif_2d)
+
+    arr3b = np.array([[ 0.25,  0.25,  0.25,  0.25]])
+    arr3a = prob_quantize_cdf(np.linspace(0,1,2), np.linspace(0,1,5), unif_2d)
     assert_almost_equal(arr3a, arr3b, 14)
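
Aside (not part of the patch): a small sketch of prob_quantize_cdf on two
independent standard normals; once the grid covers most of the mass, the cell
probabilities should sum to roughly one. The import path is assumed to match
the file above.

    import numpy as np
    from scipy import stats
    from statsmodels.sandbox.distributions.quantize import prob_quantize_cdf

    # cdf of two independent standard normals
    norm2d_cdf = lambda x, y: stats.norm.cdf(x) * stats.norm.cdf(y)
    bins = np.linspace(-4, 4, 9)
    probs = prob_quantize_cdf(bins, bins, norm2d_cdf)
    print(probs.shape)   # (8, 8)
    print(probs.sum())   # close to 1
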
diff --git a/statsmodels/sandbox/distributions/sppatch.py b/statsmodels/sandbox/distributions/sppatch.py
index 4b53f521b..cd62f4a50 100644
--- a/statsmodels/sandbox/distributions/sppatch.py
+++ b/statsmodels/sandbox/distributions/sppatch.py
@@ -1,4 +1,4 @@
-"""patching scipy to fit distributions and expect method
+'''patching scipy to fit distributions and expect method

 This adds new methods to estimate continuous distribution parameters with some
 fixed/frozen parameters. It also contains functions that calculate the expected
@@ -9,16 +9,25 @@ distribution fit, but these are neither general nor verified.

 Author: josef-pktd
 License: Simplified BSD
-"""
+'''
 from statsmodels.compat.python import lmap
 import numpy as np
 from scipy import stats, optimize, integrate
+
+
+########## patching scipy
+
+#vonmises does not define finite bounds, because it is intended for circular
+#support which does not define a proper pdf on the real line
+
 stats.distributions.vonmises.a = -np.pi
 stats.distributions.vonmises.b = np.pi

+#the next 3 functions are for fit with some fixed parameters
+#As they are written, they do not work as functions, only as methods

 def _fitstart(self, x):
-    """example method, method of moment estimator as starting values
+    '''example method, method of moment estimator as starting values

     Parameters
     ----------
@@ -38,12 +47,14 @@ def _fitstart(self, x):
     This example was written for the gamma distribution, but not verified
     with literature

-    """
-    pass
-
+    '''
+    loc = np.min([x.min(),0])
+    a = 4/stats.skew(x)**2
+    scale = np.std(x) / np.sqrt(a)
+    return (a, loc, scale)

 def _fitstart_beta(self, x, fixed=None):
-    """method of moment estimator as starting values for beta distribution
+    '''method of moment estimator as starting values for beta distribution

     Parameters
     ----------
@@ -71,12 +82,41 @@ def _fitstart_beta(self, x, fixed=None):
     NIST reference also includes reference to MLE in
     Johnson, Kotz, and Balakrishan, Volume II, pages 221-235

-    """
-    pass
-
+    '''
+    #todo: separate out this part to be used for other compact support distributions
+    #      e.g. rdist, vonmises, and truncnorm
+    #      but this might not work because it might still be distribution specific
+    a, b = x.min(), x.max()
+    eps = (a-b)*0.01
+    if fixed is None:
+        #this part not checked with books
+        loc = a - eps
+        scale = (a - b) * (1 + 2*eps)
+    else:
+        if np.isnan(fixed[-2]):
+            #estimate loc
+            loc = a - eps
+        else:
+            loc = fixed[-2]
+        if np.isnan(fixed[-1]):
+            #estimate scale
+            scale = (b + eps) - loc
+        else:
+            scale = fixed[-1]
+
+    #method of moment for known loc scale:
+    scale = float(scale)
+    xtrans = (x - loc)/scale
+    xm = xtrans.mean()
+    xv = xtrans.var()
+    tmp = (xm*(1-xm)/xv - 1)
+    p = xm * tmp
+    q = (1 - xm) * tmp
+
+    return (p, q, loc, scale)  #check return type and should fixed be returned ?

 def _fitstart_poisson(self, x, fixed=None):
-    """maximum likelihood estimator as starting values for Poisson distribution
+    '''maximum likelihood estimator as starting values for Poisson distribution

     Parameters
     ----------
@@ -101,12 +141,61 @@ def _fitstart_poisson(self, x, fixed=None):
     MLE :
     https://en.wikipedia.org/wiki/Poisson_distribution#Maximum_likelihood

-    """
-    pass
-
+    '''
+    #todo: separate out this part to be used for other compact support distributions
+    #      e.g. rdist, vonmises, and truncnorm
+    #      but this might not work because it might still be distribution specific
+    a = x.min()
+    eps = 0 # is this robust ?
+    if fixed is None:
+        #this part not checked with books
+        loc = a - eps
+    else:
+        if np.isnan(fixed[-1]):
+            #estimate loc
+            loc = a - eps
+        else:
+            loc = fixed[-1]
+
+    #MLE for standard (unshifted, if loc=0) Poisson distribution
+
+    xtrans = (x - loc)
+    lambd = xtrans.mean()
+    #second derivative d loglike/ dlambd Not used
+    #dlldlambd = 1/lambd # check
+
+    return (lambd, loc)  #check return type and should fixed be returned ?
+
+
+def nnlf_fr(self, thetash, x, frmask):
+    # new frozen version
+    # - sum (log pdf(x, theta),axis=0)
+    #   where theta are the parameters (including loc and scale)
+    #
+    try:
+        if frmask is not None:
+            theta = frmask.copy()
+            theta[np.isnan(frmask)] = thetash
+        else:
+            theta = thetash
+        loc = theta[-2]
+        scale = theta[-1]
+        args = tuple(theta[:-2])
+    except IndexError:
+        raise ValueError("Not enough input arguments.")
+    if not self._argcheck(*args) or scale <= 0:
+        return np.inf
+    x = np.array((x-loc) / scale)
+    cond0 = (x <= self.a) | (x >= self.b)
+    if (np.any(cond0)):
+        return np.inf
+    else:
+        N = len(x)
+        #raise ValueError
+        return self._nnlf(x, *args) + N*np.log(scale)

 def fit_fr(self, data, *args, **kwds):
-    """estimate distribution parameters by MLE taking some parameters as fixed
+    '''estimate distribution parameters by MLE taking some parameters as fixed

     Parameters
     ----------
@@ -162,13 +251,52 @@ def fit_fr(self, data, *args, **kwds):
     * check if docstring is correct
     * more input checking, args is list ? might also apply to current fit method

-    """
-    pass
+    '''
+    loc0, scale0 = lmap(kwds.get, ['loc', 'scale'],[0.0, 1.0])
+    Narg = len(args)
+
+    if Narg == 0 and hasattr(self, '_fitstart'):
+        x0 = self._fitstart(data)
+    elif Narg > self.numargs:
+        raise ValueError("Too many input arguments.")
+    else:
+        args += (1.0,)*(self.numargs-Narg)
+        # location and scale are at the end
+        x0 = args + (loc0, scale0)
+
+    if 'frozen' in kwds:
+        frmask = np.array(kwds['frozen'])
+        if len(frmask) != self.numargs+2:
+            raise ValueError("Incorrect number of frozen arguments.")
+        else:
+            # keep starting values for not frozen parameters
+            for n in range(len(frmask)):
+                # Troubleshooting ex_generic_mle_tdist
+                if isinstance(frmask[n], np.ndarray) and frmask[n].size == 1:
+                    frmask[n] = frmask[n].item()
+
+            # If there were array elements, then frmask will be object-dtype,
+            #  in which case np.isnan will raise TypeError
+            frmask = frmask.astype(np.float64)
+            x0  = np.array(x0)[np.isnan(frmask)]
+    else:
+        frmask = None
+
+    #print(x0
+    #print(frmask
+    return optimize.fmin(self.nnlf_fr, x0,
+                args=(np.ravel(data), frmask), disp=0)
+

+#The next two functions/methods calculate the expected value of an arbitrary
+#function; for the continuous case integrate.quad is used, which might
+#require continuity or smoothness of the function.

-def expect(self, fn=None, args=(), loc=0, scale=1, lb=None, ub=None,
-    conditional=False):
-    """calculate expected value of a function with respect to the distribution
+
+#TODO: add option for Monte Carlo integration
+
+def expect(self, fn=None, args=(), loc=0, scale=1, lb=None, ub=None, conditional=False):
+    '''calculate expected value of a function with respect to the distribution

     location and scale only tested on a few examples

@@ -196,13 +324,28 @@ def expect(self, fn=None, args=(), loc=0, scale=1, lb=None, ub=None,
     This function has not been checked for it's behavior when the integral is
     not finite. The integration behavior is inherited from scipy.integrate.quad.

-    """
-    pass
+    '''
+    if fn is None:
+        def fun(x, *args):
+            return x*self.pdf(x, loc=loc, scale=scale, *args)
+    else:
+        def fun(x, *args):
+            return fn(x)*self.pdf(x, loc=loc, scale=scale, *args)
+    if lb is None:
+        lb = loc + self.a * scale #(self.a - loc)/(1.0*scale)
+    if ub is None:
+        ub = loc + self.b * scale #(self.b - loc)/(1.0*scale)
+    if conditional:
+        invfac = (self.sf(lb, loc=loc, scale=scale, *args)
+                  - self.sf(ub, loc=loc, scale=scale, *args))
+    else:
+        invfac = 1.0
+    return integrate.quad(fun, lb, ub,
+                                args=args)[0]/invfac


-def expect_v2(self, fn=None, args=(), loc=0, scale=1, lb=None, ub=None,
-    conditional=False):
-    """calculate expected value of a function with respect to the distribution
+def expect_v2(self, fn=None, args=(), loc=0, scale=1, lb=None, ub=None, conditional=False):
+    '''calculate expected value of a function with respect to the distribution

     location and scale only tested on a few examples

@@ -242,13 +385,50 @@ def expect_v2(self, fn=None, args=(), loc=0, scale=1, lb=None, ub=None,
     for example if the distribution is very concentrated and the default limits
     are too large.

-    """
-    pass
+    '''
+    #changes: 20100809
+    #correction and refactoring how loc and scale are handled
+    #uses now _pdf
+    #needs more testing for distribution with bound support, e.g. genpareto
+
+    if fn is None:
+        def fun(x, *args):
+            return (loc + x*scale)*self._pdf(x, *args)
+    else:
+        def fun(x, *args):
+            return fn(loc + x*scale)*self._pdf(x, *args)
+    if lb is None:
+        #lb = self.a
+        try:
+            lb = self.ppf(1e-9, *args)  #1e-14 quad fails for pareto
+        except ValueError:
+            lb = self.a
+    else:
+        lb = max(self.a, (lb - loc)/(1.0*scale)) #transform to standardized
+    if ub is None:
+        #ub = self.b
+        try:
+            ub = self.ppf(1-1e-9, *args)
+        except ValueError:
+            ub = self.b
+    else:
+        ub = min(self.b, (ub - loc)/(1.0*scale))
+    if conditional:
+        invfac = self._sf(lb,*args) - self._sf(ub,*args)
+    else:
+        invfac = 1.0
+    return integrate.quad(fun, lb, ub,
+                                args=args, limit=500)[0]/invfac

+### for discrete distributions

+#TODO: check that for a distribution with finite support the calculations are
+#      done with one array summation (np.dot)
+
+#based on _drv2_moment(self, n, *args), but streamlined
 def expect_discrete(self, fn=None, args=(), loc=0, lb=None, ub=None,
-    conditional=False):
-    """calculate expected value of a function with respect to the distribution
+                    conditional=False):
+    '''calculate expected value of a function with respect to the distribution
     for discrete distribution

     Parameters
@@ -288,20 +468,86 @@ def expect_discrete(self, fn=None, args=(), loc=0, lb=None, ub=None,
         are evaluated)


-    """
-    pass
+    '''
+
+    #moment_tol = 1e-12 # increase compared to self.moment_tol,
+    # too slow for only small gain in precision for zipf
+
+    #avoid endless loop with unbound integral, eg. var of zipf(2)
+    maxcount = 1000
+    suppnmin = 100  #minimum number of points to evaluate (+ and -)
+
+    if fn is None:
+        def fun(x):
+            #loc and args from outer scope
+            return (x+loc)*self._pmf(x, *args)
+    else:
+        def fun(x):
+            #loc and args from outer scope
+            return fn(x+loc)*self._pmf(x, *args)
+    # used pmf because _pmf does not check support in randint
+    # and there might be problems(?) with correct self.a, self.b at this stage
+    # maybe not anymore, seems to work now with _pmf
+
+    self._argcheck(*args) # (re)generate scalar self.a and self.b
+    if lb is None:
+        lb = (self.a)
+    else:
+        lb = lb - loc

+    if ub is None:
+        ub = (self.b)
+    else:
+        ub = ub - loc
+    if conditional:
+        invfac = self.sf(lb,*args) - self.sf(ub+1,*args)
+    else:
+        invfac = 1.0
+
+    tot = 0.0
+    low, upp = self._ppf(0.001, *args), self._ppf(0.999, *args)
+    low = max(min(-suppnmin, low), lb)
+    upp = min(max(suppnmin, upp), ub)
+    supp = np.arange(low, upp+1, self.inc) #check limits
+    #print('low, upp', low, upp
+    tot = np.sum(fun(supp))
+    diff = 1e100
+    pos = upp + self.inc
+    count = 0
+
+    #handle cases with infinite support
+
+    while (pos <= ub) and (diff > self.moment_tol) and count <= maxcount:
+        diff = fun(pos)
+        tot += diff
+        pos += self.inc
+        count += 1
+
+    if self.a < 0: #handle case when self.a = -inf
+        diff = 1e100
+        pos = low - self.inc
+        while (pos >= lb) and (diff > self.moment_tol) and count <= maxcount:
+            diff = fun(pos)
+            tot += diff
+            pos -= self.inc
+            count += 1
+    if count > maxcount:
+        # replace with proper warning
+        print('sum did not converge')
+    return tot/invfac

 stats.distributions.rv_continuous.fit_fr = fit_fr
 stats.distributions.rv_continuous.nnlf_fr = nnlf_fr
 stats.distributions.rv_continuous.expect = expect
 stats.distributions.rv_discrete.expect = expect_discrete
-stats.distributions.beta_gen._fitstart = _fitstart_beta
-stats.distributions.poisson_gen._fitstart = _fitstart_poisson
+stats.distributions.beta_gen._fitstart = _fitstart_beta  #not tried out yet
+stats.distributions.poisson_gen._fitstart = _fitstart_poisson  #not tried out yet
+
+########## end patching scipy


 def distfitbootstrap(sample, distr, nrepl=100):
-    """run bootstrap for estimation of distribution parameters
+    '''run bootstrap for estimation of distribution parameters

     hard coded: only one shape parameter is allowed and estimated,
         loc=0 and scale=1 are fixed in the estimation
@@ -319,12 +565,17 @@ def distfitbootstrap(sample, distr, nrepl=100):
     res : array (nrepl,)
         parameter estimates for all bootstrap replications

-    """
-    pass
-
+    '''
+    nobs = len(sample)
+    res = np.zeros(nrepl)
+    for ii in range(nrepl):
+        rvsind = np.random.randint(nobs, size=nobs)
+        x = sample[rvsind]
+        res[ii] = distr.fit_fr(x, frozen=[np.nan, 0.0, 1.0])
+    return res

 def distfitmc(sample, distr, nrepl=100, distkwds={}):
-    """run Monte Carlo for estimation of distribution parameters
+    '''run Monte Carlo for estimation of distribution parameters

     hard coded: only one shape parameter is allowed and estimated,
         loc=0 and scale=1 are fixed in the estimation
@@ -342,12 +593,18 @@ def distfitmc(sample, distr, nrepl=100, distkwds={}):
     res : array (nrepl,)
         parameter estimates for all Monte Carlo replications

-    """
-    pass
+    '''
+    arg = distkwds.pop('arg')
+    nobs = len(sample)
+    res = np.zeros(nrepl)
+    for ii in range(nrepl):
+        x = distr.rvs(arg, size=nobs, **distkwds)
+        res[ii] = distr.fit_fr(x, frozen=[np.nan, 0.0, 1.0])
+    return res


 def printresults(sample, arg, bres, kind='bootstrap'):
-    """calculate and print(Bootstrap or Monte Carlo result
+    '''calculate and print Bootstrap or Monte Carlo results

     Parameters
     ----------
@@ -376,15 +633,43 @@ def printresults(sample, arg, bres, kind='bootstrap'):

     todo: return results and string instead of printing

-    """
-    pass
+    '''
+    print('true parameter value')
+    print(arg)
+    print('MLE estimate of parameters using sample (nobs=%d)'% (nobs))
+    argest = distr.fit_fr(sample, frozen=[np.nan, 0.0, 1.0])
+    print(argest)
+    if kind == 'bootstrap':
+        #bootstrap compares to estimate from sample
+        argorig = arg
+        arg = argest
+
+    print('%s distribution of parameter estimate (nrepl=%d)'% (kind, nrepl))
+    print('mean = %f, bias=%f' % (bres.mean(0), bres.mean(0)-arg))
+    print('median', np.median(bres, axis=0))
+    print('var and std', bres.var(0), np.sqrt(bres.var(0)))
+    bmse = ((bres - arg)**2).mean(0)
+    print('mse, rmse', bmse, np.sqrt(bmse))
+    bressorted = np.sort(bres)
+    print('%s confidence interval (90%% coverage)' % kind)
+    print(bressorted[np.floor(nrepl*0.05)], bressorted[np.floor(nrepl*0.95)])
+    print('%s confidence interval (90%% coverage) normal approximation' % kind)
+    print(stats.norm.ppf(0.05, loc=bres.mean(), scale=bres.std()),)
+    print(stats.norm.isf(0.05, loc=bres.mean(), scale=bres.std()))
+    print('Kolmogorov-Smirnov test for normality of %s distribution' % kind)
+    print(' - estimated parameters, p-values not really correct')
+    print(stats.kstest(bres, 'norm', (bres.mean(), bres.std())))


 if __name__ == '__main__':
+
     examplecases = ['largenumber', 'bootstrap', 'montecarlo'][:]
+
     if 'largenumber' in examplecases:
+
         print('\nDistribution: vonmises')
-        for nobs in [200]:
+
+        for nobs in [200]:#[20000, 1000, 100]:
             x = stats.vonmises.rvs(1.23, loc=0, scale=1, size=nobs)
             print('\nnobs:', nobs)
             print('true parameter')
@@ -394,10 +679,12 @@ if __name__ == '__main__':
             print(stats.vonmises.fit_fr(x, frozen=[np.nan, np.nan, np.nan]))
             print('with fixed loc and scale')
             print(stats.vonmises.fit_fr(x, frozen=[np.nan, 0.0, 1.0]))
+
         print('\nDistribution: gamma')
         distr = stats.gamma
-        arg, loc, scale = 2.5, 0.0, 20.0
-        for nobs in [200]:
+        arg, loc, scale = 2.5, 0., 20.
+
+        for nobs in [200]:#[20000, 1000, 100]:
             x = distr.rvs(arg, loc=loc, scale=scale, size=nobs)
             print('\nnobs:', nobs)
             print('true parameter')
@@ -409,25 +696,32 @@ if __name__ == '__main__':
             print(distr.fit_fr(x, frozen=[np.nan, 0.0, 1.0]))
             print('with fixed loc')
             print(distr.fit_fr(x, frozen=[np.nan, 0.0, np.nan]))
+
+
     ex = ['gamma', 'vonmises'][0]
+
     if ex == 'gamma':
         distr = stats.gamma
-        arg, loc, scale = 2.5, 0.0, 1
+        arg, loc, scale = 2.5, 0., 1
     elif ex == 'vonmises':
         distr = stats.vonmises
-        arg, loc, scale = 1.5, 0.0, 1
+        arg, loc, scale = 1.5, 0., 1
     else:
         raise ValueError('wrong example')
+
     nobs = 100
     nrepl = 1000
+
     sample = distr.rvs(arg, loc=loc, scale=scale, size=nobs)
+
     print('\nDistribution:', distr)
     if 'bootstrap' in examplecases:
         print('\nBootstrap')
-        bres = distfitbootstrap(sample, distr, nrepl=nrepl)
+        bres = distfitbootstrap(sample, distr, nrepl=nrepl )
         printresults(sample, arg, bres)
+
     if 'montecarlo' in examplecases:
         print('\nMonteCarlo')
-        mcres = distfitmc(sample, distr, nrepl=nrepl, distkwds=dict(arg=arg,
-            loc=loc, scale=scale))
+        mcres = distfitmc(sample, distr, nrepl=nrepl,
+                          distkwds=dict(arg=arg, loc=loc, scale=scale))
         printresults(sample, arg, mcres, kind='montecarlo')
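
Aside (not part of the patch): a minimal sketch of fit_fr with frozen loc and
scale; importing the module is assumed to monkey-patch scipy.stats as shown
above, and the data-generating values are purely illustrative.

    import numpy as np
    from scipy import stats
    import statsmodels.sandbox.distributions.sppatch  # noqa: F401  attaches fit_fr etc.

    x = stats.gamma.rvs(2.5, loc=0.0, scale=1.0, size=500)
    # estimate only the shape parameter; loc and scale stay fixed at 0 and 1
    shape_hat = stats.gamma.fit_fr(x, frozen=[np.nan, 0.0, 1.0])
    print(shape_hat)   # typically close to 2.5
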
diff --git a/statsmodels/sandbox/distributions/transform_functions.py b/statsmodels/sandbox/distributions/transform_functions.py
index afc4d5c66..616d2371e 100644
--- a/statsmodels/sandbox/distributions/transform_functions.py
+++ b/statsmodels/sandbox/distributions/transform_functions.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Nonlinear Transformation classes


@@ -15,31 +16,101 @@ class TransformFunction:
         self.func(x)


+
+## Hump and U-shaped functions
+
+
 class SquareFunc(TransformFunction):
-    """class to hold quadratic function with inverse function and derivative
+    '''class to hold quadratic function with inverse function and derivative

     using instance methods instead of class methods, if we want extension
     to parametrized function
-    """
+    '''
+
+    def func(self, x):
+        return np.power(x, 2.)
+
+    def inverseplus(self, x):
+        return np.sqrt(x)
+
+    def inverseminus(self, x):
+        return 0.0 - np.sqrt(x)
+
+    def derivplus(self, x):
+        return 0.5/np.sqrt(x)
+
+    def derivminus(self, x):
+        return 0.0 - 0.5/np.sqrt(x)
+
+


 class NegSquareFunc(TransformFunction):
-    """negative quadratic function
+    '''negative quadratic function

-    """
+    '''
+    def func(self, x):
+        return -np.power(x,2)
+
+    def inverseplus(self, x):
+        return np.sqrt(-x)
+
+    def inverseminus(self, x):
+        return 0.0 - np.sqrt(-x)
+
+    def derivplus(self, x):
+        return 0.0 - 0.5/np.sqrt(-x)
+
+    def derivminus(self, x):
+        return 0.5/np.sqrt(-x)


 class AbsFunc(TransformFunction):
-    """class for absolute value transformation
-    """
+    '''class for absolute value transformation
+    '''
+
+    def func(self, x):
+        return np.abs(x)
+
+    def inverseplus(self, x):
+        return x
+
+    def inverseminus(self, x):
+        return 0.0 - x
+
+    def derivplus(self, x):
+        return 1.0
+
+    def derivminus(self, x):
+        return 0.0 - 1.0
+
+
+## monotonic functions
+# more monotone functions in families.links, some for restricted domains


 class LogFunc(TransformFunction):
-    pass

+    def func(self, x):
+        return np.log(x)
+
+    def inverse(self, y):
+        return np.exp(y)
+
+    def deriv(self, x):
+        return 1./x

 class ExpFunc(TransformFunction):
-    pass
+
+
+    def func(self, x):
+        return np.exp(x)
+
+    def inverse(self, y):
+        return np.log(y)
+
+    def deriv(self, x):
+        return np.exp(x)


 class BoxCoxNonzeroFunc(TransformFunction):
@@ -47,6 +118,15 @@ class BoxCoxNonzeroFunc(TransformFunction):
     def __init__(self, lamda):
         self.lamda = lamda

+    def func(self, x):
+        return (np.power(x, self.lamda) - 1)/self.lamda
+
+    def inverse(self, y):
+        return (self.lamda * y + 1)/self.lamda
+
+    def deriv(self, x):
+        return np.power(x, self.lamda - 1)
+

 class AffineFunc(TransformFunction):

@@ -54,6 +134,15 @@ class AffineFunc(TransformFunction):
         self.constant = constant
         self.slope = slope

+    def func(self, x):
+        return self.constant + self.slope * x
+
+    def inverse(self, y):
+        return (y - self.constant) / self.slope
+
+    def deriv(self, x):
+        return self.slope
+

 class ChainFunc(TransformFunction):

@@ -61,6 +150,30 @@ class ChainFunc(TransformFunction):
         self.finn = finn
         self.fout = fout

+    def func(self, x):
+        return self.fout.func(self.finn.func(x))
+
+    def inverse(self, y):
+        return self.f1.inverse(self.fout.inverse(y))
+
+    def deriv(self, x):
+        z = self.finn.func(x)
+        return self.fout.deriv(z) * self.finn.deriv(x)
+
+
+#def inverse(x):
+#    return np.divide(1.0,x)
+#
+#mux, stdx = 0.05, 0.1
+#mux, stdx = 9.0, 1.0
+#def inversew(x):
+#    return 1.0/(1+mux+x*stdx)
+#def inversew_inv(x):
+#    return (1.0/x - 1.0 - mux)/stdx #.np.divide(1.0,x)-10
+#
+#def identit(x):
+#    return x
+

 if __name__ == '__main__':
     absf = AbsFunc()
@@ -68,7 +181,8 @@ if __name__ == '__main__':
     absf.func(-5) == 5
     absf.inverseplus(5) == 5
     absf.inverseminus(5) == -5
-    chainf = ChainFunc(AffineFunc(1, 2), BoxCoxNonzeroFunc(2))
-    print(chainf.func(3.0))
-    chainf2 = ChainFunc(BoxCoxNonzeroFunc(2), AffineFunc(1, 2))
-    print(chainf.func(3.0))
+
+    chainf = ChainFunc(AffineFunc(1,2), BoxCoxNonzeroFunc(2))
+    print(chainf.func(3.))
+    chainf2 = ChainFunc(BoxCoxNonzeroFunc(2), AffineFunc(1,2))
+    print(chainf.func(3.))
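
Aside (not part of the patch): a short sketch checking ChainFunc's chain-rule
derivative against a central difference; the import path is assumed to be
statsmodels.sandbox.distributions.transform_functions.

    from statsmodels.sandbox.distributions.transform_functions import (
        AffineFunc, BoxCoxNonzeroFunc, ChainFunc)

    # f(x) = ((1 + 2*x)**2 - 1) / 2, built as BoxCox(lamda=2) after an affine map
    chainf = ChainFunc(AffineFunc(1, 2), BoxCoxNonzeroFunc(2))
    x0, eps = 3.0, 1e-6
    numeric = (chainf.func(x0 + eps) - chainf.func(x0 - eps)) / (2 * eps)
    print(chainf.deriv(x0), numeric)   # both approx 2 * (1 + 2*x0) = 14
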
diff --git a/statsmodels/sandbox/distributions/transformed.py b/statsmodels/sandbox/distributions/transformed.py
index 45c28cee8..648954fe7 100644
--- a/statsmodels/sandbox/distributions/transformed.py
+++ b/statsmodels/sandbox/distributions/transformed.py
@@ -1,4 +1,6 @@
-""" A class for the distribution of a non-linear monotonic transformation of a continuous random variable
+## copied from nonlinear_transform_gen.py
+
+''' A class for the distribution of a non-linear monotonic transformation of a continuous random variable

 simplest usage:
 example: create log-gamma distribution, i.e. y = log(x),
@@ -33,56 +35,123 @@ Created on Tuesday, October 28, 2008, 12:40:37 PM
 Author: josef-pktd
 License: BSD

-"""
+'''
 from scipy import stats
 from scipy.stats import distributions
 import numpy as np


+def get_u_argskwargs(**kwargs):
+    # TODO: What's this? Wrong spacing; used in Transf_gen and TransfTwo_gen
+    u_kwargs = dict((k.replace('u_', '', 1), v) for k, v in kwargs.items()
+                    if k.startswith('u_'))
+    u_args = u_kwargs.pop('u_args', None)
+    return u_args, u_kwargs
+
+
 class Transf_gen(distributions.rv_continuous):
-    """a class for non-linear monotonic transformation of a continuous random variable
+    '''a class for non-linear monotonic transformation of a continuous random variable

-    """
+    '''

     def __init__(self, kls, func, funcinv, *args, **kwargs):
+        # print(args
+        # print(kwargs
+
         self.func = func
         self.funcinv = funcinv
+        # explicit for self.__dict__.update(kwargs)
+        # need to set numargs because inspection does not work
         self.numargs = kwargs.pop('numargs', 0)
+        # print(self.numargs
         name = kwargs.pop('name', 'transfdist')
-        longname = kwargs.pop('longname', 'Non-linear transformed distribution'
-            )
+        longname = kwargs.pop('longname', 'Non-linear transformed distribution')
         extradoc = kwargs.pop('extradoc', None)
         a = kwargs.pop('a', -np.inf)
         b = kwargs.pop('b', np.inf)
         self.decr = kwargs.pop('decr', False)
+        # defines whether it is a decreasing (True)
+        #       or increasing (False) monotonic transformation
+
         self.u_args, self.u_kwargs = get_u_argskwargs(**kwargs)
-        self.kls = kls
-        super(Transf_gen, self).__init__(a=a, b=b, name=name, shapes=kls.
-            shapes, longname=longname)
+        self.kls = kls  # (self.u_args, self.u_kwargs)
+        # possible to freeze the underlying distribution
+
+        super(Transf_gen, self).__init__(a=a, b=b, name=name,
+                                         shapes=kls.shapes,
+                                         longname=longname,
+                                         # extradoc = extradoc
+                                         )
+
+    def _cdf(self, x, *args, **kwargs):
+        # print(args
+        if not self.decr:
+            return self.kls._cdf(self.funcinv(x), *args, **kwargs)
+            # note scipy _cdf only take *args not *kwargs
+        else:
+            return 1.0 - self.kls._cdf(self.funcinv(x), *args, **kwargs)
+
+    def _ppf(self, q, *args, **kwargs):
+        if not self.decr:
+            return self.func(self.kls._ppf(q, *args, **kwargs))
+        else:
+            return self.func(self.kls._ppf(1 - q, *args, **kwargs))
+
+
+def inverse(x):
+    return np.divide(1.0, x)


 mux, stdx = 0.05, 0.1
 mux, stdx = 9.0, 1.0
-invdnormalg = Transf_gen(stats.norm, inversew, inversew_inv, decr=True,
-    numargs=0, name='discf', longname='normal-based discount factor')
-lognormalg = Transf_gen(stats.norm, np.exp, np.log, numargs=2, a=0, name=
-    'lnnorm', longname='Exp transformed normal')
+
+
+def inversew(x):
+    return 1.0 / (1 + mux + x * stdx)
+
+
+def inversew_inv(x):
+    return (1.0 / x - 1.0 - mux) / stdx  # .np.divide(1.0,x)-10
+
+
+def identit(x):
+    return x
+
+
+invdnormalg = Transf_gen(stats.norm, inversew, inversew_inv, decr=True,  # a=-np.inf,
+                         numargs=0, name='discf', longname='normal-based discount factor',
+                         # extradoc = '\ndistribution of discount factor y=1/(1+x) with x N(0.05,0.1**2)'
+                         )
+
+lognormalg = Transf_gen(stats.norm, np.exp, np.log,
+                        numargs=2, a=0, name='lnnorm',
+                        longname='Exp transformed normal',
+                        # extradoc = '\ndistribution of y = exp(x), with x standard normal'
+                        # 'precision for moments and stats is not very high, 2-3 decimals'
+                        )
+
 loggammaexpg = Transf_gen(stats.gamma, np.log, np.exp, numargs=1)
-"""univariate distribution of a non-linear monotonic transformation of a
+
+## copied from nonlinear_transform_short.py
+
+'''univariate distribution of a non-linear monotonic transformation of a
 random variable

-"""
+'''


 class ExpTransf_gen(distributions.rv_continuous):
-    """Distribution based on log/exp transformation
+    '''Distribution based on log/exp transformation

     the constructor can be called with a distribution class
     and generates the distribution of the transformed random variable

-    """
+    '''

     def __init__(self, kls, *args, **kwargs):
+        # print(args
+        # print(kwargs
+        # explicit for self.__dict__.update(kwargs)
         if 'numargs' in kwargs:
             self.numargs = kwargs['numargs']
         else:
@@ -98,16 +167,24 @@ class ExpTransf_gen(distributions.rv_continuous):
         super(ExpTransf_gen, self).__init__(a=a, name=name)
         self.kls = kls

+    def _cdf(self, x, *args):
+        # print(args
+        return self.kls._cdf(np.log(x), *args)
+
+    def _ppf(self, q, *args):
+        return np.exp(self.kls._ppf(q, *args))
+

 class LogTransf_gen(distributions.rv_continuous):
-    """Distribution based on log/exp transformation
+    '''Distribution based on log/exp transformation

     the constructor can be called with a distribution class
     and generates the distribution of the transformed random variable

-    """
+    '''

     def __init__(self, kls, *args, **kwargs):
+        # explicit for self.__dict__.update(kwargs)
         if 'numargs' in kwargs:
             self.numargs = kwargs['numargs']
         else:
@@ -120,16 +197,66 @@ class LogTransf_gen(distributions.rv_continuous):
             a = kwargs['a']
         else:
             a = 0
+
         super(LogTransf_gen, self).__init__(a=a, name=name)
         self.kls = kls

+    def _cdf(self, x, *args):
+        # print(args
+        return self.kls._cdf(np.exp(x), *args)
+
+    def _ppf(self, q, *args):
+        return np.log(self.kls._ppf(q, *args))
+
+
+def examples_transf():
+    ##lognormal = ExpTransf(a=0.0, xa=-10.0, name = 'Log transformed normal')
+    ##print(lognormal.cdf(1)
+    ##print(stats.lognorm.cdf(1,1)
+    ##print(lognormal.stats()
+    ##print(stats.lognorm.stats(1)
+    ##print(lognormal.rvs(size=10)
+
+    print('Results for lognormal')
+    lognormalg = ExpTransf_gen(stats.norm, a=0, name='Log transformed normal general')
+    print(lognormalg.cdf(1))
+    print(stats.lognorm.cdf(1, 1))
+    print(lognormalg.stats())
+    print(stats.lognorm.stats(1))
+    print(lognormalg.rvs(size=5))

-"""
+    ##print('Results for loggamma'
+    ##loggammag = ExpTransf_gen(stats.gamma)
+    ##print(loggammag._cdf(1,10)
+    ##print(stats.loggamma.cdf(1,10)
+
+    print('Results for expgamma')
+    loggammaexpg = LogTransf_gen(stats.gamma)
+    print(loggammaexpg._cdf(1, 10))
+    print(stats.loggamma.cdf(1, 10))
+    print(loggammaexpg._cdf(2, 15))
+    print(stats.loggamma.cdf(2, 15))
+
+    # this requires change in scipy.stats.distribution
+    # print(loggammaexpg.cdf(1,10)
+
+    print('Results for loglaplace')
+    loglaplaceg = LogTransf_gen(stats.laplace)
+    print(loglaplaceg._cdf(2, 10))
+    print(stats.loglaplace.cdf(2, 10))
+    loglaplaceexpg = ExpTransf_gen(stats.laplace)
+    print(loglaplaceexpg._cdf(2, 10))
+
+
+## copied from transformtwo.py
+
+'''
 Created on Apr 28, 2009

 @author: Josef Perktold
-"""
-""" A class for the distribution of a non-linear u-shaped or hump shaped transformation of a
+'''
+
+''' A class for the distribution of a non-linear u-shaped or hump shaped transformation of a
 continuous random variable

 This is a companion to the distributions of non-linear monotonic transformation to the case
@@ -157,11 +284,11 @@ TODO:

   * add _rvs as method, will be faster in many cases

-"""
+'''


 class TransfTwo_gen(distributions.rv_continuous):
-    """Distribution based on a non-monotonic (u- or hump-shaped transformation)
+    '''Distribution based on a non-monotonic (u- or hump-shaped transformation)

     the constructor can be called with a distribution class, and functions
     that define the non-linear transformation.
@@ -174,50 +301,189 @@ class TransfTwo_gen(distributions.rv_continuous):
     This can be used to generate distribution instances similar to the
     distributions in scipy.stats.

-    """
+    '''

+    # a class for non-linear non-monotonic transformation of a continuous random variable
     def __init__(self, kls, func, funcinvplus, funcinvminus, derivplus,
-        derivminus, *args, **kwargs):
+                 derivminus, *args, **kwargs):
+        # print(args
+        # print(kwargs
+
         self.func = func
         self.funcinvplus = funcinvplus
         self.funcinvminus = funcinvminus
         self.derivplus = derivplus
         self.derivminus = derivminus
+        # explicit for self.__dict__.update(kwargs)
+        # need to set numargs because inspection does not work
         self.numargs = kwargs.pop('numargs', 0)
+        # print(self.numargs
         name = kwargs.pop('name', 'transfdist')
-        longname = kwargs.pop('longname', 'Non-linear transformed distribution'
-            )
+        longname = kwargs.pop('longname', 'Non-linear transformed distribution')
         extradoc = kwargs.pop('extradoc', None)
-        a = kwargs.pop('a', -np.inf)
-        b = kwargs.pop('b', np.inf)
+        a = kwargs.pop('a', -np.inf)  # attached to self in super
+        b = kwargs.pop('b', np.inf)  # self.a, self.b would be overwritten
         self.shape = kwargs.pop('shape', False)
+        # defines whether it is a `u` shaped or `hump' shaped
+        #       transformation
+
         self.u_args, self.u_kwargs = get_u_argskwargs(**kwargs)
-        self.kls = kls
-        super(TransfTwo_gen, self).__init__(a=a, b=b, name=name, shapes=kls
-            .shapes, longname=longname)
+        self.kls = kls  # (self.u_args, self.u_kwargs)
+        # possible to freeze the underlying distribution
+
+        super(TransfTwo_gen, self).__init__(a=a, b=b,
+                                            name=name,
+                                            shapes=kls.shapes,
+                                            longname=longname,
+                                            # extradoc = extradoc
+                                            )
+
+    def _rvs(self, *args):
+        self.kls._size = self._size  # size attached to self, not function argument
+        return self.func(self.kls._rvs(*args))
+
+    def _pdf(self, x, *args, **kwargs):
+        # print(args
+        if self.shape == 'u':
+            signpdf = 1
+        elif self.shape == 'hump':
+            signpdf = -1
+        else:
+            raise ValueError('shape can only be `u` or `hump`')
+
+        return signpdf * (self.derivplus(x) * self.kls._pdf(self.funcinvplus(x), *args, **kwargs) -
+                          self.derivminus(x) * self.kls._pdf(self.funcinvminus(x), *args,
+                                                             **kwargs))
+        # note scipy _cdf only take *args not *kwargs
+
+    def _cdf(self, x, *args, **kwargs):
+        # print(args
+        if self.shape == 'u':
+            return self.kls._cdf(self.funcinvplus(x), *args, **kwargs) - \
+                self.kls._cdf(self.funcinvminus(x), *args, **kwargs)
+            # note scipy _cdf only take *args not *kwargs
+        else:
+            return 1.0 - self._sf(x, *args, **kwargs)
+
+    def _sf(self, x, *args, **kwargs):
+        # print(args
+        if self.shape == 'hump':
+            return self.kls._cdf(self.funcinvplus(x), *args, **kwargs) - \
+                self.kls._cdf(self.funcinvminus(x), *args, **kwargs)
+            # note scipy _cdf only take *args not *kwargs
+        else:
+            return 1.0 - self._cdf(x, *args, **kwargs)

+    def _munp(self, n, *args, **kwargs):
+        return self._mom0_sc(n, *args)
+
+
+# ppf might not be possible in general case?
+# should be possible in symmetric case
+#    def _ppf(self, q, *args, **kwargs):
+#        if self.shape == 'u':
+#            return self.func(self.kls._ppf(q,*args, **kwargs))
+#        elif self.shape == 'hump':
+#            return self.func(self.kls._ppf(1-q,*args, **kwargs))
+
+# TODO: rename these functions to have unique names

 class SquareFunc:
-    """class to hold quadratic function with inverse function and derivative
+    '''class to hold quadratic function with inverse function and derivative

     using instance methods instead of class methods, if we want extension
     to parametrized function
-    """
+    '''
+
+    def inverseplus(self, x):
+        return np.sqrt(x)
+
+    def inverseminus(self, x):
+        return 0.0 - np.sqrt(x)
+
+    def derivplus(self, x):
+        return 0.5 / np.sqrt(x)
+
+    def derivminus(self, x):
+        return 0.0 - 0.5 / np.sqrt(x)
+
+    def squarefunc(self, x):
+        return np.power(x, 2)


 sqfunc = SquareFunc()
-squarenormalg = TransfTwo_gen(stats.norm, sqfunc.squarefunc, sqfunc.
-    inverseplus, sqfunc.inverseminus, sqfunc.derivplus, sqfunc.derivminus,
-    shape='u', a=0.0, b=np.inf, numargs=0, name='squarenorm', longname=
-    'squared normal distribution')
+
+squarenormalg = TransfTwo_gen(stats.norm, sqfunc.squarefunc, sqfunc.inverseplus,
+                              sqfunc.inverseminus, sqfunc.derivplus, sqfunc.derivminus,
+                              shape='u', a=0.0, b=np.inf,
+                              numargs=0, name='squarenorm', longname='squared normal distribution',
+                              # extradoc = '\ndistribution of the square of a normal random variable' +\
+                              #            ' y=x**2 with x N(0.0,1)'
+                              )
+# u_loc=l, u_scale=s)
 squaretg = TransfTwo_gen(stats.t, sqfunc.squarefunc, sqfunc.inverseplus,
-    sqfunc.inverseminus, sqfunc.derivplus, sqfunc.derivminus, shape='u', a=
-    0.0, b=np.inf, numargs=1, name='squarenorm', longname=
-    'squared t distribution')
-negsquarenormalg = TransfTwo_gen(stats.norm, negsquarefunc, inverseplus,
-    inverseminus, derivplus, derivminus, shape='hump', a=-np.inf, b=0.0,
-    numargs=0, name='negsquarenorm', longname=
-    'negative squared normal distribution')
+                         sqfunc.inverseminus, sqfunc.derivplus, sqfunc.derivminus,
+                         shape='u', a=0.0, b=np.inf,
+                         numargs=1, name='squarenorm', longname='squared t distribution',
+                         # extradoc = '\ndistribution of the square of a t random variable' +\
+                         #           ' y=x**2 with x t(dof,0.0,1)'
+                         )
+
+
+def inverseplus(x):
+    return np.sqrt(-x)
+
+
+def inverseminus(x):
+    return 0.0 - np.sqrt(-x)
+
+
+def derivplus(x):
+    return 0.0 - 0.5 / np.sqrt(-x)
+
+
+def derivminus(x):
+    return 0.5 / np.sqrt(-x)
+
+
+def negsquarefunc(x):
+    return -np.power(x, 2)
+
+
+negsquarenormalg = TransfTwo_gen(stats.norm, negsquarefunc, inverseplus, inverseminus,
+                                 derivplus, derivminus, shape='hump', a=-np.inf, b=0.0,
+                                 numargs=0, name='negsquarenorm',
+                                 longname='negative squared normal distribution',
+                                 # extradoc = '\ndistribution of the negative square of a normal random variable' +\
+                                 #            ' y=-x**2 with x N(0.0,1)'
+                                 )
+
+
+# u_loc=l, u_scale=s)
+
+def inverseplus(x):
+    return x
+
+
+def inverseminus(x):
+    return 0.0 - x
+
+
+def derivplus(x):
+    return 1.0
+
+
+def derivminus(x):
+    return 0.0 - 1.0
+
+
+def absfunc(x):
+    return np.abs(x)
+
+
 absnormalg = TransfTwo_gen(stats.norm, np.abs, inverseplus, inverseminus,
-    derivplus, derivminus, shape='u', a=0.0, b=np.inf, numargs=0, name=
-    'absnorm', longname='absolute of normal distribution')
+                           derivplus, derivminus, shape='u', a=0.0, b=np.inf,
+                           numargs=0, name='absnorm', longname='absolute of normal distribution',
+                           # extradoc = '\ndistribution of the absolute value of a normal random variable' +\
+                           #           ' y=abs(x) with x N(0,1)'
+                           )
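
The TransfTwo_gen class above builds the density of a non-monotonic transform from its two inverse branches. As a sanity check on the `u`-shaped case, the two-branch change-of-variables density for Y = X**2 with X standard normal should coincide with the chi-square density with one degree of freedom. A minimal, self-contained sketch of that check (illustrative only, not part of the patch; it re-derives the formula with scipy directly rather than importing the sandbox module):

import numpy as np
from scipy import stats

# density of Y = X**2 via the two branches x = +sqrt(y) and x = -sqrt(y)
y = np.linspace(0.05, 5.0, 50)
pdf_transf = (0.5 / np.sqrt(y)) * stats.norm.pdf(np.sqrt(y)) \
             + (0.5 / np.sqrt(y)) * stats.norm.pdf(-np.sqrt(y))
# should match chi-square with 1 degree of freedom
np.testing.assert_allclose(pdf_transf, stats.chi2.pdf(y, df=1), rtol=1e-10)
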
diff --git a/statsmodels/sandbox/distributions/try_max.py b/statsmodels/sandbox/distributions/try_max.py
index 8090c5fe3..3a854e4aa 100644
--- a/statsmodels/sandbox/distributions/try_max.py
+++ b/statsmodels/sandbox/distributions/try_max.py
@@ -1,12 +1,14 @@
-"""
+'''

 adjusted from Denis on pystatsmodels mailing list

 there might still be problems with loc and scale,

-"""
+'''
+
 from scipy import stats
-__date__ = '2010-12-29 dec'
+
+__date__ = "2010-12-29 dec"


 class MaxDist(stats.rv_continuous):
@@ -20,16 +22,37 @@ class MaxDist(stats.rv_continuous):
     def __init__(self, dist, n):
         self.dist = dist
         self.n = n
-        extradoc = ('maximumdistribution is the distribution of the ' +
-            'maximum of n i.i.d. random variable')
+        extradoc = 'maximumdistribution is the distribution of the ' \
+                   + 'maximum of n i.i.d. random variable'
         super(MaxDist, self).__init__(name='maxdist', a=dist.a, b=dist.b,
-            longname='A maximumdistribution')
+                                      longname='A maximumdistribution',
+                                      # extradoc = extradoc
+                                      )
+
+    def _pdf(self, x, *args, **kw):
+        return self.n * self.dist.pdf(x, *args, **kw) \
+            * self.dist.cdf(x, *args, **kw) ** (self.n - 1)
+
+    def _cdf(self, x, *args, **kw):
+        return self.dist.cdf(x, *args, **kw) ** self.n
+
+    def _ppf(self, q, *args, **kw):
+        # y = F(x) ^ n  <=>  x = F-1( y ^ 1/n)
+        return self.dist.ppf(q ** (1. / self.n), *args, **kw)
+
+
+##    def rvs( self, *args, **kw ):
+##       size = kw.pop( "size", 1 )
+##       u = np.random.uniform( size=size, **kw ) ** (1 / self.n)
+##       return self.dist.ppf( u, **kw )


 maxdistr = MaxDist(stats.norm, 10)
+
 print(maxdistr.rvs(size=10))
 print(maxdistr.stats(moments='mvsk'))
-"""
+
+'''
 >>> print maxdistr.stats(moments = 'mvsk')
 (array(1.5387527308351818), array(0.34434382328492852), array(0.40990510188513779), array(0.33139861783918922))
 >>> rvs = np.random.randn(1000,10)
@@ -52,4 +75,4 @@ print(maxdistr.stats(moments='mvsk'))
 0.99999999999999956


-"""
+'''
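
MaxDist above relies on the closed forms F_max(x) = F(x)**n, f_max(x) = n f(x) F(x)**(n-1), and ppf_max(q) = F^{-1}(q**(1/n)) for the maximum of n i.i.d. draws. A small sketch (illustrative only, not part of the patch) verifying that the cdf and ppf forms invert each other for a normal base distribution:

import numpy as np
from scipy import stats

n = 10
q = np.linspace(0.05, 0.95, 19)
# quantile of the maximum of n i.i.d. standard normals
x = stats.norm.ppf(q ** (1.0 / n))
# cdf of the maximum evaluated at that quantile recovers q
np.testing.assert_allclose(stats.norm.cdf(x) ** n, q, rtol=1e-10)
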
diff --git a/statsmodels/sandbox/distributions/try_pot.py b/statsmodels/sandbox/distributions/try_pot.py
index 6ea21d5f3..6a088423b 100644
--- a/statsmodels/sandbox/distributions/try_pot.py
+++ b/statsmodels/sandbox/distributions/try_pot.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Wed May 04 06:09:18 2011

@@ -7,7 +8,7 @@ import numpy as np


 def mean_residual_life(x, frac=None, alpha=0.05):
-    """empirical mean residual life or expected shortfall
+    '''empirical mean residual life or expected shortfall

     Parameters
     ----------
@@ -23,16 +24,44 @@ def mean_residual_life(x, frac=None, alpha=0.05):
         last observations std is zero
         vectorize loop using cumsum
         frac does not work yet
-    """
-    pass
+    '''

+    axis = 0  # searchsorted is 1d only
+    x = np.asarray(x)
+    nobs = x.shape[axis]
+    xsorted = np.sort(x, axis=axis)
+    if frac is None:
+        xthreshold = xsorted
+    else:
+        xthreshold = xsorted[np.floor(nobs * frac).astype(int)]
+    # use searchsorted instead of simple index in case of ties
+    xlargerindex = np.searchsorted(xsorted, xthreshold, side='right')

-expected_shortfall = mean_residual_life
-if __name__ == '__main__':
+    # TODO: replace loop with cumsum?
+    result = []
+    for i in range(len(xthreshold)-1):
+        k_ind = xlargerindex[i]
+        rmean = x[k_ind:].mean()
+        # this does not work for last observations, nans
+        rstd = x[k_ind:].std()
+        rmstd = rstd/np.sqrt(nobs-k_ind)  # std error of mean, check formula
+        result.append((k_ind, xthreshold[i], rmean, rmstd))
+
+    res = np.array(result)
+    crit = 1.96  # TODO: without loading stats, crit = -stats.t.ppf(0.05)
+    confint = res[:, 1:2] + crit * res[:, -1:] * np.array([[-1, 1]])
+    return np.column_stack((res, confint))
+
+
+expected_shortfall = mean_residual_life  # alias
+
+
+if __name__ == "__main__":
     rvs = np.random.standard_t(5, size=10)
     res = mean_residual_life(rvs)
     print(res)
     rmean = [rvs[i:].mean() for i in range(len(rvs))]
     print(res[:, 2] - rmean[1:])
+
     res_frac = mean_residual_life(rvs, frac=[0.5])
     print(res_frac)
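
mean_residual_life above reports, for each threshold, the mean of the observations above that threshold together with a normal-approximation standard error and a 95% confidence interval. A compact sketch of the per-threshold computation (illustrative only, not part of the patch; variable names are ad hoc):

import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.standard_t(5, size=500))
u = x[250]                                   # pick the median as the threshold
tail = x[np.searchsorted(x, u, side='right'):]   # observations above the threshold
rmean = tail.mean()
rmstd = tail.std() / np.sqrt(tail.size)      # std error of the tail mean
print(u, rmean, rmean - 1.96 * rmstd, rmean + 1.96 * rmstd)
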
diff --git a/statsmodels/sandbox/examples/bayesprior.py b/statsmodels/sandbox/examples/bayesprior.py
index 10acc7395..c0e48959d 100644
--- a/statsmodels/sandbox/examples/bayesprior.py
+++ b/statsmodels/sandbox/examples/bayesprior.py
@@ -1,111 +1,245 @@
+#
+# This script examines the predictive prior densities of two local level
+# models given the same priors for parameters that appear to be the same.
+# Reference: Del Negro and Schorfheide.
+
 try:
     import pymc
     pymc_installed = 1
 except:
-    print('pymc not imported')
+    print("pymc not imported")
     pymc_installed = 0
+
 import numpy as np
 from matplotlib import pyplot as plt
 from scipy import stats, integrate
 from scipy.stats import rv_continuous
 from scipy.special import gammaln, gammaincinv, gammainc
-from numpy import log, exp
+from numpy import log,exp

+#np.random.seed(12345)

 class igamma_gen(rv_continuous):
-    pass
+    def _pdf(self, x, a, b):
+        return exp(self._logpdf(x,a,b))
+    def _logpdf(self, x, a, b):
+        return a*log(b) - gammaln(a) -(a+1)*log(x) - b/x
+    def _cdf(self, x, a, b):
+        return 1.0-gammainc(a,b/x) # why is this different than the wiki?
+    def _ppf(self, q, a, b):
+        return b/gammaincinv(a,1-q)
+#NOTE: should be correct, work through invgamma example and 2 param inv gamma
+#CDF
+    def _munp(self, n, a, b):
+        args = (a,b)
+        super(igamma_gen, self)._munp(self, n, *args)
+#TODO: is this robust for differential entropy in this case? closed form or
+#shortcuts in special?
+    def _entropy(self, *args):
+        def integ(x):
+            val = self._pdf(x, *args)
+            return val*log(val)

+        entr = -integrate.quad(integ, self.a, self.b)[0]
+        if not np.isnan(entr):
+            return entr
+        else:
+            raise ValueError("Problem with integration.  Returned nan.")

-igamma = igamma_gen(a=0.0, name='invgamma', longname='An inverted gamma',
-    shapes='a,b', extradoc=
-    """
+igamma = igamma_gen(a=0.0, name='invgamma', longname="An inverted gamma",
+            shapes = 'a,b', extradoc="""

 Inverted gamma distribution

 invgamma.pdf(x,a,b) = b**a*x**(-a-1)/gamma(a) * exp(-b/x)
 for x > 0, a > 0, b>0.
-"""
-    )
-palpha = np.random.gamma(400.0, 0.005, size=10000)
-print("""First moment: %s
-Second moment: %s""" % (palpha.mean(), palpha.std()))
+""")
+
+
+#NOTE: the above is unnecessary.  B takes the same role as the scale parameter
+# in inverted gamma
+
+palpha = np.random.gamma(400.,.005, size=10000)
+print("First moment: %s\nSecond moment: %s" % (palpha.mean(),palpha.std()))
 palpha = palpha[0]
-prho = np.random.beta(49.5, 49.5, size=100000.0)
-print('Beta Distribution')
-print("""First moment: %s
-Second moment: %s""" % (prho.mean(), prho.std()))
+
+prho = np.random.beta(49.5,49.5, size=1e5)
+print("Beta Distribution")
+print("First moment: %s\nSecond moment: %s" % (prho.mean(),prho.std()))
 prho = prho[0]
-psigma = igamma.rvs(1.0, 4.0 ** 2 / 2, size=100000.0)
-print('Inverse Gamma Distribution')
-print("""First moment: %s
-Second moment: %s""" % (psigma.mean(), psigma.std()))
+
+psigma = igamma.rvs(1.,4.**2/2, size=1e5)
+print("Inverse Gamma Distribution")
+print("First moment: %s\nSecond moment: %s" % (psigma.mean(),psigma.std()))
+
+# First do the univariate case
+# y_t = theta_t + epsilon_t
+# epsilon ~ N(0,1)
+# Where theta ~ N(mu,lambda**2)
+
+
+# or the model
+# y_t = theta2_t + theta1_t * y_t-1 + epsilon_t
+
+# Prior 1:
+# theta1 ~ uniform(0,1)
+# theta2|theta1 ~ N(mu,lambda**2)
+# Prior 2:
+# theta1 ~ U(0,1)
+# theta2|theta1 ~ N(mu(1-theta1),lambda**2(1-theta1)**2)
+
 draws = 400
-mu_, lambda_ = 1.0, 2.0
-y1y2 = np.zeros((draws, 2))
+# prior beliefs, from JME paper
+mu_, lambda_ = 1.,2.
+
+# Model 1
+y1y2 = np.zeros((draws,2))
 for draw in range(draws):
-    theta = np.random.normal(mu_, lambda_ ** 2)
+    theta = np.random.normal(mu_,lambda_**2)
     y1 = theta + np.random.normal()
     y2 = theta + np.random.normal()
-    y1y2[draw] = y1, y2
-lnp1p2_mod1 = stats.norm.pdf(y1, loc=mu_, scale=lambda_ ** 2 + 1
-    ) * stats.norm.pdf(y2, mu_, scale=lambda_ ** 2 + 1)
-pmu_pairsp1 = np.zeros((draws, 2))
-y1y2pairsp1 = np.zeros((draws, 2))
+    y1y2[draw] = y1,y2
+
+
+# log marginal distribution
+lnp1p2_mod1 = stats.norm.pdf(y1,loc=mu_, scale=lambda_**2+1)*\
+                stats.norm.pdf(y2,mu_,scale=lambda_**2+1)
+
+
+# Model 2
+pmu_pairsp1 = np.zeros((draws,2))
+y1y2pairsp1 = np.zeros((draws,2))
+# prior 1
 for draw in range(draws):
-    theta1 = np.random.uniform(0, 1)
-    theta2 = np.random.normal(mu_, lambda_ ** 2)
+    theta1 = np.random.uniform(0,1)
+    theta2 = np.random.normal(mu_, lambda_**2)
+#    mu = theta2/(1-theta1)
+#do not do this, to maintain independence: theta2 is the _location_
+#    y1 = np.random.normal(mu_, lambda_**2)
     y1 = theta2
-    pmu_pairsp1[draw] = theta2, theta1
+#    pmu_pairsp1[draw] = mu, theta1
+    pmu_pairsp1[draw] = theta2, theta1 # mean, autocorr
     y2 = theta2 + theta1 * y1 + np.random.normal()
-    y1y2pairsp1[draw] = y1, y2
-pmu_pairsp2 = np.zeros((draws, 2))
-y1y2pairsp2 = np.zeros((draws, 2))
+    y1y2pairsp1[draw] = y1,y2
+
+
+
+# for a = 0, b = 1 - epsilon = .99999
+# mean of u is .5*.99999
+# variance is 1./12 * .99999**2
+
+# Model 2
+pmu_pairsp2 = np.zeros((draws,2))
+y1y2pairsp2 = np.zeros((draws,2))
+# prior 2
 theta12_2 = []
 for draw in range(draws):
-    theta1 = np.random.uniform(0, 1)
-    theta2 = np.random.normal(mu_ * (1 - theta1), lambda_ ** 2 * (1 -
-        theta1) ** 2)
-    theta12_2.append([theta1, theta2])
-    mu = theta2 / (1 - theta1)
-    y1 = np.random.normal(mu_, lambda_ ** 2)
+#    y1 = np.random.uniform(-4,6)
+    theta1 = np.random.uniform(0,1)
+    theta2 = np.random.normal(mu_*(1-theta1), lambda_**2*(1-theta1)**2)
+    theta12_2.append([theta1,theta2])
+
+    mu = theta2/(1-theta1)
+    y1 = np.random.normal(mu_,lambda_**2)
     y2 = theta2 + theta1 * y1 + np.random.normal()
     pmu_pairsp2[draw] = mu, theta1
-    y1y2pairsp2[draw] = y1, y2
+    y1y2pairsp2[draw] = y1,y2
+
 fig = plt.figure()
 fsp = fig.add_subplot(221)
-fsp.scatter(pmu_pairsp1[:, 0], pmu_pairsp1[:, 1], color='b', facecolor='none')
+fsp.scatter(pmu_pairsp1[:,0], pmu_pairsp1[:,1], color='b', facecolor='none')
 fsp.set_ylabel('Autocorrelation (Y)')
 fsp.set_xlabel('Mean (Y)')
 fsp.set_title('Model 2 (P1)')
-fsp.axis([-20, 20, 0, 1])
+fsp.axis([-20,20,0,1])
+
 fsp = fig.add_subplot(222)
-fsp.scatter(pmu_pairsp2[:, 0], pmu_pairsp2[:, 1], color='b', facecolor='none')
+fsp.scatter(pmu_pairsp2[:,0],pmu_pairsp2[:,1], color='b', facecolor='none')
 fsp.set_title('Model 2 (P2)')
 fsp.set_ylabel('Autocorrelation (Y)')
 fsp.set_xlabel('Mean (Y)')
 fsp.set_title('Model 2 (P2)')
-fsp.axis([-20, 20, 0, 1])
+fsp.axis([-20,20,0,1])
+
 fsp = fig.add_subplot(223)
-fsp.scatter(y1y2pairsp1[:, 0], y1y2pairsp1[:, 1], color='b', marker='o',
+fsp.scatter(y1y2pairsp1[:,0], y1y2pairsp1[:,1], color='b', marker='o',
     facecolor='none')
-fsp.scatter(y1y2[:, 0], y1y2[:, 1], color='g', marker='+')
+fsp.scatter(y1y2[:,0], y1y2[:,1], color ='g', marker='+')
 fsp.set_title('Model 1 vs. Model 2 (P1)')
 fsp.set_ylabel('Y(2)')
 fsp.set_xlabel('Y(1)')
-fsp.axis([-20, 20, -20, 20])
+fsp.axis([-20,20,-20,20])
+
 fsp = fig.add_subplot(224)
-fsp.scatter(y1y2pairsp2[:, 0], y1y2pairsp2[:, 1], color='b', marker='o')
-fsp.scatter(y1y2[:, 0], y1y2[:, 1], color='g', marker='+')
+fsp.scatter(y1y2pairsp2[:,0], y1y2pairsp2[:,1], color='b', marker='o')
+fsp.scatter(y1y2[:,0], y1y2[:,1], color='g', marker='+')
 fsp.set_title('Model 1 vs. Model 2 (P2)')
 fsp.set_ylabel('Y(2)')
 fsp.set_xlabel('Y(1)')
-fsp.axis([-20, 20, -20, 20])
-palpha = np.random.gamma(400, 0.005)
+fsp.axis([-20,20,-20,20])
+
+#plt.show()
+
+#TODO: this does not look the same as the working paper?
+#NOTE: but it matches the language?  I think mine is right!
+
+# Contour plots.
+# on the basis of observed data. ie., the mgrid
+#np.mgrid[6:-4:10j,-4:6:10j]
+
+
+
+
+# Example 2:
+# 2 NK Phillips Curves
+# Structural form
+# M1: y_t = 1/alpha *E_t[y_t+1] + mu_t
+# mu_t = p1 * mu_t-1 + epsilon_t
+# epsilon_t ~ N(0,sigma2)
+
+# Reduced form Law of Motion
+# M1: y_t = p1*y_t-1 + 1/(1-p1/alpha)*epsilon_t
+
+# specify prior for M1
+# for i = 1,2
+# theta_i = [alpha
+#             p_i
+#             sigma]
+# truncate effective priors by the determinacy region
+# for determinacy we need alpha > 1
+# p in [0,1)
+# palpha ~ Gamma(2.00,.10)
+# mean = 2.00
+# std = .1 which implies k = 400, theta = .005
+palpha = np.random.gamma(400,.005)
+
+# pi ~ Beta(.5,.05)
 pi = np.random.beta(49.5, 49.5)
-psigma = igamma.rvs(1.0, 4.0, size=1000000.0)
+
+# psigma ~ InvGamma(1.00,4.00)
+#def invgamma(a,b):
+#    return np.sqrt(b*a**2/np.sum(np.random.random(b,1)**2, axis=1))
+#NOTE: Use inverse gamma distribution igamma
+psigma = igamma.rvs(1.,4.0, size=1e6) #TODO: parameterization is not correct vs.
+# Del Negro and Schorfheide
 if pymc_installed:
-    psigma2 = pymc.rinverse_gamma(1.0, 4.0, size=1000000.0)
+    psigma2 = pymc.rinverse_gamma(1.,4.0, size=1e6)
 else:
-    psigma2 = stats.invgamma.rvs(1.0, scale=4.0, size=1000000.0)
+    psigma2 = stats.invgamma.rvs(1., scale=4.0, size=1e6)
 nsims = 500
-y = np.zeros(nsims)
+y = np.zeros((nsims))
+#for i in range(1,nsims):
+#    y[i] = .9*y[i-1] + 1/(1-p1/alpha) + np.random.normal()
+
+#Are these supposed to be sampled jointly?
+
+# InvGamma(sigma|v,s) propto sigma**(-v-1)*e**(-vs**2/2*sigma**2)
+#igamma =
+
+# M2: y_t = 1/alpha * E_t[y_t+1] + p2*y_t-1 + mu_t
+# mu_t ~ epsilon_t
+# epsilon_t ~ n(0,sigma2)
+
+# Reduced form Law of Motion
+# y_t = 1/2 (alpha-sqrt(alpha**2-4*p2*alpha)) * y_t-1 + 2*alpha/(alpha + \
+#        sqrt(alpha**2 - 4*p2*alpha)) * epsilon_t
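
The hand-rolled igamma_gen(a, b) above is, as the inline note points out, equivalent to scipy's one-parameter invgamma with b passed as the scale. A quick check of that equivalence (illustrative only, not part of the patch):

import numpy as np
from scipy import stats
from scipy.special import gammaln

a, b = 1.0, 8.0                              # shape and "scale-like" parameter
x = np.linspace(0.5, 20.0, 40)
# log pdf exactly as written in igamma_gen._logpdf
logpdf_ab = a * np.log(b) - gammaln(a) - (a + 1) * np.log(x) - b / x
# matches scipy's invgamma when b plays the role of the scale
np.testing.assert_allclose(np.exp(logpdf_ab),
                           stats.invgamma.pdf(x, a, scale=b), rtol=1e-10)
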
diff --git a/statsmodels/sandbox/examples/ex_cusum.py b/statsmodels/sandbox/examples/ex_cusum.py
index 2d140dc36..3d4480b66 100644
--- a/statsmodels/sandbox/examples/ex_cusum.py
+++ b/statsmodels/sandbox/examples/ex_cusum.py
@@ -1,58 +1,105 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri Apr 02 11:41:25 2010

 Author: josef-pktd
 """
+
+
 import numpy as np
 from numpy.testing import assert_almost_equal
 import matplotlib.pyplot as plt
+
 import statsmodels.api as sm
 from statsmodels.stats.diagnostic import recursive_olsresiduals
-from statsmodels.stats.diagnostic import breaks_hansen, breaks_cusumolsresid
+from statsmodels.stats.diagnostic import (
+    breaks_hansen, breaks_cusumolsresid)
+
+
+#examples from ex_onewaygls.py
+#choose example
+#--------------
 example = ['null', 'smalldiff', 'mediumdiff', 'largediff'][1]
 example_size = [20, 100][1]
 example_groups = ['2', '2-2'][1]
+#'2-2': 4 groups,
+#       groups 0 and 1 and groups 2 and 3 have identical parameters in DGP
+
+#generate example
+#----------------
+#np.random.seed(87654589)
 nobs = example_size
-x1 = 0.1 + np.random.randn(nobs)
-y1 = 10 + 15 * x1 + 2 * np.random.randn(nobs)
+x1 = 0.1+np.random.randn(nobs)
+y1 = 10 + 15*x1 + 2*np.random.randn(nobs)
+
 x1 = sm.add_constant(x1, prepend=False)
-x2 = 0.1 + np.random.randn(nobs)
+#assert_almost_equal(x1, np.vander(x1[:,0],2), 16)
+#res1 = sm.OLS(y1, x1).fit()
+#print res1.params
+#print np.polyfit(x1[:,0], y1, 1)
+#assert_almost_equal(res1.params, np.polyfit(x1[:,0], y1, 1), 14)
+#print res1.summary(xname=['x1','const1'])
+
+#regression 2
+x2 = 0.1+np.random.randn(nobs)
 if example == 'null':
-    y2 = 10 + 15 * x2 + 2 * np.random.randn(nobs)
+    y2 = 10 + 15*x2 + 2*np.random.randn(nobs)  # if H0 is true
 elif example == 'smalldiff':
-    y2 = 11 + 16 * x2 + 2 * np.random.randn(nobs)
+    y2 = 11 + 16*x2 + 2*np.random.randn(nobs)
 elif example == 'mediumdiff':
-    y2 = 12 + 16 * x2 + 2 * np.random.randn(nobs)
+    y2 = 12 + 16*x2 + 2*np.random.randn(nobs)
 else:
-    y2 = 19 + 17 * x2 + 2 * np.random.randn(nobs)
+    y2 = 19 + 17*x2 + 2*np.random.randn(nobs)
+
 x2 = sm.add_constant(x2, prepend=False)
-x = np.concatenate((x1, x2), 0)
-y = np.concatenate((y1, y2))
+
+# stack
+x = np.concatenate((x1,x2),0)
+y = np.concatenate((y1,y2))
 if example_groups == '2':
-    groupind = (np.arange(2 * nobs) > nobs - 1).astype(int)
+    groupind = (np.arange(2*nobs)>nobs-1).astype(int)
 else:
-    groupind = np.mod(np.arange(2 * nobs), 4)
+    groupind = np.mod(np.arange(2*nobs),4)
     groupind.sort()
+#x = np.column_stack((x,x*groupind[:,None]))
+
 res1 = sm.OLS(y, x).fit()
 skip = 8
-(rresid, rparams, rypred, rresid_standardized, rresid_scaled, rcusum, rcusumci
-    ) = recursive_olsresiduals(res1, skip)
+
+rresid, rparams, rypred, rresid_standardized, rresid_scaled, rcusum, rcusumci = \
+            recursive_olsresiduals(res1, skip)
 print(rcusum)
-print(rresid_scaled[skip - 1:])
+print(rresid_scaled[skip-1:])
+
 assert_almost_equal(rparams[-1], res1.params)
+
 plt.plot(rcusum)
 plt.plot(rcusumci[0])
 plt.plot(rcusumci[1])
 plt.figure()
 plt.plot(rresid)
 plt.plot(np.abs(rresid))
+
 print('cusum test reject:')
-print(((rcusum[1:] > rcusumci[1]) | (rcusum[1:] < rcusumci[0])).any())
-(rresid2, rparams2, rypred2, rresid_standardized2, rresid_scaled2, rcusum2,
-    rcusumci2) = recursive_olsresiduals(res1, skip)
-assert_almost_equal(rparams[skip:], rparams2[skip:], 13)
+print(((rcusum[1:]>rcusumci[1]) | (rcusum[1:]<rcusumci[0])).any())
+
+rresid2, rparams2, rypred2, rresid_standardized2, rresid_scaled2, rcusum2, rcusumci2 = \
+            recursive_olsresiduals(res1, skip)
+#assert_almost_equal(rparams[skip+1:], rparams2[skip:-1],13)
+assert_almost_equal(rparams[skip:], rparams2[skip:],13)
+#np.c_[rparams[skip+1:], rparams2[skip:-1]]
+#plt.show()
+
+####################  Example break test
 H, crit95 = breaks_hansen(res1)
 print(H)
 print(crit95)
+
 supb, pval, crit = breaks_cusumolsresid(res1.resid)
 print(supb, pval, crit)
+
+##check whether this works directly: Ploberger/Kramer framing of standard cusum
+##no, it's different, there is another denominator
+#print breaks_cusumolsresid(rresid[skip:])
+#this function is still completely wrong, cut and paste does not apply
+#print breaks_cusum(rresid[skip:])
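
The example above flags instability when the recursive CUSUM path leaves its confidence band, and also reports the OLS-residual based sup statistic from breaks_cusumolsresid. A short sketch (illustrative only, not part of the patch) of the latter test on data with a deliberate break; with a mean shift the p-value should be small:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import breaks_cusumolsresid

np.random.seed(0)
x = sm.add_constant(np.random.randn(200))
y = x @ np.array([1.0, 2.0]) + np.random.randn(200)
y[100:] += 3.0                               # structural break in the constant
res = sm.OLS(y, x).fit()
sup_b, pval, crit = breaks_cusumolsresid(res.resid)
print(sup_b, pval)                           # small p-value -> instability
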
diff --git a/statsmodels/sandbox/examples/ex_gam_results.py b/statsmodels/sandbox/examples/ex_gam_results.py
index 418388391..8c646e6b6 100644
--- a/statsmodels/sandbox/examples/ex_gam_results.py
+++ b/statsmodels/sandbox/examples/ex_gam_results.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Example results for GAM from tests

 Created on Mon Nov 07 13:13:15 2011
@@ -10,25 +11,33 @@ I do not know yet why there is the small difference and why GAM does not
 converge in this case

 """
+
+
 import matplotlib.pyplot as plt
+
 from statsmodels.sandbox.tests.test_gam import _estGAMGaussianLogLink
+
+
 tt = _estGAMGaussianLogLink()
 comp, const = tt.res_gam.smoothed_demeaned(tt.mod_gam.exog)
 comp_glm_ = tt.res2.model.exog * tt.res2.params
-comp1 = comp_glm_[:, 1:4].sum(1)
+comp1 = comp_glm_[:,1:4].sum(1)
 mean1 = comp1.mean()
 comp1 -= mean1
-comp2 = comp_glm_[:, 4:].sum(1)
+comp2 = comp_glm_[:,4:].sum(1)
 mean2 = comp2.mean()
 comp2 -= mean2
-comp1_true = tt.res2.model.exog[:, 1:4].sum(1)
+
+comp1_true = tt.res2.model.exog[:,1:4].sum(1)
 mean1 = comp1_true.mean()
 comp1_true -= mean1
-comp2_true = tt.res2.model.exog[:, 4:].sum(1)
+comp2_true = tt.res2.model.exog[:,4:].sum(1)
 mean2 = comp2_true.mean()
 comp2_true -= mean2
+
 noise = tt.res2.model.endog - tt.mu_true
-noise_eta = tt.family.link(tt.res2.model.endog) - tt.y_true
+noise_eta =  tt.family.link(tt.res2.model.endog) - tt.y_true
+
 plt.figure()
 plt.plot(noise, 'k.')
 plt.figure()
@@ -37,7 +46,13 @@ plt.plot(comp1, 'b-')
 plt.plot(comp2, 'b-')
 plt.plot(comp1_true, 'k--', lw=2)
 plt.plot(comp2_true, 'k--', lw=2)
+#the next does not make sense - non-linear
+#c1 = tt.family.link(tt.family.link.inverse(comp1_true) + noise)
+#c2 = tt.family.link(tt.family.link.inverse(comp2_true) + noise)
+#not nice in example/plot: noise variance is constant not proportional
 plt.plot(comp1_true + noise_eta, 'g.', alpha=0.95)
 plt.plot(comp2_true + noise_eta, 'r.', alpha=0.95)
+#plt.plot(c1, 'g.', alpha=0.95)
+#plt.plot(c2, 'r.', alpha=0.95)
 plt.title('Gaussian loglink, GAM (red), GLM (blue), true (black)')
 plt.show()
diff --git a/statsmodels/sandbox/examples/ex_mixed_lls_0.py b/statsmodels/sandbox/examples/ex_mixed_lls_0.py
index 65e814294..92e1ffd14 100644
--- a/statsmodels/sandbox/examples/ex_mixed_lls_0.py
+++ b/statsmodels/sandbox/examples/ex_mixed_lls_0.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Example using OneWayMixed


@@ -11,46 +12,77 @@ effects and random coefficients, and uses OneWayMixed to estimate it.

 """
 import numpy as np
+
 from statsmodels.sandbox.panel.mixed import OneWayMixed, Unit
+
 examples = ['ex1']
+
 if 'ex1' in examples:
+    #np.random.seed(54321)
     np.random.seed(978326)
     nsubj = 2000
-    units = []
-    nobs_i = 4
-    nx = 4
-    nz = 2
+    units  = []
+
+    nobs_i = 4 #number of observations per unit, changed below
+
+    nx = 4  #number fixed effects
+    nz = 2 ##number random effects
     beta = np.ones(nx)
-    gamma = 0.5 * np.ones(nz)
+    gamma = 0.5 * np.ones(nz)   #mean of random effect
     gamma[0] = 0
     gamma_re_true = []
     for i in range(nsubj):
+        #create data for one unit
+
+        #random effect/coefficient
         gamma_re = gamma + 0.2 * np.random.standard_normal(nz)
+        #store true parameter for checking
         gamma_re_true.append(gamma_re)
-        if i > nsubj // 4:
+
+        #for testing unbalanced case, let's change nobs per unit
+        if i > nsubj//4:
             nobs_i = 6
+
+        #generate exogenous variables
         X = np.random.standard_normal((nobs_i, nx))
-        Z = np.random.standard_normal((nobs_i, nz - 1))
+        Z = np.random.standard_normal((nobs_i, nz-1))
         Z = np.column_stack((np.ones(nobs_i), Z))
-        noise = 0.1 * np.random.randn(nobs_i)
+
+        noise = 0.1 * np.random.randn(nobs_i) #sig_e = 0.1
+
+        #generate endogenous variable
         Y = np.dot(X, beta) + np.dot(Z, gamma_re) + noise
-        X = np.hstack((X, Z))
+
+        #add random effect design matrix also to fixed effects to
+        #capture the mean
+        #this seems to be necessary to force mean of RE to zero !?
+        #(It's not required for estimation but interpretation of random
+        #effects covariance matrix changes - still need to check details.)
+        X = np.hstack((X,Z))
+
+        # create units and append to list
         new_unit = Unit(Y, X, Z)
         units.append(new_unit)
+
+
     m = OneWayMixed(units)
+
     import time
     t0 = time.time()
     m.initialize()
-    res = m.fit(maxiter=100, rtol=1e-05, params_rtol=1e-06, params_atol=1e-06)
+    res = m.fit(maxiter=100, rtol=1.0e-5, params_rtol=1e-6, params_atol=1e-6)
     t1 = time.time()
-    print('time for initialize and fit', t1 - t0)
+    print('time for initialize and fit', t1-t0)
     print('number of iterations', m.iterations)
+    #print(dir(m))
+    #print(vars(m))
     print('\nestimates for fixed effects')
     print(m.a)
     print(m.params)
     bfixed_cov = m.cov_fixed()
     print('beta fixed standard errors')
     print(np.sqrt(np.diag(bfixed_cov)))
+
     print(m.bse)
     b_re = m.params_random_units
     print('RE mean:', b_re.mean(0))
@@ -64,14 +96,18 @@ if 'ex1' in examples:
     print('std of above')
     print(res.std_random())
     print(np.sqrt(np.diag(m.cov_random())))
+
     print('\n(non)convergence of llf')
     print(m.history['llf'][-4:])
     print('convergence of parameters')
-    print(np.diff(np.vstack(m.history['params'][-4:]), axis=0))
+    #print(np.diff(np.vstack(m.history[-4:])[:,1:],axis=0)
+    print(np.diff(np.vstack(m.history['params'][-4:]),axis=0))
     print('convergence of D')
     print(np.diff(np.array(m.history['D'][-4:]), axis=0))
-    zb = np.array([(unit.Z * unit.b[None, :]).sum(0) for unit in m.units])
-    """if Z is not included in X:
+
+    #zdotb = np.array([np.dot(unit.Z, unit.b) for unit in m.units])
+    zb = np.array([(unit.Z * unit.b[None,:]).sum(0) for unit in m.units])
+    '''if Z is not included in X:
     >>> np.dot(b_re.T, b_re)/100
     array([[ 0.03270611, -0.00916051],
            [-0.00916051,  0.26432783]])
@@ -79,32 +115,40 @@ if 'ex1' in examples:
     array([[ 0.0348722 , -0.00909159],
            [-0.00909159,  0.26846254]])
     >>> #note cov_random does not subtract mean!
-    """
+    '''
     print('\nchecking the random effects distribution and prediction')
     gamma_re_true = np.array(gamma_re_true)
     print('mean of random effect true', gamma_re_true.mean(0))
     print('mean from fixed effects   ', m.params[-2:])
     print('mean of estimated RE      ', b_re.mean(0))
+
     print('')
     absmean_true = np.abs(gamma_re_true).mean(0)
-    mape = ((m.params[-2:] + b_re) / gamma_re_true - 1).mean(0) * 100
-    mean_abs_perc = np.abs(m.params[-2:] + b_re - gamma_re_true).mean(0
-        ) / absmean_true * 100
-    median_abs_perc = np.median(np.abs(m.params[-2:] + b_re - gamma_re_true), 0
-        ) / absmean_true * 100
-    rmse_perc = (m.params[-2:] + b_re - gamma_re_true).std(0
-        ) / absmean_true * 100
+    mape = ((m.params[-2:] + b_re) / gamma_re_true - 1).mean(0)*100
+    mean_abs_perc = np.abs((m.params[-2:] + b_re) - gamma_re_true).mean(0) \
+                       / absmean_true*100
+    median_abs_perc = np.median(np.abs((m.params[-2:] + b_re) - gamma_re_true), 0) \
+                         / absmean_true*100
+    rmse_perc = ((m.params[-2:] + b_re) - gamma_re_true).std(0) \
+                  / absmean_true*100
     print('mape           ', mape)
     print('mean_abs_perc  ', mean_abs_perc)
     print('median_abs_perc', median_abs_perc)
     print('rmse_perc (std)', rmse_perc)
-    print(res.llf)
+    #from numpy.testing import assert_almost_equal
+    #assert is for n_units=100 in original example
+    #I changed random number generation, so this will not work anymore
+    #assert_almost_equal(rmse_perc, [ 34.14783884,  11.6031684 ], decimal=8)
+
+    #now returns res
+    print(res.llf)  #based on MLE, does not include constant
     print(res.tvalues)
     print(res.pvalues)
-    print(res.t_test([1, -1, 0, 0, 0, 0]))
+    print(res.t_test([1,-1,0,0,0,0]))
     print('test mean of both random effects variables is zero')
-    print(res.f_test([[0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]]))
+    print(res.f_test([[0,0,0,0,1,0], [0,0,0,0,0,1]]))
     plots = res.plot_random_univariate(bins=50)
     fig = res.plot_scatter_pairs(0, 1)
     import matplotlib.pyplot as plt
+
     plt.show()
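
In the one-way mixed model simulated above, each unit follows y_i = X_i beta + Z_i b_i + e_i with b_i ~ N(0, D) and e_i ~ N(0, sigma**2 I), so one unit's marginal covariance is Z_i D Z_i' + sigma**2 I. A tiny sketch writing that matrix down for the DGP used here (D = 0.2**2 * I, sig_e = 0.1); illustrative only, not part of the patch:

import numpy as np

nobs_i, nz, sig_e = 4, 2, 0.1
Z = np.column_stack((np.ones(nobs_i), np.random.standard_normal(nobs_i)))
D = 0.2 ** 2 * np.eye(nz)                    # random-effects covariance in the DGP
cov_y = Z @ D @ Z.T + sig_e ** 2 * np.eye(nobs_i)   # marginal covariance of one unit
print(np.round(cov_y, 4))
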
diff --git a/statsmodels/sandbox/examples/ex_mixed_lls_re.py b/statsmodels/sandbox/examples/ex_mixed_lls_re.py
index 08d887a18..980e6d682 100644
--- a/statsmodels/sandbox/examples/ex_mixed_lls_re.py
+++ b/statsmodels/sandbox/examples/ex_mixed_lls_re.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Example using OneWayMixed


@@ -13,47 +14,79 @@ individual specific constant, that is just a random effect without exogenous
 regressors.

 """
+
 import numpy as np
+
 from statsmodels.sandbox.panel.mixed import OneWayMixed, Unit
+
 examples = ['ex1']
+
 if 'ex1' in examples:
+    #np.random.seed(54321)
     np.random.seed(978326)
     nsubj = 2000
-    units = []
-    nobs_i = 4
-    nx = 0
-    nz = 1
+    units  = []
+
+    nobs_i = 4 #number of observations per unit, changed below
+
+    nx = 0  #number fixed effects
+    nz = 1 ##number random effects
     beta = np.ones(nx)
-    gamma = 0.5 * np.ones(nz)
+    gamma = 0.5 * np.ones(nz)   #mean of random effect
     gamma[0] = 0
     gamma_re_true = []
     for i in range(nsubj):
+        #create data for one unit
+
+        #random effect/coefficient
         gamma_re = gamma + 0.2 * np.random.standard_normal(nz)
+        #store true parameter for checking
         gamma_re_true.append(gamma_re)
-        if i > nsubj // 4:
+
+        #for testing unbalanced case, let's change nobs per unit
+        if i > nsubj//4:
             nobs_i = 6
+
+        #generate exogenous variables
         X = np.random.standard_normal((nobs_i, nx))
-        Z = np.random.standard_normal((nobs_i, nz - 1))
+        Z = np.random.standard_normal((nobs_i, nz-1))
         Z = np.column_stack((np.ones(nobs_i), Z))
-        noise = 0.1 * np.random.randn(nobs_i)
+
+        noise = 0.1 * np.random.randn(nobs_i) #sig_e = 0.1
+
+        #generate endogenous variable
         Y = np.dot(X, beta) + np.dot(Z, gamma_re) + noise
-        X = np.hstack((X, Z))
+
+        #add random effect design matrix also to fixed effects to
+        #capture the mean
+        #this seems to be necessary to force mean of RE to zero !?
+        #(It's not required for estimation but interpretation of random
+        #effects covariance matrix changes - still need to check details.)
+        X = np.hstack((X,Z))
+
+        # create units and append to list
         new_unit = Unit(Y, X, Z)
         units.append(new_unit)
+
+
     m = OneWayMixed(units)
+
     import time
     t0 = time.time()
     m.initialize()
-    res = m.fit(maxiter=100, rtol=1e-05, params_rtol=1e-06, params_atol=1e-06)
+    res = m.fit(maxiter=100, rtol=1.0e-5, params_rtol=1e-6, params_atol=1e-6)
     t1 = time.time()
-    print('time for initialize and fit', t1 - t0)
+    print('time for initialize and fit', t1-t0)
     print('number of iterations', m.iterations)
+    #print dir(m)
+    #print vars(m)
     print('\nestimates for fixed effects')
     print(m.a)
     print(m.params)
     bfixed_cov = m.cov_fixed()
     print('beta fixed standard errors')
     print(np.sqrt(np.diag(bfixed_cov)))
+
     print(m.bse)
     b_re = m.params_random_units
     print('RE mean:', b_re.mean(0))
@@ -61,20 +94,25 @@ if 'ex1' in examples:
     print('np.cov(b_re, rowvar=0), sample statistic')
     print(np.cov(b_re, rowvar=0))
     print('std of above')
+    #need atleast_1d or diag raises exception
     print(np.sqrt(np.diag(np.atleast_1d(np.cov(b_re, rowvar=0)))))
     print('m.cov_random()')
     print(m.cov_random())
     print('std of above')
     print(res.std_random())
     print(np.sqrt(np.diag(m.cov_random())))
+
     print('\n(non)convergence of llf')
     print(m.history['llf'][-4:])
     print('convergence of parameters')
-    print(np.diff(np.vstack(m.history['params'][-4:]), axis=0))
+    #print np.diff(np.vstack(m.history[-4:])[:,1:],axis=0)
+    print(np.diff(np.vstack(m.history['params'][-4:]),axis=0))
     print('convergence of D')
     print(np.diff(np.array(m.history['D'][-4:]), axis=0))
-    zb = np.array([(unit.Z * unit.b[None, :]).sum(0) for unit in m.units])
-    """if Z is not included in X:
+
+    #zdotb = np.array([np.dot(unit.Z, unit.b) for unit in m.units])
+    zb = np.array([(unit.Z * unit.b[None,:]).sum(0) for unit in m.units])
+    '''if Z is not included in X:
     >>> np.dot(b_re.T, b_re)/100
     array([[ 0.03270611, -0.00916051],
            [-0.00916051,  0.26432783]])
@@ -82,31 +120,40 @@ if 'ex1' in examples:
     array([[ 0.0348722 , -0.00909159],
            [-0.00909159,  0.26846254]])
     >>> #note cov_random does not subtract mean!
-    """
+    '''
     print('\nchecking the random effects distribution and prediction')
     gamma_re_true = np.array(gamma_re_true)
     print('mean of random effect true', gamma_re_true.mean(0))
     print('mean from fixed effects   ', m.params[-2:])
     print('mean of estimated RE      ', b_re.mean(0))
+
     print()
     absmean_true = np.abs(gamma_re_true).mean(0)
-    mape = ((m.params[-2:] + b_re) / gamma_re_true - 1).mean(0) * 100
-    mean_abs_perc = np.abs(m.params[-2:] + b_re - gamma_re_true).mean(0
-        ) / absmean_true * 100
-    median_abs_perc = np.median(np.abs(m.params[-2:] + b_re - gamma_re_true), 0
-        ) / absmean_true * 100
-    rmse_perc = (m.params[-2:] + b_re - gamma_re_true).std(0
-        ) / absmean_true * 100
+    mape = ((m.params[-2:] + b_re) / gamma_re_true - 1).mean(0)*100
+    mean_abs_perc = np.abs((m.params[-2:] + b_re) - gamma_re_true).mean(0) \
+                       / absmean_true*100
+    median_abs_perc = np.median(np.abs((m.params[-2:] + b_re) - gamma_re_true), 0) \
+                         / absmean_true*100
+    rmse_perc = ((m.params[-2:] + b_re) - gamma_re_true).std(0) \
+                  / absmean_true*100
     print('mape           ', mape)
     print('mean_abs_perc  ', mean_abs_perc)
     print('median_abs_perc', median_abs_perc)
     print('rmse_perc (std)', rmse_perc)
-    print('llf', res.llf)
+    #from numpy.testing import assert_almost_equal
+    #assert is for n_units=100 in original example
+    #I changed random number generation, so this will not work anymore
+    #assert_almost_equal(rmse_perc, [ 34.14783884,  11.6031684 ], decimal=8)
+
+    #now returns res
+    print('llf', res.llf)  #based on MLE, does not include constant
     print('tvalues', res.tvalues)
     print('pvalues', res.pvalues)
     print(res.t_test([1]))
     print('test mean of both random effects variables is zero')
     print(res.f_test([[1]]))
     plots = res.plot_random_univariate(bins=50)
+    #fig = res.plot_scatter_pairs(0, 1) #no pairs
     import matplotlib.pyplot as plt
+
     plt.show()
diff --git a/statsmodels/sandbox/examples/ex_mixed_lls_timecorr.py b/statsmodels/sandbox/examples/ex_mixed_lls_timecorr.py
index 9fa6a9fe2..ed64d3a05 100644
--- a/statsmodels/sandbox/examples/ex_mixed_lls_timecorr.py
+++ b/statsmodels/sandbox/examples/ex_mixed_lls_timecorr.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Example using OneWayMixed with within group intertemporal correlation


@@ -45,52 +46,92 @@ used AR(1) as example, but only starting at second period. (?)
 Note: we do not impose AR structure in the estimation

 """
+
 import numpy as np
+
 from statsmodels.sandbox.panel.mixed import OneWayMixed, Unit
+
 examples = ['ex1']
+
 if 'ex1' in examples:
+    #np.random.seed(54321)
+    #np.random.seed(978326)
     nsubj = 200
-    units = []
-    nobs_i = 8
-    nx = 1
-    nz = nobs_i - 1
+    units  = []
+
+    nobs_i = 8 #number of observations per unit, changed below
+
+    nx = 1  #number fixed effects
+    nz = nobs_i - 1 ##number random effects
     beta = np.ones(nx)
-    gamma = 0.5 * np.ones(nz)
+    gamma = 0.5 * np.ones(nz)   #mean of random effect
+    #gamma[0] = 0
     gamma_re_true = []
     for i in range(nsubj):
+        #create data for one unit
+
+        #random effect/coefficient
+
         use_correlated = True
         if not use_correlated:
             gamma_re = gamma + 0.2 * np.random.standard_normal(nz)
         else:
+            #coefficients are AR(1) for all but first time periods
             from scipy import linalg as splinalg
             rho = 0.6
-            corr_re = splinalg.toeplitz(rho ** np.arange(nz))
+            corr_re = splinalg.toeplitz(rho**np.arange(nz))
             rvs = np.random.multivariate_normal(np.zeros(nz), corr_re)
             gamma_re = gamma + 0.2 * rvs
+
+        #store true parameter for checking
         gamma_re_true.append(gamma_re)
+
+        #generate exogenous variables
         X = np.random.standard_normal((nobs_i, nx))
-        time_dummies = (np.arange(nobs_i)[:, None] == np.arange(nobs_i)[
-            None, :]).astype(float)
-        Z = time_dummies[:, 1:]
-        noise = 0.1 * np.random.randn(nobs_i)
+
+        #try Z should be time dummies
+        time_dummies = (np.arange(nobs_i)[:, None] == np.arange(nobs_i)[None, :]).astype(float)
+        Z = time_dummies[:,1:]
+
+#        Z = np.random.standard_normal((nobs_i, nz-1))
+#        Z = np.column_stack((np.ones(nobs_i), Z))
+
+        noise = 0.1 * np.random.randn(nobs_i) #sig_e = 0.1
+
+        #generate endogenous variable
         Y = np.dot(X, beta) + np.dot(Z, gamma_re) + noise
+
+        #add random effect design matrix also to fixed effects to
+        #capture the mean
+        #this seems to be necessary to force mean of RE to zero !?
+        #(It's not required for estimation but interpretation of random
+        #effects covariance matrix changes - still need to check details.)
+        #X = np.hstack((X,Z))
         X = np.hstack((X, time_dummies))
+
+        # create units and append to list
         new_unit = Unit(Y, X, Z)
         units.append(new_unit)
+
+
     m = OneWayMixed(units)
+
     import time
     t0 = time.time()
     m.initialize()
-    res = m.fit(maxiter=100, rtol=1e-05, params_rtol=1e-06, params_atol=1e-06)
+    res = m.fit(maxiter=100, rtol=1.0e-5, params_rtol=1e-6, params_atol=1e-6)
     t1 = time.time()
-    print('time for initialize and fit', t1 - t0)
+    print('time for initialize and fit', t1-t0)
     print('number of iterations', m.iterations)
+    #print dir(m)
+    #print vars(m)
     print('\nestimates for fixed effects')
     print(m.a)
     print(m.params)
     bfixed_cov = m.cov_fixed()
     print('beta fixed standard errors')
     print(np.sqrt(np.diag(bfixed_cov)))
+
     print(m.bse)
     b_re = m.params_random_units
     print('RE mean:', b_re.mean(0))
@@ -100,22 +141,27 @@ if 'ex1' in examples:
     print('sample correlation of estimated random effects')
     print(np.corrcoef(b_re, rowvar=0))
     print('std of above')
+    #need atleast_1d or diag raises exception
     print(np.sqrt(np.diag(np.atleast_1d(np.cov(b_re, rowvar=0)))))
     print('m.cov_random()')
     print(m.cov_random())
     print('correlation from above')
-    print(res.cov_random() / res.std_random()[:, None] / res.std_random())
+    print(res.cov_random()/ res.std_random()[:,None] /res.std_random())
     print('std of above')
     print(res.std_random())
     print(np.sqrt(np.diag(m.cov_random())))
+
     print('\n(non)convergence of llf')
     print(m.history['llf'][-4:])
     print('convergence of parameters')
-    print(np.diff(np.vstack(m.history['params'][-4:]), axis=0))
+    #print np.diff(np.vstack(m.history[-4:])[:,1:],axis=0)
+    print(np.diff(np.vstack(m.history['params'][-4:]),axis=0))
     print('convergence of D')
     print(np.diff(np.array(m.history['D'][-4:]), axis=0))
-    zb = np.array([(unit.Z * unit.b[None, :]).sum(0) for unit in m.units])
-    """if Z is not included in X:
+
+    #zdotb = np.array([np.dot(unit.Z, unit.b) for unit in m.units])
+    zb = np.array([(unit.Z * unit.b[None,:]).sum(0) for unit in m.units])
+    '''if Z is not included in X:
     >>> np.dot(b_re.T, b_re)/100
     array([[ 0.03270611, -0.00916051],
            [-0.00916051,  0.26432783]])
@@ -123,37 +169,43 @@ if 'ex1' in examples:
     array([[ 0.0348722 , -0.00909159],
            [-0.00909159,  0.26846254]])
     >>> #note cov_random does not subtract mean!
-    """
+    '''
     print('\nchecking the random effects distribution and prediction')
     gamma_re_true = np.array(gamma_re_true)
     print('mean of random effect true', gamma_re_true.mean(0))
     print('mean from fixed effects   ', m.params[-2:])
     print('mean of estimated RE      ', b_re.mean(0))
+
     print()
     absmean_true = np.abs(gamma_re_true).mean(0)
-    mape = ((m.params[-nz:] + b_re) / gamma_re_true - 1).mean(0) * 100
-    mean_abs_perc = np.abs(m.params[-nz:] + b_re - gamma_re_true).mean(0
-        ) / absmean_true * 100
-    median_abs_perc = np.median(np.abs(m.params[-nz:] + b_re -
-        gamma_re_true), 0) / absmean_true * 100
-    rmse_perc = (m.params[-nz:] + b_re - gamma_re_true).std(0
-        ) / absmean_true * 100
+    mape = ((m.params[-nz:] + b_re) / gamma_re_true - 1).mean(0)*100
+    mean_abs_perc = np.abs((m.params[-nz:] + b_re) - gamma_re_true).mean(0) \
+                       / absmean_true*100
+    median_abs_perc = np.median(np.abs((m.params[-nz:] + b_re) - gamma_re_true), 0) \
+                         / absmean_true*100
+    rmse_perc = ((m.params[-nz:] + b_re) - gamma_re_true).std(0) \
+                  / absmean_true*100
     print('mape           ', mape)
     print('mean_abs_perc  ', mean_abs_perc)
     print('median_abs_perc', median_abs_perc)
     print('rmse_perc (std)', rmse_perc)
-    print('llf', res.llf)
+    #from numpy.testing import assert_almost_equal
+    #assert is for n_units=100 in original example
+    #I changed random number generation, so this will not work anymore
+    #assert_almost_equal(rmse_perc, [ 34.14783884,  11.6031684 ], decimal=8)
+
+    #now returns res
+    print('llf', res.llf)  #based on MLE, does not include constant
     print('tvalues', res.tvalues)
     print('pvalues', res.pvalues)
     rmat = np.zeros(len(res.params))
     rmat[-nz:] = 1
     print('t_test mean of random effects variables are zero')
     print(res.t_test(rmat))
-    print(
-        'f_test mean of both random effects variables is zero (joint hypothesis)'
-        )
+    print('f_test mean of both random effects variables is zero (joint hypothesis)')
     print(res.f_test(rmat))
-    plots = res.plot_random_univariate()
+    plots = res.plot_random_univariate() #(bins=50)
     fig = res.plot_scatter_all_pairs()
     import matplotlib.pyplot as plt
+
     plt.show()
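
The correlated-coefficients branch above uses toeplitz(rho**arange(nz)) as an AR(1) correlation matrix for the random effects. A small sketch (illustrative only, not part of the patch) confirming that draws from that covariance have roughly the intended lag-one correlation:

import numpy as np
from scipy import linalg as splinalg

rho, nz = 0.6, 7
corr_re = splinalg.toeplitz(rho ** np.arange(nz))    # AR(1) correlation matrix
rvs = np.random.multivariate_normal(np.zeros(nz), corr_re, size=5000)
print(np.round(np.corrcoef(rvs[:, 0], rvs[:, 1])[0, 1], 2))   # close to 0.6
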
diff --git a/statsmodels/sandbox/examples/ex_onewaygls.py b/statsmodels/sandbox/examples/ex_onewaygls.py
index 5919001cd..f025473c3 100644
--- a/statsmodels/sandbox/examples/ex_onewaygls.py
+++ b/statsmodels/sandbox/examples/ex_onewaygls.py
@@ -1,53 +1,196 @@
+# -*- coding: utf-8 -*-
 """Example: Test for equality of coefficients across groups/regressions


 Created on Sat Mar 27 22:36:51 2010
 Author: josef-pktd
 """
+
 import numpy as np
 from scipy import stats
+#from numpy.testing import assert_almost_equal
 import statsmodels.api as sm
 from statsmodels.sandbox.regression.onewaygls import OneWayLS
-example = ['null', 'diff'][1]
+
+#choose example
+#--------------
+example = ['null', 'diff'][1]   #null: identical coefficients across groups
 example_size = [10, 100][0]
-example_size = [(10, 2), (100, 2)][0]
+example_size = [(10,2), (100,2)][0]
 example_groups = ['2', '2-2'][1]
+#'2-2': 4 groups,
+#       groups 0 and 1 and groups 2 and 3 have identical parameters in DGP
+
+#generate example
+#----------------
 np.random.seed(87654589)
 nobs, nvars = example_size
 x1 = np.random.normal(size=(nobs, nvars))
-y1 = 10 + np.dot(x1, [15.0] * nvars) + 2 * np.random.normal(size=nobs)
+y1 = 10 + np.dot(x1,[15.]*nvars) + 2*np.random.normal(size=nobs)
+
 x1 = sm.add_constant(x1, prepend=False)
-x2 = np.random.normal(size=(nobs, nvars))
+#assert_almost_equal(x1, np.vander(x1[:,0],2), 16)
+#res1 = sm.OLS(y1, x1).fit()
+#print res1.params
+#print np.polyfit(x1[:,0], y1, 1)
+#assert_almost_equal(res1.params, np.polyfit(x1[:,0], y1, 1), 14)
+#print res1.summary(xname=['x1','const1'])
+
+#regression 2
+x2 = np.random.normal(size=(nobs,nvars))
 if example == 'null':
-    y2 = 10 + np.dot(x2, [15.0] * nvars) + 2 * np.random.normal(size=nobs)
+    y2 = 10 + np.dot(x2,[15.]*nvars) + 2*np.random.normal(size=nobs)  # if H0 is true
 else:
-    y2 = 19 + np.dot(x2, [17.0] * nvars) + 2 * np.random.normal(size=nobs)
+    y2 = 19 + np.dot(x2,[17.]*nvars) + 2*np.random.normal(size=nobs)
+
 x2 = sm.add_constant(x2, prepend=False)
-x = np.concatenate((x1, x2), 0)
-y = np.concatenate((y1, y2))
+
+# stack
+x = np.concatenate((x1,x2),0)
+y = np.concatenate((y1,y2))
 if example_groups == '2':
-    groupind = (np.arange(2 * nobs) > nobs - 1).astype(int)
+    groupind = (np.arange(2*nobs)>nobs-1).astype(int)
 else:
-    groupind = np.mod(np.arange(2 * nobs), 4)
+    groupind = np.mod(np.arange(2*nobs),4)
     groupind.sort()
-print("""
-Test for equality of coefficients for all exogenous variables""")
+#x = np.column_stack((x,x*groupind[:,None]))
+
+
+def print_results(res):
+    groupind = res.groups
+    #res.fitjoint()  #not really necessary, because called by ftest_summary
+    ft = res.ftest_summary()
+    #print ft[0]  #skip because table is nicer
+    print('\nTable of F-tests for overall or pairwise equality of coefficients')
+##    print 'hypothesis F-statistic         p-value  df_denom df_num  reject'
+##    for row in ft[1]:
+##        print row,
+##        if row[1][1]<0.05:
+##            print '*'
+##        else:
+##            print ''
+    from statsmodels.iolib import SimpleTable
+    print(SimpleTable([(['%r' % (row[0],)]
+                        + list(row[1])
+                        + ['*']*(row[1][1]>0.5).item() ) for row in ft[1]],
+                      headers=['pair', 'F-statistic','p-value','df_denom',
+                               'df_num']))
+
+    print('Notes: p-values are not corrected for many tests')
+    print('       (no Bonferroni correction)')
+    print('       * : reject at 5% uncorrected confidence level')
+    print('Null hypothesis: all or pairwise coefficient are the same')
+    print('Alternative hypothesis: all coefficients are different')
+
+    print('\nComparison with stats.f_oneway')
+    print(stats.f_oneway(*[y[groupind==gr] for gr in res.unique]))
+    print('\nLikelihood Ratio Test')
+    print('likelihood ratio    p-value       df')
+    print(res.lr_test())
+    print('Null model: pooled all coefficients are the same across groups,')
+    print('Alternative model: all coefficients are allowed to be different')
+    print('not verified but looks close to f-test result')
+
+    print('\nOLS parameters by group from individual, separate ols regressions')
+    for group in sorted(res.olsbygroup):
+        r = res.olsbygroup[group]
+        print(group, r.params)
+
+    print('\nCheck for heteroscedasticity, ')
+    print('variance and standard deviation for individual regressions')
+    print(' '*12, ' '.join('group %-10s' %(gr) for gr in res.unique))
+    print('variance    ', res.sigmabygroup)
+    print('standard dev', np.sqrt(res.sigmabygroup))
+
+#now added to class
+def print_results2(res):
+    groupind = res.groups
+    #res.fitjoint()  #not really necessary, because called by ftest_summary
+    ft = res.ftest_summary()
+    txt = ''
+    #print ft[0]  #skip because table is nicer
+    templ = \
+'''Table of F-tests for overall or pairwise equality of coefficients'
+%(tab)s
+
+
+Notes: p-values are not corrected for many tests
+       (no Bonferroni correction)
+       * : reject at 5%% uncorrected confidence level
+Null hypothesis: all or pairwise coefficient are the same'
+Alternative hypothesis: all coefficients are different'
+
+
+Comparison with stats.f_oneway
+%(statsfow)s
+
+
+Likelihood Ratio Test
+%(lrtest)s
+Null model: pooled all coefficients are the same across groups,'
+Alternative model: all coefficients are allowed to be different'
+not verified but looks close to f-test result'
+
+
+OLS parameters by group from individual, separate ols regressions'
+%(olsbg)s
+for group in sorted(res.olsbygroup):
+    r = res.olsbygroup[group]
+    print group, r.params
+
+
+Check for heteroscedasticity, '
+variance and standard deviation for individual regressions'
+%(grh)s
+variance    ', res.sigmabygroup
+standard dev', np.sqrt(res.sigmabygroup)
+'''
+
+    from statsmodels.iolib import SimpleTable
+    resvals = {}
+    resvals['tab'] = str(SimpleTable([(['%r' % (row[0],)]
+                        + list(row[1])
+                        + ['*']*(row[1][1]>0.5).item() ) for row in ft[1]],
+                      headers=['pair', 'F-statistic','p-value','df_denom',
+                               'df_num']))
+    resvals['statsfow'] = str(stats.f_oneway(*[y[groupind==gr] for gr in
+                                               res.unique]))
+    #resvals['lrtest'] = str(res.lr_test())
+    resvals['lrtest'] = str(SimpleTable([res.lr_test()],
+                                headers=['likelihood ratio', 'p-value', 'df'] ))
+
+    resvals['olsbg'] = str(SimpleTable([[group]
+                                        + res.olsbygroup[group].params.tolist()
+                                        for group in sorted(res.olsbygroup)]))
+    resvals['grh'] = str(SimpleTable(np.vstack([res.sigmabygroup,
+                                           np.sqrt(res.sigmabygroup)]),
+                                 headers=res.unique.tolist()))
+
+    return templ % resvals
+
+
+
+#get results for example
+#-----------------------
+
+print('\nTest for equality of coefficients for all exogenous variables')
 print('-------------------------------------------------------------')
-res = OneWayLS(y, x, groups=groupind.astype(int))
+res = OneWayLS(y,x, groups=groupind.astype(int))
 print_results(res)
-print("""

-One way ANOVA, constant is the only regressor""")
+print('\n\nOne way ANOVA, constant is the only regressor')
 print('---------------------------------------------')
+
 print('this is the same as scipy.stats.f_oneway')
-res = OneWayLS(y, np.ones(len(y)), groups=groupind)
+res = OneWayLS(y,np.ones(len(y)), groups=groupind)
 print_results(res)
-print("""

-One way ANOVA, constant is the only regressor with het is true""")
+
+print('\n\nOne way ANOVA, constant is the only regressor with het is true')
 print('--------------------------------------------------------------')
+
 print('this is the similar to scipy.stats.f_oneway,')
 print('but variance is not assumed to be the same across groups')
-res = OneWayLS(y, np.ones(len(y)), groups=groupind.astype(str), het=True)
+res = OneWayLS(y,np.ones(len(y)), groups=groupind.astype(str), het=True)
 print_results(res)
-print(res.print_summary())
+print(res.print_summary()) #(res)
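
With homoscedastic errors and two groups, the overall equality-of-coefficients test reported by OneWayLS is closely related to the classic Chow F statistic, which compares the pooled residual sum of squares with the sum of the per-group ones. A sketch of that statistic on a DGP like the 'diff' case above (illustrative only, not part of the patch):

import numpy as np
from scipy import stats
import statsmodels.api as sm

np.random.seed(0)
x0 = sm.add_constant(np.random.randn(50))
y0 = 10 + 15 * x0[:, 1] + np.random.randn(50)
x1 = sm.add_constant(np.random.randn(50))
y1 = 19 + 17 * x1[:, 1] + np.random.randn(50)        # different coefficients
ssr_groups = sm.OLS(y0, x0).fit().ssr + sm.OLS(y1, x1).fit().ssr
ssr_pooled = sm.OLS(np.r_[y0, y1], np.r_[x0, x1]).fit().ssr
k, n = 2, 100                                        # params per group, total obs
F = ((ssr_pooled - ssr_groups) / k) / (ssr_groups / (n - 2 * k))
print(F, stats.f.sf(F, k, n - 2 * k))                # small p-value: reject equality
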
diff --git a/statsmodels/sandbox/examples/ex_random_panel.py b/statsmodels/sandbox/examples/ex_random_panel.py
index cc2bd9b88..a555b531f 100644
--- a/statsmodels/sandbox/examples/ex_random_panel.py
+++ b/statsmodels/sandbox/examples/ex_random_panel.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """

 Created on Fri May 18 13:05:47 2012
@@ -6,46 +7,73 @@ Author: Josef Perktold

 moved example from main of random_panel
 """
+
 import numpy as np
 from statsmodels.sandbox.panel.panel_short import ShortPanelGLS, ShortPanelGLS2
 from statsmodels.sandbox.panel.random_panel import PanelSample
 import statsmodels.sandbox.panel.correlation_structures as cs
+
 import statsmodels.stats.sandwich_covariance as sw
+#from statsmodels.stats.sandwich_covariance import (
+#                   S_hac_groupsum, weights_bartlett, _HCCM2)
 from statsmodels.stats.moment_helpers import se_cov
 cov_nw_panel2 = sw.cov_nw_groupsum
+
+
 examples = ['ex1']
+
+
 if 'ex1' in examples:
     nobs = 100
     nobs_i = 5
     n_groups = nobs // nobs_i
     k_vars = 3
+
+#    dgp = PanelSample(nobs, k_vars, n_groups, corr_structure=cs.corr_equi,
+#                      corr_args=(0.6,))
+#    dgp = PanelSample(nobs, k_vars, n_groups, corr_structure=cs.corr_ar,
+#                      corr_args=([1, -0.95],))
     dgp = PanelSample(nobs, k_vars, n_groups, corr_structure=cs.corr_arma,
-        corr_args=([1], [1.0, -0.9]), seed=377769)
+                      corr_args=([1], [1., -0.9],), seed=377769)
     print('seed', dgp.seed)
     y = dgp.generate_panel()
     noise = y - dgp.y_true
-    print(np.corrcoef(y.reshape(-1, n_groups, order='F')))
-    print(np.corrcoef(noise.reshape(-1, n_groups, order='F')))
+    print(np.corrcoef(y.reshape(-1,n_groups, order='F')))
+    print(np.corrcoef(noise.reshape(-1,n_groups, order='F')))
+
     mod = ShortPanelGLS2(y, dgp.exog, dgp.groups)
     res = mod.fit()
     print(res.params)
     print(res.bse)
+    #Now what?
+    #res.resid is of transformed model
+    #np.corrcoef(res.resid.reshape(-1,n_groups, order='F'))
     y_pred = np.dot(mod.exog, res.params)
     resid = y - y_pred
-    print(np.corrcoef(resid.reshape(-1, n_groups, order='F')))
+    print(np.corrcoef(resid.reshape(-1,n_groups, order='F')))
     print(resid.std())
     err = y_pred - dgp.y_true
     print(err.std())
+    #OLS standard errors are too small
     mod.res_pooled.params
     mod.res_pooled.bse
+    #heteroscedasticity robust does not help
     mod.res_pooled.HC1_se
+    #compare with cluster robust se
+
     print(sw.se_cov(sw.cov_cluster(mod.res_pooled, dgp.groups.astype(int))))
+    #not bad, pretty close to panel estimator
+    #and with Newey-West Hac
     print(sw.se_cov(sw.cov_nw_panel(mod.res_pooled, 4, mod.group.groupidx)))
+    #too small, assuming no bugs,
+    #see Peterson assuming it refers to same kind of model
     print(dgp.cov)
+
     mod2 = ShortPanelGLS(y, dgp.exog, dgp.groups)
     res2 = mod2.fit_iterative(2)
     print(res2.params)
     print(res2.bse)
+    #both implementations produce the same results:
     from numpy.testing import assert_almost_equal
     assert_almost_equal(res.params, res2.params, decimal=12)
     assert_almost_equal(res.bse, res2.bse, decimal=13)
@@ -53,49 +81,71 @@ if 'ex1' in examples:
     res5 = mod5.fit_iterative(5)
     print(res5.params)
     print(res5.bse)
+    #fitting once is the same as OLS
+    #note: I need to create a new instance, otherwise it continues fitting
     mod1 = ShortPanelGLS(y, dgp.exog, dgp.groups)
     res1 = mod1.fit_iterative(1)
     res_ols = mod1._fit_ols()
     assert_almost_equal(res1.params, res_ols.params, decimal=12)
     assert_almost_equal(res1.bse, res_ols.bse, decimal=13)
+
+    #cov_hac_panel with uniform_kernel is the same as cov_cluster for balanced
+    #panel with full length kernel
+    #I fixed the default correction to be equal
     mod2._fit_ols()
     cov_clu = sw.cov_cluster(mod2.res_pooled, dgp.groups.astype(int))
     clubse = se_cov(cov_clu)
     cov_uni = sw.cov_nw_panel(mod2.res_pooled, 4, mod2.group.groupidx,
-        weights_func=sw.weights_uniform, use_correction='cluster')
+                              weights_func=sw.weights_uniform,
+                              use_correction='cluster')
     assert_almost_equal(cov_uni, cov_clu, decimal=7)
+
+    #without correction
     cov_clu2 = sw.cov_cluster(mod2.res_pooled, dgp.groups.astype(int),
-        use_correction=False)
+                              use_correction=False)
     cov_uni2 = sw.cov_nw_panel(mod2.res_pooled, 4, mod2.group.groupidx,
-        weights_func=sw.weights_uniform, use_correction=False)
+                              weights_func=sw.weights_uniform,
+                              use_correction=False)
     assert_almost_equal(cov_uni2, cov_clu2, decimal=8)
+
     cov_white = sw.cov_white_simple(mod2.res_pooled)
     cov_pnw0 = sw.cov_nw_panel(mod2.res_pooled, 0, mod2.group.groupidx,
-        use_correction='hac')
+                              use_correction='hac')
     assert_almost_equal(cov_pnw0, cov_white, decimal=13)
+
     time = np.tile(np.arange(nobs_i), n_groups)
+    #time = mod2.group.group_int
     cov_pnw1 = sw.cov_nw_panel(mod2.res_pooled, 4, mod2.group.groupidx)
     cov_pnw2 = cov_nw_panel2(mod2.res_pooled, 4, time)
-    c2, ct, cg = sw.cov_cluster_2groups(mod2.res_pooled, time, dgp.groups.
-        astype(int), use_correction=False)
-    ct_nw0 = cov_nw_panel2(mod2.res_pooled, 0, time, weights_func=sw.
-        weights_uniform, use_correction=False)
-    cg_nw0 = cov_nw_panel2(mod2.res_pooled, 0, dgp.groups.astype(int),
-        weights_func=sw.weights_uniform, use_correction=False)
+    #s = sw.group_sums(x, time)
+
+    c2, ct, cg = sw.cov_cluster_2groups(mod2.res_pooled, time, dgp.groups.astype(int), use_correction=False)
+    ct_nw0 = cov_nw_panel2(mod2.res_pooled, 0, time, weights_func=sw.weights_uniform, use_correction=False)
+    cg_nw0 = cov_nw_panel2(mod2.res_pooled, 0, dgp.groups.astype(int), weights_func=sw.weights_uniform, use_correction=False)
     assert_almost_equal(ct_nw0, ct, decimal=13)
-    assert_almost_equal(cg_nw0, cg, decimal=13)
+    assert_almost_equal(cg_nw0, cg, decimal=13)   #pnw2 0 lags
     assert_almost_equal(cov_clu2, cg, decimal=13)
-    assert_almost_equal(cov_uni2, cg, decimal=8)
+    assert_almost_equal(cov_uni2, cg, decimal=8)  #pnw all lags
+
+
+
+
     import pandas as pa
-    se = pa.DataFrame(res_ols.bse[None, :], index=['OLS'])
-    se = se.append(pa.DataFrame(res5.bse[None, :], index=['PGLSit5']))
+    #pandas.DataFrame does not do inplace append
+    se = pa.DataFrame(res_ols.bse[None,:], index=['OLS'])
+    se = se.append(pa.DataFrame(res5.bse[None,:], index=['PGLSit5']))
     clbse = sw.se_cov(sw.cov_cluster(mod.res_pooled, dgp.groups.astype(int)))
-    se = se.append(pa.DataFrame(clbse[None, :], index=['OLSclu']))
+    se = se.append(pa.DataFrame(clbse[None,:], index=['OLSclu']))
     pnwse = sw.se_cov(sw.cov_nw_panel(mod.res_pooled, 4, mod.group.groupidx))
-    se = se.append(pa.DataFrame(pnwse[None, :], index=['OLSpnw']))
+    se = se.append(pa.DataFrame(pnwse[None,:], index=['OLSpnw']))
     print(se)
+    #list(se.index)
     from statsmodels.iolib.table import SimpleTable
     headers = [str(i) for i in se.columns]
-    stubs = list(se.index)
-    print(SimpleTable(np.asarray(se), headers=headers, stubs=stubs, txt_fmt
-        =dict(data_fmts=['%10.4f']), title='Standard Errors'))
+    stubs=list(se.index)
+#    print SimpleTable(np.round(np.asarray(se), 4),
+#                      headers=headers,
+#                      stubs=stubs)
+    print(SimpleTable(np.asarray(se), headers=headers, stubs=stubs,
+                      txt_fmt=dict(data_fmts=['%10.4f']),
+                      title='Standard Errors'))
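
The standard-error table above is built with DataFrame.append, which recent pandas releases have removed; a hedged sketch of the same construction via pd.concat, using hypothetical placeholder arrays in place of the fitted bse values:

    import numpy as np
    import pandas as pd

    # hypothetical placeholders standing in for res_ols.bse, res5.bse, clbse and pnwse above
    res_ols_bse = np.array([0.10, 0.20, 0.30])
    res5_bse = np.array([0.12, 0.19, 0.31])
    clbse = np.array([0.15, 0.22, 0.33])
    pnwse = np.array([0.14, 0.21, 0.32])

    rows = {'OLS': res_ols_bse, 'PGLSit5': res5_bse, 'OLSclu': clbse, 'OLSpnw': pnwse}
    # one single-row DataFrame per estimator, stacked instead of appended in place
    se = pd.concat([pd.DataFrame(v[None, :], index=[k]) for k, v in rows.items()])
    print(se)
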
diff --git a/statsmodels/sandbox/examples/example_crossval.py b/statsmodels/sandbox/examples/example_crossval.py
index 25d738d1c..135883586 100644
--- a/statsmodels/sandbox/examples/example_crossval.py
+++ b/statsmodels/sandbox/examples/example_crossval.py
@@ -1,38 +1,61 @@
+
 import numpy as np
+
 from statsmodels.sandbox.tools import cross_val
+
 if __name__ == '__main__':
+    #A: josef-pktd
+
     import statsmodels.api as sm
+    #from statsmodels.datasets.longley import load
     from statsmodels.datasets.stackloss import load
+
     data = load()
     data.exog = sm.tools.add_constant(data.exog, prepend=False)
+
     resols = sm.OLS(data.endog, data.exog).fit()
+
     print('\n OLS leave 1 out')
     for inidx, outidx in cross_val.LeaveOneOut(len(data.endog)):
-        res = sm.OLS(data.endog[inidx], data.exog[inidx, :]).fit()
-        print(data.endog[outidx], res.model.predict(res.params, data.exog[
-            outidx, :], end=' '))
-        print(data.endog[outidx] - res.model.predict(res.params, data.exog[
-            outidx, :]))
+        res = sm.OLS(data.endog[inidx], data.exog[inidx,:]).fit()
+        print(data.endog[outidx], res.model.predict(res.params, data.exog[outidx,:]), end=' ')
+        print(data.endog[outidx] - res.model.predict(res.params, data.exog[outidx,:]))
+
     print('\n OLS leave 2 out')
     resparams = []
     for inidx, outidx in cross_val.LeavePOut(len(data.endog), 2):
-        res = sm.OLS(data.endog[inidx], data.exog[inidx, :]).fit()
+        res = sm.OLS(data.endog[inidx], data.exog[inidx,:]).fit()
+        #print data.endog[outidx], res.model.predict(data.exog[outidx,:]),
+        #print ((data.endog[outidx] - res.model.predict(data.exog[outidx,:]))**2).sum()
         resparams.append(res.params)
+
     resparams = np.array(resparams)
     print(resparams)
+
     doplots = 1
     if doplots:
         from matplotlib.font_manager import FontProperties
         import matplotlib.pyplot as plt
+
         plt.figure()
         figtitle = 'Leave2out parameter estimates'
-        t = plt.gcf().text(0.5, 0.95, figtitle, horizontalalignment=
-            'center', fontproperties=FontProperties(size=16))
+
+        t = plt.gcf().text(0.5,
+        0.95, figtitle,
+        horizontalalignment='center',
+        fontproperties=FontProperties(size=16))
+
         for i in range(resparams.shape[1]):
-            plt.subplot(4, 2, i + 1)
-            plt.hist(resparams[:, i], bins=10)
+            plt.subplot(4, 2, i+1)
+            plt.hist(resparams[:,i], bins = 10)
+            #plt.title("Leave2out parameter estimates")
         plt.show()
-    for inidx, outidx in cross_val.KStepAhead(20, 2):
+
+
+
+
+    for inidx, outidx in cross_val.KStepAhead(20,2):
+        #note: the following were broken because KStepAhead now returns a slice by default
         print(inidx)
         print(np.ones(20)[inidx].sum(), np.arange(20)[inidx][-4:])
         print(outidx)
diff --git a/statsmodels/sandbox/examples/example_gam.py b/statsmodels/sandbox/examples/example_gam.py
index 52dd9cae6..bb8422a72 100644
--- a/statsmodels/sandbox/examples/example_gam.py
+++ b/statsmodels/sandbox/examples/example_gam.py
@@ -1,36 +1,47 @@
-"""original example for checking how far GAM works
+'''original example for checking how far GAM works

 Note: uncomment plt.show() to display graphs
-"""
+'''
+
 import time
+
 import numpy as np
 import numpy.random as R
 import matplotlib.pyplot as plt
 import scipy.stats
+
 from statsmodels.sandbox.gam import AdditiveModel
-from statsmodels.sandbox.gam import Model as GAM
+from statsmodels.sandbox.gam import Model as GAM #?
 from statsmodels.genmod.families import family
-example = 2
+
+example = 2  # 1,2 or 3
+
 standardize = lambda x: (x - x.mean()) / x.std()
-demean = lambda x: x - x.mean()
+demean = lambda x: (x - x.mean())
 nobs = 150
 x1 = R.standard_normal(nobs)
 x1.sort()
 x2 = R.standard_normal(nobs)
 x2.sort()
 y = R.standard_normal((nobs,))
-f1 = lambda x1: x1 + x1 ** 2 - 3 - 1 * x1 ** 3 + 0.1 * np.exp(-x1 / 4.0)
-f2 = lambda x2: x2 + x2 ** 2 - 0.1 * np.exp(x2 / 4.0)
+
+f1 = lambda x1: (x1 + x1**2 - 3 - 1 * x1**3 + 0.1 * np.exp(-x1/4.))
+f2 = lambda x2: (x2 + x2**2 - 0.1 * np.exp(x2/4.))
 z = standardize(f1(x1)) + standardize(f2(x2))
-z = standardize(z) * 2
+z = standardize(z) * 2 # 0.1
+
 y += z
-d = np.array([x1, x2]).T
+d = np.array([x1,x2]).T
+
+
 if example == 1:
-    print('normal')
+    print("normal")
     m = AdditiveModel(d)
     m.fit(y)
-    x = np.linspace(-2, 2, 50)
+    x = np.linspace(-2,2,50)
+
     print(m)
+
     y_pred = m.results.predict(d)
     plt.figure()
     plt.plot(y, '.')
@@ -38,8 +49,10 @@ if example == 1:
     plt.plot(y_pred, 'r-', label='AdditiveModel')
     plt.legend()
     plt.title('gam.AdditiveModel')
+
+
 if example == 2:
-    print('binomial')
+    print("binomial")
     f = family.Binomial()
     b = np.asarray([scipy.stats.bernoulli.rvs(p) for p in f.link.inverse(y)])
     b.shape = y.shape
@@ -47,24 +60,41 @@ if example == 2:
     toc = time.time()
     m.fit(b)
     tic = time.time()
-    print(tic - toc)
+    print(tic-toc)
+
+
 if example == 3:
-    print('Poisson')
+    print("Poisson")
     f = family.Poisson()
-    y = y / y.max() * 3
+    y = y/y.max() * 3
     yp = f.link.inverse(y)
-    p = np.asarray([scipy.stats.poisson.rvs(p) for p in f.link.inverse(y)],
-        float)
+    p = np.asarray([scipy.stats.poisson.rvs(p) for p in f.link.inverse(y)], float)
     p.shape = y.shape
     m = GAM(p, d, family=f)
     toc = time.time()
     m.fit(p)
     tic = time.time()
-    print(tic - toc)
+    print(tic-toc)
+
+
 plt.figure()
 plt.plot(x1, standardize(m.smoothers[0](x1)), 'r')
 plt.plot(x1, standardize(f1(x1)), linewidth=2)
 plt.figure()
 plt.plot(x2, standardize(m.smoothers[1](x2)), 'r')
 plt.plot(x2, standardize(f2(x2)), linewidth=2)
+
+
+
+
 plt.show()
+
+
+
+##     pylab.figure(num=1)
+##     pylab.plot(x1, standardize(m.smoothers[0](x1)), 'b')
+##     pylab.plot(x1, standardize(f1(x1)), linewidth=2)
+##     pylab.figure(num=2)
+##     pylab.plot(x2, standardize(m.smoothers[1](x2)), 'b')
+##     pylab.plot(x2, standardize(f2(x2)), linewidth=2)
+##     pylab.show()
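
The binomial branch above draws Bernoulli outcomes from f.link.inverse(y); assuming the default logit link, that inverse is just the logistic function, as this small check illustrates:

    import numpy as np
    from statsmodels.genmod import families

    f = families.Binomial()                # default link is logit
    eta = np.linspace(-3, 3, 7)
    p = f.link.inverse(eta)                # probabilities used to draw the Bernoulli outcomes
    assert np.allclose(p, 1.0 / (1.0 + np.exp(-eta)))
    print(p)
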
diff --git a/statsmodels/sandbox/examples/example_gam_0.py b/statsmodels/sandbox/examples/example_gam_0.py
index 7677ff966..1d6e3c9cb 100644
--- a/statsmodels/sandbox/examples/example_gam_0.py
+++ b/statsmodels/sandbox/examples/example_gam_0.py
@@ -1,4 +1,4 @@
-"""first examples for gam and PolynomialSmoother used for debugging
+'''first examples for gam and PolynomialSmoother used for debugging

 This example was written as a test case.
 The data generating process is chosen so the parameters are well identified
@@ -6,80 +6,113 @@ and estimated.


 Note: uncomment plt.show() to display graphs
-"""
+'''
+
 import time
+
 import numpy as np
 import numpy.random as R
 import matplotlib.pyplot as plt
 import scipy.stats
+
 from statsmodels.sandbox.gam import AdditiveModel
-from statsmodels.sandbox.gam import Model as GAM
+from statsmodels.sandbox.gam import Model as GAM #?
 from statsmodels.genmod import families
-example = 2
+
+
+example = 2 #3  # 1,2 or 3
+
+#np.random.seed(987654)
+
 standardize = lambda x: (x - x.mean()) / x.std()
-demean = lambda x: x - x.mean()
+demean = lambda x: (x - x.mean())
 nobs = 500
-lb, ub = -1.0, 1.0
-x1 = R.uniform(lb, ub, nobs)
+lb, ub = -1., 1. #for Poisson
+#lb, ub = -0.75, 2 #0.75 #for Binomial
+x1 = R.uniform(lb, ub, nobs)   #R.standard_normal(nobs)
 x1 = np.linspace(lb, ub, nobs)
 x1.sort()
-x2 = R.uniform(lb, ub, nobs)
+x2 = R.uniform(lb, ub, nobs)   #
+#x2 = R.standard_normal(nobs)
 x2.sort()
-x2 = x2 + np.exp(x2 / 2.0)
-y = 0.5 * R.uniform(lb, ub, nobs)
-f1 = lambda x1: 2 * x1 - 0.5 * x1 ** 2 - 0.75 * x1 ** 3
-f2 = lambda x2: x2 - 1 * x2 ** 2
+#x2 = np.cos(x2)
+x2 = x2 + np.exp(x2/2.)
+#x2 = np.log(x2-x2.min()+0.1)
+y = 0.5 * R.uniform(lb, ub, nobs)   #R.standard_normal((nobs,))
+
+f1 = lambda x1: (2*x1 - 0.5 * x1**2  - 0.75 * x1**3) # + 0.1 * np.exp(-x1/4.))
+f2 = lambda x2: (x2 - 1* x2**2) # - 0.75 * np.exp(x2))
 z = standardize(f1(x1)) + standardize(f2(x2))
-z = standardize(z) + 1
+z = standardize(z) + 1 # 0.1
+#try this
 z = f1(x1) + f2(x2)
+#z = demean(z)
 z -= np.median(z)
 print('z.std()', z.std())
+#z = standardize(z) + 0.2
+# with standardize I get better values, but I do not know what the true params are
 print(z.mean(), z.min(), z.max())
+
+#y += z  #noise
 y = z
-d = np.array([x1, x2]).T
+
+d = np.array([x1,x2]).T
+
+
 if example == 1:
-    print('normal')
+    print("normal")
     m = AdditiveModel(d)
     m.fit(y)
-    x = np.linspace(-2, 2, 50)
+    x = np.linspace(-2,2,50)
+
     print(m)
+
+
 if example == 2:
-    print('binomial')
+    print("binomial")
     mod_name = 'Binomial'
     f = families.Binomial()
+    #b = np.asarray([scipy.stats.bernoulli.rvs(p) for p in f.link.inverse(y)])
     b = np.asarray([scipy.stats.bernoulli.rvs(p) for p in f.link.inverse(z)])
     b.shape = y.shape
     m = GAM(b, d, family=f)
     toc = time.time()
     m.fit(b)
     tic = time.time()
-    print(tic - toc)
+    print(tic-toc)
+    #for plotting
     yp = f.link.inverse(y)
     p = b
+
+
 if example == 3:
-    print('Poisson')
+    print("Poisson")
     f = families.Poisson()
+    #y = y/y.max() * 3
     yp = f.link.inverse(z)
-    p = np.asarray([scipy.stats.poisson.rvs(val) for val in f.link.inverse(
-        z)], float)
+    p = np.asarray([scipy.stats.poisson.rvs(val) for val in f.link.inverse(z)],
+                   float)
     p.shape = y.shape
     m = GAM(p, d, family=f)
     toc = time.time()
     m.fit(p)
     tic = time.time()
-    print(tic - toc)
+    print(tic-toc)
+
 if example > 1:
-    y_pred = m.results.mu
+    y_pred = m.results.mu# + m.results.alpha#m.results.predict(d)
     plt.figure()
-    plt.subplot(2, 2, 1)
+    plt.subplot(2,2,1)
     plt.plot(p, '.')
     plt.plot(yp, 'b-', label='true')
     plt.plot(y_pred, 'r-', label='GAM')
     plt.legend(loc='upper left')
     plt.title('gam.GAM ' + mod_name)
+
     counter = 2
     for ii, xx in zip(['z', 'x1', 'x2'], [z, x1, x2]):
         sortidx = np.argsort(xx)
+        #plt.figure()
         plt.subplot(2, 2, counter)
         plt.plot(xx[sortidx], p[sortidx], '.')
         plt.plot(xx[sortidx], yp[sortidx], 'b.', label='true')
@@ -87,13 +120,50 @@ if example > 1:
         plt.legend(loc='upper left')
         plt.title('gam.GAM ' + mod_name + ' ' + ii)
         counter += 1
+
+#    counter = 2
+#    for ii, xx in zip(['z', 'x1', 'x2'], [z, x1, x2]):
+#        #plt.figure()
+#        plt.subplot(2, 2, counter)
+#        plt.plot(xx, p, '.')
+#        plt.plot(xx, yp, 'b-', label='true')
+#        plt.plot(xx, y_pred, 'r-', label='GAM')
+#        plt.legend(loc='upper left')
+#        plt.title('gam.GAM Poisson ' + ii)
+#        counter += 1
+
     plt.figure()
-    plt.plot(z, 'b-', label='true')
+    plt.plot(z, 'b-', label='true' )
     plt.plot(np.log(m.results.mu), 'r-', label='GAM')
     plt.title('GAM Poisson, raw')
+
+
 plt.figure()
 plt.plot(x1, standardize(m.smoothers[0](x1)), 'r')
 plt.plot(x1, standardize(f1(x1)), linewidth=2)
 plt.figure()
 plt.plot(x2, standardize(m.smoothers[1](x2)), 'r')
 plt.plot(x2, standardize(f2(x2)), linewidth=2)
+
+##y_pred = m.results.predict(d)
+##plt.figure()
+##plt.plot(z, p, '.')
+##plt.plot(z, yp, 'b-', label='true')
+##plt.plot(z, y_pred, 'r-', label='AdditiveModel')
+##plt.legend()
+##plt.title('gam.AdditiveModel')
+
+
+
+
+#plt.show()
+
+
+
+##     pylab.figure(num=1)
+##     pylab.plot(x1, standardize(m.smoothers[0](x1)), 'b')
+##     pylab.plot(x1, standardize(f1(x1)), linewidth=2)
+##     pylab.figure(num=2)
+##     pylab.plot(x2, standardize(m.smoothers[1](x2)), 'b')
+##     pylab.plot(x2, standardize(f2(x2)), linewidth=2)
+##     pylab.show()
diff --git a/statsmodels/sandbox/examples/example_mle.py b/statsmodels/sandbox/examples/example_mle.py
index 21ef78de7..5793c23c8 100644
--- a/statsmodels/sandbox/examples/example_mle.py
+++ b/statsmodels/sandbox/examples/example_mle.py
@@ -1,49 +1,68 @@
-"""Examples to compare MLE with OLS
+'''Examples to compare MLE with OLS

 TODO: compare standard error of parameter estimates
-"""
+'''
+
 import numpy as np
 from scipy import optimize
+
 import statsmodels.api as sm
 from statsmodels.datasets.longley import load
-print("""
-Example 1: Artificial Data""")
+
+print('\nExample 1: Artificial Data')
 print('--------------------------\n')
+
 np.random.seed(54321)
-X = np.random.rand(40, 2)
+X = np.random.rand(40,2)
 X = sm.add_constant(X, prepend=False)
 beta = np.array((3.5, 5.7, 150))
-Y = np.dot(X, beta) + np.random.standard_normal(40)
-mod2 = sm.OLS(Y, X)
+Y = np.dot(X,beta) + np.random.standard_normal(40)
+mod2 = sm.OLS(Y,X)
 res2 = mod2.fit()
-f2 = lambda params: -1 * mod2.loglike(params)
+f2 = lambda params: -1*mod2.loglike(params)
 resfmin = optimize.fmin(f2, np.ones(3), ftol=1e-10)
 print('OLS')
 print(res2.params)
 print('MLE')
 print(resfmin)
-print("""
-Example 2: Longley Data, high multicollinearity""")
+
+
+
+print('\nExample 2: Longley Data, high multicollinearity')
 print('-----------------------------------------------\n')
+
 data = load()
 data.exog = sm.add_constant(data.exog, prepend=False)
 mod = sm.OLS(data.endog, data.exog)
-f = lambda params: -1 * mod.loglike(params)
-score = lambda params: -1 * mod.score(params)
+f = lambda params: -1*mod.loglike(params)
+score = lambda params: -1*mod.score(params)
+
+#now you're set up to try to minimize or root-find, but I couldn't get this one to work
+#note that if you want to get the results, they're also a property of mod, so you can do the following
+
 res = mod.fit()
+#print mod.results.params
 print('OLS')
 print(res.params)
 print('MLE')
-resfmin2 = optimize.fmin(f, np.ones(7), maxfun=5000, maxiter=5000, xtol=
-    1e-10, ftol=1e-10)
+#resfmin2 = optimize.fmin(f, mod.results.params*0.9, maxfun=5000, maxiter=5000, xtol=1e-10, ftol= 1e-10)
+resfmin2 = optimize.fmin(f, np.ones(7), maxfun=5000, maxiter=5000, xtol=1e-10, ftol= 1e-10)
 print(resfmin2)
-xtxi = np.linalg.inv(np.dot(data.exog.T, data.exog))
+# there is not a unique solution? Is this due to the multicollinearity? Would it improve
+# with an analytically defined score function?
+
+#check X'X matrix
+xtxi = np.linalg.inv(np.dot(data.exog.T,data.exog))
 eval, evec = np.linalg.eig(xtxi)
 print('Eigenvalues')
 print(eval)
+# look at correlation
 print('correlation matrix')
-print(np.corrcoef(data.exog[:, :-1], rowvar=0))
+print(np.corrcoef(data.exog[:,:-1], rowvar=0)) #exclude constant
+# --> conclusion high multicollinearity
+
+# compare
 print('with matrix formula')
-print(np.dot(xtxi, np.dot(data.exog.T, data.endog[:, np.newaxis])).ravel())
+print(np.dot(xtxi,np.dot(data.exog.T, data.endog[:,np.newaxis])).ravel())
 print('with pinv')
-print(np.dot(np.linalg.pinv(data.exog), data.endog[:, np.newaxis]).ravel())
+print(np.dot(np.linalg.pinv(data.exog), data.endog[:,np.newaxis]).ravel())
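
The eigenvalue check above diagnoses the Longley multicollinearity; a complementary, commonly used diagnostic is the condition number of the design matrix, sketched here under the assumption that load() returns the usual Longley exog:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.datasets.longley import load

    data = load()
    exog = sm.add_constant(np.asarray(data.exog), prepend=False)
    # a very large condition number signals near-collinearity, which is why the
    # unscored fmin search above struggles to find a unique optimum
    print('condition number:', np.linalg.cond(exog))
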
diff --git a/statsmodels/sandbox/examples/example_nbin.py b/statsmodels/sandbox/examples/example_nbin.py
index 070901bb7..d61820493 100644
--- a/statsmodels/sandbox/examples/example_nbin.py
+++ b/statsmodels/sandbox/examples/example_nbin.py
@@ -1,4 +1,5 @@
-"""
+# -*- coding: utf-8 -*-
+'''
 Author: Vincent Arel-Bundock <varel@umich.edu>
 Date: 2012-08-25

@@ -18,20 +19,23 @@ compare to equivalent models in R. Results usually agree up to the 4th digit.
 The NB-P and left-truncated model results have not been compared to other
 implementations. Note that NB-P appears to only have been implemented in the
 LIMDEP software.
-"""
+'''
 from urllib.request import urlopen
+
 import numpy as np
 from numpy.testing import assert_almost_equal
 from scipy.special import digamma
 from scipy.stats import nbinom
 import pandas
 import patsy
+
 from statsmodels.base.model import GenericLikelihoodModel
 from statsmodels.base.model import GenericLikelihoodModelResults


+#### Negative Binomial Log-likelihoods ####
 def _ll_nbp(y, X, beta, alph, Q):
-    """
+    r'''
     Negative Binomial Log-likelihood -- type P

     References:
@@ -41,50 +45,63 @@ def _ll_nbp(y, X, beta, alph, Q):
     Hilbe, J.M. 2011. "Negative binomial regression". Cambridge University Press.

     Following notation in Greene (2008), with negative binomial heterogeneity
-    parameter :math:`\\alpha`:
+    parameter :math:`\alpha`:

     .. math::

-        \\lambda_i = exp(X\\beta)\\\\
-        \\theta = 1 / \\alpha \\\\
-        g_i = \\theta \\lambda_i^Q \\\\
-        w_i = g_i/(g_i + \\lambda_i) \\\\
-        r_i = \\theta / (\\theta+\\lambda_i) \\\\
-        ln \\mathcal{L}_i = ln \\Gamma(y_i+g_i) - ln \\Gamma(1+y_i) + g_iln (r_i) + y_i ln(1-r_i)
-    """
-    pass
+        \lambda_i = exp(X\beta)\\
+        \theta = 1 / \alpha \\
+        g_i = \theta \lambda_i^Q \\
+        w_i = g_i/(g_i + \lambda_i) \\
+        r_i = \theta / (\theta+\lambda_i) \\
+        ln \mathcal{L}_i = ln \Gamma(y_i+g_i) - ln \Gamma(1+y_i) + g_iln (r_i) + y_i ln(1-r_i)
+    '''
+    mu = np.exp(np.dot(X, beta))
+    size = 1/alph*mu**Q
+    prob = size/(size+mu)
+    ll = nbinom.logpmf(y, size, prob)
+    return ll


 def _ll_nb1(y, X, beta, alph):
-    """Negative Binomial regression (type 1 likelihood)"""
-    pass
+    '''Negative Binomial regression (type 1 likelihood)'''
+    ll = _ll_nbp(y, X, beta, alph, Q=1)
+    return ll


 def _ll_nb2(y, X, beta, alph):
-    """Negative Binomial regression (type 2 likelihood)"""
-    pass
+    '''Negative Binomial regression (type 2 likelihood)'''
+    ll = _ll_nbp(y, X, beta, alph, Q=0)
+    return ll


 def _ll_geom(y, X, beta):
-    """Geometric regression"""
-    pass
+    '''Geometric regression'''
+    ll = _ll_nbp(y, X, beta, alph=1, Q=0)
+    return ll


 def _ll_nbt(y, X, beta, alph, C=0):
-    """
+    r'''
     Negative Binomial (truncated)

     Truncated densities for count models (Cameron & Trivedi, 2005, 680):

     .. math::

-        f(y|\\beta, y \\geq C+1) = \\frac{f(y|\\beta)}{1-F(C|\\beta)}
-    """
-    pass
+        f(y|\beta, y \geq C+1) = \frac{f(y|\beta)}{1-F(C|\beta)}
+    '''
+    Q = 0
+    mu = np.exp(np.dot(X, beta))
+    size = 1/alph*mu**Q
+    prob = size/(size+mu)
+    ll = nbinom.logpmf(y, size, prob) - np.log(1 - nbinom.cdf(C, size, prob))
+    return ll


+#### Model Classes ####
 class NBin(GenericLikelihoodModel):
-    """
+    '''
     Negative Binomial regression

     Parameters
@@ -104,26 +121,28 @@ class NBin(GenericLikelihoodModel):
         `geom`: Geometric regression model
     C: int
         Cut-point for `nbt` model
-    """
-
+    '''
     def __init__(self, endog, exog, ll_type='nb2', C=0, **kwds):
         self.exog = np.array(exog)
         self.endog = np.array(endog)
         self.C = C
         super(NBin, self).__init__(endog, exog, **kwds)
+        # Check user input
         if ll_type not in ['nb2', 'nb1', 'nbp', 'nbt', 'geom']:
             raise NameError('Valid ll_type are: nb2, nb1, nbp,  nbt, geom')
         self.ll_type = ll_type
+        # Starting values (assumes first column of exog is constant)
         if ll_type == 'geom':
             self.start_params_default = np.zeros(self.exog.shape[1])
         elif ll_type == 'nbp':
+            # Greene recommends starting NB-P at NB-2
             start_mod = NBin(endog, exog, 'nb2')
             start_res = start_mod.fit(disp=False)
             self.start_params_default = np.append(start_res.params, 0)
         else:
-            self.start_params_default = np.append(np.zeros(self.exog.shape[
-                1]), 0.5)
+            self.start_params_default = np.append(np.zeros(self.exog.shape[1]), .5)
         self.start_params_default[0] = np.log(self.endog.mean())
+        # Define loglik based on ll_type argument
         if ll_type == 'nb1':
             self.ll_func = _ll_nb1
         elif ll_type == 'nb2':
@@ -135,46 +154,105 @@ class NBin(GenericLikelihoodModel):
         elif ll_type == 'nbt':
             self.ll_func = _ll_nbt

+    def nloglikeobs(self, params):
+        alph = params[-1]
+        beta = params[:self.exog.shape[1]]
+        if self.ll_type == 'geom':
+            return -self.ll_func(self.endog, self.exog, beta)
+        elif self.ll_type == 'nbt':
+            return -self.ll_func(self.endog, self.exog, beta, alph, self.C)
+        elif self.ll_type == 'nbp':
+            Q = params[-2]
+            return -self.ll_func(self.endog, self.exog, beta, alph, Q)
+        else:
+            return -self.ll_func(self.endog, self.exog, beta, alph)
+
+    def fit(self, start_params=None, maxiter=10000, maxfun=5000, **kwds):
+        if start_params is None:
+            countfit = super(NBin, self).fit(start_params=self.start_params_default,
+                                             maxiter=maxiter, maxfun=maxfun, **kwds)
+        else:
+            countfit = super(NBin, self).fit(start_params=start_params,
+                                             maxiter=maxiter, maxfun=maxfun, **kwds)
+        countfit = CountResults(self, countfit)
+        return countfit

-class CountResults(GenericLikelihoodModelResults):

+class CountResults(GenericLikelihoodModelResults):
     def __init__(self, model, mlefit):
         self.model = model
         self.__dict__.update(mlefit.__dict__)
+    def summary(self, yname=None, xname=None, title=None, alpha=.05,
+                yname_list=None):
+        top_left = [('Dep. Variable:', None),
+                     ('Model:', [self.model.__class__.__name__]),
+                     ('Method:', ['MLE']),
+                     ('Date:', None),
+                     ('Time:', None),
+                     ('Converged:', ["%s" % self.mle_retvals['converged']])]
+        top_right = [('No. Observations:', None),
+                     ('Log-Likelihood:', None),
+                     ]
+        if title is None:
+            title = self.model.__class__.__name__ + ' ' + "Regression Results"
+        #boiler plate
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        # for top of table
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right, #[],
+                          yname=yname, xname=xname, title=title)
+        # for parameters, etc
+        smry.add_table_params(self, yname=yname_list, xname=xname, alpha=alpha,
+                             use_t=True)
+        return smry
+
+
+#### Score function for NB-P ####


 def _score_nbp(y, X, beta, thet, Q):
-    """
+    r'''
     Negative Binomial Score -- type P likelihood from Greene (2007)
     .. math::

-        \\lambda_i = exp(X\\beta)\\\\
-        g_i = \\theta \\lambda_i^Q \\\\
-        w_i = g_i/(g_i + \\lambda_i) \\\\
-        r_i = \\theta / (\\theta+\\lambda_i) \\\\
-        A_i = \\left [ \\Psi(y_i+g_i) - \\Psi(g_i) + ln w_i \\right ] \\\\
-        B_i = \\left [ g_i (1-w_i) - y_iw_i \\right ] \\\\
-        \\partial ln \\mathcal{L}_i / \\partial
-            \\begin{pmatrix} \\lambda_i \\\\ \\theta \\\\ Q \\end{pmatrix}=
+        \lambda_i = exp(X\beta)\\
+        g_i = \theta \lambda_i^Q \\
+        w_i = g_i/(g_i + \lambda_i) \\
+        r_i = \theta / (\theta+\lambda_i) \\
+        A_i = \left [ \Psi(y_i+g_i) - \Psi(g_i) + ln w_i \right ] \\
+        B_i = \left [ g_i (1-w_i) - y_iw_i \right ] \\
+        \partial ln \mathcal{L}_i / \partial
+            \begin{pmatrix} \lambda_i \\ \theta \\ Q \end{pmatrix}=
             [A_i+B_i]
-            \\begin{pmatrix} Q/\\lambda_i \\\\ 1/\\theta \\\\ ln(\\lambda_i) \\end{pmatrix}
+            \begin{pmatrix} Q/\lambda_i \\ 1/\theta \\ ln(\lambda_i) \end{pmatrix}
             -B_i
-            \\begin{pmatrix} 1/\\lambda_i\\\\ 0 \\\\ 0 \\end{pmatrix} \\\\
-        \\frac{\\partial \\lambda}{\\partial \\beta} = \\lambda_i \\mathbf{x}_i \\\\
-        \\frac{\\partial \\mathcal{L}_i}{\\partial \\beta} =
-            \\left (\\frac{\\partial\\mathcal{L}_i}{\\partial \\lambda_i} \\right )
-            \\frac{\\partial \\lambda_i}{\\partial \\beta}
-    """
-    pass
-
-
-medpar = pandas.read_csv(urlopen(
-    'https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/csv/COUNT/medpar.csv'
-    ))
-mdvis = pandas.read_csv(urlopen(
-    'https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/csv/COUNT/mdvis.csv'
-    ))
-"""
+            \begin{pmatrix} 1/\lambda_i\\ 0 \\ 0 \end{pmatrix} \\
+        \frac{\partial \lambda}{\partial \beta} = \lambda_i \mathbf{x}_i \\
+        \frac{\partial \mathcal{L}_i}{\partial \beta} =
+            \left (\frac{\partial\mathcal{L}_i}{\partial \lambda_i} \right )
+            \frac{\partial \lambda_i}{\partial \beta}
+    '''
+    lamb = np.exp(np.dot(X, beta))
+    g = thet * lamb**Q
+    w = g / (g + lamb)
+    r = thet / (thet+lamb)
+    A = digamma(y+g) - digamma(g) + np.log(w)
+    B = g*(1-w) - y*w
+    dl = (A+B) * Q/lamb - B * 1/lamb
+    dt = (A+B) * 1/thet
+    dq = (A+B) * np.log(lamb)
+    db = X * (dl * lamb)[:,np.newaxis]
+    sc = np.array([dt.sum(), dq.sum()])
+    sc = np.concatenate([db.sum(axis=0), sc])
+    return sc
+
+
+#### Tests ####
+medpar = pandas.read_csv(urlopen('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/csv/COUNT/medpar.csv'))
+mdvis = pandas.read_csv(urlopen('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/csv/COUNT/mdvis.csv'))
+
+# NB-2
+'''
 # R v2.15.1
 library(MASS)
 library(COUNT)
@@ -213,8 +291,19 @@ Number of Fisher Scoring iterations: 1
           Std. Err.:  0.0997

  2 x log-likelihood:  -9594.9530
-"""
-"""
+'''
+
+def test_nb2():
+    y, X = patsy.dmatrices('los ~ C(type) + hmo + white', medpar)
+    y = np.array(y)[:,0]
+    nb2 = NBin(y,X,'nb2').fit(maxiter=10000, maxfun=5000)
+    assert_almost_equal(nb2.params,
+                        [2.31027893349935, 0.221248978197356, 0.706158824346228,
+                         -0.067955221930748, -0.129065442248951, 0.4457567],
+                        decimal=2)
+
+# NB-1
+'''
 # R v2.15.1
 # COUNT v1.2.3
 library(COUNT)
@@ -229,8 +318,20 @@ factor(type)3  0.41879257 0.06553258  6.3906006  0.29034871  0.54723643
 hmo           -0.04533566 0.05004714 -0.9058592 -0.14342805  0.05275673
 white         -0.12951295 0.06071130 -2.1332593 -0.24850710 -0.01051880
 alpha          4.57898241 0.22015968 20.7984603  4.14746943  5.01049539
-"""
-"""
+'''
+
+#def test_nb1():
+    #y, X = patsy.dmatrices('los ~ C(type) + hmo + white', medpar)
+    #y = np.array(y)[:,0]
+    ## TODO: Test fails with some of the other optimization methods
+    #nb1 = NBin(y,X,'nb1').fit(method='ncg', maxiter=10000, maxfun=5000)
+    #assert_almost_equal(nb1.params,
+                        #[2.34918407014186, 0.161754714412848, 0.418792569970658,
+                        # -0.0453356614650342, -0.129512952033423, 4.57898241219275],
+                        #decimal=2)
+
+# NB-Geometric
+'''
 MASS v7.3-20
 R v2.15.1
 library(MASS)
@@ -262,5 +363,16 @@ Residual deviance: 811.95  on 1490  degrees of freedom
 AIC: 9927.3

 Number of Fisher Scoring iterations: 5
-"""
+'''
+
+#def test_geom():
+    #y, X = patsy.dmatrices('los ~ C(type) + hmo + white', medpar)
+    #y = np.array(y)[:,0]
+    ## TODO: remove alph from geom params
+    #geom = NBin(y,X,'geom').fit(maxiter=10000, maxfun=5000)
+    #assert_almost_equal(geom.params,
+                        #[2.3084850946241, 0.221206159108742, 0.705986369841159,
+                        # -0.0677871843613577, -0.127088772164963],
+                        #decimal=4)
+
 test_nb2()
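
The log-likelihoods above map (beta, alpha, Q) into scipy's (size, prob) parameterization; a small sanity check, with illustrative values, that the NB-2 case (Q=0) has mean mu and variance mu + alpha*mu**2:

    import numpy as np
    from scipy.stats import nbinom

    mu, alph = 3.0, 0.5                       # illustrative values
    size = 1.0 / alph                         # NB-2: Q = 0, so the size parameter is constant
    prob = size / (size + mu)
    mean, var = nbinom.stats(size, prob, moments='mv')
    assert np.isclose(mean, mu)
    assert np.isclose(var, mu + alph * mu ** 2)   # NB-2 variance function
    print(mean, var)
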
diff --git a/statsmodels/sandbox/examples/example_pca.py b/statsmodels/sandbox/examples/example_pca.py
index 299a1f7fd..2d32c25f3 100644
--- a/statsmodels/sandbox/examples/example_pca.py
+++ b/statsmodels/sandbox/examples/example_pca.py
@@ -1,9 +1,21 @@
+#!/usr/bin/env python
+
 import numpy as np
 from statsmodels.sandbox.pca import Pca
-x = np.random.randn(1000)
-y = x * 2.3 + 5 + np.random.randn(1000)
-z = x * 3.1 + 2.1 * y + np.random.randn(1000) / 2
-p = Pca((x, y, z))
-print('energies:', p.getEnergies())
-print('vecs:', p.getEigenvectors())
-print('projected data', p.project(vals=np.ones((3, 10))))
+
+x=np.random.randn(1000)
+y=x*2.3+5+np.random.randn(1000)
+z=x*3.1+2.1*y+np.random.randn(1000)/2
+
+#create the Pca object - requires a p x N array as the input
+p=Pca((x,y,z))
+print('energies:',p.getEnergies())
+print('vecs:',p.getEigenvectors())
+print('projected data',p.project(vals=np.ones((3,10))))
+
+
+#p.plot2d() #requires matplotlib
+#from matplotlib import pyplot as plt
+#plt.show() #necessary for script
+
+#p.plot3d() #requires mayavi
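
Pca in the example expects a p x N array; the same decomposition can be sketched directly with an SVD of the centered data, which is roughly what the sandbox pca helpers do (illustrative data, not the example's output):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)
    y = x * 2.3 + 5 + rng.standard_normal(1000)
    z = x * 3.1 + 2.1 * y + rng.standard_normal(1000) / 2

    data = np.column_stack([x, y, z])          # N x p, one variable per column
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    var_explained = s ** 2 / (s ** 2).sum()    # analogue of the "energies" printed by Pca
    print('variance explained:', var_explained)
    print('principal directions:\n', vt)       # rows are the eigenvectors of the covariance
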
diff --git a/statsmodels/sandbox/examples/example_pca_regression.py b/statsmodels/sandbox/examples/example_pca_regression.py
index 996d19665..4927db5b1 100644
--- a/statsmodels/sandbox/examples/example_pca_regression.py
+++ b/statsmodels/sandbox/examples/example_pca_regression.py
@@ -1,4 +1,4 @@
-"""Example: Principal Component Regression
+'''Example: Principal Component Regression

 * simulate model with 2 factors and 4 explanatory variables
 * use pca to extract factors from data,
@@ -12,57 +12,93 @@ endogenous variable.
 # try out partial correlation for dropping (or adding) factors
 # get algorithm for partial least squares as an alternative to PCR

-"""
+'''
+
+
 import numpy as np
 import statsmodels.api as sm
 from statsmodels.iolib.table import SimpleTable
 from statsmodels.sandbox.tools import pca
 from statsmodels.sandbox.tools.cross_val import LeaveOneOut
+
+
+# Example: principal component regression
 nobs = 1000
-f0 = np.c_[np.random.normal(size=(nobs, 2)), np.ones((nobs, 1))]
-f2xcoef = np.c_[np.repeat(np.eye(2), 2, 0), np.arange(4)[::-1]].T
-f2xcoef = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [3.0, 2.0, 
-    1.0, 0.0]])
-f2xcoef = np.array([[0.1, 3.0, 1.0, 0.0], [0.0, 0.0, 1.5, 0.1], [3.0, 2.0, 
-    1.0, 0.0]])
+f0 = np.c_[np.random.normal(size=(nobs,2)), np.ones((nobs,1))]
+f2xcoef = np.c_[np.repeat(np.eye(2),2,0),np.arange(4)[::-1]].T
+f2xcoef = np.array([[ 1.,  1.,  0.,  0.],
+                    [ 0.,  0.,  1.,  1.],
+                    [ 3.,  2.,  1.,  0.]])
+f2xcoef = np.array([[ 0.1,  3.,  1.,    0.],
+                    [ 0.,  0.,  1.5,   0.1],
+                    [ 3.,  2.,  1.,    0.]])
 x0 = np.dot(f0, f2xcoef)
-x0 += 0.1 * np.random.normal(size=x0.shape)
-ytrue = np.dot(f0, [1.0, 1.0, 1.0])
-y0 = ytrue + 0.1 * np.random.normal(size=ytrue.shape)
-xred, fact, eva, eve = pca(x0, keepdim=0)
+x0 += 0.1*np.random.normal(size=x0.shape)
+ytrue = np.dot(f0,[1., 1., 1.])
+y0 = ytrue + 0.1*np.random.normal(size=ytrue.shape)
+
+xred, fact, eva, eve  = pca(x0, keepdim=0)
 print(eve)
 print(fact[:5])
 print(f0[:5])
+
 res = sm.OLS(y0, sm.add_constant(x0, prepend=False)).fit()
 print('OLS on original data')
 print(res.params)
 print(res.aic)
 print(res.rsquared)
+
+#print 'OLS on Factors'
+#for k in range(x0.shape[1]):
+#    xred, fact, eva, eve  = pca(x0, keepdim=k, normalize=1)
+#    fact_wconst = sm.add_constant(fact)
+#    res = sm.OLS(y0, fact_wconst).fit()
+#    print 'k =', k
+#    print res.params
+#    print 'aic:  ', res.aic
+#    print 'bic:  ', res.bic
+#    print 'llf:  ', res.llf
+#    print 'R2    ', res.rsquared
+#    print 'R2 adj', res.rsquared_adj
+
 print('OLS on Factors')
 results = []
-xred, fact, eva, eve = pca(x0, keepdim=0, normalize=1)
-for k in range(0, x0.shape[1] + 1):
-    fact_wconst = sm.add_constant(fact[:, :k], prepend=False)
+xred, fact, eva, eve  = pca(x0, keepdim=0, normalize=1)
+for k in range(0, x0.shape[1]+1):
+    #xred, fact, eva, eve  = pca(x0, keepdim=k, normalize=1)
+    # this is faster and same result
+    fact_wconst = sm.add_constant(fact[:,:k], prepend=False)
     res = sm.OLS(y0, fact_wconst).fit()
-    prederr2 = 0.0
+##    print 'k =', k
+##    print res.params
+##    print 'aic:  ', res.aic
+##    print 'bic:  ', res.bic
+##    print 'llf:  ', res.llf
+##    print 'R2    ', res.rsquared
+##    print 'R2 adj', res.rsquared_adj
+    prederr2 = 0.
     for inidx, outidx in LeaveOneOut(len(y0)):
-        resl1o = sm.OLS(y0[inidx], fact_wconst[inidx, :]).fit()
-        prederr2 += (y0[outidx] - resl1o.predict(fact_wconst[outidx, :])
-            ) ** 2.0
+        resl1o = sm.OLS(y0[inidx], fact_wconst[inidx,:]).fit()
+        #print data.endog[outidx], res.model.predict(data.exog[outidx,:]),
+        prederr2 += (y0[outidx] - resl1o.predict(fact_wconst[outidx,:]))**2.
     results.append([k, res.aic, res.bic, res.rsquared_adj, prederr2])
+
 results = np.array(results)
 print(results)
 print('best result for k, by AIC, BIC, R2_adj, L1O')
-print(np.r_[np.argmin(results[:, 1:3], 0), np.argmax(results[:, 3], 0), np.
-    argmin(results[:, -1], 0)])
+print(np.r_[(np.argmin(results[:,1:3],0), np.argmax(results[:,3],0),
+             np.argmin(results[:,-1],0))])
+
+
 headers = 'k, AIC, BIC, R2_adj, L1O'.split(', ')
-numformat = ['%6d'] + ['%10.3f'] * 4
-txt_fmt1 = dict(data_fmts=numformat)
+numformat = ['%6d'] + ['%10.3f']*4 #'%10.4f'
+txt_fmt1 = dict(data_fmts = numformat)
 tabl = SimpleTable(results, headers, None, txt_fmt=txt_fmt1)
-print('PCA regression on simulated data,')
-print('DGP: 2 factors and 4 explanatory variables')
+
+print("PCA regression on simulated data,")
+print("DGP: 2 factors and 4 explanatory variables")
 print(tabl)
-print('Notes: k is number of components of PCA,')
-print('       constant is added additionally')
-print('       k=0 means regression on constant only')
-print('       L1O: sum of squared prediction errors for leave-one-out')
+print("Notes: k is number of components of PCA,")
+print("       constant is added additionally")
+print("       k=0 means regression on constant only")
+print("       L1O: sum of squared prediction errors for leave-one-out")
diff --git a/statsmodels/sandbox/examples/example_sysreg.py b/statsmodels/sandbox/examples/example_sysreg.py
index a57b16014..1da5badc2 100644
--- a/statsmodels/sandbox/examples/example_sysreg.py
+++ b/statsmodels/sandbox/examples/example_sysreg.py
@@ -1,119 +1,209 @@
 """Example: statsmodels.sandbox.sysreg
 """
+#TODO: this is going to change significantly once we have a panel data structure
 from statsmodels.compat.python import asbytes, lmap
+
 import numpy as np
+
 import statsmodels.api as sm
 from statsmodels.sandbox.regression.gmm import IV2SLS
 from statsmodels.sandbox.sysreg import SUR, Sem2SLS
+
+#for Python 3 compatibility
+
+# Seemingly Unrelated Regressions (SUR) Model
+
+# This example uses the subset of the Grunfeld data in Greene's Econometric
+# Analysis Chapter 14 (5th Edition)
+
 grun_data = sm.datasets.grunfeld.load()
+
 firms = ['General Motors', 'Chrysler', 'General Electric', 'Westinghouse',
-    'US Steel']
+        'US Steel']
+#for Python 3 compatibility
 firms = lmap(asbytes, firms)
+
 grun_exog = grun_data.exog
 grun_endog = grun_data.endog
+
+# Right now takes SUR takes a list of arrays
+# The array alternates between the LHS of an equation and RHS side of an
+# equation
+# This is very likely to change
 grun_sys = []
 for i in firms:
     index = grun_exog['firm'] == i
     grun_sys.append(grun_endog[index])
-    exog = grun_exog[index][['value', 'capital']].view(float).reshape(-1, 2)
+    exog = grun_exog[index][['value','capital']].view(float).reshape(-1,2)
     exog = sm.add_constant(exog, prepend=True)
     grun_sys.append(exog)
+
+# Note that the results in Greene (5th edition) uses a slightly different
+# version of the Grunfeld data. To reproduce Table 14.1 the following changes
+# are necessary.
 grun_sys[-2][5] = 261.6
 grun_sys[-2][-3] = 645.2
-grun_sys[-1][11, 2] = 232.6
+grun_sys[-1][11,2] = 232.6
+
 grun_mod = SUR(grun_sys)
 grun_res = grun_mod.fit()
-print('Results for the 2-step GLS')
-print('Compare to Greene Table 14.1, 5th edition')
+print("Results for the 2-step GLS")
+print("Compare to Greene Table 14.1, 5th edition")
 print(grun_res.params)
-print('Results for iterative GLS (equivalent to MLE)')
-print('Compare to Greene Table 14.3')
+# or you can do an iterative fit
+# you have to define a new model though this will be fixed
+# TODO: note the above
+print("Results for iterative GLS (equivalent to MLE)")
+print("Compare to Greene Table 14.3")
+#TODO: these are slightly off, could be a convergence issue
+# or might use a different default DOF correction?
 grun_imod = SUR(grun_sys)
 grun_ires = grun_imod.fit(igls=True)
 print(grun_ires.params)
+
+# Two-Stage Least Squares for Simultaneous Equations
+#TODO: we are going to need *some kind* of formula framework
+
+# This follows the simple macroeconomic model given in
+# Greene Example 15.1 (5th Edition)
+# The data however is from statsmodels and is not the same as
+# Greene's
+
+# The model is
+# consumption: c_{t} = \alpha_{0} + \alpha_{1}y_{t} + \alpha_{2}c_{t-1} + \epsilon_{t1}
+# investment: i_{t} = \beta_{0} + \beta_{1}r_{t} + \beta_{2}\left(y_{t}-y_{t-1}\right) + \epsilon_{t2}
+# demand: y_{t} = c_{t} + I_{t} + g_{t}
+
+# See Greene's Econometric Analysis for more information
+
+# Load the data
 macrodata = sm.datasets.macrodata.load().data
-macrodata = np.sort(macrodata, order=['year', 'quarter'])
+
+# Not needed, but make sure the data is sorted
+macrodata = np.sort(macrodata, order=['year','quarter'])
+
+# Impose the demand restriction
 y = macrodata['realcons'] + macrodata['realinv'] + macrodata['realgovt']
+
+# Build the system
 macro_sys = []
-macro_sys.append(macrodata['realcons'][1:])
-exog1 = np.column_stack((y[1:], macrodata['realcons'][:-1]))
+# First equation LHS
+macro_sys.append(macrodata['realcons'][1:]) # leave off first date
+# First equation RHS
+exog1 = np.column_stack((y[1:],macrodata['realcons'][:-1]))
+#TODO: it might be nice to have "lag" and "lead" functions
 exog1 = sm.add_constant(exog1, prepend=True)
 macro_sys.append(exog1)
+# Second equation LHS
 macro_sys.append(macrodata['realinv'][1:])
+# Second equation RHS
 exog2 = np.column_stack((macrodata['tbilrate'][1:], np.diff(y)))
 exog2 = sm.add_constant(exog2, prepend=True)
 macro_sys.append(exog2)
-indep_endog = {(0): [1]}
-instruments = np.column_stack((macrodata[['realgovt', 'tbilrate']][1:].view
-    (float).reshape(-1, 2), macrodata['realcons'][:-1], y[:-1]))
+
+# We need to say that y_{t} in the RHS of equation 1 is an endogenous regressor
+# We will call these independent endogenous variables
+# Right now, we use a dictionary to declare these
+indep_endog = {0 : [1]}
+
+# We also need to create a design of our instruments
+# This will be done automatically in the future
+instruments = np.column_stack((macrodata[['realgovt',
+    'tbilrate']][1:].view(float).reshape(-1,2),macrodata['realcons'][:-1],
+    y[:-1]))
 instruments = sm.add_constant(instruments, prepend=True)
-macro_mod = Sem2SLS(macro_sys, indep_endog=indep_endog, instruments=instruments
-    )
+macro_mod = Sem2SLS(macro_sys, indep_endog=indep_endog, instruments=instruments)
+# Right now this only returns parameters
 macro_params = macro_mod.fit()
-print('The parameters for the first equation are correct.')
-print('The parameters for the second equation are not.')
+print("The parameters for the first equation are correct.")
+print("The parameters for the second equation are not.")
 print(macro_params)
-y_instrumented = macro_mod.wexog[0][:, 1]
+
+#TODO: Note that the above is incorrect, because we have no way of telling the
+# model that *part* of the y_{t} - y_{t-1} is an independent endogenous variable
+# To correct for this we would have to do the following
+y_instrumented = macro_mod.wexog[0][:,1]
 whitened_ydiff = y_instrumented - y[:-1]
-wexog = np.column_stack((macrodata['tbilrate'][1:], whitened_ydiff))
+wexog = np.column_stack((macrodata['tbilrate'][1:],whitened_ydiff))
 wexog = sm.add_constant(wexog, prepend=True)
 correct_params = sm.GLS(macrodata['realinv'][1:], wexog).fit().params
-print('If we correctly instrument everything, then these are the parameters')
-print('for the second equation')
+
+print("If we correctly instrument everything, then these are the parameters")
+print("for the second equation")
 print(correct_params)
-print('Compare to output of R script statsmodels/sandbox/tests/macrodata.s')
+print("Compare to output of R script statsmodels/sandbox/tests/macrodata.s")
+
 print('\nUsing IV2SLS')
 miv = IV2SLS(macro_sys[0], macro_sys[1], instruments)
 resiv = miv.fit()
-print('equation 1')
+print("equation 1")
 print(resiv.params)
 miv2 = IV2SLS(macro_sys[2], macro_sys[3], instruments)
 resiv2 = miv2.fit()
-print('equation 2')
+print("equation 2")
 print(resiv2.params)
+
+### Below is the same example using Greene's data ###
+
 run_greene = 0
 if run_greene:
     try:
-        data3 = np.genfromtxt(
-            '/home/skipper/school/MetricsII/Greene TableF5-1.txt', names=True)
+        data3 = np.genfromtxt('/home/skipper/school/MetricsII/Greene \
+TableF5-1.txt', names=True)
     except:
-        raise ValueError(
-            'Based on Greene TableF5-1.  You should download it from his web site and edit this script accordingly.'
-            )
+        raise ValueError("Based on Greene TableF5-1.  You should download it "
+                         "from his web site and edit this script accordingly.")
+
+    # Example 15.1 in Greene 5th Edition
+# c_t = constant + y_t + c_t-1
+# i_t = constant + r_t + (y_t - y_t-1)
+# y_t = c_t + i_t + g_t
     sys3 = []
-    sys3.append(data3['realcons'][1:])
+    sys3.append(data3['realcons'][1:])  # have to leave off a beg. date
+# impose 3rd equation on y
     y = data3['realcons'] + data3['realinvs'] + data3['realgovt']
-    exog1 = np.column_stack((y[1:], data3['realcons'][:-1]))
+
+    exog1 = np.column_stack((y[1:],data3['realcons'][:-1]))
     exog1 = sm.add_constant(exog1, prepend=False)
     sys3.append(exog1)
     sys3.append(data3['realinvs'][1:])
-    exog2 = np.column_stack((data3['tbilrate'][1:], np.diff(y)))
+    exog2 = np.column_stack((data3['tbilrate'][1:],
+        np.diff(y)))
+    # realint is missing 1st observation
     exog2 = sm.add_constant(exog2, prepend=False)
     sys3.append(exog2)
-    indep_endog = {(0): [0]}
-    instruments = np.column_stack((data3[['realgovt', 'tbilrate']][1:].view
-        (float).reshape(-1, 2), data3['realcons'][:-1], y[:-1]))
+    indep_endog = {0 : [0]} # need to be able to say that y_1 is an instrument..
+    instruments = np.column_stack((data3[['realgovt',
+        'tbilrate']][1:].view(float).reshape(-1,2),data3['realcons'][:-1],
+        y[:-1]))
     instruments = sm.add_constant(instruments, prepend=False)
-    sem_mod = Sem2SLS(sys3, indep_endog=indep_endog, instruments=instruments)
-    sem_params = sem_mod.fit()
-    y_instr = sem_mod.wexog[0][:, 0]
+    sem_mod = Sem2SLS(sys3, indep_endog = indep_endog, instruments=instruments)
+    sem_params = sem_mod.fit()  # first equation is right, but not second?
+                                # should y_t in the diff be instrumented?
+                                # how would R know this in the script?
+    # well, let's check...
+    y_instr = sem_mod.wexog[0][:,0]
     wyd = y_instr - y[:-1]
-    wexog = np.column_stack((data3['tbilrate'][1:], wyd))
+    wexog = np.column_stack((data3['tbilrate'][1:],wyd))
     wexog = sm.add_constant(wexog, prepend=False)
     params = sm.GLS(data3['realinvs'][1:], wexog).fit().params
-    print(
-        "These are the simultaneous equation estimates for Greene's example 13-1 (Also application 13-1 in 6th edition."
-        )
+
+    print("These are the simultaneous equation estimates for Greene's \
+example 13-1 (Also application 13-1 in 6th edition.")
     print(sem_params)
-    print('The first set of parameters is correct.  The second set is not.')
-    print(
-        'Compare to the solution manual at http://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm'
-        )
-    print('The reason is the restriction on (y_t - y_1)')
-    print('Compare to R script GreeneEx15_1.s')
-    print(
-        'Somehow R carries y.1 in yd to know that it needs to be instrumented')
-    print('If we replace our estimate with the instrumented one')
+    print("The first set of parameters is correct.  The second set is not.")
+    print("Compare to the solution manual at \
+http://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm")
+    print("The reason is the restriction on (y_t - y_1)")
+    print("Compare to R script GreeneEx15_1.s")
+    print("Somehow R carries y.1 in yd to know that it needs to be \
+instrumented")
+    print("If we replace our estimate with the instrumented one")
     print(params)
-    print('We get the right estimate')
-    print('Without a formula framework we have to be able to do restrictions.')
+    print("We get the right estimate")
+    print("Without a formula framework we have to be able to do restrictions.")
+# yep!, but how in the world does R know this when we just fed it yd??
+# must be implicit in the formula framework...
+# we are going to need to keep the two equations separate and use
+# a restrictions matrix.  Ugh, is a formula framework really, necessary to get
+# around this?
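
The "correct_params" fix above is essentially a manual second stage of 2SLS; a self-contained sketch of the two stages on simulated data (hypothetical coefficients, not Greene's model):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    z = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])    # instruments (incl. constant)
    u = rng.standard_normal(n)                                        # structural error
    x_endog = z @ np.array([0.5, 1.0, -1.0]) + 0.8 * u                # endogenous regressor
    y = 1.0 + 2.0 * x_endog + u + 0.5 * rng.standard_normal(n)

    # first stage: project the endogenous regressor on the instruments
    x_hat = z @ np.linalg.lstsq(z, x_endog, rcond=None)[0]
    # second stage: OLS of y on the instrumented design
    Xhat = np.column_stack([np.ones(n), x_hat])
    beta_2sls = np.linalg.lstsq(Xhat, y, rcond=None)[0]
    print('2SLS estimates (true values 1.0 and 2.0):', beta_2sls)
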
diff --git a/statsmodels/sandbox/examples/run_all.py b/statsmodels/sandbox/examples/run_all.py
index 7f8171148..9baf39995 100644
--- a/statsmodels/sandbox/examples/run_all.py
+++ b/statsmodels/sandbox/examples/run_all.py
@@ -1,4 +1,4 @@
-"""run all examples to make sure we do not get an exception
+'''run all examples to make sure we do not get an exception

 Note:
 If an example contaings plt.show(), then all plot windows have to be closed
@@ -6,26 +6,32 @@ manually, at least in my setup.

 uncomment plt.show() to show all plot windows

-"""
+'''
 from statsmodels.compat.python import input
 stop_on_error = True
+
+
 filelist = ['example_pca.py', 'example_sysreg.py', 'example_mle.py',
-    'example_pca_regression.py']
-cont = input(
-    """Are you sure you want to run all of the examples?
+#            'example_gam.py', # exclude, currently we are not working on it
+            'example_pca_regression.py']
+
+cont = input("""Are you sure you want to run all of the examples?
 This is done mainly to check that they are up to date.
-(y/n) >>> """
-    )
+(y/n) >>> """)
 if 'y' in cont.lower():
     for run_all_f in filelist:
         try:
-            print('Executing example file', run_all_f)
-            print('-----------------------' + '-' * len(run_all_f))
-            with open(run_all_f, encoding='utf-8') as f:
+            print("Executing example file", run_all_f)
+            print("-----------------------" + "-"*len(run_all_f))
+            with open(run_all_f, encoding="utf-8") as f:
                 exec(f.read())
         except:
-            print('*********************')
-            print('ERROR in example file', run_all_f)
-            print('**********************' + '*' * len(run_all_f))
+            #f might be overwritten in the executed file
+            print("*********************")
+            print("ERROR in example file", run_all_f)
+            print("**********************" + "*"*len(run_all_f))
             if stop_on_error:
                 raise
+#plt.show()
+#plt.close('all')
+#close does not work because I never get here without closing plots manually
diff --git a/statsmodels/sandbox/examples/thirdparty/ex_ratereturn.py b/statsmodels/sandbox/examples/thirdparty/ex_ratereturn.py
index 654e73c0f..d55e890d2 100644
--- a/statsmodels/sandbox/examples/thirdparty/ex_ratereturn.py
+++ b/statsmodels/sandbox/examples/thirdparty/ex_ratereturn.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Playing with correlation of DJ-30 stock returns

 this uses pickled data that needs to be created with findow.py
@@ -8,98 +9,132 @@ Created on Sat Jan 30 16:30:18 2010
 Author: josef-pktd
 """
 import pickle
+
 import numpy as np
 import matplotlib.pyplot as plt
 import matplotlib as mpl
+
 import statsmodels.sandbox.tools as sbtools
+
 from statsmodels.graphics.correlation import plot_corr, plot_corr_grid
+
 try:
     with open('dj30rr', 'rb') as fd:
         rrdm = pickle.load(fd)
-except Exception:
-    print(
-        'Error with unpickling, a new pickle file can be created with findow_1'
-        )
+except Exception: #blanket for any unpickling error
+    print("Error with unpickling, a new pickle file can be created with findow_1")
     raise
+
 ticksym = rrdm.columns.tolist()
 rr = rrdm.values[1:400]
+
 rrcorr = np.corrcoef(rr, rowvar=0)
+
+
 plot_corr(rrcorr, xnames=ticksym)
 nvars = rrcorr.shape[0]
 plt.figure()
-plt.hist(rrcorr[np.triu_indices(nvars, 1)])
+plt.hist(rrcorr[np.triu_indices(nvars,1)])
 plt.title('Correlation Coefficients')
-xreda, facta, evaa, evea = sbtools.pcasvd(rr)
-evallcs = evaa.cumsum()
-print(evallcs / evallcs[-1])
-xred, fact, eva, eve = sbtools.pcasvd(rr, keepdim=4)
+
+xreda, facta, evaa, evea  = sbtools.pcasvd(rr)
+evallcs = (evaa).cumsum()
+print(evallcs/evallcs[-1])
+xred, fact, eva, eve  = sbtools.pcasvd(rr, keepdim=4)
 pcacorr = np.corrcoef(xred, rowvar=0)
+
 plot_corr(pcacorr, xnames=ticksym, title='Correlation PCA')
-resid = rr - xred
+
+resid = rr-xred
 residcorr = np.corrcoef(resid, rowvar=0)
 plot_corr(residcorr, xnames=ticksym, title='Correlation Residuals')
+
 plt.matshow(residcorr)
-plt.imshow(residcorr, cmap=plt.cm.jet, interpolation='nearest', extent=(0, 
-    30, 0, 30), vmin=-1.0, vmax=1.0)
+plt.imshow(residcorr, cmap=plt.cm.jet, interpolation='nearest',
+           extent=(0,30,0,30), vmin=-1.0, vmax=1.0)
 plt.colorbar()
-normcolor = 0, 1
+
+normcolor = (0,1) #False #True
 fig = plt.figure()
-ax = fig.add_subplot(2, 2, 1)
+ax = fig.add_subplot(2,2,1)
 plot_corr(rrcorr, xnames=ticksym, normcolor=normcolor, ax=ax)
-ax2 = fig.add_subplot(2, 2, 3)
-plot_corr(pcacorr, xnames=ticksym, title='Correlation PCA', normcolor=
-    normcolor, ax=ax2)
-ax3 = fig.add_subplot(2, 2, 4)
+ax2 = fig.add_subplot(2,2,3)
+#pcacorr = np.corrcoef(xred, rowvar=0)
+plot_corr(pcacorr, xnames=ticksym, title='Correlation PCA',
+          normcolor=normcolor, ax=ax2)
+ax3 = fig.add_subplot(2,2,4)
 plot_corr(residcorr, xnames=ticksym, title='Correlation Residuals',
-    normcolor=normcolor, ax=ax3)
-images = [c for fig_ax in fig.axes for c in fig_ax.get_children() if
-    isinstance(c, mpl.image.AxesImage)]
+          normcolor=normcolor, ax=ax3)
+
+images = [c for fig_ax in fig.axes for c in fig_ax.get_children() if isinstance(c, mpl.image.AxesImage)]
 print(images)
 print(ax.get_children())
-fig.subplots_adjust(bottom=0.1, right=0.9, top=0.9)
+#cax = fig.add_subplot(2,2,2)
+#[0.85, 0.1, 0.075, 0.8]
+fig.subplots_adjust(bottom=0.1, right=0.9, top=0.9)
 cax = fig.add_axes([0.9, 0.1, 0.025, 0.8])
 fig.colorbar(images[0], cax=cax)
 fig.savefig('corrmatrixgrid.png', dpi=120)
+
 has_sklearn = True
 try:
-    import sklearn
+    import sklearn  # noqa:F401
 except ImportError:
     has_sklearn = False
     print('sklearn not available')
+
+
+def cov2corr(cov):
+    std_ = np.sqrt(np.diag(cov))
+    corr = cov / np.outer(std_, std_)
+    return corr
+
 if has_sklearn:
     from sklearn.covariance import LedoitWolf, OAS, MCD
+
     lw = LedoitWolf(store_precision=False)
     lw.fit(rr, assume_centered=False)
     cov_lw = lw.covariance_
     corr_lw = cov2corr(cov_lw)
+
     oas = OAS(store_precision=False)
     oas.fit(rr, assume_centered=False)
     cov_oas = oas.covariance_
     corr_oas = cov2corr(cov_oas)
-    mcd = MCD()
+
+    mcd = MCD()#.fit(rr, reweight=None)
     mcd.fit(rr, assume_centered=False)
     cov_mcd = mcd.covariance_
     corr_mcd = cov2corr(cov_mcd)
+
     titles = ['raw correlation', 'lw', 'oas', 'mcd']
     normcolor = None
     fig = plt.figure()
     for i, c in enumerate([rrcorr, corr_lw, corr_oas, corr_mcd]):
-        ax = fig.add_subplot(2, 2, i + 1)
-        plot_corr(c, xnames=None, title=titles[i], normcolor=normcolor, ax=ax)
-    images = [c for fig_ax in fig.axes for c in fig_ax.get_children() if
-        isinstance(c, mpl.image.AxesImage)]
-    fig.subplots_adjust(bottom=0.1, right=0.9, top=0.9)
+    #for i, c in enumerate([np.cov(rr, rowvar=0), cov_lw, cov_oas, cov_mcd]):
+        ax = fig.add_subplot(2,2,i+1)
+        plot_corr(c, xnames=None, title=titles[i],
+              normcolor=normcolor, ax=ax)
+
+    images = [c for fig_ax in fig.axes for c in fig_ax.get_children() if isinstance(c, mpl.image.AxesImage)]
+    fig. subplots_adjust(bottom=0.1, right=0.9, top=0.9)
     cax = fig.add_axes([0.9, 0.1, 0.025, 0.8])
     fig.colorbar(images[0], cax=cax)
+
     corrli = [rrcorr, corr_lw, corr_oas, corr_mcd, pcacorr]
-    diffssq = np.array([[((ci - cj) ** 2).sum() for ci in corrli] for cj in
-        corrli])
-    diffsabs = np.array([[np.max(np.abs(ci - cj)) for ci in corrli] for cj in
-        corrli])
+    diffssq = np.array([[((ci-cj)**2).sum() for ci in corrli]
+                            for cj in corrli])
+    diffsabs = np.array([[np.max(np.abs(ci-cj)) for ci in corrli]
+                            for cj in corrli])
     print(diffssq)
     print('\nmaxabs')
     print(diffsabs)
     fig.savefig('corrmatrix_sklearn.png', dpi=120)
-    fig2 = plot_corr_grid(corrli + [residcorr], ncols=3, titles=titles + [
-        'pca', 'pca-residual'], xnames=[], ynames=[])
+
+    fig2 = plot_corr_grid(corrli+[residcorr], ncols=3,
+                          titles=titles+['pca', 'pca-residual'],
+                          xnames=[], ynames=[])
     fig2.savefig('corrmatrix_sklearn_2.png', dpi=120)
+
+#plt.show()
+#plt.close('all')
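
The hunk above factors the covariance-to-correlation conversion into a small `cov2corr` helper and compares the raw sample correlation matrix with shrinkage estimates from scikit-learn (LedoitWolf, OAS, MCD). A minimal standalone sketch of the same conversion, on made-up data and with scikit-learn optional (the `MCD` import used above appears to predate current scikit-learn releases, where the minimum covariance determinant estimator is `MinCovDet`):

    import numpy as np

    def cov2corr(cov):
        # scale each covariance entry by the two standard deviations
        std_ = np.sqrt(np.diag(cov))
        return cov / np.outer(std_, std_)

    rng = np.random.default_rng(0)
    x = rng.standard_normal((500, 4))
    x[:, 1] += 0.5 * x[:, 0]                     # induce some correlation
    corr = cov2corr(np.cov(x, rowvar=False))
    print(np.round(corr, 3))                     # unit diagonal by construction

    try:
        from sklearn.covariance import LedoitWolf    # optional shrinkage estimator
        corr_lw = cov2corr(LedoitWolf().fit(x).covariance_)
        print(np.round(corr_lw - corr, 3))           # shrinkage typically pulls off-diagonals toward 0
    except ImportError:
        pass
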
diff --git a/statsmodels/sandbox/examples/thirdparty/findow_1.py b/statsmodels/sandbox/examples/thirdparty/findow_1.py
index 8a8255b95..b2ec61086 100644
--- a/statsmodels/sandbox/examples/thirdparty/findow_1.py
+++ b/statsmodels/sandbox/examples/thirdparty/findow_1.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """A quick look at volatility of stock returns for 2009

 Just an exercise to find my way around the pandas methods.
@@ -15,35 +16,78 @@ Created on Sat Jan 30 16:30:18 2010
 Author: josef-pktd
 """
 import os
+
 from statsmodels.compat.python import lzip
 import numpy as np
 import matplotlib.finance as fin
 import matplotlib.pyplot as plt
 import datetime as dt
+
 import pandas as pd
+
+
+def getquotes(symbol, start, end):
+    # Taken from the no-longer-existent pandas.examples.finance
+    quotes = fin.quotes_historical_yahoo(symbol, start, end)
+    dates, open, close, high, low, volume = lzip(*quotes)
+
+    data = {
+        'open' : open,
+        'close' : close,
+        'high' : high,
+        'low' : low,
+        'volume' : volume
+    }
+
+    dates = pd.Index([dt.datetime.fromordinal(int(d)) for d in dates])
+    return pd.DataFrame(data, index=dates)
+
+
 start_date = dt.datetime(2007, 1, 1)
 end_date = dt.datetime(2009, 12, 31)
-dj30 = ['MMM', 'AA', 'AXP', 'T', 'BAC', 'BA', 'CAT', 'CVX', 'CSCO', 'KO',
-    'DD', 'XOM', 'GE', 'HPQ', 'HD', 'INTC', 'IBM', 'JNJ', 'JPM', 'KFT',
-    'MCD', 'MRK', 'MSFT', 'PFE', 'PG', 'TRV', 'UTX', 'VZ', 'WMT', 'DIS']
+
+dj30 = ['MMM', 'AA', 'AXP', 'T', 'BAC', 'BA', 'CAT', 'CVX', 'CSCO',
+       'KO', 'DD', 'XOM', 'GE', 'HPQ', 'HD', 'INTC', 'IBM', 'JNJ',
+       'JPM', 'KFT', 'MCD', 'MRK', 'MSFT', 'PFE', 'PG', 'TRV',
+       'UTX', 'VZ', 'WMT', 'DIS']
 mysym = ['msft', 'ibm', 'goog']
 indexsym = ['gspc', 'dji']
+
+
+# download data
 dmall = {}
 for sy in dj30:
-    dmall[sy] = getquotes(sy, start_date, end_date)
+    dmall[sy]  = getquotes(sy, start_date, end_date)
+
+# combine into WidePanel
 pawp = pd.WidePanel.fromDict(dmall)
 print(pawp.values.shape)
+
+# select closing prices
 paclose = pawp.getMinorXS('close')
+
+# take log and first difference over time
 paclose_ratereturn = paclose.apply(np.log).diff()
+
 if not os.path.exists('dj30rr'):
+    #if pandas is updated, unpickling sometimes fails and the data needs to be saved again
     paclose_ratereturn.save('dj30rr')
+
 plt.figure()
 paclose_ratereturn.plot()
 plt.title('daily rate of return')
-paclose_ratereturn_vol = paclose_ratereturn.apply(lambda x: np.power(x, 2))
+
+# square the returns
+paclose_ratereturn_vol = paclose_ratereturn.apply(lambda x:np.power(x,2))
 plt.figure()
 plt.title('volatility (with 5 day moving average)')
 paclose_ratereturn_vol.plot()
-paclose_ratereturn_vol_mov = paclose_ratereturn_vol.apply(lambda x: np.
-    convolve(x, np.ones(5) / 5.0, 'same'))
+
+# use convolution to get moving average
+paclose_ratereturn_vol_mov = paclose_ratereturn_vol.apply(
+                        lambda x:np.convolve(x,np.ones(5)/5.,'same'))
 paclose_ratereturn_vol_mov.plot()
+
+
+
+#plt.show()
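
findow_1.py depends on APIs that have since been removed (`matplotlib.finance`, `pd.WidePanel`, `DataFrame.save`), so it no longer runs as written. The core computation it illustrates, log returns, squared returns as a volatility proxy, and a 5-day moving average via convolution, can be sketched with plain numpy on a hypothetical price path:

    import numpy as np

    rng = np.random.default_rng(0)
    prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal(250)))   # made-up price path

    rate_return = np.diff(np.log(prices))                             # log price, first difference over time
    vol_proxy = rate_return ** 2                                      # squared returns
    vol_ma5 = np.convolve(vol_proxy, np.ones(5) / 5.0, mode='same')   # 5-day moving average

    print(rate_return[:3])
    print(vol_ma5[:3])
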
diff --git a/statsmodels/sandbox/examples/thirdparty/try_interchange.py b/statsmodels/sandbox/examples/thirdparty/try_interchange.py
index 1d188e289..c400fdd79 100644
--- a/statsmodels/sandbox/examples/thirdparty/try_interchange.py
+++ b/statsmodels/sandbox/examples/thirdparty/try_interchange.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """groupmean, groupby in pandas, la and tabular from a scikits.timeseries

 after a question on the scipy-user mailing list I tried to do
@@ -26,43 +27,48 @@ import scikits.timeseries as ts
 import la
 import pandas
 import tabular as tb
-from finance import msft, ibm
-s = ts.time_series([1, 2, 3, 4, 5], dates=ts.date_array(['2001-01',
-    '2001-01', '2001-02', '2001-03', '2001-03'], freq='M'))
+from finance import msft, ibm  # hack to make it run as standalone
+
+s = ts.time_series([1,2,3,4,5],
+            dates=ts.date_array(["2001-01","2001-01",
+            "2001-02","2001-03","2001-03"],freq="M"))
+
 print('\nUsing la')
 dta = la.larry(s.data, label=[lrange(len(s.data))])
 dat = la.larry(s.dates.tolist(), label=[lrange(len(s.data))])
-s2 = ts.time_series(dta.group_mean(dat).x, dates=ts.date_array(dat.x, freq='M')
-    )
+s2 = ts.time_series(dta.group_mean(dat).x,dates=ts.date_array(dat.x,freq="M"))
 s2u = ts.remove_duplicated_dates(s2)
 print(repr(s))
 print(dat)
 print(repr(s2))
 print(repr(s2u))
+
 print('\nUsing pandas')
 pdta = pandas.DataFrame(s.data, np.arange(len(s.data)), [1])
-pa = pdta.groupby(dict(zip(np.arange(len(s.data)), s.dates.tolist()))
-    ).aggregate(np.mean)
-s3 = ts.time_series(pa.values.ravel(), dates=ts.date_array(pa.index.tolist(
-    ), freq='M'))
+pa = pdta.groupby(dict(zip(np.arange(len(s.data)),
+            s.dates.tolist()))).aggregate(np.mean)
+s3 = ts.time_series(pa.values.ravel(),
+            dates=ts.date_array(pa.index.tolist(),freq="M"))
+
 print(pa)
 print(repr(s3))
+
 print('\nUsing tabular')
 X = tb.tabarray(array=s.torecords(), dtype=s.torecords().dtype)
-tabx = X.aggregate(On=['_dates'], AggFuncDict={'_data': np.mean, '_mask':
-    np.all})
-s4 = ts.time_series(tabx['_data'], dates=ts.date_array(tabx['_dates'], freq
-    ='M'))
+tabx = X.aggregate(On=['_dates'], AggFuncDict={'_data':np.mean,'_mask':np.all})
+s4 = ts.time_series(tabx['_data'],dates=ts.date_array(tabx['_dates'],freq="M"))
 print(tabx)
 print(repr(s4))
+
+#after running pandas/examples/finance.py
 larmsft = la.larry(msft.values, [msft.index.tolist(), msft.columns.tolist()])
 laribm = la.larry(ibm.values, [ibm.index.tolist(), ibm.columns.tolist()])
-lar1 = la.larry(np.dstack((msft.values, ibm.values)), [ibm.index.tolist(),
-    ibm.columns.tolist(), ['msft', 'ibm']])
+lar1 = la.larry(np.dstack((msft.values,ibm.values)), [ibm.index.tolist(), ibm.columns.tolist(), ['msft', 'ibm']])
 print(lar1.mean(0))
+
+
 y = la.larry([[1.0, 2.0], [3.0, 4.0]], [['a', 'b'], ['c', 'd']])
-ysr = np.empty(y.x.shape[0], dtype=[('index', 'S1')] + [(i, np.float) for i in
-    y.label[1]])
+ysr = np.empty(y.x.shape[0],dtype=([('index','S1')]+[(i,np.float) for i in y.label[1]]))
 ysr['index'] = y.label[0]
 for i in ysr.dtype.names[1:]:
     ysr[i] = y[y.labelindex(i, axis=1)].x
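
try_interchange.py compares group means across `scikits.timeseries`, `la`, `pandas`, and `tabular`, all of which (except pandas) are unmaintained, and the pandas calls themselves use long-removed APIs. A rough equivalent of the group-mean step in current pandas, assuming a monthly `PeriodIndex` stands in for the old `date_array`:

    import pandas as pd

    s = pd.Series([1, 2, 3, 4, 5],
                  index=pd.PeriodIndex(['2001-01', '2001-01', '2001-02',
                                        '2001-03', '2001-03'], freq='M'))
    # collapse duplicated dates to their mean, as the la/tabular variants do above
    print(s.groupby(level=0).mean())
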
diff --git a/statsmodels/sandbox/examples/try_gmm_other.py b/statsmodels/sandbox/examples/try_gmm_other.py
index c977b595e..094845f55 100644
--- a/statsmodels/sandbox/examples/try_gmm_other.py
+++ b/statsmodels/sandbox/examples/try_gmm_other.py
@@ -1,19 +1,61 @@
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.regression.linear_model import OLS
 from statsmodels.tools import tools
 from statsmodels.sandbox.regression.gmm import spec_hausman
+
 from statsmodels.sandbox.regression import gmm
+
+
 if __name__ == '__main__':
     examples = ['ivols', 'distquant'][:]
+
     if 'ivols' in examples:
         exampledata = ['ols', 'iv', 'ivfake'][1]
         nobs = nsample = 500
         sige = 3
         corrfactor = 0.025
-        x = np.linspace(0, 10, nobs)
-        X = tools.add_constant(np.column_stack((x, x ** 2)), prepend=False)
+
+
+        x = np.linspace(0,10, nobs)
+        X = tools.add_constant(np.column_stack((x, x**2)), prepend=False)
         beta = np.array([1, 0.1, 10])
+
+        def sample_ols(exog):
+            endog = np.dot(exog, beta) + sige*np.random.normal(size=nobs)
+            return endog, exog, None
+
+        def sample_iv(exog):
+            print('using iv example')
+            X = exog.copy()
+            e = sige * np.random.normal(size=nobs)
+            endog = np.dot(X, beta) + e
+            exog[:,0] = X[:,0] + corrfactor * e
+            z0 = X[:,0] + np.random.normal(size=nobs)
+            z1 = X.sum(1) + np.random.normal(size=nobs)
+            z2 = X[:,1]
+            z3 = (np.dot(X, np.array([2,1, 0])) +
+                            sige/2. * np.random.normal(size=nobs))
+            z4 = X[:,1] + np.random.normal(size=nobs)
+            instrument = np.column_stack([z0, z1, z2, z3, z4, X[:,-1]])
+            return endog, exog, instrument
+
+        def sample_ivfake(exog):
+            X = exog
+            e = sige * np.random.normal(size=nobs)
+            endog = np.dot(X, beta) + e
+            #X[:,0] += 0.01 * e
+            #z1 = X.sum(1) + np.random.normal(size=nobs)
+            #z2 = X[:,1]
+            z3 = (np.dot(X, np.array([2,1, 0])) +
+                            sige/2. * np.random.normal(size=nobs))
+            z4 = X[:,1] + np.random.normal(size=nobs)
+            instrument = np.column_stack([X[:,:2], z3, z4, X[:,-1]]) #last is constant
+            return endog, exog, instrument
+
+
         if exampledata == 'ols':
             endog, exog, _ = sample_ols(X)
             instrument = exog
@@ -21,55 +63,77 @@ if __name__ == '__main__':
             endog, exog, instrument = sample_iv(X)
         elif exampledata == 'ivfake':
             endog, exog, instrument = sample_ivfake(X)
+
+
+        #using GMM and IV2SLS classes
+        #----------------------------
+
         mod = gmm.IVGMM(endog, exog, instrument, nmoms=instrument.shape[1])
         res = mod.fit()
         modgmmols = gmm.IVGMM(endog, exog, exog, nmoms=exog.shape[1])
         resgmmols = modgmmols.fit()
-        modgmmiv = gmm.IVGMM(endog, exog, instrument, nmoms=instrument.shape[1]
-            )
-        resgmmiv = modgmmiv.fitgmm(np.ones(exog.shape[1], float), weights=
-            np.linalg.inv(np.dot(instrument.T, instrument)))
+        #the next is the same as IV2SLS, (Z'Z)^{-1} as weighting matrix
+        modgmmiv = gmm.IVGMM(endog, exog, instrument, nmoms=instrument.shape[1]) #same as mod
+        resgmmiv = modgmmiv.fitgmm(np.ones(exog.shape[1], float),
+                        weights=np.linalg.inv(np.dot(instrument.T, instrument)))
         modls = gmm.IV2SLS(endog, exog, instrument)
         resls = modls.fit()
         modols = OLS(endog, exog)
         resols = modols.fit()
+
         print('\nIV case')
         print('params')
         print('IV2SLS', resls.params)
-        print('GMMIV ', resgmmiv)
+        print('GMMIV ', resgmmiv) # .params
         print('GMM   ', res.params)
         print('diff  ', res.params - resls.params)
         print('OLS   ', resols.params)
         print('GMMOLS', resgmmols.params)
+
         print('\nbse')
         print('IV2SLS', resls.bse)
-        print('GMM   ', res.bse)
+        print('GMM   ', res.bse)   #bse currently only attached to model not results
         print('diff  ', res.bse - resls.bse)
         print('%-diff', resls.bse / res.bse * 100 - 100)
         print('OLS   ', resols.bse)
         print('GMMOLS', resgmmols.bse)
+        #print 'GMMiv', modgmmiv.bse
+
         print("Hausman's specification test")
         print(resls.spec_hausman())
         print(spec_hausman(resols.params, res.params, resols.cov_params(),
-            res.cov_params()))
-        print(spec_hausman(resgmmols.params, res.params, resgmmols.
-            cov_params(), res.cov_params()))
+                           res.cov_params()))
+        print(spec_hausman(resgmmols.params, res.params, resgmmols.cov_params(),
+                           res.cov_params()))
+
+
     if 'distquant' in examples:
+
+
+        #estimating distribution parameters from quantiles
+        #-------------------------------------------------
+
+        #example taken from distribution_estimators.py
         gparrvs = stats.genpareto.rvs(2, size=5000)
-        x0p = [1.0, gparrvs.min() - 5, 1]
-        moddist = gmm.DistQuantilesGMM(gparrvs, None, None, distfn=stats.
-            genpareto)
-        pit1, wit1 = moddist.fititer([1.5, 0, 1.5], maxiter=1)
+        x0p = [1., gparrvs.min()-5, 1]
+
+        moddist = gmm.DistQuantilesGMM(gparrvs, None, None, distfn=stats.genpareto)
+        #produces nonsense because optimal weighting matrix calculations do not
+        #apply to this case
+        #resgp = moddist.fit() #now with 'cov': LinAlgError: Singular matrix
+        pit1, wit1 = moddist.fititer([1.5,0,1.5], maxiter=1)
         print(pit1)
-        p1 = moddist.fitgmm([1.5, 0, 1.5])
+        p1 = moddist.fitgmm([1.5,0,1.5])
         print(p1)
-        moddist2 = gmm.DistQuantilesGMM(gparrvs, None, None, distfn=stats.
-            genpareto, pquant=np.linspace(0.01, 0.99, 10))
-        pit1a, wit1a = moddist2.fititer([1.5, 0, 1.5], maxiter=1)
+        moddist2 = gmm.DistQuantilesGMM(gparrvs, None, None, distfn=stats.genpareto,
+                                    pquant=np.linspace(0.01,0.99,10))
+        pit1a, wit1a = moddist2.fititer([1.5,0,1.5], maxiter=1)
         print(pit1a)
-        p1a = moddist2.fitgmm([1.5, 0, 1.5])
+        p1a = moddist2.fitgmm([1.5,0,1.5])
         print(p1a)
-        res1b = moddist2.fitonce([1.5, 0, 1.5])
+        #Note: pit1a and p1a are the same and almost the same (1e-5) as
+        #      fitquantilesgmm version (functions instead of class)
+        res1b = moddist2.fitonce([1.5,0,1.5])
         print(res1b.params)
-        print(res1b.bse)
+        print(res1b.bse)  #they look much too large
         print(np.sqrt(np.diag(res1b._cov_params)))
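
The IV example above compares OLS with the sandbox `IV2SLS` and `IVGMM` classes under an endogenous regressor. A compact sketch of the same OLS-versus-2SLS contrast, with made-up data-generating values and only the classes already imported in the script:

    import numpy as np
    from statsmodels.regression.linear_model import OLS
    from statsmodels.sandbox.regression import gmm

    np.random.seed(0)
    nobs = 500
    z = np.random.normal(size=nobs)                   # instrument
    e = np.random.normal(size=nobs)                   # structural error
    x = z + 0.5 * e + np.random.normal(size=nobs)     # regressor correlated with the error
    y = 1.0 + 2.0 * x + e

    exog = np.column_stack([np.ones(nobs), x])
    instrument = np.column_stack([np.ones(nobs), z])

    res_ols = OLS(y, exog).fit()
    res_iv = gmm.IV2SLS(y, exog, instrument).fit()
    print('OLS   ', res_ols.params)    # slope estimate biased away from 2
    print('IV2SLS', res_iv.params)     # slope estimate close to 2
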
diff --git a/statsmodels/sandbox/examples/try_multiols.py b/statsmodels/sandbox/examples/try_multiols.py
index 9ec6aa5c2..5d8a31cb9 100644
--- a/statsmodels/sandbox/examples/try_multiols.py
+++ b/statsmodels/sandbox/examples/try_multiols.py
@@ -1,26 +1,44 @@
+# -*- coding: utf-8 -*-
 """

 Created on Sun May 26 13:23:40 2013

 Author: Josef Perktold, based on Enrico Giampieri's multiOLS
 """
+
+#import numpy as np
 import pandas as pd
+
 import statsmodels.api as sm
 from statsmodels.sandbox.multilinear import multiOLS, multigroup
+
 data = sm.datasets.longley.load_pandas()
 df = data.exog
 df['TOTEMP'] = data.endog
+
+#This will perform the specified linear model on all the
+#other columns of the dataframe
 res0 = multiOLS('GNP + 1', df)
+
+#This selects only a certain subset of the columns
 res = multiOLS('GNP + 0', df, ['GNPDEFL', 'TOTEMP', 'POP'])
 print(res.to_string())
-url = 'https://raw.githubusercontent.com/vincentarelbundock/'
-url = url + 'Rdatasets/csv/HistData/Guerry.csv'
-df = pd.read_csv(url, index_col=1)
+
+
+url = "https://raw.githubusercontent.com/vincentarelbundock/"
+url = url + "Rdatasets/csv/HistData/Guerry.csv"
+df = pd.read_csv(url, index_col=1) #'dept')
+
+#evaluate the relationship between the various parameters with Wealth
 pvals = multiOLS('Wealth', df)['adj_pvals', '_f_test']
+
+#define the groups
 groups = {}
-groups['crime'] = ['Crime_prop', 'Infanticide', 'Crime_parents',
-    'Desertion', 'Crime_pers']
+groups['crime'] = ['Crime_prop', 'Infanticide',
+                   'Crime_parents', 'Desertion', 'Crime_pers']
 groups['religion'] = ['Donation_clergy', 'Clergy', 'Donations']
 groups['wealth'] = ['Commerce', 'Lottery', 'Instruction', 'Literacy']
+
+#do the analysis of the significance
 res3 = multigroup(pvals < 0.05, groups)
 print(res3)
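
For orientation, `multiOLS('GNP + 0', df, [...])` fits the same right-hand side against each listed column in turn and collects the results. A simplified sketch of that per-column loop with the formula API (here with an intercept, i.e. closer to the `'GNP + 1'` call; the real helper also returns adjusted p-values and an F-test):

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    data = sm.datasets.longley.load_pandas()
    df = data.exog
    df['TOTEMP'] = data.endog

    # regress each chosen column on GNP, one model per column
    for col in ['GNPDEFL', 'TOTEMP', 'POP']:
        res = smf.ols('%s ~ GNP' % col, data=df).fit()
        print(col, float(res.params['GNP']), float(res.pvalues['GNP']))
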
diff --git a/statsmodels/sandbox/examples/try_quantile_regression.py b/statsmodels/sandbox/examples/try_quantile_regression.py
index 06908ecab..cd662ca4b 100644
--- a/statsmodels/sandbox/examples/try_quantile_regression.py
+++ b/statsmodels/sandbox/examples/try_quantile_regression.py
@@ -1,24 +1,38 @@
-"""Example to illustrate Quantile Regression
+'''Example to illustrate Quantile Regression

 Author: Josef Perktold

-"""
+'''
+
 import numpy as np
 import matplotlib.pyplot as plt
+
 import statsmodels.api as sm
+
 from statsmodels.regression.quantile_regression import QuantReg
 sige = 5
 nobs, k_vars = 500, 5
 x = np.random.randn(nobs, k_vars)
-y = x.sum(1) + sige * (np.random.randn(nobs) / 2 + 1) ** 3
+#x[:,0] = 1
+y = x.sum(1) + sige * (np.random.randn(nobs)/2 + 1)**3
 p = 0.5
 exog = np.column_stack((np.ones(nobs), x))
 res_qr = QuantReg(y, exog).fit(p)
+
 res_qr2 = QuantReg(y, exog).fit(0.25)
 res_qr3 = QuantReg(y, exog).fit(0.75)
 res_ols = sm.OLS(y, exog).fit()
+
+
+##print 'ols ', res_ols.params
+##print '0.25', res_qr2
+##print '0.5 ', res_qr
+##print '0.75', res_qr3
+
 params = [res_ols.params, res_qr2.params, res_qr.params, res_qr3.params]
 labels = ['ols', 'qr 0.25', 'qr 0.5', 'qr 0.75']
+
+#sortidx = np.argsort(y)
 fitted_ols = np.dot(res_ols.model.exog, params[0])
 sortidx = np.argsort(fitted_ols)
 x_sorted = res_ols.model.exog[sortidx]
@@ -31,4 +45,5 @@ for lab, beta in zip(['ols', 'qr 0.25', 'qr 0.5', 'qr 0.75'], params):
     lw = 2 if lab == 'ols' else 1
     plt.plot(fitted, lw=lw, label=lab)
 plt.legend()
+
 plt.show()
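
`QuantReg.fit(q)` estimates the q-th conditional quantile by minimizing the asymmetric check (pinball) loss instead of squared error, which is why the 0.25/0.5/0.75 fits above trace out different lines through the skewed noise. A tiny sketch of that loss function:

    import numpy as np

    def check_loss(u, tau):
        # asymmetric absolute loss minimized by the tau-th regression quantile
        return u * (tau - (u < 0))

    u = np.linspace(-2, 2, 5)
    print(check_loss(u, 0.5))    # symmetric: 0.5 * |u|
    print(check_loss(u, 0.75))   # positive residuals penalized three times as much as negative
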
diff --git a/statsmodels/sandbox/examples/try_quantile_regression1.py b/statsmodels/sandbox/examples/try_quantile_regression1.py
index 2df9e1261..9262978bb 100644
--- a/statsmodels/sandbox/examples/try_quantile_regression1.py
+++ b/statsmodels/sandbox/examples/try_quantile_regression1.py
@@ -1,30 +1,37 @@
-"""Example to illustrate Quantile Regression
+'''Example to illustrate Quantile Regression

 Author: Josef Perktold

 polynomial regression with systematic deviations above

-"""
+'''
+
 import numpy as np
 import matplotlib.pyplot as plt
+
 from scipy import stats
 import statsmodels.api as sm
+
 from statsmodels.regression.quantile_regression import QuantReg
+
 sige = 0.1
 nobs, k_vars = 500, 3
 x = np.random.uniform(-1, 1, size=nobs)
 x.sort()
-exog = np.vander(x, k_vars + 1)[:, ::-1]
-mix = 0.1 * stats.norm.pdf(x[:, None], loc=np.linspace(-0.5, 0.75, 4),
-    scale=0.01).sum(1)
-y = exog.sum(1) + mix + sige * (np.random.randn(nobs) / 2 + 1) ** 3
+exog = np.vander(x, k_vars+1)[:,::-1]
+mix = 0.1 * stats.norm.pdf(x[:,None], loc=np.linspace(-0.5, 0.75, 4), scale=0.01).sum(1)
+y = exog.sum(1) + mix + sige * (np.random.randn(nobs)/2 + 1)**3
+
 p = 0.5
 res_qr = QuantReg(y, exog).fit(p)
 res_qr2 = QuantReg(y, exog).fit(0.1)
 res_qr3 = QuantReg(y, exog).fit(0.75)
 res_ols = sm.OLS(y, exog).fit()
+
 params = [res_ols.params, res_qr2.params, res_qr.params, res_qr3.params]
 labels = ['ols', 'qr 0.1', 'qr 0.5', 'qr 0.75']
+
+
 plt.figure()
 plt.plot(x, y, '.', alpha=0.5)
 for lab, beta in zip(['ols', 'qr 0.1', 'qr 0.5', 'qr 0.75'], params):
@@ -34,4 +41,5 @@ for lab, beta in zip(['ols', 'qr 0.1', 'qr 0.5', 'qr 0.75'], params):
     plt.plot(x, fitted, lw=lw, label=lab)
 plt.legend()
 plt.title('Quantile Regression')
+
 plt.show()
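
The polynomial design matrix above comes from `np.vander`, whose columns run from the highest power down to the constant; the `[:, ::-1]` slice reverses them so the constant comes first. A quick check of that ordering:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    print(np.vander(x, 4))            # columns: x**3, x**2, x**1, x**0
    print(np.vander(x, 4)[:, ::-1])   # reversed: constant first, as used for exog above
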
diff --git a/statsmodels/sandbox/examples/try_smoothers.py b/statsmodels/sandbox/examples/try_smoothers.py
index 96389e10c..81a12c9e7 100644
--- a/statsmodels/sandbox/examples/try_smoothers.py
+++ b/statsmodels/sandbox/examples/try_smoothers.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Tue Nov 01 15:17:52 2011

@@ -7,65 +8,92 @@ Author: Josef
 mainly script for checking Kernel Regression
 """
 import numpy as np
-if __name__ == '__main__':
+
+if __name__ == "__main__":
+    #from statsmodels.sandbox.nonparametric import smoothers as s
     from statsmodels.sandbox.nonparametric import smoothers, kernels
     import matplotlib.pyplot as plt
+    #from numpy import sin, array, random
+
     import time
     np.random.seed(500)
     nobs = 250
     sig_fac = 0.5
+    #x = np.random.normal(size=nobs)
     x = np.random.uniform(-2, 2, size=nobs)
-    y = np.sin(x * 5) / x + 2 * x + sig_fac * (3 + x) * np.random.normal(size
-        =nobs)
+    #y = np.array([np.sin(i*5)/i + 2*i + (3+i)*np.random.normal() for i in x])
+    y = np.sin(x*5)/x + 2*x + sig_fac * (3+x)*np.random.normal(size=nobs)
+
     K = kernels.Biweight(0.25)
-    K2 = kernels.CustomKernel(lambda x: (1 - x * x) ** 2, 0.25, domain=[-
-        1.0, 1.0])
+    K2 = kernels.CustomKernel(lambda x: (1 - x*x)**2, 0.25, domain = [-1.0,
+                               1.0])
+
     KS = smoothers.KernelSmoother(x, y, K)
     KS2 = smoothers.KernelSmoother(x, y, K2)
+
+
     KSx = np.arange(-3, 3, 0.1)
     start = time.time()
     KSy = KS.conf(KSx)
     KVar = KS.std(KSx)
-    print(time.time() - start)
-    start = time.time()
-    KS2y = KS2.conf(KSx)
-    K2Var = KS2.std(KSx)
-    print(time.time() - start)
+    print(time.time() - start)    # This should be significantly quicker...
+    start = time.time()          #
+    KS2y = KS2.conf(KSx)         #
+    K2Var = KS2.std(KSx)         #
+    print(time.time() - start)    # ...than this.
+
     KSConfIntx, KSConfInty = KS.conf(15)
-    print('Norm const should be 0.9375')
+
+    print("Norm const should be 0.9375")
     print(K2.norm_const)
-    print('L2 Norms Should Match:')
+
+    print("L2 Norms Should Match:")
     print(K.L2Norm)
     print(K2.L2Norm)
-    print('Fit values should match:')
+
+    print("Fit values should match:")
+    #print zip(KSy, KS2y)
     print(KSy[28])
     print(KS2y[28])
-    print('Var values should match:')
+
+    print("Var values should match:")
+    #print zip(KVar, K2Var)
     print(KVar[39])
     print(K2Var[39])
+
     fig = plt.figure()
     ax = fig.add_subplot(221)
-    ax.plot(x, y, '+')
-    ax.plot(KSx, KSy, '-o')
+    ax.plot(x, y, "+")
+    ax.plot(KSx, KSy, "-o")
+    #ax.set_ylim(-20, 30)
     ax2 = fig.add_subplot(222)
-    ax2.plot(KSx, KVar, '-o')
+    ax2.plot(KSx, KVar, "-o")
+
     ax3 = fig.add_subplot(223)
-    ax3.plot(x, y, '+')
-    ax3.plot(KSx, KS2y, '-o')
+    ax3.plot(x, y, "+")
+    ax3.plot(KSx, KS2y, "-o")
+    #ax3.set_ylim(-20, 30)
     ax4 = fig.add_subplot(224)
-    ax4.plot(KSx, K2Var, '-o')
+    ax4.plot(KSx, K2Var, "-o")
+
     fig2 = plt.figure()
     ax5 = fig2.add_subplot(111)
-    ax5.plot(x, y, '+')
-    ax5.plot(KSConfIntx, KSConfInty, '-o')
+    ax5.plot(x, y, "+")
+    ax5.plot(KSConfIntx, KSConfInty, "-o")
+
     import statsmodels.nonparametric.smoothers_lowess as lo
     ys = lo.lowess(y, x)
-    ax5.plot(ys[:, 0], ys[:, 1], 'b-')
+    ax5.plot(ys[:,0], ys[:,1], 'b-')
     ys2 = lo.lowess(y, x, frac=0.25)
-    ax5.plot(ys2[:, 0], ys2[:, 1], 'b--', lw=2)
+    ax5.plot(ys2[:,0], ys2[:,1], 'b--', lw=2)
+
+    #need to sort for matplotlib plot?
     xind = np.argsort(x)
     pmod = smoothers.PolySmoother(5, x[xind])
     pmod.fit(y[xind])
+
     yp = pmod(x[xind])
     ax5.plot(x[xind], yp, 'k-')
     ax5.set_title('Kernel regression, lowess - blue, polysmooth - black')
+
+    #plt.show()
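
Alongside the sandbox kernel smoothers, the script overlays `lowess` from the main package; its output is an array already sorted by x, with the fitted values in the second column. A minimal standalone usage sketch on similar made-up data:

    import numpy as np
    import statsmodels.api as sm

    np.random.seed(0)
    x = np.random.uniform(-2, 2, size=200)
    y = np.sin(5 * x) + 2 * x + 0.5 * np.random.normal(size=200)

    smoothed = sm.nonparametric.lowess(y, x, frac=0.25)
    # column 0 is x (sorted), column 1 is the smoothed fit
    print(smoothed[:3])
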
diff --git a/statsmodels/sandbox/gam.py b/statsmodels/sandbox/gam.py
index feeb80e9e..d51281226 100644
--- a/statsmodels/sandbox/gam.py
+++ b/statsmodels/sandbox/gam.py
@@ -33,21 +33,68 @@ case for gamma and the others. Advantage of PolySmoother is that we can
 benchmark against the parametric GLM results.

 """
+
+# JP:
+# changes: use PolySmoother instead of crashing bspline
+# TODO: check/catalogue required interface of a smoother
+# TODO: replace default smoother by corresponding function to initialize
+#       other smoothers
+# TODO: fix iteration, do not define class with iterator methods, use looping;
+#       add maximum iteration and other optional stop criteria
+# fixed some of the dimension problems in PolySmoother,
+#       now graph for example looks good
+# NOTE: example script is now in examples folder
+#update: I did some of the above, see module docstring
+
 import numpy as np
+
 from statsmodels.genmod import families
 from statsmodels.sandbox.nonparametric.smoothers import PolySmoother
 from statsmodels.genmod.generalized_linear_model import GLM
 from statsmodels.tools.sm_exceptions import IterationLimitWarning, iteration_limit_doc
+
 import warnings
-DEBUG = False

+DEBUG = False

 def default_smoother(x, s_arg=None):
-    """
-
-    """
-    pass
-
+    '''
+
+    '''
+#    _x = x.copy()
+#    _x.sort()
+    _x = np.sort(x)
+    n = x.shape[0]
+    # taken from smooth.spline in R
+
+    #if n < 50:
+    if n < 500:
+        nknots = n
+    else:
+        a1 = np.log(50) / np.log(2)
+        a2 = np.log(100) / np.log(2)
+        a3 = np.log(140) / np.log(2)
+        a4 = np.log(200) / np.log(2)
+        if n < 200:
+            nknots = 2**(a1 + (a2 - a1) * (n - 50)/150.)
+        elif n < 800:
+            nknots = 2**(a2 + (a3 - a2) * (n - 200)/600.)
+        elif n < 3200:
+            nknots = 2**(a3 + (a4 - a3) * (n - 800)/2400.)
+        else:
+            nknots = 200 + (n - 3200.)**0.2
+    knots = _x[np.linspace(0, n-1, nknots).astype(np.int32)]
+
+    #s = SmoothingSpline(knots, x=x.copy())
+    #when I set order=2, I get nans in the GAM prediction
+    if s_arg is None:
+        order = 3 #what about knots? need smoother *args or **kwds
+    else:
+        order = s_arg
+    s = PolySmoother(order, x=x.copy())  #TODO: change order, why copy?
+#    s.gram(d=2)
+#    s.target_df = 5
+    return s

 class Offset:

@@ -58,11 +105,12 @@ class Offset:
     def __call__(self, *args, **kw):
         return self.fn(*args, **kw) + self.offset

-
 class Results:

     def __init__(self, Y, alpha, exog, smoothers, family, offset):
-        self.nobs, self.k_vars = exog.shape
+        self.nobs, self.k_vars = exog.shape  #assumes exog is 2d
+        #weird: If I put the previous line after the definition of self.mu,
+        #    then the attributes do not get added
         self.Y = Y
         self.alpha = alpha
         self.smoothers = smoothers
@@ -70,34 +118,66 @@ class Results:
         self.family = family
         self.exog = exog
         self.offset = offset
-        self.mu = self.linkinversepredict(exog)
+        self.mu = self.linkinversepredict(exog)  #TODO: remove __call__
+
+

     def __call__(self, exog):
-        """expected value ? check new GLM, same as mu for given exog
+        '''expected value ? check new GLM, same as mu for given exog
         maybe remove this
-        """
+        '''
         return self.linkinversepredict(exog)

-    def linkinversepredict(self, exog):
-        """expected value ? check new GLM, same as mu for given exog
-        """
-        pass
+    def linkinversepredict(self, exog):  #TODO what's the name in GLM
+        '''expected value ? check new GLM, same as mu for given exog
+        '''
+        return self.family.link.inverse(self.predict(exog))

     def predict(self, exog):
-        """predict response, sum of smoothed components
+        '''predict response, sum of smoothed components
         TODO: What's this in the case of GLM, corresponds to X*beta ?
-        """
-        pass
+        '''
+        #note: sum is here over axis=0,
+        #TODO: transpose in smoothed and sum over axis=1
+
+        #BUG: there is some inconsistent orientation somewhere
+        #temporary hack, will not work for 1d
+        #print dir(self)
+        #print 'self.nobs, self.k_vars', self.nobs, self.k_vars
+        exog_smoothed = self.smoothed(exog)
+        #print 'exog_smoothed.shape', exog_smoothed.shape
+        if exog_smoothed.shape[0] == self.k_vars:
+            import warnings
+            warnings.warn("old orientation, colvars, will go away",
+                          FutureWarning)
+            return np.sum(self.smoothed(exog), axis=0) + self.alpha
+        if exog_smoothed.shape[1] == self.k_vars:
+            return np.sum(exog_smoothed, axis=1) + self.alpha
+        else:
+            raise ValueError('shape mismatch in predict')

     def smoothed(self, exog):
-        """get smoothed prediction for each component
-
-        """
-        pass
-
+        '''get smoothed prediction for each component
+
+        '''
+        #bug: with exog in predict I get a shape error
+        #print 'smoothed', exog.shape, self.smoothers[0].predict(exog).shape
+        #there was a mistake exog did not have column index i
+        return np.array([self.smoothers[i].predict(exog[:,i]) + self.offset[i]
+        #should not be a mistake because exog[:,i] is attached to smoother, but
+        #it is for different exog
+        #return np.array([self.smoothers[i].predict() + self.offset[i]
+                         for i in range(exog.shape[1])]).T
+
+    def smoothed_demeaned(self, exog):
+        components = self.smoothed(exog)
+        means = components.mean(0)
+        constant = means.sum() + self.alpha
+        components_demeaned = components - means
+        return components_demeaned, constant

 class AdditiveModel:
-    """additive model with non-parametric, smoothed components
+    '''additive model with non-parametric, smoothed components

     Parameters
     ----------
@@ -108,7 +188,7 @@ class AdditiveModel:
     family : None or family instance
         I think only used because of shared results with GAM and subclassing.
         If None, then Gaussian is used.
-    """
+    '''

     def __init__(self, exog, smoothers=None, weights=None, family=None):
         self.exog = exog
@@ -116,32 +196,67 @@ class AdditiveModel:
             self.weights = weights
         else:
             self.weights = np.ones(self.exog.shape[0])
-        self.smoothers = smoothers or [default_smoother(exog[:, i]) for i in
-            range(exog.shape[1])]
+
+        self.smoothers = smoothers or [default_smoother(exog[:,i]) for i in range(exog.shape[1])]
+
+        #TODO: why do we set here df, refactoring temporary?
         for i in range(exog.shape[1]):
             self.smoothers[i].df = 10
+
         if family is None:
             self.family = families.Gaussian()
         else:
             self.family = family
+        #self.family = families.Gaussian()

     def _iter__(self):
-        """initialize iteration ?, should be removed
+        '''initialize iteration ?, should be removed

-        """
-        pass
+        '''
+        self.iter = 0
+        self.dev = np.inf
+        return self

     def next(self):
-        """internal calculation for one fit iteration
+        '''internal calculation for one fit iteration

         BUG: I think this does not improve, what is supposed to improve
             offset does not seem to be used, neither an old alpha
             The smoothers keep coef/params from previous iteration
-        """
-        pass
+        '''
+        _results = self.results
+        Y = self.results.Y
+        mu = _results.predict(self.exog)
+        #TODO offset is never used ?
+        offset = np.zeros(self.exog.shape[1], np.float64)
+        alpha = (Y * self.weights).sum() / self.weights.sum()
+        for i in range(self.exog.shape[1]):
+            tmp = self.smoothers[i].predict()
+            #TODO: check what smooth needs to do
+            #smooth (alias for fit, fit given x to new y and attach
+            #print 'next shape', (Y - alpha - mu + tmp).shape
+            bad = np.isnan(Y - alpha - mu + tmp).any()
+            if bad: #temporary assert while debugging
+                print(Y, alpha, mu, tmp)
+                raise ValueError("nan encountered")
+            #self.smoothers[i].smooth(Y - alpha - mu + tmp,
+            self.smoothers[i].smooth(Y - mu + tmp,
+                                     weights=self.weights)
+            tmp2 = self.smoothers[i].predict() #fittedvalues of previous smooth/fit
+            self.results.offset[i] = -(tmp2*self.weights).sum() / self.weights.sum()
+            #self.offset used in smoothed
+            if DEBUG:
+                print(self.smoothers[i].params)
+            mu += tmp2 - tmp
+        #change setting offset here: tests still pass, offset equal to constant
+        #in component ??? what's the effect of offset
+        offset = self.results.offset
+        #print self.iter
+        #self.iter += 1 #missing incrementing of iter counter NOT
+        return Results(Y, alpha, self.exog, self.smoothers, self.family, offset)

     def cont(self):
-        """condition to continue iteration loop
+        '''condition to continue iteration loop

         Parameters
         ----------
@@ -152,38 +267,167 @@ class AdditiveModel:
         cont : bool
             If true, then iteration should be continued.

-        """
-        pass
+        '''
+        self.iter += 1 #moved here to always count, not necessary
+        if DEBUG:
+            print(self.iter, self.results.Y.shape)
+            print(self.results.predict(self.exog).shape, self.weights.shape)
+        curdev = (((self.results.Y - self.results.predict(self.exog))**2) * self.weights).sum()
+
+        if self.iter > self.maxiter: #kill it, no max iteration option
+            return False
+        if np.fabs((self.dev - curdev) / curdev) < self.rtol:
+            self.dev = curdev
+            return False
+
+        #self.iter += 1
+        self.dev = curdev
+        return True

     def df_resid(self):
-        """degrees of freedom of residuals, ddof is sum of all smoothers df
-        """
-        pass
+        '''degrees of freedom of residuals, ddof is sum of all smoothers df
+        '''
+        return self.results.Y.shape[0] - np.array([self.smoothers[i].df_fit() for i in range(self.exog.shape[1])]).sum()

     def estimate_scale(self):
-        """estimate standard deviation of residuals
-        """
-        pass
+        '''estimate standard deviation of residuals
+        '''
+        #TODO: remove use of self.results.__call__
+        return ((self.results.Y - self.results(self.exog))**2).sum() / self.df_resid()

-    def fit(self, Y, rtol=1e-06, maxiter=30):
-        """fit the model to a given endogenous variable Y
+    def fit(self, Y, rtol=1.0e-06, maxiter=30):
+        '''fit the model to a given endogenous variable Y

         This needs to change for consistency with statsmodels

-        """
-        pass
+        '''
+        self.rtol = rtol
+        self.maxiter = maxiter
+        #iter(self)  # what does this do? anything?
+        self._iter__()
+        mu = 0
+        alpha = (Y * self.weights).sum() / self.weights.sum()

+        offset = np.zeros(self.exog.shape[1], np.float64)

-class Model(GLM, AdditiveModel):
+        for i in range(self.exog.shape[1]):
+            self.smoothers[i].smooth(Y - alpha - mu,
+                                     weights=self.weights)
+            tmp = self.smoothers[i].predict()
+            offset[i] = (tmp * self.weights).sum() / self.weights.sum()
+            tmp -= tmp.sum()
+            mu += tmp
+
+        self.results = Results(Y, alpha, self.exog, self.smoothers, self.family, offset)
+
+        while self.cont():
+            self.results = self.next()
+
+        if self.iter >= self.maxiter:
+            warnings.warn(iteration_limit_doc, IterationLimitWarning)
+
+        return self.results

-    def __init__(self, endog, exog, smoothers=None, family=families.Gaussian()
-        ):
+class Model(GLM, AdditiveModel):
+#class Model(AdditiveModel):
+    #TODO: what does GLM do? Is it actually used ?
+    #only used in __init__, dropping it does not change results
+    #but where gets family attached now? - weird, it's Gaussian in this case now
+    #also where is the link defined?
+    #AdditiveModel overwrites family and sets it to Gaussian - corrected
+
+    #I think both GLM and AdditiveModel subclassing is only used in __init__
+
+    #niter = 2
+
+#    def __init__(self, exog, smoothers=None, family=family.Gaussian()):
+#        GLM.__init__(self, exog, family=family)
+#        AdditiveModel.__init__(self, exog, smoothers=smoothers)
+#        self.family = family
+    def __init__(self, endog, exog, smoothers=None, family=families.Gaussian()):
+        #self.family = family
+        #TODO: inconsistent super __init__
         AdditiveModel.__init__(self, exog, smoothers=smoothers, family=family)
         GLM.__init__(self, endog, exog, family=family)
-        assert self.family is family
+        assert self.family is family  #make sure we got the right family
+
+    def next(self):
+        _results = self.results
+        Y = _results.Y
+        if np.isnan(self.weights).all():
+            print("nanweights1")
+
+        _results.mu = self.family.link.inverse(_results.predict(self.exog))
+        #eta = _results.predict(self.exog)
+        #_results.mu = self.family.fitted(eta)
+        weights = self.family.weights(_results.mu)
+        if np.isnan(weights).all():
+            self.weights = weights
+            print("nanweights2")
+        self.weights = weights
+        if DEBUG:
+            print('deriv isnan', np.isnan(self.family.link.deriv(_results.mu)).any())
+
+        #Z = _results.predict(self.exog) + \
+        Z = _results.predict(self.exog) + \
+               self.family.link.deriv(_results.mu) * (Y - _results.mu) #- _results.alpha #?added alpha
+
+        m = AdditiveModel(self.exog, smoothers=self.smoothers,
+                          weights=self.weights, family=self.family)
+
+        #TODO: I do not know what the next two lines do, Z, Y ? which is endog?
+        #Y is original endog, Z is endog for the next step in the iterative solver
+
+        _results = m.fit(Z)
+        self.history.append([Z, _results.predict(self.exog)])
+        _results.Y = Y
+        _results.mu = self.family.link.inverse(_results.predict(self.exog))
+        self.iter += 1
+        self.results = _results
+
+        return _results

     def estimate_scale(self, Y=None):
         """
-        Return Pearson's X^2 estimate of scale.
+        Return Pearson\'s X^2 estimate of scale.
         """
-        pass
+
+        if Y is None:
+            Y = self.Y
+        resid = Y - self.results.mu
+        return (np.power(resid, 2) / self.family.variance(self.results.mu)).sum() \
+                    / self.df_resid   #TODO check this
+                    #/ AdditiveModel.df_resid(self)  #what is the class doing here?
+
+
+    def fit(self, Y, rtol=1.0e-06, maxiter=30):
+
+        self.rtol = rtol
+        self.maxiter = maxiter
+
+        self.Y = np.asarray(Y, np.float64)
+
+        self.history = []
+
+        #iter(self)
+        self._iter__()
+
+        #TODO code duplication with next?
+        alpha = self.Y.mean()
+        mu0 = self.family.starting_mu(Y)
+        #Z = self.family.link(alpha) + self.family.link.deriv(alpha) * (Y - alpha)
+        Z = self.family.link(alpha) + self.family.link.deriv(alpha) * (Y - mu0)
+        m = AdditiveModel(self.exog, smoothers=self.smoothers, family=self.family)
+        self.results = m.fit(Z)
+        self.results.mu = self.family.link.inverse(self.results.predict(self.exog))
+        self.results.Y = Y
+
+        while self.cont():
+            self.results = self.next()
+            self.scale = self.results.scale = self.estimate_scale()
+
+        if self.iter >= self.maxiter:
+            import warnings
+            warnings.warn(iteration_limit_doc, IterationLimitWarning)
+
+        return self.results
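
`AdditiveModel.fit` above is a backfitting loop: start from the weighted mean, then repeatedly re-smooth each component against the partial residual that removes the other components, until the deviance stops changing. A toy version of that loop, assuming simple polynomial smoothers (`np.polyfit`) in place of `PolySmoother` and a fixed iteration count instead of the `rtol` test:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 300
    x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
    y = np.sin(3 * x1) + x2 ** 2 + 0.1 * rng.standard_normal(n)

    alpha = y.mean()
    f1 = np.zeros(n)
    f2 = np.zeros(n)
    for _ in range(20):
        # smooth the partial residual for each component in turn
        f1 = np.polyval(np.polyfit(x1, y - alpha - f2, 5), x1)
        f1 -= f1.mean()                     # keep the constant in alpha
        f2 = np.polyval(np.polyfit(x2, y - alpha - f1, 5), x2)
        f2 -= f2.mean()

    resid = y - (alpha + f1 + f2)
    print(resid.std())                      # should end up near the noise level of 0.1
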
diff --git a/statsmodels/sandbox/infotheo.py b/statsmodels/sandbox/infotheo.py
index 6734017fd..c85ab1343 100644
--- a/statsmodels/sandbox/infotheo.py
+++ b/statsmodels/sandbox/infotheo.py
@@ -9,12 +9,38 @@ Golan, As. 2008. "Information and Entropy Econometrics -- A Review and
 Golan, A., Judge, G., and Miller, D.  1996.  Maximum Entropy Econometrics.
     Wiley & Sons, Chichester.
 """
+#For MillerMadow correction
+#Miller, G. 1955. Note on the bias of information estimates. Info. Theory
+#    Psychol. Prob. Methods II-B:95-100.
+
+#For ChaoShen method
+#Chao, A., and T.-J. Shen. 2003. Nonparametric estimation of Shannon's index of diversity when
+#there are unseen species in sample. Environ. Ecol. Stat. 10:429-443.
+#Good, I. J. 1953. The population frequencies of species and the estimation of population parameters.
+#Biometrika 40:237-264.
+#Horvitz, D.G., and D. J. Thompson. 1952. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47:663-685.
+
+#For NSB method
+#Nemenman, I., F. Shafee, and W. Bialek. 2002. Entropy and inference, revisited. In: Dietterich, T.,
+#S. Becker, Z. Gharamani, eds. Advances in Neural Information Processing Systems 14: 471-478.
+#Cambridge (Massachusetts): MIT Press.
+
+#For shrinkage method
+#Dougherty, J., Kohavi, R., and Sahami, M. (1995). Supervised and unsupervised discretization of
+#continuous features. In International Conference on Machine Learning.
+#Yang, Y. and Webb, G. I. (2003). Discretization for naive-bayes learning: managing discretization
+#bias and variance. Technical Report 2003/131 School of Computer Science and Software Engineer-
+#ing, Monash University.
+
 from statsmodels.compat.python import lzip, lmap
 from scipy import stats
 import numpy as np
 from matplotlib import pyplot as plt
 from scipy.special import logsumexp as sp_logsumexp

+#TODO: change these to use maxentutils so that over/underflow is handled
+#with the logsumexp.
+

 def logsumexp(a, axis=None):
     """
@@ -40,17 +66,29 @@ def logsumexp(a, axis=None):

     This should be superseded by the ufunc when it is finished.
     """
-    pass
+    if axis is None:
+        # Use the scipy.maxentropy version.
+        return sp_logsumexp(a)
+    a = np.asarray(a)
+    shp = list(a.shape)
+    shp[axis] = 1
+    a_max = a.max(axis=axis)
+    s = np.log(np.exp(a - a_max.reshape(shp)).sum(axis=axis))
+    lse  = a_max + s
+    return lse


 def _isproperdist(X):
     """
     Checks to see if `X` is a proper probability distribution
     """
-    pass
+    X = np.asarray(X)
+    if not np.allclose(np.sum(X), 1) or not np.all(X>=0) or not np.all(X<=1):
+        return False
+    else:
+        return True

-
-def discretize(X, method='ef', nbins=None):
+def discretize(X, method="ef", nbins=None):
     """
     Discretize `X`

@@ -65,10 +103,32 @@ def discretize(X, method='ef', nbins=None):
     Examples
     --------
     """
-    pass
-
-
-def logbasechange(a, b):
+    nobs = len(X)
+    if nbins is None:
+        nbins = np.floor(np.sqrt(nobs))
+    if method == "ef":
+        discrete = np.ceil(nbins * stats.rankdata(X)/nobs)
+    if method == "ew":
+        width = np.max(X) - np.min(X)
+        width = np.floor(width/nbins)
+        svec, ivec = stats.fastsort(X)
+        discrete = np.zeros(nobs)
+        binnum = 1
+        base = svec[0]
+        discrete[ivec[0]] = binnum
+        for i in range(1,nobs):
+            if svec[i] < base + width:
+                discrete[ivec[i]] = binnum
+            else:
+                base = svec[i]
+                binnum += 1
+                discrete[ivec[i]] = binnum
+    return discrete
+#TODO: looks okay but needs more robust tests for corner cases
+
+
+
+def logbasechange(a,b):
     """
     There is a one-to-one transformation of the entropy value from
     a log base b to a log base a :
@@ -79,23 +139,22 @@ def logbasechange(a, b):
     -------
     log_{b}(a)
     """
-    pass
-
+    return np.log(b)/np.log(a)

 def natstobits(X):
     """
     Converts from nats to bits
     """
-    pass
-
+    return logbasechange(np.e, 2) * X

 def bitstonats(X):
     """
     Converts from bits to nats
     """
-    pass
-
+    return logbasechange(2, np.e) * X

+#TODO: make this entropy, and then have different measures as
+#a method
 def shannonentropy(px, logbase=2):
     """
     This is Shannon's entropy
@@ -120,9 +179,17 @@ def shannonentropy(px, logbase=2):
     -----
     shannonentropy(0) is defined as 0
     """
-    pass
-
-
+#TODO: have not defined the px,py case?
+    px = np.asarray(px)
+    if not np.all(px <= 1) or not np.all(px >= 0):
+        raise ValueError("px does not define proper distribution")
+    entropy = -np.sum(np.nan_to_num(px*np.log2(px)))
+    if logbase != 2:
+        return logbasechange(2,logbase) * entropy
+    else:
+        return entropy
+
+# Shannon's information content
 def shannoninfo(px, logbase=2):
     """
     Shannon's information
@@ -137,8 +204,13 @@ def shannoninfo(px, logbase=2):
     For logbase = 2
     np.log2(px)
     """
-    pass
-
+    px = np.asarray(px)
+    if not np.all(px <= 1) or not np.all(px >= 0):
+        raise ValueError("px does not define proper distribution")
+    if logbase != 2:
+        return - logbasechange(2,logbase) * np.log2(px)
+    else:
+        return - np.log2(px)

 def condentropy(px, py, pxpy=None, logbase=2):
     """
@@ -160,10 +232,19 @@ def condentropy(px, py, pxpy=None, logbase=2):
     where q_{j} = Y[j]
     and w_kj = X[k,j]
     """
-    pass
-
-
-def mutualinfo(px, py, pxpy, logbase=2):
+    if not _isproperdist(px) or not _isproperdist(py):
+        raise ValueError("px or py is not a proper probability distribution")
+    if pxpy is not None and not _isproperdist(pxpy):
+        raise ValueError("pxpy is not a proper joint distribution")
+    if pxpy is None:
+        pxpy = np.outer(py,px)
+    condent = np.sum(pxpy * np.nan_to_num(np.log2(py/pxpy)))
+    if logbase == 2:
+        return condent
+    else:
+        return logbasechange(2, logbase) * condent
+
+def mutualinfo(px,py,pxpy, logbase=2):
     """
     Returns the mutual information between X and Y.

@@ -184,10 +265,16 @@ def mutualinfo(px, py, pxpy, logbase=2):
     -------
     shannonentropy(px) - condentropy(px,py,pxpy)
     """
-    pass
-
+    if not _isproperdist(px) or not _isproperdist(py):
+        raise ValueError("px or py is not a proper probability distribution")
+    if pxpy is not None and not _isproperdist(pxpy):
+        raise ValueError("pxpy is not a proper joint distribution")
+    if pxpy is None:
+        pxpy = np.outer(py,px)
+    return shannonentropy(px, logbase=logbase) - condentropy(px,py,pxpy,
+            logbase=logbase)

-def corrent(px, py, pxpy, logbase=2):
+def corrent(px,py,pxpy,logbase=2):
     """
     An information theoretic correlation measure.

@@ -217,10 +304,17 @@ def corrent(px, py, pxpy, logbase=2):

     corrent(px,py,pxpy) = 1 - condent(px,py,pxpy)/shannonentropy(py)
     """
-    pass
-
+    if not _isproperdist(px) or not _isproperdist(py):
+        raise ValueError("px or py is not a proper probability distribution")
+    if pxpy is not None and not _isproperdist(pxpy):
+        raise ValueError("pxpy is not a proper joint distribution")
+    if pxpy is None:
+        pxpy = np.outer(py,px)
+
+    return mutualinfo(px,py,pxpy,logbase=logbase)/shannonentropy(py,
+            logbase=logbase)

-def covent(px, py, pxpy, logbase=2):
+def covent(px,py,pxpy,logbase=2):
     """
     An information theoretic covariance measure.

@@ -251,10 +345,22 @@ def covent(px, py, pxpy, logbase=2):

     covent(px,py,pxpy) = condent(px,py,pxpy) + condent(py,px,pxpy)
     """
-    pass
+    if not _isproperdist(px) or not _isproperdist(py):
+        raise ValueError("px or py is not a proper probability distribution")
+    if pxpy is not None and not _isproperdist(pxpy):
+        raise ValueError("pxpy is not a proper joint distribution")
+    if pxpy is None:
+        pxpy = np.outer(py,px)
+
+    # FIXME: these should be `condentropy`, not `condent`
+    return (condent(px, py, pxpy, logbase=logbase)  # noqa:F821  See GH#5756
+            + condent(py, px, pxpy, logbase=logbase))  # noqa:F821  See GH#5756


-def renyientropy(px, alpha=1, logbase=2, measure='R'):
+
+#### Generalized Entropies ####
+
+def renyientropy(px,alpha=1,logbase=2,measure='R'):
     """
     Renyi's generalized entropy

@@ -281,10 +387,31 @@ def renyientropy(px, alpha=1, logbase=2, measure='R'):

     In the limit as alpha -> inf, min-entropy is returned.
     """
-    pass
-
-
-def gencrossentropy(px, py, pxpy, alpha=1, logbase=2, measure='T'):
+#TODO:finish returns
+#TODO:add checks for measure
+    if not _isproperdist(px):
+        raise ValueError("px is not a proper probability distribution")
+    alpha = float(alpha)
+    if alpha == 1:
+        genent = shannonentropy(px)
+        if logbase != 2:
+            return logbasechange(2, logbase) * genent
+        return genent
+    elif 'inf' in str(alpha).lower() or alpha == np.inf:
+        return -np.log(np.max(px))
+
+    # gets here if alpha != (1 or inf)
+    px = px**alpha
+    genent = np.log(px.sum())
+    if logbase == 2:
+        return 1/(1-alpha) * genent
+    else:
+        return 1/(1-alpha) * logbasechange(2, logbase) * genent
+
+#TODO: before completing this, need to rethink the organization of
+# (relative) entropy measures, ie., all put into one function
+# and have kwdargs, etc.?
+def gencrossentropy(px,py,pxpy,alpha=1,logbase=2, measure='T'):
     """
     Generalized cross-entropy measures.

@@ -304,80 +431,90 @@ def gencrossentropy(px, py, pxpy, alpha=1, logbase=2, measure='T'):
         the cross-entropy version of the Tsallis measure.  'CR' is Cressie-Read
         measure.
     """
-    pass


-if __name__ == '__main__':
-    print(
-        'From Golan (2008) "Information and Entropy Econometrics -- A Review and Synthesis'
-        )
-    print('Table 3.1')
-    X = [0.2, 0.2, 0.2, 0.2, 0.2]
-    Y = [0.322, 0.072, 0.511, 0.091, 0.004]
+if __name__ == "__main__":
+    print("From Golan (2008) \"Information and Entropy Econometrics -- A Review \
+and Synthesis")
+    print("Table 3.1")
+    # Examples from Golan (2008)
+
+    X = [.2,.2,.2,.2,.2]
+    Y = [.322,.072,.511,.091,.004]
+
     for i in X:
         print(shannoninfo(i))
     for i in Y:
         print(shannoninfo(i))
     print(shannonentropy(X))
     print(shannonentropy(Y))
-    p = [1e-05, 0.0001, 0.001, 0.01, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 
-        0.45, 0.5]
+
+    p = [1e-5,1e-4,.001,.01,.1,.15,.2,.25,.3,.35,.4,.45,.5]
+
     plt.subplot(111)
-    plt.ylabel('Information')
-    plt.xlabel('Probability')
-    x = np.linspace(0, 1, 100001)
+    plt.ylabel("Information")
+    plt.xlabel("Probability")
+    x = np.linspace(0,1,100001)
     plt.plot(x, shannoninfo(x))
+#    plt.show()
+
     plt.subplot(111)
-    plt.ylabel('Entropy')
-    plt.xlabel('Probability')
-    x = np.linspace(0, 1, 101)
-    plt.plot(x, lmap(shannonentropy, lzip(x, 1 - x)))
-    w = np.array([[0, 0, 1.0 / 3], [1 / 9.0, 1 / 9.0, 1 / 9.0], [1 / 18.0, 
-        1 / 9.0, 1 / 6.0]])
+    plt.ylabel("Entropy")
+    plt.xlabel("Probability")
+    x = np.linspace(0,1,101)
+    plt.plot(x, lmap(shannonentropy, lzip(x,1-x)))
+#    plt.show()
+
+    # define a joint probability distribution
+    # from Golan (2008) table 3.3
+    w = np.array([[0,0,1./3],[1/9.,1/9.,1/9.],[1/18.,1/9.,1/6.]])
+    # table 3.4
     px = w.sum(0)
     py = w.sum(1)
     H_X = shannonentropy(px)
     H_Y = shannonentropy(py)
     H_XY = shannonentropy(w)
-    H_XgivenY = condentropy(px, py, w)
-    H_YgivenX = condentropy(py, px, w)
-    D_YX = logbasechange(2, np.e) * stats.entropy(px, py)
-    D_XY = logbasechange(2, np.e) * stats.entropy(py, px)
-    I_XY = mutualinfo(px, py, w)
-    print('Table 3.3')
-    print(H_X, H_Y, H_XY, H_XgivenY, H_YgivenX, D_YX, D_XY, I_XY)
-    print('discretize functions')
-    X = np.array([21.2, 44.5, 31.0, 19.5, 40.6, 38.7, 11.1, 15.8, 31.9, 
-        25.8, 20.2, 14.2, 24.0, 21.0, 11.3, 18.0, 16.3, 22.2, 7.8, 27.8, 
-        16.3, 35.1, 14.9, 17.1, 28.2, 16.4, 16.5, 46.0, 9.5, 18.8, 32.1, 
-        26.1, 16.1, 7.3, 21.4, 20.0, 29.3, 14.9, 8.3, 22.5, 12.8, 26.9, 
-        25.5, 22.9, 11.2, 20.7, 26.2, 9.3, 10.8, 15.6])
+    H_XgivenY = condentropy(px,py,w)
+    H_YgivenX = condentropy(py,px,w)
+# note that cross-entropy is not a distance measure as the following shows
+    D_YX = logbasechange(2,np.e)*stats.entropy(px, py)
+    D_XY = logbasechange(2,np.e)*stats.entropy(py, px)
+    I_XY = mutualinfo(px,py,w)
+    print("Table 3.3")
+    print(H_X,H_Y, H_XY, H_XgivenY, H_YgivenX, D_YX, D_XY, I_XY)
+
+    print("discretize functions")
+    X=np.array([21.2,44.5,31.0,19.5,40.6,38.7,11.1,15.8,31.9,25.8,20.2,14.2,
+        24.0,21.0,11.3,18.0,16.3,22.2,7.8,27.8,16.3,35.1,14.9,17.1,28.2,16.4,
+        16.5,46.0,9.5,18.8,32.1,26.1,16.1,7.3,21.4,20.0,29.3,14.9,8.3,22.5,
+        12.8,26.9,25.5,22.9,11.2,20.7,26.2,9.3,10.8,15.6])
     discX = discretize(X)
+    #CF: R's infotheo
+#TODO: compare to pyentropy quantize?
     print
-    print('Example in section 3.6 of Golan, using table 3.3')
+    print("Example in section 3.6 of Golan, using table 3.3")
     print("Bounding errors using Fano's inequality")
-    print('H(P_{e}) + P_{e}log(K-1) >= H(X|Y)')
-    print('or, a weaker inequality')
-    print('P_{e} >= [H(X|Y) - 1]/log(K)')
-    print('P(x) = %s' % px)
-    print('X = 3 has the highest probability, so this is the estimate Xhat')
+    print("H(P_{e}) + P_{e}log(K-1) >= H(X|Y)")
+    print("or, a weaker inequality")
+    print("P_{e} >= [H(X|Y) - 1]/log(K)")
+    print("P(x) = %s" % px)
+    print("X = 3 has the highest probability, so this is the estimate Xhat")
     pe = 1 - px[2]
-    print('The probability of error Pe is 1 - p(X=3) = %0.4g' % pe)
-    H_pe = shannonentropy([pe, 1 - pe])
-    print('H(Pe) = %0.4g and K=3' % H_pe)
-    print('H(Pe) + Pe*log(K-1) = %0.4g >= H(X|Y) = %0.4g' % (H_pe + pe * np
-        .log2(2), H_XgivenY))
-    print('or using the weaker inequality')
-    print('Pe = %0.4g >= [H(X) - 1]/log(K) = %0.4g' % (pe, (H_X - 1) / np.
-        log2(3)))
-    print('Consider now, table 3.5, where there is additional information')
-    print('The conditional probabilities of P(X|Y=y) are ')
-    w2 = np.array([[0.0, 0.0, 1.0], [1 / 3.0, 1 / 3.0, 1 / 3.0], [1 / 6.0, 
-        1 / 3.0, 1 / 2.0]])
+    print("The probability of error Pe is 1 - p(X=3) = %0.4g" % pe)
+    H_pe = shannonentropy([pe,1-pe])
+    print("H(Pe) = %0.4g and K=3" % H_pe)
+    print("H(Pe) + Pe*log(K-1) = %0.4g >= H(X|Y) = %0.4g" % \
+            (H_pe+pe*np.log2(2), H_XgivenY))
+    print("or using the weaker inequality")
+    print("Pe = %0.4g >= [H(X) - 1]/log(K) = %0.4g" % (pe, (H_X - 1)/np.log2(3)))
+    print("Consider now, table 3.5, where there is additional information")
+    print("The conditional probabilities of P(X|Y=y) are ")
+    w2 = np.array([[0.,0.,1.],[1/3.,1/3.,1/3.],[1/6.,1/3.,1/2.]])
     print(w2)
-    print('The probability of error given this information is')
-    print('Pe = [H(X|Y) -1]/log(K) = %0.4g' % ((np.mean([0, shannonentropy(
-        w2[1]), shannonentropy(w2[2])]) - 1) / np.log2(3)))
-    print('such that more information lowers the error')
-    markovchain = np.array([[0.553, 0.284, 0.163], [0.465, 0.312, 0.223], [
-        0.42, 0.322, 0.258]])
+# not a proper distribution?
+    print("The probability of error given this information is")
+    print("Pe = [H(X|Y) -1]/log(K) = %0.4g" % ((np.mean([0,shannonentropy(w2[1]),shannonentropy(w2[2])])-1)/np.log2(3)))
+    print("such that more information lowers the error")
+
+### Stochastic processes
+    markovchain = np.array([[.553,.284,.163],[.465,.312,.223],[.420,.322,.258]])
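
The entropy, conditional-entropy and mutual-information helpers above can be cross-checked directly from a joint probability table. A short sketch using the same 3x3 joint distribution as the `__main__` block (Golan 2008, table 3.3) and the identities H(X|Y) = H(X,Y) - H(Y) and I(X;Y) = H(X) + H(Y) - H(X,Y):

    import numpy as np

    w = np.array([[0, 0, 1/3], [1/9, 1/9, 1/9], [1/18, 1/9, 1/6]])   # joint distribution
    px, py = w.sum(0), w.sum(1)                                      # marginals

    def H(p):
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]                        # convention: 0 * log(0) = 0
        return -(p * np.log2(p)).sum()

    H_X, H_Y, H_XY = H(px), H(py), H(w)
    print('H(X|Y) =', H_XY - H_Y)
    print('I(X;Y) =', H_X + H_Y - H_XY)
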
diff --git a/statsmodels/sandbox/mcevaluate/arma.py b/statsmodels/sandbox/mcevaluate/arma.py
index 7ad827c3c..6fe683e85 100644
--- a/statsmodels/sandbox/mcevaluate/arma.py
+++ b/statsmodels/sandbox/mcevaluate/arma.py
@@ -1,10 +1,13 @@
+
 import numpy as np
 from statsmodels.tsa.arima_process import arma_generate_sample
 from statsmodels.tsa.arma_mle import Arma


+#TODO: still refactoring problem with cov_x
+#copied from sandbox.tsa.arima.py
 def mcarma22(niter=10, nsample=1000, ar=None, ma=None, sig=0.5):
-    """run Monte Carlo for ARMA(2,2)
+    '''run Monte Carlo for ARMA(2,2)

     DGP parameters currently hard coded
     also sample size `nsample`
@@ -12,12 +15,62 @@ def mcarma22(niter=10, nsample=1000, ar=None, ma=None, sig=0.5):
     was not a self contained function, used instances from outer scope
       now corrected

-    """
-    pass
+    '''
+    #nsample = 1000
+    #ar = [1.0, 0, 0]
+    if ar is None:
+        ar = [1.0, -0.55, -0.1]
+    #ma = [1.0, 0, 0]
+    if ma is None:
+        ma = [1.0,  0.3,  0.2]
+    results = []
+    results_bse = []
+    for _ in range(niter):
+        y2 = arma_generate_sample(ar,ma,nsample+1000, sig)[-nsample:]
+        y2 -= y2.mean()
+        arest2 = Arma(y2)
+        rhohat2a, cov_x2a, infodict, mesg, ier = arest2.fit((2,2))
+        results.append(rhohat2a)
+        err2a = arest2.geterrors(rhohat2a)
+        sige2a = np.sqrt(np.dot(err2a,err2a)/nsample)
+        #print('sige2a', sige2a,
+        #print('cov_x2a.shape', cov_x2a.shape
+        #results_bse.append(sige2a * np.sqrt(np.diag(cov_x2a)))
+        if cov_x2a is not None:
+            results_bse.append(sige2a * np.sqrt(np.diag(cov_x2a)))
+        else:
+            results_bse.append(np.nan + np.zeros_like(rhohat2a))
+    return np.r_[ar[1:], ma[1:]], np.array(results), np.array(results_bse)
+
+def mc_summary(res, rt=None):
+    if rt is None:
+        rt = np.zeros(res.shape[1])
+    nanrows = np.isnan(res).any(1)
+    print('fraction of iterations with nans', nanrows.mean())
+    res = res[~nanrows]
+    print('RMSE')
+    print(np.sqrt(((res-rt)**2).mean(0)))
+    print('mean bias')
+    print((res-rt).mean(0))
+    print('median bias')
+    print(np.median((res-rt),0))
+    print('median bias percent')
+    print(np.median((res-rt)/rt*100,0))
+    print('median absolute error')
+    print(np.median(np.abs(res-rt),0))
+    print('positive error fraction')
+    print((res > rt).mean(0))


 if __name__ == '__main__':
-    """ niter 50, sample size=1000, 2 runs
+
+#short version
+#    true, est, bse = mcarma22(niter=50)
+#    print(true
+#    #print(est
+#    print(est.mean(0)
+
+    ''' niter 50, sample size=1000, 2 runs
     [-0.55 -0.1   0.3   0.2 ]
     [-0.542401   -0.09904305  0.30840599  0.2052473 ]

@@ -63,13 +116,17 @@ if __name__ == '__main__':

     [-0.55 -0.1   0.3   0.2 ]
     [-0.47789765 -0.08650743  0.3554441   0.24196087]
-    """
+    '''
+
     ar = [1.0, -0.55, -0.1]
-    ma = [1.0, 0.3, 0.2]
+    ma = [1.0,  0.3,  0.2]
     nsample = 200
-    run_mc = True
+
+
+
+    run_mc = True#False
     if run_mc:
-        for sig in [0.1, 0.5, 1.0]:
+        for sig in [0.1, 0.5, 1.]:
             import time
             t0 = time.time()
             rt, res_rho, res_bse = mcarma22(niter=100, sig=sig)
@@ -77,10 +134,14 @@ if __name__ == '__main__':
             print('true')
             print(rt)
             print('nsample =', nsample, 'sigma = ', sig)
-            print('elapsed time for Monte Carlo', time.time() - t0)
+            print('elapsed time for Monte Carlo', time.time()-t0)
+            # 20 seconds for ARMA(2,2), 1000 iterations with 1000 observations
+            #sige2a = np.sqrt(np.dot(err2a,err2a)/nsample)
+            #print('\nbse of one sample'
+            #print(sige2a * np.sqrt(np.diag(cov_x2a))
             print('\nMC of rho versus true')
             mc_summary(res_rho, rt)
-            print('\nMC of bse versus zero')
+            print('\nMC of bse versus zero')  # this implies inf in percent
             mc_summary(res_bse)
             print('\nMC of bse versus std')
             mc_summary(res_bse, res_rho.std(0))
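(Annotation, not part of the patch.) A hedged usage sketch of the two functions restored above; it assumes the sandbox `Arma` estimator imports cleanly, and the exact numbers depend on the random draws:

    from statsmodels.sandbox.mcevaluate.arma import mcarma22, mc_summary

    # Monte Carlo for the hard-coded ARMA(2,2) DGP, then summarize estimate quality
    true_params, rho_hat, bse_hat = mcarma22(niter=20, nsample=500, sig=0.5)
    mc_summary(rho_hat, true_params)        # bias/RMSE of the parameter estimates
    mc_summary(bse_hat, rho_hat.std(0))     # standard errors versus Monte Carlo std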
diff --git a/statsmodels/sandbox/mle.py b/statsmodels/sandbox/mle.py
index 3bbe8aad4..1c599b129 100644
--- a/statsmodels/sandbox/mle.py
+++ b/statsmodels/sandbox/mle.py
@@ -1,44 +1,63 @@
-"""What's the origin of this file? It is not ours.
+'''What's the origin of this file? It is not ours.
 Does not run because of missing mtx files, now included

 changes: JP corrections to imports so it runs, comment out print
-"""
+'''
 import numpy as np
-from numpy import dot, outer, random
+from numpy import dot,  outer, random
 from scipy import io, linalg, optimize
 from scipy.sparse import eye as speye
 import matplotlib.pyplot as plt

+def R(v):
+    rq = dot(v.T,A*v)/dot(v.T,B*v)
+    res = (A*v-rq*B*v)/linalg.norm(B*v)
+    data.append(linalg.norm(res))
+    return rq

 def Rp(v):
     """ Gradient """
-    pass
-
+    result = 2*(A*v-R(v)*B*v)/dot(v.T,B*v)
+    #print "Rp: ", result
+    return result

 def Rpp(v):
     """ Hessian """
-    pass
+    result = 2*(A-R(v)*B-outer(B*v,Rp(v))-outer(Rp(v),B*v))/dot(v.T,B*v)
+    #print "Rpp: ", result
+    return result


-A = io.mmread('nos4.mtx')
+A = io.mmread('nos4.mtx') # clustered eigenvalues
+#B = io.mmread('bcsstm02.mtx.gz')
+#A = io.mmread('bcsstk06.mtx.gz') # clustered eigenvalues
+#B = io.mmread('bcsstm06.mtx.gz')
 n = A.shape[0]
-B = speye(n, n)
+B = speye(n,n)
 random.seed(1)
-v_0 = random.rand(n)
-print('try fmin_bfgs')
+v_0=random.rand(n)
+
+print("try fmin_bfgs")
 full_output = 1
-data = []
-v, fopt, gopt, Hopt, func_calls, grad_calls, warnflag, allvecs = (optimize.
-    fmin_bfgs(R, v_0, fprime=Rp, full_output=full_output, retall=1))
-if warnflag == 0:
-    plt.semilogy(np.arange(0, len(data)), data)
-    print('Rayleigh quotient BFGS', R(v))
-print('fmin_bfgs OK')
-print('try fmin_ncg')
-data = []
-v, fopt, fcalls, gcalls, hcalls, warnflag, allvecs = optimize.fmin_ncg(R,
-    v_0, fprime=Rp, fhess=Rpp, full_output=full_output, retall=1)
+data=[]
+v,fopt, gopt, Hopt, func_calls, grad_calls, warnflag, allvecs = \
+        optimize.fmin_bfgs(R,v_0,fprime=Rp,full_output=full_output,retall=1)
 if warnflag == 0:
+    plt.semilogy(np.arange(0,len(data)),data)
+    print('Rayleigh quotient BFGS',R(v))
+
+
+print("fmin_bfgs OK")
+
+print("try fmin_ncg")
+
+#
+# WARNING: the program may hang if fmin_ncg is used
+#
+data=[]
+v,fopt, fcalls, gcalls, hcalls, warnflag, allvecs = \
+        optimize.fmin_ncg(R,v_0,fprime=Rp,fhess=Rpp,full_output=full_output,retall=1)
+if warnflag==0:
     plt.figure()
-    plt.semilogy(np.arange(0, len(data)), data)
-    print('Rayleigh quotient NCG', R(v))
+    plt.semilogy(np.arange(0,len(data)),data)
+    print('Rayleigh quotient NCG',R(v))
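(Annotation, not part of the patch.) The script above needs the `nos4.mtx` matrix file; a self-contained sketch of the same idea, using a hypothetical small dense matrix, minimizes the Rayleigh quotient to approximate the smallest eigenvalue:

    import numpy as np
    from scipy import optimize

    rng = np.random.default_rng(0)
    M = rng.standard_normal((20, 20))
    A = M @ M.T + np.eye(20)            # symmetric positive definite stand-in for nos4.mtx
    B = np.eye(20)

    def rq(v):
        # Rayleigh quotient v'Av / v'Bv
        return (v @ A @ v) / (v @ B @ v)

    v = optimize.fmin_bfgs(rq, rng.standard_normal(20), disp=0)
    print(rq(v), np.linalg.eigvalsh(A).min())   # the two values should be close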
diff --git a/statsmodels/sandbox/multilinear.py b/statsmodels/sandbox/multilinear.py
index 953302a10..5b67935e4 100644
--- a/statsmodels/sandbox/multilinear.py
+++ b/statsmodels/sandbox/multilinear.py
@@ -17,17 +17,31 @@ from statsmodels.api import stats
 import numpy as np
 import logging

-
 def _model2dataframe(model_endog, model_exog, model_type=OLS, **kwargs):
     """return a series containing the summary of a linear model

     All the exceeding parameters will be redirected to the linear model
     """
-    pass
-
-
-def multiOLS(model, dataframe, column_list=None, method='fdr_bh', alpha=
-    0.05, subset=None, model_type=OLS, **kwargs):
+    # create the linear model and perform the fit
+    model_result = model_type(model_endog, model_exog, **kwargs).fit()
+    # keeps track of some global statistics
+    statistics = pd.Series({'r2': model_result.rsquared,
+                  'adj_r2': model_result.rsquared_adj})
+    # put them together with the result for each term
+    result_df = pd.DataFrame({'params': model_result.params,
+                              'pvals': model_result.pvalues,
+                              'std': model_result.bse,
+                              'statistics': statistics})
+    # add the overall results for the f-value and the total p-value
+    fisher_df = pd.DataFrame({'params': {'_f_test': model_result.fvalue},
+                              'pvals': {'_f_test': model_result.f_pvalue}})
+    # merge them and unstack to obtain a hierarchically indexed series
+    res_series = pd.concat([result_df, fisher_df]).unstack()
+    return res_series.dropna()
+
+
+def multiOLS(model, dataframe, column_list=None, method='fdr_bh',
+             alpha=0.05, subset=None, model_type=OLS, **kwargs):
     """apply a linear model to several endogenous variables on a dataframe

     Take a linear model definition via formula and a dataframe that will be
@@ -125,7 +139,49 @@ def multiOLS(model, dataframe, column_list=None, method='fdr_bh', alpha=
     Even a single column name can be given without enclosing it in a list
     >>> multiOLS('GNP + 0', df, 'GNPDEFL')
     """
-    pass
+    # data normalization
+    # if None take all the numerical columns that are not present in the model
+    # it's not waterproof but is a good enough criterion for everyday use
+    if column_list is None:
+        column_list = [name for name in dataframe.columns
+                      if dataframe[name].dtype != object and name not in model]
+    # if it's a single string transform it in a single element list
+    if isinstance(column_list, str):
+        column_list = [column_list]
+    if subset is not None:
+        dataframe = dataframe.loc[subset]
+    # perform each model and retrieve the statistics
+    col_results = {}
+    # as the model will use always the same endogenous variables
+    # we can create them once and reuse
+    model_exog = dmatrix(model, data=dataframe, return_type="dataframe")
+    for col_name in column_list:
+        # it will try to interpret the column name as a valid dataframe
+        # index as it can be several times faster. If it fails it
+        # interpret it as a patsy formula (for example for centering)
+        try:
+            model_endog = dataframe[col_name]
+        except KeyError:
+            model_endog = dmatrix(col_name + ' + 0', data=dataframe)
+        # retrieve the result and store them
+        res = _model2dataframe(model_endog, model_exog, model_type, **kwargs)
+        col_results[col_name] = res
+    # merge them together and sort by the overall p-value
+    summary = pd.DataFrame(col_results)
+    # order by the p-value: the most useful model first!
+    summary = summary.T.sort_values([('pvals', '_f_test')])
+    summary.index.name = 'endogenous vars'
+    # implementing the pvalue correction method
+    smt = stats.multipletests
+    for (key1, key2) in summary:
+        if key1 != 'pvals':
+            continue
+        p_values = summary[key1, key2]
+        corrected = smt(p_values, method=method, alpha=alpha)[1]
+        # extend the dataframe of results with the column
+        # of the corrected p_values
+        summary['adj_' + key1, key2] = corrected
+    return summary


 def _test_group(pvalues, group_name, group, exact=True):
@@ -134,7 +190,34 @@ def _test_group(pvalues, group_name, group, exact=True):
     The test is performed on the pvalues set (as a pandas series) over
     the group specified via a fisher exact test.
     """
-    pass
+    from scipy.stats import fisher_exact, chi2_contingency
+
+    totals = 1.0 * len(pvalues)
+    total_significant = 1.0 * np.sum(pvalues)
+    cross_index = [c for c in group if c in pvalues.index]
+    missing = [c for c in group if c not in pvalues.index]
+    if missing:
+        s = ('the test is not well defined if the group '
+             'has elements not present in the significance '
+             'array. group name: {}, missing elements: {}')
+        logging.warning(s.format(group_name, missing))
+    # how many elements inside the group are significant / non-significant
+    group_total = 1.0 * len(cross_index)
+    group_sign = 1.0 * len([c for c in cross_index if pvalues[c]])
+    group_nonsign = 1.0 * (group_total - group_sign)
+    # how many elements outside the group are significant / non-significant
+    extern_sign = 1.0 * (total_significant - group_sign)
+    extern_nonsign = 1.0 * (totals - total_significant - group_nonsign)
+    # make the fisher test or the chi squared
+    test = fisher_exact if exact else chi2_contingency
+    table = [[extern_nonsign, extern_sign], [group_nonsign, group_sign]]
+    pvalue = test(np.array(table))[1]
+    # is the group more represented or less?
+    part = group_sign, group_nonsign, extern_sign, extern_nonsign
+    #increase = (group_sign / group_total) > (total_significant / totals)
+    increase = np.log((totals * group_sign)
+                      / (total_significant * group_total))
+    return pvalue, increase, part


 def multigroup(pvals, groups, exact=True, keep_all=True, alpha=0.05):
@@ -212,4 +295,29 @@ def multigroup(pvals, groups, exact=True, keep_all=True, alpha=0.05):
     do the analysis of the significativity
     >>> multigroup(pvals < 0.05, groups)
     """
-    pass
+    pvals = pd.Series(pvals)
+    if not (set(pvals.unique()) <= set([False, True])):
+        raise ValueError("the series should be binary")
+    if hasattr(pvals.index, 'is_unique') and not pvals.index.is_unique:
+        raise ValueError("series with duplicated index is not accepted")
+    results = {'pvals': {},
+        'increase': {},
+        '_in_sign': {},
+        '_in_non': {},
+        '_out_sign': {},
+        '_out_non': {}}
+    for group_name, group_list in groups.items():
+        res = _test_group(pvals, group_name, group_list, exact)
+        results['pvals'][group_name] = res[0]
+        results['increase'][group_name] = res[1]
+        results['_in_sign'][group_name] = res[2][0]
+        results['_in_non'][group_name] = res[2][1]
+        results['_out_sign'][group_name] = res[2][2]
+        results['_out_non'][group_name] = res[2][3]
+    result_df = pd.DataFrame(results).sort_values('pvals')
+    if not keep_all:
+        result_df = result_df[result_df.increase]
+    smt = stats.multipletests
+    corrected = smt(result_df['pvals'], method='fdr_bh', alpha=alpha)[1]
+    result_df['adj_pvals'] = corrected
+    return result_df
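(Annotation, not part of the patch.) A short usage sketch of the two public helpers implemented above, following the docstring example with the longley data; the grouping is purely illustrative:

    import statsmodels.api as sm
    from statsmodels.sandbox.multilinear import multiOLS, multigroup

    data = sm.datasets.longley.load_pandas().data
    summary = multiOLS('GNP + 0', data)              # one OLS per numeric column not in the model
    pvals = summary['adj_pvals', '_f_test'] < 0.05   # boolean series of significant fits
    groups = {'illustrative': list(pvals.index[:3])} # hypothetical grouping of endogenous names
    print(multigroup(pvals, groups))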
diff --git a/statsmodels/sandbox/nonparametric/densityorthopoly.py b/statsmodels/sandbox/nonparametric/densityorthopoly.py
index 3019ae76e..03adb1762 100644
--- a/statsmodels/sandbox/nonparametric/densityorthopoly.py
+++ b/statsmodels/sandbox/nonparametric/densityorthopoly.py
@@ -1,4 +1,6 @@
-"""density estimation based on orthogonal polynomials
+# -*- coding: utf-8 -*-
+# some cut and paste characters are not ASCII
+'''density estimation based on orthogonal polynomials


 Author: Josef Perktold
@@ -34,14 +36,16 @@ enhancements:
   * local or piecewise approximations


-"""
+'''
 from scipy import stats, integrate, special
+
 import numpy as np
-sqr2 = np.sqrt(2.0)


+sqr2 = np.sqrt(2.)
+
 class FPoly:
-    """Orthonormal (for weight=1) Fourier Polynomial on [0,1]
+    '''Orthonormal (for weight=1) Fourier Polynomial on [0,1]

     orthonormal polynomial but density needs corfactor that I do not see what
     it is analytically
@@ -52,11 +56,11 @@ class FPoly:
     2010 John Wiley & Sons, Inc. WIREs Comp Stat 2010 2 467-476


-    """
+    '''

     def __init__(self, order):
         self.order = order
-        self.domain = 0, 1
+        self.domain = (0, 1)
         self.intdomain = self.domain

     def __call__(self, x):
@@ -65,9 +69,8 @@ class FPoly:
         else:
             return sqr2 * np.cos(np.pi * self.order * x)

-
 class F2Poly:
-    """Orthogonal (for weight=1) Fourier Polynomial on [0,pi]
+    '''Orthogonal (for weight=1) Fourier Polynomial on [0,pi]

     is orthogonal but first component does not square-integrate to 1
     final result seems to need a correction factor of sqrt(pi)
@@ -78,11 +81,11 @@ class F2Poly:
     Peter Hall, Cross-Validation and the Smoothing of Orthogonal Series Density
     Estimators, JOURNAL OF MULTIVARIATE ANALYSIS 21, 189-206 (1987)

-    """
+    '''

     def __init__(self, order):
         self.order = order
-        self.domain = 0, np.pi
+        self.domain = (0, np.pi)
         self.intdomain = self.domain
         self.offsetfactor = 0

@@ -92,9 +95,8 @@ class F2Poly:
         else:
             return sqr2 * np.cos(self.order * x) / np.sqrt(np.pi)

-
 class ChebyTPoly:
-    """Orthonormal (for weight=1) Chebychev Polynomial on (-1,1)
+    '''Orthonormal (for weight=1) Chebychev Polynomial on (-1,1)


     Notes
@@ -105,51 +107,55 @@ class ChebyTPoly:

     or maybe there is a mistake close to the boundary, sometimes integration works.

-    """
+    '''

     def __init__(self, order):
         self.order = order
         from scipy.special import chebyt
         self.poly = chebyt(order)
-        self.domain = -1, 1
-        self.intdomain = -1 + 1e-06, 1 - 1e-06
-        self.offsetfactor = 0.01
+        self.domain = (-1, 1)
+        self.intdomain = (-1+1e-6, 1-1e-6)
+        # not sure if this is needed; in integration NaNs are possible on the boundary
+        self.offsetfactor = 0.01  #required for integration
+

     def __call__(self, x):
         if self.order == 0:
-            return np.ones_like(x) / (1 - x ** 2) ** (1 / 4.0) / np.sqrt(np.pi)
-        else:
-            return self.poly(x) / (1 - x ** 2) ** (1 / 4.0) / np.sqrt(np.pi
-                ) * np.sqrt(2)
+            return np.ones_like(x) / (1-x**2)**(1/4.) /np.sqrt(np.pi)

+        else:
+            return self.poly(x) / (1-x**2)**(1/4.) /np.sqrt(np.pi) *np.sqrt(2)

-logpi2 = np.log(np.pi) / 2

+logpi2 = np.log(np.pi)/2

 class HPoly:
-    """Orthonormal (for weight=1) Hermite Polynomial, uses finite bounds
+    '''Orthonormal (for weight=1) Hermite Polynomial, uses finite bounds

     for current use with DensityOrthoPoly domain is defined as [-6,6]

-    """
-
+    '''
     def __init__(self, order):
         self.order = order
         from scipy.special import hermite
         self.poly = hermite(order)
-        self.domain = -6, +6
-        self.offsetfactor = 0.5
+        self.domain = (-6, +6)
+        self.offsetfactor = 0.5  # note this is

     def __call__(self, x):
         k = self.order
-        lnfact = -(1.0 / 2) * (k * np.log(2.0) + special.gammaln(k + 1) +
-            logpi2) - x * x / 2
+
+        lnfact = -(1./2)*(k*np.log(2.) + special.gammaln(k+1) + logpi2) - x*x/2
         fact = np.exp(lnfact)
+
         return self.poly(x) * fact

+def polyvander(x, polybase, order=5):
+    polyarr = np.column_stack([polybase(i)(x) for i in range(order)])
+    return polyarr

 def inner_cont(polys, lower, upper, weight=None):
-    """inner product of continuous function (with weight=1)
+    '''inner product of continuous function (with weight=1)

     Parameters
     ----------
@@ -180,12 +186,32 @@ def inner_cont(polys, lower, upper, weight=None):
            [-0.66666667,  0.        ,  0.93333333,  0.        ],
            [ 0.        , -0.4       ,  0.        ,  0.97142857]])

-    """
-    pass
+    '''
+    n_polys = len(polys)
+    innerprod = np.empty((n_polys, n_polys))
+    innerprod.fill(np.nan)
+    interr = np.zeros((n_polys, n_polys))
+
+    for i in range(n_polys):
+        for j in range(i+1):
+            p1 = polys[i]
+            p2 = polys[j]
+            if weight is not None:
+                innp, err = integrate.quad(lambda x: p1(x)*p2(x)*weight(x),
+                                       lower, upper)
+            else:
+                innp, err = integrate.quad(lambda x: p1(x)*p2(x), lower, upper)
+            innerprod[i,j] = innp
+            interr[i,j] = err
+            if not i == j:
+                innerprod[j,i] = innp
+                interr[j,i] = err
+
+    return innerprod, interr


 def is_orthonormal_cont(polys, lower, upper, rtol=0, atol=1e-08):
-    """check whether functions are orthonormal
+    '''check whether functions are orthonormal

     Parameters
     ----------
@@ -227,12 +253,24 @@ def is_orthonormal_cont(polys, lower, upper, rtol=0, atol=1e-08):
     >>> is_orthonormal_cont(polys, -1, 1, atol=1e-6)
     True

-    """
-    pass
+    '''
+    for i in range(len(polys)):
+        for j in range(i+1):
+            p1 = polys[i]
+            p2 = polys[j]
+            innerprod = integrate.quad(lambda x: p1(x)*p2(x), lower, upper)[0]
+            #print i,j, innerprod
+            if not np.allclose(innerprod, i==j, rtol=rtol, atol=atol):
+                return False
+    return True
+
+
+
+#new versions


 class DensityOrthoPoly:
-    """Univariate density estimation by orthonormal series expansion
+    '''Univariate density estimation by orthonormal series expansion


     Uses an orthonormal polynomial basis to approximate a univariate density.
@@ -240,92 +278,213 @@ class DensityOrthoPoly:

     currently all arguments can be given to fit, I might change it to requiring
     arguments in __init__ instead.
-    """
+    '''

     def __init__(self, polybase=None, order=5):
         if polybase is not None:
             self.polybase = polybase
             self.polys = polys = [polybase(i) for i in range(order)]
+        #try:
+        #self.offsetfac = 0.05
+        #self.offsetfac = polys[0].offsetfactor #polys maybe not defined yet
         self._corfactor = 1
         self._corshift = 0

+
     def fit(self, x, polybase=None, order=5, limits=None):
-        """estimate the orthogonal polynomial approximation to the density
+        '''estimate the orthogonal polynomial approximation to the density

-        """
-        pass
+        '''
+        if polybase is None:
+            polys = self.polys[:order]
+        else:
+            self.polybase = polybase
+            self.polys = polys = [polybase(i) for i in range(order)]
+
+        #move to init ?
+        if not hasattr(self, 'offsetfac'):
+            self.offsetfac = polys[0].offsetfactor
+
+
+        xmin, xmax = x.min(), x.max()
+        if limits is None:
+            self.offset = offset = (xmax - xmin) * self.offsetfac
+            limits = self.limits = (xmin - offset, xmax + offset)
+
+        interval_length = limits[1] - limits[0]
+        xinterval = xmax - xmin
+        # need to cover (half-)open intervals
+        self.shrink = 1. / interval_length #xinterval/interval_length
+        offset = (interval_length - xinterval ) / 2.
+        self.shift = xmin - offset
+
+        self.x = x = self._transform(x)
+
+        coeffs = [(p(x)).mean() for p in polys]
+        self.coeffs = coeffs
+        self.polys = polys
+        self._verify()  #verify that it is a proper density
+
+        return self #coeffs, polys
+
+    def evaluate(self, xeval, order=None):
+        xeval = self._transform(xeval)
+        if order is None:
+            order = len(self.polys)
+        res = sum(c*p(xeval) for c, p in list(zip(self.coeffs, self.polys))[:order])
+        res = self._correction(res)
+        return res

     def __call__(self, xeval):
-        """alias for evaluate, except no order argument"""
+        '''alias for evaluate, except no order argument'''
         return self.evaluate(xeval)

     def _verify(self):
-        """check for bona fide density correction
+        '''check for bona fide density correction

         currently only checks that density integrates to 1

         non-negativity - NotImplementedYet
-        """
-        pass
+        '''
+        #watch out for circular/recursive usage
+
+        #evaluate uses domain of data, we stay offset away from bounds
+        intdomain = self.limits #self.polys[0].intdomain
+        self._corfactor = 1./integrate.quad(self.evaluate, *intdomain)[0]
+        #self._corshift = 0
+        #self._corfactor
+        return self._corfactor
+
+

     def _correction(self, x):
-        """bona fide density correction
+        '''bona fide density correction

         affine shift of density to make it into a proper density

-        """
-        pass
+        '''
+        if self._corfactor != 1:
+            x *= self._corfactor
+
+        if self._corshift != 0:
+            x += self._corshift

-    def _transform(self, x):
-        """transform observation to the domain of the density
+        return x
+
+    def _transform(self, x): # limits=None):
+        '''transform observation to the domain of the density


         uses shrink and shift attribute which are set in fit to stay


-        """
-        pass
+        '''
+
+        #use domain from first instance
+        #class does not have domain  self.polybase.domain[0] AttributeError
+        domain = self.polys[0].domain
+
+        ilen = (domain[1] - domain[0])
+        shift = self.shift - domain[0]/self.shrink/ilen
+        shrink = self.shrink * ilen
+
+        return (x - shift) * shrink
+
+
+#old version as a simple function
+def density_orthopoly(x, polybase, order=5, xeval=None):
+    #polybase = legendre  #chebyt #hermitenorm#
+    #polybase = chebyt
+    #polybase = FPoly
+    #polybase = ChtPoly
+    #polybase = hermite
+    #polybase = HPoly
+
+    if xeval is None:
+        xeval = np.linspace(x.min(),x.max(),50)
+
+    #polys = [legendre(i) for i in range(order)]
+    polys = [polybase(i) for i in range(order)]
+    #coeffs = [(p(x)*(1-x**2)**(-1/2.)).mean() for p in polys]
+    #coeffs = [(p(x)*np.exp(-x*x)).mean() for p in polys]
+    coeffs = [(p(x)).mean() for p in polys]
+    res = sum(c*p(xeval) for c, p in zip(coeffs, polys))
+    #res *= (1-xeval**2)**(-1/2.)
+    #res *= np.exp(-xeval**2./2)
+    return res, xeval, coeffs, polys
+


 if __name__ == '__main__':
-    examples = ['chebyt', 'fourier', 'hermite']
+
+    examples = ['chebyt', 'fourier', 'hermite']#[2]
+
     nobs = 10000
+
     import matplotlib.pyplot as plt
-    from statsmodels.distributions.mixture_rvs import mixture_rvs, MixtureDistribution
-    mix_kwds = dict(loc=-0.5, scale=0.5), dict(loc=1, scale=0.2)
-    obs_dist = mixture_rvs([1 / 3.0, 2 / 3.0], size=nobs, dist=[stats.norm,
-        stats.norm], kwargs=mix_kwds)
+    from statsmodels.distributions.mixture_rvs import (
+                                                mixture_rvs, MixtureDistribution)
+
+    #np.random.seed(12345)
+##    obs_dist = mixture_rvs([1/3.,2/3.], size=nobs, dist=[stats.norm, stats.norm],
+##                   kwargs = (dict(loc=-1,scale=.5),dict(loc=1,scale=.75)))
+    mix_kwds = (dict(loc=-0.5,scale=.5),dict(loc=1,scale=.2))
+    obs_dist = mixture_rvs([1/3.,2/3.], size=nobs, dist=[stats.norm, stats.norm],
+                   kwargs=mix_kwds)
     mix = MixtureDistribution()
-    if 'chebyt_' in examples:
-        obs_dist = obs_dist[(obs_dist > -2) & (obs_dist < 2)] / 2.0
-        f_hat, grid, coeffs, polys = density_orthopoly(obs_dist, ChebyTPoly,
-            order=20, xeval=None)
+
+    #obs_dist = np.random.randn(nobs)/4. #np.sqrt(2)
+
+
+    if "chebyt_" in examples: # needed for Cheby example below
+        #obs_dist = np.clip(obs_dist, -2, 2)/2.01
+        #chebyt [0,1]
+        obs_dist = obs_dist[(obs_dist>-2) & (obs_dist<2)]/2.0 #/4. + 2/4.0
+        #fourier [0,1]
+        #obs_dist = obs_dist[(obs_dist>-2) & (obs_dist<2)]/4. + 2/4.0
+        f_hat, grid, coeffs, polys = density_orthopoly(obs_dist, ChebyTPoly, order=20, xeval=None)
+        #f_hat /= f_hat.sum() * (grid.max() - grid.min())/len(grid)
         f_hat0 = f_hat
-        fint = integrate.trapz(f_hat, grid)
+        fint = integrate.trapz(f_hat, grid)# dx=(grid.max() - grid.min())/len(grid))
+        #f_hat -= fint/2.
         print('f_hat.min()', f_hat.min())
-        f_hat = f_hat - f_hat.min()
-        fint2 = integrate.trapz(f_hat, grid)
+        f_hat = (f_hat - f_hat.min()) #/ f_hat.max() - f_hat.min
+        fint2 = integrate.trapz(f_hat, grid)# dx=(grid.max() - grid.min())/len(grid))
         print('fint2', fint, fint2)
         f_hat /= fint2
+
+        # note that this uses a *huge* grid by default
+        #f_hat, grid = kdensityfft(emp_dist, kernel="gauss", bw="scott")
+
+        # check the plot
+
         doplot = 0
         if doplot:
             plt.hist(obs_dist, bins=50, normed=True, color='red')
             plt.plot(grid, f_hat, lw=2, color='black')
             plt.plot(grid, f_hat0, lw=2, color='g')
             plt.show()
-        for i, p in enumerate(polys[:5]):
-            for j, p2 in enumerate(polys[:5]):
-                print(i, j, integrate.quad(lambda x: p(x) * p2(x), -1, 1)[0])
+
+        for i,p in enumerate(polys[:5]):
+            for j,p2 in enumerate(polys[:5]):
+                print(i,j,integrate.quad(lambda x: p(x)*p2(x), -1,1)[0])
+
         for p in polys:
-            print(integrate.quad(lambda x: p(x) ** 2, -1, 1))
-    if 'chebyt' in examples:
+            print(integrate.quad(lambda x: p(x)**2, -1,1))
+
+
+    #examples using the new class
+
+    if "chebyt" in examples:
         dop = DensityOrthoPoly().fit(obs_dist, ChebyTPoly, order=20)
         grid = np.linspace(obs_dist.min(), obs_dist.max())
         xf = dop(grid)
+        #print('np.max(np.abs(xf - f_hat0))', np.max(np.abs(xf - f_hat0))
         dopint = integrate.quad(dop, *dop.limits)[0]
         print('dop F integral', dopint)
-        mpdf = mix.pdf(grid, [1 / 3.0, 2 / 3.0], dist=[stats.norm, stats.
-            norm], kwargs=mix_kwds)
+        mpdf = mix.pdf(grid, [1/3.,2/3.], dist=[stats.norm, stats.norm],
+                   kwargs=mix_kwds)
+
         doplot = 1
         if doplot:
             plt.figure()
@@ -333,16 +492,20 @@ if __name__ == '__main__':
             plt.plot(grid, xf, lw=2, color='black')
             plt.plot(grid, mpdf, lw=2, color='green')
             plt.title('using Chebychev polynomials')
-    if 'fourier' in examples:
+            #plt.show()
+
+    if "fourier" in examples:
         dop = DensityOrthoPoly()
         dop.offsetfac = 0.5
         dop = dop.fit(obs_dist, F2Poly, order=30)
         grid = np.linspace(obs_dist.min(), obs_dist.max())
         xf = dop(grid)
+        #print(np.max(np.abs(xf - f_hat0))
         dopint = integrate.quad(dop, *dop.limits)[0]
         print('dop F integral', dopint)
-        mpdf = mix.pdf(grid, [1 / 3.0, 2 / 3.0], dist=[stats.norm, stats.
-            norm], kwargs=mix_kwds)
+        mpdf = mix.pdf(grid, [1/3.,2/3.], dist=[stats.norm, stats.norm],
+                   kwargs=mix_kwds)
+
         doplot = 1
         if doplot:
             plt.figure()
@@ -350,17 +513,24 @@ if __name__ == '__main__':
             plt.title('using Fourier polynomials')
             plt.plot(grid, xf, lw=2, color='black')
             plt.plot(grid, mpdf, lw=2, color='green')
-        print(np.max(np.abs(inner_cont(dop.polys[:5], 0, 1)[0] - np.eye(5))))
-    if 'hermite' in examples:
+            #plt.show()
+
+        #check orthonormality:
+        print(np.max(np.abs(inner_cont(dop.polys[:5], 0, 1)[0] -np.eye(5))))
+
+    if "hermite" in examples:
         dop = DensityOrthoPoly()
         dop.offsetfac = 0
         dop = dop.fit(obs_dist, HPoly, order=20)
         grid = np.linspace(obs_dist.min(), obs_dist.max())
         xf = dop(grid)
+        #print(np.max(np.abs(xf - f_hat0))
         dopint = integrate.quad(dop, *dop.limits)[0]
         print('dop F integral', dopint)
-        mpdf = mix.pdf(grid, [1 / 3.0, 2 / 3.0], dist=[stats.norm, stats.
-            norm], kwargs=mix_kwds)
+
+        mpdf = mix.pdf(grid, [1/3.,2/3.], dist=[stats.norm, stats.norm],
+                   kwargs=mix_kwds)
+
         doplot = 1
         if doplot:
             plt.figure()
@@ -369,16 +539,23 @@ if __name__ == '__main__':
             plt.plot(grid, mpdf, lw=2, color='green')
             plt.title('using Hermite polynomials')
             plt.show()
-        print(np.max(np.abs(inner_cont(dop.polys[:5], 0, 1)[0] - np.eye(5))))
+
+        #check orthonormality:
+        print(np.max(np.abs(inner_cont(dop.polys[:5], 0, 1)[0] -np.eye(5))))
+
+
+    #check orthonormality
+
     hpolys = [HPoly(i) for i in range(5)]
     inn = inner_cont(hpolys, -6, 6)[0]
     print(np.max(np.abs(inn - np.eye(5))))
-    print((inn * 100000).astype(int))
+    print((inn*100000).astype(int))
+
     from scipy.special import hermite, chebyt
     htpolys = [hermite(i) for i in range(5)]
     innt = inner_cont(htpolys, -10, 10)[0]
-    print((innt * 100000).astype(int))
+    print((innt*100000).astype(int))
+
     polysc = [chebyt(i) for i in range(4)]
-    r, e = inner_cont(polysc, -1, 1, weight=lambda x: (1 - x * x) ** (-1 / 2.0)
-        )
+    r, e = inner_cont(polysc, -1, 1, weight=lambda x: (1-x*x)**(-1/2.))
     print(np.max(np.abs(r - np.diag(np.diag(r)))))
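(Annotation, not part of the patch.) A brief sketch of the estimator class filled in above, mirroring the `__main__` block; the mixture sample is illustrative:

    import numpy as np
    from scipy import stats, integrate
    from statsmodels.sandbox.nonparametric.densityorthopoly import (
        DensityOrthoPoly, ChebyTPoly)

    np.random.seed(0)
    x = np.concatenate([stats.norm.rvs(-0.5, 0.5, size=500),
                        stats.norm.rvs(1.0, 0.2, size=1000)])
    dop = DensityOrthoPoly().fit(x, ChebyTPoly, order=20)
    grid = np.linspace(x.min(), x.max(), 200)
    print(dop(grid)[:5])                        # estimated density on the grid
    print(integrate.quad(dop, *dop.limits)[0])  # should be close to 1 after correction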
diff --git a/statsmodels/sandbox/nonparametric/dgp_examples.py b/statsmodels/sandbox/nonparametric/dgp_examples.py
index a1c8612ca..2240b67fa 100644
--- a/statsmodels/sandbox/nonparametric/dgp_examples.py
+++ b/statsmodels/sandbox/nonparametric/dgp_examples.py
@@ -1,51 +1,53 @@
+# -*- coding: utf-8 -*-
 """Examples of non-linear functions for non-parametric regression

 Created on Sat Jan 05 20:21:22 2013

 Author: Josef Perktold
 """
+
 import numpy as np

+## Functions

 def fg1(x):
-    """Fan and Gijbels example function 1
-
-    """
-    pass
+    '''Fan and Gijbels example function 1

+    '''
+    return x + 2 * np.exp(-16 * x**2)

 def fg1eu(x):
-    """Eubank similar to Fan and Gijbels example function 1
-
-    """
-    pass
+    '''Eubank similar to Fan and Gijbels example function 1

+    '''
+    return x + 0.5 * np.exp(-50 * (x - 0.5)**2)

 def fg2(x):
-    """Fan and Gijbels example function 2
-
-    """
-    pass
+    '''Fan and Gijbels example function 2

+    '''
+    return np.sin(2 * x) + 2 * np.exp(-16 * x**2)

 def func1(x):
-    """made up example with sin, square
+    '''made up example with sin, square

-    """
-    pass
+    '''
+    return np.sin(x * 5) / x + 2. * x - 1. * x**2

+## Classes with Data Generating Processes

 doc = {'description':
-    """Base Class for Univariate non-linear example
+'''Base Class for Univariate non-linear example

     Does not work on its own.
     needs at least self.func in addition
-"""
-    , 'ref': ''}
-
+''',
+'ref': ''}

 class _UnivariateFunction:
-    __doc__ = """%(description)s
+    #Base Class for Univariate non-linear example.
+    #Does not work on its own. Needs at least self.func in addition.
+    __doc__ = '''%(description)s

     Parameters
     ----------
@@ -73,27 +75,34 @@ class _UnivariateFunction:
         underlying function (defined by subclass)

     %(ref)s
-    """
+    ''' #% doc

     def __init__(self, nobs=200, x=None, distr_x=None, distr_noise=None):
+
         if x is None:
             if distr_x is None:
                 x = np.random.normal(loc=0, scale=self.s_x, size=nobs)
             else:
                 x = distr_x.rvs(size=nobs)
             x.sort()
+
         self.x = x
+
         if distr_noise is None:
             noise = np.random.normal(loc=0, scale=self.s_noise, size=nobs)
         else:
             noise = distr_noise.rvs(size=nobs)
+
         if hasattr(self, 'het_scale'):
             noise *= self.het_scale(self.x)
+
+        #self.func = fg1
         self.y_true = y_true = self.func(x)
         self.y = y_true + noise

+
     def plot(self, scatter=True, ax=None):
-        """plot the mean function and optionally the scatter of the sample
+        '''plot the mean function and optionally the scatter of the sample

         Parameters
         ----------
@@ -109,60 +118,68 @@ class _UnivariateFunction:
             This is either the created figure instance or the one associated
             with ax if ax is given.

-        """
-        pass
+        '''
+        if ax is None:
+            import matplotlib.pyplot as plt
+            fig = plt.figure()
+            ax = fig.add_subplot(1, 1, 1)
+
+        if scatter:
+            ax.plot(self.x, self.y, 'o', alpha=0.5)

+        xx = np.linspace(self.x.min(), self.x.max(), 100)
+        ax.plot(xx, self.func(xx), lw=2, color='b', label='dgp mean')
+        return ax.figure

 doc = {'description':
-    """Fan and Gijbels example function 1
+'''Fan and Gijbels example function 1

 linear trend plus a hump
-""",
-    'ref':
-    """
+''',
+'ref':
+'''
 References
 ----------
 Fan, Jianqing, and Irene Gijbels. 1992. "Variable Bandwidth and Local
 Linear Regression Smoothers."
 The Annals of Statistics 20 (4) (December): 2008-2036. doi:10.2307/2242378.

-"""
-    }
-
+'''}

 class UnivariateFanGijbels1(_UnivariateFunction):
     __doc__ = _UnivariateFunction.__doc__ % doc

+
     def __init__(self, nobs=200, x=None, distr_x=None, distr_noise=None):
-        self.s_x = 1.0
+        self.s_x = 1.
         self.s_noise = 0.7
         self.func = fg1
-        super(self.__class__, self).__init__(nobs=nobs, x=x, distr_x=
-            distr_x, distr_noise=distr_noise)
-
+        super(self.__class__, self).__init__(nobs=nobs, x=x,
+                                             distr_x=distr_x,
+                                             distr_noise=distr_noise)

-doc['description'] = """Fan and Gijbels example function 2
+doc['description'] =\
+'''Fan and Gijbels example function 2

 sin plus a hump
-"""
-
+'''

 class UnivariateFanGijbels2(_UnivariateFunction):
     __doc__ = _UnivariateFunction.__doc__ % doc

     def __init__(self, nobs=200, x=None, distr_x=None, distr_noise=None):
-        self.s_x = 1.0
+        self.s_x = 1.
         self.s_noise = 0.5
         self.func = fg2
-        super(self.__class__, self).__init__(nobs=nobs, x=x, distr_x=
-            distr_x, distr_noise=distr_noise)
-
+        super(self.__class__, self).__init__(nobs=nobs, x=x,
+                                             distr_x=distr_x,
+                                             distr_noise=distr_noise)

 class UnivariateFanGijbels1EU(_UnivariateFunction):
-    """
+    '''

     Eubank p.179f
-    """
+    '''

     def __init__(self, nobs=50, x=None, distr_x=None, distr_noise=None):
         if distr_x is None:
@@ -170,15 +187,15 @@ class UnivariateFanGijbels1EU(_UnivariateFunction):
             distr_x = stats.uniform
         self.s_noise = 0.15
         self.func = fg1eu
-        super(self.__class__, self).__init__(nobs=nobs, x=x, distr_x=
-            distr_x, distr_noise=distr_noise)
-
+        super(self.__class__, self).__init__(nobs=nobs, x=x,
+                                             distr_x=distr_x,
+                                             distr_noise=distr_noise)

 class UnivariateFunc1(_UnivariateFunction):
-    """
+    '''

     made up, with sin and quadratic trend
-    """
+    '''

     def __init__(self, nobs=200, x=None, distr_x=None, distr_noise=None):
         if x is None and distr_x is None:
@@ -186,7 +203,11 @@ class UnivariateFunc1(_UnivariateFunction):
             distr_x = stats.uniform(-2, 4)
         else:
             nobs = x.shape[0]
-        self.s_noise = 2.0
+        self.s_noise = 2.
         self.func = func1
-        super(UnivariateFunc1, self).__init__(nobs=nobs, x=x, distr_x=
-            distr_x, distr_noise=distr_noise)
+        super(UnivariateFunc1, self).__init__(nobs=nobs, x=x,
+                                             distr_x=distr_x,
+                                             distr_noise=distr_noise)
+
+    def het_scale(self, x):
+        return np.sqrt(np.abs(3+x))
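(Annotation, not part of the patch.) A quick sketch exercising one of the data generating process classes above:

    import numpy as np
    from statsmodels.sandbox.nonparametric import dgp_examples

    np.random.seed(0)
    dgp = dgp_examples.UnivariateFanGijbels1(nobs=200)
    print(dgp.x.shape, dgp.y.shape)   # sorted x and noisy y around dgp.y_true
    fig = dgp.plot()                  # scatter of the sample plus the true mean function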
diff --git a/statsmodels/sandbox/nonparametric/kde2.py b/statsmodels/sandbox/nonparametric/kde2.py
index 686761b87..a8dc48a86 100644
--- a/statsmodels/sandbox/nonparametric/kde2.py
+++ b/statsmodels/sandbox/nonparametric/kde2.py
@@ -1,9 +1,13 @@
+# -*- coding: utf-8 -*-
 from statsmodels.compat.python import lzip
+
 import numpy as np
+
 from statsmodels.tools.validation import array_like
 from . import kernels


+#TODO: should this be a function?
 class KDE:
     """
     Kernel Density Estimator
@@ -15,60 +19,97 @@ class KDE:
     kernel : Kernel Class
         Should be a class from *
     """
-
+    #TODO: amend docs for Nd case?
     def __init__(self, x, kernel=None):
-        x = array_like(x, 'x', maxdim=2, contiguous=True)
+        x = array_like(x, "x", maxdim=2, contiguous=True)
         if x.ndim == 1:
-            x = x[:, None]
+            x = x[:,None]
+
         nobs, n_series = x.shape
+
         if kernel is None:
-            kernel = kernels.Gaussian()
+            kernel = kernels.Gaussian()  # no meaningful bandwidth yet
+
         if n_series > 1:
-            if isinstance(kernel, kernels.CustomKernel):
-                kernel = kernels.NdKernel(n_series, kernels=kernel)
+            if isinstance( kernel, kernels.CustomKernel ):
+                kernel = kernels.NdKernel(n_series, kernels = kernel)
+
         self.kernel = kernel
-        self.n = n_series
+        self.n = n_series  #TODO change attribute
         self.x = x

-    def __call__(self, x, h='scott'):
+    def density(self, x):
+        return self.kernel.density(self.x, x)
+
+    def __call__(self, x, h="scott"):
         return np.array([self.density(xx) for xx in x])

+    def evaluate(self, x, h="silverman"):
+        density = self.kernel.density
+        return np.array([density(xx) for xx in x])

-if __name__ == '__main__':
+
+if __name__ == "__main__":
     from numpy import random
     import matplotlib.pyplot as plt
     import statsmodels.nonparametric.bandwidths as bw
     from statsmodels.sandbox.nonparametric.testdata import kdetest
+
+    # 1-D case
     random.seed(142)
-    x = random.standard_t(4.2, size=50)
+    x = random.standard_t(4.2, size = 50)
     h = bw.bw_silverman(x)
-    support = np.linspace(-10, 10, 512)
-    kern = kernels.Gaussian(h=h)
-    kde = KDE(x, kern)
+    #NOTE: try to do it with convolution
+    support = np.linspace(-10,10,512)
+
+
+    kern = kernels.Gaussian(h = h)
+    kde = KDE( x, kern)
     print(kde.density(1.015469))
     print(0.2034675)
-    Xs = np.arange(-10, 10, 0.1)
+    Xs = np.arange(-10,10,0.1)
+
     fig = plt.figure()
     ax = fig.add_subplot(111)
-    ax.plot(Xs, kde(Xs), '-')
+    ax.plot(Xs, kde(Xs), "-")
     ax.set_ylim(-10, 10)
-    ax.set_ylim(0, 0.4)
-    x = lzip(kdetest.faithfulData['eruptions'], kdetest.faithfulData['waiting']
-        )
+    ax.set_ylim(0,0.4)
+
+
+    # 2-D case
+    x = lzip(kdetest.faithfulData["eruptions"], kdetest.faithfulData["waiting"])
     x = np.array(x)
-    x = (x - x.mean(0)) / x.std(0)
+    x = (x - x.mean(0))/x.std(0)
     nobs = x.shape[0]
     H = kdetest.Hpi
-    kern = kernels.NdKernel(2)
-    kde = KDE(x, kern)
-    print(kde.density(np.matrix([1, 2])))
+    kern = kernels.NdKernel( 2 )
+    kde = KDE( x, kern )
+    print(kde.density( np.matrix( [1,2 ]))) #.T
     plt.figure()
-    plt.plot(x[:, 0], x[:, 1], 'o')
+    plt.plot(x[:,0], x[:,1], 'o')
+
+
     n_grid = 50
     xsp = np.linspace(x.min(0)[0], x.max(0)[0], n_grid)
     ysp = np.linspace(x.min(0)[1], x.max(0)[1], n_grid)
+#    xsorted = np.sort(x)
+#    xlow = xsorted[nobs/4]
+#    xupp = xsorted[3*nobs/4]
+#    xsp = np.linspace(xlow[0], xupp[0], n_grid)
+#    ysp = np.linspace(xlow[1], xupp[1], n_grid)
     xr, yr = np.meshgrid(xsp, ysp)
-    kde_vals = np.array([kde.density(np.matrix([xi, yi])) for xi, yi in zip
-        (xr.ravel(), yr.ravel())])
+    kde_vals = np.array([kde.density( np.matrix( [xi, yi ]) ) for xi, yi in
+               zip(xr.ravel(), yr.ravel())])
     plt.contour(xsp, ysp, kde_vals.reshape(n_grid, n_grid))
+
     plt.show()
+
+
+    # 5 D case
+#    random.seed(142)
+#    mu = [1.0, 4.0, 3.5, -2.4, 0.0]
+#    sigma = np.matrix(
+#        [[ 0.6 - 0.1*abs(i-j) if i != j else 1.0 for j in xrange(5)] for i in xrange(5)])
+#    x = random.multivariate_normal(mu, sigma, size = 100)
+#    kern = kernel.Gaussian()
+#    kde = KernelEstimate( x, kern )
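(Annotation, not part of the patch.) A minimal 1-D sketch of the sandbox `KDE` class as restored above, reusing the bandwidth helper from the `__main__` block:

    import numpy as np
    import statsmodels.nonparametric.bandwidths as bw
    from statsmodels.sandbox.nonparametric import kernels
    from statsmodels.sandbox.nonparametric.kde2 import KDE

    np.random.seed(142)
    x = np.random.standard_t(4.2, size=50)
    kde = KDE(x, kernels.Gaussian(h=bw.bw_silverman(x)))
    grid = np.arange(-5, 5, 0.5)
    print(kde(grid))   # pointwise density via kernel.density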
diff --git a/statsmodels/sandbox/nonparametric/kdecovclass.py b/statsmodels/sandbox/nonparametric/kdecovclass.py
index c4b9bf429..7919ae617 100644
--- a/statsmodels/sandbox/nonparametric/kdecovclass.py
+++ b/statsmodels/sandbox/nonparametric/kdecovclass.py
@@ -1,7 +1,8 @@
-"""subclassing kde
+'''subclassing kde

 Author: josef pktd
-"""
+'''
+
 import numpy as np
 from numpy.testing import assert_almost_equal, assert_
 import scipy
@@ -10,59 +11,154 @@ import matplotlib.pylab as plt


 class gaussian_kde_set_covariance(stats.gaussian_kde):
-    """
+    '''
     from Anne Archibald in mailinglist:
     http://www.nabble.com/Width-of-the-gaussian-in-stats.kde.gaussian_kde---td19558924.html#a19558924
-    """
-
+    '''
     def __init__(self, dataset, covariance):
         self.covariance = covariance
         scipy.stats.gaussian_kde.__init__(self, dataset)

+    def _compute_covariance(self):
+        self.inv_cov = np.linalg.inv(self.covariance)
+        self._norm_factor = np.sqrt(np.linalg.det(2*np.pi*self.covariance)) * self.n

-class gaussian_kde_covfact(stats.gaussian_kde):

-    def __init__(self, dataset, covfact='scotts'):
+class gaussian_kde_covfact(stats.gaussian_kde):
+    def __init__(self, dataset, covfact = 'scotts'):
         self.covfact = covfact
         scipy.stats.gaussian_kde.__init__(self, dataset)

     def _compute_covariance_(self):
-        """not used"""
-        pass
+        '''not used'''
+        self.inv_cov = np.linalg.inv(self.covariance)
+        self._norm_factor = np.sqrt(np.linalg.det(2*np.pi*self.covariance)) * self.n
+
+    def covariance_factor(self):
+        if self.covfact in ['sc', 'scotts']:
+            return self.scotts_factor()
+        if self.covfact in ['si', 'silverman']:
+            return self.silverman_factor()
+        elif self.covfact:
+            return float(self.covfact)
+        else:
+            raise ValueError('covariance factor has to be scotts, silverman or a number')
+
+    def reset_covfact(self, covfact):
+        self.covfact = covfact
+        self.covariance_factor()
+        self._compute_covariance()
+
+def plotkde(covfact):
+    gkde.reset_covfact(covfact)
+    kdepdf = gkde.evaluate(ind)
+    plt.figure()
+    # plot histgram of sample
+    plt.hist(xn, bins=20, normed=1)
+    # plot estimated density
+    plt.plot(ind, kdepdf, label='kde', color="g")
+    # plot data generating density
+    plt.plot(ind, alpha * stats.norm.pdf(ind, loc=mlow) +
+                  (1-alpha) * stats.norm.pdf(ind, loc=mhigh),
+                  color="r", label='DGP: normal mix')
+    plt.title('Kernel Density Estimation - ' + str(gkde.covfact))
+    plt.legend()
+
+
+def test_kde_1d():
+    np.random.seed(8765678)
+    n_basesample = 500
+    xn = np.random.randn(n_basesample)
+    xnmean = xn.mean()
+    xnstd = xn.std(ddof=1)
+    print(xnmean, xnstd)
+
+    # get kde for original sample
+    gkde = stats.gaussian_kde(xn)
+
+    # evaluate the density function for the kde for some points
+    xs = np.linspace(-7,7,501)
+    kdepdf = gkde.evaluate(xs)
+    normpdf = stats.norm.pdf(xs, loc=xnmean, scale=xnstd)
+    print('MSE', np.sum((kdepdf - normpdf)**2))
+    print('axabserror', np.max(np.abs(kdepdf - normpdf)))
+    intervall = xs[1] - xs[0]
+    assert_(np.sum((kdepdf - normpdf)**2)*intervall < 0.01)
+    #assert_array_almost_equal(kdepdf, normpdf, decimal=2)
+    print(gkde.integrate_gaussian(0.0, 1.0))
+    print(gkde.integrate_box_1d(-np.inf, 0.0))
+    print(gkde.integrate_box_1d(0.0, np.inf))
+    print(gkde.integrate_box_1d(-np.inf, xnmean))
+    print(gkde.integrate_box_1d(xnmean, np.inf))
+
+    assert_almost_equal(gkde.integrate_box_1d(xnmean, np.inf), 0.5, decimal=1)
+    assert_almost_equal(gkde.integrate_box_1d(-np.inf, xnmean), 0.5, decimal=1)
+    assert_almost_equal(gkde.integrate_box(xnmean, np.inf), 0.5, decimal=1)
+    assert_almost_equal(gkde.integrate_box(-np.inf, xnmean), 0.5, decimal=1)
+
+    assert_almost_equal(gkde.integrate_kde(gkde),
+                        (kdepdf**2).sum()*intervall, decimal=2)
+    assert_almost_equal(gkde.integrate_gaussian(xnmean, xnstd**2),
+                        (kdepdf*normpdf).sum()*intervall, decimal=2)
+##    assert_almost_equal(gkde.integrate_gaussian(0.0, 1.0),
+##                        (kdepdf*normpdf).sum()*intervall, decimal=2)
+
+


 if __name__ == '__main__':
+    # generate a sample
     n_basesample = 1000
     np.random.seed(8765678)
-    alpha = 0.6
-    mlow, mhigh = -3, 3
-    xn = np.concatenate([mlow + np.random.randn(alpha * n_basesample), 
-        mhigh + np.random.randn((1 - alpha) * n_basesample)])
+    alpha = 0.6 #weight for (prob of) lower distribution
+    mlow, mhigh = (-3,3)  #mean locations for gaussian mixture
+    xn =  np.concatenate([mlow + np.random.randn(alpha * n_basesample),
+                       mhigh + np.random.randn((1-alpha) * n_basesample)])
+
+    # get kde for original sample
+    #gkde = stats.gaussian_kde(xn)
     gkde = gaussian_kde_covfact(xn, 0.1)
-    ind = np.linspace(-7, 7, 101)
+    # evaluate the density function for the kde for some points
+    ind = np.linspace(-7,7,101)
     kdepdf = gkde.evaluate(ind)
+
     plt.figure()
+    # plot histogram of sample
     plt.hist(xn, bins=20, normed=1)
-    plt.plot(ind, kdepdf, label='kde', color='g')
-    plt.plot(ind, alpha * stats.norm.pdf(ind, loc=mlow) + (1 - alpha) *
-        stats.norm.pdf(ind, loc=mhigh), color='r', label='DGP: normal mix')
+    # plot estimated density
+    plt.plot(ind, kdepdf, label='kde', color="g")
+    # plot data generating density
+    plt.plot(ind, alpha * stats.norm.pdf(ind, loc=mlow) +
+                  (1-alpha) * stats.norm.pdf(ind, loc=mhigh),
+                  color="r", label='DGP: normal mix')
     plt.title('Kernel Density Estimation')
     plt.legend()
+
     gkde = gaussian_kde_covfact(xn, 'scotts')
     kdepdf = gkde.evaluate(ind)
     plt.figure()
+    # plot histogram of sample
     plt.hist(xn, bins=20, normed=1)
-    plt.plot(ind, kdepdf, label='kde', color='g')
-    plt.plot(ind, alpha * stats.norm.pdf(ind, loc=mlow) + (1 - alpha) *
-        stats.norm.pdf(ind, loc=mhigh), color='r', label='DGP: normal mix')
+    # plot estimated density
+    plt.plot(ind, kdepdf, label='kde', color="g")
+    # plot data generating density
+    plt.plot(ind, alpha * stats.norm.pdf(ind, loc=mlow) +
+                  (1-alpha) * stats.norm.pdf(ind, loc=mhigh),
+                  color="r", label='DGP: normal mix')
     plt.title('Kernel Density Estimation')
     plt.legend()
+    #plt.show()
     for cv in ['scotts', 'silverman', 0.05, 0.1, 0.5]:
         plotkde(cv)
+
     test_kde_1d()
+
+
     np.random.seed(8765678)
     n_basesample = 1000
     xn = np.random.randn(n_basesample)
     xnmean = xn.mean()
     xnstd = xn.std(ddof=1)
+
+    # get kde for original sample
     gkde = stats.gaussian_kde(xn)
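(Annotation, not part of the patch.) A hedged sketch of the subclass above, which only swaps in a configurable covariance factor:

    import numpy as np
    from statsmodels.sandbox.nonparametric.kdecovclass import gaussian_kde_covfact

    np.random.seed(8765678)
    xn = np.random.randn(500)
    gkde = gaussian_kde_covfact(xn, 'scotts')   # factor by name ('scotts', 'silverman') or a number
    grid = np.linspace(-4, 4, 101)
    print(gkde.evaluate(grid)[:5])
    gkde.reset_covfact(0.1)                     # switch to a fixed factor and recompute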
diff --git a/statsmodels/sandbox/nonparametric/kernel_extras.py b/statsmodels/sandbox/nonparametric/kernel_extras.py
index ea6bbce05..9be6de30a 100644
--- a/statsmodels/sandbox/nonparametric/kernel_extras.py
+++ b/statsmodels/sandbox/nonparametric/kernel_extras.py
@@ -27,11 +27,17 @@ References
         Models", 2006, Econometric Reviews 25, 523-544

 """
+
+# TODO: make default behavior efficient=True above a certain n_obs
 import numpy as np
 from scipy import optimize
 from scipy.stats.mstats import mquantiles
+
 from statsmodels.nonparametric.api import KDEMultivariate, KernelReg
-from statsmodels.nonparametric._kernel_base import gpke, LeaveOneOut, _get_type_pos, _adjust_shape
+from statsmodels.nonparametric._kernel_base import \
+    gpke, LeaveOneOut, _get_type_pos, _adjust_shape
+
+
 __all__ = ['SingleIndexModel', 'SemiLinear', 'TestFForm']


@@ -71,7 +77,6 @@ class TestFForm:

     See chapter 12 in [1]  pp. 355-357.
     """
-
     def __init__(self, endog, exog, bw, var_type, fform, estimator, nboot=100):
         self.endog = endog
         self.exog = exog
@@ -82,6 +87,70 @@ class TestFForm:
         self.bw = KDEMultivariate(exog, bw=bw, var_type=var_type).bw
         self.sig = self._compute_sig()

+    def _compute_sig(self):
+        Y = self.endog
+        X = self.exog
+        b = self.estimator(Y, X)
+        m = self.fform(X, b)
+        n = np.shape(X)[0]
+        resid = Y - m
+        resid = resid - np.mean(resid)  # center residuals
+        self.test_stat = self._compute_test_stat(resid)
+        sqrt5 = np.sqrt(5.)
+        fct1 = (1 - sqrt5) / 2.
+        fct2 = (1 + sqrt5) / 2.
+        u1 = fct1 * resid
+        u2 = fct2 * resid
+        r = fct2 / sqrt5
+        I_dist = np.empty((self.nboot,1))
+        for j in range(self.nboot):
+            u_boot = u2.copy()
+
+            prob = np.random.uniform(0,1, size = (n,))
+            ind = prob < r
+            u_boot[ind] = u1[ind]
+            Y_boot = m + u_boot
+            b_hat = self.estimator(Y_boot, X)
+            m_hat = self.fform(X, b_hat)
+            u_boot_hat = Y_boot - m_hat
+            I_dist[j] = self._compute_test_stat(u_boot_hat)
+
+        self.boots_results = I_dist
+        sig = "Not Significant"
+        if self.test_stat > mquantiles(I_dist, 0.9):
+            sig = "*"
+        if self.test_stat > mquantiles(I_dist, 0.95):
+            sig = "**"
+        if self.test_stat > mquantiles(I_dist, 0.99):
+            sig = "***"
+        return sig
+
+    def _compute_test_stat(self, u):
+        n = np.shape(u)[0]
+        XLOO = LeaveOneOut(self.exog)
+        uLOO = LeaveOneOut(u[:,None]).__iter__()
+        ival = 0
+        S2 = 0
+        for i, X_not_i in enumerate(XLOO):
+            u_j = next(uLOO)
+            u_j = np.squeeze(u_j)
+            # See Bootstrapping procedure on p. 357 in [1]
+            K = gpke(self.bw, data=-X_not_i, data_predict=-self.exog[i, :],
+                     var_type=self.var_type, tosum=False)
+            f_i = (u[i] * u_j * K)
+            assert u_j.shape == K.shape
+            ival += f_i.sum()  # See eq. 12.7 on p. 355 in [1]
+            S2 += (f_i**2).sum()  # See Theorem 12.1 on p.356 in [1]
+            assert np.size(ival) == 1
+            assert np.size(S2) == 1
+
+        ival *= 1. / (n * (n - 1))
+        ix_cont = _get_type_pos(self.var_type)[0]
+        hp = self.bw[ix_cont].prod()
+        S2 *= 2 * hp / (n * (n - 1))
+        T = n * ival * np.sqrt(hp / S2)
+        return T
+

 class SingleIndexModel(KernelReg):
     """
@@ -123,7 +192,6 @@ class SingleIndexModel(KernelReg):
     In the parametric binary choice models the user usually assumes
     some distribution of g() such as normal or logistic.
     """
-
     def __init__(self, endog, exog, var_type):
         self.var_type = var_type
         self.K = len(var_type)
@@ -136,16 +204,64 @@ class SingleIndexModel(KernelReg):
         self.okertype = 'wangryzin'
         self.ukertype = 'aitchisonaitken'
         self.func = self._est_loc_linear
+
         self.b, self.bw = self._est_b_bw()

+    def _est_b_bw(self):
+        params0 = np.random.uniform(size=(self.K + 1, ))
+        b_bw = optimize.fmin(self.cv_loo, params0, disp=0)
+        b = b_bw[0:self.K]
+        bw = b_bw[self.K:]
+        bw = self._set_bw_bounds(bw)
+        return b, bw
+
+    def cv_loo(self, params):
+        # See p. 254 in Textbook
+        params = np.asarray(params)
+        b = params[0 : self.K]
+        bw = params[self.K:]
+        LOO_X = LeaveOneOut(self.exog)
+        LOO_Y = LeaveOneOut(self.endog).__iter__()
+        L = 0
+        for i, X_not_i in enumerate(LOO_X):
+            Y = next(LOO_Y)
+            #print b.shape, np.dot(self.exog[i:i+1, :], b).shape, bw,
+            G = self.func(bw, endog=Y, exog=-np.dot(X_not_i, b)[:,None],
+                          #data_predict=-b*self.exog[i, :])[0]
+                          data_predict=-np.dot(self.exog[i:i+1, :], b))[0]
+            #print G.shape
+            L += (self.endog[i] - G) ** 2
+
+        # Note: There might be a way to vectorize this. See p.72 in [1]
+        return L / self.nobs
+
+    def fit(self, data_predict=None):
+        if data_predict is None:
+            data_predict = self.exog
+        else:
+            data_predict = _adjust_shape(data_predict, self.K)
+
+        N_data_predict = np.shape(data_predict)[0]
+        mean = np.empty((N_data_predict,))
+        mfx = np.empty((N_data_predict, self.K))
+        for i in range(N_data_predict):
+            mean_mfx = self.func(self.bw, self.endog,
+                                 np.dot(self.exog, self.b)[:,None],
+                                 data_predict=np.dot(data_predict[i:i+1, :],self.b))
+            mean[i] = mean_mfx[0]
+            mfx_c = np.squeeze(mean_mfx[1])
+            mfx[i, :] = mfx_c
+
+        return mean, mfx
+
     def __repr__(self):
         """Provide something sane to print."""
-        repr = 'Single Index Model \n'
-        repr += 'Number of variables: K = ' + str(self.K) + '\n'
-        repr += 'Number of samples:   nobs = ' + str(self.nobs) + '\n'
-        repr += 'Variable types:      ' + self.var_type + '\n'
-        repr += 'BW selection method: cv_ls' + '\n'
-        repr += 'Estimator type: local constant' + '\n'
+        repr = "Single Index Model \n"
+        repr += "Number of variables: K = " + str(self.K) + "\n"
+        repr += "Number of samples:   nobs = " + str(self.nobs) + "\n"
+        repr += "Variable types:      " + self.var_type + "\n"
+        repr += "BW selection method: cv_ls" + "\n"
+        repr += "Estimator type: local constant" + "\n"
         return repr


@@ -209,6 +325,7 @@ class SemiLinear(KernelReg):
         self.okertype = 'wangryzin'
         self.ukertype = 'aitchisonaitken'
         self.func = self._est_loc_linear
+
         self.b, self.bw = self._est_b_bw()

     def _est_b_bw(self):
@@ -217,7 +334,12 @@ class SemiLinear(KernelReg):

         Minimizes ``cv_loo`` with respect to ``b`` and ``bw``.
         """
-        pass
+        params0 = np.random.uniform(size=(self.k_linear + self.K, ))
+        b_bw = optimize.fmin(self.cv_loo, params0, disp=0)
+        b = b_bw[0 : self.k_linear]
+        bw = b_bw[self.k_linear:]
+        #bw = self._set_bw_bounds(np.asarray(bw))
+        return b, bw

     def cv_loo(self, params):
         """
@@ -240,18 +362,58 @@ class SemiLinear(KernelReg):
         ----------
         See p.254 in [1]
         """
-        pass
+        params = np.asarray(params)
+        b = params[0 : self.k_linear]
+        bw = params[self.k_linear:]
+        LOO_X = LeaveOneOut(self.exog)
+        LOO_Y = LeaveOneOut(self.endog).__iter__()
+        LOO_Z = LeaveOneOut(self.exog_nonparametric).__iter__()
+        Xb = np.dot(self.exog, b)[:,None]
+        L = 0
+        for ii, X_not_i in enumerate(LOO_X):
+            Y = next(LOO_Y)
+            Z = next(LOO_Z)
+            Xb_j = np.dot(X_not_i, b)[:,None]
+            Yx = Y - Xb_j
+            G = self.func(bw, endog=Yx, exog=-Z,
+                          data_predict=-self.exog_nonparametric[ii, :])[0]
+            lt = Xb[ii, :] #.sum()  # linear term
+            L += (self.endog[ii] - lt - G) ** 2
+
+        return L

     def fit(self, exog_predict=None, exog_nonparametric_predict=None):
         """Computes fitted values and marginal effects"""
-        pass
+
+        if exog_predict is None:
+            exog_predict = self.exog
+        else:
+            exog_predict = _adjust_shape(exog_predict, self.k_linear)
+
+        if exog_nonparametric_predict is None:
+            exog_nonparametric_predict = self.exog_nonparametric
+        else:
+            exog_nonparametric_predict = _adjust_shape(exog_nonparametric_predict, self.K)
+
+        N_data_predict = np.shape(exog_nonparametric_predict)[0]
+        mean = np.empty((N_data_predict,))
+        mfx = np.empty((N_data_predict, self.K))
+        Y = self.endog - np.dot(exog_predict, self.b)[:,None]
+        for i in range(N_data_predict):
+            mean_mfx = self.func(self.bw, Y, self.exog_nonparametric,
+                                 data_predict=exog_nonparametric_predict[i, :])
+            mean[i] = mean_mfx[0]
+            mfx_c = np.squeeze(mean_mfx[1])
+            mfx[i, :] = mfx_c
+
+        return mean, mfx

     def __repr__(self):
         """Provide something sane to print."""
-        repr = 'Semiparamatric Partially Linear Model \n'
-        repr += 'Number of variables: K = ' + str(self.K) + '\n'
-        repr += 'Number of samples:   N = ' + str(self.nobs) + '\n'
-        repr += 'Variable types:      ' + self.var_type + '\n'
-        repr += 'BW selection method: cv_ls' + '\n'
-        repr += 'Estimator type: local constant' + '\n'
+        repr = "Semiparamatric Partially Linear Model \n"
+        repr += "Number of variables: K = " + str(self.K) + "\n"
+        repr += "Number of samples:   N = " + str(self.nobs) + "\n"
+        repr += "Variable types:      " + self.var_type + "\n"
+        repr += "BW selection method: cv_ls" + "\n"
+        repr += "Estimator type: local constant" + "\n"
         return repr
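
For orientation, a minimal usage sketch of the semiparametric estimators patched above. It assumes the classes live in statsmodels.sandbox.nonparametric.kernel_extras and keep the SemiLinear(endog, exog, exog_nonparametric, var_type, k_linear) constructor implied by the hunks; fit() returns the (mean, mfx) pair implemented above. Illustrative only, not part of the patch.

    import numpy as np
    from statsmodels.sandbox.nonparametric.kernel_extras import SemiLinear  # assumed module path

    rng = np.random.default_rng(0)
    n = 100                                    # keep small: bandwidth search is leave-one-out
    x = rng.normal(size=(n, 1))                # linear part
    z = rng.uniform(size=(n, 1))               # nonparametric part
    y = 0.5 * x[:, 0] + np.sin(3 * z[:, 0]) + 0.1 * rng.normal(size=n)

    model = SemiLinear(endog=y, exog=x, exog_nonparametric=z,
                       var_type='c', k_linear=1)
    mean, mfx = model.fit()                    # fitted values and marginal effects
    print(model.b, model.bw)                   # estimated linear coefficients and bandwidths
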
diff --git a/statsmodels/sandbox/nonparametric/kernels.py b/statsmodels/sandbox/nonparametric/kernels.py
index 0871e71e3..7b19707df 100644
--- a/statsmodels/sandbox/nonparametric/kernels.py
+++ b/statsmodels/sandbox/nonparametric/kernels.py
@@ -1,3 +1,5 @@
+# -*- coding: utf-8 -*-
+
 """
 This module contains the Kernels for Kernel smoothing.

@@ -11,6 +13,11 @@ Pointwise Kernel Confidence Bounds
 (smoothconf)
 http://fedc.wiwi.hu-berlin.de/xplore/ebooks/html/anr/anrhtmlframe62.html
 """
+
+# pylint: disable-msg=C0103
+# pylint: disable-msg=W0142
+# pylint: disable-msg=E1101
+# pylint: disable-msg=E0611
 from statsmodels.compat.python import lzip, lfilter
 import numpy as np
 import scipy.integrate
@@ -37,29 +44,57 @@ class NdKernel:
     In the case of the Gaussian these are both equivalent, and the second construction
     is preferred.
     """
-
-    def __init__(self, n, kernels=None, H=None):
+    def __init__(self, n, kernels = None, H = None):
         if kernels is None:
             kernels = Gaussian()
+
         self._kernels = kernels
         self.weights = None
+
         if H is None:
-            H = np.matrix(np.identity(n))
+            H = np.matrix( np.identity(n))
+
         self._H = H
-        self._Hrootinv = np.linalg.cholesky(H.I)
+        self._Hrootinv = np.linalg.cholesky( H.I )

     def getH(self):
         """Getter for kernel bandwidth, H"""
-        pass
+        return self._H

     def setH(self, value):
         """Setter for kernel bandwidth, H"""
-        pass
-    H = property(getH, setH, doc='Kernel bandwidth matrix')
+        self._H = value
+
+    H = property(getH, setH, doc="Kernel bandwidth matrix")
+
+    def density(self, xs, x):
+
+        n = len(xs)
+        #xs = self.in_domain( xs, xs, x )[0]
+
+        if len(xs)>0:  ## Need to do product of marginal distributions
+            #w = np.sum([self(self._Hrootinv * (xx-x).T ) for xx in xs])/n
+            #vectorized does not work:
+            if self.weights is not None:
+                w = np.mean(self((xs-x) * self._Hrootinv).T * self.weights)/sum(self.weights)
+            else:
+                w = np.mean(self((xs-x) * self._Hrootinv )) #transposed
+            #w = np.mean([self(xd) for xd in ((xs-x) * self._Hrootinv)] ) #transposed
+            return w
+        else:
+            return np.nan

-    def _kernweight(self, x):
+    def _kernweight(self, x ):
         """returns the kernel weight for the independent multivariate kernel"""
-        pass
+        if isinstance( self._kernels, CustomKernel ):
+            ## Radial case
+            #d = x.T * x
+            #x is matrix, 2d, element wise sqrt looks wrong
+            #d = np.sqrt( x.T * x )
+            x = np.asarray(x)
+            #d = np.sqrt( (x * x).sum(-1) )
+            d = (x * x).sum(-1)
+            return self._kernels( np.asarray(d) )

     def __call__(self, x):
         """
@@ -77,8 +112,11 @@ class CustomKernel:
     or providing a lambda expression and domain.
     The domain allows some algorithms to run faster for finite domain kernels.
     """
+    # MC: Not sure how this will look in the end - or even still exist.
+    # Main purpose of this is to allow custom kernels and to allow speed up
+    # from finite support.

-    def __init__(self, shape, h=1.0, domain=None, norm=None):
+    def __init__(self, shape, h = 1.0, domain = None, norm = None):
         """
         shape should be a function taking and returning numeric type.

@@ -101,13 +139,13 @@ class CustomKernel:
         Warning: I think several calculations assume that the kernel is
         normalized. No tests for non-normalized kernel.
         """
-        self._normconst = norm
+        self._normconst = norm   # a value or None, if None, then calculate
         self.domain = domain
         self.weights = None
         if callable(shape):
             self._shape = shape
         else:
-            raise TypeError('shape must be a callable object/function')
+            raise TypeError("shape must be a callable object/function")
         self._h = h
         self._L2Norm = None
         self._kernel_var = None
@@ -116,24 +154,57 @@ class CustomKernel:

     def geth(self):
         """Getter for kernel bandwidth, h"""
-        pass
-
+        return self._h
     def seth(self, value):
         """Setter for kernel bandwidth, h"""
-        pass
-    h = property(geth, seth, doc='Kernel Bandwidth')
+        self._h = value
+    h = property(geth, seth, doc="Kernel Bandwidth")

     def in_domain(self, xs, ys, x):
         """
         Returns the filtered (xs, ys) based on the Kernel domain centred on x
         """
-        pass
+        # Disable black-list functions: filter used for speed instead of
+        # list-comprehension
+        # pylint: disable-msg=W0141
+        def isInDomain(xy):
+            """Used for filter to check if point is in the domain"""
+            u = (xy[0]-x)/self.h
+            return np.all((u >= self.domain[0]) & (u <= self.domain[1]))
+
+        if self.domain is None:
+            return (xs, ys)
+        else:
+            filtered = lfilter(isInDomain, lzip(xs, ys))
+            if len(filtered) > 0:
+                xs, ys = lzip(*filtered)
+                return (xs, ys)
+            else:
+                return ([], [])

     def density(self, xs, x):
         """Returns the kernel density estimate for point x based on x-values
         xs
         """
-        pass
+        xs = np.asarray(xs)
+        n = len(xs) # before in_domain?
+        if self.weights is not None:
+            xs, weights = self.in_domain( xs, self.weights, x )
+        else:
+            xs = self.in_domain( xs, xs, x )[0]
+        xs = np.asarray(xs)
+        #print 'len(xs)', len(xs), x
+        if xs.ndim == 1:
+            xs = xs[:,None]
+        if len(xs)>0:
+            h = self.h
+            if self.weights is not None:
+                w = 1 / h * np.sum(self((xs-x)/h).T * weights, axis=1)
+            else:
+                w = 1. / (h * n) * np.sum(self((xs-x)/h), axis=0)
+            return w
+        else:
+            return np.nan

     def density_var(self, density, nobs):
         """approximate pointwise variance for kernel density
@@ -157,7 +228,7 @@ class CustomKernel:
         This uses the asymptotic normal approximation to the distribution of
         the density estimate.
         """
-        pass
+        return np.asarray(density) * self.L2Norm / self.h / nobs

     def density_confint(self, density, nobs, alpha=0.05):
         """approximate pointwise confidence interval for kernel density
@@ -186,41 +257,116 @@ class CustomKernel:
         the density estimate. The lower bound can be negative for density
         values close to zero.
         """
-        pass
+        from scipy import stats
+        crit = stats.norm.isf(alpha / 2.)
+        density = np.asarray(density)
+        half_width = crit * np.sqrt(self.density_var(density, nobs))
+        conf_int = np.column_stack((density - half_width, density + half_width))
+        return conf_int

     def smooth(self, xs, ys, x):
         """Returns the kernel smoothing estimate for point x based on x-values
         xs and y-values ys.
         Not expected to be called by the user.
         """
-        pass
+        xs, ys = self.in_domain(xs, ys, x)
+
+        if len(xs)>0:
+            w = np.sum(self((xs-x)/self.h))
+            #TODO: change the below to broadcasting when shape is sorted
+            v = np.sum([yy*self((xx-x)/self.h) for xx, yy in zip(xs, ys)])
+            return v / w
+        else:
+            return np.nan

     def smoothvar(self, xs, ys, x):
         """Returns the kernel smoothing estimate of the variance at point x.
         """
-        pass
+        xs, ys = self.in_domain(xs, ys, x)
+
+        if len(xs) > 0:
+            fittedvals = np.array([self.smooth(xs, ys, xx) for xx in xs])
+            sqresid = square( subtract(ys, fittedvals) )
+            w = np.sum(self((xs-x)/self.h))
+            v = np.sum([rr*self((xx-x)/self.h) for xx, rr in zip(xs, sqresid)])
+            return v / w
+        else:
+            return np.nan

     def smoothconf(self, xs, ys, x, alpha=0.05):
         """Returns the kernel smoothing estimate with confidence 1sigma bounds
         """
-        pass
+        xs, ys = self.in_domain(xs, ys, x)
+
+        if len(xs) > 0:
+            fittedvals = np.array([self.smooth(xs, ys, xx) for xx in xs])
+            #fittedvals = self.smooth(xs, ys, x) # x or xs in Haerdle
+            sqresid = square(
+                subtract(ys, fittedvals)
+            )
+            w = np.sum(self((xs-x)/self.h))
+            #var = sqresid.sum() / (len(sqresid) - 0)  # nonlocal var ? JP just trying
+            v = np.sum([rr*self((xx-x)/self.h) for xx, rr in zip(xs, sqresid)])
+            var = v / w
+            sd = np.sqrt(var)
+            K = self.L2Norm
+            yhat = self.smooth(xs, ys, x)
+            from scipy import stats
+            crit = stats.norm.isf(alpha / 2)
+            err = crit * sd * np.sqrt(K) / np.sqrt(w * self.h * self.norm_const)
+            return (yhat - err, yhat, yhat + err)
+        else:
+            return (np.nan, np.nan, np.nan)

     @property
     def L2Norm(self):
         """Returns the integral of the square of the kernal from -inf to inf"""
-        pass
+        if self._L2Norm is None:
+            L2Func = lambda x: (self.norm_const*self._shape(x))**2
+            if self.domain is None:
+                self._L2Norm = scipy.integrate.quad(L2Func, -inf, inf)[0]
+            else:
+                self._L2Norm = scipy.integrate.quad(L2Func, self.domain[0],
+                                               self.domain[1])[0]
+        return self._L2Norm

     @property
     def norm_const(self):
         """
         Normalising constant for kernel (integral from -inf to inf)
         """
-        pass
+        if self._normconst is None:
+            if self.domain is None:
+                quadres = scipy.integrate.quad(self._shape, -inf, inf)
+            else:
+                quadres = scipy.integrate.quad(self._shape, self.domain[0],
+                                               self.domain[1])
+            self._normconst = 1.0/(quadres[0])
+        return self._normconst

     @property
     def kernel_var(self):
         """Returns the second moment of the kernel"""
-        pass
+        if self._kernel_var is None:
+            func = lambda x: x**2 * self.norm_const * self._shape(x)
+            if self.domain is None:
+                self._kernel_var = scipy.integrate.quad(func, -inf, inf)[0]
+            else:
+                self._kernel_var = scipy.integrate.quad(func, self.domain[0],
+                                               self.domain[1])[0]
+        return self._kernel_var
+
+    def moments(self, n):
+
+        if n > 2:
+            msg = "Only first and second moment currently implemented"
+            raise NotImplementedError(msg)
+
+        if n == 1:
+            return 0
+
+        if n == 2:
+            return self.kernel_var

     @property
     def normal_reference_constant(self):
@@ -235,11 +381,24 @@ class CustomKernel:

         Note: L2Norm property returns square of norm.
         """
-        pass
+        nu = self._order
+
+        if not nu == 2:
+            msg = "Only implemented for second order kernels"
+            raise NotImplementedError(msg)
+
+        if self._normal_reference_constant is None:
+            C = np.pi**(.5) * factorial(nu)**3 * self.L2Norm
+            C /= (2 * nu * factorial(2 * nu) * self.moments(nu)**2)
+            C = 2*C**(1.0/(2*nu+1))
+            self._normal_reference_constant = C
+
+        return self._normal_reference_constant
+

     def weight(self, x):
         """This returns the normalised weight at distance x"""
-        pass
+        return self.norm_const*self._shape(x)

     def __call__(self, x):
         """
@@ -251,42 +410,38 @@ class CustomKernel:


 class Uniform(CustomKernel):
-
     def __init__(self, h=1.0):
-        CustomKernel.__init__(self, shape=lambda x: 0.5 * np.ones(x.shape),
-            h=h, domain=[-1.0, 1.0], norm=1.0)
+        CustomKernel.__init__(self, shape=lambda x: 0.5 * np.ones(x.shape), h=h,
+                              domain=[-1.0, 1.0], norm = 1.0)
         self._L2Norm = 0.5
-        self._kernel_var = 1.0 / 3
+        self._kernel_var = 1. / 3
         self._order = 2


 class Triangular(CustomKernel):
-
     def __init__(self, h=1.0):
-        CustomKernel.__init__(self, shape=lambda x: 1 - abs(x), h=h, domain
-            =[-1.0, 1.0], norm=1.0)
-        self._L2Norm = 2.0 / 3.0
-        self._kernel_var = 1.0 / 6
+        CustomKernel.__init__(self, shape=lambda x: 1 - abs(x), h=h,
+                              domain=[-1.0, 1.0], norm = 1.0)
+        self._L2Norm = 2.0/3.0
+        self._kernel_var = 1. / 6
         self._order = 2


 class Epanechnikov(CustomKernel):
-
     def __init__(self, h=1.0):
-        CustomKernel.__init__(self, shape=lambda x: 0.75 * (1 - x * x), h=h,
-            domain=[-1.0, 1.0], norm=1.0)
+        CustomKernel.__init__(self, shape=lambda x: 0.75*(1 - x*x), h=h,
+                              domain=[-1.0, 1.0], norm = 1.0)
         self._L2Norm = 0.6
         self._kernel_var = 0.2
         self._order = 2


 class Biweight(CustomKernel):
-
     def __init__(self, h=1.0):
-        CustomKernel.__init__(self, shape=lambda x: 0.9375 * (1 - x * x) **
-            2, h=h, domain=[-1.0, 1.0], norm=1.0)
-        self._L2Norm = 5.0 / 7.0
-        self._kernel_var = 1.0 / 7
+        CustomKernel.__init__(self, shape=lambda x: 0.9375*(1 - x*x)**2, h=h,
+                              domain=[-1.0, 1.0], norm = 1.0)
+        self._L2Norm = 5.0/7.0
+        self._kernel_var = 1. / 7
         self._order = 2

     def smooth(self, xs, ys, x):
@@ -296,27 +451,61 @@ class Biweight(CustomKernel):

         Special implementation optimized for Biweight.
         """
-        pass
+        xs, ys = self.in_domain(xs, ys, x)
+
+        if len(xs) > 0:
+            w = np.sum(square(subtract(1, square(divide(subtract(xs, x),
+                                                        self.h)))))
+            v = np.sum(multiply(ys, square(subtract(1, square(divide(
+                                                subtract(xs, x), self.h))))))
+            return v / w
+        else:
+            return np.nan

     def smoothvar(self, xs, ys, x):
         """
         Returns the kernel smoothing estimate of the variance at point x.
         """
-        pass
+        xs, ys = self.in_domain(xs, ys, x)
+
+        if len(xs) > 0:
+            fittedvals = np.array([self.smooth(xs, ys, xx) for xx in xs])
+            rs = square(subtract(ys, fittedvals))
+            w = np.sum(square(subtract(1.0, square(divide(subtract(xs, x),
+                                                        self.h)))))
+            v = np.sum(multiply(rs, square(subtract(1, square(divide(
+                                                subtract(xs, x), self.h))))))
+            return v / w
+        else:
+            return np.nan

     def smoothconf_(self, xs, ys, x):
         """Returns the kernel smoothing estimate with confidence 1sigma bounds
         """
-        pass
-
+        xs, ys = self.in_domain(xs, ys, x)
+
+        if len(xs) > 0:
+            fittedvals = np.array([self.smooth(xs, ys, xx) for xx in xs])
+            rs = square(subtract(ys, fittedvals))
+            w = np.sum(square(subtract(1.0, square(divide(subtract(xs, x),
+                                                        self.h)))))
+            v = np.sum(multiply(rs, square(subtract(1, square(divide(
+                                                subtract(xs, x), self.h))))))
+            var = v / w
+            sd = np.sqrt(var)
+            K = self.L2Norm
+            yhat = self.smooth(xs, ys, x)
+            err = sd * K / np.sqrt(0.9375 * w * self.h)
+            return (yhat - err, yhat, yhat + err)
+        else:
+            return (np.nan, np.nan, np.nan)

 class Triweight(CustomKernel):
-
     def __init__(self, h=1.0):
-        CustomKernel.__init__(self, shape=lambda x: 1.09375 * (1 - x * x) **
-            3, h=h, domain=[-1.0, 1.0], norm=1.0)
-        self._L2Norm = 350.0 / 429.0
-        self._kernel_var = 1.0 / 9
+        CustomKernel.__init__(self, shape=lambda x: 1.09375*(1 - x*x)**3, h=h,
+                              domain=[-1.0, 1.0], norm = 1.0)
+        self._L2Norm = 350.0/429.0
+        self._kernel_var = 1. / 9
         self._order = 2


@@ -326,11 +515,10 @@ class Gaussian(CustomKernel):

     K(u) = 1 / (sqrt(2*pi)) exp(-0.5 u**2)
     """
-
     def __init__(self, h=1.0):
-        CustomKernel.__init__(self, shape=lambda x: 0.3989422804014327 * np
-            .exp(-x ** 2 / 2.0), h=h, domain=None, norm=1.0)
-        self._L2Norm = 1.0 / (2.0 * np.sqrt(np.pi))
+        CustomKernel.__init__(self, shape = lambda x: 0.3989422804014327 *
+                        np.exp(-x**2/2.0), h = h, domain = None, norm = 1.0)
+        self._L2Norm = 1.0/(2.0*np.sqrt(np.pi))
         self._kernel_var = 1.0
         self._order = 2

@@ -341,8 +529,11 @@ class Gaussian(CustomKernel):

         Special implementation optimized for Gaussian.
         """
-        pass
-
+        w = np.sum(exp(multiply(square(divide(subtract(xs, x),
+                                              self.h)),-0.5)))
+        v = np.sum(multiply(ys, exp(multiply(square(divide(subtract(xs, x),
+                                                          self.h)), -0.5))))
+        return v/w

 class Cosine(CustomKernel):
     """
@@ -350,12 +541,11 @@ class Cosine(CustomKernel):

     K(u) = pi/4 cos(0.5 * pi * u) between -1.0 and 1.0
     """
-
     def __init__(self, h=1.0):
-        CustomKernel.__init__(self, shape=lambda x: 0.7853981633974483 * np
-            .cos(np.pi / 2.0 * x), h=h, domain=[-1.0, 1.0], norm=1.0)
-        self._L2Norm = np.pi ** 2 / 16.0
-        self._kernel_var = 0.1894305308612978
+        CustomKernel.__init__(self, shape=lambda x: 0.78539816339744828 *
+                np.cos(np.pi/2.0 * x), h=h, domain=[-1.0, 1.0], norm = 1.0)
+        self._L2Norm = np.pi**2/16.0
+        self._kernel_var = 0.1894305308612978 # = 1 - 8 / np.pi**2
         self._order = 2


@@ -367,25 +557,22 @@ class Cosine2(CustomKernel):

     Note: this  is the same Cosine kernel that Stata uses
     """
-
     def __init__(self, h=1.0):
-        CustomKernel.__init__(self, shape=lambda x: 1 + np.cos(2.0 * np.pi *
-            x), h=h, domain=[-0.5, 0.5], norm=1.0)
+        CustomKernel.__init__(self, shape=lambda x: 1 + np.cos(2.0 * np.pi * x)
+                , h=h, domain=[-0.5, 0.5], norm = 1.0)
         self._L2Norm = 1.5
-        self._kernel_var = 0.03267274151216444
+        self._kernel_var = 0.03267274151216444  # = 1/12. - 0.5 / np.pi**2
         self._order = 2

-
 class Tricube(CustomKernel):
     """
     Tricube Kernel

     K(u) = 0.864197530864 * (1 - abs(x)**3)**3 between -1.0 and 1.0
     """
-
-    def __init__(self, h=1.0):
-        CustomKernel.__init__(self, shape=lambda x: 0.864197530864 * (1 - 
-            abs(x) ** 3) ** 3, h=h, domain=[-1.0, 1.0], norm=1.0)
-        self._L2Norm = 175.0 / 247.0
-        self._kernel_var = 35.0 / 243.0
+    def __init__(self,h=1.0):
+        CustomKernel.__init__(self,shape=lambda x: 0.864197530864 * (1 - abs(x)**3)**3,
+                              h=h, domain=[-1.0, 1.0], norm = 1.0)
+        self._L2Norm = 175.0/247.0
+        self._kernel_var = 35.0/243.0
         self._order = 2
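
The kernel methods filled in above (density, smooth, smoothconf, and the norm/variance properties) can be exercised directly; a minimal sketch, illustrative only:

    import numpy as np
    from statsmodels.sandbox.nonparametric import kernels

    rng = np.random.default_rng(1)
    xs = rng.normal(size=100)
    ys = np.sin(xs) + 0.1 * rng.normal(size=100)

    k = kernels.Gaussian(h=0.5)
    print(k.density(xs, 0.0))                  # kernel density estimate at x = 0
    print(k.smooth(xs, ys, 0.0))               # Nadaraya-Watson estimate at x = 0
    low, fit, upp = k.smoothconf(xs, ys, 0.0)  # pointwise band with default alpha = 0.05
    print(k.L2Norm, k.kernel_var)              # integral of K**2 and second moment of K
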
diff --git a/statsmodels/sandbox/nonparametric/smoothers.py b/statsmodels/sandbox/nonparametric/smoothers.py
index 8beb8350a..440169280 100644
--- a/statsmodels/sandbox/nonparametric/smoothers.py
+++ b/statsmodels/sandbox/nonparametric/smoothers.py
@@ -2,6 +2,12 @@
 This module contains scatterplot smoothers, that is, classes
 that generate a smooth fit of a set of (x,y) pairs.
 """
+
+# pylint: disable-msg=C0103
+# pylint: disable-msg=W0142
+# pylint: disable-msg=E0611
+# pylint: disable-msg=E1101
+
 import numpy as np
 from . import kernels

@@ -15,14 +21,16 @@ class KernelSmoother:
     y - array_like of y values
     Kernel - Kernel object, Default is Gaussian.
     """
-
-    def __init__(self, x, y, Kernel=None):
+    def __init__(self, x, y, Kernel = None):
         if Kernel is None:
             Kernel = kernels.Gaussian()
         self.Kernel = Kernel
         self.x = np.array(x)
         self.y = np.array(y)

+    def fit(self):
+        pass
+
     def __call__(self, x):
         return np.array([self.predict(xx) for xx in x])

@@ -35,7 +43,11 @@ class KernelSmoother:
         Otherwise an attempt is made to cast x to numpy.ndarray and an array of
         corresponding y-points is returned.
         """
-        pass
+        if np.size(x) == 1: # if isinstance(x, numbers.Real):
+            return self.Kernel.smooth(self.x, self.y, x)
+        else:
+            return np.array([self.Kernel.smooth(self.x, self.y, xx) for xx
+                                                in np.array(x)])

     def conf(self, x):
         """
@@ -52,8 +64,22 @@ class KernelSmoother:
         xth sample point - so they are closer together where the data
         is denser.
         """
-        pass
+        if isinstance(x, int):
+            sorted_x = np.array(self.x)
+            sorted_x.sort()
+            confx = sorted_x[::x]
+            conffit = self.conf(confx)
+            return (confx, conffit)
+        else:
+            return np.array([self.Kernel.smoothconf(self.x, self.y, xx)
+                                                                for xx in x])
+
+
+    def var(self, x):
+        return np.array([self.Kernel.smoothvar(self.x, self.y, xx) for xx in x])

+    def std(self, x):
+        return np.sqrt(self.var(x))

 class PolySmoother:
     """
@@ -65,41 +91,311 @@ class PolySmoother:
     This is a 3 liner with OLS or WLS, see test.
     It's here as a test smoother for GAM
     """
+    #JP: heavily adjusted to work as plugin replacement for bspline
+    #   smoother in gam.py  initialized by function default_smoother
+    #   Only fixed exceptions, I did not check whether it is statistically
+    #   correct and I think it is not; there may still be some dimension
+    #   problems, and there were some dimension problems initially.
+    # TODO: undo adjustments and fix dimensions correctly
+    # comment: this is just like polyfit with initialization options
+    #          and additional results (OLS on polynomial of x (x is 1d?))
+

     def __init__(self, order, x=None):
+        #order = 4 # set this because we get knots instead of order
         self.order = order
-        self.coef = np.zeros((order + 1,), np.float64)
+
+        #print order, x.shape
+        self.coef = np.zeros((order+1,), np.float64)
         if x is not None:
             if x.ndim > 1:
-                print('Warning: 2d x detected in PolySmoother init, shape:',
-                    x.shape)
-                x = x[0, :]
-            self.X = np.array([(x ** i) for i in range(order + 1)]).T
+                print('Warning: 2d x detected in PolySmoother init, shape:', x.shape)
+                x=x[0,:] #check orientation
+            self.X = np.array([x**i for i in range(order+1)]).T

     def df_fit(self):
-        """alias of df_model for backwards compatibility
-        """
-        pass
+        '''alias of df_model for backwards compatibility
+        '''
+        return self.df_model()

     def df_model(self):
         """
         Degrees of freedom used in the fit.
         """
+        return self.order + 1
+
+    def gram(self, d=None):
+        #fake for spline imitation
         pass

-    def smooth(self, *args, **kwds):
-        """alias for fit,  for backwards compatibility,
+    def smooth(self,*args, **kwds):
+        '''alias for fit,  for backwards compatibility,

         do we need it with different behavior than fit?

-        """
-        pass
+        '''
+        return self.fit(*args, **kwds)

     def df_resid(self):
         """
         Residual degrees of freedom from last fit.
         """
-        pass
+        return self.N - self.order - 1

     def __call__(self, x=None):
         return self.predict(x=x)
+
+
+    def predict(self, x=None):
+
+        if x is not None:
+            #if x.ndim > 1: x=x[0,:]  #why this this should select column not row
+            if x.ndim > 1:
+                print('Warning: 2d x detected in PolySmoother predict, shape:', x.shape)
+                x=x[:,0]  #TODO: check and clean this up
+            X = np.array([(x**i) for i in range(self.order+1)])
+        else:
+            X = self.X
+        #return np.squeeze(np.dot(X.T, self.coef))
+        #need to check what dimension this is supposed to be
+        if X.shape[1] == self.coef.shape[0]:
+            return np.squeeze(np.dot(X, self.coef))#[0]
+        else:
+            return np.squeeze(np.dot(X.T, self.coef))#[0]
+
+    def fit(self, y, x=None, weights=None):
+        self.N = y.shape[0]
+        if y.ndim == 1:
+            y = y[:,None]
+        if weights is None or np.isnan(weights).all():
+            weights = 1
+            _w = 1
+        else:
+            _w = np.sqrt(weights)[:,None]
+        if x is None:
+            if not hasattr(self, "X"):
+                raise ValueError("x needed to fit PolySmoother")
+        else:
+            if x.ndim > 1:
+                print('Warning: 2d x detected in PolySmoother predict, shape:', x.shape)
+                #x=x[0,:] #TODO: check orientation, row or col
+            self.X = np.array([(x**i) for i in range(self.order+1)]).T
+        #print _w.shape
+
+        X = self.X * _w
+
+        _y = y * _w#[:,None]
+        #self.coef = np.dot(L.pinv(X).T, _y[:,None])
+        #self.coef = np.dot(L.pinv(X), _y)
+        self.coef = np.linalg.lstsq(X, _y, rcond=-1)[0]
+        self.params = np.squeeze(self.coef)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+# comment out for now to remove dependency on _hbspline
+
+##class SmoothingSpline(BSpline):
+##
+##    penmax = 30.
+##
+##    def fit(self, y, x=None, weights=None, pen=0.):
+##        banded = True
+##
+##        if x is None:
+##            x = self.tau[(self.M-1):-(self.M-1)] # internal knots
+##
+##        if pen == 0.: # cannot use cholesky for singular matrices
+##            banded = False
+##
+##        if x.shape != y.shape:
+##            raise ValueError('x and y shape do not agree, by default x are the Bspline\'s internal knots')
+##
+##        bt = self.basis(x)
+##        if pen >= self.penmax:
+##            pen = self.penmax
+##
+##        if weights is None:
+##            weights = np.array(1.)
+##
+##        wmean = weights.mean()
+##        _w = np.sqrt(weights / wmean)
+##        bt *= _w
+##
+##        # throw out rows with zeros (this happens at boundary points!)
+##
+##        mask = np.flatnonzero(1 - np.alltrue(np.equal(bt, 0), axis=0))
+##
+##        bt = bt[:, mask]
+##        y = y[mask]
+##
+##        self.df_total = y.shape[0]
+##
+##        if bt.shape[1] != y.shape[0]:
+##            raise ValueError("some x values are outside range of B-spline knots")
+##        bty = np.dot(bt, _w * y)
+##        self.N = y.shape[0]
+##        if not banded:
+##            self.btb = np.dot(bt, bt.T)
+##            _g = _band2array(self.g, lower=1, symmetric=True)
+##            self.coef, _, self.rank = L.lstsq(self.btb + pen*_g, bty)[0:3]
+##            self.rank = min(self.rank, self.btb.shape[0])
+##        else:
+##            self.btb = np.zeros(self.g.shape, np.float64)
+##            nband, nbasis = self.g.shape
+##            for i in range(nbasis):
+##                for k in range(min(nband, nbasis-i)):
+##                    self.btb[k, i] = (bt[i] * bt[i+k]).sum()
+##
+##            bty.shape = (1, bty.shape[0])
+##            self.chol, self.coef = solveh_banded(self.btb +
+##                                                 pen*self.g,
+##                                                 bty, lower=1)
+##
+##        self.coef = np.squeeze(self.coef)
+##        self.resid = np.sqrt(wmean) * (y * _w - np.dot(self.coef, bt))
+##        self.pen = pen
+##
+##    def gcv(self):
+##        """
+##        Generalized cross-validation score of current fit.
+##        """
+##
+##        norm_resid = (self.resid**2).sum()
+##        return norm_resid / (self.df_total - self.trace())
+##
+##    def df_resid(self):
+##        """
+##        self.N - self.trace()
+##
+##        where self.N is the number of observations of last fit.
+##        """
+##
+##        return self.N - self.trace()
+##
+##    def df_fit(self):
+##        """
+##        = self.trace()
+##
+##        How many degrees of freedom used in the fit?
+##        """
+##        return self.trace()
+##
+##    def trace(self):
+##        """
+##        Trace of the smoothing matrix S(pen)
+##        """
+##
+##        if self.pen > 0:
+##            _invband = _hbspline.invband(self.chol.copy())
+##            tr = _trace_symbanded(_invband, self.btb, lower=1)
+##            return tr
+##        else:
+##            return self.rank
+##
+##class SmoothingSplineFixedDF(SmoothingSpline):
+##    """
+##    Fit smoothing spline with approximately df degrees of freedom
+##    used in the fit, i.e. so that self.trace() is approximately df.
+##
+##    In general, df must be greater than the dimension of the null space
+##    of the Gram inner product. For cubic smoothing splines, this means
+##    that df > 2.
+##    """
+##
+##    target_df = 5
+##
+##    def __init__(self, knots, order=4, coef=None, M=None, target_df=None):
+##        if target_df is not None:
+##            self.target_df = target_df
+##        BSpline.__init__(self, knots, order=order, coef=coef, M=M)
+##        self.target_reached = False
+##
+##    def fit(self, y, x=None, df=None, weights=None, tol=1.0e-03):
+##
+##        df = df or self.target_df
+##
+##        apen, bpen = 0, 1.0e-03
+##        olddf = y.shape[0] - self.m
+##
+##        if not self.target_reached:
+##            while True:
+##                curpen = 0.5 * (apen + bpen)
+##                SmoothingSpline.fit(self, y, x=x, weights=weights, pen=curpen)
+##                curdf = self.trace()
+##                if curdf > df:
+##                    apen, bpen = curpen, 2 * curpen
+##                else:
+##                    apen, bpen = apen, curpen
+##                    if apen >= self.penmax:
+##                        raise ValueError("penalty too large, try setting penmax higher or decreasing df")
+##                if np.fabs(curdf - df) / df < tol:
+##                    self.target_reached = True
+##                    break
+##        else:
+##            SmoothingSpline.fit(self, y, x=x, weights=weights, pen=self.pen)
+##
+##class SmoothingSplineGCV(SmoothingSpline):
+##
+##    """
+##    Fit smoothing spline trying to optimize GCV.
+##
+##    Try to find a bracketing interval for scipy.optimize.golden
+##    based on bracket.
+##
+##    It is probably best to use target_df instead, as it is
+##    sometimes difficult to find a bracketing interval.
+##
+##    """
+##
+##    def fit(self, y, x=None, weights=None, tol=1.0e-03,
+##            bracket=(0,1.0e-03)):
+##
+##        def _gcv(pen, y, x):
+##            SmoothingSpline.fit(y, x=x, pen=np.exp(pen), weights=weights)
+##            a = self.gcv()
+##            return a
+##
+##        a = golden(_gcv, args=(y,x), brack=(-100,20), tol=tol)
+##
+##def _trace_symbanded(a,b, lower=0):
+##    """
+##    Compute the trace(a*b) for two upper or lower banded real symmetric matrices.
+##    """
+##
+##    if lower:
+##        t = _zero_triband(a * b, lower=1)
+##        return t[0].sum() + 2 * t[1:].sum()
+##    else:
+##        t = _zero_triband(a * b, lower=0)
+##        return t[-1].sum() + 2 * t[:-1].sum()
+##
+##
+##
+##def _zero_triband(a, lower=0):
+##    """
+##    Zero out unnecessary elements of a real symmetric banded matrix.
+##    """
+##
+##    nrow, ncol = a.shape
+##    if lower:
+##        for i in range(nrow): a[i,(ncol-i):] = 0.
+##    else:
+##        for i in range(nrow): a[i,0:i] = 0.
+##    return a
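
A short sketch exercising the two smoothers restored above, KernelSmoother (prediction via __call__) and the OLS-based PolySmoother; illustrative only:

    import numpy as np
    from statsmodels.sandbox.nonparametric import smoothers, kernels

    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(-2, 2, size=80))
    y = x ** 2 + 0.2 * rng.normal(size=80)

    ks = smoothers.KernelSmoother(x, y, Kernel=kernels.Biweight(h=0.7))
    yhat = ks(np.linspace(-1.5, 1.5, 5))       # kernel-smoothed values on a grid

    ps = smoothers.PolySmoother(order=2, x=x)
    ps.fit(y, x=x)                             # OLS fit of a degree-2 polynomial
    y_poly = ps.predict(x=x)
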
diff --git a/statsmodels/sandbox/nonparametric/testdata.py b/statsmodels/sandbox/nonparametric/testdata.py
index 6c25c9741..f8828a80b 100644
--- a/statsmodels/sandbox/nonparametric/testdata.py
+++ b/statsmodels/sandbox/nonparametric/testdata.py
@@ -1,53 +1,58 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri Mar 04 07:36:28 2011

 @author: Mike
 """
-import numpy as np

+import numpy as np

 class kdetest:
-    Hpi = np.matrix([[0.05163034, 0.5098923], [0.50989228, 8.8822365]])
-    faithfulData = dict(eruptions=[3.6, 1.8, 3.333, 2.283, 4.533, 2.883, 
-        4.7, 3.6, 1.95, 4.35, 1.833, 3.917, 4.2, 1.75, 4.7, 2.167, 1.75, 
-        4.8, 1.6, 4.25, 1.8, 1.75, 3.45, 3.067, 4.533, 3.6, 1.967, 4.083, 
-        3.85, 4.433, 4.3, 4.467, 3.367, 4.033, 3.833, 2.017, 1.867, 4.833, 
-        1.833, 4.783, 4.35, 1.883, 4.567, 1.75, 4.533, 3.317, 3.833, 2.1, 
-        4.633, 2, 4.8, 4.716, 1.833, 4.833, 1.733, 4.883, 3.717, 1.667, 
-        4.567, 4.317, 2.233, 4.5, 1.75, 4.8, 1.817, 4.4, 4.167, 4.7, 2.067,
-        4.7, 4.033, 1.967, 4.5, 4, 1.983, 5.067, 2.017, 4.567, 3.883, 3.6, 
-        4.133, 4.333, 4.1, 2.633, 4.067, 4.933, 3.95, 4.517, 2.167, 4, 2.2,
-        4.333, 1.867, 4.817, 1.833, 4.3, 4.667, 3.75, 1.867, 4.9, 2.483, 
-        4.367, 2.1, 4.5, 4.05, 1.867, 4.7, 1.783, 4.85, 3.683, 4.733, 2.3, 
-        4.9, 4.417, 1.7, 4.633, 2.317, 4.6, 1.817, 4.417, 2.617, 4.067, 
-        4.25, 1.967, 4.6, 3.767, 1.917, 4.5, 2.267, 4.65, 1.867, 4.167, 2.8,
-        4.333, 1.833, 4.383, 1.883, 4.933, 2.033, 3.733, 4.233, 2.233, 
-        4.533, 4.817, 4.333, 1.983, 4.633, 2.017, 5.1, 1.8, 5.033, 4, 2.4, 
-        4.6, 3.567, 4, 4.5, 4.083, 1.8, 3.967, 2.2, 4.15, 2, 3.833, 3.5, 
-        4.583, 2.367, 5, 1.933, 4.617, 1.917, 2.083, 4.583, 3.333, 4.167, 
-        4.333, 4.5, 2.417, 4, 4.167, 1.883, 4.583, 4.25, 3.767, 2.033, 
-        4.433, 4.083, 1.833, 4.417, 2.183, 4.8, 1.833, 4.8, 4.1, 3.966, 
-        4.233, 3.5, 4.366, 2.25, 4.667, 2.1, 4.35, 4.133, 1.867, 4.6, 1.783,
-        4.367, 3.85, 1.933, 4.5, 2.383, 4.7, 1.867, 3.833, 3.417, 4.233, 
-        2.4, 4.8, 2, 4.15, 1.867, 4.267, 1.75, 4.483, 4, 4.117, 4.083, 
-        4.267, 3.917, 4.55, 4.083, 2.417, 4.183, 2.217, 4.45, 1.883, 1.85, 
-        4.283, 3.95, 2.333, 4.15, 2.35, 4.933, 2.9, 4.583, 3.833, 2.083, 
-        4.367, 2.133, 4.35, 2.2, 4.45, 3.567, 4.5, 4.15, 3.817, 3.917, 4.45,
-        2, 4.283, 4.767, 4.533, 1.85, 4.25, 1.983, 2.25, 4.75, 4.117, 2.15,
-        4.417, 1.817, 4.467], waiting=[79, 54, 74, 62, 85, 55, 88, 85, 51, 
-        85, 54, 84, 78, 47, 83, 52, 62, 84, 52, 79, 51, 47, 78, 69, 74, 83,
-        55, 76, 78, 79, 73, 77, 66, 80, 74, 52, 48, 80, 59, 90, 80, 58, 84,
-        58, 73, 83, 64, 53, 82, 59, 75, 90, 54, 80, 54, 83, 71, 64, 77, 81,
-        59, 84, 48, 82, 60, 92, 78, 78, 65, 73, 82, 56, 79, 71, 62, 76, 60,
-        78, 76, 83, 75, 82, 70, 65, 73, 88, 76, 80, 48, 86, 60, 90, 50, 78,
-        63, 72, 84, 75, 51, 82, 62, 88, 49, 83, 81, 47, 84, 52, 86, 81, 75,
-        59, 89, 79, 59, 81, 50, 85, 59, 87, 53, 69, 77, 56, 88, 81, 45, 82,
-        55, 90, 45, 83, 56, 89, 46, 82, 51, 86, 53, 79, 81, 60, 82, 77, 76,
-        59, 80, 49, 96, 53, 77, 77, 65, 81, 71, 70, 81, 93, 53, 89, 45, 86,
-        58, 78, 66, 76, 63, 88, 52, 93, 49, 57, 77, 68, 81, 81, 73, 50, 85,
-        74, 55, 77, 83, 83, 51, 78, 84, 46, 83, 55, 81, 57, 76, 84, 77, 81,
-        87, 77, 51, 78, 60, 82, 91, 53, 78, 46, 77, 84, 49, 83, 71, 80, 49,
-        75, 64, 76, 53, 94, 55, 76, 50, 82, 54, 75, 78, 79, 78, 78, 70, 79,
-        70, 54, 86, 50, 90, 54, 54, 77, 79, 64, 75, 47, 86, 63, 85, 82, 57,
-        82, 67, 74, 54, 83, 73, 73, 88, 80, 71, 83, 56, 79, 78, 84, 58, 83,
-        43, 60, 75, 81, 46, 90, 46, 74])
+
+
+    Hpi = np.matrix([[ 0.05163034, 0.5098923 ],
+           [0.50989228, 8.8822365 ]])
+
+
+    faithfulData = dict(
+        eruptions=[
+            3.6, 1.8, 3.333, 2.283, 4.533, 2.883, 4.7, 3.6, 1.95, 4.35, 1.833, 3.917,
+            4.2, 1.75, 4.7, 2.167, 1.75, 4.8, 1.6, 4.25, 1.8, 1.75, 3.45, 3.067, 4.533,
+            3.6, 1.967, 4.083, 3.85, 4.433, 4.3, 4.467, 3.367, 4.033, 3.833, 2.017, 1.867,
+            4.833, 1.833, 4.783, 4.35, 1.883, 4.567, 1.75, 4.533, 3.317, 3.833, 2.1, 4.633,
+            2, 4.8, 4.716, 1.833, 4.833, 1.733, 4.883, 3.717, 1.667, 4.567, 4.317, 2.233, 4.5,
+            1.75, 4.8, 1.817, 4.4, 4.167, 4.7, 2.067, 4.7, 4.033, 1.967, 4.5, 4, 1.983, 5.067,
+            2.017, 4.567, 3.883, 3.6, 4.133, 4.333, 4.1, 2.633, 4.067, 4.933, 3.95, 4.517, 2.167,
+            4, 2.2, 4.333, 1.867, 4.817, 1.833, 4.3, 4.667, 3.75, 1.867, 4.9, 2.483, 4.367, 2.1, 4.5,
+            4.05, 1.867, 4.7, 1.783, 4.85, 3.683, 4.733, 2.3, 4.9, 4.417, 1.7, 4.633, 2.317, 4.6,
+            1.817, 4.417, 2.617, 4.067, 4.25, 1.967, 4.6, 3.767, 1.917, 4.5, 2.267, 4.65, 1.867,
+            4.167, 2.8, 4.333, 1.833, 4.383, 1.883, 4.933, 2.033, 3.733, 4.233, 2.233, 4.533,
+            4.817, 4.333, 1.983, 4.633, 2.017, 5.1, 1.8, 5.033, 4, 2.4, 4.6, 3.567, 4, 4.5, 4.083,
+            1.8, 3.967, 2.2, 4.15, 2, 3.833, 3.5, 4.583, 2.367, 5, 1.933, 4.617, 1.917, 2.083,
+            4.583, 3.333, 4.167, 4.333, 4.5, 2.417, 4, 4.167, 1.883, 4.583, 4.25, 3.767, 2.033,
+            4.433, 4.083, 1.833, 4.417, 2.183, 4.8, 1.833, 4.8, 4.1, 3.966, 4.233, 3.5, 4.366,
+            2.25, 4.667, 2.1, 4.35, 4.133, 1.867, 4.6, 1.783, 4.367, 3.85, 1.933, 4.5, 2.383,
+            4.7, 1.867, 3.833, 3.417, 4.233, 2.4, 4.8, 2, 4.15, 1.867, 4.267, 1.75, 4.483, 4,
+            4.117, 4.083, 4.267, 3.917, 4.55, 4.083, 2.417, 4.183, 2.217, 4.45, 1.883, 1.85,
+            4.283, 3.95, 2.333, 4.15, 2.35, 4.933, 2.9, 4.583, 3.833, 2.083, 4.367, 2.133, 4.35,
+            2.2, 4.45, 3.567, 4.5, 4.15, 3.817, 3.917, 4.45, 2, 4.283, 4.767, 4.533, 1.85, 4.25,
+            1.983, 2.25, 4.75, 4.117, 2.15, 4.417, 1.817, 4.467],
+        waiting=[
+            79, 54, 74, 62, 85, 55, 88, 85, 51, 85, 54, 84, 78, 47, 83, 52,
+            62, 84, 52, 79, 51, 47, 78, 69, 74, 83, 55, 76, 78, 79, 73, 77,
+            66, 80, 74, 52, 48, 80, 59, 90, 80, 58, 84, 58, 73, 83, 64, 53,
+            82, 59, 75, 90, 54, 80, 54, 83, 71, 64, 77, 81, 59, 84, 48, 82,
+            60, 92, 78, 78, 65, 73, 82, 56, 79, 71, 62, 76, 60, 78, 76, 83,
+            75, 82, 70, 65, 73, 88, 76, 80, 48, 86, 60, 90, 50, 78, 63, 72,
+            84, 75, 51, 82, 62, 88, 49, 83, 81, 47, 84, 52, 86, 81, 75, 59,
+            89, 79, 59, 81, 50, 85, 59, 87, 53, 69, 77, 56, 88, 81, 45, 82,
+            55, 90, 45, 83, 56, 89, 46, 82, 51, 86, 53, 79, 81, 60, 82, 77,
+            76, 59, 80, 49, 96, 53, 77, 77, 65, 81, 71, 70, 81, 93, 53, 89,
+            45, 86, 58, 78, 66, 76, 63, 88, 52, 93, 49, 57, 77, 68, 81, 81,
+            73, 50, 85, 74, 55, 77, 83, 83, 51, 78, 84, 46, 83, 55, 81, 57,
+            76, 84, 77, 81, 87, 77, 51, 78, 60, 82, 91, 53, 78, 46, 77, 84,
+            49, 83, 71, 80, 49, 75, 64, 76, 53, 94, 55, 76, 50, 82, 54, 75,
+            78, 79, 78, 78, 70, 79, 70, 54, 86, 50, 90, 54, 54, 77, 79, 64,
+            75, 47, 86, 63, 85, 82, 57, 82, 67, 74, 54, 83, 73, 73, 88, 80,
+            71, 83, 56, 79, 78, 84, 58, 83, 43, 60, 75, 81, 46, 90, 46, 74]
+    )
diff --git a/statsmodels/sandbox/panel/correlation_structures.py b/statsmodels/sandbox/panel/correlation_structures.py
index 14539ae62..277b7674a 100644
--- a/statsmodels/sandbox/panel/correlation_structures.py
+++ b/statsmodels/sandbox/panel/correlation_structures.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Correlation and Covariance Structures

 Created on Sat Dec 17 20:46:05 2011
@@ -12,13 +13,15 @@ quick reading of some section on mixed effects models in S-plus and of
 outline for GEE.

 """
+
 import numpy as np
+
 from statsmodels.regression.linear_model import yule_walker
 from statsmodels.stats.moment_helpers import cov2corr


 def corr_equi(k_vars, rho):
-    """create equicorrelated correlation matrix with rho on off diagonal
+    '''create equicorrelated correlation matrix with rho on off diagonal

     Parameters
     ----------
@@ -32,12 +35,15 @@ def corr_equi(k_vars, rho):
     corr : ndarray (k_vars, k_vars)
         correlation matrix

-    """
-    pass
+    '''
+    corr = np.empty((k_vars, k_vars))
+    corr.fill(rho)
+    corr[np.diag_indices_from(corr)] = 1
+    return corr


 def corr_ar(k_vars, ar):
-    """create autoregressive correlation matrix
+    '''create autoregressive correlation matrix

     This might be MA, not AR, process if used for residual process - check

@@ -47,12 +53,18 @@ def corr_ar(k_vars, ar):
         AR lag-polynomial including 1 for lag 0


-    """
-    pass
+    '''
+    from scipy.linalg import toeplitz
+    if len(ar) < k_vars:
+        ar_ = np.zeros(k_vars)
+        ar_[:len(ar)] = ar
+        ar = ar_
+
+    return toeplitz(ar)


 def corr_arma(k_vars, ar, ma):
-    """create arma correlation matrix
+    '''create arma correlation matrix

     converts arma to autoregressive lag-polynomial with k_var lags

@@ -65,12 +77,18 @@ def corr_arma(k_vars, ar, ma):
     ma : array_like, 1d
         MA lag-polynomial

-    """
-    pass
+    '''
+    from scipy.linalg import toeplitz
+    from statsmodels.tsa.arima_process import arma2ar
+
+    # TODO: flesh out the comment below about a bug in arma2ar
+    ar = arma2ar(ar, ma, lags=k_vars)[:k_vars]  # bug in arma2ar
+
+    return toeplitz(ar)


 def corr2cov(corr, std):
-    """convert correlation matrix to covariance matrix
+    '''convert correlation matrix to covariance matrix

     Parameters
     ----------
@@ -80,8 +98,11 @@ def corr2cov(corr, std):
         standard deviation for the vector of random variables. If scalar, then
         it is assumed that all variables have the same scale given by std.

-    """
-    pass
+    '''
+    if np.size(std) == 1:
+        std = std*np.ones(corr.shape[0])
+    cov = corr * std[:, None] * std[None, :]  # same as outer product
+    return cov


 def whiten_ar(x, ar_coefs, order):
@@ -106,10 +127,22 @@ def whiten_ar(x, ar_coefs, order):
     x_new : ndarray
         transformed array
     """
-    pass

+    rho = ar_coefs

-def yule_walker_acov(acov, order=1, method='unbiased', df=None, inv=False):
+    x = np.array(x, np.float64)
+    _x = x.copy()
+    # TODO: dimension handling is not DRY
+    # I think previous code worked for 2d because of single index rows in np
+    if x.ndim == 2:
+        rho = rho[:, None]
+    for i in range(order):
+        _x[(i+1):] = _x[(i+1):] - rho[i] * x[0:-(i+1)]
+
+    return _x[order:]
+
+
+def yule_walker_acov(acov, order=1, method="unbiased", df=None, inv=False):
     """
     Estimate AR(p) parameters from acovf using Yule-Walker equation.

@@ -132,16 +165,17 @@ def yule_walker_acov(acov, order=1, method='unbiased', df=None, inv=False):
     Rinv : ndarray
         inverse of the Toepliz matrix
     """
-    pass
+    return yule_walker(acov, order=order, method=method, df=df, inv=inv,
+                       demean=False)


 class ARCovariance:
-    """
+    '''
     experimental class for Covariance of AR process
     classmethod? staticmethods?
-    """
+    '''

-    def __init__(self, ar=None, ar_coefs=None, sigma=1.0):
+    def __init__(self, ar=None, ar_coefs=None, sigma=1.):
         if ar is not None:
             self.ar = ar
             self.ar_coefs = -ar[1:]
@@ -150,3 +184,19 @@ class ARCovariance:
             self.arcoefs = ar_coefs
             self.ar = np.hstack(([1], -ar_coefs))
             self.k_lags = len(self.ar)
+
+    @classmethod
+    def fit(cls, cov, order, **kwds):
+        rho, sigma = yule_walker_acov(cov, order=order, **kwds)
+        return cls(ar_coefs=rho)
+
+    def whiten(self, x):
+        return whiten_ar(x, self.ar_coefs, order=self.order)
+
+    def corr(self, k_vars=None):
+        if k_vars is None:
+            k_vars = len(self.ar)   # TODO: this could move into corr_arr
+        return corr_ar(k_vars, self.ar)
+
+    def cov(self, k_vars=None):
+        return cov2corr(self.corr(k_vars=None), self.sigma)
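
The correlation-structure helpers implemented above can be used as below; note that std should be a scalar or an ndarray (not a plain list) for corr2cov. Illustrative sketch only:

    import numpy as np
    from statsmodels.sandbox.panel.correlation_structures import (
        corr_equi, corr_ar, corr2cov)

    corr1 = corr_equi(4, rho=0.3)              # 1 on the diagonal, 0.3 elsewhere
    corr2 = corr_ar(4, ar=[1, -0.8])           # Toeplitz matrix built from the lag polynomial
    cov = corr2cov(corr1, std=np.array([1.0, 2.0, 0.5, 1.5]))
    print(np.diag(cov))                        # variances: 1.0, 4.0, 0.25, 2.25
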
diff --git a/statsmodels/sandbox/panel/mixed.py b/statsmodels/sandbox/panel/mixed.py
index 0142276f4..22dfe3e2d 100644
--- a/statsmodels/sandbox/panel/mixed.py
+++ b/statsmodels/sandbox/panel/mixed.py
@@ -19,18 +19,18 @@ example.
 """
 import numpy as np
 import numpy.linalg as L
+
 from statsmodels.base.model import LikelihoodModelResults
 from statsmodels.tools.decorators import cache_readonly

-
 class Unit:
     """
     Individual experimental unit for
     EM implementation of (repeated measures)
     mixed effects model.

-    'Maximum Likelihood Computations with Repeated Measures:
-    Application of the EM Algorithm'
+    \'Maximum Likelihood Computations with Repeated Measures:
+    Application of the EM Algorithm\'

     Nan Laird; Nicholas Lange; Daniel Stram

@@ -57,7 +57,9 @@ class Unit:
     centered in this case. (That's how it looks to me. JP)
     """

+
     def __init__(self, endog, exog_fe, exog_re):
+
         self.Y = endog
         self.X = exog_fe
         self.Z = exog_re
@@ -67,13 +69,14 @@ class Unit:
         """covariance of observations (nobs_i, nobs_i)  (JP check)
         Display (3.3) from Laird, Lange, Stram (see help(Unit))
         """
-        pass
+        self.S = (np.identity(self.n) * sigma**2 +
+                  np.dot(self.Z, np.dot(D, self.Z.T)))

     def _compute_W(self):
         """inverse covariance of observations (nobs_i, nobs_i)  (JP check)
         Display (3.2) from Laird, Lange, Stram (see help(Unit))
         """
-        pass
+        self.W = L.inv(self.S)

     def compute_P(self, Sinv):
         """projection matrix (nobs_i, nobs_i) (M in regression ?)  (JP check, guessing)
@@ -81,14 +84,15 @@ class Unit:

         W - W X Sinv X' W'
         """
-        pass
+        t = np.dot(self.W, self.X)
+        self.P = self.W - np.dot(np.dot(t, Sinv), t.T)

     def _compute_r(self, alpha):
         """residual after removing fixed effects

         Display (3.5) from Laird, Lange, Stram (see help(Unit))
         """
-        pass
+        self.r = self.Y - np.dot(self.X, alpha)

     def _compute_b(self, D):
         """coefficients for random effects/coefficients
@@ -96,7 +100,7 @@ class Unit:

         D Z' W r
         """
-        pass
+        self.b = np.dot(D, np.dot(np.dot(self.Z.T, self.W), self.r))

     def fit(self, a, D, sigma):
         """
@@ -105,19 +109,23 @@ class Unit:

         Displays (3.2)-(3.5).
         """
-        pass
+
+        self._compute_S(D, sigma)    #random effect plus error covariance
+        self._compute_W()            #inv(S)
+        self._compute_r(a)           #residual after removing fixed effects/exogs
+        self._compute_b(D)           #?  coefficients on random exog, Z ?

     def compute_xtwy(self):
         """
         Utility function to compute X^tWY (transposed ?) for Unit instance.
         """
-        pass
+        return np.dot(np.dot(self.W, self.Y), self.X) #is this transposed ?

     def compute_xtwx(self):
         """
         Utility function to compute X^tWX for Unit instance.
         """
-        pass
+        return np.dot(np.dot(self.X.T, self.W), self.X)

     def cov_random(self, D, Sinv=None):
         """
@@ -131,7 +139,10 @@ class Unit:
         In example where the mean of the random coefficient is not zero, this
         is not a covariance but a non-centered moment. (proof by example)
         """
-        pass
+        if Sinv is not None:
+            self.compute_P(Sinv)
+        t = np.dot(self.Z, D)
+        return D - np.dot(np.dot(t.T, self.P), t)

     def logL(self, a, ML=False):
         """
@@ -145,13 +156,20 @@ class Unit:
         If ML is false, then the residuals are calculated for the given fixed
         effects parameters a.
         """
-        pass
+
+        if ML:
+            return (np.log(L.det(self.W)) - (self.r * np.dot(self.W, self.r)).sum()) / 2.
+        else:
+            if a is None:
+                raise ValueError('need fixed effect a for REML contribution to log-likelihood')
+            r = self.Y - np.dot(self.X, a)
+            return (np.log(L.det(self.W)) - (r * np.dot(self.W, r)).sum()) / 2.

     def deviance(self, ML=False):
-        """deviance defined as 2 times the negative loglikelihood
+        '''deviance defined as 2 times the negative loglikelihood

-        """
-        pass
+        '''
+        return - 2 * self.logL(ML=ML)


 class OneWayMixed:
@@ -160,8 +178,8 @@ class OneWayMixed:
     EM implementation of (repeated measures)
     mixed effects model.

-    'Maximum Likelihood Computations with Repeated Measures:
-    Application of the EM Algorithm'
+    \'Maximum Likelihood Computations with Repeated Measures:
+    Application of the EM Algorithm\'

     Nan Laird; Nicholas Lange; Daniel Stram

@@ -229,18 +247,25 @@ class OneWayMixed:
         self.units = units
         self.m = len(self.units)
         self.n_units = self.m
+
         self.N = sum(unit.X.shape[0] for unit in self.units)
-        self.nobs = self.N
+        self.nobs = self.N     #alias for now
+
+        # Determine size of fixed effects
         d = self.units[0].X
-        self.p = d.shape[1]
-        self.k_exog_fe = self.p
+        self.p = d.shape[1]  # d.shape = p
+        self.k_exog_fe = self.p   #alias for now
         self.a = np.zeros(self.p, np.float64)
+
+        # Determine size of D, and sensible initial estimates
+        # of sigma and D
         d = self.units[0].Z
-        self.q = d.shape[1]
-        self.k_exog_re = self.q
-        self.D = np.zeros((self.q,) * 2, np.float64)
-        self.sigma = 1.0
-        self.dev = np.inf
+        self.q = d.shape[1]  # Z.shape = q
+        self.k_exog_re = self.q   #alias for now
+        self.D = np.zeros((self.q,)*2, np.float64)
+        self.sigma = 1.
+
+        self.dev = np.inf   #initialize for iterations, move it?

     def _compute_a(self):
         """fixed effects parameters
@@ -248,7 +273,15 @@ class OneWayMixed:
         Display (3.1) of
         Laird, Lange, Stram (see help(Mixed)).
         """
-        pass
+
+        for unit in self.units:
+            unit.fit(self.a, self.D, self.sigma)
+
+        S = sum([unit.compute_xtwx() for unit in self.units])
+        Y = sum([unit.compute_xtwy() for unit in self.units])
+
+        self.Sinv = L.pinv(S)
+        self.a = np.dot(self.Sinv, Y)

     def _compute_sigma(self, ML=False):
         """
@@ -260,7 +293,18 @@ class OneWayMixed:

         sigma is the standard deviation of the noise (residual)
         """
-        pass
+        sigmasq = 0.
+        for unit in self.units:
+            if ML:
+                W = unit.W
+            else:
+                unit.compute_P(self.Sinv)
+                W = unit.P
+            t = unit.r - np.dot(unit.Z, unit.b)
+            sigmasq += np.power(t, 2).sum()
+            sigmasq += self.sigma**2 * np.trace(np.identity(unit.n) -
+                                               self.sigma**2 * W)
+        self.sigma = np.sqrt(sigmasq / self.N)

     def _compute_D(self, ML=False):
         """
@@ -271,7 +315,18 @@ class OneWayMixed:
         If ML, this is (3.7) in Laird, Lange, Stram (see help(Mixed)),
         otherwise it corresponds to (3.9).
         """
-        pass
+        D = 0.
+        for unit in self.units:
+            if ML:
+                W = unit.W
+            else:
+                unit.compute_P(self.Sinv)
+                W = unit.P
+            D += np.multiply.outer(unit.b, unit.b)
+            t = np.dot(unit.Z, self.D)
+            D += self.D - np.dot(np.dot(t.T, W), t)
+
+        self.D = D / self.m

     def cov_fixed(self):
         """
@@ -279,7 +334,9 @@ class OneWayMixed:

         Just after Display (3.10) in Laird, Lange, Stram (see help(Mixed)).
         """
-        pass
+        return self.Sinv
+
+    #----------- alias (JP)   move to results class ?

     def cov_random(self):
         """
@@ -289,71 +346,186 @@ class OneWayMixed:

         see _compute_D, alias for self.D
         """
-        pass
+        return self.D

     @property
     def params(self):
-        """
+        '''
         estimated coefficients for exogenous variables or fixed effects

         see _compute_a, alias for self.a
-        """
-        pass
+        '''
+        return self.a

     @property
     def params_random_units(self):
-        """random coefficients for each unit
+        '''random coefficients for each unit

-        """
-        pass
+        '''
+        return np.array([unit.b for unit in self.units])

     def cov_params(self):
-        """
+        '''
         estimated covariance for coefficients for exogenous variables or fixed effects

         see cov_fixed, and Sinv in _compute_a
-        """
-        pass
+        '''
+        return self.cov_fixed()
+

     @property
     def bse(self):
-        """
+        '''
         standard errors of estimated coefficients for exogenous variables (fixed)

-        """
-        pass
+        '''
+        return np.sqrt(np.diag(self.cov_params()))
+
+    #----------- end alias

     def deviance(self, ML=False):
-        """deviance defined as 2 times the negative loglikelihood
+        '''deviance defined as 2 times the negative loglikelihood
+
+        '''
+        return -2 * self.logL(ML=ML)

-        """
-        pass

     def logL(self, ML=False):
         """
         Return log-likelihood, REML by default.
         """
-        pass
-
-    def cont(self, ML=False, rtol=1e-05, params_rtol=1e-05, params_atol=0.0001
-        ):
-        """convergence check for iterative estimation
-
-        """
-        pass
+        #I do not know what the difference between REML and ML is here.
+        logL = 0.
+
+        for unit in self.units:
+            logL += unit.logL(a=self.a, ML=ML)
+        if not ML:
+            logL += np.log(L.det(self.Sinv)) / 2
+        return logL
+
+    def initialize(self):
+        S = sum([np.dot(unit.X.T, unit.X) for unit in self.units])
+        Y = sum([np.dot(unit.X.T, unit.Y) for unit in self.units])
+        self.a = L.lstsq(S, Y, rcond=-1)[0]
+
+        D = 0
+        t = 0
+        sigmasq = 0
+        for unit in self.units:
+            unit.r = unit.Y - np.dot(unit.X, self.a)
+            if self.q > 1:
+                unit.b = L.lstsq(unit.Z, unit.r, rcond=-1)[0]
+            else:
+                Z = unit.Z.reshape((unit.Z.shape[0], 1))
+                unit.b = L.lstsq(Z, unit.r, rcond=-1)[0]
+
+            sigmasq += (np.power(unit.Y, 2).sum() -
+                        (self.a * np.dot(unit.X.T, unit.Y)).sum() -
+                        (unit.b * np.dot(unit.Z.T, unit.r)).sum())
+            D += np.multiply.outer(unit.b, unit.b)
+            t += L.pinv(np.dot(unit.Z.T, unit.Z))
+
+        #TODO: JP added df_resid check
+        self.df_resid = (self.N - (self.m - 1) * self.q - self.p)
+        sigmasq /= (self.N - (self.m - 1) * self.q - self.p)
+        self.sigma = np.sqrt(sigmasq)
+        self.D = (D - sigmasq * t) / self.m
+
+    def cont(self, ML=False, rtol=1.0e-05, params_rtol=1e-5, params_atol=1e-4):
+        '''convergence check for iterative estimation
+
+        '''
+
+        self.dev, old = self.deviance(ML=ML), self.dev
+
+        #self.history.append(np.hstack((self.dev, self.a)))
+        self.history['llf'].append(self.dev)
+        self.history['params'].append(self.a.copy())
+        self.history['D'].append(self.D.copy())
+
+        if np.fabs((self.dev - old) / self.dev) < rtol:   #why is there times `*`?
+            #print np.fabs((self.dev - old)), self.dev, old
+            self.termination = 'llf'
+            return False
+
+        #break if parameters converged
+        #TODO: check termination conditions, OR or AND
+        if np.all(np.abs(self.a - self._a_old) < (params_rtol * self.a + params_atol)):
+            self.termination = 'params'
+            return False
+
+        self._a_old =  self.a.copy()
+        return True
+
+    def fit(self, maxiter=100, ML=False, rtol=1.0e-05, params_rtol=1e-6, params_atol=1e-6):
+
+        #initialize for convergence criteria
+        self._a_old = np.inf * self.a
+        self.history = {'llf':[], 'params':[], 'D':[]}
+
+        for i in range(maxiter):
+            self._compute_a()              #a, Sinv :  params, cov_params of fixed exog
+            self._compute_sigma(ML=ML)     #sigma   MLE or REML of sigma ?
+            self._compute_D(ML=ML)         #D :  covariance of random effects, MLE or REML
+            if not self.cont(ML=ML, rtol=rtol, params_rtol=params_rtol,
+                                             params_atol=params_atol):
+                break
+        else: #if end of loop is reached without break
+            self.termination = 'maxiter'
+            print('Warning: maximum number of iterations reached')
+
+        self.iterations = i
+
+        results = OneWayMixedResults(self)
+        #compatibility functions for fixed effects/exog
+        results.scale = 1
+        results.normalized_cov_params = self.cov_params()
+
+        return results


 class OneWayMixedResults(LikelihoodModelResults):
-    """Results class for OneWayMixed models
-
-    """
+    '''Results class for OneWayMixed models

+    '''
     def __init__(self, model):
+        #TODO: check, change initialization to more standard pattern
         self.model = model
         self.params = model.params

+
+    #need to overwrite this because we do not have a standard
+    #model.loglike yet
+    #TODO: what todo about REML loglike, logL is not normalized
+    @cache_readonly
+    def llf(self):
+        return self.model.logL(ML=True)
+
+    @property
+    def params_random_units(self):
+        return self.model.params_random_units
+
+    def cov_random(self):
+        return self.model.cov_random()
+
+    def mean_random(self, idx='lastexog'):
+        if idx == 'lastexog':
+            meanr = self.params[-self.model.k_exog_re:]
+        elif isinstance(idx, list):
+            if not len(idx) == self.model.k_exog_re:
+                raise ValueError('length of idx different from k_exog_re')
+            else:
+                meanr = self.params[idx]
+        else:
+            meanr = np.zeros(self.model.k_exog_re)
+
+        return meanr
+
+    def std_random(self):
+        return np.sqrt(np.diag(self.cov_random()))
+
     def plot_random_univariate(self, bins=None, use_loc=True):
-        """create plot of marginal distribution of random effects
+        '''create plot of marginal distribution of random effects

         Parameters
         ----------
@@ -377,11 +549,47 @@ class OneWayMixedResults(LikelihoodModelResults):
         Bin edges will not make sense if loc or scale differ across random
         effect distributions.

-        """
-        pass
+        '''
+        #outsource this
+        import matplotlib.pyplot as plt
+        from scipy.stats import norm as normal
+        fig = plt.figure()
+        k = self.model.k_exog_re
+        if k > 3:
+            rows, cols = int(np.ceil(k * 0.5)), 2
+        else:
+            rows, cols = k, 1
+        if bins is None:
+            #bins = self.model.n_units // 20    #TODO: just roughly, check
+            #bins = np.sqrt(self.model.n_units)
+            bins = int(5 + 2 * self.model.n_units**(1./3.))  # hist needs an integer bin count
+
+        if use_loc:
+            loc = self.mean_random()
+        else:
+            loc = [0]*k
+
+        scale = self.std_random()
+
+        for ii in range(k):
+            ax = fig.add_subplot(rows, cols, ii + 1)  # subplot indices are 1-based
+
+            freq, bins_, _ = ax.hist(loc[ii] + self.params_random_units[:,ii],
+                                    bins=bins, density=True)
+            points = np.linspace(bins_[0], bins_[-1], 200)
+
+            #ax.plot(points, normal.pdf(points, loc=loc, scale=scale))
+            #loc of sample is approx. zero, with Z appended to X
+            #alternative, add fixed  to mean
+            ax.set_title('Random Effect %d Marginal Distribution' % ii)
+            ax.plot(points,
+                    normal.pdf(points, loc=loc[ii], scale=scale[ii]),
+                    'r')
+
+        return fig

     def plot_scatter_pairs(self, idx1, idx2, title=None, ax=None):
-        """create scatter plot of two random effects
+        '''create scatter plot of two random effects

         Parameters
         ----------
@@ -404,5 +612,48 @@ class OneWayMixedResults(LikelihoodModelResults):
         -----
         Still needs ellipse from estimated parameters

-        """
-        pass
+        '''
+        import matplotlib.pyplot as plt
+        if ax is None:
+            fig = plt.figure()
+            ax = fig.add_subplot(1, 1, 1)
+            ax_or_fig = fig
+        else:
+            ax_or_fig = ax
+
+        re1 = self.params_random_units[:,idx1]
+        re2 = self.params_random_units[:,idx2]
+        ax.plot(re1, re2, 'o', alpha=0.75)
+        if title is None:
+            title = 'Random Effects %d and %d' % (idx1, idx2)
+        ax.set_title(title)
+
+        return ax_or_fig
+
+    def plot_scatter_all_pairs(self, title=None):
+        from statsmodels.graphics.plot_grids import scatter_ellipse
+        if self.model.k_exog_re < 2:
+            raise ValueError('less than two variables available')
+
+        return scatter_ellipse(self.params_random_units,
+                               #ell_kwds not implemented yet
+                               ell_kwds={'color':'r'})
+
+#        #note I have written this already as helper function, get it
+#        import matplotlib.pyplot as plt
+#        #from scipy.stats import norm as normal
+#        fig = plt.figure()
+#        k = self.model.k_exog_re
+#        n_plots = k * (k - 1) // 2
+#        if n_plots > 3:
+#            rows, cols = int(np.ceil(n_plots * 0.5)), 2
+#        else:
+#            rows, cols = n_plots, 1
+#
+#        count = 1
+#        for ii in range(k):
+#            for jj in range(ii):
+#                ax = fig.add_subplot(rows, cols, count)
+#                self.plot_scatter_pairs(ii, jj, title=None, ax=ax)
+#                count += 1
+#
+#        return fig
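
Usage sketch for the restored estimator (simulated random-intercept data; this assumes the module also provides the sandbox `Unit` class with signature `Unit(endog, exog_fe, exog_re)` under statsmodels.sandbox.panel.mixed, as in the sandbox examples, so names and shapes here are illustrative only):

    import numpy as np
    from statsmodels.sandbox.panel.mixed import OneWayMixed, Unit

    rng = np.random.RandomState(12345)
    n_units, nobs_i = 50, 10
    units = []
    for _ in range(n_units):
        exog_fe = np.column_stack((np.ones(nobs_i), rng.randn(nobs_i)))
        exog_re = np.ones((nobs_i, 1))               # random intercept only
        b_i = 0.5 * rng.randn()                      # unit-specific effect
        endog = exog_fe.dot([1.0, 2.0]) + b_i + 0.1 * rng.randn(nobs_i)
        units.append(Unit(endog, exog_fe, exog_re))

    mod = OneWayMixed(units)
    mod.initialize()                 # starting values from per-unit least squares
    res = mod.fit(maxiter=100)       # EM iterations, returns OneWayMixedResults
    print(res.params)                # fixed-effects estimates
    print(res.cov_random())          # estimated random-effects covariance
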
diff --git a/statsmodels/sandbox/panel/panel_short.py b/statsmodels/sandbox/panel/panel_short.py
index 9b0a9804c..951382835 100644
--- a/statsmodels/sandbox/panel/panel_short.py
+++ b/statsmodels/sandbox/panel/panel_short.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Panel data analysis for short T and large N

 Created on Sat Dec 17 19:32:00 2011
@@ -20,73 +21,159 @@ implementations possible (sparse, kroneker, ...)
 the only two group specific methods or get_within_cov and whiten

 """
+
 import numpy as np
 from statsmodels.regression.linear_model import OLS, GLS
 from statsmodels.tools.grouputils import GroupSorted


 def sum_outer_product_loop(x, group_iter):
-    """sum outerproduct dot(x_i, x_i.T) over individuals
+    '''sum outer product dot(x_i, x_i.T) over individuals

     loop version

-    """
-    pass
+    '''
+
+    mom = 0
+    for g in group_iter():
+        x_g = x[g]
+        #print 'x_g.shape', x_g.shape
+        mom += np.outer(x_g, x_g)

+    return mom

 def sum_outer_product_balanced(x, n_groups):
-    """sum outerproduct dot(x_i, x_i.T) over individuals
+    '''sum outer product dot(x_i, x_i.T) over individuals

     where x_i is (nobs_i, 1), and result is (nobs_i, nobs_i)

     reshape-dot version, for x.ndim=1 only

-    """
-    pass
+    '''
+    xrs = x.reshape(-1, n_groups, order='F')
+    return np.dot(xrs, xrs.T)  #should be (nobs_i, nobs_i)
+
+    #x.reshape(n_groups, nobs_i,  k_vars) #, order='F')
+    #... ? this is getting 3-dimensional  dot, tensordot?
+    #needs (n_groups, k_vars, k_vars) array with sum over groups
+    #NOT
+    #I only need this for x is 1d, i.e. residual


 def whiten_individuals_loop(x, transform, group_iter):
-    """apply linear transform for each individual
+    '''apply linear transform for each individual

     loop version
-    """
-    pass
+    '''
+
+    #Note: figure out dimension of transformed variable
+    #so we can pre-allocate
+    x_new = []
+    for g in group_iter():
+        x_g = x[g]
+        x_new.append(np.dot(transform, x_g))
+
+    return np.concatenate(x_new) #np.vstack(x_new)  #or np.array(x_new) #check shape
+


 class ShortPanelGLS2:
-    """Short Panel with general intertemporal within correlation
+    '''Short Panel with general intertemporal within correlation

     assumes data is stacked by individuals, panel is balanced and
     within correlation structure is identical across individuals.

     It looks like this can just inherit GLS and overwrite whiten
-    """
+    '''

     def __init__(self, endog, exog, group):
         self.endog = endog
         self.exog = exog
         self.group = GroupSorted(group)
         self.n_groups = self.group.n_groups
+        #self.nobs_group =   #list for unbalanced?

+    def fit_ols(self):
+        self.res_pooled = OLS(self.endog, self.exog).fit()
+        return self.res_pooled  #return or not
+
+    def get_within_cov(self, resid):
+        #central moment or not?
+        mom = sum_outer_product_loop(resid, self.group.group_iter)
+        return mom / self.n_groups   #df correction ?
+
+    def whiten_groups(self, x, cholsigmainv_i):
+        #from scipy import sparse #use sparse
+        wx = whiten_individuals_loop(x, cholsigmainv_i, self.group.group_iter)
+        return wx
+
+    def fit(self):
+        res_pooled = self.fit_ols() #get starting estimate
+        sigma_i = self.get_within_cov(res_pooled.resid)
+        self.cholsigmainv_i = np.linalg.cholesky(np.linalg.pinv(sigma_i)).T
+        wendog = self.whiten_groups(self.endog, self.cholsigmainv_i)
+        wexog = self.whiten_groups(self.exog, self.cholsigmainv_i)
+        #print wendog.shape, wexog.shape
+        self.res1 = OLS(wendog, wexog).fit()
+        return self.res1

 class ShortPanelGLS(GLS):
-    """Short Panel with general intertemporal within correlation
+    '''Short Panel with general intertemporal within correlation

     assumes data is stacked by individuals, panel is balanced and
     within correlation structure is identical across individuals.

     It looks like this can just inherit GLS and overwrite whiten
-    """
+    '''

     def __init__(self, endog, exog, group, sigma_i=None):
         self.group = GroupSorted(group)
         self.n_groups = self.group.n_groups
-        nobs_i = len(endog) / self.n_groups
+        #self.nobs_group =   #list for unbalanced?
+        nobs_i = len(endog) / self.n_groups #endog might later not be an ndarray
+        #balanced only for now,
+        #which is a requirement anyway in this case (full cov)
+        #needs to change for parametrized sigma_i
+
+        #
         if sigma_i is None:
             sigma_i = np.eye(int(nobs_i))
         self.cholsigmainv_i = np.linalg.cholesky(np.linalg.pinv(sigma_i)).T
+
+        #super is taking care of endog, exog and sigma
         super(self.__class__, self).__init__(endog, exog, sigma=None)

+    def get_within_cov(self, resid):
+        #central moment or not?
+        mom = sum_outer_product_loop(resid, self.group.group_iter)
+        return mom / self.n_groups   #df correction ?
+
+    def whiten_groups(self, x, cholsigmainv_i):
+        #from scipy import sparse #use sparse
+        wx = whiten_individuals_loop(x, cholsigmainv_i, self.group.group_iter)
+        return wx
+
+    def _fit_ols(self):
+        #used as starting estimate in the old explicit version
+        self.res_pooled = OLS(self.endog, self.exog).fit()
+        return self.res_pooled  #return or not
+
+    def _fit_old(self):
+        #old explicit version
+        res_pooled = self._fit_ols() #get starting estimate
+        sigma_i = self.get_within_cov(res_pooled.resid)
+        self.cholsigmainv_i = np.linalg.cholesky(np.linalg.pinv(sigma_i)).T
+        wendog = self.whiten_groups(self.endog, self.cholsigmainv_i)
+        wexog = self.whiten_groups(self.exog, self.cholsigmainv_i)
+        self.res1 = OLS(wendog, wexog).fit()
+        return self.res1
+
+    def whiten(self, x):
+        #whiten x by groups, will be applied to endog and exog
+        wx = whiten_individuals_loop(x, self.cholsigmainv_i, self.group.group_iter)
+        return wx
+
+    #copied from GLSHet and adjusted (boiler plate?)
     def fit_iterative(self, maxiter=3):
         """
         Perform an iterative two-step procedure to estimate the GLS model.
@@ -110,4 +197,38 @@ class ShortPanelGLS(GLS):
         calculation. Calling fit_iterative(maxiter) once does not do any
         redundant recalculations (whitening or calculating pinv_wexog).
         """
-        pass
+        #Note: in contrast to GLSHet, we do not have an auxiliary regression here
+        #      might be needed if there is more structure in cov_i
+
+        #because we only have the loop we are not attaching the ols_pooled
+        #initial estimate anymore compared to original version
+
+        if maxiter < 1:
+            raise ValueError('maxiter needs to be at least 1')
+
+        import collections
+        self.history = collections.defaultdict(list) #not really necessary
+
+        for i in range(maxiter):
+            #pinv_wexog is cached, delete it to force recalculation
+            if hasattr(self, 'pinv_wexog'):
+                del self.pinv_wexog
+
+            #fit with current cov, GLS, i.e. OLS on whitened endog, exog
+            results = self.fit()
+            self.history['self_params'].append(results.params)
+
+            if not i == maxiter-1:  #skip for last iteration, could break instead
+                #print 'ols',
+                self.results_old = results #store previous results for debugging
+
+                #get cov from residuals of previous regression
+                sigma_i = self.get_within_cov(results.resid)
+                self.cholsigmainv_i = np.linalg.cholesky(np.linalg.pinv(sigma_i)).T
+
+                #calculate new whitened endog and exog
+                self.initialize()
+
+        #note results is the wrapper, results._results is the results instance
+        #results._results.results_residual_regression = res_resid
+        return results
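
A matching sketch for the whitened-GLS panel estimator (simulated balanced panel, data stacked by individual; the within-individual correlation below is induced ad hoc via cumulative sums and is purely illustrative):

    import numpy as np
    from statsmodels.sandbox.panel.panel_short import ShortPanelGLS

    rng = np.random.RandomState(987125)
    n_groups, nobs_i, k_vars = 50, 5, 3
    nobs = n_groups * nobs_i
    group = np.repeat(np.arange(n_groups), nobs_i)    # sorted by individual

    x = np.column_stack((np.ones(nobs), rng.randn(nobs, k_vars - 1)))
    beta = np.ones(k_vars)
    e = 0.5 * np.cumsum(rng.randn(n_groups, nobs_i), axis=1).ravel()
    y = x.dot(beta) + e

    mod = ShortPanelGLS(y, x, group)
    res = mod.fit_iterative(maxiter=2)   # OLS start, then GLS with estimated within-cov
    print(res.params)
    print(mod.cholsigmainv_i.shape)      # (nobs_i, nobs_i) whitening matrix
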
diff --git a/statsmodels/sandbox/panel/panelmod.py b/statsmodels/sandbox/panel/panelmod.py
index 4a4c567ee..733525952 100644
--- a/statsmodels/sandbox/panel/panelmod.py
+++ b/statsmodels/sandbox/panel/panelmod.py
@@ -7,9 +7,13 @@ References
 Baltagi, Badi H. `Econometric Analysis of Panel Data.` 4th ed. Wiley, 2008.
 """
 from functools import reduce
+
 import numpy as np
+
 from statsmodels.regression.linear_model import GLS
-__all__ = ['PanelModel']
+
+__all__ = ["PanelModel"]
+
 from pandas import Panel


@@ -24,11 +28,16 @@ def group(X):
     >>> g
     array([ 0.,  0.,  1.,  2.,  1.,  2.])
     """
-    pass
-
+    uniq_dict = {}
+    group = np.zeros(len(X))
+    for i in range(len(X)):
+        if not X[i] in uniq_dict:
+            uniq_dict.update({X[i] : len(uniq_dict)})
+        group[i] = uniq_dict[X[i]]
+    return group

 def repanel_cov(groups, sigmas):
-    """calculate error covariance matrix for random effects model
+    '''calculate error covariance matrix for random effects model

     Parameters
     ----------
@@ -53,14 +62,27 @@ def repanel_cov(groups, sigmas):
     -----
     This does not use sparse matrices and constructs nobs by nobs
     matrices. Also, omegainvsqrt is not sparse, i.e. elements are non-zero
-    """
-    pass
+    '''
+
+    if groups.ndim == 1:
+        groups = groups[:,None]
+    nobs, nre = groups.shape
+    omega = sigmas[-1]*np.eye(nobs)
+    for igr in range(nre):
+        group = groups[:,igr:igr+1]
+        groupuniq = np.unique(group)
+        dummygr = sigmas[igr] * (group == groupuniq).astype(float)
+        omega +=  np.dot(dummygr, dummygr.T)
+    ev, evec = np.linalg.eigh(omega)  #eig does not work
+    omegainv = np.dot(evec, (1/ev * evec).T)
+    omegainvhalf = evec/np.sqrt(ev)
+    return omega, omegainv, omegainvhalf
+


 class PanelData(Panel):
     pass

-
 class PanelModel:
     """
     An abstract statistical model class for panel (longitudinal) datasets.
@@ -80,11 +102,22 @@ class PanelModel:
     If a pandas object is supplied it is assumed that the major_axis is time
     and that the minor_axis has the panel variable.
     """
-
     def __init__(self, endog=None, exog=None, panel=None, time=None,
-        xtnames=None, equation=None, panel_data=None):
+            xtnames=None, equation=None, panel_data=None):
         if panel_data is None:
+#            if endog == None and exog == None and panel == None and \
+#                    time == None:
+#                raise ValueError("If pandel_data is False then endog, exog, \
+#panel_arr, and time_arr cannot be None.")
             self.initialize(endog, exog, panel, time, xtnames, equation)
+#        elif aspandas != False:
+#            if not isinstance(endog, str):
+#                raise ValueError("If a pandas object is supplied then endog \
+#must be a string containing the name of the endogenous variable")
+#            if not isinstance(aspandas, Panel):
+#                raise ValueError("Only pandas.Panel objects are supported")
+#            self.initialize_pandas(endog, aspandas, panel_name)
+

     def initialize(self, endog, exog, panel, time, xtnames, equation):
         """
@@ -92,16 +125,110 @@ class PanelModel:

         See PanelModel
         """
-        pass
-
+#TODO: for now, we are going assume a constant, and then make the first
+#panel the base, add a flag for this....
+
+        # get names
+        names = equation.split(" ")
+        self.endog_name = names[0]
+        exog_names = names[1:]  # this makes the order matter in the array
+        self.panel_name = xtnames[0]
+        self.time_name = xtnames[1]
+
+
+        novar = exog.var(0) == 0
+        if True in novar:
+            cons_index = np.where(novar == 1)[0][0] # constant col. num
+            exog_names.insert(cons_index, 'cons')
+
+        self._cons_index = novar # used again in fit_fixed
+        self.exog_names = exog_names
+        self.endog = np.squeeze(np.asarray(endog))
+        exog = np.asarray(exog)
+        self.exog = exog
+        self.panel = np.asarray(panel)
+        self.time = np.asarray(time)
+
+        self.paneluniq = np.unique(panel)
+        self.timeuniq = np.unique(time)
+#TODO: this  structure can possibly be extracted somewhat to deal with
+#names in general
+
+#TODO: add some dimension checks, etc.
+
+#    def initialize_pandas(self, endog, aspandas):
+#        """
+#        Initialize pandas objects.
+#
+#        See PanelModel.
+#        """
+#        self.aspandas = aspandas
+#        endog = aspandas[endog].values
+#        self.endog = np.squeeze(endog)
+#        exog_name = aspandas.columns.tolist()
+#        exog_name.remove(endog)
+#        self.exog = aspandas.filterItems(exog_name).values
+#TODO: can the above be simplified to slice notation?
+#        if panel_name != None:
+#            self.panel_name = panel_name
+#        self.exog_name = exog_name
+#        self.endog_name = endog
+#        self.time_arr = aspandas.major_axis
+        #TODO: is time always handled correctly in fromRecords?
+#        self.panel_arr = aspandas.minor_axis
+#TODO: all of this might need to be refactored to explicitly rely (internally)
+# on the pandas LongPanel structure for speed and convenience.
+# not sure this part is finished...
+
+#TODO: does not conform to new initialize
+    def initialize_pandas(self, panel_data, endog_name, exog_name):
+        self.panel_data = panel_data
+        endog = panel_data[endog_name].values # does this create a copy?
+        self.endog = np.squeeze(endog)
+        if exog_name is None:
+            exog_name = panel_data.columns.tolist()
+            exog_name.remove(endog_name)
+        self.exog = panel_data.filterItems(exog_name).values # copy?
+        self._exog_name = exog_name
+        self._endog_name = endog_name
+        self._timeseries = panel_data.major_axis # might not need these
+        self._panelseries = panel_data.minor_axis
+
+#TODO: this could be pulled out and just have a by kwd that takes
+# the panel or time array
+#TODO: this also needs to be expanded for 'twoway'
     def _group_mean(self, X, index='oneway', counts=False, dummies=False):
         """
         Get group means of X by time or by panel.

         index default is panel
         """
-        pass
-
+        if index == 'oneway':
+            Y = self.panel
+            uniq = self.paneluniq
+        elif index == 'time':
+            Y = self.time
+            uniq = self.timeuniq
+        else:
+            raise ValueError("index %s not understood" % index)
+        print(Y, uniq, uniq[:,None], len(Y), len(uniq), len(uniq[:,None]),
+              index)
+        #TODO: use sparse matrices
+        dummy = (Y == uniq[:,None]).astype(float)
+        if X.ndim > 1:
+            mean = np.dot(dummy,X)/dummy.sum(1)[:,None]
+        else:
+            mean = np.dot(dummy,X)/dummy.sum(1)
+        if counts is False and dummies is False:
+            return mean
+        elif counts is True and dummies is False:
+            return mean, dummy.sum(1)
+        elif counts is True and dummies is True:
+            return mean, dummy.sum(1), dummy
+        elif counts is False and dummies is True:
+            return mean, dummy
+
+#TODO: Use kwd arguments or have fit_method methods?
     def fit(self, model=None, method=None, effects='oneway'):
         """
         method : LSDV, demeaned, MLE, GLS, BE, FE, optional
@@ -129,74 +256,186 @@ class PanelModel:
         This is unfinished.  None of the method arguments work yet.
         Only oneway effects should work.
         """
-        pass
+        if method: # get rid of this with default
+            method = method.lower()
+        model = model.lower()
+        if method and method not in ["lsdv", "demeaned", "mle",
+                                     "gls", "be", "fe"]:
+            # get rid of if method with default
+            raise ValueError("%s not a valid method" % method)
+#        if method == "lsdv":
+#            self.fit_lsdv(model)
+        if model == 'pooled':
+            return GLS(self.endog, self.exog).fit()
+        if model == 'between':
+            return self._fit_btwn(method, effects)
+        if model == 'fixed':
+            return self._fit_fixed(method, effects)
+
+#    def fit_lsdv(self, effects):
+#        """
+#        Fit using least squares dummy variables.
+#
+#        Notes
+#        -----
+#        Should only be used for small `nobs`.
+#        """
+#        pdummies = None
+#        tdummies = None
+
+    def _fit_btwn(self, method, effects):
+        # group mean regression or WLS
+        if effects != "twoway":
+            endog = self._group_mean(self.endog, index=effects)
+            exog = self._group_mean(self.exog, index=effects)
+        else:
+            raise ValueError("%s effects is not valid for the between "
+                             "estimator" % effects)
+        befit = GLS(endog, exog).fit()
+        return befit
+
+    def _fit_fixed(self, method, effects):
+        endog = self.endog
+        exog = self.exog
+        demeantwice = False
+        if effects in ["oneway","twoways"]:
+            if effects == "twoways":
+                demeantwice = True
+                effects = "oneway"
+            endog_mean, counts = self._group_mean(endog, index=effects,
+                counts=True)
+            exog_mean = self._group_mean(exog, index=effects)
+            counts = counts.astype(int)
+            endog = endog - np.repeat(endog_mean, counts)
+            exog = exog - np.repeat(exog_mean, counts, axis=0)
+        if demeantwice or effects == "time":
+            endog_mean, dummies = self._group_mean(endog, index="time",
+                dummies=True)
+            exog_mean = self._group_mean(exog, index="time")
+            # This allows unbalanced panels
+            endog = endog - np.dot(endog_mean, dummies)
+            exog = exog - np.dot(dummies.T, exog_mean)
+        fefit = GLS(endog, exog[:, ~self._cons_index]).fit()  # drop the constant column
+#TODO: might fail with one regressor
+        return fefit
+
+


 class SURPanel(PanelModel):
     pass

-
 class SEMPanel(PanelModel):
     pass

-
 class DynamicPanel(PanelModel):
     pass

-
-if __name__ == '__main__':
+if __name__ == "__main__":
     import numpy.lib.recfunctions as nprf
     import pandas
     from pandas import Panel
+
     import statsmodels.api as sm
+
     data = sm.datasets.grunfeld.load()
+    # Baltagi does not include American Steel
     endog = data.endog[:-20]
     fullexog = data.exog[:-20]
+#    fullexog.sort(order=['firm','year'])
     panel_arr = nprf.append_fields(fullexog, 'investment', endog, float,
-        usemask=False)
+            usemask=False)
+
     panel_df = pandas.DataFrame(panel_arr)
     panel_panda = panel_df.set_index(['year', 'firm']).to_panel()
-    exog = fullexog[['value', 'capital']].view(float).reshape(-1, 2)
+
+
+    # the most cumbersome way of doing it as far as preprocessing by hand
+    exog = fullexog[['value','capital']].view(float).reshape(-1,2)
     exog = sm.add_constant(exog, prepend=False)
     panel = group(fullexog['firm'])
     year = fullexog['year']
-    panel_mod = PanelModel(endog, exog, panel, year, xtnames=['firm',
-        'year'], equation='invest value capital')
+    panel_mod = PanelModel(endog, exog, panel, year, xtnames=['firm','year'],
+            equation='invest value capital')
+# note that equation does not actually do anything but name the variables
     panel_ols = panel_mod.fit(model='pooled')
+
     panel_be = panel_mod.fit(model='between', effects='oneway')
     panel_fe = panel_mod.fit(model='fixed', effects='oneway')
+
     panel_bet = panel_mod.fit(model='between', effects='time')
     panel_fet = panel_mod.fit(model='fixed', effects='time')
+
     panel_fe2 = panel_mod.fit(model='fixed', effects='twoways')
-    groups = np.array([0, 0, 0, 1, 1, 2, 2, 2])
+
+
+#see also Baltagi (3rd edt) 3.3 THE RANDOM EFFECTS MODEL p.35
+#for explicit formulas for spectral decomposition
+#but this works also for unbalanced panel
+#
+#I also just saw: 9.4.2 The Random Effects Model p.176 which is
+#partially almost the same as I did
+#
+#this needs to use sparse matrices for larger datasets
+#
+#"""
+#
+#import numpy as np
+#
+
+    groups = np.array([0,0,0,1,1,2,2,2])
     nobs = groups.shape[0]
     groupuniq = np.unique(groups)
-    periods = np.array([0, 1, 2, 1, 2, 0, 1, 2])
+    periods = np.array([0,1,2,1,2,0,1,2])
     perioduniq = np.unique(periods)
-    dummygr = (groups[:, None] == groupuniq).astype(float)
-    dummype = (periods[:, None] == perioduniq).astype(float)
-    sigma = 1.0
-    sigmagr = np.sqrt(2.0)
-    sigmape = np.sqrt(3.0)
-    dummyall = np.c_[sigmagr * dummygr, sigmape * dummype]
-    omega = np.dot(dummyall, dummyall.T) + sigma * np.eye(nobs)
+
+    dummygr = (groups[:,None] == groupuniq).astype(float)
+    dummype = (periods[:,None] == perioduniq).astype(float)
+
+    sigma = 1.
+    sigmagr = np.sqrt(2.)
+    sigmape = np.sqrt(3.)
+
+    #dummyall = np.c_[sigma*np.ones((nobs,1)), sigmagr*dummygr,
+    #                                           sigmape*dummype]
+    #exclude constant ?
+    dummyall = np.c_[sigmagr*dummygr, sigmape*dummype]
+    # omega is the error variance-covariance matrix for the stacked
+    # observations
+    omega = np.dot(dummyall, dummyall.T) + sigma* np.eye(nobs)
     print(omega)
     print(np.linalg.cholesky(omega))
-    ev, evec = np.linalg.eigh(omega)
-    omegainv = np.dot(evec, (1 / ev * evec).T)
+    ev, evec = np.linalg.eigh(omega)  #eig does not work
+    omegainv = np.dot(evec, (1/ev * evec).T)
     omegainv2 = np.linalg.inv(omega)
     omegacomp = np.dot(evec, (ev * evec).T)
     print(np.max(np.abs(omegacomp - omega)))
-    print(np.max(np.abs(np.dot(omegainv, omega) - np.eye(nobs))))
-    omegainvhalf = evec / np.sqrt(ev)
-    print(np.max(np.abs(np.dot(omegainvhalf, omegainvhalf.T) - omegainv)))
+    #check
+    #print(np.dot(omegainv,omega)
+    print(np.max(np.abs(np.dot(omegainv,omega) - np.eye(nobs))))
+    omegainvhalf = evec/np.sqrt(ev)  #not sure whether ev should not be column
+    print(np.max(np.abs(np.dot(omegainvhalf,omegainvhalf.T) - omegainv)))
+
+    # now we can use omegainvhalf in GLS (instead of the cholesky)
+
     sigmas2 = np.array([sigmagr, sigmape, sigma])
     groups2 = np.column_stack((groups, periods))
     omega_, omegainv_, omegainvhalf_ = repanel_cov(groups2, sigmas2)
     print(np.max(np.abs(omega_ - omega)))
     print(np.max(np.abs(omegainv_ - omegainv)))
     print(np.max(np.abs(omegainvhalf_ - omegainvhalf)))
-    Pgr = reduce(np.dot, [dummygr, np.linalg.inv(np.dot(dummygr.T, dummygr)
-        ), dummygr.T])
+
+    # notation Baltagi (3rd) section 9.4.1 (Fixed Effects Model)
+    Pgr = reduce(np.dot,[dummygr,
+            np.linalg.inv(np.dot(dummygr.T, dummygr)),dummygr.T])
     Qgr = np.eye(nobs) - Pgr
+    # within group effect: np.dot(Qgr, groups)
+    # but this is not memory efficient, compared to groupstats
     print(np.max(np.abs(np.dot(Qgr, groups))))
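
The closing comment notes that omegainvhalf can be used in GLS in place of the Cholesky factor; a short sketch of that whitening step, reusing omega, omegainvhalf, nobs and sm from the `__main__` block above (the regressors X and response y are hypothetical, added only for illustration):

    # if omegainv = H H' with H = omegainvhalf, premultiplying by H.T makes
    # the transformed errors spherical, so OLS on the whitened data
    # reproduces GLS with error covariance omega
    X = sm.add_constant(np.random.randn(nobs, 2))     # hypothetical regressors
    y = X @ np.array([1., 2., -1.]) + np.random.multivariate_normal(np.zeros(nobs), omega)
    res_re = sm.OLS(omegainvhalf.T @ y, omegainvhalf.T @ X).fit()
    print(res_re.params)
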
diff --git a/statsmodels/sandbox/panel/random_panel.py b/statsmodels/sandbox/panel/random_panel.py
index 61a4b5d1d..6aed0f570 100644
--- a/statsmodels/sandbox/panel/random_panel.py
+++ b/statsmodels/sandbox/panel/random_panel.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Generate a random process with panel structure

 Created on Sat Dec 17 22:15:27 2011
@@ -13,12 +14,13 @@ Notes
 * only one-way (repeated measures) so far

 """
+
 import numpy as np
 from . import correlation_structures as cs


 class PanelSample:
-    """data generating process for panel with within correlation
+    '''data generating process for panel with within correlation

     allows various within correlation structures, but no random intercept yet

@@ -56,40 +58,95 @@ class PanelSample:
     This is just used in one example so far and needs more usage to see what
     will be useful to add.

-    """
+    '''

     def __init__(self, nobs, k_vars, n_groups, exog=None, within=True,
-        corr_structure=np.eye, corr_args=(), scale=1, seed=None):
-        nobs_i = nobs // n_groups
-        nobs = nobs_i * n_groups
+                 corr_structure=np.eye, corr_args=(), scale=1, seed=None):
+
+
+        nobs_i = nobs//n_groups
+        nobs = nobs_i * n_groups  #make balanced
         self.nobs = nobs
         self.nobs_i = nobs_i
         self.n_groups = n_groups
         self.k_vars = k_vars
         self.corr_structure = corr_structure
         self.groups = np.repeat(np.arange(n_groups), nobs_i)
-        self.group_indices = np.arange(n_groups + 1) * nobs_i
+
+        self.group_indices = np.arange(n_groups+1) * nobs_i #check +1
+
         if exog is None:
             if within:
+                #t = np.tile(np.linspace(-1,1,nobs_i), n_groups)
                 t = np.tile(np.linspace(0, 2, nobs_i), n_groups)
+                #rs2 = np.random.RandomState(9876)
+                #t = 1 + 0.3 * rs2.randn(nobs_i * n_groups)
+                #mix within and across variation
+                #t += np.repeat(np.linspace(-1,1,nobs_i), n_groups)
             else:
-                t = np.repeat(np.linspace(-1, 1, nobs_i), n_groups)
-            exog = t[:, None] ** np.arange(k_vars)
+                #no within group variation,
+                t = np.repeat(np.linspace(-1,1,nobs_i), n_groups)
+
+            exog = t[:,None]**np.arange(k_vars)
+
         self.exog = exog
+        #self.y_true = exog.sum(1)  #all coefficients equal 1,
+        #moved to make random coefficients
+        #initialize
         self.y_true = None
         self.beta = None
+
         if seed is None:
             seed = np.random.randint(0, 999999)
+
         self.seed = seed
         self.random_state = np.random.RandomState(seed)
+
+        #this makes overwriting difficult, move to method?
         self.std = scale * np.ones(nobs_i)
         corr = self.corr_structure(nobs_i, *corr_args)
         self.cov = cs.corr2cov(corr, self.std)
         self.group_means = np.zeros(n_groups)

+
+    def get_y_true(self):
+        if self.beta is None:
+            self.y_true = self.exog.sum(1)
+        else:
+            self.y_true = np.dot(self.exog, self.beta)
+
+
     def generate_panel(self):
-        """
+        '''
         generate endog for a random panel dataset with within correlation

-        """
-        pass
+        '''
+
+        random = self.random_state
+
+        if self.y_true is None:
+            self.get_y_true()
+
+        nobs_i = self.nobs_i
+        n_groups = self.n_groups
+
+        use_balanced = True
+        if use_balanced: #much faster for balanced case
+            noise = self.random_state.multivariate_normal(np.zeros(nobs_i),
+                                                  self.cov,
+                                                  size=n_groups).ravel()
+            #need to add self.group_means
+            noise += np.repeat(self.group_means, nobs_i)
+        else:
+            noise = np.empty(self.nobs, np.float64)
+            noise.fill(np.nan)
+            for ii in range(self.n_groups):
+                #print ii,
+                idx, idxupp = self.group_indices[ii:ii+2]
+                #print idx, idxupp
+                mean_i = self.group_means[ii]
+                noise[idx:idxupp] = self.random_state.multivariate_normal(
+                                        mean_i * np.ones(self.nobs_i), self.cov)
+
+        endog = self.y_true + noise
+        return endog
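
A minimal sketch of the data-generating process in use (default identity within-correlation; the added group means act as random intercepts):

    import numpy as np
    from statsmodels.sandbox.panel.random_panel import PanelSample

    dgp = PanelSample(nobs=200, k_vars=3, n_groups=20, seed=12345)
    dgp.group_means = 2 + dgp.random_state.randn(dgp.n_groups)  # random intercepts
    dgp.beta = np.array([1., -2., 0.5])      # otherwise all coefficients are 1
    y = dgp.generate_panel()
    print(y.shape, dgp.exog.shape)           # (200,), (200, 3)
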
diff --git a/statsmodels/sandbox/panel/sandwich_covariance_generic.py b/statsmodels/sandbox/panel/sandwich_covariance_generic.py
index 902c5a82f..63b9e6b18 100644
--- a/statsmodels/sandbox/panel/sandwich_covariance_generic.py
+++ b/statsmodels/sandbox/panel/sandwich_covariance_generic.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """covariance with (nobs,nobs) loop and general kernel

 This is a general implementation that is not efficient for any special cases.
@@ -14,9 +15,8 @@ License: BSD-3
 """
 import numpy as np

-
 def kernel(d1, d2, r=None, weights=None):
-    """general product kernel
+    '''general product kernel

     hardcoded split for the example:
         cat1 is continuous (time), other categories are discrete
@@ -25,12 +25,19 @@ def kernel(d1, d2, r=None, weights=None):
     r is (0,1) indicator vector for boolean weights 1{d1_i == d2_i}

     returns boolean if no continuous weights are used
-    """
-    pass
+    '''
+
+    diff = d1 - d2
+    if (weights is None) or (r[0] == 0):
+        #time is irrelevant or treated as categorical
+        return np.all((r * diff) == 0)   #return bool
+    else:
+        #time uses continuous kernel, all other categorical
+        return weights[diff] * np.all((r[1:] * diff[1:]) == 0)


 def aggregate_cov(x, d, r=None, weights=None):
-    """sum of outer procuct over groups and time selected by r
+    '''sum of outer product over groups and time selected by r

     This is for a generic reference implementation, it uses a nobs-nobs double
     loop.
@@ -60,24 +67,57 @@ def aggregate_cov(x, d, r=None, weights=None):
     This uses `kernel` to calculate the weighted distance between two
     observations.

-    """
-    pass
+    '''

+    nobs = x.shape[0]   #either 1d or 2d with obs in rows
+    #next is not needed yet
+#    if x.ndim == 2:
+#        kvars = x.shape[1]
+#    else:
+#        kvars = 1

-def S_all_hac(x, d, nlags=1):
-    """HAC independent of categorical group membership
-    """
-    pass
+    count = 0 #count non-zero pairs for cross checking, not needed
+    res = 0 * np.outer(x[0], x[0])  #get output shape

+    for ii in range(nobs):
+        for jj in range(nobs):
+            w = kernel(d[ii], d[jj], r=r, weights=weights)
+            if w:  #true or non-zero
+                res += w * np.outer(x[ii], x[jj])
+                count += 1

-def S_within_hac(x, d, nlags=1, groupidx=1):
-    """HAC for observations within a categorical group
-    """
-    pass
+    return res, count

+def weights_bartlett(nlags):
+    #with lag zero, nlags is the highest lag included
+    return 1 - np.arange(nlags+1)/(nlags+1.)
+
+#------- examples, cases: hardcoded for d is time and two categorical groups
+def S_all_hac(x, d, nlags=1):
+    '''HAC independent of categorical group membership
+    '''
+    r = np.zeros(d.shape[1])
+    r[0] = 1
+    weights = weights_bartlett(nlags)
+    return aggregate_cov(x, d, r=r, weights=weights)
+
+def S_within_hac(x, d, nlags=1, groupidx=1):
+    '''HAC for observations within a categorical group
+    '''
+    r = np.zeros(d.shape[1])
+    r[0] = 1
+    r[groupidx] = 1
+    weights = weights_bartlett(nlags)
+    return aggregate_cov(x, d, r=r, weights=weights)
+
+def S_cluster(x, d, groupidx=[1]):
+    r = np.zeros(d.shape[1])
+    r[groupidx] = 1
+    return aggregate_cov(x, d, r=r, weights=None)

 def S_white(x, d):
-    """simple white heteroscedasticity robust covariance
+    '''simple white heteroscedasticity robust covariance
     note: calculating this way is very inefficient, just for cross-checking
-    """
-    pass
+    '''
+    r = np.ones(d.shape[1])  #only points on diagonal
+    return aggregate_cov(x, d, r=r, weights=None)
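
A tiny cross-check of the reference implementation on toy residuals (column 0 of d is a time index, column 1 a group label; only the purely categorical cases are exercised here):

    import numpy as np
    from statsmodels.sandbox.panel.sandwich_covariance_generic import (
        S_cluster, S_white)

    x = np.array([0.5, -1.0, 0.3, 1.2, -0.7, 0.1])           # toy residuals
    d = np.column_stack((np.tile([0, 1, 2], 2),              # time index
                         np.repeat([0, 1], 3)))              # group label

    s_white, n_white = S_white(x, d)                  # only ii == jj pairs survive
    s_clust, n_clust = S_cluster(x, d, groupidx=[1])  # all pairs within a group
    print(s_white, n_white)   # sum of squared residuals, 6 pairs
    print(s_clust, n_clust)   # adds the within-group cross products, 18 pairs
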
diff --git a/statsmodels/sandbox/pca.py b/statsmodels/sandbox/pca.py
index 9e7df425a..bab8fff0b 100644
--- a/statsmodels/sandbox/pca.py
+++ b/statsmodels/sandbox/pca.py
@@ -1,3 +1,4 @@
+#Copyright (c) 2008 Erik Tollerud (etolleru@uci.edu)
 import numpy as np


@@ -7,46 +8,69 @@ class Pca:

     p is the number of dimensions, while N is the number of data points
     """
-    _colors = 'r', 'g', 'b', 'c', 'y', 'm', 'k'
+    _colors=('r','g','b','c','y','m','k') #defaults

-    def __init__(self, data, names=None):
+    def __calc(self):
+        A = self.A
+        M=A-np.mean(A,axis=0)
+        N=M/np.std(M,axis=0)
+
+        self.M = M
+        self.N = N
+        self._eig = None
+
+    def __init__(self,data,names=None):
         """
         p X N matrix input
         """
         A = np.array(data).T
-        n, p = A.shape
-        self.n, self.p = n, p
+        n,p = A.shape
+        self.n,self.p = n,p
         if p > n:
             from warnings import warn
             warn('p > n - intentional?', RuntimeWarning)
         self.A = A
-        self._origA = A.copy()
+        self._origA=A.copy()
+
         self.__calc()
-        self._colors = np.tile(self._colors, int((p - 1) / len(self._colors
-            )) + 1)[:p]
+
+        self._colors= np.tile(self._colors,int((p-1)/len(self._colors))+1)[:p]
         if names is not None and len(names) != p:
             raise ValueError('names must match data dimension')
         self.names = None if names is None else tuple([str(x) for x in names])

+
     def getCovarianceMatrix(self):
         """
         returns the covariance matrix for the dataset
         """
-        pass
+        return np.cov(self.N.T)

     def getEigensystem(self):
         """
         returns a tuple of (eigenvalues,eigenvectors) for the data set.
         """
-        pass
+        if self._eig is None:
+            res = np.linalg.eig(self.getCovarianceMatrix())
+            sorti=np.argsort(res[0])[::-1]
+            res=(res[0][sorti],res[1][:,sorti])
+            self._eig=res
+        return self._eig
+
+    def getEigenvalues(self):
+        return self.getEigensystem()[0]
+
+    def getEigenvectors(self):
+        return self.getEigensystem()[1]

     def getEnergies(self):
         """
         "energies" are just normalized eigenvectors
         """
-        pass
+        v=self.getEigenvalues()
+        return v/np.sum(v)

-    def plot2d(self, ix=0, iy=1, clf=True):
+    def plot2d(self,ix=0,iy=1,clf=True):
         """
         Generates a 2-dimensional plot of the data set and principle components
         using matplotlib.
@@ -54,9 +78,25 @@ class Pca:
         ix specifies which p-dimension to put on the x-axis of the plot
         and iy specifies which to put on the y-axis (0-indexed)
         """
-        pass
-
-    def plot3d(self, ix=0, iy=1, iz=2, clf=True):
+        import matplotlib.pyplot as plt
+        x,y=self.N[:,ix],self.N[:,iy]
+        if clf:
+            plt.clf()
+        plt.scatter(x,y)
+        vals,evs=self.getEigensystem()
+        #evx,evy=evs[:,ix],evs[:,iy]
+        xl,xu=plt.xlim()
+        yl,yu=plt.ylim()
+        dx,dy=(xu-xl),(yu-yl)
+        for val,vec,c in zip(vals,evs.T,self._colors):
+            plt.arrow(0,0,val*vec[ix],val*vec[iy],head_width=0.05*(dx*dy/4)**0.5,fc=c,ec=c)
+        #plt.arrow(0,0,vals[ix]*evs[ix,ix],vals[ix]*evs[iy,ix],head_width=0.05*(dx*dy/4)**0.5,fc='g',ec='g')
+        #plt.arrow(0,0,vals[iy]*evs[ix,iy],vals[iy]*evs[iy,iy],head_width=0.05*(dx*dy/4)**0.5,fc='r',ec='r')
+        if self.names is not None:
+            plt.xlabel('$'+self.names[ix]+'/\\sigma$')
+            plt.ylabel('$'+self.names[iy]+'/\\sigma$')
+
+    def plot3d(self,ix=0,iy=1,iz=2,clf=True):
         """
         Generates a 3-dimensional plot of the data set and principle components
         using mayavi.
@@ -64,9 +104,19 @@ class Pca:
         ix, iy, and iz specify which of the input p-dimensions to place on each of
         the x,y,z axes, respectively (0-indexed).
         """
-        pass
+        import enthought.mayavi.mlab as M
+        if clf:
+            M.clf()
+        z3=np.zeros(3)
+        v=(self.getEigenvectors()*self.getEigenvalues())
+        M.quiver3d(z3,z3,z3,v[ix],v[iy],v[iz],scale_factor=5)
+        M.points3d(self.N[:,ix],self.N[:,iy],self.N[:,iz],scale_factor=0.3)
+        if self.names:
+            M.axes(xlabel=self.names[ix]+'/sigma',ylabel=self.names[iy]+'/sigma',zlabel=self.names[iz]+'/sigma')
+        else:
+            M.axes()

-    def sigclip(self, sigs):
+    def sigclip(self,sigs):
         """
         clips out all data points that are more than a certain number
         of standard deviations from the mean.
@@ -75,9 +125,21 @@ class Pca:
         specifies the number of standard deviations along each of the
         p dimensions.
         """
-        pass
+        if np.isscalar(sigs):
+            sigs=sigs*np.ones(self.N.shape[1])
+        sigs = sigs*np.std(self.N,axis=0)  # per-dimension scale
+        n = self.N.shape[0]
+        m = np.all(np.abs(self.N) < sigs,axis=1)
+        self.A=self.A[m]
+        self.__calc()
+        return n-sum(m)

-    def project(self, vals=None, enthresh=None, nPCs=None, cumen=None):
+    def reset(self):
+        self.A = self._origA.copy()
+        self.__calc()
+
+
+    def project(self,vals=None,enthresh=None,nPCs=None,cumen=None):
         """
         projects the normalized values onto the components

@@ -88,21 +150,75 @@ class Pca:

         returns n,p(>threshold) dimension array
         """
-        pass
-
-    def deproject(self, A, normed=True):
+        nonnones = sum([e is not None for e in (enthresh, nPCs, cumen)])
+        if nonnones == 0:
+            m = slice(None)
+        elif nonnones > 1:
+            raise ValueError("cannot specify more than one threshold")
+        else:
+            if enthresh is not None:
+                m = self.getEnergies() > enthresh
+            elif nPCs is not None:
+                m = slice(None,nPCs)
+            elif cumen is not None:
+                m = np.cumsum(self.getEnergies()) < cumen
+            else:
+                raise RuntimeError('Should be unreachable')
+
+        if vals is None:
+            vals = self.N.T
+        else:
+            vals = np.array(vals,copy=False)
+            if self.N.T.shape[0] != vals.shape[0]:
+                raise ValueError("shape for vals does not match")
+        proj = np.matrix(self.getEigenvectors()).T*vals
+        return proj[m].T
+
+    def deproject(self,A,normed=True):
         """
         input is an n X q array, where q <= p

         output is p X n
         """
-        pass
+        A=np.atleast_2d(A)
+        n,q = A.shape
+        p = self.A.shape[1]
+        if q > p :
+            raise ValueError("q > p")
+
+        evinv=np.linalg.inv(np.matrix(self.getEigenvectors()).T)
+
+        zs = np.zeros((n,p))
+        zs[:,:q]=A
+
+        proj = evinv*zs.T
+
+        if normed:
+            return np.array(proj.T).T
+        else:
+            mns=np.mean(self.A,axis=0)
+            sds=np.std(self.M,axis=0)
+            return (np.array(proj.T)*sds+mns).T

-    def subtractPC(self, pc, vals=None):
+    def subtractPC(self,pc,vals=None):
         """
         pc can be a scalar or any sequence of pc indecies

         if vals is None, the source data is self.A, else whatever is in vals
         (which must be p x m)
         """
-        pass
+        if vals is None:
+            vals = self.A
+        else:
+            vals = vals.T
+            if vals.shape[1]!= self.A.shape[1]:
+                raise ValueError("vals do not have the correct number of components")
+
+        pcs=self.project()
+        zpcs=np.zeros_like(pcs)
+        zpcs[:,pc]=pcs[:,pc]
+        upc=self.deproject(zpcs,False)
+
+        A = vals.T-upc
+        B = A.T*np.std(self.M,axis=0)
+        return B+np.mean(self.A,axis=0)
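
A small sketch of the PCA helper on simulated data (input is p X N, i.e. variables in rows):

    import numpy as np
    from statsmodels.sandbox.pca import Pca

    rng = np.random.RandomState(0)
    z = rng.randn(3, 200)
    data = np.vstack((z[0], z[0] + 0.5 * z[1], z[2]))   # 3 variables, 200 points

    p = Pca(data, names=['a', 'b', 'c'])
    print(p.getEnergies())        # normalized eigenvalues, sum to 1
    scores = p.project(nPCs=2)    # scores on the first two components
    print(scores.shape)           # (200, 2)
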
diff --git a/statsmodels/sandbox/predict_functional.py b/statsmodels/sandbox/predict_functional.py
index cb98f6e0e..8996fe5b3 100644
--- a/statsmodels/sandbox/predict_functional.py
+++ b/statsmodels/sandbox/predict_functional.py
@@ -10,9 +10,12 @@ import pandas as pd
 import patsy
 import numpy as np
 import warnings
+
 from statsmodels.tools.sm_exceptions import ValueWarning
 from statsmodels.compat.pandas import Appender
-_predict_functional_doc = """
+
+_predict_functional_doc =\
+    """
     Predictions and contrasts of a fitted model as a function of a given covariate.

     The value of the focus variable varies along a sequence of its
@@ -139,7 +142,57 @@ def _make_exog_from_formula(result, focus_var, summaries, values, num_points):
     fexog : data frame
         The data frame `dexog` processed through the model formula.
     """
-    pass
+
+    model = result.model
+    exog = model.data.frame
+
+    if summaries is None:
+        summaries = {}
+    if values is None:
+        values = {}
+
+    if exog[focus_var].dtype is np.dtype('O'):
+        raise ValueError('focus variable may not have object type')
+
+    colnames = list(summaries.keys()) + list(values.keys()) + [focus_var]
+    dtypes = [exog[x].dtype for x in colnames]
+
+    # Check for variables whose values are not set either through
+    # `values` or `summaries`.  Since the model data frame can contain
+    # extra variables not referenced in the formula RHS, this may not
+    # be a problem, so just warn.  There is no obvious way to extract
+    # from a formula all the variable names that it references.
+    varl = set(exog.columns.tolist()) - set([model.endog_names])
+    unmatched = varl - set(colnames)
+    unmatched = list(unmatched)
+    if len(unmatched) > 0:
+        warnings.warn("%s in data frame but not in summaries or values."
+                      % ", ".join(["'%s'" % x for x in unmatched]),
+                      ValueWarning)
+
+    # Initialize at zero so each column can be converted to any dtype.
+    ix = range(num_points)
+    fexog = pd.DataFrame(index=ix, columns=colnames)
+    for d, x in zip(dtypes, colnames):
+        fexog[x] = pd.Series(index=ix, dtype=d)
+
+    # The values of the 'focus variable' are a sequence of percentiles
+    pctls = np.linspace(0, 100, num_points).tolist()
+    fvals = np.percentile(exog[focus_var], pctls)
+    fvals = np.asarray(fvals)
+    fexog.loc[:, focus_var] = fvals
+
+    # The values of the other variables may be given by summary functions...
+    for ky in summaries.keys():
+        fexog.loc[:, ky] = summaries[ky](exog.loc[:, ky])
+
+    # or they may be provided as given values.
+    for ky in values.keys():
+        fexog[ky] = values[ky]
+
+    dexog = patsy.dmatrix(model.data.design_info, fexog,
+                          return_type='dataframe')
+    return dexog, fexog, fvals


 def _make_exog_from_arrays(result, focus_var, summaries, values, num_points):
@@ -154,7 +207,177 @@ def _make_exog_from_arrays(result, focus_var, summaries, values, num_points):
         A data frame in which the focus variable varies and the other variables
         are fixed at specified or computed values.
     """
-    pass
+
+    model = result.model
+    model_exog = model.exog
+    exog_names = model.exog_names
+
+    if summaries is None:
+        summaries = {}
+    if values is None:
+        values = {}
+
+    exog = np.zeros((num_points, model_exog.shape[1]))
+
+    # Check for variables whose values are not set either through
+    # `values` or `summaries`.
+    colnames = list(values.keys()) + list(summaries.keys()) + [focus_var]
+    unmatched = set(exog_names) - set(colnames)
+    unmatched = list(unmatched)
+    if len(unmatched) > 0:
+        warnings.warn("%s in model but not in `summaries` or `values`."
+                      % ", ".join(["'%s'" % x for x in unmatched]),
+                      ValueWarning)
+
+    # The values of the 'focus variable' are a sequence of percentiles
+    pctls = np.linspace(0, 100, num_points).tolist()
+    ix = exog_names.index(focus_var)
+    fvals = np.percentile(model_exog[:, ix], pctls)
+    exog[:, ix] = fvals
+
+    # The values of the other variables may be given by summary functions...
+    for ky in summaries.keys():
+        ix = exog_names.index(ky)
+        exog[:, ix] = summaries[ky](model_exog[:, ix])
+
+    # or they may be provided as given values.
+    for ky in values.keys():
+        ix = exog_names.index(ky)
+        exog[:, ix] = values[ky]
+
+    return exog, fvals
+
+
+def _make_exog(result, focus_var, summaries, values, num_points):
+
+    # Branch depending on whether the model was fit with a formula.
+    if hasattr(result.model.data, "frame"):
+        dexog, fexog, fvals = _make_exog_from_formula(result, focus_var,
+                                       summaries, values, num_points)
+    else:
+        exog, fvals = _make_exog_from_arrays(result, focus_var, summaries,
+                                 values, num_points)
+        dexog, fexog = exog, exog
+
+    return dexog, fexog, fvals
+
+
+def _check_args(values, summaries, values2, summaries2):
+
+    if values is None:
+        values = {}
+    if values2 is None:
+        values2 = {}
+    if summaries is None:
+        summaries = {}
+    if summaries2 is None:
+        summaries2 = {}
+
+    for (s,v) in (summaries, values), (summaries2, values2):
+        ky = set(v.keys()) & set(s.keys())
+        ky = list(ky)
+        if len(ky) > 0:
+            raise ValueError("One or more variable names are contained in both `summaries` and `values`:" +
+                             ", ".join(ky))
+
+    return values, summaries, values2, summaries2
+
+
+@Appender(_predict_functional_doc)
+def predict_functional(result, focus_var, summaries=None, values=None,
+                       summaries2=None, values2=None, alpha=0.05,
+                       ci_method="pointwise", linear=True, num_points=10,
+                       exog=None, exog2=None, **kwargs):
+
+    if ci_method not in ("pointwise", "scheffe", "simultaneous"):
+        raise ValueError('confidence band method must be one of '
+                         '`pointwise`, `scheffe`, and `simultaneous`.')
+
+    contrast = (values2 is not None) or (summaries2 is not None)
+
+    if contrast and not linear:
+        raise ValueError("`linear` must be True for computing contrasts")
+
+    model = result.model
+    if exog is not None:
+
+        if any(x is not None for x in [summaries, summaries2, values, values2]):
+            raise ValueError("if `exog` is provided then do not "
+                             "provide `summaries` or `values`")
+
+        fexog = exog
+        dexog = patsy.dmatrix(model.data.design_info,
+                              fexog, return_type='dataframe')
+        fvals = exog[focus_var]
+
+        if exog2 is not None:
+            fexog2 = exog2
+            dexog2 = patsy.dmatrix(model.data.design_info,
+                                   fexog2, return_type='dataframe')
+            fvals2 = fvals
+
+    else:
+
+        values, summaries, values2, summaries2 = _check_args(values,
+                                                             summaries,
+                                                             values2,
+                                                             summaries2)
+
+        dexog, fexog, fvals = _make_exog(result, focus_var, summaries,
+                                         values, num_points)
+
+        if len(summaries2) + len(values2) > 0:
+            dexog2, fexog2, fvals2 = _make_exog(result, focus_var, summaries2,
+                                                values2, num_points)
+
+    from statsmodels.genmod.generalized_linear_model import GLM
+    from statsmodels.genmod.generalized_estimating_equations import GEE
+    if isinstance(result.model, (GLM, GEE)):
+        kwargs_pred = kwargs.copy()
+        kwargs_pred.update({"which": "linear"})
+    else:
+        kwargs_pred = kwargs
+
+    pred = result.predict(exog=fexog, **kwargs_pred)
+    if contrast:
+        pred2 = result.predict(exog=fexog2, **kwargs_pred)
+        pred = pred - pred2
+        dexog = dexog - dexog2
+
+    if ci_method == 'pointwise':
+
+        t_test = result.t_test(dexog)
+        cb = t_test.conf_int(alpha=alpha)
+
+    elif ci_method == 'scheffe':
+
+        t_test = result.t_test(dexog)
+        sd = t_test.sd
+        cb = np.zeros((num_points, 2))
+
+        # Scheffe's method
+        from scipy.stats.distributions import f as fdist
+        df1 = result.model.exog.shape[1]
+        df2 = result.model.exog.shape[0] - df1
+        qf = fdist.cdf(1 - alpha, df1, df2)
+        fx = sd * np.sqrt(df1 * qf)
+        cb[:, 0] = pred - fx
+        cb[:, 1] = pred + fx
+
+    elif ci_method == 'simultaneous':
+
+        sigma, c = _glm_basic_scr(result, dexog, alpha)
+        cb = np.zeros((dexog.shape[0], 2))
+        cb[:, 0] = pred - c*sigma
+        cb[:, 1] = pred + c*sigma
+
+    if not linear:
+        # May need to support other models with link-like functions.
+        link = result.family.link
+        pred = link.inverse(pred)
+        cb = link.inverse(cb)
+
+    return pred, cb, fvals


 def _glm_basic_scr(result, exog, alpha):
@@ -184,4 +407,41 @@ def _glm_basic_scr(result, exog, alpha):
     interval.  The matrix `exog` is thus the basis functions and any
     other covariates evaluated as x varies.
     """
-    pass
+
+    model = result.model
+    n = model.exog.shape[0]
+
+    # Get the Hessian without recomputing.
+    cov = result.cov_params()
+    hess = np.linalg.inv(cov)
+
+    # Proposition 3.1 of Sun et al.
+    A = hess / n
+    B = np.linalg.cholesky(A).T # Upper Cholesky triangle
+
+    # The variance and SD of the linear predictor at each row of exog.
+    sigma2 = (np.dot(exog, cov) * exog).sum(1)
+    sigma = np.asarray(np.sqrt(sigma2))
+
+    # Calculate kappa_0 (formula 42 from Sun et al)
+    bz = np.linalg.solve(B.T, exog.T).T
+    bz /= np.sqrt(n)
+    bz /= sigma[:, None]
+    bzd = np.diff(bz, 1, axis=0)
+    bzdn = (bzd**2).sum(1)
+    kappa_0 = np.sqrt(bzdn).sum()
+
+    from scipy.stats.distributions import norm
+
+    # The root of this function is the multiplier for the confidence
+    # band, see Sun et al. equation 35.
+    def func(c):
+        return kappa_0 * np.exp(-c**2/2) / np.pi + 2*(1 - norm.cdf(c)) - alpha
+
+    from scipy.optimize import brentq
+
+    c, rslt = brentq(func, 1, 10, full_output=True)
+    if not rslt.converged:
+        raise ValueError("Root finding error in basic SCR")
+
+    return sigma, c
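
The simultaneous band restored above solves the volume-of-tube equation kappa_0 * exp(-c**2/2)/pi + 2*(1 - Phi(c)) = alpha for the band multiplier c (the `func` passed to `brentq`). A minimal usage sketch of the surrounding helper, assuming this hunk belongs to statsmodels.sandbox.predict_functional.predict_functional and that the keyword names match those visible above (focus_var, values, ci_method); the data are simulated purely for illustration:

import numpy as np
import pandas as pd
import statsmodels.api as sm

from statsmodels.sandbox.predict_functional import predict_functional

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
lin = 0.5 * df["x1"] - 0.25 * df["x2"]
df["y"] = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-lin))).astype(int)

# Formula-based fit so that model.data.design_info is available to patsy.
result = sm.GLM.from_formula("y ~ x1 + x2", data=df,
                             family=sm.families.Binomial()).fit()

# Profile the fit in x1, holding x2 at 0, with a simultaneous confidence band.
pred, cb, fvals = predict_functional(result, "x1", values={"x2": 0.0},
                                     ci_method="simultaneous")
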
diff --git a/statsmodels/sandbox/regression/anova_nistcertified.py b/statsmodels/sandbox/regression/anova_nistcertified.py
index 4357c14cc..884819953 100644
--- a/statsmodels/sandbox/regression/anova_nistcertified.py
+++ b/statsmodels/sandbox/regression/anova_nistcertified.py
@@ -1,7 +1,7 @@
-"""calculating anova and verifying with NIST test data
+'''calculating anova and verifying with NIST test data

 compares my implementations, stats.f_oneway and anova using statsmodels.OLS
-"""
+'''
 from statsmodels.compat.python import lmap
 import os
 import numpy as np
@@ -9,9 +9,85 @@ from scipy import stats
 from statsmodels.tools.tools import add_constant
 from statsmodels.regression.linear_model import OLS
 from .try_ols_anova import data2dummy
-filenameli = ['SiRstv.dat', 'SmLs01.dat', 'SmLs02.dat', 'SmLs03.dat',
-    'AtmWtAg.dat', 'SmLs04.dat', 'SmLs05.dat', 'SmLs06.dat', 'SmLs07.dat',
-    'SmLs08.dat', 'SmLs09.dat']
+
+filenameli = ['SiRstv.dat', 'SmLs01.dat', 'SmLs02.dat', 'SmLs03.dat', 'AtmWtAg.dat',
+              'SmLs04.dat', 'SmLs05.dat', 'SmLs06.dat', 'SmLs07.dat', 'SmLs08.dat',
+              'SmLs09.dat']
+##filename = 'SmLs03.dat' #'SiRstv.dat' #'SmLs09.dat'#, 'AtmWtAg.dat' #'SmLs07.dat'
+
+
+##path = __file__
+##print(locals().keys())
+###print(path)
+
+
+def getnist(filename):
+    here = os.path.dirname(__file__)
+    fname = os.path.abspath(os.path.join(here, 'data', filename))
+    with open(fname, 'r', encoding="utf-8") as fd:
+        content = fd.read().split('\n')
+
+    data = [line.split() for line in content[60:]]
+    certified = [line.split() for line in content[40:48] if line]
+    dataf = np.loadtxt(fname, skiprows=60)
+    y,x = dataf.T
+    y = y.astype(int)
+    caty = np.unique(y)
+    f = float(certified[0][-1])
+    R2 = float(certified[2][-1])
+    resstd = float(certified[4][-1])
+    dfbn = int(certified[0][-4])
+    dfwn = int(certified[1][-3])  # dfbn -> dfwn: is this correct?
+    prob = stats.f.sf(f,dfbn,dfwn)
+    return y, x, np.array([f, prob, R2, resstd]), certified, caty
+
+
+
+
+
+def anova_oneway(y, x, seq=0):
+    # new version to match NIST
+    # no generalization or checking of arguments, tested only for 1d
+    yrvs = y[:,np.newaxis] #- min(y)
+    #subtracting mean increases numerical accuracy for NIST test data sets
+    xrvs = x[:,np.newaxis] - x.mean() #for 1d  #- 1e12 trick for 'SmLs09.dat'
+
+    from .try_catdata import groupsstats_dummy
+    meang, varg, xdevmeangr, countg = groupsstats_dummy(yrvs[:, :1],
+                                                        xrvs[:, :1])
+    # TODO: the following does not work as replacement
+    #  from .try_catdata import groupsstats_dummy, groupstatsbin
+    #  gcount, gmean , meanarr, withinvar, withinvararr = groupstatsbin(y, x)
+    sswn = np.dot(xdevmeangr.T,xdevmeangr)
+    ssbn = np.dot((meang-xrvs.mean())**2, countg.T)
+    nobs = yrvs.shape[0]
+    ncat = meang.shape[1]
+    dfbn = ncat - 1
+    dfwn = nobs - ncat
+    msb = ssbn/float(dfbn)
+    msw = sswn/float(dfwn)
+    f = msb/msw
+    prob = stats.f.sf(f,dfbn,dfwn)
+    R2 = (ssbn/(sswn+ssbn))  #R-squared
+    resstd = np.sqrt(msw) #residual standard deviation
+    #print(f, prob)
+
+    def _fix2scalar(z): # return number
+        if np.shape(z) == (1, 1):
+            return z[0, 0]
+        else:
+            return z
+    f, prob, R2, resstd = lmap(_fix2scalar, (f, prob, R2, resstd))
+    return f, prob, R2, resstd
+
+
+def anova_ols(y, x):
+    X = add_constant(data2dummy(x), prepend=False)
+    res = OLS(y, X).fit()
+    return res.fvalue, res.f_pvalue, res.rsquared, np.sqrt(res.mse_resid)
+
+
+
 if __name__ == '__main__':
     print('\n using new ANOVA anova_oneway')
     print('f, prob, R2, resstd')
@@ -19,16 +95,23 @@ if __name__ == '__main__':
         print(fn)
         y, x, cert, certified, caty = getnist(fn)
         res = anova_oneway(y, x)
-        rtol = {'SmLs08.dat': 0.027, 'SmLs07.dat': 0.0017, 'SmLs09.dat': 0.0001
-            }.get(fn, 1e-07)
+        # TODO: figure out why these results are less accurate/precise
+        #  than others
+        rtol = {
+            "SmLs08.dat": .027,
+            "SmLs07.dat": 1.7e-3,
+            "SmLs09.dat": 1e-4
+        }.get(fn, 1e-7)
         np.testing.assert_allclose(np.array(res), cert, rtol=rtol)
+
     print('\n using stats ANOVA f_oneway')
     for fn in filenameli:
         print(fn)
         y, x, cert, certified, caty = getnist(fn)
-        xlist = [x[y == ii] for ii in caty]
+        xlist = [x[y==ii] for ii in caty]
         res = stats.f_oneway(*xlist)
         print(np.array(res) - cert[:2])
+
     print('\n using statsmodels.OLS')
     print('f, prob, R2, resstd')
     for fn in filenameli[:]:
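
For reference, the between/within decomposition that anova_oneway implements can be checked directly against scipy.stats.f_oneway; a small self-contained sketch with made-up data (not the NIST files):

import numpy as np
from scipy import stats

groups = [np.array([5.0, 6.0, 7.0]),
          np.array([4.0, 5.0, 5.5]),
          np.array([8.0, 9.0, 7.5])]
allx = np.concatenate(groups)
grand_mean = allx.mean()

ssbn = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # between groups
sswn = sum(((g - g.mean()) ** 2).sum() for g in groups)             # within groups
dfbn = len(groups) - 1
dfwn = allx.size - len(groups)
f = (ssbn / dfbn) / (sswn / dfwn)
prob = stats.f.sf(f, dfbn, dfwn)

f_ref, p_ref = stats.f_oneway(*groups)
assert np.allclose([f, prob], [f_ref, p_ref])
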
diff --git a/statsmodels/sandbox/regression/ar_panel.py b/statsmodels/sandbox/regression/ar_panel.py
index 6d564a7d9..09489cdfb 100644
--- a/statsmodels/sandbox/regression/ar_panel.py
+++ b/statsmodels/sandbox/regression/ar_panel.py
@@ -1,4 +1,4 @@
-"""Paneldata model with fixed effect (constants) and AR(1) errors
+'''Paneldata model with fixed effect (constants) and AR(1) errors

 checking fast evaluation of groupar1filter
 quickly written to try out grouparfilter without python loops
@@ -18,48 +18,95 @@ asymptotically correct (check)

 Could be extended to AR(p) errors, but then requires panel with larger T

-"""
+'''
+
+
 import numpy as np
 from scipy import optimize
+
 from statsmodels.regression.linear_model import OLS


 class PanelAR1:
-
     def __init__(self, endog, exog=None, groups=None):
+        #take this from a super class, no checking is done here
         nobs = endog.shape[0]
         self.endog = endog
         if exog is not None:
             self.exog = exog
-        self.groups_start = np.diff(groups) != 0
+
+        self.groups_start = (np.diff(groups)!=0)
         self.groups_valid = ~self.groups_start

+    def ar1filter(self, xy, alpha):
+        #print(alpha,)
+        return (xy[1:] - alpha * xy[:-1])[self.groups_valid]
+
+    def fit_conditional(self, alpha):
+        y = self.ar1filter(self.endog, alpha)
+        x = self.ar1filter(self.exog, alpha)
+        res = OLS(y, x).fit()
+        return res.ssr  #res.llf
+
+
+    def fit(self):
+        alpha0 = 0.1 #startvalue
+        func = self.fit_conditional
+        fitres = optimize.fmin(func, alpha0)
+
+        # fit_conditional only returns ssr for now
+        alpha = fitres[0]
+        y = self.ar1filter(self.endog, alpha)
+        x = self.ar1filter(self.exog, alpha)
+        reso = OLS(y, x).fit()
+
+        return fitres, reso

 if __name__ == '__main__':
-    groups = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
-        1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
+
+    #------------ development code for groupar1filter and example
+    groups = np.array([0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,
+                       2,2,2,2,2,2,2,2])
     nobs = len(groups)
     data0 = np.arange(nobs)
-    data = np.arange(1, nobs + 1) - 0.5 * np.arange(nobs
-        ) + 0.1 * np.random.randn(nobs)
-    y00 = 0.5 * np.random.randn(nobs + 1)
-    data = np.arange(nobs) + y00[1:] + 0.2 * y00[:-1] + 0.1 * np.random.randn(
-        nobs)
-    data = y00[1:] + 0.6 * y00[:-1]
+
+    data = np.arange(1,nobs+1) - 0.5*np.arange(nobs) + 0.1*np.random.randn(nobs)
+
+    y00 = 0.5*np.random.randn(nobs+1)
+
+    # I do not think a trend is handled yet
+    data = np.arange(nobs) + y00[1:] + 0.2*y00[:-1] + 0.1*np.random.randn(nobs)
+    #Are these AR(1) or MA(1) errors ???
+    data = y00[1:] + 0.6*y00[:-1] #+ 0.1*np.random.randn(nobs)
+
     group_codes = np.unique(groups)
-    group_dummy = (groups[:, None] == group_codes).astype(int)
-    groups_start = np.diff(groups) != 0
-    groups_valid = np.diff(groups) == 0
+    group_dummy = (groups[:,None] == group_codes).astype(int)
+
+    groups_start = (np.diff(groups)!=0)
+    groups_valid = (np.diff(groups)==0)  #this applies to y with length for AR(1)
+    #could use np.nonzero for index instead
+
     y = data + np.dot(group_dummy, np.array([10, 20, 30]))
     y0 = data0 + np.dot(group_dummy, np.array([10, 20, 30]))
+
     print(groups_valid)
     print(np.diff(y)[groups_valid])
-    alpha = 1
-    print((y0[1:] - alpha * y0[:-1])[groups_valid])
-    alpha = 0.2
-    print((y0[1:] - alpha * y0[:-1] + 0.001)[groups_valid])
+
+    alpha = 1  #test with 1
+    print((y0[1:] - alpha*y0[:-1])[groups_valid])
+    alpha = 0.2  #test with 0.2
+    print((y0[1:] - alpha*y0[:-1] + 0.001)[groups_valid])
+    #this is now AR(1) for each group separately
+
+
+    #------------
+
+    #fitting the example
+
     exog = np.ones(nobs)
     exog = group_dummy
     mod = PanelAR1(y, exog, groups=groups)
+    #mod = PanelAR1(data, exog, groups=groups) #data does not contain different means
+    #print(mod.ar1filter(mod.endog, 1))
     resa, reso = mod.fit()
     print(resa[0], reso.params)
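
The key step in PanelAR1.ar1filter above is the group-aware quasi-differencing: differences that straddle a change in the group label are dropped via groups_valid. A standalone sketch of just that filtering step, with toy data:

import numpy as np

groups = np.array([0, 0, 0, 1, 1, 1, 2, 2])
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0])
alpha = 0.5

groups_valid = np.diff(groups) == 0            # True where both observations share a group
filtered = (y[1:] - alpha * y[:-1])[groups_valid]
print(filtered)                                # only within-group quasi-differences survive
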
diff --git a/statsmodels/sandbox/regression/example_kernridge.py b/statsmodels/sandbox/regression/example_kernridge.py
index e40567401..ab85f2a8d 100644
--- a/statsmodels/sandbox/regression/example_kernridge.py
+++ b/statsmodels/sandbox/regression/example_kernridge.py
@@ -1,26 +1,34 @@
+
+
 import numpy as np
 import matplotlib.pyplot as plt
 from .kernridgeregress_class import GaussProcess, kernel_euclid
-m, k = 50, 4
+
+
+m,k = 50,4
 upper = 6
 scale = 10
-xs = np.linspace(1, upper, m)[:, np.newaxis]
-xs1 = np.sin(xs)
-y1true = np.sum(xs1 + 0.01 * np.sqrt(np.abs(xs1)), 1)[:, np.newaxis]
-y1 = y1true + 0.1 * np.random.randn(m, 1)
-stride = 3
-xstrain = xs1[::stride, :]
-ystrain = y1[::stride, :]
-xstrain = np.r_[xs1[:m / 2, :], xs1[m / 2 + 10:, :]]
-ystrain = np.r_[y1[:m / 2, :], y1[m / 2 + 10:, :]]
-index = np.hstack((np.arange(m / 2), np.arange(m / 2 + 10, m)))
-gp1 = GaussProcess(xstrain, ystrain, kernel=kernel_euclid, ridgecoeff=5 * 
-    0.0001)
+xs = np.linspace(1,upper,m)[:,np.newaxis]
+#xs1 = xs1a*np.ones((1,4)) + 1/(1.0+np.exp(np.random.randn(m,k)))
+#xs1 /= np.std(xs1[::k,:],0)   # normalize scale, could use cov to normalize
+##y1true = np.sum(np.sin(xs1)+np.sqrt(xs1),1)[:,np.newaxis]
+xs1 = np.sin(xs)#[:,np.newaxis]
+y1true = np.sum(xs1 + 0.01*np.sqrt(np.abs(xs1)),1)[:,np.newaxis]
+y1 = y1true + 0.10 * np.random.randn(m,1)
+
+stride = 3 #use only some points as training points, e.g. 2 means every 2nd
+xstrain = xs1[::stride,:]
+ystrain = y1[::stride,:]
+xstrain = np.r_[xs1[:m//2,:], xs1[m//2+10:,:]]
+ystrain = np.r_[y1[:m//2,:], y1[m//2+10:,:]]
+index = np.hstack((np.arange(m//2), np.arange(m//2+10,m)))
+gp1 = GaussProcess(xstrain, ystrain, kernel=kernel_euclid,
+                   ridgecoeff=5*1e-4)
 yhatr1 = gp1.predict(xs1)
 plt.figure()
-plt.plot(y1true, y1, 'bo', y1true, yhatr1, 'r.')
+plt.plot(y1true, y1,'bo',y1true, yhatr1,'r.')
 plt.title('euclid kernel: true y versus noisy y and estimated y')
 plt.figure()
-plt.plot(index, ystrain.ravel(), 'bo-', y1true, 'go-', yhatr1, 'r.-')
-plt.title('euclid kernel: true (green), noisy (blue) and estimated (red) ' +
-    'observations')
+plt.plot(index,ystrain.ravel(),'bo-',y1true,'go-',yhatr1,'r.-')
+plt.title('euclid kernel: true (green), noisy (blue) and estimated (red) '+
+          'observations')
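
The script above goes through the sandbox GaussProcess class; the underlying computation is ordinary kernel ridge regression. A numpy-only sketch of that closed form, using a Gaussian kernel for illustration (the sandbox example uses kernel_euclid; the helper name rbf_kernel below is made up):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 6, 50)[:, None]
y = np.sin(x) + 0.1 * rng.normal(size=(50, 1))

def rbf_kernel(a, b, length=1.0):
    # Gaussian kernel matrix between two sets of points
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length ** 2))

ridge = 5e-4
K = rbf_kernel(x, x)
coef = np.linalg.solve(K + ridge * np.eye(len(x)), y)   # ridge-regularized solve
yhat = rbf_kernel(x, x) @ coef   # in-sample fit; use rbf_kernel(xnew, x) for new points
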
diff --git a/statsmodels/sandbox/regression/gmm.py b/statsmodels/sandbox/regression/gmm.py
index 432f76e30..5eec10279 100644
--- a/statsmodels/sandbox/regression/gmm.py
+++ b/statsmodels/sandbox/regression/gmm.py
@@ -1,4 +1,4 @@
-"""Generalized Method of Moments, GMM, and Two-Stage Least Squares for
+'''Generalized Method of Moments, GMM, and Two-Stage Least Squares for
 instrumental variables IV2SLS


@@ -45,23 +45,30 @@ Unclear
 Author: josef-pktd
 License: BSD (3-clause)

-"""
+'''
+
+
 from statsmodels.compat.python import lrange
+
 import numpy as np
 from scipy import optimize, stats
+
 from statsmodels.tools.numdiff import approx_fprime
-from statsmodels.base.model import Model, LikelihoodModel, LikelihoodModelResults
-from statsmodels.regression.linear_model import OLS, RegressionResults, RegressionResultsWrapper
+from statsmodels.base.model import (Model,
+                                    LikelihoodModel, LikelihoodModelResults)
+from statsmodels.regression.linear_model import (OLS, RegressionResults,
+                                                 RegressionResultsWrapper)
 import statsmodels.stats.sandwich_covariance as smcov
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.tools import _ensure_2d
+
 DEBUG = 0


 def maxabs(x):
-    """just a shortcut to np.abs(x).max()
-    """
-    pass
+    '''just a shortcut to np.abs(x).max()
+    '''
+    return np.abs(x).max()


 class IV2SLS(LikelihoodModel):
@@ -93,15 +100,23 @@ class IV2SLS(LikelihoodModel):
     def __init__(self, endog, exog, instrument=None):
         self.instrument, self.instrument_names = _ensure_2d(instrument, True)
         super(IV2SLS, self).__init__(endog, exog)
+        # where is this supposed to be handled
+        # Note: Greene p.77/78 dof correction is not necessary (because only
+        #       asy results), but most packages do it anyway
         self.df_resid = self.exog.shape[0] - self.exog.shape[1]
+        #self.df_model = float(self.rank - self.k_constant)
         self.df_model = float(self.exog.shape[1] - self.k_constant)

+    def initialize(self):
+        self.wendog = self.endog
+        self.wexog = self.exog
+
     def whiten(self, X):
         """Not implemented"""
         pass

     def fit(self):
-        """estimate model using 2SLS IV regression
+        '''estimate model using 2SLS IV regression

         Returns
         -------
@@ -116,9 +131,34 @@ class IV2SLS(LikelihoodModel):
         Parameter estimates and covariance are correct, but other results
         have not been tested yet, to see whether they apply without changes.

-        """
-        pass
-
+        '''
+        #Greene 5th edt., p.78 section 5.4
+        #move this maybe
+        y,x,z = self.endog, self.exog, self.instrument
+        # TODO: this uses "textbook" calculation, improve linalg
+        ztz = np.dot(z.T, z)
+        ztx = np.dot(z.T, x)
+        self.xhatparams = xhatparams = np.linalg.solve(ztz, ztx)
+        #print 'x.T.shape, xhatparams.shape', x.shape, xhatparams.shape
+        F = xhat = np.dot(z, xhatparams)
+        FtF = np.dot(F.T, F)
+        self.xhatprod = FtF  #store for Hausman specification test
+        Ftx = np.dot(F.T, x)
+        Fty = np.dot(F.T, y)
+        params = np.linalg.solve(FtF, Fty)
+        Ftxinv = np.linalg.inv(Ftx)
+        self.normalized_cov_params = np.dot(Ftxinv.T, np.dot(FtF, Ftxinv))
+
+        lfit = IVRegressionResults(self, params,
+                       normalized_cov_params=self.normalized_cov_params)
+
+        lfit.exog_hat_params = xhatparams
+        lfit.exog_hat = xhat  # TODO: do we want to store this, might be large
+        self._results_ols2nd = OLS(y, xhat).fit()
+
+        return RegressionResultsWrapper(lfit)
+
+    # copied from GLS, because I subclass currently LikelihoodModel and not GLS
     def predict(self, params, exog=None):
         """
         Return linear predicted values from a design matrix.
@@ -138,7 +178,10 @@ class IV2SLS(LikelihoodModel):
         -----
         If the model as not yet been fit, params is not optional.
         """
-        pass
+        if exog is None:
+            exog = self.exog
+
+        return np.dot(exog, params)


 class IVRegressionResults(RegressionResults):
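
A minimal end-to-end sketch of the IV2SLS.fit path restored above, on simulated just-identified data (one instrument per regressor); purely illustrative:

import numpy as np
from statsmodels.sandbox.regression.gmm import IV2SLS

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                           # instrument
u = rng.normal(size=n)                           # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)       # endogenous regressor
y = 1.0 + 2.0 * x + u

exog = np.column_stack([np.ones(n), x])
instrument = np.column_stack([np.ones(n), z])
res = IV2SLS(y, exog, instrument=instrument).fit()
print(res.params)                                # roughly [1, 2]
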
@@ -158,17 +201,55 @@ class IVRegressionResults(RegressionResults):
     RegressionResults
     """

+    @cache_readonly
+    def fvalue(self):
+        const_idx = self.model.data.const_idx
+        # if constant is implicit or missing, return nan see #2444, #3544
+        if const_idx is None:
+            return np.nan
+        else:
+            k_vars = len(self.params)
+            restriction = np.eye(k_vars)
+            idx_noconstant = lrange(k_vars)
+            del idx_noconstant[const_idx]
+            fval = self.f_test(restriction[idx_noconstant]).fvalue # without constant
+            return fval
+
+
     def spec_hausman(self, dof=None):
-        """Hausman's specification test
+        '''Hausman's specification test

         See Also
         --------
         spec_hausman : generic function for Hausman's specification test

-        """
-        pass
+        '''
+        #use normalized cov_params for OLS
+
+        endog, exog = self.model.endog, self.model.exog
+        resols = OLS(endog, exog).fit()
+        normalized_cov_params_ols = resols.model.normalized_cov_params
+        # Stata `ivendog` does not use df correction for se
+        #se2 = resols.mse_resid #* resols.df_resid * 1. / len(endog)
+        se2 = resols.ssr / len(endog)
+
+        params_diff = self.params - resols.params
+
+        cov_diff = np.linalg.pinv(self.model.xhatprod) - normalized_cov_params_ols
+        #TODO: the following is very inefficient, solves problem (svd) twice
+        #use linalg.lstsq or svd directly
+        #cov_diff will very often be indefinite (singular)
+        if not dof:
+            dof = np.linalg.matrix_rank(cov_diff)
+        cov_diffpinv = np.linalg.pinv(cov_diff)
+        H = np.dot(params_diff, np.dot(cov_diffpinv, params_diff))/se2
+        pval = stats.chi2.sf(H, dof)
+
+        return H, pval, dof
+

-    def summary(self, yname=None, xname=None, title=None, alpha=0.05):
+# copied from regression results with small changes, no llf
+    def summary(self, yname=None, xname=None, title=None, alpha=.05):
         """Summarize the Regression Results

         Parameters
@@ -194,10 +275,94 @@ class IVRegressionResults(RegressionResults):
         statsmodels.iolib.summary.Summary : class to hold summary
             results
         """
-        pass

+        #TODO: import where we need it (for now), add as cached attributes
+        from statsmodels.stats.stattools import (jarque_bera,
+                omni_normtest, durbin_watson)
+        jb, jbpv, skew, kurtosis = jarque_bera(self.wresid)
+        omni, omnipv = omni_normtest(self.wresid)
+
+        #TODO: reuse condno from somewhere else ?
+        #condno = np.linalg.cond(np.dot(self.wexog.T, self.wexog))
+        wexog = self.model.wexog
+        eigvals = np.linalg.eigvalsh(np.dot(wexog.T, wexog))
+        eigvals = np.sort(eigvals) #in increasing order
+        condno = np.sqrt(eigvals[-1]/eigvals[0])
+
+        # TODO: check what is valid.
+        # box-pierce, breusch-pagan, durbin's h are not with endogenous on rhs
+        # use Cumby Huizinga 1992 instead
+        self.diagn = dict(jb=jb, jbpv=jbpv, skew=skew, kurtosis=kurtosis,
+                          omni=omni, omnipv=omnipv, condno=condno,
+                          mineigval=eigvals[0])
+
+        #TODO not used yet
+        #diagn_left_header = ['Models stats']
+        #diagn_right_header = ['Residual stats']
+
+        #TODO: requiring list/iterable is a bit annoying
+        #need more control over formatting
+        #TODO: default do not work if it's not identically spelled
+
+        top_left = [('Dep. Variable:', None),
+                    ('Model:', None),
+                    ('Method:', ['Two Stage']),
+                    ('', ['Least Squares']),
+                    ('Date:', None),
+                    ('Time:', None),
+                    ('No. Observations:', None),
+                    ('Df Residuals:', None), #[self.df_resid]), #TODO: spelling
+                    ('Df Model:', None), #[self.df_model])
+                    ]
+
+        top_right = [('R-squared:', ["%#8.3f" % self.rsquared]),
+                     ('Adj. R-squared:', ["%#8.3f" % self.rsquared_adj]),
+                     ('F-statistic:', ["%#8.4g" % self.fvalue] ),
+                     ('Prob (F-statistic):', ["%#6.3g" % self.f_pvalue]),
+                     #('Log-Likelihood:', None), #["%#6.4g" % self.llf]),
+                     #('AIC:', ["%#8.4g" % self.aic]),
+                     #('BIC:', ["%#8.4g" % self.bic])
+                     ]
+
+        diagn_left = [('Omnibus:', ["%#6.3f" % omni]),
+                      ('Prob(Omnibus):', ["%#6.3f" % omnipv]),
+                      ('Skew:', ["%#6.3f" % skew]),
+                      ('Kurtosis:', ["%#6.3f" % kurtosis])
+                      ]
+
+        diagn_right = [('Durbin-Watson:', ["%#8.3f" % durbin_watson(self.wresid)]),
+                       ('Jarque-Bera (JB):', ["%#8.3f" % jb]),
+                       ('Prob(JB):', ["%#8.3g" % jbpv]),
+                       ('Cond. No.', ["%#8.3g" % condno])
+                       ]
+
+
+        if title is None:
+            title = self.model.__class__.__name__ + ' ' + "Regression Results"
+
+        #create summary table instance
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right,
+                          yname=yname, xname=xname, title=title)
+        smry.add_table_params(self, yname=yname, xname=xname, alpha=alpha,
+                             use_t=True)
+
+        smry.add_table_2cols(self, gleft=diagn_left, gright=diagn_right,
+                          yname=yname, xname=xname,
+                          title="")
+
+
+
+        return smry
+
+
+
+
+############# classes for Generalized Method of Moments GMM
+
+_gmm_options = '''\

-_gmm_options = """
 Options for GMM
 ---------------

@@ -250,11 +415,10 @@ The additional option is
    TODO: do we want to have a different default after `onestep`?


-"""
-
+'''

 class GMM(Model):
-    """
+    '''
     Class for estimation by Generalized Method of Moments

     needs to be subclassed, where the subclass defined the moment conditions
@@ -304,19 +468,23 @@ class GMM(Model):
     currently onestep (maxiter=0) still produces an updated estimate of bse
     and cov_params.

-    """
+    '''
+
     results_class = 'GMMResults'

     def __init__(self, endog, exog, instrument, k_moms=None, k_params=None,
-        missing='none', **kwds):
-        """
+                 missing='none', **kwds):
+        '''
         maybe drop and use mixin instead

         TODO: GMM does not really care about the data, just the moment conditions
-        """
-        instrument = self._check_inputs(instrument, endog)
-        super(GMM, self).__init__(endog, exog, missing=missing, instrument=
-            instrument)
+        '''
+        instrument = self._check_inputs(instrument, endog) # attaches if needed
+        super(GMM, self).__init__(endog, exog, missing=missing,
+                instrument=instrument)
+#         self.endog = endog
+#         self.exog = exog
+#         self.instrument = instrument
         self.nobs = endog.shape[0]
         if k_moms is not None:
             self.nmoms = k_moms
@@ -324,14 +492,41 @@ class GMM(Model):
             self.nmoms = instrument.shape[1]
         else:
             self.nmoms = np.nan
+
         if k_params is not None:
             self.k_params = k_params
         elif instrument is not None:
             self.k_params = exog.shape[1]
         else:
             self.k_params = np.nan
+
         self.__dict__.update(kwds)
-        self.epsilon_iter = 1e-06
+        self.epsilon_iter = 1e-6
+
+    def _check_inputs(self, instrument, endog):
+        if instrument is not None:
+            offset = np.asarray(instrument)
+            if offset.shape[0] != endog.shape[0]:
+                raise ValueError("instrument is not the same length as endog")
+        return instrument
+
+    def _fix_param_names(self, params, param_names=None):
+        # TODO: this is a temporary fix, needs a cleaner solution
+        xnames = self.data.xnames
+
+        if param_names is not None:
+            if len(params) == len(param_names):
+                self.data.xnames = param_names
+            else:
+                raise ValueError('param_names has the wrong length')
+
+        else:
+            if len(params) < len(xnames):
+                # cut in front for poisson multiplicative
+                self.data.xnames = xnames[-len(params):]
+            elif len(params) > len(xnames):
+                # use generic names
+                self.data.xnames = ['p%2d' % i for i in range(len(params))]

     def set_param_names(self, param_names, k_params=None):
         """set the parameter names in the model
@@ -346,12 +541,22 @@ class GMM(Model):
             If k_params is not None, then it will also set the k_params
             attribute.
         """
-        pass
+        if k_params is not None:
+            self.k_params = k_params
+        else:
+            k_params = self.k_params
+
+        if k_params == len(param_names):
+            self.data.xnames = param_names
+        else:
+            raise ValueError('param_names has the wrong length')
+

     def fit(self, start_params=None, maxiter=10, inv_weights=None,
-        weights_method='cov', wargs=(), has_optimal_weights=True,
-        optim_method='bfgs', optim_args=None):
-        """
+                  weights_method='cov', wargs=(),
+                  has_optimal_weights=True,
+                  optim_method='bfgs', optim_args=None):
+        '''
         Estimate parameters using GMM and return GMMResults

         TODO: weight and covariance arguments still need to be made consistent
@@ -429,12 +634,79 @@ class GMM(Model):
         The same options as for weight matrix also apply to the calculation of
         the estimate of the covariance matrix of the parameter estimates.

-        """
-        pass
-
-    def fitgmm(self, start, weights=None, optim_method='bfgs', optim_args=None
-        ):
-        """estimate parameters using GMM
+        '''
+        # TODO: add check for correct wargs keys
+        #       currently a misspelled key is not detected,
+        #       because I'm still adding options
+
+        # TODO: check repeated calls to fit with different options
+        #       arguments are dictionaries, i.e. mutable
+        #       unit test if anything  is stale or spilled over.
+
+        #bug: where does start come from ???
+        start = start_params  # alias for renaming
+        if start is None:
+            start = self.fitstart() #TODO: temporary hack
+
+        if inv_weights is None:
+            inv_weights
+
+        if optim_args is None:
+            optim_args = {}
+        if 'disp' not in optim_args:
+            optim_args['disp'] = 1
+
+        if maxiter == 0 or maxiter == 'cue':
+            if inv_weights is not None:
+                weights = np.linalg.pinv(inv_weights)
+            else:
+                # let start_weights handle the inv=False for maxiter=0
+                weights = self.start_weights(inv=False)
+
+            params = self.fitgmm(start, weights=weights,
+                                 optim_method=optim_method, optim_args=optim_args)
+            weights_ = weights  # temporary alias used in jval
+        else:
+            params, weights = self.fititer(start,
+                                           maxiter=maxiter,
+                                           start_invweights=inv_weights,
+                                           weights_method=weights_method,
+                                           wargs=wargs,
+                                           optim_method=optim_method,
+                                           optim_args=optim_args)
+            # TODO weights returned by fititer is inv_weights - not true anymore
+            # weights_ currently not necessary and not used anymore
+            weights_ = np.linalg.pinv(weights)
+
+        if maxiter == 'cue':
+            #we have params from maxiter= 0 as starting value
+            # TODO: need to give weights options to gmmobjective_cu
+            params = self.fitgmm_cu(params,
+                                     optim_method=optim_method,
+                                     optim_args=optim_args)
+            # weights is stored as attribute
+            weights = self._weights_cu
+
+        #TODO: use Bunch instead ?
+        options_other = {'weights_method':weights_method,
+                         'has_optimal_weights':has_optimal_weights,
+                         'optim_method':optim_method}
+
+        # check that we have the right number of xnames
+        self._fix_param_names(params, param_names=None)
+        results = results_class_dict[self.results_class](
+                                        model = self,
+                                        params = params,
+                                        weights = weights,
+                                        wargs = wargs,
+                                        options_other = options_other,
+                                        optim_args = optim_args)
+
+        self.results = results # FIXME: remove, still keeping it temporarily
+        return results
+
+    def fitgmm(self, start, weights=None, optim_method='bfgs', optim_args=None):
+        '''estimate parameters using GMM

         Parameters
         ----------
@@ -456,11 +728,49 @@ class GMM(Model):

         uses scipy.optimize.fmin

-        """
-        pass
+        '''
+##        if not fixed is None:  #fixed not defined in this version
+##            raise NotImplementedError
+
+        # TODO: should start_weights only be in `fit`
+        if weights is None:
+            weights = self.start_weights(inv=False)
+
+        if optim_args is None:
+            optim_args = {}
+
+        if optim_method == 'nm':
+            optimizer = optimize.fmin
+        elif optim_method == 'bfgs':
+            optimizer = optimize.fmin_bfgs
+            # TODO: add score
+            optim_args['fprime'] = self.score #lambda params: self.score(params, weights)
+        elif optim_method == 'ncg':
+            optimizer = optimize.fmin_ncg
+            optim_args['fprime'] = self.score
+        elif optim_method == 'cg':
+            optimizer = optimize.fmin_cg
+            optim_args['fprime'] = self.score
+        elif optim_method == 'fmin_l_bfgs_b':
+            optimizer = optimize.fmin_l_bfgs_b
+            optim_args['fprime'] = self.score
+        elif optim_method == 'powell':
+            optimizer = optimize.fmin_powell
+        elif optim_method == 'slsqp':
+            optimizer = optimize.fmin_slsqp
+        else:
+            raise ValueError('optimizer method not available')
+
+        if DEBUG:
+            print(np.linalg.det(weights))
+
+        #TODO: add other optimization options and results
+        return optimizer(self.gmmobjective, start, args=(weights,),
+                         **optim_args)
+

     def fitgmm_cu(self, start, optim_method='bfgs', optim_args=None):
-        """estimate parameters using continuously updating GMM
+        '''estimate parameters using continuously updating GMM

         Parameters
         ----------
@@ -478,15 +788,32 @@ class GMM(Model):

         uses scipy.optimize.fmin

-        """
-        pass
+        '''
+##        if not fixed is None:  #fixed not defined in this version
+##            raise NotImplementedError
+
+        if optim_args is None:
+            optim_args = {}
+
+        if optim_method == 'nm':
+            optimizer = optimize.fmin
+        elif optim_method == 'bfgs':
+            optimizer = optimize.fmin_bfgs
+            optim_args['fprime'] = self.score_cu
+        elif optim_method == 'ncg':
+            optimizer = optimize.fmin_ncg
+        else:
+            raise ValueError('optimizer method not available')
+
+        #TODO: add other optimization options and results
+        return optimizer(self.gmmobjective_cu, start, args=(), **optim_args)

     def start_weights(self, inv=True):
         """Create identity matrix for starting weights"""
-        pass
+        return np.eye(self.nmoms)

     def gmmobjective(self, params, weights):
-        """
+        '''
         objective function for GMM minimization

         Parameters
@@ -501,11 +828,16 @@ class GMM(Model):
         jval : float
             value of objective function

-        """
-        pass
+        '''
+        moms = self.momcond_mean(params)
+        return np.dot(np.dot(moms, weights), moms)
+        #moms = self.momcond(params)
+        #return np.dot(np.dot(moms.mean(0),weights), moms.mean(0))

-    def gmmobjective_cu(self, params, weights_method='cov', wargs=()):
-        """
+
+    def gmmobjective_cu(self, params, weights_method='cov',
+                        wargs=()):
+        '''
         objective function for continuously updating  GMM minimization

         Parameters
@@ -518,12 +850,19 @@ class GMM(Model):
         jval : float
             value of objective function

-        """
-        pass
+        '''
+        moms = self.momcond(params)
+        inv_weights = self.calc_weightmatrix(moms, weights_method=weights_method,
+                                             wargs=wargs)
+        weights = np.linalg.pinv(inv_weights)
+        self._weights_cu = weights  # store if we need it later
+        return np.dot(np.dot(moms.mean(0), weights), moms.mean(0))
+

     def fititer(self, start, maxiter=2, start_invweights=None,
-        weights_method='cov', wargs=(), optim_method='bfgs', optim_args=None):
-        """iterative estimation with updating of optimal weighting matrix
+                    weights_method='cov', wargs=(), optim_method='bfgs',
+                    optim_args=None):
+        '''iterative estimation with updating of optimal weighting matrix

         stopping criteria are maxiter or change in parameter estimate less
         than self.epsilon_iter, with default 1e-6.
@@ -555,12 +894,46 @@ class GMM(Model):



-        """
-        pass
+        '''
+        self.history = []
+        momcond = self.momcond
+
+        if start_invweights is None:
+            w = self.start_weights(inv=True)
+        else:
+            w = start_invweights
+
+        #call fitgmm function
+        #args = (self.endog, self.exog, self.instrument)
+        #args is not used in the method version
+        winv_new = w
+        for it in range(maxiter):
+            winv = winv_new
+            w = np.linalg.pinv(winv)
+            #this is still calling function not method
+##            resgmm = fitgmm(momcond, (), start, weights=winv, fixed=None,
+##                            weightsoptimal=False)
+            resgmm = self.fitgmm(start, weights=w, optim_method=optim_method,
+                                 optim_args=optim_args)
+
+            moms = momcond(resgmm)
+            # the following is S = cov_moments
+            winv_new = self.calc_weightmatrix(moms,
+                                              weights_method=weights_method,
+                                              wargs=wargs, params=resgmm)
+
+            if it > 2 and maxabs(resgmm - start) < self.epsilon_iter:
+                #check rule for early stopping
+                # TODO: set has_optimal_weights = True
+                break
+
+            start = resgmm
+        return resgmm, w
+

     def calc_weightmatrix(self, moms, weights_method='cov', wargs=(),
-        params=None):
-        """
+                          params=None):
+        '''
         calculate omega or the weighting matrix

         Parameters
@@ -599,18 +972,108 @@ class GMM(Model):
         Greene
         Hansen, Bruce

-        """
-        pass
+        '''
+        nobs, k_moms = moms.shape
+        # TODO: wargs are tuple or dict ?
+        if DEBUG:
+            print(' momcov wargs', wargs)
+
+        centered = not ('centered' in wargs and not wargs['centered'])
+        if not centered:
+            # caller does not want centered moment conditions
+            moms_ = moms
+        else:
+            moms_ = moms - moms.mean()
+
+        # TODO: store this outside to avoid doing this inside optimization loop
+        # TODO: subclasses need to be able to add weights_methods, and remove
+        #       IVGMM can have homoscedastic (OLS),
+        #       some options will not make sense in some cases
+        #       possible add all here and allow subclasses to define a list
+        # TODO: should other weights_methods also have `ddof`
+        if weights_method == 'cov':
+            w = np.dot(moms_.T, moms_)
+            if 'ddof' in wargs:
+                # caller requests degrees of freedom correction
+                if wargs['ddof'] == 'k_params':
+                    w /= (nobs - self.k_params)
+                else:
+                    if DEBUG:
+                        print(' momcov ddof', wargs['ddof'])
+                    w /= (nobs - wargs['ddof'])
+            else:
+                # default: divide by nobs
+                w /= nobs
+
+        elif weights_method == 'flatkernel':
+            #uniform cut-off window
+            # This was a trial version, can use HAC with flatkernel
+            if 'maxlag' not in wargs:
+                raise ValueError('flatkernel requires maxlag')
+
+            maxlag = wargs['maxlag']
+            h = np.ones(maxlag + 1)
+            w = np.dot(moms_.T, moms_)/nobs
+            for i in range(1,maxlag+1):
+                w += (h[i] * np.dot(moms_[i:].T, moms_[:-i]) / (nobs-i))
+
+        elif weights_method == 'hac':
+            maxlag = wargs['maxlag']
+            if 'kernel' in wargs:
+                weights_func = wargs['kernel']
+            else:
+                weights_func = smcov.weights_bartlett
+                wargs['kernel'] = weights_func
+
+            w = smcov.S_hac_simple(moms_, nlags=maxlag,
+                                   weights_func=weights_func)
+            w /= nobs #(nobs - self.k_params)
+
+        elif weights_method == 'iid':
+            # only when we have instruments and residual mom = Z * u
+            # TODO: problem we do not have params in argument
+            #       I cannot keep everything in here w/o params as argument
+            u = self.get_error(params)
+
+            if centered:
+                # Note: I'm not centering instruments,
+                #    should not we always center u? Ok, with centered as default
+                u -= u.mean(0)  #demean inplace, we do not need original u
+
+            instrument = self.instrument
+            w = np.dot(instrument.T, instrument).dot(np.dot(u.T, u)) / nobs
+            if 'ddof' in wargs:
+                # caller requests degrees of freedom correction
+                if wargs['ddof'] == 'k_params':
+                    w /= (nobs - self.k_params)
+                else:
+                    # assume ddof is a number
+                    if DEBUG:
+                        print(' momcov ddof', wargs['ddof'])
+                    w /= (nobs - wargs['ddof'])
+            else:
+                # default: divide by nobs
+                w /= nobs
+
+        else:
+            raise ValueError('weight method not available')
+
+        return w
+

     def momcond_mean(self, params):
-        """
+        '''
         mean of moment conditions,

-        """
-        pass
+        '''
+
+        momcond = self.momcond(params)
+        self.nobs_moms, self.k_moms = momcond.shape
+        return momcond.mean(0)
+

-    def gradient_momcond(self, params, epsilon=0.0001, centered=True):
-        """gradient of moment conditions
+    def gradient_momcond(self, params, epsilon=1e-4, centered=True):
+        '''gradient of moment conditions

         Parameters
         ----------
@@ -626,41 +1089,88 @@ class GMM(Model):
         TODO: looks like not used yet
               missing argument `weights`

-        """
-        pass
+        '''
+
+        momcond = self.momcond_mean
+
+        # TODO: approx_fprime has centered keyword
+        if centered:
+            gradmoms = (approx_fprime(params, momcond, epsilon=epsilon) +
+                    approx_fprime(params, momcond, epsilon=-epsilon))/2
+        else:
+            gradmoms = approx_fprime(params, momcond, epsilon=epsilon)
+
+        return gradmoms

     def score(self, params, weights, epsilon=None, centered=True):
         """Score"""
-        pass
+        deriv = approx_fprime(params, self.gmmobjective, args=(weights,),
+                              centered=centered, epsilon=epsilon)
+
+        return deriv

     def score_cu(self, params, epsilon=None, centered=True):
         """Score cu"""
-        pass
+        deriv = approx_fprime(params, self.gmmobjective_cu, args=(),
+                              centered=centered, epsilon=epsilon)
+
+        return deriv


+# TODO: wrong superclass, I want tvalues, ... right now
 class GMMResults(LikelihoodModelResults):
-    """just a storage class right now"""
+    '''just a storage class right now'''
+
     use_t = False

     def __init__(self, *args, **kwds):
         self.__dict__.update(kwds)
+
         self.nobs = self.model.nobs
         self.df_resid = np.inf
+
         self.cov_params_default = self._cov_params()

     @cache_readonly
     def q(self):
         """Objective function at params"""
-        pass
+        return self.model.gmmobjective(self.params, self.weights)

     @cache_readonly
     def jval(self):
         """nobs_moms attached by momcond_mean"""
-        pass
+        return self.q * self.model.nobs_moms
+
+    def _cov_params(self, **kwds):
+        #TODO: add options ???
+        # this should use by default whatever options have been specified in
+        # fit
+
+        # TODO: do not do this when we want to change options
+#         if hasattr(self, '_cov_params'):
+#             #replace with decorator later
+#             return self._cov_params
+
+        # set defaults based on fit arguments
+        if 'wargs' not in kwds:
+            # Note: we do not check the keys in wargs, use either all or nothing
+            kwds['wargs'] = self.wargs
+        if 'weights_method' not in kwds:
+            kwds['weights_method'] = self.options_other['weights_method']
+        if 'has_optimal_weights' not in kwds:
+            kwds['has_optimal_weights'] = self.options_other['has_optimal_weights']

-    def calc_cov_params(self, moms, gradmoms, weights=None, use_weights=
-        False, has_optimal_weights=True, weights_method='cov', wargs=()):
-        """calculate covariance of parameter estimates
+        gradmoms = self.model.gradient_momcond(self.params)
+        moms = self.model.momcond(self.params)
+        covparams = self.calc_cov_params(moms, gradmoms, **kwds)
+
+        return covparams
+
+
+    def calc_cov_params(self, moms, gradmoms, weights=None, use_weights=False,
+                                              has_optimal_weights=True,
+                                              weights_method='cov', wargs=()):
+        '''calculate covariance of parameter estimates

         not all options tried out yet

@@ -672,17 +1182,51 @@ class GMMResults(LikelihoodModelResults):
         (API Note: The latter assumption could be changed if we allow for
         has_optimal_weights=None.)

-        """
-        pass
+        '''
+
+        nobs = moms.shape[0]
+
+        if weights is None:
+            #omegahat = self.model.calc_weightmatrix(moms, method=method, wargs=wargs)
+            #has_optimal_weights = True
+            #add other options, Barzen, ...  longrun var estimators
+            # TODO: this might still be inv_weights after fititer
+            weights = self.weights
+        else:
+            pass
+            #omegahat = weights   #2 different names used,
+            #TODO: this is wrong, I need an estimate for omega
+
+        if use_weights:
+            omegahat = weights
+        else:
+            omegahat = self.model.calc_weightmatrix(
+                                                moms,
+                                                weights_method=weights_method,
+                                                wargs=wargs,
+                                                params=self.params)
+
+
+        if has_optimal_weights:
+            # TODO: make has_optimal_weights depend on convergence or iter > 2
+            cov = np.linalg.inv(np.dot(gradmoms.T,
+                                    np.dot(np.linalg.inv(omegahat), gradmoms)))
+        else:
+            gw = np.dot(gradmoms.T, weights)
+            gwginv = np.linalg.inv(np.dot(gw, gradmoms))
+            cov = np.dot(np.dot(gwginv, np.dot(np.dot(gw, omegahat), gw.T)), gwginv)
+            #cov /= nobs
+
+        return cov/nobs

     @property
     def bse_(self):
-        """standard error of the parameter estimates
-        """
-        pass
+        '''standard error of the parameter estimates
+        '''
+        return self.get_bse()

     def get_bse(self, **kwds):
-        """standard error of the parameter estimates with options
+        '''standard error of the parameter estimates with options

         Parameters
         ----------
@@ -694,19 +1238,24 @@ class GMMResults(LikelihoodModelResults):
         bse : ndarray
             estimated standard error of parameter estimates

-        """
-        pass
+        '''
+        return np.sqrt(np.diag(self.cov_params(**kwds)))

     def jtest(self):
-        """overidentification test
+        '''overidentification test

         I guess this is missing a division by nobs,
         what's the normalization in jval ?
-        """
-        pass
+        '''
+
+        jstat = self.jval
+        nparams = self.params.size #self.nparams
+        df = self.model.nmoms - nparams
+        return jstat, stats.chi2.sf(jstat, df), df
+

     def compare_j(self, other):
-        """overidentification test for comparing two nested gmm estimates
+        '''overidentification test for comparing two nested gmm estimates

         This assumes that some moment restrictions have been dropped in one
         of the GMM estimates relative to the other.
@@ -719,10 +1268,21 @@ class GMMResults(LikelihoodModelResults):

         TODO: Check in which cases Stata programs use the same weigths

-        """
-        pass
-
-    def summary(self, yname=None, xname=None, title=None, alpha=0.05):
+        '''
+        jstat1 = self.jval
+        k_moms1 = self.model.nmoms
+        jstat2 = other.jval
+        k_moms2 = other.model.nmoms
+        jdiff = jstat1 - jstat2
+        df = k_moms1 - k_moms2
+        if df < 0:
+            # possible nested in other way, TODO allow this or not
+            # flip sign instead of absolute
+            df = - df
+            jdiff = - jdiff
+        return jdiff, stats.chi2.sf(jdiff, df), df
+
+    def summary(self, yname=None, xname=None, title=None, alpha=.05):
         """Summarize the Regression Results

         Parameters
@@ -748,11 +1308,48 @@ class GMMResults(LikelihoodModelResults):
         statsmodels.iolib.summary.Summary : class to hold summary
             results
         """
-        pass
+        #TODO: add a summary text for options that have been used
+
+        jvalue, jpvalue, jdf = self.jtest()
+
+        top_left = [('Dep. Variable:', None),
+                    ('Model:', None),
+                    ('Method:', ['GMM']),
+                    ('Date:', None),
+                    ('Time:', None),
+                    ('No. Observations:', None),
+                    #('Df Residuals:', None), #[self.df_resid]), #TODO: spelling
+                    #('Df Model:', None), #[self.df_model])
+                    ]
+
+        top_right = [#('R-squared:', ["%#8.3f" % self.rsquared]),
+                     #('Adj. R-squared:', ["%#8.3f" % self.rsquared_adj]),
+                     ('Hansen J:', ["%#8.4g" % jvalue] ),
+                     ('Prob (Hansen J):', ["%#6.3g" % jpvalue]),
+                     #('F-statistic:', ["%#8.4g" % self.fvalue] ),
+                     #('Prob (F-statistic):', ["%#6.3g" % self.f_pvalue]),
+                     #('Log-Likelihood:', None), #["%#6.4g" % self.llf]),
+                     #('AIC:', ["%#8.4g" % self.aic]),
+                     #('BIC:', ["%#8.4g" % self.bic])
+                     ]
+
+        if title is None:
+            title = self.model.__class__.__name__ + ' ' + "Results"
+
+        # create summary table instance
+        from statsmodels.iolib.summary import Summary
+        smry = Summary()
+        smry.add_table_2cols(self, gleft=top_left, gright=top_right,
+                             yname=yname, xname=xname, title=title)
+        smry.add_table_params(self, yname=yname, xname=xname, alpha=alpha,
+                              use_t=self.use_t)
+
+        return smry
+


 class IVGMM(GMM):
-    """
+    '''
     Basic class for instrumental variables estimation using GMM

     A linear function for the conditional mean is defined as default but the
@@ -764,28 +1361,38 @@ class IVGMM(GMM):
     LinearIVGMM
     NonlinearIVGMM

-    """
+    '''
+
     results_class = 'IVGMMResults'

     def fitstart(self):
         """Create array of zeros"""
-        pass
+        return np.zeros(self.exog.shape[1])

     def start_weights(self, inv=True):
         """Starting weights"""
-        pass
+        zz = np.dot(self.instrument.T, self.instrument)
+        nobs = self.instrument.shape[0]
+        if inv:
+            return zz / nobs
+        else:
+            return np.linalg.pinv(zz / nobs)

     def get_error(self, params):
         """Get error at params"""
-        pass
+        return self.endog - self.predict(params)

     def predict(self, params, exog=None):
         """Get prediction at params"""
-        pass
+        if exog is None:
+            exog = self.exog
+
+        return np.dot(exog, params)

     def momcond(self, params):
         """Error times instrument"""
-        pass
+        instrument = self.instrument
+        return instrument * self.get_error(params)[:, None]


 class LinearIVGMM(IVGMM):
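
A usage sketch of LinearIVGMM as restored here, estimating an over-identified linear IV model by two-step GMM (the inner step is closed form); data are simulated for illustration:

import numpy as np
from statsmodels.sandbox.regression.gmm import LinearIVGMM

rng = np.random.default_rng(1)
n = 500
z1, z2 = rng.normal(size=n), rng.normal(size=n)         # two instruments
u = rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 2.0 * x + u

exog = np.column_stack([np.ones(n), x])
instrument = np.column_stack([np.ones(n), z1, z2])      # over-identified: 3 moments, 2 params

mod = LinearIVGMM(y, exog, instrument)
res = mod.fit(maxiter=2)          # two-step GMM, weight matrix updated once
print(res.params)                 # roughly [1, 2]
print(res.jtest())                # Hansen J test of the over-identifying restriction
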
@@ -820,7 +1427,7 @@ class LinearIVGMM(IVGMM):
     """

     def fitgmm(self, start, weights=None, optim_method=None, **kwds):
-        """estimate parameters using GMM for linear model
+        '''estimate parameters using GMM for linear model

         Uses closed form expression instead of nonlinear optimizers

@@ -844,8 +1451,55 @@ class LinearIVGMM(IVGMM):
         paramest : ndarray
             estimated parameters

-        """
-        pass
+        '''
+##        if not fixed is None:  #fixed not defined in this version
+##            raise NotImplementedError
+
+        # TODO: should start_weights only be in `fit`
+        if weights is None:
+            weights = self.start_weights(inv=False)
+
+        y, x, z = self.endog, self.exog, self.instrument
+
+        zTx = np.dot(z.T, x)
+        zTy = np.dot(z.T, y)
+        # normal equation, solved with pinv
+        part0 = zTx.T.dot(weights)
+        part1 = part0.dot(zTx)
+        part2 = part0.dot(zTy)
+        params = np.linalg.pinv(part1).dot(part2)
+
+        return params
+
+
+    def predict(self, params, exog=None):
+        if exog is None:
+            exog = self.exog
+
+        return np.dot(exog, params)
+
+
+    def gradient_momcond(self, params, **kwds):
+        # **kwds for compatibility not used
+
+        x, z = self.exog, self.instrument
+        gradmoms = -np.dot(z.T, x) / self.nobs
+
+        return gradmoms
+
+    def score(self, params, weights, **kwds):
+        # **kwds for compatibility, not used
+        # Note: I could use general formula with gradient_momcond instead
+
+        x, z = self.exog, self.instrument
+        nobs = z.shape[0]
+
+        u = self.get_error(params)
+        score = -2 * np.dot(x.T, z).dot(weights.dot(np.dot(z.T, u)))
+        score /= nobs * nobs
+
+        return score
+


 class NonlinearIVGMM(IVGMM):
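
For the nonlinear variant, the conditional mean is passed in as a function of (params, exog). A hedged sketch with an exponential mean; the function name expmean and the simulated data are made up for illustration:

import numpy as np
from statsmodels.sandbox.regression.gmm import NonlinearIVGMM

def expmean(params, exog):
    # conditional mean function handed to NonlinearIVGMM
    return np.exp(np.dot(exog, params))

rng = np.random.default_rng(2)
n = 1000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = 0.5 * z1 + 0.5 * z2 + rng.normal(size=n)
exog = np.column_stack([np.ones(n), x])
instrument = np.column_stack([np.ones(n), z1, z2])
y = np.exp(exog @ np.array([0.1, 0.3])) + 0.1 * rng.normal(size=n)

mod = NonlinearIVGMM(y, exog, instrument, expmean)
res = mod.fit(maxiter=2)
print(res.params)                 # roughly [0.1, 0.3]
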
@@ -889,33 +1543,92 @@ class NonlinearIVGMM(IVGMM):

     TODO: check required signature of jac_error and jac_func
     """
+    # This should be reversed:
+    # NonlinearIVGMM should be the general case, with LinearIVGMM as a special case (fit, predict)
+
+
+    def fitstart(self):
+        #might not make sense for more general functions
+        return np.zeros(self.exog.shape[1])
+

     def __init__(self, endog, exog, instrument, func, **kwds):
         self.func = func
         super(NonlinearIVGMM, self).__init__(endog, exog, instrument, **kwds)


+    def predict(self, params, exog=None):
+        if exog is None:
+            exog = self.exog
+
+        return self.func(params, exog)
+
+    #----------  the following a semi-general versions,
+    # TODO: move to higher class after testing
+
+    def jac_func(self, params, weights, args=None, centered=True, epsilon=None):
+
+        # TODO: Why are there weights in the signature - copy-paste error?
+        deriv = approx_fprime(params, self.func, args=(self.exog,),
+                              centered=centered, epsilon=epsilon)
+
+        return deriv
+
+
+    def jac_error(self, params, weights, args=None, centered=True,
+                   epsilon=None):
+
+        jac_func = self.jac_func(params, weights, args=args, centered=centered,
+                                 epsilon=epsilon)
+
+        return -jac_func
+
+
+    def score(self, params, weights, **kwds):
+        # **kwds for compatibility not used
+        # Note: I could use general formula with gradient_momcond instead
+
+        z = self.instrument
+        nobs = z.shape[0]
+
+        jac_u = self.jac_error(params, weights, args=None, epsilon=None,
+                               centered=True)
+        x = -jac_u  # alias, plays the same role as X in linear model
+
+        u = self.get_error(params)
+
+        score = -2 * np.dot(np.dot(x.T, z), weights).dot(np.dot(z.T, u))
+        score /= nobs * nobs
+
+        return score
+
+
 class IVGMMResults(GMMResults):
     """Results class of IVGMM"""
+    # this assumes that we have an additive error model `(y - f(x, params))`

     @cache_readonly
     def fittedvalues(self):
         """Fitted values"""
-        pass
+        return self.model.predict(self.params)
+

     @cache_readonly
     def resid(self):
         """Residuals"""
-        pass
+        return self.model.endog - self.fittedvalues
+

     @cache_readonly
     def ssr(self):
         """Sum of square errors"""
-        pass
+        return (self.resid * self.resid).sum(0)
+
+


 def spec_hausman(params_e, params_i, cov_params_e, cov_params_i, dof=None):
-    """Hausmans specification test
+    '''Hausman's specification test

     Parameters
     ----------
@@ -946,40 +1659,80 @@ def spec_hausman(params_e, params_i, cov_params_e, cov_params_i, dof=None):
     Greene section 5.5 p.82/83


-    """
-    pass
+    '''
+    params_diff = (params_i - params_e)
+    cov_diff = cov_params_i - cov_params_e
+    #TODO: the following is very inefficient, solves problem (svd) twice
+    #use linalg.lstsq or svd directly
+    #cov_diff will very often be indefinite (singular)
+    if not dof:
+        dof = np.linalg.matrix_rank(cov_diff)
+    cov_diffpinv = np.linalg.pinv(cov_diff)
+    H = np.dot(params_diff, np.dot(cov_diffpinv, params_diff))
+    pval = stats.chi2.sf(H, dof)
+
+    evals = np.linalg.eigvalsh(cov_diff)
+
+    return H, pval, dof, evals
+


+
+###########
+
 class DistQuantilesGMM(GMM):
-    """
+    '''
     Estimate distribution parameters by GMM based on matching quantiles

     Currently mainly to try out different requirements for GMM when we cannot
     calculate the optimal weighting matrix.

-    """
+    '''

     def __init__(self, endog, exog, instrument, **kwds):
+        #TODO: something wrong with super
         super(DistQuantilesGMM, self).__init__(endog, exog, instrument)
-        self.epsilon_iter = 1e-05
+        #self.func = func
+        self.epsilon_iter = 1e-5
+
         self.distfn = kwds['distfn']
+        #done by super does not work yet
+        #TypeError: super does not take keyword arguments
         self.endog = endog
+
+        #make this optional for fit
         if 'pquant' not in kwds:
-            self.pquant = pquant = np.array([0.01, 0.05, 0.1, 0.4, 0.6, 0.9,
-                0.95, 0.99])
+            self.pquant = pquant = np.array([0.01, 0.05,0.1,0.4,0.6,0.9,0.95,0.99])
         else:
             self.pquant = pquant = kwds['pquant']
-        self.xquant = np.array([stats.scoreatpercentile(endog, p) for p in 
-            pquant * 100])
+
+        #TODO: vectorize this: use edf
+        self.xquant = np.array([stats.scoreatpercentile(endog, p) for p
+                                in pquant*100])
         self.nmoms = len(self.pquant)
+
+        #TODO: copied from GMM, make super work
         self.endog = endog
         self.exog = exog
         self.instrument = instrument
         self.results = GMMResults(model=self)
-        self.epsilon_iter = 1e-06
+        #self.__dict__.update(kwds)
+        self.epsilon_iter = 1e-6

-    def momcond(self, params):
-        """moment conditions for estimating distribution parameters by matching
+    def fitstart(self):
+        #todo: replace with or add call to distfn._fitstart
+        #      added but not used during testing
+        distfn = self.distfn
+        if hasattr(distfn, '_fitstart'):
+            start = distfn._fitstart(self.endog)
+        else:
+            start = [1]*distfn.numargs + [0.,1.]
+
+        return np.asarray(start)
+
+    def momcond(self, params): #drop distfn as argument
+        #, mom2, quantile=None, shape=None
+        '''moment conditions for estimating distribution parameters by matching
         quantiles, defines as many moment conditions as quantiles.

         Returns
@@ -992,11 +1745,26 @@ class DistQuantilesGMM(GMM):
         This can be used for method of moments or for generalized method of
         moments.

-        """
-        pass
+        '''
+        #this check looks redundant/unused now
+        if len(params) == 2:
+            loc, scale = params
+        elif len(params) == 3:
+            shape, loc, scale = params
+        else:
+            #raise NotImplementedError
+            pass #see whether this might work, seems to work for beta with 2 shape args
+
+        #mom2diff = np.array(distfn.stats(*params)) - mom2
+        #if not quantile is None:
+        pq, xq = self.pquant, self.xquant
+        #ppfdiff = distfn.ppf(pq, alpha)
+        cdfdiff = self.distfn.cdf(xq, *params) - pq
+        #return np.concatenate([mom2diff, cdfdiff[:1]])
+        return np.atleast_2d(cdfdiff)

     def fitonce(self, start=None, weights=None, has_optimal_weights=False):
-        """fit without estimating an optimal weighting matrix and return results
+        '''fit without estimating an optimal weighting matrix and return results

         This is a convenience function that calls fitgmm and covparams with
         a given weight matrix or the identity weight matrix.
@@ -1020,9 +1788,25 @@ class DistQuantilesGMM(GMM):
         fitgmm
         cov_params

-        """
-        pass
+        '''
+        if weights is None:
+            weights = np.eye(self.nmoms)
+        params = self.fitgmm(start=start)
+        # TODO: rewrite this old hack, should use fitgmm or fit maxiter=0
+        self.results.params = params  #required before call to self.cov_params
+        self.results.wargs = {} #required before call to self.cov_params
+        self.results.options_other = {'weights_method':'cov'}
+        # TODO: which weights_method?  There should not be any needed ?
+        _cov_params = self.results.cov_params(weights=weights,
+                                      has_optimal_weights=has_optimal_weights)
+
+        self.results.weights = weights
+        self.results.jval = self.gmmobjective(params, weights)
+        self.results.options_other.update({'has_optimal_weights':has_optimal_weights})
+
+        return self.results


-results_class_dict = {'GMMResults': GMMResults, 'IVGMMResults':
-    IVGMMResults, 'DistQuantilesGMM': GMMResults}
+results_class_dict = {'GMMResults': GMMResults,
+                      'IVGMMResults': IVGMMResults,
+                      'DistQuantilesGMM': GMMResults}  #TODO: should be a default
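
A hypothetical usage sketch for DistQuantilesGMM: estimate t-distribution parameters by matching the default quantiles. Treat the calling pattern (the sample passed as endog, exog and instrument) as an assumption rather than documented API.

import numpy as np
from scipy import stats
from statsmodels.sandbox.regression.gmm import DistQuantilesGMM

x = stats.t.rvs(5, loc=0.0, scale=1.5, size=2000, random_state=12345)

mod = DistQuantilesGMM(x, x, x, distfn=stats.t)
print(mod.momcond(np.array([5, 0.0, 1.5])))   # close to zero at the true parameters
res = mod.fitonce(start=mod.fitstart())       # identity weight matrix
print(res.params)                             # roughly (df, loc, scale)
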
diff --git a/statsmodels/sandbox/regression/kernridgeregress_class.py b/statsmodels/sandbox/regression/kernridgeregress_class.py
index cfb0f90cd..670460622 100644
--- a/statsmodels/sandbox/regression/kernridgeregress_class.py
+++ b/statsmodels/sandbox/regression/kernridgeregress_class.py
@@ -1,11 +1,23 @@
-"""Kernel Ridge Regression for local non-parametric regression"""
+'''Kernel Ridge Regression for local non-parametric regression'''
+
+
 import numpy as np
 from scipy import spatial as ssp
 import matplotlib.pylab as plt


+def kernel_rbf(x,y,scale=1, **kwds):
+    #scale = kwds.get('scale',1)
+    dist = ssp.minkowski_distance_p(x[:,np.newaxis,:],y[np.newaxis,:,:],2)
+    return np.exp(-0.5/scale*(dist))
+
+
+def kernel_euclid(x,y,p=2, **kwds):
+    return ssp.minkowski_distance(x[:,np.newaxis,:],y[np.newaxis,:,:],p)
+
+
 class GaussProcess:
-    """class to perform kernel ridge regression (gaussian process)
+    '''class to perform kernel ridge regression (gaussian process)

     Warning: this class is memory intensive, it creates nobs x nobs distance
     matrix and its inverse, where nobs is the number of rows (observations).
@@ -43,11 +55,11 @@ class GaussProcess:

     a short summary of the kernel ridge regression is at
     http://www.ics.uci.edu/~welling/teaching/KernelsICS273B/Kernel-Ridge.pdf
-    """
+    '''

-    def __init__(self, x, y=None, kernel=kernel_rbf, scale=0.5, ridgecoeff=
-        1e-10, **kwds):
-        """
+    def __init__(self, x, y=None, kernel=kernel_rbf,
+                 scale=0.5, ridgecoeff = 1e-10, **kwds ):
+        '''
         Parameters
         ----------
         x : 2d array (N,K)
@@ -74,31 +86,121 @@ class GaussProcess:

         Both scale and the ridge coefficient smooth the fitted curve.

-        """
+        '''
+
         self.x = x
         self.kernel = kernel
         self.scale = scale
         self.ridgecoeff = ridgecoeff
-        self.distxsample = kernel(x, x, scale=scale)
-        self.Kinv = np.linalg.inv(self.distxsample + np.eye(*self.
-            distxsample.shape) * ridgecoeff)
+        self.distxsample = kernel(x,x,scale=scale)
+        self.Kinv = np.linalg.inv(self.distxsample +
+                             np.eye(*self.distxsample.shape)*ridgecoeff)
         if y is not None:
             self.y = y
             self.yest = self.fit(y)

-    def fit(self, y):
-        """fit the training explanatory variables to a sample ouput variable"""
-        pass
-
-    def predict(self, x):
-        """predict new y values for a given array of explanatory variables"""
-        pass
-
-    def plot(self, y, plt=plt):
-        """some basic plots"""
-        pass

+    def fit(self,y):
+        '''fit the training explanatory variables to a sample output variable'''
+        self.parest = np.dot(self.Kinv, y) #self.kernel(y,y,scale=self.scale))
+        yhat = np.dot(self.distxsample,self.parest)
+        return yhat
+
+##        print ds33.shape
+##        ds33_2 = kernel(x,x[::k,:],scale=scale)
+##        dsinv = np.linalg.inv(ds33+np.eye(*distxsample.shape)*ridgecoeff)
+##        B = np.dot(dsinv,y[::k,:])
+    def predict(self,x):
+        '''predict new y values for a given array of explanatory variables'''
+        self.xpredict = x
+        distxpredict = self.kernel(x, self.x, scale=self.scale)
+        self.ypredict = np.dot(distxpredict, self.parest)
+        return self.ypredict
+
+    def plot(self, y, plt=plt ):
+        '''some basic plots'''
+        #todo return proper graph handles
+        plt.figure()
+        plt.plot(self.x,self.y, 'bo-', self.x, self.yest, 'r.-')
+        plt.title('sample (training) points')
+        plt.figure()
+        plt.plot(self.xpredict,y,'bo-',self.xpredict,self.ypredict,'r.-')
+        plt.title('all points')
+
+
+
+def example1():
+    m,k = 500,4
+    upper = 6
+    scale=10
+    xs1a = np.linspace(1,upper,m)[:,np.newaxis]
+    xs1 = xs1a*np.ones((1,4)) + 1/(1.0+np.exp(np.random.randn(m,k)))
+    xs1 /= np.std(xs1[::k,:],0)   # normalize scale, could use cov to normalize
+    y1true = np.sum(np.sin(xs1)+np.sqrt(xs1),1)[:,np.newaxis]
+    y1 = y1true + 0.250 * np.random.randn(m,1)
+
+    stride = 2 #use only some points as training points, e.g. 2 means every 2nd
+    gp1 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_euclid,
+                       ridgecoeff=1e-10)
+    yhatr1 = gp1.predict(xs1)
+    plt.figure()
+    plt.plot(y1true, y1,'bo',y1true, yhatr1,'r.')
+    plt.title('euclid kernel: true y versus noisy y and estimated y')
+    plt.figure()
+    plt.plot(y1,'bo-',y1true,'go-',yhatr1,'r.-')
+    plt.title('euclid kernel: true (green), noisy (blue) and estimated (red) '+
+              'observations')
+
+    gp2 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_rbf,
+                       scale=scale, ridgecoeff=1e-1)
+    yhatr2 = gp2.predict(xs1)
+    plt.figure()
+    plt.plot(y1true, y1,'bo',y1true, yhatr2,'r.')
+    plt.title('rbf kernel: true versus noisy (blue) and estimated (red) observations')
+    plt.figure()
+    plt.plot(y1,'bo-',y1true,'go-',yhatr2,'r.-')
+    plt.title('rbf kernel: true (green), noisy (blue) and estimated (red) '+
+              'observations')
+    #gp2.plot(y1)
+
+
+def example2(m=100, scale=0.01, stride=2):
+    #m,k = 100,1
+    upper = 6
+    xs1 = np.linspace(1,upper,m)[:,np.newaxis]
+    y1true = np.sum(np.sin(xs1**2),1)[:,np.newaxis]/xs1
+    y1 = y1true + 0.05*np.random.randn(m,1)
+
+    ridgecoeff = 1e-10
+    #stride = 2 #use only some points as training points, e.g. 2 means every 2nd
+    gp1 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_euclid,
+                       ridgecoeff=1e-10)
+    yhatr1 = gp1.predict(xs1)
+    plt.figure()
+    plt.plot(y1true, y1,'bo',y1true, yhatr1,'r.')
+    plt.title('euclid kernel: true versus noisy (blue) and estimated (red) observations')
+    plt.figure()
+    plt.plot(y1,'bo-',y1true,'go-',yhatr1,'r.-')
+    plt.title('euclid kernel: true (green), noisy (blue) and estimated (red) '+
+              'observations')
+
+    gp2 = GaussProcess(xs1[::stride,:],y1[::stride,:], kernel=kernel_rbf,
+                       scale=scale, ridgecoeff=1e-2)
+    yhatr2 = gp2.predict(xs1)
+    plt.figure()
+    plt.plot(y1true, y1,'bo',y1true, yhatr2,'r.')
+    plt.title('rbf kernel: true versus noisy (blue) and estimated (red) observations')
+    plt.figure()
+    plt.plot(y1,'bo-',y1true,'go-',yhatr2,'r.-')
+    plt.title('rbf kernel: true (green), noisy (blue) and estimated (red) '+
+              'observations')
+    #gp2.plot(y1)

 if __name__ == '__main__':
     example2()
+    #example2(m=1000, scale=0.01)
+    #example2(m=100, scale=0.5)   # oversmoothing
+    #example2(m=2000, scale=0.005) # this looks good for rbf, zoom in
+    #example2(m=200, scale=0.01,stride=4)
     example1()
+    #plt.show()
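
A minimal usage sketch of GaussProcess on made-up data; it applies the closed-form kernel ridge solution implemented in fit and predict, parest = (K + ridge*I)^{-1} y followed by yhat = K(x_new, x) parest.

import numpy as np
from statsmodels.sandbox.regression.kernridgeregress_class import (
    GaussProcess, kernel_rbf)

xs = np.linspace(0, 3, 50)[:, None]
ys = np.sin(3 * xs) + 0.1 * np.random.randn(50, 1)
gp = GaussProcess(xs, ys, kernel=kernel_rbf, scale=0.5, ridgecoeff=1e-6)
xnew = np.linspace(0, 3, 200)[:, None]
yhat = gp.predict(xnew)        # smoothed predictions at the new points
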
diff --git a/statsmodels/sandbox/regression/ols_anova_original.py b/statsmodels/sandbox/regression/ols_anova_original.py
index 0596158ea..1dca10b82 100644
--- a/statsmodels/sandbox/regression/ols_anova_original.py
+++ b/statsmodels/sandbox/regression/ols_anova_original.py
@@ -1,58 +1,82 @@
-""" convenience functions for ANOVA type analysis with OLS
+''' convenience functions for ANOVA type analysis with OLS

 Note: statistical results of ANOVA are not checked, OLS is
 checked but not whether the reported results are the ones used
 in ANOVA

-"""
+'''
+
 import numpy as np
 import numpy.lib.recfunctions
+
 from statsmodels.compat.python import lmap
 from statsmodels.regression.linear_model import OLS
-dt_b = np.dtype([('breed', int), ('sex', int), ('litter', int), ('pen', int
-    ), ('pig', int), ('age', float), ('bage', float), ('y', float)])
-""" too much work using structured masked arrays
+
+
+dt_b = np.dtype([('breed', int), ('sex', int), ('litter', int),
+               ('pen', int), ('pig', int), ('age', float),
+               ('bage', float), ('y', float)])
+''' too much work using structured masked arrays
 dta = np.mafromtxt('dftest3.data', dtype=dt_b)

 dta_use = np.ma.column_stack[[dta[col] for col in 'y sex age'.split()]]
-"""
+'''
+
+
 dta = np.genfromtxt('dftest3.data')
 print(dta.shape)
 mask = np.isnan(dta)
-print('rows with missing values', mask.any(1).sum())
-vars = dict((v[0], (idx, v[1])) for idx, v in enumerate((('breed', int), (
-    'sex', int), ('litter', int), ('pen', int), ('pig', int), ('age', float
-    ), ('bage', float), ('y', float))))
+print("rows with missing values", mask.any(1).sum())
+vars = dict((v[0], (idx, v[1])) for idx, v in enumerate((('breed', int),
+                                                         ('sex', int),
+                                                         ('litter', int),
+                                                         ('pen', int),
+                                                         ('pig', int),
+                                                         ('age', float),
+                                                         ('bage', float),
+                                                         ('y', float))))
+
 datavarnames = 'y sex age'.split()
+#possible to avoid temporary array ?
 dta_use = dta[:, [vars[col][0] for col in datavarnames]]
 keeprows = ~np.isnan(dta_use).any(1)
 print('number of complete observations', keeprows.sum())
-dta_used = dta_use[keeprows, :]
-varsused = dict((k, [dta_used[:, idx], idx, vars[k][1]]) for idx, k in
-    enumerate(datavarnames))
+dta_used = dta_use[keeprows,:]
+
+varsused = dict((k, [dta_used[:,idx], idx, vars[k][1]]) for idx, k in enumerate(datavarnames))

+# use function for dummy
+#sexgroups = np.unique(dta_used[:,1])
+#sexdummy = (dta_used[:,1][:, None] == sexgroups).astype(int)

 def data2dummy(x, returnall=False):
-    """convert array of categories to dummy variables
+    '''convert array of categories to dummy variables
     by default drops dummy variable for last category
-    uses ravel, 1d only"""
-    pass
-
+    uses ravel, 1d only'''
+    x = x.ravel()
+    groups = np.unique(x)
+    if returnall:
+        return (x[:, None] == groups).astype(int)
+    else:
+        return (x[:, None] == groups).astype(int)[:,:-1]

 def data2proddummy(x):
-    """creates product dummy variables from 2 columns of 2d array
+    '''creates product dummy variables from 2 columns of 2d array

     drops last dummy variable, but not from each category
     singular with simple dummy variable but not with constant

     quickly written, no safeguards

-    """
-    pass
-
+    '''
+    #brute force, assumes x is 2d
+    #replace with encoding if possible
+    groups = np.unique(lmap(tuple, x.tolist()))
+    #includes singularity with additive factors
+    return (x==groups[:,None,:]).all(-1).T.astype(int)[:,:-1]

-def data2groupcont(x1, x2):
-    """create dummy continuous variable
+def data2groupcont(x1,x2):
+    '''create dummy continuous variable

     Parameters
     ----------
@@ -64,24 +88,31 @@ def data2groupcont(x1, x2):
     Notes
     -----
     useful for group specific slope coefficients in regression
-    """
-    pass
+    '''
+    if x2.ndim == 1:
+        x2 = x2[:,None]
+    dummy = data2dummy(x1, returnall=True)
+    return dummy * x2
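
A quick illustration, with a made-up input, of what the data2dummy helper above returns: one column per category, dropping the last category unless returnall=True.

import numpy as np

data2dummy(np.array([0, 1, 2, 1]))
# array([[1, 0],
#        [0, 1],
#        [0, 0],
#        [0, 1]])
data2dummy(np.array([0, 1, 2, 1]), returnall=True)   # keeps the column for category 2
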

-
-sexdummy = data2dummy(dta_used[:, 1])
+sexdummy = data2dummy(dta_used[:,1])
 factors = ['sex']
 for k in factors:
     varsused[k][0] = data2dummy(varsused[k][0])
+
 products = [('sex', 'age')]
 for k in products:
-    varsused[''.join(k)] = data2proddummy(np.c_[varsused[k[0]][0], varsused
-        [k[1]][0]])
-X_b0 = np.c_[sexdummy, dta_used[:, 2], np.ones((dta_used.shape[0], 1))]
-y_b0 = dta_used[:, 0]
+    varsused[''.join(k)] = data2proddummy(np.c_[varsused[k[0]][0],varsused[k[1]][0]])
+
+# make dictionary of variables with dummies as one variable
+#vars_to_use = {name: data or dummy variables}
+
+X_b0 = np.c_[sexdummy, dta_used[:,2], np.ones((dta_used.shape[0],1))]
+y_b0 = dta_used[:,0]
 res_b0 = OLS(y_b0, X_b0).results
 print(res_b0.params)
 print(res_b0.ssr)
-anova_str0 = """
+
+anova_str0 = '''
 ANOVA statistics (model sum of squares excludes constant)
 Source    DF  Sum Squares   Mean Square    F Value    Pr > F
 Model     %(df_model)i        %(ess)f       %(mse_model)f   %(fvalue)f %(f_pvalue)f
@@ -89,8 +120,9 @@ Error     %(df_resid)i     %(ssr)f       %(mse_resid)f
 CTotal    %(nobs)i    %(uncentered_tss)f     %(mse_total)f

 R squared  %(rsquared)f
-"""
-anova_str = """
+'''
+
+anova_str = '''
 ANOVA statistics (model sum of squares includes constant)
 Source    DF  Sum Squares   Mean Square    F Value    Pr > F
 Model     %(df_model)i      %(ssmwithmean)f       %(mse_model)f   %(fvalue)f %(f_pvalue)f
@@ -98,27 +130,42 @@ Error     %(df_resid)i     %(ssr)f       %(mse_resid)f
 CTotal    %(nobs)i    %(uncentered_tss)f     %(mse_total)f

 R squared  %(rsquared)f
-"""
+'''

+#print(anova_str % dict([('df_model', res.df_model)])
+#anovares = ['df_model' , 'df_resid'

 def anovadict(res):
-    """update regression results dictionary with ANOVA specific statistics
+    '''update regression results dictionary with ANOVA specific statistics

     not checked for completeness
-    """
-    pass
+    '''
+    ad = {}
+    ad.update(res.__dict__)
+    anova_attr = ['df_model', 'df_resid', 'ess', 'ssr','uncentered_tss',
+                 'mse_model', 'mse_resid', 'mse_total', 'fvalue', 'f_pvalue',
+                  'rsquared']
+    for key in anova_attr:
+        ad[key] = getattr(res, key)
+    ad['nobs'] = res.model.nobs
+    ad['ssmwithmean'] = res.uncentered_tss - res.ssr
+    return ad


 print(anova_str0 % anovadict(res_b0))
+#the following leaves the constant in, not with NIST regression
+#but something fishy with res.ess negative in examples
 print(anova_str % anovadict(res_b0))
+
 print('using sex only')
-X2 = np.c_[sexdummy, np.ones((dta_used.shape[0], 1))]
+X2 = np.c_[sexdummy, np.ones((dta_used.shape[0],1))]
 res2 = OLS(y_b0, X2).results
 print(res2.params)
 print(res2.ssr)
 print(anova_str % anovadict(res2))
+
 print('using age only')
-X3 = np.c_[dta_used[:, 2], np.ones((dta_used.shape[0], 1))]
+X3 = np.c_[ dta_used[:,2], np.ones((dta_used.shape[0],1))]
 res3 = OLS(y_b0, X3).results
 print(res3.params)
 print(res3.ssr)
@@ -126,7 +173,7 @@ print(anova_str % anovadict(res3))


 def form2design(ss, data):
-    """convert string formula to data dictionary
+    '''convert string formula to data dictionary

     ss : str
      * I : add constant
@@ -160,75 +207,129 @@ def form2design(ss, data):
     -----

     with sorted dict, separate name list would not be necessary
-    """
-    pass
-
+    '''
+    vars = {}
+    names = []
+    for item in ss.split():
+        if item == 'I':
+            vars['const'] = np.ones(data.shape[0])
+            names.append('const')
+        elif ':' not in item:
+            vars[item] = data[item]
+            names.append(item)
+        elif item[:2] == 'F:':
+            v = item.split(':')[1]
+            vars[v] = data2dummy(data[v])
+            names.append(v)
+        elif item[:2] == 'P:':
+            v = item.split(':')[1].split('*')
+            vars[''.join(v)] = data2proddummy(np.c_[data[v[0]],data[v[1]]])
+            names.append(''.join(v))
+        elif item[:2] == 'G:':
+            v = item.split(':')[1].split('*')
+            vars[''.join(v)] = data2groupcont(data[v[0]], data[v[1]])
+            names.append(''.join(v))
+        else:
+            raise ValueError('unknown expression in formula')
+    return vars, names

 nobs = 1000
-testdataint = np.random.randint(3, size=(nobs, 4)).view([('a', int), ('b',
-    int), ('c', int), ('d', int)])
-testdatacont = np.random.normal(size=(nobs, 2)).view([('e', float), ('f',
-    float)])
-dt2 = numpy.lib.recfunctions.zip_descr((testdataint, testdatacont), flatten
-    =True)
-testdata = np.empty((nobs, 1), dt2)
+testdataint = np.random.randint(3, size=(nobs,4)).view([('a',int),('b',int),('c',int),('d',int)])
+testdatacont = np.random.normal( size=(nobs,2)).view([('e',float), ('f',float)])
+dt2 = numpy.lib.recfunctions.zip_descr((testdataint, testdatacont),flatten=True)
+# concatenate structured arrays
+testdata = np.empty((nobs,1), dt2)
 for name in testdataint.dtype.names:
     testdata[name] = testdataint[name]
 for name in testdatacont.dtype.names:
     testdata[name] = testdatacont[name]
+
+
+#print(form2design('a',testdata))
+
 if 0:
-    xx, n = form2design('F:a', testdata)
+    xx, n = form2design('F:a',testdata)
     print(xx)
-    print(form2design('P:a*b', testdata))
-    print(data2proddummy(np.c_[testdata['a'], testdata['b']]))
-    xx, names = form2design('a F:b P:c*d', testdata)
+    print(form2design('P:a*b',testdata))
+    print(data2proddummy((np.c_[testdata['a'],testdata['b']])))
+
+    xx, names = form2design('a F:b P:c*d',testdata)
+
+#xx, names = form2design('I a F:b F:c F:d P:c*d',testdata)
 xx, names = form2design('I a F:b P:c*d', testdata)
 xx, names = form2design('I a F:b P:c*d G:a*e f', testdata)
+
+
 X = np.column_stack([xx[nn] for nn in names])
-y = X.sum(1) + 0.01 * np.random.normal(size=nobs)
-rest1 = OLS(y, X).results
+# simple test version: all coefficients equal to one
+y = X.sum(1) + 0.01*np.random.normal(size=(nobs))
+rest1 = OLS(y,X).results
 print(rest1.params)
 print(anova_str % anovadict(rest1))

-
 def dropname(ss, li):
-    """drop names from a list of strings,
+    '''drop names from a list of strings,
     names to drop are in space delimited list
     does not change original list
-    """
-    pass
-
+    '''
+    newli = li[:]
+    for item in ss.split():
+        newli.remove(item)
+    return newli

 X = np.column_stack([xx[nn] for nn in dropname('ae f', names)])
-y = X.sum(1) + 0.01 * np.random.normal(size=nobs)
-rest1 = OLS(y, X).results
+# simple test version: all coefficients equal to one
+y = X.sum(1) + 0.01*np.random.normal(size=(nobs))
+rest1 = OLS(y,X).results
 print(rest1.params)
 print(anova_str % anovadict(rest1))
-dta = np.genfromtxt('dftest3.data', dt_b, missing='.', usemask=True)
+
+
+# Example: from Bruce
+# -------------------
+
+# read data set and drop rows with missing data
+dta = np.genfromtxt('dftest3.data', dt_b,missing='.', usemask=True)
 print('missing', [dta.mask[k].sum() for k in dta.dtype.names])
 m = dta.mask.view(bool)
-droprows = m.reshape(-1, len(dta.dtype.names)).any(1)
-dta_use_b1 = dta[~droprows, :].data
+droprows = m.reshape(-1,len(dta.dtype.names)).any(1)
+# get complete data as plain structured array
+# maybe does not work with masked arrays
+dta_use_b1 = dta[~droprows,:].data
 print(dta_use_b1.shape)
 print(dta_use_b1.dtype)
+
+#Example b1: variables from Bruce's glm
+
+# prepare data and dummy variables
 xx_b1, names_b1 = form2design('I F:sex age', dta_use_b1)
+# create design matrix
 X_b1 = np.column_stack([xx_b1[nn] for nn in dropname('', names_b1)])
 y_b1 = dta_use_b1['y']
+# estimate using OLS
 rest_b1 = OLS(y_b1, X_b1).results
+# print(results)
 print(rest_b1.params)
 print(anova_str % anovadict(rest_b1))
+#compare with original version only in original version
 print(anova_str % anovadict(res_b0))
+
+# Example: use all variables except pig identifier
+
 allexog = ' '.join(dta.dtype.names[:-1])
-xx_b1a, names_b1a = form2design('I F:breed F:sex F:litter F:pen age bage',
-    dta_use_b1)
+#'breed sex litter pen pig age bage'
+
+xx_b1a, names_b1a = form2design('I F:breed F:sex F:litter F:pen age bage', dta_use_b1)
 X_b1a = np.column_stack([xx_b1a[nn] for nn in dropname('', names_b1a)])
 y_b1a = dta_use_b1['y']
 rest_b1a = OLS(y_b1a, X_b1a).results
 print(rest_b1a.params)
 print(anova_str % anovadict(rest_b1a))
+
 for dropn in names_b1a:
     print('\nResults dropping', dropn)
     X_b1a_ = np.column_stack([xx_b1a[nn] for nn in dropname(dropn, names_b1a)])
     y_b1a_ = dta_use_b1['y']
     rest_b1a_ = OLS(y_b1a_, X_b1a_).results
+    #print(rest_b1a_.params
     print(anova_str % anovadict(rest_b1a_))
diff --git a/statsmodels/sandbox/regression/onewaygls.py b/statsmodels/sandbox/regression/onewaygls.py
index 7e819ce63..44a875324 100644
--- a/statsmodels/sandbox/regression/onewaygls.py
+++ b/statsmodels/sandbox/regression/onewaygls.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 F test for null hypothesis that coefficients in several regressions are the same

@@ -56,7 +57,7 @@ from statsmodels.regression.linear_model import OLS, WLS


 class OneWayLS:
-    """Class to test equality of regression coefficients across groups
+    '''Class to test equality of regression coefficients across groups

     This class performs tests whether the linear regression coefficients are
     the same across pre-specified groups. This can be used to test for
@@ -102,11 +103,11 @@ class OneWayLS:
            make sure groupnames are always consistently sorted/ordered
            Fixed for getting the results, but groups are not printed yet, still
            inconsistent use for summaries of results.
-    """
-
+    '''
     def __init__(self, y, x, groups=None, het=False, data=None, meta=None):
         if groups is None:
             raise ValueError('use OLS if there are no groups')
+            #maybe replace by dispatch to OLS
         if data:
             y = data[y]
             x = [data[v] for v in x]
@@ -117,21 +118,22 @@ class OneWayLS:
         self.endog = np.asarray(y)
         self.exog = np.asarray(x)
         if self.exog.ndim == 1:
-            self.exog = self.exog[:, None]
+            self.exog = self.exog[:,None]
         self.groups = np.asarray(groups)
         self.het = het
+
         self.groupsint = None
         if np.issubdtype(self.groups.dtype, int):
             self.unique = np.unique(self.groups)
             if (self.unique == np.arange(len(self.unique))).all():
                 self.groupsint = self.groups
-        if self.groupsint is None:
-            self.unique, self.groupsint = np.unique(self.groups,
-                return_inverse=True)
-        self.uniqueint = np.arange(len(self.unique))
+
+        if self.groupsint is None: # groups are not consecutive integers
+            self.unique, self.groupsint = np.unique(self.groups, return_inverse=True)
+        self.uniqueint = np.arange(len(self.unique)) #as shortcut

     def fitbygroups(self):
-        """Fit OLS regression for each group separately.
+        '''Fit OLS regression for each group separately.

         Returns
         -------
@@ -147,11 +149,20 @@ class OneWayLS:



-        """
-        pass
+        '''
+        olsbygroup = {}
+        sigmabygroup = []
+        for gi, group in enumerate(self.unique): #np.arange(len(self.unique))):
+            groupmask = self.groupsint == gi   #group index
+            res = OLS(self.endog[groupmask], self.exog[groupmask]).fit()
+            olsbygroup[group] = res
+            sigmabygroup.append(res.mse_resid)
+        self.olsbygroup = olsbygroup
+        self.sigmabygroup = np.array(sigmabygroup)
+        self.weights = np.sqrt(self.sigmabygroup[self.groupsint]) #TODO:chk sqrt

     def fitjoint(self):
-        """fit a joint fixed effects model to all observations
+        '''fit a joint fixed effects model to all observations

         The regression results are attached as `lsjoint`.

@@ -168,16 +179,66 @@ class OneWayLS:



-        """
-        pass
+        '''
+        if not hasattr(self, 'weights'):
+            self.fitbygroups()
+        groupdummy = (self.groupsint[:,None] == self.uniqueint).astype(int)
+        #order of dummy variables by variable - not used
+        #dummyexog = self.exog[:,:,None]*groupdummy[:,None,1:]
+        #order of dummy variables by groups - used
+        dummyexog = self.exog[:,None,:]*groupdummy[:,1:,None]
+        exog = np.c_[self.exog, dummyexog.reshape(self.exog.shape[0],-1)] #self.nobs ??
+        #Notes: I changed to drop the first group from the dummies
+        #instead I want one full set of dummies
+        if self.het:
+            weights = self.weights
+            res = WLS(self.endog, exog, weights=weights).fit()
+        else:
+            res = OLS(self.endog, exog).fit()
+        self.lsjoint = res
+        contrasts = {}
+        nvars = self.exog.shape[1]
+        nparams = exog.shape[1]
+        ndummies = nparams - nvars
+        contrasts['all'] = np.c_[np.zeros((ndummies, nvars)), np.eye(ndummies)]
+        for groupind, group in enumerate(self.unique[1:]):  #need enumerate if groups != groupsint
+            groupind = groupind + 1
+            contr = np.zeros((nvars, nparams))
+            contr[:,nvars*groupind:nvars*(groupind+1)] = np.eye(nvars)
+            contrasts[group] = contr
+            #save also for pairs, see next
+            contrasts[(self.unique[0], group)] = contr
+
+        #Note: I'm keeping some duplication for testing
+        pairs = np.triu_indices(len(self.unique),1)
+        for ind1,ind2 in zip(*pairs):  #replace with group1, group2 in sorted(keys)
+            if ind1 == 0:
+                continue # need comparison with benchmark/normalization group separately
+            g1 = self.unique[ind1]
+            g2 = self.unique[ind2]
+            group = (g1, g2)
+            contr = np.zeros((nvars, nparams))
+            contr[:,nvars*ind1:nvars*(ind1+1)] = np.eye(nvars)
+            contr[:,nvars*ind2:nvars*(ind2+1)] = -np.eye(nvars)
+            contrasts[group] = contr
+
+
+        self.contrasts = contrasts

     def fitpooled(self):
-        """fit the pooled model, which assumes there are no differences across groups
-        """
-        pass
+        '''fit the pooled model, which assumes there are no differences across groups
+        '''
+        if self.het:
+            if not hasattr(self, 'weights'):
+                self.fitbygroups()
+            weights = self.weights
+            res = WLS(self.endog, self.exog, weights=weights).fit()
+        else:
+            res = OLS(self.endog, self.exog).fit()
+        self.lspooled = res

     def ftest_summary(self):
-        """run all ftests on the joint model
+        '''run all ftests on the joint model

         Returns
         -------
@@ -190,27 +251,132 @@ class OneWayLS:
         ----
         This are the raw results and not formatted for nice printing.

-        """
-        pass
+        '''
+        if not hasattr(self, 'lsjoint'):
+            self.fitjoint()
+        txt = []
+        summarytable = []
+
+        txt.append('F-test for equality of coefficients across groups')
+        fres = self.lsjoint.f_test(self.contrasts['all'])
+        txt.append(fres.__str__())
+        summarytable.append(('all',(fres.fvalue, fres.pvalue, fres.df_denom, fres.df_num)))
+
+#        for group in self.unique[1:]:  #replace with group1, group2 in sorted(keys)
+#            txt.append('F-test for equality of coefficients between group'
+#                       ' %s and group %s' % (group, '0'))
+#            fres = self.lsjoint.f_test(self.contrasts[group])
+#            txt.append(fres.__str__())
+#            summarytable.append((group,(fres.fvalue, fres.pvalue, fres.df_denom, fres.df_num)))
+        pairs = np.triu_indices(len(self.unique),1)
+        for ind1,ind2 in zip(*pairs):  #replace with group1, group2 in sorted(keys)
+            g1 = self.unique[ind1]
+            g2 = self.unique[ind2]
+            txt.append('F-test for equality of coefficients between group'
+                       ' %s and group %s' % (g1, g2))
+            group = (g1, g2)
+            fres = self.lsjoint.f_test(self.contrasts[group])
+            txt.append(fres.__str__())
+            summarytable.append((group,(fres.fvalue, fres.pvalue, fres.df_denom, fres.df_num)))
+
+        self.summarytable = summarytable
+        return '\n'.join(txt), summarytable

-    def print_summary(self, res):
-        """printable string of summary
-
-        """
-        pass

+    def print_summary(self, res):
+        '''printable string of summary
+
+        '''
+        groupind = res.groups
+        #res.fitjoint()  #not really necessary, because called by ftest_summary
+        if hasattr(res, 'summarytable'):
+            summtable = res.summarytable
+        else:
+            _, summtable = res.ftest_summary()
+        txt = ''
+        #print ft[0]  #skip because table is nicer
+        templ = \
+'''Table of F-tests for overall or pairwise equality of coefficients'
+%(tab)s
+
+
+Notes: p-values are not corrected for many tests
+       (no Bonferroni correction)
+       * : reject at 5%% uncorrected confidence level
+Null hypothesis: all or pairwise coefficients are the same'
+Alternative hypothesis: all coefficients are different'
+
+
+Comparison with stats.f_oneway
+%(statsfow)s
+
+
+Likelihood Ratio Test
+%(lrtest)s
+Null model: pooled all coefficients are the same across groups,'
+Alternative model: all coefficients are allowed to be different'
+not verified but looks close to f-test result'
+
+
+OLS parameters by group from individual, separate ols regressions'
+%(olsbg)s
+for group in sorted(res.olsbygroup):
+    r = res.olsbygroup[group]
+    print group, r.params
+
+
+Check for heteroscedasticity, '
+variance and standard deviation for individual regressions'
+%(grh)s
+variance    ', res.sigmabygroup
+standard dev', np.sqrt(res.sigmabygroup)
+'''
+
+        from statsmodels.iolib import SimpleTable
+        resvals = {}
+        resvals['tab'] = str(SimpleTable([(['%r' % (row[0],)]
+                            + list(row[1])
+                            + ['*']*(row[1][1] < 0.05).item() ) for row in summtable],
+                          headers=['pair', 'F-statistic','p-value','df_denom',
+                                   'df_num']))
+        resvals['statsfow'] = str(stats.f_oneway(*[res.endog[groupind==gr] for gr in
+                                                   res.unique]))
+        #resvals['lrtest'] = str(res.lr_test())
+        resvals['lrtest'] = str(SimpleTable([res.lr_test()],
+                                    headers=['likelihood ratio', 'p-value', 'df'] ))
+
+        resvals['olsbg'] = str(SimpleTable([[group]
+                                            + res.olsbygroup[group].params.tolist()
+                                            for group in sorted(res.olsbygroup)]))
+        resvals['grh'] = str(SimpleTable(np.vstack([res.sigmabygroup,
+                                               np.sqrt(res.sigmabygroup)]),
+                                     headers=res.unique.tolist()))
+
+        return templ % resvals
+
+    # a variation of this has been added to RegressionResults as compare_lr
     def lr_test(self):
-        """
+        r'''
         generic likelihood ratio test between nested models

-            \\begin{align}
-            D & = -2(\\ln(\\text{likelihood for null model}) - \\ln(\\text{likelihood for alternative model})) \\\\
-            & = -2\\ln\\left( \\frac{\\text{likelihood for null model}}{\\text{likelihood for alternative model}} \\right).
-            \\end{align}
+            \begin{align}
+            D & = -2(\ln(\text{likelihood for null model}) - \ln(\text{likelihood for alternative model})) \\
+            & = -2\ln\left( \frac{\text{likelihood for null model}}{\text{likelihood for alternative model}} \right).
+            \end{align}

         is distributed as chisquare with df equal to difference in number of parameters or equivalently
         difference in residual degrees of freedom  (sign?)

         TODO: put into separate function
-        """
-        pass
+        '''
+        if not hasattr(self, 'lsjoint'):
+            self.fitjoint()
+        if not hasattr(self, 'lspooled'):
+            self.fitpooled()
+        loglikejoint = self.lsjoint.llf
+        loglikepooled = self.lspooled.llf
+        lrstat = -2*(loglikepooled - loglikejoint)   #??? check sign
+        lrdf = self.lspooled.df_resid - self.lsjoint.df_resid
+        lrpval = stats.chi2.sf(lrstat, lrdf)
+
+        return lrstat, lrpval, lrdf
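
A hypothetical usage sketch for OneWayLS on simulated data with made-up names: two groups with different slopes, followed by the joint F-tests and the likelihood ratio test defined above.

import numpy as np
from statsmodels.sandbox.regression.onewaygls import OneWayLS

rng = np.random.default_rng(0)
nobs = 100
xdata = rng.standard_normal(2 * nobs)
groups = np.repeat([0, 1], nobs)
slope = np.where(groups == 0, 1.0, 1.5)              # group 1 has a steeper slope
y = 1.0 + slope * xdata + 0.5 * rng.standard_normal(2 * nobs)
exog = np.column_stack([np.ones(2 * nobs), xdata])

mod = OneWayLS(y, exog, groups=groups)
txt, table = mod.ftest_summary()                      # overall and pairwise F-tests
lrstat, lrpval, lrdf = mod.lr_test()                  # pooled vs. separate coefficients
print(table)
print(lrstat, lrpval, lrdf)
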
diff --git a/statsmodels/sandbox/regression/penalized.py b/statsmodels/sandbox/regression/penalized.py
index 97c3bc753..2d5caf400 100644
--- a/statsmodels/sandbox/regression/penalized.py
+++ b/statsmodels/sandbox/regression/penalized.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """linear model with Theil prior probabilistic restrictions, generalized Ridge

 Created on Tue Dec 20 00:10:10 2011
@@ -34,31 +35,32 @@ problem with definition of df_model, it has 1 subtracted for constant
 """
 from statsmodels.compat.python import lrange
 import numpy as np
+
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.regression.linear_model import OLS, GLS, RegressionResults
 from statsmodels.regression.feasible_gls import atleast_2dcols


 class TheilGLS(GLS):
-    """GLS with stochastic restrictions
+    r"""GLS with stochastic restrictions

     TheilGLS estimates the following linear model

-    .. math:: y = X \\beta + u
+    .. math:: y = X \beta + u

     using additional information given by a stochastic constraint

-    .. math:: q = R \\beta + v
+    .. math:: q = R \beta + v

-    :math:`E(u) = 0`, :math:`cov(u) = \\Sigma`
-    :math:`cov(u, v) = \\Sigma_p`, with full rank.
+    :math:`E(u) = 0`, :math:`cov(u) = \Sigma`
+    :math:`cov(u, v) = \Sigma_p`, with full rank.

     u and v are assumed to be independent of each other.
     If :math:`E(v) = 0`, then the estimator is unbiased.

     Note: The explanatory variables are not rescaled, the parameter estimates
     not scale equivariant and fitted values are not scale invariant since
-    scaling changes the relative penalization weights (for given \\Sigma_p).
+    scaling changes the relative penalization weights (for given \Sigma_p).

     Note: GLS is not tested yet, only Sigma is identity is tested

@@ -67,13 +69,13 @@ class TheilGLS(GLS):

     The parameter estimates solves the moment equation:

-    .. math:: (X' \\Sigma X + \\lambda R' \\sigma^2 \\Sigma_p^{-1} R) b = X' \\Sigma y + \\lambda R' \\Sigma_p^{-1} q
+    .. math:: (X' \Sigma X + \lambda R' \sigma^2 \Sigma_p^{-1} R) b = X' \Sigma y + \lambda R' \Sigma_p^{-1} q

-    :math:`\\lambda` is the penalization weight similar to Ridge regression.
+    :math:`\lambda` is the penalization weight similar to Ridge regression.

     If lambda is zero, then the parameter estimate is the same as OLS. If
     lambda goes to infinity, then the restriction is imposed with equality.
-    In the model `pen_weight` is used as name instead of $\\lambda$
+    In the model `pen_weight` is used as name instead of $\lambda$

     R does not have to be square. The number of rows of R can be smaller
     than the number of parameters. In this case not all linear combination
@@ -121,8 +123,9 @@ class TheilGLS(GLS):
     """

     def __init__(self, endog, exog, r_matrix=None, q_matrix=None,
-        sigma_prior=None, sigma=None):
+                 sigma_prior=None, sigma=None):
         super(TheilGLS, self).__init__(endog, exog, sigma=sigma)
+
         if r_matrix is not None:
             r_matrix = np.asarray(r_matrix)
         else:
@@ -130,37 +133,44 @@ class TheilGLS(GLS):
                 const_idx = self.data.const_idx
             except AttributeError:
                 const_idx = None
+
             k_exog = exog.shape[1]
             r_matrix = np.eye(k_exog)
             if const_idx is not None:
                 keep_idx = lrange(k_exog)
                 del keep_idx[const_idx]
-                r_matrix = r_matrix[keep_idx]
+                r_matrix = r_matrix[keep_idx]  # delete row for constant
+
         k_constraints, k_exog = r_matrix.shape
         self.r_matrix = r_matrix
         if k_exog != self.exog.shape[1]:
-            raise ValueError(
-                'r_matrix needs to have the same number of columnsas exog')
+            raise ValueError('r_matrix needs to have the same number of columns '
+                             'as exog')
+
         if q_matrix is not None:
             self.q_matrix = atleast_2dcols(q_matrix)
         else:
             self.q_matrix = np.zeros(k_constraints)[:, None]
         if self.q_matrix.shape != (k_constraints, 1):
             raise ValueError('q_matrix has wrong shape')
+
         if sigma_prior is not None:
             sigma_prior = np.asarray(sigma_prior)
             if np.size(sigma_prior) == 1:
                 sigma_prior = np.diag(sigma_prior * np.ones(k_constraints))
+                #no numerical shortcuts are used for this case
             elif sigma_prior.ndim == 1:
                 sigma_prior = np.diag(sigma_prior)
         else:
             sigma_prior = np.eye(k_constraints)
+
         if sigma_prior.shape != (k_constraints, k_constraints):
             raise ValueError('sigma_prior has wrong shape')
+
         self.sigma_prior = sigma_prior
-        self.sigma_prior_inv = np.linalg.pinv(sigma_prior)
+        self.sigma_prior_inv = np.linalg.pinv(sigma_prior) #or inv

-    def fit(self, pen_weight=1.0, cov_type='sandwich', use_t=True):
+    def fit(self, pen_weight=1., cov_type='sandwich', use_t=True):
         """Estimate parameters and return results instance

         Parameters
@@ -199,10 +209,50 @@ class TheilGLS(GLS):
         The sandwich form of the covariance estimator is not robust to
         misspecified heteroscedasticity or autocorrelation.
         """
-        pass
+        lambd = pen_weight
+        #this does duplicate transformation, but I need resid not wresid
+        res_gls = GLS(self.endog, self.exog, sigma=self.sigma).fit()
+        self.res_gls = res_gls
+        sigma2_e = res_gls.mse_resid
+
+        r_matrix = self.r_matrix
+        q_matrix = self.q_matrix
+        sigma_prior_inv = self.sigma_prior_inv
+        x = self.wexog
+        y = self.wendog[:,None]
+        #why are sigma2_e * lambd multiplied, not ratio?
+        #larger lambd -> stronger prior  (it's not the variance)
+        # Bayesian: lambd is precision = 1/sigma2_prior
+        #print('lambd inside fit', lambd
+        xx = np.dot(x.T, x)
+        xpx = xx + \
+              sigma2_e * lambd * np.dot(r_matrix.T, np.dot(sigma_prior_inv, r_matrix))
+        xpy = np.dot(x.T, y) + \
+              sigma2_e * lambd * np.dot(r_matrix.T, np.dot(sigma_prior_inv, q_matrix))
+        #xpy = xpy[:,None]
+
+        xpxi = np.linalg.pinv(xpx, rcond=1e-15**2)  #to match pinv(x) in OLS case
+        xpxi_sandwich = xpxi.dot(xx).dot(xpxi)
+        params = np.dot(xpxi, xpy)    #or solve
+        params = np.squeeze(params)
+        # normalized_cov_params should have sandwich form xpxi @ xx @ xpxi
+        if cov_type == 'sandwich':
+            normalized_cov_params = xpxi_sandwich
+        elif cov_type == 'data-prior':
+            normalized_cov_params = xpxi    #why attach it to self, i.e. model?
+        else:
+            raise ValueError("cov_type has to be 'sandwich' or 'data-prior'")
+
+        self.normalized_cov_params = xpxi_sandwich
+        self.xpxi = xpxi
+        self.sigma2_e = sigma2_e
+        lfit = TheilRegressionResults(self, params,
+                       normalized_cov_params=normalized_cov_params, use_t=use_t)
+
+        lfit.penalization_factor = lambd
+        return lfit

-    def select_pen_weight(self, method='aicc', start_params=1.0, optim_args
-        =None):
+    def select_pen_weight(self, method='aicc', start_params=1., optim_args=None):
         """find penalization factor that minimizes gcv or an information criterion

         Parameters
@@ -227,19 +277,41 @@ class TheilGLS(GLS):
         -----
         This uses `scipy.optimize.fmin` as optimizer.
         """
-        pass
+        if optim_args is None:
+            optim_args = {}

+        #this does not make sense, since number of parameters stays unchanged
+        # information criteria changes if we use df_model based on trace(hat_matrix)
+        #need leave-one-out, gcv; or some penalization for weak priors
+        #added extra penalization for lambd
+        def get_ic(lambd):
+            # this can be optimized more
+            # for pure Ridge we can keep the eigenvector decomposition
+            return getattr(self.fit(lambd), method)
+
+        from scipy import optimize
+        lambd = optimize.fmin(get_ic, start_params, **optim_args)
+        return lambd
+
+
+#TODO:
+#I need the hatmatrix in the model if I want to do iterative fitting, e.g. GCV
+#move to model or use it from a results instance inside the model,
+#    each call to fit returns results instance
+# note: we need to recalculate hatmatrix for each lambda, so keep in results is fine

 class TheilRegressionResults(RegressionResults):

     def __init__(self, *args, **kwds):
         super(TheilRegressionResults, self).__init__(*args, **kwds)
-        self.df_model = self.hatmatrix_trace() - 1
+
+        # overwrite df_model and df_resid
+        self.df_model = self.hatmatrix_trace() - 1 #assume constant
         self.df_resid = self.model.endog.shape[0] - self.df_model - 1

     @cache_readonly
     def hatmatrix_diag(self):
-        """diagonal of hat matrix
+        '''diagonal of hat matrix

         diag(X' xpxi X)

@@ -258,18 +330,61 @@ class TheilRegressionResults(RegressionResults):
         projection y_hat = H y    or in terms of transformed variables (W^{-0.5} y)

         might be wrong for WLS and GLS case
-        """
-        pass
-
+        '''
+        # TODO is this still correct with sandwich normalized_cov_params, I guess not
+        xpxi = self.model.normalized_cov_params
+        #something fishy with self.normalized_cov_params in result, does not update
+        #print(self.model.wexog.shape, np.dot(xpxi, self.model.wexog.T).shape
+        return (self.model.wexog * np.dot(xpxi, self.model.wexog.T).T).sum(1)
+
+    #@cache_readonly
     def hatmatrix_trace(self):
         """trace of hat matrix
         """
-        pass
+        return self.hatmatrix_diag.sum()
+
+##    #this does not update df_resid
+##    @property   #needs to be property or attribute (no call)
+##    def df_model(self):
+##        return self.hatmatrix_trace()
+
+    #Note: mse_resid uses df_resid not nobs-k_vars, which might differ if df_model, tr(H), is used
+    #in paper for gcv ess/nobs is used instead of mse_resid
+    @cache_readonly
+    def gcv(self):
+        return self.mse_resid / (1. - self.hatmatrix_trace() / self.nobs)**2
+
+    @cache_readonly
+    def cv(self):
+        return ((self.resid / (1. - self.hatmatrix_diag))**2).sum() / self.nobs
+
+    @cache_readonly
+    def aicc(self):
+        aic = np.log(self.mse_resid) + 1
+        eff_dof = self.nobs - self.hatmatrix_trace() - 2
+        if eff_dof > 0:
+            adj = 2 * (1. + self.hatmatrix_trace()) / eff_dof
+        else:
+            adj = np.inf
+        return aic + adj

     def test_compatibility(self):
         """Hypothesis test for the compatibility of prior mean with data
         """
-        pass
+        # TODO: should we store the OLS results ?  not needed so far, but maybe cache
+        #params_ols = np.linalg.pinv(self.model.exog).dot(self.model.endog)
+        #res = self.wald_test(self.model.r_matrix, q_matrix=self.model.q_matrix, use_f=False)
+        #from scratch
+        res_ols = OLS(self.model.endog, self.model.exog).fit()
+        r_mat = self.model.r_matrix
+        r_diff = self.model.q_matrix - r_mat.dot(res_ols.params)[:,None]
+        ols_cov_r = res_ols.cov_params(r_matrix=r_mat)
+        statistic = r_diff.T.dot(np.linalg.solve(ols_cov_r + self.model.sigma_prior, r_diff))
+        from scipy import stats
+        df = np.linalg.matrix_rank(self.model.sigma_prior)   # same as r_mat.shape[0]
+        pvalue = stats.chi2.sf(statistic, df)
+        # TODO: return results class
+        return statistic, pvalue, df

     def share_data(self):
         """a measure for the fraction of the data in the estimation result
@@ -283,4 +398,73 @@ class TheilRegressionResults(RegressionResults):
             freedom of the model and the number (TODO should be rank) of the
             explanatory variables.
         """
-        pass
+
+        # this is hatmatrix_trace / self.exog.shape[1]
+        # This needs to use rank of exog and not shape[1],
+        # since singular exog is allowed
+        return (self.df_model + 1) / self.model.rank  # + 1 is for constant
+
+
+# contrast/restriction matrices, temporary location
+
+def coef_restriction_meandiff(n_coeffs, n_vars=None, position=0):
+
+    reduced = np.eye(n_coeffs) - 1./n_coeffs
+    if n_vars is None:
+        return reduced
+    else:
+        full = np.zeros((n_coeffs, n_vars))
+        full[:, position:position+n_coeffs] = reduced
+        return full
+
+
+def coef_restriction_diffbase(n_coeffs, n_vars=None, position=0, base_idx=0):
+
+    reduced = -np.eye(n_coeffs)  #make all rows, drop one row later
+    reduced[:, base_idx] = 1
+
+    keep = lrange(n_coeffs)
+    del keep[base_idx]
+    reduced = np.take(reduced, keep, axis=0)
+
+    if n_vars is None:
+        return reduced
+    else:
+        full = np.zeros((n_coeffs-1, n_vars))
+        full[:, position:position+n_coeffs] = reduced
+        return full
+
+
+def next_odd(d):
+    return d + (1 - d % 2)
+
+
+def coef_restriction_diffseq(n_coeffs, degree=1, n_vars=None, position=0, base_idx=0):
+    #check boundaries, returns "valid" ?
+
+    if degree == 1:
+        diff_coeffs = [-1, 1]
+        n_points = 2
+    elif degree > 1:
+        from scipy import misc
+        n_points = next_odd(degree + 1)  #next odd integer after degree+1
+        diff_coeffs = misc.central_diff_weights(n_points, ndiv=degree)
+
+    dff = np.concatenate((diff_coeffs, np.zeros(n_coeffs - len(diff_coeffs))))
+    from scipy import linalg
+    reduced = linalg.toeplitz(dff, np.zeros(n_coeffs - len(diff_coeffs) + 1)).T
+    #reduced = np.kron(np.eye(n_coeffs-n_points), diff_coeffs)
+
+    if n_vars is None:
+        return reduced
+    else:
+        full = np.zeros((n_coeffs-1, n_vars))
+        full[:, position:position+n_coeffs] = reduced
+        return full
+
+
+##
+##    R = np.c_[np.zeros((n_groups, k_vars-1)), np.eye(n_groups)]
+##    r = np.zeros(n_groups)
+##    R = np.c_[np.zeros((n_groups-1, k_vars)),
+##              np.eye(n_groups-1)-1./n_groups * np.ones((n_groups-1, n_groups-1))]
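
A hypothetical usage sketch for TheilGLS on simulated data with made-up names: shrink the slope coefficients toward zero with a ridge-type stochastic restriction, leave the constant unpenalized, and pick the penalization weight by AICc.

import numpy as np
from statsmodels.sandbox.regression.penalized import TheilGLS

rng = np.random.default_rng(0)
nobs, k = 100, 5
exog = np.column_stack([np.ones(nobs), rng.standard_normal((nobs, k))])
beta = np.array([1.0, 0.5, 0.5, 0.5, 0.0, 0.0])
y = exog @ beta + 0.5 * rng.standard_normal(nobs)

r_matrix = np.eye(k + 1)[1:]                   # R b ~ q = 0 for the slopes only
mod = TheilGLS(y, exog, r_matrix=r_matrix)
res = mod.fit(pen_weight=10.0)
print(res.params)
lambd = mod.select_pen_weight(method='aicc')   # data-driven penalization weight
print(mod.fit(pen_weight=lambd).params)
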
diff --git a/statsmodels/sandbox/regression/predstd.py b/statsmodels/sandbox/regression/predstd.py
index 9e470419b..75f477bdf 100644
--- a/statsmodels/sandbox/regression/predstd.py
+++ b/statsmodels/sandbox/regression/predstd.py
@@ -1,24 +1,31 @@
-"""Additional functions
+'''Additional functions

 prediction standard errors and confidence intervals


 A: josef pktd
-"""
+'''
+
 import numpy as np
 from scipy import stats

-
 def atleast_2dcol(x):
-    """ convert array_like to 2d from 1d or 0d
+    ''' convert array_like to 2d from 1d or 0d

     not tested because not used
-    """
-    pass
+    '''
+    x = np.asarray(x)
+    if (x.ndim == 1):
+        x = x[:, None]
+    elif (x.ndim == 0):
+        x = np.atleast_2d(x)
+    elif (x.ndim > 2):
+        raise ValueError('too many dimensions')
+    return x


 def wls_prediction_std(res, exog=None, weights=None, alpha=0.05):
-    """calculate standard deviation and confidence interval for prediction
+    '''calculate standard deviation and confidence interval for prediction

     applies to WLS and OLS, not to general GLS,
     that is independently but not identically distributed observations
@@ -59,5 +66,36 @@ def wls_prediction_std(res, exog=None, weights=None, alpha=0.05):

     Greene p.111 for OLS, extended to WLS by analogy

-    """
-    pass
+    '''
+    # work around current bug:
+    #    fit does not attach results to model, predict broken
+    #res.model.results
+
+    covb = res.cov_params()
+    if exog is None:
+        exog = res.model.exog
+        predicted = res.fittedvalues
+        if weights is None:
+            weights = res.model.weights
+    else:
+        exog = np.atleast_2d(exog)
+        if covb.shape[1] != exog.shape[1]:
+            raise ValueError('wrong shape of exog')
+        predicted = res.model.predict(res.params, exog)
+        if weights is None:
+            weights = 1.
+        else:
+            weights = np.asarray(weights)
+            if weights.size > 1 and len(weights) != exog.shape[0]:
+                raise ValueError('weights and exog do not have matching shape')
+
+
+    # full covariance:
+    #predvar = res3.mse_resid + np.diag(np.dot(X2,np.dot(covb,X2.T)))
+    # prediction variance only
+    predvar = res.mse_resid/weights + (exog * np.dot(covb, exog.T).T).sum(1)
+    predstd = np.sqrt(predvar)
+    tppf = stats.t.isf(alpha/2., res.df_resid)
+    interval_u = predicted + tppf * predstd
+    interval_l = predicted - tppf * predstd
+    return predstd, interval_l, interval_u
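
A usage sketch for an OLS fit (OLS is the unit-weights special case); the data below are simulated for illustration.

import numpy as np
from statsmodels.regression.linear_model import OLS
from statsmodels.sandbox.regression.predstd import wls_prediction_std

rng = np.random.default_rng(0)
exog = np.column_stack([np.ones(50), np.linspace(0, 10, 50)])
y = exog @ np.array([1.0, 0.3]) + rng.standard_normal(50)

res = OLS(y, exog).fit()
predstd, iv_l, iv_u = wls_prediction_std(res, alpha=0.05)   # 95% prediction band
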
diff --git a/statsmodels/sandbox/regression/runmnl.py b/statsmodels/sandbox/regression/runmnl.py
index 86237533e..110d3b93f 100644
--- a/statsmodels/sandbox/regression/runmnl.py
+++ b/statsmodels/sandbox/regression/runmnl.py
@@ -1,4 +1,4 @@
-"""conditional logit and nested conditional logit
+'''conditional logit and nested conditional logit

 nested conditional logit is supposed to be the random utility version
 (RU2 and maybe RU1)
@@ -21,14 +21,14 @@ Koppelman, Frank S., and Chandra Bhat with technical support from Vaneet Sethi,

 Author: josef-pktd
 License: BSD (simplified)
-"""
+'''
 import numpy as np
 import numpy.lib.recfunctions as recf
 from scipy import optimize


 class TryCLogit:
-    """
+    '''
     Conditional Logit, data handling test

     Parameters
@@ -57,7 +57,7 @@ class TryCLogit:
     For identification, the constant of one choice should be dropped.


-    """
+    '''

     def __init__(self, endog, exog_bychoices, ncommon):
         self.endog = endog
@@ -65,27 +65,54 @@ class TryCLogit:
         self.ncommon = ncommon
         self.nobs, self.nchoices = endog.shape
         self.nchoices = len(exog_bychoices)
-        betaind = [(exog_bychoices[ii].shape[1] - ncommon) for ii in range(4)]
+
+        #TODO: rename beta to params and include inclusive values for nested CL
+        betaind = [exog_bychoices[ii].shape[1]-ncommon for ii in range(4)]
         zi = np.r_[[ncommon], ncommon + np.array(betaind).cumsum()]
-        beta_indices = [np.r_[np.array([0, 1]), z[zi[ii]:zi[ii + 1]]] for
-            ii in range(len(zi) - 1)]
+        beta_indices = [np.r_[np.array([0, 1]),z[zi[ii]:zi[ii+1]]]
+                       for ii in range(len(zi)-1)]
         self.beta_indices = beta_indices
+
+        #for testing only
         beta = np.arange(7)
         betaidx_bychoices = [beta[idx] for idx in beta_indices]

+
     def xbetas(self, params):
-        """these are the V_i
-        """
-        pass
+        '''these are the V_i
+        '''
+
+        res = np.empty((self.nobs, self.nchoices))
+        for choiceind in range(self.nchoices):
+            res[:,choiceind] = np.dot(self.exog_bychoices[choiceind],
+                                      params[self.beta_indices[choiceind]])
+        return res
+
+    def loglike(self, params):
+        #normalization ?
+        xb = self.xbetas(params)
+        expxb = np.exp(xb)
+        sumexpxb = expxb.sum(1)#[:,None]
+        probs = expxb/expxb.sum(1)[:,None]  #we do not really need this for all
+        loglike = (self.endog * np.log(probs)).sum(1)
+        #is this the same: YES
+        #self.logliketest = (self.endog * xb).sum(1) - np.log(sumexpxb)
+        #if self.endog where index then xb[self.endog]
+        return -loglike.sum()   #return sum for now not for each observation
+
+    def fit(self, start_params=None):
+        if start_params is None:
+            start_params = np.zeros(6)  # TODO: need a better default than np.zeros(6)
+        return optimize.fmin(self.loglike, start_params, maxfun=10000)
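
A small numeric check, on made-up data, of the identity noted in loglike above: with a one-hot endog, sum(endog * log(probs)) per row equals sum(endog * xb) - log(sum(exp(xb))).

import numpy as np

rng = np.random.default_rng(0)
xb = rng.standard_normal((6, 4))                     # V_i for 6 obs, 4 choices
endog = np.eye(4)[rng.integers(0, 4, size=6)]        # one-hot chosen alternative
probs = np.exp(xb) / np.exp(xb).sum(1, keepdims=True)
lhs = (endog * np.log(probs)).sum(1)
rhs = (endog * xb).sum(1) - np.log(np.exp(xb).sum(1))
print(np.allclose(lhs, rhs))                         # True
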


 class TryNCLogit:
-    """
+    '''
     Nested Conditional Logit (RUNMNL), data handling test

     unfinished, does not do anything yet

-    """
+    '''

     def __init__(self, endog, exog_bychoices, ncommon):
         self.endog = endog
@@ -93,109 +120,251 @@ class TryNCLogit:
         self.ncommon = ncommon
         self.nobs, self.nchoices = endog.shape
         self.nchoices = len(exog_bychoices)
-        betaind = [(exog_bychoices[ii].shape[1] - ncommon) for ii in range(4)]
+
+
+        #TODO rename beta to params and include inclusive values for nested CL
+        betaind = [exog_bychoices[ii].shape[1]-ncommon for ii in range(4)]
         zi = np.r_[[ncommon], ncommon + np.array(betaind).cumsum()]
-        beta_indices = [np.r_[np.array([0, 1]), z[zi[ii]:zi[ii + 1]]] for
-            ii in range(len(zi) - 1)]
+        beta_indices = [np.r_[np.array([0, 1]),z[zi[ii]:zi[ii+1]]]
+                       for ii in range(len(zi)-1)]
         self.beta_indices = beta_indices
+
+        #for testing only
         beta = np.arange(7)
         betaidx_bychoices = [beta[idx] for idx in beta_indices]

-    def xbetas(self, params):
-        """these are the V_i
-        """
-        pass
-
-
-testxb = 0
-

+    def xbetas(self, params):
+        '''these are the V_i
+        '''
+
+        res = np.empty((self.nobs, self.nchoices))
+        for choiceind in range(self.nchoices):
+            res[:,choiceind] = np.dot(self.exog_bychoices[choiceind],
+                                      params[self.beta_indices[choiceind]])
+        return res
+
+    def loglike_leafbranch(self, params, tau):
+        #normalization ?
+        #check/change naming for tau
+        xb = self.xbetas(params)
+        expxb = np.exp(xb/tau)
+        sumexpxb = expxb.sum(1)#[:,None]
+        logsumexpxb = np.log(sumexpxb)
+        #loglike = (self.endog * xb).sum(1) - logsumexpxb
+        probs = expxb/sumexpxb[:,None]
+        return probs, logsumexpxp  # noqa:F821  See GH#5756
+        #if self.endog were an index then xb[self.endog]
+        #return -loglike.sum()   #return sum for now not for each observation
+
+    def loglike_branch(self, params, tau):
+        #not yet sure how to keep track of branches during walking of tree
+        ivs = []
+        for b in branches:  # noqa:F821  See GH#5756
+            probs, iv = self.loglike_leafbranch(params, tau)
+            ivs.append(iv)
+
+        #ivs = np.array(ivs)   #note ivs is (nobs,nbranchchoices)
+        ivs = np.column_stack(ivs) # this way ?
+        exptiv = np.exp(tau*ivs)
+        sumexptiv = exptiv.sum(1)
+        logsumexpxb = np.log(sumexpxb)  # noqa:F821  See GH#5756
+        probs = exptiv/sumexptiv[:,None]
+
+
+####### obsolete version to try out attaching data,
+####### new in treewalkerclass.py, copy new version to replace this
+####### problem with bzr I will disconnect history when copying
+testxb = 0 #global to class
 class RU2NMNL:
-    """Nested Multinomial Logit with Random Utility 2 parameterization
+    '''Nested Multinomial Logit with Random Utility 2 parameterization

-    """
+    '''

     def __init__(self, endog, exog, tree, paramsind):
         self.endog = endog
         self.datadict = exog
         self.tree = tree
         self.paramsind = paramsind
+
         self.branchsum = ''
         self.probs = {}

-    def calc_prob(self, tree, keys=None):
-        """walking a tree bottom-up based on dictionary
-        """
-        pass
-

-dta = np.genfromtxt('TableF23-2.txt', skip_header=1, names=
-    'Mode   Ttme   Invc    Invt      GC     Hinc    PSize'.split())
-endog = dta['Mode'].reshape(-1, 4).copy()
+    def calc_prob(self, tree, keys=None):
+        '''walking a tree bottom-up based on dictionary
+        '''
+        endog = self.endog
+        datadict = self.datadict
+        paramsind = self.paramsind
+        branchsum = self.branchsum
+
+
+        if isinstance(tree, tuple):   #assumes leaves are int for choice index
+            name, subtree = tree
+            print(name, datadict[name])
+            print('subtree', subtree)
+            keys = []
+            if testxb:
+                branchsum = datadict[name]
+            else:
+                branchsum = name  #0
+            for b in subtree:
+                print(b)
+                #branchsum += branch2(b)
+                branchsum = branchsum + self.calc_prob(b, keys)
+            print('branchsum', branchsum, keys)
+            for k in keys:
+                self.probs[k] = self.probs[k] + ['*' + name + '-prob']
+
+        else:
+            keys.append(tree)
+            self.probs[tree] = [tree + '-prob' +
+                                '(%s)' % ', '.join(self.paramsind[tree])]
+            if testxb:
+                leavessum = sum((datadict[bi] for bi in tree))
+                print('final branch with', tree, ''.join(tree), leavessum) #sum(tree)
+                return leavessum  #sum(xb[tree])
+            else:
+                return ''.join(tree) #sum(tree)
+
+        print('working on branch', tree, branchsum)
+        return branchsum
+
+
+
+#Trying out ways to handle data
+#------------------------------
+
+#travel data from Greene
+dta = np.genfromtxt('TableF23-2.txt', skip_header=1,
+                    names='Mode   Ttme   Invc    Invt      GC     Hinc    PSize'.split())
+
+endog = dta['Mode'].reshape(-1,4).copy() #I do not want a view
 nobs, nchoices = endog.shape
-datafloat = dta.view(float).reshape(-1, 7)
-exog = datafloat[:, 1:].reshape(-1, 6 * nchoices).copy()
+datafloat = dta.view(float).reshape(-1,7)
+exog = datafloat[:,1:].reshape(-1,6*nchoices).copy() #I do not want a view
+
 print(endog.sum(0))
 varnames = dta.dtype.names
 print(varnames[1:])
 modes = ['Air', 'Train', 'Bus', 'Car']
-print(exog.mean(0).reshape(nchoices, -1))
+print(exog.mean(0).reshape(nchoices, -1)) # Greene Table 23.23
+
+
+
+
+#try dummy encoding for individual-specific variables
 exog_choice_names = ['GC', 'Ttme']
 exog_choice = np.column_stack([dta[name] for name in exog_choice_names])
-exog_choice = exog_choice.reshape(-1, len(exog_choice_names) * nchoices)
-exog_choice = np.c_[endog, exog_choice]
-exog_individual = dta['Hinc'][:, None]
+exog_choice = exog_choice.reshape(-1,len(exog_choice_names)*nchoices)
+exog_choice = np.c_[endog, exog_choice] # add constant dummy
+
+exog_individual = dta['Hinc'][:,None]
+
+#exog2 = np.c_[exog_choice, exog_individual*endog]
+
+# we can also overwrite and select in original datafloat
+# e.g. Hinc*endog(choice)
+
 choice_index = np.arange(dta.shape[0]) % nchoices
-hinca = dta['Hinc'] * (choice_index == 0)
-dta2 = recf.append_fields(dta, ['Hinca'], [hinca], usemask=False)
+hinca = dta['Hinc']*(choice_index==0)
+dta2=recf.append_fields(dta, ['Hinca'],[hinca], usemask=False)
+
+
+#another version
+
 xi = []
 for ii in range(4):
-    xi.append(datafloat[choice_index == ii])
-dta1 = recf.append_fields(dta, ['Const'], [np.ones(dta.shape[0])], usemask=
-    False)
-xivar = [['GC', 'Ttme', 'Const', 'Hinc'], ['GC', 'Ttme', 'Const'], ['GC',
-    'Ttme', 'Const'], ['GC', 'Ttme']]
+    xi.append(datafloat[choice_index==ii])
+
+#one more
+dta1 = recf.append_fields(dta, ['Const'],[np.ones(dta.shape[0])], usemask=False)
+
+xivar = [['GC', 'Ttme', 'Const', 'Hinc'],
+         ['GC', 'Ttme', 'Const'],
+         ['GC', 'Ttme', 'Const'],
+         ['GC', 'Ttme']]    #need to drop one constant
+
 xi = []
 for ii in range(4):
-    xi.append(dta1[xivar[ii]][choice_index == ii])
+    xi.append(dta1[xivar[ii]][choice_index==ii])
+    #this does not change sequence of columns, bug report by Skipper I think
+
 ncommon = 2
-betaind = [(len(xi[ii].dtype.names) - ncommon) for ii in range(4)]
-zi = np.r_[[ncommon], ncommon + np.array(betaind).cumsum()]
-z = np.arange(7)
-betaindices = [np.r_[np.array([0, 1]), z[zi[ii]:zi[ii + 1]]] for ii in
-    range(len(zi) - 1)]
+betaind = [len(xi[ii].dtype.names)-ncommon for ii in range(4)]
+zi=np.r_[[ncommon], ncommon+np.array(betaind).cumsum()]
+z=np.arange(7)  #what is n?
+betaindices = [np.r_[np.array([0, 1]),z[zi[ii]:zi[ii+1]]]
+               for ii in range(len(zi)-1)]
+
 beta = np.arange(7)
 betai = [beta[idx] for idx in betaindices]
-xifloat = [xx.view(float).reshape(nobs, -1) for xx in xi]
+
+
+
+
+#examples for TryCLogit
+#----------------------
+
+
+#get exogs as float
+xifloat = [xx.view(float).reshape(nobs,-1) for xx in xi]
 clogit = TryCLogit(endog, xifloat, 2)
+
 debug = 0
 if debug:
     res = optimize.fmin(clogit.loglike, np.ones(6))
-tab2324 = [-0.15501, -0.09612, 0.01329, 5.2074, 3.869, 3.1632]
+#estimated parameters from Greene:
+tab2324 = [-0.15501, -0.09612, 0.01329, 5.2074, 3.8690, 3.1632]
 if debug:
     res2 = optimize.fmin(clogit.loglike, tab2324)
-res3 = optimize.fmin(clogit.loglike, np.zeros(6), maxfun=10000)
-"""
+
+res3 = optimize.fmin(clogit.loglike, np.zeros(6),maxfun=10000)
+#this has same numbers as Greene table 23.24, but different sequence
+#coefficient on GC is exactly 10% of Greene's
+#TODO: get better starting values
+'''
 Optimization terminated successfully.
          Current function value: 199.128369
          Iterations: 957
          Function evaluations: 1456
 array([-0.0961246 , -0.0155019 ,  0.01328757,  5.20741244,  3.86905293,
         3.16319074])
-"""
+'''
 res3corr = res3[[1, 0, 2, 3, 4, 5]]
 res3corr[0] *= 10
-print(res3corr - tab2324)
+print(res3corr - tab2324)  # diff 1e-5 to 1e-6
+#199.128369 - 199.1284  #llf same up to print precision of Greene
+
 print(clogit.fit())
-tree0 = 'top', [('Fly', ['Air']), ('Ground', ['Train', 'Car', 'Bus'])]
-datadict = dict(zip(['Air', 'Train', 'Bus', 'Car'], [xifloat[i] for i in
-    range(4)]))
-datadict = dict(zip(['Air', 'Train', 'Bus', 'Car'], ['Airdata', 'Traindata',
-    'Busdata', 'Cardata']))
-datadict.update({'top': [], 'Fly': [], 'Ground': []})
-paramsind = {'top': [], 'Fly': [], 'Ground': [], 'Air': ['GC', 'Ttme',
-    'ConstA', 'Hinc'], 'Train': ['GC', 'Ttme', 'ConstT'], 'Bus': ['GC',
-    'Ttme', 'ConstB'], 'Car': ['GC', 'Ttme']}
+
+
+tree0 = ('top',
+            [('Fly',['Air']),
+             ('Ground', ['Train', 'Car', 'Bus'])
+             ])
+
+datadict = dict(zip(['Air', 'Train', 'Bus', 'Car'],
+                    [xifloat[i]for i in range(4)]))
+
+#for testing only (mock that returns its own name)
+datadict = dict(zip(['Air', 'Train', 'Bus', 'Car'],
+                    ['Airdata', 'Traindata', 'Busdata', 'Cardata']))
+
+datadict.update({'top' :   [],
+                 'Fly' :   [],
+                 'Ground': []})
+
+paramsind = {'top' :   [],
+             'Fly' :   [],
+             'Ground': [],
+             'Air' :   ['GC', 'Ttme', 'ConstA', 'Hinc'],
+             'Train' : ['GC', 'Ttme', 'ConstT'],
+             'Bus' :   ['GC', 'Ttme', 'ConstB'],
+             'Car' :   ['GC', 'Ttme']
+             }
+
 modru = RU2NMNL(endog, datadict, tree0, paramsind)
 print(modru.calc_prob(modru.tree))
 print('\nmodru.probs')
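A minimal standalone sketch (not part of the patch; clogit_probs is an illustrative name) of the conditional-logit probability that TryCLogit.loglike builds, P_ij = exp(V_ij) / sum_k exp(V_ik):

    import numpy as np

    def clogit_probs(xb):
        # choice probabilities from a utility array xb of shape (nobs, nchoices);
        # subtracting the row maximum keeps exp() numerically stable
        expxb = np.exp(xb - xb.max(axis=1, keepdims=True))
        return expxb / expxb.sum(axis=1, keepdims=True)

    xb = np.array([[0.2, -0.1, 0.4, 0.0]])
    print(clogit_probs(xb))         # one probability per choice
    print(clogit_probs(xb).sum(1))  # each row sums to 1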
diff --git a/statsmodels/sandbox/regression/sympy_diff.py b/statsmodels/sandbox/regression/sympy_diff.py
index 2f76cd42c..28139e5a9 100644
--- a/statsmodels/sandbox/regression/sympy_diff.py
+++ b/statsmodels/sandbox/regression/sympy_diff.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sat Mar 13 07:56:22 2010

@@ -8,12 +9,13 @@ import sympy as sy

 def pdf(x, mu, sigma):
     """Return the probability density function as an expression in x"""
-    pass
-
+    #x = sy.sympify(x)
+    return 1/(sigma*sy.sqrt(2*sy.pi)) * sy.exp(-(x-mu)**2 / (2*sigma**2))

 def cdf(x, mu, sigma):
     """Return the cumulative density function as an expression in x"""
-    pass
+    #x = sy.sympify(x)
+    return (1+sy.erf((x-mu)/(sigma*sy.sqrt(2))))/2


 mu = sy.Symbol('mu')
@@ -23,31 +25,38 @@ x = sy.Symbol('x')
 y = sy.Symbol('y')
 df = sy.Symbol('df')
 s = sy.Symbol('s')
-dldxnorm = sy.log(pdf(x, mu, sigma)).diff(x)
+
+dldxnorm = sy.log(pdf(x, mu,sigma)).diff(x)
 print(sy.simplify(dldxnorm))
-print(sy.diff(sy.log(sy.gamma((s + 1) / 2)), s))
-print(sy.diff((df + 1) / 2.0 * sy.log(1 + df / (df - 2)), df))
-tllf1 = sy.log(sy.gamma((df + 1) / 2.0)) - sy.log(sy.gamma(df / 2.0)
-    ) - 0.5 * sy.log(df * sy.pi)
-tllf2 = (df + 1.0) / 2.0 * sy.log(1.0 + (y - mu) ** 2 / df / sigma2
-    ) + 0.5 * sy.log(sigma2)
-tllf2std = (df + 1.0) / 2.0 * sy.log(1.0 + y ** 2 / df) + 0.5
+print(sy.diff(sy.log(sy.gamma((s+1)/2)),s))
+
+print(sy.diff((df+1)/2. * sy.log(1+df/(df-2)), df))
+
+#standard t distribution, not verified
+tllf1 = sy.log(sy.gamma((df+1)/2.)) - sy.log(sy.gamma(df/2.)) - 0.5*sy.log((df)*sy.pi)
+tllf2 = (df+1.)/2. * sy.log(1. + (y-mu)**2/(df)/sigma2) + 0.5 * sy.log(sigma2)
+tllf2std = (df+1.)/2. * sy.log(1. + y**2/df) + 0.5
 tllf = tllf1 - tllf2
 print(tllf1.diff(df))
 print(tllf2.diff(y))
-dlddf = (tllf1 - tllf2).diff(df)
+dlddf = (tllf1-tllf2).diff(df)
 print(dlddf)
 print(sy.cse(dlddf))
-print("""
- derivative of loglike of t distribution wrt df""")
-for k, v in sy.cse(dlddf)[0]:
+print('\n derivative of loglike of t distribution wrt df')
+for k,v in sy.cse(dlddf)[0]:
     print(k, '=', v)
+
 print(sy.cse(dlddf)[1][0])
-print("""
-standard t distribution, dll_df, dll_dy""")
+
+print('\nstandard t distribution, dll_df, dll_dy')
 tllfstd = tllf1 - tllf2std
 print(tllfstd.diff(df))
 print(tllfstd.diff(y))
+
 print('\n')
-print(dlddf.subs(dict(y=1, mu=1, sigma2=1.5, df=10.0001)))
-print(dlddf.subs(dict(y=1, mu=1, sigma2=1.5, df=10.0001)).evalf())
+
+print(dlddf.subs(dict(y=1,mu=1,sigma2=1.5,df=10.0001)))
+print(dlddf.subs(dict(y=1,mu=1,sigma2=1.5,df=10.0001)).evalf())
+# Note: derivatives of nested functions do not work in sympy,
+#       at least not higher order derivatives (second or larger);
+#       looks like a print failure
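A quick sympy check (illustrative only, not part of the patch) that the score of the normal log-density defined in pdf above simplifies as expected:

    import sympy as sy

    x, mu = sy.symbols('x mu')
    sigma = sy.Symbol('sigma', positive=True)
    pdf = 1/(sigma*sy.sqrt(2*sy.pi)) * sy.exp(-(x - mu)**2/(2*sigma**2))
    # derivative of the log-density with respect to x
    print(sy.simplify(sy.diff(sy.log(pdf), x)))   # (mu - x)/sigma**2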
diff --git a/statsmodels/sandbox/regression/tools.py b/statsmodels/sandbox/regression/tools.py
index 27dc4635c..436948d2a 100644
--- a/statsmodels/sandbox/regression/tools.py
+++ b/statsmodels/sandbox/regression/tools.py
@@ -1,4 +1,4 @@
-"""gradient/Jacobian of normal and t loglikelihood
+'''gradient/Jacobian of normal and t loglikelihood

 use chain rule

@@ -17,14 +17,15 @@ TODO:

 A: josef-pktd

-"""
+'''
+
 import numpy as np
 from scipy import special
 from scipy.special import gammaln


 def norm_lls(y, params):
-    """normal loglikelihood given observations and mean mu and variance sigma2
+    '''normal loglikelihood given observations and mean mu and variance sigma2

     Parameters
     ----------
@@ -37,12 +38,14 @@ def norm_lls(y, params):
     -------
     lls : ndarray
         contribution to loglikelihood for each observation
-    """
-    pass
+    '''

+    mu, sigma2 = params.T
+    lls = -0.5*(np.log(2*np.pi) + np.log(sigma2) + (y-mu)**2/sigma2)
+    return lls

 def norm_lls_grad(y, params):
-    """Jacobian of normal loglikelihood wrt mean mu and variance sigma2
+    '''Jacobian of normal loglikelihood wrt mean mu and variance sigma2

     Parameters
     ----------
@@ -62,18 +65,20 @@ def norm_lls_grad(y, params):
     this is actually the derivative wrt sigma not sigma**2, but evaluated
     with parameter sigma2 = sigma**2

-    """
-    pass
+    '''
+    mu, sigma2 = params.T
+    dllsdmu = (y-mu)/sigma2
+    dllsdsigma2 = ((y-mu)**2/sigma2 - 1)/np.sqrt(sigma2)
+    return np.column_stack((dllsdmu, dllsdsigma2))


 def mean_grad(x, beta):
-    """gradient/Jacobian for d (x*beta)/ d beta
-    """
-    pass
-
+    '''gradient/Jacobian for d (x*beta)/ d beta
+    '''
+    return x

 def normgrad(y, x, params):
-    """Jacobian of normal loglikelihood wrt mean mu and variance sigma2
+    '''Jacobian of normal loglikelihood wrt mean mu and variance sigma2

     Parameters
     ----------
@@ -95,12 +100,21 @@ def normgrad(y, x, params):
     -----
     TODO: for heteroscedasticity need sigma to be a 1d array

-    """
-    pass
+    '''
+    beta = params[:-1]
+    sigma2 = params[-1]*np.ones((len(y),1))
+    dmudbeta = mean_grad(x, beta)
+    mu = np.dot(x, beta)
+    #print(beta, sigma2)
+    params2 = np.column_stack((mu,sigma2))
+    dllsdms = norm_lls_grad(y,params2)
+    grad = np.column_stack((dllsdms[:,:1]*dmudbeta, dllsdms[:,:1]))
+    return grad
+


 def tstd_lls(y, params, df):
-    """t loglikelihood given observations and mean mu and variance sigma2 = 1
+    '''t loglikelihood given observations and mean mu and variance sigma2 = 1

     Parameters
     ----------
@@ -119,25 +133,35 @@ def tstd_lls(y, params, df):
     Notes
     -----
     parametrized for garch
-    """
-    pass
+    '''

+    mu, sigma2 = params.T
+    df = df*1.0
+    #lls = gammaln((df+1)/2.) - gammaln(df/2.) - 0.5*np.log((df-2)*np.pi)
+    #lls -= (df+1)/2. * np.log(1. + (y-mu)**2/(df-2.)/sigma2) + 0.5 * np.log(sigma2)
+    lls = gammaln((df+1)/2.) - gammaln(df/2.) - 0.5*np.log((df-2)*np.pi)
+    lls -= (df+1)/2. * np.log(1. + (y-mu)**2/(df-2)/sigma2) + 0.5 * np.log(sigma2)
+
+    return lls

 def norm_dlldy(y):
-    """derivative of log pdf of standard normal with respect to y
-    """
-    pass
+    '''derivative of log pdf of standard normal with respect to y
+    '''
+    return -y


 def tstd_pdf(x, df):
-    """pdf for standardized (not standard) t distribution, variance is one
+    '''pdf for standardized (not standard) t distribution, variance is one

-    """
-    pass
+    '''

+    r = np.array(df*1.0)
+    Px = np.exp(special.gammaln((r+1)/2.)-special.gammaln(r/2.))/np.sqrt((r-2)*np.pi)
+    Px /= (1+(x**2)/(r-2))**((r+1)/2.)
+    return Px

 def ts_lls(y, params, df):
-    """t loglikelihood given observations and mean mu and variance sigma2 = 1
+    '''t loglikelihood given observations and mean mu and variance sigma2 = 1

     Parameters
     ----------
@@ -164,12 +188,19 @@ def ts_lls(y, params, df):
     >>> sigma = np.sqrt(2.)
     >>> stats.t.stats(df, loc=0., scale=sigma*np.sqrt((df-2.)/df))
     (array(0.0), array(2.0))
-    """
-    pass
+    '''
+    print(y, params, df)
+    mu, sigma2 = params.T
+    df = df*1.0
+    #lls = gammaln((df+1)/2.) - gammaln(df/2.) - 0.5*np.log((df-2)*np.pi)
+    #lls -= (df+1)/2. * np.log(1. + (y-mu)**2/(df-2.)/sigma2) + 0.5 * np.log(sigma2)
+    lls = gammaln((df+1)/2.) - gammaln(df/2.) - 0.5*np.log((df)*np.pi)
+    lls -= (df+1.)/2. * np.log(1. + (y-mu)**2/(df)/sigma2) + 0.5 * np.log(sigma2)
+    return lls


 def ts_dlldy(y, df):
-    """derivative of log pdf of standard t with respect to y
+    '''derivative of log pdf of standard t with respect to y

     Parameters
     ----------
@@ -189,12 +220,14 @@ def ts_dlldy(y, df):
     -----
     with mean 0 and scale 1, but variance is df/(df-2)

-    """
-    pass
-
+    '''
+    df = df*1.
+    #(df+1)/2. / (1 + y**2/(df-2.)) * 2.*y/(df-2.)
+    #return -(df+1)/(df-2.) / (1 + y**2/(df-2.)) * y
+    return -(df+1)/(df) / (1 + y**2/(df)) * y

 def tstd_dlldy(y, df):
-    """derivative of log pdf of standardized t with respect to y
+    '''derivative of log pdf of standardized t with respect to y

         Parameters
         ----------
@@ -214,12 +247,13 @@ def tstd_dlldy(y, df):
     Notes
     -----
     parametrized for garch, standardized to variance=1
-    """
-    pass
-
+    '''
+    #(df+1)/2. / (1 + y**2/(df-2.)) * 2.*y/(df-2.)
+    return -(df+1)/(df-2.) / (1 + y**2/(df-2.)) * y
+    #return (df+1)/(df) / (1 + y**2/(df)) * y

 def locscale_grad(y, loc, scale, dlldy, *args):
-    """derivative of log-likelihood with respect to location and scale
+    '''derivative of log-likelihood with respect to location and scale

     Parameters
     ----------
@@ -243,97 +277,106 @@ def locscale_grad(y, loc, scale, dlldy, *args):
         derivative of loglikelihood wrt scale evaluated at the
         points given in y

-    """
-    pass
-
+    '''
+    yst = (y-loc)/scale    #ystandardized
+    dlldloc = -dlldy(yst, *args) / scale
+    dlldscale = -1./scale - dlldy(yst, *args) * (y-loc)/scale**2
+    return dlldloc, dlldscale

 if __name__ == '__main__':
     verbose = 0
     if verbose:
         sig = 0.1
         beta = np.ones(2)
-        rvs = np.random.randn(10, 3)
-        x = rvs[:, 1:]
-        y = np.dot(x, beta) + sig * rvs[:, 0]
-        params = [1, 1, 1]
+        rvs = np.random.randn(10,3)
+        x = rvs[:,1:]
+        y = np.dot(x,beta) + sig*rvs[:,0]
+
+        params = [1,1,1]
         print(normgrad(y, x, params))
-        dllfdbeta = (y - np.dot(x, beta))[:, None] * x
+
+        dllfdbeta = (y-np.dot(x, beta))[:,None]*x   #for sigma = 1
         print(dllfdbeta)
+
         print(locscale_grad(y, np.dot(x, beta), 1, norm_dlldy))
-        print(y - np.dot(x, beta))
+        print(y-np.dot(x, beta))
+
     from scipy import stats, misc
+
+    def llt(y,loc,scale,df):
+        return np.log(stats.t.pdf(y, df, loc=loc, scale=scale))
+    def lltloc(loc,y,scale,df):
+        return np.log(stats.t.pdf(y, df, loc=loc, scale=scale))
+    def lltscale(scale,y,loc,df):
+        return np.log(stats.t.pdf(y, df, loc=loc, scale=scale))
+
+    def llnorm(y,loc,scale):
+        return np.log(stats.norm.pdf(y, loc=loc, scale=scale))
+    def llnormloc(loc,y,scale):
+        return np.log(stats.norm.pdf(y, loc=loc, scale=scale))
+    def llnormscale(scale,y,loc):
+        return np.log(stats.norm.pdf(y, loc=loc, scale=scale))
+
     if verbose:
         print('\ngradient of t')
-        print(misc.derivative(llt, 1, dx=1e-06, n=1, args=(0, 1, 10), order=3))
+        print(misc.derivative(llt, 1, dx=1e-6, n=1, args=(0,1,10), order=3))
         print('t ', locscale_grad(1, 0, 1, tstd_dlldy, 10))
         print('ts', locscale_grad(1, 0, 1, ts_dlldy, 10))
-        print(misc.derivative(llt, 1.5, dx=1e-10, n=1, args=(0, 1, 20),
-            order=3))
+        print(misc.derivative(llt, 1.5, dx=1e-10, n=1, args=(0,1,20), order=3),)
         print('ts', locscale_grad(1.5, 0, 1, ts_dlldy, 20))
-        print(misc.derivative(llt, 1.5, dx=1e-10, n=1, args=(0, 2, 20),
-            order=3))
+        print(misc.derivative(llt, 1.5, dx=1e-10, n=1, args=(0,2,20), order=3),)
         print('ts', locscale_grad(1.5, 0, 2, ts_dlldy, 20))
-        print(misc.derivative(llt, 1.5, dx=1e-10, n=1, args=(1, 2, 20),
-            order=3))
+        print(misc.derivative(llt, 1.5, dx=1e-10, n=1, args=(1,2,20), order=3),)
         print('ts', locscale_grad(1.5, 1, 2, ts_dlldy, 20))
-        print(misc.derivative(lltloc, 1, dx=1e-10, n=1, args=(1.5, 2, 20),
-            order=3))
-        print(misc.derivative(lltscale, 2, dx=1e-10, n=1, args=(1.5, 1, 20),
-            order=3))
-        y, loc, scale, df = 1.5, 1, 2, 20
-        print('ts', locscale_grad(y, loc, scale, ts_dlldy, 20))
-        print(misc.derivative(lltloc, loc, dx=1e-10, n=1, args=(y, scale,
-            df), order=3))
-        print(misc.derivative(lltscale, scale, dx=1e-10, n=1, args=(y, loc,
-            df), order=3))
+        print(misc.derivative(lltloc, 1, dx=1e-10, n=1, args=(1.5,2,20), order=3),)
+        print(misc.derivative(lltscale, 2, dx=1e-10, n=1, args=(1.5,1,20), order=3))
+        y,loc,scale,df = 1.5, 1, 2, 20
+        print('ts', locscale_grad(y,loc,scale, ts_dlldy, 20))
+        print(misc.derivative(lltloc, loc, dx=1e-10, n=1, args=(y,scale,df), order=3),)
+        print(misc.derivative(lltscale, scale, dx=1e-10, n=1, args=(y,loc,df), order=3))
+
         print('\ngradient of norm')
-        print(misc.derivative(llnorm, 1, dx=1e-06, n=1, args=(0, 1), order=3))
+        print(misc.derivative(llnorm, 1, dx=1e-6, n=1, args=(0,1), order=3))
         print(locscale_grad(1, 0, 1, norm_dlldy))
-        y, loc, scale = 1.5, 1, 2
-        print('ts', locscale_grad(y, loc, scale, norm_dlldy))
-        print(misc.derivative(llnormloc, loc, dx=1e-10, n=1, args=(y, scale
-            ), order=3))
-        print(misc.derivative(llnormscale, scale, dx=1e-10, n=1, args=(y,
-            loc), order=3))
-        y, loc, scale = 1.5, 0, 1
-        print('ts', locscale_grad(y, loc, scale, norm_dlldy))
-        print(misc.derivative(llnormloc, loc, dx=1e-10, n=1, args=(y, scale
-            ), order=3))
-        print(misc.derivative(llnormscale, scale, dx=1e-10, n=1, args=(y,
-            loc), order=3))
+        y,loc,scale = 1.5, 1, 2
+        print('ts', locscale_grad(y,loc,scale, norm_dlldy))
+        print(misc.derivative(llnormloc, loc, dx=1e-10, n=1, args=(y,scale), order=3),)
+        print(misc.derivative(llnormscale, scale, dx=1e-10, n=1, args=(y,loc), order=3))
+        y,loc,scale = 1.5, 0, 1
+        print('ts', locscale_grad(y,loc,scale, norm_dlldy))
+        print(misc.derivative(llnormloc, loc, dx=1e-10, n=1, args=(y,scale), order=3),)
+        print(misc.derivative(llnormscale, scale, dx=1e-10, n=1, args=(y,loc), order=3))
+        #print('still something wrong with handling of scale and variance')
+        #looks ok now
         print('\nloglike of t')
-        print(tstd_lls(1, np.array([0, 1]), 100), llt(1, 0, 1, 100),
-            'differently standardized')
-        print(tstd_lls(1, np.array([0, 1]), 10), llt(1, 0, 1, 10),
-            'differently standardized')
-        print(ts_lls(1, np.array([0, 1]), 10), llt(1, 0, 1, 10))
-        print(tstd_lls(1, np.array([0, 1.0 * 10.0 / 8.0]), 10), llt(1.0, 0,
-            1.0, 10))
-        print(ts_lls(1, np.array([0, 1]), 100), llt(1, 0, 1, 100))
-        print(tstd_lls(1, np.array([0, 1]), 10), llt(1, 0, 1.0 * np.sqrt(8 /
-            10.0), 10))
+        print(tstd_lls(1, np.array([0,1]), 100), llt(1,0,1,100), 'differently standardized')
+        print(tstd_lls(1, np.array([0,1]), 10), llt(1,0,1,10), 'differently standardized')
+        print(ts_lls(1, np.array([0,1]), 10), llt(1,0,1,10))
+        print(tstd_lls(1, np.array([0,1.*10./8.]), 10), llt(1.,0,1.,10))
+        print(ts_lls(1, np.array([0,1]), 100), llt(1,0,1,100))
+
+        print(tstd_lls(1, np.array([0,1]), 10), llt(1,0,1.*np.sqrt(8/10.),10))
+
+
     from numpy.testing import assert_almost_equal
-    params = [(0, 1), (1.0, 1.0), (0.0, 2.0), (1.0, 2.0)]
-    yt = np.linspace(-2.0, 2.0, 11)
-    for loc, scale in params:
-        dlldlo = misc.derivative(llnormloc, loc, dx=1e-10, n=1, args=(yt,
-            scale), order=3)
-        dlldsc = misc.derivative(llnormscale, scale, dx=1e-10, n=1, args=(
-            yt, loc), order=3)
+    params =[(0, 1), (1.,1.), (0.,2.), ( 1., 2.)]
+    yt = np.linspace(-2.,2.,11)
+    for loc,scale in params:
+        dlldlo = misc.derivative(llnormloc, loc, dx=1e-10, n=1, args=(yt,scale), order=3)
+        dlldsc = misc.derivative(llnormscale, scale, dx=1e-10, n=1, args=(yt,loc), order=3)
         gr = locscale_grad(yt, loc, scale, norm_dlldy)
         assert_almost_equal(dlldlo, gr[0], 5, err_msg='deriv loc')
         assert_almost_equal(dlldsc, gr[1], 5, err_msg='deriv scale')
     for df in [3, 10, 100]:
-        for loc, scale in params:
-            dlldlo = misc.derivative(lltloc, loc, dx=1e-10, n=1, args=(yt,
-                scale, df), order=3)
-            dlldsc = misc.derivative(lltscale, scale, dx=1e-10, n=1, args=(
-                yt, loc, df), order=3)
+        for loc,scale in params:
+            dlldlo = misc.derivative(lltloc, loc, dx=1e-10, n=1, args=(yt,scale,df), order=3)
+            dlldsc = misc.derivative(lltscale, scale, dx=1e-10, n=1, args=(yt,loc,df), order=3)
             gr = locscale_grad(yt, loc, scale, ts_dlldy, df)
             assert_almost_equal(dlldlo, gr[0], 4, err_msg='deriv loc')
             assert_almost_equal(dlldsc, gr[1], 4, err_msg='deriv scale')
-            assert_almost_equal(ts_lls(yt, np.array([loc, scale ** 2]), df),
-                llt(yt, loc, scale, df), 5, err_msg='loglike')
-            assert_almost_equal(tstd_lls(yt, np.array([loc, scale ** 2]),
-                df), llt(yt, loc, scale * np.sqrt((df - 2.0) / df), df), 5,
-                err_msg='loglike')
+            assert_almost_equal(ts_lls(yt, np.array([loc, scale**2]), df),
+                                llt(yt,loc,scale,df), 5,
+                                err_msg='loglike')
+            assert_almost_equal(tstd_lls(yt, np.array([loc, scale**2]), df),
+                                llt(yt,loc,scale*np.sqrt((df-2.)/df),df), 5,
+                                err_msg='loglike')
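A small finite-difference sanity check (assumed example, not part of the patch) of the location-scale chain rule that locscale_grad implements, shown for the normal case and without the deprecated scipy.misc.derivative used in the script above:

    import numpy as np
    from scipy import stats

    def ll(y, loc, scale):
        # normal loglikelihood ll(y) = log f((y - loc)/scale) - log(scale)
        return np.log(stats.norm.pdf(y, loc=loc, scale=scale))

    y, loc, scale, eps = 1.5, 1.0, 2.0, 1e-6
    yst = (y - loc) / scale
    dlldy = -yst   # score of the standard normal at the standardized point
    analytic = (-dlldy/scale, -1.0/scale - dlldy*(y - loc)/scale**2)
    numeric = ((ll(y, loc + eps, scale) - ll(y, loc - eps, scale)) / (2*eps),
               (ll(y, loc, scale + eps) - ll(y, loc, scale - eps)) / (2*eps))
    print(analytic)   # matches numeric to roughly 1e-6
    print(numeric)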
diff --git a/statsmodels/sandbox/regression/treewalkerclass.py b/statsmodels/sandbox/regression/treewalkerclass.py
index 7699489c5..f34d12c89 100644
--- a/statsmodels/sandbox/regression/treewalkerclass.py
+++ b/statsmodels/sandbox/regression/treewalkerclass.py
@@ -1,4 +1,4 @@
-"""
+'''

 Formulas
 --------
@@ -100,14 +100,16 @@ still todo:

 Author: Josef Perktold
 License : BSD (3-clause)
-"""
+'''
 from statsmodels.compat.python import lrange
+
 from pprint import pprint
+
 import numpy as np


 def randintw(w, size=1):
-    """generate integer random variables given probabilties
+    '''generate integer random variables given probabilities

     useful because it can be used as index into any array or sequence type

@@ -135,12 +137,15 @@ def randintw(w, size=1):
     >>> np.bincount(randintw([0.6, 0.4, 0.0], size=3000))/3000.
     array([ 0.59566667,  0.40433333])

-    """
-    pass
-
+    '''
+    #from Charles Harris, numpy mailing list
+    from numpy.random import random
+    p = np.cumsum(w)/np.sum(w)
+    rvs = p.searchsorted(random(np.prod(size))).reshape(size)
+    return rvs

 def getbranches(tree):
-    """
+    '''
     walk tree to get list of branches

     Parameters
@@ -153,12 +158,17 @@ def getbranches(tree):
     branch : list
         list of all branch names

-    """
-    pass
-
+    '''
+    if isinstance(tree, tuple):
+        name, subtree = tree
+        a = [name]
+        for st in subtree:
+            a.extend(getbranches(st))
+        return a
+    return []

 def getnodes(tree):
-    """
+    '''
     walk tree to get list of branches and list of leaves

     Parameters
@@ -173,15 +183,30 @@ def getnodes(tree):
     leaves : list
         list of all leaves names

-    """
-    pass
+    '''
+    if isinstance(tree, tuple):
+        name, subtree = tree
+        ab = [name]
+        al = []
+        #degenerate branches
+        if len(subtree) == 1:
+            adeg = [name]
+        else:
+            adeg = []

+        for st in subtree:
+            b, l, d = getnodes(st)
+            ab.extend(b)
+            al.extend(l)
+            adeg.extend(d)
+        return ab, al, adeg
+    return [], [tree], []

-testxb = 2

+testxb = 2 #global to class to return strings instead of numbers

 class RU2NMNL:
-    """Nested Multinomial Logit with Random Utility 2 parameterization
+    '''Nested Multinomial Logit with Random Utility 2 parameterization


     Parameters
@@ -222,35 +247,52 @@ class RU2NMNL:
     by leaf names, if endog is defined as categorical variable with
     associated category level names.)

-    """
+    '''

     def __init__(self, endog, exog, tree, paramsind):
         self.endog = endog
         self.datadict = exog
         self.tree = tree
         self.paramsind = paramsind
+
         self.branchsum = ''
         self.probs = {}
         self.probstxt = {}
         self.branchleaves = {}
-        self.branchvalues = {}
+        self.branchvalues = {}  #just to keep track of returns by branches
         self.branchsums = {}
         self.bprobs = {}
-        self.branches, self.leaves, self.branches_degenerate = getnodes(tree)
+        self.branches, self.leaves, self.branches_degenerate  = getnodes(tree)
         self.nbranches = len(self.branches)
-        self.paramsnames = sorted(set([i for j in paramsind.values() for i in
-            j])) + [('tau_%s' % bname) for bname in self.branches]
+
+        #copied over but not quite sure yet
+        #unique, parameter array names,
+        #sorted alphabetically, order is/should be only internal
+
+        self.paramsnames = (sorted(set([i for j in paramsind.values()
+                                       for i in j])) +
+                            ['tau_%s' % bname for bname in self.branches])
+
         self.nparams = len(self.paramsnames)
-        self.paramsidx = dict((name, idx) for idx, name in enumerate(self.
-            paramsnames))
-        self.parinddict = dict((k, [self.paramsidx[j] for j in v]) for k, v in
-            self.paramsind.items())
-        self.recursionparams = 1.0 + np.arange(len(self.paramsnames))
+
+        #mapping coefficient names to indices to unique/parameter array
+        self.paramsidx = dict((name, idx) for (idx,name) in
+                              enumerate(self.paramsnames))
+
+        #mapping branch and leaf names to index in parameter array
+        self.parinddict = dict((k, [self.paramsidx[j] for j in v])
+                               for k,v in self.paramsind.items())
+
+        self.recursionparams = 1. + np.arange(len(self.paramsnames))
+        #for testing that individual parameters are used in the right place
         self.recursionparams = np.zeros(len(self.paramsnames))
-        self.recursionparams[-self.nbranches:] = 1
+        #self.recursionparams[2] = 1
+        self.recursionparams[-self.nbranches:] = 1  #values for tau's
+        #self.recursionparams[-2] = 2
+

     def get_probs(self, params):
-        """
+        '''
         obtain the probability array given an array of parameters

         This is the function that can be called by loglike or other methods
@@ -271,52 +313,257 @@ class RU2NMNL:



-        """
-        pass
+        '''
+        self.recursionparams = params
+        self.calc_prob(self.tree)
+        probs_array = np.array([self.probs[leaf] for leaf in self.leaves])
+        return probs_array
+        #what's the ordering? Should be the same as sequence in tree.
+        #TODO: need a check/assert that this sequence is the same as the
+        #      encoding in endog
+

     def calc_prob(self, tree, parent=None):
-        """walking a tree bottom-up based on dictionary
-        """
-        pass
+        '''walking a tree bottom-up based on dictionary
+        '''
+
+        #0.5#2 #placeholder for now
+        #should be tau=self.taus[name] but as part of params for optimization
+        endog = self.endog
+        datadict = self.datadict
+        paramsind = self.paramsind
+        branchsum = self.branchsum
+
+
+        if isinstance(tree, tuple):   #assumes leaves are int for choice index
+
+            name, subtree = tree
+            self.branchleaves[name] = []  #register branch in dictionary
+
+            tau = self.recursionparams[self.paramsidx['tau_'+name]]
+            if DEBUG:
+                print('----------- starting next branch-----------')
+                print(name, datadict[name], 'tau=', tau)
+                print('subtree', subtree)
+            branchvalue = []
+            if testxb == 2:
+                branchsum = 0
+            elif testxb == 1:
+                branchsum = datadict[name]
+            else:
+                branchsum = name
+            for b in subtree:
+                if DEBUG:
+                    print(b)
+                bv = self.calc_prob(b, name)
+                bv = np.exp(bv/tau)  #this should not be here, when adding branch data
+                branchvalue.append(bv)
+                branchsum = branchsum + bv
+            self.branchvalues[name] = branchvalue #keep track of what was returned
+
+            if DEBUG:
+                print('----------- returning to branch-----------')
+                print(name)
+                print('branchsum in branch', name, branchsum)
+
+            if parent:
+                if DEBUG:
+                    print('parent', parent)
+                self.branchleaves[parent].extend(self.branchleaves[name])
+            if 0:  #not name == 'top':  # not used anymore !!! ???
+            #if not name == 'top':
+                #TODO: do I need this only on the lowest branches ?
+                tmpsum = 0
+                for k in self.branchleaves[name]:
+                    #similar to this is now also in return branch values
+                    #depends on what will be returned
+                    tmpsum += self.probs[k]
+                    iv = np.log(tmpsum)
+
+                for k in self.branchleaves[name]:
+                    self.probstxt[k] = self.probstxt[k] + ['*' + name + '-prob' +
+                                    '(%s)' % ', '.join(self.paramsind[name])]
+
+                    #TODO: does this use the denominator twice now
+                    self.probs[k] = self.probs[k] / tmpsum
+                    if np.size(self.datadict[name])>0:
+                        #not used yet, might have to move one indentation level
+                        #self.probs[k] = self.probs[k] / tmpsum
+##                            np.exp(-self.datadict[name] *
+##                             np.sum(self.recursionparams[self.parinddict[name]]))
+                        if DEBUG:
+                            print('self.datadict[name], self.probs[k]')
+                            print(self.datadict[name], self.probs[k])
+                    #if not name == 'top':
+                    #    self.probs[k] = self.probs[k] * np.exp( iv)
+
+            #walk one level down again to add branch probs to instance.probs
+            self.bprobs[name] = []
+            for bidx, b in enumerate(subtree):
+                if DEBUG:
+                    print('repr(b)', repr(b), bidx)
+                #if len(b) == 1: #TODO: skip leaves, check this
+                if not isinstance(b,  tuple): # isinstance(b, str):
+                    #TODO: replace this with a check for branch (tuple) instead
+                    #this implies name is a bottom branch,
+                    #possible to add special things here
+                    self.bprobs[name].append(self.probs[b])
+                    #TODO: need tau possibly here
+                    self.probs[b] = self.probs[b] / branchsum
+                    if DEBUG:
+                        print('*********** branchsum at bottom branch', branchsum)
+                    #self.bprobs[name].append(self.probs[b])
+                else:
+                    bname = b[0]
+                    branchsum2 = sum(self.branchvalues[name])
+                    assert np.abs(branchsum - branchsum2).sum() < 1e-8
+                    bprob = branchvalue[bidx]/branchsum
+                    self.bprobs[name].append(bprob)
+
+                    for k in self.branchleaves[bname]:
+
+                        if DEBUG:
+                            print('branchprob', bname, k, bprob, branchsum)
+                        #temporary hack with maximum to avoid zeros
+                        self.probs[k] = self.probs[k] * np.maximum(bprob, 1e-4)
+
+
+            if DEBUG:
+                print('working on branch', tree, branchsum)
+            if testxb<2:
+                return branchsum
+            else: #this is the relevant part
+                self.branchsums[name] = branchsum
+                if np.size(self.datadict[name])>0:
+                    branchxb = np.sum(self.datadict[name] *
+                                  self.recursionparams[self.parinddict[name]])
+                else:
+                    branchxb = 0
+                if not name=='top':
+                    tau = self.recursionparams[self.paramsidx['tau_'+name]]
+                else:
+                    tau = 1
+                iv = branchxb + tau * branchsum #which tau: name or parent???
+                return branchxb + tau * np.log(branchsum) #iv
+                #branchsum is now IV, TODO: add effect of branch variables
+
+        else:
+            tau = self.recursionparams[self.paramsidx['tau_'+parent]]
+            if DEBUG:
+                print('parent', parent)
+            self.branchleaves[parent].append(tree) # register leaf with parent
+            self.probstxt[tree] = [tree + '-prob' +
+                                '(%s)' % ', '.join(self.paramsind[tree])]
+            #this is not yet a prob, not normalized to 1, it is exp(x*b)
+            leafprob = np.exp(np.sum(self.datadict[tree] *
+                                  self.recursionparams[self.parinddict[tree]])
+                              / tau)   # fake tau for now, wrong spot ???
+            #it seems I get the same answer with and without tau here
+            self.probs[tree] = leafprob  #= 1 #try initialization only
+            #TODO: where  should I add tau in the leaves
+
+            if testxb == 2:
+                return np.log(leafprob)
+            elif testxb == 1:
+                leavessum = np.array(datadict[tree]) # sum((datadict[bi] for bi in datadict[tree]))
+                if DEBUG:
+                    print('final branch with', tree, ''.join(tree), leavessum) #sum(tree)
+                return leavessum  #sum(xb[tree])
+            elif testxb == 0:
+                return ''.join(tree) #sum(tree)
+


 if __name__ == '__main__':
     DEBUG = 0
-    endog = 5
-    tree0 = 'top', [('Fly', ['Air']), ('Ground', ['Train', 'Car', 'Bus'])]
-    """ this is with real data from Greene's clogit example
+
+    endog = 5 # dummy place holder
+
+
+    ##############  Example similar to Greene
+
+    #get pickled data
+    #endog3, xifloat3 = pickle.load(open('xifloat2.pickle','rb'))
+
+
+    tree0 = ('top',
+                [('Fly',['Air']),
+                 ('Ground', ['Train', 'Car', 'Bus'])
+                 ])
+
+    ''' this is with real data from Greene's clogit example
     datadict = dict(zip(['Air', 'Train', 'Bus', 'Car'],
                         [xifloat[i]for i in range(4)]))
-    """
-    datadict = dict(zip(['Air', 'Train', 'Bus', 'Car'], ['Airdata',
-        'Traindata', 'Busdata', 'Cardata']))
+    '''
+
+    #for testing only (mock that returns its own name)
+    datadict = dict(zip(['Air', 'Train', 'Bus', 'Car'],
+                        ['Airdata', 'Traindata', 'Busdata', 'Cardata']))
+
     if testxb:
-        datadict = dict(zip(['Air', 'Train', 'Bus', 'Car'], np.arange(4)))
-    datadict.update({'top': [], 'Fly': [], 'Ground': []})
-    paramsind = {'top': [], 'Fly': [], 'Ground': [], 'Air': ['GC', 'Ttme',
-        'ConstA', 'Hinc'], 'Train': ['GC', 'Ttme', 'ConstT'], 'Bus': ['GC',
-        'Ttme', 'ConstB'], 'Car': ['GC', 'Ttme']}
+        datadict = dict(zip(['Air', 'Train', 'Bus', 'Car'],
+                        np.arange(4)))
+
+    datadict.update({'top' :   [],
+                     'Fly' :   [],
+                     'Ground': []})
+
+    paramsind = {'top' :   [],
+                 'Fly' :   [],
+                 'Ground': [],
+                 'Air' :   ['GC', 'Ttme', 'ConstA', 'Hinc'],
+                 'Train' : ['GC', 'Ttme', 'ConstT'],
+                 'Bus' :   ['GC', 'Ttme', 'ConstB'],
+                 'Car' :   ['GC', 'Ttme']
+                 }
+
     modru = RU2NMNL(endog, datadict, tree0, paramsind)
     modru.recursionparams[-1] = 2
     modru.recursionparams[1] = 1
+
     print('Example 1')
     print('---------\n')
     print(modru.calc_prob(modru.tree))
+
     print('Tree')
     pprint(modru.tree)
     print('\nmodru.probs')
     pprint(modru.probs)
-    tree2 = 'top', [('B1', ['a', 'b']), ('B2', [('B21', ['c', 'd']), ('B22',
-        ['e', 'f', 'g'])]), ('B3', ['h'])]
-    paramsind2 = {'B1': [], 'a': ['consta', 'p'], 'b': ['constb', 'p'],
-        'B2': ['const2', 'x2'], 'B21': [], 'c': ['constc', 'p', 'time'],
-        'd': ['constd', 'p', 'time'], 'B22': ['x22'], 'e': ['conste', 'p',
-        'hince'], 'f': ['constf', 'p', 'hincf'], 'g': ['p', 'hincg'], 'B3':
-        [], 'h': ['consth', 'p', 'h'], 'top': []}
-    datadict2 = dict([i for i in zip('abcdefgh', lrange(8))])
-    datadict2.update({'top': 1000, 'B1': 100, 'B2': 200, 'B21': 21, 'B22': 
-        22, 'B3': 300})
-    """
+
+
+
+    ##############  example with many layers
+
+    tree2 = ('top',
+                [('B1',['a','b']),
+                 ('B2',
+                       [('B21',['c', 'd']),
+                        ('B22',['e', 'f', 'g'])
+                        ]
+                  ),
+                 ('B3',['h'])])
+
+    #Note: dict loses ordering
+    paramsind2 = {
+     'B1': [],
+     'a': ['consta', 'p'],
+     'b': ['constb', 'p'],
+     'B2': ['const2', 'x2'],
+     'B21': [],
+     'c': ['constc', 'p', 'time'],
+     'd': ['constd', 'p', 'time'],
+     'B22': ['x22'],
+     'e': ['conste', 'p', 'hince'],
+     'f': ['constf', 'p', 'hincf'],
+     'g': [          'p', 'hincg'],
+     'B3': [],
+     'h': ['consth', 'p', 'h'],
+     'top': []}
+
+
+    datadict2 = dict([i for i in zip('abcdefgh',lrange(8))])
+    datadict2.update({'top':1000, 'B1':100, 'B2':200, 'B21':21,'B22':22, 'B3':300})
+    '''
     >>> pprint(datadict2)
     {'B1': 100,
      'B2': 200,
@@ -332,7 +579,9 @@ if __name__ == '__main__':
      'g': 6,
      'h': 7,
      'top': 1000}
-    """
+    '''
+
+
     modru2 = RU2NMNL(endog, datadict2, tree2, paramsind2)
     modru2.recursionparams[-3] = 2
     modru2.recursionparams[3] = 1
@@ -343,22 +592,28 @@ if __name__ == '__main__':
     pprint(modru2.tree)
     print('\nmodru.probs')
     pprint(modru2.probs)
+
+
     print('sum of probs', sum(list(modru2.probs.values())))
     print('branchvalues')
     print(modru2.branchvalues)
     print(modru.branchvalues)
+
     print('branch probabilities')
     print(modru.bprobs)
+
     print('degenerate branches')
     print(modru.branches_degenerate)
-    """
+
+    '''
     >>> modru.bprobs
     {'Fly': [], 'top': [0.0016714179077931082, 0.99832858209220687], 'Ground': []}
     >>> modru2.bprobs
     {'top': [0.25000000000000006, 0.62499999999999989, 0.12500000000000003], 'B22': [], 'B21': [], 'B1': [], 'B2': [0.40000000000000008, 0.59999999999999998], 'B3': []}
-    """
-    params1 = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0])
+    '''
+
+    params1 = np.array([ 0.,  1.,  0.,  0.,  0.,  0.,  1.,  1.,  2.])
     print(modru.get_probs(params1))
-    params2 = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
-        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0])
-    print(modru2.get_probs(params2))
+    params2 = np.array([ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
+                         0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  2.,  1.,  1.])
+    print(modru2.get_probs(params2)) #raises IndexError
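The nested-tuple tree convention walked by RU2NMNL, shown on a tiny example using the same getnodes logic as in the patch (expected output in the final comment):

    tree = ('top',
            [('Fly', ['Air']),
             ('Ground', ['Train', 'Car', 'Bus'])])

    def getnodes(tree):
        # branches are (name, [children]) tuples, leaves are plain strings;
        # returns (branches, leaves, degenerate branches with a single child)
        if isinstance(tree, tuple):
            name, subtree = tree
            branches, leaves = [name], []
            degenerate = [name] if len(subtree) == 1 else []
            for st in subtree:
                b, l, d = getnodes(st)
                branches.extend(b)
                leaves.extend(l)
                degenerate.extend(d)
            return branches, leaves, degenerate
        return [], [tree], []

    print(getnodes(tree))
    # (['top', 'Fly', 'Ground'], ['Air', 'Train', 'Car', 'Bus'], ['Fly'])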
diff --git a/statsmodels/sandbox/regression/try_catdata.py b/statsmodels/sandbox/regression/try_catdata.py
index 277c70b92..2f7709715 100644
--- a/statsmodels/sandbox/regression/try_catdata.py
+++ b/statsmodels/sandbox/regression/try_catdata.py
@@ -1,4 +1,4 @@
-"""
+'''
 Working with categorical data
 =============================

@@ -10,25 +10,116 @@ group statistics with scipy.ndimage can handle large number of observations and
 scipy.ndimage stats is missing count

 new: np.bincount can also be used for calculating values per label
-"""
+'''
 from statsmodels.compat.python import lrange
 import numpy as np
+
 from scipy import ndimage

+#problem: ndimage does not allow axis argument,
+#   calculates mean or var corresponding to axis=None in np.mean, np.var
+#   useless for multivariate application
+
+def labelmeanfilter(y, x):
+    # requires integer labels
+    # from mailing list scipy-user 2009-02-11
+    labelsunique = np.arange(np.max(y)+1)
+    labelmeans = np.array(ndimage.mean(x, labels=y, index=labelsunique))
+    # returns label means for each original observation
+    return labelmeans[y]
+
+#groupcount: i.e. number of observation by group/label
+#np.array(ndimage.histogram(yrvs[:,0],0,10,1,labels=yrvs[:,0],index=np.unique(yrvs[:,0])))
+
+def labelmeanfilter_nd(y, x):
+    # requires integer labels
+    # from mailing list scipy-user 2009-02-11
+    # adjusted for 2d x with column variables
+
+    labelsunique = np.arange(np.max(y)+1)
+    labmeansdata = []
+    labmeans = []
+
+    for xx in x.T:
+        labelmeans = np.array(ndimage.mean(xx, labels=y, index=labelsunique))
+        labmeansdata.append(labelmeans[y])
+        labmeans.append(labelmeans)
+    # group count:
+    labelcount = np.array(ndimage.histogram(y, labelsunique[0], labelsunique[-1]+1,
+                        1, labels=y, index=labelsunique))
+
+    # returns array of label/group counts and of label/group means
+    #         and label/group means for each original observation
+    return labelcount, np.array(labmeans), np.array(labmeansdata).T
+
+def labelmeanfilter_str(ys, x):
+    # works also for string labels in ys, but requires 1D
+    # from mailing list scipy-user 2009-02-11
+    unil, unilinv = np.unique(ys, return_index=False, return_inverse=True)
+    labelmeans = np.array(ndimage.mean(x, labels=unilinv, index=np.arange(np.max(unil)+1)))
+    arr3 = labelmeans[unilinv]
+    return arr3

 def groupstatsbin(factors, values):
-    """uses np.bincount, assumes factors/labels are integers
-    """
-    pass
+    '''uses np.bincount, assumes factors/labels are integers
+    '''
+    n = len(factors)
+    ix,rind = np.unique(factors, return_inverse=1)
+    gcount = np.bincount(rind)
+    gmean = np.bincount(rind, weights=values)/ (1.0*gcount)
+    meanarr = gmean[rind]
+    withinvar = np.bincount(rind, weights=(values-meanarr)**2) / (1.0*gcount)
+    withinvararr = withinvar[rind]
+    return gcount, gmean , meanarr, withinvar, withinvararr


 def convertlabels(ys, indices=None):
-    """convert labels based on multiple variables or string labels to unique
+    '''convert labels based on multiple variables or string labels to unique
     index labels 0,1,2,...,nk-1 where nk is the number of distinct labels
-    """
-    pass
+    '''
+    if indices is None:
+        ylabel = ys
+    else:
+        idx = np.array(indices)
+        if idx.size > 1 and ys.ndim == 2:
+            ylabel = np.array(['@%s@' % ii[:2].tostring() for ii in ys])[:,np.newaxis]
+            #alternative
+    ##        if ys[:,idx].dtype.kind == 'S':
+    ##            ylabel = nd.array([' '.join(ii[:2]) for ii in ys])[:,np.newaxis]
+        else:
+            # there might be a problem here
+            ylabel = ys

+    unil, unilinv = np.unique(ylabel, return_index=False, return_inverse=True)
+    return unilinv, np.arange(len(unil)), unil

 def groupsstats_1d(y, x, labelsunique):
-    """use ndimage to get fast mean and variance"""
-    pass
+    '''use ndimage to get fast mean and variance'''
+    labelmeans = np.array(ndimage.mean(x, labels=y, index=labelsunique))
+    labelvars = np.array(ndimage.var(x, labels=y, index=labelsunique))
+    return labelmeans, labelvars
+
+def cat2dummy(y, nonseq=0):
+    if nonseq or (y.ndim == 2 and y.shape[1] > 1):
+        ycat, uniques, unitransl =  convertlabels(y, lrange(y.shape[1]))
+    else:
+        ycat = y.copy()
+        ymin = y.min()
+        uniques = np.arange(ymin,y.max()+1)
+    if ycat.ndim == 1:
+        ycat = ycat[:,np.newaxis]
+    # this builds matrix nobs*ncat
+    dummy = (ycat == uniques).astype(int)
+    return dummy
+
+def groupsstats_dummy(y, x, nonseq=0):
+    if x.ndim == 1:
+        # use groupsstats_1d
+        x = x[:,np.newaxis]
+    dummy = cat2dummy(y, nonseq=nonseq)
+    countgr = dummy.sum(0, dtype=float)
+    meangr = np.dot(x.T,dummy)/countgr
+    meandata = np.dot(dummy,meangr.T) # category/group means as array in shape of x
+    xdevmeangr = x - meandata  # deviation from category/group mean
+    vargr = np.dot((xdevmeangr * xdevmeangr).T, dummy) / countgr
+    return meangr, vargr, xdevmeangr, countgr
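A toy illustration (not part of the patch) of the np.bincount pattern that groupstatsbin uses for per-group counts and means with integer labels:

    import numpy as np

    factors = np.array([0, 0, 1, 1, 1, 2])
    values = np.array([1.0, 3.0, 2.0, 4.0, 6.0, 5.0])
    labels, rind = np.unique(factors, return_inverse=True)
    gcount = np.bincount(rind)                          # observations per group
    gmean = np.bincount(rind, weights=values) / gcount  # group means
    print(gcount)   # [2 3 1]
    print(gmean)    # [2. 4. 5.]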
diff --git a/statsmodels/sandbox/regression/try_ols_anova.py b/statsmodels/sandbox/regression/try_ols_anova.py
index 9b1af19d1..a9386154a 100644
--- a/statsmodels/sandbox/regression/try_ols_anova.py
+++ b/statsmodels/sandbox/regression/try_ols_anova.py
@@ -1,4 +1,4 @@
-""" convenience functions for ANOVA type analysis with OLS
+''' convenience functions for ANOVA type analysis with OLS

 Note: statistical results of ANOVA are not checked, OLS is
 checked but not whether the reported results are the ones used
@@ -10,33 +10,41 @@ TODO:
  * ...
  *

-"""
+'''
+
 from statsmodels.compat.python import lmap
 import numpy as np
+#from scipy import stats
 import statsmodels.api as sm

-
 def data2dummy(x, returnall=False):
-    """convert array of categories to dummy variables
+    '''convert array of categories to dummy variables
     by default drops dummy variable for last category
-    uses ravel, 1d only"""
-    pass
-
+    uses ravel, 1d only'''
+    x = x.ravel()
+    groups = np.unique(x)
+    if returnall:
+        return (x[:, None] == groups).astype(int)
+    else:
+        return (x[:, None] == groups).astype(int)[:,:-1]

 def data2proddummy(x):
-    """creates product dummy variables from 2 columns of 2d array
+    '''creates product dummy variables from 2 columns of 2d array

     drops last dummy variable, but not from each category
     singular with simple dummy variable but not with constant

     quickly written, no safeguards

-    """
-    pass
-
+    '''
+    #brute force, assumes x is 2d
+    #replace with encoding if possible
+    groups = np.unique(lmap(tuple, x.tolist()))
+    #includes singularity with additive factors
+    return (x==groups[:,None,:]).all(-1).T.astype(int)[:,:-1]

-def data2groupcont(x1, x2):
-    """create dummy continuous variable
+def data2groupcont(x1,x2):
+    '''create dummy continuous variable

     Parameters
     ----------
@@ -48,11 +56,18 @@ def data2groupcont(x1, x2):
     Notes
     -----
     useful for group specific slope coefficients in regression
-    """
-    pass
-
-
-anova_str0 = """
+    '''
+    if x2.ndim == 1:
+        x2 = x2[:,None]
+    dummy = data2dummy(x1, returnall=True)
+    return dummy * x2
+
+# Result strings
+#the second leaves the constant in, not with NIST regression
+#but something fishy with res.ess negative in examples ?
+#not checked if these are all the right ones
+
+anova_str0 = '''
 ANOVA statistics (model sum of squares excludes constant)
 Source    DF  Sum Squares   Mean Square    F Value    Pr > F
 Model     %(df_model)i        %(ess)f       %(mse_model)f   %(fvalue)f %(f_pvalue)f
@@ -60,8 +75,9 @@ Error     %(df_resid)i     %(ssr)f       %(mse_resid)f
 CTotal    %(nobs)i    %(uncentered_tss)f     %(mse_total)f

 R squared  %(rsquared)f
-"""
-anova_str = """
+'''
+
+anova_str = '''
 ANOVA statistics (model sum of squares includes constant)
 Source    DF  Sum Squares   Mean Square    F Value    Pr > F
 Model     %(df_model)i      %(ssmwithmean)f       %(mse_model)f   %(fvalue)f %(f_pvalue)f
@@ -69,19 +85,28 @@ Error     %(df_resid)i     %(ssr)f       %(mse_resid)f
 CTotal    %(nobs)i    %(uncentered_tss)f     %(mse_total)f

 R squared  %(rsquared)f
-"""
+'''


 def anovadict(res):
-    """update regression results dictionary with ANOVA specific statistics
+    '''update regression results dictionary with ANOVA specific statistics

     not checked for completeness
-    """
-    pass
+    '''
+    ad = {}
+    ad.update(res.__dict__)  #dict does not work with cached attributes
+    anova_attr = ['df_model', 'df_resid', 'ess', 'ssr','uncentered_tss',
+                 'mse_model', 'mse_resid', 'mse_total', 'fvalue', 'f_pvalue',
+                  'rsquared']
+    for key in anova_attr:
+        ad[key] = getattr(res, key)
+    ad['nobs'] = res.model.nobs
+    ad['ssmwithmean'] = res.uncentered_tss - res.ssr
+    return ad


 def form2design(ss, data):
-    """convert string formula to data dictionary
+    '''convert string formula to data dictionary

     ss : str
      * I : add constant
@@ -115,77 +140,146 @@ def form2design(ss, data):
     -----

     with sorted dict, separate name list would not be necessary
-    """
-    pass
-
+    '''
+    vars = {}
+    names = []
+    for item in ss.split():
+        if item == 'I':
+            vars['const'] = np.ones(data.shape[0])
+            names.append('const')
+        elif ':' not in item:
+            vars[item] = data[item]
+            names.append(item)
+        elif item[:2] == 'F:':
+            v = item.split(':')[1]
+            vars[v] = data2dummy(data[v])
+            names.append(v)
+        elif item[:2] == 'P:':
+            v = item.split(':')[1].split('*')
+            vars[''.join(v)] = data2proddummy(np.c_[data[v[0]],data[v[1]]])
+            names.append(''.join(v))
+        elif item[:2] == 'G:':
+            v = item.split(':')[1].split('*')
+            vars[''.join(v)] = data2groupcont(data[v[0]], data[v[1]])
+            names.append(''.join(v))
+        else:
+            raise ValueError('unknown expression in formula')
+    return vars, names

 def dropname(ss, li):
-    """drop names from a list of strings,
+    '''drop names from a list of strings,
     names to drop are in space delimited list
     does not change original list
-    """
-    pass
-
+    '''
+    newli = li[:]
+    for item in ss.split():
+        newli.remove(item)
+    return newli

 if __name__ == '__main__':
+
+    # Test Example with created data
+    # ------------------------------
+
     nobs = 1000
-    testdataint = np.random.randint(3, size=(nobs, 4)).view([('a', int), (
-        'b', int), ('c', int), ('d', int)])
-    testdatacont = np.random.normal(size=(nobs, 2)).view([('e', float), (
-        'f', float)])
+    testdataint = np.random.randint(3, size=(nobs,4)).view([('a',int),('b',int),('c',int),('d',int)])
+    testdatacont = np.random.normal( size=(nobs,2)).view([('e',float), ('f',float)])
     import numpy.lib.recfunctions
-    dt2 = numpy.lib.recfunctions.zip_descr((testdataint, testdatacont),
-        flatten=True)
-    testdata = np.empty((nobs, 1), dt2)
+    dt2 = numpy.lib.recfunctions.zip_descr((testdataint, testdatacont),flatten=True)
+    # concatenate structured arrays
+    testdata = np.empty((nobs,1), dt2)
     for name in testdataint.dtype.names:
         testdata[name] = testdataint[name]
     for name in testdatacont.dtype.names:
         testdata[name] = testdatacont[name]
-    if 0:
-        xx, n = form2design('F:a', testdata)
+
+
+    #print(form2design('a',testdata)
+
+    if 0: # print only when nobs is small, e.g. nobs=10
+        xx, n = form2design('F:a',testdata)
         print(xx)
-        print(form2design('P:a*b', testdata))
-        print(data2proddummy(np.c_[testdata['a'], testdata['b']]))
-        xx, names = form2design('a F:b P:c*d', testdata)
+        print(form2design('P:a*b',testdata))
+        print(data2proddummy((np.c_[testdata['a'],testdata['b']])))
+
+        xx, names = form2design('a F:b P:c*d',testdata)
+
+    #xx, names = form2design('I a F:b F:c F:d P:c*d',testdata)
     xx, names = form2design('I a F:b P:c*d', testdata)
     xx, names = form2design('I a F:b P:c*d G:a*e f', testdata)
+
+
     X = np.column_stack([xx[nn] for nn in names])
-    y = X.sum(1) + 0.01 * np.random.normal(size=nobs)
-    rest1 = sm.OLS(y, X).fit()
+    # simple test version: all coefficients equal to one
+    y = X.sum(1) + 0.01*np.random.normal(size=(nobs))
+    rest1 = sm.OLS(y,X).fit() #results
     print(rest1.params)
     print(anova_str % anovadict(rest1))
+
+
     X = np.column_stack([xx[nn] for nn in dropname('ae f', names)])
-    y = X.sum(1) + 0.01 * np.random.normal(size=nobs)
-    rest1 = sm.OLS(y, X).fit()
+    # simple test version: all coefficients equal to one
+    y = X.sum(1) + 0.01*np.random.normal(size=(nobs))
+    rest1 = sm.OLS(y,X).fit()
     print(rest1.params)
     print(anova_str % anovadict(rest1))
-    dt_b = np.dtype([('breed', int), ('sex', int), ('litter', int), ('pen',
-        int), ('pig', int), ('age', float), ('bage', float), ('y', float)])
-    dta = np.genfromtxt('dftest3.data', dt_b, missing='.', usemask=True)
+
+
+    # Example: from Bruce
+    # -------------------
+
+    #get data and clean it
+    #^^^^^^^^^^^^^^^^^^^^^
+
+    # requires file 'dftest3.data' posted by Bruce
+
+    # read data set and drop rows with missing data
+    dt_b = np.dtype([('breed', int), ('sex', int), ('litter', int),
+                   ('pen', int), ('pig', int), ('age', float),
+                   ('bage', float), ('y', float)])
+    dta = np.genfromtxt('dftest3.data', dt_b, missing_values='.', usemask=True)
     print('missing', [dta.mask[k].sum() for k in dta.dtype.names])
     m = dta.mask.view(bool)
-    droprows = m.reshape(-1, len(dta.dtype.names)).any(1)
-    dta_use_b1 = dta[~droprows, :].data
+    droprows = m.reshape(-1,len(dta.dtype.names)).any(1)
+    # get complete data as plain structured array
+    # maybe does not work with masked arrays
+    dta_use_b1 = dta[~droprows,:].data
     print(dta_use_b1.shape)
     print(dta_use_b1.dtype)
+
+    #Example b1: variables from Bruce's glm
+    #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    # prepare data and dummy variables
     xx_b1, names_b1 = form2design('I F:sex age', dta_use_b1)
+    # create design matrix
     X_b1 = np.column_stack([xx_b1[nn] for nn in dropname('', names_b1)])
     y_b1 = dta_use_b1['y']
+    # estimate using OLS
     rest_b1 = sm.OLS(y_b1, X_b1).fit()
+    # print(results)
     print(rest_b1.params)
     print(anova_str % anovadict(rest_b1))
+    #comparison with the original results is available only in the original script
+    #print(anova_str % anovadict(res_b0))
+
+    # Example: use all variables except pig identifier
+    #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
     allexog = ' '.join(dta.dtype.names[:-1])
-    xx_b1a, names_b1a = form2design('I F:breed F:sex F:litter F:pen age bage',
-        dta_use_b1)
+    #'breed sex litter pen pig age bage'
+
+    xx_b1a, names_b1a = form2design('I F:breed F:sex F:litter F:pen age bage', dta_use_b1)
     X_b1a = np.column_stack([xx_b1a[nn] for nn in dropname('', names_b1a)])
     y_b1a = dta_use_b1['y']
     rest_b1a = sm.OLS(y_b1a, X_b1a).fit()
     print(rest_b1a.params)
     print(anova_str % anovadict(rest_b1a))
+
     for dropn in names_b1a:
         print(('\nResults dropping', dropn))
-        X_b1a_ = np.column_stack([xx_b1a[nn] for nn in dropname(dropn,
-            names_b1a)])
+        X_b1a_ = np.column_stack([xx_b1a[nn] for nn in dropname(dropn, names_b1a)])
         y_b1a_ = dta_use_b1['y']
         rest_b1a_ = sm.OLS(y_b1a_, X_b1a_).fit()
+        #print(rest_b1a_.params)
         print(anova_str % anovadict(rest_b1a_))
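
A minimal usage sketch of the formula mini-language handled by form2design
('I' adds a constant, a bare name adds the column as-is, 'F:x' expands a
factor into dummies, 'P:x*y' builds product dummies).  The tiny structured
array is made up for illustration, and the helpers from this module
(form2design, data2dummy, data2proddummy) are assumed to be available in the
same namespace:

    import numpy as np

    nobs = 8
    dat = np.zeros(nobs, dtype=[('a', int), ('e', float)])
    dat['a'] = np.repeat([0, 1, 2, 3], 2)   # integer factor with 4 levels
    dat['e'] = np.linspace(0., 1., nobs)    # continuous regressor

    xx, names = form2design('I F:a e', dat)        # const, dummies for a, e
    X = np.column_stack([xx[nn] for nn in names])
    print(names)    # ['const', 'a', 'e']
    print(X.shape)  # (8, number of generated columns)
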
diff --git a/statsmodels/sandbox/regression/try_treewalker.py b/statsmodels/sandbox/regression/try_treewalker.py
index d2d9cc9c6..9359f1ee0 100644
--- a/statsmodels/sandbox/regression/try_treewalker.py
+++ b/statsmodels/sandbox/regression/try_treewalker.py
@@ -1,51 +1,130 @@
-"""Trying out tree structure for nested logit
+'''Trying out tree structure for nested logit

 sum is standing for likelihood calculations

 should collect and aggregate likelihood contributions bottom up

-"""
+'''
 from statsmodels.compat.python import lrange
 import numpy as np
-tree = [[0, 1], [[2, 3], [4, 5, 6]], [7]]
-xb = 2 * np.arange(8)
-testxb = 1

+tree = [[0,1],[[2,3],[4,5,6]],[7]]
+#singleton/degenerate branch needs to be list
+
+xb = 2*np.arange(8)
+testxb = 1 #0

 def branch(tree):
-    """walking a tree bottom-up
-    """
-    pass
+    '''walking a tree bottom-up
+    '''
+
+    if not isinstance(tree[0], int):   #assumes leaves are int for choice index
+        branchsum = 0
+        for b in tree:
+            branchsum += branch(b)
+    else:
+        print(tree)
+        print('final branch with', tree, sum(tree))
+        if testxb:
+            return sum(xb[tree])
+        else:
+            return sum(tree)

+    print('working on branch', tree, branchsum)
+    return branchsum

 print(branch(tree))
-testxb = 0


+
+#new version that also keeps track of the branch name and allows V_j for a branch
+#   as in Greene; V_j + lambda * IV is not the same as including the
+#   explanatory variables in leaf X_j: V_j is linear in X, IV is the logsumexp of X
+
+
+testxb = 0  # 0: concatenate names, 1: aggregate the data2 values
 def branch2(tree):
-    """walking a tree bottom-up based on dictionary
-    """
-    pass
-
-
-tree = [[0, 1], [[2, 3], [4, 5, 6]], [7]]
-tree2 = 'top', [('B1', ['a', 'b']), ('B2', [('B21', ['c', 'd']), ('B22', [
-    'e', 'f', 'g'])]), ('B3', ['h'])]
-data2 = dict([i for i in zip('abcdefgh', lrange(8))])
-data2.update({'top': 1000, 'B1': 100, 'B2': 200, 'B21': 21, 'B22': 22, 'B3':
-    300})
-print("""
- tree with dictionary data""")
-print(branch2(tree2))
-paramsind = {'B1': [], 'a': ['consta', 'p'], 'b': ['constb', 'p'], 'B2': [
-    'const2', 'x2'], 'B21': [], 'c': ['consta', 'p', 'time'], 'd': [
-    'consta', 'p', 'time'], 'B22': ['x22'], 'e': ['conste', 'p', 'hince'],
-    'f': ['constt', 'p', 'hincf'], 'g': ['p', 'hincg'], 'B3': [], 'h': [
-    'consth', 'p', 'h'], 'top': []}
+    '''walking a tree bottom-up based on dictionary
+    '''
+
+
+    if isinstance(tree,  tuple):   #assumes leaves are int for choice index
+        name, subtree = tree
+        print(name, data2[name])
+        print('subtree', subtree)
+        if testxb:
+            branchsum = data2[name]
+        else:
+            branchsum = name  #0
+        for b in subtree:
+            #branchsum += branch2(b)
+            branchsum = branchsum + branch2(b)
+    else:
+        leavessum = sum((data2[bi] for bi in tree))
+        print('final branch with', tree, ''.join(tree), leavessum) #sum(tree)
+        if testxb:
+            return leavessum  #sum(xb[tree])
+        else:
+            return ''.join(tree) #sum(tree)
+
+    print('working on branch', tree, branchsum)
+    return branchsum
+
+tree = [[0,1],[[2,3],[4,5,6]],[7]]
+tree2 = ('top',
+            [('B1',['a','b']),
+             ('B2',
+                   [('B21',['c', 'd']),
+                    ('B22',['e', 'f', 'g'])
+                    ]
+              ),
+             ('B3',['h'])]
+         )
+
+data2 = dict([i for i in zip('abcdefgh',lrange(8))])
+#data2.update({'top':1000, 'B1':100, 'B2':200, 'B21':300,'B22':400, 'B3':400})
+data2.update({'top':1000, 'B1':100, 'B2':200, 'B21':21,'B22':22, 'B3':300})
+
+#data2
+#{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'h': 7,
+#'top': 1000, 'B22': 22, 'B21': 21, 'B1': 100, 'B2': 200, 'B3': 300}
+
+print('\n tree with dictionary data')
+print(branch2(tree2))  # results look correct for testxb=0 and 1
+
+
+#parameters/coefficients: map coefficient names to indices, one list of
+#indices into a 1d params array for each leaf and branch
+
+#Note: dict loses ordering
+paramsind = {
+ 'B1': [],
+ 'a': ['consta', 'p'],
+ 'b': ['constb', 'p'],
+ 'B2': ['const2', 'x2'],
+ 'B21': [],
+ 'c': ['consta', 'p', 'time'],
+ 'd': ['consta', 'p', 'time'],
+ 'B22': ['x22'],
+ 'e': ['conste', 'p', 'hince'],
+ 'f': ['constt', 'p', 'hincf'],
+ 'g': [          'p', 'hincg'],
+ 'B3': [],
+ 'h': ['consth', 'p', 'h'],
+ 'top': []}
+
+#unique parameter array names,
+#sorted alphabetically; the order is/should be only internal
+
 paramsnames = sorted(set([i for j in paramsind.values() for i in j]))
-paramsidx = dict((name, idx) for idx, name in enumerate(paramsnames))
-inddict = dict((k, [paramsidx[j] for j in v]) for k, v in paramsind.items())
-"""
+
+#mapping coefficient names to indices to unique/parameter array
+paramsidx = dict((name, idx) for (idx,name) in enumerate(paramsnames))
+
+#mapping branch and leaf names to index in parameter array
+inddict = dict((k,[paramsidx[j] for j in v]) for k,v in paramsind.items())
+
+'''
 >>> paramsnames
 ['const2', 'consta', 'constb', 'conste', 'consth', 'constt', 'h', 'hince',
  'hincf', 'hincg', 'p', 'time', 'x2', 'x22']
@@ -63,4 +142,4 @@ inddict = dict((k, [paramsidx[j] for j in v]) for k, v in paramsind.items())
  'g': ['p', 'hincg'], 'f': ['constt', 'p', 'hincf'], 'h': ['consth', 'p', 'h'],
  'top': [], 'B22': ['x22'], 'B21': [], 'B1': [], 'B2': ['const2', 'x2'],
  'B3': []}
-"""
+'''
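
A standalone sketch of the same bottom-up traversal with an actual
likelihood-style aggregation: instead of summing or concatenating, each
branch collapses its children with a logsumexp (the "inclusive value" of a
nested logit).  The toy tree and leaf utilities are made up; the recursion
mirrors branch()/branch2() above:

    import numpy as np

    def inclusive_value(tree, leaf_value):
        """log(sum(exp(v))) aggregated bottom-up over a nested list tree."""
        if isinstance(tree[0], int):          # branch of leaves
            vals = np.array([leaf_value[i] for i in tree])
        else:                                 # branch of sub-branches
            vals = np.array([inclusive_value(b, leaf_value) for b in tree])
        vmax = vals.max()
        return vmax + np.log(np.exp(vals - vmax).sum())

    toy_tree = [[0, 1], [[2, 3], [4, 5, 6]], [7]]
    xb = 0.1 * np.arange(8)                   # leaf utilities V_j
    print(inclusive_value(toy_tree, xb))
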
diff --git a/statsmodels/sandbox/rls.py b/statsmodels/sandbox/rls.py
index 4061961ea..9daa663cc 100644
--- a/statsmodels/sandbox/rls.py
+++ b/statsmodels/sandbox/rls.py
@@ -37,7 +37,7 @@ class RLS(GLS):
     A Pedagogical Note", The Review of Economics and Statistics, 1991.
     """

-    def __init__(self, endog, exog, constr, param=0.0, sigma=None):
+    def __init__(self, endog, exog, constr, param=0., sigma=None):
         N, Q = exog.shape
         constr = np.asarray(constr)
         if constr.ndim == 1:
@@ -53,7 +53,7 @@ class RLS(GLS):
             param = np.ones((K,)) * param
         self.param = param
         if sigma is None:
-            sigma = 1.0
+            sigma = 1.
         if np.isscalar(sigma):
             sigma = np.ones(N) * sigma
         sigma = np.squeeze(sigma)
@@ -62,56 +62,90 @@ class RLS(GLS):
             self.cholsigmainv = np.diag(np.sqrt(sigma))
         else:
             self.sigma = sigma
-            self.cholsigmainv = np.linalg.cholesky(np.linalg.pinv(self.sigma)
-                ).T
+            self.cholsigmainv = np.linalg.cholesky(np.linalg.pinv(self.sigma)).T
         super(GLS, self).__init__(endog, exog)
-    _rwexog = None

+    _rwexog = None
     @property
     def rwexog(self):
         """Whitened exogenous variables augmented with restrictions"""
-        pass
-    _inv_rwexog = None
+        if self._rwexog is None:
+            P = self.ncoeffs
+            K = self.nconstraint
+            design = np.zeros((P + K, P + K))
+            design[:P, :P] = np.dot(self.wexog.T, self.wexog) #top left
+            constr = np.reshape(self.constraint, (K, P))
+            design[:P, P:] = constr.T #top right partition
+            design[P:, :P] = constr #bottom left partition
+            design[P:, P:] = np.zeros((K, K)) #bottom right partition
+            self._rwexog = design
+        return self._rwexog

+    _inv_rwexog = None
     @property
     def inv_rwexog(self):
         """Inverse of self.rwexog"""
-        pass
-    _rwendog = None
+        if self._inv_rwexog is None:
+            self._inv_rwexog = np.linalg.inv(self.rwexog)
+        return self._inv_rwexog

+    _rwendog = None
     @property
     def rwendog(self):
         """Whitened endogenous variable augmented with restriction parameters"""
-        pass
-    _ncp = None
+        if self._rwendog is None:
+            P = self.ncoeffs
+            K = self.nconstraint
+            response = np.zeros((P + K,))
+            response[:P] = np.dot(self.wexog.T, self.wendog)
+            response[P:] = self.param
+            self._rwendog = response
+        return self._rwendog

+    _ncp = None
     @property
     def rnorm_cov_params(self):
         """Parameter covariance under restrictions"""
-        pass
-    _wncp = None
+        if self._ncp is None:
+            P = self.ncoeffs
+            self._ncp = self.inv_rwexog[:P, :P]
+        return self._ncp

+    _wncp = None
     @property
     def wrnorm_cov_params(self):
         """
         Heteroskedasticity-consistent parameter covariance
         Used to calculate White standard errors.
         """
-        pass
-    _coeffs = None
+        if self._wncp is None:
+            df = self.df_resid
+            pred = np.dot(self.wexog, self.coeffs)
+            eps = np.diag((self.wendog - pred) ** 2)
+            sigmaSq = np.sum(eps)
+            pinvX = np.dot(self.rnorm_cov_params, self.wexog.T)
+            self._wncp = np.dot(np.dot(pinvX, eps), pinvX.T) * df / sigmaSq
+        return self._wncp

+    _coeffs = None
     @property
     def coeffs(self):
         """Estimated parameters"""
-        pass
+        if self._coeffs is None:
+            betaLambda = np.dot(self.inv_rwexog, self.rwendog)
+            self._coeffs = betaLambda[:self.ncoeffs]
+        return self._coeffs

+    def fit(self):
+        rncp = self.wrnorm_cov_params
+        lfit = RegressionResults(self, self.coeffs, normalized_cov_params=rncp)
+        return lfit

-if __name__ == '__main__':
+if __name__=="__main__":
     import statsmodels.api as sm
     dta = np.genfromtxt('./rlsdata.txt', names=True)
-    design = np.column_stack((dta['Y'], dta['Y'] ** 2, dta[['NE', 'NC', 'W',
-        'S']].view(float).reshape(dta.shape[0], -1)))
+    design = np.column_stack((dta['Y'],dta['Y']**2,dta[['NE','NC','W','S']].view(float).reshape(dta.shape[0],-1)))
     design = sm.add_constant(design, prepend=True)
-    rls_mod = RLS(dta['G'], design, constr=[0, 0, 0, 1, 1, 1, 1])
+    rls_mod = RLS(dta['G'],design, constr=[0,0,0,1,1,1,1])
     rls_fit = rls_mod.fit()
     print(rls_fit.params)
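
The rwexog/rwendog properties above assemble the bordered system of the
restricted least squares problem, min ||y - X b||^2 subject to R b = q, i.e.
[[X'X, R'], [R, 0]] [b; lambda] = [X'y; q].  A self-contained sketch with
made-up data and a single restriction (b1 + b2 = 0), solved directly with
numpy:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
    y = X @ np.array([1.0, 2.0, -2.0]) + 0.1 * rng.normal(size=50)

    R = np.array([[0.0, 1.0, 1.0]])      # restriction matrix
    q = np.array([0.0])                  # restriction value: b1 + b2 = 0

    P, K = X.shape[1], R.shape[0]
    A = np.zeros((P + K, P + K))
    A[:P, :P] = X.T @ X                  # top-left block: X'X
    A[:P, P:] = R.T                      # top-right block: R'
    A[P:, :P] = R                        # bottom-left block: R
    b = np.concatenate([X.T @ y, q])     # stacked right-hand side [X'y; q]

    beta_lambda = np.linalg.solve(A, b)
    print(beta_lambda[:P])               # restricted coefficient estimates
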
diff --git a/statsmodels/sandbox/stats/contrast_tools.py b/statsmodels/sandbox/stats/contrast_tools.py
index eed9c591d..e41758715 100644
--- a/statsmodels/sandbox/stats/contrast_tools.py
+++ b/statsmodels/sandbox/stats/contrast_tools.py
@@ -1,4 +1,4 @@
-"""functions to work with contrasts for multiple tests
+'''functions to work with contrasts for multiple tests

 contrast matrices for comparing all pairs, all levels to reference level, ...
 extension to 2-way groups in progress
@@ -18,13 +18,18 @@ Idea for second part
   - connect to new multiple comparison for contrast matrices, based on
     multivariate normal or t distribution (Hothorn, Bretz, Westfall)

-"""
+'''
+
+
+
 from numpy.testing import assert_equal
+
 import numpy as np

+#next 3 functions copied from multicomp.py

 def contrast_allpairs(nm):
-    """contrast or restriction matrix for all pairs of nm variables
+    '''contrast or restriction matrix for all pairs of nm variables

     Parameters
     ----------
@@ -35,12 +40,18 @@ def contrast_allpairs(nm):
     contr : ndarray, 2d, (nm*(nm-1)/2, nm)
        contrast matrix for all pairwise comparisons

-    """
-    pass
-
+    '''
+    contr = []
+    for i in range(nm):
+        for j in range(i+1, nm):
+            contr_row = np.zeros(nm)
+            contr_row[i] = 1
+            contr_row[j] = -1
+            contr.append(contr_row)
+    return np.array(contr)

 def contrast_all_one(nm):
-    """contrast or restriction matrix for all against first comparison
+    '''contrast or restriction matrix for all against first comparison

     Parameters
     ----------
@@ -51,12 +62,12 @@ def contrast_all_one(nm):
     contr : ndarray, 2d, (nm-1, nm)
        contrast matrix for all against first comparisons

-    """
-    pass
-
+    '''
+    contr = np.column_stack((np.ones(nm-1), -np.eye(nm-1)))
+    return contr

 def contrast_diff_mean(nm):
-    """contrast or restriction matrix for all against mean comparison
+    '''contrast or restriction matrix for all against mean comparison

     Parameters
     ----------
@@ -67,13 +78,31 @@ def contrast_diff_mean(nm):
     contr : ndarray, 2d, (nm-1, nm)
        contrast matrix for all against mean comparisons

-    """
-    pass
+    '''
+    return np.eye(nm) - np.ones((nm,nm))/nm

+def signstr(x, noplus=False):
+    if x in [-1,0,1]:
+        if not noplus:
+            return '+' if np.sign(x)>=0 else '-'
+        else:
+            return '' if np.sign(x)>=0 else '-'
+    else:
+        return str(x)
+
+
+def contrast_labels(contrasts, names, reverse=False):
+    if reverse:
+        sl = slice(None, None, -1)
+    else:
+        sl = slice(None)
+    labels = [''.join(['%s%s' % (signstr(c, noplus=True), v)
+                       for c, v in list(zip(row, names))[sl] if c != 0])
+              for row in contrasts]
+    return labels

-def contrast_product(names1, names2, intgroup1=None, intgroup2=None, pairs=
-    False):
-    """build contrast matrices for products of two categorical variables
+def contrast_product(names1, names2, intgroup1=None, intgroup2=None, pairs=False):
+    '''build contrast matrices for products of two categorical variables

     this is an experimental script and should be converted to a class

@@ -93,12 +122,53 @@ def contrast_product(names1, names2, intgroup1=None, intgroup2=None, pairs=

     ? does contrast_all_pairs work as a plugin to get all pairs ?

-    """
-    pass
+    '''
+
+    n1 = len(names1)
+    n2 = len(names2)
+    names_prod = ['%s_%s' % (i,j) for i in names1 for j in names2]
+    ee1 = np.zeros((1,n1))
+    ee1[0,0] = 1
+    if not pairs:
+        dd = np.r_[ee1, -contrast_all_one(n1)]
+    else:
+        dd = np.r_[ee1, -contrast_allpairs(n1)]
+
+    contrast_prod = np.kron(dd[1:], np.eye(n2))
+    names_contrast_prod0 = contrast_labels(contrast_prod, names_prod, reverse=True)
+    names_contrast_prod = [''.join(['%s%s' % (signstr(c, noplus=True), v)
+                           for c, v in list(zip(row, names_prod))[::-1] if c != 0])
+                           for row in contrast_prod]
+
+    ee2 = np.zeros((1,n2))
+    ee2[0,0] = 1
+    #dd2 = np.r_[ee2, -contrast_all_one(n2)]
+    if not pairs:
+        dd2 = np.r_[ee2, -contrast_all_one(n2)]
+    else:
+        dd2 = np.r_[ee2, -contrast_allpairs(n2)]
+
+    contrast_prod2 = np.kron(np.eye(n1), dd2[1:])
+    names_contrast_prod2 = [''.join(['%s%s' % (signstr(c, noplus=True), v)
+                            for c, v in list(zip(row, names_prod))[::-1] if c != 0])
+                            for row in contrast_prod2]
+
+    if (intgroup1 is not None) and (intgroup2 is not None):
+        d1, _ = dummy_1d(intgroup1)
+        d2, _ = dummy_1d(intgroup2)
+        dummy = dummy_product(d1, d2)
+    else:
+        dummy = None
+
+    return (names_prod, contrast_prod, names_contrast_prod,
+                        contrast_prod2, names_contrast_prod2, dummy)
+
+
+


 def dummy_1d(x, varname=None):
-    """dummy variable for id integer groups
+    '''dummy variable for id integer groups

     Parameters
     ----------
@@ -142,12 +212,18 @@ def dummy_1d(x, varname=None):
            [0, 1],
            [0, 1]]), ['gender_F', 'gender_M'])

-    """
-    pass
+    '''
+    if varname is None:  #assumes integer
+        labels = ['level_%d' % i for i in range(x.max() + 1)]
+        return (x[:,None]==np.arange(x.max()+1)).astype(int), labels
+    else:
+        grouplabels = np.unique(x)
+        labels = [varname + '_%s' % str(i) for i in grouplabels]
+        return (x[:,None]==grouplabels).astype(int), labels


 def dummy_product(d1, d2, method='full'):
-    """dummy variable from product of two dummy variables
+    '''dummy variable from product of two dummy variables

     Parameters
     ----------
@@ -167,12 +243,24 @@ def dummy_product(d1, d2, method='full'):
     dummy : ndarray
         dummy variable for product, see method

-    """
-    pass
+    '''
+
+    if method == 'full':
+        dd = (d1[:,:,None]*d2[:,None,:]).reshape(d1.shape[0],-1)
+    elif method == 'drop-last':  #same as SAS transreg
+        d12rl = dummy_product(d1[:,:-1], d2[:,:-1])
+        dd = np.column_stack((np.ones(d1.shape[0], int), d1[:,:-1], d2[:,:-1],d12rl))
+        #Note: dtype int should preserve dtype of d1 and d2
+    elif method == 'drop-first':
+        d12r = dummy_product(d1[:,1:], d2[:,1:])
+        dd = np.column_stack((np.ones(d1.shape[0], int), d1[:,1:], d2[:,1:],d12r))
+    else:
+        raise ValueError('method not recognized')

+    return dd

 def dummy_limits(d):
-    """start and endpoints of groups in a sorted dummy variable array
+    '''start and endpoints of groups in a sorted dummy variable array

     helper function for nested categories

@@ -199,12 +287,24 @@ def dummy_limits(d):
     [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])]
     >>> [np.arange(d1.shape[0])[b:e] for b,e in zip(*dummy_limits(d1))]
     [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])]
-    """
-    pass
+    '''
+    nobs, nvars = d.shape
+    start1, col1 = np.nonzero(np.diff(d,axis=0)==1)
+    end1, col1_ = np.nonzero(np.diff(d,axis=0)==-1)
+    cc = np.arange(nvars)
+    #print(cc, np.r_[[0], col1], np.r_[col1_, [nvars-1]])
+    if ((not (np.r_[[0], col1] == cc).all())
+            or (not (np.r_[col1_, [nvars-1]] == cc).all())):
+        raise ValueError('dummy variable is not sorted')
+
+    start = np.r_[[0], start1+1]
+    end = np.r_[end1+1, [nobs]]
+    return start, end
+


 def dummy_nested(d1, d2, method='full'):
-    """unfinished and incomplete mainly copy past dummy_product
+    '''unfinished and incomplete, mainly a copy-paste of dummy_product
     dummy variable from product of two dummy variables

     Parameters
@@ -224,12 +324,34 @@ def dummy_nested(d1, d2, method='full'):
     dummy : ndarray
         dummy variable for product, see method

-    """
-    pass
+    '''
+    if method == 'full':
+        return d2
+
+    start1, end1 = dummy_limits(d1)
+    start2, end2 = dummy_limits(d2)
+    first = np.in1d(start2, start1)
+    last = np.in1d(end2, end1)
+    equal = (first == last)
+    col_dropf = ~first*~equal
+    col_dropl = ~last*~equal
+
+
+    if method == 'drop-last':
+        d12rl = dummy_product(d1[:,:-1], d2[:,:-1])
+        dd = np.column_stack((np.ones(d1.shape[0], int), d1[:,:-1], d2[:,col_dropl]))
+        #Note: dtype int should preserve dtype of d1 and d2
+    elif method == 'drop-first':
+        d12r = dummy_product(d1[:,1:], d2[:,1:])
+        dd = np.column_stack((np.ones(d1.shape[0], int), d1[:,1:], d2[:,col_dropf]))
+    else:
+        raise ValueError('method not recognized')
+
+    return dd, col_dropf, col_dropl


 class DummyTransform:
-    """Conversion between full rank dummy encodings
+    '''Conversion between full rank dummy encodings


     y = X b + u
@@ -259,39 +381,42 @@ class DummyTransform:
      - not sure yet if method names make sense


-    """
+    '''

     def __init__(self, d1, d2):
-        """C such that d1 C = d2, with d1 = X, d2 = Z
+        '''C such that d1 C = d2, with d1 = X, d2 = Z

         should be (x, z) in arguments ?
-        """
+        '''
         self.transf_matrix = np.linalg.lstsq(d1, d2, rcond=-1)[0]
         self.invtransf_matrix = np.linalg.lstsq(d2, d1, rcond=-1)[0]

     def dot_left(self, a):
-        """ b = C a
-        """
-        pass
+        ''' b = C a
+        '''
+        return np.dot(self.transf_matrix, a)

     def dot_right(self, x):
-        """ z = x C
-        """
-        pass
+        ''' z = x C
+        '''
+        return np.dot(x, self.transf_matrix)

     def inv_dot_left(self, b):
-        """ a = C^{-1} b
-        """
-        pass
+        ''' a = C^{-1} b
+        '''
+        return np.dot(self.invtransf_matrix, b)

     def inv_dot_right(self, z):
-        """ x = z C^{-1}
-        """
-        pass
+        ''' x = z C^{-1}
+        '''
+        return np.dot(z, self.invtransf_matrix)
+
+
+


 def groupmean_d(x, d):
-    """groupmeans using dummy variables
+    '''groupmeans using dummy variables

     Parameters
     ----------
@@ -314,12 +439,19 @@ def groupmean_d(x, d):
     dummy variable. In this case it is recommended to use
     a more efficient version.

-    """
-    pass
+    '''
+    x = np.asarray(x)
+##    if x.ndim == 1:
+##        nvars = 1
+##    else:
+    nvars = x.ndim + 1
+    sli = tuple([slice(None)] + [None]*(nvars-2) + [slice(None)])  # tuple: lists are not valid multi-axis indices
+    return (x[...,None] * d[sli]).sum(0)*1./d.sum(0)
+


 class TwoWay:
-    """a wrapper class for two way anova type of analysis with OLS
+    '''a wrapper class for two way anova type of analysis with OLS


     currently mainly to bring things together
@@ -337,8 +469,7 @@ class TwoWay:

     missing: ANOVA table

-    """
-
+    '''
     def __init__(self, endog, factor1, factor2, varnames=None):
         self.nobs = factor1.shape[0]
         if varnames is None:
@@ -346,49 +477,141 @@ class TwoWay:
             vname2 = 'b'
         else:
             vname1, vname1 = varnames
+
         self.d1, self.d1_labels = d1, d1_labels = dummy_1d(factor1, vname1)
         self.d2, self.d2_labels = d2, d2_labels = dummy_1d(factor2, vname2)
         self.nlevel1 = nlevel1 = d1.shape[1]
         self.nlevel2 = nlevel2 = d2.shape[1]
+
+
+        #get product dummies
         res = contrast_product(d1_labels, d2_labels)
         prodlab, C1, C1lab, C2, C2lab, _ = res
-        (self.prod_label, self.C1, self.C1_label, self.C2, self.C2_label, _
-            ) = res
+        self.prod_label, self.C1, self.C1_label, self.C2, self.C2_label, _ = res
         dp_full = dummy_product(d1, d2, method='full')
         dp_dropf = dummy_product(d1, d2, method='drop-first')
         self.transform = DummyTransform(dp_full, dp_dropf)
+
+        #estimate the model
         self.nvars = dp_full.shape[1]
         self.exog = dp_full
         self.resols = sm.OLS(endog, dp_full).fit()
         self.params = self.resols.params
+
+        #get transformed parameters, (constant, main, interaction effect)
         self.params_dropf = self.transform.inv_dot_left(self.params)
         self.start_interaction = 1 + (nlevel1 - 1) + (nlevel2 - 1)
         self.n_interaction = self.nvars - self.start_interaction

+    #convert to cached property
     def r_nointer(self):
-        """contrast/restriction matrix for no interaction
-        """
-        pass
+        '''contrast/restriction matrix for no interaction
+        '''
+        nia = self.n_interaction
+        R_nointer = np.hstack((np.zeros((nia, self.nvars-nia)), np.eye(nia)))
+        #inter_direct = resols_full_dropf.tval[-nia:]
+        R_nointer_transf = self.transform.inv_dot_right(R_nointer)
+        self.R_nointer_transf = R_nointer_transf
+        return R_nointer_transf

     def ttest_interaction(self):
-        """ttests for no-interaction terms are zero
-        """
-        pass
+        '''ttests for no-interaction terms are zero
+        '''
+        #use self.r_nointer instead
+        nia = self.n_interaction
+        R_nointer = np.hstack((np.zeros((nia, self.nvars-nia)), np.eye(nia)))
+        #inter_direct = resols_full_dropf.tval[-nia:]
+        R_nointer_transf = self.transform.inv_dot_right(R_nointer)
+        self.R_nointer_transf = R_nointer_transf
+        t_res = self.resols.t_test(R_nointer_transf)
+        return t_res

     def ftest_interaction(self):
-        """ttests for no-interaction terms are zero
-        """
-        pass
+        '''ttests for no-interaction terms are zero
+        '''
+        R_nointer_transf = self.r_nointer()
+        return self.resols.f_test(R_nointer_transf)
+
+    def ttest_conditional_effect(self, factorind):
+        if factorind == 1:
+            return self.resols.t_test(self.C1), self.C1_label
+        else:
+            return self.resols.t_test(self.C2), self.C2_label

+    def summary_coeff(self):
+        from statsmodels.iolib import SimpleTable
+        params_arr = self.params.reshape(self.nlevel1, self.nlevel2)
+        stubs = self.d1_labels
+        headers = self.d2_labels
+        title = 'Estimated Coefficients by factors'
+        table_fmt = dict(
+            data_fmts = ["%#10.4g"]*self.nlevel2)
+        return SimpleTable(params_arr, headers, stubs, title=title,
+                           txt_fmt=table_fmt)
+
+
+# --------------- tests
+# TODO: several tests still missing, several are in the example with print

 class TestContrastTools:

     def __init__(self):
         self.v1name = ['a0', 'a1', 'a2']
         self.v2name = ['b0', 'b1']
-        self.d1 = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [0,
-            1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [
-            0, 0, 1], [0, 0, 1]])
+        self.d1 = np.array([[1, 0, 0],
+                            [1, 0, 0],
+                            [1, 0, 0],
+                            [1, 0, 0],
+                            [0, 1, 0],
+                            [0, 1, 0],
+                            [0, 1, 0],
+                            [0, 1, 0],
+                            [0, 0, 1],
+                            [0, 0, 1],
+                            [0, 0, 1],
+                            [0, 0, 1]])
+
+    def test_dummy_1d(self):
+        x = np.array(['F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M'],
+              dtype='|S1')
+        d, labels = (np.array([[1, 0],
+                               [1, 0],
+                               [0, 1],
+                               [0, 1],
+                               [1, 0],
+                               [1, 0],
+                               [0, 1],
+                               [0, 1],
+                               [1, 0],
+                               [1, 0],
+                               [0, 1],
+                               [0, 1]]), ['gender_F', 'gender_M'])
+        res_d, res_labels = dummy_1d(x, varname='gender')
+        assert_equal(res_d, d)
+        assert_equal(res_labels, labels)
+
+    def test_contrast_product(self):
+        res_cp = contrast_product(self.v1name, self.v2name)
+        res_t = [0]*6
+        res_t[0] = ['a0_b0', 'a0_b1', 'a1_b0', 'a1_b1', 'a2_b0', 'a2_b1']
+        res_t[1] = np.array([[-1.,  0.,  1.,  0.,  0.,  0.],
+                           [ 0., -1.,  0.,  1.,  0.,  0.],
+                           [-1.,  0.,  0.,  0.,  1.,  0.],
+                           [ 0., -1.,  0.,  0.,  0.,  1.]])
+        res_t[2] = ['a1_b0-a0_b0', 'a1_b1-a0_b1', 'a2_b0-a0_b0', 'a2_b1-a0_b1']
+        res_t[3] =  np.array([[-1.,  1.,  0.,  0.,  0.,  0.],
+                           [ 0.,  0., -1.,  1.,  0.,  0.],
+                           [ 0.,  0.,  0.,  0., -1.,  1.]])
+        res_t[4] = ['a0_b1-a0_b0', 'a1_b1-a1_b0', 'a2_b1-a2_b0']
+        for ii in range(5):
+            np.testing.assert_equal(res_cp[ii], res_t[ii], err_msg=str(ii))
+
+    def test_dummy_limits(self):
+        b,e = dummy_limits(self.d1)
+        assert_equal(b, np.array([0, 4, 8]))
+        assert_equal(e, np.array([ 4,  8, 12]))
+
+


 if __name__ == '__main__':
@@ -396,67 +619,98 @@ if __name__ == '__main__':
     tt.test_contrast_product()
     tt.test_dummy_1d()
     tt.test_dummy_limits()
+
     import statsmodels.api as sm
+
     examples = ['small', 'large', None][1]
+
     v1name = ['a0', 'a1', 'a2']
     v2name = ['b0', 'b1']
     res_cp = contrast_product(v1name, v2name)
     print(res_cp)
+
     y = np.arange(12)
-    x1 = np.arange(12) // 4
-    x2 = np.arange(12) // 2 % 2
+    x1 = np.arange(12)//4
+    x2 = np.arange(12)//2 % 2
+
     if 'small' in examples:
         d1, d1_labels = dummy_1d(x1)
         d2, d2_labels = dummy_1d(x2)
+
+
     if 'large' in examples:
         x1 = np.repeat(x1, 5, axis=0)
         x2 = np.repeat(x2, 5, axis=0)
+
     nobs = x1.shape[0]
     d1, d1_labels = dummy_1d(x1)
     d2, d2_labels = dummy_1d(x2)
+
     dd_full = dummy_product(d1, d2, method='full')
     dd_dropl = dummy_product(d1, d2, method='drop-last')
     dd_dropf = dummy_product(d1, d2, method='drop-first')
+
+    #Note: full parameterization of dummies is orthogonal
+    #np.eye(6)*10 in "large" example
     print((np.dot(dd_full.T, dd_full) == np.diag(dd_full.sum(0))).all())
-    effect_size = [1.0, 0.01][1]
+
+    #check that transforms work
+    #generate 3 data sets with the 3 different parameterizations
+
+    effect_size = [1., 0.01][1]
     noise_scale = [0.001, 0.1][0]
     noise = noise_scale * np.random.randn(nobs)
-    beta = effect_size * np.arange(1, 7)
+    beta = effect_size * np.arange(1,7)
     ydata_full = (dd_full * beta).sum(1) + noise
     ydata_dropl = (dd_dropl * beta).sum(1) + noise
     ydata_dropf = (dd_dropf * beta).sum(1) + noise
+
     resols_full_full = sm.OLS(ydata_full, dd_full).fit()
     resols_full_dropf = sm.OLS(ydata_full, dd_dropf).fit()
     params_f_f = resols_full_full.params
     params_f_df = resols_full_dropf.params
+
     resols_dropf_full = sm.OLS(ydata_dropf, dd_full).fit()
     resols_dropf_dropf = sm.OLS(ydata_dropf, dd_dropf).fit()
     params_df_f = resols_dropf_full.params
     params_df_df = resols_dropf_dropf.params
+
+
     tr_of = np.linalg.lstsq(dd_dropf, dd_full, rcond=-1)[0]
     tr_fo = np.linalg.lstsq(dd_full, dd_dropf, rcond=-1)[0]
     print(np.dot(tr_fo, params_df_df) - params_df_f)
     print(np.dot(tr_of, params_f_f) - params_f_df)
+
     transf_f_df = DummyTransform(dd_full, dd_dropf)
-    print(np.max(np.abs(dd_full - transf_f_df.inv_dot_right(dd_dropf))))
-    print(np.max(np.abs(dd_dropf - transf_f_df.dot_right(dd_full))))
-    print(np.max(np.abs(params_df_df - transf_f_df.inv_dot_left(params_df_f))))
-    np.max(np.abs(params_f_df - transf_f_df.inv_dot_left(params_f_f)))
-    prodlab, C1, C1lab, C2, C2lab, _ = contrast_product(v1name, v2name)
+    print(np.max(np.abs((dd_full - transf_f_df.inv_dot_right(dd_dropf)))))
+    print(np.max(np.abs((dd_dropf - transf_f_df.dot_right(dd_full)))))
+    print(np.max(np.abs((params_df_df
+                         - transf_f_df.inv_dot_left(params_df_f)))))
+    np.max(np.abs((params_f_df
+                         - transf_f_df.inv_dot_left(params_f_f))))
+
+    prodlab, C1, C1lab, C2, C2lab,_ = contrast_product(v1name, v2name)
+
     print('\ntvalues for no effect of factor 1')
     print('each test is conditional on a level of factor 2')
     print(C1lab)
     print(resols_dropf_full.t_test(C1).tvalue)
+
     print('\ntvalues for no effect of factor 2')
     print('each test is conditional on a level of factor 1')
     print(C2lab)
     print(resols_dropf_full.t_test(C2).tvalue)
+
+    #covariance matrix of restrictions C2, note: orthogonal
     resols_dropf_full.cov_params(C2)
-    R_noint = np.hstack((np.zeros((2, 4)), np.eye(2)))
+
+    #testing for no interaction effect
+    R_noint = np.hstack((np.zeros((2,4)), np.eye(2)))
     inter_direct = resols_full_dropf.tvalues[-2:]
-    inter_transf = resols_full_full.t_test(transf_f_df.inv_dot_right(R_noint)
-        ).tvalue
-    print(np.max(np.abs(inter_direct - inter_transf)))
+    inter_transf = resols_full_full.t_test(transf_f_df.inv_dot_right(R_noint)).tvalue
+    print(np.max(np.abs((inter_direct - inter_transf))))
+
+    #now with class version
     tw = TwoWay(ydata_dropf, x1, x2)
     print(tw.ttest_interaction().tvalue)
     print(tw.ttest_interaction().pvalue)
@@ -465,7 +719,24 @@ if __name__ == '__main__':
     print(tw.ttest_conditional_effect(1)[0].tvalue)
     print(tw.ttest_conditional_effect(2)[0].tvalue)
     print(tw.summary_coeff())
-""" documentation for early examples while developing - some have changed already
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+''' documentation for early examples while developing - some have changed already
 >>> y = np.arange(12)
 >>> y
 array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
@@ -536,8 +807,18 @@ array([[ 1.,  1.,  0.,  1.,  1.,  0.],
        [ 1.,  0.,  0.,  1.,  0.,  0.],
        [ 1.,  0.,  0.,  0.,  0.,  0.],
        [ 1.,  0.,  0.,  0.,  0.,  0.]])
-"""
-"""
+'''
+
+
+
+
+#nprod = ['%s_%s' % (i,j) for i in ['a0', 'a1', 'a2'] for j in ['b0', 'b1']]
+#>>> [''.join(['%s%s' % (signstr(c),v) for c,v in zip(row, nprod) if c != 0])
+#     for row in np.kron(dd[1:], np.eye(2))]
+
+
+
+'''
 >>> nprod = ['%s_%s' % (i,j) for i in ['a0', 'a1', 'a2'] for j in ['b0', 'b1']]
 >>> nprod
 ['a0_b0', 'a0_b1', 'a1_b0', 'a1_b1', 'a2_b0', 'a2_b1']
@@ -675,4 +956,4 @@ array([[ 1.5,  5.5,  9.5],
 [array([ 1.5,  5.5,  9.5]), array([ 0.,  1.,  2.]), array([ 0.5,  0.5,  0.5])]
 >>>

-"""
+'''
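
For reference, the small contrast helpers filled in above return, for three
levels, the matrices shown in the comments (derived from the definitions;
the import is only a sketch of how they would be pulled in):

    from statsmodels.sandbox.stats.contrast_tools import (
        contrast_allpairs, contrast_all_one, contrast_diff_mean)

    print(contrast_allpairs(3))
    # [[ 1. -1.  0.]
    #  [ 1.  0. -1.]
    #  [ 0.  1. -1.]]
    print(contrast_all_one(3))
    # [[ 1. -1.  0.]
    #  [ 1.  0. -1.]]
    print(contrast_diff_mean(3))
    # np.eye(3) - 1/3, i.e. each level compared to the grand mean
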
diff --git a/statsmodels/sandbox/stats/diagnostic.py b/statsmodels/sandbox/stats/diagnostic.py
index ea98e5918..69740053c 100644
--- a/statsmodels/sandbox/stats/diagnostic.py
+++ b/statsmodels/sandbox/stats/diagnostic.py
@@ -1,11 +1,35 @@
 import warnings
-from statsmodels.stats.diagnostic import OLS, ResultsStore, acorr_breusch_godfrey, acorr_ljungbox, acorr_lm, breaks_cusumolsresid, breaks_hansen, compare_cox, compare_j, het_arch, het_breuschpagan, het_goldfeldquandt, het_white, linear_harvey_collier, linear_lm, linear_rainbow, recursive_olsresiduals, spec_white
+
+from statsmodels.stats.diagnostic import (
+    OLS,
+    ResultsStore,
+    acorr_breusch_godfrey,
+    acorr_ljungbox,
+    acorr_lm,
+    breaks_cusumolsresid,
+    breaks_hansen,
+    compare_cox,
+    compare_j,
+    het_arch,
+    het_breuschpagan,
+    het_goldfeldquandt,
+    het_white,
+    linear_harvey_collier,
+    linear_lm,
+    linear_rainbow,
+    recursive_olsresiduals,
+    spec_white,
+)
 from statsmodels.tsa.stattools import adfuller
-__all__ = ['OLS', 'ResultsStore', 'acorr_breusch_godfrey', 'acorr_ljungbox',
-    'acorr_lm', 'adfuller', 'breaks_cusumolsresid', 'breaks_hansen',
-    'compare_cox', 'compare_j', 'het_arch', 'het_breuschpagan',
-    'het_goldfeldquandt', 'het_white', 'linear_harvey_collier', 'linear_lm',
-    'linear_rainbow', 'recursive_olsresiduals', 'spec_white']
-warnings.warn(
-    'The statsmodels.sandbox.stats.diagnostic module is deprecated. Use statsmodels.stats.diagnostic.'
-    , FutureWarning, stacklevel=2)
+
+__all__ = ["OLS", "ResultsStore", "acorr_breusch_godfrey", "acorr_ljungbox",
+           "acorr_lm", "adfuller", "breaks_cusumolsresid", "breaks_hansen",
+           "compare_cox", "compare_j", "het_arch", "het_breuschpagan",
+           "het_goldfeldquandt", "het_white", "linear_harvey_collier",
+           "linear_lm", "linear_rainbow", "recursive_olsresiduals",
+           "spec_white"]
+
+
+warnings.warn("The statsmodels.sandbox.stats.diagnostic module is deprecated. "
+              "Use statsmodels.stats.diagnostic.", FutureWarning,
+              stacklevel=2)
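
Since the module body now only re-exports and warns, the deprecation can be
exercised with a standard warnings check (a sketch; the warning only fires on
the first import of the module in a process):

    import warnings

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        import statsmodels.sandbox.stats.diagnostic  # noqa: F401
    print(any(issubclass(w.category, FutureWarning) for w in caught))
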
diff --git a/statsmodels/sandbox/stats/ex_newtests.py b/statsmodels/sandbox/stats/ex_newtests.py
index debffc15d..019197816 100644
--- a/statsmodels/sandbox/stats/ex_newtests.py
+++ b/statsmodels/sandbox/stats/ex_newtests.py
@@ -1,13 +1,29 @@
 import statsmodels.datasets.macrodata.data as macro
 from statsmodels.tsa.stattools import adfuller
+
 macrod = macro.load().data
+
 print(macro.NOTE)
+
 print(macrod.dtype.names)
-datatrendli = [('realgdp', 1), ('realcons', 1), ('realinv', 1), ('realgovt',
-    1), ('realdpi', 1), ('cpi', 1), ('m1', 1), ('tbilrate', 0), ('unemp', 0
-    ), ('pop', 1), ('infl', 0), ('realint', 0)]
+
+datatrendli = [
+               ('realgdp', 1),
+               ('realcons', 1),
+               ('realinv', 1),
+               ('realgovt', 1),
+               ('realdpi', 1),
+               ('cpi', 1),
+               ('m1', 1),
+               ('tbilrate', 0),
+               ('unemp',0),
+               ('pop', 1),
+               ('infl',0),
+               ('realint', 0)
+               ]
+
 print('%-10s %5s %-8s' % ('variable', 'trend', '  adf'))
 for name, torder in datatrendli:
-    c_order = {(0): 'n', (1): 'c'}
+    c_order = {0: "n", 1: "c"}
     adf_, pval = adfuller(macrod[name], regression=c_order[torder])[:2]
     print('%-10s %5d %8.4f %8.4f' % (name, torder, adf_, pval))
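
The adfuller call in the loop above returns a tuple whose first two entries
are the test statistic and p-value; a minimal sketch on a made-up random walk
(where the unit-root null should not be rejected):

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    y = np.cumsum(np.random.normal(size=500))         # unit-root series
    adf_stat, pval = adfuller(y, regression='c')[:2]   # 'c': constant only
    print('%8.4f %8.4f' % (adf_stat, pval))            # p-value typically large
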
diff --git a/statsmodels/sandbox/stats/multicomp.py b/statsmodels/sandbox/stats/multicomp.py
index 13045b6fb..22667ea51 100644
--- a/statsmodels/sandbox/stats/multicomp.py
+++ b/statsmodels/sandbox/stats/multicomp.py
@@ -1,4 +1,4 @@
-"""
+'''

 from pystatsmodels mailinglist 20100524

@@ -60,25 +60,34 @@ TODO
 * name of function multipletests, rename to something like pvalue_correction?


-"""
+'''
 from collections import namedtuple
+
 from statsmodels.compat.python import lzip, lrange
+
 import copy
 import math
+
 import numpy as np
 from numpy.testing import assert_almost_equal, assert_equal
 from scipy import stats, interpolate
+
 from statsmodels.iolib.table import SimpleTable
+#temporary circular import
 from statsmodels.stats.multitest import multipletests, _ecdf as ecdf, fdrcorrection as fdrcorrection0, fdrcorrection_twostage
 from statsmodels.graphics import utils
 from statsmodels.tools.sm_exceptions import ValueWarning
+
 try:
+    # Studentized Range in SciPy 1.7+
     from scipy.stats import studentized_range
 except ImportError:
     from statsmodels.stats.libqsturng import qsturng, psturng
     studentized_range_tuple = namedtuple('studentized_range', ['ppf', 'sf'])
     studentized_range = studentized_range_tuple(ppf=qsturng, sf=psturng)
-qcrit = """
+
+
+qcrit = '''
   2     3     4     5     6     7     8     9     10
 5   3.64 5.70   4.60 6.98   5.22 7.80   5.67 8.42   6.03 8.91   6.33 9.32   6.58 9.67   6.80 9.97   6.99 10.24
 6   3.46 5.24   4.34 6.33   4.90 7.03   5.30 7.56   5.63 7.97   5.90 8.32   6.12 8.61   6.32 8.87   6.49 9.10
@@ -102,17 +111,19 @@ qcrit = """
 60  2.83 3.76   3.40 4.28   3.74 4.59   3.98 4.82   4.16 4.99   4.31 5.13 4.44 5.25   4.55 5.36   4.65 5.45
 120   2.80 3.70   3.36 4.20   3.68 4.50   3.92 4.71   4.10 4.87   4.24 5.01 4.36 5.12   4.47 5.21   4.56 5.30
 infinity  2.77 3.64   3.31 4.12   3.63 4.40   3.86 4.60   4.03 4.76   4.17 4.88   4.29 4.99   4.39 5.08   4.47 5.16
-"""
-res = [line.split() for line in qcrit.replace('infinity', '9999').split('\n')]
-c = np.array(res[2:-1]).astype(float)
-ccols = np.arange(2, 11)
-crows = c[:, 0]
+'''
+
+res = [line.split() for line in qcrit.replace('infinity','9999').split('\n')]
+c=np.array(res[2:-1]).astype(float)
+#c[c==9999] = np.inf
+ccols = np.arange(2,11)
+crows = c[:,0]
 cv005 = c[:, 1::2]
 cv001 = c[:, 2::2]


 def get_tukeyQcrit(k, df, alpha=0.05):
-    """
+    '''
     return critical values for Tukey's HSD (Q)

     Parameters
@@ -127,12 +138,17 @@ def get_tukeyQcrit(k, df, alpha=0.05):


     not enough error checking for limitations
-    """
-    pass
-
+    '''
+    if alpha == 0.05:
+        intp = interpolate.interp1d(crows, cv005[:,k-2])
+    elif alpha == 0.01:
+        intp = interpolate.interp1d(crows, cv001[:,k-2])
+    else:
+        raise ValueError('only implemented for alpha equal to 0.01 and 0.05')
+    return intp(df)

 def get_tukeyQcrit2(k, df, alpha=0.05):
-    """
+    '''
     return critical values for Tukey's HSD (Q)

     Parameters
@@ -147,12 +163,12 @@ def get_tukeyQcrit2(k, df, alpha=0.05):


     not enough error checking for limitations
-    """
-    pass
+    '''
+    return studentized_range.ppf(1-alpha, k, df)


 def get_tukey_pvalue(k, df, q):
-    """
+    '''
     return adjusted p-values for Tukey's HSD

     Parameters
@@ -164,17 +180,127 @@ def get_tukey_pvalue(k, df, q):
     q : scalar, array_like; q >= 0
         quantile value of Studentized Range

-    """
-    pass
+    '''
+    return studentized_range.sf(q, k, df)
+
+
+def Tukeythreegene(first, second, third):
+    # Performing the Tukey HSD post-hoc test for three genes
+    # qwb = xlrd.open_workbook('F:/Lab/bioinformatics/qcrittable.xls')
+    # #opening the workbook containing the q crit table
+    # qwb.sheet_names()
+    # qcrittable = qwb.sheet_by_name(u'Sheet1')
+
+    # means of the three arrays
+    firstmean = np.mean(first)
+    secondmean = np.mean(second)
+    thirdmean = np.mean(third)
+
+    # standard deviations of the threearrays
+    firststd = np.std(first)
+    secondstd = np.std(second)
+    thirdstd = np.std(third)
+
+    # standard deviation squared of the three arrays
+    firsts2 = math.pow(firststd, 2)
+    seconds2 = math.pow(secondstd, 2)
+    thirds2 = math.pow(thirdstd, 2)
+
+    # numerator for mean square error
+    mserrornum = firsts2 * 2 + seconds2 * 2 + thirds2 * 2
+    # denominator for mean square error
+    mserrorden = (len(first) + len(second) + len(third)) - 3
+    mserror = mserrornum / mserrorden  # mean square error
+
+    standarderror = math.sqrt(mserror / len(first))
+    # standard error, which is square root of mserror and
+    # the number of samples in a group
+
+    # various degrees of freedom
+    dftotal = len(first) + len(second) + len(third) - 1
+    dfgroups = 2
+    dferror = dftotal - dfgroups  # noqa: F841
+
+    qcrit = 0.5  # arbitrary placeholder; was qcrittable.cell(dftotal, 3).value, overwritten below
+    qcrit = get_tukeyQcrit(3, dftotal, alpha=0.05)
+    # getting the q critical value, for degrees of freedom total and 3 groups
+
+    qtest3to1 = (math.fabs(thirdmean - firstmean)) / standarderror
+    # calculating q test statistic values
+    qtest3to2 = (math.fabs(thirdmean - secondmean)) / standarderror
+    qtest2to1 = (math.fabs(secondmean - firstmean)) / standarderror
+
+    conclusion = []
+
+    # print(qcrit)
+    print(qtest3to1)
+    print(qtest3to2)
+    print(qtest2to1)
+
+    # testing all q test statistic values to q critical values
+    if qtest3to1 > qcrit:
+        conclusion.append('3to1null')
+    else:
+        conclusion.append('3to1alt')
+    if qtest3to2 > qcrit:
+        conclusion.append('3to2null')
+    else:
+        conclusion.append('3to2alt')
+    if qtest2to1 > qcrit:
+        conclusion.append('2to1null')
+    else:
+        conclusion.append('2to1alt')
+
+    return conclusion
+
+
+#rewrite by Vincent
+def Tukeythreegene2(genes): #Performing the Tukey HSD post-hoc test for three genes
+    """gend is a list, ie [first, second, third]"""
+#   qwb = xlrd.open_workbook('F:/Lab/bioinformatics/qcrittable.xls')
+    #opening the workbook containing the q crit table
+#   qwb.sheet_names()
+#   qcrittable = qwb.sheet_by_name(u'Sheet1')
+
+    means = []
+    stds = []
+    for gene in genes:
+        means.append(np.mean(gene))
+        std.append(np.std(gene))  # noqa:F821  See GH#5756
+
+    #firstmean = np.mean(first) #means of the three arrays
+    #secondmean = np.mean(second)
+    #thirdmean = np.mean(third)
+
+    #firststd = np.std(first) #standard deviations of the three arrays
+    #secondstd = np.std(second)
+    #thirdstd = np.std(third)
+
+    stds2 = []
+    for std in stds:
+        stds2.append(math.pow(std,2))
+
+
+    #firsts2 = math.pow(firststd,2) #standard deviation squared of the three arrays
+    #seconds2 = math.pow(secondstd,2)
+    #thirds2 = math.pow(thirdstd,2)
+
+    #mserrornum = firsts2*2+seconds2*2+thirds2*2 #numerator for mean square error
+    mserrornum = sum(stds2)*2
+    mserrorden = (len(genes[0])+len(genes[1])+len(genes[2]))-3 #denominator for mean square error
+    mserror = mserrornum/mserrorden #mean square error
+
+
+def catstack(args):
+    x = np.hstack(args)
+    labels = np.hstack([k*np.ones(len(arr)) for k,arr in enumerate(args)])
+    return x, labels


-def Tukeythreegene2(genes):
-    """gend is a list, ie [first, second, third]"""
-    pass


 def maxzero(x):
-    """find all up zero crossings and return the index of the highest
+    '''find all up zero crossings and return the index of the highest

     Not used anymore

@@ -197,12 +323,20 @@ def maxzero(x):
            -0.97727788,  0.95008842, -0.15135721])
     >>> maxzero(x)
     (None, array([6]))
-    """
-    pass
-
+    '''
+    x = np.asarray(x)
+    cond1 = x[:-1] < 0
+    cond2 = x[1:] > 0
+    #allzeros = np.nonzero(np.sign(x[:-1])*np.sign(x[1:]) <= 0)[0] + 1
+    allzeros = np.nonzero((cond1 & cond2) | (x[1:]==0))[0] + 1
+    if x[-1] >=0:
+        maxz = max(allzeros)
+    else:
+        maxz = None
+    return maxz, allzeros

 def maxzerodown(x):
-    """find all up zero crossings and return the index of the highest
+    '''find all down zero crossings and return the index of the highest

     Not used anymore

@@ -224,40 +358,101 @@ def maxzerodown(x):
            -0.97727788,  0.95008842, -0.15135721])
     >>> maxzero(x)
     (None, array([6]))
-"""
-    pass
+'''
+    x = np.asarray(x)
+    cond1 = x[:-1] > 0
+    cond2 = x[1:] < 0
+    #allzeros = np.nonzero(np.sign(x[:-1])*np.sign(x[1:]) <= 0)[0] + 1
+    allzeros = np.nonzero((cond1 & cond2) | (x[1:]==0))[0] + 1
+    if x[-1] <=0:
+        maxz = max(allzeros)
+    else:
+        maxz = None
+    return maxz, allzeros
+


 def rejectionline(n, alpha=0.5):
-    """reference line for rejection in multiple tests
+    '''reference line for rejection in multiple tests

     Not used anymore

     from: section 3.2, page 60
-    """
-    pass
+    '''
+    t = np.arange(n)/float(n)
+    frej = t/( t * (1-alpha) + alpha)
+    return frej


-def fdrcorrection_bak(pvals, alpha=0.05, method='indep'):
-    """Reject False discovery rate correction for pvalues

-    Old version, to be deleted


-    missing: methods that estimate fraction of true hypotheses

-    """
-    pass
+#I do not remember what I changed or why there are 2 versions;
+#this follows the German dissertation (?) with rline.
+#This might be useful if the null hypothesis is not "all effects are zero".
+#Renamed to _bak; working again on fdrcorrection0.
+def fdrcorrection_bak(pvals, alpha=0.05, method='indep'):
+    '''Reject False discovery rate correction for pvalues
+
+    Old version, to be deleted


-def mcfdr(nrepl=100, nobs=50, ntests=10, ntrue=6, mu=0.5, alpha=0.05, rho=0.0):
-    """MonteCarlo to test fdrcorrection
-    """
-    pass
+    missing: methods that estimate fraction of true hypotheses

+    '''
+    pvals = np.asarray(pvals)
+
+
+    pvals_sortind = np.argsort(pvals)
+    pvals_sorted = pvals[pvals_sortind]
+    pecdf = ecdf(pvals_sorted)
+    if method in ['i', 'indep', 'p', 'poscorr']:
+        rline = pvals_sorted / alpha
+    elif method in ['n', 'negcorr']:
+        cm = np.sum(1./np.arange(1, len(pvals)))
+        rline = pvals_sorted / alpha * cm
+    elif method in ['g', 'onegcorr']:  #what's this ? german diss
+        rline = pvals_sorted / (pvals_sorted*(1-alpha) + alpha)
+    elif method in ['oth', 'o2negcorr']: # other invalid, cut-paste
+        cm = np.sum(np.arange(len(pvals)))
+        rline = pvals_sorted / alpha /cm
+    else:
+        raise ValueError('method not available')
+
+    reject = pecdf >= rline
+    if reject.any():
+        rejectmax = max(np.nonzero(reject)[0])
+    else:
+        rejectmax = 0
+    reject[:rejectmax] = True
+    return reject[pvals_sortind.argsort()]
+
+def mcfdr(nrepl=100, nobs=50, ntests=10, ntrue=6, mu=0.5, alpha=0.05, rho=0.):
+    '''MonteCarlo to test fdrcorrection
+    '''
+    nfalse = ntests - ntrue
+    locs = np.array([0.]*ntrue + [mu]*(ntests - ntrue))
+    results = []
+    for i in range(nrepl):
+        #rvs = locs + stats.norm.rvs(size=(nobs, ntests))
+        rvs = locs + randmvn(rho, size=(nobs, ntests))
+        tt, tpval = stats.ttest_1samp(rvs, 0)
+        res = fdrcorrection_bak(np.abs(tpval), alpha=alpha, method='i')
+        res0 = fdrcorrection0(np.abs(tpval), alpha=alpha)
+        #res and res0 give the same results
+        results.append([np.sum(res[:ntrue]), np.sum(res[ntrue:])] +
+                       [np.sum(res0[:ntrue]), np.sum(res0[ntrue:])] +
+                       res.tolist() +
+                       np.sort(tpval).tolist() +
+                       [np.sum(tpval[:ntrue]<alpha),
+                        np.sum(tpval[ntrue:]<alpha)] +
+                       [np.sum(tpval[:ntrue]<alpha/ntests),
+                        np.sum(tpval[ntrue:]<alpha/ntests)])
+    return np.array(results)

 def randmvn(rho, size=(1, 2), standardize=False):
-    """create random draws from equi-correlated multivariate normal distribution
+    '''create random draws from equi-correlated multivariate normal distribution

     Parameters
     ----------
@@ -272,21 +467,47 @@ def randmvn(rho, size=(1, 2), standardize=False):
         nobs by nvars where each row is a independent random draw of nvars-
         dimensional correlated rvs

-    """
-    pass
-
+    '''
+    nobs, nvars = size
+    if 0 < rho and rho < 1:
+        rvs = np.random.randn(nobs, nvars+1)
+        rvs2 = rvs[:,:-1] * np.sqrt((1-rho)) + rvs[:,-1:] * np.sqrt(rho)
+    elif rho ==0:
+        rvs2 = np.random.randn(nobs, nvars)
+    elif rho < 0:
+        if rho < -1./(nvars-1):
+            raise ValueError('rho has to be larger than -1./(nvars-1)')
+        elif rho == -1./(nvars-1):
+            rho = -1./(nvars-1+1e-10)  #barely positive definite
+        #use Cholesky
+        A = rho*np.ones((nvars,nvars))+(1-rho)*np.eye(nvars)
+        rvs2 = np.dot(np.random.randn(nobs, nvars), np.linalg.cholesky(A).T)
+    if standardize:
+        rvs2 = stats.zscore(rvs2)
+    return rvs2
+
+#============================
+#
+# Part 2: Multiple comparisons and independent samples tests
+#
+#============================

 def tiecorrect(xranks):
-    """
+    '''

     should be equivalent of scipy.stats.tiecorrect

-    """
-    pass
+    '''
+    #casting to int rounds down, but not relevant for this case
+    rankbincount = np.bincount(np.asarray(xranks,dtype=int))
+    nties = rankbincount[rankbincount > 1]
+    ntot = float(len(xranks))
+    tiecorrection = 1 - (nties**3 - nties).sum()/(ntot**3 - ntot)
+    return tiecorrection


 class GroupsStats:
-    """
+    '''
     statistics by groups (another version)

     groupstats as a class with lazy evaluation (not yet - decorators are still
@@ -298,10 +519,10 @@ class GroupsStats:

     TODO: incomplete doc strings

-    """
+    '''

     def __init__(self, x, useranks=False, uni=None, intlab=None):
-        """descriptive statistics by groups
+        '''descriptive statistics by groups

         Parameters
         ----------
@@ -314,38 +535,76 @@ class GroupsStats:
             to avoid call to unique, these can be given as inputs


-        """
+        '''
         self.x = np.asarray(x)
         if intlab is None:
-            uni, intlab = np.unique(x[:, 1], return_inverse=True)
+            uni, intlab = np.unique(x[:,1], return_inverse=True)
         elif uni is None:
-            uni = np.unique(x[:, 1])
+            uni = np.unique(x[:,1])
+
         self.useranks = useranks
+
+
         self.uni = uni
         self.intlab = intlab
         self.groupnobs = groupnobs = np.bincount(intlab)
+
+        #temporary until separated and made all lazy
         self.runbasic(useranks=useranks)

+
+
     def runbasic_old(self, useranks=False):
         """runbasic_old"""
-        pass
+        #check: refactoring screwed up case useranks=True
+
+        #groupxsum = np.bincount(intlab, weights=X[:,0])
+        #groupxmean = groupxsum * 1.0 / groupnobs
+        x = self.x
+        if useranks:
+            self.xx = x[:,1].argsort().argsort() + 1  #rankraw
+        else:
+            self.xx = x[:,0]
+        self.groupsum = groupranksum = np.bincount(self.intlab, weights=self.xx)
+        #print('groupranksum', groupranksum, groupranksum.shape, self.groupnobs.shape)
+        # start at 1 for stats.rankdata :
+        self.groupmean = grouprankmean = groupranksum * 1.0 / self.groupnobs # + 1
+        self.groupmeanfilter = grouprankmean[self.intlab]
+        #return grouprankmean[intlab]

     def runbasic(self, useranks=False):
         """runbasic"""
-        pass
+        #check: refactoring screwed up case useranks=True
+
+        #groupxsum = np.bincount(intlab, weights=X[:,0])
+        #groupxmean = groupxsum * 1.0 / groupnobs
+        x = self.x
+        if useranks:
+            xuni, xintlab = np.unique(x[:,0], return_inverse=True)
+            ranksraw = x[:,0].argsort().argsort() + 1  #rankraw
+            self.xx = GroupsStats(np.column_stack([ranksraw, xintlab]),
+                                  useranks=False).groupmeanfilter
+        else:
+            self.xx = x[:,0]
+        self.groupsum = groupranksum = np.bincount(self.intlab, weights=self.xx)
+        #print('groupranksum', groupranksum, groupranksum.shape, self.groupnobs.shape)
+        # start at 1 for stats.rankdata :
+        self.groupmean = grouprankmean = groupranksum * 1.0 / self.groupnobs # + 1
+        self.groupmeanfilter = grouprankmean[self.intlab]
+        #return grouprankmean[intlab]

     def groupdemean(self):
         """groupdemean"""
-        pass
+        return self.xx - self.groupmeanfilter

     def groupsswithin(self):
         """groupsswithin"""
-        pass
+        xtmp = self.groupdemean()
+        return np.bincount(self.intlab, weights=xtmp**2)

     def groupvarwithin(self):
         """groupvarwithin"""
-        pass
-
+        return self.groupsswithin()/(self.groupnobs-1) #.sum()

 class TukeyHSDResults:
     """Results from Tukey HSD test, with additional plot methods
@@ -370,10 +629,10 @@ class TukeyHSDResults:
     Other attributes contain information about the data from the
     MultiComparison instance: data, df_total, groups, groupsunique, variance.
     """
-
     def __init__(self, mc_object, results_table, q_crit, reject=None,
-        meandiffs=None, std_pairs=None, confint=None, df_total=None,
-        reject2=None, variance=None, pvalues=None):
+                 meandiffs=None, std_pairs=None, confint=None, df_total=None,
+                 reject2=None, variance=None, pvalues=None):
+
         self._multicomp = mc_object
         self._results_table = results_table
         self.q_crit = q_crit
@@ -385,6 +644,7 @@ class TukeyHSDResults:
         self.reject2 = reject2
         self.variance = variance
         self.pvalues = pvalues
+        # Taken out of _multicomp for ease of access for unknowledgeable users
         self.data = self._multicomp.data
         self.groups = self._multicomp.groups
         self.groupsunique = self._multicomp.groupsunique
@@ -393,17 +653,20 @@ class TukeyHSDResults:
         return str(self._results_table)

     def summary(self):
-        """Summary table that can be printed
-        """
-        pass
+        '''Summary table that can be printed
+        '''
+        return self._results_table
+

     def _simultaneous_ci(self):
         """Compute simultaneous confidence intervals for comparison of means.
         """
-        pass
+        self.halfwidths = simultaneous_ci(self.q_crit, self.variance,
+                            self._multicomp.groupstats.groupnobs,
+                            self._multicomp.pairindices)

-    def plot_simultaneous(self, comparison_name=None, ax=None, figsize=(10,
-        6), xlabel=None, ylabel=None):
+    def plot_simultaneous(self, comparison_name=None, ax=None, figsize=(10,6),
+                          xlabel=None, ylabel=None):
         """Plot a universal confidence interval of each group mean

         Visualize significant differences in a plot with one confidence
@@ -467,11 +730,66 @@ class TukeyHSDResults:
         Optionally provide one of the group names to color code the plot to
         highlight group means different from comparison_name.
         """
-        pass
+        fig, ax1 = utils.create_mpl_ax(ax)
+        if figsize is not None:
+            fig.set_size_inches(figsize)
+        if getattr(self, 'halfwidths', None) is None:
+            self._simultaneous_ci()
+        means = self._multicomp.groupstats.groupmean
+
+
+        sigidx = []
+        nsigidx = []
+        minrange = [means[i] - self.halfwidths[i] for i in range(len(means))]
+        maxrange = [means[i] + self.halfwidths[i] for i in range(len(means))]
+
+        if comparison_name is None:
+            ax1.errorbar(means, lrange(len(means)), xerr=self.halfwidths,
+                         marker='o', linestyle='None', color='k', ecolor='k')
+        else:
+            if comparison_name not in self.groupsunique:
+                raise ValueError('comparison_name not found in group names.')
+            midx = np.where(self.groupsunique==comparison_name)[0][0]
+            for i in range(len(means)):
+                if self.groupsunique[i] == comparison_name:
+                    continue
+                if (min(maxrange[i], maxrange[midx]) -
+                                         max(minrange[i], minrange[midx]) < 0):
+                    sigidx.append(i)
+                else:
+                    nsigidx.append(i)
+            #Plot the main comparison
+            ax1.errorbar(means[midx], midx, xerr=self.halfwidths[midx],
+                         marker='o', linestyle='None', color='b', ecolor='b')
+            ax1.plot([minrange[midx]]*2, [-1, self._multicomp.ngroups],
+                     linestyle='--', color='0.7')
+            ax1.plot([maxrange[midx]]*2, [-1, self._multicomp.ngroups],
+                     linestyle='--', color='0.7')
+            #Plot those that are significantly different
+            if len(sigidx) > 0:
+                ax1.errorbar(means[sigidx], sigidx,
+                             xerr=self.halfwidths[sigidx], marker='o',
+                             linestyle='None', color='r', ecolor='r')
+            #Plot those that are not significantly different
+            if len(nsigidx) > 0:
+                ax1.errorbar(means[nsigidx], nsigidx,
+                             xerr=self.halfwidths[nsigidx], marker='o',
+                             linestyle='None', color='0.5', ecolor='0.5')
+
+        ax1.set_title('Multiple Comparisons Between All Pairs (Tukey)')
+        r = np.max(maxrange) - np.min(minrange)
+        ax1.set_ylim([-1, self._multicomp.ngroups])
+        ax1.set_xlim([np.min(minrange) - r / 10., np.max(maxrange) + r / 10.])
+        ylbls = [""] + self.groupsunique.astype(str).tolist() + [""]
+        ax1.set_yticks(np.arange(-1, len(means) + 1))
+        ax1.set_yticklabels(ylbls)
+        ax1.set_xlabel(xlabel if xlabel is not None else '')
+        ax1.set_ylabel(ylabel if ylabel is not None else '')
+        return fig


 class MultiComparison:
-    """Tests for multiple comparisons
+    '''Tests for multiple comparisons

     Parameters
     ----------
@@ -485,69 +803,101 @@ class MultiComparison:
         If group_order does not contain all labels that are in groups, then
         only those observations are kept that have a label in group_order.

-    """
+    '''

     def __init__(self, data, groups, group_order=None):
+
         if len(data) != len(groups):
-            raise ValueError('data has %d elements and groups has %d' % (
-                len(data), len(groups)))
+            raise ValueError('data has %d elements and groups has %d' % (len(data), len(groups)))
         self.data = np.asarray(data)
         self.groups = groups = np.asarray(groups)
+
+        # Allow for user-provided sorting of groups
         if group_order is None:
             self.groupsunique, self.groupintlab = np.unique(groups,
-                return_inverse=True)
+                                                            return_inverse=True)
         else:
+            #check if group_order has any names not in groups
             for grp in group_order:
                 if grp not in groups:
                     raise ValueError(
-                        "group_order value '%s' not found in groups" % grp)
+                            "group_order value '%s' not found in groups" % grp)
             self.groupsunique = np.array(group_order)
             self.groupintlab = np.empty(len(data), int)
-            self.groupintlab.fill(-999)
+            self.groupintlab.fill(-999)  # instead of a nan
             count = 0
             for name in self.groupsunique:
                 idx = np.where(self.groups == name)[0]
                 count += len(idx)
                 self.groupintlab[idx] = np.where(self.groupsunique == name)[0]
             if count != self.data.shape[0]:
+                #raise ValueError('group_order does not contain all groups')
+                # warn and keep only observations with label in group_order
                 import warnings
                 warnings.warn('group_order does not contain all groups:' +
-                    ' dropping observations', ValueWarning)
+                              ' dropping observations', ValueWarning)
+
                 mask_keep = self.groupintlab != -999
                 self.groupintlab = self.groupintlab[mask_keep]
                 self.data = self.data[mask_keep]
                 self.groups = self.groups[mask_keep]
+
         if len(self.groupsunique) < 2:
-            raise ValueError(
-                '2 or more groups required for multiple comparisons')
+            raise ValueError('2 or more groups required for multiple comparisons')
+
         self.datali = [self.data[self.groups == k] for k in self.groupsunique]
-        self.pairindices = np.triu_indices(len(self.groupsunique), 1)
+        self.pairindices = np.triu_indices(len(self.groupsunique), 1)  #tuple
         self.nobs = self.data.shape[0]
         self.ngroups = len(self.groupsunique)

+
     def getranks(self):
-        """convert data to rankdata and attach
+        '''convert data to rankdata and attach


         This creates rankdata as it is used for non-parametric tests, where
         in the case of ties the average rank is assigned.


-        """
-        pass
+        '''
+        #bug: the next should use self.groupintlab instead of self.groups
+        #update: looks fixed
+        #self.ranks = GroupsStats(np.column_stack([self.data, self.groups]),
+        self.ranks = GroupsStats(np.column_stack([self.data, self.groupintlab]),
+                                 useranks=True)
+        self.rankdata = self.ranks.groupmeanfilter

     def kruskal(self, pairs=None, multimethod='T'):
-        """
+        '''
         pairwise comparison for kruskal-wallis test

         This is just a reimplementation of scipy.stats.kruskal and does
         not yet use a multiple comparison correction.

-        """
-        pass
+        '''
+        self.getranks()
+        tot = self.nobs
+        meanranks = self.ranks.groupmean
+        groupnobs = self.ranks.groupnobs
+
+
+        # simultaneous/separate treatment of multiple tests
+        f=(tot * (tot + 1.) / 12.) / stats.tiecorrect(self.rankdata) #(xranks)
+        print('MultiComparison.kruskal')
+        for i,j in zip(*self.pairindices):
+            #pdiff = np.abs(mrs[i] - mrs[j])
+            pdiff = np.abs(meanranks[i] - meanranks[j])
+            se = np.sqrt(f * np.sum(1. / groupnobs[[i,j]] )) #np.array([8,8]))) #Fixme groupnobs[[i,j]] ))
+            Q = pdiff / se
+
+            # TODO: fix/remove the print statements
+            print(i,j, pdiff, se, pdiff / se, pdiff / se > 2.6310)
+            print(stats.norm.sf(Q) * 2)
+            return stats.norm.sf(Q) * 2
+

     def allpairtest(self, testfunc, alpha=0.05, method='bonf', pvalidx=1):
-        """run a pairwise test on all pairs with multiple test correction
+        '''run a pairwise test on all pairs with multiple test correction

         The statistical test given in testfunc is calculated for all pairs
         and the p-values are adjusted by methods in multipletests. The p-value
@@ -575,8 +925,47 @@ class MultiComparison:
         errors:  TODO: check if this is still wrong, I think it's fixed.
         results from multipletests are in different order
         pval_corrected can be larger than 1 ???
-        """
-        pass
+        '''
+        res = []
+        for i,j in zip(*self.pairindices):
+            res.append(testfunc(self.datali[i], self.datali[j]))
+        res = np.array(res)
+        reject, pvals_corrected, alphacSidak, alphacBonf = \
+                multipletests(res[:, pvalidx], alpha=alpha, method=method)
+        #print(np.column_stack([res[:,0],res[:,1], reject, pvals_corrected])
+
+        i1, i2 = self.pairindices
+        if pvals_corrected is None:
+            resarr = np.array(lzip(self.groupsunique[i1], self.groupsunique[i2],
+                                  np.round(res[:,0],4),
+                                  np.round(res[:,1],4),
+                                  reject),
+                       dtype=[('group1', object),
+                              ('group2', object),
+                              ('stat',float),
+                              ('pval',float),
+                              ('reject', np.bool_)])
+        else:
+            resarr = np.array(lzip(self.groupsunique[i1], self.groupsunique[i2],
+                                  np.round(res[:,0],4),
+                                  np.round(res[:,1],4),
+                                  np.round(pvals_corrected,4),
+                                  reject),
+                       dtype=[('group1', object),
+                              ('group2', object),
+                              ('stat',float),
+                              ('pval',float),
+                              ('pval_corr',float),
+                              ('reject', np.bool_)])
+        results_table = SimpleTable(resarr, headers=resarr.dtype.names)
+        results_table.title = (
+            'Test Multiple Comparison %s \n%s%4.2f method=%s'
+            % (testfunc.__name__, 'FWER=', alpha, method) +
+            '\nalphacSidak=%4.2f, alphacBonf=%5.3f'
+            % (alphacSidak, alphacBonf))
+
+        return results_table, (res, reject, pvals_corrected,
+                               alphacSidak, alphacBonf), resarr

     def tukeyhsd(self, alpha=0.05):
         """
@@ -593,31 +982,87 @@ class MultiComparison:
             A results class containing relevant data and some post-hoc
             calculations
         """
-        pass
+        self.groupstats = GroupsStats(
+            np.column_stack([self.data, self.groupintlab]),
+            useranks=False)
+
+        gmeans = self.groupstats.groupmean
+        gnobs = self.groupstats.groupnobs
+        # var_ = self.groupstats.groupvarwithin()
+        # #possibly an error in varcorrection in this case
+        var_ = np.var(self.groupstats.groupdemean(), ddof=len(gmeans))
+        # res contains: 0:(idx1, idx2), 1:reject, 2:meandiffs, 3: std_pairs,
+        # 4:confint, 5:q_crit, 6:df_total, 7:reject2, 8: pvals
+        res = tukeyhsd(gmeans, gnobs, var_, df=None, alpha=alpha, q_crit=None)
+
+        resarr = np.array(lzip(self.groupsunique[res[0][0]],
+                               self.groupsunique[res[0][1]],
+                               np.round(res[2], 4),
+                               np.round(res[8], 4),
+                               np.round(res[4][:, 0], 4),
+                               np.round(res[4][:, 1], 4),
+                               res[1]),
+                          dtype=[('group1', object),
+                                 ('group2', object),
+                                 ('meandiff', float),
+                                 ('p-adj', float),
+                                 ('lower', float),
+                                 ('upper', float),
+                                 ('reject', np.bool_)])
+        results_table = SimpleTable(resarr, headers=resarr.dtype.names)
+        results_table.title = 'Multiple Comparison of Means - Tukey HSD, ' + \
+                              'FWER=%4.2f' % alpha
+
+        return TukeyHSDResults(self, results_table, res[5], res[1], res[2],
+                               res[3], res[4], res[6], res[7], var_, res[8])


 def rankdata(x):
-    """rankdata, equivalent to scipy.stats.rankdata
+    '''rankdata, equivalent to scipy.stats.rankdata

     just a different implementation, I have not yet compared speed

-    """
-    pass
+    '''
+    uni, intlab = np.unique(x[:,0], return_inverse=True)
+    groupnobs = np.bincount(intlab)
+    groupxsum = np.bincount(intlab, weights=x[:,0])
+    groupxmean = groupxsum * 1.0 / groupnobs
+
+    rankraw = x[:,0].argsort().argsort()
+    groupranksum = np.bincount(intlab, weights=rankraw)
+    # start at 1 for stats.rankdata :
+    grouprankmean = groupranksum * 1.0 / groupnobs + 1
+    return grouprankmean[intlab]
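+# Hand-checked example: for x[:, 0] == [10, 20, 20, 30] the tied values share
+# their average rank, so the result is [1., 2.5, 2.5, 4.], matching
+# scipy.stats.rankdata([10, 20, 20, 30]).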
+

+#new

 def compare_ordered(vals, alpha):
-    """simple ordered sequential comparison of means
+    '''simple ordered sequential comparison of means

     vals : array_like
         means or rankmeans for independent groups

     incomplete, no return, not used yet
-    """
-    pass
+    '''
+    vals = np.asarray(vals)
+    alphaf = alpha  # Notation ?
+    sortind = np.argsort(vals)
+    pvals = vals[sortind]
+    sortrevind = sortind.argsort()
+    ntests = len(vals)
+    #alphacSidak = 1 - np.power((1. - alphaf), 1./ntests)
+    #alphacBonf = alphaf / float(ntests)
+    v1, v2 = np.triu_indices(ntests, 1)
+    #v1,v2 have wrong sequence
+    for i in range(4):
+        for j in range(4,i, -1):
+            print(i,j)
+


 def varcorrection_unbalanced(nobs_all, srange=False):
-    """correction factor for variance with unequal sample sizes
+    '''correction factor for variance with unequal sample sizes

     this is just a harmonic mean

@@ -649,12 +1094,15 @@ def varcorrection_unbalanced(nobs_all, srange=False):
     error, MSE. To obtain the correction factor for the standard deviation,
     square root needs to be taken.

-    """
-    pass
-
+    '''
+    nobs_all = np.asarray(nobs_all)
+    if not srange:
+        return (1./nobs_all).sum()
+    else:
+        return (1./nobs_all).sum()/len(nobs_all)
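+# Small numeric sketch: for nobs_all = [4, 8] the factor is 1/4 + 1/8 = 0.375;
+# with srange=True it is additionally divided by the number of groups,
+# 0.375 / 2 = 0.1875.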

 def varcorrection_pairs_unbalanced(nobs_all, srange=False):
-    """correction factor for variance with unequal sample sizes for all pairs
+    '''correction factor for variance with unequal sample sizes for all pairs

     this is just a harmonic mean

@@ -689,12 +1137,16 @@ def varcorrection_pairs_unbalanced(nobs_all, srange=False):
     For the studentized range statistic, the resulting factor has to be
     divided by 2.

-    """
-    pass
-
+    '''
+    #TODO: test and replace with broadcasting
+    n1, n2 = np.meshgrid(nobs_all, nobs_all)
+    if not srange:
+        return (1./n1 + 1./n2)
+    else:
+        return (1./n1 + 1./n2) / 2.
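+# Small numeric sketch: for nobs_all = [4, 8] the off-diagonal (pairwise) entry
+# is 1/4 + 1/8 = 0.375, and with srange=True it is halved to 0.1875 for use
+# with the studentized range.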

 def varcorrection_unequal(var_all, nobs_all, df_all):
-    """return joint variance from samples with unequal variances and unequal
+    '''return joint variance from samples with unequal variances and unequal
     sample sizes

     something is wrong
@@ -731,12 +1183,18 @@ def varcorrection_unequal(var_all, nobs_all, df_all):
     square root needs to be taken.

     This is for variance of mean difference not of studentized range.
-    """
-    pass
+    '''
+
+    var_all = np.asarray(var_all)
+    var_over_n = var_all *1./ nobs_all  #avoid integer division
+    varjoint = var_over_n.sum()
+
+    dfjoint = varjoint**2 / (var_over_n**2 * df_all).sum()

+    return varjoint, dfjoint
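+# Note (hedged): the textbook Welch-Satterthwaite approximation uses
+# (var_over_n**2 / df_all).sum() in the denominator of dfjoint; the
+# multiplication by df_all here may be the issue flagged as "something is
+# wrong" in the docstring above.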

 def varcorrection_pairs_unequal(var_all, nobs_all, df_all):
-    """return joint variance from samples with unequal variances and unequal
+    '''return joint variance from samples with unequal variances and unequal
     sample sizes for all pairs

     something is wrong
@@ -774,12 +1232,20 @@ def varcorrection_pairs_unequal(var_all, nobs_all, df_all):
     square root needs to be taken.

     TODO: something looks wrong with dfjoint, is formula from SPSS
-    """
-    pass
+    '''
+    #TODO: test and replace with broadcasting
+    v1, v2 = np.meshgrid(var_all, var_all)
+    n1, n2 = np.meshgrid(nobs_all, nobs_all)
+    df1, df2 = np.meshgrid(df_all, df_all)
+
+    varjoint = v1/n1 + v2/n2
+
+    dfjoint = varjoint**2 / (df1 * (v1/n1)**2 + df2 * (v2/n2)**2)

+    return varjoint, dfjoint

 def tukeyhsd(mean_all, nobs_all, var_all, df=None, alpha=0.05, q_crit=None):
-    """simultaneous Tukey HSD
+    '''simultaneous Tukey HSD


     check: instead of sorting, I use absolute value of pairwise differences
@@ -791,8 +1257,64 @@ def tukeyhsd(mean_all, nobs_all, var_all, df=None, alpha=0.05, q_crit=None):

     TODO: error in variance calculation when nobs_all is scalar, missing 1/n

-    """
-    pass
+    '''
+    mean_all = np.asarray(mean_all)
+    #check if or when other ones need to be arrays
+
+    n_means = len(mean_all)
+
+    if df is None:
+        df = nobs_all - 1
+
+    if np.size(df) == 1:   # assumes balanced samples with df = n - 1, n_i = n
+        df_total = n_means * df
+        df = np.ones(n_means) * df
+    else:
+        df_total = np.sum(df)
+
+    if (np.size(nobs_all) == 1) and (np.size(var_all) == 1):
+        #balanced sample sizes and homogenous variance
+        var_pairs = 1. * var_all / nobs_all * np.ones((n_means, n_means))
+
+    elif np.size(var_all) == 1:
+        #unequal sample sizes and homogenous variance
+        var_pairs = var_all * varcorrection_pairs_unbalanced(nobs_all,
+                                                             srange=True)
+    elif np.size(var_all) > 1:
+        var_pairs, df_sum = varcorrection_pairs_unequal(nobs_all, var_all, df)
+        var_pairs /= 2.
+        #check division by two for studentized range
+
+    else:
+        raise ValueError('not supposed to be here')
+
+    #meandiffs_ = mean_all[:,None] - mean_all
+    meandiffs_ = mean_all - mean_all[:,None]  #reverse sign, check with R example
+    std_pairs_ = np.sqrt(var_pairs)
+
+    #select all pairs from upper triangle of matrix
+    idx1, idx2 = np.triu_indices(n_means, 1)
+    meandiffs = meandiffs_[idx1, idx2]
+    std_pairs = std_pairs_[idx1, idx2]
+
+    st_range = np.abs(meandiffs) / std_pairs #studentized range statistic
+
+    df_total_ = max(df_total, 5)  #TODO: smallest df in table
+    if q_crit is None:
+        q_crit = get_tukeyQcrit2(n_means, df_total, alpha=alpha)
+
+    pvalues = get_tukey_pvalue(n_means, df_total, st_range)
+    # we need pvalues to be atleast_1d for iteration. see #6132
+    pvalues = np.atleast_1d(pvalues)
+
+    reject = st_range > q_crit
+    crit_int = std_pairs * q_crit
+    reject2 = np.abs(meandiffs) > crit_int
+
+    confint = np.column_stack((meandiffs - crit_int, meandiffs + crit_int))
+
+    return ((idx1, idx2), reject, meandiffs, std_pairs, confint, q_crit,
+            df_total, reject2, pvalues)
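+# Usage note: the returned tuple mirrors the unpacking in
+# MultiComparison.tukeyhsd, i.e. (idx_pairs, reject, meandiffs, std_pairs,
+# confint, q_crit, df_total, reject2, pvalues), with the confidence bounds
+# built as meandiffs -/+ q_crit * std_pairs.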


 def simultaneous_ci(q_crit, var, groupnobs, pairindices=None):
@@ -832,11 +1354,34 @@ def simultaneous_ci(q_crit, var, groupnobs, pairindices=None):
     .. [*] Hochberg, Y., and A. C. Tamhane. Multiple Comparison Procedures.
            Hoboken, NJ: John Wiley & Sons, 1987.)
     """
-    pass
+    # Set initial variables
+    ng = len(groupnobs)
+    if pairindices is None:
+        pairindices = np.triu_indices(ng, 1)
+
+    # Compute dij for all pairwise comparisons ala hochberg p. 95
+    gvar = var / groupnobs
+
+    d12 = np.sqrt(gvar[pairindices[0]] + gvar[pairindices[1]])
+
+    # Create the full d matrix given all known dij vals
+    d = np.zeros((ng, ng))
+    d[pairindices] = d12
+    d = d + d.conj().T
+
+    # Compute the two global sums from hochberg eq 3.32
+    sum1 = np.sum(d12)
+    sum2 = np.sum(d, axis=0)
+
+    if (ng > 2):
+        w = ((ng-1.) * sum2 - sum1) / ((ng - 1.) * (ng - 2.))
+    else:
+        w = sum1 * np.ones((2, 1)) / 2.

+    return (q_crit / np.sqrt(2))*w

 def distance_st_range(mean_all, nobs_all, var_all, df=None, triu=False):
-    """pairwise distance matrix, outsourced from tukeyhsd
+    '''pairwise distance matrix, outsourced from tukeyhsd



@@ -846,12 +1391,53 @@ def distance_st_range(mean_all, nobs_all, var_all, df=None, triu=False):

     TODO: error in variance calculation when nobs_all is scalar, missing 1/n

-    """
-    pass
+    '''
+    mean_all = np.asarray(mean_all)
+    #check if or when other ones need to be arrays
+
+    n_means = len(mean_all)
+
+    if df is None:
+        df = nobs_all - 1
+
+    if np.size(df) == 1:   # assumes balanced samples with df = n - 1, n_i = n
+        df_total = n_means * df
+    else:
+        df_total = np.sum(df)
+
+    if (np.size(nobs_all) == 1) and (np.size(var_all) == 1):
+        #balanced sample sizes and homogenous variance
+        var_pairs = 1. * var_all / nobs_all * np.ones((n_means, n_means))
+
+    elif np.size(var_all) == 1:
+        #unequal sample sizes and homogenous variance
+        var_pairs = var_all * varcorrection_pairs_unbalanced(nobs_all,
+                                                             srange=True)
+    elif np.size(var_all) > 1:
+        var_pairs, df_sum = varcorrection_pairs_unequal(nobs_all, var_all, df)
+        var_pairs /= 2.
+        #check division by two for studentized range
+
+    else:
+        raise ValueError('not supposed to be here')
+
+    #meandiffs_ = mean_all[:,None] - mean_all
+    meandiffs = mean_all - mean_all[:,None]  #reverse sign, check with R example
+    std_pairs = np.sqrt(var_pairs)
+
+    idx1, idx2 = np.triu_indices(n_means, 1)
+    if triu:
+        #select all pairs from upper triangle of matrix
+        meandiffs = meandiffs_[idx1, idx2]  # noqa: F821  See GH#5756
+        std_pairs = std_pairs_[idx1, idx2]  # noqa: F821  See GH#5756
+
+    st_range = np.abs(meandiffs) / std_pairs #studentized range statistic
+
+    return st_range, meandiffs, std_pairs, (idx1,idx2)  #return square arrays


 def contrast_allpairs(nm):
-    """contrast or restriction matrix for all pairs of nm variables
+    '''contrast or restriction matrix for all pairs of nm variables

     Parameters
     ----------
@@ -862,12 +1448,18 @@ def contrast_allpairs(nm):
     contr : ndarray, 2d, (nm*(nm-1)/2, nm)
        contrast matrix for all pairwise comparisons

-    """
-    pass
-
+    '''
+    contr = []
+    for i in range(nm):
+        for j in range(i+1, nm):
+            contr_row = np.zeros(nm)
+            contr_row[i] = 1
+            contr_row[j] = -1
+            contr.append(contr_row)
+    return np.array(contr)
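+# Hand-checked example: contrast_allpairs(3) gives
+# [[ 1. -1.  0.]
+#  [ 1.  0. -1.]
+#  [ 0.  1. -1.]]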

 def contrast_all_one(nm):
-    """contrast or restriction matrix for all against first comparison
+    '''contrast or restriction matrix for all against first comparison

     Parameters
     ----------
@@ -878,12 +1470,12 @@ def contrast_all_one(nm):
     contr : ndarray, 2d, (nm-1, nm)
        contrast matrix for all against first comparisons

-    """
-    pass
-
+    '''
+    contr = np.column_stack((np.ones(nm-1), -np.eye(nm-1)))
+    return contr
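+# Hand-checked example: contrast_all_one(3) gives
+# [[ 1. -1.  0.]
+#  [ 1.  0. -1.]]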

 def contrast_diff_mean(nm):
-    """contrast or restriction matrix for all against mean comparison
+    '''contrast or restriction matrix for all against mean comparison

     Parameters
     ----------
@@ -894,20 +1486,42 @@ def contrast_diff_mean(nm):
     contr : ndarray, 2d, (nm-1, nm)
        contrast matrix for all against mean comparisons

-    """
-    pass
+    '''
+    return np.eye(nm) - np.ones((nm,nm))/nm
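+# Hand-checked example: contrast_diff_mean(3) returns a (3, 3) array (note:
+# (nm, nm), not the (nm-1, nm) shape stated in the docstring):
+# [[ 2/3 -1/3 -1/3]
+#  [-1/3  2/3 -1/3]
+#  [-1/3 -1/3  2/3]]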
+
+def tukey_pvalues(std_range, nm, df):
+    #corrected but very slow with warnings about integration
+    #nm = len(std_range)
+    contr = contrast_allpairs(nm)
+    corr = np.dot(contr, contr.T)/2.
+    tstat = std_range / np.sqrt(2) * np.ones(corr.shape[0]) #need len of all pairs
+    return multicontrast_pvalues(tstat, corr, df=df)
+
+
+def multicontrast_pvalues(tstat, tcorr, df=None, dist='t', alternative='two-sided'):
+    '''pvalues for simultaneous tests
+
+    '''
+    from statsmodels.sandbox.distributions.multivariate import mvstdtprob
+    if (df is None) and (dist == 't'):
+        raise ValueError('df has to be specified for the t-distribution')
+    tstat = np.asarray(tstat)
+    ntests = len(tstat)
+    cc = np.abs(tstat)
+    pval_global = 1 - mvstdtprob(-cc,cc, tcorr, df)
+    pvals = []
+    for ti in cc:
+        limits = ti*np.ones(ntests)
+        pvals.append(1 - mvstdtprob(-limits, limits, tcorr, df))
+
+    return pval_global, np.asarray(pvals)


-def multicontrast_pvalues(tstat, tcorr, df=None, dist='t', alternative=
-    'two-sided'):
-    """pvalues for simultaneous tests

-    """
-    pass


 class StepDown:
-    """a class for step down methods
+    '''a class for step down methods

     This is currently for simple tree subset descend, similar to homogeneous_subsets,
     but checks all leave-one-out subsets instead of assuming an ordered set.
@@ -939,7 +1553,7 @@ class StepDown:
     be calculated in advance for tests like the F-based ones.


-    """
+    '''

     def __init__(self, vals, nobs_all, var_all, df=None):
         self.vals = vals
@@ -947,6 +1561,10 @@ class StepDown:
         self.nobs_all = nobs_all
         self.var_all = var_all
         self.df = df
+        # the following has been moved to run
+        #self.cache_result = {}
+        #self.crit = self.getcrit(0.5)   #decide where to set alpha, moved to run
+        #self.accepted = []  #store accepted sets, not unique

     def get_crit(self, alpha):
         """
@@ -954,38 +1572,78 @@ class StepDown:

         currently tukey Q, add others
         """
-        pass
+        q_crit = get_tukeyQcrit(self.n_vals, self.df, alpha=alpha)
+        return q_crit * np.ones(self.n_vals)
+
+

     def get_distance_matrix(self):
-        """studentized range statistic"""
-        pass
+        '''studentized range statistic'''
+        #make into property, decorate
+        dres = distance_st_range(self.vals, self.nobs_all, self.var_all, df=self.df)
+        self.distance_matrix = dres[0]

     def iter_subsets(self, indices):
         """Iterate substeps"""
-        pass
+        for ii in range(len(indices)):
+            idxsub = copy.copy(indices)
+            idxsub.pop(ii)
+            yield idxsub
+

     def check_set(self, indices):
-        """check whether pairwise distances of indices satisfy condition
+        '''check whether pairwise distances of indices satisfy condition

-        """
-        pass
+        '''
+        indtup = tuple(indices)
+        if indtup in self.cache_result:
+            return self.cache_result[indtup]
+        else:
+            set_distance_matrix = self.distance_matrix[np.asarray(indices)[:,None], indices]
+            n_elements = len(indices)
+            if np.any(set_distance_matrix > self.crit[n_elements-1]):
+                res = True
+            else:
+                res = False
+            self.cache_result[indtup] = res
+            return res

     def stepdown(self, indices):
         """stepdown"""
-        pass
+        print(indices)
+        if self.check_set(indices): # larger than critical distance
+            if (len(indices) > 2):  # step down into subsets if more than 2 elements
+                for subs in self.iter_subsets(indices):
+                    self.stepdown(subs)
+            else:
+                self.rejected.append(tuple(indices))
+        else:
+            self.accepted.append(tuple(indices))
+            return indices

     def run(self, alpha):
-        """main function to run the test,
+        '''main function to run the test,

         could be done in __call__ instead
         this could have all the initialization code

-        """
-        pass
+        '''
+        self.cache_result = {}
+        self.crit = self.get_crit(alpha)   #decide where to set alpha, moved to run
+        self.accepted = []  #store accepted sets, not unique
+        self.rejected = []
+        self.get_distance_matrix()
+        self.stepdown(lrange(self.n_vals))
+
+        return list(set(self.accepted)), list(set(self.rejected))
+
+
+
+


 def homogeneous_subsets(vals, dcrit):
-    """recursively check all pairs of vals for minimum distance
+    '''recursively check all pairs of vals for minimum distance

     step down method as in Newman-Keuls and Ryan procedures. This is not a
     closed procedure since not all partitions are checked.
@@ -1025,12 +1683,39 @@ def homogeneous_subsets(vals, dcrit):
     [array([  8. ,   9. ,   9.5,  10. ]), array([ 2. ,  2.5,  3. ]), array([ 6.])]


-    """
-    pass
+    '''

+    nvals = len(vals)
+    indices_ = lrange(nvals)
+    rejected = []
+    subsetsli = []
+    if np.size(dcrit) == 1:
+        dcrit = dcrit*np.ones((nvals, nvals))  #example numbers for experimenting
+
+    def subsets(vals, indices_):
+        '''recursive function for constructing homogeneous subset
+
+        registers rejected and subsetli in outer scope
+        '''
+        i, j = (indices_[0], indices_[-1])
+        if vals[-1] - vals[0] > dcrit[i,j]:
+            rejected.append((indices_[0], indices_[-1]))
+            return [subsets(vals[:-1], indices_[:-1]),
+                    subsets(vals[1:], indices_[1:]),
+                    (indices_[0], indices_[-1])]
+        else:
+            subsetsli.append(tuple(indices_))
+            return indices_
+    res = subsets(vals, indices_)
+
+    all_pairs = [(i,j) for i in range(nvals) for j in range(nvals-1,i,-1)]
+    rejs = set(rejected)
+    not_rejected = list(set(all_pairs) - rejs)
+
+    return list(rejs), not_rejected, list(set(subsetsli)), res

 def set_partition(ssli):
-    """extract a partition from a list of tuples
+    '''extract a partition from a list of tuples

     this should be correctly called select largest disjoint sets.
     Begun and Gabriel 1981 do not seem to be bothered by sets of accepted
@@ -1051,12 +1736,23 @@ def set_partition(ssli):
     >>> set_partition(li)
     ([(5, 6, 7, 8), (1, 2, 3)], [0, 4])

-    """
-    pass
+    '''
+    part = []
+    for s in sorted(list(set(ssli)), key=len)[::-1]:
+        #print(s)
+        s_ = set(s).copy()
+        if not any(set(s_).intersection(set(t)) for t in part):
+            #print('inside:', s)
+            part.append(s)
+        #else: print(part)
+
+    missing = list(set(i for ll in ssli for i in ll)
+                   - set(i for ll in part for i in ll))
+    return part, missing


 def set_remove_subs(ssli):
-    """remove sets that are subsets of another set from a list of tuples
+    '''remove sets that are subsets of another set from a list of tuples

     Parameters
     ----------
@@ -1076,134 +1772,194 @@ def set_remove_subs(ssli):
     >>> set_remove_subs([(0, 1), (1, 2), (1,1, 1, 2, 3), (0,)])
     [(1, 1, 1, 2, 3), (0, 1)]

-    """
-    pass
+    '''
+    #TODO: maybe convert all tuples to sets immediately, but I do not need the extra efficiency
+    part = []
+    for s in sorted(list(set(ssli)), key=lambda x: len(set(x)))[::-1]:
+        #print(s)
+        #s_ = set(s).copy()
+        if not any(set(s).issubset(set(t)) for t in part):
+            #print('inside:', s)
+            part.append(s)
+        #else: print(part)
+
+##    missing = list(set(i for ll in ssli for i in ll)
+##                   - set(i for ll in part for i in ll))
+    return part


 if __name__ == '__main__':
+
     examples = ['tukey', 'tukeycrit', 'fdr', 'fdrmc', 'bonf', 'randmvn',
-        'multicompdev', 'None']
+                'multicompdev', 'None']#[-1]
+
     if 'tukey' in examples:
-        x = np.array([[0, 0, 1]]).T + np.random.randn(3, 20)
+        #Example Tukey
+        x = np.array([[0,0,1]]).T + np.random.randn(3, 20)
         print(Tukeythreegene(*x))
-    if 'fdr' in examples or 'bonf' in examples:
+
+    # Example FDR
+    # ------------
+    if ('fdr' in examples) or ('bonf' in examples):
         from .ex_multicomp import example_fdr_bonferroni
         example_fdr_bonferroni()
+
     if 'fdrmc' in examples:
-        mcres = mcfdr(nobs=100, nrepl=1000, ntests=30, ntrue=30, mu=0.1,
-            alpha=0.05, rho=0.3)
+        mcres = mcfdr(nobs=100, nrepl=1000, ntests=30, ntrue=30, mu=0.1, alpha=0.05, rho=0.3)
         mcmeans = np.array(mcres).mean(0)
         print(mcmeans)
-        print(mcmeans[0] / 6.0, 1 - mcmeans[1] / 4.0)
+        print(mcmeans[0]/6., 1-mcmeans[1]/4.)
         print(mcmeans[:4], mcmeans[-4:])
+
+
     if 'randmvn' in examples:
-        rvsmvn = randmvn(0.8, (5000, 5))
+        rvsmvn = randmvn(0.8, (5000,5))
         print(np.corrcoef(rvsmvn, rowvar=0))
         print(rvsmvn.var(0))
+
+
     if 'tukeycrit' in examples:
-        print(get_tukeyQcrit(8, 8, alpha=0.05), 5.6)
+        print(get_tukeyQcrit(8, 8, alpha=0.05), 5.60)
         print(get_tukeyQcrit(8, 8, alpha=0.01), 7.47)
+
+
     if 'multicompdev' in examples:
-        X = np.array([[7.68, 1], [7.69, 1], [7.7, 1], [7.7, 1], [7.72, 1],
-            [7.73, 1], [7.73, 1], [7.76, 1], [7.71, 2], [7.73, 2], [7.74, 2
-            ], [7.74, 2], [7.78, 2], [7.78, 2], [7.8, 2], [7.81, 2], [7.74,
-            3], [7.75, 3], [7.77, 3], [7.78, 3], [7.8, 3], [7.81, 3], [7.84,
-            3], [7.71, 4], [7.71, 4], [7.74, 4], [7.79, 4], [7.81, 4], [
-            7.85, 4], [7.87, 4], [7.91, 4]])
-        xli = [X[X[:, 1] == k, 0] for k in range(1, 5)]
-        xranks = stats.rankdata(X[:, 0])
-        xranksli = [xranks[X[:, 1] == k] for k in range(1, 5)]
+        #development of kruskal-wallis multiple-comparison
+        #example from matlab file exchange
+
+        X = np.array([[7.68, 1], [7.69, 1], [7.70, 1], [7.70, 1], [7.72, 1],
+                      [7.73, 1], [7.73, 1], [7.76, 1], [7.71, 2], [7.73, 2],
+                      [7.74, 2], [7.74, 2], [7.78, 2], [7.78, 2], [7.80, 2],
+                      [7.81, 2], [7.74, 3], [7.75, 3], [7.77, 3], [7.78, 3],
+                      [7.80, 3], [7.81, 3], [7.84, 3], [7.71, 4], [7.71, 4],
+                      [7.74, 4], [7.79, 4], [7.81, 4], [7.85, 4], [7.87, 4],
+                      [7.91, 4]])
+        xli = [X[X[:,1]==k,0] for k in range(1,5)]
+        xranks = stats.rankdata(X[:,0])
+        xranksli = [xranks[X[:,1]==k] for k in range(1,5)]
         xnobs = np.array([len(xval) for xval in xli])
         meanranks = [item.mean() for item in xranksli]
         sumranks = [item.sum() for item in xranksli]
-        stats.norm.sf(0.6744897501960817)
+        # equivalent function
+        #from scipy import special
+        #-np.sqrt(2.)*special.erfcinv(2-0.5) == stats.norm.isf(0.25)
+        stats.norm.sf(0.67448975019608171)
         stats.norm.isf(0.25)
+
         mrs = np.sort(meanranks)
-        v1, v2 = np.triu_indices(4, 1)
+        v1, v2 = np.triu_indices(4,1)
         print('\nsorted rank differences')
         print(mrs[v2] - mrs[v1])
         diffidx = np.argsort(mrs[v2] - mrs[v1])[::-1]
         mrs[v2[diffidx]] - mrs[v1[diffidx]]
+
         print('\nkruskal for all pairs')
-        for i, j in zip(v2[diffidx], v1[diffidx]):
-            print(i, j, stats.kruskal(xli[i], xli[j]))
-            mwu, mwupval = stats.mannwhitneyu(xli[i], xli[j],
-                use_continuity=False)
-            print(mwu, mwupval * 2, mwupval * 2 < 0.05 / 6.0, mwupval * 2 <
-                0.1 / 6.0)
-        uni, intlab = np.unique(X[:, 0], return_inverse=True)
+        for i,j in zip(v2[diffidx], v1[diffidx]):
+            print(i,j, stats.kruskal(xli[i], xli[j]))
+            mwu, mwupval = stats.mannwhitneyu(xli[i], xli[j], use_continuity=False)
+            print(mwu, mwupval*2, mwupval*2<0.05/6., mwupval*2<0.1/6.)
+
+
+
+
+
+        uni, intlab = np.unique(X[:,0], return_inverse=True)
         groupnobs = np.bincount(intlab)
-        groupxsum = np.bincount(intlab, weights=X[:, 0])
+        groupxsum = np.bincount(intlab, weights=X[:,0])
         groupxmean = groupxsum * 1.0 / groupnobs
-        rankraw = X[:, 0].argsort().argsort()
+
+        rankraw = X[:,0].argsort().argsort()
         groupranksum = np.bincount(intlab, weights=rankraw)
+        # start at 1 for stats.rankdata :
         grouprankmean = groupranksum * 1.0 / groupnobs + 1
-        assert_almost_equal(grouprankmean[intlab], stats.rankdata(X[:, 0]), 15)
+        assert_almost_equal(grouprankmean[intlab], stats.rankdata(X[:,0]), 15)
         gs = GroupsStats(X, useranks=True)
         print('\ngroupmeanfilter and grouprankmeans')
         print(gs.groupmeanfilter)
         print(grouprankmean[intlab])
-        xuni, xintlab = np.unique(X[:, 0], return_inverse=True)
-        gs2 = GroupsStats(np.column_stack([X[:, 0], xintlab]), useranks=True)
+        #the following has changed
+        #assert_almost_equal(gs.groupmeanfilter, stats.rankdata(X[:,0]), 15)
+
+        xuni, xintlab = np.unique(X[:,0], return_inverse=True)
+        gs2 = GroupsStats(np.column_stack([X[:,0], xintlab]), useranks=True)
+        #assert_almost_equal(gs2.groupmeanfilter, stats.rankdata(X[:,0]), 15)
+
         rankbincount = np.bincount(xranks.astype(int))
         nties = rankbincount[rankbincount > 1]
         ntot = float(len(xranks))
-        tiecorrection = 1 - (nties ** 3 - nties).sum() / (ntot ** 3 - ntot)
-        assert_almost_equal(tiecorrection, stats.tiecorrect(xranks), 15)
+        tiecorrection = 1 - (nties**3 - nties).sum()/(ntot**3 - ntot)
+        assert_almost_equal(tiecorrection, stats.tiecorrect(xranks),15)
         print('\ntiecorrection for data and ranks')
         print(tiecorrection)
         print(tiecorrect(xranks))
+
         tot = X.shape[0]
-        t = 500
-        f = tot * (tot + 1.0) / 12.0 - t / (6.0 * (tot - 1.0))
-        f = tot * (tot + 1.0) / 12.0 / stats.tiecorrect(xranks)
+        t=500 #168
+        f=(tot*(tot+1.)/12.)-(t/(6.*(tot-1.)))
+        f=(tot*(tot+1.)/12.)/stats.tiecorrect(xranks)
         print('\npairs of mean rank differences')
-        for i, j in zip(v2[diffidx], v1[diffidx]):
+        for i,j in zip(v2[diffidx], v1[diffidx]):
+            #pdiff = np.abs(mrs[i] - mrs[j])
             pdiff = np.abs(meanranks[i] - meanranks[j])
-            se = np.sqrt(f * np.sum(1.0 / xnobs[[i, j]]))
-            print(i, j, pdiff, se, pdiff / se, pdiff / se > 2.631)
+            se = np.sqrt(f * np.sum(1./xnobs[[i,j]] )) #np.array([8,8]))) #Fixme groupnobs[[i,j]] ))
+            print(i,j, pdiff, se, pdiff/se, pdiff/se>2.6310)
+
         multicomp = MultiComparison(*X.T)
         multicomp.kruskal()
         gsr = GroupsStats(X, useranks=True)
+
         print('\nexamples for kruskal multicomparison')
         for i in range(10):
-            x1, x2 = (np.random.randn(30, 2) + np.array([0, 0.5])).T
+            x1, x2 = (np.random.randn(30,2) + np.array([0, 0.5])).T
             skw = stats.kruskal(x1, x2)
-            mc2 = MultiComparison(np.r_[x1, x2], np.r_[np.zeros(len(x1)),
-                np.ones(len(x2))])
+            mc2=MultiComparison(np.r_[x1, x2], np.r_[np.zeros(len(x1)), np.ones(len(x2))])
             newskw = mc2.kruskal()
-            print(skw, np.sqrt(skw[0]), skw[1] - newskw, (newskw / skw[1] -
-                1) * 100)
+            print(skw, np.sqrt(skw[0]), skw[1]-newskw, (newskw/skw[1]-1)*100)
+
         tablett, restt, arrtt = multicomp.allpairtest(stats.ttest_ind)
         tablemw, resmw, arrmw = multicomp.allpairtest(stats.mannwhitneyu)
         print('')
         print(tablett)
         print('')
         print(tablemw)
-        tablemwhs, resmw, arrmw = multicomp.allpairtest(stats.mannwhitneyu,
-            method='hs')
+        tablemwhs, resmw, arrmw = multicomp.allpairtest(stats.mannwhitneyu, method='hs')
         print('')
         print(tablemwhs)
+
     if 'last' in examples:
-        xli = (np.random.randn(60, 4) + np.array([0, 0, 0.5, 0.5])).T
+        xli = (np.random.randn(60,4) + np.array([0, 0, 0.5, 0.5])).T
+        #Xrvs = np.array(catstack(xli))
         xrvs, xrvsgr = catstack(xli)
         multicompr = MultiComparison(xrvs, xrvsgr)
         tablett, restt, arrtt = multicompr.allpairtest(stats.ttest_ind)
         print(tablett)
-        xli = [[8, 10, 9, 10, 9], [7, 8, 5, 8, 5], [4, 8, 7, 5, 7]]
+
+
+        xli=[[8,10,9,10,9],[7,8,5,8,5],[4,8,7,5,7]]
         x, labels = catstack(xli)
         gs4 = GroupsStats(np.column_stack([x, labels]))
         print(gs4.groupvarwithin())
-    gmeans = np.array([7.71375, 7.76125, 7.78428571, 7.79875])
+
+
+    #test_tukeyhsd() #moved to test_multi.py
+
+    gmeans = np.array([ 7.71375,  7.76125,  7.78428571,  7.79875])
     gnobs = np.array([8, 8, 7, 8])
     sd = StepDown(gmeans, gnobs, 0.001, [27])
-    pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
-        0.0459, 0.324, 0.4262, 0.5719, 0.6528, 0.759, 1.0]
+
+    #example from BKY
+    pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344, 0.0459,
+             0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000 ]
+
+    #same number of rejection as in BKY paper:
+    #single step-up:4, two-stage:8, iterated two-step:9
+    #also alpha_star is the same as theirs for TST
     print(fdrcorrection0(pvals, alpha=0.05, method='indep'))
     print(fdrcorrection_twostage(pvals, alpha=0.05, iter=False))
     res_tst = fdrcorrection_twostage(pvals, alpha=0.05, iter=False)
-    assert_almost_equal([0.047619, 0.0649], res_tst[-1][:2], 3)
+    assert_almost_equal([0.047619, 0.0649], res_tst[-1][:2],3) #alpha_star for stage 2
     assert_equal(8, res_tst[0].sum())
     print(fdrcorrection_twostage(pvals, alpha=0.05, iter=True))
     print('fdr_gbs', multipletests(pvals, alpha=0.05, method='fdr_gbs'))
+    #multicontrast_pvalues(tstat, tcorr, df)
     tukey_pvalues(3.649, 3, 16)
diff --git a/statsmodels/sandbox/stats/runs.py b/statsmodels/sandbox/stats/runs.py
index 77ac052bd..d80051c8a 100644
--- a/statsmodels/sandbox/stats/runs.py
+++ b/statsmodels/sandbox/stats/runs.py
@@ -1,4 +1,4 @@
-"""runstest
+'''runstest

 formulas for mean and var of runs taken from SAS manual NPAR tests, also idea
 for runstest_1samp and runstest_2samp
@@ -17,16 +17,16 @@ run is also started after a run of a fixed length of the same kind.
 TODO
 * add one-sided tests where possible or where it makes sense

-"""
+'''
+
 import numpy as np
 from scipy import stats
 from scipy.special import comb
 import warnings
 from statsmodels.tools.validation import array_like

-
 class Runs:
-    """class for runs in a binary sequence
+    '''class for runs in a binary sequence


     Parameters
@@ -51,22 +51,22 @@ class Runs:
     The exact distribution for the runs test is also available but not yet
     verified.

-    """
+    '''

     def __init__(self, x):
         self.x = np.asarray(x)
-        self.runstart = runstart = np.nonzero(np.diff(np.r_[[-np.inf], x, [
-            np.inf]]))[0]
+
+        self.runstart = runstart = np.nonzero(np.diff(np.r_[[-np.inf], x, [np.inf]]))[0]
         self.runs = runs = np.diff(runstart)
         self.runs_sign = runs_sign = x[runstart[:-1]]
-        self.runs_pos = runs[runs_sign == 1]
-        self.runs_neg = runs[runs_sign == 0]
+        self.runs_pos = runs[runs_sign==1]
+        self.runs_neg = runs[runs_sign==0]
         self.runs_freqs = np.bincount(runs)
         self.n_runs = len(self.runs)
-        self.n_pos = (x == 1).sum()
+        self.n_pos = (x==1).sum()

     def runs_test(self, correction=True):
-        """basic version of runs test
+        '''basic version of runs test

         Parameters
         ----------
@@ -78,12 +78,33 @@ class Runs:

         pvalue based on normal distribution, with integer correction

-        """
-        pass
-
+        '''
+        self.npo = npo = (self.runs_pos).sum()
+        self.nne = nne = (self.runs_neg).sum()
+
+        #n_r = self.n_runs
+        n = npo + nne
+        npn = npo * nne
+        rmean = 2. * npn / n + 1
+        rvar = 2. * npn * (2.*npn - n) / n**2. / (n-1.)
+        rstd = np.sqrt(rvar)
+        rdemean = self.n_runs - rmean
+        if n >= 50 or not correction:
+            z = rdemean
+        else:
+            if rdemean > 0.5:
+                z = rdemean - 0.5
+            elif rdemean < 0.5:
+                z = rdemean + 0.5
+            else:
+                z = 0.
+
+        z /= rstd
+        pval = 2 * stats.norm.sf(np.abs(z))
+        return z, pval
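+    # Worked numbers for the normal approximation above: with n_pos = n_neg = 8
+    # observations the expected number of runs is 2*64/16 + 1 = 9 and the
+    # variance is 2*64*(2*64 - 16) / 16**2 / 15 = 3.7333..., so e.g. 12 observed
+    # runs give z = (12 - 9) / sqrt(3.7333) before any continuity correction.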

 def runstest_1samp(x, cutoff='mean', correction=True):
-    """use runs test on binary discretized data above/below cutoff
+    '''use runs test on binary discretized data above/below cutoff

     Parameters
     ----------
@@ -106,12 +127,20 @@ def runstest_1samp(x, cutoff='mean', correction=True):
         p-value, reject the null hypothesis if it is below an type 1 error
         level, alpha .

-    """
-    pass
+    '''

+    x = array_like(x, "x")
+    if cutoff == 'mean':
+        cutoff = np.mean(x)
+    elif cutoff == 'median':
+        cutoff = np.median(x)
+    else:
+        cutoff = float(cutoff)
+    xindicator = (x >= cutoff).astype(int)
+    return Runs(xindicator).runs_test(correction=correction)

 def runstest_2samp(x, y=None, groups=None, correction=True):
-    """Wald-Wolfowitz runstest for two samples
+    '''Wald-Wolfowitz runstest for two samples

     This tests whether two samples come from the same distribution.

@@ -176,12 +205,53 @@ def runstest_2samp(x, y=None, groups=None, correction=True):
     Runs
     RunsProb

-    """
-    pass
+    '''
+    x = np.asarray(x)
+    if y is not None:
+        y = np.asarray(y)
+        groups = np.concatenate((np.zeros(len(x)), np.ones(len(y))))
+        # note reassigning x
+        x = np.concatenate((x, y))
+        gruni = np.arange(2)
+    elif groups is not None:
+        gruni = np.unique(groups)
+        if gruni.size != 2:  # pylint: disable=E1103
+            raise ValueError('not exactly two groups specified')
+        #require groups to be numeric ???
+    else:
+        raise ValueError('either y or groups is necessary')
+
+    xargsort = np.argsort(x)
+    #check for ties
+    x_sorted = x[xargsort]
+    x_diff = np.diff(x_sorted)  # used for detecting and handling ties
+    if x_diff.min() == 0:
+        print('ties detected')   #replace with warning
+        x_mindiff = x_diff[x_diff > 0].min()
+        eps = x_mindiff/2.
+        xx = x.copy()  #do not change original, just in case
+
+        xx[groups==gruni[0]] += eps
+        xargsort = np.argsort(xx)
+        xindicator = groups[xargsort]
+        z0, p0 = Runs(xindicator).runs_test(correction=correction)
+
+        xx[groups==gruni[0]] -= eps   #restore xx = x
+        xx[groups==gruni[1]] += eps
+        xargsort = np.argsort(xx)
+        xindicator = groups[xargsort]
+        z1, p1 = Runs(xindicator).runs_test(correction=correction)
+
+        idx = np.argmax([p0,p1])
+        return [z0, z1][idx], [p0, p1][idx]
+
+    else:
+        xindicator = groups[xargsort]
+        return Runs(xindicator).runs_test(correction=correction)


 class TotalRunsProb:
-    """class for the probability distribution of total runs
+    '''class for the probability distribution of total runs

     This is the exact probability distribution for the (Wald-Wolfowitz)
     runs test. The random variable is the total number of runs if the
@@ -203,7 +273,7 @@ class TotalRunsProb:



-    """
+    '''

     def __init__(self, n0, n1):
         self.n0 = n0
@@ -211,9 +281,41 @@ class TotalRunsProb:
         self.n = n = n0 + n1
         self.comball = comb(n, n1)

+    def runs_prob_even(self, r):
+        n0, n1 = self.n0, self.n1
+        tmp0 = comb(n0-1, r//2-1)
+        tmp1 = comb(n1-1, r//2-1)
+        return tmp0 * tmp1 * 2. / self.comball
+
+    def runs_prob_odd(self, r):
+        n0, n1 = self.n0, self.n1
+        k = (r+1)//2
+        tmp0 = comb(n0-1, k-1)
+        tmp1 = comb(n1-1, k-2)
+        tmp3 = comb(n0-1, k-2)
+        tmp4 = comb(n1-1, k-1)
+        return (tmp0 * tmp1 + tmp3 * tmp4)  / self.comball
+
+    def pdf(self, r):
+        r = np.asarray(r)
+        r_isodd = np.mod(r, 2) > 0
+        r_odd = r[r_isodd]
+        r_even = r[~r_isodd]
+        runs_pdf = np.zeros(r.shape)
+        runs_pdf[r_isodd] = self.runs_prob_odd(r_odd)
+        runs_pdf[~r_isodd] = self.runs_prob_even(r_even)
+        return runs_pdf
+
+
+    def cdf(self, r):
+        r_ = np.arange(2,r+1)
+        cdfval = self.runs_prob_even(r_[::2]).sum()
+        cdfval += self.runs_prob_odd(r_[1::2]).sum()
+        return cdfval
+

 class RunsProb:
-    """distribution of success runs of length k or more (classical definition)
+    '''distribution of success runs of length k or more (classical definition)

     The underlying process is assumed to be a sequence of Bernoulli trials
     of a given length n.
@@ -229,10 +331,12 @@ class RunsProb:
     need a MonteCarlo function to do some quick tests before doing more


-    """
+    '''
+
+

     def pdf(self, x, k, n, p):
-        """distribution of success runs of length k or more
+        '''distribution of success runs of length k or more

         Parameters
         ----------
@@ -257,11 +361,19 @@ class RunsProb:
         References
         ----------
         Muselli 1996, theorem 3
-        """
-        pass
+        '''
+
+        q = 1-p
+        m = np.arange(x, (n+1)//(k+1)+1)[:,None]
+        terms = (-1)**(m-x) * comb(m, x) * p**(m*k) * q**(m-1) \
+                * (comb(n - m*k, m - 1) + q * comb(n - m*k, m))
+        return terms.sum(0)

+    def pdf_nb(self, x, k, n, p):
+        pass
+        #y = np.arange(m-1, n-mk+1)

-"""
+'''
 >>> [np.sum([RunsProb().pdf(xi, k, 16, 10/16.) for xi in range(0,16)]) for k in range(16)]
 [0.99999332193894064, 0.99999999999999367, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
 >>> [(np.arange(0,16) * [RunsProb().pdf(xi, k, 16, 10/16.) for xi in range(0,16)]).sum() for k in range(16)]
@@ -274,11 +386,12 @@ array([ 0.63635392,  0.37642045,  0.22194602,  0.13039329,  0.07629395,
 array([ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.16298145,  0.12014061,  0.20563602,
         0.35047531,  0.59509277,  1.00708008,  1.69921875,  2.85926815])
-"""
+'''
+


 def median_test_ksample(x, groups):
-    """chisquare test for equality of median/location
+    '''chisquare test for equality of median/location

     This tests whether all groups have the same fraction of observations
     above the median.
@@ -299,12 +412,36 @@ def median_test_ksample(x, groups):
     others ????
        currently some test output, table and expected

-    """
-    pass
+    '''
+    x = np.asarray(x)
+    gruni = np.unique(groups)
+    xli = [x[groups==group] for group in gruni]
+    xmedian = np.median(x)
+    counts_larger = np.array([(xg > xmedian).sum() for xg in xli])
+    counts = np.array([len(xg) for xg in xli])
+    counts_smaller = counts - counts_larger
+    nobs = counts.sum()
+    n_larger = (x > xmedian).sum()
+    n_smaller = nobs - n_larger
+    table = np.vstack((counts_smaller, counts_larger))
+
+    #the following should be replaced by chisquare_contingency table
+    expected = np.vstack((counts * 1. / nobs * n_smaller,
+                          counts * 1. / nobs * n_larger))
+
+    if (expected < 5).any():
+        print('Warning: There are cells with less than 5 expected '
+              'observations. The chisquare distribution might not be a good '
+              'approximation for the true distribution.')
+
+    #check ddof
+    return stats.chisquare(table.ravel(), expected.ravel(), ddof=1), table, expected
+
+


 def cochrans_q(x):
-    """Cochran's Q test for identical effect of k treatments
+    '''Cochran's Q test for identical effect of k treatments

     Cochran's Q is a k-sample extension of the McNemar test. If there are only
     two treatments, then Cochran's Q test and McNemar test are equivalent.
@@ -338,12 +475,35 @@ def cochrans_q(x):
     https://en.wikipedia.org/wiki/Cochran_test
     SAS Manual for NPAR TESTS

-    """
-    pass
+    '''
+
+    warnings.warn("Deprecated, use stats.cochrans_q instead", FutureWarning)
+
+    x = np.asarray(x)
+    gruni = np.unique(x)
+    N, k = x.shape
+    count_row_success = (x==gruni[-1]).sum(1, float)
+    count_col_success = (x==gruni[-1]).sum(0, float)
+    count_row_ss = count_row_success.sum()
+    count_col_ss = count_col_success.sum()
+    assert count_row_ss == count_col_ss  #just a calculation check
+

+    #this is SAS manual
+    q_stat = (k-1) * (k *  np.sum(count_col_success**2) - count_col_ss**2) \
+             / (k * count_row_ss - np.sum(count_row_success**2))
+
+    #Note: the denominator looks just like k times the variance of the
+    #columns
+
+    #Wikipedia uses a different, but equivalent expression
+##    q_stat = (k-1) * (k *  np.sum(count_row_success**2) - count_row_ss**2) \
+##             / (k * count_col_ss - np.sum(count_col_success**2))
+
+    return q_stat, stats.chi2.sf(q_stat, k-1)
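+# Hand-checked example: for x = [[1,1,0], [1,0,0], [1,1,1]] the column success
+# counts are (3, 2, 1) and the row counts (2, 1, 3), so
+# Q = 2*(3*(9+4+1) - 36) / (3*6 - (4+1+9)) = 3.0, with chi2.sf(3, 2) ~ 0.223.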

 def mcnemar(x, y=None, exact=True, correction=True):
-    """McNemar test
+    '''McNemar test

     Parameters
     ----------
@@ -377,12 +537,36 @@ def mcnemar(x, y=None, exact=True, correction=True):
     This is a special case of Cochran's Q test. The results when the chisquare
     distribution is used are identical, except for continuity correction.

-    """
-    pass
+    '''
+
+    warnings.warn("Deprecated, use stats.TableSymmetry instead", FutureWarning)
+
+    x = np.asarray(x)
+    if y is None and x.shape[0] == x.shape[1]:
+        if x.shape[0] != 2:
+            raise ValueError('table needs to be 2 by 2')
+        n1, n2 = x[1, 0], x[0, 1]
+    else:
+        # I'm not checking here whether x and y are binary,
+        # is not this also paired sign test
+        n1 = np.sum(x < y, 0)
+        n2 = np.sum(x > y, 0)
+
+    if exact:
+        stat = np.minimum(n1, n2)
+        # binom is symmetric with p=0.5
+        pval = stats.binom.cdf(stat, n1 + n2, 0.5) * 2
+        pval = np.minimum(pval, 1)  # limit to 1 if n1==n2
+    else:
+        corr = int(correction) # convert bool to 0 or 1
+        stat = (np.abs(n1 - n2) - corr)**2 / (1. * (n1 + n2))
+        df = 1
+        pval = stats.chi2.sf(stat, df)
+    return stat, pval
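+# Hand-checked example (exact branch): for the 2x2 table [[20, 3], [9, 30]] the
+# discordant counts are n1 = 9 and n2 = 3, so stat = 3 and
+# pval = 2 * binom.cdf(3, 12, 0.5) = 2 * 299/4096 ~ 0.146.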


 def symmetry_bowker(table):
-    """Test for symmetry of a (k, k) square contingency table
+    '''Test for symmetry of a (k, k) square contingency table

     This is an extension of the McNemar test to test the Null hypothesis
     that the contingency table is symmetric around the main diagonal, that is
@@ -419,16 +603,35 @@ def symmetry_bowker(table):
     mcnemar


-    """
-    pass
+    '''
+
+    warnings.warn("Deprecated, use stats.TableSymmetry instead", FutureWarning)
+
+    table = np.asarray(table)
+    k, k2 = table.shape
+    if k != k2:
+        raise ValueError('table needs to be square')
+
+    #low_idx = np.tril_indices(k, -1)  # this does not have Fortran order
+    upp_idx = np.triu_indices(k, 1)
+
+    tril = table.T[upp_idx]   # lower triangle in column order
+    triu = table[upp_idx]     # upper triangle in row order
+
+    stat = ((tril - triu)**2 / (tril + triu + 1e-20)).sum()
+    df = k * (k-1) / 2.
+    pval = stats.chi2.sf(stat, df)
+
+    return stat, pval, df


 if __name__ == '__main__':
+
     x1 = np.array([1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1])
+
     print(Runs(x1).runs_test())
     print(runstest_1samp(x1, cutoff='mean'))
-    print(runstest_2samp(np.arange(16, 0, -1), groups=x1))
-    print(TotalRunsProb(7, 9).cdf(11))
-    print(median_test_ksample(np.random.randn(100), np.random.randint(0, 2,
-        100)))
-    print(cochrans_q(np.random.randint(0, 2, (100, 8))))
+    print(runstest_2samp(np.arange(16,0,-1), groups=x1))
+    print(TotalRunsProb(7,9).cdf(11))
+    print(median_test_ksample(np.random.randn(100), np.random.randint(0,2,100)))
+    print(cochrans_q(np.random.randint(0,2,(100,8))))
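
To make the deprecated mcnemar helper above concrete, a hedged sketch of the exact McNemar test on a 2x2 table; the cross-check against statsmodels.stats.contingency_tables.mcnemar is an assumption about the newer API, not part of this module:

    import numpy as np
    from scipy import stats
    from statsmodels.stats.contingency_tables import mcnemar as mcnemar_new

    table = np.array([[101, 121],
                      [ 59,  33]])
    n1, n2 = table[1, 0], table[0, 1]                 # discordant cell counts
    stat = min(n1, n2)
    pval = min(2 * stats.binom.cdf(stat, n1 + n2, 0.5), 1.0)
    print(stat, pval)
    print(mcnemar_new(table, exact=True).pvalue)      # should agree with pval
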
diff --git a/statsmodels/sandbox/stats/stats_dhuard.py b/statsmodels/sandbox/stats/stats_dhuard.py
index 7a504b9c2..5d31c1f11 100644
--- a/statsmodels/sandbox/stats/stats_dhuard.py
+++ b/statsmodels/sandbox/stats/stats_dhuard.py
@@ -1,4 +1,4 @@
-"""
+'''
 from David Huard's scipy sandbox, also attached to a ticket and
 in the matplotlib-user mailinglist  (links ???)

@@ -79,11 +79,10 @@ Created on Monday, May 03, 2010, 11:47:03 AM
 Author: josef-pktd, parts based on David Huard
 License: BSD

-"""
+'''
 import scipy.interpolate as interpolate
 import numpy as np

-
 def scoreatpercentile(data, percentile):
     """Return the score at the given percentile of the data.

@@ -93,8 +92,10 @@ def scoreatpercentile(data, percentile):

         will return the median of sample `data`.
     """
-    pass
-
+    per = np.array(percentile)
+    cdf = empiricalcdf(data)
+    interpolator = interpolate.interp1d(np.sort(cdf), np.sort(data))
+    return interpolator(per/100.)

 def percentileofscore(data, score):
     """Return the percentile-position of score relative to data.
@@ -110,8 +111,9 @@ def percentileofscore(data, score):

     Raise an error if the score is outside the range of data.
     """
-    pass
-
+    cdf = empiricalcdf(data)
+    interpolator = interpolate.interp1d(np.sort(data), np.sort(cdf))
+    return interpolator(score)*100.

 def empiricalcdf(data, method='Hazen'):
     """Return the empirical cdf.
@@ -126,18 +128,38 @@ def empiricalcdf(data, method='Hazen'):

     Where i goes from 1 to N.
     """
-    pass
+
+    i = np.argsort(np.argsort(data)) + 1.
+    N = len(data)
+    method = method.lower()
+    if method == 'hazen':
+        cdf = (i-0.5)/N
+    elif method == 'weibull':
+        cdf = i/(N+1.)
+    elif method == 'california':
+        cdf = (i-1.)/N
+    elif method == 'chegodayev':
+        cdf = (i-.3)/(N+.4)
+    elif method == 'cunnane':
+        cdf = (i-.4)/(N+.2)
+    elif method == 'gringorten':
+        cdf = (i-.44)/(N+.12)
+    else:
+        raise ValueError('Unknown method. Choose among Weibull, Hazen, '
+                         'Chegodayev, Cunnane, Gringorten and California.')
+
+    return cdf


 class HistDist:
-    """Distribution with piecewise linear cdf, pdf is step function
+    '''Distribution with piecewise linear cdf, pdf is step function

     can be created from empirical distribution or from a histogram (not done yet)

     work in progress, not finished


-    """
+    '''

     def __init__(self, data):
         self.data = np.atleast_1d(data)
@@ -145,12 +167,11 @@ class HistDist:
         sortind = np.argsort(data)
         self._datasorted = data[sortind]
         self.ranking = np.argsort(sortind)
+
         cdf = self.empiricalcdf()
         self._empcdfsorted = np.sort(cdf)
-        self.cdfintp = interpolate.interp1d(self._datasorted, self.
-            _empcdfsorted)
-        self.ppfintp = interpolate.interp1d(self._empcdfsorted, self.
-            _datasorted)
+        self.cdfintp = interpolate.interp1d(self._datasorted, self._empcdfsorted)
+        self.ppfintp = interpolate.interp1d(self._empcdfsorted, self._datasorted)

     def empiricalcdf(self, data=None, method='Hazen'):
         """Return the empirical cdf.
@@ -165,34 +186,77 @@ class HistDist:

         Where i goes from 1 to N.
         """
-        pass
+
+        if data is None:
+            data = self.data
+            i = self.ranking
+        else:
+            i = np.argsort(np.argsort(data)) + 1.
+
+        N = len(data)
+        method = method.lower()
+        if method == 'hazen':
+            cdf = (i-0.5)/N
+        elif method == 'weibull':
+            cdf = i/(N+1.)
+        elif method == 'california':
+            cdf = (i-1.)/N
+        elif method == 'chegodayev':
+            cdf = (i-.3)/(N+.4)
+        elif method == 'cunnane':
+            cdf = (i-.4)/(N+.2)
+        elif method == 'gringorten':
+            cdf = (i-.44)/(N+.12)
+        else:
+            raise ValueError('Unknown method. Choose among Weibull, Hazen, '
+                             'Chegodayev, Cunnane, Gringorten and California.')
+
+        return cdf
+

     def cdf_emp(self, score):
-        """
+        '''
         this is score in dh

-        """
-        pass
+        '''
+        return self.cdfintp(score)
+        #return percentileofscore(self.data, score)

     def ppf_emp(self, quantile):
-        """
+        '''
         this is score in dh

-        """
-        pass
+        '''
+        return self.ppfintp(quantile)
+        #return scoreatpercentile(self.data, quantile*100)

+
+    #from DHuard http://old.nabble.com/matplotlib-f2903.html
     def optimize_binning(self, method='Freedman'):
         """Find the optimal number of bins and update the bin countaccordingly.
         Available methods : Freedman
                             Scott
         """
-        pass

+        nobs = len(self.data)
+        if method=='Freedman':
+            IQR = self.ppf_emp(0.75) - self.ppf_emp(0.25)  # interquartile range (75% - 25%)
+            width = 2* IQR* nobs**(-1./3)
+
+        elif method=='Scott':
+            width = 3.49 * np.std(self.data) * nobs**(-1./3)
+
+        self.nbin = (np.ptp(self.binlimit)/width)
+        return self.nbin

+
+#changes: josef-pktd
 if __name__ == '__main__':
     import matplotlib.pyplot as plt
+
     nobs = 100
     x = np.random.randn(nobs)
+
     examples = [2]
     if 1 in examples:
         empiricalcdf(x)
@@ -201,42 +265,64 @@ if __name__ == '__main__':
         xsupp = np.linspace(x.min(), x.max())
         pos = percentileofscore(x, xsupp)
         plt.plot(xsupp, pos)
-        plt.plot(scoreatpercentile(x, pos), pos + 1)
-        emp = interpolate.InterpolatedUnivariateSpline(np.sort(x), np.sort(
-            empiricalcdf(x)), k=1)
+        #perc = np.linspace(2.5, 97.5)
+        #plt.plot(scoreatpercentile(x, perc), perc)
+        plt.plot(scoreatpercentile(x, pos), pos+1)
+
+
+        #emp = interpolate.PiecewisePolynomial(np.sort(empiricalcdf(x)), np.sort(x))
+        emp=interpolate.InterpolatedUnivariateSpline(np.sort(x),np.sort(empiricalcdf(x)),k=1)
         pdfemp = np.array([emp.derivatives(xi)[1] for xi in xsupp])
         plt.figure()
-        plt.plot(xsupp, pdfemp)
+        plt.plot(xsupp,pdfemp)
         cdf_ongrid = emp(xsupp)
         plt.figure()
         plt.plot(xsupp, cdf_ongrid)
+
+        #get pdf from interpolated cdf on a regular grid
         plt.figure()
-        plt.step(xsupp[:-1], np.diff(cdf_ongrid) / np.diff(xsupp))
+        plt.step(xsupp[:-1],np.diff(cdf_ongrid)/np.diff(xsupp))
+
+        #reduce number of bins/steps
         xsupp2 = np.linspace(x.min(), x.max(), 25)
         plt.figure()
-        plt.step(xsupp2[:-1], np.diff(emp(xsupp2)) / np.diff(xsupp2))
+        plt.step(xsupp2[:-1],np.diff(emp(xsupp2))/np.diff(xsupp2))
+
+        #pdf using 25 original observations, every (nobs/25)th
         xso = np.sort(x)
-        xs = xso[::nobs / 25]
+    xs = xso[::nobs//25]
         plt.figure()
-        plt.step(xs[:-1], np.diff(emp(xs)) / np.diff(xs))
+        plt.step(xs[:-1],np.diff(emp(xs))/np.diff(xs))
+        #lower end looks strange
+
+
     histd = HistDist(x)
     print(histd.optimize_binning())
     print(histd.cdf_emp(histd.binlimit))
     print(histd.ppf_emp([0.25, 0.5, 0.75]))
     print(histd.cdf_emp([-0.5, -0.25, 0, 0.25, 0.5]))
+
+
     xsupp = np.linspace(x.min(), x.max(), 500)
-    emp = interpolate.InterpolatedUnivariateSpline(np.sort(x), np.sort(
-        empiricalcdf(x)), k=1)
+    emp=interpolate.InterpolatedUnivariateSpline(np.sort(x),np.sort(empiricalcdf(x)),k=1)
+    #pdfemp = np.array([emp.derivatives(xi)[1] for xi in xsupp])
+    #plt.figure()
+    #plt.plot(xsupp,pdfemp)
     cdf_ongrid = emp(xsupp)
     plt.figure()
     plt.plot(xsupp, cdf_ongrid)
-    ppfintp = interpolate.InterpolatedUnivariateSpline(cdf_ongrid, xsupp, k=3)
+    ppfintp = interpolate.InterpolatedUnivariateSpline(cdf_ongrid,xsupp,k=3)
+
     ppfs = ppfintp(cdf_ongrid)
     plt.plot(ppfs, cdf_ongrid)
-    ppfemp = interpolate.UnivariateSpline(np.sort(empiricalcdf(x)), np.sort
-        (x), k=3, s=0.03)
+    #ppfemp=interpolate.InterpolatedUnivariateSpline(np.sort(empiricalcdf(x)),np.sort(x),k=3)
+    #Do not use interpolating splines for function approximation
+    #with s=0.03 the spline is monotonic at the evaluated values
+    ppfemp=interpolate.UnivariateSpline(np.sort(empiricalcdf(x)),np.sort(x),k=3, s=0.03)
     ppfe = ppfemp(cdf_ongrid)
     plt.plot(ppfe, cdf_ongrid)
+
     print('negative density')
-    print('(np.diff(ppfs)).min()', np.diff(ppfs).min())
-    print('(np.diff(cdf_ongrid)).min()', np.diff(cdf_ongrid).min())
+    print('(np.diff(ppfs)).min()', (np.diff(ppfs)).min())
+    print('(np.diff(cdf_ongrid)).min()', (np.diff(cdf_ongrid)).min())
+    #plt.show()
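
A small self-contained sketch of the plotting-position formulas implemented in empiricalcdf above (Hazen (i-0.5)/N versus Weibull i/(N+1)); only numpy is assumed:

    import numpy as np

    data = np.array([3.1, 0.2, 1.7, 2.4, 0.9])
    i = np.argsort(np.argsort(data)) + 1.0     # 1-based ranks, same trick as above
    N = len(data)
    hazen = (i - 0.5) / N                      # method='Hazen' (the default)
    weibull = i / (N + 1.0)                    # method='Weibull'
    print(np.column_stack((data, hazen, weibull)))
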
diff --git a/statsmodels/sandbox/stats/stats_mstats_short.py b/statsmodels/sandbox/stats/stats_mstats_short.py
index 0dc645343..a3b7ab53d 100644
--- a/statsmodels/sandbox/stats/stats_mstats_short.py
+++ b/statsmodels/sandbox/stats/stats_mstats_short.py
@@ -1,4 +1,4 @@
-"""get versions of mstats percentile functions that also work with non-masked arrays
+'''get versions of mstats percentile functions that also work with non-masked arrays

 uses dispatch to mstats version for difficult cases:
   - data is masked array
@@ -20,14 +20,23 @@ weighted plotting_positions
 - add weighted quantiles


-"""
+'''
 import numpy as np
 from numpy import ma
 from scipy import stats

+#from numpy.ma import nomask

-def quantiles(a, prob=list([0.25, 0.5, 0.75]), alphap=0.4, betap=0.4, axis=
-    None, limit=(), masknan=False):
+
+
+
+#####--------------------------------------------------------------------------
+#---- --- Percentiles ---
+#####--------------------------------------------------------------------------
+
+
+def quantiles(a, prob=list([.25,.5,.75]), alphap=.4, betap=.4, axis=None,
+               limit=(), masknan=False):
     """
     Computes empirical quantiles for a data array.

@@ -111,17 +120,71 @@ def quantiles(a, prob=list([0.25, 0.5, 0.75]), alphap=0.4, betap=0.4, axis=
       [False False  True]],
            fill_value = 1e+20)
     """
-    pass
-

-def scoreatpercentile(data, per, limit=(), alphap=0.4, betap=0.4, axis=0,
-    masknan=None):
+    if isinstance(a, np.ma.MaskedArray):
+        return stats.mstats.mquantiles(a, prob=prob, alphap=alphap, betap=betap, axis=axis,
+               limit=limit)
+    if limit:
+        marr = stats.mstats.mquantiles(a, prob=prob, alphap=alphap, betap=betap, axis=axis,
+               limit=limit)
+        return ma.filled(marr, fill_value=np.nan)
+    if masknan:
+        nanmask = np.isnan(a)
+        if nanmask.any():
+            marr = ma.array(a, mask=nanmask)
+            marr = stats.mstats.mquantiles(marr, prob=prob, alphap=alphap, betap=betap,
+                              axis=axis, limit=limit)
+            return ma.filled(marr, fill_value=np.nan)
+
+    # Initialization & checks ---------
+    data = np.asarray(a)
+
+    p = np.array(prob, copy=False, ndmin=1)
+    m = alphap + p*(1.-alphap-betap)
+
+    isrolled = False
+    #from _quantiles1d
+    if (axis is None):
+        data = data.ravel()  #reshape(-1,1)
+        axis = 0
+    else:
+        axis = np.arange(data.ndim)[axis]
+        data = np.rollaxis(data, axis)
+        isrolled = True # keep track, maybe can be removed
+
+    x = np.sort(data, axis=0)
+    n = x.shape[0]
+    returnshape = list(data.shape)
+    returnshape[axis] = p
+
+    #TODO: check these
+    if n == 0:
+        return np.empty(len(p), dtype=float)
+    elif n == 1:
+        return np.resize(x, p.shape)
+    aleph = (n*p + m)
+    k = np.floor(aleph.clip(1, n-1)).astype(int)
+    ind = [None]*x.ndim
+    ind[0] = slice(None)
+    gamma = (aleph-k).clip(0,1)[ind]
+    q = (1.-gamma)*x[k-1] + gamma*x[k]
+    if isrolled:
+        return np.rollaxis(q, 0, axis+1)
+    else:
+        return q
+
+def scoreatpercentile(data, per, limit=(), alphap=.4, betap=.4, axis=0, masknan=None):
     """Calculate the score at the given 'per' percentile of the
     sequence a.  For example, the score at per=50 is the median.

     This function is a shortcut to mquantile
     """
-    pass
+    per = np.asarray(per, float)
+    if (per < 0).any() or (per > 100.).any():
+        raise ValueError("The percentile should be between 0. and 100. !"\
+                         " (got %s)" % per)
+    return quantiles(data, prob=[per/100.], alphap=alphap, betap=betap,
+                      limit=limit, axis=axis, masknan=masknan).squeeze()


 def plotting_positions(data, alpha=0.4, beta=0.4, axis=0, masknan=False):
@@ -169,15 +232,43 @@ def plotting_positions(data, alpha=0.4, beta=0.4, axis=0, masknan=False):
     unknown,
     dates to original papers from Beasley, Erickson, Allison 2009 Behav Genet
     """
-    pass
-
+    if isinstance(data, np.ma.MaskedArray):
+        if axis is None or data.ndim == 1:
+            return stats.mstats.plotting_positions(data, alpha=alpha, beta=beta)
+        else:
+            return ma.apply_along_axis(stats.mstats.plotting_positions, axis, data, alpha=alpha, beta=beta)
+    if masknan:
+        nanmask = np.isnan(data)
+        if nanmask.any():
+            marr = ma.array(data, mask=nanmask)
+            #code duplication:
+            if axis is None or data.ndim == 1:
+                marr = stats.mstats.plotting_positions(marr, alpha=alpha, beta=beta)
+            else:
+                marr = ma.apply_along_axis(stats.mstats.plotting_positions, axis, marr, alpha=alpha, beta=beta)
+            return ma.filled(marr, fill_value=np.nan)
+
+    data = np.asarray(data)
+    if data.size == 1:    # use helper function instead
+        data = np.atleast_1d(data)
+        axis = 0
+    if axis is None:
+        data = data.ravel()
+        axis = 0
+    n = data.shape[axis]
+    if data.ndim == 1:
+        plpos = np.empty(data.shape, dtype=float)
+        plpos[data.argsort()] = (np.arange(1,n+1) - alpha)/(n+1.-alpha-beta)
+    else:
+        #nd assignment instead of second argsort does not look easy
+        plpos = (data.argsort(axis).argsort(axis) + 1. - alpha)/(n+1.-alpha-beta)
+    return plpos

 meppf = plotting_positions

-
-def plotting_positions_w1d(data, weights=None, alpha=0.4, beta=0.4, method=
-    'notnormed'):
-    """Weighted plotting positions (or empirical percentile points) for the data.
+def plotting_positions_w1d(data, weights=None, alpha=0.4, beta=0.4,
+                           method='notnormed'):
+    '''Weighted plotting positions (or empirical percentile points) for the data.

     observations are weighted and the plotting positions are defined as
     (ws-alpha)/(n-alpha-beta), where:
@@ -195,79 +286,95 @@ def plotting_positions_w1d(data, weights=None, alpha=0.4, beta=0.4, method=
     --------
     plotting_positions : unweighted version that works also with more than one
         dimension and has other options
-    """
-    pass
-
-
-def edf_normal_inverse_transformed(x, alpha=3.0 / 8, beta=3.0 / 8, axis=0):
-    """rank based normal inverse transformed cdf
-    """
-    pass
-
+    '''
+
+    x = np.atleast_1d(data)
+    if x.ndim > 1:
+        raise ValueError('currently implemented only for 1d')
+    if weights is None:
+        weights = np.ones(x.shape)
+    else:
+        weights = np.array(weights, float, copy=False, ndmin=1) #atleast_1d(weights)
+        if weights.shape != x.shape:
+            raise ValueError('if weights is given, it needs to be the same '
+                             'shape as data')
+    n = len(x)
+    xargsort = x.argsort()
+    ws = weights[xargsort].cumsum()
+    res = np.empty(x.shape)
+    if method == 'normed':
+        res[xargsort] = (1.*ws/ws[-1]*n-alpha)/(n+1.-alpha-beta)
+    else:
+        res[xargsort] = (1.*ws-alpha)/(ws[-1]+1.-alpha-beta)
+    return res
+
+def edf_normal_inverse_transformed(x, alpha=3./8, beta=3./8, axis=0):
+    '''rank based normal inverse transformed cdf
+    '''
+    from scipy import stats
+    ranks = plotting_positions(x, alpha=alpha, beta=alpha, axis=0, masknan=False)
+    ranks_transf = stats.norm.ppf(ranks)
+    return ranks_transf

 if __name__ == '__main__':
+
     x = np.arange(5)
     print(plotting_positions(x))
-    x = np.arange(10).reshape(-1, 2)
+    x = np.arange(10).reshape(-1,2)
     print(plotting_positions(x))
     print(quantiles(x, axis=0))
     print(quantiles(x, axis=None))
     print(quantiles(x, axis=1))
     xm = ma.array(x)
     x2 = x.astype(float)
-    x2[1, 0] = np.nan
+    x2[1,0] = np.nan
     print(plotting_positions(xm, axis=0))
+
+    # test 0d, 1d
     for sl1 in [slice(None), 0]:
-        print((plotting_positions(xm[sl1, 0]) == plotting_positions(x[sl1, 
-            0])).all())
-        print((quantiles(xm[sl1, 0]) == quantiles(x[sl1, 0])).all())
-        print((stats.mstats.mquantiles(ma.fix_invalid(x2[sl1, 0])) ==
-            quantiles(x2[sl1, 0], masknan=1)).all())
+        print((plotting_positions(xm[sl1,0]) == plotting_positions(x[sl1,0])).all())
+        print((quantiles(xm[sl1,0]) == quantiles(x[sl1,0])).all())
+        print((stats.mstats.mquantiles(ma.fix_invalid(x2[sl1,0])) == quantiles(x2[sl1,0], masknan=1)).all())
+
+    #test 2d
     for ax in [0, 1, None, -1]:
-        print((plotting_positions(xm, axis=ax) == plotting_positions(x,
-            axis=ax)).all())
+        print((plotting_positions(xm, axis=ax) == plotting_positions(x, axis=ax)).all())
         print((quantiles(xm, axis=ax) == quantiles(x, axis=ax)).all())
-        print((stats.mstats.mquantiles(ma.fix_invalid(x2), axis=ax) ==
-            quantiles(x2, axis=ax, masknan=1)).all())
-    print((stats.mstats.plotting_positions(ma.fix_invalid(x2)) ==
-        plotting_positions(x2, axis=None, masknan=1)).all())
-    x3 = np.dstack((x, x)).T
-    for ax in [1, 2]:
-        print((plotting_positions(x3, axis=ax)[0] == plotting_positions(x.T,
-            axis=ax - 1)).all())
-    np.testing.assert_equal(plotting_positions(np.arange(10), alpha=0.35,
-        beta=1 - 0.35), (1 + np.arange(10) - 0.35) / 10)
-    np.testing.assert_equal(plotting_positions(np.arange(10), alpha=0.4,
-        beta=0.4), (1 + np.arange(10) - 0.4) / (10 + 0.2))
-    np.testing.assert_equal(plotting_positions(np.arange(10)), (1 + np.
-        arange(10) - 0.4) / (10 + 0.2))
+        print((stats.mstats.mquantiles(ma.fix_invalid(x2), axis=ax) == quantiles(x2, axis=ax, masknan=1)).all())
+
+    #stats version does not have axis
+    print((stats.mstats.plotting_positions(ma.fix_invalid(x2)) == plotting_positions(x2, axis=None, masknan=1)).all())
+
+    #test 3d
+    x3 = np.dstack((x,x)).T
+    for ax in [1,2]:
+        print((plotting_positions(x3, axis=ax)[0] == plotting_positions(x.T, axis=ax-1)).all())
+
+    np.testing.assert_equal(plotting_positions(np.arange(10), alpha=0.35, beta=1-0.35), (1+np.arange(10)-0.35)/10)
+    np.testing.assert_equal(plotting_positions(np.arange(10), alpha=0.4, beta=0.4), (1+np.arange(10)-0.4)/(10+0.2))
+    np.testing.assert_equal(plotting_positions(np.arange(10)), (1+np.arange(10)-0.4)/(10+0.2))
     print('')
-    print(scoreatpercentile(x, [10, 90]))
-    print(plotting_positions_w1d(x[:, 0]))
-    print((plotting_positions_w1d(x[:, 0]) == plotting_positions(x[:, 0])).
-        all())
+    print(scoreatpercentile(x, [10,90]))
+    print(plotting_positions_w1d(x[:,0]))
+    print((plotting_positions_w1d(x[:,0]) == plotting_positions(x[:,0])).all())
+
+
+    # weights versus replicating multiple occurrences of the same x value
     w1 = [1, 1, 2, 1, 1]
     plotexample = 1
     if plotexample:
         import matplotlib.pyplot as plt
         plt.figure()
         plt.title('ppf, cdf values on horizontal axis')
-        plt.step(plotting_positions_w1d(x[:, 0], weights=w1, method='0'), x
-            [:, 0], where='post')
-        plt.step(stats.mstats.plotting_positions(np.repeat(x[:, 0], w1,
-            axis=0)), np.repeat(x[:, 0], w1, axis=0), where='post')
-        plt.plot(plotting_positions_w1d(x[:, 0], weights=w1, method='0'), x
-            [:, 0], '-o')
-        plt.plot(stats.mstats.plotting_positions(np.repeat(x[:, 0], w1,
-            axis=0)), np.repeat(x[:, 0], w1, axis=0), '-o')
+        plt.step(plotting_positions_w1d(x[:,0], weights=w1, method='0'), x[:,0], where='post')
+        plt.step(stats.mstats.plotting_positions(np.repeat(x[:,0],w1,axis=0)),np.repeat(x[:,0],w1,axis=0),where='post')
+        plt.plot(plotting_positions_w1d(x[:,0], weights=w1, method='0'), x[:,0], '-o')
+        plt.plot(stats.mstats.plotting_positions(np.repeat(x[:,0],w1,axis=0)),np.repeat(x[:,0],w1,axis=0), '-o')
+
         plt.figure()
         plt.title('cdf, cdf values on vertical axis')
-        plt.step(x[:, 0], plotting_positions_w1d(x[:, 0], weights=w1,
-            method='0'), where='post')
-        plt.step(np.repeat(x[:, 0], w1, axis=0), stats.mstats.
-            plotting_positions(np.repeat(x[:, 0], w1, axis=0)), where='post')
-        plt.plot(x[:, 0], plotting_positions_w1d(x[:, 0], weights=w1,
-            method='0'), '-o')
-        plt.plot(np.repeat(x[:, 0], w1, axis=0), stats.mstats.
-            plotting_positions(np.repeat(x[:, 0], w1, axis=0)), '-o')
+        plt.step(x[:,0], plotting_positions_w1d(x[:,0], weights=w1, method='0'),where='post')
+        plt.step(np.repeat(x[:,0],w1,axis=0), stats.mstats.plotting_positions(np.repeat(x[:,0],w1,axis=0)),where='post')
+        plt.plot(x[:,0], plotting_positions_w1d(x[:,0], weights=w1, method='0'), '-o')
+        plt.plot(np.repeat(x[:,0],w1,axis=0), stats.mstats.plotting_positions(np.repeat(x[:,0],w1,axis=0)), '-o')
     plt.show()
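
For orientation, the reference behavior the quantiles() helper above is written to match for plain ndarrays: scipy.stats.mstats.mquantiles with the same alphap/betap. This is only a sketch of the intended equivalence; whether the sandbox helper runs unchanged on current numpy is not asserted here.

    import numpy as np
    from scipy import stats

    x = np.arange(10, dtype=float).reshape(-1, 2)
    prob = [0.25, 0.5, 0.75]
    # reference values; the module's quantiles(x, prob=prob, axis=0) is intended
    # to reproduce these for unmasked, NaN-free input
    print(stats.mstats.mquantiles(x, prob=prob, alphap=0.4, betap=0.4, axis=0))
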
diff --git a/statsmodels/sandbox/sysreg.py b/statsmodels/sandbox/sysreg.py
index 21d11fdd1..363d4bb41 100644
--- a/statsmodels/sandbox/sysreg.py
+++ b/statsmodels/sandbox/sysreg.py
@@ -2,9 +2,19 @@ from statsmodels.regression.linear_model import GLS
 import numpy as np
 from statsmodels.base.model import LikelihoodModelResults
 from scipy import sparse
+
+# http://www.irisa.fr/aladin/wg-statlin/WORKSHOPS/RENNES02/SLIDES/Foschi.pdf
+
 __all__ = ['SUR', 'Sem2SLS']


+#probably should have a SystemModel superclass
+# TODO: does it make sense of SUR equations to have
+# independent endogenous regressors?  If so, then
+# change docs to LHS = RHS
+#TODO: make a dictionary that holds equation specific information
+#rather than these cryptic lists?  Slower to get a dict value?
+#TODO: refine sigma definition
 class SUR:
     """
     Seemingly Unrelated Regression
@@ -78,55 +88,98 @@ class SUR:
     ----------
     Zellner (1962), Greene (2003)
     """
-
+#TODO: Does each equation need nobs to be the same?
     def __init__(self, sys, sigma=None, dfk=None):
         if len(sys) % 2 != 0:
-            raise ValueError(
-                'sys must be a list of pairs of endogenous and exogenous variables.  Got length %s'
-                 % len(sys))
+            raise ValueError("sys must be a list of pairs of endogenous and \
+exogenous variables.  Got length %s" % len(sys))
         if dfk:
-            if not dfk.lower() in ['dfk1', 'dfk2']:
-                raise ValueError('dfk option %s not understood' % dfk)
+            if not dfk.lower() in ['dfk1','dfk2']:
+                raise ValueError("dfk option %s not understood" % (dfk))
         self._dfk = dfk
         M = len(sys[1::2])
         self._M = M
+#        exog = np.zeros((M,M), dtype=object)
+#        for i,eq in enumerate(sys[1::2]):
+#            exog[i,i] = np.asarray(eq)  # not sure this exog is needed
+                                        # used to compute resids for now
         exog = np.column_stack(np.asarray(sys[1::2][i]) for i in range(M))
-        self.exog = exog
+#       exog = np.vstack(np.asarray(sys[1::2][i]) for i in range(M))
+        self.exog = exog # 2d ndarray exog is better
+# Endog, might just go ahead and reshape this?
         endog = np.asarray(sys[::2])
         self.endog = endog
-        self.nobs = float(self.endog[0].shape[0])
+        self.nobs = float(self.endog[0].shape[0]) # assumes all the same length
+
+        # Degrees of Freedom
         df_resid = []
         df_model = []
-        [df_resid.append(self.nobs - np.linalg.matrix_rank(_)) for _ in sys
-            [1::2]]
+        [df_resid.append(self.nobs - np.linalg.matrix_rank(_)) for _ in sys[1::2]]
         [df_model.append(np.linalg.matrix_rank(_) - 1) for _ in sys[1::2]]
         self.df_resid = np.asarray(df_resid)
         self.df_model = np.asarray(df_model)
-        sp_exog = sparse.lil_matrix((int(self.nobs * M), int(np.sum(self.
-            df_model + 1))))
-        self._cols = np.cumsum(np.hstack((0, self.df_model + 1)))
+
+# "Block-diagonal" sparse matrix of exog
+        sp_exog = sparse.lil_matrix((int(self.nobs*M),
+            int(np.sum(self.df_model+1)))) # linked lists to build
+        self._cols = np.cumsum(np.hstack((0, self.df_model+1)))
         for i in range(M):
-            sp_exog[i * self.nobs:(i + 1) * self.nobs, self._cols[i]:self.
-                _cols[i + 1]] = sys[1::2][i]
-        self.sp_exog = sp_exog.tocsr()
+            sp_exog[i*self.nobs:(i+1)*self.nobs,
+                    self._cols[i]:self._cols[i+1]] = sys[1::2][i]
+        self.sp_exog = sp_exog.tocsr() # cast to compressed for efficiency
+# Deal with sigma, check shape earlier if given
         if np.any(sigma):
-            sigma = np.asarray(sigma)
+            sigma = np.asarray(sigma) # check shape
         elif sigma is None:
             resids = []
             for i in range(M):
-                resids.append(GLS(endog[i], exog[:, self._cols[i]:self.
-                    _cols[i + 1]]).fit().resid)
-            resids = np.asarray(resids).reshape(M, -1)
+                resids.append(GLS(endog[i],exog[:,
+                    self._cols[i]:self._cols[i+1]]).fit().resid)
+            resids = np.asarray(resids).reshape(M,-1)
             sigma = self._compute_sigma(resids)
         self.sigma = sigma
-        self.cholsigmainv = np.linalg.cholesky(np.linalg.pinv(self.sigma)).T
+        self.cholsigmainv = np.linalg.cholesky(np.linalg.pinv(\
+                    self.sigma)).T
         self.initialize()

+    def initialize(self):
+        self.wendog = self.whiten(self.endog)
+        self.wexog = self.whiten(self.sp_exog)
+        self.pinv_wexog = np.linalg.pinv(self.wexog)
+        self.normalized_cov_params = np.dot(self.pinv_wexog,
+                np.transpose(self.pinv_wexog))
+        self.history = {'params' : [np.inf]}
+        self.iterations = 0
+
+    def _update_history(self, params):
+        self.history['params'].append(params)
+
     def _compute_sigma(self, resids):
         """
         Computes the sigma matrix and update the cholesky decomposition.
         """
-        pass
+        M = self._M
+        nobs = self.nobs
+        sig = np.dot(resids, resids.T)  # faster way to do this?
+        if not self._dfk:
+            div = nobs
+        elif self._dfk.lower() == 'dfk1':
+            div = np.zeros((M, M))
+            for i in range(M):
+                for j in range(M):
+                    div[i, j] = ((self.df_model[i] + 1) *
+                                 (self.df_model[j] + 1))**0.5
+        else:  # 'dfk2', error checking is done earlier
+            div = np.zeros((M, M))
+            for i in range(M):
+                for j in range(M):
+                    div[i, j] = nobs - max(self.df_model[i] + 1,
+                                           self.df_model[j] + 1)
+# does not handle (#,)
+        self.cholsigmainv = np.linalg.cholesky(np.linalg.pinv(sig/div)).T
+        return sig/div

     def whiten(self, X):
         """
@@ -144,9 +197,15 @@ class SUR:

         If X is the endogenous LHS of the system.
         """
-        pass
+        nobs = self.nobs
+        if X is self.endog: # definitely not a robust check
+            return np.dot(np.kron(self.cholsigmainv,np.eye(nobs)),
+                X.reshape(-1,1))
+        elif X is self.sp_exog:
+            return (sparse.kron(self.cholsigmainv,
+                sparse.eye(nobs,nobs))*X).toarray()#*=dot until cast to array

-    def fit(self, igls=False, tol=1e-05, maxiter=100):
+    def fit(self, igls=False, tol=1e-5, maxiter=100):
         """
         igls : bool
             Iterate until estimates converge if sigma is None instead of
@@ -162,9 +221,40 @@ class SUR:
         diagonal structure. It should work for ill-conditioned `sigma`
         but this is untested.
         """
-        pass

+        if not np.any(self.sigma):
+            self.sigma = self._compute_sigma(self.endog, self.exog)
+        M = self._M
+        beta = np.dot(self.pinv_wexog, self.wendog)
+        self._update_history(beta)
+        self.iterations += 1
+        if not igls:
+            sur_fit = SysResults(self, beta, self.normalized_cov_params)
+            return sur_fit
+
+        conv = self.history['params']
+        while igls and (np.any(np.abs(conv[-2] - conv[-1]) > tol)) and \
+                (self.iterations < maxiter):
+            fittedvalues = (self.sp_exog*beta).reshape(M,-1)
+            resids = self.endog - fittedvalues # do not attach results yet
+            self.sigma = self._compute_sigma(resids) # need to attach for compute?
+            self.wendog = self.whiten(self.endog)
+            self.wexog = self.whiten(self.sp_exog)
+            self.pinv_wexog = np.linalg.pinv(self.wexog)
+            self.normalized_cov_params = np.dot(self.pinv_wexog,
+                    np.transpose(self.pinv_wexog))
+            beta = np.dot(self.pinv_wexog, self.wendog)
+            self._update_history(beta)
+            self.iterations += 1
+        sur_fit = SysResults(self, beta, self.normalized_cov_params)
+        return sur_fit
+
+    def predict(self, design):
+        pass

+#TODO: Should just have a general 2SLS estimator to subclass
+# for IV, FGLS, etc.
+# Also should probably have SEM class and estimators as subclasses
 class Sem2SLS:
     """
     Two-Stage Least Squares for Simultaneous equations
@@ -189,54 +279,98 @@ class Sem2SLS:
     Estimation is done by brute force and there is no exploitation of
     the structure of the system.
     """
-
     def __init__(self, sys, indep_endog=None, instruments=None):
         if len(sys) % 2 != 0:
-            raise ValueError(
-                'sys must be a list of pairs of endogenous and exogenous variables.  Got length %s'
-                 % len(sys))
+            raise ValueError("sys must be a list of pairs of endogenous and \
+exogenous variables.  Got length %s" % len(sys))
         M = len(sys[1::2])
         self._M = M
-        self.endog = sys[::2]
+# The lists are probably a bad idea
+        self.endog = sys[::2]   # these are just list containers
         self.exog = sys[1::2]
         self._K = [np.linalg.matrix_rank(_) for _ in sys[1::2]]
+#        fullexog = np.column_stack((_ for _ in self.exog))
+
         self.instruments = instruments
+
+        # Keep the Y_j's in a container to get IVs
         instr_endog = {}
-        [instr_endog.setdefault(_, []) for _ in indep_endog.keys()]
+        [instr_endog.setdefault(_,[]) for _ in indep_endog.keys()]
+
         for eq_key in indep_endog:
             for varcol in indep_endog[eq_key]:
-                instr_endog[eq_key].append(self.exog[eq_key][:, varcol])
+                instr_endog[eq_key].append(self.exog[eq_key][:,varcol])
+                # ^ copy needed?
+#        self._instr_endog = instr_endog
+
         self._indep_endog = indep_endog
-        _col_map = np.cumsum(np.hstack((0, self._K)))
+        _col_map = np.cumsum(np.hstack((0,self._K))) # starting col no.s
+# move this check to whiten since we're not going to build a full exog?
         for eq_key in indep_endog:
             try:
                 iter(indep_endog[eq_key])
             except:
-                raise TypeError(
-                    'The values of the indep_exog dict must be iterable. Got type %s for converter %s'
-                     % (type(indep_endog[eq_key]), eq_key))
+#                eq_key = [eq_key]
+                raise TypeError("The values of the indep_exog dict must be "
+                                "iterable. Got type %s for converter %s"
+                                % (type(indep_endog[eq_key]), eq_key))
+#            for del_col in indep_endog[eq_key]:
+#                fullexog = np.delete(fullexog,  _col_map[eq_key]+del_col, 1)
+#                _col_map[eq_key+1:] -= 1
+
+# Josef's example for deleting recurring "rows"
+#        fullexog = np.unique(fullexog.T.view([('',fullexog.dtype)]*\
+#                fullexog.shape[0])).view(fullexog.dtype).reshape(\
+#                fullexog.shape[0],-1)
+# From http://article.gmane.org/gmane.comp.python.numeric.general/32276/
+# Or Jouni's suggestion of taking a hash:
+# http://www.mail-archive.com/numpy-discussion@scipy.org/msg04209.html
+# not clear to me how this would work though, only if they are the *same*
+# elements?
+#        self.fullexog = fullexog
         self.wexog = self.whiten(instr_endog)

+
     def whiten(self, Y):
         """
         Runs the first stage of the 2SLS.

         Returns the RHS variables that include the instruments.
         """
-        pass
+        wexog = []
+        indep_endog = self._indep_endog # this has the col mapping
+#        fullexog = self.fullexog
+        instruments = self.instruments
+        for eq in range(self._M): # need to go through all equations regardless
+            instr_eq = Y.get(eq, None) # Y has the eq to ind endog array map
+            newRHS = self.exog[eq].copy()
+            if instr_eq:
+                for i,LHS in enumerate(instr_eq):
+                    yhat = GLS(LHS, self.instruments).fit().fittedvalues
+                    newRHS[:,indep_endog[eq][i]] = yhat
+                # this might fail if there is a one variable column (nobs,)
+                # in exog
+            wexog.append(newRHS)
+        return wexog

     def fit(self):
         """
         """
-        pass
-
+        delta = []
+        wexog = self.wexog
+        endog = self.endog
+        for j in range(self._M):
+            delta.append(GLS(endog[j], wexog[j]).fit().params)
+        return delta

 class SysResults(LikelihoodModelResults):
     """
     Not implemented yet.
     """
-
-    def __init__(self, model, params, normalized_cov_params=None, scale=1.0):
+    def __init__(self, model, params, normalized_cov_params=None, scale=1.):
         super(SysResults, self).__init__(model, params,
-            normalized_cov_params, scale)
+                normalized_cov_params, scale)
         self._get_results()
+
+    def _get_results(self):
+        pass
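
The core of SUR.whiten above is premultiplication by kron(chol(Sigma^-1), I). A toy numpy-only sketch of that step; the dimensions and data here are made up for illustration:

    import numpy as np

    nobs, M = 5, 2
    rng = np.random.default_rng(0)
    sigma = np.array([[1.0, 0.3],
                      [0.3, 2.0]])                       # cross-equation covariance
    cholsigmainv = np.linalg.cholesky(np.linalg.pinv(sigma)).T
    endog = rng.normal(size=(M, nobs))                   # M stacked equations
    wendog = np.kron(cholsigmainv, np.eye(nobs)) @ endog.reshape(-1, 1)
    print(wendog.shape)                                  # (M*nobs, 1): whitened, stacked LHS
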
diff --git a/statsmodels/sandbox/tools/cross_val.py b/statsmodels/sandbox/tools/cross_val.py
index 517cba0dc..3c0e0b887 100644
--- a/statsmodels/sandbox/tools/cross_val.py
+++ b/statsmodels/sandbox/tools/cross_val.py
@@ -12,11 +12,13 @@ changes to code by josef-pktd:
  - docstring formatting: underlines of headers

 """
+
 from statsmodels.compat.python import lrange
 import numpy as np
 from itertools import combinations


+################################################################################
 class LeaveOneOut:
     """
     Leave-One-Out cross validation iterator:
@@ -50,19 +52,25 @@ class LeaveOneOut:
         """
         self.n = n

+
     def __iter__(self):
         n = self.n
         for i in range(n):
-            test_index = np.zeros(n, dtype=bool)
+            test_index  = np.zeros(n, dtype=bool)
             test_index[i] = True
             train_index = np.logical_not(test_index)
             yield train_index, test_index

+
     def __repr__(self):
-        return '%s.%s(n=%i)' % (self.__class__.__module__, self.__class__.
-            __name__, self.n)
+        return '%s.%s(n=%i)' % (self.__class__.__module__,
+                                self.__class__.__name__,
+                                self.n,
+                                )
+


+################################################################################
 class LeavePOut:
     """
     Leave-P-Out cross validation iterator:
@@ -100,6 +108,7 @@ class LeavePOut:
         self.n = n
         self.p = p

+
     def __iter__(self):
         n = self.n
         p = self.p
@@ -110,11 +119,17 @@ class LeavePOut:
             train_index = np.logical_not(test_index)
             yield train_index, test_index

+
     def __repr__(self):
-        return '%s.%s(n=%i, p=%i)' % (self.__class__.__module__, self.
-            __class__.__name__, self.n, self.p)
+        return '%s.%s(n=%i, p=%i)' % (
+                                self.__class__.__module__,
+                                self.__class__.__name__,
+                                self.n,
+                                self.p,
+                                )


+################################################################################
 class KFold:
     """
     K-Folds cross validation iterator:
@@ -149,29 +164,37 @@ class KFold:
         -----
         All the folds have size trunc(n/k); the last one holds the remainder.
         """
-        assert k > 0, ValueError('cannot have k below 1')
-        assert k < n, ValueError('cannot have k=%d greater than %d' % (k, n))
+        assert k>0, ValueError('cannot have k below 1')
+        assert k<n, ValueError('cannot have k=%d greater than %d'% (k, n))
         self.n = n
         self.k = k

+
     def __iter__(self):
         n = self.n
         k = self.k
-        j = int(np.ceil(n / k))
+        j = int(np.ceil(n/k))
+
         for i in range(k):
-            test_index = np.zeros(n, dtype=bool)
-            if i < k - 1:
-                test_index[i * j:(i + 1) * j] = True
+            test_index  = np.zeros(n, dtype=bool)
+            if i<k-1:
+                test_index[i*j:(i+1)*j] = True
             else:
-                test_index[i * j:] = True
+                test_index[i*j:] = True
             train_index = np.logical_not(test_index)
             yield train_index, test_index

+
     def __repr__(self):
-        return '%s.%s(n=%i, k=%i)' % (self.__class__.__module__, self.
-            __class__.__name__, self.n, self.k)
+        return '%s.%s(n=%i, k=%i)' % (
+                                self.__class__.__module__,
+                                self.__class__.__name__,
+                                self.n,
+                                self.k,
+                                )


+################################################################################
 class LeaveOneLabelOut:
     """
     Leave-One-Label_Out cross-validation iterator:
@@ -197,7 +220,8 @@ class LeaveOneLabelOut:
         >>> lol = cross_val.LeaveOneLabelOut(labels)
         >>> for train_index, test_index in lol:
         ...    print "TRAIN:", train_index, "TEST:", test_index
-        ...    X_train, X_test, y_train, y_test = cross_val.split(train_index,             test_index, X, y)
+        ...    X_train, X_test, y_train, y_test = cross_val.split(train_index, \
+            test_index, X, y)
         ...    print X_train, X_test, y_train, y_test
         TRAIN: [False False  True  True] TEST: [ True  True False False]
         [[5 6]
@@ -210,17 +234,23 @@ class LeaveOneLabelOut:
         """
         self.labels = labels

+
     def __iter__(self):
+        # We make a copy here to avoid side-effects during iteration
         labels = np.array(self.labels, copy=True)
         for i in np.unique(labels):
-            test_index = np.zeros(len(labels), dtype=bool)
-            test_index[labels == i] = True
+            test_index  = np.zeros(len(labels), dtype=bool)
+            test_index[labels==i] = True
             train_index = np.logical_not(test_index)
             yield train_index, test_index

+
     def __repr__(self):
-        return '%s.%s(labels=%s)' % (self.__class__.__module__, self.
-            __class__.__name__, self.labels)
+        return '%s.%s(labels=%s)' % (
+                                self.__class__.__module__,
+                                self.__class__.__name__,
+                                self.labels,
+                                )


 def split(train_indexes, test_indexes, *args):
@@ -228,17 +258,26 @@ def split(train_indexes, test_indexes, *args):
     For each arg return a train and test subsets defined by indexes provided
     in train_indexes and test_indexes
     """
-    pass
-
-
-"""
+    ret = []
+    for arg in args:
+        arg = np.asanyarray(arg)
+        arg_train = arg[train_indexes]
+        arg_test  = arg[test_indexes]
+        ret.append(arg_train)
+        ret.append(arg_test)
+    return ret
+
+'''
  >>> cv = cross_val.LeaveOneLabelOut(X, y) # y making y optional and
 possible to add other arrays of the same shape[0] too
  >>> for X_train, y_train, X_test, y_test in cv:
  ...      print np.sqrt((model.fit(X_train, y_train).predict(X_test)
 - y_test) ** 2).mean())
-"""
+'''
+

+################################################################################
+#below: Author: josef-pktd

 class KStepAhead:
     """
@@ -288,34 +327,41 @@ class KStepAhead:
         self.n = n
         self.k = k
         if start is None:
-            start = int(np.trunc(n * 0.25))
+            start = int(np.trunc(n*0.25)) # pick something arbitrary
         self.start = start
         self.kall = kall
         self.return_slice = return_slice

+
     def __iter__(self):
         n = self.n
         k = self.k
         start = self.start
         if self.return_slice:
-            for i in range(start, n - k):
+            for i in range(start, n-k):
                 train_slice = slice(None, i, None)
                 if self.kall:
-                    test_slice = slice(i, i + k)
+                    test_slice = slice(i, i+k)
                 else:
-                    test_slice = slice(i + k - 1, i + k)
+                    test_slice = slice(i+k-1, i+k)
                 yield train_slice, test_slice
-        else:
-            for i in range(start, n - k):
-                train_index = np.zeros(n, dtype=bool)
+
+        else: #for compatibility with other iterators
+            for i in range(start, n-k):
+                train_index  = np.zeros(n, dtype=bool)
                 train_index[:i] = True
-                test_index = np.zeros(n, dtype=bool)
+                test_index  = np.zeros(n, dtype=bool)
                 if self.kall:
-                    test_index[i:i + k] = True
+                    test_index[i:i+k] = True # np.logical_not(test_index)
                 else:
-                    test_index[i + k - 1:i + k] = True
+                    test_index[i+k-1:i+k] = True
+                #or faster to return np.arange(i,i+k) ?
+                #returning slice should be faster in this case
                 yield train_index, test_index

+
     def __repr__(self):
-        return '%s.%s(n=%i)' % (self.__class__.__module__, self.__class__.
-            __name__, self.n)
+        return '%s.%s(n=%i)' % (self.__class__.__module__,
+                                self.__class__.__name__,
+                                self.n,
+                                )
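
A short usage sketch for the iterators above; the import path and the KStepAhead keyword names follow the constructor bodies shown here and are otherwise assumptions:

    import numpy as np
    from statsmodels.sandbox.tools.cross_val import KFold, KStepAhead

    X = np.arange(20).reshape(10, 2)
    y = np.arange(10)
    for train_index, test_index in KFold(10, 5):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        # fit on (X_train, y_train), score on (X_test, y_test)

    for train_slice, test_slice in KStepAhead(10, k=2, return_slice=True):
        X_train, X_test = X[train_slice], X[test_slice]  # expanding window, 2-step-ahead test
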
diff --git a/statsmodels/sandbox/tools/mctools.py b/statsmodels/sandbox/tools/mctools.py
index ae51d13ec..b3026b013 100644
--- a/statsmodels/sandbox/tools/mctools.py
+++ b/statsmodels/sandbox/tools/mctools.py
@@ -1,4 +1,4 @@
-"""Helper class for Monte Carlo Studies for (currently) statistical tests
+'''Helper class for Monte Carlo Studies for (currently) statistical tests

 Most of it should also be usable for Bootstrap, and for MC for estimators.
 Takes the sample generator, dgp, and the statistical results, statistic,
@@ -22,12 +22,13 @@ variables. Joint distribution is not used (yet).
 I guess this is currently only for one sided test statistics, e.g. for
 two-sided tests based on the t or normal distribution use the absolute value.

-"""
+'''
 from statsmodels.compat.python import lrange
 import numpy as np
-from statsmodels.iolib.table import SimpleTable

+from statsmodels.iolib.table import SimpleTable

+#copied from stattools
 class StatTestMC:
     """class to run Monte Carlo study on a statistical test'''

@@ -97,11 +98,11 @@ class StatTestMC:
     """

     def __init__(self, dgp, statistic):
-        self.dgp = dgp
-        self.statistic = statistic
+        self.dgp = dgp #staticmethod(dgp)  #no self
+        self.statistic = statistic # staticmethod(statistic)  #no self

     def run(self, nrepl, statindices=None, dgpargs=[], statsargs=[]):
-        """run the actual Monte Carlo and save results
+        '''run the actual Monte Carlo and save results

         Parameters
         ----------
@@ -122,11 +123,43 @@ class StatTestMC:
         None, all results are attached


-        """
-        pass
+        '''
+        self.nrepl = nrepl
+        self.statindices = statindices
+        self.dgpargs = dgpargs
+        self.statsargs = statsargs
+
+        dgp = self.dgp
+        statfun = self.statistic # name ?
+        #introspect len of return of statfun,
+        #possible problems with ndim>1, check ValueError
+        mcres0 = statfun(dgp(*dgpargs), *statsargs)
+        self.nreturn = nreturns = len(np.ravel(mcres0))
+
+        #single return statistic
+        if statindices is None:
+            #self.nreturn = nreturns = 1
+            mcres = np.zeros(nrepl)
+            mcres[0] = mcres0
+            for ii in range(1, nrepl-1, nreturns):
+                x = dgp(*dgpargs) #(1e-4+np.random.randn(nobs)).cumsum()
+                #should I ravel?
+                mcres[ii] = statfun(x, *statsargs)
+        #more than one return statistic
+        else:
+            self.nreturn = nreturns = len(statindices)
+            self.mcres = mcres = np.zeros((nrepl, nreturns))
+            mcres[0] = [mcres0[i] for i in statindices]
+            for ii in range(1, nrepl-1):
+                x = dgp(*dgpargs) #(1e-4+np.random.randn(nobs)).cumsum()
+                ret = statfun(x, *statsargs)
+                mcres[ii] = [ret[i] for i in statindices]
+
+        self.mcres = mcres
+

     def histogram(self, idx=None, critval=None):
-        """calculate histogram values
+        '''calculate histogram values

         does not do any plotting

@@ -134,11 +167,39 @@ class StatTestMC:
         method, but this also does a binned pdf (self.histo)


-        """
-        pass
+        '''
+        if self.mcres.ndim == 2:
+            if idx is not None:
+                mcres = self.mcres[:,idx]
+            else:
+                raise ValueError('currently only 1 statistic at a time')
+        else:
+            mcres = self.mcres
+
+        if critval is None:
+            histo = np.histogram(mcres, bins=10)
+        else:
+            bins = np.asarray(critval, dtype=float)
+            if not bins[0] == -np.inf:
+                bins = np.r_[-np.inf, bins]
+            if not bins[-1] == np.inf:
+                bins = np.r_[bins, np.inf]
+            histo = np.histogram(mcres, bins=bins)
+
+        self.histo = histo
+        self.cumhisto = np.cumsum(histo[0])*1./self.nrepl
+        self.cumhistoreversed = np.cumsum(histo[0][::-1])[::-1]*1./self.nrepl
+        return histo, self.cumhisto, self.cumhistoreversed
+
+    #use cache decorator instead
+    def get_mc_sorted(self):
+        if not hasattr(self, 'mcressort'):
+            self.mcressort = np.sort(self.mcres, axis=0)
+        return self.mcressort
+

     def quantiles(self, idx=None, frac=[0.01, 0.025, 0.05, 0.1, 0.975]):
-        """calculate quantiles of Monte Carlo results
+        '''calculate quantiles of Monte Carlo results

         similar to ppf

@@ -165,11 +226,23 @@ class StatTestMC:
         change sequence idx, frac


-        """
-        pass
+        '''
+
+        if self.mcres.ndim == 2:
+            if idx is not None:
+                mcres = self.mcres[:,idx]
+            else:
+                raise ValueError('currently only 1 statistic at a time')
+        else:
+            mcres = self.mcres
+
+        self.frac = frac = np.asarray(frac)
+
+        mc_sorted = self.get_mc_sorted()[:,idx]
+        return frac, mc_sorted[(self.nrepl*frac).astype(int)]

     def cdf(self, x, idx=None):
-        """calculate cumulative probabilities of Monte Carlo results
+        '''calculate cumulative probabilities of Monte Carlo results

         Parameters
         ----------
@@ -189,11 +262,37 @@ class StatTestMC:



-        """
-        pass
+        '''
+        idx = np.atleast_1d(idx).tolist()  #assure iterable, use list ?
+
+#        if self.mcres.ndim == 2:
+#            if not idx is None:
+#                mcres = self.mcres[:,idx]
+#            else:
+#                raise ValueError('currently only 1 statistic at a time')
+#        else:
+#            mcres = self.mcres
+
+        mc_sorted = self.get_mc_sorted()
+
+        x = np.asarray(x)
+        #TODO:autodetect or explicit option ?
+        if x.ndim > 1 and x.shape[1]==len(idx):
+            use_xi = True
+        else:
+            use_xi = False
+
+        x_ = x  #alias
+        probs = []
+        for i,ix in enumerate(idx):
+            if use_xi:
+                x_ = x[:,i]
+            probs.append(np.searchsorted(mc_sorted[:,ix], x_)/float(self.nrepl))
+        probs = np.asarray(probs).T
+        return x, probs

     def plot_hist(self, idx, distpdf=None, bins=50, ax=None, kwds=None):
-        """plot the histogram against a reference distribution
+        '''plot the histogram against a reference distribution

         Parameters
         ----------
@@ -215,12 +314,33 @@ class StatTestMC:
         None


-        """
-        pass
+        '''
+        if kwds is None:
+            kwds = ({},{})
+        if self.mcres.ndim == 2:
+            if idx is not None:
+                mcres = self.mcres[:,idx]
+            else:
+                raise ValueError('currently only 1 statistic at a time')
+        else:
+            mcres = self.mcres
+
+        lsp = np.linspace(mcres.min(), mcres.max(), 100)

-    def summary_quantiles(self, idx, distppf, frac=[0.01, 0.025, 0.05, 0.1,
-        0.975], varnames=None, title=None):
-        """summary table for quantiles (critical values)
+
+        import matplotlib.pyplot as plt
+        #I do not want to figure this out now
+#        if ax=None:
+#            fig = plt.figure()
+#            ax = fig.addaxis()
+        fig = plt.figure()
+        plt.hist(mcres, bins=bins, density=True, **kwds[0])
+        plt.plot(lsp, distpdf(lsp), 'r', **kwds[1])
+
+
+    def summary_quantiles(self, idx, distppf, frac=[0.01, 0.025, 0.05, 0.1, 0.975],
+                          varnames=None, title=None):
+        '''summary table for quantiles (critical values)

         Parameters
         ----------
@@ -240,11 +360,34 @@ class StatTestMC:
         table : instance of SimpleTable
             use `print(table)` to see results

-        """
-        pass
+        '''
+        idx = np.atleast_1d(idx)  #assure iterable, use list ?
+
+        quant, mcq = self.quantiles(idx, frac=frac)
+        #not sure whether this will work with single quantile
+        #crit = stats.chi2([2,4]).ppf(np.atleast_2d(quant).T)
+        crit = distppf(np.atleast_2d(quant).T)
+        mml=[]
+        for i, ix in enumerate(idx):  #TODO: hardcoded 2 ?
+            mml.extend([mcq[:,i], crit[:,i]])
+        #mmlar = np.column_stack(mml)
+        mmlar = np.column_stack([quant] + mml)
+        #print(mmlar.shape
+        if title:
+            title = title +' Quantiles (critical values)'
+        else:
+            title='Quantiles (critical values)'
+        #TODO use stub instead
+        if varnames is None:
+            varnames = ['var%d' % i for i in range(mmlar.shape[1]//2)]
+        headers = ['\nprob'] + ['%s\n%s' % (i, t) for i in varnames for t in ['mc', 'dist']]
+        return SimpleTable(mmlar,
+                          txt_fmt={'data_fmts': ["%#6.3f"]+["%#10.4f"]*(mmlar.shape[1]-1)},
+                          title=title,
+                          headers=headers)

     def summary_cdf(self, idx, frac, crit, varnames=None, title=None):
-        """summary table for cumulative density function
+        '''summary table for cumulative density function


         Parameters
@@ -265,14 +408,75 @@ class StatTestMC:
             use `print(table)` to see results


-        """
-        pass
+        '''
+        idx = np.atleast_1d(idx)  #assure iterable, use list ?
+
+
+        mml=[]
+        #TODO:need broadcasting in cdf
+        for i in range(len(idx)):
+            #print(i, mc1.cdf(crit[:,i], [idx[i]])[1].ravel()
+            mml.append(self.cdf(crit[:,i], [idx[i]])[1].ravel())
+        #mml = self.cdf(crit, idx)[1]
+        #mmlar = np.column_stack(mml)
+        #print(mml[0].shape, np.shape(frac)
+        mmlar = np.column_stack([frac] + mml)
+        #print(mmlar.shape
+        if title:
+            title = title + ' Probabilities'
+        else:
+            title='Probabilities'
+        #TODO use stub instead
+        #headers = ['\nprob'] + ['var%d\n%s' % (i, t) for i in range(mmlar.shape[1]-1) for t in ['mc']]
+
+        if varnames is None:
+            varnames = ['var%d' % i for i in range(mmlar.shape[1]-1)]
+        headers = ['prob'] + varnames
+        return SimpleTable(mmlar,
+                          txt_fmt={'data_fmts': ["%#6.3f"]+["%#10.4f"]*(np.array(mml).shape[1]-1)},
+                          title=title,
+                          headers=headers)
+
+
+
+
+
+
+


 if __name__ == '__main__':
     from scipy import stats
+
     from statsmodels.stats.diagnostic import acorr_ljungbox
+
+
+    def randwalksim(nobs=100, drift=0.0):
+        return (drift+np.random.randn(nobs)).cumsum()
+
+    def normalnoisesim(nobs=500, loc=0.0):
+        return (loc+np.random.randn(nobs))
+
+#    print('\nResults with MC class'
+#    mc1 = StatTestMC(randwalksim, adf20)
+#    mc1.run(1000)
+#    print(mc1.histogram(critval=[-3.5, -3.17, -2.9 , -2.58,  0.26])
+#    print(mc1.quantiles()
+
     print('\nLjung Box')
+
+    def lb4(x):
+        s,p = acorr_ljungbox(x, lags=4, return_df=True)
+        return s[-1], p[-1]
+
+    def lb1(x):
+        s,p = acorr_ljungbox(x, lags=1, return_df=True)
+        return s[0], p[0]
+
+    def lb(x):
+        s,p = acorr_ljungbox(x, lags=4, return_df=True)
+        return np.r_[s, p]
+
     print('Results with MC class')
     mc1 = StatTestMC(normalnoisesim, lb)
     mc1.run(10000, statindices=lrange(8))
@@ -280,28 +484,37 @@ if __name__ == '__main__':
     print(mc1.quantiles(1))
     print(mc1.quantiles(0))
     print(mc1.histogram(0))
-    print(mc1.summary_quantiles([1, 2, 3], stats.chi2([2, 3, 4]).ppf,
-        varnames=['lag 1', 'lag 2', 'lag 3'], title='acorr_ljungbox'))
+
+    #print(mc1.summary_quantiles([1], stats.chi2([2]).ppf, title='acorr_ljungbox')
+    print(mc1.summary_quantiles([1,2,3], stats.chi2([2,3,4]).ppf,
+                                varnames=['lag 1', 'lag 2', 'lag 3'],
+                                title='acorr_ljungbox'))
     print(mc1.cdf(0.1026, 1))
     print(mc1.cdf(0.7278, 3))
-    print(mc1.cdf(0.7278, [1, 2, 3]))
+
+    print(mc1.cdf(0.7278, [1,2,3]))
     frac = [0.01, 0.025, 0.05, 0.1, 0.975]
-    crit = stats.chi2([2, 4]).ppf(np.atleast_2d(frac).T)
-    print(mc1.summary_cdf([1, 3], frac, crit, title='acorr_ljungbox'))
-    crit = stats.chi2([2, 3, 4]).ppf(np.atleast_2d(frac).T)
-    print(mc1.summary_cdf([1, 2, 3], frac, crit, varnames=['lag 1', 'lag 2',
-        'lag 3'], title='acorr_ljungbox'))
-    print(mc1.cdf(crit, [1, 2, 3])[1].shape)
-    """
+    crit = stats.chi2([2,4]).ppf(np.atleast_2d(frac).T)
+    print(mc1.summary_cdf([1,3], frac, crit, title='acorr_ljungbox'))
+    crit = stats.chi2([2,3,4]).ppf(np.atleast_2d(frac).T)
+    print(mc1.summary_cdf([1,2,3], frac, crit,
+                          varnames=['lag 1', 'lag 2', 'lag 3'],
+                          title='acorr_ljungbox'))
+
+    print(mc1.cdf(crit, [1,2,3])[1].shape)
+
+    #fixed broadcasting in cdf  Done 2d only
+    '''
     >>> mc1.cdf(crit[:,0], [1])[1].shape
     (5, 1)
     >>> mc1.cdf(crit[:,0], [1,3])[1].shape
     (5, 2)
     >>> mc1.cdf(crit[:,:], [1,3])[1].shape
     (2, 5, 2)
-    """
-    doplot = 0
+    '''
+
+    doplot=0
     if doplot:
         import matplotlib.pyplot as plt
-        mc1.plot_hist(0, stats.chi2(2).pdf)
+        mc1.plot_hist(0,stats.chi2(2).pdf)  #which pdf
         plt.show()
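
As a rough guide to how the pieces above fit together, here is a minimal standalone
sketch (illustrative only; it reuses the normalnoisesim and lb helpers defined in the
__main__ block above) that runs a small Monte Carlo and compares the simulated
quantiles of the lag-2 Ljung-Box statistic against the chi2(2) reference that
summary_quantiles uses:

    from scipy import stats
    from statsmodels.sandbox.tools.mctools import StatTestMC

    mc = StatTestMC(normalnoisesim, lb)        # simulator and statistic as above
    mc.run(2000, statindices=list(range(8)))   # indices 0-3: lb stats, 4-7: p-values
    print(mc.quantiles(1))                     # MC quantiles of the lag-2 statistic
    # asymptotic chi2(2) quantiles at a few standard probability levels, for comparison
    print(stats.chi2(2).ppf([0.01, 0.05, 0.1, 0.9, 0.95, 0.99]))
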
diff --git a/statsmodels/sandbox/tools/tools_pca.py b/statsmodels/sandbox/tools/tools_pca.py
index d15e96aae..5b2ea33f6 100644
--- a/statsmodels/sandbox/tools/tools_pca.py
+++ b/statsmodels/sandbox/tools/tools_pca.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Principal Component Analysis


@@ -6,11 +7,12 @@ Author: josef-pktd

 TODO : add class for better reuse of results
 """
+
 import numpy as np


 def pca(data, keepdim=0, normalize=0, demean=True):
-    """principal components with eigenvector decomposition
+    '''principal components with eigenvector decomposition
     similar to princomp in matlab

     Parameters
@@ -43,12 +45,47 @@ def pca(data, keepdim=0, normalize=0, demean=True):
     --------
     pcasvd : principal component analysis using svd

-    """
-    pass
+    '''
+    x = np.array(data)
+    #make copy so original does not change, maybe not necessary anymore
+    if demean:
+        m = x.mean(0)
+    else:
+        m = np.zeros(x.shape[1])
+    x -= m
+
+    # Covariance matrix
+    xcov = np.cov(x, rowvar=0)
+
+    # Compute eigenvalues and sort into descending order
+    evals, evecs = np.linalg.eig(xcov)
+    indices = np.argsort(evals)
+    indices = indices[::-1]
+    evecs = evecs[:,indices]
+    evals = evals[indices]
+
+    if keepdim > 0 and keepdim < x.shape[1]:
+        evecs = evecs[:,:keepdim]
+        evals = evals[:keepdim]
+
+    if normalize:
+        #for i in range(shape(evecs)[1]):
+        #    evecs[:,i] / linalg.norm(evecs[:,i]) * sqrt(evals[i])
+        evecs = evecs/np.sqrt(evals) #np.sqrt(np.dot(evecs.T, evecs) * evals)
+
+    # get factor matrix
+    #x = np.dot(evecs.T, x.T)
+    factors = np.dot(x, evecs)
+    # get original data from reduced number of components
+    #xreduced = np.dot(evecs.T, factors) + m
+    #print x.shape, factors.shape, evecs.shape, m.shape
+    xreduced = np.dot(factors, evecs.T) + m
+    return xreduced, factors, evals, evecs
+


 def pcasvd(data, keepdim=0, demean=True):
-    """principal components with svd
+    '''principal components with svd

     Parameters
     ----------
@@ -79,8 +116,33 @@ def pcasvd(data, keepdim=0, demean=True):
     -----
     This does not have yet the normalize option of pca.

-    """
-    pass
+    '''
+    nobs, nvars = data.shape
+    #print nobs, nvars, keepdim
+    x = np.array(data)
+    #make copy so original does not change
+    if demean:
+        m = x.mean(0)
+    else:
+        m = 0
+##    if keepdim == 0:
+##        keepdim = nvars
+##        "print reassigning keepdim to max", keepdim
+    x -= m
+    U, s, v = np.linalg.svd(x.T, full_matrices=1)
+    factors = np.dot(U.T, x.T).T #princomps
+    if keepdim:
+        xreduced = np.dot(factors[:,:keepdim], U[:,:keepdim].T) + m
+    else:
+        xreduced = data
+        keepdim = nvars
+        "print reassigning keepdim to max", keepdim
+
+    # s = evals, U = evecs
+    # eigenvalues from singular values; the (nobs - 1) denominator matches np.cov's default ddof=1
+    evals = s**2/(x.shape[0]-1)
+    #print keepdim
+    return xreduced, factors[:,:keepdim], evals[:keepdim], U[:,:keepdim] #, v


 __all__ = ['pca', 'pcasvd']
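
pca and pcasvd implement the same decomposition via different routes (eigendecomposition
of the sample covariance matrix versus SVD of the demeaned data), so a quick consistency
check is a natural smoke test; the sketch below is illustrative only and uses made-up
random data:

    import numpy as np
    from statsmodels.sandbox.tools.tools_pca import pca, pcasvd

    rs = np.random.RandomState(12345)
    data = rs.standard_normal((100, 4))
    xred1, fac1, evals1, evecs1 = pca(data, keepdim=2)
    xred2, fac2, evals2, evecs2 = pcasvd(data, keepdim=2)
    # both use an (nobs - 1) denominator, so the leading eigenvalues should agree;
    # factors and eigenvectors may differ by a sign flip per column
    print(np.allclose(evals1, evals2))
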
diff --git a/statsmodels/sandbox/tools/try_mctools.py b/statsmodels/sandbox/tools/try_mctools.py
index 7c92e0ac8..22926ccd4 100644
--- a/statsmodels/sandbox/tools/try_mctools.py
+++ b/statsmodels/sandbox/tools/try_mctools.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Fri Sep 30 15:20:45 2011

@@ -9,28 +10,62 @@ from scipy import stats
 from statsmodels.sandbox.tools.mctools import StatTestMC
 from statsmodels.stats.diagnostic import acorr_ljungbox
 from statsmodels.tsa.stattools import adfuller
+
+def normalnoisesim(nobs=500, loc=0.0):
+    return (loc+np.random.randn(nobs))
+
+
+def lb(x):
+    s,p = acorr_ljungbox(x, lags=4)
+    return np.r_[s, p]
+
+
 mc1 = StatTestMC(normalnoisesim, lb)
 mc1.run(5000, statindices=lrange(4))
-print(mc1.summary_quantiles([1, 2, 3], stats.chi2([2, 3, 4]).ppf, varnames=
-    ['lag 1', 'lag 2', 'lag 3'], title='acorr_ljungbox'))
+
+print(mc1.summary_quantiles([1,2,3], stats.chi2([2,3,4]).ppf,
+                            varnames=['lag 1', 'lag 2', 'lag 3'],
+                            title='acorr_ljungbox'))
 print('\n\n')
+
 frac = [0.01, 0.025, 0.05, 0.1, 0.975]
-crit = stats.chi2([2, 3, 4]).ppf(np.atleast_2d(frac).T)
-print(mc1.summary_cdf([1, 2, 3], frac, crit, varnames=['lag 1', 'lag 2',
-    'lag 3'], title='acorr_ljungbox'))
-print(mc1.cdf(crit, [1, 2, 3])[1])
+crit = stats.chi2([2,3,4]).ppf(np.atleast_2d(frac).T)
+print(mc1.summary_cdf([1,2,3], frac, crit,
+                      varnames=['lag 1', 'lag 2', 'lag 3'],
+                      title='acorr_ljungbox'))
+print(mc1.cdf(crit, [1,2,3])[1])
+
+#----------------------
+
+def randwalksim(nobs=500, drift=0.0):
+    return (drift+np.random.randn(nobs)).cumsum()
+
+
+def adf20(x):
+    return adfuller(x, 2, regression="n", autolag=None)
+
 print(adf20(np.random.randn(100)))
+
 mc2 = StatTestMC(randwalksim, adf20)
-mc2.run(10000, statindices=[0, 1])
+mc2.run(10000, statindices=[0,1])
 frac = [0.01, 0.05, 0.1]
-crit = np.array([-3.4996365338407074, -2.8918307730370025, -2.5829283377617176]
-    )[:, None]
-print(mc2.summary_cdf([0], frac, crit, varnames=['adf'], title='adf'))
+#bug
+crit = np.array([-3.4996365338407074, -2.8918307730370025, -2.5829283377617176])[:,None]
+print(mc2.summary_cdf([0], frac, crit,
+                      varnames=['adf'],
+                      title='adf'))
+#bug
+#crit2 = np.column_stack((crit, frac))
+#print mc2.summary_cdf([0, 1], frac, crit,
+#                      varnames=['adf'],
+#                      title='adf')
+
 print(mc2.quantiles([0]))
 print(mc2.cdf(crit, [0]))
-doplot = 1
+
+doplot=1
 if doplot:
     import matplotlib.pyplot as plt
-    mc1.plot_hist([3], stats.chi2([4]).pdf)
+    mc1.plot_hist([3],stats.chi2([4]).pdf)
     plt.title('acorr_ljungbox - MC versus chi2')
     plt.show()
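
The point of the ADF block above is that the Monte Carlo CDF of the simulated
Dickey-Fuller statistic, evaluated at the tabulated 1%, 5% and 10% critical values,
should come out close to those nominal probabilities. A rough check (illustrative;
the exact numbers vary from run to run):

    probs = mc2.cdf(crit, [0])[1].ravel()
    print(np.round(probs, 3))   # expected to be in the neighbourhood of 0.01, 0.05, 0.10
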
diff --git a/statsmodels/sandbox/tsa/diffusion.py b/statsmodels/sandbox/tsa/diffusion.py
index d8c24d17c..df53821c5 100644
--- a/statsmodels/sandbox/tsa/diffusion.py
+++ b/statsmodels/sandbox/tsa/diffusion.py
@@ -1,4 +1,4 @@
-"""getting started with diffusions, continuous time stochastic processes
+'''getting started with diffusions, continuous time stochastic processes

 Author: josef-pktd
 License: BSD
@@ -47,54 +47,70 @@ stochastic volatility models: estimation unclear
 finance applications ? option pricing, interest rate models


-"""
+'''
 import numpy as np
 from scipy import stats, signal
 import matplotlib.pyplot as plt

+#np.random.seed(987656789)

 class Diffusion:
-    """Wiener Process, Brownian Motion with mu=0 and sigma=1
-    """
-
+    '''Wiener Process, Brownian Motion with mu=0 and sigma=1
+    '''
     def __init__(self):
         pass

     def simulateW(self, nobs=100, T=1, dt=None, nrepl=1):
-        """generate sample of Wiener Process
-        """
-        pass
+        '''generate sample of Wiener Process
+        '''
+        dt = T*1.0/nobs
+        t = np.linspace(dt, T, nobs)
+        dW = np.sqrt(dt)*np.random.normal(size=(nrepl, nobs))
+        W = np.cumsum(dW,1)
+        self.dW = dW
+        return W, t

     def expectedsim(self, func, nobs=100, T=1, dt=None, nrepl=1):
-        """get expectation of a function of a Wiener Process by simulation
+        '''get expectation of a function of a Wiener Process by simulation

         initially test example from
-        """
-        pass
-
+        '''
+        W, t = self.simulateW(nobs=nobs, T=T, dt=dt, nrepl=nrepl)
+        U = func(t, W)
+        Umean = U.mean(0)
+        return U, Umean, t

 class AffineDiffusion(Diffusion):
-    """
+    r'''

     differential equation:

     :math::
-    dx_t = f(t,x)dt + \\sigma(t,x)dW_t
+    dx_t = f(t,x)dt + \sigma(t,x)dW_t

     integral:

     :math::
-    x_T = x_0 + \\int_{0}^{T}f(t,S)dt + \\int_0^T  \\sigma(t,S)dW_t
+    x_T = x_0 + \int_{0}^{T}f(t,S)dt + \int_0^T  \sigma(t,S)dW_t

     TODO: check definition, affine, what about jump diffusion?

-    """
+    '''

     def __init__(self):
         pass

+    def sim(self, nobs=100, T=1, dt=None, nrepl=1):
+        # this does not look correct if drift or sig depend on x
+        # see arithmetic BM
+        W, t = self.simulateW(nobs=nobs, T=T, dt=dt, nrepl=nrepl)
+        dx =  self._drift() + self._sig() * W
+        x  = np.cumsum(dx,1)
+        xmean = x.mean(0)
+        return x, xmean, t
+
     def simEM(self, xzero=None, nobs=100, T=1, dt=None, nrepl=1, Tratio=4):
-        """
+        '''

         from Higham 2001

@@ -102,11 +118,34 @@ class AffineDiffusion(Diffusion):
         TODO: check if I can skip the loop using my way from exactprocess
               problem might be Winc (reshape into 3d and sum)
         TODO: (later) check memory efficiency for large simulations
-        """
-        pass
-
-
-"""
+        '''
+        #TODO: reverse parameterization to start with final nobs and DT
+        nobs = nobs * Tratio  # simple way to change parameter
+        # maybe wrong parameterization,
+        # drift too large, variance too small ? which dt/Dt
+        # _drift, _sig independent of dt is wrong
+        if xzero is None:
+            xzero = self.xzero
+        if dt is None:
+            dt = T*1.0/nobs
+        W, t = self.simulateW(nobs=nobs, T=T, dt=dt, nrepl=nrepl)
+        dW = self.dW
+        t = np.linspace(dt, 1, nobs)
+        Dt = Tratio*dt
+        L = nobs // Tratio       # L EM steps of size Dt = R*dt (integer, used as array shape)
+        Xem = np.zeros((nrepl,L))    # preallocate for efficiency
+        Xtemp = xzero
+        Xem[:,0] = xzero
+        for j in np.arange(1,L):
+            #Winc = np.sum(dW[:,Tratio*(j-1)+1:Tratio*j],1)
+            Winc = np.sum(dW[:,np.arange(Tratio*(j-1)+1,Tratio*j)],1)
+            #Xtemp = Xtemp + Dt*lamda*Xtemp + mu*Xtemp*Winc;
+            Xtemp = Xtemp + self._drift(x=Xtemp) + self._sig(x=Xtemp) * Winc
+            #Dt*lamda*Xtemp + mu*Xtemp*Winc;
+            Xem[:,j] = Xtemp
+        return Xem
+
+'''
     R = 4; Dt = R*dt; L = N/R;        % L EM steps of size Dt = R*dt
     Xem = zeros(1,L);                 % preallocate for efficiency
     Xtemp = Xzero;
@@ -115,51 +154,78 @@ class AffineDiffusion(Diffusion):
        Xtemp = Xtemp + Dt*lambda*Xtemp + mu*Xtemp*Winc;
        Xem(j) = Xtemp;
     end
-"""
-
+'''

 class ExactDiffusion(AffineDiffusion):
-    """Diffusion that has an exact integral representation
+    '''Diffusion that has an exact integral representation

     this is currently mainly for geometric, log processes

-    """
+    '''

     def __init__(self):
         pass

-    def exactprocess(self, xzero, nobs, ddt=1.0, nrepl=2):
-        """ddt : discrete delta t
+    def exactprocess(self, xzero, nobs, ddt=1., nrepl=2):
+        '''ddt : discrete delta t



         should be the same as an AR(1)
         not tested yet
-        """
-        pass
-
+        '''
+        t = np.linspace(ddt, nobs*ddt, nobs)
+        #expnt = np.exp(-self.lambd * t)
+        expddt = np.exp(-self.lambd * ddt)
+        normrvs = np.random.normal(size=(nrepl,nobs))
+        #do I need lfilter here AR(1) ? if mean reverting lag-coeff<1
+        # lfilter filters along the last axis, so the (nrepl, nobs) array needs no loop
+        inc = self._exactconst(expddt) + self._exactstd(expddt) * normrvs
+        return signal.lfilter([1.], [1.,-expddt], inc)
+
+    def exactdist(self, xzero, t):
+        expnt = np.exp(-self.lambd * t)
+        meant = xzero * expnt + self._exactconst(expnt)
+        stdt = self._exactstd(expnt)
+        return stats.norm(loc=meant, scale=stdt)

 class ArithmeticBrownian(AffineDiffusion):
-    """
+    '''
     :math::
     dx_t &= \\mu dt + \\sigma dW_t
-    """
+    '''

     def __init__(self, xzero, mu, sigma):
         self.xzero = xzero
         self.mu = mu
         self.sigma = sigma

-    def exactprocess(self, nobs, xzero=None, ddt=1.0, nrepl=2):
-        """ddt : discrete delta t
+    def _drift(self, *args, **kwds):
+        return self.mu
+    def _sig(self, *args, **kwds):
+        return self.sigma
+    def exactprocess(self, nobs, xzero=None, ddt=1., nrepl=2):
+        '''ddt : discrete delta t

         not tested yet
-        """
-        pass
+        '''
+        if xzero is None:
+            xzero = self.xzero
+        t = np.linspace(ddt, nobs*ddt, nobs)
+        normrvs = np.random.normal(size=(nrepl,nobs))
+        # increment over ddt: mu*ddt + sigma*sqrt(ddt)*eps
+        inc = self._drift() * ddt + self._sig() * np.sqrt(ddt) * normrvs
+        #return signal.lfilter([1.], [1.,-1], inc)
+        return xzero + np.cumsum(inc,1)
+
+    def exactdist(self, xzero, t):
+        # arithmetic BM: x_t = xzero + mu*t + sigma*W_t
+        meant = xzero + self.mu * t
+        stdt = self.sigma * np.sqrt(t)
+        return stats.norm(loc=meant, scale=stdt)


 class GeometricBrownian(AffineDiffusion):
-    """Geometric Brownian Motion
+    '''Geometric Brownian Motion

     :math::
     dx_t &= \\mu x_t dt + \\sigma x_t dW_t
@@ -169,16 +235,22 @@ class GeometricBrownian(AffineDiffusion):
     $\\sigma $ is the Volatility,
     $W$ is the Wiener process (Brownian motion).

-    """
-
+    '''
     def __init__(self, xzero, mu, sigma):
         self.xzero = xzero
         self.mu = mu
         self.sigma = sigma

+    def _drift(self, *args, **kwds):
+        x = kwds['x']
+        return self.mu * x
+    def _sig(self, *args, **kwds):
+        x = kwds['x']
+        return self.sigma * x
+

 class OUprocess(AffineDiffusion):
-    """Ornstein-Uhlenbeck
+    '''Ornstein-Uhlenbeck

     :math::
       dx_t&=\\lambda(\\mu - x_t)dt+\\sigma dW_t
@@ -188,32 +260,75 @@ class OUprocess(AffineDiffusion):


     TODO: move exact higher up in class hierarchy
-    """
-
+    '''
     def __init__(self, xzero, mu, lambd, sigma):
         self.xzero = xzero
         self.lambd = lambd
         self.mu = mu
         self.sigma = sigma

-    def exactprocess(self, xzero, nobs, ddt=1.0, nrepl=2):
-        """ddt : discrete delta t
+    def _drift(self, *args, **kwds):
+        x = kwds['x']
+        return self.lambd * (self.mu - x)
+    def _sig(self, *args, **kwds):
+        x = kwds['x']
+        return self.sigma * x
+    def exact(self, xzero, t, normrvs):
+        #TODO: aggregate over time for process with observations for all t
+        #      i.e. exact conditional distribution for discrete time increment
+        #      -> exactprocess
+        #TODO: for single t, return stats.norm -> exactdist
+        expnt = np.exp(-self.lambd * t)
+        return (xzero * expnt + self.mu * (1-expnt) +
+                self.sigma * np.sqrt((1-expnt*expnt)/2./self.lambd) * normrvs)
+
+    def exactprocess(self, xzero, nobs, ddt=1., nrepl=2):
+        '''ddt : discrete delta t

         should be the same as an AR(1)
         not tested yet
         # after writing this I saw the same use of lfilter in sitmo
-        """
-        pass
+        '''
+        t = np.linspace(ddt, nobs*ddt, nobs)
+        expnt = np.exp(-self.lambd * t)
+        expddt = np.exp(-self.lambd * ddt)
+        normrvs = np.random.normal(size=(nrepl,nobs))
+        # AR(1) recursion via lfilter; it filters along the last axis, so 2d (nrepl, nobs) is fine
+        from scipy import signal
+        #xzero * expnt
+        inc = ( self.mu * (1-expddt) +
+                self.sigma * np.sqrt((1-expddt*expddt)/2./self.lambd) * normrvs )
+
+        return signal.lfilter([1.], [1.,-expddt], inc)
+
+
+    def exactdist(self, xzero, t):
+        #TODO: aggregate over time for process with observations for all t
+        #TODO: for single t, return stats.norm
+        expnt = np.exp(-self.lambd * t)
+        meant = xzero * expnt + self.mu * (1-expnt)
+        stdt = self.sigma * np.sqrt((1-expnt*expnt)/2./self.lambd)
+        from scipy import stats
+        return stats.norm(loc=meant, scale=stdt)

     def fitls(self, data, dt):
-        """assumes data is 1d, univariate time series
+        '''assumes data is 1d, univariate time series
         formula from sitmo
-        """
-        pass
+        '''
+        # brute force, no parameter estimation errors
+        nobs = len(data)-1
+        exog = np.column_stack((np.ones(nobs), data[:-1]))
+        parest, res, rank, sing = np.linalg.lstsq(exog, data[1:], rcond=-1)
+        const, slope = parest
+        errvar = res/(nobs-2.)
+        lambd = -np.log(slope)/dt
+        sigma = np.sqrt(-errvar * 2.*np.log(slope)/ (1-slope**2)/dt)
+        mu = const / (1-slope)
+        return mu, lambd, sigma


 class SchwartzOne(ExactDiffusion):
-    """the Schwartz type 1 stochastic process
+    '''the Schwartz type 1 stochastic process

     :math::
     dx_t = \\kappa (\\mu - \\ln x_t) x_t dt + \\sigma x_tdW \\
@@ -221,47 +336,119 @@ class SchwartzOne(ExactDiffusion):
     The log of the Schwartz type 1 process follows an Ornstein-Uhlenbeck
     stochastic process.

-    """
+    '''

     def __init__(self, xzero, mu, kappa, sigma):
         self.xzero = xzero
         self.mu = mu
         self.kappa = kappa
-        self.lambd = kappa
+        self.lambd = kappa #alias until I fix exact
         self.sigma = sigma

-    def exactprocess(self, xzero, nobs, ddt=1.0, nrepl=2):
-        """uses exact solution for log of process
-        """
-        pass
+    def _exactconst(self, expnt):
+        return (1-expnt) * (self.mu - self.sigma**2 / 2. /self.kappa)
+
+    def _exactstd(self, expnt):
+        return self.sigma * np.sqrt((1-expnt*expnt)/2./self.kappa)
+
+    def exactprocess(self, xzero, nobs, ddt=1., nrepl=2):
+        '''uses exact solution for log of process
+        '''
+        lnxzero = np.log(xzero)
+        lnx = super(SchwartzOne, self).exactprocess(lnxzero, nobs,
+                                                    ddt=ddt, nrepl=nrepl)
+        return np.exp(lnx)
+
+    def exactdist(self, xzero, t):
+        expnt = np.exp(-self.lambd * t)
+        #TODO: check this is still wrong, just guessing
+        meant = np.log(xzero) * expnt + self._exactconst(expnt)
+        stdt = self._exactstd(expnt)
+        return stats.lognorm(loc=meant, scale=stdt)

     def fitls(self, data, dt):
-        """assumes data is 1d, univariate time series
+        '''assumes data is 1d, univariate time series
         formula from sitmo
-        """
-        pass
+        '''
+        # brute force, no parameter estimation errors
+        nobs = len(data)-1
+        exog = np.column_stack((np.ones(nobs),np.log(data[:-1])))
+        parest, res, rank, sing = np.linalg.lstsq(exog, np.log(data[1:]), rcond=-1)
+        const, slope = parest
+        errvar = res/(nobs-2.)  #check denominator estimate, of sigma too low
+        kappa = -np.log(slope)/dt
+        sigma = np.sqrt(errvar * kappa / (1-np.exp(-2*kappa*dt)))
+        mu = const / (1-np.exp(-kappa*dt)) + sigma**2/2./kappa
+        if np.shape(mu)== (1,):
+            mu = mu[0]   # TODO: how to remove scalar array ?
+        if np.shape(sigma)== (1,):
+            sigma = sigma[0]
+        #mu, kappa are good, sigma too small
+        return mu, kappa, sigma


-class BrownianBridge:

+class BrownianBridge:
     def __init__(self):
         pass

+    def simulate(self, x0, x1, nobs, nrepl=1, ddt=1., sigma=1.):
+        nobs=nobs+1
+        dt = ddt*1./nobs
+        t = np.linspace(dt, ddt-dt, nobs)
+        t = np.linspace(dt, ddt, nobs)
+        wm = [t/ddt, 1-t/ddt]
+        #wmi = wm[1]
+        #wm1 = x1*wm[0]
+        wmi = 1-dt/(ddt-t)
+        wm1 = x1*(dt/(ddt-t))
+        su = sigma* np.sqrt(t*(1-t)/ddt)
+        s = sigma* np.sqrt(dt*(ddt-t-dt)/(ddt-t))
+        x = np.zeros((nrepl, nobs))
+        x[:,0] = x0
+        rvs = s*np.random.normal(size=(nrepl,nobs))
+        for i in range(1,nobs):
+            x[:,i] = x[:,i-1]*wmi[i] + wm1[i] + rvs[:,i]
+        return x, t, su

-class CompoundPoisson:
-    """nobs iid compound poisson distributions, not a process in time
-    """

+class CompoundPoisson:
+    '''nobs iid compound poisson distributions, not a process in time
+    '''
     def __init__(self, lambd, randfn=np.random.normal):
         if len(lambd) != len(randfn):
-            raise ValueError(
-                'lambd and randfn need to have the same number of elements')
+            raise ValueError('lambd and randfn need to have the same number of elements')
+
         self.nobj = len(lambd)
         self.randfn = randfn
         self.lambd = np.asarray(lambd)

+    def simulate(self, nobs, nrepl=1):
+        nobj = self.nobj
+        x = np.zeros((nrepl, nobs, nobj))
+        N = np.random.poisson(self.lambd[None,None,:], size=(nrepl,nobs,nobj))
+        for io in range(nobj):
+            randfnc = self.randfn[io]
+
+            nc = N[:,:,io]
+            #print nrepl,nobs,nc
+            #xio = randfnc(size=(nrepl,nobs,np.max(nc))).cumsum(-1)[np.arange(nrepl)[:,None],np.arange(nobs),nc-1]
+            rvs = randfnc(size=(nrepl,nobs,np.max(nc)))
+            print('rvs.sum()', rvs.sum(), rvs.shape)
+            xio = rvs.cumsum(-1)[np.arange(nrepl)[:,None],np.arange(nobs),nc-1]
+            #print xio.shape
+            x[:,:,io] = xio
+        x[N==0] = 0
+        return x, N

-"""
+
+
+
+
+
+
+
+
+'''
 randn('state',100)                                % set the state of randn
 T = 1; N = 500; dt = T/N; t = [dt:dt:1];

@@ -277,29 +464,43 @@ ylabel('U(t)','FontSize',16,'Rotation',0,'HorizontalAlignment','right')
 legend('mean of 1000 paths','5 individual paths',2)

 averr = norm((Umean - exp(9*t/8)),'inf')          % sample error
-"""
+'''
+
 if __name__ == '__main__':
     doplot = 1
     nrepl = 1000
-    examples = []
+    examples = []#['all']
+
     if 'all' in examples:
         w = Diffusion()
+
+        # Wiener Process
+        # ^^^^^^^^^^^^^^
+
         ws = w.simulateW(1000, nrepl=nrepl)
         if doplot:
             plt.figure()
             tmp = plt.plot(ws[0].T)
             tmp = plt.plot(ws[0].mean(0), linewidth=2)
             plt.title('Standard Brownian Motion (Wiener Process)')
-        func = lambda t, W: np.exp(t + 0.5 * W)
+
+        func = lambda t, W: np.exp(t + 0.5*W)
         us = w.expectedsim(func, nobs=500, nrepl=nrepl)
         if doplot:
             plt.figure()
             tmp = plt.plot(us[0].T)
             tmp = plt.plot(us[1], linewidth=2)
             plt.title('Brownian Motion - exp')
-        averr = np.linalg.norm(us[1] - np.exp(9 * us[2] / 8.0), np.inf)
+        #plt.show()
+        averr = np.linalg.norm(us[1] - np.exp(9*us[2]/8.), np.inf)
         print(averr)
-        gb = GeometricBrownian(xzero=1.0, mu=0.01, sigma=0.5)
+        #print us[1][:10]
+        #print np.exp(9.*us[2][:10]/8.)
+
+        # Geometric Brownian
+        # ^^^^^^^^^^^^^^^^^^
+
+        gb = GeometricBrownian(xzero=1., mu=0.01, sigma=0.5)
         gbs = gb.simEM(nobs=100, nrepl=100)
         if doplot:
             plt.figure()
@@ -310,6 +511,7 @@ if __name__ == '__main__':
             tmp = plt.plot(np.log(gbs).T)
             tmp = plt.plot(np.log(gbs.mean(0)), linewidth=2)
             plt.title('Geometric Brownian - log-transformed')
+
         ab = ArithmeticBrownian(xzero=1, mu=0.05, sigma=1)
         abs = ab.simEM(nobs=100, nrepl=100)
         if doplot:
@@ -317,20 +519,29 @@ if __name__ == '__main__':
             tmp = plt.plot(abs.T)
             tmp = plt.plot(abs.mean(0), linewidth=2)
             plt.title('Arithmetic Brownian')
+
+        # Ornstein-Uhlenbeck
+        # ^^^^^^^^^^^^^^^^^^
+
         ou = OUprocess(xzero=2, mu=1, lambd=0.5, sigma=0.1)
         ous = ou.simEM()
-        oue = ou.exact(1, 1, np.random.normal(size=(5, 10)))
-        ou.exact(0, np.linspace(0, 10, 10 / 0.1), 0)
-        ou.exactprocess(0, 10)
-        print(ou.exactprocess(0, 10, ddt=0.1, nrepl=10).mean(0))
-        oues = ou.exactprocess(0, 100, ddt=0.1, nrepl=100)
+        oue = ou.exact(1, 1, np.random.normal(size=(5,10)))
+        ou.exact(0, np.linspace(0, 10, 100), 0)
+        ou.exactprocess(0,10)
+        print(ou.exactprocess(0,10, ddt=0.1,nrepl=10).mean(0))
+        #the following looks good, approaches mu
+        oues = ou.exactprocess(0,100, ddt=0.1,nrepl=100)
         if doplot:
             plt.figure()
             tmp = plt.plot(oues.T)
             tmp = plt.plot(oues.mean(0), linewidth=2)
             plt.title('Ornstein-Uhlenbeck')
+
+        # SchwartsOne
+        # ^^^^^^^^^^^
+
         so = SchwartzOne(xzero=0, mu=1, kappa=0.5, sigma=0.1)
-        sos = so.exactprocess(0, 50, ddt=0.1, nrepl=100)
+        sos = so.exactprocess(0,50, ddt=0.1,nrepl=100)
         print(sos.mean(0))
         print(np.log(sos.mean(0)))
         doplot = 1
@@ -339,28 +550,43 @@ if __name__ == '__main__':
             tmp = plt.plot(sos.T)
             tmp = plt.plot(sos.mean(0), linewidth=2)
             plt.title('Schwartz One')
-        print(so.fitls(sos[0, :], dt=0.1))
-        sos2 = so.exactprocess(0, 500, ddt=0.1, nrepl=5)
+        print(so.fitls(sos[0,:],dt=0.1))
+        sos2 = so.exactprocess(0,500, ddt=0.1,nrepl=5)
         print('true: mu=1, kappa=0.5, sigma=0.1')
         for i in range(5):
-            print(so.fitls(sos2[i], dt=0.1))
+            print(so.fitls(sos2[i],dt=0.1))
+
+
+
+        # Brownian Bridge
+        # ^^^^^^^^^^^^^^^
+
         bb = BrownianBridge()
-        bbs, t, wm = bb.simulate(0, 0.5, 99, nrepl=500, ddt=1.0, sigma=0.1)
+        #bbs = bb.sample(x0, x1, nobs, nrepl=1, ddt=1., sigma=1.)
+        bbs, t, wm = bb.simulate(0, 0.5, 99, nrepl=500, ddt=1., sigma=0.1)
         if doplot:
             plt.figure()
             tmp = plt.plot(bbs.T)
             tmp = plt.plot(bbs.mean(0), linewidth=2)
             plt.title('Brownian Bridge')
             plt.figure()
-            plt.plot(wm, 'r', label='theoretical')
+            plt.plot(wm,'r', label='theoretical')
             plt.plot(bbs.std(0), label='simulated')
             plt.title('Brownian Bridge - Variance')
             plt.legend()
-    cp = CompoundPoisson([1, 1], [np.random.normal, np.random.normal])
-    cps = cp.simulate(nobs=20000, nrepl=3)
+
+    # Compound Poisson
+    # ^^^^^^^^^^^^^^^^
+    cp = CompoundPoisson([1,1], [np.random.normal,np.random.normal])
+    cps = cp.simulate(nobs=20000,nrepl=3)
     print(cps[0].sum(-1).sum(-1))
     print(cps[0].sum())
     print(cps[0].mean(-1).mean(-1))
     print(cps[0].mean())
     print(cps[1].size)
     print(cps[1].sum())
+    #Note Y = sum^{N} X is compound poisson of iid x, then
+    #E(Y) = E(N)*E(X)   eg. eq. (6.37) page 385 in http://ee.stanford.edu/~gray/sp.html
+
+
+    #plt.show()
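
The exact discretisation behind OUprocess.exactprocess is the AR(1) recursion
x_{t+ddt} = exp(-lambd*ddt) * x_t + mu*(1 - exp(-lambd*ddt))
            + sigma*sqrt((1 - exp(-2*lambd*ddt)) / (2*lambd)) * eps_t,
which is exactly what fitls inverts. A small sanity sketch (illustrative; it assumes
the module is importable as below, and the estimates vary with the random draws):

    from statsmodels.sandbox.tsa.diffusion import OUprocess

    ou = OUprocess(xzero=0, mu=1, lambd=0.5, sigma=0.1)
    xs = ou.exactprocess(0, 5000, ddt=0.1, nrepl=1)
    print(ou.fitls(xs[0], dt=0.1))   # roughly (mu, lambd, sigma) = (1, 0.5, 0.1)
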
diff --git a/statsmodels/sandbox/tsa/diffusion2.py b/statsmodels/sandbox/tsa/diffusion2.py
index ce4bfcc20..a48f72734 100644
--- a/statsmodels/sandbox/tsa/diffusion2.py
+++ b/statsmodels/sandbox/tsa/diffusion2.py
@@ -81,12 +81,14 @@ CumS is empty array, Events == -1


 """
+
+
 import numpy as np
+#from scipy import stats  # currently only uses np.random
 import matplotlib.pyplot as plt

-
 class JumpDiffusionMerton:
-    """
+    '''

     Example
     -------
@@ -103,151 +105,398 @@ class JumpDiffusionMerton:
     plt.title('Merton jump-diffusion')


-    """
+    '''

     def __init__(self):
         pass


+    def simulate(self, m,s,lambd,a,D,ts,nrepl):
+
+        T = ts[-1]  # time points
+        # simulate number of jumps
+        n_jumps = np.random.poisson(lambd*T, size=(nrepl, 1))
+
+        jumps=[]
+        nobs=len(ts)
+        jumps=np.zeros((nrepl,nobs))
+        for j in range(nrepl):
+            # simulate jump arrival time
+            t = T*np.random.rand(n_jumps[j])  # uniform jump arrival times on (0, T)
+            t = np.sort(t,0)
+
+            # simulate jump size
+            S = a + D*np.random.randn(n_jumps[j],1)
+
+            # put things together
+            CumS = np.cumsum(S)
+            jumps_ts = np.zeros(nobs)
+            for n in range(nobs):
+                Events = np.sum(t<=ts[n])-1
+                #print n, Events, CumS.shape, jumps_ts.shape
+                jumps_ts[n]=0
+                if Events > 0:
+                    jumps_ts[n] = CumS[Events] #TODO: out of bounds see top
+
+            #jumps = np.column_stack((jumps, jumps_ts))  #maybe wrong transl
+            jumps[j,:] = jumps_ts
+
+
+        D_Diff = np.zeros((nrepl,nobs))
+        for k in range(nobs):
+            Dt=ts[k]
+            if k > 0:
+                Dt=ts[k]-ts[k-1]
+            D_Diff[:,k]=m*Dt + s*np.sqrt(Dt)*np.random.randn(nrepl)
+
+        x = np.hstack((np.zeros((nrepl,1)),np.cumsum(D_Diff,1)+jumps))
+
+        return x
+
 class JumpDiffusionKou:

     def __init__(self):
         pass

+    def simulate(self, m,s,lambd,p,e1,e2,ts,nrepl):
+
+        T=ts[-1]
+        # simulate number of jumps
+        N = np.random.poisson(lambd*T,size =(nrepl,1))
+
+        jumps=[]
+        nobs=len(ts)
+        jumps=np.zeros((nrepl,nobs))
+        for j in range(nrepl):
+            # simulate jump arrival time
+            t=T*np.random.rand(N[j])
+            t=np.sort(t)
+
+            # simulate jump size
+            ww = np.random.binomial(1, p, size=(N[j]))
+            S = ww * np.random.exponential(e1, size=(N[j])) - \
+                (1-ww) * np.random.exponential(e2, N[j])
+
+            # put things together
+            CumS = np.cumsum(S)
+            jumps_ts = np.zeros(nobs)
+            for n in range(nobs):
+                Events = sum(t<=ts[n])-1
+                jumps_ts[n]=0
+                if Events:
+                    jumps_ts[n]=CumS[Events]
+
+            jumps[j,:] = jumps_ts
+
+        D_Diff = np.zeros((nrepl,nobs))
+        for k in range(nobs):
+            Dt=ts[k]
+            if k > 0:
+                Dt=ts[k]-ts[k-1]
+
+            D_Diff[:,k]=m*Dt + s*np.sqrt(Dt)*np.random.normal(size=nrepl)
+
+        x = np.hstack((np.zeros((nrepl,1)),np.cumsum(D_Diff,1)+jumps))
+        return x
+

 class VG:
-    """variance gamma process
-    """
+    '''variance gamma process
+    '''

     def __init__(self):
         pass

+    def simulate(self, m,s,kappa,ts,nrepl):
+
+        T=len(ts)
+        dXs = np.zeros((nrepl,T))
+        for t in range(T):
+            dt = ts[0] - 0
+            if t > 0:
+                dt = ts[t] - ts[t-1]
+
+            #print dt/kappa
+            #TODO: check parameterization of gamrnd, checked looks same as np
+
+            d_tau = kappa * np.random.gamma(dt/kappa,1.,size=(nrepl))
+            #print s*np.sqrt(d_tau)
+            # this raises exception:
+            #dX = stats.norm.rvs(m*d_tau,(s*np.sqrt(d_tau)))
+            # np.random.normal requires scale >0
+            dX = np.random.normal(loc=m*d_tau, scale=1e-6+s*np.sqrt(d_tau))
+
+            dXs[:,t] = dX
+
+        x = np.cumsum(dXs,1)
+        return x

 class IG:
-    """inverse-Gaussian ??? used by NIG
-    """
+    '''inverse-Gaussian ??? used by NIG
+    '''

     def __init__(self):
         pass

+    def simulate(self, l,m,nrepl):
+
+        N = np.random.randn(nrepl,1)
+        Y = N**2
+        X = m + (.5*m*m/l)*Y - (.5*m/l)*np.sqrt(4*m*l*Y+m*m*(Y**2))
+        U = np.random.rand(nrepl,1)
+
+        ind = U>m/(X+m)
+        X[ind] = m*m/X[ind]
+        return X.ravel()
+

 class NIG:
-    """normal-inverse-Gaussian
-    """
+    '''normal-inverse-Gaussian
+    '''

     def __init__(self):
         pass

+    def simulate(self, th,k,s,ts,nrepl):
+
+        T = len(ts)
+        DXs = np.zeros((nrepl,T))
+        for t in range(T):
+            Dt = ts[0] - 0
+            if t > 0:
+                Dt = ts[t] - ts[t-1]
+
+            lfrac = 1/k*(Dt**2)
+            m = Dt
+            DS = IG().simulate(lfrac, m, nrepl)
+            N = np.random.randn(nrepl)
+
+            DX = s*N*np.sqrt(DS) + th*DS
+            #print DS.shape, DX.shape, DXs.shape
+            DXs[:,t] = DX
+
+        x = np.cumsum(DXs,1)
+        return x

 class Heston:
-    """Heston Stochastic Volatility
-    """
+    '''Heston Stochastic Volatility
+    '''

     def __init__(self):
         pass

+    def simulate(self, m, kappa, eta,lambd,r, ts, nrepl,tratio=1.):
+        T = ts[-1]
+        nobs = len(ts)
+        dt = np.zeros(nobs) #/tratio
+        dt[0] = ts[0]-0
+        dt[1:] = np.diff(ts)
+
+        DXs = np.zeros((nrepl,nobs))
+
+        dB_1 = np.sqrt(dt) * np.random.randn(nrepl,nobs)
+        dB_2u = np.sqrt(dt) * np.random.randn(nrepl,nobs)
+        dB_2 = r*dB_1 + np.sqrt(1-r**2)*dB_2u
+
+        vt = eta*np.ones(nrepl)
+        v=[]
+        dXs = np.zeros((nrepl,nobs))
+        vts = np.zeros((nrepl,nobs))
+        for t in range(nobs):
+            dv = kappa*(eta-vt)*dt[t]+ lambd*np.sqrt(vt)*dB_2[:,t]
+            dX = m*dt[t] + np.sqrt(vt*dt[t]) * dB_1[:,t]
+            vt = vt + dv
+
+            vts[:,t] = vt
+            dXs[:,t] = dX
+
+        x = np.cumsum(dXs,1)
+        return x, vts

 class CIRSubordinatedBrownian:
-    """CIR subordinated Brownian Motion
-    """
+    '''CIR subordinated Brownian Motion
+    '''

     def __init__(self):
         pass

+    def simulate(self, m, kappa, T_dot,lambd,sigma, ts, nrepl):
+        T = ts[-1]
+        nobs = len(ts)
+        dtarr = np.zeros(nobs) #/tratio
+        dtarr[0] = ts[0]-0
+        dtarr[1:] = np.diff(ts)
+
+        DXs = np.zeros((nrepl,nobs))
+
+        dB = np.sqrt(dtarr) * np.random.randn(nrepl,nobs)
+
+        yt = 1.
+        dXs = np.zeros((nrepl,nobs))
+        dtaus = np.zeros((nrepl,nobs))
+        y = np.zeros((nrepl,nobs))
+        for t in range(nobs):
+            dt = dtarr[t]
+            dy = kappa*(T_dot-yt)*dt + lambd*np.sqrt(yt)*dB[:,t]
+            yt = np.maximum(yt+dy,1e-10) # keep away from zero ?
+
+            dtau = np.maximum(yt*dt, 1e-6)
+            dX = np.random.normal(loc=m*dtau, scale=sigma*np.sqrt(dtau))
+
+            y[:,t] = yt
+            dtaus[:,t] = dtau
+            dXs[:,t] = dX
+
+        tau = np.cumsum(dtaus,1)
+        x = np.cumsum(dXs,1)
+        return x, tau, y
+
+def schout2contank(a,b,d):
+
+    th = d*b/np.sqrt(a**2-b**2)
+    k = 1/(d*np.sqrt(a**2-b**2))
+    s = np.sqrt(d/np.sqrt(a**2-b**2))
+    return th,k,s
+

 if __name__ == '__main__':
-    nobs = 252.0
-    ts = np.linspace(1.0 / nobs, 1.0, nobs)
-    nrepl = 5
-    mu = 0.01
-    sigma = 0.02
-    lambd = 3.45 * 10
-    a = 0
-    D = 0.2
+
+    #Merton Jump Diffusion
+    #^^^^^^^^^^^^^^^^^^^^^
+
+    # grid of time values at which the process is evaluated
+    #("0" will be added, too)
+    nobs = 252   # number of grid points; np.linspace needs an integer count
+    ts  = np.linspace(1./nobs, 1., nobs)
+    nrepl=5 # number of simulations
+    mu=.010     # deterministic drift
+    sigma = .020 # Gaussian component
+    lambd = 3.45 *10 # Poisson process arrival rate
+    a=0 # drift of log-jump
+    D=.2 # st.dev of log-jump
     jd = JumpDiffusionMerton()
-    x = jd.simulate(mu, sigma, lambd, a, D, ts, nrepl)
+    x = jd.simulate(mu,sigma,lambd,a,D,ts,nrepl)
     plt.figure()
-    plt.plot(x.T)
+    plt.plot(x.T) #Todo
     plt.title('Merton jump-diffusion')
+
     sigma = 0.2
     lambd = 3.45
-    x = jd.simulate(mu, sigma, lambd, a, D, ts, nrepl)
+    x = jd.simulate(mu,sigma,lambd,a,D,ts,nrepl)
     plt.figure()
-    plt.plot(x.T)
+    plt.plot(x.T) #Todo
     plt.title('Merton jump-diffusion')
-    mu = 0.0
-    lambd = 4.25
-    p = 0.5
-    e1 = 0.2
-    e2 = 0.3
-    sig = 0.2
-    x = JumpDiffusionKou().simulate(mu, sig, lambd, p, e1, e2, ts, nrepl)
+
+    #Kou jump diffusion
+    #^^^^^^^^^^^^^^^^^^
+
+    mu=.0 # deterministic drift
+    lambd=4.25 # Poisson process arrival rate
+    p=.5 # prob. of up-jump
+    e1=.2 # parameter of up-jump
+    e2=.3 # parameter of down-jump
+    sig=.2 # Gaussian component
+
+    x = JumpDiffusionKou().simulate(mu,sig,lambd,p,e1,e2,ts,nrepl)
+
     plt.figure()
-    plt.plot(x.T)
+    plt.plot(x.T) #Todo
     plt.title('double exponential (Kou jump diffusion)')
-    mu = 0.1
-    kappa = 1.0
-    sig = 0.5
-    x = VG().simulate(mu, sig, kappa, ts, nrepl)
+
+    #variance-gamma
+    #^^^^^^^^^^^^^^
+    mu = .1     # deterministic drift in subordinated Brownian motion
+    kappa = 1. #10. #1   # inverse for gamma shape parameter
+    sig = 0.5 #.2    # s.dev in subordinated Brownian motion
+
+    x = VG().simulate(mu,sig,kappa,ts,nrepl)
     plt.figure()
-    plt.plot(x.T)
+    plt.plot(x.T) #Todo
     plt.title('variance gamma')
+
+
+    #normal-inverse-Gaussian
+    #^^^^^^^^^^^^^^^^^^^^^^^
+
+    # (Schoutens notation)
     al = 2.1
     be = 0
     de = 1
-    th, k, s = schout2contank(al, be, de)
-    x = NIG().simulate(th, k, s, ts, nrepl)
+    # convert parameters to Cont-Tankov notation
+    th,k,s = schout2contank(al,be,de)
+
+    x = NIG().simulate(th,k,s,ts,nrepl)
+
     plt.figure()
-    plt.plot(x.T)
+    plt.plot(x.T) #Todo  x-axis
     plt.title('normal-inverse-Gaussian')
-    m = 0.0
-    kappa = 0.6
-    eta = 0.3 ** 2
-    lambd = 0.25
-    r = -0.7
-    T = 20.0
-    nobs = 252.0 * T
-    tsh = np.linspace(T / nobs, T, nobs)
-    x, vts = Heston().simulate(m, kappa, eta, lambd, r, tsh, nrepl, tratio=20.0
-        )
+
+    #Heston Stochastic Volatility
+    #^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    m=.0
+    kappa = .6  # 2*Kappa*Eta>Lambda^2
+    eta = .3**2
+    lambd =.25
+    r = -.7
+    T = 20.
+    nobs = int(252*T)   # np.linspace needs an integer count
+    tsh  = np.linspace(T/nobs, T, nobs)
+    x, vts = Heston().simulate(m,kappa, eta,lambd,r, tsh, nrepl, tratio=20.)
+
     plt.figure()
     plt.plot(x.T)
     plt.title('Heston Stochastic Volatility')
+
     plt.figure()
     plt.plot(np.sqrt(vts).T)
     plt.title('Heston Stochastic Volatility - CIR Vol.')
+
     plt.figure()
-    plt.subplot(2, 1, 1)
+    plt.subplot(2,1,1)
     plt.plot(x[0])
     plt.title('Heston Stochastic Volatility process')
-    plt.subplot(2, 1, 2)
+    plt.subplot(2,1,2)
     plt.plot(np.sqrt(vts[0]))
     plt.title('CIR Volatility')
-    m = 0.1
-    sigma = 0.4
-    kappa = 0.6
-    T_dot = 1
-    lambd = 1
-    T = 10.0
-    nobs = 252.0 * T
-    tsh = np.linspace(T / nobs, T, nobs)
-    x, tau, y = CIRSubordinatedBrownian().simulate(m, kappa, T_dot, lambd,
-        sigma, tsh, nrepl)
+
+
+    #CIR subordinated Brownian
+    #^^^^^^^^^^^^^^^^^^^^^^^^^
+    m=.1
+    sigma=.4
+
+    kappa=.6  # 2*Kappa*T_dot>Lambda^2
+    T_dot=1
+    lambd=1
+    #T=252*10
+    #dt=1/252
+    #nrepl=2
+    T = 10.
+    nobs = int(252*T)
+    tsh  = np.linspace(T/nobs, T, nobs)
+    x, tau, y = CIRSubordinatedBrownian().simulate(m, kappa, T_dot,lambd,sigma, tsh, nrepl)
+
     plt.figure()
     plt.plot(tsh, x.T)
     plt.title('CIRSubordinatedBrownian process')
+
     plt.figure()
     plt.plot(tsh, y.T)
     plt.title('CIRSubordinatedBrownian - CIR')
+
     plt.figure()
     plt.plot(tsh, tau.T)
     plt.title('CIRSubordinatedBrownian - stochastic time ')
+
     plt.figure()
-    plt.subplot(2, 1, 1)
+    plt.subplot(2,1,1)
     plt.plot(tsh, x[0])
     plt.title('CIRSubordinatedBrownian process')
-    plt.subplot(2, 1, 2)
+    plt.subplot(2,1,2)
     plt.plot(tsh, y[0], label='CIR')
     plt.plot(tsh, tau[0], label='stoch. time')
     plt.legend(loc='upper left')
     plt.title('CIRSubordinatedBrownian')
+
+    #plt.show()
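
For the normal-inverse-Gaussian example, the schout2contank conversion has a simple
closed form: with a = 2.1, b = 0, d = 1 we get sqrt(a**2 - b**2) = 2.1, hence
th = d*b/2.1 = 0, k = 1/(d*2.1) ≈ 0.476 and s = sqrt(d/2.1) ≈ 0.690, which are the
(th, k, s) values passed to NIG().simulate above. A one-line check:

    print(schout2contank(2.1, 0, 1))   # approximately (0.0, 0.476, 0.690)
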
diff --git a/statsmodels/sandbox/tsa/example_arma.py b/statsmodels/sandbox/tsa/example_arma.py
index de25dbf7e..a4e1754d2 100644
--- a/statsmodels/sandbox/tsa/example_arma.py
+++ b/statsmodels/sandbox/tsa/example_arma.py
@@ -1,94 +1,212 @@
-"""trying to verify theoretical acf of arma
+'''trying to verify theoretical acf of arma

 explicit functions for autocovariance functions of ARIMA(1,1), MA(1), MA(2)
 plus 3 functions from nitime.utils

-"""
+'''
 import numpy as np
 from numpy.testing import assert_array_almost_equal
 import matplotlib.pyplot as plt
+
 from statsmodels import regression
 from statsmodels.tsa.arima_process import arma_generate_sample, arma_impulse_response
 from statsmodels.tsa.arima_process import arma_acovf, arma_acf
 from statsmodels.tsa.arima.model import ARIMA
 from statsmodels.tsa.stattools import acf, acovf
 from statsmodels.graphics.tsaplots import plot_acf
-ar = [1.0, -0.6]
-ma = [1.0, 0.4]
-mod = ''
+
+ar = [1., -0.6]
+#ar = [1., 0.]
+ma = [1., 0.4]
+#ma = [1., 0.4, 0.6]
+#ma = [1., 0.]
+mod = ''#'ma2'
 x = arma_generate_sample(ar, ma, 5000)
 x_acf = acf(x)[:10]
 x_ir = arma_impulse_response(ar, ma)

+#print x_acf[:10]
+#print x_ir[:10]
+#irc2 = np.correlate(x_ir,x_ir,'full')[len(x_ir)-1:]
+#print irc2[:10]
+#print irc2[:10]/irc2[0]
+#print irc2[:10-1] / irc2[1:10]
+#print x_acf[:10-1] / x_acf[1:10]
+
+# detrend helper from matplotlib.mlab
+def detrend(x, key=None):
+    if key is None or key=='constant':
+        return detrend_mean(x)
+    elif key=='linear':
+        return detrend_linear(x)
+    else:
+        raise ValueError('unknown detrend key %r' % (key,))

 def demean(x, axis=0):
-    """Return x minus its mean along the specified axis"""
-    pass
-
+    "Return x minus its mean along the specified axis"
+    x = np.asarray(x)
+    if axis:
+        ind = [slice(None)] * axis
+        ind.append(np.newaxis)
+        return x - x.mean(axis)[tuple(ind)]
+    return x - x.mean(axis)

 def detrend_mean(x):
-    """Return x minus the mean(x)"""
-    pass
-
+    "Return x minus the mean(x)"
+    return x - x.mean()

 def detrend_none(x):
-    """Return x: no detrending"""
-    pass
-
+    "Return x: no detrending"
+    return x

 def detrend_linear(y):
-    """Return y minus best fit line; 'linear' detrending """
-    pass
-
+    "Return y minus best fit line; 'linear' detrending "
+    # This is faster than an algorithm based on linalg.lstsq.
+    x = np.arange(len(y), dtype=np.float_)
+    C = np.cov(x, y, bias=1)
+    b = C[0,1]/C[0,0]
+    a = y.mean() - b*x.mean()
+    return y - (b*x + a)

 def acovf_explicit(ar, ma, nobs):
-    """add correlation of MA representation explicitely
+    '''add correlation of MA representation explicitly
+
+    '''
+    ir = arma_impulse_response(ar, ma)
+    acovfexpl = [np.dot(ir[:nobs-t], ir[t:nobs]) for t in range(10)]
+    return acovfexpl
+
+def acovf_arma11(ar, ma):
+    # ARMA(1,1)
+    # Florens et al page 278
+    # wrong result ?
+    # new calculation bigJudge p 311, now the same
+    a = -ar[1]
+    b = ma[1]
+    #rho = [1.]
+    #rho.append((1-a*b)*(a-b)/(1.+a**2-2*a*b))
+    rho = [(1.+b**2+2*a*b)/(1.-a**2)]
+    rho.append((1+a*b)*(a+b)/(1.-a**2))
+    for _ in range(8):
+        last = rho[-1]
+        rho.append(a*last)
+    return np.array(rho)
+
+#    print acf11[:10]
+#    print acf11[:10] /acf11[0]
+
+def acovf_ma2(ma):
+    # MA(2)
+    # from Greene p616 (with typo), Florens p280
+    b1 = -ma[1]
+    b2 = -ma[2]
+    rho = np.zeros(10)
+    rho[0] = (1 + b1**2 + b2**2)
+    rho[1] = (-b1 + b1*b2)
+    rho[2] = -b2
+    return rho
+
+#    rho2 = rho/rho[0]
+#    print rho2
+#    print irc2[:10]/irc2[0]
+
+def acovf_ma1(ma):
+    # MA(1)
+    # from Greene p616 (with typo), Florens p280
+    b = -ma[1]
+    rho = np.zeros(10)
+    rho[0] = (1 + b**2)
+    rho[1] = -b
+    return rho
+
+#    rho2 = rho/rho[0]
+#    print rho2
+#    print irc2[:10]/irc2[0]
+
+
+ar1 = [1., -0.8]
+ar0 = [1., 0.]
+ma1 = [1., 0.4]
+ma2 = [1., 0.4, 0.6]
+ma0 = [1., 0.]
+
+comparefn = dict(
+        [('ma1', acovf_ma1),
+        ('ma2', acovf_ma2),
+        ('arma11', acovf_arma11),
+        ('ar1', acovf_arma11)])
+
+cases = [('ma1', (ar0, ma1)),
+        ('ma2', (ar0, ma2)),
+        ('arma11', (ar1, ma1)),
+        ('ar1', (ar1, ma0))]

-    """
-    pass
-
-
-ar1 = [1.0, -0.8]
-ar0 = [1.0, 0.0]
-ma1 = [1.0, 0.4]
-ma2 = [1.0, 0.4, 0.6]
-ma0 = [1.0, 0.0]
-comparefn = dict([('ma1', acovf_ma1), ('ma2', acovf_ma2), ('arma11',
-    acovf_arma11), ('ar1', acovf_arma11)])
-cases = [('ma1', (ar0, ma1)), ('ma2', (ar0, ma2)), ('arma11', (ar1, ma1)),
-    ('ar1', (ar1, ma0))]
 for c, args in cases:
+
     ar, ma = args
     print('')
     print(c, ar, ma)
     myacovf = arma_acovf(ar, ma, nobs=10)
     myacf = arma_acf(ar, ma, lags=10)
-    if c[:2] == 'ma':
+    if c[:2]=='ma':
         othacovf = comparefn[c](ma)
     else:
         othacovf = comparefn[c](ar, ma)
     print(myacovf[:5])
     print(othacovf[:5])
-    assert_array_almost_equal(myacovf, othacovf, 10)
-    assert_array_almost_equal(myacf, othacovf / othacovf[0], 10)
-
-
+    #something broke again,
+    #for high persistence case eg ar=0.99, nobs of IR has to be large
+    #made changes to arma_acovf
+    assert_array_almost_equal(myacovf, othacovf,10)
+    assert_array_almost_equal(myacf, othacovf/othacovf[0],10)
+
+
+#from nitime.utils
+def ar_generator(N=512, sigma=1.):
+    # this generates a signal u(n) = a1*u(n-1) + a2*u(n-2) + ... + v(n)
+    # where v(n) is a stationary stochastic process with zero mean
+    # and variance = sigma
+    # this sequence is shown to be estimated well by an order 8 AR system
+    taps = np.array([2.7607, -3.8106, 2.6535, -0.9238])
+    v = np.random.normal(size=N, scale=sigma**0.5)
+    u = np.zeros(N)
+    P = len(taps)
+    for l in range(P):
+        u[l] = v[l] + np.dot(u[:l][::-1], taps[:l])
+    for l in range(P,N):
+        u[l] = v[l] + np.dot(u[l-P:l][::-1], taps)
+    return u, v, taps
+
+#JP: small differences to using np.correlate, because assumes mean(s)=0
+#    denominator is N, not N-k, biased estimator
+#    misnomer: (biased) autocovariance not autocorrelation
+#from nitime.utils
 def autocorr(s, axis=-1):
     """Returns the autocorrelation of signal s at all lags. Adheres to the
 definition r(k) = E{s(n)s*(n-k)} where E{} is the expectation operator.
 """
-    pass
-
-
-def norm_corr(x, y, mode='valid'):
+    N = s.shape[axis]
+    S = np.fft.fft(s, n=2*N-1, axis=axis)
+    sxx = np.fft.ifft(S*S.conjugate(), axis=axis).real[:N]
+    return sxx/N
+
+#JP: with valid this returns a single value, if x and y have same length
+#   e.g. norm_corr(x, x)
+#   using std subtracts mean, but correlate does not, requires means are exactly 0
+#   biased, no n-k correction for laglength
+#from nitime.utils
+def norm_corr(x,y,mode = 'valid'):
     """Returns the correlation between two ndarrays, by calling np.correlate in
 'same' mode and normalizing the result by the std of the arrays and by
 their lengths. This results in a correlation = 1 for an auto-correlation"""
-    pass
+
+    return ( np.correlate(x,y,mode) /
+             (np.std(x)*np.std(y)*(x.shape[-1])) )


+
+# from matplotlib axes.py
+# note: self is axis
 def pltacorr(self, x, **kwargs):
-    """
+    r"""
     call signature::

         acorr(x, normed=True, detrend=detrend_none, usevlines=True,
@@ -122,7 +240,7 @@ def pltacorr(self, x, **kwargs):

     *maxlags* is a positive integer detailing the number of lags
     to show.  The default value of *None* will return all
-    :math:`2 \\mathrm{len}(x) - 1` lags.
+    :math:`2 \mathrm{len}(x) - 1` lags.

     The return value is a tuple (*lags*, *c*, *linecol*, *b*)
     where
@@ -147,11 +265,10 @@ def pltacorr(self, x, **kwargs):

     .. plot:: mpl_examples/pylab_examples/xcorr_demo.py
     """
-    pass
+    return self.xcorr(x, x, **kwargs)

-
-def pltxcorr(self, x, y, normed=True, detrend=detrend_none, usevlines=True,
-    maxlags=10, **kwargs):
+def pltxcorr(self, x, y, normed=True, detrend=detrend_none,
+          usevlines=True, maxlags=10, **kwargs):
     """
     call signature::

@@ -204,33 +321,87 @@ def pltxcorr(self, x, y, normed=True, detrend=detrend_none, usevlines=True,

     .. plot:: mpl_examples/pylab_examples/xcorr_demo.py
     """
-    pass
+
+
+    Nx = len(x)
+    if Nx!=len(y):
+        raise ValueError('x and y must be equal length')
+
+    x = detrend(np.asarray(x))
+    y = detrend(np.asarray(y))
+
+    c = np.correlate(x, y, mode=2)
+
+    if normed:
+        c /= np.sqrt(np.dot(x, x) * np.dot(y, y))
+
+    if maxlags is None:
+        maxlags = Nx - 1
+
+    if maxlags >= Nx or maxlags < 1:
+        raise ValueError('maxlags must be None or strictly '
+                         'positive < %d' % Nx)
+
+    lags = np.arange(-maxlags,maxlags+1)
+    c = c[Nx-1-maxlags:Nx+maxlags]
+
+
+    if usevlines:
+        a = self.vlines(lags, [0], c, **kwargs)
+        b = self.axhline(**kwargs)
+        kwargs.setdefault('marker', 'o')
+        kwargs.setdefault('linestyle', 'None')
+        d = self.plot(lags, c, **kwargs)
+    else:
+
+        kwargs.setdefault('marker', 'o')
+        kwargs.setdefault('linestyle', 'None')
+        a, = self.plot(lags, c, **kwargs)
+        b = None
+    return lags, c, a, b
+
+
+
+


 arrvs = ar_generator()
+##arma = ARIMA()
+##res = arma.fit(arrvs[0], 4, 0)
 arma = ARIMA(arrvs[0])
-res = arma.fit((4, 0, 0))
+res = arma.fit((4,0, 0))
+
 print(res[0])
+
 acf1 = acf(arrvs[0])
 acovf1b = acovf(arrvs[0], unbiased=False)
 acf2 = autocorr(arrvs[0])
-acf2m = autocorr(arrvs[0] - arrvs[0].mean())
+acf2m = autocorr(arrvs[0]-arrvs[0].mean())
 print(acf1[:10])
 print(acovf1b[:10])
 print(acf2[:10])
 print(acf2m[:10])
+
+
 x = arma_generate_sample([1.0, -0.8], [1.0], 500)
 print(acf(x)[:20])
 print(regression.yule_walker(x, 10))
+
+#ax = plt.axes()
 plt.plot(x)
+#plt.show()
+
 plt.figure()
-pltxcorr(plt, x, x)
+pltxcorr(plt,x,x)
 plt.figure()
-pltxcorr(plt, x, x, usevlines=False)
+pltxcorr(plt,x,x, usevlines=False)
 plt.figure()
+#FIXME: plotacf was moved to graphics/tsaplots.py, and interface changed
 plot_acf(plt, acf1[:20], np.arange(len(acf1[:20])), usevlines=True)
 plt.figure()
 ax = plt.subplot(211)
 plot_acf(ax, acf1[:20], usevlines=True)
 ax = plt.subplot(212)
 plot_acf(ax, acf1[:20], np.arange(len(acf1[:20])), usevlines=False)
+
+#plt.show()
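
As a concrete instance of the explicit formulas above: for the MA(1) case with
ma = [1., 0.4] and unit innovation variance, acovf_ma1 gives gamma_0 = 1 + 0.4**2 = 1.16
and gamma_1 = 0.4, so the lag-1 autocorrelation is 0.4/1.16 ≈ 0.345 and all higher lags
are zero. The assert checks above verify that arma_acovf reproduces these values:

    print(acovf_ma1([1., 0.4])[:3])                  # [1.16, 0.4, 0.]
    print(arma_acovf([1., 0.], [1., 0.4], nobs=3))   # same numbers from the general routine
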
diff --git a/statsmodels/sandbox/tsa/examples/ex_mle_arma.py b/statsmodels/sandbox/tsa/examples/ex_mle_arma.py
index 0e13f4f0f..3f5e7c014 100644
--- a/statsmodels/sandbox/tsa/examples/ex_mle_arma.py
+++ b/statsmodels/sandbox/tsa/examples/ex_mle_arma.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 TODO: broken because of changes to arguments and import paths
 fixing this needs a closer look
@@ -7,34 +8,44 @@ Author: josef-pktd
 copyright: Simplified BSD see license.txt
 """
 import numpy as np
+
 import numdifftools as ndt
+
 from statsmodels.sandbox import tsa
-from statsmodels.tsa.arma_mle import Arma
+from statsmodels.tsa.arma_mle import Arma  # local import
 from statsmodels.tsa.arima_process import arma_generate_sample
+
 examples = ['arma']
 if 'arma' in examples:
-    print('\nExample 1')
+
+    print("\nExample 1")
     print('----------')
     ar = [1.0, -0.8]
-    ma = [1.0, 0.5]
-    y1 = arma_generate_sample(ar, ma, 1000, 0.1)
-    y1 -= y1.mean()
+    ma = [1.0,  0.5]
+    y1 = arma_generate_sample(ar,ma,1000,0.1)
+    y1 -= y1.mean() #no mean correction/constant in estimation so far
+
     arma1 = Arma(y1)
     arma1.nar = 1
     arma1.nma = 1
-    arma1res = arma1.fit_mle(order=(1, 1), method='fmin')
+    arma1res = arma1.fit_mle(order=(1,1), method='fmin')
     print(arma1res.params)
+
+    #Warning need new instance otherwise results carry over
     arma2 = Arma(y1)
     arma2.nar = 1
     arma2.nma = 1
     res2 = arma2.fit(method='bfgs')
     print(res2.params)
     print(res2.model.hessian(res2.params))
-    print(ndt.Hessian(arma1.loglike, stepMax=0.01)(res2.params))
+    print(ndt.Hessian(arma1.loglike, stepMax=1e-2)(res2.params))
     arest = tsa.arima.ARIMA(y1)
-    resls = arest.fit((1, 0, 1))
+    resls = arest.fit((1,0,1))
     print(resls[0])
     print(resls[1])
+
+
+
     print('\nparameter estimate - comparing methods')
     print('---------------------------------------')
     print('parameter of DGP ar(1), ma(1), sigma_error')
@@ -45,69 +56,85 @@ if 'arma' in examples:
     print(res2.params)
     print('cond. least squares uses optim.leastsq ?')
     errls = arest.error_estimate
-    print(resls[0], np.sqrt(np.dot(errls, errls) / errls.shape[0]))
+    print(resls[0], np.sqrt(np.dot(errls,errls)/errls.shape[0]))
+
     err = arma1.geterrors(res2.params)
     print('cond least squares parameter cov')
-    print(np.dot(errls, errls) / errls.shape[0] * resls[1])
+    #print(np.dot(err,err)/err.shape[0] * resls[1])
+    #errls = arest.error_estimate
+    print(np.dot(errls,errls)/errls.shape[0] * resls[1])
+#    print('fmin hessian')
+#    print(arma1res.model.optimresults['Hopt'][:2,:2])
     print('bfgs hessian')
-    print(res2.model.optimresults['Hopt'][:2, :2])
+    print(res2.model.optimresults['Hopt'][:2,:2])
     print('numdifftools inverse hessian')
-    print(-np.linalg.inv(ndt.Hessian(arma1.loglike, stepMax=0.01)(res2.
-        params))[:2, :2])
+    print(-np.linalg.inv(ndt.Hessian(arma1.loglike, stepMax=1e-2)(res2.params))[:2,:2])
+
     print('\nFitting Arma(1,1) to squared data')
-    arma3 = Arma(y1 ** 2)
+    arma3 = Arma(y1**2)
     res3 = arma3.fit(method='bfgs')
     print(res3.params)
+
     print('\nFitting Arma(3,3) to data from DGP Arma(1,1)')
     arma4 = Arma(y1)
     arma4.nar = 3
     arma4.nma = 3
-    res4 = arma4.fit(start_params=[-0.5, -0.1, -0.1, 0.2, 0.1, 0.1, 0.5])
+    #res4 = arma4.fit(method='bfgs')
+    res4 = arma4.fit(start_params=[-0.5, -0.1,-0.1,0.2,0.1,0.1,0.5])
     print(res4.params)
     print('numdifftools inverse hessian')
-    pcov = -np.linalg.inv(ndt.Hessian(arma4.loglike, stepMax=0.01)(res4.params)
-        )
+    pcov = -np.linalg.inv(ndt.Hessian(arma4.loglike, stepMax=1e-2)(res4.params))
+    #print(pcov)
     print('standard error of parameter estimate from Hessian')
     pstd = np.sqrt(np.diag(pcov))
     print(pstd)
     print('t-values')
-    print(res4.params / pstd)
+    print(res4.params/pstd)
     print('eigenvalues of pcov:')
     print(np.linalg.eigh(pcov)[0])
     print('sometimes they are negative')
-    print('\nExample 2 - DGP is Arma(3,3)')
+
+
+    print("\nExample 2 - DGP is Arma(3,3)")
     print('-----------------------------')
     ar = [1.0, -0.6, -0.2, -0.1]
-    ma = [1.0, 0.5, 0.1, 0.1]
-    y2 = arest.generate_sample(ar, ma, 1000, 0.1)
-    y2 -= y2.mean()
+    ma = [1.0,  0.5, 0.1, 0.1]
+    y2 = arest.generate_sample(ar,ma,1000,0.1)
+    y2 -= y2.mean() #no mean correction/constant in estimation so far
+
+
     print('\nFitting Arma(3,3) to data from DGP Arma(3,3)')
     arma4 = Arma(y2)
     arma4.nar = 3
     arma4.nma = 3
+    #res4 = arma4.fit(method='bfgs')
     print('\ntrue parameters')
     print('ar', ar[1:])
     print('ma', ma[1:])
-    res4 = arma4.fit(start_params=[-0.5, -0.1, -0.1, 0.2, 0.1, 0.1, 0.5])
+    res4 = arma4.fit(start_params=[-0.5, -0.1,-0.1,0.2,0.1,0.1,0.5])
     print(res4.params)
     print('numdifftools inverse hessian')
-    pcov = -np.linalg.inv(ndt.Hessian(arma4.loglike, stepMax=0.01)(res4.params)
-        )
+    pcov = -np.linalg.inv(ndt.Hessian(arma4.loglike, stepMax=1e-2)(res4.params))
+    #print(pcov)
     print('standard error of parameter estimate from Hessian')
     pstd = np.sqrt(np.diag(pcov))
     print(pstd)
     print('t-values')
-    print(res4.params / pstd)
+    print(res4.params/pstd)
     print('eigenvalues of pcov:')
     print(np.linalg.eigh(pcov)[0])
     print('sometimes they are negative')
+
     arma6 = Arma(y2)
     arma6.nar = 3
     arma6.nma = 3
-    res6 = arma6.fit(start_params=[-0.5, -0.1, -0.1, 0.2, 0.1, 0.1, 0.5],
-        method='bfgs')
+    res6 = arma6.fit(start_params=[-0.5, -0.1,-0.1,0.2,0.1,0.1,0.5],
+                      method='bfgs')
     print('\nmle with bfgs')
     print(res6.params)
     print('pstd with bfgs hessian')
     hopt = res6.model.optimresults['Hopt']
     print(np.sqrt(np.diag(hopt)))
+
+    #fmin estimates for coefficients in ARMA(3,3) look good
+    #but not inverse Hessian, sometimes negative values for variance
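
The standard-error computation used repeatedly above is the usual ML asymptotics:
with H the Hessian of the log-likelihood at the estimate, cov(params) ≈ (-H)^(-1) and
the standard errors are the square roots of its diagonal. Negative eigenvalues of that
matrix mean the numerical Hessian at the reported optimum is not negative definite
(the optimizer stopped short, or the numerical differentiation is noisy), which is why
the script warns that "sometimes they are negative". The recipe, restated compactly
with the same objects as above (illustrative):

    H = ndt.Hessian(arma4.loglike, stepMax=1e-2)(res4.params)
    pcov = -np.linalg.inv(H)          # approximate parameter covariance
    pstd = np.sqrt(np.diag(pcov))     # standard errors
    print(res4.params / pstd)         # t-values
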
diff --git a/statsmodels/sandbox/tsa/examples/example_var.py b/statsmodels/sandbox/tsa/examples/example_var.py
index 174a8f2ce..63713b7fa 100644
--- a/statsmodels/sandbox/tsa/examples/example_var.py
+++ b/statsmodels/sandbox/tsa/examples/example_var.py
@@ -1,27 +1,55 @@
 """
 Look at some macro plots, then do some VARs and IRFs.
 """
+
 import numpy as np
 import scikits.timeseries as ts
 import scikits.timeseries.lib.plotlib as tplt
+
 import statsmodels.api as sm
+
 data = sm.datasets.macrodata.load()
 data = data.data
+
+
+### Create Timeseries Representations of a few vars
+
 dates = ts.date_array(start_date=ts.Date('Q', year=1959, quarter=1),
     end_date=ts.Date('Q', year=2009, quarter=3))
-ts_data = data[['realgdp', 'realcons', 'cpi']].view(float).reshape(-1, 3)
-ts_data = np.column_stack((ts_data, (1 - data['unemp'] / 100) * data['pop']))
+
+ts_data = data[['realgdp','realcons','cpi']].view(float).reshape(-1,3)
+ts_data = np.column_stack((ts_data, (1 - data['unemp']/100) * data['pop']))
 ts_series = ts.time_series(ts_data, dates)
+
+
 fig = tplt.tsfigure()
 fsp = fig.add_tsplot(221)
-fsp.tsplot(ts_series[:, 0], '-')
-fsp.set_title('Real GDP')
+fsp.tsplot(ts_series[:,0],'-')
+fsp.set_title("Real GDP")
 fsp = fig.add_tsplot(222)
-fsp.tsplot(ts_series[:, 1], 'r-')
-fsp.set_title('Real Consumption')
+fsp.tsplot(ts_series[:,1],'r-')
+fsp.set_title("Real Consumption")
 fsp = fig.add_tsplot(223)
-fsp.tsplot(ts_series[:, 2], 'g-')
-fsp.set_title('CPI')
+fsp.tsplot(ts_series[:,2],'g-')
+fsp.set_title("CPI")
 fsp = fig.add_tsplot(224)
-fsp.tsplot(ts_series[:, 3], 'y-')
-fsp.set_title('Employment')
+fsp.tsplot(ts_series[:,3],'y-')
+fsp.set_title("Employment")
+
+
+
+# Plot real GDP
+#plt.subplot(221)
+#plt.plot(data['realgdp'])
+#plt.title("Real GDP")
+
+# Plot employment
+#plt.subplot(222)
+
+# Plot cpi
+#plt.subplot(223)
+
+# Plot real consumption
+#plt.subplot(224)
+
+#plt.show()
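scikits.timeseries is long unmaintained; as a rough, assumed equivalent (names and the period index below are illustrative, not part of the example script), the same four series can be assembled and plotted with pandas and matplotlib:

    import pandas as pd
    import statsmodels.api as sm

    data = sm.datasets.macrodata.load_pandas().data
    df = data[['realgdp', 'realcons', 'cpi']].copy()
    df['emp'] = (1 - data['unemp'] / 100) * data['pop']   # crude employment proxy
    df.index = pd.period_range('1959Q1', periods=len(df), freq='Q')
    axes = df.plot(subplots=True, layout=(2, 2), figsize=(8, 6))
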
diff --git a/statsmodels/sandbox/tsa/examples/try_ld_nitime.py b/statsmodels/sandbox/tsa/examples/try_ld_nitime.py
index 0a4164fcc..b5880df1a 100644
--- a/statsmodels/sandbox/tsa/examples/try_ld_nitime.py
+++ b/statsmodels/sandbox/tsa/examples/try_ld_nitime.py
@@ -1,15 +1,24 @@
-"""Levinson Durbin recursion adjusted from nitime
+'''Levinson Durbin recursion adjusted from nitime
+
+'''

-"""
 import numpy as np
+
 import nitime.utils as ut
+
 import statsmodels.api as sm
-sxx = None
+
+sxx=None
 order = 10
-npts = 2048 * 10
+
+npts = 2048*10
 sigma = 1
 drop_transients = 1024
 coefs = np.array([0.9, -0.5])
+
+# Generate AR(2) time series
 X, v, _ = ut.ar_generator(npts, sigma, coefs, drop_transients)
+
 s = X
+
 sm.tsa.stattools.pacf(X)
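The script generates an AR(2) series and computes its partial autocorrelations; a self-contained sketch without the nitime dependency (the lag polynomial below is an assumption matching coefs = [0.9, -0.5] in the nitime convention):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.tsa.arima_process import arma_generate_sample

    np.random.seed(12345)
    # x_t = 0.9*x_{t-1} - 0.5*x_{t-2} + e_t  ->  AR lag polynomial [1, -0.9, 0.5]
    x = arma_generate_sample([1, -0.9, 0.5], [1], nsample=5000, burnin=1000)
    print(sm.tsa.stattools.pacf(x, nlags=5))   # lags 1 and 2 dominate, the rest are near zero
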
diff --git a/statsmodels/sandbox/tsa/fftarma.py b/statsmodels/sandbox/tsa/fftarma.py
index 2c180633c..eeb22ef7c 100644
--- a/statsmodels/sandbox/tsa/fftarma.py
+++ b/statsmodels/sandbox/tsa/fftarma.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Mon Dec 14 19:53:25 2009

@@ -27,14 +28,20 @@ change/check: instead of using marep, use fft-transform of ar and ma
 get function drop imag if close to zero from numpy/scipy source, where?

 """
+
 import numpy as np
 import numpy.fft as fft
+#import scipy.fftpack as fft
 from scipy import signal
+#from try_var_convolve import maxabs
 from statsmodels.tsa.arima_process import ArmaProcess


+#trying to convert old experiments to a class
+
+
 class ArmaFft(ArmaProcess):
-    """fft tools for arma processes
+    '''fft tools for arma processes

     This class contains several methods that provide the same or similar
     results, in order to try out and test different implementations.
@@ -56,20 +63,23 @@ class ArmaFft(ArmaProcess):
     normalization of the power spectrum, spectral density: not checked yet, for
     example no variance of underlying process is used

-    """
+    '''

     def __init__(self, ar, ma, n):
+        #duplicates now that are subclassing ArmaProcess
         super(ArmaFft, self).__init__(ar, ma)
+
         self.ar = np.asarray(ar)
         self.ma = np.asarray(ma)
         self.nobs = n
+        #could make the polynomials into cached attributes
         self.arpoly = np.polynomial.Polynomial(ar)
         self.mapoly = np.polynomial.Polynomial(ma)
-        self.nar = len(ar)
+        self.nar = len(ar)  #1d only currently
         self.nma = len(ma)

     def padarr(self, arr, maxlag, atend=True):
-        """pad 1d array with zeros at end to have length maxlag
+        '''pad 1d array with zeros at end to have length maxlag
         function that is a method, no self used

         Parameters
@@ -92,11 +102,15 @@ class ArmaFft(ArmaProcess):
         This is mainly written to extend coefficient arrays for the lag-polynomials.
         It returns a copy.

-        """
-        pass
+        '''
+        if atend:
+            return np.r_[arr, np.zeros(maxlag-len(arr))]
+        else:
+            return np.r_[np.zeros(maxlag-len(arr)), arr]
+

     def pad(self, maxlag):
-        """construct AR and MA polynomials that are zero-padded to a common length
+        '''construct AR and MA polynomials that are zero-padded to a common length

         Parameters
         ----------
@@ -110,11 +124,13 @@ class ArmaFft(ArmaProcess):
         ma : ndarray
             extended MA polynomial coefficients

-        """
-        pass
+        '''
+        arpad = np.r_[self.ar, np.zeros(maxlag-self.nar)]
+        mapad = np.r_[self.ma, np.zeros(maxlag-self.nma)]
+        return arpad, mapad

     def fftar(self, n=None):
-        """Fourier transform of AR polynomial, zero-padded at end to n
+        '''Fourier transform of AR polynomial, zero-padded at end to n

         Parameters
         ----------
@@ -125,11 +141,13 @@ class ArmaFft(ArmaProcess):
         -------
         fftar : ndarray
             fft of zero-padded ar polynomial
-        """
-        pass
+        '''
+        if n is None:
+            n = len(self.ar)
+        return fft.fft(self.padarr(self.ar, n))

     def fftma(self, n):
-        """Fourier transform of MA polynomial, zero-padded at end to n
+        '''Fourier transform of MA polynomial, zero-padded at end to n

         Parameters
         ----------
@@ -140,11 +158,13 @@ class ArmaFft(ArmaProcess):
         -------
         fftma : ndarray
             fft of zero-padded ma polynomial
-        """
-        pass
+        '''
+        if n is None:
+            n = len(self.ma)
+        return fft.fft(self.padarr(self.ma, n))

     def fftarma(self, n=None):
-        """Fourier transform of ARMA polynomial, zero-padded at end to n
+        '''Fourier transform of ARMA polynomial, zero-padded at end to n

         The Fourier transform of the ARMA process is calculated as the ratio
         of the fft of the MA polynomial divided by the fft of the AR polynomial.
@@ -158,45 +178,68 @@ class ArmaFft(ArmaProcess):
         -------
         fftarma : ndarray
             fft of zero-padded arma polynomial
-        """
-        pass
+        '''
+        if n is None:
+            n = self.nobs
+        return (self.fftma(n) / self.fftar(n))

     def spd(self, npos):
-        """raw spectral density, returns Fourier transform
+        '''raw spectral density, returns Fourier transform

         n is number of points in positive spectrum, the actual number of points
         is twice as large. different from other spd methods with fft
-        """
-        pass
+        '''
+        n = npos
+        w = fft.fftfreq(2*n) * 2 * np.pi
+        hw = self.fftarma(2*n)  #not sure, need to check normalization
+        #return (hw*hw.conj()).real[n//2-1:]  * 0.5 / np.pi #does not show in plot
+        return (hw*hw.conj()).real * 0.5 / np.pi, w

     def spdshift(self, n):
-        """power spectral density using fftshift
+        '''power spectral density using fftshift

         currently returns two-sided according to fft frequencies, use first half
-        """
-        pass
+        '''
+        #size = s1+s2-1
+        mapadded = self.padarr(self.ma, n)
+        arpadded = self.padarr(self.ar, n)
+        hw = fft.fft(fft.fftshift(mapadded)) / fft.fft(fft.fftshift(arpadded))
+        #return np.abs(spd)[n//2-1:]
+        w = fft.fftfreq(n) * 2 * np.pi
+        wslice = slice(n//2-1, None, None)
+        #return (hw*hw.conj()).real[wslice], w[wslice]
+        return (hw*hw.conj()).real, w

     def spddirect(self, n):
-        """power spectral density using padding to length n done by fft
+        '''power spectral density using padding to length n done by fft

         currently returns two-sided according to fft frequencies, use first half
-        """
-        pass
+        '''
+        #size = s1+s2-1
+        #abs looks wrong
+        hw = fft.fft(self.ma, n) / fft.fft(self.ar, n)
+        w = fft.fftfreq(n) * 2 * np.pi
+        wslice = slice(None, n//2, None)
+        #return (np.abs(hw)**2)[wslice], w[wslice]
+        return (np.abs(hw)**2) * 0.5/np.pi, w

     def _spddirect2(self, n):
-        """this looks bad, maybe with an fftshift
-        """
-        pass
+        '''this looks bad, maybe with an fftshift
+        '''
+        #size = s1+s2-1
+        hw = (fft.fft(np.r_[self.ma[::-1],self.ma], n)
+                / fft.fft(np.r_[self.ar[::-1],self.ar], n))
+        return (hw*hw.conj()) #.real[n//2-1:]

     def spdroots(self, w):
-        """spectral density for frequency using polynomial roots
+        '''spectral density for frequency using polynomial roots

         builds two arrays (number of roots, number of frequencies)
-        """
-        pass
+        '''
+        return self._spdroots(self.arroots, self.maroots, w)

     def _spdroots(self, arroots, maroots, w):
-        """spectral density for frequency using polynomial roots
+        '''spectral density for frequency using polynomial roots

         builds two arrays (number of roots, number of frequencies)

@@ -212,20 +255,32 @@ class ArmaFft(ArmaProcess):
         Notes
         -----
         this should go into a function
-        """
-        pass
+        '''
+        w = np.atleast_2d(w).T
+        cosw = np.cos(w)
+        #Greene 5th edt. p626, section 20.2.7.a.
+        maroots = 1./maroots
+        arroots = 1./arroots
+        num = 1 + maroots**2 - 2* maroots * cosw
+        den = 1 + arroots**2 - 2* arroots * cosw
+        #print 'num.shape, den.shape', num.shape, den.shape
+        hw = 0.5 / np.pi * num.prod(-1) / den.prod(-1) #or use expsumlog
+        return np.squeeze(hw), w.squeeze()

     def spdpoly(self, w, nma=50):
-        """spectral density from MA polynomial representation for ARMA process
+        '''spectral density from MA polynomial representation for ARMA process

         References
         ----------
         Cochrane, section 8.3.3
-        """
-        pass
+        '''
+        mpoly = np.polynomial.Polynomial(self.arma2ma(nma))
+        hw = mpoly(np.exp(1j * w))
+        spd = np.real_if_close(hw * hw.conj() * 0.5/np.pi)
+        return spd, w

     def filter(self, x):
-        """
+        '''
         filter a timeseries with the ARMA filter

         padding with zero is missing, in example I needed the padding to get
@@ -238,31 +293,52 @@ class ArmaFft(ArmaProcess):
         --------
         tsa.filters.fftconvolve

-        """
-        pass
+        '''
+        n = x.shape[0]
+        if n == self.nobs:
+            fftarma = self.fftarma(n)
+        else:
+            fftarma = self.fftma(n) / self.fftar(n)
+        tmpfft = fftarma * fft.fft(x)
+        return fft.ifft(tmpfft)

     def filter2(self, x, pad=0):
-        """filter a time series using fftconvolve3 with ARMA filter
+        '''filter a time series using fftconvolve3 with ARMA filter

         padding of x currently works only if x is 1d
         in example it produces same observations at beginning as lfilter even
         without padding.

         TODO: this returns 1 additional observation at the end
-        """
-        pass
+        '''
+        from statsmodels.tsa.filters import fftconvolve3
+        if not pad:
+            pass
+        elif pad == 'auto':
+            #just guessing how much padding
+            x = self.padarr(x, x.shape[0] + 2*(self.nma+self.nar), atend=False)
+        else:
+            x = self.padarr(x, x.shape[0] + int(pad), atend=False)
+
+        return fftconvolve3(x, self.ma, self.ar)
+

     def acf2spdfreq(self, acovf, nfreq=100, w=None):
-        """
+        '''
         not really a method
         just for comparison, not efficient for large n or long acf

         this is also similarly use in tsa.stattools.periodogram with window
-        """
-        pass
+        '''
+        if w is None:
+            w = np.linspace(0, np.pi, nfreq)[:, None]
+        nac = len(acovf)
+        hw = 0.5 / np.pi * (acovf[0] +
+                            2 * (acovf[1:] * np.cos(w*np.arange(1,nac))).sum(1))
+        return hw

     def invpowerspd(self, n):
-        """autocovariance from spectral density
+        '''autocovariance from spectral density

         scaling is correct, but n needs to be large for numerical accuracy
         maybe padding with zero in fft would be faster
@@ -274,98 +350,171 @@ class ArmaFft(ArmaProcess):
         >>> ArmaFft([1, -0.5], [1., 0.4], 40).acovf(10)
         array([ 2.08    ,  1.44    ,  0.72    ,  0.36    ,  0.18    ,  0.09    ,
                 0.045   ,  0.0225  ,  0.01125 ,  0.005625])
-        """
-        pass
+        '''
+        hw = self.fftarma(n)
+        return np.real_if_close(fft.ifft(hw*hw.conj()), tol=200)[:n]

     def spdmapoly(self, w, twosided=False):
-        """ma only, need division for ar, use LagPolynomial
-        """
-        pass
+        '''ma only, need division for ar, use LagPolynomial
+        '''
+        if w is None:
+            w = np.linspace(0, np.pi, 100)  #default frequency grid
+        return 0.5 / np.pi * self.mapoly(np.exp(w*1j))
+

     def plot4(self, fig=None, nobs=100, nacf=20, nfreq=100):
         """Plot results"""
-        pass
+        rvs = self.generate_sample(nsample=nobs, burnin=500)
+        acf = self.acf(nacf)[:nacf]  #TODO: check return length
+        pacf = self.pacf(nacf)
+        w = np.linspace(0, np.pi, nfreq)
+        spdr, wr = self.spdroots(w)
+
+        if fig is None:
+            import matplotlib.pyplot as plt
+            fig = plt.figure()
+        ax = fig.add_subplot(2,2,1)
+        ax.plot(rvs)
+        ax.set_title('Random Sample \nar=%s, ma=%s' % (self.ar, self.ma))
+
+        ax = fig.add_subplot(2,2,2)
+        ax.plot(acf)
+        ax.set_title('Autocorrelation \nar=%s, ma=%s' % (self.ar, self.ma))
+
+        ax = fig.add_subplot(2,2,3)
+        ax.plot(wr, spdr)
+        ax.set_title('Power Spectrum \nar=%s, ma=%s' % (self.ar, self.ma))

+        ax = fig.add_subplot(2,2,4)
+        ax.plot(pacf)
+        ax.set_title('Partial Autocorrelation \nar=%s, ma=%s' % (self.ar, self.ma))
+
+        return fig
+
+
+
+
+
+
+
+def spdar1(ar, w):
+    if np.ndim(ar) == 0:
+        rho = ar
+    else:
+        rho = -ar[1]
+    return 0.5 / np.pi /(1 + rho*rho - 2 * rho * np.cos(w))

 if __name__ == '__main__':
-    nobs = 200
+    def maxabs(x,y):
+        return np.max(np.abs(x-y))
+    nobs = 200  #10000
     ar = [1, 0.0]
     ma = [1, 0.0]
     ar2 = np.zeros(nobs)
     ar2[:2] = [1, -0.9]
+
+
+
     uni = np.zeros(nobs)
-    uni[0] = 1.0
+    uni[0]=1.
+    #arrep = signal.lfilter(ma, ar, ar2)
+    #marep = signal.lfilter([1],arrep, uni)
+    # same faster:
     arcomb = np.convolve(ar, ar2, mode='same')
-    marep = signal.lfilter(ma, arcomb, uni)
+    marep = signal.lfilter(ma,arcomb, uni) #[len(ma):]
     print(marep[:10])
     mafr = fft.fft(marep)
+
     rvs = np.random.normal(size=nobs)
     datafr = fft.fft(rvs)
-    y = fft.ifft(mafr * datafr)
-    print(np.corrcoef(np.c_[y[2:], y[1:-1], y[:-2]], rowvar=0))
-    arrep = signal.lfilter([1], marep, uni)
-    print(arrep[:20])
+    y = fft.ifft(mafr*datafr)
+    print(np.corrcoef(np.c_[y[2:], y[1:-1], y[:-2]],rowvar=0))
+
+    arrep = signal.lfilter([1],marep, uni)
+    print(arrep[:20])  # roundtrip to ar
     arfr = fft.fft(arrep)
     yfr = fft.fft(y)
-    x = fft.ifft(arfr * yfr).real
+    x = fft.ifft(arfr*yfr).real  #imag part is e-15
+    # the next two are equal, roundtrip works
     print(x[:5])
     print(rvs[:5])
-    print(np.corrcoef(np.c_[x[2:], x[1:-1], x[:-2]], rowvar=0))
+    print(np.corrcoef(np.c_[x[2:], x[1:-1], x[:-2]],rowvar=0))
+
+
+    # ARMA filter using fft with ratio of fft of ma/ar lag polynomial
+    # seems much faster than using lfilter
+
+    #padding, note arcomb is already full length
     arcombp = np.zeros(nobs)
     arcombp[:len(arcomb)] = arcomb
-    map_ = np.zeros(nobs)
+    map_ = np.zeros(nobs)    #rename: map was shadowing builtin
     map_[:len(ma)] = ma
     ar0fr = fft.fft(arcombp)
     ma0fr = fft.fft(map_)
-    y2 = fft.ifft(ma0fr / ar0fr * datafr)
+    y2 = fft.ifft(ma0fr/ar0fr*datafr)
+    #the next two are (almost) equal in real part, almost zero but different in imag
     print(y2[:10])
     print(y[:10])
-    print(maxabs(y, y2))
+    print(maxabs(y, y2))  # from chfdiscrete
+    #1.1282071239631782e-014
+
     ar = [1, -0.4]
     ma = [1, 0.2]
-    arma1 = ArmaFft([1, -0.5, 0, 0, 0, 0, -0.7, 0.3], [1, 0.8], nobs)
+
+    arma1 = ArmaFft([1, -0.5,0,0,0,00, -0.7, 0.3], [1, 0.8], nobs)
+
     nfreq = nobs
     w = np.linspace(0, np.pi, nfreq)
-    w2 = np.linspace(0, 2 * np.pi, nfreq)
+    w2 = np.linspace(0, 2*np.pi, nfreq)
+
     import matplotlib.pyplot as plt
     plt.close('all')
+
     plt.figure()
-    spd1, w1 = arma1.spd(2 ** 10)
+    spd1, w1 = arma1.spd(2**10)
     print(spd1.shape)
     _ = plt.plot(spd1)
     plt.title('spd fft complex')
+
     plt.figure()
-    spd2, w2 = arma1.spdshift(2 ** 10)
+    spd2, w2 = arma1.spdshift(2**10)
     print(spd2.shape)
     _ = plt.plot(w2, spd2)
     plt.title('spd fft shift')
+
     plt.figure()
-    spd3, w3 = arma1.spddirect(2 ** 10)
+    spd3, w3 = arma1.spddirect(2**10)
     print(spd3.shape)
     _ = plt.plot(w3, spd3)
     plt.title('spd fft direct')
+
     plt.figure()
-    spd3b = arma1._spddirect2(2 ** 10)
+    spd3b = arma1._spddirect2(2**10)
     print(spd3b.shape)
     _ = plt.plot(spd3b)
     plt.title('spd fft direct mirrored')
+
     plt.figure()
     spdr, wr = arma1.spdroots(w)
     print(spdr.shape)
     plt.plot(w, spdr)
     plt.title('spd from roots')
+
     plt.figure()
     spdar1_ = spdar1(arma1.ar, w)
     print(spdar1_.shape)
     _ = plt.plot(w, spdar1_)
     plt.title('spd ar1')
+
+
     plt.figure()
     wper, spdper = arma1.periodogram(nfreq)
     print(spdper.shape)
     _ = plt.plot(w, spdper)
     plt.title('periodogram')
+
     startup = 1000
-    rvs = arma1.generate_sample(startup + 10000)[startup:]
+    rvs = arma1.generate_sample(startup+10000)[startup:]
     import matplotlib.mlab as mlb
     plt.figure()
     sdm, wm = mlb.psd(x)
@@ -373,12 +522,17 @@ if __name__ == '__main__':
     sdm = sdm.ravel()
     plt.plot(wm, sdm)
     plt.title('matplotlib')
+
     from nitime.algorithms import LD_AR_est
+    #yule_AR_est(s, order, Nfreqs)
     wnt, spdnt = LD_AR_est(rvs, 10, 512)
     plt.figure()
     print('spdnt.shape', spdnt.shape)
     _ = plt.plot(spdnt.ravel())
     print(spdnt[:10])
     plt.title('nitime')
+
     fig = plt.figure()
     arma1.plot4(fig)
+
+    #plt.show()
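A quick consistency check for the class above, mirroring the invpowerspd docstring example: the autocovariance recovered from the spectral density should match ArmaProcess.acovf (a sketch, assuming the sandbox module imports on the installed statsmodels):

    from statsmodels.sandbox.tsa.fftarma import ArmaFft

    proc = ArmaFft([1, -0.5], [1.0, 0.4], 40)
    print(proc.invpowerspd(1024)[:10])   # roughly [2.08, 1.44, 0.72, 0.36, ...]
    print(proc.acovf(10))                # should agree with the line above
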
diff --git a/statsmodels/sandbox/tsa/movstat.py b/statsmodels/sandbox/tsa/movstat.py
index c929abd9d..af111874f 100644
--- a/statsmodels/sandbox/tsa/movstat.py
+++ b/statsmodels/sandbox/tsa/movstat.py
@@ -1,4 +1,4 @@
-"""using scipy signal and numpy correlate to calculate some time series
+'''using scipy signal and numpy correlate to calculate some time series
 statistics

 original developer notes
@@ -28,23 +28,35 @@ True
 # multidimensional, but, it looks like it uses common filter across time series, no VAR
 ndimage.filters.correlate(np.vstack([x,x]),np.array([[1,1,1],[0,0,0]]), origin = 1)
 ndimage.filters.correlate(x,[1,1,1],origin = 1))
-ndimage.filters.correlate(np.vstack([x,x]),np.array([[0.5,0.5,0.5],[0.5,0.5,0.5]]), origin = 1)
+ndimage.filters.correlate(np.vstack([x,x]),np.array([[0.5,0.5,0.5],[0.5,0.5,0.5]]), \
+origin = 1)

->>> np.all(ndimage.filters.correlate(np.vstack([x,x]),np.array([[1,1,1],[0,0,0]]), origin = 1)[0]==ndimage.filters.correlate(x,[1,1,1],origin = 1))
+>>> np.all(ndimage.filters.correlate(np.vstack([x,x]),np.array([[1,1,1],[0,0,0]]), origin = 1)[0]==\
+ndimage.filters.correlate(x,[1,1,1],origin = 1))
 True
->>> np.all(ndimage.filters.correlate(np.vstack([x,x]),np.array([[0.5,0.5,0.5],[0.5,0.5,0.5]]), origin = 1)[0]==ndimage.filters.correlate(x,[1,1,1],origin = 1))
+>>> np.all(ndimage.filters.correlate(np.vstack([x,x]),np.array([[0.5,0.5,0.5],[0.5,0.5,0.5]]), \
+origin = 1)[0]==ndimage.filters.correlate(x,[1,1,1],origin = 1))


 update
 2009-09-06: cosmetic changes, rearrangements
-"""
+'''
+
 import numpy as np
 from scipy import signal
+
 from numpy.testing import assert_array_equal, assert_array_almost_equal


-def movorder(x, order='med', windsize=3, lag='lagged'):
-    """moving order statistics
+def expandarr(x,k):
+    #make it work for 2D or nD with axis
+    kadd = k
+    if np.ndim(x) == 2:
+        kadd = (kadd, np.shape(x)[1])
+    return np.r_[np.ones(kadd)*x[0],x,np.ones(kadd)*x[-1]]
+
+def movorder(x, order = 'med', windsize=3, lag='lagged'):
+    '''moving order statistics

     Parameters
     ----------
@@ -62,17 +74,88 @@ def movorder(x, order='med', windsize=3, lag='lagged'):
     filtered array


-    """
-    pass
-
+    '''
+
+    #if windsize is even should it raise ValueError
+    if lag == 'lagged':
+        lead = windsize//2
+    elif lag == 'centered':
+        lead = 0
+    elif lag == 'leading':
+        lead = -windsize//2 +1
+    else:
+        raise ValueError
+    if np.isfinite(order): #if np.isnumber(order):
+        ord = order   # note: ord is a builtin function
+    elif order == 'med':
+        ord = (windsize - 1)//2  #order_filter needs an integer rank
+    elif order == 'min':
+        ord = 0
+    elif order == 'max':
+        ord = windsize - 1
+    else:
+        raise ValueError
+
+    #return signal.order_filter(x,np.ones(windsize),ord)[:-lead]
+    xext = expandarr(x, windsize)
+    #np.r_[np.ones(windsize)*x[0],x,np.ones(windsize)*x[-1]]
+    return signal.order_filter(xext,np.ones(windsize),ord)[windsize-lead:-(windsize+lead)]

 def check_movorder():
-    """graphical test for movorder"""
-    pass
-
+    '''graphical test for movorder'''
+    import matplotlib.pylab as plt
+    x = np.arange(1,10)
+    xo = movorder(x, order='max')
+    assert_array_equal(xo, x)
+    x = np.arange(10,1,-1)
+    xo = movorder(x, order='min')
+    assert_array_equal(xo, x)
+    assert_array_equal(movorder(x, order='min', lag='centered')[:-1], x[1:])
+
+    tt = np.linspace(0,2*np.pi,15)
+    x = np.sin(tt) + 1
+    xo = movorder(x, order='max')
+    plt.figure()
+    plt.plot(tt,x,'.-',tt,xo,'.-')
+    plt.title('moving max lagged')
+    xo = movorder(x, order='max', lag='centered')
+    plt.figure()
+    plt.plot(tt,x,'.-',tt,xo,'.-')
+    plt.title('moving max centered')
+    xo = movorder(x, order='max', lag='leading')
+    plt.figure()
+    plt.plot(tt,x,'.-',tt,xo,'.-')
+    plt.title('moving max leading')
+
+# identity filter
+##>>> signal.order_filter(x,np.ones(1),0)
+##array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
+# median filter
+##signal.medfilt(np.sin(x), kernel_size=3)
+##>>> plt.figure()
+##<matplotlib.figure.Figure object at 0x069BBB50>
+##>>> x=np.linspace(0,3,100);plt.plot(x,np.sin(x),x,signal.medfilt(np.sin(x), kernel_size=3))
+
+# remove old version
+##def movmeanvar(x, windowsize=3, valid='same'):
+##    '''
+##    this should also work along axis or at least for columns
+##    '''
+##    n = x.shape[0]
+##    x = expandarr(x, windowsize - 1)
+##    takeslice = slice(windowsize-1, n + windowsize-1)
+##    avgkern = (np.ones(windowsize)/float(windowsize))
+##    m = np.correlate(x, avgkern, 'same')#[takeslice]
+##    print(m.shape)
+##    print(x.shape)
+##    xm = x - m
+##    v = np.correlate(x*x, avgkern, 'same') - m**2
+##    v1 = np.correlate(xm*xm, avgkern, valid) #not correct for var of window
+###>>> np.correlate(xm*xm,np.array([1,1,1])/3.0,'valid')-np.correlate(xm*xm,np.array([1,1,1])/3.0,'valid')**2
+##    return m[takeslice], v[takeslice], v1

 def movmean(x, windowsize=3, lag='lagged'):
-    """moving window mean
+    '''moving window mean


     Parameters
@@ -95,12 +178,11 @@ def movmean(x, windowsize=3, lag='lagged'):
     for leading and lagging the data array x is extended by the closest value of the array


-    """
-    pass
-
+    '''
+    return movmoment(x, 1, windowsize=windowsize, lag=lag)

 def movvar(x, windowsize=3, lag='lagged'):
-    """moving window variance
+    '''moving window variance


     Parameters
@@ -118,12 +200,13 @@ def movvar(x, windowsize=3, lag='lagged'):
         moving variance, with same shape as x


-    """
-    pass
-
+    '''
+    m1 = movmoment(x, 1, windowsize=windowsize, lag=lag)
+    m2 = movmoment(x, 2, windowsize=windowsize, lag=lag)
+    return m2 - m1*m1

 def movmoment(x, k, windowsize=3, lag='lagged'):
-    """non-central moment
+    '''non-central moment


     Parameters
@@ -146,87 +229,184 @@ def movmoment(x, k, windowsize=3, lag='lagged'):
     If data x is 2d, then moving moment is calculated for each
     column.

-    """
-    pass
+    '''
+
+    windsize = windowsize
+    #if windsize is even should it raise ValueError
+    if lag == 'lagged':
+        #lead = -0 + windsize #windsize//2
+        lead = -0# + (windsize-1) + windsize//2
+        sl = slice((windsize-1) or None, -2*(windsize-1) or None)
+    elif lag == 'centered':
+        lead = -windsize//2  #0#-1 #+ #(windsize-1)
+        sl = slice((windsize-1)+windsize//2 or None, -(windsize-1)-windsize//2 or None)
+    elif lag == 'leading':
+        #lead = -windsize +1#+1 #+ (windsize-1)#//2 +1
+        lead = -windsize +2 #-windsize//2 +1
+        sl = slice(2*(windsize-1)+1+lead or None, -(2*(windsize-1)+lead)+1 or None)
+    else:
+        raise ValueError
+
+    avgkern = (np.ones(windowsize)/float(windowsize))
+    xext = expandarr(x, windsize-1)
+    #Note: expandarr increases the array size by 2*(windsize-1)
+
+    #sl = slice(2*(windsize-1)+1+lead or None, -(2*(windsize-1)+lead)+1 or None)
+    print(sl)
+
+    if xext.ndim == 1:
+        return np.correlate(xext**k, avgkern, 'full')[sl]
+        #return np.correlate(xext**k, avgkern, 'same')[windsize-lead:-(windsize+lead)]
+    else:
+        print(xext.shape)
+        print(avgkern[:,None].shape)
+
+        # try first with 2d along columns, possibly ndim with axis
+        return signal.correlate(xext**k, avgkern[:,None], 'full')[sl,:]
+


+
+
+
+
+#x=0.5**np.arange(10);xm=x-x.mean();a=np.correlate(xm,[1],'full')
+#x=0.5**np.arange(3);np.correlate(x,x,'same')
+##>>> x=0.5**np.arange(10);xm=x-x.mean();a=np.correlate(xm,xo,'full')
+##
+##>>> xo=np.ones(10);d=np.correlate(xo,xo,'full')
+##>>> xo
+##xo=np.ones(10);d=np.correlate(xo,xo,'full')
+##>>> x=np.ones(10);xo=x-x.mean();a=np.correlate(xo,xo,'full')
+##>>> xo=np.ones(10);d=np.correlate(xo,xo,'full')
+##>>> d
+##array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,   9.,
+##         8.,   7.,   6.,   5.,   4.,   3.,   2.,   1.])
+
+
+##def ccovf():
+##    pass
+##    #x=0.5**np.arange(10);xm=x-x.mean();a=np.correlate(xm,xo,'full')
+
 __all__ = ['movorder', 'movmean', 'movvar', 'movmoment']
+
 if __name__ == '__main__':
+
     print('\nchecking moving mean and variance')
     nobs = 10
     x = np.arange(nobs)
     ws = 3
-    ave = np.array([0.0, 1 / 3.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 
-        26 / 3.0, 9])
-    va = np.array([[0.0, 0.0], [0.22222222, 0.88888889], [0.66666667, 
-        2.66666667], [0.66666667, 2.66666667], [0.66666667, 2.66666667], [
-        0.66666667, 2.66666667], [0.66666667, 2.66666667], [0.66666667, 
-        2.66666667], [0.66666667, 2.66666667], [0.66666667, 2.66666667], [
-        0.22222222, 0.88888889], [0.0, 0.0]])
-    ave2d = np.c_[ave, 2 * ave]
+    ave = np.array([ 0., 1/3., 1., 2., 3., 4., 5., 6., 7., 8.,
+                  26/3., 9])
+    va = np.array([[ 0.        ,  0.        ],
+                   [ 0.22222222,  0.88888889],
+                   [ 0.66666667,  2.66666667],
+                   [ 0.66666667,  2.66666667],
+                   [ 0.66666667,  2.66666667],
+                   [ 0.66666667,  2.66666667],
+                   [ 0.66666667,  2.66666667],
+                   [ 0.66666667,  2.66666667],
+                   [ 0.66666667,  2.66666667],
+                   [ 0.66666667,  2.66666667],
+                   [ 0.22222222,  0.88888889],
+                   [ 0.        ,  0.        ]])
+    ave2d = np.c_[ave, 2*ave]
     print(movmean(x, windowsize=ws, lag='lagged'))
     print(movvar(x, windowsize=ws, lag='lagged'))
-    print([np.var(x[i - ws:i]) for i in range(ws, nobs)])
+    print([np.var(x[i-ws:i]) for i in range(ws, nobs)])
     m1 = movmoment(x, 1, windowsize=3, lag='lagged')
     m2 = movmoment(x, 2, windowsize=3, lag='lagged')
     print(m1)
     print(m2)
-    print(m2 - m1 * m1)
-    assert_array_almost_equal(va[ws - 1:, 0], movvar(x, windowsize=3, lag=
-        'leading'))
-    assert_array_almost_equal(va[ws // 2:-ws // 2 + 1, 0], movvar(x,
-        windowsize=3, lag='centered'))
-    assert_array_almost_equal(va[:-ws + 1, 0], movvar(x, windowsize=ws, lag
-        ='lagged'))
+    print(m2 - m1*m1)
+
+    # this implicitly also tests moment
+    assert_array_almost_equal(va[ws-1:,0],
+                    movvar(x, windowsize=3, lag='leading'))
+    assert_array_almost_equal(va[ws//2:-ws//2+1,0],
+                    movvar(x, windowsize=3, lag='centered'))
+    assert_array_almost_equal(va[:-ws+1,0],
+                    movvar(x, windowsize=ws, lag='lagged'))
+
+
+
     print('\nchecking moving moment for 2d (columns only)')
-    x2d = np.c_[x, 2 * x]
+    x2d = np.c_[x, 2*x]
     print(movmoment(x2d, 1, windowsize=3, lag='centered'))
     print(movmean(x2d, windowsize=ws, lag='lagged'))
     print(movvar(x2d, windowsize=ws, lag='lagged'))
-    assert_array_almost_equal(va[ws - 1:, :], movvar(x2d, windowsize=3, lag
-        ='leading'))
-    assert_array_almost_equal(va[ws // 2:-ws // 2 + 1, :], movvar(x2d,
-        windowsize=3, lag='centered'))
-    assert_array_almost_equal(va[:-ws + 1, :], movvar(x2d, windowsize=ws,
-        lag='lagged'))
-    assert_array_almost_equal(ave2d[ws - 1:], movmoment(x2d, 1, windowsize=
-        3, lag='leading'))
-    assert_array_almost_equal(ave2d[ws // 2:-ws // 2 + 1], movmoment(x2d, 1,
-        windowsize=3, lag='centered'))
-    assert_array_almost_equal(ave2d[:-ws + 1], movmean(x2d, windowsize=ws,
-        lag='lagged'))
+    assert_array_almost_equal(va[ws-1:,:],
+                    movvar(x2d, windowsize=3, lag='leading'))
+    assert_array_almost_equal(va[ws//2:-ws//2+1,:],
+                    movvar(x2d, windowsize=3, lag='centered'))
+    assert_array_almost_equal(va[:-ws+1,:],
+                    movvar(x2d, windowsize=ws, lag='lagged'))
+
+    assert_array_almost_equal(ave2d[ws-1:],
+                    movmoment(x2d, 1, windowsize=3, lag='leading'))
+    assert_array_almost_equal(ave2d[ws//2:-ws//2+1],
+                    movmoment(x2d, 1, windowsize=3, lag='centered'))
+    assert_array_almost_equal(ave2d[:-ws+1],
+                    movmean(x2d, windowsize=ws, lag='lagged'))
+
     from scipy import ndimage
-    print(ndimage.filters.correlate1d(x2d, np.array([1, 1, 1]) / 3.0, axis=0))
-    xg = np.array([0.0, 0.1, 0.3, 0.6, 1.0, 1.5, 2.1, 2.8, 3.6, 4.5, 5.5, 
-        6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5,
-        18.5, 19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 25.5, 26.5, 27.5, 28.5, 
-        29.5, 30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5, 38.5, 39.5, 
-        40.5, 41.5, 42.5, 43.5, 44.5, 45.5, 46.5, 47.5, 48.5, 49.5, 50.5, 
-        51.5, 52.5, 53.5, 54.5, 55.5, 56.5, 57.5, 58.5, 59.5, 60.5, 61.5, 
-        62.5, 63.5, 64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5, 
-        73.5, 74.5, 75.5, 76.5, 77.5, 78.5, 79.5, 80.5, 81.5, 82.5, 83.5, 
-        84.5, 85.5, 86.5, 87.5, 88.5, 89.5, 90.5, 91.5, 92.5, 93.5, 94.5])
-    assert_array_almost_equal(xg, movmean(np.arange(100), 10, 'lagged'))
-    xd = np.array([0.3, 0.6, 1.0, 1.5, 2.1, 2.8, 3.6, 4.5, 5.5, 6.5, 7.5, 
-        8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 
-        19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 25.5, 26.5, 27.5, 28.5, 29.5, 
-        30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5, 38.5, 39.5, 40.5, 
-        41.5, 42.5, 43.5, 44.5, 45.5, 46.5, 47.5, 48.5, 49.5, 50.5, 51.5, 
-        52.5, 53.5, 54.5, 55.5, 56.5, 57.5, 58.5, 59.5, 60.5, 61.5, 62.5, 
-        63.5, 64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5, 73.5, 
-        74.5, 75.5, 76.5, 77.5, 78.5, 79.5, 80.5, 81.5, 82.5, 83.5, 84.5, 
-        85.5, 86.5, 87.5, 88.5, 89.5, 90.5, 91.5, 92.5, 93.5, 94.5, 95.4, 
-        96.2, 96.9, 97.5, 98.0, 98.4, 98.7, 98.9, 99.0])
-    assert_array_almost_equal(xd, movmean(np.arange(100), 10, 'leading'))
-    xc = np.array([1.36363636, 1.90909091, 2.54545455, 3.27272727, 
-        4.09090909, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 
-        15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 
-        26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 
-        37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 
-        48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 
-        59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 
-        70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 
-        81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 
-        92.0, 93.0, 94.0, 94.90909091, 95.72727273, 96.45454545, 
-        97.09090909, 97.63636364])
-    assert_array_almost_equal(xc, movmean(np.arange(100), 11, 'centered'))
+    print(ndimage.filters.correlate1d(x2d, np.array([1,1,1])/3., axis=0))
+
+
+    #regression test check
+
+    xg = np.array([  0. ,   0.1,   0.3,   0.6,   1. ,   1.5,   2.1,   2.8,   3.6,
+                 4.5,   5.5,   6.5,   7.5,   8.5,   9.5,  10.5,  11.5,  12.5,
+                13.5,  14.5,  15.5,  16.5,  17.5,  18.5,  19.5,  20.5,  21.5,
+                22.5,  23.5,  24.5,  25.5,  26.5,  27.5,  28.5,  29.5,  30.5,
+                31.5,  32.5,  33.5,  34.5,  35.5,  36.5,  37.5,  38.5,  39.5,
+                40.5,  41.5,  42.5,  43.5,  44.5,  45.5,  46.5,  47.5,  48.5,
+                49.5,  50.5,  51.5,  52.5,  53.5,  54.5,  55.5,  56.5,  57.5,
+                58.5,  59.5,  60.5,  61.5,  62.5,  63.5,  64.5,  65.5,  66.5,
+                67.5,  68.5,  69.5,  70.5,  71.5,  72.5,  73.5,  74.5,  75.5,
+                76.5,  77.5,  78.5,  79.5,  80.5,  81.5,  82.5,  83.5,  84.5,
+                85.5,  86.5,  87.5,  88.5,  89.5,  90.5,  91.5,  92.5,  93.5,
+                94.5])
+
+    assert_array_almost_equal(xg, movmean(np.arange(100), 10,'lagged'))
+
+    xd = np.array([  0.3,   0.6,   1. ,   1.5,   2.1,   2.8,   3.6,   4.5,   5.5,
+                 6.5,   7.5,   8.5,   9.5,  10.5,  11.5,  12.5,  13.5,  14.5,
+                15.5,  16.5,  17.5,  18.5,  19.5,  20.5,  21.5,  22.5,  23.5,
+                24.5,  25.5,  26.5,  27.5,  28.5,  29.5,  30.5,  31.5,  32.5,
+                33.5,  34.5,  35.5,  36.5,  37.5,  38.5,  39.5,  40.5,  41.5,
+                42.5,  43.5,  44.5,  45.5,  46.5,  47.5,  48.5,  49.5,  50.5,
+                51.5,  52.5,  53.5,  54.5,  55.5,  56.5,  57.5,  58.5,  59.5,
+                60.5,  61.5,  62.5,  63.5,  64.5,  65.5,  66.5,  67.5,  68.5,
+                69.5,  70.5,  71.5,  72.5,  73.5,  74.5,  75.5,  76.5,  77.5,
+                78.5,  79.5,  80.5,  81.5,  82.5,  83.5,  84.5,  85.5,  86.5,
+                87.5,  88.5,  89.5,  90.5,  91.5,  92.5,  93.5,  94.5,  95.4,
+                96.2,  96.9,  97.5,  98. ,  98.4,  98.7,  98.9,  99. ])
+    assert_array_almost_equal(xd, movmean(np.arange(100), 10,'leading'))
+
+    xc = np.array([ 1.36363636,   1.90909091,   2.54545455,   3.27272727,
+                 4.09090909,   5.        ,   6.        ,   7.        ,
+                 8.        ,   9.        ,  10.        ,  11.        ,
+                12.        ,  13.        ,  14.        ,  15.        ,
+                16.        ,  17.        ,  18.        ,  19.        ,
+                20.        ,  21.        ,  22.        ,  23.        ,
+                24.        ,  25.        ,  26.        ,  27.        ,
+                28.        ,  29.        ,  30.        ,  31.        ,
+                32.        ,  33.        ,  34.        ,  35.        ,
+                36.        ,  37.        ,  38.        ,  39.        ,
+                40.        ,  41.        ,  42.        ,  43.        ,
+                44.        ,  45.        ,  46.        ,  47.        ,
+                48.        ,  49.        ,  50.        ,  51.        ,
+                52.        ,  53.        ,  54.        ,  55.        ,
+                56.        ,  57.        ,  58.        ,  59.        ,
+                60.        ,  61.        ,  62.        ,  63.        ,
+                64.        ,  65.        ,  66.        ,  67.        ,
+                68.        ,  69.        ,  70.        ,  71.        ,
+                72.        ,  73.        ,  74.        ,  75.        ,
+                76.        ,  77.        ,  78.        ,  79.        ,
+                80.        ,  81.        ,  82.        ,  83.        ,
+                84.        ,  85.        ,  86.        ,  87.        ,
+                88.        ,  89.        ,  90.        ,  91.        ,
+                92.        ,  93.        ,  94.        ,  94.90909091,
+                95.72727273,  96.45454545,  97.09090909,  97.63636364])
+    assert_array_almost_equal(xc, movmean(np.arange(100), 11,'centered'))
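As a compact cross-check of the regression arrays above (a sketch, not part of the module): for the 'lagged' variant, once the window is full the moving mean is just a trailing-window average.

    import numpy as np
    from statsmodels.sandbox.tsa.movstat import movmean

    x = np.arange(30, dtype=float)
    ws = 10
    mm = movmean(x, windowsize=ws, lag='lagged')
    trailing = np.array([x[max(0, i - ws + 1):i + 1].mean() for i in range(len(x))])
    print(np.allclose(mm[ws - 1:], trailing[ws - 1:]))   # expected True
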
diff --git a/statsmodels/sandbox/tsa/try_arma_more.py b/statsmodels/sandbox/tsa/try_arma_more.py
index b56971935..fd91d1a21 100644
--- a/statsmodels/sandbox/tsa/try_arma_more.py
+++ b/statsmodels/sandbox/tsa/try_arma_more.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Periodograms for ARMA and time series

 theoretical periodogram of ARMA process and different version
@@ -14,6 +15,7 @@ import numpy as np
 from scipy import signal, ndimage
 import matplotlib.mlab as mlb
 import matplotlib.pyplot as plt
+
 from statsmodels.tsa.arima_process import arma_generate_sample
 from statsmodels.tsa.stattools import acovf
 hastalkbox = False
@@ -21,71 +23,109 @@ try:
     import scikits.talkbox.spectral.basic as stbs
 except ImportError:
     hastalkbox = False
-ar = [1.0, -0.7]
-ma = [1.0, 0.3]
-ar = np.convolve([1.0] + [0] * 50 + [-0.6], ar)
-ar = np.convolve([1.0, -0.5] + [0] * 49 + [-0.3], ar)
+
+ar = [1., -0.7]#[1,0,0,0,0,0,0,-0.7]
+ma = [1., 0.3]
+
+ar = np.convolve([1.]+[0]*50 +[-0.6], ar)
+ar = np.convolve([1., -0.5]+[0]*49 +[-0.3], ar)
+
 n_startup = 1000
 nobs = 1000
-xo = arma_generate_sample(ar, ma, n_startup + nobs)
+# throwing away samples at beginning makes sample more "stationary"
+
+xo = arma_generate_sample(ar,ma,n_startup+nobs)
 x = xo[n_startup:]
+
+
 plt.figure()
 plt.plot(x)
+
 rescale = 0
+
 w, h = signal.freqz(ma, ar)
-sd = np.abs(h) ** 2 / np.sqrt(2 * np.pi)
+sd = np.abs(h)**2/np.sqrt(2*np.pi)
+
 if np.sum(np.isnan(h)) > 0:
+    # this happens with unit root or seasonal unit root
     print('Warning: nan in frequency response h')
-    h[np.isnan(h)] = 1.0
+    h[np.isnan(h)] = 1.
     rescale = 0
+
+
+
+#replace with signal.order_filter ?
 pm = ndimage.filters.maximum_filter(sd, footprint=np.ones(5))
 maxind = np.nonzero(pm == sd)
 print('local maxima frequencies')
 wmax = w[maxind]
 sdmax = sd[maxind]
+
+
 plt.figure()
-plt.subplot(2, 3, 1)
+plt.subplot(2,3,1)
 if rescale:
-    plt.plot(w, sd / sd[0], '-', wmax, sdmax / sd[0], 'o')
+    plt.plot(w, sd/sd[0], '-', wmax, sdmax/sd[0], 'o')
+#    plt.plot(w, sd/sd[0], '-')
+#    plt.hold()
+#    plt.plot(wmax, sdmax/sd[0], 'o')
 else:
     plt.plot(w, sd, '-', wmax, sdmax, 'o')
+#    plt.hold()
+#    plt.plot(wmax, sdmax, 'o')
+
 plt.title('DGP')
+
 sdm, wm = mlb.psd(x)
 sdm = sdm.ravel()
 pm = ndimage.filters.maximum_filter(sdm, footprint=np.ones(5))
 maxind = np.nonzero(pm == sdm)
-plt.subplot(2, 3, 2)
+
+plt.subplot(2,3,2)
 if rescale:
-    plt.plot(wm, sdm / sdm[0], '-', wm[maxind], sdm[maxind] / sdm[0], 'o')
+    plt.plot(wm,sdm/sdm[0], '-', wm[maxind], sdm[maxind]/sdm[0], 'o')
 else:
     plt.plot(wm, sdm, '-', wm[maxind], sdm[maxind], 'o')
 plt.title('matplotlib')
+
 if hastalkbox:
     sdp, wp = stbs.periodogram(x)
-    plt.subplot(2, 3, 3)
+    plt.subplot(2,3,3)
+
     if rescale:
-        plt.plot(wp, sdp / sdp[0])
+        plt.plot(wp,sdp/sdp[0])
     else:
         plt.plot(wp, sdp)
     plt.title('stbs.periodogram')
+
 xacov = acovf(x, unbiased=False)
-plt.subplot(2, 3, 4)
+plt.subplot(2,3,4)
 plt.plot(xacov)
 plt.title('autocovariance')
-nr = len(x)
-xacovfft = np.fft.fft(np.correlate(x, x, 'full'))
+
+nr = len(x)#*2/3
+#xacovfft = np.fft.fft(xacov[:nr], 2*nr-1)
+xacovfft = np.fft.fft(np.correlate(x,x,'full'))
+#abs(xacovfft)**2 or equivalently
 xacovfft = xacovfft * xacovfft.conj()
-plt.subplot(2, 3, 5)
+
+plt.subplot(2,3,5)
 if rescale:
-    plt.plot(xacovfft[:nr] / xacovfft[0])
+    plt.plot(xacovfft[:nr]/xacovfft[0])
 else:
     plt.plot(xacovfft[:nr])
+
 plt.title('fft')
+
 if hastalkbox:
     sdpa, wpa = stbs.arspec(x, 50)
-    plt.subplot(2, 3, 6)
+    plt.subplot(2,3,6)
+
     if rescale:
-        plt.plot(wpa, sdpa / sdpa[0])
+        plt.plot(wpa,sdpa/sdpa[0])
     else:
         plt.plot(wpa, sdpa)
     plt.title('stbs.arspec')
+
+
+#plt.show()
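The same comparison can be reproduced with the public API; a sketch below (scaling conventions differ between the theoretical density and the raw periodogram, and the module itself notes the normalization is unchecked, so compare shapes rather than levels):

    import numpy as np
    from scipy import signal
    from statsmodels.tsa.arima_process import arma_generate_sample, arma_periodogram

    ar, ma = [1.0, -0.7], [1.0, 0.3]
    w, sd_theory = arma_periodogram(ar, ma)            # theoretical ARMA spectral density
    x = arma_generate_sample(ar, ma, nsample=2000, burnin=500)
    f, sd_sample = signal.periodogram(x)               # raw sample periodogram
    print(sd_theory[:5])
    print(sd_sample[:5])
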
diff --git a/statsmodels/sandbox/tsa/try_fi.py b/statsmodels/sandbox/tsa/try_fi.py
index c2a6aed3a..936bb49fe 100644
--- a/statsmodels/sandbox/tsa/try_fi.py
+++ b/statsmodels/sandbox/tsa/try_fi.py
@@ -1,4 +1,5 @@
-"""
+
+'''
 using lfilter to get fractional integration polynomial (1-L)^d, d<1
 `ri` is (1-L)^(-d), d<1

@@ -6,19 +7,24 @@ second part in here is ar2arma

 only examples left

-"""
+'''
 import numpy as np
 from scipy.special import gamma
 from scipy import signal
-from statsmodels.tsa.arima_process import lpol_fiar, lpol_fima, ar2arma, arma_impulse_response
+
+from statsmodels.tsa.arima_process import (lpol_fiar, lpol_fima,
+                                           ar2arma, arma_impulse_response)
+
+
+
 if __name__ == '__main__':
     d = 0.4
     n = 1000
-    j = np.arange(n * 10)
-    ri0 = gamma(d + j) / (gamma(j + 1) * gamma(d))
-    ri = lpol_fima(d, n=n)
-    riinv = signal.lfilter([1], ri, [1] + [0] * (n - 1))
-    """
+    j = np.arange(n*10)
+    ri0 = gamma(d+j)/(gamma(j+1)*gamma(d))
+    ri = lpol_fima(d, n=n)  # get_ficoefs(d, n=n) old naming?
+    riinv = signal.lfilter([1], ri, [1]+[0]*(n-1))#[[5,10,20,25]]
+    '''
     array([-0.029952  , -0.01100641, -0.00410998, -0.00299859])
     >>> d=0.4; j=np.arange(1000);ri=gamma(d+j)/(gamma(j+1)*gamma(d))
     >>> # (1-L)^d, d<1 is
@@ -31,28 +37,32 @@ if __name__ == '__main__':
           -0.00299859, -0.00283712, -0.00269001, -0.00255551, -0.00243214,
           -0.00231864])
     >>> # verified for points [[5,10,20,25]] at 4 decimals with Bhardwaj, Swanson, Journal of Econometrics 2006
-    """
+    '''
     print(lpol_fiar(0.4, n=20))
     print(lpol_fima(-0.4, n=20))
-    print(np.sum((lpol_fima(-0.4, n=n)[1:] + riinv[1:]) ** 2))
-    print(np.sum((lpol_fiar(0.4, n=n)[1:] - riinv[1:]) ** 2))
+    print(np.sum((lpol_fima(-0.4, n=n)[1:] + riinv[1:])**2)) #different signs
+    print(np.sum((lpol_fiar(0.4, n=n)[1:] - riinv[1:])**2)) #corrected signs
+
+    #test is now in statsmodels.tsa.tests.test_arima_process
     from statsmodels.tsa.tests.test_arima_process import test_fi
     test_fi()
+
     ar_true = [1, -0.4]
     ma_true = [1, 0.5]
+
+
     ar_desired = arma_impulse_response(ma_true, ar_true)
-    ar_app, ma_app, res = ar2arma(ar_desired, 2, 1, n=100, mse='ar', start=
-        [0.1])
+    ar_app, ma_app, res = ar2arma(ar_desired, 2,1, n=100, mse='ar', start=[0.1])
     print(ar_app, ma_app)
-    ar_app, ma_app, res = ar2arma(ar_desired, 2, 2, n=100, mse='ar', start=
-        [-0.1, 0.1])
+    ar_app, ma_app, res = ar2arma(ar_desired, 2,2, n=100, mse='ar', start=[-0.1, 0.1])
     print(ar_app, ma_app)
-    ar_app, ma_app, res = ar2arma(ar_desired, 2, 3, n=100, mse='ar')
+    ar_app, ma_app, res = ar2arma(ar_desired, 2,3, n=100, mse='ar')#, start = [-0.1, 0.1])
     print(ar_app, ma_app)
+
     slow = 1
     if slow:
         ar_desired = lpol_fiar(0.4, n=100)
-        ar_app, ma_app, res = ar2arma(ar_desired, 3, 1, n=100, mse='ar')
+        ar_app, ma_app, res = ar2arma(ar_desired, 3, 1, n=100, mse='ar')#, start = [-0.1, 0.1])
         print(ar_app, ma_app)
-        ar_app, ma_app, res = ar2arma(ar_desired, 10, 10, n=100, mse='ar')
+        ar_app, ma_app, res = ar2arma(ar_desired, 10, 10, n=100, mse='ar')#, start = [-0.1, 0.1])
         print(ar_app, ma_app)
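The gamma-function identity behind ri0 above can be checked directly against lpol_fima (a short sketch; the two should be essentially identical):

    import numpy as np
    from scipy.special import gamma
    from statsmodels.tsa.arima_process import lpol_fima

    d, n = 0.4, 10
    j = np.arange(n)
    ri0 = gamma(d + j) / (gamma(j + 1) * gamma(d))   # series coefficients of (1-L)^(-d)
    print(ri0)
    print(lpol_fima(d, n=n))                          # should match ri0 up to floating point
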
diff --git a/statsmodels/sandbox/tsa/try_var_convolve.py b/statsmodels/sandbox/tsa/try_var_convolve.py
index 4a65ffcf8..6416ac302 100644
--- a/statsmodels/sandbox/tsa/try_var_convolve.py
+++ b/statsmodels/sandbox/tsa/try_var_convolve.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """trying out VAR filtering and multidimensional fft

 Note: second half is copy and paste and does not run as script
@@ -15,29 +16,44 @@ Runs now without raising exception
 import numpy as np
 from numpy.testing import assert_equal
 from scipy import signal, stats
+
 try:
     from scipy.signal._signaltools import _centered as trim_centered
 except ImportError:
+    # SciPy < 1.8.0, where _centered still lives in scipy.signal.signaltools
+    # (it's not a public SciPy function, but we need it here)
     from scipy.signal.signaltools import _centered as trim_centered
+
 from statsmodels.tsa.filters.filtertools import fftconvolveinv as fftconvolve
-x = np.arange(40).reshape((2, 20)).T
-x = np.arange(60).reshape((3, 20)).T
-a3f = np.array([[[0.5, 1.0], [1.0, 0.5]], [[0.5, 1.0], [1.0, 0.5]]])
-a3f = np.ones((2, 3, 3))
+
+
+x = np.arange(40).reshape((2,20)).T
+x = np.arange(60).reshape((3,20)).T
+a3f = np.array([[[0.5,  1.], [1.,  0.5]],
+               [[0.5,  1.], [1.,  0.5]]])
+a3f = np.ones((2,3,3))
+
+
 nlags = a3f.shape[0]
-ntrim = nlags // 2
-y0 = signal.convolve(x, a3f[:, :, 0], mode='valid')
-y1 = signal.convolve(x, a3f[:, :, 1], mode='valid')
-yf = signal.convolve(x[:, :, None], a3f)
-y = yf[:, 1, :]
-yvalid = yf[ntrim:-ntrim, yf.shape[1] // 2, :]
+ntrim = nlags//2
+
+y0 = signal.convolve(x,a3f[:,:,0], mode='valid')
+y1 = signal.convolve(x,a3f[:,:,1], mode='valid')
+yf = signal.convolve(x[:,:,None],a3f)
+y = yf[:,1,:]  #
+yvalid = yf[ntrim:-ntrim,yf.shape[1]//2,:]
+#same result with fftconvolve
+#signal.fftconvolve(x[:,:,None],a3f).shape
+#signal.fftconvolve(x[:,:,None],a3f)[:,1,:]
 print(trim_centered(y, x.shape))
-assert_equal(yvalid[:, 0], y0.ravel())
-assert_equal(yvalid[:, 1], y1.ravel())
+# this raises an exception:
+#print(trim_centered(yf, (x.shape).shape)
+assert_equal(yvalid[:,0], y0.ravel())
+assert_equal(yvalid[:,1], y1.ravel())


 def arfilter(x, a):
-    """apply an autoregressive filter to a series x
+    '''apply an autoregressive filter to a series x

     x can be 2d, a can be 1d, 2d, or 3d

@@ -87,99 +103,213 @@ def arfilter(x, a):

     TODO: initial conditions

-    """
-    pass
+    '''
+    x = np.asarray(x)
+    a = np.asarray(a)
+    if x.ndim == 1:
+        x = x[:,None]
+    if x.ndim > 2:
+        raise ValueError('x array has to be 1d or 2d')
+    nvar = x.shape[1]
+    nlags = a.shape[0]
+    ntrim = nlags//2
+    # for x is 2d with ncols >1
+
+    if a.ndim == 1:
+        # case: identical ar filter (lag polynomial)
+        return signal.convolve(x, a[:,None], mode='valid')
+        # alternative:
+        #return signal.lfilter(a,[1],x.astype(float),axis=0)
+    elif a.ndim == 2:
+        if min(a.shape) == 1:
+            # case: identical ar filter (lag polynomial)
+            return signal.convolve(x, a, mode='valid')
+
+        # case: independent ar
+        #(a bit like recserar in gauss, but no x yet)
+        result = np.zeros((x.shape[0]-nlags+1, nvar))
+        for i in range(nvar):
+            # could also use np.convolve, but easier for switching to fft
+            result[:,i] = signal.convolve(x[:,i], a[:,i], mode='valid')
+        return result

+    elif a.ndim == 3:
+        # case: vector autoregressive with lag matrices
+#        #not necessary:
+#        if np.any(a.shape[1:] != nvar):
+#            raise ValueError('if 3d shape of a has to be (nobs,nvar,nvar)')
+        yf = signal.convolve(x[:,:,None], a)
+        yvalid = yf[ntrim:-ntrim, yf.shape[1]//2,:]
+        return yvalid

-a3f = np.ones((2, 3, 3))
-y0ar = arfilter(x, a3f[:, :, 0])
+a3f = np.ones((2,3,3))
+y0ar = arfilter(x,a3f[:,:,0])
 print(y0ar, x[1:] + x[:-1])
-yres = arfilter(x, a3f[:, :, :2])
-print(np.all(yres == (x[1:, :].sum(1) + x[:-1].sum(1))[:, None]))
-yff = fftconvolve(x.astype(float)[:, :, None], a3f)
+yres = arfilter(x,a3f[:,:,:2])
+print(np.all(yres == (x[1:,:].sum(1) + x[:-1].sum(1))[:,None]))
+
+
+yff = fftconvolve(x.astype(float)[:,:,None],a3f)
+
 rvs = np.random.randn(500)
-ar1fft = fftconvolve(rvs, np.array([1, -0.8]))
-ar1fftp = fftconvolve(np.r_[np.zeros(100), rvs], np.array([1, -0.8]))
-ar1lf = signal.lfilter([1], [1, -0.8], rvs)
+ar1fft = fftconvolve(rvs,np.array([1,-0.8]))
+#ar1fftp = fftconvolve(np.r_[np.zeros(100),rvs,np.zeros(100)],np.array([1,-0.8]))
+ar1fftp = fftconvolve(np.r_[np.zeros(100),rvs],np.array([1,-0.8]))
+ar1lf = signal.lfilter([1], [1,-0.8], rvs)
+
 ar1 = np.zeros(501)
-for i in range(1, 501):
-    ar1[i] = 0.8 * ar1[i - 1] + rvs[i - 1]
+for i in range(1,501):
+    ar1[i] = 0.8*ar1[i-1] + rvs[i-1]
+
+#the previous looks wrong, is for generating ar with delayed error,
+#or maybe for an ma(1) filter, (generating ar and applying ma filter are the same)
+#maybe not since it replicates lfilter and fftp
+#still strange explanation for convolution
+#ok. because this is my fftconvolve, which is an inverse filter (read the namespace!)
+
+#This is an AR filter
 errar1 = np.zeros(501)
-for i in range(1, 500):
-    errar1[i] = rvs[i] - 0.8 * rvs[i - 1]
-print("""
- compare: 
-errloop - arloop - fft - lfilter - fftp (padded)""")
-print(np.column_stack((errar1[1:31], ar1[1:31], ar1fft[:30], ar1lf[:30],
-    ar1fftp[100:130])))
-print(maxabs(ar1[1:], ar1lf))
-print(maxabs(ar1[1:], ar1fftp[100:-1]))
-rvs3 = np.random.randn(500, 3)
-a3n = np.array([[1, 1, 1], [-0.8, 0.5, 0.1]])
-a3n = np.array([[1, 1, 1], [-0.8, 0.0, 0.0]])
-a3n = np.array([[1, -1, -1], [-0.8, 0.0, 0.0]])
-a3n = np.array([[1, 0, 0], [-0.8, 0.0, 0.0]])
-a3ne = np.r_[np.ones((1, 3)), -0.8 * np.eye(3)]
-a3ne = np.r_[np.ones((1, 3)), -0.8 * np.eye(3)]
-ar13fft = fftconvolve(rvs3, a3n)
-ar13 = np.zeros((501, 3))
-for i in range(1, 501):
-    ar13[i] = np.sum(a3n[1, :] * ar13[i - 1]) + rvs[i - 1]
-imp = np.zeros((10, 3))
-imp[0] = 1
-a3n = np.array([[1, 0, 0], [-0.8, 0.0, 0.0]])
-fftconvolve(np.r_[np.zeros((100, 3)), imp], a3n)[100:]
-a3n = np.array([[1, 0, 0], [-0.8, -0.5, 0.0]])
-fftconvolve(np.r_[np.zeros((100, 3)), imp], a3n)[100:]
-a3n3 = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [[-
-    0.8, 0.0, 0.0], [0.0, -0.8, 0.0], [0.0, 0.0, -0.8]]])
-a3n3 = np.array([[[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [[-
-    0.8, 0.0, 0.0], [0.0, -0.8, 0.0], [0.0, 0.0, -0.8]]])
-ttt = fftconvolve(np.r_[np.zeros((100, 3)), imp][:, :, None], a3n3.T)[100:]
-gftt = ttt / ttt[0, :, :]
-a3n3 = np.array([[[1.0, 0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [[-0.8,
-    0.2, 0.0], [0, 0.0, 0.0], [0.0, 0.0, 0.8]]])
-ttt = fftconvolve(np.r_[np.zeros((100, 3)), imp][:, :, None], a3n3)[100:]
-gftt = ttt / ttt[0, :, :]
-signal.fftconvolve(np.dstack((imp, imp, imp)), a3n3)[1, :, :]
+for i in range(1,500):
+    errar1[i] = rvs[i] - 0.8*rvs[i-1]
+
+#print(ar1[-10:])
+#print(ar1fft[-11:-1])
+#print(ar1lf[-10:])
+#print(ar1[:10])
+#print(ar1fft[1:11])
+#print(ar1lf[:10])
+#print(ar1[100:110])
+#print(ar1fft[100:110])
+#print(ar1lf[100:110])
+#
+#arloop - lfilter - fftp (padded)  are the same
+print('\n compare: \nerrloop - arloop - fft - lfilter - fftp (padded)')
+#print(np.column_stack((ar1[1:31],ar1fft[:30], ar1lf[:30]))
+print(np.column_stack((errar1[1:31], ar1[1:31],ar1fft[:30], ar1lf[:30],
+                       ar1fftp[100:130])))
+
+def maxabs(x,y):
+    return np.max(np.abs(x-y))
+
+print(maxabs(ar1[1:], ar1lf))  #0
+print(maxabs(ar1[1:], ar1fftp[100:-1])) # around 1e-15
+
+rvs3 = np.random.randn(500,3)
+a3n = np.array([[1,1,1],[-0.8,0.5,0.1]])
+a3n = np.array([[1,1,1],[-0.8,0.0,0.0]])
+a3n = np.array([[1,-1,-1],[-0.8,0.0,0.0]])
+a3n = np.array([[1,0,0],[-0.8,0.0,0.0]])
+a3ne = np.r_[np.ones((1,3)),-0.8*np.eye(3)]
+a3ne = np.r_[np.ones((1,3)),-0.8*np.eye(3)]
+ar13fft = fftconvolve(rvs3,a3n)
+
+ar13 = np.zeros((501,3))
+for i in range(1,501):
+    ar13[i] = np.sum(a3n[1,:]*ar13[i-1]) + rvs[i-1]
+
+#changes imp was not defined, not sure what it is supposed to be
+#copied from a .log file
+imp = np.zeros((10,3))
+imp[0]=1
+
+a3n = np.array([[1,0,0],[-0.8,0.0,0.0]])
+fftconvolve(np.r_[np.zeros((100,3)),imp],a3n)[100:]
+a3n = np.array([[1,0,0],[-0.8,-0.50,0.0]])
+fftconvolve(np.r_[np.zeros((100,3)),imp],a3n)[100:]
+
+a3n3 = np.array([[[ 1. ,  0. ,  0. ],
+                 [ 0. ,  1. ,  0. ],
+                 [ 0. ,  0. ,  1. ]],
+
+                [[-0.8,  0. ,  0. ],
+                 [ 0. , -0.8,  0. ],
+                 [ 0. ,  0. , -0.8]]])
+
+a3n3 = np.array([[[ 1. ,  0.5 ,  0. ],
+                  [ 0. ,  1. ,  0. ],
+                  [ 0. ,  0. ,  1. ]],
+
+                 [[-0.8,  0. ,  0. ],
+                  [ 0. , -0.8,  0. ],
+                  [ 0. ,  0. , -0.8]]])
+ttt = fftconvolve(np.r_[np.zeros((100,3)),imp][:,:,None],a3n3.T)[100:]
+gftt = ttt/ttt[0,:,:]
+
+a3n3 = np.array([[[ 1. ,  0 ,  0. ],
+                  [ 0. ,  1. ,  0. ],
+                  [ 0. ,  0. ,  1. ]],
+
+                 [[-0.8,  0.2 ,  0. ],
+                  [ 0 ,  0.0,  0. ],
+                  [ 0. ,  0. , 0.8]]])
+ttt = fftconvolve(np.r_[np.zeros((100,3)),imp][:,:,None],a3n3)[100:]
+gftt = ttt/ttt[0,:,:]
+signal.fftconvolve(np.dstack((imp,imp,imp)),a3n3)[1,:,:]
+
 nobs = 10
-imp = np.zeros((nobs, 3))
-imp[1] = 1.0
-ar13 = np.zeros((nobs + 1, 3))
-for i in range(1, nobs + 1):
-    ar13[i] = np.dot(a3n3[1, :, :], ar13[i - 1]) + imp[i - 1]
-a3n3inv = np.zeros((nobs + 1, 3, 3))
-a3n3inv[0, :, :] = a3n3[0]
-a3n3inv[1, :, :] = -a3n3[1]
-for i in range(2, nobs + 1):
-    a3n3inv[i, :, :] = np.dot(-a3n3[1], a3n3inv[i - 1, :, :])
-a3n3sy = np.array([[[1.0, 0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [[-
-    0.8, 0.2, 0.0], [0, 0.0, 0.0], [0.0, 0.0, 0.8]]])
+imp = np.zeros((nobs,3))
+imp[1] = 1.
+ar13 = np.zeros((nobs+1,3))
+for i in range(1,nobs+1):
+    ar13[i] = np.dot(a3n3[1,:,:],ar13[i-1]) + imp[i-1]
+
+a3n3inv = np.zeros((nobs+1,3,3))
+a3n3inv[0,:,:] = a3n3[0]
+a3n3inv[1,:,:] = -a3n3[1]
+for i in range(2,nobs+1):
+    a3n3inv[i,:,:] = np.dot(-a3n3[1],a3n3inv[i-1,:,:])
+
+
+a3n3sy = np.array([[[ 1. ,  0 ,  0. ],
+                  [ 0. ,  1. ,  0. ],
+                  [ 0. ,  0. ,  1. ]],
+
+                 [[-0.8,  0.2 ,  0. ],
+                  [ 0 ,  0.0,  0. ],
+                  [ 0. ,  0. , 0.8]]])
+
 nobs = 10
-a = np.array([[[1.0, 0.0], [0.0, 1.0]], [[-0.8, 0.0], [-0.1, -0.8]]])
-a2n3inv = np.zeros((nobs + 1, 2, 2))
-a2n3inv[0, :, :] = a[0]
-a2n3inv[1, :, :] = -a[1]
-for i in range(2, nobs + 1):
-    a2n3inv[i, :, :] = np.dot(-a[1], a2n3inv[i - 1, :, :])
+a = np.array([[[ 1. ,  0. ],
+        [ 0. ,  1. ]],
+
+       [[-0.8,  0.0 ],
+        [ -0.1 , -0.8]]])
+
+
+a2n3inv = np.zeros((nobs+1,2,2))
+a2n3inv[0,:,:] = a[0]
+a2n3inv[1,:,:] = -a[1]
+for i in range(2,nobs+1):
+    a2n3inv[i,:,:] = np.dot(-a[1],a2n3inv[i-1,:,:])
+
 nobs = 10
-imp = np.zeros((nobs, 2))
-imp[0, 0] = 1.0
-a2 = np.array([[[1.0, 0.0], [0.0, 1.0]], [[-0.8, 0.0], [0.1, -0.8]]])
-ar12 = np.zeros((nobs + 1, 2))
-for i in range(1, nobs + 1):
-    ar12[i] = np.dot(-a2[1, :, :], ar12[i - 1]) + imp[i - 1]
-u = np.random.randn(10, 2)
-ar12r = np.zeros((nobs + 1, 2))
-for i in range(1, nobs + 1):
-    ar12r[i] = np.dot(-a2[1, :, :], ar12r[i - 1]) + u[i - 1]
-a2inv = np.zeros((nobs + 1, 2, 2))
-a2inv[0, :, :] = a2[0]
-a2inv[1, :, :] = -a2[1]
-for i in range(2, nobs + 1):
-    a2inv[i, :, :] = np.dot(-a2[1], a2inv[i - 1, :, :])
+imp = np.zeros((nobs,2))
+imp[0,0] = 1.
+
+#a2 was missing, copied from .log file, not sure if correct
+a2 = np.array([[[ 1. ,  0. ],
+        [ 0. ,  1. ]],
+
+       [[-0.8,  0. ],
+        [0.1, -0.8]]])
+
+ar12 = np.zeros((nobs+1,2))
+for i in range(1,nobs+1):
+    ar12[i] = np.dot(-a2[1,:,:],ar12[i-1]) + imp[i-1]
+
+u = np.random.randn(10,2)
+ar12r = np.zeros((nobs+1,2))
+for i in range(1,nobs+1):
+    ar12r[i] = np.dot(-a2[1,:,:],ar12r[i-1]) + u[i-1]
+
+a2inv = np.zeros((nobs+1,2,2))
+a2inv[0,:,:] = a2[0]
+a2inv[1,:,:] = -a2[1]
+for i in range(2,nobs+1):
+    a2inv[i,:,:] = np.dot(-a2[1],a2inv[i-1,:,:])
+
 nbins = 12
-binProb = np.zeros(nbins) + 1.0 / nbins
+binProb = np.zeros(nbins) + 1.0/nbins
 binSumProb = np.add.accumulate(binProb)
 print(binSumProb)
-print(stats.gamma.ppf(binSumProb, 0.6379, loc=1.6, scale=39.555))
+print(stats.gamma.ppf(binSumProb,0.6379,loc=1.6,scale=39.555))
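The a2inv recursion above is just the VAR(1) impulse-response sequence, i.e. powers of -A1; a small sketch making that explicit:

    import numpy as np

    A1 = np.array([[-0.8, 0.0],
                   [0.1, -0.8]])
    nobs = 10
    irf = np.array([np.linalg.matrix_power(-A1, i) for i in range(nobs + 1)])
    print(irf[2])            # equals (-A1) @ (-A1), matching a2inv[2]
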
diff --git a/statsmodels/sandbox/tsa/varma.py b/statsmodels/sandbox/tsa/varma.py
index c5f13e4ef..28b5f05be 100644
--- a/statsmodels/sandbox/tsa/varma.py
+++ b/statsmodels/sandbox/tsa/varma.py
@@ -1,4 +1,4 @@
-"""VAR and VARMA process
+'''VAR and VARMA process

 this does not actually do much, trying out a version for a time loop

@@ -20,13 +20,17 @@ changes

 Author : josefpkt
 License : BSD
-"""
+'''
+
 import numpy as np
 from scipy import signal


-def VAR(x, B, const=0):
-    """ multivariate linear filter
+#NOTE: this just returns the predicted values given the
+#B matrix in polynomial form.
+#TODO: make sure VAR class returns B/params in this form.
+def VAR(x,B, const=0):
+    ''' multivariate linear filter

     Parameters
     ----------
@@ -57,12 +61,21 @@ def VAR(x, B, const=0):
     ----------
     https://en.wikipedia.org/wiki/Vector_Autoregression
     https://en.wikipedia.org/wiki/General_matrix_notation_of_a_VAR(p)
-    """
-    pass
-
-
-def VARMA(x, B, C, const=0):
-    """ multivariate linear filter
+    '''
+    p = B.shape[0]
+    T = x.shape[0]
+    xhat = np.zeros(x.shape)
+    for t in range(p,T): #[p+2]:#
+##        print(p,T)
+##        print(x[t-p:t,:,np.newaxis].shape)
+##        print(B.shape)
+        #print(x[t-p:t,:,np.newaxis])
+        xhat[t,:] = const + (x[t-p:t,:,np.newaxis]*B).sum(axis=1).sum(axis=0)
+    return xhat
+
+
+def VARMA(x,B,C, const=0):
+    ''' multivariate linear filter

     x (TxK)
     B (PxKxK)
@@ -70,52 +83,90 @@ def VARMA(x, B, C, const=0):
     xhat(t,i) = sum{_p}sum{_k} { x(t-P:t,:) .* B(:,:,i) } +
                 sum{_q}sum{_k} { e(t-Q:t,:) .* C(:,:,i) }for all i = 0,K-1

-    """
-    pass
+    '''
+    P = B.shape[0]
+    Q = C.shape[0]
+    T = x.shape[0]
+    xhat = np.zeros(x.shape)
+    e = np.zeros(x.shape)
+    start = max(P,Q)
+    for t in range(start,T): #[p+2]:#
+##        print(p,T
+##        print(x[t-p:t,:,np.newaxis].shape
+##        print(B.shape
+        #print(x[t-p:t,:,np.newaxis]
+        xhat[t,:] =  const + (x[t-P:t,:,np.newaxis]*B).sum(axis=1).sum(axis=0) + \
+                     (e[t-Q:t,:,np.newaxis]*C).sum(axis=1).sum(axis=0)
+        e[t,:] = x[t,:] - xhat[t,:]
+    return xhat, e


 if __name__ == '__main__':
+
+
     T = 20
     K = 2
     P = 3
-    x = np.column_stack([np.arange(T)] * K)
-    B = np.ones((P, K, K))
-    B[:, :, 1] = [[0, 0], [0, 0], [0, 1]]
-    xhat = VAR(x, B)
-    print(np.all(xhat[P:, 0] == np.correlate(x[:-1, 0], np.ones(P)) * 2))
+    #x = np.arange(10).reshape(5,2)
+    x = np.column_stack([np.arange(T)]*K)
+    B = np.ones((P,K,K))
+    #B[:,:,1] = 2
+    B[:,:,1] = [[0,0],[0,0],[0,1]]
+    xhat = VAR(x,B)
+    print(np.all(xhat[P:,0]==np.correlate(x[:-1,0],np.ones(P))*2))
+    #print(xhat)
+
+
     T = 20
     K = 2
     Q = 2
     P = 3
     const = 1
-    x = np.column_stack([np.arange(T)] * K)
-    B = np.ones((P, K, K))
-    B[:, :, 1] = [[0, 0], [0, 0], [0, 1]]
-    C = np.zeros((Q, K, K))
-    xhat1 = VAR(x, B, const=const)
-    xhat2, err2 = VARMA(x, B, C, const=const)
+    #x = np.arange(10).reshape(5,2)
+    x = np.column_stack([np.arange(T)]*K)
+    B = np.ones((P,K,K))
+    #B[:,:,1] = 2
+    B[:,:,1] = [[0,0],[0,0],[0,1]]
+    C = np.zeros((Q,K,K))
+    xhat1 = VAR(x,B, const=const)
+    xhat2, err2 = VARMA(x,B,C, const=const)
     print(np.all(xhat2 == xhat1))
-    print(np.all(xhat2[P:, 0] == np.correlate(x[:-1, 0], np.ones(P)) * 2 +
-        const))
-    C[1, 1, 1] = 0.5
-    xhat3, err3 = VARMA(x, B, C)
-    x = np.r_[np.zeros((P, K)), x]
-    xhat4, err4 = VARMA(x, B, C)
-    C[1, 1, 1] = 1
-    B[:, :, 1] = [[0, 0], [0, 0], [0, 1]]
-    xhat5, err5 = VARMA(x, B, C)
-    x0 = np.column_stack([np.arange(T), 2 * np.arange(T)])
-    B[:, :, 0] = np.ones((P, K))
-    B[:, :, 1] = np.ones((P, K))
-    B[1, 1, 1] = 0
-    xhat0 = VAR(x0, B)
-    xcorr00 = signal.correlate(x0, B[:, :, 0])
-    xcorr01 = signal.correlate(x0, B[:, :, 1])
-    print(np.all(signal.correlate(x0, B[:, :, 0], 'valid')[:-1, 0] == xhat0
-        [P:, 0]))
-    print(np.all(signal.correlate(x0, B[:, :, 1], 'valid')[:-1, 0] == xhat0
-        [P:, 1]))
+    print(np.all(xhat2[P:,0] == np.correlate(x[:-1,0],np.ones(P))*2+const))
+
+    C[1,1,1] = 0.5
+    xhat3, err3 = VARMA(x,B,C)
+
+    x = np.r_[np.zeros((P,K)),x]  #prepend initial conditions
+    xhat4, err4 = VARMA(x,B,C)
+
+    C[1,1,1] = 1
+    B[:,:,1] = [[0,0],[0,0],[0,1]]
+    xhat5, err5 = VARMA(x,B,C)
+    #print(err5)
+
+    #in differences
+    #VARMA(np.diff(x,axis=0),B,C)
+
+
+    #Note:
+    # * signal correlate applies same filter to all columns if kernel.shape[1]<K
+    #   e.g. signal.correlate(x0,np.ones((3,1)),'valid')
+    # * if kernel.shape[1]==K, then `valid` produces a single column
+    #   -> possible to run signal.correlate K times with different filters,
+    #      see the following example, which replicates VAR filter
+    x0 = np.column_stack([np.arange(T), 2*np.arange(T)])
+    B[:,:,0] = np.ones((P,K))
+    B[:,:,1] = np.ones((P,K))
+    B[1,1,1] = 0
+    xhat0 = VAR(x0,B)
+    xcorr00 = signal.correlate(x0,B[:,:,0])#[:,0]
+    xcorr01 = signal.correlate(x0,B[:,:,1])
+    print(np.all(signal.correlate(x0,B[:,:,0],'valid')[:-1,0]==xhat0[P:,0]))
+    print(np.all(signal.correlate(x0,B[:,:,1],'valid')[:-1,0]==xhat0[P:,1]))
+
+    #import error
+    #from movstat import acovf, acf
     from statsmodels.tsa.stattools import acovf, acf
-    aav = acovf(x[:, 0])
-    print(aav[0] == np.var(x[:, 0]))
-    aac = acf(x[:, 0])
+    aav = acovf(x[:,0])
+    print(aav[0] == np.var(x[:,0]))
+    aac = acf(x[:,0])
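A small self-contained sketch of what the restored VAR filter computes: with an all-ones (P, K, K) coefficient array, every prediction is twice the moving sum of the previous P observations, which is the identity the `__main__` block above checks with np.correlate. The numbers are illustrative only.

import numpy as np

T, K, P = 20, 2, 3
x = np.column_stack([np.arange(T)] * K)
B = np.ones((P, K, K))
xhat = np.zeros(x.shape)
for t in range(P, T):
    # same contraction as in VAR(): sum over lags, then over columns of x
    xhat[t, :] = (x[t - P:t, :, None] * B).sum(axis=1).sum(axis=0)
print(np.all(xhat[P:, 0] == np.correlate(x[:-1, 0], np.ones(P)) * 2))  # True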
diff --git a/statsmodels/stats/_adnorm.py b/statsmodels/stats/_adnorm.py
index 794af0a12..bed38c4fd 100644
--- a/statsmodels/stats/_adnorm.py
+++ b/statsmodels/stats/_adnorm.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sun Sep 25 21:23:38 2011

@@ -5,8 +6,10 @@ Author: Josef Perktold and Scipy developers
 License : BSD-3
 """
 import warnings
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.tools.validation import array_like, bool_like, int_like


@@ -34,7 +37,43 @@ def anderson_statistic(x, dist='norm', fit=True, params=(), axis=0):
     {float, ndarray}
         The Anderson-Darling statistic.
     """
-    pass
+    x = array_like(x, 'x', ndim=None)
+    fit = bool_like(fit, 'fit')
+    axis = int_like(axis, 'axis')
+    y = np.sort(x, axis=axis)
+    nobs = y.shape[axis]
+    if fit:
+        if dist == 'norm':
+            xbar = np.expand_dims(np.mean(x, axis=axis), axis)
+            s = np.expand_dims(np.std(x, ddof=1, axis=axis), axis)
+            w = (y - xbar) / s
+            z = stats.norm.cdf(w)
+        elif callable(dist):
+            params = dist.fit(x)
+            z = dist.cdf(y, *params)
+        else:
+            raise ValueError("dist must be 'norm' or a Callable")
+    else:
+        if callable(dist):
+            z = dist.cdf(y, *params)
+        else:
+            raise ValueError('if fit is false, then dist must be callable')
+
+    i = np.arange(1, nobs + 1)
+    sl1 = [None] * x.ndim
+    sl1[axis] = slice(None)
+    sl1 = tuple(sl1)
+    sl2 = [slice(None)] * x.ndim
+    sl2[axis] = slice(None, None, -1)
+    sl2 = tuple(sl2)
+    with warnings.catch_warnings():
+        warnings.filterwarnings(
+            "ignore", message="divide by zero encountered in log1p"
+        )
+        ad_values = (2 * i[sl1] - 1.0) / nobs * (np.log(z) + np.log1p(-z[sl2]))
+        s = np.sum(ad_values, axis=axis)
+    a2 = -nobs - s
+    return a2


 def normal_ad(x, axis=0):
@@ -64,4 +103,40 @@ def normal_ad(x, axis=0):
         Kolmogorov-Smirnov test with estimated parameters for Normal or
         Exponential distributions.
     """
-    pass
+    ad2 = anderson_statistic(x, dist='norm', fit=True, axis=axis)
+    n = x.shape[axis]
+
+    ad2a = ad2 * (1 + 0.75 / n + 2.25 / n ** 2)
+
+    if np.size(ad2a) == 1:
+        if (ad2a >= 0.00 and ad2a < 0.200):
+            pval = 1 - np.exp(-13.436 + 101.14 * ad2a - 223.73 * ad2a ** 2)
+        elif ad2a < 0.340:
+            pval = 1 - np.exp(-8.318 + 42.796 * ad2a - 59.938 * ad2a ** 2)
+        elif ad2a < 0.600:
+            pval = np.exp(0.9177 - 4.279 * ad2a - 1.38 * ad2a ** 2)
+        elif ad2a <= 13:
+            pval = np.exp(1.2937 - 5.709 * ad2a + 0.0186 * ad2a ** 2)
+        else:
+            pval = 0.0  # is < 4.9542108058458799e-31
+
+    else:
+        bounds = np.array([0.0, 0.200, 0.340, 0.600])
+
+        pval0 = lambda ad2a: np.nan * np.ones_like(ad2a)
+        pval1 = lambda ad2a: 1 - np.exp(
+            -13.436 + 101.14 * ad2a - 223.73 * ad2a ** 2)
+        pval2 = lambda ad2a: 1 - np.exp(
+            -8.318 + 42.796 * ad2a - 59.938 * ad2a ** 2)
+        pval3 = lambda ad2a: np.exp(0.9177 - 4.279 * ad2a - 1.38 * ad2a ** 2)
+        pval4 = lambda ad2a: np.exp(1.2937 - 5.709 * ad2a + 0.0186 * ad2a ** 2)
+
+        pvalli = [pval0, pval1, pval2, pval3, pval4]
+
+        idx = np.searchsorted(bounds, ad2a, side='right')
+        pval = np.nan * np.ones_like(ad2a)
+        for i in range(5):
+            mask = (idx == i)
+            pval[mask] = pvalli[i](ad2a[mask])
+
+    return ad2, pval
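For reference, a minimal re-derivation of the A-squared statistic that anderson_statistic computes above, assuming a one-dimensional sample and the normal case with estimated mean and standard deviation (ddof=1); normal_ad then rescales the statistic by (1 + 0.75/n + 2.25/n**2) and maps it to a p-value through the piecewise exponential approximation.

import numpy as np
from scipy import stats

def ad_normal_sketch(x):
    # sort, transform through the fitted normal CDF, then apply the A^2 formula
    y = np.sort(np.asarray(x, dtype=float))
    n = y.shape[0]
    z = stats.norm.cdf((y - y.mean()) / y.std(ddof=1))
    i = np.arange(1, n + 1)
    return -n - np.sum((2 * i - 1.0) / n * (np.log(z) + np.log1p(-z[::-1])))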
diff --git a/statsmodels/stats/_delta_method.py b/statsmodels/stats/_delta_method.py
index c9db09321..956e15316 100644
--- a/statsmodels/stats/_delta_method.py
+++ b/statsmodels/stats/_delta_method.py
@@ -1,15 +1,17 @@
+# -*- coding: utf-8 -*-
 """
 Author: Josef Perktold
 License: BSD-3

 """
+
 from __future__ import print_function
 import numpy as np
 from scipy import stats


 class NonlinearDeltaCov:
-    """Asymptotic covariance by Deltamethod
+    '''Asymptotic covariance by Deltamethod

     The function is designed for 2d array, with rows equal to
     the number of equations or constraints and columns equal to the number
@@ -46,8 +48,7 @@ class NonlinearDeltaCov:
         Not yet implemented.


-    """
-
+    '''
     def __init__(self, func, params, cov_params, deriv=None, func_args=None):
         self.fun = func
         self.params = params
@@ -76,12 +77,27 @@ class NonlinearDeltaCov:
         grad : ndarray
             gradient or jacobian of the function
         """
-        pass
+        if params is None:
+            params = self.params
+        if self._grad is not None:
+            return self._grad(params)
+        else:
+            # copied from discrete_margins
+            try:
+                from statsmodels.tools.numdiff import approx_fprime_cs
+                jac = approx_fprime_cs(params, self.fun, **kwds)
+            except TypeError:  # norm.cdf doesn't take complex values
+                from statsmodels.tools.numdiff import approx_fprime
+                jac = approx_fprime(params, self.fun, **kwds)
+
+            return jac

     def cov(self):
         """Covariance matrix of the transformed random variable.
         """
-        pass
+        g = self.grad()
+        covar = np.dot(np.dot(g, self.cov_params), g.T)
+        return covar

     def predicted(self):
         """Value of the function evaluated at the attached params.
@@ -91,7 +107,13 @@ class NonlinearDeltaCov:
         `predicted` is the maximum likelihood estimate of the value of the
         nonlinear function.
         """
-        pass
+
+        predicted = self.fun(self.params)
+
+        # TODO: why do I need to squeeze in poisson example
+        if predicted.ndim > 1:
+            predicted = predicted.squeeze()
+        return predicted

     def wald_test(self, value):
         """Joint hypothesis tests that H0: f(params) = value.
@@ -115,22 +137,34 @@ class NonlinearDeltaCov:
             The p-value for the hypothesis test, based and chisquare
             distribution and implies a two-sided hypothesis test
         """
-        pass
+        # TODO: add use_t option or not?
+        m = self.predicted()
+        v = self.cov()
+        df_constraints = np.size(m)
+        diff = m - value
+        lmstat = np.dot(np.dot(diff.T, np.linalg.inv(v)), diff)
+        return lmstat, stats.chi2.sf(lmstat, df_constraints)

     def var(self):
         """standard error for each equation (row) treated separately

         """
-        pass
+        g = self.grad()
+        var = (np.dot(g, self.cov_params) * g).sum(-1)
+
+        if var.ndim == 2:
+            var = var.T
+        return var

     def se_vectorized(self):
         """standard error for each equation (row) treated separately

         """
-        pass
+        var = self.var()
+        return np.sqrt(var)

     def conf_int(self, alpha=0.05, use_t=False, df=None, var_extra=None,
-        predicted=None, se=None):
+                 predicted=None, se=None):
         """
         Confidence interval for predicted based on delta method.

@@ -164,10 +198,35 @@ class NonlinearDeltaCov:
             for the corresponding parameter. The first column contains all
             lower, the second column contains all upper limits.
         """
-        pass

-    def summary(self, xname=None, alpha=0.05, title=None, use_t=False, df=None
-        ):
+        # TODO: predicted and se as arguments to avoid duplicate calculations
+        # or leave unchanged?
+        if not use_t:
+            dist = stats.norm
+            dist_args = ()
+        else:
+            if df is None:
+                raise ValueError('t distribution requires df')
+            dist = stats.t
+            dist_args = (df,)
+
+        if predicted is None:
+            predicted = self.predicted()
+        if se is None:
+            se = self.se_vectorized()
+        if var_extra is not None:
+            se = np.sqrt(se**2 + var_extra)
+
+        q = dist.ppf(1 - alpha / 2., *dist_args)
+        lower = predicted - q * se
+        upper = predicted + q * se
+        ci = np.column_stack((lower, upper))
+        if ci.shape[1] != 2:
+            raise RuntimeError('something wrong: ci not 2 columns')
+        return ci
+
+    def summary(self, xname=None, alpha=0.05, title=None, use_t=False,
+                df=None):
         """Summarize the Results of the nonlinear transformation.

         This provides a parameter table equivalent to `t_test` and reuses
@@ -199,4 +258,23 @@ class NonlinearDeltaCov:
             results summary.
             For F or Wald test, the return is a string.
         """
-        pass
+        # this is an experimental reuse of ContrastResults
+        from statsmodels.stats.contrast import ContrastResults
+        predicted = self.predicted()
+        se = self.se_vectorized()
+        # TODO check shape for scalar case, ContrastResults requires iterable
+        predicted = np.atleast_1d(predicted)
+        if predicted.ndim > 1:
+            predicted = predicted.squeeze()
+        se = np.atleast_1d(se)
+
+        statistic = predicted / se
+        if use_t:
+            df_resid = df
+            cr = ContrastResults(effect=predicted, t=statistic, sd=se,
+                                 df_denom=df_resid)
+        else:
+            cr = ContrastResults(effect=predicted, statistic=statistic, sd=se,
+                                 df_denom=None, distribution='norm')
+
+        return cr.summary(xname=xname, alpha=alpha, title=title)
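A worked example of the delta-method algebra used by NonlinearDeltaCov above: the covariance of f(params) is approximated by G cov_params G', where G is the Jacobian of f at params. The parameter values and covariance matrix below are purely hypothetical.

import numpy as np
from scipy import stats

params = np.array([2.0, 3.0])
cov_params = np.array([[0.04, 0.01],
                       [0.01, 0.09]])
# f(b) = b0 * b1, so the gradient at params is (b1, b0)
grad = np.array([[params[1], params[0]]])
cov_f = grad @ cov_params @ grad.T          # delta-method variance of f
se = np.sqrt(np.diag(cov_f))
ci = params[0] * params[1] + stats.norm.ppf([0.025, 0.975]) * se  # normal-based CI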
diff --git a/statsmodels/stats/_diagnostic_other.py b/statsmodels/stats/_diagnostic_other.py
index 0ad5730d2..9be330e3c 100644
--- a/statsmodels/stats/_diagnostic_other.py
+++ b/statsmodels/stats/_diagnostic_other.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Score, lagrange multiplier and conditional moment tests
 robust to misspecification or without specification of higher moments

@@ -159,12 +160,17 @@ press, 2010.

 """
 import warnings
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.regression.linear_model import OLS


+# deprecated dispersion functions, moved to discrete._diagnostics_count
+
+
 def dispersion_poisson(results):
     """Score/LM type tests for Poisson variance assumptions

@@ -196,12 +202,19 @@ def dispersion_poisson(results):
        Each test has two strings a descriptive name and a string for the
        alternative hypothesis.
     """
-    pass
+    msg = (
+        'dispersion_poisson here is deprecated, use the version in '
+        'discrete._diagnostics_count'
+    )
+    warnings.warn(msg, FutureWarning)
+
+    from statsmodels.discrete._diagnostics_count import test_poisson_dispersion
+    return test_poisson_dispersion(results, _old=True)


-def dispersion_poisson_generic(results, exog_new_test, exog_new_control=
-    None, include_score=False, use_endog=True, cov_type='HC3', cov_kwds=
-    None, use_t=False):
+def dispersion_poisson_generic(results, exog_new_test, exog_new_control=None,
+                               include_score=False, use_endog=True,
+                               cov_type='HC3', cov_kwds=None, use_t=False):
     """A variable addition test for the variance function

     .. deprecated:: 0.14
@@ -215,21 +228,53 @@ def dispersion_poisson_generic(results, exog_new_test, exog_new_control=

     Warning: insufficiently tested, especially for options
     """
-    pass
+    msg = (
+        'dispersion_poisson_generic here is deprecated, use the version in '
+        'discrete._diagnostics_count'
+    )
+    warnings.warn(msg, FutureWarning)
+
+    from statsmodels.discrete._diagnostics_count import (
+        _test_poisson_dispersion_generic
+        )
+
+    res_test = _test_poisson_dispersion_generic(
+        results, exog_new_test, exog_new_control=exog_new_control,
+        include_score=include_score, use_endog=use_endog,
+        cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t,
+        )
+    return res_test


 class ResultsGeneric:

+
     def __init__(self, **kwds):
         self.__dict__.update(kwds)


 class TestResults(ResultsGeneric):
-    pass
+
+    def summary(self):
+        txt = 'Specification Test (LM, score)\n'
+        stat = [self.c1, self.c2, self.c3]
+        pval = [self.pval1, self.pval2, self.pval3]
+        description = ['nonrobust', 'dispersed', 'HC']
+
+        for row in zip(description, stat, pval):
+            txt += '%-12s  statistic = %6.4f  pvalue = %6.5f\n' % row
+
+        txt += '\nAssumptions:\n'
+        txt += 'nonrobust: variance is correctly specified\n'
+        txt += 'dispersed: variance correctly specified up to scale factor\n'
+        txt += 'HC       : robust to any heteroscedasticity\n'
+        txt += 'test is not robust to correlation across observations'
+
+        return txt


 def lm_test_glm(result, exog_extra, mean_deriv=None):
-    """score/lagrange multiplier test for GLM
+    '''score/lagrange multiplier test for GLM

     Wooldridge procedure for test of mean function in GLM

@@ -271,12 +316,79 @@ def lm_test_glm(result, exog_extra, mean_deriv=None):

     and other articles and text book by Wooldridge

-    """
-    pass
+    '''
+
+    if hasattr(result, '_result'):
+        res = result._result
+    else:
+        res = result
+
+    mod = result.model
+    nobs = mod.endog.shape[0]
+
+    #mean_func = mod.family.link.inverse
+    dlinkinv = mod.family.link.inverse_deriv
+
+    # derivative of mean function w.r.t. beta (linear params)
+    dm = lambda x, linpred: dlinkinv(linpred)[:,None] * x
+
+    var_func = mod.family.variance
+
+    x = result.model.exog
+    x2 = exog_extra
+
+    # test omitted
+    try:
+        lin_pred = res.predict(which="linear")
+    except TypeError:
+        # TODO: Standardized to which="linear" and remove linear kwarg
+        lin_pred = res.predict(linear=True)
+    dm_incl = dm(x, lin_pred)
+    if x2 is not None:
+        dm_excl = dm(x2, lin_pred)
+        if mean_deriv is not None:
+            # allow both and stack
+            dm_excl = np.column_stack((dm_excl, mean_deriv))
+    elif mean_deriv is not None:
+        dm_excl = mean_deriv
+    else:
+        raise ValueError('either exog_extra or mean_deriv have to be provided')
+
+    # TODO check for rank or redundant, note OLS calculates the rank
+    k_constraint = dm_excl.shape[1]
+    fittedvalues = res.predict()  # discrete has linpred instead of mean
+    v = var_func(fittedvalues)
+    std = np.sqrt(v)
+    res_ols1 = OLS(res.resid_response / std, np.column_stack((dm_incl, dm_excl)) / std[:, None]).fit()
+
+    # case: nonrobust assumes variance implied by distribution is correct
+    c1 = res_ols1.ess
+    pval1 = stats.chi2.sf(c1, k_constraint)
+    #print c1, stats.chi2.sf(c1, 2)
+
+    # case: robust to dispersion
+    c2 = nobs * res_ols1.rsquared
+    pval2 = stats.chi2.sf(c2, k_constraint)
+    #print c2, stats.chi2.sf(c2, 2)
+
+    # case: robust to heteroscedasticity
+    from statsmodels.stats.multivariate_tools import partial_project
+    pp = partial_project(dm_excl / std[:,None], dm_incl / std[:,None])
+    resid_p = res.resid_response / std
+    res_ols3 = OLS(np.ones(nobs), pp.resid * resid_p[:,None]).fit()
+    #c3 = nobs * res_ols3.rsquared   # this is Wooldridge
+    c3b = res_ols3.ess  # simpler if endog is ones
+    pval3 = stats.chi2.sf(c3b, k_constraint)
+
+    tres = TestResults(c1=c1, pval1=pval1,
+                       c2=c2, pval2=pval2,
+                       c3=c3b, pval3=pval3)
+
+    return tres


 def cm_test_robust(resid, resid_deriv, instruments, weights=1):
-    """score/lagrange multiplier of Wooldridge
+    '''score/lagrange multiplier of Wooldridge

     generic version of Wooldridge procedure for test of conditional moments

@@ -319,13 +431,46 @@ def cm_test_robust(resid, resid_deriv, instruments, weights=1):
     Wooldridge
     and more Wooldridge

-    """
-    pass
+    '''
+    # notation: Wooldridge uses too many Greek letters
+    # instruments is capital lambda
+    # resid is small phi
+    # resid_deriv is capital phi
+    # weights is C
+
+
+    nobs = resid.shape[0]
+
+    from statsmodels.stats.multivariate_tools import partial_project
+
+    w_sqrt = np.sqrt(weights)
+    if np.size(weights) > 1:
+        w_sqrt = w_sqrt[:,None]
+    pp = partial_project(instruments * w_sqrt, resid_deriv * w_sqrt)
+    mom_resid = pp.resid
+
+    moms_test = mom_resid * resid[:, None] * w_sqrt
+
+    # we get this here in case we extend resid to be more than 1-D
+    k_constraint = moms_test.shape[1]
+
+    # use OPG variance as in Wooldridge 1990. This might generalize
+    cov = moms_test.T.dot(moms_test)
+    diff = moms_test.sum(0)
+
+    # see Wooldridge last page in appendix
+    stat = diff.dot(np.linalg.solve(cov, diff))
+
+    # for checking, this corresponds to nobs * rsquared of auxiliary regression
+    stat2 = OLS(np.ones(nobs), moms_test).fit().ess
+    pval = stats.chi2.sf(stat, k_constraint)
+
+    return stat, pval, stat2


 def lm_robust(score, constraint_matrix, score_deriv_inv, cov_score,
-    cov_params=None):
-    """general formula for score/LM test
+              cov_params=None):
+    '''general formula for score/LM test

     generalized score or lagrange multiplier test for implicit constraints

@@ -361,12 +506,33 @@ def lm_robust(score, constraint_matrix, score_deriv_inv, cov_score,
     Notes
     -----

-    """
-    pass
+    '''
+    # shorthand alias
+    R, Ainv, B, V = constraint_matrix, score_deriv_inv, cov_score, cov_params
+
+    tmp = R.dot(Ainv)
+    wscore = tmp.dot(score)  # C Ainv score
+
+    if B is None and V is None:
+        # only Ainv is given, so we assume information matrix identity holds
+        # computational short cut, should be same if Ainv == inv(B)
+        lm_stat = score.dot(Ainv.dot(score))
+    else:
+        # information matrix identity does not hold
+        if V is None:
+            inner = tmp.dot(B).dot(tmp.T)
+        else:
+            inner = R.dot(V).dot(R.T)
+
+        #lm_stat2 = wscore.dot(np.linalg.pinv(inner).dot(wscore))
+        # Let's assume inner is invertible, TODO: check if usecase for pinv exists
+        lm_stat = wscore.dot(np.linalg.solve(inner, wscore))
+
+    return lm_stat#, lm_stat2


 def lm_robust_subset(score, k_constraints, score_deriv_inv, cov_score):
-    """general formula for score/LM test
+    '''general formula for score/LM test

     generalized score or lagrange multiplier test for constraints on a subset
     of parameters
@@ -406,12 +572,52 @@ def lm_robust_subset(score, k_constraints, score_deriv_inv, cov_score):
     The implementation is based on Boos 1992 section 4.1. The same derivation
     is also in other articles and in text books.

-    """
-    pass
+    '''
+
+    # Notation in Boos
+    # score `S = sum (s_i)
+    # score_obs `s_i`
+    # score_deriv `I` is derivative of score (hessian)
+    # `D` is covariance matrix of score, OPG product given independent observations
+
+    #k_params = len(score)
+
+    # Note: I reverse the order of constrained and unconstrained compared to Boos

+    # submatrices of score_deriv/hessian
+    # these are I22 and I12 in Boos
+    #h_uu = score_deriv[-k_constraints:, -k_constraints:]
+    h_uu = score_deriv_inv[:-k_constraints, :-k_constraints]
+    h_cu = score_deriv_inv[-k_constraints:, :-k_constraints]

-def lm_robust_subset_parts(score, k_constraints, score_deriv_uu,
-    score_deriv_cu, cov_score_cc, cov_score_cu, cov_score_uu):
+    # TODO: pinv or solve ?
+    tmp_proj = h_cu.dot(np.linalg.inv(h_uu))
+    tmp = np.column_stack((-tmp_proj, np.eye(k_constraints))) #, tmp_proj))
+
+    cov_score_constraints = tmp.dot(cov_score.dot(tmp.T))
+
+    #lm_stat2 = wscore.dot(np.linalg.pinv(inner).dot(wscore))
+    # Let's assume inner is invertible, TODO: check if usecase for pinv exists
+    lm_stat = score.dot(np.linalg.solve(cov_score_constraints, score))
+    pval = stats.chi2.sf(lm_stat, k_constraints)
+
+#     # check second calculation Boos referencing Kent 1982 and Engle 1984
+#     # we can use this when robust_cov_params of full model is available
+#     #h_inv = np.linalg.inv(score_deriv)
+#     hinv = score_deriv_inv
+#     v = h_inv.dot(cov_score.dot(h_inv)) # this is robust cov_params
+#     v_cc = v[:k_constraints, :k_constraints]
+#     h_cc = score_deriv[:k_constraints, :k_constraints]
+#     # brute force calculation:
+#     h_resid_cu = h_cc - h_cu.dot(np.linalg.solve(h_uu, h_cu))
+#     cov_s_c = h_resid_cu.dot(v_cc.dot(h_resid_cu))
+#     diff = np.max(np.abs(cov_s_c - cov_score_constraints))
+    return lm_stat, pval  #, lm_stat2
+
+
+def lm_robust_subset_parts(score, k_constraints,
+                           score_deriv_uu, score_deriv_cu,
+                           cov_score_cc, cov_score_cu, cov_score_uu):
     """robust generalized score tests on subset of parameters

     This is the same as lm_robust_subset with arguments in parts of
@@ -463,7 +669,18 @@ def lm_robust_subset_parts(score, k_constraints, score_deriv_uu,
     section 4.1 in the form attributed to Breslow (1990). It does not use the
     computation attributed to Kent (1982) and Engle (1984).
     """
-    pass
+
+    tmp_proj = np.linalg.solve(score_deriv_uu, score_deriv_cu.T).T
+    tmp = tmp_proj.dot(cov_score_cu.T)
+
+    # this needs to make a copy of cov_score_cc for further inplace modification
+    cov = cov_score_cc - tmp
+    cov -= tmp.T
+    cov += tmp_proj.dot(cov_score_uu).dot(tmp_proj.T)
+
+    lm_stat = score.dot(np.linalg.solve(cov, score))
+    pval = stats.chi2.sf(lm_stat, k_constraints)
+    return lm_stat, pval


 def lm_robust_reparameterized(score, params_deriv, score_deriv, cov_score):
@@ -502,11 +719,29 @@ def lm_robust_reparameterized(score, params_deriv, score_deriv, cov_score):
     -----
     Boos 1992, section 4.3, expression for T_{GS} just before example 6
     """
-    pass
+    # Boos notation
+    # params_deriv G
+
+    k_params, k_reduced = params_deriv.shape
+    k_constraints = k_params - k_reduced
+
+    G = params_deriv  # shortcut alias

+    tmp_c0 = np.linalg.pinv(G.T.dot(score_deriv.dot(G)))
+    tmp_c1 = score_deriv.dot(G.dot(tmp_c0.dot(G.T)))
+    tmp_c = np.eye(k_params) - tmp_c1

-def conditional_moment_test_generic(mom_test, mom_test_deriv, mom_incl,
-    mom_incl_deriv, var_mom_all=None, cov_type='OPG', cov_kwds=None):
+    cov = tmp_c.dot(cov_score.dot(tmp_c.T))  # warning: reduced rank
+
+    lm_stat = score.dot(np.linalg.pinv(cov).dot(score))
+    pval = stats.chi2.sf(lm_stat, k_constraints)
+    return lm_stat, pval
+
+
+def conditional_moment_test_generic(mom_test, mom_test_deriv,
+                                    mom_incl, mom_incl_deriv,
+                                    var_mom_all=None,
+                                    cov_type='OPG', cov_kwds=None):
     """generic conditional moment test

     This is mainly intended as internal function in support of diagnostic
@@ -552,12 +787,54 @@ def conditional_moment_test_generic(mom_test, mom_test_deriv, mom_incl,
     Wooldridge ???
     Pagan and Vella 1989
     """
-    pass
+    if cov_type != 'OPG':
+        raise NotImplementedError
+
+    k_constraints = mom_test.shape[1]
+
+    if mom_incl is None:
+        # assume mom_test_deriv is zero, do not include effect of mom_incl
+        if var_mom_all is None:
+            var_cm = mom_test.T.dot(mom_test)
+        else:
+            var_cm = var_mom_all
+
+    else:
+        # take into account the effect of parameter estimates on mom_test
+        if var_mom_all is None:
+            mom_all = np.column_stack((mom_test, mom_incl))
+            # TODO: replace with inner sandwich covariance estimator
+            var_mom_all = mom_all.T.dot(mom_all)
+
+        tmp = mom_test_deriv.dot(np.linalg.pinv(mom_incl_deriv))
+        h = np.column_stack((np.eye(k_constraints), -tmp))
+
+        var_cm = h.dot(var_mom_all.dot(h.T))
+
+    # calculate test results with chisquare
+    var_cm_inv = np.linalg.pinv(var_cm)
+    mom_test_sum = mom_test.sum(0)
+    statistic = mom_test_sum.dot(var_cm_inv.dot(mom_test_sum))
+    pval = stats.chi2.sf(statistic, k_constraints)
+
+    # normal test of individual components
+    se = np.sqrt(np.diag(var_cm))
+    tvalues = mom_test_sum / se
+    pvalues = stats.norm.sf(np.abs(tvalues))
+
+    res = ResultsGeneric(var_cm=var_cm,
+                         stat_cmt=statistic,
+                         pval_cmt=pval,
+                         tvalues=tvalues,
+                         pvalues=pvalues)
+
+    return res


 def conditional_moment_test_regression(mom_test, mom_test_deriv=None,
-    mom_incl=None, mom_incl_deriv=None, var_mom_all=None, demean=False,
-    cov_type='OPG', cov_kwds=None):
+                                    mom_incl=None, mom_incl_deriv=None,
+                                    var_mom_all=None, demean=False,
+                                    cov_type='OPG', cov_kwds=None):
     """generic conditional moment test based artificial regression

     this is very experimental, no options implemented yet
@@ -571,7 +848,28 @@ def conditional_moment_test_regression(mom_test, mom_test_deriv=None,
     and it is assumed that parameters were estimated with optimial GMM, i.e.
     the weight matrix equal to the expectation of the score variance.
     """
-    pass
+    # so far coded from memory
+    nobs, k_constraints = mom_test.shape
+
+    endog = np.ones(nobs)
+    if mom_incl is not None:
+        ex = np.column_stack((mom_test, mom_incl))
+    else:
+        ex = mom_test
+    if demean:
+        ex -= ex.mean(0)
+    if cov_type == 'OPG':
+        res = OLS(endog, ex).fit()
+
+        statistic = nobs * res.rsquared
+        pval = stats.chi2.sf(statistic, k_constraints)
+    else:
+        res = OLS(endog, ex).fit(cov_type=cov_type, cov_kwds=cov_kwds)
+        tres = res.wald_test(np.eye(ex.shape[1]))
+        statistic = tres.statistic
+        pval = tres.pvalue
+
+    return statistic, pval


 class CMTNewey:
@@ -638,18 +936,50 @@ class CMTNewey:
       Moment Tests, Econometrica
     """

-    def __init__(self, moments, cov_moments, moments_deriv, weights, transf_mt
-        ):
+    def __init__(self, moments, cov_moments, moments_deriv,
+                 weights, transf_mt):
+
         self.moments = moments
         self.cov_moments = cov_moments
         self.moments_deriv = moments_deriv
         self.weights = weights
         self.transf_mt = transf_mt
+
+        # derived quantities
         self.moments_constraint = transf_mt.dot(moments)
-        self.htw = moments_deriv.T.dot(weights)
-        self.k_moments = self.moments.shape[-1]
+        self.htw = moments_deriv.T.dot(weights)   # H'W
+
+        # TODO check these
+        self.k_moments = self.moments.shape[-1]  # in this case only 1-D
+        # assuming full rank of L'
         self.k_constraints = self.transf_mt.shape[0]

+    @cache_readonly
+    def asy_transf_params(self):
+
+        moments_deriv = self.moments_deriv  # H
+        #weights = self.weights  # W
+
+        htw = self.htw  # moments_deriv.T.dot(weights)   # H'W
+        res = np.linalg.solve(htw.dot(moments_deriv), htw)
+        #res = np.linalg.pinv(htw.dot(moments_deriv)).dot(htw)
+        return -res
+
+    @cache_readonly
+    def project_w(self):
+        # P_w = I - H (H' W H)^{-1} H' W
+        moments_deriv = self.moments_deriv  # H
+
+        res = moments_deriv.dot(self.asy_transf_params)
+        res += np.eye(res.shape[0])
+        return res
+
+    @cache_readonly
+    def asy_transform_mom_constraints(self):
+        # L P_w
+        res = self.transf_mt.dot(self.project_w)
+        return res
+
     @cache_readonly
     def asy_cov_moments(self):
         """
@@ -659,7 +989,20 @@ class CMTNewey:
         mean is not implemented,
         V is the same as cov_moments in __init__ argument
         """
-        pass
+
+        return self.cov_moments
+
+    @cache_readonly
+    def cov_mom_constraints(self):
+
+        # linear transformation
+        transf = self.asy_transform_mom_constraints
+
+        return transf.dot(self.asy_cov_moments).dot(transf.T)
+
+    @cache_readonly
+    def rank_cov_mom_constraints(self):
+        return np.linalg.matrix_rank(self.cov_mom_constraints)

     def ztest(self):
         """statistic, p-value and degrees of freedom of separate moment test
@@ -669,13 +1012,26 @@ class CMTNewey:
         TODO: This can use generic ztest/ttest features and return
         ContrastResults
         """
-        pass
+        diff = self.moments_constraint
+        bse = np.sqrt(np.diag(self.cov_mom_constraints))
+
+        # Newey uses a generalized inverse
+        stat = diff / bse
+        pval = stats.norm.sf(np.abs(stat))*2
+        return stat, pval

     @cache_readonly
     def chisquare(self):
         """statistic, p-value and degrees of freedom of joint moment test
         """
-        pass
+        diff = self.moments_constraint
+        cov = self.cov_mom_constraints
+
+        # Newey uses a generalized inverse
+        stat = diff.T.dot(np.linalg.pinv(cov).dot(diff))
+        df = self.rank_cov_mom_constraints
+        pval = stats.chi2.sf(stat, df)  # Theorem 1
+        return stat, pval, df


 class CMTTauchen:
@@ -707,17 +1063,37 @@ class CMTTauchen:
         covariance estimate, i.e. the inner part of a sandwich covariance.
     """

-    def __init__(self, score, score_deriv, moments, moments_deriv, cov_moments
-        ):
+    def __init__(self, score, score_deriv, moments, moments_deriv, cov_moments):
         self.score = score
         self.score_deriv = score_deriv
         self.moments = moments
         self.moments_deriv = moments_deriv
         self.cov_moments_all = cov_moments
+
         self.k_moments_test = moments.shape[-1]
         self.k_params = score.shape[-1]
         self.k_moments_all = self.k_params + self.k_moments_test

+    @cache_readonly
+    def cov_params_all(self):
+        m_deriv = np.zeros((self.k_moments_all, self.k_moments_all))
+        m_deriv[:self.k_params, :self.k_params] = self.score_deriv
+        m_deriv[self.k_params:, :self.k_params] = self.moments_deriv
+        m_deriv[self.k_params:, self.k_params:] = np.eye(self.k_moments_test)
+
+        m_deriv_inv = np.linalg.inv(m_deriv)
+        cov = m_deriv_inv.dot(self.cov_moments_all.dot(m_deriv_inv.T)) # K_inv J K_inv
+        return cov
+
+    @cache_readonly
+    def cov_mom_constraints(self):
+        return self.cov_params_all[self.k_params:, self.k_params:]
+
+    @cache_readonly
+    def rank_cov_mom_constraints(self):
+        return np.linalg.matrix_rank(self.cov_mom_constraints)
+
+    # TODO: not DRY, just copied from CMTNewey
     def ztest(self):
         """statistic, p-value and degrees of freedom of separate moment test

@@ -726,10 +1102,25 @@ class CMTTauchen:
         TODO: This can use generic ztest/ttest features and return
         ContrastResults
         """
-        pass
+        diff = self.moments_constraint
+        bse = np.sqrt(np.diag(self.cov_mom_constraints))
+
+        # Newey uses a generalized inverse
+        stat = diff / bse
+        pval = stats.norm.sf(np.abs(stat))*2
+        return stat, pval

     @cache_readonly
     def chisquare(self):
         """statistic, p-value and degrees of freedom of joint moment test
         """
-        pass
+        diff = self.moments #_constraints
+        cov = self.cov_mom_constraints
+
+        # Newey uses a generalized inverse, we use it also here
+        stat = diff.T.dot(np.linalg.pinv(cov).dot(diff))
+        #df = self.k_moments_test
+        # We allow for redundant mom_constraints:
+        df = self.rank_cov_mom_constraints
+        pval = stats.chi2.sf(stat, df)
+        return stat, pval, df
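The OPG branch of conditional_moment_test_regression above reduces to a familiar artificial regression: regress a column of ones on the test moments and use n times R-squared as a chi-square statistic with one degree of freedom per moment. A sketch with hypothetical moment conditions under the null:

import numpy as np
from scipy import stats
from statsmodels.regression.linear_model import OLS

rng = np.random.default_rng(123)
nobs, k_constraints = 500, 2
mom_test = rng.normal(size=(nobs, k_constraints))   # hypothetical moment conditions
res = OLS(np.ones(nobs), mom_test).fit()
statistic = nobs * res.rsquared
pval = stats.chi2.sf(statistic, k_constraints)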
diff --git a/statsmodels/stats/_inference_tools.py b/statsmodels/stats/_inference_tools.py
index 66e13ee15..babaf3a52 100644
--- a/statsmodels/stats/_inference_tools.py
+++ b/statsmodels/stats/_inference_tools.py
@@ -4,11 +4,12 @@ Created on Mar 30, 2022 1:21:54 PM
 Author: Josef Perktold
 License: BSD-3
 """
+
 import numpy as np
 from numpy.testing import assert_allclose


-def _mover_confint(stat1, stat2, ci1, ci2, contrast='diff'):
+def _mover_confint(stat1, stat2, ci1, ci2, contrast="diff"):
     """

     References
@@ -29,4 +30,54 @@ def _mover_confint(stat1, stat2, ci1, ci2, contrast='diff'):
        about Effect Measures: A General Approach.” Statistics in Medicine 27
        (10): 1693–1702. https://doi.org/10.1002/sim.3095.
     """
-    pass
+
+    if contrast == "diff":
+        stat = stat1 - stat2
+        low_half = np.sqrt((stat1 - ci1[0])**2 + (stat2 - ci2[1])**2)
+        upp_half = np.sqrt((stat1 - ci1[1])**2 + (stat2 - ci2[0])**2)
+        ci = (stat - low_half, stat + upp_half)
+
+    elif contrast == "sum":
+        stat = stat1 + stat2
+        low_half = np.sqrt((stat1 - ci1[0])**2 + (stat2 - ci2[0])**2)
+        upp_half = np.sqrt((stat1 - ci1[1])**2 + (stat2 - ci2[1])**2)
+        ci = (stat - low_half, stat + upp_half)
+
+    elif contrast == "ratio":
+        # stat = stat1 / stat2
+        prod = stat1 * stat2
+        term1 = stat2**2 - (ci2[1] - stat2)**2
+        term2 = stat2**2 - (ci2[0] - stat2)**2
+        low_ = (prod -
+                np.sqrt(prod**2 - term1 * (stat1**2 - (ci1[0] - stat1)**2))
+                ) / term1
+        upp_ = (prod +
+                np.sqrt(prod**2 - term2 * (stat1**2 - (ci1[1] - stat1)**2))
+                ) / term2
+
+        # method 2 Li, Tang, Wong 2014
+        low1, upp1 = ci1
+        low2, upp2 = ci2
+        term1 = upp2 * (2 * stat2 - upp2)
+        term2 = low2 * (2 * stat2 - low2)
+        low = (prod -
+               np.sqrt(prod**2 - term1 * low1 * (2 * stat1 - low1))
+               ) / term1
+        upp = (prod +
+               np.sqrt(prod**2 - term2 * upp1 * (2 * stat1 - upp1))
+               ) / term2
+
+        assert_allclose((low_, upp_), (low, upp), atol=1e-15, rtol=1e-10)
+
+        ci = (low, upp)
+
+    return ci
+
+
+def _mover_confint_sum(stat, ci):
+
+    stat_ = stat.sum(0)
+    low_half = np.sqrt(np.sum((stat_ - ci[0])**2))
+    upp_half = np.sqrt(np.sum((stat_ - ci[1])**2))
+    ci = (stat - low_half, stat + upp_half)
+    return ci
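A worked instance of the "diff" branch of _mover_confint above, with hypothetical point estimates and marginal confidence limits: each half-width combines the distance from one estimate to its own limit with the distance from the other estimate to the opposing limit.

import numpy as np

stat1, stat2 = 1.2, 0.8            # hypothetical estimates
ci1, ci2 = (0.9, 1.5), (0.5, 1.1)  # hypothetical marginal confidence intervals
diff = stat1 - stat2
low = diff - np.sqrt((stat1 - ci1[0]) ** 2 + (stat2 - ci2[1]) ** 2)
upp = diff + np.sqrt((stat1 - ci1[1]) ** 2 + (stat2 - ci2[0]) ** 2)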
diff --git a/statsmodels/stats/_knockoff.py b/statsmodels/stats/_knockoff.py
index 0929935e9..aecb73c6a 100644
--- a/statsmodels/stats/_knockoff.py
+++ b/statsmodels/stats/_knockoff.py
@@ -22,6 +22,7 @@ Rina Foygel Barber, Emmanuel Candes (2015).  Controlling the False
 Discovery Rate via Knockoffs.  Annals of Statistics 43:5.
 https://candes.su.domains/publications/downloads/FDR_regression.pdf
 """
+
 import numpy as np
 import pandas as pd
 from statsmodels.iolib import summary2
@@ -69,51 +70,82 @@ class RegressionFDR:
     sdp approach requires that the cvxopt package be installed.
     """

-    def __init__(self, endog, exog, regeffects, method='knockoff', **kwargs):
-        if hasattr(exog, 'columns'):
+    def __init__(self, endog, exog, regeffects, method="knockoff",
+                 **kwargs):
+
+        if hasattr(exog, "columns"):
             self.xnames = exog.columns
         else:
-            self.xnames = [('x%d' % j) for j in range(exog.shape[1])]
+            self.xnames = ["x%d" % j for j in range(exog.shape[1])]
+
         exog = np.asarray(exog)
         endog = np.asarray(endog)
-        if 'design_method' not in kwargs:
-            kwargs['design_method'] = 'equi'
+
+        if "design_method" not in kwargs:
+            kwargs["design_method"] = "equi"
+
         nobs, nvar = exog.shape
-        if kwargs['design_method'] == 'equi':
+
+        if kwargs["design_method"] == "equi":
             exog1, exog2, _ = _design_knockoff_equi(exog)
-        elif kwargs['design_method'] == 'sdp':
+        elif kwargs["design_method"] == "sdp":
             exog1, exog2, _ = _design_knockoff_sdp(exog)
         endog = endog - np.mean(endog)
+
         self.endog = endog
         self.exog = np.concatenate((exog1, exog2), axis=1)
         self.exog1 = exog1
         self.exog2 = exog2
+
         self.stats = regeffects.stats(self)
+
         unq, inv, cnt = np.unique(self.stats, return_inverse=True,
-            return_counts=True)
+                                  return_counts=True)
+
+        # The denominator of the FDR
         cc = np.cumsum(cnt)
         denom = len(self.stats) - cc + cnt
         denom[denom < 1] = 1
+
+        # The numerator of the FDR
         ii = np.searchsorted(unq, -unq, side='right') - 1
         numer = cc[ii]
         numer[ii < 0] = 0
+
+        # The knockoff+ estimated FDR
         fdrp = (1 + numer) / denom
+
+        # The knockoff estimated FDR
         fdr = numer / denom
+
         self.fdr = fdr[inv]
         self.fdrp = fdrp[inv]
         self._ufdr = fdr
         self._unq = unq
+
         df = pd.DataFrame(index=self.xnames)
-        df['Stat'] = self.stats
-        df['FDR+'] = self.fdrp
-        df['FDR'] = self.fdr
+        df["Stat"] = self.stats
+        df["FDR+"] = self.fdrp
+        df["FDR"] = self.fdr
         self.fdr_df = df

     def threshold(self, tfdr):
         """
         Returns the threshold statistic for a given target FDR.
         """
-        pass
+
+        if np.min(self._ufdr) <= tfdr:
+            return self._unq[self._ufdr <= tfdr][0]
+        else:
+            return np.inf
+
+    def summary(self):
+
+        summ = summary2.Summary()
+        summ.add_title("Regression FDR results")
+        summ.add_df(self.fdr_df)
+
+        return summ


 def _design_knockoff_sdp(exog):
@@ -123,7 +155,43 @@ def _design_knockoff_sdp(exog):

     Requires cvxopt to be installed.
     """
-    pass
+
+    try:
+        from cvxopt import solvers, matrix
+    except ImportError:
+        raise ValueError("SDP knockoff designs require installation of cvxopt")
+
+    nobs, nvar = exog.shape
+
+    # Standardize exog
+    xnm = np.sum(exog**2, 0)
+    xnm = np.sqrt(xnm)
+    exog = exog / xnm
+
+    Sigma = np.dot(exog.T, exog)
+
+    c = matrix(-np.ones(nvar))
+
+    h0 = np.concatenate((np.zeros(nvar), np.ones(nvar)))
+    h0 = matrix(h0)
+    G0 = np.concatenate((-np.eye(nvar), np.eye(nvar)), axis=0)
+    G0 = matrix(G0)
+
+    h1 = 2 * Sigma
+    h1 = matrix(h1)
+    i, j = np.diag_indices(nvar)
+    G1 = np.zeros((nvar*nvar, nvar))
+    G1[i*nvar + j, i] = 1
+    G1 = matrix(G1)
+
+    solvers.options['show_progress'] = False
+    sol = solvers.sdp(c, G0, h0, [G1], [h1])
+    sl = np.asarray(sol['x']).ravel()
+
+    xcov = np.dot(exog.T, exog)
+    exogn = _get_knmat(exog, xcov, sl)
+
+    return exog, exogn, sl


 def _design_knockoff_equi(exog):
@@ -139,4 +207,50 @@ def _design_knockoff_equi(exog):
     the covariances between corresponding columns of exogn and exogs
     are as small as possible.
     """
-    pass
+
+    nobs, nvar = exog.shape
+
+    if nobs < 2*nvar:
+        msg = "The equivariant knockoff can ony be used when n >= 2*p"
+        raise ValueError(msg)
+
+    # Standardize exog
+    xnm = np.sum(exog**2, 0)
+    xnm = np.sqrt(xnm)
+    exog = exog / xnm
+
+    xcov = np.dot(exog.T, exog)
+    ev, _ = np.linalg.eig(xcov)
+    evmin = np.min(ev)
+
+    sl = min(2*evmin, 1)
+    sl = sl * np.ones(nvar)
+
+    exogn = _get_knmat(exog, xcov, sl)
+
+    return exog, exogn, sl
+
+
+def _get_knmat(exog, xcov, sl):
+    # Utility function, see equation 2.2 of Barber & Candes.
+
+    nobs, nvar = exog.shape
+
+    ash = np.linalg.inv(xcov)
+    ash *= -np.outer(sl, sl)
+    i, j = np.diag_indices(nvar)
+    ash[i, j] += 2 * sl
+
+    umat = np.random.normal(size=(nobs, nvar))
+    u, _ = np.linalg.qr(exog)
+    umat -= np.dot(u, np.dot(u.T, umat))
+    umat, _ = np.linalg.qr(umat)
+
+    ashr, xc, _ = np.linalg.svd(ash, 0)
+    ashr *= np.sqrt(xc)
+    ashr = ashr.T
+
+    ex = (sl[:, None] * np.linalg.solve(xcov, exog.T)).T
+    exogn = exog - ex + np.dot(umat, ashr)
+
+    return exogn
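A sketch of the knockoff and knockoff+ FDR estimates assembled in RegressionFDR.__init__ above, for a hypothetical vector of knockoff statistics W: at each candidate threshold t the denominator counts statistics >= t and the numerator counts statistics <= -t, with an extra 1 for the knockoff+ variant.

import numpy as np

W = np.array([-2.0, -0.5, 0.5, 1.0, 3.0])   # hypothetical knockoff statistics
unq, inv, cnt = np.unique(W, return_inverse=True, return_counts=True)
cc = np.cumsum(cnt)
denom = len(W) - cc + cnt                    # #{j : W_j >= t}
denom[denom < 1] = 1
ii = np.searchsorted(unq, -unq, side='right') - 1
numer = cc[ii]
numer[ii < 0] = 0                            # #{j : W_j <= -t}
fdr_plus = (1 + numer) / denom               # knockoff+ estimate
fdr = numer / denom                          # knockoff estimate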
diff --git a/statsmodels/stats/_lilliefors.py b/statsmodels/stats/_lilliefors.py
index cb404b616..2de765057 100644
--- a/statsmodels/stats/_lilliefors.py
+++ b/statsmodels/stats/_lilliefors.py
@@ -38,10 +38,14 @@ unknown. Journal of the American Statistical Association, Vol 64, No. 325.
 (1969), pp. 387–389.
 """
 from functools import partial
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.tools.validation import string_like
-from ._lilliefors_critical_values import critical_values, asymp_critical_values, PERCENTILES
+from ._lilliefors_critical_values import (critical_values,
+                                          asymp_critical_values,
+                                          PERCENTILES)
 from .tabledist import TableDist


@@ -57,7 +61,12 @@ def _make_asymptotic_function(params):
         Array with shape (nalpha, 3) where nalpha is the number of
         significance levels
     """
-    pass
+
+    def f(n):
+        poly = np.array([1, np.log(n), np.log(n) ** 2])
+        return np.exp(poly.dot(params.T))
+
+    return f


 def ksstat(x, cdf, alternative='two_sided', args=()):
@@ -104,7 +113,24 @@ def ksstat(x, cdf, alternative='two_sided', args=()):
     statistic which can be used either as distance measure or to implement
     case specific p-values.
     """
-    pass
+    nobs = float(len(x))
+
+    if isinstance(cdf, str):
+        cdf = getattr(stats.distributions, cdf).cdf
+    elif hasattr(cdf, 'cdf'):
+        cdf = getattr(cdf, 'cdf')
+
+    x = np.sort(x)
+    cdfvals = cdf(x, *args)
+
+    d_plus = (np.arange(1.0, nobs + 1) / nobs - cdfvals).max()
+    d_min = (cdfvals - np.arange(0.0, nobs) / nobs).max()
+    if alternative == 'greater':
+        return d_plus
+    elif alternative == 'less':
+        return d_min
+
+    return np.max([d_plus, d_min])


 def get_lilliefors_table(dist='norm'):
@@ -124,7 +150,26 @@ def get_lilliefors_table(dist='norm'):
     lf : TableDist object.
         table of critical values
     """
-    pass
+    # function just to keep things together
+    # for this test alpha is sf probability, i.e. right tail probability
+
+    alpha = 1 - np.array(PERCENTILES) / 100.0
+    alpha = alpha[::-1]
+    dist = 'normal' if dist == 'norm' else dist
+    if dist not in critical_values:
+        raise ValueError("Invalid dist parameter. Must be 'norm' or 'exp'")
+    cv_data = critical_values[dist]
+    acv_data = asymp_critical_values[dist]
+
+    size = np.array(sorted(cv_data), dtype=float)
+    crit_lf = np.array([cv_data[key] for key in sorted(cv_data)])
+    crit_lf = crit_lf[:, ::-1]
+
+    asym_params = np.array([acv_data[key] for key in sorted(acv_data)])
+    asymp_fn = _make_asymptotic_function((asym_params[::-1]))
+
+    lf = TableDist(alpha, size, crit_lf, asymptotic=asymp_fn)
+    return lf


 lilliefors_table_norm = get_lilliefors_table(dist='norm')
@@ -164,10 +209,17 @@ def pval_lf(d_max, n):
     ----------
     DallalWilkinson1986
     """
-    pass
+    # todo: check boundaries, valid range for n and Dmax
+    if n > 100:
+        d_max *= (n / 100.) ** 0.49
+        n = 100
+    pval = np.exp(-7.01256 * d_max ** 2 * (n + 2.78019)
+                  + 2.99587 * d_max * np.sqrt(n + 2.78019) - 0.122119
+                  + 0.974598 / np.sqrt(n) + 1.67997 / n)
+    return pval


-def kstest_fit(x, dist='norm', pvalmethod='table'):
+def kstest_fit(x, dist='norm', pvalmethod="table"):
     """
     Test assumed normal or exponential distribution using Lilliefors' test.

@@ -211,7 +263,46 @@ def kstest_fit(x, dist='norm', pvalmethod='table'):
     For implementation details, see  lilliefors_critical_value_simulation.py in
     the test directory.
     """
-    pass
+    pvalmethod = string_like(pvalmethod,
+                             "pvalmethod",
+                             options=("approx", "table"))
+    x = np.asarray(x)
+    if x.ndim == 2 and x.shape[1] == 1:
+        x = x[:, 0]
+    elif x.ndim != 1:
+        raise ValueError("Invalid parameter `x`: must be a one-dimensional"
+                         " array-like or a single-column DataFrame")
+
+    nobs = len(x)
+
+    if dist == 'norm':
+        z = (x - x.mean()) / x.std(ddof=1)
+        test_d = stats.norm.cdf
+        lilliefors_table = lilliefors_table_norm
+    elif dist == 'exp':
+        z = x / x.mean()
+        test_d = stats.expon.cdf
+        lilliefors_table = lilliefors_table_expon
+        pvalmethod = 'table'
+    else:
+        raise ValueError("Invalid dist parameter, must be 'norm' or 'exp'")
+
+    min_nobs = 4 if dist == 'norm' else 3
+    if nobs < min_nobs:
+        raise ValueError('Test for distribution {0} requires at least {1} '
+                         'observations'.format(dist, min_nobs))
+
+    d_ks = ksstat(z, test_d, alternative='two_sided')
+
+    if pvalmethod == 'approx':
+        pval = pval_lf(d_ks, nobs)
+        # check pval is in desired range
+        if pval > 0.1:
+            pval = lilliefors_table.prob(d_ks, nobs)
+    else:  # pvalmethod == 'table'
+        pval = lilliefors_table.prob(d_ks, nobs)
+
+    return d_ks, pval


 lilliefors = kstest_fit
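A compact sketch of the Lilliefors distance that kstest_fit computes above for the normal case: standardize with the sample mean and ddof=1 standard deviation, then take the two-sided Kolmogorov-Smirnov distance against the standard normal CDF. The sample below is synthetic; the resulting d_ks would then be looked up in the critical-value table or passed to pval_lf.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = np.sort(rng.normal(size=50))
z = (x - x.mean()) / x.std(ddof=1)      # standardization preserves the ordering
cdfvals = stats.norm.cdf(z)
n = len(z)
d_plus = (np.arange(1.0, n + 1) / n - cdfvals).max()
d_minus = (cdfvals - np.arange(0.0, n) / n).max()
d_ks = max(d_plus, d_minus)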
diff --git a/statsmodels/stats/_lilliefors_critical_values.py b/statsmodels/stats/_lilliefors_critical_values.py
index fc6d2db92..50897defc 100644
--- a/statsmodels/stats/_lilliefors_critical_values.py
+++ b/statsmodels/stats/_lilliefors_critical_values.py
@@ -3,174 +3,380 @@ This file is automatically generated by littlefors_critical_values.py.
 Do not directly modify this file.

 Value based on 10,000,000 simulations."""
+
 from numpy import array
+
 PERCENTILES = [1, 5, 10, 25, 50, 75, 90, 92.5, 95, 97.5, 99, 99.5, 99.7, 99.9]
-SAMPLE_SIZES = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
-    20, 25, 30, 40, 50, 100, 200, 400, 800, 1600]
-normal_crit_vals = {(4): array([0.14467854, 0.16876575, 0.18664724, 
-    0.22120362, 0.25828924, 0.29341032, 0.34532673, 0.35917374, 0.37521968,
-    0.39563998, 0.41307904, 0.42157653, 0.4261507, 0.43265213]), (5): array
-    ([0.13587046, 0.16098893, 0.17638354, 0.20235666, 0.2333944, 0.27766941,
-    0.31900772, 0.32936832, 0.34309223, 0.36727643, 0.39671728, 0.41322814,
-    0.42293504, 0.4386304]), (6): array([0.12919635, 0.15139467, 0.16384021,
-    0.18597849, 0.2186713, 0.2585473, 0.29713753, 0.30829444, 0.32338252, 
-    0.3456671, 0.37038945, 0.38760943, 0.40001813, 0.42304439]), (7): array
-    ([0.12263812, 0.14163065, 0.15238656, 0.17435948, 0.20617949, 
-    0.24243592, 0.28031415, 0.29068512, 0.3042307, 0.32532967, 0.35070348, 
-    0.3678189, 0.37908881, 0.40078798]), (8): array([0.11633728, 0.13297288,
-    0.14353311, 0.16537078, 0.19477376, 0.22936164, 0.2652001, 0.27501697, 
-    0.28804474, 0.30862157, 0.33279908, 0.34911188, 0.3603517, 0.38252055]),
-    (9): array([0.11029593, 0.126086, 0.1365291, 0.15748048, 0.18510669, 
-    0.21822055, 0.25223085, 0.26161129, 0.27415243, 0.29383667, 0.31708299,
-    0.33312043, 0.34406772, 0.36524182]), (10): array([0.10487398, 
-    0.12044377, 0.13065174, 0.15042835, 0.17679904, 0.20848607, 0.24098605,
-    0.25001629, 0.26202351, 0.2809226, 0.30341763, 0.31888089, 0.32960742, 
-    0.35061378]), (11): array([0.10036835, 0.11563925, 0.12540948, 
-    0.14421024, 0.1695621, 0.19993088, 0.23119563, 0.23987246, 0.25142048, 
-    0.26961902, 0.29148756, 0.30646087, 0.31678589, 0.33743951]), (12):
-    array([0.09649147, 0.11137479, 0.12069511, 0.13877133, 0.16313672, 
-    0.19233706, 0.22244871, 0.23082937, 0.24194831, 0.25959488, 0.28077229,
-    0.29526801, 0.30538827, 0.32558601]), (13): array([0.09318309, 
-    0.10753337, 0.11647389, 0.13389325, 0.15739248, 0.18555597, 0.21463087,
-    0.22267463, 0.23346007, 0.25054975, 0.27100552, 0.28519234, 0.29504428,
-    0.31445229]), (14): array([0.09024138, 0.1040431, 0.11266368, 
-    0.12948657, 0.15221479, 0.17944478, 0.2075745, 0.21536969, 0.22585144, 
-    0.2423819, 0.26231339, 0.2760657, 0.28555592, 0.30466148]), (15): array
-    ([0.08750578, 0.10085666, 0.1092263, 0.12552356, 0.14751778, 0.17390304,
-    0.2011939, 0.20879449, 0.21891697, 0.23499116, 0.25427802, 0.26772298, 
-    0.27706522, 0.29551017]), (16): array([0.08501529, 0.09795236, 
-    0.10607182, 0.1219047, 0.14327014, 0.16885124, 0.19537444, 0.20274039, 
-    0.21257643, 0.22822187, 0.24703504, 0.26003621, 0.26913735, 0.28745211]
-    ), (17): array([0.0827393, 0.09529635, 0.10320903, 0.11859867, 
-    0.13936135, 0.16422592, 0.1900207, 0.19722541, 0.20683438, 0.22204864, 
-    0.2402864, 0.25306469, 0.26200104, 0.2798316]), (18): array([0.08063198,
-    0.09285546, 0.10055901, 0.11554003, 0.13575222, 0.15996707, 0.18509495,
-    0.1921063, 0.20145911, 0.2163409, 0.23420221, 0.24665892, 0.25527246, 
-    0.27285099]), (19): array([0.0786934, 0.09059543, 0.09810264, 
-    0.11271608, 0.1324079, 0.15602767, 0.18052002, 0.18736281, 0.19652196, 
-    0.21105313, 0.22854301, 0.24068403, 0.24912401, 0.26624557]), (20):
-    array([0.07686372, 0.08852309, 0.09582876, 0.11008398, 0.12930804, 
-    0.15236687, 0.1762829, 0.18297149, 0.19190794, 0.20610189, 0.22327259, 
-    0.23513834, 0.24349604, 0.2602868]), (25): array([0.06933943, 
-    0.07985746, 0.08645091, 0.09927555, 0.11656692, 0.1373056, 0.15889286, 
-    0.16493761, 0.17300926, 0.18582499, 0.2013265, 0.21221932, 0.21979198, 
-    0.2350391]), (30): array([0.06380332, 0.07339662, 0.07942847, 
-    0.09118039, 0.10701841, 0.12603791, 0.14586121, 0.15138566, 0.15878307,
-    0.17058849, 0.18492078, 0.19492393, 0.20198353, 0.21612086]), (40):
-    array([0.055784, 0.06414642, 0.06940064, 0.07961554, 0.09339439, 
-    0.10994266, 0.12719016, 0.13202416, 0.13850018, 0.14877502, 0.16131916,
-    0.17009968, 0.17633078, 0.18870328]), (50): array([0.05022994, 
-    0.05773701, 0.06243507, 0.07160481, 0.08395875, 0.09882109, 0.11431303,
-    0.11863768, 0.12444044, 0.1337458, 0.14504504, 0.15299605, 0.15854061, 
-    0.16985345]), (100): array([0.0361047, 0.04146075, 0.0448043, 
-    0.05131964, 0.06011456, 0.07068396, 0.08173283, 0.08483991, 0.08900091,
-    0.09564506, 0.10373761, 0.10937278, 0.11338394, 0.12161442]), (200):
-    array([0.02584151, 0.02963511, 0.03200533, 0.0366259, 0.04286406, 
-    0.05036816, 0.05820564, 0.06041126, 0.06336215, 0.06809025, 0.0738379, 
-    0.07788879, 0.08074047, 0.08663425]), (400): array([0.01844162, 
-    0.02112065, 0.02280475, 0.02607295, 0.03049198, 0.03580284, 0.04135097,
-    0.04290933, 0.04500619, 0.04834792, 0.0524239, 0.05530289, 0.05733374, 
-    0.06145404]), (800): array([0.0131231, 0.01501723, 0.01620598, 
-    0.01852257, 0.02164634, 0.02540595, 0.02933599, 0.03043873, 0.03192094,
-    0.03429131, 0.03718247, 0.03922491, 0.0406419, 0.04361741]), (1600):
-    array([0.00932049, 0.01066126, 0.01150366, 0.01314135, 0.0153528, 
-    0.0180115, 0.02079512, 0.02157469, 0.02262168, 0.02429563, 0.02634302, 
-    0.02777611, 0.02879721, 0.03088286])}
-normal_asymp_crit_vals = {(1.0): array([-1.17114969, -0.45068579, -
-    0.00356741]), (5.0): array([-1.03298277, -0.45068579, -0.00356741]), (
-    10.0): array([-0.95518114, -0.45068579, -0.00356741]), (25.0): array([-
-    0.81912169, -0.45068579, -0.00356741]), (50.0): array([-0.6607348, -
-    0.45068579, -0.00356741]), (75.0): array([-0.49861004, -0.45068579, -
-    0.00356741]), (90.0): array([-0.35446139, -0.45068579, -0.00356741]), (
-    92.5): array([-0.31737193, -0.45068579, -0.00356741]), (95.0): array([-
-    0.26969888, -0.45068579, -0.00356741]), (97.5): array([-0.1979977, -
-    0.45068579, -0.00356741]), (99.0): array([-0.11709649, -0.45068579, -
-    0.00356741]), (99.5): array([-0.06398777, -0.45068579, -0.00356741]), (
-    99.7): array([-0.0281863, -0.45068579, -0.00356741]), (99.9): array([
-    0.04129756, -0.45068579, -0.00356741])}
-exp_crit_vals = {(3): array([0.2109761, 0.22988438, 0.25444542, 0.30397879,
-    0.36446945, 0.42814169, 0.51106702, 0.52962262, 0.55088872, 0.57754526,
-    0.60036174, 0.61179874, 0.61823126, 0.6294291]), (4): array([0.17384601,
-    0.20723235, 0.22806298, 0.26607407, 0.31624998, 0.38384024, 0.44420279,
-    0.46088139, 0.4843941, 0.521064, 0.55751191, 0.57850145, 0.59207384, 
-    0.62098085]), (5): array([0.15948619, 0.18836122, 0.20519471, 0.2387884,
-    0.28771415, 0.34487558, 0.40440534, 0.42088543, 0.44200346, 0.47455921,
-    0.51269224, 0.53654197, 0.55188279, 0.58123527]), (6): array([0.1480075,
-    0.17258589, 0.18825927, 0.21978293, 0.2645031, 0.31713484, 0.37308973, 
-    0.38833891, 0.4084249, 0.43908986, 0.47469271, 0.49984099, 0.5170463, 
-    0.54989819]), (7): array([0.13783797, 0.16039265, 0.17499898, 
-    0.20510973, 0.24578726, 0.29584306, 0.34806914, 0.36231862, 0.38111852,
-    0.41060597, 0.44579783, 0.46945914, 0.48562857, 0.5179461]), (8): array
-    ([0.12914489, 0.15056654, 0.16462426, 0.19280967, 0.23086956, 
-    0.27829277, 0.3273405, 0.34086546, 0.35894819, 0.3874127, 0.42078514, 
-    0.44361638, 0.45939843, 0.4910784]), (9): array([0.12209612, 0.1426062,
-    0.15604163, 0.1823721, 0.21856178, 0.26352862, 0.3101114, 0.32311637, 
-    0.3404356, 0.36744906, 0.39952102, 0.42164587, 0.43719767, 0.46827333]),
-    (10): array([0.11622346, 0.13594053, 0.14864516, 0.17351161, 0.20820576,
-    0.25083243, 0.29544984, 0.3078714, 0.32438433, 0.35032272, 0.38135762, 
-    0.40282213, 0.41758788, 0.44762269]), (11): array([0.11116996, 
-    0.13008811, 0.14214361, 0.16587204, 0.19916, 0.23987837, 0.28264075, 
-    0.29456614, 0.31039145, 0.3353923, 0.36531224, 0.38597334, 0.40039914, 
-    0.42945227]), (12): array([0.10682852, 0.12494111, 0.13642386, 
-    0.15924117, 0.19119135, 0.23024225, 0.27140493, 0.28285565, 0.29813962,
-    0.32235591, 0.35118835, 0.37131437, 0.38541362, 0.41355258]), (13):
-    array([0.10299158, 0.12035679, 0.13139418, 0.15340996, 0.18411715, 
-    0.22174274, 0.26139337, 0.27247472, 0.28721874, 0.31061368, 0.3386004, 
-    0.3579752, 0.37154656, 0.399054]), (14): array([0.09952544, 0.11620577,
-    0.12685351, 0.14816724, 0.17777806, 0.21413923, 0.25247245, 0.26312392,
-    0.27746948, 0.30001199, 0.32724359, 0.34615172, 0.3591787, 0.38578716]),
-    (15): array([0.09635825, 0.11247223, 0.12279761, 0.1434653, 0.17203838,
-    0.20728334, 0.2443769, 0.25476058, 0.26865606, 0.29062696, 0.31686937, 
-    0.33520734, 0.34795825, 0.37374779]), (16): array([0.093446, 0.10911215,
-    0.11910394, 0.1391716, 0.16686968, 0.20103119, 0.23706762, 0.24716908, 
-    0.26064721, 0.28192033, 0.30753741, 0.32530837, 0.33776015, 0.36287832]
-    ), (17): array([0.09080542, 0.10601186, 0.11575975, 0.13523043, 
-    0.16215573, 0.19534591, 0.2303959, 0.24019386, 0.25330648, 0.27400789, 
-    0.29895143, 0.31623639, 0.32845602, 0.35308388]), (18): array([
-    0.08844468, 0.10320066, 0.11271261, 0.13163588, 0.15780097, 0.19009349,
-    0.22420159, 0.2337161, 0.24646825, 0.26673429, 0.29105285, 0.30803457, 
-    0.31984228, 0.34387696]), (19): array([0.08623885, 0.1006206, 
-    0.10988138, 0.12831526, 0.15380297, 0.18527462, 0.21845938, 0.22776537,
-    0.2402398, 0.2600306, 0.28378696, 0.30035609, 0.31189906, 0.33544625]),
-    (20): array([0.08417538, 0.09821659, 0.10727612, 0.12520467, 0.15011026,
-    0.18077566, 0.21317237, 0.2222683, 0.23446664, 0.25382491, 0.27704397, 
-    0.29319327, 0.30456436, 0.32759304]), (25): array([0.07575606, 
-    0.08842104, 0.09651937, 0.1126094, 0.13493277, 0.1624784, 0.19169294, 
-    0.19987007, 0.2108557, 0.22828626, 0.24930174, 0.26403731, 0.2743157, 
-    0.29520476]), (30): array([0.06952264, 0.08110063, 0.08850223, 
-    0.10324683, 0.12365394, 0.14885572, 0.1756019, 0.18308805, 0.19314473, 
-    0.20920585, 0.2285569, 0.24218995, 0.25167821, 0.270999]), (40): array(
-    [0.06066843, 0.07071874, 0.07713823, 0.0899287, 0.10765412, 0.12954118,
-    0.15278494, 0.15934334, 0.16813288, 0.18214059, 0.19908466, 0.21086902,
-    0.2192688, 0.23619981]), (50): array([0.05453287, 0.06354216, 
-    0.06931727, 0.0807706, 0.09664734, 0.11625969, 0.13711892, 0.14300016, 
-    0.15087554, 0.16345379, 0.17865445, 0.18939212, 0.19681564, 0.21177526]
-    ), (100): array([0.03911314, 0.04550435, 0.04958972, 0.05770944, 
-    0.06896805, 0.08288491, 0.09771459, 0.10189766, 0.10752233, 0.1164863, 
-    0.12735376, 0.13497319, 0.14038704, 0.15150129]), (200): array([
-    0.02793837, 0.03246737, 0.03536757, 0.04111863, 0.04910105, 0.0589757, 
-    0.06949443, 0.0724547, 0.07645096, 0.08280575, 0.09054264, 0.09600477, 
-    0.09980637, 0.10760793]), (400): array([0.01991454, 0.023116, 
-    0.02516769, 0.0292415, 0.03488679, 0.0418774, 0.04931093, 0.05141711, 
-    0.054236, 0.05874581, 0.06421502, 0.06808278, 0.07079796, 0.07628466]),
-    (800): array([0.01415514, 0.01642927, 0.0178805, 0.02075938, 0.02475783,
-    0.02970526, 0.03496731, 0.03645351, 0.03845017, 0.04164885, 0.04552405,
-    0.0482418, 0.05017098, 0.05406992]), (1600): array([0.01005851, 
-    0.01165883, 0.0126869, 0.01472426, 0.01754881, 0.02104396, 0.02476615, 
-    0.02581898, 0.02723521, 0.02950141, 0.03224859, 0.03418578, 0.03553487,
-    0.03829879])}
-exp_asymp_crit_vals = {(1.0): array([-1.04988875, -0.46444021, -0.00252528]
-    ), (5.0): array([-0.89825066, -0.46444021, -0.00252528]), (10.0): array
-    ([-0.81197037, -0.46444021, -0.00252528]), (25.0): array([-0.66000008, 
-    -0.46444021, -0.00252528]), (50.0): array([-0.4814209, -0.46444021, -
-    0.00252528]), (75.0): array([-0.29737272, -0.46444021, -0.00252528]), (
-    90.0): array([-0.13397474, -0.46444021, -0.00252528]), (92.5): array([-
-    0.09220104, -0.46444021, -0.00252528]), (95.0): array([-0.03868789, -
-    0.46444021, -0.00252528]), (97.5): array([0.0411943, -0.46444021, -
-    0.00252528]), (99.0): array([0.130122, -0.46444021, -0.00252528]), (
-    99.5): array([0.1882404, -0.46444021, -0.00252528]), (99.7): array([
-    0.22712671, -0.46444021, -0.00252528]), (99.9): array([0.30189677, -
-    0.46444021, -0.00252528])}
+
+SAMPLE_SIZES = [
+    3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30,
+    40, 50, 100, 200, 400, 800, 1600
+]
+normal_crit_vals = {
+    4:
+    array([
+        0.14467854, 0.16876575, 0.18664724, 0.22120362, 0.25828924, 0.29341032,
+        0.34532673, 0.35917374, 0.37521968, 0.39563998, 0.41307904, 0.42157653,
+        0.4261507, 0.43265213
+    ]),
+    5:
+    array([
+        0.13587046, 0.16098893, 0.17638354, 0.20235666, 0.2333944, 0.27766941,
+        0.31900772, 0.32936832, 0.34309223, 0.36727643, 0.39671728, 0.41322814,
+        0.42293504, 0.4386304
+    ]),
+    6:
+    array([
+        0.12919635, 0.15139467, 0.16384021, 0.18597849, 0.2186713, 0.2585473,
+        0.29713753, 0.30829444, 0.32338252, 0.3456671, 0.37038945, 0.38760943,
+        0.40001813, 0.42304439
+    ]),
+    7:
+    array([
+        0.12263812, 0.14163065, 0.15238656, 0.17435948, 0.20617949, 0.24243592,
+        0.28031415, 0.29068512, 0.3042307, 0.32532967, 0.35070348, 0.3678189,
+        0.37908881, 0.40078798
+    ]),
+    8:
+    array([
+        0.11633728, 0.13297288, 0.14353311, 0.16537078, 0.19477376, 0.22936164,
+        0.2652001, 0.27501697, 0.28804474, 0.30862157, 0.33279908, 0.34911188,
+        0.3603517, 0.38252055
+    ]),
+    9:
+    array([
+        0.11029593, 0.126086, 0.1365291, 0.15748048, 0.18510669, 0.21822055,
+        0.25223085, 0.26161129, 0.27415243, 0.29383667, 0.31708299, 0.33312043,
+        0.34406772, 0.36524182
+    ]),
+    10:
+    array([
+        0.10487398, 0.12044377, 0.13065174, 0.15042835, 0.17679904, 0.20848607,
+        0.24098605, 0.25001629, 0.26202351, 0.2809226, 0.30341763, 0.31888089,
+        0.32960742, 0.35061378
+    ]),
+    11:
+    array([
+        0.10036835, 0.11563925, 0.12540948, 0.14421024, 0.1695621, 0.19993088,
+        0.23119563, 0.23987246, 0.25142048, 0.26961902, 0.29148756, 0.30646087,
+        0.31678589, 0.33743951
+    ]),
+    12:
+    array([
+        0.09649147, 0.11137479, 0.12069511, 0.13877133, 0.16313672, 0.19233706,
+        0.22244871, 0.23082937, 0.24194831, 0.25959488, 0.28077229, 0.29526801,
+        0.30538827, 0.32558601
+    ]),
+    13:
+    array([
+        0.09318309, 0.10753337, 0.11647389, 0.13389325, 0.15739248, 0.18555597,
+        0.21463087, 0.22267463, 0.23346007, 0.25054975, 0.27100552, 0.28519234,
+        0.29504428, 0.31445229
+    ]),
+    14:
+    array([
+        0.09024138, 0.1040431, 0.11266368, 0.12948657, 0.15221479, 0.17944478,
+        0.2075745, 0.21536969, 0.22585144, 0.2423819, 0.26231339, 0.2760657,
+        0.28555592, 0.30466148
+    ]),
+    15:
+    array([
+        0.08750578, 0.10085666, 0.1092263, 0.12552356, 0.14751778, 0.17390304,
+        0.2011939, 0.20879449, 0.21891697, 0.23499116, 0.25427802, 0.26772298,
+        0.27706522, 0.29551017
+    ]),
+    16:
+    array([
+        0.08501529, 0.09795236, 0.10607182, 0.1219047, 0.14327014, 0.16885124,
+        0.19537444, 0.20274039, 0.21257643, 0.22822187, 0.24703504, 0.26003621,
+        0.26913735, 0.28745211
+    ]),
+    17:
+    array([
+        0.0827393, 0.09529635, 0.10320903, 0.11859867, 0.13936135, 0.16422592,
+        0.1900207, 0.19722541, 0.20683438, 0.22204864, 0.2402864, 0.25306469,
+        0.26200104, 0.2798316
+    ]),
+    18:
+    array([
+        0.08063198, 0.09285546, 0.10055901, 0.11554003, 0.13575222, 0.15996707,
+        0.18509495, 0.1921063, 0.20145911, 0.2163409, 0.23420221, 0.24665892,
+        0.25527246, 0.27285099
+    ]),
+    19:
+    array([
+        0.0786934, 0.09059543, 0.09810264, 0.11271608, 0.1324079, 0.15602767,
+        0.18052002, 0.18736281, 0.19652196, 0.21105313, 0.22854301, 0.24068403,
+        0.24912401, 0.26624557
+    ]),
+    20:
+    array([
+        0.07686372, 0.08852309, 0.09582876, 0.11008398, 0.12930804, 0.15236687,
+        0.1762829, 0.18297149, 0.19190794, 0.20610189, 0.22327259, 0.23513834,
+        0.24349604, 0.2602868
+    ]),
+    25:
+    array([
+        0.06933943, 0.07985746, 0.08645091, 0.09927555, 0.11656692, 0.1373056,
+        0.15889286, 0.16493761, 0.17300926, 0.18582499, 0.2013265, 0.21221932,
+        0.21979198, 0.2350391
+    ]),
+    30:
+    array([
+        0.06380332, 0.07339662, 0.07942847, 0.09118039, 0.10701841, 0.12603791,
+        0.14586121, 0.15138566, 0.15878307, 0.17058849, 0.18492078, 0.19492393,
+        0.20198353, 0.21612086
+    ]),
+    40:
+    array([
+        0.055784, 0.06414642, 0.06940064, 0.07961554, 0.09339439, 0.10994266,
+        0.12719016, 0.13202416, 0.13850018, 0.14877502, 0.16131916, 0.17009968,
+        0.17633078, 0.18870328
+    ]),
+    50:
+    array([
+        0.05022994, 0.05773701, 0.06243507, 0.07160481, 0.08395875, 0.09882109,
+        0.11431303, 0.11863768, 0.12444044, 0.1337458, 0.14504504, 0.15299605,
+        0.15854061, 0.16985345
+    ]),
+    100:
+    array([
+        0.0361047, 0.04146075, 0.0448043, 0.05131964, 0.06011456, 0.07068396,
+        0.08173283, 0.08483991, 0.08900091, 0.09564506, 0.10373761, 0.10937278,
+        0.11338394, 0.12161442
+    ]),
+    200:
+    array([
+        0.02584151, 0.02963511, 0.03200533, 0.0366259, 0.04286406, 0.05036816,
+        0.05820564, 0.06041126, 0.06336215, 0.06809025, 0.0738379, 0.07788879,
+        0.08074047, 0.08663425
+    ]),
+    400:
+    array([
+        0.01844162, 0.02112065, 0.02280475, 0.02607295, 0.03049198, 0.03580284,
+        0.04135097, 0.04290933, 0.04500619, 0.04834792, 0.0524239, 0.05530289,
+        0.05733374, 0.06145404
+    ]),
+    800:
+    array([
+        0.0131231, 0.01501723, 0.01620598, 0.01852257, 0.02164634, 0.02540595,
+        0.02933599, 0.03043873, 0.03192094, 0.03429131, 0.03718247, 0.03922491,
+        0.0406419, 0.04361741
+    ]),
+    1600:
+    array([
+        0.00932049, 0.01066126, 0.01150366, 0.01314135, 0.0153528, 0.0180115,
+        0.02079512, 0.02157469, 0.02262168, 0.02429563, 0.02634302, 0.02777611,
+        0.02879721, 0.03088286
+    ])
+}
+
+# Coefficients are model log(cv) = b[0] + b[1] log(n) + b[2] log(n)**2
+normal_asymp_crit_vals = {
+    1.0: array([-1.17114969, -0.45068579, -0.00356741]),
+    5.0: array([-1.03298277, -0.45068579, -0.00356741]),
+    10.0: array([-0.95518114, -0.45068579, -0.00356741]),
+    25.0: array([-0.81912169, -0.45068579, -0.00356741]),
+    50.0: array([-0.6607348, -0.45068579, -0.00356741]),
+    75.0: array([-0.49861004, -0.45068579, -0.00356741]),
+    90.0: array([-0.35446139, -0.45068579, -0.00356741]),
+    92.5: array([-0.31737193, -0.45068579, -0.00356741]),
+    95.0: array([-0.26969888, -0.45068579, -0.00356741]),
+    97.5: array([-0.1979977, -0.45068579, -0.00356741]),
+    99.0: array([-0.11709649, -0.45068579, -0.00356741]),
+    99.5: array([-0.06398777, -0.45068579, -0.00356741]),
+    99.7: array([-0.0281863, -0.45068579, -0.00356741]),
+    99.9: array([0.04129756, -0.45068579, -0.00356741])
+}
+
+
+exp_crit_vals = {
+    3:
+    array([
+        0.2109761, 0.22988438, 0.25444542, 0.30397879, 0.36446945, 0.42814169,
+        0.51106702, 0.52962262, 0.55088872, 0.57754526, 0.60036174, 0.61179874,
+        0.61823126, 0.6294291
+    ]),
+    4:
+    array([
+        0.17384601, 0.20723235, 0.22806298, 0.26607407, 0.31624998, 0.38384024,
+        0.44420279, 0.46088139, 0.4843941, 0.521064, 0.55751191, 0.57850145,
+        0.59207384, 0.62098085
+    ]),
+    5:
+    array([
+        0.15948619, 0.18836122, 0.20519471, 0.2387884, 0.28771415, 0.34487558,
+        0.40440534, 0.42088543, 0.44200346, 0.47455921, 0.51269224, 0.53654197,
+        0.55188279, 0.58123527
+    ]),
+    6:
+    array([
+        0.1480075, 0.17258589, 0.18825927, 0.21978293, 0.2645031, 0.31713484,
+        0.37308973, 0.38833891, 0.4084249, 0.43908986, 0.47469271, 0.49984099,
+        0.5170463, 0.54989819
+    ]),
+    7:
+    array([
+        0.13783797, 0.16039265, 0.17499898, 0.20510973, 0.24578726, 0.29584306,
+        0.34806914, 0.36231862, 0.38111852, 0.41060597, 0.44579783, 0.46945914,
+        0.48562857, 0.5179461
+    ]),
+    8:
+    array([
+        0.12914489, 0.15056654, 0.16462426, 0.19280967, 0.23086956, 0.27829277,
+        0.3273405, 0.34086546, 0.35894819, 0.3874127, 0.42078514, 0.44361638,
+        0.45939843, 0.4910784
+    ]),
+    9:
+    array([
+        0.12209612, 0.1426062, 0.15604163, 0.1823721, 0.21856178, 0.26352862,
+        0.3101114, 0.32311637, 0.3404356, 0.36744906, 0.39952102, 0.42164587,
+        0.43719767, 0.46827333
+    ]),
+    10:
+    array([
+        0.11622346, 0.13594053, 0.14864516, 0.17351161, 0.20820576, 0.25083243,
+        0.29544984, 0.3078714, 0.32438433, 0.35032272, 0.38135762, 0.40282213,
+        0.41758788, 0.44762269
+    ]),
+    11:
+    array([
+        0.11116996, 0.13008811, 0.14214361, 0.16587204, 0.19916, 0.23987837,
+        0.28264075, 0.29456614, 0.31039145, 0.3353923, 0.36531224, 0.38597334,
+        0.40039914, 0.42945227
+    ]),
+    12:
+    array([
+        0.10682852, 0.12494111, 0.13642386, 0.15924117, 0.19119135, 0.23024225,
+        0.27140493, 0.28285565, 0.29813962, 0.32235591, 0.35118835, 0.37131437,
+        0.38541362, 0.41355258
+    ]),
+    13:
+    array([
+        0.10299158, 0.12035679, 0.13139418, 0.15340996, 0.18411715, 0.22174274,
+        0.26139337, 0.27247472, 0.28721874, 0.31061368, 0.3386004, 0.3579752,
+        0.37154656, 0.399054
+    ]),
+    14:
+    array([
+        0.09952544, 0.11620577, 0.12685351, 0.14816724, 0.17777806, 0.21413923,
+        0.25247245, 0.26312392, 0.27746948, 0.30001199, 0.32724359, 0.34615172,
+        0.3591787, 0.38578716
+    ]),
+    15:
+    array([
+        0.09635825, 0.11247223, 0.12279761, 0.1434653, 0.17203838, 0.20728334,
+        0.2443769, 0.25476058, 0.26865606, 0.29062696, 0.31686937, 0.33520734,
+        0.34795825, 0.37374779
+    ]),
+    16:
+    array([
+        0.093446, 0.10911215, 0.11910394, 0.1391716, 0.16686968, 0.20103119,
+        0.23706762, 0.24716908, 0.26064721, 0.28192033, 0.30753741, 0.32530837,
+        0.33776015, 0.36287832
+    ]),
+    17:
+    array([
+        0.09080542, 0.10601186, 0.11575975, 0.13523043, 0.16215573, 0.19534591,
+        0.2303959, 0.24019386, 0.25330648, 0.27400789, 0.29895143, 0.31623639,
+        0.32845602, 0.35308388
+    ]),
+    18:
+    array([
+        0.08844468, 0.10320066, 0.11271261, 0.13163588, 0.15780097, 0.19009349,
+        0.22420159, 0.2337161, 0.24646825, 0.26673429, 0.29105285, 0.30803457,
+        0.31984228, 0.34387696
+    ]),
+    19:
+    array([
+        0.08623885, 0.1006206, 0.10988138, 0.12831526, 0.15380297, 0.18527462,
+        0.21845938, 0.22776537, 0.2402398, 0.2600306, 0.28378696, 0.30035609,
+        0.31189906, 0.33544625
+    ]),
+    20:
+    array([
+        0.08417538, 0.09821659, 0.10727612, 0.12520467, 0.15011026, 0.18077566,
+        0.21317237, 0.2222683, 0.23446664, 0.25382491, 0.27704397, 0.29319327,
+        0.30456436, 0.32759304
+    ]),
+    25:
+    array([
+        0.07575606, 0.08842104, 0.09651937, 0.1126094, 0.13493277, 0.1624784,
+        0.19169294, 0.19987007, 0.2108557, 0.22828626, 0.24930174, 0.26403731,
+        0.2743157, 0.29520476
+    ]),
+    30:
+    array([
+        0.06952264, 0.08110063, 0.08850223, 0.10324683, 0.12365394, 0.14885572,
+        0.1756019, 0.18308805, 0.19314473, 0.20920585, 0.2285569, 0.24218995,
+        0.25167821, 0.270999
+    ]),
+    40:
+    array([
+        0.06066843, 0.07071874, 0.07713823, 0.0899287, 0.10765412, 0.12954118,
+        0.15278494, 0.15934334, 0.16813288, 0.18214059, 0.19908466, 0.21086902,
+        0.2192688, 0.23619981
+    ]),
+    50:
+    array([
+        0.05453287, 0.06354216, 0.06931727, 0.0807706, 0.09664734, 0.11625969,
+        0.13711892, 0.14300016, 0.15087554, 0.16345379, 0.17865445, 0.18939212,
+        0.19681564, 0.21177526
+    ]),
+    100:
+    array([
+        0.03911314, 0.04550435, 0.04958972, 0.05770944, 0.06896805, 0.08288491,
+        0.09771459, 0.10189766, 0.10752233, 0.1164863, 0.12735376, 0.13497319,
+        0.14038704, 0.15150129
+    ]),
+    200:
+    array([
+        0.02793837, 0.03246737, 0.03536757, 0.04111863, 0.04910105, 0.0589757,
+        0.06949443, 0.0724547, 0.07645096, 0.08280575, 0.09054264, 0.09600477,
+        0.09980637, 0.10760793
+    ]),
+    400:
+    array([
+        0.01991454, 0.023116, 0.02516769, 0.0292415, 0.03488679, 0.0418774,
+        0.04931093, 0.05141711, 0.054236, 0.05874581, 0.06421502, 0.06808278,
+        0.07079796, 0.07628466
+    ]),
+    800:
+    array([
+        0.01415514, 0.01642927, 0.0178805, 0.02075938, 0.02475783, 0.02970526,
+        0.03496731, 0.03645351, 0.03845017, 0.04164885, 0.04552405, 0.0482418,
+        0.05017098, 0.05406992
+    ]),
+    1600:
+    array([
+        0.01005851, 0.01165883, 0.0126869, 0.01472426, 0.01754881, 0.02104396,
+        0.02476615, 0.02581898, 0.02723521, 0.02950141, 0.03224859, 0.03418578,
+        0.03553487, 0.03829879
+    ])
+}
+
+# Coefficients are model log(cv) = b[0] + b[1] log(n) + b[2] log(n)**2
+exp_asymp_crit_vals = {
+    1.0: array([-1.04988875, -0.46444021, -0.00252528]),
+    5.0: array([-0.89825066, -0.46444021, -0.00252528]),
+    10.0: array([-0.81197037, -0.46444021, -0.00252528]),
+    25.0: array([-0.66000008, -0.46444021, -0.00252528]),
+    50.0: array([-0.4814209, -0.46444021, -0.00252528]),
+    75.0: array([-0.29737272, -0.46444021, -0.00252528]),
+    90.0: array([-0.13397474, -0.46444021, -0.00252528]),
+    92.5: array([-0.09220104, -0.46444021, -0.00252528]),
+    95.0: array([-0.03868789, -0.46444021, -0.00252528]),
+    97.5: array([0.0411943, -0.46444021, -0.00252528]),
+    99.0: array([0.130122, -0.46444021, -0.00252528]),
+    99.5: array([0.1882404, -0.46444021, -0.00252528]),
+    99.7: array([0.22712671, -0.46444021, -0.00252528]),
+    99.9: array([0.30189677, -0.46444021, -0.00252528])
+}
+
+
+# Critical values
 critical_values = {'normal': normal_crit_vals, 'exp': exp_crit_vals}
-asymp_critical_values = {'normal': normal_asymp_crit_vals, 'exp':
-    exp_asymp_crit_vals}
+asymp_critical_values = {
+    'normal': normal_asymp_crit_vals,
+    'exp': exp_asymp_crit_vals
+}
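
The asymptotic tables above store regression coefficients rather than critical values: per the in-line comment, log(cv) = b[0] + b[1]*log(n) + b[2]*log(n)**2. A minimal sketch of how such coefficients could be evaluated for a sample size not covered by the finite-sample tables (the helper name is illustrative only and assumes the dictionaries defined above are in scope):

    import numpy as np

    def approx_crit_val(coeffs, nobs):
        # evaluate log(cv) = b0 + b1*log(n) + b2*log(n)**2, then exponentiate
        b0, b1, b2 = coeffs
        log_n = np.log(nobs)
        return np.exp(b0 + b1 * log_n + b2 * log_n ** 2)

    # e.g. the 95th-percentile entry for the normal case at n = 500,
    # a sample size not listed in SAMPLE_SIZES
    cv = approx_crit_val(normal_asymp_crit_vals[95.0], 500)

The finite-sample dictionaries, by contrast, are keyed directly by the sample sizes listed in SAMPLE_SIZES.
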
diff --git a/statsmodels/stats/anova.py b/statsmodels/stats/anova.py
index 5a1415a17..e32bc7b85 100644
--- a/statsmodels/stats/anova.py
+++ b/statsmodels/stats/anova.py
@@ -1,14 +1,37 @@
 from statsmodels.compat.python import lrange
+
 import numpy as np
 import pandas as pd
 from pandas import DataFrame, Index
 import patsy
 from scipy import stats
-from statsmodels.formula.formulatools import _has_intercept, _intercept_idx, _remove_intercept_patsy
+
+from statsmodels.formula.formulatools import (
+    _has_intercept,
+    _intercept_idx,
+    _remove_intercept_patsy,
+)
 from statsmodels.iolib import summary2
 from statsmodels.regression.linear_model import OLS


+def _get_covariance(model, robust):
+    if robust is None:
+        return model.cov_params()
+    elif robust == "hc0":
+        return model.cov_HC0
+    elif robust == "hc1":
+        return model.cov_HC1
+    elif robust == "hc2":
+        return model.cov_HC2
+    elif robust == "hc3":
+        return model.cov_HC3
+    else:  # pragma: no cover
+        raise ValueError("robust options %s not understood" % robust)
+
+
+# NOTE: these need to take into account weights!
+
 def anova_single(model, **kwargs):
     """
     Anova table for one fitted linear model.
@@ -32,11 +55,45 @@ def anova_single(model, **kwargs):
     -----
     Use of this function is discouraged. Use anova_lm instead.
     """
-    pass
-
-
-def anova1_lm_single(model, endog, exog, nobs, design_info, table, n_rows,
-    test, pr_test, robust):
+    test = kwargs.get("test", "F")
+    scale = kwargs.get("scale", None)
+    typ = kwargs.get("typ", 1)
+    robust = kwargs.get("robust", None)
+    if robust:
+        robust = robust.lower()
+
+    endog = model.model.endog
+    exog = model.model.exog
+    nobs = exog.shape[0]
+
+    response_name = model.model.endog_names
+    design_info = model.model.data.design_info
+    exog_names = model.model.exog_names
+    # +1 for resids
+    n_rows = (len(design_info.terms) - _has_intercept(design_info) + 1)
+
+    pr_test = "PR(>%s)" % test
+    names = ['df', 'sum_sq', 'mean_sq', test, pr_test]
+
+    table = DataFrame(np.zeros((n_rows, 5)), columns=names)
+
+    if typ in [1, "I"]:
+        return anova1_lm_single(model, endog, exog, nobs, design_info, table,
+                                n_rows, test, pr_test, robust)
+    elif typ in [2, "II"]:
+        return anova2_lm_single(model, design_info, n_rows, test, pr_test,
+                                robust)
+    elif typ in [3, "III"]:
+        return anova3_lm_single(model, design_info, n_rows, test, pr_test,
+                                robust)
+    elif typ in [4, "IV"]:
+        raise NotImplementedError("Type IV not yet implemented")
+    else:  # pragma: no cover
+        raise ValueError("Type %s not understood" % str(typ))
+
+
+def anova1_lm_single(model, endog, exog, nobs, design_info, table, n_rows, test,
+                     pr_test, robust):
     """
     Anova table for one fitted linear model.

@@ -57,9 +114,39 @@ def anova1_lm_single(model, endog, exog, nobs, design_info, table, n_rows,
     -----
     Use of this function is discouraged. Use anova_lm instead.
     """
-    pass
-
-
+    #maybe we should rethink using pinv > qr in OLS/linear models?
+    effects = getattr(model, 'effects', None)
+    if effects is None:
+        q,r = np.linalg.qr(exog)
+        effects = np.dot(q.T, endog)
+
+    arr = np.zeros((len(design_info.terms), len(design_info.column_names)))
+    slices = [design_info.slice(name) for name in design_info.term_names]
+    for i,slice_ in enumerate(slices):
+        arr[i, slice_] = 1
+
+    sum_sq = np.dot(arr, effects**2)
+    #NOTE: assumes intercept is first column
+    idx = _intercept_idx(design_info)
+    sum_sq = sum_sq[~idx]
+    term_names = np.array(design_info.term_names) # want boolean indexing
+    term_names = term_names[~idx]
+
+    index = term_names.tolist()
+    table.index = Index(index + ['Residual'])
+    table.loc[index, ['df', 'sum_sq']] = np.c_[arr[~idx].sum(1), sum_sq]
+    # fill in residual
+    table.loc['Residual', ['sum_sq','df']] = model.ssr, model.df_resid
+    if test == 'F':
+        table[test] = ((table['sum_sq'] / table['df']) /
+                       (model.ssr / model.df_resid))
+        table[pr_test] = stats.f.sf(table["F"], table["df"],
+                                    model.df_resid)
+        table.loc['Residual', [test, pr_test]] = np.nan, np.nan
+    table['mean_sq'] = table['sum_sq'] / table['df']
+    return table
+
+#NOTE: the below is not agnostic about formula...
 def anova2_lm_single(model, design_info, n_rows, test, pr_test, robust):
     """
     Anova type II table for one fitted linear model.
@@ -85,8 +172,107 @@ def anova2_lm_single(model, design_info, n_rows, test, pr_test, robust):
     Sum of Squares compares marginal contribution of terms. Thus, it is
     not particularly useful for models with significant interaction terms.
     """
-    pass
-
+    terms_info = design_info.terms[:] # copy
+    terms_info = _remove_intercept_patsy(terms_info)
+
+    names = ['sum_sq', 'df', test, pr_test]
+
+    table = DataFrame(np.zeros((n_rows, 4)), columns = names)
+    cov = _get_covariance(model, None)
+    robust_cov = _get_covariance(model, robust)
+    col_order = []
+    index = []
+    for i, term in enumerate(terms_info):
+        # grab all variables except interaction effects that contain term
+        # need two hypothesis matrices: L1 is most restrictive, i.e., term==0
+        # L2 is everything except term==0
+        cols = design_info.slice(term)
+        L1 = lrange(cols.start, cols.stop)
+        L2 = []
+        term_set = set(term.factors)
+        for t in terms_info: # for the term you have
+            other_set = set(t.factors)
+            if term_set.issubset(other_set) and not term_set == other_set:
+                col = design_info.slice(t)
+                # on a higher order term containing current `term`
+                L1.extend(lrange(col.start, col.stop))
+                L2.extend(lrange(col.start, col.stop))
+
+        L1 = np.eye(model.model.exog.shape[1])[L1]
+        L2 = np.eye(model.model.exog.shape[1])[L2]
+
+        if L2.size:
+            LVL = np.dot(np.dot(L1,robust_cov),L2.T)
+            from scipy import linalg
+            orth_compl,_ = linalg.qr(LVL)
+            r = L1.shape[0] - L2.shape[0]
+            # L1|2
+            # use the non-unique orthogonal completion since L12 is rank r
+            L12 = np.dot(orth_compl[:,-r:].T, L1)
+        else:
+            L12 = L1
+            r = L1.shape[0]
+        #from IPython.core.debugger import Pdb; Pdb().set_trace()
+        if test == 'F':
+            f = model.f_test(L12, cov_p=robust_cov)
+            table.loc[table.index[i], test] = f.fvalue
+            table.loc[table.index[i], pr_test] = f.pvalue
+
+        # need to back out SSR from f_test
+        table.loc[table.index[i], 'df'] = r
+        col_order.append(cols.start)
+        index.append(term.name())
+
+    table.index = Index(index + ['Residual'])
+    table = table.iloc[np.argsort(col_order + [model.model.exog.shape[1]+1])]
+    # back out sum of squares from f_test
+    ssr = table[test] * table['df'] * model.ssr/model.df_resid
+    table['sum_sq'] = ssr
+    # fill in residual
+    table.loc['Residual', ['sum_sq','df', test, pr_test]] = (model.ssr,
+                                                            model.df_resid,
+                                                            np.nan, np.nan)
+
+    return table
+
+def anova3_lm_single(model, design_info, n_rows, test, pr_test, robust):
+    n_rows += _has_intercept(design_info)
+    terms_info = design_info.terms
+
+    names = ['sum_sq', 'df', test, pr_test]
+
+    table = DataFrame(np.zeros((n_rows, 4)), columns = names)
+    cov = _get_covariance(model, robust)
+    col_order = []
+    index = []
+    for i, term in enumerate(terms_info):
+        # grab term, hypothesis is that term == 0
+        cols = design_info.slice(term)
+        L1 = np.eye(model.model.exog.shape[1])[cols]
+        L12 = L1
+        r = L1.shape[0]
+
+        if test == 'F':
+            f = model.f_test(L12, cov_p=cov)
+            table.loc[table.index[i], test] = f.fvalue
+            table.loc[table.index[i], pr_test] = f.pvalue
+
+        # need to back out SSR from f_test
+        table.loc[table.index[i], 'df'] = r
+        #col_order.append(cols.start)
+        index.append(term.name())
+
+    table.index = Index(index + ['Residual'])
+    #NOTE: Do not need to sort because terms are an ordered dict now
+    #table = table.iloc[np.argsort(col_order + [model.model.exog.shape[1]+1])]
+    # back out sum of squares from f_test
+    ssr = table[test] * table['df'] * model.ssr/model.df_resid
+    table['sum_sq'] = ssr
+    # fill in residual
+    table.loc['Residual', ['sum_sq','df', test, pr_test]] = (model.ssr,
+                                                            model.df_resid,
+                                                            np.nan, np.nan)
+    return table

 def anova_lm(*args, **kwargs):
     """
@@ -158,7 +344,48 @@ def anova_lm(*args, **kwargs):
     >>> table = sm.stats.anova_lm(moore_lm, typ=2) # Type 2 Anova DataFrame
     >>> print(table)
     """
-    pass
+    typ = kwargs.get('typ', 1)
+
+    ### Farm Out Single model Anova Type I, II, III, and IV ###
+
+    if len(args) == 1:
+        model = args[0]
+        return anova_single(model, **kwargs)
+
+    if typ not in [1, "I"]:
+        raise ValueError("Multiple models only supported for type I. "
+                         "Got type %s" % str(typ))
+
+    test = kwargs.get("test", "F")
+    scale = kwargs.get("scale", None)
+    n_models = len(args)
+    pr_test = "Pr(>%s)" % test
+    names = ['df_resid', 'ssr', 'df_diff', 'ss_diff', test, pr_test]
+    table = DataFrame(np.zeros((n_models, 6)), columns=names)
+
+    if not scale: # assume biggest model is last
+        scale = args[-1].scale
+
+    table["ssr"] = [mdl.ssr for mdl in args]
+    table["df_resid"] = [mdl.df_resid for mdl in args]
+    table.loc[table.index[1:], "df_diff"] = -np.diff(table["df_resid"].values)
+    table["ss_diff"] = -table["ssr"].diff()
+    if test == "F":
+        table["F"] = table["ss_diff"] / table["df_diff"] / scale
+        table[pr_test] = stats.f.sf(table["F"], table["df_diff"],
+                                    table["df_resid"])
+        # for earlier scipy - stats.f.sf(np.nan, 10, 2) -> 0 not nan
+        table.loc[table['F'].isnull(), pr_test] = np.nan
+
+    return table
+
+
+def _not_slice(slices, slices_to_exclude, n):
+    ind = np.array([True]*n)
+    for term in slices_to_exclude:
+        s = slices[term]
+        ind[s] = False
+    return ind


 def _ssr_reduced_model(y, x, term_slices, params, keys):
@@ -187,7 +414,12 @@ def _ssr_reduced_model(y, x, term_slices, params, keys):
     df : int
         degrees of freedom
     """
-    pass
+    ind = _not_slice(term_slices, keys, x.shape[1])
+    params1 = params[ind]
+    ssr = np.subtract(y, x[:, ind].dot(params1))
+    ssr = ssr.T.dot(ssr)
+    df_resid = len(y) - len(params1)
+    return ssr, df_resid


 class AnovaRM:
@@ -247,33 +479,41 @@ class AnovaRM:
     """

     def __init__(self, data, depvar, subject, within=None, between=None,
-        aggregate_func=None):
+                 aggregate_func=None):
         self.data = data
         self.depvar = depvar
         self.within = within
         if 'C' in within:
-            raise ValueError(
-                "Factor name cannot be 'C'! This is in conflict with patsy's contrast function name."
-                )
+            raise ValueError("Factor name cannot be 'C'! This is in conflict "
+                             "with patsy's contrast function name.")
         self.between = between
         if between is not None:
-            raise NotImplementedError(
-                'Between subject effect not yet supported!')
+            raise NotImplementedError('Between subject effect not '
+                                      'yet supported!')
         self.subject = subject
+
         if aggregate_func == 'mean':
             self.aggregate_func = pd.Series.mean
         else:
             self.aggregate_func = aggregate_func
+
         if not data.equals(data.drop_duplicates(subset=[subject] + within)):
             if self.aggregate_func is not None:
                 self._aggregate()
             else:
-                msg = (
-                    'The data set contains more than one observation per subject and cell. Either aggregate the data manually, or pass the `aggregate_func` parameter.'
-                    )
+                msg = ('The data set contains more than one observation per '
+                       'subject and cell. Either aggregate the data manually, '
+                       'or pass the `aggregate_func` parameter.')
                 raise ValueError(msg)
+
         self._check_data_balanced()

+    def _aggregate(self):
+        self.data = (self.data
+                     .groupby([self.subject] + self.within,
+                              as_index=False)[self.depvar]
+                     .agg(self.aggregate_func))
+
     def _check_data_balanced(self):
         """raise if data is not balanced

@@ -282,7 +522,30 @@ class AnovaRM:

         Return might change
         """
-        pass
+        factor_levels = 1
+        for wi in self.within:
+            factor_levels *= len(self.data[wi].unique())
+
+        cell_count = {}
+        for index in range(self.data.shape[0]):
+            key = []
+            for col in self.within:
+                key.append(self.data[col].iloc[index])
+            key = tuple(key)
+            if key in cell_count:
+                cell_count[key] = cell_count[key] + 1
+            else:
+                cell_count[key] = 1
+        error_message = "Data is unbalanced."
+        if len(cell_count) != factor_levels:
+            raise ValueError(error_message)
+        count = cell_count[key]
+        for key in cell_count:
+            if count != cell_count[key]:
+                raise ValueError(error_message)
+        if self.data.shape[0] > count * factor_levels:
+            raise ValueError('There are more than 1 element in a cell! Missing'
+                             ' factors?')

     def fit(self):
         """estimate the model and compute the Anova table
@@ -291,7 +554,64 @@ class AnovaRM:
         -------
         AnovaResults instance
         """
-        pass
+        y = self.data[self.depvar].values
+
+        # Construct OLS endog and exog from string using patsy
+        within = ['C(%s, Sum)' % i for i in self.within]
+        subject = 'C(%s, Sum)' % self.subject
+        factors = within + [subject]
+        x = patsy.dmatrix('*'.join(factors), data=self.data)
+        term_slices = x.design_info.term_name_slices
+        for key in term_slices:
+            ind = np.array([False]*x.shape[1])
+            ind[term_slices[key]] = True
+            term_slices[key] = np.array(ind)
+        term_exclude = [':'.join(factors)]
+        ind = _not_slice(term_slices, term_exclude, x.shape[1])
+        x = x[:, ind]
+
+        # Fit OLS
+        model = OLS(y, x)
+        results = model.fit()
+        if model.rank < x.shape[1]:
+            raise ValueError('Independent variables are collinear.')
+        for i in term_exclude:
+            term_slices.pop(i)
+        for key in term_slices:
+            term_slices[key] = term_slices[key][ind]
+        params = results.params
+        df_resid = results.df_resid
+        ssr = results.ssr
+
+        columns = ['F Value', 'Num DF', 'Den DF', 'Pr > F']
+        anova_table = pd.DataFrame(np.zeros((0, 4)), columns=columns)
+
+        for key in term_slices:
+            if self.subject not in key and key != 'Intercept':
+                #  Independent variables are orthogonal
+                ssr1, df_resid1 = _ssr_reduced_model(
+                    y, x, term_slices, params, [key])
+                df1 = df_resid1 - df_resid
+                msm = (ssr1 - ssr) / df1
+                if (key == ':'.join(factors[:-1]) or
+                        (key + ':' + subject not in term_slices)):
+                    mse = ssr / df_resid
+                    df2 = df_resid
+                else:
+                    ssr1, df_resid1 = _ssr_reduced_model(
+                        y, x, term_slices, params,
+                        [key + ':' + subject])
+                    df2 = df_resid1 - df_resid
+                    mse = (ssr1 - ssr) / df2
+                F = msm / mse
+                p = stats.f.sf(F, df1, df2)
+                term = key.replace('C(', '').replace(', Sum)', '')
+                anova_table.loc[term, 'F Value'] = F
+                anova_table.loc[term, 'Num DF'] = df1
+                anova_table.loc[term, 'Den DF'] = df2
+                anova_table.loc[term, 'Pr > F'] = p
+
+        return AnovaResults(anova_table)


 class AnovaResults:
@@ -302,7 +622,6 @@ class AnovaResults:
     ----------
     anova_table : DataFrame
     """
-
     def __init__(self, anova_table):
         self.anova_table = anova_table

@@ -316,15 +635,34 @@ class AnovaResults:
         -------
         summary : summary2.Summary instance
         """
-        pass
+        summ = summary2.Summary()
+        summ.add_title('Anova')
+        summ.add_df(self.anova_table)
+
+        return summ


-if __name__ == '__main__':
+if __name__ == "__main__":
     import pandas
+
     from statsmodels.formula.api import ols
-    moore = pandas.read_csv('moore.csv', skiprows=1, names=[
-        'partner_status', 'conformity', 'fcategory', 'fscore'])
+
+    # in R
+    #library(car)
+    #write.csv(Moore, "moore.csv", row.names=FALSE)
+    moore = pandas.read_csv('moore.csv', skiprows=1,
+                            names=['partner_status','conformity',
+                                   'fcategory','fscore'])
     moore_lm = ols('conformity ~ C(fcategory, Sum)*C(partner_status, Sum)',
-        data=moore).fit()
+                    data=moore).fit()
+
     mooreB = ols('conformity ~ C(partner_status, Sum)', data=moore).fit()
+
+    # for each term you just want to test vs the model without its
+    # higher-order terms
+
+    # using Monette-Fox slides and Marden class notes for linear algebra /
+    # orthogonal complement
+    # https://netfiles.uiuc.edu/jimarden/www/Classes/STAT324/
+
     table = anova_lm(moore_lm, typ=2)
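
The anova.py hunk above fills in the previously stubbed table builders, so anova_lm once again dispatches on typ for a single fitted model or compares a sequence of nested OLS fits, and AnovaRM.fit produces a repeated-measures table. A short usage sketch; the toy data frame is invented purely for illustration:

    import numpy as np
    import pandas as pd
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "y": rng.normal(size=40),
        "a": np.repeat(["lo", "hi"], 20),
        "b": np.tile(["x", "z"], 20),
    })
    small = ols("y ~ C(a)", data=df).fit()
    full = ols("y ~ C(a) * C(b)", data=df).fit()

    print(anova_lm(small, full))   # type I comparison of nested models
    print(anova_lm(full, typ=2))   # type II table for a single model
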
diff --git a/statsmodels/stats/api.py b/statsmodels/stats/api.py
index 0876bf32d..f00462cf5 100644
--- a/statsmodels/stats/api.py
+++ b/statsmodels/stats/api.py
@@ -1,76 +1,270 @@
 from . import diagnostic
-from .diagnostic import acorr_ljungbox, acorr_breusch_godfrey, acorr_lm, compare_cox, compare_j, compare_encompassing, het_goldfeldquandt, het_breuschpagan, het_white, het_arch, linear_harvey_collier, linear_rainbow, linear_lm, linear_reset, breaks_cusumolsresid, breaks_hansen, recursive_olsresiduals, spec_white
+from .diagnostic import (
+    acorr_ljungbox, acorr_breusch_godfrey, acorr_lm,
+    compare_cox, compare_j, compare_encompassing,
+    het_goldfeldquandt,
+    het_breuschpagan, het_white, het_arch,
+    linear_harvey_collier, linear_rainbow, linear_lm, linear_reset,
+    breaks_cusumolsresid, breaks_hansen, recursive_olsresiduals,
+    spec_white
+    )
 from ._adnorm import normal_ad
 from ._lilliefors import lilliefors
+
 from ._knockoff import RegressionFDR
 from . import multicomp
-from .multitest import multipletests, fdrcorrection, fdrcorrection_twostage, local_fdr, NullDistribution
+from .multitest import (multipletests, fdrcorrection, fdrcorrection_twostage,
+                        local_fdr, NullDistribution)
 from .multicomp import tukeyhsd
 from . import gof
-from .gof import powerdiscrepancy, gof_chisquare_discrete, chisquare_effectsize
+from .gof import (powerdiscrepancy, gof_chisquare_discrete,
+                  chisquare_effectsize)
 from . import stattools
 from .stattools import durbin_watson, omni_normtest, jarque_bera
+
 from . import sandwich_covariance
-from .sandwich_covariance import cov_cluster, cov_cluster_2groups, cov_nw_panel, cov_hac, cov_white_simple, cov_hc0, cov_hc1, cov_hc2, cov_hc3, se_cov
-from .weightstats import DescrStatsW, CompareMeans, ttest_ind, ttost_ind, ttost_paired, ztest, ztost, zconfint
-from .proportion import binom_test_reject_interval, binom_test, binom_tost, binom_tost_reject_interval, power_binom_tost, power_ztost_prop, proportion_confint, proportion_effectsize, samplesize_confint_proportion, proportions_chisquare, proportions_chisquare_allpairs, proportions_chisquare_pairscontrol, proportions_ztest, proportions_ztost, multinomial_proportions_confint, confint_proportions_2indep, power_proportions_2indep, samplesize_proportions_2indep_onetail, test_proportions_2indep, tost_proportions_2indep
-from .rates import test_poisson, confint_poisson, confint_quantile_poisson, tolerance_int_poisson, etest_poisson_2indep, test_poisson_2indep, tost_poisson_2indep, confint_poisson_2indep, nonequivalence_poisson_2indep, power_poisson_ratio_2indep, power_poisson_diff_2indep, power_equivalence_poisson_2indep, power_negbin_ratio_2indep, power_equivalence_neginb_2indep
-from .oneway import anova_oneway, equivalence_oneway, test_scale_oneway, equivalence_scale_oneway, effectsize_oneway, power_equivalence_oneway, simulate_power_equivalence_oneway, anova_generic, equivalence_oneway_generic, confint_effectsize_oneway, confint_noncentrality, convert_effectsize_fsqu, f2_to_wellek, fstat_to_wellek, wellek_to_f2
-from .multivariate import test_cov, test_cov_blockdiagonal, test_cov_diagonal, test_cov_oneway, test_cov_spherical, test_mvmean, confint_mvmean, confint_mvmean_fromstats, test_mvmean_2indep
-from .power import TTestPower, TTestIndPower, GofChisquarePower, NormalIndPower, FTestAnovaPower, FTestPower, tt_solve_power, tt_ind_solve_power, zt_ind_solve_power
+from .sandwich_covariance import (
+    cov_cluster, cov_cluster_2groups, cov_nw_panel,
+    cov_hac, cov_white_simple,
+    cov_hc0, cov_hc1, cov_hc2, cov_hc3,
+    se_cov
+    )
+
+from .weightstats import (DescrStatsW, CompareMeans, ttest_ind, ttost_ind,
+                          ttost_paired, ztest, ztost, zconfint)
+
+from .proportion import (
+    binom_test_reject_interval, binom_test,
+    binom_tost, binom_tost_reject_interval,
+    power_binom_tost, power_ztost_prop,
+    proportion_confint, proportion_effectsize,
+    samplesize_confint_proportion,
+    proportions_chisquare, proportions_chisquare_allpairs,
+    proportions_chisquare_pairscontrol, proportions_ztest,
+    proportions_ztost, multinomial_proportions_confint,
+    # 2 sample functions:
+    confint_proportions_2indep, power_proportions_2indep,
+    samplesize_proportions_2indep_onetail,
+    test_proportions_2indep, tost_proportions_2indep,
+    )
+
+from .rates import (
+    # 1 sample:
+    test_poisson, confint_poisson,
+    confint_quantile_poisson, tolerance_int_poisson,
+    # 2-sample
+    etest_poisson_2indep, test_poisson_2indep, tost_poisson_2indep,
+    confint_poisson_2indep, nonequivalence_poisson_2indep,
+    # power
+    power_poisson_ratio_2indep, power_poisson_diff_2indep,
+    power_equivalence_poisson_2indep,
+    power_negbin_ratio_2indep, power_equivalence_neginb_2indep
+    )
+
+from .oneway import (
+        # mean and scale
+        anova_oneway, equivalence_oneway,
+        test_scale_oneway, equivalence_scale_oneway,
+        # power
+        effectsize_oneway,
+        power_equivalence_oneway, simulate_power_equivalence_oneway,
+        # from stats
+        anova_generic, equivalence_oneway_generic,
+        # effect size
+        confint_effectsize_oneway, confint_noncentrality,
+        convert_effectsize_fsqu,
+        f2_to_wellek, fstat_to_wellek, wellek_to_f2
+        )
+
+from .multivariate import (
+        test_cov, test_cov_blockdiagonal, test_cov_diagonal, test_cov_oneway,
+        test_cov_spherical,
+        test_mvmean, confint_mvmean, confint_mvmean_fromstats,
+        test_mvmean_2indep,
+        )
+
+from .power import (TTestPower, TTestIndPower, GofChisquarePower,
+                    NormalIndPower, FTestAnovaPower, FTestPower,
+                    tt_solve_power, tt_ind_solve_power, zt_ind_solve_power)
+
 from .descriptivestats import Describe
+
 from .anova import anova_lm, AnovaRM
-from .inter_rater import cohens_kappa, fleiss_kappa
+
+from .inter_rater import (
+    cohens_kappa, fleiss_kappa
+    )
+
 from .oaxaca import OaxacaBlinder
+
 from . import moment_helpers
-from .correlation_tools import corr_clipped, corr_nearest, corr_nearest_factor, corr_thresholded, cov_nearest, cov_nearest_factor_homog, FactoredPSDMatrix
+from .correlation_tools import (
+    corr_clipped, corr_nearest,
+    corr_nearest_factor, corr_thresholded, cov_nearest,
+    cov_nearest_factor_homog, FactoredPSDMatrix)
+
 from statsmodels.sandbox.stats.runs import Runs, runstest_1samp, runstest_2samp
-from statsmodels.stats.contingency_tables import mcnemar, cochrans_q, SquareTable, Table2x2, Table, StratifiedTable
+
+from statsmodels.stats.contingency_tables import (mcnemar, cochrans_q,
+                                                  SquareTable,
+                                                  Table2x2,
+                                                  Table,
+                                                  StratifiedTable)
 from .mediation import Mediation
-from .meta_analysis import combine_effects, effectsize_2proportions, effectsize_smd
-__all__ = ['AnovaRM', 'CompareMeans', 'DescrStatsW', 'Describe',
-    'FTestAnovaPower', 'FTestPower', 'FactoredPSDMatrix',
-    'GofChisquarePower', 'Mediation', 'NormalIndPower', 'NullDistribution',
-    'OaxacaBlinder', 'RegressionFDR', 'Runs', 'SquareTable',
-    'StratifiedTable', 'TTestIndPower', 'TTestPower', 'Table', 'Table2x2',
-    'acorr_breusch_godfrey', 'acorr_ljungbox', 'acorr_lm', 'anova_generic',
-    'anova_lm', 'anova_oneway', 'binom_test', 'binom_test_reject_interval',
-    'binom_tost', 'binom_tost_reject_interval', 'breaks_cusumolsresid',
-    'breaks_hansen', 'chisquare_effectsize', 'cochrans_q', 'cohens_kappa',
-    'combine_effects', 'compare_cox', 'compare_encompassing', 'compare_j',
-    'confint_effectsize_oneway', 'confint_mvmean',
-    'confint_mvmean_fromstats', 'confint_noncentrality', 'confint_poisson',
-    'confint_poisson_2indep', 'confint_proportions_2indep',
-    'confint_quantile_poisson', 'convert_effectsize_fsqu', 'corr_clipped',
-    'corr_nearest', 'corr_nearest_factor', 'corr_thresholded',
-    'cov_cluster', 'cov_cluster_2groups', 'cov_hac', 'cov_hc0', 'cov_hc1',
-    'cov_hc2', 'cov_hc3', 'cov_nearest', 'cov_nearest_factor_homog',
-    'cov_nw_panel', 'cov_white_simple', 'diagnostic', 'durbin_watson',
-    'effectsize_2proportions', 'effectsize_oneway', 'effectsize_smd',
-    'equivalence_oneway', 'equivalence_oneway_generic',
-    'equivalence_scale_oneway', 'etest_poisson_2indep', 'f2_to_wellek',
-    'fdrcorrection', 'fdrcorrection_twostage', 'fleiss_kappa',
-    'fstat_to_wellek', 'gof', 'gof_chisquare_discrete', 'het_arch',
-    'het_breuschpagan', 'het_goldfeldquandt', 'het_white', 'jarque_bera',
-    'lilliefors', 'linear_harvey_collier', 'linear_lm', 'linear_rainbow',
-    'linear_reset', 'local_fdr', 'mcnemar', 'moment_helpers', 'multicomp',
-    'multinomial_proportions_confint', 'multipletests',
-    'nonequivalence_poisson_2indep', 'normal_ad', 'omni_normtest',
-    'power_binom_tost', 'power_equivalence_neginb_2indep',
-    'power_equivalence_oneway', 'power_equivalence_poisson_2indep',
-    'power_negbin_ratio_2indep', 'power_poisson_diff_2indep',
-    'power_poisson_ratio_2indep', 'power_proportions_2indep',
-    'power_ztost_prop', 'powerdiscrepancy', 'proportion_confint',
-    'proportion_effectsize', 'proportions_chisquare',
-    'proportions_chisquare_allpairs', 'proportions_chisquare_pairscontrol',
-    'proportions_ztest', 'proportions_ztost', 'recursive_olsresiduals',
-    'runstest_1samp', 'runstest_2samp', 'samplesize_confint_proportion',
-    'samplesize_proportions_2indep_onetail', 'sandwich_covariance',
-    'se_cov', 'simulate_power_equivalence_oneway', 'spec_white',
-    'stattools', 'test_cov', 'test_cov_blockdiagonal', 'test_cov_diagonal',
-    'test_cov_oneway', 'test_cov_spherical', 'test_mvmean',
-    'test_mvmean_2indep', 'test_poisson', 'test_poisson_2indep',
-    'test_proportions_2indep', 'test_scale_oneway', 'tolerance_int_poisson',
-    'tost_poisson_2indep', 'tost_proportions_2indep', 'tt_ind_solve_power',
-    'tt_solve_power', 'ttest_ind', 'ttost_ind', 'ttost_paired', 'tukeyhsd',
-    'wellek_to_f2', 'zconfint', 'zt_ind_solve_power', 'ztest', 'ztost']
+
+from .meta_analysis import (
+    combine_effects, effectsize_2proportions, effectsize_smd,
+    )
+
+__all__ = [
+    "AnovaRM",
+    "CompareMeans",
+    "DescrStatsW",
+    "Describe",
+    "FTestAnovaPower",
+    "FTestPower",
+    "FactoredPSDMatrix",
+    "GofChisquarePower",
+    "Mediation",
+    "NormalIndPower",
+    "NullDistribution",
+    "OaxacaBlinder",
+    "RegressionFDR",
+    "Runs",
+    "SquareTable",
+    "StratifiedTable",
+    "TTestIndPower",
+    "TTestPower",
+    "Table",
+    "Table2x2",
+    "acorr_breusch_godfrey",
+    "acorr_ljungbox",
+    "acorr_lm",
+    "anova_generic",
+    "anova_lm",
+    "anova_oneway",
+    "binom_test",
+    "binom_test_reject_interval",
+    "binom_tost",
+    "binom_tost_reject_interval",
+    "breaks_cusumolsresid",
+    "breaks_hansen",
+    "chisquare_effectsize",
+    "cochrans_q",
+    "cohens_kappa",
+    "combine_effects",
+    "compare_cox",
+    "compare_encompassing",
+    "compare_j",
+    "confint_effectsize_oneway",
+    "confint_mvmean",
+    "confint_mvmean_fromstats",
+    "confint_noncentrality",
+    "confint_poisson",
+    "confint_poisson_2indep",
+    "confint_proportions_2indep",
+    "confint_quantile_poisson",
+    "convert_effectsize_fsqu",
+    "corr_clipped",
+    "corr_nearest",
+    "corr_nearest_factor",
+    "corr_thresholded",
+    "cov_cluster",
+    "cov_cluster_2groups",
+    "cov_hac",
+    "cov_hc0",
+    "cov_hc1",
+    "cov_hc2",
+    "cov_hc3",
+    "cov_nearest",
+    "cov_nearest_factor_homog",
+    "cov_nw_panel",
+    "cov_white_simple",
+    "diagnostic",
+    "durbin_watson",
+    "effectsize_2proportions",
+    "effectsize_oneway",
+    "effectsize_smd",
+    "equivalence_oneway",
+    "equivalence_oneway_generic",
+    "equivalence_scale_oneway",
+    "etest_poisson_2indep",
+    "f2_to_wellek",
+    "fdrcorrection",
+    "fdrcorrection_twostage",
+    "fleiss_kappa",
+    "fstat_to_wellek",
+    "gof",
+    "gof_chisquare_discrete",
+    "het_arch",
+    "het_breuschpagan",
+    "het_goldfeldquandt",
+    "het_white",
+    "jarque_bera",
+    "lilliefors",
+    "linear_harvey_collier",
+    "linear_lm",
+    "linear_rainbow",
+    "linear_reset",
+    "local_fdr",
+    "mcnemar",
+    "moment_helpers",
+    "multicomp",
+    "multinomial_proportions_confint",
+    "multipletests",
+    "nonequivalence_poisson_2indep",
+    "normal_ad",
+    "omni_normtest",
+    "power_binom_tost",
+    "power_equivalence_neginb_2indep",
+    "power_equivalence_oneway",
+    "power_equivalence_poisson_2indep",
+    "power_negbin_ratio_2indep",
+    "power_poisson_diff_2indep",
+    "power_poisson_ratio_2indep",
+    "power_proportions_2indep",
+    "power_ztost_prop",
+    "powerdiscrepancy",
+    "proportion_confint",
+    "proportion_effectsize",
+    "proportions_chisquare",
+    "proportions_chisquare_allpairs",
+    "proportions_chisquare_pairscontrol",
+    "proportions_ztest",
+    "proportions_ztost",
+    "recursive_olsresiduals",
+    "runstest_1samp",
+    "runstest_2samp",
+    "samplesize_confint_proportion",
+    "samplesize_proportions_2indep_onetail",
+    "sandwich_covariance",
+    "se_cov",
+    "simulate_power_equivalence_oneway",
+    "spec_white",
+    "stattools",
+    "test_cov",
+    "test_cov_blockdiagonal",
+    "test_cov_diagonal",
+    "test_cov_oneway",
+    "test_cov_spherical",
+    "test_mvmean",
+    "test_mvmean_2indep",
+    "test_poisson",
+    "test_poisson_2indep",
+    "test_proportions_2indep",
+    "test_scale_oneway",
+    "tolerance_int_poisson",
+    "tost_poisson_2indep",
+    "tost_proportions_2indep",
+    "tt_ind_solve_power",
+    "tt_solve_power",
+    "ttest_ind",
+    "ttost_ind",
+    "ttost_paired",
+    "tukeyhsd",
+    "wellek_to_f2",
+    "zconfint",
+    "zt_ind_solve_power",
+    "ztest",
+    "ztost",
+]
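
The api.py changes are organizational only: the same names are imported, now grouped and re-exported through an explicit __all__, so the flat statsmodels.stats.api namespace is unchanged. For example, reusing the fitted model from the previous sketch:

    import statsmodels.stats.api as sms

    sms.anova_lm(full, typ=2)         # re-exported from .anova
    sms.durbin_watson(full.resid)     # re-exported from .stattools
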
diff --git a/statsmodels/stats/base.py b/statsmodels/stats/base.py
index b9da61a90..19ff38a11 100644
--- a/statsmodels/stats/base.py
+++ b/statsmodels/stats/base.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Base classes for statistical test results

 Created on Mon Apr 22 14:03:21 2013
@@ -19,7 +20,7 @@ class HolderTuple(Holder):
         if tuple_ is not None:
             self.tuple = tuple(getattr(self, att) for att in tuple_)
         else:
-            self.tuple = self.statistic, self.pvalue
+            self.tuple = (self.statistic, self.pvalue)

     def __iter__(self):
         yield from self.tuple
@@ -35,7 +36,7 @@ class HolderTuple(Holder):


 class AllPairsResults:
-    """Results class for pairwise comparisons, based on p-values
+    '''Results class for pairwise comparisons, based on p-values

     Parameters
     ----------
@@ -59,47 +60,66 @@ class AllPairsResults:
     This class can also be used for other pairwise comparisons, for example
     comparing several treatments to a control (as in Dunnet's test).

-    """
+    '''

-    def __init__(self, pvals_raw, all_pairs, multitest_method='hs', levels=
-        None, n_levels=None):
+    def __init__(self, pvals_raw, all_pairs, multitest_method='hs',
+                 levels=None, n_levels=None):
         self.pvals_raw = pvals_raw
         self.all_pairs = all_pairs
         if n_levels is None:
+            # for all_pairs nobs*(nobs-1)/2
             self.n_levels = np.max(all_pairs) + 1
         else:
             self.n_levels = n_levels
+
         self.multitest_method = multitest_method
         self.levels = levels
         if levels is None:
-            self.all_pairs_names = [('%r' % (pairs,)) for pairs in all_pairs]
+            self.all_pairs_names = ['%r' % (pairs,) for pairs in all_pairs]
         else:
-            self.all_pairs_names = [('%s-%s' % (levels[pairs[0]], levels[
-                pairs[1]])) for pairs in all_pairs]
+            self.all_pairs_names = ['%s-%s' % (levels[pairs[0]],
+                                               levels[pairs[1]])
+                                    for pairs in all_pairs]

     def pval_corrected(self, method=None):
-        """p-values corrected for multiple testing problem
+        '''p-values corrected for multiple testing problem

         This uses the default p-value correction of the instance stored in
         ``self.multitest_method`` if method is None.

-        """
-        pass
+        '''
+        import statsmodels.stats.multitest as smt
+        if method is None:
+            method = self.multitest_method
+        # TODO: breaks with method=None
+        return smt.multipletests(self.pvals_raw, method=method)[1]

     def __str__(self):
         return self.summary()

     def pval_table(self):
-        """create a (n_levels, n_levels) array with corrected p_values
+        '''create a (n_levels, n_levels) array with corrected p_values

         this needs to improve, similar to R pairwise output
-        """
-        pass
+        '''
+        k = self.n_levels
+        pvals_mat = np.zeros((k, k))
+        # if we do not assume we have all pairs
+        pvals_mat[lzip(*self.all_pairs)] = self.pval_corrected()
+        return pvals_mat

     def summary(self):
-        """returns text summarizing the results
+        '''returns text summarizing the results

         uses the default pvalue correction of the instance stored in
         ``self.multitest_method``
-        """
-        pass
+        '''
+        import statsmodels.stats.multitest as smt
+        maxlevel = max((len(ss) for ss in self.all_pairs_names))
+
+        text = ('Corrected p-values using %s p-value correction\n\n'
+                % smt.multitest_methods_names[self.multitest_method])
+        text += 'Pairs' + (' ' * (maxlevel - 5 + 1)) + 'p-values\n'
+        text += '\n'.join(('%s  %6.4g' % (pairs, pv) for (pairs, pv) in
+                          zip(self.all_pairs_names, self.pval_corrected())))
+        return text
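
The restored AllPairsResults methods delegate the correction to statsmodels.stats.multitest.multipletests. A minimal usage sketch, assuming the patched module imports as in upstream statsmodels; the p-values and group labels below are invented for illustration:

    from statsmodels.stats.base import AllPairsResults

    # raw p-values for the 6 pairwise comparisons among 4 groups
    pvals_raw = [0.001, 0.020, 0.450, 0.030, 0.600, 0.004]
    all_pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    res = AllPairsResults(pvals_raw, all_pairs, multitest_method='hs',
                          levels=['A', 'B', 'C', 'D'])
    print(res.pval_corrected())   # Holm-Sidak corrected p-values
    print(res.summary())          # text table, one row per pair
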
diff --git a/statsmodels/stats/contingency_tables.py b/statsmodels/stats/contingency_tables.py
index 2fddf15a8..98be708f8 100644
--- a/statsmodels/stats/contingency_tables.py
+++ b/statsmodels/stats/contingency_tables.py
@@ -24,10 +24,13 @@ Note that the inference procedures may depend on how the data were
 sampled.  In general the observed units are independent and
 identically distributed.
 """
+
 import warnings
+
 import numpy as np
 import pandas as pd
 from scipy import stats
+
 from statsmodels import iolib
 from statsmodels.tools import sm_exceptions
 from statsmodels.tools.decorators import cache_readonly
@@ -39,23 +42,36 @@ def _make_df_square(table):
     the row and column indices contain the same values, in the same
     order.  The row and column index are extended to achieve this.
     """
-    pass
+
+    if not isinstance(table, pd.DataFrame):
+        return table
+
+    # If the table is not square, make it square
+    if not table.index.equals(table.columns):
+        ix = list(set(table.index) | set(table.columns))
+        ix.sort()
+        table = table.reindex(index=ix, columns=ix, fill_value=0)
+
+    # Ensures that the rows and columns are in the same order.
+    table = table.reindex(table.columns)
+
+    return table


 class _Bunch:

     def __repr__(self):
-        return '<bunch containing results, print to see contents>'
+        return "<bunch containing results, print to see contents>"

     def __str__(self):
         ky = [k for k, _ in self.__dict__.items()]
         ky.sort()
         m = max([len(k) for k in ky])
         tab = []
-        f = '{:' + str(m) + '}   {}'
+        f = "{:" + str(m) + "}   {}"
         for k in ky:
             tab.append(f.format(k, self.__dict__[k]))
-        return '\n'.join(tab)
+        return "\n".join(tab)


 class Table:
@@ -94,14 +110,16 @@ class Table:
     """

     def __init__(self, table, shift_zeros=True):
+
         self.table_orig = table
         self.table = np.asarray(table, dtype=np.float64)
-        if shift_zeros and self.table.min() == 0:
+
+        if shift_zeros and (self.table.min() == 0):
             self.table[self.table == 0] = 0.5

     def __str__(self):
-        s = 'A %dx%d contingency table with counts:\n' % tuple(self.table.shape
-            )
+        s = ("A %dx%d contingency table with counts:\n" %
+             tuple(self.table.shape))
         s += np.array_str(self.table)
         return s

@@ -123,7 +141,13 @@ class Table:
         -------
         A Table instance.
         """
-        pass
+
+        if isinstance(data, pd.DataFrame):
+            table = pd.crosstab(data.iloc[:, 0], data.iloc[:, 1])
+        else:
+            table = pd.crosstab(data[:, 0], data[:, 1])
+
+        return cls(table, shift_zeros)

     def test_nominal_association(self):
         """
@@ -144,7 +168,15 @@ class Table:
         pvalue : float
             The p-value for the test.
         """
-        pass
+
+        statistic = np.asarray(self.chi2_contribs).sum()
+        df = np.prod(np.asarray(self.table.shape) - 1)
+        pvalue = 1 - stats.chi2.cdf(statistic, df)
+        b = _Bunch()
+        b.statistic = statistic
+        b.df = df
+        b.pvalue = pvalue
+        return b
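
A short sketch of how the restored test is typically called; the counts are invented and the call assumes the module behaves as in upstream statsmodels:

    import numpy as np
    from statsmodels.stats.contingency_tables import Table

    tab = Table(np.array([[10, 20, 5],
                          [4, 15, 11]]))
    rslt = tab.test_nominal_association()   # Pearson chi^2 test of independence
    print(rslt.statistic, rslt.df, rslt.pvalue)
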

     def test_ordinal_association(self, row_scores=None, col_scores=None):
         """
@@ -185,7 +217,50 @@ class Table:
         Using the default row and column scores gives the
         Cochran-Armitage trend test.
         """
-        pass
+
+        if row_scores is None:
+            row_scores = np.arange(self.table.shape[0])
+
+        if col_scores is None:
+            col_scores = np.arange(self.table.shape[1])
+
+        if len(row_scores) != self.table.shape[0]:
+            msg = ("The length of `row_scores` must match the first " +
+                   "dimension of `table`.")
+            raise ValueError(msg)
+
+        if len(col_scores) != self.table.shape[1]:
+            msg = ("The length of `col_scores` must match the second " +
+                   "dimension of `table`.")
+            raise ValueError(msg)
+
+        # The test statistic
+        statistic = np.dot(row_scores, np.dot(self.table, col_scores))
+
+        # Some needed quantities
+        n_obs = self.table.sum()
+        rtot = self.table.sum(1)
+        um = np.dot(row_scores, rtot)
+        u2m = np.dot(row_scores**2, rtot)
+        ctot = self.table.sum(0)
+        vn = np.dot(col_scores, ctot)
+        v2n = np.dot(col_scores**2, ctot)
+
+        # The null mean and variance of the test statistic
+        e_stat = um * vn / n_obs
+        v_stat = (u2m - um**2 / n_obs) * (v2n - vn**2 / n_obs) / (n_obs - 1)
+        sd_stat = np.sqrt(v_stat)
+
+        zscore = (statistic - e_stat) / sd_stat
+        pvalue = 2 * stats.norm.cdf(-np.abs(zscore))
+
+        b = _Bunch()
+        b.statistic = statistic
+        b.null_mean = e_stat
+        b.null_sd = sd_stat
+        b.zscore = zscore
+        b.pvalue = pvalue
+        return b
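
With the default equally spaced scores this reduces to the Cochran-Armitage style trend test; a hedged sketch with invented counts:

    from statsmodels.stats.contingency_tables import Table

    trend = Table([[20, 15, 10, 5],
                   [10, 12, 14, 20]]).test_ordinal_association()
    print(trend.zscore, trend.pvalue)   # two-sided p-value from the normal approximation
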

     @cache_readonly
     def marginal_probabilities(self):
@@ -199,7 +274,16 @@ class Table:
         col : ndarray
             Marginal column probabilities
         """
-        pass
+
+        n = self.table.sum()
+        row = self.table.sum(1) / n
+        col = self.table.sum(0) / n
+
+        if isinstance(self.table_orig, pd.DataFrame):
+            row = pd.Series(row, self.table_orig.index)
+            col = pd.Series(col, self.table_orig.columns)
+
+        return row, col

     @cache_readonly
     def independence_probabilities(self):
@@ -210,7 +294,15 @@ class Table:
         column are the estimated marginal distributions
         of the rows and columns.
         """
-        pass
+
+        row, col = self.marginal_probabilities
+        itab = np.outer(row, col)
+
+        if isinstance(self.table_orig, pd.DataFrame):
+            itab = pd.DataFrame(itab, self.table_orig.index,
+                                self.table_orig.columns)
+
+        return itab

     @cache_readonly
     def fittedvalues(self):
@@ -220,7 +312,10 @@ class Table:
         The returned cell counts are estimates under a model
         where the rows and columns of the table are independent.
         """
-        pass
+
+        probs = self.independence_probabilities
+        fit = self.table.sum() * probs
+        return fit

     @cache_readonly
     def resid_pearson(self):
@@ -230,14 +325,20 @@ class Table:
         The Pearson residuals are calculated under a model where
         the rows and columns of the table are independent.
         """
-        pass
+
+        fit = self.fittedvalues
+        resids = (self.table - fit) / np.sqrt(fit)
+        return resids

     @cache_readonly
     def standardized_resids(self):
         """
         Returns standardized residuals under independence.
         """
-        pass
+
+        row, col = self.marginal_probabilities
+        sresids = self.resid_pearson / np.sqrt(np.outer(1 - row, 1 - col))
+        return sresids

     @cache_readonly
     def chi2_contribs(self):
@@ -248,7 +349,8 @@ class Table:
         test statistic for the null hypothesis that the rows and columns
         are independent.
         """
-        pass
+
+        return self.resid_pearson**2

     @cache_readonly
     def local_log_oddsratios(self):
@@ -258,7 +360,22 @@ class Table:
         The local log odds ratios are the log odds ratios
         calculated for contiguous 2x2 sub-tables.
         """
-        pass
+
+        ta = self.table.copy()
+        a = ta[0:-1, 0:-1]
+        b = ta[0:-1, 1:]
+        c = ta[1:, 0:-1]
+        d = ta[1:, 1:]
+        tab = np.log(a) + np.log(d) - np.log(b) - np.log(c)
+        rslt = np.empty(self.table.shape, np.float64)
+        rslt *= np.nan
+        rslt[0:-1, 0:-1] = tab
+
+        if isinstance(self.table_orig, pd.DataFrame):
+            rslt = pd.DataFrame(rslt, index=self.table_orig.index,
+                                columns=self.table_orig.columns)
+
+        return rslt

     @cache_readonly
     def local_oddsratios(self):
@@ -267,7 +384,8 @@ class Table:

         See documentation for local_log_oddsratios.
         """
-        pass
+
+        return np.exp(self.local_log_oddsratios)

     @cache_readonly
     def cumulative_log_oddsratios(self):
@@ -280,7 +398,24 @@ class Table:
         to obtain a 2x2 table from which a log odds ratio can be
         calculated.
         """
-        pass
+
+        ta = self.table.cumsum(0).cumsum(1)
+
+        a = ta[0:-1, 0:-1]
+        b = ta[0:-1, -1:] - a
+        c = ta[-1:, 0:-1] - a
+        d = ta[-1, -1] - (a + b + c)
+
+        tab = np.log(a) + np.log(d) - np.log(b) - np.log(c)
+        rslt = np.empty(self.table.shape, np.float64)
+        rslt *= np.nan
+        rslt[0:-1, 0:-1] = tab
+
+        if isinstance(self.table_orig, pd.DataFrame):
+            rslt = pd.DataFrame(rslt, index=self.table_orig.index,
+                                columns=self.table_orig.columns)
+
+        return rslt

     @cache_readonly
     def cumulative_oddsratios(self):
@@ -289,7 +424,8 @@ class Table:

         See documentation for cumulative_log_oddsratio.
         """
-        pass
+
+        return np.exp(self.cumulative_log_oddsratios)


 class SquareTable(Table):
@@ -317,13 +453,14 @@ class SquareTable(Table):
     """

     def __init__(self, table, shift_zeros=True):
-        table = _make_df_square(table)
+        table = _make_df_square(table)  # Non-pandas passes through
         k1, k2 = table.shape
         if k1 != k2:
             raise ValueError('table must be square')
+
         super(SquareTable, self).__init__(table, shift_zeros)

-    def symmetry(self, method='bowker'):
+    def symmetry(self, method="bowker"):
         """
         Test for symmetry of a joint distribution.

@@ -363,9 +500,28 @@ class SquareTable(Table):
         mcnemar
         homogeneity
         """
-        pass

-    def homogeneity(self, method='stuart_maxwell'):
+        if method.lower() != "bowker":
+            raise ValueError("method for symmetry testing must be 'bowker'")
+
+        k = self.table.shape[0]
+        upp_idx = np.triu_indices(k, 1)
+
+        tril = self.table.T[upp_idx]   # lower triangle in column order
+        triu = self.table[upp_idx]     # upper triangle in row order
+
+        statistic = ((tril - triu)**2 / (tril + triu + 1e-20)).sum()
+        df = k * (k-1) / 2.
+        pvalue = stats.chi2.sf(statistic, df)
+
+        b = _Bunch()
+        b.statistic = statistic
+        b.pvalue = pvalue
+        b.df = df
+
+        return b
+
+    def homogeneity(self, method="stuart_maxwell"):
         """
         Compare row and column marginal distributions.

@@ -397,9 +553,65 @@ class SquareTable(Table):
         meaningful, the two factors must have the same sample space
         (i.e. the same categories).
         """
-        pass

-    def summary(self, alpha=0.05, float_format='%.3f'):
+        if self.table.shape[0] < 1:
+            raise ValueError('table is empty')
+        elif self.table.shape[0] == 1:
+            b = _Bunch()
+            b.statistic = 0
+            b.pvalue = 1
+            b.df = 0
+            return b
+
+        method = method.lower()
+        if method not in ["bhapkar", "stuart_maxwell"]:
+            raise ValueError("method '%s' for homogeneity not known" % method)
+
+        n_obs = self.table.sum()
+        pr = self.table.astype(np.float64) / n_obs
+
+        # Compute margins, eliminate last row/column so there is no
+        # degeneracy
+        row = pr.sum(1)[0:-1]
+        col = pr.sum(0)[0:-1]
+        pr = pr[0:-1, 0:-1]
+
+        # The estimated difference between row and column margins.
+        d = col - row
+
+        # The degrees of freedom of the chi^2 reference distribution.
+        df = pr.shape[0]
+
+        if method == "bhapkar":
+            vmat = -(pr + pr.T) - np.outer(d, d)
+            dv = col + row - 2*np.diag(pr) - d**2
+            np.fill_diagonal(vmat, dv)
+        elif method == "stuart_maxwell":
+            vmat = -(pr + pr.T)
+            dv = row + col - 2*np.diag(pr)
+            np.fill_diagonal(vmat, dv)
+
+        try:
+            statistic = n_obs * np.dot(d, np.linalg.solve(vmat, d))
+        except np.linalg.LinAlgError:
+            warnings.warn("Unable to invert covariance matrix",
+                          sm_exceptions.SingularMatrixWarning)
+            b = _Bunch()
+            b.statistic = np.nan
+            b.pvalue = np.nan
+            b.df = df
+            return b
+
+        pvalue = 1 - stats.chi2.cdf(statistic, df)
+
+        b = _Bunch()
+        b.statistic = statistic
+        b.pvalue = pvalue
+        b.df = df
+
+        return b
+
+    def summary(self, alpha=0.05, float_format="%.3f"):
         """
         Produce a summary of the analysis.

@@ -413,7 +625,19 @@ class SquareTable(Table):
             The method for producing the confidence interval.  Currently
             must be 'normal' which uses the normal approximation.
         """
-        pass
+
+        fmt = float_format
+
+        headers = ["Statistic", "P-value", "DF"]
+        stubs = ["Symmetry", "Homogeneity"]
+        sy = self.symmetry()
+        hm = self.homogeneity()
+        data = [[fmt % sy.statistic, fmt % sy.pvalue, '%d' % sy.df],
+                [fmt % hm.statistic, fmt % hm.pvalue, '%d' % hm.df]]
+        tab = iolib.SimpleTable(data, headers, stubs, data_aligns="r",
+                                table_dec_above='')
+
+        return tab


 class Table2x2(SquareTable):
@@ -442,10 +666,13 @@ class Table2x2(SquareTable):
     """

     def __init__(self, table, shift_zeros=True):
+
         if type(table) is list:
             table = np.asarray(table)
-        if table.ndim != 2 or table.shape[0] != 2 or table.shape[1] != 2:
-            raise ValueError('Table2x2 takes a 2x2 table as input.')
+
+        if (table.ndim != 2) or (table.shape[0] != 2) or (table.shape[1] != 2):
+            raise ValueError("Table2x2 takes a 2x2 table as input.")
+
         super(Table2x2, self).__init__(table, shift_zeros)

     @classmethod
@@ -462,28 +689,38 @@ class Table2x2(SquareTable):
             If True, and if there are any zeros in the contingency
             table, add 0.5 to all four cells of the table.
         """
-        pass
+
+        if isinstance(data, pd.DataFrame):
+            table = pd.crosstab(data.iloc[:, 0], data.iloc[:, 1])
+        else:
+            table = pd.crosstab(data[:, 0], data[:, 1])
+        return cls(table, shift_zeros)

     @cache_readonly
     def log_oddsratio(self):
         """
         Returns the log odds ratio for a 2x2 table.
         """
-        pass
+
+        f = self.table.flatten()
+        return np.dot(np.log(f), np.r_[1, -1, -1, 1])

     @cache_readonly
     def oddsratio(self):
         """
         Returns the odds ratio for a 2x2 table.
         """
-        pass
+
+        return (self.table[0, 0] * self.table[1, 1] /
+                (self.table[0, 1] * self.table[1, 0]))

     @cache_readonly
     def log_oddsratio_se(self):
         """
         Returns the standard error for the log odds ratio.
         """
-        pass
+
+        return np.sqrt(np.sum(1 / self.table))

     def oddsratio_pvalue(self, null=1):
         """
@@ -494,7 +731,8 @@ class Table2x2(SquareTable):
         null : float
             The null value of the odds ratio.
         """
-        pass
+
+        return self.log_oddsratio_pvalue(np.log(null))

     def log_oddsratio_pvalue(self, null=0):
         """
@@ -505,9 +743,12 @@ class Table2x2(SquareTable):
         null : float
             The null value of the log odds ratio.
         """
-        pass

-    def log_oddsratio_confint(self, alpha=0.05, method='normal'):
+        zscore = (self.log_oddsratio - null) / self.log_oddsratio_se
+        pvalue = 2 * stats.norm.cdf(-np.abs(zscore))
+        return pvalue
+
+    def log_oddsratio_confint(self, alpha=0.05, method="normal"):
         """
         A confidence level for the log odds ratio.

@@ -520,9 +761,15 @@ class Table2x2(SquareTable):
             The method for producing the confidence interval.  Currently
             must be 'normal' which uses the normal approximation.
         """
-        pass

-    def oddsratio_confint(self, alpha=0.05, method='normal'):
+        f = -stats.norm.ppf(alpha / 2)
+        lor = self.log_oddsratio
+        se = self.log_oddsratio_se
+        lcb = lor - f * se
+        ucb = lor + f * se
+        return lcb, ucb
+
+    def oddsratio_confint(self, alpha=0.05, method="normal"):
         """
         A confidence interval for the odds ratio.

@@ -535,7 +782,8 @@ class Table2x2(SquareTable):
             The method for producing the confidence interval.  Currently
             must be 'normal' which uses the normal approximation.
         """
-        pass
+        lcb, ucb = self.log_oddsratio_confint(alpha, method=method)
+        return np.exp(lcb), np.exp(ucb)

     @cache_readonly
     def riskratio(self):
@@ -544,21 +792,28 @@ class Table2x2(SquareTable):

         The risk ratio is calculated with respect to the rows.
         """
-        pass
+
+        p = self.table[:, 0] / self.table.sum(1)
+        return p[0] / p[1]

     @cache_readonly
     def log_riskratio(self):
         """
         Returns the log of the risk ratio.
         """
-        pass
+
+        return np.log(self.riskratio)

     @cache_readonly
     def log_riskratio_se(self):
         """
         Returns the standard error of the log of the risk ratio.
         """
-        pass
+
+        n = self.table.sum(1)
+        p = self.table[:, 0] / n
+        va = np.sum((1 - p) / (n*p))
+        return np.sqrt(va)

     def riskratio_pvalue(self, null=1):
         """
@@ -569,7 +824,8 @@ class Table2x2(SquareTable):
         null : float
             The null value of the risk ratio.
         """
-        pass
+
+        return self.log_riskratio_pvalue(np.log(null))

     def log_riskratio_pvalue(self, null=0):
         """
@@ -580,9 +836,12 @@ class Table2x2(SquareTable):
         null : float
             The null value of the log risk ratio.
         """
-        pass

-    def log_riskratio_confint(self, alpha=0.05, method='normal'):
+        zscore = (self.log_riskratio - null) / self.log_riskratio_se
+        pvalue = 2 * stats.norm.cdf(-np.abs(zscore))
+        return pvalue
+
+    def log_riskratio_confint(self, alpha=0.05, method="normal"):
         """
         A confidence interval for the log risk ratio.

@@ -595,9 +854,14 @@ class Table2x2(SquareTable):
             The method for producing the confidence interval.  Currently
             must be 'normal' which uses the normal approximation.
         """
-        pass
+        f = -stats.norm.ppf(alpha / 2)
+        lrr = self.log_riskratio
+        se = self.log_riskratio_se
+        lcb = lrr - f * se
+        ucb = lrr + f * se
+        return lcb, ucb

-    def riskratio_confint(self, alpha=0.05, method='normal'):
+    def riskratio_confint(self, alpha=0.05, method="normal"):
         """
         A confidence interval for the risk ratio.

@@ -610,9 +874,10 @@ class Table2x2(SquareTable):
             The method for producing the confidence interval.  Currently
             must be 'normal' which uses the normal approximation.
         """
-        pass
+        lcb, ucb = self.log_riskratio_confint(alpha, method=method)
+        return np.exp(lcb), np.exp(ucb)

-    def summary(self, alpha=0.05, float_format='%.3f', method='normal'):
+    def summary(self, alpha=0.05, float_format="%.3f", method="normal"):
         """
         Summarizes results for a 2x2 table analysis.

@@ -627,7 +892,31 @@ class Table2x2(SquareTable):
             The method for producing the confidence interval.  Currently
             must be 'normal' which uses the normal approximation.
         """
-        pass
+
+        def fmt(x):
+            if isinstance(x, str):
+                return x
+            return float_format % x
+
+        headers = ["Estimate", "SE", "LCB", "UCB", "p-value"]
+        stubs = ["Odds ratio", "Log odds ratio", "Risk ratio",
+                 "Log risk ratio"]
+
+        lcb1, ucb1 = self.oddsratio_confint(alpha, method)
+        lcb2, ucb2 = self.log_oddsratio_confint(alpha, method)
+        lcb3, ucb3 = self.riskratio_confint(alpha, method)
+        lcb4, ucb4 = self.log_riskratio_confint(alpha, method)
+        data = [[fmt(x) for x in [self.oddsratio, "", lcb1, ucb1,
+                                  self.oddsratio_pvalue()]],
+                [fmt(x) for x in [self.log_oddsratio, self.log_oddsratio_se,
+                                  lcb2, ucb2, self.oddsratio_pvalue()]],
+                [fmt(x) for x in [self.riskratio, "", lcb3, ucb3,
+                                  self.riskratio_pvalue()]],
+                [fmt(x) for x in [self.log_riskratio, self.log_riskratio_se,
+                                  lcb4, ucb4, self.riskratio_pvalue()]]]
+        tab = iolib.SimpleTable(data, headers, stubs, data_aligns="r",
+                                table_dec_above='')
+        return tab
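
The 2x2-specific quantities restored above (odds ratio, risk ratio, their standard errors and normal-approximation intervals) can be exercised as follows; the counts are illustrative only:

    import numpy as np
    from statsmodels.stats.contingency_tables import Table2x2

    t22 = Table2x2(np.array([[25, 10],
                             [12, 28]]))
    print(t22.oddsratio, t22.oddsratio_confint(alpha=0.05))
    print(t22.riskratio, t22.riskratio_pvalue())
    print(t22.summary(float_format="%.3f"))
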


 class StratifiedTable:
@@ -653,24 +942,34 @@ class StratifiedTable:
     """

     def __init__(self, tables, shift_zeros=False):
+
         if isinstance(tables, np.ndarray):
             sp = tables.shape
-            if len(sp) != 3 or sp[0] != 2 or sp[1] != 2:
-                raise ValueError('If an ndarray, argument must be 2x2xn')
-            table = tables * 1.0
+            if (len(sp) != 3) or (sp[0] != 2) or (sp[1] != 2):
+                raise ValueError("If an ndarray, argument must be 2x2xn")
+            table = tables * 1.  # ensure a float dtype
         else:
-            if any([(np.asarray(x).shape != (2, 2)) for x in tables]):
-                m = 'If `tables` is a list, all of its elements should be 2x2'
+            if any([np.asarray(x).shape != (2, 2) for x in tables]):
+                m = "If `tables` is a list, all of its elements should be 2x2"
                 raise ValueError(m)
+
+            # Create a data cube
             table = np.dstack(tables).astype(np.float64)
+
         if shift_zeros:
             zx = (table == 0).sum(0).sum(0)
             ix = np.flatnonzero(zx > 0)
             if len(ix) > 0:
                 table = table.copy()
                 table[:, :, ix] += 0.5
+
         self.table = table
+
         self._cache = {}
+
+        # Quantities to precompute.  Table entries are [[a, b], [c,
+        # d]], 'ad' is 'a * d', 'apb' is 'a + b', 'dma' is 'd - a',
+        # etc.
         self._apb = table[0, 0, :] + table[0, 1, :]
         self._apc = table[0, 0, :] + table[1, 0, :]
         self._bpd = table[0, 1, :] + table[1, 1, :]
@@ -707,7 +1006,27 @@ class StratifiedTable:
         -------
         StratifiedTable
         """
-        pass
+
+        if not isinstance(data, pd.DataFrame):
+            data1 = pd.DataFrame(index=np.arange(data.shape[0]),
+                                 columns=[var1, var2, strata])
+            data1[data1.columns[var1]] = data[:, var1]
+            data1[data1.columns[var2]] = data[:, var2]
+            data1[data1.columns[strata]] = data[:, strata]
+        else:
+            data1 = data[[var1, var2, strata]]
+
+        gb = data1.groupby(strata).groups
+        tables = []
+        for g in gb:
+            ii = gb[g]
+            tab = pd.crosstab(data1.loc[ii, var1], data1.loc[ii, var2])
+            if (tab.shape != np.r_[2, 2]).any():
+                msg = "Invalid table dimensions"
+                raise ValueError(msg)
+            tables.append(np.asarray(tab))
+
+        return cls(tables)

     def test_null_odds(self, correction=False):
         """
@@ -726,7 +1045,26 @@ class StratifiedTable:
         Bunch
             A bunch containing the chi^2 test statistic and p-value.
         """
-        pass
+
+        statistic = np.sum(self.table[0, 0, :] -
+                           self._apb * self._apc / self._n)
+        statistic = np.abs(statistic)
+        if correction:
+            statistic -= 0.5
+        statistic = statistic**2
+        denom = self._apb * self._apc * self._bpd * self._cpd
+        denom /= (self._n**2 * (self._n - 1))
+        denom = np.sum(denom)
+        statistic /= denom
+
+        # df is always 1
+        pvalue = 1 - stats.chi2.cdf(statistic, 1)
+
+        b = _Bunch()
+        b.statistic = statistic
+        b.pvalue = pvalue
+
+        return b

     @cache_readonly
     def oddsratio_pooled(self):
@@ -736,7 +1074,8 @@ class StratifiedTable:
         The value is an estimate of a common odds ratio across all of the
         stratified tables.
         """
-        pass
+        odds_ratio = np.sum(self._ad / self._n) / np.sum(self._bc / self._n)
+        return odds_ratio

     @cache_readonly
     def logodds_pooled(self):
@@ -745,14 +1084,19 @@ class StratifiedTable:

         See oddsratio_pooled for more information.
         """
-        pass
+        return np.log(self.oddsratio_pooled)

     @cache_readonly
     def riskratio_pooled(self):
         """
         Estimate of the pooled risk ratio.
         """
-        pass
+
+        acd = self.table[0, 0, :] * self._cpd
+        cab = self.table[1, 0, :] * self._apb
+
+        rr = np.sum(acd / self._n) / np.sum(cab / self._n)
+        return rr

     @cache_readonly
     def logodds_pooled_se(self):
@@ -765,9 +1109,22 @@ class StratifiedTable:
         Mantel-Haenszel Variance Consistent in Both Sparse Data and
         Large-Strata Limiting Models." Biometrics 42, no. 2 (1986): 311-23.
         """
-        pass

-    def logodds_pooled_confint(self, alpha=0.05, method='normal'):
+        adns = np.sum(self._ad / self._n)
+        bcns = np.sum(self._bc / self._n)
+        lor_va = np.sum(self._apd * self._ad / self._n**2) / adns**2
+        mid = self._apd * self._bc / self._n**2
+        mid += (1 - self._apd / self._n) * self._ad / self._n
+        mid = np.sum(mid)
+        mid /= (adns * bcns)
+        lor_va += mid
+        lor_va += np.sum((1 - self._apd / self._n) *
+                         self._bc / self._n) / bcns**2
+        lor_va /= 2
+        lor_se = np.sqrt(lor_va)
+        return lor_se
+
+    def logodds_pooled_confint(self, alpha=0.05, method="normal"):
         """
         A confidence interval for the pooled log odds ratio.

@@ -787,9 +1144,18 @@ class StratifiedTable:
         ucb : float
             The upper confidence limit.
         """
-        pass

-    def oddsratio_pooled_confint(self, alpha=0.05, method='normal'):
+        lor = np.log(self.oddsratio_pooled)
+        lor_se = self.logodds_pooled_se
+
+        f = -stats.norm.ppf(alpha / 2)
+
+        lcb = lor - f * lor_se
+        ucb = lor + f * lor_se
+
+        return lcb, ucb
+
+    def oddsratio_pooled_confint(self, alpha=0.05, method="normal"):
         """
         A confidence interval for the pooled odds ratio.

@@ -809,7 +1175,11 @@ class StratifiedTable:
         ucb : float
             The upper confidence limit.
         """
-        pass
+
+        lcb, ucb = self.logodds_pooled_confint(alpha, method=method)
+        lcb = np.exp(lcb)
+        ucb = np.exp(ucb)
+        return lcb, ucb

     def test_equal_odds(self, adjust=False):
         """
@@ -832,9 +1202,40 @@ class StratifiedTable:
         p-value : float
             The p-value for the test.
         """
-        pass

-    def summary(self, alpha=0.05, float_format='%.3f', method='normal'):
+        table = self.table
+
+        r = self.oddsratio_pooled
+        a = 1 - r
+        b = r * (self._apb + self._apc) + self._dma
+        c = -r * self._apb * self._apc
+
+        # Expected value of first cell
+        dr = np.sqrt(b**2 - 4*a*c)
+        e11 = (-b + dr) / (2*a)
+
+        # Variance of the first cell
+        v11 = (1 / e11 + 1 / (self._apc - e11) + 1 / (self._apb - e11) +
+               1 / (self._dma + e11))
+        v11 = 1 / v11
+
+        statistic = np.sum((table[0, 0, :] - e11)**2 / v11)
+
+        if adjust:
+            adj = table[0, 0, :].sum() - e11.sum()
+            adj = adj**2
+            adj /= np.sum(v11)
+            statistic -= adj
+
+        pvalue = 1 - stats.chi2.cdf(statistic, table.shape[2] - 1)
+
+        b = _Bunch()
+        b.statistic = statistic
+        b.pvalue = pvalue
+
+        return b
+
+    def summary(self, alpha=0.05, float_format="%.3f", method="normal"):
         """
         A summary of all the main results.

@@ -849,7 +1250,46 @@ class StratifiedTable:
             The method for producing the confidence interval.  Currently
             must be 'normal' which uses the normal approximation.
         """
-        pass
+
+        def fmt(x):
+            if isinstance(x, str):
+                return x
+            return float_format % x
+
+        co_lcb, co_ucb = self.oddsratio_pooled_confint(
+            alpha=alpha, method=method)
+        clo_lcb, clo_ucb = self.logodds_pooled_confint(
+            alpha=alpha, method=method)
+        headers = ["Estimate", "LCB", "UCB"]
+        stubs = ["Pooled odds", "Pooled log odds", "Pooled risk ratio", ""]
+        data = [[fmt(x) for x in [self.oddsratio_pooled, co_lcb, co_ucb]],
+                [fmt(x) for x in [self.logodds_pooled, clo_lcb, clo_ucb]],
+                [fmt(x) for x in [self.riskratio_pooled, "", ""]],
+                ['', '', '']]
+        tab1 = iolib.SimpleTable(data, headers, stubs, data_aligns="r",
+                                 table_dec_above='')
+
+        headers = ["Statistic", "P-value", ""]
+        stubs = ["Test of OR=1", "Test constant OR"]
+        rslt1 = self.test_null_odds()
+        rslt2 = self.test_equal_odds()
+        data = [[fmt(x) for x in [rslt1.statistic, rslt1.pvalue, ""]],
+                [fmt(x) for x in [rslt2.statistic, rslt2.pvalue, ""]]]
+        tab2 = iolib.SimpleTable(data, headers, stubs, data_aligns="r")
+        tab1.extend(tab2)
+
+        headers = ["", "", ""]
+        stubs = ["Number of tables", "Min n", "Max n", "Avg n", "Total n"]
+        ss = self.table.sum(0).sum(0)
+        data = [["%d" % self.table.shape[2], '', ''],
+                ["%d" % min(ss), '', ''],
+                ["%d" % max(ss), '', ''],
+                ["%.0f" % np.mean(ss), '', ''],
+                ["%d" % sum(ss), '', '']]
+        tab3 = iolib.SimpleTable(data, headers, stubs, data_aligns="r")
+        tab1.extend(tab3)
+
+        return tab1


 def mcnemar(table, exact=True, correction=True):
@@ -887,7 +1327,32 @@ def mcnemar(table, exact=True, correction=True):
     test. The results when the chisquare distribution is used are
     identical, except for continuity correction.
     """
-    pass
+
+    table = _make_df_square(table)
+    table = np.asarray(table, dtype=np.float64)
+    n1, n2 = table[0, 1], table[1, 0]
+
+    if exact:
+        statistic = np.minimum(n1, n2)
+        # binom is symmetric with p=0.5
+        # SciPy 1.7+ requires int arguments
+        int_sum = int(n1 + n2)
+        if int_sum != (n1 + n2):
+            raise ValueError(
+                "exact can only be used with tables containing integers."
+            )
+        pvalue = stats.binom.cdf(statistic, int_sum, 0.5) * 2
+        pvalue = np.minimum(pvalue, 1)  # limit to 1 if n1==n2
+    else:
+        corr = int(correction)  # convert bool to 0 or 1
+        statistic = (np.abs(n1 - n2) - corr)**2 / (1. * (n1 + n2))
+        df = 1
+        pvalue = stats.chi2.sf(statistic, df)
+
+    b = _Bunch()
+    b.statistic = statistic
+    b.pvalue = pvalue
+    return b
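
As the notes above describe, exact=True uses the binomial distribution of the discordant counts while exact=False uses the chi-square approximation; a minimal sketch with invented counts:

    import numpy as np
    from statsmodels.stats.contingency_tables import mcnemar

    paired = np.array([[101, 59],
                       [21, 33]])
    print(mcnemar(paired, exact=True))                     # exact binomial p-value
    print(mcnemar(paired, exact=False, correction=True))   # chi^2 with continuity correction
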


 def cochrans_q(x, return_object=True):
@@ -933,4 +1398,34 @@ def cochrans_q(x, return_object=True):
     https://en.wikipedia.org/wiki/Cochran_test
     SAS Manual for NPAR TESTS
     """
-    pass
+
+    x = np.asarray(x, dtype=np.float64)
+    gruni = np.unique(x)
+    N, k = x.shape
+    count_row_success = (x == gruni[-1]).sum(1, float)
+    count_col_success = (x == gruni[-1]).sum(0, float)
+    count_row_ss = count_row_success.sum()
+    count_col_ss = count_col_success.sum()
+    assert count_row_ss == count_col_ss  # just a calculation check
+
+    # From the SAS manual
+    q_stat = ((k-1) * (k * np.sum(count_col_success**2) - count_col_ss**2)
+              / (k * count_row_ss - np.sum(count_row_success**2)))
+
+    # Note: the denominator looks just like k times the variance of
+    # the columns
+    # Wikipedia uses a different, but equivalent expression
+    # q_stat = (k-1) * (k *  np.sum(count_row_success**2) - count_row_ss**2)
+    #         / (k * count_col_ss - np.sum(count_col_success**2))
+
+    df = k - 1
+    pvalue = stats.chi2.sf(q_stat, df)
+
+    if return_object:
+        b = _Bunch()
+        b.statistic = q_stat
+        b.df = df
+        b.pvalue = pvalue
+        return b
+
+    return q_stat, pvalue, df
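
Cochran's Q expects a (subjects x treatments) array of binary responses; a sketch on random 0/1 data, purely for illustration:

    import numpy as np
    from statsmodels.stats.contingency_tables import cochrans_q

    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=(30, 4))   # 30 subjects, 4 binary treatments
    res = cochrans_q(x, return_object=True)
    print(res.statistic, res.df, res.pvalue)
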
diff --git a/statsmodels/stats/contrast.py b/statsmodels/stats/contrast.py
index a25f9a34b..246508f0f 100644
--- a/statsmodels/stats/contrast.py
+++ b/statsmodels/stats/contrast.py
@@ -6,6 +6,7 @@ from statsmodels.tools.tools import clean0, fullrank
 from statsmodels.stats.multitest import multipletests


+# TODO: should this be public if it's just a container?
 class ContrastResults:
     """
     Class for results of tests of linear restrictions on coefficients in a model.
@@ -18,8 +19,9 @@ class ContrastResults:
     """

     def __init__(self, t=None, F=None, sd=None, effect=None, df_denom=None,
-        df_num=None, alpha=0.05, **kwds):
-        self.effect = effect
+                 df_num=None, alpha=0.05, **kwds):
+
+        self.effect = effect  # Let it be None for F
         if F is not None:
             self.distribution = 'F'
             self.fvalue = F
@@ -27,21 +29,23 @@ class ContrastResults:
             self.df_denom = df_denom
             self.df_num = df_num
             self.dist = fdist
-            self.dist_args = df_num, df_denom
+            self.dist_args = (df_num, df_denom)
             self.pvalue = fdist.sf(F, df_num, df_denom)
         elif t is not None:
             self.distribution = 't'
             self.tvalue = t
-            self.statistic = t
+            self.statistic = t  # generic alias
             self.sd = sd
             self.df_denom = df_denom
             self.dist = student_t
-            self.dist_args = df_denom,
+            self.dist_args = (df_denom,)
             self.pvalue = self.dist.sf(np.abs(t), df_denom) * 2
         elif 'statistic' in kwds:
+            # TODO: currently targeted to normal distribution, and chi2
             self.distribution = kwds['distribution']
             self.statistic = kwds['statistic']
-            self.tvalue = value = kwds['statistic']
+            self.tvalue = value = kwds['statistic']  # keep alias
+            # TODO: for results instance we decided to use tvalues also for normal
             self.sd = sd
             self.dist = getattr(stats, self.distribution)
             self.dist_args = kwds.get('dist_args', ())
@@ -49,15 +53,19 @@ class ContrastResults:
                 self.pvalue = self.dist.sf(self.statistic, df_denom)
                 self.df_denom = df_denom
             else:
-                """normal"""
+                # normal distribution
                 self.pvalue = np.full_like(value, np.nan)
                 not_nan = ~np.isnan(value)
                 self.pvalue[not_nan] = self.dist.sf(np.abs(value[not_nan])) * 2
         else:
             self.pvalue = np.nan
+
+        # cleanup
+        # should we return python scalar?
         self.pvalue = np.squeeze(self.pvalue)
+
         if self.effect is not None:
-            self.c_names = [('c%d' % ii) for ii in range(len(self.effect))]
+            self.c_names = ['c%d' % ii for ii in range(len(self.effect))]
         else:
             self.c_names = None

@@ -79,7 +87,14 @@ class ContrastResults:
             The array has the lower and the upper limit of the confidence
             interval in the columns.
         """
-        pass
+        if self.effect is not None:
+            # confidence intervals
+            q = self.dist.ppf(1 - alpha / 2., *self.dist_args)
+            lower = self.effect - q * self.sd
+            upper = self.effect + q * self.sd
+            return np.column_stack((lower, upper))
+        else:
+            raise NotImplementedError('Confidence Interval not available')

     def __str__(self):
         return self.summary().__str__()
@@ -109,14 +124,64 @@ class ContrastResults:
             results summary.
             For F or Wald test, the return is a string.
         """
-        pass
+        if self.effect is not None:
+            # TODO: should also add some extra information, e.g. robust cov ?
+            # TODO: can we infer names for constraints, xname in __init__ ?
+            if title is None:
+                title = 'Test for Constraints'
+            elif title == '':
+                # do not add any title,
+                # I think SimpleTable skips on None - check
+                title = None
+            # we have everything for a params table
+            use_t = (self.distribution == 't')
+            yname = 'constraints'  # Not used in params_frame
+            if xname is None:
+                xname = self.c_names
+            from statsmodels.iolib.summary import summary_params
+            pvalues = np.atleast_1d(self.pvalue)
+            summ = summary_params((self, self.effect, self.sd, self.statistic,
+                                   pvalues, self.conf_int(alpha)),
+                                  yname=yname, xname=xname, use_t=use_t,
+                                  title=title, alpha=alpha)
+            return summ
+        elif hasattr(self, 'fvalue'):
+            # TODO: create something nicer for these cases
+            return ('<F test: F=%s, p=%s, df_denom=%.3g, df_num=%.3g>' %
+                   (repr(self.fvalue), self.pvalue, self.df_denom,
+                    self.df_num))
+        elif self.distribution == 'chi2':
+            return ('<Wald test (%s): statistic=%s, p-value=%s, df_denom=%.3g>' %
+                   (self.distribution, self.statistic, self.pvalue,
+                    self.df_denom))
+        else:
+            # generic
+            return ('<Wald test: statistic=%s, p-value=%s>' %
+                   (self.statistic, self.pvalue))
+

     def summary_frame(self, xname=None, alpha=0.05):
         """Return the parameter table as a pandas DataFrame

         This is only available for t and normal tests
         """
-        pass
+        if self.effect is not None:
+            # we have everything for a params table
+            use_t = (self.distribution == 't')
+            yname = 'constraints'  # Not used in params_frame
+            if xname is None:
+                xname = self.c_names
+            from statsmodels.iolib.summary import summary_params_frame
+            summ = summary_params_frame((self, self.effect, self.sd,
+                                         self.statistic,self.pvalue,
+                                         self.conf_int(alpha)), yname=yname,
+                                         xname=xname, use_t=use_t,
+                                         alpha=alpha)
+            return summ
+        else:
+            # TODO: create something nicer
+            raise NotImplementedError('only available for t and z')
+


 class Contrast:
@@ -184,12 +249,14 @@ class Contrast:
     >>> np.allclose(c3.contrast_matrix, test2)
     True
     """
-
     def _get_matrix(self):
         """
         Gets the contrast_matrix property
         """
-        pass
+        if not hasattr(self, "_contrast_matrix"):
+            self.compute_matrix()
+        return self._contrast_matrix
+
     contrast_matrix = property(_get_matrix)

     def __init__(self, term, design):
@@ -204,9 +271,20 @@ class Contrast:

         where pinv(D) is the generalized inverse of D=design.
         """
-        pass

+        T = self.term
+        if T.ndim == 1:
+            T = T[:,None]

+        self.T = clean0(T)
+        self.D = self.design
+        self._contrast_matrix = contrastfromcols(self.T, self.D)
+        try:
+            self.rank = self.matrix.shape[1]
+        except (AttributeError, ValueError):
+            self.rank = 1
+
+# TODO: fix docstring after usage is settled
 def contrastfromcols(L, D, pseudo=None):
     """
     From an n x p design matrix D and a matrix L, tries
@@ -237,32 +315,68 @@ def contrastfromcols(L, D, pseudo=None):
     L : array_like
     D : array_like
     """
-    pass
+    L = np.asarray(L)
+    D = np.asarray(D)
+
+    n, p = D.shape
+
+    if L.shape[0] != n and L.shape[1] != p:
+        raise ValueError("shape of L and D mismatched")
+
+    if pseudo is None:
+        pseudo = np.linalg.pinv(D)    # pinv(D) = inv(D.T @ D) @ D.T when D has full column rank
+
+    if L.shape[0] == n:
+        C = np.dot(pseudo, L).T
+    else:
+        C = L
+        C = np.dot(pseudo, np.dot(D, C.T)).T
+
+    Lp = np.dot(D, C.T)

+    if len(Lp.shape) == 1:
+        Lp.shape = (n, 1)

+    if np.linalg.matrix_rank(Lp) != Lp.shape[1]:
+        Lp = fullrank(Lp)
+        C = np.dot(pseudo, Lp).T
+
+    return np.squeeze(C)
+
+
+# TODO: this is currently a minimal version, stub
 class WaldTestResults:
+    # for F and chi2 tests of a joint hypothesis, mainly for vectorized Wald tests

     def __init__(self, statistic, distribution, dist_args, table=None,
-        pvalues=None):
+                 pvalues=None):
         self.table = table
+
         self.distribution = distribution
         self.statistic = statistic
+        #self.sd = sd
         self.dist_args = dist_args
+
+        # The following is because I do not know which we want
         if table is not None:
             self.statistic = table['statistic'].values
             self.pvalues = table['pvalue'].values
             self.df_constraints = table['df_constraint'].values
             if self.distribution == 'F':
                 self.df_denom = table['df_denom'].values
+
         else:
             if self.distribution == 'chi2':
                 self.dist = stats.chi2
-                self.df_constraints = self.dist_args[0]
+                self.df_constraints = self.dist_args[0]  # assumes tuple
+                # using dist_args[0] is fragile; it assumes dist_args is a tuple
             elif self.distribution == 'F':
                 self.dist = stats.f
                 self.df_constraints, self.df_denom = self.dist_args
+
             else:
                 raise ValueError('only F and chi2 are possible distribution')
+
             if pvalues is None:
                 self.pvalues = self.dist.sf(np.abs(statistic), *dist_args)
             else:
@@ -272,20 +386,41 @@ class WaldTestResults:
     def col_names(self):
         """column names for summary table
         """
-        pass
+
+        pr_test = "P>%s" % self.distribution
+        col_names = [self.distribution, pr_test, 'df constraint']
+        if self.distribution == 'F':
+            col_names.append('df denom')
+        return col_names
+
+    def summary_frame(self):
+        # needs to be a method for consistency
+        if hasattr(self, '_dframe'):
+            return self._dframe
+        # rename the column names, but do not copy data
+        renaming = dict(zip(self.table.columns, self.col_names))
+        self._dframe = self.table.rename(columns=renaming)
+        return self._dframe
+

     def __str__(self):
         return self.summary_frame().to_string()

+
     def __repr__(self):
         return str(self.__class__) + '\n' + self.__str__()


+# t_test for pairwise comparison and automatic contrast/restrictions
+
+
 def _get_pairs_labels(k_level, level_names):
     """helper function for labels for pairwise comparisons
     """
-    pass
-
+    idx_pairs_all = np.triu_indices(k_level, 1)
+    labels = ['%s-%s' % (level_names[name[1]], level_names[name[0]])
+              for name in zip(*idx_pairs_all)]
+    return labels

 def _contrast_pairs(k_params, k_level, idx_start):
     """create pairwise contrast for reference coding
@@ -312,11 +447,25 @@ def _contrast_pairs(k_params, k_level, idx_start):
         restriction matrix with k_params columns and number of rows equal to
         the number of restrictions.
     """
-    pass
+    k_level_m1 = k_level - 1
+    idx_pairs = np.triu_indices(k_level_m1, 1)
+
+    k = len(idx_pairs[0])
+    c_pairs = np.zeros((k, k_level_m1))
+    c_pairs[np.arange(k), idx_pairs[0]] = -1
+    c_pairs[np.arange(k), idx_pairs[1]] = 1
+    c_reference = np.eye(k_level_m1)
+    c = np.concatenate((c_reference, c_pairs), axis=0)
+    k_all = c.shape[0]
+
+    contrasts = np.zeros((k_all, k_params))
+    contrasts[:, idx_start : idx_start + k_level_m1] = c
+
+    return contrasts


 def t_test_multi(result, contrasts, method='hs', alpha=0.05, ci_method=None,
-    contrast_names=None):
+                 contrast_names=None):
     """perform t_test and add multiplicity correction to results dataframe

     Parameters
@@ -342,7 +491,16 @@ def t_test_multi(result, contrasts, method='hs', alpha=0.05, ci_method=None,
         for multiplicity corrected p-values and boolean indicator for whether
         the Null hypothesis is rejected.
     """
-    pass
+    tt = result.t_test(contrasts)
+    res_df = tt.summary_frame(xname=contrast_names)
+
+    if type(method) is not list:
+        method = [method]
+    for meth in method:
+        mt = multipletests(tt.pvalue, method=meth, alpha=alpha)
+        res_df['pvalue-%s' % meth] = mt[1]
+        res_df['reject-%s' % meth] = mt[0]
+    return res_df


 class MultiCompResult:
@@ -350,7 +508,6 @@ class MultiCompResult:

     currently just a minimal class to hold attributes.
     """
-
     def __init__(self, **kwargs):
         self.__dict__.update(kwargs)

@@ -380,11 +537,18 @@ def _embed_constraints(contrasts, k_params, idx_start, index=None):
         restriction matrix with k_params columns and number of rows equal to
         the number of restrictions.
     """
-    pass
+
+    k_c, k_p = contrasts.shape
+    c = np.zeros((k_c, k_params))
+    if index is None:
+        c[:, idx_start : idx_start + k_p] = contrasts
+    else:
+        c[:, index] = contrasts
+    return c


-def _constraints_factor(encoding_matrix, comparison='pairwise', k_params=
-    None, idx_start=None):
+def _constraints_factor(encoding_matrix, comparison='pairwise', k_params=None,
+                        idx_start=None):
     """helper function to create constraints based on encoding matrix

     Parameters
@@ -411,11 +575,27 @@ def _constraints_factor(encoding_matrix, comparison='pairwise', k_params=
         Contrast or restriction matrix that can be used in hypothesis test
         of model results. The number of columns is k_params.
     """
-    pass
+
+    cm = encoding_matrix
+    k_level, k_p = cm.shape
+
+    import statsmodels.sandbox.stats.multicomp as mc
+    if comparison in ['pairwise', 'pw', 'pairs']:
+        c_all = -mc.contrast_allpairs(k_level)
+    else:
+        raise NotImplementedError('currently only pairwise comparison is supported')
+
+    contrasts = c_all.dot(cm)
+    if k_params is not None:
+        if idx_start is None:
+            raise ValueError("if k_params is not None, then idx_start is "
+                             "required")
+        contrasts = _embed_constraints(contrasts, k_params, idx_start)
+    return contrasts


 def t_test_pairwise(result, term_name, method='hs', alpha=0.05,
-    factor_labels=None, ignore=False):
+                    factor_labels=None, ignore=False):
     """
     Perform pairwise t_test with multiple testing corrected p-values.

@@ -462,7 +642,40 @@ def t_test_pairwise(result, term_name, method='hs', alpha=0.05,
     Currently there are no multiple testing corrected confidence intervals
     available.
     """
-    pass
+
+    desinfo = result.model.data.design_info
+    term_idx = desinfo.term_names.index(term_name)
+    term = desinfo.terms[term_idx]
+    idx_start = desinfo.term_slices[term].start
+    if not ignore and len(term.factors) > 1:
+        raise ValueError('interaction effects not yet supported')
+    factor = term.factors[0]
+    cat = desinfo.factor_infos[factor].categories
+    if factor_labels is not None:
+        if len(factor_labels) == len(cat):
+            cat = factor_labels
+        else:
+            raise ValueError("factor_labels has the wrong length, should be %d" % len(cat))
+
+
+    k_level = len(cat)
+    cm = desinfo.term_codings[term][0].contrast_matrices[factor].matrix
+
+    k_params = len(result.params)
+    labels = _get_pairs_labels(k_level, cat)
+
+    import statsmodels.sandbox.stats.multicomp as mc
+    c_all_pairs = -mc.contrast_allpairs(k_level)
+    contrasts_sub = c_all_pairs.dot(cm)
+    contrasts = _embed_constraints(contrasts_sub, k_params, idx_start)
+    res_df = t_test_multi(result, contrasts, method=method, ci_method=None,
+                          alpha=alpha, contrast_names=labels)
+    res = MultiCompResult(result_frame=res_df,
+                          contrasts=contrasts,
+                          term=term,
+                          contrast_labels=labels,
+                          term_encoding_matrix=cm)
+    return res


 def _offset_constraint(r_matrix, params_est, params_alt):
@@ -476,7 +689,9 @@ def _offset_constraint(r_matrix, params_est, params_alt):
     nc = fs.statistic * fs.df_num

     """
-    pass
+    diff_est = r_matrix @ params_est
+    diff_alt = r_matrix @ params_alt
+    return diff_est - diff_alt


 def wald_test_noncent(params, r_matrix, value, results, diff=None, joint=True):
@@ -520,11 +735,19 @@ def wald_test_noncent(params, r_matrix, value, results, diff=None, joint=True):
     Status : experimental, API will likely change

     """
-    pass
+    if diff is None:
+        diff = r_matrix @ params - value  # at parameter under alternative
+
+    cov_c = results.cov_params(r_matrix=r_matrix)
+    if joint:
+        nc = diff @ np.linalg.solve(cov_c, diff)
+    else:
+        nc = diff / np.sqrt(np.diag(cov_c))
+    return nc


-def wald_test_noncent_generic(params, r_matrix, value, cov_params, diff=
-    None, joint=True):
+def wald_test_noncent_generic(params, r_matrix, value, cov_params, diff=None,
+                              joint=True):
     """noncentrality parameter for a wald test

     The null hypothesis is ``diff = r_matrix @ params - value = 0``
@@ -564,4 +787,16 @@ def wald_test_noncent_generic(params, r_matrix, value, cov_params, diff=
     -----
     Status : experimental, API will likely change
     """
-    pass
+    if value is None:
+        value = 0
+    if diff is None:
+        # at parameter under alternative
+        diff = r_matrix @ params - value
+
+    c = r_matrix
+    cov_c = c.dot(cov_params).dot(c.T)
+    if joint:
+        nc = diff @ np.linalg.solve(cov_c, diff)
+    else:
+        nc = diff / np.sqrt(np.diag(cov_c))
+    return nc
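
Because the noncentrality helpers only need a parameter vector, a restriction matrix and a covariance matrix, they can be sketched without fitting a model; every number below is invented:

    import numpy as np
    from statsmodels.stats.contrast import wald_test_noncent_generic

    params_alt = np.array([0.5, 1.0, -0.3])    # parameter values under the alternative
    r_matrix = np.array([[0., 1., 0.],
                         [0., 0., 1.]])        # H0: beta_1 = beta_2 = 0
    cov_params = 0.01 * np.eye(3)              # assumed covariance of the estimates
    nc = wald_test_noncent_generic(params_alt, r_matrix, value=0,
                                   cov_params=cov_params, joint=True)
    print(nc)   # noncentrality of the joint Wald / chi^2 statistic
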
diff --git a/statsmodels/stats/correlation_tools.py b/statsmodels/stats/correlation_tools.py
index 75e27ed55..706bdc147 100644
--- a/statsmodels/stats/correlation_tools.py
+++ b/statsmodels/stats/correlation_tools.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """

 Created on Fri Aug 17 13:10:52 2012
@@ -5,17 +6,27 @@ Created on Fri Aug 17 13:10:52 2012
 Author: Josef Perktold
 License: BSD-3
 """
+
 import numpy as np
 import scipy.sparse as sparse
 from scipy.sparse.linalg import svds
 from scipy.optimize import fminbound
 import warnings
+
 from statsmodels.tools.tools import Bunch
-from statsmodels.tools.sm_exceptions import IterationLimitWarning, iteration_limit_doc
+from statsmodels.tools.sm_exceptions import (
+    IterationLimitWarning, iteration_limit_doc)
+
+
+def clip_evals(x, value=0):  # alternative signature: threshold=0, value=0
+    evals, evecs = np.linalg.eigh(x)
+    clipped = np.any(evals < value)
+    x_new = np.dot(evecs * np.maximum(evals, value), evecs.T)
+    return x_new, clipped


 def corr_nearest(corr, threshold=1e-15, n_fact=100):
-    """
+    '''
     Find the nearest correlation matrix that is positive semi-definite.

     The function iteratively adjust the correlation matrix by clipping the
@@ -57,12 +68,32 @@ def corr_nearest(corr, threshold=1e-15, n_fact=100):
     corr_clipped
     cov_nearest

-    """
-    pass
+    '''
+    k_vars = corr.shape[0]
+    if k_vars != corr.shape[1]:
+        raise ValueError("matrix is not square")
+
+    diff = np.zeros(corr.shape)
+    x_new = corr.copy()
+    diag_idx = np.arange(k_vars)
+
+    for ii in range(int(len(corr) * n_fact)):
+        x_adj = x_new - diff
+        x_psd, clipped = clip_evals(x_adj, value=threshold)
+        if not clipped:
+            x_new = x_psd
+            break
+        diff = x_psd - x_adj
+        x_new = x_psd.copy()
+        x_new[diag_idx, diag_idx] = 1
+    else:
+        warnings.warn(iteration_limit_doc, IterationLimitWarning)
+
+    return x_new


 def corr_clipped(corr, threshold=1e-15):
-    """
+    '''
     Find a near correlation matrix that is positive semi-definite

     This function clips the eigenvalues, replacing eigenvalues smaller than
@@ -110,12 +141,19 @@ def corr_clipped(corr, threshold=1e-15):
     corr_nearest
     cov_nearest

-    """
-    pass
+    '''
+    x_new, clipped = clip_evals(corr, value=threshold)
+    if not clipped:
+        return corr
+
+    # rescale to unit diagonal (cov2corr)
+    x_std = np.sqrt(np.diag(x_new))
+    x_new = x_new / x_std / x_std[:, None]
+    return x_new


 def cov_nearest(cov, method='clipped', threshold=1e-15, n_fact=100,
-    return_all=False):
+                return_all=False):
     """
     Find the nearest covariance matrix that is positive (semi-) definite

@@ -167,11 +205,24 @@ def cov_nearest(cov, method='clipped', threshold=1e-15, n_fact=100,
     corr_nearest
     corr_clipped
     """
-    pass

+    from statsmodels.stats.moment_helpers import cov2corr, corr2cov
+    cov_, std_ = cov2corr(cov, return_std=True)
+    if method == 'clipped':
+        corr_ = corr_clipped(cov_, threshold=threshold)
+    else:  # method == 'nearest'
+        corr_ = corr_nearest(cov_, threshold=threshold, n_fact=n_fact)
+
+    cov_ = corr2cov(corr_, std_)
+
+    if return_all:
+        return cov_, corr_, std_
+    else:
+        return cov_
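
A quick sketch of the eigenvalue-clipping route on a deliberately indefinite correlation matrix (values invented); corr_nearest iterates the clipping, while cov_nearest wraps either approach for covariance input:

    import numpy as np
    from statsmodels.stats.correlation_tools import corr_nearest, cov_nearest

    corr = np.array([[1.00, 0.95, 0.10],
                     [0.95, 1.00, 0.90],
                     [0.10, 0.90, 1.00]])       # smallest eigenvalue is negative
    print(np.linalg.eigvalsh(corr))
    print(corr_nearest(corr, threshold=1e-15, n_fact=100))
    print(cov_nearest(corr, method='clipped'))  # same matrix treated as a covariance
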

-def _nmono_linesearch(obj, grad, x, d, obj_hist, M=10, sig1=0.1, sig2=0.9,
-    gam=0.0001, maxiter=100):
+
+def _nmono_linesearch(obj, grad, x, d, obj_hist, M=10, sig1=0.1,
+                      sig2=0.9, gam=1e-4, maxiter=100):
     """
     Implements the non-monotone line search of Grippo et al. (1986),
     as described in Birgin, Martinez and Raydan (2013).
@@ -229,12 +280,35 @@ def _nmono_linesearch(obj, grad, x, d, obj_hist, M=10, sig1=0.1, sig2=0.9,
     gradient methods: Review and perspectives. Journal of Statistical
     Software (preprint).
     """
-    pass

+    alpha = 1.
+    last_obval = obj(x)
+    obj_max = max(obj_hist[-M:])
+
+    for iter in range(maxiter):
+
+        obval = obj(x + alpha*d)
+        g = grad(x)
+        gtd = (g * d).sum()
+
+        if obval <= obj_max + gam*alpha*gtd:
+            return alpha, x + alpha*d, obval, g

-def _spg_optim(func, grad, start, project, maxiter=10000.0, M=10, ctol=
-    0.001, maxiter_nmls=200, lam_min=1e-30, lam_max=1e+30, sig1=0.1, sig2=
-    0.9, gam=0.0001):
+        a1 = -0.5*alpha**2*gtd / (obval - last_obval - alpha*gtd)
+
+        if (sig1 <= a1) and (a1 <= sig2*alpha):
+            alpha = a1
+        else:
+            alpha /= 2.
+
+        last_obval = obval
+
+    return None, None, None, None
+
+
+def _spg_optim(func, grad, start, project, maxiter=1e4, M=10,
+               ctol=1e-3, maxiter_nmls=200, lam_min=1e-30,
+               lam_max=1e30, sig1=0.1, sig2=0.9, gam=1e-4):
     """
     Implements the spectral projected gradient method for minimizing a
     differentiable function on a convex domain.
@@ -275,7 +349,65 @@ def _spg_optim(func, grad, start, project, maxiter=10000.0, M=10, ctol=
     Software (preprint).  Available at:
     http://www.ime.usp.br/~egbirgin/publications/bmr5.pdf
     """
-    pass
+
+    lam = min(10*lam_min, lam_max)
+
+    params = start.copy()
+    gval = grad(params)
+
+    obj_hist = [func(params), ]
+
+    for itr in range(int(maxiter)):
+
+        # Check convergence
+        df = params - gval
+        project(df)
+        df -= params
+        if np.max(np.abs(df)) < ctol:
+            return Bunch(**{"Converged": True, "params": params,
+                            "objective_values": obj_hist,
+                            "Message": "Converged successfully"})
+
+        # The line search direction
+        d = params - lam*gval
+        project(d)
+        d -= params
+
+        # Carry out the nonmonotone line search
+        alpha, params1, fval, gval1 = _nmono_linesearch(
+            func,
+            grad,
+            params,
+            d,
+            obj_hist,
+            M=M,
+            sig1=sig1,
+            sig2=sig2,
+            gam=gam,
+            maxiter=maxiter_nmls)
+
+        if alpha is None:
+            return Bunch(**{"Converged": False, "params": params,
+                            "objective_values": obj_hist,
+                            "Message": "Failed in nmono_linesearch"})
+
+        obj_hist.append(fval)
+        s = params1 - params
+        y = gval1 - gval
+
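+        # Barzilai-Borwein (spectral) step length s's / s'y, clipped to
+        # [lam_min, lam_max]; fall back to lam_max if the curvature s'y <= 0.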
+        sy = (s*y).sum()
+        if sy <= 0:
+            lam = lam_max
+        else:
+            ss = (s*s).sum()
+            lam = max(lam_min, min(ss/sy, lam_max))
+
+        params = params1
+        gval = gval1
+
+    return Bunch(**{"Converged": False, "params": params,
+                    "objective_values": obj_hist,
+                    "Message": "spg_optim did not converge"})


 def _project_correlation_factors(X):
@@ -285,7 +417,10 @@ def _project_correlation_factors(X):

     The input matrix is modified in-place.
     """
-    pass
+    nm = np.sqrt((X*X).sum(1))
+    ii = np.flatnonzero(nm > 1)
+    if len(ii) > 0:
+        X[ii, :] /= nm[ii][:, None]


 class FactoredPSDMatrix:
@@ -317,14 +452,14 @@ class FactoredPSDMatrix:
         root = root / np.sqrt(diag)[:, None]
         u, s, vt = np.linalg.svd(root, 0)
         self.factor = u
-        self.scales = s ** 2
+        self.scales = s**2

     def to_matrix(self):
         """
         Returns the PSD matrix represented by this instance as a full
         (square) matrix.
         """
-        pass
+        return np.diag(self.diag) + np.dot(self.root, self.root.T)

     def decorrelate(self, rhs):
         """
@@ -348,7 +483,20 @@ class FactoredPSDMatrix:

         This function exploits the factor structure for efficiency.
         """
-        pass
+
+        # I + factor * qval * factor' is the inverse square root of
+        # the covariance matrix in the homogeneous case where diag =
+        # 1.
+        qval = -1 + 1 / np.sqrt(1 + self.scales)
+
+        # Decorrelate in the general case.
+        rhs = rhs / np.sqrt(self.diag)[:, None]
+        rhs1 = np.dot(self.factor.T, rhs)
+        rhs1 *= qval[:, None]
+        rhs1 = np.dot(self.factor, rhs1)
+        rhs += rhs1
+
+        return rhs

     def solve(self, rhs):
         """
@@ -370,18 +518,29 @@ class FactoredPSDMatrix:
         -----
         This function exploits the factor structure for efficiency.
         """
-        pass
+
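+        # Solve (diag + root * root') x = rhs using the factor structure
+        # (a low-rank Woodbury-style update) instead of forming and
+        # inverting the full matrix.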
+        qval = -self.scales / (1 + self.scales)
+        dr = np.sqrt(self.diag)
+        rhs = rhs / dr[:, None]
+        mat = qval[:, None] * np.dot(self.factor.T, rhs)
+        rhs = rhs + np.dot(self.factor, mat)
+        return rhs / dr[:, None]

     def logdet(self):
         """
         Returns the logarithm of the determinant of a
         factor-structured matrix.
         """
-        pass
+
+        logdet = np.sum(np.log(self.diag))
+        logdet += np.sum(np.log(self.scales))
+        logdet += np.sum(np.log(1 + 1 / self.scales))
+
+        return logdet


-def corr_nearest_factor(corr, rank, ctol=1e-06, lam_min=1e-30, lam_max=
-    1e+30, maxiter=1000):
+def corr_nearest_factor(corr, rank, ctol=1e-6, lam_min=1e-30,
+                        lam_max=1e30, maxiter=1000):
     """
     Find the nearest correlation matrix with factor structure to a
     given square matrix.
@@ -464,7 +623,70 @@ def corr_nearest_factor(corr, rank, ctol=1e-06, lam_min=1e-30, lam_max=
     >>> corr = corr * (np.abs(corr) >= 0.3)
     >>> rslt = corr_nearest_factor(corr, 3)
     """
-    pass
+
+    p, _ = corr.shape
+
+    # Starting values (following the PCA method in BHR).
+    u, s, vt = svds(corr, rank)
+    X = u * np.sqrt(s)
+    nm = np.sqrt((X**2).sum(1))
+    ii = np.flatnonzero(nm > 1e-5)
+    X[ii, :] /= nm[ii][:, None]
+
+    # Zero the diagonal
+    corr1 = corr.copy()
+    if type(corr1) is np.ndarray:
+        np.fill_diagonal(corr1, 0)
+    elif sparse.issparse(corr1):
+        corr1.setdiag(np.zeros(corr1.shape[0]))
+        corr1.eliminate_zeros()
+        corr1.sort_indices()
+    else:
+        raise ValueError("Matrix type not supported")
+
+    # The gradient, from lemma 4.1 of BHR.
+    def grad(X):
+        gr = np.dot(X, np.dot(X.T, X))
+        if type(corr1) is np.ndarray:
+            gr -= np.dot(corr1, X)
+        else:
+            gr -= corr1.dot(X)
+        gr -= (X*X).sum(1)[:, None] * X
+        return 4*gr
+
+    # The objective function (sum of squared deviations between fitted
+    # and observed arrays).
+    def func(X):
+        if type(corr1) is np.ndarray:
+            M = np.dot(X, X.T)
+            np.fill_diagonal(M, 0)
+            M -= corr1
+            fval = (M*M).sum()
+            return fval
+        else:
+            fval = 0.
+            # Control the size of intermediates
+            max_ws = 1e6
+            bs = int(max_ws / X.shape[0])
+            ir = 0
+            while ir < X.shape[0]:
+                ir2 = min(ir+bs, X.shape[0])
+                u = np.dot(X[ir:ir2, :], X.T)
+                ii = np.arange(u.shape[0])
+                u[ii, ir+ii] = 0
+                u -= np.asarray(corr1[ir:ir2, :].todense())
+                fval += (u*u).sum()
+                ir += bs
+            return fval
+
+    rslt = _spg_optim(func, grad, X, _project_correlation_factors, ctol=ctol,
+                      lam_min=lam_min, lam_max=lam_max, maxiter=maxiter)
+    root = rslt.params
+    diag = 1 - (root**2).sum(1)
+    soln = FactoredPSDMatrix(diag, root)
+    rslt.corr = soln
+    del rslt.params
+    return rslt


 def cov_nearest_factor_homog(cov, rank):
@@ -519,11 +741,37 @@ def cov_nearest_factor_homog(cov, rank):
     >>> cov = cov * (np.abs(cov) >= 0.3)
     >>> rslt = cov_nearest_factor_homog(cov, 3)
     """
-    pass

+    m, n = cov.shape

-def corr_thresholded(data, minabs=None, max_elt=10000000.0):
-    """
+    Q, Lambda, _ = svds(cov, rank)
+
+    if sparse.issparse(cov):
+        QSQ = np.dot(Q.T, cov.dot(Q))
+        ts = cov.diagonal().sum()
+        tss = cov.dot(cov).diagonal().sum()
+    else:
+        QSQ = np.dot(Q.T, np.dot(cov, Q))
+        ts = np.trace(cov)
+        tss = np.trace(np.dot(cov, cov))
+
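+    # Squared Frobenius distance between cov and k*I + Q*diag(Lambda - k)*Q',
+    # expanded so it can be minimized over the scalar diagonal value k.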
+    def fun(k):
+        Lambda_t = Lambda - k
+        v = tss + m*(k**2) + np.sum(Lambda_t**2) - 2*k*ts
+        v += 2*k*np.sum(Lambda_t) - 2*np.sum(np.diag(QSQ) * Lambda_t)
+        return v
+
+    # Get the optimal decomposition
+    k_opt = fminbound(fun, 0, 1e5)
+    Lambda_opt = Lambda - k_opt
+    fac_opt = Q * np.sqrt(Lambda_opt)
+
+    diag = k_opt * np.ones(m, dtype=np.float64)  # - (fac_opt**2).sum(1)
+    return FactoredPSDMatrix(diag, fac_opt)
+
+
+def corr_thresholded(data, minabs=None, max_elt=1e7):
+    r"""
     Construct a sparse matrix containing the thresholded row-wise
     correlation matrix from a data array.

@@ -544,7 +792,7 @@ def corr_thresholded(data, minabs=None, max_elt=10000000.0):

     Notes
     -----
-    This is an alternative to C = np.corrcoef(data); C \\*= (np.abs(C)
+    This is an alternative to C = np.corrcoef(data); C \*= (np.abs(C)
     >= absmin), suitable for very tall data matrices.

     If the data are jointly Gaussian, the marginal sampling
@@ -574,7 +822,44 @@ def corr_thresholded(data, minabs=None, max_elt=10000000.0):
     >>> x = np.random.randn(100,1).dot(b.T) + np.random.randn(100,10)
     >>> cmat = corr_thresholded(x, 0.3)
     """
-    pass
+
+    nrow, ncol = data.shape
+
+    if minabs is None:
+        minabs = 1. / float(ncol)
+
+    # Row-standardize the data
+    data = data.copy()
+    data -= data.mean(1)[:, None]
+    sd = data.std(1, ddof=1)
+    ii = np.flatnonzero(sd > 1e-5)
+    data[ii, :] /= sd[ii][:, None]
+    ii = np.flatnonzero(sd <= 1e-5)
+    data[ii, :] = 0
+
+    # Number of rows to process in one pass
+    bs = int(np.floor(max_elt / nrow))
+
+    ipos_all, jpos_all, cor_values = [], [], []
+
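+    # Form the correlation matrix in horizontal blocks of rows, keeping only
+    # entries with |r| >= minabs so that the result stays sparse.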
+    ir = 0
+    while ir < nrow:
+        ir2 = min(data.shape[0], ir + bs)
+        cm = np.dot(data[ir:ir2, :], data.T) / (ncol - 1)
+        cma = np.abs(cm)
+        ipos, jpos = np.nonzero(cma >= minabs)
+        ipos_all.append(ipos + ir)
+        jpos_all.append(jpos)
+        cor_values.append(cm[ipos, jpos])
+        ir += bs
+
+    ipos = np.concatenate(ipos_all)
+    jpos = np.concatenate(jpos_all)
+    cor_values = np.concatenate(cor_values)
+
+    cmat = sparse.coo_matrix((cor_values, (ipos, jpos)), (nrow, nrow))
+
+    return cmat


 class MultivariateKernel:
@@ -586,6 +871,9 @@ class MultivariateKernel:
     (a 1d ndarray) to each row of `loc` (a 2d ndarray).
     """

+    def call(self, x, loc):
+        raise NotImplementedError
+
     def set_bandwidth(self, bw):
         """
         Set the bandwidth to the given vector.
@@ -595,7 +883,15 @@ class MultivariateKernel:
         bw : array_like
             A vector of non-negative bandwidth values.
         """
-        pass
+
+        self.bw = bw
+        self._setup()
+
+    def _setup(self):
+
+        # Precompute the squared bandwidth values.
+        self.bwk = np.prod(self.bw)
+        self.bw2 = self.bw * self.bw

     def set_default_bw(self, loc, bwm=None):
         """
@@ -610,7 +906,20 @@ class MultivariateKernel:
             A non-negative scalar that is used to multiply
             the default bandwidth.
         """
-        pass
+
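+        # Silverman-type rule of thumb: 0.9 * min(std, IQR/1.349) * n**(-1/5)
+        # for each coordinate, optionally rescaled by bwm.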
+        sd = loc.std(0)
+        q25, q75 = np.percentile(loc, [25, 75], axis=0)
+        iqr = (q75 - q25) / 1.349
+        bw = np.where(iqr < sd, iqr, sd)
+        bw *= 0.9 / loc.shape[0] ** 0.2
+
+        if bwm is not None:
+            bw *= bwm
+
+        # The final bandwidths
+        self.bw = np.asarray(bw, dtype=np.float64)
+
+        self._setup()


 class GaussianMultivariateKernel(MultivariateKernel):
@@ -618,6 +927,9 @@ class GaussianMultivariateKernel(MultivariateKernel):
     The Gaussian (squared exponential) multivariate kernel.
     """

+    def call(self, x, loc):
+        return np.exp(-(x - loc)**2 / (2 * self.bw2)).sum(1) / self.bwk
+

 def kernel_covariance(exog, loc, groups, kernel=None, bw=None):
     """
@@ -664,4 +976,64 @@ def kernel_covariance(exog, loc, groups, kernel=None, bw=None):
         multivariate geostatics.  Statistical Science 30(2).
         https://arxiv.org/pdf/1507.08017.pdf
     """
-    pass
+
+    exog = np.asarray(exog)
+    loc = np.asarray(loc)
+    groups = np.asarray(groups)
+
+    if loc.ndim == 1:
+        loc = loc[:, None]
+
+    v = [exog.shape[0], loc.shape[0], len(groups)]
+    if min(v) != max(v):
+        msg = "exog, loc, and groups must have the same number of rows"
+        raise ValueError(msg)
+
+    # Map from group labels to the row indices in each group.
+    ix = {}
+    for i, g in enumerate(groups):
+        if g not in ix:
+            ix[g] = []
+        ix[g].append(i)
+    for g in ix.keys():
+        ix[g] = np.sort(ix[g])
+
+    if kernel is None:
+        kernel = GaussianMultivariateKernel()
+
+    if bw is None:
+        kernel.set_default_bw(loc)
+    elif np.isscalar(bw):
+        kernel.set_default_bw(loc, bwm=bw)
+    else:
+        kernel.set_bandwidth(bw)
+
+    def cov(x, y):
+
+        kx = kernel.call(x, loc)
+        ky = kernel.call(y, loc)
+
+        cm, cw = 0., 0.
+
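+        # Accumulate kernel-weighted sums of outer products of exog rows
+        # within each group; cw tracks the total kernel weight used to
+        # normalize the estimate.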
+        for g, ii in ix.items():
+
+            m = len(ii)
+            j1, j2 = np.indices((m, m))
+            j1 = ii[j1.flat]
+            j2 = ii[j2.flat]
+            w = kx[j1] * ky[j2]
+
+            # TODO: some other form of broadcasting may be faster than
+            # einsum here
+            cm += np.einsum("ij,ik,i->jk", exog[j1, :], exog[j2, :], w)
+            cw += w.sum()
+
+        if cw < 1e-10:
+            msg = ("Effective sample size is 0.  The bandwidth may be too " +
+                   "small, or you are outside the range of your data.")
+            warnings.warn(msg)
+            return np.nan * np.ones_like(cm)
+
+        return cm / cw
+
+    return cov
diff --git a/statsmodels/stats/descriptivestats.py b/statsmodels/stats/descriptivestats.py
index 1773d6f60..b4366e1b5 100644
--- a/statsmodels/stats/descriptivestats.py
+++ b/statsmodels/stats/descriptivestats.py
@@ -1,22 +1,76 @@
 from statsmodels.compat.pandas import PD_LT_2, Appender, is_numeric_dtype
 from statsmodels.compat.scipy import SP_LT_19
+
 from typing import Sequence, Union
+
 import numpy as np
 import pandas as pd
+
 if PD_LT_2:
     from pandas.core.dtypes.common import is_categorical_dtype
+else:
+    # After pandas 2 is the minimum, use the isinstance check
+    def is_categorical_dtype(dtype):
+        return isinstance(dtype, pd.CategoricalDtype)
+
 from scipy import stats
+
 from statsmodels.iolib.table import SimpleTable
 from statsmodels.stats.stattools import jarque_bera
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.docstring import Docstring, Parameter
-from statsmodels.tools.validation import array_like, bool_like, float_like, int_like
-PERCENTILES = 1, 5, 10, 25, 50, 75, 90, 95, 99
+from statsmodels.tools.validation import (
+    array_like,
+    bool_like,
+    float_like,
+    int_like,
+)
+
+PERCENTILES = (1, 5, 10, 25, 50, 75, 90, 95, 99)
 QUANTILES = np.array(PERCENTILES) / 100.0
-MISSING = {'obs': nancount, 'mean': np.nanmean, 'std': np.nanstd, 'max': np
-    .nanmax, 'min': np.nanmin, 'ptp': nanptp, 'var': np.nanvar, 'skew':
-    nanskewness, 'uss': nanuss, 'kurtosis': nankurtosis, 'percentiles':
-    nanpercentile}
+
+
+def pd_ptp(df):
+    return df.max() - df.min()
+
+
+def nancount(x, axis=0):
+    return (1 - np.isnan(x)).sum(axis=axis)
+
+
+def nanptp(arr, axis=0):
+    return np.nanmax(arr, axis=axis) - np.nanmin(arr, axis=axis)
+
+
+def nanuss(arr, axis=0):
+    return np.nansum(arr ** 2, axis=axis)
+
+
+def nanpercentile(arr, axis=0):
+    return np.nanpercentile(arr, PERCENTILES, axis=axis)
+
+
+def nankurtosis(arr, axis=0):
+    return stats.kurtosis(arr, axis=axis, nan_policy="omit")
+
+
+def nanskewness(arr, axis=0):
+    return stats.skew(arr, axis=axis, nan_policy="omit")
+
+
+MISSING = {
+    "obs": nancount,
+    "mean": np.nanmean,
+    "std": np.nanstd,
+    "max": np.nanmax,
+    "min": np.nanmin,
+    "ptp": nanptp,
+    "var": np.nanvar,
+    "skew": nanskewness,
+    "uss": nanuss,
+    "kurtosis": nankurtosis,
+    "percentiles": nanpercentile,
+}


 def _kurtosis(a):
@@ -25,7 +79,11 @@ def _kurtosis(a):

     missing options
     """
-    pass
+    try:
+        res = stats.kurtosis(a)
+    except ValueError:
+        res = np.nan
+    return res


 def _skew(a):
@@ -34,7 +92,11 @@ def _skew(a):

     missing options
     """
-    pass
+    try:
+        res = stats.skew(a)
+    except ValueError:
+        res = np.nan
+    return res


 def sign_test(samp, mu0=0):
@@ -72,15 +134,44 @@ def sign_test(samp, mu0=0):
     --------
     scipy.stats.wilcoxon
     """
-    pass
-
-
-NUMERIC_STATISTICS = ('nobs', 'missing', 'mean', 'std_err', 'ci', 'std',
-    'iqr', 'iqr_normal', 'mad', 'mad_normal', 'coef_var', 'range', 'max',
-    'min', 'skew', 'kurtosis', 'jarque_bera', 'mode', 'median', 'percentiles')
-CATEGORICAL_STATISTICS = 'nobs', 'missing', 'distinct', 'top', 'freq'
-_additional = [stat for stat in CATEGORICAL_STATISTICS if stat not in
-    NUMERIC_STATISTICS]
+    samp = np.asarray(samp)
+    pos = np.sum(samp > mu0)
+    neg = np.sum(samp < mu0)
+    M = (pos - neg) / 2.0
+    try:
+        p = stats.binomtest(min(pos, neg), pos + neg, 0.5).pvalue
+    except AttributeError:
+        # Remove after min SciPy >= 1.7
+        p = stats.binom_test(min(pos, neg), pos + neg, 0.5)
+    return M, p
+
+
+NUMERIC_STATISTICS = (
+    "nobs",
+    "missing",
+    "mean",
+    "std_err",
+    "ci",
+    "std",
+    "iqr",
+    "iqr_normal",
+    "mad",
+    "mad_normal",
+    "coef_var",
+    "range",
+    "max",
+    "min",
+    "skew",
+    "kurtosis",
+    "jarque_bera",
+    "mode",
+    "median",
+    "percentiles",
+)
+CATEGORICAL_STATISTICS = ("nobs", "missing", "distinct", "top", "freq")
+_additional = [
+    stat for stat in CATEGORICAL_STATISTICS if stat not in NUMERIC_STATISTICS
+]
 DEFAULT_STATISTICS = NUMERIC_STATISTICS + tuple(_additional)


@@ -166,76 +257,106 @@ class Description:
     * "freq" - The frequency of the common categories. Labeled freq_n for n in 1,
       2, ..., ``ntop``.
     """
-    _int_fmt = ['nobs', 'missing', 'distinct']
+
+    _int_fmt = ["nobs", "missing", "distinct"]
     numeric_statistics = NUMERIC_STATISTICS
     categorical_statistics = CATEGORICAL_STATISTICS
     default_statistics = DEFAULT_STATISTICS

-    def __init__(self, data: Union[np.ndarray, pd.Series, pd.DataFrame],
-        stats: Sequence[str]=None, *, numeric: bool=True, categorical: bool
-        =True, alpha: float=0.05, use_t: bool=False, percentiles: Sequence[
-        Union[int, float]]=PERCENTILES, ntop: bool=5):
+    def __init__(
+        self,
+        data: Union[np.ndarray, pd.Series, pd.DataFrame],
+        stats: Sequence[str] = None,
+        *,
+        numeric: bool = True,
+        categorical: bool = True,
+        alpha: float = 0.05,
+        use_t: bool = False,
+        percentiles: Sequence[Union[int, float]] = PERCENTILES,
+        ntop: bool = 5,
+    ):
         data_arr = data
         if not isinstance(data, (pd.Series, pd.DataFrame)):
-            data_arr = array_like(data, 'data', maxdim=2)
+            data_arr = array_like(data, "data", maxdim=2)
         if data_arr.ndim == 1:
             data = pd.Series(data)
-        numeric = bool_like(numeric, 'numeric')
-        categorical = bool_like(categorical, 'categorical')
+        numeric = bool_like(numeric, "numeric")
+        categorical = bool_like(categorical, "categorical")
         include = []
-        col_types = ''
+        col_types = ""
         if numeric:
             include.append(np.number)
-            col_types = 'numeric'
+            col_types = "numeric"
         if categorical:
-            include.append('category')
-            col_types += 'and ' if col_types != '' else ''
-            col_types += 'categorical'
+            include.append("category")
+            col_types += "and " if col_types != "" else ""
+            col_types += "categorical"
         if not numeric and not categorical:
             raise ValueError(
-                'At least one of numeric and categorical must be True')
+                "At least one of numeric and categorical must be True"
+            )
         self._data = pd.DataFrame(data).select_dtypes(include)
         if self._data.shape[1] == 0:
+
             raise ValueError(
-                f'Selecting {col_types} results in an empty DataFrame')
+                f"Selecting {col_types} results in an empty DataFrame"
+            )
         self._is_numeric = [is_numeric_dtype(dt) for dt in self._data.dtypes]
-        self._is_cat_like = [is_categorical_dtype(dt) for dt in self._data.
-            dtypes]
+        self._is_cat_like = [
+            is_categorical_dtype(dt) for dt in self._data.dtypes
+        ]
+
         if stats is not None:
             undef = [stat for stat in stats if stat not in DEFAULT_STATISTICS]
             if undef:
-                raise ValueError(f"{', '.join(undef)} are not known statistics"
-                    )
-        self._stats = list(DEFAULT_STATISTICS) if stats is None else list(stats
-            )
-        self._ntop = int_like(ntop, 'ntop')
-        self._compute_top = 'top' in self._stats
-        self._compute_freq = 'freq' in self._stats
+                raise ValueError(
+                    f"{', '.join(undef)} are not known statistics"
+                )
+        self._stats = (
+            list(DEFAULT_STATISTICS) if stats is None else list(stats)
+        )
+        self._ntop = int_like(ntop, "ntop")
+        self._compute_top = "top" in self._stats
+        self._compute_freq = "freq" in self._stats
         if self._compute_top and self._ntop <= 0 < sum(self._is_cat_like):
-            raise ValueError('top must be a non-negative integer')
-        replacements = {'mode': ['mode', 'mode_freq'], 'ci': ['upper_ci',
-            'lower_ci'], 'jarque_bera': ['jarque_bera', 'jarque_bera_pval'],
-            'top': [f'top_{i}' for i in range(1, self._ntop + 1)], 'freq':
-            [f'freq_{i}' for i in range(1, self._ntop + 1)]}
+            raise ValueError("top must be a non-negative integer")
+
+        # Expand special stats
+        replacements = {
+            "mode": ["mode", "mode_freq"],
+            "ci": ["upper_ci", "lower_ci"],
+            "jarque_bera": ["jarque_bera", "jarque_bera_pval"],
+            "top": [f"top_{i}" for i in range(1, self._ntop + 1)],
+            "freq": [f"freq_{i}" for i in range(1, self._ntop + 1)],
+        }
+
         for key in replacements:
             if key in self._stats:
                 idx = self._stats.index(key)
-                self._stats = self._stats[:idx] + replacements[key
-                    ] + self._stats[idx + 1:]
-        self._percentiles = array_like(percentiles, 'percentiles', maxdim=1,
-            dtype='d')
+                self._stats = (
+                    self._stats[:idx]
+                    + replacements[key]
+                    + self._stats[idx + 1 :]
+                )
+
+        self._percentiles = array_like(
+            percentiles, "percentiles", maxdim=1, dtype="d"
+        )
         self._percentiles = np.sort(self._percentiles)
         if np.unique(self._percentiles).shape[0] != self._percentiles.shape[0]:
-            raise ValueError('percentiles must be distinct')
+            raise ValueError("percentiles must be distinct")
         if np.any(self._percentiles >= 100) or np.any(self._percentiles <= 0):
-            raise ValueError('percentiles must be strictly between 0 and 100')
-        self._alpha = float_like(alpha, 'alpha')
+            raise ValueError("percentiles must be strictly between 0 and 100")
+        self._alpha = float_like(alpha, "alpha")
         if not 0 < alpha < 1:
-            raise ValueError('alpha must be strictly between 0 and 1')
-        self._use_t = bool_like(use_t, 'use_t')
+            raise ValueError("alpha must be strictly between 0 and 1")
+        self._use_t = bool_like(use_t, "use_t")
+
+    def _reorder(self, df: pd.DataFrame) -> pd.DataFrame:
+        return df.loc[[s for s in self._stats if s in df.index]]

     @cache_readonly
-    def frame(self) ->pd.DataFrame:
+    def frame(self) -> pd.DataFrame:
         """
         Descriptive statistics for both numeric and categorical data

@@ -244,10 +365,17 @@ class Description:
         DataFrame
             The statistics
         """
-        pass
+        numeric = self.numeric
+        categorical = self.categorical
+        if categorical.shape[1] == 0:
+            return numeric
+        elif numeric.shape[1] == 0:
+            return categorical
+        df = pd.concat([numeric, categorical], axis=1)
+        return self._reorder(df[self._data.columns])

     @cache_readonly
-    def numeric(self) ->pd.DataFrame:
+    def numeric(self) -> pd.DataFrame:
         """
         Descriptive statistics for numeric data

@@ -256,10 +384,145 @@ class Description:
         DataFrame
             The statistics of the numeric columns
         """
-        pass
+        df: pd.DataFrame = self._data.loc[:, self._is_numeric]
+        cols = df.columns
+        _, k = df.shape
+        std = df.std()
+        count = df.count()
+        mean = df.mean()
+        mad = (df - mean).abs().mean()
+        std_err = std.copy()
+        std_err.loc[count > 0] /= count.loc[count > 0] ** 0.5
+        if self._use_t:
+            q = stats.t(count - 1).ppf(1.0 - self._alpha / 2)
+        else:
+            q = stats.norm.ppf(1.0 - self._alpha / 2)
+
+        def _mode(ser):
+            dtype = (ser.dtype if isinstance(ser.dtype, np.dtype)
+                     else ser.dtype.numpy_dtype)
+            ser_no_missing = ser.dropna().to_numpy(dtype=dtype)
+            kwargs = {} if SP_LT_19 else {"keepdims": True}
+            mode_res = stats.mode(ser_no_missing, **kwargs)
+            # Changes in SciPy 1.10
+            if np.isscalar(mode_res[0]):
+                return float(mode_res[0]), mode_res[1]
+            if mode_res[0].shape[0] > 0:
+                return [float(val) for val in mode_res]
+            return np.nan, np.nan
+
+        mode_values = df.apply(_mode).T
+        if mode_values.size > 0:
+            if isinstance(mode_values, pd.DataFrame):
+                # pandas 1.0 or later
+                mode = np.asarray(mode_values[0], dtype=float)
+                mode_counts = np.asarray(mode_values[1], dtype=np.int64)
+            else:
+                # pandas before 1.0 returns a Series of 2-elem list
+                mode = []
+                mode_counts = []
+                for idx in mode_values.index:
+                    val = mode_values.loc[idx]
+                    mode.append(val[0])
+                    mode_counts.append(val[1])
+                mode = np.atleast_1d(mode)
+                mode_counts = np.atleast_1d(mode_counts)
+        else:
+            mode = mode_counts = np.empty(0)
+        loc = count > 0
+        mode_freq = np.full(mode.shape[0], np.nan)
+        mode_freq[loc] = mode_counts[loc] / count.loc[loc]
+        # TODO: Workaround for pandas AbstractMethodError in extension
+        #  types. Remove when quantile is supported for these
+        _df = df
+        try:
+            from pandas.api.types import is_extension_array_dtype
+            _df = df.copy()
+            for col in df:
+                if is_extension_array_dtype(df[col].dtype):
+                    if _df[col].isnull().any():
+                        _df[col] = _df[col].fillna(np.nan)
+        except ImportError:
+            pass
+
+        if df.shape[1] > 0:
+            iqr = _df.quantile(0.75) - _df.quantile(0.25)
+        else:
+            iqr = mean
+
+        def _safe_jarque_bera(c):
+            a = np.asarray(c)
+            if a.shape[0] < 2:
+                return (np.nan,) * 4
+            return jarque_bera(a)
+
+        jb = df.apply(
+            lambda x: list(_safe_jarque_bera(x.dropna())), result_type="expand"
+        ).T
+        nan_mean = mean.copy()
+        nan_mean.loc[nan_mean == 0] = np.nan
+        coef_var = std / nan_mean
+
+        results = {
+            "nobs": pd.Series(
+                np.ones(k, dtype=np.int64) * df.shape[0], index=cols
+            ),
+            "missing": df.shape[0] - count,
+            "mean": mean,
+            "std_err": std_err,
+            "upper_ci": mean + q * std_err,
+            "lower_ci": mean - q * std_err,
+            "std": std,
+            "iqr": iqr,
+            "mad": mad,
+            "coef_var": coef_var,
+            "range": pd_ptp(df),
+            "max": df.max(),
+            "min": df.min(),
+            "skew": jb[2],
+            "kurtosis": jb[3],
+            "iqr_normal": iqr / np.diff(stats.norm.ppf([0.25, 0.75])),
+            "mad_normal": mad / np.sqrt(2 / np.pi),
+            "jarque_bera": jb[0],
+            "jarque_bera_pval": jb[1],
+            "mode": pd.Series(mode, index=cols),
+            "mode_freq": pd.Series(mode_freq, index=cols),
+            "median": df.median(),
+        }
+        final = {k: v for k, v in results.items() if k in self._stats}
+        results_df = pd.DataFrame(
+            list(final.values()), columns=cols, index=list(final.keys())
+        )
+        if "percentiles" not in self._stats:
+            return results_df
+        # Pandas before 1.0 cannot handle empty DF
+        if df.shape[1] > 0:
+            # TODO: Remove when extension types support quantile
+            perc = _df.quantile(self._percentiles / 100).astype(float)
+        else:
+            perc = pd.DataFrame(index=self._percentiles / 100, dtype=float)
+        if np.all(np.floor(100 * perc.index) == (100 * perc.index)):
+            perc.index = [f"{int(100 * idx)}%" for idx in perc.index]
+        else:
+            dupe = True
+            scale = 100
+            index = perc.index
+            while dupe:
+                scale *= 10
+                idx = np.floor(scale * perc.index)
+                if np.all(np.diff(idx) > 0):
+                    dupe = False
+            index = np.floor(scale * index) / (scale / 100)
+            fmt = f"0.{len(str(scale//100))-1}f"
+            output = f"{{0:{fmt}}}%"
+            perc.index = [output.format(val) for val in index]
+
+        # Add in the names of the percentiles to the output
+        self._stats = self._stats + perc.index.tolist()
+
+        return self._reorder(pd.concat([results_df, perc], axis=0))

     @cache_readonly
-    def categorical(self) ->pd.DataFrame:
+    def categorical(self) -> pd.DataFrame:
         """
         Descriptive statistics for categorical data

@@ -268,9 +531,55 @@ class Description:
         DataFrame
             The statistics of the categorical columns
         """
-        pass

-    def summary(self) ->SimpleTable:
+        df = self._data.loc[:, [col for col in self._is_cat_like]]
+        k = df.shape[1]
+        cols = df.columns
+        vc = {col: df[col].value_counts(normalize=True) for col in df}
+        distinct = pd.Series(
+            {col: vc[col].shape[0] for col in vc}, dtype=np.int64
+        )
+        top = {}
+        freq = {}
+        for col in vc:
+            single = vc[col]
+            if single.shape[0] >= self._ntop:
+                top[col] = single.index[: self._ntop]
+                freq[col] = np.asarray(single.iloc[:5])
+            else:
+                val = list(single.index)
+                val += [None] * (self._ntop - len(val))
+                top[col] = val
+                freq_val = list(single)
+                freq_val += [np.nan] * (self._ntop - len(freq_val))
+                freq[col] = np.asarray(freq_val)
+        index = [f"top_{i}" for i in range(1, self._ntop + 1)]
+        top_df = pd.DataFrame(top, dtype="object", index=index, columns=cols)
+        index = [f"freq_{i}" for i in range(1, self._ntop + 1)]
+        freq_df = pd.DataFrame(freq, dtype="object", index=index, columns=cols)
+
+        results = {
+            "nobs": pd.Series(
+                np.ones(k, dtype=np.int64) * df.shape[0], index=cols
+            ),
+            "missing": df.shape[0] - df.count(),
+            "distinct": distinct,
+        }
+        final = {k: v for k, v in results.items() if k in self._stats}
+        results_df = pd.DataFrame(
+            list(final.values()),
+            columns=cols,
+            index=list(final.keys()),
+            dtype="object",
+        )
+        if self._compute_top:
+            results_df = pd.concat([results_df, top_df], axis=0)
+        if self._compute_freq:
+            results_df = pd.concat([results_df, freq_df], axis=0)
+
+        return self._reorder(results_df)
+
+    def summary(self) -> SimpleTable:
         """
         Summary table of the descriptive statistics

@@ -279,19 +588,77 @@ class Description:
         SimpleTable
             A table instance supporting export to text, csv and LaTeX
         """
-        pass
-
-    def __str__(self) ->str:
+        df = self.frame.astype(object)
+        if df.isnull().any().any():
+            df = df.fillna("")
+        cols = [str(col) for col in df.columns]
+        stubs = [str(idx) for idx in df.index]
+        data = []
+        for _, row in df.iterrows():
+            data.append([v for v in row])
+
+        def _formatter(v):
+            if isinstance(v, str):
+                return v
+            elif v // 1 == v:
+                return str(int(v))
+            return f"{v:0.4g}"
+
+        return SimpleTable(
+            data,
+            header=cols,
+            stubs=stubs,
+            title="Descriptive Statistics",
+            txt_fmt={"data_fmts": {0: "%s", 1: _formatter}},
+            datatypes=[1] * len(data),
+        )
+
+    def __str__(self) -> str:
         return str(self.summary().as_text())


 ds = Docstring(Description.__doc__)
-ds.replace_block('Returns', Parameter(None, 'DataFrame', [
-    'Descriptive statistics']))
-ds.replace_block('Attributes', [])
-ds.replace_block('See Also', [([('pandas.DataFrame.describe', None)], [
-    'Basic descriptive statistics']), ([('Description', None)], [
-    'Descriptive statistics class with additional output options'])])
+ds.replace_block(
+    "Returns", Parameter(None, "DataFrame", ["Descriptive statistics"])
+)
+ds.replace_block("Attributes", [])
+ds.replace_block(
+    "See Also",
+    [
+        (
+            [("pandas.DataFrame.describe", None)],
+            ["Basic descriptive statistics"],
+        ),
+        (
+            [("Description", None)],
+            ["Descriptive statistics class with additional output options"],
+        ),
+    ],
+)
+
+
+@Appender(str(ds))
+def describe(
+    data: Union[np.ndarray, pd.Series, pd.DataFrame],
+    stats: Sequence[str] = None,
+    *,
+    numeric: bool = True,
+    categorical: bool = True,
+    alpha: float = 0.05,
+    use_t: bool = False,
+    percentiles: Sequence[Union[int, float]] = PERCENTILES,
+    ntop: bool = 5,
+) -> pd.DataFrame:
+    return Description(
+        data,
+        stats,
+        numeric=numeric,
+        categorical=categorical,
+        alpha=alpha,
+        use_t=use_t,
+        percentiles=percentiles,
+        ntop=ntop,
+    ).frame


 class Describe:
@@ -300,4 +667,4 @@ class Describe:
     """

     def __init__(self, dataset):
-        raise NotImplementedError('Describe has been removed')
+        raise NotImplementedError("Describe has been removed")
diff --git a/statsmodels/stats/diagnostic.py b/statsmodels/stats/diagnostic.py
index e6b04244f..07c1c4ecd 100644
--- a/statsmodels/stats/diagnostic.py
+++ b/statsmodels/stats/diagnostic.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Various Statistical Tests

@@ -23,22 +24,41 @@ missing:
   - specification tests against nonparametric alternatives
 """
 from statsmodels.compat.pandas import deprecate_kwarg
+
 from collections.abc import Iterable
+
 import numpy as np
 import pandas as pd
 from scipy import stats
+
 from statsmodels.regression.linear_model import OLS, RegressionResultsWrapper
 from statsmodels.stats._adnorm import anderson_statistic, normal_ad
-from statsmodels.stats._lilliefors import kstest_exponential, kstest_fit, kstest_normal, lilliefors
-from statsmodels.tools.validation import array_like, bool_like, dict_like, float_like, int_like, string_like
+from statsmodels.stats._lilliefors import (
+    kstest_exponential,
+    kstest_fit,
+    kstest_normal,
+    lilliefors,
+)
+from statsmodels.tools.validation import (
+    array_like,
+    bool_like,
+    dict_like,
+    float_like,
+    int_like,
+    string_like,
+)
 from statsmodels.tsa.tsatools import lagmat
-__all__ = ['kstest_fit', 'lilliefors', 'kstest_normal',
-    'kstest_exponential', 'normal_ad', 'compare_cox', 'compare_j',
-    'acorr_breusch_godfrey', 'acorr_ljungbox', 'acorr_lm', 'het_arch',
-    'het_breuschpagan', 'het_goldfeldquandt', 'het_white', 'spec_white',
-    'linear_lm', 'linear_rainbow', 'linear_harvey_collier',
-    'anderson_statistic']
-NESTED_ERROR = """The exog in results_x and in results_z are nested. {test} requires that models are non-nested.
+
+__all__ = ["kstest_fit", "lilliefors", "kstest_normal", "kstest_exponential",
+           "normal_ad", "compare_cox", "compare_j", "acorr_breusch_godfrey",
+           "acorr_ljungbox", "acorr_lm", "het_arch", "het_breuschpagan",
+           "het_goldfeldquandt", "het_white", "spec_white", "linear_lm",
+           "linear_rainbow", "linear_harvey_collier", "anderson_statistic"]
+
+
+NESTED_ERROR = """\
+The exog in results_x and in results_z are nested. {test} requires \
+that models are non-nested.
 """


@@ -58,11 +78,34 @@ def _check_nested_exog(small, large):
     bool
         True if small is nested by large
     """
-    pass

+    if small.shape[1] > large.shape[1]:
+        return False
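+    # If small is spanned by large, the least squares residuals are
+    # (numerically) zero, so appending them to large does not increase
+    # the matrix rank.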
+    coef = np.linalg.lstsq(large, small, rcond=None)[0]
+    err = small - large @ coef
+    return np.linalg.matrix_rank(np.c_[large, err]) == large.shape[1]

-class ResultsStore:

+def _check_nested_results(results_x, results_z):
+    if not isinstance(results_x, RegressionResultsWrapper):
+        raise TypeError("results_x must come from a linear regression model")
+    if not isinstance(results_z, RegressionResultsWrapper):
+        raise TypeError("results_z must come from a linear regression model")
+    if not np.allclose(results_x.model.endog, results_z.model.endog):
+        raise ValueError("endogenous variables in models are not the same")
+
+    x = results_x.model.exog
+    z = results_z.model.exog
+
+    nested = False
+    if x.shape[1] <= z.shape[1]:
+        nested = nested or _check_nested_exog(x, z)
+    else:
+        nested = nested or _check_nested_exog(z, x)
+    return nested
+
+
+class ResultsStore:
     def __str__(self):
         return getattr(self, '_str', self.__class__.__name__)

@@ -105,7 +148,37 @@ def compare_cox(results_x, results_z, store=False):
     .. [1] Greene, W. H. Econometric Analysis. New Jersey. Prentice Hall;
        5th edition. (2002).
     """
-    pass
+    if _check_nested_results(results_x, results_z):
+        raise ValueError(NESTED_ERROR.format(test="Cox comparison"))
+    x = results_x.model.exog
+    z = results_z.model.exog
+    nobs = results_x.model.endog.shape[0]
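+    # Cox statistic: c01 compares the log residual variance of model z with
+    # that of an artificial regression built from the fitted values of model
+    # x; v01 estimates its variance, so q is asymptotically standard normal.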
+    sigma2_x = results_x.ssr / nobs
+    sigma2_z = results_z.ssr / nobs
+    yhat_x = results_x.fittedvalues
+    res_dx = OLS(yhat_x, z).fit()
+    err_zx = res_dx.resid
+    res_xzx = OLS(err_zx, x).fit()
+    err_xzx = res_xzx.resid
+
+    sigma2_zx = sigma2_x + np.dot(err_zx.T, err_zx) / nobs
+    c01 = nobs / 2. * (np.log(sigma2_z) - np.log(sigma2_zx))
+    v01 = sigma2_x * np.dot(err_xzx.T, err_xzx) / sigma2_zx ** 2
+    q = c01 / np.sqrt(v01)
+    pval = 2 * stats.norm.sf(np.abs(q))
+
+    if store:
+        res = ResultsStore()
+        res.res_dx = res_dx
+        res.res_xzx = res_xzx
+        res.c01 = c01
+        res.v01 = v01
+        res.q = q
+        res.pvalue = pval
+        res.dist = stats.norm
+        return q, pval, res
+
+    return q, pval


 def compare_j(results_x, results_z, store=False):
@@ -145,12 +218,29 @@ def compare_j(results_x, results_z, store=False):
     .. [1] Greene, W. H. Econometric Analysis. New Jersey. Prentice Hall;
        5th edition. (2002).
     """
-    pass
-
-
-def compare_encompassing(results_x, results_z, cov_type='nonrobust',
-    cov_kwargs=None):
-    """
+    # TODO: Allow cov to be specified
+    if _check_nested_results(results_x, results_z):
+        raise ValueError(NESTED_ERROR.format(test="J comparison"))
+    y = results_x.model.endog
+    z = results_z.model.exog
+    yhat_x = results_x.fittedvalues
+    res_zx = OLS(y, np.column_stack((yhat_x, z))).fit()
+    tstat = res_zx.tvalues[0]
+    pval = res_zx.pvalues[0]
+    if store:
+        res = ResultsStore()
+        res.res_zx = res_zx
+        res.dist = stats.t(res_zx.df_resid)
+        res.teststat = tstat
+        res.pvalue = pval
+        return tstat, pval, res
+
+    return tstat, pval
+
+
+def compare_encompassing(results_x, results_z, cov_type="nonrobust",
+                         cov_kwargs=None):
+    r"""
     Davidson-MacKinnon encompassing test for comparing non-nested models

     Parameters
@@ -191,10 +281,10 @@ def compare_encompassing(results_x, results_z, cov_type='nonrobust',

     .. math::

-        Y = X\\beta + Z_1\\gamma + \\epsilon
+        Y = X\beta + Z_1\gamma + \epsilon

     where :math:`Z_1` are the columns of :math:`Z` that are not spanned by
-    :math:`X`. The null is :math:`H_0:\\gamma=0`. When testing whether z is
+    :math:`X`. The null is :math:`H_0:\gamma=0`. When testing whether z is
     encompassed, the roles of :math:`X` and :math:`Z` are reversed.

     Implementation of  Davidson and MacKinnon (1993)'s encompassing test.
@@ -202,11 +292,41 @@ def compare_encompassing(results_x, results_z, cov_type='nonrobust',
     that nests the two. The Wald tests are performed by using an OLS
     regression.
     """
-    pass
+    if _check_nested_results(results_x, results_z):
+        raise ValueError(NESTED_ERROR.format(test="Testing encompassing"))
+
+    y = results_x.model.endog
+    x = results_x.model.exog
+    z = results_z.model.exog
+
+    def _test_nested(endog, a, b, cov_est, cov_kwds):
+        err = b - a @ np.linalg.lstsq(a, b, rcond=None)[0]
+        u, s, v = np.linalg.svd(err)
+        eps = np.finfo(np.double).eps
+        tol = s.max(axis=-1, keepdims=True) * max(err.shape) * eps
+        non_zero = np.abs(s) > tol
+        aug = err @ v[:, non_zero]
+        aug_reg = np.hstack([a, aug])
+        k_a = aug.shape[1]
+        k = aug_reg.shape[1]
+
+        res = OLS(endog, aug_reg).fit(cov_type=cov_est, cov_kwds=cov_kwds)
+        r_matrix = np.zeros((k_a, k))
+        r_matrix[:, -k_a:] = np.eye(k_a)
+        test = res.wald_test(r_matrix, use_f=True, scalar=True)
+        stat, pvalue = test.statistic, test.pvalue
+        df_num, df_denom = int(test.df_num), int(test.df_denom)
+        return stat, pvalue, df_num, df_denom
+
+    x_nested = _test_nested(y, x, z, cov_type, cov_kwargs)
+    z_nested = _test_nested(y, z, x, cov_type, cov_kwargs)
+    return pd.DataFrame([x_nested, z_nested],
+                        index=["x", "z"],
+                        columns=["stat", "pvalue", "df_num", "df_denom"])


 def acorr_ljungbox(x, lags=None, boxpierce=False, model_df=0, period=None,
-    return_df=True, auto_lag=False):
+                   return_df=True, auto_lag=False):
     """
     Ljung-Box test of autocorrelation in residuals.

@@ -289,12 +409,80 @@ def acorr_ljungbox(x, lags=None, boxpierce=False, model_df=0, period=None,
            lb_stat     lb_pvalue
     10  214.106992  1.827374e-40
     """
-    pass
-
-
-@deprecate_kwarg('maxlag', 'nlags')
-def acorr_lm(resid, nlags=None, store=False, *, period=None, ddof=0,
-    cov_type='nonrobust', cov_kwargs=None):
+    # Avoid cyclic import
+    from statsmodels.tsa.stattools import acf
+    x = array_like(x, "x")
+    period = int_like(period, "period", optional=True)
+    model_df = int_like(model_df, "model_df", optional=False)
+    if period is not None and period <= 1:
+        raise ValueError("period must be >= 2")
+    if model_df < 0:
+        raise ValueError("model_df must be >= 0")
+    nobs = x.shape[0]
+    if auto_lag:
+        maxlag = nobs - 1
+
+        # Compute sum of squared autocorrelations
+        sacf = acf(x, nlags=maxlag, fft=False)
+
+        if not boxpierce:
+            q_sacf = (nobs * (nobs + 2) *
+                      np.cumsum(sacf[1:maxlag + 1] ** 2
+                                / (nobs - np.arange(1, maxlag + 1))))
+        else:
+            q_sacf = nobs * np.cumsum(sacf[1:maxlag + 1] ** 2)
+
+        # obtain thresholds
+        q = 2.4
+        threshold = np.sqrt(q * np.log(nobs))
+        threshold_metric = np.abs(sacf).max() * np.sqrt(nobs)
+
+        # compute penalized sum of squared autocorrelations
+        if (threshold_metric <= threshold):
+            q_sacf = q_sacf - (np.arange(1, nobs) * np.log(nobs))
+        else:
+            q_sacf = q_sacf - (2 * np.arange(1, nobs))
+
+        # note: np.argmax returns first (i.e., smallest) index of largest value
+        lags = np.argmax(q_sacf)
+        lags = max(1, lags)  # optimal lag has to be at least 1
+        lags = int_like(lags, "lags")
+        lags = np.arange(1, lags + 1)
+    elif period is not None:
+        lags = np.arange(1, min(nobs // 5, 2 * period) + 1, dtype=int)
+    elif lags is None:
+        lags = np.arange(1, min(nobs // 5, 10) + 1, dtype=int)
+    elif not isinstance(lags, Iterable):
+        lags = int_like(lags, "lags")
+        lags = np.arange(1, lags + 1)
+    lags = array_like(lags, "lags", dtype="int")
+    maxlag = lags.max()
+
+    # normalize by nobs not (nobs-nlags)
+    # SS: unbiased=False is default now
+    sacf = acf(x, nlags=maxlag, fft=False)
+    sacf2 = sacf[1:maxlag + 1] ** 2 / (nobs - np.arange(1, maxlag + 1))
+    qljungbox = nobs * (nobs + 2) * np.cumsum(sacf2)[lags - 1]
+    adj_lags = lags - model_df
+    pval = np.full_like(qljungbox, np.nan)
+    loc = adj_lags > 0
+    pval[loc] = stats.chi2.sf(qljungbox[loc], adj_lags[loc])
+
+    if not boxpierce:
+        return pd.DataFrame({"lb_stat": qljungbox, "lb_pvalue": pval},
+                            index=lags)
+
+    qboxpierce = nobs * np.cumsum(sacf[1:maxlag + 1] ** 2)[lags - 1]
+    pvalbp = np.full_like(qljungbox, np.nan)
+    pvalbp[loc] = stats.chi2.sf(qboxpierce[loc], adj_lags[loc])
+    return pd.DataFrame({"lb_stat": qljungbox, "lb_pvalue": pval,
+                         "bp_stat": qboxpierce, "bp_pvalue": pvalbp},
+                        index=lags)
+
+
+@deprecate_kwarg("maxlag", "nlags")
+def acorr_lm(resid, nlags=None, store=False, *, period=None,
+             ddof=0, cov_type="nonrobust", cov_kwargs=None):
     """
     Lagrange Multiplier tests for autocorrelation.

@@ -356,10 +544,48 @@ def acorr_lm(resid, nlags=None, store=False, *, period=None, ddof=0,
     R-squared from a regression of the residual on nlags lags of the
     residual.
     """
-    pass
-
-
-@deprecate_kwarg('maxlag', 'nlags')
+    resid = array_like(resid, "resid", ndim=1)
+    cov_type = string_like(cov_type, "cov_type")
+    cov_kwargs = {} if cov_kwargs is None else cov_kwargs
+    cov_kwargs = dict_like(cov_kwargs, "cov_kwargs")
+    nobs = resid.shape[0]
+    if period is not None and nlags is None:
+        maxlag = min(nobs // 5, 2 * period)
+    elif nlags is None:
+        maxlag = min(10, nobs // 5)
+    else:
+        maxlag = nlags
+
+    xdall = lagmat(resid[:, None], maxlag, trim="both")
+    nobs = xdall.shape[0]
+    xdall = np.c_[np.ones((nobs, 1)), xdall]
+    xshort = resid[-nobs:]
+    res_store = ResultsStore()
+    usedlag = maxlag
+
+    resols = OLS(xshort, xdall[:, :usedlag + 1]).fit(cov_type=cov_type,
+                                                     cov_kwargs=cov_kwargs)
+    fval = float(resols.fvalue)
+    fpval = float(resols.f_pvalue)
+    if cov_type == "nonrobust":
+        lm = (nobs - ddof) * resols.rsquared
+        lmpval = stats.chi2.sf(lm, usedlag)
+        # Note: deg of freedom for LM test: nvars - constant = lags used
+    else:
+        r_matrix = np.hstack((np.zeros((usedlag, 1)), np.eye(usedlag)))
+        test_stat = resols.wald_test(r_matrix, use_f=False, scalar=True)
+        lm = float(test_stat.statistic)
+        lmpval = float(test_stat.pvalue)
+
+    if store:
+        res_store.resols = resols
+        res_store.usedlag = usedlag
+        return lm, lmpval, fval, fpval, res_store
+    else:
+        return lm, lmpval, fval, fpval
+
+
+@deprecate_kwarg("maxlag", "nlags")
 def het_arch(resid, nlags=None, store=False, ddof=0):
     """
     Engle's Test for Autoregressive Conditional Heteroscedasticity (ARCH).
@@ -396,10 +622,10 @@ def het_arch(resid, nlags=None, store=False, ddof=0):
     -----
     verified against R:FinTS::ArchTest
     """
-    pass
+    return acorr_lm(resid ** 2, nlags=nlags, store=store, ddof=ddof)


-@deprecate_kwarg('results', 'res')
+@deprecate_kwarg("results", "res")
 def acorr_breusch_godfrey(res, nlags=None, store=False):
     """
     Breusch-Godfrey Lagrange Multiplier tests for residual autocorrelation.
@@ -441,10 +667,48 @@ def acorr_breusch_godfrey(res, nlags=None, store=False):
     .. [1] Greene, W. H. Econometric Analysis. New Jersey. Prentice Hall;
       5th edition. (2002).
     """
-    pass

-
-def _check_het_test(x: np.ndarray, test_name: str) ->None:
+    x = np.asarray(res.resid).squeeze()
+    if x.ndim != 1:
+        raise ValueError("Model resid must be a 1d array. Cannot be used on"
+                         " multivariate models.")
+    exog_old = res.model.exog
+    nobs = x.shape[0]
+    if nlags is None:
+        nlags = min(10, nobs // 5)
+
+    x = np.concatenate((np.zeros(nlags), x))
+
+    xdall = lagmat(x[:, None], nlags, trim="both")
+    nobs = xdall.shape[0]
+    xdall = np.c_[np.ones((nobs, 1)), xdall]
+    xshort = x[-nobs:]
+    if exog_old is None:
+        exog = xdall
+    else:
+        exog = np.column_stack((exog_old, xdall))
+    k_vars = exog.shape[1]
+
+    resols = OLS(xshort, exog).fit()
+    ft = resols.f_test(np.eye(nlags, k_vars, k_vars - nlags))
+    fval = ft.fvalue
+    fpval = ft.pvalue
+    fval = float(np.squeeze(fval))
+    fpval = float(np.squeeze(fpval))
+    lm = nobs * resols.rsquared
+    lmpval = stats.chi2.sf(lm, nlags)
+    # Note: degrees of freedom for LM test is nvars minus constant = usedlags
+
+    if store:
+        res_store = ResultsStore()
+        res_store.resols = resols
+        res_store.usedlag = nlags
+        return lm, lmpval, fval, fpval, res_store
+    else:
+        return lm, lmpval, fval, fpval
+
+
+def _check_het_test(x: np.ndarray, test_name: str) -> None:
     """
     Check validity of the exogenous regressors in a heteroskedasticity test

@@ -455,19 +719,27 @@ def _check_het_test(x: np.ndarray, test_name: str) ->None:
     test_name : str
         The test name for the exception
     """
-    pass
+    x_max = x.max(axis=0)
+    if (
+        not np.any(((x_max - x.min(axis=0)) == 0) & (x_max != 0))
+        or x.shape[1] < 2
+    ):
+        raise ValueError(
+            f"{test_name} test requires exog to have at least "
+            "two columns where one is a constant."
+        )


 def het_breuschpagan(resid, exog_het, robust=True):
-    """
+    r"""
     Breusch-Pagan Lagrange Multiplier test for heteroscedasticity

     Tests the hypothesis that the residual variance does not depend on
     the variables in x in the form

-    .. :math: \\sigma_i = \\sigma * f(\\alpha_0 + \\alpha z_i)
+    .. :math: \sigma_i = \sigma * f(\alpha_0 + \alpha z_i)

-    Homoscedasticity implies that :math:`\\alpha=0`.
+    Homoscedasticity implies that :math:`\alpha=0`.

     Parameters
     ----------
@@ -526,7 +798,18 @@ def het_breuschpagan(resid, exog_het, robust=True):
     .. [3] Koenker, R. (1981). "A note on studentizing a test for
        heteroskedasticity". Journal of Econometrics 17 (1): 107–112.
     """
-    pass
+    x = array_like(exog_het, "exog_het", ndim=2)
+    _check_het_test(x, "The Breusch-Pagan")
+    y = array_like(resid, "resid", ndim=1) ** 2
+    if not robust:
+        y = y / np.mean(y)
+    nobs, nvars = x.shape
+    resols = OLS(y, x).fit()
+    fval = resols.fvalue
+    fpval = resols.f_pvalue
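+    # Koenker's studentized statistic n*R^2 when robust=True, otherwise the
+    # original Breusch-Pagan statistic ESS/2.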
+    lm = nobs * resols.rsquared if robust else resols.ess / 2
+    # Note: degrees of freedom for LM test is nvars minus constant
+    return lm, stats.chi2.sf(lm, nvars - 1), fval, fpval


 def het_white(resid, exog):
@@ -566,11 +849,29 @@ def het_white(resid, exog):
     Greene section 11.4.1 5th edition p. 222. Test statistic reproduces
     Greene 5th, example 11.3.
     """
-    pass
-
-
-def het_goldfeldquandt(y, x, idx=None, split=None, drop=None, alternative=
-    'increasing', store=False):
+    x = array_like(exog, "exog", ndim=2)
+    y = array_like(resid, "resid", ndim=2, shape=(x.shape[0], 1))
+    _check_het_test(x, "White's heteroskedasticity")
+    nobs, nvars0 = x.shape
+    i0, i1 = np.triu_indices(nvars0)
+    exog = x[:, i0] * x[:, i1]
+    nobs, nvars = exog.shape
+    assert nvars == nvars0 * (nvars0 - 1) / 2. + nvars0
+    resols = OLS(y ** 2, exog).fit()
+    fval = resols.fvalue
+    fpval = resols.f_pvalue
+    lm = nobs * resols.rsquared
+    # Note: degrees of freedom for LM test is nvars minus constant
+    # degrees of freedom take possible reduced rank in exog into account
+    # df_model checks the rank to determine df
+    # extra calculation that can be removed:
+    assert resols.df_model == np.linalg.matrix_rank(exog) - 1
+    lmpval = stats.chi2.sf(lm, resols.df_model)
+    return lm, lmpval, fval, fpval
+
+
+def het_goldfeldquandt(y, x, idx=None, split=None, drop=None,
+                       alternative="increasing", store=False):
     """
     Goldfeld-Quandt homoskedasticity test.

@@ -626,13 +927,69 @@ def het_goldfeldquandt(y, x, idx=None, split=None, drop=None, alternative=
     Results are identical to R, but the drop option is defined differently.
     (sorting by idx not tested yet)
     """
-    pass
-
-
-@deprecate_kwarg('result', 'res')
-def linear_reset(res, power=3, test_type='fitted', use_f=False, cov_type=
-    'nonrobust', cov_kwargs=None):
-    """
+    x = np.asarray(x)
+    y = np.asarray(y)  # **2
+    nobs, nvars = x.shape
+    if split is None:
+        split = nobs // 2
+    elif (0 < split) and (split < 1):
+        split = int(nobs * split)
+
+    if drop is None:
+        start2 = split
+    elif (0 < drop) and (drop < 1):
+        start2 = split + int(nobs * drop)
+    else:
+        start2 = split + drop
+
+    if idx is not None:
+        xsortind = np.argsort(x[:, idx])
+        y = y[xsortind]
+        x = x[xsortind, :]
+
+    resols1 = OLS(y[:split], x[:split]).fit()
+    resols2 = OLS(y[start2:], x[start2:]).fit()
+    fval = resols2.mse_resid / resols1.mse_resid
+    # if fval>1:
+    if alternative.lower() in ["i", "inc", "increasing"]:
+        fpval = stats.f.sf(fval, resols1.df_resid, resols2.df_resid)
+        ordering = "increasing"
+    elif alternative.lower() in ["d", "dec", "decreasing"]:
+        fpval = stats.f.sf(1. / fval, resols2.df_resid, resols1.df_resid)
+        ordering = "decreasing"
+    elif alternative.lower() in ["2", "2-sided", "two-sided"]:
+        fpval_sm = stats.f.cdf(fval, resols2.df_resid, resols1.df_resid)
+        fpval_la = stats.f.sf(fval, resols2.df_resid, resols1.df_resid)
+        fpval = 2 * min(fpval_sm, fpval_la)
+        ordering = "two-sided"
+    else:
+        raise ValueError("invalid alternative")
+
+    if store:
+        res = ResultsStore()
+        res.__doc__ = "Test Results for Goldfeld-Quandt test of" \
+                      "heterogeneity"
+        res.fval = fval
+        res.fpval = fpval
+        res.df_fval = (resols2.df_resid, resols1.df_resid)
+        res.resols1 = resols1
+        res.resols2 = resols2
+        res.ordering = ordering
+        res.split = split
+        res._str = """\
+The Goldfeld-Quandt test for the null hypothesis that the variance in the second
+subsample is %s than in the first subsample:
+F-statistic =%8.4f and p-value =%8.4f""" % (ordering, fval, fpval)
+
+        return fval, fpval, ordering, res
+
+    return fval, fpval, ordering
+
+
+@deprecate_kwarg("result", "res")
+def linear_reset(res, power=3, test_type="fitted", use_f=False,
+                 cov_type="nonrobust", cov_kwargs=None):
+    r"""
     Ramsey's RESET test for neglected nonlinearity

     Parameters
@@ -675,23 +1032,74 @@ def linear_reset(res, power=3, test_type='fitted', use_f=False, cov_type=

     .. math::

-       Y = X\\beta + Z\\gamma + \\epsilon
+       Y = X\beta + Z\gamma + \epsilon

     where :math:`Z` are a set of regressors that are one of:

-    * Powers of :math:`X\\hat{\\beta}` from the original regression.
+    * Powers of :math:`X\hat{\beta}` from the original regression.
     * Powers of :math:`X`, excluding the constant and binary regressors.
     * Powers of the first principal component of :math:`X`. If the
       model includes a constant, this column is dropped before computing
       the principal component. In either case, the principal component
       is extracted from the correlation matrix of remaining columns.

-    The test is a Wald test of the null :math:`H_0:\\gamma=0`. If use_f
+    The test is a Wald test of the null :math:`H_0:\gamma=0`. If use_f
     is True, then the quadratic-form test statistic is divided by the
     number of restrictions and the F distribution is used to compute
     the critical value.
     """
-    pass
+    if not isinstance(res, RegressionResultsWrapper):
+        raise TypeError("result must come from a linear regression model")
+    if bool(res.model.k_constant) and res.model.exog.shape[1] == 1:
+        raise ValueError("exog contains only a constant column. The RESET "
+                         "test requires exog to have at least 1 "
+                         "non-constant column.")
+    test_type = string_like(test_type, "test_type",
+                            options=("fitted", "exog", "princomp"))
+    cov_kwargs = dict_like(cov_kwargs, "cov_kwargs", optional=True)
+    use_f = bool_like(use_f, "use_f")
+    if isinstance(power, int):
+        if power < 2:
+            raise ValueError("power must be >= 2")
+        power = np.arange(2, power + 1, dtype=int)
+    else:
+        try:
+            power = np.array(power, dtype=int)
+        except Exception:
+            raise ValueError("power must be an integer or list of integers")
+        if power.ndim != 1 or len(set(power)) != power.shape[0] or \
+                (power < 2).any():
+            raise ValueError("power must contains distinct integers all >= 2")
+    exog = res.model.exog
+    if test_type == "fitted":
+        aug = np.asarray(res.fittedvalues)[:, None]
+    elif test_type == "exog":
+        # Remove constant and binary
+        aug = res.model.exog
+        binary = ((exog == exog.max(axis=0)) | (exog == exog.min(axis=0)))
+        binary = binary.all(axis=0)
+        if binary.all():
+            raise ValueError("Model contains only constant or binary data")
+        aug = aug[:, ~binary]
+    else:
+        from statsmodels.multivariate.pca import PCA
+        aug = exog
+        if res.k_constant:
+            retain = np.arange(aug.shape[1]).tolist()
+            retain.pop(int(res.model.data.const_idx))
+            aug = aug[:, retain]
+        pca = PCA(aug, ncomp=1, standardize=bool(res.k_constant),
+                  demean=bool(res.k_constant), method="nipals")
+        aug = pca.factors[:, :1]
+    aug_exog = np.hstack([exog] + [aug ** p for p in power])
+    mod_class = res.model.__class__
+    mod = mod_class(res.model.data.endog, aug_exog)
+    cov_kwargs = {} if cov_kwargs is None else cov_kwargs
+    res = mod.fit(cov_type=cov_type, cov_kwds=cov_kwargs)
+    nrestr = aug_exog.shape[1] - exog.shape[1]
+    nparams = aug_exog.shape[1]
+    r_mat = np.eye(nrestr, nparams, k=nparams-nrestr)
+    return res.wald_test(r_mat, use_f=use_f, scalar=True)
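
As a usage sketch, linear_reset can be applied to an OLS fit that omits a quadratic term (assuming the statsmodels.stats.diagnostic import path of released statsmodels; the data are simulated for illustration):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import linear_reset

    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 2))
    # the true model has a neglected quadratic term in the first regressor
    y = 1.0 + x @ [0.5, -0.3] + 0.5 * x[:, 0] ** 2 + rng.normal(size=200)
    res = sm.OLS(y, sm.add_constant(x)).fit()
    reset_result = linear_reset(res, power=3, test_type="fitted", use_f=True)
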


 def linear_harvey_collier(res, order_by=None, skip=None):
@@ -729,11 +1137,16 @@ def linear_harvey_collier(res, order_by=None, skip=None):
     This test is a t-test that the mean of the recursive ols residuals is zero.
     Calculating the recursive residuals might take some time for large samples.
     """
-    pass
+    # I think this has different ddof than
+    # B.H. Baltagi, Econometrics, 2011, chapter 8
+    # but it matches Gretl and R:lmtest, pvalue at decimal=13
+    rr = recursive_olsresiduals(res, skip=skip, alpha=0.95, order_by=order_by)

+    return stats.ttest_1samp(rr[3][3:], 0)
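
A brief sketch of the Harvey-Collier test on a correctly specified linear model (same assumed import path; simulated data):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import linear_harvey_collier

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 1.0 + 0.8 * x + rng.normal(size=200)
    res = sm.OLS(y, sm.add_constant(x)).fit()
    # t-test that the recursive OLS residuals have zero mean
    tstat, pval = linear_harvey_collier(res)
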

-def linear_rainbow(res, frac=0.5, order_by=None, use_distance=False, center
-    =None):
+
+def linear_rainbow(res, frac=0.5, order_by=None, use_distance=False,
+                   center=None):
     """
     Rainbow test for linearity

@@ -772,7 +1185,74 @@ def linear_rainbow(res, frac=0.5, order_by=None, use_distance=False, center
     This test assumes residuals are homoskedastic and may reject a correct
     linear specification if the residuals are heteroskedastic.
     """
-    pass
+    if not isinstance(res, RegressionResultsWrapper):
+        raise TypeError("res must be a results instance from a linear model.")
+    frac = float_like(frac, "frac")
+
+    use_distance = bool_like(use_distance, "use_distance")
+    nobs = res.nobs
+    endog = res.model.endog
+    exog = res.model.exog
+    if order_by is not None and use_distance:
+        raise ValueError("order_by and use_distance cannot be simultaneously"
+                         "used.")
+    if order_by is not None:
+        if isinstance(order_by, np.ndarray):
+            order_by = array_like(order_by, "order_by", ndim=1, dtype="int")
+        else:
+            if isinstance(order_by, str):
+                order_by = [order_by]
+            try:
+                cols = res.model.data.orig_exog[order_by].copy()
+            except (IndexError, KeyError):
+                raise TypeError("order_by must contain valid column names "
+                                "from the exog data used to construct res,"
+                                "and exog must be a pandas DataFrame.")
+            name = "__index__"
+            while name in cols:
+                name += '_'
+            cols[name] = np.arange(cols.shape[0])
+            cols = cols.sort_values(order_by)
+            order_by = np.asarray(cols[name])
+        endog = endog[order_by]
+        exog = exog[order_by]
+    if use_distance:
+        center = int(nobs) // 2 if center is None else center
+        if isinstance(center, float):
+            if not 0.0 <= center <= 1.0:
+                raise ValueError("center must be in (0, 1) when a float.")
+            center = int(center * (nobs-1))
+        else:
+            center = int_like(center, "center")
+            if not 0 < center < nobs - 1:
+                raise ValueError("center must be in [0, nobs) when an int.")
+        center_obs = exog[center:center+1]
+        from scipy.spatial.distance import cdist
+        try:
+            err = exog - center_obs
+            vi = np.linalg.inv(err.T @ err / nobs)
+        except np.linalg.LinAlgError:
+            err = exog - exog.mean(0)
+            vi = np.linalg.inv(err.T @ err / nobs)
+        dist = cdist(exog, center_obs, metric='mahalanobis', VI=vi)
+        idx = np.argsort(dist.ravel())
+        endog = endog[idx]
+        exog = exog[idx]
+
+    lowidx = np.ceil(0.5 * (1 - frac) * nobs).astype(int)
+    uppidx = np.floor(lowidx + frac * nobs).astype(int)
+    if uppidx - lowidx < exog.shape[1]:
+        raise ValueError("frac is too small to perform test. frac * nobs"
+                         "must be greater than the number of exogenous"
+                         "variables in the model.")
+    mi_sl = slice(lowidx, uppidx)
+    res_mi = OLS(endog[mi_sl], exog[mi_sl]).fit()
+    nobs_mi = res_mi.model.endog.shape[0]
+    ss_mi = res_mi.ssr
+    ss = res.ssr
+    fstat = (ss - ss_mi) / (nobs - nobs_mi) / ss_mi * res_mi.df_resid
+    pval = stats.f.sf(fstat, nobs - nobs_mi, res_mi.df_resid)
+    return fstat, pval
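
A compact sketch of the rainbow test on a fitted OLS result (assumed import path as above; simulated data):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import linear_rainbow

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    res = sm.OLS(2.0 + x + rng.normal(size=200), sm.add_constant(x)).fit()
    # the central 50% of the sample is refit and compared to the full fit
    fstat, pval = linear_rainbow(res, frac=0.5)
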


 def linear_lm(resid, exog, func=None):
@@ -810,7 +1290,18 @@ def linear_lm(resid, exog, func=None):
     regressors. The Null hypothesis is that the linear specification is
     correct.
     """
-    pass
+    if func is None:
+        def func(x):
+            return np.power(x, 2)
+    exog = np.asarray(exog)
+    exog_aux = np.column_stack((exog, func(exog[:, 1:])))
+
+    nobs, k_vars = exog.shape
+    ls = OLS(resid, exog_aux).fit()
+    ftest = ls.f_test(np.eye(k_vars - 1, k_vars * 2 - 1, k_vars))
+    lm = nobs * ls.rsquared
+    lm_pval = stats.chi2.sf(lm, k_vars - 1)
+    return lm, lm_pval, ftest
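
A short sketch of linear_lm; note that the default func is applied only to exog[:, 1:], so the constant is expected to sit in the first column (import path assumed as above):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import linear_lm

    rng = np.random.default_rng(0)
    exog = sm.add_constant(rng.normal(size=(200, 2)))   # constant in column 0
    endog = exog @ [1.0, 0.5, -0.3] + rng.normal(size=200)
    resid = sm.OLS(endog, exog).fit().resid
    lm, lm_pval, ftest = linear_lm(resid, exog)         # default func: x ** 2
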


 def spec_white(resid, exog):
@@ -863,12 +1354,42 @@ def spec_white(resid, exog):
        estimator and a direct test for heteroscedasticity. Econometrica, 48:
        817-838.
     """
-    pass
-
-
-@deprecate_kwarg('olsresults', 'res')
-def recursive_olsresiduals(res, skip=None, lamda=0.0, alpha=0.95, order_by=None
-    ):
+    x = array_like(exog, "exog", ndim=2)
+    e = array_like(resid, "resid", ndim=1)
+    if x.shape[1] < 2 or not np.any(np.ptp(x, 0) == 0.0):
+        raise ValueError("White's specification test requires at least two"
+                         "columns where one is a constant.")
+
+    # add interaction terms
+    i0, i1 = np.triu_indices(x.shape[1])
+    exog = np.delete(x[:, i0] * x[:, i1], 0, 1)
+
+    # collinearity check - see _fit_collinear
+    atol = 1e-14
+    rtol = 1e-13
+    tol = atol + rtol * exog.var(0)
+    r = np.linalg.qr(exog, mode="r")
+    mask = np.abs(r.diagonal()) < np.sqrt(tol)
+    exog = exog[:, np.where(~mask)[0]]
+
+    # calculate test statistic
+    sqe = e * e
+    sqmndevs = sqe - np.mean(sqe)
+    d = np.dot(exog.T, sqmndevs)
+    devx = exog - np.mean(exog, axis=0)
+    devx *= sqmndevs[:, None]
+    b = devx.T.dot(devx)
+    stat = d.dot(np.linalg.solve(b, d))
+
+    # chi-square test
+    dof = devx.shape[1]
+    pval = stats.chi2.sf(stat, dof)
+    return stat, pval, dof
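
A usage sketch for spec_white, which requires exog to include a constant column (import path assumed as above; the heteroskedastic errors are simulated so the test has something to detect):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import spec_white

    rng = np.random.default_rng(0)
    exog = sm.add_constant(rng.normal(size=(300, 2)))
    endog = exog @ [1.0, 0.5, -0.5] + rng.normal(size=300) * np.abs(exog[:, 1])
    resid = sm.OLS(endog, exog).fit().resid
    stat, pval, dof = spec_white(resid, exog)
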
+
+
+@deprecate_kwarg("olsresults", "res")
+def recursive_olsresiduals(res, skip=None, lamda=0.0, alpha=0.95,
+                           order_by=None):
     """
     Calculate recursive ols with residuals and Cusum test statistic

@@ -927,7 +1448,88 @@ def recursive_olsresiduals(res, skip=None, lamda=0.0, alpha=0.95, order_by=None
     Journal of the Royal Statistical Society. Series B (Methodological) 37,
     no. 2 (1975): 149-192.
     """
-    pass
+    if not isinstance(res, RegressionResultsWrapper):
+        raise TypeError("res a regression results instance")
+    y = res.model.endog
+    x = res.model.exog
+    order_by = array_like(order_by, "order_by", dtype="int", optional=True,
+                          ndim=1, shape=(y.shape[0],))
+    # initialize with skip observations
+    if order_by is not None:
+        x = x[order_by]
+        y = y[order_by]
+
+    nobs, nvars = x.shape
+    if skip is None:
+        skip = nvars
+    rparams = np.nan * np.zeros((nobs, nvars))
+    rresid = np.nan * np.zeros(nobs)
+    rypred = np.nan * np.zeros(nobs)
+    rvarraw = np.nan * np.zeros(nobs)
+
+    x0 = x[:skip]
+    if np.linalg.matrix_rank(x0) < x0.shape[1]:
+        err_msg = """\
+"The initial regressor matrix, x[:skip], issingular. You must use a value of
+skip large enough to ensure that the first OLS estimator is well-defined.
+"""
+        raise ValueError(err_msg)
+    y0 = y[:skip]
+    # add Ridge to start (not in jplv)
+    xtxi = np.linalg.inv(np.dot(x0.T, x0) + lamda * np.eye(nvars))
+    xty = np.dot(x0.T, y0)  # xi * y   #np.dot(xi, y)
+    beta = np.dot(xtxi, xty)
+    rparams[skip - 1] = beta
+    yipred = np.dot(x[skip - 1], beta)
+    rypred[skip - 1] = yipred
+    rresid[skip - 1] = y[skip - 1] - yipred
+    rvarraw[skip - 1] = 1 + np.dot(x[skip - 1], np.dot(xtxi, x[skip - 1]))
+    for i in range(skip, nobs):
+        xi = x[i:i + 1, :]
+        yi = y[i]
+
+        # get prediction error with previous beta
+        yipred = np.dot(xi, beta)
+        rypred[i] = np.squeeze(yipred)
+        residi = yi - yipred
+        rresid[i] = np.squeeze(residi)
+
+        # update beta and inverse(X'X)
+        tmp = np.dot(xtxi, xi.T)
+        ft = 1 + np.dot(xi, tmp)
+
+        xtxi = xtxi - np.dot(tmp, tmp.T) / ft  # BigJudge equ 5.5.15
+
+        beta = beta + (tmp * residi / ft).ravel()  # BigJudge equ 5.5.14
+        rparams[i] = beta
+        rvarraw[i] = np.squeeze(ft)
+
+    rresid_scaled = rresid / np.sqrt(rvarraw)  # N(0,sigma2) distributed
+    nrr = nobs - skip
+    # sigma2 = rresid_scaled[skip-1:].var(ddof=1)  #var or sum of squares ?
+    # Greene has var, jplv and Ploberger have sum of squares (Ass.:mean=0)
+    # Gretl uses: by reverse engineering matching their numbers
+    sigma2 = rresid_scaled[skip:].var(ddof=1)
+    rresid_standardized = rresid_scaled / np.sqrt(sigma2)  # N(0,1) distributed
+    rcusum = rresid_standardized[skip - 1:].cumsum()
+    # confidence interval points in Greene p136 looks strange. Cleared up
+    # this assumes sum of independent standard normal, which does not take into
+    # account that we make many tests at the same time
+    if alpha == 0.90:
+        a = 0.850
+    elif alpha == 0.95:
+        a = 0.948
+    elif alpha == 0.99:
+        a = 1.143
+    else:
+        raise ValueError("alpha can only be 0.9, 0.95 or 0.99")
+
+    # following taken from Ploberger,
+    # crit = a * np.sqrt(nrr)
+    rcusumci = (a * np.sqrt(nrr) + 2 * a * np.arange(0, nobs - skip) / np.sqrt(
+        nrr)) * np.array([[-1.], [+1.]])
+    return (rresid, rparams, rypred, rresid_standardized, rresid_scaled,
+            rcusum, rcusumci)
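
A sketch showing the shape of the recursive_olsresiduals return value (import path assumed as above; skip=10 is an arbitrary burn-in choice for illustration):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import recursive_olsresiduals

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.normal(size=(200, 1)))
    res = sm.OLS(x @ [1.0, 0.5] + rng.normal(size=200), x).fit()
    (rresid, rparams, rypred, rresid_standardized,
     rresid_scaled, rcusum, rcusumci) = recursive_olsresiduals(res, skip=10)
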


 def breaks_hansen(olsresults):
@@ -960,7 +1562,19 @@ def breaks_hansen(olsresults):
     ----------
     Greene section 7.5.1, notation follows Greene
     """
-    pass
+    x = olsresults.model.exog
+    resid = array_like(olsresults.resid, "resid", shape=(x.shape[0], 1))
+    nobs, nvars = x.shape
+    resid2 = resid ** 2
+    ft = np.c_[x * resid[:, None], (resid2 - resid2.mean())]
+    score = ft.cumsum(0)
+    f = nobs * (ft[:, :, None] * ft[:, None, :]).sum(0)
+    s = (score[:, :, None] * score[:, None, :]).sum(0)
+    h = np.trace(np.dot(np.linalg.inv(f), s))
+    crit95 = np.array([(2, 1.01), (6, 1.9), (15, 3.75), (19, 4.52)],
+                      dtype=[("nobs", int), ("crit", float)])
+    # TODO: get critical values from Bruce Hansen's 1992 paper
+    return h, crit95


 def breaks_cusumolsresid(resid, ddof=0):
@@ -1005,4 +1619,53 @@ def breaks_cusumolsresid(resid, ddof=0):
     Ploberger, Werner, and Walter Kramer. “The Cusum Test with OLS Residuals.”
     Econometrica 60, no. 2 (March 1992): 271-285.
     """
-    pass
+    resid = np.asarray(resid).ravel()
+    nobs = len(resid)
+    nobssigma2 = (resid ** 2).sum()
+    if ddof > 0:
+        nobssigma2 = nobssigma2 / (nobs - ddof) * nobs
+    # b is asymptotically a Brownian Bridge
+    b = resid.cumsum() / np.sqrt(nobssigma2)  # use T*sigma directly
+    # asymptotically distributed as standard Brownian Bridge
+    sup_b = np.abs(b).max()
+    crit = [(1, 1.63), (5, 1.36), (10, 1.22)]
+    # Note stats.kstwobign.isf(0.1) is distribution of sup.abs of Brownian
+    # Bridge
+    # >>> stats.kstwobign.isf([0.01,0.05,0.1])
+    # array([ 1.62762361,  1.35809864,  1.22384787])
+    pval = stats.kstwobign.sf(sup_b)
+    return sup_b, pval, crit
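
A sketch of the OLS-based CUSUM test on the residuals of a fitted model; using the number of estimated parameters as ddof is an assumption of this sketch, not a documented requirement (import path as above):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import breaks_cusumolsresid

    rng = np.random.default_rng(0)
    x = sm.add_constant(rng.normal(size=(200, 1)))
    res = sm.OLS(x @ [1.0, 0.5] + rng.normal(size=200), x).fit()
    sup_b, pval, crit = breaks_cusumolsresid(res.resid, ddof=len(res.params))
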
+
+# def breaks_cusum(recolsresid):
+#    """renormalized cusum test for parameter stability based on recursive
+#    residuals
+#
+#
+#    still incorrect: in PK, the normalization for sigma is by T not T-K
+#    also the test statistic is asymptotically a Wiener Process, Brownian
+#    motion
+#    not Brownian Bridge
+#    for testing: result reject should be identical as in standard cusum
+#    version
+#
+#    References
+#    ----------
+#    Ploberger, Werner, and Walter Kramer. “The Cusum Test with OLS Residuals.”
+#    Econometrica 60, no. 2 (March 1992): 271-285.
+#
+#    """
+#    resid = recolsresid.ravel()
+#    nobssigma2 = (resid**2).sum()
+#    #B is asymptotically a Brownian Bridge
+#    B = resid.cumsum()/np.sqrt(nobssigma2) # use T*sigma directly
+#    nobs = len(resid)
+#    denom = 1. + 2. * np.arange(nobs)/(nobs-1.) #not sure about limits
+#    sup_b = np.abs(B/denom).max()
+#    #asymptotically distributed as standard Brownian Bridge
+#    crit = [(1,1.63), (5, 1.36), (10, 1.22)]
+#    #Note stats.kstwobign.isf(0.1) is distribution of sup.abs of Brownian
+#    Bridge
+#    #>>> stats.kstwobign.isf([0.01,0.05,0.1])
+#    #array([ 1.62762361,  1.35809864,  1.22384787])
+#    pval = stats.kstwobign.sf(sup_b)
+#    return sup_b, pval, crit
diff --git a/statsmodels/stats/diagnostic_gen.py b/statsmodels/stats/diagnostic_gen.py
index 07192b120..57a1f5f36 100644
--- a/statsmodels/stats/diagnostic_gen.py
+++ b/statsmodels/stats/diagnostic_gen.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Tue Oct  6 12:42:11 2020

@@ -5,14 +6,17 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.stats.base import HolderTuple
 from statsmodels.stats.effect_size import _noncentrality_chisquare


-def test_chisquare_binning(counts, expected, sort_var=None, bins=10, df=
-    None, ordered=False, sort_method='quicksort', alpha_nc=0.05):
+def test_chisquare_binning(counts, expected, sort_var=None, bins=10,
+                           df=None, ordered=False, sort_method="quicksort",
+                           alpha_nc=0.05):
     """chisquare gof test with binning of data, Hosmer-Lemeshow type

     ``observed`` and ``expected`` are observation specific and should have
@@ -59,7 +63,60 @@ def test_chisquare_binning(counts, expected, sort_var=None, bins=10, df=
     Note: If there are ties in the ``sort_var`` array, then the split of
     observations into groups will depend on the sort algorithm.
     """
-    pass
+
+    observed = np.asarray(counts)
+    expected = np.asarray(expected)
+    n_observed = counts.sum()
+    n_expected = expected.sum()
+    if not np.allclose(n_observed, n_expected, atol=1e-13):
+        if np.max(expected) < 1 + 1e-13:
+            # expected seems to be probability, warn and rescale
+            import warnings
+            warnings.warn("sum of expected and of observed differ, "
+                          "rescaling ``expected``")
+            expected = expected / n_expected * n_observed
+        else:
+            # expected doesn't look like fractions or probabilities
+            raise ValueError("total counts of expected and observed differ")
+
+    # k = 1 if observed.ndim == 1 else observed.shape[1]
+    if sort_var is not None:
+        argsort = np.argsort(sort_var, kind=sort_method)
+    else:
+        argsort = np.arange(observed.shape[0])
+    # indices = [arr for arr in np.array_split(argsort, bins, axis=0)]
+    indices = np.array_split(argsort, bins, axis=0)
+    # in one loop, observed expected in last dimension, too messy,
+    # freqs_probs = np.array([np.vstack([observed[idx].mean(0),
+    #                                    expected[idx].mean(0)]).T
+    #                         for idx in indices])
+    freqs = np.array([observed[idx].sum(0) for idx in indices])
+    probs = np.array([expected[idx].sum(0) for idx in indices])
+
+    # chisquare test
+    resid_pearson = (freqs - probs) / np.sqrt(probs)
+    chi2_stat_groups = ((freqs - probs)**2 / probs).sum(1)
+    chi2_stat = chi2_stat_groups.sum()
+    if df is None:
+        g, c = freqs.shape
+        if ordered is True:
+            df = (g - 2) * (c - 1) + (c - 2)
+        else:
+            df = (g - 2) * (c - 1)
+    pvalue = stats.chi2.sf(chi2_stat, df)
+    noncentrality = _noncentrality_chisquare(chi2_stat, df, alpha=alpha_nc)
+
+    res = HolderTuple(statistic=chi2_stat,
+                      pvalue=pvalue,
+                      df=df,
+                      freqs=freqs,
+                      probs=probs,
+                      noncentrality=noncentrality,
+                      resid_pearson=resid_pearson,
+                      chi2_stat_groups=chi2_stat_groups,
+                      indices=indices
+                      )
+    return res
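
A Hosmer-Lemeshow-style sketch of test_chisquare_binning with observation-level one-hot counts and model probabilities; the binary setup and simulated data are only an illustration of the expected input shapes:

    import numpy as np
    from statsmodels.stats.diagnostic_gen import test_chisquare_binning

    rng = np.random.default_rng(0)
    n = 500
    p = rng.uniform(0.1, 0.9, size=n)          # predicted probabilities
    y = (rng.random(n) < p).astype(float)      # simulated binary outcome
    counts = np.column_stack([1 - y, y])       # observed, one row per obs
    expected = np.column_stack([1 - p, p])     # model-implied probabilities
    res = test_chisquare_binning(counts, expected, sort_var=p, bins=10)
    print(res.statistic, res.pvalue, res.df)
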


 def prob_larger_ordinal_choice(prob):
@@ -93,7 +150,18 @@ def prob_larger_ordinal_choice(prob):
     `statsmodels.stats.nonparametric.rank_compare_2ordinal`

     """
-    pass
+    # similar to `nonparametric rank_compare_2ordinal`
+
+    prob = np.asarray(prob)
+    cdf = prob.cumsum(-1)
+    if cdf.ndim == 1:
+        cdf_ = np.concatenate(([0], cdf))
+    elif cdf.ndim == 2:
+        cdf_ = np.concatenate((np.zeros((len(cdf), 1)), cdf), axis=1)
+    # r_1 = cdf_[..., 1:] + cdf_[..., :-1] - 1
+    cdf_mid = (cdf_[..., 1:] + cdf_[..., :-1]) / 2
+    r = cdf_mid * 2 - 1
+    return cdf_mid, r


 def prob_larger_2ordinal(probs1, probs2):
@@ -113,7 +181,39 @@ def prob_larger_2ordinal(probs1, probs2):
     prob2 : float
         prob2 = 1 - prob1 = Pr(x1 < x2) + 0.5 * Pr(x1 = x2)
     """
-    pass
+#    count1 = np.asarray(count1)
+#    count2 = np.asarray(count2)
+#    nobs1, nobs2 = count1.sum(), count2.sum()
+#    freq1 = count1 / nobs1
+#    freq2 = count2 / nobs2
+
+#     if freq1.ndim == 1:
+#         freq1_ = np.concatenate(([0], freq1))
+#     elif freq1.ndim == 2:
+#         freq1_ = np.concatenate((np.zeros((len(freq1), 1)), freq1), axis=1)
+
+#     if freq2.ndim == 1:
+#         freq2_ = np.concatenate(([0], freq2))
+#     elif freq2.ndim == 2:
+#         freq2_ = np.concatenate((np.zeros((len(freq2), 1)), freq2), axis=1)
+
+    freq1 = np.asarray(probs1)
+    freq2 = np.asarray(probs2)
+    # add zero at beginning of choices for cdf computation
+    freq1_ = np.concatenate((np.zeros(freq1.shape[:-1] + (1,)), freq1),
+                            axis=-1)
+    freq2_ = np.concatenate((np.zeros(freq2.shape[:-1] + (1,)), freq2),
+                            axis=-1)
+
+    cdf1 = freq1_.cumsum(axis=-1)
+    cdf2 = freq2_.cumsum(axis=-1)
+
+    # mid rank cdf
+    cdfm1 = (cdf1[..., 1:] + cdf1[..., :-1]) / 2
+    cdfm2 = (cdf2[..., 1:] + cdf2[..., :-1]) / 2
+    prob1 = (cdfm2 * freq1).sum(-1)
+    prob2 = (cdfm1 * freq2).sum(-1)
+    return prob1, prob2
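
A small sketch of prob_larger_2ordinal with two ordinal probability vectors; the two returned probabilities sum to one:

    import numpy as np
    from statsmodels.stats.diagnostic_gen import prob_larger_2ordinal

    probs1 = np.array([0.1, 0.2, 0.3, 0.4])   # categories ordered low to high
    probs2 = np.array([0.4, 0.3, 0.2, 0.1])
    p1, p2 = prob_larger_2ordinal(probs1, probs2)
    # p1 = Pr(x1 > x2) + 0.5 * Pr(x1 = x2); here p1 > 0.5 and p1 + p2 = 1
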


 def cov_multinomial(probs):
@@ -124,7 +224,13 @@ def cov_multinomial(probs):
     cov = diag(probs) - outer(probs, probs)

     """
-    pass
+
+    k = probs.shape[-1]
+    di = np.diag_indices(k, 2)
+    cov = probs[..., None] * probs[..., None, :]
+    cov *= - 1
+    cov[..., di[0], di[1]] += probs
+    return cov


 def var_multinomial(probs):
@@ -133,4 +239,5 @@ def var_multinomial(probs):
     var = probs * (1 - probs)

     """
-    pass
+    var = probs * (1 - probs)
+    return var
diff --git a/statsmodels/stats/dist_dependence_measures.py b/statsmodels/stats/dist_dependence_measures.py
index a8a440f33..597336d67 100644
--- a/statsmodels/stats/dist_dependence_measures.py
+++ b/statsmodels/stats/dist_dependence_measures.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """ Distance dependence measure and the dCov test.

 Implementation of Székely et al. (2007) calculation of distance
@@ -15,16 +16,22 @@ References
 """
 from collections import namedtuple
 import warnings
+
 import numpy as np
 from scipy.spatial.distance import pdist, squareform
 from scipy.stats import norm
+
 from statsmodels.tools.sm_exceptions import HypothesisTestWarning
-DistDependStat = namedtuple('DistDependStat', ['test_statistic',
-    'distance_correlation', 'distance_covariance', 'dvar_x', 'dvar_y', 'S'])

+DistDependStat = namedtuple(
+    "DistDependStat",
+    ["test_statistic", "distance_correlation",
+     "distance_covariance", "dvar_x", "dvar_y", "S"],
+)

-def distance_covariance_test(x, y, B=None, method='auto'):
-    """The Distance Covariance (dCov) test
+
+def distance_covariance_test(x, y, B=None, method="auto"):
+    r"""The Distance Covariance (dCov) test

     Apply the Distance Covariance (dCov) test of independence to `x` and `y`.
     This test was introduced in [1]_, and is based on the distance covariance
@@ -104,11 +111,38 @@ def distance_covariance_test(x, y, B=None, method='auto'):
     # (test_statistic, pval, chosen_method)

     """
-    pass
+    x, y = _validate_and_tranform_x_and_y(x, y)
+
+    n = x.shape[0]
+    stats = distance_statistics(x, y)
+
+    if method == "auto" and n <= 500 or method == "emp":
+        chosen_method = "emp"
+        test_statistic, pval = _empirical_pvalue(x, y, B, n, stats)
+
+    elif method == "auto" and n > 500 or method == "asym":
+        chosen_method = "asym"
+        test_statistic, pval = _asymptotic_pvalue(stats)
+
+    else:
+        raise ValueError("Unknown 'method' parameter: {}".format(method))
+
+    # In case we got an extreme p-value (0 or 1) when using the empirical
+    # distribution of the test statistic under the null, we fall back
+    # to the asymptotic approximation.
+    if chosen_method == "emp" and pval in [0, 1]:
+        msg = (
+            f"p-value was {pval} when using the empirical method. "
+            "The asymptotic approximation will be used instead"
+        )
+        warnings.warn(msg, HypothesisTestWarning)
+        _, pval = _asymptotic_pvalue(stats)
+
+    return test_statistic, pval, chosen_method


 def _validate_and_tranform_x_and_y(x, y):
-    """Ensure `x` and `y` have proper shape and transform/reshape them if
+    r"""Ensure `x` and `y` have proper shape and transform/reshape them if
     required.

     Parameters
@@ -136,11 +170,25 @@ def _validate_and_tranform_x_and_y(x, y):
         If `x` and `y` have a different number of observations.

     """
-    pass
+    x = np.asanyarray(x)
+    y = np.asanyarray(y)
+
+    if x.shape[0] != y.shape[0]:
+        raise ValueError(
+            "x and y must have the same number of observations (rows)."
+        )
+
+    if len(x.shape) == 1:
+        x = x.reshape((x.shape[0], 1))
+
+    if len(y.shape) == 1:
+        y = y.reshape((y.shape[0], 1))
+
+    return x, y


 def _empirical_pvalue(x, y, B, n, stats):
-    """Calculate the empirical p-value based on permutations of `y`'s rows
+    r"""Calculate the empirical p-value based on permutations of `y`'s rows

     Parameters
     ----------
@@ -169,11 +217,18 @@ def _empirical_pvalue(x, y, B, n, stats):
         The empirical p-value.

     """
-    pass
+    B = int(B) if B else int(np.floor(200 + 5000 / n))
+    empirical_dist = _get_test_statistic_distribution(x, y, B)
+    pval = 1 - np.searchsorted(
+        sorted(empirical_dist), stats.test_statistic
+    ) / len(empirical_dist)
+    test_statistic = stats.test_statistic
+
+    return test_statistic, pval


 def _asymptotic_pvalue(stats):
-    """Calculate the p-value based on an approximation of the distribution of
+    r"""Calculate the p-value based on an approximation of the distribution of
     the test statistic under the null.

     Parameters
@@ -189,11 +244,14 @@ def _asymptotic_pvalue(stats):
         The asymptotic p-value.

     """
-    pass
+    test_statistic = np.sqrt(stats.test_statistic / stats.S)
+    pval = (1 - norm.cdf(test_statistic)) * 2
+
+    return test_statistic, pval


 def _get_test_statistic_distribution(x, y, B):
-    """
+    r"""
     Parameters
     ----------
     x : array_like, 1-D or 2-D
@@ -217,11 +275,19 @@ def _get_test_statistic_distribution(x, y, B):
         The empirical distribution of the test statistic.

     """
-    pass
+    y = y.copy()
+    emp_dist = np.zeros(B)
+    x_dist = squareform(pdist(x, "euclidean"))
+
+    for i in range(B):
+        np.random.shuffle(y)
+        emp_dist[i] = distance_statistics(x, y, x_dist=x_dist).test_statistic
+
+    return emp_dist


 def distance_statistics(x, y, x_dist=None, y_dist=None):
-    """Calculate various distance dependence statistics.
+    r"""Calculate various distance dependence statistics.

     Calculate several distance dependence statistics as described in [1]_.

@@ -282,11 +348,43 @@ def distance_statistics(x, y, x_dist=None, y_dist=None):
     S=0.10892061635588891)

     """
-    pass
+    x, y = _validate_and_tranform_x_and_y(x, y)
+
+    n = x.shape[0]
+
+    a = x_dist if x_dist is not None else squareform(pdist(x, "euclidean"))
+    b = y_dist if y_dist is not None else squareform(pdist(y, "euclidean"))
+
+    a_row_means = a.mean(axis=0, keepdims=True)
+    b_row_means = b.mean(axis=0, keepdims=True)
+    a_col_means = a.mean(axis=1, keepdims=True)
+    b_col_means = b.mean(axis=1, keepdims=True)
+    a_mean = a.mean()
+    b_mean = b.mean()
+
+    A = a - a_row_means - a_col_means + a_mean
+    B = b - b_row_means - b_col_means + b_mean
+
+    S = a_mean * b_mean
+    dcov = np.sqrt(np.multiply(A, B).mean())
+    dvar_x = np.sqrt(np.multiply(A, A).mean())
+    dvar_y = np.sqrt(np.multiply(B, B).mean())
+    dcor = dcov / np.sqrt(dvar_x * dvar_y)
+
+    test_statistic = n * dcov ** 2
+
+    return DistDependStat(
+        test_statistic=test_statistic,
+        distance_correlation=dcor,
+        distance_covariance=dcov,
+        dvar_x=dvar_x,
+        dvar_y=dvar_y,
+        S=S,
+    )


 def distance_covariance(x, y):
-    """Distance covariance.
+    r"""Distance covariance.

     Calculate the empirical distance covariance as described in [1]_.

@@ -324,11 +422,11 @@ def distance_covariance(x, y):
     0.007575063951951362

     """
-    pass
+    return distance_statistics(x, y).distance_covariance


 def distance_variance(x):
-    """Distance variance.
+    r"""Distance variance.

     Calculate the empirical distance variance as described in [1]_.

@@ -361,11 +459,11 @@ def distance_variance(x):
     0.21732609190659702

     """
-    pass
+    return distance_covariance(x, x)


 def distance_correlation(x, y):
-    """Distance correlation.
+    r"""Distance correlation.

     Calculate the empirical distance correlation as described in [1]_.
     This statistic is analogous to product-moment correlation and describes
@@ -407,4 +505,4 @@ def distance_correlation(x, y):
     0.04060497840149489

     """
-    pass
+    return distance_statistics(x, y).distance_correlation
diff --git a/statsmodels/stats/effect_size.py b/statsmodels/stats/effect_size.py
index cfeedbd51..e41771fbe 100644
--- a/statsmodels/stats/effect_size.py
+++ b/statsmodels/stats/effect_size.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Mon Oct  5 12:36:54 2020

@@ -47,7 +48,24 @@ def _noncentrality_chisquare(chi2_stat, df, alpha=0.05):
         https://doi.org/10.1016/j.spl.2008.07.025.

     """
-    pass
+    alpha_half = alpha / 2
+
+    nc_umvue = chi2_stat - df
+    nc = np.maximum(nc_umvue, 0)
+    nc_lzd = np.maximum(nc_umvue, chi2_stat / (df + 1))
+    nc_krs = np.maximum(nc_umvue, chi2_stat * 2 / (df + 2))
+    nc_median = special.chndtrinc(chi2_stat, df, 0.5)
+    ci = special.chndtrinc(chi2_stat, df, [1 - alpha_half, alpha_half])
+
+    res = Holder(nc=nc,
+                 confint=ci,
+                 nc_umvue=nc_umvue,
+                 nc_lzd=nc_lzd,
+                 nc_krs=nc_krs,
+                 nc_median=nc_median,
+                 name="Noncentrality for chisquare-distributed random variable"
+                 )
+    return res


 def _noncentrality_f(f_stat, df1, df2, alpha=0.05):
@@ -80,7 +98,23 @@ def _noncentrality_f(f_stat, df1, df2, alpha=0.05):
        Noncentrality Parameters.” Canadian Journal of Statistics 21 (1): 45–57.
        https://doi.org/10.2307/3315657.
     """
-    pass
+    alpha_half = alpha / 2
+
+    x_s = f_stat * df1 / df2
+    nc_umvue = (df2 - 2) * x_s - df1
+    nc = np.maximum(nc_umvue, 0)
+    nc_krs = np.maximum(nc_umvue, x_s * 2 * (df2 - 1) / (df1 + 2))
+    nc_median = special.ncfdtrinc(df1, df2, 0.5, f_stat)
+    ci = special.ncfdtrinc(df1, df2, [1 - alpha_half, alpha_half], f_stat)
+
+    res = Holder(nc=nc,
+                 confint=ci,
+                 nc_umvue=nc_umvue,
+                 nc_krs=nc_krs,
+                 nc_median=nc_median,
+                 name="Noncentrality for F-distributed random variable"
+                 )
+    return res


 def _noncentrality_t(t_stat, df, alpha=0.05):
@@ -113,4 +147,17 @@ def _noncentrality_t(t_stat, df, alpha=0.05):
        https://doi.org/10.3102/10769986006002107.

     """
-    pass
+    alpha_half = alpha / 2
+
+    gfac = np.exp(special.gammaln(df/2.-0.5) - special.gammaln(df/2.))
+    c11 = np.sqrt(df/2.) * gfac
+    nc = t_stat / c11
+    nc_median = special.nctdtrinc(df, 0.5, t_stat)
+    ci = special.nctdtrinc(df, [1 - alpha_half, alpha_half], t_stat)
+
+    res = Holder(nc=nc,
+                 confint=ci,
+                 nc_median=nc_median,
+                 name="Noncentrality for t-distributed random variable"
+                 )
+    return res
diff --git a/statsmodels/stats/gof.py b/statsmodels/stats/gof.py
index f6f085cda..f64047ce1 100644
--- a/statsmodels/stats/gof.py
+++ b/statsmodels/stats/gof.py
@@ -1,4 +1,4 @@
-"""extra statistical function and helper functions
+'''extra statistical function and helper functions

 contains:

@@ -16,14 +16,15 @@ changes
 -------
 2013-02-25 : add chisquare_power, effectsize and "value"

-"""
+'''
 from statsmodels.compat.python import lrange
 import numpy as np
 from scipy import stats


+# copied from regression/stats.utils
 def powerdiscrepancy(observed, expected, lambd=0.0, axis=0, ddof=0):
-    """Calculates power discrepancy, a class of goodness-of-fit tests
+    r"""Calculates power discrepancy, a class of goodness-of-fit tests
     as a measure of discrepancy between observed and expected data.

     This contains several goodness-of-fit tests as special cases, see the
@@ -31,7 +32,7 @@ def powerdiscrepancy(observed, expected, lambd=0.0, axis=0, ddof=0):
     is based on the asymptotic chi-square distribution of the test statistic.

     freeman_tukey:
-    D(x|\\theta) = \\sum_j (\\sqrt{x_j} - \\sqrt{e_j})^2
+    D(x|\theta) = \sum_j (\sqrt{x_j} - \sqrt{e_j})^2

     Parameters
     ----------
@@ -111,11 +112,68 @@ def powerdiscrepancy(observed, expected, lambd=0.0, axis=0, ddof=0):
     >>> powerdiscrepancy(np.column_stack((observed,2*observed)), np.column_stack((10*expected,20*expected)), lambd=-1, axis=0)
     (array([[ 2.77258872,  5.54517744]]), array([[ 0.59657359,  0.2357868 ]]))
     """
-    pass
-
+    o = np.array(observed)
+    e = np.array(expected)
+
+    if not isinstance(lambd, str):
+        a = lambd
+    else:
+        if lambd == 'loglikeratio':
+            a = 0
+        elif lambd == 'freeman_tukey':
+            a = -0.5
+        elif lambd == 'pearson':
+            a = 1
+        elif lambd == 'modified_loglikeratio':
+            a = -1
+        elif lambd == 'cressie_read':
+            a = 2/3.0
+        else:
+            raise ValueError('lambd has to be a number or one of '
+                             'loglikeratio, freeman_tukey, pearson, '
+                             'modified_loglikeratio or cressie_read')
+
+    n = np.sum(o, axis=axis)
+    nt = n
+    if n.size>1:
+        n = np.atleast_2d(n)
+        if axis == 1:
+            nt = n.T     # need both for 2d, n and nt for broadcasting
+        if e.ndim == 1:
+            e = np.atleast_2d(e)
+            if axis == 0:
+                e = e.T
+
+    if np.allclose(np.sum(e, axis=axis), n, rtol=1e-8, atol=0):
+        p = e/(1.0*nt)
+    elif np.allclose(np.sum(e, axis=axis), 1, rtol=1e-8, atol=0):
+        p = e
+        e = nt * e
+    else:
+        raise ValueError('observed and expected need to have the same '
+                         'number of observations, or e needs to add to 1')
+    k = o.shape[axis]
+    if e.shape[axis] != k:
+        raise ValueError('observed and expected need to have the same '
+                         'number of bins')
+
+    # Note: taken from formulas, to simplify cancel n
+    if a == 0:   # log likelihood ratio
+        D_obs = 2*n * np.sum(o/(1.0*nt) * np.log(o/e), axis=axis)
+    elif a == -1:  # modified log likelihood ratio
+        D_obs = 2*n * np.sum(e/(1.0*nt) * np.log(e/o), axis=axis)
+    else:
+        D_obs = 2*n/a/(a+1) * np.sum(o/(1.0*nt) * ((o/e)**a - 1), axis=axis)
+
+    return D_obs, stats.chi2.sf(D_obs,k-1-ddof)
+
+
+
+#todo: need also binning for continuous distribution
+#      and separated binning function to be used for powerdiscrepancy

 def gof_chisquare_discrete(distfn, arg, rvs, alpha, msg):
-    """perform chisquare test for random sample of a discrete distribution
+    '''perform chisquare test for random sample of a discrete distribution

     Parameters
     ----------
@@ -139,12 +197,54 @@ def gof_chisquare_discrete(distfn, arg, rvs, alpha, msg):

     refactor: maybe a class, check returns, or separate binning from
         test results
-    """
-    pass
-
-
+    '''
+
+    # define parameters for test
+##    n=2000
+    n = len(rvs)
+    nsupp = 20
+    wsupp = 1.0/nsupp
+
+##    distfn = getattr(stats, distname)
+##    np.random.seed(9765456)
+##    rvs = distfn.rvs(size=n,*arg)
+
+    # construct intervals with minimum mass 1/nsupp
+    # intervalls are left-half-open as in a cdf difference
+    distsupport = lrange(max(distfn.a, -1000), min(distfn.b, 1000) + 1)
+    last = 0
+    distsupp = [max(distfn.a, -1000)]
+    distmass = []
+    for ii in distsupport:
+        current = distfn.cdf(ii,*arg)
+        if current - last >= wsupp-1e-14:
+            distsupp.append(ii)
+            distmass.append(current - last)
+            last = current
+            if current > (1-wsupp):
+                break
+    if distsupp[-1]  < distfn.b:
+        distsupp.append(distfn.b)
+        distmass.append(1-last)
+    distsupp = np.array(distsupp)
+    distmass = np.array(distmass)
+
+    # convert intervals to right-half-open as required by histogram
+    histsupp = distsupp+1e-8
+    histsupp[0] = distfn.a
+
+    # find sample frequencies and perform chisquare test
+    #TODO: move to compatibility.py
+    freq, hsupp = np.histogram(rvs,histsupp)
+    cdfs = distfn.cdf(distsupp,*arg)
+    (chis,pval) = stats.chisquare(np.array(freq),n*distmass)
+
+    return chis, pval, (pval > alpha), 'chisquare - test for %s ' \
+           'at arg = %s with pval = %s' % (msg,str(arg),str(pval))
+
+# copy/paste, remove code duplication when it works
 def gof_binning_discrete(rvs, distfn, arg, nsupp=20):
-    """get bins for chisquare type gof tests for a discrete distribution
+    '''get bins for chisquare type gof tests for a discrete distribution

     Parameters
     ----------
@@ -183,10 +283,50 @@ def gof_binning_discrete(rvs, distfn, arg, nsupp=20):
       optimal number of bins ? (check easyfit),
       recommendation in literature at least 5 expected observations in each bin

-    """
-    pass
-
-
+    '''
+
+    # define parameters for test
+##    n=2000
+    n = len(rvs)
+
+    wsupp = 1.0/nsupp
+
+##    distfn = getattr(stats, distname)
+##    np.random.seed(9765456)
+##    rvs = distfn.rvs(size=n,*arg)
+
+    # construct intervals with minimum mass 1/nsupp
+    # intervalls are left-half-open as in a cdf difference
+    distsupport = lrange(max(distfn.a, -1000), min(distfn.b, 1000) + 1)
+    last = 0
+    distsupp = [max(distfn.a, -1000)]
+    distmass = []
+    for ii in distsupport:
+        current = distfn.cdf(ii,*arg)
+        if current - last >= wsupp-1e-14:
+            distsupp.append(ii)
+            distmass.append(current - last)
+            last = current
+            if current > (1-wsupp):
+                break
+    if distsupp[-1]  < distfn.b:
+        distsupp.append(distfn.b)
+        distmass.append(1-last)
+    distsupp = np.array(distsupp)
+    distmass = np.array(distmass)
+
+    # convert intervals to right-half-open as required by histogram
+    histsupp = distsupp+1e-8
+    histsupp[0] = distfn.a
+
+    # find sample frequencies and perform chisquare test
+    freq,hsupp = np.histogram(rvs,histsupp)
+    #freq,hsupp = np.histogram(rvs,histsupp,new=True)
+    cdfs = distfn.cdf(distsupp,*arg)
+    return np.array(freq), n*distmass, histsupp
+
+
+# -*- coding: utf-8 -*-
 """Extension to chisquare goodness-of-fit test

 Created on Mon Feb 25 13:46:53 2013
@@ -196,8 +336,9 @@ License: BSD-3
 """


+
 def chisquare(f_obs, f_exp=None, value=0, ddof=0, return_basic=True):
-    """chisquare goodness-of-fit test
+    '''chisquare goodness-of-fit test

     The null hypothesis is that the distance between the expected distribution
     and the observed frequencies is ``value``. The alternative hypothesis is
@@ -226,12 +367,32 @@ def chisquare(f_obs, f_exp=None, value=0, ddof=0, return_basic=True):
     powerdiscrepancy
     scipy.stats.chisquare

-    """
-    pass
+    '''
+
+    f_obs = np.asarray(f_obs)
+    n_bins = len(f_obs)
+    nobs = f_obs.sum(0)
+    if f_exp is None:
+        # uniform distribution
+        f_exp = np.empty(n_bins, float)
+        f_exp.fill(nobs / float(n_bins))
+
+    f_exp = np.asarray(f_exp, float)
+
+    chisq = ((f_obs - f_exp)**2 / f_exp).sum(0)
+    if value == 0:
+        pvalue = stats.chi2.sf(chisq, n_bins - 1 - ddof)
+    else:
+        pvalue = stats.ncx2.sf(chisq, n_bins - 1 - ddof, value**2 * nobs)
+
+    if return_basic:
+        return chisq, pvalue
+    else:
+        return chisq, pvalue    #TODO: replace with TestResults


 def chisquare_power(effect_size, nobs, n_bins, alpha=0.05, ddof=0):
-    """power of chisquare goodness of fit test
+    '''power of chisquare goodness of fit test

     effect size is sqrt of chisquare statistic divided by nobs

@@ -270,12 +431,14 @@ def chisquare_power(effect_size, nobs, n_bins, alpha=0.05, ddof=0):
     chisquare_effectsize
     statsmodels.stats.GofChisquarePower

-    """
-    pass
+    '''
+    crit = stats.chi2.isf(alpha, n_bins - 1 - ddof)
+    power = stats.ncx2.sf(crit, n_bins - 1 - ddof, effect_size**2 * nobs)
+    return power


 def chisquare_effectsize(probs0, probs1, correction=None, cohen=True, axis=0):
-    """effect size for a chisquare goodness-of-fit test
+    '''effect size for a chisquare goodness-of-fit test

     Parameters
     ----------
@@ -308,5 +471,20 @@ def chisquare_effectsize(probs0, probs1, correction=None, cohen=True, axis=0):
     effectsize : float
         effect size of chisquare test

-    """
-    pass
+    '''
+    probs0 = np.asarray(probs0, float)
+    probs1 = np.asarray(probs1, float)
+    probs0 = probs0 / probs0.sum(axis)
+    probs1 = probs1 / probs1.sum(axis)
+
+    d2 = ((probs1 - probs0)**2 / probs0).sum(axis)
+
+    if correction is not None:
+        nobs, df = correction
+        diff = ((probs1 - probs0) / probs0).sum(axis)
+        d2 = np.maximum((d2 * nobs - diff - df) / (nobs - 1.), 0)
+
+    if cohen:
+        return np.sqrt(d2)
+    else:
+        return d2
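
As a usage sketch, the effect-size and power helpers restored above can be chained; the cell probabilities below are illustrative:

    from statsmodels.stats.gof import chisquare_effectsize, chisquare_power

    probs0 = [0.25, 0.25, 0.25, 0.25]    # null cell probabilities
    probs1 = [0.30, 0.25, 0.25, 0.20]    # alternative cell probabilities
    es = chisquare_effectsize(probs0, probs1)
    power = chisquare_power(es, nobs=500, n_bins=4, alpha=0.05)
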
diff --git a/statsmodels/stats/inter_rater.py b/statsmodels/stats/inter_rater.py
index 130a1bbd7..523de2df5 100644
--- a/statsmodels/stats/inter_rater.py
+++ b/statsmodels/stats/inter_rater.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Inter Rater Agreement

 contains
@@ -33,11 +34,13 @@ convenience functions to create required data format from raw data
    DONE

 """
+
 import numpy as np
-from scipy import stats
+from scipy import stats  #get rid of this? need only norm.sf


 class ResultsBunch(dict):
+
     template = '%r'

     def __init__(self, **kwds):
@@ -45,12 +48,14 @@ class ResultsBunch(dict):
         self.__dict__ = self
         self._initialize()

+    def _initialize(self):
+        pass
+
     def __str__(self):
         return self.template % self

-
 def _int_ifclose(x, dec=1, width=4):
-    """helper function for creating result string for int or float
+    '''helper function for creating result string for int or float

     only dec=1 and width=4 is implemented

@@ -70,12 +75,16 @@ def _int_ifclose(x, dec=1, width=4):
     x_string : str
         x formatted as string, either '%4d' or '%4.1f'

-    """
-    pass
+    '''
+    xint = int(round(x))
+    if np.max(np.abs(xint - x)) < 1e-14:
+        return xint, '%4d' % xint
+    else:
+        return x, '%4.1f' % x


 def aggregate_raters(data, n_cat=None):
-    """convert raw data with shape (subject, rater) to (subject, cat_counts)
+    '''convert raw data with shape (subject, rater) to (subject, cat_counts)

     brings data into correct format for fleiss_kappa

@@ -103,12 +112,27 @@ def aggregate_raters(data, n_cat=None):
     categories : nd_array, (n_category_levels,)
         Contains the category levels.

-    """
-    pass
-
+    '''
+    data = np.asarray(data)
+    n_rows = data.shape[0]
+    if n_cat is None:
+        #I could add int conversion (reverse_index) to np.unique
+        cat_uni, cat_int = np.unique(data.ravel(), return_inverse=True)
+        n_cat = len(cat_uni)
+        data_ = cat_int.reshape(data.shape)
+    else:
+        cat_uni = np.arange(n_cat)  #for return only, assumed cat levels
+        data_ = data
+
+    tt = np.zeros((n_rows, n_cat), int)
+    for idx, row in enumerate(data_):
+        ro = np.bincount(row)
+        tt[idx, :len(ro)] = ro
+
+    return tt, cat_uni

 def to_table(data, bins=None):
-    """convert raw data with shape (subject, rater) to (rater1, rater2)
+    '''convert raw data with shape (subject, rater) to (rater1, rater2)

     brings data into correct format for cohens_kappa

@@ -142,9 +166,31 @@ def to_table(data, bins=None):
     the resulting contingency table is the same as the number of raters
     instead of 2-dimensional.

-    """
-    pass
+    '''

+    data = np.asarray(data)
+    n_rows, n_cols = data.shape
+    if bins is None:
+        #I could add int conversion (reverse_index) to np.unique
+        cat_uni, cat_int = np.unique(data.ravel(), return_inverse=True)
+        n_cat = len(cat_uni)
+        data_ = cat_int.reshape(data.shape)
+        bins_ = np.arange(n_cat+1) - 0.5
+        #alternative implementation with double loop
+        #tt = np.asarray([[(x == [i,j]).all(1).sum() for j in cat_uni]
+        #                 for i in cat_uni] )
+        #other altervative: unique rows and bincount
+    elif np.isscalar(bins):
+        bins_ = np.arange(bins+1) - 0.5
+        data_ = data
+    else:
+        bins_ = bins
+        data_ = data
+
+
+    tt = np.histogramdd(data_, (bins_,)*n_cols)
+
+    return tt[0], bins_

 def fleiss_kappa(table, method='fleiss'):
     """Fleiss' and Randolph's kappa multi-rater agreement measure
@@ -197,11 +243,33 @@ def fleiss_kappa(table, method='fleiss'):
     Advances in Data Analysis and Classification 4 (4): 271-86.
     https://doi.org/10.1007/s11634-010-0073-4.
     """
-    pass
+
+    table = 1.0 * np.asarray(table)   #avoid integer division
+    n_sub, n_cat =  table.shape
+    n_total = table.sum()
+    n_rater = table.sum(1)
+    n_rat = n_rater.max()
+    #assume fully ranked
+    assert n_total == n_sub * n_rat
+
+    #marginal frequency  of categories
+    p_cat = table.sum(0) / n_total
+
+    table2 = table * table
+    p_rat = (table2.sum(1) - n_rat) / (n_rat * (n_rat - 1.))
+    p_mean = p_rat.mean()
+
+    if method == 'fleiss':
+        p_mean_exp = (p_cat*p_cat).sum()
+    elif method.startswith('rand') or method.startswith('unif'):
+        p_mean_exp = 1 / n_cat
+
+    kappa = (p_mean - p_mean_exp) / (1- p_mean_exp)
+    return kappa
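
A short sketch of aggregate_raters feeding fleiss_kappa; every subject must be rated by the same number of raters, and the simulated ratings below are only an illustration:

    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    rng = np.random.default_rng(0)
    # 20 subjects, 4 raters, 3 possible categories
    ratings = rng.integers(0, 3, size=(20, 4))
    table, categories = aggregate_raters(ratings)
    kappa = fleiss_kappa(table, method="fleiss")
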


 def cohens_kappa(table, weights=None, return_results=True, wt=None):
-    """Compute Cohen's kappa with variance and equal-zero test
+    '''Compute Cohen's kappa with variance and equal-zero test

     Parameters
     ----------
@@ -272,11 +340,102 @@ def cohens_kappa(table, weights=None, return_results=True, wt=None):
     Wikipedia
     SAS Manual

-    """
-    pass
-
-
-_kappa_template = """                  %(kind)s Kappa Coefficient
+    '''
+    table = np.asarray(table, float) #avoid integer division
+    agree = np.diag(table).sum()
+    nobs = table.sum()
+    probs = table / nobs
+    freqs = probs  #TODO: rename to use freqs instead of probs for observed
+    probs_diag = np.diag(probs)
+    freq_row = table.sum(1) / nobs
+    freq_col = table.sum(0) / nobs
+    prob_exp = freq_col * freq_row[:, None]
+    assert np.allclose(prob_exp.sum(), 1)
+    #print prob_exp.sum()
+    agree_exp = np.diag(prob_exp).sum() #need for kappa_max
+    if weights is None and wt is None:
+        kind = 'Simple'
+        kappa = (agree / nobs - agree_exp) / (1 - agree_exp)
+
+        if return_results:
+            #variance
+            term_a = probs_diag * (1 - (freq_row + freq_col) * (1 - kappa))**2
+            term_a = term_a.sum()
+            term_b = probs * (freq_col[:, None] + freq_row)**2
+            d_idx = np.arange(table.shape[0])
+            term_b[d_idx, d_idx] = 0   #set diagonal to zero
+            term_b = (1 - kappa)**2 * term_b.sum()
+            term_c = (kappa - agree_exp * (1-kappa))**2
+            var_kappa = (term_a + term_b - term_c) / (1 - agree_exp)**2 / nobs
+            #term_c = freq_col * freq_row[:, None] * (freq_col + freq_row[:,None])
+            term_c = freq_col * freq_row * (freq_col + freq_row)
+            var_kappa0 = (agree_exp + agree_exp**2 - term_c.sum())
+            var_kappa0 /= (1 - agree_exp)**2 * nobs
+
+    else:
+        if weights is None:
+            weights = np.arange(table.shape[0])
+        #weights follows the Wikipedia definition, not the SAS, which is 1 -
+        kind = 'Weighted'
+        weights = np.asarray(weights, float)
+        if weights.ndim == 1:
+            if wt in ['ca', 'linear', None]:
+                weights = np.abs(weights[:, None] - weights) /  \
+                           (weights[-1] - weights[0])
+            elif wt in ['fc', 'quadratic']:
+                weights = (weights[:, None] - weights)**2 /  \
+                           (weights[-1] - weights[0])**2
+            elif wt == 'toeplitz':
+                #assume toeplitz structure
+                from scipy.linalg import toeplitz
+                #weights = toeplitz(np.arange(table.shape[0]))
+                weights = toeplitz(weights)
+            else:
+                raise ValueError('wt option is not known')
+        else:
+            rows, cols = table.shape
+            if (table.shape != weights.shape):
+                raise ValueError('weights are not square')
+        #this is formula from Wikipedia
+        kappa = 1 - (weights * table).sum() / nobs / (weights * prob_exp).sum()
+        #TODO: add var_kappa for weighted version
+        if return_results:
+            var_kappa = np.nan
+            var_kappa0 = np.nan
+            #switch to SAS manual weights, problem if user specifies weights
+            #w is negative in some examples,
+            #but weights is scale invariant in examples and rough check of source
+            w = 1. - weights
+            w_row = (freq_col * w).sum(1)
+            w_col = (freq_row[:, None] * w).sum(0)
+            agree_wexp = (w * freq_col * freq_row[:, None]).sum()
+            term_a = freqs * (w -  (w_col + w_row[:, None]) * (1 - kappa))**2
+            fac = 1. / ((1 - agree_wexp)**2 * nobs)
+            var_kappa = term_a.sum() - (kappa - agree_wexp * (1 - kappa))**2
+            var_kappa *=  fac
+
+            freqse = freq_col * freq_row[:, None]
+            var_kappa0 = (freqse * (w -  (w_col + w_row[:, None]))**2).sum()
+            var_kappa0 -= agree_wexp**2
+            var_kappa0 *=  fac
+
+    kappa_max = (np.minimum(freq_row, freq_col).sum() - agree_exp) / \
+                (1 - agree_exp)
+
+    if return_results:
+        res = KappaResults( kind=kind,
+                    kappa=kappa,
+                    kappa_max=kappa_max,
+                    weights=weights,
+                    var_kappa=var_kappa,
+                    var_kappa0=var_kappa0)
+        return res
+    else:
+        return kappa
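
And a sketch of to_table plus cohens_kappa for two raters; with return_results=True (the default) the result is a KappaResults bunch carrying kappa, its standard error, and the test against zero (data simulated for illustration):

    import numpy as np
    from statsmodels.stats.inter_rater import cohens_kappa, to_table

    rng = np.random.default_rng(1)
    rater1 = rng.integers(0, 3, size=100)
    agree = rng.random(100) < 0.7
    rater2 = np.where(agree, rater1, rng.integers(0, 3, size=100))
    table, bins = to_table(np.column_stack([rater1, rater2]))
    res = cohens_kappa(table)
    print(res.kappa, res.std_kappa, res.pvalue_two_sided)
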
+
+
+_kappa_template = '''\
+                  %(kind)s Kappa Coefficient
               --------------------------------
               Kappa                     %(kappa)6.4f
               ASE                       %(std_kappa)6.4f
@@ -289,8 +448,9 @@ _kappa_template = """                  %(kind)s Kappa Coefficient
               Z                         %(z_value)6.4f
               One-sided Pr >  Z         %(pvalue_one_sided)6.4f
               Two-sided Pr > |Z|        %(pvalue_two_sided)6.4f
-"""
-"""
+'''
+
+'''
                    Weighted Kappa Coefficient
               --------------------------------
               Weighted Kappa            0.4701
@@ -304,11 +464,11 @@ _kappa_template = """                  %(kind)s Kappa Coefficient
               Z                         3.2971
               One-sided Pr >  Z         0.0005
               Two-sided Pr > |Z|        0.0010
-"""
+'''


 class KappaResults(ResultsBunch):
-    """Results for Cohen's kappa
+    '''Results for Cohen's kappa

     Attributes
     ----------
@@ -330,8 +490,30 @@ class KappaResults(ResultsBunch):
     The confidence interval for kappa and the statistics for the test of
     H0: kappa=0 are based on the asymptotic normal distribution of kappa.

-    """
+    '''
+
     template = _kappa_template

+    def _initialize(self):
+        if 'alpha' not in self:
+            self['alpha'] = 0.025
+            self['alpha_ci'] = _int_ifclose(100 - 0.025 * 200)[1]
+
+        self['std_kappa'] = np.sqrt(self['var_kappa'])
+        self['std_kappa0'] = np.sqrt(self['var_kappa0'])
+
+        self['z_value'] = self['kappa'] / self['std_kappa0']
+
+        self['pvalue_one_sided'] = stats.norm.sf(self['z_value'])
+        self['pvalue_two_sided'] = stats.norm.sf(np.abs(self['z_value'])) * 2
+
+        delta = stats.norm.isf(self['alpha']) * self['std_kappa']
+        self['kappa_low'] = self['kappa'] - delta
+        self['kappa_upp'] = self['kappa'] + delta
+        self['distribution_kappa'] = stats.norm(loc=self['kappa'],
+                                                scale=self['std_kappa'])
+        self['distribution_zero_null'] = stats.norm(loc=0,
+                                                scale=self['std_kappa0'])
+
     def __str__(self):
         return self.template % self
diff --git a/statsmodels/stats/knockoff_regeffects.py b/statsmodels/stats/knockoff_regeffects.py
index 5d7934f46..d3e575c0c 100644
--- a/statsmodels/stats/knockoff_regeffects.py
+++ b/statsmodels/stats/knockoff_regeffects.py
@@ -20,6 +20,9 @@ class RegressionEffects:
     to the strength of the estimated association for coefficient p+j.
     """

+    def stats(self, parent):
+        raise NotImplementedError
+

 class CorrelationEffects(RegressionEffects):
     """
@@ -39,6 +42,11 @@ class CorrelationEffects(RegressionEffects):
     paper.
     """

+    def stats(self, parent):
+        s1 = np.dot(parent.exog1.T, parent.endog)
+        s2 = np.dot(parent.exog2.T, parent.endog)
+        return np.abs(s1) - np.abs(s2)
+

 class ForwardEffects(RegressionEffects):
     """
@@ -68,6 +76,28 @@ class ForwardEffects(RegressionEffects):
     def __init__(self, pursuit):
         self.pursuit = pursuit

+    def stats(self, parent):
+        nvar = parent.exog.shape[1]
+        rv = parent.endog.copy()
+        vl = [(i, parent.exog[:, i]) for i in range(nvar)]
+        z = np.empty(nvar)
+        past = []
+        for i in range(nvar):
+            dp = np.r_[[np.abs(np.dot(rv, x[1])) for x in vl]]
+            j = np.argmax(dp)
+            z[vl[j][0]] = nvar - i - 1
+            x = vl[j][1]
+            del vl[j]
+            if self.pursuit:
+                for v in past:
+                    x -= np.dot(x, v)*v
+                past.append(x)
+            rv -= np.dot(rv, x) * x
+        z1 = z[0:nvar//2]
+        z2 = z[nvar//2:]
+        st = np.where(z1 > z2, z1, z2) * np.sign(z1 - z2)
+        return st
+

 class OLSEffects(RegressionEffects):
     """
@@ -87,6 +117,15 @@ class OLSEffects(RegressionEffects):
     paper.
     """

+    def stats(self, parent):
+        from statsmodels.regression.linear_model import OLS
+
+        model = OLS(parent.endog, parent.exog)
+        result = model.fit()
+        q = len(result.params) // 2
+        stats = np.abs(result.params[0:q]) - np.abs(result.params[q:])
+        return stats
+

 class RegModelEffects(RegressionEffects):
     """
@@ -109,8 +148,18 @@ class RegModelEffects(RegressionEffects):
     """

     def __init__(self, model_cls, regularized=False, model_kws=None,
-        fit_kws=None):
+                 fit_kws=None):
         self.model_cls = model_cls
         self.regularized = regularized
         self.model_kws = model_kws if model_kws is not None else {}
         self.fit_kws = fit_kws if fit_kws is not None else {}
+
+    def stats(self, parent):
+        model = self.model_cls(parent.endog, parent.exog, **self.model_kws)
+        if self.regularized:
+            params = model.fit_regularized(**self.fit_kws).params
+        else:
+            params = model.fit(**self.fit_kws).params
+        q = len(params) // 2
+        stats = np.abs(params[0:q]) - np.abs(params[q:])
+        return stats
diff --git a/statsmodels/stats/libqsturng/make_tbls.py b/statsmodels/stats/libqsturng/make_tbls.py
index 569ace0ab..33bc88c05 100644
--- a/statsmodels/stats/libqsturng/make_tbls.py
+++ b/statsmodels/stats/libqsturng/make_tbls.py
@@ -4,9 +4,25 @@ from statsmodels.compat.python import lrange, lmap
 import math
 import scipy.stats
 from scipy.optimize import leastsq
+
 import numpy as np
 from numpy.random import random
-q0100 = """2 0.2010022 0.6351172 0.9504689 1.179321 1.354691 1.495126 1.611354 1.709984 1.795325 1.87032 1.937057 1.997068 2.051505 2.101256 2.147016 2.189342 2.228683 2.265408 2.299823 2.558612 2.729718 2.95625 3.184742938 3.398609188
+
+# The values for p in [.5, .75, .9, .95, .975, .99, .995, .999]
+# were pulled from:
+#    http://www.stata.com/stb/stb46/dm64/sturng.pdf
+#
+# Values for p in [.1, .675, .8, .85] were calculated using R's qtukey function
+#
+# the table was programmed by Gleason and extends Harter's (1960) table
+# using the Copenhaver & Holland (1988) algorithm (C&H). Gleason found
+# that the 4th significant digit of the C&H differed from Harter's
+# tables on about 20% of the values. Gleason states this was due to
+# conservative rounding by Harter. In those cases the table reflects
+# Harter's original approximations.
+
+q0100 = """\
+2 0.2010022 0.6351172 0.9504689 1.179321 1.354691 1.495126 1.611354 1.709984 1.795325 1.87032 1.937057 1.997068 2.051505 2.101256 2.147016 2.189342 2.228683 2.265408 2.299823 2.558612 2.729718 2.95625 3.184742938 3.398609188
 3 0.193179 0.6294481 0.9564746 1.19723 1.383028 1.532369 1.656225 1.761451 1.852559 1.93265 2.003933 2.068034 2.126178 2.179312 2.228177 2.273367 2.315364 2.354561 2.391287 2.667213 2.849389 3.009265469 3.237758406 3.451624656
 4 0.1892648 0.6266441 0.9606115 1.2089 1.401557 1.55691 1.686009 1.795829 1.890994 1.974697 2.049222 2.116253 2.177065 2.232641 2.283754 2.331023 2.37495 2.415949 2.454361 2.742846 2.933173 3.062280938 3.290773875 3.504640125
 5 0.1869239 0.6249713 0.963532 1.217021 1.414548 1.574255 1.707205 1.820437 1.91864 2.005066 2.082048 2.151312 2.214162 2.271609 2.324449 2.37332 2.418737 2.461128 2.500844 2.7991 2.9958 3.115296406 3.343789344 3.557655594
@@ -31,7 +47,9 @@ q0100 = """2 0.2010022 0.6351172 0.9504689 1.179321 1.354691 1.495126 1.611354 1
 60 0.1784658 0.6188994 0.9777482 1.256759 1.480212 1.664639 1.820636 1.955191 2.07309 2.177731 2.271599 2.356562 2.434053 2.505196 2.570884 2.631843 2.688663 2.745483 2.802303 3.159123 3.406221516 3.626467688 3.90118925 4.1497775
 120 0.1780885 0.6186256 0.9785495 1.259052 1.484147 1.67025 1.827902 1.964066 2.083518 2.189653 2.284954 2.371292 2.450102 2.522511 2.589417 2.651546 2.71243375 2.763748 2.8156585 3.201588 3.459897 3.67055425 3.9555195 4.20683
 1e38 0.177712 0.6183521 0.9793662 1.261398 1.488195 1.676051 1.835449 1.973327 2.094446 2.202195 2.299057 2.386902 2.467168 2.540983 2.609248 2.677513 2.745778 2.787396 2.829014 3.236691 3.487830797 3.721309063 4.01874075 4.279424"""
-q0300 = """2 0.6289521 1.248281 1.638496 1.916298 2.129504 2.301246 2.444313 2.566465 2.672747 2.766604 2.850494 2.926224 2.995161 3.05836 3.116655 3.170712 3.221076 3.268192 3.312433 3.647666 3.871606 4.170521 4.372227 4.52341
+
+q0300 = """\
+2 0.6289521 1.248281 1.638496 1.916298 2.129504 2.301246 2.444313 2.566465 2.672747 2.766604 2.850494 2.926224 2.995161 3.05836 3.116655 3.170712 3.221076 3.268192 3.312433 3.647666 3.871606 4.170521 4.372227 4.52341
 3 0.5999117 1.209786 1.598235 1.875707 2.088948 2.260822 2.404037 2.52633 2.632732 2.726691 2.810665 2.886463 2.955453 3.018695 3.077022 3.131103 3.181483 3.228609 3.272854 3.607961 3.831649 4.130021 4.331231 4.48198
 4 0.5857155 1.19124 1.579749 1.858059 2.072245 2.245015 2.389043 2.512062 2.619115 2.713659 2.798159 2.874434 2.943859 3.007497 3.066189 3.120607 3.171298 3.218713 3.263228 3.600301 3.825208 4.125075 4.327208 4.478602
 5 0.5773226 1.18033 1.569213 1.84843 2.063579 2.237255 2.382108 2.505872 2.613597 2.708749 2.793803 2.870583 2.94047 3.004536 3.063622 3.118406 3.169439 3.217173 3.261987 3.601296 3.827646 4.129353 4.332665 4.48491
@@ -56,7 +74,9 @@ q0300 = """2 0.6289521 1.248281 1.638496 1.916298 2.129504 2.301246 2.444313 2.5
 60 0.5475416 1.141671 1.53435 1.820353 2.042846 2.223698 2.375337 2.505445 2.619082 2.719743 2.809939 2.891531 2.965932 3.034243 3.097332 3.1559 3.210518 3.261657 3.30971 3.674738 3.919203 4.245861 4.438669 4.631477
 120 0.5462314 1.139963 1.532911 1.819385 2.04241 2.223806 2.375978 2.506602 2.620733 2.721868 2.812516 2.894541 2.969356 3.038064 3.101534 3.160468 3.215439 3.39400025 3.5725615 3.75112275 3.929684 4.256342 4.44915 4.641958
 1e38 0.5449254 1.138259 1.531485 1.818447 2.042028 2.223993 2.376728 2.507898 2.622556 2.724195 2.815328 2.897817 2.973079 3.042215 3.106097 3.165428 3.220399 3.39896025 3.5775215 3.75608275 3.934644 4.261302 4.45411 4.646918"""
-q0500 = """2 1.155 1.908 2.377 2.713 2.973 3.184 3.361 3.513 3.645 3.762 3.867 3.963 4.049 4.129 4.203 4.271 4.335 4.394 4.451 4.878 5.165 5.549 5.810 6.006
+
+q0500 = """\
+2 1.155 1.908 2.377 2.713 2.973 3.184 3.361 3.513 3.645 3.762 3.867 3.963 4.049 4.129 4.203 4.271 4.335 4.394 4.451 4.878 5.165 5.549 5.810 6.006
 3 1.082 1.791 2.230 2.545 2.789 2.986 3.152 3.294 3.418 3.528 3.626 3.715 3.796 3.871 3.940 4.004 4.064 4.120 4.172 4.573 4.842 5.202 5.447 5.630
 4 1.048 1.736 2.163 2.468 2.704 2.895 3.055 3.193 3.313 3.419 3.515 3.601 3.680 3.752 3.819 3.881 3.939 3.993 4.044 4.432 4.693 5.043 5.279 5.457
 5 1.028 1.705 2.124 2.423 2.655 2.843 3.000 3.135 3.253 3.357 3.451 3.535 3.613 3.684 3.749 3.810 3.867 3.920 3.970 4.351 4.608 4.951 5.184 5.358
@@ -81,7 +101,9 @@ q0500 = """2 1.155 1.908 2.377 2.713 2.973 3.184 3.361 3.513 3.645 3.762 3.867 3
 60 .9597 1.597 1.990 2.270 2.486 2.661 2.808 2.933 3.043 3.140 3.227 3.306 3.378 3.444 3.505 3.562 3.615 3.665 3.711 4.067 4.306 4.627 4.845 5.009
 120 .9568 1.592 1.984 2.263 2.479 2.653 2.799 2.924 3.034 3.130 3.217 3.296 3.367 3.433 3.494 3.550 3.603 3.652 3.699 4.052 4.290 4.610 4.827 4.990
 1e38 .9539 1.588 1.978 2.257 2.472 2.645 2.791 2.915 3.024 3.121 3.207 3.285 3.356 3.422 3.482 3.538 3.591 3.640 3.686 4.037 4.274 4.591 4.806 4.968"""
-q0675 = """2 1.829602 2.751705 3.332700 3.754119 4.082579 4.350351 4.575528 4.769258 4.938876 5.089456 5.22465 5.347168 5.459072 5.56197 5.657136 5.745596 5.828188 5.905606 5.978428 6.534036 6.908522 7.411898 7.753537 8.010516
+
+q0675 = """\
+2 1.829602 2.751705 3.332700 3.754119 4.082579 4.350351 4.575528 4.769258 4.938876 5.089456 5.22465 5.347168 5.459072 5.56197 5.657136 5.745596 5.828188 5.905606 5.978428 6.534036 6.908522 7.411898 7.753537 8.010516
 3 1.660743 2.469725 2.973973 3.338757 3.622958 3.854718 4.049715 4.217574 4.364624 4.495236 4.612559 4.718926 4.816117 4.905518 4.988228 5.065133 5.136955 5.204295 5.267653 5.751485 6.07799 6.517299 6.815682 7.040219
 4 1.585479 2.344680 2.814410 3.153343 3.417165 3.632254 3.813232 3.96905 4.105579 4.226877 4.335857 4.434684 4.525004 4.608102 4.684995 4.756504 4.823298 4.885932 4.944872 5.395226 5.699385 6.108899 6.387203 6.596702
 5 1.543029 2.27426 2.72431 3.048331 3.300303 3.505645 3.678397 3.827131 3.957462 4.073264 4.177319 4.27169 4.35795 4.437321 4.510774 4.579091 4.642911 4.702763 4.759089 5.189651 5.480611 5.872552 6.139026 6.339673
@@ -106,7 +128,9 @@ q0675 = """2 1.829602 2.751705 3.332700 3.754119 4.082579 4.350351 4.575528 4.76
 60 1.40343 2.042626 2.425459 2.696611 2.905244 3.074043 3.215327 3.336516 3.442417 3.536316 3.620555 3.696861 3.766541 3.830609 3.889866 3.944956 3.996401 4.044636 4.09002 4.436878 4.671454 4.987998 5.203693 5.366394
 120 1.397651 2.033010 2.412913 2.681639 2.888185 3.055146 3.194785 3.314484 3.419022 3.511665 3.59474 3.66996 3.738623 3.801735 3.86009 3.914325 3.96496 4.012423 4.057072 4.398008 4.628308 4.938805 5.150236 5.309666
 1e38 1.391918 2.023469 2.400447 2.666735 2.871167 3.036254 3.174203 3.292360 3.395479 3.486805 3.568651 3.642718 3.710296 3.772381 3.829761 3.883069 3.93282 3.979437 4.023276 4.357546 4.582861 4.886029 5.092081 5.247256"""
-q0750 = """2 2.267583 3.308014 3.969236 4.451126 4.82785 5.13561 5.394819 5.618097 5.813776 5.987632 6.143829 6.285461 6.41489 6.533954 6.644113 6.746546 6.842214 6.931913 7.01631 7.660853 8.09584 8.68119 9.0788 9.377929
+
+q0750 = """\
+2 2.267583 3.308014 3.969236 4.451126 4.82785 5.13561 5.394819 5.618097 5.813776 5.987632 6.143829 6.285461 6.41489 6.533954 6.644113 6.746546 6.842214 6.931913 7.01631 7.660853 8.09584 8.68119 9.0788 9.377929
 3 2.011896 2.883775 3.431223 3.829258 4.140443 4.394852 4.609323 4.794233 4.956425 5.10064 5.230299 5.347941 5.455509 5.554514 5.646158 5.73141 5.811064 5.885775 5.956093 6.493827 6.857365 7.3472 7.680302 7.931152
 4 1.901267 2.701018 3.198596 3.559322 3.841087 4.071417 4.265624 4.433118 4.580085 4.710812 4.828384 4.935098 5.032703 5.122566 5.205771 5.283192 5.355547 5.423427 5.48733 5.976418 6.307462 6.753955 7.057827 7.286775
 5 1.839820 2.599651 3.069171 3.40865 3.673526 3.889955 4.072422 4.229795 4.367901 4.490764 4.601285 4.701617 4.793402 4.877922 4.956192 5.029034 5.09712 5.161005 5.221154 5.681792 5.993844 6.415033 6.70187 6.918073
@@ -131,7 +155,9 @@ q0750 = """2 2.267583 3.308014 3.969236 4.451126 4.82785 5.13561 5.394819 5.6180
 60 1.642744 2.274622 2.650486 2.916339 3.120874 3.286413 3.425034 3.544004 3.648023 3.740302 3.823131 3.898196 3.966776 4.029861 4.088234 4.142523 4.19324 4.24081 4.285584 4.628295 4.860604 5.174794 5.389348 5.551435
 120 1.634753 2.261421 2.633285 2.895829 3.097525 3.260567 3.396959 3.513912 3.616089 3.706672 3.787929 3.861531 3.92874 3.990536 4.047692 4.10083 4.150455 4.196985 4.240768 4.575490 4.802013 5.107977 5.316696 5.474283
 1e38 1.626840 2.248346 2.616224 2.875451 3.074279 3.234786 3.368898 3.483775 3.584045 3.672862 3.752475 3.824535 3.890294 3.950721 4.006580 4.058483 4.106932 4.152338 4.195044 4.520933 4.740866 5.037152 5.238766 5.390726"""
-q0800 = """2 2.666345 3.820436 4.558532 5.098158 5.520848 5.866626 6.158145 6.409446 6.62982 6.825717 7.001791 7.161505 7.307502 7.441845 7.566171 7.681802 7.789818 7.891113 7.986436 8.714887 9.206808 9.868718 10.31830 10.65683
+
+q0800 = """\
+2 2.666345 3.820436 4.558532 5.098158 5.520848 5.866626 6.158145 6.409446 6.62982 6.825717 7.001791 7.161505 7.307502 7.441845 7.566171 7.681802 7.789818 7.891113 7.986436 8.714887 9.206808 9.868718 10.31830 10.65683
 3 2.316120 3.245426 3.832597 4.261107 4.596942 4.871989 5.104169 5.304561 5.480484 5.637021 5.777843 5.905682 6.022626 6.130305 6.230013 6.322797 6.409513 6.49087 6.567462 7.153711 7.55053 8.085698 8.449862 8.724212
 4 2.168283 3.003795 3.52645 3.90676 4.204595 4.44853 4.654519 4.832388 4.988615 5.127694 5.25287 5.366554 5.470593 5.566425 5.655195 5.737827 5.815079 5.887577 5.955847 6.47896 6.833568 7.31242 7.638648 7.884592
 5 2.087215 2.871505 3.358337 3.711564 3.987876 4.214094 4.405111 4.57007 4.714986 4.844026 4.960193 5.065723 5.162321 5.25132 5.333779 5.410553 5.482342 5.549725 5.613191 6.099852 6.430105 6.876484 7.180827 7.410389
@@ -156,7 +182,9 @@ q0800 = """2 2.666345 3.820436 4.558532 5.098158 5.520848 5.866626 6.158145 6.40
 60 1.832568 2.456435 2.826413 3.088026 3.289358 3.452379 3.588962 3.70624 3.808829 3.899882 3.981645 4.055774 4.123524 4.185868 4.243575 4.297261 4.34743 4.394499 4.438814 4.778404 5.009002 5.321406 5.535087 5.696698
 120 1.822478 2.439890 2.804980 3.062567 3.260456 3.420458 3.55435 3.669198 3.769570 3.858583 3.938458 4.010829 4.076934 4.137731 4.193979 4.246285 4.295145 4.340968 4.384094 4.714106 4.937761 5.24027 5.446912 5.603078
 1e38 1.812388 2.423529 2.783758 3.037317 3.231739 3.388684 3.519834 3.632192 3.73028 3.817183 3.895093 3.965627 4.030005 4.089173 4.143877 4.194716 4.242179 4.286668 4.328517 4.648069 4.863937 5.155024 5.353283 5.50281"""
-q0850 = """2 3.226562 4.548022 5.398759 6.022701 6.512387 6.913502 7.251997 7.54401 7.800236 8.028116 8.233021 8.418953 8.588968 8.74545 8.890294 9.02503 9.150913 9.268977 9.380094 10.22972 10.80450 11.58094 12.11086 12.51097
+
+q0850 = """\
+2 3.226562 4.548022 5.398759 6.022701 6.512387 6.913502 7.251997 7.54401 7.800236 8.028116 8.233021 8.418953 8.588968 8.74545 8.890294 9.02503 9.150913 9.268977 9.380094 10.22972 10.80450 11.58094 12.11086 12.51097
 3 2.721399 3.731515 4.374509 4.845675 5.215912 5.5197 5.776502 5.998388 6.193356 6.366968 6.523249 6.665198 6.795111 6.914781 7.025634 7.128823 7.225292 7.315823 7.401073 8.054202 8.496827 9.094477 9.501702 9.808753
 4 2.514747 3.399285 3.956491 4.363675 4.68348 4.945965 5.16798 5.359938 5.52872 5.679113 5.814574 5.937683 6.050411 6.154302 6.25058 6.34024 6.424092 6.502812 6.576963 7.145835 7.532079 8.054293 8.410406 8.679063
 5 2.403262 3.220436 3.730867 4.102766 4.394545 4.633955 4.836465 5.011596 5.165628 5.302922 5.426626 5.539086 5.642096 5.737057 5.825086 5.907085 5.983793 6.055822 6.123687 6.644817 6.999123 7.478735 7.806159 8.05333
@@ -181,7 +209,9 @@ q0850 = """2 3.226562 4.548022 5.398759 6.022701 6.512387 6.913502 7.251997 7.54
 60 2.062208 2.674759 3.037417 3.293931 3.491470 3.651535 3.785736 3.901046 4.001976 4.091607 4.172136 4.245183 4.311975 4.373463 4.430401 4.483391 4.532928 4.579419 4.623205 4.95919 5.187807 5.498134 5.710792 5.871841
 120 2.048920 2.653534 3.010189 3.261791 3.455148 3.611561 3.742514 3.854896 3.953159 4.040341 4.118605 4.189543 4.254363 4.314 4.369191 4.42053 4.4685 4.513501 4.555864 4.880396 5.100706 5.399172 5.60337 5.757859
 1e38 2.035805 2.632586 2.983286 3.229990 3.419154 3.571884 3.699544 3.808945 3.904479 3.989143 4.065068 4.133821 4.19659 4.254292 4.307653 4.357255 4.403572 4.446994 4.487848 4.800043 5.011193 5.296241 5.4906 5.637297"""
-q0900 = """1 8.929 13.44 16.36 18.49 20.15 21.51 22.64 23.62 24.48 25.24 25.92 26.54 27.10 27.62 28.10 28.54 28.96 29.35 29.71 32.50 34.38 36.91 38.62 39.91
+
+q0900 = """\
+1 8.929 13.44 16.36 18.49 20.15 21.51 22.64 23.62 24.48 25.24 25.92 26.54 27.10 27.62 28.10 28.54 28.96 29.35 29.71 32.50 34.38 36.91 38.62 39.91
 2 4.129 5.733 6.773 7.538 8.139 8.633 9.049 9.409 9.725 10.01 10.26 10.49 10.70 10.89 11.07 11.24 11.39 11.54 11.68 12.73 13.44 14.40 15.04 15.54
 3 3.328 4.467 5.199 5.738 6.162 6.511 6.806 7.062 7.287 7.487 7.667 7.831 7.982 8.120 8.248 8.368 8.479 8.584 8.683 9.440 9.954 10.65 11.12 11.48
 4 3.015 3.976 4.586 5.035 5.388 5.679 5.926 6.139 6.327 6.494 6.645 6.783 6.909 7.025 7.132 7.233 7.326 7.414 7.497 8.135 8.569 9.156 9.558 9.861
@@ -207,7 +237,9 @@ q0900 = """1 8.929 13.44 16.36 18.49 20.15 21.51 22.64 23.62 24.48 25.24 25.92 2
 60 2.363 2.959 3.312 3.562 3.755 3.911 4.042 4.155 4.254 4.342 4.421 4.493 4.558 4.619 4.675 4.727 4.775 4.821 4.864 5.196 5.422 5.730 5.941 6.101
 120 2.344 2.930 3.276 3.520 3.707 3.859 3.986 4.096 4.191 4.276 4.353 4.422 4.485 4.543 4.597 4.647 4.694 4.738 4.779 5.097 5.313 5.606 5.808 5.960
 1e38 2.326 2.902 3.240 3.478 3.661 3.808 3.931 4.037 4.129 4.211 4.285 4.351 4.412 4.468 4.519 4.568 4.612 4.654 4.694 4.997 5.202 5.480 5.669 5.812"""
-q0950 = """1 17.97 26.98 32.82 37.08 40.41 43.12 45.40 47.36 49.07 50.59 51.96 53.20 54.33 55.36 56.32 57.22 58.04 58.83 59.56 65.15 68.92 73.97 77.40 79.98
+
+q0950 = """\
+1 17.97 26.98 32.82 37.08 40.41 43.12 45.40 47.36 49.07 50.59 51.96 53.20 54.33 55.36 56.32 57.22 58.04 58.83 59.56 65.15 68.92 73.97 77.40 79.98
 2 6.085 8.331 9.799 10.88 11.73 12.43 13.03 13.54 13.99 14.40 14.76 15.09 15.39 15.65 15.92 16.14 16.38 16.57 16.78 18.27 19.28 20.66 21.59 22.29
 3 4.501 5.910 6.825 7.502 8.037 8.478 8.852 9.177 9.462 9.717 9.946 10.15 10.35 10.52 10.69 10.84 10.98 11.11 11.24 12.21 12.86 13.76 14.36 14.82
 4 3.926 5.040 5.757 6.287 6.706 7.053 7.347 7.602 7.826 8.027 8.208 8.373 8.524 8.664 8.793 8.914 9.027 9.133 9.233 10.00 10.53 11.24 11.73 12.10
@@ -233,7 +265,9 @@ q0950 = """1 17.97 26.98 32.82 37.08 40.41 43.12 45.40 47.36 49.07 50.59 51.96 5
 60 2.829 3.399 3.737 3.977 4.163 4.314 4.441 4.550 4.646 4.732 4.808 4.878 4.942 5.001 5.056 5.107 5.154 5.199 5.241 5.566 5.789 6.093 6.302 6.462
 120 2.800 3.356 3.685 3.917 4.096 4.241 4.363 4.468 4.560 4.641 4.714 4.781 4.842 4.898 4.950 4.998 5.043 5.086 5.126 5.434 5.644 5.929 6.126 6.275
 1e38 2.772 3.314 3.633 3.858 4.030 4.170 4.286 4.387 4.474 4.552 4.622 4.685 4.743 4.796 4.845 4.891 4.934 4.974 5.012 5.301 5.498 5.764 5.947 6.085"""
-q0975 = """1 35.99 54.00 65.69 74.22 80.87 86.29 90.85 94.77 98.20 101.3 104.0 106.5 108.8 110.8 112.7 114.5 116.2 117.7 119.2 130.4 137.9 148.1 154.9 160.0
+
+q0975 = """\
+1 35.99 54.00 65.69 74.22 80.87 86.29 90.85 94.77 98.20 101.3 104.0 106.5 108.8 110.8 112.7 114.5 116.2 117.7 119.2 130.4 137.9 148.1 154.9 160.0
 2 8.776 11.94 14.02 15.54 16.75 17.74 18.58 19.31 19.95 20.52 21.03 21.49 21.91 22.30 22.67 23.01 23.32 23.62 23.89 26.03 27.47 29.42 30.74 31.74
 3 5.907 7.661 8.808 9.659 10.33 10.89 11.36 11.77 12.14 12.46 12.75 13.01 13.25 13.47 13.68 13.87 14.05 14.22 14.38 15.62 16.46 17.58 18.37 18.95
 4 4.943 6.244 7.088 7.715 8.213 8.625 8.975 9.279 9.548 9.788 10.00 10.20 10.38 10.55 10.71 10.85 10.99 11.11 11.23 12.16 12.78 13.65 14.23 14.68
@@ -259,7 +293,9 @@ q0975 = """1 35.99 54.00 65.69 74.22 80.87 86.29 90.85 94.77 98.20 101.3 104.0 1
 60 3.251 3.798 4.124 4.356 4.536 4.682 4.806 4.912 5.006 5.089 5.164 5.232 5.295 5.352 5.406 5.456 5.502 5.546 5.588 5.908 6.127 6.428 6.636 6.795
 120 3.210 3.739 4.053 4.275 4.447 4.587 4.704 4.805 4.894 4.972 5.043 5.107 5.166 5.221 5.271 5.318 5.362 5.403 5.442 5.741 5.946 6.225 6.418 6.564
 1e38 3.170 3.682 3.984 4.197 4.361 4.494 4.605 4.700 4.784 4.858 4.925 4.985 5.041 5.092 5.139 5.183 5.224 5.262 5.299 5.577 5.766 6.023 6.199 6.333"""
-q0990 = """1 90.02 135.0 164.3 185.6 202.2 215.8 227.2 237.0 245.6 253.2 260.0 266.2 271.8 277.0 281.8 286.3 290.4 294.3 298.0 326.0 344.8 370.1 387.3 400.1
+
+q0990 = """\
+1 90.02 135.0 164.3 185.6 202.2 215.8 227.2 237.0 245.6 253.2 260.0 266.2 271.8 277.0 281.8 286.3 290.4 294.3 298.0 326.0 344.8 370.1 387.3 400.1
 2 14.04 19.02 22.29 24.72 26.63 28.20 29.53 30.68 31.69 32.59 33.40 34.13 34.81 35.43 36.00 36.53 37.03 37.50 37.95 41.32 43.61 46.70 48.80 50.38
 3 8.260 10.62 12.17 13.32 14.24 15.00 15.65 16.21 16.69 17.13 17.53 17.89 18.22 18.52 18.81 19.07 19.32 19.55 19.77 21.44 22.59 24.13 25.19 25.99
 4 6.511 8.120 9.173 9.958 10.58 11.10 11.54 11.92 12.26 12.57 12.84 13.09 13.32 13.53 13.72 13.91 14.08 14.24 14.39 15.57 16.38 17.46 18.20 18.77
@@ -285,7 +321,9 @@ q0990 = """1 90.02 135.0 164.3 185.6 202.2 215.8 227.2 237.0 245.6 253.2 260.0 2
 60 3.762 4.282 4.594 4.818 4.991 5.133 5.253 5.356 5.447 5.528 5.601 5.667 5.728 5.784 5.837 5.886 5.931 5.974 6.015 6.329 6.546 6.843 7.049 7.207
 120 3.702 4.200 4.497 4.708 4.872 5.005 5.118 5.214 5.299 5.375 5.443 5.505 5.561 5.614 5.662 5.708 5.750 5.790 5.827 6.117 6.316 6.588 6.776 6.919
 1e38 3.643 4.120 4.403 4.603 4.757 4.882 4.987 5.078 5.157 5.227 5.290 5.348 5.400 5.448 5.493 5.535 5.574 5.611 5.645 5.911 6.092 6.338 6.507 6.636"""
-q0995 = """1 180.1 270.1 328.5 371.2 404.4 431.6 454.4 474.0 491.1 506.3 520.0 532.4 543.6 554.0 563.6 572.5 580.9 588.7 596.0 652.0 689.6 740.2 774.5 800.3
+
+q0995 = """\
+1 180.1 270.1 328.5 371.2 404.4 431.6 454.4 474.0 491.1 506.3 520.0 532.4 543.6 554.0 563.6 572.5 580.9 588.7 596.0 652.0 689.6 740.2 774.5 800.3
 2 19.92 26.97 31.60 35.02 37.73 39.95 41.83 43.46 44.89 46.16 47.31 48.35 49.30 50.17 50.99 51.74 52.45 53.12 53.74 58.52 61.76 66.13 69.10 71.35
 3 10.54 13.51 15.45 16.91 18.06 19.01 19.83 20.53 21.15 21.70 22.20 22.66 23.08 23.46 23.82 24.15 24.46 24.76 25.03 27.15 28.60 30.55 31.88 32.90
 4 7.916 9.813 11.06 11.99 12.74 13.35 13.88 14.33 14.74 15.10 15.42 15.72 15.99 16.24 16.47 16.70 16.90 17.09 17.28 18.68 19.63 20.93 21.83 22.50
@@ -311,7 +349,9 @@ q0995 = """1 180.1 270.1 328.5 371.2 404.4 431.6 454.4 474.0 491.1 506.3 520.0 5
 60 4.122 4.625 4.928 5.146 5.316 5.454 5.571 5.673 5.762 5.841 5.913 5.979 6.039 6.094 6.146 6.194 6.239 6.281 6.321 6.632 6.846 7.142 7.347 7.504
 120 4.044 4.523 4.809 5.013 5.172 5.301 5.410 5.504 5.586 5.660 5.726 5.786  5.842 5.893 5.940 5.984 6.025 6.064 6.101 6.384 6.579 6.846 7.031 7.172
 1e38 3.970 4.424 4.694 4.886 5.033 5.154 5.255 5.341 5.418 5.485 5.546 5.602  5.652 5.699 5.742 5.783 5.820 5.856 5.889 6.146 6.322 6.561 6.725 6.850"""
-q0999 = """1 900.3 1351. 1643. 1856. 2022. 2158. 2272. 2370. 2455. 2532. 2600. 2662. 2718. 2770. 2818. 2863. 2904. 2943. 2980. 3260. 3448. 3701. 3873. 4002.
+
+q0999 = """\
+1 900.3 1351. 1643. 1856. 2022. 2158. 2272. 2370. 2455. 2532. 2600. 2662. 2718. 2770. 2818. 2863. 2904. 2943. 2980. 3260. 3448. 3701. 3873. 4002.
 2 44.69 60.42 70.77 78.43 84.49 89.46 93.67 97.30 100.5 103.3 105.9 108.2 110.4 112.3 114.2 115.9 117.4 118.9 120.3 131.0 138.3 148.0 154.7 159.7
 3 18.28 23.32 26.65 29.13 31.11 32.74 34.12 35.33 36.39 37.34 38.20 38.98 39.69 40.35 40.97 41.54 42.07 42.58 43.05 46.68 49.16 52.51 54.81 56.53
 4 12.18 14.98 16.84 18.23 19.34 20.26 21.04 21.73 22.33 22.87 23.36 23.81 24.21 24.59 24.94 25.27 25.58 25.87 26.14 28.24 29.68 31.65 32.98 34.00
@@ -337,36 +377,88 @@ q0999 = """1 900.3 1351. 1643. 1856. 2022. 2158. 2272. 2370. 2455. 2532. 2600. 2
 60 4.893 5.365 5.653 5.860 6.022 6.155 6.268 6.365 6.451 6.528 6.598 6.661 6.720 6.773 6.824 6.870 6.914 6.956 6.995 7.299 7.510 7.802 8.005 8.161
 120 4.771 5.211 5.476 5.667 5.815 5.937 6.039 6.128 6.206 6.275 6.338 6.395 6.448 6.496 6.541 6.583 6.623 6.660 6.695 6.966 7.153 7.410 7.589 7.726
 1e38 4.654 5.063 5.309 5.484 5.619 5.730 5.823 5.903 5.973 6.036 6.092 6.144 6.191 6.234 6.274 6.312 6.347 6.380 6.411 6.651 6.816 7.041 7.196 7.314"""
-T = dict([(0.1, dict([(float(L.split()[0]), lmap(float, L.split()[1:])) for
-    L in q0100.split('\n')])), (0.5, dict([(float(L.split()[0]), lmap(float,
-    L.split()[1:])) for L in q0500.split('\n')])), (0.675, dict([(float(L.
-    split()[0]), lmap(float, L.split()[1:])) for L in q0675.split('\n')])),
-    (0.75, dict([(float(L.split()[0]), lmap(float, L.split()[1:])) for L in
-    q0750.split('\n')])), (0.8, dict([(float(L.split()[0]), lmap(float, L.
-    split()[1:])) for L in q0800.split('\n')])), (0.85, dict([(float(L.
-    split()[0]), lmap(float, L.split()[1:])) for L in q0850.split('\n')])),
-    (0.9, dict([(float(L.split()[0]), lmap(float, L.split()[1:])) for L in
-    q0900.split('\n')])), (0.95, dict([(float(L.split()[0]), lmap(float, L.
-    split()[1:])) for L in q0950.split('\n')])), (0.975, dict([(float(L.
-    split()[0]), lmap(float, L.split()[1:])) for L in q0975.split('\n')])),
-    (0.99, dict([(float(L.split()[0]), lmap(float, L.split()[1:])) for L in
-    q0990.split('\n')])), (0.995, dict([(float(L.split()[0]), lmap(float, L
-    .split()[1:])) for L in q0995.split('\n')])), (0.999, dict([(float(L.
-    split()[0]), lmap(float, L.split()[1:])) for L in q0999.split('\n')]))])
-R = dict(zip([2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 
-    19, 20, 30, 40, 60, 80, 100], lrange(24)))
+
+# Build the T+ 'matrix'
+# T is a dict of dicts of lists
+
+#                 [alpha keys]        [v keys]
+#                   [table values as lists of floats]
+T = dict([(0.100, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0100.split('\n')])),
+          (0.500, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0500.split('\n')])),
+          (0.675, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0675.split('\n')])),
+          (0.750, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0750.split('\n')])),
+          (0.800, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0800.split('\n')])),
+          (0.850, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0850.split('\n')])),
+          (0.900, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0900.split('\n')])),
+          (0.950, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0950.split('\n')])),
+          (0.975, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0975.split('\n')])),
+          (0.990, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0990.split('\n')])),
+          (0.995, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0995.split('\n')])),
+          (0.999, dict([(float(L.split()[0]),
+                         lmap(float, L.split()[1:])) for L in q0999.split('\n')]))])
+
+# This dict maps r values to the correct list index
+R = dict(zip([2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,
+                     17,18,19,20,30,40,60,80,100], lrange(24)))
+
 inf = np.inf
+# we will need a tinv function
 _tinv = scipy.stats.t.isf
 _phi = scipy.stats.norm.isf
+
+# Now we can build the A 'matrix'
+
+# these are for the least squares fitting
+def qhat(a, p, r, v):
+
+    # eq. 2.3
+    p_ = (1. + p) /2.
+
+    f = a[0]*np.log(r-1.) + \
+        a[1]*np.log(r-1.)**2 + \
+        a[2]*np.log(r-1.)**3 + \
+        a[3]*np.log(r-1.)**4
+
+    # eq. 2.7 and 2.8 corrections
+    for i, r_ in enumerate(r):
+        if r_ == 3:
+            f[i] += -0.002 / (1. + 12. * _phi(p)**2)
+
+            if v <= 4.364:
+                f[i] += 1./517. - 1./(312.*v)
+            else:
+                f[i] += 1./(191.*v)
+
+    return math.sqrt(2) * (f - 1.) * _tinv(p_, v)
+
 errfunc = lambda a, p, r, v, q: qhat(a, p, r, v) - q
-A = {}
+
+A = {}  # fitted coefficient vectors, keyed by (p, v)
 for p in T:
     for v in T[p]:
+        #eq. 2.4
         a0 = random(4)
-        a1, success = leastsq(errfunc, a0, args=(p, np.array(list(R.keys())
-            ), v, np.array(T[p][v])))
-        if v == 1e+38:
-            A[p, inf] = list(a1)
+        a1, success = leastsq(errfunc, a0,
+                              args=(p, np.array(list(R.keys())),
+                                    v, np.array(T[p][v])))
+
+        if v == 1e38:
+            A[(p,inf)] = list(a1)
         else:
-            A[p, v] = list(a1)
-raise ImportError('we do not want to import this')
+            A[(p,v)] = list(a1)
+
+raise ImportError("we do not want to import this")
+# uncomment the lines below to repr-ize A
+##import pprint
+##pprint.pprint(A, width=160)
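
To keep the data above navigable: T[p][v] is the list of tabled quantiles for probability p and degrees of freedom v, and R[r] maps a number of means r to the column index within that list, so the value being fitted for a given (p, v, r) is T[p][v][R[r]]. The least-squares loop then recovers the four coefficients of Gleason's eq. 2.3 for each (p, v) pair. A small sketch of how such a coefficient vector would be evaluated, omitting the r == 3 corrections of eqs. 2.7 and 2.8 and using placeholder coefficients rather than fitted ones:

```python
import math
import scipy.stats

def gleason_q(a, p, r, v):
    # eq. 2.3: quartic in log(r - 1), rescaled by a Student-t quantile
    f = sum(a[i] * math.log(r - 1.0) ** (i + 1) for i in range(4))
    p_ = (1.0 + p) / 2.0
    return math.sqrt(2) * (f - 1.0) * scipy.stats.t.isf(p_, v)

# placeholder coefficients; in make_tbls.py they come from
# leastsq(errfunc, a0, args=(p, r_values, v, T[p][v]))
a = [-0.28, -0.01, 0.004, -0.0003]
approx = gleason_q(a, 0.95, r=5, v=10)  # compare against T[0.95][10][R[5]]
```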
diff --git a/statsmodels/stats/libqsturng/qsturng_.py b/statsmodels/stats/libqsturng/qsturng_.py
index 18cce2797..94c9056b0 100644
--- a/statsmodels/stats/libqsturng/qsturng_.py
+++ b/statsmodels/stats/libqsturng/qsturng_.py
@@ -1,3 +1,6 @@
+# Copyright (c) 2011, Roger Lew [see LICENSE.txt]
+# This software is funded in part by NIH Grant P20 RR016454.
+
 """
 Implementation of Gleason's (1999) non-iterative upper quantile
 studentized range approximation.
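
The hunks below touch qsturng_.py's large table of fitted coefficients (the A dict). For context, the module's public entry points, as exposed by statsmodels.stats.libqsturng, are qsturng for quantiles and psturng for tail probabilities of the studentized range; a quick usage sketch, with the specific numbers chosen only for illustration:

```python
from statsmodels.stats.libqsturng import psturng, qsturng

# upper 5% critical value of the studentized range for r = 3 group means
# and v = 10 error degrees of freedom ...
q_crit = qsturng(0.95, 3, 10)
# ... and the tail probability recovered from that quantile (about 0.05)
p_tail = psturng(q_crit, 3, 10)
```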
@@ -23,525 +26,373 @@ from statsmodels.compat.python import lrange
 import math
 import scipy.stats
 import numpy as np
+
 from scipy.optimize import fminbound
+
 inf = np.inf
+
 __version__ = '0.2.3'
-A = {(0.1, 2.0): [-2.2485085243379075, -1.5641014278923464, 
-    0.5594229442681675, -0.06000660885388338], (0.1, 3.0): [-
-    2.2061105943901564, -1.8415406600571855, 0.6188078803983496, -
-    0.06221709366120983], (0.1, 4.0): [-2.168669178667818, -
-    2.008196172372553, 0.650100844319474, -0.06289005500114471], (0.1, 5.0):
-    [-2.145077200277393, -2.112454843879346, 0.6670124058282134, -
-    0.0629935022336548], (0.1, 6.0): [-2.0896098049743155, -
-    2.2400004934286497, 0.7008852339170014, -0.06590756856327275], (0.1, 
-    7.0): [-2.0689296655661584, -2.3078445479584873, 0.7157737460941891, -
-    0.06708103424935055], (0.1, 8.0): [-2.0064956480711262, -
-    2.437400413087452, 0.7629753236741527, -0.07280551812150546], (0.1, 9.0
-    ): [-2.326947751343606, -2.046949471277309, 0.6066251871772059, -
-    0.054887108437009016], (0.1, 10.0): [-2.514024350177229, -
-    1.8261187841127482, 0.5167435807790675, -0.04459042515096363], (0.1, 
-    11.0): [-2.513018130913083, -1.8371718595995694, 0.5133670169486225, -
-    0.043761825829092445], (0.1, 12.0): [-2.5203508109278823, -
-    1.8355687130611862, 0.5063486549107169, -0.04264620506310826], (0.1, 
-    13.0): [-2.5142536438310477, -1.8496969402776282, 0.5061699136776415, -
-    0.04237837990566536], (0.1, 14.0): [-2.3924634153781352, -
-    2.013859173066078, 0.5642189325163869, -0.048716888109540266], (0.1, 
-    15.0): [-2.3573552940582574, -2.057667697622436, 0.5742406877114323, -
-    0.04936748764922584], (0.1, 16.0): [-2.304642748304487, -
-    2.1295959138627993, 0.5977827265768055, -0.05186482921630162], (0.1, 
-    17.0): [-2.2230551072316125, -2.2472837435427127, 0.6425575824321521, -
-    0.05718666520919764], (0.1, 18.0): [-2.3912859179716897, -
-    2.035060407064127, 0.5592478874933333, -0.047729331835226464], (0.1, 
-    19.0): [-2.4169773092220623, -2.0048217969339146, 0.5449303931974891, -
-    0.045991241346224065], (0.1, 20.0): [-2.426408719466075, -
-    1.9916614057049267, 0.5358355513964815, -0.04463049934517662], (0.1, 
-    24.0): [-2.396990313206187, -2.0252941869225345, 0.5342838214120014, -
-    0.043116495567779786], (0.1, 30.0): [-2.2509922780354623, -
-    2.2309248956124894, 0.6074804132493726, -0.05142741588881732], (0.1, 
-    40.0): [-2.1310090183854946, -2.3908466074610564, 0.6584437538232322, -
-    0.05676653804036895], (0.1, 60.0): [-1.9240060179027036, -
-    2.6685751031012233, 0.7567882664745302, -0.067938584352399], (0.1, 
-    120.0): [-1.9814895487030182, -2.5962051736978373, 0.7179396904129269, 
-    -0.06312686320151162], (0.1, inf): [-1.913410267066703, -
-    2.694736732872473, 0.7474233512275059, -0.06660897234304515], (0.5, 2.0
-    ): [-0.8829593573877065, -0.1083576698911433, 0.03521496683939439, -
-    0.002857628897827646], (0.5, 3.0): [-0.8908582920584683, -
-    0.10255696422201063, 0.033613638666631696, -0.0027101699918520737], (
-    0.5, 4.0): [-0.8962734533933812, -0.09907252460766829, 
-    0.032657774808907684, -0.0026219007698204916], (0.5, 5.0): [-
-    0.8995914551194105, -0.09727283658202682, 0.03223618767518296, -
-    0.0025911555217019663], (0.5, 6.0): [-0.8995942873570247, -
-    0.09817629241110665, 0.032590766960226995, -0.0026319890073613164], (
-    0.5, 7.0): [-0.9013149110286394, -0.09713590762029654, 
-    0.03230412499326953, -0.0026057965808244125], (0.5, 8.0): [-
-    0.902925005994329, -0.09604750097133796, 0.03203094661557457, -
-    0.002584874865905389], (0.5, 9.0): [-0.903855986078037, -
-    0.09539077155457189, 0.0318326511111059, -0.002565606021931599], (0.5, 
-    10.0): [-0.9056252493612539, -0.09395448808977191, 0.031414451048323286,
-    -0.002525783470543203], (0.5, 11.0): [-0.9042034737117383, -
-    0.09585165637027729, 0.0321150356209743, -0.002605505640009345], (0.5, 
-    12.0): [-0.9058597347175766, -0.09444930629672803, 0.03170594592321096,
-    -0.002567333019578019], (0.5, 13.0): [-0.9055543706729305, -
-    0.09479299105078025, 0.03182659496457109, -0.0025807109129488545], (0.5,
-    14.0): [-0.9065275660438876, -0.09379215699456474, 0.03146896632888904,
-    -0.002539517536108374], (0.5, 15.0): [-0.9064232370040008, -
-    0.09417301752048798, 0.031657517378893905, -0.0025659271829033877], (
-    0.5, 16.0): [-0.9071633863668523, -0.09378517808382043, 
-    0.031630091949658, -0.0025701459247416637], (0.5, 17.0): [-
-    0.9079013381676971, -0.09300114763863888, 0.031376863944487084, -
-    0.002545143621663892], (0.5, 18.0): [-0.9077432927051563, -
-    0.0933435163781806, 0.03151813966239531, -0.002561390613327718], (0.5, 
-    19.0): [-0.9078949945649029, -0.09316964789456067, 0.0314407823663429, 
-    -0.0025498353345867453], (0.5, 20.0): [-0.9084270786103072, -
-    0.09269601647660859, 0.03129604031138833, -0.0025346963982742186], (0.5,
-    24.0): [-0.9083281347135469, -0.09295930814497078, 0.03146406319007709,
-    -0.0025611384271086285], (0.5, 30.0): [-0.9085762405001683, -
-    0.09304313939198051, 0.03157879172934133, -0.0025766595412777147], (0.5,
-    40.0): [-0.9103408504543868, -0.09197803573891457, 0.03145163100005264,
-    -0.0025791418103733297], (0.5, 60.0): [-0.9108435668103003, -
-    0.09145267557242343, 0.031333147984820044, -0.0025669786958144843], (
-    0.5, 120.0): [-0.9096364956146383, -0.09341456326135235, 
-    0.032215602703677425, -0.0026704024780441257], (0.5, inf): [-
-    0.9107715750098166, -0.09289922035033457, 0.032230422399363315, -
-    0.0026696941964372916], (0.675, 2.0): [-0.6723152102656514, -
-    0.09708362403066345, 0.02799137890166165, -0.002142518406984556], (
-    0.675, 3.0): [-0.6566172476464582, -0.08147195494632696, 
-    0.02345732427073333, -0.0017448570400999351], (0.675, 4.0): [-
-    0.6504567769746112, -0.07141907339945043, 0.0207419625768525, -
-    0.0015171262565892491], (0.675, 5.0): [-0.6471887535780833, -
-    0.06472061142521834, 0.01905345024654645, -0.001383623298622871], (
-    0.675, 6.0): [-0.6452300370201866, -0.059926313672731824, 
-    0.017918997181483924, -0.0012992250285556828], (0.675, 7.0): [-
-    0.6440331314847884, -0.056248191513784476, 0.01709144679129372, -
-    0.0012406558789511822], (0.675, 8.0): [-0.6432509586576436, -
-    0.053352543126426684, 0.016471879286491072, -0.0011991839050964099], (
-    0.675, 9.0): [-0.6427115275491165, -0.05102376962044908, 
-    0.01599799600547195, -0.0011693637984597086], (0.675, 10.0): [-
-    0.6423224440850263, -0.04911832746288437, 0.015629704966568955, -
-    0.0011477775513952285], (0.675, 11.0): [-0.6420389785435356, -
-    0.04752462796027789, 0.015334801262767227, -0.0011315057284007177], (
-    0.675, 12.0): [-0.6418034497351277, -0.04620590757600329, 
-    0.015108290595438166, -0.0011207364514518488], (0.675, 13.0): [-
-    0.6416208645682334, -0.04507609933687423, 0.0149226565346125, -
-    0.0011126140690497352], (0.675, 14.0): [-0.6414690648019898, -
-    0.044108523550512715, 0.014772954218646743, -0.0011069708562369386], (
-    0.675, 15.0): [-0.641339151519666, -0.043273370927039825, 
-    0.014651691599222836, -0.0011032216539514398], (0.675, 16.0): [-
-    0.6412323784275208, -0.04253892501246387, 0.014549992487506169, -
-    0.0011005633864334021], (0.675, 17.0): [-0.6411303403753661, -
-    0.041905699463005854, 0.014470805560767184, -0.001099528643673847], (
-    0.675, 18.0): [-0.6410413739156126, -0.041343885546229336, 
-    0.014404563657113593, -0.0010991304223377683], (0.675, 19.0): [-
-    0.640960648828273, -0.04084569291139839, 0.0143501596551338, -
-    0.00109936567111219], (0.675, 20.0): [-0.6408864740508957, -
-    0.040402175957178085, 0.014305769823654429, -0.0011001304776712105], (
-    0.675, 24.0): [-0.6406376396593784, -0.039034716348048545, 
-    0.014196703837251648, -0.0011061961945598175], (0.675, 30.0): [-
-    0.6403498771629489, -0.03774965115694172, 0.014147040999127263, -
-    0.0011188251352919833], (0.675, 40.0): [-0.6399990514713938, -
-    0.0365833075748578, 0.014172070700846548, -0.0011391004138624943], (
-    0.675, 60.0): [-0.6395558620243025, -0.035576938958184395, 
-    0.014287299153378865, -0.0011675811805794236], (0.675, 120.0): [-
-    0.6389924267477862, -0.03476375751238885, 0.014500726912982405, -
-    0.0012028491454427466], (0.675, inf): [-0.6383268257924761, -
-    0.034101476695520404, 0.014780921043580184, -0.0012366204114216408], (
-    0.75, 2.0): [-0.6068407363850445, -0.09637519207805703, 
-    0.026567529471304554, -0.0019963228971914488], (0.75, 3.0): [-
-    0.5798614451910266, -0.07857029271803488, 0.02128063792500945, -
-    0.0015329306898533772], (0.75, 4.0): [-0.5682077168619359, -
-    0.0668113563896649, 0.01806528405105919, -0.0012641485481533648], (0.75,
-    5.0): [-0.5617529243574022, -0.058864526929603825, 0.0160467350257088, 
-    -0.0011052560286524044], (0.75, 6.0): [-0.5577344928206636, -
-    0.05313692326982735, 0.014684258167069347, -0.0010042826823561605], (
-    0.75, 7.0): [-0.5550952459886733, -0.048752649191139405, 
-    0.013696566605823626, -0.000934822100031339], (0.75, 8.0): [-
-    0.5532499368619151, -0.045305558708724644, 0.012959681992062138, -
-    0.0008858354160169602], (0.75, 9.0): [-0.551892590540262, -
-    0.042539819902381634, 0.01239879110642477, -0.0008508396224143583], (
-    0.75, 10.0): [-0.5508538465695689, -0.040281425755686585, 
-    0.01196442242722482, -0.0008256032216149268], (0.75, 11.0): [-
-    0.5500319810354127, -0.03841017610019395, 0.011623294239447784, -
-    0.0008073297503432007], (0.75, 12.0): [-0.5493654159631918, -
-    0.0368385432678871, 0.0113518226378957, -0.0007940703654926442], (0.75,
-    13.0): [-0.5488101597275383, -0.035506710625568455, 
-    0.011134691307865171, -0.0007846360016355809], (0.75, 14.0): [-
-    0.5483409434607195, -0.03436479060990657, 0.010958873929274728, -
-    0.0007779664535700829], (0.75, 15.0): [-0.5479360241830425, -
-    0.03337923745574803, 0.010816140998057593, -0.000773441750647851], (
-    0.75, 16.0): [-0.5475834768972804, -0.03252056914589892, 
-    0.010699240399358219, -0.0007705084732859668], (0.75, 17.0): [-
-    0.547271159637953, -0.03176927719292753, 0.01060374975117048, -
-    0.0007688642392748113], (0.75, 18.0): [-0.5469935180882654, -
-    0.031105476267880995, 0.010524669113016114, -0.0007681065683746409], (
-    0.75, 19.0): [-0.5467435762641908, -0.030516967201954, 
-    0.010459478822937069, -0.0007680865258244004], (0.75, 20.0): [-
-    0.5465172837895013, -0.029992319199769232, 0.010405694998386575, -
-    0.0007686417223966138], (0.75, 24.0): [-0.5457830954682836, -
-    0.028372628574010936, 0.010269939602271542, -0.0007742737064726184], (
-    0.75, 30.0): [-0.5450124643439755, -0.026834887880579802, 
-    0.010195603314317611, -0.0007864861595410551], (0.75, 40.0): [-
-    0.5441812744202262, -0.02541322448887138, 0.010196455193836855, -
-    0.0008061078574952374], (0.75, 60.0): [-0.543265189207915, -
-    0.024141961069146383, 0.010285001019536088, -0.0008333219336429459], (
-    0.75, 120.0): [-0.5422475781799481, -0.023039071833948214, 
-    0.010463365295636302, -0.0008661282853947792], (0.75, inf): [-
-    0.5411457981536716, -0.02206592527426093, 0.01070374099737127, -
-    0.0008972656400512218], (0.8, 2.0): [-0.5689527404683115, -
-    0.09632625519054196, 0.025815915364208686, -0.0019136561019354845], (
-    0.8, 3.0): [-0.5336038380862278, -0.07758519101487618, 
-    0.020184759265389905, -0.0014242746007323785], (0.8, 4.0): [-
-    0.5178027428593426, -0.06498773844360871, 0.016713309796866204, -
-    0.001135379856633562], (0.8, 5.0): [-0.508943612222684, -
-    0.056379186603362705, 0.014511270339773345, -0.000962256041174932], (
-    0.8, 6.0): [-0.5033515302863041, -0.05016886029479081, 
-    0.01302807093593626, -0.0008526981269253631], (0.8, 7.0): [-
-    0.4996093438089643, -0.04541733378780603, 0.011955593330247398, -
-    0.0007775960560425088], (0.8, 8.0): [-0.49694518248979763, -
-    0.04168915151602197, 0.011158986677273709, -0.0007249743010395337], (
-    0.8, 9.0): [-0.4949559974898507, -0.038702217132906024, 
-    0.010554360004521268, -0.0006875213117164109], (0.8, 10.0): [-
-    0.4934140791016248, -0.0362667887413254, 0.010087354421936092, -
-    0.000660608350628656], (0.8, 11.0): [-0.49218129312493897, -
-    0.0342524036432735, 0.009721858483857954, -0.0006412345933520191], (0.8,
-    12.0): [-0.4911722395711218, -0.03256326973049902, 0.00943185830960214,
-    -0.0006272525385241903], (0.8, 13.0): [-0.49032781145131277, -
-    0.031132495018324432, 0.00919997625627929, -0.0006172944366003854], (
-    0.8, 14.0): [-0.4896104962846426, -0.029906921170494854, 
-    0.009012451847823854, -0.0006102621196866954], (0.8, 15.0): [-
-    0.4889906979305492, -0.028849609914548158, 0.00886028200026196, -
-    0.0006054899157517905], (0.8, 16.0): [-0.48844921216636505, -
-    0.027929790075266154, 0.00873599263877896, -0.0006024211979685938], (
-    0.8, 17.0): [-0.48797119683309537, -0.027123634910159868, 
-    0.008633813986948189, -0.000600618215934], (0.8, 18.0): [-
-    0.48754596864745836, -0.02641196872349696, 0.008549319660470575, -
-    0.0005997708316083362], (0.8, 19.0): [-0.48716341805691843, -
-    0.025781422230819986, 0.008479665591502577, -0.0005997003175832347], (
-    0.8, 20.0): [-0.4868173919718555, -0.02521962985219875, 
-    0.008422184425428777, -0.0006002321282288671], (0.8, 24.0): [-
-    0.48570639629281365, -0.023480608772518948, 0.008274490561114187, -
-    0.000605681105792215], (0.8, 30.0): [-0.48455867067770253, -
-    0.021824655071720423, 0.008188850297472057, -0.0006176212693378563], (
-    0.8, 40.0): [-0.4833547872926742, -0.02027995899836339, 
-    0.008176509591419471, -0.0006365711712982963], (0.8, 60.0): [-
-    0.4820735194499668, -0.018875344346672228, 0.008247399719147234, -
-    0.0006624247847927724], (0.8, 120.0): [-0.4807035618533018, -
-    0.017621686995755746, 0.00840096388032238, -0.0006930038380894932], (
-    0.8, inf): [-0.47926687718713606, -0.0164765753523672, 
-    0.008609705964659181, -0.0007216084349273091], (0.85, 2.0): [-
-    0.5336680698638174, -0.09828817825272326, 0.026002333446289064, -
-    0.0019567144268844896], (0.85, 3.0): [-0.4899591923961999, -
-    0.07731272264841806, 0.019368984865418108, -0.0013449670192265796], (
-    0.85, 4.0): [-0.4695607916238286, -0.0638185185139467, 
-    0.015581608910696544, -0.0010264315084377606], (0.85, 5.0): [-
-    0.45790853796153624, -0.054680511194530226, 0.013229852432203093, -
-    0.000842484308475359], (0.85, 6.0): [-0.4505070841695738, -
-    0.0480509366828733, 0.01163640758271419, -0.0007249148003352981], (0.85,
-    7.0): [-0.4454833747733618, -0.042996612516383016, 0.010493052959891263,
-    -0.0006452878479215324], (0.85, 8.0): [-0.4418662493266415, -
-    0.039040005821657585, 0.009647953079416054, -0.0005899087436096757], (
-    0.85, 9.0): [-0.4391411868981226, -0.03587569303075271, 
-    0.009008880413062819, -0.0005507148033939969], (0.85, 10.0): [-
-    0.4370125539095377, -0.033300997407157376, 0.008517215935534485, -
-    0.0005227277079969546], (0.85, 11.0): [-0.43530109064899053, -
-    0.031174742038490313, 0.008133561986838607, -0.0005026835380978793], (
-    0.85, 12.0): [-0.4338922037661007, -0.02939618314990838, 
-    0.007830626267772851, -0.0004883643171267822], (0.85, 13.0): [-
-    0.43271026958463166, -0.027890759135246888, 0.007588691666863294, -
-    0.0004781933971059697], (0.85, 14.0): [-0.4317023026500721, -
-    0.02660415606239619, 0.007393909968870555, -0.0004710999685433542], (
-    0.85, 15.0): [-0.43083160459377423, -0.025494228911600785, 
-    0.007235873865755087, -0.0004663067705226248], (0.85, 16.0): [-
-    0.4300699280587239, -0.024529612608808794, 0.007106922702621968, -
-    0.0004632386986094179], (0.85, 17.0): [-0.42939734931902857, -
-    0.02368502561605427, 0.007001154160969589, -0.0004614795494299416], (
-    0.85, 18.0): [-0.42879829041505324, -0.022940655682782165, 
-    0.006914006369119409, -0.00046070877994711774], (0.85, 19.0): [-
-    0.42826119448419875, -0.02228018178163465, 0.006841774690582643, -
-    0.0004606684121409198], (0.85, 20.0): [-0.4277765488709448, -
-    0.02169090907674783, 0.006781740864371797, -0.0004611862028906803], (
-    0.85, 24.0): [-0.4262245003364085, -0.019869646711890065, 
-    0.006627679959349403, -0.00046668820637553747], (0.85, 30.0): [-
-    0.4246381044323342, -0.018130114737381745, 0.006534461306049916, -
-    0.00047835583417510423], (0.85, 40.0): [-0.4229991780458938, -
-    0.016498222901308417, 0.006512055834357841, -0.0004965604368532547], (
-    0.85, 60.0): [-0.42129387265810464, -0.014992121475265813, 
-    0.0065657795990087635, -0.000520697056406877], (0.85, 120.0): [-
-    0.4195158047636637, -0.013615722489371183, 0.006692391127572681, -
-    0.0005484691164916749], (0.85, inf): [-0.4176875182542897, -
-    0.012327525092266726, 0.006866492056956259, -0.0005740372026175354], (
-    0.9, 1.0): [-0.6585106327909672, -0.126716242078905, 
-    0.03631880191760306, -0.002901283222928193], (0.9, 2.0): [-
-    0.5039194536982914, -0.09699610802114624, 0.0247264376234734, -
-    0.0017901399938303017], (0.9, 3.0): [-0.44799791843058734, -
-    0.0771803703333072, 0.01858404205559447, -0.0012647038118363408], (0.9,
-    4.0): [-0.42164091756145167, -0.06342707100628751, 0.014732203755741392,
-    -0.0009490417411795769], (0.9, 5.0): [-0.40686856251221754, -
-    0.0533619400548424, 0.012041802076025801, -0.0007296019829241061], (0.9,
-    6.0): [-0.39669926026535285, -0.04695151743800424, 0.010546647213094956,
-    -0.0006262119800236606], (0.9, 7.0): [-0.39006553675807426, -
-    0.04169480606532109, 0.00936875466017372, -0.0005464869571327386], (0.9,
-    8.0): [-0.3857020506706191, -0.037083910859179794, 0.008323321852637584,
-    -0.0004717758697403545], (0.9, 9.0): [-0.3819073726789294, -
-    0.034004585655388865, 0.007753199157411918, -0.0004430654730852787], (
-    0.9, 10.0): [-0.37893272918125737, -0.03139467760091698, 
-    0.007259680250353354, -0.0004160518834299966], (0.9, 11.0): [-
-    0.3769251249270513, -0.02878079340313647, 0.006693790904906038, -
-    0.00037420010136784526], (0.9, 12.0): [-0.3750634520012919, -
-    0.026956483290567372, 0.006414773070777652, -0.00036595383207062906], (
-    0.9, 13.0): [-0.3733951612238321, -0.02543949524844704, 
-    0.006176065653019719, -0.00035678737379179527], (0.9, 14.0): [-
-    0.3721697989108784, -0.02396347606956644, 0.005926323446596964, -
-    0.0003439784452550796], (0.9, 15.0): [-0.371209456600122, -
-    0.022696132732654414, 0.005752167718462315, -0.0003396110856177085], (
-    0.9, 16.0): [-0.3695892437798334, -0.022227885445863002, 
-    0.005769170679938393, -0.0003504276253809968], (0.9, 17.0): [-
-    0.36884224719083203, -0.021146977888668726, 0.005595792826973272, -
-    0.0003428381041269753], (0.9, 18.0): [-0.36803087186793326, -
-    0.020337731477576542, 0.005465537809521276, -0.0003345296694653525], (
-    0.9, 19.0): [-0.3676700404163355, -0.019370115848857467, 
-    0.0053249296207149655, -0.00032975528909580403], (0.9, 20.0): [-
-    0.3664227626718881, -0.019344251412284838, 0.005445496858289753, -
-    0.0003486811167754095], (0.9, 24.0): [-0.3645065075375519, -
-    0.01728425549999068, 0.005233750005917675, -0.0003489820284574729], (
-    0.9, 30.0): [-0.3625186894016861, -0.015358560437631397, 
-    0.0050914299956134786, -0.0003557452889163398], (0.9, 40.0): [-
-    0.3600888667651094, -0.014016835682905486, 0.005193083595911151, -
-    0.00038798316011984165], (0.9, 60.0): [-0.3582559069026806, -
-    0.011991568926537646, 0.005063220854241419, -0.00039090198974493085], (
-    0.9, 120.0): [-0.3554361223728441, -0.011074403997811812, 
-    0.005350457075276516, -0.0004364713742807418], (0.9, inf): [-
-    0.35311806343057167, -0.009625402009214535, 0.005454859120817718, -
-    0.00045343916634968493], (0.95, 1.0): [-0.6533031813602007, -
-    0.12638310760474375, 0.035987535130769424, -0.0028562665467665315], (
-    0.95, 2.0): [-0.47225160417826934, -0.10182570362271424, 
-    0.025846563499059158, -0.0019096769058043243], (0.95, 3.0): [-
-    0.4056635555586528, -0.0770671726933503, 0.017789909647225533, -
-    0.001182961668735774], (0.95, 4.0): [-0.37041675177340955, -
-    0.06381568711893947, 0.014115210247737845, -0.000899960984351176], (
-    0.95, 5.0): [-0.3515239829115231, -0.05215650264066932, 
-    0.010753738086401853, -0.0005986841939451575], (0.95, 6.0): [-
-    0.33806730015201264, -0.0456683998095786, 0.009316889895287816, -
-    0.000513697196157821], (0.95, 7.0): [-0.32924041072104465, -
-    0.04001960177549009, 0.008005119955286516, -0.0004205453613586804], (
-    0.95, 8.0): [-0.32289030266989077, -0.03557534593167044, 
-    0.007050908934469467, -0.00035980773304803576], (0.95, 9.0): [-
-    0.31767304201477375, -0.0324649459301657, 0.006475595043727214, -
-    0.0003316676253661824], (0.95, 10.0): [-0.31424318064708656, -
-    0.029133461621153, 0.0057437449431074795, -0.0002789425226120919], (
-    0.95, 11.0): [-0.31113589620384974, -0.02685115250591049, 
-    0.005351790528294289, -0.00026155954116874666], (0.95, 12.0): [-
-    0.3084898361241458, -0.025043238019239168, 0.005066167591348883, -
-    0.00025017202909614005], (0.95, 13.0): [-0.3059212907410393, -
-    0.023863874699213077, 0.004961805113580732, -0.00025665425781125703], (
-    0.95, 14.0): [-0.30449676902720035, -0.021983976741572344, 
-    0.004574051373575197, -0.00022881166323945914], (0.95, 15.0): [-
-    0.30264908294481396, -0.02104880307520084, 0.004486657161480438, -
-    0.00023187587597844057], (0.95, 16.0): [-0.30118294463097917, -
-    0.02016023106192673, 0.004417078075905686, -0.00023733502359045826], (
-    0.95, 17.0): [-0.30020013353427744, -0.018959271614471574, 
-    0.0041925333038202285, -0.00022274025630789767], (0.95, 18.0): [-
-    0.298578865568744, -0.018664437456802, 0.00425577876328337, -
-    0.00023758868868853716], (0.95, 19.0): [-0.29796289236978263, -
-    0.01763221855231759, 0.0040792779937959866, -0.00022753271474613109], (
-    0.95, 20.0): [-0.2968150655483808, -0.017302563243037392, 
-    0.004118842622142896, -0.00023913038468772782], (0.95, 24.0): [-
-    0.29403146911167666, -0.015332330986025032, 0.003929217031916373, -
-    0.00024003445648641732], (0.95, 30.0): [-0.2908077556377588, -
-    0.013844059210779323, 0.003927916561605989, -0.00026085104496801666], (
-    0.95, 40.0): [-0.2882158303280511, -0.011894686715666892, 
-    0.003820262327883998, -0.0002693332510203125], (0.95, 60.0): [-
-    0.2852563673775145, -0.010235910558409797, 0.003814702977758, -
-    0.0002859836214417896], (0.95, 120.0): [-0.2824106588502654, -
-    0.008610383632730503, 0.0038450612886908714, -0.0003020605367155941], (
-    0.95, inf): [-0.27885570064169296, -0.007812245552484922, 
-    0.004179853805362345, -0.0003469494881774609], (0.975, 1.0): [-
-    0.6520359830429798, -0.12608944279227957, 0.03571003875711735, -
-    0.0028116024425349053], (0.975, 2.0): [-0.4637189113038228, -
-    0.09695445831999651, 0.02395831251991229, -0.0017124565391080503], (
-    0.975, 3.0): [-0.38265282195259875, -0.07678253923161228, 
-    0.017405078796142955, -0.0011610853687902553], (0.975, 4.0): [-
-    0.340511931588784, -0.0636523427346716, 0.013528310336964293, -
-    0.0008364470893499076], (0.975, 5.0): [-0.31777655705536484, -
-    0.05169468691433462, 0.010115807205265859, -0.0005451746534419201], (
-    0.975, 6.0): [-0.30177149019958716, -0.04480669763118906, 
-    0.008483551848413786, -0.00042827853925009264], (0.975, 7.0): [-
-    0.2904697231329356, -0.039732822689098744, 0.007435356037378946, -
-    0.0003756292828335067], (0.975, 8.0): [-0.2830948400736814, -
-    0.03476490494071339, 0.006293251369492852, -0.00029339243611357956], (
-    0.975, 9.0): [-0.27711707948119785, -0.03121046519481071, 
-    0.0055576244284178435, -0.000246637982088958], (0.975, 10.0): [-
-    0.2724920344855361, -0.028259756468251584, 0.00499112012528406, -
-    0.0002153538041703539], (0.975, 11.0): [-0.2684851586001101, -
-    0.026146703336893323, 0.004655776711063407, -0.00020400628148271448], (
-    0.975, 12.0): [-0.2649992154000819, -0.024522931106167097, 
-    0.004425962495866528, -0.00019855685376441687], (0.975, 13.0): [-
-    0.2625023751891592, -0.022785875653297854, 0.004150277321193792, -
-    0.00018801223218078264], (0.975, 14.0): [-0.2603855241432176, -
-    0.02130350985973834, 0.003919560828046468, -0.00017826200169385824], (
-    0.975, 15.0): [-0.25801244886414665, -0.020505508012402567, 
-    0.003875486893271293, -0.00018588907991739744], (0.975, 16.0): [-
-    0.2568531606236051, -0.018888418269740373, 0.0035453092842317293, -
-    0.00016235770674204116], (0.975, 17.0): [-0.2550113227135355, -
-    0.018362951972357794, 0.003565393310528863, -0.0001747035335499273], (
-    0.975, 18.0): [-0.25325045404452656, -0.017993537285026156, 
-    0.003603586740537669, -0.00018635492166426884], (0.975, 19.0): [-
-    0.2523689949467793, -0.016948921372207198, 0.00341389317813308, -
-    0.0001746225341468788], (0.975, 20.0): [-0.2513449802502769, -
-    0.016249564498874988, 0.0033197284005334333, -0.00017098091103245596],
-    (0.975, 24.0): [-0.24768690797476625, -0.014668160763513996, 
-    0.003285079118685256, -0.00019013480716844995], (0.975, 30.0): [-
-    0.24420834707522676, -0.012911171716272752, 0.003197767670096805, -
-    0.00020114907914487053], (0.975, 40.0): [-0.24105725356215926, -
-    0.010836526056169627, 0.003023130355075416, -0.00020128696343148667], (
-    0.975, 60.0): [-0.23732082703955223, -0.009544272715738539, 
-    0.003143290447355526, -0.0002306222410938394], (0.975, 120.0): [-
-    0.23358581879594578, -0.008128125991870934, 0.0031877298679120094, -
-    0.000244962304468515], (0.975, inf): [-0.23004105093119268, -
-    0.006711258517413357, 0.0032760251638919435, -0.0002624400131946299], (
-    0.99, 1.0): [-0.651541194227062, -0.1266603927572312, 
-    0.03607480609672048, -0.0028668112687608113], (0.99, 2.0): [-
-    0.45463403324378804, -0.09870123623452737, 0.02441271576168469, -
-    0.0017613772919362193], (0.99, 3.0): [-0.3640206005103578, -
-    0.07924495919372915, 0.017838124021360584, -0.00119080116484847], (0.99,
-    4.0): [-0.3190350606395382, -0.06106074068244524, 0.012093154962939612,
-    -0.0006726834718844309], (0.99, 5.0): [-0.2891701458068918, -
-    0.05294078009931369, 0.010231009146279354, -0.0005717833918461524], (
-    0.99, 6.0): [-0.2728324016117901, -0.042505435573209085, 
-    0.007275340111826453, -0.0003131403471072592], (0.99, 7.0): [-
-    0.2577396872054672, -0.039384214480463406, 0.006912088259728687, -
-    0.00032994068754356204], (0.99, 8.0): [-0.24913629282433833, -
-    0.03383156717843286, 0.0055516244725724185, -0.00022570786249671376], (
-    0.99, 9.0): [-0.24252380896373404, -0.029488280751457097, 
-    0.0045215453527923, -0.00014424552929022646], (0.99, 10.0): [-
-    0.23654349556639986, -0.02705600214566789, 0.004162725546934363, -
-    0.00013804427029504753], (0.99, 11.0): [-0.23187404969432468, -
-    0.024803662094970855, 0.0037885852786822475, -0.00012334999287725012],
-    (0.99, 12.0): [-0.22749929386320905, -0.023655085290534145, 
-    0.0037845051889055896, -0.00014785715789924055], (0.99, 13.0): [-
-    0.22458989143485605, -0.021688394892771506, 0.003407529460142525, -
-    0.00012436961982044268], (0.99, 14.0): [-0.22197623872225777, -
-    0.020188830700102918, 0.0031648685865587473, -0.00011320740119998819],
-    (0.99, 15.0): [-0.2193924323730066, -0.019327469111698265, 
-    0.0031295453754886576, -0.00012373072900083014], (0.99, 16.0): [-
-    0.21739436875855705, -0.018215854969324128, 0.0029638341057222645, -
-    0.00011714667871412003], (0.99, 17.0): [-0.21548926805467686, -
-    0.01744782217941272, 0.0028994805120482812, -0.00012001887015183794], (
-    0.99, 18.0): [-0.21365014687077843, -0.01688869353338961, 
-    0.0028778031289216546, -0.0001259119910479271], (0.99, 19.0): [-
-    0.21236653761262406, -0.016057151563612645, 0.0027571468998022017, -
-    0.00012049196593780046], (0.99, 20.0): [-0.21092693178421842, -
-    0.015641706950956638, 0.0027765989877361293, -0.00013084915163086915],
-    (0.99, 24.0): [-0.20681960327410207, -0.013804298040271909, 
-    0.0026308276736585674, -0.0001355061502101814], (0.99, 30.0): [-
-    0.20271691131071576, -0.01206095288359876, 0.002542613800419891, -
-    0.00014589047959047533], (0.99, 40.0): [-0.19833098054449289, -
-    0.01071453396374072, 0.0025985992420317597, -0.0001688279944262007], (
-    0.99, 60.0): [-0.19406768821236584, -0.009329710648201399, 
-    0.0026521518387539584, -0.00018884874193665104], (0.99, 120.0): [-
-    0.19010213174677365, -0.007595820722130092, 0.0025660823297025633, -
-    0.00018906475172834352], (0.99, inf): [-0.18602070255787137, -
-    0.006212115516536319, 0.0026328293420766593, -0.0002045336652986713], (
-    0.995, 1.0): [-0.6513558354495183, -0.1266868999507193, 
-    0.036067522182457165, -0.002865451695884492], (0.995, 2.0): [-
-    0.45229774013072793, -0.09869462954369547, 0.024381858599368908, -
-    0.0017594734553033394], (0.995, 3.0): [-0.35935765236429706, -
-    0.07665040832667191, 0.016823026893528978, -0.0010835134496404637], (
-    0.995, 4.0): [-0.3070447472093117, -0.06309304773161302, 
-    0.012771683306774929, -0.0007585249162180995], (0.995, 5.0): [-
-    0.27582551740863454, -0.05253335313788579, 0.009777600984517437, -
-    0.0005133803175639913], (0.995, 6.0): [-0.25657971464398704, -
-    0.043424914996692286, 0.007432414743596999, -0.00034105188850494067], (
-    0.995, 7.0): [-0.24090407819707738, -0.03959160471220029, 
-    0.006884842945102039, -0.00034737131709273414], (0.995, 8.0): [-
-    0.23089540800827862, -0.03435330581636196, 0.005600952762982011, -
-    0.00024389336976992433], (0.995, 9.0): [-0.22322694848310584, -
-    0.030294770709722547, 0.004675123974724554, -0.00017437479314218922], (
-    0.995, 10.0): [-0.21722684126671632, -0.02699356356016381, 
-    0.003981159271090549, -0.00013135281785826703], (0.995, 11.0): [-
-    0.2117163582285291, -0.02515619361821255, 0.0037507759652964205, -
-    0.0001295983668517567], (0.995, 12.0): [-0.20745332165849167, -
-    0.02331881953560722, 0.0034935020002058903, -0.00012642826898405916], (
-    0.995, 13.0): [-0.20426054591612508, -0.021189796175249527, 
-    0.003031472176128759, -9.049773387753162e-05], (0.995, 14.0): [-
-    0.20113536905578902, -0.02001153669662306, 0.002921588088995673, -
-    9.571527213951222e-05], (0.995, 15.0): [-0.19855601561006403, -
-    0.01880853373400254, 0.0027608859956002344, -9.247299525692922e-05], (
-    0.995, 16.0): [-0.19619157579534008, -0.017970461530551096, 
-    0.002711371910500037, -9.986487498289086e-05], (0.995, 17.0): [-
-    0.19428015140726104, -0.017009762497670704, 0.0025833389598201345, -
-    9.613754573806112e-05], (0.995, 18.0): [-0.19243180236773033, -
-    0.01631617252107519, 0.002522744356161862, -9.806758052343288e-05], (
-    0.995, 19.0): [-0.19061294393069844, -0.01586226613672222, 
-    0.002520700590264178, -0.00010466151274918466], (0.995, 20.0): [-
-    0.18946302696580328, -0.014975796567260896, 0.0023700506576419867, -
-    9.550777905788463e-05], (0.995, 24.0): [-0.18444251428695257, -
-    0.013770955893918012, 0.0024579445553339903, -0.00012688402863358003],
-    (0.995, 30.0): [-0.18009742499570078, -0.011831341846559026, 
-    0.0022801125189390046, -0.00012536249967254906], (0.995, 40.0): [-
-    0.1756272188094326, -0.010157142650455463, 0.0022121943861923474, -
-    0.000134542652873434], (0.995, 60.0): [-0.17084630673594547, -
-    0.00902249658527548, 0.0023435529965815565, -0.00016240306777440115], (
-    0.995, 120.0): [-0.16648414081054147, -0.0074792163241677225, 
-    0.0023284585524533607, -0.00017116464012147041], (0.995, inf): [-
-    0.1621392187545246, -0.0058985998630496144, 0.0022605819363689093, -
-    0.00016896211491119114], (0.999, 1.0): [-0.6523399407208936, -
-    0.1257942744544422, 0.03583057799567927, -0.0028470555202945564], (
-    0.999, 2.0): [-0.4505016431132634, -0.09829480438069829, 
-    0.024134463919493736, -0.001726960395685284], (0.999, 3.0): [-
-    0.3516174149930782, -0.07680115227237427, 0.016695693063138672, -
-    0.0010661121974071864], (0.999, 4.0): [-0.29398448788574133, -
-    0.06277319725219685, 0.012454220010543127, -0.0007264416572340245], (
-    0.999, 5.0): [-0.25725364564365477, -0.053463787584337355, 
-    0.009966423655743155, -0.0005486603938898066], (0.999, 6.0): [-
-    0.23674225795168574, -0.040973155890031254, 0.00625994811917367, -
-    0.00021565734226586692], (0.999, 7.0): [-0.21840108878983297, -
-    0.03703702027187772, 0.00559080636719007, -0.00020238790479809623], (
-    0.999, 8.0): [-0.2057964743918449, -0.032500885103194356, 
-    0.004644164458566176, -0.00014769592268680274], (0.999, 9.0): [-
-    0.19604592954882674, -0.029166922919677936, 0.004064433311194981, -
-    0.00012854052861297006], (0.999, 10.0): [-0.18857328935948367, -
-    0.02631670570316109, 0.0035897350868809275, -0.00011572282691335702], (
-    0.999, 11.0): [-0.18207431428535406, -0.024201081944369412, 
-    0.0031647372098056077, -8.114593598229644e-05], (0.999, 12.0): [-
-    0.177963581489911, -0.02105430611862088, 0.0023968085939602055, -
-    1.5907156771296993e-05], (0.999, 13.0): [-0.1737196596274549, -
-    0.01957716295017771, 0.002239178347399974, -2.0613023472812558e-05], (
-    0.999, 14.0): [-0.16905298116759873, -0.01967115985443986, 
-    0.002649520832588927, -9.107427522063407e-05], (0.999, 15.0): [-
-    0.16635662558214312, -0.017903767183469876, 0.0022301322677100496, -
-    5.1956773935885426e-05], (0.999, 16.0): [-0.1638877654952545, -
-    0.01667191883990242, 0.002036528960274438, -4.359244759972494e-05], (
-    0.999, 17.0): [-0.1613193417799076, -0.015998918405126326, 
-    0.0019990454743285904, -4.817627749132765e-05], (0.999, 18.0): [-
-    0.1588063311037657, -0.015830715141055916, 0.002168840534383209, -
-    8.061825248932771e-05], (0.999, 19.0): [-0.15644841913314136, -
-    0.01572936472110568, 0.0022981443610378136, -0.00010093672643417343], (
-    0.999, 20.0): [-0.15516596606222705, -0.014725095968258637, 
-    0.0021117117014292155, -8.880688029732848e-05], (0.999, 24.0): [-
-    0.14997437768645827, -0.012755323295476786, 0.001887165151049694, -
-    8.089637066241494e-05], (0.999, 30.0): [-0.14459974882323703, -
-    0.011247323832877647, 0.001863740064382628, -9.641532319160674e-05], (
-    0.999, 40.0): [-0.13933285919392555, -0.009715176969249659, 
-    0.0018131251876208683, -0.00010452598991994023], (0.999, 60.0): [-
-    0.13424555343804143, -0.008216302795166944, 0.0017883427892173382, -
-    0.00011415865110808405], (0.999, 120.0): [-0.12896119523040372, -
-    0.007042670111258111, 0.0018472364154226955, -0.00012862202979478294],
-    (0.999, inf): [-0.12397213562666673, -0.005690120160415, 
-    0.0018260689406957129, -0.00013263452567995485]}
-p_keys = [0.1, 0.5, 0.675, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99, 0.995, 
-    0.999]
-v_keys = lrange(2, 21) + [24, 30, 40, 60, 120, inf]

+# changelog
+# 0.1   - initial release
+# 0.1.1 - vectorized
+# 0.2   - psturng added
+# 0.2.1 - T, R generation script relegated to make_tbls.py
+# 0.2.2
+#       - select_points refactored for performance to select_ps and
+#         select_vs
+#       - pysturng tester added.
+# 0.2.3 - uses np.inf and np.isinf
+
+# Gleason's table was derived using least square estimation on the tabled
+# r values for combinations of p and v. In total there are 206
+# estimates over p-values of .5, .75, .9, .95, .975, .99, .995,
+# and .999, and over v (degrees of freedom) of (1) - 20, 24, 30, 40,
+# 60, 120, and inf. combinations with p < .95 do not have coefficients
+# for v = 1. Hence the parentheses. These coefficients allow us to
+# form f-hat. f-hat with the inverse t transform of tinv(p,v) yields
+# a fairly accurate estimate of the studentized range distribution
+# across a wide range of values. According to Gleason this method
+# should be more accurate than algorithm AS190 of Lund and Lund (1983)
+# and work across a wider range of values (The AS190 only works
+# from .9 <= p <= .99). R's qtukey algorithm was used to add tables
+# at .675, .8, and .85. These aid approximations when p < .9.
+#
+# The code that generated this table is called make_tbls.py and is
+# located in version control.
+A = {(0.1, 2.0): [-2.2485085243379075, -1.5641014278923464, 0.55942294426816752, -0.060006608853883377],
+     (0.1, 3.0): [-2.2061105943901564, -1.8415406600571855, 0.61880788039834955, -0.062217093661209831],
+     (0.1, 4.0): [-2.1686691786678178, -2.008196172372553, 0.65010084431947401, -0.06289005500114471],
+     (0.1, 5.0): [-2.145077200277393, -2.112454843879346, 0.66701240582821342, -0.062993502233654797],
+     (0.1, 6.0): [-2.0896098049743155, -2.2400004934286497, 0.70088523391700142, -0.065907568563272748],
+     (0.1, 7.0): [-2.0689296655661584, -2.3078445479584873, 0.71577374609418909, -0.067081034249350552],
+     (0.1, 8.0): [-2.0064956480711262, -2.437400413087452, 0.76297532367415266, -0.072805518121505458],
+     (0.1, 9.0): [-2.3269477513436061, -2.0469494712773089, 0.60662518717720593, -0.054887108437009016],
+     (0.1, 10.0): [-2.514024350177229, -1.8261187841127482, 0.51674358077906746, -0.044590425150963633],
+     (0.1, 11.0): [-2.5130181309130828, -1.8371718595995694, 0.51336701694862252, -0.043761825829092445],
+     (0.1, 12.0): [-2.5203508109278823, -1.8355687130611862, 0.5063486549107169, -0.042646205063108261],
+     (0.1, 13.0): [-2.5142536438310477, -1.8496969402776282, 0.50616991367764153, -0.042378379905665363],
+     (0.1, 14.0): [-2.3924634153781352, -2.013859173066078, 0.56421893251638688, -0.048716888109540266],
+     (0.1, 15.0): [-2.3573552940582574, -2.0576676976224362, 0.57424068771143233, -0.049367487649225841],
+     (0.1, 16.0): [-2.3046427483044871, -2.1295959138627993, 0.59778272657680553, -0.051864829216301617],
+     (0.1, 17.0): [-2.2230551072316125, -2.2472837435427127, 0.64255758243215211, -0.057186665209197643],
+     (0.1, 18.0): [-2.3912859179716897, -2.0350604070641269, 0.55924788749333332, -0.047729331835226464],
+     (0.1, 19.0): [-2.4169773092220623, -2.0048217969339146, 0.54493039319748915, -0.045991241346224065],
+     (0.1, 20.0): [-2.4264087194660751, -1.9916614057049267, 0.53583555139648154, -0.04463049934517662],
+     (0.1, 24.0): [-2.3969903132061869, -2.0252941869225345, 0.53428382141200137, -0.043116495567779786],
+     (0.1, 30.0): [-2.2509922780354623, -2.2309248956124894, 0.60748041324937263, -0.051427415888817322],
+     (0.1, 40.0): [-2.1310090183854946, -2.3908466074610564, 0.65844375382323217, -0.05676653804036895],
+     (0.1, 60.0): [-1.9240060179027036, -2.6685751031012233, 0.75678826647453024, -0.067938584352398995],
+     (0.1, 120.0): [-1.9814895487030182, -2.5962051736978373, 0.71793969041292693, -0.063126863201511618],
+     (0.1, inf): [-1.913410267066703, -2.6947367328724732, 0.74742335122750592, -0.06660897234304515],
+     (0.5, 2.0): [-0.88295935738770648, -0.1083576698911433, 0.035214966839394388, -0.0028576288978276461],
+     (0.5, 3.0): [-0.89085829205846834, -0.10255696422201063, 0.033613638666631696, -0.0027101699918520737],
+     (0.5, 4.0): [-0.89627345339338116, -0.099072524607668286, 0.032657774808907684, -0.0026219007698204916],
+     (0.5, 5.0): [-0.89959145511941052, -0.097272836582026817, 0.032236187675182958, -0.0025911555217019663],
+     (0.5, 6.0): [-0.89959428735702474, -0.098176292411106647, 0.032590766960226995, -0.0026319890073613164],
+     (0.5, 7.0): [-0.90131491102863937, -0.097135907620296544, 0.032304124993269533, -0.0026057965808244125],
+     (0.5, 8.0): [-0.90292500599432901, -0.096047500971337962, 0.032030946615574568, -0.0025848748659053891],
+     (0.5, 9.0): [-0.90385598607803697, -0.095390771554571888, 0.031832651111105899, -0.0025656060219315991],
+     (0.5, 10.0): [-0.90562524936125388, -0.093954488089771915, 0.031414451048323286, -0.0025257834705432031],
+     (0.5, 11.0): [-0.90420347371173826, -0.095851656370277288, 0.0321150356209743, -0.0026055056400093451],
+     (0.5, 12.0): [-0.90585973471757664, -0.094449306296728028, 0.031705945923210958, -0.0025673330195780191],
+     (0.5, 13.0): [-0.90555437067293054, -0.094792991050780248, 0.031826594964571089, -0.0025807109129488545],
+     (0.5, 14.0): [-0.90652756604388762, -0.093792156994564738, 0.031468966328889042, -0.0025395175361083741],
+     (0.5, 15.0): [-0.90642323700400085, -0.094173017520487984, 0.031657517378893905, -0.0025659271829033877],
+     (0.5, 16.0): [-0.90716338636685234, -0.093785178083820434, 0.031630091949657997, -0.0025701459247416637],
+     (0.5, 17.0): [-0.90790133816769714, -0.093001147638638884, 0.031376863944487084, -0.002545143621663892],
+     (0.5, 18.0): [-0.9077432927051563, -0.093343516378180599, 0.031518139662395313, -0.0025613906133277178],
+     (0.5, 19.0): [-0.90789499456490286, -0.09316964789456067, 0.031440782366342901, -0.0025498353345867453],
+     (0.5, 20.0): [-0.90842707861030725, -0.092696016476608592, 0.031296040311388329, -0.0025346963982742186],
+     (0.5, 24.0): [-0.9083281347135469, -0.092959308144970776, 0.031464063190077093, -0.0025611384271086285],
+     (0.5, 30.0): [-0.90857624050016828, -0.093043139391980514, 0.031578791729341332, -0.0025766595412777147],
+     (0.5, 40.0): [-0.91034085045438684, -0.091978035738914568, 0.031451631000052639, -0.0025791418103733297],
+     (0.5, 60.0): [-0.91084356681030032, -0.091452675572423425, 0.031333147984820044, -0.0025669786958144843],
+     (0.5, 120.0): [-0.90963649561463833, -0.093414563261352349, 0.032215602703677425, -0.0026704024780441257],
+     (0.5, inf): [-0.91077157500981665, -0.092899220350334571, 0.032230422399363315, -0.0026696941964372916],
+     (0.675, 2.0): [-0.67231521026565144, -0.097083624030663451, 0.027991378901661649, -0.0021425184069845558],
+     (0.675, 3.0): [-0.65661724764645824, -0.08147195494632696, 0.02345732427073333, -0.0017448570400999351],
+     (0.675, 4.0): [-0.65045677697461124, -0.071419073399450431, 0.020741962576852499, -0.0015171262565892491],
+     (0.675, 5.0): [-0.64718875357808325, -0.064720611425218344, 0.019053450246546449, -0.0013836232986228711],
+     (0.675, 6.0): [-0.64523003702018655, -0.059926313672731824, 0.017918997181483924, -0.0012992250285556828],
+     (0.675, 7.0): [-0.64403313148478836, -0.056248191513784476, 0.017091446791293721, -0.0012406558789511822],
+     (0.675, 8.0): [-0.64325095865764359, -0.053352543126426684, 0.016471879286491072, -0.0011991839050964099],
+     (0.675, 9.0): [-0.64271152754911653, -0.051023769620449078, 0.01599799600547195, -0.0011693637984597086],
+     (0.675, 10.0): [-0.64232244408502626, -0.049118327462884373, 0.015629704966568955, -0.0011477775513952285],
+     (0.675, 11.0): [-0.64203897854353564, -0.047524627960277892, 0.015334801262767227, -0.0011315057284007177],
+     (0.675, 12.0): [-0.64180344973512771, -0.046205907576003291, 0.015108290595438166, -0.0011207364514518488],
+     (0.675, 13.0): [-0.64162086456823342, -0.045076099336874231, 0.0149226565346125, -0.0011126140690497352],
+     (0.675, 14.0): [-0.64146906480198984, -0.044108523550512715, 0.014772954218646743, -0.0011069708562369386],
+     (0.675, 15.0): [-0.64133915151966603, -0.043273370927039825, 0.014651691599222836, -0.0011032216539514398],
+     (0.675, 16.0): [-0.64123237842752079, -0.042538925012463868, 0.014549992487506169, -0.0011005633864334021],
+     (0.675, 17.0): [-0.64113034037536609, -0.041905699463005854, 0.014470805560767184, -0.0010995286436738471],
+     (0.675, 18.0): [-0.64104137391561256, -0.041343885546229336, 0.014404563657113593, -0.0010991304223377683],
+     (0.675, 19.0): [-0.64096064882827297, -0.04084569291139839, 0.014350159655133801, -0.0010993656711121901],
+     (0.675, 20.0): [-0.64088647405089572, -0.040402175957178085, 0.014305769823654429, -0.0011001304776712105],
+     (0.675, 24.0): [-0.64063763965937837, -0.039034716348048545, 0.014196703837251648, -0.0011061961945598175],
+     (0.675, 30.0): [-0.64034987716294889, -0.037749651156941719, 0.014147040999127263, -0.0011188251352919833],
+     (0.675, 40.0): [-0.6399990514713938, -0.036583307574857803, 0.014172070700846548, -0.0011391004138624943],
+     (0.675, 60.0): [-0.63955586202430248, -0.035576938958184395, 0.014287299153378865, -0.0011675811805794236],
+     (0.675, 120.0): [-0.63899242674778622, -0.034763757512388853, 0.014500726912982405, -0.0012028491454427466],
+     (0.675, inf): [-0.63832682579247613, -0.034101476695520404, 0.014780921043580184, -0.0012366204114216408],
+     (0.75, 2.0): [-0.60684073638504454, -0.096375192078057031, 0.026567529471304554, -0.0019963228971914488],
+     (0.75, 3.0): [-0.57986144519102656, -0.078570292718034881, 0.021280637925009449, -0.0015329306898533772],
+     (0.75, 4.0): [-0.56820771686193594, -0.0668113563896649, 0.018065284051059189, -0.0012641485481533648],
+     (0.75, 5.0): [-0.56175292435740221, -0.058864526929603825, 0.016046735025708799, -0.0011052560286524044],
+     (0.75, 6.0): [-0.55773449282066356, -0.053136923269827351, 0.014684258167069347, -0.0010042826823561605],
+     (0.75, 7.0): [-0.55509524598867332, -0.048752649191139405, 0.013696566605823626, -0.00093482210003133898],
+     (0.75, 8.0): [-0.55324993686191515, -0.045305558708724644, 0.012959681992062138, -0.00088583541601696021],
+     (0.75, 9.0): [-0.55189259054026196, -0.042539819902381634, 0.012398791106424769, -0.00085083962241435827],
+     (0.75, 10.0): [-0.55085384656956893, -0.040281425755686585, 0.01196442242722482, -0.00082560322161492677],
+     (0.75, 11.0): [-0.55003198103541273, -0.038410176100193948, 0.011623294239447784, -0.00080732975034320073],
+     (0.75, 12.0): [-0.54936541596319177, -0.036838543267887103, 0.011351822637895701, -0.0007940703654926442],
+     (0.75, 13.0): [-0.54881015972753833, -0.035506710625568455, 0.011134691307865171, -0.0007846360016355809],
+     (0.75, 14.0): [-0.54834094346071949, -0.034364790609906569, 0.010958873929274728, -0.00077796645357008291],
+     (0.75, 15.0): [-0.54793602418304255, -0.033379237455748029, 0.010816140998057593, -0.00077344175064785099],
+     (0.75, 16.0): [-0.54758347689728037, -0.032520569145898917, 0.010699240399358219, -0.00077050847328596678],
+     (0.75, 17.0): [-0.54727115963795303, -0.031769277192927527, 0.010603749751170481, -0.0007688642392748113],
+     (0.75, 18.0): [-0.54699351808826535, -0.031105476267880995, 0.010524669113016114, -0.00076810656837464093],
+     (0.75, 19.0): [-0.54674357626419079, -0.030516967201954001, 0.010459478822937069, -0.00076808652582440037],
+     (0.75, 20.0): [-0.54651728378950126, -0.029992319199769232, 0.010405694998386575, -0.0007686417223966138],
+     (0.75, 24.0): [-0.54578309546828363, -0.028372628574010936, 0.010269939602271542, -0.00077427370647261838],
+     (0.75, 30.0): [-0.54501246434397554, -0.026834887880579802, 0.010195603314317611, -0.00078648615954105515],
+     (0.75, 40.0): [-0.54418127442022624, -0.025413224488871379, 0.010196455193836855, -0.00080610785749523739],
+     (0.75, 60.0): [-0.543265189207915, -0.024141961069146383, 0.010285001019536088, -0.00083332193364294587],
+     (0.75, 120.0): [-0.54224757817994806, -0.023039071833948214, 0.010463365295636302, -0.00086612828539477918],
+     (0.75, inf): [-0.54114579815367159, -0.02206592527426093, 0.01070374099737127, -0.00089726564005122183],
+     (0.8, 2.0): [-0.56895274046831146, -0.096326255190541957, 0.025815915364208686, -0.0019136561019354845],
+     (0.8, 3.0): [-0.5336038380862278, -0.077585191014876181, 0.020184759265389905, -0.0014242746007323785],
+     (0.8, 4.0): [-0.51780274285934258, -0.064987738443608709, 0.016713309796866204, -0.001135379856633562],
+     (0.8, 5.0): [-0.50894361222268403, -0.056379186603362705, 0.014511270339773345, -0.00096225604117493205],
+     (0.8, 6.0): [-0.50335153028630408, -0.050168860294790812, 0.01302807093593626, -0.00085269812692536306],
+     (0.8, 7.0): [-0.49960934380896432, -0.045417333787806033, 0.011955593330247398, -0.00077759605604250882],
+     (0.8, 8.0): [-0.49694518248979763, -0.041689151516021969, 0.011158986677273709, -0.00072497430103953366],
+     (0.8, 9.0): [-0.4949559974898507, -0.038702217132906024, 0.010554360004521268, -0.0006875213117164109],
+     (0.8, 10.0): [-0.49341407910162483, -0.036266788741325398, 0.010087354421936092, -0.00066060835062865602],
+     (0.8, 11.0): [-0.49218129312493897, -0.034252403643273498, 0.0097218584838579536, -0.00064123459335201907],
+     (0.8, 12.0): [-0.49117223957112183, -0.032563269730499021, 0.0094318583096021404, -0.00062725253852419032],
+     (0.8, 13.0): [-0.49032781145131277, -0.031132495018324432, 0.0091999762562792898, -0.0006172944366003854],
+     (0.8, 14.0): [-0.48961049628464259, -0.029906921170494854, 0.009012451847823854, -0.00061026211968669543],
+     (0.8, 15.0): [-0.48899069793054922, -0.028849609914548158, 0.0088602820002619594, -0.00060548991575179055],
+     (0.8, 16.0): [-0.48844921216636505, -0.027929790075266154, 0.00873599263877896, -0.00060242119796859379],
+     (0.8, 17.0): [-0.48797119683309537, -0.027123634910159868, 0.0086338139869481887, -0.00060061821593399998],
+     (0.8, 18.0): [-0.48754596864745836, -0.026411968723496961, 0.0085493196604705755, -0.00059977083160833624],
+     (0.8, 19.0): [-0.48716341805691843, -0.025781422230819986, 0.0084796655915025769, -0.00059970031758323466],
+     (0.8, 20.0): [-0.48681739197185547, -0.025219629852198749, 0.0084221844254287765, -0.00060023212822886711],
+     (0.8, 24.0): [-0.48570639629281365, -0.023480608772518948, 0.008274490561114187, -0.000605681105792215],
+     (0.8, 30.0): [-0.48455867067770253, -0.021824655071720423, 0.0081888502974720567, -0.00061762126933785633],
+     (0.8, 40.0): [-0.48335478729267423, -0.020279958998363389, 0.0081765095914194709, -0.00063657117129829635],
+     (0.8, 60.0): [-0.48207351944996679, -0.018875344346672228, 0.0082473997191472338, -0.00066242478479277243],
+     (0.8, 120.0): [-0.48070356185330182, -0.017621686995755746, 0.0084009638803223801, -0.00069300383808949318],
+     (0.8, inf): [-0.47926687718713606, -0.016476575352367202, 0.0086097059646591811, -0.00072160843492730911],
+     (0.85, 2.0): [-0.53366806986381743, -0.098288178252723263, 0.026002333446289064, -0.0019567144268844896],
+     (0.85, 3.0): [-0.48995919239619989, -0.077312722648418056, 0.019368984865418108, -0.0013449670192265796],
+     (0.85, 4.0): [-0.46956079162382858, -0.063818518513946695, 0.015581608910696544, -0.0010264315084377606],
+     (0.85, 5.0): [-0.45790853796153624, -0.054680511194530226, 0.013229852432203093, -0.00084248430847535898],
+     (0.85, 6.0): [-0.4505070841695738, -0.048050936682873302, 0.011636407582714191, -0.00072491480033529815],
+     (0.85, 7.0): [-0.44548337477336181, -0.042996612516383016, 0.010493052959891263, -0.00064528784792153239],
+     (0.85, 8.0): [-0.44186624932664148, -0.039040005821657585, 0.0096479530794160544, -0.00058990874360967567],
+     (0.85, 9.0): [-0.43914118689812259, -0.035875693030752713, 0.0090088804130628187, -0.00055071480339399694],
+     (0.85, 10.0): [-0.43701255390953769, -0.033300997407157376, 0.0085172159355344848, -0.00052272770799695464],
+     (0.85, 11.0): [-0.43530109064899053, -0.031174742038490313, 0.0081335619868386066, -0.00050268353809787927],
+     (0.85, 12.0): [-0.43389220376610071, -0.02939618314990838, 0.007830626267772851, -0.00048836431712678222],
+     (0.85, 13.0): [-0.43271026958463166, -0.027890759135246888, 0.0075886916668632936, -0.00047819339710596971],
+     (0.85, 14.0): [-0.43170230265007209, -0.026604156062396189, 0.0073939099688705547, -0.00047109996854335419],
+     (0.85, 15.0): [-0.43083160459377423, -0.025494228911600785, 0.0072358738657550868, -0.00046630677052262481],
+     (0.85, 16.0): [-0.4300699280587239, -0.024529612608808794, 0.0071069227026219683, -0.00046323869860941791],
+     (0.85, 17.0): [-0.42939734931902857, -0.023685025616054269, 0.0070011541609695891, -0.00046147954942994158],
+     (0.85, 18.0): [-0.42879829041505324, -0.022940655682782165, 0.006914006369119409, -0.00046070877994711774],
+     (0.85, 19.0): [-0.42826119448419875, -0.022280181781634649, 0.0068417746905826433, -0.00046066841214091982],
+     (0.85, 20.0): [-0.42777654887094479, -0.021690909076747832, 0.0067817408643717969, -0.00046118620289068032],
+     (0.85, 24.0): [-0.42622450033640852, -0.019869646711890065, 0.0066276799593494029, -0.00046668820637553747],
+     (0.85, 30.0): [-0.42463810443233418, -0.018130114737381745, 0.0065344613060499164, -0.00047835583417510423],
+     (0.85, 40.0): [-0.42299917804589382, -0.016498222901308417, 0.0065120558343578407, -0.00049656043685325469],
+     (0.85, 60.0): [-0.42129387265810464, -0.014992121475265813, 0.0065657795990087635, -0.00052069705640687698],
+     (0.85, 120.0): [-0.41951580476366368, -0.013615722489371183, 0.0066923911275726814, -0.00054846911649167492],
+     (0.85, inf): [-0.41768751825428968, -0.012327525092266726, 0.0068664920569562592, -0.00057403720261753539],
+     (0.9, 1.0): [-0.65851063279096722, -0.126716242078905, 0.036318801917603061, -0.002901283222928193],
+     (0.9, 2.0): [-0.50391945369829139, -0.096996108021146235, 0.024726437623473398, -0.0017901399938303017],
+     (0.9, 3.0): [-0.44799791843058734, -0.077180370333307199, 0.018584042055594469, -0.0012647038118363408],
+     (0.9, 4.0): [-0.42164091756145167, -0.063427071006287514, 0.014732203755741392, -0.00094904174117957688],
+     (0.9, 5.0): [-0.40686856251221754, -0.053361940054842398, 0.012041802076025801, -0.00072960198292410612],
+     (0.9, 6.0): [-0.39669926026535285, -0.046951517438004242, 0.010546647213094956, -0.00062621198002366064],
+     (0.9, 7.0): [-0.39006553675807426, -0.04169480606532109, 0.0093687546601737195, -0.00054648695713273862],
+     (0.9, 8.0): [-0.38570205067061908, -0.037083910859179794, 0.0083233218526375836, -0.00047177586974035451],
+     (0.9, 9.0): [-0.38190737267892938, -0.034004585655388865, 0.0077531991574119183, -0.00044306547308527872],
+     (0.9, 10.0): [-0.37893272918125737, -0.031394677600916979, 0.0072596802503533536, -0.0004160518834299966],
+     (0.9, 11.0): [-0.37692512492705132, -0.028780793403136471, 0.0066937909049060379, -0.00037420010136784526],
+     (0.9, 12.0): [-0.37506345200129187, -0.026956483290567372, 0.0064147730707776523, -0.00036595383207062906],
+     (0.9, 13.0): [-0.37339516122383209, -0.02543949524844704, 0.0061760656530197187, -0.00035678737379179527],
+     (0.9, 14.0): [-0.37216979891087842, -0.02396347606956644, 0.0059263234465969641, -0.0003439784452550796],
+     (0.9, 15.0): [-0.371209456600122, -0.022696132732654414, 0.0057521677184623147, -0.00033961108561770848],
+     (0.9, 16.0): [-0.36958924377983338, -0.022227885445863002, 0.0057691706799383926, -0.00035042762538099682],
+     (0.9, 17.0): [-0.36884224719083203, -0.021146977888668726, 0.0055957928269732716, -0.00034283810412697531],
+     (0.9, 18.0): [-0.36803087186793326, -0.020337731477576542, 0.0054655378095212759, -0.00033452966946535248],
+     (0.9, 19.0): [-0.3676700404163355, -0.019370115848857467, 0.0053249296207149655, -0.00032975528909580403],
+     (0.9, 20.0): [-0.36642276267188811, -0.019344251412284838, 0.0054454968582897528, -0.00034868111677540948],
+     (0.9, 24.0): [-0.36450650753755193, -0.017284255499990679, 0.0052337500059176749, -0.00034898202845747288],
+     (0.9, 30.0): [-0.36251868940168608, -0.015358560437631397, 0.0050914299956134786, -0.00035574528891633978],
+     (0.9, 40.0): [-0.36008886676510943, -0.014016835682905486, 0.0051930835959111514, -0.00038798316011984165],
+     (0.9, 60.0): [-0.35825590690268061, -0.011991568926537646, 0.0050632208542414191, -0.00039090198974493085],
+     (0.9, 120.0): [-0.35543612237284411, -0.011074403997811812, 0.0053504570752765162, -0.00043647137428074178],
+     (0.9, inf): [-0.35311806343057167, -0.0096254020092145353, 0.0054548591208177181, -0.00045343916634968493],
+     (0.95, 1.0): [-0.65330318136020071, -0.12638310760474375, 0.035987535130769424, -0.0028562665467665315],
+     (0.95, 2.0): [-0.47225160417826934, -0.10182570362271424, 0.025846563499059158, -0.0019096769058043243],
+     (0.95, 3.0): [-0.4056635555586528, -0.077067172693350297, 0.017789909647225533, -0.001182961668735774],
+     (0.95, 4.0): [-0.37041675177340955, -0.063815687118939465, 0.014115210247737845, -0.00089996098435117598],
+     (0.95, 5.0): [-0.35152398291152309, -0.052156502640669317, 0.010753738086401853, -0.0005986841939451575],
+     (0.95, 6.0): [-0.33806730015201264, -0.045668399809578597, 0.0093168898952878162, -0.00051369719615782102],
+     (0.95, 7.0): [-0.32924041072104465, -0.040019601775490091, 0.0080051199552865163, -0.00042054536135868043],
+     (0.95, 8.0): [-0.32289030266989077, -0.035575345931670443, 0.0070509089344694669, -0.00035980773304803576],
+     (0.95, 9.0): [-0.31767304201477375, -0.032464945930165703, 0.0064755950437272143, -0.0003316676253661824],
+     (0.95, 10.0): [-0.31424318064708656, -0.029133461621153, 0.0057437449431074795, -0.00027894252261209191],
+     (0.95, 11.0): [-0.31113589620384974, -0.02685115250591049, 0.0053517905282942889, -0.00026155954116874666],
+     (0.95, 12.0): [-0.30848983612414582, -0.025043238019239168, 0.0050661675913488829, -0.00025017202909614005],
+     (0.95, 13.0): [-0.3059212907410393, -0.023863874699213077, 0.0049618051135807322, -0.00025665425781125703],
+     (0.95, 14.0): [-0.30449676902720035, -0.021983976741572344, 0.0045740513735751968, -0.00022881166323945914],
+     (0.95, 15.0): [-0.30264908294481396, -0.02104880307520084, 0.0044866571614804382, -0.00023187587597844057],
+     (0.95, 16.0): [-0.30118294463097917, -0.020160231061926728, 0.0044170780759056859, -0.00023733502359045826],
+     (0.95, 17.0): [-0.30020013353427744, -0.018959271614471574, 0.0041925333038202285, -0.00022274025630789767],
+     (0.95, 18.0): [-0.29857886556874402, -0.018664437456802001, 0.0042557787632833697, -0.00023758868868853716],
+     (0.95, 19.0): [-0.29796289236978263, -0.017632218552317589, 0.0040792779937959866, -0.00022753271474613109],
+     (0.95, 20.0): [-0.29681506554838077, -0.017302563243037392, 0.0041188426221428964, -0.00023913038468772782],
+     (0.95, 24.0): [-0.29403146911167666, -0.015332330986025032, 0.0039292170319163728, -0.00024003445648641732],
+     (0.95, 30.0): [-0.29080775563775879, -0.013844059210779323, 0.0039279165616059892, -0.00026085104496801666],
+     (0.95, 40.0): [-0.28821583032805109, -0.011894686715666892, 0.0038202623278839982, -0.00026933325102031252],
+     (0.95, 60.0): [-0.28525636737751447, -0.010235910558409797, 0.0038147029777580001, -0.00028598362144178959],
+     (0.95, 120.0): [-0.28241065885026539, -0.0086103836327305026, 0.0038450612886908714, -0.00030206053671559411],
+     (0.95, inf): [-0.27885570064169296, -0.0078122455524849222, 0.0041798538053623453, -0.0003469494881774609],
+     (0.975, 1.0): [-0.65203598304297983, -0.12608944279227957, 0.035710038757117347, -0.0028116024425349053],
+     (0.975, 2.0): [-0.46371891130382281, -0.096954458319996509, 0.023958312519912289, -0.0017124565391080503],
+     (0.975, 3.0): [-0.38265282195259875, -0.076782539231612282, 0.017405078796142955, -0.0011610853687902553],
+     (0.975, 4.0): [-0.34051193158878401, -0.063652342734671602, 0.013528310336964293, -0.00083644708934990761],
+     (0.975, 5.0): [-0.31777655705536484, -0.051694686914334619, 0.010115807205265859, -0.00054517465344192009],
+     (0.975, 6.0): [-0.30177149019958716, -0.044806697631189059, 0.008483551848413786, -0.00042827853925009264],
+     (0.975, 7.0): [-0.29046972313293562, -0.039732822689098744, 0.007435356037378946, -0.00037562928283350671],
+     (0.975, 8.0): [-0.28309484007368141, -0.034764904940713388, 0.0062932513694928518, -0.00029339243611357956],
+     (0.975, 9.0): [-0.27711707948119785, -0.031210465194810709, 0.0055576244284178435, -0.00024663798208895803],
+     (0.975, 10.0): [-0.27249203448553611, -0.028259756468251584, 0.00499112012528406, -0.00021535380417035389],
+     (0.975, 11.0): [-0.26848515860011007, -0.026146703336893323, 0.0046557767110634073, -0.00020400628148271448],
+     (0.975, 12.0): [-0.26499921540008192, -0.024522931106167097, 0.0044259624958665278, -0.00019855685376441687],
+     (0.975, 13.0): [-0.2625023751891592, -0.022785875653297854, 0.004150277321193792, -0.00018801223218078264],
+     (0.975, 14.0): [-0.26038552414321758, -0.021303509859738341, 0.0039195608280464681, -0.00017826200169385824],
+     (0.975, 15.0): [-0.25801244886414665, -0.020505508012402567, 0.0038754868932712929, -0.00018588907991739744],
+     (0.975, 16.0): [-0.25685316062360508, -0.018888418269740373, 0.0035453092842317293, -0.00016235770674204116],
+     (0.975, 17.0): [-0.25501132271353549, -0.018362951972357794, 0.0035653933105288631, -0.00017470353354992729],
+     (0.975, 18.0): [-0.25325045404452656, -0.017993537285026156, 0.0036035867405376691, -0.00018635492166426884],
+     (0.975, 19.0): [-0.25236899494677928, -0.016948921372207198, 0.0034138931781330802, -0.00017462253414687881],
+     (0.975, 20.0): [-0.25134498025027691, -0.016249564498874988, 0.0033197284005334333, -0.00017098091103245596],
+     (0.975, 24.0): [-0.24768690797476625, -0.014668160763513996, 0.0032850791186852558, -0.00019013480716844995],
+     (0.975, 30.0): [-0.24420834707522676, -0.012911171716272752, 0.0031977676700968051, -0.00020114907914487053],
+     (0.975, 40.0): [-0.24105725356215926, -0.010836526056169627, 0.0030231303550754159, -0.00020128696343148667],
+     (0.975, 60.0): [-0.23732082703955223, -0.0095442727157385391, 0.0031432904473555259, -0.00023062224109383941],
+     (0.975, 120.0): [-0.23358581879594578, -0.0081281259918709343, 0.0031877298679120094, -0.00024496230446851501],
+     (0.975, inf): [-0.23004105093119268, -0.0067112585174133573, 0.0032760251638919435, -0.00026244001319462992],
+     (0.99, 1.0): [-0.65154119422706203, -0.1266603927572312, 0.03607480609672048, -0.0028668112687608113],
+     (0.99, 2.0): [-0.45463403324378804, -0.098701236234527367, 0.024412715761684689, -0.0017613772919362193],
+     (0.99, 3.0): [-0.36402060051035778, -0.079244959193729148, 0.017838124021360584, -0.00119080116484847],
+     (0.99, 4.0): [-0.31903506063953818, -0.061060740682445241, 0.012093154962939612, -0.00067268347188443093],
+     (0.99, 5.0): [-0.28917014580689182, -0.052940780099313689, 0.010231009146279354, -0.00057178339184615239],
+     (0.99, 6.0): [-0.27283240161179012, -0.042505435573209085, 0.0072753401118264534, -0.00031314034710725922],
+     (0.99, 7.0): [-0.25773968720546719, -0.039384214480463406, 0.0069120882597286867, -0.00032994068754356204],
+     (0.99, 8.0): [-0.24913629282433833, -0.033831567178432859, 0.0055516244725724185, -0.00022570786249671376],
+     (0.99, 9.0): [-0.24252380896373404, -0.029488280751457097, 0.0045215453527922998, -0.00014424552929022646],
+     (0.99, 10.0): [-0.23654349556639986, -0.02705600214566789, 0.0041627255469343632, -0.00013804427029504753],
+     (0.99, 11.0): [-0.23187404969432468, -0.024803662094970855, 0.0037885852786822475, -0.00012334999287725012],
+     (0.99, 12.0): [-0.22749929386320905, -0.023655085290534145, 0.0037845051889055896, -0.00014785715789924055],
+     (0.99, 13.0): [-0.22458989143485605, -0.021688394892771506, 0.0034075294601425251, -0.00012436961982044268],
+     (0.99, 14.0): [-0.22197623872225777, -0.020188830700102918, 0.0031648685865587473, -0.00011320740119998819],
+     (0.99, 15.0): [-0.2193924323730066, -0.019327469111698265, 0.0031295453754886576, -0.00012373072900083014],
+     (0.99, 16.0): [-0.21739436875855705, -0.018215854969324128, 0.0029638341057222645, -0.00011714667871412003],
+     (0.99, 17.0): [-0.21548926805467686, -0.017447822179412719, 0.0028994805120482812, -0.00012001887015183794],
+     (0.99, 18.0): [-0.21365014687077843, -0.01688869353338961, 0.0028778031289216546, -0.00012591199104792711],
+     (0.99, 19.0): [-0.21236653761262406, -0.016057151563612645, 0.0027571468998022017, -0.00012049196593780046],
+     (0.99, 20.0): [-0.21092693178421842, -0.015641706950956638, 0.0027765989877361293, -0.00013084915163086915],
+     (0.99, 24.0): [-0.20681960327410207, -0.013804298040271909, 0.0026308276736585674, -0.0001355061502101814],
+     (0.99, 30.0): [-0.20271691131071576, -0.01206095288359876, 0.0025426138004198909, -0.00014589047959047533],
+     (0.99, 40.0): [-0.19833098054449289, -0.010714533963740719, 0.0025985992420317597, -0.0001688279944262007],
+     (0.99, 60.0): [-0.19406768821236584, -0.0093297106482013985, 0.0026521518387539584, -0.00018884874193665104],
+     (0.99, 120.0): [-0.19010213174677365, -0.0075958207221300924, 0.0025660823297025633, -0.00018906475172834352],
+     (0.99, inf): [-0.18602070255787137, -0.0062121155165363188, 0.0026328293420766593, -0.00020453366529867131],
+     (0.995, 1.0): [-0.65135583544951825, -0.1266868999507193, 0.036067522182457165, -0.0028654516958844922],
+     (0.995, 2.0): [-0.45229774013072793, -0.09869462954369547, 0.024381858599368908, -0.0017594734553033394],
+     (0.995, 3.0): [-0.35935765236429706, -0.076650408326671915, 0.016823026893528978, -0.0010835134496404637],
+     (0.995, 4.0): [-0.30704474720931169, -0.063093047731613019, 0.012771683306774929, -0.00075852491621809955],
+     (0.995, 5.0): [-0.27582551740863454, -0.052533353137885791, 0.0097776009845174372, -0.00051338031756399129],
+     (0.995, 6.0): [-0.25657971464398704, -0.043424914996692286, 0.0074324147435969991, -0.00034105188850494067],
+     (0.995, 7.0): [-0.24090407819707738, -0.039591604712200287, 0.0068848429451020387, -0.00034737131709273414],
+     (0.995, 8.0): [-0.23089540800827862, -0.034353305816361958, 0.0056009527629820111, -0.00024389336976992433],
+     (0.995, 9.0): [-0.22322694848310584, -0.030294770709722547, 0.0046751239747245543, -0.00017437479314218922],
+     (0.995, 10.0): [-0.21722684126671632, -0.026993563560163809, 0.0039811592710905491, -0.00013135281785826703],
+     (0.995, 11.0): [-0.21171635822852911, -0.025156193618212551, 0.0037507759652964205, -0.00012959836685175671],
+     (0.995, 12.0): [-0.20745332165849167, -0.023318819535607219, 0.0034935020002058903, -0.00012642826898405916],
+     (0.995, 13.0): [-0.20426054591612508, -0.021189796175249527, 0.003031472176128759, -9.0497733877531618e-05],
+     (0.995, 14.0): [-0.20113536905578902, -0.020011536696623061, 0.0029215880889956729, -9.571527213951222e-05],
+     (0.995, 15.0): [-0.19855601561006403, -0.018808533734002542, 0.0027608859956002344, -9.2472995256929217e-05],
+     (0.995, 16.0): [-0.19619157579534008, -0.017970461530551096, 0.0027113719105000371, -9.9864874982890861e-05],
+     (0.995, 17.0): [-0.19428015140726104, -0.017009762497670704, 0.0025833389598201345, -9.6137545738061124e-05],
+     (0.995, 18.0): [-0.19243180236773033, -0.01631617252107519, 0.0025227443561618621, -9.8067580523432881e-05],
+     (0.995, 19.0): [-0.19061294393069844, -0.01586226613672222, 0.0025207005902641781, -0.00010466151274918466],
+     (0.995, 20.0): [-0.18946302696580328, -0.014975796567260896, 0.0023700506576419867, -9.5507779057884629e-05],
+     (0.995, 24.0): [-0.18444251428695257, -0.013770955893918012, 0.0024579445553339903, -0.00012688402863358003],
+     (0.995, 30.0): [-0.18009742499570078, -0.011831341846559026, 0.0022801125189390046, -0.00012536249967254906],
+     (0.995, 40.0): [-0.17562721880943261, -0.010157142650455463, 0.0022121943861923474, -0.000134542652873434],
+     (0.995, 60.0): [-0.17084630673594547, -0.0090224965852754805, 0.0023435529965815565, -0.00016240306777440115],
+     (0.995, 120.0): [-0.16648414081054147, -0.0074792163241677225, 0.0023284585524533607, -0.00017116464012147041],
+     (0.995, inf): [-0.16213921875452461, -0.0058985998630496144, 0.0022605819363689093, -0.00016896211491119114],
+     (0.999, 1.0): [-0.65233994072089363, -0.12579427445444219, 0.035830577995679271, -0.0028470555202945564],
+     (0.999, 2.0): [-0.45050164311326341, -0.098294804380698292, 0.024134463919493736, -0.0017269603956852841],
+     (0.999, 3.0): [-0.35161741499307819, -0.076801152272374273, 0.016695693063138672, -0.0010661121974071864],
+     (0.999, 4.0): [-0.29398448788574133, -0.06277319725219685, 0.012454220010543127, -0.00072644165723402445],
+     (0.999, 5.0): [-0.25725364564365477, -0.053463787584337355, 0.0099664236557431545, -0.00054866039388980659],
+     (0.999, 6.0): [-0.23674225795168574, -0.040973155890031254, 0.0062599481191736696, -0.00021565734226586692],
+     (0.999, 7.0): [-0.21840108878983297, -0.037037020271877719, 0.0055908063671900703, -0.00020238790479809623],
+     (0.999, 8.0): [-0.2057964743918449, -0.032500885103194356, 0.0046441644585661756, -0.00014769592268680274],
+     (0.999, 9.0): [-0.19604592954882674, -0.029166922919677936, 0.0040644333111949814, -0.00012854052861297006],
+     (0.999, 10.0): [-0.18857328935948367, -0.026316705703161091, 0.0035897350868809275, -0.00011572282691335702],
+     (0.999, 11.0): [-0.18207431428535406, -0.024201081944369412, 0.0031647372098056077, -8.1145935982296439e-05],
+     (0.999, 12.0): [-0.17796358148991101, -0.021054306118620879, 0.0023968085939602055, -1.5907156771296993e-05],
+     (0.999, 13.0): [-0.17371965962745489, -0.019577162950177709, 0.0022391783473999739, -2.0613023472812558e-05],
+     (0.999, 14.0): [-0.16905298116759873, -0.01967115985443986, 0.0026495208325889269, -9.1074275220634073e-05],
+     (0.999, 15.0): [-0.16635662558214312, -0.017903767183469876, 0.0022301322677100496, -5.1956773935885426e-05],
+     (0.999, 16.0): [-0.16388776549525449, -0.016671918839902419, 0.0020365289602744382, -4.3592447599724942e-05],
+     (0.999, 17.0): [-0.16131934177990759, -0.015998918405126326, 0.0019990454743285904, -4.8176277491327653e-05],
+     (0.999, 18.0): [-0.15880633110376571, -0.015830715141055916, 0.0021688405343832091, -8.061825248932771e-05],
+     (0.999, 19.0): [-0.15644841913314136, -0.015729364721105681, 0.0022981443610378136, -0.00010093672643417343],
+     (0.999, 20.0): [-0.15516596606222705, -0.014725095968258637, 0.0021117117014292155, -8.8806880297328484e-05],
+     (0.999, 24.0): [-0.14997437768645827, -0.012755323295476786, 0.0018871651510496939, -8.0896370662414938e-05],
+     (0.999, 30.0): [-0.14459974882323703, -0.011247323832877647, 0.0018637400643826279, -9.6415323191606741e-05],
+     (0.999, 40.0): [-0.13933285919392555, -0.0097151769692496587, 0.0018131251876208683, -0.00010452598991994023],
+     (0.999, 60.0): [-0.13424555343804143, -0.0082163027951669444, 0.0017883427892173382, -0.00011415865110808405],
+     (0.999, 120.0): [-0.12896119523040372, -0.0070426701112581112, 0.0018472364154226955, -0.00012862202979478294],
+     (0.999, inf): [-0.12397213562666673, -0.0056901201604149998, 0.0018260689406957129, -0.00013263452567995485]}
+
+# p values that are defined in the A table
+p_keys = [.1,.5,.675,.75,.8,.85,.9,.95,.975,.99,.995,.999]
+
+# v values that are defined in the A table
+v_keys = lrange(2, 21) + [24, 30, 40, 60, 120, inf]
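
The comment block above describes how Gleason's tabled coefficients become a quantile estimate: form f-hat as a polynomial in log(r - 1) and push it through the inverse t transform. For a tabled (p, v) pair this reduces to the minimal sketch below; the function name is hypothetical, the small r == 3 corrections of eq. 2.7/2.8 are omitted, and _func/_qsturng further down implement the full version.

    import math
    import scipy.stats

    def approx_qsturng(p, r, v):
        # hypothetical helper: only valid when (p, v) is a key of the A table
        a = A[(p, v)]
        v = min(v, 1e38)              # same clamp _qsturng applies for v = inf
        lr = math.log(r - 1.)
        f_hat = a[0]*lr + a[1]*lr**2 + a[2]*lr**3 + a[3]*lr**4   # eq. 2.3
        # inverse t transform of Gleason's f-hat estimate
        return math.sqrt(2) * -(1. - f_hat) * scipy.stats.t.isf((1. + p) / 2., v)
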

 def _isfloat(x):
     """
     returns True if x is a float,
     returns False otherwise
     """
-    pass
+    # anything that float() accepts (ints, numeric strings, numpy scalars, ...)
+    # counts as a float here
+    try:
+        float(x)
+    except (ValueError, TypeError):
+        return False

+    return True

-def _phi(p):
+##def _phi(p):
+##    """returns the pth quantile inverse norm"""
+##    return scipy.stats.norm.isf(p)
+
+def _phi(p):
+    # this function is faster than using scipy.stats.norm.isf(p),
+    # but the permissiveness of its license is not explicitly stated;
+    # using scipy.stats.norm.isf(p) is an acceptable alternative
     """
     Modified from the author's original perl code (original comments follow below)
     by dfield@yahoo-inc.com.  May 3, 2004.
@@ -562,56 +413,351 @@ def _phi(p):
     E-mail:      pjacklam@online.no
     WWW URL:     http://home.online.no/~pjacklam
     """
-    pass

+    if p <= 0 or p >= 1:
+        # The original perl code exits here, we'll throw an exception instead
+        raise ValueError("Argument to ltqnorm %f must be in open interval (0,1)" % p)
+
+    # Coefficients in rational approximations.
+    a = (-3.969683028665376e+01,  2.209460984245205e+02, \
+         -2.759285104469687e+02,  1.383577518672690e+02, \
+         -3.066479806614716e+01,  2.506628277459239e+00)
+    b = (-5.447609879822406e+01,  1.615858368580409e+02, \
+         -1.556989798598866e+02,  6.680131188771972e+01, \
+         -1.328068155288572e+01 )
+    c = (-7.784894002430293e-03, -3.223964580411365e-01, \
+         -2.400758277161838e+00, -2.549732539343734e+00, \
+          4.374664141464968e+00,  2.938163982698783e+00)
+    d = ( 7.784695709041462e-03,  3.224671290700398e-01, \
+          2.445134137142996e+00,  3.754408661907416e+00)
+
+    # Define break-points.
+    plow  = 0.02425
+    phigh = 1 - plow
+
+    # Rational approximation for lower region:
+    if p < plow:
+        q  = math.sqrt(-2*math.log(p))
+        return -(((((c[0]*q+c[1])*q+c[2])*q+c[3])*q+c[4])*q+c[5]) / \
+                 ((((d[0]*q+d[1])*q+d[2])*q+d[3])*q+1)
+
+    # Rational approximation for upper region:
+    if phigh < p:
+        q  = math.sqrt(-2*math.log(1-p))
+        return (((((c[0]*q+c[1])*q+c[2])*q+c[3])*q+c[4])*q+c[5]) / \
+                ((((d[0]*q+d[1])*q+d[2])*q+d[3])*q+1)
+
+    # Rational approximation for central region:
+    q = p - 0.5
+    r = q*q
+    return -(((((a[0]*r+a[1])*r+a[2])*r+a[3])*r+a[4])*r+a[5])*q / \
+           (((((b[0]*r+b[1])*r+b[2])*r+b[3])*r+b[4])*r+1)

 def _ptransform(p):
     """function for p-value abscissa transformation"""
-    pass
-
+    return -1. / (1. + 1.5 * _phi((1. + p)/2.))
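
As the comments above note, scipy.stats.norm.isf(p) is an acceptable substitute for _phi. A quick sanity check along those lines (illustrative only, assuming SciPy is installed) is:

    import numpy as np
    import scipy.stats

    # Acklam's rational approximation agrees with the exact inverse
    # survival function to roughly 1e-9 absolute error
    for p in (0.01, 0.3, 0.5, 0.9, 0.999):
        assert np.isclose(_phi(p), scipy.stats.norm.isf(p), atol=1e-8)
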

 def _func(a, p, r, v):
     """
     calculates f-hat for the coefficients in a, probability p,
     sample mean difference r, and degrees of freedom v.
     """
-    pass
+    # eq. 2.3
+    f = a[0]*math.log(r-1.) + \
+        a[1]*math.log(r-1.)**2 + \
+        a[2]*math.log(r-1.)**3 + \
+        a[3]*math.log(r-1.)**4
+
+    # eq. 2.7 and 2.8 corrections
+    if r == 3:
+        f += -0.002 / (1. + 12. * _phi(p)**2)
+
+        if v <= 4.364:
+            v = v if not np.isinf(v) else 1e38
+            f += 1. / 517. - 1. / (312. * v)
+        else:
+            v = v if not np.isinf(v) else 1e38
+            f += 1. / (191. * v)

+    return -f

 def _select_ps(p):
+    # There are more generic ways of doing this but profiling
+    # revealed that selecting these points is one of the slow
+    # things that is easy to change. This is about 11 times
+    # faster than the generic algorithm it is replacing.
+    #
+    # it is possible that different break points could yield
+    # better estimates, but the function this is refactoring
+    # just used linear distance.
     """returns the points to use for interpolating p"""
-    pass
-
+    if p >= .99:
+        return .990, .995, .999
+    elif p >= .975:
+        return .975, .990, .995
+    elif p >= .95:
+        return .950, .975, .990
+    elif p >= .9125:
+        return .900, .950, .975
+    elif p >= .875:
+        return .850, .900, .950
+    elif p >= .825:
+        return .800, .850, .900
+    elif p >= .7625:
+        return .750, .800, .850
+    elif p >= .675:
+        return .675, .750, .800
+    elif p >= .500:
+        return .500, .675, .750
+    else:
+        return .100, .500, .675

 def _interpolate_p(p, r, v):
     """
     interpolates p based on the values in the A table for the
     scalar value of r and the scalar value of v
     """
-    pass

+    # interpolate p (v should be in table)
+    # if p <= .5 use linear interpolation in q
+    # if p > .5 use quadratic interpolation in log(y + r/v)
+    # by -1. / (1. + 1.5 * _phi((1. + p)/2.))
+
+    # find the 3 closest p values
+    p0, p1, p2 = _select_ps(p)
+    try:
+        y0 = _func(A[(p0, v)], p0, r, v) + 1.
+    except:
+        print(p,r,v)
+        raise
+    y1 = _func(A[(p1, v)], p1, r, v) + 1.
+    y2 = _func(A[(p2, v)], p2, r, v) + 1.
+
+    y_log0 = math.log(y0 + float(r)/float(v))
+    y_log1 = math.log(y1 + float(r)/float(v))
+    y_log2 = math.log(y2 + float(r)/float(v))
+
+    # If p <= .85 apply only the ordinate transformation
+    # if p > .85 apply the ordinate and the abscissa transformation
+    # In both cases apply quadratic interpolation
+    if p > .85:
+        p_t  = _ptransform(p)
+        p0_t = _ptransform(p0)
+        p1_t = _ptransform(p1)
+        p2_t = _ptransform(p2)
+
+        # calculate derivatives for quadratic interpolation
+        d2 = 2*((y_log2-y_log1)/(p2_t-p1_t) - \
+                (y_log1-y_log0)/(p1_t-p0_t))/(p2_t-p0_t)
+        if (p2+p0)>=(p1+p1):
+            d1 = (y_log2-y_log1)/(p2_t-p1_t) - 0.5*d2*(p2_t-p1_t)
+        else:
+            d1 = (y_log1-y_log0)/(p1_t-p0_t) + 0.5*d2*(p1_t-p0_t)
+        d0 = y_log1
+
+        # interpolate value
+        y_log = (d2/2.) * (p_t-p1_t)**2. + d1 * (p_t-p1_t) + d0
+
+        # transform back to y
+        y = math.exp(y_log) - float(r)/float(v)
+
+    elif p > .5:
+        # calculate derivatives for quadratic interpolation
+        d2 = 2*((y_log2-y_log1)/(p2-p1) - \
+                (y_log1-y_log0)/(p1-p0))/(p2-p0)
+        if (p2+p0)>=(p1+p1):
+            d1 = (y_log2-y_log1)/(p2-p1) - 0.5*d2*(p2-p1)
+        else:
+            d1 = (y_log1-y_log0)/(p1-p0) + 0.5*d2*(p1-p0)
+        d0 = y_log1
+
+        # interpolate values
+        y_log = (d2/2.) * (p-p1)**2. + d1 * (p-p1) + d0
+
+        # transform back to y
+        y = math.exp(y_log) - float(r)/float(v)
+
+    else:
+        # linear interpolation in q and p
+        v = min(v, 1e38)
+        q0 = math.sqrt(2) * -y0 * \
+             scipy.stats.t.isf((1.+p0)/2., v)
+        q1 = math.sqrt(2) * -y1 * \
+             scipy.stats.t.isf((1.+p1)/2., v)
+
+        d1 = (q1-q0)/(p1-p0)
+        d0 = q0
+
+        # interpolate values
+        q = d1 * (p-p0) + d0
+
+        # transform back to y
+        y = -q / (math.sqrt(2) * scipy.stats.t.isf((1.+p)/2., v))
+
+    return y

 def _select_vs(v, p):
+    # This one is about 30 times faster than
+    # the generic algorithm it is replacing.
     """returns the points to use for interpolating v"""
-    pass

+    if v >= 120.:
+        return 60, 120, inf
+    elif v >= 60.:
+        return 40, 60, 120
+    elif v >= 40.:
+        return 30, 40, 60
+    elif v >= 30.:
+        return 24, 30, 40
+    elif v >= 24.:
+        return 20, 24, 30
+    elif v >= 19.5:
+        return 19, 20, 24
+
+    if p >= .9:
+        if v < 2.5:
+            return 1, 2, 3
+    else:
+        if v < 3.5:
+            return 2, 3, 4
+
+    vi = int(round(v))
+    return vi - 1, vi, vi + 1

 def _interpolate_v(p, r, v):
     """
     interpolates v based on the values in the A table for the
     scalar value of r and the scalar value of v
     """
-    pass
-
+    # interpolate v (p should be in table)
+    # ordinate: y**2
+    # abscissa:  1./v
+
+    # find the 3 closest v values
+    # only p >= .9 have table values for 1 degree of freedom.
+    # The boolean is used to index the tuple and append 1 when
+    # p >= .9
+    v0, v1, v2 = _select_vs(v, p)
+
+    # y = f - 1.
+    y0_sq = (_func(A[(p,v0)], p, r, v0) + 1.)**2.
+    y1_sq = (_func(A[(p,v1)], p, r, v1) + 1.)**2.
+    y2_sq = (_func(A[(p,v2)], p, r, v2) + 1.)**2.
+
+    # if v2 is inf set to a big number so interpolation
+    # calculations will work
+    if v2 > 1e38:
+        v2 = 1e38
+
+    # transform v
+    v_, v0_, v1_, v2_ = 1./v, 1./v0, 1./v1, 1./v2
+
+    # calculate derivatives for quadratic interpolation
+    d2 = 2.*((y2_sq-y1_sq)/(v2_-v1_) - \
+             (y0_sq-y1_sq)/(v0_-v1_)) / (v2_-v0_)
+    if (v2_ + v0_) >= (v1_ + v1_):
+        d1 = (y2_sq-y1_sq) / (v2_-v1_) - 0.5*d2*(v2_-v1_)
+    else:
+        d1 = (y1_sq-y0_sq) / (v1_-v0_) + 0.5*d2*(v1_-v0_)
+    d0 = y1_sq
+
+    # calculate y
+    y = math.sqrt((d2/2.)*(v_-v1_)**2. + d1*(v_-v1_)+ d0)
+
+    return y

 def _qsturng(p, r, v):
     """scalar version of qsturng"""
-    pass
+##    print 'q',p
+    # r is interpolated through the q to y here we only need to
+    # account for when p and/or v are not found in the table.
+    global A, p_keys, v_keys
+
+    if p < .1 or p > .999:
+        raise ValueError('p must be between .1 and .999')
+
+    if p < .9:
+        if v < 2:
+            raise ValueError('v must be > 2 when p < .9')
+    else:
+        if v < 1:
+            raise ValueError('v must be > 1 when p >= .9')
+
+    # The easy case. A tabled value is requested.
+
+    #numpy 1.4.1: TypeError: unhashable type: 'numpy.ndarray' :
+    p = float(p)
+    if isinstance(v, np.ndarray):
+        v = v.item()
+    if (p,v) in A:
+        y = _func(A[(p,v)], p, r, v) + 1.
+
+    elif p not in p_keys and v not in v_keys+([],[1])[p>=.90]:
+        # apply bilinear (quadratic) interpolation
+        #
+        #   p0,v2 +        o         + p1,v2    + p2,v2
+        #                    r2
+        #
+        # 1
+        # -                 (p,v)
+        # v                x
+        #
+        #                    r1
+        #   p0,v1 +        o         + p1,v1    + p2,v1
+        #
+        #
+        #   p0,v0 +        o r0      + p1,v0    + p2,v0
+        #
+        #             _ptransform(p)
+        #
+        # (p1 and v1 may be below or above (p,v). The algorithm
+        # works in both cases. For diagramatic simplicity it is
+        # shown as above)
+        #
+        # 1. at v0, v1, and v2 use quadratic interpolation
+        #    to find r0, r1, r2
+        #
+        # 2. use r0, r1, r2 and quadratic interpolation
+        #    to find y at (p,v)
+
+        # find the 3 closest v values
+        v0, v1, v2 = _select_vs(v, p)
+
+        # find the 3 closest p values
+        p0, p1, p2 = _select_ps(p)
+
+        # calculate r0, r1, and r2
+        r0_sq = _interpolate_p(p, r, v0)**2
+        r1_sq = _interpolate_p(p, r, v1)**2
+        r2_sq = _interpolate_p(p, r, v2)**2
+
+        # transform v
+        v_, v0_, v1_, v2_ = 1./v, 1./v0, 1./v1, 1./v2
+
+        # calculate derivatives for quadratic interpolation
+        d2 = 2.*((r2_sq-r1_sq)/(v2_-v1_) - \
+                 (r0_sq-r1_sq)/(v0_-v1_)) / (v2_-v0_)
+        if (v2_ + v0_) >= (v1_ + v1_):
+            d1 = (r2_sq-r1_sq) / (v2_-v1_) - 0.5*d2*(v2_-v1_)
+        else:
+            d1 = (r1_sq-r0_sq) / (v1_-v0_) + 0.5*d2*(v1_-v0_)
+        d0 = r1_sq

+        # calculate y
+        y = math.sqrt((d2/2.)*(v_-v1_)**2. + d1*(v_-v1_)+ d0)

-_vqsturng = np.vectorize(_qsturng)
-_vqsturng.__doc__ = 'vector version of qsturng'
+    elif v not in v_keys+([],[1])[p>=.90]:
+        y = _interpolate_v(p, r, v)

+    elif p not in p_keys:
+        y = _interpolate_p(p, r, v)
+
+    v = min(v, 1e38)
+    return math.sqrt(2) * -y * scipy.stats.t.isf((1. + p) / 2., v)
+
+# make a qsturng function that will accept list-like objects
+_vqsturng = np.vectorize(_qsturng)
+_vqsturng.__doc__ = """vector version of qsturng"""

 def qsturng(p, r, v):
     """Approximates the quantile p for a studentized range
@@ -640,17 +786,64 @@ def qsturng(p, r, v):
     q : (scalar, array_like)
         approximation of the Studentized Range
     """
-    pass

+    if all(map(_isfloat, [p, r, v])):
+        return _qsturng(p, r, v)
+    return _vqsturng(p, r, v)
+
+##def _qsturng0(p, r, v):
+####    print 'q0',p
+##    """
+##    returns a first order approximation of q studentized range
+##    value, based on Lund and Lund's (1983) FORTRAN77
+##    algorithm AS 190.2, Appl. Statist.
+##    """
+##    vmax = 120.
+##    c = [0.8843, 0.2368, 1.214, 1.208, 1.4142]
+##
+##    t = -_phi(.5+.5*p)
+##    if (v < vmax):
+##        t += (t**3. + t) / float(v) / 4.
+##
+##    q = c[0] - c[1] * t
+##    if (v < vmax):
+##        q = q - c[2] / float(v) + c[3] * t / float(v)
+##    q = t * (q * math.log(r - 1.) + c[4])
+##
+##    # apply "bar napkin" correction for when p < .85
+##    # this is good enough for our intended purpose
+##    if p < .85:
+##        q += math.log10(r) * 2.25 * (.85-p)
+##    return q

 def _psturng(q, r, v):
     """scalar version of psturng"""
-    pass
-
+    if q < 0.:
+        raise ValueError('q should be >= 0')
+
+    def opt_func(p, r, v):
+        return np.squeeze(abs(_qsturng(p, r, v) - q))
+
+    if v == 1:
+        if q < _qsturng(.9, r, 1):
+            return .1
+        elif q > _qsturng(.999, r, 1):
+            return .001
+        soln = 1. - fminbound(opt_func, .9, .999, args=(r,v))
+        return np.atleast_1d(soln)
+    else:
+        if q < _qsturng(.1, r, v):
+            return .9
+        elif q > _qsturng(.999, r, v):
+            return .001
+        soln = 1. - fminbound(opt_func, .1, .999, args=(r,v))
+        return np.atleast_1d(soln)
+
+def _psturng_scalar(q, r, v):
+    return np.squeeze(_psturng(q, r, v))

 _vpsturng = np.vectorize(_psturng_scalar)
-_vpsturng.__doc__ = 'vector version of psturng'
-
+_vpsturng.__doc__ = """vector version of psturng"""

 def psturng(q, r, v):
     """Evaluates the probability from 0 to q for a studentized
@@ -680,4 +873,32 @@ def psturng(q, r, v):
         and .1, when v > 1, p is bound between .001 and .9.
         Values between .5 and .9 are 1st order approximations.
     """
-    pass
+    if all(map(_isfloat, [q, r, v])):
+        return _psturng(q, r, v)
+    return _vpsturng(q, r, v)
+
+##p, r, v = .9, 10, 20
+##print
+##print 'p and v interpolation'
+##print '\t20\t22\t24'
+##print '.75',qsturng(.75, r, 20),qsturng(.75, r, 22),qsturng(.75, r, 24)
+##print '.85',qsturng(.85, r, 20),qsturng(.85, r, 22),qsturng(.85, r, 24)
+##print '.90',qsturng(.90, r, 20),qsturng(.90, r, 22),qsturng(.90, r, 24)
+##print
+##print 'p and v interpolation'
+##print '\t120\t500\tinf'
+##print '.950',qsturng(.95, r, 120),qsturng(.95, r, 500),qsturng(.95, r, inf)
+##print '.960',qsturng(.96, r, 120),qsturng(.96, r, 500),qsturng(.96, r, inf)
+##print '.975',qsturng(.975, r, 120),qsturng(.975, r, 500),qsturng(.975, r, inf)
+##print
+##print 'p and v interpolation'
+##print '\t40\t50\t60'
+##print '.950',qsturng(.95, r, 40),qsturng(.95, r, 50),qsturng(.95, r, 60)
+##print '.960',qsturng(.96, r, 40),qsturng(.96, r, 50),qsturng(.96, r, 60)
+##print '.975',qsturng(.975, r, 40),qsturng(.975, r, 50),qsturng(.975, r, 60)
+##print
+##print 'p and v interpolation'
+##print '\t20\t22\t24'
+##print '.50',qsturng(.5, r, 20),qsturng(.5, r, 22),qsturng(.5, r, 24)
+##print '.60',qsturng(.6, r, 20),qsturng(.6, r, 22),qsturng(.6, r, 24)
+##print '.75',qsturng(.75, r, 20),qsturng(.75, r, 22),qsturng(.75, r, 24)
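
Together these give an approximate quantile function (qsturng) and tail probability (psturng) for the studentized range. A small usage sketch, assuming the module is importable as statsmodels.stats.libqsturng and that SciPy >= 1.7 is available for the exact cross-check (tolerances are illustrative):

    import numpy as np
    import scipy.stats
    from statsmodels.stats.libqsturng import psturng, qsturng

    q = qsturng(0.95, 3, 16)     # Tukey-type critical value for 3 means, 16 df
    p = psturng(q, 3, 16)        # inverse direction, should recover ~0.05

    # cross-check against SciPy's exact studentized range distribution
    q_exact = scipy.stats.studentized_range.ppf(0.95, 3, 16)
    assert np.isclose(q, q_exact, rtol=1e-2)
    assert np.isclose(p, 0.05, atol=5e-3)
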
diff --git a/statsmodels/stats/mediation.py b/statsmodels/stats/mediation.py
index 261eb1b25..b2a3aed32 100644
--- a/statsmodels/stats/mediation.py
+++ b/statsmodels/stats/mediation.py
@@ -122,41 +122,103 @@ class Mediation:
     Software 59:5.  http://www.jstatsoft.org/v59/i05/paper
     """

-    def __init__(self, outcome_model, mediator_model, exposure, mediator=
-        None, moderators=None, outcome_fit_kwargs=None, mediator_fit_kwargs
-        =None, outcome_predict_kwargs=None):
+    def __init__(self, outcome_model, mediator_model, exposure, mediator=None,
+                 moderators=None, outcome_fit_kwargs=None, mediator_fit_kwargs=None,
+                 outcome_predict_kwargs=None):
+
         self.outcome_model = outcome_model
         self.mediator_model = mediator_model
         self.exposure = exposure
         self.moderators = moderators if moderators is not None else {}
+
         if mediator is None:
             self.mediator = self._guess_endog_name(mediator_model, 'mediator')
         else:
             self.mediator = mediator
-        self._outcome_fit_kwargs = (outcome_fit_kwargs if 
-            outcome_fit_kwargs is not None else {})
-        self._mediator_fit_kwargs = (mediator_fit_kwargs if 
-            mediator_fit_kwargs is not None else {})
-        self._outcome_predict_kwargs = (outcome_predict_kwargs if 
-            outcome_predict_kwargs is not None else {})
+
+        self._outcome_fit_kwargs = (outcome_fit_kwargs if outcome_fit_kwargs
+                is not None else {})
+        self._mediator_fit_kwargs = (mediator_fit_kwargs if mediator_fit_kwargs
+                is not None else {})
+        self._outcome_predict_kwargs = (outcome_predict_kwargs if
+                outcome_predict_kwargs is not None else {})
+
+        # We will be changing these so need to copy.
         self._outcome_exog = outcome_model.exog.copy()
         self._mediator_exog = mediator_model.exog.copy()
+
+        # Position of the exposure variable in the mediator model.
         self._exp_pos_mediator = self._variable_pos('exposure', 'mediator')
+
+        # Position of the exposure variable in the outcome model.
         self._exp_pos_outcome = self._variable_pos('exposure', 'outcome')
+
+        # Position of the mediator variable in the outcome model.
         self._med_pos_outcome = self._variable_pos('mediator', 'outcome')

+
+    def _variable_pos(self, var, model):
+        if model == 'mediator':
+            mod = self.mediator_model
+        else:
+            mod = self.outcome_model
+
+        if var == 'mediator':
+            return maybe_name_or_idx(self.mediator, mod)[1]
+
+        exp = self.exposure
+        exp_is_2 = ((len(exp) == 2) and not isinstance(exp, str))
+
+        if exp_is_2:
+            if model == 'outcome':
+                return exp[0]
+            elif model == 'mediator':
+                return exp[1]
+        else:
+            return maybe_name_or_idx(exp, mod)[1]
+
+
+    def _guess_endog_name(self, model, typ):
+        if hasattr(model, 'formula'):
+            return model.formula.split("~")[0].strip()
+        else:
+            raise ValueError('cannot infer %s name without formula' % typ)
+
+
     def _simulate_params(self, result):
         """
         Simulate model parameters from fitted sampling distribution.
         """
-        pass
+        mn = result.params
+        cov = result.cov_params()
+        return np.random.multivariate_normal(mn, cov)
+

     def _get_mediator_exog(self, exposure):
         """
         Return the mediator exog matrix with exposure set to the given
         value.  Set values of moderated variables as needed.
         """
-        pass
+        mediator_exog = self._mediator_exog
+        if not hasattr(self.mediator_model, 'formula'):
+            mediator_exog[:, self._exp_pos_mediator] = exposure
+            for ix in self.moderators:
+                v = self.moderators[ix]
+                mediator_exog[:, ix[1]] = v
+        else:
+            # Need to regenerate the model exog
+            df = self.mediator_model.data.frame.copy()
+            df[self.exposure] = exposure
+            for vname in self.moderators:
+                v = self.moderators[vname]
+                df.loc[:, vname] = v
+            klass = self.mediator_model.__class__
+            init_kwargs = self.mediator_model._get_init_kwds()
+            model = klass.from_formula(data=df, **init_kwargs)
+            mediator_exog = model.exog
+
+        return mediator_exog
+

     def _get_outcome_exog(self, exposure, mediator):
         """
@@ -164,9 +226,43 @@ class Mediation:
         the given values.  Set values of moderated variables as
         needed.
         """
-        pass
+        outcome_exog = self._outcome_exog
+        if not hasattr(self.outcome_model, 'formula'):
+            outcome_exog[:, self._med_pos_outcome] = mediator
+            outcome_exog[:, self._exp_pos_outcome] = exposure
+            for ix in self.moderators:
+                v = self.moderators[ix]
+                outcome_exog[:, ix[0]] = v
+        else:
+            # Need to regenerate the model exog
+            df = self.outcome_model.data.frame.copy()
+            df[self.exposure] = exposure
+            df[self.mediator] = mediator
+            for vname in self.moderators:
+                v = self.moderators[vname]
+                df[vname] = v
+            klass = self.outcome_model.__class__
+            init_kwargs = self.outcome_model._get_init_kwds()
+            model = klass.from_formula(data=df, **init_kwargs)
+            outcome_exog = model.exog
+
+        return outcome_exog
+
+
+    def _fit_model(self, model, fit_kwargs, boot=False):
+        klass = model.__class__
+        init_kwargs = model._get_init_kwds()
+        endog = model.endog
+        exog = model.exog
+        if boot:
+            ii = np.random.randint(0, len(endog), len(endog))
+            endog = endog[ii]
+            exog = exog[ii, :]
+        outcome_model = klass(endog, exog, **init_kwargs)
+        return outcome_model.fit(**fit_kwargs)

-    def fit(self, method='parametric', n_rep=1000):
+
+    def fit(self, method="parametric", n_rep=1000):
         """
         Fit a regression model to assess mediation.

@@ -179,7 +275,72 @@ class Mediation:

         Returns a MediationResults object.
         """
-        pass
+
+        if method.startswith("para"):
+            # Initial fit to unperturbed data.
+            outcome_result = self._fit_model(self.outcome_model, self._outcome_fit_kwargs)
+            mediator_result = self._fit_model(self.mediator_model, self._mediator_fit_kwargs)
+        elif not method.startswith("boot"):
+            raise ValueError(
+                "method must be either 'parametric' or 'bootstrap'"
+            )
+
+        indirect_effects = [[], []]
+        direct_effects = [[], []]
+
+        for iter in range(n_rep):
+
+            if method == "parametric":
+                # Realization of outcome model parameters from sampling distribution
+                outcome_params = self._simulate_params(outcome_result)
+
+                # Realization of mediation model parameters from sampling distribution
+                mediation_params = self._simulate_params(mediator_result)
+            else:
+                outcome_result = self._fit_model(self.outcome_model,
+                                                 self._outcome_fit_kwargs, boot=True)
+                outcome_params = outcome_result.params
+                mediator_result = self._fit_model(self.mediator_model,
+                                                  self._mediator_fit_kwargs, boot=True)
+                mediation_params = mediator_result.params
+
+            # predicted outcomes[tm][te] is the outcome when the
+            # mediator is set to tm and the outcome/exposure is set to
+            # te.
+            predicted_outcomes = [[None, None], [None, None]]
+            for tm in 0, 1:
+                mex = self._get_mediator_exog(tm)
+                kwargs = {"exog": mex}
+                if hasattr(mediator_result, "scale"):
+                    kwargs["scale"] = mediator_result.scale
+                gen = self.mediator_model.get_distribution(mediation_params,
+                                                           **kwargs)
+                potential_mediator = gen.rvs(mex.shape[0])
+
+                for te in 0, 1:
+                    oex = self._get_outcome_exog(te, potential_mediator)
+                    po = self.outcome_model.predict(outcome_params, oex,
+                            **self._outcome_predict_kwargs)
+                    predicted_outcomes[tm][te] = po
+
+            for t in 0, 1:
+                indirect_effects[t].append(predicted_outcomes[1][t] - predicted_outcomes[0][t])
+                direct_effects[t].append(predicted_outcomes[t][1] - predicted_outcomes[t][0])
+
+        for t in 0, 1:
+            indirect_effects[t] = np.asarray(indirect_effects[t]).T
+            direct_effects[t] = np.asarray(direct_effects[t]).T
+
+        self.indirect_effects = indirect_effects
+        self.direct_effects = direct_effects
+
+        rslt = MediationResults(self.indirect_effects, self.direct_effects)
+        rslt.method = method
+        return rslt
+
+
+def _pvalue(vec):
+    return 2 * min(sum(vec > 0), sum(vec < 0)) / float(len(vec))


 class MediationResults:
@@ -193,22 +354,26 @@ class MediationResults:
     """

     def __init__(self, indirect_effects, direct_effects):
+
         self.indirect_effects = indirect_effects
         self.direct_effects = direct_effects
+
         indirect_effects_avg = [None, None]
         direct_effects_avg = [None, None]
-        for t in (0, 1):
+        for t in 0, 1:
             indirect_effects_avg[t] = indirect_effects[t].mean(0)
             direct_effects_avg[t] = direct_effects[t].mean(0)
+
         self.ACME_ctrl = indirect_effects_avg[0]
         self.ACME_tx = indirect_effects_avg[1]
         self.ADE_ctrl = direct_effects_avg[0]
         self.ADE_tx = direct_effects_avg[1]
-        self.total_effect = (self.ACME_ctrl + self.ACME_tx + self.ADE_ctrl +
-            self.ADE_tx) / 2
+        self.total_effect = (self.ACME_ctrl + self.ACME_tx + self.ADE_ctrl + self.ADE_tx) / 2
+
         self.prop_med_ctrl = self.ACME_ctrl / self.total_effect
         self.prop_med_tx = self.ACME_tx / self.total_effect
         self.prop_med_avg = (self.prop_med_ctrl + self.prop_med_tx) / 2
+
         self.ACME_avg = (self.ACME_ctrl + self.ACME_tx) / 2
         self.ADE_avg = (self.ADE_ctrl + self.ADE_tx) / 2

@@ -216,4 +381,32 @@ class MediationResults:
         """
         Provide a summary of a mediation analysis.
         """
-        pass
+
+        columns = ["Estimate", "Lower CI bound", "Upper CI bound", "P-value"]
+        index = ["ACME (control)", "ACME (treated)",
+                 "ADE (control)", "ADE (treated)",
+                 "Total effect",
+                 "Prop. mediated (control)",
+                 "Prop. mediated (treated)",
+                 "ACME (average)", "ADE (average)",
+                 "Prop. mediated (average)"]
+        smry = pd.DataFrame(columns=columns, index=index)
+
+        for i, vec in enumerate([self.ACME_ctrl, self.ACME_tx,
+                                 self.ADE_ctrl, self.ADE_tx,
+                                 self.total_effect, self.prop_med_ctrl,
+                                 self.prop_med_tx, self.ACME_avg,
+                                 self.ADE_avg, self.prop_med_avg]):
+
+            if ((vec is self.prop_med_ctrl) or (vec is self.prop_med_tx) or
+                    (vec is self.prop_med_avg)):
+                smry.iloc[i, 0] = np.median(vec)
+            else:
+                smry.iloc[i, 0] = vec.mean()
+            smry.iloc[i, 1] = np.percentile(vec, 100 * alpha / 2)
+            smry.iloc[i, 2] = np.percentile(vec, 100 * (1 - alpha / 2))
+            smry.iloc[i, 3] = _pvalue(vec)
+
+        smry = smry.apply(pd.to_numeric, errors='coerce')
+
+        return smry
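
For illustration only (not part of the patch), a minimal sketch of the Mediation workflow on synthetic data with hypothetical column names x (exposure), m (mediator) and y (outcome), assuming formula-based OLS models:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.mediation import Mediation

    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({"x": rng.normal(size=n)})
    df["m"] = 0.5 * df["x"] + rng.normal(size=n)            # mediator depends on exposure
    df["y"] = 0.4 * df["m"] + 0.3 * df["x"] + rng.normal(size=n)

    outcome_model = sm.OLS.from_formula("y ~ x + m", data=df)
    mediator_model = sm.OLS.from_formula("m ~ x", data=df)
    med = Mediation(outcome_model, mediator_model, "x", mediator="m")
    res = med.fit(method="parametric", n_rep=200)
    print(res.summary())   # ACME, ADE, total effect and proportion mediated
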
diff --git a/statsmodels/stats/meta_analysis.py b/statsmodels/stats/meta_analysis.py
index 992fa6455..94129825e 100644
--- a/statsmodels/stats/meta_analysis.py
+++ b/statsmodels/stats/meta_analysis.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Thu Apr  2 14:34:25 2020

@@ -5,9 +6,11 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 import pandas as pd
 from scipy import stats
+
 from statsmodels.stats.base import HolderTuple


@@ -20,15 +23,22 @@ class CombineResults:
     def __init__(self, **kwds):
         self.__dict__.update(kwds)
         self._ini_keys = list(kwds.keys())
+
         self.df_resid = self.k - 1
+
+        # TODO: move to property ?
         self.sd_eff_w_fe_hksj = np.sqrt(self.var_hksj_fe)
         self.sd_eff_w_re_hksj = np.sqrt(self.var_hksj_re)
+
+        # explained variance measures
         self.h2 = self.q / (self.k - 1)
         self.i2 = 1 - 1 / self.h2
+
+        # memoize ci_samples
         self.cache_ci = {}

-    def conf_int_samples(self, alpha=0.05, use_t=None, nobs=None, ci_func=None
-        ):
+    def conf_int_samples(self, alpha=0.05, use_t=None, nobs=None,
+                         ci_func=None):
         """confidence intervals for the effect size estimate of samples

         Additional information needs to be provided for confidence intervals
@@ -66,7 +76,46 @@ class CombineResults:
         CombineResults currently only has information from the combine_effects
         function, which does not provide details about individual samples.
         """
-        pass
+        # this is a bit messy, we don't have enough information about
+        # computing conf_int already in results for other than normal
+        # TODO: maybe there is a better way to do this
+        if (alpha, use_t) in self.cache_ci:
+            return self.cache_ci[(alpha, use_t)]
+
+        if use_t is None:
+            use_t = self.use_t
+
+        if ci_func is not None:
+            kwds = {"use_t": use_t} if use_t is not None else {}
+            ci_eff = ci_func(alpha=alpha, **kwds)
+            self.ci_sample_distr = "ci_func"
+        else:
+            if use_t is False:
+                crit = stats.norm.isf(alpha / 2)
+                self.ci_sample_distr = "normal"
+            else:
+                if nobs is not None:
+                    df_resid = nobs - 1
+                    crit = stats.t.isf(alpha / 2, df_resid)
+                    self.ci_sample_distr = "t"
+                else:
+                    msg = ("`use_t=True` requires `nobs` for each sample "
+                           "or `ci_func`. Using normal distribution for "
+                           "confidence interval of individual samples.")
+                    import warnings
+                    warnings.warn(msg)
+                    crit = stats.norm.isf(alpha / 2)
+                    self.ci_sample_distr = "normal"
+
+            # sgn = np.asarray([-1, 1])
+            # ci_eff = self.eff + sgn * crit * self.sd_eff
+            ci_low = self.eff - crit * self.sd_eff
+            ci_upp = self.eff + crit * self.sd_eff
+            ci_eff = (ci_low, ci_upp)
+
+        # if (alpha, use_t) not in self.cache_ci:  # not needed
+        self.cache_ci[(alpha, use_t)] = ci_eff
+        return ci_eff

     def conf_int(self, alpha=0.05, use_t=None):
         """confidence interval for the overall mean estimate
@@ -102,7 +151,24 @@ class CombineResults:
             the estimated scale is 1.

         """
-        pass
+        if use_t is None:
+            use_t = self.use_t
+
+        if use_t is False:
+            crit = stats.norm.isf(alpha / 2)
+        else:
+            crit = stats.t.isf(alpha / 2, self.df_resid)
+
+        sgn = np.asarray([-1, 1])
+        m_fe = self.mean_effect_fe
+        m_re = self.mean_effect_re
+        ci_eff_fe = m_fe + sgn * crit * self.sd_eff_w_fe
+        ci_eff_re = m_re + sgn * crit * self.sd_eff_w_re
+
+        ci_eff_fe_wls = m_fe + sgn * crit * np.sqrt(self.var_hksj_fe)
+        ci_eff_re_wls = m_re + sgn * crit * np.sqrt(self.var_hksj_re)
+
+        return ci_eff_fe, ci_eff_re, ci_eff_fe_wls, ci_eff_re_wls

     def test_homogeneity(self):
         """Test whether the means of all samples are the same
@@ -124,7 +190,12 @@ class CombineResults:
                 Degrees of freedom, equal to number of studies or samples
                 minus 1.
         """
-        pass
+        pvalue = stats.chi2.sf(self.q, self.k - 1)
+        res = HolderTuple(statistic=self.q,
+                          pvalue=pvalue,
+                          df=self.k - 1,
+                          distr="chi2")
+        return res

     def summary_array(self, alpha=0.05, use_t=None):
         """Create array with sample statistics and mean estimates
@@ -151,7 +222,26 @@ class CombineResults:
         column_names : list of str
             The names for the columns, used when creating summary DataFrame.
         """
-        pass
+
+        ci_low, ci_upp = self.conf_int_samples(alpha=alpha, use_t=use_t)
+        res = np.column_stack([self.eff, self.sd_eff,
+                               ci_low, ci_upp,
+                               self.weights_rel_fe, self.weights_rel_re])
+
+        ci = self.conf_int(alpha=alpha, use_t=use_t)
+        res_fe = [[self.mean_effect_fe, self.sd_eff_w_fe,
+                   ci[0][0], ci[0][1], 1, np.nan]]
+        res_re = [[self.mean_effect_re, self.sd_eff_w_re,
+                   ci[1][0], ci[1][1], np.nan, 1]]
+        res_fe_wls = [[self.mean_effect_fe, self.sd_eff_w_fe_hksj,
+                       ci[2][0], ci[2][1], 1, np.nan]]
+        res_re_wls = [[self.mean_effect_re, self.sd_eff_w_re_hksj,
+                       ci[3][0], ci[3][1], np.nan, 1]]
+
+        res = np.concatenate([res, res_fe, res_re, res_fe_wls, res_re_wls],
+                             axis=0)
+        column_names = ['eff', "sd_eff", "ci_low", "ci_upp", "w_fe", "w_re"]
+        return res, column_names

     def summary_frame(self, alpha=0.05, use_t=None):
         """Create DataFrame with sample statistics and mean estimates
@@ -177,10 +267,17 @@ class CombineResults:
             Rows include statistics for samples and estimates of overall mean.

         """
-        pass
-
-    def plot_forest(self, alpha=0.05, use_t=None, use_exp=False, ax=None,
-        **kwds):
+        if use_t is None:
+            use_t = self.use_t
+        labels = (list(self.row_names) +
+                  ["fixed effect", "random effect",
+                   "fixed effect wls", "random effect wls"])
+        res, col_names = self.summary_array(alpha=alpha, use_t=use_t)
+        results = pd.DataFrame(res, index=labels, columns=col_names)
+        return results
+
+    def plot_forest(self, alpha=0.05, use_t=None, use_exp=False,
+                    ax=None, **kwds):
         """Forest plot with means and confidence intervals

         Parameters
@@ -217,7 +314,14 @@ class CombineResults:
         dot_plot

         """
-        pass
+        from statsmodels.graphics.dotplots import dot_plot
+        res_df = self.summary_frame(alpha=alpha, use_t=use_t)
+        if use_exp:
+            res_df = np.exp(res_df[["eff", "ci_low", "ci_upp"]])
+        hw = np.abs(res_df[["ci_low", "ci_upp"]] - res_df[["eff"]].values)
+        fig = dot_plot(points=res_df["eff"], intervals=hw,
+                       lines=res_df.index, line_order=res_df.index, **kwds)
+        return fig


 def effectsize_smd(mean1, sd1, nobs1, mean2, sd2, nobs2):
@@ -268,11 +372,25 @@ def effectsize_smd(mean1, sd1, nobs1, mean2, sd2, nobs2):
         Boca Raton: CRC Press/Taylor & Francis Group.

     """
-    pass
-
-
-def effectsize_2proportions(count1, nobs1, count2, nobs2, statistic='diff',
-    zero_correction=None, zero_kwds=None):
+    # TODO: not used yet, design and options ?
+    # k = len(mean1)
+    # if row_names is None:
+    #    row_names = list(range(k))
+    # crit = stats.norm.isf(alpha / 2)
+    # var_diff_uneq = sd1**2 / nobs1 + sd2**2 / nobs2
+    var_diff = (sd1**2 * (nobs1 - 1) +
+                sd2**2 * (nobs2 - 1)) / (nobs1 + nobs2 - 2)
+    sd_diff = np.sqrt(var_diff)
+    nobs = nobs1 + nobs2
+    bias_correction = 1 - 3 / (4 * nobs - 9)
+    smd = (mean1 - mean2) / sd_diff
+    smd_bc = bias_correction * smd
+    var_smdbc = nobs / nobs1 / nobs2 + smd_bc**2 / 2 / (nobs - 3.94)
+    return smd_bc, var_smdbc
+
+
+def effectsize_2proportions(count1, nobs1, count2, nobs2, statistic="diff",
+                            zero_correction=None, zero_kwds=None):
     """Effects sizes for two sample binomial proportions

     Parameters
@@ -329,11 +447,66 @@ def effectsize_2proportions(count1, nobs1, count2, nobs2, statistic='diff',
     --------
     statsmodels.stats.contingency_tables
     """
-    pass
-
-
-def combine_effects(effect, variance, method_re='iterated', row_names=None,
-    use_t=False, alpha=0.05, **kwds):
+    if zero_correction is None:
+        cc1 = cc2 = 0
+    elif zero_correction == "tac":
+        # treatment arm continuity correction Ruecker et al 2009, section 3.2
+        nobs_t = nobs1 + nobs2
+        cc1 = nobs2 / nobs_t
+        cc2 = nobs1 / nobs_t
+    elif zero_correction == "clip":
+        clip_bounds = zero_kwds.get("clip_bounds", (1e-6, 1 - 1e-6))
+        cc1 = cc2 = 0
+    elif zero_correction:
+        # TODO: check is float_like
+        cc1 = cc2 = zero_correction
+    else:
+        msg = "zero_correction not recognized or supported"
+        raise NotImplementedError(msg)
+
+    zero_mask1 = (count1 == 0) | (count1 == nobs1)
+    zero_mask2 = (count2 == 0) | (count2 == nobs2)
+    zmask = np.logical_or(zero_mask1, zero_mask2)
+    n1 = nobs1 + (cc1 + cc2) * zmask
+    n2 = nobs2 + (cc1 + cc2) * zmask
+    p1 = (count1 + cc1) / (n1)
+    p2 = (count2 + cc2) / (n2)
+
+    if zero_correction == "clip":
+        p1 = np.clip(p1, *clip_bounds)
+        p2 = np.clip(p2, *clip_bounds)
+
+    if statistic in ["diff", "rd"]:
+        rd = p1 - p2
+        rd_var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
+        eff = rd
+        var_eff = rd_var
+    elif statistic in ["risk-ratio", "rr"]:
+        # rr = p1 / p2
+        log_rr = np.log(p1) - np.log(p2)
+        log_rr_var = (1 - p1) / p1 / n1 + (1 - p2) / p2 / n2
+        eff = log_rr
+        var_eff = log_rr_var
+    elif statistic in ["odds-ratio", "or"]:
+        # or_ = p1 / (1 - p1) * (1 - p2) / p2
+        log_or = np.log(p1) - np.log(1 - p1) - np.log(p2) + np.log(1 - p2)
+        log_or_var = 1 / (p1 * (1 - p1) * n1) + 1 / (p2 * (1 - p2) * n2)
+        eff = log_or
+        var_eff = log_or_var
+    elif statistic in ["arcsine", "arcsin", "as"]:
+        as_ = np.arcsin(np.sqrt(p1)) - np.arcsin(np.sqrt(p2))
+        as_var = (1 / n1 + 1 / n2) / 4
+        eff = as_
+        var_eff = as_var
+    else:
+        msg = 'statistic not recognized, use one of "rd", "rr", "or", "as"'
+        raise NotImplementedError(msg)
+
+    return eff, var_eff
+
+
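
For illustration only (not part of the patch), a minimal sketch of effectsize_2proportions on made-up counts, returning log odds-ratios and their variances:

    import numpy as np
    from statsmodels.stats.meta_analysis import effectsize_2proportions

    count1 = np.array([12, 8, 30])
    nobs1 = np.array([40, 35, 100])
    count2 = np.array([5, 4, 18])
    nobs2 = np.array([40, 30, 95])
    eff, var_eff = effectsize_2proportions(count1, nobs1, count2, nobs2,
                                           statistic="odds-ratio")
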
+def combine_effects(effect, variance, method_re="iterated", row_names=None,
+                    use_t=False, alpha=0.05, **kwds):
     """combining effect sizes for effect sizes using meta-analysis

     This currently does not use np.asarray, all computations are possible in
@@ -397,10 +570,63 @@ def combine_effects(effect, variance, method_re='iterated', row_names=None,
         Boca Raton: CRC Press/Taylor & Francis Group.

     """
-    pass

+    k = len(effect)
+    if row_names is None:
+        row_names = list(range(k))
+    crit = stats.norm.isf(alpha / 2)
+
+    # alias for initial version
+    eff = effect
+    var_eff = variance
+    sd_eff = np.sqrt(var_eff)

-def _fit_tau_iterative(eff, var_eff, tau2_start=0, atol=1e-05, maxiter=50):
+    # fixed effects computation
+
+    weights_fe = 1 / var_eff  # no bias correction ?
+    w_total_fe = weights_fe.sum(0)
+    weights_rel_fe = weights_fe / w_total_fe
+
+    eff_w_fe = weights_rel_fe * eff
+    mean_effect_fe = eff_w_fe.sum()
+    var_eff_w_fe = 1 / w_total_fe
+    sd_eff_w_fe = np.sqrt(var_eff_w_fe)
+
+    # random effects computation
+
+    q = (weights_fe * eff**2).sum(0)
+    q -= (weights_fe * eff).sum()**2 / w_total_fe
+    df = k - 1
+
+    if method_re.lower() in ["iterated", "pm"]:
+        tau2, _ = _fit_tau_iterative(eff, var_eff, **kwds)
+    elif method_re.lower() in ["chi2", "dl"]:
+        c = w_total_fe - (weights_fe**2).sum() / w_total_fe
+        tau2 = (q - df) / c
+    else:
+        raise ValueError('method_re should be "iterated" or "chi2"')
+
+    weights_re = 1 / (var_eff + tau2)  # no bias correction ?
+    w_total_re = weights_re.sum(0)
+    weights_rel_re = weights_re / weights_re.sum(0)
+
+    eff_w_re = weights_rel_re * eff
+    mean_effect_re = eff_w_re.sum()
+    var_eff_w_re = 1 / w_total_re
+    sd_eff_w_re = np.sqrt(var_eff_w_re)
+    # ci_low_eff_re = mean_effect_re - crit * sd_eff_w_re
+    # ci_upp_eff_re = mean_effect_re + crit * sd_eff_w_re
+
+    scale_hksj_re = (weights_re * (eff - mean_effect_re)**2).sum() / df
+    scale_hksj_fe = (weights_fe * (eff - mean_effect_fe)**2).sum() / df
+    var_hksj_re = (weights_rel_re * (eff - mean_effect_re)**2).sum() / df
+    var_hksj_fe = (weights_rel_fe * (eff - mean_effect_fe)**2).sum() / df
+
+    res = CombineResults(**locals())
+    return res
+
+
+def _fit_tau_iterative(eff, var_eff, tau2_start=0, atol=1e-5, maxiter=50):
     """Paule-Mandel iterative estimate of between random effect variance

     implementation follows DerSimonian and Kacker 2007 Appendix 8
@@ -427,7 +653,28 @@ def _fit_tau_iterative(eff, var_eff, tau2_start=0, atol=1e-05, maxiter=50):
         True if iteration has converged.

     """
-    pass
+    tau2 = tau2_start
+    k = eff.shape[0]
+    converged = False
+    for i in range(maxiter):
+        w = 1 / (var_eff + tau2)
+        m = w.dot(eff) / w.sum(0)
+        resid_sq = (eff - m)**2
+        q_w = w.dot(resid_sq)
+        # estimating equation
+        ee = q_w - (k - 1)
+        if ee < 0:
+            tau2 = 0
+            converged = 0
+            break
+        if np.allclose(ee, 0, atol=atol):
+            converged = True
+            break
+        # update tau2
+        delta = ee / (w**2).dot(resid_sq)
+        tau2 += delta
+
+    return tau2, converged


 def _fit_tau_mm(eff, var_eff, weights):
@@ -450,10 +697,21 @@ def _fit_tau_mm(eff, var_eff, weights):
         estimate of random effects variance tau squared

     """
-    pass
+    w = weights
+
+    m = w.dot(eff) / w.sum(0)
+    resid_sq = (eff - m)**2
+    q_w = w.dot(resid_sq)
+    w_t = w.sum()
+    expect = w.dot(var_eff) - (w**2).dot(var_eff) / w_t
+    denom = w_t - (w**2).sum() / w_t
+    # moment estimate from estimating equation
+    tau2 = (q_w - expect) / denom
+
+    return tau2


-def _fit_tau_iter_mm(eff, var_eff, tau2_start=0, atol=1e-05, maxiter=50):
+def _fit_tau_iter_mm(eff, var_eff, tau2_start=0, atol=1e-5, maxiter=50):
     """iterated method of moment estimate of between random effect variance

     This repeatedly estimates tau, updating weights in each iteration
@@ -480,4 +738,19 @@ def _fit_tau_iter_mm(eff, var_eff, tau2_start=0, atol=1e-05, maxiter=50):
         True if iteration has converged.

     """
-    pass
+    tau2 = tau2_start
+    converged = False
+    for _ in range(maxiter):
+        w = 1 / (var_eff + tau2)
+
+        tau2_new = _fit_tau_mm(eff, var_eff, w)
+        tau2_new = max(0, tau2_new)
+
+        delta = tau2_new - tau2
+        if np.allclose(delta, 0, atol=atol):
+            converged = True
+            break
+
+        tau2 = tau2_new
+
+    return tau2, converged
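
For illustration only (not part of the patch), a minimal sketch tying the pieces above together: combining hypothetical study effect sizes with the Paule-Mandel ("iterated") between-study variance estimate:

    import numpy as np
    from statsmodels.stats.meta_analysis import combine_effects

    eff = np.array([0.61, 0.30, 0.48, 0.20])       # hypothetical study effects
    var_eff = np.array([0.04, 0.06, 0.05, 0.09])   # their sampling variances
    res = combine_effects(eff, var_eff, method_re="iterated",
                          row_names=["s1", "s2", "s3", "s4"])
    print(res.summary_frame())     # per-study rows plus fixed/random effect rows
    print(res.test_homogeneity())  # Q statistic with chi2 p-value
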
diff --git a/statsmodels/stats/moment_helpers.py b/statsmodels/stats/moment_helpers.py
index 0dcf35293..f2ca10cfa 100644
--- a/statsmodels/stats/moment_helpers.py
+++ b/statsmodels/stats/moment_helpers.py
@@ -11,22 +11,73 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from scipy.special import comb


+def _convert_to_multidim(x):
+    if any([isinstance(x, list), isinstance(x, tuple)]):
+        return np.array(x)
+    elif isinstance(x, np.ndarray):
+        return x
+    else:
+        # something strange was passed and the function probably
+        # will fail, maybe insert an exception?
+        return x
+
+
+def _convert_from_multidim(x, totype=list):
+    if len(x.shape) < 2:
+        return totype(x)
+    return x.T
+
+
 def mc2mnc(mc):
     """convert central to non-central moments, uses recursive formula
     optionally adjusts first moment to return mean
     """
-    pass
+    x = _convert_to_multidim(mc)
+
+    def _local_counts(mc):
+        mean = mc[0]
+        mc = [1] + list(mc)  # add zero moment = 1
+        mc[1] = 0  # define central mean as zero for formula
+        mnc = [1, mean]  # zero and first raw moments
+        for nn, m in enumerate(mc[2:]):
+            n = nn + 2
+            mnc.append(0)
+            for k in range(n + 1):
+                mnc[n] += comb(n, k, exact=True) * mc[k] * mean ** (n - k)
+        return mnc[1:]
+
+    res = np.apply_along_axis(_local_counts, 0, x)
+    # for backward compatibility convert 1-dim output to list/tuple
+    return _convert_from_multidim(res)


 def mnc2mc(mnc, wmean=True):
     """convert non-central to central moments, uses recursive formula
     optionally adjusts first moment to return mean
     """
-    pass
+    X = _convert_to_multidim(mnc)
+
+    def _local_counts(mnc):
+        mean = mnc[0]
+        mnc = [1] + list(mnc)  # add zero moment = 1
+        mu = []
+        for n, m in enumerate(mnc):
+            mu.append(0)
+            for k in range(n + 1):
+                sgn_comb = (-1) ** (n - k) * comb(n, k, exact=True)
+                mu[n] += sgn_comb * mnc[k] * mean ** (n - k)
+        if wmean:
+            mu[1] = mean
+        return mu[1:]
+
+    res = np.apply_along_axis(_local_counts, 0, X)
+    # for backward compatibility convert 1-dim output to list/tuple
+    return _convert_from_multidim(res)


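For illustration only (not part of the patch), the two converters above are inverses of each other for a hypothetical moment sequence [mean, var, m3, m4]:

    from statsmodels.stats.moment_helpers import mc2mnc, mnc2mc

    mc = [1.0, 2.0, 0.5, 10.0]   # central moments, with the mean first
    mnc = mc2mnc(mc)             # non-central moments: [1.0, 3.0, 7.5, 25.0]
    mc_back = mnc2mc(mnc)        # recovers the original central moments
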
 def cum2mc(kappa):
@@ -37,7 +88,23 @@ def cum2mc(kappa):
     ----------
     Kenneth Lange: Numerical Analysis for Statisticians, page 40
     """
-    pass
+    X = _convert_to_multidim(kappa)
+
+    def _local_counts(kappa):
+        mc = [1, 0.0]  # _kappa[0]]  #insert 0-moment and mean
+        kappa0 = kappa[0]
+        kappa = [1] + list(kappa)
+        for nn, m in enumerate(kappa[2:]):
+            n = nn + 2
+            mc.append(0)
+            for k in range(n - 1):
+                mc[n] += comb(n - 1, k, exact=True) * kappa[n - k] * mc[k]
+        mc[1] = kappa0  # insert mean as first moments by convention
+        return mc[1:]
+
+    res = np.apply_along_axis(_local_counts, 0, X)
+    # for backward compatibility convert 1-dim output to list/tuple
+    return _convert_from_multidim(res)


 def mnc2cum(mnc):
@@ -46,35 +113,116 @@ def mnc2cum(mnc):

     https://en.wikipedia.org/wiki/Cumulant#Cumulants_and_moments
     """
-    pass
+    X = _convert_to_multidim(mnc)
+
+    def _local_counts(mnc):
+        mnc = [1] + list(mnc)
+        kappa = [1]
+        for nn, m in enumerate(mnc[1:]):
+            n = nn + 1
+            kappa.append(m)
+            for k in range(1, n):
+                num_ways = comb(n - 1, k - 1, exact=True)
+                kappa[n] -= num_ways * kappa[k] * mnc[n - k]
+        return kappa[1:]
+
+    res = np.apply_along_axis(_local_counts, 0, X)
+    # for backward compatibility convert 1-dim output to list/tuple
+    return _convert_from_multidim(res)


 def mc2cum(mc):
     """
     just chained because I have still the test case
     """
-    pass
+    first_step = mc2mnc(mc)
+    if isinstance(first_step, np.ndarray):
+        first_step = first_step.T
+    return mnc2cum(first_step)
+    # return np.apply_along_axis(lambda x: mnc2cum(mc2mnc(x)), 0, mc)


 def mvsk2mc(args):
     """convert mean, variance, skew, kurtosis to central moments"""
-    pass
+    X = _convert_to_multidim(args)
+
+    def _local_counts(args):
+        mu, sig2, sk, kur = args
+        cnt = [None] * 4
+        cnt[0] = mu
+        cnt[1] = sig2
+        cnt[2] = sk * sig2 ** 1.5
+        cnt[3] = (kur + 3.0) * sig2 ** 2.0
+        return tuple(cnt)
+
+    res = np.apply_along_axis(_local_counts, 0, X)
+    # for backward compatibility convert 1-dim output to list/tuple
+    return _convert_from_multidim(res, tuple)


 def mvsk2mnc(args):
     """convert mean, variance, skew, kurtosis to non-central moments"""
-    pass
+    X = _convert_to_multidim(args)
+
+    def _local_counts(args):
+        mc, mc2, skew, kurt = args
+        mnc = mc
+        mnc2 = mc2 + mc * mc
+        mc3 = skew * (mc2 ** 1.5)  # 3rd central moment
+        mnc3 = mc3 + 3 * mc * mc2 + mc ** 3  # 3rd non-central moment
+        mc4 = (kurt + 3.0) * (mc2 ** 2.0)  # 4th central moment
+        mnc4 = mc4 + 4 * mc * mc3 + 6 * mc * mc * mc2 + mc ** 4
+        return (mnc, mnc2, mnc3, mnc4)
+
+    res = np.apply_along_axis(_local_counts, 0, X)
+    # for backward compatibility convert 1-dim output to list/tuple
+    return _convert_from_multidim(res, tuple)


 def mc2mvsk(args):
     """convert central moments to mean, variance, skew, kurtosis"""
-    pass
+    X = _convert_to_multidim(args)
+
+    def _local_counts(args):
+        mc, mc2, mc3, mc4 = args
+        skew = np.divide(mc3, mc2 ** 1.5)
+        kurt = np.divide(mc4, mc2 ** 2.0) - 3.0
+        return (mc, mc2, skew, kurt)
+
+    res = np.apply_along_axis(_local_counts, 0, X)
+    # for backward compatibility convert 1-dim output to list/tuple
+    return _convert_from_multidim(res, tuple)


 def mnc2mvsk(args):
     """convert central moments to mean, variance, skew, kurtosis
     """
-    pass
+    X = _convert_to_multidim(args)
+
+    def _local_counts(args):
+        # convert four non-central moments to central moments
+        mnc, mnc2, mnc3, mnc4 = args
+        mc = mnc
+        mc2 = mnc2 - mnc * mnc
+        mc3 = mnc3 - (3 * mc * mc2 + mc ** 3)  # 3rd central moment
+        mc4 = mnc4 - (4 * mc * mc3 + 6 * mc * mc * mc2 + mc ** 4)
+        return mc2mvsk((mc, mc2, mc3, mc4))
+
+    res = np.apply_along_axis(_local_counts, 0, X)
+    # for backward compatibility convert 1-dim output to list/tuple
+    return _convert_from_multidim(res, tuple)
+
+# def mnc2mc(args):
+#    """convert four non-central moments to central moments
+#    """
+#    mnc, mnc2, mnc3, mnc4 = args
+#    mc = mnc
+#    mc2 = mnc2 - mnc*mnc
+#    mc3 = mnc3 - (3*mc*mc2+mc**3) # 3rd central moment
+#    mc4 = mnc4 - (4*mc*mc3+6*mc*mc*mc2+mc**4)
+#    return mc, mc2, mc
+
+# TODO: no return, did it get lost in cut-paste?


 def cov2corr(cov, return_std=False):
@@ -99,7 +247,13 @@ def cov2corr(cov, return_std=False):
     This function does not convert subclasses of ndarrays. This requires that
     division is defined elementwise. np.ma.array and np.matrix are allowed.
     """
-    pass
+    cov = np.asanyarray(cov)
+    std_ = np.sqrt(np.diag(cov))
+    corr = cov / np.outer(std_, std_)
+    if return_std:
+        return corr, std_
+    else:
+        return corr


 def corr2cov(corr, std):
@@ -124,7 +278,10 @@ def corr2cov(corr, std):
     that multiplication is defined elementwise. np.ma.array are allowed, but
     not matrices.
     """
-    pass
+    corr = np.asanyarray(corr)
+    std_ = np.asanyarray(std)
+    cov = corr * np.outer(std_, std_)
+    return cov


 def se_cov(cov):
@@ -143,4 +300,4 @@ def se_cov(cov):
     std : ndarray
         standard deviation from diagonal of cov
     """
-    pass
+    return np.sqrt(np.diag(cov))
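
For illustration only (not part of the patch), converting a covariance matrix to a correlation matrix and back:

    import numpy as np
    from statsmodels.stats.moment_helpers import cov2corr, corr2cov

    cov = np.array([[4.0, 1.2],
                    [1.2, 9.0]])
    corr, std = cov2corr(cov, return_std=True)   # std is [2.0, 3.0]
    cov_back = corr2cov(corr, std)               # recovers the original cov
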
diff --git a/statsmodels/stats/multicomp.py b/statsmodels/stats/multicomp.py
index f0ebe17c1..f63bc61aa 100644
--- a/statsmodels/stats/multicomp.py
+++ b/statsmodels/stats/multicomp.py
@@ -1,9 +1,13 @@
+# -*- coding: utf-8 -*-
 """

 Created on Fri Mar 30 18:27:25 2012
 Author: Josef Perktold
 """
-from statsmodels.sandbox.stats.multicomp import tukeyhsd, MultiComparison
+
+from statsmodels.sandbox.stats.multicomp import (  # noqa:F401
+    tukeyhsd, MultiComparison)
+
 __all__ = ['tukeyhsd', 'MultiComparison']


@@ -36,4 +40,5 @@ def pairwise_tukeyhsd(endog, groups, alpha=0.05):
     tukeyhsd
     statsmodels.sandbox.stats.multicomp.TukeyHSDResults
     """
-    pass
+
+    return MultiComparison(endog, groups).tukeyhsd(alpha=alpha)
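
For illustration only (not part of the patch), a minimal sketch of pairwise_tukeyhsd on made-up group data:

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(0)
    endog = np.concatenate([rng.normal(0.0, 1.0, 30),
                            rng.normal(0.5, 1.0, 30),
                            rng.normal(1.0, 1.0, 30)])
    groups = np.repeat(["a", "b", "c"], 30)
    res = pairwise_tukeyhsd(endog, groups, alpha=0.05)
    print(res.summary())   # pairwise mean differences, CIs and reject flags
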
diff --git a/statsmodels/stats/multitest.py b/statsmodels/stats/multitest.py
index a002b9c58..2139a2ce9 100644
--- a/statsmodels/stats/multitest.py
+++ b/statsmodels/stats/multitest.py
@@ -1,42 +1,69 @@
-"""Multiple Testing and P-Value Correction
+'''Multiple Testing and P-Value Correction


 Author: Josef Perktold
 License: BSD-3

-"""
+'''
+
+
 import numpy as np
+
 from statsmodels.stats._knockoff import RegressionFDR
+
 __all__ = ['fdrcorrection', 'fdrcorrection_twostage', 'local_fdr',
-    'multipletests', 'NullDistribution', 'RegressionFDR']
+           'multipletests', 'NullDistribution', 'RegressionFDR']
+
+# ==============================================
+#
+# Part 1: Multiple Tests and P-Value Correction
+#
+# ==============================================


 def _ecdf(x):
-    """no frills empirical cdf used in fdrcorrection
-    """
-    pass
-
-
-multitest_methods_names = {'b': 'Bonferroni', 's': 'Sidak', 'h': 'Holm',
-    'hs': 'Holm-Sidak', 'sh': 'Simes-Hochberg', 'ho': 'Hommel', 'fdr_bh':
-    'FDR Benjamini-Hochberg', 'fdr_by': 'FDR Benjamini-Yekutieli',
-    'fdr_tsbh': 'FDR 2-stage Benjamini-Hochberg', 'fdr_tsbky':
-    'FDR 2-stage Benjamini-Krieger-Yekutieli', 'fdr_gbs':
-    'FDR adaptive Gavrilov-Benjamini-Sarkar'}
-_alias_list = [['b', 'bonf', 'bonferroni'], ['s', 'sidak'], ['h', 'holm'],
-    ['hs', 'holm-sidak'], ['sh', 'simes-hochberg'], ['ho', 'hommel'], [
-    'fdr_bh', 'fdr_i', 'fdr_p', 'fdri', 'fdrp'], ['fdr_by', 'fdr_n',
-    'fdr_c', 'fdrn', 'fdrcorr'], ['fdr_tsbh', 'fdr_2sbh'], ['fdr_tsbky',
-    'fdr_2sbky', 'fdr_twostage'], ['fdr_gbs']]
+    '''no frills empirical cdf used in fdrcorrection
+    '''
+    nobs = len(x)
+    return np.arange(1,nobs+1)/float(nobs)
+
+multitest_methods_names = {'b': 'Bonferroni',
+                           's': 'Sidak',
+                           'h': 'Holm',
+                           'hs': 'Holm-Sidak',
+                           'sh': 'Simes-Hochberg',
+                           'ho': 'Hommel',
+                           'fdr_bh': 'FDR Benjamini-Hochberg',
+                           'fdr_by': 'FDR Benjamini-Yekutieli',
+                           'fdr_tsbh': 'FDR 2-stage Benjamini-Hochberg',
+                           'fdr_tsbky': 'FDR 2-stage Benjamini-Krieger-Yekutieli',
+                           'fdr_gbs': 'FDR adaptive Gavrilov-Benjamini-Sarkar'
+                           }
+
+_alias_list = [['b', 'bonf', 'bonferroni'],
+               ['s', 'sidak'],
+               ['h', 'holm'],
+               ['hs', 'holm-sidak'],
+               ['sh', 'simes-hochberg'],
+               ['ho', 'hommel'],
+               ['fdr_bh', 'fdr_i', 'fdr_p', 'fdri', 'fdrp'],
+               ['fdr_by', 'fdr_n', 'fdr_c', 'fdrn', 'fdrcorr'],
+               ['fdr_tsbh', 'fdr_2sbh'],
+               ['fdr_tsbky', 'fdr_2sbky', 'fdr_twostage'],
+               ['fdr_gbs']
+               ]
+
+
 multitest_alias = {}
 for m in _alias_list:
     multitest_alias[m[0]] = m[0]
     for a in m[1:]:
         multitest_alias[a] = m[0]

-
-def multipletests(pvals, alpha=0.05, method='hs', maxiter=1, is_sorted=
-    False, returnsorted=False):
+def multipletests(pvals, alpha=0.05, method='hs',
+                  maxiter=1,
+                  is_sorted=False,
+                  returnsorted=False):
     """
     Test results and p-value correction for multiple tests

@@ -117,11 +144,141 @@ def multipletests(pvals, alpha=0.05, method='hs', maxiter=1, is_sorted=
     Method='hommel' is very slow for large arrays, since it requires the
     evaluation of n partitions, where n is the number of p-values.
     """
-    pass
+    import gc
+    pvals = np.asarray(pvals)
+    alphaf = alpha  # Notation ?
+
+    if not is_sorted:
+        sortind = np.argsort(pvals)
+        pvals = np.take(pvals, sortind)
+
+    ntests = len(pvals)
+    alphacSidak = 1 - np.power((1. - alphaf), 1./ntests)
+    alphacBonf = alphaf / float(ntests)
+    if method.lower() in ['b', 'bonf', 'bonferroni']:
+        reject = pvals <= alphacBonf
+        pvals_corrected = pvals * float(ntests)
+
+    elif method.lower() in ['s', 'sidak']:
+        reject = pvals <= alphacSidak
+        pvals_corrected = -np.expm1(ntests * np.log1p(-pvals))
+
+    elif method.lower() in ['hs', 'holm-sidak']:
+        alphacSidak_all = 1 - np.power((1. - alphaf),
+                                       1./np.arange(ntests, 0, -1))
+        notreject = pvals > alphacSidak_all
+        del alphacSidak_all
+
+        nr_index = np.nonzero(notreject)[0]
+        if nr_index.size == 0:
+            # nonreject is empty, all rejected
+            notrejectmin = len(pvals)
+        else:
+            notrejectmin = np.min(nr_index)
+        notreject[notrejectmin:] = True
+        reject = ~notreject
+        del notreject
+
+        # It's equivalent to 1 - np.power((1. - pvals),
+        #                           np.arange(ntests, 0, -1))
+        # but prevents the issue of the floating point precision
+        pvals_corrected_raw = -np.expm1(np.arange(ntests, 0, -1) *
+                                        np.log1p(-pvals))
+        pvals_corrected = np.maximum.accumulate(pvals_corrected_raw)
+        del pvals_corrected_raw
+
+    elif method.lower() in ['h', 'holm']:
+        notreject = pvals > alphaf / np.arange(ntests, 0, -1)
+        nr_index = np.nonzero(notreject)[0]
+        if nr_index.size == 0:
+            # nonreject is empty, all rejected
+            notrejectmin = len(pvals)
+        else:
+            notrejectmin = np.min(nr_index)
+        notreject[notrejectmin:] = True
+        reject = ~notreject
+        pvals_corrected_raw = pvals * np.arange(ntests, 0, -1)
+        pvals_corrected = np.maximum.accumulate(pvals_corrected_raw)
+        del pvals_corrected_raw
+        gc.collect()
+
+    elif method.lower() in ['sh', 'simes-hochberg']:
+        alphash = alphaf / np.arange(ntests, 0, -1)
+        reject = pvals <= alphash
+        rejind = np.nonzero(reject)
+        if rejind[0].size > 0:
+            rejectmax = np.max(np.nonzero(reject))
+            reject[:rejectmax] = True
+        pvals_corrected_raw = np.arange(ntests, 0, -1) * pvals
+        pvals_corrected = np.minimum.accumulate(pvals_corrected_raw[::-1])[::-1]
+        del pvals_corrected_raw
+
+    elif method.lower() in ['ho', 'hommel']:
+        # we need a copy because we overwrite it in a loop
+        a = pvals.copy()
+        for m in range(ntests, 1, -1):
+            cim = np.min(m * pvals[-m:] / np.arange(1,m+1.))
+            a[-m:] = np.maximum(a[-m:], cim)
+            a[:-m] = np.maximum(a[:-m], np.minimum(m * pvals[:-m], cim))
+        pvals_corrected = a
+        reject = a <= alphaf
+
+    elif method.lower() in ['fdr_bh', 'fdr_i', 'fdr_p', 'fdri', 'fdrp']:
+        # delegate, call with sorted pvals
+        reject, pvals_corrected = fdrcorrection(pvals, alpha=alpha,
+                                                 method='indep',
+                                                 is_sorted=True)
+    elif method.lower() in ['fdr_by', 'fdr_n', 'fdr_c', 'fdrn', 'fdrcorr']:
+        # delegate, call with sorted pvals
+        reject, pvals_corrected = fdrcorrection(pvals, alpha=alpha,
+                                                 method='n',
+                                                 is_sorted=True)
+    elif method.lower() in ['fdr_tsbky', 'fdr_2sbky', 'fdr_twostage']:
+        # delegate, call with sorted pvals
+        reject, pvals_corrected = fdrcorrection_twostage(pvals, alpha=alpha,
+                                                         method='bky',
+                                                         maxiter=maxiter,
+                                                         is_sorted=True)[:2]
+    elif method.lower() in ['fdr_tsbh', 'fdr_2sbh']:
+        # delegate, call with sorted pvals
+        reject, pvals_corrected = fdrcorrection_twostage(pvals, alpha=alpha,
+                                                         method='bh',
+                                                         maxiter=maxiter,
+                                                         is_sorted=True)[:2]
+
+    elif method.lower() in ['fdr_gbs']:
+        #adaptive stepdown in Gavrilov, Benjamini, Sarkar, Annals of Statistics 2009
+##        notreject = pvals > alphaf / np.arange(ntests, 0, -1) #alphacSidak
+##        notrejectmin = np.min(np.nonzero(notreject))
+##        notreject[notrejectmin:] = True
+##        reject = ~notreject
+
+        ii = np.arange(1, ntests + 1)
+        q = (ntests + 1. - ii)/ii * pvals / (1. - pvals)
+        pvals_corrected_raw = np.maximum.accumulate(q)  # up requirement
+
+        pvals_corrected = np.minimum.accumulate(pvals_corrected_raw[::-1])[::-1]
+        del pvals_corrected_raw
+        reject = pvals_corrected <= alpha
+
+    else:
+        raise ValueError('method not recognized')
+
+    if pvals_corrected is not None: #not necessary anymore
+        pvals_corrected[pvals_corrected>1] = 1
+    if is_sorted or returnsorted:
+        return reject, pvals_corrected, alphacSidak, alphacBonf
+    else:
+        pvals_corrected_ = np.empty_like(pvals_corrected)
+        pvals_corrected_[sortind] = pvals_corrected
+        del pvals_corrected
+        reject_ = np.empty_like(reject)
+        reject_[sortind] = reject
+        return reject_, pvals_corrected_, alphacSidak, alphacBonf


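For illustration only (not part of the patch), comparing the Holm-Sidak and Benjamini-Hochberg corrections on a few hypothetical p-values:

    import numpy as np
    from statsmodels.stats.multitest import multipletests

    pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205])
    rej_hs, p_hs, _, _ = multipletests(pvals, alpha=0.05, method="hs")
    rej_bh, p_bh, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
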
 def fdrcorrection(pvals, alpha=0.05, method='indep', is_sorted=False):
-    """
+    '''
     pvalue correction for false discovery rate.

     This covers Benjamini/Hochberg for independent or positively correlated and
@@ -170,13 +327,51 @@ def fdrcorrection(pvals, alpha=0.05, method='indep', is_sorted=False):
     --------
     multipletests

-    """
-    pass
-
-
-def fdrcorrection_twostage(pvals, alpha=0.05, method='bky', maxiter=1, iter
-    =None, is_sorted=False):
-    """(iterated) two stage linear step-up procedure with estimation of number of true
+    '''
+    pvals = np.asarray(pvals)
+    assert pvals.ndim == 1, "pvals must be 1-dimensional, that is of shape (n,)"
+
+    if not is_sorted:
+        pvals_sortind = np.argsort(pvals)
+        pvals_sorted = np.take(pvals, pvals_sortind)
+    else:
+        pvals_sorted = pvals  # alias
+
+    if method in ['i', 'indep', 'p', 'poscorr']:
+        ecdffactor = _ecdf(pvals_sorted)
+    elif method in ['n', 'negcorr']:
+        cm = np.sum(1./np.arange(1, len(pvals_sorted)+1))   #corrected this
+        ecdffactor = _ecdf(pvals_sorted) / cm
+##    elif method in ['n', 'negcorr']:
+##        cm = np.sum(np.arange(len(pvals)))
+##        ecdffactor = ecdf(pvals_sorted)/cm
+    else:
+        raise ValueError('only indep and negcorr implemented')
+    reject = pvals_sorted <= ecdffactor*alpha
+    if reject.any():
+        rejectmax = max(np.nonzero(reject)[0])
+        reject[:rejectmax] = True
+
+    pvals_corrected_raw = pvals_sorted / ecdffactor
+    pvals_corrected = np.minimum.accumulate(pvals_corrected_raw[::-1])[::-1]
+    del pvals_corrected_raw
+    pvals_corrected[pvals_corrected>1] = 1
+    if not is_sorted:
+        pvals_corrected_ = np.empty_like(pvals_corrected)
+        pvals_corrected_[pvals_sortind] = pvals_corrected
+        del pvals_corrected
+        reject_ = np.empty_like(reject)
+        reject_[pvals_sortind] = reject
+        return reject_, pvals_corrected_
+    else:
+        return reject, pvals_corrected
+
+
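
For illustration only (not part of the patch), calling fdrcorrection directly returns just the reject flags and the BH-adjusted p-values:

    import numpy as np
    from statsmodels.stats.multitest import fdrcorrection

    pvals = np.array([0.001, 0.020, 0.040, 0.300, 0.700])
    reject, p_adj = fdrcorrection(pvals, alpha=0.05, method="indep")
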
+def fdrcorrection_twostage(pvals, alpha=0.05, method='bky',
+                           maxiter=1,
+                           iter=None,
+                           is_sorted=False):
+    '''(iterated) two stage linear step-up procedure with estimation of number of true
     hypotheses

     Benjamini, Krieger and Yekuteli, procedure in Definition 6
@@ -250,12 +445,83 @@ def fdrcorrection_twostage(pvals, alpha=0.05, method='bky', maxiter=1, iter

     TODO: What should be returned?

-    """
-    pass
-
-
-def local_fdr(zscores, null_proportion=1.0, null_pdf=None, deg=7, nbins=30,
-    alpha=0):
+    '''
+    pvals = np.asarray(pvals)
+
+    if iter is not None:
+        import warnings
+        msg = "iter keyword is deprecated, use maxiter keyword instead."
+        warnings.warn(msg, FutureWarning)
+
+    if iter is False:
+        maxiter = 1
+    elif iter is True or maxiter in [-1, None]:
+        maxiter = len(pvals)
+    # otherwise we use maxiter
+
+
+    if not is_sorted:
+        pvals_sortind = np.argsort(pvals)
+        pvals = np.take(pvals, pvals_sortind)
+
+    ntests = len(pvals)
+    if method == 'bky':
+        fact = (1.+alpha)
+        alpha_prime = alpha / fact
+    elif method == 'bh':
+        fact = 1.
+        alpha_prime = alpha
+    else:
+        raise ValueError("only 'bky' and 'bh' are available as method")
+
+    alpha_stages = [alpha_prime]
+    rej, pvalscorr = fdrcorrection(pvals, alpha=alpha_prime, method='indep',
+                                   is_sorted=True)
+    r1 = rej.sum()
+    if (r1 == 0) or (r1 == ntests):
+        # return rej, pvalscorr * fact, ntests - r1, alpha_stages
+        reject = rej
+        pvalscorr *= fact
+        ri = r1
+    else:
+        ri_old = ri = r1
+        ntests0 = ntests # needed if maxiter=0
+        # while True:
+        for it in range(maxiter):
+            ntests0 = 1.0 * ntests - ri_old
+            alpha_star = alpha_prime * ntests / ntests0
+            alpha_stages.append(alpha_star)
+            #print ntests0, alpha_star
+            rej, pvalscorr = fdrcorrection(pvals, alpha=alpha_star, method='indep',
+                                           is_sorted=True)
+            ri = rej.sum()
+            if (it >= maxiter - 1) or ri == ri_old:
+                break
+            elif ri < ri_old:
+                # prevent cycles and endless loops
+                raise RuntimeError(" oops - should not be here")
+            ri_old = ri
+
+        # make adjustment to pvalscorr to reflect estimated number of Non-Null cases
+        # decision is then pvalscorr < alpha  (or <=)
+        pvalscorr *= ntests0 * 1.0 /  ntests
+        if method == 'bky':
+            pvalscorr *= (1. + alpha)
+
+    pvalscorr[pvalscorr>1] = 1
+    if not is_sorted:
+        pvalscorr_ = np.empty_like(pvalscorr)
+        pvalscorr_[pvals_sortind] = pvalscorr
+        del pvalscorr
+        reject = np.empty_like(rej)
+        reject[pvals_sortind] = rej
+        return reject, pvalscorr_, ntests - ri, alpha_stages
+    else:
+        return rej, pvalscorr, ntests - ri, alpha_stages
+
+
+def local_fdr(zscores, null_proportion=1.0, null_pdf=None, deg=7,
+              nbins=30, alpha=0):
     """
     Calculate local FDR values for a list of Z-scores.

@@ -301,7 +567,58 @@ def local_fdr(zscores, null_proportion=1.0, null_pdf=None, deg=7, nbins=30,
     >>> null = NullDistribution(zscores)
     >>> fdr = local_fdr(zscores, null_pdf=null.pdf)
     """
-    pass
+
+    from statsmodels.genmod.generalized_linear_model import GLM
+    from statsmodels.genmod.generalized_linear_model import families
+    from statsmodels.regression.linear_model import OLS
+
+    # Bins for Poisson modeling of the marginal Z-score density
+    minz = min(zscores)
+    maxz = max(zscores)
+    bins = np.linspace(minz, maxz, nbins)
+
+    # Bin counts
+    zhist = np.histogram(zscores, bins)[0]
+
+    # Bin centers
+    zbins = (bins[:-1] + bins[1:]) / 2
+
+    # The design matrix at bin centers
+    dmat = np.vander(zbins, deg + 1)
+
+    # Rescale the design matrix
+    sd = dmat.std(0)
+    ii = sd > 1e-8
+    dmat[:, ii] /= sd[ii]
+
+    start = OLS(np.log(1 + zhist), dmat).fit().params
+
+    # Poisson regression
+    if alpha > 0:
+        md = GLM(zhist, dmat, family=families.Poisson()).fit_regularized(
+            L1_wt=0, alpha=alpha, start_params=start)
+    else:
+        md = GLM(zhist, dmat, family=families.Poisson()).fit(start_params=start)
+
+    # The design matrix for all Z-scores
+    dmat_full = np.vander(zscores, deg + 1)
+    dmat_full[:, ii] /= sd[ii]
+
+    # The height of the estimated marginal density of Z-scores,
+    # evaluated at every observed Z-score.
+    fz = md.predict(dmat_full) / (len(zscores) * (bins[1] - bins[0]))
+
+    # The null density.
+    if null_pdf is None:
+        f0 = np.exp(-0.5 * zscores**2) / np.sqrt(2 * np.pi)
+    else:
+        f0 = null_pdf(zscores)
+
+    # The local FDR values
+    fdr = null_proportion * f0 / fz
+
+    fdr = np.clip(fdr, 0, 1)
+
+    return fdr


 class NullDistribution:
@@ -354,17 +671,25 @@ class NullDistribution:
     """

     def __init__(self, zscores, null_lb=-1, null_ub=1, estimate_mean=True,
-        estimate_scale=True, estimate_null_proportion=False):
+                 estimate_scale=True, estimate_null_proportion=False):
+
+        # Extract the null z-scores
         ii = np.flatnonzero((zscores >= null_lb) & (zscores <= null_ub))
         if len(ii) == 0:
-            raise RuntimeError('No Z-scores fall between null_lb and null_ub')
+            raise RuntimeError("No Z-scores fall between null_lb and null_ub")
         zscores0 = zscores[ii]
+
+        # Number of Z-scores, and null Z-scores
         n_zs, n_zs0 = len(zscores), len(zscores0)

+        # Unpack and transform the parameters to the natural scale, hold
+        # parameters fixed as specified.
         def xform(params):
-            mean = 0.0
-            sd = 1.0
-            prob = 1.0
+
+            mean = 0.
+            sd = 1.
+            prob = 1.
+
             ii = 0
             if estimate_mean:
                 mean = params[ii]
@@ -374,9 +699,13 @@ class NullDistribution:
                 ii += 1
             if estimate_null_proportion:
                 prob = 1 / (1 + np.exp(-params[ii]))
+
             return mean, sd, prob
+
+
         from scipy.stats.distributions import norm

+
         def fun(params):
             """
             Negative log-likelihood of z-scores.
@@ -389,22 +718,39 @@ class NullDistribution:

             The implementation follows section 4 from Efron 2008.
             """
+
             d, s, p = xform(params)
-            central_mass = norm.cdf((null_ub - d) / s) - norm.cdf((null_lb -
-                d) / s)
+
+            # Mass within the central region
+            central_mass = (norm.cdf((null_ub - d) / s) -
+                            norm.cdf((null_lb - d) / s))
+
+            # Probability that a Z-score is null and is in the central region
             cp = p * central_mass
+
+            # Binomial term
             rval = n_zs0 * np.log(cp) + (n_zs - n_zs0) * np.log(1 - cp)
+
+            # Truncated Gaussian term for null Z-scores
             zv = (zscores0 - d) / s
-            rval += np.sum(-zv ** 2 / 2) - n_zs0 * np.log(s)
+            rval += np.sum(-zv**2 / 2) - n_zs0 * np.log(s)
             rval -= n_zs0 * np.log(central_mass)
+
             return -rval
+
+
+        # Estimate the parameters
         from scipy.optimize import minimize
-        mz = minimize(fun, np.r_[0.0, 0, 3], method='Nelder-Mead')
+        # starting values are mean = 0, scale = 1, p0 ~ 1
+        mz = minimize(fun, np.r_[0., 0, 3], method="Nelder-Mead")
         mean, sd, prob = xform(mz['x'])
+
         self.mean = mean
         self.sd = sd
         self.null_proportion = prob

+
+    # The fitted null density function
     def pdf(self, zscores):
         """
         Evaluates the fitted empirical null Z-score density.
@@ -420,4 +766,6 @@ class NullDistribution:
         The empirical null Z-score density evaluated at the given
         points.
         """
-        pass
+
+        zval = (zscores - self.mean) / self.sd
+        return np.exp(-0.5*zval**2 - np.log(self.sd) - 0.5*np.log(2*np.pi))
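
For illustration only (not part of the patch), estimating an empirical null and feeding it to local_fdr, using synthetic z-scores:

    import numpy as np
    from statsmodels.stats.multitest import NullDistribution, local_fdr

    rng = np.random.default_rng(0)
    zscores = np.concatenate([rng.normal(size=500),        # null scores
                              rng.normal(3.0, 1.0, 25)])   # a few non-null scores
    null = NullDistribution(zscores)            # fits mean and scale of the null
    fdr = local_fdr(zscores, null_pdf=null.pdf)
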
diff --git a/statsmodels/stats/multivariate.py b/statsmodels/stats/multivariate.py
index e316ea31f..df178e14e 100644
--- a/statsmodels/stats/multivariate.py
+++ b/statsmodels/stats/multivariate.py
@@ -1,16 +1,24 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sun Nov  5 14:48:19 2017

 Author: Josef Perktold
 License: BSD-3
 """
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.stats.moment_helpers import cov2corr
 from statsmodels.stats.base import HolderTuple
 from statsmodels.tools.validation import array_like


+# shortcut function
+def _logdet(x):
+    return np.linalg.slogdet(x)[1]
+
+
 def test_mvmean(data, mean_null=0, return_results=True):
     """Hotellings test for multivariate mean in one sample

@@ -33,7 +41,25 @@ def test_mvmean(data, mean_null=0, return_results=True):
         pvalue are returned.

     """
-    pass
+    x = np.asarray(data)
+    nobs, k_vars = x.shape
+    mean = x.mean(0)
+    cov = np.cov(x, rowvar=False, ddof=1)
+    diff = mean - mean_null
+    t2 = nobs * diff.dot(np.linalg.solve(cov, diff))
+    factor = (nobs - 1) * k_vars / (nobs - k_vars)
+    statistic = t2 / factor
+    df = (k_vars, nobs - k_vars)
+    pvalue = stats.f.sf(statistic, df[0], df[1])
+    if return_results:
+        res = HolderTuple(statistic=statistic,
+                          pvalue=pvalue,
+                          df=df,
+                          t2=t2,
+                          distr="F")
+        return res
+    else:
+        return statistic, pvalue


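For illustration only (not part of the patch), Hotelling's one-sample test on made-up three-variable data:

    import numpy as np
    from statsmodels.stats.multivariate import test_mvmean

    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 3)) + np.array([0.3, 0.0, 0.1])
    res = test_mvmean(x, mean_null=0)
    print(res.statistic, res.pvalue)   # F statistic and p-value
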
 def test_mvmean_2indep(data1, data2):
@@ -54,7 +80,30 @@ def test_mvmean_2indep(data1, data2):
     results : instance of a results class with attributes
         statistic, pvalue, t2 and df
     """
-    pass
+    x1 = array_like(data1, "x1", ndim=2)
+    x2 = array_like(data2, "x2", ndim=2)
+    nobs1, k_vars = x1.shape
+    nobs2, k_vars2 = x2.shape
+    if k_vars2 != k_vars:
+        msg = "both samples need to have the same number of columns"
+        raise ValueError(msg)
+    mean1 = x1.mean(0)
+    mean2 = x2.mean(0)
+    cov1 = np.cov(x1, rowvar=False, ddof=1)
+    cov2 = np.cov(x2, rowvar=False, ddof=1)
+    nobs_t = nobs1 + nobs2
+    combined_cov = ((nobs1 - 1) * cov1 + (nobs2 - 1) * cov2) / (nobs_t - 2)
+    diff = mean1 - mean2
+    t2 = (nobs1 * nobs2) / nobs_t * diff @ np.linalg.solve(combined_cov, diff)
+    factor = ((nobs_t - 2) * k_vars) / (nobs_t - k_vars - 1)
+    statistic = t2 / factor
+    df = (k_vars, nobs_t - 1 - k_vars)
+    pvalue = stats.f.sf(statistic, df[0], df[1])
+    return HolderTuple(statistic=statistic,
+                       pvalue=pvalue,
+                       df=df,
+                       t2=t2,
+                       distr="F")


 def confint_mvmean(data, lin_transf=None, alpha=0.5, simult=False):
@@ -107,11 +156,20 @@ def confint_mvmean(data, lin_transf=None, alpha=0.5, simult=False):
     Statistical Analysis. 6th ed. Upper Saddle River, N.J: Pearson Prentice
     Hall.
     """
-    pass
+    x = np.asarray(data)
+    nobs, k_vars = x.shape
+    if lin_transf is None:
+        lin_transf = np.eye(k_vars)
+    mean = x.mean(0)
+    cov = np.cov(x, rowvar=False, ddof=0)
+
+    ci = confint_mvmean_fromstats(mean, cov, nobs, lin_transf=lin_transf,
+                                  alpha=alpha, simult=simult)
+    return ci


 def confint_mvmean_fromstats(mean, cov, nobs, lin_transf=None, alpha=0.05,
-    simult=False):
+                             simult=False):
     """Confidence interval for linear transformation of a multivariate mean

     Either pointwise or simultaneous confidence intervals are returned.
@@ -155,7 +213,30 @@ def confint_mvmean_fromstats(mean, cov, nobs, lin_transf=None, alpha=0.05,
     Hall.

     """
-    pass
+    mean = np.asarray(mean)
+    cov = np.asarray(cov)
+    c = np.atleast_2d(lin_transf)
+    k_vars = len(mean)
+
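+    # pointwise intervals use the t distribution; simultaneous intervals are
+    # Scheffe-type, based on the F distribution, and hold jointly for all
+    # linear combinations (rows of lin_transf)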
+    if simult is False:
+        values = c.dot(mean)
+        quad_form = (c * cov.dot(c.T).T).sum(1)
+        df = nobs - 1
+        t_critval = stats.t.isf(alpha / 2, df)
+        ci_diff = np.sqrt(quad_form / df) * t_critval
+        low = values - ci_diff
+        upp = values + ci_diff
+    else:
+        values = c.dot(mean)
+        quad_form = (c * cov.dot(c.T).T).sum(1)
+        factor = (nobs - 1) * k_vars / (nobs - k_vars) / nobs
+        df = (k_vars, nobs - k_vars)
+        f_critval = stats.f.isf(alpha, df[0], df[1])
+        ci_diff = np.sqrt(factor * quad_form * f_critval)
+        low = values - ci_diff
+        upp = values + ci_diff
+
+    return low, upp, values  # , (f_critval, factor, quad_form, df)


 """
@@ -174,7 +255,7 @@ Stata refers to Rencher and Christensen for the formulas. Those correspond
 to the formula collection in Bartlett 1954 for several of them.


-"""
+"""  # pylint: disable=W0105


 def test_cov(cov, nobs, cov_null):
@@ -213,20 +294,41 @@ def test_cov(cov, nobs, cov_null):
     Stata Press Publication.

     """
-    pass
+    # using Stata formulas where cov_sample uses nobs in the denominator
+    # Bartlett 1954 has fewer terms
+
+    S = np.asarray(cov) * (nobs - 1) / nobs
+    S0 = np.asarray(cov_null)
+    k = cov.shape[0]
+    n = nobs
+
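+    # likelihood-ratio type statistic comparing the sample covariance with
+    # cov_null, scaled by a Bartlett-style correction; asymptotically chi2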
+    fact = nobs - 1.
+    fact *= 1 - (2 * k + 1 - 2 / (k + 1)) / (6 * (n - 1) - 1)
+    fact2 = _logdet(S0) - _logdet(n / (n - 1) * S)
+    fact2 += np.trace(n / (n - 1) * np.linalg.solve(S0, S)) - k
+    statistic = fact * fact2
+    df = k * (k + 1) / 2
+    pvalue = stats.chi2.sf(statistic, df)
+    return HolderTuple(statistic=statistic,
+                       pvalue=pvalue,
+                       df=df,
+                       distr="chi2",
+                       null="equal value",
+                       cov_null=cov_null
+                       )


 def test_cov_spherical(cov, nobs):
-    """One sample hypothesis test that covariance matrix is spherical
+    r"""One sample hypothesis test that covariance matrix is spherical

     The Null and alternative hypotheses are

     .. math::

-       H0 &: \\Sigma = \\sigma I \\\\
-       H1 &: \\Sigma \\neq \\sigma I
+       H0 &: \Sigma = \sigma I \\
+       H1 &: \Sigma \neq \sigma I

+    where :math:`\sigma` is the common variance with unspecified value.
+    where :math:`\sigma_i` is the common variance with unspecified value.

     Parameters
     ----------
@@ -255,20 +357,35 @@ def test_cov_spherical(cov, nobs):
     StataCorp, L. P. Stata Multivariate Statistics: Reference Manual.
     Stata Press Publication.
     """
-    pass
+
+    # Stata formula unchanged; the denominator in cov cancels out, AFAICS
+    # Bartlett 1954 correction factor in IIIc
+    cov = np.asarray(cov)
+    k = cov.shape[0]
+
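+    # LR-type statistic: k * log(mean eigenvalue) - sum(log eigenvalues),
+    # computed from trace and log-determinant, with Bartlett correction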
+    statistic = nobs - 1 - (2 * k**2 + k + 2) / (6 * k)
+    statistic *= k * np.log(np.trace(cov)) - _logdet(cov) - k * np.log(k)
+    df = k * (k + 1) / 2 - 1
+    pvalue = stats.chi2.sf(statistic, df)
+    return HolderTuple(statistic=statistic,
+                       pvalue=pvalue,
+                       df=df,
+                       distr="chi2",
+                       null="spherical"
+                       )


 def test_cov_diagonal(cov, nobs):
-    """One sample hypothesis test that covariance matrix is diagonal matrix.
+    r"""One sample hypothesis test that covariance matrix is diagonal matrix.

     The Null and alternative hypotheses are

     .. math::

-       H0 &: \\Sigma = diag(\\sigma_i) \\\\
-       H1 &: \\Sigma \\neq diag(\\sigma_i)
+       H0 &: \Sigma = diag(\sigma_i) \\
+       H1 &: \Sigma \neq diag(\sigma_i)

-    where :math:`\\sigma_i` are the variances with unspecified values.
+    where :math:`\sigma_i` are the variances with unspecified values.

     Parameters
     ----------
@@ -293,26 +410,51 @@ def test_cov_diagonal(cov, nobs):
     StataCorp, L. P. Stata Multivariate Statistics: Reference Manual.
     Stata Press Publication.
     """
-    pass
+    cov = np.asarray(cov)
+    k = cov.shape[0]
+    R = cov2corr(cov)
+
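+    # under H0 the correlation matrix is the identity, so the statistic is
+    # proportional to -log|R| of the sample correlation matrix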
+    statistic = -(nobs - 1 - (2 * k + 5) / 6) * _logdet(R)
+    df = k * (k - 1) / 2
+    pvalue = stats.chi2.sf(statistic, df)
+    return HolderTuple(statistic=statistic,
+                       pvalue=pvalue,
+                       df=df,
+                       distr="chi2",
+                       null="diagonal"
+                       )


 def _get_blocks(mat, block_len):
     """get diagonal blocks from matrix
     """
-    pass
+    k = len(mat)
+    idx = np.cumsum(block_len)
+    if idx[-1] == k:
+        idx = idx[:-1]
+    elif idx[-1] > k:
+        raise ValueError("sum of block_len larger than shape of mat")
+    else:
+        # allow one missing block that is the remainder
+        pass
+    idx_blocks = np.split(np.arange(k), idx)
+    blocks = []
+    for ii in idx_blocks:
+        blocks.append(mat[ii[:, None], ii])
+    return blocks, idx_blocks


 def test_cov_blockdiagonal(cov, nobs, block_len):
-    """One sample hypothesis test that covariance is block diagonal.
+    r"""One sample hypothesis test that covariance is block diagonal.

     The Null and alternative hypotheses are

     .. math::

-       H0 &: \\Sigma = diag(\\Sigma_i) \\\\
-       H1 &: \\Sigma \\neq diag(\\Sigma_i)
+       H0 &: \Sigma = diag(\Sigma_i) \\
+       H1 &: \Sigma \neq diag(\Sigma_i)

-    where :math:`\\Sigma_i` are covariance blocks with unspecified values.
+    where :math:`\Sigma_i` are covariance blocks with unspecified values.

     Parameters
     ----------
@@ -339,11 +481,32 @@ def test_cov_blockdiagonal(cov, nobs, block_len):
     StataCorp, L. P. Stata Multivariate Statistics: Reference Manual.
     Stata Press Publication.
     """
-    pass
+    cov = np.asarray(cov)
+    cov_blocks = _get_blocks(cov, block_len)[0]
+    k = cov.shape[0]
+    k_blocks = [c.shape[0] for c in cov_blocks]
+    if k != sum(k_blocks):
+        msg = "sample covariances and blocks do not have matching shape"
+        raise ValueError(msg)
+    logdet_blocks = sum(_logdet(c) for c in cov_blocks)
+    a2 = k**2 - sum(ki**2 for ki in k_blocks)
+    a3 = k**3 - sum(ki**3 for ki in k_blocks)
+
+    statistic = (nobs - 1 - (2 * a3 + 3 * a2) / (6. * a2))
+    statistic *= logdet_blocks - _logdet(cov)
+
+    df = a2 / 2
+    pvalue = stats.chi2.sf(statistic, df)
+    return HolderTuple(statistic=statistic,
+                       pvalue=pvalue,
+                       df=df,
+                       distr="chi2",
+                       null="block-diagonal"
+                       )


 def test_cov_oneway(cov_list, nobs_list):
-    """Multiple sample hypothesis test that covariance matrices are equal.
+    r"""Multiple sample hypothesis test that covariance matrices are equal.

     This is commonly known as Box-M test.

@@ -351,10 +514,10 @@ def test_cov_oneway(cov_list, nobs_list):

     .. math::

-       H0 &: \\Sigma_i = \\Sigma_j  \\text{ for all i and j} \\\\
-       H1 &: \\Sigma_i \\neq \\Sigma_j \\text{ for at least one i and j}
+       H0 &: \Sigma_i = \Sigma_j  \text{ for all i and j} \\
+       H1 &: \Sigma_i \neq \Sigma_j \text{ for at least one i and j}

-    where :math:`\\Sigma_i` is the covariance of sample `i`.
+    where :math:`\Sigma_i` is the covariance of sample `i`.

     Parameters
     ----------
@@ -387,4 +550,45 @@ def test_cov_oneway(cov_list, nobs_list):
     StataCorp, L. P. Stata Multivariate Statistics: Reference Manual.
     Stata Press Publication.
     """
-    pass
+    # Note: Stata uses nobs in the sample covariance; this uses nobs - 1
+    cov_list = list(map(np.asarray, cov_list))
+    m = len(cov_list)
+    nobs = sum(nobs_list)  # total number of observations
+    k = cov_list[0].shape[0]
+
+    cov_pooled = sum((n - 1) * c for (n, c) in zip(nobs_list, cov_list))
+    cov_pooled /= (nobs - m)
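+    # Box's M: (nobs - m) log|S_pooled| - sum_i (nobs_i - 1) log|S_i|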
+    stat0 = (nobs - m) * _logdet(cov_pooled)
+    stat0 -= sum((n - 1) * _logdet(c) for (n, c) in zip(nobs_list, cov_list))
+
+    # Box's chi2
+    c1 = sum(1 / (n - 1) for n in nobs_list) - 1 / (nobs - m)
+    c1 *= (2 * k*k + 3 * k - 1) / (6 * (k + 1) * (m - 1))
+    df_chi2 = (m - 1) * k * (k + 1) / 2
+    statistic_chi2 = (1 - c1) * stat0
+    pvalue_chi2 = stats.chi2.sf(statistic_chi2, df_chi2)
+
+    c2 = sum(1 / (n - 1)**2 for n in nobs_list) - 1 / (nobs - m)**2
+    c2 *= (k - 1) * (k + 2) / (6 * (m - 1))
+    a1 = df_chi2
+    a2 = (a1 + 2) / abs(c2 - c1**2)
+    b1 = (1 - c1 - a1 / a2) / a1
+    b2 = (1 - c1 + 2 / a2) / a2
+    if c2 > c1**2:
+        statistic_f = b1 * stat0
+    else:
+        tmp = b2 * stat0
+        statistic_f = a2 / a1 * tmp / (1 + tmp)
+    df_f = (a1, a2)
+    pvalue_f = stats.f.sf(statistic_f, *df_f)
+    return HolderTuple(statistic=statistic_f,  # name convention, using F here
+                       pvalue=pvalue_f,   # name convention, using F here
+                       statistic_base=stat0,
+                       statistic_chi2=statistic_chi2,
+                       pvalue_chi2=pvalue_chi2,
+                       df_chi2=df_chi2,
+                       distr_chi2='chi2',
+                       statistic_f=statistic_f,
+                       pvalue_f=pvalue_f,
+                       df_f=df_f,
+                       distr_f='F')
diff --git a/statsmodels/stats/multivariate_tools.py b/statsmodels/stats/multivariate_tools.py
index c43604667..129701711 100644
--- a/statsmodels/stats/multivariate_tools.py
+++ b/statsmodels/stats/multivariate_tools.py
@@ -1,4 +1,4 @@
-"""Tools for multivariate analysis
+'''Tools for multivariate analysis


 Author : Josef Perktold
@@ -10,13 +10,16 @@ TODO:

 - names of functions, currently just "working titles"

-"""
+'''
+
+
 import numpy as np
-from statsmodels.tools.tools import Bunch

+from statsmodels.tools.tools import Bunch

 def partial_project(endog, exog):
-    """helper function to get linear projection or partialling out of variables
+    '''helper function to get linear projection or partialling out of variables

     endog variables are projected on exog variables

@@ -41,12 +44,21 @@ def partial_project(endog, exog):
     This is no-frills mainly for internal calculations, no error checking or
     array conversion is performed, at least for now.

-    """
-    pass
+    '''
+    x1, x2 = endog, exog
+    params = np.linalg.pinv(x2).dot(x1)
+    predicted = x2.dot(params)
+    residual = x1 - predicted
+    res = Bunch(params=params,
+                fittedvalues=predicted,
+                resid=residual)
+
+    return res


 def cancorr(x1, x2, demean=True, standardize=False):
-    """canonical correlation coefficient beween 2 arrays
+    '''canonical correlation coefficient between 2 arrays

     Parameters
     ----------
@@ -81,12 +93,27 @@ def cancorr(x1, x2, demean=True, standardize=False):
     cc_stats
     CCA not yet

-    """
-    pass
+    '''
+    #x, y = x1, x2
+    if demean or standardize:
+        x1 = (x1 - x1.mean(0))
+        x2 = (x2 - x2.mean(0))
+
+    if standardize:
+        # standardizing does not change the canonical correlation coefficients
+        x1 /= x1.std(0)
+        x2 /= x2.std(0)
+
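+    # canonical correlations are the square roots of the eigenvalues of the
+    # product of the two cross-projections pinv(x1) x2 and pinv(x2) x1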
+    t1 = np.linalg.pinv(x1).dot(x2)
+    t2 = np.linalg.pinv(x2).dot(x1)
+    m = t1.dot(t2)
+    cc = np.sqrt(np.linalg.eigvals(m))
+    cc.sort()
+    return cc[::-1]


 def cc_ranktest(x1, x2, demean=True, fullrank=False):
-    """rank tests based on smallest canonical correlation coefficients
+    '''rank tests based on smallest canonical correlation coefficients

     Anderson canonical correlations test (LM test) and
     Cragg-Donald test (Wald test)
@@ -132,12 +159,30 @@ def cc_ranktest(x1, x2, demean=True, fullrank=False):
     cancorr
     cc_stats

-    """
-    pass
+    '''
+
+    from scipy import stats
+
+    nobs1, k1 = x1.shape
+    nobs2, k2 = x2.shape
+
+    cc = cancorr(x1, x2, demean=demean)
+    cc2 = cc * cc
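+    # Anderson LM statistic uses nobs * cc^2, the Cragg-Donald Wald statistic
+    # uses nobs * cc^2 / (1 - cc^2); both are compared to chi2 critical values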
+    if fullrank:
+        df = np.abs(k1 - k2) + 1
+        value = nobs1 * cc2[-1]
+        w_value = nobs1 * (cc2[-1] / (1. - cc2[-1]))
+        return (value, stats.chi2.sf(value, df), df, cc,
+                w_value, stats.chi2.sf(w_value, df))
+    else:
+        r = np.arange(min(k1, k2))[::-1]
+        df = (k1 - r) * (k2 - r)
+        values = nobs1 * cc2[::-1].cumsum()
+        w_values = nobs1 * (cc2 / (1. - cc2))[::-1].cumsum()
+        return (values, stats.chi2.sf(values, df), df, cc,
+                w_values, stats.chi2.sf(w_values, df))


 def cc_stats(x1, x2, demean=True):
-    """MANOVA statistics based on canonical correlation coefficient
+    '''MANOVA statistics based on canonical correlation coefficient

     Calculates Pillai's Trace, Wilk's Lambda, Hotelling's Trace and
     Roy's Largest Root.
@@ -164,5 +209,38 @@ def cc_stats(x1, x2, demean=True):
     TODO: should return a results class instead
     produces nans sometimes, singular, perfect correlation of x1, x2 ?

-    """
-    pass
+    '''
+
+    nobs1, k1 = x1.shape  # endogenous ?
+    nobs2, k2 = x2.shape
+    cc = cancorr(x1, x2, demean=demean)
+    cc2 = cc**2
+    lam = (cc2 / (1 - cc2))  # what if max cc2 is 1 ?
+    # Problem: ccr might not care if x1 or x2 are reduced rank,
+    #          but df will depend on rank
+    df_model = k1 * k2  # df_hypothesis (we do not include mean in x1, x2)
+    df_resid = k1 * (nobs1 - k2 - demean)
+    s = min(df_model, k1)
+    m = 0.5 * (df_model - k1)
+    n = 0.5 * (df_resid - k1 - 1)
+
+    df1 = k1 * df_model
+    df2 = k2
+
+    pt_value = cc2.sum()    # Pillai's trace
+    wl_value = np.prod(1 / (1 + lam))   # Wilk's Lambda
+    ht_value = lam.sum()    # Hotelling's Trace
+    rm_value = lam.max()    # Roy's largest root
+    #from scipy import stats
+    # what's the distribution, the test statistic ?
+    res = {}
+    res['canonical correlation coefficient'] = cc
+    res['eigenvalues'] = lam
+    res["Pillai's Trace"] = pt_value
+    res["Wilk's Lambda"] = wl_value
+    res["Hotelling's Trace"] = ht_value
+    res["Roy's Largest Root"] = rm_value
+    res['df_resid'] = df_resid
+    res['df_m'] = m
+    return res
diff --git a/statsmodels/stats/nonparametric.py b/statsmodels/stats/nonparametric.py
index f2768bf09..abb2c28dd 100644
--- a/statsmodels/stats/nonparametric.py
+++ b/statsmodels/stats/nonparametric.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Rank based methods for inferential statistics

@@ -7,11 +8,19 @@ Author: Josef Perktold
 License: BSD-3

 """
+
+
 import numpy as np
 from scipy import stats
 from scipy.stats import rankdata
+
 from statsmodels.stats.base import HolderTuple
-from statsmodels.stats.weightstats import _tconfint_generic, _tstat_generic, _zconfint_generic, _zstat_generic
+from statsmodels.stats.weightstats import (
+    _tconfint_generic,
+    _tstat_generic,
+    _zconfint_generic,
+    _zstat_generic,
+)


 def rankdata_2samp(x1, x2):
@@ -34,7 +43,28 @@ def rankdata_2samp(x1, x2):
         Internal midranks of the second sample.

     """
-    pass
+    x1 = np.asarray(x1)
+    x2 = np.asarray(x2)
+
+    nobs1 = len(x1)
+    nobs2 = len(x2)
+    if nobs1 == 0 or nobs2 == 0:
+        raise ValueError("one sample has zero length")
+
+    x_combined = np.concatenate((x1, x2))
+    if x_combined.ndim > 1:
+        rank = np.apply_along_axis(rankdata, 0, x_combined)
+    else:
+        rank = rankdata(x_combined)  # no axis in older scipy
+    rank1 = rank[:nobs1]
+    rank2 = rank[nobs1:]
+    if x_combined.ndim > 1:
+        ranki1 = np.apply_along_axis(rankdata, 0, x1)
+        ranki2 = np.apply_along_axis(rankdata, 0, x2)
+    else:
+        ranki1 = rankdata(x1)
+        ranki2 = rankdata(x2)
+    return rank1, rank2, ranki1, ranki2


 class RankCompareResult(HolderTuple):
@@ -45,7 +75,7 @@ class RankCompareResult(HolderTuple):
     and summary.
     """

-    def conf_int(self, value=None, alpha=0.05, alternative='two-sided'):
+    def conf_int(self, value=None, alpha=0.05, alternative="two-sided"):
         """
         Confidence interval for probability that sample 1 has larger values

@@ -78,9 +108,20 @@ class RankCompareResult(HolderTuple):
             "larger".

         """
-        pass

-    def test_prob_superior(self, value=0.5, alternative='two-sided'):
+        p0 = value
+        if p0 is None:
+            p0 = 0
+        diff = self.prob1 - p0
+        std_diff = np.sqrt(self.var / self.nobs)
+
+        if self.use_t is False:
+            return _zconfint_generic(diff, std_diff, alpha, alternative)
+        else:
+            return _tconfint_generic(diff, std_diff, self.df, alpha,
+                                     alternative)
+
+    def test_prob_superior(self, value=0.5, alternative="two-sided"):
         """test for superiority probability

         H0: P(x1 > x2) + 0.5 * P(x1 = x2) = value
@@ -110,10 +151,31 @@ class RankCompareResult(HolderTuple):
                 Pvalue of the test based on either normal or t distribution.

         """
-        pass
+
+        p0 = value  # alias
+        # diff = self.prob1 - p0  # for reporting, not used in computation
+        # TODO: use var_prob
+        std_diff = np.sqrt(self.var / self.nobs)
+
+        # corresponds to a one-sample test and either p0 or diff could be used
+        if not self.use_t:
+            stat, pv = _zstat_generic(self.prob1, p0, std_diff, alternative,
+                                      diff=0)
+            distr = "normal"
+        else:
+            stat, pv = _tstat_generic(self.prob1, p0, std_diff, self.df,
+                                      alternative, diff=0)
+            distr = "t"
+
+        res = HolderTuple(statistic=stat,
+                          pvalue=pv,
+                          df=self.df,
+                          distribution=distr
+                          )
+        return res

     def tost_prob_superior(self, low, upp):
-        """test of stochastic (non-)equivalence of p = P(x1 > x2)
+        '''test of stochastic (non-)equivalence of p = P(x1 > x2)

         Null hypothesis:  p < low or p > upp
         Alternative hypothesis:  low < p < upp
@@ -152,11 +214,27 @@ class RankCompareResult(HolderTuple):
                 Results instance with test statistic, pvalue and degrees of
                 freedom for upper threshold test.

-        """
-        pass
-
-    def confint_lintransf(self, const=-1, slope=2, alpha=0.05, alternative=
-        'two-sided'):
+        '''
+
+        t1 = self.test_prob_superior(low, alternative='larger')
+        t2 = self.test_prob_superior(upp, alternative='smaller')
+
+        # idx_max = 1 if t1.pvalue < t2.pvalue else 0
+        idx_max = np.asarray(t1.pvalue < t2.pvalue, int)
+        title = "Equivalence test for Prob(x1 > x2) + 0.5 Prob(x1 = x2) "
+        res = HolderTuple(statistic=np.choose(idx_max,
+                                              [t1.statistic, t2.statistic]),
+                          # pvalue=[t1.pvalue, t2.pvalue][idx_max], # python
+                          # use np.choose for vectorized selection
+                          pvalue=np.choose(idx_max, [t1.pvalue, t2.pvalue]),
+                          results_larger=t1,
+                          results_smaller=t2,
+                          title=title
+                          )
+        return res
+
+    def confint_lintransf(self, const=-1, slope=2, alpha=0.05,
+                          alternative="two-sided"):
         """confidence interval of a linear transformation of prob1

         This computes the confidence interval for
@@ -189,7 +267,13 @@ class RankCompareResult(HolderTuple):
             "larger".

         """
-        pass
+
+        low_p, upp_p = self.conf_int(alpha=alpha, alternative=alternative)
+        low = const + slope * low_p
+        upp = const + slope * upp_p
+        if slope < 0:
+            low, upp = upp, low
+        return low, upp

     def effectsize_normal(self, prob=None):
         """
@@ -216,7 +300,9 @@ class RankCompareResult(HolderTuple):
         equivalent Cohen's d effect size under normality assumption.

         """
-        pass
+        if prob is None:
+            prob = self.prob1
+        return stats.norm.ppf(prob) * np.sqrt(2)

     def summary(self, alpha=0.05, xname=None):
         """summary table for probability that random draw x1 is larger than x2
@@ -235,7 +321,34 @@ class RankCompareResult(HolderTuple):
         SimpleTable instance with methods to convert to different output
         formats.
         """
-        pass
+
+        yname = "None"
+        effect = np.atleast_1d(self.prob1)
+        if self.pvalue is None:
+            statistic, pvalue = self.test_prob_superior()
+        else:
+            pvalue = self.pvalue
+            statistic = self.statistic
+        pvalues = np.atleast_1d(pvalue)
+        ci = np.atleast_2d(self.conf_int(alpha=alpha))
+        if ci.shape[0] > 1:
+            ci = ci.T
+        use_t = self.use_t
+        sd = np.atleast_1d(np.sqrt(self.var_prob))
+        statistic = np.atleast_1d(statistic)
+        if xname is None:
+            xname = ['c%d' % ii for ii in range(len(effect))]
+
+        xname2 = ['prob(x1>x2) %s' % ii for ii in xname]
+
+        title = "Probability sample 1 is stochastically larger"
+        from statsmodels.iolib.summary import summary_params
+
+        summ = summary_params((self, effect, sd, statistic,
+                               pvalues, ci),
+                              yname=yname, xname=xname2, use_t=use_t,
+                              title=title, alpha=alpha)
+        return summ


 def rank_compare_2indep(x1, x2, use_t=True):
@@ -339,7 +452,58 @@ def rank_compare_2indep(x1, x2, use_t=True):
            https://doi.org/10.1080/00031305.2017.1305291.

     """
-    pass
+    x1 = np.asarray(x1)
+    x2 = np.asarray(x2)
+
+    nobs1 = len(x1)
+    nobs2 = len(x2)
+    nobs = nobs1 + nobs2
+    if nobs1 == 0 or nobs2 == 0:
+        raise ValueError("one sample has zero length")
+
+    rank1, rank2, ranki1, ranki2 = rankdata_2samp(x1, x2)
+
+    meanr1 = np.mean(rank1, axis=0)
+    meanr2 = np.mean(rank2, axis=0)
+    meanri1 = np.mean(ranki1, axis=0)
+    meanri2 = np.mean(ranki2, axis=0)
+
+    S1 = np.sum(np.power(rank1 - ranki1 - meanr1 + meanri1, 2.0), axis=0)
+    S1 /= nobs1 - 1
+    S2 = np.sum(np.power(rank2 - ranki2 - meanr2 + meanri2, 2.0), axis=0)
+    S2 /= nobs2 - 1
+
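+    # studentized difference of mean mid-ranks (Brunner-Munzel type statistic)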
+    wbfn = nobs1 * nobs2 * (meanr1 - meanr2)
+    wbfn /= (nobs1 + nobs2) * np.sqrt(nobs1 * S1 + nobs2 * S2)
+
+    # Here we only use alternative == "two-sided"
+    if use_t:
+        df_numer = np.power(nobs1 * S1 + nobs2 * S2, 2.0)
+        df_denom = np.power(nobs1 * S1, 2.0) / (nobs1 - 1)
+        df_denom += np.power(nobs2 * S2, 2.0) / (nobs2 - 1)
+        df = df_numer / df_denom
+        pvalue = 2 * stats.t.sf(np.abs(wbfn), df)
+    else:
+        pvalue = 2 * stats.norm.sf(np.abs(wbfn))
+        df = None
+
+    # other info
+    var1 = S1 / (nobs - nobs1)**2
+    var2 = S2 / (nobs - nobs2)**2
+    var_prob = (var1 / nobs1 + var2 / nobs2)
+    var = nobs * (var1 / nobs1 + var2 / nobs2)
+    prob1 = (meanr1 - (nobs1 + 1) / 2) / nobs2
+    prob2 = (meanr2 - (nobs2 + 1) / 2) / nobs1
+
+    return RankCompareResult(statistic=wbfn, pvalue=pvalue, s1=S1, s2=S2,
+                             var1=var1, var2=var2, var=var,
+                             var_prob=var_prob,
+                             nobs1=nobs1, nobs2=nobs2, nobs=nobs,
+                             mean1=meanr1, mean2=meanr2,
+                             prob1=prob1, prob2=prob2,
+                             somersd1=prob1 * 2 - 1, somersd2=prob2 * 2 - 1,
+                             df=df, use_t=use_t
+                             )


 def rank_compare_2ordinal(count1, count2, ddof=1, use_t=True):
@@ -389,7 +553,41 @@ def rank_compare_2ordinal(count1, count2, ddof=1, use_t=True):
     function `rank_compare_2indep`.

     """
-    pass
+
+    count1 = np.asarray(count1)
+    count2 = np.asarray(count2)
+    nobs1, nobs2 = count1.sum(), count2.sum()
+    freq1 = count1 / nobs1
+    freq2 = count2 / nobs2
+    cdf1 = np.concatenate(([0], freq1)).cumsum(axis=0)
+    cdf2 = np.concatenate(([0], freq2)).cumsum(axis=0)
+
+    # mid rank cdf
+    cdfm1 = (cdf1[1:] + cdf1[:-1]) / 2
+    cdfm2 = (cdf2[1:] + cdf2[:-1]) / 2
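+    # prob1 is P(x1 > x2) + 0.5 * P(x1 = x2) estimated from the mid-rank cdfs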
+    prob1 = (cdfm2 * freq1).sum()
+    prob2 = (cdfm1 * freq2).sum()
+
+    var1 = (cdfm2**2 * freq1).sum() - prob1**2
+    var2 = (cdfm1**2 * freq2).sum() - prob2**2
+
+    var_prob = (var1 / (nobs1 - ddof) + var2 / (nobs2 - ddof))
+    nobs = nobs1 + nobs2
+    var = nobs * var_prob
+    vn1 = var1 * nobs2 * nobs1 / (nobs1 - ddof)
+    vn2 = var2 * nobs1 * nobs2 / (nobs2 - ddof)
+    df = (vn1 + vn2)**2 / (vn1**2 / (nobs1 - 1) + vn2**2 / (nobs2 - 1))
+    res = RankCompareResult(statistic=None, pvalue=None, s1=None, s2=None,
+                            var1=var1, var2=var2, var=var,
+                            var_prob=var_prob,
+                            nobs1=nobs1, nobs2=nobs2, nobs=nobs,
+                            mean1=None, mean2=None,
+                            prob1=prob1, prob2=prob2,
+                            somersd1=prob1 * 2 - 1, somersd2=prob2 * 2 - 1,
+                            df=df, use_t=use_t
+                            )
+
+    return res


 def prob_larger_continuous(distr1, distr2):
@@ -433,7 +631,8 @@ def prob_larger_continuous(distr1, distr2):
     0.23975006109347669

     """
-    pass
+
+    return distr1.expect(distr2.cdf)


 def cohensd2problarger(d):
@@ -462,4 +661,5 @@ def cohensd2problarger(d):
     prob : float or ndarray
         Prob(x1 > x2)
     """
-    pass
+
+    return stats.norm.cdf(d / np.sqrt(2))
diff --git a/statsmodels/stats/oaxaca.py b/statsmodels/stats/oaxaca.py
index 55838d9a0..f1dfd3f3f 100644
--- a/statsmodels/stats/oaxaca.py
+++ b/statsmodels/stats/oaxaca.py
@@ -1,3 +1,6 @@
+# TODO Non-Linear Regressions can be used
+# TODO Further decomposition of the two_fold parameters i.e.
+# the delta method for further two_fold detail
 """
 Author: Austin Adams

@@ -42,7 +45,9 @@ A. S. Blinder "Wage Discrimination: Reduced Form and Structural
 Estimates," The Journal of Human Resources, 1973.
 """
 from textwrap import dedent
+
 import numpy as np
+
 from statsmodels.regression.linear_model import OLS
 from statsmodels.tools.tools import add_constant

@@ -111,11 +116,20 @@ class OaxacaBlinder:
     Gap: 158.75044
     """

-    def __init__(self, endog, exog, bifurcate, hasconst=True, swap=True,
-        cov_type='nonrobust', cov_kwds=None):
-        if str(type(exog)).find('pandas') != -1:
+    def __init__(
+        self,
+        endog,
+        exog,
+        bifurcate,
+        hasconst=True,
+        swap=True,
+        cov_type="nonrobust",
+        cov_kwds=None,
+    ):
+        if str(type(exog)).find("pandas") != -1:
             bifurcate = exog.columns.get_loc(bifurcate)
             endog, exog = np.array(endog), np.array(exog)
+
         self.two_fold_type = None
         self.bifurcate = bifurcate
         self.cov_type = cov_type
@@ -127,6 +141,9 @@ class OaxacaBlinder:
         endog = np.column_stack((bi_col, endog))
         bi = np.unique(bi_col)
         self.bi_col = bi_col
+
+        # split the data along the bifurcate axis; the issue is that the
+        # column needs to be deleted after fitting the model for the total
+        # sample.
         exog_f = exog[np.where(exog[:, bifurcate] == bi[0])]
         exog_s = exog[np.where(exog[:, bifurcate] == bi[1])]
         endog_f = endog[np.where(endog[:, 0] == bi[0])]
@@ -136,32 +153,154 @@ class OaxacaBlinder:
         endog_f = endog_f[:, 1]
         endog_s = endog_s[:, 1]
         self.endog = endog[:, 1]
+
         self.len_f, self.len_s = len(endog_f), len(endog_s)
         self.gap = endog_f.mean() - endog_s.mean()
+
         if swap and self.gap < 0:
             endog_f, endog_s = endog_s, endog_f
             exog_f, exog_s = exog_s, exog_f
             self.gap = endog_f.mean() - endog_s.mean()
             bi[0], bi[1] = bi[1], bi[0]
+
         self.bi = bi
+
         if hasconst is False:
             exog_f = add_constant(exog_f, prepend=False)
             exog_s = add_constant(exog_s, prepend=False)
             self.exog = add_constant(self.exog, prepend=False)
             self.neumark = add_constant(self.neumark, prepend=False)
+
         self.exog_f_mean = np.mean(exog_f, axis=0)
         self.exog_s_mean = np.mean(exog_s, axis=0)
-        self._f_model = OLS(endog_f, exog_f).fit(cov_type=cov_type,
-            cov_kwds=cov_kwds)
-        self._s_model = OLS(endog_s, exog_s).fit(cov_type=cov_type,
-            cov_kwds=cov_kwds)
+
+        self._f_model = OLS(endog_f, exog_f).fit(
+            cov_type=cov_type, cov_kwds=cov_kwds
+        )
+        self._s_model = OLS(endog_s, exog_s).fit(
+            cov_type=cov_type, cov_kwds=cov_kwds
+        )

     def variance(self, decomp_type, n=5000, conf=0.99):
         """
         A helper function to calculate the variance/std. Used to keep
         the decomposition functions cleaner
         """
-        pass
+        if self.submitted_n is not None:
+            n = self.submitted_n
+        if self.submitted_conf is not None:
+            conf = self.submitted_conf
+        if self.submitted_weight is not None:
+            submitted_weight = [
+                self.submitted_weight,
+                1 - self.submitted_weight,
+            ]
+        bi = self.bi
+        bifurcate = self.bifurcate
+        endow_eff_list = []
+        coef_eff_list = []
+        int_eff_list = []
+        exp_eff_list = []
+        unexp_eff_list = []
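+        # bootstrap: resample observations with replacement, refit the group
+        # models and recompute the decomposition effects in each replication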
+        for _ in range(0, n):
+            endog = np.column_stack((self.bi_col, self.endog))
+            exog = self.exog
+            amount = len(endog)
+
+            samples = np.random.randint(0, high=amount, size=amount)
+            endog = endog[samples]
+            exog = exog[samples]
+            neumark = np.delete(exog, bifurcate, axis=1)
+
+            exog_f = exog[np.where(exog[:, bifurcate] == bi[0])]
+            exog_s = exog[np.where(exog[:, bifurcate] == bi[1])]
+            endog_f = endog[np.where(endog[:, 0] == bi[0])]
+            endog_s = endog[np.where(endog[:, 0] == bi[1])]
+            exog_f = np.delete(exog_f, bifurcate, axis=1)
+            exog_s = np.delete(exog_s, bifurcate, axis=1)
+            endog_f = endog_f[:, 1]
+            endog_s = endog_s[:, 1]
+            endog = endog[:, 1]
+
+            two_fold_type = self.two_fold_type
+
+            if self.hasconst is False:
+                exog_f = add_constant(exog_f, prepend=False)
+                exog_s = add_constant(exog_s, prepend=False)
+                exog = add_constant(exog, prepend=False)
+                neumark = add_constant(neumark, prepend=False)
+
+            _f_model = OLS(endog_f, exog_f).fit(
+                cov_type=self.cov_type, cov_kwds=self.cov_kwds
+            )
+            _s_model = OLS(endog_s, exog_s).fit(
+                cov_type=self.cov_type, cov_kwds=self.cov_kwds
+            )
+            exog_f_mean = np.mean(exog_f, axis=0)
+            exog_s_mean = np.mean(exog_s, axis=0)
+
+            if decomp_type == 3:
+                endow_eff = (exog_f_mean - exog_s_mean) @ _s_model.params
+                coef_eff = exog_s_mean @ (_f_model.params - _s_model.params)
+                int_eff = (exog_f_mean - exog_s_mean) @ (
+                    _f_model.params - _s_model.params
+                )
+
+                endow_eff_list.append(endow_eff)
+                coef_eff_list.append(coef_eff)
+                int_eff_list.append(int_eff)
+
+            elif decomp_type == 2:
+                len_f = len(exog_f)
+                len_s = len(exog_s)
+
+                if two_fold_type == "cotton":
+                    t_params = (len_f / (len_f + len_s) * _f_model.params) + (
+                        len_s / (len_f + len_s) * _s_model.params
+                    )
+
+                elif two_fold_type == "reimers":
+                    t_params = 0.5 * (_f_model.params + _s_model.params)
+
+                elif two_fold_type == "self_submitted":
+                    t_params = (
+                        submitted_weight[0] * _f_model.params
+                        + submitted_weight[1] * _s_model.params
+                    )
+
+                elif two_fold_type == "nuemark":
+                    _t_model = OLS(endog, neumark).fit(
+                        cov_type=self.cov_type, cov_kwds=self.cov_kwds
+                    )
+                    t_params = _t_model.params
+
+                else:
+                    _t_model = OLS(endog, exog).fit(
+                        cov_type=self.cov_type, cov_kwds=self.cov_kwds
+                    )
+                    t_params = np.delete(_t_model.params, bifurcate)
+
+                unexplained = (exog_f_mean @ (_f_model.params - t_params)) + (
+                    exog_s_mean @ (t_params - _s_model.params)
+                )
+
+                explained = (exog_f_mean - exog_s_mean) @ t_params
+
+                unexp_eff_list.append(unexplained)
+                exp_eff_list.append(explained)
+
+        high, low = int(n * conf), int(n * (1 - conf))
+        if decomp_type == 3:
+            return [
+                np.std(np.sort(endow_eff_list)[low:high]),
+                np.std(np.sort(coef_eff_list)[low:high]),
+                np.std(np.sort(int_eff_list)[low:high]),
+            ]
+        elif decomp_type == 2:
+            return [
+                np.std(np.sort(unexp_eff_list)[low:high]),
+                np.std(np.sort(exp_eff_list)[low:high]),
+            ]

     def three_fold(self, std=False, n=None, conf=None):
         """
@@ -185,10 +324,37 @@ class OaxacaBlinder:
         OaxacaResults
             A results container for the three-fold decomposition.
         """
-        pass
-
-    def two_fold(self, std=False, two_fold_type='pooled', submitted_weight=
-        None, n=None, conf=None):
+        self.submitted_n = n
+        self.submitted_conf = conf
+        self.submitted_weight = None
+        std_val = None
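+        # effects evaluated at the group means: endowment (x_f - x_s)' b_s,
+        # coefficients x_s' (b_f - b_s), interaction (x_f - x_s)' (b_f - b_s)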
+        self.endow_eff = (
+            self.exog_f_mean - self.exog_s_mean
+        ) @ self._s_model.params
+        self.coef_eff = self.exog_s_mean @ (
+            self._f_model.params - self._s_model.params
+        )
+        self.int_eff = (self.exog_f_mean - self.exog_s_mean) @ (
+            self._f_model.params - self._s_model.params
+        )
+
+        if std is True:
+            std_val = self.variance(3)
+
+        return OaxacaResults(
+            (self.endow_eff, self.coef_eff, self.int_eff, self.gap),
+            3,
+            std_val=std_val,
+        )
+
+    def two_fold(
+        self,
+        std=False,
+        two_fold_type="pooled",
+        submitted_weight=None,
+        n=None,
+        conf=None,
+    ):
         """
         Calculates the two-fold or pooled Oaxaca Blinder Decompositions

@@ -246,7 +412,52 @@ class OaxacaBlinder:
         OaxacaResults
             A results container for the two-fold decomposition.
         """
-        pass
+        self.submitted_n = n
+        self.submitted_conf = conf
+        std_val = None
+        self.two_fold_type = two_fold_type
+        self.submitted_weight = submitted_weight
+
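+        # the reference coefficient vector t_params depends on the selected
+        # weighting scheme ("cotton", "reimers", "self_submitted", "nuemark"
+        # or pooled as the default)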
+        if two_fold_type == "cotton":
+            self.t_params = (
+                self.len_f / (self.len_f + self.len_s) * self._f_model.params
+            ) + (self.len_s / (self.len_f + self.len_s) * self._s_model.params)
+
+        elif two_fold_type == "reimers":
+            self.t_params = 0.5 * (self._f_model.params + self._s_model.params)
+
+        elif two_fold_type == "self_submitted":
+            if submitted_weight is None:
+                raise ValueError("Please submit weights")
+            submitted_weight = [submitted_weight, 1 - submitted_weight]
+            self.t_params = (
+                submitted_weight[0] * self._f_model.params
+                + submitted_weight[1] * self._s_model.params
+            )
+
+        elif two_fold_type == "nuemark":
+            self._t_model = OLS(self.endog, self.neumark).fit(
+                cov_type=self.cov_type, cov_kwds=self.cov_kwds
+            )
+            self.t_params = self._t_model.params
+
+        else:
+            self._t_model = OLS(self.endog, self.exog).fit(
+                cov_type=self.cov_type, cov_kwds=self.cov_kwds
+            )
+            self.t_params = np.delete(self._t_model.params, self.bifurcate)
+
+        self.unexplained = (
+            self.exog_f_mean @ (self._f_model.params - self.t_params)
+        ) + (self.exog_s_mean @ (self.t_params - self._s_model.params))
+        self.explained = (self.exog_f_mean - self.exog_s_mean) @ self.t_params
+
+        if std is True:
+            std_val = self.variance(2)
+
+        return OaxacaResults(
+            (self.unexplained, self.explained, self.gap), 2, std_val=std_val
+        )


 class OaxacaResults:
@@ -308,4 +519,58 @@ class OaxacaResults:
         """
         Print a summary table with the Oaxaca-Blinder effects
         """
-        pass
+        if self.model_type == 2:
+            if self.std is None:
+                print(
+                    dedent(
+                        f"""\
+                Oaxaca-Blinder Two-fold Effects
+                Unexplained Effect: {self.params[0]:.5f}
+                Explained Effect: {self.params[1]:.5f}
+                Gap: {self.params[2]:.5f}"""
+                    )
+                )
+            else:
+                print(
+                    dedent(
+                        """\
+                Oaxaca-Blinder Two-fold Effects
+                Unexplained Effect: {:.5f}
+                Unexplained Standard Error: {:.5f}
+                Explained Effect: {:.5f}
+                Explained Standard Error: {:.5f}
+                Gap: {:.5f}""".format(
+                            self.params[0],
+                            self.std[0],
+                            self.params[1],
+                            self.std[1],
+                            self.params[2],
+                        )
+                    )
+                )
+        if self.model_type == 3:
+            if self.std is None:
+                print(
+                    dedent(
+                        f"""\
+                Oaxaca-Blinder Three-fold Effects
+                Endowment Effect: {self.params[0]:.5f}
+                Coefficient Effect: {self.params[1]:.5f}
+                Interaction Effect: {self.params[2]:.5f}
+                Gap: {self.params[3]:.5f}"""
+                    )
+                )
+            else:
+                print(
+                    dedent(
+                        f"""\
+                Oaxaca-Blinder Three-fold Effects
+                Endowment Effect: {self.params[0]:.5f}
+                Endowment Standard Error: {self.std[0]:.5f}
+                Coefficient Effect: {self.params[1]:.5f}
+                Coefficient Standard Error: {self.std[1]:.5f}
+                Interaction Effect: {self.params[2]:.5f}
+                Interaction Standard Error: {self.std[2]:.5f}
+                Gap: {self.params[3]:.5f}"""
+                    )
+                )
diff --git a/statsmodels/stats/oneway.py b/statsmodels/stats/oneway.py
index 3138c68d7..943006d96 100644
--- a/statsmodels/stats/oneway.py
+++ b/statsmodels/stats/oneway.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Wed Mar 18 10:33:38 2020

@@ -5,16 +6,20 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np
 from scipy import stats
 from scipy.special import ncfdtrinc
+
+# functions that use scipy.special instead of the boost-based functions in
+# scipy.stats
 from statsmodels.stats.power import ncf_cdf, ncf_ppf
+
 from statsmodels.stats.robust_compare import TrimmedMean, scale_transform
 from statsmodels.tools.testing import Holder
 from statsmodels.stats.base import HolderTuple


-def effectsize_oneway(means, vars_, nobs, use_var='unequal', ddof_between=0):
+def effectsize_oneway(means, vars_, nobs, use_var="unequal", ddof_between=0):
     """
     Effect size corresponding to Cohen's f = nc / nobs for oneway anova

@@ -129,7 +134,52 @@ def effectsize_oneway(means, vars_, nobs, use_var='unequal', ddof_between=0):
     0.3765792117047725

     """
-    pass
+    # the code here is largely a copy of anova_generic with adjustments
+
+    means = np.asarray(means)
+    n_groups = means.shape[0]
+
+    if np.size(nobs) == 1:
+        nobs = np.ones(n_groups) * nobs
+
+    nobs_t = nobs.sum()
+
+    if use_var == "equal":
+        if np.size(vars_) == 1:
+            var_resid = vars_
+        else:
+            vars_ = np.asarray(vars_)
+            var_resid = ((nobs - 1) * vars_).sum() / (nobs_t - n_groups)
+
+        vars_ = var_resid  # scalar, if broadcasting works
+
+    weights = nobs / vars_
+
+    w_total = weights.sum()
+    w_rel = weights / w_total
+    # meanw_t = (weights * means).sum() / w_total
+    meanw_t = w_rel @ means
+
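+    # Cohen's f-squared as noncentrality / nobs: weighted between-group
+    # variability of the means around the weighted grand mean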
+    f2 = np.dot(weights, (means - meanw_t)**2) / (nobs_t - ddof_between)
+
+    if use_var.lower() == "bf":
+        weights = nobs
+        w_total = weights.sum()
+        w_rel = weights / w_total
+        meanw_t = w_rel @ means
+        # TODO: reuse general case with weights
+        tmp = ((1. - nobs / nobs_t) * vars_).sum()
+        statistic = 1. * (nobs * (means - meanw_t)**2).sum()
+        statistic /= tmp
+        f2 = statistic * (1. - nobs / nobs_t).sum() / nobs_t
+        # correction factor for df_num in BFM
+        df_num2 = n_groups - 1
+        df_num = tmp**2 / ((vars_**2).sum() +
+                           (nobs / nobs_t * vars_).sum()**2 -
+                           2 * (nobs / nobs_t * vars_**2).sum())
+        f2 *= df_num / df_num2
+
+    return f2


 def convert_effectsize_fsqu(f2=None, eta2=None):
@@ -157,7 +207,14 @@ def convert_effectsize_fsqu(f2=None, eta2=None):
         An instance of the Holder class with f2 and eta2 as attributes.

     """
-    pass
+    if f2 is not None:
+        eta2 = 1 / (1 + 1 / f2)
+
+    elif eta2 is not None:
+        f2 = eta2 / (1 - eta2)
+
+    res = Holder(f2=f2, eta2=eta2)
+    return res


 def _fstat2effectsize(f_stat, df):
@@ -200,8 +257,19 @@ def _fstat2effectsize(f_stat, df):
     cases (e.g. zero division).

     """
-    pass
+    df1, df2 = df
+    f2 = f_stat * df1 / df2
+    eta2 = f2 / (f2 + 1)
+    omega2_ = (f_stat - 1) / (f_stat + (df2 + 1) / df1)
+    omega2 = (f2 - df1 / df2) / (f2 + 1 + 1 / df2)  # rewrite
+    eps2_ = (f_stat - 1) / (f_stat + df2 / df1)
+    eps2 = (f2 - df1 / df2) / (f2 + 1)  # rewrite
+    return Holder(f2=f2, eta2=eta2, omega2=omega2, eps2=eps2, eps2_=eps2_,
+                  omega2_=omega2_)
+

+# conversion functions for Wellek's equivalence effect size
+# these are mainly to compare with literature

 def wellek_to_f2(eps, n_groups):
     """Convert Wellek's effect size (sqrt) to Cohen's f-squared
@@ -222,7 +290,8 @@ def wellek_to_f2(eps, n_groups):
     f2 : effect size Cohen's f-squared

     """
-    pass
+    f2 = 1 / n_groups * eps**2
+    return f2


 def f2_to_wellek(f2, n_groups):
@@ -244,7 +313,8 @@ def f2_to_wellek(f2, n_groups):
     eps : float or ndarray
         Wellek's effect size used in anova equivalence test
     """
-    pass
+    eps = np.sqrt(n_groups * f2)
+    return eps


 def fstat_to_wellek(f_stat, n_groups, nobs_mean):
@@ -269,10 +339,12 @@ def fstat_to_wellek(f_stat, n_groups, nobs_mean):
         Wellek's effect size used in anova equivalence test

     """
-    pass
+    es = f_stat * (n_groups - 1) / nobs_mean
+    return es


-def confint_noncentrality(f_stat, df, alpha=0.05, alternative='two-sided'):
+def confint_noncentrality(f_stat, df, alpha=0.05,
+                          alternative="two-sided"):
     """
     Confidence interval for noncentrality parameter in F-test

@@ -314,7 +386,15 @@ def confint_noncentrality(f_stat, df, alpha=0.05, alternative='two-sided'):
     --------
     confint_effectsize_oneway
     """
-    pass
+
+    df1, df2 = df
+    if alternative in ["two-sided", "2s", "ts"]:
+        alpha1s = alpha / 2
+        ci = ncfdtrinc(df1, df2, [1 - alpha1s, alpha1s], f_stat)
+    else:
+        raise NotImplementedError
+
+    return ci


 def confint_effectsize_oneway(f_stat, df, alpha=0.05, nobs=None):
@@ -358,11 +438,25 @@ def confint_effectsize_oneway(f_stat, df, alpha=0.05, nobs=None):
     --------
     confint_noncentrality
     """
-    pass

+    df1, df2 = df
+    if nobs is None:
+        nobs = df1 + df2 + 1
+    ci_nc = confint_noncentrality(f_stat, df, alpha=alpha)

-def anova_generic(means, variances, nobs, use_var='unequal',
-    welch_correction=True, info=None):
+    ci_f2 = ci_nc / nobs
+    ci_res = convert_effectsize_fsqu(f2=ci_f2)
+    ci_res.ci_omega2 = (ci_f2 - df1 / df2) / (ci_f2 + 1 + 1 / df2)
+    ci_res.ci_nc = ci_nc
+    ci_res.ci_f = np.sqrt(ci_res.f2)
+    ci_res.ci_eta = np.sqrt(ci_res.eta2)
+    ci_res.ci_f_corrected = np.sqrt(ci_res.f2 * (df1 + 1) / df1)
+
+    return ci_res
+
+
+def anova_generic(means, variances, nobs, use_var="unequal",
+                  welch_correction=True, info=None):
     """
     Oneway Anova based on summary statistics

@@ -397,11 +491,76 @@ def anova_generic(means, variances, nobs, use_var='unequal',
         This includes `statistic` and `pvalue`.

     """
-    pass
-
-
-def anova_oneway(data, groups=None, use_var='unequal', welch_correction=
-    True, trim_frac=0):
+    options = {"use_var": use_var,
+               "welch_correction": welch_correction
+               }
+    if means.ndim != 1:
+        raise ValueError('data (means, ...) has to be one-dimensional')
+    nobs_t = nobs.sum()
+    n_groups = len(means)
+    # mean_t = (nobs * means).sum() / nobs_t
+    if use_var == "unequal":
+        weights = nobs / variances
+    else:
+        weights = nobs
+
+    w_total = weights.sum()
+    w_rel = weights / w_total
+    # meanw_t = (weights * means).sum() / w_total
+    meanw_t = w_rel @ means
+
+    statistic = np.dot(weights, (means - meanw_t)**2) / (n_groups - 1.)
+    df_num = n_groups - 1.
+
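+    # "unequal": Welch anova with Satterthwaite-type denominator df,
+    # "equal": standard anova with pooled residual variance,
+    # "bf": Brown-Forsythe with separate numerator and denominator df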
+    if use_var == "unequal":
+        tmp = ((1 - w_rel)**2 / (nobs - 1)).sum() / (n_groups**2 - 1)
+        if welch_correction:
+            statistic /= 1 + 2 * (n_groups - 2) * tmp
+        df_denom = 1. / (3. * tmp)
+
+    elif use_var == "equal":
+        # variance of group demeaned total sample, pooled var_resid
+        tmp = ((nobs - 1) * variances).sum() / (nobs_t - n_groups)
+        statistic /= tmp
+        df_denom = nobs_t - n_groups
+
+    elif use_var == "bf":
+        tmp = ((1. - nobs / nobs_t) * variances).sum()
+        statistic = 1. * (nobs * (means - meanw_t)**2).sum()
+        statistic /= tmp
+
+        df_num2 = n_groups - 1
+        df_denom = tmp**2 / ((1. - nobs / nobs_t) ** 2 *
+                             variances ** 2 / (nobs - 1)).sum()
+        df_num = tmp**2 / ((variances ** 2).sum() +
+                           (nobs / nobs_t * variances).sum() ** 2 -
+                           2 * (nobs / nobs_t * variances ** 2).sum())
+        pval2 = stats.f.sf(statistic, df_num2, df_denom)
+        options["df2"] = (df_num2, df_denom)
+        options["df_num2"] = df_num2
+        options["pvalue2"] = pval2
+
+    else:
+        raise ValueError('use_var has to be one of "unequal", "equal" or "bf"')
+
+    pval = stats.f.sf(statistic, df_num, df_denom)
+    res = HolderTuple(statistic=statistic,
+                      pvalue=pval,
+                      df=(df_num, df_denom),
+                      df_num=df_num,
+                      df_denom=df_denom,
+                      nobs_t=nobs_t,
+                      n_groups=n_groups,
+                      means=means,
+                      nobs=nobs,
+                      vars_=variances,
+                      **options
+                      )
+    return res
+
+
+def anova_oneway(data, groups=None, use_var="unequal", welch_correction=True,
+                 trim_frac=0):
     """Oneway Anova

     This implements standard anova, Welch and Brown-Forsythe, and trimmed
@@ -501,11 +660,42 @@ def anova_oneway(data, groups=None, use_var='unequal', welch_correction=
     Simulation and Computation 26 (3): 1139–1145.
     doi:10.1080/03610919708813431.
     """
-    pass
+    if groups is not None:
+        uniques = np.unique(groups)
+        data = [data[groups == uni] for uni in uniques]
+    else:
+        # uniques = None  # not used yet, add to info?
+        pass
+    args = list(map(np.asarray, data))
+    if any([x.ndim != 1 for x in args]):
+        raise ValueError('data arrays have to be one-dimensional')
+
+    nobs = np.array([len(x) for x in args], float)
+    # n_groups = len(args)  # not used
+    # means = np.array([np.mean(x, axis=0) for x in args], float)
+    # vars_ = np.array([np.var(x, ddof=1, axis=0) for x in args], float)
+
+    if trim_frac == 0:
+        means = np.array([x.mean() for x in args])
+        vars_ = np.array([x.var(ddof=1) for x in args])
+    else:
+        tms = [TrimmedMean(x, trim_frac) for x in args]
+        means = np.array([tm.mean_trimmed for tm in tms])
+        # R doesn't use uncorrected var_winsorized
+        # vars_ = np.array([tm.var_winsorized for tm in tms])
+        vars_ = np.array([tm.var_winsorized * (tm.nobs - 1) /
+                          (tm.nobs_reduced - 1) for tm in tms])
+        # nobs_original = nobs  # store just in case
+        nobs = np.array([tm.nobs_reduced for tm in tms])
+
+    res = anova_generic(means, vars_, nobs, use_var=use_var,
+                        welch_correction=welch_correction)
+
+    return res


 def equivalence_oneway_generic(f_stat, n_groups, nobs, equiv_margin, df,
-    alpha=0.05, margin_type='f2'):
+                               alpha=0.05, margin_type="f2"):
     """Equivalence test for oneway anova (Wellek and extensions)

     This is an helper function when summary statistics are available.
@@ -574,11 +764,47 @@ def equivalence_oneway_generic(f_stat, n_groups, nobs, equiv_margin, df,
     https://doi.org/10.1080/19466315.2019.1654915.

     """
-    pass
-
-
-def equivalence_oneway(data, equiv_margin, groups=None, use_var='unequal',
-    welch_correction=True, trim_frac=0, margin_type='f2'):
+    nobs_t = nobs.sum()
+    nobs_mean = nobs_t / n_groups
+
+    if margin_type == "wellek":
+        nc_null = nobs_mean * equiv_margin**2
+        es = f_stat * (n_groups - 1) / nobs_mean
+        type_effectsize = "Wellek's psi_squared"
+    elif margin_type in ["f2", "fsqu", "fsquared"]:
+        nc_null = nobs_t * equiv_margin
+        es = f_stat / nobs_t
+        type_effectsize = "Cohen's f_squared"
+    else:
+        raise ValueError('`margin_type` should be "f2" or "wellek"')
+    crit_f = ncf_ppf(alpha, df[0], df[1], nc_null)
+
+    if margin_type == "wellek":
+        # TODO: do we need a sqrt
+        crit_es = crit_f * (n_groups - 1) / nobs_mean
+    elif margin_type in ["f2", "fsqu", "fsquared"]:
+        crit_es = crit_f / nobs_t
+
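+    # reject the non-equivalence null if the observed effect size falls below
+    # the critical value implied by the noncentral F at the margin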
+    reject = (es < crit_es)
+
+    pv = ncf_cdf(f_stat, df[0], df[1], nc_null)
+    pwr = ncf_cdf(crit_f, df[0], df[1], 1e-13)  # scipy, cannot be 0
+    res = HolderTuple(statistic=f_stat,
+                      pvalue=pv,
+                      effectsize=es,  # match es type to margin_type
+                      crit_f=crit_f,
+                      crit_es=crit_es,
+                      reject=reject,
+                      power_zero=pwr,
+                      df=df,
+                      f_stat=f_stat,
+                      type_effectsize=type_effectsize
+                      )
+    return res
+
+
+def equivalence_oneway(data, equiv_margin, groups=None, use_var="unequal",
+                       welch_correction=True, trim_frac=0, margin_type="f2"):
     """equivalence test for oneway anova (Wellek's Anova)

     The null hypothesis is that the means differ by more than `equiv_margin`
@@ -644,7 +870,17 @@ def equivalence_oneway(data, equiv_margin, groups=None, use_var='unequal',
     anova_oneway
     equivalence_scale_oneway
     """
-    pass
+
+    # use anova to compute summary statistics and f-statistic
+    res0 = anova_oneway(data, groups=groups, use_var=use_var,
+                        welch_correction=welch_correction,
+                        trim_frac=trim_frac)
+    f_stat = res0.statistic
+    res = equivalence_oneway_generic(f_stat, res0.n_groups, res0.nobs_t,
+                                     equiv_margin, res0.df, alpha=0.05,
+                                     margin_type=margin_type)
+
+    return res


 def _power_equivalence_oneway_emp(f_stat, n_groups, nobs, eps, df, alpha=0.05):
@@ -676,11 +912,20 @@ def _power_equivalence_oneway_emp(f_stat, n_groups, nobs, eps, df, alpha=0.05):
         Ex-post, post-hoc or empirical power at f-statistic of the equivalence
         test.
     """
-    pass
+
+    res = equivalence_oneway_generic(f_stat, n_groups, nobs, eps, df,
+                                     alpha=alpha, margin_type="wellek")
+
+    nobs_mean = nobs.sum() / n_groups
+    fn = f_stat  # post-hoc power, empirical power at estimate
+    esn = fn * (n_groups - 1) / nobs_mean  # Wellek psi
+    pow_ = ncf_cdf(res.crit_f, df[0], df[1], nobs_mean * esn)
+
+    return pow_


 def power_equivalence_oneway(f2_alt, equiv_margin, nobs_t, n_groups=None,
-    df=None, alpha=0.05, margin_type='f2'):
+                             df=None, alpha=0.05, margin_type="f2"):
     """
     Power of  oneway equivalence test

@@ -711,11 +956,42 @@ def power_equivalence_oneway(f2_alt, equiv_margin, nobs_t, n_groups=None,
         Power of the equivalence test at given equivalence effect size under
         the alternative.
     """
-    pass
+
+    # one of n_groups or df has to be specified
+    if df is None:
+        if n_groups is None:
+            raise ValueError("either df or n_groups has to be provided")
+        df = (n_groups - 1, nobs_t - n_groups)
+
+    # esn = fn * (n_groups - 1) / nobs_mean  # Wellek psi
+
+    # fix for scipy, ncf does not allow nc == 0, fixed in scipy master
+    if f2_alt == 0:
+        f2_alt = 1e-13
+    # effect size, critical value at margin
+    # f2_null = equiv_margin
+    if margin_type in ["f2", "fsqu", "fsquared"]:
+        f2_null = equiv_margin
+    elif margin_type == "wellek":
+        if n_groups is None:
+            raise ValueError("If margin_type is wellek, then n_groups has "
+                             "to be provided")
+        #  f2_null = (n_groups - 1) * n_groups / nobs_t * equiv_margin**2
+        nobs_mean = nobs_t / n_groups
+        f2_null = nobs_mean * equiv_margin**2 / nobs_t
+        f2_alt = nobs_mean * f2_alt**2 / nobs_t
+    else:
+        raise ValueError('`margin_type` should be "f2" or "wellek"')
+
+    crit_f_margin = ncf_ppf(alpha, df[0], df[1], nobs_t * f2_null)
+    pwr_alt = ncf_cdf(crit_f_margin, df[0], df[1], nobs_t * f2_alt)
+    return pwr_alt


 def simulate_power_equivalence_oneway(means, nobs, equiv_margin, vars_=None,
-    k_mc=1000, trim_frac=0, options_var=None, margin_type='f2'):
+                                      k_mc=1000, trim_frac=0,
+                                      options_var=None, margin_type="f2"
+                                      ):  # , anova_options=None):  #TODO
     """Simulate Power for oneway equivalence test (Wellek's Anova)

     This function is experimental and written to evaluate asymptotic power
@@ -725,11 +1001,62 @@ def simulate_power_equivalence_oneway(means, nobs, equiv_margin, vars_=None,
     Effect size for equivalence margin

     """
-    pass
-
-
-def test_scale_oneway(data, method='bf', center='median', transform='abs',
-    trim_frac_mean=0.1, trim_frac_anova=0.0):
+    if options_var is None:
+        options_var = ["unequal", "equal", "bf"]
+    if vars_ is not None:
+        stds = np.sqrt(vars_)
+    else:
+        stds = np.ones(len(means))
+
+    nobs_mean = nobs.mean()
+    n_groups = len(nobs)
+    res_mc = []
+    f_mc = []
+    reject_mc = []
+    other_mc = []
+    for _ in range(k_mc):
+        # note: the unpacking below assumes exactly four groups (k = 4)
+        y0, y1, y2, y3 = [m + std * np.random.randn(n)
+                          for (n, m, std) in zip(nobs, means, stds)]
+
+        res_i = []
+        f_i = []
+        reject_i = []
+        other_i = []
+        for uv in options_var:
+            # for welch in options_welch:
+            # res1 = sma.anova_generic(means, vars_, nobs, use_var=uv,
+            #                          welch_correction=welch)
+            res0 = anova_oneway([y0, y1, y2, y3], use_var=uv,
+                                trim_frac=trim_frac)
+            f_stat = res0.statistic
+            res1 = equivalence_oneway_generic(f_stat, n_groups, nobs.sum(),
+                                              equiv_margin, res0.df,
+                                              alpha=0.05,
+                                              margin_type=margin_type)
+            res_i.append(res1.pvalue)
+            es_wellek = f_stat * (n_groups - 1) / nobs_mean
+            f_i.append(es_wellek)
+            reject_i.append(res1.reject)
+            other_i.extend([res1.crit_f, res1.crit_es, res1.power_zero])
+        res_mc.append(res_i)
+        f_mc.append(f_i)
+        reject_mc.append(reject_i)
+        other_mc.append(other_i)
+
+    f_mc = np.asarray(f_mc)
+    other_mc = np.asarray(other_mc)
+    res_mc = np.asarray(res_mc)
+    reject_mc = np.asarray(reject_mc)
+    res = Holder(f_stat=f_mc,
+                 other=other_mc,
+                 pvalue=res_mc,
+                 reject=reject_mc
+                 )
+    return res
+
+
+def test_scale_oneway(data, method="bf", center="median", transform="abs",
+                      trim_frac_mean=0.1, trim_frac_anova=0.0):
     """Oneway Anova test for equal scale, variance or dispersion

     This hypothesis test performs a oneway anova test on transformed data and
@@ -804,11 +1131,20 @@ def test_scale_oneway(data, method='bf', center='median', transform='abs',
     anova_oneway
     scale_transform
     """
-    pass
+
+    data = map(np.asarray, data)
+    xxd = [scale_transform(x, center=center, transform=transform,
+                           trim_frac=trim_frac_mean) for x in data]
+
+    res = anova_oneway(xxd, groups=None, use_var=method,
+                       welch_correction=True, trim_frac=trim_frac_anova)
+    res.data_transformed = xxd
+    return res
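
# Sketch: Brown-Forsythe style test for equal dispersion across three
# simulated samples with unequal spread; option values mirror the defaults.
import numpy as np
from statsmodels.stats.oneway import test_scale_oneway

rng = np.random.default_rng(1)
samples = [rng.normal(0.0, s, size=40) for s in (1.0, 1.0, 1.5)]
res = test_scale_oneway(samples, method="bf", center="median",
                        transform="abs", trim_frac_mean=0.1)
print(res.statistic, res.pvalue)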


-def equivalence_scale_oneway(data, equiv_margin, method='bf', center=
-    'median', transform='abs', trim_frac_mean=0.0, trim_frac_anova=0.0):
+def equivalence_scale_oneway(data, equiv_margin, method='bf', center='median',
+                             transform='abs', trim_frac_mean=0.,
+                             trim_frac_anova=0.):
     """Oneway Anova test for equivalence of scale, variance or dispersion

     This hypothesis test performs a oneway equivalence anova test on
@@ -874,4 +1210,11 @@ def equivalence_scale_oneway(data, equiv_margin, method='bf', center=
     scale_transform
     equivalence_oneway
     """
-    pass
+    data = map(np.asarray, data)
+    xxd = [scale_transform(x, center=center, transform=transform,
+                           trim_frac=trim_frac_mean) for x in data]
+
+    res = equivalence_oneway(xxd, equiv_margin, use_var=method,
+                             welch_correction=True, trim_frac=trim_frac_anova)
+    res.x_transformed = xxd
+    return res
diff --git a/statsmodels/stats/outliers_influence.py b/statsmodels/stats/outliers_influence.py
index 5ab63f0d1..7058adaa1 100644
--- a/statsmodels/stats/outliers_influence.py
+++ b/statsmodels/stats/outliers_influence.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Influence and Outlier Measures

 Created on Sun Jan 29 11:16:09 2012
@@ -5,20 +6,26 @@ Created on Sun Jan 29 11:16:09 2012
 Author: Josef Perktold
 License: BSD-3
 """
+
 import warnings
+
 from statsmodels.compat.pandas import Appender
 from statsmodels.compat.python import lzip
+
 from collections import defaultdict
+
 import numpy as np
+
 from statsmodels.graphics._regressionplots_doc import _plot_influence_doc
 from statsmodels.regression.linear_model import OLS
 from statsmodels.stats.multitest import multipletests
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.tools import maybe_unwrap_results

+# outliers test convenience wrapper

-def outlier_test(model_results, method='bonf', alpha=0.05, labels=None,
-    order=False, cutoff=None):
+def outlier_test(model_results, method='bonf', alpha=.05, labels=None,
+                 order=False, cutoff=None):
     """
     Outlier Tests for RegressionResults instances.

@@ -63,8 +70,41 @@ def outlier_test(model_results, method='bonf', alpha=0.05, labels=None,
     The unadjusted p-value is stats.t.sf(abs(resid), df) where
     df = df_resid - 1.
     """
-    pass
-
+    from scipy import stats  # lazy import
+    if labels is None:
+        labels = getattr(model_results.model.data, 'row_labels', None)
+    infl = getattr(model_results, 'get_influence', None)
+    if infl is None:
+        results = maybe_unwrap_results(model_results)
+        raise AttributeError("model_results object %s does not have a "
+                             "get_influence "
+                             "method." % results.__class__.__name__)
+    resid = infl().resid_studentized_external
+    if order:
+        idx = np.abs(resid).argsort()[::-1]
+        resid = resid[idx]
+        if labels is not None:
+            labels = np.asarray(labels)[idx]
+    df = model_results.df_resid - 1
+    unadj_p = stats.t.sf(np.abs(resid), df) * 2
+    adj_p = multipletests(unadj_p, alpha=alpha, method=method)
+
+    data = np.c_[resid, unadj_p, adj_p[1]]
+    if cutoff is not None:
+        mask = data[:, -1] < cutoff
+        data = data[mask]
+    else:
+        mask = slice(None)
+
+    if labels is not None:
+        from pandas import DataFrame
+        return DataFrame(data,
+                         columns=['student_resid', 'unadj_p', method + "(p)"],
+                         index=np.asarray(labels)[mask])
+    return data
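
# Sketch: Bonferroni-adjusted outlier test on a fitted OLS model; the data
# below are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import outlier_test

rng = np.random.default_rng(2)
exog = sm.add_constant(rng.normal(size=(40, 2)))
endog = exog @ [1.0, 0.5, -0.5] + rng.normal(size=40)
ols_res = sm.OLS(endog, exog).fit()
print(outlier_test(ols_res, method='bonf', alpha=0.05))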
+
+
+# influence measures

 def reset_ramsey(res, degree=5):
     """Ramsey's RESET specification test for linear models
@@ -93,7 +133,21 @@ def reset_ramsey(res, degree=5):
     ----------
     https://en.wikipedia.org/wiki/Ramsey_RESET_test
     """
-    pass
+    order = degree + 1
+    k_vars = res.model.exog.shape[1]
+    # vander without constant and x, and drop constant
+    norm_values = np.asarray(res.fittedvalues)
+    norm_values = norm_values / np.sqrt((norm_values ** 2).mean())
+    y_fitted_vander = np.vander(norm_values, order)[:, :-2]
+    exog = np.column_stack((res.model.exog, y_fitted_vander))
+    exog /= np.sqrt((exog ** 2).mean(0))
+    endog = res.model.endog / (res.model.endog ** 2).mean()
+    res_aux = OLS(endog, exog).fit()
+    # r_matrix = np.eye(degree, exog.shape[1], k_vars)
+    r_matrix = np.eye(degree - 1, exog.shape[1], k_vars)
+    # df1 = degree - 1
+    # df2 = exog.shape[0] - degree - res.df_model  (without constant)
+    return res_aux.f_test(r_matrix)  # , r_matrix, res_aux
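
# Sketch: RESET specification test on a simulated OLS fit; degree=3 adds
# squared and cubed normalized fitted values as auxiliary regressors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import reset_ramsey

rng = np.random.default_rng(3)
exog = sm.add_constant(rng.normal(size=(60, 2)))
endog = exog @ [1.0, 1.0, 2.0] + rng.normal(size=60)
print(reset_ramsey(sm.OLS(endog, exog).fit(), degree=3))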


 def variance_inflation_factor(exog, exog_idx):
@@ -135,15 +189,74 @@ def variance_inflation_factor(exog, exog_idx):
     ----------
     https://en.wikipedia.org/wiki/Variance_inflation_factor
     """
-    pass
+    k_vars = exog.shape[1]
+    exog = np.asarray(exog)
+    x_i = exog[:, exog_idx]
+    mask = np.arange(k_vars) != exog_idx
+    x_noti = exog[:, mask]
+    r_squared_i = OLS(x_i, x_noti).fit().rsquared
+    vif = 1. / (1. - r_squared_i)
+    return vif
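
# Sketch: VIF for each column of a design matrix with two correlated
# regressors (simulated data, constant included).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + 0.5 * rng.normal(size=100)
exog = sm.add_constant(np.column_stack([x1, x2]))
vifs = [variance_inflation_factor(exog, i) for i in range(exog.shape[1])]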


 class _BaseInfluenceMixin:
     """common methods between OLSInfluence and MLE/GLMInfluence
     """

+    @Appender(_plot_influence_doc.format(**{'extra_params_doc': ""}))
+    def plot_influence(self, external=None, alpha=.05, criterion="cooks",
+                       size=48, plot_alpha=.75, ax=None, **kwargs):
+
+        if external is None:
+            external = hasattr(self, '_cache') and 'res_looo' in self._cache
+        from statsmodels.graphics.regressionplots import _influence_plot
+        if self.hat_matrix_diag is not None:
+            res = _influence_plot(self.results, self, external=external,
+                                  alpha=alpha,
+                                  criterion=criterion, size=size,
+                                  plot_alpha=plot_alpha, ax=ax, **kwargs)
+        else:
+            warnings.warn("Plot uses pearson residuals and exog hat matrix.")
+            res = _influence_plot(self.results, self, external=external,
+                                  alpha=alpha,
+                                  criterion=criterion, size=size,
+                                  leverage=self.hat_matrix_exog_diag,
+                                  resid=self.resid,
+                                  plot_alpha=plot_alpha, ax=ax, **kwargs)
+        return res
+
+    def _plot_index(self, y, ylabel, threshold=None, title=None, ax=None,
+                    **kwds):
+        from statsmodels.graphics import utils
+        fig, ax = utils.create_mpl_ax(ax)
+        if title is None:
+            title = "Index Plot"
+        nobs = len(self.endog)
+        index = np.arange(nobs)
+        ax.scatter(index, y, **kwds)
+
+        if threshold == 'all':
+            large_points = np.ones(nobs, np.bool_)
+        else:
+            large_points = np.abs(y) > threshold
+        psize = 3 * np.ones(nobs)
+        # add point labels
+        labels = self.results.model.data.row_labels
+        if labels is None:
+            labels = np.arange(nobs)
+        ax = utils.annotate_axes(np.where(large_points)[0], labels,
+                                 lzip(index, y),
+                                 lzip(-psize, psize), "large",
+                                 ax)
+
+        font = {"fontsize": 16, "color": "black"}
+        ax.set_ylabel(ylabel, **font)
+        ax.set_xlabel("Observation", **font)
+        ax.set_title(title, **font)
+        return fig
+
     def plot_index(self, y_var='cooks', threshold=None, title=None, ax=None,
-        idx=None, **kwds):
+                   idx=None, **kwds):
         """index plot for influence attributes

         Parameters
@@ -167,7 +280,36 @@ class _BaseInfluenceMixin:
         kwds : optional keywords
             Keywords will be used in the call to matplotlib scatter function.
         """
-        pass
+        criterion = y_var  # alias
+        if threshold is None:
+            # TODO: criterion specific defaults
+            threshold = 'all'
+
+        if criterion == 'dfbeta':
+            y = self.dfbetas[:, idx]
+            ylabel = 'DFBETA for ' + self.results.model.exog_names[idx]
+        elif criterion.startswith('cook'):
+            y = self.cooks_distance[0]
+            ylabel = "Cook's distance"
+        elif criterion.startswith('hat') or criterion.startswith('lever'):
+            y = self.hat_matrix_diag
+            ylabel = "Leverage (diagonal of hat matrix)"
+        elif criterion.startswith('resid_stu'):
+            y = self.resid_studentized
+            ylabel = "Internally Studentized Residuals"
+        else:
+            # assume we have the name of an attribute
+            y = getattr(self, y_var)
+            if idx is not None:
+                y = y[idx]
+            ylabel = y_var
+
+        fig = self._plot_index(y, ylabel, threshold=threshold, title=title,
+                               ax=ax, **kwds)
+        return fig


 class MLEInfluence(_BaseInfluenceMixin):
@@ -240,8 +382,10 @@ class MLEInfluence(_BaseInfluenceMixin):
     """

     def __init__(self, results, resid=None, endog=None, exog=None,
-        hat_matrix_diag=None, cov_params=None, scale=None):
+                 hat_matrix_diag=None, cov_params=None, scale=None):
+        # this __init__ attaches attributes that we don't really need
         self.results = results = maybe_unwrap_results(results)
+        # TODO: check for extra params in e.g. NegBin
         self.nobs, self.k_vars = results.model.exog.shape
         self.k_params = np.size(results.params)
         self.endog = endog if endog is not None else results.model.endog
@@ -250,12 +394,15 @@ class MLEInfluence(_BaseInfluenceMixin):
         if resid is not None:
             self.resid = resid
         else:
-            self.resid = getattr(results, 'resid_pearson', None)
-            if self.resid is not None:
+            self.resid = getattr(results, "resid_pearson", None)
+            if self.resid is not None: # and scale != 1:
+                # GLM and similar does not divide resid_pearson by scale
                 self.resid = self.resid / np.sqrt(self.scale)
-        self.cov_params = (cov_params if cov_params is not None else
-            results.cov_params())
+
+        self.cov_params = (cov_params if cov_params is not None
+                           else results.cov_params())
         self.model_class = results.model.__class__
+
         self.hessian = self.results.model.hessian(self.results.params)
         self.score_obs = self.results.model.score_obs(self.results.params)
         if hat_matrix_diag is not None:
@@ -267,14 +414,38 @@ class MLEInfluence(_BaseInfluenceMixin):

         This is the analogue of the hat matrix diagonal for general MLE.
         """
-        pass
+        if hasattr(self, '_hat_matrix_diag'):
+            return self._hat_matrix_diag
+
+        try:
+            dsdy = self.results.model._deriv_score_obs_dendog(
+                self.results.params)
+        except NotImplementedError:
+            dsdy = None
+
+        if dsdy is None:
+            warnings.warn("hat matrix is not available, missing derivatives",
+                          UserWarning)
+            return None
+
+        dmu_dp = self.results.model._deriv_mean_dparams(self.results.params)
+
+        # dmu_dp = 1 /
+        #      self.results.model.family.link.deriv(self.results.fittedvalues)
+        h = (dmu_dp * np.linalg.solve(-self.hessian, dsdy.T).T).sum(1)
+        return h

     @cache_readonly
     def hat_matrix_exog_diag(self):
         """Diagonal of the hat_matrix using only exog as in OLS

         """
-        pass
+        get_exogs = getattr(self.results.model, "_get_exogs", None)
+        if get_exogs is not None:
+            exog = np.column_stack(get_exogs())
+        else:
+            exog = self.exog
+        return (exog * np.linalg.pinv(exog).T).sum(1)

     @cache_readonly
     def d_params(self):
@@ -283,7 +454,12 @@ class MLEInfluence(_BaseInfluenceMixin):
         This uses one-step approximation of the parameter change to deleting
         one observation.
         """
-        pass
+        so_noti = self.score_obs.sum(0) - self.score_obs
+        beta_i = np.linalg.solve(self.hessian, so_noti.T).T
+        if self.hat_matrix_diag is not None:
+            beta_i /= (1 - self.hat_matrix_diag)[:, None]
+
+        return beta_i

     @cache_readonly
     def dfbetas(self):
@@ -292,7 +468,9 @@ class MLEInfluence(_BaseInfluenceMixin):
         The one-step change of parameters in d_params is rescaled by dividing
         by the standard error of the parameter estimate given by results.bse.
         """
-        pass
+
+        beta_i = self.d_params / self.results.bse
+        return beta_i

     @cache_readonly
     def params_one(self):
@@ -301,7 +479,7 @@ class MLEInfluence(_BaseInfluenceMixin):
         This is the one-step parameter estimate computed as
         ``params`` from the full sample minus ``d_params``.
         """
-        pass
+        return self.results.params - self.d_params

     @cache_readonly
     def cooks_distance(self):
@@ -317,7 +495,17 @@ class MLEInfluence(_BaseInfluenceMixin):
         chi-square distribution instead of F-distribution, or if we make it
         dependent on the fit keyword use_t.
         """
-        pass
+        cooks_d2 = (self.d_params * np.linalg.solve(self.cov_params,
+                                                    self.d_params.T).T).sum(1)
+        cooks_d2 /= self.k_params
+        from scipy import stats
+
+        # alpha = 0.1
+        # print stats.f.isf(1-alpha, n_params, res.df_modelwc)
+        # TODO use chi2   # use_f option
+        pvals = stats.f.sf(cooks_d2, self.k_params, self.results.df_resid)
+
+        return cooks_d2, pvals

     @cache_readonly
     def resid_studentized(self):
@@ -331,7 +519,7 @@ class MLEInfluence(_BaseInfluenceMixin):
         Studentized residuals are not available if hat_matrix_diag is None.

         """
-        pass
+        return self.resid / np.sqrt(1 - self.hat_matrix_diag)

     def resid_score_factor(self):
         """Score residual divided by sqrt of hessian factor.
@@ -344,7 +532,18 @@ class MLEInfluence(_BaseInfluenceMixin):
         is positive, i.e. loglikelihood is not globally concave w.r.t. linear
         predictor. (This occurred in an example for GeneralizedPoisson)
         """
-        pass
+        from statsmodels.genmod.generalized_linear_model import GLM
+        sf = self.results.model.score_factor(self.results.params)
+        hf = self.results.model.hessian_factor(self.results.params)
+        if isinstance(sf, tuple):
+            sf = sf[0]
+        if isinstance(hf, tuple):
+            hf = hf[0]
+        if not isinstance(self.results.model, GLM):
+            # hessian_factor in GLM has wrong sign, is already positive
+            hf = -hf
+
+        return sf / np.sqrt(hf) / np.sqrt(1 - self.hat_matrix_diag)

     def resid_score(self, joint=True, index=None, studentize=False):
         """Score observations scaled by inverse hessian.
@@ -382,7 +581,37 @@ class MLEInfluence(_BaseInfluenceMixin):
           This will make them differ in the case of robust cov_params.

         """
-        pass
+        # currently no caching
+        score_obs = self.results.model.score_obs(self.results.params)
+        hess = self.results.model.hessian(self.results.params)
+        if index is not None:
+            score_obs = score_obs[:, index]
+            hess = hess[index[:, None], index]
+
+        if joint:
+            resid = (score_obs.T * np.linalg.solve(-hess, score_obs.T)).sum(0)
+        else:
+            resid = score_obs / np.sqrt(np.diag(-hess))
+
+        if studentize:
+            if joint:
+                resid /= np.sqrt(1 - self.hat_matrix_diag)
+            else:
+                # 2-dim resid
+                resid /= np.sqrt(1 - self.hat_matrix_diag[:, None])
+
+        return resid
+
+    @cache_readonly
+    def _get_prediction(self):
+        # TODO: do we cache this or does it need to be a method
+        # we only need unchanging parts, alpha for confint could change
+        with warnings.catch_warnings():
+            msg = 'linear keyword is deprecated, use which="linear"'
+            warnings.filterwarnings("ignore", message=msg,
+                                    category=FutureWarning)
+            pred = self.results.get_prediction()
+        return pred

     @cache_readonly
     def d_fittedvalues(self):
@@ -396,7 +625,10 @@ class MLEInfluence(_BaseInfluenceMixin):
         This uses the one-step approximation of the parameter change to
         deleting one observation ``d_params``.
         """
-        pass
+        # results.params might be a pandas.Series
+        params = np.asarray(self.results.params)
+        deriv = self.results.model._deriv_mean_dparams(params)
+        return (deriv * self.d_params).sum(1)

     @property
     def d_fittedvalues_scaled(self):
@@ -407,7 +639,10 @@ class MLEInfluence(_BaseInfluenceMixin):
         one observation ``d_params``, and divides by the standard errors
         for the predicted mean provided by results.get_prediction.
         """
-        pass
+        # Note: this and the previous methods are for the response
+        # and not for a weighted response, i.e. not the self.exog, self.endog
+        # this will be relevant for WLS comparing fitted endog versus wendog
+        return self.d_fittedvalues / self._get_prediction.se

     def summary_frame(self):
         """
@@ -432,7 +667,34 @@ class MLEInfluence(_BaseInfluenceMixin):
         * dffits_internal : DFFITS statistics using internally Studentized
           residuals defined in `d_fittedvalues_scaled`
         """
-        pass
+        from pandas import DataFrame
+
+        # row and column labels
+        data = self.results.model.data
+        row_labels = data.row_labels
+        beta_labels = ['dfb_' + i for i in data.xnames]
+
+        # grab the results
+        if self.hat_matrix_diag is not None:
+            summary_data = DataFrame(dict(
+                cooks_d=self.cooks_distance[0],
+                standard_resid=self.resid_studentized,
+                hat_diag=self.hat_matrix_diag,
+                dffits_internal=self.d_fittedvalues_scaled),
+                index=row_labels)
+        else:
+            summary_data = DataFrame(dict(
+                cooks_d=self.cooks_distance[0],
+                # standard_resid=self.resid_studentized,
+                # hat_diag=self.hat_matrix_diag,
+                dffits_internal=self.d_fittedvalues_scaled),
+                index=row_labels)
+
+        # NOTE: if we do not give columns, order of above will be arbitrary
+        dfbeta = DataFrame(self.dfbetas, columns=beta_labels,
+                           index=row_labels)
+
+        return dfbeta.join(summary_data)
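
# Sketch: MLEInfluence is usually reached through results.get_influence()
# on a likelihood-based model such as Logit; data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
exog = sm.add_constant(rng.normal(size=(200, 2)))
endog = (exog @ [0.5, 1.0, -1.0] + rng.normal(size=200) > 0).astype(float)
infl = sm.Logit(endog, exog).fit(disp=0).get_influence()
print(infl.summary_frame().head())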


 class OLSInfluence(_BaseInfluenceMixin):
@@ -464,13 +726,17 @@ class OLSInfluence(_BaseInfluenceMixin):
     """

     def __init__(self, results):
+        # check which model is allowed
         self.results = maybe_unwrap_results(results)
         self.nobs, self.k_vars = results.model.exog.shape
         self.endog = results.model.endog
         self.exog = results.model.exog
         self.resid = results.resid
         self.model_class = results.model.__class__
+
+        # self.sigma_est = np.sqrt(results.mse_resid)
         self.scale = results.mse_resid
+
         self.aux_regression_exog = {}
         self.aux_regression_endog = {}

@@ -482,13 +748,14 @@ class OLSInfluence(_BaseInfluenceMixin):
         -----
         temporarily calculated here, this should go to model class
         """
-        pass
+        return (self.exog * self.results.model.pinv_wexog.T).sum(1)

     @cache_readonly
     def resid_press(self):
         """PRESS residuals
         """
-        pass
+        hii = self.hat_matrix_diag
+        return self.resid / (1 - hii)

     @cache_readonly
     def influence(self):
@@ -498,7 +765,8 @@ class OLSInfluence(_BaseInfluenceMixin):
         u * h / (1 - h)
         where u are the residuals and h is the diagonal of the hat_matrix
         """
-        pass
+        hii = self.hat_matrix_diag
+        return self.resid * hii / (1 - hii)

     @cache_readonly
     def hat_diag_factor(self):
@@ -507,13 +775,14 @@ class OLSInfluence(_BaseInfluenceMixin):
         this might be useful for internal reuse
         h / (1 - h)
         """
-        pass
+        hii = self.hat_matrix_diag
+        return hii / (1 - hii)

     @cache_readonly
     def ess_press(self):
         """Error sum of squares of PRESS residuals
         """
-        pass
+        return np.dot(self.resid_press, self.resid_press)

     @cache_readonly
     def resid_studentized(self):
@@ -523,7 +792,7 @@ class OLSInfluence(_BaseInfluenceMixin):
         MLEInfluence this uses sigma from original estimate and does
         not require leave one out loop
         """
-        pass
+        return self.resid_studentized_internal

     @cache_readonly
     def resid_studentized_internal(self):
@@ -532,7 +801,8 @@ class OLSInfluence(_BaseInfluenceMixin):
         this uses sigma from original estimate
         does not require leave one out loop
         """
-        pass
+        return self.get_resid_studentized_external(sigma=None)
+        # return self.results.resid / self.sigma_est

     @cache_readonly
     def resid_studentized_external(self):
@@ -542,7 +812,8 @@ class OLSInfluence(_BaseInfluenceMixin):

         requires leave one out loop for observations
         """
-        pass
+        sigma_looo = np.sqrt(self.sigma2_not_obsi)
+        return self.get_resid_studentized_external(sigma=sigma_looo)

     def get_resid_studentized_external(self, sigma=None):
         """calculate studentized residuals
@@ -568,8 +839,15 @@ class OLSInfluence(_BaseInfluenceMixin):
         estimate of the standard deviation of the residuals, and hii is the
         diagonal of the hat_matrix.
         """
-        pass
+        hii = self.hat_matrix_diag
+        if sigma is None:
+            sigma2_est = self.scale
+            # can be replace by different estimators of sigma
+            sigma = np.sqrt(sigma2_est)
+
+        return self.resid / sigma / np.sqrt(1 - hii)

+    # same computation as GLMInfluence
     @cache_readonly
     def cooks_distance(self):
         """
@@ -584,7 +862,18 @@ class OLSInfluence(_BaseInfluenceMixin):
         .. [*] Cook's distance. (n.d.). In Wikipedia. July 2019, from
             https://en.wikipedia.org/wiki/Cook%27s_distance
         """
-        pass
+        hii = self.hat_matrix_diag
+        # Eubank p.93, 94
+        cooks_d2 = self.resid_studentized ** 2 / self.k_vars
+        cooks_d2 *= hii / (1 - hii)
+
+        from scipy import stats
+
+        # alpha = 0.1
+        # print stats.f.isf(1-alpha, n_params, res.df_modelwc)
+        pvals = stats.f.sf(cooks_d2, self.k_vars, self.results.df_resid)
+
+        return cooks_d2, pvals

     @cache_readonly
     def dffits_internal(self):
@@ -593,7 +882,13 @@ class OLSInfluence(_BaseInfluenceMixin):
         based on resid_studentized_internal
         uses original results, no nobs loop
         """
-        pass
+        # TODO: do I want to use different sigma estimate in
+        #      resid_studentized_external
+        # -> move definition of sigma_error to the __init__
+        hii = self.hat_matrix_diag
+        dffits_ = self.resid_studentized_internal * np.sqrt(hii / (1 - hii))
+        dffits_threshold = 2 * np.sqrt(self.k_vars * 1. / self.nobs)
+        return dffits_, dffits_threshold

     @cache_readonly
     def dffits(self):
@@ -616,7 +911,13 @@ class OLSInfluence(_BaseInfluenceMixin):
         ----------
         `Wikipedia <https://en.wikipedia.org/wiki/DFFITS>`_
         """
-        pass
+        # TODO: do I want to use different sigma estimate in
+        #      resid_studentized_external
+        # -> move definition of sigma_error to the __init__
+        hii = self.hat_matrix_diag
+        dffits_ = self.resid_studentized_external * np.sqrt(hii / (1 - hii))
+        dffits_threshold = 2 * np.sqrt(self.k_vars * 1. / self.nobs)
+        return dffits_, dffits_threshold

     @cache_readonly
     def dfbetas(self):
@@ -624,7 +925,10 @@ class OLSInfluence(_BaseInfluenceMixin):

         uses results from leave-one-observation-out loop
         """
-        pass
+        dfbetas = self.results.params - self.params_not_obsi  # [None,:]
+        dfbetas /= np.sqrt(self.sigma2_not_obsi[:, None])
+        dfbetas /= np.sqrt(np.diag(self.results.normalized_cov_params))
+        return dfbetas

     @cache_readonly
     def dfbeta(self):
@@ -632,7 +936,8 @@ class OLSInfluence(_BaseInfluenceMixin):

         uses results from leave-one-observation-out loop
         """
-        pass
+        dfbeta = self.results.params - self.params_not_obsi
+        return dfbeta

     @cache_readonly
     def sigma2_not_obsi(self):
@@ -642,7 +947,7 @@ class OLSInfluence(_BaseInfluenceMixin):

         uses results from leave-one-observation-out loop
         """
-        pass
+        return np.asarray(self._res_looo['mse_resid'])

     @property
     def params_not_obsi(self):
@@ -650,7 +955,7 @@ class OLSInfluence(_BaseInfluenceMixin):

         uses results from leave-one-observation-out loop
         """
-        pass
+        return np.asarray(self._res_looo['params'])

     @property
     def det_cov_params_not_obsi(self):
@@ -658,7 +963,7 @@ class OLSInfluence(_BaseInfluenceMixin):

         uses results from leave-one-observation-out loop
         """
-        pass
+        return np.asarray(self._res_looo['det_cov_params'])

     @cache_readonly
     def cov_ratio(self):
@@ -668,7 +973,10 @@ class OLSInfluence(_BaseInfluenceMixin):
         from leave-one-out estimates.
         requires leave one out loop for observations
         """
-        pass
+        # do not use inplace division / because then we change original
+        cov_ratio = (self.det_cov_params_not_obsi
+                     / np.linalg.det(self.results.cov_params()))
+        return cov_ratio

     @cache_readonly
     def resid_var(self):
@@ -680,7 +988,8 @@ class OLSInfluence(_BaseInfluenceMixin):

         where hii is the diagonal of the hat matrix
         """
-        pass
+        # TODO:check if correct outside of ols
+        return self.scale * (1 - self.hat_matrix_diag)

     @cache_readonly
     def resid_std(self):
@@ -690,7 +999,7 @@ class OLSInfluence(_BaseInfluenceMixin):
         --------
         resid_var
         """
-        pass
+        return np.sqrt(self.resid_var)

     def _ols_xnoti(self, drop_idx, endog_idx='endog', store=True):
         """regression results from LOVO auxiliary regression with cache
@@ -715,7 +1024,34 @@ class OLSInfluence(_BaseInfluenceMixin):
         this needs more thought, memory versus speed
         not yet used in any other parts, not sufficiently tested
         """
-        pass
+        # look in the store first, compute the auxiliary regression on a miss
+        if endog_idx == 'endog':
+            stored = self.aux_regression_endog
+            if drop_idx in stored:
+                return stored[drop_idx]
+            x_i = self.results.model.endog
+
+        else:
+            # nested dictionary, one sub-dict per auxiliary exog column
+            stored = self.aux_regression_exog.get(endog_idx, {})
+            if drop_idx in stored:
+                return stored[drop_idx]
+            if store:
+                self.aux_regression_exog[endog_idx] = stored
+
+            x_i = self.exog[:, endog_idx]
+
+        k_vars = self.exog.shape[1]
+        mask = np.arange(k_vars) != drop_idx
+        x_noti = self.exog[:, mask]
+        res = OLS(x_i, x_noti).fit()
+        if store:
+            stored[drop_idx] = res
+
+        return res

     def _get_drop_vari(self, attributes):
         """
@@ -732,7 +1068,19 @@ class OLSInfluence(_BaseInfluenceMixin):

         not yet used
         """
-        pass
+        from statsmodels.sandbox.tools.cross_val import LeaveOneOut
+
+        endog = self.results.model.endog
+        exog = self.exog
+
+        cv_iter = LeaveOneOut(self.k_vars)
+        res_loo = defaultdict(list)
+        for inidx, outidx in cv_iter:
+            res_i = self.model_class(endog, exog[:, inidx]).fit()
+            for att in attributes:
+                res_loo[att].append(getattr(res_i, att))
+
+        return res_loo

     @cache_readonly
     def _res_looo(self):
@@ -745,7 +1093,27 @@ class OLSInfluence(_BaseInfluenceMixin):

         this uses a nobs loop, only attributes of the OLS instance are stored.
         """
-        pass
+        from statsmodels.sandbox.tools.cross_val import LeaveOneOut
+
+        def get_det_cov_params(res):
+            return np.linalg.det(res.cov_params())
+
+        endog = self.results.model.endog
+        exog = self.results.model.exog
+
+        params = np.zeros(exog.shape, dtype=float)
+        mse_resid = np.zeros(endog.shape, dtype=float)
+        det_cov_params = np.zeros(endog.shape, dtype=float)
+
+        cv_iter = LeaveOneOut(self.nobs)
+        for inidx, outidx in cv_iter:
+            res_i = self.model_class(endog[inidx], exog[inidx]).fit()
+            params[outidx] = res_i.params
+            mse_resid[outidx] = res_i.mse_resid
+            det_cov_params[outidx] = get_det_cov_params(res_i)
+
+        return dict(params=params, mse_resid=mse_resid,
+                    det_cov_params=det_cov_params)

     def summary_frame(self):
         """
@@ -773,9 +1141,30 @@ class OLSInfluence(_BaseInfluenceMixin):
         * student_resid : Externally Studentized residuals defined in
           `Influence.resid_studentized_external`
         """
-        pass
-
-    def summary_table(self, float_fmt='%6.3f'):
+        from pandas import DataFrame
+
+        # row and column labels
+        data = self.results.model.data
+        row_labels = data.row_labels
+        beta_labels = ['dfb_' + i for i in data.xnames]
+
+        # grab the results
+        summary_data = DataFrame(dict(
+            cooks_d=self.cooks_distance[0],
+            standard_resid=self.resid_studentized_internal,
+            hat_diag=self.hat_matrix_diag,
+            dffits_internal=self.dffits_internal[0],
+            student_resid=self.resid_studentized_external,
+            dffits=self.dffits[0],
+        ),
+            index=row_labels)
+        # NOTE: if we do not give columns, order of above will be arbitrary
+        dfbeta = DataFrame(self.dfbetas, columns=beta_labels,
+                           index=row_labels)
+
+        return dfbeta.join(summary_data)
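
# Sketch: the OLS-specific influence measures on simulated data; the same
# object is returned by ols_results.get_influence().
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(6)
exog = sm.add_constant(rng.normal(size=(50, 2)))
endog = exog @ [1.0, 2.0, -1.0] + rng.normal(size=50)
infl = OLSInfluence(sm.OLS(endog, exog).fit())
frame = infl.summary_frame()              # dfbetas, Cook's d, dffits, ...
student_resid = infl.resid_studentized_external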
+
+    def summary_table(self, float_fmt="%6.3f"):
         """create a summary table with all influence and outlier measures

         This does currently not distinguish between statistics that can be
@@ -791,7 +1180,42 @@ class OLSInfluence(_BaseInfluenceMixin):
         -----
         This also attaches table_data to the instance.
         """
-        pass
+        # print self.dfbetas
+
+        #        table_raw = [ np.arange(self.nobs),
+        #                      self.endog,
+        #                      self.fittedvalues,
+        #                      self.cooks_distance(),
+        #                      self.resid_studentized_internal,
+        #                      self.hat_matrix_diag,
+        #                      self.dffits_internal,
+        #                      self.resid_studentized_external,
+        #                      self.dffits,
+        #                      self.dfbetas
+        #                      ]
+        table_raw = [('obs', np.arange(self.nobs)),
+                     ('endog', self.endog),
+                     ('fitted\nvalue', self.results.fittedvalues),
+                     ("Cook's\nd", self.cooks_distance[0]),
+                     ("student.\nresidual", self.resid_studentized_internal),
+                     ('hat diag', self.hat_matrix_diag),
+                     ('dffits \ninternal', self.dffits_internal[0]),
+                     ("ext.stud.\nresidual", self.resid_studentized_external),
+                     ('dffits', self.dffits[0])
+                     ]
+        colnames, data = lzip(*table_raw)  # unzip
+        data = np.column_stack(data)
+        self.table_data = data
+        from copy import deepcopy
+
+        from statsmodels.iolib.table import SimpleTable, default_html_fmt
+        from statsmodels.iolib.tableformatting import fmt_base
+        fmt = deepcopy(fmt_base)
+        fmt_html = deepcopy(default_html_fmt)
+        fmt['data_fmts'] = ["%4d"] + [float_fmt] * (data.shape[1] - 1)
+        # fmt_html['data_fmts'] = fmt['data_fmts']
+        return SimpleTable(data, headers=colnames, txt_fmt=fmt,
+                           html_fmt=fmt_html)


 def summary_table(res, alpha=0.05):
@@ -812,7 +1236,67 @@ def summary_table(res, alpha=0.05):
     ss2 : list[str]
        column_names for table (Note: rows of table are observations)
     """
-    pass
+
+    from scipy import stats
+
+    from statsmodels.sandbox.regression.predstd import wls_prediction_std
+
+    infl = OLSInfluence(res)
+
+    # standard error for predicted mean
+    # Note: using hat_matrix only works for fitted values
+    predict_mean_se = np.sqrt(infl.hat_matrix_diag * res.mse_resid)
+
+    tppf = stats.t.isf(alpha / 2., res.df_resid)
+    predict_mean_ci = np.column_stack([
+        res.fittedvalues - tppf * predict_mean_se,
+        res.fittedvalues + tppf * predict_mean_se])
+
+    # standard error for predicted observation
+    tmp = wls_prediction_std(res, alpha=alpha)
+    predict_se, predict_ci_low, predict_ci_upp = tmp
+
+    predict_ci = np.column_stack((predict_ci_low, predict_ci_upp))
+
+    # standard deviation of residual
+    resid_se = np.sqrt(res.mse_resid * (1 - infl.hat_matrix_diag))
+
+    table_sm = np.column_stack([
+        np.arange(res.nobs) + 1,
+        res.model.endog,
+        res.fittedvalues,
+        predict_mean_se,
+        predict_mean_ci[:, 0],
+        predict_mean_ci[:, 1],
+        predict_ci[:, 0],
+        predict_ci[:, 1],
+        res.resid,
+        resid_se,
+        infl.resid_studentized_internal,
+        infl.cooks_distance[0]
+    ])
+
+    # colnames, data = lzip(*table_raw) #unzip
+    data = table_sm
+    ss2 = ['Obs', 'Dep Var\nPopulation', 'Predicted\nValue',
+           'Std Error\nMean Predict', 'Mean ci\n95% low', 'Mean ci\n95% upp',
+           'Predict ci\n95% low', 'Predict ci\n95% upp', 'Residual',
+           'Std Error\nResidual', 'Student\nResidual', "Cook's\nD"]
+    colnames = ss2
+    # self.table_data = data
+    # data = np.column_stack(data)
+    from copy import deepcopy
+
+    from statsmodels.iolib.table import SimpleTable, default_html_fmt
+    from statsmodels.iolib.tableformatting import fmt_base
+    fmt = deepcopy(fmt_base)
+    fmt_html = deepcopy(default_html_fmt)
+    fmt['data_fmts'] = ["%4d"] + ["%6.3f"] * (data.shape[1] - 1)
+    # fmt_html['data_fmts'] = fmt['data_fmts']
+    st = SimpleTable(data, headers=colnames, txt_fmt=fmt,
+                     html_fmt=fmt_html)
+
+    return st, data, ss2
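
# Sketch: the module-level table of per-observation prediction and influence
# statistics for a simple simulated regression.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import summary_table

rng = np.random.default_rng(7)
x = rng.normal(size=30)
exog = sm.add_constant(x)
endog = 1.0 + 2.0 * x + rng.normal(size=30)
st, table_data, col_names = summary_table(sm.OLS(endog, exog).fit(),
                                           alpha=0.05)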


 class GLMInfluence(MLEInfluence):
@@ -869,7 +1353,10 @@ class GLMInfluence(MLEInfluence):
         argument to GLMInfluence or computes it using the results method
         `get_hat_matrix`.
         """
-        pass
+        if hasattr(self, '_hat_matrix_diag'):
+            return self._hat_matrix_diag
+        else:
+            return self.results.get_hat_matrix()

     @cache_readonly
     def d_params(self):
@@ -880,8 +1367,12 @@ class GLMInfluence(MLEInfluence):
         This uses one-step approximation of the parameter change to deleting
         one observation.
         """
-        pass

+        beta_i = np.linalg.pinv(self.exog) * self.resid_studentized
+        beta_i /= np.sqrt(1 - self.hat_matrix_diag)
+        return beta_i.T
+
+    # same computation as OLS
     @cache_readonly
     def resid_studentized(self):
         """
@@ -895,8 +1386,10 @@ class GLMInfluence(MLEInfluence):
         pearson residuals by default, and
         hii is the diagonal of the hat matrix.
         """
-        pass
+        # redundant with scaled resid_pearson, keep for docstring for now
+        return super().resid_studentized

+    # same computation as OLS
     @cache_readonly
     def cooks_distance(self):
         """Cook's distance
@@ -912,7 +1405,18 @@ class GLMInfluence(MLEInfluence):
         It includes p-values based on the F-distribution which are only
         approximate outside of linear Gaussian models.
         """
-        pass
+        hii = self.hat_matrix_diag
+        # Eubank p.93, 94
+        cooks_d2 = self.resid_studentized ** 2 / self.k_vars
+        cooks_d2 *= hii / (1 - hii)
+
+        from scipy import stats
+
+        # alpha = 0.1
+        # print stats.f.isf(1-alpha, n_params, res.df_modelwc)
+        pvals = stats.f.sf(cooks_d2, self.k_vars, self.results.df_resid)
+
+        return cooks_d2, pvals
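
# Sketch: GLMInfluence via GLMResults.get_influence() for a simulated
# Poisson regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
exog = sm.add_constant(rng.normal(size=(150, 2)))
mu = np.exp(exog @ [0.2, 0.5, -0.3])
endog = rng.poisson(mu)
infl = sm.GLM(endog, exog, family=sm.families.Poisson()).fit().get_influence()
cooks_d, pvals = infl.cooks_distance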

     @property
     def d_linpred(self):
@@ -922,7 +1426,10 @@ class GLMInfluence(MLEInfluence):
         This uses one-step approximation of the parameter change to deleting
         one observation ``d_params``.
         """
-        pass
+        # TODO: This will need adjustment for extra params in Poisson
+        # use original model exog not transformed influence exog
+        exog = self.results.model.exog
+        return (exog * self.d_params).sum(1)

     @property
     def d_linpred_scaled(self):
@@ -933,19 +1440,29 @@ class GLMInfluence(MLEInfluence):
         one observation ``d_params``, and divides by the standard errors
         for linpred provided by results.get_prediction.
         """
-        pass
+        # Note: this and the previous methods are for the response
+        # and not for a weighted response, i.e. not the self.exog, self.endog
+        # this will be relevant for WLS comparing fitted endog versus wendog
+        return self.d_linpred / self._get_prediction.linpred.se

     @property
     def _fittedvalues_one(self):
         """experimental code
         """
-        pass
+        warnings.warn('this ignores offset and exposure', UserWarning)
+        # TODO: we need to handle offset, exposure and weights
+        # use original model exog not transformed influence exog
+        exog = self.results.model.exog
+        fitted = np.array([self.results.model.predict(pi, exog[i])
+                           for i, pi in enumerate(self.params_one)])
+        return fitted.squeeze()

     @property
     def _diff_fittedvalues_one(self):
         """experimental code
         """
-        pass
+        # in discrete we cannot reuse results.fittedvalues
+        return self.results.predict() - self._fittedvalues_one

     @cache_readonly
     def _res_looo(self):
@@ -963,4 +1480,55 @@ class GLMInfluence(MLEInfluence):
         Warning: This will need refactoring and API changes to be able to
         add options.
         """
-        pass
+        from statsmodels.sandbox.tools.cross_val import LeaveOneOut
+        def get_det_cov_params(res):
+            return np.linalg.det(res.cov_params())
+
+        endog = self.results.model.endog
+        exog = self.results.model.exog
+
+        init_kwds = self.results.model._get_init_kwds()
+        # We need to drop obs also from extra arrays
+        freq_weights = init_kwds.pop('freq_weights')
+        var_weights = init_kwds.pop('var_weights')
+        offset = offset_ = init_kwds.pop('offset')
+        exposure = exposure_ = init_kwds.pop('exposure')
+        n_trials = init_kwds.pop('n_trials', None)
+        # family Binomial creates `n` i.e. `n_trials`
+        # we need to reset it
+        # TODO: figure out how to do this properly
+        if hasattr(init_kwds['family'], 'initialize'):
+            # assume we have Binomial
+            is_binomial = True
+        else:
+            is_binomial = False
+
+        params = np.zeros(exog.shape, dtype=float)
+        scale = np.zeros(endog.shape, dtype=float)
+        det_cov_params = np.zeros(endog.shape, dtype=float)
+
+        cv_iter = LeaveOneOut(self.nobs)
+        for inidx, outidx in cv_iter:
+            if offset is not None:
+                offset_ = offset[inidx]
+            if exposure is not None:
+                exposure_ = exposure[inidx]
+            if n_trials is not None:
+                init_kwds['n_trials'] = n_trials[inidx]
+
+            mod_i = self.model_class(endog[inidx], exog[inidx],
+                                     offset=offset_,
+                                     exposure=exposure_,
+                                     freq_weights=freq_weights[inidx],
+                                     var_weights=var_weights[inidx],
+                                     **init_kwds)
+            if is_binomial:
+                mod_i.family.n = init_kwds['n_trials']
+            res_i = mod_i.fit(start_params=self.results.params,
+                              method='newton')
+            params[outidx] = res_i.params.copy()
+            scale[outidx] = res_i.scale
+            det_cov_params[outidx] = get_det_cov_params(res_i)
+
+        return dict(params=params, scale=scale, mse_resid=scale,
+                    # alias for now
+                    det_cov_params=det_cov_params)
diff --git a/statsmodels/stats/power.py b/statsmodels/stats/power.py
index 7bb5715bc..c50d832ab 100644
--- a/statsmodels/stats/power.py
+++ b/statsmodels/stats/power.py
@@ -1,3 +1,5 @@
+# -*- coding: utf-8 -*-
+#pylint: disable-msg=W0142
 """Statistical power, solving for nobs, ... - trial version

 Created on Sat Jan 12 21:48:06 2013
@@ -29,18 +31,71 @@ refactoring

 """
 import warnings
+
 import numpy as np
 from scipy import stats, optimize, special
 from statsmodels.tools.rootfinding import brentq_expanding


-def ttest_power(effect_size, nobs, alpha, df=None, alternative='two-sided'):
-    """Calculate power of a ttest
-    """
-    pass
+def nct_cdf(x, df, nc):
+    return special.nctdtr(df, nc, x)
+
+
+def nct_sf(x, df, nc):
+    return 1 - special.nctdtr(df, nc, x)
+
+
+def ncf_cdf(x, dfn, dfd, nc):
+    return special.ncfdtr(dfn, dfd, nc, x)
+

+def ncf_sf(x, dfn, dfd, nc):
+    return 1 - special.ncfdtr(dfn, dfd, nc, x)

-def normal_power(effect_size, nobs, alpha, alternative='two-sided', sigma=1.0):
+
+def ncf_ppf(q, dfn, dfd, nc):
+    return special.ncfdtri(dfn, dfd, nc, q)
+
+
+def ttest_power(effect_size, nobs, alpha, df=None, alternative='two-sided'):
+    '''Calculate power of a ttest
+    '''
+    d = effect_size
+    if df is None:
+        df = nobs - 1
+
+    if alternative in ['two-sided', '2s']:
+        alpha_ = alpha / 2.  #no inplace changes, does not work
+    elif alternative in ['smaller', 'larger']:
+        alpha_ = alpha
+    else:
+        raise ValueError("alternative has to be 'two-sided', 'larger' " +
+                         "or 'smaller'")
+
+    pow_ = 0
+    if alternative in ['two-sided', '2s', 'larger']:
+        crit_upp = stats.t.isf(alpha_, df)
+        #print crit_upp, df, d*np.sqrt(nobs)
+        # use private methods, generic methods return nan with negative d
+        if np.any(np.isnan(crit_upp)):
+            # avoid endless loop, https://github.com/scipy/scipy/issues/2667
+            pow_ = np.nan
+        else:
+            # pow_ = stats.nct._sf(crit_upp, df, d*np.sqrt(nobs))
+            # use scipy.special
+            pow_ = nct_sf(crit_upp, df, d*np.sqrt(nobs))
+    if alternative in ['two-sided', '2s', 'smaller']:
+        crit_low = stats.t.ppf(alpha_, df)
+        #print crit_low, df, d*np.sqrt(nobs)
+        if np.any(np.isnan(crit_low)):
+            pow_ = np.nan
+        else:
+            # pow_ += stats.nct._cdf(crit_low, df, d*np.sqrt(nobs))
+            pow_ += nct_cdf(crit_low, df, d*np.sqrt(nobs))
+    return pow_
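
# Sketch: power of a one-sample t-test with effect size 0.5 and 30
# observations (values are illustrative).
from statsmodels.stats.power import ttest_power

pow_ = ttest_power(0.5, nobs=30, alpha=0.05, alternative='two-sided')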
+
+
+def normal_power(effect_size, nobs, alpha, alternative='two-sided', sigma=1.):
     """Calculate power of a normal distributed test statistic

     This is an generalization of `normal_power` when variance under Null and
@@ -61,11 +116,29 @@ def normal_power(effect_size, nobs, alpha, alternative='two-sided', sigma=1.0):
         two-sided (default) or one sided test. The one-sided test can be
         either 'larger', 'smaller'.
     """
-    pass

+    d = effect_size

-def normal_power_het(diff, nobs, alpha, std_null=1.0, std_alternative=None,
-    alternative='two-sided'):
+    if alternative in ['two-sided', '2s']:
+        alpha_ = alpha / 2.  #no inplace changes, does not work
+    elif alternative in ['smaller', 'larger']:
+        alpha_ = alpha
+    else:
+        raise ValueError("alternative has to be 'two-sided', 'larger' " +
+                         "or 'smaller'")
+
+    pow_ = 0
+    if alternative in ['two-sided', '2s', 'larger']:
+        crit = stats.norm.isf(alpha_)
+        pow_ = stats.norm.sf(crit - d*np.sqrt(nobs)/sigma)
+    if alternative in ['two-sided', '2s', 'smaller']:
+        crit = stats.norm.ppf(alpha_)
+        pow_ += stats.norm.cdf(crit - d*np.sqrt(nobs)/sigma)
+    return pow_
+
+
+def normal_power_het(diff, nobs, alpha, std_null=1., std_alternative=None,
+                 alternative='two-sided'):
     """Calculate power of a normal distributed test statistic

     This is an generalization of `normal_power` when variance under Null and
@@ -95,11 +168,34 @@ def normal_power_het(diff, nobs, alpha, std_null=1.0, std_alternative=None,
     -------
     power : float
     """
-    pass
-

-def normal_sample_size_one_tail(diff, power, alpha, std_null=1.0,
-    std_alternative=None):
+    d = diff
+    if std_alternative is None:
+        std_alternative = std_null
+
+    if alternative in ['two-sided', '2s']:
+        alpha_ = alpha / 2.  #no inplace changes, does not work
+    elif alternative in ['smaller', 'larger']:
+        alpha_ = alpha
+    else:
+        raise ValueError("alternative has to be 'two-sided', 'larger' " +
+                         "or 'smaller'")
+
+    std_ratio = std_null / std_alternative
+    pow_ = 0
+    if alternative in ['two-sided', '2s', 'larger']:
+        crit = stats.norm.isf(alpha_)
+        pow_ = stats.norm.sf(crit * std_ratio -
+                             d*np.sqrt(nobs) / std_alternative)
+    if alternative in ['two-sided', '2s', 'smaller']:
+        crit = stats.norm.ppf(alpha_)
+        pow_ += stats.norm.cdf(crit * std_ratio -
+                               d*np.sqrt(nobs) / std_alternative)
+    return pow_
+
+
+def normal_sample_size_one_tail(diff, power, alpha, std_null=1.,
+                                std_alternative=None):
     """explicit sample size computation if only one tail is relevant

     The sample size is based on the power in one tail assuming that the
@@ -138,21 +234,33 @@ def normal_sample_size_one_tail(diff, power, alpha, std_null=1.0,
         std_alternative is equal to std_null.

     """
-    pass
+
+    if std_alternative is None:
+        std_alternative = std_null
+
+    crit_power = stats.norm.isf(power)
+    crit = stats.norm.isf(alpha)
+    n1 = (np.maximum(crit * std_null - crit_power * std_alternative, 0)
+          / diff)**2
+    return n1


 def ftest_anova_power(effect_size, nobs, alpha, k_groups=2, df=None):
-    """power for ftest for one way anova with k equal sized groups
+    '''power for ftest for one way anova with k equal sized groups

     nobs total sample size, sum over all groups

     should be general nobs observations, k_groups restrictions ???
-    """
-    pass
+    '''
+    df_num = k_groups - 1
+    df_denom = nobs - k_groups
+    crit = stats.f.isf(alpha, df_num, df_denom)
+    pow_ = ncf_sf(crit, df_num, df_denom, effect_size**2 * nobs)
+    return pow_


 def ftest_power(effect_size, df2, df1, alpha, ncc=1):
-    """Calculate the power of a F-test.
+    '''Calculate the power of a F-test.

     Parameters
     ----------
@@ -194,12 +302,18 @@ def ftest_power(effect_size, df2, df1, alpha, ncc=1):
     ftest_power with ncc=0 should also be correct for f_test in regression
     models, with df_num (df1) as number of constraints and df_denom (df2) as
     df_resid.
-    """
-    pass
+    '''
+    df_num, df_denom = df1, df2
+    nc = effect_size**2 * (df_denom + df_num + ncc)
+    crit = stats.f.isf(alpha, df_num, df_denom)
+    # pow_ = stats.ncf.sf(crit, df_num, df_denom, nc)
+    # use scipy.special for ncf
+    pow_ = ncf_sf(crit, df_num, df_denom, nc)
+    return pow_ #, crit, nc
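
# Sketch: power of an F-test with Cohen's f = 0.25, 3 numerator and 96
# denominator degrees of freedom (values are illustrative).
from statsmodels.stats.power import ftest_power

pow_ = ftest_power(0.25, df2=96, df1=3, alpha=0.05, ncc=1)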


 def ftest_power_f2(effect_size, df_num, df_denom, alpha, ncc=1):
-    """Calculate the power of a F-test.
+    '''Calculate the power of a F-test.

     Based on Cohen's `f^2` effect size.

@@ -249,33 +363,56 @@ def ftest_power_f2(effect_size, df_num, df_denom, alpha, ncc=1):
     ftest_power with ncc=0 should also be correct for f_test in regression
     models, with df_num (df1) as number of constraints and df_denom (df2) as
     df_resid.
-    """
-    pass
+    '''
+
+    nc = effect_size * (df_denom + df_num + ncc)
+    crit = stats.f.isf(alpha, df_num, df_denom)
+    # pow_ = stats.ncf.sf(crit, df_num, df_denom, nc)
+    # use scipy.special for ncf
+    pow_ = ncf_sf(crit, df_num, df_denom, nc)
+    return pow_


+#class based implementation
+#--------------------------
+
 class Power:
-    """Statistical Power calculations, Base Class
+    '''Statistical Power calculations, Base Class

     so far this could all be class methods
-    """
+    '''

     def __init__(self, **kwds):
         self.__dict__.update(kwds)
-        self.start_ttp = dict(effect_size=0.01, nobs=10.0, alpha=0.15,
-            power=0.6, nobs1=10.0, ratio=1, df_num=10, df_denom=3)
+        # used only for instance level start values
+        self.start_ttp = dict(effect_size=0.01, nobs=10., alpha=0.15,
+                              power=0.6, nobs1=10., ratio=1,
+                              df_num=10, df_denom=3   # for FTestPower
+                              )
+        # TODO: nobs1 and ratio are for ttest_ind,
+        #      need start_ttp for each test/class separately,
+        # possible rootfinding problem for effect_size, starting small seems to
+        # work
         from collections import defaultdict
         self.start_bqexp = defaultdict(dict)
         for key in ['nobs', 'nobs1', 'df_num', 'df_denom']:
-            self.start_bqexp[key] = dict(low=2.0, start_upp=50.0)
+            self.start_bqexp[key] = dict(low=2., start_upp=50.)
         for key in ['df_denom']:
-            self.start_bqexp[key] = dict(low=1.0, start_upp=50.0)
+            self.start_bqexp[key] = dict(low=1., start_upp=50.)
         for key in ['ratio']:
-            self.start_bqexp[key] = dict(low=1e-08, start_upp=2)
+            self.start_bqexp[key] = dict(low=1e-8, start_upp=2)
         for key in ['alpha']:
             self.start_bqexp[key] = dict(low=1e-12, upp=1 - 1e-12)

+    def power(self, *args, **kwds):
+        raise NotImplementedError
+
+    def _power_identity(self, *args, **kwds):
+        power_ = kwds.pop('power')
+        return self.power(*args, **kwds) - power_
+
     def solve_power(self, **kwds):
-        """solve for any one of the parameters of a t-test
+        '''solve for any one of the parameters of a t-test

         for t-test the keywords are:
             effect_size, nobs, alpha, power
@@ -292,11 +429,108 @@ class Power:
             three solvers that have been tried.


-        """
-        pass
-
-    def plot_power(self, dep_var='nobs', nobs=None, effect_size=None, alpha
-        =0.05, ax=None, title=None, plt_kwds=None, **kwds):
+        '''
+        #TODO: maybe use explicit kwds,
+        #    nicer but requires inspect? and not generic across tests
+        #    I'm duplicating this in the subclass to get informative docstring
+        key = [k for k,v in kwds.items() if v is None]
+        #print kwds, key
+        if len(key) != 1:
+            raise ValueError('need exactly one keyword that is None')
+        key = key[0]
+
+        if key == 'power':
+            del kwds['power']
+            return self.power(**kwds)
+
+        if kwds['effect_size'] == 0:
+            import warnings
+            from statsmodels.tools.sm_exceptions import HypothesisTestWarning
+            warnings.warn('Warning: Effect size of 0 detected', HypothesisTestWarning)
+            if key == 'power':
+                return kwds['alpha']
+            if key == 'alpha':
+                return kwds['power']
+            else:
+                raise ValueError('Cannot detect an effect-size of 0. Try changing your effect-size.')
+
+
+        self._counter = 0
+
+        def func(x):
+            kwds[key] = x
+            fval = self._power_identity(**kwds)
+            self._counter += 1
+            #print self._counter,
+            if self._counter > 500:
+                raise RuntimeError('possible endless loop (500 NaNs)')
+            if np.isnan(fval):
+                return np.inf
+            else:
+                return fval
+
+        #TODO: I'm using the following so I get a warning when start_ttp is not defined
+        try:
+            start_value = self.start_ttp[key]
+        except KeyError:
+            start_value = 0.9
+            import warnings
+            from statsmodels.tools.sm_exceptions import ValueWarning
+            warnings.warn('Warning: using default start_value for {0}'.format(key), ValueWarning)
+
+        fit_kwds = self.start_bqexp[key]
+        fit_res = []
+        #print vars()
+        try:
+            val, res = brentq_expanding(func, full_output=True, **fit_kwds)
+            failed = False
+            fit_res.append(res)
+        except ValueError:
+            failed = True
+            fit_res.append(None)
+
+        success = None
+        if (not failed) and res.converged:
+            success = 1
+        else:
+            # try backup
+            # TODO: check more cases to make this robust
+            if not np.isnan(start_value):
+                val, infodict, ier, msg = optimize.fsolve(func, start_value,
+                                                          full_output=True) #scalar
+                #val = optimize.newton(func, start_value) #scalar
+                fval = infodict['fvec']
+                fit_res.append(infodict)
+            else:
+                ier = -1
+                fval = 1
+                fit_res.append([None])
+
+            if ier == 1 and np.abs(fval) < 1e-4 :
+                success = 1
+            else:
+                #print infodict
+                if key in ['alpha', 'power', 'effect_size']:
+                    val, r = optimize.brentq(func, 1e-8, 1-1e-8,
+                                             full_output=True) #scalar
+                    success = 1 if r.converged else 0
+                    fit_res.append(r)
+                else:
+                    success = 0
+
+        if not success == 1:
+            import warnings
+            from statsmodels.tools.sm_exceptions import (ConvergenceWarning,
+                convergence_doc)
+            warnings.warn(convergence_doc, ConvergenceWarning)
+
+        #attach fit_res, for reading only, should be needed only for debugging
+        fit_res.insert(0, success)
+        self.cache_fit_res = fit_res
+        return val
+
+    def plot_power(self, dep_var='nobs', nobs=None, effect_size=None,
+                   alpha=0.05, ax=None, title=None, plt_kwds=None, **kwds):
         """
         Plot power with number of observations or effect size on x-axis

@@ -343,17 +577,58 @@ class Power:

         TODO: maybe add line variable, if we want more than nobs and effectsize
         """
-        pass
+        #if pwr_kwds is None:
+        #    pwr_kwds = {}
+        from statsmodels.graphics import utils
+        from statsmodels.graphics.plottools import rainbow
+        fig, ax = utils.create_mpl_ax(ax)
+        import matplotlib.pyplot as plt
+        colormap = plt.cm.Dark2 #pylint: disable-msg=E1101
+        plt_alpha = 1 #0.75
+        lw = 2
+        if dep_var == 'nobs':
+            colors = rainbow(len(effect_size))
+            colors = [colormap(i) for i in np.linspace(0, 0.9, len(effect_size))]
+            for ii, es in enumerate(effect_size):
+                power = self.power(es, nobs, alpha, **kwds)
+                ax.plot(nobs, power, lw=lw, alpha=plt_alpha,
+                        color=colors[ii], label='es=%4.2F' % es)
+                xlabel = 'Number of Observations'
+        elif dep_var in ['effect size', 'effect_size', 'es']:
+            colors = rainbow(len(nobs))
+            colors = [colormap(i) for i in np.linspace(0, 0.9, len(nobs))]
+            for ii, n in enumerate(nobs):
+                power = self.power(effect_size, n, alpha, **kwds)
+                ax.plot(effect_size, power, lw=lw, alpha=plt_alpha,
+                        color=colors[ii], label='N=%4.2F' % n)
+                xlabel = 'Effect Size'
+        elif dep_var in ['alpha']:
+            # experimental nobs as defining separate lines
+            colors = rainbow(len(nobs))
+
+            for ii, n in enumerate(nobs):
+                power = self.power(effect_size, n, alpha, **kwds)
+                ax.plot(alpha, power, lw=lw, alpha=plt_alpha,
+                        color=colors[ii], label='N=%4.2F' % n)
+                xlabel = 'alpha'
+        else:
+            raise ValueError('depvar not implemented')
+
+        if title is None:
+            title = 'Power of Test'
+        ax.set_xlabel(xlabel)
+        ax.set_title(title)
+        ax.legend(loc='lower right')
+        return fig
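A minimal usage sketch for the plotting helper above (annotation, not part of the patch; the grids of nobs and effect_size are illustrative):

import numpy as np
from statsmodels.stats.power import TTestIndPower

# power curves versus sample size, one line per effect size
fig = TTestIndPower().plot_power(dep_var='nobs',
                                 nobs=np.arange(5, 150),
                                 effect_size=np.array([0.2, 0.5, 0.8]),
                                 alpha=0.05)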


 class TTestPower(Power):
-    """Statistical Power calculations for one sample or paired sample t-test
+    '''Statistical Power calculations for one sample or paired sample t-test

-    """
+    '''

-    def power(self, effect_size, nobs, alpha, df=None, alternative='two-sided'
-        ):
-        """Calculate the power of a t-test for one sample or paired samples.
+    def power(self, effect_size, nobs, alpha, df=None, alternative='two-sided'):
+        '''Calculate the power of a t-test for one sample or paired samples.

         Parameters
         ----------
@@ -381,12 +656,16 @@ class TTestPower(Power):
             type II error. Power is the probability that the test correctly
             rejects the Null Hypothesis if the Alternative Hypothesis is true.

-       """
-        pass
+       '''
+        # for debugging
+        #print 'calling ttest power with', (effect_size, nobs, alpha, df, alternative)
+        return ttest_power(effect_size, nobs, alpha, df=df,
+                           alternative=alternative)

-    def solve_power(self, effect_size=None, nobs=None, alpha=None, power=
-        None, alternative='two-sided'):
-        """solve for any one parameter of the power of a one sample t-test
+    #method is only added to have explicit keywords and docstring
+    def solve_power(self, effect_size=None, nobs=None, alpha=None, power=None,
+                    alternative='two-sided'):
+        '''solve for any one parameter of the power of a one sample t-test

         for the one sample t-test the keywords are:
             effect_size, nobs, alpha, power
@@ -440,20 +719,26 @@ class TTestPower(Power):
         ``brentq`` with fixed bounds is used. However, there can still be cases
         where this fails.

-        """
-        pass
-
+        '''
+        # for debugging
+        #print 'calling ttest solve with', (effect_size, nobs, alpha, power, alternative)
+        return super(TTestPower, self).solve_power(effect_size=effect_size,
+                                                      nobs=nobs,
+                                                      alpha=alpha,
+                                                      power=power,
+                                                      alternative=alternative)

 class TTestIndPower(Power):
-    """Statistical Power calculations for t-test for two independent sample
+    '''Statistical Power calculations for t-test for two independent samples

     currently only uses pooled variance

-    """
+    '''
+

     def power(self, effect_size, nobs1, alpha, ratio=1, df=None,
-        alternative='two-sided'):
-        """Calculate the power of a t-test for two independent sample
+              alternative='two-sided'):
+        '''Calculate the power of a t-test for two independent samples

         Parameters
         ----------
@@ -487,12 +772,21 @@ class TTestIndPower(Power):
             type II error. Power is the probability that the test correctly
             rejects the Null Hypothesis if the Alternative Hypothesis is true.

-        """
-        pass
+        '''
+
+        nobs2 = nobs1*ratio
+        #pooled variance
+        if df is None:
+            df = (nobs1 - 1 + nobs2 - 1)
+
+        nobs = 1./ (1. / nobs1 + 1. / nobs2)
+        #print 'calling ttest power with', (effect_size, nobs, alpha, df, alternative)
+        return ttest_power(effect_size, nobs, alpha, df=df, alternative=alternative)

-    def solve_power(self, effect_size=None, nobs1=None, alpha=None, power=
-        None, ratio=1.0, alternative='two-sided'):
-        """solve for any one parameter of the power of a two sample t-test
+    #method is only added to have explicit keywords and docstring
+    def solve_power(self, effect_size=None, nobs1=None, alpha=None, power=None,
+                    ratio=1., alternative='two-sided'):
+        '''solve for any one parameter of the power of a two sample t-test

         for t-test the keywords are:
             effect_size, nobs1, alpha, power, ratio
@@ -541,24 +835,28 @@ class TTestIndPower(Power):
         ``brentq`` with fixed bounds is used. However, there can still be cases
         where this fails.

-        """
-        pass
-
+        '''
+        return super(TTestIndPower, self).solve_power(effect_size=effect_size,
+                                                      nobs1=nobs1,
+                                                      alpha=alpha,
+                                                      power=power,
+                                                      ratio=ratio,
+                                                      alternative=alternative)
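A usage sketch for the two-sample solver defined above (annotation, not part of the patch; the inputs are the usual textbook values):

from statsmodels.stats.power import TTestIndPower

# size of the first group needed to detect d = 0.5 with 80% power;
# nobs2 = nobs1 * ratio, so ratio=1.0 gives equal group sizes
n1 = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                 ratio=1.0, alternative='two-sided')
# roughly 64 observations per group for these inputs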

 class NormalIndPower(Power):
-    """Statistical Power calculations for z-test for two independent samples.
+    '''Statistical Power calculations for z-test for two independent samples.

     currently only uses pooled variance

-    """
+    '''

     def __init__(self, ddof=0, **kwds):
         self.ddof = ddof
         super(NormalIndPower, self).__init__(**kwds)

-    def power(self, effect_size, nobs1, alpha, ratio=1, alternative='two-sided'
-        ):
-        """Calculate the power of a z-test for two independent sample
+    def power(self, effect_size, nobs1, alpha, ratio=1,
+              alternative='two-sided'):
+        '''Calculate the power of a z-test for two independent samples

         Parameters
         ----------
@@ -589,12 +887,23 @@ class NormalIndPower(Power):
             type II error. Power is the probability that the test correctly
             rejects the Null Hypothesis if the Alternative Hypothesis is true.

-        """
-        pass
+        '''

-    def solve_power(self, effect_size=None, nobs1=None, alpha=None, power=
-        None, ratio=1.0, alternative='two-sided'):
-        """solve for any one parameter of the power of a two sample z-test
+        ddof = self.ddof  # for correlation, ddof=3
+
+        # get effective nobs, factor for std of test statistic
+        if ratio > 0:
+            nobs2 = nobs1*ratio
+            #equivalent to nobs = n1*n2/(n1+n2)=n1*ratio/(1+ratio)
+            nobs = 1./ (1. / (nobs1 - ddof) + 1. / (nobs2 - ddof))
+        else:
+            nobs = nobs1 - ddof
+        return normal_power(effect_size, nobs, alpha, alternative=alternative)
+
+    #method is only added to have explicit keywords and docstring
+    def solve_power(self, effect_size=None, nobs1=None, alpha=None, power=None,
+                    ratio=1., alternative='two-sided'):
+        '''solve for any one parameter of the power of a two sample z-test

         for z-test the keywords are:
             effect_size, nobs1, alpha, power, ratio
@@ -647,8 +956,13 @@ class NormalIndPower(Power):
         ``brentq`` with fixed bounds is used. However, there can still be cases
         where this fails.

-        """
-        pass
+        '''
+        return super(NormalIndPower, self).solve_power(effect_size=effect_size,
+                                                      nobs1=nobs1,
+                                                      alpha=alpha,
+                                                      power=power,
+                                                      ratio=ratio,
+                                                      alternative=alternative)


 class FTestPower(Power):
@@ -695,7 +1009,7 @@ class FTestPower(Power):
     """

     def power(self, effect_size, df_num, df_denom, alpha, ncc=1):
-        """Calculate the power of a F-test.
+        '''Calculate the power of a F-test.

         The effect size is Cohen's ``f``, square root of ``f2``.

@@ -740,12 +1054,16 @@ class FTestPower(Power):

         ftest_power with ncc=0 should also be correct for f_test in regression
         models, with df_num and d_denom as defined there. (not verified yet)
-        """
-        pass
+        '''

+        pow_ = ftest_power(effect_size, df_num, df_denom, alpha, ncc=ncc)
+        #print effect_size, df_num, df_denom, alpha, pow_
+        return pow_
+
+    #method is only added to have explicit keywords and docstring
     def solve_power(self, effect_size=None, df_num=None, df_denom=None,
-        alpha=None, power=None, ncc=1, **kwargs):
-        """solve for any one parameter of the power of a F-test
+                    alpha=None, power=None, ncc=1, **kwargs):
+        '''solve for any one parameter of the power of a F-test

         for the one sample F-test the keywords are:
             effect_size, df_num, df_denom, alpha, power
@@ -803,8 +1121,19 @@ class FTestPower(Power):
         ``brentq`` with fixed bounds is used. However, there can still be cases
         where this fails.

-        """
-        pass
+        '''
+        if kwargs:
+            if "nobs" in kwargs:
+                warnings.warn("nobs is not used")
+            else:
+                raise ValueError(f"incorrect keyword(s) {kwargs}")
+        return super(FTestPower, self).solve_power(effect_size=effect_size,
+                                                      df_num=df_num,
+                                                      df_denom=df_denom,
+                                                      alpha=alpha,
+                                                      power=power,
+                                                      ncc=ncc)
+


 class FTestPowerF2(Power):
@@ -842,7 +1171,7 @@ class FTestPowerF2(Power):
     """

     def power(self, effect_size, df_num, df_denom, alpha, ncc=1):
-        """Calculate the power of a F-test.
+        '''Calculate the power of a F-test.

         The effect size is Cohen's ``f^2``.

@@ -882,12 +1211,15 @@ class FTestPowerF2(Power):

         ftest_power with ncc=0 should also be correct for f_test in regression
         models, with df_num and d_denom as defined there. (not verified yet)
-        """
-        pass
+        '''

+        pow_ = ftest_power_f2(effect_size, df_num, df_denom, alpha, ncc=ncc)
+        return pow_
+
+    #method is only added to have explicit keywords and docstring
     def solve_power(self, effect_size=None, df_num=None, df_denom=None,
-        alpha=None, power=None, ncc=1):
-        """Solve for any one parameter of the power of a F-test
+                    alpha=None, power=None, ncc=1):
+        '''Solve for any one parameter of the power of a F-test

         for the one sample F-test the keywords are:
             effect_size, df_num, df_denom, alpha, power
@@ -937,12 +1269,18 @@ class FTestPowerF2(Power):
         ``brentq`` with fixed bounds is used. However, there can still be cases
         where this fails.

-        """
-        pass
+        '''
+
+        return super(FTestPowerF2, self).solve_power(effect_size=effect_size,
+                                                      df_num=df_num,
+                                                      df_denom=df_denom,
+                                                      alpha=alpha,
+                                                      power=power,
+                                                      ncc=ncc)


 class FTestAnovaPower(Power):
-    """Statistical Power calculations F-test for one factor balanced ANOVA
+    '''Statistical Power calculations F-test for one factor balanced ANOVA

     This is based on Cohen's f as effect size measure.

@@ -950,10 +1288,10 @@ class FTestAnovaPower(Power):
     --------
     statsmodels.stats.oneway.effectsize_oneway

-    """
+    '''

     def power(self, effect_size, nobs, alpha, k_groups=2):
-        """Calculate the power of a F-test for one factor ANOVA.
+        '''Calculate the power of a F-test for one factor ANOVA.

         Parameters
         ----------
@@ -975,12 +1313,13 @@ class FTestAnovaPower(Power):
             type II error. Power is the probability that the test correctly
             rejects the Null Hypothesis if the Alternative Hypothesis is true.

-       """
-        pass
+       '''
+        return ftest_anova_power(effect_size, nobs, alpha, k_groups=k_groups)

-    def solve_power(self, effect_size=None, nobs=None, alpha=None, power=
-        None, k_groups=2):
-        """solve for any one parameter of the power of a F-test
+    #method is only added to have explicit keywords and docstring
+    def solve_power(self, effect_size=None, nobs=None, alpha=None, power=None,
+                    k_groups=2):
+        '''solve for any one parameter of the power of a F-test

         for the one sample F-test the keywords are:
             effect_size, nobs, alpha, power
@@ -1019,23 +1358,51 @@ class FTestAnovaPower(Power):
         ``brentq`` with fixed bounds is used. However, there can still be cases
         where this fails.

-        """
-        pass
+        '''
+        # update start values for root finding
+        if k_groups is not None:
+            self.start_ttp['nobs'] = k_groups * 10
+            self.start_bqexp['nobs'] = dict(low=k_groups * 2,
+                                            start_upp=k_groups * 10)
+        # first attempt at special casing
+        if effect_size is None:
+            return self._solve_effect_size(effect_size=effect_size,
+                                           nobs=nobs,
+                                           alpha=alpha,
+                                           k_groups=k_groups,
+                                           power=power)
+
+        return super(FTestAnovaPower, self).solve_power(effect_size=effect_size,
+                                                      nobs=nobs,
+                                                      alpha=alpha,
+                                                      k_groups=k_groups,
+                                                      power=power)

     def _solve_effect_size(self, effect_size=None, nobs=None, alpha=None,
-        power=None, k_groups=2):
-        """experimental, test failure in solve_power for effect_size
-        """
-        pass
+                           power=None, k_groups=2):
+        '''experimental, test failure in solve_power for effect_size
+        '''
+        def func(x):
+            effect_size = x
+            return self._power_identity(effect_size=effect_size,
+                                          nobs=nobs,
+                                          alpha=alpha,
+                                          k_groups=k_groups,
+                                          power=power)
+
+        val, r = optimize.brentq(func, 1e-8, 1-1e-8, full_output=True)
+        if not r.converged:
+            print(r)
+        return val
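A sketch of the ANOVA solver above (annotation only; the effect size and k_groups are illustrative). Note that nobs is the total count across all groups:

from statsmodels.stats.power import FTestAnovaPower

# total sample size for a balanced one-way ANOVA with 3 groups,
# Cohen's f = 0.25, alpha = 0.05 and 80% power
nobs_total = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                           power=0.8, k_groups=3)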


 class GofChisquarePower(Power):
-    """Statistical Power calculations for one sample chisquare test
+    '''Statistical Power calculations for one sample chisquare test

-    """
+    '''

-    def power(self, effect_size, nobs, alpha, n_bins, ddof=0):
-        """Calculate the power of a chisquare test for one sample
+    def power(self, effect_size, nobs, alpha, n_bins, ddof=0):#alternative='two-sided'):
+        '''Calculate the power of a chisquare test for one sample

         Only two-sided alternative is implemented

@@ -1059,12 +1426,14 @@ class GofChisquarePower(Power):
             type II error. Power is the probability that the test correctly
             rejects the Null Hypothesis if the Alternative Hypothesis is true.

-       """
-        pass
+       '''
+        from statsmodels.stats.gof import chisquare_power
+        return chisquare_power(effect_size, nobs, n_bins, alpha, ddof=0)

-    def solve_power(self, effect_size=None, nobs=None, alpha=None, power=
-        None, n_bins=2):
-        """solve for any one parameter of the power of a one sample chisquare-test
+    #method is only added to have explicit keywords and docstring
+    def solve_power(self, effect_size=None, nobs=None, alpha=None,
+                    power=None, n_bins=2):
+        '''solve for any one parameter of the power of a one sample chisquare-test

         for the one sample chisquare-test the keywords are:
             effect_size, nobs, alpha, power
@@ -1107,22 +1476,26 @@ class GofChisquarePower(Power):
         ``brentq`` with fixed bounds is used. However, there can still be cases
         where this fails.

-        """
-        pass
-
+        '''
+        return super(GofChisquarePower, self).solve_power(effect_size=effect_size,
+                                                      nobs=nobs,
+                                                      n_bins=n_bins,
+                                                      alpha=alpha,
+                                                      power=power)

 class _GofChisquareIndPower(Power):
-    """Statistical Power calculations for chisquare goodness-of-fit test
+    '''Statistical Power calculations for chisquare goodness-of-fit test

     TODO: this is not working yet
           for 2sample case need two nobs in function
           no one-sided chisquare test, is there one? use normal distribution?
           -> drop one-sided options?
-    """
+    '''

-    def power(self, effect_size, nobs1, alpha, ratio=1, alternative='two-sided'
-        ):
-        """Calculate the power of a chisquare for two independent sample
+
+    def power(self, effect_size, nobs1, alpha, ratio=1,
+              alternative='two-sided'):
+        '''Calculate the power of a chisquare test for two independent samples

         Parameters
         ----------
@@ -1153,12 +1526,18 @@ class _GofChisquareIndPower(Power):
             type II error. Power is the probability that the test correctly
             rejects the Null Hypothesis if the Alternative Hypothesis is true.

-        """
-        pass
+        '''

-    def solve_power(self, effect_size=None, nobs1=None, alpha=None, power=
-        None, ratio=1.0, alternative='two-sided'):
-        """solve for any one parameter of the power of a two sample z-test
+        from statsmodels.stats.gof import chisquare_power
+        nobs2 = nobs1*ratio
+        #equivalent to nobs = n1*n2/(n1+n2)=n1*ratio/(1+ratio)
+        nobs = 1./ (1. / nobs1 + 1. / nobs2)
+        return chisquare_power(effect_size, nobs, alpha)
+
+    #method is only added to have explicit keywords and docstring
+    def solve_power(self, effect_size=None, nobs1=None, alpha=None, power=None,
+                    ratio=1., alternative='two-sided'):
+        '''solve for any one parameter of the power of a two sample z-test

         for z-test the keywords are:
             effect_size, nobs1, alpha, power, ratio
@@ -1207,10 +1586,15 @@ class _GofChisquareIndPower(Power):
         ``brentq`` with fixed bounds is used. However, there can still be cases
         where this fails.

-        """
-        pass
-
+        '''
+        return super(_GofChisquareIndPower, self).solve_power(effect_size=effect_size,
+                                                      nobs1=nobs1,
+                                                      alpha=alpha,
+                                                      power=power,
+                                                      ratio=ratio,
+                                                      alternative=alternative)

+#shortcut functions
 tt_solve_power = TTestPower().solve_power
 tt_ind_solve_power = TTestIndPower().solve_power
 zt_ind_solve_power = NormalIndPower().solve_power
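As with the class methods, exactly one keyword is left as None in these shortcut aliases and is the quantity solved for (sketch, values illustrative):

from statsmodels.stats.power import tt_ind_solve_power, zt_ind_solve_power

# power of a two-sample t-test with 30 observations per group
pwr = tt_ind_solve_power(effect_size=0.4, nobs1=30, alpha=0.05, power=None)

# first-group sample size for a two-sample z-test at 80% power
n1 = zt_ind_solve_power(effect_size=0.4, nobs1=None, alpha=0.05, power=0.8)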
diff --git a/statsmodels/stats/proportion.py b/statsmodels/stats/proportion.py
index 7bca256d5..66c9c727e 100644
--- a/statsmodels/stats/proportion.py
+++ b/statsmodels/stats/proportion.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Tests and Confidence Intervals for Binomial Proportions

@@ -6,21 +7,25 @@ Created on Fri Mar 01 00:23:07 2013
 Author: Josef Perktold
 License: BSD-3
 """
+
 from statsmodels.compat.python import lzip
 from typing import Callable, Tuple
 import numpy as np
 import pandas as pd
 from scipy import optimize, stats
+
 from statsmodels.stats.base import AllPairsResults, HolderTuple
 from statsmodels.stats.weightstats import _zstat_generic2
 from statsmodels.tools.sm_exceptions import HypothesisTestWarning
 from statsmodels.tools.testing import Holder
 from statsmodels.tools.validation import array_like
+
 FLOAT_INFO = np.finfo(float)


-def _bound_proportion_confint(func: Callable[[float], float], qi: float,
-    lower: bool=True) ->float:
+def _bound_proportion_confint(
+    func: Callable[[float], float], qi: float, lower: bool = True
+) -> float:
     """
     Try hard to find a bound different from eps/1 - eps in proportion_confint

@@ -38,11 +43,24 @@ def _bound_proportion_confint(func: Callable[[float], float], qi: float,
     float
         The coarse bound
     """
-    pass
+    default = FLOAT_INFO.eps if lower else 1.0 - FLOAT_INFO.eps
+
+    def step(v):
+        return v / 8 if lower else v + (1.0 - v) / 8

+    x = step(qi)
+    w = func(x)
+    cnt = 1
+    while w > 0 and cnt < 10:
+        x = step(x)
+        w = func(x)
+        cnt += 1
+    return x if cnt < 10 else default

-def _bisection_search_conservative(func: Callable[[float], float], lb:
-    float, ub: float, steps: int=27) ->Tuple[float, float]:
+
+def _bisection_search_conservative(
+    func: Callable[[float], float], lb: float, ub: float, steps: int = 27
+) -> Tuple[float, float]:
     """
     Private function used as a fallback by proportion_confint

@@ -66,10 +84,32 @@ def _bisection_search_conservative(func: Callable[[float], float], lb:
     func_val : float
         The value of the function at the estimate
     """
-    pass
-
-
-def proportion_confint(count, nobs, alpha: float=0.05, method='normal'):
+    upper = func(ub)
+    lower = func(lb)
+    best = upper if upper < 0 else lower
+    best_pt = ub if upper < 0 else lb
+    if np.sign(lower) == np.sign(upper):
+        raise ValueError("problem with signs")
+    mp = (ub + lb) / 2
+    mid = func(mp)
+    if (mid < 0) and (mid > best):
+        best = mid
+        best_pt = mp
+    for _ in range(steps):
+        if np.sign(mid) == np.sign(upper):
+            ub = mp
+            upper = mid
+        else:
+            lb = mp
+        mp = (ub + lb) / 2
+        mid = func(mp)
+        if (mid < 0) and (mid > best):
+            best = mid
+            best_pt = mp
+    return best_pt, best
+
+
+def proportion_confint(count, nobs, alpha:float=0.05, method="normal"):
     """
     Confidence interval for a binomial proportion

@@ -128,7 +168,143 @@ def proportion_confint(count, nobs, alpha: float=0.05, method='normal'):
        "Interval Estimation for a Binomial Proportion", Statistical
        Science 16 (2): 101–133. doi:10.1214/ss/1009213286.
     """
-    pass
+    is_scalar = np.isscalar(count) and np.isscalar(nobs)
+    is_pandas = isinstance(count, (pd.Series, pd.DataFrame))
+    count_a = array_like(count, "count", optional=False, ndim=None)
+    nobs_a = array_like(nobs, "nobs", optional=False, ndim=None)
+
+    def _check(x: np.ndarray, name: str) -> np.ndarray:
+        if np.issubdtype(x.dtype, np.integer):
+            return x
+        y = x.astype(np.int64, casting="unsafe")
+        if np.any(y != x):
+            raise ValueError(
+                f"{name} must have an integral dtype. Found data with "
+                f"dtype {x.dtype}"
+            )
+        return y
+
+    if method == "binom_test":
+        count_a = _check(np.asarray(count_a), "count")
+        nobs_a = _check(np.asarray(nobs_a), "count")
+
+    q_ = count_a / nobs_a
+    alpha_2 = 0.5 * alpha
+
+    if method == "normal":
+        std_ = np.sqrt(q_ * (1 - q_) / nobs_a)
+        dist = stats.norm.isf(alpha / 2.0) * std_
+        ci_low = q_ - dist
+        ci_upp = q_ + dist
+    elif method == "binom_test":
+        # inverting the binomial test
+        def func_factory(count: int, nobs: int) -> Callable[[float], float]:
+            if hasattr(stats, "binomtest"):
+
+                def func(qi):
+                    return stats.binomtest(count, nobs, p=qi).pvalue - alpha
+
+            else:
+                # Remove after min SciPy >= 1.7
+                def func(qi):
+                    return stats.binom_test(count, nobs, p=qi) - alpha
+
+            return func
+
+        bcast = np.broadcast(count_a, nobs_a)
+        ci_low = np.zeros(bcast.shape)
+        ci_upp = np.zeros(bcast.shape)
+        index = bcast.index
+        for c, n in bcast:
+            # Enforce symmetry
+            reverse = False
+            _q = q_.flat[index]
+            if c > n // 2:
+                c = n - c
+                reverse = True
+                _q = 1 - _q
+            func = func_factory(c, n)
+            if c == 0:
+                ci_low.flat[index] = 0.0
+            else:
+                lower_bnd = _bound_proportion_confint(func, _q, lower=True)
+                val, _z = optimize.brentq(
+                    func, lower_bnd, _q, full_output=True
+                )
+                if func(val) > 0:
+                    power = 10
+                    new_lb = val - (val - lower_bnd) / 2**power
+                    while func(new_lb) > 0 and power >= 0:
+                        power -= 1
+                        new_lb = val - (val - lower_bnd) / 2**power
+                    val, _ = _bisection_search_conservative(func, new_lb, _q)
+                ci_low.flat[index] = val
+            if c == n:
+                ci_upp.flat[index] = 1.0
+            else:
+                upper_bnd = _bound_proportion_confint(func, _q, lower=False)
+                val, _z = optimize.brentq(
+                    func, _q, upper_bnd, full_output=True
+                )
+                if func(val) > 0:
+                    power = 10
+                    new_ub = val + (upper_bnd - val) / 2**power
+                    while func(new_ub) > 0 and power >= 0:
+                        power -= 1
+                        new_ub = val - (upper_bnd - val) / 2**power
+                    val, _ = _bisection_search_conservative(func, _q, new_ub)
+                ci_upp.flat[index] = val
+            if reverse:
+                temp = ci_upp.flat[index]
+                ci_upp.flat[index] = 1 - ci_low.flat[index]
+                ci_low.flat[index] = 1 - temp
+            index = bcast.index
+    elif method == "beta":
+        ci_low = stats.beta.ppf(alpha_2, count_a, nobs_a - count_a + 1)
+        ci_upp = stats.beta.isf(alpha_2, count_a + 1, nobs_a - count_a)
+
+        if np.ndim(ci_low) > 0:
+            ci_low.flat[q_.flat == 0] = 0
+            ci_upp.flat[q_.flat == 1] = 1
+        else:
+            ci_low = 0 if q_ == 0 else ci_low
+            ci_upp = 1 if q_ == 1 else ci_upp
+    elif method == "agresti_coull":
+        crit = stats.norm.isf(alpha / 2.0)
+        nobs_c = nobs_a + crit**2
+        q_c = (count_a + crit**2 / 2.0) / nobs_c
+        std_c = np.sqrt(q_c * (1.0 - q_c) / nobs_c)
+        dist = crit * std_c
+        ci_low = q_c - dist
+        ci_upp = q_c + dist
+    elif method == "wilson":
+        crit = stats.norm.isf(alpha / 2.0)
+        crit2 = crit**2
+        denom = 1 + crit2 / nobs_a
+        center = (q_ + crit2 / (2 * nobs_a)) / denom
+        dist = crit * np.sqrt(
+            q_ * (1.0 - q_) / nobs_a + crit2 / (4.0 * nobs_a**2)
+        )
+        dist /= denom
+        ci_low = center - dist
+        ci_upp = center + dist
+    # method adjusted to be more forgiving of misspellings or incorrect option name
+    elif method[:4] == "jeff":
+        ci_low, ci_upp = stats.beta.interval(
+            1 - alpha, count_a + 0.5, nobs_a - count_a + 0.5
+        )
+    else:
+        raise NotImplementedError(f"method {method} is not available")
+    if method in ["normal", "agresti_coull"]:
+        ci_low = np.clip(ci_low, 0, 1)
+        ci_upp = np.clip(ci_upp, 0, 1)
+    if is_pandas:
+        container = pd.Series if isinstance(count, pd.Series) else pd.DataFrame
+        ci_low = container(ci_low, index=count.index)
+        ci_upp = container(ci_upp, index=count.index)
+    if is_scalar:
+        return float(ci_low), float(ci_upp)
+    return ci_low, ci_upp
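A usage sketch for the interval above (annotation only; counts are illustrative):

from statsmodels.stats.proportion import proportion_confint

# 95% Wilson score interval for 7 successes in 50 trials
ci_low, ci_upp = proportion_confint(7, 50, alpha=0.05, method='wilson')
# other methods handled above: 'normal', 'agresti_coull', 'beta'
# (Clopper-Pearson), 'jeffreys' and 'binom_test'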


 def multinomial_proportions_confint(counts, alpha=0.05, method='goodman'):
@@ -216,11 +392,122 @@ def multinomial_proportions_confint(counts, alpha=0.05, method='goodman'):
            small counts in a large number of cells," Journal of Statistical
            Software, Vol. 5, No. 6, 2000, pp. 1-24.
     """
-    pass
+    if alpha <= 0 or alpha >= 1:
+        raise ValueError('alpha must be in (0, 1), bounds excluded')
+    counts = np.array(counts, dtype=float)
+    if (counts < 0).any():
+        raise ValueError('counts must be >= 0')
+
+    n = counts.sum()
+    k = len(counts)
+    proportions = counts / n
+    if method == 'goodman':
+        chi2 = stats.chi2.ppf(1 - alpha / k, 1)
+        delta = chi2 ** 2 + (4 * n * proportions * chi2 * (1 - proportions))
+        region = ((2 * n * proportions + chi2 +
+                   np.array([- np.sqrt(delta), np.sqrt(delta)])) /
+                  (2 * (chi2 + n))).T
+    elif method[:5] == 'sison':  # We accept any name starting with 'sison'
+        # Define a few functions we'll use a lot.
+        def poisson_interval(interval, p):
+            """
+            Compute P(b <= Z <= a) where Z ~ Poisson(p) and
+            `interval = (b, a)`.
+            """
+            b, a = interval
+            prob = stats.poisson.cdf(a, p) - stats.poisson.cdf(b - 1, p)
+            return prob
+
+        def truncated_poisson_factorial_moment(interval, r, p):
+            """
+            Compute mu_r, the r-th factorial moment of a poisson random
+            variable of parameter `p` truncated to `interval = (b, a)`.
+            """
+            b, a = interval
+            return p ** r * (1 - ((poisson_interval((a - r + 1, a), p) -
+                                   poisson_interval((b - r, b - 1), p)) /
+                                  poisson_interval((b, a), p)))
+
+        def edgeworth(intervals):
+            """
+            Compute the Edgeworth expansion term of Sison & Glaz's formula
+            (1) (approximated probability for multinomial proportions in a
+            given box).
+            """
+            # Compute means and central moments of the truncated poisson
+            # variables.
+            mu_r1, mu_r2, mu_r3, mu_r4 = [
+                np.array([truncated_poisson_factorial_moment(interval, r, p)
+                          for (interval, p) in zip(intervals, counts)])
+                for r in range(1, 5)
+            ]
+            mu = mu_r1
+            mu2 = mu_r2 + mu - mu ** 2
+            mu3 = mu_r3 + mu_r2 * (3 - 3 * mu) + mu - 3 * mu ** 2 + 2 * mu ** 3
+            mu4 = (mu_r4 + mu_r3 * (6 - 4 * mu) +
+                   mu_r2 * (7 - 12 * mu + 6 * mu ** 2) +
+                   mu - 4 * mu ** 2 + 6 * mu ** 3 - 3 * mu ** 4)
+
+            # Compute expansion factors, gamma_1 and gamma_2.
+            g1 = mu3.sum() / mu2.sum() ** 1.5
+            g2 = (mu4.sum() - 3 * (mu2 ** 2).sum()) / mu2.sum() ** 2
+
+            # Compute the expansion itself.
+            x = (n - mu.sum()) / np.sqrt(mu2.sum())
+            phi = np.exp(- x ** 2 / 2) / np.sqrt(2 * np.pi)
+            H3 = x ** 3 - 3 * x
+            H4 = x ** 4 - 6 * x ** 2 + 3
+            H6 = x ** 6 - 15 * x ** 4 + 45 * x ** 2 - 15
+            f = phi * (1 + g1 * H3 / 6 + g2 * H4 / 24 + g1 ** 2 * H6 / 72)
+            return f / np.sqrt(mu2.sum())
+
+
+        def approximated_multinomial_interval(intervals):
+            """
+            Compute approximated probability for Multinomial(n, proportions)
+            to be in `intervals` (Sison & Glaz's formula (1)).
+            """
+            return np.exp(
+                np.sum(np.log([poisson_interval(interval, p)
+                               for (interval, p) in zip(intervals, counts)])) +
+                np.log(edgeworth(intervals)) -
+                np.log(stats.poisson._pmf(n, n))
+            )
+
+        def nu(c):
+            """
+            Compute interval coverage for a given `c` (Sison & Glaz's
+            formula (7)).
+            """
+            return approximated_multinomial_interval(
+                [(np.maximum(count - c, 0), np.minimum(count + c, n))
+                 for count in counts])
+
+        # Find the value of `c` that will give us the confidence intervals
+        # (solving nu(c) <= 1 - alpha < nu(c + 1).
+        c = 1.0
+        nuc = nu(c)
+        nucp1 = nu(c + 1)
+        while not (nuc <= (1 - alpha) < nucp1):
+            if c > n:
+                raise Exception("Couldn't find a value for `c` that "
+                                "solves nu(c) <= 1 - alpha < nu(c + 1)")
+            c += 1
+            nuc = nucp1
+            nucp1 = nu(c + 1)
+
+        # Compute gamma and the corresponding confidence intervals.
+        g = (1 - alpha - nuc) / (nucp1 - nuc)
+        ci_lower = np.maximum(proportions - c / n, 0)
+        ci_upper = np.minimum(proportions + (c + 2 * g) / n, 1)
+        region = np.array([ci_lower, ci_upper]).T
+    else:
+        raise NotImplementedError('method "%s" is not available' % method)
+    return region
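A sketch for the simultaneous intervals above (annotation only; counts are illustrative):

import numpy as np
from statsmodels.stats.proportion import multinomial_proportions_confint

counts = np.array([56, 72, 73, 59, 62])
ci = multinomial_proportions_confint(counts, alpha=0.05, method='goodman')
# ci is a (k, 2) array of lower/upper bounds, one row per category;
# any method name starting with 'sison' selects the Sison-Glaz intervals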


 def samplesize_confint_proportion(proportion, half_length, alpha=0.05,
-    method='normal'):
+                                  method='normal'):
     """
     Find sample size to get desired confidence interval length

@@ -248,7 +535,13 @@ def samplesize_confint_proportion(proportion, half_length, alpha=0.05,
     possible application: number of replications in bootstrap samples

     """
-    pass
+    q_ = proportion
+    if method == 'normal':
+        n = q_ * (1 - q_) / (half_length / stats.norm.isf(alpha / 2.))**2
+    else:
+        raise NotImplementedError('only "normal" is available')
+
+    return n
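A quick check of the formula above (annotation only): for the worst-case proportion 0.5 and a desired half-length of 0.05,

from statsmodels.stats.proportion import samplesize_confint_proportion

n = samplesize_confint_proportion(0.5, 0.05, alpha=0.05)
# about 384 observations under the normal approximation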


 def proportion_effectsize(prop1, prop2, method='normal'):
@@ -287,7 +580,11 @@ def proportion_effectsize(prop1, prop2, method='normal'):
     array([-0.21015893,  0.        ,  0.20135792])

     """
-    pass
+    if method != 'normal':
+        raise ValueError('only "normal" is implemented')
+
+    es = 2 * (np.arcsin(np.sqrt(prop1)) - np.arcsin(np.sqrt(prop2)))
+    return es
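The arcsine effect size feeds directly into the power classes from statsmodels.stats.power (sketch; the proportions are illustrative):

from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

es = proportion_effectsize(0.6, 0.5)   # Cohen's h, about 0.20 here
n1 = NormalIndPower().solve_power(effect_size=es, alpha=0.05, power=0.8)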


 def std_prop(prop, nobs):
@@ -308,12 +605,16 @@ def std_prop(prop, nobs):
     std : array_like
         standard error for a proportion of nobs independent observations
     """
-    pass
+    return np.sqrt(prop * (1. - prop) / nobs)
+
+
+def _std_diff_prop(p1, p2, ratio=1):
+    return np.sqrt(p1 * (1 - p1) + p2 * (1 - p2) / ratio)


 def _power_ztost(mean_low, var_low, mean_upp, var_upp, mean_alt, var_alt,
-    alpha=0.05, discrete=True, dist='norm', nobs=None, continuity=0,
-    critval_continuity=0):
+                 alpha=0.05, discrete=True, dist='norm', nobs=None,
+                 continuity=0, critval_continuity=0):
     """
     Generic statistical power function for normal based equivalence test

@@ -322,7 +623,38 @@ def _power_ztost(mean_low, var_low, mean_upp, var_upp, mean_alt, var_alt,

     see power_ztost_prob for a description of the options
     """
-    pass
+    # TODO: refactor structure, separate norm and binom better
+    if not isinstance(continuity, tuple):
+        continuity = (continuity, continuity)
+    crit = stats.norm.isf(alpha)
+    k_low = mean_low + np.sqrt(var_low) * crit
+    k_upp = mean_upp - np.sqrt(var_upp) * crit
+    if discrete or dist == 'binom':
+        k_low = np.ceil(k_low * nobs + 0.5 * critval_continuity)
+        k_upp = np.trunc(k_upp * nobs - 0.5 * critval_continuity)
+        if dist == 'norm':
+            #need proportion
+            k_low = (k_low) * 1. / nobs #-1 to match PASS
+            k_upp = k_upp * 1. / nobs
+#    else:
+#        if dist == 'binom':
+#            #need counts
+#            k_low *= nobs
+#            k_upp *= nobs
+    #print mean_low, np.sqrt(var_low), crit, var_low
+    #print mean_upp, np.sqrt(var_upp), crit, var_upp
+    if np.any(k_low > k_upp):   #vectorize
+        import warnings
+        warnings.warn("no overlap, power is zero", HypothesisTestWarning)
+    std_alt = np.sqrt(var_alt)
+    z_low = (k_low - mean_alt - continuity[0] * 0.5 / nobs) / std_alt
+    z_upp = (k_upp - mean_alt + continuity[1] * 0.5 / nobs) / std_alt
+    if dist == 'norm':
+        power = stats.norm.cdf(z_upp) - stats.norm.cdf(z_low)
+    elif dist == 'binom':
+        power = (stats.binom.cdf(k_upp, nobs, mean_alt) -
+                     stats.binom.cdf(k_low-1, nobs, mean_alt))
+    return power, (k_low, k_upp, z_low, z_upp)


 def binom_tost(count, nobs, low, upp):
@@ -346,7 +678,10 @@ def binom_tost(count, nobs, low, upp):
         p-values of lower and upper one-sided tests

     """
-    pass
+    # binom_test_stat only returns pval
+    tt1 = binom_test(count, nobs, alternative='larger', prop=low)
+    tt2 = binom_test(count, nobs, alternative='smaller', prop=upp)
+    return np.maximum(tt1, tt2), tt1, tt2,


 def binom_tost_reject_interval(low, upp, nobs, alpha=0.05):
@@ -371,11 +706,12 @@ def binom_tost_reject_interval(low, upp, nobs, alpha=0.05):
         lower and upper bound of rejection region

     """
-    pass
+    x_low = stats.binom.isf(alpha, nobs, low) + 1
+    x_upp = stats.binom.ppf(alpha, nobs, upp) - 1
+    return x_low, x_upp


-def binom_test_reject_interval(value, nobs, alpha=0.05, alternative='two-sided'
-    ):
+def binom_test_reject_interval(value, nobs, alpha=0.05, alternative='two-sided'):
     """
     Rejection region for binomial test for one sample proportion

@@ -393,7 +729,20 @@ def binom_test_reject_interval(value, nobs, alpha=0.05, alternative='two-sided'
     x_low, x_upp : int
         lower and upper bound of rejection region
     """
-    pass
+    if alternative in ['2s', 'two-sided']:
+        alternative = '2s'  # normalize alternative name
+        alpha = alpha / 2
+
+    if alternative in ['2s', 'smaller']:
+        x_low = stats.binom.ppf(alpha, nobs, value) - 1
+    else:
+        x_low = 0
+    if alternative in ['2s', 'larger']:
+        x_upp = stats.binom.isf(alpha, nobs, value) + 1
+    else :
+        x_upp = nobs
+
+    return int(x_low), int(x_upp)


 def binom_test(count, nobs, prop=0.5, alternative='two-sided'):
@@ -426,11 +775,37 @@ def binom_test(count, nobs, prop=0.5, alternative='two-sided'):
     -----
     This uses scipy.stats.binom_test for the two-sided alternative.
     """
-    pass
+
+    if np.any(prop > 1.0) or np.any(prop < 0.0):
+        raise ValueError("p must be in range [0,1]")
+    if alternative in ['2s', 'two-sided']:
+        try:
+            pval = stats.binomtest(count, n=nobs, p=prop).pvalue
+        except AttributeError:
+            # Remove after min SciPy >= 1.7
+            pval = stats.binom_test(count, n=nobs, p=prop)
+    elif alternative in ['l', 'larger']:
+        pval = stats.binom.sf(count-1, nobs, prop)
+    elif alternative in ['s', 'smaller']:
+        pval = stats.binom.cdf(count, nobs, prop)
+    else:
+        raise ValueError('alternative not recognized\n'
+                         'should be two-sided, larger or smaller')
+    return pval
+
+
+def power_binom_tost(low, upp, nobs, p_alt=None, alpha=0.05):
+    if p_alt is None:
+        p_alt = 0.5 * (low + upp)
+    x_low, x_upp = binom_tost_reject_interval(low, upp, nobs, alpha=alpha)
+    power = (stats.binom.cdf(x_upp, nobs, p_alt) -
+                     stats.binom.cdf(x_low-1, nobs, p_alt))
+    return power


 def power_ztost_prop(low, upp, nobs, p_alt, alpha=0.05, dist='norm',
-    variance_prop=None, discrete=True, continuity=0, critval_continuity=0):
+                     variance_prop=None, discrete=True, continuity=0,
+                     critval_continuity=0):
     """
     Power of proportions equivalence test based on normal distribution

@@ -502,7 +877,18 @@ def power_ztost_prop(low, upp, nobs, p_alt, alpha=0.05, dist='norm',
     PASS Chapter 110: Equivalence Tests for One Proportion.

     """
-    pass
+    mean_low = low
+    var_low = std_prop(low, nobs)**2
+    mean_upp = upp
+    var_upp = std_prop(upp, nobs)**2
+    mean_alt = p_alt
+    var_alt = std_prop(p_alt, nobs)**2
+    if variance_prop is not None:
+        var_low = var_upp = std_prop(variance_prop, nobs)**2
+    power = _power_ztost(mean_low, var_low, mean_upp, var_upp, mean_alt, var_alt,
+                 alpha=alpha, discrete=discrete, dist=dist, nobs=nobs,
+                 continuity=continuity, critval_continuity=critval_continuity)
+    return np.maximum(power[0], 0), power[1:]


 def _table_proportion(count, nobs):
@@ -528,11 +914,17 @@ def _table_proportion(count, nobs):
     recent scipy has more elaborate contingency table functions

     """
-    pass
+    count = np.asarray(count)
+    dt = np.promote_types(count.dtype, np.float64)
+    count = np.asarray(count, dtype=dt)
+    table = np.column_stack((count, nobs - count))
+    expected = table.sum(0) * table.sum(1)[:, None] * 1. / table.sum()
+    n_rows = table.shape[0]
+    return table, expected, n_rows


 def proportions_ztest(count, nobs, value=None, alternative='two-sided',
-    prop_var=False):
+                      prop_var=False):
     """
     Test for proportions based on normal (z) test

@@ -600,7 +992,39 @@ def proportions_ztest(count, nobs, value=None, alternative='two-sided',
     chisquare is the distribution of the square of a standard normal
     distribution.
     """
-    pass
+    # TODO: verify that this really holds
+    # TODO: add continuity correction or other improvements for small samples
+    # TODO: change options similar to proportions_ztost ?
+
+    count = np.asarray(count)
+    nobs = np.asarray(nobs)
+
+    if nobs.size == 1:
+        nobs = nobs * np.ones_like(count)
+
+    prop = count * 1. / nobs
+    k_sample = np.size(prop)
+    if value is None:
+        if k_sample == 1:
+            raise ValueError('value must be provided for a 1-sample test')
+        value = 0
+    if k_sample == 1:
+        diff = prop - value
+    elif k_sample == 2:
+        diff = prop[0] - prop[1] - value
+    else:
+        msg = 'more than two samples are not implemented yet'
+        raise NotImplementedError(msg)
+
+    p_pooled = np.sum(count) * 1. / np.sum(nobs)
+
+    nobs_fact = np.sum(1. / nobs)
+    if prop_var:
+        p_pooled = prop_var
+    var_ = p_pooled * (1 - p_pooled) * nobs_fact
+    std_diff = np.sqrt(var_)
+    from statsmodels.stats.weightstats import _zstat_generic2
+    return _zstat_generic2(diff, std_diff, alternative)
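A sketch of both call patterns supported above (annotation only; counts are illustrative):

import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# one sample: test H0: p = 0.5 with 30 successes in 50 trials
zstat, pval = proportions_ztest(30, 50, value=0.5)

# two samples: test equality of the two proportions (value defaults to 0)
zstat2, pval2 = proportions_ztest(np.array([30, 45]), np.array([50, 80]))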


 def proportions_ztost(count, nobs, low, upp, prop_var='sample'):
@@ -638,7 +1062,21 @@ def proportions_ztost(count, nobs, low, upp, prop_var='sample'):
     checked only for 1 sample case

     """
-    pass
+    if prop_var == 'limits':
+        prop_var_low = low
+        prop_var_upp = upp
+    elif prop_var == 'sample':
+        prop_var_low = prop_var_upp = False  #ztest uses sample
+    elif prop_var == 'null':
+        prop_var_low = prop_var_upp = 0.5 * (low + upp)
+    elif np.isreal(prop_var):
+        prop_var_low = prop_var_upp = prop_var
+
+    tt1 = proportions_ztest(count, nobs, alternative='larger',
+                            prop_var=prop_var_low, value=low)
+    tt2 = proportions_ztest(count, nobs, alternative='smaller',
+                            prop_var=prop_var_upp, value=upp)
+    return np.maximum(tt1[1], tt2[1]), tt1, tt2,


 def proportions_chisquare(count, nobs, value=None):
@@ -683,7 +1121,18 @@ def proportions_chisquare(count, nobs, value=None):
     that all samples have the same proportion.

     """
-    pass
+    nobs = np.atleast_1d(nobs)
+    table, expected, n_rows = _table_proportion(count, nobs)
+    if value is not None:
+        expected = np.column_stack((nobs * value, nobs * (1 - value)))
+        ddof = n_rows - 1
+    else:
+        ddof = n_rows
+
+    #print table, expected
+    chi2stat, pval = stats.chisquare(table.ravel(), expected.ravel(),
+                                     ddof=ddof)
+    return chi2stat, pval, (table, expected)
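A sketch for the chisquare version (annotation only; counts are illustrative):

import numpy as np
from statsmodels.stats.proportion import proportions_chisquare

count = np.array([30, 45])
nobs = np.array([50, 80])
chi2, pval, (table, expected) = proportions_chisquare(count, nobs)
# tests that both samples share the same underlying proportion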


 def proportions_chisquare_allpairs(count, nobs, multitest_method='hs'):
@@ -716,11 +1165,15 @@ def proportions_chisquare_allpairs(count, nobs, multitest_method='hs'):
     -----
     Yates continuity correction is not available.
     """
-    pass
+    #all_pairs = lmap(list, lzip(*np.triu_indices(4, 1)))
+    all_pairs = lzip(*np.triu_indices(len(count), 1))
+    pvals = [proportions_chisquare(count[list(pair)], nobs[list(pair)])[1]
+               for pair in all_pairs]
+    return AllPairsResults(pvals, all_pairs, multitest_method=multitest_method)


 def proportions_chisquare_pairscontrol(count, nobs, value=None,
-    multitest_method='hs', alternative='two-sided'):
+                               multitest_method='hs', alternative='two-sided'):
     """
     Chisquare test of proportions for pairs of k samples compared to control

@@ -759,11 +1212,19 @@ def proportions_chisquare_pairscontrol(count, nobs, value=None,
     ``value`` and ``alternative`` options are not yet implemented.

     """
-    pass
+    if (value is not None) or (alternative not in ['two-sided', '2s']):
+        raise NotImplementedError
+    #all_pairs = lmap(list, lzip(*np.triu_indices(4, 1)))
+    all_pairs = [(0, k) for k in range(1, len(count))]
+    pvals = [proportions_chisquare(count[list(pair)], nobs[list(pair)],
+                                   #alternative=alternative)[1]
+                                   )[1]
+               for pair in all_pairs]
+    return AllPairsResults(pvals, all_pairs, multitest_method=multitest_method)


 def confint_proportions_2indep(count1, nobs1, count2, nobs2, method=None,
-    compare='diff', alpha=0.05, correction=True):
+                               compare='diff', alpha=0.05, correction=True):
     """
     Confidence intervals for comparing two independent proportions.

@@ -841,11 +1302,117 @@ def confint_proportions_2indep(count1, nobs1, count2, nobs2, method=None,
        in Statistics - Theory and Methods 40 (7): 1271–82.
        https://doi.org/10.1080/03610920903576580.
     """
-    pass
-
-
-def _shrink_prob(count1, nobs1, count2, nobs2, shrink_factor=2, return_corr
-    =True):
+    method_default = {'diff': 'newcomb',
+                      'ratio': 'log-adjusted',
+                      'odds-ratio': 'logit-adjusted'}
+    # normalize compare name
+    if compare.lower() == 'or':
+        compare = 'odds-ratio'
+    if method is None:
+        method = method_default[compare]
+
+    method = method.lower()
+    if method.startswith('agr'):
+        method = 'agresti-caffo'
+
+    p1 = count1 / nobs1
+    p2 = count2 / nobs2
+    diff = p1 - p2
+    addone = 1 if method == 'agresti-caffo' else 0
+
+    if compare == 'diff':
+        if method in ['wald', 'agresti-caffo']:
+            count1_, nobs1_ = count1 + addone, nobs1 + 2 * addone
+            count2_, nobs2_ = count2 + addone, nobs2 + 2 * addone
+            p1_ = count1_ / nobs1_
+            p2_ = count2_ / nobs2_
+            diff_ = p1_ - p2_
+            var = p1_ * (1 - p1_) / nobs1_ + p2_ * (1 - p2_) / nobs2_
+            z = stats.norm.isf(alpha / 2)
+            d_wald = z * np.sqrt(var)
+            low = diff_ - d_wald
+            upp = diff_ + d_wald
+
+        elif method.startswith('newcomb'):
+            low1, upp1 = proportion_confint(count1, nobs1,
+                                            method='wilson', alpha=alpha)
+            low2, upp2 = proportion_confint(count2, nobs2,
+                                            method='wilson', alpha=alpha)
+            d_low = np.sqrt((p1 - low1)**2 + (upp2 - p2)**2)
+            d_upp = np.sqrt((p2 - low2)**2 + (upp1 - p1)**2)
+            low = diff - d_low
+            upp = diff + d_upp
+
+        elif method == "score":
+            low, upp = _score_confint_inversion(count1, nobs1, count2, nobs2,
+                                                compare=compare, alpha=alpha,
+                                                correction=correction)
+
+        else:
+            raise ValueError('method not recognized')
+
+    elif compare == 'ratio':
+        # ratio = p1 / p2
+        if method in ['log', 'log-adjusted']:
+            addhalf = 0.5 if method == 'log-adjusted' else 0
+            count1_, nobs1_ = count1 + addhalf, nobs1 + addhalf
+            count2_, nobs2_ = count2 + addhalf, nobs2 + addhalf
+            p1_ = count1_ / nobs1_
+            p2_ = count2_ / nobs2_
+            ratio_ = p1_ / p2_
+            var = (1 / count1_) - 1 / nobs1_ + 1 / count2_ - 1 / nobs2_
+            z = stats.norm.isf(alpha / 2)
+            d_log = z * np.sqrt(var)
+            low = np.exp(np.log(ratio_) - d_log)
+            upp = np.exp(np.log(ratio_) + d_log)
+
+        elif method == 'score':
+            res = _confint_riskratio_koopman(count1, nobs1, count2, nobs2,
+                                             alpha=alpha,
+                                             correction=correction)
+            low, upp = res.confint
+
+        else:
+            raise ValueError('method not recognized')
+
+    elif compare == 'odds-ratio':
+        # odds_ratio = p1 / (1 - p1) / p2 * (1 - p2)
+        if method in ['logit', 'logit-adjusted', 'logit-smoothed']:
+            if method in ['logit-smoothed']:
+                adjusted = _shrink_prob(count1, nobs1, count2, nobs2,
+                                        shrink_factor=2, return_corr=False)[0]
+                count1_, nobs1_, count2_, nobs2_ = adjusted
+
+            else:
+                addhalf = 0.5 if method == 'logit-adjusted' else 0
+                count1_, nobs1_ = count1 + addhalf, nobs1 + 2 * addhalf
+                count2_, nobs2_ = count2 + addhalf, nobs2 + 2 * addhalf
+            p1_ = count1_ / nobs1_
+            p2_ = count2_ / nobs2_
+            odds_ratio_ = p1_ / (1 - p1_) / p2_ * (1 - p2_)
+            var = (1 / count1_ + 1 / (nobs1_ - count1_) +
+                   1 / count2_ + 1 / (nobs2_ - count2_))
+            z = stats.norm.isf(alpha / 2)
+            d_log = z * np.sqrt(var)
+            low = np.exp(np.log(odds_ratio_) - d_log)
+            upp = np.exp(np.log(odds_ratio_) + d_log)
+
+        elif method == "score":
+            low, upp = _score_confint_inversion(count1, nobs1, count2, nobs2,
+                                                compare=compare, alpha=alpha,
+                                                correction=correction)
+
+        else:
+            raise ValueError('method not recognized')
+
+    else:
+        raise ValueError('compare not recognized')
+
+    return low, upp
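A usage sketch for the two-sample intervals above (annotation only; counts are illustrative):

from statsmodels.stats.proportion import confint_proportions_2indep

low, upp = confint_proportions_2indep(7, 34, 1, 34, compare='diff',
                                      method='newcomb', alpha=0.05)
# compare='ratio' and compare='odds-ratio' select the other parametrizations,
# with defaults 'log-adjusted' and 'logit-adjusted' respectively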
+
+
+def _shrink_prob(count1, nobs1, count2, nobs2, shrink_factor=2,
+                 return_corr=True):
     """
     Shrink observed counts towards independence

@@ -876,12 +1443,24 @@ def _shrink_prob(count1, nobs1, count2, nobs2, shrink_factor=2, return_corr
         false.

     """
-    pass
+    vectorized = any(np.size(i) > 1 for i in [count1, nobs1, count2, nobs2])
+    if vectorized:
+        raise ValueError("function is not vectorized")
+    nobs_col = np.array([count1 + count2, nobs1 - count1 + nobs2 - count2])
+    nobs_row = np.array([nobs1, nobs2])
+    nobs = nobs1 + nobs2
+    prob_indep = (nobs_col * nobs_row[:, None]) / nobs**2
+    corr = shrink_factor * prob_indep
+    if return_corr:
+        return (corr[0, 0], corr[0].sum(), corr[1, 0], corr[1].sum())
+    else:
+        return (count1 + corr[0, 0], nobs1 + corr[0].sum(),
+                count2 + corr[1, 0], nobs2 + corr[1].sum()), prob_indep


 def score_test_proportions_2indep(count1, nobs1, count2, nobs2, value=None,
-    compare='diff', alternative='two-sided', correction=True,
-    return_results=True):
+                                  compare='diff', alternative='two-sided',
+                                  correction=True, return_results=True):
     """
     Score test for two independent proportions

@@ -929,12 +1508,115 @@ def score_test_proportions_2indep(count1, nobs1, count2, nobs2, value=None,
     change.

     """
-    pass
+
+    value_default = 0 if compare == 'diff' else 1
+    if value is None:
+        # TODO: odds ratio does not work if value=1
+        value = value_default
+
+    nobs = nobs1 + nobs2
+    count = count1 + count2
+    p1 = count1 / nobs1
+    p2 = count2 / nobs2
+    if value == value_default:
+        # use pooled estimator if equality test
+        # shortcut, but required for odds ratio
+        prop0 = prop1 = count / nobs
+    # this uses index 0 from Miettinen Nurminen 1985
+    count0, nobs0 = count2, nobs2
+    p0 = p2
+
+    if compare == 'diff':
+        diff = value  # hypothesis value
+
+        if diff != 0:
+            tmp3 = nobs
+            tmp2 = (nobs1 + 2 * nobs0) * diff - nobs - count
+            tmp1 = (count0 * diff - nobs - 2 * count0) * diff + count
+            tmp0 = count0 * diff * (1 - diff)
+            q = ((tmp2 / (3 * tmp3))**3 - tmp1 * tmp2 / (6 * tmp3**2) +
+                 tmp0 / (2 * tmp3))
+            p = np.sign(q) * np.sqrt((tmp2 / (3 * tmp3))**2 -
+                                     tmp1 / (3 * tmp3))
+            a = (np.pi + np.arccos(q / p**3)) / 3
+
+            prop0 = 2 * p * np.cos(a) - tmp2 / (3 * tmp3)
+            prop1 = prop0 + diff
+
+        var = prop1 * (1 - prop1) / nobs1 + prop0 * (1 - prop0) / nobs0
+        if correction:
+            var *= nobs / (nobs - 1)
+
+        diff_stat = (p1 - p0 - diff)
+
+    elif compare == 'ratio':
+        # risk ratio
+        ratio = value
+
+        if ratio != 1:
+            a = nobs * ratio
+            b = -(nobs1 * ratio + count1 + nobs2 + count0 * ratio)
+            c = count
+            prop0 = (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * a)
+            prop1 = prop0 * ratio
+
+        var = (prop1 * (1 - prop1) / nobs1 +
+               ratio**2 * prop0 * (1 - prop0) / nobs0)
+        if correction:
+            var *= nobs / (nobs - 1)
+
+        # NCSS looks incorrect for var, but it is what should be reported
+        # diff_stat = (p1 / p0 - ratio)   # NCSS/PASS
+        diff_stat = (p1 - ratio * p0)  # Miettinen Nurminen
+
+    elif compare in ['or', 'odds-ratio']:
+        # odds ratio
+        oratio = value
+
+        if oratio != 1:
+            # Note the constrained estimator does not handle odds-ratio = 1
+            a = nobs0 * (oratio - 1)
+            b = nobs1 * oratio + nobs0 - count * (oratio - 1)
+            c = -count
+            prop0 = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
+            prop1 = prop0 * oratio / (1 + prop0 * (oratio - 1))
+
+        # try to avoid 0 and 1 proportions,
+        # those raise Zero Division Runtime Warnings
+        eps = 1e-10
+        prop0 = np.clip(prop0, eps, 1 - eps)
+        prop1 = np.clip(prop1, eps, 1 - eps)
+
+        var = (1 / (prop1 * (1 - prop1) * nobs1) +
+               1 / (prop0 * (1 - prop0) * nobs0))
+        if correction:
+            var *= nobs / (nobs - 1)
+
+        diff_stat = ((p1 - prop1) / (prop1 * (1 - prop1)) -
+                     (p0 - prop0) / (prop0 * (1 - prop0)))
+
+    statistic, pvalue = _zstat_generic2(diff_stat, np.sqrt(var),
+                                        alternative=alternative)
+
+    if return_results:
+        res = HolderTuple(statistic=statistic,
+                          pvalue=pvalue,
+                          compare=compare,
+                          method='score',
+                          variance=var,
+                          alternative=alternative,
+                          prop1_null=prop1,
+                          prop2_null=prop0,
+                          )
+        return res
+    else:
+        return statistic, pvalue


 def test_proportions_2indep(count1, nobs1, count2, nobs2, value=None,
-    method=None, compare='diff', alternative='two-sided', correction=True,
-    return_results=True):
+                            method=None, compare='diff',
+                            alternative='two-sided', correction=True,
+                            return_results=True):
     """
     Hypothesis test for comparing two independent proportions

@@ -1061,11 +1743,161 @@ def test_proportions_2indep(count1, nobs1, count2, nobs2, value=None,
     - 'odds-ratio': 'logit-adjusted'

     """
-    pass
-
-
-def tost_proportions_2indep(count1, nobs1, count2, nobs2, low, upp, method=
-    None, compare='diff', correction=True):
+    method_default = {'diff': 'agresti-caffo',
+                      'ratio': 'log-adjusted',
+                      'odds-ratio': 'logit-adjusted'}
+    # normalize compare name
+    if compare.lower() == 'or':
+        compare = 'odds-ratio'
+    if method is None:
+        method = method_default[compare]
+
+    method = method.lower()
+    if method.startswith('agr'):
+        method = 'agresti-caffo'
+
+    if value is None:
+        # TODO: odds ratio does not work if value=1 for score test
+        value = 0 if compare == 'diff' else 1
+
+    count1, nobs1, count2, nobs2 = map(np.asarray,
+                                       [count1, nobs1, count2, nobs2])
+
+    p1 = count1 / nobs1
+    p2 = count2 / nobs2
+    diff = p1 - p2
+    ratio = p1 / p2
+    odds_ratio = p1 / (1 - p1) / p2 * (1 - p2)
+    res = None
+
+    if compare == 'diff':
+        if method in ['wald', 'agresti-caffo']:
+            addone = 1 if method == 'agresti-caffo' else 0
+            count1_, nobs1_ = count1 + addone, nobs1 + 2 * addone
+            count2_, nobs2_ = count2 + addone, nobs2 + 2 * addone
+            p1_ = count1_ / nobs1_
+            p2_ = count2_ / nobs2_
+            diff_stat = p1_ - p2_ - value
+            var = p1_ * (1 - p1_) / nobs1_ + p2_ * (1 - p2_) / nobs2_
+            statistic = diff_stat / np.sqrt(var)
+            distr = 'normal'
+
+        elif method.startswith('newcomb'):
+            msg = 'newcomb not available for hypothesis test'
+            raise NotImplementedError(msg)
+
+        elif method == 'score':
+            # Note score part is the same call for all compare
+            res = score_test_proportions_2indep(count1, nobs1, count2, nobs2,
+                                                value=value, compare=compare,
+                                                alternative=alternative,
+                                                correction=correction,
+                                                return_results=return_results)
+            if return_results is False:
+                statistic, pvalue = res[:2]
+            distr = 'normal'
+            # TODO/Note score_test_proportions_2indep returns statistic and
+            #     not diff_stat
+            diff_stat = None
+        else:
+            raise ValueError('method not recognized')
+
+    elif compare == 'ratio':
+        if method in ['log', 'log-adjusted']:
+            addhalf = 0.5 if method == 'log-adjusted' else 0
+            count1_, nobs1_ = count1 + addhalf, nobs1 + addhalf
+            count2_, nobs2_ = count2 + addhalf, nobs2 + addhalf
+            p1_ = count1_ / nobs1_
+            p2_ = count2_ / nobs2_
+            ratio_ = p1_ / p2_
+            var = (1 / count1_) - 1 / nobs1_ + 1 / count2_ - 1 / nobs2_
+            diff_stat = np.log(ratio_) - np.log(value)
+            statistic = diff_stat / np.sqrt(var)
+            distr = 'normal'
+
+        elif method == 'score':
+            res = score_test_proportions_2indep(count1, nobs1, count2, nobs2,
+                                                value=value, compare=compare,
+                                                alternative=alternative,
+                                                correction=correction,
+                                                return_results=return_results)
+            if return_results is False:
+                statistic, pvalue = res[:2]
+            distr = 'normal'
+            diff_stat = None
+
+        else:
+            raise ValueError('method not recognized')
+
+    elif compare == "odds-ratio":
+
+        if method in ['logit', 'logit-adjusted', 'logit-smoothed']:
+            if method in ['logit-smoothed']:
+                adjusted = _shrink_prob(count1, nobs1, count2, nobs2,
+                                        shrink_factor=2, return_corr=False)[0]
+                count1_, nobs1_, count2_, nobs2_ = adjusted
+
+            else:
+                addhalf = 0.5 if method == 'logit-adjusted' else 0
+                count1_, nobs1_ = count1 + addhalf, nobs1 + 2 * addhalf
+                count2_, nobs2_ = count2 + addhalf, nobs2 + 2 * addhalf
+            p1_ = count1_ / nobs1_
+            p2_ = count2_ / nobs2_
+            odds_ratio_ = p1_ / (1 - p1_) / p2_ * (1 - p2_)
+            var = (1 / count1_ + 1 / (nobs1_ - count1_) +
+                   1 / count2_ + 1 / (nobs2_ - count2_))
+
+            diff_stat = np.log(odds_ratio_) - np.log(value)
+            statistic = diff_stat / np.sqrt(var)
+            distr = 'normal'
+
+        elif method == 'score':
+            res = score_test_proportions_2indep(count1, nobs1, count2, nobs2,
+                                                value=value, compare=compare,
+                                                alternative=alternative,
+                                                correction=correction,
+                                                return_results=return_results)
+            if return_results is False:
+                statistic, pvalue = res[:2]
+            distr = 'normal'
+            diff_stat = None
+        else:
+            raise ValueError('method "%s" not recognized' % method)
+
+    else:
+        raise ValueError('compare "%s" not recognized' % compare)
+
+    if distr == 'normal' and diff_stat is not None:
+        statistic, pvalue = _zstat_generic2(diff_stat, np.sqrt(var),
+                                            alternative=alternative)
+
+    if return_results:
+        if res is None:
+            res = HolderTuple(statistic=statistic,
+                              pvalue=pvalue,
+                              compare=compare,
+                              method=method,
+                              diff=diff,
+                              ratio=ratio,
+                              odds_ratio=odds_ratio,
+                              variance=var,
+                              alternative=alternative,
+                              value=value,
+                              )
+        else:
+            # we already have a return result from score test
+            # add missing attributes
+            res.diff = diff
+            res.ratio = ratio
+            res.odds_ratio = odds_ratio
+            res.value = value
+        return res
+    else:
+        return statistic, pvalue
+
+
+def tost_proportions_2indep(count1, nobs1, count2, nobs2, low, upp,
+                            method=None, compare='diff', correction=True):
     """
     Equivalence test based on two one-sided `test_proportions_2indep`

@@ -1161,7 +1993,33 @@ def tost_proportions_2indep(count1, nobs1, count2, nobs2, low, upp, method=
     the same method and comparison options.

     """
-    pass
+
+    tt1 = test_proportions_2indep(count1, nobs1, count2, nobs2, value=low,
+                                  method=method, compare=compare,
+                                  alternative='larger',
+                                  correction=correction,
+                                  return_results=True)
+    tt2 = test_proportions_2indep(count1, nobs1, count2, nobs2, value=upp,
+                                  method=method, compare=compare,
+                                  alternative='smaller',
+                                  correction=correction,
+                                  return_results=True)
+
+    # idx_max = 1 if tt1.pvalue < tt2.pvalue else 0
+    idx_max = np.asarray(tt1.pvalue < tt2.pvalue, int)
+    statistic = np.choose(idx_max, [tt1.statistic, tt2.statistic])
+    pvalue = np.choose(idx_max, [tt1.pvalue, tt2.pvalue])
+
+    res = HolderTuple(statistic=statistic,
+                      pvalue=pvalue,
+                      compare=compare,
+                      method=method,
+                      results_larger=tt1,
+                      results_smaller=tt2,
+                      title="Equivalence test for 2 independent proportions"
+                      )
+
+    return res


 def _std_2prop_power(diff, p2, ratio=1, alpha=0.05, value=0):
@@ -1171,11 +2029,28 @@ def _std_2prop_power(diff, p2, ratio=1, alpha=0.05, value=0):
     helper function for power and sample size computation

     """
-    pass
-
-
-def power_proportions_2indep(diff, prop2, nobs1, ratio=1, alpha=0.05, value
-    =0, alternative='two-sided', return_results=True):
+    if value != 0:
+        msg = 'non-zero diff under null, value, is not yet implemented'
+        raise NotImplementedError(msg)
+
+    nobs_ratio = ratio
+    p1 = p2 + diff
+    # The following contains currently redundant variables that will
+    # be useful for different options for the null variance
+    p_pooled = (p1 + p2 * ratio) / (1 + ratio)
+    # probabilities for the variance for the null statistic
+    p1_vnull, p2_vnull = p_pooled, p_pooled
+    p2_alt = p2
+    p1_alt = p2_alt + diff
+
+    std_null = _std_diff_prop(p1_vnull, p2_vnull, ratio=nobs_ratio)
+    std_alt = _std_diff_prop(p1_alt, p2_alt, ratio=nobs_ratio)
+    return p_pooled, std_null, std_alt
+
+
+def power_proportions_2indep(diff, prop2, nobs1, ratio=1, alpha=0.05,
+                             value=0, alternative='two-sided',
+                             return_results=True):
     """
     Power for ztest that two independent proportions are equal

@@ -1230,11 +2105,34 @@ def power_proportions_2indep(diff, prop2, nobs1, ratio=1, alpha=0.05, value
             standard error of difference under the alternative hypothesis
             (without sqrt(nobs1))
     """
-    pass
+    # TODO: avoid possible circular import, check if needed
+    from statsmodels.stats.power import normal_power_het
+
+    p_pooled, std_null, std_alt = _std_2prop_power(diff, prop2, ratio=ratio,
+                                                   alpha=alpha, value=value)
+
+    pow_ = normal_power_het(diff, nobs1, alpha, std_null=std_null,
+                            std_alternative=std_alt,
+                            alternative=alternative)
+
+    if return_results:
+        res = Holder(power=pow_,
+                     p_pooled=p_pooled,
+                     std_null=std_null,
+                     std_alt=std_alt,
+                     nobs1=nobs1,
+                     nobs2=ratio * nobs1,
+                     nobs_ratio=ratio,
+                     alpha=alpha,
+                     )
+        return res
+    else:
+        return pow_


 def samplesize_proportions_2indep_onetail(diff, prop2, power, ratio=1,
-    alpha=0.05, value=0, alternative='two-sided'):
+                                          alpha=0.05, value=0,
+                                          alternative='two-sided'):
     """
     Required sample size assuming normal distribution based on one tail

@@ -1271,11 +2169,22 @@ def samplesize_proportions_2indep_onetail(diff, prop2, power, ratio=1,
     nobs1 : float
         Number of observations in sample 1.
     """
-    pass
+    # TODO: avoid possible circular import, check if needed
+    from statsmodels.stats.power import normal_sample_size_one_tail
+
+    if alternative in ['two-sided', '2s']:
+        alpha = alpha / 2
+
+    _, std_null, std_alt = _std_2prop_power(diff, prop2, ratio=ratio,
+                                            alpha=alpha, value=value)
+
+    nobs = normal_sample_size_one_tail(diff, power, alpha, std_null=std_null,
+                                       std_alternative=std_alt)
+    return nobs


 def _score_confint_inversion(count1, nobs1, count2, nobs2, compare='diff',
-    alpha=0.05, correction=True):
+                             alpha=0.05, correction=True):
     """
     Compute score confidence interval by inverting score test

@@ -1307,11 +2216,49 @@ def _score_confint_inversion(count1, nobs1, count2, nobs2, compare='diff',
     upp : float
         Upper confidence bound.
     """
-    pass
+
+    def func(v):
+        r = test_proportions_2indep(count1, nobs1, count2, nobs2,
+                                    value=v, compare=compare, method='score',
+                                    correction=correction,
+                                    alternative="two-sided")
+        return r.pvalue - alpha
+
+    rt0 = test_proportions_2indep(count1, nobs1, count2, nobs2,
+                                  value=0, compare=compare, method='score',
+                                  correction=correction,
+                                  alternative="two-sided")
+
+    # use default method to get starting values
+    # this will not work if score confint becomes default
+    # maybe use "wald" as alias that works for all compare statistics
+    use_method = {"diff": "wald", "ratio": "log", "odds-ratio": "logit"}
+    rci0 = confint_proportions_2indep(count1, nobs1, count2, nobs2,
+                                      method=use_method[compare],
+                                      compare=compare, alpha=alpha)
+
+    # Note diff might be negative
+    ub = rci0[1] + np.abs(rci0[1]) * 0.5
+    lb = rci0[0] - np.abs(rci0[0]) * 0.25
+    if compare == 'diff':
+        param = rt0.diff
+        # 1 might not be the correct upper bound because
+        #     rootfinding is for the `diff` and not for a probability.
+        ub = min(ub, 0.99999)
+    elif compare == 'ratio':
+        param = rt0.ratio
+        ub *= 2  # add more buffer
+    if compare == 'odds-ratio':
+        param = rt0.odds_ratio
+
+    # root finding for confint bounds
+    upp = optimize.brentq(func, param, ub)
+    low = optimize.brentq(func, lb, param)
+    return low, upp


 def _confint_riskratio_koopman(count1, nobs1, count2, nobs2, alpha=0.05,
-    correction=True):
+                               correction=True):
     """
     Score confidence interval for ratio or proportions, Koopman/Nam

@@ -1320,7 +2267,31 @@ def _confint_riskratio_koopman(count1, nobs1, count2, nobs2, alpha=0.05,
     When correction is True, then the small sample correction nobs / (nobs - 1)
     by Miettinen/Nurminen is used.
     """
-    pass
+    # The names below follow Nam
+    x0, x1, n0, n1 = count2, count1, nobs2, nobs1
+    x = x0 + x1
+    n = n0 + n1
+    z = stats.norm.isf(alpha / 2)**2
+    if correction:
+        # Miettinen/Nurminen small sample correction
+        z *= n / (n - 1)
+    # z = stats.chi2.isf(alpha, 1)
+    # equ 6 in Nam 1995
+    a1 = n0 * (n0 * n * x1 + n1 * (n0 + x1) * z)
+    a2 = - n0 * (n0 * n1 * x + 2 * n * x0 * x1 + n1 * (n0 + x0 + 2 * x1) * z)
+    a3 = 2 * n0 * n1 * x0 * x + n * x0 * x0 * x1 + n0 * n1 * x * z
+    a4 = - n1 * x0 * x0 * x
+
+    p_roots_ = np.sort(np.roots([a1, a2, a3, a4]))
+    p_roots = p_roots_[:2][::-1]
+
+    # equ 5
+    ci = (1 - (n1 - x1) * (1 - p_roots) / (x0 + n1 - n * p_roots)) / p_roots
+
+    res = Holder()
+    res.confint = ci
+    res._p_roots = p_roots_  # for unit tests, can be dropped
+    return res


 def _confint_riskratio_paired_nam(table, alpha=0.05):
@@ -1350,4 +2321,33 @@ def _confint_riskratio_paired_nam(table, alpha=0.05):
     confidence interval agrees only at 2 decimals

     """
-    pass
+    x11, x10, x01, x00 = np.ravel(table)
+    n = np.sum(table)  # nobs
+    p10, p01 = x10 / n, x01 / n
+    p1 = (x11 + x10) / n
+    p0 = (x11 + x01) / n
+    q00 = 1 - x00 / n
+
+    z2 = stats.norm.isf(alpha / 2)**2
+    # z = stats.chi2.isf(alpha, 1)
+    # before equ 3 in Nam 2009
+
+    g1 = (n * p0 + z2 / 2) * p0
+    g2 = - (2 * n * p1 * p0 + z2 * q00)
+    g3 = (n * p1 + z2 / 2) * p1
+
+    a0 = g1**2 - (z2 * p0 / 2)**2
+    a1 = 2 * g1 * g2
+    a2 = g2**2 + 2 * g1 * g3 + z2**2 * (p1 * p0 - 2 * p10 * p01) / 2
+    a3 = 2 * g2 * g3
+    a4 = g3**2 - (z2 * p1 / 2)**2
+
+    p_roots = np.sort(np.roots([a0, a1, a2, a3, a4]))
+    # p_roots = np.sort(np.roots([1, a1 / a0, a2 / a0, a3 / a0, a4 / a0]))
+
+    ci = [p_roots.min(), p_roots.max()]
+    res = Holder()
+    res.confint = ci
+    res.p = p1, p0
+    res._p_roots = p_roots  # for unit tests, can be dropped
+    return res
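
A minimal usage sketch for the two-sample proportion functions implemented above; the counts are illustrative placeholders, not values taken from the patch or its test suite.

    from statsmodels.stats.proportion import (
        confint_proportions_2indep,
        test_proportions_2indep,
    )

    # illustrative data: 7/34 events in sample 1, 1/34 in sample 2
    count1, nobs1 = 7, 34
    count2, nobs2 = 1, 34

    # Agresti-Caffo adjusted Wald test, the default method for compare="diff"
    res = test_proportions_2indep(count1, nobs1, count2, nobs2, compare="diff")
    print(res.statistic, res.pvalue)

    # score confidence interval for the risk difference, obtained by
    # inverting the score test (see _score_confint_inversion above)
    low, upp = confint_proportions_2indep(count1, nobs1, count2, nobs2,
                                          compare="diff", method="score")
    print(low, upp)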
diff --git a/statsmodels/stats/rates.py b/statsmodels/stats/rates.py
index 39fa79890..cfb0e62c3 100644
--- a/statsmodels/stats/rates.py
+++ b/statsmodels/stats/rates.py
@@ -1,25 +1,53 @@
-"""
+'''
 Test for ratio of Poisson intensities in two independent samples

 Author: Josef Perktold
 License: BSD-3

-"""
+'''
+
 import numpy as np
 import warnings
+
 from scipy import stats, optimize
+
 from statsmodels.stats.base import HolderTuple
 from statsmodels.stats.weightstats import _zstat_generic2
 from statsmodels.stats._inference_tools import _mover_confint
+
+# shorthand
 norm = stats.norm
-method_names_poisson_1samp = {'test': ['wald', 'score', 'exact-c', 'midp-c',
-    'waldccv', 'sqrt-a', 'sqrt-v', 'sqrt'], 'confint': ['wald', 'score',
-    'exact-c', 'midp-c', 'jeff', 'waldccv', 'sqrt-a', 'sqrt-v', 'sqrt',
-    'sqrt-cent', 'sqrt-centcc']}


-def test_poisson(count, nobs, value, method=None, alternative='two-sided',
-    dispersion=1):
+method_names_poisson_1samp = {
+    "test": [
+        "wald",
+        "score",
+        "exact-c",
+        "midp-c",
+        "waldccv",
+        "sqrt-a",
+        "sqrt-v",
+        "sqrt",
+        ],
+    "confint": [
+        "wald",
+        "score",
+        "exact-c",
+        "midp-c",
+        "jeff",
+        "waldccv",
+        "sqrt-a",
+        "sqrt-v",
+        "sqrt",
+        "sqrt-cent",
+        "sqrt-centcc",
+        ]
+    }
+
+
+def test_poisson(count, nobs, value, method=None, alternative="two-sided",
+                 dispersion=1):
     """Test for one sample poisson mean or rate

     Parameters
@@ -72,7 +100,90 @@ def test_poisson(count, nobs, value, method=None, alternative='two-sided',
     confint_poisson

     """
-    pass
+
+    n = nobs  # short hand
+    rate = count / n
+
+    if method is None:
+        msg = "method needs to be specified, currently no default method"
+        raise ValueError(msg)
+
+    if dispersion != 1:
+        if method not in ["wald", "waldcc", "score"]:
+            msg = "excess dispersion only supported in wald and score methods"
+            raise ValueError(msg)
+
+    dist = "normal"
+
+    if method == "wald":
+        std = np.sqrt(dispersion * rate / n)
+        statistic = (rate - value) / std
+
+    elif method == "waldccv":
+        # WCC in Barker 2002
+        # add 0.5 event, not 0.5 event rate as in waldcc
+        # std = np.sqrt((rate + 0.5 / n) / n)
+        # statistic = (rate + 0.5 / n - value) / std
+        std = np.sqrt(dispersion * (rate + 0.5 / n) / n)
+        statistic = (rate - value) / std
+
+    elif method == "score":
+        std = np.sqrt(dispersion * value / n)
+        statistic = (rate - value) / std
+        pvalue = stats.norm.sf(statistic)
+
+    elif method.startswith("exact-c") or method.startswith("midp-c"):
+        pv1 = stats.poisson.cdf(count, n * value)
+        pv2 = stats.poisson.sf(count - 1, n * value)
+        if method.startswith("midp-c"):
+            pv1 = pv1 - 0.5 * stats.poisson.pmf(count, n * value)
+            pv2 = pv2 - 0.5 * stats.poisson.pmf(count, n * value)
+        if alternative == "two-sided":
+            pvalue = 2 * np.minimum(pv1, pv2)
+        elif alternative == "larger":
+            pvalue = pv2
+        elif alternative == "smaller":
+            pvalue = pv1
+        else:
+            msg = 'alternative should be "two-sided", "larger" or "smaller"'
+            raise ValueError(msg)
+
+        statistic = np.nan
+        dist = "Poisson"
+
+    elif method == "sqrt":
+        std = 0.5
+        statistic = (np.sqrt(count) - np.sqrt(n * value)) / std
+
+    elif method == "sqrt-a":
+        # anscombe, based on Swift 2009 (with transformation to rate)
+        std = 0.5
+        statistic = (np.sqrt(count + 3 / 8) - np.sqrt(n * value + 3 / 8)) / std
+
+    elif method == "sqrt-v":
+        # vandenbroucke, based on Swift 2009 (with transformation to rate)
+        std = 0.5
+        crit = stats.norm.isf(0.025)
+        statistic = (np.sqrt(count + (crit**2 + 2) / 12) -
+                     # np.sqrt(n * value + (crit**2 + 2) / 12)) / std
+                     np.sqrt(n * value)) / std
+
+    else:
+        raise ValueError("unknown method %s" % method)
+
+    if dist == 'normal':
+        statistic, pvalue = _zstat_generic2(statistic, 1, alternative)
+
+    res = HolderTuple(
+        statistic=statistic,
+        pvalue=np.clip(pvalue, 0, 1),
+        distribution=dist,
+        method=method,
+        alternative=alternative,
+        rate=rate,
+        nobs=n
+        )
+    return res


 def confint_poisson(count, exposure, method=None, alpha=0.05):
@@ -157,11 +268,110 @@ def confint_poisson(count, exposure, method=None, alpha=0.05):
        https://doi.org/10.1080/03610920802255856.

     """
-    pass
-
-
-def tolerance_int_poisson(count, exposure, prob=0.95, exposure_new=1.0,
-    method=None, alpha=0.05, alternative='two-sided'):
+    n = exposure  # short hand
+    rate = count / exposure
+    alpha = alpha / 2  # two-sided
+
+    if method is None:
+        msg = "method needs to be specified, currently no default method"
+        raise ValueError(msg)
+
+    if method == "wald":
+        whalf = stats.norm.isf(alpha) * np.sqrt(rate / n)
+        ci = (rate - whalf, rate + whalf)
+
+    elif method == "waldccv":
+        # based on WCC in Barker 2002
+        # add 0.5 event, not 0.5 event rate as in Barker waldcc
+        whalf = stats.norm.isf(alpha) * np.sqrt((rate + 0.5 / n) / n)
+        ci = (rate - whalf, rate + whalf)
+
+    elif method == "score":
+        crit = stats.norm.isf(alpha)
+        center = count + crit**2 / 2
+        whalf = crit * np.sqrt((count + crit**2 / 4))
+        ci = ((center - whalf) / n, (center + whalf) / n)
+
+    elif method == "midp-c":
+        # note local alpha above is for one tail
+        ci = _invert_test_confint(count, n, alpha=2 * alpha, method="midp-c",
+                                  method_start="exact-c")
+
+    elif method == "sqrt":
+        # drop, wrong n
+        crit = stats.norm.isf(alpha)
+        center = rate + crit**2 / (4 * n)
+        whalf = crit * np.sqrt(rate / n)
+        ci = (center - whalf, center + whalf)
+
+    elif method == "sqrt-cent":
+        crit = stats.norm.isf(alpha)
+        center = count + crit**2 / 4
+        whalf = crit * np.sqrt((count + 3 / 8))
+        ci = ((center - whalf) / n, (center + whalf) / n)
+
+    elif method == "sqrt-centcc":
+        # drop with cc, does not match cipoisson in R survival
+        crit = stats.norm.isf(alpha)
+        # avoid sqrt of negative value if count=0
+        center_low = np.sqrt(np.maximum(count + 3 / 8 - 0.5, 0))
+        center_upp = np.sqrt(count + 3 / 8 + 0.5)
+        whalf = crit / 2
+        # above is for ci of count
+        ci = (((np.maximum(center_low - whalf, 0))**2 - 3 / 8) / n,
+              ((center_upp + whalf)**2 - 3 / 8) / n)
+
+        # crit = stats.norm.isf(alpha)
+        # center = count
+        # whalf = crit * np.sqrt((count + 3 / 8 + 0.5))
+        # ci = ((center - whalf - 0.5) / n, (center + whalf + 0.5) / n)
+
+    elif method == "sqrt-a":
+        # anscombe, based on Swift 2009 (with transformation to rate)
+        crit = stats.norm.isf(alpha)
+        center = np.sqrt(count + 3 / 8)
+        whalf = crit / 2
+        # above is for ci of count
+        ci = (((np.maximum(center - whalf, 0))**2 - 3 / 8) / n,
+              ((center + whalf)**2 - 3 / 8) / n)
+
+    elif method == "sqrt-v":
+        # vandenbroucke, based on Swift 2009 (with transformation to rate)
+        crit = stats.norm.isf(alpha)
+        center = np.sqrt(count + (crit**2 + 2) / 12)
+        whalf = crit / 2
+        # above is for ci of count
+        ci = (np.maximum(center - whalf, 0))**2 / n, (center + whalf)**2 / n
+
+    elif method in ["gamma", "exact-c"]:
+        # garwood exact, gamma
+        low = stats.gamma.ppf(alpha, count) / exposure
+        upp = stats.gamma.isf(alpha, count+1) / exposure
+        if np.isnan(low).any():
+            # case with count = 0
+            if np.size(low) == 1:
+                low = 0.0
+            else:
+                low[np.isnan(low)] = 0.0
+
+        ci = (low, upp)
+
+    elif method.startswith("jeff"):
+        # jeffreys, gamma
+        countc = count + 0.5
+        ci = (stats.gamma.ppf(alpha, countc) / exposure,
+              stats.gamma.isf(alpha, countc) / exposure)
+
+    else:
+        raise ValueError("unknown method %s" % method)
+
+    ci = (np.maximum(ci[0], 0), ci[1])
+    return ci
+
+
+def tolerance_int_poisson(count, exposure, prob=0.95, exposure_new=1.,
+                          method=None, alpha=0.05,
+                          alternative="two-sided"):
     """tolerance interval for a poisson observation

     Parameters
@@ -214,11 +424,35 @@ def tolerance_int_poisson(count, exposure, prob=0.95, exposure_new=1.0,
        100–110. https://doi.org/10.1080/00224065.1981.11980998.

     """
-    pass
-
-
-def confint_quantile_poisson(count, exposure, prob, exposure_new=1.0,
-    method=None, alpha=0.05, alternative='two-sided'):
+    prob_tail = 1 - prob
+    alpha_ = alpha
+    if alternative != "two-sided":
+        # confint_poisson does not have one-sided alternatives
+        alpha_ = alpha * 2
+    low, upp = confint_poisson(count, exposure, method=method, alpha=alpha_)
+
+    if exposure_new != 1:
+        low *= exposure_new
+        upp *= exposure_new
+
+    if alternative == "two-sided":
+        low_pred = stats.poisson.ppf(prob_tail / 2, low)
+        upp_pred = stats.poisson.ppf(1 - prob_tail / 2, upp)
+    elif alternative == "larger":
+        low_pred = 0
+        upp_pred = stats.poisson.ppf(1 - prob_tail, upp)
+    elif alternative == "smaller":
+        low_pred = stats.poisson.ppf(prob_tail, low)
+        upp_pred = np.inf
+
+    # clip -1 of ppf(0)
+    low_pred = np.maximum(low_pred, 0)
+    return low_pred, upp_pred
+
+
+def confint_quantile_poisson(count, exposure, prob, exposure_new=1.,
+                             method=None, alpha=0.05,
+                             alternative="two-sided"):
     """confidence interval for quantile of poisson random variable

     Parameters
@@ -261,35 +495,117 @@ def confint_quantile_poisson(count, exposure, prob, exposure_new=1.0,
     Hahn, Gerald J, and William Q Meeker. 2010. Statistical Intervals: A Guide
     for Practitioners.
     """
-    pass
-
-
-def _invert_test_confint(count, nobs, alpha=0.05, method='midp-c',
-    method_start='exact-c'):
+    alpha_ = alpha
+    if alternative != "two-sided":
+        # confint_poisson does not have one-sided alternatives
+        alpha_ = alpha * 2
+    low, upp = confint_poisson(count, exposure, method=method, alpha=alpha_)
+    if exposure_new != 1:
+        low *= exposure_new
+        upp *= exposure_new
+
+    if alternative == "two-sided":
+        low_pred = stats.poisson.ppf(prob, low)
+        upp_pred = stats.poisson.ppf(prob, upp)
+    elif alternative == "larger":
+        low_pred = 0
+        upp_pred = stats.poisson.ppf(prob, upp)
+    elif alternative == "smaller":
+        low_pred = stats.poisson.ppf(prob, low)
+        upp_pred = np.inf
+
+    # clip -1 of ppf(0)
+    low_pred = np.maximum(low_pred, 0)
+    return low_pred, upp_pred
+
+
+def _invert_test_confint(count, nobs, alpha=0.05, method="midp-c",
+                         method_start="exact-c"):
     """invert hypothesis test to get confidence interval
     """
-    pass
-

-def _invert_test_confint_2indep(count1, exposure1, count2, exposure2, alpha
-    =0.05, method='score', compare='diff', method_start='wald'):
+    def func(r):
+        v = (test_poisson(count, nobs, value=r, method=method)[1] -
+             alpha)**2
+        return v
+
+    ci = confint_poisson(count, nobs, method=method_start)
+    low = optimize.fmin(func, ci[0], xtol=1e-8, disp=False)
+    upp = optimize.fmin(func, ci[1], xtol=1e-8, disp=False)
+    assert np.size(low) == 1
+    return low[0], upp[0]
+
+
+def _invert_test_confint_2indep(
+        count1, exposure1, count2, exposure2,
+        alpha=0.05,
+        method="score",
+        compare="diff",
+        method_start="wald"
+        ):
     """invert hypothesis test to get confidence interval for 2indep
     """
-    pass

-
-method_names_poisson_2indep = {'test': {'ratio': ['wald', 'score',
-    'score-log', 'wald-log', 'exact-cond', 'cond-midp', 'sqrt',
-    'etest-score', 'etest-wald'], 'diff': ['wald', 'score', 'waldccv',
-    'etest-score', 'etest-wald']}, 'confint': {'ratio': ['waldcc', 'score',
-    'score-log', 'wald-log', 'sqrtcc', 'mover'], 'diff': ['wald', 'score',
-    'waldccv', 'mover']}}
+    def func(r):
+        v = (test_poisson_2indep(
+             count1, exposure1, count2, exposure2,
+             value=r, method=method, compare=compare
+             )[1] - alpha)**2
+        return v
+
+    ci = confint_poisson_2indep(count1, exposure1, count2, exposure2,
+                                method=method_start, compare=compare)
+    low = optimize.fmin(func, ci[0], xtol=1e-8, disp=False)
+    upp = optimize.fmin(func, ci[1], xtol=1e-8, disp=False)
+    assert np.size(low) == 1
+    return low[0], upp[0]
+
+
+method_names_poisson_2indep = {
+    "test": {
+        "ratio": [
+            "wald",
+            "score",
+            "score-log",
+            "wald-log",
+            "exact-cond",
+            "cond-midp",
+            "sqrt",
+            "etest-score",
+            "etest-wald"
+            ],
+        "diff": [
+            "wald",
+            "score",
+            "waldccv",
+            "etest-score",
+            "etest-wald"
+            ]
+        },
+    "confint": {
+        "ratio": [
+            "waldcc",
+            "score",
+            "score-log",
+            "wald-log",
+            "sqrtcc",
+            "mover",
+            ],
+        "diff": [
+            "wald",
+            "score",
+            "waldccv",
+            "mover"
+            ]
+        }
+    }


 def test_poisson_2indep(count1, exposure1, count2, exposure2, value=None,
-    ratio_null=None, method=None, compare='ratio', alternative='two-sided',
-    etest_kwds=None):
-    """Test for comparing two sample Poisson intensity rates.
+                        ratio_null=None,
+                        method=None, compare='ratio',
+                        alternative='two-sided', etest_kwds=None):
+    '''Test for comparing two sample Poisson intensity rates.

     Rates are defined as expected count divided by exposure.

@@ -429,20 +745,170 @@ def test_poisson_2indep(count1, exposure1, count2, exposure2, value=None,
        Computational Statistics & Data Analysis 51 (6): 3085–99.
        https://doi.org/10.1016/j.csda.2006.02.004.

-    """
-    pass
+    '''
+
+    # shortcut names
+    y1, n1, y2, n2 = map(np.asarray, [count1, exposure1, count2, exposure2])
+    d = n2 / n1
+    rate1, rate2 = y1 / n1, y2 / n2
+    rates_cmle = None
+
+    if compare == 'ratio':
+        if method is None:
+            # default method
+            method = 'score'
+
+        if ratio_null is not None:
+            warnings.warn("'ratio_null' is deprecated, use 'value' keyword",
+                          FutureWarning)
+            value = ratio_null
+        if ratio_null is None and value is None:
+            # default value
+            value = ratio_null = 1
+        else:
+            # for results holder instance, it still contains ratio_null
+            ratio_null = value
+
+        r = value
+        r_d = r / d   # r1 * n1 / (r2 * n2)
+
+        if method in ['score']:
+            stat = (y1 - y2 * r_d) / np.sqrt((y1 + y2) * r_d)
+            dist = 'normal'
+        elif method in ['wald']:
+            stat = (y1 - y2 * r_d) / np.sqrt(y1 + y2 * r_d**2)
+            dist = 'normal'
+        elif method in ['score-log']:
+            stat = (np.log(y1 / y2) - np.log(r_d))
+            stat /= np.sqrt((2 + 1 / r_d + r_d) / (y1 + y2))
+            dist = 'normal'
+        elif method in ['wald-log']:
+            stat = (np.log(y1 / y2) - np.log(r_d)) / np.sqrt(1 / y1 + 1 / y2)
+            dist = 'normal'
+        elif method in ['sqrt']:
+            stat = 2 * (np.sqrt(y1 + 3 / 8.) - np.sqrt((y2 + 3 / 8.) * r_d))
+            stat /= np.sqrt(1 + r_d)
+            dist = 'normal'
+        elif method in ['exact-cond', 'cond-midp']:
+            from statsmodels.stats import proportion
+            bp = r_d / (1 + r_d)
+            y_total = y1 + y2
+            stat = np.nan
+            # TODO: why y2 in here and not y1, check definition of H1 "larger"
+            pvalue = proportion.binom_test(y1, y_total, prop=bp,
+                                           alternative=alternative)
+            if method in ['cond-midp']:
+                # not inplace in case we still want binom pvalue
+                pvalue = pvalue - 0.5 * stats.binom.pmf(y1, y_total, bp)
+
+            dist = 'binomial'
+        elif method.startswith('etest'):
+            if method.endswith('wald'):
+                method_etest = 'wald'
+            else:
+                method_etest = 'score'
+            if etest_kwds is None:
+                etest_kwds = {}
+
+            stat, pvalue = etest_poisson_2indep(
+                count1, exposure1, count2, exposure2, value=value,
+                method=method_etest, alternative=alternative, **etest_kwds)
+
+            dist = 'poisson'
+        else:
+            raise ValueError(f'method "{method}" not recognized')
+
+    elif compare == "diff":
+        if value is None:
+            value = 0
+        if method in ['wald']:
+            stat = (rate1 - rate2 - value) / np.sqrt(rate1 / n1 + rate2 / n2)
+            dist = 'normal'
+            "waldccv"
+        elif method in ['waldccv']:
+            stat = (rate1 - rate2 - value)
+            stat /= np.sqrt((count1 + 0.5) / n1**2 + (count2 + 0.5) / n2**2)
+            dist = 'normal'
+        elif method in ['score']:
+            # estimate rates with constraint MLE
+            count_pooled = y1 + y2
+            rate_pooled = count_pooled / (n1 + n2)
+            dt = rate_pooled - value
+            r2_cmle = 0.5 * (dt + np.sqrt(dt**2 + 4 * value * y2 / (n1 + n2)))
+            r1_cmle = r2_cmle + value
+
+            stat = ((rate1 - rate2 - value) /
+                    np.sqrt(r1_cmle / n1 + r2_cmle / n2))
+            rates_cmle = (r1_cmle, r2_cmle)
+            dist = 'normal'
+        elif method.startswith('etest'):
+            if method.endswith('wald'):
+                method_etest = 'wald'
+            else:
+                method_etest = 'score'
+                if method == "etest":
+                    method = method + "-score"
+
+            if etest_kwds is None:
+                etest_kwds = {}
+
+            stat, pvalue = etest_poisson_2indep(
+                count1, exposure1, count2, exposure2, value=value,
+                method=method_etest, compare="diff",
+                alternative=alternative, **etest_kwds)
+
+            dist = 'poisson'
+        else:
+            raise ValueError(f'method "{method}" not recognized')
+    else:
+        raise NotImplementedError('"compare" needs to be ratio or diff')
+
+    if dist == 'normal':
+        stat, pvalue = _zstat_generic2(stat, 1, alternative)
+
+    rates = (rate1, rate2)
+    ratio = rate1 / rate2
+    diff = rate1 - rate2
+    res = HolderTuple(statistic=stat,
+                      pvalue=pvalue,
+                      distribution=dist,
+                      compare=compare,
+                      method=method,
+                      alternative=alternative,
+                      rates=rates,
+                      ratio=ratio,
+                      diff=diff,
+                      value=value,
+                      rates_cmle=rates_cmle,
+                      ratio_null=ratio_null,
+                      )
+    return res


 def _score_diff(y1, n1, y2, n2, value=0, return_cmle=False):
     """score test and cmle for difference of 2 independent poisson rates

     """
-    pass
-
-
-def etest_poisson_2indep(count1, exposure1, count2, exposure2, ratio_null=
-    None, value=None, method='score', compare='ratio', alternative=
-    'two-sided', ygrid=None, y_grid=None):
+    count_pooled = y1 + y2
+    rate1, rate2 = y1 / n1, y2 / n2
+    rate_pooled = count_pooled / (n1 + n2)
+    dt = rate_pooled - value
+    r2_cmle = 0.5 * (dt + np.sqrt(dt**2 + 4 * value * y2 / (n1 + n2)))
+    r1_cmle = r2_cmle + value
+    eps = 1e-20  # avoid zero division in stat_func
+    v = r1_cmle / n1 + r2_cmle / n2
+    stat = (rate1 - rate2 - value) / np.sqrt(v + eps)
+
+    if return_cmle:
+        return stat, r1_cmle, r2_cmle
+    else:
+        return stat
+
+
+def etest_poisson_2indep(count1, exposure1, count2, exposure2, ratio_null=None,
+                         value=None, method='score', compare="ratio",
+                         alternative='two-sided', ygrid=None,
+                         y_grid=None):
     """
     E-test for ratio of two sample Poisson rates.

@@ -527,12 +993,113 @@ def etest_poisson_2indep(count1, exposure1, count2, exposure2, ratio_null=
     Analysis 51 (6): 3085–99. https://doi.org/10.1016/j.csda.2006.02.004.

     """
-    pass
+    y1, n1, y2, n2 = map(np.asarray, [count1, exposure1, count2, exposure2])
+    d = n2 / n1
+
+    eps = 1e-20  # avoid zero division in stat_func
+
+    if compare == "ratio":
+        if ratio_null is None and value is None:
+            # default value
+            value = 1
+        elif ratio_null is not None:
+            warnings.warn("'ratio_null' is deprecated, use 'value' keyword",
+                          FutureWarning)
+            value = ratio_null
+
+        r = value  # rate1 / rate2
+        r_d = r / d
+        rate2_cmle = (y1 + y2) / n2 / (1 + r_d)
+        rate1_cmle = rate2_cmle * r
+
+        if method in ['score']:
+            def stat_func(x1, x2):
+                return (x1 - x2 * r_d) / np.sqrt((x1 + x2) * r_d + eps)
+            # TODO: do I need these? return_results ?
+            # rate2_cmle = (y1 + y2) / n2 / (1 + r_d)
+            # rate1_cmle = rate2_cmle * r
+            # rate1 = rate1_cmle
+            # rate2 = rate2_cmle
+        elif method in ['wald']:
+            def stat_func(x1, x2):
+                return (x1 - x2 * r_d) / np.sqrt(x1 + x2 * r_d**2 + eps)
+            # rate2_mle = y2 / n2
+            # rate1_mle = y1 / n1
+            # rate1 = rate1_mle
+            # rate2 = rate2_mle
+        else:
+            raise ValueError('method not recognized')
+
+    elif compare == "diff":
+        if value is None:
+            value = 0
+        tmp = _score_diff(y1, n1, y2, n2, value=value, return_cmle=True)
+        _, rate1_cmle, rate2_cmle = tmp
+
+        if method in ['score']:
+
+            def stat_func(x1, x2):
+                return _score_diff(x1, n1, x2, n2, value=value)
+
+        elif method in ['wald']:
+
+            def stat_func(x1, x2):
+                rate1, rate2 = x1 / n1, x2 / n2
+                stat = (rate1 - rate2 - value)
+                stat /= np.sqrt(rate1 / n1 + rate2 / n2 + eps)
+                return stat
+
+        else:
+            raise ValueError('method not recognized')
+
+    # The sampling distribution needs to be based on the null hypothesis
+    # use constrained MLE from 'score' calculation
+    rate1 = rate1_cmle
+    rate2 = rate2_cmle
+    mean1 = n1 * rate1
+    mean2 = n2 * rate2
+
+    stat_sample = stat_func(y1, y2)
+
+    if ygrid is not None:
+        warnings.warn("ygrid is deprecated, use y_grid", FutureWarning)
+    y_grid = y_grid if y_grid is not None else ygrid
+
+    # The following uses a fixed truncation for evaluating the probabilities
+    # It will currently only work for small counts, so that sf at truncation
+    # point is small
+    # We can make it depend on the amount of truncated sf.
+    # Some numerical optimization or checks for large means need to be added.
+    if y_grid is None:
+        threshold = stats.poisson.isf(1e-13, max(mean1, mean2))
+        threshold = max(threshold, 100)   # keep at least 100
+        y_grid = np.arange(threshold + 1)
+    else:
+        y_grid = np.asarray(y_grid)
+        if y_grid.ndim != 1:
+            raise ValueError("y_grid needs to be None or 1-dimensional array")
+    pdf1 = stats.poisson.pmf(y_grid, mean1)
+    pdf2 = stats.poisson.pmf(y_grid, mean2)
+
+    stat_space = stat_func(y_grid[:, None], y_grid[None, :])  # broadcasting
+    eps = 1e-15   # correction for strict inequality check
+
+    if alternative in ['two-sided', '2-sided', '2s']:
+        mask = np.abs(stat_space) >= (np.abs(stat_sample) - eps)
+    elif alternative in ['larger', 'l']:
+        mask = stat_space >= (stat_sample - eps)
+    elif alternative in ['smaller', 's']:
+        mask = stat_space <= (stat_sample + eps)
+    else:
+        raise ValueError('invalid alternative')
+
+    pvalue = ((pdf1[:, None] * pdf2[None, :])[mask]).sum()
+    return stat_sample, pvalue


 def tost_poisson_2indep(count1, exposure1, count2, exposure2, low, upp,
-    method='score', compare='ratio'):
-    """Equivalence test based on two one-sided `test_proportions_2indep`
+                        method='score', compare='ratio'):
+    '''Equivalence test based on two one-sided `test_poisson_2indep`

     This assumes that we have two independent poisson samples.

@@ -601,12 +1168,37 @@ def tost_poisson_2indep(count1, exposure1, count2, exposure2, low, upp,
     --------
     test_poisson_2indep
     confint_poisson_2indep
-    """
-    pass
-
-
-def nonequivalence_poisson_2indep(count1, exposure1, count2, exposure2, low,
-    upp, method='score', compare='ratio'):
+    '''
+
+    tt1 = test_poisson_2indep(count1, exposure1, count2, exposure2,
+                              value=low, method=method,
+                              compare=compare,
+                              alternative='larger')
+    tt2 = test_poisson_2indep(count1, exposure1, count2, exposure2,
+                              value=upp, method=method,
+                              compare=compare,
+                              alternative='smaller')
+
+    # idx_max = 1 if tt1.pvalue < tt2.pvalue else 0
+    idx_max = np.asarray(tt1.pvalue < tt2.pvalue, int)
+    statistic = np.choose(idx_max, [tt1.statistic, tt2.statistic])
+    pvalue = np.choose(idx_max, [tt1.pvalue, tt2.pvalue])
+
+    res = HolderTuple(statistic=statistic,
+                      pvalue=pvalue,
+                      method=method,
+                      compare=compare,
+                      equiv_limits=(low, upp),
+                      results_larger=tt1,
+                      results_smaller=tt2,
+                      title="Equivalence test for 2 independent Poisson rates"
+                      )
+
+    return res
+
+
+def nonequivalence_poisson_2indep(count1, exposure1, count2, exposure2,
+                                  low, upp, method='score', compare="ratio"):
     """Test for non-equivalence, minimum effect for poisson.

     This reverses null and alternative hypothesis compared to equivalence
@@ -652,11 +1244,32 @@ def nonequivalence_poisson_2indep(count1, exposure1, count2, exposure2, low,
        Econometrics 7 (2): 21. https://doi.org/10.3390/econometrics7020021.

     """
-    pass
-
-
-def confint_poisson_2indep(count1, exposure1, count2, exposure2, method=
-    'score', compare='ratio', alpha=0.05, method_mover='score'):
+    tt1 = test_poisson_2indep(count1, exposure1, count2, exposure2,
+                              value=low, method=method, compare=compare,
+                              alternative='smaller')
+    tt2 = test_poisson_2indep(count1, exposure1, count2, exposure2,
+                              value=upp, method=method, compare=compare,
+                              alternative='larger')
+
+    # idx_min = 0 if tt1.pvalue < tt2.pvalue else 1
+    idx_min = np.asarray(tt1.pvalue < tt2.pvalue, int)
+    pvalue = 2 * np.minimum(tt1.pvalue, tt2.pvalue)
+    statistic = np.choose(idx_min, [tt1.statistic, tt2.statistic])
+    res = HolderTuple(statistic=statistic,
+                      pvalue=pvalue,
+                      method=method,
+                      results_larger=tt1,
+                      results_smaller=tt2,
+                      title="Equivalence test for 2 independent Poisson rates"
+                      )
+
+    return res
+
+
+def confint_poisson_2indep(count1, exposure1, count2, exposure2,
+                           method='score', compare='ratio', alpha=0.05,
+                           method_mover="score",
+                           ):
     """Confidence interval for ratio or difference of 2 indep poisson rates.

     Parameters
@@ -720,12 +1333,123 @@ def confint_poisson_2indep(count1, exposure1, count2, exposure2, method=
     tuple (low, upp) : confidence limits.

     """
-    pass
-

-def power_poisson_ratio_2indep(rate1, rate2, nobs1, nobs_ratio=1, exposure=
-    1, value=0, alpha=0.05, dispersion=1, alternative='smaller', method_var
-    ='alt', return_results=True):
+    # shortcut names
+    y1, n1, y2, n2 = map(np.asarray, [count1, exposure1, count2, exposure2])
+    rate1, rate2 = y1 / n1, y2 / n2
+    alpha = alpha / 2  # two-sided only
+
+    if compare == "ratio":
+
+        if method == "score":
+            low, upp = _invert_test_confint_2indep(
+                count1, exposure1, count2, exposure2,
+                alpha=alpha * 2,   # check how alpha is defined
+                method="score",
+                compare="ratio",
+                method_start="waldcc"
+                )
+            ci = (low, upp)
+
+        elif method == "wald-log":
+            crit = stats.norm.isf(alpha)
+            c = 0
+            center = (count1 + c) / (count2 + c) * n2 / n1
+            std = np.sqrt(1 / (count1 + c) + 1 / (count2 + c))
+
+            ci = (center * np.exp(- crit * std), center * np.exp(crit * std))
+
+        elif method == "score-log":
+            low, upp = _invert_test_confint_2indep(
+                count1, exposure1, count2, exposure2,
+                alpha=alpha * 2,   # check how alpha is defined
+                method="score-log",
+                compare="ratio",
+                method_start="waldcc"
+                )
+            ci = (low, upp)
+
+        elif method == "waldcc":
+            crit = stats.norm.isf(alpha)
+            center = (count1 + 0.5) / (count2 + 0.5) * n2 / n1
+            std = np.sqrt(1 / (count1 + 0.5) + 1 / (count2 + 0.5))
+
+            ci = (center * np.exp(- crit * std), center * np.exp(crit * std))
+
+        elif method == "sqrtcc":
+            # coded based on Price, Bonett 2000 equ (2.4)
+            crit = stats.norm.isf(alpha)
+            center = np.sqrt((count1 + 0.5) * (count2 + 0.5))
+            std = 0.5 * np.sqrt(count1 + 0.5 + count2 + 0.5 - 0.25 * crit)
+            denom = (count2 + 0.5 - 0.25 * crit**2)
+
+            low_sqrt = (center - crit * std) / denom
+            upp_sqrt = (center + crit * std) / denom
+
+            ci = (low_sqrt**2, upp_sqrt**2)
+
+        elif method == "mover":
+            method_p = method_mover
+            ci1 = confint_poisson(y1, n1, method=method_p, alpha=2*alpha)
+            ci2 = confint_poisson(y2, n2, method=method_p, alpha=2*alpha)
+
+            ci = _mover_confint(rate1, rate2, ci1, ci2, contrast="ratio")
+
+        else:
+            raise ValueError(f'method "{method}" not recognized')
+
+        ci = (np.maximum(ci[0], 0), ci[1])
+
+    elif compare == "diff":
+
+        if method in ['wald']:
+            crit = stats.norm.isf(alpha)
+            center = rate1 - rate2
+            half = crit * np.sqrt(rate1 / n1 + rate2 / n2)
+            ci = center - half, center + half
+
+        elif method in ['waldccv']:
+            crit = stats.norm.isf(alpha)
+            center = rate1 - rate2
+            std = np.sqrt((count1 + 0.5) / n1**2 + (count2 + 0.5) / n2**2)
+            half = crit * std
+            ci = center - half, center + half
+
+        elif method == "score":
+            low, upp = _invert_test_confint_2indep(
+                count1, exposure1, count2, exposure2,
+                alpha=alpha * 2,   # check how alpha is defined
+                method="score",
+                compare="diff",
+                method_start="waldccv"
+                )
+            ci = (low, upp)
+
+        elif method == "mover":
+            method_p = method_mover
+            ci1 = confint_poisson(y1, n1, method=method_p, alpha=2*alpha)
+            ci2 = confint_poisson(y2, n2, method=method_p, alpha=2*alpha)
+
+            ci = _mover_confint(rate1, rate2, ci1, ci2, contrast="diff")
+        else:
+            raise ValueError(f'method "{method}" not recognized')
+    else:
+        raise NotImplementedError('"compare" needs to be ratio or diff')
+
+    return ci
+
+
+def power_poisson_ratio_2indep(
+        rate1, rate2, nobs1,
+        nobs_ratio=1,
+        exposure=1,
+        value=0,
+        alpha=0.05,
+        dispersion=1,
+        alternative="smaller",
+        method_var="alt",
+        return_results=True,
+        ):
     """Power of test of ratio of 2 independent poisson rates.

     This is based on Zhu and Zhu and Lakkis. It does not directly correspond
@@ -800,12 +1524,54 @@ def power_poisson_ratio_2indep(rate1, rate2, nobs1, nobs_ratio=1, exposure=
        376–87. https://doi.org/10.1002/sim.5947.
     .. [3] PASS documentation
     """
-    pass
-
-
-def power_equivalence_poisson_2indep(rate1, rate2, nobs1, low, upp,
-    nobs_ratio=1, exposure=1, alpha=0.05, dispersion=1, method_var='alt',
-    return_results=False):
+    # TODO: avoid possible circular import, check if needed
+    from statsmodels.stats.power import normal_power_het
+
+    rate1, rate2, nobs1 = map(np.asarray, [rate1, rate2, nobs1])
+
+    nobs2 = nobs_ratio * nobs1
+    v1 = dispersion / exposure * (1 / rate1 + 1 / (nobs_ratio * rate2))
+    if method_var == "alt":
+        v0 = v1
+    elif method_var == "score":
+        # nobs_ratio = 1 / nobs_ratio
+        v0 = dispersion / exposure * (1 + value / nobs_ratio)**2
+        v0 /= value / nobs_ratio * (rate1 + (nobs_ratio * rate2))
+    else:
+        raise NotImplementedError(f"method_var {method_var} not recognized")
+
+    std_null = np.sqrt(v0)
+    std_alt = np.sqrt(v1)
+    es = np.log(rate1 / rate2) - np.log(value)
+
+    pow_ = normal_power_het(es, nobs1, alpha, std_null=std_null,
+                            std_alternative=std_alt,
+                            alternative=alternative)
+
+    p_pooled = None  # TODO: replace or remove
+
+    if return_results:
+        res = HolderTuple(
+            power=pow_,
+            p_pooled=p_pooled,
+            std_null=std_null,
+            std_alt=std_alt,
+            nobs1=nobs1,
+            nobs2=nobs2,
+            nobs_ratio=nobs_ratio,
+            alpha=alpha,
+            tuple_=("power",),  # override default
+            )
+        return res
+
+    return pow_
+
+
+def power_equivalence_poisson_2indep(rate1, rate2, nobs1,
+                                     low, upp, nobs_ratio=1,
+                                     exposure=1, alpha=0.05, dispersion=1,
+                                     method_var="alt",
+                                     return_results=False):
     """Power of equivalence test of ratio of 2 independent poisson rates.

     Parameters
@@ -877,27 +1643,118 @@ def power_equivalence_poisson_2indep(rate1, rate2, nobs1, low, upp,
        376–87. https://doi.org/10.1002/sim.5947.
     .. [3] PASS documentation
     """
-    pass
+    rate1, rate2, nobs1 = map(np.asarray, [rate1, rate2, nobs1])
+
+    nobs2 = nobs_ratio * nobs1
+    v1 = dispersion / exposure * (1 / rate1 + 1 / (nobs_ratio * rate2))
+
+    if method_var == "alt":
+        v0_low = v0_upp = v1
+    elif method_var == "score":
+        v0_low = dispersion / exposure * (1 + low * nobs_ratio)**2
+        v0_low /= low * nobs_ratio * (rate1 + (nobs_ratio * rate2))
+        v0_upp = dispersion / exposure * (1 + upp * nobs_ratio)**2
+        v0_upp /= upp * nobs_ratio * (rate1 + (nobs_ratio * rate2))
+    else:
+        raise NotImplementedError(f"method_var {method_var} not recognized")
+
+    es_low = np.log(rate1 / rate2) - np.log(low)
+    es_upp = np.log(rate1 / rate2) - np.log(upp)
+    std_null_low = np.sqrt(v0_low)
+    std_null_upp = np.sqrt(v0_upp)
+    std_alternative = np.sqrt(v1)
+
+    pow_ = _power_equivalence_het(es_low, es_upp, nobs2, alpha=alpha,
+                                  std_null_low=std_null_low,
+                                  std_null_upp=std_null_upp,
+                                  std_alternative=std_alternative)
+
+    if return_results:
+        res = HolderTuple(
+            power=pow_[0],
+            power_margins=pow_[1:],
+            std_null_low=std_null_low,
+            std_null_upp=std_null_upp,
+            std_alt=std_alternative,
+            nobs1=nobs1,
+            nobs2=nobs2,
+            nobs_ratio=nobs_ratio,
+            alpha=alpha,
+            tuple_=("power",),  # override default
+            )
+        return res
+    else:
+        return pow_[0]


 def _power_equivalence_het_v0(es_low, es_upp, nobs, alpha=0.05,
-    std_null_low=None, std_null_upp=None, std_alternative=None):
+                              std_null_low=None,
+                              std_null_upp=None,
+                              std_alternative=None):
     """power for equivalence test

     """
-    pass

+    s0_low = std_null_low
+    s0_upp = std_null_upp
+    s1 = std_alternative

-def _power_equivalence_het(es_low, es_upp, nobs, alpha=0.05, std_null_low=
-    None, std_null_upp=None, std_alternative=None):
+    crit = norm.isf(alpha)
+    pow_ = (
+        norm.cdf((np.sqrt(nobs) * es_low - crit * s0_low) / s1) +
+        norm.cdf((np.sqrt(nobs) * es_upp - crit * s0_upp) / s1) - 1
+        )
+    return pow_
+
+
+def _power_equivalence_het(es_low, es_upp, nobs, alpha=0.05,
+                           std_null_low=None,
+                           std_null_upp=None,
+                           std_alternative=None):
     """power for equivalence test

     """
-    pass
+
+    s0_low = std_null_low
+    s0_upp = std_null_upp
+    s1 = std_alternative
+
+    crit = norm.isf(alpha)
+
+    # Note: rejection region is an interval [low, upp]
+    # Here we compute the complement of the two tail probabilities
+    p1 = norm.sf((np.sqrt(nobs) * es_low - crit * s0_low) / s1)
+    p2 = norm.cdf((np.sqrt(nobs) * es_upp + crit * s0_upp) / s1)
+    pow_ = 1 - (p1 + p2)
+    return pow_, p1, p2
+
+
+def _std_2poisson_power(
+        rate1, rate2, nobs_ratio=1, alpha=0.05,
+        exposure=1,
+        dispersion=1,
+        value=0,
+        method_var="score",
+        ):
+    rates_pooled = (rate1 + rate2 * nobs_ratio) / (1 + nobs_ratio)
+    # v1 = dispersion / exposure * (1 / rate2 + 1 / (nobs_ratio * rate1))
+    if method_var == "alt":
+        v0 = v1 = rate1 + rate2 / nobs_ratio
+    else:
+        # use n1 = 1 as normalization
+        _, r1_cmle, r2_cmle = _score_diff(
+            rate1, 1, rate2 * nobs_ratio, nobs_ratio, value=value,
+            return_cmle=True)
+        v1 = rate1 + rate2 / nobs_ratio
+        v0 = r1_cmle + r2_cmle / nobs_ratio
+    return rates_pooled, np.sqrt(v0), np.sqrt(v1)


 def power_poisson_diff_2indep(rate1, rate2, nobs1, nobs_ratio=1, alpha=0.05,
-    value=0, method_var='score', alternative='two-sided', return_results=True):
+                              value=0,
+                              method_var="score",
+                              alternative='two-sided',
+                              return_results=True):
     """Power of ztest for the difference between two independent poisson rates.

     Parameters
@@ -960,11 +1817,44 @@ def power_poisson_diff_2indep(rate1, rate2, nobs1, nobs_ratio=1, alpha=0.05,
     .. [2] PASS manual chapter 436

     """
-    pass
+    # TODO: avoid possible circular import, check if needed
+    from statsmodels.stats.power import normal_power_het
+
+    rate1, rate2, nobs1 = map(np.asarray, [rate1, rate2, nobs1])
+
+    diff = rate1 - rate2
+    _, std_null, std_alt = _std_2poisson_power(
+        rate1,
+        rate2,
+        nobs_ratio=nobs_ratio,
+        alpha=alpha,
+        value=value,
+        method_var=method_var,
+        )
+
+    pow_ = normal_power_het(diff - value, nobs1, alpha, std_null=std_null,
+                            std_alternative=std_alt,
+                            alternative=alternative)
+
+    if return_results:
+        res = HolderTuple(
+            power=pow_,
+            rates_alt=(rate2 + diff, rate2),
+            std_null=std_null,
+            std_alt=std_alt,
+            nobs1=nobs1,
+            nobs2=nobs_ratio * nobs1,
+            nobs_ratio=nobs_ratio,
+            alpha=alpha,
+            tuple_=("power",),  # override default
+            )
+        return res
+    else:
+        return pow_
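A minimal usage sketch for power_poisson_diff_2indep above; the rates and sample size are made up, and the import path assumes these functions live in statsmodels.stats.rates as in current statsmodels:

    from statsmodels.stats.rates import power_poisson_diff_2indep

    # power to detect a rate difference of 0.1 with 200 observations per group
    pp = power_poisson_diff_2indep(rate1=0.6, rate2=0.5, nobs1=200,
                                   nobs_ratio=1, alpha=0.05, value=0,
                                   method_var="score", alternative="two-sided")
    print(pp.power)  # the HolderTuple also carries std_null, std_alt, nobs2, ...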


 def _var_cmle_negbin(rate1, rate2, nobs_ratio, exposure=1, value=1,
-    dispersion=0):
+                     dispersion=0):
     """
     variance based on constrained cmle, for score test version

@@ -972,12 +1862,38 @@ def _var_cmle_negbin(rate1, rate2, nobs_ratio, exposure=1, value=1,

     value = rate1 / rate2 under the null
     """
-    pass
-
-
-def power_negbin_ratio_2indep(rate1, rate2, nobs1, nobs_ratio=1, exposure=1,
-    value=1, alpha=0.05, dispersion=0.01, alternative='two-sided',
-    method_var='alt', return_results=True):
+    # definitions in Zhu
+    # nobs_ratio = n1 / n0
+    # value = ratio = r1 / r0
+    rate0 = rate2  # control
+    nobs_ratio = 1 / nobs_ratio
+
+    a = - dispersion * exposure * value * (1 + nobs_ratio)
+    b = (dispersion * exposure * (rate0 * value + nobs_ratio * rate1) -
+         (1 + nobs_ratio * value))
+    c = rate0 + nobs_ratio * rate1
+    if dispersion == 0:
+        r0 = -c / b
+    else:
+        r0 = (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * a)
+    r1 = r0 * value
+    v = (1 / exposure / r0 * (1 + 1 / value / nobs_ratio) +
+         (1 + nobs_ratio) / nobs_ratio * dispersion)
+
+    r2 = r0
+    return v * nobs_ratio, r1, r2
+
+
+def power_negbin_ratio_2indep(
+        rate1, rate2, nobs1,
+        nobs_ratio=1,
+        exposure=1,
+        value=1,
+        alpha=0.05,
+        dispersion=0.01,
+        alternative="two-sided",
+        method_var="alt",
+        return_results=True):
     """
     Power of test of ratio of 2 independent negative binomial rates.

@@ -1054,12 +1970,56 @@ def power_negbin_ratio_2indep(rate1, rate2, nobs1, nobs_ratio=1, exposure=1,
        376–87. https://doi.org/10.1002/sim.5947.
     .. [3] PASS documentation
     """
-    pass
-
-
-def power_equivalence_neginb_2indep(rate1, rate2, nobs1, low, upp,
-    nobs_ratio=1, exposure=1, alpha=0.05, dispersion=0, method_var='alt',
-    return_results=False):
+    # TODO: avoid possible circular import, check if needed
+    from statsmodels.stats.power import normal_power_het
+
+    rate1, rate2, nobs1 = map(np.asarray, [rate1, rate2, nobs1])
+
+    nobs2 = nobs_ratio * nobs1
+    v1 = ((1 / rate1 + 1 / (nobs_ratio * rate2)) / exposure +
+          (1 + nobs_ratio) / nobs_ratio * dispersion)
+    if method_var == "alt":
+        v0 = v1
+    elif method_var == "ftotal":
+        v0 = (1 + value * nobs_ratio)**2 / (
+             exposure * nobs_ratio * value * (rate1 + nobs_ratio * rate2))
+        v0 += (1 + nobs_ratio) / nobs_ratio * dispersion
+    elif method_var == "score":
+        v0 = _var_cmle_negbin(rate1, rate2, nobs_ratio,
+                              exposure=exposure, value=value,
+                              dispersion=dispersion)[0]
+    else:
+        raise NotImplementedError(f"method_var {method_var} not recognized")
+
+    std_null = np.sqrt(v0)
+    std_alt = np.sqrt(v1)
+    es = np.log(rate1 / rate2) - np.log(value)
+
+    pow_ = normal_power_het(es, nobs1, alpha, std_null=std_null,
+                            std_alternative=std_alt,
+                            alternative=alternative)
+
+    if return_results:
+        res = HolderTuple(
+            power=pow_,
+            std_null=std_null,
+            std_alt=std_alt,
+            nobs1=nobs1,
+            nobs2=nobs2,
+            nobs_ratio=nobs_ratio,
+            alpha=alpha,
+            tuple_=("power",),  # override default
+            )
+        return res
+
+    return pow_
+
+
+def power_equivalence_neginb_2indep(rate1, rate2, nobs1,
+                                    low, upp, nobs_ratio=1,
+                                    exposure=1, alpha=0.05, dispersion=0,
+                                    method_var="alt",
+                                    return_results=False):
     """
     Power of equivalence test of ratio of 2 indep. negative binomial rates.

@@ -1132,4 +2092,55 @@ def power_equivalence_neginb_2indep(rate1, rate2, nobs1, low, upp,
        376–87. https://doi.org/10.1002/sim.5947.
     .. [3] PASS documentation
     """
-    pass
+    rate1, rate2, nobs1 = map(np.asarray, [rate1, rate2, nobs1])
+
+    nobs2 = nobs_ratio * nobs1
+
+    v1 = ((1 / rate2 + 1 / (nobs_ratio * rate1)) / exposure +
+          (1 + nobs_ratio) / nobs_ratio * dispersion)
+    if method_var == "alt":
+        v0_low = v0_upp = v1
+    elif method_var == "ftotal":
+        v0_low = (1 + low * nobs_ratio)**2 / (
+             exposure * nobs_ratio * low * (rate1 + nobs_ratio * rate2))
+        v0_low += (1 + nobs_ratio) / nobs_ratio * dispersion
+        v0_upp = (1 + upp * nobs_ratio)**2 / (
+             exposure * nobs_ratio * upp * (rate1 + nobs_ratio * rate2))
+        v0_upp += (1 + nobs_ratio) / nobs_ratio * dispersion
+    elif method_var == "score":
+        v0_low = _var_cmle_negbin(rate1, rate2, nobs_ratio,
+                                  exposure=exposure, value=low,
+                                  dispersion=dispersion)[0]
+        v0_upp = _var_cmle_negbin(rate1, rate2, nobs_ratio,
+                                  exposure=exposure, value=upp,
+                                  dispersion=dispersion)[0]
+    else:
+        raise NotImplementedError(f"method_var {method_var} not recognized")
+
+    es_low = np.log(rate1 / rate2) - np.log(low)
+    es_upp = np.log(rate1 / rate2) - np.log(upp)
+    std_null_low = np.sqrt(v0_low)
+    std_null_upp = np.sqrt(v0_upp)
+    std_alternative = np.sqrt(v1)
+
+    pow_ = _power_equivalence_het(es_low, es_upp, nobs1, alpha=alpha,
+                                  std_null_low=std_null_low,
+                                  std_null_upp=std_null_upp,
+                                  std_alternative=std_alternative)
+
+    if return_results:
+        res = HolderTuple(
+            power=pow_[0],
+            power_margins=pow_[1:],
+            std_null_low=std_null_low,
+            std_null_upp=std_null_upp,
+            std_alt=std_alternative,
+            nobs1=nobs1,
+            nobs2=nobs2,
+            nobs_ratio=nobs_ratio,
+            alpha=alpha,
+            tuple_=("power",),  # override default
+            )
+        return res
+    else:
+        return pow_[0]
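A minimal usage sketch for power_equivalence_neginb_2indep above; margins, rates and dispersion are made up, and the import path assumes statsmodels.stats.rates:

    from statsmodels.stats.rates import power_equivalence_neginb_2indep

    # equivalence margins (0.8, 1.25) on the rate ratio, equal group sizes
    pw = power_equivalence_neginb_2indep(rate1=1.0, rate2=1.0, nobs1=100,
                                         low=0.8, upp=1.25, nobs_ratio=1,
                                         exposure=1, alpha=0.05,
                                         dispersion=0.5, method_var="alt")
    print(pw)  # scalar power, since return_results defaults to False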
diff --git a/statsmodels/stats/regularized_covariance.py b/statsmodels/stats/regularized_covariance.py
index 4ee43c25e..d3068e3c3 100644
--- a/statsmodels/stats/regularized_covariance.py
+++ b/statsmodels/stats/regularized_covariance.py
@@ -28,7 +28,20 @@ def _calc_nodewise_row(exog, idx, alpha):
     nodewise_row_i = arg min 1/(2n) ||exog_i - exog_-i gamma||_2^2
                              + alpha ||gamma||_1
     """
-    pass
+
+    p = exog.shape[1]
+    ind = list(range(p))
+    ind.pop(idx)
+
+    # handle array alphas
+    if not np.isscalar(alpha):
+        alpha = alpha[ind]
+
+    tmod = OLS(exog[:, idx], exog[:, ind])
+
+    nodewise_row = tmod.fit_regularized(alpha=alpha).params
+
+    return nodewise_row


 def _calc_nodewise_weight(exog, nodewise_row, idx, alpha):
@@ -59,7 +72,18 @@ def _calc_nodewise_weight(exog, nodewise_row, idx, alpha):
     nodewise_weight_i = sqrt(1/n ||exog,i - exog_-i nodewise_row||_2^2
                              + alpha ||nodewise_row||_1)
     """
-    pass
+
+    n, p = exog.shape
+    ind = list(range(p))
+    ind.pop(idx)
+
+    # handle array alphas
+    if not np.isscalar(alpha):
+        alpha = alpha[ind]
+
+    d = np.linalg.norm(exog[:, idx] - exog[:, ind].dot(nodewise_row))**2
+    d = np.sqrt(d / n + alpha * np.linalg.norm(nodewise_row, 1))
+    return d


 def _calc_approx_inv_cov(nodewise_row_l, nodewise_weight_l):
@@ -87,7 +111,17 @@ def _calc_approx_inv_cov(nodewise_row_l, nodewise_weight_l):

     approx_inv_cov_j = - 1 / nww_j [nwr_j,1,...,1,...nwr_j,p]
     """
-    pass
+
+    p = len(nodewise_weight_l)
+
+    approx_inv_cov = -np.eye(p)
+    for idx in range(p):
+        ind = list(range(p))
+        ind.pop(idx)
+        approx_inv_cov[idx, ind] = nodewise_row_l[idx]
+    approx_inv_cov *= -1 / nodewise_weight_l[:, None]**2
+
+    return approx_inv_cov


 class RegularizedInvCovariance:
@@ -109,6 +143,7 @@ class RegularizedInvCovariance:
     """

     def __init__(self, exog):
+
         self.exog = exog

     def fit(self, alpha=0):
@@ -120,4 +155,27 @@ class RegularizedInvCovariance:
         alpha : scalar
             Regularizing constant
         """
-        pass
+
+        n, p = self.exog.shape
+
+        nodewise_row_l = []
+        nodewise_weight_l = []
+
+        for idx in range(p):
+            nodewise_row = _calc_nodewise_row(self.exog, idx, alpha)
+            nodewise_row_l.append(nodewise_row)
+
+            nodewise_weight = _calc_nodewise_weight(self.exog, nodewise_row,
+                                                    idx, alpha)
+            nodewise_weight_l.append(nodewise_weight)
+
+        nodewise_row_l = np.array(nodewise_row_l)
+        nodewise_weight_l = np.array(nodewise_weight_l)
+
+        approx_inv_cov = _calc_approx_inv_cov(nodewise_row_l,
+                                              nodewise_weight_l)
+
+        self._approx_inv_cov = approx_inv_cov
+
+    def approx_inv_cov(self):
+        return self._approx_inv_cov
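A minimal usage sketch for the nodewise-regression estimator above (made-up data):

    import numpy as np
    from statsmodels.stats.regularized_covariance import RegularizedInvCovariance

    rng = np.random.default_rng(0)
    exog = rng.standard_normal((200, 5))   # 200 observations, 5 variables

    regcov = RegularizedInvCovariance(exog)
    regcov.fit(alpha=0.1)                  # l1 penalty for each nodewise regression
    theta = regcov.approx_inv_cov()        # (5, 5) approximate inverse covariance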
diff --git a/statsmodels/stats/robust_compare.py b/statsmodels/stats/robust_compare.py
index 42c53203c..1460e795d 100644
--- a/statsmodels/stats/robust_compare.py
+++ b/statsmodels/stats/robust_compare.py
@@ -1,12 +1,17 @@
+# -*- coding: utf-8 -*-
 """Anova k-sample comparison without and with trimming

 Created on Sun Jun 09 23:51:34 2013

 Author: Josef Perktold
 """
+
 import numbers
 import numpy as np

+# the trimboth and trim_mean are taken from scipy.stats.stats
+# and enhanced by axis
+

 def trimboth(a, proportiontocut, axis=0):
     """
@@ -44,7 +49,19 @@ def trimboth(a, proportiontocut, axis=0):
     (16,)

     """
-    pass
+    a = np.asarray(a)
+    if axis is None:
+        a = a.ravel()
+        axis = 0
+    nobs = a.shape[axis]
+    lowercut = int(proportiontocut * nobs)
+    uppercut = nobs - lowercut
+    if (lowercut >= uppercut):
+        raise ValueError("Proportion too big.")
+
+    sl = [slice(None)] * a.ndim
+    sl[axis] = slice(lowercut, uppercut)
+    return a[tuple(sl)]


 def trim_mean(a, proportiontocut, axis=0):
@@ -72,7 +89,8 @@ def trim_mean(a, proportiontocut, axis=0):
         Mean of trimmed array.

     """
-    pass
+    newa = trimboth(np.sort(a, axis), proportiontocut, axis=axis)
+    return np.mean(newa, axis=axis)
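For example, with a 10% trimming proportion on 20 values the two smallest and two largest observations are dropped before averaging:

    import numpy as np
    from statsmodels.stats.robust_compare import trim_mean

    trim_mean(np.arange(20.), 0.1)   # mean of the values 2, 3, ..., 17 -> 9.5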


 class TrimmedMean:
@@ -99,68 +117,93 @@ class TrimmedMean:

     def __init__(self, data, fraction, is_sorted=False, axis=0):
         self.data = np.asarray(data)
+        # TODO: add pandas handling, maybe not if this stays internal
+
         self.axis = axis
         self.fraction = fraction
         self.nobs = nobs = self.data.shape[axis]
         self.lowercut = lowercut = int(fraction * nobs)
         self.uppercut = uppercut = nobs - lowercut
-        if lowercut >= uppercut:
-            raise ValueError('Proportion too big.')
+        if (lowercut >= uppercut):
+            raise ValueError("Proportion too big.")
         self.nobs_reduced = nobs - 2 * lowercut
+
         self.sl = [slice(None)] * self.data.ndim
         self.sl[axis] = slice(self.lowercut, self.uppercut)
+        # numpy requires now tuple for indexing, not list
         self.sl = tuple(self.sl)
         if not is_sorted:
             self.data_sorted = np.sort(self.data, axis=axis)
         else:
             self.data_sorted = self.data
+
+        # this only works for axis=0
         self.lowerbound = np.take(self.data_sorted, lowercut, axis=axis)
         self.upperbound = np.take(self.data_sorted, uppercut - 1, axis=axis)
+        # self.lowerbound = self.data_sorted[lowercut]
+        # self.upperbound = self.data_sorted[uppercut - 1]

     @property
     def data_trimmed(self):
         """numpy array of trimmed and sorted data
         """
-        pass
+        # returns a view
+        return self.data_sorted[self.sl]

-    @property
+    @property  # cache
     def data_winsorized(self):
         """winsorized data
         """
-        pass
+        lb = np.expand_dims(self.lowerbound, self.axis)
+        ub = np.expand_dims(self.upperbound, self.axis)
+        return np.clip(self.data_sorted, lb, ub)

     @property
     def mean_trimmed(self):
         """mean of trimmed data
         """
-        pass
+        return np.mean(self.data_sorted[tuple(self.sl)], self.axis)

     @property
     def mean_winsorized(self):
         """mean of winsorized data
         """
-        pass
+        return np.mean(self.data_winsorized, self.axis)

     @property
     def var_winsorized(self):
         """variance of winsorized data
         """
-        pass
+        # hardcoded ddof = 1
+        return np.var(self.data_winsorized, ddof=1, axis=self.axis)

     @property
     def std_mean_trimmed(self):
         """standard error of trimmed mean
         """
-        pass
+        se = np.sqrt(self.var_winsorized / self.nobs_reduced)
+        # trimming creates correlation across trimmed observations
+        # trimming is based on order statistics of the data
+        # wilcox 2012, p.61
+        se *= np.sqrt(self.nobs / self.nobs_reduced)
+        return se

     @property
     def std_mean_winsorized(self):
         """standard error of winsorized mean
         """
-        pass
-
-    def ttest_mean(self, value=0, transform='trimmed', alternative='two-sided'
-        ):
+        # the following matches Wilcox, WRS2
+        std_ = np.sqrt(self.var_winsorized / self.nobs)
+        std_ *= (self.nobs - 1) / (self.nobs_reduced - 1)
+        # old version
+        # tm = self
+        # formula from an old SAS manual page, simplified
+        # std_ = np.sqrt(tm.var_winsorized / (tm.nobs_reduced - 1) *
+        #               (tm.nobs - 1.) / tm.nobs)
+        return std_
+
+    def ttest_mean(self, value=0, transform='trimmed',
+                   alternative='two-sided'):
         """
         One sample t-test for trimmed or Winsorized mean

@@ -180,18 +223,37 @@ class TrimmedMean:
         statistic. The approximation is valid if the underlying distribution
         is symmetric.
         """
-        pass
+        import statsmodels.stats.weightstats as smws
+        df = self.nobs_reduced - 1
+        if transform == 'trimmed':
+            mean_ = self.mean_trimmed
+            std_ = self.std_mean_trimmed
+        elif transform == 'winsorized':
+            mean_ = self.mean_winsorized
+            std_ = self.std_mean_winsorized
+        else:
+            raise ValueError("transform can only be 'trimmed' or 'winsorized'")
+
+        res = smws._tstat_generic(mean_, 0, std_,
+                                  df, alternative=alternative, diff=value)
+        return res + (df,)

     def reset_fraction(self, frac):
         """create a TrimmedMean instance with a new trimming fraction

         This reuses the sorted array from the current instance.
         """
-        pass
+        tm = TrimmedMean(self.data_sorted, frac, is_sorted=True,
+                         axis=self.axis)
+        tm.data = self.data
+        # TODO: this will not work if there is processing of meta-information
+        #       in __init__,
+        #       for example storing a pandas DataFrame or Series index
+        return tm


 def scale_transform(data, center='median', transform='abs', trim_frac=0.2,
-    axis=0):
+                    axis=0):
     """Transform data for variance comparison for Levene type tests

     Parameters
@@ -215,4 +277,29 @@ def scale_transform(data, center='median', transform='abs', trim_frac=0.2,
         transformed data in the same shape as the original data.

     """
-    pass
+    x = np.asarray(data)  # x is shorthand from earlier code
+
+    if transform == 'abs':
+        tfunc = np.abs
+    elif transform == 'square':
+        tfunc = lambda x: x * x  # noqa
+    elif transform == 'identity':
+        tfunc = lambda x: x  # noqa
+    elif callable(transform):
+        tfunc = transform
+    else:
+        raise ValueError('transform should be abs, square or identity')
+
+    if center == 'median':
+        res = tfunc(x - np.expand_dims(np.median(x, axis=axis), axis))
+    elif center == 'mean':
+        res = tfunc(x - np.expand_dims(np.mean(x, axis=axis), axis))
+    elif center == 'trimmed':
+        center = trim_mean(x, trim_frac, axis=axis)
+        res = tfunc(x - np.expand_dims(center, axis))
+    elif isinstance(center, numbers.Number):
+        res = tfunc(x - center)
+    else:
+        raise ValueError('center should be median, mean or trimmed')
+
+    return res
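A minimal usage sketch for scale_transform above (made-up data); the transformed values can then be compared across groups with an anova-type test of scale:

    import numpy as np
    from statsmodels.stats.robust_compare import scale_transform

    rng = np.random.default_rng(0)
    x = rng.standard_normal((50, 3))
    # absolute deviations from each column's 20%-trimmed mean, same shape as x
    z = scale_transform(x, center='trimmed', transform='abs',
                        trim_frac=0.2, axis=0)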
diff --git a/statsmodels/stats/sandwich_covariance.py b/statsmodels/stats/sandwich_covariance.py
index d658715b3..022cbe10b 100644
--- a/statsmodels/stats/sandwich_covariance.py
+++ b/statsmodels/stats/sandwich_covariance.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Sandwich covariance estimators


@@ -101,12 +102,20 @@ Statistics 90, no. 3 (2008): 414–427.

 """
 import numpy as np
+
 from statsmodels.tools.grouputils import combine_indices, group_sums
 from statsmodels.stats.moment_helpers import se_cov
+
 __all__ = ['cov_cluster', 'cov_cluster_2groups', 'cov_hac', 'cov_nw_panel',
-    'cov_white_simple', 'cov_hc0', 'cov_hc1', 'cov_hc2', 'cov_hc3',
-    'se_cov', 'weights_bartlett', 'weights_uniform']
-"""
+           'cov_white_simple',
+           'cov_hc0', 'cov_hc1', 'cov_hc2', 'cov_hc3',
+           'se_cov', 'weights_bartlett', 'weights_uniform']
+
+
+
+
+#----------- from linear_model.RegressionResults
+'''
     HC0_se
         White's (1980) heteroskedasticity robust standard errors.
         Defined as sqrt(diag(X.T X)^(-1)X.T diag(e_i^(2)) X(X.T X)^(-1)
@@ -146,57 +155,111 @@ __all__ = ['cov_cluster', 'cov_cluster_2groups', 'cov_hac', 'cov_nw_panel',
         which is in this case is resid^(2)/(1-h_ii)^(2).  HCCM matrices are
         only appropriate for OLS.

-"""
-
+'''

+# Note: HCCM stands for Heteroskedasticity Consistent Covariance Matrix
 def _HCCM(results, scale):
-    """
+    '''
     sandwich with pinv(x) * diag(scale) * pinv(x).T

     where pinv(x) = (X'X)^(-1) X
     and scale is (nobs,)
-    """
-    pass
-
+    '''
+    H = np.dot(results.model.pinv_wexog,
+        scale[:,None]*results.model.pinv_wexog.T)
+    return H

 def cov_hc0(results):
     """
     See statsmodels.RegressionResults
     """
-    pass

+    het_scale = results.resid**2 # or whitened residuals? only OLS?
+    cov_hc0 = _HCCM(results, het_scale)
+
+    return cov_hc0

 def cov_hc1(results):
     """
     See statsmodels.RegressionResults
     """
-    pass

+    het_scale = results.nobs/(results.df_resid)*(results.resid**2)
+    cov_hc1 = _HCCM(results, het_scale)
+    return cov_hc1

 def cov_hc2(results):
     """
     See statsmodels.RegressionResults
     """
-    pass

+    # probably could be optimized
+    h = np.diag(np.dot(results.model.exog,
+                          np.dot(results.normalized_cov_params,
+                          results.model.exog.T)))
+    het_scale = results.resid**2/(1-h)
+    cov_hc2_ = _HCCM(results, het_scale)
+    return cov_hc2_

 def cov_hc3(results):
     """
     See statsmodels.RegressionResults
     """
-    pass

+    # above probably could be optimized to only calc the diag
+    h = np.diag(np.dot(results.model.exog,
+                          np.dot(results.normalized_cov_params,
+                          results.model.exog.T)))
+    het_scale=(results.resid/(1-h))**2
+    cov_hc3_ = _HCCM(results, het_scale)
+    return cov_hc3_
+
+#---------------------------------------

 def _get_sandwich_arrays(results, cov_type=''):
     """Helper function to get scores from results

     Parameters
     """
-    pass
+
+    if isinstance(results, tuple):
+        # assume we have jac and hessian_inv
+        jac, hessian_inv = results
+        xu = jac = np.asarray(jac)
+        hessian_inv = np.asarray(hessian_inv)
+    elif hasattr(results, 'model'):
+        if hasattr(results, '_results'):
+            # remove wrapper
+            results = results._results
+        # assume we have a results instance
+        if hasattr(results.model, 'jac'):
+            xu = results.model.jac(results.params)
+            hessian_inv = np.linalg.inv(results.model.hessian(results.params))
+        elif hasattr(results.model, 'score_obs'):
+            xu = results.model.score_obs(results.params)
+            hessian_inv = np.linalg.inv(results.model.hessian(results.params))
+        else:
+            xu = results.model.wexog * results.wresid[:, None]
+
+            hessian_inv = np.asarray(results.normalized_cov_params)
+
+        # experimental support for freq_weights
+        if hasattr(results.model, 'freq_weights') and not cov_type == 'clu':
+            # we do not want to square the weights in the covariance calculations
+            # assumes that freq_weights are incorporated in score_obs or equivalent
+            # assumes xu/score_obs is 2D
+            # temporary asarray
+            xu /= np.sqrt(np.asarray(results.model.freq_weights)[:, None])
+
+    else:
+        raise ValueError('need either tuple of (jac, hessian_inv) or results '
+                         'instance')
+
+    return xu, hessian_inv


 def _HCCM1(results, scale):
-    """
+    '''
     sandwich with pinv(x) * scale * pinv(x).T

     where pinv(x) = (X'X)^(-1) X
@@ -214,12 +277,17 @@ def _HCCM1(results, scale):
     H : ndarray (k_vars, k_vars)
         robust covariance matrix for the parameter estimates

-    """
-    pass
-
+    '''
+    if scale.ndim == 1:
+        H = np.dot(results.model.pinv_wexog,
+                   scale[:,None]*results.model.pinv_wexog.T)
+    else:
+        H = np.dot(results.model.pinv_wexog,
+                   np.dot(scale, results.model.pinv_wexog.T))
+    return H

 def _HCCM2(hessian_inv, scale):
-    """
+    '''
     sandwich with (X'X)^(-1) * scale * (X'X)^(-1)

     scale is (kvars, kvars)
@@ -237,12 +305,17 @@ def _HCCM2(hessian_inv, scale):
     H : ndarray (k_vars, k_vars)
         robust covariance matrix for the parameter estimates

-    """
-    pass
+    '''
+    if scale.ndim == 1:
+        scale = scale[:,None]

+    xxi = hessian_inv
+    H = np.dot(np.dot(xxi, scale), xxi.T)
+    return H

+#TODO: other kernels, move ?
 def weights_bartlett(nlags):
-    """Bartlett weights for HAC
+    '''Bartlett weights for HAC

     this will be moved to another module

@@ -256,12 +329,13 @@ def weights_bartlett(nlags):
     kernel : ndarray, (nlags+1,)
         weights for Bartlett kernel

-    """
-    pass
+    '''

+    #with lag zero
+    return 1 - np.arange(nlags+1)/(nlags+1.)
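For example, the Bartlett weights decline linearly in the lag:

    weights_bartlett(3)   # array([1.  , 0.75, 0.5 , 0.25]) for lags 0, 1, 2, 3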

 def weights_uniform(nlags):
-    """uniform weights for HAC
+    '''uniform weights for HAC

     this will be moved to another module

@@ -275,15 +349,18 @@ def weights_uniform(nlags):
     kernel : ndarray, (nlags+1,)
         weights for uniform kernel

-    """
-    pass
+    '''
+
+    #with lag zero
+    return np.ones(nlags+1)


-kernel_dict = {'bartlett': weights_bartlett, 'uniform': weights_uniform}
+kernel_dict = {'bartlett': weights_bartlett,
+               'uniform': weights_uniform}


 def S_hac_simple(x, nlags=None, weights_func=weights_bartlett):
-    """inner covariance matrix for HAC (Newey, West) sandwich
+    '''inner covariance matrix for HAC (Newey, West) sandwich

     assumes we have a single time series with zero axis consecutive, equal
     spaced time periods
@@ -311,12 +388,26 @@ def S_hac_simple(x, nlags=None, weights_func=weights_bartlett):

     options might change when other kernels besides Bartlett are available.

-    """
-    pass
+    '''
+
+    if x.ndim == 1:
+        x = x[:,None]
+    n_periods = x.shape[0]
+    if nlags is None:
+        nlags = int(np.floor(4 * (n_periods / 100.)**(2./9.)))
+
+    weights = weights_func(nlags)
+
+    S = weights[0] * np.dot(x.T, x)  #weights[0] just for completeness, is 1

+    for lag in range(1, nlags+1):
+        s = np.dot(x[lag:].T, x[:-lag])
+        S += weights[lag] * (s + s.T)
+
+    return S

 def S_white_simple(x):
-    """inner covariance matrix for White heteroscedastistity sandwich
+    '''inner covariance matrix for White heteroscedasticity sandwich


     Parameters
@@ -333,12 +424,15 @@ def S_white_simple(x):
     -----
     this is just dot(X.T, X)

-    """
-    pass
+    '''
+    if x.ndim == 1:
+        x = x[:,None]
+
+    return np.dot(x.T, x)


 def S_hac_groupsum(x, time, nlags=None, weights_func=weights_bartlett):
-    """inner covariance matrix for HAC over group sums sandwich
+    '''inner covariance matrix for HAC over group sums sandwich

     This assumes we have complete equal spaced time periods.
     The number of time periods per group need not be the same, but we need
@@ -371,29 +465,39 @@ def S_hac_groupsum(x, time, nlags=None, weights_func=weights_bartlett):
     Daniel Hoechle, xtscc paper
     Driscoll and Kraay

-    """
-    pass
+    '''
+    #needs groupsums
+
+    x_group_sums = group_sums(x, time).T #TODO: transpose return in group_sums
+
+    return S_hac_simple(x_group_sums, nlags=nlags, weights_func=weights_func)


 def S_crosssection(x, group):
-    """inner covariance matrix for White on group sums sandwich
+    '''inner covariance matrix for White on group sums sandwich

     I guess for a single categorical group only,
     categorical group, can also be the product/intersection of groups

     This is used by cov_cluster and indirectly verified

-    """
-    pass
+    '''
+    x_group_sums = group_sums(x, group).T  #TODO: why transposed
+
+    return S_white_simple(x_group_sums)


 def cov_crosssection_0(results, group):
-    """this one is still wrong, use cov_cluster instead"""
-    pass
+    '''this one is still wrong, use cov_cluster instead'''

+    #TODO: currently used version of groupsums requires 2d resid
+    scale = S_crosssection(results.resid[:,None], group)
+    scale = np.squeeze(scale)
+    cov = _HCCM1(results, scale)
+    return cov

 def cov_cluster(results, group, use_correction=True):
-    """cluster robust covariance matrix
+    '''cluster robust covariance matrix

     Calculates sandwich covariance matrix for a single cluster, i.e. grouped
     variables.
@@ -415,12 +519,30 @@ def cov_cluster(results, group, use_correction=True):
     -----
     same result as Stata in UCLA example and same as Peterson

-    """
-    pass
+    '''
+    #TODO: currently used version of groupsums requires 2d resid
+    xu, hessian_inv = _get_sandwich_arrays(results, cov_type='clu')

+    if not hasattr(group, 'dtype') or group.dtype != np.dtype('int'):
+        clusters, group = np.unique(group, return_inverse=True)
+    else:
+        clusters = np.unique(group)
+
+    scale = S_crosssection(xu, group)
+
+    nobs, k_params = xu.shape
+    n_groups = len(clusters) #replace with stored group attributes if available
+
+    cov_c = _HCCM2(hessian_inv, scale)
+
+    if use_correction:
+        cov_c *= (n_groups / (n_groups - 1.) *
+                  ((nobs-1.) / float(nobs - k_params)))
+
+    return cov_c
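A minimal usage sketch for cov_cluster above with an OLS fit and made-up cluster labels; se_cov (imported at the top of this module) turns the covariance into standard errors:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.sandwich_covariance import cov_cluster, se_cov

    rng = np.random.default_rng(0)
    n = 200
    x = sm.add_constant(rng.standard_normal(n))
    y = x @ np.array([1.0, 0.5]) + rng.standard_normal(n)
    group = rng.integers(0, 10, size=n)      # 10 hypothetical clusters

    res = sm.OLS(y, x).fit()
    cov_c = cov_cluster(res, group)          # cluster robust covariance matrix
    bse_c = se_cov(cov_c)                    # cluster robust standard errors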

 def cov_cluster_2groups(results, group, group2=None, use_correction=True):
-    """cluster robust covariance matrix for two groups/clusters
+    '''cluster robust covariance matrix for two groups/clusters

     Parameters
     ----------
@@ -446,12 +568,38 @@ def cov_cluster_2groups(results, group, group2=None, use_correction=True):
     -----

     verified against Peterson's table, (4 decimal print precision)
-    """
-    pass
+    '''
+
+    if group2 is None:
+        if group.ndim !=2 or group.shape[1] != 2:
+            raise ValueError('if group2 is not given, then group needs to be ' +
+                             'an array with two columns')
+        group0 = group[:, 0]
+        group1 = group[:, 1]
+    else:
+        group0 = group
+        group1 = group2
+        group = (group0, group1)
+
+
+    cov0 = cov_cluster(results, group0, use_correction=use_correction)
+    #[0] because we get still also returns bse
+    cov1 = cov_cluster(results, group1, use_correction=use_correction)
+
+    # cov of cluster formed by intersection of two groups
+    cov01 = cov_cluster(results,
+                        combine_indices(group)[0],
+                        use_correction=use_correction)
+
+    #robust cov matrix for union of groups
+    cov_both = cov0 + cov1 - cov01
+
+    #return all three (for now?)
+    return cov_both, cov0, cov1


 def cov_white_simple(results, use_correction=True):
-    """
+    '''
     heteroscedasticity robust covariance matrix (White)

     Parameters
@@ -477,13 +625,22 @@ def cov_white_simple(results, use_correction=True):
     cov_hc1, cov_hc2, cov_hc3 : heteroscedasticity robust covariance matrices
         with small sample corrections

-    """
-    pass
+    '''
+    xu, hessian_inv = _get_sandwich_arrays(results)
+    sigma = S_white_simple(xu)
+
+    cov_w = _HCCM2(hessian_inv, sigma)  #add bread to sandwich
+
+    if use_correction:
+        nobs, k_params = xu.shape
+        cov_w *= nobs / float(nobs - k_params)
+
+    return cov_w


 def cov_hac_simple(results, nlags=None, weights_func=weights_bartlett,
-    use_correction=True):
-    """
+                   use_correction=True):
+    '''
     heteroscedasticity and autocorrelation robust covariance matrix (Newey-West)

     Assumes we have a single time series with zero axis consecutive, equal
@@ -514,34 +671,66 @@ def cov_hac_simple(results, nlags=None, weights_func=weights_bartlett,

     options might change when other kernels besides Bartlett are available.

-    """
-    pass
+    '''
+    xu, hessian_inv = _get_sandwich_arrays(results)
+    sigma = S_hac_simple(xu, nlags=nlags, weights_func=weights_func)
+
+    cov_hac = _HCCM2(hessian_inv, sigma)
+
+    if use_correction:
+        nobs, k_params = xu.shape
+        cov_hac *= nobs / float(nobs - k_params)

+    return cov_hac

-cov_hac = cov_hac_simple
+cov_hac = cov_hac_simple   #alias for users

+#---------------------- use time lags corrected for groups
+#the following were copied from a different experimental script,
+#groupidx is tuple, observations assumed to be stacked by group member and
+#sorted by time, equal number of periods is not required, but equal spacing is.
+#I think this is pure within group HAC: apply HAC to each group member
+#separately

 def lagged_groups(x, lag, groupidx):
-    """
+    '''
     assumes sorted by time, groupidx is tuple of start and end values
     not optimized, just to get a working version, loop over groups
-    """
-    pass
+    '''
+    out0 = []
+    out_lagged = []
+    for l,u in groupidx:
+        if l+lag < u: #group is longer than lag
+            out0.append(x[l+lag:u])
+            out_lagged.append(x[l:u-lag])
+
+    if out0 == []:
+        raise ValueError('all groups are empty after taking lags')
+    #return out0, out_lagged
+    return np.vstack(out0), np.vstack(out_lagged)
+


 def S_nw_panel(xw, weights, groupidx):
-    """inner covariance matrix for HAC for panel data
+    '''inner covariance matrix for HAC for panel data

     no denominator nobs used

     no reference for this, just accounting for time indices
-    """
-    pass
+    '''
+    nlags = len(weights)-1
+
+    S = weights[0] * np.dot(xw.T, xw)  #weights just for completeness
+    for lag in range(1, nlags+1):
+        xw0, xwlag = lagged_groups(xw, lag, groupidx)
+        s = np.dot(xw0.T, xwlag)
+        S += weights[lag] * (s + s.T)
+    return S


 def cov_nw_panel(results, nlags, groupidx, weights_func=weights_bartlett,
-    use_correction='hac'):
-    """Panel HAC robust covariance matrix
+                 use_correction='hac'):
+    '''Panel HAC robust covariance matrix

     Assumes we have a panel of time series with consecutive, equal spaced time
     periods. Data is assumed to be in long format with time series of each
@@ -587,13 +776,31 @@ def cov_nw_panel(results, nlags, groupidx, weights_func=weights_bartlett,
     Options might change when other kernels besides Bartlett and uniform are
     available.

-    """
-    pass
+    '''
+    if nlags == 0: #so we can reproduce HC0 White
+        weights = [1, 0]  #to avoid the scalar check in hac_nw
+    else:
+        weights = weights_func(nlags)
+
+    xu, hessian_inv = _get_sandwich_arrays(results)
+
+    S_hac = S_nw_panel(xu, weights, groupidx)
+    cov_hac = _HCCM2(hessian_inv, S_hac)
+    if use_correction:
+        nobs, k_params = xu.shape
+        if use_correction == 'hac':
+            cov_hac *= nobs / float(nobs - k_params)
+        elif use_correction in ['c', 'clu', 'cluster']:
+            n_groups = len(groupidx)
+            cov_hac *= n_groups / (n_groups - 1.)
+            cov_hac *= ((nobs-1.) / float(nobs - k_params))
+
+    return cov_hac


 def cov_nw_groupsum(results, nlags, time, weights_func=weights_bartlett,
-    use_correction=0):
-    """Driscoll and Kraay Panel robust covariance matrix
+                 use_correction=0):
+    '''Driscoll and Kraay Panel robust covariance matrix

     Robust covariance matrix for panel data of Driscoll and Kraay.

@@ -649,5 +856,20 @@ def cov_nw_groupsum(results, nlags, time, weights_func=weights_bartlett,
     Daniel Hoechle, xtscc paper
     Driscoll and Kraay

-    """
-    pass
+    '''
+
+    xu, hessian_inv = _get_sandwich_arrays(results)
+
+    #S_hac = S_nw_panel(xw, weights, groupidx)
+    S_hac = S_hac_groupsum(xu, time, nlags=nlags, weights_func=weights_func)
+    cov_hac = _HCCM2(hessian_inv, S_hac)
+    if use_correction:
+        nobs, k_params = xu.shape
+        if use_correction == 'hac':
+            cov_hac *= nobs / float(nobs - k_params)
+        elif use_correction in ['c', 'cluster']:
+            n_groups = len(np.unique(time))
+            cov_hac *= n_groups / (n_groups - 1.)
+            cov_hac *= ((nobs-1.) / float(nobs - k_params))
+
+    return cov_hac
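A minimal usage sketch for the HAC estimator cov_hac_simple above (made-up time-series regression):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.sandwich_covariance import cov_hac_simple, se_cov

    rng = np.random.default_rng(0)
    n = 300
    x = sm.add_constant(np.arange(n) / n)
    y = x @ np.array([1.0, 2.0]) + rng.standard_normal(n)

    res = sm.OLS(y, x).fit()
    cov = cov_hac_simple(res, nlags=4)   # Newey-West covariance, Bartlett kernel
    bse_hac = se_cov(cov)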
diff --git a/statsmodels/stats/stattools.py b/statsmodels/stats/stattools.py
index b9b7f5815..0a418a897 100644
--- a/statsmodels/stats/stattools.py
+++ b/statsmodels/stats/stattools.py
@@ -5,13 +5,14 @@ Notes
 -----
 These functions have not been formally tested.
 """
+
 from scipy import stats
 import numpy as np
 from statsmodels.tools.sm_exceptions import ValueWarning


 def durbin_watson(resids, axis=0):
-    """
+    r"""
     Calculates the Durbin-Watson statistic.

     Parameters
@@ -35,7 +36,7 @@ def durbin_watson(resids, axis=0):

     .. math::

-       \\sum_{t=2}^T((e_t - e_{t-1})^2)/\\sum_{t=1}^Te_t^2
+       \sum_{t=2}^T((e_t - e_{t-1})^2)/\sum_{t=1}^Te_t^2

     The test statistic is approximately equal to 2*(1-r) where ``r`` is the
     sample autocorrelation of the residuals. Thus, for r == 0, indicating no
@@ -44,7 +45,10 @@ def durbin_watson(resids, axis=0):
     evidence for positive serial correlation. The closer to 4, the more
     evidence for negative serial correlation.
     """
-    pass
+    resids = np.asarray(resids)
+    diff_resids = np.diff(resids, 1, axis=axis)
+    dw = np.sum(diff_resids**2, axis=axis) / np.sum(resids**2, axis=axis)
+    return dw
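A minimal usage sketch (hypothetical residuals):

    import numpy as np
    from statsmodels.stats.stattools import durbin_watson

    resid = np.array([0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2])
    durbin_watson(resid)   # values near 2 suggest little first-order autocorrelation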


 def omni_normtest(resids, axis=0):
@@ -61,11 +65,21 @@ def omni_normtest(resids, axis=0):
     -------
     Chi^2 score, two-tail probability
     """
-    pass
+    # TODO: change to exception in summary branch and catch in summary()
+    #   behavior changed between scipy 0.9 and 0.10
+    resids = np.asarray(resids)
+    n = resids.shape[axis]
+    if n < 8:
+        from warnings import warn
+        warn("omni_normtest is not valid with less than 8 observations; %i "
+             "samples were given." % int(n), ValueWarning)
+        return np.nan, np.nan
+
+    return stats.normaltest(resids, axis=axis)


 def jarque_bera(resids, axis=0):
-    """
+    r"""
     The Jarque-Bera test of normality.

     Parameters
@@ -94,7 +108,7 @@ def jarque_bera(resids, axis=0):
     The Jarque-Bera test statistic tests the null that the data is normally
     distributed against an alternative that the data follow some other
     distribution. The test statistic is based on two moments of the data,
-    the skewness, and the kurtosis, and has an asymptotic :math:`\\chi^2_2`
+    the skewness, and the kurtosis, and has an asymptotic :math:`\chi^2_2`
     distribution.

     The test statistic is defined
@@ -104,7 +118,19 @@ def jarque_bera(resids, axis=0):
     where n is the number of data points, S is the sample skewness, and K is
     the sample kurtosis of the data.
     """
-    pass
+    resids = np.atleast_1d(np.asarray(resids, dtype=float))
+    if resids.size < 2:
+        raise ValueError("resids must contain at least 2 elements")
+    # Calculate residual skewness and kurtosis
+    skew = stats.skew(resids, axis=axis)
+    kurtosis = 3 + stats.kurtosis(resids, axis=axis)
+
+    # Calculate the Jarque-Bera test for normality
+    n = resids.shape[axis]
+    jb = (n / 6.) * (skew ** 2 + (1 / 4.) * (kurtosis - 3) ** 2)
+    jb_pv = stats.chi2.sf(jb, 2)
+
+    return jb, jb_pv, skew, kurtosis
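A minimal usage sketch (simulated residuals):

    import numpy as np
    from statsmodels.stats.stattools import jarque_bera

    rng = np.random.default_rng(0)
    resid = rng.standard_normal(200)
    jb, jb_pv, skew, kurt = jarque_bera(resid)   # large jb_pv: no evidence against normality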


 def robust_skewness(y, axis=0):
@@ -154,7 +180,33 @@ def robust_skewness(y, axis=0):
        skewness and kurtosis," Finance Research Letters, vol. 1, pp. 56-73,
        March 2004.
     """
-    pass
+
+    if axis is None:
+        y = y.ravel()
+        axis = 0
+
+    y = np.sort(y, axis)
+
+    q1, q2, q3 = np.percentile(y, [25.0, 50.0, 75.0], axis=axis)
+
+    mu = y.mean(axis)
+    shape = (y.size,)
+    if axis is not None:
+        shape = list(mu.shape)
+        shape.insert(axis, 1)
+        shape = tuple(shape)
+
+    mu_b = np.reshape(mu, shape)
+    q2_b = np.reshape(q2, shape)
+
+    sigma = np.sqrt(np.mean(((y - mu_b)**2), axis))
+
+    sk1 = stats.skew(y, axis=axis)
+    sk2 = (q1 + q3 - 2.0 * q2) / (q3 - q1)
+    sk3 = (mu - q2) / np.mean(abs(y - q2_b), axis=axis)
+    sk4 = (mu - q2) / sigma
+
+    return sk1, sk2, sk3, sk4


 def _kr3(y, alpha=5.0, beta=50.0):
@@ -182,7 +234,15 @@ def _kr3(y, alpha=5.0, beta=50.0):
        skewness and kurtosis," Finance Research Letters, vol. 1, pp. 56-73,
        March 2004.
     """
-    pass
+    perc = (alpha, 100.0 - alpha, beta, 100.0 - beta)
+    lower_alpha, upper_alpha, lower_beta, upper_beta = np.percentile(y, perc)
+    l_alpha = np.mean(y[y < lower_alpha])
+    u_alpha = np.mean(y[y > upper_alpha])
+
+    l_beta = np.mean(y[y < lower_beta])
+    u_beta = np.mean(y[y > upper_beta])
+
+    return (u_alpha - l_alpha) / (u_beta - l_beta)


 def expected_robust_kurtosis(ab=(5.0, 50.0), dg=(2.5, 25.0)):
@@ -210,7 +270,24 @@ def expected_robust_kurtosis(ab=(5.0, 50.0), dg=(2.5, 25.0)):
     -----
     See `robust_kurtosis` for definitions of the robust kurtosis measures
     """
-    pass
+
+    alpha, beta = ab
+    delta, gamma = dg
+    expected_value = np.zeros(4)
+    ppf = stats.norm.ppf
+    pdf = stats.norm.pdf
+    q1, q2, q3, q5, q6, q7 = ppf(np.array((1.0, 2.0, 3.0, 5.0, 6.0, 7.0)) / 8)
+    expected_value[0] = 3
+
+    expected_value[1] = ((q7 - q5) + (q3 - q1)) / (q6 - q2)
+
+    q_alpha, q_beta = ppf(np.array((alpha / 100.0, beta / 100.0)))
+    expected_value[2] = (2 * pdf(q_alpha) / alpha) / (2 * pdf(q_beta) / beta)
+
+    q_delta, q_gamma = ppf(np.array((delta / 100.0, gamma / 100.0)))
+    expected_value[3] = (-2.0 * q_delta) / (-2.0 * q_gamma)
+
+    return expected_value


 def robust_kurtosis(y, axis=0, ab=(5.0, 50.0), dg=(2.5, 25.0), excess=True):
@@ -275,7 +352,31 @@ def robust_kurtosis(y, axis=0, ab=(5.0, 50.0), dg=(2.5, 25.0), excess=True):
        skewness and kurtosis," Finance Research Letters, vol. 1, pp. 56-73,
        March 2004.
     """
-    pass
+    if (axis is None or
+            (y.squeeze().ndim == 1 and y.ndim != 1)):
+        y = y.ravel()
+        axis = 0
+
+    alpha, beta = ab
+    delta, gamma = dg
+
+    perc = (12.5, 25.0, 37.5, 62.5, 75.0, 87.5,
+            delta, 100.0 - delta, gamma, 100.0 - gamma)
+    e1, e2, e3, e5, e6, e7, fd, f1md, fg, f1mg = np.percentile(y, perc,
+                                                               axis=axis)
+
+    expected_value = (expected_robust_kurtosis(ab, dg)
+                      if excess else np.zeros(4))
+
+    kr1 = stats.kurtosis(y, axis, False) - expected_value[0]
+    kr2 = ((e7 - e5) + (e3 - e1)) / (e6 - e2) - expected_value[1]
+    if y.ndim == 1:
+        kr3 = _kr3(y, alpha, beta)
+    else:
+        kr3 = np.apply_along_axis(_kr3, axis, y, alpha, beta)
+    kr3 -= expected_value[2]
+    kr4 = (f1md - fd) / (f1mg - fg) - expected_value[3]
+    return kr1, kr2, kr3, kr4


 def _medcouple_1d(y):
@@ -301,7 +402,43 @@ def _medcouple_1d(y):
        distributions" Computational Statistics & Data Analysis, vol. 52, pp.
        5186-5201, August 2008.
     """
-    pass
+
+    # Parameter changes the algorithm to the slower for large n
+
+    y = np.squeeze(np.asarray(y))
+    if y.ndim != 1:
+        raise ValueError("y must be squeezable to a 1-d array")
+
+    y = np.sort(y)
+
+    n = y.shape[0]
+    if n % 2 == 0:
+        mf = (y[n // 2 - 1] + y[n // 2]) / 2
+    else:
+        mf = y[(n - 1) // 2]
+
+    z = y - mf
+    lower = z[z <= 0.0]
+    upper = z[z >= 0.0]
+    upper = upper[:, None]
+    standardization = upper - lower
+    is_zero = np.logical_and(lower == 0.0, upper == 0.0)
+    standardization[is_zero] = np.inf
+    spread = upper + lower
+    h = spread / standardization
+    # GH5395
+    num_ties = np.sum(lower == 0.0)
+    if num_ties:
+        # Replacements has -1 above the anti-diagonal, 0 on the anti-diagonal,
+        # and 1 below the anti-diagonal
+        replacements = np.ones((num_ties, num_ties)) - np.eye(num_ties)
+        replacements -= 2 * np.triu(replacements)
+        # Convert diagonal to anti-diagonal
+        replacements = np.fliplr(replacements)
+        # Always replace upper right block
+        h[:num_ties, -num_ties:] = replacements
+
+    return np.median(h)


 def medcouple(y, axis=0):
@@ -331,4 +468,8 @@ def medcouple(y, axis=0):
        distributions" Computational Statistics & Data Analysis, vol. 52, pp.
        5186-5201, August 2008.
     """
-    pass
+    y = np.asarray(y, dtype=np.double)  # GH 4243
+    if axis is None:
+        return _medcouple_1d(y.ravel())
+
+    return np.apply_along_axis(_medcouple_1d, axis, y)
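A minimal usage sketch for medcouple above (toy right-skewed sample):

    import numpy as np
    from statsmodels.stats.stattools import medcouple

    y = np.array([1.0, 2.0, 2.5, 3.0, 10.0])
    medcouple(y)   # positive values indicate right skewness; the statistic lies in [-1, 1]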
diff --git a/statsmodels/stats/tabledist.py b/statsmodels/stats/tabledist.py
index ba72091ec..5c2533102 100644
--- a/statsmodels/stats/tabledist.py
+++ b/statsmodels/stats/tabledist.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Sat Oct 01 20:20:16 2011

@@ -14,6 +15,7 @@ interpolators
 """
 import numpy as np
 from scipy.interpolate import interp1d, interp2d, Rbf
+
 from statsmodels.tools.decorators import cache_readonly


@@ -59,8 +61,8 @@ class TableDist:
     is used for nobs > max(size).
     """

-    def __init__(self, alpha, size, crit_table, asymptotic=None, min_nobs=
-        None, max_nobs=None):
+    def __init__(self, alpha, size, crit_table, asymptotic=None,
+                 min_nobs=None, max_nobs=None):
         self.alpha = np.asarray(alpha)
         if self.alpha.ndim != 1:
             raise ValueError('alpha is not 1d')
@@ -76,29 +78,30 @@ class TableDist:
                 raise ValueError('alpha is not sorted')
         self.crit_table = np.asarray(crit_table)
         if self.crit_table.shape != (self.size.shape[0], self.alpha.shape[0]):
-            raise ValueError(
-                'crit_table must have shape(len(size), len(alpha))')
+            raise ValueError('crit_table must have shape'
+                             '(len(size), len(alpha))')
+
         self.n_alpha = len(alpha)
         self.signcrit = np.sign(np.diff(self.crit_table, 1).mean())
-        if self.signcrit > 0:
+        if self.signcrit > 0:  # increasing
             self.critv_bounds = self.crit_table[:, [0, 1]]
         else:
             self.critv_bounds = self.crit_table[:, [1, 0]]
         self.asymptotic = None
         max_size = self.max_size = max(size)
+
         if asymptotic is not None:
             try:
                 cv = asymptotic(self.max_size + 1)
             except Exception as exc:
-                raise type(exc)(
-                    """Calling asymptotic(self.size+1) failed. The error message was:
-
-{err_msg}"""
-                    .format(err_msg=exc.args[0]))
+                raise type(exc)('Calling asymptotic(self.size+1) failed. The '
+                                'error message was:'
+                                '\n\n{err_msg}'.format(err_msg=exc.args[0]))
             if len(cv) != len(alpha):
-                raise ValueError('asymptotic does not return len(alpha) values'
-                    )
+                raise ValueError('asymptotic does not return len(alpha) '
+                                 'values')
             self.asymptotic = asymptotic
+
         self.min_nobs = max_size if min_nobs is None else min_nobs
         self.max_nobs = max_size if max_nobs is None else max_nobs
         if self.min_nobs > max_size:
@@ -106,6 +109,26 @@ class TableDist:
         if self.max_nobs > max_size:
             raise ValueError('max_nobs > max(size)')

+    @cache_readonly
+    def polyn(self):
+        polyn = [interp1d(self.size, self.crit_table[:, i])
+                 for i in range(self.n_alpha)]
+        return polyn
+
+    @cache_readonly
+    def poly2d(self):
+        # check for monotonicity ?
+        # fix this, interp needs increasing
+        poly2d = interp2d(self.size, self.alpha, self.crit_table)
+        return poly2d
+
+    @cache_readonly
+    def polyrbf(self):
+        xs, xa = np.meshgrid(self.size.astype(float), self.alpha)
+        polyrbf = Rbf(xs.ravel(), xa.ravel(), self.crit_table.T.ravel(),
+                      function='linear')
+        return polyrbf
+
     def _critvals(self, n):
         """
         Rows of the table, linearly interpolated for given sample size
@@ -126,7 +149,21 @@ class TableDist:
         critical values for all alphas for any sample size that we can obtain
         through interpolation
         """
-        pass
+        if n > self.max_size:
+            if self.asymptotic is not None:
+                cv = self.asymptotic(n)
+            else:
+                raise ValueError('n is above max(size) and no asymptotic '
+                                 'distribution is provided')
+        else:
+            cv = ([p(n) for p in self.polyn])
+            if n > self.min_nobs:
+                w = (n - self.min_nobs) / (self.max_nobs - self.min_nobs)
+                w = min(1.0, w)
+                a_cv = self.asymptotic(n)
+                cv = w * a_cv + (1 - w) * cv
+
+        return cv

     def prob(self, x, n):
         """
@@ -147,7 +184,32 @@ class TableDist:
             This is the probability for each value of x, the p-value in
             underlying distribution is for a statistical test.
         """
-        pass
+        critv = self._critvals(n)
+        alpha = self.alpha
+
+        if self.signcrit < 1:
+            # reverse if critv is decreasing
+            critv, alpha = critv[::-1], alpha[::-1]
+
+        # now critv is increasing
+        if np.size(x) == 1:
+            if x < critv[0]:
+                return alpha[0]
+            elif x > critv[-1]:
+                return alpha[-1]
+            return interp1d(critv, alpha)(x)[()]
+        else:
+            # vectorized
+            cond_low = (x < critv[0])
+            cond_high = (x > critv[-1])
+            cond_interior = ~np.logical_or(cond_low, cond_high)
+
+            probs = np.nan * np.ones(x.shape)  # mistake if nan left
+            probs[cond_low] = alpha[0]
+            probs[cond_high] = alpha[-1]
+            probs[cond_interior] = interp1d(critv, alpha)(x[cond_interior])
+
+            return probs

     def crit(self, prob, n):
         """
@@ -167,7 +229,26 @@ class TableDist:
         ppf : array_like
             critical values with same shape as prob
         """
-        pass
+        prob = np.asarray(prob)
+        alpha = self.alpha
+        critv = self._critvals(n)
+
+        # vectorized
+        cond_ilow = (prob > alpha[0])
+        cond_ihigh = (prob < alpha[-1])
+        cond_interior = np.logical_or(cond_ilow, cond_ihigh)
+
+        # scalar
+        if prob.size == 1:
+            if cond_interior:
+                return interp1d(alpha, critv)(prob)
+            else:
+                return np.nan
+
+        # vectorized
+        quantile = np.nan * np.ones(prob.shape)  # nans for outside
+        quantile[cond_interior] = interp1d(alpha, critv)(prob[cond_interior])
+        return quantile

     def crit3(self, prob, n):
         """
@@ -188,4 +269,23 @@ class TableDist:
             critical values with same shape as prob, returns nan for arguments
             that are outside of the table bounds
         """
-        pass
+        prob = np.asarray(prob)
+        alpha = self.alpha
+
+        # vectorized
+        cond_ilow = (prob > alpha[0])
+        cond_ihigh = (prob < alpha[-1])
+        cond_interior = np.logical_or(cond_ilow, cond_ihigh)
+
+        # scalar
+        if prob.size == 1:
+            if cond_interior:
+                return self.polyrbf(n, prob)
+            else:
+                return np.nan
+
+        # vectorized
+        quantile = np.nan * np.ones(prob.shape)  # nans for outside
+
+        quantile[cond_interior] = self.polyrbf(n, prob[cond_interior])
+        return quantile
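A minimal usage sketch for TableDist with a small made-up critical-value table (rows indexed by sample size, columns by the tail probabilities in alpha):

    import numpy as np
    from statsmodels.stats.tabledist import TableDist

    alpha = [0.01, 0.05, 0.10]
    size = [10, 20, 50]
    crit_table = np.array([[0.95, 0.80, 0.70],
                           [0.90, 0.75, 0.65],
                           [0.85, 0.70, 0.60]])

    td = TableDist(alpha, size, crit_table)
    td.prob(0.72, 30)   # interpolated p-value for a statistic of 0.72 at n=30
    td.crit(0.05, 30)   # interpolated critical value at alpha=0.05, n=30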
diff --git a/statsmodels/stats/weightstats.py b/statsmodels/stats/weightstats.py
index e5ed3e6bf..57f136c9e 100644
--- a/statsmodels/stats/weightstats.py
+++ b/statsmodels/stats/weightstats.py
@@ -31,8 +31,10 @@ Note: scipy has now a separate, pooled variance option in ttest, but I have not
 compared yet.

 """
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.tools.decorators import cache_readonly


@@ -100,11 +102,13 @@ class DescrStatsW:
     """

     def __init__(self, data, weights=None, ddof=0):
+
         self.data = np.asarray(data)
         if weights is None:
             self.weights = np.ones(self.data.shape[0])
         else:
             self.weights = np.asarray(weights).astype(float)
+            # TODO: why squeeze?
             if len(self.weights.shape) > 1 and len(self.weights) > 1:
                 self.weights = self.weights.squeeze()
         self.ddof = ddof
@@ -112,34 +116,35 @@ class DescrStatsW:
     @cache_readonly
     def sum_weights(self):
         """Sum of weights"""
-        pass
+        return self.weights.sum(0)

     @cache_readonly
     def nobs(self):
         """alias for number of observations/cases, equal to sum of weights
         """
-        pass
+        return self.sum_weights

     @cache_readonly
     def sum(self):
         """weighted sum of data"""
-        pass
+        return np.dot(self.data.T, self.weights)

     @cache_readonly
     def mean(self):
         """weighted mean of data"""
-        pass
+        return self.sum / self.sum_weights

     @cache_readonly
     def demeaned(self):
         """data with weighted mean subtracted"""
-        pass
+        return self.data - self.mean

     @cache_readonly
     def sumsquares(self):
         """weighted sum of squares of demeaned data"""
-        pass
+        return np.dot((self.demeaned ** 2).T, self.weights)

+    # need memoize instead of cache decorator
     def var_ddof(self, ddof=0):
         """variance of data given ddof

@@ -153,7 +158,7 @@ class DescrStatsW:
         var : float, ndarray
             variance with denominator ``sum_weights - ddof``
         """
-        pass
+        return self.sumsquares / (self.sum_weights - ddof)

     def std_ddof(self, ddof=0):
         """standard deviation of data with given ddof
@@ -168,13 +173,13 @@ class DescrStatsW:
         std : float, ndarray
             standard deviation with denominator ``sum_weights - ddof``
         """
-        pass
+        return np.sqrt(self.var_ddof(ddof=ddof))

     @cache_readonly
     def var(self):
         """variance with default degrees of freedom correction
         """
-        pass
+        return self.sumsquares / (self.sum_weights - self.ddof)

     @cache_readonly
     def _var(self):
@@ -182,13 +187,13 @@ class DescrStatsW:

         used for statistical tests with controlled ddof
         """
-        pass
+        return self.sumsquares / self.sum_weights

     @cache_readonly
     def std(self):
         """standard deviation with default degrees of freedom correction
         """
-        pass
+        return np.sqrt(self.var)

     @cache_readonly
     def cov(self):
@@ -197,7 +202,9 @@ class DescrStatsW:
         assumes variables in columns and observations in rows
         uses default ddof
         """
-        pass
+        cov_ = np.dot(self.weights * self.demeaned.T, self.demeaned)
+        cov_ /= self.sum_weights - self.ddof
+        return cov_

     @cache_readonly
     def corrcoef(self):
@@ -205,13 +212,20 @@ class DescrStatsW:

         assumes variables in columns and observations in rows
         """
-        pass
+        return self.cov / self.std / self.std[:, None]

     @cache_readonly
     def std_mean(self):
         """standard deviation of weighted mean
         """
-        pass
+        std = self.std
+        if self.ddof != 0:
+            # ddof correction,   (need copy of std)
+            std = std * np.sqrt(
+                (self.sum_weights - self.ddof) / self.sum_weights
+            )
+
+        return std / np.sqrt(self.sum_weights - 1)

     def quantile(self, probs, return_pandas=True):
         """
@@ -256,9 +270,60 @@ class DescrStatsW:

         https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univariate_sect028.htm
         """
-        pass

-    def tconfint_mean(self, alpha=0.05, alternative='two-sided'):
+        import pandas as pd
+
+        probs = np.asarray(probs)
+        probs = np.atleast_1d(probs)
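+        # for 2d data, quantiles are computed separately for each column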
+
+        if self.data.ndim == 1:
+            rslt = self._quantile(self.data, probs)
+            if return_pandas:
+                rslt = pd.Series(rslt, index=probs)
+        else:
+            rslt = []
+            for vec in self.data.T:
+                rslt.append(self._quantile(vec, probs))
+            rslt = np.column_stack(rslt)
+            if return_pandas:
+                columns = ["col%d" % (j + 1) for j in range(rslt.shape[1])]
+                rslt = pd.DataFrame(data=rslt, columns=columns, index=probs)
+
+        if return_pandas:
+            rslt.index.name = "p"
+
+        return rslt
+
+    def _quantile(self, vec, probs):
+        # Helper function to calculate weighted quantiles for one column.
+        # Follows definition from SAS documentation.
+        # Returns ndarray
+
+        import pandas as pd
+
+        # Aggregate over ties
+        df = pd.DataFrame(index=np.arange(len(self.weights)))
+        df["weights"] = self.weights
+        df["vec"] = vec
+        dfg = df.groupby("vec").agg("sum")
+        weights = dfg.values[:, 0]
+        values = np.asarray(dfg.index)
+
+        cweights = np.cumsum(weights)
+        totwt = cweights[-1]
+        targets = probs * totwt
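+        # first position where the cumulative weight reaches each target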
+        ii = np.searchsorted(cweights, targets)
+
+        rslt = values[ii]
+
+        # Exact hits
+        jj = np.flatnonzero(np.abs(targets - cweights[ii]) < 1e-10)
+        jj = jj[ii[jj] < len(cweights) - 1]
+        rslt[jj] = (values[ii[jj]] + values[ii[jj] + 1]) / 2
+
+        return rslt
+
+    def tconfint_mean(self, alpha=0.05, alternative="two-sided"):
         """two-sided confidence interval for weighted mean of data

         If the data is 2d, then these are separate confidence intervals
@@ -288,9 +353,14 @@ class DescrStatsW:
         In a previous version, statsmodels 0.4, alpha was the confidence
         level, e.g. 0.95
         """
-        pass
-
-    def zconfint_mean(self, alpha=0.05, alternative='two-sided'):
+        # TODO: add asymmetric
+        dof = self.sum_weights - 1
+        ci = _tconfint_generic(
+            self.mean, self.std_mean, dof, alpha, alternative
+        )
+        return ci
+
+    def zconfint_mean(self, alpha=0.05, alternative="two-sided"):
         """two-sided confidence interval for weighted mean of data

         Confidence interval is based on normal distribution.
@@ -321,9 +391,10 @@ class DescrStatsW:
         In a previous version, statsmodels 0.4, alpha was the confidence
         level, e.g. 0.95
         """
-        pass

-    def ttest_mean(self, value=0, alternative='two-sided'):
+        return _zconfint_generic(self.mean, self.std_mean, alpha, alternative)
+
+    def ttest_mean(self, value=0, alternative="two-sided"):
         """ttest of Null hypothesis that mean is equal to value.

         The alternative hypothesis H1 is defined by the following
@@ -352,7 +423,20 @@ class DescrStatsW:
         df : int or float

         """
-        pass
+        # TODO: check direction with R, smaller=less, larger=greater
+        tstat = (self.mean - value) / self.std_mean
+        dof = self.sum_weights - 1
+        # TODO: use outsourced
+        if alternative == "two-sided":
+            pvalue = stats.t.sf(np.abs(tstat), dof) * 2
+        elif alternative == "larger":
+            pvalue = stats.t.sf(tstat, dof)
+        elif alternative == "smaller":
+            pvalue = stats.t.cdf(tstat, dof)
+        else:
+            raise ValueError("alternative not recognized")
+
+        return tstat, pvalue, dof

     def ttost_mean(self, low, upp):
         """test of (non-)equivalence of one sample
@@ -385,9 +469,12 @@ class DescrStatsW:
             test

         """
-        pass

-    def ztest_mean(self, value=0, alternative='two-sided'):
+        t1, pv1, df1 = self.ttest_mean(low, alternative="larger")
+        t2, pv2, df2 = self.ttest_mean(upp, alternative="smaller")
+        return np.maximum(pv1, pv2), (t1, pv1, df1), (t2, pv2, df2)
+
+    def ztest_mean(self, value=0, alternative="two-sided"):
         """z-test of Null hypothesis that mean is equal to value.

         The alternative hypothesis H1 is defined by the following
@@ -443,7 +530,16 @@ class DescrStatsW:
         >>> sm.stats.DescrStatsW(x1, np.array(w1)*21./20).ztest_mean(0.5)
         (2.5819888974716116, 0.0098232745075192366)
         """
-        pass
+        tstat = (self.mean - value) / self.std_mean
+        # TODO: use outsourced
+        if alternative == "two-sided":
+            pvalue = stats.norm.sf(np.abs(tstat)) * 2
+        elif alternative == "larger":
+            pvalue = stats.norm.sf(tstat)
+        elif alternative == "smaller":
+            pvalue = stats.norm.cdf(tstat)
+        else:
+            raise ValueError("alternative not recognized")
+
+        return tstat, pvalue

     def ztost_mean(self, low, upp):
         """test of (non-)equivalence of one sample, based on z-test
@@ -474,7 +570,10 @@ class DescrStatsW:
             test statistic and p-value for upper threshold test

         """
-        pass
+
+        t1, pv1 = self.ztest_mean(low, alternative="larger")
+        t2, pv2 = self.ztest_mean(upp, alternative="smaller")
+        return np.maximum(pv1, pv2), (t1, pv1), (t2, pv2)

     def get_compare(self, other, weights=None):
         """return an instance of CompareMeans with self and other
@@ -497,7 +596,11 @@ class DescrStatsW:
         CompareMeans

         """
-        pass
+        if not isinstance(other, self.__class__):
+            d2 = DescrStatsW(other, weights)
+        else:
+            d2 = other
+        return CompareMeans(self, d2)

     def asrepeats(self):
         """get array that has repeats given by floor(weights)
@@ -505,7 +608,8 @@ class DescrStatsW:
         observations with weight=0 are dropped

         """
-        pass
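+        # np.repeat drops observations whose floored weight is zero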
+        w_int = np.floor(self.weights).astype(int)
+        return np.repeat(self.data, w_int, axis=0)


 def _tstat_generic(value1, value2, std_diff, dof, alternative, diff=0):
@@ -544,7 +648,17 @@ def _tstat_generic(value1, value2, std_diff, dof, alternative, diff=0):
         P-value of the hypothesis test assuming that the test statistic is
         t-distributed with ``df`` degrees of freedom.
     """
-    pass
+
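+    # shift by the hypothesized difference before scaling by the std error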
+    tstat = (value1 - value2 - diff) / std_diff
+    if alternative in ["two-sided", "2-sided", "2s"]:
+        pvalue = stats.t.sf(np.abs(tstat), dof) * 2
+    elif alternative in ["larger", "l"]:
+        pvalue = stats.t.sf(tstat, dof)
+    elif alternative in ["smaller", "s"]:
+        pvalue = stats.t.cdf(tstat, dof)
+    else:
+        raise ValueError("invalid alternative")
+    return tstat, pvalue


 def _tconfint_generic(mean, std_mean, dof, alpha, alternative):
@@ -577,7 +691,23 @@ def _tconfint_generic(mean, std_mean, dof, alpha, alternative):
         Upper confidence limit. This is inf for the one-sided alternative
         "larger".
     """
-    pass
+
+    if alternative in ["two-sided", "2-sided", "2s"]:
+        tcrit = stats.t.ppf(1 - alpha / 2.0, dof)
+        lower = mean - tcrit * std_mean
+        upper = mean + tcrit * std_mean
+    elif alternative in ["larger", "l"]:
+        tcrit = stats.t.ppf(alpha, dof)
+        lower = mean + tcrit * std_mean
+        upper = np.inf
+    elif alternative in ["smaller", "s"]:
+        tcrit = stats.t.ppf(1 - alpha, dof)
+        lower = -np.inf
+        upper = mean + tcrit * std_mean
+    else:
+        raise ValueError("invalid alternative")
+
+    return lower, upper


 def _zstat_generic(value1, value2, std_diff, alternative, diff=0):
@@ -614,7 +744,17 @@ def _zstat_generic(value1, value2, std_diff, alternative, diff=0):
         P-value of the hypothesis test assuming that the test statistic is
         t-distributed with ``df`` degrees of freedom.
     """
-    pass
+
+    zstat = (value1 - value2 - diff) / std_diff
+    if alternative in ["two-sided", "2-sided", "2s"]:
+        pvalue = stats.norm.sf(np.abs(zstat)) * 2
+    elif alternative in ["larger", "l"]:
+        pvalue = stats.norm.sf(zstat)
+    elif alternative in ["smaller", "s"]:
+        pvalue = stats.norm.cdf(zstat)
+    else:
+        raise ValueError("invalid alternative")
+    return zstat, pvalue


 def _zstat_generic2(value, std, alternative):
@@ -648,7 +788,17 @@ def _zstat_generic2(value, std, alternative):
         P-value of the hypothesis test assuming that the test statistic is
         normally distributed.
     """
-    pass
+
+    zstat = value / std
+    if alternative in ["two-sided", "2-sided", "2s"]:
+        pvalue = stats.norm.sf(np.abs(zstat)) * 2
+    elif alternative in ["larger", "l"]:
+        pvalue = stats.norm.sf(zstat)
+    elif alternative in ["smaller", "s"]:
+        pvalue = stats.norm.cdf(zstat)
+    else:
+        raise ValueError("invalid alternative")
+    return zstat, pvalue


 def _zconfint_generic(mean, std_mean, alpha, alternative):
@@ -679,7 +829,23 @@ def _zconfint_generic(mean, std_mean, alpha, alternative):
         Upper confidence limit. This is inf for the one-sided alternative
         "larger".
     """
-    pass
+
+    if alternative in ["two-sided", "2-sided", "2s"]:
+        zcrit = stats.norm.ppf(1 - alpha / 2.0)
+        lower = mean - zcrit * std_mean
+        upper = mean + zcrit * std_mean
+    elif alternative in ["larger", "l"]:
+        zcrit = stats.norm.ppf(alpha)
+        lower = mean + zcrit * std_mean
+        upper = np.inf
+    elif alternative in ["smaller", "s"]:
+        zcrit = stats.norm.ppf(1 - alpha)
+        lower = -np.inf
+        upper = mean + zcrit * std_mean
+    else:
+        raise ValueError("invalid alternative")
+
+    return lower, upper


 class CompareMeans:
@@ -709,10 +875,16 @@ class CompareMeans:
         """
         self.d1 = d1
         self.d2 = d2
+        # assume nobs is available
+
+    #        if not hasattr(self.d1, 'nobs'):
+    #            d1.nobs1 = d1.sum_weights.astype(float)  #float just to make sure
+    #        self.nobs2 = d2.sum_weights.astype(float)

     @classmethod
-    def from_data(cls, data1, data2, weights1=None, weights2=None, ddof1=0,
-        ddof2=0):
+    def from_data(
+        cls, data1, data2, weights1=None, weights2=None, ddof1=0, ddof2=0
+    ):
         """construct a CompareMeans object from data

         Parameters
@@ -731,9 +903,12 @@ class CompareMeans:
         A CompareMeans instance.

         """
-        pass
+        return cls(
+            DescrStatsW(data1, weights=weights1, ddof=ddof1),
+            DescrStatsW(data2, weights=weights2, ddof=ddof2),
+        )

-    def summary(self, use_t=True, alpha=0.05, usevar='pooled', value=0):
+    def summary(self, use_t=True, alpha=0.05, usevar="pooled", value=0):
         """summarize the results of the hypothesis test

         Parameters
@@ -757,21 +932,88 @@ class CompareMeans:
         smry : SimpleTable

         """
-        pass
+
+        d1 = self.d1
+        d2 = self.d2
+
+        confint_percents = 100 - alpha * 100
+
+        if use_t:
+            tstat, pvalue, _ = self.ttest_ind(usevar=usevar, value=value)
+            lower, upper = self.tconfint_diff(alpha=alpha, usevar=usevar)
+        else:
+            tstat, pvalue = self.ztest_ind(usevar=usevar, value=value)
+            lower, upper = self.zconfint_diff(alpha=alpha, usevar=usevar)
+
+        if usevar == "pooled":
+            std_err = self.std_meandiff_pooledvar
+        else:
+            std_err = self.std_meandiff_separatevar
+
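+        # promote scalars to 1d so the table code handles 1d and 2d data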
+        std_err = np.atleast_1d(std_err)
+        tstat = np.atleast_1d(tstat)
+        pvalue = np.atleast_1d(pvalue)
+        lower = np.atleast_1d(lower)
+        upper = np.atleast_1d(upper)
+        conf_int = np.column_stack((lower, upper))
+        params = np.atleast_1d(d1.mean - d2.mean - value)
+
+        title = "Test for equality of means"
+        yname = "y"  # not used in params_frame
+        xname = ["subset #%d" % (ii + 1) for ii in range(tstat.shape[0])]
+
+        from statsmodels.iolib.summary import summary_params
+
+        return summary_params(
+            (None, params, std_err, tstat, pvalue, conf_int),
+            alpha=alpha,
+            use_t=use_t,
+            yname=yname,
+            xname=xname,
+            title=title,
+        )
+
+    @cache_readonly
+    def std_meandiff_separatevar(self):
+        # this uses ``_var`` to use ddof=0 for formula
+        d1 = self.d1
+        d2 = self.d2
+        return np.sqrt(d1._var / (d1.nobs - 1) + d2._var / (d2.nobs - 1))

     @cache_readonly
     def std_meandiff_pooledvar(self):
         """variance assuming equal variance in both data sets

         """
-        pass
+        # this uses ``_var`` to use ddof=0 for formula
+
+        d1 = self.d1
+        d2 = self.d2
+        # could make var_pooled into attribute
+        var_pooled = (
+            (d1.sumsquares + d2.sumsquares)
+            /
+            # (d1.nobs - d1.ddof + d2.nobs - d2.ddof))
+            (d1.nobs - 1 + d2.nobs - 1)
+        )
+        return np.sqrt(var_pooled * (1.0 / d1.nobs + 1.0 / d2.nobs))

     def dof_satt(self):
         """degrees of freedom of Satterthwaite for unequal variance
         """
-        pass
-
-    def ttest_ind(self, alternative='two-sided', usevar='pooled', value=0):
+        d1 = self.d1
+        d2 = self.d2
+        # this follows blindly the SPSS manual
+        # except I use  ``_var`` which has ddof=0
+        sem1 = d1._var / (d1.nobs - 1)
+        sem2 = d2._var / (d2.nobs - 1)
+        semsum = sem1 + sem2
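+        # Welch-Satterthwaite approximation of the degrees of freedom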
+        z1 = (sem1 / semsum) ** 2 / (d1.nobs - 1)
+        z2 = (sem2 / semsum) ** 2 / (d2.nobs - 1)
+        dof = 1.0 / (z1 + z2)
+        return dof
+
+    def ttest_ind(self, alternative="two-sided", usevar="pooled", value=0):
         """ttest for the null hypothesis of identical means

         this should also be the same as onewaygls, except for ddof differences
@@ -810,9 +1052,25 @@ class CompareMeans:
         The result is independent of the user specified ddof.

         """
-        pass
+        d1 = self.d1
+        d2 = self.d2
+
+        if usevar == "pooled":
+            stdm = self.std_meandiff_pooledvar
+            dof = d1.nobs - 1 + d2.nobs - 1
+        elif usevar == "unequal":
+            stdm = self.std_meandiff_separatevar
+            dof = self.dof_satt()
+        else:
+            raise ValueError('usevar can only be "pooled" or "unequal"')
+
+        tstat, pval = _tstat_generic(
+            d1.mean, d2.mean, stdm, dof, alternative, diff=value
+        )

-    def ztest_ind(self, alternative='two-sided', usevar='pooled', value=0):
+        return tstat, pval, dof
+
+    def ztest_ind(self, alternative="two-sided", usevar="pooled", value=0):
         """z-test for the null hypothesis of identical means

         Parameters
@@ -842,10 +1100,25 @@ class CompareMeans:
             pvalue of the z-test

         """
-        pass
+        d1 = self.d1
+        d2 = self.d2
+
+        if usevar == "pooled":
+            stdm = self.std_meandiff_pooledvar
+        elif usevar == "unequal":
+            stdm = self.std_meandiff_separatevar
+        else:
+            raise ValueError('usevar can only be "pooled" or "unequal"')
+
+        tstat, pval = _zstat_generic(
+            d1.mean, d2.mean, stdm, alternative, diff=value
+        )

-    def tconfint_diff(self, alpha=0.05, alternative='two-sided', usevar=
-        'pooled'):
+        return tstat, pval
+
+    def tconfint_diff(
+        self, alpha=0.05, alternative="two-sided", usevar="pooled"
+    ):
         """confidence interval for the difference in means

         Parameters
@@ -877,10 +1150,26 @@ class CompareMeans:
         The result is independent of the user specified ddof.

         """
-        pass
+        d1 = self.d1
+        d2 = self.d2
+        diff = d1.mean - d2.mean
+        if usevar == "pooled":
+            std_diff = self.std_meandiff_pooledvar
+            dof = d1.nobs - 1 + d2.nobs - 1
+        elif usevar == "unequal":
+            std_diff = self.std_meandiff_separatevar
+            dof = self.dof_satt()
+        else:
+            raise ValueError('usevar can only be "pooled" or "unequal"')

-    def zconfint_diff(self, alpha=0.05, alternative='two-sided', usevar=
-        'pooled'):
+        res = _tconfint_generic(
+            diff, std_diff, dof, alpha=alpha, alternative=alternative
+        )
+        return res
+
+    def zconfint_diff(
+        self, alpha=0.05, alternative="two-sided", usevar="pooled"
+    ):
         """confidence interval for the difference in means

         Parameters
@@ -912,9 +1201,22 @@ class CompareMeans:
         The result is independent of the user specified ddof.

         """
-        pass
+        d1 = self.d1
+        d2 = self.d2
+        diff = d1.mean - d2.mean
+        if usevar == "pooled":
+            std_diff = self.std_meandiff_pooledvar
+        elif usevar == "unequal":
+            std_diff = self.std_meandiff_separatevar
+        else:
+            raise ValueError('usevar can only be "pooled" or "unequal"')

-    def ttost_ind(self, low, upp, usevar='pooled'):
+        res = _zconfint_generic(
+            diff, std_diff, alpha=alpha, alternative=alternative
+        )
+        return res
+
+    def ttost_ind(self, low, upp, usevar="pooled"):
         """
         test of equivalence for two independent samples, base on t-test

@@ -936,9 +1238,12 @@ class CompareMeans:
         t2, pv2 : tuple of floats
             test statistic and pvalue for upper threshold test
         """
-        pass
+        tt1 = self.ttest_ind(alternative="larger", usevar=usevar, value=low)
+        tt2 = self.ttest_ind(alternative="smaller", usevar=usevar, value=upp)
+        # TODO: remove tuple return, use same as for function tost_ind
+        return np.maximum(tt1[1], tt2[1]), (tt1, tt2)

-    def ztost_ind(self, low, upp, usevar='pooled'):
+    def ztost_ind(self, low, upp, usevar="pooled"):
         """
         test of equivalence for two independent samples, based on z-test

@@ -960,11 +1265,33 @@ class CompareMeans:
         t2, pv2 : tuple of floats
             test statistic and pvalue for upper threshold test
         """
-        pass
-
-
-def ttest_ind(x1, x2, alternative='two-sided', usevar='pooled', weights=(
-    None, None), value=0):
+        tt1 = self.ztest_ind(alternative="larger", usevar=usevar, value=low)
+        tt2 = self.ztest_ind(alternative="smaller", usevar=usevar, value=upp)
+        # TODO: remove tuple return, use same as for function tost_ind
+        return np.maximum(tt1[1], tt2[1]), tt1, tt2
+
+    # tost.__doc__ = tost_ind.__doc__
+
+
+# does not work for 2d, does not take weights into account
+##    def test_equal_var(self):
+##        """Levene test for independence
+##
+##        """
+##        d1 = self.d1
+##        d2 = self.d2
+##        #rewrite this, for now just use scipy.stats
+##        return stats.levene(d1.data, d2.data)
+
+
+def ttest_ind(
+    x1,
+    x2,
+    alternative="two-sided",
+    usevar="pooled",
+    weights=(None, None),
+    value=0,
+):
     """ttest independent sample

     Convenience function that uses the classes and throws away the intermediate
@@ -1006,11 +1333,20 @@ def ttest_ind(x1, x2, alternative='two-sided', usevar='pooled', weights=(
         degrees of freedom used in the t-test

     """
-    pass
+    cm = CompareMeans(
+        DescrStatsW(x1, weights=weights[0], ddof=0),
+        DescrStatsW(x2, weights=weights[1], ddof=0),
+    )
+    tstat, pval, dof = cm.ttest_ind(
+        alternative=alternative, usevar=usevar, value=value
+    )
+
+    return tstat, pval, dof


-def ttost_ind(x1, x2, low, upp, usevar='pooled', weights=(None, None),
-    transform=None):
+def ttost_ind(
+    x1, x2, low, upp, usevar="pooled", weights=(None, None), transform=None
+):
     """test of (non-)equivalence for two independent samples

     TOST: two one-sided t tests
@@ -1066,7 +1402,26 @@ def ttost_ind(x1, x2, low, upp, usevar='pooled', weights=(None, None),
     be correction with the functions in ``multitest``.

     """
-    pass
+
+    if transform:
+        if transform is np.log:
+            # avoid hstack in special case
+            x1 = transform(x1)
+            x2 = transform(x2)
+        else:
+            # for transforms like rankdata that will need both datasets
+            # concatenate works for stacking 1d and 2d arrays
+            xx = transform(np.concatenate((x1, x2), 0))
+            x1 = xx[: len(x1)]
+            x2 = xx[len(x1) :]
+        low = transform(low)
+        upp = transform(upp)
+    cm = CompareMeans(
+        DescrStatsW(x1, weights=weights[0], ddof=0),
+        DescrStatsW(x2, weights=weights[1], ddof=0),
+    )
+    pval, res = cm.ttost_ind(low, upp, usevar=usevar)
+    return pval, res[0], res[1]


 def ttost_paired(x1, x2, low, upp, transform=None, weights=None):
@@ -1109,11 +1464,29 @@ def ttost_paired(x1, x2, low, upp, transform=None, weights=None):
         test statistic, pvalue and degrees of freedom for upper threshold test

     """
-    pass

-
-def ztest(x1, x2=None, value=0, alternative='two-sided', usevar='pooled',
-    ddof=1.0):
+    if transform:
+        if transform is np.log:
+            # avoid hstack in special case
+            x1 = transform(x1)
+            x2 = transform(x2)
+        else:
+            # for transforms like rankdata that will need both datasets
+            # concatenate works for stacking 1d and 2d arrays
+            xx = transform(np.concatenate((x1, x2), 0))
+            x1 = xx[: len(x1)]
+            x2 = xx[len(x1) :]
+        low = transform(low)
+        upp = transform(upp)
+    dd = DescrStatsW(x1 - x2, weights=weights, ddof=0)
+    t1, pv1, df1 = dd.ttest_mean(low, alternative="larger")
+    t2, pv2, df2 = dd.ttest_mean(upp, alternative="smaller")
+    return np.maximum(pv1, pv2), (t1, pv1, df1), (t2, pv2, df2)
+
+
+def ztest(
+    x1, x2=None, value=0, alternative="two-sided", usevar="pooled", ddof=1.0
+):
     """test for mean based on normal distribution, one or two samples

     In the case of two samples, the samples are assumed to be independent.
@@ -1158,11 +1531,48 @@ def ztest(x1, x2=None, value=0, alternative='two-sided', usevar='pooled',
     usevar can be pooled or unequal in two sample case

     """
-    pass
-
-
-def zconfint(x1, x2=None, value=0, alpha=0.05, alternative='two-sided',
-    usevar='pooled', ddof=1.0):
+    # TODO: this should delegate to CompareMeans like ttest_ind
+    #       However that does not implement ddof
+
+    # usevar can be pooled or unequal
+
+    if usevar not in {"pooled", "unequal"}:
+        raise NotImplementedError('usevar can only be "pooled" or "unequal"')
+
+    x1 = np.asarray(x1)
+    nobs1 = x1.shape[0]
+    x1_mean = x1.mean(0)
+    x1_var = x1.var(0)
+
+    if x2 is not None:
+        x2 = np.asarray(x2)
+        nobs2 = x2.shape[0]
+        x2_mean = x2.mean(0)
+        x2_var = x2.var(0)
+        if usevar == "pooled":
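+            # np.var uses ddof=0, so weight by nobs to recover sums of
+            # squares, then rescale to the variance of the mean difference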
+            var = nobs1 * x1_var + nobs2 * x2_var
+            var /= nobs1 + nobs2 - 2 * ddof
+            var *= 1.0 / nobs1 + 1.0 / nobs2
+        elif usevar == "unequal":
+            var = x1_var / (nobs1 - ddof) + x2_var / (nobs2 - ddof)
+    else:
+        var = x1_var / (nobs1 - ddof)
+        x2_mean = 0
+
+    std_diff = np.sqrt(var)
+    # stat = x1_mean - x2_mean - value
+    return _zstat_generic(x1_mean, x2_mean, std_diff, alternative, diff=value)
+
+
+def zconfint(
+    x1,
+    x2=None,
+    value=0,
+    alpha=0.05,
+    alternative="two-sided",
+    usevar="pooled",
+    ddof=1.0,
+):
     """confidence interval based on normal distribution z-test

     Parameters
@@ -1201,10 +1611,35 @@ def zconfint(x1, x2=None, value=0, alpha=0.05, alternative='two-sided',
     CompareMeans

     """
-    pass
-
-
-def ztost(x1, low, upp, x2=None, usevar='pooled', ddof=1.0):
+    # usevar is not used, always pooled
+    # mostly duplicate code from ztest
+
+    if usevar != "pooled":
+        raise NotImplementedError('only usevar="pooled" is implemented')
+    x1 = np.asarray(x1)
+    nobs1 = x1.shape[0]
+    x1_mean = x1.mean(0)
+    x1_var = x1.var(0)
+    if x2 is not None:
+        x2 = np.asarray(x2)
+        nobs2 = x2.shape[0]
+        x2_mean = x2.mean(0)
+        x2_var = x2.var(0)
+        var_pooled = nobs1 * x1_var + nobs2 * x2_var
+        var_pooled /= nobs1 + nobs2 - 2 * ddof
+        var_pooled *= 1.0 / nobs1 + 1.0 / nobs2
+    else:
+        var_pooled = x1_var / (nobs1 - ddof)
+        x2_mean = 0
+
+    std_diff = np.sqrt(var_pooled)
+    ci = _zconfint_generic(
+        x1_mean - x2_mean - value, std_diff, alpha, alternative
+    )
+    return ci
+
+
+def ztost(x1, low, upp, x2=None, usevar="pooled", ddof=1.0):
     """Equivalence test based on normal distribution

     Parameters
@@ -1234,4 +1669,14 @@ def ztost(x1, low, upp, x2=None, usevar='pooled', ddof=1.0):
     checked only for 1 sample case

     """
-    pass
+    tt1 = ztest(
+        x1, x2, alternative="larger", usevar=usevar, value=low, ddof=ddof
+    )
+    tt2 = ztest(
+        x1, x2, alternative="smaller", usevar=usevar, value=upp, ddof=ddof
+    )
+    return (
+        np.maximum(tt1[1], tt2[1]),
+        tt1,
+        tt2,
+    )
diff --git a/statsmodels/tools/_testing.py b/statsmodels/tools/_testing.py
index 612ca9f01..3c1408b41 100644
--- a/statsmodels/tools/_testing.py
+++ b/statsmodels/tools/_testing.py
@@ -8,16 +8,18 @@ during refactoring arises.
 The first group of functions provide consistency checks

 """
+
 import os
 import sys
 from packaging.version import Version, parse
+
 import numpy as np
 from numpy.testing import assert_allclose, assert_
+
 import pandas as pd


 class PytestTester:
-
     def __init__(self, package_path=None):
         f = sys._getframe(1)
         if package_path is None:
@@ -38,12 +40,41 @@ class PytestTester:
             print('Running pytest ' + ' '.join(cmd))
             status = pytest.main(cmd)
             if exit:
-                print(f'Exit status: {status}')
+                print(f"Exit status: {status}")
                 sys.exit(status)
         except ImportError:
             raise ImportError('pytest>=3 required to run the test')


+def check_ttest_tvalues(results):
+    # test that t_test has same results a params, bse, tvalues, ...
+    res = results
+    mat = np.eye(len(res.params))
+    tt = res.t_test(mat)
+
+    assert_allclose(tt.effect, res.params, rtol=1e-12)
+    # TODO: tt.sd and tt.tvalue are 2d also for single regressor, squeeze
+    assert_allclose(np.squeeze(tt.sd), res.bse, rtol=1e-10)
+    assert_allclose(np.squeeze(tt.tvalue), res.tvalues, rtol=1e-12)
+    assert_allclose(tt.pvalue, res.pvalues, rtol=5e-10)
+    assert_allclose(tt.conf_int(), res.conf_int(), rtol=1e-10)
+
+    # test params table frame returned by t_test
+    table_res = np.column_stack((res.params, res.bse, res.tvalues,
+                                 res.pvalues, res.conf_int()))
+    table2 = tt.summary_frame().values
+    assert_allclose(table2, table_res, rtol=1e-12)
+
+    # TODO: move this to test_attributes ?
+    assert_(hasattr(res, 'use_t'))
+
+    tt = res.t_test(mat[0])
+    tt.summary()   # smoke test for #1323
+    pvalues = np.asarray(res.pvalues)
+    assert_allclose(tt.pvalue, pvalues[0], rtol=5e-10)
+    # TODO: Adapt more of test_generic_methods.test_ttest_values here?
+
+
 def check_ftest_pvalues(results):
     """
     Check that the outputs of `res.wald_test` produces pvalues that
@@ -60,7 +91,51 @@ def check_ftest_pvalues(results):
     ------
     AssertionError
     """
-    pass
+    res = results
+    use_t = res.use_t
+    k_vars = len(res.params)
+    # check default use_t
+    pvals = [res.wald_test(np.eye(k_vars)[k], use_f=use_t, scalar=True).pvalue
+             for k in range(k_vars)]
+    assert_allclose(pvals, res.pvalues, rtol=5e-10, atol=1e-25)
+
+    # automatic use_f based on results class use_t
+    pvals = [res.wald_test(np.eye(k_vars)[k], scalar=True).pvalue
+             for k in range(k_vars)]
+    assert_allclose(pvals, res.pvalues, rtol=5e-10, atol=1e-25)
+
+    # TODO: Separate these out into summary/summary2 tests?
+    # label for pvalues in summary
+    string_use_t = 'P>|z|' if use_t is False else 'P>|t|'
+    summ = str(res.summary())
+    assert_(string_use_t in summ)
+
+    # try except for models that do not have summary2
+    try:
+        summ2 = str(res.summary2())
+    except AttributeError:
+        pass
+    else:
+        assert_(string_use_t in summ2)
+
+
+def check_fitted(results):
+    import pytest
+
+    # ignore wrapper for isinstance check
+    from statsmodels.genmod.generalized_linear_model import GLMResults
+    from statsmodels.discrete.discrete_model import DiscreteResults
+
+    # possibly unwrap -- GEE has no wrapper
+    results = getattr(results, '_results', results)
+
+    if isinstance(results, (GLMResults, DiscreteResults)):
+        pytest.skip('Not supported for {0}'.format(type(results)))
+
+    res = results
+    fitted = res.fittedvalues
+    assert_allclose(res.model.endog - fitted, res.resid, rtol=1e-12)
+    assert_allclose(fitted, res.predict(), rtol=1e-12)


 def check_predict_types(results):
@@ -76,4 +151,47 @@ def check_predict_types(results):
     ------
     AssertionError
     """
-    pass
+    res = results
+    # squeeze to make 1d for single regressor test case
+    p_exog = np.squeeze(np.asarray(res.model.exog[:2]))
+
+    # ignore wrapper for isinstance check
+    from statsmodels.genmod.generalized_linear_model import GLMResults
+    from statsmodels.discrete.discrete_model import DiscreteResults
+    from statsmodels.compat.pandas import assert_frame_equal, assert_series_equal
+
+    # possibly unwrap -- GEE has no wrapper
+    results = getattr(results, '_results', results)
+
+    if isinstance(results, (GLMResults, DiscreteResults)):
+        # SMOKE test only  TODO: mark this somehow
+        res.predict(p_exog)
+        res.predict(p_exog.tolist())
+        res.predict(p_exog[0].tolist())
+    else:
+        fitted = res.fittedvalues[:2]
+        assert_allclose(fitted, res.predict(p_exog), rtol=1e-12)
+        # this needs reshape to column-vector:
+        assert_allclose(fitted, res.predict(np.squeeze(p_exog).tolist()),
+                        rtol=1e-12)
+        # only one prediction:
+        assert_allclose(fitted[:1], res.predict(p_exog[0].tolist()),
+                        rtol=1e-12)
+        assert_allclose(fitted[:1], res.predict(p_exog[0]),
+                        rtol=1e-12)
+
+        # Check that pandas wrapping works as expected
+        exog_index = range(len(p_exog))
+        predicted = res.predict(p_exog)
+
+        cls = pd.Series if p_exog.ndim == 1 else pd.DataFrame
+        predicted_pandas = res.predict(cls(p_exog, index=exog_index))
+
+        # predicted.ndim may not match p_exog.ndim because it may be squeezed
+        #  if p_exog has only one column
+        cls = pd.Series if predicted.ndim == 1 else pd.DataFrame
+        predicted_expected = cls(predicted, index=exog_index)
+        if isinstance(predicted_expected, pd.Series):
+            assert_series_equal(predicted_expected, predicted_pandas)
+        else:
+            assert_frame_equal(predicted_expected, predicted_pandas)
diff --git a/statsmodels/tools/catadd.py b/statsmodels/tools/catadd.py
index a7598038b..8016ba963 100644
--- a/statsmodels/tools/catadd.py
+++ b/statsmodels/tools/catadd.py
@@ -2,11 +2,36 @@ import numpy as np


 def add_indep(x, varnames, dtype=None):
-    """
+    '''
     construct array with independent columns

     x is either iterable (list, tuple) or instance of ndarray or a subclass
     of it.  If x is an ndarray, then each column is assumed to represent a
     variable with observations in rows.
-    """
-    pass
+    '''
+    # TODO: this needs tests for subclasses
+
+    if isinstance(x, np.ndarray) and x.ndim == 2:
+        x = x.T
+
+    nvars_orig = len(x)
+    nobs = len(x[0])
+    if not dtype:
+        dtype = np.asarray(x[0]).dtype
+    xout = np.zeros((nobs, nvars_orig), dtype=dtype)
+    count = 0
+    rank_old = 0
+    varnames_new = []
+    varnames_dropped = []
+    keepindx = []
+    for (xi, ni) in zip(x, varnames):
+        xout[:, count] = xi
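+        # keep the column only if it raises the rank of the accumulated array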
+        rank_new = np.linalg.matrix_rank(xout)
+        if rank_new > rank_old:
+            varnames_new.append(ni)
+            rank_old = rank_new
+            count += 1
+        else:
+            varnames_dropped.append(ni)
+
+    return xout[:, :count], varnames_new
diff --git a/statsmodels/tools/data.py b/statsmodels/tools/data.py
index 75f87c0b1..ee8bc4d93 100644
--- a/statsmodels/tools/data.py
+++ b/statsmodels/tools/data.py
@@ -5,6 +5,33 @@ import numpy as np
 import pandas as pd


+def _check_period_index(x, freq="M"):
+    from pandas import PeriodIndex, DatetimeIndex
+    if not isinstance(x.index, (DatetimeIndex, PeriodIndex)):
+        raise ValueError("The index must be a DatetimeIndex or PeriodIndex")
+
+    if x.index.freq is not None:
+        inferred_freq = x.index.freqstr
+    else:
+        inferred_freq = pd.infer_freq(x.index)
+    if not inferred_freq.startswith(freq):
+        raise ValueError("Expected frequency {}. Got {}".format(freq,
+                                                                inferred_freq))
+
+
+def is_data_frame(obj):
+    return isinstance(obj, pd.DataFrame)
+
+
+def is_design_matrix(obj):
+    from patsy import DesignMatrix
+    return isinstance(obj, DesignMatrix)
+
+
+def _is_structured_ndarray(obj):
+    return isinstance(obj, np.ndarray) and obj.dtype.names is not None
+
+
 def interpret_data(data, colnames=None, rownames=None):
     """
     Convert passed data structure to form required by estimation classes
@@ -20,11 +47,72 @@ def interpret_data(data, colnames=None, rownames=None):
     -------
     (values, colnames, rownames) : (homogeneous ndarray, list)
     """
-    pass
+    if isinstance(data, np.ndarray):
+        values = np.asarray(data)
+
+        if colnames is None:
+            colnames = ['Y_%d' % i for i in range(values.shape[1])]
+    elif is_data_frame(data):
+        # XXX: hack
+        data = data.dropna()
+        values = data.values
+        colnames = data.columns
+        rownames = data.index
+    else:  # pragma: no cover
+        raise TypeError('Cannot handle input type {typ}'
+                        .format(typ=type(data).__name__))
+
+    if not isinstance(colnames, list):
+        colnames = list(colnames)
+
+    # sanity check
+    if len(colnames) != values.shape[1]:
+        raise ValueError('length of colnames does not match number '
+                         'of columns in data')
+
+    if rownames is not None and len(rownames) != len(values):
+        raise ValueError('length of rownames does not match number '
+                         'of rows in data')
+
+    return values, colnames, rownames
+
+
+def struct_to_ndarray(arr):
+    return arr.view((float, (len(arr.dtype.names),)), type=np.ndarray)
+
+
+def _is_using_ndarray_type(endog, exog):
+    return (type(endog) is np.ndarray and
+            (type(exog) is np.ndarray or exog is None))
+
+
+def _is_using_ndarray(endog, exog):
+    return (isinstance(endog, np.ndarray) and
+            (isinstance(exog, np.ndarray) or exog is None))
+
+
+def _is_using_pandas(endog, exog):
+    from statsmodels.compat.pandas import data_klasses as klasses
+    return (isinstance(endog, klasses) or isinstance(exog, klasses))
+
+
+def _is_array_like(endog, exog):
+    try:  # do it like this in case of mixed types, ie., ndarray and list
+        endog = np.asarray(endog)
+        exog = np.asarray(exog)
+        return True
+    except:
+        return False
+
+
+def _is_using_patsy(endog, exog):
+    # we get this when a structured array is passed through a formula
+    return (is_design_matrix(endog) and
+            (is_design_matrix(exog) or exog is None))


 def _is_recarray(data):
     """
     Returns true if data is a recarray
     """
-    pass
+    return isinstance(data, np.core.recarray)
diff --git a/statsmodels/tools/decorators.py b/statsmodels/tools/decorators.py
index f8804b758..ec3190647 100644
--- a/statsmodels/tools/decorators.py
+++ b/statsmodels/tools/decorators.py
@@ -1,20 +1,21 @@
 from statsmodels.tools.sm_exceptions import CacheWriteWarning
 from statsmodels.compat.pandas import cache_readonly as PandasCacheReadonly
+
 import warnings
+
 __all__ = ['cache_readonly', 'cache_writable', 'deprecated_alias',
-    'ResettableCache']
+           'ResettableCache']


 class ResettableCache(dict):
     """DO NOT USE. BACKWARD COMPAT ONLY"""
-
     def __init__(self, *args, **kwargs):
         super(ResettableCache, self).__init__(*args, **kwargs)
         self.__dict__ = self


 def deprecated_alias(old_name, new_name, remove_version=None, msg=None,
-    warning=FutureWarning):
+                     warning=FutureWarning):
     """
     Deprecate attribute in favor of alternative name.

@@ -53,7 +54,22 @@ def deprecated_alias(old_name, new_name, remove_version=None, msg=None,
     __main__:1: FutureWarning: nvars is a deprecated alias for neqs
     3
     """
-    pass
+
+    if msg is None:
+        msg = '%s is a deprecated alias for %s' % (old_name, new_name)
+        if remove_version is not None:
+            msg += ', will be removed in version %s' % remove_version
+
+    def fget(self):
+        warnings.warn(msg, warning, stacklevel=2)
+        return getattr(self, new_name)
+
+    def fset(self, value):
+        warnings.warn(msg, warning, stacklevel=2)
+        setattr(self, new_name, value)
+
+    res = property(fget=fget, fset=fset)
+    return res


 class CachedAttribute:
@@ -66,16 +82,19 @@ class CachedAttribute:
     def __get__(self, obj, type=None):
         if obj is None:
             return self.fget
+        # Get the cache or set a default one if needed
         _cachename = self.cachename
         _cache = getattr(obj, _cachename, None)
         if _cache is None:
             setattr(obj, _cachename, {})
             _cache = getattr(obj, _cachename)
+        # Get the name of the attribute to set and cache
         name = self.name
         _cachedval = _cache.get(name, None)
         if _cachedval is None:
             _cachedval = self.fget(obj)
             _cache[name] = _cachedval
+
         return _cachedval

     def __set__(self, obj, value):
@@ -84,7 +103,6 @@ class CachedAttribute:


 class CachedWritableAttribute(CachedAttribute):
-
     def __set__(self, obj, value):
         _cache = getattr(obj, self.cachename)
         name = self.name
@@ -101,18 +119,32 @@ class _cache_readonly(property):
         self.cachename = cachename

     def __call__(self, func):
-        return CachedAttribute(func, cachename=self.cachename)
+        return CachedAttribute(func,
+                               cachename=self.cachename)


 class cache_writable(_cache_readonly):
     """
     Decorator for CachedWritableAttribute
     """
-
     def __call__(self, func):
-        return CachedWritableAttribute(func, cachename=self.cachename)
+        return CachedWritableAttribute(func,
+                                       cachename=self.cachename)


+# Use pandas since it works with docs correctly
 cache_readonly = PandasCacheReadonly
+# cached_value and cached_data behave identically to cache_readonly, but
+# are used by `remove_data` to
+#   a) identify array-like attributes to remove (cached_data)
+#   b) make sure certain values are evaluated before caching (cached_value)
+# TODO: Disabled since the subclasses break doc strings
+# class cached_data(PandasCacheReadonly):
+#     pass
+
 cached_data = PandasCacheReadonly
+
+# class cached_value(PandasCacheReadonly):
+#     pass
+
 cached_value = PandasCacheReadonly
diff --git a/statsmodels/tools/docstring.py b/statsmodels/tools/docstring.py
index acba3c993..0773147f9 100644
--- a/statsmodels/tools/docstring.py
+++ b/statsmodels/tools/docstring.py
@@ -7,17 +7,22 @@ import copy
 import inspect
 import re
 import textwrap
+
 from statsmodels.tools.sm_exceptions import ParseError


 def dedent_lines(lines):
     """Deindent a list of lines maximally"""
-    pass
+    return textwrap.dedent("\n".join(lines)).split("\n")


 def strip_blank_lines(line):
     """Remove leading and trailing blank lines from a list of lines"""
-    pass
+    while line and not line[0].strip():
+        del line[0]
+    while line and not line[-1].strip():
+        del line[-1]
+    return line


 class Reader:
@@ -30,20 +35,74 @@ class Reader:
         Parameters
         ----------
         data : str
-           String with lines separated by '
-'.
+           String with lines separated by '\n'.
         """
         if isinstance(data, list):
             self._str = data
         else:
-            self._str = data.split('\n')
+            self._str = data.split("\n")  # store string as list of lines
+
         self.reset()

     def __getitem__(self, n):
         return self._str[n]

+    def reset(self):
+        self._line_num = 0  # current line nr
+
+    def read(self):
+        if not self.eof():
+            out = self[self._line_num]
+            self._line_num += 1
+            return out
+        else:
+            return ""
+
+    def seek_next_non_empty_line(self):
+        for line in self[self._line_num :]:
+            if line.strip():
+                break
+            else:
+                self._line_num += 1
+
+    def eof(self):
+        return self._line_num >= len(self._str)
+
+    def read_to_condition(self, condition_func):
+        start = self._line_num
+        for line in self[start:]:
+            if condition_func(line):
+                return self[start : self._line_num]
+            self._line_num += 1
+            if self.eof():
+                return self[start : self._line_num + 1]
+        return []
+
+    def read_to_next_empty_line(self):
+        self.seek_next_non_empty_line()
+
+        def is_empty(line):
+            return not line.strip()
+
+        return self.read_to_condition(is_empty)
+
+    def read_to_next_unindented_line(self):
+        def is_unindented(line):
+            return line.strip() and (len(line.lstrip()) == len(line))

-Parameter = namedtuple('Parameter', ['name', 'type', 'desc'])
+        return self.read_to_condition(is_unindented)
+
+    def peek(self, n=0):
+        if self._line_num + n < len(self._str):
+            return self[self._line_num + n]
+        else:
+            return ""
+
+    def is_empty(self):
+        return not "".join(self._str).strip()
+
+
+Parameter = namedtuple("Parameter", ["name", "type", "desc"])


 class NumpyDocString(Mapping):
@@ -51,17 +110,35 @@ class NumpyDocString(Mapping):

     Instances define a mapping from section title to structured data.
     """
-    sections = {'Signature': '', 'Summary': [''], 'Extended Summary': [],
-        'Parameters': [], 'Returns': [], 'Yields': [], 'Receives': [],
-        'Raises': [], 'Warns': [], 'Other Parameters': [], 'Attributes': [],
-        'Methods': [], 'See Also': [], 'Notes': [], 'Warnings': [],
-        'References': '', 'Examples': '', 'index': {}}
+
+    sections = {
+        "Signature": "",
+        "Summary": [""],
+        "Extended Summary": [],
+        "Parameters": [],
+        "Returns": [],
+        "Yields": [],
+        "Receives": [],
+        "Raises": [],
+        "Warns": [],
+        "Other Parameters": [],
+        "Attributes": [],
+        "Methods": [],
+        "See Also": [],
+        "Notes": [],
+        "Warnings": [],
+        "References": "",
+        "Examples": "",
+        "index": {},
+    }

     def __init__(self, docstring):
         orig_docstring = docstring
-        docstring = textwrap.dedent(docstring).split('\n')
+        docstring = textwrap.dedent(docstring).split("\n")
+
         self._doc = Reader(docstring)
         self._parsed_data = copy.deepcopy(self.sections)
+
         try:
             self._parse()
         except ParseError as e:
@@ -73,7 +150,7 @@ class NumpyDocString(Mapping):

     def __setitem__(self, key, val):
         if key not in self._parsed_data:
-            self._error_location('Unknown section %s' % key)
+            self._error_location("Unknown section %s" % key)
         else:
             self._parsed_data[key] = val

@@ -82,18 +159,118 @@ class NumpyDocString(Mapping):

     def __len__(self):
         return len(self._parsed_data)
-    _role = ':(?P<role>\\w+):'
-    _funcbacktick = '`(?P<name>(?:~\\w+\\.)?[a-zA-Z0-9_\\.-]+)`'
-    _funcplain = '(?P<name2>[a-zA-Z0-9_\\.-]+)'
-    _funcname = '(' + _role + _funcbacktick + '|' + _funcplain + ')'
-    _funcnamenext = _funcname.replace('role', 'rolenext')
-    _funcnamenext = _funcnamenext.replace('name', 'namenext')
-    _description = '(?P<description>\\s*:(\\s+(?P<desc>\\S+.*))?)?\\s*$'
-    _func_rgx = re.compile('^\\s*' + _funcname + '\\s*')
-    _line_rgx = re.compile('^\\s*' + '(?P<allfuncs>' + _funcname +
-        '(?P<morefuncs>([,]\\s+' + _funcnamenext + ')*)' + ')' +
-        '(?P<trailing>[,\\.])?' + _description)
-    empty_description = '..'
+
+    def _is_at_section(self):
+        self._doc.seek_next_non_empty_line()
+
+        if self._doc.eof():
+            return False
+
+        l1 = self._doc.peek().strip()  # e.g. Parameters
+
+        if l1.startswith(".. index::"):
+            return True
+
+        l2 = self._doc.peek(1).strip()  # ---------- or ==========
+        return l2.startswith("-" * len(l1)) or l2.startswith("=" * len(l1))
+
+    def _strip(self, doc):
+        i = 0
+        j = 0
+        for i, line in enumerate(doc):
+            if line.strip():
+                break
+
+        for j, line in enumerate(doc[::-1]):
+            if line.strip():
+                break
+
+        return doc[i : len(doc) - j]
+
+    def _read_to_next_section(self):
+        section = self._doc.read_to_next_empty_line()
+
+        while not self._is_at_section() and not self._doc.eof():
+            if not self._doc.peek(-1).strip():  # previous line was empty
+                section += [""]
+
+            section += self._doc.read_to_next_empty_line()
+
+        return section
+
+    def _read_sections(self):
+        while not self._doc.eof():
+            data = self._read_to_next_section()
+            name = data[0].strip()
+
+            if name.startswith(".."):  # index section
+                yield name, data[1:]
+            elif len(data) < 2:
+                yield StopIteration
+            else:
+                yield name, self._strip(data[2:])
+
+    def _parse_param_list(self, content, single_element_is_type=False):
+        r = Reader(content)
+        params = []
+        while not r.eof():
+            header = r.read().strip()
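+            # entries look like "name : type"; description is indented below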
+            if " : " in header:
+                arg_name, arg_type = header.split(" : ")[:2]
+            else:
+                if single_element_is_type:
+                    arg_name, arg_type = "", header
+                else:
+                    arg_name, arg_type = header, ""
+
+            desc = r.read_to_next_unindented_line()
+            desc = dedent_lines(desc)
+            desc = strip_blank_lines(desc)
+
+            params.append(Parameter(arg_name, arg_type, desc))
+
+        return params
+
+    # See also supports the following formats.
+    #
+    # <FUNCNAME>
+    # <FUNCNAME> SPACE* COLON SPACE+ <DESC> SPACE*
+    # <FUNCNAME> ( COMMA SPACE+ <FUNCNAME>)+ (COMMA | PERIOD)? SPACE*
+    # <FUNCNAME> ( COMMA SPACE+ <FUNCNAME>)* SPACE* COLON SPACE+ <DESC> SPACE*
+
+    # <FUNCNAME> is one of
+    #   <PLAIN_FUNCNAME>
+    #   COLON <ROLE> COLON BACKTICK <PLAIN_FUNCNAME> BACKTICK
+    # where
+    #   <PLAIN_FUNCNAME> is a legal function name, and
+    #   <ROLE> is any nonempty sequence of word characters.
+    # Examples: func_f1  :meth:`func_h1` :obj:`~baz.obj_r` :class:`class_j`
+    # <DESC> is a string describing the function.
+
+    _role = r":(?P<role>\w+):"
+    _funcbacktick = r"`(?P<name>(?:~\w+\.)?[a-zA-Z0-9_\.-]+)`"
+    _funcplain = r"(?P<name2>[a-zA-Z0-9_\.-]+)"
+    _funcname = r"(" + _role + _funcbacktick + r"|" + _funcplain + r")"
+    _funcnamenext = _funcname.replace("role", "rolenext")
+    _funcnamenext = _funcnamenext.replace("name", "namenext")
+    _description = r"(?P<description>\s*:(\s+(?P<desc>\S+.*))?)?\s*$"
+    _func_rgx = re.compile(r"^\s*" + _funcname + r"\s*")
+    _line_rgx = re.compile(
+        r"^\s*"
+        + r"(?P<allfuncs>"
+        + _funcname  # group for all function names
+        + r"(?P<morefuncs>([,]\s+"
+        + _funcnamenext
+        + r")*)"
+        + r")"
+        +  # end of "allfuncs"
+        # Some function lists have a trailing comma (or period)
+        r"(?P<trailing>[,\.])?"
+        + _description
+    )
+
+    # Empty <DESC> elements are replaced with '..'
+    empty_description = ".."

     def _parse_see_also(self, content):
         """
@@ -102,35 +279,280 @@ class NumpyDocString(Mapping):
         another_func_name : Descriptive text
         func_name1, func_name2, :meth:`func_name`, func_name3
         """
-        pass
+
+        items = []
+
+        def parse_item_name(text):
+            """Match ':role:`name`' or 'name'."""
+            m = self._func_rgx.match(text)
+            if not m:
+                raise ParseError(f"{text} is not an item name")
+            role = m.group("role")
+            name = m.group("name") if role else m.group("name2")
+            return name, role, m.end()
+
+        rest = []
+        for line in content:
+            if not line.strip():
+                continue
+
+            line_match = self._line_rgx.match(line)
+            description = None
+            if line_match:
+                description = line_match.group("desc")
+                if line_match.group("trailing") and description:
+                    self._error_location(
+                        "Unexpected comma or period after function list at "
+                        "index %d of line "
+                        '"%s"' % (line_match.end("trailing"), line)
+                    )
+            if not description and line.startswith(" "):
+                rest.append(line.strip())
+            elif line_match:
+                funcs = []
+                text = line_match.group("allfuncs")
+                while True:
+                    if not text.strip():
+                        break
+                    name, role, match_end = parse_item_name(text)
+                    funcs.append((name, role))
+                    text = text[match_end:].strip()
+                    if text and text[0] == ",":
+                        text = text[1:].strip()
+                rest = list(filter(None, [description]))
+                items.append((funcs, rest))
+            else:
+                raise ParseError(f"{line} is not an item name")
+        return items

     def _parse_index(self, section, content):
         """
         .. index: default
            :refguide: something, else, and more
         """
-        pass
+
+        def strip_each_in(lst):
+            return [s.strip() for s in lst]
+
+        out = {}
+        section = section.split("::")
+        if len(section) > 1:
+            out["default"] = strip_each_in(section[1].split(","))[0]
+        for line in content:
+            line = line.split(":")
+            if len(line) > 2:
+                out[line[1]] = strip_each_in(line[2].split(","))
+        return out

     def _parse_summary(self):
         """Grab signature (if given) and summary"""
-        pass
+        if self._is_at_section():
+            return
+
+        # If several signatures present, take the last one
+        while True:
+            summary = self._doc.read_to_next_empty_line()
+            summary_str = " ".join([s.strip() for s in summary]).strip()
+            compiled = re.compile(r"^([\w., ]+=)?\s*[\w\.]+\(.*\)$")
+            if compiled.match(summary_str):
+                self["Signature"] = summary_str
+                if not self._is_at_section():
+                    continue
+            break
+
+        if summary is not None:
+            self["Summary"] = summary
+
+        if not self._is_at_section():
+            self["Extended Summary"] = self._read_to_next_section()
+
+    def _parse(self):
+        self._doc.reset()
+        self._parse_summary()
+
+        sections = list(self._read_sections())
+        section_names = set([section for section, content in sections])
+
+        has_returns = "Returns" in section_names
+        has_yields = "Yields" in section_names
+        # We could do more tests, but we are not. Arbitrarily.
+        if has_returns and has_yields:
+            msg = "Docstring contains both a Returns and Yields section."
+            raise ValueError(msg)
+        if not has_yields and "Receives" in section_names:
+            msg = "Docstring contains a Receives section but not Yields."
+            raise ValueError(msg)
+
+        for (section, content) in sections:
+            if not section.startswith(".."):
+                section = (s.capitalize() for s in section.split(" "))
+                section = " ".join(section)
+                if self.get(section):
+                    self._error_location(
+                        "The section %s appears twice" % section
+                    )
+
+            if section in (
+                "Parameters",
+                "Other Parameters",
+                "Attributes",
+                "Methods",
+            ):
+                self[section] = self._parse_param_list(content)
+            elif section in (
+                "Returns",
+                "Yields",
+                "Raises",
+                "Warns",
+                "Receives",
+            ):
+                self[section] = self._parse_param_list(
+                    content, single_element_is_type=True
+                )
+            elif section.startswith(".. index::"):
+                self["index"] = self._parse_index(section, content)
+            elif section == "See Also":
+                self["See Also"] = self._parse_see_also(content)
+            else:
+                self[section] = content
+
+    def _error_location(self, msg):
+        if hasattr(self, "_obj"):
+            # we know where the docs came from:
+            try:
+                filename = inspect.getsourcefile(self._obj)
+            except TypeError:
+                filename = None
+            msg = msg + (
+                " in the docstring of %s in %s." % (self._obj, filename)
+            )
+
+        raise ValueError(msg)
+
+    # string conversion routines
+
+    def _str_header(self, name, symbol="-"):
+        return [name, len(name) * symbol]
+
+    def _str_indent(self, doc, indent=4):
+        out = []
+        for line in doc:
+            out += [" " * indent + line]
+        return out
+
+    def _str_signature(self):
+        if self["Signature"]:
+            return [self["Signature"].replace("*", r"\*")] + [""]
+        else:
+            return [""]

-    def __str__(self, func_role=''):
+    def _str_summary(self):
+        if self["Summary"]:
+            return self["Summary"] + [""]
+        else:
+            return []
+
+    def _str_extended_summary(self):
+        if self["Extended Summary"]:
+            return self["Extended Summary"] + [""]
+        else:
+            return []
+
+    def _str_param_list(self, name):
+        out = []
+        if self[name]:
+            out += self._str_header(name)
+            for param in self[name]:
+                parts = []
+                if param.name:
+                    parts.append(param.name)
+                if param.type:
+                    parts.append(param.type)
+                out += [" : ".join(parts)]
+                if param.desc and "".join(param.desc).strip():
+                    out += self._str_indent(param.desc)
+            out += [""]
+        return out
+
+    def _str_section(self, name):
+        out = []
+        if self[name]:
+            out += self._str_header(name)
+            out += self[name]
+            out += [""]
+        return out
+
+    def _str_see_also(self, func_role):
+        if not self["See Also"]:
+            return []
+        out = []
+        out += self._str_header("See Also")
+        last_had_desc = True
+        for funcs, desc in self["See Also"]:
+            assert isinstance(funcs, list)
+            links = []
+            for func, role in funcs:
+                if role:
+                    link = ":%s:`%s`" % (role, func)
+                elif func_role:
+                    link = ":%s:`%s`" % (func_role, func)
+                else:
+                    link = "%s" % func
+                links.append(link)
+            link = ", ".join(links)
+            out += [link]
+            if desc:
+                out += self._str_indent([" ".join(desc)])
+                last_had_desc = True
+            else:
+                last_had_desc = False
+                out += self._str_indent([self.empty_description])
+
+        if last_had_desc:
+            out += [""]
+        return out
+
+    def _str_index(self):
+        idx = self["index"]
+        out = []
+        output_index = False
+        default_index = idx.get("default", "")
+        if default_index:
+            output_index = True
+        out += [".. index:: %s" % default_index]
+        for section, references in idx.items():
+            if section == "default":
+                continue
+            output_index = True
+            out += ["   :%s: %s" % (section, ", ".join(references))]
+        if output_index:
+            return out
+        else:
+            return ""
+
+    def __str__(self, func_role=""):
         out = []
         out += self._str_signature()
         out += self._str_summary()
         out += self._str_extended_summary()
-        for param_list in ('Parameters', 'Returns', 'Yields', 'Receives',
-            'Other Parameters', 'Raises', 'Warns'):
+        for param_list in (
+            "Parameters",
+            "Returns",
+            "Yields",
+            "Receives",
+            "Other Parameters",
+            "Raises",
+            "Warns",
+        ):
             out += self._str_param_list(param_list)
-        out += self._str_section('Warnings')
+        out += self._str_section("Warnings")
         out += self._str_see_also(func_role)
-        for s in ('Notes', 'References', 'Examples'):
+        for s in ("Notes", "References", "Examples"):
             out += self._str_section(s)
-        for param_list in ('Attributes', 'Methods'):
+        for param_list in ("Attributes", "Methods"):
             out += self._str_param_list(param_list)
         out += self._str_index()
-        return '\n'.join(out)
+        return "\n".join(out)
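A quick parsing sketch for the class above (the sample docstring is made up; assumes the class is importable as statsmodels.tools.docstring.NumpyDocString):

from statsmodels.tools.docstring import NumpyDocString

sample = """
Add two numbers.

Parameters
----------
a : float
    First term.
b : float
    Second term.
"""
parsed = NumpyDocString(sample)
print([p.name for p in parsed["Parameters"]])  # ['a', 'b']
print(str(parsed))  # re-rendered numpydoc text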


 class Docstring:
@@ -157,7 +579,19 @@ class Docstring:
         parameters : str, list[str]
             The names of the parameters to remove.
         """
-        pass
+        if self._docstring is None:
+            # Protection against -OO execution
+            return
+        if isinstance(parameters, str):
+            parameters = [parameters]
+        repl = [
+            param
+            for param in self._ds["Parameters"]
+            if param.name not in parameters
+        ]
+        if len(repl) + len(parameters) != len(self._ds["Parameters"]):
+            raise ValueError("One or more parameters were not found.")
+        self._ds["Parameters"] = repl

     def insert_parameters(self, after, parameters):
         """
@@ -169,7 +603,24 @@ class Docstring:
         parameters : Parameter, list[Parameter]
             A Parameter of a list of Parameters.
         """
-        pass
+        if self._docstring is None:
+            # Protection against -OO execution
+            return
+        if isinstance(parameters, Parameter):
+            parameters = [parameters]
+        if after is None:
+            self._ds["Parameters"] = parameters + self._ds["Parameters"]
+        else:
+            loc = -1
+            for i, param in enumerate(self._ds["Parameters"]):
+                if param.name == after:
+                    loc = i + 1
+                    break
+            if loc < 0:
+                raise ValueError("Parameter %s was not found." % after)
+            params = self._ds["Parameters"][:loc] + parameters
+            params += self._ds["Parameters"][loc:]
+            self._ds["Parameters"] = params

     def replace_block(self, block_name, block):
         """
@@ -181,7 +632,46 @@ class Docstring:
             The replacement block. The structure of the replacement block must
             match how the block is stored by NumpyDocString.
         """
-        pass
+        if self._docstring is None:
+            # Protection against -OO execution
+            return
+        block_name = " ".join(map(str.capitalize, block_name.split(" ")))
+        if block_name not in self._ds:
+            raise ValueError(
+                "{0} is not a block in the docstring".format(block_name)
+            )
+        if not isinstance(block, list) and isinstance(
+            self._ds[block_name], list
+        ):
+            block = [block]
+        self._ds[block_name] = block
+
+    def extract_parameters(self, parameters, indent=0):
+        """Return the rendered Parameters entries for the named parameters."""
+        if self._docstring is None:
+            # Protection against -OO execution
+            return
+        if isinstance(parameters, str):
+            parameters = [parameters]
+        ds_params = {param.name: param for param in self._ds["Parameters"]}
+        missing = set(parameters).difference(ds_params.keys())
+        if missing:
+            raise ValueError(
+                "{0} were not found in the "
+                "docstring".format(",".join(missing))
+            )
+        final = [ds_params[param] for param in parameters]
+        ds = copy.deepcopy(self._ds)
+        for key in ds:
+            if key != "Parameters":
+                ds[key] = [] if key != "index" else {}
+            else:
+                ds[key] = final
+        out = str(ds).strip()
+        if indent:
+            out = textwrap.indent(out, " " * indent)
+
+        out = "\n".join(out.split("\n")[2:])
+        return out

     def __str__(self):
         return str(self._ds)
@@ -201,7 +691,11 @@ def remove_parameters(docstring, parameters):
     str
         The modified docstring.
     """
-    pass
+    if docstring is None:
+        return
+    ds = Docstring(docstring)
+    ds.remove_parameters(parameters)
+    return str(ds)
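A small end-to-end sketch of remove_parameters (toy docstring; assumes the module is importable as statsmodels.tools.docstring):

from statsmodels.tools.docstring import remove_parameters

doc = """
Summary line.

Parameters
----------
x : int
    First input.
y : int
    Second input.
"""
print(remove_parameters(doc, "y"))  # rendered docstring without the 'y' entry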


 def indent(text, prefix, predicate=None):
@@ -224,4 +718,6 @@ def indent(text, prefix, predicate=None):
     -------

     """
-    pass
+    if text is None:
+        return ""
+    return textwrap.indent(text, prefix, predicate=predicate)
diff --git a/statsmodels/tools/eval_measures.py b/statsmodels/tools/eval_measures.py
index b0ddbf83c..3c5d9db95 100644
--- a/statsmodels/tools/eval_measures.py
+++ b/statsmodels/tools/eval_measures.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """some measures for evaluation of prediction, tests and model selection

 Created on Tue Nov 08 15:23:20 2011
@@ -8,6 +9,7 @@ License: BSD-3

 """
 import numpy as np
+
 from statsmodels.tools.validation import array_like


@@ -34,7 +36,9 @@ def mse(x1, x2, axis=0):
     desired result or not depends on the array subclass, for example
     numpy matrices will silently produce an incorrect result.
     """
-    pass
+    x1 = np.asanyarray(x1)
+    x2 = np.asanyarray(x2)
+    return np.mean((x1 - x2) ** 2, axis=axis)


 def rmse(x1, x2, axis=0):
@@ -60,7 +64,9 @@ def rmse(x1, x2, axis=0):
     desired result or not depends on the array subclass, for example
     numpy matrices will silently produce an incorrect result.
     """
-    pass
+    x1 = np.asanyarray(x1)
+    x2 = np.asanyarray(x2)
+    return np.sqrt(mse(x1, x2, axis=axis))
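A quick numeric check of mse/rmse on toy arrays (the values in the comments are approximate):

import numpy as np
from statsmodels.tools.eval_measures import mse, rmse

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([1.5, 2.0, 2.0])
print(mse(x1, x2))   # mean of squared errors: (0.25 + 0 + 1) / 3, about 0.417
print(rmse(x1, x2))  # square root of the above, about 0.645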


 def rmspe(y, y_hat, axis=0, zeros=np.nan):
@@ -83,7 +89,15 @@ def rmspe(y, y_hat, axis=0, zeros=np.nan):
     rmspe : ndarray or float
        Root Mean Squared Percentage Error along given axis.
     """
-    pass
+    y_hat = np.asarray(y_hat)
+    y = np.asarray(y)
+    error = y - y_hat
+    loc = y != 0
+    loc = loc.ravel()
+    percentage_error = np.full_like(error, zeros)
+    percentage_error.flat[loc] = error.flat[loc] / y.flat[loc]
+    mspe = np.nanmean(percentage_error ** 2, axis=axis) * 100
+    return np.sqrt(mspe)


 def maxabs(x1, x2, axis=0):
@@ -108,7 +122,9 @@ def maxabs(x1, x2, axis=0):
     This uses ``numpy.asanyarray`` to convert the input. Whether this is the
     desired result or not depends on the array subclass.
     """
-    pass
+    x1 = np.asanyarray(x1)
+    x2 = np.asanyarray(x2)
+    return np.max(np.abs(x1 - x2), axis=axis)


 def meanabs(x1, x2, axis=0):
@@ -133,7 +149,9 @@ def meanabs(x1, x2, axis=0):
     This uses ``numpy.asanyarray`` to convert the input. Whether this is the
     desired result or not depends on the array subclass.
     """
-    pass
+    x1 = np.asanyarray(x1)
+    x2 = np.asanyarray(x2)
+    return np.mean(np.abs(x1 - x2), axis=axis)


 def medianabs(x1, x2, axis=0):
@@ -158,7 +176,9 @@ def medianabs(x1, x2, axis=0):
     This uses ``numpy.asanyarray`` to convert the input. Whether this is the
     desired result or not depends on the array subclass.
     """
-    pass
+    x1 = np.asanyarray(x1)
+    x2 = np.asanyarray(x2)
+    return np.median(np.abs(x1 - x2), axis=axis)


 def bias(x1, x2, axis=0):
@@ -183,7 +203,9 @@ def bias(x1, x2, axis=0):
     This uses ``numpy.asanyarray`` to convert the input. Whether this is the
     desired result or not depends on the array subclass.
     """
-    pass
+    x1 = np.asanyarray(x1)
+    x2 = np.asanyarray(x2)
+    return np.mean(x1 - x2, axis=axis)


 def medianbias(x1, x2, axis=0):
@@ -208,7 +230,9 @@ def medianbias(x1, x2, axis=0):
     This uses ``numpy.asanyarray`` to convert the input. Whether this is the
     desired result or not depends on the array subclass.
     """
-    pass
+    x1 = np.asanyarray(x1)
+    x2 = np.asanyarray(x2)
+    return np.median(x1 - x2, axis=axis)


 def vare(x1, x2, ddof=0, axis=0):
@@ -233,7 +257,9 @@ def vare(x1, x2, ddof=0, axis=0):
     This uses ``numpy.asanyarray`` to convert the input. Whether this is the
     desired result or not depends on the array subclass.
     """
-    pass
+    x1 = np.asanyarray(x1)
+    x2 = np.asanyarray(x2)
+    return np.var(x1 - x2, ddof=ddof, axis=axis)


 def stde(x1, x2, ddof=0, axis=0):
@@ -258,7 +284,9 @@ def stde(x1, x2, ddof=0, axis=0):
     This uses ``numpy.asanyarray`` to convert the input. Whether this is the
     desired result or not depends on the array subclass.
     """
-    pass
+    x1 = np.asanyarray(x1)
+    x2 = np.asanyarray(x2)
+    return np.std(x1 - x2, ddof=ddof, axis=axis)


 def iqr(x1, x2, axis=0):
@@ -283,7 +311,24 @@ def iqr(x1, x2, axis=0):
     -----
     If ``x1`` and ``x2`` have different shapes, then they must broadcast.
     """
-    pass
+    x1 = array_like(x1, "x1", dtype=None, ndim=None)
+    x2 = array_like(x2, "x2", dtype=None, ndim=None)
+    if axis is None:
+        x1 = x1.ravel()
+        x2 = x2.ravel()
+        axis = 0
+    xdiff = np.sort(x1 - x2, axis=axis)
+    nobs = x1.shape[axis]
+    idx = np.round((nobs - 1) * np.array([0.25, 0.75])).astype(int)
+    sl = [slice(None)] * xdiff.ndim
+    sl[axis] = idx
+    iqr = np.diff(xdiff[tuple(sl)], axis=axis)
+    iqr = np.squeeze(iqr)  # drop reduced dimension
+    return iqr
+
+
+# Information Criteria
+# ---------------------


 def aic(llf, nobs, df_modelwc):
@@ -308,7 +353,7 @@ def aic(llf, nobs, df_modelwc):
     ----------
     https://en.wikipedia.org/wiki/Akaike_information_criterion
     """
-    pass
+    return -2.0 * llf + 2.0 * df_modelwc


 def aicc(llf, nobs, df_modelwc):
@@ -338,7 +383,11 @@ def aicc(llf, nobs, df_modelwc):
     Returns +inf if the effective degrees of freedom, defined as
     ``nobs - df_modelwc - 1.0``, is <= 0.
     """
-    pass
+    dof_eff = nobs - df_modelwc - 1.0
+    if dof_eff > 0:
+        return -2.0 * llf + 2.0 * df_modelwc * nobs / dof_eff
+    else:
+        return np.inf


 def bic(llf, nobs, df_modelwc):
@@ -363,7 +412,7 @@ def bic(llf, nobs, df_modelwc):
     ----------
     https://en.wikipedia.org/wiki/Bayesian_information_criterion
     """
-    pass
+    return -2.0 * llf + np.log(nobs) * df_modelwc


 def hqic(llf, nobs, df_modelwc):
@@ -388,11 +437,14 @@ def hqic(llf, nobs, df_modelwc):
     ----------
     Wikipedia does not say much
     """
-    pass
+    return -2.0 * llf + 2 * np.log(np.log(nobs)) * df_modelwc
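An illustrative comparison of the four likelihood-based criteria for a hypothetical fit (llf, nobs and the parameter count are made up; values in the comments are approximate):

from statsmodels.tools.eval_measures import aic, aicc, bic, hqic

llf, nobs, k = -50.5, 100, 3
print(aic(llf, nobs, k))   # -2*llf + 2*k = 107.0
print(aicc(llf, nobs, k))  # 101.0 + 2*3*100/96, about 107.25
print(bic(llf, nobs, k))   # 101.0 + log(100)*3, about 114.82
print(hqic(llf, nobs, k))  # 101.0 + 2*log(log(100))*3, about 110.16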
+
+
+# IC based on residual sigma


 def aic_sigma(sigma2, nobs, df_modelwc, islog=False):
-    """
+    r"""
     Akaike information criterion

     Parameters
@@ -421,13 +473,13 @@ def aic_sigma(sigma2, nobs, df_modelwc, islog=False):

     :math:`-2 llf + 2 k`

-    in terms of :math:`\\hat{\\sigma}^2`
+    in terms of :math:`\hat{\sigma}^2`

-    :math:`log(\\hat{\\sigma}^2) + 2 k / n`
+    :math:`log(\hat{\sigma}^2) + 2 k / n`

-    in terms of the determinant of :math:`\\hat{\\Sigma}`
+    in terms of the determinant of :math:`\hat{\Sigma}`

-    :math:`log(\\|\\hat{\\Sigma}\\|) + 2 k / n`
+    :math:`log(\|\hat{\Sigma}\|) + 2 k / n`

     Note: In our definition we do not divide by n in the log-likelihood
     version.
@@ -443,7 +495,9 @@ def aic_sigma(sigma2, nobs, df_modelwc, islog=False):
     ----------
     https://en.wikipedia.org/wiki/Akaike_information_criterion
     """
-    pass
+    if not islog:
+        sigma2 = np.log(sigma2)
+    return sigma2 + aic(0, nobs, df_modelwc) / nobs
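A short sketch of the sigma2-based variant; passing a pre-logged variance with islog=True should give the same value (toy numbers):

import numpy as np
from statsmodels.tools.eval_measures import aic_sigma

sigma2, nobs, k = 2.5, 100, 3
print(aic_sigma(sigma2, nobs, k))                      # log(2.5) + 2*3/100, about 0.976
print(aic_sigma(np.log(sigma2), nobs, k, islog=True))  # same value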


 def aicc_sigma(sigma2, nobs, df_modelwc, islog=False):
@@ -476,7 +530,9 @@ def aicc_sigma(sigma2, nobs, df_modelwc, islog=False):
     ----------
     https://en.wikipedia.org/wiki/Akaike_information_criterion#AICc
     """
-    pass
+    if not islog:
+        sigma2 = np.log(sigma2)
+    return sigma2 + aicc(0, nobs, df_modelwc) / nobs


 def bic_sigma(sigma2, nobs, df_modelwc, islog=False):
@@ -508,7 +564,9 @@ def bic_sigma(sigma2, nobs, df_modelwc, islog=False):
     ----------
     https://en.wikipedia.org/wiki/Bayesian_information_criterion
     """
-    pass
+    if not islog:
+        sigma2 = np.log(sigma2)
+    return sigma2 + bic(0, nobs, df_modelwc) / nobs


 def hqic_sigma(sigma2, nobs, df_modelwc, islog=False):
@@ -540,9 +598,34 @@ def hqic_sigma(sigma2, nobs, df_modelwc, islog=False):
     ----------
     xxx
     """
-    pass
-
-
-__all__ = [maxabs, meanabs, medianabs, medianbias, mse, rmse, rmspe, stde,
-    vare, aic, aic_sigma, aicc, aicc_sigma, bias, bic, bic_sigma, hqic,
-    hqic_sigma, iqr]
+    if not islog:
+        sigma2 = np.log(sigma2)
+    return sigma2 + hqic(0, nobs, df_modelwc) / nobs
+
+
+# from var_model.py, VAR only? separates neqs and k_vars per equation
+# def fpe_sigma():
+#     ((nobs + self.df_model) / self.df_resid) ** neqs * np.exp(ld)
+
+
+__all__ = [
+    maxabs,
+    meanabs,
+    medianabs,
+    medianbias,
+    mse,
+    rmse,
+    rmspe,
+    stde,
+    vare,
+    aic,
+    aic_sigma,
+    aicc,
+    aicc_sigma,
+    bias,
+    bic,
+    bic_sigma,
+    hqic,
+    hqic_sigma,
+    iqr,
+]
diff --git a/statsmodels/tools/grouputils.py b/statsmodels/tools/grouputils.py
index 31909f0ac..4a55e4182 100644
--- a/statsmodels/tools/grouputils.py
+++ b/statsmodels/tools/grouputils.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Tools for working with groups

 This provides several functions to work with groups and a Group class that
@@ -29,6 +30,7 @@ need more efficient loop if groups are sorted -> see GroupSorted.group_iter
 from statsmodels.compat.python import lrange, lzip
 import numpy as np
 import pandas as pd
+
 import statsmodels.tools.data as data_util
 from pandas import Index, MultiIndex

@@ -36,9 +38,46 @@ from pandas import Index, MultiIndex
 def combine_indices(groups, prefix='', sep='.', return_labels=False):
     """use np.unique to get integer group indices for product, intersection
     """
-    pass
+    if isinstance(groups, tuple):
+        groups = np.column_stack(groups)
+    else:
+        groups = np.asarray(groups)
+
+    dt = groups.dtype
+
+    is2d = (groups.ndim == 2)  # need to store
+
+    if is2d:
+        ncols = groups.shape[1]
+        if not groups.flags.c_contiguous:
+            groups = np.array(groups, order='C')
+
+        groups_ = groups.view([('', groups.dtype)] * groups.shape[1])
+    else:
+        groups_ = groups
+
+    uni, uni_idx, uni_inv = np.unique(groups_, return_index=True,
+                                      return_inverse=True)
+
+    if is2d:
+        uni = uni.view(dt).reshape(-1, ncols)

+        # avoiding a view would be
+        # for t in uni.dtype.fields.values():
+        #     assert (t[0] == dt)
+        #
+        # uni.dtype = dt
+        # uni.shape = (uni.size//ncols, ncols)

+    if return_labels:
+        label = [(prefix+sep.join(['%s']*len(uni[0]))) % tuple(ii)
+                 for ii in uni]
+        return uni_inv, uni_idx, uni, label
+    else:
+        return uni_inv, uni_idx, uni
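A tiny illustration of combine_indices on a two-column group array (made-up labels; the exact output formatting may differ by NumPy version):

import numpy as np
from statsmodels.tools.grouputils import combine_indices

g = np.array([[0, 1], [0, 1], [1, 0], [1, 0], [0, 1]])
uni_inv, uni_idx, uni = combine_indices(g)
print(uni_inv)  # integer code per row, e.g. [0 0 1 1 0]
print(uni)      # the unique rows, [[0 1], [1 0]]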
+
+
+# written for and used in try_covariance_grouploop.py
 def group_sums(x, group, use_bincount=True):
     """simple bincount version, again

@@ -51,7 +90,26 @@ def group_sums(x, group, use_bincount=True):

     for comparison, simple python loop
     """
-    pass
+    x = np.asarray(x)
+    if x.ndim == 1:
+        x = x[:, None]
+    elif x.ndim > 2 and use_bincount:
+        raise ValueError('not implemented yet')
+
+    if use_bincount:
+
+        # re-label groups or bincount takes too much memory
+        if np.max(group) > 2 * x.shape[0]:
+            group = pd.factorize(group)[0]
+
+        return np.array([np.bincount(group, weights=x[:, col])
+                         for col in range(x.shape[1])])
+    else:
+        uniques = np.unique(group)
+        result = np.zeros([len(uniques)] + list(x.shape[1:]))
+        for ii, cat in enumerate(uniques):
+            result[ii] = x[group == cat].sum(0)
+        return result
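A small sketch of group_sums with integer labels (toy data):

import numpy as np
from statsmodels.tools.grouputils import group_sums

x = np.array([1.0, 2.0, 3.0, 4.0])
labels = np.array([0, 0, 1, 1])
print(group_sums(x, labels))  # [[3. 7.]], one row per column of x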


 def group_sums_dummy(x, group_dummy):
@@ -59,9 +117,14 @@ def group_sums_dummy(x, group_dummy):

     group_dummy can be either ndarray or sparse matrix
     """
-    pass
+    if data_util._is_using_ndarray_type(group_dummy, None):
+        return np.dot(x.T, group_dummy)
+    else:  # check for sparse
+        return x.T * group_dummy


+# TODO: See if this can be entirely replaced by Grouping.dummy_sparse;
+#  see GH#5687
 def dummy_sparse(groups):
     """create a sparse indicator from a group array with integer labels

@@ -108,37 +171,98 @@ def dummy_sparse(groups):
             [0, 0, 1],
             [1, 0, 0]], dtype=int8)
     """
-    pass
+    from scipy import sparse
+
+    indptr = np.arange(len(groups)+1)
+    data = np.ones(len(groups), dtype=np.int8)
+    indi = sparse.csr_matrix((data, groups, indptr))
+
+    return indi


 class Group:

     def __init__(self, group, name=''):
+
+        # self.group = np.asarray(group)  # TODO: use checks in combine_indices
         self.name = name
         uni, uni_idx, uni_inv = combine_indices(group)
+
+        # TODO: rename these to something easier to remember
         self.group_int, self.uni_idx, self.uni = uni, uni_idx, uni_inv
+
         self.n_groups = len(self.uni)
+
+        # put this here so they can be overwritten before calling labels
         self.separator = '.'
         self.prefix = self.name
         if self.prefix:
             self.prefix = self.prefix + '='

+    # cache decorator
+    def counts(self):
+        return np.bincount(self.group_int)
+
+    # cache_decorator
+    def labels(self):
+        # is this only needed for product of groups (intersection)?
+        prefix = self.prefix
+        uni = self.uni
+        sep = self.separator
+
+        if uni.ndim > 1:
+            label = [(prefix+sep.join(['%s']*len(uni[0]))) % tuple(ii)
+                     for ii in uni]
+        else:
+            label = [prefix + '%s' % ii for ii in uni]
+        return label
+
     def dummy(self, drop_idx=None, sparse=False, dtype=int):
         """
         drop_idx is only available if sparse=False

         drop_idx is supposed to index into uni
         """
-        pass
+        uni = self.uni
+        if drop_idx is not None:
+            idx = lrange(len(uni))
+            del idx[drop_idx]
+            uni = uni[idx]

+        group = self.group

-class GroupSorted(Group):
+        if not sparse:
+            return (group[:, None] == uni[None, :]).astype(dtype)
+        else:
+            return dummy_sparse(self.group_int)
+
+    def interaction(self, other):
+        if isinstance(other, self.__class__):
+            other = other.group
+        return self.__class__((self, other))

+    def group_sums(self, x, use_bincount=True):
+        return group_sums(x, self.group_int, use_bincount=use_bincount)
+
+    def group_demean(self, x, use_bincount=True):
+        nobs = float(len(x))
+        means_g = group_sums(x / nobs, self.group_int,
+                             use_bincount=use_bincount)
+        x_demeaned = x - means_g[self.group_int]  # check reverse_index?
+        return x_demeaned, means_g
+
+
+class GroupSorted(Group):
     def __init__(self, group, name=''):
         super(self.__class__, self).__init__(group, name=name)
-        idx = (np.nonzero(np.diff(group))[0] + 1).tolist()
+
+        idx = (np.nonzero(np.diff(group))[0]+1).tolist()
         self.groupidx = lzip([0] + idx, idx + [len(group)])

+    def group_iter(self):
+        for low, upp in self.groupidx:
+            yield slice(low, upp)
+
     def lag_indices(self, lag):
         """return the index array for lagged values

@@ -154,7 +278,11 @@ class GroupSorted(Group):

         not tested yet
         """
-        pass
+        lag_idx = np.asarray(self.groupidx)[:, 1] - lag  # asarray or already?
+        mask_ok = (lag <= lag_idx)
+        # still an observation that belongs to the same individual
+
+        return lag_idx[mask_ok]


 def _is_hierarchical(x):
@@ -162,11 +290,25 @@ def _is_hierarchical(x):
     Checks if the first item of an array-like object is also array-like
     If so, we have a MultiIndex and returns True. Else returns False.
     """
-    pass
+    item = x[0]
+    # is there a better way to do this?
+    if isinstance(item, (list, tuple, np.ndarray, pd.Series, pd.DataFrame)):
+        return True
+    else:
+        return False


-class Grouping:
+def _make_hierarchical_index(index, names):
+    return MultiIndex.from_tuples(*[index], names=names)
+
+
+def _make_generic_names(index):
+    n_names = len(index.names)
+    pad = str(len(str(n_names)))  # number of digits
+    return [("group{0:0"+pad+"}").format(i) for i in range(n_names)]

+
+class Grouping:
     def __init__(self, index, names=None):
         """
         index : index-like
@@ -184,12 +326,12 @@ class Grouping:
         """
         if isinstance(index, (Index, MultiIndex)):
             if names is not None:
-                if hasattr(index, 'set_names'):
+                if hasattr(index, 'set_names'):  # newer pandas
                     index.set_names(names, inplace=True)
                 else:
                     index.names = names
             self.index = index
-        else:
+        else:  # array_like
             if _is_hierarchical(index):
                 self.index = _make_hierarchical_index(index, names)
             else:
@@ -200,15 +342,49 @@ class Grouping:
                     self.index.set_names(names, inplace=True)
                 else:
                     self.index.names = names
+
         self.nobs = len(self.index)
         self.nlevels = len(self.index.names)
         self.slices = None

+    @property
+    def index_shape(self):
+        if hasattr(self.index, 'levshape'):
+            return self.index.levshape
+        else:
+            return self.index.shape
+
+    @property
+    def levels(self):
+        if hasattr(self.index, 'levels'):
+            return self.index.levels
+        else:
+            return pd.Categorical(self.index).levels
+
+    @property
+    def labels(self):
+        # this was index_int, but that's not a very good name...
+        codes = getattr(self.index, 'codes', None)
+        if codes is None:
+            if hasattr(self.index, 'labels'):
+                codes = self.index.labels
+            else:
+                codes = pd.Categorical(self.index).codes[None]
+        return codes
+
+    @property
+    def group_names(self):
+        return self.index.names
+
     def reindex(self, index=None, names=None):
         """
         Resets the index in-place.
         """
-        pass
+        # NOTE: this is not of much use if the rest of the data does not change
+        # This needs to reset cache
+        if names is None:
+            names = self.group_names
+        self = Grouping(index, names)

     def get_slices(self, level=0):
         """
@@ -216,18 +392,36 @@ class Grouping:
         groups for the first index level. I.e., self.slices[0] is the
         index where each observation is in the first (sorted) group.
         """
-        pass
+        # TODO: refactor this
+        groups = self.index.get_level_values(level).unique()
+        groups = np.array(groups)
+        groups.sort()
+        if isinstance(self.index, MultiIndex):
+            self.slices = [self.index.get_loc_level(x, level=level)[0]
+                           for x in groups]
+        else:
+            self.slices = [self.index.get_loc(x) for x in groups]

     def count_categories(self, level=0):
         """
         Sets the attribute counts to equal the bincount of the (integer-valued)
         labels.
         """
-        pass
+        # TODO: refactor this not to set an attribute. Why would we do this?
+        self.counts = np.bincount(self.labels[level])

     def check_index(self, is_sorted=True, unique=True, index=None):
         """Sanity checks"""
-        pass
+        if not index:
+            index = self.index
+        if is_sorted:
+            test = pd.DataFrame(lrange(len(index)), index=index)
+            test_sorted = test.sort_index()
+            if not test.index.equals(test_sorted.index):
+                raise Exception('Data is not sorted')
+        if unique:
+            if len(index) != len(index.unique()):
+                raise Exception('Duplicate index entries')

     def sort(self, data, index=None):
         """Applies a (potentially hierarchical) sort operation on a numpy array
@@ -235,24 +429,74 @@ class Grouping:
         user-supplied index.  Returns an object of the same type as the
         original data as well as the matching (sorted) Pandas index.
         """
-        pass
+
+        if index is None:
+            index = self.index
+        if data_util._is_using_ndarray_type(data, None):
+            if data.ndim == 1:
+                out = pd.Series(data, index=index, copy=True)
+                out = out.sort_index()
+            else:
+                out = pd.DataFrame(data, index=index)
+                out = out.sort_index(inplace=False)  # copies
+            return np.array(out), out.index
+        elif data_util._is_using_pandas(data, None):
+            out = data
+            out = out.reindex(index)  # copies?
+            out = out.sort_index()
+            return out, out.index
+        else:
+            msg = 'data must be a Numpy array or a Pandas Series/DataFrame'
+            raise ValueError(msg)

     def transform_dataframe(self, dataframe, function, level=0, **kwargs):
         """Apply function to each column, by group
         Assumes that the dataframe already has a proper index"""
-        pass
+        if dataframe.shape[0] != self.nobs:
+            raise Exception('dataframe does not have the same shape as index')
+        out = dataframe.groupby(level=level).apply(function, **kwargs)
+        if 1 in out.shape:
+            return np.ravel(out)
+        else:
+            return np.array(out)

     def transform_array(self, array, function, level=0, **kwargs):
         """Apply function to each column, by group
         """
-        pass
+        if array.shape[0] != self.nobs:
+            raise Exception('array does not have the same shape as index')
+        dataframe = pd.DataFrame(array, index=self.index)
+        return self.transform_dataframe(dataframe, function, level=level,
+                                        **kwargs)

     def transform_slices(self, array, function, level=0, **kwargs):
         """Apply function to each group. Similar to transform_array but does
         not coerce array to a DataFrame and back and only works on a 1D or 2D
         numpy array. function is called function(group, group_idx, **kwargs).
         """
-        pass
+        array = np.asarray(array)
+        if array.shape[0] != self.nobs:
+            raise Exception('array does not have the same shape as index')
+        # always reset because level is given. need to refactor this.
+        self.get_slices(level=level)
+        processed = []
+        for s in self.slices:
+            if array.ndim == 2:
+                subset = array[s, :]
+            elif array.ndim == 1:
+                subset = array[s]
+            processed.append(function(subset, s, **kwargs))
+        processed = np.array(processed)
+        return processed.reshape(-1, processed.shape[-1])
+
+    # TODO: this is not general needs to be a PanelGrouping object
+    def dummies_time(self):
+        self.dummy_sparse(level=1)
+        return self._dummies
+
+    def dummies_groups(self, level=0):
+        self.dummy_sparse(level=level)
+        return self._dummies

     def dummy_sparse(self, level=0):
         """create a sparse indicator from a group array with integer labels
@@ -301,4 +545,5 @@ class Grouping:
                 [0, 0, 1],
                 [1, 0, 0]], dtype=int8)
         """
-        pass
+        indi = dummy_sparse(self.labels[level])
+        self._dummies = indi
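A short sketch of Grouping on a two-level panel index (made-up firms and years):

import pandas as pd
from statsmodels.tools.grouputils import Grouping

idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["firm", "year"])
g = Grouping(idx)
print(g.nobs, g.nlevels)   # 4 2
g.count_categories(level=0)
print(g.counts)            # [2 2], observations per firm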
diff --git a/statsmodels/tools/linalg.py b/statsmodels/tools/linalg.py
index 931950783..4bb520340 100644
--- a/statsmodels/tools/linalg.py
+++ b/statsmodels/tools/linalg.py
@@ -2,8 +2,9 @@
 Linear Algebra solvers and other helpers
 """
 import numpy as np
-__all__ = ['logdet_symm', 'stationary_solve', 'transf_constraints',
-    'matrix_sqrt']
+
+__all__ = ["logdet_symm", "stationary_solve", "transf_constraints",
+           "matrix_sqrt"]


 def logdet_symm(m, check_symm=False):
@@ -20,7 +21,12 @@ def logdet_symm(m, check_symm=False):
     logdet : float
         The log-determinant of m.
     """
-    pass
+    from scipy import linalg
+    if check_symm:
+        if not np.all(m == m.T):  # would be nice to short-circuit check
+            raise ValueError("m is not symmetric.")
+    c, _ = linalg.cho_factor(m, lower=True)
+    return 2*np.sum(np.log(c.diagonal()))
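A quick check of logdet_symm against np.linalg.slogdet on a small SPD matrix (toy values):

import numpy as np
from statsmodels.tools.linalg import logdet_symm

m = np.array([[2.0, 0.5], [0.5, 1.0]])
print(logdet_symm(m))           # about 0.5596
print(np.linalg.slogdet(m)[1])  # same value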


 def stationary_solve(r, b):
@@ -43,7 +49,32 @@ def stationary_solve(r, b):
     -------
     The solution to the linear system.
     """
-    pass
+
+    db = r[0:1]
+
+    dim = b.ndim
+    if b.ndim == 1:
+        b = b[:, None]
+    x = b[0:1, :]
+
+    for j in range(1, len(b)):
+        rf = r[0:j][::-1]
+        a = (b[j, :] - np.dot(rf, x)) / (1 - np.dot(rf, db[::-1]))
+        z = x - np.outer(db[::-1], a)
+        x = np.concatenate((z, a[None, :]), axis=0)
+
+        if j == len(b) - 1:
+            break
+
+        rn = r[j]
+        a = (rn - np.dot(rf, db)) / (1 - np.dot(rf, db[::-1]))
+        z = db - a*db[::-1]
+        db = np.concatenate((z, np.r_[a]))
+
+    if dim == 1:
+        x = x[:, 0]
+
+    return x
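A sketch of stationary_solve, checked against an explicit Toeplitz solve; here r holds the off-diagonal correlations of a unit-diagonal Toeplitz matrix (made-up values):

import numpy as np
from scipy.linalg import toeplitz
from statsmodels.tools.linalg import stationary_solve

r = np.array([0.5, 0.25, 0.125])
b = np.array([1.0, 2.0, 3.0, 4.0])
T = toeplitz(np.r_[1.0, r])
x = stationary_solve(r, b)
print(np.allclose(T @ x, b))  # True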


 def transf_constraints(constraints):
@@ -73,11 +104,17 @@ def transf_constraints(constraints):
     statsmodels.base._constraints.TransformRestriction : class to impose
         constraints by reparameterization used by `_fit_constrained`.
     """
-    pass
+
+    from scipy import linalg
+
+    m = constraints.shape[0]
+    q, _ = linalg.qr(np.transpose(constraints))
+    transf = q[:, m:]
+    return transf


-def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, threshold=
-    1e-15):
+def matrix_sqrt(mat, inverse=False, full=False, nullspace=False,
+                threshold=1e-15):
     """matrix square root for symmetric matrices

     Usage is for decomposing a covariance function S into a square root R
@@ -113,4 +150,25 @@ def matrix_sqrt(mat, inverse=False, full=False, nullspace=False, threshold=
     msqrt : ndarray
         matrix square root or square root of inverse matrix.
     """
-    pass
+    # see also scipy.linalg null_space
+    u, s, v = np.linalg.svd(mat)
+    if np.any(s < -threshold):
+        import warnings
+        warnings.warn('some singular values are negative')
+
+    if not nullspace:
+        mask = s > threshold
+        s[s < threshold] = 0
+    else:
+        mask = s < threshold
+        s[s > threshold] = 0
+
+    sqrt_s = np.sqrt(s[mask])
+    if inverse:
+        sqrt_s = 1 / np.sqrt(s[mask])
+
+    if full:
+        b = np.dot(u[:, mask], np.dot(np.diag(sqrt_s), v[mask]))
+    else:
+        b = np.dot(np.diag(sqrt_s), v[mask])
+    return b
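A minimal check that the default square root R satisfies R'R = mat (toy SPD matrix):

import numpy as np
from statsmodels.tools.linalg import matrix_sqrt

mat = np.array([[4.0, 2.0], [2.0, 3.0]])
r = matrix_sqrt(mat)
print(np.allclose(r.T @ r, mat))  # True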
diff --git a/statsmodels/tools/numdiff.py b/statsmodels/tools/numdiff.py
index 63d12a0bf..c77df81b4 100644
--- a/statsmodels/tools/numdiff.py
+++ b/statsmodels/tools/numdiff.py
@@ -12,9 +12,44 @@ without dependencies.
   observations.
 * numerical precision will vary and depend on the choice of stepsizes
 """
+
+# TODO:
+# * some cleanup
+# * check numerical accuracy (and bugs) with numdifftools and analytical
+#   derivatives
+#   - linear least squares case: (hess - 2*X'X) is 1e-8 or so
+#   - gradient and Hessian agree with numdifftools when evaluated away from
+#     minimum
+#   - forward gradient, Jacobian evaluated at minimum is inaccurate, centered
+#     (+/- epsilon) is ok
+# * dot product of Jacobian is different from Hessian, either wrong example or
+#   a bug (unlikely), or a real difference
+#
+#
+# What are the conditions that Jacobian dotproduct and Hessian are the same?
+#
+# See also:
+#
+# BHHH: Greene p481 17.4.6,  MLE Jacobian = d loglike / d beta , where loglike
+# is vector for each observation
+#    see also example 17.4 when J'J is very different from Hessian
+#    also does it hold only at the minimum, what's relationship to covariance
+#    of Jacobian matrix
+# http://projects.scipy.org/scipy/ticket/1157
+# https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm
+#    objective: sum((y-f(beta,x)**2),   Jacobian = d f/d beta
+#    and not d objective/d beta as in MLE Greene
+#    similar: http://crsouza.blogspot.com/2009/11/neural-network-learning-by-levenberg_18.html#hessian
+#
+# in example: if J = d x*beta / d beta then J'J == X'X
+#    similar to https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm
 import numpy as np
+
 from statsmodels.compat.pandas import Appender, Substitution
+
+# NOTE: we only do double precision internally so far
 EPS = np.finfo(float).eps
+
 _hessian_docs = """
     Calculate Hessian with finite difference derivative approximation

@@ -56,8 +91,23 @@ _hessian_docs = """
 """


+def _get_epsilon(x, s, epsilon, n):
+    if epsilon is None:
+        h = EPS**(1. / s) * np.maximum(np.abs(np.asarray(x)), 0.1)
+    else:
+        if np.isscalar(epsilon):
+            h = np.empty(n)
+            h.fill(epsilon)
+        else:  # pragma : no cover
+            h = np.asarray(epsilon)
+            if h.shape != x.shape:
+                raise ValueError("If h is not a scalar it must have the same"
+                                 " shape as x.")
+    return np.asarray(h)
+
+
 def approx_fprime(x, f, epsilon=None, args=(), kwargs={}, centered=False):
-    """
+    '''
     Gradient of function, or Jacobian if function f returns 1d array

     Parameters
@@ -88,13 +138,35 @@ def approx_fprime(x, f, epsilon=None, args=(), kwargs={}, centered=False):
     by f (e.g., with a value for each observation), it returns a 3d array
     with the Jacobian of each observation with shape xk x nobs x xk. I.e.,
     the Jacobian of the first observation would be [:, 0, :]
-    """
-    pass
-
-
-def _approx_fprime_scalar(x, f, epsilon=None, args=(), kwargs={}, centered=
-    False):
-    """
+    '''
+    n = len(x)
+    f0 = f(*((x,)+args), **kwargs)
+    dim = np.atleast_1d(f0).shape  # it could be a scalar
+    grad = np.zeros((n,) + dim, np.promote_types(float, x.dtype))
+    ei = np.zeros((n,), float)
+    if not centered:
+        epsilon = _get_epsilon(x, 2, epsilon, n)
+        for k in range(n):
+            ei[k] = epsilon[k]
+            grad[k, :] = (f(*((x+ei,) + args), **kwargs) - f0)/epsilon[k]
+            ei[k] = 0.0
+    else:
+        epsilon = _get_epsilon(x, 3, epsilon, n) / 2.
+        for k in range(n):
+            ei[k] = epsilon[k]
+            grad[k, :] = (f(*((x+ei,)+args), **kwargs) -
+                          f(*((x-ei,)+args), **kwargs))/(2 * epsilon[k])
+            ei[k] = 0.0
+
+    if n == 1:
+        return grad.T
+    else:
+        return grad.squeeze().T
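A toy gradient check for approx_fprime; the exact gradient of sum(x**2) is 2*x:

import numpy as np
from statsmodels.tools.numdiff import approx_fprime

def f(x):
    return np.sum(x ** 2)

x0 = np.array([1.0, -2.0])
print(approx_fprime(x0, f))                 # approximately [ 2., -4.]
print(approx_fprime(x0, f, centered=True))  # usually closer to the exact value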
+
+
+def _approx_fprime_scalar(x, f, epsilon=None, args=(), kwargs={},
+                          centered=False):
+    '''
     Gradient of function vectorized for scalar parameter.

     This assumes that the function ``f`` is vectorized for a scalar parameter.
@@ -122,12 +194,24 @@ def _approx_fprime_scalar(x, f, epsilon=None, args=(), kwargs={}, centered=
     -------
     grad : ndarray
         Array of derivatives, gradient evaluated at parameters ``x``.
-    """
-    pass
+    '''
+    x = np.asarray(x)
+    n = 1
+
+    f0 = f(*((x,)+args), **kwargs)
+    if not centered:
+        eps = _get_epsilon(x, 2, epsilon, n)
+        grad = (f(*((x+eps,) + args), **kwargs) - f0) / eps
+    else:
+        eps = _get_epsilon(x, 3, epsilon, n) / 2.
+        grad = (f(*((x+eps,)+args), **kwargs) -
+                f(*((x-eps,)+args), **kwargs)) / (2 * eps)
+
+    return grad


 def approx_fprime_cs(x, f, epsilon=None, args=(), kwargs={}):
-    """
+    '''
     Calculate gradient or Jacobian with complex step derivative approximation

     Parameters
@@ -155,12 +239,23 @@ def approx_fprime_cs(x, f, epsilon=None, args=(), kwargs={}):
     truncation error can be eliminated by choosing epsilon to be very small.
     The complex-step derivative avoids the problem of round-off error with
     small epsilon because there is no subtraction.
-    """
-    pass
+    '''
+    # From Guilherme P. de Freitas, numpy mailing list
+    # May 04 2010 thread "Improvement of performance"
+    # http://mail.scipy.org/pipermail/numpy-discussion/2010-May/050250.html
+    n = len(x)
+
+    epsilon = _get_epsilon(x, 1, epsilon, n)
+    increments = np.identity(n) * 1j * epsilon
+    # TODO: see if this can be vectorized, but usually dim is small
+    partials = [f(x+ih, *args, **kwargs).imag / epsilon[i]
+                for i, ih in enumerate(increments)]
+
+    return np.array(partials).T


 def _approx_fprime_cs_scalar(x, f, epsilon=None, args=(), kwargs={}):
-    """
+    '''
     Calculate gradient for scalar parameter with complex step derivatives.

     This assumes that the function ``f`` is vectorized for a scalar parameter.
@@ -192,12 +287,22 @@ def _approx_fprime_cs_scalar(x, f, epsilon=None, args=(), kwargs={}):
     truncation error can be eliminated by choosing epsilon to be very small.
     The complex-step derivative avoids the problem of round-off error with
     small epsilon because there is no subtraction.
-    """
-    pass
+    '''
+    # From Guilherme P. de Freitas, numpy mailing list
+    # May 04 2010 thread "Improvement of performance"
+    # http://mail.scipy.org/pipermail/numpy-discussion/2010-May/050250.html
+    x = np.asarray(x)
+    n = x.shape[-1]
+
+    epsilon = _get_epsilon(x, 1, epsilon, n)
+    eps = 1j * epsilon
+    partials = f(x + eps, *args, **kwargs).imag / epsilon
+
+    return np.array(partials)


 def approx_hess_cs(x, f, epsilon=None, args=(), kwargs={}):
-    """Calculate Hessian with complex-step derivative approximation
+    '''Calculate Hessian with complex-step derivative approximation

     Parameters
     ----------
@@ -220,10 +325,140 @@ def approx_hess_cs(x, f, epsilon=None, args=(), kwargs={}):
     of Numerical Differentiation, University of Kent, Canterbury, Kent, U.K.

     The stepsize is the same for the complex and the finite difference part.
-    """
-    pass
+    '''
+    # TODO: might want to consider lowering the step for pure derivatives
+    n = len(x)
+    h = _get_epsilon(x, 3, epsilon, n)
+    ee = np.diag(h)
+    hess = np.outer(h, h)
+
+    n = len(x)
+
+    for i in range(n):
+        for j in range(i, n):
+            hess[i, j] = np.squeeze(
+                (f(*((x + 1j*ee[i, :] + ee[j, :],) + args), **kwargs)
+                 - f(*((x + 1j*ee[i, :] - ee[j, :],) + args),
+                     **kwargs)).imag/2./hess[i, j]
+            )
+            hess[j, i] = hess[i, j]
+
+    return hess
+
+
+@Substitution(
+    scale="3",
+    extra_params="""return_grad : bool
+        Whether or not to also return the gradient
+""",
+    extra_returns="""grad : ndarray
+        Gradient if return_grad == True
+""",
+    equation_number="7",
+    equation="""1/(d_j*d_k) * ((f(x + d[j]*e[j] + d[k]*e[k]) - f(x + d[j]*e[j])) -
+                 (f(x + d[k]*e[k]) - f(x)))
+"""
+)
+@Appender(_hessian_docs)
+def approx_hess1(x, f, epsilon=None, args=(), kwargs={}, return_grad=False):
+    n = len(x)
+    h = _get_epsilon(x, 3, epsilon, n)
+    ee = np.diag(h)
+
+    f0 = f(*((x,)+args), **kwargs)
+    # Compute forward step
+    g = np.zeros(n)
+    for i in range(n):
+        g[i] = f(*((x+ee[i, :],)+args), **kwargs)
+
+    hess = np.outer(h, h)  # this is now epsilon**2
+    # Compute "double" forward step
+    for i in range(n):
+        for j in range(i, n):
+            hess[i, j] = (f(*((x + ee[i, :] + ee[j, :],) + args), **kwargs) -
+                          g[i] - g[j] + f0)/hess[i, j]
+            hess[j, i] = hess[i, j]
+    if return_grad:
+        grad = (g - f0)/h
+        return hess, grad
+    else:
+        return hess
+
+
+@Substitution(
+    scale="3",
+    extra_params="""return_grad : bool
+        Whether or not to also return the gradient
+""",
+    extra_returns="""grad : ndarray
+        Gradient if return_grad == True
+""",
+    equation_number="8",
+    equation="""1/(2*d_j*d_k) * ((f(x + d[j]*e[j] + d[k]*e[k]) - f(x + d[j]*e[j])) -
+                 (f(x + d[k]*e[k]) - f(x)) +
+                 (f(x - d[j]*e[j] - d[k]*e[k]) - f(x - d[j]*e[j])) -
+                 (f(x - d[k]*e[k]) - f(x)))
+"""
+)
+@Appender(_hessian_docs)
+def approx_hess2(x, f, epsilon=None, args=(), kwargs={}, return_grad=False):
+    #
+    n = len(x)
+    # NOTE: ridout suggesting using eps**(1/4)*theta
+    h = _get_epsilon(x, 3, epsilon, n)
+    ee = np.diag(h)
+    f0 = f(*((x,)+args), **kwargs)
+    # Compute forward step
+    g = np.zeros(n)
+    gg = np.zeros(n)
+    for i in range(n):
+        g[i] = f(*((x+ee[i, :],)+args), **kwargs)
+        gg[i] = f(*((x-ee[i, :],)+args), **kwargs)
+
+    hess = np.outer(h, h)  # this is now epsilon**2
+    # Compute "double" forward step
+    for i in range(n):
+        for j in range(i, n):
+            hess[i, j] = (f(*((x + ee[i, :] + ee[j, :],) + args), **kwargs) -
+                          g[i] - g[j] + f0 +
+                          f(*((x - ee[i, :] - ee[j, :],) + args), **kwargs) -
+                          gg[i] - gg[j] + f0)/(2 * hess[i, j])
+            hess[j, i] = hess[i, j]
+    if return_grad:
+        grad = (g - f0)/h
+        return hess, grad
+    else:
+        return hess
+
+
+@Substitution(
+    scale="4",
+    extra_params="",
+    extra_returns="",
+    equation_number="9",
+    equation="""1/(4*d_j*d_k) * ((f(x + d[j]*e[j] + d[k]*e[k]) -
+                                  f(x + d[j]*e[j] - d[k]*e[k])) -
+                                 (f(x - d[j]*e[j] + d[k]*e[k]) -
+                                  f(x - d[j]*e[j] - d[k]*e[k])))"""
+)
+@Appender(_hessian_docs)
+def approx_hess3(x, f, epsilon=None, args=(), kwargs={}):
+    n = len(x)
+    h = _get_epsilon(x, 4, epsilon, n)
+    ee = np.diag(h)
+    hess = np.outer(h, h)
+
+    for i in range(n):
+        for j in range(i, n):
+            hess[i, j] = np.squeeze(
+                (f(*((x + ee[i, :] + ee[j, :],) + args), **kwargs)
+                 - f(*((x + ee[i, :] - ee[j, :],) + args), **kwargs)
+                 - (f(*((x - ee[i, :] + ee[j, :],) + args), **kwargs)
+                    - f(*((x - ee[i, :] - ee[j, :],) + args), **kwargs))
+                 )/(4.*hess[i, j])
+            )
+            hess[j, i] = hess[i, j]
+    return hess


 approx_hess = approx_hess3
-approx_hess.__doc__ += """
-    This is an alias for approx_hess3"""
+approx_hess.__doc__ += "\n    This is an alias for approx_hess3"
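A small sketch comparing the complex-step gradient and the finite-difference Hessian on a quadratic, where the exact Hessian of 0.5*x'Ax is A (made-up matrix):

import numpy as np
from statsmodels.tools.numdiff import approx_fprime_cs, approx_hess

A = np.array([[3.0, 1.0], [1.0, 2.0]])

def f(x):
    return 0.5 * x @ A @ x

x0 = np.array([0.5, -1.0])
print(approx_fprime_cs(x0, f))  # approximately A @ x0 = [ 0.5, -1.5]
print(approx_hess(x0, f))       # approximately A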
diff --git a/statsmodels/tools/parallel.py b/statsmodels/tools/parallel.py
index 8bc37aaf6..453da26c0 100644
--- a/statsmodels/tools/parallel.py
+++ b/statsmodels/tools/parallel.py
@@ -9,7 +9,9 @@ changes for statsmodels (Josef Perktold)
 - try import from joblib directly, (does not import all of sklearn)

 """
-from statsmodels.tools.sm_exceptions import ModuleUnavailableWarning, module_unavailable_doc
+
+from statsmodels.tools.sm_exceptions import (ModuleUnavailableWarning,
+                                             module_unavailable_doc)


 def parallel_func(func, n_jobs, verbose=5):
@@ -43,4 +45,30 @@ def parallel_func(func, n_jobs, verbose=5):
     >>> print(n_jobs)
     >>> parallel(p_func(i**2) for i in range(10))
     """
-    pass
+    try:
+        try:
+            from joblib import Parallel, delayed
+        except ImportError:
+            from sklearn.externals.joblib import Parallel, delayed
+
+        parallel = Parallel(n_jobs, verbose=verbose)
+        my_func = delayed(func)
+
+        if n_jobs == -1:
+            try:
+                import multiprocessing
+                n_jobs = multiprocessing.cpu_count()
+            except (ImportError, NotImplementedError):
+                import warnings
+                warnings.warn(module_unavailable_doc.format('multiprocessing'),
+                              ModuleUnavailableWarning)
+                n_jobs = 1
+
+    except ImportError:
+        import warnings
+        warnings.warn(module_unavailable_doc.format('joblib'),
+                      ModuleUnavailableWarning)
+        n_jobs = 1
+        my_func = func
+        parallel = list
+    return parallel, my_func, n_jobs
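A usage sketch for parallel_func; the same call pattern works whether or not joblib is installed, because of the list fallback:

from statsmodels.tools.parallel import parallel_func

def square(i):
    return i ** 2

parallel, p_func, n_jobs = parallel_func(square, n_jobs=2, verbose=0)
print(parallel(p_func(i) for i in range(5)))  # [0, 1, 4, 9, 16]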
diff --git a/statsmodels/tools/print_version.py b/statsmodels/tools/print_version.py
index ec308a285..a0c094486 100755
--- a/statsmodels/tools/print_version.py
+++ b/statsmodels/tools/print_version.py
@@ -1,8 +1,142 @@
+#!/usr/bin/env python
 from functools import reduce
 import sys
 from os.path import dirname


+def safe_version(module, attr='__version__', *others):
+    if not isinstance(attr, list):
+        attr = [attr]
+    try:
+        return reduce(getattr, [module] + attr)
+    except AttributeError:
+        if others:
+            return safe_version(module, others[0], *others[1:])
+        return "Cannot detect version"
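A small sketch of safe_version walking an attribute path and falling back gracefully (the toy module is made up):

import types
from statsmodels.tools.print_version import safe_version

mod = types.SimpleNamespace(version=types.SimpleNamespace(version="1.2.3"))
print(safe_version(mod, ["version", "version"]))  # '1.2.3'
print(safe_version(mod, "__version__"))           # 'Cannot detect version'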
+
+
+def _show_versions_only():
+    print("\nINSTALLED VERSIONS")
+    print("------------------")
+    print("Python: %d.%d.%d.%s.%s" % sys.version_info[:])
+    try:
+        import os
+        (sysname, nodename, release, version, machine) = os.uname()
+        print("OS: %s %s %s %s" % (sysname, release, version, machine))
+        print("byteorder: %s" % sys.byteorder)
+        print("LC_ALL: %s" % os.environ.get('LC_ALL', "None"))
+        print("LANG: %s" % os.environ.get('LANG', "None"))
+    except:
+        pass
+    try:
+        import statsmodels
+        has_sm = True
+    except ImportError:
+        has_sm = False
+
+    print('\nstatsmodels\n===========\n')
+    if has_sm:
+        print('Installed: %s' % safe_version(statsmodels))
+    else:
+        print('Not installed')
+
+    print("\nRequired Dependencies\n=====================\n")
+    try:
+        import Cython
+        print("cython: %s" % safe_version(Cython))
+    except ImportError:
+        print("cython: Not installed")
+
+    try:
+        import numpy
+        print("numpy: %s" % safe_version(numpy, ['version', 'version']))
+    except ImportError:
+        print("numpy: Not installed")
+
+    try:
+        import scipy
+        print("scipy: %s" % safe_version(scipy, ['version', 'version']))
+    except ImportError:
+        print("scipy: Not installed")
+
+    try:
+        import pandas
+        print("pandas: %s" % safe_version(pandas))
+    except ImportError:
+        print("pandas: Not installed")
+
+    try:
+        import dateutil
+        print("    dateutil: %s" % safe_version(dateutil))
+    except ImportError:
+        print("    dateutil: not installed")
+
+    try:
+        import patsy
+        print("patsy: %s" % safe_version(patsy))
+    except ImportError:
+        print("patsy: Not installed")
+
+    print("\nOptional Dependencies\n=====================\n")
+
+    try:
+        import matplotlib as mpl
+        print("matplotlib: %s" % safe_version(mpl))
+    except ImportError:
+        print("matplotlib: Not installed")
+
+    try:
+        from cvxopt import info
+        print("cvxopt: %s" % safe_version(info, 'version'))
+    except ImportError:
+        print("cvxopt: Not installed")
+
+    try:
+        import joblib
+        print("joblib: %s " % (safe_version(joblib)))
+    except ImportError:
+        print("joblib: Not installed")
+
+    print("\nDeveloper Tools\n================\n")
+
+    try:
+        import IPython
+        print("IPython: %s" % safe_version(IPython))
+    except ImportError:
+        print("IPython: Not installed")
+    try:
+        import jinja2
+        print("    jinja2: %s" % safe_version(jinja2))
+    except ImportError:
+        print("    jinja2: Not installed")
+
+    try:
+        import sphinx
+        print("sphinx: %s" % safe_version(sphinx))
+    except ImportError:
+        print("sphinx: Not installed")
+
+    try:
+        import pygments
+        print("    pygments: %s" % safe_version(pygments))
+    except ImportError:
+        print("    pygments: Not installed")
+
+    try:
+        import pytest
+        print("pytest: %s" % safe_version(pytest))
+    except ImportError:
+        print("pytest: Not installed")
+
+    try:
+        import virtualenv
+        print("virtualenv: %s" % safe_version(virtualenv))
+    except ImportError:
+        print("virtualenv: Not installed")
+
+    print("\n")
+
+
 def show_versions(show_dirs=True):
     """
     List the versions of statsmodels and any installed dependencies
@@ -12,8 +146,145 @@ def show_versions(show_dirs=True):
     show_dirs : bool
         Flag indicating to show module locations
     """
-    pass
+    if not show_dirs:
+        _show_versions_only()
+        return
+    print("\nINSTALLED VERSIONS")
+    print("------------------")
+    print("Python: %d.%d.%d.%s.%s" % sys.version_info[:])
+    try:
+        import os
+        (sysname, nodename, release, version, machine) = os.uname()
+        print("OS: %s %s %s %s" % (sysname, release, version, machine))
+        print("byteorder: %s" % sys.byteorder)
+        print("LC_ALL: %s" % os.environ.get('LC_ALL', "None"))
+        print("LANG: %s" % os.environ.get('LANG', "None"))
+    except:
+        pass
+
+    try:
+        import statsmodels
+        has_sm = True
+    except ImportError:
+        has_sm = False
+
+    print('\nstatsmodels\n===========\n')
+    if has_sm:
+        print('Installed: %s (%s)' % (safe_version(statsmodels),
+                                      dirname(statsmodels.__file__)))
+    else:
+        print('Not installed')
+
+    print("\nRequired Dependencies\n=====================\n")
+    try:
+        import Cython
+        print("cython: %s (%s)" % (safe_version(Cython),
+                                   dirname(Cython.__file__)))
+    except ImportError:
+        print("cython: Not installed")
+
+    try:
+        import numpy
+        print("numpy: %s (%s)" % (safe_version(numpy, ['version', 'version']),
+                                  dirname(numpy.__file__)))
+    except ImportError:
+        print("numpy: Not installed")
+
+    try:
+        import scipy
+        print("scipy: %s (%s)" % (safe_version(scipy, ['version', 'version']),
+                                  dirname(scipy.__file__)))
+    except ImportError:
+        print("scipy: Not installed")
+
+    try:
+        import pandas
+        print("pandas: %s (%s)" % (safe_version(pandas, ['version', 'version'],
+                                                '__version__'),
+                                   dirname(pandas.__file__)))
+    except ImportError:
+        print("pandas: Not installed")
+
+    try:
+        import dateutil
+        print("    dateutil: %s (%s)" % (safe_version(dateutil),
+                                         dirname(dateutil.__file__)))
+    except ImportError:
+        print("    dateutil: not installed")
+
+    try:
+        import patsy
+        print("patsy: %s (%s)" % (safe_version(patsy),
+                                  dirname(patsy.__file__)))
+    except ImportError:
+        print("patsy: Not installed")
+
+    print("\nOptional Dependencies\n=====================\n")
+
+    try:
+        import matplotlib as mpl
+        print("matplotlib: %s (%s)" % (safe_version(mpl),
+                                       dirname(mpl.__file__)))
+        print("    backend: %s " % mpl.rcParams['backend'])
+    except ImportError:
+        print("matplotlib: Not installed")
+
+    try:
+        from cvxopt import info
+        print("cvxopt: %s (%s)" % (safe_version(info, 'version'),
+                                   dirname(info.__file__)))
+    except ImportError:
+        print("cvxopt: Not installed")
+
+    try:
+        import joblib
+        print("joblib: %s (%s)" % (safe_version(joblib),
+                                   dirname(joblib.__file__)))
+    except ImportError:
+        print("joblib: Not installed")
+
+    print("\nDeveloper Tools\n================\n")
+
+    try:
+        import IPython
+        print("IPython: %s (%s)" % (safe_version(IPython),
+                                    dirname(IPython.__file__)))
+    except ImportError:
+        print("IPython: Not installed")
+    try:
+        import jinja2
+        print("    jinja2: %s (%s)" % (safe_version(jinja2),
+                                       dirname(jinja2.__file__)))
+    except ImportError:
+        print("    jinja2: Not installed")
+
+    try:
+        import sphinx
+        print("sphinx: %s (%s)" % (safe_version(sphinx),
+                                   dirname(sphinx.__file__)))
+    except ImportError:
+        print("sphinx: Not installed")
+
+    try:
+        import pygments
+        print("    pygments: %s (%s)" % (safe_version(pygments),
+                                         dirname(pygments.__file__)))
+    except ImportError:
+        print("    pygments: Not installed")
+
+    try:
+        import pytest
+        print("pytest: %s (%s)" % (safe_version(pytest), dirname(pytest.__file__)))
+    except ImportError:
+        print("pytest: Not installed")
+
+    try:
+        import virtualenv
+        print("virtualenv: %s (%s)" % (safe_version(virtualenv),
+                                       dirname(virtualenv.__file__)))
+    except ImportError:
+        print("virtualenv: Not installed")

+    print("\n")

-if __name__ == '__main__':
+if __name__ == "__main__":
     show_versions()
diff --git a/statsmodels/tools/rng_qrng.py b/statsmodels/tools/rng_qrng.py
index 9eb44014c..e36ee44bb 100644
--- a/statsmodels/tools/rng_qrng.py
+++ b/statsmodels/tools/rng_qrng.py
@@ -1,13 +1,15 @@
 import numpy as np
 import scipy.stats as stats
-_future_warn = """Passing `None` as the seed currently return the NumPy singleton RandomState
+
+_future_warn = """\
+Passing `None` as the seed currently returns the NumPy singleton RandomState
 (np.random.mtrand._rand). After release 0.13 this will change to using the
 default generator provided by NumPy (np.random.default_rng()). If you need
 reproducible draws, you should pass a seeded np.random.Generator, e.g.,

 import numpy as np
 seed = 32839283923801
-rng = np.random.default_rng(seed)\"
+rng = np.random.default_rng(seed)
 """


@@ -38,4 +40,16 @@ def check_random_state(seed=None):

         Random number generator.
     """
-    pass
+    if hasattr(stats, "qmc") and \
+            isinstance(seed, stats.qmc.QMCEngine):
+        return seed
+    elif isinstance(seed, np.random.RandomState):
+        return seed
+    elif isinstance(seed, np.random.Generator):
+        return seed
+    elif seed is not None:
+        return np.random.default_rng(seed)
+    else:
+        import warnings
+        warnings.warn(_future_warn, FutureWarning)
+        return np.random.mtrand._rand
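A quick sketch of check_random_state; an integer seed is promoted to a seeded np.random.Generator, so draws are reproducible:

import numpy as np
from statsmodels.tools.rng_qrng import check_random_state

rng = check_random_state(12345)
print(isinstance(rng, np.random.Generator))  # True
print(rng.standard_normal(3))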
diff --git a/statsmodels/tools/rootfinding.py b/statsmodels/tools/rootfinding.py
index b370e8a79..12bf4cc0a 100644
--- a/statsmodels/tools/rootfinding.py
+++ b/statsmodels/tools/rootfinding.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """

 Created on Mon Mar 18 15:48:23 2013
@@ -11,14 +12,18 @@ TODO:
 """
 import numpy as np
 from scipy import optimize
+
 from statsmodels.tools.testing import Holder
+
 DEBUG = False


-def brentq_expanding(func, low=None, upp=None, args=(), xtol=1e-05,
-    start_low=None, start_upp=None, increasing=None, max_it=100, maxiter_bq
-    =100, factor=10, full_output=False):
-    """find the root of a function in one variable by expanding and brentq
+# based on scipy.stats.distributions._ppf_single_call
+def brentq_expanding(func, low=None, upp=None, args=(), xtol=1e-5,
+                     start_low=None, start_upp=None, increasing=None,
+                     max_it=100, maxiter_bq=100, factor=10,
+                     full_output=False):
+    '''find the root of a function in one variable by expanding and brentq

     Assumes function ``func`` is monotonic.

@@ -86,5 +91,132 @@ def brentq_expanding(func, low=None, upp=None, args=(), xtol=1e-05,

     If

-    """
-    pass
+    '''
+    # TODO: rtol is missing, what does it do?
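+    # Strategy: choose starting bounds, determine the direction of
+    # monotonicity, expand the bracket geometrically by ``factor`` until it
+    # contains a sign change, then hand the bracket to scipy.optimize.brentq.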
+    left, right = low, upp  # alias
+
+    # start_upp first because of possible sl = -1 > upp
+    if upp is not None:
+        su = upp
+    elif start_upp is not None:
+        if start_upp < 0:
+            raise ValueError('start_upp needs to be positive')
+        su = start_upp
+    else:
+        su = 1.
+
+    if low is not None:
+        sl = low
+    elif start_low is not None:
+        if start_low > 0:
+            raise ValueError('start_low needs to be negative')
+        sl = start_low
+    else:
+        sl = min(-1., su - 1.)
+
+    # need sl < su
+    if upp is None:
+        su = max(su, sl + 1.)
+
+    # increasing or not ?
+    if ((low is None) or (upp is None)) and increasing is None:
+        assert sl < su  # check during development
+        f_low = func(sl, *args)
+        f_upp = func(su, *args)
+
+        # special case for F-distribution (symmetric around zero for effect
+        # size)
+        # chisquare also takes an indefinite time (did not wait to see if it
+        # returns)
+        if np.max(np.abs(f_upp - f_low)) < 1e-15 and sl == -1 and su == 1:
+            sl = 1e-8
+            f_low = func(sl, *args)
+            increasing = (f_low < f_upp)
+
+        # possibly func returns nan
+        delta = su - sl
+        if np.isnan(f_low):
+            # try just 3 points to find ``increasing``
+            # do not change sl because brentq can handle one nan bound
+            for fraction in [0.25, 0.5, 0.75]:
+                sl_ = sl + fraction * delta
+                f_low = func(sl_, *args)
+                if not np.isnan(f_low):
+                    break
+            else:
+                raise ValueError('could not determine whether function is ' +
+                                 'increasing based on starting interval.' +
+                                 '\nspecify increasing or change starting ' +
+                                 'bounds')
+        if np.isnan(f_upp):
+            for fraction in [0.25, 0.5, 0.75]:
+                su_ = su + fraction * delta
+                f_upp = func(su_, *args)
+                if not np.isnan(f_upp):
+                    break
+            else:
+                raise ValueError('could not determine whether function is ' +
+                                 'increasing based on starting interval.' +
+                                 '\nspecify increasing or change starting ' +
+                                 'bounds')
+
+        increasing = (f_low < f_upp)
+
+    if not increasing:
+        sl, su = su, sl
+        left, right = right, left
+
+    n_it = 0
+    if left is None and sl != 0:
+        left = sl
+        while func(left, *args) > 0:
+            # condition is also false if func returns nan
+            right = left
+            left *= factor
+            if n_it >= max_it:
+                break
+            n_it += 1
+        # left is now such that func(left) <= 0 (barring NaN or max_it)
+    if right is None and su != 0:
+        right = su
+        while func(right, *args) < 0:
+            left = right
+            right *= factor
+            if n_it >= max_it:
+                break
+            n_it += 1
+        # right is now such that func(right) >= 0 (barring NaN or max_it)
+
+    if n_it >= max_it:
+        # print('Warning: max_it reached')
+        # TODO: use Warnings, Note: brentq might still work even with max_it
+        f_low = func(sl, *args)
+        f_upp = func(su, *args)
+        if np.isnan(f_low) and np.isnan(f_upp):
+            # can we still get here?
+            raise ValueError('max_it reached' +
+                             '\nthe function values at both bounds are NaN' +
+                             '\nchange the starting bounds, set bounds ' +
+                             'or increase max_it')
+
+    res = optimize.brentq(func, left, right, args=args,
+                          xtol=xtol, maxiter=maxiter_bq,
+                          full_output=full_output)
+    if full_output:
+        val = res[0]
+        info = Holder(
+            # from brentq
+            root=res[1].root,
+            iterations=res[1].iterations,
+            function_calls=res[1].function_calls,
+            converged=res[1].converged,
+            flag=res[1].flag,
+            # ours:
+            iterations_expand=n_it,
+            start_bounds=(sl, su),
+            brentq_bounds=(left, right),
+            increasing=increasing,
+            )
+        return val, info
+    else:
+        return res
diff --git a/statsmodels/tools/sequences.py b/statsmodels/tools/sequences.py
index b85508968..de409ee27 100644
--- a/statsmodels/tools/sequences.py
+++ b/statsmodels/tools/sequences.py
@@ -29,7 +29,30 @@ def discrepancy(sample, bounds=None):
       Computer Science and Data Analysis Series Science and Data Analysis
       Series, 2006.
     """
-    pass
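+    # Centered L2-discrepancy (Hickernell):
+    # C2 = (13/12)**dim - (2/n) * disc1 + (1/n**2) * disc2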
+    sample = np.asarray(sample)
+    n_sample, dim = sample.shape
+
+    # Sample scaling from bounds to unit hypercube
+    if bounds is not None:
+        min_ = bounds.min(axis=0)
+        max_ = bounds.max(axis=0)
+        sample = (sample - min_) / (max_ - min_)
+
+    abs_ = abs(sample - 0.5)
+    disc1 = np.sum(np.prod(1 + 0.5 * abs_ - 0.5 * abs_ ** 2, axis=1))
+
+    prod_arr = 1
+    for i in range(dim):
+        s0 = sample[:, i]
+        prod_arr *= (1 +
+                     0.5 * abs(s0[:, None] - 0.5) + 0.5 * abs(s0 - 0.5) -
+                     0.5 * abs(s0[:, None] - s0))
+    disc2 = prod_arr.sum()
+
+    c2 = ((13.0 / 12.0) ** dim - 2.0 / n_sample * disc1 +
+          1.0 / (n_sample ** 2) * disc2)
+
+    return c2


 def primes_from_2_to(n):
@@ -49,7 +72,13 @@ def primes_from_2_to(n):
     ----------
     [1] `StackOverflow <https://stackoverflow.com/questions/2068372>`_.
     """
-    pass
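+    # Sieve of Eratosthenes restricted to numbers of the form 6k +/- 1;
+    # 2 and 3 are added to the result explicitly below.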
+    sieve = np.ones(n // 3 + (n % 6 == 2), dtype=bool)
+    for i in range(1, int(n ** 0.5) // 3 + 1):
+        if sieve[i]:
+            k = 3 * i + 1 | 1
+            sieve[k * k // 3::2 * k] = False
+            sieve[k * (k - 2 * (i & 1) + 4) // 3::2 * k] = False
+    return np.r_[2, 3, ((3 * np.nonzero(sieve)[0][1:] + 1) | 1)]


 def n_primes(n):
@@ -65,7 +94,29 @@ def n_primes(n):
     primes : list(int)
         List of primes.
     """
-    pass
+    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
+              61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127,
+              131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193,
+              197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269,
+              271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349,
+              353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431,
+              433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503,
+              509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599,
+              601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673,
+              677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761,
+              769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857,
+              859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947,
+              953, 967, 971, 977, 983, 991, 997][:n]
+
+    if len(primes) < n:
+        big_number = 10
+        while 'Not enough primes':
+            primes = primes_from_2_to(big_number)[:n]
+            if len(primes) == n:
+                break
+            big_number += 1000
+
+    return primes


 def van_der_corput(n_sample, base=2, start_index=0):
@@ -87,7 +138,17 @@ def van_der_corput(n_sample, base=2, start_index=0):
     sequence : list (n_samples,)
         Sequence of Van der Corput.
     """
-    pass
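+    # The i-th value is the radical inverse of i in the given base.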
+    sequence = []
+    for i in range(start_index, start_index + n_sample):
+        n_th_number, denom = 0., 1.
+        quotient = i
+        while quotient > 0:
+            quotient, remainder = divmod(quotient, base)
+            denom *= base
+            n_th_number += remainder / denom
+        sequence.append(n_th_number)
+
+    return sequence


 def halton(dim, n_sample, bounds=None, start_index=0):
@@ -136,4 +197,16 @@ def halton(dim, n_sample, bounds=None, start_index=0):

     >>> sample_continued = sequences.halton(dim=2, n_sample=5, start_index=5)
     """
-    pass
+    base = n_primes(dim)
+
+    # Generate a sample using a Van der Corput sequence per dimension.
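+    # One extra leading point is generated per dimension and dropped below
+    # (it is the origin when start_index is 0).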
+    sample = [van_der_corput(n_sample + 1, bdim, start_index) for bdim in base]
+    sample = np.array(sample).T[1:]
+
+    # Sample scaling from unit hypercube to feature range
+    if bounds is not None:
+        min_ = bounds.min(axis=0)
+        max_ = bounds.max(axis=0)
+        sample = sample * (max_ - min_) + min_
+
+    return sample
diff --git a/statsmodels/tools/sm_exceptions.py b/statsmodels/tools/sm_exceptions.py
index a8ea0dce6..413ef8df6 100644
--- a/statsmodels/tools/sm_exceptions.py
+++ b/statsmodels/tools/sm_exceptions.py
@@ -10,13 +10,16 @@ warning, and should usually be accompanied by a sting using the format
 warning_name_doc that services as a generic message to use when the warning is
 raised.
 """
+
 import warnings


+# Errors
 class PerfectSeparationError(Exception):
     """
     Error due to perfect prediction in discrete models
     """
+
     pass


@@ -24,6 +27,7 @@ class MissingDataError(Exception):
     """
     Error raised if variables contain missing values when forbidden
     """
+
     pass


@@ -31,6 +35,7 @@ class X13NotFoundError(Exception):
     """
     Error locating the X13 binary
     """
+
     pass


@@ -38,6 +43,7 @@ class X13Error(Exception):
     """
     Error when running modes using X13
     """
+
     pass


@@ -48,11 +54,12 @@ class ParseError(Exception):

     def __str__(self):
         message = self.args[0]
-        if hasattr(self, 'docstring'):
-            message = f'{message} in {self.docstring}'
+        if hasattr(self, "docstring"):
+            message = f"{message} in {self.docstring}"
         return message


+# Warning
 class X13Warning(Warning):
     """
     Unexpected conditions when using X13
@@ -64,6 +71,7 @@ class IOWarning(RuntimeWarning):
     """
     Resource not deleted
     """
+
     pass


@@ -102,6 +110,7 @@ class CacheWriteWarning(ModelWarning):
     """
     Attempting to write to a read-only cached value
     """
+
     pass


@@ -109,6 +118,7 @@ class IterationLimitWarning(ModelWarning):
     """
     Iteration limit reached without convergence
     """
+
     pass


@@ -121,6 +131,7 @@ class InvalidTestWarning(ModelWarning):
     """
     Test not applicable to model
     """
+
     pass


@@ -128,6 +139,7 @@ class NotImplementedWarning(ModelWarning):
     """
     Non-fatal function non-implementation
     """
+
     pass


@@ -135,6 +147,7 @@ class OutputWarning(ModelWarning):
     """
     Function output contains atypical values
     """
+
     pass


@@ -142,6 +155,7 @@ class DomainWarning(ModelWarning):
     """
     Variables are not compliant with required domain constraints
     """
+
     pass


@@ -149,6 +163,7 @@ class ValueWarning(ModelWarning):
     """
     Non-fatal out-of-range value given
     """
+
     pass


@@ -156,6 +171,7 @@ class EstimationWarning(ModelWarning):
     """
     Unexpected condition encountered during estimation
     """
+
     pass


@@ -163,6 +179,7 @@ class SingularMatrixWarning(ModelWarning):
     """
     Non-fatal matrix inversion affects output results
     """
+
     pass


@@ -170,6 +187,7 @@ class HypothesisTestWarning(ModelWarning):
     """
     Issue occurred when performing hypothesis test
     """
+
     pass


@@ -177,6 +195,7 @@ class InterpolationWarning(ModelWarning):
     """
     Table granularity and limits restrict interpolation
     """
+
     pass


@@ -184,6 +203,7 @@ class PrecisionWarning(ModelWarning):
     """
     Numerical implementation affects precision
     """
+
     pass


@@ -191,6 +211,7 @@ class SpecificationWarning(ModelWarning):
     """
     Non-fatal model specification issue
     """
+
     pass


@@ -198,6 +219,7 @@ class HessianInversionWarning(ModelWarning):
     """
     Hessian noninvertible and standard errors unavailable
     """
+
     pass


@@ -205,6 +227,7 @@ class CollinearityWarning(ModelWarning):
     """
     Variables are highly collinear
     """
+
     pass


@@ -212,6 +235,7 @@ class PerfectSeparationWarning(ModelWarning):
     """
     Perfect separation or prediction
     """
+
     pass


@@ -219,6 +243,7 @@ class InfeasibleTestError(RuntimeError):
     """
     Test statistic cannot be computed
     """
+
     pass


@@ -226,8 +251,10 @@ recarray_exception = """
 recarray support has been removed from statsmodels. Use pandas DataFrames
 for structured data.
 """
-warnings.simplefilter('always', ModelWarning)
-warnings.simplefilter('always', ConvergenceWarning)
-warnings.simplefilter('always', CacheWriteWarning)
-warnings.simplefilter('always', IterationLimitWarning)
-warnings.simplefilter('always', InvalidTestWarning)
+
+
+warnings.simplefilter("always", ModelWarning)
+warnings.simplefilter("always", ConvergenceWarning)
+warnings.simplefilter("always", CacheWriteWarning)
+warnings.simplefilter("always", IterationLimitWarning)
+warnings.simplefilter("always", InvalidTestWarning)
diff --git a/statsmodels/tools/testing.py b/statsmodels/tools/testing.py
index 96c4dba79..4a25f34e3 100644
--- a/statsmodels/tools/testing.py
+++ b/statsmodels/tools/testing.py
@@ -2,9 +2,13 @@

 """
 from statsmodels.compat.pandas import testing as pdt
+
 import numpy.testing as npt
 import pandas
+
 from statsmodels.tools.tools import Bunch
+
+# Standard list for parsing tables
 PARAM_LIST = ['params', 'bse', 'tvalues', 'pvalues']


@@ -25,10 +29,20 @@ def bunch_factory(attribute, columns):
     are split so that Bunch has the keys in columns and
     bunch[column[i]] = bunch[attribute][:, i]
     """
-    pass
+    class FactoryBunch(Bunch):
+        def __init__(self, *args, **kwargs):
+            super(FactoryBunch, self).__init__(*args, **kwargs)
+            if not hasattr(self, attribute):
+                raise AttributeError('{0} is required and must be passed to '
+                                     'the constructor'.format(attribute))
+            for i, att in enumerate(columns):
+                self[att] = getattr(self, attribute)[:, i]
+
+    return FactoryBunch


 ParamsTableTestBunch = bunch_factory('params_table', PARAM_LIST)
+
 MarginTableTestBunch = bunch_factory('margins_table', PARAM_LIST)


@@ -36,17 +50,30 @@ class Holder:
     """
     Test-focused class to simplify accessing values by attribute
     """
-
     def __init__(self, **kwds):
         self.__dict__.update(kwds)

     def __str__(self):
-        ss = '\n'.join(str(k) + ' = ' + str(v).replace('\n', '\n    ') for 
-            k, v in vars(self).items())
+        ss = "\n".join(str(k) + " = " + str(v).replace('\n', '\n    ')
+                       for k, v in vars(self).items())
         return ss

     def __repr__(self):
-        ss = '\n'.join(str(k) + ' = ' + repr(v).replace('\n', '\n    ') for
-            k, v in vars(self).items())
-        ss = str(self.__class__) + '\n' + ss
+        # use repr for values including nested cases as in tost
+        ss = "\n".join(str(k) + " = " + repr(v).replace('\n', '\n    ')
+                       for k, v in vars(self).items())
+        ss = str(self.__class__) + "\n" + ss
         return ss
+
+
+# adjusted functions
+
+def assert_equal(actual, desired, err_msg='', verbose=True, **kwds):
+    if isinstance(desired, pandas.Index):
+        pdt.assert_index_equal(actual, desired)
+    elif isinstance(desired, pandas.Series):
+        pdt.assert_series_equal(actual, desired, **kwds)
+    elif isinstance(desired, pandas.DataFrame):
+        pdt.assert_frame_equal(actual, desired, **kwds)
+    else:
+        npt.assert_equal(actual, desired, err_msg=err_msg, verbose=verbose)
diff --git a/statsmodels/tools/tools.py b/statsmodels/tools/tools.py
index 52d056d23..40b6f2de3 100644
--- a/statsmodels/tools/tools.py
+++ b/statsmodels/tools/tools.py
@@ -4,16 +4,29 @@ Utility functions models code
 import numpy as np
 import pandas as pd
 import scipy.linalg
+
 from statsmodels.tools.data import _is_using_pandas
 from statsmodels.tools.validation import array_like


+def asstr2(s):
+    if isinstance(s, str):
+        return s
+    elif isinstance(s, bytes):
+        return s.decode('latin1')
+    else:
+        return str(s)
+
+
 def _make_dictnames(tmp_arr, offset=0):
     """
     Helper function to create a dictionary mapping a column number
     to the name in tmp_arr.
     """
-    pass
+    col_map = {}
+    for i, col_name in enumerate(tmp_arr):
+        col_map[i + offset] = col_name
+    return col_map


 def drop_missing(Y, X=None, axis=1):
@@ -36,9 +49,25 @@ def drop_missing(Y, X=None, axis=1):
     -----
     If either Y or X is 1d, it is reshaped to be 2d.
     """
-    pass
-
-
+    Y = np.asarray(Y)
+    if Y.ndim == 1:
+        Y = Y[:, None]
+    if X is not None:
+        X = np.array(X)
+        if X.ndim == 1:
+            X = X[:, None]
+        keepidx = np.logical_and(~np.isnan(Y).any(axis),
+                                 ~np.isnan(X).any(axis))
+        return Y[keepidx], X[keepidx]
+    else:
+        keepidx = ~np.isnan(Y).any(axis)
+        return Y[keepidx]
+
+
+# TODO: needs to better preserve dtype and be more flexible
+# i.e., if you still have a string variable in your array you do not
+# want to cast it to float
+# TODO: add name validator (i.e., bad names for datasets.grunfeld)
 def categorical(data, col=None, dictnames=False, drop=False):
     """
     Construct a dummy matrix from categorical variables
@@ -119,9 +148,10 @@ def categorical(data, col=None, dictnames=False, drop=False):

     >>> design2 = sm.tools.categorical(struct_ar, col='str_instr', drop=True)
     """
-    pass
+    raise NotImplementedError("categorical has been removed")


+# TODO: add an axis argument to this for sysreg
 def add_constant(data, prepend=True, has_constant='skip'):
     """
     Add a column of ones to an array.
@@ -150,7 +180,34 @@ def add_constant(data, prepend=True, has_constant='skip'):
     When the input is a pandas Series or DataFrame, the added column's name
     is 'const'.
     """
-    pass
+    if _is_using_pandas(data, None):
+        from statsmodels.tsa.tsatools import add_trend
+        return add_trend(data, trend='c', prepend=prepend, has_constant=has_constant)
+
+    # Special case for NumPy
+    x = np.asarray(data)
+    ndim = x.ndim
+    if ndim == 1:
+        x = x[:, None]
+    elif x.ndim > 2:
+        raise ValueError('Only implemented for 2-dimensional arrays')
+
+    is_nonzero_const = np.ptp(x, axis=0) == 0
+    is_nonzero_const &= np.all(x != 0.0, axis=0)
+    if is_nonzero_const.any():
+        if has_constant == 'skip':
+            return x
+        elif has_constant == 'raise':
+            if ndim == 1:
+                raise ValueError("data is constant.")
+            else:
+                columns = np.arange(x.shape[1])
+                cols = ",".join([str(c) for c in columns[is_nonzero_const]])
+                raise ValueError(f"Column(s) {cols} are constant.")
+
+    x = [np.ones(x.shape[0]), x]
+    x = x if prepend else x[::-1]
+    return np.column_stack(x)


 def isestimable(c, d):
@@ -184,7 +241,15 @@ def isestimable(c, d):
     >>> isestimable([1, -1, 0], d)
     True
     """
-    pass
+    c = array_like(c, 'c', maxdim=2)
+    d = array_like(d, 'd', ndim=2)
+    c = c[None, :] if c.ndim == 1 else c
+    if c.shape[1] != d.shape[1]:
+        raise ValueError('Contrast should have %d columns' % d.shape[1])
+    new = np.vstack([c, d])
+    if np.linalg.matrix_rank(new) != np.linalg.matrix_rank(d):
+        return False
+    return True


 def pinv_extended(x, rcond=1e-15):
@@ -194,7 +259,21 @@ def pinv_extended(x, rcond=1e-15):

     Code adapted from numpy.
     """
-    pass
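+    # SVD-based pseudo-inverse; the unmodified singular values are returned
+    # alongside the inverse.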
+    x = np.asarray(x)
+    x = x.conjugate()
+    u, s, vt = np.linalg.svd(x, False)
+    s_orig = np.copy(s)
+    m = u.shape[0]
+    n = vt.shape[1]
+    cutoff = rcond * np.maximum.reduce(s)
+    for i in range(min(n, m)):
+        if s[i] > cutoff:
+            s[i] = 1./s[i]
+        else:
+            s[i] = 0.
+    res = np.dot(np.transpose(vt), np.multiply(s[:, np.newaxis],
+                                               np.transpose(u)))
+    return res, s_orig


 def recipr(x):
@@ -211,7 +290,14 @@ def recipr(x):
     ndarray
         The array with 0-filled reciprocals.
     """
-    pass
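+    # element-wise 1/x for strictly positive entries; other entries are 0
+    # and NaNs stay NaN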
+    x = np.asarray(x)
+    out = np.zeros_like(x, dtype=np.float64)
+    nans = np.isnan(x.flat)
+    pos = ~nans
+    pos[pos] = pos[pos] & (x.flat[pos] > 0)
+    out.flat[pos] = 1.0 / x.flat[pos]
+    out.flat[nans] = np.nan
+    return out


 def recipr0(x):
@@ -228,7 +314,14 @@ def recipr0(x):
     ndarray
         The array with 0-filled reciprocals.
     """
-    pass
+    x = np.asarray(x)
+    out = np.zeros_like(x, dtype=np.float64)
+    nans = np.isnan(x.flat)
+    non_zero = ~nans
+    non_zero[non_zero] = non_zero[non_zero] & (x.flat[non_zero] != 0)
+    out.flat[non_zero] = 1.0 / x.flat[non_zero]
+    out.flat[nans] = np.nan
+    return out


 def clean0(matrix):
@@ -245,7 +338,9 @@ def clean0(matrix):
     ndarray
         The cleaned array.
     """
-    pass
+    colsum = np.add.reduce(matrix**2, 0)
+    val = [matrix[:, i] for i in np.flatnonzero(colsum)]
+    return np.array(np.transpose(val))


 def fullrank(x, r=None):
@@ -269,7 +364,16 @@ def fullrank(x, r=None):
     If the rank of x is known it can be specified as r -- no check
     is made to ensure that this really is the rank of x.
     """
-    pass
+    if r is None:
+        r = np.linalg.matrix_rank(x)
+
+    v, d, u = np.linalg.svd(x, full_matrices=False)
+    order = np.argsort(d)
+    order = order[::-1]
+    value = []
+    for i in range(r):
+        value.append(v[:, order[i]])
+    return np.asarray(np.transpose(value)).astype(np.float64)


 def unsqueeze(data, axis, oldshape):
@@ -303,7 +407,9 @@ def unsqueeze(data, axis, oldshape):
     (3, 1, 5)
     >>>
     """
-    pass
+    newshape = list(oldshape)
+    newshape[axis] = 1
+    return data.reshape(newshape)


 def nan_dot(A, B):
@@ -315,7 +421,18 @@ def nan_dot(A, B):
     ----------
     A, B : ndarray
     """
-    pass
+    # Find out who should be nan due to nan * nonzero
+    should_be_nan_1 = np.dot(np.isnan(A), (B != 0))
+    should_be_nan_2 = np.dot((A != 0), np.isnan(B))
+    should_be_nan = should_be_nan_1 + should_be_nan_2
+
+    # Multiply after setting all nan to 0
+    # This is what happens if there were no nan * nonzero conflicts
+    C = np.dot(np.nan_to_num(A), np.nan_to_num(B))
+
+    C[should_be_nan] = np.nan
+
+    return C


 def maybe_unwrap_results(results):
@@ -325,7 +442,7 @@ def maybe_unwrap_results(results):
     Can be used in plotting functions or other post-estimation type
     routines.
     """
-    pass
+    return getattr(results, '_results', results)


 class Bunch(dict):
@@ -339,7 +456,6 @@ class Bunch(dict):
     **kwargs
         Keyword agument passed to dict constructor, key=value.
     """
-
     def __init__(self, *args, **kwargs):
         super(Bunch, self).__init__(*args, **kwargs)
         self.__dict__ = self
@@ -370,10 +486,25 @@ def _ensure_2d(x, ndarray=False):
     -----
     Accepts None for simplicity
     """
-    pass
+    if x is None:
+        return x
+    is_pandas = _is_using_pandas(x, None)
+    if x.ndim == 2:
+        if is_pandas:
+            return x, x.columns
+        else:
+            return x, None
+    elif x.ndim > 2:
+        raise ValueError('x must be 1 or 2-dimensional.')
+
+    name = x.name if is_pandas else None
+    if ndarray:
+        return np.asarray(x)[:, None], name
+    else:
+        return pd.DataFrame(x), name


-def matrix_rank(m, tol=None, method='qr'):
+def matrix_rank(m, tol=None, method="qr"):
     """
     Matrix rank calculation using QR or SVD

@@ -401,4 +532,17 @@ def matrix_rank(m, tol=None, method='qr'):
     elements on the leading diagonal of the R matrix that are above tol
     in absolute value.
     """
-    pass
+    m = array_like(m, "m", ndim=2)
+    if method == "ip":
+        m = m[:, np.any(m != 0, axis=0)]
+        m = m / np.sqrt((m ** 2).sum(0))
+        m = m.T @ m
+        return np.linalg.matrix_rank(m, tol=tol, hermitian=True)
+    elif method == "qr":
+        r, = scipy.linalg.qr(m, mode="r")
+        abs_diag = np.abs(np.diag(r))
+        if tol is None:
+            tol = abs_diag[0] * m.shape[1] * np.finfo(float).eps
+        return int((abs_diag > tol).sum())
+    else:
+        return np.linalg.matrix_rank(m, tol=tol)
diff --git a/statsmodels/tools/transform_model.py b/statsmodels/tools/transform_model.py
index 979ebfbdf..ff19edd64 100644
--- a/statsmodels/tools/transform_model.py
+++ b/statsmodels/tools/transform_model.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Tue May 27 13:23:24 2014

@@ -5,6 +6,7 @@ Author: Josef Perktold
 License: BSD-3

 """
+
 import numpy as np


@@ -42,23 +44,32 @@ class StandardizeTransform:
         data = np.asarray(data)
         self.mean = data.mean(0)
         self.scale = data.std(0, ddof=1)
+
+        # do not transform a constant
         if const_idx is None:
             const_idx = np.nonzero(self.scale == 0)[0]
             if len(const_idx) == 0:
                 const_idx = 'n'
             else:
                 const_idx = int(np.squeeze(const_idx))
+
         if const_idx != 'n':
             self.mean[const_idx] = 0
             self.scale[const_idx] = 1
+
         if demean is False:
             self.mean = None
+
         self.const_idx = const_idx

     def transform(self, data):
         """standardize the data using the stored transformation
         """
-        pass
+        # could use scipy.stats.zscore instead
+        if self.mean is None:
+            return np.asarray(data) / self.scale
+        else:
+            return (np.asarray(data) - self.mean) / self.scale

     def transform_params(self, params):
         """Transform parameters of the standardized model to the original model
@@ -74,5 +85,11 @@ class StandardizeTransform:
             parameters transformed to the parameterization of the original
             model
         """
-        pass
+
+        params_new = params / self.scale
+        if self.const_idx != 'n':
+            params_new[self.const_idx] -= (params_new * self.mean).sum()
+
+        return params_new
+
     __call__ = transform
diff --git a/statsmodels/tools/typing.py b/statsmodels/tools/typing.py
index b83b66107..b406da2f2 100644
--- a/statsmodels/tools/typing.py
+++ b/statsmodels/tools/typing.py
@@ -1,19 +1,32 @@
 from __future__ import annotations
+
 from typing import TYPE_CHECKING, Any, Sequence, Union
+
 from packaging.version import parse
 from pandas import DataFrame, Series
+
 if TYPE_CHECKING:
     import numpy as np
-    if parse(np.__version__) < parse('1.22.0'):
+
+    if parse(np.__version__) < parse("1.22.0"):
         raise NotImplementedError(
-            'NumPy 1.22.0 or later required for type checking')
-    from numpy.typing import ArrayLike as ArrayLike, DTypeLike, NDArray, _FloatLike_co, _UIntLike_co
+            "NumPy 1.22.0 or later required for type checking"
+        )
+    from numpy.typing import (
+        ArrayLike as ArrayLike,
+        DTypeLike,
+        NDArray,
+        _FloatLike_co,
+        _UIntLike_co,
+    )
+
     _ExtendedFloatLike_co = Union[_FloatLike_co, _UIntLike_co]
     NumericArray = NDArray[Any, np.dtype[_ExtendedFloatLike_co]]
     Float64Array = NDArray[Any, np.double]
     ArrayLike1D = Union[Sequence[Union[float, int]], NumericArray, Series]
-    ArrayLike2D = Union[Sequence[Sequence[Union[float, int]]], NumericArray,
-        DataFrame]
+    ArrayLike2D = Union[
+        Sequence[Sequence[Union[float, int]]], NumericArray, DataFrame
+    ]
 else:
     ArrayLike = Any
     DTypeLike = Any
@@ -22,5 +35,13 @@ else:
     ArrayLike1D = Any
     ArrayLike2D = Any
     NDArray = Any
-__all__ = ['ArrayLike', 'DTypeLike', 'Float64Array', 'ArrayLike1D',
-    'ArrayLike2D', 'NDArray', 'NumericArray']
+
+__all__ = [
+    "ArrayLike",
+    "DTypeLike",
+    "Float64Array",
+    "ArrayLike1D",
+    "ArrayLike2D",
+    "NDArray",
+    "NumericArray",
+]
diff --git a/statsmodels/tools/validation/decorators.py b/statsmodels/tools/validation/decorators.py
index dc692c2f2..d8a0e6a6f 100644
--- a/statsmodels/tools/validation/decorators.py
+++ b/statsmodels/tools/validation/decorators.py
@@ -1,3 +1,41 @@
 from functools import wraps
+
 import numpy as np
+
 import statsmodels.tools.validation.validation as v
+
+
+def array_like(
+    pos,
+    name,
+    dtype=np.double,
+    ndim=None,
+    maxdim=None,
+    shape=None,
+    order="C",
+    contiguous=False,
+):
+    def inner(func):
+        @wraps(func)
+        def wrapper(*args, **kwargs):
+            if pos < len(args):
+                arg = args[pos]
+                arg = v.array_like(
+                    arg, name, dtype, ndim, maxdim, shape, order, contiguous
+                )
+                if pos == 0:
+                    args = (arg,) + args[1:]
+                else:
+                    args = args[:pos] + (arg,) + args[pos + 1:]
+            else:
+                arg = kwargs[name]
+                arg = v.array_like(
+                    arg, name, dtype, ndim, maxdim, shape, order, contiguous
+                )
+                kwargs[name] = arg
+
+            return func(*args, **kwargs)
+
+        return wrapper
+
+    return inner
diff --git a/statsmodels/tools/validation/validation.py b/statsmodels/tools/validation/validation.py
index 7472cc2f3..52b483f98 100644
--- a/statsmodels/tools/validation/validation.py
+++ b/statsmodels/tools/validation/validation.py
@@ -1,5 +1,6 @@
 from typing import Any, Optional
 from collections.abc import Mapping
+
 import numpy as np
 import pandas as pd

@@ -22,11 +23,28 @@ def _right_squeeze(arr, stop_dim=0):
         Array with all trailing singleton dimensions (0 or 1) removed.
         Singleton dimensions for dimension < stop_dim are retained.
     """
-    pass
-
-
-def array_like(obj, name, dtype=np.double, ndim=1, maxdim=None, shape=None,
-    order=None, contiguous=False, optional=False, writeable=True):
+    last = arr.ndim
+    for s in reversed(arr.shape):
+        if s > 1:
+            break
+        last -= 1
+    last = max(last, stop_dim)
+
+    return arr.reshape(arr.shape[:last])
+
+
+def array_like(
+    obj,
+    name,
+    dtype=np.double,
+    ndim=1,
+    maxdim=None,
+    shape=None,
+    order=None,
+    contiguous=False,
+    optional=False,
+    writeable=True,
+):
     """
     Convert array-like to a ndarray and check conditions

@@ -115,7 +133,33 @@ def array_like(obj, name, dtype=np.double, ndim=1, maxdim=None, shape=None,
      ...
     ValueError: x is required to have shape (*, 4, 4) but has shape (4, 10, 4)
     """
-    pass
+    if optional and obj is None:
+        return None
+    reqs = ["W"] if writeable else []
+    if order == "C" or contiguous:
+        reqs += ["C"]
+    elif order == "F":
+        reqs += ["F"]
+    arr = np.require(obj, dtype=dtype, requirements=reqs)
+    if maxdim is not None:
+        if arr.ndim > maxdim:
+            msg = "{0} must have ndim <= {1}".format(name, maxdim)
+            raise ValueError(msg)
+    elif ndim is not None:
+        if arr.ndim > ndim:
+            arr = _right_squeeze(arr, stop_dim=ndim)
+        elif arr.ndim < ndim:
+            arr = np.reshape(arr, arr.shape + (1,) * (ndim - arr.ndim))
+        if arr.ndim != ndim:
+            msg = "{0} is required to have ndim {1} but has ndim {2}"
+            raise ValueError(msg.format(name, ndim, arr.ndim))
+    if shape is not None:
+        for actual, req in zip(arr.shape, shape):
+            if req is not None and actual != req:
+                req_shape = str(shape).replace("None, ", "*, ")
+                msg = "{0} is required to have shape {1} but has shape {2}"
+                raise ValueError(msg.format(name, req_shape, arr.shape))
+    return arr


 class PandasWrapper:
@@ -160,7 +204,39 @@ class PandasWrapper:
         array_like
             A pandas Series or DataFrame, depending on the shape of obj.
         """
-        pass
+        obj = np.asarray(obj)
+        if not self._is_pandas:
+            return obj
+
+        if obj.shape[0] + trim_start + trim_end != self._pandas_obj.shape[0]:
+            raise ValueError(
+                "obj must have the same number of elements in "
+                "axis 0 as orig"
+            )
+        index = self._pandas_obj.index
+        index = index[trim_start: index.shape[0] - trim_end]
+        if obj.ndim == 1:
+            if columns is None:
+                name = getattr(self._pandas_obj, "name", None)
+            elif isinstance(columns, str):
+                name = columns
+            else:
+                name = columns[0]
+            if append is not None:
+                name = append if name is None else f"{name}_{append}"
+
+            return pd.Series(obj, name=name, index=index)
+        elif obj.ndim == 2:
+            if columns is None:
+                columns = getattr(self._pandas_obj, "columns", None)
+            if append is not None:
+                new = []
+                for c in columns:
+                    new.append(append if c is None else f"{c}_{append}")
+                columns = new
+            return pd.DataFrame(obj, columns=columns, index=index)
+        else:
+            raise ValueError("Can only wrap 1 or 2-d array_like")


 def bool_like(value, name, optional=False, strict=False):
@@ -184,11 +260,29 @@ def bool_like(value, name, optional=False, strict=False):
     converted : bool
         value converted to a bool
     """
-    pass
-
-
-def int_like(value: Any, name: str, optional: bool=False, strict: bool=False
-    ) ->Optional[int]:
+    if optional and value is None:
+        return value
+    extra_text = " or None" if optional else ""
+    if strict:
+        if isinstance(value, bool):
+            return value
+        else:
+            raise TypeError("{0} must be a bool{1}".format(name, extra_text))
+
+    if hasattr(value, "squeeze") and callable(value.squeeze):
+        value = value.squeeze()
+    try:
+        return bool(value)
+    except Exception:
+        raise TypeError(
+            "{0} must be a bool (or bool-compatible)"
+            "{1}".format(name, extra_text)
+        )
+
+
+def int_like(
+    value: Any, name: str, optional: bool = False, strict: bool = False
+) -> Optional[int]:
     """
     Convert to int or raise if not int_like

@@ -209,10 +303,29 @@ def int_like(value: Any, name: str, optional: bool=False, strict: bool=False
     converted : int
         value converted to a int
     """
-    pass
-
-
-def required_int_like(value: Any, name: str, strict: bool=False) ->int:
+    if optional and value is None:
+        return None
+    is_bool_timedelta = isinstance(value, (bool, np.timedelta64))
+
+    if hasattr(value, "squeeze") and callable(value.squeeze):
+        value = value.squeeze()
+
+    if isinstance(value, (int, np.integer)) and not is_bool_timedelta:
+        return int(value)
+    elif not strict and not is_bool_timedelta:
+        try:
+            if value == (value // 1):
+                return int(value)
+        except Exception:
+            pass
+    extra_text = " or None" if optional else ""
+    raise TypeError(
+        "{0} must be integer_like (int or np.integer, but not bool"
+        " or timedelta64){1}".format(name, extra_text)
+    )
+
+
+def required_int_like(value: Any, name: str, strict: bool = False) -> int:
     """
     Convert to int or raise if not int_like

@@ -233,7 +346,9 @@ def required_int_like(value: Any, name: str, strict: bool=False) ->int:
     converted : int
         value converted to a int
     """
-    pass
+    _int = int_like(value, name, optional=False, strict=strict)
+    assert _int is not None
+    return _int


 def float_like(value, name, optional=False, strict=False):
@@ -259,7 +374,31 @@ def float_like(value, name, optional=False, strict=False):
     converted : float
         value converted to a float
     """
-    pass
+    if optional and value is None:
+        return None
+    is_bool = isinstance(value, bool)
+    is_complex = isinstance(value, (complex, np.complexfloating))
+    if hasattr(value, "squeeze") and callable(value.squeeze):
+        value = value.squeeze()
+
+    if isinstance(value, (int, np.integer, float, np.inexact)) and not (
+        is_bool or is_complex
+    ):
+        return float(value)
+    elif not strict and is_complex:
+        imag = np.imag(value)
+        if imag == 0:
+            return float(np.real(value))
+    elif not strict and not is_bool:
+        try:
+            return float(value / 1.0)
+        except Exception:
+            pass
+    extra_text = " or None" if optional else ""
+    raise TypeError(
+        "{0} must be float_like (float or np.inexact)"
+        "{1}".format(name, extra_text)
+    )


 def string_like(value, name, optional=False, options=None, lower=True):
@@ -291,7 +430,21 @@ def string_like(value, name, optional=False, options=None, lower=True):
     ValueError
         If the input is not in ``options`` when ``options`` is set.
     """
-    pass
+    if value is None:
+        return None
+    if not isinstance(value, str):
+        extra_text = " or None" if optional else ""
+        raise TypeError("{0} must be a string{1}".format(name, extra_text))
+    if lower:
+        value = value.lower()
+    if options is not None and value not in options:
+        extra_text = "If not None, " if optional else ""
+        options_text = "'" + "', '".join(options) + "'"
+        msg = "{0}{1} must be one of: {2}".format(
+            extra_text, name, options_text
+        )
+        raise ValueError(msg)
+    return value


 def dict_like(value, name, optional=False, strict=True):
@@ -314,4 +467,13 @@ def dict_like(value, name, optional=False, strict=True):
     converted : dict_like
         value
     """
-    pass
+    if optional and value is None:
+        return None
+    if not isinstance(value, Mapping) or (
+        strict and not (isinstance(value, dict))
+    ):
+        extra_text = "If not None, " if optional else ""
+        strict_text = " or dict_like (i.e., a Mapping)" if strict else ""
+        msg = "{0}{1} must be a dict{2}".format(extra_text, name, strict_text)
+        raise TypeError(msg)
+    return value
diff --git a/statsmodels/tools/web.py b/statsmodels/tools/web.py
index e453047bb..98daf41a8 100644
--- a/statsmodels/tools/web.py
+++ b/statsmodels/tools/web.py
@@ -4,7 +4,9 @@ to a function's reference
 """
 import webbrowser
 from urllib.parse import urlencode
+
 from statsmodels import __version__
+
 BASE_URL = 'https://www.statsmodels.org/'


@@ -13,7 +15,30 @@ def _generate_url(func, stable):
     Parse inputs and return a correctly formatted URL or raises ValueError
     if the input is not understandable
     """
-    pass
+    url = BASE_URL
+    if stable:
+        url += 'stable/'
+    else:
+        url += 'devel/'
+
+    if func is None:
+        return url
+    elif isinstance(func, str):
+        url += 'search.html?'
+        url += urlencode({'q': func})
+        url += '&check_keywords=yes&area=default'
+    else:
+        try:
+            func_name = func.__name__
+            func_module = func.__module__
+            if not func_module.startswith('statsmodels.'):
+                raise ValueError('Function must be from statsmodels')
+            url += 'generated/'
+            url += func_module + '.' + func_name + '.html'
+        except AttributeError:
+            raise ValueError('Input not understood')
+    return url


 def webdoc(func=None, stable=None):
@@ -53,4 +78,7 @@ def webdoc(func=None, stable=None):

     Uses the default system browser.
     """
-    pass
+    stable = __version__ if 'dev' not in __version__ else stable
+    url_or_error = _generate_url(func, stable)
+    webbrowser.open(url_or_error)
+    return None
diff --git a/statsmodels/treatment/treatment_effects.py b/statsmodels/treatment/treatment_effects.py
index b51050f3f..db96271fe 100644
--- a/statsmodels/treatment/treatment_effects.py
+++ b/statsmodels/treatment/treatment_effects.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Treatment effect estimators

 follows largely Stata's teffects in Stata 13 manual
@@ -26,6 +27,7 @@ Note: script requires cattaneo2 data file from Stata 14, hardcoded file path
 could be loaded with webuse

 """
+
 import numpy as np
 from statsmodels.compat.pandas import Substitution
 from scipy.linalg import block_diag
@@ -41,7 +43,15 @@ def _mom_ate(params, endog, tind, prob, weighted=True):
     This does not include a moment condition for potential outcome mean (POM).

     """
-    pass
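+    # inverse-propensity weights for the treated (w1) and control (w0) groups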
+    w1 = (tind / prob)
+    w0 = (1. - tind) / (1. - prob)
+    if weighted:
+        w0 /= w0.mean()
+        w1 /= w1.mean()
+
+    wdiff = w1 - w0
+
+    return endog * wdiff - params


 def _mom_atm(params, endog, tind, prob, weighted=True):
@@ -49,7 +59,13 @@ def _mom_atm(params, endog, tind, prob, weighted=True):

     moment conditions are POM0 and POM1
     """
-    pass
+    w1 = (tind / prob)
+    w0 = (1. - tind) / (1. - prob)
+    if weighted:
+        w1 /= w1.mean()
+        w0 /= w0.mean()
+
+    return np.column_stack((endog * w0 - params[0], endog * w1 - params[1]))


 def _mom_ols(params, endog, tind, prob, weighted=True):
@@ -59,7 +75,12 @@ def _mom_ols(params, endog, tind, prob, weighted=True):
     moment conditions are POM0 and POM1

     """
-    pass
+    w = tind / prob + (1-tind) / (1 - prob)
+
+    treat_ind = np.column_stack((1 - tind, tind))
+    mom = (w * (endog - treat_ind.dot(params)))[:, None] * treat_ind
+
+    return mom


 def _mom_ols_te(tm, endog, tind, prob, weighted=True):
@@ -70,14 +91,42 @@ def _mom_ols_te(tm, endog, tind, prob, weighted=True):
     second moment is POM0  (control)

     """
-    pass
+    w = tind / prob + (1-tind) / (1 - prob)
+
+    treat_ind = np.column_stack((tind, np.ones(len(tind))))
+    mom = (w * (endog - treat_ind.dot(tm)))[:, None] * treat_ind
+
+    return mom
+
+
+def _mom_olsex(params, model=None, exog=None, scale=None):
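+    # moment conditions for an OLS-type outcome model:
+    # (optionally scaled) residuals times exog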
+    exog = exog if exog is not None else model.exog
+    fitted = model.predict(params, exog)
+    resid = model.endog - fitted
+    if scale is not None:
+        resid /= scale
+    mom = resid[:, None] * exog
+    return mom


 def ate_ipw(endog, tind, prob, weighted=True, probt=None):
     """average treatment effect based on basic inverse propensity weighting.

     """
-    pass
+    w1 = (tind / prob)
+    w0 = (1. - tind) / (1. - prob)
+
+    if probt is not None:
+        w1 *= probt
+        w0 *= probt
+
+    if weighted:
+        w0 /= w0.mean()
+        w1 /= w1.mean()
+
+    wdiff = w1 - w0
+
+    return (endog * wdiff).mean(), (endog * w0).mean(), (endog * w1).mean()


 class _TEGMMGeneric1(GMM):
@@ -92,23 +141,53 @@ class _TEGMMGeneric1(GMM):
     """

     def __init__(self, endog, res_select, mom_outcome, exclude_tmoms=False,
-        **kwargs):
+                 **kwargs):
         super(_TEGMMGeneric1, self).__init__(endog, None, None)
         self.results_select = res_select
         self.mom_outcome = mom_outcome
         self.exclude_tmoms = exclude_tmoms
         self.__dict__.update(kwargs)
+
+        # add xnames so it's not None
+        # we don't have exog in init in this version
         if self.data.xnames is None:
             self.data.xnames = []
+
+        # need information about decomposition of parameters
         if exclude_tmoms:
             self.k_select = 0
         else:
             self.k_select = len(res_select.model.data.param_names)
+
         if exclude_tmoms:
+            # fittedvalues is still linpred
             self.prob = self.results_select.predict()
         else:
             self.prob = None

+    def momcond(self, params):
+        k_outcome = len(params) - self.k_select
+        tm = params[:k_outcome]
+        p_tm = params[k_outcome:]
+
+        tind = self.results_select.model.endog
+
+        if self.exclude_tmoms:
+            prob = self.prob
+        else:
+            prob = self.results_select.model.predict(p_tm)
+
+        moms_list = []
+        mom_o = self.mom_outcome(tm, self.endog, tind, prob, weighted=True)
+        moms_list.append(mom_o)
+
+        if not self.exclude_tmoms:
+            mom_t = self.results_select.model.score_obs(p_tm)
+            moms_list.append(mom_t)
+
+        moms = np.column_stack(moms_list)
+        return moms
+

 class _TEGMM(GMM):
     """GMM class to get cov_params for treatment effects
@@ -125,9 +204,23 @@ class _TEGMM(GMM):
         super(_TEGMM, self).__init__(endog, None, None)
         self.results_select = res_select
         self.mom_outcome = mom_outcome
+
+        # add xnames so it's not None
+        # we don't have exog in init in this version
         if self.data.xnames is None:
             self.data.xnames = []

+    def momcond(self, params):
+        tm = params[:2]
+        p_tm = params[2:]
+
+        tind = self.results_select.model.endog
+        prob = self.results_select.model.predict(p_tm)
+        momt = self.mom_outcome(tm, self.endog, tind, prob)  # weighted=True)
+        moms = np.column_stack((momt,
+                                self.results_select.model.score_obs(p_tm)))
+        return moms
+

 class _IPWGMM(_TEGMMGeneric1):
     """ GMM for aipw treatment effect and potential outcome
@@ -135,6 +228,47 @@ class _IPWGMM(_TEGMMGeneric1):
     uses unweighted outcome regression
     """

+    def momcond(self, params):
+        # Note: momcond in original order of observations
+        ra = self.teff
+        res_select = ra.results_select
+        tind = ra.treatment
+        endog = ra.model_pool.endog
+        effect_group = self.effect_group
+
+        tm = params[:2]
+        ps = params[2:]
+
+        prob_sel = np.asarray(res_select.model.predict(ps))
+        prob_sel = np.clip(prob_sel, 0.01, 0.99)
+        prob = prob_sel
+
+        if effect_group == "all":
+            probt = None
+        elif effect_group in [1, "treated"]:
+            probt = prob
+        elif effect_group in [0, "untreated", "control"]:
+            probt = 1 - prob
+        elif isinstance(effect_group, np.ndarray):
+            # assumption: an array-valued effect_group supplies the
+            # treatment-probability weights directly
+            probt = effect_group
+        else:
+            raise ValueError("incorrect option for effect_group")
+
+        w = tind / prob + (1 - tind) / (1 - prob)
+        # Are we supposed to use scaled weights? doesn't closely match Stata
+        # w1 = tind / prob
+        # w2 = (1 - tind) / (1 - prob)
+        # w = w1 / w1.sum() * tind.sum() + w2 / w2.sum() * (1 - tind).sum()
+        if probt is not None:
+            w *= probt
+
+        treat_ind = np.column_stack((tind, np.ones(len(tind))))
+        mm = (w * (endog - treat_ind.dot(tm)))[:, None] * treat_ind
+
+        mom_select = res_select.model.score_obs(ps)
+        moms = np.column_stack((mm, mom_select))
+        return moms
+

 class _AIPWGMM(_TEGMMGeneric1):
     """ GMM for aipw treatment effect and potential outcome
@@ -142,6 +276,65 @@ class _AIPWGMM(_TEGMMGeneric1):
     uses unweighted outcome regression
     """

+    def momcond(self, params):
+        ra = self.teff
+        treat_mask = ra.treat_mask
+        res_select = ra.results_select
+
+        ppom = params[1]
+        mask = np.arange(len(params)) != 1
+        params = params[mask]
+
+        k = ra.results0.model.exog.shape[1]
+        pm = params[0]  # ATE parameter
+        p0 = params[1:k+1]
+        p1 = params[k+1:2*k+1]
+        ps = params[2*k+1:]
+        mod0 = ra.results0.model
+        mod1 = ra.results1.model
+        # use reordered exog, endog so it matches sub models by group
+        exog = ra.exog_grouped
+        endog = ra.endog_grouped
+
+        prob_sel = np.asarray(res_select.model.predict(ps))
+        prob_sel = np.clip(prob_sel, 0.01, 0.99)
+
+        prob0 = prob_sel[~treat_mask]
+        prob1 = prob_sel[treat_mask]
+        prob = np.concatenate((prob0, prob1))
+
+        # outcome models by treatment unweighted
+        fitted0 = mod0.predict(p0, exog)
+        mom0 = _mom_olsex(p0, model=mod0)
+
+        fitted1 = mod1.predict(p1, exog)
+        mom1 = _mom_olsex(p1, model=mod1)
+
+        mom_outcome = block_diag(mom0, mom1)
+
+        # moments for target statistics, ATE and POM
+        tind = ra.treatment
+        tind = np.concatenate((tind[~treat_mask], tind[treat_mask]))
+        correct0 = (endog - fitted0) / (1 - prob) * (1 - tind)
+        correct1 = (endog - fitted1) / prob * tind
+
+        tmean0 = fitted0 + correct0
+        tmean1 = fitted1 + correct1
+        ate = tmean1 - tmean0
+
+        mm = ate - pm
+        mpom = tmean0 - ppom
+        mm = np.column_stack((mm, mpom))
+
+        # Note: res_select has original data order,
+        # mom_outcome and mm use grouped observations
+        mom_select = res_select.model.score_obs(ps)
+        mom_select = np.concatenate((mom_select[~treat_mask],
+                                     mom_select[treat_mask]), axis=0)
+
+        moms = np.column_stack((mm, mom_outcome, mom_select))
+        return moms
+

 class _AIPWWLSGMM(_TEGMMGeneric1):
     """ GMM for aipw-wls treatment effect and potential outcome
@@ -149,6 +342,73 @@ class _AIPWWLSGMM(_TEGMMGeneric1):
     uses weighted outcome regression
     """

+    def momcond(self, params):
+        ra = self.teff
+        treat_mask = ra.treat_mask
+        res_select = ra.results_select
+
+        ppom = params[1]
+        mask = np.arange(len(params)) != 1
+        params = params[mask]
+
+        k = ra.results0.model.exog.shape[1]
+        pm = params[0]  # ATE parameter
+        p0 = params[1:k+1]
+        p1 = params[k+1:2*k+1]
+        ps = params[-6:]
+        mod0 = ra.results0.model
+        mod1 = ra.results1.model
+        # use reordered exog, endog so it matches sub models by group
+        exog = ra.exog_grouped
+        endog = ra.endog_grouped
+
+        # todo: need weights in outcome models
+        prob_sel = np.asarray(res_select.model.predict(ps))
+
+        prob_sel = np.clip(prob_sel, 0.001, 0.999)
+
+        prob0 = prob_sel[~treat_mask]
+        prob1 = prob_sel[treat_mask]
+        prob = np.concatenate((prob0, prob1))
+
+        tind = 0
+        ww0 = (1 - tind) / (1 - prob0) * ((1 - tind) / (1 - prob0) - 1)
+        tind = 1
+        ww1 = tind / prob1 * (tind / prob1 - 1)
+
+        # outcome models by treatment using IPW weights
+        fitted0 = mod0.predict(p0, exog)
+        mom0 = _mom_olsex(p0, model=mod0) * ww0[:, None]
+
+        fitted1 = mod1.predict(p1, exog)
+        mom1 = _mom_olsex(p1, model=mod1) * ww1[:, None]
+
+        mom_outcome = block_diag(mom0, mom1)
+
+        # moments for target statistics, ATE and POM
+        tind = ra.treatment
+        tind = np.concatenate((tind[~treat_mask], tind[treat_mask]))
+
+        correct0 = (endog - fitted0) / (1 - prob) * (1 - tind)
+        correct1 = (endog - fitted1) / prob * tind
+
+        tmean0 = fitted0 + correct0
+        tmean1 = fitted1 + correct1
+        ate = tmean1 - tmean0
+
+        mm = ate - pm
+        mpom = tmean0 - ppom
+        mm = np.column_stack((mm, mpom))
+
+        # Note: res_select has original data order,
+        # mom_outcome and mm use grouped observations
+        mom_select = res_select.model.score_obs(ps)
+        mom_select = np.concatenate((mom_select[~treat_mask],
+                                     mom_select[treat_mask]), axis=0)
+
+        moms = np.column_stack((mm, mom_outcome, mom_select))
+        return moms
+

 class _RAGMM(_TEGMMGeneric1):
     """GMM for regression adjustment treatment effect and potential outcome
@@ -156,11 +416,114 @@ class _RAGMM(_TEGMMGeneric1):
     uses unweighted outcome regression
     """

+    def momcond(self, params):
+        ra = self.teff
+
+        ppom = params[1]
+        mask = np.arange(len(params)) != 1
+        params = params[mask]
+
+        k = ra.results0.model.exog.shape[1]
+        pm = params[0]
+        p0 = params[1:k+1]
+        p1 = params[-k:]
+        mod0 = ra.results0.model
+        mod1 = ra.results1.model
+        # use reordered exog, endog so it matches sub models by group
+        exog = ra.exog_grouped
+
+        fitted0 = mod0.predict(p0, exog)
+        mom0 = _mom_olsex(p0, model=mod0)
+
+        fitted1 = mod1.predict(p1, exog)
+        mom1 = _mom_olsex(p1, model=mod1)
+
+        momout = block_diag(mom0, mom1)
+
+        mm = fitted1 - fitted0 - pm
+        mpom = fitted0 - ppom
+        mm = np.column_stack((mm, mpom))
+        if self.probt is not None:
+            mm *= (self.probt / self.probt.mean())[:, None]
+
+        moms = np.column_stack((mm, momout))
+        return moms
+

 class _IPWRAGMM(_TEGMMGeneric1):
     """ GMM for ipwra treatment effect and potential outcome
     """

+    def momcond(self, params):
+        ra = self.teff
+        treat_mask = ra.treat_mask
+        res_select = ra.results_select
+
+        ppom = params[1]
+        mask = np.arange(len(params)) != 1
+        params = params[mask]
+
+        k = ra.results0.model.exog.shape[1]
+        pm = params[0]  # ATE parameter
+        p0 = params[1:k+1]
+        p1 = params[k+1:2*k+1]
+        ps = params[-6:]
+        mod0 = ra.results0.model
+        mod1 = ra.results1.model
+
+        # use reordered exog so it matches sub models by group
+        exog = ra.exog_grouped
+        tind = np.zeros(len(treat_mask))
+        tind[-treat_mask.sum():] = 1
+
+        # selection probability by group, propensity score
+        prob_sel = np.asarray(res_select.model.predict(ps))
+        prob_sel = np.clip(prob_sel, 0.001, 0.999)
+        prob0 = prob_sel[~treat_mask]
+        prob1 = prob_sel[treat_mask]
+
+        effect_group = self.effect_group
+        if effect_group == "all":
+            w0 = 1 / (1 - prob0)
+            w1 = 1 / prob1
+            sind = 1
+        elif effect_group in [1, "treated"]:
+            w0 = prob0 / (1 - prob0)
+            w1 = prob1 / prob1
+            # for averaging effect on treated
+            sind = tind / tind.mean()
+        elif effect_group in [0, "untreated", "control"]:
+            w0 = (1 - prob0) / (1 - prob0)
+            w1 = (1 - prob1) / prob1
+
+            sind = (1 - tind)
+            sind /= sind.mean()
+        else:
+            raise ValueError("incorrect option for effect_group")
+
+        # outcome models by treatment using IPW weights
+        fitted0 = mod0.predict(p0, exog)
+        mom0 = _mom_olsex(p0, model=mod0) * w0[:, None]
+
+        fitted1 = mod1.predict(p1, exog)
+        mom1 = _mom_olsex(p1, model=mod1) * w1[:, None]
+
+        mom_outcome = block_diag(mom0, mom1)
+
+        # moments for target statistics, ATE and POM
+        mm = (fitted1 - fitted0 - pm) * sind
+        mpom = (fitted0 - ppom) * sind
+        mm = np.column_stack((mm, mpom))
+
+        # Note: res_select has original data order,
+        # mom_outcome and mm use grouped observations
+        mom_select = res_select.model.score_obs(ps)
+        mom_select = np.concatenate((mom_select[~treat_mask],
+                                     mom_select[treat_mask]), axis=0)
+
+        moms = np.column_stack((mm, mom_outcome, mom_select))
+        return moms
+

 class TreatmentEffectResults(ContrastResults):
     """
@@ -194,11 +557,14 @@ class TreatmentEffectResults(ContrastResults):
         self.teff = teff
         self.results_gmm = results_gmm
         self.method = method
+        # TODO: make those explicit?
         self.__dict__.update(kwds)
-        self.c_names = ['ATE', 'POM0', 'POM1']
+
+        self.c_names = ["ATE", "POM0", "POM1"]


-doc_params_returns = """Parameters
+doc_params_returns = """\
+Parameters
 ----------
 return_results : bool
     If True, then a results instance is returned.
@@ -220,7 +586,9 @@ Returns
 -------
 TreatmentEffectsResults instance or tuple (ATE, POM0, POM1)
 """
-doc_params_returns2 = """Parameters
+
+doc_params_returns2 = """\
+Parameters
 ----------
 return_results : bool
     If True, then a results instance is returned.
@@ -272,23 +640,34 @@ class TreatmentEffect(object):

     """

-    def __init__(self, model, treatment, results_select=None, _cov_type=
-        'HC0', **kwds):
-        self.__dict__.update(kwds)
+    def __init__(self, model, treatment, results_select=None, _cov_type="HC0",
+                 **kwds):
+        # Note: _cov_type is only for the preliminary estimators,
+        # cov in GMM always corresponds to HC0
+        self.__dict__.update(kwds)  # currently not used
         self.treatment = np.asarray(treatment)
-        self.treat_mask = treat_mask = treatment == 1
+        self.treat_mask = treat_mask = (treatment == 1)
+
         if results_select is not None:
             self.results_select = results_select
             self.prob_select = results_select.predict()
+
         self.model_pool = model
         endog = model.endog
         exog = model.exog
         self.nobs = endog.shape[0]
         self._cov_type = _cov_type
+
+        # no init keys are supported
         mod0 = model.__class__(endog[~treat_mask], exog[~treat_mask])
         self.results0 = mod0.fit(cov_type=_cov_type)
         mod1 = model.__class__(endog[treat_mask], exog[treat_mask])
         self.results1 = mod1.fit(cov_type=_cov_type)
+        # self.predict_mean0 = self.model_pool.predict(self.results0.params
+        #                                             ).mean()
+        # self.predict_mean1 = self.model_pool.predict(self.results1.params
+        #                                             ).mean()
+
         self.exog_grouped = np.concatenate((mod0.exog, mod1.exog), axis=0)
         self.endog_grouped = np.concatenate((mod0.endog, mod1.endog), axis=0)

@@ -299,9 +678,9 @@ class TreatmentEffect(object):
         not yet implemented

         """
-        pass
+        raise NotImplementedError

-    def ipw(self, return_results=True, effect_group='all', disp=False):
+    def ipw(self, return_results=True, effect_group="all", disp=False):
         """Inverse Probability Weighted treatment effect estimation.

         Parameters
@@ -330,34 +709,152 @@ class TreatmentEffect(object):
         --------
         TreatmentEffectsResults
         """
-        pass
-
-    @Substitution(params_returns=indent(doc_params_returns, ' ' * 8))
-    def ra(self, return_results=True, effect_group='all', disp=False):
+        endog = self.model_pool.endog
+        tind = self.treatment
+        prob = self.prob_select
+        if effect_group == "all":
+            probt = None
+        elif effect_group in [1, "treated"]:
+            probt = prob
+            effect_group = 1  # standardize effect_group name
+        elif effect_group in [0, "untreated", "control"]:
+            probt = 1 - prob
+            effect_group = 0  # standardize effect_group name
+        elif isinstance(effect_group, np.ndarray):
+            probt = effect_group
+            effect_group = "user"  # standardize effect_group name
+        else:
+            raise ValueError("incorrect option for effect_group")
+
+        res_ipw = ate_ipw(endog, tind, prob, weighted=True, probt=probt)
+
+        if not return_results:
+            return res_ipw
+
+        # gmm = _TEGMMGeneric1(endog, self.results_select, _mom_ols_te,
+        #                     probt=probt)
+        gmm = _IPWGMM(endog, self.results_select, None, teff=self,
+                      effect_group=effect_group)
+        start_params = np.concatenate((res_ipw[:2],
+                                       self.results_select.params))
+        res_gmm = gmm.fit(start_params=start_params,
+                          inv_weights=np.eye(len(start_params)),
+                          optim_method='nm',
+                          optim_args={"maxiter": 5000, "disp": disp},
+                          maxiter=1,
+                          )
+
+        res = TreatmentEffectResults(self, res_gmm, "IPW",
+                                     start_params=start_params,
+                                     effect_group=effect_group,
+                                     )
+        return res
+
+    @Substitution(params_returns=indent(doc_params_returns, " " * 8))
+    def ra(self, return_results=True, effect_group="all", disp=False):
         """
         Regression Adjustment treatment effect estimation.
-        
-%(params_returns)s
+        \n%(params_returns)s
         See Also
         --------
         TreatmentEffectsResults
         """
-        pass
+        # need indicator for reordered observations
+        tind = np.zeros(len(self.treatment))
+        tind[-self.treatment.sum():] = 1
+        if effect_group == "all":
+            probt = None
+        elif effect_group in [1, "treated"]:
+            probt = tind
+            effect_group = 1  # standardize effect_group name
+        elif effect_group in [0, "untreated", "control"]:
+            probt = 1 - tind
+            effect_group = 0  # standardize effect_group name
+        elif isinstance(effect_group, np.ndarray):
+            # TODO: do we keep this?
+            probt = effect_group
+            effect_group = "user"  # standardize effect_group name
+        else:
+            raise ValueError("incorrect option for effect_group")
+
+        exog = self.exog_grouped

-    @Substitution(params_returns=indent(doc_params_returns2, ' ' * 8))
+        # weight or indicator for effect_group
+        if probt is not None:
+            cw = (probt / probt.mean())
+        else:
+            cw = 1
+
+        pom0 = (self.results0.predict(exog) * cw).mean()
+        pom1 = (self.results1.predict(exog) * cw).mean()
+        if not return_results:
+            return pom1 - pom0, pom0, pom1
+
+        endog = self.model_pool.endog
+        mod_gmm = _RAGMM(endog, self.results_select, None, teff=self,
+                         probt=probt)
+        start_params = np.concatenate((
+            # ate, tt0.effect,
+            [pom1 - pom0, pom0],
+            self.results0.params,
+            self.results1.params))
+        res_gmm = mod_gmm.fit(start_params=start_params,
+                              inv_weights=np.eye(len(start_params)),
+                              optim_method='nm',
+                              optim_args={"maxiter": 5000, "disp": disp},
+                              maxiter=1,
+                              )
+        res = TreatmentEffectResults(self, res_gmm, "IPW",
+                                     start_params=start_params,
+                                     effect_group=effect_group,
+                                     )
+        return res
+
+    @Substitution(params_returns=indent(doc_params_returns2, " " * 8))
     def aipw(self, return_results=True, disp=False):
         """
         ATE and POM from double robust augmented inverse probability weighting
-        
-%(params_returns)s
+        \n%(params_returns)s
         See Also
         --------
         TreatmentEffectsResults

         """
-        pass

-    @Substitution(params_returns=indent(doc_params_returns2, ' ' * 8))
+        nobs = self.nobs
+        prob = self.prob_select
+        tind = self.treatment
+        exog = self.model_pool.exog  # in original order
+        correct0 = (self.results0.resid / (1 - prob[tind == 0])).sum() / nobs
+        correct1 = (self.results1.resid / (prob[tind == 1])).sum() / nobs
+        tmean0 = self.results0.predict(exog).mean() + correct0
+        tmean1 = self.results1.predict(exog).mean() + correct1
+        ate = tmean1 - tmean0
+        if not return_results:
+            return ate, tmean0, tmean1
+
+        endog = self.model_pool.endog
+        p2_aipw = np.asarray([ate, tmean0])
+
+        mag_aipw1 = _AIPWGMM(endog, self.results_select, None, teff=self)
+        start_params = np.concatenate((
+            p2_aipw,
+            self.results0.params, self.results1.params,
+            self.results_select.params))
+        res_gmm = mag_aipw1.fit(
+            start_params=start_params,
+            inv_weights=np.eye(len(start_params)),
+            optim_method='nm',
+            optim_args={"maxiter": 5000, "disp": disp},
+            maxiter=1)
+
+        res = TreatmentEffectResults(self, res_gmm, "IPW",
+                                     start_params=start_params,
+                                     effect_group="all",
+                                     )
+        return res
+
+    @Substitution(params_returns=indent(doc_params_returns2, " " * 8))
     def aipw_wls(self, return_results=True, disp=False):
         """
         ATE and POM from double robust augmented inverse probability weighting.
@@ -365,25 +862,135 @@ class TreatmentEffect(object):
         This uses weighted outcome regression, while `aipw` uses unweighted
         outcome regression.
         Option for effect on treated or on untreated is not available.
-        
-%(params_returns)s
+        \n%(params_returns)s
         See Also
         --------
         TreatmentEffectsResults

         """
-        pass
-
-    @Substitution(params_returns=indent(doc_params_returns, ' ' * 8))
-    def ipw_ra(self, return_results=True, effect_group='all', disp=False):
+        nobs = self.nobs
+        prob = self.prob_select
+
+        endog = self.model_pool.endog
+        exog = self.model_pool.exog
+        tind = self.treatment
+        treat_mask = self.treat_mask
+
+        ww1 = tind / prob * (tind / prob - 1)
+        mod1 = WLS(endog[treat_mask], exog[treat_mask],
+                   weights=ww1[treat_mask])
+        result1 = mod1.fit(cov_type='HC1')
+        mean1_ipw2 = result1.predict(exog).mean()
+
+        ww0 = (1 - tind) / (1 - prob) * ((1 - tind) / (1 - prob) - 1)
+        mod0 = WLS(endog[~treat_mask], exog[~treat_mask],
+                   weights=ww0[~treat_mask])
+        result0 = mod0.fit(cov_type='HC1')
+        mean0_ipw2 = result0.predict(exog).mean()
+
+        self.results_ipwwls0 = result0
+        self.results_ipwwls1 = result1
+
+        correct0 = (result0.resid / (1 - prob[tind == 0])).sum() / nobs
+        correct1 = (result1.resid / (prob[tind == 1])).sum() / nobs
+        tmean0 = mean0_ipw2 + correct0
+        tmean1 = mean1_ipw2 + correct1
+        ate = tmean1 - tmean0
+
+        if not return_results:
+            return ate, tmean0, tmean1
+
+        p2_aipw_wls = np.asarray([ate, tmean0]).squeeze()
+
+        # GMM
+        mod_gmm = _AIPWWLSGMM(endog, self.results_select, None,
+                              teff=self)
+        start_params = np.concatenate((
+            p2_aipw_wls,
+            result0.params,
+            result1.params,
+            self.results_select.params))
+        res_gmm = mod_gmm.fit(
+            start_params=start_params,
+            inv_weights=np.eye(len(start_params)),
+            optim_method='nm',
+            optim_args={"maxiter": 5000, "disp": disp},
+            maxiter=1)
+        res = TreatmentEffectResults(self, res_gmm, "IPW",
+                                     start_params=start_params,
+                                     effect_group="all",
+                                     )
+        return res
+
+    @Substitution(params_returns=indent(doc_params_returns, " " * 8))
+    def ipw_ra(self, return_results=True, effect_group="all", disp=False):
         """
         ATE and POM from inverse probability weighted regression adjustment.

-        
-%(params_returns)s
+        \n%(params_returns)s
         See Also
         --------
         TreatmentEffectsResults

         """
-        pass
+        treat_mask = self.treat_mask
+        endog = self.model_pool.endog
+        exog = self.model_pool.exog
+        prob = self.prob_select
+
+        prob0 = prob[~treat_mask]
+        prob1 = prob[treat_mask]
+        if effect_group == "all":
+            w0 = 1 / (1 - prob0)
+            w1 = 1 / prob1
+            exogt = exog
+        elif effect_group in [1, "treated"]:
+            w0 = prob0 / (1 - prob0)
+            w1 = prob1 / prob1
+            exogt = exog[treat_mask]
+            effect_group = 1  # standardize effect_group name
+        elif effect_group in [0, "untreated", "control"]:
+            w0 = (1 - prob0) / (1 - prob0)
+            w1 = (1 - prob1) / prob1
+            exogt = exog[~treat_mask]
+            effect_group = 0  # standardize effect_group name
+        else:
+            raise ValueError("incorrect option for effect_group")
+
+        mod0 = WLS(endog[~treat_mask], exog[~treat_mask],
+                   weights=w0)
+        result0 = mod0.fit(cov_type='HC1')
+        # mean0_ipwra = (result0.predict(exog) * (prob / prob.mean())).mean()
+        mean0_ipwra = result0.predict(exogt).mean()
+
+        mod1 = WLS(endog[treat_mask], exog[treat_mask],
+                   weights=w1)
+        result1 = mod1.fit(cov_type='HC1')
+        # mean1_ipwra = (result1.predict(exog) * (prob / prob.mean())).mean()
+        mean1_ipwra = result1.predict(exogt).mean()
+
+        if not return_results:
+            return mean1_ipwra - mean0_ipwra, mean0_ipwra, mean1_ipwra
+
+        # GMM
+        mod_gmm = _IPWRAGMM(endog, self.results_select, None, teff=self,
+                            effect_group=effect_group)
+        start_params = np.concatenate((
+            [mean1_ipwra - mean0_ipwra, mean0_ipwra],
+            result0.params,
+            result1.params,
+            np.asarray(self.results_select.params)
+            ))
+        res_gmm = mod_gmm.fit(
+            start_params=start_params,
+            inv_weights=np.eye(len(start_params)),
+            optim_method='nm',
+            optim_args={"maxiter": 2000, "disp": disp},
+            maxiter=1
+            )
+
+        res = TreatmentEffectResults(self, res_gmm, "IPW",
+                                     start_params=start_params,
+                                     effect_group=effect_group,
+                                     )
+        return res
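
A minimal usage sketch for the TreatmentEffect estimators implemented above
(illustrative only, not part of the patch): the module path
statsmodels.treatment.treatment_effects is assumed, and the simulated data,
coefficients, and seed are invented for the example.

import numpy as np
from statsmodels.regression.linear_model import OLS
from statsmodels.discrete.discrete_model import Probit
from statsmodels.treatment.treatment_effects import TreatmentEffect

rng = np.random.default_rng(12345)
n = 1000
exog = np.column_stack((np.ones(n), rng.normal(size=n)))
treat = (exog @ [0.25, 0.5] + rng.normal(size=n) > 0).astype(int)
endog = 1.0 * treat + exog @ [1.0, 0.5] + rng.normal(size=n)

res_probit = Probit(treat, exog).fit(disp=False)   # selection model
model_outcome = OLS(endog, exog)                   # pooled outcome model
te = TreatmentEffect(model_outcome, treat, results_select=res_probit)

# plain point estimates (ATE, POM0, POM1) without GMM standard errors;
# the estimated ATE should be near the simulated effect of 1.0
print(te.ra(return_results=False))
print(te.aipw(return_results=False))

# GMM-based results instances with standard errors
res_ipw = te.ipw()
res_ipwra = te.ipw_ra(effect_group="treated")
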
diff --git a/statsmodels/tsa/_bds.py b/statsmodels/tsa/_bds.py
index 7dc940027..eba21dbed 100644
--- a/statsmodels/tsa/_bds.py
+++ b/statsmodels/tsa/_bds.py
@@ -16,8 +16,10 @@ LeBaron, Blake. 1997.
 "A Fast Algorithm for the BDS Statistic."
 Studies in Nonlinear Dynamics & Econometrics 2 (2) (January 1).
 """
+
 import numpy as np
 from scipy import stats
+
 from statsmodels.tools.validation import array_like


@@ -45,7 +47,22 @@ def distance_indicators(x, epsilon=None, distance=1.5):
     -----
     Since this can be a very large matrix, use np.int8 to save some space.
     """
-    pass
+    x = array_like(x, 'x')
+
+    if epsilon is not None and epsilon <= 0:
+        raise ValueError("Threshold distance must be positive if specified."
+                         " Got epsilon of %f" % epsilon)
+    if distance <= 0:
+        raise ValueError("Threshold distance must be positive."
+                         " Got distance multiplier %f" % distance)
+
+    # TODO: add functionality to select epsilon optimally
+    # TODO: and/or compute for a range of epsilons in [0.5*s, 2.0*s]?
+    #      or [1.5*s, 2.0*s]?
+    if epsilon is None:
+        epsilon = distance * x.std(ddof=1)
+
+    return np.abs(x[:, None] - x) < epsilon


 def correlation_sum(indicators, embedding_dim):
@@ -68,7 +85,20 @@ def correlation_sum(indicators, embedding_dim):
     indicators_joint
         matrix of joint-distance-threshold indicators
     """
-    pass
+    if not indicators.ndim == 2:
+        raise ValueError('Indicators must be a matrix')
+    if not indicators.shape[0] == indicators.shape[1]:
+        raise ValueError('Indicator matrix must be symmetric (square)')
+
+    if embedding_dim == 1:
+        indicators_joint = indicators
+    else:
+        corrsum, indicators = correlation_sum(indicators, embedding_dim - 1)
+        indicators_joint = indicators[1:, 1:]*indicators[:-1, :-1]
+
+    nobs = len(indicators_joint)
+    corrsum = np.mean(indicators_joint[np.triu_indices(nobs, 1)])
+    return corrsum, indicators_joint


 def correlation_sums(indicators, max_dim):
@@ -87,7 +117,14 @@ def correlation_sums(indicators, max_dim):
     corrsums : ndarray
         Correlation sums
     """
-    pass
+
+    corrsums = np.zeros((1, max_dim))
+
+    corrsums[0, 0], indicators = correlation_sum(indicators, 1)
+    for i in range(1, max_dim):
+        # passing embedding_dim=2 with the current joint indicators raises
+        # the effective embedding dimension by one on each iteration
+        corrsums[0, i], indicators = correlation_sum(indicators, 2)
+
+    return corrsums


 def _var(indicators, max_dim):
@@ -106,7 +143,24 @@ def _var(indicators, max_dim):
     variances : float
         Variance of BDS effect
     """
-    pass
+    nobs = len(indicators)
+    corrsum_1dim, _ = correlation_sum(indicators, 1)
+    k = ((indicators.sum(1)**2).sum() - 3*indicators.sum() +
+         2*nobs) / (nobs * (nobs - 1) * (nobs - 2))
+
+    variances = np.zeros((1, max_dim - 1))
+
+    for embedding_dim in range(2, max_dim + 1):
+        tmp = 0
+        for j in range(1, embedding_dim):
+            tmp += (k**(embedding_dim - j))*(corrsum_1dim**(2 * j))
+        variances[0, embedding_dim-2] = 4 * (
+            k**embedding_dim +
+            2 * tmp +
+            ((embedding_dim - 1)**2) * (corrsum_1dim**(2 * embedding_dim)) -
+            (embedding_dim**2) * k * (corrsum_1dim**(2 * embedding_dim - 2)))
+
+    return variances, k


 def bds(x, max_dim=2, epsilon=None, distance=1.5):
@@ -147,4 +201,43 @@ def bds(x, max_dim=2, epsilon=None, distance=1.5):
     required to calculate the m-histories:
     x_t^m = (x_t, x_{t-1}, ... x_{t-(m-1)})
     """
-    pass
+    x = array_like(x, 'x', ndim=1)
+    nobs_full = len(x)
+
+    if max_dim < 2 or max_dim >= nobs_full:
+        raise ValueError("Maximum embedding dimension must be in the range"
+                         " [2,len(x)-1]. Got %d." % max_dim)
+
+    # Cache the indicators
+    indicators = distance_indicators(x, epsilon, distance)
+
+    # Get estimates of m-dimensional correlation integrals
+    corrsum_mdims = correlation_sums(indicators, max_dim)
+
+    # Get variance of effect
+    variances, k = _var(indicators, max_dim)
+    stddevs = np.sqrt(variances)
+
+    bds_stats = np.zeros((1, max_dim - 1))
+    pvalues = np.zeros((1, max_dim - 1))
+    for embedding_dim in range(2, max_dim+1):
+        ninitial = (embedding_dim - 1)
+        nobs = nobs_full - ninitial
+
+        # Get estimates of 1-dimensional correlation integrals
+        # (see Kanzler footnote 10 for why indicators are truncated)
+        corrsum_1dim, _ = correlation_sum(indicators[ninitial:, ninitial:], 1)
+        corrsum_mdim = corrsum_mdims[0, embedding_dim - 1]
+
+        # Get the intermediate values for the statistic
+        effect = corrsum_mdim - (corrsum_1dim**embedding_dim)
+        sd = stddevs[0, embedding_dim - 2]
+
+        # Calculate the statistic: bds_stat ~ N(0,1)
+        bds_stats[0, embedding_dim - 2] = np.sqrt(nobs) * effect / sd
+
+        # Calculate the p-value (two-tailed test)
+        pvalue = 2*stats.norm.sf(np.abs(bds_stats[0, embedding_dim - 2]))
+        pvalues[0, embedding_dim - 2] = pvalue
+
+    return np.squeeze(bds_stats), np.squeeze(pvalues)
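
A quick, illustrative check of the BDS implementation above (the public entry
point statsmodels.tsa.stattools.bds re-exports this function); the seed and
series below are made up for the sketch.

import numpy as np
from statsmodels.tsa.stattools import bds

rng = np.random.default_rng(0)
x_iid = rng.normal(size=500)            # iid noise: expect large p-values
stat, pvalue = bds(x_iid, max_dim=3)    # statistics for embedding dims 2 and 3
print(stat, pvalue)

# deterministic nonlinear series (logistic map): expect very small p-values
x = np.empty(500)
x[0] = 0.3
for t in range(1, 500):
    x[t] = 3.9 * x[t - 1] * (1 - x[t - 1])
print(bds(x, max_dim=3))
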
diff --git a/statsmodels/tsa/adfvalues.py b/statsmodels/tsa/adfvalues.py
index 58215d801..bdd5e6f4c 100644
--- a/statsmodels/tsa/adfvalues.py
+++ b/statsmodels/tsa/adfvalues.py
@@ -1,109 +1,226 @@
 from scipy.stats import norm
 from numpy import array, polyval, inf, asarray
+
 __all__ = ['mackinnonp', 'mackinnoncrit']
+
+# These are the cut-off values for the left-tail vs. the rest of the
+# tau distribution, for getting the p-values
+
 tau_star_nc = [-1.04, -1.53, -2.68, -3.09, -3.07, -3.77]
 tau_min_nc = [-19.04, -19.62, -21.21, -23.25, -21.63, -25.74]
 tau_max_nc = [inf, 1.51, 0.86, 0.88, 1.05, 1.24]
 tau_star_c = [-1.61, -2.62, -3.13, -3.47, -3.78, -3.93]
 tau_min_c = [-18.83, -18.86, -23.48, -28.07, -25.96, -23.27]
 tau_max_c = [2.74, 0.92, 0.55, 0.61, 0.79, 1]
-tau_star_ct = [-2.89, -3.19, -3.5, -3.65, -3.8, -4.36]
+tau_star_ct = [-2.89, -3.19, -3.50, -3.65, -3.80, -4.36]
 tau_min_ct = [-16.18, -21.15, -25.37, -26.63, -26.53, -26.18]
 tau_max_ct = [0.7, 0.63, 0.71, 0.93, 1.19, 1.42]
 tau_star_ctt = [-3.21, -3.51, -3.81, -3.83, -4.12, -4.63]
 tau_min_ctt = [-17.17, -21.1, -24.33, -24.03, -24.33, -28.22]
 tau_max_ctt = [0.54, 0.79, 1.08, 1.43, 3.49, 1.92]
-_tau_maxs = {'n': tau_max_nc, 'c': tau_max_c, 'ct': tau_max_ct, 'ctt':
-    tau_max_ctt}
-_tau_mins = {'n': tau_min_nc, 'c': tau_min_c, 'ct': tau_min_ct, 'ctt':
-    tau_min_ctt}
-_tau_stars = {'n': tau_star_nc, 'c': tau_star_c, 'ct': tau_star_ct, 'ctt':
-    tau_star_ctt}
-small_scaling = array([1, 1, 0.01])
-tau_nc_smallp = [[0.6344, 1.2378, 3.2496], [1.9129, 1.3857, 3.5322], [
-    2.7648, 1.4502, 3.4186], [3.4336, 1.4835, 3.19], [4.0999, 1.5533, 3.59],
+
+_tau_maxs = {
+    "n": tau_max_nc,
+    "c": tau_max_c,
+    "ct": tau_max_ct,
+    "ctt": tau_max_ctt,
+}
+_tau_mins = {
+    "n": tau_min_nc,
+    "c": tau_min_c,
+    "ct": tau_min_ct,
+    "ctt": tau_min_ctt,
+}
+_tau_stars = {
+    "n": tau_star_nc,
+    "c": tau_star_c,
+    "ct": tau_star_ct,
+    "ctt": tau_star_ctt,
+}
+
+
+small_scaling = array([1, 1, 1e-2])
+tau_nc_smallp = [
+    [0.6344, 1.2378, 3.2496],
+    [1.9129, 1.3857, 3.5322],
+    [2.7648, 1.4502, 3.4186],
+    [3.4336, 1.4835, 3.19],
+    [4.0999, 1.5533, 3.59],
     [4.5388, 1.5344, 2.9807]]
-tau_nc_smallp = asarray(tau_nc_smallp) * small_scaling
-tau_c_smallp = [[2.1659, 1.4412, 3.8269], [2.92, 1.5012, 3.9796], [3.4699, 
-    1.4856, 3.164], [3.9673, 1.4777, 2.6315], [4.5509, 1.5338, 2.9545], [
-    5.1399, 1.6036, 3.4445]]
-tau_c_smallp = asarray(tau_c_smallp) * small_scaling
-tau_ct_smallp = [[3.2512, 1.6047, 4.9588], [3.6646, 1.5419, 3.6448], [
-    4.0983, 1.5173, 2.9898], [4.5844, 1.5338, 2.8796], [5.0722, 1.5634, 
-    2.9472], [5.53, 1.5914, 3.0392]]
-tau_ct_smallp = asarray(tau_ct_smallp) * small_scaling
-tau_ctt_smallp = [[4.0003, 1.658, 4.8288], [4.3534, 1.6016, 3.7947], [
-    4.7343, 1.5768, 3.2396], [5.214, 1.6077, 3.3449], [5.6481, 1.6274, 
-    3.3455], [5.9296, 1.5929, 2.8223]]
-tau_ctt_smallp = asarray(tau_ctt_smallp) * small_scaling
-_tau_smallps = {'n': tau_nc_smallp, 'c': tau_c_smallp, 'ct': tau_ct_smallp,
-    'ctt': tau_ctt_smallp}
-large_scaling = array([1, 0.1, 0.1, 0.01])
-tau_nc_largep = [[0.4797, 9.3557, -0.6999, 3.3066], [1.5578, 8.558, -2.083,
-    -3.3549], [2.2268, 6.8093, -3.2362, -5.4448], [2.7654, 6.4502, -3.0811,
-    -4.4946], [3.2684, 6.8051, -2.6778, -3.4972], [3.7268, 7.167, -2.3648, 
-    -2.8288]]
-tau_nc_largep = asarray(tau_nc_largep) * large_scaling
-tau_c_largep = [[1.7339, 9.3202, -1.2745, -1.0368], [2.1945, 6.4695, -
-    2.9198, -4.2377], [2.5893, 4.5168, -3.6529, -5.0074], [3.0387, 4.5452, 
-    -3.3666, -4.1921], [3.5049, 5.2098, -2.9158, -3.3468], [3.9489, 5.8933,
-    -2.5359, -2.721]]
-tau_c_largep = asarray(tau_c_largep) * large_scaling
-tau_ct_largep = [[2.5261, 6.1654, -3.7956, -6.0285], [2.85, 5.272, -3.6622,
-    -5.1695], [3.221, 5.255, -3.2685, -4.1501], [3.652, 5.9758, -2.7483, -
-    3.2081], [4.0712, 6.6428, -2.3464, -2.546], [4.4735, 7.1757, -2.0681, -
-    2.1196]]
-tau_ct_largep = asarray(tau_ct_largep) * large_scaling
-tau_ctt_largep = [[3.0778, 4.9529, -4.1477, -5.9359], [3.4713, 5.967, -
-    3.2507, -4.2286], [3.8637, 6.7852, -2.6286, -3.1381], [4.2736, 7.6199, 
-    -2.1534, -2.4026], [4.6679, 8.2618, -1.822, -1.9147], [5.0009, 8.3735, 
-    -1.6994, -1.6928]]
-tau_ctt_largep = asarray(tau_ctt_largep) * large_scaling
-_tau_largeps = {'n': tau_nc_largep, 'c': tau_c_largep, 'ct': tau_ct_largep,
-    'ctt': tau_ctt_largep}
+tau_nc_smallp = asarray(tau_nc_smallp)*small_scaling
+
+tau_c_smallp = [
+    [2.1659, 1.4412, 3.8269],
+    [2.92, 1.5012, 3.9796],
+    [3.4699, 1.4856, 3.164],
+    [3.9673, 1.4777, 2.6315],
+    [4.5509, 1.5338, 2.9545],
+    [5.1399, 1.6036, 3.4445]]
+tau_c_smallp = asarray(tau_c_smallp)*small_scaling
+
+tau_ct_smallp = [
+    [3.2512, 1.6047, 4.9588],
+    [3.6646, 1.5419, 3.6448],
+    [4.0983, 1.5173, 2.9898],
+    [4.5844, 1.5338, 2.8796],
+    [5.0722, 1.5634, 2.9472],
+    [5.53, 1.5914, 3.0392]]
+tau_ct_smallp = asarray(tau_ct_smallp)*small_scaling
+
+tau_ctt_smallp = [
+    [4.0003, 1.658, 4.8288],
+    [4.3534, 1.6016, 3.7947],
+    [4.7343, 1.5768, 3.2396],
+    [5.214, 1.6077, 3.3449],
+    [5.6481, 1.6274, 3.3455],
+    [5.9296, 1.5929, 2.8223]]
+tau_ctt_smallp = asarray(tau_ctt_smallp)*small_scaling
+
+_tau_smallps = {
+    "n": tau_nc_smallp,
+    "c": tau_c_smallp,
+    "ct": tau_ct_smallp,
+    "ctt": tau_ctt_smallp,
+}
+
+
+large_scaling = array([1, 1e-1, 1e-1, 1e-2])
+tau_nc_largep = [
+    [0.4797, 9.3557, -0.6999, 3.3066],
+    [1.5578, 8.558, -2.083, -3.3549],
+    [2.2268, 6.8093, -3.2362, -5.4448],
+    [2.7654, 6.4502, -3.0811, -4.4946],
+    [3.2684, 6.8051, -2.6778, -3.4972],
+    [3.7268, 7.167, -2.3648, -2.8288]]
+tau_nc_largep = asarray(tau_nc_largep)*large_scaling
+
+tau_c_largep = [
+    [1.7339, 9.3202, -1.2745, -1.0368],
+    [2.1945, 6.4695, -2.9198, -4.2377],
+    [2.5893, 4.5168, -3.6529, -5.0074],
+    [3.0387, 4.5452, -3.3666, -4.1921],
+    [3.5049, 5.2098, -2.9158, -3.3468],
+    [3.9489, 5.8933, -2.5359, -2.721]]
+tau_c_largep = asarray(tau_c_largep)*large_scaling
+
+tau_ct_largep = [
+    [2.5261, 6.1654, -3.7956, -6.0285],
+    [2.85, 5.272, -3.6622, -5.1695],
+    [3.221, 5.255, -3.2685, -4.1501],
+    [3.652, 5.9758, -2.7483, -3.2081],
+    [4.0712, 6.6428, -2.3464, -2.546],
+    [4.4735, 7.1757, -2.0681, -2.1196]]
+tau_ct_largep = asarray(tau_ct_largep)*large_scaling
+
+tau_ctt_largep = [
+    [3.0778, 4.9529, -4.1477, -5.9359],
+    [3.4713, 5.967, -3.2507, -4.2286],
+    [3.8637, 6.7852, -2.6286, -3.1381],
+    [4.2736, 7.6199, -2.1534, -2.4026],
+    [4.6679, 8.2618, -1.822, -1.9147],
+    [5.0009, 8.3735, -1.6994, -1.6928]]
+tau_ctt_largep = asarray(tau_ctt_largep)*large_scaling
+
+_tau_largeps = {
+    "n": tau_nc_largep,
+    "c": tau_c_largep,
+    "ct": tau_ct_largep,
+    "ctt": tau_ctt_largep,
+}
+
+
+# NOTE: The Z-statistic is used when lags are included to account for
+#  serial correlation in the error term
+
 z_star_nc = [-2.9, -8.7, -14.8, -20.9, -25.7, -30.5]
 z_star_c = [-8.9, -14.3, -19.5, -25.1, -29.6, -34.4]
 z_star_ct = [-15.0, -19.6, -25.3, -29.6, -31.8, -38.4]
 z_star_ctt = [-20.7, -25.3, -29.9, -34.4, -38.5, -44.2]
-z_nc_smallp = array([[0.0342, -0.6376, 0, -0.03872], [1.3426, -0.768, 0, -
-    0.04104], [3.8607, -2.4159, 0.51293, -0.09835], [6.1072, -3.725, 
-    0.85887, -0.13102], [7.78, -4.4579, 1.00056, -0.14014], [4.0253, -
-    0.8815, 0, -0.04887]])
-z_c_smallp = array([[2.2142, -1.7863, 0.32828, -0.07727], [1.1662, 0.1814, 
-    -0.36707, 0], [6.6584, -4.3486, 1.04705, -0.15011], [3.3249, -0.8456, 0,
-    -0.04818], [4.0356, -0.9306, 0, -0.04776], [13.9959, -8.4314, 1.97411, 
-    -0.22234]])
-z_ct_smallp = array([[4.6476, -2.8932, 0.5832, -0.0999], [7.2453, -4.7021, 
-    1.127, -0.15665], [3.4893, -0.8914, 0, -0.04755], [1.6604, 1.0375, -
-    0.53377, 0], [2.006, 1.1197, -0.55315, 0], [11.1626, -5.6858, 1.21479, 
-    -0.15428]])
-z_ctt_smallp = array([[3.6739, -1.1549, 0, -0.03947], [3.9783, -1.0619, 0, 
-    -0.04394], [2.0062, 0.8907, -0.51708, 0], [4.9218, -1.0663, 0, -0.04691
-    ], [5.1433, -0.9877, 0, -0.04993], [23.6812, -14.6485, 3.42909, -0.33794]])
-z_large_scaling = array([1, 0.1, 0.01, 0.001, 1e-05])
-z_nc_largep = array([[0.4927, 6.906, 13.2331, 12.099, 0], [1.5167, 4.6859, 
-    4.2401, 2.7939, 7.9601], [2.2347, 3.9465, 2.2406, 0.8746, 1.4239], [
-    2.8239, 3.6265, 1.6738, 0.5408, 0.7449], [3.3174, 3.3492, 1.2792, 
-    0.3416, 0.3894], [3.729, 3.0611, 0.9579, 0.2087, 0.1943]])
+
+
+# These are Table 5 from MacKinnon (1994)
+# small p is defined as p in .005 to .150, i.e., p = .005 up to z_star
+# Z* is the largest value for which it is appropriate to use these
+# approximations
+# the left tail approximation is
+# p = norm.cdf(d_0 + d_1*log(abs(z)) + d_2*log(abs(z))**2 + d_3*log(abs(z))**3)
+# there is no Z-min, i.e., it is well-behaved in the left tail
+
+z_nc_smallp = array([
+    [.0342, -.6376, 0, -.03872],
+    [1.3426, -.7680, 0, -.04104],
+    [3.8607, -2.4159, .51293, -.09835],
+    [6.1072, -3.7250, .85887, -.13102],
+    [7.7800, -4.4579, 1.00056, -.14014],
+    [4.0253, -.8815, 0, -.04887]])
+
+z_c_smallp = array([
+    [2.2142, -1.7863, .32828, -.07727],
+    [1.1662, .1814, -.36707, 0],
+    [6.6584, -4.3486, 1.04705, -.15011],
+    [3.3249, -.8456, 0, -.04818],
+    [4.0356, -.9306, 0, -.04776],
+    [13.9959, -8.4314, 1.97411, -.22234]])
+
+z_ct_smallp = array([
+    [4.6476, -2.8932, 0.5832, -0.0999],
+    [7.2453, -4.7021, 1.127, -.15665],
+    [3.4893, -0.8914, 0, -.04755],
+    [1.6604, 1.0375, -0.53377, 0],
+    [2.006, 1.1197, -0.55315, 0],
+    [11.1626, -5.6858, 1.21479, -.15428]])
+
+z_ctt_smallp = array([
+    [3.6739, -1.1549, 0, -0.03947],
+    [3.9783, -1.0619, 0, -0.04394],
+    [2.0062, 0.8907, -0.51708, 0],
+    [4.9218, -1.0663, 0, -0.04691],
+    [5.1433, -0.9877, 0, -0.04993],
+    [23.6812, -14.6485, 3.42909, -.33794]])
+# These are Table 6 from MacKinnon (1994).
+# These are well-behaved in the right tail.
+# the approximation function is
+# p = norm.cdf(d_0 + d_1 * z + d_2*z**2 + d_3*z**3 + d_4*z**4)
+z_large_scaling = array([1, 1e-1, 1e-2, 1e-3, 1e-5])
+z_nc_largep = array([
+    [0.4927, 6.906, 13.2331, 12.099, 0],
+    [1.5167, 4.6859, 4.2401, 2.7939, 7.9601],
+    [2.2347, 3.9465, 2.2406, 0.8746, 1.4239],
+    [2.8239, 3.6265, 1.6738, 0.5408, 0.7449],
+    [3.3174, 3.3492, 1.2792, 0.3416, 0.3894],
+    [3.729, 3.0611, 0.9579, 0.2087, 0.1943]])
 z_nc_largep *= z_large_scaling
-z_c_largep = array([[1.717, 5.5243, 4.3463, 1.6671, 0], [2.2394, 4.2377, 
-    2.432, 0.9241, 0.4364], [2.743, 3.626, 1.5703, 0.4612, 0.567], [3.228, 
-    3.3399, 1.2319, 0.3162, 0.3482], [3.6583, 3.0934, 0.9681, 0.2111, 
-    0.1979], [4.0379, 2.8735, 0.7694, 0.1433, 0.1146]])
+
+z_c_largep = array([
+    [1.717, 5.5243, 4.3463, 1.6671, 0],
+    [2.2394, 4.2377, 2.432, 0.9241, 0.4364],
+    [2.743, 3.626, 1.5703, 0.4612, 0.567],
+    [3.228, 3.3399, 1.2319, 0.3162, 0.3482],
+    [3.6583, 3.0934, 0.9681, 0.2111, 0.1979],
+    [4.0379, 2.8735, 0.7694, 0.1433, 0.1146]])
 z_c_largep *= z_large_scaling
-z_ct_largep = array([[2.7117, 4.5731, 2.2868, 0.6362, 0.5], [3.0972, 4.0873,
-    1.8982, 0.5796, 0.7384], [3.4594, 3.6326, 1.4284, 0.3813, 0.4325], [
-    3.806, 3.2634, 1.0689, 0.2402, 0.2304], [4.1402, 2.9867, 0.8323, 0.16, 
-    0.1315], [4.4497, 2.7534, 0.6582, 0.1089, 0.0773]])
+
+z_ct_largep = array([
+    [2.7117, 4.5731, 2.2868, 0.6362, 0.5],
+    [3.0972, 4.0873, 1.8982, 0.5796, 0.7384],
+    [3.4594, 3.6326, 1.4284, 0.3813, 0.4325],
+    [3.806, 3.2634, 1.0689, 0.2402, 0.2304],
+    [4.1402, 2.9867, 0.8323, 0.16, 0.1315],
+    [4.4497, 2.7534, 0.6582, 0.1089, 0.0773]])
 z_ct_largep *= z_large_scaling
-z_ctt_largep = array([[3.4671, 4.3476, 1.9231, 0.5381, 0.6216], [3.7827, 
-    3.9421, 1.5699, 0.4093, 0.4485], [4.052, 3.4947, 1.1772, 0.2642, 0.2502
-    ], [4.3311, 3.1625, 0.9126, 0.1775, 0.1462], [4.594, 2.8739, 0.707, 
-    0.1181, 0.0838], [4.8479, 2.6447, 0.5647, 0.0827, 0.0518]])
+
+z_ctt_largep = array([
+    [3.4671, 4.3476, 1.9231, 0.5381, 0.6216],
+    [3.7827, 3.9421, 1.5699, 0.4093, 0.4485],
+    [4.052, 3.4947, 1.1772, 0.2642, 0.2502],
+    [4.3311, 3.1625, 0.9126, 0.1775, 0.1462],
+    [4.594, 2.8739, 0.707, 0.1181, 0.0838],
+    [4.8479, 2.6447, 0.5647, 0.0827, 0.0518]])
 z_ctt_largep *= z_large_scaling


-def mackinnonp(teststat, regression='c', N=1, lags=None):
+# TODO: finish this and then integrate them into adf function
+def mackinnonp(teststat, regression="c", N=1, lags=None):
     """
     Returns MacKinnon's approximate p-value for teststat.

@@ -136,82 +253,158 @@ def mackinnonp(teststat, regression='c', N=1, lags=None):
     H_0: AR coefficient = 1
     H_a: AR coefficient < 1
     """
-    pass
+    maxstat = _tau_maxs[regression]
+    minstat = _tau_mins[regression]
+    starstat = _tau_stars[regression]
+    if teststat > maxstat[N-1]:
+        return 1.0
+    elif teststat < minstat[N-1]:
+        return 0.0
+    if teststat <= starstat[N-1]:
+        tau_coef = _tau_smallps[regression][N-1]
+    else:
+        # Note: above is only for z stats
+        tau_coef = _tau_largeps[regression][N-1]
+    return norm.cdf(polyval(tau_coef[::-1], teststat))
+

+# These are the new estimates from MacKinnon 2010
+# the first axis is N - 1
+# the second axis is 1 %, 5 %, 10 %
+# the last axis is the coefficients

-tau_nc_2010 = [[[-2.56574, -2.2358, -3.627, 0], [-1.941, -0.2686, -3.365, 
-    31.223], [-1.61682, 0.2656, -2.714, 25.364]]]
+tau_nc_2010 = [[
+    [-2.56574, -2.2358, -3.627, 0],  # N = 1
+    [-1.94100, -0.2686, -3.365, 31.223],
+    [-1.61682, 0.2656, -2.714, 25.364]]]
 tau_nc_2010 = asarray(tau_nc_2010)
-tau_c_2010 = [[[-3.43035, -6.5393, -16.786, -79.433], [-2.86154, -2.8903, -
-    4.234, -40.04], [-2.56677, -1.5384, -2.809, 0]], [[-3.89644, -10.9519, 
-    -33.527, 0], [-3.33613, -6.1101, -6.823, 0], [-3.04445, -4.2412, -2.72,
-    0]], [[-4.29374, -14.4354, -33.195, 47.433], [-3.74066, -8.5632, -
-    10.852, 27.982], [-3.45218, -6.2143, -3.718, 0]], [[-4.64332, -18.1031,
-    -37.972, 0], [-4.096, -11.2349, -11.175, 0], [-3.8102, -8.3931, -4.137,
-    0]], [[-4.95756, -21.8883, -45.142, 0], [-4.41519, -14.0405, -12.575, 0
-    ], [-4.13157, -10.7417, -3.784, 0]], [[-5.24568, -25.6688, -57.737, 
-    88.639], [-4.70693, -16.9178, -17.492, 60.007], [-4.42501, -13.1875, -
-    5.104, 27.877]], [[-5.51233, -29.576, -69.398, 164.295], [-4.97684, -
-    19.9021, -22.045, 110.761], [-4.69648, -15.7315, -5.104, 27.877]], [[-
-    5.76202, -33.5258, -82.189, 256.289], [-5.22924, -23.0023, -24.646, 
-    144.479], [-4.95007, -18.3959, -7.344, 94.872]], [[-5.99742, -37.6572, 
-    -87.365, 248.316], [-5.46697, -26.2057, -26.627, 176.382], [-5.18897, -
-    21.1377, -9.484, 172.704]], [[-6.22103, -41.7154, -102.68, 389.33], [-
-    5.69244, -29.4521, -30.994, 251.016], [-5.41533, -24.0006, -7.514, 
-    163.049]], [[-6.43377, -46.0084, -106.809, 352.752], [-5.90714, -
-    32.8336, -30.275, 249.994], [-5.63086, -26.9693, -4.083, 151.427]], [[-
-    6.6379, -50.2095, -124.156, 579.622], [-6.11279, -36.2681, -32.505, 
-    314.802], [-5.83724, -29.9864, -2.686, 184.116]]]
+
+tau_c_2010 = [
+    [[-3.43035, -6.5393, -16.786, -79.433],  # N = 1, 1%
+     [-2.86154, -2.8903, -4.234, -40.040],   # 5 %
+     [-2.56677, -1.5384, -2.809, 0]],        # 10 %
+    [[-3.89644, -10.9519, -33.527, 0],       # N = 2
+     [-3.33613, -6.1101, -6.823, 0],
+     [-3.04445, -4.2412, -2.720, 0]],
+    [[-4.29374, -14.4354, -33.195, 47.433],  # N = 3
+     [-3.74066, -8.5632, -10.852, 27.982],
+     [-3.45218, -6.2143, -3.718, 0]],
+    [[-4.64332, -18.1031, -37.972, 0],       # N = 4
+     [-4.09600, -11.2349, -11.175, 0],
+     [-3.81020, -8.3931, -4.137, 0]],
+    [[-4.95756, -21.8883, -45.142, 0],       # N = 5
+     [-4.41519, -14.0405, -12.575, 0],
+     [-4.13157, -10.7417, -3.784, 0]],
+    [[-5.24568, -25.6688, -57.737, 88.639],  # N = 6
+     [-4.70693, -16.9178, -17.492, 60.007],
+     [-4.42501, -13.1875, -5.104, 27.877]],
+    [[-5.51233, -29.5760, -69.398, 164.295],  # N = 7
+     [-4.97684, -19.9021, -22.045, 110.761],
+     [-4.69648, -15.7315, -5.104, 27.877]],
+    [[-5.76202, -33.5258, -82.189, 256.289],  # N = 8
+     [-5.22924, -23.0023, -24.646, 144.479],
+     [-4.95007, -18.3959, -7.344, 94.872]],
+    [[-5.99742, -37.6572, -87.365, 248.316],  # N = 9
+     [-5.46697, -26.2057, -26.627, 176.382],
+     [-5.18897, -21.1377, -9.484, 172.704]],
+    [[-6.22103, -41.7154, -102.680, 389.33],  # N = 10
+     [-5.69244, -29.4521, -30.994, 251.016],
+     [-5.41533, -24.0006, -7.514, 163.049]],
+    [[-6.43377, -46.0084, -106.809, 352.752],  # N = 11
+     [-5.90714, -32.8336, -30.275, 249.994],
+     [-5.63086, -26.9693, -4.083, 151.427]],
+    [[-6.63790, -50.2095, -124.156, 579.622],  # N = 12
+     [-6.11279, -36.2681, -32.505, 314.802],
+     [-5.83724, -29.9864, -2.686, 184.116]]]
 tau_c_2010 = asarray(tau_c_2010)
-tau_ct_2010 = [[[-3.95877, -9.0531, -28.428, -134.155], [-3.41049, -4.3904,
-    -9.036, -45.374], [-3.12705, -2.5856, -3.925, -22.38]], [[-4.32762, -
-    15.4387, -35.679, 0], [-3.78057, -9.5106, -12.074, 0], [-3.49631, -
-    7.0815, -7.538, 21.892]], [[-4.66305, -18.7688, -49.793, 104.244], [-
-    4.1189, -11.8922, -19.031, 77.332], [-3.83511, -9.0723, -8.504, 35.403]
-    ], [[-4.9694, -22.4694, -52.599, 51.314], [-4.42871, -14.5876, -18.228,
-    39.647], [-4.14633, -11.25, -9.873, 54.109]], [[-5.25276, -26.2183, -
-    59.631, 50.646], [-4.71537, -17.3569, -22.66, 91.359], [-4.43422, -
-    13.6078, -10.238, 76.781]], [[-5.51727, -29.976, -75.222, 202.253], [-
-    4.98228, -20.305, -25.224, 132.03], [-4.70233, -16.1253, -9.836, 94.272
-    ]], [[-5.76537, -33.9165, -84.312, 245.394], [-5.23299, -23.3328, -
-    28.955, 182.342], [-4.95405, -18.7352, -10.168, 120.575]], [[-6.00003, 
-    -37.8892, -96.428, 335.92], [-5.46971, -26.4771, -31.034, 220.165], [-
-    5.19183, -21.4328, -10.726, 157.955]], [[-6.22288, -41.9496, -109.881, 
-    466.068], [-5.69447, -29.7152, -33.784, 273.002], [-5.41738, -24.2882, 
-    -8.584, 169.891]], [[-6.43551, -46.1151, -120.814, 566.823], [-5.90887,
-    -33.0251, -37.208, 346.189], [-5.63255, -27.2042, -6.792, 177.666]], [[
-    -6.63894, -50.4287, -128.997, 642.781], [-6.11404, -36.461, -36.246, 
-    348.554], [-5.8385, -30.1995, -5.163, 210.338]], [[-6.83488, -54.7119, 
-    -139.8, 736.376], [-6.31127, -39.9676, -37.021, 406.051], [-6.0365, -
-    33.2381, -6.606, 317.776]]]
+
+tau_ct_2010 = [
+    [[-3.95877, -9.0531, -28.428, -134.155],   # N = 1
+     [-3.41049, -4.3904, -9.036, -45.374],
+     [-3.12705, -2.5856, -3.925, -22.380]],
+    [[-4.32762, -15.4387, -35.679, 0],         # N = 2
+     [-3.78057, -9.5106, -12.074, 0],
+     [-3.49631, -7.0815, -7.538, 21.892]],
+    [[-4.66305, -18.7688, -49.793, 104.244],   # N = 3
+     [-4.11890, -11.8922, -19.031, 77.332],
+     [-3.83511, -9.0723, -8.504, 35.403]],
+    [[-4.96940, -22.4694, -52.599, 51.314],    # N = 4
+     [-4.42871, -14.5876, -18.228, 39.647],
+     [-4.14633, -11.2500, -9.873, 54.109]],
+    [[-5.25276, -26.2183, -59.631, 50.646],    # N = 5
+     [-4.71537, -17.3569, -22.660, 91.359],
+     [-4.43422, -13.6078, -10.238, 76.781]],
+    [[-5.51727, -29.9760, -75.222, 202.253],   # N = 6
+     [-4.98228, -20.3050, -25.224, 132.03],
+     [-4.70233, -16.1253, -9.836, 94.272]],
+    [[-5.76537, -33.9165, -84.312, 245.394],   # N = 7
+     [-5.23299, -23.3328, -28.955, 182.342],
+     [-4.95405, -18.7352, -10.168, 120.575]],
+    [[-6.00003, -37.8892, -96.428, 335.92],    # N = 8
+     [-5.46971, -26.4771, -31.034, 220.165],
+     [-5.19183, -21.4328, -10.726, 157.955]],
+    [[-6.22288, -41.9496, -109.881, 466.068],  # N = 9
+     [-5.69447, -29.7152, -33.784, 273.002],
+     [-5.41738, -24.2882, -8.584, 169.891]],
+    [[-6.43551, -46.1151, -120.814, 566.823],  # N = 10
+     [-5.90887, -33.0251, -37.208, 346.189],
+     [-5.63255, -27.2042, -6.792, 177.666]],
+    [[-6.63894, -50.4287, -128.997, 642.781],  # N = 11
+     [-6.11404, -36.4610, -36.246, 348.554],
+     [-5.83850, -30.1995, -5.163, 210.338]],
+    [[-6.83488, -54.7119, -139.800, 736.376],  # N = 12
+     [-6.31127, -39.9676, -37.021, 406.051],
+     [-6.03650, -33.2381, -6.606, 317.776]]]
 tau_ct_2010 = asarray(tau_ct_2010)
-tau_ctt_2010 = [[[-4.37113, -11.5882, -35.819, -334.047], [-3.83239, -
-    5.9057, -12.49, -118.284], [-3.55326, -3.6596, -5.293, -63.559]], [[-
-    4.69276, -20.2284, -64.919, 88.884], [-4.15387, -13.3114, -28.402, 
-    72.741], [-3.87346, -10.4637, -17.408, 66.313]], [[-4.99071, -23.5873, 
-    -76.924, 184.782], [-4.45311, -15.7732, -32.316, 122.705], [-4.1728, -
-    12.4909, -17.912, 83.285]], [[-5.2678, -27.2836, -78.971, 137.871], [-
-    4.73244, -18.4833, -31.875, 111.817], [-4.45268, -14.7199, -17.969, 
-    101.92]], [[-5.52826, -30.9051, -92.49, 248.096], [-4.99491, -21.236, -
-    37.685, 194.208], [-4.71587, -17.082, -18.631, 136.672]], [[-5.77379, -
-    34.701, -105.937, 393.991], [-5.24217, -24.2177, -39.153, 232.528], [-
-    4.96397, -19.6064, -18.858, 174.919]], [[-6.00609, -38.7383, -108.605, 
-    365.208], [-5.47664, -27.3005, -39.498, 246.918], [-5.19921, -22.2617, 
-    -17.91, 208.494]], [[-6.22758, -42.7154, -119.622, 421.395], [-5.69983,
-    -30.4365, -44.3, 345.48], [-5.4232, -24.9686, -19.688, 274.462]], [[-
-    6.43933, -46.7581, -136.691, 651.38], [-5.91298, -33.7584, -42.686, 
-    346.629], [-5.63704, -27.8965, -13.88, 236.975]], [[-6.64235, -50.9783,
-    -145.462, 752.228], [-6.11753, -37.056, -48.719, 473.905], [-5.84215, -
-    30.8119, -14.938, 316.006]], [[-6.83743, -55.2861, -152.651, 792.577],
-    [-6.31396, -40.5507, -46.771, 487.185], [-6.03921, -33.895, -9.122, 
-    285.164]], [[-7.02582, -59.6037, -166.368, 989.879], [-6.50353, -
-    44.0797, -47.242, 543.889], [-6.22941, -36.9673, -10.868, 418.414]]]
+
+tau_ctt_2010 = [
+    [[-4.37113, -11.5882, -35.819, -334.047],  # N = 1
+     [-3.83239, -5.9057, -12.490, -118.284],
+     [-3.55326, -3.6596, -5.293, -63.559]],
+    [[-4.69276, -20.2284, -64.919, 88.884],    # N = 2
+     [-4.15387, -13.3114, -28.402, 72.741],
+     [-3.87346, -10.4637, -17.408, 66.313]],
+    [[-4.99071, -23.5873, -76.924, 184.782],   # N = 3
+     [-4.45311, -15.7732, -32.316, 122.705],
+     [-4.17280, -12.4909, -17.912, 83.285]],
+    [[-5.26780, -27.2836, -78.971, 137.871],   # N = 4
+     [-4.73244, -18.4833, -31.875, 111.817],
+     [-4.45268, -14.7199, -17.969, 101.92]],
+    [[-5.52826, -30.9051, -92.490, 248.096],   # N = 5
+     [-4.99491, -21.2360, -37.685, 194.208],
+     [-4.71587, -17.0820, -18.631, 136.672]],
+    [[-5.77379, -34.7010, -105.937, 393.991],  # N = 6
+     [-5.24217, -24.2177, -39.153, 232.528],
+     [-4.96397, -19.6064, -18.858, 174.919]],
+    [[-6.00609, -38.7383, -108.605, 365.208],  # N = 7
+     [-5.47664, -27.3005, -39.498, 246.918],
+     [-5.19921, -22.2617, -17.910, 208.494]],
+    [[-6.22758, -42.7154, -119.622, 421.395],  # N = 8
+     [-5.69983, -30.4365, -44.300, 345.48],
+     [-5.42320, -24.9686, -19.688, 274.462]],
+    [[-6.43933, -46.7581, -136.691, 651.38],   # N = 9
+     [-5.91298, -33.7584, -42.686, 346.629],
+     [-5.63704, -27.8965, -13.880, 236.975]],
+    [[-6.64235, -50.9783, -145.462, 752.228],  # N = 10
+     [-6.11753, -37.056, -48.719, 473.905],
+     [-5.84215, -30.8119, -14.938, 316.006]],
+    [[-6.83743, -55.2861, -152.651, 792.577],  # N = 11
+     [-6.31396, -40.5507, -46.771, 487.185],
+     [-6.03921, -33.8950, -9.122, 285.164]],
+    [[-7.02582, -59.6037, -166.368, 989.879],  # N = 12
+     [-6.50353, -44.0797, -47.242, 543.889],
+     [-6.22941, -36.9673, -10.868, 418.414]]]
 tau_ctt_2010 = asarray(tau_ctt_2010)
-tau_2010s = {'n': tau_nc_2010, 'c': tau_c_2010, 'ct': tau_ct_2010, 'ctt':
-    tau_ctt_2010}
+
+tau_2010s = {
+    "n": tau_nc_2010,
+    "c": tau_c_2010,
+    "ct": tau_ct_2010,
+    "ctt": tau_ctt_2010,
+}


-def mackinnoncrit(N=1, regression='c', nobs=inf):
+def mackinnoncrit(N=1, regression="c", nobs=inf):
     """
     Returns the critical values for cointegrating and the ADF test.

@@ -246,4 +439,12 @@ def mackinnoncrit(N=1, regression='c', nobs=inf):
         Queen's University, Dept of Economics Working Papers 1227.
         http://ideas.repec.org/p/qed/wpaper/1227.html
     """
-    pass
+    reg = regression
+    if reg not in ['c', 'ct', 'n', 'ctt']:
+        raise ValueError("regression keyword %s not understood" % reg)
+    tau = tau_2010s[reg]
+    if nobs is inf:
+        return tau[N-1, :, 0]
+    else:
+        val = tau[N-1, :, ::-1]
+        return polyval(val.T, 1./nobs)
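
For reference, the two helpers above can be exercised directly as follows;
they are internal building blocks used by adfuller and coint in
statsmodels.tsa.stattools, and the tau value below is made up for the sketch.

from statsmodels.tsa.adfvalues import mackinnoncrit, mackinnonp

# asymptotic 1%/5%/10% critical values, ADF regression with constant only
print(mackinnoncrit(N=1, regression="c"))
# finite-sample adjusted critical values for nobs = 250
print(mackinnoncrit(N=1, regression="c", nobs=250))
# approximate p-value for an observed tau statistic of -3.2
print(mackinnonp(-3.2, regression="c", N=1))
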
diff --git a/statsmodels/tsa/api.py b/statsmodels/tsa/api.py
index c6e419de4..a802379c6 100644
--- a/statsmodels/tsa/api.py
+++ b/statsmodels/tsa/api.py
@@ -1,16 +1,67 @@
-__all__ = ['AR', 'ARDL', 'ARIMA', 'ArmaProcess', 'AutoReg', 'DynamicFactor',
-    'DynamicFactorMQ', 'ETSModel', 'ExponentialSmoothing', 'Holt',
-    'MarkovAutoregression', 'MarkovRegression', 'SARIMAX', 'STL',
-    'STLForecast', 'SVAR', 'SimpleExpSmoothing', 'UECM',
-    'UnobservedComponents', 'VAR', 'VARMAX', 'VECM', 'acf', 'acovf',
-    'add_lag', 'add_trend', 'adfuller', 'range_unit_root_test', 'arima',
-    'arma_generate_sample', 'arma_order_select_ic', 'ardl_select_order',
-    'bds', 'bk_filter', 'breakvar_heteroskedasticity_test', 'ccf', 'ccovf',
-    'cf_filter', 'coint', 'datetools', 'detrend', 'filters', 'graphics',
-    'hp_filter', 'innovations', 'interp', 'kpss', 'lagmat', 'lagmat2ds',
-    'pacf', 'pacf_ols', 'pacf_yw', 'q_stat', 'seasonal_decompose',
-    'statespace', 'stattools', 'tsatools', 'var', 'x13_arima_analysis',
-    'x13_arima_select_order', 'zivot_andrews']
+__all__ = [
+    "AR",
+    "ARDL",
+    "ARIMA",
+    "ArmaProcess",
+    "AutoReg",
+    "DynamicFactor",
+    "DynamicFactorMQ",
+    "ETSModel",
+    "ExponentialSmoothing",
+    "Holt",
+    "MarkovAutoregression",
+    "MarkovRegression",
+    "SARIMAX",
+    "STL",
+    "STLForecast",
+    "SVAR",
+    "SimpleExpSmoothing",
+    "UECM",
+    "UnobservedComponents",
+    "VAR",
+    "VARMAX",
+    "VECM",
+    "acf",
+    "acovf",
+    "add_lag",
+    "add_trend",
+    "adfuller",
+    "range_unit_root_test",
+    "arima",
+    "arma_generate_sample",
+    "arma_order_select_ic",
+    "ardl_select_order",
+    "bds",
+    "bk_filter",
+    "breakvar_heteroskedasticity_test",
+    "ccf",
+    "ccovf",
+    "cf_filter",
+    "coint",
+    "datetools",
+    "detrend",
+    "filters",
+    "graphics",
+    "hp_filter",
+    "innovations",
+    "interp",
+    "kpss",
+    "lagmat",
+    "lagmat2ds",
+    "pacf",
+    "pacf_ols",
+    "pacf_yw",
+    "q_stat",
+    "seasonal_decompose",
+    "statespace",
+    "stattools",
+    "tsatools",
+    "var",
+    "x13_arima_analysis",
+    "x13_arima_select_order",
+    "zivot_andrews"
+]
+
 from . import interp, stattools, tsatools, vector_ar as var
 from ..graphics import tsaplots as graphics
 from .ar_model import AR, AutoReg
@@ -33,7 +84,24 @@ from .statespace.dynamic_factor_mq import DynamicFactorMQ
 from .statespace.sarimax import SARIMAX
 from .statespace.structural import UnobservedComponents
 from .statespace.varmax import VARMAX
-from .stattools import acf, acovf, adfuller, arma_order_select_ic, bds, breakvar_heteroskedasticity_test, ccf, ccovf, coint, kpss, pacf, pacf_ols, pacf_yw, q_stat, range_unit_root_test, zivot_andrews
+from .stattools import (
+    acf,
+    acovf,
+    adfuller,
+    arma_order_select_ic,
+    bds,
+    breakvar_heteroskedasticity_test,
+    ccf,
+    ccovf,
+    coint,
+    kpss,
+    pacf,
+    pacf_ols,
+    pacf_yw,
+    q_stat,
+    range_unit_root_test,
+    zivot_andrews
+)
 from .tsatools import add_lag, add_trend, detrend, lagmat, lagmat2ds
 from .vector_ar.svar_model import SVAR
 from .vector_ar.var_model import VAR
diff --git a/statsmodels/tsa/ar_model.py b/statsmodels/tsa/ar_model.py
index 1e669544e..8e167286c 100644
--- a/statsmodels/tsa/ar_model.py
+++ b/statsmodels/tsa/ar_model.py
@@ -1,14 +1,24 @@
+# -*- coding: utf-8 -*-
 from __future__ import annotations
-from statsmodels.compat.pandas import Appender, Substitution, call_cached_func, to_numpy
+
+from statsmodels.compat.pandas import (
+    Appender,
+    Substitution,
+    call_cached_func,
+    to_numpy,
+)
+
 from collections.abc import Iterable
 import datetime
 import datetime as dt
 from types import SimpleNamespace
 from typing import Any, Literal, Sequence, cast
 import warnings
+
 import numpy as np
 import pandas as pd
 from scipy.stats import gaussian_kde, norm
+
 import statsmodels.base.wrapper as wrap
 from statsmodels.iolib.summary import Summary
 from statsmodels.regression.linear_model import OLS
@@ -16,14 +26,32 @@ from statsmodels.tools import eval_measures
 from statsmodels.tools.decorators import cache_readonly, cache_writable
 from statsmodels.tools.docstring import Docstring, remove_parameters
 from statsmodels.tools.sm_exceptions import SpecificationWarning
-from statsmodels.tools.typing import ArrayLike, ArrayLike1D, ArrayLike2D, Float64Array, NDArray
-from statsmodels.tools.validation import array_like, bool_like, int_like, string_like
+from statsmodels.tools.typing import (
+    ArrayLike,
+    ArrayLike1D,
+    ArrayLike2D,
+    Float64Array,
+    NDArray,
+)
+from statsmodels.tools.validation import (
+    array_like,
+    bool_like,
+    int_like,
+    string_like,
+)
 from statsmodels.tsa.arima_process import arma2ma
 from statsmodels.tsa.base import tsa_model
 from statsmodels.tsa.base.prediction import PredictionResults
-from statsmodels.tsa.deterministic import DeterministicProcess, DeterministicTerm, Seasonality, TimeTrend
+from statsmodels.tsa.deterministic import (
+    DeterministicProcess,
+    DeterministicTerm,
+    Seasonality,
+    TimeTrend,
+)
 from statsmodels.tsa.tsatools import freq_to_period, lagmat
-__all__ = ['AR', 'AutoReg']
+
+__all__ = ["AR", "AutoReg"]
+
 AR_DEPRECATION_WARN = """
 statsmodels.tsa.AR has been deprecated in favor of statsmodels.tsa.AutoReg and
 statsmodels.tsa.SARIMAX.
@@ -43,6 +71,7 @@ To silence this warning and continue using AR until it is removed, use:
 import warnings
 warnings.filterwarnings('ignore', 'statsmodels.tsa.ar_model.AR', FutureWarning)
 """
+
 REPEATED_FIT_ERROR = """
 Model has been fit using maxlag={0}, method={1}, ic={2}, trend={3}. These
 cannot be changed in subsequent calls to `fit`. Instead, use a new instance of
@@ -50,14 +79,20 @@ AR.
 """


-def sumofsq(x: np.ndarray, axis: int=0) ->(float | np.ndarray):
+def sumofsq(x: np.ndarray, axis: int = 0) -> float | np.ndarray:
     """Helper function to calculate sum of squares along first axis"""
-    pass
+    return np.sum(x**2, axis=axis)


-def _get_period(data: (pd.DatetimeIndex | pd.PeriodIndex), index_freq) ->int:
+def _get_period(data: pd.DatetimeIndex | pd.PeriodIndex, index_freq) -> int:
     """Shared helper to get period from frequenc or raise"""
-    pass
+    if data.freq:
+        return freq_to_period(index_freq)
+    raise ValueError(
+        "freq cannot be inferred from endog and model includes seasonal "
+        "terms.  The number of periods must be explicitly set when the "
+        "endog's index does not contain a frequency."
+    )


 class AutoReg(tsa_model.TimeSeriesModel):
@@ -155,102 +190,227 @@ class AutoReg(tsa_model.TimeSeriesModel):
     >>> print(out.format(res.aic, res.hqic, res.bic))
     AIC: 5.884, HQIC: 5.959, BIC: 6.071
     """
+
     _y: Float64Array

-    def __init__(self, endog: ArrayLike1D, lags: (int | Sequence[int] |
-        None), trend: Literal['n', 'c', 't', 'ct']='c', seasonal: bool=
-        False, exog: (ArrayLike2D | None)=None, hold_back: (int | None)=
-        None, period: (int | None)=None, missing: str='none', *,
-        deterministic: (DeterministicProcess | None)=None, old_names: bool=
-        False):
+    def __init__(
+        self,
+        endog: ArrayLike1D,
+        lags: int | Sequence[int] | None,
+        trend: Literal["n", "c", "t", "ct"] = "c",
+        seasonal: bool = False,
+        exog: ArrayLike2D | None = None,
+        hold_back: int | None = None,
+        period: int | None = None,
+        missing: str = "none",
+        *,
+        deterministic: DeterministicProcess | None = None,
+        old_names: bool = False,
+    ):
         super().__init__(endog, exog, None, None, missing=missing)
-        self._trend = cast(Literal['n', 'c', 't', 'ct'], string_like(trend,
-            'trend', options=('n', 'c', 't', 'ct'), optional=False))
-        self._seasonal = bool_like(seasonal, 'seasonal')
-        self._period = int_like(period, 'period', optional=True)
+        self._trend = cast(
+            Literal["n", "c", "t", "ct"],
+            string_like(
+                trend, "trend", options=("n", "c", "t", "ct"), optional=False
+            ),
+        )
+        self._seasonal = bool_like(seasonal, "seasonal")
+        self._period = int_like(period, "period", optional=True)
         if self._period is None and self._seasonal:
             self._period = _get_period(self.data, self._index_freq)
         terms: list[DeterministicTerm] = [TimeTrend.from_string(self._trend)]
         if seasonal:
             assert isinstance(self._period, int)
             terms.append(Seasonality(self._period))
-        if hasattr(self.data.orig_endog, 'index'):
+        if hasattr(self.data.orig_endog, "index"):
             index = self.data.orig_endog.index
         else:
             index = np.arange(self.data.endog.shape[0])
         self._user_deterministic = False
         if deterministic is not None:
             if not isinstance(deterministic, DeterministicProcess):
-                raise TypeError('deterministic must be a DeterministicProcess')
+                raise TypeError("deterministic must be a DeterministicProcess")
             self._deterministics = deterministic
             self._user_deterministic = True
         else:
-            self._deterministics = DeterministicProcess(index,
-                additional_terms=terms)
+            self._deterministics = DeterministicProcess(
+                index, additional_terms=terms
+            )
         self._exog_names: list[str] = []
         self._k_ar = 0
-        self._old_names = bool_like(old_names, 'old_names', optional=False)
-        if deterministic is not None and (self._trend != 'n' or self._seasonal
-            ):
+        self._old_names = bool_like(old_names, "old_names", optional=False)
+        if deterministic is not None and (
+            self._trend != "n" or self._seasonal
+        ):
             warnings.warn(
-                'When using deterministic, trend must be "n" and seasonal must be False.'
-                , SpecificationWarning, stacklevel=2)
+                'When using deterministic, trend must be "n" and '
+                "seasonal must be False.",
+                SpecificationWarning,
+                stacklevel=2,
+            )
         if self._old_names:
             warnings.warn(
-                'old_names will be removed after the 0.14 release. You should stop setting this parameter and use the new names.'
-                , FutureWarning, stacklevel=2)
-        self._lags, self._hold_back = self._check_lags(lags, int_like(
-            hold_back, 'hold_back', optional=True))
+                "old_names will be removed after the 0.14 release. You should "
+                "stop setting this parameter and use the new names.",
+                FutureWarning,
+                stacklevel=2,
+            )
+        self._lags, self._hold_back = self._check_lags(
+            lags, int_like(hold_back, "hold_back", optional=True)
+        )
         self._setup_regressors()
         self.nobs = self._y.shape[0]
         self.data.xnames = self.exog_names

     @property
-    def ar_lags(self) ->(list[int] | None):
+    def ar_lags(self) -> list[int] | None:
         """The autoregressive lags included in the model"""
-        pass
+        lags = list(self._lags)
+        return None if not lags else lags

     @property
-    def hold_back(self) ->(int | None):
+    def hold_back(self) -> int | None:
         """The number of initial obs. excluded from the estimation sample."""
-        pass
+        return self._hold_back

     @property
-    def trend(self) ->Literal['n', 'c', 'ct', 'ctt']:
+    def trend(self) -> Literal["n", "c", "t", "ct"]:
         """The trend used in the model."""
-        pass
+        return self._trend

     @property
-    def seasonal(self) ->bool:
+    def seasonal(self) -> bool:
         """Flag indicating that the model contains a seasonal component."""
-        pass
+        return self._seasonal

     @property
-    def deterministic(self) ->(DeterministicProcess | None):
+    def deterministic(self) -> DeterministicProcess | None:
         """The deterministic used to construct the model"""
-        pass
+        return self._deterministics if self._user_deterministic else None

     @property
-    def period(self) ->(int | None):
+    def period(self) -> int | None:
         """The period of the seasonal component."""
-        pass
+        return self._period

     @property
-    def df_model(self) ->int:
+    def df_model(self) -> int:
         """The model degrees of freedom."""
-        pass
+        return self._x.shape[1]

     @property
-    def exog_names(self) ->(list[str] | None):
+    def exog_names(self) -> list[str] | None:
         """Names of exogenous variables included in model"""
-        pass
+        return self._exog_names

-    def initialize(self) ->None:
+    def initialize(self) -> None:
         """Initialize the model (no-op)."""
         pass

-    def fit(self, cov_type: str='nonrobust', cov_kwds: (dict[str, Any] |
-        None)=None, use_t: bool=False) ->AutoRegResultsWrapper:
+    def _check_lags(
+        self, lags: int | Sequence[int] | None, hold_back: int | None
+    ) -> tuple[list[int], int]:
+        if lags is None:
+            _lags: list[int] = []
+            self._maxlag = 0
+        elif isinstance(lags, Iterable):
+            _lags = []
+            for lag in lags:
+                val = int_like(lag, "lags")
+                assert isinstance(val, int)
+                _lags.append(val)
+            _lags_arr: NDArray = np.array(sorted(_lags))
+            if (
+                np.any(_lags_arr < 1)
+                or np.unique(_lags_arr).shape[0] != _lags_arr.shape[0]
+            ):
+                raise ValueError(
+                    "All values in lags must be positive and distinct."
+                )
+            self._maxlag = np.max(_lags_arr)
+            _lags = [int(v) for v in _lags_arr]
+        else:
+            val = int_like(lags, "lags")
+            assert isinstance(val, int)
+            self._maxlag = val
+            if self._maxlag < 0:
+                raise ValueError("lags must be a non-negative scalar.")
+            _lags_arr = np.arange(1, self._maxlag + 1)
+            _lags = [int(v) for v in _lags_arr]
+
+        if hold_back is None:
+            hold_back = self._maxlag
+        if hold_back < self._maxlag:
+            raise ValueError(
+                "hold_back must be >= lags if lags is an int or"
+                "max(lags) if lags is array_like."
+            )
+        return _lags, int(hold_back)
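# Usage sketch (illustration only, assuming a toy random series): how the
# ``lags`` argument is normalised by ``_check_lags``.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

y = np.random.default_rng(0).standard_normal(100)
# An integer p selects every lag 1..p.
print(AutoReg(y, lags=3).ar_lags)        # [1, 2, 3]
# A sequence selects exactly those lags; hold_back defaults to max(lags).
mod = AutoReg(y, lags=[1, 3])
print(mod.ar_lags, mod.hold_back)        # [1, 3] 3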
+
+    def _setup_regressors(self) -> None:
+        maxlag = self._maxlag
+        hold_back = self._hold_back
+        exog_names = []
+        endog_names = self.endog_names
+        x, y = lagmat(self.endog, maxlag, original="sep")
+        exog_names.extend(
+            [endog_names + ".L{0}".format(lag) for lag in self._lags]
+        )
+        if len(self._lags) < maxlag:
+            x = x[:, np.asarray(self._lags) - 1]
+        self._k_ar = x.shape[1]
+        deterministic = self._deterministics.in_sample()
+        if deterministic.shape[1]:
+            x = np.c_[to_numpy(deterministic), x]
+            if self._old_names:
+                deterministic_names = []
+                if "c" in self._trend:
+                    deterministic_names.append("intercept")
+                if "t" in self._trend:
+                    deterministic_names.append("trend")
+                if self._seasonal:
+                    period = self._period
+                    assert isinstance(period, int)
+                    names = ["seasonal.{0}".format(i) for i in range(period)]
+                    if "c" in self._trend:
+                        names = names[1:]
+                    deterministic_names.extend(names)
+            else:
+                deterministic_names = list(deterministic.columns)
+            exog_names = deterministic_names + exog_names
+        if self.exog is not None:
+            x = np.c_[x, self.exog]
+            exog_names.extend(self.data.param_names)
+        y = y[hold_back:]
+        x = x[hold_back:]
+        if y.shape[0] < x.shape[1]:
+            reg = x.shape[1]
+            period = self._period
+            trend = 0 if self._trend == "n" else len(self._trend)
+            if self._seasonal:
+                assert isinstance(period, int)
+                seas = period - int("c" in self._trend)
+            else:
+                seas = 0
+            lags = len(self._lags)
+            nobs = y.shape[0]
+            raise ValueError(
+                "The model specification cannot be estimated. "
+                f"The model contains {reg} regressors ({trend} trend, "
+                f"{seas} seasonal, {lags} lags) but after adjustment "
+                "for hold_back and creation of the lags, there "
+                f"are only {nobs} data points available to estimate "
+                "parameters."
+            )
+        self._y, self._x = y, x
+        self._exog_names = exog_names
+
+    def fit(
+        self,
+        cov_type: str = "nonrobust",
+        cov_kwds: dict[str, Any] | None = None,
+        use_t: bool = False,
+    ) -> AutoRegResultsWrapper:
         """
         Estimate the model parameters.

@@ -305,9 +465,39 @@ class AutoReg(tsa_model.TimeSeriesModel):
         Use ``OLS`` to estimate model parameters and to estimate parameter
         covariance.
         """
-        pass
+        # TODO: Determine correction for degree-of-freedom
+        # Special case parameterless model
+        if self._x.shape[1] == 0:
+            return AutoRegResultsWrapper(
+                AutoRegResults(self, np.empty(0), np.empty((0, 0)))
+            )

-    def loglike(self, params: ArrayLike) ->float:
+        ols_mod = OLS(self._y, self._x)
+        ols_res = ols_mod.fit(
+            cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t
+        )
+        cov_params = ols_res.cov_params()
+        use_t = ols_res.use_t
+        if cov_type == "nonrobust" and not use_t:
+            nobs = self._y.shape[0]
+            k = self._x.shape[1]
+            scale = nobs / (nobs - k)
+            cov_params /= scale
+        res = AutoRegResults(
+            self,
+            ols_res.params,
+            cov_params,
+            ols_res.normalized_cov_params,
+            use_t=use_t,
+        )
+
+        return AutoRegResultsWrapper(res)
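# Sketch of the covariance rescaling above (illustration only, toy data):
# for cov_type="nonrobust" with use_t=False the reported covariance equals
# the MLE error variance ssr/nobs times inv(X'X), i.e. the OLS covariance
# scaled by (nobs - k)/nobs.  ``_x`` is the internal design matrix.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

y = np.random.default_rng(1).standard_normal(250)
res = AutoReg(y, lags=2).fit()           # nonrobust, use_t=False
x = res.model._x
sigma2_mle = res.resid @ res.resid / res.nobs
manual = sigma2_mle * np.linalg.inv(x.T @ x)
assert np.allclose(manual, res.cov_params())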
+
+    def _resid(self, params: ArrayLike) -> np.ndarray:
+        params = array_like(params, "params", ndim=2)
+        return self._y.squeeze() - (self._x @ params).squeeze()
+
+    def loglike(self, params: ArrayLike) -> float:
         """
         Log-likelihood of model.

@@ -321,9 +511,13 @@ class AutoReg(tsa_model.TimeSeriesModel):
         float
             The log-likelihood value.
         """
-        pass
+        nobs = self.nobs
+        resid = self._resid(params)
+        ssr = resid @ resid
+        llf = -(nobs / 2) * (np.log(2 * np.pi) + np.log(ssr / nobs) + 1)
+        return llf
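# Numerical check of the concentrated Gaussian log-likelihood used above
# (illustration only): with sigma2 = ssr/nobs plugged in, the llf equals the
# sum of N(0, sigma2) log-densities of the residuals.
import numpy as np
from scipy import stats

resid = np.random.default_rng(2).standard_normal(50)
nobs = resid.shape[0]
sigma2 = resid @ resid / nobs
llf = -(nobs / 2) * (np.log(2 * np.pi) + np.log(sigma2) + 1)
assert np.isclose(llf, stats.norm.logpdf(resid, scale=np.sqrt(sigma2)).sum())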

-    def score(self, params: ArrayLike) ->np.ndarray:
+    def score(self, params: ArrayLike) -> np.ndarray:
         """
         Score vector of model.

@@ -339,9 +533,10 @@ class AutoReg(tsa_model.TimeSeriesModel):
         ndarray
             The score vector evaluated at the parameters.
         """
-        pass
+        resid = self._resid(params)
+        return self._x.T @ resid

-    def information(self, params: ArrayLike) ->np.ndarray:
+    def information(self, params: ArrayLike) -> np.ndarray:
         """
         Fisher information matrix of model.

@@ -357,9 +552,11 @@ class AutoReg(tsa_model.TimeSeriesModel):
         ndarray
             The information matrix.
         """
-        pass
+        resid = self._resid(params)
+        sigma2 = resid @ resid / self.nobs
+        return (self._x.T @ self._x) * (1 / sigma2)

-    def hessian(self, params: ArrayLike) ->np.ndarray:
+    def hessian(self, params: ArrayLike) -> np.ndarray:
         """
         The Hessian matrix of the model.

@@ -373,11 +570,53 @@ class AutoReg(tsa_model.TimeSeriesModel):
         ndarray
             The hessian evaluated at the parameters.
         """
-        pass
-
-    def _dynamic_predict(self, params: ArrayLike, start: int, end: int,
-        dynamic: int, num_oos: int, exog: (Float64Array | None), exog_oos:
-        (Float64Array | None)) ->pd.Series:
+        return -self.information(params)
+
+    def _setup_oos_forecast(
+        self, add_forecasts: int, exog_oos: ArrayLike2D
+    ) -> np.ndarray:
+        x = np.zeros((add_forecasts, self._x.shape[1]))
+        oos_exog = self._deterministics.out_of_sample(steps=add_forecasts)
+        n_deterministic = oos_exog.shape[1]
+        x[:, :n_deterministic] = to_numpy(oos_exog)
+        # skip the AR columns
+        loc = n_deterministic + len(self._lags)
+        if self.exog is not None:
+            exog_oos_a = np.asarray(exog_oos)
+            x[:, loc:] = exog_oos_a[:add_forecasts]
+        return x
+
+    def _wrap_prediction(
+        self, prediction: np.ndarray, start: int, end: int, pad: int
+    ) -> pd.Series:
+        prediction = np.hstack([np.full(pad, np.nan), prediction])
+        n_values = end - start + pad
+        if not isinstance(self.data.orig_endog, (pd.Series, pd.DataFrame)):
+            return prediction[-n_values:]
+        index = self._index
+        if end > self.endog.shape[0]:
+            freq = getattr(index, "freq", None)
+            if freq:
+                if isinstance(index, pd.PeriodIndex):
+                    index = pd.period_range(index[0], freq=freq, periods=end)
+                else:
+                    index = pd.date_range(index[0], freq=freq, periods=end)
+            else:
+                index = pd.RangeIndex(end)
+        index = index[start - pad : end]
+        prediction = prediction[-n_values:]
+        return pd.Series(prediction, index=index)
+
+    def _dynamic_predict(
+        self,
+        params: ArrayLike,
+        start: int,
+        end: int,
+        dynamic: int,
+        num_oos: int,
+        exog: Float64Array | None,
+        exog_oos: Float64Array | None,
+    ) -> pd.Series:
         """

         :param params:
@@ -389,11 +628,73 @@ class AutoReg(tsa_model.TimeSeriesModel):
         :param exog_oos:
         :return:
         """
-        pass
-
-    def _static_predict(self, params: Float64Array, start: int, end: int,
-        num_oos: int, exog: (Float64Array | None), exog_oos: (Float64Array |
-        None)) ->pd.Series:
+        reg = []
+        hold_back = self._hold_back
+        adj = 0
+        if start < hold_back:
+            # Adjust start and dynamic
+            adj = hold_back - start
+        start += adj
+        # New offset shifts, but must remain non-negative
+        dynamic = max(dynamic - adj, 0)
+
+        if (start - hold_back) <= self.nobs:
+            # _x is missing hold_back observations, which is why
+            # it is shifted by this amount
+            is_loc = slice(start - hold_back, end + 1 - hold_back)
+            x = self._x[is_loc]
+            if exog is not None:
+                x = x.copy()
+                # Replace final columns
+                x[:, -exog.shape[1] :] = exog[start : end + 1]
+            reg.append(x)
+        if num_oos > 0:
+            reg.append(self._setup_oos_forecast(num_oos, exog_oos))
+        _reg = np.vstack(reg)
+        det_col_idx = self._x.shape[1] - len(self._lags)
+        det_col_idx -= 0 if self.exog is None else self.exog.shape[1]
+        # Simple 1-step static forecasts for dynamic observations
+        forecasts = np.empty(_reg.shape[0])
+        forecasts[:dynamic] = _reg[:dynamic] @ params
+        for h in range(dynamic, _reg.shape[0]):
+            # Fill in regressor matrix
+            for j, lag in enumerate(self._lags):
+                fcast_loc = h - lag
+                if fcast_loc >= dynamic:
+                    val = forecasts[fcast_loc]
+                else:
+                    # If before the start of the forecasts, use actual values
+                    val = self.endog[fcast_loc + start]
+                _reg[h, det_col_idx + j] = val
+            forecasts[h] = np.squeeze(_reg[h : h + 1] @ params)
+        return self._wrap_prediction(forecasts, start, end + 1 + num_oos, adj)
+
+    def _static_oos_predict(
+        self, params: ArrayLike, num_oos: int, exog_oos: ArrayLike2D
+    ) -> np.ndarray:
+        new_x = self._setup_oos_forecast(num_oos, exog_oos)
+        if self._maxlag == 0:
+            return new_x @ params
+        forecasts = np.empty(num_oos)
+        nexog = 0 if self.exog is None else self.exog.shape[1]
+        ar_offset = self._x.shape[1] - nexog - len(self._lags)
+        for i in range(num_oos):
+            for j, lag in enumerate(self._lags):
+                loc = i - lag
+                val = self._y[loc] if loc < 0 else forecasts[loc]
+                new_x[i, ar_offset + j] = np.squeeze(val)
+            forecasts[i] = np.squeeze(new_x[i : i + 1] @ params)
+        return forecasts
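# Sketch of the out-of-sample recursion above (illustration only, toy data):
# each h-step forecast of an AR(2) with constant feeds back into the lag
# slots used by the next step, matching AutoRegResults.forecast.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

y = np.random.default_rng(3).standard_normal(200)
res = AutoReg(y, lags=2).fit()
const, a1, a2 = res.params

manual, hist = [], list(y[-2:])
for _ in range(5):
    fc = const + a1 * hist[-1] + a2 * hist[-2]
    manual.append(fc)
    hist.append(fc)

assert np.allclose(manual, res.forecast(5))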
+
+    def _static_predict(
+        self,
+        params: Float64Array,
+        start: int,
+        end: int,
+        num_oos: int,
+        exog: Float64Array | None,
+        exog_oos: Float64Array | None,
+    ) -> pd.Series:
         """
         Path for static predictions

@@ -414,13 +715,91 @@ class AutoReg(tsa_model.TimeSeriesModel):
         exog_oos :  {ndarray, DataFrame}
             Containing forecast exog values
         """
-        pass
-
-    def predict(self, params: ArrayLike, start: (int | str | datetime.
-        datetime | pd.Timestamp | None)=None, end: (int | str | datetime.
-        datetime | pd.Timestamp | None)=None, dynamic: (bool | int)=False,
-        exog: (ArrayLike2D | None)=None, exog_oos: (ArrayLike2D | None)=None
-        ) ->pd.Series:
+        hold_back = self._hold_back
+        nobs = self.endog.shape[0]
+
+        x = np.empty((0, self._x.shape[1]))
+
+        # Adjust start to reflect observations lost
+        adj = max(0, hold_back - start)
+        start += adj
+        if start <= nobs:
+            # Use existing regressors
+            is_loc = slice(start - hold_back, end + 1 - hold_back)
+            x = self._x[is_loc]
+            if exog is not None:
+                exog_a = np.asarray(exog)
+                x = x.copy()
+                # Replace final columns
+                x[:, -exog_a.shape[1] :] = exog_a[start : end + 1]
+        in_sample = x @ params
+        if num_oos == 0:  # No out of sample
+            return self._wrap_prediction(in_sample, start, end + 1, adj)
+
+        out_of_sample = self._static_oos_predict(params, num_oos, exog_oos)
+        prediction = np.hstack((in_sample, out_of_sample))
+        return self._wrap_prediction(prediction, start, end + 1 + num_oos, adj)
+
+    def _prepare_prediction(
+        self,
+        params: ArrayLike,
+        exog: ArrayLike2D,
+        exog_oos: ArrayLike2D,
+        start: int | str | datetime.datetime | pd.Timestamp | None,
+        end: int | str | datetime.datetime | pd.Timestamp | None,
+    ) -> tuple[
+        np.ndarray,
+        np.ndarray | pd.DataFrame | None,
+        np.ndarray | pd.DataFrame | None,
+        int,
+        int,
+        int,
+    ]:
+        params = array_like(params, "params")
+        assert isinstance(params, np.ndarray)
+        if isinstance(exog, pd.DataFrame):
+            _exog = exog
+        else:
+            _exog = array_like(exog, "exog", ndim=2, optional=True)
+        if isinstance(exog_oos, pd.DataFrame):
+            _exog_oos = exog_oos
+        else:
+            _exog_oos = array_like(exog_oos, "exog_oos", ndim=2, optional=True)
+        start = 0 if start is None else start
+        end = self._index[-1] if end is None else end
+        start, end, num_oos, _ = self._get_prediction_index(start, end)
+        return params, _exog, _exog_oos, start, end, num_oos
+
+    def _parse_dynamic(self, dynamic, start):
+        if isinstance(
+            dynamic, (str, bytes, pd.Timestamp, dt.datetime, pd.Period)
+        ):
+            dynamic_loc, _, _ = self._get_index_loc(dynamic)
+            # Adjust since relative to start
+            dynamic_loc -= start
+        elif dynamic is True:
+            # if True, all forecasts are dynamic
+            dynamic_loc = 0
+        else:
+            dynamic_loc = int(dynamic)
+        # At this point dynamic is an offset relative to start
+        # and it must be non-negative
+        if dynamic_loc < 0:
+            raise ValueError(
+                "Dynamic prediction cannot begin prior to the "
+                "first observation in the sample."
+            )
+        return dynamic_loc
+
+    def predict(
+        self,
+        params: ArrayLike,
+        start: int | str | datetime.datetime | pd.Timestamp | None = None,
+        end: int | str | datetime.datetime | pd.Timestamp | None = None,
+        dynamic: bool | int = False,
+        exog: ArrayLike2D | None = None,
+        exog_oos: ArrayLike2D | None = None,
+    ) -> pd.Series:
         """
         In-sample prediction and out-of-sample forecasting.

@@ -464,7 +843,57 @@ class AutoReg(tsa_model.TimeSeriesModel):
             Array of in-sample predictions and / or out-of-sample
             forecasts.
         """
-        pass
+
+        params, exog, exog_oos, start, end, num_oos = self._prepare_prediction(
+            params, exog, exog_oos, start, end
+        )
+        if self.exog is None and (exog is not None or exog_oos is not None):
+            raise ValueError(
+                "exog and exog_oos cannot be used when the model "
+                "does not contains exogenous regressors."
+            )
+        elif self.exog is not None:
+            if exog is not None and exog.shape != self.exog.shape:
+                msg = (
+                    "The shape of exog {0} must match the shape of the "
+                    "exog variable used to create the model {1}."
+                )
+                raise ValueError(msg.format(exog.shape, self.exog.shape))
+            if (
+                exog_oos is not None
+                and exog_oos.shape[1] != self.exog.shape[1]
+            ):
+                msg = (
+                    "The number of columns in exog_oos ({0}) must match "
+                    "the number of columns  in the exog variable used to "
+                    "create the model ({1})."
+                )
+                raise ValueError(
+                    msg.format(exog_oos.shape[1], self.exog.shape[1])
+                )
+            if num_oos > 0 and exog_oos is None:
+                raise ValueError(
+                    "exog_oos must be provided when producing "
+                    "out-of-sample forecasts."
+                )
+            elif exog_oos is not None and num_oos > exog_oos.shape[0]:
+                msg = (
+                    "start and end indicate that {0} out-of-sample "
+                    "predictions must be computed. exog_oos has {1} rows "
+                    "but must have at least {0}."
+                )
+                raise ValueError(msg.format(num_oos, exog_oos.shape[0]))
+
+        if (isinstance(dynamic, bool) and not dynamic) or self._maxlag == 0:
+            # If model has no lags, static and dynamic are identical
+            return self._static_predict(
+                params, start, end, num_oos, exog, exog_oos
+            )
+        dynamic = self._parse_dynamic(dynamic, start)
+
+        return self._dynamic_predict(
+            params, start, end, dynamic, num_oos, exog, exog_oos
+        )
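# Usage sketch of the two prediction paths (illustration only, toy data):
# static predictions use the observed lags at every step, while dynamic
# predictions feed earlier predictions back in from the ``dynamic`` offset.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

y = np.random.default_rng(8).standard_normal(100)
res = AutoReg(y, lags=1).fit()
static = res.predict(start=90, end=99)                 # one-step-ahead
dynamic = res.predict(start=90, end=99, dynamic=0)     # fully dynamic
print(static[:3], dynamic[:3])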


 class AR:
@@ -479,8 +908,9 @@ class AR:

     def __init__(self, *args, **kwargs):
         raise NotImplementedError(
-            'AR has been removed from statsmodels and replaced with statsmodels.tsa.ar_model.AutoReg.'
-            )
+            "AR has been removed from statsmodels and replaced with "
+            "statsmodels.tsa.ar_model.AutoReg."
+        )


 class ARResults:
@@ -494,13 +924,15 @@ class ARResults:

     def __init__(self, *args, **kwargs):
         raise NotImplementedError(
-            'AR and ARResults have been removed and replaced by AutoReg And AutoRegResults.'
-            )
+            "AR and ARResults have been removed and replaced by "
+            "AutoReg And AutoRegResults."
+        )


 doc = Docstring(AutoReg.predict.__doc__)
-_predict_params = doc.extract_parameters(['start', 'end', 'dynamic', 'exog',
-    'exog_oos'], 8)
+_predict_params = doc.extract_parameters(
+    ["start", "end", "dynamic", "exog", "exog_oos"], 8
+)


 class AutoRegResults(tsa_model.TimeSeriesModelResults):
@@ -525,10 +957,19 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
     summary_text : str, optional
         Additional text to append to results summary
     """
-    _cache: dict[str, Any] = {}

-    def __init__(self, model, params, cov_params, normalized_cov_params=
-        None, scale=1.0, use_t=False, summary_text=''):
+    _cache: dict[str, Any] = {}  # for scale setter
+
+    def __init__(
+        self,
+        model,
+        params,
+        cov_params,
+        normalized_cov_params=None,
+        scale=1.0,
+        use_t=False,
+        summary_text="",
+    ):
         super().__init__(model, params, normalized_cov_params, scale)
         self._cache = {}
         self._params = params
@@ -558,37 +999,46 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         **kwargs
             Any additional keyword arguments required to initialize the model.
         """
-        pass
+        self._params = params
+        self.model = model

     @property
     def ar_lags(self):
         """The autoregressive lags included in the model"""
-        pass
+        return self._ar_lags

     @property
     def params(self):
         """The estimated parameters."""
-        pass
+        return self._params

     @property
     def df_model(self):
         """The degrees of freedom consumed by the model."""
-        pass
+        return self._df_model

     @property
     def df_resid(self):
         """The remaining degrees of freedom in the residuals."""
-        pass
+        return self.nobs - self._df_model

     @property
     def nobs(self):
         """
         The number of observations after adjusting for losses due to lags.
         """
-        pass
+        return self._nobs
+
+    @cache_writable()
+    def sigma2(self):
+        return 1.0 / self.nobs * sumofsq(self.resid)
+
+    @cache_writable()  # for compatibility with RegressionResults
+    def scale(self):
+        return self.sigma2

     @cache_readonly
-    def bse(self):
+    def bse(self):  # allow user to specify?
         """
         The standard errors of the estimated parameters.

@@ -596,63 +1046,97 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         the OLS standard errors of the coefficients. If the `method` is 'mle'
         then they are computed using the numerical Hessian.
         """
-        pass
+        return np.sqrt(np.diag(self.cov_params()))

     @cache_readonly
     def aic(self):
-        """
+        r"""
         Akaike Information Criterion using Lutkepohl's definition.

-        :math:`-2 llf + \\ln(nobs) (1 + df_{model})`
+        :math:`-2 llf + 2 (1 + df_{model})`
         """
-        pass
+        # This is based on loglike with dropped constant terms ?
+        # Lutkepohl
+        # return np.log(self.sigma2) + 1./self.model.nobs * self.k_ar
+        # Include constant as estimated free parameter and double the loss
+        # Stata definition
+        # nobs = self.nobs
+        # return -2 * self.llf/nobs + 2 * (self.k_ar+self.k_trend)/nobs
+        return eval_measures.aic(self.llf, self.nobs, self.df_model + 1)

     @cache_readonly
     def hqic(self):
-        """
+        r"""
         Hannan-Quinn Information Criterion using Lutkepohl's definition.

-        :math:`-2 llf + 2 \\ln(\\ln(nobs)) (1 + df_{model})`
+        :math:`-2 llf + 2 \ln(\ln(nobs)) (1 + df_{model})`
         """
-        pass
+        # Lutkepohl
+        # return np.log(self.sigma2)+ 2 * np.log(np.log(nobs))/nobs * self.k_ar
+        # R uses all estimated parameters rather than just lags
+        # Stata
+        # nobs = self.nobs
+        # return -2 * self.llf/nobs + 2 * np.log(np.log(nobs))/nobs * \
+        #        (self.k_ar + self.k_trend)
+        return eval_measures.hqic(self.llf, self.nobs, self.df_model + 1)

     @cache_readonly
     def fpe(self):
-        """
+        r"""
         Final prediction error using Lütkepohl's definition.

-        :math:`((nobs+df_{model})/(nobs-df_{model})) \\sigma^2`
+        :math:`((nobs+df_{model})/(nobs-df_{model})) \sigma^2`
         """
-        pass
+        nobs = self.nobs
+        df_model = self.df_model
+        # Lutkepohl
+        return self.sigma2 * ((nobs + df_model) / (nobs - df_model))

     @cache_readonly
     def aicc(self):
-        """
+        r"""
         Akaike Information Criterion with small sample correction

         :math:`2.0 * df_{model} * nobs / (nobs - df_{model} - 1.0)`
         """
-        pass
+        return eval_measures.aicc(self.llf, self.nobs, self.df_model + 1)

     @cache_readonly
     def bic(self):
-        """
+        r"""
         Bayes Information Criterion

-        :math:`-2 llf + \\ln(nobs) (1 + df_{model})`
+        :math:`-2 llf + \ln(nobs) (1 + df_{model})`
         """
-        pass
+        # Lutkepohl
+        # np.log(self.sigma2) + np.log(nobs)/nobs * self.k_ar
+        # Include constant as est. free parameter
+        # Stata
+        # -2 * self.llf/nobs + np.log(nobs)/nobs * (self.k_ar + self.k_trend)
+        return eval_measures.bic(self.llf, self.nobs, self.df_model + 1)
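# Check of the criteria above (illustration only, toy data): all three pass
# df_model + 1 as the parameter count to statsmodels.tools.eval_measures.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

y = np.random.default_rng(7).standard_normal(250)
res = AutoReg(y, lags=3).fit()
k = res.df_model + 1
assert np.isclose(res.aic, -2 * res.llf + 2 * k)
assert np.isclose(res.bic, -2 * res.llf + np.log(res.nobs) * k)
assert np.isclose(res.hqic, -2 * res.llf + 2 * np.log(np.log(res.nobs)) * k)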

     @cache_readonly
     def resid(self):
         """
         The residuals of the model.
         """
-        pass
+        model = self.model
+        endog = model.endog.squeeze()
+        return endog[self._hold_back :] - self.fittedvalues

     def _lag_repr(self):
         """Returns poly repr of an AR, (1  -phi1 L -phi2 L^2-...)"""
-        pass
+        ar_lags = self._ar_lags if self._ar_lags is not None else []
+        k_ar = len(ar_lags)
+        ar_params = np.zeros(self._max_lag + 1)
+        ar_params[0] = 1
+        df_model = self._df_model
+        exog = self.model.exog
+        k_exog = exog.shape[1] if exog is not None else 0
+        params = self._params[df_model - k_ar - k_exog : df_model - k_exog]
+        for i, lag in enumerate(ar_lags):
+            ar_params[lag] = -params[i]
+        return ar_params

     @cache_readonly
     def roots(self):
@@ -664,17 +1148,24 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         Stability requires that the roots in modulus lie outside the unit
         circle.
         """
-        pass
+        # TODO: Specific to AR
+        lag_repr = self._lag_repr()
+        if lag_repr.shape[0] == 1:
+            return np.empty(0)
+
+        return np.roots(lag_repr) ** -1
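# Sketch tying _lag_repr and roots together (illustration only, simulated
# AR(2) with assumed coefficients 0.5 and -0.25): the returned roots solve
# 1 - phi1*z - phi2*z**2 = 0 and stability requires |root| > 1.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(4)
e = rng.standard_normal(1000)
y = np.zeros(1000)
for t in range(2, 1000):
    y[t] = 0.5 * y[t - 1] - 0.25 * y[t - 2] + e[t]

res = AutoReg(y, lags=2).fit()
_, phi1, phi2 = res.params                    # [const, L1, L2]
assert np.allclose(1.0 - phi1 * res.roots - phi2 * res.roots**2, 0)
print(np.abs(res.roots))                      # all > 1 for a stable fit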

     @cache_readonly
     def arfreq(self):
-        """
+        r"""
         Returns the frequency of the AR roots.

         This is the solution, x, to z = abs(z)*exp(2j*np.pi*x) where z are the
         roots.
         """
-        pass
+        # TODO: Specific to AR
+        z = self.roots
+        return np.arctan2(z.imag, z.real) / (2 * np.pi)

     @cache_readonly
     def fittedvalues(self):
@@ -684,7 +1175,7 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         The `k_ar` initial values are computed via the Kalman Filter if the
         model is fit by `mle`.
         """
-        pass
+        return self.model.predict(self.params)[self._hold_back :]

     def test_serial_correlation(self, lags=None, model_df=None):
         """
@@ -719,7 +1210,30 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         statsmodels.stats.diagnostic.acorr_ljungbox
             Ljung-Box test for serial correlation.
         """
-        pass
+        # Deferred to prevent circular import
+        from statsmodels.stats.diagnostic import acorr_ljungbox
+
+        lags = int_like(lags, "lags", optional=True)
+        model_df = int_like(model_df, "df_model", optional=True)
+        model_df = self.df_model if model_df is None else model_df
+        nobs_effective = self.resid.shape[0]
+        if lags is None:
+            lags = min(nobs_effective // 5, 10)
+        test_stats = acorr_ljungbox(
+            self.resid,
+            lags=lags,
+            boxpierce=False,
+            model_df=model_df,
+        )
+        cols = ["Ljung-Box", "LB P-value", "DF"]
+        if lags == 1:
+            df = max(0, 1 - model_df)
+        else:
+            df = np.clip(np.arange(1, lags + 1) - model_df, 0, np.inf)
+            df = df.astype(int)
+        test_stats["df"] = df
+        index = pd.RangeIndex(1, lags + 1, name="Lag")
+        return pd.DataFrame(test_stats, columns=cols, index=index)

     def test_normality(self):
         """
@@ -740,7 +1254,11 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         statsmodels.stats.stattools.jarque_bera
             The Jarque-Bera test of normality.
         """
-        pass
+        # Deferred to prevent circular import
+        from statsmodels.stats.stattools import jarque_bera
+
+        index = ["Jarque-Bera", "P-value", "Skewness", "Kurtosis"]
+        return pd.Series(jarque_bera(self.resid), index=index)

     def test_heteroskedasticity(self, lags=None):
         """
@@ -765,7 +1283,19 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         statsmodels.stats.diagnostic.acorr_lm
             LM test for autocorrelation.
         """
-        pass
+        from statsmodels.stats.diagnostic import het_arch
+
+        lags = int_like(lags, "lags", optional=True)
+        nobs_effective = self.resid.shape[0]
+        if lags is None:
+            lags = min(nobs_effective // 5, 10)
+        out = []
+        for lag in range(1, lags + 1):
+            res = het_arch(self.resid, nlags=lag)
+            out.append([res[0], res[1], lag])
+        index = pd.RangeIndex(1, lags + 1, name="Lag")
+        cols = ["ARCH-LM", "P-value", "DF"]
+        return pd.DataFrame(out, columns=cols, index=index)

     def diagnostic_summary(self):
         """
@@ -786,10 +1316,66 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         test_heteroskedasticity
             Test models residuals for conditional heteroskedasticity.
         """
-        pass
-
-    def get_prediction(self, start=None, end=None, dynamic=False, exog=None,
-        exog_oos=None):
+        from statsmodels.iolib.table import SimpleTable
+
+        spacer = SimpleTable([""])
+        smry = Summary()
+        sc = self.test_serial_correlation()
+        sc = sc.loc[sc.DF > 0]
+        values = [[i + 1] + row for i, row in enumerate(sc.values.tolist())]
+        data_fmts = ("%10d", "%10.3f", "%10.3f", "%10d")
+        if sc.shape[0]:
+            tab = SimpleTable(
+                values,
+                headers=["Lag"] + list(sc.columns),
+                title="Test of No Serial Correlation",
+                header_align="r",
+                data_fmts=data_fmts,
+            )
+            smry.tables.append(tab)
+            smry.tables.append(spacer)
+        jb = self.test_normality()
+        data_fmts = ("%10.3f", "%10.3f", "%10.3f", "%10.3f")
+        tab = SimpleTable(
+            [jb.values],
+            headers=list(jb.index),
+            title="Test of Normality",
+            header_align="r",
+            data_fmts=data_fmts,
+        )
+        smry.tables.append(tab)
+        smry.tables.append(spacer)
+        arch_lm = self.test_heteroskedasticity()
+        values = [
+            [i + 1] + row for i, row in enumerate(arch_lm.values.tolist())
+        ]
+        data_fmts = ("%10d", "%10.3f", "%10.3f", "%10d")
+        tab = SimpleTable(
+            values,
+            headers=["Lag"] + list(arch_lm.columns),
+            title="Test of Conditional Homoskedasticity",
+            header_align="r",
+            data_fmts=data_fmts,
+        )
+        smry.tables.append(tab)
+        return smry
+
+    @Appender(remove_parameters(AutoReg.predict.__doc__, "params"))
+    def predict(
+        self, start=None, end=None, dynamic=False, exog=None, exog_oos=None
+    ):
+        return self.model.predict(
+            self._params,
+            start=start,
+            end=end,
+            dynamic=dynamic,
+            exog=exog,
+            exog_oos=exog_oos,
+        )
+
+    def get_prediction(
+        self, start=None, end=None, dynamic=False, exog=None, exog_oos=None
+    ):
         """
         Predictions and prediction intervals

@@ -830,7 +1416,22 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         PredictionResults
             Prediction results with mean and prediction intervals
         """
-        pass
+        mean = self.predict(
+            start=start, end=end, dynamic=dynamic, exog=exog, exog_oos=exog_oos
+        )
+        mean_var = np.full_like(mean, self.sigma2)
+        mean_var[np.isnan(mean)] = np.nan
+        start = 0 if start is None else start
+        end = self.model._index[-1] if end is None else end
+        _, _, oos, _ = self.model._get_prediction_index(start, end)
+        if oos > 0:
+            ar_params = self._lag_repr()
+            ma = arma2ma(ar_params, np.ones(1), lags=oos)
+            mean_var[-oos:] = self.sigma2 * np.cumsum(ma**2)
+        if isinstance(mean, pd.Series):
+            mean_var = pd.Series(mean_var, index=mean.index)
+
+        return PredictionResults(mean, mean_var)
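# Sketch of the forecast variances above (illustration only, toy data): for
# purely out-of-sample predictions the variance grows with the cumulative
# squared psi-weights of the AR polynomial, scaled by sigma2; the default
# intervals are assumed to be normal-based.
import numpy as np
from scipy import stats
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima_process import arma2ma

y = np.random.default_rng(5).standard_normal(300)
res = AutoReg(y, lags=2).fit()
pred = res.get_prediction(start=y.shape[0], end=y.shape[0] + 9)

_, phi1, phi2 = res.params
psi = arma2ma(np.r_[1.0, -phi1, -phi2], np.ones(1), lags=10)
half = stats.norm.ppf(0.975) * np.sqrt(res.sigma2 * np.cumsum(psi**2))
ci = np.asarray(pred.conf_int(alpha=0.05))
assert np.allclose(ci[:, 1] - ci[:, 0], 2 * half)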

     def forecast(self, steps=1, exog=None):
         """
@@ -860,22 +1461,80 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         AutoRegResults.get_prediction
             In- and out-of-sample predictions and confidence intervals
         """
-        pass
-
-    def _plot_predictions(self, predictions, start, end, alpha, in_sample,
-        fig, figsize):
+        start = self.model.data.orig_endog.shape[0]
+        if isinstance(steps, (int, np.integer)):
+            end = start + steps - 1
+        else:
+            end = steps
+        return self.predict(start=start, end=end, dynamic=False, exog_oos=exog)
+
+    def _plot_predictions(
+        self,
+        predictions,
+        start,
+        end,
+        alpha,
+        in_sample,
+        fig,
+        figsize,
+    ):
         """Shared helper for plotting predictions"""
-        pass
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+
+        _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+        start = 0 if start is None else start
+        end = self.model._index[-1] if end is None else end
+        _, _, oos, _ = self.model._get_prediction_index(start, end)
+
+        ax = fig.add_subplot(111)
+        mean = predictions.predicted_mean
+        if not in_sample and oos:
+            if isinstance(mean, pd.Series):
+                mean = mean.iloc[-oos:]
+        elif not in_sample:
+            raise ValueError(
+                "in_sample is False but there are no"
+                "out-of-sample forecasts to plot."
+            )
+        ax.plot(mean, zorder=2)
+
+        if oos and alpha is not None:
+            ci = np.asarray(predictions.conf_int(alpha))
+            lower, upper = ci[-oos:, 0], ci[-oos:, 1]
+            label = "{0:.0%} confidence interval".format(1 - alpha)
+            x = ax.get_lines()[-1].get_xdata()
+            ax.fill_between(
+                x[-oos:],
+                lower,
+                upper,
+                color="gray",
+                alpha=0.5,
+                label=label,
+                zorder=1,
+            )
+        ax.legend(loc="best")
+
+        return fig

     @Substitution(predict_params=_predict_params)
-    def plot_predict(self, start=None, end=None, dynamic=False, exog=None,
-        exog_oos=None, alpha=0.05, in_sample=True, fig=None, figsize=None):
+    def plot_predict(
+        self,
+        start=None,
+        end=None,
+        dynamic=False,
+        exog=None,
+        exog_oos=None,
+        alpha=0.05,
+        in_sample=True,
+        fig=None,
+        figsize=None,
+    ):
         """
         Plot in- and out-of-sample predictions

         Parameters
-        ----------
-%(predict_params)s
+        ----------\n%(predict_params)s
         alpha : {float, None}
             The tail probability not covered by the confidence interval. Must
             be in (0, 1). Confidence interval is constructed assuming normally
@@ -895,7 +1554,12 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         Figure
             Figure handle containing the plot.
         """
-        pass
+        predictions = self.get_prediction(
+            start=start, end=end, dynamic=dynamic, exog=exog, exog_oos=exog_oos
+        )
+        return self._plot_predictions(
+            predictions, start, end, alpha, in_sample, fig, figsize
+        )

     def plot_diagnostics(self, lags=10, fig=None, figsize=None):
         """
@@ -929,7 +1593,61 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         statsmodels.graphics.gofplots.qqplot
         statsmodels.graphics.tsaplots.plot_acf
         """
-        pass
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+
+        _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+        # Eliminate residuals associated with burned or diffuse likelihoods
+        resid = self.resid
+
+        # Top-left: residuals vs time
+        ax = fig.add_subplot(221)
+        if hasattr(self.model.data, "dates") and self.data.dates is not None:
+            x = self.model.data.dates._mpl_repr()
+            x = x[self.model.hold_back :]
+        else:
+            hold_back = self.model.hold_back
+            x = hold_back + np.arange(self.resid.shape[0])
+        std_resid = resid / np.sqrt(self.sigma2)
+        ax.plot(x, std_resid)
+        ax.hlines(0, x[0], x[-1], alpha=0.5)
+        ax.set_xlim(x[0], x[-1])
+        ax.set_title("Standardized residual")
+
+        # Top-right: histogram, Gaussian kernel density, Normal density
+        # Can only do histogram and Gaussian kernel density on the non-null
+        # elements
+        std_resid_nonmissing = std_resid[~(np.isnan(resid))]
+        ax = fig.add_subplot(222)
+
+        ax.hist(std_resid_nonmissing, density=True, label="Hist")
+
+        kde = gaussian_kde(std_resid)
+        xlim = (-1.96 * 2, 1.96 * 2)
+        x = np.linspace(xlim[0], xlim[1])
+        ax.plot(x, kde(x), label="KDE")
+        ax.plot(x, norm.pdf(x), label="N(0,1)")
+        ax.set_xlim(xlim)
+        ax.legend()
+        ax.set_title("Histogram plus estimated density")
+
+        # Bottom-left: QQ plot
+        ax = fig.add_subplot(223)
+        from statsmodels.graphics.gofplots import qqplot
+
+        qqplot(std_resid, line="s", ax=ax)
+        ax.set_title("Normal Q-Q")
+
+        # Bottom-right: Correlogram
+        ax = fig.add_subplot(224)
+        from statsmodels.graphics.tsaplots import plot_acf
+
+        plot_acf(resid, ax=ax, lags=lags)
+        ax.set_title("Correlogram")
+
+        ax.set_ylim(-1, 1)
+
+        return fig

     def summary(self, alpha=0.05):
         """
@@ -950,7 +1668,88 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         --------
         statsmodels.iolib.summary.Summary
         """
-        pass
+        model = self.model
+
+        title = model.__class__.__name__ + " Model Results"
+        method = "Conditional MLE"
+        # get sample
+        start = self._hold_back
+        if self.data.dates is not None:
+            dates = self.data.dates
+            sample = [dates[start].strftime("%m-%d-%Y")]
+            sample += ["- " + dates[-1].strftime("%m-%d-%Y")]
+        else:
+            sample = [str(start), str(len(self.data.orig_endog))]
+        model = model.__class__.__name__
+        if self.model.seasonal:
+            model = "Seas. " + model
+        if self.ar_lags is not None and len(self.ar_lags) < self._max_lag:
+            model = "Restr. " + model
+        if self.model.exog is not None:
+            model += "-X"
+
+        order = "({0})".format(self._max_lag)
+        dep_name = str(self.model.endog_names)
+        top_left = [
+            ("Dep. Variable:", [dep_name]),
+            ("Model:", [model + order]),
+            ("Method:", [method]),
+            ("Date:", None),
+            ("Time:", None),
+            ("Sample:", [sample[0]]),
+            ("", [sample[1]]),
+        ]
+
+        top_right = [
+            ("No. Observations:", [str(len(self.model.endog))]),
+            ("Log Likelihood", ["%#5.3f" % self.llf]),
+            ("S.D. of innovations", ["%#5.3f" % self.sigma2**0.5]),
+            ("AIC", ["%#5.3f" % self.aic]),
+            ("BIC", ["%#5.3f" % self.bic]),
+            ("HQIC", ["%#5.3f" % self.hqic]),
+        ]
+
+        smry = Summary()
+        smry.add_table_2cols(
+            self, gleft=top_left, gright=top_right, title=title
+        )
+        smry.add_table_params(self, alpha=alpha, use_t=False)
+
+        # Make the roots table
+        from statsmodels.iolib.table import SimpleTable
+
+        if self._max_lag:
+            arstubs = ["AR.%d" % i for i in range(1, self._max_lag + 1)]
+            stubs = arstubs
+            roots = self.roots
+            freq = self.arfreq
+            modulus = np.abs(roots)
+            data = np.column_stack((roots.real, roots.imag, modulus, freq))
+            roots_table = SimpleTable(
+                [
+                    (
+                        "%17.4f" % row[0],
+                        "%+17.4fj" % row[1],
+                        "%17.4f" % row[2],
+                        "%17.4f" % row[3],
+                    )
+                    for row in data
+                ],
+                headers=[
+                    "            Real",
+                    "         Imaginary",
+                    "         Modulus",
+                    "        Frequency",
+                ],
+                title="Roots",
+                stubs=stubs,
+            )
+
+            smry.tables.append(roots_table)
+        if self._summary_text:
+            extra_txt = smry.extra_txt if smry.extra_txt is not None else []
+            smry.add_extra_txt(extra_txt + [self._summary_text])
+        return smry

     def apply(self, endog, exog=None, refit=False, fit_kwargs=None):
         """
@@ -1029,7 +1828,71 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         1983    1.463415
         Freq: A-DEC, dtype: float64
         """
-        pass
+        existing = self.model
+        try:
+            deterministic = existing.deterministic
+            if deterministic is not None:
+                if isinstance(endog, (pd.Series, pd.DataFrame)):
+                    index = endog.index
+                else:
+                    index = np.arange(endog.shape[0])
+                deterministic = deterministic.apply(index)
+            mod = AutoReg(
+                endog,
+                lags=existing.ar_lags,
+                trend=existing.trend,
+                seasonal=existing.seasonal,
+                exog=exog,
+                hold_back=existing.hold_back,
+                period=existing.period,
+                deterministic=deterministic,
+                old_names=False,
+            )
+        except Exception as exc:
+            error = (
+                "An exception occured during the creation of the cloned "
+                "AutoReg instance when applying the existing model "
+                "specification to the new data. The original traceback "
+                "appears below."
+            )
+            exc.args = (error,) + exc.args
+            raise exc.with_traceback(exc.__traceback__)
+
+        if (mod.exog is None) != (existing.exog is None):
+            if existing.exog is not None:
+                raise ValueError(
+                    "exog must be provided when the original model contained "
+                    "exog variables"
+                )
+            raise ValueError(
+                "exog must be None when the original model did not contain "
+                "exog variables"
+            )
+        if (
+            existing.exog is not None
+            and existing.exog.shape[1] != mod.exog.shape[1]
+        ):
+            raise ValueError(
+                "The number of exog variables passed must match the original "
+                f"number of exog values ({existing.exog.shape[1]})"
+            )
+        if refit:
+            fit_kwargs = {} if fit_kwargs is None else fit_kwargs
+            return mod.fit(**fit_kwargs)
+        smry_txt = (
+            "Parameters and standard errors were estimated using a different "
+            "dataset and were then applied to this dataset."
+        )
+        res = AutoRegResults(
+            mod,
+            self.params,
+            self.cov_params_default,
+            self.normalized_cov_params,
+            use_t=self.use_t,
+            summary_text=smry_txt,
+        )
+
+        return AutoRegResultsWrapper(res)

     def append(self, endog, exog=None, refit=False, fit_kwargs=None):
         """
@@ -1110,28 +1973,92 @@ class AutoRegResults(tsa_model.TimeSeriesModelResults):
         2006    3.335294
         Freq: A-DEC, dtype: float64
         """
-        pass
+
+        def _check(orig, new, name, use_pandas=True):
+            from statsmodels.tsa.statespace.mlemodel import _check_index
+
+            typ = type(orig)
+            if not isinstance(new, typ):
+                raise TypeError(
+                    f"{name} must have the same type as the {name} used to "
+                    f"originally create the model ({typ.__name__})."
+                )
+            if not use_pandas:
+                return np.concatenate([orig, new])
+            start = len(orig)
+            end = start + len(new) - 1
+            _, _, _, append_ix = self.model._get_prediction_index(start, end)
+            _check_index(append_ix, new, title=name)
+            return pd.concat([orig, new], axis=0)
+
+        existing = self.model
+        no_exog = existing.exog is None
+        if no_exog != (exog is None):
+            if no_exog:
+                err = (
+                    "Original model does not contain exog data but exog data "
+                    "passed"
+                )
+            else:
+                err = "Original model has exog data but not exog data passed"
+            raise ValueError(err)
+        if isinstance(existing.data.orig_endog, (pd.Series, pd.DataFrame)):
+            endog = _check(existing.data.orig_endog, endog, "endog")
+        else:
+            endog = _check(
+                existing.endog, np.asarray(endog), "endog", use_pandas=False
+            )
+        if isinstance(existing.data.orig_exog, (pd.Series, pd.DataFrame)):
+            exog = _check(existing.data.orig_exog, exog, "exog")
+        elif exog is not None:
+            exog = _check(
+                existing.exog, np.asarray(exog), "exog", use_pandas=False
+            )
+        return self.apply(endog, exog, refit=refit, fit_kwargs=fit_kwargs)


 class AutoRegResultsWrapper(wrap.ResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(tsa_model.TimeSeriesResultsWrapper.
-        _wrap_attrs, _attrs)
+    _wrap_attrs = wrap.union_dicts(
+        tsa_model.TimeSeriesResultsWrapper._wrap_attrs, _attrs
+    )
     _methods = {}
-    _wrap_methods = wrap.union_dicts(tsa_model.TimeSeriesResultsWrapper.
-        _wrap_methods, _methods)
+    _wrap_methods = wrap.union_dicts(
+        tsa_model.TimeSeriesResultsWrapper._wrap_methods, _methods
+    )


 wrap.populate_wrapper(AutoRegResultsWrapper, AutoRegResults)
+
 doc = Docstring(AutoReg.__doc__)
-_auto_reg_params = doc.extract_parameters(['trend', 'seasonal', 'exog',
-    'hold_back', 'period', 'missing', 'old_names'], 4)
+_auto_reg_params = doc.extract_parameters(
+    [
+        "trend",
+        "seasonal",
+        "exog",
+        "hold_back",
+        "period",
+        "missing",
+        "old_names",
+    ],
+    4,
+)


 @Substitution(auto_reg_params=_auto_reg_params)
-def ar_select_order(endog, maxlag, ic='bic', glob=False, trend: Literal['n',
-    'c', 'ct', 'ctt']='c', seasonal=False, exog=None, hold_back=None,
-    period=None, missing='none', old_names=False):
+def ar_select_order(
+    endog,
+    maxlag,
+    ic="bic",
+    glob=False,
+    trend: Literal["n", "c", "ct", "ctt"] = "c",
+    seasonal=False,
+    exog=None,
+    hold_back=None,
+    period=None,
+    missing="none",
+    old_names=False,
+):
     """
     Autoregressive AR-X(p) model order selection.

@@ -1147,8 +2074,7 @@ def ar_select_order(endog, maxlag, ic='bic', glob=False, trend: Literal['n',
         Flag indicating whether to use a global search across all combinations
         of lags.  In practice, this option is not computationally feasible when
         maxlag is larger than 15 (or perhaps 20) since the global search
-        requires fitting 2**maxlag models.
-%(auto_reg_params)s
+        requires fitting 2**maxlag models.\n%(auto_reg_params)s

     Returns
     -------
@@ -1179,7 +2105,95 @@ def ar_select_order(endog, maxlag, ic='bic', glob=False, trend: Literal['n',
     >>> mod.ar_lags
     array([1, 2, 9])
     """
-    pass
+    full_mod = AutoReg(
+        endog,
+        maxlag,
+        trend=trend,
+        seasonal=seasonal,
+        exog=exog,
+        hold_back=hold_back,
+        period=period,
+        missing=missing,
+        old_names=old_names,
+    )
+    nexog = full_mod.exog.shape[1] if full_mod.exog is not None else 0
+    y, x = full_mod._y, full_mod._x
+    base_col = x.shape[1] - nexog - maxlag
+    sel = np.ones(x.shape[1], dtype=bool)
+    ics: list[tuple[int | tuple[int, ...], tuple[float, float, float]]] = []
+
+    def compute_ics(res):
+        nobs = res.nobs
+        df_model = res.df_model
+        sigma2 = 1.0 / nobs * sumofsq(res.resid)
+        llf = -nobs * (np.log(2 * np.pi * sigma2) + 1) / 2
+        res = SimpleNamespace(
+            nobs=nobs, df_model=df_model, sigma2=sigma2, llf=llf
+        )
+
+        aic = call_cached_func(AutoRegResults.aic, res)
+        bic = call_cached_func(AutoRegResults.bic, res)
+        hqic = call_cached_func(AutoRegResults.hqic, res)
+
+        return aic, bic, hqic
+
+    def ic_no_data():
+        """Fake mod and results to handle no regressor case"""
+        mod = SimpleNamespace(
+            nobs=y.shape[0], endog=y, exog=np.empty((y.shape[0], 0))
+        )
+        llf = OLS.loglike(mod, np.empty(0))
+        res = SimpleNamespace(
+            resid=y, nobs=y.shape[0], llf=llf, df_model=0, k_constant=0
+        )
+
+        return compute_ics(res)
+
+    if not glob:
+        sel[base_col : base_col + maxlag] = False
+        for i in range(maxlag + 1):
+            sel[base_col : base_col + i] = True
+            if not np.any(sel):
+                ics.append((0, ic_no_data()))
+                continue
+            res = OLS(y, x[:, sel]).fit()
+            lags = tuple(j for j in range(1, i + 1))
+            lags = 0 if not lags else lags
+            ics.append((lags, compute_ics(res)))
+    else:
+        bits = np.arange(2**maxlag, dtype=np.int32)[:, None]
+        bits = bits.view(np.uint8)
+        bits = np.unpackbits(bits).reshape(-1, 32)
+        for i in range(4):
+            bits[:, 8 * i : 8 * (i + 1)] = bits[:, 8 * i : 8 * (i + 1)][
+                :, ::-1
+            ]
+        masks = bits[:, :maxlag]
+        for mask in masks:
+            sel[base_col : base_col + maxlag] = mask
+            if not np.any(sel):
+                ics.append((0, ic_no_data()))
+                continue
+            res = OLS(y, x[:, sel]).fit()
+            lags = tuple(np.where(mask)[0] + 1)
+            lags = 0 if not lags else lags
+            ics.append((lags, compute_ics(res)))
+
+    key_loc = {"aic": 0, "bic": 1, "hqic": 2}[ic]
+    ics = sorted(ics, key=lambda x: x[1][key_loc])
+    selected_model = ics[0][0]
+    mod = AutoReg(
+        endog,
+        selected_model,
+        trend=trend,
+        seasonal=seasonal,
+        exog=exog,
+        hold_back=hold_back,
+        period=period,
+        missing=missing,
+        old_names=old_names,
+    )
+    return AROrderSelectionResults(mod, ics, trend, seasonal, period)
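# Sketch of the bit-mask enumeration used when glob=True (illustration only,
# assumes a little-endian platform and maxlag=3): every subset of
# {1, ..., maxlag} is generated by unpacking the integers 0 .. 2**maxlag - 1
# into bit masks over the lag columns.
import numpy as np

maxlag = 3
bits = np.arange(2**maxlag, dtype=np.int32)[:, None].view(np.uint8)
bits = np.unpackbits(bits).reshape(-1, 32)
for i in range(4):
    bits[:, 8 * i : 8 * (i + 1)] = bits[:, 8 * i : 8 * (i + 1)][:, ::-1]
masks = bits[:, :maxlag]
print([tuple(int(j) for j in np.where(m)[0] + 1) for m in masks])
# -> [(), (1,), (2,), (1, 2), (3,), (1, 3), (2, 3), (1, 2, 3)]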


 class AROrderSelectionResults:
@@ -1189,9 +2203,14 @@ class AROrderSelectionResults:
     Contains the information criteria for all fitted model orders.
     """

-    def __init__(self, model: AutoReg, ics: list[tuple[int | tuple[int, ...
-        ], tuple[float, float, float]]], trend: Literal['n', 'c', 'ct',
-        'ctt'], seasonal: bool, period: (int | None)):
+    def __init__(
+        self,
+        model: AutoReg,
+        ics: list[tuple[int | tuple[int, ...], tuple[float, float, float]]],
+        trend: Literal["n", "c", "ct", "ctt"],
+        seasonal: bool,
+        period: int | None,
+    ):
         self._model = model
         self._ics = ics
         self._trend = trend
@@ -1205,27 +2224,27 @@ class AROrderSelectionResults:
         self._hqic = dict([(key, val[2]) for key, val in hqic])

     @property
-    def model(self) ->AutoReg:
+    def model(self) -> AutoReg:
         """The model selected using the chosen information criterion."""
-        pass
+        return self._model

     @property
-    def seasonal(self) ->bool:
+    def seasonal(self) -> bool:
         """Flag indicating if a seasonal component is included."""
-        pass
+        return self._seasonal

     @property
-    def trend(self) ->Literal['n', 'c', 'ct', 'ctt']:
+    def trend(self) -> Literal["n", "c", "ct", "ctt"]:
         """The trend included in the model selection."""
-        pass
+        return self._trend

     @property
-    def period(self) ->(int | None):
+    def period(self) -> int | None:
         """The period of the seasonal component."""
-        pass
+        return self._period

     @property
-    def aic(self) ->dict[int | tuple[int, ...], float]:
+    def aic(self) -> dict[int | tuple[int, ...], float]:
         """
         The Akaike information criterion for the models fit.

@@ -1233,10 +2252,10 @@ class AROrderSelectionResults:
         -------
         dict[tuple, float]
         """
-        pass
+        return self._aic

     @property
-    def bic(self) ->dict[int | tuple[int, ...], float]:
+    def bic(self) -> dict[int | tuple[int, ...], float]:
         """
         The Bayesian (Schwarz) information criteria for the models fit.

@@ -1244,10 +2263,10 @@ class AROrderSelectionResults:
         -------
         dict[tuple, float]
         """
-        pass
+        return self._bic

     @property
-    def hqic(self) ->dict[int | tuple[int, ...], float]:
+    def hqic(self) -> dict[int | tuple[int, ...], float]:
         """
         The Hannan-Quinn information criteria for the models fit.

@@ -1255,9 +2274,9 @@ class AROrderSelectionResults:
         -------
         dict[tuple, float]
         """
-        pass
+        return self._hqic

     @property
-    def ar_lags(self) ->(list[int] | None):
+    def ar_lags(self) -> list[int] | None:
         """The lags included in the selected model."""
-        pass
+        return self._model.ar_lags
diff --git a/statsmodels/tsa/ardl/_pss_critical_values/pss-process.py b/statsmodels/tsa/ardl/_pss_critical_values/pss-process.py
index 10babb350..6cd0fc75c 100644
--- a/statsmodels/tsa/ardl/_pss_critical_values/pss-process.py
+++ b/statsmodels/tsa/ardl/_pss_critical_values/pss-process.py
@@ -1,30 +1,36 @@
 from statsmodels.compat.pandas import FUTURE_STACK
+
 from collections import defaultdict
 import glob
 import os
+
 import numpy as np
 import pandas as pd
 from scipy import stats
 from sklearn.model_selection import KFold
-if __name__ == '__main__':
+
+if __name__ == "__main__":
     from black import FileMode, TargetVersion, format_file_contents
     from sklearn.linear_model import LinearRegression
     from sklearn.model_selection import cross_val_score
-    PATH = os.environ.get('PSS_PATH', '..')
-    print(f'Processing {PATH}')
-    files = glob.glob(os.path.join(PATH, '*.npz'))
+
+    PATH = os.environ.get("PSS_PATH", "..")
+    print(f"Processing {PATH}")
+
+    files = glob.glob(os.path.join(PATH, "*.npz"))
     groups = defaultdict(list)
     for f in files:
-        keys = f.split('-')
-        key = int(keys[2]), int(keys[4]), keys[6] == 'True'
+        keys = f.split("-")
+        key = int(keys[2]), int(keys[4]), keys[6] == "True"
         if key[0] == 0:
             continue
         with np.load(f) as contents:
-            idx = (100 * contents['percentiles']).astype(int)
-            s = pd.Series(contents['q'], index=idx)
+            idx = (100 * contents["percentiles"]).astype(int)
+            s = pd.Series(contents["q"], index=idx)
         groups[key].append(s)
+
     final = {}
-    quantiles = 90, 95, 99, 99.9
+    quantiles = (90, 95, 99, 99.9)
     crit_vals = {}
     ordered_keys = sorted(groups.keys())
     for key in ordered_keys:
@@ -34,15 +40,36 @@ if __name__ == '__main__':
             cv.append(final[key].loc[int(100 * q)].mean())
         crit_vals[key] = cv
     df = pd.DataFrame(crit_vals).T
-    df.index.names = 'k', 'case', 'I1'
+    df.index.names = ("k", "case", "I1")
     df.columns = quantiles
+
     for key, row in df.iterrows():
         crit_vals[key] = [round(val, 7) for val in list(row)]
+
+    def setup_regressors(df, low_pow=3, high_pow=3, cut=70, log=False):
+        s = df.stack(**FUTURE_STACK).reset_index()
+        q = s.level_0 / 10000
+        y = stats.norm.ppf(q)
+        cv = s[0]
+        if log:
+            cv = np.log(cv)
+        m = np.where(s.level_0 <= df.index[cut])[0].max()
+        reg = np.zeros((q.shape[0], 2 + low_pow + high_pow))
+        reg[:m, 0] = 1
+        for i in range(low_pow):
+            reg[:m, i + 1] = cv[:m] ** (i + 1)
+        w = 1 + low_pow
+        reg[m:, w] = 1
+        for i in range(high_pow):
+            reg[m:, w + i + 1] = cv[m:] ** (i + 1)
+        return reg, y
+
     large_p = {}
     small_p = {}
     transform = {}
     max_stat = {}
     threshold = {}
+
     hp = 2
     for key in final:
         print(key)
@@ -54,32 +81,36 @@ if __name__ == '__main__':
                 for log in (True, False):
                     cv = KFold(shuffle=True, random_state=20210903)
                     x, y = setup_regressors(data, lp, hp, cut, log)
-                    k = lp, hp, cut, log
-                    score[k] = cross_val_score(lr, x, y, scoring=
-                        'neg_mean_absolute_error', cv=cv).sum()
+                    k = (lp, hp, cut, log)
+                    score[k] = cross_val_score(
+                        lr, x, y, scoring="neg_mean_absolute_error", cv=cv
+                    ).sum()
         idx = pd.Series(score).idxmax()
         lp, hp, cut, log = idx
         assert log
+
         x, y = setup_regressors(data, lp, hp, cut, log)
         lr = lr.fit(x, y)
-        large = lr.coef_[:1 + lp]
+        large = lr.coef_[: 1 + lp]
         if lp == 2:
             large = np.array(large.tolist() + [0.0])
         large_p[key] = large.tolist()
-        small_p[key] = lr.coef_[1 + lp:].tolist()
+        small_p[key] = lr.coef_[1 + lp :].tolist()
         transform[key] = log
         max_stat[key] = np.inf
         threshold[key] = data.iloc[cut].mean()
         if small_p[key][2] < 0:
             max_stat[key] = small_p[key][1] / (-2 * small_p[key][2])
+
     for key in large_p:
         large_p[key] = [round(val, 5) for val in large_p[key]]
         small_p[key] = [round(val, 5) for val in small_p[key]]
+
     raw_code = f"""
 #!/usr/bin/env python
 # coding: utf-8

-""\"
+\"\"\"
 Critical value polynomials and related quantities for the bounds test of

 Pesaran, M. H., Shin, Y., & Smith, R. J. (2001). Bounds testing approaches
@@ -112,7 +143,7 @@ where x = np.log(stat) and Phi() is the normal cdf.
 When using these models, the polynomial is evaluated at the natural log of the
 test statistic and then the normal CDF of this value is computed to produce
 the p-value.
-""\"
+\"\"\"

 __all__ = ["large_p", "small_p", "crit_vals", "crit_percentiles", "stat_star"]

@@ -126,9 +157,12 @@ crit_percentiles = {quantiles}

 crit_vals = {crit_vals}
 """
+
     targets = {TargetVersion.PY37, TargetVersion.PY38, TargetVersion.PY39}
     fm = FileMode(target_versions=targets, line_length=79)
     formatted_code = format_file_contents(raw_code, fast=False, mode=fm)
-    with open('../pss_critical_values.py', 'w', newline='\n', encoding='utf-8'
-        ) as out:
+
+    with open(
+            "../pss_critical_values.py", "w", newline="\n", encoding="utf-8"
+    ) as out:
         out.write(formatted_code)
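
A rough sketch of how the exported polynomials are consumed, following the docstring embedded above; approx_pvalue is a hypothetical helper, coeffs stands in for one large_p or small_p entry, and the stat_star/threshold handling is omitted:

    import numpy as np
    from scipy import stats

    def approx_pvalue(stat, coeffs):
        # per the docstring, x = log(stat) and the p-value is the normal CDF
        # of the fitted polynomial evaluated at x
        x = np.log(stat)
        poly = coeffs[0] + sum(c * x ** (i + 1) for i, c in enumerate(coeffs[1:]))
        return stats.norm.cdf(poly)
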
diff --git a/statsmodels/tsa/ardl/_pss_critical_values/pss.py b/statsmodels/tsa/ardl/_pss_critical_values/pss.py
index 9064a12d4..5b26a082a 100644
--- a/statsmodels/tsa/ardl/_pss_critical_values/pss.py
+++ b/statsmodels/tsa/ardl/_pss_critical_values/pss.py
@@ -1,9 +1,78 @@
+#!/usr/bin/env python
+# coding: utf-8
+
 from itertools import product
 import os
+
 import numpy as np
-PATH = os.environ.get('PSS_PATH', '..')
-seed = [3957597042, 2709280948, 499296859, 1555610991, 2390531900, 
-    2160388094, 4098495866, 47221919]
+
+PATH = os.environ.get("PSS_PATH", "..")
+
+
+def pss_block(
+    seed, k, case, i1, block_id, m=2_000_000, t=1_000, save=True, path="./"
+):
+    file_name = f"pss-k-{k}-case-{case}-i1-{i1}-block-{block_id}.npz"
+    file_name = os.path.join(path, file_name)
+    if save and os.path.exists(file_name):
+        return
+    rs = np.random.default_rng(seed)
+    const = np.ones(t - 1)
+    tau = np.arange(1, t).astype(float)
+    f = np.empty(m)
+    for j in range(m):
+        u = rs.standard_normal((k + 1, t))
+        y = np.cumsum(u[0])
+        if i1:
+            x = np.cumsum(u[1:], axis=1).T
+        else:
+            x = u[1:].T
+        lhs = np.diff(y)
+        rhv = [y[:-1], x[:-1]]
+        if case == 2:
+            rhv.append(const)
+        elif case == 4:
+            rhv.append(tau)
+        if case >= 3:
+            rhv.append(const)
+        if case == 5:
+            rhv.append(tau)
+        rest = k + 1
+        if case in (2, 4):
+            rest += 1
+        rhs = np.column_stack(rhv)
+        b = np.linalg.lstsq(rhs, lhs, rcond=None)[0]
+        u = lhs - rhs @ b
+        s2 = u.T @ u / (u.shape[0] - rhs.shape[1])
+        xpx = rhs.T @ rhs
+        vcv = np.linalg.inv(xpx) * s2
+        r = np.eye(rest, rhs.shape[1])
+        rvcvr = r @ vcv @ r.T
+        rb = r @ b
+        f[j] = rb.T @ np.linalg.inv(rvcvr) @ rb / rest
+    percentiles = [0.05]
+    percentiles += [i / 10 for i in range(1, 10)]
+    percentiles += [1 + i / 2 for i in range(18)]
+    percentiles += [i for i in range(10, 51)]
+    percentiles += [100 - v for v in percentiles]
+    percentiles = sorted(set(percentiles))
+    percentiles = np.asarray(percentiles)
+    q = np.percentile(f, percentiles)
+    if save:
+        np.savez(file_name, q=q, percentiles=percentiles)
+    return q
+
+
+seed = [
+    3957597042,
+    2709280948,
+    499296859,
+    1555610991,
+    2390531900,
+    2160388094,
+    4098495866,
+    47221919,
+]
 ss = np.random.SeedSequence(seed)
 k = list(range(1, 11))
 case = list(range(1, 6))
@@ -13,8 +82,18 @@ params = list(product(k, case, i1, block_id))
 seeds = ss.generate_state(8 * len(params)).reshape((-1, 8)).tolist()
 configs = []
 for _s, (_k, _case, _i1, _block_id) in zip(seeds, params):
-    configs.append({'seed': _s, 'k': _k, 'case': _case, 'i1': _i1,
-        'block_id': _block_id, 'path': PATH})
-if __name__ == '__main__':
+    configs.append(
+        {
+            "seed": _s,
+            "k": _k,
+            "case": _case,
+            "i1": _i1,
+            "block_id": _block_id,
+            "path": PATH,
+        }
+    )
+
+if __name__ == "__main__":
     from joblib import Parallel, delayed
+
     Parallel(n_jobs=10)(delayed(pss_block)(**c) for c in configs)
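
A quick smoke test of the simulation routine, using hypothetical, deliberately tiny m and t so it finishes in seconds (the production tables use m=2_000_000 draws per block):

    # returns the simulated F-statistic quantiles without writing a file
    q = pss_block(seed=[12345], k=1, case=3, i1=True, block_id=0,
                  m=500, t=200, save=False)
    print(q[:5])
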
diff --git a/statsmodels/tsa/ardl/model.py b/statsmodels/tsa/ardl/model.py
index 7f321fbec..750929945 100644
--- a/statsmodels/tsa/ardl/model.py
+++ b/statsmodels/tsa/ardl/model.py
@@ -1,16 +1,30 @@
 from __future__ import annotations
+
 from statsmodels.compat.pandas import Appender, Substitution, call_cached_func
 from statsmodels.compat.python import Literal
+
 from collections import defaultdict
 import datetime as dt
 from itertools import combinations, product
 import textwrap
 from types import SimpleNamespace
-from typing import TYPE_CHECKING, Any, Dict, Hashable, Mapping, NamedTuple, Optional, Sequence, Union
+from typing import (
+    TYPE_CHECKING,
+    Any,
+    Dict,
+    Hashable,
+    Mapping,
+    NamedTuple,
+    Optional,
+    Sequence,
+    Union,
+)
 import warnings
+
 import numpy as np
 import pandas as pd
 from scipy import stats
+
 from statsmodels.base.data import PandasData
 import statsmodels.base.wrapper as wrap
 from statsmodels.iolib.summary import Summary, summary_params
@@ -18,19 +32,43 @@ from statsmodels.regression.linear_model import OLS
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.docstring import Docstring, Parameter, remove_parameters
 from statsmodels.tools.sm_exceptions import SpecificationWarning
-from statsmodels.tools.typing import ArrayLike1D, ArrayLike2D, Float64Array, NDArray
-from statsmodels.tools.validation import array_like, bool_like, float_like, int_like
-from statsmodels.tsa.ar_model import AROrderSelectionResults, AutoReg, AutoRegResults, sumofsq
+from statsmodels.tools.typing import (
+    ArrayLike1D,
+    ArrayLike2D,
+    Float64Array,
+    NDArray,
+)
+from statsmodels.tools.validation import (
+    array_like,
+    bool_like,
+    float_like,
+    int_like,
+)
+from statsmodels.tsa.ar_model import (
+    AROrderSelectionResults,
+    AutoReg,
+    AutoRegResults,
+    sumofsq,
+)
 from statsmodels.tsa.ardl import pss_critical_values
 from statsmodels.tsa.arima_process import arma2ma
 from statsmodels.tsa.base import tsa_model
 from statsmodels.tsa.base.prediction import PredictionResults
 from statsmodels.tsa.deterministic import DeterministicProcess
 from statsmodels.tsa.tsatools import lagmat
+
 if TYPE_CHECKING:
     import matplotlib.figure
-__all__ = ['ARDL', 'ARDLResults', 'ardl_select_order',
-    'ARDLOrderSelectionResults', 'UECM', 'UECMResults', 'BoundsTestResult']
+
+__all__ = [
+    "ARDL",
+    "ARDLResults",
+    "ardl_select_order",
+    "ARDLOrderSelectionResults",
+    "UECM",
+    "UECMResults",
+    "BoundsTestResult",
+]


 class BoundsTestResult(NamedTuple):
@@ -41,23 +79,114 @@ class BoundsTestResult(NamedTuple):
     alternative: str

     def __repr__(self):
-        return f"""{self.__class__.__name__}
+        return f"""\
+{self.__class__.__name__}
 Stat: {self.stat:0.5f}
-Upper P-value: {self.p_values['upper']:0.3g}
-Lower P-value: {self.p_values['lower']:0.3g}
+Upper P-value: {self.p_values["upper"]:0.3g}
+Lower P-value: {self.p_values["lower"]:0.3g}
 Null: {self.null}
 Alternative: {self.alternative}
 """


 _UECMOrder = Union[None, int, Dict[Hashable, Optional[int]]]
-_ARDLOrder = Union[None, int, _UECMOrder, Sequence[int], Dict[Hashable,
-    Union[int, Sequence[int], None]]]
-_INT_TYPES = int, np.integer
+
+_ARDLOrder = Union[
+    None,
+    int,
+    _UECMOrder,
+    Sequence[int],
+    Dict[Hashable, Union[int, Sequence[int], None]],
+]
+
+_INT_TYPES = (int, np.integer)
+
+
+def _check_order(order: int | Sequence[int] | None, causal: bool) -> bool:
+    if order is None:
+        return True
+    if isinstance(order, (int, np.integer)):
+        if int(order) < int(causal):
+            raise ValueError(
+                f"integer orders must be at least {int(causal)} when causal "
+                f"is {causal}."
+            )
+        return True
+    for v in order:
+        if not isinstance(v, (int, np.integer)):
+            raise TypeError(
+                "sequence orders must contain non-negative integer values"
+            )
+    order = [int(v) for v in order]
+    if len(set(order)) != len(order) or min(order) < 0:
+        raise ValueError(
+            "sequence orders must contain distinct non-negative values"
+        )
+    if int(causal) and min(order) < 1:
+        raise ValueError(
+            "sequence orders must be strictly positive when causal is True"
+        )
+    return True
+
+
+def _format_order(
+    exog: ArrayLike2D, order: _ARDLOrder, causal: bool
+) -> dict[Hashable, list[int]]:
+    keys: list[Hashable]
+    exog_order: dict[Hashable, int | Sequence[int] | None]
+    if exog is None and order in (0, None):
+        return {}
+    if not isinstance(exog, pd.DataFrame):
+        exog = array_like(exog, "exog", ndim=2, maxdim=2)
+        keys = list(range(exog.shape[1]))
+    else:
+        keys = [col for col in exog.columns]
+    if order is None:
+        exog_order = {k: None for k in keys}
+    elif isinstance(order, Mapping):
+        exog_order = order
+        missing = set(keys).difference(order.keys())
+        extra = set(order.keys()).difference(keys)
+        if extra:
+            msg = (
+                "order dictionary contains keys for exogenous "
+                "variable(s) that are not contained in exog"
+            )
+            msg += " Extra keys: "
+            msg += ", ".join(list(sorted([str(v) for v in extra]))) + "."
+            raise ValueError(msg)
+        if missing:
+            msg = (
+                "exog contains variables that are missing from the order "
+                "dictionary.  Missing keys: "
+            )
+            msg += ", ".join([str(k) for k in missing]) + "."
+            warnings.warn(msg, SpecificationWarning, stacklevel=2)
+
+        for key in exog_order:
+            _check_order(exog_order[key], causal)
+    elif isinstance(order, _INT_TYPES):
+        _check_order(order, causal)
+        exog_order = {k: int(order) for k in keys}
+    else:
+        _check_order(order, causal)
+        exog_order = {k: list(order) for k in keys}
+    final_order: dict[Hashable, list[int]] = {}
+    for key in exog_order:
+        value = exog_order[key]
+        if value is None:
+            continue
+        assert value is not None
+        if isinstance(value, int):
+            final_order[key] = list(range(int(causal), value + 1))
+        else:
+            final_order[key] = [int(lag) for lag in value]
+
+    return final_order


 class ARDL(AutoReg):
-    """
+    r"""
     Autoregressive Distributed Lag (ARDL) Model

     Parameters
@@ -122,19 +251,19 @@ class ARDL(AutoReg):

     .. math ::

-       Y_t = \\delta_0 + \\delta_1 t + \\delta_2 t^2
-             + \\sum_{i=1}^{s-1} \\gamma_i I_{[(\\mod(t,s) + 1) = i]}
-             + \\sum_{j=1}^p \\phi_j Y_{t-j}
-             + \\sum_{l=1}^k \\sum_{m=0}^{o_l} \\beta_{l,m} X_{l, t-m}
-             + Z_t \\lambda
-             + \\epsilon_t
+       Y_t = \delta_0 + \delta_1 t + \delta_2 t^2
+             + \sum_{i=1}^{s-1} \gamma_i I_{[(\mod(t,s) + 1) = i]}
+             + \sum_{j=1}^p \phi_j Y_{t-j}
+             + \sum_{l=1}^k \sum_{m=0}^{o_l} \beta_{l,m} X_{l, t-m}
+             + Z_t \lambda
+             + \epsilon_t

-    where :math:`\\delta_\\bullet` capture trends, :math:`\\gamma_\\bullet`
+    where :math:`\delta_\bullet` capture trends, :math:`\gamma_\bullet`
     capture seasonal shifts, s is the period of the seasonality, p is the
     lag length of the endogenous variable, k is the number of exogenous
     variables :math:`X_{l}`, :math:`o_l` is the included lag length of
     :math:`X_{l}`, :math:`Z_t` are ``r`` included fixed regressors and
-    :math:`\\epsilon_t` is a white noise shock. If ``causal`` is ``True``,
+    :math:`\epsilon_t` is a white noise shock. If ``causal`` is ``True``,
     then the 0-th lag of the exogenous variables is not included and the
     sum starts at ``m=1``.

@@ -189,43 +318,73 @@ class ARDL(AutoReg):
     >>> ARDL(lrma, 3, exoga, {0: [0, 1], 1: [0, 1, 3], 2: 2})
     """

-    def __init__(self, endog: (Sequence[float] | pd.Series | ArrayLike2D),
-        lags: (int | Sequence[int] | None), exog: (ArrayLike2D | None)=None,
-        order: _ARDLOrder=0, trend: Literal['n', 'c', 'ct', 'ctt']='c', *,
-        fixed: (ArrayLike2D | None)=None, causal: bool=False, seasonal:
-        bool=False, deterministic: (DeterministicProcess | None)=None,
-        hold_back: (int | None)=None, period: (int | None)=None, missing:
-        Literal['none', 'drop', 'raise']='none') ->None:
+    def __init__(
+        self,
+        endog: Sequence[float] | pd.Series | ArrayLike2D,
+        lags: int | Sequence[int] | None,
+        exog: ArrayLike2D | None = None,
+        order: _ARDLOrder = 0,
+        trend: Literal["n", "c", "ct", "ctt"] = "c",
+        *,
+        fixed: ArrayLike2D | None = None,
+        causal: bool = False,
+        seasonal: bool = False,
+        deterministic: DeterministicProcess | None = None,
+        hold_back: int | None = None,
+        period: int | None = None,
+        missing: Literal["none", "drop", "raise"] = "none",
+    ) -> None:
         self._x = np.empty((0, 0))
         self._y = np.empty((0,))
-        super().__init__(endog, lags, trend=trend, seasonal=seasonal, exog=
-            exog, hold_back=hold_back, period=period, missing=missing,
-            deterministic=deterministic, old_names=False)
-        self._causal = bool_like(causal, 'causal', strict=True)
+
+        super().__init__(
+            endog,
+            lags,
+            trend=trend,
+            seasonal=seasonal,
+            exog=exog,
+            hold_back=hold_back,
+            period=period,
+            missing=missing,
+            deterministic=deterministic,
+            old_names=False,
+        )
+        # Reset hold back which was set in AutoReg.__init__
+        self._causal = bool_like(causal, "causal", strict=True)
         self.data.orig_fixed = fixed
         if fixed is not None:
-            fixed_arr = array_like(fixed, 'fixed', ndim=2, maxdim=2)
-            if fixed_arr.shape[0] != self.data.endog.shape[0] or not np.all(np
-                .isfinite(fixed_arr)):
+            fixed_arr = array_like(fixed, "fixed", ndim=2, maxdim=2)
+            if fixed_arr.shape[0] != self.data.endog.shape[0] or not np.all(
+                np.isfinite(fixed_arr)
+            ):
                 raise ValueError(
-                    'fixed must be an (nobs, m) array where nobs matches the number of observations in the endog variable, and allvalues must be finite'
-                    )
+                    "fixed must be an (nobs, m) array where nobs matches the "
+                    "number of observations in the endog variable, and all"
+                    "values must be finite"
+                )
             if isinstance(fixed, pd.DataFrame):
                 self._fixed_names = list(fixed.columns)
             else:
-                self._fixed_names = [f'z.{i}' for i in range(fixed_arr.
-                    shape[1])]
+                self._fixed_names = [
+                    f"z.{i}" for i in range(fixed_arr.shape[1])
+                ]
             self._fixed = fixed_arr
         else:
             self._fixed = np.empty((self.data.endog.shape[0], 0))
             self._fixed_names = []
+
         self._blocks: dict[str, np.ndarray] = {}
         self._names: dict[str, Sequence[str]] = {}
+
+        # 1. Check and update order
         self._order = self._check_order(order)
+        # 2. Construct Regressors
         self._y, self._x = self._construct_regressors(hold_back)
+        # 3. Construct variable names
         self._endog_name, self._exog_names = self._construct_variable_names()
         self.data.param_names = self.data.xnames = self._exog_names
         self.data.ynames = self._endog_name
+
         self._causal = True
         if self._order:
             min_lags = [min(val) for val in self._order.values()]
@@ -234,46 +393,99 @@ class ARDL(AutoReg):
         self._results_wrapper = ARDLResultsWrapper

     @property
-    def fixed(self) ->(NDArray | pd.DataFrame | None):
+    def fixed(self) -> NDArray | pd.DataFrame | None:
         """The fixed data used to construct the model"""
-        pass
+        return self.data.orig_fixed

     @property
-    def causal(self) ->bool:
+    def causal(self) -> bool:
         """Flag indicating that the ARDL is causal"""
-        pass
+        return self._causal

     @property
-    def ar_lags(self) ->(list[int] | None):
+    def ar_lags(self) -> list[int] | None:
         """The autoregressive lags included in the model"""
-        pass
+        return None if not self._lags else self._lags

     @property
-    def dl_lags(self) ->dict[Hashable, list[int]]:
+    def dl_lags(self) -> dict[Hashable, list[int]]:
         """The lags of exogenous variables included in the model"""
-        pass
+        return self._order

     @property
-    def ardl_order(self) ->tuple[int, ...]:
+    def ardl_order(self) -> tuple[int, ...]:
         """The order of the ARDL(p,q)"""
-        pass
-
-    def _setup_regressors(self) ->None:
+        ar_order = 0 if not self._lags else int(max(self._lags))
+        ardl_order = [ar_order]
+        for lags in self._order.values():
+            if lags is not None:
+                ardl_order.append(int(max(lags)))
+        return tuple(ardl_order)
+
+    def _setup_regressors(self) -> None:
         """Place holder to let AutoReg init complete"""
-        pass
+        self._y = np.empty((self.endog.shape[0] - self._hold_back, 0))

     @staticmethod
-    def _format_exog(exog: ArrayLike2D, order: dict[Hashable, list[int]]
-        ) ->dict[Hashable, np.ndarray]:
+    def _format_exog(
+        exog: ArrayLike2D, order: dict[Hashable, list[int]]
+    ) -> dict[Hashable, np.ndarray]:
         """Transform exogenous variables and orders to regressors"""
-        pass
+        if not order:
+            return {}
+        max_order = 0
+        for val in order.values():
+            if val is not None:
+                max_order = max(max(val), max_order)
+        if not isinstance(exog, pd.DataFrame):
+            exog = array_like(exog, "exog", ndim=2, maxdim=2)
+        exog_lags = {}
+        for key in order:
+            if order[key] is None:
+                continue
+            if isinstance(exog, np.ndarray):
+                assert isinstance(key, int)
+                col = exog[:, key]
+            else:
+                col = exog[key]
+            lagged_col = lagmat(col, max_order, original="in")
+            lags = order[key]
+            exog_lags[key] = lagged_col[:, lags]
+        return exog_lags

-    def _check_order(self, order: _ARDLOrder) ->dict[Hashable, list[int]]:
+    def _check_order(self, order: _ARDLOrder) -> dict[Hashable, list[int]]:
         """Validate and standardize the model order"""
-        pass
-
-    def fit(self, *, cov_type: str='nonrobust', cov_kwds: dict[str, Any]=
-        None, use_t: bool=True) ->ARDLResults:
+        return _format_order(self.data.orig_exog, order, self._causal)
+
+    def _fit(
+        self,
+        cov_type: str = "nonrobust",
+        cov_kwds: dict[str, Any] = None,
+        use_t: bool = True,
+    ) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
+        if self._x.shape[1] == 0:
+            return np.empty((0,)), np.empty((0, 0)), np.empty((0, 0))
+        ols_mod = OLS(self._y, self._x)
+        ols_res = ols_mod.fit(
+            cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t
+        )
+        cov_params = ols_res.cov_params()
+        use_t = ols_res.use_t
+        if cov_type == "nonrobust" and not use_t:
+            nobs = self._y.shape[0]
+            k = self._x.shape[1]
+            scale = nobs / (nobs - k)
+            cov_params /= scale
+
+        return ols_res.params, cov_params, ols_res.normalized_cov_params
+
+    def fit(
+        self,
+        *,
+        cov_type: str = "nonrobust",
+        cov_kwds: dict[str, Any] = None,
+        use_t: bool = True,
+    ) -> ARDLResults:
         """
         Estimate the model parameters.

@@ -330,29 +542,160 @@ class ARDL(AutoReg):
         Use ``OLS`` to estimate model parameters and to estimate parameter
         covariance.
         """
-        pass
-
-    def _construct_regressors(self, hold_back: (int | None)) ->tuple[np.
-        ndarray, np.ndarray]:
+        params, cov_params, norm_cov_params = self._fit(
+            cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t
+        )
+        res = ARDLResults(
+            self, params, cov_params, norm_cov_params, use_t=use_t
+        )
+        return ARDLResultsWrapper(res)
+
+    def _construct_regressors(
+        self, hold_back: int | None
+    ) -> tuple[np.ndarray, np.ndarray]:
         """Construct and format model regressors"""
-        pass
+        # TODO: Missing adjustment
+        self._maxlag = max(self._lags) if self._lags else 0
+        _endog_reg, _endog = lagmat(
+            self.data.endog, self._maxlag, original="sep"
+        )
+        assert isinstance(_endog, np.ndarray)
+        assert isinstance(_endog_reg, np.ndarray)
+        self._endog_reg, self._endog = _endog_reg, _endog
+        if self._endog_reg.shape[1] != len(self._lags):
+            lag_locs = [lag - 1 for lag in self._lags]
+            self._endog_reg = self._endog_reg[:, lag_locs]
+
+        orig_exog = self.data.orig_exog
+        self._exog = self._format_exog(orig_exog, self._order)
+
+        exog_maxlag = 0
+        for val in self._order.values():
+            exog_maxlag = max(exog_maxlag, max(val) if val is not None else 0)
+        self._maxlag = max(self._maxlag, exog_maxlag)
+
+        self._deterministic_reg = self._deterministics.in_sample()
+        self._blocks = {
+            "endog": self._endog_reg,
+            "exog": self._exog,
+            "deterministic": self._deterministic_reg,
+            "fixed": self._fixed,
+        }
+        x = [self._deterministic_reg, self._endog_reg]
+        x += [ex for ex in self._exog.values()] + [self._fixed]
+        reg = np.column_stack(x)
+        if hold_back is None:
+            self._hold_back = int(self._maxlag)
+        if self._hold_back < self._maxlag:
+            raise ValueError(
+                "hold_back must be >= the maximum lag of the endog and exog "
+                "variables"
+            )
+        reg = reg[self._hold_back :]
+        if reg.shape[1] > reg.shape[0]:
+            raise ValueError(
+                f"The number of regressors ({reg.shape[1]}) including "
+                "deterministics, lags of the endog, lags of the exogenous, "
+                "and fixed regressors is larger than the sample available "
+                f"for estimation ({reg.shape[0]})."
+            )
+        return self.data.endog[self._hold_back :], reg

     def _construct_variable_names(self):
         """Construct model variables names"""
-        pass
-
-    def _forecasting_x(self, start: int, end: int, num_oos: int, exog: (
-        ArrayLike2D | None), exog_oos: (ArrayLike2D | None), fixed: (
-        ArrayLike2D | None), fixed_oos: (ArrayLike2D | None)) ->np.ndarray:
+        y_name = self.data.ynames
+        endog_lag_names = [f"{y_name}.L{i}" for i in self._lags]
+
+        exog = self.data.orig_exog
+        exog_names = {}
+        for key in self._order:
+            if isinstance(exog, np.ndarray):
+                base = f"x{key}"
+            else:
+                base = str(key)
+            lags = self._order[key]
+            exog_names[key] = [f"{base}.L{lag}" for lag in lags]
+
+        self._names = {
+            "endog": endog_lag_names,
+            "exog": exog_names,
+            "deterministic": self._deterministic_reg.columns,
+            "fixed": self._fixed_names,
+        }
+        x_names = list(self._deterministic_reg.columns)
+        x_names += endog_lag_names
+        for key in exog_names:
+            x_names += exog_names[key]
+        x_names += self._fixed_names
+        return y_name, x_names
+
+    def _forecasting_x(
+        self,
+        start: int,
+        end: int,
+        num_oos: int,
+        exog: ArrayLike2D | None,
+        exog_oos: ArrayLike2D | None,
+        fixed: ArrayLike2D | None,
+        fixed_oos: ArrayLike2D | None,
+    ) -> np.ndarray:
         """Construct exog matrix for forecasts"""
-        pass
-
-    def predict(self, params: ArrayLike1D, start: (int | str | dt.datetime |
-        pd.Timestamp | None)=None, end: (int | str | dt.datetime | pd.
-        Timestamp | None)=None, dynamic: bool=False, exog: (NDArray | pd.
-        DataFrame | None)=None, exog_oos: (NDArray | pd.DataFrame | None)=
-        None, fixed: (NDArray | pd.DataFrame | None)=None, fixed_oos: (
-        NDArray | pd.DataFrame | None)=None):
+
+        def pad_x(x: np.ndarray, pad: int) -> np.ndarray:
+            if pad == 0:
+                return x
+            k = x.shape[1]
+            return np.vstack([np.full((pad, k), np.nan), x])
+
+        pad = 0 if start >= self._hold_back else self._hold_back - start
+        # Shortcut if all in-sample and no new data
+
+        if (end + 1) < self.endog.shape[0] and exog is None and fixed is None:
+            adjusted_start = max(start - self._hold_back, 0)
+            return pad_x(
+                self._x[adjusted_start : end + 1 - self._hold_back], pad
+            )
+
+        # If anything changed, rebuild x array
+        exog = self.data.exog if exog is None else np.asarray(exog)
+        if exog_oos is not None:
+            exog = np.vstack([exog, np.asarray(exog_oos)[:num_oos]])
+        fixed = self._fixed if fixed is None else np.asarray(fixed)
+        if fixed_oos is not None:
+            fixed = np.vstack([fixed, np.asarray(fixed_oos)[:num_oos]])
+        det = self._deterministics.in_sample()
+        if num_oos:
+            oos_det = self._deterministics.out_of_sample(num_oos)
+            det = pd.concat([det, oos_det], axis=0)
+        endog = self.data.endog
+        if num_oos:
+            endog = np.hstack([endog, np.full(num_oos, np.nan)])
+        x = [det]
+        if self._lags:
+            endog_reg = lagmat(endog, max(self._lags), original="ex")
+            x.append(endog_reg[:, [lag - 1 for lag in self._lags]])
+        if self.ardl_order[1:]:
+            if isinstance(self.data.orig_exog, pd.DataFrame):
+                exog = pd.DataFrame(exog, columns=self.data.orig_exog.columns)
+            exog = self._format_exog(exog, self._order)
+            x.extend([np.asarray(arr) for arr in exog.values()])
+        if fixed.shape[1] > 0:
+            x.append(fixed)
+        _x = np.column_stack(x)
+        _x[: self._hold_back] = np.nan
+        return _x[start:]
+
+    def predict(
+        self,
+        params: ArrayLike1D,
+        start: int | str | dt.datetime | pd.Timestamp | None = None,
+        end: int | str | dt.datetime | pd.Timestamp | None = None,
+        dynamic: bool = False,
+        exog: NDArray | pd.DataFrame | None = None,
+        exog_oos: NDArray | pd.DataFrame | None = None,
+        fixed: NDArray | pd.DataFrame | None = None,
+        fixed_oos: NDArray | pd.DataFrame | None = None,
+    ):
         """
         In-sample prediction and out-of-sample forecasting.

@@ -404,15 +747,131 @@ class ARDL(AutoReg):
             Array of out of in-sample predictions and / or out-of-sample
             forecasts.
         """
-        pass
+        params, exog, exog_oos, start, end, num_oos = self._prepare_prediction(
+            params, exog, exog_oos, start, end
+        )
+
+        def check_exog(arr, name, orig, exact):
+            if isinstance(orig, pd.DataFrame):
+                if not isinstance(arr, pd.DataFrame):
+                    raise TypeError(
+                        f"{name} must be a DataFrame when the original exog "
+                        "was a DataFrame"
+                    )
+                if sorted(arr.columns) != sorted(self.data.orig_exog.columns):
+                    raise ValueError(
+                        f"{name} must have the same columns as the original "
+                        "exog"
+                    )
+            else:
+                arr = array_like(arr, name, ndim=2, optional=False)
+            if arr.ndim != 2 or arr.shape[1] != orig.shape[1]:
+                raise ValueError(
+                    f"{name} must have the same number of columns as the "
+                    f"original data, {orig.shape[1]}"
+                )
+            if exact and arr.shape[0] != orig.shape[0]:
+                raise ValueError(
+                    f"{name} must have the same number of rows as the "
+                    f"original data ({n})."
+                )
+            return arr
+
+        n = self.data.endog.shape[0]
+        if exog is not None:
+            exog = check_exog(exog, "exog", self.data.orig_exog, True)
+        if exog_oos is not None:
+            exog_oos = check_exog(
+                exog_oos, "exog_oos", self.data.orig_exog, False
+            )
+        if fixed is not None:
+            fixed = check_exog(fixed, "fixed", self._fixed, True)
+        if fixed_oos is not None:
+            fixed_oos = check_exog(
+                np.asarray(fixed_oos), "fixed_oos", self._fixed, False
+            )
+        # The maximum number of 1-step predictions that can be made,
+        # which depends on the model and lags
+        if self._fixed.shape[1] or not self._causal:
+            max_1step = 0
+        else:
+            max_1step = np.inf if not self._lags else min(self._lags)
+            if self._order:
+                min_exog = min([min(v) for v in self._order.values()])
+                max_1step = min(max_1step, min_exog)
+        if num_oos > max_1step:
+            if self._order and exog_oos is None:
+                raise ValueError(
+                    "exog_oos must be provided when out-of-sample "
+                    "observations require values of the exog not in the "
+                    "original sample"
+                )
+            elif self._order and (exog_oos.shape[0] + max_1step) < num_oos:
+                raise ValueError(
+                    f"exog_oos must have at least {num_oos - max_1step} "
+                    f"observations to produce {num_oos} forecasts based on "
+                    "the model specification."
+                )
+
+            if self._fixed.shape[1] and fixed_oos is None:
+                raise ValueError(
+                    "fixed_oos must be provided when predicting "
+                    "out-of-sample observations"
+                )
+            elif self._fixed.shape[1] and fixed_oos.shape[0] < num_oos:
+                raise ValueError(
+                    f"fixed_oos must have at least {num_oos} observations "
+                    f"to produce {num_oos} forecasts."
+                )
+        # Extend exog_oos if fcast is valid for horizon but no exog_oos given
+        if self.exog is not None and exog_oos is None and num_oos:
+            exog_oos = np.full((num_oos, self.exog.shape[1]), np.nan)
+            if isinstance(self.data.orig_exog, pd.DataFrame):
+                exog_oos = pd.DataFrame(
+                    exog_oos, columns=self.data.orig_exog.columns
+                )
+        x = self._forecasting_x(
+            start, end, num_oos, exog, exog_oos, fixed, fixed_oos
+        )
+        if dynamic is False:
+            dynamic_start = end + 1 - start
+        else:
+            dynamic_step = self._parse_dynamic(dynamic, start)
+            dynamic_start = dynamic_step
+            if start < self._hold_back:
+                dynamic_start = max(dynamic_start, self._hold_back - start)
+
+        fcasts = np.full(x.shape[0], np.nan)
+        fcasts[:dynamic_start] = x[:dynamic_start] @ params
+        offset = self._deterministic_reg.shape[1]
+        for i in range(dynamic_start, fcasts.shape[0]):
+            for j, lag in enumerate(self._lags):
+                loc = i - lag
+                if loc >= dynamic_start:
+                    val = fcasts[loc]
+                else:
+                    # Actual data
+                    val = self.endog[start + loc]
+                x[i, offset + j] = val
+            fcasts[i] = x[i] @ params
+        return self._wrap_prediction(fcasts, start, end + 1 + num_oos, 0)

     @classmethod
-    def from_formula(cls, formula: str, data: pd.DataFrame, lags: (int |
-        Sequence[int] | None)=0, order: _ARDLOrder=0, trend: Literal['n',
-        'c', 'ct', 'ctt']='n', *, causal: bool=False, seasonal: bool=False,
-        deterministic: (DeterministicProcess | None)=None, hold_back: (int |
-        None)=None, period: (int | None)=None, missing: Literal['none',
-        'raise']='none') ->(ARDL | 'UECM'):
+    def from_formula(
+        cls,
+        formula: str,
+        data: pd.DataFrame,
+        lags: int | Sequence[int] | None = 0,
+        order: _ARDLOrder = 0,
+        trend: Literal["n", "c", "ct", "ctt"] = "n",
+        *,
+        causal: bool = False,
+        seasonal: bool = False,
+        deterministic: DeterministicProcess | None = None,
+        hold_back: int | None = None,
+        period: int | None = None,
+        missing: Literal["none", "raise"] = "none",
+    ) -> ARDL | "UECM":
         """
         Construct an ARDL from a formula

@@ -490,12 +949,44 @@ class ARDL(AutoReg):

         >>> mod = ARDL.from_formula("lrm ~ ibo | ide", data, 2, 2)
         """
-        pass
+        index = data.index
+        fixed_formula = None
+        if "|" in formula:
+            formula, fixed_formula = formula.split("|")
+            fixed_formula = fixed_formula.strip()
+        mod = OLS.from_formula(formula + " -1", data)
+        exog = mod.data.orig_exog
+        exog.index = index
+        endog = mod.data.orig_endog
+        endog.index = index
+        if fixed_formula is not None:
+            endog_name = formula.split("~")[0].strip()
+            fixed_formula = f"{endog_name} ~ {fixed_formula} - 1"
+            mod = OLS.from_formula(fixed_formula, data)
+            fixed: pd.DataFrame | None = mod.data.orig_exog
+            fixed.index = index
+        else:
+            fixed = None
+        return cls(
+            endog,
+            lags,
+            exog,
+            order,
+            trend=trend,
+            fixed=fixed,
+            causal=causal,
+            seasonal=seasonal,
+            deterministic=deterministic,
+            hold_back=hold_back,
+            period=period,
+            missing=missing,
+        )


 doc = Docstring(ARDL.predict.__doc__)
-_predict_params = doc.extract_parameters(['start', 'end', 'dynamic', 'exog',
-    'exog_oos', 'fixed', 'fixed_oos'], 8)
+_predict_params = doc.extract_parameters(
+    ["start", "end", "dynamic", "exog", "exog_oos", "fixed", "fixed_oos"], 8
+)


 class ARDLResults(AutoRegResults):
@@ -518,13 +1009,21 @@ class ARDLResults(AutoRegResults):
     use_t : bool
         Whether use_t was set in fit
     """
-    _cache = {}

-    def __init__(self, model: ARDL, params: np.ndarray, cov_params: np.
-        ndarray, normalized_cov_params: (Float64Array | None)=None, scale:
-        float=1.0, use_t: bool=False):
-        super().__init__(model, params, normalized_cov_params, scale, use_t
-            =use_t)
+    _cache = {}  # for scale setter
+
+    def __init__(
+        self,
+        model: ARDL,
+        params: np.ndarray,
+        cov_params: np.ndarray,
+        normalized_cov_params: Float64Array | None = None,
+        scale: float = 1.0,
+        use_t: bool = False,
+    ):
+        super().__init__(
+            model, params, normalized_cov_params, scale, use_t=use_t
+        )
         self._cache = {}
         self._params = params
         self._nobs = model.nobs
@@ -537,9 +1036,34 @@ class ARDLResults(AutoRegResults):
         self._hold_back = self.model.hold_back
         self.cov_params_default = cov_params

-    def forecast(self, steps: int=1, exog: (NDArray | pd.DataFrame | None)=
-        None, fixed: (NDArray | pd.DataFrame | None)=None) ->(np.ndarray |
-        pd.Series):
+    @Appender(remove_parameters(ARDL.predict.__doc__, "params"))
+    def predict(
+        self,
+        start: int | str | dt.datetime | pd.Timestamp | None = None,
+        end: int | str | dt.datetime | pd.Timestamp | None = None,
+        dynamic: bool = False,
+        exog: NDArray | pd.DataFrame | None = None,
+        exog_oos: NDArray | pd.DataFrame | None = None,
+        fixed: NDArray | pd.DataFrame | None = None,
+        fixed_oos: NDArray | pd.DataFrame | None = None,
+    ):
+        return self.model.predict(
+            self._params,
+            start=start,
+            end=end,
+            dynamic=dynamic,
+            exog=exog,
+            exog_oos=exog_oos,
+            fixed=fixed,
+            fixed_oos=fixed_oos,
+        )
+
+    def forecast(
+        self,
+        steps: int = 1,
+        exog: NDArray | pd.DataFrame | None = None,
+        fixed: NDArray | pd.DataFrame | None = None,
+    ) -> np.ndarray | pd.Series:
         """
         Out-of-sample forecasts

@@ -570,18 +1094,37 @@ class ARDLResults(AutoRegResults):
         ARDLResults.get_prediction
             In- and out-of-sample predictions and confidence intervals
         """
-        pass
+        start = self.model.data.orig_endog.shape[0]
+        if isinstance(steps, (int, np.integer)):
+            end = start + steps - 1
+        else:
+            end = steps
+        return self.predict(
+            start=start, end=end, dynamic=False, exog_oos=exog, fixed_oos=fixed
+        )

-    def _lag_repr(self) ->np.ndarray:
+    def _lag_repr(self) -> np.ndarray:
         """Returns poly repr of an AR, (1  -phi1 L -phi2 L^2-...)"""
-        pass
-
-    def get_prediction(self, start: (int | str | dt.datetime | pd.Timestamp |
-        None)=None, end: (int | str | dt.datetime | pd.Timestamp | None)=
-        None, dynamic: bool=False, exog: (NDArray | pd.DataFrame | None)=
-        None, exog_oos: (NDArray | pd.DataFrame | None)=None, fixed: (
-        NDArray | pd.DataFrame | None)=None, fixed_oos: (NDArray | pd.
-        DataFrame | None)=None) ->(np.ndarray | pd.Series):
+        ar_lags = self._ar_lags if self._ar_lags is not None else []
+        k_ar = len(ar_lags)
+        ar_params = np.zeros(self._max_lag + 1)
+        ar_params[0] = 1
+        offset = self.model._deterministic_reg.shape[1]
+        params = self._params[offset : offset + k_ar]
+        for i, lag in enumerate(ar_lags):
+            ar_params[lag] = -params[i]
+        return ar_params
+
+    def get_prediction(
+        self,
+        start: int | str | dt.datetime | pd.Timestamp | None = None,
+        end: int | str | dt.datetime | pd.Timestamp | None = None,
+        dynamic: bool = False,
+        exog: NDArray | pd.DataFrame | None = None,
+        exog_oos: NDArray | pd.DataFrame | None = None,
+        fixed: NDArray | pd.DataFrame | None = None,
+        fixed_oos: NDArray | pd.DataFrame | None = None,
+    ) -> np.ndarray | pd.Series:
         """
         Predictions and prediction intervals

@@ -630,23 +1173,49 @@ class ARDLResults(AutoRegResults):
         PredictionResults
             Prediction results with mean and prediction intervals
         """
-        pass
+        mean = self.predict(
+            start=start,
+            end=end,
+            dynamic=dynamic,
+            exog=exog,
+            exog_oos=exog_oos,
+            fixed=fixed,
+            fixed_oos=fixed_oos,
+        )
+        mean_var = np.full_like(mean, fill_value=self.sigma2)
+        mean_var[np.isnan(mean)] = np.nan
+        start = 0 if start is None else start
+        end = self.model._index[-1] if end is None else end
+        _, _, oos, _ = self.model._get_prediction_index(start, end)
+        if oos > 0:
+            ar_params = self._lag_repr()
+            ma = arma2ma(ar_params, np.ones(1), lags=oos)
+            mean_var[-oos:] = self.sigma2 * np.cumsum(ma**2)
+        if isinstance(mean, pd.Series):
+            mean_var = pd.Series(mean_var, index=mean.index)
+
+        return PredictionResults(mean, mean_var)

     @Substitution(predict_params=_predict_params)
-    def plot_predict(self, start: (int | str | dt.datetime | pd.Timestamp |
-        None)=None, end: (int | str | dt.datetime | pd.Timestamp | None)=
-        None, dynamic: bool=False, exog: (NDArray | pd.DataFrame | None)=
-        None, exog_oos: (NDArray | pd.DataFrame | None)=None, fixed: (
-        NDArray | pd.DataFrame | None)=None, fixed_oos: (NDArray | pd.
-        DataFrame | None)=None, alpha: float=0.05, in_sample: bool=True,
-        fig: 'matplotlib.figure.Figure'=None, figsize: (tuple[int, int] |
-        None)=None) ->'matplotlib.figure.Figure':
+    def plot_predict(
+        self,
+        start: int | str | dt.datetime | pd.Timestamp | None = None,
+        end: int | str | dt.datetime | pd.Timestamp | None = None,
+        dynamic: bool = False,
+        exog: NDArray | pd.DataFrame | None = None,
+        exog_oos: NDArray | pd.DataFrame | None = None,
+        fixed: NDArray | pd.DataFrame | None = None,
+        fixed_oos: NDArray | pd.DataFrame | None = None,
+        alpha: float = 0.05,
+        in_sample: bool = True,
+        fig: "matplotlib.figure.Figure" = None,
+        figsize: tuple[int, int] | None = None,
+    ) -> "matplotlib.figure.Figure":
         """
         Plot in- and out-of-sample predictions

         Parameters
-        ----------
-%(predict_params)s
+        ----------\n%(predict_params)s
         alpha : {float, None}
             The tail probability not covered by the confidence interval. Must
             be in (0, 1). Confidence interval is constructed assuming normally
@@ -666,9 +1235,20 @@ class ARDLResults(AutoRegResults):
         Figure
             Figure handle containing the plot.
         """
-        pass
-
-    def summary(self, alpha: float=0.05) ->Summary:
+        predictions = self.get_prediction(
+            start=start,
+            end=end,
+            dynamic=dynamic,
+            exog=exog,
+            exog_oos=exog_oos,
+            fixed=fixed,
+            fixed_oos=fixed_oos,
+        )
+        return self._plot_predictions(
+            predictions, start, end, alpha, in_sample, fig, figsize
+        )
+
+    def summary(self, alpha: float = 0.05) -> Summary:
         """
         Summarize the Model

@@ -687,16 +1267,60 @@ class ARDLResults(AutoRegResults):
         --------
         statsmodels.iolib.summary.Summary
         """
-        pass
+        model = self.model
+
+        title = model.__class__.__name__ + " Model Results"
+        method = "Conditional MLE"
+        # get sample
+        start = self._hold_back
+        if self.data.dates is not None:
+            dates = self.data.dates
+            sample = [dates[start].strftime("%m-%d-%Y")]
+            sample += ["- " + dates[-1].strftime("%m-%d-%Y")]
+        else:
+            sample = [str(start), str(len(self.data.orig_endog))]
+        model = self.model.__class__.__name__ + str(self.model.ardl_order)
+        if self.model.seasonal:
+            model = "Seas. " + model
+
+        dep_name = str(self.model.endog_names)
+        top_left = [
+            ("Dep. Variable:", [dep_name]),
+            ("Model:", [model]),
+            ("Method:", [method]),
+            ("Date:", None),
+            ("Time:", None),
+            ("Sample:", [sample[0]]),
+            ("", [sample[1]]),
+        ]
+
+        top_right = [
+            ("No. Observations:", [str(len(self.model.endog))]),
+            ("Log Likelihood", ["%#5.3f" % self.llf]),
+            ("S.D. of innovations", ["%#5.3f" % self.sigma2**0.5]),
+            ("AIC", ["%#5.3f" % self.aic]),
+            ("BIC", ["%#5.3f" % self.bic]),
+            ("HQIC", ["%#5.3f" % self.hqic]),
+        ]
+
+        smry = Summary()
+        smry.add_table_2cols(
+            self, gleft=top_left, gright=top_right, title=title
+        )
+        smry.add_table_params(self, alpha=alpha, use_t=False)
+
+        return smry


 class ARDLResultsWrapper(wrap.ResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(tsa_model.TimeSeriesResultsWrapper.
-        _wrap_attrs, _attrs)
+    _wrap_attrs = wrap.union_dicts(
+        tsa_model.TimeSeriesResultsWrapper._wrap_attrs, _attrs
+    )
     _methods = {}
-    _wrap_methods = wrap.union_dicts(tsa_model.TimeSeriesResultsWrapper.
-        _wrap_methods, _methods)
+    _wrap_methods = wrap.union_dicts(
+        tsa_model.TimeSeriesResultsWrapper._wrap_methods, _methods
+    )


 wrap.populate_wrapper(ARDLResultsWrapper, ARDLResults)
@@ -710,38 +1334,54 @@ class ARDLOrderSelectionResults(AROrderSelectionResults):
     """

     def __init__(self, model, ics, trend, seasonal, period):
-        _ics = ((0,), (0, 0, 0)),
+        _ics = (((0,), (0, 0, 0)),)
         super().__init__(model, _ics, trend, seasonal, period)

         def _to_dict(d):
             return d[0], dict(d[1:])
-        self._aic = pd.Series({v[0]: _to_dict(k) for k, v in ics.items()},
-            dtype=object)
-        self._aic.index.name = self._aic.name = 'AIC'
+
+        self._aic = pd.Series(
+            {v[0]: _to_dict(k) for k, v in ics.items()}, dtype=object
+        )
+        self._aic.index.name = self._aic.name = "AIC"
         self._aic = self._aic.sort_index()
-        self._bic = pd.Series({v[1]: _to_dict(k) for k, v in ics.items()},
-            dtype=object)
-        self._bic.index.name = self._bic.name = 'BIC'
+
+        self._bic = pd.Series(
+            {v[1]: _to_dict(k) for k, v in ics.items()}, dtype=object
+        )
+        self._bic.index.name = self._bic.name = "BIC"
         self._bic = self._bic.sort_index()
-        self._hqic = pd.Series({v[2]: _to_dict(k) for k, v in ics.items()},
-            dtype=object)
-        self._hqic.index.name = self._hqic.name = 'HQIC'
+
+        self._hqic = pd.Series(
+            {v[2]: _to_dict(k) for k, v in ics.items()}, dtype=object
+        )
+        self._hqic.index.name = self._hqic.name = "HQIC"
         self._hqic = self._hqic.sort_index()

     @property
-    def dl_lags(self) ->dict[Hashable, list[int]]:
+    def dl_lags(self) -> dict[Hashable, list[int]]:
         """The lags of exogenous variables in the selected model"""
-        pass
-
-
-def ardl_select_order(endog: (ArrayLike1D | ArrayLike2D), maxlag: int, exog:
-    ArrayLike2D, maxorder: (int | dict[Hashable, int]), trend: Literal['n',
-    'c', 'ct', 'ctt']='c', *, fixed: (ArrayLike2D | None)=None, causal:
-    bool=False, ic: Literal['aic', 'bic']='bic', glob: bool=False, seasonal:
-    bool=False, deterministic: (DeterministicProcess | None)=None,
-    hold_back: (int | None)=None, period: (int | None)=None, missing:
-    Literal['none', 'raise']='none') ->ARDLOrderSelectionResults:
-    """
+        return self._model.dl_lags
+
+
+def ardl_select_order(
+    endog: ArrayLike1D | ArrayLike2D,
+    maxlag: int,
+    exog: ArrayLike2D,
+    maxorder: int | dict[Hashable, int],
+    trend: Literal["n", "c", "ct", "ctt"] = "c",
+    *,
+    fixed: ArrayLike2D | None = None,
+    causal: bool = False,
+    ic: Literal["aic", "bic"] = "bic",
+    glob: bool = False,
+    seasonal: bool = False,
+    deterministic: DeterministicProcess | None = None,
+    hold_back: int | None = None,
+    period: int | None = None,
+    missing: Literal["none", "raise"] = "none",
+) -> ARDLOrderSelectionResults:
+    r"""
     ARDL order selection

     Parameters
@@ -812,35 +1452,180 @@ def ardl_select_order(endog: (ArrayLike1D | ArrayLike2D), maxlag: int, exog:
         A results holder containing the selected model and the complete set
         of information criteria for all models fit.
     """
-    pass
+    orig_hold_back = int_like(hold_back, "hold_back", optional=True)
+
+    def compute_ics(y, x, df):
+        if x.shape[1]:
+            resid = y - x @ np.linalg.lstsq(x, y, rcond=None)[0]
+        else:
+            resid = y
+        nobs = resid.shape[0]
+        sigma2 = 1.0 / nobs * sumofsq(resid)
+        llf = -nobs * (np.log(2 * np.pi * sigma2) + 1) / 2
+        res = SimpleNamespace(
+            nobs=nobs, df_model=df + x.shape[1], sigma2=sigma2, llf=llf
+        )
+
+        aic = call_cached_func(ARDLResults.aic, res)
+        bic = call_cached_func(ARDLResults.bic, res)
+        hqic = call_cached_func(ARDLResults.hqic, res)
+
+        return aic, bic, hqic
+
+    base = ARDL(
+        endog,
+        maxlag,
+        exog,
+        maxorder,
+        trend,
+        fixed=fixed,
+        causal=causal,
+        seasonal=seasonal,
+        deterministic=deterministic,
+        hold_back=hold_back,
+        period=period,
+        missing=missing,
+    )
+    hold_back = base.hold_back
+    blocks = base._blocks
+    always = np.column_stack([blocks["deterministic"], blocks["fixed"]])
+    always = always[hold_back:]
+    select = []
+    iter_orders = []
+    select.append(blocks["endog"][hold_back:])
+    iter_orders.append(list(range(blocks["endog"].shape[1] + 1)))
+    var_names = []
+    for var in blocks["exog"]:
+        block = blocks["exog"][var][hold_back:]
+        select.append(block)
+        iter_orders.append(list(range(block.shape[1] + 1)))
+        var_names.append(var)
+    y = base._y
+    if always.shape[1]:
+        pinv_always = np.linalg.pinv(always)
+        for i in range(len(select)):
+            x = select[i]
+            select[i] = x - always @ (pinv_always @ x)
+        y = y - always @ (pinv_always @ y)
+
+    def perm_to_tuple(keys, perm):
+        if perm == ():
+            d = {k: 0 for k, _ in keys if k is not None}
+            return (0,) + tuple((k, v) for k, v in d.items())
+        d = defaultdict(list)
+        y_lags = []
+        for v in perm:
+            key = keys[v]
+            if key[0] is None:
+                y_lags.append(key[1])
+            else:
+                d[key[0]].append(key[1])
+        d = dict(d)
+        if not y_lags or y_lags == [0]:
+            y_lags = 0
+        else:
+            y_lags = tuple(y_lags)
+        for key in keys:
+            if key[0] not in d and key[0] is not None:
+                d[key[0]] = None
+        for key in d:
+            if d[key] is not None:
+                d[key] = tuple(d[key])
+        return (y_lags,) + tuple((k, v) for k, v in d.items())
+
+    always_df = always.shape[1]
+    ics = {}
+    if glob:
+        ar_lags = base.ar_lags if base.ar_lags is not None else []
+        keys = [(None, i) for i in ar_lags]
+        for k, v in base._order.items():
+            keys += [(k, i) for i in v]
+        x = np.column_stack([a for a in select])
+        all_columns = list(range(x.shape[1]))
+        for i in range(x.shape[1]):
+            for perm in combinations(all_columns, i):
+                key = perm_to_tuple(keys, perm)
+                ics[key] = compute_ics(y, x[:, perm], always_df)
+    else:
+        for io in product(*iter_orders):
+            x = np.column_stack([a[:, : io[i]] for i, a in enumerate(select)])
+            key = [io[0] if io[0] else None]
+            for j, val in enumerate(io[1:]):
+                var = var_names[j]
+                if causal:
+                    key.append((var, None if val == 0 else val))
+                else:
+                    key.append((var, val - 1 if val - 1 >= 0 else None))
+            key = tuple(key)
+            ics[key] = compute_ics(y, x, always_df)
+    index = {"aic": 0, "bic": 1, "hqic": 2}[ic]
+    lowest = np.inf
+    for key in ics:
+        val = ics[key][index]
+        if val < lowest:
+            lowest = val
+            selected_order = key
+    exog_order = {k: v for k, v in selected_order[1:]}
+    model = ARDL(
+        endog,
+        selected_order[0],
+        exog,
+        exog_order,
+        trend,
+        fixed=fixed,
+        causal=causal,
+        seasonal=seasonal,
+        deterministic=deterministic,
+        hold_back=orig_hold_back,
+        period=period,
+        missing=missing,
+    )
+
+    return ARDLOrderSelectionResults(model, ics, trend, seasonal, period)
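
A hedged usage sketch of ardl_select_order as implemented above; y and x are hypothetical endog/exog data, and the maxlag/maxorder values of 4 are illustrative:

    from statsmodels.tsa.ardl import ardl_select_order

    sel = ardl_select_order(y, 4, x, 4, trend="c", ic="bic", glob=False)
    print(sel.model.ardl_order)  # (p, q0, q1, ...) of the selected ARDL
    print(sel.dl_lags)           # exog lags retained by the selected model
    res = sel.model.fit()
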


 lags_descr = textwrap.wrap(
-    'The number of lags of the endogenous variable to include in the model. Must be at least 1.'
-    , 71)
-lags_param = Parameter(name='lags', type='int', desc=lags_descr)
+    "The number of lags of the endogenous variable to include in the model. "
+    "Must be at least 1.",
+    71,
+)
+lags_param = Parameter(name="lags", type="int", desc=lags_descr)
 order_descr = textwrap.wrap(
-    'If int, uses lags 0, 1, ..., order  for all exog variables. If a dict, applies the lags series by series. If ``exog`` is anything other than a DataFrame, the keys are the column index of exog (e.g., 0, 1, ...). If a DataFrame, keys are column names.'
-    , 71)
-order_param = Parameter(name='order', type='int, dict', desc=order_descr)
+    "If int, uses lags 0, 1, ..., order  for all exog variables. If a dict, "
+    "applies the lags series by series. If ``exog`` is anything other than a "
+    "DataFrame, the keys are the column index of exog (e.g., 0, 1, ...). If "
+    "a DataFrame, keys are column names.",
+    71,
+)
+order_param = Parameter(name="order", type="int, dict", desc=order_descr)
+
 from_formula_doc = Docstring(ARDL.from_formula.__doc__)
-from_formula_doc.replace_block('Summary', 'Construct an UECM from a formula')
-from_formula_doc.remove_parameters('lags')
-from_formula_doc.remove_parameters('order')
-from_formula_doc.insert_parameters('data', lags_param)
-from_formula_doc.insert_parameters('lags', order_param)
+from_formula_doc.replace_block("Summary", "Construct a UECM from a formula")
+from_formula_doc.remove_parameters("lags")
+from_formula_doc.remove_parameters("order")
+from_formula_doc.insert_parameters("data", lags_param)
+from_formula_doc.insert_parameters("lags", order_param)
+
+
 fit_doc = Docstring(ARDL.fit.__doc__)
-fit_doc.replace_block('Returns', [Parameter('', 'UECMResults', [
-    'Estimation results.'])])
+fit_doc.replace_block(
+    "Returns", [Parameter("", "UECMResults", ["Estimation results."])]
+)
+
 if fit_doc._ds is not None:
-    see_also = fit_doc._ds['See Also']
-    see_also.insert(0, ([('statsmodels.tsa.ardl.ARDL', None)], [
-        'Autoregressive distributed lag model estimation']))
-    fit_doc.replace_block('See Also', see_also)
+    see_also = fit_doc._ds["See Also"]
+    see_also.insert(
+        0,
+        (
+            [("statsmodels.tsa.ardl.ARDL", None)],
+            ["Autoregressive distributed lag model estimation"],
+        ),
+    )
+    fit_doc.replace_block("See Also", see_also)


 class UECM(ARDL):
-    """
+    r"""
     Unconstrained Error Correction Model (UECM)

     Parameters
@@ -903,21 +1688,21 @@ class UECM(ARDL):

     .. math ::

-       \\Delta Y_t = \\delta_0 + \\delta_1 t + \\delta_2 t^2
-             + \\sum_{i=1}^{s-1} \\gamma_i I_{[(\\mod(t,s) + 1) = i]}
-             + \\lambda_0 Y_{t-1} + \\lambda_1 X_{1,t-1} + \\ldots
-             + \\lambda_{k} X_{k,t-1}
-             + \\sum_{j=1}^{p-1} \\phi_j \\Delta Y_{t-j}
-             + \\sum_{l=1}^k \\sum_{m=0}^{o_l-1} \\beta_{l,m} \\Delta X_{l, t-m}
-             + Z_t \\lambda
-             + \\epsilon_t
+       \Delta Y_t = \delta_0 + \delta_1 t + \delta_2 t^2
+             + \sum_{i=1}^{s-1} \gamma_i I_{[(\mod(t,s) + 1) = i]}
+             + \lambda_0 Y_{t-1} + \lambda_1 X_{1,t-1} + \ldots
+             + \lambda_{k} X_{k,t-1}
+             + \sum_{j=1}^{p-1} \phi_j \Delta Y_{t-j}
+             + \sum_{l=1}^k \sum_{m=0}^{o_l-1} \beta_{l,m} \Delta X_{l, t-m}
+             + Z_t \lambda
+             + \epsilon_t

-    where :math:`\\delta_\\bullet` capture trends, :math:`\\gamma_\\bullet`
+    where :math:`\delta_\bullet` capture trends, :math:`\gamma_\bullet`
     capture seasonal shifts, s is the period of the seasonality, p is the
     lag length of the endogenous variable, k is the number of exogenous
     variables :math:`X_{l}`, :math:`o_l` is the included lag length of
     :math:`X_{l}`, :math:`Z_t` are ``r`` included fixed regressors and
-    :math:`\\epsilon_t` is a white noise shock. If ``causal`` is ``True``,
+    :math:`\epsilon_t` is a white noise shock. If ``causal`` is ``True``,
     then the 0-th lag of the exogenous variables is not included and the
     sum starts at ``m=1``.

@@ -961,40 +1746,207 @@ class UECM(ARDL):
     >>> UECM(lrma, 3, exoga, {0: 1, 1: 3, 2: 2})
     """

-    def __init__(self, endog: (ArrayLike1D | ArrayLike2D), lags: (int |
-        None), exog: (ArrayLike2D | None)=None, order: _UECMOrder=0, trend:
-        Literal['n', 'c', 'ct', 'ctt']='c', *, fixed: (ArrayLike2D | None)=
-        None, causal: bool=False, seasonal: bool=False, deterministic: (
-        DeterministicProcess | None)=None, hold_back: (int | None)=None,
-        period: (int | None)=None, missing: Literal['none', 'drop', 'raise'
-        ]='none') ->None:
-        super().__init__(endog, lags, exog, order, trend=trend, fixed=fixed,
-            seasonal=seasonal, causal=causal, hold_back=hold_back, period=
-            period, missing=missing, deterministic=deterministic)
+    def __init__(
+        self,
+        endog: ArrayLike1D | ArrayLike2D,
+        lags: int | None,
+        exog: ArrayLike2D | None = None,
+        order: _UECMOrder = 0,
+        trend: Literal["n", "c", "ct", "ctt"] = "c",
+        *,
+        fixed: ArrayLike2D | None = None,
+        causal: bool = False,
+        seasonal: bool = False,
+        deterministic: DeterministicProcess | None = None,
+        hold_back: int | None = None,
+        period: int | None = None,
+        missing: Literal["none", "drop", "raise"] = "none",
+    ) -> None:
+        super().__init__(
+            endog,
+            lags,
+            exog,
+            order,
+            trend=trend,
+            fixed=fixed,
+            seasonal=seasonal,
+            causal=causal,
+            hold_back=hold_back,
+            period=period,
+            missing=missing,
+            deterministic=deterministic,
+        )
         self._results_class = UECMResults
         self._results_wrapper = UECMResultsWrapper

-    def _check_lags(self, lags: (int | Sequence[int] | None), hold_back: (
-        int | None)) ->tuple[list[int], int]:
+    def _check_lags(
+        self, lags: int | Sequence[int] | None, hold_back: int | None
+    ) -> tuple[list[int], int]:
         """Check lags value conforms to requirement"""
-        pass
+        if not (isinstance(lags, _INT_TYPES) or lags is None):
+            raise TypeError("lags must be an integer or None")
+        return super()._check_lags(lags, hold_back)

     def _check_order(self, order: _ARDLOrder):
         """Check order conforms to requirement"""
-        pass
+        if isinstance(order, Mapping):
+            for k, v in order.items():
+                if not isinstance(v, _INT_TYPES) and v is not None:
+                    raise TypeError(
+                        "order values must be positive integers or None"
+                    )
+        elif not (isinstance(order, _INT_TYPES) or order is None):
+            raise TypeError(
+                "order must be None, a positive integer, or a dict "
+                "containing positive integers or None"
+            )
+        # TODO: Check order is >= 1
+        order = super()._check_order(order)
+        if not order:
+            raise ValueError(
+                "Model must contain at least one exogenous variable"
+            )
+        for key, val in order.items():
+            if val == [0]:
+                raise ValueError(
+                    "All included exog variables must have a lag length >= 1"
+                )
+        return order

     def _construct_variable_names(self):
         """Construct model variables names"""
-        pass
-
-    def _construct_regressors(self, hold_back: (int | None)) ->tuple[np.
-        ndarray, np.ndarray]:
+        endog = self.data.orig_endog
+        if isinstance(endog, pd.Series):
+            y_base = endog.name or "y"
+        elif isinstance(endog, pd.DataFrame):
+            y_base = endog.squeeze().name or "y"
+        else:
+            y_base = "y"
+        y_name = f"D.{y_base}"
+        # 1. Deterministics
+        x_names = list(self._deterministic_reg.columns)
+        # 2. Levels
+        x_names.append(f"{y_base}.L1")
+        orig_exog = self.data.orig_exog
+        exog_pandas = isinstance(orig_exog, pd.DataFrame)
+        dexog_names = []
+        for key, val in self._order.items():
+            if val is not None:
+                if exog_pandas:
+                    x_name = f"{key}.L1"
+                else:
+                    x_name = f"x{key}.L1"
+                x_names.append(x_name)
+                lag_base = x_name[:-1]
+                for lag in val[:-1]:
+                    dexog_names.append(f"D.{lag_base}{lag}")
+        # 3. Lagged endog
+        y_lags = max(self._lags) if self._lags else 0
+        dendog_names = [f"{y_name}.L{lag}" for lag in range(1, y_lags)]
+        x_names.extend(dendog_names)
+        x_names.extend(dexog_names)
+        x_names.extend(self._fixed_names)
+        return y_name, x_names
+
+    def _construct_regressors(
+        self, hold_back: int | None
+    ) -> tuple[np.ndarray, np.ndarray]:
         """Construct and format model regressors"""
-        pass
+        # 1. Endogenous and endogenous lags
+        self._maxlag = max(self._lags) if self._lags else 0
+        dendog = np.full_like(self.data.endog, np.nan)
+        dendog[1:] = np.diff(self.data.endog, axis=0)
+        dlag = max(0, self._maxlag - 1)
+        self._endog_reg, self._endog = lagmat(dendog, dlag, original="sep")
+        # 2. Deterministics
+        self._deterministic_reg = self._deterministics.in_sample()
+        # 3. Levels
+        orig_exog = self.data.orig_exog
+        exog_pandas = isinstance(orig_exog, pd.DataFrame)
+        lvl = np.full_like(self.data.endog, np.nan)
+        lvl[1:] = self.data.endog[:-1]
+        lvls = [lvl.copy()]
+        for key, val in self._order.items():
+            if val is not None:
+                if exog_pandas:
+                    loc = orig_exog.columns.get_loc(key)
+                else:
+                    loc = key
+                lvl[1:] = self.data.exog[:-1, loc]
+                lvls.append(lvl.copy())
+        self._levels = np.column_stack(lvls)
+
+        # 4. exog Lags
+        if exog_pandas:
+            dexog = orig_exog.diff()
+        else:
+            dexog = np.full_like(self.data.exog, np.nan)
+            dexog[1:] = np.diff(orig_exog, axis=0)
+        adj_order = {}
+        for key, val in self._order.items():
+            val = None if (val is None or val == [1]) else val[:-1]
+            adj_order[key] = val
+        self._exog = self._format_exog(dexog, adj_order)
+
+        self._blocks = {
+            "deterministic": self._deterministic_reg,
+            "levels": self._levels,
+            "endog": self._endog_reg,
+            "exog": self._exog,
+            "fixed": self._fixed,
+        }
+        blocks = [self._endog]
+        for key, val in self._blocks.items():
+            if key != "exog":
+                blocks.append(np.asarray(val))
+            else:
+                for subval in val.values():
+                    blocks.append(np.asarray(subval))
+        y = blocks[0]
+        reg = np.column_stack(blocks[1:])
+        exog_maxlag = 0
+        for val in self._order.values():
+            exog_maxlag = max(exog_maxlag, max(val) if val is not None else 0)
+        self._maxlag = max(self._maxlag, exog_maxlag)
+        # Must be at least 1 since the endog is differenced
+        self._maxlag = max(self._maxlag, 1)
+        if hold_back is None:
+            self._hold_back = int(self._maxlag)
+        if self._hold_back < self._maxlag:
+            raise ValueError(
+                "hold_back must be >= the maximum lag of the endog and exog "
+                "variables"
+            )
+        reg = reg[self._hold_back :]
+        if reg.shape[1] > reg.shape[0]:
+            raise ValueError(
+                f"The number of regressors ({reg.shape[1]}) including "
+                "deterministics, lags of the endog, lags of the exogenous, "
+                "and fixed regressors is larger than the sample available "
+                f"for estimation ({reg.shape[0]})."
+            )
+        return np.squeeze(y)[self._hold_back :], reg
+
+    @Appender(str(fit_doc))
+    def fit(
+        self,
+        *,
+        cov_type: str = "nonrobust",
+        cov_kwds: dict[str, Any] = None,
+        use_t: bool = True,
+    ) -> UECMResults:
+        params, cov_params, norm_cov_params = self._fit(
+            cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t
+        )
+        res = UECMResults(
+            self, params, cov_params, norm_cov_params, use_t=use_t
+        )
+        return UECMResultsWrapper(res)

     @classmethod
-    def from_ardl(cls, ardl: ARDL, missing: Literal['none', 'drop', 'raise'
-        ]='none'):
+    def from_ardl(
+        cls, ardl: ARDL, missing: Literal["none", "drop", "raise"] = "none"
+    ):
         """
         Construct a UECM from an ARDL model

@@ -1017,14 +1969,51 @@ class UECM(ARDL):
         of at least 1. Additionally, the included lags must be contiguous
         starting at 0 if non-causal or 1 if causal.
         """
-        pass
-
-    def predict(self, params: ArrayLike1D, start: (int | str | dt.datetime |
-        pd.Timestamp | None)=None, end: (int | str | dt.datetime | pd.
-        Timestamp | None)=None, dynamic: bool=False, exog: (NDArray | pd.
-        DataFrame | None)=None, exog_oos: (NDArray | pd.DataFrame | None)=
-        None, fixed: (NDArray | pd.DataFrame | None)=None, fixed_oos: (
-        NDArray | pd.DataFrame | None)=None) ->np.ndarray:
+        err = (
+            "UECM can only be created from ARDL models that include all "
+            "{var_typ} lags up to the maximum lag in the model."
+        )
+        uecm_lags = {}
+        dl_lags = ardl.dl_lags
+        for key, val in dl_lags.items():
+            max_val = max(val)
+            if len(dl_lags[key]) < (max_val + int(not ardl.causal)):
+                raise ValueError(err.format(var_typ="exogenous"))
+            uecm_lags[key] = max_val
+        if ardl.ar_lags is None:
+            ar_lags = None
+        else:
+            max_val = max(ardl.ar_lags)
+            if len(ardl.ar_lags) != max_val:
+                raise ValueError(err.format(var_typ="endogenous"))
+            ar_lags = max_val
+
+        return cls(
+            ardl.data.orig_endog,
+            ar_lags,
+            ardl.data.orig_exog,
+            uecm_lags,
+            trend=ardl.trend,
+            fixed=ardl.fixed,
+            seasonal=ardl.seasonal,
+            hold_back=ardl.hold_back,
+            period=ardl.period,
+            causal=ardl.causal,
+            missing=missing,
+            deterministic=ardl.deterministic,
+        )
+
+    def predict(
+        self,
+        params: ArrayLike1D,
+        start: int | str | dt.datetime | pd.Timestamp | None = None,
+        end: int | str | dt.datetime | pd.Timestamp | None = None,
+        dynamic: bool = False,
+        exog: NDArray | pd.DataFrame | None = None,
+        exog_oos: NDArray | pd.DataFrame | None = None,
+        fixed: NDArray | pd.DataFrame | None = None,
+        fixed_oos: NDArray | pd.DataFrame | None = None,
+    ) -> np.ndarray:
         """
         In-sample prediction and out-of-sample forecasting.

@@ -1076,7 +2065,49 @@ class UECM(ARDL):
             Array of in-sample predictions and / or out-of-sample
             forecasts.
         """
-        pass
+        if dynamic is not False:
+            raise NotImplementedError("dynamic forecasts are not supported")
+        params, exog, exog_oos, start, end, num_oos = self._prepare_prediction(
+            params, exog, exog_oos, start, end
+        )
+        if num_oos != 0:
+            raise NotImplementedError(
+                "Out-of-sample forecasts are not supported"
+            )
+        pred = np.full(self.endog.shape[0], np.nan)
+        pred[-self._x.shape[0] :] = self._x @ params
+        return pred[start : end + 1]
+
+    @classmethod
+    @Appender(from_formula_doc.__str__().replace("ARDL", "UECM"))
+    def from_formula(
+        cls,
+        formula: str,
+        data: pd.DataFrame,
+        lags: int | Sequence[int] | None = 0,
+        order: _ARDLOrder = 0,
+        trend: Literal["n", "c", "ct", "ctt"] = "n",
+        *,
+        causal: bool = False,
+        seasonal: bool = False,
+        deterministic: DeterministicProcess | None = None,
+        hold_back: int | None = None,
+        period: int | None = None,
+        missing: Literal["none", "raise"] = "none",
+    ) -> UECM:
+        return super().from_formula(
+            formula,
+            data,
+            lags,
+            order,
+            trend,
+            causal=causal,
+            seasonal=seasonal,
+            deterministic=deterministic,
+            hold_back=hold_back,
+            period=period,
+            missing=missing,
+        )


 class UECMResults(ARDLResults):
@@ -1097,41 +2128,160 @@ class UECMResults(ARDLResults):
     scale : float, optional
         An estimate of the scale of the model.
     """
-    _cache: dict[str, Any] = {}
+
+    _cache: dict[str, Any] = {}  # for scale setter
+
+    def _ci_wrap(
+        self, val: np.ndarray, name: str = ""
+    ) -> NDArray | pd.Series | pd.DataFrame:
+        if not isinstance(self.model.data, PandasData):
+            return val
+        ndet = self.model._blocks["deterministic"].shape[1]
+        nlvl = self.model._blocks["levels"].shape[1]
+        lbls = self.model.exog_names[: (ndet + nlvl)]
+        for i in range(ndet, ndet + nlvl):
+            lbl = lbls[i]
+            if lbl.endswith(".L1"):
+                lbls[i] = lbl[:-3]
+        if val.ndim == 2:
+            return pd.DataFrame(val, columns=lbls, index=lbls)
+        return pd.Series(val, index=lbls, name=name)

     @cache_readonly
-    def ci_params(self) ->(np.ndarray | pd.Series):
+    def ci_params(self) -> np.ndarray | pd.Series:
         """Parameters of normalized cointegrating relationship"""
-        pass
+        ndet = self.model._blocks["deterministic"].shape[1]
+        nlvl = self.model._blocks["levels"].shape[1]
+        base = np.asarray(self.params)[ndet]
+        return self._ci_wrap(self.params[: ndet + nlvl] / base, "ci_params")

     @cache_readonly
-    def ci_bse(self) ->(np.ndarray | pd.Series):
+    def ci_bse(self) -> np.ndarray | pd.Series:
         """Standard Errors of normalized cointegrating relationship"""
-        pass
+        bse = np.sqrt(np.diag(self.ci_cov_params()))
+        return self._ci_wrap(bse, "ci_bse")

     @cache_readonly
-    def ci_tvalues(self) ->(np.ndarray | pd.Series):
+    def ci_tvalues(self) -> np.ndarray | pd.Series:
         """T-values of normalized cointegrating relationship"""
-        pass
+        ndet = self.model._blocks["deterministic"].shape[1]
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            tvalues = np.asarray(self.ci_params) / np.asarray(self.ci_bse)
+            tvalues[ndet] = np.nan
+        return self._ci_wrap(tvalues, "ci_tvalues")

     @cache_readonly
-    def ci_pvalues(self) ->(np.ndarray | pd.Series):
+    def ci_pvalues(self) -> np.ndarray | pd.Series:
         """P-values of normalized cointegrating relationship"""
-        pass
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            pvalues = 2 * (1 - stats.norm.cdf(np.abs(self.ci_tvalues)))
+        return self._ci_wrap(pvalues, "ci_pvalues")

-    def ci_cov_params(self) ->(Float64Array | pd.DataFrame):
+    def ci_conf_int(self, alpha: float = 0.05) -> Float64Array | pd.DataFrame:
+        alpha = float_like(alpha, "alpha")
+
+        if self.use_t:
+            q = stats.t(self.df_resid).ppf(1 - alpha / 2)
+        else:
+            q = stats.norm().ppf(1 - alpha / 2)
+        p = self.ci_params
+        se = self.ci_bse
+        out = [p - q * se, p + q * se]
+        if not isinstance(p, pd.Series):
+            return np.column_stack(out)
+
+        df = pd.concat(out, axis=1)
+        df.columns = ["lower", "upper"]
+
+        return df
+
+    def ci_summary(self, alpha: float = 0.05) -> Summary:
+        def _ci(alpha=alpha):
+            return np.asarray(self.ci_conf_int(alpha))
+
+        smry = Summary()
+        ndet = self.model._blocks["deterministic"].shape[1]
+        nlvl = self.model._blocks["levels"].shape[1]
+        exog_names = list(self.model.exog_names)[: (ndet + nlvl)]
+
+        model = SimpleNamespace(
+            endog_names=self.model.endog_names, exog_names=exog_names
+        )
+        data = SimpleNamespace(
+            params=self.ci_params,
+            bse=self.ci_bse,
+            tvalues=self.ci_tvalues,
+            pvalues=self.ci_pvalues,
+            conf_int=_ci,
+            model=model,
+        )
+        tab = summary_params(data)
+        tab.title = "Cointegrating Vector"
+        smry.tables.append(tab)
+
+        return smry
+
+    @cache_readonly
+    def ci_resids(self) -> np.ndarray | pd.Series:
+        d = self.model._blocks["deterministic"]
+        exog = self.model.data.orig_exog
+        is_pandas = isinstance(exog, pd.DataFrame)
+        exog = exog if is_pandas else self.model.exog
+        cols = [np.asarray(d), self.model.endog]
+        for key, value in self.model.dl_lags.items():
+            if value is not None:
+                if is_pandas:
+                    cols.append(np.asarray(exog[key]))
+                else:
+                    cols.append(exog[:, key])
+        ci_x = np.column_stack(cols)
+        resids = ci_x @ self.ci_params
+        if not isinstance(self.model.data, PandasData):
+            return resids
+        index = self.model.data.orig_endog.index
+        return pd.Series(resids, index=index, name="ci_resids")
+
+    def ci_cov_params(self) -> Float64Array | pd.DataFrame:
         """Covariance of normalized of cointegrating relationship"""
-        pass
+        ndet = self.model._blocks["deterministic"].shape[1]
+        nlvl = self.model._blocks["levels"].shape[1]
+        loc = list(range(ndet + nlvl))
+        cov = self.cov_params()
+        cov_a = np.asarray(cov)
+        ci_cov = cov_a[np.ix_(loc, loc)]
+        m = ci_cov.shape[0]
+        params = np.asarray(self.params)[: ndet + nlvl]
+        base = params[ndet]
+        d = np.zeros((m, m))
+        for i in range(m):
+            if i == ndet:
+                continue
+            d[i, i] = 1 / base
+            d[i, ndet] = -params[i] / (base**2)
+        ci_cov = d @ ci_cov @ d.T
+        return self._ci_wrap(ci_cov)

     def _lag_repr(self):
         """Returns poly repr of an AR, (1  -phi1 L -phi2 L^2-...)"""
-        pass
-
-    def bounds_test(self, case: Literal[1, 2, 3, 4, 5], cov_type: str=
-        'nonrobust', cov_kwds: dict[str, Any]=None, use_t: bool=True,
-        asymptotic: bool=True, nsim: int=100000, seed: (int | Sequence[int] |
-        np.random.RandomState | np.random.Generator | None)=None):
-        """
+        # TODO
+
+    def bounds_test(
+        self,
+        case: Literal[1, 2, 3, 4, 5],
+        cov_type: str = "nonrobust",
+        cov_kwds: dict[str, Any] = None,
+        use_t: bool = True,
+        asymptotic: bool = True,
+        nsim: int = 100_000,
+        seed: int
+        | Sequence[int]
+        | np.random.RandomState
+        | np.random.Generator
+        | None = None,
+    ):
+        r"""
         Cointegration bounds test of Pesaran, Shin, and Smith

         Parameters
@@ -1198,8 +2348,8 @@ class UECMResults(ARDLResults):

         .. math::

-           \\Delta Y_{t}=\\delta_{0} + \\delta_{1}t + Z_{t-1}\\beta
-                        + \\sum_{j=0}^{P}\\Delta X_{t-j}\\Gamma + \\epsilon_{t}
+           \Delta Y_{t}=\delta_{0} + \delta_{1}t + Z_{t-1}\beta
+                        + \sum_{j=0}^{P}\Delta X_{t-j}\Gamma + \epsilon_{t}

         where :math:`Z_{t-1}` contains both :math:`Y_{t-1}` and
         :math:`X_{t-1}`.
@@ -1218,7 +2368,7 @@ class UECMResults(ARDLResults):
            test

         The test statistic is a Wald-type quadratic form test that all of the
-        coefficients in :math:`\\beta` are 0 along with any included
+        coefficients in :math:`\beta` are 0 along with any included
         deterministic terms, which depends on the case. The statistic returned
         is an F-type test statistic which is the standard quadratic form test
         statistic divided by the number of restrictions.
@@ -1229,16 +2379,202 @@ class UECMResults(ARDLResults):
            approaches to the analysis of level relationships. Journal of
            applied econometrics, 16(3), 289-326.
         """
-        pass
+        model = self.model
+        trend: Literal["n", "c", "ct"]
+        if case == 1:
+            trend = "n"
+        elif case in (2, 3):
+            trend = "c"
+        else:
+            trend = "ct"
+        order = {key: max(val) for key, val in model._order.items()}
+        uecm = UECM(
+            model.data.endog,
+            max(model.ar_lags),
+            model.data.orig_exog,
+            order=order,
+            causal=model.causal,
+            trend=trend,
+        )
+        res = uecm.fit(cov_type=cov_type, cov_kwds=cov_kwds, use_t=use_t)
+        cov = res.cov_params()
+        nvar = len(res.model.ardl_order)
+        if case == 1:
+            rest = np.arange(nvar)
+        elif case == 2:
+            rest = np.arange(nvar + 1)
+        elif case == 3:
+            rest = np.arange(1, nvar + 1)
+        elif case == 4:
+            rest = np.arange(1, nvar + 2)
+        elif case == 5:
+            rest = np.arange(2, nvar + 2)
+        r = np.zeros((rest.shape[0], cov.shape[1]))
+        for i, loc in enumerate(rest):
+            r[i, loc] = 1
+        vcv = r @ cov @ r.T
+        coef = r @ res.params
+        stat = coef.T @ np.linalg.inv(vcv) @ coef / r.shape[0]
+        k = nvar
+        if asymptotic and k <= 10:
+            cv = pss_critical_values.crit_vals
+            key = (k, case)
+            upper = cv[key + (True,)]
+            lower = cv[key + (False,)]
+            crit_vals = pd.DataFrame(
+                {"lower": lower, "upper": upper},
+                index=pss_critical_values.crit_percentiles,
+            )
+            crit_vals.index.name = "percentile"
+            p_values = pd.Series(
+                {
+                    "lower": _pss_pvalue(stat, k, case, False),
+                    "upper": _pss_pvalue(stat, k, case, True),
+                }
+            )
+        else:
+            nobs = res.resid.shape[0]
+            crit_vals, p_values = _pss_simulate(
+                stat, k, case, nobs=nobs, nsim=nsim, seed=seed
+            )
+
+        return BoundsTestResult(
+            stat,
+            crit_vals,
+            p_values,
+            "No Cointegration",
+            "Possible Cointegration",
+        )
+
+
+def _pss_pvalue(stat: float, k: int, case: int, i1: bool) -> float:
+    key = (k, case, i1)
+    large_p = pss_critical_values.large_p[key]
+    small_p = pss_critical_values.small_p[key]
+    threshold = pss_critical_values.stat_star[key]
+    log_stat = np.log(stat)
+    p = small_p if stat > threshold else large_p
+    x = [log_stat**i for i in range(len(p))]
+    return 1 - stats.norm.cdf(x @ np.array(p))
+
+
+def _pss_simulate(
+    stat: float,
+    k: int,
+    case: Literal[1, 2, 3, 4, 5],
+    nobs: int,
+    nsim: int,
+    seed: int
+    | Sequence[int]
+    | np.random.RandomState
+    | np.random.Generator
+    | None,
+) -> tuple[pd.DataFrame, pd.Series]:
+    rs: np.random.RandomState | np.random.Generator
+    if not isinstance(seed, np.random.RandomState):
+        rs = np.random.default_rng(seed)
+    else:
+        assert isinstance(seed, np.random.RandomState)
+        rs = seed
+
+    def _vectorized_ols_resid(rhs, lhs):
+        rhs_t = np.transpose(rhs, [0, 2, 1])
+        xpx = np.matmul(rhs_t, rhs)
+        xpy = np.matmul(rhs_t, lhs)
+        b = np.linalg.solve(xpx, xpy)
+        return np.squeeze(lhs - np.matmul(rhs, b))
+
+    block_size = 100_000_000 // (8 * nobs * k)
+    remaining = nsim
+    loc = 0
+    f_upper = np.empty(nsim)
+    f_lower = np.empty(nsim)
+    while remaining > 0:
+        to_do = min(remaining, block_size)
+        e = rs.standard_normal((to_do, nobs + 1, k))
+
+        y = np.cumsum(e[:, :, :1], axis=1)
+        x_upper = np.cumsum(e[:, :, 1:], axis=1)
+        x_lower = e[:, :, 1:]
+        lhs = np.diff(y, axis=1)
+        if case in (2, 3):
+            rhs = np.empty((to_do, nobs, k + 1))
+            rhs[:, :, -1] = 1
+        elif case in (4, 5):
+            rhs = np.empty((to_do, nobs, k + 2))
+            rhs[:, :, -2] = np.arange(nobs, dtype=float)
+            rhs[:, :, -1] = 1
+        else:
+            rhs = np.empty((to_do, nobs, k))
+        rhs[:, :, :1] = y[:, :-1]
+        rhs[:, :, 1:k] = x_upper[:, :-1]
+
+        u = _vectorized_ols_resid(rhs, lhs)
+        df = rhs.shape[1] - rhs.shape[2]
+        s2 = (u**2).sum(1) / df
+
+        if case in (3, 4):
+            rhs_r = rhs[:, :, -1:]
+        elif case == 5:  # case 5
+            rhs_r = rhs[:, :, -2:]
+        if case in (3, 4, 5):
+            ur = _vectorized_ols_resid(rhs_r, lhs)
+            nrest = rhs.shape[-1] - rhs_r.shape[-1]
+        else:
+            ur = np.squeeze(lhs)
+            nrest = rhs.shape[-1]
+
+        f = ((ur**2).sum(1) - (u**2).sum(1)) / nrest
+        f /= s2
+        f_upper[loc : loc + to_do] = f
+
+        # Lower
+        rhs[:, :, 1:k] = x_lower[:, :-1]
+        u = _vectorized_ols_resid(rhs, lhs)
+        s2 = (u**2).sum(1) / df
+
+        if case in (3, 4):
+            rhs_r = rhs[:, :, -1:]
+        elif case == 5:  # case 5
+            rhs_r = rhs[:, :, -2:]
+        if case in (3, 4, 5):
+            ur = _vectorized_ols_resid(rhs_r, lhs)
+            nrest = rhs.shape[-1] - rhs_r.shape[-1]
+        else:
+            ur = np.squeeze(lhs)
+            nrest = rhs.shape[-1]
+
+        f = ((ur**2).sum(1) - (u**2).sum(1)) / nrest
+        f /= s2
+        f_lower[loc : loc + to_do] = f
+
+        loc += to_do
+        remaining -= to_do
+
+    crit_percentiles = pss_critical_values.crit_percentiles
+    crit_vals = pd.DataFrame(
+        {
+            "lower": np.percentile(f_lower, crit_percentiles),
+            "upper": np.percentile(f_upper, crit_percentiles),
+        },
+        index=crit_percentiles,
+    )
+    crit_vals.index.name = "percentile"
+    p_values = pd.Series(
+        {"lower": (stat < f_lower).mean(), "upper": (stat < f_upper).mean()}
+    )
+    return crit_vals, p_values


 class UECMResultsWrapper(wrap.ResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(tsa_model.TimeSeriesResultsWrapper.
-        _wrap_attrs, _attrs)
+    _wrap_attrs = wrap.union_dicts(
+        tsa_model.TimeSeriesResultsWrapper._wrap_attrs, _attrs
+    )
     _methods = {}
-    _wrap_methods = wrap.union_dicts(tsa_model.TimeSeriesResultsWrapper.
-        _wrap_methods, _methods)
+    _wrap_methods = wrap.union_dicts(
+        tsa_model.TimeSeriesResultsWrapper._wrap_methods, _methods
+    )


 wrap.populate_wrapper(UECMResultsWrapper, UECMResults)
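
The UECM, from_ardl, and bounds_test docstrings above are easiest to follow with a small usage sketch. The snippet below is illustrative only: the simulated data, seed, and column names are hypothetical, while UECM, UECM.from_ardl, fit, ci_params, and bounds_test are the objects defined in the diff above.

    import numpy as np
    import pandas as pd

    from statsmodels.tsa.ardl import ARDL, UECM

    # Hypothetical data; the DGP and variable names are made up for illustration.
    rng = np.random.default_rng(0)
    x = pd.DataFrame({"x0": rng.standard_normal(250).cumsum()})
    y = pd.Series(0.2 * x["x0"] + rng.standard_normal(250), name="y")

    # A UECM with 2 lags of the endogenous variable and 2 lags of x0,
    # i.e. the error-correction form described in the class docstring above.
    res = UECM(y, 2, x, {"x0": 2}).fit()
    print(res.ci_params)  # normalized cointegrating vector

    # Equivalently, convert an ARDL specification with contiguous lags.
    uecm = UECM.from_ardl(ARDL(y, 2, x, {"x0": 2}))

    # Bounds test for a level relationship (case 3: unrestricted constant, no trend);
    # returns a BoundsTestResult holding the statistic, critical values, and p-values.
    bt = res.bounds_test(case=3)
    print(bt)
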
diff --git a/statsmodels/tsa/ardl/pss_critical_values.py b/statsmodels/tsa/ardl/pss_critical_values.py
index 23e0ad024..e9ad84be0 100644
--- a/statsmodels/tsa/ardl/pss_critical_values.py
+++ b/statsmodels/tsa/ardl/pss_critical_values.py
@@ -1,3 +1,6 @@
+#!/usr/bin/env python
+# coding: utf-8
+
 """
 Critical value polynomials and related quantities for the bounds test of

@@ -32,280 +35,419 @@ When using these models, the polynomial is evaluated at the natural log of the
 test statistic and then the normal CDF of this value is computed to produce
 the p-value.
 """
-__all__ = ['large_p', 'small_p', 'crit_vals', 'crit_percentiles', 'stat_star']
-large_p = {(1, 1, False): [0.2231, 0.91426, 0.10102, 0.00569], (1, 1, True):
-    [-0.21766, 0.85933, 0.10411, 0.00661], (1, 2, False): [-0.60796, 
-    1.48713, 0.15076, 0.04453], (1, 2, True): [-0.96204, 1.52593, 0.15996, 
-    0.04166], (1, 3, False): [-0.62883, 0.78991, 0.1, 0.00693], (1, 3, True
-    ): [-0.91895, 0.82086, 0.12921, 0.01076], (1, 4, False): [-1.50546, 
-    1.79052, 0.05488, 0.06801], (1, 4, True): [-1.79654, 1.8048, 0.06573, 
-    0.06768], (1, 5, False): [-1.36367, 0.94126, 0.21556, 0.02473], (1, 5, 
-    True): [-1.60554, 0.93305, 0.2422, 0.03241], (2, 1, False): [0.20576, 
-    1.18914, 0.15731, 0.01144], (2, 1, True): [-0.49024, 1.16958, 0.20564, 
-    0.02008], (2, 2, False): [-0.51799, 1.6368, 0.18955, 0.04317], (2, 2, 
-    True): [-1.13394, 1.71056, 0.20442, 0.04195], (2, 3, False): [-0.51712,
-    1.12963, 0.18936, 0.01808], (2, 3, True): [-1.07441, 1.14964, 0.26066, 
-    0.03338], (2, 4, False): [-1.29895, 1.88501, 0.11734, 0.06615], (2, 4, 
-    True): [-1.82455, 1.92207, 0.13753, 0.06269], (2, 5, False): [-1.22263,
-    1.23208, 0.31401, 0.04495], (2, 5, True): [-1.67689, 1.17567, 0.33606, 
-    0.05898], (3, 1, False): [0.1826, 1.39275, 0.19774, 0.01647], (3, 1, 
-    True): [-0.71889, 1.39726, 0.29712, 0.03794], (3, 2, False): [-0.45864,
-    1.77632, 0.22125, 0.04372], (3, 2, True): [-1.28619, 1.88107, 0.23969, 
-    0.04414], (3, 3, False): [-0.45093, 1.38824, 0.26556, 0.03063], (3, 3, 
-    True): [-1.22712, 1.36564, 0.34942, 0.05555], (3, 4, False): [-1.15886,
-    1.99182, 0.16358, 0.06392], (3, 4, True): [-1.88388, 2.05362, 0.18349, 
-    0.06501], (3, 5, False): [-1.11221, 1.44327, 0.3547, 0.05263], (3, 5, 
-    True): [-1.75354, 1.37461, 0.3882, 0.07239], (4, 1, False): [0.16431, 
-    1.56391, 0.22944, 0.02067], (4, 1, True): [-0.90799, 1.56908, 0.34763, 
-    0.04814], (4, 2, False): [-0.41568, 1.90715, 0.24783, 0.04407], (4, 2, 
-    True): [-1.42373, 2.03902, 0.26907, 0.04755], (4, 3, False): [-0.41104,
-    1.5716, 0.3066, 0.03842], (4, 3, True): [-1.36194, 1.54043, 0.40145, 
-    0.06846], (4, 4, False): [-1.05651, 2.10007, 0.20201, 0.06129], (4, 4, 
-    True): [-1.95474, 2.18305, 0.22527, 0.06441], (4, 5, False): [-1.02502,
-    1.62605, 0.38203, 0.05565], (4, 5, True): [-1.83458, 1.555, 0.42888, 
-    0.07459], (5, 1, False): [0.15015, 1.71718, 0.2584, 0.02507], (5, 1, 
-    True): [-1.0707, 1.72829, 0.39037, 0.05468], (5, 2, False): [-0.38277, 
-    2.02985, 0.27139, 0.04513], (5, 2, True): [-1.54974, 2.18631, 0.29592, 
-    0.04967], (5, 3, False): [-0.38023, 1.72586, 0.33033, 0.04188], (5, 3, 
-    True): [-1.48415, 1.70271, 0.44016, 0.07248], (5, 4, False): [-0.97676,
-    2.20429, 0.23233, 0.06543], (5, 4, True): [-2.03144, 2.31343, 0.25394, 
-    0.0675], (5, 5, False): [-0.95421, 1.78775, 0.40239, 0.05642], (5, 5, 
-    True): [-1.91679, 1.72031, 0.46434, 0.06641], (6, 1, False): [0.13913, 
-    1.8581, 0.28528, 0.02931], (6, 1, True): [-1.21438, 1.87638, 0.42416, 
-    0.05485], (6, 2, False): [-0.35664, 2.14606, 0.29484, 0.04728], (6, 2, 
-    True): [-1.66532, 2.32448, 0.31723, 0.05528], (6, 3, False): [-0.35498,
-    1.86634, 0.35087, 0.04455], (6, 3, True): [-1.59785, 1.85278, 0.47304, 
-    0.07114], (6, 4, False): [-0.91274, 2.30752, 0.26053, 0.0644], (6, 4, 
-    True): [-2.10956, 2.43721, 0.2852, 0.06694], (6, 5, False): [-0.89553, 
-    1.9318, 0.41381, 0.05292], (6, 5, True): [-1.99931, 1.87789, 0.49842, 
-    0.04135], (7, 1, False): [0.12974, 1.98503, 0.30606, 0.03218], (7, 1, 
-    True): [-1.34555, 2.01647, 0.45456, 0.05018], (7, 2, False): [-0.33519,
-    2.25631, 0.31659, 0.05016], (7, 2, True): [-1.77496, 2.45806, 0.3372, 
-    0.05741], (7, 3, False): [-0.33377, 1.99554, 0.36742, 0.04624], (7, 3, 
-    True): [-1.70381, 1.99863, 0.49883, 0.05092], (7, 4, False): [-0.8596, 
-    2.40762, 0.28334, 0.06401], (7, 4, True): [-2.18704, 2.55828, 0.30627, 
-    0.07091], (7, 5, False): [-0.84606, 2.06291, 0.42505, 0.05152], (7, 5, 
-    True): [-2.08097, 2.02139, 0.5348, 0.02343], (8, 1, False): [0.12244, 
-    2.10698, 0.32849, 0.03596], (8, 1, True): [-1.46632, 2.1505, 0.48168, 
-    0.04116], (8, 2, False): [-0.31707, 2.36107, 0.33198, 0.04953], (8, 2, 
-    True): [-1.87722, 2.58105, 0.35963, 0.05848], (8, 3, False): [-0.31629,
-    2.11679, 0.38514, 0.04868], (8, 3, True): [-1.80483, 2.13412, 0.52935, 
-    0.03618], (8, 4, False): [-0.81509, 2.50518, 0.30456, 0.06388], (8, 4, 
-    True): [-2.26501, 2.67227, 0.33843, 0.06554], (8, 5, False): [-0.80333,
-    2.18457, 0.42995, 0.0463], (8, 5, True): [-2.16125, 2.15208, 0.58319, 
-    0.0], (9, 1, False): [0.11562, 2.22037, 0.34907, 0.03968], (9, 1, True):
-    [-1.57878, 2.27626, 0.5124, 0.03164], (9, 2, False): [-0.30188, 2.46235,
-    0.35132, 0.05209], (9, 2, True): [-1.97465, 2.70256, 0.37466, 0.06205],
-    (9, 3, False): [-0.30097, 2.23118, 0.39976, 0.05001], (9, 3, True): [-
-    1.90164, 2.26261, 0.56431, 0.0175], (9, 4, False): [-0.77664, 2.59712, 
-    0.32618, 0.06452], (9, 4, True): [-2.33996, 2.78253, 0.36072, 0.06644],
-    (9, 5, False): [-0.76631, 2.2987, 0.43834, 0.04274], (9, 5, True): [-
-    2.23753, 2.27521, 0.60763, 0.0], (10, 1, False): [0.10995, 2.3278, 
-    0.36567, 0.04153], (10, 1, True): [-1.6849, 2.39419, 0.5433, 0.02457],
-    (10, 2, False): [-0.28847, 2.55819, 0.36959, 0.05499], (10, 2, True): [
-    -2.06725, 2.81756, 0.38761, 0.0676], (10, 3, False): [-0.28748, 2.33948,
-    0.41398, 0.05101], (10, 3, True): [-1.99259, 2.38061, 0.59433, 0.01114],
-    (10, 4, False): [-0.74317, 2.68624, 0.345, 0.07032], (10, 4, True): [-
-    2.41409, 2.8931, 0.37487, 0.07102], (10, 5, False): [-0.73464, 2.40692,
-    0.45153, 0.0434], (10, 5, True): [-2.31364, 2.39092, 0.64313, -0.01012]}
-small_p = {(1, 1, False): [0.2585, 0.92944, 0.25921], (1, 1, True): [-
-    0.17399, 0.88425, 0.29947], (1, 2, False): [-0.45787, 1.15813, 0.37268],
-    (1, 2, True): [-0.76388, 1.13438, 0.39908], (1, 3, False): [-0.57887, 
-    0.87657, 0.32929], (1, 3, True): [-0.88284, 0.81513, 0.366], (1, 4, 
-    False): [-1.1926, 1.21061, 0.40386], (1, 4, True): [-1.42909, 1.16607, 
-    0.42899], (1, 5, False): [-1.34428, 0.8756, 0.37809], (1, 5, True): [-
-    1.56285, 0.80464, 0.40703], (2, 1, False): [0.23004, 1.12045, 0.31791],
-    (2, 1, True): [-0.45371, 1.06577, 0.38144], (2, 2, False): [-0.41191, 
-    1.36838, 0.39668], (2, 2, True): [-0.9488, 1.32707, 0.44808], (2, 3, 
-    False): [-0.49166, 1.11266, 0.36824], (2, 3, True): [-1.03636, 1.04019,
-    0.42589], (2, 4, False): [-1.08188, 1.42797, 0.42653], (2, 4, True): [-
-    1.52152, 1.36, 0.47256], (2, 5, False): [-1.12408, 1.0565, 0.43505], (2,
-    5, True): [-1.58614, 1.01208, 0.46796], (3, 1, False): [0.20945, 
-    1.29304, 0.36292], (3, 1, True): [-0.60112, 1.139, 0.47837], (3, 2, 
-    False): [-0.37491, 1.53959, 0.42397], (3, 2, True): [-1.11163, 1.50639,
-    0.48662], (3, 3, False): [-0.41411, 1.27093, 0.41524], (3, 3, True): [-
-    1.14285, 1.18673, 0.4906], (3, 4, False): [-0.9946, 1.60793, 0.44771],
-    (3, 4, True): [-1.62609, 1.54566, 0.50619], (3, 5, False): [-1.04988, 
-    1.31372, 0.44802], (3, 5, True): [-1.68976, 1.25316, 0.49896], (4, 1, 
-    False): [0.18839, 1.46484, 0.39125], (4, 1, True): [-0.81822, 1.35949, 
-    0.50619], (4, 2, False): [-0.35123, 1.705, 0.44075], (4, 2, True): [-
-    1.2591, 1.67286, 0.52021], (4, 3, False): [-0.34716, 1.39436, 0.46391],
-    (4, 3, True): [-1.30728, 1.41428, 0.51292], (4, 4, False): [-0.92783, 
-    1.77056, 0.46587], (4, 4, True): [-1.71493, 1.69609, 0.54221], (4, 5, 
-    False): [-0.97468, 1.50704, 0.46661], (4, 5, True): [-1.7783, 1.4453, 
-    0.53112], (5, 1, False): [0.17584, 1.60806, 0.424], (5, 1, True): [-
-    1.00705, 1.5668, 0.52487], (5, 2, False): [-0.32186, 1.82909, 0.47183],
-    (5, 2, True): [-1.39492, 1.83145, 0.54756], (5, 3, False): [-0.32204, 
-    1.55407, 0.4884], (5, 3, True): [-1.43499, 1.58772, 0.54359], (5, 4, 
-    False): [-0.87005, 1.9128, 0.48361], (5, 4, True): [-1.81929, 1.8594, 
-    0.56629], (5, 5, False): [-0.91534, 1.6826, 0.47972], (5, 5, True): [-
-    1.86297, 1.61238, 0.56196], (6, 1, False): [0.16642, 1.7409, 0.45235],
-    (6, 1, True): [-1.15641, 1.72534, 0.55469], (6, 2, False): [-0.31023, 
-    1.97806, 0.47892], (6, 2, True): [-1.52248, 1.98657, 0.56855], (6, 3, 
-    False): [-0.30333, 1.70462, 0.50703], (6, 3, True): [-1.5521, 1.74539, 
-    0.57191], (6, 4, False): [-0.82345, 2.04624, 0.50026], (6, 4, True): [-
-    1.90659, 1.99476, 0.59394], (6, 5, False): [-0.85675, 1.81838, 0.50387],
-    (6, 5, True): [-1.92708, 1.73629, 0.60069], (7, 1, False): [0.15013, 
-    1.88779, 0.46397], (7, 1, True): [-1.28169, 1.85521, 0.58877], (7, 2, 
-    False): [-0.2904, 2.09042, 0.50233], (7, 2, True): [-1.62626, 2.10378, 
-    0.6013], (7, 3, False): [-0.29138, 1.8506, 0.52083], (7, 3, True): [-
-    1.64831, 1.87115, 0.60523], (7, 4, False): [-0.78647, 2.1757, 0.51247],
-    (7, 4, True): [-1.98344, 2.10977, 0.62411], (7, 5, False): [-0.81099, 
-    1.95374, 0.51949], (7, 5, True): [-1.99875, 1.86512, 0.63051], (8, 1, 
-    False): [0.14342, 2.00691, 0.48514], (8, 1, True): [-1.3933, 1.97361, 
-    0.62074], (8, 2, False): [-0.27952, 2.20983, 0.51721], (8, 2, True): [-
-    1.74485, 2.25435, 0.61354], (8, 3, False): [-0.28049, 1.98611, 0.53286],
-    (8, 3, True): [-1.74116, 1.99245, 0.63511], (8, 4, False): [-0.74797, 
-    2.28202, 0.53356], (8, 4, True): [-2.07764, 2.25027, 0.64023], (8, 5, 
-    False): [-0.76505, 2.06317, 0.54393], (8, 5, True): [-2.04872, 1.95334,
-    0.67177], (9, 1, False): [0.13505, 2.12341, 0.50439], (9, 1, True): [-
-    1.49339, 2.07805, 0.65464], (9, 2, False): [-0.26881, 2.32256, 0.53025],
-    (9, 2, True): [-1.82677, 2.34223, 0.65004], (9, 3, False): [-0.26657, 
-    2.09906, 0.55384], (9, 3, True): [-1.80085, 2.06043, 0.68234], (9, 4, 
-    False): [-0.71672, 2.38896, 0.54931], (9, 4, True): [-2.17306, 2.39146,
-    0.65252], (9, 5, False): [-0.70907, 2.13027, 0.58668], (9, 5, True): [-
-    2.14411, 2.10595, 0.68478], (10, 1, False): [0.12664, 2.23871, 0.51771],
-    (10, 1, True): [-1.59784, 2.19509, 0.67874], (10, 2, False): [-0.25969,
-    2.4312, 0.54096], (10, 2, True): [-1.93843, 2.48708, 0.65741], (10, 3, 
-    False): [-0.25694, 2.21617, 0.56619], (10, 3, True): [-1.89772, 2.1894,
-    0.70143], (10, 4, False): [-0.69126, 2.49776, 0.5583], (10, 4, True): [
-    -2.24685, 2.4968, 0.67598], (10, 5, False): [-0.6971, 2.28206, 0.57816],
-    (10, 5, True): [-2.21015, 2.208, 0.71379]}
-stat_star = {(1, 1, False): 0.855423425047013, (1, 1, True): 
-    0.9074438436193457, (1, 2, False): 2.3148213273461034, (1, 2, True): 
-    2.727010046970744, (1, 3, False): 0.846390593107207, (1, 3, True): 
-    1.157556027201022, (1, 4, False): 3.220377136548005, (1, 4, True): 
-    3.6108265020012418, (1, 5, False): 1.7114703606421378, (1, 5, True): 
-    2.066325210881278, (2, 1, False): 1.1268996107665314, (2, 1, True): 
-    1.3332514927355072, (2, 2, False): 2.0512213167246456, (2, 2, True): 
-    2.656191837644102, (2, 3, False): 1.058908331354388, (2, 3, True): 
-    1.5313322825819844, (2, 4, False): 2.7213091542989725, (2, 4, True): 
-    3.2984645209852856, (2, 5, False): 2.6006009671146497, (2, 5, True): 
-    2.661856653261213, (3, 1, False): 1.263159095916295, (3, 1, True): 
-    2.4151349732452863, (3, 2, False): 1.8886043232371843, (3, 2, True): 
-    2.6028096820968405, (3, 3, False): 1.4879903191884682, (3, 3, True): 
-    2.2926969339773926, (3, 4, False): 2.418527659154858, (3, 4, True): 
-    3.1039322592065988, (3, 5, False): 1.9523612040944802, (3, 5, True): 
-    2.2115727453490757, (4, 1, False): 1.290890114741129, (4, 1, True): 
-    2.1296963408410905, (4, 2, False): 1.7770902061605607, (4, 2, True): 
-    2.5611885327765402, (4, 3, False): 1.9340163095801728, (4, 3, True): 
-    1.9141318638062572, (4, 4, False): 2.2146739201335466, (4, 4, True): 
-    2.9701790485477932, (4, 5, False): 1.7408452994169448, (4, 5, True): 
-    2.1047247176583914, (5, 1, False): 1.336967174239227, (5, 1, True): 
-    1.9131415178585627, (5, 2, False): 1.6953274259688569, (5, 2, True): 
-    2.52745981091846, (5, 3, False): 1.8124340908468068, (5, 3, True): 
-    1.8520883187848405, (5, 4, False): 2.0675009559739297, (5, 4, True): 
-    2.8728076833515552, (5, 5, False): 1.5978968362839456, (5, 5, True): 
-    2.1017517002543418, (6, 1, False): 1.3810422398306446, (6, 1, True): 
-    1.8993612909227247, (6, 2, False): 1.6324374150719114, (6, 2, True): 
-    2.498801004400209, (6, 3, False): 1.72340094901749, (6, 3, True): 
-    1.8586513178563737, (6, 4, False): 1.955819927102859, (6, 4, True): 
-    2.797145060481245, (6, 5, False): 1.578613967104358, (6, 5, True): 
-    2.356249534336445, (7, 1, False): 1.319436681229134, (7, 1, True): 
-    1.9955849619883248, (7, 2, False): 1.5822190052675569, (7, 2, True): 
-    2.4744987764453055, (7, 3, False): 1.65578510076754, (7, 3, True): 
-    2.046536484369615, (7, 4, False): 1.8684573094851133, (7, 4, True): 
-    2.737241392502754, (7, 5, False): 1.571855677342554, (7, 5, True): 
-    2.6006325210258505, (8, 1, False): 1.3413558170956845, (8, 1, True): 
-    2.182981174661154, (8, 2, False): 1.5416965902808288, (8, 2, True): 
-    2.4538471213095594, (8, 3, False): 1.6021238307647196, (8, 3, True): 
-    2.2031866832480778, (8, 4, False): 1.797595752125897, (8, 4, True): 
-    2.688099837236925, (8, 5, False): 1.6561231184668357, (8, 5, True): 
-    2.883361281576836, (9, 1, False): 1.3260368480749927, (9, 1, True): 
-    2.359689612641543, (9, 2, False): 1.5074890058192492, (9, 2, True): 
-    2.435592395931648, (9, 3, False): 1.5584090417965821, (9, 3, True): 
-    2.586293446202391, (9, 4, False): 1.7393454428092985, (9, 4, True): 
-    2.6470908946956655, (9, 5, False): 1.8180517504983742, (9, 5, True): 
-    2.818161371392247, (10, 1, False): 1.3126519241806318, (10, 1, True): 
-    2.3499432601613885, (10, 2, False): 1.4785447632683744, (10, 2, True): 
-    2.4199239298786215, (10, 3, False): 1.5219767684407846, (10, 3, True): 
-    2.55484741648857, (10, 4, False): 1.6902675233415512, (10, 4, True): 
-    2.6119272436084637, (10, 5, False): 1.7372865030759366, (10, 5, True): 
-    2.7644864472524904}
-crit_percentiles = 90, 95, 99, 99.9
-crit_vals = {(1, 1, False): [2.4170317, 3.119659, 4.7510799, 7.0838335], (1,
-    1, True): [3.2538509, 4.0643748, 5.8825257, 8.4189144], (1, 2, False):
-    [3.0235968, 3.6115364, 4.9094056, 6.6859696], (1, 2, True): [3.4943406,
-    4.1231394, 5.4961076, 7.3531815], (1, 3, False): [4.044319, 4.9228967, 
-    6.8609106, 9.5203666], (1, 3, True): [4.7771822, 5.7217442, 7.7821227, 
-    10.557471], (1, 4, False): [4.0317707, 4.6921341, 6.1259225, 8.0467248],
-    (1, 4, True): [4.4725009, 5.169214, 6.668854, 8.6632132], (1, 5, False):
-    [5.5958071, 6.586727, 8.7355157, 11.6171903], (1, 5, True): [6.2656898,
-    7.3133165, 9.5652229, 12.5537707], (2, 1, False): [2.1562308, 2.6846692,
-    3.8773621, 5.5425892], (2, 1, True): [3.1684785, 3.8003954, 5.177742, 
-    7.0453814], (2, 2, False): [2.6273503, 3.0998243, 4.1327001, 5.528847],
-    (2, 2, True): [3.3084134, 3.8345125, 4.9642009, 6.4657839], (2, 3, 
-    False): [3.1741284, 3.8022629, 5.1722882, 7.0241224], (2, 3, True): [
-    4.108262, 4.8116858, 6.3220548, 8.322478], (2, 4, False): [3.3668869, 
-    3.8887628, 5.0115801, 6.5052326], (2, 4, True): [4.0126604, 4.5835675, 
-    5.7968684, 7.3887863], (2, 5, False): [4.1863149, 4.8834936, 6.3813095,
-    8.3781415], (2, 5, True): [5.053508, 5.8168869, 7.4384998, 9.565425], (
-    3, 1, False): [1.998571, 2.4316514, 3.3919322, 4.709226], (3, 1, True):
-    [3.0729965, 3.6016775, 4.7371358, 6.2398661], (3, 2, False): [2.3813866,
-    2.7820412, 3.6486786, 4.8089784], (3, 2, True): [3.1778198, 3.6364094, 
-    4.6114583, 5.8888408], (3, 3, False): [2.7295224, 3.2290217, 4.3110408,
-    5.7599206], (3, 3, True): [3.7471556, 4.3222818, 5.5425521, 7.1435458],
-    (3, 4, False): [2.9636218, 3.4007434, 4.3358236, 5.5729155], (3, 4, 
-    True): [3.7234883, 4.2135706, 5.247283, 6.5911207], (3, 5, False): [
-    3.4742551, 4.0219835, 5.1911046, 6.7348191], (3, 5, True): [4.4323554, 
-    5.0480574, 6.3448127, 8.0277313], (4, 1, False): [1.8897829, 2.2616928,
-    3.0771215, 4.1837434], (4, 1, True): [2.9925753, 3.4545032, 4.4326745, 
-    5.7123835], (4, 2, False): [2.2123295, 2.5633388, 3.3177874, 4.321218],
-    (4, 2, True): [3.0796353, 3.4898084, 4.3536497, 5.4747288], (4, 3, 
-    False): [2.4565534, 2.877209, 3.7798528, 4.9852682], (4, 3, True): [
-    3.516144, 4.0104999, 5.0504684, 6.4022435], (4, 4, False): [2.6902225, 
-    3.0699099, 3.877333, 4.9405835], (4, 4, True): [3.5231152, 3.9578931, 
-    4.867071, 6.0403311], (4, 5, False): [3.0443998, 3.5009718, 4.4707539, 
-    5.7457746], (4, 5, True): [4.0501255, 4.5739556, 5.6686684, 7.0814031],
-    (5, 1, False): [1.8104326, 2.1394999, 2.8541086, 3.8114409], (5, 1, 
-    True): [2.9267613, 3.3396521, 4.2078599, 5.3342038], (5, 2, False): [
-    2.0879588, 2.40264, 3.0748083, 3.9596152], (5, 2, True): [3.002768, 
-    3.3764374, 4.1585099, 5.1657752], (5, 3, False): [2.2702787, 2.6369717,
-    3.4203738, 4.4521021], (5, 3, True): [3.3535243, 3.7914038, 4.7060983, 
-    5.8841151], (5, 4, False): [2.4928973, 2.831033, 3.5478855, 4.4836677],
-    (5, 4, True): [3.3756681, 3.7687148, 4.587147, 5.6351487], (5, 5, False
-    ): [2.7536425, 3.149282, 3.985975, 5.0799181], (5, 5, True): [3.7890425,
-    4.2501858, 5.2074857, 6.4355821], (6, 1, False): [1.7483313, 2.0453753,
-    2.685931, 3.5375009], (6, 1, True): [2.8719403, 3.2474515, 4.0322637, 
-    5.0451946], (6, 2, False): [1.9922451, 2.2792144, 2.8891314, 3.690865],
-    (6, 2, True): [2.9399824, 3.2851357, 4.0031551, 4.9247226], (6, 3, 
-    False): [2.1343676, 2.4620175, 3.1585901, 4.0720179], (6, 3, True): [
-    3.2311014, 3.6271964, 4.4502999, 5.5018575], (6, 4, False): [2.3423792,
-    2.6488947, 3.2947623, 4.1354724], (6, 4, True): [3.2610813, 3.6218989, 
-    4.3702232, 5.3232767], (6, 5, False): [2.5446232, 2.8951601, 3.633989, 
-    4.5935586], (6, 5, True): [3.5984454, 4.0134462, 4.8709448, 5.9622726],
-    (7, 1, False): [1.6985327, 1.9707636, 2.5536649, 3.3259272], (7, 1, 
-    True): [2.825928, 3.1725169, 3.8932738, 4.8134085], (7, 2, False): [
-    1.9155946, 2.1802812, 2.7408759, 3.4710326], (7, 2, True): [2.8879427, 
-    3.2093335, 3.8753322, 4.724748], (7, 3, False): [2.0305429, 2.3281704, 
-    2.9569345, 3.7788337], (7, 3, True): [3.136325, 3.4999128, 4.2519893, 
-    5.2075305], (7, 4, False): [2.2246175, 2.5055486, 3.0962182, 3.86164],
-    (7, 4, True): [3.1695552, 3.5051856, 4.1974421, 5.073436], (7, 5, False
-    ): [2.3861201, 2.7031072, 3.3680435, 4.2305443], (7, 5, True): [
-    3.4533491, 3.8323234, 4.613939, 5.6044399], (8, 1, False): [1.6569223, 
-    1.9092423, 2.4470718, 3.1537838], (8, 1, True): [2.7862884, 3.1097259, 
-    3.7785302, 4.6293176], (8, 2, False): [1.8532862, 2.0996872, 2.6186041,
-    3.2930359], (8, 2, True): [2.8435812, 3.1459955, 3.769165, 4.5623681],
-    (8, 3, False): [1.9480198, 2.2215083, 2.7979659, 3.54771], (8, 3, True):
-    [3.0595184, 3.3969531, 4.0923089, 4.9739178], (8, 4, False): [2.1289147,
-    2.3893773, 2.9340882, 3.6390988], (8, 4, True): [3.094188, 3.4085297, 
-    4.0545165, 4.8699787], (8, 5, False): [2.2616596, 2.5515168, 3.1586476,
-    3.9422645], (8, 5, True): [3.3374076, 3.6880139, 4.407457, 5.3152095],
-    (9, 1, False): [1.6224492, 1.8578787, 2.3580077, 3.0112501], (9, 1, 
-    True): [2.7520721, 3.0557346, 3.6811682, 4.4739536], (9, 2, False): [
-    1.8008993, 2.0320841, 2.5170871, 3.1451424], (9, 2, True): [2.8053707, 
-    3.091422, 3.6784683, 4.4205306], (9, 3, False): [1.8811231, 2.1353897, 
-    2.6683796, 3.358463], (9, 3, True): [2.9957112, 3.3114482, 3.9596061, 
-    4.7754473], (9, 4, False): [2.0498497, 2.2930641, 2.8018384, 3.4543646],
-    (9, 4, True): [3.0308611, 3.3269185, 3.9347618, 4.6993614], (9, 5, 
-    False): [2.1610306, 2.4296727, 2.98963, 3.7067719], (9, 5, True): [
-    3.2429533, 3.5699095, 4.2401975, 5.0823119], (10, 1, False): [1.5927907,
-    1.8145253, 2.2828013, 2.8927966], (10, 1, True): [2.7222721, 3.009471, 
-    3.5990544, 4.3432975], (10, 2, False): [1.756145, 1.9744492, 2.4313123,
-    3.0218681], (10, 2, True): [2.7724339, 3.0440412, 3.6004793, 4.3015151],
-    (10, 3, False): [1.8248841, 2.0628201, 2.5606728, 3.2029316], (10, 3, 
-    True): [2.9416094, 3.239357, 3.8484916, 4.6144906], (10, 4, False): [
-    1.9833587, 2.2124939, 2.690228, 3.3020807], (10, 4, True): [2.9767752, 
-    3.2574924, 3.8317161, 4.5512138], (10, 5, False): [2.0779589, 2.3285481,
-    2.8499681, 3.5195753], (10, 5, True): [3.1649384, 3.4725945, 4.1003673,
-    4.8879723]}
+
+__all__ = ["large_p", "small_p", "crit_vals", "crit_percentiles", "stat_star"]
+
+large_p = {
+    (1, 1, False): [0.2231, 0.91426, 0.10102, 0.00569],
+    (1, 1, True): [-0.21766, 0.85933, 0.10411, 0.00661],
+    (1, 2, False): [-0.60796, 1.48713, 0.15076, 0.04453],
+    (1, 2, True): [-0.96204, 1.52593, 0.15996, 0.04166],
+    (1, 3, False): [-0.62883, 0.78991, 0.1, 0.00693],
+    (1, 3, True): [-0.91895, 0.82086, 0.12921, 0.01076],
+    (1, 4, False): [-1.50546, 1.79052, 0.05488, 0.06801],
+    (1, 4, True): [-1.79654, 1.8048, 0.06573, 0.06768],
+    (1, 5, False): [-1.36367, 0.94126, 0.21556, 0.02473],
+    (1, 5, True): [-1.60554, 0.93305, 0.2422, 0.03241],
+    (2, 1, False): [0.20576, 1.18914, 0.15731, 0.01144],
+    (2, 1, True): [-0.49024, 1.16958, 0.20564, 0.02008],
+    (2, 2, False): [-0.51799, 1.6368, 0.18955, 0.04317],
+    (2, 2, True): [-1.13394, 1.71056, 0.20442, 0.04195],
+    (2, 3, False): [-0.51712, 1.12963, 0.18936, 0.01808],
+    (2, 3, True): [-1.07441, 1.14964, 0.26066, 0.03338],
+    (2, 4, False): [-1.29895, 1.88501, 0.11734, 0.06615],
+    (2, 4, True): [-1.82455, 1.92207, 0.13753, 0.06269],
+    (2, 5, False): [-1.22263, 1.23208, 0.31401, 0.04495],
+    (2, 5, True): [-1.67689, 1.17567, 0.33606, 0.05898],
+    (3, 1, False): [0.1826, 1.39275, 0.19774, 0.01647],
+    (3, 1, True): [-0.71889, 1.39726, 0.29712, 0.03794],
+    (3, 2, False): [-0.45864, 1.77632, 0.22125, 0.04372],
+    (3, 2, True): [-1.28619, 1.88107, 0.23969, 0.04414],
+    (3, 3, False): [-0.45093, 1.38824, 0.26556, 0.03063],
+    (3, 3, True): [-1.22712, 1.36564, 0.34942, 0.05555],
+    (3, 4, False): [-1.15886, 1.99182, 0.16358, 0.06392],
+    (3, 4, True): [-1.88388, 2.05362, 0.18349, 0.06501],
+    (3, 5, False): [-1.11221, 1.44327, 0.3547, 0.05263],
+    (3, 5, True): [-1.75354, 1.37461, 0.3882, 0.07239],
+    (4, 1, False): [0.16431, 1.56391, 0.22944, 0.02067],
+    (4, 1, True): [-0.90799, 1.56908, 0.34763, 0.04814],
+    (4, 2, False): [-0.41568, 1.90715, 0.24783, 0.04407],
+    (4, 2, True): [-1.42373, 2.03902, 0.26907, 0.04755],
+    (4, 3, False): [-0.41104, 1.5716, 0.3066, 0.03842],
+    (4, 3, True): [-1.36194, 1.54043, 0.40145, 0.06846],
+    (4, 4, False): [-1.05651, 2.10007, 0.20201, 0.06129],
+    (4, 4, True): [-1.95474, 2.18305, 0.22527, 0.06441],
+    (4, 5, False): [-1.02502, 1.62605, 0.38203, 0.05565],
+    (4, 5, True): [-1.83458, 1.555, 0.42888, 0.07459],
+    (5, 1, False): [0.15015, 1.71718, 0.2584, 0.02507],
+    (5, 1, True): [-1.0707, 1.72829, 0.39037, 0.05468],
+    (5, 2, False): [-0.38277, 2.02985, 0.27139, 0.04513],
+    (5, 2, True): [-1.54974, 2.18631, 0.29592, 0.04967],
+    (5, 3, False): [-0.38023, 1.72586, 0.33033, 0.04188],
+    (5, 3, True): [-1.48415, 1.70271, 0.44016, 0.07248],
+    (5, 4, False): [-0.97676, 2.20429, 0.23233, 0.06543],
+    (5, 4, True): [-2.03144, 2.31343, 0.25394, 0.0675],
+    (5, 5, False): [-0.95421, 1.78775, 0.40239, 0.05642],
+    (5, 5, True): [-1.91679, 1.72031, 0.46434, 0.06641],
+    (6, 1, False): [0.13913, 1.8581, 0.28528, 0.02931],
+    (6, 1, True): [-1.21438, 1.87638, 0.42416, 0.05485],
+    (6, 2, False): [-0.35664, 2.14606, 0.29484, 0.04728],
+    (6, 2, True): [-1.66532, 2.32448, 0.31723, 0.05528],
+    (6, 3, False): [-0.35498, 1.86634, 0.35087, 0.04455],
+    (6, 3, True): [-1.59785, 1.85278, 0.47304, 0.07114],
+    (6, 4, False): [-0.91274, 2.30752, 0.26053, 0.0644],
+    (6, 4, True): [-2.10956, 2.43721, 0.2852, 0.06694],
+    (6, 5, False): [-0.89553, 1.9318, 0.41381, 0.05292],
+    (6, 5, True): [-1.99931, 1.87789, 0.49842, 0.04135],
+    (7, 1, False): [0.12974, 1.98503, 0.30606, 0.03218],
+    (7, 1, True): [-1.34555, 2.01647, 0.45456, 0.05018],
+    (7, 2, False): [-0.33519, 2.25631, 0.31659, 0.05016],
+    (7, 2, True): [-1.77496, 2.45806, 0.3372, 0.05741],
+    (7, 3, False): [-0.33377, 1.99554, 0.36742, 0.04624],
+    (7, 3, True): [-1.70381, 1.99863, 0.49883, 0.05092],
+    (7, 4, False): [-0.8596, 2.40762, 0.28334, 0.06401],
+    (7, 4, True): [-2.18704, 2.55828, 0.30627, 0.07091],
+    (7, 5, False): [-0.84606, 2.06291, 0.42505, 0.05152],
+    (7, 5, True): [-2.08097, 2.02139, 0.5348, 0.02343],
+    (8, 1, False): [0.12244, 2.10698, 0.32849, 0.03596],
+    (8, 1, True): [-1.46632, 2.1505, 0.48168, 0.04116],
+    (8, 2, False): [-0.31707, 2.36107, 0.33198, 0.04953],
+    (8, 2, True): [-1.87722, 2.58105, 0.35963, 0.05848],
+    (8, 3, False): [-0.31629, 2.11679, 0.38514, 0.04868],
+    (8, 3, True): [-1.80483, 2.13412, 0.52935, 0.03618],
+    (8, 4, False): [-0.81509, 2.50518, 0.30456, 0.06388],
+    (8, 4, True): [-2.26501, 2.67227, 0.33843, 0.06554],
+    (8, 5, False): [-0.80333, 2.18457, 0.42995, 0.0463],
+    (8, 5, True): [-2.16125, 2.15208, 0.58319, 0.0],
+    (9, 1, False): [0.11562, 2.22037, 0.34907, 0.03968],
+    (9, 1, True): [-1.57878, 2.27626, 0.5124, 0.03164],
+    (9, 2, False): [-0.30188, 2.46235, 0.35132, 0.05209],
+    (9, 2, True): [-1.97465, 2.70256, 0.37466, 0.06205],
+    (9, 3, False): [-0.30097, 2.23118, 0.39976, 0.05001],
+    (9, 3, True): [-1.90164, 2.26261, 0.56431, 0.0175],
+    (9, 4, False): [-0.77664, 2.59712, 0.32618, 0.06452],
+    (9, 4, True): [-2.33996, 2.78253, 0.36072, 0.06644],
+    (9, 5, False): [-0.76631, 2.2987, 0.43834, 0.04274],
+    (9, 5, True): [-2.23753, 2.27521, 0.60763, 0.0],
+    (10, 1, False): [0.10995, 2.3278, 0.36567, 0.04153],
+    (10, 1, True): [-1.6849, 2.39419, 0.5433, 0.02457],
+    (10, 2, False): [-0.28847, 2.55819, 0.36959, 0.05499],
+    (10, 2, True): [-2.06725, 2.81756, 0.38761, 0.0676],
+    (10, 3, False): [-0.28748, 2.33948, 0.41398, 0.05101],
+    (10, 3, True): [-1.99259, 2.38061, 0.59433, 0.01114],
+    (10, 4, False): [-0.74317, 2.68624, 0.345, 0.07032],
+    (10, 4, True): [-2.41409, 2.8931, 0.37487, 0.07102],
+    (10, 5, False): [-0.73464, 2.40692, 0.45153, 0.0434],
+    (10, 5, True): [-2.31364, 2.39092, 0.64313, -0.01012],
+}
+
+small_p = {
+    (1, 1, False): [0.2585, 0.92944, 0.25921],
+    (1, 1, True): [-0.17399, 0.88425, 0.29947],
+    (1, 2, False): [-0.45787, 1.15813, 0.37268],
+    (1, 2, True): [-0.76388, 1.13438, 0.39908],
+    (1, 3, False): [-0.57887, 0.87657, 0.32929],
+    (1, 3, True): [-0.88284, 0.81513, 0.366],
+    (1, 4, False): [-1.1926, 1.21061, 0.40386],
+    (1, 4, True): [-1.42909, 1.16607, 0.42899],
+    (1, 5, False): [-1.34428, 0.8756, 0.37809],
+    (1, 5, True): [-1.56285, 0.80464, 0.40703],
+    (2, 1, False): [0.23004, 1.12045, 0.31791],
+    (2, 1, True): [-0.45371, 1.06577, 0.38144],
+    (2, 2, False): [-0.41191, 1.36838, 0.39668],
+    (2, 2, True): [-0.9488, 1.32707, 0.44808],
+    (2, 3, False): [-0.49166, 1.11266, 0.36824],
+    (2, 3, True): [-1.03636, 1.04019, 0.42589],
+    (2, 4, False): [-1.08188, 1.42797, 0.42653],
+    (2, 4, True): [-1.52152, 1.36, 0.47256],
+    (2, 5, False): [-1.12408, 1.0565, 0.43505],
+    (2, 5, True): [-1.58614, 1.01208, 0.46796],
+    (3, 1, False): [0.20945, 1.29304, 0.36292],
+    (3, 1, True): [-0.60112, 1.139, 0.47837],
+    (3, 2, False): [-0.37491, 1.53959, 0.42397],
+    (3, 2, True): [-1.11163, 1.50639, 0.48662],
+    (3, 3, False): [-0.41411, 1.27093, 0.41524],
+    (3, 3, True): [-1.14285, 1.18673, 0.4906],
+    (3, 4, False): [-0.9946, 1.60793, 0.44771],
+    (3, 4, True): [-1.62609, 1.54566, 0.50619],
+    (3, 5, False): [-1.04988, 1.31372, 0.44802],
+    (3, 5, True): [-1.68976, 1.25316, 0.49896],
+    (4, 1, False): [0.18839, 1.46484, 0.39125],
+    (4, 1, True): [-0.81822, 1.35949, 0.50619],
+    (4, 2, False): [-0.35123, 1.705, 0.44075],
+    (4, 2, True): [-1.2591, 1.67286, 0.52021],
+    (4, 3, False): [-0.34716, 1.39436, 0.46391],
+    (4, 3, True): [-1.30728, 1.41428, 0.51292],
+    (4, 4, False): [-0.92783, 1.77056, 0.46587],
+    (4, 4, True): [-1.71493, 1.69609, 0.54221],
+    (4, 5, False): [-0.97468, 1.50704, 0.46661],
+    (4, 5, True): [-1.7783, 1.4453, 0.53112],
+    (5, 1, False): [0.17584, 1.60806, 0.424],
+    (5, 1, True): [-1.00705, 1.5668, 0.52487],
+    (5, 2, False): [-0.32186, 1.82909, 0.47183],
+    (5, 2, True): [-1.39492, 1.83145, 0.54756],
+    (5, 3, False): [-0.32204, 1.55407, 0.4884],
+    (5, 3, True): [-1.43499, 1.58772, 0.54359],
+    (5, 4, False): [-0.87005, 1.9128, 0.48361],
+    (5, 4, True): [-1.81929, 1.8594, 0.56629],
+    (5, 5, False): [-0.91534, 1.6826, 0.47972],
+    (5, 5, True): [-1.86297, 1.61238, 0.56196],
+    (6, 1, False): [0.16642, 1.7409, 0.45235],
+    (6, 1, True): [-1.15641, 1.72534, 0.55469],
+    (6, 2, False): [-0.31023, 1.97806, 0.47892],
+    (6, 2, True): [-1.52248, 1.98657, 0.56855],
+    (6, 3, False): [-0.30333, 1.70462, 0.50703],
+    (6, 3, True): [-1.5521, 1.74539, 0.57191],
+    (6, 4, False): [-0.82345, 2.04624, 0.50026],
+    (6, 4, True): [-1.90659, 1.99476, 0.59394],
+    (6, 5, False): [-0.85675, 1.81838, 0.50387],
+    (6, 5, True): [-1.92708, 1.73629, 0.60069],
+    (7, 1, False): [0.15013, 1.88779, 0.46397],
+    (7, 1, True): [-1.28169, 1.85521, 0.58877],
+    (7, 2, False): [-0.2904, 2.09042, 0.50233],
+    (7, 2, True): [-1.62626, 2.10378, 0.6013],
+    (7, 3, False): [-0.29138, 1.8506, 0.52083],
+    (7, 3, True): [-1.64831, 1.87115, 0.60523],
+    (7, 4, False): [-0.78647, 2.1757, 0.51247],
+    (7, 4, True): [-1.98344, 2.10977, 0.62411],
+    (7, 5, False): [-0.81099, 1.95374, 0.51949],
+    (7, 5, True): [-1.99875, 1.86512, 0.63051],
+    (8, 1, False): [0.14342, 2.00691, 0.48514],
+    (8, 1, True): [-1.3933, 1.97361, 0.62074],
+    (8, 2, False): [-0.27952, 2.20983, 0.51721],
+    (8, 2, True): [-1.74485, 2.25435, 0.61354],
+    (8, 3, False): [-0.28049, 1.98611, 0.53286],
+    (8, 3, True): [-1.74116, 1.99245, 0.63511],
+    (8, 4, False): [-0.74797, 2.28202, 0.53356],
+    (8, 4, True): [-2.07764, 2.25027, 0.64023],
+    (8, 5, False): [-0.76505, 2.06317, 0.54393],
+    (8, 5, True): [-2.04872, 1.95334, 0.67177],
+    (9, 1, False): [0.13505, 2.12341, 0.50439],
+    (9, 1, True): [-1.49339, 2.07805, 0.65464],
+    (9, 2, False): [-0.26881, 2.32256, 0.53025],
+    (9, 2, True): [-1.82677, 2.34223, 0.65004],
+    (9, 3, False): [-0.26657, 2.09906, 0.55384],
+    (9, 3, True): [-1.80085, 2.06043, 0.68234],
+    (9, 4, False): [-0.71672, 2.38896, 0.54931],
+    (9, 4, True): [-2.17306, 2.39146, 0.65252],
+    (9, 5, False): [-0.70907, 2.13027, 0.58668],
+    (9, 5, True): [-2.14411, 2.10595, 0.68478],
+    (10, 1, False): [0.12664, 2.23871, 0.51771],
+    (10, 1, True): [-1.59784, 2.19509, 0.67874],
+    (10, 2, False): [-0.25969, 2.4312, 0.54096],
+    (10, 2, True): [-1.93843, 2.48708, 0.65741],
+    (10, 3, False): [-0.25694, 2.21617, 0.56619],
+    (10, 3, True): [-1.89772, 2.1894, 0.70143],
+    (10, 4, False): [-0.69126, 2.49776, 0.5583],
+    (10, 4, True): [-2.24685, 2.4968, 0.67598],
+    (10, 5, False): [-0.6971, 2.28206, 0.57816],
+    (10, 5, True): [-2.21015, 2.208, 0.71379],
+}
+
+stat_star = {
+    (1, 1, False): 0.855423425047013,
+    (1, 1, True): 0.9074438436193457,
+    (1, 2, False): 2.3148213273461034,
+    (1, 2, True): 2.727010046970744,
+    (1, 3, False): 0.846390593107207,
+    (1, 3, True): 1.157556027201022,
+    (1, 4, False): 3.220377136548005,
+    (1, 4, True): 3.6108265020012418,
+    (1, 5, False): 1.7114703606421378,
+    (1, 5, True): 2.066325210881278,
+    (2, 1, False): 1.1268996107665314,
+    (2, 1, True): 1.3332514927355072,
+    (2, 2, False): 2.0512213167246456,
+    (2, 2, True): 2.656191837644102,
+    (2, 3, False): 1.058908331354388,
+    (2, 3, True): 1.5313322825819844,
+    (2, 4, False): 2.7213091542989725,
+    (2, 4, True): 3.2984645209852856,
+    (2, 5, False): 2.6006009671146497,
+    (2, 5, True): 2.661856653261213,
+    (3, 1, False): 1.263159095916295,
+    (3, 1, True): 2.4151349732452863,
+    (3, 2, False): 1.8886043232371843,
+    (3, 2, True): 2.6028096820968405,
+    (3, 3, False): 1.4879903191884682,
+    (3, 3, True): 2.2926969339773926,
+    (3, 4, False): 2.418527659154858,
+    (3, 4, True): 3.1039322592065988,
+    (3, 5, False): 1.9523612040944802,
+    (3, 5, True): 2.2115727453490757,
+    (4, 1, False): 1.290890114741129,
+    (4, 1, True): 2.1296963408410905,
+    (4, 2, False): 1.7770902061605607,
+    (4, 2, True): 2.5611885327765402,
+    (4, 3, False): 1.9340163095801728,
+    (4, 3, True): 1.9141318638062572,
+    (4, 4, False): 2.2146739201335466,
+    (4, 4, True): 2.9701790485477932,
+    (4, 5, False): 1.7408452994169448,
+    (4, 5, True): 2.1047247176583914,
+    (5, 1, False): 1.336967174239227,
+    (5, 1, True): 1.9131415178585627,
+    (5, 2, False): 1.6953274259688569,
+    (5, 2, True): 2.52745981091846,
+    (5, 3, False): 1.8124340908468068,
+    (5, 3, True): 1.8520883187848405,
+    (5, 4, False): 2.0675009559739297,
+    (5, 4, True): 2.8728076833515552,
+    (5, 5, False): 1.5978968362839456,
+    (5, 5, True): 2.1017517002543418,
+    (6, 1, False): 1.3810422398306446,
+    (6, 1, True): 1.8993612909227247,
+    (6, 2, False): 1.6324374150719114,
+    (6, 2, True): 2.498801004400209,
+    (6, 3, False): 1.72340094901749,
+    (6, 3, True): 1.8586513178563737,
+    (6, 4, False): 1.955819927102859,
+    (6, 4, True): 2.797145060481245,
+    (6, 5, False): 1.578613967104358,
+    (6, 5, True): 2.356249534336445,
+    (7, 1, False): 1.319436681229134,
+    (7, 1, True): 1.9955849619883248,
+    (7, 2, False): 1.5822190052675569,
+    (7, 2, True): 2.4744987764453055,
+    (7, 3, False): 1.65578510076754,
+    (7, 3, True): 2.046536484369615,
+    (7, 4, False): 1.8684573094851133,
+    (7, 4, True): 2.737241392502754,
+    (7, 5, False): 1.571855677342554,
+    (7, 5, True): 2.6006325210258505,
+    (8, 1, False): 1.3413558170956845,
+    (8, 1, True): 2.182981174661154,
+    (8, 2, False): 1.5416965902808288,
+    (8, 2, True): 2.4538471213095594,
+    (8, 3, False): 1.6021238307647196,
+    (8, 3, True): 2.2031866832480778,
+    (8, 4, False): 1.797595752125897,
+    (8, 4, True): 2.688099837236925,
+    (8, 5, False): 1.6561231184668357,
+    (8, 5, True): 2.883361281576836,
+    (9, 1, False): 1.3260368480749927,
+    (9, 1, True): 2.359689612641543,
+    (9, 2, False): 1.5074890058192492,
+    (9, 2, True): 2.435592395931648,
+    (9, 3, False): 1.5584090417965821,
+    (9, 3, True): 2.586293446202391,
+    (9, 4, False): 1.7393454428092985,
+    (9, 4, True): 2.6470908946956655,
+    (9, 5, False): 1.8180517504983742,
+    (9, 5, True): 2.818161371392247,
+    (10, 1, False): 1.3126519241806318,
+    (10, 1, True): 2.3499432601613885,
+    (10, 2, False): 1.4785447632683744,
+    (10, 2, True): 2.4199239298786215,
+    (10, 3, False): 1.5219767684407846,
+    (10, 3, True): 2.55484741648857,
+    (10, 4, False): 1.6902675233415512,
+    (10, 4, True): 2.6119272436084637,
+    (10, 5, False): 1.7372865030759366,
+    (10, 5, True): 2.7644864472524904,
+}
+
+crit_percentiles = (90, 95, 99, 99.9)
+
+crit_vals = {
+    (1, 1, False): [2.4170317, 3.119659, 4.7510799, 7.0838335],
+    (1, 1, True): [3.2538509, 4.0643748, 5.8825257, 8.4189144],
+    (1, 2, False): [3.0235968, 3.6115364, 4.9094056, 6.6859696],
+    (1, 2, True): [3.4943406, 4.1231394, 5.4961076, 7.3531815],
+    (1, 3, False): [4.044319, 4.9228967, 6.8609106, 9.5203666],
+    (1, 3, True): [4.7771822, 5.7217442, 7.7821227, 10.557471],
+    (1, 4, False): [4.0317707, 4.6921341, 6.1259225, 8.0467248],
+    (1, 4, True): [4.4725009, 5.169214, 6.668854, 8.6632132],
+    (1, 5, False): [5.5958071, 6.586727, 8.7355157, 11.6171903],
+    (1, 5, True): [6.2656898, 7.3133165, 9.5652229, 12.5537707],
+    (2, 1, False): [2.1562308, 2.6846692, 3.8773621, 5.5425892],
+    (2, 1, True): [3.1684785, 3.8003954, 5.177742, 7.0453814],
+    (2, 2, False): [2.6273503, 3.0998243, 4.1327001, 5.528847],
+    (2, 2, True): [3.3084134, 3.8345125, 4.9642009, 6.4657839],
+    (2, 3, False): [3.1741284, 3.8022629, 5.1722882, 7.0241224],
+    (2, 3, True): [4.108262, 4.8116858, 6.3220548, 8.322478],
+    (2, 4, False): [3.3668869, 3.8887628, 5.0115801, 6.5052326],
+    (2, 4, True): [4.0126604, 4.5835675, 5.7968684, 7.3887863],
+    (2, 5, False): [4.1863149, 4.8834936, 6.3813095, 8.3781415],
+    (2, 5, True): [5.053508, 5.8168869, 7.4384998, 9.565425],
+    (3, 1, False): [1.998571, 2.4316514, 3.3919322, 4.709226],
+    (3, 1, True): [3.0729965, 3.6016775, 4.7371358, 6.2398661],
+    (3, 2, False): [2.3813866, 2.7820412, 3.6486786, 4.8089784],
+    (3, 2, True): [3.1778198, 3.6364094, 4.6114583, 5.8888408],
+    (3, 3, False): [2.7295224, 3.2290217, 4.3110408, 5.7599206],
+    (3, 3, True): [3.7471556, 4.3222818, 5.5425521, 7.1435458],
+    (3, 4, False): [2.9636218, 3.4007434, 4.3358236, 5.5729155],
+    (3, 4, True): [3.7234883, 4.2135706, 5.247283, 6.5911207],
+    (3, 5, False): [3.4742551, 4.0219835, 5.1911046, 6.7348191],
+    (3, 5, True): [4.4323554, 5.0480574, 6.3448127, 8.0277313],
+    (4, 1, False): [1.8897829, 2.2616928, 3.0771215, 4.1837434],
+    (4, 1, True): [2.9925753, 3.4545032, 4.4326745, 5.7123835],
+    (4, 2, False): [2.2123295, 2.5633388, 3.3177874, 4.321218],
+    (4, 2, True): [3.0796353, 3.4898084, 4.3536497, 5.4747288],
+    (4, 3, False): [2.4565534, 2.877209, 3.7798528, 4.9852682],
+    (4, 3, True): [3.516144, 4.0104999, 5.0504684, 6.4022435],
+    (4, 4, False): [2.6902225, 3.0699099, 3.877333, 4.9405835],
+    (4, 4, True): [3.5231152, 3.9578931, 4.867071, 6.0403311],
+    (4, 5, False): [3.0443998, 3.5009718, 4.4707539, 5.7457746],
+    (4, 5, True): [4.0501255, 4.5739556, 5.6686684, 7.0814031],
+    (5, 1, False): [1.8104326, 2.1394999, 2.8541086, 3.8114409],
+    (5, 1, True): [2.9267613, 3.3396521, 4.2078599, 5.3342038],
+    (5, 2, False): [2.0879588, 2.40264, 3.0748083, 3.9596152],
+    (5, 2, True): [3.002768, 3.3764374, 4.1585099, 5.1657752],
+    (5, 3, False): [2.2702787, 2.6369717, 3.4203738, 4.4521021],
+    (5, 3, True): [3.3535243, 3.7914038, 4.7060983, 5.8841151],
+    (5, 4, False): [2.4928973, 2.831033, 3.5478855, 4.4836677],
+    (5, 4, True): [3.3756681, 3.7687148, 4.587147, 5.6351487],
+    (5, 5, False): [2.7536425, 3.149282, 3.985975, 5.0799181],
+    (5, 5, True): [3.7890425, 4.2501858, 5.2074857, 6.4355821],
+    (6, 1, False): [1.7483313, 2.0453753, 2.685931, 3.5375009],
+    (6, 1, True): [2.8719403, 3.2474515, 4.0322637, 5.0451946],
+    (6, 2, False): [1.9922451, 2.2792144, 2.8891314, 3.690865],
+    (6, 2, True): [2.9399824, 3.2851357, 4.0031551, 4.9247226],
+    (6, 3, False): [2.1343676, 2.4620175, 3.1585901, 4.0720179],
+    (6, 3, True): [3.2311014, 3.6271964, 4.4502999, 5.5018575],
+    (6, 4, False): [2.3423792, 2.6488947, 3.2947623, 4.1354724],
+    (6, 4, True): [3.2610813, 3.6218989, 4.3702232, 5.3232767],
+    (6, 5, False): [2.5446232, 2.8951601, 3.633989, 4.5935586],
+    (6, 5, True): [3.5984454, 4.0134462, 4.8709448, 5.9622726],
+    (7, 1, False): [1.6985327, 1.9707636, 2.5536649, 3.3259272],
+    (7, 1, True): [2.825928, 3.1725169, 3.8932738, 4.8134085],
+    (7, 2, False): [1.9155946, 2.1802812, 2.7408759, 3.4710326],
+    (7, 2, True): [2.8879427, 3.2093335, 3.8753322, 4.724748],
+    (7, 3, False): [2.0305429, 2.3281704, 2.9569345, 3.7788337],
+    (7, 3, True): [3.136325, 3.4999128, 4.2519893, 5.2075305],
+    (7, 4, False): [2.2246175, 2.5055486, 3.0962182, 3.86164],
+    (7, 4, True): [3.1695552, 3.5051856, 4.1974421, 5.073436],
+    (7, 5, False): [2.3861201, 2.7031072, 3.3680435, 4.2305443],
+    (7, 5, True): [3.4533491, 3.8323234, 4.613939, 5.6044399],
+    (8, 1, False): [1.6569223, 1.9092423, 2.4470718, 3.1537838],
+    (8, 1, True): [2.7862884, 3.1097259, 3.7785302, 4.6293176],
+    (8, 2, False): [1.8532862, 2.0996872, 2.6186041, 3.2930359],
+    (8, 2, True): [2.8435812, 3.1459955, 3.769165, 4.5623681],
+    (8, 3, False): [1.9480198, 2.2215083, 2.7979659, 3.54771],
+    (8, 3, True): [3.0595184, 3.3969531, 4.0923089, 4.9739178],
+    (8, 4, False): [2.1289147, 2.3893773, 2.9340882, 3.6390988],
+    (8, 4, True): [3.094188, 3.4085297, 4.0545165, 4.8699787],
+    (8, 5, False): [2.2616596, 2.5515168, 3.1586476, 3.9422645],
+    (8, 5, True): [3.3374076, 3.6880139, 4.407457, 5.3152095],
+    (9, 1, False): [1.6224492, 1.8578787, 2.3580077, 3.0112501],
+    (9, 1, True): [2.7520721, 3.0557346, 3.6811682, 4.4739536],
+    (9, 2, False): [1.8008993, 2.0320841, 2.5170871, 3.1451424],
+    (9, 2, True): [2.8053707, 3.091422, 3.6784683, 4.4205306],
+    (9, 3, False): [1.8811231, 2.1353897, 2.6683796, 3.358463],
+    (9, 3, True): [2.9957112, 3.3114482, 3.9596061, 4.7754473],
+    (9, 4, False): [2.0498497, 2.2930641, 2.8018384, 3.4543646],
+    (9, 4, True): [3.0308611, 3.3269185, 3.9347618, 4.6993614],
+    (9, 5, False): [2.1610306, 2.4296727, 2.98963, 3.7067719],
+    (9, 5, True): [3.2429533, 3.5699095, 4.2401975, 5.0823119],
+    (10, 1, False): [1.5927907, 1.8145253, 2.2828013, 2.8927966],
+    (10, 1, True): [2.7222721, 3.009471, 3.5990544, 4.3432975],
+    (10, 2, False): [1.756145, 1.9744492, 2.4313123, 3.0218681],
+    (10, 2, True): [2.7724339, 3.0440412, 3.6004793, 4.3015151],
+    (10, 3, False): [1.8248841, 2.0628201, 2.5606728, 3.2029316],
+    (10, 3, True): [2.9416094, 3.239357, 3.8484916, 4.6144906],
+    (10, 4, False): [1.9833587, 2.2124939, 2.690228, 3.3020807],
+    (10, 4, True): [2.9767752, 3.2574924, 3.8317161, 4.5512138],
+    (10, 5, False): [2.0779589, 2.3285481, 2.8499681, 3.5195753],
+    (10, 5, True): [3.1649384, 3.4725945, 4.1003673, 4.8879723],
+}
diff --git a/statsmodels/tsa/arima/api.py b/statsmodels/tsa/arima/api.py
index 3d8ef9791..6d85e94b4 100644
--- a/statsmodels/tsa/arima/api.py
+++ b/statsmodels/tsa/arima/api.py
@@ -1,2 +1,3 @@
-__all__ = ['ARIMA']
+__all__ = ["ARIMA"]
+
 from statsmodels.tsa.arima.model import ARIMA
diff --git a/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/dowj.py b/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/dowj.py
index 8769d769c..b268af6ba 100644
--- a/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/dowj.py
+++ b/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/dowj.py
@@ -14,14 +14,17 @@ References
 .. [1] Brockwell, Peter J., and Richard A. Davis. 2016.
    Introduction to Time Series and Forecasting. Springer.
 .. [2] Brockwell, Peter J., and Richard A. Davis. n.d. ITSM2000.
-"""
+"""  # noqa:E501
+
 import pandas as pd
-dowj = pd.Series([110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46, 
-    110.56, 110.46, 110.05, 109.6, 109.31, 109.31, 109.25, 109.02, 108.54, 
-    108.77, 109.02, 109.44, 109.38, 109.53, 109.89, 110.56, 110.56, 110.72,
-    111.23, 111.48, 111.58, 111.9, 112.19, 112.06, 111.96, 111.68, 111.36, 
-    111.42, 112, 112.22, 112.7, 113.15, 114.36, 114.65, 115.06, 115.86, 
-    116.4, 116.44, 116.88, 118.07, 118.51, 119.28, 119.79, 119.7, 119.28, 
-    119.66, 120.14, 120.97, 121.13, 121.55, 121.96, 122.26, 123.79, 124.11,
-    124.14, 123.37, 123.02, 122.86, 123.02, 123.11, 123.05, 123.05, 122.83,
-    123.18, 122.67, 122.73, 122.86, 122.67, 122.09, 122, 121.23])
+
+dowj = pd.Series([
+    110.94, 110.69, 110.43, 110.56, 110.75, 110.84, 110.46, 110.56, 110.46,
+    110.05, 109.6, 109.31, 109.31, 109.25, 109.02, 108.54, 108.77, 109.02,
+    109.44, 109.38, 109.53, 109.89, 110.56, 110.56, 110.72, 111.23, 111.48,
+    111.58, 111.9, 112.19, 112.06, 111.96, 111.68, 111.36, 111.42, 112,
+    112.22, 112.7, 113.15, 114.36, 114.65, 115.06, 115.86, 116.4, 116.44,
+    116.88, 118.07, 118.51, 119.28, 119.79, 119.7, 119.28, 119.66, 120.14,
+    120.97, 121.13, 121.55, 121.96, 122.26, 123.79, 124.11, 124.14, 123.37,
+    123.02, 122.86, 123.02, 123.11, 123.05, 123.05, 122.83, 123.18, 122.67,
+    122.73, 122.86, 122.67, 122.09, 122, 121.23])
diff --git a/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/lake.py b/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/lake.py
index b15b4a992..753a001bc 100755
--- a/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/lake.py
+++ b/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/lake.py
@@ -11,14 +11,17 @@ References
    Introduction to Time Series and Forecasting. Springer.
 .. [2] Brockwell, Peter J., and Richard A. Davis. n.d. ITSM2000.
 """
+
 import pandas as pd
-lake = pd.Series([10.38, 11.86, 10.97, 10.8, 9.79, 10.39, 10.42, 10.82, 
-    11.4, 11.32, 11.44, 11.68, 11.17, 10.53, 10.01, 9.91, 9.14, 9.16, 9.55,
-    9.67, 8.44, 8.24, 9.1, 9.09, 9.35, 8.82, 9.32, 9.01, 9, 9.8, 9.83, 9.72,
-    9.89, 10.01, 9.37, 8.69, 8.19, 8.67, 9.55, 8.92, 8.09, 9.37, 10.13, 
-    10.14, 9.51, 9.24, 8.66, 8.86, 8.05, 7.79, 6.75, 6.75, 7.82, 8.64, 
-    10.58, 9.48, 7.38, 6.9, 6.94, 6.24, 6.84, 6.85, 6.9, 7.79, 8.18, 7.51, 
-    7.23, 8.42, 9.61, 9.05, 9.26, 9.22, 9.38, 9.1, 7.95, 8.12, 9.75, 10.85,
-    10.41, 9.96, 9.61, 8.76, 8.18, 7.21, 7.13, 9.1, 8.25, 7.91, 6.89, 5.96,
-    6.8, 7.68, 8.38, 8.52, 9.74, 9.31, 9.89, 9.96], index=pd.period_range(
-    start='1875', end='1972', freq='Y').to_timestamp())
+
+lake = pd.Series([
+    10.38, 11.86, 10.97, 10.8, 9.79, 10.39, 10.42, 10.82, 11.4, 11.32, 11.44,
+    11.68, 11.17, 10.53, 10.01, 9.91, 9.14, 9.16, 9.55, 9.67, 8.44, 8.24, 9.1,
+    9.09, 9.35, 8.82, 9.32, 9.01, 9, 9.8, 9.83, 9.72, 9.89, 10.01, 9.37, 8.69,
+    8.19, 8.67, 9.55, 8.92, 8.09, 9.37, 10.13, 10.14, 9.51, 9.24, 8.66, 8.86,
+    8.05, 7.79, 6.75, 6.75, 7.82, 8.64, 10.58, 9.48, 7.38, 6.9, 6.94, 6.24,
+    6.84, 6.85, 6.9, 7.79, 8.18, 7.51, 7.23, 8.42, 9.61, 9.05, 9.26, 9.22,
+    9.38, 9.1, 7.95, 8.12, 9.75, 10.85, 10.41, 9.96, 9.61, 8.76, 8.18, 7.21,
+    7.13, 9.1, 8.25, 7.91, 6.89, 5.96, 6.8, 7.68, 8.38, 8.52, 9.74, 9.31,
+    9.89, 9.96],
+    index=pd.period_range(start='1875', end='1972', freq='Y').to_timestamp())
diff --git a/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/oshorts.py b/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/oshorts.py
index ada54a694..312c299b5 100644
--- a/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/oshorts.py
+++ b/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/oshorts.py
@@ -12,8 +12,11 @@ References
    Introduction to Time Series and Forecasting. Springer.
 .. [2] Brockwell, Peter J., and Richard A. Davis. n.d. ITSM2000.
 """
+
 import pandas as pd
-oshorts = pd.Series([78, -58, 53, -65, 13, -6, -16, -14, 3, -72, 89, -48, -
-    14, 32, 56, -86, -66, 50, 26, 59, -47, -83, 2, -1, 124, -106, 113, -76,
-    -47, -32, 39, -30, 6, -73, 18, 2, -24, 23, -38, 91, -56, -58, 1, 14, -4,
-    77, -127, 97, 10, -28, -17, 23, -2, 48, -131, 65, -17])
+
+oshorts = pd.Series([
+    78, -58, 53, -65, 13, -6, -16, -14, 3, -72, 89, -48, -14, 32, 56, -86,
+    -66, 50, 26, 59, -47, -83, 2, -1, 124, -106, 113, -76, -47, -32, 39,
+    -30, 6, -73, 18, 2, -24, 23, -38, 91, -56, -58, 1, 14, -4, 77, -127, 97,
+    10, -28, -17, 23, -2, 48, -131, 65, -17])
diff --git a/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/sbl.py b/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/sbl.py
index e859991ed..9093c847d 100644
--- a/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/sbl.py
+++ b/statsmodels/tsa/arima/datasets/brockwell_davis_2002/data/sbl.py
@@ -12,16 +12,18 @@ References
    Introduction to Time Series and Forecasting. Springer.
 .. [2] Brockwell, Peter J., and Richard A. Davis. n.d. ITSM2000.
 """
+
 import pandas as pd
-sbl = pd.Series([1577, 1356, 1652, 1382, 1519, 1421, 1442, 1543, 1656, 1561,
-    1905, 2199, 1473, 1655, 1407, 1395, 1530, 1309, 1526, 1327, 1627, 1748,
-    1958, 2274, 1648, 1401, 1411, 1403, 1394, 1520, 1528, 1643, 1515, 1685,
-    2000, 2215, 1956, 1462, 1563, 1459, 1446, 1622, 1657, 1638, 1643, 1683,
-    2050, 2262, 1813, 1445, 1762, 1461, 1556, 1431, 1427, 1554, 1645, 1653,
-    2016, 2207, 1665, 1361, 1506, 1360, 1453, 1522, 1460, 1552, 1548, 1827,
-    1737, 1941, 1474, 1458, 1542, 1404, 1522, 1385, 1641, 1510, 1681, 1938,
-    1868, 1726, 1456, 1445, 1456, 1365, 1487, 1558, 1488, 1684, 1594, 1850,
-    1998, 2079, 1494, 1057, 1218, 1168, 1236, 1076, 1174, 1139, 1427, 1487,
-    1483, 1513, 1357, 1165, 1282, 1110, 1297, 1185, 1222, 1284, 1444, 1575,
-    1737, 1763], index=pd.date_range(start='1975-01-01', end='1984-12-01',
-    freq='MS'))
+
+sbl = pd.Series([
+    1577, 1356, 1652, 1382, 1519, 1421, 1442, 1543, 1656, 1561, 1905, 2199,
+    1473, 1655, 1407, 1395, 1530, 1309, 1526, 1327, 1627, 1748, 1958, 2274,
+    1648, 1401, 1411, 1403, 1394, 1520, 1528, 1643, 1515, 1685, 2000, 2215,
+    1956, 1462, 1563, 1459, 1446, 1622, 1657, 1638, 1643, 1683, 2050, 2262,
+    1813, 1445, 1762, 1461, 1556, 1431, 1427, 1554, 1645, 1653, 2016, 2207,
+    1665, 1361, 1506, 1360, 1453, 1522, 1460, 1552, 1548, 1827, 1737, 1941,
+    1474, 1458, 1542, 1404, 1522, 1385, 1641, 1510, 1681, 1938, 1868, 1726,
+    1456, 1445, 1456, 1365, 1487, 1558, 1488, 1684, 1594, 1850, 1998, 2079,
+    1494, 1057, 1218, 1168, 1236, 1076, 1174, 1139, 1427, 1487, 1483, 1513,
+    1357, 1165, 1282, 1110, 1297, 1185, 1222, 1284, 1444, 1575, 1737, 1763],
+    index=pd.date_range(start='1975-01-01', end='1984-12-01', freq='MS'))
diff --git a/statsmodels/tsa/arima/estimators/burg.py b/statsmodels/tsa/arima/estimators/burg.py
index 32a8a1c96..148f040cf 100644
--- a/statsmodels/tsa/arima/estimators/burg.py
+++ b/statsmodels/tsa/arima/estimators/burg.py
@@ -5,8 +5,10 @@ Author: Chad Fulton
 License: BSD-3
 """
 import numpy as np
+
 from statsmodels.tools.tools import Bunch
 from statsmodels.regression import linear_model
+
 from statsmodels.tsa.arima.specification import SARIMAXSpecification
 from statsmodels.tsa.arima.params import SARIMAXParams

@@ -46,4 +48,30 @@ def burg(endog, ar_order=0, demean=True):
     .. [1] Brockwell, Peter J., and Richard A. Davis. 2016.
        Introduction to Time Series and Forecasting. Springer.
     """
-    pass
+    spec = SARIMAXSpecification(endog, ar_order=ar_order)
+    endog = spec.endog
+
+    # Workaround for statsmodels.tsa.stattools.pacf_burg which does not work
+    # on integer input
+    # TODO: remove when possible
+    if np.issubdtype(endog.dtype, np.dtype(int)):
+        endog = endog * 1.0
+
+    if not spec.is_ar_consecutive:
+        raise ValueError('Burg estimation unavailable for models with'
+                         ' seasonal or otherwise non-consecutive AR orders.')
+
+    p = SARIMAXParams(spec=spec)
+
+    if ar_order == 0:
+        p.sigma2 = np.var(endog)
+    else:
+        p.ar_params, p.sigma2 = linear_model.burg(endog, order=ar_order,
+                                                  demean=demean)
+
+    # Construct other results
+    other_results = Bunch({
+        'spec': spec,
+    })
+
+    return p, other_results
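
A minimal usage sketch of the `burg` estimator defined above, assuming the import path from the diff header; the simulated AR(1) series and its coefficient are illustrative only:

```python
import numpy as np

from statsmodels.tsa.arima.estimators.burg import burg

# Simulate an AR(1) process y_t = 0.5 * y_{t-1} + e_t (illustrative values)
rng = np.random.default_rng(12345)
e = rng.standard_normal(200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 * y[t - 1] + e[t]

# Returns a SARIMAXParams object and a Bunch holding the specification
p, other = burg(y, ar_order=1, demean=True)
print(p.ar_params, p.sigma2)
```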
diff --git a/statsmodels/tsa/arima/estimators/durbin_levinson.py b/statsmodels/tsa/arima/estimators/durbin_levinson.py
index 18dbb58b9..d76673008 100644
--- a/statsmodels/tsa/arima/estimators/durbin_levinson.py
+++ b/statsmodels/tsa/arima/estimators/durbin_levinson.py
@@ -5,14 +5,16 @@ Author: Chad Fulton
 License: BSD-3
 """
 from statsmodels.compat.pandas import deprecate_kwarg
+
 import numpy as np
+
 from statsmodels.tools.tools import Bunch
 from statsmodels.tsa.arima.params import SARIMAXParams
 from statsmodels.tsa.arima.specification import SARIMAXSpecification
 from statsmodels.tsa.stattools import acovf


-@deprecate_kwarg('unbiased', 'adjusted')
+@deprecate_kwarg("unbiased", "adjusted")
 def durbin_levinson(endog, ar_order=0, demean=True, adjusted=False):
     """
     Estimate AR parameters at multiple orders using Durbin-Levinson recursions.
@@ -52,4 +54,54 @@ def durbin_levinson(endog, ar_order=0, demean=True, adjusted=False):
     .. [1] Brockwell, Peter J., and Richard A. Davis. 2016.
        Introduction to Time Series and Forecasting. Springer.
     """
-    pass
+    spec = max_spec = SARIMAXSpecification(endog, ar_order=ar_order)
+    endog = max_spec.endog
+
+    # Make sure we have a consecutive process
+    if not max_spec.is_ar_consecutive:
+        raise ValueError('Durbin-Levinson estimation unavailable for models'
+                         ' with seasonal or otherwise non-consecutive AR'
+                         ' orders.')
+
+    gamma = acovf(endog, adjusted=adjusted, fft=True, demean=demean,
+                  nlag=max_spec.ar_order)
+
+    # If no AR component, just a variance computation
+    if max_spec.ar_order == 0:
+        ar_params = [None]
+        sigma2 = [gamma[0]]
+    # Otherwise, AR model
+    else:
+        Phi = np.zeros((max_spec.ar_order, max_spec.ar_order))
+        v = np.zeros(max_spec.ar_order + 1)
+
+        Phi[0, 0] = gamma[1] / gamma[0]
+        v[0] = gamma[0]
+        v[1] = v[0] * (1 - Phi[0, 0]**2)
+
+        for i in range(1, max_spec.ar_order):
+            tmp = Phi[i-1, :i]
+            Phi[i, i] = (gamma[i + 1] - np.dot(tmp, gamma[i:0:-1])) / v[i]
+            Phi[i, :i] = (tmp - Phi[i, i] * tmp[::-1])
+            v[i + 1] = v[i] * (1 - Phi[i, i]**2)
+
+        ar_params = [None] + [Phi[i, :i + 1] for i in range(max_spec.ar_order)]
+        sigma2 = v
+
+    # Compute output
+    out = []
+    for i in range(max_spec.ar_order + 1):
+        spec = SARIMAXSpecification(ar_order=i)
+        p = SARIMAXParams(spec=spec)
+        if i == 0:
+            p.params = sigma2[i]
+        else:
+            p.params = np.r_[ar_params[i], sigma2[i]]
+        out.append(p)
+
+    # Construct other results
+    other_results = Bunch({
+        'spec': spec,
+    })
+
+    return out, other_results
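
The Durbin-Levinson recursion above fits AR models of every order up to `ar_order` from the sample autocovariances. A short sketch of how it might be called, with an assumed AR(1) series for illustration:

```python
import numpy as np

from statsmodels.tsa.arima.estimators.durbin_levinson import durbin_levinson

rng = np.random.default_rng(0)
e = rng.standard_normal(300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + e[t]

# `out` holds one SARIMAXParams per order 0, 1, ..., ar_order;
# out[0] contains only the innovation-variance estimate.
out, other = durbin_levinson(y, ar_order=3)
for k, p in enumerate(out):
    print(k, p.params)
```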
diff --git a/statsmodels/tsa/arima/estimators/gls.py b/statsmodels/tsa/arima/estimators/gls.py
index fcf6fb12e..2d803cb52 100644
--- a/statsmodels/tsa/arima/estimators/gls.py
+++ b/statsmodels/tsa/arima/estimators/gls.py
@@ -6,22 +6,26 @@ License: BSD-3
 """
 import numpy as np
 import warnings
+
 from statsmodels.tools.tools import add_constant, Bunch
 from statsmodels.regression.linear_model import OLS
 from statsmodels.tsa.innovations import arma_innovations
 from statsmodels.tsa.statespace.tools import diff
+
 from statsmodels.tsa.arima.estimators.yule_walker import yule_walker
 from statsmodels.tsa.arima.estimators.burg import burg
 from statsmodels.tsa.arima.estimators.hannan_rissanen import hannan_rissanen
-from statsmodels.tsa.arima.estimators.innovations import innovations, innovations_mle
+from statsmodels.tsa.arima.estimators.innovations import (
+    innovations, innovations_mle)
 from statsmodels.tsa.arima.estimators.statespace import statespace
+
 from statsmodels.tsa.arima.specification import SARIMAXSpecification
 from statsmodels.tsa.arima.params import SARIMAXParams


 def gls(endog, exog=None, order=(0, 0, 0), seasonal_order=(0, 0, 0, 0),
-    include_constant=None, n_iter=None, max_iter=50, tolerance=1e-08,
-    arma_estimator='innovations_mle', arma_estimator_kwargs=None):
+        include_constant=None, n_iter=None, max_iter=50, tolerance=1e-8,
+        arma_estimator='innovations_mle', arma_estimator_kwargs=None):
     """
     Estimate ARMAX parameters by GLS.

@@ -98,4 +102,214 @@ def gls(endog, exog=None, order=(0, 0, 0), seasonal_order=(0, 0, 0, 0),
     .. [1] Brockwell, Peter J., and Richard A. Davis. 2016.
        Introduction to Time Series and Forecasting. Springer.
     """
-    pass
+    # Handle n_iter
+    if n_iter is not None:
+        max_iter = n_iter
+        tolerance = np.inf
+
+    # Default for include_constant is True if there is no integration and
+    # False otherwise
+    integrated = order[1] > 0 or seasonal_order[1] > 0
+    if include_constant is None:
+        include_constant = not integrated
+    elif include_constant and integrated:
+        raise ValueError('Cannot include a constant in an integrated model.')
+
+    # Handle including the constant (need to do it now so that the constant
+    # parameter can be included in the specification as part of `exog`.)
+    if include_constant:
+        exog = np.ones_like(endog) if exog is None else add_constant(exog)
+
+    # Create the SARIMAX specification
+    spec = SARIMAXSpecification(endog, exog=exog, order=order,
+                                seasonal_order=seasonal_order)
+    endog = spec.endog
+    exog = spec.exog
+
+    # Handle integration
+    if spec.is_integrated:
+        # TODO: this is the approach suggested by BD (see Remark 1 in
+        # section 6.6.2 and Example 6.6.3), but maybe there are some cases
+        # where we don't want to force this behavior on the user?
+        warnings.warn('Provided `endog` and `exog` series have been'
+                      ' differenced to eliminate integration prior to GLS'
+                      ' parameter estimation.')
+        endog = diff(endog, k_diff=spec.diff,
+                     k_seasonal_diff=spec.seasonal_diff,
+                     seasonal_periods=spec.seasonal_periods)
+        exog = diff(exog, k_diff=spec.diff,
+                    k_seasonal_diff=spec.seasonal_diff,
+                    seasonal_periods=spec.seasonal_periods)
+    augmented = np.c_[endog, exog]
+
+    # Validate arma_estimator
+    spec.validate_estimator(arma_estimator)
+    if arma_estimator_kwargs is None:
+        arma_estimator_kwargs = {}
+
+    # Step 1: OLS
+    mod_ols = OLS(endog, exog)
+    res_ols = mod_ols.fit()
+    exog_params = res_ols.params
+    resid = res_ols.resid
+
+    # 0th iteration parameters
+    p = SARIMAXParams(spec=spec)
+    p.exog_params = exog_params
+    if spec.max_ar_order > 0:
+        p.ar_params = np.zeros(spec.k_ar_params)
+    if spec.max_seasonal_ar_order > 0:
+        p.seasonal_ar_params = np.zeros(spec.k_seasonal_ar_params)
+    if spec.max_ma_order > 0:
+        p.ma_params = np.zeros(spec.k_ma_params)
+    if spec.max_seasonal_ma_order > 0:
+        p.seasonal_ma_params = np.zeros(spec.k_seasonal_ma_params)
+    p.sigma2 = res_ols.scale
+
+    ar_params = p.ar_params
+    seasonal_ar_params = p.seasonal_ar_params
+    ma_params = p.ma_params
+    seasonal_ma_params = p.seasonal_ma_params
+    sigma2 = p.sigma2
+
+    # Step 2 - 4: iterate feasible GLS to convergence
+    arma_results = [None]
+    differences = [None]
+    parameters = [p]
+    converged = False if n_iter is None else None
+    i = 0
+
+    def _check_arma_estimator_kwargs(kwargs, method):
+        if kwargs:
+            raise ValueError(
+                f"arma_estimator_kwargs not supported for method {method}"
+            )
+
+    for i in range(1, max_iter + 1):
+        prev = exog_params
+
+        # Step 2: ARMA
+        # TODO: allow estimator-specific kwargs?
+        if arma_estimator == 'yule_walker':
+            p_arma, res_arma = yule_walker(
+                resid, ar_order=spec.ar_order, demean=False,
+                **arma_estimator_kwargs)
+        elif arma_estimator == 'burg':
+            _check_arma_estimator_kwargs(arma_estimator_kwargs, "burg")
+            p_arma, res_arma = burg(resid, ar_order=spec.ar_order,
+                                    demean=False)
+        elif arma_estimator == 'innovations':
+            _check_arma_estimator_kwargs(arma_estimator_kwargs, "innovations")
+            out, res_arma = innovations(resid, ma_order=spec.ma_order,
+                                        demean=False)
+            p_arma = out[-1]
+        elif arma_estimator == 'hannan_rissanen':
+            p_arma, res_arma = hannan_rissanen(
+                resid, ar_order=spec.ar_order, ma_order=spec.ma_order,
+                demean=False, **arma_estimator_kwargs)
+        else:
+            # For later iterations, use a "warm start" for parameter estimates
+            # (speeds up estimation and convergence)
+            start_params = (
+                None if i == 1 else np.r_[ar_params, ma_params,
+                                          seasonal_ar_params,
+                                          seasonal_ma_params, sigma2])
+            # Note: in each case, we do not pass in the order of integration
+            # since we have already differenced the series
+            tmp_order = (spec.order[0], 0, spec.order[2])
+            tmp_seasonal_order = (spec.seasonal_order[0], 0,
+                                  spec.seasonal_order[2],
+                                  spec.seasonal_order[3])
+            if arma_estimator == 'innovations_mle':
+                p_arma, res_arma = innovations_mle(
+                    resid, order=tmp_order, seasonal_order=tmp_seasonal_order,
+                    demean=False, start_params=start_params,
+                    **arma_estimator_kwargs)
+            else:
+                p_arma, res_arma = statespace(
+                    resid, order=tmp_order, seasonal_order=tmp_seasonal_order,
+                    include_constant=False, start_params=start_params,
+                    **arma_estimator_kwargs)
+
+        ar_params = p_arma.ar_params
+        seasonal_ar_params = p_arma.seasonal_ar_params
+        ma_params = p_arma.ma_params
+        seasonal_ma_params = p_arma.seasonal_ma_params
+        sigma2 = p_arma.sigma2
+        arma_results.append(res_arma)
+
+        # Step 3: GLS
+        # Compute transformed variables that satisfy OLS assumptions
+        # Note: In section 6.1.1 of Brockwell and Davis (2016), these
+        # transformations are developed as computed by left multiplication
+        # by a matrix T. However, explicitly constructing T and then
+        # performing the left-multiplications does not scale well when nobs is
+        # large. Instead, we can retrieve the transformed variables as the
+        # residuals of the innovations algorithm (the `normalize=True`
+        # argument applies a Prais-Winsten-type normalization to the first few
+        # observations to ensure homoskedasticity). Brockwell and Davis
+        # mention that they also take this approach in practice.
+
+        # GH-6540: AR must be stationary
+
+        if not p_arma.is_stationary:
+            raise ValueError(
+                "Roots of the autoregressive parameters indicate that data is"
+                "non-stationary. GLS cannot be used with non-stationary "
+                "parameters. You should consider differencing the model data"
+                "or applying a nonlinear transformation (e.g., natural log)."
+            )
+        tmp, _ = arma_innovations.arma_innovations(
+            augmented, ar_params=ar_params, ma_params=ma_params,
+            normalize=True)
+        u = tmp[:, 0]
+        x = tmp[:, 1:]
+
+        # OLS on transformed variables
+        mod_gls = OLS(u, x)
+        res_gls = mod_gls.fit()
+        exog_params = res_gls.params
+        resid = endog - np.dot(exog, exog_params)
+
+        # Construct the parameter vector for the iteration
+        p = SARIMAXParams(spec=spec)
+        p.exog_params = exog_params
+        if spec.max_ar_order > 0:
+            p.ar_params = ar_params
+        if spec.max_seasonal_ar_order > 0:
+            p.seasonal_ar_params = seasonal_ar_params
+        if spec.max_ma_order > 0:
+            p.ma_params = ma_params
+        if spec.max_seasonal_ma_order > 0:
+            p.seasonal_ma_params = seasonal_ma_params
+        p.sigma2 = sigma2
+        parameters.append(p)
+
+        # Check for convergence
+        difference = np.abs(exog_params - prev)
+        differences.append(difference)
+        if n_iter is None and np.all(difference < tolerance):
+            converged = True
+            break
+    else:
+        if n_iter is None:
+            warnings.warn('Feasible GLS failed to converge in %d iterations.'
+                          ' Consider increasing the maximum number of'
+                          ' iterations using the `max_iter` argument or'
+                          ' reducing the required tolerance using the'
+                          ' `tolerance` argument.' % max_iter)
+
+    # Construct final results
+    p = parameters[-1]
+    other_results = Bunch({
+        'spec': spec,
+        'params': parameters,
+        'converged': converged,
+        'differences': differences,
+        'iterations': i,
+        'arma_estimator': arma_estimator,
+        'arma_estimator_kwargs': arma_estimator_kwargs,
+        'arma_results': arma_results,
+    })
+
+    return p, other_results
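
The feasible GLS loop above alternates OLS on transformed variables with ARMA estimation on the OLS residuals until the regression coefficients stop changing. A hedged sketch of a call with simulated ARMAX data (all values illustrative):

```python
import numpy as np

from statsmodels.tsa.arima.estimators.gls import gls

rng = np.random.default_rng(0)
nobs = 200
x = rng.standard_normal(nobs)

# y_t = 1.0 + 2.0 * x_t + u_t with AR(1) errors u_t = 0.5 * u_{t-1} + e_t
e = rng.standard_normal(nobs)
u = np.zeros(nobs)
for t in range(1, nobs):
    u[t] = 0.5 * u[t - 1] + e[t]
y = 1.0 + 2.0 * x + u

p, other = gls(y, exog=x, order=(1, 0, 0), include_constant=True)
print(p.exog_params, p.ar_params)         # constant/slope and AR coefficient
print(other.converged, other.iterations)  # convergence diagnostics
```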
diff --git a/statsmodels/tsa/arima/estimators/hannan_rissanen.py b/statsmodels/tsa/arima/estimators/hannan_rissanen.py
index 908ad896b..90db197d9 100644
--- a/statsmodels/tsa/arima/estimators/hannan_rissanen.py
+++ b/statsmodels/tsa/arima/estimators/hannan_rissanen.py
@@ -5,16 +5,19 @@ Author: Chad Fulton
 License: BSD-3
 """
 import numpy as np
+
 from scipy.signal import lfilter
 from statsmodels.tools.tools import Bunch
 from statsmodels.regression.linear_model import OLS, yule_walker
 from statsmodels.tsa.tsatools import lagmat
+
 from statsmodels.tsa.arima.specification import SARIMAXSpecification
 from statsmodels.tsa.arima.params import SARIMAXParams


 def hannan_rissanen(endog, ar_order=0, ma_order=0, demean=True,
-    initial_ar_order=None, unbiased=None, fixed_params=None):
+                    initial_ar_order=None, unbiased=None,
+                    fixed_params=None):
     """
     Estimate ARMA parameters using Hannan-Rissanen procedure.

@@ -90,7 +93,211 @@ def hannan_rissanen(endog, ar_order=0, ma_order=0, demean=True,
        "Automatic Modeling Methods for Univariate Series."
        A Course in Time Series Analysis, 171–201.
     """
-    pass
+    spec = SARIMAXSpecification(endog, ar_order=ar_order, ma_order=ma_order)
+
+    fixed_params = _validate_fixed_params(fixed_params, spec.param_names)
+
+    endog = spec.endog
+    if demean:
+        endog = endog - endog.mean()
+
+    p = SARIMAXParams(spec=spec)
+
+    nobs = len(endog)
+    max_ar_order = spec.max_ar_order
+    max_ma_order = spec.max_ma_order
+
+    # Default initial_ar_order is as suggested by Gomez and Maravall (2001)
+    if initial_ar_order is None:
+        initial_ar_order = max(np.floor(np.log(nobs)**2).astype(int),
+                               2 * max(max_ar_order, max_ma_order))
+    # Create a spec, just to validate the initial autoregressive order
+    _ = SARIMAXSpecification(endog, ar_order=initial_ar_order)
+
+    # Unpack fixed and free ar/ma lags, ix, and params (fixed only)
+    params_info = _package_fixed_and_free_params_info(
+        fixed_params, spec.ar_lags, spec.ma_lags
+    )
+
+    # Compute lagged endog
+    lagged_endog = lagmat(endog, max_ar_order, trim='both')
+
+    # If no AR or MA components, this is just a variance computation
+    mod = None
+    if max_ma_order == 0 and max_ar_order == 0:
+        p.sigma2 = np.var(endog, ddof=0)
+        resid = endog.copy()
+    # If no MA component, this is just CSS
+    elif max_ma_order == 0:
+        # extract 1) lagged_endog with free params; 2) lagged_endog with fixed
+        # params; 3) endog residual after applying fixed params if applicable
+        X_with_free_params = lagged_endog[:, params_info.free_ar_ix]
+        X_with_fixed_params = lagged_endog[:, params_info.fixed_ar_ix]
+        y = endog[max_ar_order:]
+        if X_with_fixed_params.shape[1] != 0:
+            y = y - X_with_fixed_params.dot(params_info.fixed_ar_params)
+
+        # no free ar params -> variance computation on the endog residual
+        if X_with_free_params.shape[1] == 0:
+            p.ar_params = params_info.fixed_ar_params
+            p.sigma2 = np.var(y, ddof=0)
+            resid = y.copy()
+        # otherwise OLS with endog residual (after applying fixed params) as y,
+        # and lagged_endog with free params as X
+        else:
+            mod = OLS(y, X_with_free_params)
+            res = mod.fit()
+            resid = res.resid
+            p.sigma2 = res.scale
+            p.ar_params = _stitch_fixed_and_free_params(
+                fixed_ar_or_ma_lags=params_info.fixed_ar_lags,
+                fixed_ar_or_ma_params=params_info.fixed_ar_params,
+                free_ar_or_ma_lags=params_info.free_ar_lags,
+                free_ar_or_ma_params=res.params,
+                spec_ar_or_ma_lags=spec.ar_lags
+            )
+    # Otherwise ARMA model
+    else:
+        # Step 1: Compute long AR model via Yule-Walker, get residuals
+        initial_ar_params, _ = yule_walker(
+            endog, order=initial_ar_order, method='mle')
+        X = lagmat(endog, initial_ar_order, trim='both')
+        y = endog[initial_ar_order:]
+        resid = y - X.dot(initial_ar_params)
+
+        # Get lagged residuals for `exog` in least-squares regression
+        lagged_resid = lagmat(resid, max_ma_order, trim='both')
+
+        # Step 2: estimate ARMA model via least squares
+        ix = initial_ar_order + max_ma_order - max_ar_order
+        X_with_free_params = np.c_[
+            lagged_endog[ix:, params_info.free_ar_ix],
+            lagged_resid[:, params_info.free_ma_ix]
+        ]
+        X_with_fixed_params = np.c_[
+            lagged_endog[ix:, params_info.fixed_ar_ix],
+            lagged_resid[:, params_info.fixed_ma_ix]
+        ]
+        y = endog[initial_ar_order + max_ma_order:]
+        if X_with_fixed_params.shape[1] != 0:
+            y = y - X_with_fixed_params.dot(
+                np.r_[params_info.fixed_ar_params, params_info.fixed_ma_params]
+            )
+
+        # Step 2.1: no free ar params -> variance computation on the endog
+        # residual
+        if X_with_free_params.shape[1] == 0:
+            p.ar_params = params_info.fixed_ar_params
+            p.ma_params = params_info.fixed_ma_params
+            p.sigma2 = np.var(y, ddof=0)
+            resid = y.copy()
+        # Step 2.2: otherwise OLS with endog residual (after applying fixed
+        # params) as y, and lagged_endog and lagged_resid with free params as X
+        else:
+            mod = OLS(y, X_with_free_params)
+            res = mod.fit()
+            k_free_ar_params = len(params_info.free_ar_lags)
+            p.ar_params = _stitch_fixed_and_free_params(
+                fixed_ar_or_ma_lags=params_info.fixed_ar_lags,
+                fixed_ar_or_ma_params=params_info.fixed_ar_params,
+                free_ar_or_ma_lags=params_info.free_ar_lags,
+                free_ar_or_ma_params=res.params[:k_free_ar_params],
+                spec_ar_or_ma_lags=spec.ar_lags
+            )
+            p.ma_params = _stitch_fixed_and_free_params(
+                fixed_ar_or_ma_lags=params_info.fixed_ma_lags,
+                fixed_ar_or_ma_params=params_info.fixed_ma_params,
+                free_ar_or_ma_lags=params_info.free_ma_lags,
+                free_ar_or_ma_params=res.params[k_free_ar_params:],
+                spec_ar_or_ma_lags=spec.ma_lags
+            )
+            resid = res.resid
+            p.sigma2 = res.scale
+
+        # Step 3: bias correction (if requested)
+
+        # Step 3.1: validate `unbiased` argument and handle setting the default
+        if unbiased is True:
+            if len(fixed_params) != 0:
+                raise NotImplementedError(
+                    "Third step of Hannan-Rissanen estimation to remove "
+                    "parameter bias is not yet implemented for the case "
+                    "with fixed parameters."
+                )
+            elif not (p.is_stationary and p.is_invertible):
+                raise ValueError(
+                    "Cannot perform third step of Hannan-Rissanen estimation "
+                    "to remove parameter bias, because parameters estimated "
+                    "from the second step are non-stationary or "
+                    "non-invertible."
+                )
+        elif unbiased is None:
+            if len(fixed_params) != 0:
+                unbiased = False
+            else:
+                unbiased = p.is_stationary and p.is_invertible
+
+        # Step 3.2: bias correction
+        if unbiased is True:
+            if mod is None:
+                raise ValueError("Must have free parameters to use unbiased")
+            Z = np.zeros_like(endog)
+
+            ar_coef = p.ar_poly.coef
+            ma_coef = p.ma_poly.coef
+
+            for t in range(nobs):
+                if t >= max(max_ar_order, max_ma_order):
+                    # Note: in the case of non-consecutive lag orders, the
+                    # polynomials have the appropriate zeros so we don't
+                    # need to subset `endog[t - max_ar_order:t]` or
+                    # Z[t - max_ma_order:t]
+                    tmp_ar = np.dot(
+                        -ar_coef[1:], endog[t - max_ar_order:t][::-1])
+                    tmp_ma = np.dot(ma_coef[1:],
+                                    Z[t - max_ma_order:t][::-1])
+                    Z[t] = endog[t] - tmp_ar - tmp_ma
+
+            V = lfilter([1], ar_coef, Z)
+            W = lfilter(np.r_[1, -ma_coef[1:]], [1], Z)
+
+            lagged_V = lagmat(V, max_ar_order, trim='both')
+            lagged_W = lagmat(W, max_ma_order, trim='both')
+
+            exog = np.c_[
+                lagged_V[
+                    max(max_ma_order - max_ar_order, 0):,
+                    params_info.free_ar_ix
+                ],
+                lagged_W[
+                    max(max_ar_order - max_ma_order, 0):,
+                    params_info.free_ma_ix
+                ]
+            ]
+
+            mod_unbias = OLS(Z[max(max_ar_order, max_ma_order):], exog)
+            res_unbias = mod_unbias.fit()
+
+            p.ar_params = (
+                p.ar_params + res_unbias.params[:spec.k_ar_params])
+            p.ma_params = (
+                p.ma_params + res_unbias.params[spec.k_ar_params:])
+
+            # Recompute sigma2
+            resid = mod.endog - mod.exog.dot(
+                np.r_[p.ar_params, p.ma_params])
+            p.sigma2 = np.inner(resid, resid) / len(resid)
+
+    # TODO: Gomez and Maravall (2001) or Gomez (1998)
+    # propose one more step here to further improve MA estimates
+
+    # Construct results
+    other_results = Bunch({
+        'spec': spec,
+        'initial_ar_order': initial_ar_order,
+        'resid': resid
+    })
+    return p, other_results


 def _validate_fixed_params(fixed_params, spec_param_names):
@@ -104,11 +311,27 @@ def _validate_fixed_params(fixed_params, spec_param_names):
     spec_param_names : list of string
         SARIMAXSpecification.param_names
     """
-    pass
+    if fixed_params is None:
+        fixed_params = {}
+
+    assert isinstance(fixed_params, dict)
+
+    fixed_param_names = set(fixed_params.keys())
+    valid_param_names = set(spec_param_names) - {"sigma2"}
+
+    invalid_param_names = fixed_param_names - valid_param_names
+
+    if len(invalid_param_names) > 0:
+        raise ValueError(
+            f"Invalid fixed parameter(s): {sorted(list(invalid_param_names))}."
+            f" Please select among {sorted(list(valid_param_names))}."
+        )
+
+    return fixed_params


 def _package_fixed_and_free_params_info(fixed_params, spec_ar_lags,
-    spec_ma_lags):
+                                        spec_ma_lags):
     """
     Parameters
     ----------
@@ -125,12 +348,53 @@ def _package_fixed_and_free_params_info(fixed_params, spec_ar_lags,
     (ix) fixed_ar_ix, fixed_ma_ix, free_ar_ix, free_ma_ix;
     (params) fixed_ar_params, free_ma_params
     """
-    pass
+    # unpack fixed lags and params
+    fixed_ar_lags_and_params = []
+    fixed_ma_lags_and_params = []
+    for key, val in fixed_params.items():
+        lag = int(key.split(".")[-1].lstrip("L"))
+        if key.startswith("ar"):
+            fixed_ar_lags_and_params.append((lag, val))
+        elif key.startswith("ma"):
+            fixed_ma_lags_and_params.append((lag, val))
+
+    fixed_ar_lags_and_params.sort()
+    fixed_ma_lags_and_params.sort()
+
+    fixed_ar_lags = [lag for lag, _ in fixed_ar_lags_and_params]
+    fixed_ar_params = np.array([val for _, val in fixed_ar_lags_and_params])
+
+    fixed_ma_lags = [lag for lag, _ in fixed_ma_lags_and_params]
+    fixed_ma_params = np.array([val for _, val in fixed_ma_lags_and_params])
+
+    # unpack free lags
+    free_ar_lags = [lag for lag in spec_ar_lags
+                    if lag not in set(fixed_ar_lags)]
+    free_ma_lags = [lag for lag in spec_ma_lags
+                    if lag not in set(fixed_ma_lags)]

+    # get ix for indexing purposes: `ar_ix`, and `ma_ix` below, are to account
+    # for non-consecutive lags; for indexing purposes, must have dtype int
+    free_ar_ix = np.array(free_ar_lags, dtype=int) - 1
+    free_ma_ix = np.array(free_ma_lags, dtype=int) - 1
+    fixed_ar_ix = np.array(fixed_ar_lags, dtype=int) - 1
+    fixed_ma_ix = np.array(fixed_ma_lags, dtype=int) - 1

-def _stitch_fixed_and_free_params(fixed_ar_or_ma_lags,
-    fixed_ar_or_ma_params, free_ar_or_ma_lags, free_ar_or_ma_params,
-    spec_ar_or_ma_lags):
+    return Bunch(
+        # lags
+        fixed_ar_lags=fixed_ar_lags, fixed_ma_lags=fixed_ma_lags,
+        free_ar_lags=free_ar_lags, free_ma_lags=free_ma_lags,
+        # ixs
+        fixed_ar_ix=fixed_ar_ix, fixed_ma_ix=fixed_ma_ix,
+        free_ar_ix=free_ar_ix, free_ma_ix=free_ma_ix,
+        # fixed params
+        fixed_ar_params=fixed_ar_params, fixed_ma_params=fixed_ma_params,
+    )
+
+
+def _stitch_fixed_and_free_params(fixed_ar_or_ma_lags, fixed_ar_or_ma_params,
+                                  free_ar_or_ma_lags, free_ar_or_ma_params,
+                                  spec_ar_or_ma_lags):
     """
     Stitch together fixed and free params, by the order of lags, for setting
     SARIMAXParams.ma_params or SARIMAXParams.ar_params
@@ -150,4 +414,17 @@ def _stitch_fixed_and_free_params(fixed_ar_or_ma_lags,
     -------
     list of fixed and free params by the order of lags
     """
-    pass
+    assert len(fixed_ar_or_ma_lags) == len(fixed_ar_or_ma_params)
+    assert len(free_ar_or_ma_lags) == len(free_ar_or_ma_params)
+
+    all_lags = np.r_[fixed_ar_or_ma_lags, free_ar_or_ma_lags]
+    all_params = np.r_[fixed_ar_or_ma_params, free_ar_or_ma_params]
+    assert set(all_lags) == set(spec_ar_or_ma_lags)
+
+    lag_to_param_map = dict(zip(all_lags, all_params))
+
+    # Sort params by the order of their corresponding lags in
+    # spec_ar_or_ma_lags (e.g. SARIMAXSpecification.ar_lags or
+    # SARIMAXSpecification.ma_lags)
+    all_params_sorted = [lag_to_param_map[lag] for lag in spec_ar_or_ma_lags]
+    return all_params_sorted
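
The Hannan-Rissanen procedure above (long-AR step, least squares on lagged residuals, optional bias correction) and its `fixed_params` handling can be exercised as in the following sketch; the ARMA(1, 1) series and the fixed value are invented, and the `'ar.L1'` name assumes the `SARIMAXSpecification` parameter-naming convention:

```python
import numpy as np

from statsmodels.tsa.arima.estimators.hannan_rissanen import hannan_rissanen

rng = np.random.default_rng(0)
e = rng.standard_normal(500)
# ARMA(1, 1): y_t = 0.6 * y_{t-1} + e_t + 0.3 * e_{t-1}
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.6 * y[t - 1] + e[t] + 0.3 * e[t - 1]

# Free estimation of both parameters
p, other = hannan_rissanen(y, ar_order=1, ma_order=1)
print(p.ar_params, p.ma_params, p.sigma2)

# Holding the AR coefficient fixed while estimating the MA coefficient
p_fixed, _ = hannan_rissanen(y, ar_order=1, ma_order=1,
                             fixed_params={'ar.L1': 0.6})
print(p_fixed.ma_params)
```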
diff --git a/statsmodels/tsa/arima/estimators/innovations.py b/statsmodels/tsa/arima/estimators/innovations.py
index b594bd23f..9526feb16 100644
--- a/statsmodels/tsa/arima/estimators/innovations.py
+++ b/statsmodels/tsa/arima/estimators/innovations.py
@@ -6,11 +6,13 @@ License: BSD-3
 """
 import warnings
 import numpy as np
+
 from scipy.optimize import minimize
 from statsmodels.tools.tools import Bunch
 from statsmodels.tsa.innovations import arma_innovations
 from statsmodels.tsa.stattools import acovf, innovations_algo
 from statsmodels.tsa.statespace.tools import diff
+
 from statsmodels.tsa.arima.specification import SARIMAXSpecification
 from statsmodels.tsa.arima.params import SARIMAXParams
 from statsmodels.tsa.arima.estimators.hannan_rissanen import hannan_rissanen
@@ -51,12 +53,42 @@ def innovations(endog, ma_order=0, demean=True):
     .. [1] Brockwell, Peter J., and Richard A. Davis. 2016.
        Introduction to Time Series and Forecasting. Springer.
     """
-    pass
+    spec = max_spec = SARIMAXSpecification(endog, ma_order=ma_order)
+    endog = max_spec.endog
+
+    if demean:
+        endog = endog - endog.mean()
+
+    if not max_spec.is_ma_consecutive:
+        raise ValueError('Innovations estimation unavailable for models with'
+                         ' seasonal or otherwise non-consecutive MA orders.')
+
+    sample_acovf = acovf(endog, fft=True)
+    theta, v = innovations_algo(sample_acovf, nobs=max_spec.ma_order + 1)
+    ma_params = [theta[i, :i] for i in range(1, max_spec.ma_order + 1)]
+    sigma2 = v
+
+    out = []
+    for i in range(max_spec.ma_order + 1):
+        spec = SARIMAXSpecification(ma_order=i)
+        p = SARIMAXParams(spec=spec)
+        if i == 0:
+            p.params = sigma2[i]
+        else:
+            p.params = np.r_[ma_params[i - 1], sigma2[i]]
+        out.append(p)
+
+    # Construct other results
+    other_results = Bunch({
+        'spec': spec,
+    })
+
+    return out, other_results


 def innovations_mle(endog, order=(0, 0, 0), seasonal_order=(0, 0, 0, 0),
-    demean=True, enforce_invertibility=True, start_params=None,
-    minimize_kwargs=None):
+                    demean=True, enforce_invertibility=True,
+                    start_params=None, minimize_kwargs=None):
     """
     Estimate SARIMA parameters by MLE using innovations algorithm.

@@ -116,4 +148,104 @@ def innovations_mle(endog, order=(0, 0, 0), seasonal_order=(0, 0, 0, 0),
     .. [1] Brockwell, Peter J., and Richard A. Davis. 2016.
        Introduction to Time Series and Forecasting. Springer.
     """
-    pass
+    spec = SARIMAXSpecification(
+        endog, order=order, seasonal_order=seasonal_order,
+        enforce_stationarity=True, enforce_invertibility=enforce_invertibility)
+    endog = spec.endog
+    if spec.is_integrated:
+        warnings.warn('Provided `endog` series has been differenced to'
+                      ' eliminate integration prior to ARMA parameter'
+                      ' estimation.')
+        endog = diff(endog, k_diff=spec.diff,
+                     k_seasonal_diff=spec.seasonal_diff,
+                     seasonal_periods=spec.seasonal_periods)
+    if demean:
+        endog = endog - endog.mean()
+
+    p = SARIMAXParams(spec=spec)
+
+    if start_params is None:
+        sp = SARIMAXParams(spec=spec)
+
+        # Estimate starting parameters via Hannan-Rissanen
+        hr, hr_results = hannan_rissanen(endog, ar_order=spec.ar_order,
+                                         ma_order=spec.ma_order, demean=False)
+        if spec.seasonal_periods == 0:
+            # If no seasonal component, then `hr` gives starting parameters
+            sp.params = hr.params
+        else:
+            # If we do have a seasonal component, estimate starting parameters
+            # for the seasonal lags using the residuals from the previous step
+            _ = SARIMAXSpecification(
+                endog, seasonal_order=seasonal_order,
+                enforce_stationarity=True,
+                enforce_invertibility=enforce_invertibility)
+
+            ar_order = np.array(spec.seasonal_ar_lags) * spec.seasonal_periods
+            ma_order = np.array(spec.seasonal_ma_lags) * spec.seasonal_periods
+            seasonal_hr, seasonal_hr_results = hannan_rissanen(
+                hr_results.resid, ar_order=ar_order, ma_order=ma_order,
+                demean=False)
+
+            # Set the starting parameters
+            sp.ar_params = hr.ar_params
+            sp.ma_params = hr.ma_params
+            sp.seasonal_ar_params = seasonal_hr.ar_params
+            sp.seasonal_ma_params = seasonal_hr.ma_params
+            sp.sigma2 = seasonal_hr.sigma2
+
+        # Then, require starting parameters to be stationary and invertible
+        if not sp.is_stationary:
+            sp.ar_params = [0] * sp.k_ar_params
+            sp.seasonal_ar_params = [0] * sp.k_seasonal_ar_params
+
+        if not sp.is_invertible and spec.enforce_invertibility:
+            sp.ma_params = [0] * sp.k_ma_params
+            sp.seasonal_ma_params = [0] * sp.k_seasonal_ma_params
+
+        start_params = sp.params
+    else:
+        sp = SARIMAXParams(spec=spec)
+        sp.params = start_params
+        if not sp.is_stationary:
+            raise ValueError('Given starting parameters imply a non-stationary'
+                             ' AR process. Innovations algorithm requires a'
+                             ' stationary process.')
+
+        if spec.enforce_invertibility and not sp.is_invertible:
+            raise ValueError('Given starting parameters imply a non-invertible'
+                             ' MA process with `enforce_invertibility=True`.')
+
+    def obj(params):
+        p.params = spec.constrain_params(params)
+
+        return -arma_innovations.arma_loglike(
+            endog, ar_params=-p.reduced_ar_poly.coef[1:],
+            ma_params=p.reduced_ma_poly.coef[1:], sigma2=p.sigma2)
+
+    # Untransform the starting parameters
+    unconstrained_start_params = spec.unconstrain_params(start_params)
+
+    # Perform the minimization
+    if minimize_kwargs is None:
+        minimize_kwargs = {}
+    if 'options' not in minimize_kwargs:
+        minimize_kwargs['options'] = {}
+    minimize_kwargs['options'].setdefault('maxiter', 100)
+    minimize_results = minimize(obj, unconstrained_start_params,
+                                **minimize_kwargs)
+
+    # TODO: show warning if convergence failed.
+
+    # Reverse the transformation to get the optimal parameters
+    p.params = spec.constrain_params(minimize_results.x)
+
+    # Construct other results
+    other_results = Bunch({
+        'spec': spec,
+        'minimize_results': minimize_results,
+        'minimize_kwargs': minimize_kwargs,
+        'start_params': start_params
+    })
+
+    return p, other_results
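
As a rough usage sketch (not part of the patch itself), the two estimators implemented above can be exercised as follows. The function names, arguments, and return values follow the diff; the simulated MA(1) series and the printed attributes are illustrative assumptions only.

    import numpy as np

    from statsmodels.tsa.arima.estimators.innovations import (
        innovations, innovations_mle)

    # Simulated MA(1) data, purely for illustration
    rng = np.random.default_rng(0)
    eta = rng.standard_normal(501)
    endog = eta[1:] + 0.5 * eta[:-1]

    # `innovations` returns one SARIMAXParams object per MA order 0..ma_order,
    # plus a Bunch holding the specification
    ma_estimates, _ = innovations(endog, ma_order=2, demean=True)
    print(ma_estimates[2].ma_params, ma_estimates[2].sigma2)

    # `innovations_mle` returns a single SARIMAXParams object and a Bunch that
    # includes the scipy minimize results
    p, res = innovations_mle(endog, order=(0, 0, 1), demean=True)
    print(p.ma_params, p.sigma2, res.minimize_results.success)
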
diff --git a/statsmodels/tsa/arima/estimators/statespace.py b/statsmodels/tsa/arima/estimators/statespace.py
index 2168bdd69..63515e40a 100644
--- a/statsmodels/tsa/arima/estimators/statespace.py
+++ b/statsmodels/tsa/arima/estimators/statespace.py
@@ -5,16 +5,18 @@ Author: Chad Fulton
 License: BSD-3
 """
 import numpy as np
+
 from statsmodels.tools.tools import add_constant, Bunch
 from statsmodels.tsa.statespace.sarimax import SARIMAX
+
 from statsmodels.tsa.arima.specification import SARIMAXSpecification
 from statsmodels.tsa.arima.params import SARIMAXParams


-def statespace(endog, exog=None, order=(0, 0, 0), seasonal_order=(0, 0, 0, 
-    0), include_constant=True, enforce_stationarity=True,
-    enforce_invertibility=True, concentrate_scale=False, start_params=None,
-    fit_kwargs=None):
+def statespace(endog, exog=None, order=(0, 0, 0),
+               seasonal_order=(0, 0, 0, 0), include_constant=True,
+               enforce_stationarity=True, enforce_invertibility=True,
+               concentrate_scale=False, start_params=None, fit_kwargs=None):
     """
     Estimate SARIMAX parameters using state space methods.

@@ -71,4 +73,50 @@ def statespace(endog, exog=None, order=(0, 0, 0), seasonal_order=(0, 0, 0,
        Time Series Analysis by State Space Methods: Second Edition.
        Oxford University Press.
     """
-    pass
+    # Handle including the constant (need to do it now so that the constant
+    # parameter can be included in the specification as part of `exog`.)
+    if include_constant:
+        exog = np.ones_like(endog) if exog is None else add_constant(exog)
+
+    # Create the specification
+    spec = SARIMAXSpecification(
+        endog, exog=exog, order=order, seasonal_order=seasonal_order,
+        enforce_stationarity=enforce_stationarity,
+        enforce_invertibility=enforce_invertibility,
+        concentrate_scale=concentrate_scale)
+    endog = spec.endog
+    exog = spec.exog
+    p = SARIMAXParams(spec=spec)
+
+    # Check start parameters
+    if start_params is not None:
+        sp = SARIMAXParams(spec=spec)
+        sp.params = start_params
+
+        if spec.enforce_stationarity and not sp.is_stationary:
+            raise ValueError('Given starting parameters imply a non-stationary'
+                             ' AR process with `enforce_stationarity=True`.')
+
+        if spec.enforce_invertibility and not sp.is_invertible:
+            raise ValueError('Given starting parameters imply a non-invertible'
+                             ' MA process with `enforce_invertibility=True`.')
+
+    # Create and fit the state space model
+    mod = SARIMAX(endog, exog=exog, order=spec.order,
+                  seasonal_order=spec.seasonal_order,
+                  enforce_stationarity=spec.enforce_stationarity,
+                  enforce_invertibility=spec.enforce_invertibility,
+                  concentrate_scale=spec.concentrate_scale)
+    if fit_kwargs is None:
+        fit_kwargs = {}
+    fit_kwargs.setdefault('disp', 0)
+    res_ss = mod.fit(start_params=start_params, **fit_kwargs)
+
+    # Construct results
+    p.params = res_ss.params
+    res = Bunch({
+        'spec': spec,
+        'statespace_results': res_ss,
+    })
+
+    return p, res
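
Again as an illustrative sketch rather than part of the patch, the state space estimator above returns a parameter object together with a Bunch holding the fitted SARIMAX results; the simulated AR(1) data below is assumed for demonstration.

    import numpy as np

    from statsmodels.tsa.arima.estimators.statespace import statespace

    # Simulated AR(1) data, purely for illustration
    rng = np.random.default_rng(0)
    endog = np.zeros(300)
    for t in range(1, 300):
        endog[t] = 0.7 * endog[t - 1] + rng.standard_normal()

    # Returns a SARIMAXParams object and a Bunch whose `statespace_results`
    # entry is the fitted SARIMAXResults instance
    p, res = statespace(endog, order=(1, 0, 0), include_constant=True)
    print(p.exog_params, p.ar_params, p.sigma2)
    print(res.statespace_results.llf)
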
diff --git a/statsmodels/tsa/arima/estimators/yule_walker.py b/statsmodels/tsa/arima/estimators/yule_walker.py
index 8f5609309..7b2a74932 100644
--- a/statsmodels/tsa/arima/estimators/yule_walker.py
+++ b/statsmodels/tsa/arima/estimators/yule_walker.py
@@ -5,13 +5,14 @@ Author: Chad Fulton
 License: BSD-3
 """
 from statsmodels.compat.pandas import deprecate_kwarg
+
 from statsmodels.regression import linear_model
 from statsmodels.tools.tools import Bunch
 from statsmodels.tsa.arima.params import SARIMAXParams
 from statsmodels.tsa.arima.specification import SARIMAXSpecification


-@deprecate_kwarg('unbiased', 'adjusted')
+@deprecate_kwarg("unbiased", "adjusted")
 def yule_walker(endog, ar_order=0, demean=True, adjusted=False):
     """
     Estimate AR parameters using Yule-Walker equations.
@@ -53,4 +54,23 @@ def yule_walker(endog, ar_order=0, demean=True, adjusted=False):
     .. [1] Brockwell, Peter J., and Richard A. Davis. 2016.
        Introduction to Time Series and Forecasting. Springer.
     """
-    pass
+    spec = SARIMAXSpecification(endog, ar_order=ar_order)
+    endog = spec.endog
+    p = SARIMAXParams(spec=spec)
+
+    if not spec.is_ar_consecutive:
+        raise ValueError('Yule-Walker estimation unavailable for models with'
+                         ' seasonal or non-consecutive AR orders.')
+
+    # Estimate parameters
+    method = 'adjusted' if adjusted else 'mle'
+    p.ar_params, sigma = linear_model.yule_walker(
+        endog, order=ar_order, demean=demean, method=method)
+    p.sigma2 = sigma**2
+
+    # Construct other results
+    other_results = Bunch({
+        'spec': spec,
+    })
+
+    return p, other_results
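
A comparable sketch for the Yule-Walker estimator; the AR(2) series is simulated here only so the call has something to operate on.

    import numpy as np

    from statsmodels.tsa.arima.estimators.yule_walker import yule_walker

    # Simulated AR(2) data, purely for illustration
    rng = np.random.default_rng(0)
    endog = np.zeros(500)
    for t in range(2, 500):
        endog[t] = (0.5 * endog[t - 1] - 0.2 * endog[t - 2]
                    + rng.standard_normal())

    # `adjusted=True` selects the small-sample autocovariance estimator
    # (formerly requested via the deprecated `unbiased` keyword)
    p, _ = yule_walker(endog, ar_order=2, demean=True, adjusted=True)
    print(p.ar_params, p.sigma2)
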
diff --git a/statsmodels/tsa/arima/model.py b/statsmodels/tsa/arima/model.py
index a14c04b34..2639b8aa8 100644
--- a/statsmodels/tsa/arima/model.py
+++ b/statsmodels/tsa/arima/model.py
@@ -5,23 +5,29 @@ Author: Chad Fulton
 License: BSD-3
 """
 from statsmodels.compat.pandas import Appender
+
 import warnings
+
 import numpy as np
+
 from statsmodels.tools.data import _is_using_pandas
 from statsmodels.tsa.statespace import sarimax
 from statsmodels.tsa.statespace.kalman_filter import MEMORY_CONSERVE
 from statsmodels.tsa.statespace.tools import diff
 import statsmodels.base.wrapper as wrap
+
 from statsmodels.tsa.arima.estimators.yule_walker import yule_walker
 from statsmodels.tsa.arima.estimators.burg import burg
 from statsmodels.tsa.arima.estimators.hannan_rissanen import hannan_rissanen
-from statsmodels.tsa.arima.estimators.innovations import innovations, innovations_mle
+from statsmodels.tsa.arima.estimators.innovations import (
+    innovations, innovations_mle)
 from statsmodels.tsa.arima.estimators.gls import gls as estimate_gls
+
 from statsmodels.tsa.arima.specification import SARIMAXSpecification


 class ARIMA(sarimax.SARIMAX):
-    """
+    r"""
     Autoregressive Integrated Moving Average (ARIMA) model, and extensions

     This model is the basic interface for ARIMA-type models, including those
@@ -98,17 +104,17 @@ class ARIMA(sarimax.SARIMAX):

     .. math::

-        Y_{t}-\\delta_{0}-\\delta_{1}t-\\ldots-\\delta_{k}t^{k}-X_{t}\\beta
-            & =\\epsilon_{t} \\\\
-        \\left(1-L\\right)^{d}\\left(1-L^{s}\\right)^{D}\\Phi\\left(L\\right)
-        \\Phi_{s}\\left(L\\right)\\epsilon_{t}
-            & =\\Theta\\left(L\\right)\\Theta_{s}\\left(L\\right)\\eta_{t}
+        Y_{t}-\delta_{0}-\delta_{1}t-\ldots-\delta_{k}t^{k}-X_{t}\beta
+            & =\epsilon_{t} \\
+        \left(1-L\right)^{d}\left(1-L^{s}\right)^{D}\Phi\left(L\right)
+        \Phi_{s}\left(L\right)\epsilon_{t}
+            & =\Theta\left(L\right)\Theta_{s}\left(L\right)\eta_{t}

-    where :math:`\\eta_t \\sim WN(0,\\sigma^2)` is a white noise process, L
+    where :math:`\eta_t \sim WN(0,\sigma^2)` is a white noise process, L
     is the lag operator, and :math:`G(L)` are lag polynomials corresponding
-    to the autoregressive (:math:`\\Phi`), seasonal autoregressive
-    (:math:`\\Phi_s`), moving average (:math:`\\Theta`), and seasonal moving
-    average components (:math:`\\Theta_s`).
+    to the autoregressive (:math:`\Phi`), seasonal autoregressive
+    (:math:`\Phi_s`), moving average (:math:`\Theta`), and seasonal moving
+    average components (:math:`\Theta_s`).

     `enforce_stationarity` and `enforce_invertibility` are specified in the
     constructor because they affect loglikelihood computations, and so should
@@ -128,57 +134,100 @@ class ARIMA(sarimax.SARIMAX):
     >>> res = mod.fit()
     >>> print(res.summary())
     """
-
-    def __init__(self, endog, exog=None, order=(0, 0, 0), seasonal_order=(0,
-        0, 0, 0), trend=None, enforce_stationarity=True,
-        enforce_invertibility=True, concentrate_scale=False, trend_offset=1,
-        dates=None, freq=None, missing='none', validate_specification=True):
+    def __init__(self, endog, exog=None, order=(0, 0, 0),
+                 seasonal_order=(0, 0, 0, 0), trend=None,
+                 enforce_stationarity=True, enforce_invertibility=True,
+                 concentrate_scale=False, trend_offset=1, dates=None,
+                 freq=None, missing='none', validate_specification=True):
+        # Default for trend
+        # 'c' if there is no integration and 'n' otherwise
+        # TODO: if trend='c', then we could alternatively use `demean=True` in
+        # the estimation methods rather than setting up `exog` and using GLS.
+        # Not sure if it's worth the trouble though.
         integrated = order[1] > 0 or seasonal_order[1] > 0
         if trend is None and not integrated:
             trend = 'c'
         elif trend is None:
             trend = 'n'
-        self._spec_arima = SARIMAXSpecification(endog, exog=exog, order=
-            order, seasonal_order=seasonal_order, trend=trend,
-            enforce_stationarity=None, enforce_invertibility=None,
+
+        # Construct the specification
+        # (don't pass specific values of enforce stationarity/invertibility,
+        # because we don't actually want to restrict the estimators based on
+        # these criteria. Instead, we'll just make sure that the parameter
+        # estimates from those methods satisfy the criteria.)
+        self._spec_arima = SARIMAXSpecification(
+            endog, exog=exog, order=order, seasonal_order=seasonal_order,
+            trend=trend, enforce_stationarity=None, enforce_invertibility=None,
             concentrate_scale=concentrate_scale, trend_offset=trend_offset,
-            dates=dates, freq=freq, missing=missing, validate_specification
-            =validate_specification)
+            dates=dates, freq=freq, missing=missing,
+            validate_specification=validate_specification)
         exog = self._spec_arima._model.data.orig_exog
+
+        # Raise an error if we have a constant in an integrated model
+
         has_trend = len(self._spec_arima.trend_terms) > 0
         if has_trend:
             lowest_trend = np.min(self._spec_arima.trend_terms)
             if lowest_trend < order[1] + seasonal_order[1]:
                 raise ValueError(
-                    'In models with integration (`d > 0`) or seasonal integration (`D > 0`), trend terms of lower order than `d + D` cannot be (as they would be eliminated due to the differencing operation). For example, a constant cannot be included in an ARIMA(1, 1, 1) model, but including a linear trend, which would have the same effect as fitting a constant to the differenced data, is allowed.'
-                    )
+                    'In models with integration (`d > 0`) or seasonal'
+                    ' integration (`D > 0`), trend terms of lower order than'
+                    ' `d + D` cannot be included (as they would be eliminated'
+                    ' due to the differencing operation). For example, a'
+                    ' constant cannot be included in an ARIMA(1, 1, 1) model,'
+                    ' but including a linear trend, which would have the same'
+                    ' effect as fitting a constant to the differenced data,'
+                    ' is allowed.')
+
+        # Keep the given `exog` by removing the prepended trend variables
         input_exog = None
         if exog is not None:
             if _is_using_pandas(exog, None):
                 input_exog = exog.iloc[:, self._spec_arima.k_trend:]
             else:
                 input_exog = exog[:, self._spec_arima.k_trend:]
-        super().__init__(endog, exog, trend=None, order=order,
-            seasonal_order=seasonal_order, enforce_stationarity=
-            enforce_stationarity, enforce_invertibility=
-            enforce_invertibility, concentrate_scale=concentrate_scale,
-            dates=dates, freq=freq, missing=missing, validate_specification
-            =validate_specification)
+
+        # Initialize the base SARIMAX class
+        # Note: we don't pass in a trend value to the base class, since ARIMA
+        # standardizes the trend to always be part of exog, while the base
+        # SARIMAX class puts it in the transition equation.
+        super().__init__(
+            endog, exog, trend=None, order=order,
+            seasonal_order=seasonal_order,
+            enforce_stationarity=enforce_stationarity,
+            enforce_invertibility=enforce_invertibility,
+            concentrate_scale=concentrate_scale, dates=dates, freq=freq,
+            missing=missing, validate_specification=validate_specification)
         self.trend = trend
+
+        # Save the input exog and input exog names, so that we can refer to
+        # them later (see especially `ARIMAResults.append`)
         self._input_exog = input_exog
         if exog is not None:
             self._input_exog_names = self.exog_names[self._spec_arima.k_trend:]
         else:
             self._input_exog_names = None
+
+        # Override the public attributes for k_exog and k_trend to reflect the
+        # distinction here (for the purpose of the superclass, these are both
+        # combined as `k_exog`)
         self.k_exog = self._spec_arima.k_exog
         self.k_trend = self._spec_arima.k_trend
+
+        # Remove some init kwargs that aren't used in this model
         unused = ['measurement_error', 'time_varying_regression',
-            'mle_regression', 'simple_differencing', 'hamilton_representation']
+                  'mle_regression', 'simple_differencing',
+                  'hamilton_representation']
         self._init_keys = [key for key in self._init_keys if key not in unused]

+    @property
+    def _res_classes(self):
+        return {'fit': (ARIMAResults, ARIMAResultsWrapper)}
+
     def fit(self, start_params=None, transformed=True, includes_fixed=False,
-        method=None, method_kwargs=None, gls=None, gls_kwargs=None,
-        cov_type=None, cov_kwds=None, return_params=False, low_memory=False):
+            method=None, method_kwargs=None, gls=None, gls_kwargs=None,
+            cov_type=None, cov_kwds=None, return_params=False,
+            low_memory=False):
         """
         Fit (estimate) the parameters of the model.

@@ -263,21 +312,223 @@ class ARIMA(sarimax.SARIMAX):
         >>> res = mod.fit()
         >>> print(res.summary())
         """
-        pass
+        # Determine which method to use
+        # 1. If method is specified, make sure it is valid
+        if method is not None:
+            self._spec_arima.validate_estimator(method)
+        # 2. Otherwise, use state space
+        # TODO: may want to consider using innovations (MLE) if possible here,
+        # (since in some cases it may be faster than state space), but it is
+        # less tested.
+        else:
+            method = 'statespace'
+
+        # Can only use fixed parameters with the following methods
+        methods_with_fixed_params = ['statespace', 'hannan_rissanen']
+        if self._has_fixed_params and method not in methods_with_fixed_params:
+            raise ValueError(
+                "When parameters have been fixed, only the methods "
+                f"{methods_with_fixed_params} can be used; got '{method}'."
+            )
+
+        # Handle kwargs related to the fit method
+        if method_kwargs is None:
+            method_kwargs = {}
+        required_kwargs = []
+        if method == 'statespace':
+            required_kwargs = ['enforce_stationarity', 'enforce_invertibility',
+                               'concentrate_scale']
+        elif method == 'innovations_mle':
+            required_kwargs = ['enforce_invertibility']
+        for name in required_kwargs:
+            if name in method_kwargs:
+                raise ValueError('Cannot override model level value for "%s"'
+                                 ' when method="%s".' % (name, method))
+            method_kwargs[name] = getattr(self, name)
+
+        # Handle kwargs related to GLS estimation
+        if gls_kwargs is None:
+            gls_kwargs = {}
+
+        # Handle starting parameters
+        # TODO: maybe should have standard way of computing starting
+        # parameters in this class?
+        if start_params is not None:
+            if method not in ['statespace', 'innovations_mle']:
+                raise ValueError('Estimation method "%s" does not use starting'
+                                 ' parameters, but `start_params` argument was'
+                                 ' given.' % method)
+
+            method_kwargs['start_params'] = start_params
+            method_kwargs['transformed'] = transformed
+            method_kwargs['includes_fixed'] = includes_fixed
+
+        # Perform estimation, depending on whether we have exog or not
+        p = None
+        fit_details = None
+        has_exog = self._spec_arima.exog is not None
+        if has_exog or method == 'statespace':
+            # Use GLS if it was explicitly requested (`gls = True`) or if it
+            # was left at the default (`gls = None`) and the ARMA estimator is
+            # anything but statespace.
+            # Note: both GLS and statespace are able to handle models with
+            # integration, so we don't need to difference endog or exog here.
+            if has_exog and (gls or (gls is None and method != 'statespace')):
+                if self._has_fixed_params:
+                    raise NotImplementedError(
+                        'GLS estimation is not yet implemented for the case '
+                        'with fixed parameters.'
+                    )
+                p, fit_details = estimate_gls(
+                    self.endog, exog=self.exog, order=self.order,
+                    seasonal_order=self.seasonal_order, include_constant=False,
+                    arma_estimator=method, arma_estimator_kwargs=method_kwargs,
+                    **gls_kwargs)
+            elif method != 'statespace':
+                raise ValueError('If `exog` is given and GLS is disabled'
+                                 ' (`gls=False`), then the only valid'
+                                 " method is 'statespace'. Got '%s'."
+                                 % method)
+            else:
+                method_kwargs.setdefault('disp', 0)
+
+                res = super().fit(
+                    return_params=return_params, low_memory=low_memory,
+                    cov_type=cov_type, cov_kwds=cov_kwds, **method_kwargs)
+                if not return_params:
+                    res.fit_details = res.mlefit
+        else:
+            # Handle differencing if we have an integrated model
+            # (these methods do not support handling integration internally,
+            # so we need to manually do the differencing)
+            endog = self.endog
+            order = self._spec_arima.order
+            seasonal_order = self._spec_arima.seasonal_order
+            if self._spec_arima.is_integrated:
+                warnings.warn('Provided `endog` series has been differenced'
+                              ' to eliminate integration prior to parameter'
+                              ' estimation by method "%s".' % method,
+                              stacklevel=2,)
+                endog = diff(
+                    endog, k_diff=self._spec_arima.diff,
+                    k_seasonal_diff=self._spec_arima.seasonal_diff,
+                    seasonal_periods=self._spec_arima.seasonal_periods)
+                if order[1] > 0:
+                    order = (order[0], 0, order[2])
+                if seasonal_order[1] > 0:
+                    seasonal_order = (seasonal_order[0], 0, seasonal_order[2],
+                                      seasonal_order[3])
+            if self._has_fixed_params:
+                method_kwargs['fixed_params'] = self._fixed_params.copy()
+
+            # Now, estimate parameters
+            if method == 'yule_walker':
+                p, fit_details = yule_walker(
+                    endog, ar_order=order[0], demean=False,
+                    **method_kwargs)
+            elif method == 'burg':
+                p, fit_details = burg(endog, ar_order=order[0],
+                                      demean=False, **method_kwargs)
+            elif method == 'hannan_rissanen':
+                p, fit_details = hannan_rissanen(
+                    endog, ar_order=order[0],
+                    ma_order=order[2], demean=False, **method_kwargs)
+            elif method == 'innovations':
+                p, fit_details = innovations(
+                    endog, ma_order=order[2], demean=False,
+                    **method_kwargs)
+                # innovations computes estimates through the given order, so
+                # we want to take the estimate associated with the given order
+                p = p[-1]
+            elif method == 'innovations_mle':
+                p, fit_details = innovations_mle(
+                    endog, order=order,
+                    seasonal_order=seasonal_order,
+                    demean=False, **method_kwargs)
+
+        # In all cases except method='statespace', we now need to extract the
+        # parameters and, optionally, create a new results object
+        if p is not None:
+            # Need to check that fitted parameters satisfy given restrictions
+            if (self.enforce_stationarity
+                    and self._spec_arima.max_reduced_ar_order > 0
+                    and not p.is_stationary):
+                raise ValueError('Non-stationary autoregressive parameters'
+                                 ' found with `enforce_stationarity=True`.'
+                                 ' Consider setting it to False or using a'
+                                 ' different estimation method, such as'
+                                 ' method="statespace".')
+
+            if (self.enforce_invertibility
+                    and self._spec_arima.max_reduced_ma_order > 0
+                    and not p.is_invertible):
+                raise ValueError('Non-invertible moving average parameters'
+                                 ' found with `enforce_invertibility=True`.'
+                                 ' Consider setting it to False or using a'
+                                 ' different estimation method, such as'
+                                 ' method="statespace".')
+
+            # Build the requested results
+            if return_params:
+                res = p.params
+            else:
+                # Handle memory conservation option
+                if low_memory:
+                    conserve_memory = self.ssm.conserve_memory
+                    self.ssm.set_conserve_memory(MEMORY_CONSERVE)
+
+                # Perform filtering / smoothing
+                if (self.ssm.memory_no_predicted or self.ssm.memory_no_gain
+                        or self.ssm.memory_no_smoothing):
+                    func = self.filter
+                else:
+                    func = self.smooth
+                res = func(p.params, transformed=True, includes_fixed=True,
+                           cov_type=cov_type, cov_kwds=cov_kwds)
+
+                # Save any details from the fit method
+                res.fit_details = fit_details
+
+                # Reset memory conservation
+                if low_memory:
+                    self.ssm.set_conserve_memory(conserve_memory)
+
+        return res


 @Appender(sarimax.SARIMAXResults.__doc__)
 class ARIMAResults(sarimax.SARIMAXResults):
-    pass
+
+    @Appender(sarimax.SARIMAXResults.append.__doc__)
+    def append(self, endog, exog=None, refit=False, fit_kwargs=None, **kwargs):
+        # MLEResults.append will concatenate the given `exog` here with
+        # `data.orig_exog`. However, `data.orig_exog` already has had any
+        # trend variables prepended to it, while the `exog` given here should
+        # not. Instead, we need to temporarily replace `orig_exog` and
+        # `exog_names` with the ones that correspond to those that were input
+        # by the user.
+        if exog is not None:
+            orig_exog = self.model.data.orig_exog
+            exog_names = self.model.exog_names
+            self.model.data.orig_exog = self.model._input_exog
+            self.model.exog_names = self.model._input_exog_names
+
+        # Perform the appending procedure
+        out = super().append(endog, exog=exog, refit=refit,
+                             fit_kwargs=fit_kwargs, **kwargs)
+
+        # Now we reverse the temporary change made above
+        if exog is not None:
+            self.model.data.orig_exog = orig_exog
+            self.model.exog_names = exog_names
+        return out


 class ARIMAResultsWrapper(sarimax.SARIMAXResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(sarimax.SARIMAXResultsWrapper.
-        _wrap_attrs, _attrs)
+    _wrap_attrs = wrap.union_dicts(
+        sarimax.SARIMAXResultsWrapper._wrap_attrs, _attrs)
     _methods = {}
-    _wrap_methods = wrap.union_dicts(sarimax.SARIMAXResultsWrapper.
-        _wrap_methods, _methods)
-
-
-wrap.populate_wrapper(ARIMAResultsWrapper, ARIMAResults)
+    _wrap_methods = wrap.union_dicts(
+        sarimax.SARIMAXResultsWrapper._wrap_methods, _methods)
+wrap.populate_wrapper(ARIMAResultsWrapper, ARIMAResults)  # noqa:E305
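
To tie the pieces together, a hedged example of the `ARIMA` front end implemented above: the estimation method is selected through `fit(method=...)`, and `ARIMAResults.append` extends a fitted model with new observations without re-estimating. The simulated data and the particular method chosen here are assumptions for illustration.

    import numpy as np

    from statsmodels.tsa.arima.model import ARIMA

    # Simulated ARMA(1, 1) data, purely for illustration
    rng = np.random.default_rng(0)
    eta = rng.standard_normal(501)
    endog = np.zeros(501)
    for t in range(1, 501):
        endog[t] = 0.6 * endog[t - 1] + eta[t] + 0.3 * eta[t - 1]
    endog = endog[1:]

    # Estimate by Hannan-Rissanen instead of the default state space method
    mod = ARIMA(endog[:450], order=(1, 0, 1), trend='n')
    res = mod.fit(method='hannan_rissanen')
    print(res.params)

    # Append the held-out observations without refitting the parameters
    res_updated = res.append(endog[450:])
    print(res_updated.nobs)
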
diff --git a/statsmodels/tsa/arima/params.py b/statsmodels/tsa/arima/params.py
index 6e1e40b2d..52b8a7e0a 100644
--- a/statsmodels/tsa/arima/params.py
+++ b/statsmodels/tsa/arima/params.py
@@ -7,6 +7,7 @@ License: BSD-3
 import numpy as np
 import pandas as pd
 from numpy.polynomial import Polynomial
+
 from statsmodels.tsa.statespace.tools import is_invertible
 from statsmodels.tsa.arima.tools import validate_basic

@@ -54,106 +55,301 @@ class SARIMAXParams:

     def __init__(self, spec):
         self.spec = spec
+
+        # Local copies of relevant attributes
         self.exog_names = spec.exog_names
         self.ar_names = spec.ar_names
         self.ma_names = spec.ma_names
         self.seasonal_ar_names = spec.seasonal_ar_names
         self.seasonal_ma_names = spec.seasonal_ma_names
         self.param_names = spec.param_names
+
         self.k_exog_params = spec.k_exog_params
         self.k_ar_params = spec.k_ar_params
         self.k_ma_params = spec.k_ma_params
         self.k_seasonal_ar_params = spec.k_seasonal_ar_params
         self.k_seasonal_ma_params = spec.k_seasonal_ma_params
         self.k_params = spec.k_params
-        self._params_split = spec.split_params(np.zeros(self.k_params) * np
-            .nan, allow_infnan=True)
+
+        # Cache for holding parameter values
+        self._params_split = spec.split_params(
+            np.zeros(self.k_params) * np.nan, allow_infnan=True)
         self._params = None

     @property
     def exog_params(self):
         """(array) Parameters associated with exogenous variables."""
-        pass
+        return self._params_split['exog_params']
+
+    @exog_params.setter
+    def exog_params(self, value):
+        if np.isscalar(value):
+            value = [value] * self.k_exog_params
+        self._params_split['exog_params'] = validate_basic(
+            value, self.k_exog_params, title='exogenous coefficients')
+        self._params = None

     @property
     def ar_params(self):
         """(array) Autoregressive (non-seasonal) parameters."""
-        pass
+        return self._params_split['ar_params']
+
+    @ar_params.setter
+    def ar_params(self, value):
+        if np.isscalar(value):
+            value = [value] * self.k_ar_params
+        self._params_split['ar_params'] = validate_basic(
+            value, self.k_ar_params, title='AR coefficients')
+        self._params = None

     @property
     def ar_poly(self):
         """(Polynomial) Autoregressive (non-seasonal) lag polynomial."""
-        pass
+        coef = np.zeros(self.spec.max_ar_order + 1)
+        coef[0] = 1
+        ix = self.spec.ar_lags
+        coef[ix] = -self._params_split['ar_params']
+        return Polynomial(coef)
+
+    @ar_poly.setter
+    def ar_poly(self, value):
+        # Convert from the polynomial to the parameters, and set that way
+        if isinstance(value, Polynomial):
+            value = value.coef
+        value = validate_basic(value, self.spec.max_ar_order + 1,
+                               title='AR polynomial')
+        if value[0] != 1:
+            raise ValueError('AR polynomial constant must be equal to 1.')
+        ar_params = []
+        for i in range(1, self.spec.max_ar_order + 1):
+            if i in self.spec.ar_lags:
+                ar_params.append(-value[i])
+            elif value[i] != 0:
+                raise ValueError('AR polynomial includes non-zero values'
+                                 ' for lags that are excluded in the'
+                                 ' specification.')
+        self.ar_params = ar_params

     @property
     def ma_params(self):
         """(array) Moving average (non-seasonal) parameters."""
-        pass
+        return self._params_split['ma_params']
+
+    @ma_params.setter
+    def ma_params(self, value):
+        if np.isscalar(value):
+            value = [value] * self.k_ma_params
+        self._params_split['ma_params'] = validate_basic(
+            value, self.k_ma_params, title='MA coefficients')
+        self._params = None

     @property
     def ma_poly(self):
         """(Polynomial) Moving average (non-seasonal) lag polynomial."""
-        pass
+        coef = np.zeros(self.spec.max_ma_order + 1)
+        coef[0] = 1
+        ix = self.spec.ma_lags
+        coef[ix] = self._params_split['ma_params']
+        return Polynomial(coef)
+
+    @ma_poly.setter
+    def ma_poly(self, value):
+        # Convert from the polynomial to the parameters, and set that way
+        if isinstance(value, Polynomial):
+            value = value.coef
+        value = validate_basic(value, self.spec.max_ma_order + 1,
+                               title='MA polynomial')
+        if value[0] != 1:
+            raise ValueError('MA polynomial constant must be equal to 1.')
+        ma_params = []
+        for i in range(1, self.spec.max_ma_order + 1):
+            if i in self.spec.ma_lags:
+                ma_params.append(value[i])
+            elif value[i] != 0:
+                raise ValueError('MA polynomial includes non-zero values'
+                                 ' for lags that are excluded in the'
+                                 ' specification.')
+        self.ma_params = ma_params

     @property
     def seasonal_ar_params(self):
         """(array) Seasonal autoregressive parameters."""
-        pass
+        return self._params_split['seasonal_ar_params']
+
+    @seasonal_ar_params.setter
+    def seasonal_ar_params(self, value):
+        if np.isscalar(value):
+            value = [value] * self.k_seasonal_ar_params
+        self._params_split['seasonal_ar_params'] = validate_basic(
+            value, self.k_seasonal_ar_params, title='seasonal AR coefficients')
+        self._params = None

     @property
     def seasonal_ar_poly(self):
         """(Polynomial) Seasonal autoregressive lag polynomial."""
-        pass
+        # Need to expand the polynomial according to the season
+        s = self.spec.seasonal_periods
+        coef = [1]
+        if s > 0:
+            expanded = np.zeros(self.spec.max_seasonal_ar_order)
+            ix = np.array(self.spec.seasonal_ar_lags, dtype=int) - 1
+            expanded[ix] = -self._params_split['seasonal_ar_params']
+            coef = np.r_[1, np.pad(np.reshape(expanded, (-1, 1)),
+                                   [(0, 0), (s - 1, 0)], 'constant').flatten()]
+        return Polynomial(coef)
+
+    @seasonal_ar_poly.setter
+    def seasonal_ar_poly(self, value):
+        s = self.spec.seasonal_periods
+        # Note: assume that we are given coefficients from the full polynomial
+        # Convert from the polynomial to the parameters, and set that way
+        if isinstance(value, Polynomial):
+            value = value.coef
+        value = validate_basic(value, 1 + s * self.spec.max_seasonal_ar_order,
+                               title='seasonal AR polynomial')
+        if value[0] != 1:
+            raise ValueError('Polynomial constant must be equal to 1.')
+        seasonal_ar_params = []
+        for i in range(1, self.spec.max_seasonal_ar_order + 1):
+            if i in self.spec.seasonal_ar_lags:
+                seasonal_ar_params.append(-value[s * i])
+            elif value[s * i] != 0:
+                raise ValueError('AR polynomial includes non-zero values'
+                                 ' for lags that are excluded in the'
+                                 ' specification.')
+        self.seasonal_ar_params = seasonal_ar_params

     @property
     def seasonal_ma_params(self):
         """(array) Seasonal moving average parameters."""
-        pass
+        return self._params_split['seasonal_ma_params']
+
+    @seasonal_ma_params.setter
+    def seasonal_ma_params(self, value):
+        if np.isscalar(value):
+            value = [value] * self.k_seasonal_ma_params
+        self._params_split['seasonal_ma_params'] = validate_basic(
+            value, self.k_seasonal_ma_params, title='seasonal MA coefficients')
+        self._params = None

     @property
     def seasonal_ma_poly(self):
         """(Polynomial) Seasonal moving average lag polynomial."""
-        pass
+        # Need to expand the polynomial according to the season
+        s = self.spec.seasonal_periods
+        coef = np.array([1])
+        if s > 0:
+            expanded = np.zeros(self.spec.max_seasonal_ma_order)
+            ix = np.array(self.spec.seasonal_ma_lags, dtype=int) - 1
+            expanded[ix] = self._params_split['seasonal_ma_params']
+            coef = np.r_[1, np.pad(np.reshape(expanded, (-1, 1)),
+                                   [(0, 0), (s - 1, 0)], 'constant').flatten()]
+        return Polynomial(coef)
+
+    @seasonal_ma_poly.setter
+    def seasonal_ma_poly(self, value):
+        s = self.spec.seasonal_periods
+        # Note: assume that we are given coefficients from the full polynomial
+        # Convert from the polynomial to the parameters, and set that way
+        if isinstance(value, Polynomial):
+            value = value.coef
+        value = validate_basic(value, 1 + s * self.spec.max_seasonal_ma_order,
+                               title='seasonal MA polynomial',)
+        if value[0] != 1:
+            raise ValueError('Polynomial constant must be equal to 1.')
+        seasonal_ma_params = []
+        for i in range(1, self.spec.max_seasonal_ma_order + 1):
+            if i in self.spec.seasonal_ma_lags:
+                seasonal_ma_params.append(value[s * i])
+            elif value[s * i] != 0:
+                raise ValueError('MA polynomial includes non-zero values'
+                                 ' for lags that are excluded in the'
+                                 ' specification.')
+        self.seasonal_ma_params = seasonal_ma_params

     @property
     def sigma2(self):
         """(float) Innovation variance."""
-        pass
+        return self._params_split['sigma2']
+
+    @sigma2.setter
+    def sigma2(self, params):
+        length = int(not self.spec.concentrate_scale)
+        self._params_split['sigma2'] = validate_basic(
+            params, length, title='sigma2').item()
+        self._params = None

     @property
     def reduced_ar_poly(self):
         """(Polynomial) Reduced form autoregressive lag polynomial."""
-        pass
+        return self.ar_poly * self.seasonal_ar_poly

     @property
     def reduced_ma_poly(self):
         """(Polynomial) Reduced form moving average lag polynomial."""
-        pass
+        return self.ma_poly * self.seasonal_ma_poly

     @property
     def params(self):
         """(array) Complete parameter vector."""
-        pass
+        if self._params is None:
+            self._params = self.spec.join_params(**self._params_split)
+        return self._params.copy()
+
+    @params.setter
+    def params(self, value):
+        self._params_split = self.spec.split_params(value)
+        self._params = None

     @property
     def is_complete(self):
         """(bool) Are current parameter values all filled in (i.e. not NaN)."""
-        pass
+        return not np.any(np.isnan(self.params))

     @property
     def is_valid(self):
         """(bool) Are current parameter values valid (e.g. variance > 0)."""
-        pass
+        valid = True
+        try:
+            self.spec.validate_params(self.params)
+        except ValueError:
+            valid = False
+        return valid

     @property
     def is_stationary(self):
         """(bool) Is the reduced autoregressive lag poylnomial stationary."""
-        pass
+        validate_basic(self.ar_params, self.k_ar_params,
+                       title='AR coefficients')
+        validate_basic(self.seasonal_ar_params, self.k_seasonal_ar_params,
+                       title='seasonal AR coefficients')
+
+        ar_stationary = True
+        seasonal_ar_stationary = True
+        if self.k_ar_params > 0:
+            ar_stationary = is_invertible(self.ar_poly.coef)
+        if self.k_seasonal_ar_params > 0:
+            seasonal_ar_stationary = is_invertible(self.seasonal_ar_poly.coef)
+
+        return ar_stationary and seasonal_ar_stationary

     @property
     def is_invertible(self):
         """(bool) Is the reduced moving average lag poylnomial invertible."""
-        pass
+        # Validate the MA coefficients before checking invertibility
+        validate_basic(self.ma_params, self.k_ma_params,
+                       title='MA coefficients')
+        validate_basic(self.seasonal_ma_params, self.k_seasonal_ma_params,
+                       title='seasonal MA coefficients')
+
+        ma_stationary = True
+        seasonal_ma_stationary = True
+        if self.k_ma_params > 0:
+            ma_stationary = is_invertible(self.ma_poly.coef)
+        if self.k_seasonal_ma_params > 0:
+            seasonal_ma_stationary = is_invertible(self.seasonal_ma_poly.coef)
+
+        return ma_stationary and seasonal_ma_stationary

     def to_dict(self):
         """
@@ -167,7 +363,7 @@ class SARIMAXParams:
             `concentrate_scale=True`) 'sigma2'. Values are the parameters
             associated with the key, based on the `params` argument.
         """
-        pass
+        return self._params_split.copy()

     def to_pandas(self):
         """
@@ -178,7 +374,7 @@ class SARIMAXParams:
         series : pd.Series
             Pandas series with index set to the parameter names.
         """
-        pass
+        return pd.Series(self.params, index=self.param_names)

     def __repr__(self):
         """Represent SARIMAXParams object as a string."""
@@ -190,9 +386,11 @@ class SARIMAXParams:
         if self.k_ma_params:
             components.append('ma=%s' % str(self.ma_params))
         if self.k_seasonal_ar_params:
-            components.append('seasonal_ar=%s' % str(self.seasonal_ar_params))
+            components.append('seasonal_ar=%s' %
+                              str(self.seasonal_ar_params))
         if self.k_seasonal_ma_params:
-            components.append('seasonal_ma=%s' % str(self.seasonal_ma_params))
+            components.append('seasonal_ma=%s' %
+                              str(self.seasonal_ma_params))
         if not self.spec.concentrate_scale:
             components.append('sigma2=%s' % self.sigma2)
         return 'SARIMAXParams(%s)' % ', '.join(components)
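
Finally, a small sketch of the `SARIMAXParams` container whose properties and setters are filled in above; the specific orders chosen are arbitrary, and the printed reduced polynomial is simply the product of the non-seasonal and seasonal AR lag polynomials.

    from statsmodels.tsa.arima.params import SARIMAXParams
    from statsmodels.tsa.arima.specification import SARIMAXSpecification

    # A specification without data: only the parameter structure is needed
    spec = SARIMAXSpecification(order=(1, 0, 1), seasonal_order=(1, 0, 0, 4))
    p = SARIMAXParams(spec=spec)

    # Parameters can be set componentwise; the setters validate lengths/values
    p.ar_params = [0.5]
    p.ma_params = [0.2]
    p.seasonal_ar_params = [0.3]
    p.sigma2 = 1.0

    print(p.is_complete, p.is_stationary, p.is_invertible)
    print(p.reduced_ar_poly.coef)  # (1 - 0.5 L) * (1 - 0.3 L^4), expanded
    print(p.to_pandas())
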
diff --git a/statsmodels/tsa/arima/specification.py b/statsmodels/tsa/arima/specification.py
index 5e169d4ae..ad8406174 100644
--- a/statsmodels/tsa/arima/specification.py
+++ b/statsmodels/tsa/arima/specification.py
@@ -6,9 +6,14 @@ License: BSD-3
 """
 import numpy as np
 import pandas as pd
+
 from statsmodels.tools.data import _is_using_pandas
 from statsmodels.tsa.base.tsa_model import TimeSeriesModel
-from statsmodels.tsa.statespace.tools import is_invertible, constrain_stationary_univariate as constrain, unconstrain_stationary_univariate as unconstrain, prepare_exog, prepare_trend_spec, prepare_trend_data
+from statsmodels.tsa.statespace.tools import (
+    is_invertible, constrain_stationary_univariate as constrain,
+    unconstrain_stationary_univariate as unconstrain,
+    prepare_exog, prepare_trend_spec, prepare_trend_data)
+
 from statsmodels.tsa.arima.tools import standardize_lag_order, validate_basic


@@ -203,90 +208,115 @@ class SARIMAXSpecification:
     SARIMAXSpecification(endog=y, order=(1, 0, 0), seasonal_order=(1, 0, 0, 4))
     """

-    def __init__(self, endog=None, exog=None, order=None, seasonal_order=
-        None, ar_order=None, diff=None, ma_order=None, seasonal_ar_order=
-        None, seasonal_diff=None, seasonal_ma_order=None, seasonal_periods=
-        None, trend=None, enforce_stationarity=None, enforce_invertibility=
-        None, concentrate_scale=None, trend_offset=1, dates=None, freq=None,
-        missing='none', validate_specification=True):
+    def __init__(self, endog=None, exog=None, order=None,
+                 seasonal_order=None, ar_order=None, diff=None, ma_order=None,
+                 seasonal_ar_order=None, seasonal_diff=None,
+                 seasonal_ma_order=None, seasonal_periods=None, trend=None,
+                 enforce_stationarity=None, enforce_invertibility=None,
+                 concentrate_scale=None, trend_offset=1, dates=None, freq=None,
+                 missing='none', validate_specification=True):
+
+        # Basic parameters
         self.enforce_stationarity = enforce_stationarity
         self.enforce_invertibility = enforce_invertibility
         self.concentrate_scale = concentrate_scale
         self.trend_offset = trend_offset
+
+        # Validate that we were not given conflicting specifications
         has_order = order is not None
-        has_specific_order = (ar_order is not None or diff is not None or 
-            ma_order is not None)
+        has_specific_order = (ar_order is not None or diff is not None or
+                              ma_order is not None)
         has_seasonal_order = seasonal_order is not None
-        has_specific_seasonal_order = (seasonal_ar_order is not None or 
-            seasonal_diff is not None or seasonal_ma_order is not None or 
-            seasonal_periods is not None)
+        has_specific_seasonal_order = (seasonal_ar_order is not None or
+                                       seasonal_diff is not None or
+                                       seasonal_ma_order is not None or
+                                       seasonal_periods is not None)
         if has_order and has_specific_order:
-            raise ValueError(
-                'Cannot specify both `order` and either of `ar_order` or `ma_order`.'
-                )
+            raise ValueError('Cannot specify both `order` and either of'
+                             ' `ar_order` or `ma_order`.')
         if has_seasonal_order and has_specific_seasonal_order:
-            raise ValueError(
-                'Cannot specify both `seasonal_order` and any of `seasonal_ar_order`, `seasonal_ma_order`, or `seasonal_periods`.'
-                )
+            raise ValueError('Cannot specify both `seasonal_order` and any of'
+                             ' `seasonal_ar_order`, `seasonal_ma_order`,'
+                             ' or `seasonal_periods`.')
+
+        # Compute `order`
         if has_specific_order:
             ar_order = 0 if ar_order is None else ar_order
             diff = 0 if diff is None else diff
             ma_order = 0 if ma_order is None else ma_order
-            order = ar_order, diff, ma_order
+            order = (ar_order, diff, ma_order)
         elif not has_order:
-            order = 0, 0, 0
+            order = (0, 0, 0)
+
+        # Compute `seasonal_order`
         if has_specific_seasonal_order:
-            seasonal_ar_order = (0 if seasonal_ar_order is None else
-                seasonal_ar_order)
+            seasonal_ar_order = (
+                0 if seasonal_ar_order is None else seasonal_ar_order)
             seasonal_diff = 0 if seasonal_diff is None else seasonal_diff
-            seasonal_ma_order = (0 if seasonal_ma_order is None else
-                seasonal_ma_order)
-            seasonal_periods = (0 if seasonal_periods is None else
-                seasonal_periods)
+            seasonal_ma_order = (
+                0 if seasonal_ma_order is None else seasonal_ma_order)
+            seasonal_periods = (
+                0 if seasonal_periods is None else seasonal_periods)
             seasonal_order = (seasonal_ar_order, seasonal_diff,
-                seasonal_ma_order, seasonal_periods)
+                              seasonal_ma_order, seasonal_periods)
         elif not has_seasonal_order:
-            seasonal_order = 0, 0, 0, 0
+            seasonal_order = (0, 0, 0, 0)
+
+        # Validate shapes of `order`, `seasonal_order`
         if len(order) != 3:
-            raise ValueError(
-                '`order` argument must be an iterable with three elements.')
+            raise ValueError('`order` argument must be an iterable with three'
+                             ' elements.')
         if len(seasonal_order) != 4:
-            raise ValueError(
-                '`seasonal_order` argument must be an iterable with four elements.'
-                )
+            raise ValueError('`seasonal_order` argument must be an iterable'
+                             ' with four elements.')
+
+        # Validate differencing parameters
         if validate_specification:
             if order[1] < 0:
                 raise ValueError('Cannot specify negative differencing.')
             if order[1] != int(order[1]):
                 raise ValueError('Cannot specify fractional differencing.')
             if seasonal_order[1] < 0:
-                raise ValueError(
-                    'Cannot specify negative seasonal differencing.')
+                raise ValueError('Cannot specify negative seasonal'
+                                 ' differencing.')
             if seasonal_order[1] != int(seasonal_order[1]):
-                raise ValueError(
-                    'Cannot specify fractional seasonal differencing.')
+                raise ValueError('Cannot specify fractional seasonal'
+                                 ' differencing.')
             if seasonal_order[3] < 0:
-                raise ValueError(
-                    'Cannot specify negative seasonal periodicity.')
-        order = standardize_lag_order(order[0], 'AR'), int(order[1]
-            ), standardize_lag_order(order[2], 'MA')
-        seasonal_order = standardize_lag_order(seasonal_order[0], 'seasonal AR'
-            ), int(seasonal_order[1]), standardize_lag_order(seasonal_order
-            [2], 'seasonal MA'), int(seasonal_order[3])
+                raise ValueError('Cannot specify negative seasonal'
+                                 ' periodicity.')
+
+        # Standardize to integers or lists of integers
+        order = (
+            standardize_lag_order(order[0], 'AR'),
+            int(order[1]),
+            standardize_lag_order(order[2], 'MA'))
+        seasonal_order = (
+            standardize_lag_order(seasonal_order[0], 'seasonal AR'),
+            int(seasonal_order[1]),
+            standardize_lag_order(seasonal_order[2], 'seasonal MA'),
+            int(seasonal_order[3]))
+
+        # Validate seasonals
         if validate_specification:
             if seasonal_order[3] == 1:
-                raise ValueError('Seasonal periodicity must be greater than 1.'
-                    )
-            if (seasonal_order[0] != 0 or seasonal_order[1] != 0 or 
-                seasonal_order[2] != 0) and seasonal_order[3] == 0:
-                raise ValueError(
-                    'Must include nonzero seasonal periodicity if including seasonal AR, MA, or differencing.'
-                    )
+                raise ValueError('Seasonal periodicity must be greater'
+                                 ' than 1.')
+            if ((seasonal_order[0] != 0 or seasonal_order[1] != 0 or
+                    seasonal_order[2] != 0) and seasonal_order[3] == 0):
+                raise ValueError('Must include nonzero seasonal periodicity if'
+                                 ' including seasonal AR, MA, or'
+                                 ' differencing.')
+
+        # Basic order
         self.order = order
         self.ar_order, self.diff, self.ma_order = order
+
         self.seasonal_order = seasonal_order
         (self.seasonal_ar_order, self.seasonal_diff, self.seasonal_ma_order,
-            self.seasonal_periods) = seasonal_order
+         self.seasonal_periods) = seasonal_order
+
+        # Lists of included lags
         if isinstance(self.ar_order, list):
             self.ar_lags = self.ar_order
         else:
@@ -295,57 +325,87 @@ class SARIMAXSpecification:
             self.ma_lags = self.ma_order
         else:
             self.ma_lags = np.arange(1, self.ma_order + 1).tolist()
+
         if isinstance(self.seasonal_ar_order, list):
             self.seasonal_ar_lags = self.seasonal_ar_order
         else:
-            self.seasonal_ar_lags = np.arange(1, self.seasonal_ar_order + 1
-                ).tolist()
+            self.seasonal_ar_lags = (
+                np.arange(1, self.seasonal_ar_order + 1).tolist())
         if isinstance(self.seasonal_ma_order, list):
             self.seasonal_ma_lags = self.seasonal_ma_order
         else:
-            self.seasonal_ma_lags = np.arange(1, self.seasonal_ma_order + 1
-                ).tolist()
+            self.seasonal_ma_lags = (
+                np.arange(1, self.seasonal_ma_order + 1).tolist())
+
+        # Maximum lag orders
         self.max_ar_order = self.ar_lags[-1] if self.ar_lags else 0
         self.max_ma_order = self.ma_lags[-1] if self.ma_lags else 0
-        self.max_seasonal_ar_order = self.seasonal_ar_lags[-1
-            ] if self.seasonal_ar_lags else 0
-        self.max_seasonal_ma_order = self.seasonal_ma_lags[-1
-            ] if self.seasonal_ma_lags else 0
-        self.max_reduced_ar_order = (self.max_ar_order + self.
-            max_seasonal_ar_order * self.seasonal_periods)
-        self.max_reduced_ma_order = (self.max_ma_order + self.
-            max_seasonal_ma_order * self.seasonal_periods)
+
+        self.max_seasonal_ar_order = (
+            self.seasonal_ar_lags[-1] if self.seasonal_ar_lags else 0)
+        self.max_seasonal_ma_order = (
+            self.seasonal_ma_lags[-1] if self.seasonal_ma_lags else 0)
+
+        self.max_reduced_ar_order = (
+            self.max_ar_order +
+            self.max_seasonal_ar_order * self.seasonal_periods)
+        self.max_reduced_ma_order = (
+            self.max_ma_order +
+            self.max_seasonal_ma_order * self.seasonal_periods)
+
+        # Check that we don't have duplicate AR or MA lags from the seasonal
+        # component
         ar_lags = set(self.ar_lags)
-        seasonal_ar_lags = set(np.array(self.seasonal_ar_lags) * self.
-            seasonal_periods)
+        seasonal_ar_lags = set(np.array(self.seasonal_ar_lags)
+                               * self.seasonal_periods)
         duplicate_ar_lags = ar_lags.intersection(seasonal_ar_lags)
         if validate_specification and len(duplicate_ar_lags) > 0:
-            raise ValueError(
-                'Invalid model: autoregressive lag(s) %s are in both the seasonal and non-seasonal autoregressive components.'
-                 % duplicate_ar_lags)
+            raise ValueError('Invalid model: autoregressive lag(s) %s are'
+                             ' in both the seasonal and non-seasonal'
+                             ' autoregressive components.'
+                             % duplicate_ar_lags)
+
         ma_lags = set(self.ma_lags)
-        seasonal_ma_lags = set(np.array(self.seasonal_ma_lags) * self.
-            seasonal_periods)
+        seasonal_ma_lags = set(np.array(self.seasonal_ma_lags)
+                               * self.seasonal_periods)
         duplicate_ma_lags = ma_lags.intersection(seasonal_ma_lags)
         if validate_specification and len(duplicate_ma_lags) > 0:
-            raise ValueError(
-                'Invalid model: moving average lag(s) %s are in both the seasonal and non-seasonal moving average components.'
-                 % duplicate_ma_lags)
+            raise ValueError('Invalid model: moving average lag(s) %s are'
+                             ' in both the seasonal and non-seasonal'
+                             ' moving average components.'
+                             % duplicate_ma_lags)
+
+        # Handle trend
         self.trend = trend
         self.trend_poly, _ = prepare_trend_spec(trend)
+
+        # Check for a constant column in the provided exog
         exog_is_pandas = _is_using_pandas(exog, None)
-        if validate_specification and exog is not None and len(self.trend_poly
-            ) > 0 and self.trend_poly[0] == 1:
+        if (validate_specification and exog is not None and
+                len(self.trend_poly) > 0 and self.trend_poly[0] == 1):
+            # Figure out if we have any constant columns
             x = np.asanyarray(exog)
             ptp0 = np.ptp(x, axis=0)
             col_is_const = ptp0 == 0
             nz_const = col_is_const & (x[0] != 0)
             col_const = nz_const
+
+            # If we already have a constant column, raise an error
             if np.any(col_const):
-                raise ValueError(
-                    'A constant trend was included in the model specification, but the `exog` data already contains a column of constants.'
-                    )
+                raise ValueError('A constant trend was included in the model'
+                                 ' specification, but the `exog` data already'
+                                 ' contains a column of constants.')
+
+        # This contains the included exponents of the trend polynomial,
+        # where e.g. the constant term has exponent 0, a linear trend has
+        # exponent 1, etc.
         self.trend_terms = np.where(self.trend_poly == 1)[0]
+        # Trend order is either the degree of the trend polynomial, if all
+        # exponents are included, or a list of included exponents. Here we need
+        # to make a distinction between a degree zero polynomial (i.e. a
+        # constant) and the zero polynomial (i.e. not even a constant). The
+        # former has `trend_order = 0`, while the latter has
+        # `trend_order = None`.
         self.k_trend = len(self.trend_terms)
         if len(self.trend_terms) == 0:
             self.trend_order = None
@@ -356,32 +416,46 @@ class SARIMAXSpecification:
         else:
             self.trend_order = self.trend_terms
             self.trend_degree = self.trend_terms[-1]
+
+        # Handle endog / exog
+        # Standardize exog
         self.k_exog, exog = prepare_exog(exog)
+
+        # Standardize endog (including creating a faux endog if necessary)
         faux_endog = endog is None
         if endog is None:
             endog = [] if exog is None else np.zeros(len(exog)) * np.nan
+
+        # Add trend data into exog
         nobs = len(endog) if exog is None else len(exog)
         if self.trend_order is not None:
+            # Add in the data
             trend_data = self.construct_trend_data(nobs, trend_offset)
             if exog is None:
                 exog = trend_data
             elif exog_is_pandas:
                 trend_data = pd.DataFrame(trend_data, index=exog.index,
-                    columns=self.construct_trend_names())
+                                          columns=self.construct_trend_names())
                 exog = pd.concat([trend_data, exog], axis=1)
             else:
                 exog = np.c_[trend_data, exog]
-        self._model = TimeSeriesModel(endog, exog=exog, dates=dates, freq=
-            freq, missing=missing)
+
+        # Create an underlying time series model, to handle endog / exog,
+        # especially validating shapes, retrieving names, and potentially
+        # providing us with a time series index
+        self._model = TimeSeriesModel(endog, exog=exog, dates=dates, freq=freq,
+                                      missing=missing)
         self.endog = None if faux_endog else self._model.endog
         self.exog = self._model.exog
-        if (validate_specification and not faux_endog and self.endog.ndim >
-            1 and self.endog.shape[1] > 1):
-            raise ValueError(
-                'SARIMAX models require univariate `endog`. Got shape %s.' %
-                str(self.endog.shape))
-        self._has_missing = None if faux_endog else np.any(np.isnan(self.endog)
-            )
+
+        # Validate endog shape
+        if (validate_specification and not faux_endog and
+                self.endog.ndim > 1 and self.endog.shape[1] > 1):
+            raise ValueError('SARIMAX models require univariate `endog`. Got'
+                             ' shape %s.' % str(self.endog.shape))
+
+        self._has_missing = (
+            None if faux_endog else np.any(np.isnan(self.endog)))

     @property
     def is_ar_consecutive(self):
@@ -390,7 +464,8 @@ class SARIMAXSpecification:

         I.e. does it include all lags up to and including the maximum lag.
         """
-        pass
+        return (self.max_seasonal_ar_order == 0 and
+                not isinstance(self.ar_order, list))

     @property
     def is_ma_consecutive(self):
@@ -399,7 +474,8 @@ class SARIMAXSpecification:

         I.e. does it include all lags up to and including the maximum lag.
         """
-        pass
+        return (self.max_seasonal_ma_order == 0 and
+                not isinstance(self.ma_order, list))

     @property
     def is_integrated(self):
@@ -408,72 +484,83 @@ class SARIMAXSpecification:

         I.e. does it have a nonzero `diff` or `seasonal_diff`.
         """
-        pass
+        return self.diff > 0 or self.seasonal_diff > 0

     @property
     def is_seasonal(self):
         """(bool) Does the model include a seasonal component."""
-        pass
+        return self.seasonal_periods != 0

     @property
     def k_exog_params(self):
         """(int) Number of parameters associated with exogenous variables."""
-        pass
+        return len(self.exog_names)

     @property
     def k_ar_params(self):
         """(int) Number of autoregressive (non-seasonal) parameters."""
-        pass
+        return len(self.ar_lags)

     @property
     def k_ma_params(self):
         """(int) Number of moving average (non-seasonal) parameters."""
-        pass
+        return len(self.ma_lags)

     @property
     def k_seasonal_ar_params(self):
         """(int) Number of seasonal autoregressive parameters."""
-        pass
+        return len(self.seasonal_ar_lags)

     @property
     def k_seasonal_ma_params(self):
         """(int) Number of seasonal moving average parameters."""
-        pass
+        return len(self.seasonal_ma_lags)

     @property
     def k_params(self):
         """(int) Total number of model parameters."""
-        pass
+        k_params = (self.k_exog_params + self.k_ar_params + self.k_ma_params +
+                    self.k_seasonal_ar_params + self.k_seasonal_ma_params)
+        if not self.concentrate_scale:
+            k_params += 1
+        return k_params

     @property
     def exog_names(self):
         """(list of str) Names associated with exogenous parameters."""
-        pass
+        exog_names = self._model.exog_names
+        return [] if exog_names is None else exog_names

     @property
     def ar_names(self):
         """(list of str) Names of (non-seasonal) autoregressive parameters."""
-        pass
+        return ['ar.L%d' % i for i in self.ar_lags]

     @property
     def ma_names(self):
         """(list of str) Names of (non-seasonal) moving average parameters."""
-        pass
+        return ['ma.L%d' % i for i in self.ma_lags]

     @property
     def seasonal_ar_names(self):
         """(list of str) Names of seasonal autoregressive parameters."""
-        pass
+        s = self.seasonal_periods
+        return ['ar.S.L%d' % (i * s) for i in self.seasonal_ar_lags]

     @property
     def seasonal_ma_names(self):
         """(list of str) Names of seasonal moving average parameters."""
-        pass
+        s = self.seasonal_periods
+        return ['ma.S.L%d' % (i * s) for i in self.seasonal_ma_lags]

     @property
     def param_names(self):
         """(list of str) Names of all model parameters."""
-        pass
+        names = (self.exog_names + self.ar_names + self.ma_names +
+                 self.seasonal_ar_names + self.seasonal_ma_names)
+        if not self.concentrate_scale:
+            names.append('sigma2')
+        return names

     @property
     def valid_estimators(self):
@@ -486,7 +573,39 @@ class SARIMAXSpecification:
         `valid_estimators` are the estimators that could be passed as the
         `arma_estimator` argument to `gls`.
         """
-        pass
+        estimators = {'yule_walker', 'burg', 'innovations',
+                      'hannan_rissanen', 'innovations_mle', 'statespace'}
+
+        # Properties
+        has_ar = self.max_ar_order != 0
+        has_ma = self.max_ma_order != 0
+        has_seasonal = self.seasonal_periods != 0
+
+        # Only state space can handle missing data or concentrated scale
+        if self._has_missing:
+            estimators.intersection_update(['statespace'])
+
+        # Only numerical MLE estimators can enforce restrictions
+        if ((self.enforce_stationarity and self.max_ar_order > 0) or
+                (self.enforce_invertibility and self.max_ma_order > 0)):
+            estimators.intersection_update(['innovations_mle', 'statespace'])
+
+        # Innovations: no AR, non-consecutive MA, seasonal
+        if has_ar or not self.is_ma_consecutive or has_seasonal:
+            estimators.discard('innovations')
+        # Yule-Walker/Burg: no MA, non-consecutive AR, seasonal
+        if has_ma or not self.is_ar_consecutive or has_seasonal:
+            estimators.discard('yule_walker')
+            estimators.discard('burg')
+        # Hannan-Rissanen: no seasonal
+        if has_seasonal:
+            estimators.discard('hannan_rissanen')
+        # Innovations MLE: cannot have enforce_stationarity=False or
+        # concentrate_scale=True
+        if self.enforce_stationarity is False or self.concentrate_scale:
+            estimators.discard('innovations_mle')
+
+        return estimators

     def validate_estimator(self, estimator):
         """
@@ -538,7 +657,78 @@ class SARIMAXSpecification:
         >>> spec.validate_estimator('not_an_estimator')
         ValueError: "not_an_estimator" is not a valid estimator.
         """
-        pass
+        has_ar = self.max_ar_order != 0
+        has_ma = self.max_ma_order != 0
+        has_seasonal = self.seasonal_periods != 0
+        has_missing = self._has_missing
+
+        titles = {
+            'yule_walker': 'Yule-Walker',
+            'burg': 'Burg',
+            'innovations': 'Innovations',
+            'hannan_rissanen': 'Hannan-Rissanen',
+            'innovations_mle': 'Innovations MLE',
+            'statespace': 'State space'
+        }
+
+        # Only state space form can support missing data
+        if estimator != 'statespace':
+            if has_missing:
+                raise ValueError('%s estimator does not support missing'
+                                 ' values in `endog`.' % titles[estimator])
+
+        # Only state space and innovations MLE can enforce parameter
+        # restrictions
+        if estimator not in ['innovations_mle', 'statespace']:
+            if self.max_ar_order > 0 and self.enforce_stationarity:
+                raise ValueError('%s estimator cannot enforce a stationary'
+                                 ' autoregressive lag polynomial.'
+                                 % titles[estimator])
+            if self.max_ma_order > 0 and self.enforce_invertibility:
+                raise ValueError('%s estimator cannot enforce an invertible'
+                                 ' moving average lag polynomial.'
+                                 % titles[estimator])
+
+        # Now go through specific disqualifications for each estimator
+        if estimator in ['yule_walker', 'burg']:
+            if has_seasonal:
+                raise ValueError('%s estimator does not support seasonal'
+                                 ' components.' % titles[estimator])
+            if not self.is_ar_consecutive:
+                raise ValueError('%s estimator does not support'
+                                 ' non-consecutive autoregressive lags.'
+                                 % titles[estimator])
+            if has_ma:
+                raise ValueError('%s estimator does not support moving average'
+                                 ' components.' % titles[estimator])
+        elif estimator == 'innovations':
+            if has_seasonal:
+                raise ValueError('Innovations estimator does not support'
+                                 ' seasonal components.')
+            if not self.is_ma_consecutive:
+                raise ValueError('Innovations estimator does not support'
+                                 ' non-consecutive moving average lags.')
+            if has_ar:
+                raise ValueError('Innovations estimator does not support'
+                                 ' autoregressive components.')
+        elif estimator == 'hannan_rissanen':
+            if has_seasonal:
+                raise ValueError('Hannan-Rissanen estimator does not support'
+                                 ' seasonal components.')
+        elif estimator == 'innovations_mle':
+            if self.enforce_stationarity is False:
+                raise ValueError('Innovations MLE estimator does not support'
+                                 ' non-stationary autoregressive components,'
+                                 ' but `enforce_stationarity` is set to False')
+            if self.concentrate_scale:
+                raise ValueError('Innovations MLE estimator does not support'
+                                 ' concentrating the scale out of the'
+                                 ' log-likelihood function')
+        elif estimator == 'statespace':
+            # State space form supports all variations of SARIMAX.
+            pass
+        else:
+            raise ValueError('"%s" is not a valid estimator.' % estimator)

     def split_params(self, params, allow_infnan=False):
         """
@@ -571,10 +761,28 @@ class SARIMAXSpecification:
          'seasonal_ma_params': array([], dtype=float64),
          'sigma2': 4.0}
         """
-        pass
+        params = validate_basic(params, self.k_params,
+                                allow_infnan=allow_infnan,
+                                title='joint parameters')
+
+        ix = [self.k_exog_params, self.k_ar_params, self.k_ma_params,
+              self.k_seasonal_ar_params, self.k_seasonal_ma_params]
+        names = ['exog_params', 'ar_params', 'ma_params',
+                 'seasonal_ar_params', 'seasonal_ma_params']
+        if not self.concentrate_scale:
+            ix.append(1)
+            names.append('sigma2')
+        ix = np.cumsum(ix)
+
+        out = dict(zip(names, np.split(params, ix)))
+        if 'sigma2' in out:
+            out['sigma2'] = out['sigma2'].item()
+
+        return out

     def join_params(self, exog_params=None, ar_params=None, ma_params=None,
-        seasonal_ar_params=None, seasonal_ma_params=None, sigma2=None):
+                    seasonal_ar_params=None, seasonal_ma_params=None,
+                    sigma2=None):
         """
         Join parameters into a single vector.

@@ -610,7 +818,33 @@ class SARIMAXSpecification:
         >>> spec.join_params(ar_params=0.5, sigma2=4)
         array([0.5, 4. ])
         """
-        pass
+        definitions = [
+            ('exogenous variables', self.k_exog_params, exog_params),
+            ('AR terms', self.k_ar_params, ar_params),
+            ('MA terms', self.k_ma_params, ma_params),
+            ('seasonal AR terms', self.k_seasonal_ar_params,
+                seasonal_ar_params),
+            ('seasonal MA terms', self.k_seasonal_ma_params,
+                seasonal_ma_params),
+            ('variance', int(not self.concentrate_scale), sigma2)]
+
+        params_list = []
+        for title, k, params in definitions:
+            if k > 0:
+                # Validate
+                if params is None:
+                    raise ValueError('Specification includes %s, but no'
+                                     ' parameters were provided.' % title)
+                params = np.atleast_1d(np.squeeze(params))
+                if not params.shape == (k,):
+                    raise ValueError('Specification included %d %s, but'
+                                     ' parameters with shape %s were provided.'
+                                     % (k, title, params.shape))
+
+                # Otherwise add to the list
+                params_list.append(params)
+
+        return np.concatenate(params_list)

     def validate_params(self, params):
         """
@@ -639,7 +873,37 @@ class SARIMAXSpecification:
         >>> spec.validate_params([-1.5, 4.])
         ValueError: Non-stationary autoregressive polynomial.
         """
-        pass
+        # Note: split_params includes basic validation
+        params = self.split_params(params)
+
+        # Specific checks
+        if self.enforce_stationarity:
+            if self.k_ar_params:
+                ar_poly = np.r_[1, -params['ar_params']]
+                if not is_invertible(ar_poly):
+                    raise ValueError('Non-stationary autoregressive'
+                                     ' polynomial.')
+            if self.k_seasonal_ar_params:
+                seasonal_ar_poly = np.r_[1, -params['seasonal_ar_params']]
+                if not is_invertible(seasonal_ar_poly):
+                    raise ValueError('Non-stationary seasonal autoregressive'
+                                     ' polynomial.')
+
+        if self.enforce_invertibility:
+            if self.k_ma_params:
+                ma_poly = np.r_[1, params['ma_params']]
+                if not is_invertible(ma_poly):
+                    raise ValueError('Non-invertible moving average'
+                                     ' polynomial.')
+            if self.k_seasonal_ma_params:
+                seasonal_ma_poly = np.r_[1, params['seasonal_ma_params']]
+                if not is_invertible(seasonal_ma_poly):
+                    raise ValueError('Non-invertible seasonal moving average'
+                                     ' polynomial.')
+
+        if not self.concentrate_scale:
+            if params['sigma2'] <= 0:
+                raise ValueError('Non-positive variance term.')

     def constrain_params(self, unconstrained):
         """
@@ -669,7 +933,39 @@ class SARIMAXSpecification:
         >>> spec.constrain_params([10, -2])
         array([-0.99504,  4.     ])
         """
-        pass
+        unconstrained = self.split_params(unconstrained)
+        params = {}
+
+        if self.k_exog_params:
+            params['exog_params'] = unconstrained['exog_params']
+        if self.k_ar_params:
+            if self.enforce_stationarity:
+                params['ar_params'] = constrain(unconstrained['ar_params'])
+            else:
+                params['ar_params'] = unconstrained['ar_params']
+        if self.k_ma_params:
+            if self.enforce_invertibility:
+                params['ma_params'] = -constrain(unconstrained['ma_params'])
+            else:
+                params['ma_params'] = unconstrained['ma_params']
+        if self.k_seasonal_ar_params:
+            if self.enforce_stationarity:
+                params['seasonal_ar_params'] = (
+                    constrain(unconstrained['seasonal_ar_params']))
+            else:
+                params['seasonal_ar_params'] = (
+                    unconstrained['seasonal_ar_params'])
+        if self.k_seasonal_ma_params:
+            if self.enforce_invertibility:
+                params['seasonal_ma_params'] = (
+                    -constrain(unconstrained['seasonal_ma_params']))
+            else:
+                params['seasonal_ma_params'] = (
+                    unconstrained['seasonal_ma_params'])
+        if not self.concentrate_scale:
+            params['sigma2'] = unconstrained['sigma2']**2
+
+        return self.join_params(**params)

     def unconstrain_params(self, constrained):
         """
@@ -697,7 +993,59 @@ class SARIMAXSpecification:
         >>> spec.unconstrain_params([-0.5, 4.])
         array([0.57735, 2.     ])
         """
-        pass
+        constrained = self.split_params(constrained)
+        params = {}
+
+        if self.k_exog_params:
+            params['exog_params'] = constrained['exog_params']
+        if self.k_ar_params:
+            if self.enforce_stationarity:
+                params['ar_params'] = unconstrain(constrained['ar_params'])
+            else:
+                params['ar_params'] = constrained['ar_params']
+        if self.k_ma_params:
+            if self.enforce_invertibility:
+                params['ma_params'] = unconstrain(-constrained['ma_params'])
+            else:
+                params['ma_params'] = constrained['ma_params']
+        if self.k_seasonal_ar_params:
+            if self.enforce_stationarity:
+                params['seasonal_ar_params'] = (
+                    unconstrain(constrained['seasonal_ar_params']))
+            else:
+                params['seasonal_ar_params'] = (
+                    constrained['seasonal_ar_params'])
+        if self.k_seasonal_ma_params:
+            if self.enforce_invertibility:
+                params['seasonal_ma_params'] = (
+                    unconstrain(-constrained['seasonal_ma_params']))
+            else:
+                params['seasonal_ma_params'] = (
+                    constrained['seasonal_ma_params'])
+        if not self.concentrate_scale:
+            params['sigma2'] = constrained['sigma2']**0.5
+
+        return self.join_params(**params)
+
+    def construct_trend_data(self, nobs, offset=1):
+        if self.trend_order is None:
+            trend_data = None
+        else:
+            trend_data = prepare_trend_data(
+                self.trend_poly, int(np.sum(self.trend_poly)), nobs, offset)
+
+        return trend_data
+
+    def construct_trend_names(self):
+        names = []
+        for i in self.trend_terms:
+            if i == 0:
+                names.append('const')
+            elif i == 1:
+                names.append('drift')
+            else:
+                names.append('trend.%d' % i)
+        return names

     def __repr__(self):
         """Represent SARIMAXSpecification object as a string."""
@@ -710,11 +1058,11 @@ class SARIMAXSpecification:
         if self.seasonal_periods > 0:
             components.append('seasonal_order=%s' % str(self.seasonal_order))
         if self.enforce_stationarity is not None:
-            components.append('enforce_stationarity=%s' % self.
-                enforce_stationarity)
+            components.append('enforce_stationarity=%s'
+                              % self.enforce_stationarity)
         if self.enforce_invertibility is not None:
-            components.append('enforce_invertibility=%s' % self.
-                enforce_invertibility)
+            components.append('enforce_invertibility=%s'
+                              % self.enforce_invertibility)
         if self.concentrate_scale is not None:
             components.append('concentrate_scale=%s' % self.concentrate_scale)
         return 'SARIMAXSpecification(%s)' % ', '.join(components)
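As a hedged illustration of the specification API assembled above (the import path and constructor arguments come from the file being patched; the data and parameter values below are made up for the example):

    import numpy as np
    from statsmodels.tsa.arima.specification import SARIMAXSpecification

    # Hypothetical univariate series, only to build a specification
    endog = np.random.default_rng(0).standard_normal(100)
    spec = SARIMAXSpecification(endog, order=(1, 0, 1))

    spec.param_names        # ['ar.L1', 'ma.L1', 'sigma2']
    spec.valid_estimators   # set of estimators compatible with this spec

    # Round-trip a parameter vector through join/split and validate it
    joint = spec.join_params(ar_params=[0.5], ma_params=[0.2], sigma2=1.0)
    parts = spec.split_params(joint)
    spec.validate_params(joint)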
diff --git a/statsmodels/tsa/arima/tools.py b/statsmodels/tsa/arima/tools.py
index 276c79f78..e402b46f5 100644
--- a/statsmodels/tsa/arima/tools.py
+++ b/statsmodels/tsa/arima/tools.py
@@ -45,7 +45,67 @@ def standardize_lag_order(order, title=None):
     >>> standardize_lag_order([1, 3])
     [1, 3]
     """
-    pass
+    order = np.array(order)
+    title = 'order' if title is None else '%s order' % title
+
+    # Only integer orders are valid
+    if not np.all(order == order.astype(int)):
+        raise ValueError('Invalid %s. Non-integer order (%s) given.'
+                         % (title, order))
+    order = order.astype(int)
+
+    # Only positive integers are valid
+    if np.any(order < 0):
+        raise ValueError('Terms in the %s cannot be negative.' % title)
+
+    # Try to squeeze out an irrelevant trailing dimension
+    if order.ndim == 2 and order.shape[1] == 1:
+        order = order[:, 0]
+    elif order.ndim > 1:
+        raise ValueError('Invalid %s. Must be an integer or'
+                         ' 1-dimensional array-like object (e.g. list,'
+                         ' ndarray, etc.). Got %s.' % (title, order))
+
+    # Option 1: the typical integer response (implies including all
+    # lags up through and including the value)
+    if order.ndim == 0:
+        order = order.item()
+    elif len(order) == 0:
+        order = 0
+    else:
+        # Option 2: boolean list
+        has_zeros = (0 in order)
+        has_multiple_ones = np.sum(order == 1) > 1
+        has_gt_one = np.any(order > 1)
+        if has_zeros or has_multiple_ones:
+            if has_gt_one:
+                raise ValueError('Invalid %s. Appears to be a boolean list'
+                                 ' (since it contains a 0 element and/or'
+                                 ' multiple elements) but also contains'
+                                 ' elements greater than 1 like a list of'
+                                 ' lag orders.' % title)
+            order = (np.where(order == 1)[0] + 1)
+
+        # (Default) Option 3: list of lag orders to include
+        else:
+            order = np.sort(order)
+
+        # If we have an empty list, set order to zero
+        if len(order) == 0:
+            order = 0
+        # If we actually were given consecutive lag orders, just use integer
+        elif np.all(order == np.arange(1, len(order) + 1)):
+            order = order[-1]
+        # Otherwise, convert to list
+        else:
+            order = order.tolist()
+
+    # Check for duplicates
+    has_duplicate = isinstance(order, list) and np.any(np.diff(order) == 0)
+    if has_duplicate:
+        raise ValueError('Invalid %s. Cannot have duplicate elements.' % title)
+
+    return order


 def validate_basic(params, length, allow_infnan=False, title=None):
@@ -75,4 +135,31 @@ def validate_basic(params, length, allow_infnan=False, title=None):
     Basic check that the parameters are numeric and that they are the right
     shape. Optionally checks for NaN / infinite values.
     """
-    pass
+    title = '' if title is None else ' for %s' % title
+
+    # Check for invalid type and coerce to non-integer
+    try:
+        params = np.array(params, dtype=object)
+        is_complex = [isinstance(p, complex) for p in params.ravel()]
+        dtype = complex if any(is_complex) else float
+        params = np.array(params, dtype=dtype)
+    except TypeError:
+        raise ValueError('Parameters vector%s includes invalid values.'
+                         % title)
+
+    # Check for NaN, inf
+    if not allow_infnan and (np.any(np.isnan(params)) or
+                             np.any(np.isinf(params))):
+        raise ValueError('Parameters vector%s includes NaN or Inf values.'
+                         % title)
+
+    params = np.atleast_1d(np.squeeze(params))
+
+    # Check for right number of parameters
+    if params.shape != (length,):
+        plural = '' if length == 1 else 's'
+        raise ValueError('Specification%s implies %d parameter%s, but'
+                         ' values with shape %s were provided.'
+                         % (title, length, plural, params.shape))
+
+    return params
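The lag-order and parameter-vector helpers above can be exercised directly; the calls below restate the branches implemented in this hunk, with the values the code as written produces indicated in comments:

    import numpy as np
    from statsmodels.tsa.arima.tools import standardize_lag_order, validate_basic

    standardize_lag_order(3)          # 3: all lags 1..3 included
    standardize_lag_order([1, 2, 3])  # 3: consecutive lag list collapses to int
    standardize_lag_order([1, 3])     # [1, 3]: non-consecutive lags stay a list
    standardize_lag_order([0, 1, 1])  # [2, 3]: treated as a boolean inclusion list

    validate_basic([0.5, 1.0], 2)     # array([0.5, 1.0])
    # validate_basic([np.nan], 1) raises ValueError unless allow_infnan=True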
diff --git a/statsmodels/tsa/arima_model.py b/statsmodels/tsa/arima_model.py
index 079e1dffb..0dd3b2209 100644
--- a/statsmodels/tsa/arima_model.py
+++ b/statsmodels/tsa/arima_model.py
@@ -1,6 +1,7 @@
 """
 See statsmodels.tsa.arima.model.ARIMA and statsmodels.tsa.SARIMAX.
 """
+
 ARIMA_DEPRECATION_ERROR = """
 statsmodels.tsa.arima_model.ARMA and statsmodels.tsa.arima_model.ARIMA have
 been removed in favor of statsmodels.tsa.arima.model.ARIMA (note the .
@@ -61,6 +62,5 @@ class ARMAResults:


 class ARIMAResults(ARMAResults):
-
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
diff --git a/statsmodels/tsa/arima_process.py b/statsmodels/tsa/arima_process.py
index 52c2a0fa6..87589cb06 100644
--- a/statsmodels/tsa/arima_process.py
+++ b/statsmodels/tsa/arima_process.py
@@ -17,21 +17,37 @@ Author: josefpktd
 License: BSD
 """
 import warnings
+
 from statsmodels.compat.pandas import Appender
+
 import numpy as np
 from scipy import linalg, optimize, signal
+
 from statsmodels.tools.docstring import Docstring, remove_parameters
 from statsmodels.tools.validation import array_like
-__all__ = ['arma_acf', 'arma_acovf', 'arma_generate_sample',
-    'arma_impulse_response', 'arma2ar', 'arma2ma', 'deconvolve',
-    'lpol2index', 'index2lpol']
-NONSTATIONARY_ERROR = """The model's autoregressive parameters (ar) indicate that the process
+
+__all__ = [
+    "arma_acf",
+    "arma_acovf",
+    "arma_generate_sample",
+    "arma_impulse_response",
+    "arma2ar",
+    "arma2ma",
+    "deconvolve",
+    "lpol2index",
+    "index2lpol",
+]
+
+
+NONSTATIONARY_ERROR = """\
+The model's autoregressive parameters (ar) indicate that the process
  is non-stationary. arma_acovf can only be used with stationary processes.
 """


-def arma_generate_sample(ar, ma, nsample, scale=1, distrvs=None, axis=0,
-    burnin=0):
+def arma_generate_sample(
+    ar, ma, nsample, scale=1, distrvs=None, axis=0, burnin=0
+):
     """
     Simulate data from an ARMA.

@@ -84,7 +100,23 @@ def arma_generate_sample(ar, ma, nsample, scale=1, distrvs=None, axis=0,
     >>> model.params
     array([ 0.79044189, -0.23140636,  0.70072904,  0.40608028])
     """
-    pass
+    distrvs = np.random.standard_normal if distrvs is None else distrvs
+    if np.ndim(nsample) == 0:
+        nsample = [nsample]
+    if burnin:
+        # handle burnin time for nd arrays
+        # maybe there is a better trick in scipy.fft code
+        newsize = list(nsample)
+        newsize[axis] += burnin
+        newsize = tuple(newsize)
+        fslice = [slice(None)] * len(newsize)
+        fslice[axis] = slice(burnin, None, None)
+        fslice = tuple(fslice)
+    else:
+        newsize = tuple(nsample)
+        fslice = tuple([slice(None)] * np.ndim(newsize))
+    eta = scale * distrvs(size=newsize)
+    return signal.lfilter(ma, ar, eta, axis=axis)[fslice]


 def arma_acovf(ar, ma, nobs=10, sigma2=1, dtype=None):
@@ -117,7 +149,52 @@ def arma_acovf(ar, ma, nobs=10, sigma2=1, dtype=None):
     .. [*] Brockwell, Peter J., and Richard A. Davis. 2009. Time Series:
         Theory and Methods. 2nd ed. 1991. New York, NY: Springer.
     """
-    pass
+    if dtype is None:
+        dtype = np.common_type(np.array(ar), np.array(ma), np.array(sigma2))
+
+    p = len(ar) - 1
+    q = len(ma) - 1
+    m = max(p, q) + 1
+
+    if sigma2.real < 0:
+        raise ValueError("Must have positive innovation variance.")
+
+    # Short-circuit for trivial corner-case
+    if p == q == 0:
+        out = np.zeros(nobs, dtype=dtype)
+        out[0] = sigma2
+        return out
+    elif p > 0 and np.max(np.abs(np.roots(ar))) >= 1:
+        raise ValueError(NONSTATIONARY_ERROR)
+
+    # Get the moving average representation coefficients that we need
+    ma_coeffs = arma2ma(ar, ma, lags=m)
+
+    # Solve for the first m autocovariances via the linear system
+    # described by (BD, eq. 3.3.8)
+    A = np.zeros((m, m), dtype=dtype)
+    b = np.zeros((m, 1), dtype=dtype)
+    # We need a zero-right-padded version of ar params
+    tmp_ar = np.zeros(m, dtype=dtype)
+    tmp_ar[: p + 1] = ar
+    for k in range(m):
+        A[k, : (k + 1)] = tmp_ar[: (k + 1)][::-1]
+        A[k, 1 : m - k] += tmp_ar[(k + 1) : m]
+        b[k] = sigma2 * np.dot(ma[k : q + 1], ma_coeffs[: max((q + 1 - k), 0)])
+    acovf = np.zeros(max(nobs, m), dtype=dtype)
+    try:
+        acovf[:m] = np.linalg.solve(A, b)[:, 0]
+    except np.linalg.LinAlgError:
+        raise ValueError(NONSTATIONARY_ERROR)
+
+    # Iteratively apply (BD, eq. 3.3.9) to solve for remaining autocovariances
+    if nobs > m:
+        zi = signal.lfiltic([1], ar, acovf[:m:][::-1])
+        acovf[m:] = signal.lfilter(
+            [1], ar, np.zeros(nobs - m, dtype=dtype), zi=zi
+        )[0]
+
+    return acovf[:nobs]
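A small sketch tying arma_acovf to arma_acf (defined just below); the lag polynomials include the leading one, and the ARMA(1, 1) coefficients are purely illustrative:

    from statsmodels.tsa.arima_process import arma_acf, arma_acovf

    ar = [1, -0.5]   # (1 - 0.5 L) y_t = (1 + 0.2 L) e_t
    ma = [1, 0.2]
    acovf = arma_acovf(ar, ma, nobs=5, sigma2=1.0)
    acf = arma_acf(ar, ma, lags=5)   # equals acovf / acovf[0]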


 def arma_acf(ar, ma, lags=10):
@@ -144,7 +221,8 @@ def arma_acf(ar, ma, lags=10):
     acf : Sample autocorrelation function estimation.
     acovf : Sample autocovariance function estimation.
     """
-    pass
+    acovf = arma_acovf(ar, ma, lags)
+    return acovf / acovf[0]


 def arma_pacf(ar, ma, lags=10):
@@ -171,7 +249,15 @@ def arma_pacf(ar, ma, lags=10):

     not tested/checked yet
     """
-    pass
+    # TODO: Should use rank 1 inverse update
+    apacf = np.zeros(lags)
+    acov = arma_acf(ar, ma, lags=lags + 1)
+
+    apacf[0] = 1.0
+    for k in range(2, lags + 1):
+        r = acov[:k]
+        apacf[k - 1] = linalg.solve(linalg.toeplitz(r[:-1]), r[1:])[-1]
+    return apacf


 def arma_periodogram(ar, ma, worN=None, whole=0):
@@ -208,7 +294,18 @@ def arma_periodogram(ar, ma, worN=None, whole=0):
     This uses signal.freqz, which does not use fft. There is a fft version
     somewhere.
     """
-    pass
+    w, h = signal.freqz(ma, ar, worN=worN, whole=whole)
+    sd = np.abs(h) ** 2 / np.sqrt(2 * np.pi)
+    if np.any(np.isnan(h)):
+        # this happens with unit root or seasonal unit root
+        import warnings
+
+        warnings.warn(
+            "Warning: nan in frequency response h, maybe a unit " "root",
+            RuntimeWarning,
+            stacklevel=2,
+        )
+    return w, sd


 def arma_impulse_response(ar, ma, leads=100):
@@ -265,7 +362,9 @@ def arma_impulse_response(ar, ma, leads=100):
     array([ 1.        ,  1.3       ,  1.24      ,  0.992     ,  0.7936    ,
             0.63488   ,  0.507904  ,  0.4063232 ,  0.32505856,  0.26004685])
     """
-    pass
+    impulse = np.zeros(leads)
+    impulse[0] = 1.0
+    return signal.lfilter(ma, ar, impulse)


 def arma2ma(ar, ma, lags=100):
@@ -290,7 +389,7 @@ def arma2ma(ar, ma, lags=100):
     -----
     Equivalent to ``arma_impulse_response(ma, ar, leads=100)``
     """
-    pass
+    return arma_impulse_response(ar, ma, leads=lags)


 def arma2ar(ar, ma, lags=100):
@@ -315,10 +414,11 @@ def arma2ar(ar, ma, lags=100):
     -----
     Equivalent to ``arma_impulse_response(ma, ar, leads=100)``
     """
-    pass
+    return arma_impulse_response(ma, ar, leads=lags)


-def ar2arma(ar_des, p, q, n=20, mse='ar', start=None):
+# moved from sandbox.tsa.try_fi
+def ar2arma(ar_des, p, q, n=20, mse="ar", start=None):
     """
     Find arma approximation to ar process.

@@ -358,10 +458,29 @@ def ar2arma(ar_des, p, q, n=20, mse='ar', start=None):
     Extension is possible if we want to match autocovariance instead
     of impulse response function.
     """
-    pass

+    # TODO: convert MA lag polynomial, ma_app, to be invertible, by mirroring
+    # TODO: roots outside the unit interval to ones that are inside. How to do
+    # TODO: this?
+
+    # p,q = pq
+    def msear_err(arma, ar_des):
+        ar, ma = np.r_[1, arma[: p - 1]], np.r_[1, arma[p - 1 :]]
+        ar_approx = arma_impulse_response(ma, ar, n)
+        return ar_des - ar_approx  # ((ar - ar_approx)**2).sum()
+
+    if start is None:
+        arma0 = np.r_[-0.9 * np.ones(p - 1), np.zeros(q - 1)]
+    else:
+        arma0 = start
+    res = optimize.leastsq(msear_err, arma0, ar_des, maxfev=5000)
+    arma_app = np.atleast_1d(res[0])
+    ar_app = np.r_[1, arma_app[: p - 1]]
+    ma_app = np.r_[1, arma_app[p - 1 :]]
+    return ar_app, ma_app, res

-_arma_docs = {'ar': arma2ar.__doc__, 'ma': arma2ma.__doc__}
+
+_arma_docs = {"ar": arma2ar.__doc__, "ma": arma2ma.__doc__}


 def lpol2index(ar):
@@ -380,7 +499,12 @@ def lpol2index(ar):
     index : ndarray
         index (lags) of lag polynomial with non-zero elements
     """
-    pass
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore", np.ComplexWarning)
+        ar = array_like(ar, "ar")
+    index = np.nonzero(ar)[0]
+    coeffs = ar[index]
+    return coeffs, index


 def index2lpol(coeffs, index):
@@ -399,7 +523,10 @@ def index2lpol(coeffs, index):
     ar : array_like
         coefficients of lag polynomial
     """
-    pass
+    n = max(index)
+    ar = np.zeros(n + 1)
+    ar[index] = coeffs
+    return ar


 def lpol_fima(d, n=20):
@@ -419,9 +546,14 @@ def lpol_fima(d, n=20):
     ma : ndarray
         coefficients of lag polynomial
     """
-    pass
+    # hide import inside function until we use this heavily
+    from scipy.special import gammaln
+
+    j = np.arange(n)
+    return np.exp(gammaln(d + j) - gammaln(j + 1) - gammaln(d))


+# moved from sandbox.tsa.try_fi
 def lpol_fiar(d, n=20):
     """AR representation of fractional integration

@@ -443,9 +575,16 @@ def lpol_fiar(d, n=20):
     first coefficient is 1, negative signs except for first term,
     ar(L)*x_t
     """
-    pass
+    # hide import inside function until we use this heavily
+    from scipy.special import gammaln

+    j = np.arange(n)
+    ar = -np.exp(gammaln(-d + j) - gammaln(j + 1) - gammaln(-d))
+    ar[0] = 1
+    return ar

+
+# moved from sandbox.tsa.try_fi
 def lpol_sdiff(s):
     """return coefficients for seasonal difference (1-L^s)

@@ -460,7 +599,7 @@ def lpol_sdiff(s):
     -------
     sdiff : list, length s+1
     """
-    pass
+    return [1] + [0] * (s - 1) + [-1]


 def deconvolve(num, den, n=None):
@@ -493,17 +632,34 @@ def deconvolve(num, den, n=None):
     This is copied from scipy.signal.signaltools and added n as optional
     parameter.
     """
-    pass
+    num = np.atleast_1d(num)
+    den = np.atleast_1d(den)
+    N = len(num)
+    D = len(den)
+    if D > N and n is None:
+        quot = []
+        rem = num
+    else:
+        if n is None:
+            n = N - D + 1
+        input = np.zeros(n, float)
+        input[0] = 1
+        quot = signal.lfilter(num, den, input)
+        num_approx = signal.convolve(den, quot, mode="full")
+        if len(num) < len(num_approx):  # 1d only ?
+            num = np.concatenate((num, np.zeros(len(num_approx) - len(num))))
+        rem = num - num_approx
+    return quot, rem


 _generate_sample_doc = Docstring(arma_generate_sample.__doc__)
-_generate_sample_doc.remove_parameters(['ar', 'ma'])
-_generate_sample_doc.replace_block('Notes', [])
-_generate_sample_doc.replace_block('Examples', [])
+_generate_sample_doc.remove_parameters(["ar", "ma"])
+_generate_sample_doc.replace_block("Notes", [])
+_generate_sample_doc.replace_block("Examples", [])


 class ArmaProcess:
-    """
+    r"""
     Theoretical properties of an ARMA process for specified lag-polynomials.

     Parameters
@@ -530,16 +686,16 @@ class ArmaProcess:

     .. math::

-        y_{t}=\\phi_{1}y_{t-1}+\\ldots+\\phi_{p}y_{t-p}+\\theta_{1}\\epsilon_{t-1}
-               +\\ldots+\\theta_{q}\\epsilon_{t-q}+\\epsilon_{t}
+        y_{t}=\phi_{1}y_{t-1}+\ldots+\phi_{p}y_{t-p}+\theta_{1}\epsilon_{t-1}
+               +\ldots+\theta_{q}\epsilon_{t-q}+\epsilon_{t}

     and the parameterization used in this function uses the lag-polynomial
     representation,

     .. math::

-        \\left(1-\\phi_{1}L-\\ldots-\\phi_{p}L^{p}\\right)y_{t} =
-            \\left(1+\\theta_{1}L+\\ldots+\\theta_{q}L^{q}\\right)\\epsilon_{t}
+        \left(1-\phi_{1}L-\ldots-\phi_{p}L^{p}\right)y_{t} =
+            \left(1+\theta_{1}L+\ldots+\theta_{q}L^{q}\right)\epsilon_{t}

     Examples
     --------
@@ -571,15 +727,16 @@ class ArmaProcess:
     array([1.5-1.32287566j, 1.5+1.32287566j])
     """

+    # TODO: Check unit root behavior
     def __init__(self, ar=None, ma=None, nobs=100):
         if ar is None:
             ar = np.array([1.0])
         if ma is None:
             ma = np.array([1.0])
         with warnings.catch_warnings():
-            warnings.simplefilter('ignore', np.ComplexWarning)
-            self.ar = array_like(ar, 'ar')
-            self.ma = array_like(ma, 'ma')
+            warnings.simplefilter("ignore", np.ComplexWarning)
+            self.ar = array_like(ar, "ar")
+            self.ma = array_like(ma, "ma")
         self.arcoefs = -self.ar[1:]
         self.macoefs = self.ma[1:]
         self.arpoly = np.polynomial.Polynomial(self.ar)
@@ -618,7 +775,21 @@ class ArmaProcess:
         >>> arma_process.isinvertible
         True
         """
-        pass
+        if arroots is not None and len(arroots):
+            arpoly = np.polynomial.polynomial.Polynomial.fromroots(arroots)
+            arcoefs = arpoly.coef[1:] / arpoly.coef[0]
+        else:
+            arcoefs = []
+
+        if maroots is not None and len(maroots):
+            mapoly = np.polynomial.polynomial.Polynomial.fromroots(maroots)
+            macoefs = mapoly.coef[1:] / mapoly.coef[0]
+        else:
+            macoefs = []
+
+        # As from_coeffs will create a polynomial with constant 1/-1,(MA/AR)
+        # we need to scale the polynomial coefficients accordingly
+        return cls(np.r_[1, arcoefs], np.r_[1, macoefs], nobs=nobs)

     @classmethod
     def from_coeffs(cls, arcoefs=None, macoefs=None, nobs=100):
@@ -653,7 +824,13 @@ class ArmaProcess:
         >>> arma_process.isinvertible
         True
         """
-        pass
+        arcoefs = [] if arcoefs is None else arcoefs
+        macoefs = [] if macoefs is None else macoefs
+        return cls(
+            np.r_[1, -np.asarray(arcoefs)],
+            np.r_[1, np.asarray(macoefs)],
+            nobs=nobs,
+        )

     @classmethod
     def from_estimation(cls, model_results, nobs=None):
@@ -677,7 +854,12 @@ class ArmaProcess:
         statsmodels.tsa.arima.model.ARIMA
             The models class used to create the ArmaProcess
         """
-        pass
+        nobs = nobs or model_results.nobs
+        return cls(
+            model_results.polynomial_reduced_ar,
+            model_results.polynomial_reduced_ma,
+            nobs=nobs,
+        )

     def __mul__(self, oth):
         if isinstance(oth, self.__class__):
@@ -691,27 +873,68 @@ class ArmaProcess:
                 ar = (self.arpoly * arpolyoth).coef
                 ma = (self.mapoly * mapolyoth).coef
             except:
-                raise TypeError('Other type is not a valid type')
+                raise TypeError("Other type is not a valid type")
         return self.__class__(ar, ma, nobs=self.nobs)

     def __repr__(self):
-        msg = 'ArmaProcess({0}, {1}, nobs={2}) at {3}'
-        return msg.format(self.ar.tolist(), self.ma.tolist(), self.nobs,
-            hex(id(self)))
+        msg = "ArmaProcess({0}, {1}, nobs={2}) at {3}"
+        return msg.format(
+            self.ar.tolist(), self.ma.tolist(), self.nobs, hex(id(self))
+        )

     def __str__(self):
-        return 'ArmaProcess\nAR: {0}\nMA: {1}'.format(self.ar.tolist(),
-            self.ma.tolist())
+        return "ArmaProcess\nAR: {0}\nMA: {1}".format(
+            self.ar.tolist(), self.ma.tolist()
+        )
+
+    @Appender(remove_parameters(arma_acovf.__doc__, ["ar", "ma", "sigma2"]))
+    def acovf(self, nobs=None):
+        nobs = nobs or self.nobs
+        return arma_acovf(self.ar, self.ma, nobs=nobs)
+
+    @Appender(remove_parameters(arma_acf.__doc__, ["ar", "ma"]))
+    def acf(self, lags=None):
+        lags = lags or self.nobs
+        return arma_acf(self.ar, self.ma, lags=lags)
+
+    @Appender(remove_parameters(arma_pacf.__doc__, ["ar", "ma"]))
+    def pacf(self, lags=None):
+        lags = lags or self.nobs
+        return arma_pacf(self.ar, self.ma, lags=lags)
+
+    @Appender(
+        remove_parameters(
+            arma_periodogram.__doc__, ["ar", "ma", "worN", "whole"]
+        )
+    )
+    def periodogram(self, nobs=None):
+        nobs = nobs or self.nobs
+        return arma_periodogram(self.ar, self.ma, worN=nobs)
+
+    @Appender(remove_parameters(arma_impulse_response.__doc__, ["ar", "ma"]))
+    def impulse_response(self, leads=None):
+        leads = leads or self.nobs
+        return arma_impulse_response(self.ar, self.ma, leads=leads)
+
+    @Appender(remove_parameters(arma2ma.__doc__, ["ar", "ma"]))
+    def arma2ma(self, lags=None):
+        lags = lags or self.nobs
+        return arma2ma(self.ar, self.ma, lags=lags)
+
+    @Appender(remove_parameters(arma2ar.__doc__, ["ar", "ma"]))
+    def arma2ar(self, lags=None):
+        lags = lags or self.nobs
+        return arma2ar(self.ar, self.ma, lags=lags)

     @property
     def arroots(self):
         """Roots of autoregressive lag-polynomial"""
-        pass
+        return self.arpoly.roots()

     @property
     def maroots(self):
         """Roots of moving average lag-polynomial"""
-        pass
+        return self.mapoly.roots()

     @property
     def isstationary(self):
@@ -723,7 +946,10 @@ class ArmaProcess:
         bool
              True if autoregressive roots are outside unit circle.
         """
-        pass
+        if np.all(np.abs(self.arroots) > 1.0):
+            return True
+        else:
+            return False

     @property
     def isinvertible(self):
@@ -735,7 +961,10 @@ class ArmaProcess:
         bool
              True if moving average roots are outside unit circle.
         """
-        pass
+        if np.all(np.abs(self.maroots) > 1):
+            return True
+        else:
+            return False

     def invertroots(self, retnew=False):
         """
@@ -758,4 +987,24 @@ class ArmaProcess:
            If retnew is true, then return a new instance with invertible
            MA-polynomial.
         """
-        pass
+        # TODO: variable returns like this?
+        pr = self.maroots
+        mainv = self.ma
+        invertible = self.isinvertible
+        if not invertible:
+            pr[np.abs(pr) < 1] = 1.0 / pr[np.abs(pr) < 1]
+            pnew = np.polynomial.Polynomial.fromroots(pr)
+            mainv = pnew.coef / pnew.coef[0]
+
+        if retnew:
+            return self.__class__(self.ar, mainv, nobs=self.nobs)
+        else:
+            return mainv, invertible
+
+    @Appender(str(_generate_sample_doc))
+    def generate_sample(
+        self, nsample=100, scale=1.0, distrvs=None, axis=0, burnin=0
+    ):
+        return arma_generate_sample(
+            self.ar, self.ma, nsample, scale, distrvs, axis=axis, burnin=burnin
+        )
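A hedged end-to-end sketch of the ArmaProcess methods filled in above (coefficients chosen only for illustration; from_coeffs takes the AR/MA coefficients without the leading one, with AR signs as in the regression form):

    import numpy as np
    from statsmodels.tsa.arima_process import ArmaProcess

    np.random.seed(12345)
    process = ArmaProcess.from_coeffs(arcoefs=[0.75, -0.25], macoefs=[0.65, 0.35])
    process.isstationary          # True: AR roots lie outside the unit circle
    process.isinvertible          # True: MA roots lie outside the unit circle
    acf = process.acf(lags=10)
    sample = process.generate_sample(nsample=250)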
diff --git a/statsmodels/tsa/arma_mle.py b/statsmodels/tsa/arma_mle.py
index 380149ffe..f1d07c24f 100644
--- a/statsmodels/tsa/arma_mle.py
+++ b/statsmodels/tsa/arma_mle.py
@@ -22,4 +22,5 @@ class Arma:

     def __init__(self, endog, exog=None):
         raise NotImplementedError(
-            'ARMA has been removed. Use SARIMAX, ARIMA or AutoReg')
+            "ARMA has been removed. Use SARIMAX, ARIMA or AutoReg"
+        )
diff --git a/statsmodels/tsa/base/datetools.py b/statsmodels/tsa/base/datetools.py
index d7789773d..dc231667c 100644
--- a/statsmodels/tsa/base/datetools.py
+++ b/statsmodels/tsa/base/datetools.py
@@ -2,31 +2,49 @@
 Tools for working with dates
 """
 from statsmodels.compat.python import asstr, lmap, lrange, lzip
+
 import datetime
 import re
+
 import numpy as np
 from pandas import to_datetime
-_quarter_to_day = {'1': (3, 31), '2': (6, 30), '3': (9, 30), '4': (12, 31),
-    'I': (3, 31), 'II': (6, 30), 'III': (9, 30), 'IV': (12, 31)}
+
+_quarter_to_day = {
+        "1" : (3, 31),
+        "2" : (6, 30),
+        "3" : (9, 30),
+        "4" : (12, 31),
+        "I" : (3, 31),
+        "II" : (6, 30),
+        "III" : (9, 30),
+        "IV" : (12, 31)
+        }
+
+
 _mdays = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
-_months_with_days = lzip(lrange(1, 13), _mdays)
-_month_to_day = dict(zip(map(str, lrange(1, 13)), _months_with_days))
-_month_to_day.update(dict(zip(['I', 'II', 'III', 'IV', 'V', 'VI', 'VII',
-    'VIII', 'IX', 'X', 'XI', 'XII'], _months_with_days)))
-_y_pattern = '^\\d?\\d?\\d?\\d$'
-_q_pattern = """
+_months_with_days = lzip(lrange(1,13), _mdays)
+_month_to_day = dict(zip(map(str,lrange(1,13)), _months_with_days))
+_month_to_day.update(dict(zip(["I", "II", "III", "IV", "V", "VI",
+                               "VII", "VIII", "IX", "X", "XI", "XII"],
+                               _months_with_days)))
+
+# regex patterns
+_y_pattern = r'^\d?\d?\d?\d$'
+
+_q_pattern = r'''
 ^               # beginning of string
-\\d?\\d?\\d?\\d     # match any number 1-9999, includes leading zeros
+\d?\d?\d?\d     # match any number 1-9999, includes leading zeros

 (:?q)           # use q or a : as a separator

 ([1-4]|(I{1,3}V?)) # match 1-4 or I-IV roman numerals

 $               # end of string
-"""
-_m_pattern = """
+'''
+
+_m_pattern = r'''
 ^               # beginning of string
-\\d?\\d?\\d?\\d     # match any number 1-9999, includes leading zeros
+\d?\d?\d?\d     # match any number 1-9999, includes leading zeros

 (:?m)           # use m or a : as a separator

@@ -34,7 +52,13 @@ _m_pattern = """
                                               # I-XII roman numerals

 $               # end of string
-"""
+'''
+
+
+# NOTE: see also ts.extras.isleapyear, which accepts a sequence
+def _is_leap(year):
+    year = int(year)
+    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


 def date_parser(timestr, parserinfo=None, **kwargs):
@@ -44,7 +68,24 @@ def date_parser(timestr, parserinfo=None, **kwargs):
     with q instead of m. It is not case sensitive. The default for annual
     data is the end of the year, which also differs from dateutil.
     """
-    pass
+    flags = re.IGNORECASE | re.VERBOSE
+    if re.search(_q_pattern, timestr, flags):
+        y,q = timestr.replace(":","").lower().split('q')
+        month, day = _quarter_to_day[q.upper()]
+        year = int(y)
+    elif re.search(_m_pattern, timestr, flags):
+        y,m = timestr.replace(":","").lower().split('m')
+        month, day = _month_to_day[m.upper()]
+        year = int(y)
+        if _is_leap(y) and month == 2:
+            day += 1
+    elif re.search(_y_pattern, timestr, flags):
+        month, day = 12, 31
+        year = int(timestr)
+    else:
+        return to_datetime(timestr, **kwargs)
+
+    return datetime.datetime(year, month, day)


 def date_range_str(start, end=None, length=None):
@@ -65,7 +106,46 @@ def date_range_str(start, end=None, length=None):
     date_range : list
         List of strings
     """
-    pass
+    flags = re.IGNORECASE | re.VERBOSE
+
+    start = start.lower()
+    if re.search(_m_pattern, start, flags):
+        annual_freq = 12
+        split = 'm'
+    elif re.search(_q_pattern, start, flags):
+        annual_freq = 4
+        split = 'q'
+    elif re.search(_y_pattern, start, flags):
+        annual_freq = 1
+        start += 'a1' # hack
+        if end:
+            end += 'a1'
+        split = 'a'
+    else:
+        raise ValueError("Date %s not understood" % start)
+    yr1, offset1 = lmap(int, start.replace(":","").split(split))
+    if end is not None:
+        end = end.lower()
+        yr2, offset2 = lmap(int, end.replace(":","").split(split))
+    else:  # length > 0
+        if not length:
+            raise ValueError("length must be provided if end is None")
+        yr2 = yr1 + length // annual_freq
+        offset2 = length % annual_freq + (offset1 - 1)
+    years = [str(yr) for yr in np.repeat(lrange(yr1 + 1, yr2), annual_freq)]
+    # tack on first year
+    years = [(str(yr1))] * (annual_freq + 1 - offset1) + years
+    # tack on last year
+    years = years + [(str(yr2))] * offset2
+    if split != 'a':
+        offset = np.tile(np.arange(1, annual_freq + 1), yr2 - yr1 - 1).astype("a2")
+        offset = np.r_[np.arange(offset1, annual_freq + 1).astype('a2'), offset]
+        offset = np.r_[offset, np.arange(1, offset2 + 1).astype('a2')]
+        date_arr_range = [''.join([i, split, asstr(j)])
+                          for i, j in zip(years, offset)]
+    else:
+        date_arr_range = years
+    return date_arr_range


 def dates_from_str(dates):
@@ -84,7 +164,7 @@ def dates_from_str(dates):
     date_list : ndarray
         A list of datetime types.
     """
-    pass
+    return lmap(date_parser, dates)


 def dates_from_range(start, end=None, length=None):
@@ -113,4 +193,5 @@ def dates_from_range(start, end=None, length=None):
     date_list : ndarray
         A list of datetime types.
     """
-    pass
+    dates = date_range_str(start, end, length)
+    return dates_from_str(dates)
diff --git a/statsmodels/tsa/base/prediction.py b/statsmodels/tsa/base/prediction.py
index c934ba44d..0b46817a9 100644
--- a/statsmodels/tsa/base/prediction.py
+++ b/statsmodels/tsa/base/prediction.py
@@ -24,55 +24,82 @@ class PredictionResults:
         index of ``predicted_mean``
     """

-    def __init__(self, predicted_mean, var_pred_mean, dist=None, df=None,
-        row_labels=None):
+    def __init__(
+        self,
+        predicted_mean,
+        var_pred_mean,
+        dist=None,
+        df=None,
+        row_labels=None,
+    ):
         self._predicted_mean = np.asarray(predicted_mean)
         self._var_pred_mean = np.asarray(var_pred_mean)
         self._df = df
         self._row_labels = row_labels
         if row_labels is None:
-            self._row_labels = getattr(predicted_mean, 'index', None)
+            self._row_labels = getattr(predicted_mean, "index", None)
         self._use_pandas = self._row_labels is not None
-        if dist != 't' and df is not None:
+
+        if dist != "t" and df is not None:
             raise ValueError('df must be None when dist is not "t"')
-        if dist is None or dist == 'norm':
+
+        if dist is None or dist == "norm":
             self.dist = stats.norm
             self.dist_args = ()
-        elif dist == 't':
+        elif dist == "t":
             self.dist = stats.t
-            self.dist_args = self._df,
+            self.dist_args = (self._df,)
         elif isinstance(dist, stats.distributions.rv_frozen):
             self.dist = dist
             self.dist_args = ()
         else:
             raise ValueError('dist must be a None, "norm", "t" or a callable.')

+    def _wrap_pandas(self, value, name=None, columns=None):
+        if not self._use_pandas:
+            return value
+        if value.ndim == 1:
+            return pd.Series(value, index=self._row_labels, name=name)
+        return pd.DataFrame(value, index=self._row_labels, columns=columns)
+
     @property
     def row_labels(self):
         """The row labels used in pandas-types."""
-        pass
+        return self._row_labels

     @property
     def predicted_mean(self):
         """The predicted mean"""
-        pass
+        return self._wrap_pandas(self._predicted_mean, "predicted_mean")

     @property
     def var_pred_mean(self):
         """The variance of the predicted mean"""
-        pass
+        if self._var_pred_mean.ndim > 2:
+            return self._var_pred_mean
+        return self._wrap_pandas(self._var_pred_mean, "var_pred_mean")

     @property
     def se_mean(self):
         """The standard deviation of the predicted mean"""
-        pass
+        ndim = self._var_pred_mean.ndim
+        if ndim == 1:
+            values = np.sqrt(self._var_pred_mean)
+        elif ndim == 3:
+            values = np.sqrt(self._var_pred_mean.T.diagonal())
+        else:
+            raise NotImplementedError("var_pre_mean must be 1 or 3 dim")
+        return self._wrap_pandas(values, "mean_se")

     @property
     def tvalues(self):
         """The ratio of the predicted mean to its standard deviation"""
-        pass
+        val = self.predicted_mean / self.se_mean
+        if isinstance(val, pd.Series):
+            val.name = "tvalues"
+        return val

-    def t_test(self, value=0, alternative='two-sided'):
+    def t_test(self, value=0, alternative="two-sided"):
         """
         z- or t-test for hypothesis that mean is equal to value

@@ -92,7 +119,18 @@ class PredictionResults:
             the attribute of the instance, specified in `__init__`. Default
             if not specified is the normal distribution.
         """
-        pass
+        # assumes symmetric distribution
+        stat = (self.predicted_mean - value) / self.se_mean
+
+        if alternative in ["two-sided", "2-sided", "2s"]:
+            pvalue = self.dist.sf(np.abs(stat), *self.dist_args) * 2
+        elif alternative in ["larger", "l"]:
+            pvalue = self.dist.sf(stat, *self.dist_args)
+        elif alternative in ["smaller", "s"]:
+            pvalue = self.dist.cdf(stat, *self.dist_args)
+        else:
+            raise ValueError("invalid alternative")
+        return stat, pvalue

     def conf_int(self, alpha=0.05):
         """
@@ -112,7 +150,14 @@ class PredictionResults:
             The array has the lower and the upper limit of the prediction
             interval in the columns.
         """
-        pass
+        se = self.se_mean
+        q = self.dist.ppf(1 - alpha / 2.0, *self.dist_args)
+        lower = self.predicted_mean - q * se
+        upper = self.predicted_mean + q * se
+        ci = np.column_stack((lower, upper))
+        if self._use_pandas:
+            return self._wrap_pandas(ci, columns=["lower", "upper"])
+        return ci

     def summary_frame(self, alpha=0.05):
         """
@@ -133,4 +178,12 @@ class PredictionResults:
         Fixes alpha to 0.05 so that the confidence interval should have 95%
         coverage.
         """
-        pass
+        ci_mean = np.asarray(self.conf_int(alpha=alpha))
+        lower, upper = ci_mean[:, 0], ci_mean[:, 1]
+        to_include = {
+            "mean": self.predicted_mean,
+            "mean_se": self.se_mean,
+            "mean_ci_lower": lower,
+            "mean_ci_upper": upper,
+        }
+        return pd.DataFrame(to_include)
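A minimal sketch of the PredictionResults API completed above, constructing the object directly from arrays (in normal use a results object builds it for you; the numbers are placeholders):

    import numpy as np
    from statsmodels.tsa.base.prediction import PredictionResults

    pred = PredictionResults(
        predicted_mean=np.array([1.0, 1.2, 0.9]),
        var_pred_mean=np.array([0.04, 0.05, 0.06]),
    )
    pred.se_mean           # element-wise sqrt of var_pred_mean
    pred.conf_int(0.05)    # 95% normal-based intervals, shape (3, 2)
    pred.summary_frame()   # DataFrame with mean, mean_se and CI columns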
diff --git a/statsmodels/tsa/base/tsa_model.py b/statsmodels/tsa/base/tsa_model.py
index 37e7e0db9..a14628d87 100644
--- a/statsmodels/tsa/base/tsa_model.py
+++ b/statsmodels/tsa/base/tsa_model.py
@@ -1,14 +1,34 @@
 from __future__ import annotations
-from statsmodels.compat.pandas import is_float_index, is_int_index, is_numeric_dtype
+
+from statsmodels.compat.pandas import (
+    is_float_index,
+    is_int_index,
+    is_numeric_dtype,
+)
+
 import numbers
 import warnings
+
 import numpy as np
-from pandas import DatetimeIndex, Index, Period, PeriodIndex, RangeIndex, Series, Timestamp, date_range, period_range, to_datetime
+from pandas import (
+    DatetimeIndex,
+    Index,
+    Period,
+    PeriodIndex,
+    RangeIndex,
+    Series,
+    Timestamp,
+    date_range,
+    period_range,
+    to_datetime,
+)
 from pandas.tseries.frequencies import to_offset
+
 from statsmodels.base.data import PandasData
 import statsmodels.base.model as base
 import statsmodels.base.wrapper as wrap
 from statsmodels.tools.sm_exceptions import ValueWarning
+
 _tsa_doc = """
     %(model)s

@@ -23,7 +43,9 @@ _tsa_doc = """
         'M', 'A', or 'Q'. This is optional if dates are given.
     %(extra_params)s
     %(extra_sections)s"""
-_model_doc = 'Timeseries model base class'
+
+_model_doc = "Timeseries model base class"
+
 _generic_params = base._model_params_doc
 _missing_param_doc = base._missing_param_doc

@@ -58,7 +80,133 @@ def get_index_loc(key, index):
     the index up to and including key, and then returns the location in the
     new index.
     """
-    pass
+    base_index = index
+
+    index = base_index
+    date_index = isinstance(base_index, (PeriodIndex, DatetimeIndex))
+    int_index = is_int_index(base_index)
+    range_index = isinstance(base_index, RangeIndex)
+    index_class = type(base_index)
+    nobs = len(index)
+
+    # Special handling for RangeIndex
+    if range_index and isinstance(key, (int, np.integer)):
+        # Negative indices (that lie in the Index)
+        if key < 0 and -key <= nobs:
+            key = nobs + key
+        # Out-of-sample (note that we include key itself in the new index)
+        elif key > nobs - 1:
+            # See gh#5835. Remove the except once pandas >= 0.25 is required.
+            try:
+                base_index_start = base_index.start
+                base_index_step = base_index.step
+            except AttributeError:
+                base_index_start = base_index._start
+                base_index_step = base_index._step
+            stop = base_index_start + (key + 1) * base_index_step
+            index = RangeIndex(
+                start=base_index_start, stop=stop, step=base_index_step
+            )
+
+    # Special handling for NumericIndex
+    if (
+        not range_index
+        and int_index
+        and not date_index
+        and isinstance(key, (int, np.integer))
+    ):
+        # Negative indices (that lie in the Index)
+        if key < 0 and -key <= nobs:
+            key = nobs + key
+        # Out-of-sample (note that we include key itself in the new index)
+        elif key > base_index[-1]:
+            index = Index(np.arange(base_index[0], int(key + 1)))
+
+    # Special handling for date indexes
+    if date_index:
+        # Use index type to choose creation function
+        if index_class is DatetimeIndex:
+            index_fn = date_range
+        else:
+            index_fn = period_range
+        # Integer key (i.e. already given a location)
+        if isinstance(key, (int, np.integer)):
+            # Negative indices (that lie in the Index)
+            if key < 0 and -key < nobs:
+                key = index[nobs + key]
+            # Out-of-sample (note that we include key itself in the new
+            # index)
+            elif key > len(base_index) - 1:
+                index = index_fn(
+                    start=base_index[0],
+                    periods=int(key + 1),
+                    freq=base_index.freq,
+                )
+                key = index[-1]
+            else:
+                key = index[key]
+        # Other key types (i.e. string date or some datetime-like object)
+        else:
+            # Convert the key to the appropriate date-like object
+            if index_class is PeriodIndex:
+                date_key = Period(key, freq=base_index.freq)
+            else:
+                date_key = Timestamp(key)
+
+            # Out-of-sample
+            if date_key > base_index[-1]:
+                # First create an index that may not always include `key`
+                index = index_fn(
+                    start=base_index[0], end=date_key, freq=base_index.freq
+                )
+
+                # Now make sure we include `key`
+                if not index[-1] == date_key:
+                    index = index_fn(
+                        start=base_index[0],
+                        periods=len(index) + 1,
+                        freq=base_index.freq,
+                    )
+
+                # To avoid possible inconsistencies with `get_loc` below,
+                # set the key directly equal to the last index location
+                key = index[-1]
+
+    # Get the location
+    if date_index:
+        # (note that get_loc will throw a KeyError if key is invalid)
+        loc = index.get_loc(key)
+    elif int_index or range_index:
+        # For NumericIndex and RangeIndex, key is assumed to be the location
+        # and not an index value (this assumption is required to support
+        # RangeIndex)
+        try:
+            index[key]
+        # We want to raise a KeyError in this case, to keep the exception
+        # consistent across index types.
+        # - Attempting to index with an out-of-bound location (e.g.
+        #   index[10] on an index of length 9) will raise an IndexError
+        #   (as of Pandas 0.22)
+        # - Attempting to index with a type that cannot be cast to integer
+        #   (e.g. a non-numeric string) will raise a ValueError if the
+        #   index is RangeIndex (otherwise will raise an IndexError)
+        #   (as of Pandas 0.22)
+        except (IndexError, ValueError) as e:
+            raise KeyError(str(e))
+        loc = key
+    else:
+        loc = index.get_loc(key)
+
+    # Check if we now have a modified index
+    index_was_expanded = index is not base_index
+
+    # Return the index through the end of the loc / slice
+    if isinstance(loc, slice):
+        end = loc.stop - 1
+    else:
+        end = loc
+
+    return loc, index[: end + 1], index_was_expanded


 def get_index_label_loc(key, index, row_labels):
@@ -93,12 +241,58 @@ def get_index_label_loc(key, index, row_labels):
     then falling back to try again with the model row labels as the base
     index.
     """
-    pass
-
-
-def get_prediction_index(start, end, nobs, base_index, index=None, silent=
-    False, index_none=False, index_generated=None, data=None) ->tuple[int,
-    int, int, Index | None]:
+    try:
+        loc, index, index_was_expanded = get_index_loc(key, index)
+    except KeyError as e:
+        try:
+            if not isinstance(key, (int, np.integer)):
+                loc = row_labels.get_loc(key)
+            else:
+                raise
+            # Require scalar
+            # Pandas may return a slice if there are multiple matching
+            # locations that are monotonic increasing (otherwise it may
+            # return an array of integer locations, see below).
+            if isinstance(loc, slice):
+                loc = loc.start
+            if isinstance(loc, np.ndarray):
+                # Pandas may return a mask (boolean array), for e.g.:
+                # pd.Index(list('abcb')).get_loc('b')
+                if loc.dtype == bool:
+                    # Return the first True value
+                    # (we know there is at least one True value if we're
+                    # here because otherwise the get_loc call would have
+                    # raised an exception)
+                    loc = np.argmax(loc)
+                # Finally, Pandas may return an integer array of
+                # locations that match the given value, for e.g.
+                # pd.DatetimeIndex(['2001-02', '2001-01']).get_loc('2001')
+                # (this appears to be slightly undocumented behavior, since
+                # only int, slice, and mask are mentioned in docs for
+                # pandas.Index.get_loc as of 0.23.4)
+                else:
+                    loc = loc[0]
+            if not isinstance(loc, numbers.Integral):
+                raise
+
+            index = row_labels[: loc + 1]
+            index_was_expanded = False
+        except Exception:
+            raise e
+    return loc, index, index_was_expanded
+
+
+def get_prediction_index(
+    start,
+    end,
+    nobs,
+    base_index,
+    index=None,
+    silent=False,
+    index_none=False,
+    index_generated=None,
+    data=None,
+) -> tuple[int, int, int, Index | None]:
     """
     Get the location of a specific key in an index or model row labels

@@ -157,16 +351,125 @@ def get_prediction_index(start, end, nobs, base_index, index=None, silent=
     or to index locations in an ambiguous way (while for `NumericIndex`,
     since we have required them to be full indexes, there is no ambiguity).
     """
-    pass

+    # Convert index keys (start, end) to index locations and get associated
+    # indexes.
+    try:
+        start, _, start_oos = get_index_label_loc(
+            start, base_index, data.row_labels
+        )
+    except KeyError:
+        raise KeyError(
+            "The `start` argument could not be matched to a"
+            " location related to the index of the data."
+        )
+    if end is None:
+        end = max(start, len(base_index) - 1)
+    try:
+        end, end_index, end_oos = get_index_label_loc(
+            end, base_index, data.row_labels
+        )
+    except KeyError:
+        raise KeyError(
+            "The `end` argument could not be matched to a"
+            " location related to the index of the data."
+        )
+
+    # Handle slices (if the given index keys cover more than one date)
+    if isinstance(start, slice):
+        start = start.start
+    if isinstance(end, slice):
+        end = end.stop - 1
+
+    # Get the actual index for the prediction
+    prediction_index = end_index[start:]
+
+    # Validate prediction options
+    if end < start:
+        raise ValueError("Prediction must have `end` after `start`.")
+
+    # Handle custom prediction index
+    # First, if we were given an index, check that it's the right size and
+    # use it if so
+    if index is not None:
+        if not len(prediction_index) == len(index):
+            raise ValueError(
+                "Invalid `index` provided in prediction."
+                " Must have length consistent with `start`"
+                " and `end` arguments."
+            )
+        # But if we weren't given Pandas input, this index will not be
+        # used because the data will not be wrapped; in that case, issue
+        # a warning
+        if not isinstance(data, PandasData) and not silent:
+            warnings.warn(
+                "Because the model data (`endog`, `exog`) were"
+                " not given as Pandas objects, the prediction"
+                " output will be Numpy arrays, and the given"
+                " `index` argument will only be used"
+                " internally.",
+                ValueWarning,
+                stacklevel=2,
+            )
+        prediction_index = Index(index)
+    # Now, if we *do not* have a supported index, but we were given some
+    # kind of index...
+    elif index_generated and not index_none:
+        # If we are in sample, and have row labels, use them
+        if data.row_labels is not None and not (start_oos or end_oos):
+            prediction_index = data.row_labels[start : end + 1]
+        # Otherwise, warn the user that they will get a NumericIndex
+        else:
+            if not silent:
+                warnings.warn(
+                    "No supported index is available."
+                    " Prediction results will be given with"
+                    " an integer index beginning at `start`.",
+                    ValueWarning,
+                    stacklevel=2,
+                )
+            warnings.warn(
+                "No supported index is available. In the next"
+                " version, calling this method in a model"
+                " without a supported index will result in an"
+                " exception.",
+                FutureWarning,
+                stacklevel=2,
+            )
+    elif index_none:
+        prediction_index = None
+
+    # For backwards compatibility, set `predict_*` values
+    if prediction_index is not None:
+        data.predict_start = prediction_index[0]
+        data.predict_end = prediction_index[-1]
+        data.predict_dates = prediction_index
+    else:
+        data.predict_start = None
+        data.predict_end = None
+        data.predict_dates = None
+
+    # Compute out-of-sample observations
+    out_of_sample = max(end - (nobs - 1), 0)
+    end -= out_of_sample
+
+    return start, end, out_of_sample, prediction_index

-class TimeSeriesModel(base.LikelihoodModel):
-    __doc__ = _tsa_doc % {'model': _model_doc, 'params': _generic_params,
-        'extra_params': _missing_param_doc, 'extra_sections': ''}

-    def __init__(self, endog, exog=None, dates=None, freq=None, missing=
-        'none', **kwargs):
+class TimeSeriesModel(base.LikelihoodModel):
+    __doc__ = _tsa_doc % {
+        "model": _model_doc,
+        "params": _generic_params,
+        "extra_params": _missing_param_doc,
+        "extra_sections": "",
+    }
+
+    def __init__(
+        self, endog, exog=None, dates=None, freq=None, missing="none", **kwargs
+    ):
         super().__init__(endog, exog, missing=missing, **kwargs)
+
+        # Date handling in indexes
         self._init_dates(dates, freq)

     def _init_dates(self, dates=None, freq=None):
@@ -208,7 +511,200 @@ class TimeSeriesModel(base.LikelihoodModel):
         must have an underlying supported index, even if it is just a generated
         NumericIndex.
         """
-        pass
+
+        # Get our index from `dates` if available, otherwise from whatever
+        # Pandas index we might have retrieved from endog, exog
+        if dates is not None:
+            index = dates
+        else:
+            index = self.data.row_labels
+
+        # Sanity check that we do not have a `freq` without an index
+        if index is None and freq is not None:
+            raise ValueError("Frequency provided without associated index.")
+
+        # If an index is available, see if it is a date-based index or if it
+        # can be coerced to one. (If it cannot we'll fall back, below, to an
+        # internal, 0, 1, ... nobs-1 integer index for modeling purposes)
+        inferred_freq = False
+        if index is not None:
+            # Try to coerce to date-based index
+            if not isinstance(index, (DatetimeIndex, PeriodIndex)):
+                try:
+                    # Only try to coerce non-numeric index types (string,
+                    # list of date-times, etc.)
+                    # Note that np.asarray(Float64Index([...])) yields an
+                    # object dtype array in earlier versions of Pandas (and so
+                    # will not have is_numeric_dtype == True), so explicitly
+                    # check for it here. But note also that in very early
+                    # Pandas (~0.12), Float64Index does not exist (and so the
+                    # statsmodels compat makes it an empty tuple, so in that
+                    # case also check if the first element is a float.
+                    _index = np.asarray(index)
+                    if (
+                        is_numeric_dtype(_index)
+                        or is_float_index(index)
+                        or (isinstance(_index[0], float))
+                    ):
+                        raise ValueError("Numeric index given")
+                    # If a non-index Pandas series was given, only keep its
+                    # values (because we must have a pd.Index type, below, and
+                    # pd.to_datetime will return a Series when passed
+                    # non-list-like objects)
+                    if isinstance(index, Series):
+                        index = index.values
+                    # All coercion is done via pd.to_datetime
+                    # Note: date coercion via pd.to_datetime does not handle
+                    # string versions of PeriodIndex objects most of the time.
+                    _index = to_datetime(index)
+                    # Older versions of Pandas can sometimes fail here and
+                    # return a numpy array - check to make sure it's an index
+                    if not isinstance(_index, Index):
+                        raise ValueError("Could not coerce to date index")
+                    index = _index
+                except Exception:
+                    # Only want to actually raise an exception if `dates` was
+                    # provided but cannot be coerced. If we got the index from
+                    # the row_labels, we'll just ignore it and use the integer
+                    # index below
+                    if dates is not None:
+                        raise ValueError(
+                            "Non-date index provided to"
+                            " `dates` argument."
+                        )
+            # Now, if we were given, or coerced, a date-based index, make sure
+            # it has an associated frequency
+            if isinstance(index, (DatetimeIndex, PeriodIndex)):
+                # If no frequency, try to get an inferred frequency
+                if freq is None and index.freq is None:
+                    freq = index.inferred_freq
+                    # If we got an inferred frequency, alert the user
+                    if freq is not None:
+                        inferred_freq = True
+                        warnings.warn(
+                            "No frequency information was"
+                            " provided, so inferred frequency %s"
+                            " will be used." % freq,
+                            ValueWarning,
+                            stacklevel=2,
+                        )
+
+                # Convert the passed freq to a pandas offset object
+                if freq is not None:
+                    freq = to_offset(freq)
+
+                # Now, if no frequency information is available from the index
+                # itself or from the `freq` argument, raise an exception
+                if freq is None and index.freq is None:
+                    # But again, only want to raise the exception if `dates`
+                    # was provided.
+                    if dates is not None:
+                        raise ValueError(
+                            "No frequency information was"
+                            " provided with date index and no"
+                            " frequency could be inferred."
+                        )
+                # However, if the index itself has no frequency information but
+                # the `freq` argument is available (or was inferred), construct
+                # a new index with an associated frequency
+                elif freq is not None and index.freq is None:
+                    resampled_index = date_range(
+                        start=index[0], end=index[-1], freq=freq
+                    )
+                    if not inferred_freq and not resampled_index.equals(index):
+                        raise ValueError(
+                            "The given frequency argument could"
+                            " not be matched to the given index."
+                        )
+                    index = resampled_index
+                # Finally, if the index itself has a frequency and there was
+                # also a given frequency, raise an exception if they are not
+                # equal
+                elif (
+                    freq is not None
+                    and not inferred_freq
+                    and not (index.freq == freq)
+                ):
+                    raise ValueError(
+                        "The given frequency argument is"
+                        " incompatible with the given index."
+                    )
+            # Finally, raise an exception if we could not coerce to date-based
+            # but we were given a frequency argument
+            elif freq is not None:
+                raise ValueError(
+                    "Given index could not be coerced to dates"
+                    " but `freq` argument was provided."
+                )
+
+        # Get attributes of the index
+        has_index = index is not None
+        date_index = isinstance(index, (DatetimeIndex, PeriodIndex))
+        period_index = isinstance(index, PeriodIndex)
+        int_index = is_int_index(index)
+        range_index = isinstance(index, RangeIndex)
+        has_freq = index.freq is not None if date_index else None
+        increment = Index(range(self.endog.shape[0]))
+        is_increment = index.equals(increment) if int_index else None
+        if date_index:
+            try:
+                is_monotonic = index.is_monotonic_increasing
+            except AttributeError:
+                # Remove after pandas 1.5 is minimum
+                is_monotonic = index.is_monotonic
+        else:
+            is_monotonic = None
+
+        # Issue warnings for unsupported indexes
+        if has_index and not (date_index or range_index or is_increment):
+            warnings.warn(
+                "An unsupported index was provided and will be"
+                " ignored when e.g. forecasting.",
+                ValueWarning,
+                stacklevel=2,
+            )
+        if date_index and not has_freq:
+            warnings.warn(
+                "A date index has been provided, but it has no"
+                " associated frequency information and so will be"
+                " ignored when e.g. forecasting.",
+                ValueWarning,
+                stacklevel=2,
+            )
+        if date_index and not is_monotonic:
+            warnings.warn(
+                "A date index has been provided, but it is not"
+                " monotonic and so will be ignored when e.g."
+                " forecasting.",
+                ValueWarning,
+                stacklevel=2,
+            )
+
+        # Construct the internal index
+        index_generated = False
+        valid_index = (
+            (date_index and has_freq and is_monotonic)
+            or (int_index and is_increment)
+            or range_index
+        )
+
+        if valid_index:
+            _index = index
+        else:
+            _index = increment
+            index_generated = True
+        self._index = _index
+        self._index_generated = index_generated
+        self._index_none = index is None
+        self._index_int64 = int_index and not range_index and not date_index
+        self._index_dates = date_index and not index_generated
+        self._index_freq = self._index.freq if self._index_dates else None
+        self._index_inferred_freq = inferred_freq
+
+        # For backwards compatibility, set data.dates, data.freq
+        self.data.dates = self._index if self._index_dates else None
+        self.data.freq = self._index.freqstr if self._index_dates else None

     def _get_index_loc(self, key, base_index=None):
         """
@@ -240,7 +736,10 @@ class TimeSeriesModel(base.LikelihoodModel):
         an NumericIndex or a date index, this function extends the index up to
         and including key, and then returns the location in the new index.
         """
-        pass
+
+        if base_index is None:
+            base_index = self._index
+        return get_index_loc(key, base_index)

     def _get_index_label_loc(self, key, base_index=None):
         """
@@ -273,10 +772,11 @@ class TimeSeriesModel(base.LikelihoodModel):
         then falling back to try again with the model row labels as the base
         index.
         """
-        pass
+        if base_index is None:
+            base_index = self._index
+        return get_index_label_loc(key, base_index, self.data.row_labels)

-    def _get_prediction_index(self, start, end, index=None, silent=False
-        ) ->tuple[int, int, int, Index | None]:
+    def _get_prediction_index(self, start, end, index=None, silent=False) -> tuple[int, int, int, Index | None]:
         """
         Get the location of a specific key in an index or model row labels

@@ -332,13 +832,38 @@ class TimeSeriesModel(base.LikelihoodModel):
         or to index locations in an ambiguous way (while for `NumericIndex`,
         since we have required them to be full indexes, there is no ambiguity).
         """
-        pass
-    exog_names = property(_get_exog_names, _set_exog_names, None,
-        'The names of the exogenous variables.')
+        nobs = len(self.endog)
+        return get_prediction_index(
+            start,
+            end,
+            nobs,
+            base_index=self._index,
+            index=index,
+            silent=silent,
+            index_none=self._index_none,
+            index_generated=self._index_generated,
+            data=self.data,
+        )
+
+    def _get_exog_names(self):
+        return self.data.xnames
+
+    def _set_exog_names(self, vals):
+        if not isinstance(vals, list):
+            vals = [vals]
+        self.data.xnames = vals
+
+    # TODO: This is an antipattern, fix/remove with VAR
+    # overwrite with writable property for (V)AR models
+    exog_names = property(
+        _get_exog_names,
+        _set_exog_names,
+        None,
+        "The names of the exogenous variables.",
+    )


 class TimeSeriesModelResults(base.LikelihoodModelResults):
-
     def __init__(self, model, params, normalized_cov_params, scale=1.0):
         self.data = model.data
         super().__init__(model, params, normalized_cov_params, scale)
@@ -346,11 +871,15 @@ class TimeSeriesModelResults(base.LikelihoodModelResults):

 class TimeSeriesResultsWrapper(wrap.ResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(base.LikelihoodResultsWrapper.
-        _wrap_attrs, _attrs)
-    _methods = {'predict': 'dates'}
-    _wrap_methods = wrap.union_dicts(base.LikelihoodResultsWrapper.
-        _wrap_methods, _methods)
-
-
-wrap.populate_wrapper(TimeSeriesResultsWrapper, TimeSeriesModelResults)
+    _wrap_attrs = wrap.union_dicts(
+        base.LikelihoodResultsWrapper._wrap_attrs, _attrs
+    )
+    _methods = {"predict": "dates"}
+    _wrap_methods = wrap.union_dicts(
+        base.LikelihoodResultsWrapper._wrap_methods, _methods
+    )
+
+
+wrap.populate_wrapper(
+    TimeSeriesResultsWrapper, TimeSeriesModelResults  # noqa:E305
+)
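
As a quick illustration (relying on the get_index_loc implementation restored above), asking for an out-of-sample location on a RangeIndex returns the location together with an extended index:

    import pandas as pd
    from statsmodels.tsa.base.tsa_model import get_index_loc

    idx = pd.RangeIndex(0, 5)
    loc, new_index, expanded = get_index_loc(7, idx)
    # loc == 7, new_index covers 0..7, and expanded is True because the
    # five-element base index had to be extended to include the key
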
diff --git a/statsmodels/tsa/coint_tables.py b/statsmodels/tsa/coint_tables.py
index 9ecd71d14..0bf084b6a 100644
--- a/statsmodels/tsa/coint_tables.py
+++ b/statsmodels/tsa/coint_tables.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Created on Thu Aug 30 12:26:38 2012
 Author: Josef Perktold
@@ -36,8 +37,11 @@ function jc =  c_sja(n,p)
 % jlesage@spatial-econometrics.com

 """
+
 import numpy as np
-ss_ejcp0 = """         2.9762  4.1296  6.9406
+
+ss_ejcp0 = '''\
+         2.9762  4.1296  6.9406
          9.4748 11.2246 15.0923
         15.7175 17.7961 22.2519
         21.8370 24.1592 29.0609
@@ -48,8 +52,10 @@ ss_ejcp0 = """         2.9762  4.1296  6.9406
         51.8528 54.9629 61.3449
         57.7954 61.0404 67.6415
         63.7248 67.0756 73.8856
-        69.6513 73.0946 80.0937"""
-ss_ejcp1 = """         2.7055   3.8415   6.6349
+        69.6513 73.0946 80.0937'''
+
+ss_ejcp1 = '''\
+         2.7055   3.8415   6.6349
         12.2971  14.2639  18.5200
         18.8928  21.1314  25.8650
         25.1236  27.5858  32.7172
@@ -60,8 +66,10 @@ ss_ejcp1 = """         2.7055   3.8415   6.6349
         55.2412  58.4332  64.9960
         61.2041  64.5040  71.2525
         67.1307  70.5392  77.4877
-        73.0563  76.5734  83.7105"""
-ss_ejcp2 = """         2.7055   3.8415   6.6349
+        73.0563  76.5734  83.7105'''
+
+ss_ejcp2 = '''\
+         2.7055   3.8415   6.6349
         15.0006  17.1481  21.7465
         21.8731  24.2522  29.2631
         28.2398  30.8151  36.1930
@@ -72,11 +80,29 @@ ss_ejcp2 = """         2.7055   3.8415   6.6349
         58.5316  61.8051  68.5030
         64.5292  67.9040  74.7434
         70.4630  73.9355  81.0678
-        76.4081  79.9878  87.2395"""
-ejcp0 = np.array(ss_ejcp0.split(), float).reshape(-1, 3)
-ejcp1 = np.array(ss_ejcp1.split(), float).reshape(-1, 3)
-ejcp2 = np.array(ss_ejcp2.split(), float).reshape(-1, 3)
-"""
+        76.4081  79.9878  87.2395'''
+
+ejcp0 = np.array(ss_ejcp0.split(), float).reshape(-1, 3)
+ejcp1 = np.array(ss_ejcp1.split(), float).reshape(-1, 3)
+ejcp2 = np.array(ss_ejcp2.split(), float).reshape(-1, 3)
+
+
+def c_sja(n, p):
+    if p > 1 or p < -1:
+        jc = np.full(3, np.nan)
+    elif n > 12 or n < 1:
+        jc = np.full(3, np.nan)
+    elif p == -1:
+        jc = ejcp0[n - 1, :]
+    elif p == 0:
+        jc = ejcp1[n - 1, :]
+    elif p == 1:
+        jc = ejcp2[n - 1, :]
+    else:
+        raise ValueError('invalid p')
+
+    return jc
+
+
+'''
 function jc = c_sjt(n,p)
 % PURPOSE: find critical values for Johansen trace statistic
 % ------------------------------------------------------------
@@ -124,8 +150,11 @@ function jc = c_sjt(n,p)
 %       248.77 256.23 270.47
 %       293.83 301.95 318.14];
 %
-"""
-ss_tjcp0 = """         2.9762   4.1296   6.9406
+'''
+
+
+ss_tjcp0 = '''\
+         2.9762   4.1296   6.9406
         10.4741  12.3212  16.3640
         21.7781  24.2761  29.5147
         37.0339  40.1749  46.5716
@@ -136,8 +165,11 @@ ss_tjcp0 = """         2.9762   4.1296   6.9406
        173.2292 179.5199 191.8122
        212.4721 219.4051 232.8291
        255.6732 263.2603 277.9962
-       302.9054 311.1288 326.9716"""
-ss_tjcp1 = """          2.7055   3.8415   6.6349
+       302.9054 311.1288 326.9716'''
+
+
+ss_tjcp1 = '''\
+          2.7055   3.8415   6.6349
          13.4294  15.4943  19.9349
          27.0669  29.7961  35.4628
          44.4929  47.8545  54.6815
@@ -148,8 +180,10 @@ ss_tjcp1 = """          2.7055   3.8415   6.6349
         190.8714 197.3772 210.0366
         232.1030 239.2468 253.2526
         277.3740 285.1402 300.2821
-        326.5354 334.9795 351.2150"""
-ss_tjcp2 = """           2.7055   3.8415   6.6349
+        326.5354 334.9795 351.2150'''
+
+ss_tjcp2 = '''\
+           2.7055   3.8415   6.6349
           16.1619  18.3985  23.1485
           32.0645  35.0116  41.0815
           51.6492  55.2459  62.5202
@@ -160,10 +194,30 @@ ss_tjcp2 = """           2.7055   3.8415   6.6349
          208.3582 215.1268 228.2226
          251.6293 259.0267 273.3838
          298.8836 306.8988 322.4264
-         350.1125 358.7190 375.3203"""
-tjcp0 = np.array(ss_tjcp0.split(), float).reshape(-1, 3)
-tjcp1 = np.array(ss_tjcp1.split(), float).reshape(-1, 3)
-tjcp2 = np.array(ss_tjcp2.split(), float).reshape(-1, 3)
+         350.1125 358.7190 375.3203'''
+
+tjcp0 = np.array(ss_tjcp0.split(), float).reshape(-1, 3)
+tjcp1 = np.array(ss_tjcp1.split(), float).reshape(-1, 3)
+tjcp2 = np.array(ss_tjcp2.split(), float).reshape(-1, 3)
+
+
+def c_sjt(n, p):
+    if p > 1 or p < -1:
+        jc = np.full(3, np.nan)
+    elif n > 12 or n < 1:
+        jc = np.full(3, np.nan)
+    elif p == -1:
+        jc = tjcp0[n - 1, :]
+    elif p == 0:
+        jc = tjcp1[n - 1, :]
+    elif p == 1:
+        jc = tjcp2[n - 1, :]
+    else:
+        raise ValueError('invalid p')
+
+    return jc
+
+
 if __name__ == '__main__':
     for p in range(-2, 3, 1):
         for n in range(12):
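
For example (values read off the tables above), the restored lookup functions return the three critical values for a given system dimension n and deterministic-term case p:

    from statsmodels.tsa.coint_tables import c_sja, c_sjt

    print(c_sja(1, -1))   # [2.9762  4.1296  6.9406], first row of ejcp0
    print(c_sjt(1, -1))   # [2.9762  4.1296  6.9406], first row of tjcp0
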
diff --git a/statsmodels/tsa/descriptivestats.py b/statsmodels/tsa/descriptivestats.py
index 08f30e4ea..fc2b33cd3 100644
--- a/statsmodels/tsa/descriptivestats.py
+++ b/statsmodels/tsa/descriptivestats.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Descriptive Statistics for Time Series

 Created on Sat Oct 30 14:24:08 2010
@@ -5,16 +6,75 @@ Created on Sat Oct 30 14:24:08 2010
 Author: josef-pktd
 License: BSD(3clause)
 """
+
 import numpy as np
 from . import stattools as stt


+# TODO: check subclassing for descriptive stats classes
 class TsaDescriptive:
-    """collection of descriptive statistical methods for time series
+    '''collection of descriptive statistical methods for time series

-    """
+    '''

     def __init__(self, data, label=None, name=''):
         self.data = data
         self.label = label
         self.name = name
+
+    def filter(self, num, den):
+        from scipy.signal import lfilter
+        xfiltered = lfilter(num, den, self.data)
+        return self.__class__(xfiltered, self.label, self.name + '_filtered')
+
+    def detrend(self, order=1):
+        from . import tsatools
+        xdetrended = tsatools.detrend(self.data, order=order)
+        return self.__class__(xdetrended, self.label, self.name + '_detrended')
+
+    def fit(self, order=(1,0,1), **kwds):
+        from .arima_model import ARMA
+        self.mod = ARMA(self.data)
+        self.res = self.mod.fit(order=order, **kwds)
+        #self.estimated_process =
+        return self.res
+
+    def acf(self, nlags=40):
+        return stt.acf(self.data, nlags=nlags)
+
+    def pacf(self, nlags=40):
+        return stt.pacf(self.data, nlags=nlags)
+
+    def periodogram(self):
+        # does not return frequencies
+        return stt.periodogram(self.data)
+
+    # copied from fftarma.py
+    def plot4(self, fig=None, nobs=100, nacf=20, nfreq=100):
+        data = self.data
+        acf = self.acf(nacf)
+        pacf = self.pacf(nacf)
+        w = np.linspace(0, np.pi, nfreq, endpoint=False)
+        spdr = self.periodogram()[:nfreq] #(w)
+
+        if fig is None:
+            import matplotlib.pyplot as plt
+            fig = plt.figure()
+        ax = fig.add_subplot(2,2,1)
+        namestr = ' for %s' % self.name if self.name else ''
+        ax.plot(data)
+        ax.set_title('Time series' + namestr)
+
+        ax = fig.add_subplot(2,2,2)
+        ax.plot(acf)
+        ax.set_title('Autocorrelation' + namestr)
+
+        ax = fig.add_subplot(2,2,3)
+        ax.plot(spdr) # (wr, spdr)
+        ax.set_title('Power Spectrum' + namestr)
+
+        ax = fig.add_subplot(2,2,4)
+        ax.plot(pacf)
+        ax.set_title('Partial Autocorrelation' + namestr)
+
+        return fig
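
A small usage sketch (not part of the patch) of the restored TsaDescriptive helper; note that its fit method still targets the legacy ARMA class, so only the descriptive pieces are exercised here:

    import numpy as np
    from statsmodels.tsa.descriptivestats import TsaDescriptive

    x = np.random.default_rng(0).standard_normal(200)
    desc = TsaDescriptive(x, name="white noise")
    print(desc.acf(nlags=5))    # autocorrelations at lags 0..5
    print(desc.pacf(nlags=5))   # partial autocorrelations at lags 0..5
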
diff --git a/statsmodels/tsa/deterministic.py b/statsmodels/tsa/deterministic.py
index e13faa123..694473031 100644
--- a/statsmodels/tsa/deterministic.py
+++ b/statsmodels/tsa/deterministic.py
@@ -1,30 +1,50 @@
-from statsmodels.compat.pandas import PD_LT_2_2_0, Appender, is_int_index, to_numpy
+from statsmodels.compat.pandas import (
+    PD_LT_2_2_0,
+    Appender,
+    is_int_index,
+    to_numpy,
+)
+
 from abc import ABC, abstractmethod
 import datetime as dt
 from typing import Hashable, List, Optional, Sequence, Set, Tuple, Type, Union
+
 import numpy as np
 import pandas as pd
 from scipy.linalg import qr
+
 from statsmodels.iolib.summary import d_or_f
-from statsmodels.tools.validation import bool_like, float_like, required_int_like, string_like
+from statsmodels.tools.validation import (
+    bool_like,
+    float_like,
+    required_int_like,
+    string_like,
+)
 from statsmodels.tsa.tsatools import freq_to_period
+
 DateLike = Union[dt.datetime, pd.Timestamp, np.datetime64]
 IntLike = Union[int, np.integer]
-START_BEFORE_INDEX_ERR = """start is less than the first observation in the index. Values can only be created for observations after the start of the index.
+
+
+START_BEFORE_INDEX_ERR = """\
+start is less than the first observation in the index. Values can only be \
+created for observations after the start of the index.
 """


 class DeterministicTerm(ABC):
     """Abstract Base Class for all Deterministic Terms"""
+
+    # Set _is_dummy if the term is a dummy variable process
     _is_dummy = False

     @property
-    def is_dummy(self) ->bool:
+    def is_dummy(self) -> bool:
         """Flag indicating whether the values produced are dummy variables"""
-        pass
+        return self._is_dummy

     @abstractmethod
-    def in_sample(self, index: Sequence[Hashable]) ->pd.DataFrame:
+    def in_sample(self, index: Sequence[Hashable]) -> pd.DataFrame:
         """
         Produce deterministic trends for in-sample fitting.

@@ -39,11 +59,14 @@ class DeterministicTerm(ABC):
         DataFrame
             A DataFrame containing the deterministic terms.
         """
-        pass

     @abstractmethod
-    def out_of_sample(self, steps: int, index: Sequence[Hashable],
-        forecast_index: Optional[Sequence[Hashable]]=None) ->pd.DataFrame:
+    def out_of_sample(
+        self,
+        steps: int,
+        index: Sequence[Hashable],
+        forecast_index: Optional[Sequence[Hashable]] = None,
+    ) -> pd.DataFrame:
         """
         Produce deterministic trends for out-of-sample forecasts

@@ -63,38 +86,90 @@ class DeterministicTerm(ABC):
         DataFrame
             A DataFrame containing the deterministic terms.
         """
-        pass

     @abstractmethod
-    def __str__(self) ->str:
+    def __str__(self) -> str:
         """A meaningful string representation of the term"""

-    def __hash__(self) ->int:
+    def __hash__(self) -> int:
         name: Tuple[Hashable, ...] = (type(self).__name__,)
         return hash(name + self._eq_attr)

     @property
     @abstractmethod
-    def _eq_attr(self) ->Tuple[Hashable, ...]:
+    def _eq_attr(self) -> Tuple[Hashable, ...]:
         """tuple of attributes that are used for equality comparison"""
-        pass

     @staticmethod
-    def _extend_index(index: pd.Index, steps: int, forecast_index: Optional
-        [Sequence[Hashable]]=None) ->pd.Index:
-        """Extend the forecast index"""
-        pass
-
-    def __repr__(self) ->str:
-        return self.__str__() + f' at 0x{id(self):0x}'
+    def _index_like(index: Sequence[Hashable]) -> pd.Index:
+        if isinstance(index, pd.Index):
+            return index
+        try:
+            return pd.Index(index)
+        except Exception:
+            raise TypeError("index must be a pandas Index or index-like")

-    def __eq__(self, other: object) ->bool:
+    @staticmethod
+    def _extend_index(
+        index: pd.Index,
+        steps: int,
+        forecast_index: Optional[Sequence[Hashable]] = None,
+    ) -> pd.Index:
+        """Extend the forecast index"""
+        if forecast_index is not None:
+            forecast_index = DeterministicTerm._index_like(forecast_index)
+            assert isinstance(forecast_index, pd.Index)
+            if forecast_index.shape[0] != steps:
+                raise ValueError(
+                    "The number of values in forecast_index "
+                    f"({forecast_index.shape[0]}) must match steps ({steps})."
+                )
+            return forecast_index
+        if isinstance(index, pd.PeriodIndex):
+            return pd.period_range(
+                index[-1] + 1, periods=steps, freq=index.freq
+            )
+        elif isinstance(index, pd.DatetimeIndex) and index.freq is not None:
+            next_obs = pd.date_range(index[-1], freq=index.freq, periods=2)[1]
+            return pd.date_range(next_obs, freq=index.freq, periods=steps)
+        elif isinstance(index, pd.RangeIndex):
+            assert isinstance(index, pd.RangeIndex)
+            try:
+                step = index.step
+                start = index.stop
+            except AttributeError:
+                # TODO: Remove after pandas min ver is 1.0.0+
+                step = index[-1] - index[-2] if len(index) > 1 else 1
+                start = index[-1] + step
+            stop = start + step * steps
+            return pd.RangeIndex(start, stop, step=step)
+        elif is_int_index(index) and np.all(np.diff(index) == 1):
+            idx_arr = np.arange(index[-1] + 1, index[-1] + steps + 1)
+            return pd.Index(idx_arr)
+        # default range index
+        import warnings
+
+        warnings.warn(
+            "Only PeriodIndexes, DatetimeIndexes with a frequency set, "
+            "RangeIndexes, and Indexes with a unit increment support "
+            "extending. The returned index will contain positions relative "
+            "to the length of the data.",
+            UserWarning,
+            stacklevel=2,
+        )
+        nobs = index.shape[0]
+        return pd.RangeIndex(nobs + 1, nobs + steps + 1)
+
+    def __repr__(self) -> str:
+        return self.__str__() + f" at 0x{id(self):0x}"
+
+    def __eq__(self, other: object) -> bool:
         if isinstance(other, type(self)):
             own_attr = self._eq_attr
             oth_attr = other._eq_attr
             if len(own_attr) != len(oth_attr):
                 return False
-            return all([(a == b) for a, b in zip(own_attr, oth_attr)])
+            return all([a == b for a, b in zip(own_attr, oth_attr)])
         else:
             return False

@@ -102,30 +177,51 @@ class DeterministicTerm(ABC):
 class TimeTrendDeterministicTerm(DeterministicTerm, ABC):
     """Abstract Base Class for all Time Trend Deterministic Terms"""

-    def __init__(self, constant: bool=True, order: int=0) ->None:
-        self._constant = bool_like(constant, 'constant')
-        self._order = required_int_like(order, 'order')
+    def __init__(self, constant: bool = True, order: int = 0) -> None:
+        self._constant = bool_like(constant, "constant")
+        self._order = required_int_like(order, "order")

     @property
-    def constant(self) ->bool:
+    def constant(self) -> bool:
         """Flag indicating that a constant is included"""
-        pass
+        return self._constant

     @property
-    def order(self) ->int:
+    def order(self) -> int:
         """Order of the time trend"""
-        pass
+        return self._order

-    def __str__(self) ->str:
+    @property
+    def _columns(self) -> List[str]:
+        columns = []
+        trend_names = {1: "trend", 2: "trend_squared", 3: "trend_cubed"}
+        if self._constant:
+            columns.append("const")
+        for power in range(1, self._order + 1):
+            if power in trend_names:
+                columns.append(trend_names[power])
+            else:
+                columns.append(f"trend**{power}")
+        return columns
+
+    def _get_terms(self, locs: np.ndarray) -> np.ndarray:
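+        # one column per term: locs**0 for the constant (when present),
+        # then locs**1, ..., locs**order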
+        nterms = int(self._constant) + self._order
+        terms = np.tile(locs, (1, nterms))
+        power = np.zeros((1, nterms), dtype=int)
+        power[0, int(self._constant) :] = np.arange(1, self._order + 1)
+        terms **= power
+        return terms
+
+    def __str__(self) -> str:
         terms = []
         if self._constant:
-            terms.append('Constant')
+            terms.append("Constant")
         if self._order:
-            terms.append(f'Powers 1 to {self._order + 1}')
+            terms.append(f"Powers 1 to {self._order + 1}")
         if not terms:
-            terms = ['Empty']
-        terms_str = ','.join(terms)
-        return f'TimeTrend({terms_str})'
+            terms = ["Empty"]
+        terms_str = ",".join(terms)
+        return f"TimeTrend({terms_str})"


 class TimeTrend(TimeTrendDeterministicTerm):
@@ -155,11 +251,11 @@ class TimeTrend(TimeTrendDeterministicTerm):
     >>> trend_gen.in_sample(data.index)
     """

-    def __init__(self, constant: bool=True, order: int=0) ->None:
+    def __init__(self, constant: bool = True, order: int = 0) -> None:
         super().__init__(constant, order)

     @classmethod
-    def from_string(cls, trend: str) ->'TimeTrend':
+    def from_string(cls, trend: str) -> "TimeTrend":
         """
         Create a TimeTrend from a string description.

@@ -181,7 +277,41 @@ class TimeTrend(TimeTrendDeterministicTerm):
         TimeTrend
             The TimeTrend instance.
         """
-        pass
+        constant = trend.startswith("c")
+        order = 0
+        if "tt" in trend:
+            order = 2
+        elif "t" in trend:
+            order = 1
+        return cls(constant=constant, order=order)
+
+    @Appender(DeterministicTerm.in_sample.__doc__)
+    def in_sample(
+        self, index: Union[Sequence[Hashable], pd.Index]
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        nobs = index.shape[0]
+        locs = np.arange(1, nobs + 1, dtype=np.double)[:, None]
+        terms = self._get_terms(locs)
+        return pd.DataFrame(terms, columns=self._columns, index=index)
+
+    @Appender(DeterministicTerm.out_of_sample.__doc__)
+    def out_of_sample(
+        self,
+        steps: int,
+        index: Union[Sequence[Hashable], pd.Index],
+        forecast_index: Optional[Sequence[Hashable]] = None,
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        nobs = index.shape[0]
+        fcast_index = self._extend_index(index, steps, forecast_index)
+        locs = np.arange(nobs + 1, nobs + steps + 1, dtype=np.double)[:, None]
+        terms = self._get_terms(locs)
+        return pd.DataFrame(terms, columns=self._columns, index=fcast_index)
+
+    @property
+    def _eq_attr(self) -> Tuple[Hashable, ...]:
+        return self._constant, self._order


 class Seasonality(DeterministicTerm):
@@ -218,30 +348,33 @@ class Seasonality(DeterministicTerm):
     >>> seas_gen = Seasonality(11, initial_period=4)
     >>> seas_gen.in_sample(data.index)
     """
+
     _is_dummy = True

-    def __init__(self, period: int, initial_period: int=1) ->None:
-        self._period = required_int_like(period, 'period')
-        self._initial_period = required_int_like(initial_period,
-            'initial_period')
+    def __init__(self, period: int, initial_period: int = 1) -> None:
+        self._period = required_int_like(period, "period")
+        self._initial_period = required_int_like(
+            initial_period, "initial_period"
+        )
         if period < 2:
-            raise ValueError('period must be >= 2')
+            raise ValueError("period must be >= 2")
         if not 1 <= self._initial_period <= period:
-            raise ValueError('initial_period must be in {1, 2, ..., period}')
+            raise ValueError("initial_period must be in {1, 2, ..., period}")

     @property
-    def period(self) ->int:
+    def period(self) -> int:
         """The period of the seasonality"""
-        pass
+        return self._period

     @property
-    def initial_period(self) ->int:
+    def initial_period(self) -> int:
         """The seasonal index of the first observation"""
-        pass
+        return self._initial_period

     @classmethod
-    def from_index(cls, index: Union[Sequence[Hashable], pd.DatetimeIndex,
-        pd.PeriodIndex]) ->'Seasonality':
+    def from_index(
+        cls, index: Union[Sequence[Hashable], pd.DatetimeIndex, pd.PeriodIndex]
+    ) -> "Seasonality":
         """
         Construct a seasonality directly from an index using its frequency.

@@ -255,26 +388,88 @@ class Seasonality(DeterministicTerm):
         Seasonality
             The initialized Seasonality instance.
         """
-        pass
+        index = cls._index_like(index)
+        if isinstance(index, pd.PeriodIndex):
+            freq = index.freq
+        elif isinstance(index, pd.DatetimeIndex):
+            freq = index.freq if index.freq else index.inferred_freq
+        else:
+            raise TypeError("index must be a DatetimeIndex or PeriodIndex")
+        if freq is None:
+            raise ValueError("index must have a freq or inferred_freq set")
+        period = freq_to_period(freq)
+        return cls(period=period)
+
+    @property
+    def _eq_attr(self) -> Tuple[Hashable, ...]:
+        return self._period, self._initial_period
+
+    def __str__(self) -> str:
+        return f"Seasonality(period={self._period})"

-    def __str__(self) ->str:
-        return f'Seasonality(period={self._period})'
+    @property
+    def _columns(self) -> List[str]:
+        period = self._period
+        columns = []
+        for i in range(1, period + 1):
+            columns.append(f"s({i},{period})")
+        return columns
+
+    @Appender(DeterministicTerm.in_sample.__doc__)
+    def in_sample(
+        self, index: Union[Sequence[Hashable], pd.Index]
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        nobs = index.shape[0]
+        period = self._period
+        term = np.zeros((nobs, period))
+        offset = self._initial_period - 1
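+        # observation t gets a 1 in column (t + initial_period - 1) % period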
+        for i in range(period):
+            col = (i + offset) % period
+            term[i::period, col] = 1
+        return pd.DataFrame(term, columns=self._columns, index=index)
+
+    @Appender(DeterministicTerm.out_of_sample.__doc__)
+    def out_of_sample(
+        self,
+        steps: int,
+        index: Union[Sequence[Hashable], pd.Index],
+        forecast_index: Optional[Sequence[Hashable]] = None,
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        fcast_index = self._extend_index(index, steps, forecast_index)
+        nobs = index.shape[0]
+        period = self._period
+        term = np.zeros((steps, period))
+        offset = self._initial_period - 1
+        for i in range(period):
+            col_loc = (nobs + offset + i) % period
+            term[i::period, col_loc] = 1
+        return pd.DataFrame(term, columns=self._columns, index=fcast_index)


 class FourierDeterministicTerm(DeterministicTerm, ABC):
     """Abstract Base Class for all Fourier Deterministic Terms"""

-    def __init__(self, order: int) ->None:
-        self._order = required_int_like(order, 'terms')
+    def __init__(self, order: int) -> None:
+        self._order = required_int_like(order, "terms")

     @property
-    def order(self) ->int:
+    def order(self) -> int:
         """The order of the Fourier terms included"""
-        pass
+        return self._order
+
+    def _get_terms(self, locs: np.ndarray) -> np.ndarray:
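+        # columns alternate sin/cos pairs for harmonics 1..order of 2*pi*locs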
+        locs = 2 * np.pi * locs.astype(np.double)
+        terms = np.empty((locs.shape[0], 2 * self._order))
+        for i in range(self._order):
+            for j, func in enumerate((np.sin, np.cos)):
+                terms[:, 2 * i + j] = func((i + 1) * locs)
+        return terms


 class Fourier(FourierDeterministicTerm):
-    """
+    r"""
     Fourier series deterministic terms

     Parameters
@@ -297,8 +492,8 @@ class Fourier(FourierDeterministicTerm):

     .. math::

-       f_{i,s,t} & = \\sin\\left(2 \\pi i \\times \\frac{t}{m} \\right)  \\\\
-       f_{i,c,t} & = \\cos\\left(2 \\pi i \\times \\frac{t}{m} \\right)
+       f_{i,s,t} & = \sin\left(2 \pi i \times \frac{t}{m} \right)  \\
+       f_{i,c,t} & = \cos\left(2 \pi i \times \frac{t}{m} \right)

     where m is the length of the period.

@@ -316,37 +511,109 @@ class Fourier(FourierDeterministicTerm):

     def __init__(self, period: float, order: int):
         super().__init__(order)
-        self._period = float_like(period, 'period')
+        self._period = float_like(period, "period")
         if 2 * self._order > self._period:
-            raise ValueError('2 * order must be <= period')
+            raise ValueError("2 * order must be <= period")

     @property
-    def period(self) ->float:
+    def period(self) -> float:
         """The period of the Fourier terms"""
-        pass
+        return self._period
+
+    @property
+    def _columns(self) -> List[str]:
+        period = self._period
+        fmt_period = d_or_f(period).strip()
+        columns = []
+        for i in range(1, self._order + 1):
+            for typ in ("sin", "cos"):
+                columns.append(f"{typ}({i},{fmt_period})")
+        return columns
+
+    @Appender(DeterministicTerm.in_sample.__doc__)
+    def in_sample(
+        self, index: Union[Sequence[Hashable], pd.Index]
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        nobs = index.shape[0]
+        terms = self._get_terms(np.arange(nobs) / self._period)
+        return pd.DataFrame(terms, index=index, columns=self._columns)
+
+    @Appender(DeterministicTerm.out_of_sample.__doc__)
+    def out_of_sample(
+        self,
+        steps: int,
+        index: Union[Sequence[Hashable], pd.Index],
+        forecast_index: Optional[Sequence[Hashable]] = None,
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        fcast_index = self._extend_index(index, steps, forecast_index)
+        nobs = index.shape[0]
+        terms = self._get_terms(np.arange(nobs, nobs + steps) / self._period)
+        return pd.DataFrame(terms, index=fcast_index, columns=self._columns)
+
+    @property
+    def _eq_attr(self) -> Tuple[Hashable, ...]:
+        return self._period, self._order

-    def __str__(self) ->str:
-        return f'Fourier(period={self._period}, order={self._order})'
+    def __str__(self) -> str:
+        return f"Fourier(period={self._period}, order={self._order})"


 class CalendarDeterministicTerm(DeterministicTerm, ABC):
     """Abstract Base Class for calendar deterministic terms"""

-    def __init__(self, freq: str) ->None:
+    def __init__(self, freq: str) -> None:
         try:
-            index = pd.date_range('2020-01-01', freq=freq, periods=1)
+            index = pd.date_range("2020-01-01", freq=freq, periods=1)
             self._freq = index.freq
         except ValueError:
-            raise ValueError('freq is not understood by pandas')
+            raise ValueError("freq is not understood by pandas")

     @property
-    def freq(self) ->str:
+    def freq(self) -> str:
         """The frequency of the deterministic terms"""
-        pass
+        return self._freq.freqstr
+
+    def _compute_ratio(
+        self, index: Union[pd.DatetimeIndex, pd.PeriodIndex]
+    ) -> np.ndarray:
+        if isinstance(index, pd.PeriodIndex):
+            index = index.to_timestamp()
+        delta = index - index.to_period(self._freq).to_timestamp()
+        pi = index.to_period(self._freq)
+        gap = (pi + 1).to_timestamp() - pi.to_timestamp()
+        return to_numpy(delta) / to_numpy(gap)
+
+    def _check_index_type(
+        self,
+        index: pd.Index,
+        allowed: Union[Type, Tuple[Type, ...]] = (
+            pd.DatetimeIndex,
+            pd.PeriodIndex,
+        ),
+    ) -> Union[pd.DatetimeIndex, pd.PeriodIndex]:
+        if isinstance(allowed, type):
+            allowed = (allowed,)
+        if not isinstance(index, allowed):
+            if len(allowed) == 1:
+                allowed_types = "a " + allowed[0].__name__
+            else:
+                allowed_types = ", ".join(a.__name__ for a in allowed[:-1])
+                if len(allowed) > 2:
+                    allowed_types += ","
+                allowed_types += " and " + allowed[-1].__name__
+            msg = (
+                f"{type(self).__name__} terms can only be computed from "
+                f"{allowed_types}"
+            )
+            raise TypeError(msg)
+        assert isinstance(index, (pd.DatetimeIndex, pd.PeriodIndex))
+        return index


 class CalendarFourier(CalendarDeterministicTerm, FourierDeterministicTerm):
-    """
+    r"""
     Fourier series deterministic terms based on calendar time

     Parameters
@@ -369,12 +636,12 @@ class CalendarFourier(CalendarDeterministicTerm, FourierDeterministicTerm):

     .. math::

-       f_{i,s,t} & = \\sin\\left(2 \\pi i \\tau_t \\right)  \\\\
-       f_{i,c,t} & = \\cos\\left(2 \\pi i \\tau_t \\right)
+       f_{i,s,t} & = \sin\left(2 \pi i \tau_t \right)  \\
+       f_{i,c,t} & = \cos\left(2 \pi i \tau_t \right)

-    where m is the length of the period and :math:`\\tau_t` is the frequency
+    where m is the length of the period and :math:`\tau_t` is the frequency
     normalized time.  For example, when freq is "D" then an observation with
-    a timestamp of 12:00:00 would have :math:`\\tau_t=0.5`.
+    a timestamp of 12:00:00 would have :math:`\tau_t=0.5`.

     Examples
     --------
@@ -394,13 +661,51 @@ class CalendarFourier(CalendarDeterministicTerm, FourierDeterministicTerm):
     >>> cal_fourier_gen.in_sample(index)
     """

-    def __init__(self, freq: str, order: int) ->None:
+    def __init__(self, freq: str, order: int) -> None:
         super().__init__(freq)
         FourierDeterministicTerm.__init__(self, order)
-        self._order = required_int_like(order, 'terms')
+        self._order = required_int_like(order, "terms")
+
+    @property
+    def _columns(self) -> List[str]:
+        columns = []
+        for i in range(1, self._order + 1):
+            for typ in ("sin", "cos"):
+                columns.append(f"{typ}({i},freq={self._freq.freqstr})")
+        return columns
+
+    @Appender(DeterministicTerm.in_sample.__doc__)
+    def in_sample(
+        self, index: Union[Sequence[Hashable], pd.Index]
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        index = self._check_index_type(index)
+
+        ratio = self._compute_ratio(index)
+        terms = self._get_terms(ratio)
+        return pd.DataFrame(terms, index=index, columns=self._columns)
+
+    @Appender(DeterministicTerm.out_of_sample.__doc__)
+    def out_of_sample(
+        self,
+        steps: int,
+        index: Union[Sequence[Hashable], pd.Index],
+        forecast_index: Optional[Sequence[Hashable]] = None,
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        fcast_index = self._extend_index(index, steps, forecast_index)
+        self._check_index_type(fcast_index)
+        assert isinstance(fcast_index, (pd.DatetimeIndex, pd.PeriodIndex))
+        ratio = self._compute_ratio(fcast_index)
+        terms = self._get_terms(ratio)
+        return pd.DataFrame(terms, index=fcast_index, columns=self._columns)
+
+    @property
+    def _eq_attr(self) -> Tuple[Hashable, ...]:
+        return self._freq.freqstr, self._order

-    def __str__(self) ->str:
-        return f'Fourier(freq={self._freq.freqstr}, order={self._order})'
+    def __str__(self) -> str:
+        return f"Fourier(freq={self._freq.freqstr}, order={self._order})"

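Given the ratio :math:`\tau_t` from `_compute_ratio`, the Fourier columns follow the formulas in the Notes above. A small sketch of the term matrix, assuming `_get_terms` stacks sin/cos pairs in the order given by `_columns`:

import numpy as np

order = 2
tau = np.array([0.0, 0.25, 0.5, 0.75])          # fraction of the period elapsed
terms = np.empty((tau.shape[0], 2 * order))
for i in range(1, order + 1):
    terms[:, 2 * (i - 1)] = np.sin(2 * np.pi * i * tau)      # sin(i, freq=...)
    terms[:, 2 * (i - 1) + 1] = np.cos(2 * np.pi * i * tau)  # cos(i, freq=...)
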

 class CalendarSeasonality(CalendarDeterministicTerm):
@@ -438,50 +743,156 @@ class CalendarSeasonality(CalendarDeterministicTerm):
     >>> cal_seas_gen = CalendarSeasonality("H", "D")
     >>> cal_seas_gen.in_sample(index)
     """
+
     _is_dummy = True
+
+    # out_of: freq
     if PD_LT_2_2_0:
-        _supported = {'W': {'B': 5, 'D': 7, 'h': 24 * 7, 'H': 24 * 7}, 'D':
-            {'h': 24, 'H': 24}, 'Q': {'MS': 3, 'M': 3}, 'A': {'MS': 12, 'M':
-            12}, 'Y': {'MS': 12, 'Q': 4, 'M': 12}}
+        _supported = {
+            "W": {"B": 5, "D": 7, "h": 24 * 7, "H": 24 * 7},
+            "D": {"h": 24, "H": 24},
+            "Q": {"MS": 3, "M": 3},
+            "A": {"MS": 12, "M": 12},
+            "Y": {"MS": 12, "Q": 4, "M": 12},
+        }
     else:
-        _supported = {'W': {'B': 5, 'D': 7, 'h': 24 * 7}, 'D': {'h': 24},
-            'Q': {'MS': 3, 'ME': 3}, 'A': {'MS': 12, 'ME': 12, 'QE': 4},
-            'Y': {'MS': 12, 'ME': 12, 'QE': 4}, 'QE': {'ME': 3}, 'YE': {
-            'ME': 12, 'QE': 4}}
-
-    def __init__(self, freq: str, period: str) ->None:
+        _supported = {
+            "W": {"B": 5, "D": 7, "h": 24 * 7},
+            "D": {"h": 24},
+            "Q": {"MS": 3, "ME": 3},
+            "A": {"MS": 12, "ME": 12, "QE": 4},
+            "Y": {"MS": 12, "ME": 12, "QE": 4},
+            "QE": {"ME": 3},
+            "YE": {"ME": 12, "QE": 4},
+        }
+
+    def __init__(self, freq: str, period: str) -> None:
         freq_options: Set[str] = set()
-        freq_options.update(*[list(val.keys()) for val in self._supported.
-            values()])
+        freq_options.update(
+            *[list(val.keys()) for val in self._supported.values()]
+        )
         period_options = tuple(self._supported.keys())
-        freq = string_like(freq, 'freq', options=tuple(freq_options), lower
-            =False)
-        period = string_like(period, 'period', options=period_options,
-            lower=False)
+
+        freq = string_like(
+            freq, "freq", options=tuple(freq_options), lower=False
+        )
+        period = string_like(
+            period, "period", options=period_options, lower=False
+        )
         if freq not in self._supported[period]:
             raise ValueError(
-                f'The combination of freq={freq} and period={period} is not supported.'
-                )
+                f"The combination of freq={freq} and "
+                f"period={period} is not supported."
+            )
         super().__init__(freq)
         self._period = period
-        self._freq_str = self._freq.freqstr.split('-')[0]
+        self._freq_str = self._freq.freqstr.split("-")[0]

     @property
-    def freq(self) ->str:
+    def freq(self) -> str:
         """The frequency of the deterministic terms"""
-        pass
+        return self._freq.freqstr

     @property
-    def period(self) ->str:
+    def period(self) -> str:
         """The full period"""
-        pass
+        return self._period
+
+    def _weekly_to_loc(
+        self, index: Union[pd.DatetimeIndex, pd.PeriodIndex]
+    ) -> np.ndarray:
+        if self._freq.freqstr in ("h", "H"):
+            return index.hour + 24 * index.dayofweek
+        elif self._freq.freqstr == "D":
+            return index.dayofweek
+        else:  # "B"
+            bdays = pd.bdate_range("2000-1-1", periods=10).dayofweek.unique()
+            loc = index.dayofweek
+            if not loc.isin(bdays).all():
+                raise ValueError(
+                    "freq is B but index contains days that are not business "
+                    "days."
+                )
+            return loc
+
+    def _daily_to_loc(
+        self, index: Union[pd.DatetimeIndex, pd.PeriodIndex]
+    ) -> np.ndarray:
+        return index.hour
+
+    def _quarterly_to_loc(
+        self, index: Union[pd.DatetimeIndex, pd.PeriodIndex]
+    ) -> np.ndarray:
+        return (index.month - 1) % 3
+
+    def _annual_to_loc(
+        self, index: Union[pd.DatetimeIndex, pd.PeriodIndex]
+    ) -> np.ndarray:
+        if self._freq.freqstr in ("M", "ME", "MS"):
+            return index.month - 1
+        else:  # "Q"
+            return index.quarter - 1
+
+    def _get_terms(
+        self, index: Union[pd.DatetimeIndex, pd.PeriodIndex]
+    ) -> np.ndarray:
+        if self._period == "D":
+            locs = self._daily_to_loc(index)
+        elif self._period == "W":
+            locs = self._weekly_to_loc(index)
+        elif self._period in ("Q", "QE"):
+            locs = self._quarterly_to_loc(index)
+        else:  # "A", "Y":
+            locs = self._annual_to_loc(index)
+        full_cycle = self._supported[self._period][self._freq_str]
+        terms = np.zeros((locs.shape[0], full_cycle))
+        terms[np.arange(locs.shape[0]), locs] = 1
+        return terms
+
+    @property
+    def _columns(self) -> List[str]:
+        columns = []
+        count = self._supported[self._period][self._freq_str]
+        for i in range(count):
+            columns.append(
+                f"s({self._freq_str}={i + 1}, period={self._period})"
+            )
+        return columns
+
+    @Appender(DeterministicTerm.in_sample.__doc__)
+    def in_sample(
+        self, index: Union[Sequence[Hashable], pd.Index]
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        index = self._check_index_type(index)
+        terms = self._get_terms(index)
+
+        return pd.DataFrame(terms, index=index, columns=self._columns)
+
+    @Appender(DeterministicTerm.out_of_sample.__doc__)
+    def out_of_sample(
+        self,
+        steps: int,
+        index: Union[Sequence[Hashable], pd.Index],
+        forecast_index: Optional[Sequence[Hashable]] = None,
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        fcast_index = self._extend_index(index, steps, forecast_index)
+        self._check_index_type(fcast_index)
+        assert isinstance(fcast_index, (pd.DatetimeIndex, pd.PeriodIndex))
+        terms = self._get_terms(fcast_index)
+        return pd.DataFrame(terms, index=fcast_index, columns=self._columns)
+
+    @property
+    def _eq_attr(self) -> Tuple[Hashable, ...]:
+        return self._period, self._freq_str

-    def __str__(self) ->str:
-        return f'Seasonal(freq={self._freq_str})'
+    def __str__(self) -> str:
+        return f"Seasonal(freq={self._freq_str})"

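The seasonal dummies in `_get_terms` above are built as a one-hot matrix: each observation is mapped to its position in the full cycle and a single 1 is written into that column. A self-contained sketch for a daily cycle of 24 hours:

import numpy as np

locs = np.array([0, 6, 12, 23])        # hour of day for each observation
full_cycle = 24                        # number of dummies in the cycle
terms = np.zeros((locs.shape[0], full_cycle))
terms[np.arange(locs.shape[0]), locs] = 1
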

 class CalendarTimeTrend(CalendarDeterministicTerm, TimeTrendDeterministicTerm):
-    """
+    r"""
     Constant and time trend deterministic terms based on calendar time

     Parameters
@@ -507,8 +918,8 @@ class CalendarTimeTrend(CalendarDeterministicTerm, TimeTrendDeterministicTerm):

     Notes
     -----
-    The time stamp, :math:`\\tau_t`, is the number of periods that have elapsed
-    since the base_period. :math:`\\tau_t` may be fractional.
+    The time stamp, :math:`\tau_t`, is the number of periods that have elapsed
+    since the base_period. :math:`\tau_t` may be fractional.

     Examples
     --------
@@ -534,11 +945,18 @@ class CalendarTimeTrend(CalendarDeterministicTerm, TimeTrendDeterministicTerm):
     >>> cal_trend_gen.in_sample(index)
     """

-    def __init__(self, freq: str, constant: bool=True, order: int=0, *,
-        base_period: Optional[Union[str, DateLike]]=None) ->None:
+    def __init__(
+        self,
+        freq: str,
+        constant: bool = True,
+        order: int = 0,
+        *,
+        base_period: Optional[Union[str, DateLike]] = None,
+    ) -> None:
         super().__init__(freq)
-        TimeTrendDeterministicTerm.__init__(self, constant=constant, order=
-            order)
+        TimeTrendDeterministicTerm.__init__(
+            self, constant=constant, order=order
+        )
         self._ref_i8 = 0
         if base_period is not None:
             pr = pd.period_range(base_period, periods=1, freq=self._freq)
@@ -546,13 +964,17 @@ class CalendarTimeTrend(CalendarDeterministicTerm, TimeTrendDeterministicTerm):
         self._base_period = None if base_period is None else str(base_period)

     @property
-    def base_period(self) ->Optional[str]:
+    def base_period(self) -> Optional[str]:
         """The base period"""
-        pass
+        return self._base_period

     @classmethod
-    def from_string(cls, freq: str, trend: str, base_period: Optional[Union
-        [str, DateLike]]=None) ->'CalendarTimeTrend':
+    def from_string(
+        cls,
+        freq: str,
+        trend: str,
+        base_period: Optional[Union[str, DateLike]] = None,
+    ) -> "CalendarTimeTrend":
         """
         Create a TimeTrend from a string description.

@@ -581,13 +1003,66 @@ class CalendarTimeTrend(CalendarDeterministicTerm, TimeTrendDeterministicTerm):
         TimeTrend
             The TimeTrend instance.
         """
-        pass
+        constant = trend.startswith("c")
+        order = 0
+        if "tt" in trend:
+            order = 2
+        elif "t" in trend:
+            order = 1
+        return cls(freq, constant, order, base_period=base_period)
+
+    def _terms(
+        self, index: Union[pd.DatetimeIndex, pd.PeriodIndex], ratio: np.ndarray
+    ) -> pd.DataFrame:
+        if isinstance(index, pd.DatetimeIndex):
+            index = index.to_period(self._freq)
+
+        index_i8 = index.asi8
+        index_i8 = index_i8 - self._ref_i8 + 1
+        time = index_i8.astype(np.double) + ratio
+        time = time[:, None]
+        terms = self._get_terms(time)
+        return pd.DataFrame(terms, columns=self._columns, index=index)
+
+    @Appender(DeterministicTerm.in_sample.__doc__)
+    def in_sample(
+        self, index: Union[Sequence[Hashable], pd.Index]
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        index = self._check_index_type(index)
+        ratio = self._compute_ratio(index)
+        return self._terms(index, ratio)
+
+    @Appender(DeterministicTerm.out_of_sample.__doc__)
+    def out_of_sample(
+        self,
+        steps: int,
+        index: Union[Sequence[Hashable], pd.Index],
+        forecast_index: Optional[Sequence[Hashable]] = None,
+    ) -> pd.DataFrame:
+        index = self._index_like(index)
+        fcast_index = self._extend_index(index, steps, forecast_index)
+        self._check_index_type(fcast_index)
+        assert isinstance(fcast_index, (pd.PeriodIndex, pd.DatetimeIndex))
+        ratio = self._compute_ratio(fcast_index)
+        return self._terms(fcast_index, ratio)
+
+    @property
+    def _eq_attr(self) -> Tuple[Hashable, ...]:
+        attr: Tuple[Hashable, ...] = (
+            self._constant,
+            self._order,
+            self._freq.freqstr,
+        )
+        if self._base_period is not None:
+            attr += (self._base_period,)
+        return attr

-    def __str__(self) ->str:
+    def __str__(self) -> str:
         value = TimeTrendDeterministicTerm.__str__(self)
-        value = 'Calendar' + value[:-1] + f', freq={self._freq.freqstr})'
+        value = "Calendar" + value[:-1] + f", freq={self._freq.freqstr})"
         if self._base_period is not None:
-            value = value[:-1] + f'base_period={self._base_period})'
+            value = value[:-1] + f"base_period={self._base_period})"
         return value

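`CalendarTimeTrend.from_string` interprets trend strings the same way the constructor arguments do: a leading "c" adds a constant, "t" a linear term, and "tt" a quadratic term. Illustrative usage of the class above:

from statsmodels.tsa.deterministic import CalendarTimeTrend

ct = CalendarTimeTrend.from_string("D", "ct")     # constant + linear calendar trend
ctt = CalendarTimeTrend.from_string("D", "ctt")   # constant + t + t**2
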

@@ -685,10 +1160,18 @@ class DeterministicProcess:
     2000-01-03    1.0     0.0     1.0       0.034398       0.999408
     """

-    def __init__(self, index: Union[Sequence[Hashable], pd.Index], *,
-        period: Optional[Union[float, int]]=None, constant: bool=False,
-        order: int=0, seasonal: bool=False, fourier: int=0,
-        additional_terms: Sequence[DeterministicTerm]=(), drop: bool=False):
+    def __init__(
+        self,
+        index: Union[Sequence[Hashable], pd.Index],
+        *,
+        period: Optional[Union[float, int]] = None,
+        constant: bool = False,
+        order: int = 0,
+        seasonal: bool = False,
+        fourier: int = 0,
+        additional_terms: Sequence[DeterministicTerm] = (),
+        drop: bool = False,
+    ):
         if not isinstance(index, pd.Index):
             index = pd.Index(index)
         self._index = index
@@ -696,57 +1179,248 @@ class DeterministicProcess:
         self._extendable = False
         self._index_freq = None
         self._validate_index()
-        period = float_like(period, 'period', optional=True)
-        self._constant = constant = bool_like(constant, 'constant')
-        self._order = required_int_like(order, 'order')
-        self._seasonal = seasonal = bool_like(seasonal, 'seasonal')
-        self._fourier = required_int_like(fourier, 'fourier')
+        period = float_like(period, "period", optional=True)
+        self._constant = constant = bool_like(constant, "constant")
+        self._order = required_int_like(order, "order")
+        self._seasonal = seasonal = bool_like(seasonal, "seasonal")
+        self._fourier = required_int_like(fourier, "fourier")
         additional_terms = tuple(additional_terms)
         self._cached_in_sample = None
-        self._drop = bool_like(drop, 'drop')
+        self._drop = bool_like(drop, "drop")
         self._additional_terms = additional_terms
         if constant or order:
             self._deterministic_terms.append(TimeTrend(constant, order))
         if seasonal and fourier:
             raise ValueError(
-                'seasonal and fourier can be initialized through the constructor since these will be necessarily perfectly collinear. Instead, you can pass additional components using the additional_terms input.'
-                )
+                """seasonal and fourier can be initialized through the \
+constructor since these will be necessarily perfectly collinear. Instead, \
+you can pass additional components using the additional_terms input."""
+            )
         if (seasonal or fourier) and period is None:
             if period is None:
                 self._period = period = freq_to_period(self._index_freq)
         if seasonal:
-            period = required_int_like(period, 'period')
+            period = required_int_like(period, "period")
             self._deterministic_terms.append(Seasonality(period))
         elif fourier:
-            period = float_like(period, 'period')
+            period = float_like(period, "period")
             assert period is not None
             self._deterministic_terms.append(Fourier(period, order=fourier))
         for term in additional_terms:
             if not isinstance(term, DeterministicTerm):
                 raise TypeError(
-                    'All additional terms must be instances of subsclasses of DeterministicTerm'
-                    )
+                    "All additional terms must be instances of subsclasses "
+                    "of DeterministicTerm"
+                )
             if term not in self._deterministic_terms:
                 self._deterministic_terms.append(term)
             else:
                 raise ValueError(
-                    'One or more terms in additional_terms has been added through the parameters of the constructor. Terms must be unique.'
-                    )
+                    "One or more terms in additional_terms has been added "
+                    "through the parameters of the constructor. Terms must "
+                    "be unique."
+                )
         self._period = period
         self._retain_cols: Optional[List[Hashable]] = None

     @property
-    def index(self) ->pd.Index:
+    def index(self) -> pd.Index:
         """The index of the process"""
-        pass
+        return self._index

     @property
-    def terms(self) ->List[DeterministicTerm]:
+    def terms(self) -> List[DeterministicTerm]:
         """The deterministic terms included in the process"""
-        pass
-
-    def range(self, start: Union[IntLike, DateLike, str], stop: Union[
-        IntLike, DateLike, str]) ->pd.DataFrame:
+        return self._deterministic_terms
+
+    def _adjust_dummies(self, terms: List[pd.DataFrame]) -> List[pd.DataFrame]:
+        has_const: Optional[bool] = None
+        for dterm in self._deterministic_terms:
+            if isinstance(dterm, (TimeTrend, CalendarTimeTrend)):
+                has_const = has_const or dterm.constant
+        if has_const is None:
+            has_const = False
+            for term in terms:
+                const_col = (term == term.iloc[0]).all() & (term.iloc[0] != 0)
+                has_const = has_const or const_col.any()
+        drop_first = has_const
+        for i, dterm in enumerate(self._deterministic_terms):
+            is_dummy = dterm.is_dummy
+            if is_dummy and drop_first:
+                # drop first
+                terms[i] = terms[i].iloc[:, 1:]
+            drop_first = drop_first or is_dummy
+        return terms
+
+    def _remove_zeros_ones(self, terms: pd.DataFrame) -> pd.DataFrame:
+        all_zero = np.all(terms == 0, axis=0)
+        if np.any(all_zero):
+            terms = terms.loc[:, ~all_zero]
+        is_constant = terms.max(axis=0) == terms.min(axis=0)
+        if np.sum(is_constant) > 1:
+            # Retain first
+            const_locs = np.where(is_constant)[0]
+            is_constant.iloc[const_locs[:1]] = False
+            terms = terms.loc[:, ~is_constant]
+        return terms
+
+    @Appender(DeterministicTerm.in_sample.__doc__)
+    def in_sample(self) -> pd.DataFrame:
+        if self._cached_in_sample is not None:
+            return self._cached_in_sample
+        index = self._index
+        if not self._deterministic_terms:
+            return pd.DataFrame(np.empty((index.shape[0], 0)), index=index)
+        raw_terms = []
+        for term in self._deterministic_terms:
+            raw_terms.append(term.in_sample(index))
+
+        raw_terms = self._adjust_dummies(raw_terms)
+        terms: pd.DataFrame = pd.concat(raw_terms, axis=1)
+        terms = self._remove_zeros_ones(terms)
+        if self._drop:
+            terms_arr = to_numpy(terms)
+            res = qr(terms_arr, mode="r", pivoting=True)
+            r = res[0]
+            p = res[-1]
+            abs_diag = np.abs(np.diag(r))
+            tol = abs_diag[0] * terms_arr.shape[1] * np.finfo(float).eps
+            rank = int(np.sum(abs_diag > tol))
+            rpx = r.T @ terms_arr
+            keep = [0]
+            last_rank = 1
+            # Find the left-most columns that produce full rank
+            for i in range(1, terms_arr.shape[1]):
+                curr_rank = np.linalg.matrix_rank(rpx[: i + 1, : i + 1])
+                if curr_rank > last_rank:
+                    keep.append(i)
+                    last_rank = curr_rank
+                if curr_rank == rank:
+                    break
+            if len(keep) == rank:
+                terms = terms.iloc[:, keep]
+            else:
+                terms = terms.iloc[:, np.sort(p[:rank])]
+        self._retain_cols = terms.columns
+        self._cached_in_sample = terms
+        return terms
+
+    @Appender(DeterministicTerm.out_of_sample.__doc__)
+    def out_of_sample(
+        self,
+        steps: int,
+        forecast_index: Optional[Union[Sequence[Hashable], pd.Index]] = None,
+    ) -> pd.DataFrame:
+        steps = required_int_like(steps, "steps")
+        if self._drop and self._retain_cols is None:
+            self.in_sample()
+        index = self._index
+        if not self._deterministic_terms:
+            return pd.DataFrame(np.empty((index.shape[0], 0)), index=index)
+        raw_terms = []
+        for term in self._deterministic_terms:
+            raw_terms.append(term.out_of_sample(steps, index, forecast_index))
+        terms: pd.DataFrame = pd.concat(raw_terms, axis=1)
+        assert self._retain_cols is not None
+        if terms.shape[1] != len(self._retain_cols):
+            terms = terms[self._retain_cols]
+        return terms
+
+    def _extend_time_index(
+        self,
+        stop: pd.Timestamp,
+    ) -> Union[pd.DatetimeIndex, pd.PeriodIndex]:
+        index = self._index
+        if isinstance(index, pd.PeriodIndex):
+            return pd.period_range(index[0], end=stop, freq=index.freq)
+        return pd.date_range(start=index[0], end=stop, freq=self._index_freq)
+
+    def _range_from_range_index(self, start: int, stop: int) -> pd.DataFrame:
+        index = self._index
+        is_int64_index = is_int_index(index)
+        assert isinstance(index, pd.RangeIndex) or is_int64_index
+        if start < index[0]:
+            raise ValueError(START_BEFORE_INDEX_ERR)
+        if isinstance(index, pd.RangeIndex):
+            idx_step = index.step
+        else:
+            idx_step = np.diff(index).max() if len(index) > 1 else 1
+        if idx_step != 1 and ((start - index[0]) % idx_step) != 0:
+            raise ValueError(
+                f"The step of the index is not 1 (actual step={idx_step})."
+                " start must be in the sequence that would have been "
+                "generated by the index."
+            )
+        if is_int64_index:
+            new_idx = pd.Index(np.arange(start, stop))
+        else:
+            new_idx = pd.RangeIndex(start, stop, step=idx_step)
+        if new_idx[-1] <= self._index[-1]:
+            # In-sample only
+            in_sample = self.in_sample()
+            in_sample = in_sample.loc[new_idx]
+            return in_sample
+        elif new_idx[0] > self._index[-1]:
+            # Out of-sample only
+            next_value = index[-1] + idx_step
+            if new_idx[0] != next_value:
+                tmp = pd.RangeIndex(next_value, stop, step=idx_step)
+                oos = self.out_of_sample(tmp.shape[0], forecast_index=tmp)
+                return oos.loc[new_idx]
+            return self.out_of_sample(new_idx.shape[0], forecast_index=new_idx)
+        # Using some from each in and out of sample
+        in_sample_loc = new_idx <= self._index[-1]
+        in_sample_idx = new_idx[in_sample_loc]
+        out_of_sample_idx = new_idx[~in_sample_loc]
+        in_sample_exog = self.in_sample().loc[in_sample_idx]
+        oos_exog = self.out_of_sample(
+            steps=out_of_sample_idx.shape[0], forecast_index=out_of_sample_idx
+        )
+        return pd.concat([in_sample_exog, oos_exog], axis=0)
+
+    def _range_from_time_index(
+        self, start: pd.Timestamp, stop: pd.Timestamp
+    ) -> pd.DataFrame:
+        index = self._index
+        if isinstance(self._index, pd.PeriodIndex):
+            if isinstance(start, pd.Timestamp):
+                start = start.to_period(freq=self._index_freq)
+            if isinstance(stop, pd.Timestamp):
+                stop = stop.to_period(freq=self._index_freq)
+        if start < index[0]:
+            raise ValueError(START_BEFORE_INDEX_ERR)
+        if stop <= self._index[-1]:
+            return self.in_sample().loc[start:stop]
+        new_idx = self._extend_time_index(stop)
+        oos_idx = new_idx[new_idx > index[-1]]
+        oos = self.out_of_sample(oos_idx.shape[0], oos_idx)
+        if start >= oos_idx[0]:
+            return oos.loc[start:stop]
+        both = pd.concat([self.in_sample(), oos], axis=0)
+        return both.loc[start:stop]
+
+    def _int_to_timestamp(self, value: int, name: str) -> pd.Timestamp:
+        if value < 0:
+            raise ValueError(f"{name} must be non-negative.")
+        if value < self._index.shape[0]:
+            return self._index[value]
+        add_periods = value - (self._index.shape[0] - 1) + 1
+        index = self._index
+        if isinstance(self._index, pd.PeriodIndex):
+            pr = pd.period_range(
+                index[-1], freq=self._index_freq, periods=add_periods
+            )
+            return pr[-1].to_timestamp()
+        dr = pd.date_range(
+            index[-1], freq=self._index_freq, periods=add_periods
+        )
+        return dr[-1]
+
+    def range(
+        self,
+        start: Union[IntLike, DateLike, str],
+        stop: Union[IntLike, DateLike, str],
+    ) -> pd.DataFrame:
         """
         Deterministic terms spanning a range of observations

@@ -763,7 +1437,43 @@ class DeterministicProcess:
         DataFrame
             A data frame of deterministic terms
         """
-        pass
+        if not self._extendable:
+            raise TypeError(
+                """The index in the deterministic process does not \
+support extension. Only PeriodIndex, DatetimeIndex with a frequency, \
+RangeIndex, and integral Indexes that start at 0 and have only unit \
+differences can be extended when producing out-of-sample forecasts.
+"""
+            )
+        if type(self._index) in (pd.RangeIndex,) or is_int_index(self._index):
+            start = required_int_like(start, "start")
+            stop = required_int_like(stop, "stop")
+            # Add 1 to ensure that the end point is inclusive
+            stop += 1
+            return self._range_from_range_index(start, stop)
+        if isinstance(start, (int, np.integer)):
+            start = self._int_to_timestamp(start, "start")
+        else:
+            start = pd.Timestamp(start)
+        if isinstance(stop, (int, np.integer)):
+            stop = self._int_to_timestamp(stop, "stop")
+        else:
+            stop = pd.Timestamp(stop)
+        return self._range_from_time_index(start, stop)
+
+    def _validate_index(self) -> None:
+        if isinstance(self._index, pd.PeriodIndex):
+            self._index_freq = self._index.freq
+            self._extendable = True
+        elif isinstance(self._index, pd.DatetimeIndex):
+            self._index_freq = self._index.freq or self._index.inferred_freq
+            self._extendable = self._index_freq is not None
+        elif isinstance(self._index, pd.RangeIndex):
+            self._extendable = True
+        elif is_int_index(self._index):
+            self._extendable = self._index[0] == 0 and np.all(
+                np.diff(self._index) == 1
+            )

     def apply(self, index):
         """
@@ -780,4 +1490,13 @@ class DeterministicProcess:
         DeterministicProcess
             The deterministic process applied to a different index
         """
-        pass
+        return DeterministicProcess(
+            index,
+            period=self._period,
+            constant=self._constant,
+            order=self._order,
+            seasonal=self._seasonal,
+            fourier=self._fourier,
+            additional_terms=self._additional_terms,
+            drop=self._drop,
+        )
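
A usage sketch for the completed `DeterministicProcess`, showing `in_sample`, the inclusive `range` spanning both in- and out-of-sample observations, and `apply`:

import pandas as pd
from statsmodels.tsa.deterministic import DeterministicProcess

index = pd.RangeIndex(0, 100)
dp = DeterministicProcess(index, constant=True, order=1, seasonal=True, period=4)
in_sample = dp.in_sample()                  # const, trend and three seasonal dummies
window = dp.range(90, 109)                  # rows 90-99 in-sample, 100-109 out-of-sample
dp200 = dp.apply(pd.RangeIndex(0, 200))     # same specification on a longer index
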
diff --git a/statsmodels/tsa/exponential_smoothing/base.py b/statsmodels/tsa/exponential_smoothing/base.py
index c693b47d6..bcc82cea9 100644
--- a/statsmodels/tsa/exponential_smoothing/base.py
+++ b/statsmodels/tsa/exponential_smoothing/base.py
@@ -1,14 +1,21 @@
 from collections import OrderedDict
 import contextlib
 import warnings
+
 import numpy as np
 import pandas as pd
 from scipy.stats import norm
+
 from statsmodels.base.data import PandasData
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.eval_measures import aic, aicc, bic, hqic
 from statsmodels.tools.sm_exceptions import PrecisionWarning
-from statsmodels.tools.numdiff import _get_epsilon, approx_fprime, approx_fprime_cs, approx_hess_cs
+from statsmodels.tools.numdiff import (
+    _get_epsilon,
+    approx_fprime,
+    approx_fprime_cs,
+    approx_hess_cs,
+)
 from statsmodels.tools.tools import pinv_extended
 import statsmodels.tsa.base.tsa_model as tsbase
 from statsmodels.tsa.statespace.tools import _safe_cond
@@ -20,20 +27,50 @@ class StateSpaceMLEModel(tsbase.TimeSeriesModel):
     from statespace.mlemodel.MLEModel
     """

-    def __init__(self, endog, exog=None, dates=None, freq=None, missing=
-        'none', **kwargs):
-        super().__init__(endog=endog, exog=exog, dates=dates, freq=freq,
-            missing=missing)
+    def __init__(
+        self, endog, exog=None, dates=None, freq=None, missing="none", **kwargs
+    ):
+        # TODO: this was changed from the original, requires some work when
+        # using this as base class for state space and exponential smoothing
+        super().__init__(
+            endog=endog, exog=exog, dates=dates, freq=freq, missing=missing
+        )
+
+        # Store kwargs to recreate model
         self._init_kwargs = kwargs
+
+        # Prepare the endog array: C-ordered, shape=(nobs x k_endog)
         self.endog, self.exog = self.prepare_data(self.data)
         self.use_pandas = isinstance(self.data, PandasData)
+
+        # Dimensions
         self.nobs = self.endog.shape[0]
+
+        # Setup holder for fixed parameters
         self._has_fixed_params = False
         self._fixed_params = None
         self._params_index = None
         self._fixed_params_index = None
         self._free_params_index = None

+    @staticmethod
+    def prepare_data(data):
+        raise NotImplementedError
+
+    def clone(self, endog, exog=None, **kwargs):
+        raise NotImplementedError
+
+    def _validate_can_fix_params(self, param_names):
+        for param_name in param_names:
+            if param_name not in self.param_names:
+                raise ValueError(
+                    'Invalid parameter name passed: "%s".' % param_name
+                )
+
+    @property
+    def k_params(self):
+        return len(self.param_names)
+
     @contextlib.contextmanager
     def fix_params(self, params):
         """
@@ -52,7 +89,54 @@ class StateSpaceMLEModel(tsbase.TimeSeriesModel):
         >>> with mod.fix_params({'ar.L1': 0.5}):
                 res = mod.fit()
         """
-        pass
+        # Initialization (this is done here rather than in the constructor
+        # because param_names may not be available at that point)
+        if self._fixed_params is None:
+            self._fixed_params = {}
+            self._params_index = OrderedDict(
+                zip(self.param_names, np.arange(self.k_params))
+            )
+
+        # Cache the current fixed parameters
+        cache_fixed_params = self._fixed_params.copy()
+        cache_has_fixed_params = self._has_fixed_params
+        cache_fixed_params_index = self._fixed_params_index
+        cache_free_params_index = self._free_params_index
+
+        # Validate parameter names and values
+        all_fixed_param_names = (
+            set(params.keys()) | set(self._fixed_params.keys())
+        )
+        self._validate_can_fix_params(all_fixed_param_names)
+
+        # Set the new fixed parameters, keeping the order as given by
+        # param_names
+        self._fixed_params.update(params)
+        self._fixed_params = OrderedDict(
+            [
+                (name, self._fixed_params[name])
+                for name in self.param_names
+                if name in self._fixed_params
+            ]
+        )
+
+        # Update associated values
+        self._has_fixed_params = True
+        self._fixed_params_index = [
+            self._params_index[key] for key in self._fixed_params.keys()
+        ]
+        self._free_params_index = list(
+            set(np.arange(self.k_params)).difference(self._fixed_params_index)
+        )
+
+        try:
+            yield
+        finally:
+            # Reset the fixed parameters
+            self._has_fixed_params = cache_has_fixed_params
+            self._fixed_params = cache_fixed_params
+            self._fixed_params_index = cache_fixed_params_index
+            self._free_params_index = cache_free_params_index

     def fit_constrained(self, constraints, start_params=None, **fit_kwds):
         """
@@ -78,14 +162,19 @@ class StateSpaceMLEModel(tsbase.TimeSeriesModel):
         >>> mod = sm.tsa.SARIMAX(endog, order=(1, 0, 1))
         >>> res = mod.fit_constrained({'ar.L1': 0.5})
         """
-        pass
+        with self.fix_params(constraints):
+            res = self.fit(start_params, **fit_kwds)
+        return res

     @property
     def start_params(self):
         """
         (array) Starting parameters for maximum likelihood estimation.
         """
-        pass
+        if hasattr(self, "_start_params"):
+            return self._start_params
+        else:
+            raise NotImplementedError

     @property
     def param_names(self):
@@ -93,26 +182,128 @@ class StateSpaceMLEModel(tsbase.TimeSeriesModel):
         (list of str) List of human readable parameter names (for parameters
         actually included in the model).
         """
-        pass
+        if hasattr(self, "_param_names"):
+            return self._param_names
+        else:
+            try:
+                names = ["param.%d" % i for i in range(len(self.start_params))]
+            except NotImplementedError:
+                names = []
+            return names

     @classmethod
-    def from_formula(cls, formula, data, subset=None, drop_cols=None, *args,
-        **kwargs):
+    def from_formula(
+        cls, formula, data, subset=None, drop_cols=None, *args, **kwargs
+    ):
         """
         Not implemented for state space models
         """
-        pass
+        raise NotImplementedError
+
+    def _wrap_data(self, data, start_idx, end_idx, names=None):
+        # TODO: check if this is reasonable for statespace
+        # squeezing data: data may be:
+        # - m x n: m dates, n simulations -> squeeze does nothing
+        # - m x 1: m dates, 1 simulation -> squeeze removes last dimension
+        # - 1 x n: don't squeeze, already fine
+        # - 1 x 1: squeeze only second axis
+        if data.ndim > 1 and data.shape[1] == 1:
+            data = np.squeeze(data, axis=1)
+        if self.use_pandas:
+            if data.shape[0]:
+                _, _, _, index = self._get_prediction_index(start_idx, end_idx)
+            else:
+                index = None
+            if data.ndim < 2:
+                data = pd.Series(data, index=index, name=names)
+            else:
+                data = pd.DataFrame(data, index=index, columns=names)
+        return data
+
+    def _wrap_results(
+        self,
+        params,
+        result,
+        return_raw,
+        cov_type=None,
+        cov_kwds=None,
+        results_class=None,
+        wrapper_class=None,
+    ):
+        if not return_raw:
+            # Wrap in a results object
+            result_kwargs = {}
+            if cov_type is not None:
+                result_kwargs["cov_type"] = cov_type
+            if cov_kwds is not None:
+                result_kwargs["cov_kwds"] = cov_kwds
+
+            if results_class is None:
+                results_class = self._res_classes["fit"][0]
+            if wrapper_class is None:
+                wrapper_class = self._res_classes["fit"][1]
+
+            res = results_class(self, params, result, **result_kwargs)
+            result = wrapper_class(res)
+        return result
+
+    def _score_complex_step(self, params, **kwargs):
+        # the default epsilon can be too small
+        # inversion_method = INVERT_UNIVARIATE | SOLVE_LU
+        epsilon = _get_epsilon(params, 2., None, len(params))
+        kwargs['transformed'] = True
+        kwargs['complex_step'] = True
+        return approx_fprime_cs(params, self.loglike, epsilon=epsilon,
+                                kwargs=kwargs)
+
+    def _score_finite_difference(self, params, approx_centered=False,
+                                 **kwargs):
+        kwargs['transformed'] = True
+        return approx_fprime(params, self.loglike, kwargs=kwargs,
+                             centered=approx_centered)
+
+    def _hessian_finite_difference(self, params, approx_centered=False,
+                                   **kwargs):
+        params = np.array(params, ndmin=1)
+
+        warnings.warn('Calculation of the Hessian using finite differences'
+                      ' is usually subject to substantial approximation'
+                      ' errors.',
+                      PrecisionWarning,
+                      stacklevel=3,
+                      )
+
+        if not approx_centered:
+            epsilon = _get_epsilon(params, 3, None, len(params))
+        else:
+            epsilon = _get_epsilon(params, 4, None, len(params)) / 2
+        hessian = approx_fprime(params, self._score_finite_difference,
+                                epsilon=epsilon, kwargs=kwargs,
+                                centered=approx_centered)
+
+        # TODO: changed this to nobs_effective, has to be changed when merging
+        # with statespace mlemodel
+        return hessian / (self.nobs_effective)

     def _hessian_complex_step(self, params, **kwargs):
         """
         Hessian matrix computed by second-order complex-step differentiation
         on the `loglike` function.
         """
-        pass
+        # the default epsilon can be too small
+        epsilon = _get_epsilon(params, 3., None, len(params))
+        kwargs['transformed'] = True
+        kwargs['complex_step'] = True
+        hessian = approx_hess_cs(
+            params, self.loglike, epsilon=epsilon, kwargs=kwargs)
+
+        # TODO: changed this to nobs_effective, has to be changed when merging
+        # with statespace mlemodel
+        return hessian / (self.nobs_effective)

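The `_score_complex_step` and `_hessian_complex_step` helpers rely on complex-step differentiation, which evaluates the log-likelihood at a complex-perturbed point and so avoids the subtractive cancellation of ordinary finite differences. A minimal sketch of the idea, independent of statsmodels:

import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    # f'(x) ~= Im(f(x + i*h)) / h, accurate to machine precision for analytic f
    return np.imag(f(x + 1j * h)) / h

approx = complex_step_derivative(np.exp, 1.0)   # ~= np.exp(1.0)
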

 class StateSpaceMLEResults(tsbase.TimeSeriesModelResults):
-    """
+    r"""
     Class to hold results from fitting a state space model.

     Parameters
@@ -135,70 +326,95 @@ class StateSpaceMLEResults(tsbase.TimeSeriesModelResults):
     def __init__(self, model, params, scale=1.0):
         self.data = model.data
         self.endog = model.data.orig_endog
+
         super().__init__(model, params, None, scale=scale)
+
+        # Save the fixed parameters
         self._has_fixed_params = self.model._has_fixed_params
         self._fixed_params_index = self.model._fixed_params_index
         self._free_params_index = self.model._free_params_index
+        # TODO: seems like maybe self.fixed_params should be the dictionary
+        # itself, not just the keys?
         if self._has_fixed_params:
             self._fixed_params = self.model._fixed_params.copy()
             self.fixed_params = list(self._fixed_params.keys())
         else:
             self._fixed_params = None
             self.fixed_params = []
-        self.param_names = [('%s (fixed)' % name if name in self.
-            fixed_params else name) for name in self.data.param_names or []]
+        self.param_names = [
+            "%s (fixed)" % name if name in self.fixed_params else name
+            for name in (self.data.param_names or [])
+        ]
+
+        # Dimensions
         self.nobs = self.model.nobs
         self.k_params = self.model.k_params
+
         self._rank = None

+    @cache_readonly
+    def nobs_effective(self):
+        raise NotImplementedError
+
+    @cache_readonly
+    def df_resid(self):
+        return self.nobs_effective - self.df_model
+
     @cache_readonly
     def aic(self):
         """
         (float) Akaike Information Criterion
         """
-        pass
+        return aic(self.llf, self.nobs_effective, self.df_model)

     @cache_readonly
     def aicc(self):
         """
         (float) Akaike Information Criterion with small sample correction
         """
-        pass
+        return aicc(self.llf, self.nobs_effective, self.df_model)

     @cache_readonly
     def bic(self):
         """
         (float) Bayes Information Criterion
         """
-        pass
+        return bic(self.llf, self.nobs_effective, self.df_model)
+
+    @cache_readonly
+    def fittedvalues(self):
+        # TODO
+        raise NotImplementedError

     @cache_readonly
     def hqic(self):
         """
         (float) Hannan-Quinn Information Criterion
         """
-        pass
+        # return (-2 * self.llf +
+        #         2 * np.log(np.log(self.nobs_effective)) * self.df_model)
+        return hqic(self.llf, self.nobs_effective, self.df_model)

     @cache_readonly
     def llf(self):
         """
         (float) The value of the log-likelihood function evaluated at `params`.
         """
-        pass
+        raise NotImplementedError

     @cache_readonly
     def mae(self):
         """
         (float) Mean absolute error
         """
-        pass
+        return np.mean(np.abs(self.resid))

     @cache_readonly
     def mse(self):
         """
         (float) Mean squared error
         """
-        pass
+        return self.sse / self.nobs

     @cache_readonly
     def pvalues(self):
@@ -207,25 +423,78 @@ class StateSpaceMLEResults(tsbase.TimeSeriesModelResults):
         coefficients. Note that the coefficients are assumed to have a Normal
         distribution.
         """
-        pass
+        pvalues = np.zeros_like(self.zvalues) * np.nan
+        mask = np.ones_like(pvalues, dtype=bool)
+        mask[self._free_params_index] = True
+        mask &= ~np.isnan(self.zvalues)
+        pvalues[mask] = norm.sf(np.abs(self.zvalues[mask])) * 2
+        return pvalues
+
+    @cache_readonly
+    def resid(self):
+        raise NotImplementedError

     @cache_readonly
     def sse(self):
         """
         (float) Sum of squared errors
         """
-        pass
+        return np.sum(self.resid ** 2)

     @cache_readonly
     def zvalues(self):
         """
         (array) The z-statistics for the coefficients.
         """
-        pass
+        return self.params / self.bse

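`zvalues` and `pvalues` together implement a standard Wald-type test: each coefficient is divided by its standard error and compared with a normal reference distribution. A compact numeric sketch:

import numpy as np
from scipy.stats import norm

params = np.array([0.9, -0.1])
bse = np.array([0.1, 0.2])
z = params / bse                     # z-statistics, as in zvalues
pvalues = 2 * norm.sf(np.abs(z))     # two-sided p-values, as in pvalues
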
     def _get_prediction_start_index(self, anchor):
         """Returns a valid numeric start index for predictions/simulations"""
-        pass
+        if anchor is None or anchor == "start":
+            iloc = 0
+        elif anchor == "end":
+            iloc = self.nobs
+        else:
+            iloc, _, _ = self.model._get_index_loc(anchor)
+            if isinstance(iloc, slice):
+                iloc = iloc.start
+            iloc += 1  # anchor is one before start of prediction/simulation
+
+        if iloc < 0:
+            iloc = self.nobs + iloc
+        if iloc > self.nobs:
+            raise ValueError("Cannot anchor simulation outside of the sample.")
+        return iloc
+
+    def _cov_params_approx(
+        self, approx_complex_step=True, approx_centered=False
+    ):
+        evaluated_hessian = self.nobs_effective * self.model.hessian(
+            params=self.params,
+            transformed=True,
+            includes_fixed=True,
+            method="approx",
+            approx_complex_step=approx_complex_step,
+            approx_centered=approx_centered,
+        )
+        # TODO: Case with "not approx_complex_step" is not hit in
+        # tests as of 2017-05-19
+
+        if len(self.fixed_params) > 0:
+            mask = np.ix_(self._free_params_index, self._free_params_index)
+            if len(self.fixed_params) < self.k_params:
+                (tmp, singular_values) = pinv_extended(evaluated_hessian[mask])
+            else:
+                tmp, singular_values = np.nan, [np.nan]
+            neg_cov = np.zeros_like(evaluated_hessian) * np.nan
+            neg_cov[mask] = tmp
+        else:
+            (neg_cov, singular_values) = pinv_extended(evaluated_hessian)
+
+        self.model.update(self.params, transformed=True, includes_fixed=True)
+        if self._rank is None:
+            self._rank = np.linalg.matrix_rank(np.diag(singular_values))
+        return -neg_cov

     @cache_readonly
     def cov_params_approx(self):
@@ -233,7 +502,9 @@ class StateSpaceMLEResults(tsbase.TimeSeriesModelResults):
         (array) The variance / covariance matrix. Computed using the numerical
         Hessian approximated by complex step or finite differences methods.
         """
-        pass
+        return self._cov_params_approx(
+            self._cov_approx_complex_step, self._cov_approx_centered
+        )

     def test_serial_correlation(self, method, lags=None):
         """
@@ -280,11 +551,55 @@ class StateSpaceMLEResults(tsbase.TimeSeriesModelResults):

         Output is nan for any endogenous variable which has missing values.
         """
-        pass
+        if method is None:
+            method = 'ljungbox'
+
+        if self.standardized_forecasts_error is None:
+            raise ValueError('Cannot compute test statistic when standardized'
+                             ' forecast errors have not been computed.')
+
+        if method == 'ljungbox' or method == 'boxpierce':
+            from statsmodels.stats.diagnostic import acorr_ljungbox
+            if hasattr(self, "loglikelihood_burn"):
+                d = np.maximum(self.loglikelihood_burn, self.nobs_diffuse)
+                # This differs from self.nobs_effective because here we want to
+                # exclude exact diffuse periods, whereas self.nobs_effective
+                # only excludes explicitly burned (usually approximate diffuse)
+                # periods.
+                nobs_effective = self.nobs - d
+            else:
+                nobs_effective = self.nobs_effective
+            output = []
+
+            # Default lags for acorr_ljungbox is 40, but may not always have
+            # that many observations
+            if lags is None:
+                seasonal_periods = getattr(self.model, "seasonal_periods", 0)
+                if seasonal_periods:
+                    lags = min(2 * seasonal_periods, nobs_effective // 5)
+                else:
+                    lags = min(10, nobs_effective // 5)
+
+            cols = [2, 3] if method == 'boxpierce' else [0, 1]
+            for i in range(self.model.k_endog):
+                if hasattr(self, "filter_results"):
+                    x = self.filter_results.standardized_forecasts_error[i][d:]
+                else:
+                    x = self.standardized_forecasts_error
+                results = acorr_ljungbox(
+                    x, lags=lags, boxpierce=(method == 'boxpierce')
+                )
+                output.append(np.asarray(results)[:, cols].T)
+
+            output = np.c_[output]
+        else:
+            raise NotImplementedError('Invalid serial correlation test'
+                                      ' method.')
+        return output

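For the Ljung-Box branch above, `acorr_ljungbox` returns the statistic and p-value per lag (plus the Box-Pierce pair when requested); the method stacks columns 0-1 or 2-3 accordingly. A standalone sketch, assuming a recent statsmodels where the function returns a DataFrame:

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

resid = np.random.default_rng(0).standard_normal(200)
lb = acorr_ljungbox(resid, lags=10, boxpierce=True)
# columns: lb_stat, lb_pvalue, bp_stat, bp_pvalue
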
     def test_heteroskedasticity(self, method, alternative='two-sided',
-        use_f=True):
-        """
+                                use_f=True):
+        r"""
         Test for heteroskedasticity of standardized residuals

         Tests whether the sum-of-squares in the first third of the sample is
@@ -340,8 +655,8 @@ class StateSpaceMLEResults(tsbase.TimeSeriesModelResults):

         .. math::

-            H(h) = \\sum_{t=T-h+1}^T  \\tilde v_t^2
-            \\Bigg / \\sum_{t=d+1}^{d+1+h} \\tilde v_t^2
+            H(h) = \sum_{t=T-h+1}^T  \tilde v_t^2
+            \Bigg / \sum_{t=d+1}^{d+1+h} \tilde v_t^2

         where :math:`d` = max(loglikelihood_burn, nobs_diffuse) (usually
         corresponding to diffuse initialization under either the approximate
@@ -349,7 +664,7 @@ class StateSpaceMLEResults(tsbase.TimeSeriesModelResults):

         This statistic can be tested against an :math:`F(h,h)` distribution.
         Alternatively, :math:`h H(h)` is asymptotically distributed according
-        to :math:`\\chi_h^2`; this second test can be applied by passing
+        to :math:`\chi_h^2`; this second test can be applied by passing
         `asymptotic=True` as an argument.

         See section 5.4 of [1]_ for the above formula and discussion, as well
@@ -364,7 +679,101 @@ class StateSpaceMLEResults(tsbase.TimeSeriesModelResults):
         .. [1] Harvey, Andrew C. 1990. *Forecasting, Structural Time Series*
                *Models and the Kalman Filter.* Cambridge University Press.
         """
-        pass
+        if method is None:
+            method = 'breakvar'
+
+        if self.standardized_forecasts_error is None:
+            raise ValueError('Cannot compute test statistic when standardized'
+                             ' forecast errors have not been computed.')
+
+        if method == 'breakvar':
+            # Store some values
+            if hasattr(self, "filter_results"):
+                squared_resid = (
+                    self.filter_results.standardized_forecasts_error**2
+                )
+                d = np.maximum(self.loglikelihood_burn, self.nobs_diffuse)
+                # This differs from self.nobs_effective because here we want to
+                # exclude exact diffuse periods, whereas self.nobs_effective
+                # only excludes explicitly burned (usually approximate diffuse)
+                # periods.
+                nobs_effective = self.nobs - d
+            else:
+                squared_resid = self.standardized_forecasts_error**2
+                if squared_resid.ndim == 1:
+                    squared_resid = np.asarray(squared_resid)
+                    squared_resid = squared_resid[np.newaxis, :]
+                nobs_effective = self.nobs_effective
+                d = 0
+            squared_resid = np.asarray(squared_resid)
+
+            test_statistics = []
+            p_values = []
+            for i in range(self.model.k_endog):
+                h = int(np.round(nobs_effective / 3))
+                numer_resid = squared_resid[i, -h:]
+                numer_resid = numer_resid[~np.isnan(numer_resid)]
+                numer_dof = len(numer_resid)
+
+                denom_resid = squared_resid[i, d:d + h]
+                denom_resid = denom_resid[~np.isnan(denom_resid)]
+                denom_dof = len(denom_resid)
+
+                if numer_dof < 2:
+                    warnings.warn('Early subset of data for variable %d'
+                                  '  has too few non-missing observations to'
+                                  ' calculate test statistic.' % i,
+                                  stacklevel=2,
+                                  )
+                    numer_resid = np.nan
+                if denom_dof < 2:
+                    warnings.warn('Later subset of data for variable %d'
+                                  '  has too few non-missing observations to'
+                                  ' calculate test statistic.' % i,
+                                  stacklevel=2,
+                                  )
+                    denom_resid = np.nan
+
+                test_statistic = np.sum(numer_resid) / np.sum(denom_resid)
+
+                # Setup functions to calculate the p-values
+                if use_f:
+                    from scipy.stats import f
+                    pval_lower = lambda test_statistics: f.cdf(  # noqa:E731
+                        test_statistics, numer_dof, denom_dof)
+                    pval_upper = lambda test_statistics: f.sf(  # noqa:E731
+                        test_statistics, numer_dof, denom_dof)
+                else:
+                    from scipy.stats import chi2
+                    pval_lower = lambda test_statistics: chi2.cdf(  # noqa:E731
+                        numer_dof * test_statistics, denom_dof)
+                    pval_upper = lambda test_statistics: chi2.sf(  # noqa:E731
+                        numer_dof * test_statistics, denom_dof)
+
+                # Calculate the one- or two-sided p-values
+                alternative = alternative.lower()
+                if alternative in ['i', 'inc', 'increasing']:
+                    p_value = pval_upper(test_statistic)
+                elif alternative in ['d', 'dec', 'decreasing']:
+                    test_statistic = 1. / test_statistic
+                    p_value = pval_upper(test_statistic)
+                elif alternative in ['2', '2-sided', 'two-sided']:
+                    p_value = 2 * np.minimum(
+                        pval_lower(test_statistic),
+                        pval_upper(test_statistic)
+                    )
+                else:
+                    raise ValueError('Invalid alternative.')
+
+                test_statistics.append(test_statistic)
+                p_values.append(p_value)
+
+            output = np.c_[test_statistics, p_values]
+        else:
+            raise NotImplementedError('Invalid heteroskedasticity test'
+                                      ' method.')
+
+        return output

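The break-variance test above compares the sum of squared standardized residuals over the last third of the sample with the first third and refers the ratio to an F(h, h) distribution. A compact sketch for a single series without missing values:

import numpy as np
from scipy.stats import f

def breakvar(std_resid, d=0):
    resid2 = np.asarray(std_resid) ** 2
    h = int(np.round((resid2.shape[0] - d) / 3))
    stat = resid2[-h:].sum() / resid2[d:d + h].sum()        # H(h)
    pvalue = 2 * min(f.cdf(stat, h, h), f.sf(stat, h, h))   # two-sided
    return stat, pvalue
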
     def test_normality(self, method):
         """
@@ -394,10 +803,42 @@ class StateSpaceMLEResults(tsbase.TimeSeriesModelResults):
         standardized residuals excluding those corresponding to missing
         observations.
         """
-        pass
+        if method is None:
+            method = 'jarquebera'
+
+        if self.standardized_forecasts_error is None:
+            raise ValueError('Cannot compute test statistic when standardized'
+                             ' forecast errors have not been computed.')
+
+        if method == 'jarquebera':
+            from statsmodels.stats.stattools import jarque_bera
+            if hasattr(self, "loglikelihood_burn"):
+                d = np.maximum(self.loglikelihood_burn, self.nobs_diffuse)
+            else:
+                d = 0
+            output = []
+            for i in range(self.model.k_endog):
+                if hasattr(self, "fiter_results"):
+                    resid = self.filter_results.standardized_forecasts_error[
+                        i, d:
+                    ]
+                else:
+                    resid = self.standardized_forecasts_error
+                mask = ~np.isnan(resid)
+                output.append(jarque_bera(resid[mask]))
+        else:
+            raise NotImplementedError('Invalid normality test method.')
+
+        return np.array(output)

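`test_normality` delegates to `jarque_bera`, which returns the statistic, its p-value, skewness, and kurtosis; these four values feed the diagnostics table in `summary`. A minimal sketch:

import numpy as np
from statsmodels.stats.stattools import jarque_bera

resid = np.random.default_rng(1).standard_normal(500)
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(resid)
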
-    def summary(self, alpha=0.05, start=None, title=None, model_name=None,
-        display_params=True):
+    def summary(
+        self,
+        alpha=0.05,
+        start=None,
+        title=None,
+        model_name=None,
+        display_params=True,
+    ):
         """
         Summarize the Model

@@ -420,4 +861,130 @@ class StateSpaceMLEResults(tsbase.TimeSeriesModelResults):
         --------
         statsmodels.iolib.summary.Summary
         """
-        pass
+        from statsmodels.iolib.summary import Summary
+
+        # Model specification results
+        model = self.model
+        if title is None:
+            title = "Statespace Model Results"
+
+        if start is None:
+            start = 0
+        if self.model._index_dates:
+            ix = self.model._index
+            d = ix[start]
+            sample = ["%02d-%02d-%02d" % (d.month, d.day, d.year)]
+            d = ix[-1]
+            sample += ["- " + "%02d-%02d-%02d" % (d.month, d.day, d.year)]
+        else:
+            sample = [str(start), " - " + str(self.nobs)]
+
+        # Standardize the model name as a list of str
+        if model_name is None:
+            model_name = model.__class__.__name__
+
+        # Diagnostic tests results
+        try:
+            het = self.test_heteroskedasticity(method="breakvar")
+        except Exception:  # FIXME: catch something specific
+            het = np.array([[np.nan] * 2])
+        try:
+            lb = self.test_serial_correlation(method="ljungbox")
+        except Exception:  # FIXME: catch something specific
+            lb = np.array([[np.nan] * 2]).reshape(1, 2, 1)
+        try:
+            jb = self.test_normality(method="jarquebera")
+        except Exception:  # FIXME: catch something specific
+            jb = np.array([[np.nan] * 4])
+
+        # Create the tables
+        if not isinstance(model_name, list):
+            model_name = [model_name]
+
+        top_left = [("Dep. Variable:", None)]
+        top_left.append(("Model:", [model_name[0]]))
+        for i in range(1, len(model_name)):
+            top_left.append(("", ["+ " + model_name[i]]))
+        top_left += [
+            ("Date:", None),
+            ("Time:", None),
+            ("Sample:", [sample[0]]),
+            ("", [sample[1]]),
+        ]
+
+        top_right = [
+            ("No. Observations:", [self.nobs]),
+            ("Log Likelihood", ["%#5.3f" % self.llf]),
+        ]
+        if hasattr(self, "rsquared"):
+            top_right.append(("R-squared:", ["%#8.3f" % self.rsquared]))
+        top_right += [
+            ("AIC", ["%#5.3f" % self.aic]),
+            ("BIC", ["%#5.3f" % self.bic]),
+            ("HQIC", ["%#5.3f" % self.hqic]),
+        ]
+
+        if hasattr(self, "filter_results"):
+            if (
+                    self.filter_results is not None
+                    and self.filter_results.filter_concentrated
+            ):
+                top_right.append(("Scale", ["%#5.3f" % self.scale]))
+        else:
+            top_right.append(("Scale", ["%#5.3f" % self.scale]))
+
+        if hasattr(self, "cov_type"):
+            top_left.append(("Covariance Type:", [self.cov_type]))
+
+        format_str = lambda array: [  # noqa:E731
+            ", ".join(["{0:.2f}".format(i) for i in array])
+        ]
+        diagn_left = [
+            ("Ljung-Box (Q):", format_str(lb[:, 0, -1])),
+            ("Prob(Q):", format_str(lb[:, 1, -1])),
+            ("Heteroskedasticity (H):", format_str(het[:, 0])),
+            ("Prob(H) (two-sided):", format_str(het[:, 1])),
+        ]
+
+        diagn_right = [
+            ("Jarque-Bera (JB):", format_str(jb[:, 0])),
+            ("Prob(JB):", format_str(jb[:, 1])),
+            ("Skew:", format_str(jb[:, 2])),
+            ("Kurtosis:", format_str(jb[:, 3])),
+        ]
+
+        summary = Summary()
+        summary.add_table_2cols(
+            self, gleft=top_left, gright=top_right, title=title
+        )
+        if len(self.params) > 0 and display_params:
+            summary.add_table_params(
+                self, alpha=alpha, xname=self.param_names, use_t=False
+            )
+        summary.add_table_2cols(
+            self, gleft=diagn_left, gright=diagn_right, title=""
+        )
+
+        # Add warnings/notes, added to text format only
+        etext = []
+        if hasattr(self, "cov_type") and "description" in self.cov_kwds:
+            etext.append(self.cov_kwds["description"])
+        if self._rank < (len(self.params) - len(self.fixed_params)):
+            cov_params = self.cov_params()
+            if len(self.fixed_params) > 0:
+                mask = np.ix_(self._free_params_index, self._free_params_index)
+                cov_params = cov_params[mask]
+            etext.append(
+                "Covariance matrix is singular or near-singular,"
+                " with condition number %6.3g. Standard errors may be"
+                " unstable." % _safe_cond(cov_params)
+            )
+
+        if etext:
+            etext = [
+                "[{0}] {1}".format(i + 1, text) for i, text in enumerate(etext)
+            ]
+            etext.insert(0, "Warnings:")
+            summary.add_extra_txt(etext)
+
+        return summary
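The diagnostics rows above are rendered by the small ``format_str`` helper, which joins each vector of per-equation statistics into one comma-separated string with two decimals; for illustrative inputs:

    >>> format_str([0.1234, 5.6789])
    ['0.12, 5.68']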
diff --git a/statsmodels/tsa/exponential_smoothing/ets.py b/statsmodels/tsa/exponential_smoothing/ets.py
index 6e83a1cbf..ca354b057 100644
--- a/statsmodels/tsa/exponential_smoothing/ets.py
+++ b/statsmodels/tsa/exponential_smoothing/ets.py
@@ -1,4 +1,4 @@
-"""
+r"""
 ETS models for time series analysis.

 The ETS models are a family of time series models. They can be seen as a
@@ -12,12 +12,12 @@ and a seasonality type (S; additive or multiplicative or none).
 The following gives a very short summary, a more thorough introduction can be
 found in [1]_.

-Denote with :math:`\\circ_b` the trend operation (addition or
-multiplication), with :math:`\\circ_d` the operation linking trend and dampening
-factor :math:`\\phi` (multiplication if trend is additive, power if trend is
-multiplicative), and with :math:`\\circ_s` the seasonality operation (addition
+Denote with :math:`\circ_b` the trend operation (addition or
+multiplication), with :math:`\circ_d` the operation linking trend and dampening
+factor :math:`\phi` (multiplication if trend is additive, power if trend is
+multiplicative), and with :math:`\circ_s` the seasonality operation (addition
 or multiplication).
-Furthermore, let :math:`\\ominus` be the respective inverse operation
+Furthermore, let :math:`\ominus` be the respective inverse operation
 (subtraction or division).

 With this, it is possible to formulate the ETS models as a forecast equation
@@ -26,18 +26,18 @@ latter are used to update the internal state.

 .. math::

-    \\hat{y}_{t|t-1} &= (l_{t-1} \\circ_b (b_{t-1}\\circ_d \\phi))\\circ_s s_{t-m}\\\\
-    l_{t} &= \\alpha (y_{t} \\ominus_s s_{t-m})
-             + (1 - \\alpha) (l_{t-1} \\circ_b (b_{t-1} \\circ_d \\phi))\\\\
-    b_{t} &= \\beta/\\alpha (l_{t} \\ominus_b l_{t-1})
-             + (1 - \\beta/\\alpha) b_{t-1}\\\\
-    s_{t} &= \\gamma (y_t \\ominus_s (l_{t-1} \\circ_b (b_{t-1}\\circ_d\\phi))
-             + (1 - \\gamma) s_{t-m}
+    \hat{y}_{t|t-1} &= (l_{t-1} \circ_b (b_{t-1}\circ_d \phi))\circ_s s_{t-m}\\
+    l_{t} &= \alpha (y_{t} \ominus_s s_{t-m})
+             + (1 - \alpha) (l_{t-1} \circ_b (b_{t-1} \circ_d \phi))\\
+    b_{t} &= \beta/\alpha (l_{t} \ominus_b l_{t-1})
+             + (1 - \beta/\alpha) b_{t-1}\\
+    s_{t} &= \gamma (y_t \ominus_s (l_{t-1} \circ_b (b_{t-1}\circ_d\phi))
+             + (1 - \gamma) s_{t-m}

 The notation here follows [1]_; :math:`l_t` denotes the level at time
 :math:`t`, `b_t` the trend, and `s_t` the seasonal component. :math:`m` is the
-number of seasonal periods, and :math:`\\phi` a trend damping factor.
-The parameters :math:`\\alpha, \\beta, \\gamma` are the smoothing parameters,
+number of seasonal periods, and :math:`\phi` a trend damping factor.
+The parameters :math:`\alpha, \beta, \gamma` are the smoothing parameters,
 which are called ``smoothing_level``, ``smoothing_trend``, and
 ``smoothing_seasonal``, respectively.

@@ -51,86 +51,86 @@ additive error model,

 .. math::

-    y_t = \\hat{y}_{t|t-1} + e_t,
+    y_t = \hat{y}_{t|t-1} + e_t,

 in the multiplicative error model,

 .. math::

-    y_t = \\hat{y}_{t|t-1}\\cdot (1 + e_t).
+    y_t = \hat{y}_{t|t-1}\cdot (1 + e_t).

 Using these error models, it is possible to formulate state space equations for
 the ETS models:

 .. math::

-   y_t &= Y_t + \\eta \\cdot e_t\\\\
-   l_t &= L_t + \\alpha \\cdot (M_e \\cdot L_t + \\kappa_l) \\cdot e_t\\\\
-   b_t &= B_t + \\beta \\cdot (M_e \\cdot B_t + \\kappa_b) \\cdot e_t\\\\
-   s_t &= S_t + \\gamma \\cdot (M_e \\cdot S_t+\\kappa_s)\\cdot e_t\\\\
+   y_t &= Y_t + \eta \cdot e_t\\
+   l_t &= L_t + \alpha \cdot (M_e \cdot L_t + \kappa_l) \cdot e_t\\
+   b_t &= B_t + \beta \cdot (M_e \cdot B_t + \kappa_b) \cdot e_t\\
+   s_t &= S_t + \gamma \cdot (M_e \cdot S_t+\kappa_s)\cdot e_t\\

 with

 .. math::

-   B_t &= b_{t-1} \\circ_d \\phi\\\\
-   L_t &= l_{t-1} \\circ_b B_t\\\\
-   S_t &= s_{t-m}\\\\
-   Y_t &= L_t \\circ_s S_t,
+   B_t &= b_{t-1} \circ_d \phi\\
+   L_t &= l_{t-1} \circ_b B_t\\
+   S_t &= s_{t-m}\\
+   Y_t &= L_t \circ_s S_t,

 and

 .. math::

-   \\eta &= \\begin{cases}
-               Y_t\\quad\\text{if error is multiplicative}\\\\
-               1\\quad\\text{else}
-           \\end{cases}\\\\
-   M_e &= \\begin{cases}
-               1\\quad\\text{if error is multiplicative}\\\\
-               0\\quad\\text{else}
-           \\end{cases}\\\\
+   \eta &= \begin{cases}
+               Y_t\quad\text{if error is multiplicative}\\
+               1\quad\text{else}
+           \end{cases}\\
+   M_e &= \begin{cases}
+               1\quad\text{if error is multiplicative}\\
+               0\quad\text{else}
+           \end{cases}\\

 and, when using the additive error model,

 .. math::

-   \\kappa_l &= \\begin{cases}
-               \\frac{1}{S_t}\\quad
-               \\text{if seasonality is multiplicative}\\\\
-               1\\quad\\text{else}
-           \\end{cases}\\\\
-   \\kappa_b &= \\begin{cases}
-               \\frac{\\kappa_l}{l_{t-1}}\\quad
-               \\text{if trend is multiplicative}\\\\
-               \\kappa_l\\quad\\text{else}
-           \\end{cases}\\\\
-   \\kappa_s &= \\begin{cases}
-               \\frac{1}{L_t}\\quad\\text{if seasonality is multiplicative}\\\\
-               1\\quad\\text{else}
-           \\end{cases}
+   \kappa_l &= \begin{cases}
+               \frac{1}{S_t}\quad
+               \text{if seasonality is multiplicative}\\
+               1\quad\text{else}
+           \end{cases}\\
+   \kappa_b &= \begin{cases}
+               \frac{\kappa_l}{l_{t-1}}\quad
+               \text{if trend is multiplicative}\\
+               \kappa_l\quad\text{else}
+           \end{cases}\\
+   \kappa_s &= \begin{cases}
+               \frac{1}{L_t}\quad\text{if seasonality is multiplicative}\\
+               1\quad\text{else}
+           \end{cases}

 When using the multiplicative error model

 .. math::

-   \\kappa_l &= \\begin{cases}
-               0\\quad
-               \\text{if seasonality is multiplicative}\\\\
-               S_t\\quad\\text{else}
-           \\end{cases}\\\\
-   \\kappa_b &= \\begin{cases}
-               \\frac{\\kappa_l}{l_{t-1}}\\quad
-               \\text{if trend is multiplicative}\\\\
-               \\kappa_l + l_{t-1}\\quad\\text{else}
-           \\end{cases}\\\\
-   \\kappa_s &= \\begin{cases}
-               0\\quad\\text{if seasonality is multiplicative}\\\\
-               L_t\\quad\\text{else}
-           \\end{cases}
-
-When fitting an ETS model, the parameters :math:`\\alpha, \\beta`, \\gamma,
-\\phi` and the initial states `l_{-1}, b_{-1}, s_{-1}, \\ldots, s_{-m}` are
+   \kappa_l &= \begin{cases}
+               0\quad
+               \text{if seasonality is multiplicative}\\
+               S_t\quad\text{else}
+           \end{cases}\\
+   \kappa_b &= \begin{cases}
+               \frac{\kappa_l}{l_{t-1}}\quad
+               \text{if trend is multiplicative}\\
+               \kappa_l + l_{t-1}\quad\text{else}
+           \end{cases}\\
+   \kappa_s &= \begin{cases}
+               0\quad\text{if seasonality is multiplicative}\\
+               L_t\quad\text{else}
+           \end{cases}
+
+When fitting an ETS model, the parameters :math:`\alpha, \beta, \gamma, \phi`
+and the initial states :math:`l_{-1}, b_{-1}, s_{-1}, \ldots, s_{-m}` are
 selected as maximizers of log likelihood.

 References
@@ -139,13 +139,16 @@ References
    principles and practice*, 3rd edition, OTexts: Melbourne,
    Australia. OTexts.com/fpp3. Accessed on April 19th 2020.
 """
+
 from collections import OrderedDict
 import contextlib
 import datetime as dt
+
 import numpy as np
 import pandas as pd
 from scipy.stats import norm, rv_continuous, rv_discrete
 from scipy.stats.distributions import rv_frozen
+
 from statsmodels.base.covtype import descriptions
 import statsmodels.base.wrapper as wrap
 from statsmodels.iolib.summary import forg
@@ -153,16 +156,59 @@ from statsmodels.iolib.table import SimpleTable
 from statsmodels.iolib.tableformatting import fmt_params
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.tools import Bunch
-from statsmodels.tools.validation import array_like, bool_like, int_like, string_like
+from statsmodels.tools.validation import (
+    array_like,
+    bool_like,
+    int_like,
+    string_like,
+)
 import statsmodels.tsa.base.tsa_model as tsbase
 from statsmodels.tsa.exponential_smoothing import base
 import statsmodels.tsa.exponential_smoothing._ets_smooth as smooth
-from statsmodels.tsa.exponential_smoothing.initialization import _initialization_simple, _initialization_heuristic
+from statsmodels.tsa.exponential_smoothing.initialization import (
+    _initialization_simple,
+    _initialization_heuristic,
+)
 from statsmodels.tsa.tsatools import freq_to_period

+# Implementation details:
+
+# * The smoothing equations are implemented only for models having all
+#   components (trend, dampening, seasonality). When using other models, the
+#   respective parameters (smoothing and initial parameters) are set to values
+#   that lead to the reduced model (often zero).
+#   The internal model is needed for smoothing (called from fit and loglike),
+#   forecasts, and simulations.
+# * Somewhat related to above: There are 2 sets of parameters: model/external
+#   params, and internal params.
+#   - model params are all parameters necessary for a model, and are for
+#     example passed as argument to the likelihood function or as start_params
+#     to fit
+#   - internal params are what is used internally in the smoothing equations
+# * Regarding fitting, bounds, fixing parameters, and internal parameters, the
+#   overall workflow is the following:
+#   - get start parameters in the form of external parameters (includes fixed
+#     parameters)
+#   - transform external parameters to internal parameters, bounding all that
+#     are missing -> now we have some missing parameters, but potentially also
+#     some user-specified bounds
+#   - set bounds for fixed parameters
+#   - make sure that starting parameters are within bounds
+#   - set up the constraint bounds and function
+# * Since the traditional bounds are nonlinear for beta and gamma, if no bounds
+#   are given, we internally use beta_star and gamma_star for fitting
+# * When estimating initial level and initial seasonal values, one of them has
+#   to be removed in order to have a well posed problem. I am solving this by
+#   fixing the last initial seasonal value to 0 (for additive seasonality) or 1
+#   (for multiplicative seasonality).
+#   For the additive models, this means I have to subtract the last initial
+#   seasonal value from all initial seasonal values and add it to the initial
+#   level; for the multiplicative models I do the same with division and
+#   multiplication
+
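As a small numeric illustration of that last point for the additive case (hypothetical numbers): with an initial level of 10 and initial seasonal values [2., -1., 3.], the reparameterization pins the last seasonal value to zero without changing the implied model:

    import numpy as np

    level = 10.0
    seasonal = np.array([2.0, -1.0, 3.0])
    level += seasonal[-1]           # 13.0
    seasonal -= seasonal[-1]        # [-1., -4., 0.]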

 class ETSModel(base.StateSpaceMLEModel):
-    """
+    r"""
     ETS models.

     Parameters
@@ -234,11 +280,11 @@ class ETSModel(base.StateSpaceMLEModel):
     The following gives a very short summary, a more thorough introduction can
     be found in [1]_.

-    Denote with :math:`\\circ_b` the trend operation (addition or
-    multiplication), with :math:`\\circ_d` the operation linking trend and
-    dampening factor :math:`\\phi` (multiplication if trend is additive, power
-    if trend is multiplicative), and with :math:`\\circ_s` the seasonality
-    operation (addition or multiplication). Furthermore, let :math:`\\ominus`
+    Denote with :math:`\circ_b` the trend operation (addition or
+    multiplication), with :math:`\circ_d` the operation linking trend and
+    dampening factor :math:`\phi` (multiplication if trend is additive, power
+    if trend is multiplicative), and with :math:`\circ_s` the seasonality
+    operation (addition or multiplication). Furthermore, let :math:`\ominus`
     be the respective inverse operation (subtraction or division).

     With this, it is possible to formulate the ETS models as a forecast
@@ -247,19 +293,19 @@ class ETSModel(base.StateSpaceMLEModel):

     .. math::

-        \\hat{y}_{t|t-1} &= (l_{t-1} \\circ_b (b_{t-1}\\circ_d \\phi))
-                           \\circ_s s_{t-m}\\\\
-        l_{t} &= \\alpha (y_{t} \\ominus_s s_{t-m})
-                 + (1 - \\alpha) (l_{t-1} \\circ_b (b_{t-1} \\circ_d \\phi))\\\\
-        b_{t} &= \\beta/\\alpha (l_{t} \\ominus_b l_{t-1})
-                 + (1 - \\beta/\\alpha) b_{t-1}\\\\
-        s_{t} &= \\gamma (y_t \\ominus_s (l_{t-1} \\circ_b (b_{t-1}\\circ_d\\phi))
-                 + (1 - \\gamma) s_{t-m}
+        \hat{y}_{t|t-1} &= (l_{t-1} \circ_b (b_{t-1}\circ_d \phi))
+                           \circ_s s_{t-m}\\
+        l_{t} &= \alpha (y_{t} \ominus_s s_{t-m})
+                 + (1 - \alpha) (l_{t-1} \circ_b (b_{t-1} \circ_d \phi))\\
+        b_{t} &= \beta/\alpha (l_{t} \ominus_b l_{t-1})
+                 + (1 - \beta/\alpha) b_{t-1}\\
+        s_{t} &= \gamma (y_t \ominus_s (l_{t-1} \circ_b (b_{t-1}\circ_d\phi))
+                 + (1 - \gamma) s_{t-m}

     The notation here follows [1]_; :math:`l_t` denotes the level at time
     :math:`t`, `b_t` the trend, and `s_t` the seasonal component. :math:`m`
-    is the number of seasonal periods, and :math:`\\phi` a trend damping
-    factor. The parameters :math:`\\alpha, \\beta, \\gamma` are the smoothing
+    is the number of seasonal periods, and :math:`\phi` a trend damping
+    factor. The parameters :math:`\alpha, \beta, \gamma` are the smoothing
     parameters, which are called ``smoothing_level``, ``smoothing_trend``, and
     ``smoothing_seasonal``, respectively.

@@ -273,86 +319,86 @@ class ETSModel(base.StateSpaceMLEModel):

     .. math::

-        y_t = \\hat{y}_{t|t-1} + e_t,
+        y_t = \hat{y}_{t|t-1} + e_t,

     in the multiplicative error model,

     .. math::

-        y_t = \\hat{y}_{t|t-1}\\cdot (1 + e_t).
+        y_t = \hat{y}_{t|t-1}\cdot (1 + e_t).

     Using these error models, it is possible to formulate state space
     equations for the ETS models:

     .. math::

-       y_t &= Y_t + \\eta \\cdot e_t\\\\
-       l_t &= L_t + \\alpha \\cdot (M_e \\cdot L_t + \\kappa_l) \\cdot e_t\\\\
-       b_t &= B_t + \\beta \\cdot (M_e \\cdot B_t + \\kappa_b) \\cdot e_t\\\\
-       s_t &= S_t + \\gamma \\cdot (M_e \\cdot S_t+\\kappa_s)\\cdot e_t\\\\
+       y_t &= Y_t + \eta \cdot e_t\\
+       l_t &= L_t + \alpha \cdot (M_e \cdot L_t + \kappa_l) \cdot e_t\\
+       b_t &= B_t + \beta \cdot (M_e \cdot B_t + \kappa_b) \cdot e_t\\
+       s_t &= S_t + \gamma \cdot (M_e \cdot S_t+\kappa_s)\cdot e_t\\

     with

     .. math::

-       B_t &= b_{t-1} \\circ_d \\phi\\\\
-       L_t &= l_{t-1} \\circ_b B_t\\\\
-       S_t &= s_{t-m}\\\\
-       Y_t &= L_t \\circ_s S_t,
+       B_t &= b_{t-1} \circ_d \phi\\
+       L_t &= l_{t-1} \circ_b B_t\\
+       S_t &= s_{t-m}\\
+       Y_t &= L_t \circ_s S_t,

     and

     .. math::

-       \\eta &= \\begin{cases}
-                   Y_t\\quad\\text{if error is multiplicative}\\\\
-                   1\\quad\\text{else}
-               \\end{cases}\\\\
-       M_e &= \\begin{cases}
-                   1\\quad\\text{if error is multiplicative}\\\\
-                   0\\quad\\text{else}
-               \\end{cases}\\\\
+       \eta &= \begin{cases}
+                   Y_t\quad\text{if error is multiplicative}\\
+                   1\quad\text{else}
+               \end{cases}\\
+       M_e &= \begin{cases}
+                   1\quad\text{if error is multiplicative}\\
+                   0\quad\text{else}
+               \end{cases}\\

     and, when using the additive error model,

     .. math::

-       \\kappa_l &= \\begin{cases}
-                   \\frac{1}{S_t}\\quad
-                   \\text{if seasonality is multiplicative}\\\\
-                   1\\quad\\text{else}
-               \\end{cases}\\\\
-       \\kappa_b &= \\begin{cases}
-                   \\frac{\\kappa_l}{l_{t-1}}\\quad
-                   \\text{if trend is multiplicative}\\\\
-                   \\kappa_l\\quad\\text{else}
-               \\end{cases}\\\\
-       \\kappa_s &= \\begin{cases}
-                   \\frac{1}{L_t}\\quad\\text{if seasonality is multiplicative}\\\\
-                   1\\quad\\text{else}
-               \\end{cases}
+       \kappa_l &= \begin{cases}
+                   \frac{1}{S_t}\quad
+                   \text{if seasonality is multiplicative}\\
+                   1\quad\text{else}
+               \end{cases}\\
+       \kappa_b &= \begin{cases}
+                   \frac{\kappa_l}{l_{t-1}}\quad
+                   \text{if trend is multiplicative}\\
+                   \kappa_l\quad\text{else}
+               \end{cases}\\
+       \kappa_s &= \begin{cases}
+                   \frac{1}{L_t}\quad\text{if seasonality is multiplicative}\\
+                   1\quad\text{else}
+               \end{cases}

     When using the multiplicative error model

     .. math::

-       \\kappa_l &= \\begin{cases}
-                   0\\quad
-                   \\text{if seasonality is multiplicative}\\\\
-                   S_t\\quad\\text{else}
-               \\end{cases}\\\\
-       \\kappa_b &= \\begin{cases}
-                   \\frac{\\kappa_l}{l_{t-1}}\\quad
-                   \\text{if trend is multiplicative}\\\\
-                   \\kappa_l + l_{t-1}\\quad\\text{else}
-               \\end{cases}\\\\
-       \\kappa_s &= \\begin{cases}
-                   0\\quad\\text{if seasonality is multiplicative}\\\\
-                   L_t\\quad\\text{else}
-               \\end{cases}
-
-    When fitting an ETS model, the parameters :math:`\\alpha, \\beta`, \\gamma,
-    \\phi` and the initial states `l_{-1}, b_{-1}, s_{-1}, \\ldots, s_{-m}` are
+       \kappa_l &= \begin{cases}
+                   0\quad
+                   \text{if seasonality is multiplicative}\\
+                   S_t\quad\text{else}
+               \end{cases}\\
+       \kappa_b &= \begin{cases}
+                   \frac{\kappa_l}{l_{t-1}}\quad
+                   \text{if trend is multiplicative}\\
+                   \kappa_l + l_{t-1}\quad\text{else}
+               \end{cases}\\
+       \kappa_s &= \begin{cases}
+                   0\quad\text{if seasonality is multiplicative}\\
+                   L_t\quad\text{else}
+               \end{cases}
+
+    When fitting an ETS model, the parameters :math:`\alpha, \beta, \gamma, \phi`
+    and the initial states :math:`l_{-1}, b_{-1}, s_{-1}, \ldots, s_{-m}` are
     selected as maximizers of log likelihood.

     References
@@ -362,57 +408,109 @@ class ETSModel(base.StateSpaceMLEModel):
        Australia. OTexts.com/fpp3. Accessed on April 19th 2020.
     """

-    def __init__(self, endog, error='add', trend=None, damped_trend=False,
-        seasonal=None, seasonal_periods=None, initialization_method=
-        'estimated', initial_level=None, initial_trend=None,
-        initial_seasonal=None, bounds=None, dates=None, freq=None, missing=
-        'none'):
-        super().__init__(endog, exog=None, dates=dates, freq=freq, missing=
-            missing)
-        options = 'add', 'mul', 'additive', 'multiplicative'
-        self.error = string_like(error, 'error', options=options)[:3]
-        self.trend = string_like(trend, 'trend', options=options, optional=True
-            )
+    def __init__(
+        self,
+        endog,
+        error="add",
+        trend=None,
+        damped_trend=False,
+        seasonal=None,
+        seasonal_periods=None,
+        initialization_method="estimated",
+        initial_level=None,
+        initial_trend=None,
+        initial_seasonal=None,
+        bounds=None,
+        dates=None,
+        freq=None,
+        missing="none",
+    ):
+
+        super().__init__(
+            endog, exog=None, dates=dates, freq=freq, missing=missing
+        )
+
+        # MODEL DEFINITION
+        # ================
+        options = ("add", "mul", "additive", "multiplicative")
+        # take first three letters of option -> either "add" or "mul"
+        self.error = string_like(error, "error", options=options)[:3]
+        self.trend = string_like(
+            trend, "trend", options=options, optional=True
+        )
         if self.trend is not None:
             self.trend = self.trend[:3]
-        self.damped_trend = bool_like(damped_trend, 'damped_trend')
-        self.seasonal = string_like(seasonal, 'seasonal', options=options,
-            optional=True)
+        self.damped_trend = bool_like(damped_trend, "damped_trend")
+        self.seasonal = string_like(
+            seasonal, "seasonal", options=options, optional=True
+        )
         if self.seasonal is not None:
             self.seasonal = self.seasonal[:3]
+
         self.has_trend = self.trend is not None
         self.has_seasonal = self.seasonal is not None
+
         if self.has_seasonal:
-            self.seasonal_periods = int_like(seasonal_periods,
-                'seasonal_periods', optional=True)
+            self.seasonal_periods = int_like(
+                seasonal_periods, "seasonal_periods", optional=True
+            )
             if seasonal_periods is None:
                 self.seasonal_periods = freq_to_period(self._index_freq)
             if self.seasonal_periods <= 1:
-                raise ValueError('seasonal_periods must be larger than 1.')
+                raise ValueError("seasonal_periods must be larger than 1.")
         else:
+            # in case the model has no seasonal component, we internally handle
+            # this as if it had an additive seasonal component with
+            # seasonal_periods=1, but restrict the smoothing parameter to 0 and
+            # set the initial seasonal to 0.
             self.seasonal_periods = 1
-        if np.any(self.endog <= 0) and (self.error == 'mul' or self.trend ==
-            'mul' or self.seasonal == 'mul'):
+
+        # reject invalid models
+        if np.any(self.endog <= 0) and (
+            self.error == "mul"
+            or self.trend == "mul"
+            or self.seasonal == "mul"
+        ):
             raise ValueError(
-                'endog must be strictly positive when using multiplicative error, trend or seasonal components.'
-                )
+                "endog must be strictly positive when using "
+                "multiplicative error, trend or seasonal components."
+            )
         if self.damped_trend and not self.has_trend:
-            raise ValueError('Can only dampen the trend component')
-        self.set_initialization_method(initialization_method, initial_level,
-            initial_trend, initial_seasonal)
+            raise ValueError("Can only dampen the trend component")
+
+        # INITIALIZATION METHOD
+        # =====================
+        self.set_initialization_method(
+            initialization_method,
+            initial_level,
+            initial_trend,
+            initial_seasonal,
+        )
+
+        # BOUNDS
+        # ======
         self.set_bounds(bounds)
-        if self.trend == 'add' or self.trend is None:
-            if self.seasonal == 'add' or self.seasonal is None:
+
+        # SMOOTHER
+        # ========
+        if self.trend == "add" or self.trend is None:
+            if self.seasonal == "add" or self.seasonal is None:
                 self._smoothing_func = smooth._ets_smooth_add_add
             else:
                 self._smoothing_func = smooth._ets_smooth_add_mul
-        elif self.seasonal == 'add' or self.seasonal is None:
-            self._smoothing_func = smooth._ets_smooth_mul_add
         else:
-            self._smoothing_func = smooth._ets_smooth_mul_mul
-
-    def set_initialization_method(self, initialization_method,
-        initial_level=None, initial_trend=None, initial_seasonal=None):
+            if self.seasonal == "add" or self.seasonal is None:
+                self._smoothing_func = smooth._ets_smooth_mul_add
+            else:
+                self._smoothing_func = smooth._ets_smooth_mul_mul
+
+    def set_initialization_method(
+        self,
+        initialization_method,
+        initial_level=None,
+        initial_trend=None,
+        initial_seasonal=None,
+    ):
         """
         Sets a new initialization method for the state space model.

@@ -442,7 +540,78 @@ class ETSModel(base.StateSpaceMLEModel):
             The initial seasonal component. An array of length
             `seasonal_periods`. Only used if initialization is 'known'.
         """
-        pass
+        self.initialization_method = string_like(
+            initialization_method,
+            "initialization_method",
+            options=("estimated", "known", "heuristic"),
+        )
+        if self.initialization_method == "known":
+            if initial_level is None:
+                raise ValueError(
+                    "`initial_level` argument must be provided"
+                    ' when initialization method is set to "known".'
+                )
+            if self.has_trend and initial_trend is None:
+                raise ValueError(
+                    "`initial_trend` argument must be provided"
+                    " for models with a trend component when"
+                    ' initialization method is set to "known".'
+                )
+            if self.has_seasonal and initial_seasonal is None:
+                raise ValueError(
+                    "`initial_seasonal` argument must be provided"
+                    " for models with a seasonal component when"
+                    ' initialization method is set to "known".'
+                )
+        elif self.initialization_method == "heuristic":
+            (
+                initial_level,
+                initial_trend,
+                initial_seasonal,
+            ) = _initialization_heuristic(
+                self.endog,
+                trend=self.trend,
+                seasonal=self.seasonal,
+                seasonal_periods=self.seasonal_periods,
+            )
+        elif self.initialization_method == "estimated":
+            if self.nobs < 10 + 2 * (self.seasonal_periods // 2):
+                (
+                    initial_level,
+                    initial_trend,
+                    initial_seasonal,
+                ) = _initialization_simple(
+                    self.endog,
+                    trend=self.trend,
+                    seasonal=self.seasonal,
+                    seasonal_periods=self.seasonal_periods,
+                )
+            else:
+                (
+                    initial_level,
+                    initial_trend,
+                    initial_seasonal,
+                ) = _initialization_heuristic(
+                    self.endog,
+                    trend=self.trend,
+                    seasonal=self.seasonal,
+                    seasonal_periods=self.seasonal_periods,
+                )
+        if not self.has_trend:
+            initial_trend = 0
+        if not self.has_seasonal:
+            initial_seasonal = 0
+        self.initial_level = initial_level
+        self.initial_trend = initial_trend
+        self.initial_seasonal = initial_seasonal
+
+        # we also have to reset the params index dictionaries
+        self._internal_params_index = OrderedDict(
+            zip(self._internal_param_names, np.arange(self._k_params_internal))
+        )
+        self._params_index = OrderedDict(
+            zip(self.param_names, np.arange(self.k_params))
+        )
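A hedged usage sketch of the method above for the 'known' case, assuming a model with a trend component and a 4-period seasonal component (all numbers made up):

    model.set_initialization_method(
        "known",
        initial_level=100.0,
        initial_trend=1.0,
        initial_seasonal=[0.9, 1.1, 1.05, 0.95],
    )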

     def set_bounds(self, bounds):
         """
@@ -463,34 +632,223 @@ class ETSModel(base.StateSpaceMLEModel):
            principles and practice*, 3rd edition, OTexts: Melbourne,
            Australia. OTexts.com/fpp3. Accessed on April 19th 2020.
         """
-        pass
+        if bounds is None:
+            self.bounds = {}
+        else:
+            if not isinstance(bounds, (dict, OrderedDict)):
+                raise ValueError("bounds must be a dictionary")
+            for key in bounds:
+                if key not in self.param_names:
+                    raise ValueError(
+                        f"Invalid key: {key} in bounds dictionary"
+                    )
+                bounds[key] = array_like(
+                    bounds[key], f"bounds[{key}]", shape=(2,)
+                )
+            self.bounds = bounds
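For instance, constraining only the level smoothing parameter while leaving the remaining parameters at the default constraints could look like this (a sketch; keys must be names from ``param_names``):

    model.set_bounds({"smoothing_level": (0.2, 0.8)})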

     @staticmethod
     def prepare_data(data):
         """
         Prepare data for use in the state space representation
         """
-        pass
+        endog = np.array(data.orig_endog, order="C")
+        if endog.ndim != 1:
+            raise ValueError("endog must be 1-dimensional")
+        if endog.dtype != np.double:
+            endog = np.asarray(data.orig_endog, order="C", dtype=float)
+        return endog, None
+
+    @property
+    def nobs_effective(self):
+        return self.nobs
+
+    @property
+    def k_endog(self):
+        return 1
+
+    @property
+    def short_name(self):
+        name = "".join(
+            [
+                str(s)[0].upper()
+                for s in [self.error, self.trend, self.seasonal]
+            ]
+        )
+        if self.damped_trend:
+            name = name[0:2] + "d" + name[2]
+        return name
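For example, an additive-error model with a damped additive trend and multiplicative seasonality is abbreviated "AAdM", while a model with neither trend nor seasonality becomes "ANN" (the "N" coming from ``str(None)``).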
+
+    @property
+    def _param_names(self):
+        param_names = ["smoothing_level"]
+        if self.has_trend:
+            param_names += ["smoothing_trend"]
+        if self.has_seasonal:
+            param_names += ["smoothing_seasonal"]
+        if self.damped_trend:
+            param_names += ["damping_trend"]
+
+        # Initialization
+        if self.initialization_method == "estimated":
+            param_names += ["initial_level"]
+            if self.has_trend:
+                param_names += ["initial_trend"]
+            if self.has_seasonal:
+                param_names += [
+                    f"initial_seasonal.{i}"
+                    for i in range(self.seasonal_periods)
+                ]
+        return param_names
+
+    @property
+    def state_names(self):
+        names = ["level"]
+        if self.has_trend:
+            names += ["trend"]
+        if self.has_seasonal:
+            names += ["seasonal"]
+        return names
+
+    @property
+    def initial_state_names(self):
+        names = ["initial_level"]
+        if self.has_trend:
+            names += ["initial_trend"]
+        if self.has_seasonal:
+            names += [
+                f"initial_seasonal.{i}" for i in range(self.seasonal_periods)
+            ]
+        return names
+
+    @property
+    def _smoothing_param_names(self):
+        return [
+            "smoothing_level",
+            "smoothing_trend",
+            "smoothing_seasonal",
+            "damping_trend",
+        ]
+
+    @property
+    def _internal_initial_state_names(self):
+        param_names = [
+            "initial_level",
+            "initial_trend",
+        ]
+        param_names += [
+            f"initial_seasonal.{i}" for i in range(self.seasonal_periods)
+        ]
+        return param_names
+
+    @property
+    def _internal_param_names(self):
+        return self._smoothing_param_names + self._internal_initial_state_names
+
+    @property
+    def _k_states(self):
+        return 1 + int(self.has_trend) + int(self.has_seasonal)  # level
+
+    @property
+    def _k_states_internal(self):
+        return 2 + self.seasonal_periods
+
+    @property
+    def _k_smoothing_params(self):
+        return self._k_states + int(self.damped_trend)
+
+    @property
+    def _k_initial_states(self):
+        return (
+            1
+            + int(self.has_trend)
+            + int(self.has_seasonal) * self.seasonal_periods
+        )
+
+    @property
+    def k_params(self):
+        k = self._k_smoothing_params
+        if self.initialization_method == "estimated":
+            k += self._k_initial_states
+        return k
+
+    @property
+    def _k_params_internal(self):
+        return 4 + 2 + self.seasonal_periods

     def _internal_params(self, params):
         """
         Converts a parameter array passed from outside to the internally used
         full parameter array.
         """
-        pass
+        # internal params that are not needed are all set to zero, except phi,
+        # which is one
+        internal = np.zeros(self._k_params_internal, dtype=params.dtype)
+        for i, name in enumerate(self.param_names):
+            internal_idx = self._internal_params_index[name]
+            internal[internal_idx] = params[i]
+        if not self.damped_trend:
+            internal[3] = 1  # phi is 4th parameter
+        if self.initialization_method != "estimated":
+            internal[4] = self.initial_level
+            internal[5] = self.initial_trend
+            if np.isscalar(self.initial_seasonal):
+                internal[6:] = self.initial_seasonal
+            else:
+                # See GH 7893
+                internal[6:] = self.initial_seasonal[::-1]
+        return internal
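The internal vector built here always has length 4 + 2 + seasonal_periods with a fixed slot layout, independent of which components the model actually uses (unused slots stay at zero, phi stays at one):

    # index 0  : smoothing_level (alpha)
    # index 1  : smoothing_trend (beta)
    # index 2  : smoothing_seasonal (gamma)
    # index 3  : damping_trend (phi)
    # index 4  : initial_level
    # index 5  : initial_trend
    # index 6+ : initial seasonal values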

     def _model_params(self, internal):
         """
         Converts internal parameters to model parameters
         """
-        pass
+        params = np.empty(self.k_params)
+        for i, name in enumerate(self.param_names):
+            internal_idx = self._internal_params_index[name]
+            params[i] = internal[internal_idx]
+        return params
+
+    @property
+    def _seasonal_index(self):
+        return 1 + int(self.has_trend)
+
+    def _get_states(self, xhat):
+        states = np.empty((self.nobs, self._k_states))
+        all_names = ["level", "trend", "seasonal"]
+        for i, name in enumerate(self.state_names):
+            idx = all_names.index(name)
+            states[:, i] = xhat[:, idx]
+        return states

     def _get_internal_states(self, states, params):
         """
         Converts a state matrix/dataframe to the (nobs, 2+m) matrix used
         internally
         """
-        pass
+        internal_params = self._internal_params(params)
+        if isinstance(states, (pd.Series, pd.DataFrame)):
+            states = states.values
+        internal_states = np.zeros((self.nobs, 2 + self.seasonal_periods))
+        internal_states[:, 0] = states[:, 0]
+        if self.has_trend:
+            internal_states[:, 1] = states[:, 1]
+        if self.has_seasonal:
+            for j in range(self.seasonal_periods):
+                internal_states[j:, 2 + j] = states[
+                    0 : self.nobs - j, self._seasonal_index
+                ]
+                internal_states[0:j, 2 + j] = internal_params[6 : 6 + j][::-1]
+        return internal_states
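In other words, the matrix returned here is (nobs, 2 + m) with m = seasonal_periods and fixed column meanings:

    # column 0     : level l_t
    # column 1     : trend b_t (zero when the model has no trend)
    # column 2 + j : seasonal state lagged by j periods, s_{t-j}; rows 0..j-1
    #                are filled from the initial seasonal parameters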
+
+    @property
+    def _default_start_params(self):
+        return {
+            "smoothing_level": 0.1,
+            "smoothing_trend": 0.01,
+            "smoothing_seasonal": 0.01,
+            "damping_trend": 0.98,
+        }

     @property
     def _start_params(self):
@@ -499,7 +857,29 @@ class ETSModel(base.StateSpaceMLEModel):
         This should not be called directly, but by calling
         ``self.start_params``.
         """
-        pass
+        params = []
+        for p in self._smoothing_param_names:
+            if p in self.param_names:
+                params.append(self._default_start_params[p])
+
+        if self.initialization_method == "estimated":
+            lvl_idx = len(params)
+            params += [self.initial_level]
+            if self.has_trend:
+                params += [self.initial_trend]
+            if self.has_seasonal:
+                # we have to adapt the seasonal values a bit to make sure the
+                # problem is well posed (see implementation notes above)
+                initial_seasonal = self.initial_seasonal
+                if self.seasonal == "mul":
+                    params[lvl_idx] *= initial_seasonal[-1]
+                    initial_seasonal /= initial_seasonal[-1]
+                else:
+                    params[lvl_idx] += initial_seasonal[-1]
+                    initial_seasonal -= initial_seasonal[-1]
+                params += initial_seasonal.tolist()
+
+        return np.array(params)

     def _convert_and_bound_start_params(self, params):
         """
@@ -507,17 +887,80 @@ class ETSModel(base.StateSpaceMLEModel):
         parameters as bounded, sets bounds for fixed parameters, and then makes
         sure that all start parameters are within the specified bounds.
         """
-        pass
-
-    def fit(self, start_params=None, maxiter=1000, full_output=True, disp=
-        True, callback=None, return_params=False, **kwargs):
-        """
+        internal_params = self._internal_params(params)
+        # set bounds for missing and fixed
+        for p in self._internal_param_names:
+            idx = self._internal_params_index[p]
+            if p not in self.param_names:
+                # any missing parameters are set to the value they got from the
+                # call to _internal_params
+                self.bounds[p] = [internal_params[idx]] * 2
+            elif self._has_fixed_params and p in self._fixed_params:
+                self.bounds[p] = [self._fixed_params[p]] * 2
+            # make sure everything is within bounds
+            if p in self.bounds:
+                internal_params[idx] = np.clip(
+                    internal_params[idx]
+                    + 1e-3,  # try not to start on boundary
+                    *self.bounds[p],
+                )
+        return internal_params
+
+    def _setup_bounds(self):
+        # By default, we are using the traditional constraints for the
+        # smoothing parameters if nothing else is specified
+        #
+        #    0 <     alpha     < 1
+        #    0 <   beta/alpha  < 1
+        #    0 < gamma + alpha < 1
+        #  0.8 <      phi      < 0.98
+        #
+        # For initial states, no bounds are the default setting.
+        #
+        # Since the bounds for beta and gamma are not in the simple form of a
+        # constant interval, we will use the parameters beta_star=beta/alpha
+        # and gamma_star=gamma/(1-alpha) during fitting.
+
+        lb = np.zeros(self._k_params_internal) + 1e-4
+        ub = np.ones(self._k_params_internal) - 1e-4
+
+        # other bounds for phi and initial states
+        lb[3], ub[3] = 0.8, 0.98
+        if self.initialization_method == "estimated":
+            lb[4:-1] = -np.inf
+            ub[4:-1] = np.inf
+            # fix the last initial_seasonal to 0 or 1, otherwise the equation
+            # is underdetermined
+            if self.seasonal == "mul":
+                lb[-1], ub[-1] = 1, 1
+            else:
+                lb[-1], ub[-1] = 0, 0
+
+        # set lb and ub for parameters with bounds
+        for p in self._internal_param_names:
+            idx = self._internal_params_index[p]
+            if p in self.bounds:
+                lb[idx], ub[idx] = self.bounds[p]
+
+        return [(lb[i], ub[i]) for i in range(self._k_params_internal)]
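Concretely, with 0 < alpha < 1, the box constraints on the starred parameters used during fitting recover the traditional nonlinear bounds listed above:

    0 < beta_star  = beta / alpha        < 1   <=>   0 < beta  < alpha
    0 < gamma_star = gamma / (1 - alpha) < 1   <=>   0 < gamma < 1 - alpha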
+
+    def fit(
+        self,
+        start_params=None,
+        maxiter=1000,
+        full_output=True,
+        disp=True,
+        callback=None,
+        return_params=False,
+        **kwargs,
+    ):
+        r"""
         Fit an ETS model by maximizing log-likelihood.

-        Log-likelihood is a function of the model parameters :math:`\\alpha,
-        \\beta, \\gamma, \\phi` (depending on the chosen model), and, if
+        Log-likelihood is a function of the model parameters :math:`\alpha,
+        \beta, \gamma, \phi` (depending on the chosen model), and, if
         `initialization_method` was set to `'estimated'` in the constructor,
-        also the initial states :math:`l_{-1}, b_{-1}, s_{-1}, \\ldots, s_{-m}`.
+        also the initial states :math:`l_{-1}, b_{-1}, s_{-1}, \ldots, s_{-m}`.

         The fit is performed using the L-BFGS algorithm.

@@ -530,10 +973,10 @@ class ETSModel(base.StateSpaceMLEModel):
             the parameters in the following order, skipping parameters that do
             not exist in the chosen model.

-            * `smoothing_level` (:math:`\\alpha`)
-            * `smoothing_trend` (:math:`\\beta`)
-            * `smoothing_seasonal` (:math:`\\gamma`)
-            * `damping_trend` (:math:`\\phi`)
+            * `smoothing_level` (:math:`\alpha`)
+            * `smoothing_trend` (:math:`\beta`)
+            * `smoothing_seasonal` (:math:`\gamma`)
+            * `damping_trend` (:math:`\phi`)

             If ``initialization_method`` was set to ``'estimated'`` (the
             default), additionally, the parameters
@@ -566,10 +1009,106 @@ class ETSModel(base.StateSpaceMLEModel):
         -------
         results : ETSResults
         """
-        pass

-    def _loglike_internal(self, params, yhat, xhat, is_fixed=None,
-        fixed_values=None, use_beta_star=False, use_gamma_star=False):
+        if start_params is None:
+            start_params = self.start_params
+        else:
+            start_params = np.asarray(start_params)
+
+        if self._has_fixed_params and len(self._free_params_index) == 0:
+            final_params = np.asarray(list(self._fixed_params.values()))
+            mlefit = Bunch(
+                params=start_params, mle_retvals=None, mle_settings=None
+            )
+        else:
+            internal_start_params = self._convert_and_bound_start_params(
+                start_params
+            )
+            bounds = self._setup_bounds()
+
+            # check if we need to use the starred parameters
+            use_beta_star = "smoothing_trend" not in self.bounds
+            if use_beta_star:
+                internal_start_params[1] /= internal_start_params[0]
+            use_gamma_star = "smoothing_seasonal" not in self.bounds
+            if use_gamma_star:
+                internal_start_params[2] /= 1 - internal_start_params[0]
+
+            # check if we have fixed parameters and remove them from the
+            # parameter vector
+            is_fixed = np.zeros(self._k_params_internal, dtype=int)
+            fixed_values = np.empty_like(internal_start_params)
+            params_without_fixed = []
+            kwargs["bounds"] = []
+            for i in range(self._k_params_internal):
+                if bounds[i][0] == bounds[i][1]:
+                    is_fixed[i] = True
+                    fixed_values[i] = bounds[i][0]
+                else:
+                    params_without_fixed.append(internal_start_params[i])
+                    kwargs["bounds"].append(bounds[i])
+            params_without_fixed = np.asarray(params_without_fixed)
+
+            # pre-allocate memory for smoothing results
+            yhat = np.zeros(self.nobs)
+            xhat = np.zeros((self.nobs, self._k_states_internal))
+
+            kwargs["approx_grad"] = True
+            with self.use_internal_loglike():
+                mlefit = super().fit(
+                    params_without_fixed,
+                    fargs=(
+                        yhat,
+                        xhat,
+                        is_fixed,
+                        fixed_values,
+                        use_beta_star,
+                        use_gamma_star,
+                    ),
+                    method="lbfgs",
+                    maxiter=maxiter,
+                    full_output=full_output,
+                    disp=disp,
+                    callback=callback,
+                    skip_hessian=True,
+                    **kwargs,
+                )
+            # convert params back
+            # first, insert fixed params
+            fitted_params = np.empty_like(internal_start_params)
+            idx_without_fixed = 0
+            for i in range(self._k_params_internal):
+                if is_fixed[i]:
+                    fitted_params[i] = fixed_values[i]
+                else:
+                    fitted_params[i] = mlefit.params[idx_without_fixed]
+                    idx_without_fixed += 1
+
+            if use_beta_star:
+                fitted_params[1] *= fitted_params[0]
+            if use_gamma_star:
+                fitted_params[2] *= 1 - fitted_params[0]
+            final_params = self._model_params(fitted_params)
+
+        if return_params:
+            return final_params
+        else:
+            result = self.smooth(final_params)
+            result.mlefit = mlefit
+            result.mle_retvals = mlefit.mle_retvals
+            result.mle_settings = mlefit.mle_settings
+            return result
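A minimal end-to-end sketch of the workflow implemented above, on made-up data (the series, parameters, and seed are arbitrary; a monthly index is used only so the example is fully self-contained):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.exponential_smoothing.ets import ETSModel

    idx = pd.date_range("2000-01-01", periods=48, freq="MS")
    y = pd.Series(
        10 + 0.2 * np.arange(48) + np.random.default_rng(0).normal(size=48),
        index=idx,
    )

    model = ETSModel(y, error="add", trend="add")
    res = model.fit(disp=False)
    print(res.summary())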
+
+    def _loglike_internal(
+        self,
+        params,
+        yhat,
+        xhat,
+        is_fixed=None,
+        fixed_values=None,
+        use_beta_star=False,
+        use_gamma_star=False,
+    ):
         """
         Log-likelihood function to be called from fit to avoid reallocation of
         memory.
@@ -597,10 +1136,50 @@ class ETSModel(base.StateSpaceMLEModel):
         use_gamma_star : boolean
             Whether to internally use gamma_star as parameter
         """
-        pass
+        if np.iscomplexobj(params):
+            data = np.asarray(self.endog, dtype=complex)
+        else:
+            data = self.endog
+
+        if is_fixed is None:
+            is_fixed = np.zeros(self._k_params_internal, dtype=int)
+            fixed_values = np.empty(
+                self._k_params_internal, dtype=params.dtype
+            )
+
+        self._smoothing_func(
+            params,
+            data,
+            yhat,
+            xhat,
+            is_fixed,
+            fixed_values,
+            use_beta_star,
+            use_gamma_star,
+        )
+        res = self._residuals(yhat, data=data)
+        logL = -self.nobs / 2 * (np.log(2 * np.pi * np.mean(res ** 2)) + 1)
+        if self.error == "mul":
+            # GH-7331: in some cases, yhat can become negative, so that a
+            # multiplicative model is no longer well-defined. To avoid these
+            # parameterizations, we clip negative values to very small positive
+            # values so that the log-transformation yields very large negative
+            # values.
+            yhat[yhat <= 0] = 1 / (1e-8 * (1 + np.abs(yhat[yhat <= 0])))
+            logL -= np.sum(np.log(yhat))
+        return logL
+
+    @contextlib.contextmanager
+    def use_internal_loglike(self):
+        external_loglike = self.loglike
+        self.loglike = self._loglike_internal
+        try:
+            yield
+        finally:
+            self.loglike = external_loglike

     def loglike(self, params, **kwargs):
-        """
+        r"""
         Log-likelihood of model.

         Parameters
@@ -615,14 +1194,14 @@ class ETSModel(base.StateSpaceMLEModel):

         .. math::

-           l(\\theta, x_0|y) = - \\frac{n}{2}(\\log(2\\pi s^2) + 1)
-                              - \\sum\\limits_{t=1}^n \\log(k_t)
+           l(\theta, x_0|y) = - \frac{n}{2}(\log(2\pi s^2) + 1)
+                              - \sum\limits_{t=1}^n \log(k_t)

         with

         .. math::

-           s^2 = \\frac{1}{n}\\sum\\limits_{t=1}^n \\frac{(\\hat{y}_t - y_t)^2}{k_t}
+           s^2 = \frac{1}{n}\sum\limits_{t=1}^n \frac{(\hat{y}_t - y_t)^2}{k_t}

         where :math:`k_t = 1` for the additive error model and :math:`k_t =
         y_t` for the multiplicative error model.
@@ -634,11 +1213,21 @@ class ETSModel(base.StateSpaceMLEModel):
            *Journal of the American Statistical Association*, 92(440),
            1621-1629
         """
-        pass
+        params = self._internal_params(np.asarray(params))
+        yhat = np.zeros(self.nobs, dtype=params.dtype)
+        xhat = np.zeros(
+            (self.nobs, self._k_states_internal), dtype=params.dtype
+        )
+        return self._loglike_internal(np.asarray(params), yhat, xhat)

     def _residuals(self, yhat, data=None):
         """Calculates residuals of a prediction"""
-        pass
+        if data is None:
+            data = self.endog
+        if self.error == "mul":
+            return (data - yhat) / yhat
+        else:
+            return data - yhat

     def _smooth(self, params):
         """
@@ -658,7 +1247,28 @@ class ETSModel(base.StateSpaceMLEModel):
             Internal states of exponential smoothing. If original data was a
             ``pd.Series``, returns a ``pd.DataFrame``, else a ``np.ndarray``.
         """
-        pass
+        internal_params = self._internal_params(params)
+        yhat = np.zeros(self.nobs)
+        xhat = np.zeros((self.nobs, self._k_states_internal))
+        is_fixed = np.zeros(self._k_params_internal, dtype=int)
+        fixed_values = np.empty(self._k_params_internal, dtype=params.dtype)
+        self._smoothing_func(
+            internal_params, self.endog, yhat, xhat, is_fixed, fixed_values
+        )
+
+        # remove states that are only internal
+        states = self._get_states(xhat)
+
+        if self.use_pandas:
+            _, _, _, index = self._get_prediction_index(0, self.nobs - 1)
+            yhat = pd.Series(yhat, index=index)
+            statenames = ["level"]
+            if self.has_trend:
+                statenames += ["trend"]
+            if self.has_seasonal:
+                statenames += ["seasonal"]
+            states = pd.DataFrame(states, index=index, columns=statenames)
+        return yhat, states

     def smooth(self, params, return_raw=False):
         """
@@ -679,11 +1289,18 @@ class ETSModel(base.StateSpaceMLEModel):
             object. Otherwise a tuple of arrays or pandas objects, depending on
             the format of the endog data.
         """
-        pass
+        params = np.asarray(params)
+        results = self._smooth(params)
+        return self._wrap_results(params, results, return_raw)

-    def hessian(self, params, approx_centered=False, approx_complex_step=
-        True, **kwargs):
-        """
+    @property
+    def _res_classes(self):
+        return {"fit": (ETSResults, ETSResultsWrapper)}
+
+    def hessian(
+        self, params, approx_centered=False, approx_complex_step=True, **kwargs
+    ):
+        r"""
         Hessian matrix of the likelihood function, evaluated at the given
         parameters

@@ -706,28 +1323,76 @@ class ETSModel(base.StateSpaceMLEModel):
         -----
         This is a numerical approximation.
         """
-        pass
+        method = kwargs.get("method", "approx")
+
+        if method == "approx":
+            if approx_complex_step:
+                hessian = self._hessian_complex_step(params, **kwargs)
+            else:
+                hessian = self._hessian_finite_difference(
+                    params, approx_centered=approx_centered, **kwargs
+                )
+        else:
+            raise NotImplementedError("Invalid Hessian calculation method.")
+
+        return hessian
+
+    def score(
+        self, params, approx_centered=False, approx_complex_step=True, **kwargs
+    ):
+        method = kwargs.get("method", "approx")
+
+        if method == "approx":
+            if approx_complex_step:
+                score = self._score_complex_step(params, **kwargs)
+            else:
+                score = self._score_finite_difference(
+                    params, approx_centered=approx_centered, **kwargs
+                )
+        else:
+            raise NotImplementedError("Invalid score method.")
+
+        return score
+
+    def update(params, *args, **kwargs):
+        # Dummy method to make methods copied from statespace.MLEModel work
+        ...


 class ETSResults(base.StateSpaceMLEResults):
     """
     Results from an error, trend, seasonal (ETS) exponential smoothing model
     """
-
     def __init__(self, model, params, results):
         yhat, xhat = results
         self._llf = model.loglike(params)
         self._residuals = model._residuals(yhat)
         self._fittedvalues = yhat
+        # scale is concentrated in this model formulation and corresponds to
+        # mean squared residuals, see docstring of model.loglike
         scale = np.mean(self._residuals ** 2)
         super().__init__(model, params, scale=scale)
-        model_definition_attrs = ['short_name', 'error', 'trend',
-            'seasonal', 'damped_trend', 'has_trend', 'has_seasonal',
-            'seasonal_periods', 'initialization_method']
+
+        # get model definition
+        model_definition_attrs = [
+            "short_name",
+            "error",
+            "trend",
+            "seasonal",
+            "damped_trend",
+            "has_trend",
+            "has_seasonal",
+            "seasonal_periods",
+            "initialization_method",
+        ]
         for attr in model_definition_attrs:
             setattr(self, attr, getattr(model, attr))
-        self.param_names = [('%s (fixed)' % name if name in self.
-            fixed_params else name) for name in self.model.param_names or []]
+        self.param_names = [
+            "%s (fixed)" % name if name in self.fixed_params else name
+            for name in (self.model.param_names or [])
+        ]
+
+        # get fitted states and parameters
         internal_params = self.model._internal_params(params)
         self.states = xhat
         if self.model.use_pandas:
@@ -735,6 +1400,7 @@ class ETSResults(base.StateSpaceMLEResults):
         else:
             states = self.states
         self.initial_state = np.zeros(model._k_initial_states)
+
         self.level = states[:, 0]
         self.initial_level = internal_params[4]
         self.initial_state[0] = self.initial_level
@@ -748,51 +1414,79 @@ class ETSResults(base.StateSpaceMLEResults):
             self.smoothing_trend = self.beta
         if self.has_seasonal:
             self.season = states[:, self.model._seasonal_index]
+            # See GH 7893
             self.initial_seasonal = internal_params[6:][::-1]
-            self.initial_state[self.model._seasonal_index:
-                ] = self.initial_seasonal
+            self.initial_state[
+                self.model._seasonal_index :
+            ] = self.initial_seasonal
             self.gamma = self.params[self.model._seasonal_index]
             self.smoothing_seasonal = self.gamma
         if self.damped_trend:
             self.phi = internal_params[3]
             self.damping_trend = self.phi
+
+        # degrees of freedom of model
         k_free_params = self.k_params - len(self.fixed_params)
         self.df_model = k_free_params + 1
+
+        # standardized forecasting error
         self.mean_resid = np.mean(self.resid)
         self.scale_resid = np.std(self.resid, ddof=1)
-        self.standardized_forecasts_error = (self.resid - self.mean_resid
-            ) / self.scale_resid
-        if not hasattr(self, 'cov_kwds'):
+        self.standardized_forecasts_error = (
+            self.resid - self.mean_resid
+        ) / self.scale_resid
+
+        # Setup covariance matrix notes dictionary
+        # For now, only support "approx"
+        if not hasattr(self, "cov_kwds"):
             self.cov_kwds = {}
-        self.cov_type = 'approx'
+        self.cov_type = "approx"
+
+        # Setup the cache
         self._cache = {}
+
+        # Handle covariance matrix calculation
         self._cov_approx_complex_step = True
         self._cov_approx_centered = False
-        approx_type_str = 'complex-step'
+        approx_type_str = "complex-step"
         try:
             self._rank = None
             if self.k_params == 0:
                 self.cov_params_default = np.zeros((0, 0))
                 self._rank = 0
-                self.cov_kwds['description'] = 'No parameters estimated.'
+                self.cov_kwds["description"] = "No parameters estimated."
             else:
                 self.cov_params_default = self.cov_params_approx
-                self.cov_kwds['description'] = descriptions['approx'].format(
-                    approx_type=approx_type_str)
+                self.cov_kwds["description"] = descriptions["approx"].format(
+                    approx_type=approx_type_str
+                )
         except np.linalg.LinAlgError:
             self._rank = 0
             k_params = len(self.params)
             self.cov_params_default = np.zeros((k_params, k_params)) * np.nan
-            self.cov_kwds['cov_type'] = (
-                'Covariance matrix could not be calculated: singular. information matrix.'
-                )
+            self.cov_kwds["cov_type"] = (
+                "Covariance matrix could not be calculated: singular."
+                " information matrix."
+            )
+
+    @cache_readonly
+    def nobs_effective(self):
+        return self.nobs
+
+    @cache_readonly
+    def fittedvalues(self):
+        return self._fittedvalues
+
+    @cache_readonly
+    def resid(self):
+        return self._residuals

     @cache_readonly
     def llf(self):
         """
         log-likelihood function evaluated at the fitted params
         """
-        pass
+        return self._llf

     def _get_prediction_params(self, start_idx):
         """
@@ -800,7 +1494,17 @@ class ETSResults(base.StateSpaceMLEResults):
         "initial" states for prediction/simulation, that is the states just
         before the first prediction/simulation step.
         """
-        pass
+        internal_params = self.model._internal_params(self.params)
+        if start_idx == 0:
+            return internal_params
+        else:
+            internal_states = self.model._get_internal_states(
+                self.states, self.params
+            )
+            start_state = np.empty(6 + self.seasonal_periods)
+            start_state[0:4] = internal_params[0:4]
+            start_state[4:] = internal_states[start_idx - 1, :]
+            return start_state

     def _relative_forecast_variance(self, steps):
         """
@@ -810,11 +1514,88 @@ class ETSResults(base.StateSpaceMLEResults):
            principles and practice*, 3rd edition, OTexts: Melbourne,
            Australia. OTexts.com/fpp3. Accessed on April 19th 2020.
         """
-        pass
-
-    def simulate(self, nsimulations, anchor=None, repetitions=1,
-        random_errors=None, random_state=None):
-        """
+        h = steps
+        alpha = self.smoothing_level
+        if self.has_trend:
+            beta = self.smoothing_trend
+        if self.has_seasonal:
+            gamma = self.smoothing_seasonal
+            m = self.seasonal_periods
+            k = np.asarray((h - 1) / m, dtype=int)
+        if self.damped_trend:
+            phi = self.damping_trend
+        model = self.model.short_name
+        if model == "ANN":
+            return 1 + alpha ** 2 * (h - 1)
+        elif model == "AAN":
+            return 1 + (h - 1) * (
+                alpha ** 2 + alpha * beta * h + beta ** 2 * h / 6 * (2 * h - 1)
+            )
+        elif model == "AAdN":
+            return (
+                1
+                + alpha ** 2 * (h - 1)
+                + (
+                    (beta * phi * h)
+                    / ((1 - phi) ** 2)
+                    * (2 * alpha * (1 - phi) + beta * phi)
+                )
+                - (
+                    (beta * phi * (1 - phi ** h))
+                    / ((1 - phi) ** 2 * (1 - phi ** 2))
+                    * (
+                        2 * alpha * (1 - phi ** 2)
+                        + beta * phi * (1 + 2 * phi - phi ** h)
+                    )
+                )
+            )
+        elif model == "ANA":
+            return 1 + alpha ** 2 * (h - 1) + gamma * k * (2 * alpha + gamma)
+        elif model == "AAA":
+            return (
+                1
+                + (h - 1)
+                * (
+                    alpha ** 2
+                    + alpha * beta * h
+                    + (beta ** 2) / 6 * h * (2 * h - 1)
+                )
+                + gamma * k * (2 * alpha + gamma + beta * m * (k + 1))
+            )
+        elif model == "AAdA":
+            return (
+                1
+                + alpha ** 2 * (h - 1)
+                + gamma * k * (2 * alpha + gamma)
+                + (beta * phi * h)
+                / ((1 - phi) ** 2)
+                * (2 * alpha * (1 - phi) + beta * phi)
+                - (
+                    (beta * phi * (1 - phi ** h))
+                    / ((1 - phi) ** 2 * (1 - phi ** 2))
+                    * (
+                        2 * alpha * (1 - phi ** 2)
+                        + beta * phi * (1 + 2 * phi - phi ** h)
+                    )
+                )
+                + (
+                    (2 * beta * gamma * phi)
+                    / ((1 - phi) * (1 - phi ** m))
+                    * (k * (1 - phi ** m) - phi ** m * (1 - phi ** (m * k)))
+                )
+            )
+        else:
+            raise NotImplementedError
+
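For the simplest case above (short name "ANN"), the relative variance is 1 + alpha^2 (h - 1), so a rough 95% interval can be assembled by hand as in the sketch below; alpha, the scale and the point forecast are made-up numbers, not taken from a fitted model.

    import numpy as np
    from scipy.stats import norm

    alpha, sigma2, yhat = 0.3, 2.5, 100.0   # hypothetical smoothing parameter, scale, forecast
    h = np.arange(1, 6)                     # horizons 1..5
    rel_var = 1 + alpha ** 2 * (h - 1)      # relative forecast variance for "ANN"
    half_width = norm.ppf(0.975) * np.sqrt(sigma2 * rel_var)
    lower, upper = yhat - half_width, yhat + half_width
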
+    def simulate(
+        self,
+        nsimulations,
+        anchor=None,
+        repetitions=1,
+        random_errors=None,
+        random_state=None,
+    ):
+        r"""
         Random simulations using the state space formulation.

         Parameters
@@ -876,7 +1657,234 @@ class ETSResults(base.StateSpaceMLEResults):
             ``np.ndarray`` of shape (`nsimulations`, `repetitions`) is
             returned.
         """
-        pass
+
+        r"""
+        Implementation notes
+        --------------------
+        The simulation is based on the state space model of the Holt-Winter's
+        methods. The state space model assumes that the true value at time
+        :math:`t` is randomly distributed around the prediction value.
+        If using the additive error model, this means:
+
+        .. math::
+
+            y_t &= \hat{y}_{t|t-1} + e_t\\
+            e_t &\sim \mathcal{N}(0, \sigma^2)
+
+        Using the multiplicative error model:
+
+        .. math::
+
+            y_t &= \hat{y}_{t|t-1} \cdot (1 + e_t)\\
+            e_t &\sim \mathcal{N}(0, \sigma^2)
+
+        Inserting these equations into the smoothing equation formulation leads
+        to the state space equations. The notation used here follows
+        [1]_.
+
+        Additionally,
+
+        .. math::
+
+           B_t = b_{t-1} \circ_d \phi\\
+           L_t = l_{t-1} \circ_b B_t\\
+           S_t = s_{t-m}\\
+           Y_t = L_t \circ_s S_t,
+
+        where :math:`\circ_d` is the operation linking trend and damping
+        parameter (multiplication if the trend is additive, power if the trend
+        is multiplicative), :math:`\circ_b` is the operation linking level and
+        trend (addition if the trend is additive, multiplication if the trend
+        is multiplicative), and :math:`\circ_s` is the operation linking
+        seasonality to the rest.
+
+        The state space equations can then be formulated as
+
+        .. math::
+
+           y_t = Y_t + \eta \cdot e_t\\
+           l_t = L_t + \alpha \cdot (M_e \cdot L_t + \kappa_l) \cdot e_t\\
+           b_t = B_t + \beta \cdot (M_e \cdot B_t+\kappa_b) \cdot e_t\\
+           s_t = S_t + \gamma \cdot (M_e \cdot S_t + \kappa_s) \cdot e_t\\
+
+        with
+
+        .. math::
+
+           \eta &= \begin{cases}
+                       Y_t\quad\text{if error is multiplicative}\\
+                       1\quad\text{else}
+                   \end{cases}\\
+           M_e &= \begin{cases}
+                       1\quad\text{if error is multiplicative}\\
+                       0\quad\text{else}
+                   \end{cases}\\
+
+        and, when using the additive error model,
+
+        .. math::
+
+           \kappa_l &= \begin{cases}
+                       \frac{1}{S_t}\quad
+                       \text{if seasonality is multiplicative}\\
+                       1\quad\text{else}
+                   \end{cases}\\
+           \kappa_b &= \begin{cases}
+                       \frac{\kappa_l}{l_{t-1}}\quad
+                       \text{if trend is multiplicative}\\
+                       \kappa_l\quad\text{else}
+                   \end{cases}\\
+           \kappa_s &= \begin{cases}
+                       \frac{1}{L_t}\quad
+                       \text{if seasonality is multiplicative}\\
+                       1\quad\text{else}
+                   \end{cases}
+
+        When using the multiplicative error model
+
+        .. math::
+
+           \kappa_l &= \begin{cases}
+                       0\quad
+                       \text{if seasonality is multiplicative}\\
+                       S_t\quad\text{else}
+                   \end{cases}\\
+           \kappa_b &= \begin{cases}
+                       \frac{\kappa_l}{l_{t-1}}\quad
+                       \text{if trend is multiplicative}\\
+                       \kappa_l + l_{t-1}\quad\text{else}
+                   \end{cases}\\
+           \kappa_s &= \begin{cases}
+                       0\quad\text{if seasonality is multiplicative}\\
+                       L_t\quad\text{else}
+                   \end{cases}
+
+        References
+        ----------
+        .. [1] Hyndman, R.J., & Athanasopoulos, G. (2018) *Forecasting:
+           principles and practice*, 2nd edition, OTexts: Melbourne,
+           Australia. OTexts.com/fpp2. Accessed on February 28th 2020.
+        """
+        # Get the starting location
+        start_idx = self._get_prediction_start_index(anchor)
+
+        # set initial values and obtain parameters
+        start_params = self._get_prediction_params(start_idx)
+        x = np.zeros((nsimulations, self.model._k_states_internal))
+        # is_fixed and fixed_values are dummy arguments (no parameters are fixed here)
+        is_fixed = np.zeros(len(start_params), dtype=int)
+        fixed_values = np.zeros_like(start_params)
+        (
+            alpha,
+            beta_star,
+            gamma_star,
+            phi,
+            m,
+            _,
+        ) = smooth._initialize_ets_smooth(
+            start_params, x, is_fixed, fixed_values
+        )
+        beta = alpha * beta_star
+        gamma = (1 - alpha) * gamma_star
+        # make x a 3 dimensional matrix: first dimension is nsimulations
+        # (number of steps), next is number of states, innermost is repetitions
+        nstates = x.shape[1]
+        x = np.tile(np.reshape(x, (nsimulations, nstates, 1)), repetitions)
+        y = np.empty((nsimulations, repetitions))
+
+        # get random error eps
+        sigma = np.sqrt(self.scale)
+        if isinstance(random_errors, np.ndarray):
+            if random_errors.shape != (nsimulations, repetitions):
+                raise ValueError(
+                    "If random is an ndarray, it must have shape "
+                    "(nsimulations, repetitions)!"
+                )
+            eps = random_errors
+        elif random_errors == "bootstrap":
+            eps = np.random.choice(
+                self.resid, size=(nsimulations, repetitions), replace=True
+            )
+        elif random_errors is None:
+            if random_state is None:
+                eps = np.random.randn(nsimulations, repetitions) * sigma
+            elif isinstance(random_state, int):
+                rng = np.random.RandomState(random_state)
+                eps = rng.randn(nsimulations, repetitions) * sigma
+            elif isinstance(random_state, np.random.RandomState):
+                eps = random_state.randn(nsimulations, repetitions) * sigma
+            else:
+                raise ValueError(
+                    "Argument random_state must be None, an integer, "
+                    "or an instance of np.random.RandomState"
+                )
+        elif isinstance(random_errors, (rv_continuous, rv_discrete)):
+            params = random_errors.fit(self.resid)
+            eps = random_errors.rvs(*params, size=(nsimulations, repetitions))
+        elif isinstance(random_errors, rv_frozen):
+            eps = random_errors.rvs(size=(nsimulations, repetitions))
+        else:
+            raise ValueError("Argument random_errors has unexpected value!")
+
+        # get model settings
+        mul_seasonal = self.seasonal == "mul"
+        mul_trend = self.trend == "mul"
+        mul_error = self.error == "mul"
+
+        # define trend, damping and seasonality operations
+        if mul_trend:
+            op_b = np.multiply
+            op_d = np.power
+        else:
+            op_b = np.add
+            op_d = np.multiply
+        if mul_seasonal:
+            op_s = np.multiply
+        else:
+            op_s = np.add
+
+        # x translation:
+        # - x[t, 0, :] is level[t]
+        # - x[t, 1, :] is trend[t]
+        # - x[t, 2, :] is seasonal[t]
+        # - x[t, 3, :] is seasonal[t-1]
+        # - x[t, 2+j, :] is seasonal[t-j]
+        # - similarly: x[t-1, 2+m-1, :] is seasonal[t-m]
+        for t in range(nsimulations):
+            B = op_d(x[t - 1, 1, :], phi)
+            L = op_b(x[t - 1, 0, :], B)
+            S = x[t - 1, 2 + m - 1, :]
+            Y = op_s(L, S)
+            if self.error == "add":
+                eta = 1
+                kappa_l = 1 / S if mul_seasonal else 1
+                kappa_b = kappa_l / x[t - 1, 0, :] if mul_trend else kappa_l
+                kappa_s = 1 / L if mul_seasonal else 1
+            else:
+                eta = Y
+                kappa_l = 0 if mul_seasonal else S
+                kappa_b = (
+                    kappa_l / x[t - 1, 0, :]
+                    if mul_trend
+                    else kappa_l + x[t - 1, 0, :]
+                )
+                kappa_s = 0 if mul_seasonal else L
+
+            y[t, :] = Y + eta * eps[t, :]
+            x[t, 0, :] = L + alpha * (mul_error * L + kappa_l) * eps[t, :]
+            x[t, 1, :] = B + beta * (mul_error * B + kappa_b) * eps[t, :]
+            x[t, 2, :] = S + gamma * (mul_error * S + kappa_s) * eps[t, :]
+            # update seasonals by shifting previous seasonal right
+            x[t, 3:, :] = x[t - 1, 2:-1, :]
+
+        # Wrap data / squeeze where appropriate
+        if repetitions > 1:
+            names = ["simulation.%d" % num for num in range(repetitions)]
+        else:
+            names = "simulation"
+        return self.model._wrap_data(
+            y, start_idx, start_idx + nsimulations - 1, names=names
+        )
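
A minimal usage sketch of this simulation path, on a hypothetical monthly series built only for illustration:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.exponential_smoothing.ets import ETSModel

    # hypothetical monthly series with a mild trend and an annual cycle
    idx = pd.date_range("2000-01", periods=60, freq="MS")
    y = pd.Series(10 + 0.1 * np.arange(60)
                  + np.sin(2 * np.pi * np.arange(60) / 12), index=idx)

    res = ETSModel(y, error="add", trend="add", seasonal="add",
                   seasonal_periods=12).fit(disp=False)
    sims = res.simulate(nsimulations=12, anchor="end",
                        repetitions=100, random_state=0)
    # sims is a (12 x 100) DataFrame of simulated future paths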

     def forecast(self, steps=1):
         """
@@ -895,13 +1903,78 @@ class ETSResults(base.StateSpaceMLEResults):
         forecast : ndarray
             Array of out of sample forecasts. A (steps x k_endog) array.
         """
-        pass
+        return self._forecast(steps, "end")

     def _forecast(self, steps, anchor):
         """
         Dynamic prediction/forecasting
         """
-        pass
+        # forecast is the same as simulation without errors
+        return self.simulate(
+            steps, anchor=anchor, random_errors=np.zeros((steps, 1))
+        )
+
+    def _handle_prediction_index(self, start, dynamic, end, index):
+        if start is None:
+            start = 0
+
+        # Handle start, end, dynamic
+        start, end, out_of_sample, _ = self.model._get_prediction_index(
+            start, end, index
+        )
+        # if end was outside of the sample, it is now the last point in the
+        # sample
+        if start > end + out_of_sample + 1:
+            raise ValueError(
+                "Prediction start cannot lie outside of the sample."
+            )
+
+        # Handle `dynamic`
+        if isinstance(dynamic, (str, dt.datetime, pd.Timestamp)):
+            dynamic, _, _ = self.model._get_index_loc(dynamic)
+            # Convert to offset relative to start
+            dynamic = dynamic - start
+        elif isinstance(dynamic, bool):
+            if dynamic:
+                dynamic = 0
+            else:
+                dynamic = end + 1 - start
+
+        # start : index of first predicted value
+        # dynamic : offset to first dynamically predicted value
+        #     -> if dynamic == 0, only dynamic simulations
+        if dynamic == 0:
+            start_smooth = None
+            end_smooth = None
+            nsmooth = 0
+            start_dynamic = start
+        else:
+            # dynamic simulations from start + dynamic
+            start_smooth = start
+            end_smooth = min(start + dynamic - 1, end)
+            nsmooth = max(end_smooth - start_smooth + 1, 0)
+            start_dynamic = start + dynamic
+        # anchor for simulations is one before start_dynamic
+        if start_dynamic == 0:
+            anchor_dynamic = "start"
+        else:
+            anchor_dynamic = start_dynamic - 1
+        # end is last point in sample, out_of_sample gives number of
+        # simulations out of sample
+        end_dynamic = end + out_of_sample
+        ndynamic = end_dynamic - start_dynamic + 1
+        return (
+            start,
+            end,
+            start_smooth,
+            end_smooth,
+            anchor_dynamic,
+            start_dynamic,
+            end_dynamic,
+            nsmooth,
+            ndynamic,
+            index,
+        )

     def predict(self, start=None, end=None, dynamic=False, index=None):
         """
@@ -940,10 +2013,49 @@ class ETSResults(base.StateSpaceMLEResults):
             forecasts. An (npredict,) array. If original data was a pd.Series
             or DataFrame, a pd.Series is returned.
         """
-        pass

-    def get_prediction(self, start=None, end=None, dynamic=False, index=
-        None, method=None, simulate_repetitions=1000, **simulate_kwargs):
+        (
+            start,
+            end,
+            start_smooth,
+            end_smooth,
+            anchor_dynamic,
+            _,
+            end_dynamic,
+            nsmooth,
+            ndynamic,
+            index,
+        ) = self._handle_prediction_index(start, dynamic, end, index)
+
+        y = np.empty(nsmooth + ndynamic)
+
+        # In sample nondynamic prediction: smoothing
+        if nsmooth > 0:
+            y[0:nsmooth] = self.fittedvalues[start_smooth : end_smooth + 1]
+
+        # Out of sample/dynamic prediction: forecast
+        if ndynamic > 0:
+            y[nsmooth:] = self._forecast(ndynamic, anchor_dynamic)
+
+        # when we are doing out of sample only prediction, start > end + 1, and
+        # we only want to output beginning at start
+        if start > end + 1:
+            ndiscard = start - (end + 1)
+            y = y[ndiscard:]
+
+        # Wrap data / squeeze where appropriate
+        return self.model._wrap_data(y, start, end_dynamic)
+
+    def get_prediction(
+        self,
+        start=None,
+        end=None,
+        dynamic=False,
+        index=None,
+        method=None,
+        simulate_repetitions=1000,
+        **simulate_kwargs,
+    ):
         """
         Calculates mean prediction and prediction intervals.

@@ -986,7 +2098,18 @@ class ETSResults(base.StateSpaceMLEResults):
         PredictionResults
             Predicted mean values and prediction intervals
         """
-        pass
+        return PredictionResultsWrapper(
+            PredictionResults(
+                self,
+                start,
+                end,
+                dynamic,
+                index,
+                method,
+                simulate_repetitions,
+                **simulate_kwargs,
+            )
+        )
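
Continuing the hypothetical example from the ``simulate`` sketch (where the fitted "AAA" model falls on the exact-interval path), the prediction machinery, including the ``pred_int`` and ``summary_frame`` helpers defined further below, could be exercised roughly like this:

    pred = res.get_prediction(start=48, end=71)  # last in-sample year plus 12 steps ahead
    frame = pred.summary_frame(alpha=0.05)       # columns: mean, pi_lower, pi_upper
    intervals = pred.pred_int(alpha=0.05)        # the prediction intervals on their own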

     def summary(self, alpha=0.05, start=None):
         """
@@ -1009,17 +2132,51 @@ class ETSResults(base.StateSpaceMLEResults):
         --------
         statsmodels.iolib.summary.Summary
         """
-        pass
+        model_name = f"ETS({self.short_name})"
+
+        summary = super().summary(
+            alpha=alpha,
+            start=start,
+            title="ETS Results",
+            model_name=model_name,
+        )
+
+        if self.model.initialization_method != "estimated":
+            params = np.array(self.initial_state)
+            if params.ndim > 1:
+                params = params[0]
+            names = self.model.initial_state_names
+            param_header = [
+                "initialization method: %s" % self.model.initialization_method
+            ]
+            params_stubs = names
+            params_data = [
+                [forg(params[i], prec=4)] for i in range(len(params))
+            ]
+
+            initial_state_table = SimpleTable(
+                params_data, param_header, params_stubs, txt_fmt=fmt_params
+            )
+            summary.tables.insert(-1, initial_state_table)
+
+        return summary


 class ETSResultsWrapper(wrap.ResultsWrapper):
-    _attrs = {'fittedvalues': 'rows', 'level': 'rows', 'resid': 'rows',
-        'season': 'rows', 'slope': 'rows'}
-    _wrap_attrs = wrap.union_dicts(tsbase.TimeSeriesResultsWrapper.
-        _wrap_attrs, _attrs)
-    _methods = {'predict': 'dates', 'forecast': 'dates'}
-    _wrap_methods = wrap.union_dicts(tsbase.TimeSeriesResultsWrapper.
-        _wrap_methods, _methods)
+    _attrs = {
+        "fittedvalues": "rows",
+        "level": "rows",
+        "resid": "rows",
+        "season": "rows",
+        "slope": "rows",
+    }
+    _wrap_attrs = wrap.union_dicts(
+        tsbase.TimeSeriesResultsWrapper._wrap_attrs, _attrs
+    )
+    _methods = {"predict": "dates", "forecast": "dates"}
+    _wrap_methods = wrap.union_dicts(
+        tsbase.TimeSeriesResultsWrapper._wrap_methods, _methods
+    )


 wrap.populate_wrapper(ETSResultsWrapper, ETSResults)
@@ -1066,62 +2223,110 @@ class PredictionResults:
         Additional arguments passed to the ``simulate`` method.
     """

-    def __init__(self, results, start=None, end=None, dynamic=False, index=
-        None, method=None, simulate_repetitions=1000, **simulate_kwargs):
+    def __init__(
+        self,
+        results,
+        start=None,
+        end=None,
+        dynamic=False,
+        index=None,
+        method=None,
+        simulate_repetitions=1000,
+        **simulate_kwargs,
+    ):
         self.use_pandas = results.model.use_pandas
+
         if method is None:
-            exact_available = ['ANN', 'AAN', 'AAdN', 'ANA', 'AAA', 'AAdA']
+            exact_available = ["ANN", "AAN", "AAdN", "ANA", "AAA", "AAdA"]
             if results.model.short_name in exact_available:
-                method = 'exact'
+                method = "exact"
             else:
-                method = 'simulated'
+                method = "simulated"
         self.method = method
-        (start, end, start_smooth, _, anchor_dynamic, start_dynamic,
-            end_dynamic, nsmooth, ndynamic, index
-            ) = results._handle_prediction_index(start, dynamic, end, index)
-        self.predicted_mean = results.predict(start=start, end=end_dynamic,
-            dynamic=dynamic, index=index)
+
+        (
+            start,
+            end,
+            start_smooth,
+            _,
+            anchor_dynamic,
+            start_dynamic,
+            end_dynamic,
+            nsmooth,
+            ndynamic,
+            index,
+        ) = results._handle_prediction_index(start, dynamic, end, index)
+
+        self.predicted_mean = results.predict(
+            start=start, end=end_dynamic, dynamic=dynamic, index=index
+        )
         self.row_labels = self.predicted_mean.index
         self.endog = np.empty(nsmooth + ndynamic) * np.nan
         if nsmooth > 0:
-            self.endog[0:end - start + 1] = results.data.endog[start:end + 1]
-        self.model = Bunch(data=results.model.data.__class__(endog=self.
-            endog, predict_dates=self.row_labels))
-        if self.method == 'simulated':
+            self.endog[0: (end - start + 1)] = results.data.endog[
+                start: (end + 1)
+            ]
+        self.model = Bunch(
+            data=results.model.data.__class__(
+                endog=self.endog, predict_dates=self.row_labels
+            )
+        )
+
+        if self.method == "simulated":
+
             sim_results = []
+            # first, perform "non-dynamic" simulations, i.e. simulations of
+            # only one step, based on the previous step
             if nsmooth > 1:
                 if start_smooth == 0:
-                    anchor = 'start'
+                    anchor = "start"
                 else:
                     anchor = start_smooth - 1
                 for i in range(nsmooth):
-                    sim_results.append(results.simulate(1, anchor=anchor,
-                        repetitions=simulate_repetitions, **simulate_kwargs))
+                    sim_results.append(
+                        results.simulate(
+                            1,
+                            anchor=anchor,
+                            repetitions=simulate_repetitions,
+                            **simulate_kwargs,
+                        )
+                    )
+                    # advance the anchor one step for the next simulation
                     anchor = start_smooth + i
             if ndynamic:
-                sim_results.append(results.simulate(ndynamic, anchor=
-                    anchor_dynamic, repetitions=simulate_repetitions, **
-                    simulate_kwargs))
+                sim_results.append(
+                    results.simulate(
+                        ndynamic,
+                        anchor=anchor_dynamic,
+                        repetitions=simulate_repetitions,
+                        **simulate_kwargs,
+                    )
+                )
             if sim_results and isinstance(sim_results[0], pd.DataFrame):
                 self.simulation_results = pd.concat(sim_results, axis=0)
             else:
                 self.simulation_results = np.concatenate(sim_results, axis=0)
             self.forecast_variance = self.simulation_results.var(1)
-        else:
+        else:  # method == 'exact'
             steps = np.ones(ndynamic + nsmooth)
             if ndynamic > 0:
-                steps[start_dynamic - min(start_dynamic, start):] = range(1,
-                    ndynamic + 1)
+                steps[
+                    (start_dynamic - min(start_dynamic, start)):
+                    ] = range(1, ndynamic + 1)
+            # when we are doing out of sample only prediction,
+            # start > end + 1, and
+            # we only want to output beginning at start
             if start > end + 1:
                 ndiscard = start - (end + 1)
                 steps = steps[ndiscard:]
-            self.forecast_variance = (results.mse * results.
-                _relative_forecast_variance(steps))
+            self.forecast_variance = (
+                results.mse * results._relative_forecast_variance(steps)
+            )

     @property
     def var_pred_mean(self):
         """The variance of the predicted mean"""
-        pass
+        return self.forecast_variance

     def pred_int(self, alpha=0.05):
         """
@@ -1133,15 +2338,61 @@ class PredictionResults:
             The significance level for the prediction interval. Default is
             0.05, that is, a 95% prediction interval.
         """
-        pass
+
+        if self.method == "simulated":
+            simulated_upper_pi = np.quantile(
+                self.simulation_results, 1 - alpha / 2, axis=1
+            )
+            simulated_lower_pi = np.quantile(
+                self.simulation_results, alpha / 2, axis=1
+            )
+            pred_int = np.vstack((simulated_lower_pi, simulated_upper_pi)).T
+        else:
+            q = norm.ppf(1 - alpha / 2)
+            half_interval_size = q * np.sqrt(self.forecast_variance)
+            pred_int = np.vstack(
+                (
+                    self.predicted_mean - half_interval_size,
+                    self.predicted_mean + half_interval_size,
+                )
+            ).T
+
+        if self.use_pandas:
+            pred_int = pd.DataFrame(pred_int, index=self.row_labels)
+            names = [
+                f"lower PI (alpha={alpha:f})",
+                f"upper PI (alpha={alpha:f})",
+            ]
+            pred_int.columns = names
+        return pred_int
+
+    def summary_frame(self, endog=0, alpha=0.05):
+        pred_int = np.asarray(self.pred_int(alpha=alpha))
+        to_include = {}
+        to_include["mean"] = self.predicted_mean
+        if self.method == "simulated":
+            to_include["mean_numerical"] = np.mean(
+                self.simulation_results, axis=1
+            )
+        to_include["pi_lower"] = pred_int[:, 0]
+        to_include["pi_upper"] = pred_int[:, 1]
+
+        res = pd.DataFrame(
+            to_include, index=self.row_labels, columns=list(to_include.keys())
+        )
+        return res


 class PredictionResultsWrapper(wrap.ResultsWrapper):
-    _attrs = {'predicted_mean': 'dates', 'simulation_results': 'dates',
-        'endog': 'dates'}
+    _attrs = {
+        "predicted_mean": "dates",
+        "simulation_results": "dates",
+        "endog": "dates",
+    }
     _wrap_attrs = wrap.union_dicts(_attrs)
+
     _methods = {}
     _wrap_methods = wrap.union_dicts(_methods)


-wrap.populate_wrapper(PredictionResultsWrapper, PredictionResults)
+wrap.populate_wrapper(PredictionResultsWrapper, PredictionResults)  # noqa:E305
diff --git a/statsmodels/tsa/exponential_smoothing/initialization.py b/statsmodels/tsa/exponential_smoothing/initialization.py
index 70d90a7d5..8afff31ab 100644
--- a/statsmodels/tsa/exponential_smoothing/initialization.py
+++ b/statsmodels/tsa/exponential_smoothing/initialization.py
@@ -1,5 +1,120 @@
 """
 Initialization methods for states of exponential smoothing models
 """
+
 import numpy as np
 import pandas as pd
+
+
+def _initialization_simple(endog, trend=False, seasonal=False,
+                           seasonal_periods=None):
+    # See Section 7.6 of Hyndman and Athanasopoulos
+    nobs = len(endog)
+    initial_trend = None
+    initial_seasonal = None
+
+    # Non-seasonal
+    if seasonal is None or not seasonal:
+        initial_level = endog[0]
+        if trend == 'add':
+            initial_trend = endog[1] - endog[0]
+        elif trend == 'mul':
+            initial_trend = endog[1] / endog[0]
+    # Seasonal
+    else:
+        if nobs < 2 * seasonal_periods:
+            raise ValueError('Cannot compute initial seasonals using'
+                             ' heuristic method with less than two full'
+                             ' seasonal cycles in the data.')
+
+        initial_level = np.mean(endog[:seasonal_periods])
+        m = seasonal_periods
+
+        if trend is not None:
+            initial_trend = (pd.Series(endog).diff(m)[m:2 * m] / m).mean()
+
+        if seasonal == 'add':
+            initial_seasonal = endog[:m] - initial_level
+        elif seasonal == 'mul':
+            initial_seasonal = endog[:m] / initial_level
+
+    return initial_level, initial_trend, initial_seasonal
+
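A quick check of the simple initialization on a made-up additive-seasonal array, calling the private helper directly purely for illustration:

    import numpy as np
    from statsmodels.tsa.exponential_smoothing.initialization import (
        _initialization_simple)

    endog = np.array([10., 12., 14., 11., 13., 15., 17., 14.])  # two cycles of length 4
    level, trend, seasonal = _initialization_simple(
        endog, trend='add', seasonal='add', seasonal_periods=4)
    # level is the mean of the first cycle (11.75), trend the mean lag-4
    # difference divided by 4 (0.75), seasonal the first cycle minus the level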
+
+def _initialization_heuristic(endog, trend=False, seasonal=False,
+                              seasonal_periods=None):
+    # See Section 2.6 of Hyndman et al.
+    endog = endog.copy()
+    nobs = len(endog)
+
+    if nobs < 10:
+        raise ValueError('Cannot use heuristic method with less than 10'
+                         ' observations.')
+
+    # Seasonal component
+    initial_seasonal = None
+    if seasonal:
+        # Calculate the number of full cycles to use
+        if nobs < 2 * seasonal_periods:
+            raise ValueError('Cannot compute initial seasonals using'
+                             ' heuristic method with less than two full'
+                             ' seasonal cycles in the data.')
+        # We need at least 10 periods for the level initialization
+        # and we will lose self.seasonal_periods // 2 values at the
+        # beginning and end of the sample, so we need at least
+        # 10 + 2 * (self.seasonal_periods // 2) values
+        min_obs = 10 + 2 * (seasonal_periods // 2)
+        if nobs < min_obs:
+            raise ValueError('Cannot use heuristic method to compute'
+                             ' initial seasonal and levels with less'
+                             ' than 10 + 2 * (seasonal_periods // 2)'
+                             ' datapoints.')
+        # In some datasets we may only have 2 full cycles (but this may
+        # still satisfy the above restriction that we will end up with
+        # 10 seasonally adjusted observations)
+        k_cycles = min(5, nobs // seasonal_periods)
+        # In other datasets, 3 full cycles may not be enough to end up
+        # with 10 seasonally adjusted observations
+        k_cycles = max(k_cycles, int(np.ceil(min_obs / seasonal_periods)))
+
+        # Compute the moving average
+        series = pd.Series(endog[:seasonal_periods * k_cycles])
+        initial_trend = series.rolling(seasonal_periods, center=True).mean()
+        if seasonal_periods % 2 == 0:
+            initial_trend = initial_trend.shift(-1).rolling(2).mean()
+
+        # Detrend
+        if seasonal == 'add':
+            detrended = series - initial_trend
+        elif seasonal == 'mul':
+            detrended = series / initial_trend
+
+        # Average seasonal effect
+        tmp = np.zeros(k_cycles * seasonal_periods) * np.nan
+        tmp[:len(detrended)] = detrended.values
+        initial_seasonal = np.nanmean(
+            tmp.reshape(k_cycles, seasonal_periods).T, axis=1)
+
+        # Normalize the seasonals
+        if seasonal == 'add':
+            initial_seasonal -= np.mean(initial_seasonal)
+        elif seasonal == 'mul':
+            initial_seasonal /= np.mean(initial_seasonal)
+
+        # Replace the data with the trend
+        endog = initial_trend.dropna().values
+
+    # Trend / Level
+    exog = np.c_[np.ones(10), np.arange(10) + 1]
+    if endog.ndim == 1:
+        endog = np.atleast_2d(endog).T
+    beta = np.squeeze(np.linalg.pinv(exog).dot(endog[:10]))
+    initial_level = beta[0]
+
+    initial_trend = None
+    if trend == 'add':
+        initial_trend = beta[1]
+    elif trend == 'mul':
+        initial_trend = 1 + beta[1] / beta[0]
+
+    return initial_level, initial_trend, initial_seasonal
diff --git a/statsmodels/tsa/filters/_utils.py b/statsmodels/tsa/filters/_utils.py
index 494ef90d8..6113e3438 100644
--- a/statsmodels/tsa/filters/_utils.py
+++ b/statsmodels/tsa/filters/_utils.py
@@ -1,10 +1,71 @@
 from functools import wraps
+
 from statsmodels.tools.data import _is_using_pandas
 from statsmodels.tsa.tsatools import freq_to_period


-def pandas_wrapper_freq(func, trim_head=None, trim_tail=None, freq_kw=
-    'freq', columns=None, *args, **kwargs):
+def _get_pandas_wrapper(X, trim_head=None, trim_tail=None, names=None):
+    index = X.index
+    #TODO: allow use index labels
+    if trim_head is None and trim_tail is None:
+        index = index
+    elif trim_tail is None:
+        index = index[trim_head:]
+    elif trim_head is None:
+        index = index[:-trim_tail]
+    else:
+        index = index[trim_head:-trim_tail]
+    if hasattr(X, "columns"):
+        if names is None:
+            names = X.columns
+        return lambda x : X.__class__(x, index=index, columns=names)
+    else:
+        if names is None:
+            names = X.name
+        return lambda x : X.__class__(x, index=index, name=names)
+
+
+def pandas_wrapper(func, trim_head=None, trim_tail=None, names=None, *args,
+                   **kwargs):
+    @wraps(func)
+    def new_func(X, *args, **kwargs):
+        # quick pass-through for do nothing case
+        if not _is_using_pandas(X, None):
+            return func(X, *args, **kwargs)
+
+        wrapper_func = _get_pandas_wrapper(X, trim_head, trim_tail,
+                                           names)
+        ret = func(X, *args, **kwargs)
+        ret = wrapper_func(ret)
+        return ret
+
+    return new_func
+
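A rough illustration of what the wrapper provides: a plain NumPy function picks up pandas index handling. ``demean`` is a made-up function, not part of statsmodels.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.filters._utils import pandas_wrapper

    def demean(x):
        return x - np.mean(x)

    wrapped = pandas_wrapper(demean)
    s = pd.Series([1., 2., 3.], index=pd.date_range("2020-01-01", periods=3))
    out = wrapped(s)  # still a Series carrying the original DatetimeIndex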
+
+def pandas_wrapper_bunch(func, trim_head=None, trim_tail=None,
+                         names=None, *args, **kwargs):
+    @wraps(func)
+    def new_func(X, *args, **kwargs):
+        # quick pass-through for do nothing case
+        if not _is_using_pandas(X, None):
+            return func(X, *args, **kwargs)
+
+        wrapper_func = _get_pandas_wrapper(X, trim_head, trim_tail,
+                                           names)
+        ret = func(X, *args, **kwargs)
+        ret = wrapper_func(ret)
+        return ret
+
+    return new_func
+
+
+def pandas_wrapper_predict(func, trim_head=None, trim_tail=None,
+                           columns=None, *args, **kwargs):
+    raise NotImplementedError
+
+
+def pandas_wrapper_freq(func, trim_head=None, trim_tail=None,
+                        freq_kw='freq', columns=None, *args, **kwargs):
     """
     Return a new function that catches the incoming X, checks if it's pandas,
     calls the functions as is. Then wraps the results in the incoming index.
@@ -12,4 +73,20 @@ def pandas_wrapper_freq(func, trim_head=None, trim_tail=None, freq_kw=
     Deals with frequencies. Expects that the function returns a tuple,
     a Bunch object, or a pandas-object.
     """
-    pass
+
+    @wraps(func)
+    def new_func(X, *args, **kwargs):
+        # quick pass-through for do nothing case
+        if not _is_using_pandas(X, None):
+            return func(X, *args, **kwargs)
+
+        wrapper_func = _get_pandas_wrapper(X, trim_head, trim_tail,
+                                           columns)
+        index = X.index
+        freq = index.inferred_freq
+        kwargs.update({freq_kw : freq_to_period(freq)})
+        ret = func(X, *args, **kwargs)
+        ret = wrapper_func(ret)
+        return ret
+
+    return new_func
diff --git a/statsmodels/tsa/filters/api.py b/statsmodels/tsa/filters/api.py
index 82ced5ff9..e8183cf9b 100644
--- a/statsmodels/tsa/filters/api.py
+++ b/statsmodels/tsa/filters/api.py
@@ -1,5 +1,5 @@
-__all__ = ['bkfilter', 'hpfilter', 'cffilter', 'miso_lfilter',
-    'convolution_filter', 'recursive_filter']
+__all__ = ["bkfilter", "hpfilter", "cffilter", "miso_lfilter",
+           "convolution_filter", "recursive_filter"]
 from .bk_filter import bkfilter
 from .hp_filter import hpfilter
 from .cf_filter import cffilter
diff --git a/statsmodels/tsa/filters/bk_filter.py b/statsmodels/tsa/filters/bk_filter.py
index cc19abe4a..888c7061b 100644
--- a/statsmodels/tsa/filters/bk_filter.py
+++ b/statsmodels/tsa/filters/bk_filter.py
@@ -1,5 +1,7 @@
+
 import numpy as np
 from scipy.signal import fftconvolve
+
 from statsmodels.tools.validation import array_like, PandasWrapper


@@ -77,4 +79,24 @@ def bkfilter(x, low=6, high=32, K=12):

     .. plot:: plots/bkf_plot.py
     """
-    pass
+    # TODO: change the docstring to ..math::?
+    # TODO: allow windowing functions to correct for the Gibbs phenomenon?
+    # adjust bweights (symmetrically) by below before demeaning
+    # Lancosz Sigma Factors np.sinc(2*j/(2.*K+1))
+    pw = PandasWrapper(x)
+    x = array_like(x, 'x', maxdim=2)
+    omega_1 = 2. * np.pi / high  # convert from freq. to periodicity
+    omega_2 = 2. * np.pi / low
+    bweights = np.zeros(2 * K + 1)
+    bweights[K] = (omega_2 - omega_1) / np.pi  # weight at zero freq.
+    j = np.arange(1, int(K) + 1)
+    weights = 1 / (np.pi * j) * (np.sin(omega_2 * j) - np.sin(omega_1 * j))
+    bweights[K + j] = weights  # j is an idx
+    bweights[:K] = weights[::-1]  # make symmetric weights
+    bweights -= bweights.mean()  # make sure weights sum to zero
+    if x.ndim == 2:
+        bweights = bweights[:, None]
+    x = fftconvolve(x, bweights, mode='valid')
+    # get a centered moving avg/convolution
+
+    return pw.wrap(x, append='cycle', trim_start=K, trim_end=K)
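
A usage sketch with the shipped macrodata dataset, using the common business-cycle settings for quarterly data (low=6, high=32, K=12):

    import statsmodels.api as sm

    data = sm.datasets.macrodata.load_pandas().data
    cycle = sm.tsa.filters.bkfilter(data["realgdp"], low=6, high=32, K=12)
    # K observations are trimmed at each end and the result is labelled
    # with a "cycle" suffix by the PandasWrapper
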
diff --git a/statsmodels/tsa/filters/cf_filter.py b/statsmodels/tsa/filters/cf_filter.py
index 21670489b..118cb5434 100644
--- a/statsmodels/tsa/filters/cf_filter.py
+++ b/statsmodels/tsa/filters/cf_filter.py
@@ -1,7 +1,17 @@
 import numpy as np
+
 from statsmodels.tools.validation import PandasWrapper, array_like

+# the data is sampled quarterly, so cut-off frequency of 18
+
+# Wn is normalized cut-off freq
+#Cutoff frequency is that frequency where the magnitude response of the filter
+# is sqrt(1/2.). For butter, the normalized cutoff frequency Wn must be a
+# number between  0 and 1, where 1 corresponds to the Nyquist frequency, p
+# radians per sample.

+
+# NOTE: uses a loop, could probably be sped-up for very large datasets
 def cffilter(x, low=6, high=32, drift=True):
     """
     Christiano Fitzgerald asymmetric, random walk filter.
@@ -63,13 +73,40 @@ def cffilter(x, low=6, high=32, drift=True):

     .. plot:: plots/cff_plot.py
     """
-    pass
+    #TODO: cythonize/vectorize loop?, add ability for symmetric filter,
+    #      and estimates of theta other than random walk.
+    if low < 2:
+        raise ValueError("low must be >= 2")
+    pw = PandasWrapper(x)
+    x = array_like(x, 'x', ndim=2)
+    nobs, nseries = x.shape
+    a = 2*np.pi/high
+    b = 2*np.pi/low
+
+    if drift:  # get drift adjusted series
+        x = x - np.arange(nobs)[:, None] * (x[-1] - x[0]) / (nobs - 1)
+
+    J = np.arange(1, nobs + 1)
+    Bj = (np.sin(b * J) - np.sin(a * J)) / (np.pi * J)
+    B0 = (b - a) / np.pi
+    Bj = np.r_[B0, Bj][:, None]
+    y = np.zeros((nobs, nseries))
+
+    for i in range(nobs):
+        B = -.5 * Bj[0] - np.sum(Bj[1:-i - 2])
+        A = -Bj[0] - np.sum(Bj[1:-i - 2]) - np.sum(Bj[1:i]) - B
+        y[i] = (Bj[0] * x[i] + np.dot(Bj[1:-i - 2].T, x[i + 1:-1]) +
+                B * x[-1] + np.dot(Bj[1:i].T, x[1:i][::-1]) + A * x[0])
+    y = y.squeeze()
+
+    cycle, trend = y.squeeze(), x.squeeze() - y
+
+    return pw.wrap(cycle, append='cycle'), pw.wrap(trend, append='trend')


-if __name__ == '__main__':
+if __name__ == "__main__":
     import statsmodels as sm
-    dta = sm.datasets.macrodata.load().data[['infl', 'tbilrate']].view((
-        float, 2))[1:]
+    dta = sm.datasets.macrodata.load().data[['infl','tbilrate']].view((float,2))[1:]
     cycle, trend = cffilter(dta, 6, 32, drift=True)
     dta = sm.datasets.macrodata.load().data['tbilrate'][1:]
     cycle2, trend2 = cffilter(dta, 6, 32, drift=True)
diff --git a/statsmodels/tsa/filters/filtertools.py b/statsmodels/tsa/filters/filtertools.py
index 7c0a81644..ca63da825 100644
--- a/statsmodels/tsa/filters/filtertools.py
+++ b/statsmodels/tsa/filters/filtertools.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """Linear Filters for time series analysis and testing


@@ -8,17 +9,56 @@ Created on Sat Oct 23 17:18:03 2010

 Author: Josef-pktd
 """
+# not original copied from various experimental scripts
+# version control history is there
+
 import numpy as np
 import scipy.fftpack as fft
 from scipy import signal
+
 try:
     from scipy.signal._signaltools import _centered as trim_centered
 except ImportError:
+    # Must be using SciPy <1.8.0 where this function was moved (it's not a
+    # public SciPy function, but we need it here)
     from scipy.signal.signaltools import _centered as trim_centered
-from statsmodels.tools.validation import array_like, PandasWrapper

+from statsmodels.tools.validation import array_like, PandasWrapper

-def fftconvolveinv(in1, in2, mode='full'):
+def _pad_nans(x, head=None, tail=None):
+    if np.ndim(x) == 1:
+        if head is None and tail is None:
+            return x
+        elif head and tail:
+            return np.r_[[np.nan] * head, x, [np.nan] * tail]
+        elif tail is None:
+            return np.r_[[np.nan] * head, x]
+        elif head is None:
+            return np.r_[x, [np.nan] * tail]
+    elif np.ndim(x) == 2:
+        if head is None and tail is None:
+            return x
+        elif head and tail:
+            return np.r_[[[np.nan] * x.shape[1]] * head, x,
+                         [[np.nan] * x.shape[1]] * tail]
+        elif tail is None:
+            return np.r_[[[np.nan] * x.shape[1]] * head, x]
+        elif head is None:
+            return np.r_[x, [[np.nan] * x.shape[1]] * tail]
+    else:
+        raise ValueError("Nan-padding for ndim > 2 not implemented")
+
+#original changes and examples in sandbox.tsa.try_var_convolve
+
+# do not do these imports, here just for copied fftconvolve
+#get rid of these imports
+#from scipy.fftpack import fft, ifft, ifftshift, fft2, ifft2, fftn, \
+#     ifftn, fftfreq
+#from numpy import product,array
+
+
+# previous location in sandbox.tsa.try_var_convolve
+def fftconvolveinv(in1, in2, mode="full"):
     """
     Convolve two N-dimensional arrays using FFT. See convolve.

@@ -31,10 +71,38 @@ def fftconvolveinv(in1, in2, mode='full'):
     but it does not work for multidimensional inverse filter (fftn)
     original signal.fftconvolve also uses fftn
     """
-    pass
-
-
-def fftconvolve3(in1, in2=None, in3=None, mode='full'):
+    s1 = np.array(in1.shape)
+    s2 = np.array(in2.shape)
+    complex_result = (np.issubdtype(in1.dtype, np.complexfloating) or
+                      np.issubdtype(in2.dtype, np.complexfloating))
+    size = s1+s2-1
+
+    # Always use 2**n-sized FFT
+    fsize = 2**np.ceil(np.log2(size))
+    IN1 = fft.fftn(in1,fsize)
+    #IN1 *= fftn(in2,fsize) #JP: this looks like the only change I made
+    IN1 /= fft.fftn(in2,fsize)  # use inverse filter
+    # note the inverse is elementwise not matrix inverse
+    # is this correct, NO  does not seem to work for VARMA
+    fslice = tuple([slice(0, int(sz)) for sz in size])
+    ret = fft.ifftn(IN1)[fslice].copy()
+    del IN1
+    if not complex_result:
+        ret = ret.real
+    if mode == "full":
+        return ret
+    elif mode == "same":
+        if np.prod(s1, axis=0) > np.prod(s2, axis=0):
+            osize = s1
+        else:
+            osize = s2
+        return trim_centered(ret,osize)
+    elif mode == "valid":
+        return trim_centered(ret,abs(s2-s1)+1)
+
+
+#code duplication with fftconvolveinv
+def fftconvolve3(in1, in2=None, in3=None, mode="full"):
     """
     Convolve two N-dimensional arrays using FFT. See convolve.

@@ -53,9 +121,52 @@ def fftconvolve3(in1, in2=None, in3=None, mode='full'):
     but it does not work for multidimensional inverse filter (fftn)
     original signal.fftconvolve also uses fftn
     """
-    pass
-
-
+    if (in2 is None) and (in3 is None):
+        raise ValueError('at least one of in2 and in3 needs to be given')
+    s1 = np.array(in1.shape)
+    if in2 is not None:
+        s2 = np.array(in2.shape)
+    else:
+        s2 = 0
+    if in3 is not None:
+        s3 = np.array(in3.shape)
+        s2 = max(s2, s3) # try this looks reasonable for ARMA
+        #s2 = s3
+
+    complex_result = (np.issubdtype(in1.dtype, np.complexfloating) or
+                      np.issubdtype(in2.dtype, np.complexfloating))
+    size = s1+s2-1
+
+    # Always use 2**n-sized FFT
+    fsize = 2**np.ceil(np.log2(size))
+    #convolve shorter ones first, not sure if it matters
+    IN1 = in1.copy()  # TODO: Is this correct?
+    if in2 is not None:
+        IN1 = fft.fftn(in2, fsize)
+    if in3 is not None:
+        IN1 /= fft.fftn(in3, fsize)  # use inverse filter
+    # note the inverse is elementwise not matrix inverse
+    # is this correct, NO  does not seem to work for VARMA
+    IN1 *= fft.fftn(in1, fsize)
+    fslice = tuple([slice(0, int(sz)) for sz in size])
+    ret = fft.ifftn(IN1)[fslice].copy()
+    del IN1
+    if not complex_result:
+        ret = ret.real
+    if mode == "full":
+        return ret
+    elif mode == "same":
+        if np.prod(s1, axis=0) > np.prod(s2, axis=0):
+            osize = s1
+        else:
+            osize = s2
+        return trim_centered(ret,osize)
+    elif mode == "valid":
+        return trim_centered(ret,abs(s2-s1)+1)
+
+
+#original changes and examples in sandbox.tsa.try_var_convolve
+#examples and tests are there
 def recursive_filter(x, ar_coeff, init=None):
     """
     Autoregressive, or recursive, filtering.
@@ -85,7 +196,29 @@ def recursive_filter(x, ar_coeff, init=None):

     where n_coeff = len(n_coeff).
     """
-    pass
+    pw = PandasWrapper(x)
+    x = array_like(x, 'x')
+    ar_coeff = array_like(ar_coeff, 'ar_coeff')
+
+    if init is not None:  # integer init are treated differently in lfiltic
+        init = array_like(init, 'init')
+        if len(init) != len(ar_coeff):
+            raise ValueError("ar_coeff must be the same length as init")
+
+    if init is not None:
+        zi = signal.lfiltic([1], np.r_[1, -ar_coeff], init, x)
+    else:
+        zi = None
+
+    y = signal.lfilter([1.], np.r_[1, -ar_coeff], x, zi=zi)
+
+    if init is not None:
+        result = y[0]
+    else:
+        result = y
+
+    return pw.wrap(result)
+
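The recursion is easy to verify by hand for a single coefficient: with ar_coeff = [0.5], every output is the input plus half of the previous output.

    import numpy as np
    from statsmodels.tsa.filters.filtertools import recursive_filter

    x = np.ones(4)
    y = recursive_filter(x, np.array([0.5]))
    # y = [1, 1.5, 1.75, 1.875], i.e. y[t] = x[t] + 0.5 * y[t-1]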


 def convolution_filter(x, filt, nsides=2):
@@ -138,9 +271,41 @@ def convolution_filter(x, filt, nsides=2):
     fast for medium sized data. For large data fft convolution would be
     faster.
     """
-    pass
-
-
+    # for nsides shift the index instead of using 0 for 0 lag this
+    # allows correct handling of NaNs
+    if nsides == 1:
+        trim_head = len(filt) - 1
+        trim_tail = None
+    elif nsides == 2:
+        trim_head = int(np.ceil(len(filt)/2.) - 1) or None
+        trim_tail = int(np.ceil(len(filt)/2.) - len(filt) % 2) or None
+    else:  # pragma : no cover
+        raise ValueError("nsides must be 1 or 2")
+
+    pw = PandasWrapper(x)
+    x = array_like(x, 'x', maxdim=2)
+    filt = array_like(filt, 'filt', ndim=x.ndim)
+
+    if filt.ndim == 1 or min(filt.shape) == 1:
+        result = signal.convolve(x, filt, mode='valid')
+    else:  # filt.ndim == 2
+        nlags = filt.shape[0]
+        nvar = x.shape[1]
+        result = np.zeros((x.shape[0] - nlags + 1, nvar))
+        if nsides == 2:
+            for i in range(nvar):
+                # could also use np.convolve, but easier for swiching to fft
+                result[:, i] = signal.convolve(x[:, i], filt[:, i],
+                                               mode='valid')
+        elif nsides == 1:
+            for i in range(nvar):
+                result[:, i] = signal.convolve(x[:, i], np.r_[0, filt[:, i]],
+                                               mode='valid')
+    result = _pad_nans(result, trim_head, trim_tail)
+    return pw.wrap(result)
+
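For example, a centered three-point moving average (nsides=2) loses one value at each end, which comes back as NaN padding:

    import numpy as np
    from statsmodels.tsa.filters.filtertools import convolution_filter

    x = np.arange(1., 8.)  # 1, 2, ..., 7
    ma = convolution_filter(x, np.ones(3) / 3, nsides=2)
    # ma = [nan, 2, 3, 4, 5, 6, nan]: each interior value averages its neighbours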
+
+# previously located in sandbox.tsa.garch
 def miso_lfilter(ar, ma, x, useic=False):
     """
     Filter multiple time series into a single time series.
@@ -181,4 +346,19 @@ def miso_lfilter(ar, ma, x, useic=False):
     with shapes y (nobs,), x (nobs, nvars), ar (narlags,), and
     ma (narlags, nvars).
     """
-    pass
+    ma = array_like(ma, 'ma')
+    ar = array_like(ar, 'ar')
+    inp = signal.correlate(x, ma[::-1, :])[:, (x.shape[1] + 1) // 2]
+    # for testing 2d equivalence between convolve and correlate
+    #  inp2 = signal.convolve(x, ma[:,::-1])[:, (x.shape[1]+1)//2]
+    #  np.testing.assert_almost_equal(inp2, inp)
+    nobs = x.shape[0]
+    # cut of extra values at end
+
+    # TODO: initialize also x for correlate
+    if useic:
+        return signal.lfilter([1], ar, inp,
+                              zi=signal.lfiltic(np.array([1., 0.]), ar,
+                                                useic))[0][:nobs], inp[:nobs]
+    else:
+        return signal.lfilter([1], ar, inp)[:nobs], inp[:nobs]
diff --git a/statsmodels/tsa/filters/hp_filter.py b/statsmodels/tsa/filters/hp_filter.py
index 6e4d64287..221948096 100644
--- a/statsmodels/tsa/filters/hp_filter.py
+++ b/statsmodels/tsa/filters/hp_filter.py
@@ -1,3 +1,4 @@
+
 import numpy as np
 from scipy import sparse
 from scipy.sparse.linalg import spsolve
@@ -88,4 +89,16 @@ def hpfilter(x, lamb=1600):

     .. plot:: plots/hpf_plot.py
     """
-    pass
+    pw = PandasWrapper(x)
+    x = array_like(x, 'x', ndim=1)
+    nobs = len(x)
+    I = sparse.eye(nobs, nobs)  # noqa:E741
+    offsets = np.array([0, 1, 2])
+    data = np.repeat([[1.], [-2.], [1.]], nobs, axis=1)
+    K = sparse.dia_matrix((data, offsets), shape=(nobs - 2, nobs))
+
+    use_umfpack = True
+    trend = spsolve(I+lamb*K.T.dot(K), x, use_umfpack=use_umfpack)
+
+    cycle = x - trend
+    return pw.wrap(cycle, append='cycle'), pw.wrap(trend, append='trend')
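
A usage sketch on quarterly data with the conventional smoothing value lamb=1600:

    import statsmodels.api as sm

    gdp = sm.datasets.macrodata.load_pandas().data["realgdp"]
    cycle, trend = sm.tsa.filters.hpfilter(gdp, lamb=1600)
    # the decomposition satisfies gdp == cycle + trend up to numerical precision
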
diff --git a/statsmodels/tsa/forecasting/stl.py b/statsmodels/tsa/forecasting/stl.py
index 9dbdd8519..5d8d42171 100644
--- a/statsmodels/tsa/forecasting/stl.py
+++ b/statsmodels/tsa/forecasting/stl.py
@@ -1,8 +1,11 @@
 from statsmodels.compat.pandas import Substitution, is_int_index
+
 import datetime as dt
 from typing import Any, Dict, Optional, Union
+
 import numpy as np
 import pandas as pd
+
 from statsmodels.base.data import PandasData
 from statsmodels.iolib.summary import SimpleTable, Summary
 from statsmodels.tools.docstring import Docstring, Parameter, indent
@@ -10,25 +13,58 @@ from statsmodels.tsa.base.prediction import PredictionResults
 from statsmodels.tsa.base.tsa_model import get_index_loc, get_prediction_index
 from statsmodels.tsa.seasonal import STL, DecomposeResult
 from statsmodels.tsa.statespace.kalman_filter import _check_dynamic
+
 DateLike = Union[int, str, dt.datetime, pd.Timestamp, np.datetime64]
+
 ds = Docstring(STL.__doc__)
-ds.insert_parameters('endog', Parameter('model', 'Model', [
-    'The model used to forecast endog after the seasonality has been removed using STL'
-    ]))
-ds.insert_parameters('model', Parameter('model_kwargs', 'Dict[str, Any]', [
-    'Any additional arguments needed to initialized the model using the residuals produced by subtracting the seasonality.'
-    ]))
-_stl_forecast_params = ds.extract_parameters(['endog', 'model',
-    'model_kwargs', 'period', 'seasonal', 'trend', 'low_pass',
-    'seasonal_deg', 'trend_deg', 'low_pass_deg', 'robust', 'seasonal_jump',
-    'trend_jump', 'low_pass_jump'])
+ds.insert_parameters(
+    "endog",
+    Parameter(
+        "model",
+        "Model",
+        [
+            "The model used to forecast endog after the seasonality has been "
+            "removed using STL"
+        ],
+    ),
+)
+ds.insert_parameters(
+    "model",
+    Parameter(
+        "model_kwargs",
+        "Dict[str, Any]",
+        [
+            "Any additional arguments needed to initialized the model using "
+            "the residuals produced by subtracting the seasonality."
+        ],
+    ),
+)
+_stl_forecast_params = ds.extract_parameters(
+    [
+        "endog",
+        "model",
+        "model_kwargs",
+        "period",
+        "seasonal",
+        "trend",
+        "low_pass",
+        "seasonal_deg",
+        "trend_deg",
+        "low_pass_deg",
+        "robust",
+        "seasonal_jump",
+        "trend_jump",
+        "low_pass_jump",
+    ]
+)
+
 ds = Docstring(STL.fit.__doc__)
-_fit_params = ds.extract_parameters(['inner_iter', 'outer_iter'])
+_fit_params = ds.extract_parameters(["inner_iter", "outer_iter"])


-@Substitution(stl_forecast_params=indent(_stl_forecast_params, '    '))
+@Substitution(stl_forecast_params=indent(_stl_forecast_params, "    "))
 class STLForecast:
-    """
+    r"""
     Model-based forecasting using STL to remove seasonality

     Forecasts are produced by first subtracting the seasonality
@@ -52,12 +88,12 @@ class STLForecast:

     Notes
     -----
-    If :math:`\\hat{S}_t` is the seasonal component, then the deseasonalize
+    If :math:`\hat{S}_t` is the seasonal component, then the deseasonalized
     series is constructed as

     .. math::

-        Y_t - \\hat{S}_t
+        Y_t - \hat{S}_t

     The trend component is not removed, and so the time series model should
     be capable of adequately fitting and forecasting the trend if present. The
@@ -65,9 +101,9 @@ class STLForecast:

     .. math::

-        \\hat{S}_{T + h} = \\hat{S}_{T - k}
+        \hat{S}_{T + h} = \hat{S}_{T - k}

-    where :math:`k = m - h + m \\lfloor (h-1)/m \\rfloor` tracks the period
+    where :math:`k = m - h + m \lfloor (h-1)/m \rfloor` tracks the period
     offset in the full cycle of 1, 2, ..., m where m is the period length.

     This class is mostly a convenience wrapper around ``STL`` and a
@@ -113,29 +149,50 @@ class STLForecast:
     >>> forecasts = res.forecast(12)
     """

-    def __init__(self, endog, model, *, model_kwargs=None, period=None,
-        seasonal=7, trend=None, low_pass=None, seasonal_deg=1, trend_deg=1,
-        low_pass_deg=1, robust=False, seasonal_jump=1, trend_jump=1,
-        low_pass_jump=1):
+    def __init__(
+        self,
+        endog,
+        model,
+        *,
+        model_kwargs=None,
+        period=None,
+        seasonal=7,
+        trend=None,
+        low_pass=None,
+        seasonal_deg=1,
+        trend_deg=1,
+        low_pass_deg=1,
+        robust=False,
+        seasonal_jump=1,
+        trend_jump=1,
+        low_pass_jump=1,
+    ):
         self._endog = endog
-        self._stl_kwargs = dict(period=period, seasonal=seasonal, trend=
-            trend, low_pass=low_pass, seasonal_deg=seasonal_deg, trend_deg=
-            trend_deg, low_pass_deg=low_pass_deg, robust=robust,
-            seasonal_jump=seasonal_jump, trend_jump=trend_jump,
-            low_pass_jump=low_pass_jump)
+        self._stl_kwargs = dict(
+            period=period,
+            seasonal=seasonal,
+            trend=trend,
+            low_pass=low_pass,
+            seasonal_deg=seasonal_deg,
+            trend_deg=trend_deg,
+            low_pass_deg=low_pass_deg,
+            robust=robust,
+            seasonal_jump=seasonal_jump,
+            trend_jump=trend_jump,
+            low_pass_jump=low_pass_jump,
+        )
         self._model = model
         self._model_kwargs = {} if model_kwargs is None else model_kwargs
-        if not hasattr(model, 'fit'):
-            raise AttributeError('model must expose a ``fit``  method.')
+        if not hasattr(model, "fit"):
+            raise AttributeError("model must expose a ``fit``  method.")

-    @Substitution(fit_params=indent(_fit_params, ' ' * 8))
+    @Substitution(fit_params=indent(_fit_params, " " * 8))
     def fit(self, *, inner_iter=None, outer_iter=None, fit_kwargs=None):
         """
         Estimate STL and forecasting model parameters.

         Parameters
-        ----------
-%(fit_params)s
+        ----------\n%(fit_params)s
         fit_kwargs : Dict[str, Any]
             Any additional keyword arguments to pass to ``model``'s ``fit``
             method when estimating the model on the decomposed residuals.
@@ -145,7 +202,19 @@ class STLForecast:
         STLForecastResults
             Results with forecasting methods.
         """
-        pass
+        fit_kwargs = {} if fit_kwargs is None else fit_kwargs
+        stl = STL(self._endog, **self._stl_kwargs)
+        stl_fit: DecomposeResult = stl.fit(
+            inner_iter=inner_iter, outer_iter=outer_iter
+        )
+        model_endog = stl_fit.trend + stl_fit.resid
+        mod = self._model(model_endog, **self._model_kwargs)
+        res = mod.fit(**fit_kwargs)
+        if not hasattr(res, "forecast"):
+            raise AttributeError(
+                "The model's result must expose a ``forecast`` method."
+            )
+        return STLForecastResults(stl, stl_fit, mod, res, self._endog)


 class STLForecastResults:
@@ -164,48 +233,51 @@ class STLForecastResults:
         Model results instance supporting, at a minimum, ``forecast``.
     """

-    def __init__(self, stl: STL, result: DecomposeResult, model,
-        model_result, endog) ->None:
+    def __init__(
+        self, stl: STL, result: DecomposeResult, model, model_result, endog
+    ) -> None:
         self._stl = stl
         self._result = result
         self._model = model
         self._model_result = model_result
         self._endog = np.asarray(endog)
         self._nobs = self._endog.shape[0]
-        self._index = getattr(endog, 'index', pd.RangeIndex(self._nobs))
-        if not (isinstance(self._index, (pd.DatetimeIndex, pd.PeriodIndex)) or
-            is_int_index(self._index)):
+        self._index = getattr(endog, "index", pd.RangeIndex(self._nobs))
+        if not (
+            isinstance(self._index, (pd.DatetimeIndex, pd.PeriodIndex))
+            or is_int_index(self._index)
+        ):
             try:
                 self._index = pd.to_datetime(self._index)
             except ValueError:
                 self._index = pd.RangeIndex(self._nobs)

     @property
-    def period(self) ->int:
+    def period(self) -> int:
         """The period of the seasonal component"""
-        pass
+        return self._stl.period

     @property
-    def stl(self) ->STL:
+    def stl(self) -> STL:
         """The STL instance used to decompose the time series"""
-        pass
+        return self._stl

     @property
-    def result(self) ->DecomposeResult:
+    def result(self) -> DecomposeResult:
         """The result of applying STL to the data"""
-        pass
+        return self._result

     @property
-    def model(self) ->Any:
+    def model(self) -> Any:
         """The model fit to the additively deseasonalized data"""
-        pass
+        return self._model

     @property
-    def model_result(self) ->Any:
+    def model_result(self) -> Any:
         """The result class from the estimated model"""
-        pass
+        return self._model_result

-    def summary(self) ->Summary:
+    def summary(self) -> Summary:
         """
         Summary of both the STL decomposition and the model fit.

@@ -219,10 +291,52 @@ class STLForecastResults:
         Requires that the model's result class supports ``summary`` and
         returns a ``Summary`` object.
         """
-        pass
-
-    def _get_seasonal_prediction(self, start: Optional[DateLike], end:
-        Optional[DateLike], dynamic: Union[bool, DateLike]) ->np.ndarray:
+        if not hasattr(self._model_result, "summary"):
+            raise AttributeError(
+                "The model result does not have a summary attribute."
+            )
+        summary: Summary = self._model_result.summary()
+        if not isinstance(summary, Summary):
+            raise TypeError(
+                "The model result's summary is not a Summary object."
+            )
+        summary.tables[0].title = (
+            "STL Decomposition and " + summary.tables[0].title
+        )
+        config = self._stl.config
+        left_keys = ("period", "seasonal", "robust")
+        left_data = []
+        left_stubs = []
+        right_data = []
+        right_stubs = []
+        for key in config:
+            new = key.capitalize()
+            new = new.replace("_", " ")
+            if new in ("Trend", "Low Pass"):
+                new += " Length"
+            is_left = any(key.startswith(val) for val in left_keys)
+            new += ":"
+            stub = f"{new:<23s}"
+            val = f"{str(config[key]):>13s}"
+            if is_left:
+                left_stubs.append(stub)
+                left_data.append([val])
+            else:
+                right_stubs.append(" " * 6 + stub)
+                right_data.append([val])
+        tab = SimpleTable(
+            left_data, stubs=tuple(left_stubs), title="STL Configuration"
+        )
+        tab.extend_right(SimpleTable(right_data, stubs=right_stubs))
+        summary.tables.append(tab)
+        return summary
+
+    def _get_seasonal_prediction(
+        self,
+        start: Optional[DateLike],
+        end: Optional[DateLike],
+        dynamic: Union[bool, DateLike],
+    ) -> np.ndarray:
         """
         Get STLs seasonal in- and out-of-sample predictions

@@ -253,10 +367,40 @@ class STLForecastResults:
         ndarray
             Array containing the seasonal predictions.
         """
-        pass
-
-    def _seasonal_forecast(self, steps: int, index: Optional[pd.Index],
-        offset=None) ->Union[pd.Series, np.ndarray]:
+        data = PandasData(pd.Series(self._endog), index=self._index)
+        if start is None:
+            start = 0
+        (start, end, out_of_sample, prediction_index) = get_prediction_index(
+            start, end, self._nobs, self._index, data=data
+        )
+
+        if isinstance(dynamic, (str, dt.datetime, pd.Timestamp)):
+            dynamic, _, _ = get_index_loc(dynamic, self._index)
+            dynamic = dynamic - start
+        elif dynamic is True:
+            dynamic = 0
+        elif dynamic is False:
+            # If `dynamic=False`, then no dynamic predictions
+            dynamic = None
+        nobs = self._nobs
+        dynamic, _ = _check_dynamic(dynamic, start, end, nobs)
+        in_sample_end = end + 1 if dynamic is None else dynamic
+        seasonal = np.asarray(self._result.seasonal)
+        predictions = seasonal[start:in_sample_end]
+        oos = np.empty((0,))
+        if dynamic is not None:
+            num = out_of_sample + end + 1 - dynamic
+            oos = self._seasonal_forecast(num, None, offset=dynamic)
+        elif out_of_sample:
+            oos = self._seasonal_forecast(out_of_sample, None)
+            oos_start = max(start - nobs, 0)
+            oos = oos[oos_start:]
+        predictions = np.r_[predictions, oos]
+        return predictions
+
+    def _seasonal_forecast(
+        self, steps: int, index: Optional[pd.Index], offset=None
+    ) -> Union[pd.Series, np.ndarray]:
         """
         Get the seasonal component of the forecast

@@ -275,10 +419,20 @@ class STLForecastResults:
         seasonal : {ndarray, Series}
             The seasonal component.
         """
-        pass

-    def forecast(self, steps: int=1, **kwargs: Dict[str, Any]) ->Union[np.
-        ndarray, pd.Series]:
+        period = self.period
+        seasonal = np.asarray(self._result.seasonal)
+        offset = self._nobs if offset is None else offset
+        seasonal = seasonal[offset - period : offset]
+        seasonal = np.tile(seasonal, steps // period + ((steps % period) != 0))
+        seasonal = seasonal[:steps]
+        if index is not None:
+            seasonal = pd.Series(seasonal, index=index)
+        return seasonal
+
+    def forecast(
+        self, steps: int = 1, **kwargs: Dict[str, Any]
+    ) -> Union[np.ndarray, pd.Series]:
         """
         Out-of-sample forecasts

@@ -299,11 +453,17 @@ class STLForecastResults:
         forecast : {ndarray, Series}
             Out of sample forecasts
         """
-        pass
-
-    def get_prediction(self, start: Optional[DateLike]=None, end: Optional[
-        DateLike]=None, dynamic: Union[bool, DateLike]=False, **kwargs:
-        Dict[str, Any]):
+        forecast = self._model_result.forecast(steps=steps, **kwargs)
+        index = forecast.index if isinstance(forecast, pd.Series) else None
+        return forecast + self._seasonal_forecast(steps, index)
+
+    def get_prediction(
+        self,
+        start: Optional[DateLike] = None,
+        end: Optional[DateLike] = None,
+        dynamic: Union[bool, DateLike] = False,
+        **kwargs: Dict[str, Any],
+    ):
         """
         In-sample prediction and out-of-sample forecasting

@@ -339,4 +499,26 @@ class STLForecastResults:
             PredictionResults instance containing in-sample predictions,
             out-of-sample forecasts, and prediction intervals.
         """
-        pass
+        pred = self._model_result.get_prediction(
+            start=start, end=end, dynamic=dynamic, **kwargs
+        )
+        seasonal_prediction = self._get_seasonal_prediction(
+            start, end, dynamic
+        )
+        mean = pred.predicted_mean + seasonal_prediction
+        try:
+            var_pred_mean = pred.var_pred_mean
+        except (AttributeError, NotImplementedError):
+            # Allow models that do not return var_pred_mean
+            import warnings
+
+            warnings.warn(
+                "The variance of the predicted mean is not available using "
+                f"the {self.model.__class__.__name__} model class.",
+                UserWarning,
+                stacklevel=2,
+            )
+            var_pred_mean = np.nan + mean.copy()
+        return PredictionResults(
+            mean, var_pred_mean, dist="norm", row_labels=pred.row_labels
+        )
diff --git a/statsmodels/tsa/forecasting/theta.py b/statsmodels/tsa/forecasting/theta.py
index 6ff173497..8380a606b 100644
--- a/statsmodels/tsa/forecasting/theta.py
+++ b/statsmodels/tsa/forecasting/theta.py
@@ -1,4 +1,4 @@
-"""
+r"""
 Implementation of the Theta forecasting method of

 Assimakopoulos, V., & Nikolopoulos, K. (2000). The theta model: a decomposition
@@ -13,24 +13,39 @@ Fioruci, J. A., Pellegrini, T. R., Louzada, F., & Petropoulos, F. (2015).
 The optimized theta method. arXiv preprint arXiv:1503.03529.
 """
 from typing import TYPE_CHECKING, Optional, Tuple
+
 import numpy as np
 import pandas as pd
 from scipy import stats
+
 from statsmodels.iolib.summary import Summary
 from statsmodels.iolib.table import SimpleTable
-from statsmodels.tools.validation import array_like, bool_like, float_like, int_like, string_like
+from statsmodels.tools.validation import (
+    array_like,
+    bool_like,
+    float_like,
+    int_like,
+    string_like,
+)
 from statsmodels.tsa.deterministic import DeterministicTerm
 from statsmodels.tsa.seasonal import seasonal_decompose
-from statsmodels.tsa.statespace.exponential_smoothing import ExponentialSmoothing
+from statsmodels.tsa.statespace.exponential_smoothing import (
+    ExponentialSmoothing,
+)
 from statsmodels.tsa.statespace.sarimax import SARIMAX
 from statsmodels.tsa.stattools import acf
 from statsmodels.tsa.tsatools import add_trend, freq_to_period
+
 if TYPE_CHECKING:
     import matplotlib.figure


+def extend_index(steps: int, index: pd.Index) -> pd.Index:
+    return DeterministicTerm._extend_index(index, steps)
+
+
 class ThetaModel:
-    """
+    r"""
     The Theta forecasting model of Assimakopoulos and Nikolopoulos (2000)

     Parameters
@@ -74,20 +89,20 @@ class ThetaModel:

     .. math::

-       \\hat{X}_{T+h|T} = \\frac{\\theta-1}{\\theta} b_0
-                         \\left[h - 1 + \\frac{1}{\\alpha}
-                         - \\frac{(1-\\alpha)^T}{\\alpha} \\right]
-                         + \\tilde{X}_{T+h|T}
+       \hat{X}_{T+h|T} = \frac{\theta-1}{\theta} b_0
+                         \left[h - 1 + \frac{1}{\alpha}
+                         - \frac{(1-\alpha)^T}{\alpha} \right]
+                         + \tilde{X}_{T+h|T}

-    where :math:`\\tilde{X}_{T+h|T}` is the SES forecast of the endogenous
-    variable using the parameter :math:`\\alpha`. :math:`b_0` is the
+    where :math:`\tilde{X}_{T+h|T}` is the SES forecast of the endogenous
+    variable using the parameter :math:`\alpha`. :math:`b_0` is the
     slope of a time trend line fitted to X using the terms 0, 1, ..., T-1.

     The model is estimated in steps:

     1. Test for seasonality
     2. Deseasonalize if seasonality detected
-    3. Estimate :math:`\\alpha` by fitting a SES model to the data and
+    3. Estimate :math:`\alpha` by fitting a SES model to the data and
        :math:`b_0` by OLS.
     4. Forecast the series
     5. Reseasonalize if the data was deseasonalized.
@@ -110,40 +125,79 @@ class ThetaModel:
        (2015). The optimized theta method. arXiv preprint arXiv:1503.03529.
     """

-    def __init__(self, endog, *, period: Optional[int]=None, deseasonalize:
-        bool=True, use_test: bool=True, method: str='auto', difference:
-        bool=False) ->None:
-        self._y = array_like(endog, 'endog', ndim=1)
+    def __init__(
+        self,
+        endog,
+        *,
+        period: Optional[int] = None,
+        deseasonalize: bool = True,
+        use_test: bool = True,
+        method: str = "auto",
+        difference: bool = False
+    ) -> None:
+        self._y = array_like(endog, "endog", ndim=1)
         if isinstance(endog, pd.DataFrame):
             self.endog_orig = endog.iloc[:, 0]
         else:
             self.endog_orig = endog
-        self._period = int_like(period, 'period', optional=True)
-        self._deseasonalize = bool_like(deseasonalize, 'deseasonalize')
-        self._use_test = bool_like(use_test, 'use_test'
-            ) and self._deseasonalize
-        self._diff = bool_like(difference, 'difference')
-        self._method = string_like(method, 'model', options=('auto',
-            'additive', 'multiplicative', 'mul', 'add'))
-        if self._method == 'auto':
-            self._method = 'mul' if self._y.min() > 0 else 'add'
+        self._period = int_like(period, "period", optional=True)
+        self._deseasonalize = bool_like(deseasonalize, "deseasonalize")
+        self._use_test = (
+            bool_like(use_test, "use_test") and self._deseasonalize
+        )
+        self._diff = bool_like(difference, "difference")
+        self._method = string_like(
+            method,
+            "model",
+            options=("auto", "additive", "multiplicative", "mul", "add"),
+        )
+        if self._method == "auto":
+            self._method = "mul" if self._y.min() > 0 else "add"
         if self._period is None and self._deseasonalize:
-            idx = getattr(endog, 'index', None)
+            idx = getattr(endog, "index", None)
             pfreq = None
             if idx is not None:
-                pfreq = getattr(idx, 'freq', None)
+                pfreq = getattr(idx, "freq", None)
                 if pfreq is None:
-                    pfreq = getattr(idx, 'inferred_freq', None)
+                    pfreq = getattr(idx, "inferred_freq", None)
             if pfreq is not None:
                 self._period = freq_to_period(pfreq)
             else:
                 raise ValueError(
-                    'You must specify a period or endog must be a pandas object with a DatetimeIndex with a freq not set to None'
-                    )
+                    "You must specify a period or endog must be a "
+                    "pandas object with a DatetimeIndex with "
+                    "a freq not set to None"
+                )
+
         self._has_seasonality = self._deseasonalize

-    def fit(self, use_mle: bool=False, disp: bool=False) ->'ThetaModelResults':
-        """
+    def _test_seasonality(self) -> None:
+        y = self._y
+        if self._diff:
+            y = np.diff(y)
+        rho = acf(y, nlags=self._period, fft=True)
+        nobs = y.shape[0]
+        stat = nobs * rho[-1] ** 2 / np.sum(rho[:-1] ** 2)
+        # CV is 10% from a chi2(1), 1.645**2
+        self._has_seasonality = stat > 2.705543454095404
+
+    def _deseasonalize_data(self) -> Tuple[np.ndarray, np.ndarray]:
+        y = self._y
+        if not self._has_seasonality:
+            return self._y, np.empty(0)
+
+        res = seasonal_decompose(y, model=self._method, period=self._period)
+        if res.seasonal.min() <= 0:
+            self._method = "add"
+            res = seasonal_decompose(y, model="add", period=self._period)
+            return y - res.seasonal, res.seasonal[: self._period]
+        else:
+            return y / res.seasonal, res.seasonal[: self._period]
+
+    def fit(
+        self, use_mle: bool = False, disp: bool = False
+    ) -> "ThetaModelResults":
+        r"""
         Estimate model parameters.

         Parameters
@@ -162,52 +216,79 @@ class ThetaModel:

         .. math::

-           X_t = X_{t-1} + b_0 + (\\alpha-1)\\epsilon_{t-1} + \\epsilon_t
+           X_t = X_{t-1} + b_0 + (\alpha-1)\epsilon_{t-1} + \epsilon_t

         When estimating the model using 2-step estimation, the model
         parameters are estimated using the OLS regression

         .. math::

-           X_t = a_0 + b_0 (t-1) + \\eta_t
+           X_t = a_0 + b_0 (t-1) + \eta_t

         and the SES

         .. math::

-           \\tilde{X}_{t+1} = \\alpha X_{t} + (1-\\alpha)\\tilde{X}_{t}
+           \tilde{X}_{t+1} = \alpha X_{t} + (1-\alpha)\tilde{X}_{t}

         Returns
         -------
         ThetaModelResult
             Model results and forecasting
         """
-        pass
+        if self._deseasonalize and self._use_test:
+            self._test_seasonality()
+        y, seasonal = self._deseasonalize_data()
+        if use_mle:
+            mod = SARIMAX(y, order=(0, 1, 1), trend="c")
+            res = mod.fit(disp=disp)
+            params = np.asarray(res.params)
+            alpha = params[1] + 1
+            if alpha > 1:
+                alpha = 0.9998
+                res = mod.fit_constrained({"ma.L1": alpha - 1})
+                params = np.asarray(res.params)
+            b0 = params[0]
+            sigma2 = params[-1]
+            one_step = res.forecast(1) - b0
+        else:
+            ct = add_trend(y, "ct", prepend=True)[:, :2]
+            ct[:, 1] -= 1
+            _, b0 = np.linalg.lstsq(ct, y, rcond=None)[0]
+            res = ExponentialSmoothing(
+                y, initial_level=y[0], initialization_method="known"
+            ).fit(disp=disp)
+            alpha = res.params[0]
+            sigma2 = None
+            one_step = res.forecast(1)
+        return ThetaModelResults(
+            b0, alpha, sigma2, one_step, seasonal, use_mle, self
+        )

     @property
-    def deseasonalize(self) ->bool:
+    def deseasonalize(self) -> bool:
         """Whether to deseasonalize the data"""
-        pass
+        return self._deseasonalize

     @property
-    def period(self) ->int:
+    def period(self) -> int:
         """The period of the seasonality"""
-        pass
+        return self._period

     @property
-    def use_test(self) ->bool:
+    def use_test(self) -> bool:
         """Whether to test the data for seasonality"""
-        pass
+        return self._use_test

     @property
-    def difference(self) ->bool:
+    def difference(self) -> bool:
         """Whether the data is differenced in the seasonality test"""
-        pass
+        return self._diff

     @property
-    def method(self) ->str:
+    def method(self) -> str:
         """The method used to deseasonalize the data"""
-        pass
+        return self._method


 class ThetaModelResults:
@@ -232,9 +313,16 @@ class ThetaModelResults:
         The model used to produce the results.
     """

-    def __init__(self, b0: float, alpha: float, sigma2: Optional[float],
-        one_step: float, seasonal: np.ndarray, use_mle: bool, model: ThetaModel
-        ) ->None:
+    def __init__(
+        self,
+        b0: float,
+        alpha: float,
+        sigma2: Optional[float],
+        one_step: float,
+        seasonal: np.ndarray,
+        use_mle: bool,
+        model: ThetaModel,
+    ) -> None:
         self._b0 = b0
         self._alpha = alpha
         self._sigma2 = sigma2
@@ -245,22 +333,27 @@ class ThetaModelResults:
         self._use_mle = use_mle

     @property
-    def params(self) ->pd.Series:
+    def params(self) -> pd.Series:
         """The forecasting model parameters"""
-        pass
+        return pd.Series([self._b0, self._alpha], index=["b0", "alpha"])

     @property
-    def sigma2(self) ->float:
+    def sigma2(self) -> float:
         """The estimated residual variance"""
-        pass
+        if self._sigma2 is None:
+            mod = SARIMAX(self.model._y, order=(0, 1, 1), trend="c")
+            res = mod.fit(disp=False)
+            self._sigma2 = np.asarray(res.params)[-1]
+        assert self._sigma2 is not None
+        return self._sigma2

     @property
-    def model(self) ->ThetaModel:
+    def model(self) -> ThetaModel:
         """The model used to produce the results"""
-        pass
+        return self._model

-    def forecast(self, steps: int=1, theta: float=2) ->pd.Series:
-        """
+    def forecast(self, steps: int = 1, theta: float = 2) -> pd.Series:
+        r"""
         Forecast the model for a given theta

         Parameters
@@ -282,13 +375,13 @@ class ThetaModelResults:

         .. math::

-           \\hat{X}_{T+h|T} = \\frac{\\theta-1}{\\theta} b_0
-                             \\left[h - 1 + \\frac{1}{\\alpha}
-                             - \\frac{(1-\\alpha)^T}{\\alpha} \\right]
-                             + \\tilde{X}_{T+h|T}
+           \hat{X}_{T+h|T} = \frac{\theta-1}{\theta} b_0
+                             \left[h - 1 + \frac{1}{\alpha}
+                             - \frac{(1-\alpha)^T}{\alpha} \right]
+                             + \tilde{X}_{T+h|T}

-        where :math:`\\tilde{X}_{T+h|T}` is the SES forecast of the endogenous
-        variable using the parameter :math:`\\alpha`. :math:`b_0` is the
+        where :math:`\tilde{X}_{T+h|T}` is the SES forecast of the endogenous
+        variable using the parameter :math:`\alpha`. :math:`b_0` is the
         slope of a time trend line fitted to X using the terms 0, 1, ..., T-1.

         This expression follows from [1]_ and [2]_ when the combination
@@ -304,10 +397,30 @@ class ThetaModelResults:
            F. (2015). The optimized theta method. arXiv preprint
            arXiv:1503.03529.
         """
-        pass

-    def forecast_components(self, steps: int=1) ->pd.DataFrame:
-        """
+        steps = int_like(steps, "steps")
+        if steps < 1:
+            raise ValueError("steps must be a positive integer")
+        theta = float_like(theta, "theta")
+        if theta < 1:
+            raise ValueError("theta must be a float >= 1")
+        thresh = 4.0 / np.finfo(np.double).eps
+        trend_weight = (theta - 1) / theta if theta < thresh else 1.0
+        comp = self.forecast_components(steps=steps)
+        fcast = trend_weight * comp.trend + np.asarray(comp.ses)
+        # Re-seasonalize if needed
+        if self.model.deseasonalize:
+            seasonal = np.asarray(comp.seasonal)
+            if self.model.method.startswith("mul"):
+                fcast *= seasonal
+            else:
+                fcast += seasonal
+        fcast.name = "forecast"
+
+        return fcast
+
+    def forecast_components(self, steps: int = 1) -> pd.DataFrame:
+        r"""
         Compute the three components of the Theta model forecast

         Parameters
@@ -323,15 +436,46 @@ class ThetaModelResults:

         Notes
         -----
-        For a given value of :math:`\\theta`, the deseasonalized forecast is
-        `fcast = w * trend + ses` where :math:`w = \\frac{theta - 1}{theta}`.
+        For a given value of :math:`\theta`, the deseasonalized forecast is
+        `fcast = w * trend + ses` where :math:`w = \frac{\theta - 1}{\theta}`.
         The reseasonalized forecasts are then `seasonal * fcast` if the
         seasonality is multiplicative or `seasonal + fcast` if the seasonality
         is additive.
         """
-        pass
-
-    def summary(self) ->Summary:
+        steps = int_like(steps, "steps")
+        if steps < 1:
+            raise ValueError("steps must be a positive integer")
+        alpha = self._alpha
+        b0 = self._b0
+        nobs = self._nobs
+        h = np.arange(1, steps + 1, dtype=np.float64) - 1
+        if alpha > 0:
+            h += 1 / alpha - ((1 - alpha) ** nobs / alpha)
+        trend = b0 * h
+        ses = self._one_step * np.ones(steps)
+        if self.model.method.startswith("add"):
+            season = np.zeros(steps)
+        else:
+            season = np.ones(steps)
+        # Re-seasonalize
+        if self.model.deseasonalize:
+            seasonal = self._seasonal
+            period = self.model.period
+            oos_idx = nobs + np.arange(steps)
+            seasonal_locs = oos_idx % period
+            if seasonal.shape[0]:
+                season[:] = seasonal[seasonal_locs]
+        index = getattr(self.model.endog_orig, "index", None)
+        if index is None:
+            index = pd.RangeIndex(0, self.model.endog_orig.shape[0])
+        index = extend_index(steps, index)
+
+        df = pd.DataFrame(
+            {"trend": trend, "ses": ses, "seasonal": season}, index=index
+        )
+        return df
+
+    def summary(self) -> Summary:
         """
         Summarize the model

@@ -345,11 +489,71 @@ class ThetaModelResults:
         --------
         statsmodels.iolib.summary.Summary
         """
-        pass
-
-    def prediction_intervals(self, steps: int=1, theta: float=2, alpha:
-        float=0.05) ->pd.DataFrame:
-        """
+        model = self.model
+        smry = Summary()
+
+        model_name = type(model).__name__
+        title = model_name + " Results"
+        method = "MLE" if self._use_mle else "OLS/SES"
+
+        is_series = isinstance(model.endog_orig, pd.Series)
+        index = getattr(model.endog_orig, "index", None)
+        if is_series and isinstance(index, (pd.DatetimeIndex, pd.PeriodIndex)):
+            sample = [index[0].strftime("%m-%d-%Y")]
+            sample += ["- " + index[-1].strftime("%m-%d-%Y")]
+        else:
+            sample = [str(0), str(model.endog_orig.shape[0])]
+
+        dep_name = getattr(model.endog_orig, "name", "endog") or "endog"
+        top_left = [
+            ("Dep. Variable:", [dep_name]),
+            ("Method:", [method]),
+            ("Date:", None),
+            ("Time:", None),
+            ("Sample:", [sample[0]]),
+            ("", [sample[1]]),
+        ]
+        method = (
+            "Multiplicative" if model.method.startswith("mul") else "Additive"
+        )
+        top_right = [
+            ("No. Observations:", [str(self._nobs)]),
+            ("Deseasonalized:", [str(model.deseasonalize)]),
+        ]
+
+        if model.deseasonalize:
+            top_right.extend(
+                [
+                    ("Deseas. Method:", [method]),
+                    ("Period:", [str(model.period)]),
+                    ("", [""]),
+                    ("", [""]),
+                ]
+            )
+        else:
+            top_right.extend([("", [""])] * 4)
+
+        smry.add_table_2cols(
+            self, gleft=top_left, gright=top_right, title=title
+        )
+        table_fmt = {"data_fmts": ["%s", "%#0.4g"], "data_aligns": "r"}
+
+        data = np.asarray(self.params)[:, None]
+        st = SimpleTable(
+            data,
+            ["Parameters", "Estimate"],
+            list(self.params.index),
+            title="Parameter Estimates",
+            txt_fmt=table_fmt,
+        )
+        smry.tables.append(st)
+
+        return smry
+
+    def prediction_intervals(
+        self, steps: int = 1, theta: float = 2, alpha: float = 0.05
+    ) -> pd.DataFrame:
+        r"""
         Parameters
         ----------
         steps : int, default 1
@@ -369,16 +573,33 @@ class ThetaModelResults:
         -----
         The variance of the h-step forecast is assumed to follow from the
         integrated Moving Average structure of the Theta model, and so is
-        :math:`\\sigma^2(1 + (h-1)(1 + (\\alpha-1)^2)`. The prediction interval
+        :math:`\sigma^2(1 + (h-1)(1 + (\alpha-1)^2))`. The prediction interval
         assumes that innovations are normally distributed.
         """
-        pass
-
-    def plot_predict(self, steps: int=1, theta: float=2, alpha: Optional[
-        float]=0.05, in_sample: bool=False, fig: Optional[
-        'matplotlib.figure.Figure']=None, figsize: Tuple[float, float]=None
-        ) ->'matplotlib.figure.Figure':
-        """
+        model_alpha = self.params.iloc[1]
+        sigma2_h = (
+            1 + np.arange(steps) * (1 + (model_alpha - 1) ** 2)
+        ) * self.sigma2
+        sigma_h = np.sqrt(sigma2_h)
+        quantile = stats.norm.ppf(alpha / 2)
+        predictions = self.forecast(steps, theta)
+        return pd.DataFrame(
+            {
+                "lower": predictions + sigma_h * quantile,
+                "upper": predictions + sigma_h * -quantile,
+            }
+        )
+
+    def plot_predict(
+        self,
+        steps: int = 1,
+        theta: float = 2,
+        alpha: Optional[float] = 0.05,
+        in_sample: bool = False,
+        fig: Optional["matplotlib.figure.Figure"] = None,
+        figsize: Tuple[float, float] = None,
+    ) -> "matplotlib.figure.Figure":
+        r"""
         Plot forecasts, prediction intervals and in-sample values

         Parameters
@@ -411,7 +632,38 @@ class ThetaModelResults:
         -----
         The variance of the h-step forecast is assumed to follow from the
         integrated Moving Average structure of the Theta model, and so is
-        :math:`\\sigma^2(\\alpha^2 + (h-1))`. The prediction interval assumes
+        :math:`\sigma^2(\alpha^2 + (h-1))`. The prediction interval assumes
         that innovations are normally distributed.
         """
-        pass
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+
+        _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+        assert fig is not None
+        predictions = self.forecast(steps, theta)
+        pred_index = predictions.index
+
+        ax = fig.add_subplot(111)
+        nobs = self.model.endog_orig.shape[0]
+        index = pd.Index(np.arange(nobs))
+        if in_sample:
+            if isinstance(self.model.endog_orig, pd.Series):
+                index = self.model.endog_orig.index
+            ax.plot(index, self.model.endog_orig)
+        ax.plot(pred_index, predictions)
+        if alpha is not None:
+            pi = self.prediction_intervals(steps, theta, alpha)
+            label = "{0:.0%} confidence interval".format(1 - alpha)
+            ax.fill_between(
+                pred_index,
+                pi["lower"],
+                pi["upper"],
+                color="gray",
+                alpha=0.5,
+                label=label,
+            )
+
+        ax.legend(loc="best", frameon=False)
+        fig.tight_layout(pad=1.0)
+
+        return fig
diff --git a/statsmodels/tsa/holtwinters/_smoothers.py b/statsmodels/tsa/holtwinters/_smoothers.py
index 23a0df632..3f198cd07 100644
--- a/statsmodels/tsa/holtwinters/_smoothers.py
+++ b/statsmodels/tsa/holtwinters/_smoothers.py
@@ -1,9 +1,9 @@
 import numpy as np
+
 LOWER_BOUND = np.sqrt(np.finfo(float).eps)


 class HoltWintersArgs:
-
     def __init__(self, xi, p, bounds, y, m, n, transform=False):
         self._xi = xi
         self._p = p
@@ -16,6 +16,54 @@ class HoltWintersArgs:
         self._n = n
         self._transform = transform

+    @property
+    def xi(self):
+        return self._xi
+
+    @xi.setter
+    def xi(self, value):
+        self._xi = value
+
+    @property
+    def p(self):
+        return self._p
+
+    @property
+    def bounds(self):
+        return self._bounds
+
+    @property
+    def y(self):
+        return self._y
+
+    @property
+    def lvl(self):
+        return self._lvl
+
+    @property
+    def b(self):
+        return self._b
+
+    @property
+    def s(self):
+        return self._s
+
+    @property
+    def m(self):
+        return self._m
+
+    @property
+    def n(self):
+        return self._n
+
+    @property
+    def transform(self):
+        return self._transform
+
+    @transform.setter
+    def transform(self, value):
+        self._transform = value
+

 def to_restricted(p, sel, bounds):
     """
@@ -38,7 +86,22 @@ def to_restricted(p, sel, bounds):
     -------

     """
-    pass
+    a, b, g = p[:3]
+
+    if sel[0]:
+        lb = max(LOWER_BOUND, bounds[0, 0])
+        ub = min(1 - LOWER_BOUND, bounds[0, 1])
+        a = lb + a * (ub - lb)
+    if sel[1]:
+        lb = bounds[1, 0]
+        ub = min(a, bounds[1, 1])
+        b = lb + b * (ub - lb)
+    if sel[2]:
+        lb = bounds[2, 0]
+        ub = min(1.0 - a, bounds[2, 1])
+        g = lb + g * (ub - lb)
+
+    return a, b, g


 def to_unrestricted(p, sel, bounds):
@@ -55,14 +118,52 @@ def to_unrestricted(p, sel, bounds):
     ndarray
         Parameters all in (0,1)
     """
-    pass
+    # eps < a < 1 - eps
+    # eps < b <= a
+    # eps < g <= 1 - a
+
+    a, b, g = p[:3]
+
+    if sel[0]:
+        lb = max(LOWER_BOUND, bounds[0, 0])
+        ub = min(1 - LOWER_BOUND, bounds[0, 1])
+        a = (a - lb) / (ub - lb)
+    if sel[1]:
+        lb = bounds[1, 0]
+        ub = min(p[0], bounds[1, 1])
+        b = (b - lb) / (ub - lb)
+    if sel[2]:
+        lb = bounds[2, 0]
+        ub = min(1.0 - p[0], bounds[2, 1])
+        g = (g - lb) / (ub - lb)
+
+    return a, b, g


 def holt_init(x, hw_args: HoltWintersArgs):
     """
     Initialization for the Holt Models
     """
-    pass
+    # Map back to the full set of parameters
+    hw_args.p[hw_args.xi.astype(bool)] = x
+
+    # Ensure alpha and beta satisfy the requirements
+    if hw_args.transform:
+        alpha, beta, _ = to_restricted(hw_args.p, hw_args.xi, hw_args.bounds)
+    else:
+        alpha, beta = hw_args.p[:2]
+    # Level, trend and dampening
+    l0, b0, phi = hw_args.p[3:6]
+    # Save repeated calculations
+    alphac = 1 - alpha
+    betac = 1 - beta
+    # Setup alpha * y
+    y_alpha = alpha * hw_args.y
+    # In-place operations
+    hw_args.lvl[0] = l0
+    hw_args.b[0] = b0
+
+    return alpha, beta, phi, alphac, betac, y_alpha


 def holt__(x, hw_args: HoltWintersArgs):
@@ -71,7 +172,12 @@ def holt__(x, hw_args: HoltWintersArgs):
     Minimization Function
     (,)
     """
-    pass
+    _, _, _, alphac, _, y_alpha = holt_init(x, hw_args)
+    n = hw_args.n
+    lvl = hw_args.lvl
+    for i in range(1, n):
+        lvl[i] = (y_alpha[i - 1]) + (alphac * (lvl[i - 1]))
+    return hw_args.y - lvl


 def holt_mul_dam(x, hw_args: HoltWintersArgs):
@@ -80,7 +186,13 @@ def holt_mul_dam(x, hw_args: HoltWintersArgs):
     Minimization Function
     (M,) & (Md,)
     """
-    pass
+    _, beta, phi, alphac, betac, y_alpha = holt_init(x, hw_args)
+    lvl = hw_args.lvl
+    b = hw_args.b
+    for i in range(1, hw_args.n):
+        lvl[i] = (y_alpha[i - 1]) + (alphac * (lvl[i - 1] * b[i - 1] ** phi))
+        b[i] = (beta * (lvl[i] / lvl[i - 1])) + (betac * b[i - 1] ** phi)
+    return hw_args.y - lvl * b**phi


 def holt_add_dam(x, hw_args: HoltWintersArgs):
@@ -89,12 +201,39 @@ def holt_add_dam(x, hw_args: HoltWintersArgs):
     Minimization Function
     (A,) & (Ad,)
     """
-    pass
+    _, beta, phi, alphac, betac, y_alpha = holt_init(x, hw_args)
+    lvl = hw_args.lvl
+    b = hw_args.b
+    for i in range(1, hw_args.n):
+        lvl[i] = (y_alpha[i - 1]) + (alphac * (lvl[i - 1] + phi * b[i - 1]))
+        b[i] = (beta * (lvl[i] - lvl[i - 1])) + (betac * phi * b[i - 1])
+    return hw_args.y - (lvl + phi * b)


 def holt_win_init(x, hw_args: HoltWintersArgs):
     """Initialization for the Holt Winters Seasonal Models"""
-    pass
+    hw_args.p[hw_args.xi.astype(bool)] = x
+    if hw_args.transform:
+        alpha, beta, gamma = to_restricted(
+            hw_args.p, hw_args.xi, hw_args.bounds
+        )
+    else:
+        alpha, beta, gamma = hw_args.p[:3]
+
+    l0, b0, phi = hw_args.p[3:6]
+    s0 = hw_args.p[6:]
+    alphac = 1 - alpha
+    betac = 1 - beta
+    gammac = 1 - gamma
+    y_alpha = alpha * hw_args.y
+    y_gamma = gamma * hw_args.y
+    hw_args.lvl[:] = 0
+    hw_args.b[:] = 0
+    hw_args.s[:] = 0
+    hw_args.lvl[0] = l0
+    hw_args.b[0] = b0
+    hw_args.s[: hw_args.m] = s0
+    return alpha, beta, gamma, phi, alphac, betac, gammac, y_alpha, y_gamma


 def holt_win__mul(x, hw_args: HoltWintersArgs):
@@ -103,7 +242,16 @@ def holt_win__mul(x, hw_args: HoltWintersArgs):
     Minimization Function
     (,M)
     """
-    pass
+    (_, _, _, _, alphac, _, gammac, y_alpha, y_gamma) = holt_win_init(
+        x, hw_args
+    )
+    lvl = hw_args.lvl
+    s = hw_args.s
+    m = hw_args.m
+    for i in range(1, hw_args.n):
+        lvl[i] = (y_alpha[i - 1] / s[i - 1]) + (alphac * (lvl[i - 1]))
+        s[i + m - 1] = (y_gamma[i - 1] / (lvl[i - 1])) + (gammac * s[i - 1])
+    return hw_args.y - lvl * s[: -(m - 1)]


 def holt_win__add(x, hw_args: HoltWintersArgs):
@@ -112,7 +260,20 @@ def holt_win__add(x, hw_args: HoltWintersArgs):
     Minimization Function
     (,A)
     """
-    pass
+    (alpha, _, gamma, _, alphac, _, gammac, y_alpha, y_gamma) = holt_win_init(
+        x, hw_args
+    )
+    lvl = hw_args.lvl
+    s = hw_args.s
+    m = hw_args.m
+    for i in range(1, hw_args.n):
+        lvl[i] = (
+            (y_alpha[i - 1]) - (alpha * s[i - 1]) + (alphac * (lvl[i - 1]))
+        )
+        s[i + m - 1] = (
+            y_gamma[i - 1] - (gamma * (lvl[i - 1])) + (gammac * s[i - 1])
+        )
+    return hw_args.y - lvl - s[: -(m - 1)]


 def holt_win_add_mul_dam(x, hw_args: HoltWintersArgs):
@@ -121,7 +282,30 @@ def holt_win_add_mul_dam(x, hw_args: HoltWintersArgs):
     Minimization Function
     (A,M) & (Ad,M)
     """
-    pass
+    (
+        _,
+        beta,
+        _,
+        phi,
+        alphac,
+        betac,
+        gammac,
+        y_alpha,
+        y_gamma,
+    ) = holt_win_init(x, hw_args)
+    lvl = hw_args.lvl
+    b = hw_args.b
+    s = hw_args.s
+    m = hw_args.m
+    for i in range(1, hw_args.n):
+        lvl[i] = (y_alpha[i - 1] / s[i - 1]) + (
+            alphac * (lvl[i - 1] + phi * b[i - 1])
+        )
+        b[i] = (beta * (lvl[i] - lvl[i - 1])) + (betac * phi * b[i - 1])
+        s[i + m - 1] = (y_gamma[i - 1] / (lvl[i - 1] + phi * b[i - 1])) + (
+            gammac * s[i - 1]
+        )
+    return hw_args.y - (lvl + phi * b) * s[: -(m - 1)]


 def holt_win_mul_mul_dam(x, hw_args: HoltWintersArgs):
@@ -130,7 +314,30 @@ def holt_win_mul_mul_dam(x, hw_args: HoltWintersArgs):
     Minimization Function
     (M,M) & (Md,M)
     """
-    pass
+    (
+        _,
+        beta,
+        _,
+        phi,
+        alphac,
+        betac,
+        gammac,
+        y_alpha,
+        y_gamma,
+    ) = holt_win_init(x, hw_args)
+    lvl = hw_args.lvl
+    s = hw_args.s
+    b = hw_args.b
+    m = hw_args.m
+    for i in range(1, hw_args.n):
+        lvl[i] = (y_alpha[i - 1] / s[i - 1]) + (
+            alphac * (lvl[i - 1] * b[i - 1] ** phi)
+        )
+        b[i] = (beta * (lvl[i] / lvl[i - 1])) + (betac * b[i - 1] ** phi)
+        s[i + m - 1] = (y_gamma[i - 1] / (lvl[i - 1] * b[i - 1] ** phi)) + (
+            gammac * s[i - 1]
+        )
+    return hw_args.y - (lvl * b**phi) * s[: -(m - 1)]


 def holt_win_add_add_dam(x, hw_args: HoltWintersArgs):
@@ -139,7 +346,34 @@ def holt_win_add_add_dam(x, hw_args: HoltWintersArgs):
     Minimization Function
     (A,A) & (Ad,A)
     """
-    pass
+    (
+        alpha,
+        beta,
+        gamma,
+        phi,
+        alphac,
+        betac,
+        gammac,
+        y_alpha,
+        y_gamma,
+    ) = holt_win_init(x, hw_args)
+    lvl = hw_args.lvl
+    s = hw_args.s
+    b = hw_args.b
+    m = hw_args.m
+    for i in range(1, hw_args.n):
+        lvl[i] = (
+            (y_alpha[i - 1])
+            - (alpha * s[i - 1])
+            + (alphac * (lvl[i - 1] + phi * b[i - 1]))
+        )
+        b[i] = (beta * (lvl[i] - lvl[i - 1])) + (betac * phi * b[i - 1])
+        s[i + m - 1] = (
+            y_gamma[i - 1]
+            - (gamma * (lvl[i - 1] + phi * b[i - 1]))
+            + (gammac * s[i - 1])
+        )
+    return hw_args.y - ((lvl + phi * b) + s[: -(m - 1)])


 def holt_win_mul_add_dam(x, hw_args: HoltWintersArgs):
@@ -148,4 +382,31 @@ def holt_win_mul_add_dam(x, hw_args: HoltWintersArgs):
     Minimization Function
     (M,A) & (M,Ad)
     """
-    pass
+    (
+        alpha,
+        beta,
+        gamma,
+        phi,
+        alphac,
+        betac,
+        gammac,
+        y_alpha,
+        y_gamma,
+    ) = holt_win_init(x, hw_args)
+    lvl = hw_args.lvl
+    s = hw_args.s
+    b = hw_args.b
+    m = hw_args.m
+    for i in range(1, hw_args.n):
+        lvl[i] = (
+            (y_alpha[i - 1])
+            - (alpha * s[i - 1])
+            + (alphac * (lvl[i - 1] * b[i - 1] ** phi))
+        )
+        b[i] = (beta * (lvl[i] / lvl[i - 1])) + (betac * b[i - 1] ** phi)
+        s[i + m - 1] = (
+            y_gamma[i - 1]
+            - (gamma * (lvl[i - 1] * b[i - 1] ** phi))
+            + (gammac * s[i - 1])
+        )
+    return hw_args.y - ((lvl * phi * b) + s[: -(m - 1)])
diff --git a/statsmodels/tsa/holtwinters/model.py b/statsmodels/tsa/holtwinters/model.py
index a7e261e07..1e8d885a6 100644
--- a/statsmodels/tsa/holtwinters/model.py
+++ b/statsmodels/tsa/holtwinters/model.py
@@ -12,34 +12,78 @@ Author: Terence L van Zyl
 Modified: Kevin Sheppard
 """
 from statsmodels.compat.pandas import deprecate_kwarg
+
 import contextlib
 from typing import Any, Hashable, Sequence
 import warnings
+
 import numpy as np
 import pandas as pd
 from scipy.optimize import basinhopping, least_squares, minimize
 from scipy.special import inv_boxcox
 from scipy.stats import boxcox
-from statsmodels.tools.validation import array_like, bool_like, dict_like, float_like, int_like, string_like
+
+from statsmodels.tools.validation import (
+    array_like,
+    bool_like,
+    dict_like,
+    float_like,
+    int_like,
+    string_like,
+)
 from statsmodels.tsa.base.tsa_model import TimeSeriesModel
-from statsmodels.tsa.exponential_smoothing.ets import _initialization_heuristic, _initialization_simple
-from statsmodels.tsa.holtwinters import _exponential_smoothers as smoothers, _smoothers as py_smoothers
+from statsmodels.tsa.exponential_smoothing.ets import (
+    _initialization_heuristic,
+    _initialization_simple,
+)
+from statsmodels.tsa.holtwinters import (
+    _exponential_smoothers as smoothers,
+    _smoothers as py_smoothers,
+)
 from statsmodels.tsa.holtwinters._exponential_smoothers import HoltWintersArgs
-from statsmodels.tsa.holtwinters._smoothers import to_restricted, to_unrestricted
-from statsmodels.tsa.holtwinters.results import HoltWintersResults, HoltWintersResultsWrapper
+from statsmodels.tsa.holtwinters._smoothers import (
+    to_restricted,
+    to_unrestricted,
+)
+from statsmodels.tsa.holtwinters.results import (
+    HoltWintersResults,
+    HoltWintersResultsWrapper,
+)
 from statsmodels.tsa.tsatools import freq_to_period
-SMOOTHERS = {('mul', 'add'): smoothers.holt_win_add_mul_dam, ('mul', 'mul'):
-    smoothers.holt_win_mul_mul_dam, ('mul', None): smoothers.holt_win__mul,
-    ('add', 'add'): smoothers.holt_win_add_add_dam, ('add', 'mul'):
-    smoothers.holt_win_mul_add_dam, ('add', None): smoothers.holt_win__add,
-    (None, 'add'): smoothers.holt_add_dam, (None, 'mul'): smoothers.
-    holt_mul_dam, (None, None): smoothers.holt__}
-PY_SMOOTHERS = {('mul', 'add'): py_smoothers.holt_win_add_mul_dam, ('mul',
-    'mul'): py_smoothers.holt_win_mul_mul_dam, ('mul', None): py_smoothers.
-    holt_win__mul, ('add', 'add'): py_smoothers.holt_win_add_add_dam, (
-    'add', 'mul'): py_smoothers.holt_win_mul_add_dam, ('add', None):
-    py_smoothers.holt_win__add, (None, 'add'): py_smoothers.holt_add_dam, (
-    None, 'mul'): py_smoothers.holt_mul_dam, (None, None): py_smoothers.holt__}
+
+SMOOTHERS = {
+    ("mul", "add"): smoothers.holt_win_add_mul_dam,
+    ("mul", "mul"): smoothers.holt_win_mul_mul_dam,
+    ("mul", None): smoothers.holt_win__mul,
+    ("add", "add"): smoothers.holt_win_add_add_dam,
+    ("add", "mul"): smoothers.holt_win_mul_add_dam,
+    ("add", None): smoothers.holt_win__add,
+    (None, "add"): smoothers.holt_add_dam,
+    (None, "mul"): smoothers.holt_mul_dam,
+    (None, None): smoothers.holt__,
+}
+
+PY_SMOOTHERS = {
+    ("mul", "add"): py_smoothers.holt_win_add_mul_dam,
+    ("mul", "mul"): py_smoothers.holt_win_mul_mul_dam,
+    ("mul", None): py_smoothers.holt_win__mul,
+    ("add", "add"): py_smoothers.holt_win_add_add_dam,
+    ("add", "mul"): py_smoothers.holt_win_mul_add_dam,
+    ("add", None): py_smoothers.holt_win__add,
+    (None, "add"): py_smoothers.holt_add_dam,
+    (None, "mul"): py_smoothers.holt_mul_dam,
+    (None, None): py_smoothers.holt__,
+}
+
+
+def opt_wrapper(func):
+    def f(*args, **kwargs):
+        err = func(*args, **kwargs)
+        if isinstance(err, np.ndarray):
+            return err.T @ err
+        return err
+
+    return f


 class _OptConfig:
@@ -55,6 +99,17 @@ class _OptConfig:
     mask: np.ndarray
     mle_retvals: Any

+    def unpack_parameters(self, params) -> "_OptConfig":
+        self.alpha = params[0]
+        self.beta = params[1]
+        self.gamma = params[2]
+        self.level = params[3]
+        self.trend = params[4]
+        self.phi = params[5]
+        self.seasonal = params[6:]
+
+        return self
+

 class ExponentialSmoothing(TimeSeriesModel):
     """
@@ -149,60 +204,88 @@ class ExponentialSmoothing(TimeSeriesModel):
         and practice. OTexts, 2014.
     """

-    @deprecate_kwarg('damped', 'damped_trend')
-    def __init__(self, endog, trend=None, damped_trend=False, seasonal=None,
-        *, seasonal_periods=None, initialization_method='estimated',
-        initial_level=None, initial_trend=None, initial_seasonal=None,
-        use_boxcox=False, bounds=None, dates=None, freq=None, missing='none'):
+    @deprecate_kwarg("damped", "damped_trend")
+    def __init__(
+        self,
+        endog,
+        trend=None,
+        damped_trend=False,
+        seasonal=None,
+        *,
+        seasonal_periods=None,
+        initialization_method="estimated",
+        initial_level=None,
+        initial_trend=None,
+        initial_seasonal=None,
+        use_boxcox=False,
+        bounds=None,
+        dates=None,
+        freq=None,
+        missing="none",
+    ):
         super().__init__(endog, None, dates, freq, missing=missing)
-        self._y = self._data = array_like(endog, 'endog', ndim=1,
-            contiguous=True, order='C')
-        options = 'add', 'mul', 'additive', 'multiplicative'
-        trend = string_like(trend, 'trend', options=options, optional=True)
-        if trend in ['additive', 'multiplicative']:
-            trend = {'additive': 'add', 'multiplicative': 'mul'}[trend]
+        self._y = self._data = array_like(
+            endog, "endog", ndim=1, contiguous=True, order="C"
+        )
+        options = ("add", "mul", "additive", "multiplicative")
+        trend = string_like(trend, "trend", options=options, optional=True)
+        if trend in ["additive", "multiplicative"]:
+            trend = {"additive": "add", "multiplicative": "mul"}[trend]
         self.trend = trend
-        self.damped_trend = bool_like(damped_trend, 'damped_trend')
-        seasonal = string_like(seasonal, 'seasonal', options=options,
-            optional=True)
-        if seasonal in ['additive', 'multiplicative']:
-            seasonal = {'additive': 'add', 'multiplicative': 'mul'}[seasonal]
+        self.damped_trend = bool_like(damped_trend, "damped_trend")
+        seasonal = string_like(
+            seasonal, "seasonal", options=options, optional=True
+        )
+        if seasonal in ["additive", "multiplicative"]:
+            seasonal = {"additive": "add", "multiplicative": "mul"}[seasonal]
         self.seasonal = seasonal
-        self.has_trend = trend in ['mul', 'add']
-        self.has_seasonal = seasonal in ['mul', 'add']
-        if (self.trend == 'mul' or self.seasonal == 'mul') and not np.all(
-            self._data > 0.0):
+        self.has_trend = trend in ["mul", "add"]
+        self.has_seasonal = seasonal in ["mul", "add"]
+        if (self.trend == "mul" or self.seasonal == "mul") and not np.all(
+            self._data > 0.0
+        ):
             raise ValueError(
-                'endog must be strictly positive when usingmultiplicative trend or seasonal components.'
-                )
+                "endog must be strictly positive when using"
+                "multiplicative trend or seasonal components."
+            )
         if self.damped_trend and not self.has_trend:
-            raise ValueError('Can only dampen the trend component')
+            raise ValueError("Can only dampen the trend component")
         if self.has_seasonal:
-            self.seasonal_periods = int_like(seasonal_periods,
-                'seasonal_periods', optional=True)
+            self.seasonal_periods = int_like(
+                seasonal_periods, "seasonal_periods", optional=True
+            )
             if seasonal_periods is None:
                 try:
                     self.seasonal_periods = freq_to_period(self._index_freq)
                 except Exception:
                     raise ValueError(
-                        'seasonal_periods has not been provided and index does not have a known freq. You must provide seasonal_periods'
-                        )
+                        "seasonal_periods has not been provided and index "
+                        "does not have a known freq. You must provide "
+                        "seasonal_periods"
+                    )
             if self.seasonal_periods <= 1:
-                raise ValueError('seasonal_periods must be larger than 1.')
+                raise ValueError("seasonal_periods must be larger than 1.")
             assert self.seasonal_periods is not None
         else:
             self.seasonal_periods = 0
         self.nobs = len(self.endog)
-        options = 'known', 'estimated', 'heuristic', 'legacy-heuristic'
-        self._initialization_method = string_like(initialization_method,
-            'initialization_method', optional=False, options=options)
-        self._initial_level = float_like(initial_level, 'initial_level',
-            optional=True)
-        self._initial_trend = float_like(initial_trend, 'initial_trend',
-            optional=True)
-        self._initial_seasonal = array_like(initial_seasonal,
-            'initial_seasonal', optional=True)
-        estimated = self._initialization_method == 'estimated'
+        options = ("known", "estimated", "heuristic", "legacy-heuristic")
+        self._initialization_method = string_like(
+            initialization_method,
+            "initialization_method",
+            optional=False,
+            options=options,
+        )
+        self._initial_level = float_like(
+            initial_level, "initial_level", optional=True
+        )
+        self._initial_trend = float_like(
+            initial_trend, "initial_trend", optional=True
+        )
+        self._initial_seasonal = array_like(
+            initial_seasonal, "initial_seasonal", optional=True
+        )
+        estimated = self._initialization_method == "estimated"
         self._estimate_level = estimated
         self._estimate_trend = estimated and self.trend is not None
         self._estimate_seasonal = estimated and self.seasonal is not None
@@ -213,6 +296,51 @@ class ExponentialSmoothing(TimeSeriesModel):
         self._initialize()
         self._fixed_parameters = {}

+    def _check_bounds(self, bounds):
+        bounds = dict_like(bounds, "bounds", optional=True)
+        if bounds is None:
+            return
+        msg = (
+            "bounds must be a dictionary of 2-element tuples of the form"
+            " (lb, ub) where lb < ub, lb>=0 and ub<=1"
+        )
+        variables = self._ordered_names()
+        for key in bounds:
+            if key not in variables:
+                supported = ", ".join(variables[:-1])
+                supported += ", and " + variables[-1]
+                raise KeyError(
+                    f"{key} does not match the list of supported variables "
+                    f"names: {supported}."
+                )
+            bound = bounds[key]
+            if not isinstance(bound, tuple):
+                raise TypeError(msg)
+            lb = bound[0] if bound[0] is not None else -np.inf
+            ub = bound[1] if bound[1] is not None else np.inf
+            if len(bound) != 2 or lb >= ub:
+                raise ValueError(msg)
+            if ("smoothing" in key or "damp" in key) and (
+                bound[0] < 0.0 or bound[1] > 1.0
+            ):
+                raise ValueError(
+                    f"{key} must have a lower bound >= 0.0 and an upper "
+                    f"bound <= 1.0"
+                )
+        return bounds
+
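For orientation, a bounds mapping of the shape this check accepts could look like the sketch below; the parameter names come from _ordered_names, while the numeric limits are purely illustrative.

    bounds = {
        "smoothing_level": (0.05, 0.95),  # lb < ub, both inside [0, 1]
        "damping_trend": (0.80, 0.98),
    }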
+    def _boxcox(self):
+        if self._use_boxcox is None or self._use_boxcox is False:
+            self._lambda = np.nan
+            return self._y
+        if self._use_boxcox is True:
+            y, self._lambda = boxcox(self._y)
+        elif isinstance(self._use_boxcox, (int, float)):
+            self._lambda = float(self._use_boxcox)
+            y = boxcox(self._y, self._use_boxcox)
+        else:
+            raise TypeError("use_boxcox must be True, False or a float.")
+        return y
+
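A minimal, self-contained sketch of the Box-Cox round trip this helper relies on: scipy.stats.boxcox either estimates lambda by maximum likelihood or applies a supplied value, and scipy.special.inv_boxcox undoes the transform (the data values here are invented).

    import numpy as np
    from scipy.special import inv_boxcox
    from scipy.stats import boxcox

    y = np.array([1.0, 2.0, 4.0, 8.0])
    y_bc, lam = boxcox(y)      # lambda estimated by maximum likelihood
    y_fix = boxcox(y, 0.5)     # lambda supplied directly
    assert np.allclose(inv_boxcox(y_bc, lam), y)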
     @contextlib.contextmanager
     def fix_params(self, values):
         """
@@ -239,7 +367,113 @@ class ExponentialSmoothing(TimeSeriesModel):
         >>> with mod.fix_params({"smoothing_level": 0.2}):
         ...     mod.fit()
         """
-        pass
+        values = dict_like(values, "values")
+        valid_keys = ("smoothing_level",)
+        if self.has_trend:
+            valid_keys += ("smoothing_trend",)
+        if self.has_seasonal:
+            valid_keys += ("smoothing_seasonal",)
+            m = self.seasonal_periods
+            valid_keys += tuple([f"initial_seasonal.{i}" for i in range(m)])
+        if self.damped_trend:
+            valid_keys += ("damping_trend",)
+        if self._initialization_method in ("estimated", None):
+            extra_keys = [
+                key.replace("smoothing_", "initial_")
+                for key in valid_keys
+                if "smoothing_" in key
+            ]
+            valid_keys += tuple(extra_keys)
+
+        for key in values:
+            if key not in valid_keys:
+                valid = ", ".join(valid_keys[:-1]) + ", and " + valid_keys[-1]
+                raise KeyError(
+                    f"{key} is not allowed. Only {valid} are supported in "
+                    "this specification."
+                )
+
+        if "smoothing_level" in values:
+            alpha = values["smoothing_level"]
+            if alpha <= 0.0:
+                raise ValueError("smoothing_level must be in (0, 1)")
+            beta = values.get("smoothing_trend", 0.0)
+            if beta > alpha:
+                raise ValueError("smoothing_trend must be <= smoothing_level")
+            gamma = values.get("smoothing_seasonal", 0.0)
+            if gamma > 1 - alpha:
+                raise ValueError(
+                    "smoothing_seasonal must be <= 1 - smoothing_level"
+                )
+
+        try:
+            self._fixed_parameters = values
+            yield
+        finally:
+            self._fixed_parameters = {}
+
+    def _initialize(self):
+        if self._initialization_method == "known":
+            return self._initialize_known()
+        msg = (
+            f"initialization method is {self._initialization_method} but "
+            "initial_{0} has been set."
+        )
+        if self._initial_level is not None:
+            raise ValueError(msg.format("level"))
+        if self._initial_trend is not None:
+            raise ValueError(msg.format("trend"))
+        if self._initial_seasonal is not None:
+            raise ValueError(msg.format("seasonal"))
+        if self._initialization_method == "legacy-heuristic":
+            return self._initialize_legacy()
+        elif self._initialization_method == "heuristic":
+            return self._initialize_heuristic()
+        elif self._initialization_method == "estimated":
+            if self.nobs < 10 + 2 * (self.seasonal_periods // 2):
+                return self._initialize_simple()
+            else:
+                return self._initialize_heuristic()
+
+    def _initialize_simple(self):
+        trend = self.trend if self.has_trend else False
+        seasonal = self.seasonal if self.has_seasonal else False
+        lvl, trend, seas = _initialization_simple(
+            self._y, trend, seasonal, self.seasonal_periods
+        )
+        self._initial_level = lvl
+        self._initial_trend = trend
+        self._initial_seasonal = seas
+
+    def _initialize_heuristic(self):
+        trend = self.trend if self.has_trend else False
+        seasonal = self.seasonal if self.has_seasonal else False
+        lvl, trend, seas = _initialization_heuristic(
+            self._y, trend, seasonal, self.seasonal_periods
+        )
+        self._initial_level = lvl
+        self._initial_trend = trend
+        self._initial_seasonal = seas
+
+    def _initialize_legacy(self):
+        lvl, trend, seasonal = self.initial_values(force=True)
+        self._initial_level = lvl
+        self._initial_trend = trend
+        self._initial_seasonal = seasonal
+
+    def _initialize_known(self):
+        msg = "initialization is 'known' but initial_{0} not given"
+        if self._initial_level is None:
+            raise ValueError(msg.format("level"))
+        excess = "initial_{0} set but model has no {0} component"
+        if self.has_trend and self._initial_trend is None:
+            raise ValueError(msg.format("trend"))
+        elif not self.has_trend and self._initial_trend is not None:
+            raise ValueError(excess.format("trend"))
+        if self.has_seasonal and self._initial_seasonal is None:
+            raise ValueError(msg.format("seasonal"))
+        elif not self.has_seasonal and self._initial_seasonal is not None:
+            raise ValueError(excess.format("seasonal"))

     def predict(self, params, start=None, end=None):
         """
@@ -263,16 +497,457 @@ class ExponentialSmoothing(TimeSeriesModel):
         ndarray
             The predicted values.
         """
-        pass
-
-    @deprecate_kwarg('smoothing_slope', 'smoothing_trend')
-    @deprecate_kwarg('initial_slope', 'initial_trend')
-    @deprecate_kwarg('damping_slope', 'damping_trend')
-    def fit(self, smoothing_level=None, smoothing_trend=None,
-        smoothing_seasonal=None, damping_trend=None, *, optimized=True,
-        remove_bias=False, start_params=None, method=None, minimize_kwargs=
-        None, use_brute=True, use_boxcox=None, use_basinhopping=None,
-        initial_level=None, initial_trend=None):
+        if start is None:
+            freq = getattr(self._index, "freq", 1)
+            if isinstance(freq, int):
+                start = self._index.shape[0]
+            else:
+                start = self._index[-1] + freq
+        start, end, out_of_sample, _ = self._get_prediction_index(
+            start=start, end=end
+        )
+        if out_of_sample > 0:
+            res = self._predict(h=out_of_sample, **params)
+        else:
+            res = self._predict(h=0, **params)
+        return res.fittedfcast[start : end + out_of_sample + 1]
+
+    def _enforce_bounds(self, p, sel, lb, ub):
+        initial_p = p[sel]
+
+        # Ensure strictly inbounds
+        loc = initial_p <= lb
+        upper = ub[loc].copy()
+        upper[~np.isfinite(upper)] = 100.0
+        eps = 1e-4
+        initial_p[loc] = lb[loc] + eps * (upper - lb[loc])
+
+        loc = initial_p >= ub
+        lower = lb[loc].copy()
+        lower[~np.isfinite(lower)] = -100.0
+        eps = 1e-4
+        initial_p[loc] = ub[loc] - eps * (ub[loc] - lower)
+
+        return initial_p
+
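A simplified sketch of the clipping rule used in _enforce_bounds, ignoring the special handling of infinite bounds; the parameter values are invented.

    import numpy as np

    p = np.array([0.0, 1.2])
    lb, ub = np.zeros(2), np.ones(2)
    eps = 1e-4
    p = np.where(p <= lb, lb + eps * (ub - lb), p)
    p = np.where(p >= ub, ub - eps * (ub - lb), p)
    # p is now strictly inside (lb, ub): array([1.0e-04, 9.999e-01])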
+    @staticmethod
+    def _check_blocked_keywords(
+        d: dict, keys: Sequence[Hashable], name="kwargs"
+    ):
+        for key in keys:
+            if key in d:
+                raise ValueError(f"{name} must not contain '{key}'")
+
+    def _check_bound_feasibility(self, bounds):
+        if bounds[1][0] > bounds[0][1]:
+            raise ValueError(
+                "The bounds for smoothing_trend and smoothing_level are "
+                "incompatible: the model requires "
+                "smoothing_trend <= smoothing_level."
+            )
+        if bounds[2][0] > (1 - bounds[0][1]):
+            raise ValueError(
+                "The bounds for smoothing_seasonal and smoothing_level "
+                "are incompatible: the model requires "
+                "smoothing_seasonal <= 1 - smoothing_level."
+            )
+
+    @staticmethod
+    def _setup_brute(sel, bounds, alpha):
+        # More points when fewer parameters
+        ns = 87 // sel[:3].sum()
+
+        if not sel[0]:
+            # Easy case since no cross-constraints
+            nparams = int(sel[1]) + int(sel[2])
+            args = []
+            for i in range(1, 3):
+                if sel[i]:
+                    bound = bounds[i]
+                    step = bound[1] - bound[0]
+                    lb = bound[0] + 0.005 * step
+                    if i == 1:
+                        ub = min(bound[1], alpha) - 0.005 * step
+                    else:
+                        ub = min(bound[1], 1 - alpha) - 0.005 * step
+                    args.append(np.linspace(lb, ub, ns))
+            points = np.stack(np.meshgrid(*args))
+            points = points.reshape((nparams, -1)).T
+            return np.ascontiguousarray(points)
+
+        bound = bounds[0]
+        step = 0.005 * (bound[1] - bound[0])
+        points = np.linspace(bound[0] + step, bound[1] - step, ns)
+        if not sel[1] and not sel[2]:
+            return points[:, None]
+
+        combined = []
+        b_bounds = bounds[1]
+        g_bounds = bounds[2]
+        if sel[1] and sel[2]:
+            for a in points:
+                b_lb = b_bounds[0]
+                b_ub = min(b_bounds[1], a)
+                g_lb = g_bounds[0]
+                g_ub = min(g_bounds[1], 1 - a)
+                if b_lb > b_ub or g_lb > g_ub:
+                    # infeasible point
+                    continue
+                nb = int(np.ceil(ns * np.sqrt(a)))
+                ng = int(np.ceil(ns * np.sqrt(1 - a)))
+                b = np.linspace(b_lb, b_ub, nb)
+                g = np.linspace(g_lb, g_ub, ng)
+                both = np.stack(np.meshgrid(b, g)).reshape(2, -1).T
+                final = np.empty((both.shape[0], 3))
+                final[:, 0] = a
+                final[:, 1:] = both
+                combined.append(final)
+        elif sel[1]:
+            for a in points:
+                b_lb = b_bounds[0]
+                b_ub = min(b_bounds[1], a)
+                if b_lb > b_ub:
+                    # infeasible point
+                    continue
+                nb = int(np.ceil(ns * np.sqrt(a)))
+                final = np.empty((nb, 2))
+                final[:, 0] = a
+                final[:, 1] = np.linspace(b_lb, b_ub, nb)
+                combined.append(final)
+        else:  # sel[2]
+            for a in points:
+                g_lb = g_bounds[0]
+                g_ub = min(g_bounds[1], 1 - a)
+                if g_lb > g_ub:
+                    # infeasible point
+                    continue
+                ng = int(np.ceil(ns * np.sqrt(1 - a)))
+                final = np.empty((ng, 2))
+                final[:, 1] = np.linspace(g_lb, g_ub, ng)
+                final[:, 0] = a
+                combined.append(final)
+
+        return np.vstack(combined)
+
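The grid built above respects the cross-constraints beta <= alpha and gamma <= 1 - alpha. A compact sketch of the feasible ranges for a single candidate alpha (grid size and values are illustrative):

    import numpy as np

    alpha = 0.3
    beta_grid = np.linspace(0.0, min(1.0, alpha), 5)        # beta <= alpha
    gamma_grid = np.linspace(0.0, min(1.0, 1 - alpha), 5)   # gamma <= 1 - alpha
    grid = np.stack(np.meshgrid(beta_grid, gamma_grid)).reshape(2, -1).T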
+    def _ordered_names(self):
+        names = (
+            "smoothing_level",
+            "smoothing_trend",
+            "smoothing_seasonal",
+            "initial_level",
+            "initial_trend",
+            "damping_trend",
+        )
+        m = self.seasonal_periods
+        names += tuple([f"initial_seasonal.{i}" for i in range(m)])
+        return names
+
+    def _update_for_fixed(self, sel, alpha, beta, gamma, phi, l0, b0, s0):
+        if self._fixed_parameters:
+            fixed = self._fixed_parameters
+            names = self._ordered_names()
+            not_fixed = np.array([name not in fixed for name in names])
+            if (~sel[~not_fixed]).any():
+                invalid = []
+                for name, s, nf in zip(names, sel, not_fixed):
+                    if not s and not nf:
+                        invalid.append(name)
+                invalid_names = ", ".join(invalid)
+                raise ValueError(
+                    "Cannot fix a parameter that is not being "
+                    f"estimated: {invalid_names}"
+                )
+
+            sel &= not_fixed
+            alpha = fixed.get("smoothing_level", alpha)
+            beta = fixed.get("smoothing_trend", beta)
+            gamma = fixed.get("smoothing_seasonal", gamma)
+            phi = fixed.get("damping_trend", phi)
+            l0 = fixed.get("initial_level", l0)
+            b0 = fixed.get("initial_trend", b0)
+            for i in range(self.seasonal_periods):
+                s0[i] = fixed.get(f"initial_seasonal.{i}", s0[i])
+        return sel, alpha, beta, gamma, phi, l0, b0, s0
+
+    def _construct_bounds(self):
+        trend_lb = 0.0 if self.trend == "mul" else None
+        season_lb = 0.0 if self.seasonal == "mul" else None
+        lvl_lb = None if trend_lb is None and season_lb is None else 0.0
+        bounds = [
+            (0.0, 1.0),  # alpha
+            (0.0, 1.0),  # beta
+            (0.0, 1.0),  # gamma
+            (lvl_lb, None),  # level
+            (trend_lb, None),  # trend
+            (0.8, 0.995),  # phi
+        ]
+        bounds += [(season_lb, None)] * self.seasonal_periods
+        if self._bounds is not None:
+            assert isinstance(self._bounds, dict)
+            for i, name in enumerate(self._ordered_names()):
+                bounds[i] = self._bounds.get(name, bounds[i])
+        # Update bounds to account for fixed parameters
+        fixed = self._fixed_parameters
+        if "smoothing_level" in fixed:
+            # Update bounds if fixed alpha
+            alpha = fixed["smoothing_level"]
+            # beta <= alpha
+            if bounds[1][1] > alpha:
+                bounds[1] = (bounds[1][0], alpha)
+            # gamma <= 1 - alpha
+            if bounds[2][1] > (1 - alpha):
+                bounds[2] = (bounds[2][0], 1 - alpha)
+        if "smoothing_trend" in fixed:
+            # beta <= alpha
+            beta = fixed["smoothing_trend"]
+            bounds[0] = (max(beta, bounds[0][0]), bounds[0][1])
+        if "smoothing_seasonal" in fixed:
+            gamma = fixed["smoothing_seasonal"]
+            # gamma <= 1 - alpha => alpha <= 1 - gamma
+            bounds[0] = (bounds[0][0], min(1 - gamma, bounds[0][1]))
+        # Ensure bounds are feasible
+        for i, name in enumerate(self._ordered_names()):
+            lb = bounds[i][0] if bounds[i][0] is not None else -np.inf
+            ub = bounds[i][1] if bounds[i][1] is not None else np.inf
+            if lb >= ub:
+                raise ValueError(
+                    "After adjusting for user-provided bounds and fixed "
+                    f"values, the resulting set of bounds for {name}, "
+                    f"{bounds[i]}, is infeasible."
+                )
+        self._check_bound_feasibility(bounds)
+        return bounds
+
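When a parameter is fixed, _construct_bounds tightens the related bounds: for example, fixing smoothing_level at 0.4 caps smoothing_trend at 0.4 and smoothing_seasonal at 0.6. A one-line illustration with invented values:

    alpha = 0.4
    beta_bounds = (0.0, min(1.0, alpha))        # -> (0.0, 0.4)
    gamma_bounds = (0.0, min(1.0, 1 - alpha))   # -> (0.0, 0.6)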
+    def _get_starting_values(
+        self,
+        params,
+        start_params,
+        use_brute,
+        sel,
+        hw_args,
+        bounds,
+        alpha,
+        func,
+    ):
+        if start_params is None and use_brute and np.any(sel[:3]):
+            # Have a quick look in the region for a good starting place for
+            # alpha, beta & gamma using fixed values for initial
+            m = self.seasonal_periods
+            sv_sel = np.array([False] * (6 + m))
+            sv_sel[:3] = True
+            sv_sel &= sel
+            hw_args.xi = sv_sel.astype(int)
+            hw_args.transform = False
+            # Setup the grid points, respecting constraints
+            points = self._setup_brute(sv_sel, bounds, alpha)
+            opt = opt_wrapper(func)
+            best_val = np.inf
+            best_params = points[0]
+            for point in points:
+                val = opt(point, hw_args)
+                if val < best_val:
+                    best_params = point
+                    best_val = val
+            params[sv_sel] = best_params
+        elif start_params is not None:
+            if len(start_params) != sel.sum():
+                msg = "start_params must have {0} values but has {1}."
+                nxi, nsp = int(sel.sum()), len(start_params)
+                raise ValueError(msg.format(nxi, nsp))
+            params[sel] = start_params
+        return params
+
+    def _optimize_parameters(
+        self, data: _OptConfig, use_brute, method, kwargs
+    ) -> _OptConfig:
+        # Prepare starting values
+        alpha = data.alpha
+        beta = data.beta
+        phi = data.phi
+        gamma = data.gamma
+        y = data.y
+        start_params = data.params
+
+        has_seasonal = self.has_seasonal
+        has_trend = self.has_trend
+        trend = self.trend
+        seasonal = self.seasonal
+        damped_trend = self.damped_trend
+
+        m = self.seasonal_periods
+        params = np.zeros(6 + m)
+        l0, b0, s0 = self.initial_values(
+            initial_level=data.level, initial_trend=data.trend
+        )
+
+        init_alpha = alpha if alpha is not None else 0.5 / max(m, 1)
+        init_beta = beta
+        if beta is None and has_trend:
+            init_beta = 0.1 * init_alpha
+        init_gamma = gamma
+        if has_seasonal and gamma is None:
+            init_gamma = 0.05 * (1 - init_alpha)
+        init_phi = phi if phi is not None else 0.99
+        # Selection of parameters to optimize
+        sel = np.array(
+            [
+                alpha is None,
+                has_trend and beta is None,
+                has_seasonal and gamma is None,
+                self._estimate_level,
+                self._estimate_trend,
+                damped_trend and phi is None,
+            ]
+            + [has_seasonal and self._estimate_seasonal] * m,
+        )
+        (
+            sel,
+            init_alpha,
+            init_beta,
+            init_gamma,
+            init_phi,
+            l0,
+            b0,
+            s0,
+        ) = self._update_for_fixed(
+            sel, init_alpha, init_beta, init_gamma, init_phi, l0, b0, s0
+        )
+
+        func = SMOOTHERS[(seasonal, trend)]
+        params[:6] = [init_alpha, init_beta, init_gamma, l0, b0, init_phi]
+        if m:
+            params[-m:] = s0
+        if not np.any(sel):
+            from statsmodels.tools.sm_exceptions import EstimationWarning
+
+            message = (
+                "Model has no free parameters to estimate. Set "
+                "optimized=False to suppress this warning"
+            )
+            warnings.warn(message, EstimationWarning, stacklevel=3)
+            data = data.unpack_parameters(params)
+            data.params = params
+            data.mask = sel
+
+            return data
+        orig_bounds = self._construct_bounds()
+
+        bounds = np.array(orig_bounds[:3], dtype=float)
+        hw_args = HoltWintersArgs(
+            sel.astype(int), params, bounds, y, m, self.nobs
+        )
+        params = self._get_starting_values(
+            params,
+            start_params,
+            use_brute,
+            sel,
+            hw_args,
+            bounds,
+            init_alpha,
+            func,
+        )
+
+        # We always use [0, 1] for a, b and g and handle transform inside
+        mod_bounds = [(0, 1)] * 3 + orig_bounds[3:]
+        relevant_bounds = [bnd for bnd, flag in zip(mod_bounds, sel) if flag]
+        bounds = np.array(relevant_bounds, dtype=float)
+        lb, ub = bounds.T
+        lb[np.isnan(lb)] = -np.inf
+        ub[np.isnan(ub)] = np.inf
+        hw_args.xi = sel.astype(int)
+
+        # Ensure strictly inbounds
+        initial_p = self._enforce_bounds(params, sel, lb, ub)
+        # Transform to unrestricted space
+        params[sel] = initial_p
+        params[:3] = to_unrestricted(params, sel, hw_args.bounds)
+        initial_p = params[sel]
+        # Ensure parameters are transformed internally
+        hw_args.transform = True
+        if method in ("least_squares", "ls"):
+            # Least squares uses a different format for bounds
+            ls_bounds = lb, ub
+            self._check_blocked_keywords(kwargs, ("args", "bounds"))
+            res = least_squares(
+                func, initial_p, bounds=ls_bounds, args=(hw_args,), **kwargs
+            )
+            success = res.success
+        elif method in ("basinhopping", "bh"):
+            # Search more thoroughly around the current local minimum for the
+            # best parameter values, and hop between basins to try to escape
+            # a poor local minimum.
+            minimizer_kwargs = {"args": (hw_args,), "bounds": relevant_bounds}
+            kwargs = kwargs.copy()
+            if "minimizer_kwargs" in kwargs:
+                self._check_blocked_keywords(
+                    kwargs["minimizer_kwargs"],
+                    ("args", "bounds"),
+                    name="kwargs['minimizer_kwargs']",
+                )
+                minimizer_kwargs.update(kwargs["minimizer_kwargs"])
+                del kwargs["minimizer_kwargs"]
+            default_kwargs = {
+                "minimizer_kwargs": minimizer_kwargs,
+                "stepsize": 0.01,
+            }
+            default_kwargs.update(kwargs)
+            obj = opt_wrapper(func)
+            res = basinhopping(obj, initial_p, **default_kwargs)
+            success = res.lowest_optimization_result.success
+        else:
+            obj = opt_wrapper(func)
+            self._check_blocked_keywords(kwargs, ("args", "bounds", "method"))
+            res = minimize(
+                obj,
+                initial_p,
+                args=(hw_args,),
+                bounds=relevant_bounds,
+                method=method,
+                **kwargs,
+            )
+            success = res.success
+        # finally transform to restricted space
+        params[sel] = res.x
+        params[:3] = to_restricted(params, sel, hw_args.bounds)
+        res.x = params[sel]
+
+        if not success:
+            from statsmodels.tools.sm_exceptions import ConvergenceWarning
+
+            warnings.warn(
+                "Optimization failed to converge. Check mle_retvals.",
+                ConvergenceWarning,
+            )
+        params[sel] = res.x
+
+        data.unpack_parameters(params)
+        data.params = params
+        data.mask = sel
+        data.mle_retvals = res
+
+        return data
+
+    @deprecate_kwarg("smoothing_slope", "smoothing_trend")
+    @deprecate_kwarg("initial_slope", "initial_trend")
+    @deprecate_kwarg("damping_slope", "damping_trend")
+    def fit(
+        self,
+        smoothing_level=None,
+        smoothing_trend=None,
+        smoothing_seasonal=None,
+        damping_trend=None,
+        *,
+        optimized=True,
+        remove_bias=False,
+        start_params=None,
+        method=None,
+        minimize_kwargs=None,
+        use_brute=True,
+        use_boxcox=None,
+        use_basinhopping=None,
+        initial_level=None,
+        initial_trend=None,
+    ):
         """
         Fit the model

@@ -379,10 +1054,131 @@ class ExponentialSmoothing(TimeSeriesModel):
         [1] Hyndman, Rob J., and George Athanasopoulos. Forecasting: principles
             and practice. OTexts, 2014.
         """
-        pass
+        # Rename variables to alpha, beta, etc. to follow the standard
+        # mathematical notation.
+        alpha = float_like(smoothing_level, "smoothing_level", True)
+        beta = float_like(smoothing_trend, "smoothing_trend", True)
+        gamma = float_like(smoothing_seasonal, "smoothing_seasonal", True)
+        phi = float_like(damping_trend, "damping_trend", True)
+        initial_level = float_like(initial_level, "initial_level", True)
+        initial_trend = float_like(initial_trend, "initial_trend", True)
+        start_params = array_like(start_params, "start_params", optional=True)
+        minimize_kwargs = dict_like(
+            minimize_kwargs, "minimize_kwargs", optional=True
+        )
+        minimize_kwargs = {} if minimize_kwargs is None else minimize_kwargs
+        use_basinhopping = bool_like(
+            use_basinhopping, "use_basinhopping", optional=True
+        )
+        supported_methods = ("basinhopping", "bh")
+        supported_methods += ("least_squares", "ls")
+        supported_methods += (
+            "L-BFGS-B",
+            "TNC",
+            "SLSQP",
+            "Powell",
+            "trust-constr",
+        )
+        method = string_like(
+            method,
+            "method",
+            options=supported_methods,
+            lower=False,
+            optional=True,
+        )
+        # TODO: Deprecate initial_level and related parameters from fit
+        if initial_level is not None or initial_trend is not None:
+            raise ValueError(
+                "Initial values were set during model construction. These "
+                "cannot be changed during fit."
+            )
+        if use_boxcox is not None:
+            raise ValueError(
+                "use_boxcox was set at model initialization and cannot "
+                "be changed"
+            )
+        elif self._use_boxcox is None:
+            use_boxcox = False
+        else:
+            use_boxcox = self._use_boxcox

-    def initial_values(self, initial_level=None, initial_trend=None, force=
-        False):
+        if use_basinhopping is not None:
+            raise ValueError(
+                "use_basinhopping is deprecated. Set optimization method "
+                "using 'method'."
+            )
+
+        data = self._data
+        damped = self.damped_trend
+        phi = phi if damped else 1.0
+        if self._use_boxcox is None:
+            if use_boxcox == "log":
+                lamda = 0.0
+                y = boxcox(data, lamda)
+            elif isinstance(use_boxcox, float):
+                lamda = use_boxcox
+                y = boxcox(data, lamda)
+            elif use_boxcox:
+                y, lamda = boxcox(data)
+                # use_boxcox = lamda
+            else:
+                y = data.squeeze()
+        else:
+            y = self._y
+
+        self._y = y
+        res = _OptConfig()
+        res.alpha = alpha
+        res.beta = beta
+        res.phi = phi
+        res.gamma = gamma
+        res.level = initial_level
+        res.trend = initial_trend
+        res.seasonal = None
+        res.y = y
+        res.params = start_params
+        res.mle_retvals = res.mask = None
+        method = "SLSQP" if method is None else method
+        if optimized:
+            res = self._optimize_parameters(
+                res, use_brute, method, minimize_kwargs
+            )
+        else:
+            l0, b0, s0 = self.initial_values(
+                initial_level=initial_level, initial_trend=initial_trend
+            )
+            res.level = l0
+            res.trend = b0
+            res.seasonal = s0
+            if self._fixed_parameters:
+                fp = self._fixed_parameters
+                res.alpha = fp.get("smoothing_level", res.alpha)
+                res.beta = fp.get("smoothing_trend", res.beta)
+                res.gamma = fp.get("smoothing_seasonal", res.gamma)
+                res.phi = fp.get("damping_trend", res.phi)
+                res.level = fp.get("initial_level", res.level)
+                res.trend = fp.get("initial_trend", res.trend)
+                res.seasonal = fp.get("initial_seasonal", res.seasonal)
+
+        hwfit = self._predict(
+            h=0,
+            smoothing_level=res.alpha,
+            smoothing_trend=res.beta,
+            smoothing_seasonal=res.gamma,
+            damping_trend=res.phi,
+            initial_level=res.level,
+            initial_trend=res.trend,
+            initial_seasons=res.seasonal,
+            use_boxcox=use_boxcox,
+            remove_bias=remove_bias,
+            is_optimized=res.mask,
+        )
+        hwfit._results.mle_retvals = res.mle_retvals
+        return hwfit
+
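An illustrative end-to-end use of the interface exercised by this fit implementation; the data series is invented, and the constructor arguments follow the public statsmodels ExponentialSmoothing API.

    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    y = 10 + np.linspace(0, 5, 48) + np.sin(np.arange(48) * 2 * np.pi / 12)
    mod = ExponentialSmoothing(
        y,
        trend="add",
        seasonal="add",
        seasonal_periods=12,
        initialization_method="estimated",
    )
    res = mod.fit(method="SLSQP", use_brute=True)
    print(res.params["smoothing_level"], res.sse)
    print(res.forecast(steps=6))  # out-of-sample forecasts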
+    def initial_values(
+        self, initial_level=None, initial_trend=None, force=False
+    ):
         """
         Compute initial values used in the exponential smoothing recursions.

@@ -418,14 +1214,60 @@ class ExponentialSmoothing(TimeSeriesModel):
         seasonal component is added the initialization adapts to account for
         the modified structure.
         """
-        pass
-
-    @deprecate_kwarg('smoothing_slope', 'smoothing_trend')
-    @deprecate_kwarg('damping_slope', 'damping_trend')
-    def _predict(self, h=None, smoothing_level=None, smoothing_trend=None,
-        smoothing_seasonal=None, initial_level=None, initial_trend=None,
-        damping_trend=None, initial_seasons=None, use_boxcox=None, lamda=
-        None, remove_bias=None, is_optimized=None):
+        if self._initialization_method is not None and not force:
+            return (
+                self._initial_level,
+                self._initial_trend,
+                self._initial_seasonal,
+            )
+        y = self._y
+        trend = self.trend
+        seasonal = self.seasonal
+        has_seasonal = self.has_seasonal
+        has_trend = self.has_trend
+        m = self.seasonal_periods
+        l0 = initial_level
+        b0 = initial_trend
+        if has_seasonal:
+            l0 = y[np.arange(self.nobs) % m == 0].mean() if l0 is None else l0
+            if b0 is None and has_trend:
+                # TODO: Fix for short m
+                lead, lag = y[m : m + m], y[:m]
+                if trend == "mul":
+                    b0 = np.exp((np.log(lead.mean()) - np.log(lag.mean())) / m)
+                else:
+                    b0 = ((lead - lag) / m).mean()
+            s0 = list(y[:m] / l0) if seasonal == "mul" else list(y[:m] - l0)
+        elif has_trend:
+            l0 = y[0] if l0 is None else l0
+            if b0 is None:
+                b0 = y[1] / y[0] if trend == "mul" else y[1] - y[0]
+            s0 = []
+        else:
+            if l0 is None:
+                l0 = y[0]
+            b0 = None
+            s0 = []
+
+        return l0, b0, s0
+
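A small numeric sketch of the additive seasonal branch above: the level starts at the mean of the first observation of each seasonal cycle, the trend at the average per-period change between the first two cycles, and the seasonal terms at the first cycle's deviations from that level (numbers invented).

    import numpy as np

    m = 4
    y = np.array([10.0, 12.0, 14.0, 11.0, 13.0, 15.0, 17.0, 14.0])
    l0 = y[np.arange(len(y)) % m == 0].mean()      # 11.5
    b0 = ((y[m:2 * m] - y[:m]) / m).mean()         # 0.75
    s0 = y[:m] - l0                                # first-cycle deviations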
+    @deprecate_kwarg("smoothing_slope", "smoothing_trend")
+    @deprecate_kwarg("damping_slope", "damping_trend")
+    def _predict(
+        self,
+        h=None,
+        smoothing_level=None,
+        smoothing_trend=None,
+        smoothing_seasonal=None,
+        initial_level=None,
+        initial_trend=None,
+        damping_trend=None,
+        initial_seasons=None,
+        use_boxcox=None,
+        lamda=None,
+        remove_bias=None,
+        is_optimized=None,
+    ):
         """
         Helper prediction function

@@ -434,7 +1276,211 @@ class ExponentialSmoothing(TimeSeriesModel):
         h : int, optional
             The number of time steps to forecast ahead.
         """
-        pass
+        # Rename variables to alpha, beta, etc. to follow the standard
+        # mathematical notation.
+        alpha = smoothing_level
+        beta = smoothing_trend
+        gamma = smoothing_seasonal
+        phi = damping_trend
+
+        # Start in sample and out of sample predictions
+        data = self.endog
+        damped = self.damped_trend
+        has_seasonal = self.has_seasonal
+        has_trend = self.has_trend
+        trend = self.trend
+        seasonal = self.seasonal
+        m = self.seasonal_periods
+        phi = phi if damped else 1.0
+        if use_boxcox == "log":
+            lamda = 0.0
+            y = boxcox(data, 0.0)
+        elif isinstance(use_boxcox, float):
+            lamda = use_boxcox
+            y = boxcox(data, lamda)
+        elif use_boxcox:
+            y, lamda = boxcox(data)
+        else:
+            lamda = None
+            y = data.squeeze()
+            if np.ndim(y) != 1:
+                raise NotImplementedError("Only 1 dimensional data supported")
+        y_alpha = np.zeros((self.nobs,))
+        y_gamma = np.zeros((self.nobs,))
+        alphac = 1 - alpha
+        y_alpha[:] = alpha * y
+        betac = 1 - beta if beta is not None else 0
+        gammac = 1 - gamma if gamma is not None else 0
+        if has_seasonal:
+            y_gamma[:] = gamma * y
+        lvls = np.zeros((self.nobs + h + 1,))
+        b = np.zeros((self.nobs + h + 1,))
+        s = np.zeros((self.nobs + h + m + 1,))
+        lvls[0] = initial_level
+        b[0] = initial_trend
+        s[:m] = initial_seasons
+        phi_h = (
+            np.cumsum(np.repeat(phi, h + 1) ** np.arange(1, h + 1 + 1))
+            if damped
+            else np.arange(1, h + 1 + 1)
+        )
+        trended = {"mul": np.multiply, "add": np.add, None: lambda l, b: l}[
+            trend
+        ]
+        detrend = {"mul": np.divide, "add": np.subtract, None: lambda l, b: 0}[
+            trend
+        ]
+        dampen = {"mul": np.power, "add": np.multiply, None: lambda b, phi: 0}[
+            trend
+        ]
+        nobs = self.nobs
+        if seasonal == "mul":
+            for i in range(1, nobs + 1):
+                lvls[i] = y_alpha[i - 1] / s[i - 1] + (
+                    alphac * trended(lvls[i - 1], dampen(b[i - 1], phi))
+                )
+                if has_trend:
+                    b[i] = (beta * detrend(lvls[i], lvls[i - 1])) + (
+                        betac * dampen(b[i - 1], phi)
+                    )
+                s[i + m - 1] = y_gamma[i - 1] / trended(
+                    lvls[i - 1], dampen(b[i - 1], phi)
+                ) + (gammac * s[i - 1])
+            _trend = b[1 : nobs + 1].copy()
+            season = s[m : nobs + m].copy()
+            lvls[nobs:] = lvls[nobs]
+            if has_trend:
+                b[:nobs] = dampen(b[:nobs], phi)
+                b[nobs:] = dampen(b[nobs], phi_h)
+            trend = trended(lvls, b)
+            s[nobs + m - 1 :] = [
+                s[(nobs - 1) + j % m] for j in range(h + 1 + 1)
+            ]
+            fitted = trend * s[:-m]
+        elif seasonal == "add":
+            for i in range(1, nobs + 1):
+                lvls[i] = (
+                    y_alpha[i - 1]
+                    - (alpha * s[i - 1])
+                    + (alphac * trended(lvls[i - 1], dampen(b[i - 1], phi)))
+                )
+                if has_trend:
+                    b[i] = (beta * detrend(lvls[i], lvls[i - 1])) + (
+                        betac * dampen(b[i - 1], phi)
+                    )
+                s[i + m - 1] = (
+                    y_gamma[i - 1]
+                    - (gamma * trended(lvls[i - 1], dampen(b[i - 1], phi)))
+                    + (gammac * s[i - 1])
+                )
+            _trend = b[1 : nobs + 1].copy()
+            season = s[m : nobs + m].copy()
+            lvls[nobs:] = lvls[nobs]
+            if has_trend:
+                b[:nobs] = dampen(b[:nobs], phi)
+                b[nobs:] = dampen(b[nobs], phi_h)
+            trend = trended(lvls, b)
+            s[nobs + m - 1 :] = [
+                s[(nobs - 1) + j % m] for j in range(h + 1 + 1)
+            ]
+            fitted = trend + s[:-m]
+        else:
+            for i in range(1, nobs + 1):
+                lvls[i] = y_alpha[i - 1] + (
+                    alphac * trended(lvls[i - 1], dampen(b[i - 1], phi))
+                )
+                if has_trend:
+                    b[i] = (beta * detrend(lvls[i], lvls[i - 1])) + (
+                        betac * dampen(b[i - 1], phi)
+                    )
+            _trend = b[1 : nobs + 1].copy()
+            season = s[m : nobs + m].copy()
+            lvls[nobs:] = lvls[nobs]
+            if has_trend:
+                b[:nobs] = dampen(b[:nobs], phi)
+                b[nobs:] = dampen(b[nobs], phi_h)
+            trend = trended(lvls, b)
+            fitted = trend
+        level = lvls[1 : nobs + 1].copy()
+        if use_boxcox or use_boxcox == "log" or isinstance(use_boxcox, float):
+            fitted = inv_boxcox(fitted, lamda)
+        err = fitted[: -h - 1] - data
+        sse = err.T @ err
+        # (s0 + gamma) + (b0 + beta) + (l0 + alpha) + phi
+        k = m * has_seasonal + 2 * has_trend + 2 + 1 * damped
+        aic = self.nobs * np.log(sse / self.nobs) + k * 2
+        dof_eff = self.nobs - k - 3
+        if dof_eff > 0:
+            aicc_penalty = (2 * (k + 2) * (k + 3)) / dof_eff
+        else:
+            aicc_penalty = np.inf
+        aicc = aic + aicc_penalty
+        bic = self.nobs * np.log(sse / self.nobs) + k * np.log(self.nobs)
+        resid = data - fitted[: -h - 1]
+        if remove_bias:
+            fitted += resid.mean()
+        self.params = {
+            "smoothing_level": alpha,
+            "smoothing_trend": beta,
+            "smoothing_seasonal": gamma,
+            "damping_trend": phi if damped else np.nan,
+            "initial_level": lvls[0],
+            "initial_trend": b[0] / phi if phi > 0 else 0,
+            "initial_seasons": s[:m],
+            "use_boxcox": use_boxcox,
+            "lamda": lamda,
+            "remove_bias": remove_bias,
+        }
+
+        # Format parameters into a DataFrame
+        codes = ["alpha", "beta", "gamma", "l.0", "b.0", "phi"]
+        codes += ["s.{0}".format(i) for i in range(m)]
+        idx = [
+            "smoothing_level",
+            "smoothing_trend",
+            "smoothing_seasonal",
+            "initial_level",
+            "initial_trend",
+            "damping_trend",
+        ]
+        idx += ["initial_seasons.{0}".format(i) for i in range(m)]
+
+        formatted = [alpha, beta, gamma, lvls[0], b[0], phi]
+        formatted += s[:m].tolist()
+        formatted = list(map(lambda v: np.nan if v is None else v, formatted))
+        formatted = np.array(formatted)
+        if is_optimized is None:
+            optimized = np.zeros(len(codes), dtype=bool)
+        else:
+            optimized = is_optimized.astype(bool)
+        included = [True, has_trend, has_seasonal, True, has_trend, damped]
+        included += [True] * m
+        formatted = pd.DataFrame(
+            [[c, f, o] for c, f, o in zip(codes, formatted, optimized)],
+            columns=["name", "param", "optimized"],
+            index=idx,
+        )
+        formatted = formatted.loc[included]
+
+        hwfit = HoltWintersResults(
+            self,
+            self.params,
+            fittedfcast=fitted,
+            fittedvalues=fitted[: -h - 1],
+            fcastvalues=fitted[-h - 1 :],
+            sse=sse,
+            level=level,
+            trend=_trend,
+            season=season,
+            aic=aic,
+            bic=bic,
+            aicc=aicc,
+            resid=resid,
+            k=k,
+            params_formatted=formatted,
+            optimized=optimized,
+        )
+        return HoltWintersResultsWrapper(hwfit)
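A compact sketch of the additive, non-seasonal recursion implemented by the final branch above (phi = 1, data invented): each fitted value is the one-step-ahead forecast level plus trend, and both components are updated after every observation.

    import numpy as np

    y = np.array([10.0, 12.0, 13.0, 15.0])
    alpha, beta = 0.5, 0.3
    lvl, b = y[0], y[1] - y[0]           # simple initialization
    fitted = []
    for obs in y:
        fitted.append(lvl + b)           # one-step-ahead forecast
        new_lvl = alpha * obs + (1 - alpha) * (lvl + b)
        b = beta * (new_lvl - lvl) + (1 - beta) * b
        lvl = new_lvl
    fitted = np.array(fitted)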


 class SimpleExpSmoothing(ExponentialSmoothing):
@@ -491,13 +1537,31 @@ class SimpleExpSmoothing(ExponentialSmoothing):
         and practice. OTexts, 2014.
     """

-    def __init__(self, endog, initialization_method=None, initial_level=None):
-        super().__init__(endog, initialization_method=initialization_method,
-            initial_level=initial_level)
-
-    def fit(self, smoothing_level=None, *, optimized=True, start_params=
-        None, initial_level=None, use_brute=True, use_boxcox=None,
-        remove_bias=False, method=None, minimize_kwargs=None):
+    def __init__(
+        self,
+        endog,
+        initialization_method=None,  # Future: 'estimated',
+        initial_level=None,
+    ):
+        super().__init__(
+            endog,
+            initialization_method=initialization_method,
+            initial_level=initial_level,
+        )
+
+    def fit(
+        self,
+        smoothing_level=None,
+        *,
+        optimized=True,
+        start_params=None,
+        initial_level=None,
+        use_brute=True,
+        use_boxcox=None,
+        remove_bias=False,
+        method=None,
+        minimize_kwargs=None,
+    ):
         """
         Fit the model

@@ -551,7 +1615,17 @@ class SimpleExpSmoothing(ExponentialSmoothing):
         [1] Hyndman, Rob J., and George Athanasopoulos. Forecasting: principles
             and practice. OTexts, 2014.
         """
-        pass
+        return super().fit(
+            smoothing_level=smoothing_level,
+            optimized=optimized,
+            start_params=start_params,
+            initial_level=initial_level,
+            use_brute=use_brute,
+            remove_bias=remove_bias,
+            use_boxcox=use_boxcox,
+            method=method,
+            minimize_kwargs=minimize_kwargs,
+        )


 class Holt(ExponentialSmoothing):
@@ -617,21 +1691,45 @@ class Holt(ExponentialSmoothing):
         and practice. OTexts, 2014.
     """

-    @deprecate_kwarg('damped', 'damped_trend')
-    def __init__(self, endog, exponential=False, damped_trend=False,
-        initialization_method=None, initial_level=None, initial_trend=None):
-        trend = 'mul' if exponential else 'add'
-        super().__init__(endog, trend=trend, damped_trend=damped_trend,
-            initialization_method=initialization_method, initial_level=
-            initial_level, initial_trend=initial_trend)
-
-    @deprecate_kwarg('smoothing_slope', 'smoothing_trend')
-    @deprecate_kwarg('initial_slope', 'initial_trend')
-    @deprecate_kwarg('damping_slope', 'damping_trend')
-    def fit(self, smoothing_level=None, smoothing_trend=None, *,
-        damping_trend=None, optimized=True, start_params=None,
-        initial_level=None, initial_trend=None, use_brute=True, use_boxcox=
-        None, remove_bias=False, method=None, minimize_kwargs=None):
+    @deprecate_kwarg("damped", "damped_trend")
+    def __init__(
+        self,
+        endog,
+        exponential=False,
+        damped_trend=False,
+        initialization_method=None,  # Future: 'estimated',
+        initial_level=None,
+        initial_trend=None,
+    ):
+        trend = "mul" if exponential else "add"
+        super().__init__(
+            endog,
+            trend=trend,
+            damped_trend=damped_trend,
+            initialization_method=initialization_method,
+            initial_level=initial_level,
+            initial_trend=initial_trend,
+        )
+
+    @deprecate_kwarg("smoothing_slope", "smoothing_trend")
+    @deprecate_kwarg("initial_slope", "initial_trend")
+    @deprecate_kwarg("damping_slope", "damping_trend")
+    def fit(
+        self,
+        smoothing_level=None,
+        smoothing_trend=None,
+        *,
+        damping_trend=None,
+        optimized=True,
+        start_params=None,
+        initial_level=None,
+        initial_trend=None,
+        use_brute=True,
+        use_boxcox=None,
+        remove_bias=False,
+        method=None,
+        minimize_kwargs=None,
+    ):
         """
         Fit the model

@@ -703,4 +1801,17 @@ class Holt(ExponentialSmoothing):
         [1] Hyndman, Rob J., and George Athanasopoulos. Forecasting: principles
             and practice. OTexts, 2014.
         """
-        pass
+        return super().fit(
+            smoothing_level=smoothing_level,
+            smoothing_trend=smoothing_trend,
+            damping_trend=damping_trend,
+            optimized=optimized,
+            start_params=start_params,
+            initial_level=initial_level,
+            initial_trend=initial_trend,
+            use_brute=use_brute,
+            use_boxcox=use_boxcox,
+            remove_bias=remove_bias,
+            method=method,
+            minimize_kwargs=minimize_kwargs,
+        )
diff --git a/statsmodels/tsa/holtwinters/results.py b/statsmodels/tsa/holtwinters/results.py
index 1d18aef0e..07b9f5147 100644
--- a/statsmodels/tsa/holtwinters/results.py
+++ b/statsmodels/tsa/holtwinters/results.py
@@ -1,11 +1,20 @@
 import numpy as np
 import pandas as pd
 from scipy.special import inv_boxcox
-from scipy.stats import boxcox, rv_continuous, rv_discrete
+from scipy.stats import (
+    boxcox,
+    rv_continuous,
+    rv_discrete,
+)
 from scipy.stats.distributions import rv_frozen
+
 from statsmodels.base.data import PandasData
 from statsmodels.base.model import Results
-from statsmodels.base.wrapper import ResultsWrapper, populate_wrapper, union_dicts
+from statsmodels.base.wrapper import (
+    ResultsWrapper,
+    populate_wrapper,
+    union_dicts,
+)


 class HoltWintersResults(Results):
@@ -54,9 +63,26 @@ class HoltWintersResults(Results):
         Optimization results if the parameters were optimized to fit the data.
     """

-    def __init__(self, model, params, sse, aic, aicc, bic, optimized, level,
-        trend, season, params_formatted, resid, k, fittedvalues,
-        fittedfcast, fcastvalues, mle_retvals=None):
+    def __init__(
+        self,
+        model,
+        params,
+        sse,
+        aic,
+        aicc,
+        bic,
+        optimized,
+        level,
+        trend,
+        season,
+        params_formatted,
+        resid,
+        k,
+        fittedvalues,
+        fittedfcast,
+        fcastvalues,
+        mle_retvals=None,
+    ):
         self.data = model.data
         super().__init__(model, params)
         self._model = model
@@ -81,63 +107,67 @@ class HoltWintersResults(Results):
         """
         The Akaike information criterion.
         """
-        pass
+        return self._aic

     @property
     def aicc(self):
         """
         AIC with a correction for finite sample sizes.
         """
-        pass
+        return self._aicc

     @property
     def bic(self):
         """
         The Bayesian information criterion.
         """
-        pass
+        return self._bic

     @property
     def sse(self):
         """
         The sum of squared errors between the data and the fitted values.
         """
-        pass
+        return self._sse

     @property
     def model(self):
         """
         The model used to produce the results instance.
         """
-        pass
+        return self._model
+
+    @model.setter
+    def model(self, value):
+        self._model = value

     @property
     def level(self):
         """
         An array of the levels values that make up the fitted values.
         """
-        pass
+        return self._level

     @property
     def optimized(self):
         """
         Flag indicating if model parameters were optimized to fit the data.
         """
-        pass
+        return self._optimized

     @property
     def trend(self):
         """
         An array of the trend values that make up the fitted values.
         """
-        pass
+        return self._trend

     @property
     def season(self):
         """
         An array of the seasonal values that make up the fitted values.
         """
-        pass
+        return self._season

     @property
     def params_formatted(self):
@@ -147,49 +177,53 @@ class HoltWintersResults(Results):
         Contains short names and a flag indicating whether the parameter's
         value was optimized to fit the data.
         """
-        pass
+        return self._params_formatted

     @property
     def fittedvalues(self):
         """
         An array of the fitted values
         """
-        pass
+        return self._fittedvalues

     @property
     def fittedfcast(self):
         """
         An array of both the fitted values and forecast values.
         """
-        pass
+        return self._fittedfcast

     @property
     def fcastvalues(self):
         """
         An array of the forecast values
         """
-        pass
+        return self._fcastvalues

     @property
     def resid(self):
         """
         An array of the residuals of the fittedvalues and actual values.
         """
-        pass
+        return self._resid

     @property
     def k(self):
         """
         The k parameter used to remove the bias in AIC, BIC etc.
         """
-        pass
+        return self._k

     @property
     def mle_retvals(self):
         """
         Optimization results if the parameters were optimized to fit the data.
         """
-        pass
+        return self._mle_retvals
+
+    @mle_retvals.setter
+    def mle_retvals(self, value):
+        self._mle_retvals = value

     def predict(self, start=None, end=None):
         """
@@ -214,7 +248,7 @@ class HoltWintersResults(Results):
         forecast : ndarray
             Array of out of sample forecasts.
         """
-        pass
+        return self.model.predict(self.params, start, end)

     def forecast(self, steps=1):
         """
@@ -231,7 +265,20 @@ class HoltWintersResults(Results):
         forecast : ndarray
             Array of out of sample forecasts
         """
-        pass
+        try:
+            freq = getattr(self.model._index, "freq", 1)
+            if not isinstance(freq, int) and isinstance(
+                self.model._index, (pd.DatetimeIndex, pd.PeriodIndex)
+            ):
+                start = self.model._index[-1] + freq
+                end = self.model._index[-1] + steps * freq
+            else:
+                start = self.model._index.shape[0]
+                end = start + steps - 1
+            return self.model.predict(self.params, start=start, end=end)
+        except AttributeError:
+            # May occur when the index does not have a freq
+            return self.model._predict(h=steps, **self.params).fcastvalues
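A short illustration of how forecast derives the prediction range from the model index; a monthly PeriodIndex is used here purely for illustration, and the arithmetic is equivalent to adding the index freq as done above: the forecast starts one period after the last in-sample date and spans `steps` periods.

    import pandas as pd

    index = pd.period_range("2000-01", periods=48, freq="M")
    steps = 6
    start = index[-1] + 1        # 2004-01
    end = index[-1] + steps      # 2004-06
    # predict() would then be called with this start/end pair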

     def summary(self):
         """
@@ -247,11 +294,107 @@ class HoltWintersResults(Results):
         --------
         statsmodels.iolib.summary.Summary
         """
-        pass
-
-    def simulate(self, nsimulations, anchor=None, repetitions=1, error=
-        'add', random_errors=None, random_state=None):
-        """
+        from statsmodels.iolib.summary import Summary
+        from statsmodels.iolib.table import SimpleTable
+
+        model = self.model
+        title = model.__class__.__name__ + " Model Results"
+
+        dep_variable = "endog"
+        orig_endog = self.model.data.orig_endog
+        if isinstance(orig_endog, pd.DataFrame):
+            dep_variable = orig_endog.columns[0]
+        elif isinstance(orig_endog, pd.Series):
+            dep_variable = orig_endog.name
+        seasonal_periods = (
+            None
+            if self.model.seasonal is None
+            else self.model.seasonal_periods
+        )
+        lookup = {
+            "add": "Additive",
+            "additive": "Additive",
+            "mul": "Multiplicative",
+            "multiplicative": "Multiplicative",
+            None: "None",
+        }
+        transform = self.params["use_boxcox"]
+        box_cox_transform = True if transform else False
+        box_cox_coeff = (
+            transform if isinstance(transform, str) else self.params["lamda"]
+        )
+        if isinstance(box_cox_coeff, float):
+            box_cox_coeff = "{:>10.5f}".format(box_cox_coeff)
+        top_left = [
+            ("Dep. Variable:", [dep_variable]),
+            ("Model:", [model.__class__.__name__]),
+            ("Optimized:", [str(np.any(self.optimized))]),
+            ("Trend:", [lookup[self.model.trend]]),
+            ("Seasonal:", [lookup[self.model.seasonal]]),
+            ("Seasonal Periods:", [str(seasonal_periods)]),
+            ("Box-Cox:", [str(box_cox_transform)]),
+            ("Box-Cox Coeff.:", [str(box_cox_coeff)]),
+        ]
+
+        top_right = [
+            ("No. Observations:", [str(len(self.model.endog))]),
+            ("SSE", ["{:5.3f}".format(self.sse)]),
+            ("AIC", ["{:5.3f}".format(self.aic)]),
+            ("BIC", ["{:5.3f}".format(self.bic)]),
+            ("AICC", ["{:5.3f}".format(self.aicc)]),
+            ("Date:", None),
+            ("Time:", None),
+        ]
+
+        smry = Summary()
+        smry.add_table_2cols(
+            self, gleft=top_left, gright=top_right, title=title
+        )
+        formatted = self.params_formatted  # type: pd.DataFrame
+
+        def _fmt(x):
+            abs_x = np.abs(x)
+            scale = 1
+            if np.isnan(x):
+                return f"{str(x):>20}"
+            if abs_x != 0:
+                scale = int(np.log10(abs_x))
+            if scale > 4 or scale < -3:
+                return "{:>20.5g}".format(x)
+            dec = min(7 - scale, 7)
+            fmt = "{{:>20.{0}f}}".format(dec)
+            return fmt.format(x)
+
+        tab = []
+        for _, vals in formatted.iterrows():
+            tab.append(
+                [
+                    _fmt(vals.iloc[1]),
+                    "{0:>20}".format(vals.iloc[0]),
+                    "{0:>20}".format(str(bool(vals.iloc[2]))),
+                ]
+            )
+        params_table = SimpleTable(
+            tab,
+            headers=["coeff", "code", "optimized"],
+            title="",
+            stubs=list(formatted.index),
+        )
+
+        smry.tables.append(params_table)
+
+        return smry
+
+    def simulate(
+        self,
+        nsimulations,
+        anchor=None,
+        repetitions=1,
+        error="add",
+        random_errors=None,
+        random_state=None,
+    ):
+        r"""
         Random simulations using the state space formulation.

         Parameters
@@ -321,15 +464,15 @@ class HoltWintersResults(Results):

         .. math::

-            y_t &= \\hat{y}_{t|t-1} + e_t\\\\
-            e_t &\\sim \\mathcal{N}(0, \\sigma^2)
+            y_t &= \hat{y}_{t|t-1} + e_t\\
+            e_t &\sim \mathcal{N}(0, \sigma^2)

         Using the multiplicative error model:

         .. math::

-            y_t &= \\hat{y}_{t|t-1} \\cdot (1 + e_t)\\\\
-            e_t &\\sim \\mathcal{N}(0, \\sigma^2)
+            y_t &= \hat{y}_{t|t-1} \cdot (1 + e_t)\\
+            e_t &\sim \mathcal{N}(0, \sigma^2)

         Inserting these equations into the smoothing equation formulation leads
         to the state space equations. The notation used here follows
@@ -339,78 +482,78 @@ class HoltWintersResults(Results):

         .. math::

-           B_t &= b_{t-1} \\circ_d \\phi\\\\
-           L_t &= l_{t-1} \\circ_b B_t\\\\
-           S_t &= s_{t-m}\\\\
-           Y_t &= L_t \\circ_s S_t,
+           B_t &= b_{t-1} \circ_d \phi\\
+           L_t &= l_{t-1} \circ_b B_t\\
+           S_t &= s_{t-m}\\
+           Y_t &= L_t \circ_s S_t,

-        where :math:`\\circ_d` is the operation linking trend and damping
+        where :math:`\circ_d` is the operation linking trend and damping
         parameter (multiplication if the trend is additive, power if the trend
-        is multiplicative), :math:`\\circ_b` is the operation linking level and
+        is multiplicative), :math:`\circ_b` is the operation linking level and
         trend (addition if the trend is additive, multiplication if the trend
-        is multiplicative), and :math:`\\circ_s` is the operation linking
+        is multiplicative), and :math:`\circ_s` is the operation linking
         seasonality to the rest.

         The state space equations can then be formulated as

         .. math::

-           y_t &= Y_t + \\eta \\cdot e_t\\\\
-           l_t &= L_t + \\alpha \\cdot (M_e \\cdot L_t + \\kappa_l) \\cdot e_t\\\\
-           b_t &= B_t + \\beta \\cdot (M_e \\cdot B_t + \\kappa_b) \\cdot e_t\\\\
-           s_t &= S_t + \\gamma \\cdot (M_e \\cdot S_t + \\kappa_s) \\cdot e_t\\\\
+           y_t &= Y_t + \eta \cdot e_t\\
+           l_t &= L_t + \alpha \cdot (M_e \cdot L_t + \kappa_l) \cdot e_t\\
+           b_t &= B_t + \beta \cdot (M_e \cdot B_t + \kappa_b) \cdot e_t\\
+           s_t &= S_t + \gamma \cdot (M_e \cdot S_t + \kappa_s) \cdot e_t\\

         with

         .. math::

-           \\eta &= \\begin{cases}
-                       Y_t\\quad\\text{if error is multiplicative}\\\\
-                       1\\quad\\text{else}
-                   \\end{cases}\\\\
-           M_e &= \\begin{cases}
-                       1\\quad\\text{if error is multiplicative}\\\\
-                       0\\quad\\text{else}
-                   \\end{cases}\\\\
+           \eta &= \begin{cases}
+                       Y_t\quad\text{if error is multiplicative}\\
+                       1\quad\text{else}
+                   \end{cases}\\
+           M_e &= \begin{cases}
+                       1\quad\text{if error is multiplicative}\\
+                       0\quad\text{else}
+                   \end{cases}\\

         and, when using the additive error model,

         .. math::

-           \\kappa_l &= \\begin{cases}
-                       \\frac{1}{S_t}\\quad
-                       \\text{if seasonality is multiplicative}\\\\
-                       1\\quad\\text{else}
-                   \\end{cases}\\\\
-           \\kappa_b &= \\begin{cases}
-                       \\frac{\\kappa_l}{l_{t-1}}\\quad
-                       \\text{if trend is multiplicative}\\\\
-                       \\kappa_l\\quad\\text{else}
-                   \\end{cases}\\\\
-           \\kappa_s &= \\begin{cases}
-                       \\frac{1}{L_t}\\quad\\text{if seasonality is
-                                               multiplicative}\\\\
-                       1\\quad\\text{else}
-                   \\end{cases}
+           \kappa_l &= \begin{cases}
+                       \frac{1}{S_t}\quad
+                       \text{if seasonality is multiplicative}\\
+                       1\quad\text{else}
+                   \end{cases}\\
+           \kappa_b &= \begin{cases}
+                       \frac{\kappa_l}{l_{t-1}}\quad
+                       \text{if trend is multiplicative}\\
+                       \kappa_l\quad\text{else}
+                   \end{cases}\\
+           \kappa_s &= \begin{cases}
+                       \frac{1}{L_t}\quad\text{if seasonality is
+                                               multiplicative}\\
+                       1\quad\text{else}
+                   \end{cases}

         When using the multiplicative error model

         .. math::

-           \\kappa_l &= \\begin{cases}
-                       0\\quad
-                       \\text{if seasonality is multiplicative}\\\\
-                       S_t\\quad\\text{else}
-                   \\end{cases}\\\\
-           \\kappa_b &= \\begin{cases}
-                       \\frac{\\kappa_l}{l_{t-1}}\\quad
-                       \\text{if trend is multiplicative}\\\\
-                       \\kappa_l + l_{t-1}\\quad\\text{else}
-                   \\end{cases}\\\\
-           \\kappa_s &= \\begin{cases}
-                       0\\quad\\text{if seasonality is multiplicative}\\\\
-                       L_t\\quad\\text{else}
-                   \\end{cases}
+           \kappa_l &= \begin{cases}
+                       0\quad
+                       \text{if seasonality is multiplicative}\\
+                       S_t\quad\text{else}
+                   \end{cases}\\
+           \kappa_b &= \begin{cases}
+                       \frac{\kappa_l}{l_{t-1}}\quad
+                       \text{if trend is multiplicative}\\
+                       \kappa_l + l_{t-1}\quad\text{else}
+                   \end{cases}\\
+           \kappa_s &= \begin{cases}
+                       0\quad\text{if seasonality is multiplicative}\\
+                       L_t\quad\text{else}
+                   \end{cases}

         References
         ----------
@@ -418,14 +561,205 @@ class HoltWintersResults(Results):
            principles and practice*, 2nd edition, OTexts: Melbourne,
            Australia. OTexts.com/fpp2. Accessed on February 28th 2020.
         """
-        pass
+
+        # check inputs
+        if error in ["additive", "multiplicative"]:
+            error = {"additive": "add", "multiplicative": "mul"}[error]
+        if error not in ["add", "mul"]:
+            raise ValueError("error must be 'add' or 'mul'!")
+
+        # Get the starting location
+        if anchor is None or anchor == "end":
+            start_idx = self.model.nobs
+        elif anchor == "start":
+            start_idx = 0
+        else:
+            start_idx, _, _ = self.model._get_index_loc(anchor)
+            if isinstance(start_idx, slice):
+                start_idx = start_idx.start
+        if start_idx < 0:
+            start_idx += self.model.nobs
+        if start_idx > self.model.nobs:
+            raise ValueError("Cannot anchor simulation outside of the sample.")
+
+        # get Holt-Winters settings and parameters
+        trend = self.model.trend
+        damped = self.model.damped_trend
+        seasonal = self.model.seasonal
+        use_boxcox = self.params["use_boxcox"]
+        lamda = self.params["lamda"]
+        alpha = self.params["smoothing_level"]
+        beta = self.params["smoothing_trend"]
+        gamma = self.params["smoothing_seasonal"]
+        phi = self.params["damping_trend"]
+        # if model has no seasonal component, use 1 as period length
+        m = max(self.model.seasonal_periods, 1)
+        n_params = (
+            2
+            + 2 * self.model.has_trend
+            + (m + 1) * self.model.has_seasonal
+            + damped
+        )
+        mul_seasonal = seasonal == "mul"
+        mul_trend = trend == "mul"
+        mul_error = error == "mul"
+
+        # define trend, damping and seasonality operations
+        if mul_trend:
+            op_b = np.multiply
+            op_d = np.power
+            neutral_b = 1
+        else:
+            op_b = np.add
+            op_d = np.multiply
+            neutral_b = 0
+        if mul_seasonal:
+            op_s = np.multiply
+            neutral_s = 1
+        else:
+            op_s = np.add
+            neutral_s = 0
+
+        # set initial values
+        level = self.level
+        _trend = self.trend
+        season = self.season
+        # (notation as in https://otexts.com/fpp2/ets.html)
+        y = np.empty((nsimulations, repetitions))
+        # lvl instead of l because of E741
+        lvl = np.empty((nsimulations + 1, repetitions))
+        b = np.empty((nsimulations + 1, repetitions))
+        s = np.empty((nsimulations + m, repetitions))
+        # the following uses python's index wrapping
+        if start_idx == 0:
+            lvl[-1, :] = self.params["initial_level"]
+            b[-1, :] = self.params["initial_trend"]
+        else:
+            lvl[-1, :] = level[start_idx - 1]
+            b[-1, :] = _trend[start_idx - 1]
+        if 0 <= start_idx and start_idx <= m:
+            initial_seasons = self.params["initial_seasons"]
+            _s = np.concatenate(
+                (initial_seasons[start_idx:], season[:start_idx])
+            )
+            s[-m:, :] = np.tile(_s, (repetitions, 1)).T
+        else:
+            s[-m:, :] = np.tile(
+                season[start_idx - m : start_idx], (repetitions, 1)
+            ).T
+
+        # set neutral values for unused features
+        if trend is None:
+            b[:, :] = neutral_b
+            phi = 1
+            beta = 0
+        if seasonal is None:
+            s[:, :] = neutral_s
+            gamma = 0
+        if not damped:
+            phi = 1
+
+        # calculate residuals for error covariance estimation
+        if use_boxcox:
+            fitted = boxcox(self.fittedvalues, lamda)
+        else:
+            fitted = self.fittedvalues
+        if error == "add":
+            resid = self.model._y - fitted
+        else:
+            resid = (self.model._y - fitted) / fitted
+        sigma = np.sqrt(np.sum(resid**2) / (len(resid) - n_params))
+
+        # get random error eps
+        if isinstance(random_errors, np.ndarray):
+            if random_errors.shape != (nsimulations, repetitions):
+                raise ValueError(
+                    "If random_errors is an ndarray, it must have shape "
+                    "(nsimulations, repetitions)"
+                )
+            eps = random_errors
+        elif random_errors == "bootstrap":
+            eps = np.random.choice(
+                resid, size=(nsimulations, repetitions), replace=True
+            )
+        elif random_errors is None:
+            if random_state is None:
+                eps = np.random.randn(nsimulations, repetitions) * sigma
+            elif isinstance(random_state, int):
+                rng = np.random.RandomState(random_state)
+                eps = rng.randn(nsimulations, repetitions) * sigma
+            elif isinstance(random_state, np.random.RandomState):
+                eps = random_state.randn(nsimulations, repetitions) * sigma
+            else:
+                raise ValueError(
+                    "Argument random_state must be None, an integer, "
+                    "or an instance of np.random.RandomState"
+                )
+        elif isinstance(random_errors, (rv_continuous, rv_discrete)):
+            params = random_errors.fit(resid)
+            eps = random_errors.rvs(*params, size=(nsimulations, repetitions))
+        elif isinstance(random_errors, rv_frozen):
+            eps = random_errors.rvs(size=(nsimulations, repetitions))
+        else:
+            raise ValueError("Argument random_errors has unexpected value!")
+
+        for t in range(nsimulations):
+            b0 = op_d(b[t - 1, :], phi)
+            l0 = op_b(lvl[t - 1, :], b0)
+            s0 = s[t - m, :]
+            y0 = op_s(l0, s0)
+            if error == "add":
+                eta = 1
+                kappa_l = 1 / s0 if mul_seasonal else 1
+                kappa_b = kappa_l / lvl[t - 1, :] if mul_trend else kappa_l
+                kappa_s = 1 / l0 if mul_seasonal else 1
+            else:
+                eta = y0
+                kappa_l = 0 if mul_seasonal else s0
+                kappa_b = (
+                    kappa_l / lvl[t - 1, :]
+                    if mul_trend
+                    else kappa_l + lvl[t - 1, :]
+                )
+                kappa_s = 0 if mul_seasonal else l0
+
+            y[t, :] = y0 + eta * eps[t, :]
+            lvl[t, :] = l0 + alpha * (mul_error * l0 + kappa_l) * eps[t, :]
+            b[t, :] = b0 + beta * (mul_error * b0 + kappa_b) * eps[t, :]
+            s[t, :] = s0 + gamma * (mul_error * s0 + kappa_s) * eps[t, :]
+
+        if use_boxcox:
+            y = inv_boxcox(y, lamda)
+
+        sim = np.atleast_1d(np.squeeze(y))
+        if y.shape[0] == 1 and y.size > 1:
+            sim = sim[None, :]
+        # Wrap data / squeeze where appropriate
+        if not isinstance(self.model.data, PandasData):
+            return sim
+
+        _, _, _, index = self.model._get_prediction_index(
+            start_idx, start_idx + nsimulations - 1
+        )
+        if repetitions == 1:
+            sim = pd.Series(sim, index=index, name=self.model.endog_names)
+        else:
+            sim = pd.DataFrame(sim, index=index)
+
+        return sim
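For orientation, a minimal usage sketch of the simulate method implemented above. The series, model settings and percentile bands are illustrative only, not taken from the statsmodels documentation:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # hypothetical monthly series with additive trend and seasonality
    idx = pd.date_range("2015-01", periods=60, freq="MS")
    t = np.arange(60)
    y = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12), index=idx)

    res = ExponentialSmoothing(y, trend="add", seasonal="add",
                               seasonal_periods=12).fit()
    # 1000 simulated paths, 24 steps ahead, anchored at the end of the sample
    sims = res.simulate(nsimulations=24, repetitions=1000, error="add")
    lower, upper = np.percentile(sims, [2.5, 97.5], axis=1)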


 class HoltWintersResultsWrapper(ResultsWrapper):
-    _attrs = {'fittedvalues': 'rows', 'level': 'rows', 'resid': 'rows',
-        'season': 'rows', 'trend': 'rows', 'slope': 'rows'}
+    _attrs = {
+        "fittedvalues": "rows",
+        "level": "rows",
+        "resid": "rows",
+        "season": "rows",
+        "trend": "rows",
+        "slope": "rows",
+    }
     _wrap_attrs = union_dicts(ResultsWrapper._wrap_attrs, _attrs)
-    _methods = {'predict': 'dates', 'forecast': 'dates'}
+    _methods = {"predict": "dates", "forecast": "dates"}
     _wrap_methods = union_dicts(ResultsWrapper._wrap_methods, _methods)


diff --git a/statsmodels/tsa/innovations/api.py b/statsmodels/tsa/innovations/api.py
index cf8215c9b..6090ecc98 100644
--- a/statsmodels/tsa/innovations/api.py
+++ b/statsmodels/tsa/innovations/api.py
@@ -1,3 +1,15 @@
-from .arma_innovations import arma_innovations, arma_loglike, arma_loglikeobs, arma_score, arma_scoreobs
-__all__ = ['arma_innovations', 'arma_loglike', 'arma_loglikeobs',
-    'arma_score', 'arma_scoreobs']
+from .arma_innovations import (  # noqa: F401
+    arma_innovations,
+    arma_loglike,
+    arma_loglikeobs,
+    arma_score,
+    arma_scoreobs,
+)
+
+__all__ = [
+    "arma_innovations",
+    "arma_loglike",
+    "arma_loglikeobs",
+    "arma_score",
+    "arma_scoreobs",
+]
diff --git a/statsmodels/tsa/innovations/arma_innovations.py b/statsmodels/tsa/innovations/arma_innovations.py
index 81dd8732c..a1d9c0c3a 100644
--- a/statsmodels/tsa/innovations/arma_innovations.py
+++ b/statsmodels/tsa/innovations/arma_innovations.py
@@ -1,16 +1,19 @@
 import numpy as np
+
 from statsmodels.tsa import arima_process
 from statsmodels.tsa.statespace.tools import prefix_dtype_map
 from statsmodels.tools.numdiff import _get_epsilon, approx_fprime_cs
 from scipy.linalg.blas import find_best_blas_type
 from . import _arma_innovations
-NON_STATIONARY_ERROR = """The model's autoregressive parameters (ar_params) indicate that the process
+
+NON_STATIONARY_ERROR = """\
+The model's autoregressive parameters (ar_params) indicate that the process
  is non-stationary. The innovations algorithm cannot be used.
 """


 def arma_innovations(endog, ar_params=None, ma_params=None, sigma2=1,
-    normalize=False, prefix=None):
+                     normalize=False, prefix=None):
     """
     Compute innovations using a given ARMA process.

@@ -41,7 +44,67 @@ def arma_innovations(endog, ar_params=None, ma_params=None, sigma2=1,
     innovations_mse : ndarray
         Mean square error for the innovations.
     """
-    pass
+    # Parameters
+    endog = np.array(endog)
+    squeezed = endog.ndim == 1
+    if squeezed:
+        endog = endog[:, None]
+
+    ar_params = np.atleast_1d([] if ar_params is None else ar_params)
+    ma_params = np.atleast_1d([] if ma_params is None else ma_params)
+
+    nobs, k_endog = endog.shape
+    ar = np.r_[1, -ar_params]
+    ma = np.r_[1, ma_params]
+
+    # Get BLAS prefix
+    if prefix is None:
+        prefix, dtype, _ = find_best_blas_type(
+            [endog, ar_params, ma_params, np.array(sigma2)])
+    dtype = prefix_dtype_map[prefix]
+
+    # Make arrays contiguous for BLAS calls
+    endog = np.asfortranarray(endog, dtype=dtype)
+    ar_params = np.asfortranarray(ar_params, dtype=dtype)
+    ma_params = np.asfortranarray(ma_params, dtype=dtype)
+    sigma2 = dtype(sigma2).item()
+
+    # Get the appropriate functions
+    arma_transformed_acovf_fast = getattr(
+        _arma_innovations, prefix + 'arma_transformed_acovf_fast')
+    arma_innovations_algo_fast = getattr(
+        _arma_innovations, prefix + 'arma_innovations_algo_fast')
+    arma_innovations_filter = getattr(
+        _arma_innovations, prefix + 'arma_innovations_filter')
+
+    # Run the innovations algorithm for ARMA coefficients
+    arma_acovf = arima_process.arma_acovf(ar, ma,
+                                          sigma2=sigma2, nobs=nobs) / sigma2
+    acovf, acovf2 = arma_transformed_acovf_fast(ar, ma, arma_acovf)
+    theta, v = arma_innovations_algo_fast(nobs, ar_params, ma_params,
+                                          acovf, acovf2)
+    v = np.array(v)
+    if (np.any(v < 0) or
+            not np.isfinite(theta).all() or
+            not np.isfinite(v).all()):
+        # This is defensive code that is hard to hit
+        raise ValueError(NON_STATIONARY_ERROR)
+
+    # Run the innovations filter across each series
+    u = []
+    for i in range(k_endog):
+        u_i = np.array(arma_innovations_filter(endog[:, i], ar_params,
+                                               ma_params, theta))
+        u.append(u_i)
+    u = np.vstack(u).T
+    if normalize:
+        u /= v[:, None]**0.5
+
+    # Post-processing
+    if squeezed:
+        u = u.squeeze()
+
+    return u, v
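As a quick illustration of the function above, a hedged sketch that computes innovations for a simulated AR(1); the coefficient and sample size are arbitrary:

    import numpy as np
    from statsmodels.tsa.innovations.arma_innovations import arma_innovations

    rng = np.random.default_rng(0)
    nobs, phi = 200, 0.5
    eps = rng.standard_normal(nobs)
    y = np.zeros(nobs)
    for t in range(1, nobs):
        y[t] = phi * y[t - 1] + eps[t]

    # u: one-step-ahead prediction errors; v: their mean square errors
    u, v = arma_innovations(y, ar_params=[phi], sigma2=1.0)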


 def arma_loglike(endog, ar_params=None, ma_params=None, sigma2=1, prefix=None):
@@ -68,11 +131,13 @@ def arma_loglike(endog, ar_params=None, ma_params=None, sigma2=1, prefix=None):
     float
         The joint loglikelihood.
     """
-    pass
+    llf_obs = arma_loglikeobs(endog, ar_params=ar_params, ma_params=ma_params,
+                              sigma2=sigma2, prefix=prefix)
+    return np.sum(llf_obs)


-def arma_loglikeobs(endog, ar_params=None, ma_params=None, sigma2=1, prefix
-    =None):
+def arma_loglikeobs(endog, ar_params=None, ma_params=None, sigma2=1,
+                    prefix=None):
     """
     Compute the log-likelihood for each observation assuming an ARMA process.

@@ -96,10 +161,26 @@ def arma_loglikeobs(endog, ar_params=None, ma_params=None, sigma2=1, prefix
     ndarray
         Array of loglikelihood values for each observation.
     """
-    pass
+    endog = np.array(endog)
+    ar_params = np.atleast_1d([] if ar_params is None else ar_params)
+    ma_params = np.atleast_1d([] if ma_params is None else ma_params)
+
+    if prefix is None:
+        prefix, dtype, _ = find_best_blas_type(
+            [endog, ar_params, ma_params, np.array(sigma2)])
+    dtype = prefix_dtype_map[prefix]
+
+    endog = np.ascontiguousarray(endog, dtype=dtype)
+    ar_params = np.asfortranarray(ar_params, dtype=dtype)
+    ma_params = np.asfortranarray(ma_params, dtype=dtype)
+    sigma2 = dtype(sigma2).item()

+    func = getattr(_arma_innovations, prefix + 'arma_loglikeobs_fast')
+    return func(endog, ar_params, ma_params, sigma2)

-def arma_score(endog, ar_params=None, ma_params=None, sigma2=1, prefix=None):
+
+def arma_score(endog, ar_params=None, ma_params=None, sigma2=1,
+               prefix=None):
     """
     Compute the score (gradient of the log-likelihood function).

@@ -132,11 +213,22 @@ def arma_score(endog, ar_params=None, ma_params=None, sigma2=1, prefix=None):
     This is a numerical approximation, calculated using first-order complex
     step differentiation on the `arma_loglike` method.
     """
-    pass
+    ar_params = [] if ar_params is None else ar_params
+    ma_params = [] if ma_params is None else ma_params
+
+    p = len(ar_params)
+    q = len(ma_params)

+    def func(params):
+        return arma_loglike(endog, params[:p], params[p:p + q], params[p + q:])

-def arma_scoreobs(endog, ar_params=None, ma_params=None, sigma2=1, prefix=None
-    ):
+    params0 = np.r_[ar_params, ma_params, sigma2]
+    epsilon = _get_epsilon(params0, 2., None, len(params0))
+    return approx_fprime_cs(params0, func, epsilon)
+
+
+def arma_scoreobs(endog, ar_params=None, ma_params=None, sigma2=1,
+                  prefix=None):
     """
     Compute the score (gradient) per observation.

@@ -169,4 +261,16 @@ def arma_scoreobs(endog, ar_params=None, ma_params=None, sigma2=1, prefix=None
     This is a numerical approximation, calculated using first-order complex
     step differentiation on the `arma_loglike` method.
     """
-    pass
+    ar_params = [] if ar_params is None else ar_params
+    ma_params = [] if ma_params is None else ma_params
+
+    p = len(ar_params)
+    q = len(ma_params)
+
+    def func(params):
+        return arma_loglikeobs(endog, params[:p], params[p:p + q],
+                               params[p + q:])
+
+    params0 = np.r_[ar_params, ma_params, sigma2]
+    epsilon = _get_epsilon(params0, 2., None, len(params0))
+    return approx_fprime_cs(params0, func, epsilon)
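The score functions above are numerical gradients obtained by complex-step differentiation of the likelihood, and the joint log-likelihood is simply the sum of the per-observation values. A minimal check of both relationships, with arbitrary parameter values:

    import numpy as np
    from statsmodels.tsa.innovations.arma_innovations import (
        arma_loglike, arma_loglikeobs, arma_score)

    y = np.random.default_rng(1).standard_normal(100)
    llf = arma_loglike(y, ar_params=[0.3], ma_params=[0.2], sigma2=1.0)
    llf_obs = arma_loglikeobs(y, ar_params=[0.3], ma_params=[0.2], sigma2=1.0)
    assert np.isclose(llf, llf_obs.sum())

    # gradient of the log-likelihood with respect to (ar, ma, sigma2)
    grad = arma_score(y, ar_params=[0.3], ma_params=[0.2], sigma2=1.0)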
diff --git a/statsmodels/tsa/interp/denton.py b/statsmodels/tsa/interp/denton.py
index ba826c719..016c13174 100644
--- a/statsmodels/tsa/interp/denton.py
+++ b/statsmodels/tsa/interp/denton.py
@@ -1,9 +1,90 @@
+
 import numpy as np
-from numpy import dot, eye, diag_indices, zeros, ones, diag, asarray, r_
+from numpy import (dot, eye, diag_indices, zeros, ones, diag,
+        asarray, r_)
 from numpy.linalg import solve


-def dentonm(indicator, benchmark, freq='aq', **kwargs):
+# def denton(indicator, benchmark, freq="aq", **kwargs):
+#    """
+#    Denton's method to convert low-frequency to high frequency data.
+#
+#    Parameters
+#    ----------
+#    benchmark : array_like
+#        The higher frequency benchmark.  A 1d or 2d data series in columns.
+#        If 2d, then M series are assumed.
+#    indicator
+#        A low-frequency indicator series.  It is assumed that there are no
+#        pre-sample indicators.  Ie., the first indicators line up with
+#        the first benchmark.
+#    freq : str {"aq","qm", "other"}
+#        "aq" - Benchmarking an annual series to quarterly.
+#        "mq" - Benchmarking a quarterly series to monthly.
+#        "other" - Custom stride.  A kwarg, k, must be supplied.
+#    kwargs :
+#        k : int
+#            The number of high-frequency observations that sum to make an
+#            aggregate low-frequency observation. `k` is used with
+#            `freq` == "other".
+#    Returns
+#    -------
+#    benchmarked series : ndarray
+#
+#    Notes
+#    -----
+#    Denton's method minimizes the distance given by the penalty function, in
+#    a least squares sense, between the unknown benchmarked series and the
+#    indicator series subject to the condition that the sum of the benchmarked
+#    series is equal to the benchmark.
+#
+#
+#    References
+#    ----------
+#    Bloem, A.M, Dippelsman, R.J. and Maehle, N.O.  2001 Quarterly National
+#        Accounts Manual--Concepts, Data Sources, and Compilation. IMF.
+#        http://www.imf.org/external/pubs/ft/qna/2000/Textbook/index.htm
+#    Denton, F.T. 1971. "Adjustment of monthly or quarterly series to annual
+#        totals: an approach based on quadratic minimization." Journal of the
+#        American Statistical Association. 99-102.
+#
+#    """
+#    # check arrays and make 2d
+#    indicator = np.asarray(indicator)
+#    if indicator.ndim == 1:
+#        indicator = indicator[:,None]
+#    benchmark = np.asarray(benchmark)
+#    if benchmark.ndim == 1:
+#        benchmark = benchmark[:,None]
+#
+#    # get dimensions
+#    N = len(indicator) # total number of high-freq
+#    m = len(benchmark) # total number of low-freq
+#
+#    # number of low-freq observations for aggregate measure
+#    # 4 for annual to quarter and 3 for quarter to monthly
+#    if freq == "aq":
+#        k = 4
+#    elif freq == "qm":
+#        k = 3
+#    elif freq == "other":
+#        k = kwargs.get("k")
+#        if not k:
+#            raise ValueError("k must be supplied with freq=\"other\"")
+#    else:
+#        raise ValueError("freq %s not understood" % freq)
+#
+#    n = k*m # number of indicator series with a benchmark for back-series
+#    # if k*m != n, then we are going to extrapolate q observations
+#
+#    B = block_diag(*(np.ones((k,1)),)*m)
+#
+#    r = benchmark - B.T.dot(indicator)
+#TODO: take code in the string at the end and implement Denton's original
+# method with a few of the penalty functions.
+
+
+def dentonm(indicator, benchmark, freq="aq", **kwargs):
     """
     Modified Denton's method to convert low-frequency to high-frequency data.

@@ -72,24 +153,117 @@ def dentonm(indicator, benchmark, freq='aq', **kwargs):
         totals: an approach based on quadratic minimization." Journal of the
         American Statistical Association. 99-102.
     """
-    pass
+#    penalty : str
+#        Penalty function.  Can be "D1", "D2", "D3", "D4", "D5".
+#        X is the benchmarked series and I is the indicator.
+#        D1 - sum(((X[t] - X[t-1]) - (I[t] - I[t-1]))**2)
+#        D2 - sum((ln(X[t]/X[t-1]) - ln(I[t]/I[t-1]))**2)
+#        D3 - sum((X[t]/X[t-1] / I[t]/I[t-1])**2)
+#        D4 - sum((X[t]/I[t] - X[t-1]/I[t-1])**2)
+#        D5 - sum((X[t]/I[t] / X[t-1]/I[t-1] - 1)**2)
+#NOTE: D4 is the only one implemented, see IMF chapter 6.
+
+    # check arrays and make 2d
+    indicator = asarray(indicator)
+    if indicator.ndim == 1:
+        indicator = indicator[:,None]
+    benchmark = asarray(benchmark)
+    if benchmark.ndim == 1:
+        benchmark = benchmark[:,None]
+
+    # get dimensions
+    N = len(indicator) # total number of high-freq
+    m = len(benchmark) # total number of low-freq
+
+    # number of low-freq observations for aggregate measure
+    # 4 for annual to quarter and 3 for quarter to monthly
+    if freq == "aq":
+        k = 4
+    elif freq == "qm":
+        k = 3
+    elif freq == "other":
+        k = kwargs.get("k")
+        if not k:
+            raise ValueError("k must be supplied with freq=\"other\"")
+    else:
+        raise ValueError("freq %s not understood" % freq)
+
+    n = k*m # number of indicator series with a benchmark for back-series
+    # if k*m != n, then we are going to extrapolate q observations
+    if N > n:
+        q = N - n
+    else:
+        q = 0
+
+    # make the aggregator matrix
+    #B = block_diag(*(ones((k,1)),)*m)
+    B = np.kron(np.eye(m), ones((k,1)))
+
+    # following the IMF paper, we can do
+    Zinv = diag(1./indicator.squeeze()[:n])
+    # this is D in Denton's notation (not using initial value correction)
+#    D = eye(n)
+    # make off-diagonal = -1
+#    D[((np.diag_indices(n)[0])[:-1]+1,(np.diag_indices(n)[1])[:-1])] = -1
+    # account for starting conditions
+#    H = D[1:,:]
+#    HTH = dot(H.T,H)
+    # just make HTH
+    HTH = eye(n)
+    diag_idx0, diag_idx1 = diag_indices(n)
+    HTH[diag_idx0[1:-1], diag_idx1[1:-1]] += 1
+    HTH[diag_idx0[:-1]+1, diag_idx1[:-1]] = -1
+    HTH[diag_idx0[:-1], diag_idx1[:-1]+1] = -1
+
+    W = dot(dot(Zinv,HTH),Zinv)
+
+    # make partitioned matrices
+    # TODO: break this out so that we can simplify the linalg?
+    I = zeros((n+m, n+m))  # noqa:E741
+    I[:n,:n] = W
+    I[:n,n:] = B
+    I[n:,:n] = B.T

+    A = zeros((m+n,1)) # zero first-order constraints
+    A[-m:] = benchmark # adding up constraints
+    X = solve(I,A)
+    X = X[:-m]  # drop the lagrange multipliers

-if __name__ == '__main__':
-    indicator = np.array([98.2, 100.8, 102.2, 100.8, 99.0, 101.6, 102.7, 
-        101.5, 100.5, 103.0, 103.5, 101.5])
-    benchmark = np.array([4000.0, 4161.4])
-    x_imf = dentonm(indicator, benchmark, freq='aq')
-    imf_stata = np.array([969.8, 998.4, 1018.3, 1013.4, 1007.2, 1042.9, 
-        1060.3, 1051.0, 1040.6, 1066.5, 1071.7, 1051.0])
+    # handle extrapolation
+    if q > 0:
+        # get last Benchmark-Indicator ratio
+        bi = X[n-1]/indicator[n-1]
+        extrapolated = bi * indicator[n:]
+        X = r_[X,extrapolated]
+
+    return X.squeeze()
+
+
+if __name__ == "__main__":
+    #these will be the tests
+    # from IMF paper
+
+    # quarterly data
+    indicator = np.array([98.2, 100.8, 102.2, 100.8, 99.0, 101.6,
+                          102.7, 101.5, 100.5, 103.0, 103.5, 101.5])
+    # two annual observations
+    benchmark = np.array([4000.,4161.4])
+    x_imf = dentonm(indicator, benchmark, freq="aq")
+
+    imf_stata = np.array([969.8, 998.4, 1018.3, 1013.4, 1007.2, 1042.9,
+                                1060.3, 1051.0, 1040.6, 1066.5, 1071.7, 1051.0])
     np.testing.assert_almost_equal(imf_stata, x_imf, 1)
-    zQ = np.array([50, 100, 150, 100] * 5)
-    Y = np.array([500, 400, 300, 400, 500])
-    x_denton = dentonm(zQ, Y, freq='aq')
-    x_stata = np.array([64.334796, 127.80616, 187.82379, 120.03526, 
-        56.563894, 105.97568, 147.50144, 89.958987, 40.547201, 74.445963, 
-        108.34473, 76.66211, 42.763347, 94.14664, 153.41596, 109.67405, 
-        58.290761, 122.62556, 190.41409, 128.66959])
+
+    # Denton example
+    zQ = np.array([50,100,150,100] * 5)
+    Y = np.array([500,400,300,400,500])
+    x_denton = dentonm(zQ, Y, freq="aq")
+    x_stata = np.array([64.334796,127.80616,187.82379,120.03526,56.563894,
+                    105.97568,147.50144,89.958987,40.547201,74.445963,
+                    108.34473,76.66211,42.763347,94.14664,153.41596,
+                    109.67405,58.290761,122.62556,190.41409,128.66959])
+
+
 """
 # Examples from the Denton 1971 paper
 k = 4
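A small sanity check of the modified Denton routine above: because the adding-up constraints enter the linear system exactly (via Lagrange multipliers), each block of benchmarked quarters must reproduce its annual benchmark. The numbers below are made up for illustration:

    import numpy as np
    from statsmodels.tsa.interp.denton import dentonm

    indicator = np.array([100., 102., 103., 101., 104., 106., 107., 105.])
    benchmark = np.array([410., 432.])  # two annual totals
    x = dentonm(indicator, benchmark, freq="aq")
    np.testing.assert_allclose(x.reshape(2, 4).sum(axis=1), benchmark)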
diff --git a/statsmodels/tsa/mlemodel.py b/statsmodels/tsa/mlemodel.py
index fe22c27b9..2b8674817 100644
--- a/statsmodels/tsa/mlemodel.py
+++ b/statsmodels/tsa/mlemodel.py
@@ -10,13 +10,19 @@ Author: josef-pktd
 License: BSD

 """
+
+
 try:
     import numdifftools as ndt
 except ImportError:
     pass
+
 from statsmodels.base.model import LikelihoodModel


+#copied from sandbox/regression/mle.py
+#TODO: I take it this is only a stub and should be included in another
+# model class?
 class TSMLEModel(LikelihoodModel):
     """
     univariate time series model for estimation with maximum likelihood
@@ -25,9 +31,15 @@ class TSMLEModel(LikelihoodModel):
     """

     def __init__(self, endog, exog=None):
+        #need to override p,q (nar,nma) correctly
         super().__init__(endog, exog)
+        #set default arma(1,1)
         self.nar = 1
         self.nma = 1
+        #self.initialize()
+
+    def geterrors(self, params):
+        raise NotImplementedError

     def loglike(self, params):
         """
@@ -42,23 +54,33 @@ class TSMLEModel(LikelihoodModel):
         -----
         needs to be overwritten by subclass
         """
-        pass
+        raise NotImplementedError

     def score(self, params):
         """
         Score vector for Arma model
         """
-        pass
+        #return None
+        #print params
+        jac = ndt.Jacobian(self.loglike, stepMax=1e-4)
+        return jac(params)[-1]

     def hessian(self, params):
         """
         Hessian of arma model.  Currently uses numdifftools
         """
-        pass
+        #return None
+        Hfun = ndt.Jacobian(self.score, stepMax=1e-4)
+        return Hfun(params)[-1]

     def fit(self, start_params=None, maxiter=5000, method='fmin', tol=1e-08):
-        """estimate model by minimizing negative loglikelihood
+        '''estimate model by minimizing negative loglikelihood

         does this need to be overwritten ?
-        """
-        pass
+        '''
+        if start_params is None and hasattr(self, '_start_params'):
+            start_params = self._start_params
+        #start_params = np.concatenate((0.05*np.ones(self.nar + self.nma), [1]))
+        mlefit = super().fit(start_params=start_params,
+                maxiter=maxiter, method=method, tol=tol)
+        return mlefit
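TSMLEModel is only a stub: a subclass must supply loglike, after which the numdifftools-based score and hessian above can be used. A hypothetical subclass sketch (the Gaussian model and all names are illustrative):

    import numpy as np

    class ToyMLEModel(TSMLEModel):
        """Hypothetical i.i.d. Gaussian model with params = (mu, sigma2)."""

        def loglike(self, params):
            mu, sigma2 = params
            resid = self.endog - mu
            nobs = resid.shape[0]
            return -0.5 * (nobs * np.log(2 * np.pi * sigma2)
                           + np.sum(resid ** 2) / sigma2)

    mod = ToyMLEModel(np.random.default_rng(2).standard_normal(50))
    # mod.score([0.0, 1.0]) and mod.hessian([0.0, 1.0]) would then differentiate
    # loglike numerically, provided numdifftools is installed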
diff --git a/statsmodels/tsa/regime_switching/markov_autoregression.py b/statsmodels/tsa/regime_switching/markov_autoregression.py
index befe83a92..f801cfc1e 100644
--- a/statsmodels/tsa/regime_switching/markov_autoregression.py
+++ b/statsmodels/tsa/regime_switching/markov_autoregression.py
@@ -4,15 +4,20 @@ Markov switching autoregression models
 Author: Chad Fulton
 License: BSD-3
 """
+
+
 import numpy as np
 import statsmodels.base.wrapper as wrap
+
 from statsmodels.tsa.tsatools import lagmat
-from statsmodels.tsa.regime_switching import markov_switching, markov_regression
-from statsmodels.tsa.statespace.tools import constrain_stationary_univariate, unconstrain_stationary_univariate
+from statsmodels.tsa.regime_switching import (
+    markov_switching, markov_regression)
+from statsmodels.tsa.statespace.tools import (
+    constrain_stationary_univariate, unconstrain_stationary_univariate)


 class MarkovAutoregression(markov_regression.MarkovRegression):
-    """
+    r"""
     Markov switching autoregression model

     Parameters
@@ -65,11 +70,11 @@ class MarkovAutoregression(markov_regression.MarkovRegression):

     .. math::

-        y_t = a_{S_t} + x_t' \\beta_{S_t} + \\phi_{1, S_t}
-        (y_{t-1} - a_{S_{t-1}} - x_{t-1}' \\beta_{S_{t-1}}) + \\dots +
-        \\phi_{p, S_t} (y_{t-p} - a_{S_{t-p}} - x_{t-p}' \\beta_{S_{t-p}}) +
-        \\varepsilon_t \\\\
-        \\varepsilon_t \\sim N(0, \\sigma_{S_t}^2)
+        y_t = a_{S_t} + x_t' \beta_{S_t} + \phi_{1, S_t}
+        (y_{t-1} - a_{S_{t-1}} - x_{t-1}' \beta_{S_{t-1}}) + \dots +
+        \phi_{p, S_t} (y_{t-p} - a_{S_{t-p}} - x_{t-p}' \beta_{S_{t-p}}) +
+        \varepsilon_t \\
+        \varepsilon_t \sim N(0, \sigma_{S_t}^2)

     i.e. the model is an autoregression where the autoregressive
     coefficients, the mean of the process (possibly including trend or
@@ -93,39 +98,61 @@ class MarkovAutoregression(markov_regression.MarkovRegression):
     """

     def __init__(self, endog, k_regimes, order, trend='c', exog=None,
-        exog_tvtp=None, switching_ar=True, switching_trend=True,
-        switching_exog=False, switching_variance=False, dates=None, freq=
-        None, missing='none'):
+                 exog_tvtp=None, switching_ar=True, switching_trend=True,
+                 switching_exog=False, switching_variance=False,
+                 dates=None, freq=None, missing='none'):
+
+        # Properties
         self.switching_ar = switching_ar
+
+        # Switching options
         if self.switching_ar is True or self.switching_ar is False:
             self.switching_ar = [self.switching_ar] * order
         elif not len(self.switching_ar) == order:
             raise ValueError('Invalid iterable passed to `switching_ar`.')
-        super().__init__(endog, k_regimes, trend=trend, exog=exog, order=
-            order, exog_tvtp=exog_tvtp, switching_trend=switching_trend,
-            switching_exog=switching_exog, switching_variance=
-            switching_variance, dates=dates, freq=freq, missing=missing)
+
+        # Initialize the base model
+        super().__init__(
+            endog, k_regimes, trend=trend, exog=exog, order=order,
+            exog_tvtp=exog_tvtp, switching_trend=switching_trend,
+            switching_exog=switching_exog,
+            switching_variance=switching_variance, dates=dates, freq=freq,
+            missing=missing)
+
+        # Sanity checks
         if self.nobs <= self.order:
-            raise ValueError(
-                'Must have more observations than the order of the autoregression.'
-                )
+            raise ValueError('Must have more observations than the order of'
+                             ' the autoregression.')
+
+        # Autoregressive exog
         self.exog_ar = lagmat(endog, self.order)[self.order:]
+
+        # Reshape other datasets
         self.nobs -= self.order
         self.orig_endog = self.endog
         self.endog = self.endog[self.order:]
         if self._k_exog > 0:
             self.orig_exog = self.exog
             self.exog = self.exog[self.order:]
-        self.data.endog, self.data.exog = self.data._convert_endog_exog(self
-            .endog, self.exog)
+
+        # Reset the ModelData datasets
+        self.data.endog, self.data.exog = (
+            self.data._convert_endog_exog(self.endog, self.exog))
+
+        # Reset indexes, if provided
         if self.data.row_labels is not None:
-            self.data._cache['row_labels'] = self.data.row_labels[self.order:]
+            self.data._cache['row_labels'] = (
+                self.data.row_labels[self.order:])
         if self._index is not None:
             if self._index_generated:
                 self._index = self._index[:-self.order]
             else:
                 self._index = self._index[self.order:]
+
+        # Parameters
         self.parameters['autoregressive'] = self.switching_ar
+
+        # Cache an array for holding slices
         self._predict_slices = [slice(None, None, None)] * (self.order + 1)

     def predict_conditional(self, params):
@@ -143,33 +170,205 @@ class MarkovAutoregression(markov_regression.MarkovRegression):
             Array of predictions conditional on current, and possibly past,
             regimes
         """
-        pass
+        params = np.array(params, ndmin=1)
+
+        # Prediction is based on:
+        # y_t = x_t beta^{(S_t)} +
+        #       \phi_1^{(S_t)} (y_{t-1} - x_{t-1} beta^{(S_t-1)}) + ...
+        #       \phi_p^{(S_t)} (y_{t-p} - x_{t-p} beta^{(S_t-p)}) + eps_t
+        if self._k_exog > 0:
+            xb = []
+            for i in range(self.k_regimes):
+                coeffs = params[self.parameters[i, 'exog']]
+                xb.append(np.dot(self.orig_exog, coeffs))
+
+        predict = np.zeros(
+            (self.k_regimes,) * (self.order + 1) + (self.nobs,),
+            dtype=np.promote_types(np.float64, params.dtype))
+        # Iterate over S_{t} = i
+        for i in range(self.k_regimes):
+            ar_coeffs = params[self.parameters[i, 'autoregressive']]
+
+            # y_t - x_t beta^{(S_t)}
+            ix = self._predict_slices[:]
+            ix[0] = i
+            ix = tuple(ix)
+            if self._k_exog > 0:
+                predict[ix] += xb[i][self.order:]
+
+            # Iterate over j = 1, ..., p
+            for j in range(1, self.order + 1):
+                for k in range(self.k_regimes):
+                    # This gets a specific time-period / regime slice:
+                    # S_{t} = i, S_{t-j} = k, across all other time-period /
+                    # regime slices.
+                    ix = self._predict_slices[:]
+                    ix[0] = i
+                    ix[j] = k
+                    ix = tuple(ix)
+
+                    start = self.order - j
+                    end = -j
+                    if self._k_exog > 0:
+                        predict[ix] += ar_coeffs[j-1] * (
+                            self.orig_endog[start:end] - xb[k][start:end])
+                    else:
+                        predict[ix] += ar_coeffs[j-1] * (
+                            self.orig_endog[start:end])
+
+        return predict
+
+    def _resid(self, params):
+        return self.endog - self.predict_conditional(params)

     def _conditional_loglikelihoods(self, params):
         """
         Compute loglikelihoods conditional on the current period's regime and
         the last `self.order` regimes.
         """
-        pass
+        # Get the residuals
+        resid = self._resid(params)
+
+        # Compute the conditional likelihoods
+        variance = params[self.parameters['variance']].squeeze()
+        if self.switching_variance:
+            variance = np.reshape(variance, (self.k_regimes, 1, 1))
+
+        conditional_loglikelihoods = (
+            -0.5 * resid**2 / variance - 0.5 * np.log(2 * np.pi * variance))
+
+        return conditional_loglikelihoods
+
+    @property
+    def _res_classes(self):
+        return {'fit': (MarkovAutoregressionResults,
+                        MarkovAutoregressionResultsWrapper)}

     def _em_iteration(self, params0):
         """
         EM iteration
         """
-        pass
+        # Inherited parameters
+        result, params1 = markov_switching.MarkovSwitching._em_iteration(
+            self, params0)
+
+        tmp = np.sqrt(result.smoothed_marginal_probabilities)
+
+        # Regression coefficients
+        coeffs = None
+        if self._k_exog > 0:
+            coeffs = self._em_exog(result, self.endog, self.exog,
+                                   self.parameters.switching['exog'], tmp)
+            for i in range(self.k_regimes):
+                params1[self.parameters[i, 'exog']] = coeffs[i]
+
+        # Autoregressive
+        if self.order > 0:
+            if self._k_exog > 0:
+                ar_coeffs, variance = self._em_autoregressive(
+                    result, coeffs)
+            else:
+                ar_coeffs = self._em_exog(
+                    result, self.endog, self.exog_ar,
+                    self.parameters.switching['autoregressive'])
+                variance = self._em_variance(
+                    result, self.endog, self.exog_ar, ar_coeffs, tmp)
+            for i in range(self.k_regimes):
+                params1[self.parameters[i, 'autoregressive']] = ar_coeffs[i]
+            params1[self.parameters['variance']] = variance
+
+        return result, params1

     def _em_autoregressive(self, result, betas, tmp=None):
         """
         EM step for autoregressive coefficients and variances
         """
-        pass
+        if tmp is None:
+            tmp = np.sqrt(result.smoothed_marginal_probabilities)
+
+        resid = np.zeros((self.k_regimes, self.nobs + self.order))
+        resid[:] = self.orig_endog
+        if self._k_exog > 0:
+            for i in range(self.k_regimes):
+                resid[i] -= np.dot(self.orig_exog, betas[i])
+
+        # The difference between this and `_em_exog` is that here we have a
+        # different endog and exog for each regime
+        coeffs = np.zeros((self.k_regimes,) + (self.order,))
+        variance = np.zeros((self.k_regimes,))
+        exog = np.zeros((self.nobs, self.order))
+        for i in range(self.k_regimes):
+            endog = resid[i, self.order:]
+            exog = lagmat(resid[i], self.order)[self.order:]
+            tmp_endog = tmp[i] * endog
+            tmp_exog = tmp[i][:, None] * exog
+
+            coeffs[i] = np.dot(np.linalg.pinv(tmp_exog), tmp_endog)
+
+            if self.switching_variance:
+                tmp_resid = endog - np.dot(exog, coeffs[i])
+                variance[i] = (np.sum(
+                    tmp_resid**2 * result.smoothed_marginal_probabilities[i]) /
+                    np.sum(result.smoothed_marginal_probabilities[i]))
+            else:
+                tmp_resid = tmp_endog - np.dot(tmp_exog, coeffs[i])
+                variance[i] = np.sum(tmp_resid**2)
+
+        # Variances
+        if not self.switching_variance:
+            variance = variance.sum() / self.nobs
+
+        return coeffs, variance

     @property
     def start_params(self):
         """
         (array) Starting parameters for maximum likelihood estimation.
         """
-        pass
+        # Inherited parameters
+        params = markov_switching.MarkovSwitching.start_params.fget(self)
+
+        # OLS for starting parameters
+        endog = self.endog.copy()
+        if self._k_exog > 0 and self.order > 0:
+            exog = np.c_[self.exog, self.exog_ar]
+        elif self._k_exog > 0:
+            exog = self.exog
+        elif self.order > 0:
+            exog = self.exog_ar
+
+        if self._k_exog > 0 or self.order > 0:
+            beta = np.dot(np.linalg.pinv(exog), endog)
+            variance = np.var(endog - np.dot(exog, beta))
+        else:
+            variance = np.var(endog)
+
+        # Regression coefficients
+        if self._k_exog > 0:
+            if np.any(self.switching_coeffs):
+                for i in range(self.k_regimes):
+                    params[self.parameters[i, 'exog']] = (
+                        beta[:self._k_exog] * (i / self.k_regimes))
+            else:
+                params[self.parameters['exog']] = beta[:self._k_exog]
+
+        # Autoregressive
+        if self.order > 0:
+            if np.any(self.switching_ar):
+                for i in range(self.k_regimes):
+                    params[self.parameters[i, 'autoregressive']] = (
+                        beta[self._k_exog:] * (i / self.k_regimes))
+            else:
+                params[self.parameters['autoregressive']] = beta[self._k_exog:]
+
+        # Variance
+        if self.switching_variance:
+            params[self.parameters['variance']] = (
+                np.linspace(variance / 10., variance, num=self.k_regimes))
+        else:
+            params[self.parameters['variance']] = variance
+
+        return params

     @property
     def param_names(self):
@@ -177,7 +376,21 @@ class MarkovAutoregression(markov_regression.MarkovRegression):
         (list of str) List of human readable parameter names (for parameters
         actually included in the model).
         """
-        pass
+        # Inherited parameters
+        param_names = np.array(
+            markov_regression.MarkovRegression.param_names.fget(self),
+            dtype=object)
+
+        # Autoregressive
+        if np.any(self.switching_ar):
+            for i in range(self.k_regimes):
+                param_names[self.parameters[i, 'autoregressive']] = [
+                    'ar.L%d[%d]' % (j+1, i) for j in range(self.order)]
+        else:
+            param_names[self.parameters['autoregressive']] = [
+                'ar.L%d' % (j+1) for j in range(self.order)]
+
+        return param_names.tolist()

     def transform_params(self, unconstrained):
         """
@@ -196,7 +409,19 @@ class MarkovAutoregression(markov_regression.MarkovRegression):
             Array of constrained parameters which may be used in likelihood
             evaluation.
         """
-        pass
+        # Inherited parameters
+        constrained = super(MarkovAutoregression, self).transform_params(
+            unconstrained)
+
+        # Autoregressive
+        # TODO may provide unexpected results when some coefficients are not
+        # switching
+        for i in range(self.k_regimes):
+            s = self.parameters[i, 'autoregressive']
+            constrained[s] = constrain_stationary_univariate(
+                unconstrained[s])
+
+        return constrained

     def untransform_params(self, constrained):
         """
@@ -214,11 +439,23 @@ class MarkovAutoregression(markov_regression.MarkovRegression):
         unconstrained : array_like
             Array of unconstrained parameters used by the optimizer.
         """
-        pass
+        # Inherited parameters
+        unconstrained = super(MarkovAutoregression, self).untransform_params(
+            constrained)
+
+        # Autoregressive
+        # TODO may provide unexpected results when some coefficients are not
+        # switching
+        for i in range(self.k_regimes):
+            s = self.parameters[i, 'autoregressive']
+            unconstrained[s] = unconstrain_stationary_univariate(
+                constrained[s])
+
+        return unconstrained


 class MarkovAutoregressionResults(markov_regression.MarkovRegressionResults):
-    """
+    r"""
     Class to hold results from fitting a Markov switching autoregression model

     Parameters
@@ -249,10 +486,8 @@ class MarkovAutoregressionResults(markov_regression.MarkovRegressionResults):
     pass


-class MarkovAutoregressionResultsWrapper(markov_regression.
-    MarkovRegressionResultsWrapper):
+class MarkovAutoregressionResultsWrapper(
+        markov_regression.MarkovRegressionResultsWrapper):
     pass
-
-
-wrap.populate_wrapper(MarkovAutoregressionResultsWrapper,
-    MarkovAutoregressionResults)
+wrap.populate_wrapper(MarkovAutoregressionResultsWrapper,  # noqa:E305
+                      MarkovAutoregressionResults)
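A brief, hedged usage sketch of the class implemented above, on synthetic data; the number of regimes, the AR order and the generated series are purely illustrative:

    import numpy as np
    from statsmodels.tsa.regime_switching.markov_autoregression import (
        MarkovAutoregression)

    rng = np.random.default_rng(0)
    # two segments with different means, plus noise
    y = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
    mod = MarkovAutoregression(y, k_regimes=2, order=1, switching_ar=False)
    res = mod.fit()
    # res.smoothed_marginal_probabilities[0] gives P(S_t = 0 | data)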
diff --git a/statsmodels/tsa/regime_switching/markov_regression.py b/statsmodels/tsa/regime_switching/markov_regression.py
index 575da8b8d..91922a3fc 100644
--- a/statsmodels/tsa/regime_switching/markov_regression.py
+++ b/statsmodels/tsa/regime_switching/markov_regression.py
@@ -6,11 +6,12 @@ License: BSD-3
 """
 import numpy as np
 import statsmodels.base.wrapper as wrap
+
 from statsmodels.tsa.regime_switching import markov_switching


 class MarkovRegression(markov_switching.MarkovSwitching):
-    """
+    r"""
     First-order k-regime Markov switching regression model

     Parameters
@@ -60,8 +61,8 @@ class MarkovRegression(markov_switching.MarkovSwitching):

     .. math::

-        y_t = a_{S_t} + x_t' \\beta_{S_t} + \\varepsilon_t \\\\
-        \\varepsilon_t \\sim N(0, \\sigma_{S_t}^2)
+        y_t = a_{S_t} + x_t' \beta_{S_t} + \varepsilon_t \\
+        \varepsilon_t \sim N(0, \sigma_{S_t}^2)

     i.e. the model is a dynamic linear regression where the coefficients and
     the variance of the error term may be switching across regimes.
@@ -83,14 +84,21 @@ class MarkovRegression(markov_switching.MarkovSwitching):
     """

     def __init__(self, endog, k_regimes, trend='c', exog=None, order=0,
-        exog_tvtp=None, switching_trend=True, switching_exog=True,
-        switching_variance=False, dates=None, freq=None, missing='none'):
+                 exog_tvtp=None, switching_trend=True, switching_exog=True,
+                 switching_variance=False, dates=None, freq=None,
+                 missing='none'):
+
+        # Properties
         from statsmodels.tools.validation import string_like
-        self.trend = string_like(trend, 'trend', options=('n', 'c', 'ct', 't'))
+        self.trend = string_like(trend, "trend", options=("n", "c", "ct", "t"))
         self.switching_trend = switching_trend
         self.switching_exog = switching_exog
         self.switching_variance = switching_variance
+
+        # Exogenous data
         self.k_exog, exog = markov_switching.prepare_exog(exog)
+
+        # Trend
         nobs = len(endog)
         self.k_trend = 0
         self._k_exog = self.k_exog
@@ -102,15 +110,19 @@ class MarkovRegression(markov_switching.MarkovSwitching):
             trend_exog = (np.arange(nobs) + 1)[:, np.newaxis]
             self.k_trend = 1
         elif trend == 'ct':
-            trend_exog = np.c_[np.ones((nobs, 1)), (np.arange(nobs) + 1)[:,
-                np.newaxis]]
+            trend_exog = np.c_[np.ones((nobs, 1)),
+                               (np.arange(nobs) + 1)[:, np.newaxis]]
             self.k_trend = 2
         if trend_exog is not None:
             exog = trend_exog if exog is None else np.c_[trend_exog, exog]
             self._k_exog += self.k_trend
-        super(MarkovRegression, self).__init__(endog, k_regimes, order=
-            order, exog_tvtp=exog_tvtp, exog=exog, dates=dates, freq=freq,
-            missing=missing)
+
+        # Initialize the base model
+        super(MarkovRegression, self).__init__(
+            endog, k_regimes, order=order, exog_tvtp=exog_tvtp, exog=exog,
+            dates=dates, freq=freq, missing=missing)
+
+        # Switching options
         if self.switching_trend is True or self.switching_trend is False:
             self.switching_trend = [self.switching_trend] * self.k_trend
         elif not len(self.switching_trend) == self.k_trend:
@@ -119,8 +131,12 @@ class MarkovRegression(markov_switching.MarkovSwitching):
             self.switching_exog = [self.switching_exog] * self.k_exog
         elif not len(self.switching_exog) == self.k_exog:
             raise ValueError('Invalid iterable passed to `switching_exog`.')
-        self.switching_coeffs = np.r_[self.switching_trend, self.switching_exog
-            ].astype(bool).tolist()
+
+        self.switching_coeffs = (
+            np.r_[self.switching_trend,
+                  self.switching_exog].astype(bool).tolist())
+
+        # Parameters
         self.parameters['exog'] = self.switching_coeffs
         self.parameters['variance'] = [1] if self.switching_variance else [0]

@@ -139,13 +155,47 @@ class MarkovRegression(markov_switching.MarkovSwitching):
             Array of predictions conditional on current, and possibly past,
             regimes
         """
-        pass
+        params = np.array(params, ndmin=1)
+
+        # Since in the base model the values are the same across columns, we
+        # only compute a single column, and then expand it below.
+        predict = np.zeros((self.k_regimes, self.nobs), dtype=params.dtype)
+
+        for i in range(self.k_regimes):
+            # Predict
+            if self._k_exog > 0:
+                coeffs = params[self.parameters[i, 'exog']]
+                predict[i] = np.dot(self.exog, coeffs)
+
+        return predict[:, None, :]
+
+    def _resid(self, params):
+        predict = np.repeat(self.predict_conditional(params),
+                            self.k_regimes, axis=1)
+        return self.endog - predict

     def _conditional_loglikelihoods(self, params):
         """
         Compute loglikelihoods conditional on the current period's regime
         """
-        pass
+
+        # Get residuals
+        resid = self._resid(params)
+
+        # Compute the conditional likelihoods
+        variance = params[self.parameters['variance']].squeeze()
+        if self.switching_variance:
+            variance = np.reshape(variance, (self.k_regimes, 1, 1))
+
+        conditional_loglikelihoods = (
+            -0.5 * resid**2 / variance - 0.5 * np.log(2 * np.pi * variance))
+
+        return conditional_loglikelihoods
+
+    @property
+    def _res_classes(self):
+        return {'fit': (MarkovRegressionResults,
+                        MarkovRegressionResultsWrapper)}

     def _em_iteration(self, params0):
         """
@@ -157,19 +207,85 @@ class MarkovRegression(markov_switching.MarkovSwitching):
         non-TVTP transition probabilities and then performs the EM step for
         regression coefficients and variances.
         """
-        pass
+        # Inherited parameters
+        result, params1 = super(MarkovRegression, self)._em_iteration(params0)
+
+        tmp = np.sqrt(result.smoothed_marginal_probabilities)
+
+        # Regression coefficients
+        coeffs = None
+        if self._k_exog > 0:
+            coeffs = self._em_exog(result, self.endog, self.exog,
+                                   self.parameters.switching['exog'], tmp)
+            for i in range(self.k_regimes):
+                params1[self.parameters[i, 'exog']] = coeffs[i]
+
+        # Variances
+        params1[self.parameters['variance']] = self._em_variance(
+            result, self.endog, self.exog, coeffs, tmp)
+        # params1[self.parameters['variance']] = 0.33282116
+
+        return result, params1

     def _em_exog(self, result, endog, exog, switching, tmp=None):
         """
         EM step for regression coefficients
         """
-        pass
+        k_exog = exog.shape[1]
+        coeffs = np.zeros((self.k_regimes, k_exog))
+
+        # First, estimate non-switching coefficients
+        if not np.all(switching):
+            nonswitching_exog = exog[:, ~switching]
+            nonswitching_coeffs = (
+                np.dot(np.linalg.pinv(nonswitching_exog), endog))
+            coeffs[:, ~switching] = nonswitching_coeffs
+            endog = endog - np.dot(nonswitching_exog, nonswitching_coeffs)
+
+        # Next, get switching coefficients
+        if np.any(switching):
+            switching_exog = exog[:, switching]
+            if tmp is None:
+                tmp = np.sqrt(result.smoothed_marginal_probabilities)
+            for i in range(self.k_regimes):
+                tmp_endog = tmp[i] * endog
+                tmp_exog = tmp[i][:, np.newaxis] * switching_exog
+                coeffs[i, switching] = (
+                    np.dot(np.linalg.pinv(tmp_exog), tmp_endog))
+
+        return coeffs

     def _em_variance(self, result, endog, exog, betas, tmp=None):
         """
         EM step for variances
         """
-        pass
+        k_exog = 0 if exog is None else exog.shape[1]
+
+        if self.switching_variance:
+            variance = np.zeros(self.k_regimes)
+            for i in range(self.k_regimes):
+                if k_exog > 0:
+                    resid = endog - np.dot(exog, betas[i])
+                else:
+                    resid = endog
+                variance[i] = (
+                    np.sum(resid**2 *
+                           result.smoothed_marginal_probabilities[i]) /
+                    np.sum(result.smoothed_marginal_probabilities[i]))
+        else:
+            variance = 0
+            if tmp is None:
+                tmp = np.sqrt(result.smoothed_marginal_probabilities)
+            for i in range(self.k_regimes):
+                tmp_endog = tmp[i] * endog
+                if k_exog > 0:
+                    tmp_exog = tmp[i][:, np.newaxis] * exog
+                    resid = tmp_endog - np.dot(tmp_exog, betas[i])
+                else:
+                    resid = tmp_endog
+                variance += np.sum(resid**2)
+            variance /= self.nobs
+        return variance

     @property
     def start_params(self):
@@ -185,7 +301,31 @@ class MarkovRegression(markov_switching.MarkovSwitching):
         starting parameters, which are then used by the typical scoring
         approach.
         """
-        pass
+        # Inherited parameters
+        params = markov_switching.MarkovSwitching.start_params.fget(self)
+
+        # Regression coefficients
+        if self._k_exog > 0:
+            beta = np.dot(np.linalg.pinv(self.exog), self.endog)
+            variance = np.var(self.endog - np.dot(self.exog, beta))
+
+            if np.any(self.switching_coeffs):
+                for i in range(self.k_regimes):
+                    params[self.parameters[i, 'exog']] = (
+                        beta * (i / self.k_regimes))
+            else:
+                params[self.parameters['exog']] = beta
+        else:
+            variance = np.var(self.endog)
+
+        # Variances
+        if self.switching_variance:
+            params[self.parameters['variance']] = (
+                np.linspace(variance / 10., variance, num=self.k_regimes))
+        else:
+            params[self.parameters['variance']] = variance
+
+        return params

     @property
     def param_names(self):
@@ -193,7 +333,27 @@ class MarkovRegression(markov_switching.MarkovSwitching):
         (list of str) List of human readable parameter names (for parameters
         actually included in the model).
         """
-        pass
+        # Inherited parameters
+        param_names = np.array(
+            markov_switching.MarkovSwitching.param_names.fget(self),
+            dtype=object)
+
+        # Regression coefficients
+        if np.any(self.switching_coeffs):
+            for i in range(self.k_regimes):
+                param_names[self.parameters[i, 'exog']] = [
+                    '%s[%d]' % (exog_name, i) for exog_name in self.exog_names]
+        else:
+            param_names[self.parameters['exog']] = self.exog_names
+
+        # Variances
+        if self.switching_variance:
+            for i in range(self.k_regimes):
+                param_names[self.parameters[i, 'variance']] = 'sigma2[%d]' % i
+        else:
+            param_names[self.parameters['variance']] = 'sigma2'
+
+        return param_names.tolist()

     def transform_params(self, unconstrained):
         """
@@ -212,7 +372,19 @@ class MarkovRegression(markov_switching.MarkovSwitching):
             Array of constrained parameters which may be used in likelihood
             evaluation.
         """
-        pass
+        # Inherited parameters
+        constrained = super(MarkovRegression, self).transform_params(
+            unconstrained)
+
+        # Nothing to do for regression coefficients
+        constrained[self.parameters['exog']] = (
+            unconstrained[self.parameters['exog']])
+
+        # Force variances to be positive
+        constrained[self.parameters['variance']] = (
+            unconstrained[self.parameters['variance']]**2)
+
+        return constrained

     def untransform_params(self, constrained):
         """
@@ -230,11 +402,23 @@ class MarkovRegression(markov_switching.MarkovSwitching):
         unconstrained : array_like
             Array of unconstrained parameters used by the optimizer.
         """
-        pass
+        # Inherited parameters
+        unconstrained = super(MarkovRegression, self).untransform_params(
+            constrained)
+
+        # Nothing to do for regression coefficients
+        unconstrained[self.parameters['exog']] = (
+            constrained[self.parameters['exog']])
+
+        # Force variances to be positive
+        unconstrained[self.parameters['variance']] = (
+            constrained[self.parameters['variance']]**0.5)
+
+        return unconstrained
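
Note: because transform_params squares the variance parameters (and the base class applies a logistic map to the transition parameters), transforming and untransforming should round-trip. A small sanity check, assuming the same default two-regime specification as above:

    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(0).standard_normal(100)
    mod = sm.tsa.MarkovRegression(y, k_regimes=2, switching_variance=True)

    params = mod.start_params
    unconstrained = mod.untransform_params(params)
    print(np.allclose(mod.transform_params(unconstrained), params))  # True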


 class MarkovRegressionResults(markov_switching.MarkovSwitchingResults):
-    """
+    r"""
     Class to hold results from fitting a Markov switching regression model

     Parameters
@@ -265,9 +449,8 @@ class MarkovRegressionResults(markov_switching.MarkovSwitchingResults):
     pass


-class MarkovRegressionResultsWrapper(markov_switching.
-    MarkovSwitchingResultsWrapper):
+class MarkovRegressionResultsWrapper(
+        markov_switching.MarkovSwitchingResultsWrapper):
     pass
-
-
-wrap.populate_wrapper(MarkovRegressionResultsWrapper, MarkovRegressionResults)
+wrap.populate_wrapper(MarkovRegressionResultsWrapper,  # noqa:E305
+                      MarkovRegressionResults)
diff --git a/statsmodels/tsa/regime_switching/markov_switching.py b/statsmodels/tsa/regime_switching/markov_switching.py
index cb8270e25..47a81679e 100644
--- a/statsmodels/tsa/regime_switching/markov_switching.py
+++ b/statsmodels/tsa/regime_switching/markov_switching.py
@@ -5,9 +5,11 @@ Author: Chad Fulton
 License: BSD-3
 """
 import warnings
+
 import numpy as np
 import pandas as pd
 from scipy.special import logsumexp
+
 from statsmodels.base.data import PandasData
 import statsmodels.base.wrapper as wrap
 from statsmodels.tools.decorators import cache_readonly
@@ -16,31 +18,97 @@ from statsmodels.tools.numdiff import approx_fprime_cs, approx_hess_cs
 from statsmodels.tools.sm_exceptions import EstimationWarning
 from statsmodels.tools.tools import Bunch, pinv_extended
 import statsmodels.tsa.base.tsa_model as tsbase
-from statsmodels.tsa.regime_switching._hamilton_filter import chamilton_filter_log, dhamilton_filter_log, shamilton_filter_log, zhamilton_filter_log
-from statsmodels.tsa.regime_switching._kim_smoother import ckim_smoother_log, dkim_smoother_log, skim_smoother_log, zkim_smoother_log
-from statsmodels.tsa.statespace.tools import find_best_blas_type, prepare_exog, _safe_cond
-prefix_hamilton_filter_log_map = {'s': shamilton_filter_log, 'd':
-    dhamilton_filter_log, 'c': chamilton_filter_log, 'z': zhamilton_filter_log}
-prefix_kim_smoother_log_map = {'s': skim_smoother_log, 'd':
-    dkim_smoother_log, 'c': ckim_smoother_log, 'z': zkim_smoother_log}
+from statsmodels.tsa.regime_switching._hamilton_filter import (
+    chamilton_filter_log,
+    dhamilton_filter_log,
+    shamilton_filter_log,
+    zhamilton_filter_log,
+)
+from statsmodels.tsa.regime_switching._kim_smoother import (
+    ckim_smoother_log,
+    dkim_smoother_log,
+    skim_smoother_log,
+    zkim_smoother_log,
+)
+from statsmodels.tsa.statespace.tools import (
+    find_best_blas_type,
+    prepare_exog,
+    _safe_cond
+)
+
+prefix_hamilton_filter_log_map = {
+    's': shamilton_filter_log, 'd': dhamilton_filter_log,
+    'c': chamilton_filter_log, 'z': zhamilton_filter_log
+}
+
+prefix_kim_smoother_log_map = {
+    's': skim_smoother_log, 'd': dkim_smoother_log,
+    'c': ckim_smoother_log, 'z': zkim_smoother_log
+}


 def _logistic(x):
     """
     Note that this is not a vectorized function
     """
-    pass
+    x = np.array(x)
+    # np.exp(x) / (1 + np.exp(x))
+    if x.ndim == 0:
+        y = np.reshape(x, (1, 1, 1))
+    # np.exp(x[i]) / (1 + np.sum(np.exp(x[:])))
+    elif x.ndim == 1:
+        y = np.reshape(x, (len(x), 1, 1))
+    # np.exp(x[i,t]) / (1 + np.sum(np.exp(x[:,t])))
+    elif x.ndim == 2:
+        y = np.reshape(x, (x.shape[0], 1, x.shape[1]))
+    # np.exp(x[i,j,t]) / (1 + np.sum(np.exp(x[:,j,t])))
+    elif x.ndim == 3:
+        y = x
+    else:
+        raise NotImplementedError
+
+    tmp = np.c_[np.zeros((y.shape[-1], y.shape[1], 1)), y.T].T
+    evaluated = np.reshape(np.exp(y - logsumexp(tmp, axis=0)), x.shape)
+
+    return evaluated
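
Note: as the inline comments indicate, _logistic implements a multinomial logit with an implicit reference category fixed at zero: along the first axis it returns exp(x[i]) / (1 + sum_j exp(x[j])), evaluated via logsumexp for numerical stability. A one-dimensional check of that identity against the direct formula:

    import numpy as np
    from scipy.special import logsumexp

    x = np.array([0.2, -1.0, 0.5])
    direct = np.exp(x) / (1 + np.exp(x).sum())
    stable = np.exp(x - logsumexp(np.r_[0.0, x]))   # the zero is the reference category
    print(np.allclose(direct, stable))              # True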


 def _partials_logistic(x):
     """
     Note that this is not a vectorized function
     """
-    pass
+    tmp = _logistic(x)
+
+    # k
+    if tmp.ndim == 0:
+        return tmp - tmp**2
+    # k x k
+    elif tmp.ndim == 1:
+        partials = np.diag(tmp - tmp**2)
+    # k x k x t
+    elif tmp.ndim == 2:
+        partials = [np.diag(tmp[:, t] - tmp[:, t]**2)
+                    for t in range(tmp.shape[1])]
+        shape = tmp.shape[1], tmp.shape[0], tmp.shape[0]
+        partials = np.concatenate(partials).reshape(shape).transpose((1, 2, 0))
+    # k x k x j x t
+    else:
+        partials = [[np.diag(tmp[:, j, t] - tmp[:, j, t]**2)
+                     for t in range(tmp.shape[2])]
+                    for j in range(tmp.shape[1])]
+        shape = tmp.shape[1], tmp.shape[2], tmp.shape[0], tmp.shape[0]
+        partials = np.concatenate(partials).reshape(shape).transpose(
+            (2, 3, 0, 1))
+
+    for i in range(tmp.shape[0]):
+        for j in range(i):
+            partials[i, j, ...] = -tmp[i, ...] * tmp[j, ...]
+            partials[j, i, ...] = partials[i, j, ...]
+    return partials
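
Note: the partials assembled above are the standard multinomial-logit derivatives, p_i (1 - p_i) on the diagonal and -p_i p_j off the diagonal. A quick finite-difference check of those formulas, written independently of the statsmodels helpers:

    import numpy as np

    def logistic_ref(x):
        # exp(x_i) / (1 + sum_j exp(x_j)), reference category fixed at zero
        e = np.exp(x)
        return e / (1 + e.sum())

    x = np.array([0.3, -0.7])
    p = logistic_ref(x)
    analytic = -np.outer(p, p) + np.diag(p)   # diag: p - p**2, off-diag: -p_i p_j
    numeric = np.empty((2, 2))
    eps = 1e-6
    for j in range(2):
        xp = x.copy()
        xp[j] += eps
        numeric[:, j] = (logistic_ref(xp) - p) / eps
    print(np.allclose(analytic, numeric, atol=1e-5))  # True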


 def cy_hamilton_filter_log(initial_probabilities, regime_transition,
-    conditional_loglikelihoods, model_order):
+                           conditional_loglikelihoods, model_order):
     """
     Hamilton filter in log space using Cython inner loop.

@@ -82,11 +150,89 @@ def cy_hamilton_filter_log(initial_probabilities, regime_transition,
         being in each combination of regimes conditional on time t
         information. Shaped (k_regimes,) * (order + 1) + (nobs,).
     """
-    pass
+
+    # Dimensions
+    k_regimes = len(initial_probabilities)
+    nobs = conditional_loglikelihoods.shape[-1]
+    order = conditional_loglikelihoods.ndim - 2
+    dtype = conditional_loglikelihoods.dtype
+
+    # Check for compatible shapes.
+    incompatible_shapes = (
+        regime_transition.shape[-1] not in (1, nobs + model_order)
+        or regime_transition.shape[:2] != (k_regimes, k_regimes)
+        or conditional_loglikelihoods.shape[0] != k_regimes)
+    if incompatible_shapes:
+        raise ValueError('Arguments do not have compatible shapes')
+
+    # Convert to log space
+    initial_probabilities = np.log(initial_probabilities)
+    regime_transition = np.log(np.maximum(regime_transition, 1e-20))
+
+    # Storage
+    # Pr[S_t = s_t | Y_t]
+    filtered_marginal_probabilities = (
+        np.zeros((k_regimes, nobs), dtype=dtype))
+    # Pr[S_t = s_t, ... S_{t-r} = s_{t-r} | Y_{t-1}]
+    # Has k_regimes^(order+1) elements
+    predicted_joint_probabilities = np.zeros(
+        (k_regimes,) * (order + 1) + (nobs,), dtype=dtype)
+    # log(f(y_t | Y_{t-1}))
+    joint_loglikelihoods = np.zeros((nobs,), dtype)
+    # Pr[S_t = s_t, ... S_{t-r+1} = s_{t-r+1} | Y_t]
+    # Has k_regimes^order elements
+    filtered_joint_probabilities = np.zeros(
+        (k_regimes,) * (order + 1) + (nobs + 1,), dtype=dtype)
+
+    # Initial probabilities
+    filtered_marginal_probabilities[:, 0] = initial_probabilities
+    tmp = np.copy(initial_probabilities)
+    shape = (k_regimes, k_regimes)
+    transition_t = 0
+    for i in range(order):
+        if regime_transition.shape[-1] > 1:
+            transition_t = i
+        tmp = np.reshape(regime_transition[..., transition_t],
+                         shape + (1,) * i) + tmp
+    filtered_joint_probabilities[..., 0] = tmp
+
+    # Get appropriate subset of transition matrix
+    if regime_transition.shape[-1] > 1:
+        regime_transition = regime_transition[..., model_order:]
+
+    # Run Cython filter iterations
+    prefix, dtype, _ = find_best_blas_type((
+        regime_transition, conditional_loglikelihoods, joint_loglikelihoods,
+        predicted_joint_probabilities, filtered_joint_probabilities))
+    func = prefix_hamilton_filter_log_map[prefix]
+    func(nobs, k_regimes, order, regime_transition,
+         conditional_loglikelihoods.reshape(k_regimes**(order+1), nobs),
+         joint_loglikelihoods,
+         predicted_joint_probabilities.reshape(k_regimes**(order+1), nobs),
+         filtered_joint_probabilities.reshape(k_regimes**(order+1), nobs+1))
+
+    # Save log versions for smoother
+    predicted_joint_probabilities_log = predicted_joint_probabilities
+    filtered_joint_probabilities_log = filtered_joint_probabilities
+
+    # Convert out of log scale
+    predicted_joint_probabilities = np.exp(predicted_joint_probabilities)
+    filtered_joint_probabilities = np.exp(filtered_joint_probabilities)
+
+    # S_t | t
+    filtered_marginal_probabilities = filtered_joint_probabilities[..., 1:]
+    for i in range(1, filtered_marginal_probabilities.ndim - 1):
+        filtered_marginal_probabilities = np.sum(
+            filtered_marginal_probabilities, axis=-2)
+
+    return (filtered_marginal_probabilities, predicted_joint_probabilities,
+            joint_loglikelihoods, filtered_joint_probabilities[..., 1:],
+            predicted_joint_probabilities_log,
+            filtered_joint_probabilities_log[..., 1:])
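
Note: for intuition, one iteration of the Hamilton filter for an order-0 model consists of a prediction step through the transition matrix, a joint-density computation, and a Bayes update. The Cython routine performs exactly this recursion, but jointly over lagged regimes and in log space; the sketch below shows a single step in levels space for clarity:

    import numpy as np

    P = np.array([[0.9, 0.2],        # P[i, j] = Pr(S_t = i | S_{t-1} = j)
                  [0.1, 0.8]])
    filtered_prev = np.array([0.6, 0.4])       # Pr(S_{t-1} | Y_{t-1})
    cond_lik = np.array([0.3, 0.05])           # f(y_t | S_t = i, Y_{t-1})

    predicted = P @ filtered_prev              # Pr(S_t | Y_{t-1})
    joint = cond_lik * predicted               # f(y_t, S_t | Y_{t-1})
    loglik_t = np.log(joint.sum())             # log f(y_t | Y_{t-1})
    filtered = joint / joint.sum()             # Pr(S_t | Y_t)
    print(predicted, filtered, loglik_t)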


 def cy_kim_smoother_log(regime_transition, predicted_joint_probabilities,
-    filtered_joint_probabilities):
+                        filtered_joint_probabilities):
     """
     Kim smoother in log space using Cython inner loop.

@@ -118,7 +264,45 @@ def cy_kim_smoother_log(regime_transition, predicted_joint_probabilities,
         Array containing Pr[S_t=s_t | Y_T] - the probability of being in each
         regime conditional on all information. Shaped (k_regimes, nobs).
     """
-    pass
+
+    # Dimensions
+    k_regimes = filtered_joint_probabilities.shape[0]
+    nobs = filtered_joint_probabilities.shape[-1]
+    order = filtered_joint_probabilities.ndim - 2
+    dtype = filtered_joint_probabilities.dtype
+
+    # Storage
+    smoothed_joint_probabilities = np.zeros(
+        (k_regimes,) * (order + 1) + (nobs,), dtype=dtype)
+
+    # Get appropriate subset of transition matrix
+    if regime_transition.shape[-1] == nobs + order:
+        regime_transition = regime_transition[..., order:]
+
+    # Convert to log space
+    regime_transition = np.log(np.maximum(regime_transition, 1e-20))
+
+    # Run Cython smoother iterations
+    prefix, dtype, _ = find_best_blas_type((
+        regime_transition, predicted_joint_probabilities,
+        filtered_joint_probabilities))
+    func = prefix_kim_smoother_log_map[prefix]
+    func(nobs, k_regimes, order, regime_transition,
+         predicted_joint_probabilities.reshape(k_regimes**(order+1), nobs),
+         filtered_joint_probabilities.reshape(k_regimes**(order+1), nobs),
+         smoothed_joint_probabilities.reshape(k_regimes**(order+1), nobs))
+
+    # Convert back from log space
+    smoothed_joint_probabilities = np.exp(smoothed_joint_probabilities)
+
+    # Get smoothed marginal probabilities S_t | T by integrating out
+    # S_{t-k+1}, S_{t-k+2}, ..., S_{t-1}
+    smoothed_marginal_probabilities = smoothed_joint_probabilities
+    for i in range(1, smoothed_marginal_probabilities.ndim - 1):
+        smoothed_marginal_probabilities = np.sum(
+            smoothed_marginal_probabilities, axis=-2)
+
+    return smoothed_joint_probabilities, smoothed_marginal_probabilities
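
Note: the Kim smoother runs the corresponding backward recursion: Pr[S_t | Y_T] is the filtered probability reweighted by how likely the smoothed next-period regimes are relative to their one-step-ahead predictions. A single backward step in levels space for an order-0 model (the production code works with joint probabilities in log space):

    import numpy as np

    P = np.array([[0.9, 0.2],                     # P[i, j] = Pr(S_{t+1} = i | S_t = j)
                  [0.1, 0.8]])
    filtered_t = np.array([0.7, 0.3])             # Pr(S_t | Y_t)
    predicted_next = P @ filtered_t               # Pr(S_{t+1} | Y_t)
    smoothed_next = np.array([0.55, 0.45])        # Pr(S_{t+1} | Y_T)

    # Kim (1994) backward recursion for Pr(S_t | Y_T)
    smoothed_t = filtered_t * (P.T @ (smoothed_next / predicted_next))
    print(smoothed_t, smoothed_t.sum())           # sums to one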


 class MarkovSwitchingParams:
@@ -200,22 +384,26 @@ class MarkovSwitchingParams:
     >>> parameters.k_parameters['exog']
     3
     """
-
     def __init__(self, k_regimes):
         self.k_regimes = k_regimes
+
         self.k_params = 0
         self.k_parameters = {}
         self.switching = {}
         self.slices_purpose = {}
-        self.relative_index_regime_purpose = [{} for i in range(self.k_regimes)
-            ]
-        self.index_regime_purpose = [{} for i in range(self.k_regimes)]
+        self.relative_index_regime_purpose = [
+            {} for i in range(self.k_regimes)]
+        self.index_regime_purpose = [
+            {} for i in range(self.k_regimes)]
         self.index_regime = [[] for i in range(self.k_regimes)]

     def __getitem__(self, key):
         _type = type(key)
+
+        # Get a slice for a block of parameters by purpose
         if _type is str:
             return self.slices_purpose[key]
+        # Get a slice for a block of parameters by regime
         elif _type is int:
             return self.index_regime[key]
         elif _type is tuple:
@@ -232,28 +420,34 @@ class MarkovSwitchingParams:

     def __setitem__(self, key, value):
         _type = type(key)
+
         if _type is str:
             value = np.array(value, dtype=bool, ndmin=1)
             k_params = self.k_params
-            self.k_parameters[key] = value.size + np.sum(value) * (self.
-                k_regimes - 1)
+            self.k_parameters[key] = (
+                value.size + np.sum(value) * (self.k_regimes - 1))
             self.k_params += self.k_parameters[key]
             self.switching[key] = value
             self.slices_purpose[key] = np.s_[k_params:self.k_params]
+
             for j in range(self.k_regimes):
                 self.relative_index_regime_purpose[j][key] = []
                 self.index_regime_purpose[j][key] = []
+
             offset = 0
             for i in range(value.size):
                 switching = value[i]
                 for j in range(self.k_regimes):
+                    # Non-switching parameters
                     if not switching:
                         self.relative_index_regime_purpose[j][key].append(
                             offset)
+                    # Switching parameters
                     else:
                         self.relative_index_regime_purpose[j][key].append(
                             offset + j)
                 offset += 1 if not switching else self.k_regimes
+
             for j in range(self.k_regimes):
                 offset = 0
                 indices = []
@@ -301,29 +495,45 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
     """

     def __init__(self, endog, k_regimes, order=0, exog_tvtp=None, exog=None,
-        dates=None, freq=None, missing='none'):
+                 dates=None, freq=None, missing='none'):
+
+        # Properties
         self.k_regimes = k_regimes
         self.tvtp = exog_tvtp is not None
+        # The order of the model may be overridden in subclasses
         self.order = order
+
+        # Exogenous data
+        # TODO add checks for exog_tvtp consistent shape and indices
         self.k_tvtp, self.exog_tvtp = prepare_exog(exog_tvtp)
+
+        # Initialize the base model
         super(MarkovSwitching, self).__init__(endog, exog, dates=dates,
-            freq=freq, missing=missing)
+                                              freq=freq, missing=missing)
+
+        # Dimensions
         self.nobs = self.endog.shape[0]
+
+        # Sanity checks
         if self.endog.ndim > 1 and self.endog.shape[1] > 1:
             raise ValueError('Must have univariate endogenous data.')
         if self.k_regimes < 2:
-            raise ValueError(
-                'Markov switching models must have at least two regimes.')
-        if not (self.exog_tvtp is None or self.exog_tvtp.shape[0] == self.nobs
-            ):
-            raise ValueError(
-                'Time-varying transition probabilities exogenous array must have the same number of observations as the endogenous array.'
-                )
+            raise ValueError('Markov switching models must have at least two'
+                             ' regimes.')
+        if not (self.exog_tvtp is None or
+                self.exog_tvtp.shape[0] == self.nobs):
+            raise ValueError('Time-varying transition probabilities exogenous'
+                             ' array must have the same number of observations'
+                             ' as the endogenous array.')
+
+        # Parameters
         self.parameters = MarkovSwitchingParams(self.k_regimes)
         k_transition = self.k_regimes - 1
         if self.tvtp:
             k_transition *= self.k_tvtp
         self.parameters['regime_transition'] = [1] * k_transition
+
+        # Internal model properties: default is steady-state initialization
         self._initialization = 'steady-state'
         self._initial_probabilities = None

@@ -332,7 +542,7 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         """
         (int) Number of parameters in the model
         """
-        pass
+        return self.parameters.k_params

     def initialize_steady_state(self):
         """
@@ -342,19 +552,81 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         -----
         Only valid if there are not time-varying transition probabilities.
         """
-        pass
+        if self.tvtp:
+            raise ValueError('Cannot use steady-state initialization when'
+                             ' the regime transition matrix is time-varying.')
+
+        self._initialization = 'steady-state'
+        self._initial_probabilities = None

-    def initialize_known(self, probabilities, tol=1e-08):
+    def initialize_known(self, probabilities, tol=1e-8):
         """
         Set initialization of regime probabilities to use known values
         """
-        pass
+        self._initialization = 'known'
+        probabilities = np.array(probabilities, ndmin=1)
+        if not probabilities.shape == (self.k_regimes,):
+            raise ValueError('Initial probabilities must be a vector of shape'
+                             ' (k_regimes,).')
+        if not np.abs(np.sum(probabilities) - 1) < tol:
+            raise ValueError('Initial probabilities vector must sum to one.')
+        self._initial_probabilities = probabilities

     def initial_probabilities(self, params, regime_transition=None):
         """
         Retrieve initial probabilities
         """
-        pass
+        params = np.array(params, ndmin=1)
+        if self._initialization == 'steady-state':
+            if regime_transition is None:
+                regime_transition = self.regime_transition_matrix(params)
+            if regime_transition.ndim == 3:
+                regime_transition = regime_transition[..., 0]
+            m = regime_transition.shape[0]
+            A = np.c_[(np.eye(m) - regime_transition).T, np.ones(m)].T
+            try:
+                probabilities = np.linalg.pinv(A)[:, -1]
+            except np.linalg.LinAlgError:
+                raise RuntimeError('Steady-state probabilities could not be'
+                                   ' constructed.')
+        elif self._initialization == 'known':
+            probabilities = self._initial_probabilities
+        else:
+            raise RuntimeError('Invalid initialization method selected.')
+
+        # Slightly bound probabilities away from zero (for filters in log
+        # space)
+        probabilities = np.maximum(probabilities, 1e-20)
+
+        return probabilities
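
Note: the steady-state initialization above solves for the ergodic distribution pi satisfying pi = P pi together with the adding-up constraint, by stacking (I - P) on a row of ones and taking a pseudo-inverse. The same construction in isolation:

    import numpy as np

    P = np.array([[0.9, 0.3],
                  [0.1, 0.7]])                 # columns sum to one
    m = P.shape[0]
    A = np.c_[(np.eye(m) - P).T, np.ones(m)].T
    pi = np.linalg.pinv(A)[:, -1]              # steady-state distribution
    print(pi, P @ pi)                          # pi is invariant: P @ pi == pi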
+
+    def _regime_transition_matrix_tvtp(self, params, exog_tvtp=None):
+        if exog_tvtp is None:
+            exog_tvtp = self.exog_tvtp
+        nobs = len(exog_tvtp)
+
+        regime_transition_matrix = np.zeros(
+            (self.k_regimes, self.k_regimes, nobs),
+            dtype=np.promote_types(np.float64, params.dtype))
+
+        # Compute the predicted values from the regression
+        for i in range(self.k_regimes):
+            coeffs = params[self.parameters[i, 'regime_transition']]
+            regime_transition_matrix[:-1, i, :] = np.dot(
+                exog_tvtp,
+                np.reshape(coeffs, (self.k_regimes-1, self.k_tvtp)).T).T
+
+        # Perform the logistic transformation
+        tmp = np.c_[np.zeros((nobs, self.k_regimes, 1)),
+                    regime_transition_matrix[:-1, :, :].T].T
+        regime_transition_matrix[:-1, :, :] = np.exp(
+            regime_transition_matrix[:-1, :, :] - logsumexp(tmp, axis=0))
+
+        # Compute the last column of the transition matrix
+        regime_transition_matrix[-1, :, :] = (
+            1 - np.sum(regime_transition_matrix[:-1, :, :], axis=0))
+
+        return regime_transition_matrix

     def regime_transition_matrix(self, params, exog_tvtp=None):
         """
@@ -374,10 +646,24 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         it is certain that from one regime (j) you will transition to *some
         other regime*).
         """
-        pass
+        params = np.array(params, ndmin=1)
+        if not self.tvtp:
+            regime_transition_matrix = np.zeros(
+                (self.k_regimes, self.k_regimes, 1),
+                dtype=np.promote_types(np.float64, params.dtype))
+            regime_transition_matrix[:-1, :, 0] = np.reshape(
+                params[self.parameters['regime_transition']],
+                (self.k_regimes-1, self.k_regimes))
+            regime_transition_matrix[-1, :, 0] = (
+                1 - np.sum(regime_transition_matrix[:-1, :, 0], axis=0))
+        else:
+            regime_transition_matrix = (
+                self._regime_transition_matrix_tvtp(params, exog_tvtp))
+
+        return regime_transition_matrix
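
Note: to see the column-stochastic layout in practice, the snippet below fills in the transition parameters of a two-regime, non-TVTP model directly; the last row is implied so that each column sums to one. (The parameter ordering shown is for the default two-regime case.)

    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(0).standard_normal(100)
    mod = sm.tsa.MarkovRegression(y, k_regimes=2)

    params = mod.start_params.copy()
    params[mod.parameters['regime_transition']] = [0.9, 0.3]   # p[0->0], p[1->0]
    print(mod.regime_transition_matrix(params)[..., 0])
    # [[0.9 0.3]
    #  [0.1 0.7]]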

     def predict(self, params, start=None, end=None, probabilities=None,
-        conditional=False):
+                conditional=False):
         """
         In-sample prediction and out-of-sample forecasting

@@ -414,7 +700,41 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
             Array of in-sample predictions and / or out-of-sample forecasts.
         """
-        pass
+        if start is None:
+            start = self._index[0]
+
+        # Handle start, end
+        start, end, out_of_sample, prediction_index = (
+            self._get_prediction_index(start, end))
+
+        if out_of_sample > 0:
+            raise NotImplementedError
+
+        # Perform in-sample prediction
+        predict = self.predict_conditional(params)
+        squeezed = np.squeeze(predict)
+
+        # Check if we need to do weighted averaging
+        if squeezed.ndim - 1 > conditional:
+            # Determine in-sample weighting probabilities
+            if probabilities is None or probabilities == 'smoothed':
+                results = self.smooth(params, return_raw=True)
+                probabilities = results.smoothed_joint_probabilities
+            elif probabilities == 'filtered':
+                results = self.filter(params, return_raw=True)
+                probabilities = results.filtered_joint_probabilities
+            elif probabilities == 'predicted':
+                results = self.filter(params, return_raw=True)
+                probabilities = results.predicted_joint_probabilities
+
+            # Compute weighted average
+            predict = (predict * probabilities)
+            for i in range(predict.ndim - 1 - int(conditional)):
+                predict = np.sum(predict, axis=-2)
+        else:
+            predict = squeezed
+
+        return predict[start:end + out_of_sample + 1]

     def predict_conditional(self, params):
         """
@@ -432,7 +752,7 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
             Array of predictions conditional on current, and possibly past,
             regimes
         """
-        pass
+        raise NotImplementedError

     def _conditional_loglikelihoods(self, params):
         """
@@ -441,10 +761,29 @@ class MarkovSwitching(tsbase.TimeSeriesModel):

         Must be implemented in subclasses.
         """
-        pass
+        raise NotImplementedError
+
+    def _filter(self, params, regime_transition=None):
+        # Get the regime transition matrix if not provided
+        if regime_transition is None:
+            regime_transition = self.regime_transition_matrix(params)
+        # Get the initial probabilities
+        initial_probabilities = self.initial_probabilities(
+            params, regime_transition)
+
+        # Compute the conditional likelihoods
+        conditional_loglikelihoods = self._conditional_loglikelihoods(params)
+
+        # Apply the filter
+        return ((regime_transition, initial_probabilities,
+                 conditional_loglikelihoods) +
+                cy_hamilton_filter_log(
+                    initial_probabilities, regime_transition,
+                    conditional_loglikelihoods, self.order))

     def filter(self, params, transformed=True, cov_type=None, cov_kwds=None,
-        return_raw=False, results_class=None, results_wrapper_class=None):
+               return_raw=False, results_class=None,
+               results_wrapper_class=None):
         """
         Apply the Hamilton filter

@@ -476,10 +815,67 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         -------
         MarkovSwitchingResults
         """
-        pass
+        params = np.array(params, ndmin=1)
+
+        if not transformed:
+            params = self.transform_params(params)
+
+        # Save the parameter names
+        self.data.param_names = self.param_names
+
+        # Get the result
+        names = ['regime_transition', 'initial_probabilities',
+                 'conditional_loglikelihoods',
+                 'filtered_marginal_probabilities',
+                 'predicted_joint_probabilities', 'joint_loglikelihoods',
+                 'filtered_joint_probabilities',
+                 'predicted_joint_probabilities_log',
+                 'filtered_joint_probabilities_log']
+        result = HamiltonFilterResults(
+            self, Bunch(**dict(zip(names, self._filter(params)))))
+
+        # Wrap in a results object
+        return self._wrap_results(params, result, return_raw, cov_type,
+                                  cov_kwds, results_class,
+                                  results_wrapper_class)
+
+    def _smooth(self, params, predicted_joint_probabilities_log,
+                filtered_joint_probabilities_log, regime_transition=None):
+        # Get the regime transition matrix
+        if regime_transition is None:
+            regime_transition = self.regime_transition_matrix(params)
+
+        # Apply the smoother
+        return cy_kim_smoother_log(regime_transition,
+                                   predicted_joint_probabilities_log,
+                                   filtered_joint_probabilities_log)
+
+    @property
+    def _res_classes(self):
+        return {'fit': (MarkovSwitchingResults, MarkovSwitchingResultsWrapper)}
+
+    def _wrap_results(self, params, result, return_raw, cov_type=None,
+                      cov_kwds=None, results_class=None, wrapper_class=None):
+        if not return_raw:
+            # Wrap in a results object
+            result_kwargs = {}
+            if cov_type is not None:
+                result_kwargs['cov_type'] = cov_type
+            if cov_kwds is not None:
+                result_kwargs['cov_kwds'] = cov_kwds
+
+            if results_class is None:
+                results_class = self._res_classes['fit'][0]
+            if wrapper_class is None:
+                wrapper_class = self._res_classes['fit'][1]
+
+            res = results_class(self, params, result, **result_kwargs)
+            result = wrapper_class(res)
+        return result

     def smooth(self, params, transformed=True, cov_type=None, cov_kwds=None,
-        return_raw=False, results_class=None, results_wrapper_class=None):
+               return_raw=False, results_class=None,
+               results_wrapper_class=None):
         """
         Apply the Kim smoother and Hamilton filter

@@ -511,7 +907,37 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         -------
         MarkovSwitchingResults
         """
-        pass
+        params = np.array(params, ndmin=1)
+
+        if not transformed:
+            params = self.transform_params(params)
+
+        # Save the parameter names
+        self.data.param_names = self.param_names
+
+        # Hamilton filter
+        # TODO add option to filter to return logged values so that we do not
+        # need to re-log them for smoother
+        names = ['regime_transition', 'initial_probabilities',
+                 'conditional_loglikelihoods',
+                 'filtered_marginal_probabilities',
+                 'predicted_joint_probabilities', 'joint_loglikelihoods',
+                 'filtered_joint_probabilities',
+                 'predicted_joint_probabilities_log',
+                 'filtered_joint_probabilities_log']
+        result = Bunch(**dict(zip(names, self._filter(params))))
+
+        # Kim smoother
+        out = self._smooth(params, result.predicted_joint_probabilities_log,
+                           result.filtered_joint_probabilities_log)
+        result['smoothed_joint_probabilities'] = out[0]
+        result['smoothed_marginal_probabilities'] = out[1]
+        result = KimSmootherResults(self, result)
+
+        # Wrap in a results object
+        return self._wrap_results(params, result, return_raw, cov_type,
+                                  cov_kwds, results_class,
+                                  results_wrapper_class)

     def loglikeobs(self, params, transformed=True):
         """
@@ -525,7 +951,14 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         transformed : bool, optional
             Whether or not `params` is already transformed. Default is True.
         """
-        pass
+        params = np.array(params, ndmin=1)
+
+        if not transformed:
+            params = self.transform_params(params)
+
+        results = self._filter(params)
+
+        return results[5]

     def loglike(self, params, transformed=True):
         """
@@ -539,7 +972,7 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         transformed : bool, optional
             Whether or not `params` is already transformed. Default is True.
         """
-        pass
+        return np.sum(self.loglikeobs(params, transformed))

     def score(self, params, transformed=True):
         """
@@ -553,7 +986,9 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         transformed : bool, optional
             Whether or not `params` is already transformed. Default is True.
         """
-        pass
+        params = np.array(params, ndmin=1)
+
+        return approx_fprime_cs(params, self.loglike, args=(transformed,))

     def score_obs(self, params, transformed=True):
         """
@@ -567,7 +1002,9 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         transformed : bool, optional
             Whether or not `params` is already transformed. Default is True.
         """
-        pass
+        params = np.array(params, ndmin=1)
+
+        return approx_fprime_cs(params, self.loglikeobs, args=(transformed,))

     def hessian(self, params, transformed=True):
         """
@@ -582,12 +1019,14 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         transformed : bool, optional
             Whether or not `params` is already transformed. Default is True.
         """
-        pass
+        params = np.array(params, ndmin=1)
+
+        return approx_hess_cs(params, self.loglike)

     def fit(self, start_params=None, transformed=True, cov_type='approx',
-        cov_kwds=None, method='bfgs', maxiter=100, full_output=1, disp=0,
-        callback=None, return_params=False, em_iter=5, search_reps=0,
-        search_iter=5, search_scale=1.0, **kwargs):
+            cov_kwds=None, method='bfgs', maxiter=100, full_output=1, disp=0,
+            callback=None, return_params=False, em_iter=5, search_reps=0,
+            search_iter=5, search_scale=1., **kwargs):
         """
         Fits the model by maximum likelihood via Hamilton filter.

@@ -655,11 +1094,57 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         -------
         MarkovSwitchingResults
         """
-        pass
+
+        if start_params is None:
+            start_params = self.start_params
+            transformed = True
+        else:
+            start_params = np.array(start_params, ndmin=1)
+
+        # Random search for better start parameters
+        if search_reps > 0:
+            start_params = self._start_params_search(
+                search_reps, start_params=start_params,
+                transformed=transformed, em_iter=search_iter,
+                scale=search_scale)
+            transformed = True
+
+        # Get better start params through EM algorithm
+        if em_iter and not self.tvtp:
+            start_params = self._fit_em(start_params, transformed=transformed,
+                                        maxiter=em_iter, tolerance=0,
+                                        return_params=True)
+            transformed = True
+
+        if transformed:
+            start_params = self.untransform_params(start_params)
+
+        # Maximum likelihood estimation by scoring
+        fargs = (False,)
+        mlefit = super(MarkovSwitching, self).fit(start_params, method=method,
+                                                  fargs=fargs,
+                                                  maxiter=maxiter,
+                                                  full_output=full_output,
+                                                  disp=disp, callback=callback,
+                                                  skip_hessian=True, **kwargs)
+
+        # Just return the fitted parameters if requested
+        if return_params:
+            result = self.transform_params(mlefit.params)
+        # Otherwise construct the results class if desired
+        else:
+            result = self.smooth(mlefit.params, transformed=False,
+                                 cov_type=cov_type, cov_kwds=cov_kwds)
+
+            result.mlefit = mlefit
+            result.mle_retvals = mlefit.mle_retvals
+            result.mle_settings = mlefit.mle_settings
+
+        return result
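
Note: a minimal end-to-end usage sketch of the fitting routine above on simulated data; em_iter and search_reps control the EM warm-up and the random start-parameter search described in the docstring (the values here are illustrative only):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(12345)
    y = np.r_[rng.normal(0.0, 1.0, 150), rng.normal(3.0, 1.5, 150)]

    mod = sm.tsa.MarkovRegression(y, k_regimes=2, switching_variance=True)
    res = mod.fit(em_iter=10, search_reps=20)
    print(res.params)
    print(res.expected_durations)
    print(res.smoothed_marginal_probabilities[:5])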

     def _fit_em(self, start_params=None, transformed=True, cov_type='none',
-        cov_kwds=None, maxiter=50, tolerance=1e-06, full_output=True,
-        return_params=False, **kwargs):
+                cov_kwds=None, maxiter=50, tolerance=1e-6, full_output=True,
+                return_params=False, **kwargs):
         """
         Fits the model using the Expectation-Maximization (EM) algorithm

@@ -702,7 +1187,52 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         -------
         MarkovSwitchingResults
         """
-        pass
+
+        if start_params is None:
+            start_params = self.start_params
+            transformed = True
+        else:
+            start_params = np.array(start_params, ndmin=1)
+
+        if not transformed:
+            start_params = self.transform_params(start_params)
+
+        # Perform expectation-maximization
+        llf = []
+        params = [start_params]
+        i = 0
+        delta = 0
+        while i < maxiter and (i < 2 or (delta > tolerance)):
+            out = self._em_iteration(params[-1])
+            llf.append(out[0].llf)
+            params.append(out[1])
+            if i > 0:
+                delta = 2 * (llf[-1] - llf[-2]) / np.abs((llf[-1] + llf[-2]))
+            i += 1
+
+        # Just return the fitted parameters if requested
+        if return_params:
+            result = params[-1]
+        # Otherwise construct the results class if desired
+        else:
+            result = self.filter(params[-1], transformed=True,
+                                 cov_type=cov_type, cov_kwds=cov_kwds)
+
+            # Save the output
+            if full_output:
+                em_retvals = Bunch(**{'params': np.array(params),
+                                      'llf': np.array(llf),
+                                      'iter': i})
+                em_settings = Bunch(**{'tolerance': tolerance,
+                                       'maxiter': maxiter})
+            else:
+                em_retvals = None
+                em_settings = None
+
+            result.mle_retvals = em_retvals
+            result.mle_settings = em_settings
+
+        return result

     def _em_iteration(self, params0):
         """
@@ -713,16 +1243,61 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         The EM iteration in this base class only performs the EM step for
         non-TVTP transition probabilities.
         """
-        pass
+        params1 = np.zeros(params0.shape,
+                           dtype=np.promote_types(np.float64, params0.dtype))
+
+        # Smooth at the given parameters
+        result = self.smooth(params0, transformed=True, return_raw=True)
+
+        # The EM with TVTP is not yet supported, just return the previous
+        # iteration parameters
+        if self.tvtp:
+            params1[self.parameters['regime_transition']] = (
+                params0[self.parameters['regime_transition']])
+        else:
+            regime_transition = self._em_regime_transition(result)
+            for i in range(self.k_regimes):
+                params1[self.parameters[i, 'regime_transition']] = (
+                    regime_transition[i])
+
+        return result, params1

     def _em_regime_transition(self, result):
         """
         EM step for regime transition probabilities
         """
-        pass

-    def _start_params_search(self, reps, start_params=None, transformed=
-        True, em_iter=5, scale=1.0):
+        # Marginalize the smoothed joint probabilities to just S_t, S_{t-1} | T
+        tmp = result.smoothed_joint_probabilities
+        for i in range(tmp.ndim - 3):
+            tmp = np.sum(tmp, -2)
+        smoothed_joint_probabilities = tmp
+
+        # Transition parameters (recall we're not yet supporting TVTP here)
+        k_transition = len(self.parameters[0, 'regime_transition'])
+        regime_transition = np.zeros((self.k_regimes, k_transition))
+        for i in range(self.k_regimes):  # S_{t-1}
+            for j in range(self.k_regimes - 1):  # S_t
+                regime_transition[i, j] = (
+                    np.sum(smoothed_joint_probabilities[j, i]) /
+                    np.sum(result.smoothed_marginal_probabilities[i]))
+
+            # It may be the case that due to rounding error this estimates
+            # transition probabilities that sum to greater than one. If so,
+            # re-scale the probabilities and warn the user that something
+            # is not quite right
+            delta = np.sum(regime_transition[i]) - 1
+            if delta > 0:
+                warnings.warn('Invalid regime transition probabilities'
+                              ' estimated in EM iteration; probabilities have'
+                              ' been re-scaled to continue estimation.',
+                              EstimationWarning)
+                regime_transition[i] /= 1 + delta + 1e-6
+
+        return regime_transition
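
Note: the update above is the standard EM (Baum-Welch style) step for the transition probabilities: the expected number of transitions out of regime i, divided by the expected occupancy of regime i. A toy NumPy version of the same computation (the real code additionally handles the regime ordering of the joint probabilities and parameterizes only the first k_regimes - 1 rows):

    import numpy as np

    # toy smoothed joint probabilities Pr[S_t = j, S_{t-1} = i | Y_T], shape (j, i, t)
    rng = np.random.default_rng(0)
    joint = rng.random((2, 2, 100))
    joint /= joint.sum(axis=(0, 1))            # normalize at each time point
    marginal_prev = joint.sum(axis=0)          # Pr[S_{t-1} = i | Y_T]

    # EM update: expected transition counts over expected occupancy of regime i
    p_hat = joint.sum(axis=-1) / marginal_prev.sum(axis=-1)
    print(p_hat, p_hat.sum(axis=0))            # columns sum to one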
+
+    def _start_params_search(self, reps, start_params=None, transformed=True,
+                             em_iter=5, scale=1.):
         """
         Search for starting parameters as random permutations of a vector

@@ -748,14 +1323,65 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         This is a private method for finding good starting parameters for MLE
         by scoring, where the defaults have been set heuristically.
         """
-        pass
+        if start_params is None:
+            start_params = self.start_params
+            transformed = True
+        else:
+            start_params = np.array(start_params, ndmin=1)
+
+        # Random search is over untransformed space
+        if transformed:
+            start_params = self.untransform_params(start_params)
+
+        # Construct the standard deviations
+        scale = np.array(scale, ndmin=1)
+        if scale.size == 1:
+            scale = np.ones(self.k_params) * scale
+        if not scale.size == self.k_params:
+            raise ValueError('Scale of variates for random start'
+                             ' parameter search must be given for each'
+                             ' parameter or as a single scalar.')
+
+        # Construct the random variates
+        variates = np.zeros((reps, self.k_params))
+        for i in range(self.k_params):
+            variates[:, i] = scale[i] * np.random.uniform(-0.5, 0.5, size=reps)
+
+        llf = self.loglike(start_params, transformed=False)
+        params = start_params
+        for i in range(reps):
+            with warnings.catch_warnings():
+                warnings.simplefilter("ignore")
+
+                try:
+                    proposed_params = self._fit_em(
+                        start_params + variates[i], transformed=False,
+                        maxiter=em_iter, return_params=True)
+                    proposed_llf = self.loglike(proposed_params)
+
+                    if proposed_llf > llf:
+                        llf = proposed_llf
+                        params = self.untransform_params(proposed_params)
+                except Exception:  # FIXME: catch something specific
+                    pass
+
+        # Return transformed parameters
+        return self.transform_params(params)

     @property
     def start_params(self):
         """
         (array) Starting parameters for maximum likelihood estimation.
         """
-        pass
+        params = np.zeros(self.k_params, dtype=np.float64)
+
+        # Transition probabilities
+        if self.tvtp:
+            params[self.parameters['regime_transition']] = 0.
+        else:
+            params[self.parameters['regime_transition']] = 1. / self.k_regimes
+
+        return params

     @property
     def param_names(self):
@@ -763,7 +1389,24 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         (list of str) List of human readable parameter names (for parameters
         actually included in the model).
         """
-        pass
+        param_names = np.zeros(self.k_params, dtype=object)
+
+        # Transition probabilities
+        if self.tvtp:
+            # TODO add support for exog_tvtp_names
+            param_names[self.parameters['regime_transition']] = [
+                'p[%d->%d].tvtp%d' % (j, i, k)
+                for i in range(self.k_regimes-1)
+                for k in range(self.k_tvtp)
+                for j in range(self.k_regimes)
+                ]
+        else:
+            param_names[self.parameters['regime_transition']] = [
+                'p[%d->%d]' % (j, i)
+                for i in range(self.k_regimes-1)
+                for j in range(self.k_regimes)]
+
+        return param_names.tolist()

     def transform_params(self, unconstrained):
         """
@@ -787,14 +1430,40 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         In the base class, this only transforms the transition-probability-
         related parameters.
         """
-        pass
+        constrained = np.array(unconstrained, copy=True)
+        constrained = constrained.astype(
+            np.promote_types(np.float64, constrained.dtype))
+
+        # Nothing to do for transition probabilities if TVTP
+        if self.tvtp:
+            constrained[self.parameters['regime_transition']] = (
+                unconstrained[self.parameters['regime_transition']])
+        # Otherwise do logistic transformation
+        else:
+            # Transition probabilities
+            for i in range(self.k_regimes):
+                tmp1 = unconstrained[self.parameters[i, 'regime_transition']]
+                tmp2 = np.r_[0, tmp1]
+                constrained[self.parameters[i, 'regime_transition']] = np.exp(
+                    tmp1 - logsumexp(tmp2))
+
+        # Do not do anything for the rest of the parameters
+
+        return constrained

     def _untransform_logistic(self, unconstrained, constrained):
         """
         Function to allow using a numerical root-finder to reverse the
         logistic transform.
         """
-        pass
+        resid = np.zeros(unconstrained.shape, dtype=unconstrained.dtype)
+        exp = np.exp(unconstrained)
+        sum_exp = np.sum(exp)
+        for i in range(len(unconstrained)):
+            resid[i] = (unconstrained[i] -
+                        np.log(1 + sum_exp - exp[i]) +
+                        np.log(1 / constrained[i] - 1))
+        return resid

     def untransform_params(self, constrained):
         """
@@ -817,7 +1486,33 @@ class MarkovSwitching(tsbase.TimeSeriesModel):
         In the base class, this only untransforms the transition-probability-
         related parameters.
         """
-        pass
+        unconstrained = np.array(constrained, copy=True)
+        unconstrained = unconstrained.astype(
+            np.promote_types(np.float64, unconstrained.dtype))
+
+        # Nothing to do for transition probabilities if TVTP
+        if self.tvtp:
+            unconstrained[self.parameters['regime_transition']] = (
+                constrained[self.parameters['regime_transition']])
+        # Otherwise reverse logistic transformation
+        else:
+            for i in range(self.k_regimes):
+                s = self.parameters[i, 'regime_transition']
+                if self.k_regimes == 2:
+                    unconstrained[s] = -np.log(1. / constrained[s] - 1)
+                else:
+                    from scipy.optimize import root
+                    out = root(self._untransform_logistic,
+                               np.zeros(unconstrained[s].shape,
+                                        unconstrained.dtype),
+                               args=(constrained[s],))
+                    if not out['success']:
+                        raise ValueError('Could not untransform parameters.')
+                    unconstrained[s] = out['x']
+
+        # Do not do anything for the rest of the parameters
+
+        return unconstrained
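
Note: for more than two regimes the inverse logistic map has no closed form here, so untransform_params falls back to a numerical root-finder; the transform/untransform pair should still round-trip. A quick check with three regimes (default trend='c'):

    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(1).standard_normal(120)
    mod = sm.tsa.MarkovRegression(y, k_regimes=3)

    constrained = mod.start_params
    unconstrained = mod.untransform_params(constrained)    # scipy root() for k > 2
    print(np.allclose(mod.transform_params(unconstrained), constrained))  # True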


 class HamiltonFilterResults:
@@ -856,31 +1551,66 @@ class HamiltonFilterResults:
     llf_obs : ndarray
         The loglikelihood values at each time period.
     """
-
     def __init__(self, model, result):
+
         self.model = model
+
         self.nobs = model.nobs
         self.order = model.order
         self.k_regimes = model.k_regimes
+
         attributes = ['regime_transition', 'initial_probabilities',
-            'conditional_loglikelihoods', 'predicted_joint_probabilities',
-            'filtered_marginal_probabilities',
-            'filtered_joint_probabilities', 'joint_loglikelihoods']
+                      'conditional_loglikelihoods',
+                      'predicted_joint_probabilities',
+                      'filtered_marginal_probabilities',
+                      'filtered_joint_probabilities',
+                      'joint_loglikelihoods']
         for name in attributes:
             setattr(self, name, getattr(result, name))
+
         self.initialization = model._initialization
         self.llf_obs = self.joint_loglikelihoods
         self.llf = np.sum(self.llf_obs)
+
+        # Subset transition if necessary (e.g. for Markov autoregression)
         if self.regime_transition.shape[-1] > 1 and self.order > 0:
             self.regime_transition = self.regime_transition[..., self.order:]
+
+        # Cache for predicted marginal probabilities
         self._predicted_marginal_probabilities = None

+    @property
+    def predicted_marginal_probabilities(self):
+        if self._predicted_marginal_probabilities is None:
+            self._predicted_marginal_probabilities = (
+                self.predicted_joint_probabilities)
+            for i in range(self._predicted_marginal_probabilities.ndim - 2):
+                self._predicted_marginal_probabilities = np.sum(
+                    self._predicted_marginal_probabilities, axis=-2)
+        return self._predicted_marginal_probabilities
+
     @property
     def expected_durations(self):
         """
         (array) Expected duration of a regime, possibly time-varying.
         """
-        pass
+        # It is possible that we will have a degenerate system, so that there
+        # is no possibility of transitioning to a different state. In that
+        # case, we do want the expected duration of one state to be np.inf,
+        # and the expected duration of the other states to be np.nan
+        diag = np.diagonal(self.regime_transition)
+        expected_durations = np.zeros_like(diag)
+        degenerate = np.any(diag == 1, axis=1)
+
+        # For non-degenerate states, use the usual computation
+        expected_durations[~degenerate] = 1 / (1 - diag[~degenerate])
+
+        # For degenerate states, everything is np.nan, except for the one
+        # state that is np.inf.
+        expected_durations[degenerate] = np.nan
+        expected_durations[diag == 1] = np.inf
+
+        return expected_durations.squeeze()
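
Note: the expected duration of a regime with stay-probability p on the diagonal of the transition matrix is the mean of a geometric distribution, 1 / (1 - p); the property above applies this elementwise while guarding against the degenerate p = 1 case. For example:

    import numpy as np

    p_stay = np.array([0.95, 0.80])    # diagonal elements p[i -> i]
    print(1 / (1 - p_stay))            # expected regime durations: [20.  5.]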


 class KimSmootherResults(HamiltonFilterResults):
@@ -904,17 +1634,18 @@ class KimSmootherResults(HamiltonFilterResults):
     k_states : int
         The dimension of the unobserved state process.
     """
-
     def __init__(self, model, result):
         super(KimSmootherResults, self).__init__(model, result)
+
         attributes = ['smoothed_joint_probabilities',
-            'smoothed_marginal_probabilities']
+                      'smoothed_marginal_probabilities']
+
         for name in attributes:
             setattr(self, name, getattr(result, name))


 class MarkovSwitchingResults(tsbase.TimeSeriesModelResults):
-    """
+    r"""
     Class to hold results from fitting a Markov switching model

     Parameters
@@ -944,88 +1675,160 @@ class MarkovSwitchingResults(tsbase.TimeSeriesModelResults):
     """
     use_t = False

-    def __init__(self, model, params, results, cov_type='opg', cov_kwds=
-        None, **kwargs):
+    def __init__(self, model, params, results, cov_type='opg', cov_kwds=None,
+                 **kwargs):
         self.data = model.data
+
         tsbase.TimeSeriesModelResults.__init__(self, model, params,
-            normalized_cov_params=None, scale=1.0)
+                                               normalized_cov_params=None,
+                                               scale=1.)
+
+        # Save the filter / smoother output
         self.filter_results = results
         if isinstance(results, KimSmootherResults):
             self.smoother_results = results
         else:
             self.smoother_results = None
+
+        # Dimensions
         self.nobs = model.nobs
         self.order = model.order
         self.k_regimes = model.k_regimes
+
+        # Setup covariance matrix notes dictionary
         if not hasattr(self, 'cov_kwds'):
             self.cov_kwds = {}
         self.cov_type = cov_type
+
+        # Setup the cache
         self._cache = {}
+
+        # Handle covariance matrix calculation
         if cov_kwds is None:
             cov_kwds = {}
-        self._cov_approx_complex_step = cov_kwds.pop('approx_complex_step',
-            True)
+        self._cov_approx_complex_step = (
+            cov_kwds.pop('approx_complex_step', True))
         self._cov_approx_centered = cov_kwds.pop('approx_centered', False)
         try:
             self._rank = None
             self._get_robustcov_results(cov_type=cov_type, use_self=True,
-                **cov_kwds)
+                                        **cov_kwds)
         except np.linalg.LinAlgError:
             self._rank = 0
             k_params = len(self.params)
             self.cov_params_default = np.zeros((k_params, k_params)) * np.nan
             self.cov_kwds['cov_type'] = (
-                'Covariance matrix could not be calculated: singular. information matrix.'
-                )
+                'Covariance matrix could not be calculated: singular.'
+                ' information matrix.')
+
+        # Copy over arrays
         attributes = ['regime_transition', 'initial_probabilities',
-            'conditional_loglikelihoods',
-            'predicted_marginal_probabilities',
-            'predicted_joint_probabilities',
-            'filtered_marginal_probabilities',
-            'filtered_joint_probabilities', 'joint_loglikelihoods',
-            'expected_durations']
+                      'conditional_loglikelihoods',
+                      'predicted_marginal_probabilities',
+                      'predicted_joint_probabilities',
+                      'filtered_marginal_probabilities',
+                      'filtered_joint_probabilities',
+                      'joint_loglikelihoods', 'expected_durations']
         for name in attributes:
             setattr(self, name, getattr(self.filter_results, name))
+
         attributes = ['smoothed_joint_probabilities',
-            'smoothed_marginal_probabilities']
+                      'smoothed_marginal_probabilities']
         for name in attributes:
             if self.smoother_results is not None:
                 setattr(self, name, getattr(self.smoother_results, name))
             else:
                 setattr(self, name, None)
-        self.predicted_marginal_probabilities = (self.
-            predicted_marginal_probabilities.T)
-        self.filtered_marginal_probabilities = (self.
-            filtered_marginal_probabilities.T)
+
+        # Reshape some arrays to long-format
+        self.predicted_marginal_probabilities = (
+            self.predicted_marginal_probabilities.T)
+        self.filtered_marginal_probabilities = (
+            self.filtered_marginal_probabilities.T)
         if self.smoother_results is not None:
-            self.smoothed_marginal_probabilities = (self.
-                smoothed_marginal_probabilities.T)
+            self.smoothed_marginal_probabilities = (
+                self.smoothed_marginal_probabilities.T)
+
+        # Make into Pandas arrays if using Pandas data
         if isinstance(self.data, PandasData):
             index = self.data.row_labels
             if self.expected_durations.ndim > 1:
-                self.expected_durations = pd.DataFrame(self.
-                    expected_durations, index=index)
-            self.predicted_marginal_probabilities = pd.DataFrame(self.
-                predicted_marginal_probabilities, index=index)
-            self.filtered_marginal_probabilities = pd.DataFrame(self.
-                filtered_marginal_probabilities, index=index)
+                self.expected_durations = pd.DataFrame(
+                    self.expected_durations, index=index)
+            self.predicted_marginal_probabilities = pd.DataFrame(
+                self.predicted_marginal_probabilities, index=index)
+            self.filtered_marginal_probabilities = pd.DataFrame(
+                self.filtered_marginal_probabilities, index=index)
             if self.smoother_results is not None:
-                self.smoothed_marginal_probabilities = pd.DataFrame(self.
-                    smoothed_marginal_probabilities, index=index)
+                self.smoothed_marginal_probabilities = pd.DataFrame(
+                    self.smoothed_marginal_probabilities, index=index)
+
+    def _get_robustcov_results(self, cov_type='opg', **kwargs):
+        from statsmodels.base.covtype import descriptions
+
+        use_self = kwargs.pop('use_self', False)
+        if use_self:
+            res = self
+        else:
+            raise NotImplementedError
+            res = self.__class__(
+                self.model, self.params,
+                normalized_cov_params=self.normalized_cov_params,
+                scale=self.scale)
+
+        # Set the new covariance type
+        res.cov_type = cov_type
+        res.cov_kwds = {}
+
+        approx_type_str = 'complex-step'
+
+        # Calculate the new covariance matrix
+        k_params = len(self.params)
+        if k_params == 0:
+            res.cov_params_default = np.zeros((0, 0))
+            res._rank = 0
+            res.cov_kwds['description'] = 'No parameters estimated.'
+        elif cov_type == 'custom':
+            res.cov_type = kwargs['custom_cov_type']
+            res.cov_params_default = kwargs['custom_cov_params']
+            res.cov_kwds['description'] = kwargs['custom_description']
+            res._rank = np.linalg.matrix_rank(res.cov_params_default)
+        elif cov_type == 'none':
+            res.cov_params_default = np.zeros((k_params, k_params)) * np.nan
+            res._rank = np.nan
+            res.cov_kwds['description'] = descriptions['none']
+        elif self.cov_type == 'approx':
+            res.cov_params_default = res.cov_params_approx
+            res.cov_kwds['description'] = descriptions['approx'].format(
+                                                approx_type=approx_type_str)
+        elif self.cov_type == 'opg':
+            res.cov_params_default = res.cov_params_opg
+            res.cov_kwds['description'] = descriptions['OPG'].format(
+                                                approx_type=approx_type_str)
+        elif self.cov_type == 'robust':
+            res.cov_params_default = res.cov_params_robust
+            res.cov_kwds['description'] = descriptions['robust'].format(
+                                                approx_type=approx_type_str)
+        else:
+            raise NotImplementedError('Invalid covariance matrix type.')
+
+        return res

     @cache_readonly
     def aic(self):
         """
         (float) Akaike Information Criterion
         """
-        pass
+        # return -2*self.llf + 2*self.params.shape[0]
+        return aic(self.llf, self.nobs, self.params.shape[0])

     @cache_readonly
     def bic(self):
         """
         (float) Bayes Information Criterion
         """
-        pass
+        # return -2*self.llf + self.params.shape[0]*np.log(self.nobs)
+        return bic(self.llf, self.nobs, self.params.shape[0])

     @cache_readonly
     def cov_params_approx(self):
@@ -1033,7 +1836,13 @@ class MarkovSwitchingResults(tsbase.TimeSeriesModelResults):
         (array) The variance / covariance matrix. Computed using the numerical
         Hessian approximated by complex step or finite differences methods.
         """
-        pass
+        evaluated_hessian = self.model.hessian(self.params, transformed=True)
+        neg_cov, singular_values = pinv_extended(evaluated_hessian)
+
+        if self._rank is None:
+            self._rank = np.linalg.matrix_rank(np.diag(singular_values))
+
+        return -neg_cov

     @cache_readonly
     def cov_params_opg(self):
@@ -1041,7 +1850,14 @@ class MarkovSwitchingResults(tsbase.TimeSeriesModelResults):
         (array) The variance / covariance matrix. Computed using the outer
         product of gradients method.
         """
-        pass
+        score_obs = self.model.score_obs(self.params, transformed=True).T
+        cov_params, singular_values = pinv_extended(
+            np.inner(score_obs, score_obs))
+
+        if self._rank is None:
+            self._rank = np.linalg.matrix_rank(np.diag(singular_values))
+
+        return cov_params

     @cache_readonly
     def cov_params_robust(self):
@@ -1049,45 +1865,59 @@ class MarkovSwitchingResults(tsbase.TimeSeriesModelResults):
         (array) The QMLE variance / covariance matrix. Computed using the
         numerical Hessian as the evaluated hessian.
         """
-        pass
+        cov_opg = self.cov_params_opg
+        evaluated_hessian = self.model.hessian(self.params, transformed=True)
+        cov_params, singular_values = pinv_extended(
+            np.dot(np.dot(evaluated_hessian, cov_opg), evaluated_hessian)
+        )
+
+        if self._rank is None:
+            self._rank = np.linalg.matrix_rank(np.diag(singular_values))
+
+        return cov_params
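
As a cross-check on the two estimators implemented above, the outer-product-of-gradients and QMLE (sandwich) covariances can be written directly in NumPy. A minimal sketch of my reading of the code, not part of the patch, assuming a per-observation score matrix `score_obs` of shape (nobs, k_params) and a Hessian `hessian` of shape (k_params, k_params):

import numpy as np

def opg_cov(score_obs):
    # Outer-product-of-gradients covariance: pinv of sum_t s_t s_t'
    return np.linalg.pinv(score_obs.T @ score_obs)

def robust_cov(hessian, score_obs):
    # QMLE "sandwich" H^{-1} (S'S) H^{-1}, computed as pinv(H @ OPG @ H)
    # to mirror the pinv_extended-based property above
    return np.linalg.pinv(hessian @ opg_cov(score_obs) @ hessian)
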

     @cache_readonly
     def fittedvalues(self):
         """
         (array) The predicted values of the model. An (nobs x k_endog) array.
         """
-        pass
+        return self.model.predict(self.params)

     @cache_readonly
     def hqic(self):
         """
         (float) Hannan-Quinn Information Criterion
         """
-        pass
+        # return -2*self.llf + 2*np.log(np.log(self.nobs))*self.params.shape[0]
+        return hqic(self.llf, self.nobs, self.params.shape[0])
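
The three information criteria properties reduce to simple closed forms in the log-likelihood, the sample size and the parameter count; a standalone sketch equivalent to the commented-out expressions above (hypothetical helper, for illustration only):

import numpy as np

def information_criteria(llf, nobs, k_params):
    # AIC, BIC and HQIC in the forms used by the properties above
    aic = -2 * llf + 2 * k_params
    bic = -2 * llf + k_params * np.log(nobs)
    hqic = -2 * llf + 2 * k_params * np.log(np.log(nobs))
    return aic, bic, hqic
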

     @cache_readonly
     def llf_obs(self):
         """
         (array) The values of the log-likelihood function evaluated at `params`, per observation.
         """
-        pass
+        return self.model.loglikeobs(self.params)

     @cache_readonly
     def llf(self):
         """
         (float) The value of the log-likelihood function evaluated at `params`.
         """
-        pass
+        return self.model.loglike(self.params)

     @cache_readonly
     def resid(self):
         """
         (array) The model residuals. An (nobs x k_endog) array.
         """
-        pass
+        return self.model.endog - self.fittedvalues
+
+    @property
+    def joint_likelihoods(self):
+        return np.exp(self.joint_loglikelihoods)

-    def predict(self, start=None, end=None, probabilities=None, conditional
-        =False):
+    def predict(self, start=None, end=None, probabilities=None,
+                conditional=False):
         """
         In-sample prediction and out-of-sample forecasting

@@ -1122,7 +1952,9 @@ class MarkovSwitchingResults(tsbase.TimeSeriesModelResults):
             Array of in-sample predictions and / or out-of-sample
             forecasts. An (npredict x k_endog) array.
         """
-        pass
+        return self.model.predict(self.params, start=start, end=end,
+                                  probabilities=probabilities,
+                                  conditional=conditional)

     def forecast(self, steps=1, **kwargs):
         """
@@ -1144,10 +1976,10 @@ class MarkovSwitchingResults(tsbase.TimeSeriesModelResults):
         forecast : ndarray
             Array of out-of-sample forecasts. A (steps x k_endog) array.
         """
-        pass
+        raise NotImplementedError

-    def summary(self, alpha=0.05, start=None, title=None, model_name=None,
-        display_params=True):
+    def summary(self, alpha=.05, start=None, title=None, model_name=None,
+                display_params=True):
         """
         Summarize the Model

@@ -1175,17 +2007,142 @@ class MarkovSwitchingResults(tsbase.TimeSeriesModelResults):
         --------
         statsmodels.iolib.summary.Summary
         """
-        pass
-
+        from statsmodels.iolib.summary import Summary
+
+        # Model specification results
+        model = self.model
+        if title is None:
+            title = 'Markov Switching Model Results'
+
+        if start is None:
+            start = 0
+        if self.data.dates is not None:
+            dates = self.data.dates
+            d = dates[start]
+            sample = ['%02d-%02d-%02d' % (d.month, d.day, d.year)]
+            d = dates[-1]
+            sample += ['- ' + '%02d-%02d-%02d' % (d.month, d.day, d.year)]
+        else:
+            sample = [str(start), ' - ' + str(self.model.nobs)]
+
+        # Standardize the model name as a list of str
+        if model_name is None:
+            model_name = model.__class__.__name__
+
+        # Create the tables
+        if not isinstance(model_name, list):
+            model_name = [model_name]
+
+        top_left = [('Dep. Variable:', None)]
+        top_left.append(('Model:', [model_name[0]]))
+        for i in range(1, len(model_name)):
+            top_left.append(('', ['+ ' + model_name[i]]))
+        top_left += [
+            ('Date:', None),
+            ('Time:', None),
+            ('Sample:', [sample[0]]),
+            ('', [sample[1]])
+        ]
+
+        top_right = [
+            ('No. Observations:', [self.model.nobs]),
+            ('Log Likelihood', ["%#5.3f" % self.llf]),
+            ('AIC', ["%#5.3f" % self.aic]),
+            ('BIC', ["%#5.3f" % self.bic]),
+            ('HQIC', ["%#5.3f" % self.hqic])
+        ]
+
+        if hasattr(self, 'cov_type'):
+            top_left.append(('Covariance Type:', [self.cov_type]))
+
+        summary = Summary()
+        summary.add_table_2cols(self, gleft=top_left, gright=top_right,
+                                title=title)
+
+        # Make parameters tables for each regime
+        import re
+
+        from statsmodels.iolib.summary import summary_params
+
+        def make_table(self, mask, title, strip_end=True):
+            res = (self, self.params[mask], self.bse[mask],
+                   self.tvalues[mask], self.pvalues[mask],
+                   self.conf_int(alpha)[mask])
+
+            param_names = [
+                re.sub(r'\[\d+\]$', '', name) for name in
+                np.array(self.data.param_names)[mask].tolist()
+            ]

-class MarkovSwitchingResultsWrapper(wrap.ResultsWrapper):
-    _attrs = {'cov_params_approx': 'cov', 'cov_params_default': 'cov',
-        'cov_params_opg': 'cov', 'cov_params_robust': 'cov'}
-    _wrap_attrs = wrap.union_dicts(tsbase.TimeSeriesResultsWrapper.
-        _wrap_attrs, _attrs)
-    _methods = {'forecast': 'dates'}
-    _wrap_methods = wrap.union_dicts(tsbase.TimeSeriesResultsWrapper.
-        _wrap_methods, _methods)
+            return summary_params(res, yname=None, xname=param_names,
+                                  alpha=alpha, use_t=False, title=title)
+
+        params = model.parameters
+        regime_masks = [[] for i in range(model.k_regimes)]
+        other_masks = {}
+        for key, switching in params.switching.items():
+            k_params = len(switching)
+            if key == 'regime_transition':
+                continue
+            other_masks[key] = []
+
+            for i in range(k_params):
+                if switching[i]:
+                    for j in range(self.k_regimes):
+                        regime_masks[j].append(params[j, key][i])
+                else:
+                    other_masks[key].append(params[0, key][i])
+
+        for i in range(self.k_regimes):
+            mask = regime_masks[i]
+            if len(mask) > 0:
+                table = make_table(self, mask, 'Regime %d parameters' % i)
+                summary.tables.append(table)
+
+        mask = []
+        for key, _mask in other_masks.items():
+            mask.extend(_mask)
+        if len(mask) > 0:
+            table = make_table(self, mask, 'Non-switching parameters')
+            summary.tables.append(table)
+
+        # Transition parameters
+        mask = params['regime_transition']
+        table = make_table(self, mask, 'Regime transition parameters')
+        summary.tables.append(table)
+
+        # Add warnings/notes, added to text format only
+        etext = []
+        if hasattr(self, 'cov_type') and 'description' in self.cov_kwds:
+            etext.append(self.cov_kwds['description'])
+
+        if self._rank < len(self.params):
+            etext.append("Covariance matrix is singular or near-singular,"
+                         " with condition number %6.3g. Standard errors may be"
+                         " unstable." % _safe_cond(self.cov_params()))
+
+        if etext:
+            etext = ["[{0}] {1}".format(i + 1, text)
+                     for i, text in enumerate(etext)]
+            etext.insert(0, "Warnings:")
+            summary.add_extra_txt(etext)
+
+        return summary


-wrap.populate_wrapper(MarkovSwitchingResultsWrapper, MarkovSwitchingResults)
+class MarkovSwitchingResultsWrapper(wrap.ResultsWrapper):
+    _attrs = {
+        'cov_params_approx': 'cov',
+        'cov_params_default': 'cov',
+        'cov_params_opg': 'cov',
+        'cov_params_robust': 'cov',
+    }
+    _wrap_attrs = wrap.union_dicts(tsbase.TimeSeriesResultsWrapper._wrap_attrs,
+                                   _attrs)
+    _methods = {
+        'forecast': 'dates',
+    }
+    _wrap_methods = wrap.union_dicts(
+        tsbase.TimeSeriesResultsWrapper._wrap_methods, _methods)
+wrap.populate_wrapper(MarkovSwitchingResultsWrapper,  # noqa:E305
+                      MarkovSwitchingResults)
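
The results machinery completed above backs the user-facing Markov switching models; a small usage sketch with toy, hypothetical data, exercising the now-implemented summary, information criteria and smoothed probabilities:

import numpy as np
import statsmodels.api as sm

np.random.seed(12345)
# Toy series that switches from a low-mean to a high-mean regime
y = np.r_[np.random.normal(0.0, 1.0, size=100),
          np.random.normal(3.0, 1.0, size=100)]

mod = sm.tsa.MarkovRegression(y, k_regimes=2)
res = mod.fit()

print(res.summary())                            # summary() implemented above
print(res.aic, res.bic, res.hqic)               # information criteria
print(res.smoothed_marginal_probabilities[:5])  # per-regime probabilities
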
diff --git a/statsmodels/tsa/seasonal.py b/statsmodels/tsa/seasonal.py
index f2f0716e0..2bc943bde 100644
--- a/statsmodels/tsa/seasonal.py
+++ b/statsmodels/tsa/seasonal.py
@@ -4,13 +4,20 @@ Seasonal Decomposition by Moving Averages
 import numpy as np
 import pandas as pd
 from pandas.core.nanops import nanmean as pd_nanmean
+
 from statsmodels.tools.validation import PandasWrapper, array_like
 from statsmodels.tsa.stl._stl import STL
 from statsmodels.tsa.filters.filtertools import convolution_filter
 from statsmodels.tsa.stl.mstl import MSTL
 from statsmodels.tsa.tsatools import freq_to_period
-__all__ = ['STL', 'seasonal_decompose', 'seasonal_mean', 'DecomposeResult',
-    'MSTL']
+
+__all__ = [
+    "STL",
+    "seasonal_decompose",
+    "seasonal_mean",
+    "DecomposeResult",
+    "MSTL",
+]


 def _extrapolate_trend(trend, npoints):
@@ -18,7 +25,42 @@ def _extrapolate_trend(trend, npoints):
     Replace nan values on trend's end-points with least-squares extrapolated
     values with regression considering npoints closest defined points.
     """
-    pass
+    front = next(
+        i for i, vals in enumerate(trend) if not np.any(np.isnan(vals))
+    )
+    back = (
+        trend.shape[0]
+        - 1
+        - next(
+            i
+            for i, vals in enumerate(trend[::-1])
+            if not np.any(np.isnan(vals))
+        )
+    )
+    front_last = min(front + npoints, back)
+    back_first = max(front, back - npoints)
+
+    k, n = np.linalg.lstsq(
+        np.c_[np.arange(front, front_last), np.ones(front_last - front)],
+        trend[front:front_last],
+        rcond=-1,
+    )[0]
+    extra = (np.arange(0, front) * np.c_[k] + np.c_[n]).T
+    if trend.ndim == 1:
+        extra = extra.squeeze()
+    trend[:front] = extra
+
+    k, n = np.linalg.lstsq(
+        np.c_[np.arange(back_first, back), np.ones(back - back_first)],
+        trend[back_first:back],
+        rcond=-1,
+    )[0]
+    extra = (np.arange(back + 1, trend.shape[0]) * np.c_[k] + np.c_[n]).T
+    if trend.ndim == 1:
+        extra = extra.squeeze()
+    trend[back + 1 :] = extra
+
+    return trend
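
A quick numerical check of the end-point extrapolation above (values worked out by hand for this toy input; an informal illustration, not a test from the patch):

import numpy as np
from statsmodels.tsa.seasonal import _extrapolate_trend

trend = np.array([np.nan, np.nan, 1.0, 2.0, 3.0, 4.0, np.nan])
# With npoints=3 the NaN end-points are replaced by least-squares lines
# fit through the nearest defined points, giving [-1, 0, 1, 2, 3, 4, 5].
out = _extrapolate_trend(trend, 3)
assert np.allclose(out, [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
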


 def seasonal_mean(x, period):
@@ -27,11 +69,17 @@ def seasonal_mean(x, period):
     number of periods per cycle. E.g., 12 for monthly. NaNs are ignored
     in the mean.
     """
-    pass
+    return np.array([pd_nanmean(x[i::period], axis=0) for i in range(period)])


-def seasonal_decompose(x, model='additive', filt=None, period=None,
-    two_sided=True, extrapolate_trend=0):
+def seasonal_decompose(
+    x,
+    model="additive",
+    filt=None,
+    period=None,
+    two_sided=True,
+    extrapolate_trend=0,
+):
     """
     Seasonal decomposition using moving averages.

@@ -93,7 +141,83 @@ def seasonal_decompose(x, model='additive', filt=None, period=None,
     series and the average of this de-trended series for each period is
     the returned seasonal component.
     """
-    pass
+    pfreq = period
+    pw = PandasWrapper(x)
+    if period is None:
+        pfreq = getattr(getattr(x, "index", None), "inferred_freq", None)
+
+    x = array_like(x, "x", maxdim=2)
+    nobs = len(x)
+
+    if not np.all(np.isfinite(x)):
+        raise ValueError("This function does not handle missing values")
+    if model.startswith("m"):
+        if np.any(x <= 0):
+            raise ValueError(
+                "Multiplicative seasonality is not appropriate "
+                "for zero and negative values"
+            )
+
+    if period is None:
+        if pfreq is not None:
+            pfreq = freq_to_period(pfreq)
+            period = pfreq
+        else:
+            raise ValueError(
+                "You must specify a period or x must be a pandas object with "
+                "a PeriodIndex or a DatetimeIndex with a freq not set to None"
+            )
+    if x.shape[0] < 2 * pfreq:
+        raise ValueError(
+            f"x must have 2 complete cycles requires {2 * pfreq} "
+            f"observations. x only has {x.shape[0]} observation(s)"
+        )
+
+    if filt is None:
+        if period % 2 == 0:  # split weights at ends
+            filt = np.array([0.5] + [1] * (period - 1) + [0.5]) / period
+        else:
+            filt = np.repeat(1.0 / period, period)
+
+    nsides = int(two_sided) + 1
+    trend = convolution_filter(x, filt, nsides)
+
+    if extrapolate_trend == "freq":
+        extrapolate_trend = period - 1
+
+    if extrapolate_trend > 0:
+        trend = _extrapolate_trend(trend, extrapolate_trend + 1)
+
+    if model.startswith("m"):
+        detrended = x / trend
+    else:
+        detrended = x - trend
+
+    period_averages = seasonal_mean(detrended, period)
+
+    if model.startswith("m"):
+        period_averages /= np.mean(period_averages, axis=0)
+    else:
+        period_averages -= np.mean(period_averages, axis=0)
+
+    seasonal = np.tile(period_averages.T, nobs // period + 1).T[:nobs]
+
+    if model.startswith("m"):
+        resid = x / seasonal / trend
+    else:
+        resid = detrended - seasonal
+
+    results = []
+    for s, name in zip(
+        (seasonal, trend, resid, x), ("seasonal", "trend", "resid", None)
+    ):
+        results.append(pw.wrap(s.squeeze(), columns=name))
+    return DecomposeResult(
+        seasonal=results[0],
+        trend=results[1],
+        resid=results[2],
+        observed=results[3],
+    )
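
A minimal usage sketch of the completed function (toy monthly data, purely illustrative):

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2000-01-01", periods=48, freq="MS")
x = pd.Series(10 + 0.1 * np.arange(48)
              + np.sin(2 * np.pi * np.arange(48) / 12), index=idx)

res = seasonal_decompose(x, model="additive", period=12)
print(res.seasonal.head())        # repeating 12-month pattern
print(res.trend.dropna().head())  # centred moving-average trend (NaN at ends)
print(res.resid.dropna().head())
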


 class DecomposeResult:
@@ -120,8 +244,9 @@ class DecomposeResult:
         if weights is None:
             weights = np.ones_like(observed)
             if isinstance(observed, pd.Series):
-                weights = pd.Series(weights, index=observed.index, name=
-                    'weights')
+                weights = pd.Series(
+                    weights, index=observed.index, name="weights"
+                )
         self._weights = weights
         self._resid = resid
         self._observed = observed
@@ -129,35 +254,41 @@ class DecomposeResult:
     @property
     def observed(self):
         """Observed data"""
-        pass
+        return self._observed

     @property
     def seasonal(self):
         """The estimated seasonal component"""
-        pass
+        return self._seasonal

     @property
     def trend(self):
         """The estimated trend component"""
-        pass
+        return self._trend

     @property
     def resid(self):
         """The estimated residuals"""
-        pass
+        return self._resid

     @property
     def weights(self):
         """The weights used in the robust estimation"""
-        pass
+        return self._weights

     @property
     def nobs(self):
         """Number of observations"""
-        pass
+        return self._observed.shape

-    def plot(self, observed=True, seasonal=True, trend=True, resid=True,
-        weights=False):
+    def plot(
+        self,
+        observed=True,
+        seasonal=True,
+        trend=True,
+        resid=True,
+        weights=False,
+    ):
         """
         Plot estimated components

@@ -179,4 +310,51 @@ class DecomposeResult:
         matplotlib.figure.Figure
             The figure instance that containing the plot.
         """
-        pass
+        from pandas.plotting import register_matplotlib_converters
+
+        from statsmodels.graphics.utils import _import_mpl
+
+        plt = _import_mpl()
+        register_matplotlib_converters()
+        series = [(self._observed, "Observed")] if observed else []
+        series += [(self.trend, "trend")] if trend else []
+
+        if self.seasonal.ndim == 1:
+            series += [(self.seasonal, "seasonal")] if seasonal else []
+        elif self.seasonal.ndim > 1:
+            if isinstance(self.seasonal, pd.DataFrame):
+                for col in self.seasonal.columns:
+                    series += (
+                        [(self.seasonal[col], "seasonal")] if seasonal else []
+                    )
+            else:
+                for i in range(self.seasonal.shape[1]):
+                    series += (
+                        [(self.seasonal[:, i], "seasonal")] if seasonal else []
+                    )
+
+        series += [(self.resid, "residual")] if resid else []
+        series += [(self.weights, "weights")] if weights else []
+
+        if isinstance(self._observed, (pd.DataFrame, pd.Series)):
+            nobs = self._observed.shape[0]
+            xlim = self._observed.index[0], self._observed.index[nobs - 1]
+        else:
+            xlim = (0, self._observed.shape[0] - 1)
+
+        fig, axs = plt.subplots(len(series), 1, sharex=True)
+        for i, (ax, (series, def_name)) in enumerate(zip(axs, series)):
+            if def_name != "residual":
+                ax.plot(series)
+            else:
+                ax.plot(series, marker="o", linestyle="none")
+                ax.plot(xlim, (0, 0), color="#000000", zorder=-3)
+            name = getattr(series, "name", def_name)
+            if def_name != "Observed":
+                name = name.capitalize()
+            title = ax.set_title if i == 0 and observed else ax.set_ylabel
+            title(name)
+            ax.set_xlim(xlim)
+
+        fig.tight_layout()
+        return fig
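
`DecomposeResult.plot` returns a matplotlib Figure, so it composes with the usual matplotlib workflow; a hypothetical scripted usage (the non-interactive backend is only needed outside a notebook):

import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless / scripted use
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2000-01-01", periods=48, freq="MS")
x = pd.Series(10 + 0.1 * np.arange(48)
              + np.sin(2 * np.pi * np.arange(48) / 12), index=idx)

fig = seasonal_decompose(x, model="additive", period=12).plot()
fig.savefig("decomposition.png", dpi=150)
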
diff --git a/statsmodels/tsa/statespace/_pykalman_smoother.py b/statsmodels/tsa/statespace/_pykalman_smoother.py
index c9f425afa..d76399c1a 100644
--- a/statsmodels/tsa/statespace/_pykalman_smoother.py
+++ b/statsmodels/tsa/statespace/_pykalman_smoother.py
@@ -4,22 +4,29 @@ Kalman Smoother
 Author: Chad Fulton
 License: Simplified-BSD
 """
+
 import numpy as np
-SMOOTHER_STATE = 1
-SMOOTHER_STATE_COV = 2
-SMOOTHER_DISTURBANCE = 4
-SMOOTHER_DISTURBANCE_COV = 8
-SMOOTHER_ALL = (SMOOTHER_STATE | SMOOTHER_STATE_COV | SMOOTHER_DISTURBANCE |
-    SMOOTHER_DISTURBANCE_COV)
+
+SMOOTHER_STATE = 0x01          # Durbin and Koopman (2012), Chapter 4.4.2
+SMOOTHER_STATE_COV = 0x02      # ibid., Chapter 4.4.3
+SMOOTHER_DISTURBANCE = 0x04    # ibid., Chapter 4.5
+SMOOTHER_DISTURBANCE_COV = 0x08    # ibid., Chapter 4.5
+SMOOTHER_ALL = (
+    SMOOTHER_STATE | SMOOTHER_STATE_COV | SMOOTHER_DISTURBANCE |
+    SMOOTHER_DISTURBANCE_COV
+)


 class _KalmanSmoother:

     def __init__(self, model, kfilter, smoother_output):
+        # Save values
         self.model = model
         self.kfilter = kfilter
         self._kfilter = model._kalman_filter
         self.smoother_output = smoother_output
+
+        # Create storage
         self.scaled_smoothed_estimator = None
         self.scaled_smoothed_estimator_cov = None
         self.smoothing_error = None
@@ -29,49 +36,74 @@ class _KalmanSmoother:
         self.smoothed_state_disturbance_cov = None
         self.smoothed_measurement_disturbance = None
         self.smoothed_measurement_disturbance_cov = None
+
+        # Intermediate values
         self.tmp_L = np.zeros((model.k_states, model.k_states, model.nobs),
-            dtype=kfilter.dtype)
+                              dtype=kfilter.dtype)
+
         if smoother_output & (SMOOTHER_STATE | SMOOTHER_DISTURBANCE):
-            self.scaled_smoothed_estimator = np.zeros((model.k_states, 
-                model.nobs + 1), dtype=kfilter.dtype)
-            self.smoothing_error = np.zeros((model.k_endog, model.nobs),
-                dtype=kfilter.dtype)
+            self.scaled_smoothed_estimator = (
+                np.zeros((model.k_states, model.nobs+1), dtype=kfilter.dtype))
+            self.smoothing_error = (
+                np.zeros((model.k_endog, model.nobs), dtype=kfilter.dtype))
         if smoother_output & (SMOOTHER_STATE_COV | SMOOTHER_DISTURBANCE_COV):
-            self.scaled_smoothed_estimator_cov = np.zeros((model.k_states,
-                model.k_states, model.nobs + 1), dtype=kfilter.dtype)
+            self.scaled_smoothed_estimator_cov = (
+                np.zeros((model.k_states, model.k_states, model.nobs + 1),
+                         dtype=kfilter.dtype))
+
+        # State smoothing
         if smoother_output & SMOOTHER_STATE:
             self.smoothed_state = np.zeros((model.k_states, model.nobs),
-                dtype=kfilter.dtype)
+                                           dtype=kfilter.dtype)
         if smoother_output & SMOOTHER_STATE_COV:
-            self.smoothed_state_cov = np.zeros((model.k_states, model.
-                k_states, model.nobs), dtype=kfilter.dtype)
+            self.smoothed_state_cov = (
+                np.zeros((model.k_states, model.k_states, model.nobs),
+                         dtype=kfilter.dtype))
+
+        # Disturbance smoothing
         if smoother_output & SMOOTHER_DISTURBANCE:
-            self.smoothed_state_disturbance = np.zeros((model.k_posdef,
-                model.nobs), dtype=kfilter.dtype)
-            self.smoothed_measurement_disturbance = np.zeros((model.k_endog,
-                model.nobs), dtype=kfilter.dtype)
+            self.smoothed_state_disturbance = (
+                np.zeros((model.k_posdef, model.nobs), dtype=kfilter.dtype))
+            self.smoothed_measurement_disturbance = (
+                np.zeros((model.k_endog, model.nobs), dtype=kfilter.dtype))
         if smoother_output & SMOOTHER_DISTURBANCE_COV:
-            self.smoothed_state_disturbance_cov = np.zeros((model.k_posdef,
-                model.k_posdef, model.nobs), dtype=kfilter.dtype)
-            self.smoothed_measurement_disturbance_cov = np.zeros((model.
-                k_endog, model.k_endog, model.nobs), dtype=kfilter.dtype)
+            self.smoothed_state_disturbance_cov = (
+                np.zeros((model.k_posdef, model.k_posdef, model.nobs),
+                         dtype=kfilter.dtype))
+            self.smoothed_measurement_disturbance_cov = (
+                np.zeros((model.k_endog, model.k_endog, model.nobs),
+                         dtype=kfilter.dtype))
+
+    def seek(self, t):
+        if t >= self.model.nobs:
+            raise IndexError("Observation index out of range")
+        self.t = t

     def __iter__(self):
         return self

     def __call__(self):
-        self.seek(self.model.nobs - 1)
-        for i in range(self.model.nobs - 1, -1, -1):
+        self.seek(self.model.nobs-1)
+        # Perform backwards smoothing iterations
+        for i in range(self.model.nobs-1, -1, -1):
             next(self)

+    def next(self):
+        # next() is required for compatibility with Python2.7.
+        return self.__next__()
+
     def __next__(self):
+        # Check for valid iteration
         if not self.t >= 0:
             raise StopIteration
+
+        # Get local copies of variables
         t = self.t
         kfilter = self.kfilter
         _kfilter = self._kfilter
         model = self.model
         smoother_output = self.smoother_output
+
         scaled_smoothed_estimator = self.scaled_smoothed_estimator
         scaled_smoothed_estimator_cov = self.scaled_smoothed_estimator_cov
         smoothing_error = self.smoothing_error
@@ -79,43 +111,56 @@ class _KalmanSmoother:
         smoothed_state_cov = self.smoothed_state_cov
         smoothed_state_disturbance = self.smoothed_state_disturbance
         smoothed_state_disturbance_cov = self.smoothed_state_disturbance_cov
-        smoothed_measurement_disturbance = (self.
-            smoothed_measurement_disturbance)
-        smoothed_measurement_disturbance_cov = (self.
-            smoothed_measurement_disturbance_cov)
+        smoothed_measurement_disturbance = (
+            self.smoothed_measurement_disturbance)
+        smoothed_measurement_disturbance_cov = (
+            self.smoothed_measurement_disturbance_cov)
         tmp_L = self.tmp_L
+
+        # Seek the Cython Kalman filter to the right place, setup matrices
         _kfilter.seek(t, False)
         _kfilter.initialize_statespace_object_pointers()
         _kfilter.initialize_filter_object_pointers()
         _kfilter.select_missing()
-        missing_entire_obs = _kfilter.model.nmissing[t
-            ] == _kfilter.model.k_endog
-        missing_partial_obs = (not missing_entire_obs and _kfilter.model.
-            nmissing[t] > 0)
+
+        missing_entire_obs = (
+            _kfilter.model.nmissing[t] == _kfilter.model.k_endog)
+        missing_partial_obs = (
+            not missing_entire_obs and _kfilter.model.nmissing[t] > 0)
+
+        # Get the appropriate (possibly time-varying) indices
         design_t = 0 if kfilter.design.shape[2] == 1 else t
         obs_cov_t = 0 if kfilter.obs_cov.shape[2] == 1 else t
         transition_t = 0 if kfilter.transition.shape[2] == 1 else t
         selection_t = 0 if kfilter.selection.shape[2] == 1 else t
         state_cov_t = 0 if kfilter.state_cov.shape[2] == 1 else t
+
+        # Get endog dimension (can vary if there is missing data)
         k_endog = _kfilter.k_endog
+
+        # Get references to representation matrices and Kalman filter output
         transition = model.transition[:, :, transition_t]
         selection = model.selection[:, :, selection_t]
         state_cov = model.state_cov[:, :, state_cov_t]
+
         predicted_state = kfilter.predicted_state[:, t]
         predicted_state_cov = kfilter.predicted_state_cov[:, :, t]
+
         mask = ~kfilter.missing[:, t].astype(bool)
         if missing_partial_obs:
-            design = np.array(_kfilter.selected_design[:k_endog * model.
-                k_states], copy=True).reshape(k_endog, model.k_states,
-                order='F')
-            obs_cov = np.array(_kfilter.selected_obs_cov[:k_endog ** 2],
-                copy=True).reshape(k_endog, k_endog)
+            design = np.array(
+                _kfilter.selected_design[:k_endog*model.k_states], copy=True
+            ).reshape(k_endog, model.k_states, order='F')
+            obs_cov = np.array(
+                _kfilter.selected_obs_cov[:k_endog**2], copy=True
+            ).reshape(k_endog, k_endog)
             kalman_gain = kfilter.kalman_gain[:, mask, t]
-            forecasts_error_cov = np.array(_kfilter.forecast_error_cov[:, :,
-                t], copy=True).ravel(order='F')[:k_endog ** 2].reshape(k_endog,
-                k_endog)
-            forecasts_error = np.array(_kfilter.forecast_error[:k_endog, t],
-                copy=True)
+
+            forecasts_error_cov = np.array(
+                _kfilter.forecast_error_cov[:, :, t], copy=True
+                ).ravel(order='F')[:k_endog**2].reshape(k_endog, k_endog)
+            forecasts_error = np.array(
+                _kfilter.forecast_error[:k_endog, t], copy=True)
             F_inv = np.linalg.inv(forecasts_error_cov)
         else:
             if missing_entire_obs:
@@ -127,56 +172,100 @@ class _KalmanSmoother:
             forecasts_error_cov = kfilter.forecasts_error_cov[:, :, t]
             forecasts_error = kfilter.forecasts_error[:, t]
             F_inv = np.linalg.inv(forecasts_error_cov)
+
+        # Create a temporary matrix
         tmp_L[:, :, t] = transition - kalman_gain.dot(design)
         L = tmp_L[:, :, t]
+
+        # Perform the recursion
+
+        # Intermediate values
         if smoother_output & (SMOOTHER_STATE | SMOOTHER_DISTURBANCE):
             if missing_entire_obs:
-                scaled_smoothed_estimator[:, t - 1] = transition.transpose(
-                    ).dot(scaled_smoothed_estimator[:, t])
+                # smoothing_error is undefined here, keep it as zeros
+                scaled_smoothed_estimator[:, t - 1] = (
+                    transition.transpose().dot(scaled_smoothed_estimator[:, t])
+                )
             else:
-                smoothing_error[:k_endog, t] = F_inv.dot(forecasts_error
-                    ) - kalman_gain.transpose().dot(scaled_smoothed_estimator
-                    [:, t])
-                scaled_smoothed_estimator[:, t - 1] = design.transpose().dot(
-                    smoothing_error[:k_endog, t]) + transition.transpose().dot(
-                    scaled_smoothed_estimator[:, t])
+                smoothing_error[:k_endog, t] = (
+                    F_inv.dot(forecasts_error) -
+                    kalman_gain.transpose().dot(
+                        scaled_smoothed_estimator[:, t])
+                )
+                scaled_smoothed_estimator[:, t - 1] = (
+                    design.transpose().dot(smoothing_error[:k_endog, t]) +
+                    transition.transpose().dot(scaled_smoothed_estimator[:, t])
+                )
         if smoother_output & (SMOOTHER_STATE_COV | SMOOTHER_DISTURBANCE_COV):
             if missing_entire_obs:
-                scaled_smoothed_estimator_cov[:, :, t - 1] = L.transpose().dot(
-                    scaled_smoothed_estimator_cov[:, :, t]).dot(L)
+                scaled_smoothed_estimator_cov[:, :, t - 1] = (
+                    L.transpose().dot(
+                        scaled_smoothed_estimator_cov[:, :, t]
+                    ).dot(L)
+                )
             else:
-                scaled_smoothed_estimator_cov[:, :, t - 1] = design.transpose(
-                    ).dot(F_inv).dot(design) + L.transpose().dot(
-                    scaled_smoothed_estimator_cov[:, :, t]).dot(L)
+                scaled_smoothed_estimator_cov[:, :, t - 1] = (
+                    design.transpose().dot(F_inv).dot(design) +
+                    L.transpose().dot(
+                        scaled_smoothed_estimator_cov[:, :, t]
+                    ).dot(L)
+                )
+
+        # State smoothing
         if smoother_output & SMOOTHER_STATE:
-            smoothed_state[:, t] = predicted_state + predicted_state_cov.dot(
-                scaled_smoothed_estimator[:, t - 1])
+            smoothed_state[:, t] = (
+                predicted_state +
+                predicted_state_cov.dot(scaled_smoothed_estimator[:, t - 1])
+            )
         if smoother_output & SMOOTHER_STATE_COV:
-            smoothed_state_cov[:, :, t
-                ] = predicted_state_cov - predicted_state_cov.dot(
-                scaled_smoothed_estimator_cov[:, :, t - 1]).dot(
-                predicted_state_cov)
+            smoothed_state_cov[:, :, t] = (
+                predicted_state_cov -
+                predicted_state_cov.dot(
+                    scaled_smoothed_estimator_cov[:, :, t - 1]
+                ).dot(predicted_state_cov)
+            )
+
+        # Disturbance smoothing
         if smoother_output & (SMOOTHER_DISTURBANCE | SMOOTHER_DISTURBANCE_COV):
             QR = state_cov.dot(selection.transpose())
+
         if smoother_output & SMOOTHER_DISTURBANCE:
-            smoothed_state_disturbance[:, t] = QR.dot(scaled_smoothed_estimator
-                [:, t])
+            smoothed_state_disturbance[:, t] = (
+                QR.dot(scaled_smoothed_estimator[:, t])
+            )
+            # measurement disturbance is set to zero when all missing
+            # (unconditional distribution)
             if not missing_entire_obs:
-                smoothed_measurement_disturbance[mask, t] = obs_cov.dot(
-                    smoothing_error[:k_endog, t])
+                smoothed_measurement_disturbance[mask, t] = (
+                    obs_cov.dot(smoothing_error[:k_endog, t])
+                )
+
         if smoother_output & SMOOTHER_DISTURBANCE_COV:
-            smoothed_state_disturbance_cov[:, :, t] = state_cov - QR.dot(
-                scaled_smoothed_estimator_cov[:, :, t]).dot(QR.transpose())
+            smoothed_state_disturbance_cov[:, :, t] = (
+                state_cov -
+                QR.dot(
+                    scaled_smoothed_estimator_cov[:, :, t]
+                ).dot(QR.transpose())
+            )
+
             if missing_entire_obs:
                 smoothed_measurement_disturbance_cov[:, :, t] = obs_cov
             else:
+                # For non-missing portion, calculate as usual
                 ix = np.ix_(mask, mask, [t])
-                smoothed_measurement_disturbance_cov[ix] = (obs_cov -
-                    obs_cov.dot(F_inv + kalman_gain.transpose().dot(
-                    scaled_smoothed_estimator_cov[:, :, t]).dot(kalman_gain
-                    )).dot(obs_cov))[:, :, np.newaxis]
+                smoothed_measurement_disturbance_cov[ix] = (
+                    obs_cov - obs_cov.dot(
+                        F_inv + kalman_gain.transpose().dot(
+                            scaled_smoothed_estimator_cov[:, :, t]
+                        ).dot(kalman_gain)
+                    ).dot(obs_cov)
+                )[:, :, np.newaxis]
+
+                # For missing portion, use unconditional distribution
                 ix = np.ix_(~mask, ~mask, [t])
                 mod_ix = np.ix_(~mask, ~mask, [0])
-                smoothed_measurement_disturbance_cov[ix] = np.copy(model.
-                    obs_cov[:, :, obs_cov_t:obs_cov_t + 1])[mod_ix]
+                smoothed_measurement_disturbance_cov[ix] = np.copy(
+                    model.obs_cov[:, :, obs_cov_t:obs_cov_t+1])[mod_ix]
+
+        # Advance the smoother
         self.t -= 1
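
To cross-check the backward pass above against Durbin and Koopman (2012): `scaled_smoothed_estimator` plays the role of :math:`r_t`, `scaled_smoothed_estimator_cov` of :math:`N_t`, `smoothing_error` of :math:`u_t`, and `tmp_L` of :math:`L_t = T_t - K_t Z_t`. Under that reading the recursion being implemented is (a sketch of my interpretation, not text from the patch; for a fully missing observation the :math:`Z_t` terms are dropped):

.. math::

    u_t &= F_t^{-1} v_t - K_t' r_t, \\
    r_{t-1} &= Z_t' u_t + T_t' r_t, \\
    N_{t-1} &= Z_t' F_t^{-1} Z_t + L_t' N_t L_t, \\
    \hat \alpha_t &= a_t + P_t r_{t-1}, \qquad
    V_t = P_t - P_t N_{t-1} P_t, \\
    \hat \eta_t &= Q_t R_t' r_t, \qquad
    \hat \varepsilon_t = H_t u_t.
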
diff --git a/statsmodels/tsa/statespace/_quarterly_ar1.py b/statsmodels/tsa/statespace/_quarterly_ar1.py
index 4d94048d1..7e3431a03 100644
--- a/statsmodels/tsa/statespace/_quarterly_ar1.py
+++ b/statsmodels/tsa/statespace/_quarterly_ar1.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 (Internal) AR(1) model for monthly growth rates aggregated to quarterly freq.

@@ -6,14 +7,17 @@ License: BSD-3
 """
 import warnings
 import numpy as np
+
 from statsmodels.tools.tools import Bunch
 from statsmodels.tsa.statespace import mlemodel, initialization
-from statsmodels.tsa.statespace.kalman_smoother import SMOOTHER_STATE, SMOOTHER_STATE_COV, SMOOTHER_STATE_AUTOCOV
-from statsmodels.tsa.statespace.tools import constrain_stationary_univariate, unconstrain_stationary_univariate
+from statsmodels.tsa.statespace.kalman_smoother import (
+    SMOOTHER_STATE, SMOOTHER_STATE_COV, SMOOTHER_STATE_AUTOCOV)
+from statsmodels.tsa.statespace.tools import (
+    constrain_stationary_univariate, unconstrain_stationary_univariate)


 class QuarterlyAR1(mlemodel.MLEModel):
-    """
+    r"""
     AR(1) model for monthly growth rates aggregated to quarterly frequency

     Parameters
@@ -28,26 +32,181 @@ class QuarterlyAR1(mlemodel.MLEModel):

     .. math::

-        y_t & = \\begin{bmatrix} 1 & 2 & 3 & 2 & 1 \\end{bmatrix} \\alpha_t \\\\
-        \\alpha_t & = \\begin{bmatrix}
-            \\phi & 0 & 0 & 0 & 0 \\\\
-               1 & 0 & 0 & 0 & 0 \\\\
-               0 & 1 & 0 & 0 & 0 \\\\
-               0 & 0 & 1 & 0 & 0 \\\\
-               0 & 0 & 0 & 1 & 0 \\\\
-        \\end{bmatrix} +
-        \\begin{bmatrix} 1 \\\\ 0 \\\\ 0 \\\\ 0 \\\\ 0 \\end{bmatrix} \\varepsilon_t
+        y_t & = \begin{bmatrix} 1 & 2 & 3 & 2 & 1 \end{bmatrix} \alpha_t \\
+        \alpha_t & = \begin{bmatrix}
+            \phi & 0 & 0 & 0 & 0 \\
+               1 & 0 & 0 & 0 & 0 \\
+               0 & 1 & 0 & 0 & 0 \\
+               0 & 0 & 1 & 0 & 0 \\
+               0 & 0 & 0 & 1 & 0 \\
+        \end{bmatrix} \alpha_{t-1} +
+        \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \varepsilon_t

-    The two parameters to be estimated are :math:`\\phi` and :math:`\\sigma^2`.
+    The two parameters to be estimated are :math:`\phi` and :math:`\sigma^2`.

     It supports fitting via the usual quasi-Newton methods, as well as using
     the EM algorithm.

     """
-
     def __init__(self, endog):
-        super().__init__(endog, k_states=5, k_posdef=1, initialization=
-            'stationary')
+        super().__init__(endog, k_states=5, k_posdef=1,
+                         initialization='stationary')
         self['design'] = [1, 2, 3, 2, 1]
         self['transition', 1:, :-1] = np.eye(4)
-        self['selection', 0, 0] = 1.0
+        self['selection', 0, 0] = 1.
+
+    @property
+    def param_names(self):
+        return ['phi', 'sigma2']
+
+    @property
+    def start_params(self):
+        return np.array([0, np.nanvar(self.endog) / 19])
+
+    def fit(self, *args, **kwargs):
+        # Don't show warnings
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            out = super().fit(*args, **kwargs)
+        return out
+
+    def fit_em(self, start_params=None, transformed=True, cov_type='none',
+               cov_kwds=None, maxiter=500, tolerance=1e-6,
+               em_initialization=True, mstep_method=None, full_output=True,
+               return_params=False, low_memory=False):
+        if self._has_fixed_params:
+            raise NotImplementedError('Cannot fit using the EM algorithm while'
+                                      ' holding some parameters fixed.')
+        if low_memory:
+            raise ValueError('Cannot fit using the EM algorithm when using'
+                             ' low_memory option.')
+
+        if start_params is None:
+            start_params = self.start_params
+            transformed = True
+        else:
+            start_params = np.array(start_params, ndmin=1)
+
+        if not transformed:
+            start_params = self.transform_params(start_params)
+
+        # Perform expectation-maximization
+        llf = []
+        params = [start_params]
+        init = None
+        i = 0
+        delta = 0
+        while i < maxiter and (i < 2 or (delta > tolerance)):
+            out = self._em_iteration(params[-1], init=init,
+                                     mstep_method=mstep_method)
+            llf.append(out[0].llf_obs.sum())
+            params.append(out[1])
+            if em_initialization:
+                init = initialization.Initialization(
+                    self.k_states, 'known',
+                    constant=out[0].smoothed_state[..., 0],
+                    stationary_cov=out[0].smoothed_state_cov[..., 0])
+            if i > 0:
+                delta = (2 * (llf[-1] - llf[-2]) /
+                         (np.abs(llf[-1]) + np.abs(llf[-2])))
+            i += 1
+
+        # Just return the fitted parameters if requested
+        if return_params:
+            result = params[-1]
+        # Otherwise construct the results class if desired
+        else:
+            if em_initialization:
+                base_init = self.ssm.initialization
+                self.ssm.initialization = init
+            result = self.smooth(params[-1], transformed=True,
+                                 cov_type=cov_type, cov_kwds=cov_kwds)
+            if em_initialization:
+                self.ssm.initialization = base_init
+
+            # Save the output
+            if full_output:
+                em_retvals = Bunch(**{'params': np.array(params),
+                                      'llf': np.array(llf),
+                                      'iter': i})
+                em_settings = Bunch(**{'tolerance': tolerance,
+                                       'maxiter': maxiter})
+            else:
+                em_retvals = None
+                em_settings = None
+
+            result.mle_retvals = em_retvals
+            result.mle_settings = em_settings
+
+        return result
+
+    def _em_iteration(self, params0, init=None, mstep_method=None):
+        # (E)xpectation step
+        res = self._em_expectation_step(params0, init=init)
+
+        # (M)aximization step
+        params1 = self._em_maximization_step(res, params0,
+                                             mstep_method=mstep_method)
+
+        return res, params1
+
+    def _em_expectation_step(self, params0, init=None):
+        # (E)xpectation step
+        self.update(params0)
+        # Re-initialize state, if new initialization is given
+        if init is not None:
+            base_init = self.ssm.initialization
+            self.ssm.initialization = init
+        # Perform smoothing, only saving what is required
+        res = self.ssm.smooth(
+            SMOOTHER_STATE | SMOOTHER_STATE_COV | SMOOTHER_STATE_AUTOCOV,
+            update_filter=False)
+        res.llf_obs = np.array(
+            self.ssm._kalman_filter.loglikelihood, copy=True)
+        # Reset initialization
+        if init is not None:
+            self.ssm.initialization = base_init
+
+        return res
+
+    def _em_maximization_step(self, res, params0, mstep_method=None):
+        a = res.smoothed_state.T[..., None]
+        cov_a = res.smoothed_state_cov.transpose(2, 0, 1)
+        acov_a = res.smoothed_state_autocov.transpose(2, 0, 1)
+
+        # E[a_t a_t'], t = 0, ..., T
+        Eaa = cov_a.copy() + np.matmul(a, a.transpose(0, 2, 1))
+        # E[a_t a_{t-1}'], t = 1, ..., T
+        Eaa1 = acov_a[:-1] + np.matmul(a[1:], a[:-1].transpose(0, 2, 1))
+
+        # Factor VAR and covariance
+        A = Eaa[:-1, :1, :1].sum(axis=0)
+        B = Eaa1[:, :1, :1].sum(axis=0)
+        C = Eaa[1:, :1, :1].sum(axis=0)
+        nobs = Eaa.shape[0] - 1
+
+        f_A = B / A
+        f_Q = (C - f_A @ B.T) / nobs
+        params1 = np.zeros_like(params0)
+        params1[0] = f_A[0, 0]
+        params1[1] = f_Q[0, 0]
+
+        return params1
+
+    def transform_params(self, unconstrained):
+        # array no longer accepts inhomogeneous inputs
+        return np.hstack([
+            constrain_stationary_univariate(unconstrained[:1]),
+            unconstrained[1]**2])
+
+    def untransform_params(self, constrained):
+        # array no longer accepts inhomogeneous inputs
+        return np.hstack([
+            unconstrain_stationary_univariate(constrained[:1]),
+            constrained[1] ** 0.5])
+
+    def update(self, params, **kwargs):
+        super().update(params, **kwargs)
+
+        self['transition', 0, 0] = params[0]
+        self['state_cov', 0, 0] = params[1]
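
In the EM maximization step above, with :math:`\alpha_t` the first state and expectations taken under the smoothed distribution, the sufficient statistics and closed-form updates being computed are, in my reading (a sketch, not text from the patch):

.. math::

    A = \sum_{t=1}^{T} E[\alpha_{t-1}^2 \mid Y_T], \quad
    B = \sum_{t=1}^{T} E[\alpha_t \alpha_{t-1} \mid Y_T], \quad
    C = \sum_{t=1}^{T} E[\alpha_t^2 \mid Y_T],

.. math::

    \hat \phi = B / A, \qquad \hat \sigma^2 = (C - \hat \phi B) / T,

which are the standard EM updates for an AR(1) state equation.
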
diff --git a/statsmodels/tsa/statespace/api.py b/statsmodels/tsa/statespace/api.py
index cea890d62..50ac7a245 100644
--- a/statsmodels/tsa/statespace/api.py
+++ b/statsmodels/tsa/statespace/api.py
@@ -1,5 +1,5 @@
-__all__ = ['SARIMAX', 'ExponentialSmoothing', 'MLEModel', 'MLEResults',
-    'tools', 'Initialization']
+__all__ = ["SARIMAX", "ExponentialSmoothing", "MLEModel", "MLEResults",
+           "tools", "Initialization"]
 from .sarimax import SARIMAX
 from .exponential_smoothing import ExponentialSmoothing
 from .mlemodel import MLEModel, MLEResults
diff --git a/statsmodels/tsa/statespace/cfa_simulation_smoother.py b/statsmodels/tsa/statespace/cfa_simulation_smoother.py
index 8f9ed549a..e17e4f4d5 100644
--- a/statsmodels/tsa/statespace/cfa_simulation_smoother.py
+++ b/statsmodels/tsa/statespace/cfa_simulation_smoother.py
@@ -4,12 +4,14 @@
 Author: Chad Fulton
 License: BSD-3
 """
+
 import numpy as np
+
 from . import tools


 class CFASimulationSmoother:
-    """
+    r"""
     "Cholesky Factor Algorithm" (CFA) simulation smoother

     Parameters
@@ -81,19 +83,30 @@ class CFASimulationSmoother:

     def __init__(self, model, cfa_simulation_smoother_classes=None):
         self.model = model
+
+        # Get the simulation smoother classes
         self.prefix_simulation_smoother_map = (
-            cfa_simulation_smoother_classes if 
-            cfa_simulation_smoother_classes is not None else tools.
-            prefix_cfa_simulation_smoother_map.copy())
+            cfa_simulation_smoother_classes
+            if cfa_simulation_smoother_classes is not None
+            else tools.prefix_cfa_simulation_smoother_map.copy())
+
         self._simulation_smoothers = {}
+
         self._posterior_mean = None
         self._posterior_cov_inv_chol = None
         self._posterior_cov = None
         self._simulated_state = None

+    @property
+    def _simulation_smoother(self):
+        prefix = self.model.prefix
+        if prefix in self._simulation_smoothers:
+            return self._simulation_smoothers[prefix]
+        return None
+
     @property
     def posterior_mean(self):
-        """
+        r"""
         Posterior mean of the states conditional on the data

         Notes
@@ -101,33 +114,39 @@ class CFASimulationSmoother:

         .. math::

-            \\hat \\alpha_t = E[\\alpha_t \\mid Y^n ]
+            \hat \alpha_t = E[\alpha_t \mid Y^n ]

         This posterior mean is identical to the `smoothed_state` computed by
         the Kalman smoother.
         """
-        pass
+        if self._posterior_mean is None:
+            self._posterior_mean = np.array(
+                self._simulation_smoother.posterior_mean, copy=True)
+        return self._posterior_mean

     @property
     def posterior_cov_inv_chol_sparse(self):
-        """
+        r"""
         Sparse Cholesky factor of inverse posterior covariance matrix

         Notes
         -----
         This attribute holds in sparse diagonal banded storage the Cholesky
         factor of the inverse of the posterior covariance matrix. If we denote
         :math:`P = Var[\alpha \mid Y^n ]`, then this attribute holds the
+        :math:`P = Var[\alpha \mid Y^n ]`, then the this attribute holds the
         lower Cholesky factor :math:`L`, defined from :math:`L L' = P^{-1}`.
         This attribute uses the sparse diagonal banded storage described in the
         documentation of, for example, the SciPy function
         `scipy.linalg.solveh_banded`.
         """
-        pass
+        if self._posterior_cov_inv_chol is None:
+            self._posterior_cov_inv_chol = np.array(
+                self._simulation_smoother.posterior_cov_inv_chol, copy=True)
+        return self._posterior_cov_inv_chol

     @property
     def posterior_cov(self):
-        """
+        r"""
         Posterior covariance of the states conditional on the data

         Notes
@@ -140,7 +159,7 @@ class CFASimulationSmoother:

         .. math::

-            Var[\\alpha \\mid Y^n ]
+            Var[\alpha \mid Y^n ]

         This posterior covariance matrix is *not* identical to the
         `smoothed_state_cov` attribute produced by the Kalman smoother, because
@@ -148,10 +167,15 @@ class CFASimulationSmoother:
         `smoothed_state_cov` contains the `(k_states, k_states)` block
         diagonal entries of this posterior covariance matrix.
         """
-        pass
+        if self._posterior_cov is None:
+            from scipy.linalg import cho_solve_banded
+            inv_chol = self.posterior_cov_inv_chol_sparse
+            self._posterior_cov = cho_solve_banded(
+                (inv_chol, True), np.eye(inv_chol.shape[1]))
+        return self._posterior_cov

     def simulate(self, variates=None, update_posterior=True):
-        """
+        r"""
         Perform simulation smoothing (via Cholesky factor algorithm)

         Does not return anything, but populates the object's `simulated_state`
@@ -174,25 +198,25 @@ class CFASimulationSmoother:

         .. math::

-            \\alpha \\mid Y_n \\sim N(\\hat \\alpha, Var(\\alpha \\mid Y_n))
+            \alpha \mid Y_n \sim N(\hat \alpha, Var(\alpha \mid Y_n))

-        Let :math:`L L' = Var(\\alpha \\mid Y_n)^{-1}`. Then simulation proceeds
+        Let :math:`L L' = Var(\alpha \mid Y_n)^{-1}`. Then simulation proceeds
         according to the following steps:

-        1. Draw :math:`u \\sim N(0, I)`
-        2. Compute :math:`x = \\hat \\alpha + (L')^{-1} u`
+        1. Draw :math:`u \sim N(0, I)`
+        2. Compute :math:`x = \hat \alpha + (L')^{-1} u`

         And then :math:`x` is a draw from the joint posterior of the states.
         The output of the function is as follows:

         - The simulated draw :math:`x` is held in the `simulated_state`
           attribute.
-        - The posterior mean :math:`\\hat \\alpha` is held in the
+        - The posterior mean :math:`\hat \alpha` is held in the
           `posterior_mean` attribute.
         - The (lower triangular) Cholesky factor of the inverse posterior
           covariance matrix, :math:`L`, is held in sparse diagonal banded
           storage in the `posterior_cov_inv_chol` attribute.
-        - The posterior covariance matrix :math:`Var(\\alpha \\mid Y_n)` can be
+        - The posterior covariance matrix :math:`Var(\alpha \mid Y_n)` can be
           computed on demand by accessing the `posterior_cov` property. Note
           that this matrix can be extremely large, so care must be taken when
           accessing this property. In most cases, it will be preferred to make
@@ -200,4 +224,32 @@ class CFASimulationSmoother:
           `posterior_cov` attribute.

         """
-        pass
+        # (Re) initialize the _statespace representation
+        prefix, dtype, create = self.model._initialize_representation()
+
+        # Validate variates and get in required datatype
+        if variates is not None:
+            tools.validate_matrix_shape('variates', variates.shape,
+                                        self.model.k_states,
+                                        self.model.nobs, 1)
+            variates = np.ravel(variates, order='F').astype(dtype)
+
+        # (Re) initialize the state
+        self.model._initialize_state(prefix=prefix)
+
+        # Construct the Cython simulation smoother instance, if necessary
+        if create or prefix not in self._simulation_smoothers:
+            cls = self.prefix_simulation_smoother_map[prefix]
+            self._simulation_smoothers[prefix] = cls(
+                self.model._statespaces[prefix])
+        sim = self._simulation_smoothers[prefix]
+
+        # Update posterior moments, if requested
+        if update_posterior:
+            sim.update_sparse_posterior_moments()
+            self._posterior_mean = None
+            self._posterior_cov_inv_chol = None
+            self._posterior_cov = None
+
+        # Perform simulation smoothing
+        self.simulated_state = sim.simulate(variates=variates)
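
A hypothetical end-to-end use of the CFA smoother completed above, via the `method='cfa'` entry point on state space models (toy local-level data for illustration):

import numpy as np
import statsmodels.api as sm

np.random.seed(0)
y = np.cumsum(np.random.normal(size=200)) + np.random.normal(size=200)

mod = sm.tsa.UnobservedComponents(y, 'local level')
res = mod.fit(disp=False)
mod.update(res.params)  # make sure the representation holds the MLE

sim = mod.simulation_smoother(method='cfa')  # CFASimulationSmoother
sim.simulate()
draw = sim.simulated_state   # one draw from p(states | data)
mean = sim.posterior_mean    # matches the Kalman smoother's smoothed_state
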
diff --git a/statsmodels/tsa/statespace/dynamic_factor.py b/statsmodels/tsa/statespace/dynamic_factor.py
index c2729eb22..bec3147dd 100644
--- a/statsmodels/tsa/statespace/dynamic_factor.py
+++ b/statsmodels/tsa/statespace/dynamic_factor.py
@@ -1,12 +1,18 @@
+# -*- coding: utf-8 -*-
 """
 Dynamic factor model

 Author: Chad Fulton
 License: Simplified-BSD
 """
+
 import numpy as np
 from .mlemodel import MLEModel, MLEResults, MLEResultsWrapper
-from .tools import is_invertible, prepare_exog, constrain_stationary_univariate, unconstrain_stationary_univariate, constrain_stationary_multivariate, unconstrain_stationary_multivariate
+from .tools import (
+    is_invertible, prepare_exog,
+    constrain_stationary_univariate, unconstrain_stationary_univariate,
+    constrain_stationary_multivariate, unconstrain_stationary_multivariate
+)
 from statsmodels.multivariate.pca import PCA
 from statsmodels.regression.linear_model import OLS
 from statsmodels.tsa.vector_ar.var_model import VAR
@@ -20,7 +26,7 @@ from statsmodels.compat.pandas import Appender


 class DynamicFactor(MLEModel):
-    """
+    r"""
     Dynamic factor model

     Parameters
@@ -86,9 +92,9 @@ class DynamicFactor(MLEModel):

     .. math::

-        y_t & = \\Lambda f_t + B x_t + u_t \\\\
-        f_t & = A_1 f_{t-1} + \\dots + A_p f_{t-p} + \\eta_t \\\\
-        u_t & = C_1 u_{t-1} + \\dots + C_q u_{t-q} + \\varepsilon_t
+        y_t & = \Lambda f_t + B x_t + u_t \\
+        f_t & = A_1 f_{t-1} + \dots + A_p f_{t-p} + \eta_t \\
+        u_t & = C_1 u_{t-1} + \dots + C_q u_{t-q} + \varepsilon_t

     where there are `k_endog` observed series and `k_factors` unobserved
     factors. Thus :math:`y_t` is a `k_endog` x 1 vector and :math:`f_t` is a
@@ -96,9 +102,9 @@ class DynamicFactor(MLEModel):

     :math:`x_t` are optional exogenous vectors, shaped `k_exog` x 1.

-    :math:`\\eta_t` and :math:`\\varepsilon_t` are white noise error terms. In
-    order to identify the factors, :math:`Var(\\eta_t) = I`. Denote
-    :math:`Var(\\varepsilon_t) \\equiv \\Sigma`.
+    :math:`\eta_t` and :math:`\varepsilon_t` are white noise error terms. In
+    order to identify the factors, :math:`Var(\eta_t) = I`. Denote
+    :math:`Var(\varepsilon_t) \equiv \Sigma`.

     Options related to the unobserved factors:

@@ -114,10 +120,10 @@ class DynamicFactor(MLEModel):
       equation; corresponds to :math:`q`, above. To have white noise errors,
       set `error_order = 0` (this is the default).
     - `error_cov_type`: this controls the form of the covariance matrix
-      :math:`\\Sigma`. If it is "dscalar", then :math:`\\Sigma = \\sigma^2 I`. If
+      :math:`\Sigma`. If it is "scalar", then :math:`\Sigma = \sigma^2 I`. If
       it is "diagonal", then
-      :math:`\\Sigma = \\text{diag}(\\sigma_1^2, \\dots, \\sigma_n^2)`. If it is
-      "unstructured", then :math:`\\Sigma` is any valid variance / covariance
+      :math:`\Sigma = \text{diag}(\sigma_1^2, \dots, \sigma_n^2)`. If it is
+      "unstructured", then :math:`\Sigma` is any valid variance / covariance
       matrix (i.e. symmetric and positive definite).
     - `error_var`: this controls whether or not the errors evolve jointly
       according to a VAR(q), or individually according to separate AR(q)
@@ -133,47 +139,80 @@ class DynamicFactor(MLEModel):
     """

     def __init__(self, endog, k_factors, factor_order, exog=None,
-        error_order=0, error_var=False, error_cov_type='diagonal',
-        enforce_stationarity=True, **kwargs):
+                 error_order=0, error_var=False, error_cov_type='diagonal',
+                 enforce_stationarity=True, **kwargs):
+
+        # Model properties
         self.enforce_stationarity = enforce_stationarity
+
+        # Factor-related properties
         self.k_factors = k_factors
         self.factor_order = factor_order
+
+        # Error-related properties
         self.error_order = error_order
         self.error_var = error_var and error_order > 0
         self.error_cov_type = error_cov_type
-        self.k_exog, exog = prepare_exog(exog)
+
+        # Exogenous data
+        (self.k_exog, exog) = prepare_exog(exog)
+
+        # Note: at some point in the future we might add state regression, as
+        # in SARIMAX.
         self.mle_regression = self.k_exog > 0
+
+        # We need to have an array or pandas at this point
         if not _is_using_pandas(endog, None):
             endog = np.asanyarray(endog, order='C')
+
+        # Save some useful model orders, internally used
         k_endog = endog.shape[1] if endog.ndim > 1 else 1
         self._factor_order = max(1, self.factor_order) * self.k_factors
         self._error_order = self.error_order * k_endog
+
+        # Calculate the number of states
         k_states = self._factor_order
         k_posdef = self.k_factors
         if self.error_order > 0:
             k_states += self._error_order
             k_posdef += k_endog
+
+        # We can still estimate the model with no dynamic state (e.g. SUR); we
+        # just need to have one state that does nothing.
         self._unused_state = False
         if k_states == 0:
             k_states = 1
             k_posdef = 1
             self._unused_state = True
+
+        # Test for non-multivariate endog
         if k_endog < 2:
-            raise ValueError(
-                'The dynamic factors model is only valid for multivariate time series.'
-                )
+            raise ValueError('The dynamic factors model is only valid for'
+                             ' multivariate time series.')
+
+        # Test for too many factors
         if self.k_factors >= k_endog:
-            raise ValueError(
-                'Number of factors must be less than the number of endogenous variables.'
-                )
+            raise ValueError('Number of factors must be less than the number'
+                             ' of endogenous variables.')
+
+        # Test for invalid error_cov_type
         if self.error_cov_type not in ['scalar', 'diagonal', 'unstructured']:
-            raise ValueError(
-                'Invalid error covariance matrix type specification.')
+            raise ValueError('Invalid error covariance matrix type'
+                             ' specification.')
+
+        # By default, initialize as stationary
         kwargs.setdefault('initialization', 'stationary')
-        super(DynamicFactor, self).__init__(endog, exog=exog, k_states=
-            k_states, k_posdef=k_posdef, **kwargs)
+
+        # Initialize the state space model
+        super(DynamicFactor, self).__init__(
+            endog, exog=exog, k_states=k_states, k_posdef=k_posdef, **kwargs
+        )
+
+        # Set as time-varying model if we have exog
         if self.k_exog > 0:
             self.ssm._time_invariant = False
+
+        # Initialize the components
         self.parameters = {}
         self._initialize_loadings()
         self._initialize_exog()
@@ -182,22 +221,434 @@ class DynamicFactor(MLEModel):
         self._initialize_error_transition()
         self.k_params = sum(self.parameters.values())

+        # Cache parameter vector slices
         def _slice(key, offset):
             length = self.parameters[key]
             param_slice = np.s_[offset:offset + length]
             offset += length
             return param_slice, offset
+
         offset = 0
         self._params_loadings, offset = _slice('factor_loadings', offset)
         self._params_exog, offset = _slice('exog', offset)
         self._params_error_cov, offset = _slice('error_cov', offset)
-        self._params_factor_transition, offset = _slice('factor_transition',
-            offset)
-        self._params_error_transition, offset = _slice('error_transition',
-            offset)
+        self._params_factor_transition, offset = (
+            _slice('factor_transition', offset))
+        self._params_error_transition, offset = (
+            _slice('error_transition', offset))
+
+        # Update _init_keys attached by super
         self._init_keys += ['k_factors', 'factor_order', 'error_order',
-            'error_var', 'error_cov_type', 'enforce_stationarity'] + list(
-            kwargs.keys())
+                            'error_var', 'error_cov_type',
+                            'enforce_stationarity'] + list(kwargs.keys())
+
+    def _initialize_loadings(self):
+        # Initialize the parameters
+        self.parameters['factor_loadings'] = self.k_endog * self.k_factors
+
+        # Setup fixed components of state space matrices
+        if self.error_order > 0:
+            start = self._factor_order
+            end = self._factor_order + self.k_endog
+            self.ssm['design', :, start:end] = np.eye(self.k_endog)
+
+        # Setup indices of state space matrices
+        self._idx_loadings = np.s_['design', :, :self.k_factors]
+
+    def _initialize_exog(self):
+        # Initialize the parameters
+        self.parameters['exog'] = self.k_exog * self.k_endog
+
+        # If we have exog effects, then the obs intercept needs to be
+        # time-varying
+        if self.k_exog > 0:
+            self.ssm['obs_intercept'] = np.zeros((self.k_endog, self.nobs))
+
+        # Setup indices of state space matrices
+        self._idx_exog = np.s_['obs_intercept', :self.k_endog, :]
+
+    def _initialize_error_cov(self):
+        if self.error_cov_type == 'scalar':
+            self._initialize_error_cov_diagonal(scalar=True)
+        elif self.error_cov_type == 'diagonal':
+            self._initialize_error_cov_diagonal(scalar=False)
+        elif self.error_cov_type == 'unstructured':
+            self._initialize_error_cov_unstructured()
+
+    def _initialize_error_cov_diagonal(self, scalar=False):
+        # Initialize the parameters
+        self.parameters['error_cov'] = 1 if scalar else self.k_endog
+
+        # Setup fixed components of state space matrices
+
+        # Setup indices of state space matrices
+        k_endog = self.k_endog
+        k_factors = self.k_factors
+        idx = np.diag_indices(k_endog)
+        if self.error_order > 0:
+            matrix = 'state_cov'
+            idx = (idx[0] + k_factors, idx[1] + k_factors)
+        else:
+            matrix = 'obs_cov'
+        self._idx_error_cov = (matrix,) + idx
+
+    def _initialize_error_cov_unstructured(self):
+        # Initialize the parameters
+        k_endog = self.k_endog
+        self.parameters['error_cov'] = int(k_endog * (k_endog + 1) / 2)
+
+        # Setup fixed components of state space matrices
+
+        # Setup indices of state space matrices
+        self._idx_lower_error_cov = np.tril_indices(self.k_endog)
+        if self.error_order > 0:
+            start = self.k_factors
+            end = self.k_factors + self.k_endog
+            self._idx_error_cov = (
+                np.s_['state_cov', start:end, start:end])
+        else:
+            self._idx_error_cov = np.s_['obs_cov', :, :]
+
+    def _initialize_factor_transition(self):
+        order = self.factor_order * self.k_factors
+        k_factors = self.k_factors
+
+        # Initialize the parameters
+        self.parameters['factor_transition'] = (
+            self.factor_order * self.k_factors**2)
+
+        # Setup fixed components of state space matrices
+        # VAR(p) for factor transition
+        if self.k_factors > 0:
+            if self.factor_order > 0:
+                self.ssm['transition', k_factors:order, :order - k_factors] = (
+                    np.eye(order - k_factors))
+
+            self.ssm['selection', :k_factors, :k_factors] = np.eye(k_factors)
+            # Identification requires constraining the state covariance to an
+            # identity matrix
+            self.ssm['state_cov', :k_factors, :k_factors] = np.eye(k_factors)
+
+        # Setup indices of state space matrices
+        self._idx_factor_transition = np.s_['transition', :k_factors, :order]
+
+    def _initialize_error_transition(self):
+        # Initialize the appropriate situation
+        if self.error_order == 0:
+            self._initialize_error_transition_white_noise()
+        else:
+            # Generic setup fixed components of state space matrices
+            # VAR(q) for error transition
+            # (in the individual AR case, we still have the VAR(q) companion
+            # matrix structure, but force the coefficient matrices to be
+            # diagonal)
+            k_endog = self.k_endog
+            k_factors = self.k_factors
+            _factor_order = self._factor_order
+            _error_order = self._error_order
+            _slice = np.s_['selection',
+                           _factor_order:_factor_order + k_endog,
+                           k_factors:k_factors + k_endog]
+            self.ssm[_slice] = np.eye(k_endog)
+            _slice = np.s_[
+                'transition',
+                _factor_order + k_endog:_factor_order + _error_order,
+                _factor_order:_factor_order + _error_order - k_endog]
+            self.ssm[_slice] = np.eye(_error_order - k_endog)
+
+            # Now specialized setups
+            if self.error_var:
+                self._initialize_error_transition_var()
+            else:
+                self._initialize_error_transition_individual()
+
+    def _initialize_error_transition_white_noise(self):
+        # Initialize the parameters
+        self.parameters['error_transition'] = 0
+
+        # No fixed components of state space matrices
+
+        # Setup indices of state space matrices (just an empty slice)
+        self._idx_error_transition = np.s_['transition', 0:0, 0:0]
+
+    def _initialize_error_transition_var(self):
+        k_endog = self.k_endog
+        _factor_order = self._factor_order
+        _error_order = self._error_order
+
+        # Initialize the parameters
+        self.parameters['error_transition'] = _error_order * k_endog
+
+        # Fixed components already setup above
+
+        # Setup indices of state space matrices
+        # Here we want to set all of the elements of the coefficient matrices,
+        # the same as in a VAR specification
+        self._idx_error_transition = np.s_[
+            'transition',
+            _factor_order:_factor_order + k_endog,
+            _factor_order:_factor_order + _error_order]
+
+    def _initialize_error_transition_individual(self):
+        k_endog = self.k_endog
+        _error_order = self._error_order
+
+        # Initialize the parameters
+        self.parameters['error_transition'] = _error_order
+
+        # Fixed components already setup above
+
+        # Setup indices of state space matrices
+        # Here we want to set only the diagonal elements of the coefficient
+        # matrices, and we want to set them in order by equation, not by
+        # matrix (i.e. set the first element of the first matrix's diagonal,
+        # then set the first element of the second matrix's diagonal, then...)
+
+        # The basic setup is a tiled list of diagonal indices, one for each
+        # coefficient matrix
+        idx = np.tile(np.diag_indices(k_endog), self.error_order)
+        # Now we need to shift the rows down to the correct location
+        row_shift = self._factor_order
+        # And we need to shift the columns in an increasing way
+        col_inc = self._factor_order + np.repeat(
+            [i * k_endog for i in range(self.error_order)], k_endog)
+        idx[0] += row_shift
+        idx[1] += col_inc
+
+        # Make a copy (without the row shift) so that we can easily get the
+        # diagonal parameters back out of a generic coefficients matrix array
+        idx_diag = idx.copy()
+        idx_diag[0] -= row_shift
+        idx_diag[1] -= self._factor_order
+        idx_diag = idx_diag[:, np.lexsort((idx_diag[1], idx_diag[0]))]
+        self._idx_error_diag = (idx_diag[0], idx_diag[1])
+
+        # Finally, we want to fill in the entries in the correct order, which
+        # is to say we want to fill in lexicographically, first by row then by
+        # column
+        idx = idx[:, np.lexsort((idx[1], idx[0]))]
+        self._idx_error_transition = np.s_['transition', idx[0], idx[1]]
+
+    def clone(self, endog, exog=None, **kwargs):
+        return self._clone_from_init_kwds(endog, exog=exog, **kwargs)
+
+    @property
+    def _res_classes(self):
+        return {'fit': (DynamicFactorResults, DynamicFactorResultsWrapper)}
+
+    @property
+    def start_params(self):
+        params = np.zeros(self.k_params, dtype=np.float64)
+
+        endog = self.endog.copy()
+        mask = ~np.any(np.isnan(endog), axis=1)
+        endog = endog[mask]
+        if self.k_exog > 0:
+            exog = self.exog[mask]
+
+        # 1. Factor loadings (estimated via PCA)
+        if self.k_factors > 0:
+            # Use principal components + OLS as starting values
+            res_pca = PCA(endog, ncomp=self.k_factors)
+            mod_ols = OLS(endog, res_pca.factors)
+            res_ols = mod_ols.fit()
+
+            # Using OLS params for the loadings tends to give a higher starting
+            # log-likelihood.
+            params[self._params_loadings] = res_ols.params.T.ravel()
+            # params[self._params_loadings] = res_pca.loadings.ravel()
+
+            # However, using res_ols.resid tends to cause non-invertible
+            # starting VAR coefficients for error VARs
+            # endog = res_ols.resid
+            endog = endog - np.dot(res_pca.factors, res_pca.loadings.T)
+
+        # 2. Exog (OLS on residuals)
+        if self.k_exog > 0:
+            mod_ols = OLS(endog, exog=exog)
+            res_ols = mod_ols.fit()
+            # In the form: beta.x1.y1, beta.x2.y1, beta.x1.y2, ...
+            params[self._params_exog] = res_ols.params.T.ravel()
+            endog = res_ols.resid
+
+        # 3. Factors (VAR on res_pca.factors)
+        stationary = True
+        if self.k_factors > 1 and self.factor_order > 0:
+            # 3a. VAR transition (OLS on factors estimated via PCA)
+            mod_factors = VAR(res_pca.factors)
+            res_factors = mod_factors.fit(maxlags=self.factor_order, ic=None,
+                                          trend='n')
+            # Save the parameters
+            params[self._params_factor_transition] = (
+                res_factors.params.T.ravel())
+
+            # Test for stationarity
+            coefficient_matrices = (
+                params[self._params_factor_transition].reshape(
+                    self.k_factors * self.factor_order, self.k_factors
+                ).T
+            ).reshape(self.k_factors, self.k_factors, self.factor_order).T
+
+            stationary = is_invertible([1] + list(-coefficient_matrices))
+        elif self.k_factors > 0 and self.factor_order > 0:
+            # 3b. AR transition
+            Y = res_pca.factors[self.factor_order:]
+            X = lagmat(res_pca.factors, self.factor_order, trim='both')
+            params_ar = np.linalg.pinv(X).dot(Y)
+            stationary = is_invertible(np.r_[1, -params_ar.squeeze()])
+            params[self._params_factor_transition] = params_ar[:, 0]
+
+        # Check for stationarity
+        if not stationary and self.enforce_stationarity:
+            raise ValueError('Non-stationary starting autoregressive'
+                             ' parameters found with `enforce_stationarity`'
+                             ' set to True.')
+
+        # 4. Errors
+        if self.error_order == 0:
+            if self.error_cov_type == 'scalar':
+                params[self._params_error_cov] = endog.var(axis=0).mean()
+            elif self.error_cov_type == 'diagonal':
+                params[self._params_error_cov] = endog.var(axis=0)
+            elif self.error_cov_type == 'unstructured':
+                cov_factor = np.diag(endog.std(axis=0))
+                params[self._params_error_cov] = (
+                    cov_factor[self._idx_lower_error_cov].ravel())
+        elif self.error_var:
+            mod_errors = VAR(endog)
+            res_errors = mod_errors.fit(maxlags=self.error_order, ic=None,
+                                        trend='n')
+
+            # Test for stationarity
+            coefficient_matrices = (
+                np.array(res_errors.params.T).ravel().reshape(
+                    self.k_endog * self.error_order, self.k_endog
+                ).T
+            ).reshape(self.k_endog, self.k_endog, self.error_order).T
+
+            stationary = is_invertible([1] + list(-coefficient_matrices))
+            if not stationary and self.enforce_stationarity:
+                raise ValueError('Non-stationary starting error autoregressive'
+                                 ' parameters found with'
+                                 ' `enforce_stationarity` set to True.')
+
+            # Get the error autoregressive parameters
+            params[self._params_error_transition] = (
+                    np.array(res_errors.params.T).ravel())
+
+            # Get the error covariance parameters
+            if self.error_cov_type == 'scalar':
+                params[self._params_error_cov] = (
+                    res_errors.sigma_u.diagonal().mean())
+            elif self.error_cov_type == 'diagonal':
+                params[self._params_error_cov] = res_errors.sigma_u.diagonal()
+            elif self.error_cov_type == 'unstructured':
+                try:
+                    cov_factor = np.linalg.cholesky(res_errors.sigma_u)
+                except np.linalg.LinAlgError:
+                    cov_factor = np.eye(res_errors.sigma_u.shape[0]) * (
+                        res_errors.sigma_u.diagonal().mean()**0.5)
+                params[self._params_error_cov] = (
+                    cov_factor[self._idx_lower_error_cov].ravel())
+        else:
+            error_ar_params = []
+            error_cov_params = []
+            for i in range(self.k_endog):
+                mod_error = ARIMA(endog[:, i], order=(self.error_order, 0, 0),
+                                  trend='n', enforce_stationarity=True)
+                res_error = mod_error.fit(method='burg')
+                error_ar_params += res_error.params[:self.error_order].tolist()
+                error_cov_params += res_error.params[-1:].tolist()
+
+            params[self._params_error_transition] = np.r_[error_ar_params]
+            params[self._params_error_cov] = np.r_[error_cov_params]
+
+        return params
+
+    @property
+    def param_names(self):
+        param_names = []
+        endog_names = self.endog_names
+
+        # 1. Factor loadings
+        param_names += [
+            'loading.f%d.%s' % (j+1, endog_names[i])
+            for i in range(self.k_endog)
+            for j in range(self.k_factors)
+        ]
+
+        # 2. Exog
+        # Recall these are in the form: beta.x1.y1, beta.x2.y1, beta.x1.y2, ...
+        param_names += [
+            'beta.%s.%s' % (self.exog_names[j], endog_names[i])
+            for i in range(self.k_endog)
+            for j in range(self.k_exog)
+        ]
+
+        # 3. Error covariances
+        if self.error_cov_type == 'scalar':
+            param_names += ['sigma2']
+        elif self.error_cov_type == 'diagonal':
+            param_names += [
+                'sigma2.%s' % endog_names[i]
+                for i in range(self.k_endog)
+            ]
+        elif self.error_cov_type == 'unstructured':
+            param_names += [
+                'cov.chol[%d,%d]' % (i + 1, j + 1)
+                for i in range(self.k_endog)
+                for j in range(i+1)
+            ]
+
+        # 4. Factor transition VAR
+        param_names += [
+            'L%d.f%d.f%d' % (i+1, k+1, j+1)
+            for j in range(self.k_factors)
+            for i in range(self.factor_order)
+            for k in range(self.k_factors)
+        ]
+
+        # 5. Error transition VAR
+        if self.error_var:
+            param_names += [
+                'L%d.e(%s).e(%s)' % (i+1, endog_names[k], endog_names[j])
+                for j in range(self.k_endog)
+                for i in range(self.error_order)
+                for k in range(self.k_endog)
+            ]
+        else:
+            param_names += [
+                'L%d.e(%s).e(%s)' % (i+1, endog_names[j], endog_names[j])
+                for j in range(self.k_endog)
+                for i in range(self.error_order)
+            ]
+
+        return param_names
+
+    @property
+    def state_names(self):
+        names = []
+        endog_names = self.endog_names
+
+        # Factors and lags
+        names += [
+            (('f%d' % (j + 1)) if i == 0 else ('f%d.L%d' % (j + 1, i)))
+            for i in range(max(1, self.factor_order))
+            for j in range(self.k_factors)]
+
+        if self.error_order > 0:
+            names += [
+                (('e(%s)' % endog_names[j]) if i == 0
+                 else ('e(%s).L%d' % (endog_names[j], i)))
+                for i in range(self.error_order)
+                for j in range(self.k_endog)]
+
+        if self._unused_state:
+            names += ['dummy']
+
+        return names

     def transform_params(self, unconstrained):
         """
@@ -221,7 +672,82 @@ class DynamicFactor(MLEModel):
         Constrains the factor transition to be stationary and variances to be
         positive.
         """
-        pass
+        unconstrained = np.array(unconstrained, ndmin=1)
+        dtype = unconstrained.dtype
+        constrained = np.zeros(unconstrained.shape, dtype=dtype)
+
+        # 1. Factor loadings
+        # The factor loadings do not need to be adjusted
+        constrained[self._params_loadings] = (
+            unconstrained[self._params_loadings])
+
+        # 2. Exog
+        # The regression coefficients do not need to be adjusted
+        constrained[self._params_exog] = (
+            unconstrained[self._params_exog])
+
+        # 3. Error covariances
+        # If we have variances, force them to be positive
+        if self.error_cov_type in ['scalar', 'diagonal']:
+            constrained[self._params_error_cov] = (
+                unconstrained[self._params_error_cov]**2)
+        # Otherwise, nothing needs to be done
+        elif self.error_cov_type == 'unstructured':
+            constrained[self._params_error_cov] = (
+                unconstrained[self._params_error_cov])
+
+        # 4. Factor transition VAR
+        # VAR transition: optionally force to be stationary
+        if self.enforce_stationarity and self.factor_order > 0:
+            # Transform the parameters
+            unconstrained_matrices = (
+                unconstrained[self._params_factor_transition].reshape(
+                    self.k_factors, self._factor_order))
+            # This is always an identity matrix, but because the transform is
+            # done prior to update (where the ssm representation matrices
+            # change), it may be complex
+            cov = self.ssm['state_cov', :self.k_factors, :self.k_factors].real
+            coefficient_matrices, variance = (
+                constrain_stationary_multivariate(unconstrained_matrices, cov))
+            constrained[self._params_factor_transition] = (
+                coefficient_matrices.ravel())
+        else:
+            constrained[self._params_factor_transition] = (
+                unconstrained[self._params_factor_transition])
+
+        # 5. Error transition VAR
+        # VAR transition: optionally force to be stationary
+        if self.enforce_stationarity and self.error_order > 0:
+
+            # Joint VAR specification
+            if self.error_var:
+                unconstrained_matrices = (
+                    unconstrained[self._params_error_transition].reshape(
+                        self.k_endog, self._error_order))
+                start = self.k_factors
+                end = self.k_factors + self.k_endog
+                cov = self.ssm['state_cov', start:end, start:end].real
+                coefficient_matrices, variance = (
+                    constrain_stationary_multivariate(
+                        unconstrained_matrices, cov))
+                constrained[self._params_error_transition] = (
+                    coefficient_matrices.ravel())
+            # Separate AR specifications
+            else:
+                coefficients = (
+                    unconstrained[self._params_error_transition].copy())
+                for i in range(self.k_endog):
+                    start = i * self.error_order
+                    end = (i + 1) * self.error_order
+                    coefficients[start:end] = constrain_stationary_univariate(
+                        coefficients[start:end])
+                constrained[self._params_error_transition] = coefficients
+
+        else:
+            constrained[self._params_error_transition] = (
+                unconstrained[self._params_error_transition])
+
+        return constrained

     def untransform_params(self, constrained):
         """
@@ -239,10 +765,114 @@ class DynamicFactor(MLEModel):
         unconstrained : array_like
             Array of unconstrained parameters used by the optimizer.
         """
-        pass
+        constrained = np.array(constrained, ndmin=1)
+        dtype = constrained.dtype
+        unconstrained = np.zeros(constrained.shape, dtype=dtype)
+
+        # 1. Factor loadings
+        # The factor loadings do not need to be adjusted
+        unconstrained[self._params_loadings] = (
+            constrained[self._params_loadings])
+
+        # 2. Exog
+        # The regression coefficients do not need to be adjusted
+        unconstrained[self._params_exog] = (
+            constrained[self._params_exog])
+
+        # 3. Error covariances
+        # If we have variances, force them to be positive
+        if self.error_cov_type in ['scalar', 'diagonal']:
+            unconstrained[self._params_error_cov] = (
+                constrained[self._params_error_cov]**0.5)
+        # Otherwise, nothing needs to be done
+        elif self.error_cov_type == 'unstructured':
+            unconstrained[self._params_error_cov] = (
+                constrained[self._params_error_cov])
+
+        # 4. Factor transition VAR
+        # VAR transition: optionally force to be stationary
+        if self.enforce_stationarity and self.factor_order > 0:
+            # Transform the parameters
+            constrained_matrices = (
+                constrained[self._params_factor_transition].reshape(
+                    self.k_factors, self._factor_order))
+            cov = self.ssm['state_cov', :self.k_factors, :self.k_factors].real
+            coefficient_matrices, variance = (
+                unconstrain_stationary_multivariate(
+                    constrained_matrices, cov))
+            unconstrained[self._params_factor_transition] = (
+                coefficient_matrices.ravel())
+        else:
+            unconstrained[self._params_factor_transition] = (
+                constrained[self._params_factor_transition])
+
+        # 5. Error transition VAR
+        # VAR transition: optionally force to be stationary
+        if self.enforce_stationarity and self.error_order > 0:
+
+            # Joint VAR specification
+            if self.error_var:
+                constrained_matrices = (
+                    constrained[self._params_error_transition].reshape(
+                        self.k_endog, self._error_order))
+                start = self.k_factors
+                end = self.k_factors + self.k_endog
+                cov = self.ssm['state_cov', start:end, start:end].real
+                coefficient_matrices, variance = (
+                    unconstrain_stationary_multivariate(
+                        constrained_matrices, cov))
+                unconstrained[self._params_error_transition] = (
+                    coefficient_matrices.ravel())
+            # Separate AR specifications
+            else:
+                coefficients = (
+                    constrained[self._params_error_transition].copy())
+                for i in range(self.k_endog):
+                    start = i * self.error_order
+                    end = (i + 1) * self.error_order
+                    coefficients[start:end] = (
+                        unconstrain_stationary_univariate(
+                            coefficients[start:end]))
+                unconstrained[self._params_error_transition] = coefficients
+
+        else:
+            unconstrained[self._params_error_transition] = (
+                constrained[self._params_error_transition])
+
+        return unconstrained
+
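
For illustration (not part of the patch): the constraining transformation implemented above can be exercised directly on an unfitted model. The data and settings below are hypothetical; this is a minimal sketch only.

import numpy as np
from statsmodels.tsa.statespace.dynamic_factor import DynamicFactor

# Hypothetical data: 3 observed series, 200 observations
y = np.random.default_rng(0).standard_normal((200, 3))
mod = DynamicFactor(y, k_factors=1, factor_order=2, error_order=1)

# Map an arbitrary optimizer-space vector into the constrained space:
# variances become positive, AR/VAR blocks become stationary.
x = np.random.default_rng(1).standard_normal(mod.k_params)
constrained = mod.transform_params(x)

# The two transformations are (numerical) inverses on valid parameter vectors.
roundtrip = mod.transform_params(mod.untransform_params(constrained))
print(np.allclose(roundtrip, constrained))  # expected: True
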
+    def _validate_can_fix_params(self, param_names):
+        super(DynamicFactor, self)._validate_can_fix_params(param_names)
+
+        ix = np.cumsum(list(self.parameters.values()))[:-1]
+        (_, _, _, factor_transition_names, error_transition_names) = [
+            arr.tolist() for arr in np.array_split(self.param_names, ix)]
+
+        if self.enforce_stationarity and self.factor_order > 0:
+            if self.k_factors > 1 or self.factor_order > 1:
+                fix_all = param_names.issuperset(factor_transition_names)
+                fix_any = (
+                    len(param_names.intersection(factor_transition_names)) > 0)
+                if fix_any and not fix_all:
+                    raise ValueError(
+                        'Cannot fix individual factor transition parameters'
+                        ' when `enforce_stationarity=True`. In this case,'
+                        ' must either fix all factor transition parameters or'
+                        ' none.')
+        if self.enforce_stationarity and self.error_order > 0:
+            if self.error_var or self.error_order > 1:
+                fix_all = param_names.issuperset(error_transition_names)
+                fix_any = (
+                    len(param_names.intersection(error_transition_names)) > 0)
+                if fix_any and not fix_all:
+                    raise ValueError(
+                        'Cannot fix individual error transition parameters'
+                        ' when `enforce_stationarity=True`. In this case,'
+                        ' must either fix all error transition parameters or'
+                        ' none.')

     def update(self, params, transformed=True, includes_fixed=False,
-        complex_step=False):
+               complex_step=False):
         """
         Update the parameters of the model

@@ -266,17 +896,17 @@ class DynamicFactor(MLEModel):
         -----
         Let `n = k_endog`, `m = k_factors`, and `p = factor_order`. Then the
         `params` vector has length
-        :math:`[n  imes m] + [n] + [m^2    imes p]`.
+        :math:`[n \times m] + [n] + [m^2 \times p]`.
         It is expanded in the following way:

-        - The first :math:`n   imes m` parameters fill out the factor loading
+        - The first :math:`n \times m` parameters fill out the factor loading
           matrix, starting from the [0,0] entry and then proceeding along rows.
           These parameters are not modified in `transform_params`.
         - The next :math:`n` parameters provide variances for the error_cov
           errors in the observation equation. They fill in the diagonal of the
           observation covariance matrix, and are constrained to be positive by
           `transform_params`.
-        - The next :math:`m^2  imes p` parameters are used to create the `p`
+        - The next :math:`m^2 \times p` parameters are used to create the `p`
           coefficient matrices for the vector autoregression describing the
           factor transition. They are transformed in `transform_params` to
           enforce stationarity of the VAR(p). They are placed so as to make
@@ -285,7 +915,46 @@ class DynamicFactor(MLEModel):
           coefficient matrix (starting at [0,0] and filling along rows), the
           second :math:`m^2` parameters fill the second matrix, etc.
         """
-        pass
+        params = self.handle_params(params, transformed=transformed,
+                                    includes_fixed=includes_fixed)
+
+        # 1. Factor loadings
+        # Update the design / factor loading matrix
+        self.ssm[self._idx_loadings] = (
+            params[self._params_loadings].reshape(self.k_endog, self.k_factors)
+        )
+
+        # 2. Exog
+        if self.k_exog > 0:
+            exog_params = params[self._params_exog].reshape(
+                self.k_endog, self.k_exog).T
+            self.ssm[self._idx_exog] = np.dot(self.exog, exog_params).T
+
+        # 3. Error covariances
+        if self.error_cov_type in ['scalar', 'diagonal']:
+            self.ssm[self._idx_error_cov] = (
+                params[self._params_error_cov])
+        elif self.error_cov_type == 'unstructured':
+            error_cov_lower = np.zeros((self.k_endog, self.k_endog),
+                                       dtype=params.dtype)
+            error_cov_lower[self._idx_lower_error_cov] = (
+                params[self._params_error_cov])
+            self.ssm[self._idx_error_cov] = (
+                np.dot(error_cov_lower, error_cov_lower.T))
+
+        # 4. Factor transition VAR
+        self.ssm[self._idx_factor_transition] = (
+            params[self._params_factor_transition].reshape(
+                self.k_factors, self.factor_order * self.k_factors))
+
+        # 5. Error transition VAR
+        if self.error_var:
+            self.ssm[self._idx_error_transition] = (
+                params[self._params_error_transition].reshape(
+                    self.k_endog, self._error_order))
+        else:
+            self.ssm[self._idx_error_transition] = (
+                params[self._params_error_transition])
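
For illustration (not part of the patch): the parameter layout described in the Notes of `update` can be checked on a small hypothetical model with the default `error_cov_type='diagonal'`, no exog, and white-noise errors; a minimal sketch only.

import numpy as np
from statsmodels.tsa.statespace.dynamic_factor import DynamicFactor

# Hypothetical data: n = 3 observed series
y = np.random.default_rng(0).standard_normal((100, 3))
mod = DynamicFactor(y, k_factors=1, factor_order=2)  # m = 1 factor, p = 2 lags

# [n x m] loadings + [n] variances + [m^2 x p] VAR coefficients = 3 + 3 + 2
print(mod.k_params)     # expected: 8
print(mod.param_names)  # loading.f1.y1, ..., sigma2.y1, ..., L1.f1.f1, L2.f1.f1
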


 class DynamicFactorResults(MLEResults):
@@ -311,41 +980,58 @@ class DynamicFactorResults(MLEResults):
     statsmodels.tsa.statespace.kalman_filter.FilterResults
     statsmodels.tsa.statespace.mlemodel.MLEResults
     """
-
-    def __init__(self, model, params, filter_results, cov_type=None, **kwargs):
+    def __init__(self, model, params, filter_results, cov_type=None,
+                 **kwargs):
         super(DynamicFactorResults, self).__init__(model, params,
-            filter_results, cov_type, **kwargs)
-        self.df_resid = np.inf
-        self.specification = Bunch(**{'k_endog': self.model.k_endog,
+                                                   filter_results, cov_type,
+                                                   **kwargs)
+
+        self.df_resid = np.inf  # attribute required for Wald tests
+
+        self.specification = Bunch(**{
+            # Model properties
+            'k_endog': self.model.k_endog,
             'enforce_stationarity': self.model.enforce_stationarity,
-            'k_factors': self.model.k_factors, 'factor_order': self.model.
-            factor_order, 'error_order': self.model.error_order,
-            'error_var': self.model.error_var, 'error_cov_type': self.model
-            .error_cov_type, 'k_exog': self.model.k_exog})
+
+            # Factor-related properties
+            'k_factors': self.model.k_factors,
+            'factor_order': self.model.factor_order,
+
+            # Error-related properties
+            'error_order': self.model.error_order,
+            'error_var': self.model.error_var,
+            'error_cov_type': self.model.error_cov_type,
+
+            # Other properties
+            'k_exog': self.model.k_exog
+        })
+
+        # Polynomials / coefficient matrices
         self.coefficient_matrices_var = None
         if self.model.factor_order > 0:
-            ar_params = np.array(self.params[self.model.
-                _params_factor_transition])
+            ar_params = (
+                np.array(self.params[self.model._params_factor_transition]))
             k_factors = self.model.k_factors
             factor_order = self.model.factor_order
-            self.coefficient_matrices_var = ar_params.reshape(k_factors *
-                factor_order, k_factors).T.reshape(k_factors, k_factors,
-                factor_order).T
+            self.coefficient_matrices_var = (
+                ar_params.reshape(k_factors * factor_order, k_factors).T
+            ).reshape(k_factors, k_factors, factor_order).T
+
         self.coefficient_matrices_error = None
         if self.model.error_order > 0:
-            ar_params = np.array(self.params[self.model.
-                _params_error_transition])
+            ar_params = (
+                np.array(self.params[self.model._params_error_transition]))
             k_endog = self.model.k_endog
             error_order = self.model.error_order
             if self.model.error_var:
-                self.coefficient_matrices_error = ar_params.reshape(k_endog *
-                    error_order, k_endog).T.reshape(k_endog, k_endog,
-                    error_order).T
+                self.coefficient_matrices_error = (
+                    ar_params.reshape(k_endog * error_order, k_endog).T
+                ).reshape(k_endog, k_endog, error_order).T
             else:
                 mat = np.zeros((k_endog, k_endog * error_order))
                 mat[self.model._idx_error_diag] = ar_params
-                self.coefficient_matrices_error = mat.T.reshape(error_order,
-                    k_endog, k_endog)
+                self.coefficient_matrices_error = (
+                    mat.T.reshape(error_order, k_endog, k_endog))

     @property
     def factors(self):
@@ -372,7 +1058,24 @@ class DynamicFactorResults(MLEResults):
         - `offset`: an integer giving the offset in the state vector where
           this component begins
         """
-        pass
+        # If present, the factors are always the first states in the vector
+        out = None
+        spec = self.specification
+        if spec.k_factors > 0:
+            offset = 0
+            end = spec.k_factors
+            res = self.filter_results
+            out = Bunch(
+                filtered=res.filtered_state[offset:end],
+                filtered_cov=res.filtered_state_cov[offset:end, offset:end],
+                smoothed=None, smoothed_cov=None,
+                offset=offset)
+            if self.smoothed_state is not None:
+                out.smoothed = self.smoothed_state[offset:end]
+            if self.smoothed_state_cov is not None:
+                out.smoothed_cov = (
+                    self.smoothed_state_cov[offset:end, offset:end])
+        return out

     @cache_readonly
     def coefficients_of_determination(self):
@@ -403,10 +1106,21 @@ class DynamicFactorResults(MLEResults):
         --------
         plot_coefficients_of_determination
         """
-        pass
+        from statsmodels.tools import add_constant
+        spec = self.specification
+        coefficients = np.zeros((spec.k_endog, spec.k_factors))
+        which = 'filtered' if self.smoothed_state is None else 'smoothed'
+
+        for i in range(spec.k_factors):
+            exog = add_constant(self.factors[which][i])
+            for j in range(spec.k_endog):
+                endog = self.filter_results.endog[j]
+                coefficients[j, i] = OLS(endog, exog).fit().rsquared

-    def plot_coefficients_of_determination(self, endog_labels=None, fig=
-        None, figsize=None):
+        return coefficients
+
+    def plot_coefficients_of_determination(self, endog_labels=None,
+                                           fig=None, figsize=None):
         """
         Plot the coefficients of determination

@@ -436,14 +1150,182 @@ class DynamicFactorResults(MLEResults):
         --------
         coefficients_of_determination
         """
-        pass
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+        _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+
+        spec = self.specification
+
+        # Should we label endogenous variables?
+        if endog_labels is None:
+            endog_labels = spec.k_endog <= 5
+
+        # Plot the coefficients of determination
+        coefficients_of_determination = self.coefficients_of_determination
+        plot_idx = 1
+        locations = np.arange(spec.k_endog)
+        for coeffs in coefficients_of_determination.T:
+            # Create the new axis
+            ax = fig.add_subplot(spec.k_factors, 1, plot_idx)
+            ax.set_ylim((0, 1))
+            ax.set(title='Factor %i' % plot_idx, ylabel=r'$R^2$')
+            bars = ax.bar(locations, coeffs)
+
+            if endog_labels:
+                width = bars[0].get_width()
+                ax.xaxis.set_ticks(locations + width / 2)
+                ax.xaxis.set_ticklabels(self.model.endog_names)
+            else:
+                ax.set(xlabel='Endogenous variables')
+                ax.xaxis.set_ticks([])
+
+            plot_idx += 1
+
+        return fig
+
+    @Appender(MLEResults.summary.__doc__)
+    def summary(self, alpha=.05, start=None, separate_params=True):
+        from statsmodels.iolib.summary import summary_params
+        spec = self.specification
+
+        # Create the model name
+        model_name = []
+        if spec.k_factors > 0:
+            if spec.factor_order > 0:
+                model_type = ('DynamicFactor(factors=%d, order=%d)' %
+                              (spec.k_factors, spec.factor_order))
+            else:
+                model_type = 'StaticFactor(factors=%d)' % spec.k_factors
+
+            model_name.append(model_type)
+            if spec.k_exog > 0:
+                model_name.append('%d regressors' % spec.k_exog)
+        else:
+            model_name.append('SUR(%d regressors)' % spec.k_exog)
+
+        if spec.error_order > 0:
+            error_type = 'VAR' if spec.error_var else 'AR'
+            model_name.append('%s(%d) errors' % (error_type, spec.error_order))
+
+        summary = super(DynamicFactorResults, self).summary(
+            alpha=alpha, start=start, model_name=model_name,
+            display_params=not separate_params
+        )
+
+        if separate_params:
+            indices = np.arange(len(self.params))
+
+            def make_table(self, mask, title, strip_end=True):
+                res = (self, self.params[mask], self.bse[mask],
+                       self.zvalues[mask], self.pvalues[mask],
+                       self.conf_int(alpha)[mask])
+
+                param_names = [
+                    '.'.join(name.split('.')[:-1]) if strip_end else name
+                    for name in
+                    np.array(self.data.param_names)[mask].tolist()
+                ]
+
+                return summary_params(res, yname=None, xname=param_names,
+                                      alpha=alpha, use_t=False, title=title)
+
+            k_endog = self.model.k_endog
+            k_exog = self.model.k_exog
+            k_factors = self.model.k_factors
+            factor_order = self.model.factor_order
+            _factor_order = self.model._factor_order
+            _error_order = self.model._error_order
+
+            # Add parameter tables for each endogenous variable
+            loading_indices = indices[self.model._params_loadings]
+            loading_masks = []
+            exog_indices = indices[self.model._params_exog]
+            exog_masks = []
+            for i in range(k_endog):
+                # 1. Factor loadings
+                # Recall these are in the form:
+                # 'loading.f1.y1', 'loading.f2.y1', 'loading.f1.y2', ...
+
+                loading_mask = (
+                    loading_indices[i * k_factors:(i + 1) * k_factors])
+                loading_masks.append(loading_mask)
+
+                # 2. Exog
+                # Recall these are in the form:
+                # beta.x1.y1, beta.x2.y1, beta.x1.y2, ...
+                exog_mask = exog_indices[i * k_exog:(i + 1) * k_exog]
+                exog_masks.append(exog_mask)
+
+                # Create the table
+                mask = np.concatenate([loading_mask, exog_mask])
+                title = "Results for equation %s" % self.model.endog_names[i]
+                table = make_table(self, mask, title)
+                summary.tables.append(table)
+
+            # Add parameter tables for each factor
+            factor_indices = indices[self.model._params_factor_transition]
+            factor_masks = []
+            if factor_order > 0:
+                for i in range(k_factors):
+                    start = i * _factor_order
+                    factor_mask = factor_indices[start: start + _factor_order]
+                    factor_masks.append(factor_mask)
+
+                    # Create the table
+                    title = "Results for factor equation f%d" % (i+1)
+                    table = make_table(self, factor_mask, title)
+                    summary.tables.append(table)
+
+            # Add parameter tables for error transitions
+            error_masks = []
+            if spec.error_order > 0:
+                error_indices = indices[self.model._params_error_transition]
+                for i in range(k_endog):
+                    if spec.error_var:
+                        start = i * _error_order
+                        end = (i + 1) * _error_order
+                    else:
+                        start = i * spec.error_order
+                        end = (i + 1) * spec.error_order
+
+                    error_mask = error_indices[start:end]
+                    error_masks.append(error_mask)
+
+                    # Create the table
+                    title = ("Results for error equation e(%s)" %
+                             self.model.endog_names[i])
+                    table = make_table(self, error_mask, title)
+                    summary.tables.append(table)
+
+            # Error covariance terms
+            error_cov_mask = indices[self.model._params_error_cov]
+            table = make_table(self, error_cov_mask,
+                               "Error covariance matrix", strip_end=False)
+            summary.tables.append(table)
+
+            # Add a table for all other parameters
+            masks = []
+            for m in (loading_masks, exog_masks, factor_masks,
+                      error_masks, [error_cov_mask]):
+                m = np.array(m).flatten()
+                if len(m) > 0:
+                    masks.append(m)
+            masks = np.concatenate(masks)
+            inverse_mask = np.array(list(set(indices).difference(set(masks))))
+            if len(inverse_mask) > 0:
+                table = make_table(self, inverse_mask, "Other parameters",
+                                   strip_end=False)
+                summary.tables.append(table)
+
+        return summary


 class DynamicFactorResultsWrapper(MLEResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs, _attrs)
+    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs,
+                                   _attrs)
     _methods = {}
-    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods, _methods)
-
-
-wrap.populate_wrapper(DynamicFactorResultsWrapper, DynamicFactorResults)
+    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods,
+                                     _methods)
+wrap.populate_wrapper(DynamicFactorResultsWrapper,  # noqa:E305
+                      DynamicFactorResults)
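
A brief, illustrative usage sketch of the DynamicFactor model from the file above (data, seed, and option values are hypothetical, not taken from the patch or the test suite):

import numpy as np
from statsmodels.tsa.statespace.dynamic_factor import DynamicFactor

# Simulate 3 series driven by a single AR(1) factor (hypothetical example data)
rng = np.random.default_rng(12345)
nobs = 200
factor = np.zeros(nobs)
for t in range(1, nobs):
    factor[t] = 0.8 * factor[t - 1] + rng.standard_normal()
loadings = np.array([1.0, 0.7, -0.5])
y = factor[:, None] * loadings + 0.5 * rng.standard_normal((nobs, 3))

mod = DynamicFactor(y, k_factors=1, factor_order=1,
                    error_order=0, error_cov_type='diagonal')
res = mod.fit(disp=False)

print(res.summary())                    # per-equation and per-factor tables
smoothed = res.factors.smoothed         # estimated factor (Bunch from `factors`)
r2 = res.coefficients_of_determination  # (k_endog, k_factors) array of R^2
# fig = res.plot_coefficients_of_determination()  # requires matplotlib
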
diff --git a/statsmodels/tsa/statespace/dynamic_factor_mq.py b/statsmodels/tsa/statespace/dynamic_factor_mq.py
index 3b9789775..1a58e0a01 100644
--- a/statsmodels/tsa/statespace/dynamic_factor_mq.py
+++ b/statsmodels/tsa/statespace/dynamic_factor_mq.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Dynamic factor model.

@@ -6,15 +7,18 @@ License: BSD-3
 """
 from collections import OrderedDict
 from warnings import warn
+
 import numpy as np
 import pandas as pd
 from scipy.linalg import cho_factor, cho_solve, LinAlgError
+
 from statsmodels.tools.data import _is_using_pandas
 from statsmodels.tools.validation import int_like
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.regression.linear_model import OLS
 from statsmodels.genmod.generalized_linear_model import GLM
 from statsmodels.multivariate.pca import PCA
+
 from statsmodels.tsa.statespace.sarimax import SARIMAX
 from statsmodels.tsa.statespace._quarterly_ar1 import QuarterlyAR1
 from statsmodels.tsa.vector_ar.var_model import VAR
@@ -22,9 +26,14 @@ from statsmodels.tools.tools import Bunch
 from statsmodels.tools.validation import string_like
 from statsmodels.tsa.tsatools import lagmat
 from statsmodels.tsa.statespace import mlemodel, initialization
-from statsmodels.tsa.statespace.tools import companion_matrix, is_invertible, constrain_stationary_univariate, constrain_stationary_multivariate, unconstrain_stationary_univariate, unconstrain_stationary_multivariate
-from statsmodels.tsa.statespace.kalman_smoother import SMOOTHER_STATE, SMOOTHER_STATE_COV, SMOOTHER_STATE_AUTOCOV
+from statsmodels.tsa.statespace.tools import (
+    companion_matrix, is_invertible, constrain_stationary_univariate,
+    constrain_stationary_multivariate, unconstrain_stationary_univariate,
+    unconstrain_stationary_multivariate)
+from statsmodels.tsa.statespace.kalman_smoother import (
+    SMOOTHER_STATE, SMOOTHER_STATE_COV, SMOOTHER_STATE_AUTOCOV)
 from statsmodels.base.data import PandasData
+
 from statsmodels.iolib.table import SimpleTable
 from statsmodels.iolib.summary import Summary
 from statsmodels.iolib.tableformatting import fmt_params
@@ -69,7 +78,7 @@ class FactorBlock(dict):
     """

     def __init__(self, factor_names, factor_order, endog_factor_map,
-        state_offset, k_endog_Q):
+                 state_offset, k_endog_Q):
         self.factor_names = factor_names
         self.k_factors = len(self.factor_names)
         self.factor_order = factor_order
@@ -81,6 +90,8 @@ class FactorBlock(dict):
         else:
             self._factor_order = self.factor_order
         self.k_states = self.k_factors * self._factor_order
+
+        # Save items
         self['factors'] = self.factors
         self['factors_ar'] = self.factors_ar
         self['factors_ix'] = self.factors_ix
@@ -90,27 +101,39 @@ class FactorBlock(dict):
     @property
     def factors_ix(self):
         """Factor state index array, shaped (k_factors, lags)."""
-        pass
+        # i.e. the position in the state vector of the second lag of the third
+        # factor is factors_ix[2, 1]
+        # ravel(order='F') gives e.g. (f0.L1, f1.L1, f0.L2, f1.L2, f0.L3, ...)
+        # while
+        # ravel(order='C') gives e.g. (f0.L1, f0.L2, f0.L3, f1.L1, f1.L2, ...)
+        o = self.state_offset
+        return np.reshape(o + np.arange(self.k_factors * self._factor_order),
+                          (self._factor_order, self.k_factors)).T

     @property
     def factors(self):
         """Factors and all lags in the state vector (max(5, p))."""
-        pass
+        # Note that this is equivalent to factors_ix with ravel(order='F')
+        o = self.state_offset
+        return np.s_[o:o + self.k_factors * self._factor_order]

     @property
     def factors_ar(self):
         """Factors and all lags used in the factor autoregression (p)."""
-        pass
+        o = self.state_offset
+        return np.s_[o:o + self.k_factors * self.factor_order]

     @property
     def factors_L1(self):
         """Factors (first block / lag only)."""
-        pass
+        o = self.state_offset
+        return np.s_[o:o + self.k_factors]

     @property
     def factors_L1_5(self):
         """Factors plus four lags."""
-        pass
+        o = self.state_offset
+        return np.s_[o:o + self.k_factors * 5]


 class DynamicFactorMQStates(dict):
@@ -294,34 +317,43 @@ class DynamicFactorMQStates(dict):
     """

     def __init__(self, k_endog_M, k_endog_Q, endog_names, factors,
-        factor_orders, factor_multiplicities, idiosyncratic_ar1):
+                 factor_orders, factor_multiplicities, idiosyncratic_ar1):
+        # Save model parameterization
         self.k_endog_M = k_endog_M
         self.k_endog_Q = k_endog_Q
         self.k_endog = self.k_endog_M + self.k_endog_Q
         self.idiosyncratic_ar1 = idiosyncratic_ar1
+
+        # Validate factor-related inputs
         factors_is_int = np.issubdtype(type(factors), np.integer)
         factors_is_list = isinstance(factors, (list, tuple))
         orders_is_int = np.issubdtype(type(factor_orders), np.integer)
         if factor_multiplicities is None:
             factor_multiplicities = 1
         mult_is_int = np.issubdtype(type(factor_multiplicities), np.integer)
-        if not (factors_is_int or factors_is_list or isinstance(factors, dict)
-            ):
-            raise ValueError(
-                '`factors` argument must an integer number of factors, a list of global factor names, or a dictionary, mapping observed variables to factors.'
-                )
+
+        if not (factors_is_int or factors_is_list or
+                isinstance(factors, dict)):
+            raise ValueError('`factors` argument must be an integer number'
+                             ' of factors, a list of global factor names, or'
+                             ' a dictionary mapping observed variables to'
+                             ' factors.')
         if not (orders_is_int or isinstance(factor_orders, dict)):
-            raise ValueError(
-                '`factor_orders` argument must either be an integer or a dictionary.'
-                )
+            raise ValueError('`factor_orders` argument must either be an'
+                             ' integer or a dictionary.')
         if not (mult_is_int or isinstance(factor_multiplicities, dict)):
-            raise ValueError(
-                '`factor_multiplicities` argument must either be an integer or a dictionary.'
-                )
+            raise ValueError('`factor_multiplicities` argument must either be'
+                             ' an integer or a dictionary.')
+
+        # Expand integers
+        # If `factors` is an integer, we assume that it denotes the number of
+        # global factors (factors that load on each variable)
         if factors_is_int or factors_is_list:
-            if factors_is_int and factors == 0 or factors_is_list and len(
-                factors) == 0:
+            # Validate this here for a more informative error message
+            if ((factors_is_int and factors == 0) or
+                    (factors_is_list and len(factors) == 0)):
                 raise ValueError('The model must contain at least one factor.')
+
             if factors_is_list:
                 factor_names = list(factors)
             else:
@@ -332,87 +364,134 @@ class DynamicFactorMQStates(dict):
             _factor_names.extend(val)
         factor_names = set(_factor_names)
         if orders_is_int:
-            factor_orders = {factor_name: factor_orders for factor_name in
-                factor_names}
+            factor_orders = {factor_name: factor_orders
+                             for factor_name in factor_names}
         if mult_is_int:
-            factor_multiplicities = {factor_name: factor_multiplicities for
-                factor_name in factor_names}
-        factors, factor_orders = self._apply_factor_multiplicities(factors,
-            factor_orders, factor_multiplicities)
+            factor_multiplicities = {factor_name: factor_multiplicities
+                                     for factor_name in factor_names}
+
+        # Apply the factor multiplicities
+        factors, factor_orders = self._apply_factor_multiplicities(
+            factors, factor_orders, factor_multiplicities)
+
+        # Save the (potentially expanded) variables
         self.factors = factors
         self.factor_orders = factor_orders
         self.factor_multiplicities = factor_multiplicities
-        self.endog_factor_map = self._construct_endog_factor_map(factors,
-            endog_names)
+
+        # Get the mapping between endog and factors
+        self.endog_factor_map = self._construct_endog_factor_map(
+            factors, endog_names)
         self.k_factors = self.endog_factor_map.shape[1]
+
+        # Validate number of factors
+        # TODO: could do more extensive validation here.
         if self.k_factors > self.k_endog_M:
-            raise ValueError(
-                f'Number of factors ({self.k_factors}) cannot be greater than the number of monthly endogenous variables ({self.k_endog_M}).'
-                )
-        self.loading_counts = self.endog_factor_map.sum(axis=0).rename('count'
-            ).reset_index().sort_values(['count', 'factor'], ascending=[
-            False, True]).set_index('factor')
-        block_loading_counts = {block: np.atleast_1d(self.loading_counts.
-            loc[list(block), 'count']).mean(axis=0) for block in
-            factor_orders.keys()}
+            raise ValueError(f'Number of factors ({self.k_factors}) cannot be'
+                             ' greater than the number of monthly endogenous'
+                             f' variables ({self.k_endog_M}).')
+
+        # Get `loading_counts`: factor -> # endog loading on the factor
+        self.loading_counts = (
+            self.endog_factor_map.sum(axis=0).rename('count')
+                .reset_index().sort_values(['count', 'factor'],
+                                           ascending=[False, True])
+                .set_index('factor'))
+        # `block_loading_counts`: block -> average of (# loading on factor)
+        # across each factor in the block
+        block_loading_counts = {
+            block: np.atleast_1d(
+                self.loading_counts.loc[list(block), 'count']).mean(axis=0)
+            for block in factor_orders.keys()}
         ix = pd.Index(block_loading_counts.keys(), tupleize_cols=False,
-            name='block')
-        self.block_loading_counts = pd.Series(list(block_loading_counts.
-            values()), index=ix, name='count').to_frame().sort_values([
-            'count', 'block'], ascending=[False, True])['count']
+                      name='block')
+        self.block_loading_counts = pd.Series(
+            list(block_loading_counts.values()),
+            index=ix, name='count').to_frame().sort_values(
+                ['count', 'block'], ascending=[False, True])['count']
+
+        # Get the mapping between factor blocks and VAR order
+
+        # `factor_block_orders`: pd.Series of factor block -> lag order
         ix = pd.Index(factor_orders.keys(), tupleize_cols=False, name='block')
-        self.factor_block_orders = pd.Series(list(factor_orders.values()),
-            index=ix, name='order')
+        self.factor_block_orders = pd.Series(
+            list(factor_orders.values()), index=ix, name='order')
+
+        # If the `factor_orders` variable was an integer, then it did not
+        # define an ordering for the factor blocks. In this case, we use the
+        # loading counts to do so. This ensures that e.g. global factors are
+        # listed first.
         if orders_is_int:
             keys = self.block_loading_counts.keys()
             self.factor_block_orders = self.factor_block_orders.loc[keys]
             self.factor_block_orders.index.name = 'block'
-        factor_names = pd.Series(np.concatenate(list(self.
-            factor_block_orders.index)))
-        missing = [name for name in self.endog_factor_map.columns if name
-             not in factor_names.tolist()]
+
+        # Define factor_names based on factor_block_orders (instead of on those
+        # from `endog_factor_map`) to (a) make sure that factors are allocated
+        # to only one block, and (b) order the factor names to be consistent
+        # with the block definitions.
+        factor_names = pd.Series(
+            np.concatenate(list(self.factor_block_orders.index)))
+        missing = [name for name in self.endog_factor_map.columns
+                   if name not in factor_names.tolist()]
         if len(missing):
             ix = pd.Index([(factor_name,) for factor_name in missing],
-                tupleize_cols=False, name='block')
+                          tupleize_cols=False, name='block')
             default_block_orders = pd.Series(np.ones(len(ix), dtype=int),
-                index=ix, name='order')
-            self.factor_block_orders = self.factor_block_orders.append(
-                default_block_orders)
-            factor_names = pd.Series(np.concatenate(list(self.
-                factor_block_orders.index)))
+                                             index=ix, name='order')
+            self.factor_block_orders = (
+                self.factor_block_orders.append(default_block_orders))
+            factor_names = pd.Series(
+                np.concatenate(list(self.factor_block_orders.index)))
         duplicates = factor_names.duplicated()
         if duplicates.any():
             duplicate_names = set(factor_names[duplicates])
-            raise ValueError(
-                f'Each factor can be assigned to at most one block of factors in `factor_orders`. Duplicate entries for {duplicate_names}'
-                )
+            raise ValueError('Each factor can be assigned to at most one'
+                             ' block of factors in `factor_orders`.'
+                             f' Duplicate entries for {duplicate_names}')
         self.factor_names = factor_names.tolist()
         self.max_factor_order = np.max(self.factor_block_orders)
-        self.endog_factor_map = self.endog_factor_map.loc[endog_names,
-            factor_names]
+
+        # Re-order the columns of the endog factor mapping to reflect the
+        # orderings of endog_names and factor_names
+        self.endog_factor_map = (
+            self.endog_factor_map.loc[endog_names, factor_names])
+
+        # Create factor block helpers, and get factor-related state and posdef
+        # dimensions
         self.k_states_factors = 0
         self.k_posdef_factors = 0
         state_offset = 0
         self.factor_blocks = []
         for factor_names, factor_order in self.factor_block_orders.items():
-            block = FactorBlock(factor_names, factor_order, self.
-                endog_factor_map, state_offset, self.k_endog_Q)
+            block = FactorBlock(factor_names, factor_order,
+                                self.endog_factor_map, state_offset,
+                                self.k_endog_Q)
             self.k_states_factors += block.k_states
             self.k_posdef_factors += block.k_factors
             state_offset += block.k_states
+
             self.factor_blocks.append(block)
+
+        # Idiosyncratic state dimensions
         self.k_states_idio_M = self.k_endog_M if idiosyncratic_ar1 else 0
         self.k_states_idio_Q = self.k_endog_Q * 5
         self.k_states_idio = self.k_states_idio_M + self.k_states_idio_Q
+
+        # Idiosyncratic posdef dimensions
         self.k_posdef_idio_M = self.k_endog_M if self.idiosyncratic_ar1 else 0
         self.k_posdef_idio_Q = self.k_endog_Q
         self.k_posdef_idio = self.k_posdef_idio_M + self.k_posdef_idio_Q
+
+        # Total states, posdef
         self.k_states = self.k_states_factors + self.k_states_idio
         self.k_posdef = self.k_posdef_factors + self.k_posdef_idio
+
+        # Cache
         self._endog_factor_iloc = None

     def _apply_factor_multiplicities(self, factors, factor_orders,
-        factor_multiplicities):
+                                     factor_multiplicities):
         """
         Expand `factors` and `factor_orders` to account for factor multiplicity.

@@ -438,7 +517,35 @@ class DynamicFactorMQStates(dict):
             Dictionary of {tuple of factor names: factor order}, with factor
             names in each tuple expanded to incorporate multiplicities.
         """
-        pass
+        # Expand the factors to account for the multiplicities
+        new_factors = {}
+        for endog_name, factors_list in factors.items():
+            new_factor_list = []
+            for factor_name in factors_list:
+                n = factor_multiplicities.get(factor_name, 1)
+                if n > 1:
+                    new_factor_list += [f'{factor_name}.{i + 1}'
+                                        for i in range(n)]
+                else:
+                    new_factor_list.append(factor_name)
+            new_factors[endog_name] = new_factor_list
+
+        # Expand the factor orders to account for the multiplicities
+        new_factor_orders = {}
+        for block, factor_order in factor_orders.items():
+            if not isinstance(block, tuple):
+                block = (block,)
+            new_block = []
+            for factor_name in block:
+                n = factor_multiplicities.get(factor_name, 1)
+                if n > 1:
+                    new_block += [f'{factor_name}.{i + 1}'
+                                  for i in range(n)]
+                else:
+                    new_block += [factor_name]
+            new_factor_orders[tuple(new_block)] = factor_order
+
+        return new_factors, new_factor_orders
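
In words, each factor name with multiplicity n > 1 is replaced by n numbered copies ('name.1', ..., 'name.n') in both the `factors` map and the `factor_orders` blocks. A minimal, self-contained sketch of that name expansion (the `expand_names` helper and the example names are illustrative only, not part of the library):

    def expand_names(names, multiplicities):
        # Replace each name having multiplicity n > 1 with n numbered copies
        out = []
        for name in names:
            n = multiplicities.get(name, 1)
            out += [f'{name}.{i + 1}' for i in range(n)] if n > 1 else [name]
        return out

    factors = {'gdp': ['global'], 'cpi': ['global', 'prices']}
    multiplicities = {'global': 2}
    expanded = {k: expand_names(v, multiplicities) for k, v in factors.items()}
    # expanded == {'gdp': ['global.1', 'global.2'],
    #              'cpi': ['global.1', 'global.2', 'prices']}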

     def _construct_endog_factor_map(self, factors, endog_names):
         """
@@ -460,42 +567,117 @@ class DynamicFactorMQStates(dict):
             associated observed variable.

         """
-        pass
+        # Validate that all entries in the factors dictionary have associated
+        # factors
+        missing = []
+        for key, value in factors.items():
+            if not isinstance(value, (list, tuple)) or len(value) == 0:
+                missing.append(key)
+        if len(missing):
+            raise ValueError('Each observed variable must be mapped to at'
+                             ' least one factor in the `factors` dictionary.'
+                             f' Variables missing factors are: {missing}.')
+
+        # Validate that we have been told about the factors for each endog
+        # variable. This is because it doesn't make sense to include an
+        # observed variable that doesn't load on any factor
+        missing = set(endog_names).difference(set(factors.keys()))
+        if len(missing):
+            raise ValueError('If a `factors` dictionary is provided, then'
+                             ' it must include entries for each observed'
+                             f' variable. Missing variables are: {missing}.')
+
+        # Figure out the set of factor names
+        # (0 is just a dummy value for the dict - we just do it this way to
+        # collect the keys, in order, without duplicates.)
+        factor_names = {}
+        for key, value in factors.items():
+            if isinstance(value, str):
+                factor_names[value] = 0
+            else:
+                factor_names.update({v: 0 for v in value})
+        factor_names = list(factor_names.keys())
+        k_factors = len(factor_names)
+
+        endog_factor_map = pd.DataFrame(
+            np.zeros((self.k_endog, k_factors), dtype=bool),
+            index=pd.Index(endog_names, name='endog'),
+            columns=pd.Index(factor_names, name='factor'))
+        for key, value in factors.items():
+            endog_factor_map.loc[key, value] = True
+
+        return endog_factor_map
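
The resulting map is a boolean DataFrame with one row per observed variable and one column per factor. A small illustration of the same construction with made-up names (pandas only, outside the model):

    import numpy as np
    import pandas as pd

    factors = {'gdp': ['global'], 'emp': ['global'],
               'cpi': ['global', 'prices']}
    factor_names = ['global', 'prices']
    endog_factor_map = pd.DataFrame(
        np.zeros((len(factors), len(factor_names)), dtype=bool),
        index=pd.Index(list(factors), name='endog'),
        columns=pd.Index(factor_names, name='factor'))
    for endog_name, loads_on in factors.items():
        endog_factor_map.loc[endog_name, loads_on] = True
    # endog_factor_map.loc['cpi'] is True for both 'global' and 'prices'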

     @property
     def factors_L1(self):
         """Factors."""
-        pass
+        ix = np.arange(self.k_states_factors)
+        iloc = tuple(ix[block.factors_L1] for block in self.factor_blocks)
+        return np.concatenate(iloc)

     @property
     def factors_L1_5_ix(self):
         """Factors plus any lags, index shaped (5, k_factors)."""
-        pass
+        ix = np.arange(self.k_states_factors)
+        iloc = []
+        for block in self.factor_blocks:
+            iloc.append(ix[block.factors_L1_5].reshape(5, block.k_factors))
+        return np.concatenate(iloc, axis=1)

     @property
     def idio_ar_L1(self):
         """Idiosyncratic AR states, (first block / lag only)."""
-        pass
+        ix1 = self.k_states_factors
+        if self.idiosyncratic_ar1:
+            ix2 = ix1 + self.k_endog
+        else:
+            ix2 = ix1 + self.k_endog_Q
+        return np.s_[ix1:ix2]

     @property
     def idio_ar_M(self):
         """Idiosyncratic AR states for monthly variables."""
-        pass
+        ix1 = self.k_states_factors
+        ix2 = ix1
+        if self.idiosyncratic_ar1:
+            ix2 += self.k_endog_M
+        return np.s_[ix1:ix2]

     @property
     def idio_ar_Q(self):
         """Idiosyncratic AR states and all lags for quarterly variables."""
-        pass
+        # Note that this is equivalent to idio_ar_Q_ix with ravel(order='F')
+        ix1 = self.k_states_factors
+        if self.idiosyncratic_ar1:
+            ix1 += self.k_endog_M
+        ix2 = ix1 + self.k_endog_Q * 5
+        return np.s_[ix1:ix2]

     @property
     def idio_ar_Q_ix(self):
         """Idiosyncratic AR (quarterly) state index, (k_endog_Q, lags)."""
-        pass
+        # i.e. the position in the state vector of the second lag of the third
+        # quarterly variable is idio_ar_Q_ix[2, 1]
+        # ravel(order='F') gives e.g (y1.L1, y2.L1, y1.L2, y2.L2, y1.L3, ...)
+        # while
+        # ravel(order='C') gives e.g (y1.L1, y1.L2, y1.L3, y2.L1, y2.L2, ...)
+        start = self.k_states_factors
+        if self.idiosyncratic_ar1:
+            start += self.k_endog_M
+        return (start + np.reshape(
+                np.arange(5 * self.k_endog_Q), (5, self.k_endog_Q)).T)

     @property
     def endog_factor_iloc(self):
         """List of list of int, factor indexes for each observed variable."""
-        pass
+        # i.e. endog_factor_iloc[i] is a list of integer locations of the
+        # factors that load on the ith observed variable
+        if self._endog_factor_iloc is None:
+            ilocs = []
+            for i in range(self.k_endog):
+                ilocs.append(np.where(self.endog_factor_map.iloc[i])[0])
+            self._endog_factor_iloc = ilocs
+        return self._endog_factor_iloc

     def __getitem__(self, key):
         """
@@ -504,15 +686,15 @@ class DynamicFactorMQStates(dict):
         This is convenient in highlighting the indexing / slice quality of
         these attributes in the code below.
         """
-        if key in ['factors_L1', 'factors_L1_5_ix', 'idio_ar_L1',
-            'idio_ar_M', 'idio_ar_Q', 'idio_ar_Q_ix']:
+        if key in ['factors_L1', 'factors_L1_5_ix', 'idio_ar_L1', 'idio_ar_M',
+                   'idio_ar_Q', 'idio_ar_Q_ix']:
             return getattr(self, key)
         else:
             raise KeyError(key)


 class DynamicFactorMQ(mlemodel.MLEModel):
-    """
+    r"""
     Dynamic factor model with EM algorithm; option for monthly/quarterly data.

     Implementation of the dynamic factor model of Bańbura and Modugno (2014)
@@ -598,7 +780,7 @@ class DynamicFactorMQ(mlemodel.MLEModel):
         with monthly/quarterly mixed frequency data.
     init_t0 : bool, optional
         If True, this option initializes the Kalman filter with the
-        distribution for :math:`\\alpha_0` rather than :math:`\\alpha_1`. See
+        distribution for :math:`\alpha_0` rather than :math:`\alpha_1`. See
         the "Notes" section for more details. This option is rarely used except
         for testing. Default is False.
     obs_cov_diag : bool, optional
@@ -617,27 +799,27 @@ class DynamicFactorMQ(mlemodel.MLEModel):

     .. math::

-        y_t & = \\Lambda f_t + \\epsilon_t \\\\
-        f_t & = A_1 f_{t-1} + \\dots + A_p f_{t-p} + u_t
+        y_t & = \Lambda f_t + \epsilon_t \\
+        f_t & = A_1 f_{t-1} + \dots + A_p f_{t-p} + u_t

     where:

     - :math:`y_t` is observed data at time t
-    - :math:`\\epsilon_t` is idiosyncratic disturbance at time t (see below for
+    - :math:`\epsilon_t` is idiosyncratic disturbance at time t (see below for
       details, including modeling serial correlation in this term)
     - :math:`f_t` is the unobserved factor at time t
-    - :math:`u_t \\sim N(0, Q)` is the factor disturbance at time t
+    - :math:`u_t \sim N(0, Q)` is the factor disturbance at time t

     and:

-    - :math:`\\Lambda` is referred to as the matrix of factor loadings
+    - :math:`\Lambda` is referred to as the matrix of factor loadings
     - :math:`A_i` are matrices of autoregression coefficients

     Furthermore, we allow the idiosyncratic disturbances to be serially
     correlated, so that, if `idiosyncratic_ar1=True`,
-    :math:`\\epsilon_{i,t} = \\rho_i \\epsilon_{i,t-1} + e_{i,t}`, where
-    :math:`e_{i,t} \\sim N(0, \\sigma_i^2)`. If `idiosyncratic_ar1=False`,
-    then we instead have :math:`\\epsilon_{i,t} = e_{i,t}`.
+    :math:`\epsilon_{i,t} = \rho_i \epsilon_{i,t-1} + e_{i,t}`, where
+    :math:`e_{i,t} \sim N(0, \sigma_i^2)`. If `idiosyncratic_ar1=False`,
+    then we instead have :math:`\epsilon_{i,t} = e_{i,t}`.

     This basic setup can be found in [1]_, [2]_, [3]_, and [4]_.

@@ -730,9 +912,9 @@ class DynamicFactorMQ(mlemodel.MLEModel):

     .. math::

-        x_{i, t} = (y_{i, t} - \\bar y_i) / s_i
+        x_{i, t} = (y_{i, t} - \bar y_i) / s_i

-    where :math:`\\bar y_i` is the sample mean and :math:`s_i` is the sample
+    where :math:`\bar y_i` is the sample mean and :math:`s_i` is the sample
     standard deviation.
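
For a pandas DataFrame, `standardize=True` corresponds to the column-wise z-score below (a sketch of the transformation only; the model also undoes this scaling when reporting results in the original units):

    import pandas as pd

    def standardize(endog: pd.DataFrame) -> pd.DataFrame:
        # Column-wise z-scores; constant columns are rejected by the model
        return (endog - endog.mean(axis=0)) / endog.std(axis=0)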

     By default, if standardization is applied prior to estimation, results such
@@ -763,9 +945,9 @@ class DynamicFactorMQ(mlemodel.MLEModel):
     model is set up / applied.

     - `init_t0`: state space models in Statsmodels follow Durbin and Koopman in
-      initializing the model with :math:`\\alpha_1 \\sim N(a_1, P_1)`. Other
+      initializing the model with :math:`\alpha_1 \sim N(a_1, P_1)`. Other
       implementations sometimes initialize instead with
-      :math:`\\alpha_0 \\sim N(a_0, P_0)`. We can accommodate this by prepending
+      :math:`\alpha_0 \sim N(a_0, P_0)`. We can accommodate this by prepending
       a row of NaNs to the observed dataset.
     - `obs_cov_diag`: the state space form in [1]_ incorporates non-zero (but
       very small) diagonal elements for the observation disturbance covariance
@@ -1097,39 +1279,53 @@ class DynamicFactorMQ(mlemodel.MLEModel):

     """

-    def __init__(self, endog, k_endog_monthly=None, factors=1,
-        factor_orders=1, factor_multiplicities=None, idiosyncratic_ar1=True,
-        standardize=True, endog_quarterly=None, init_t0=False, obs_cov_diag
-        =False, **kwargs):
+    def __init__(self, endog, k_endog_monthly=None, factors=1, factor_orders=1,
+                 factor_multiplicities=None, idiosyncratic_ar1=True,
+                 standardize=True, endog_quarterly=None, init_t0=False,
+                 obs_cov_diag=False, **kwargs):
+        # Handle endog variables
         if endog_quarterly is not None:
             if k_endog_monthly is not None:
-                raise ValueError(
-                    'If `endog_quarterly` is specified, then `endog` must contain only monthly variables, and so `k_endog_monthly` cannot be specified since it will be inferred from the shape of `endog`.'
-                    )
-            endog, k_endog_monthly = self.construct_endog(endog,
-                endog_quarterly)
+                raise ValueError('If `endog_quarterly` is specified, then'
+                                 ' `endog` must contain only monthly'
+                                 ' variables, and so `k_endog_monthly` cannot'
+                                 ' be specified since it will be inferred from'
+                                 ' the shape of `endog`.')
+            endog, k_endog_monthly = self.construct_endog(
+                endog, endog_quarterly)
         endog_is_pandas = _is_using_pandas(endog, None)
+
         if endog_is_pandas:
             if isinstance(endog, pd.Series):
                 endog = endog.to_frame()
-        elif np.ndim(endog) < 2:
-            endog = np.atleast_2d(endog).T
+        else:
+            if np.ndim(endog) < 2:
+                endog = np.atleast_2d(endog).T
+
         if k_endog_monthly is None:
             k_endog_monthly = endog.shape[1]
+
         if endog_is_pandas:
             endog_names = endog.columns.tolist()
-        elif endog.shape[1] == 1:
-            endog_names = ['y']
         else:
-            endog_names = [f'y{i + 1}' for i in range(endog.shape[1])]
+            if endog.shape[1] == 1:
+                endog_names = ['y']
+            else:
+                endog_names = [f'y{i + 1}' for i in range(endog.shape[1])]
+
         self.k_endog_M = int_like(k_endog_monthly, 'k_endog_monthly')
         self.k_endog_Q = endog.shape[1] - self.k_endog_M
-        s = self._s = DynamicFactorMQStates(self.k_endog_M, self.k_endog_Q,
-            endog_names, factors, factor_orders, factor_multiplicities,
-            idiosyncratic_ar1)
+
+        # Compute helper for handling factors / state indexing
+        s = self._s = DynamicFactorMQStates(
+            self.k_endog_M, self.k_endog_Q, endog_names, factors,
+            factor_orders, factor_multiplicities, idiosyncratic_ar1)
+
+        # Save parameterization
         self.factors = factors
         self.factor_orders = factor_orders
         self.factor_multiplicities = factor_multiplicities
+
         self.endog_factor_map = self._s.endog_factor_map
         self.factor_block_orders = self._s.factor_block_orders
         self.factor_names = self._s.factor_names
@@ -1139,38 +1335,49 @@ class DynamicFactorMQ(mlemodel.MLEModel):
         self.idiosyncratic_ar1 = idiosyncratic_ar1
         self.init_t0 = init_t0
         self.obs_cov_diag = obs_cov_diag
+
         if self.init_t0:
+            # TODO: test each of these options
             if endog_is_pandas:
                 ix = pd.period_range(endog.index[0] - 1, endog.index[-1],
-                    freq=endog.index.freq)
+                                     freq=endog.index.freq)
                 endog = endog.reindex(ix)
             else:
                 endog = np.c_[[np.nan] * endog.shape[1], endog.T].T
+
+        # Standardize endog, if requested
+        # Note: endog_mean and endog_std will always each be 1-dimensional with
+        # length equal to the number of endog variables
         if isinstance(standardize, tuple) and len(standardize) == 2:
             endog_mean, endog_std = standardize
+
+            # Validate the input
             n = endog.shape[1]
-            if isinstance(endog_mean, pd.Series
-                ) and not endog_mean.index.equals(pd.Index(endog_names)):
-                raise ValueError(
-                    f'Invalid value passed for `standardize`: if a Pandas Series, must have index {endog_names}. Got {endog_mean.index}.'
-                    )
+            if (isinstance(endog_mean, pd.Series) and not
+                    endog_mean.index.equals(pd.Index(endog_names))):
+                raise ValueError('Invalid value passed for `standardize`:'
+                                 ' if a Pandas Series, must have index'
+                                 f' {endog_names}. Got {endog_mean.index}.')
             else:
                 endog_mean = np.atleast_1d(endog_mean)
-            if isinstance(endog_std, pd.Series) and not endog_std.index.equals(
-                pd.Index(endog_names)):
-                raise ValueError(
-                    f'Invalid value passed for `standardize`: if a Pandas Series, must have index {endog_names}. Got {endog_std.index}.'
-                    )
+            if (isinstance(endog_std, pd.Series) and not
+                    endog_std.index.equals(pd.Index(endog_names))):
+                raise ValueError('Invalid value passed for `standardize`:'
+                                 ' if a Pandas Series, must have index'
+                                 f' {endog_names}. Got {endog_std.index}.')
             else:
                 endog_std = np.atleast_1d(endog_std)
-            if np.shape(endog_mean) != (n,) or np.shape(endog_std) != (n,):
-                raise ValueError(
-                    f'Invalid value passed for `standardize`: each element must be shaped ({n},).'
-                    )
+
+            if (np.shape(endog_mean) != (n,) or np.shape(endog_std) != (n,)):
+                raise ValueError('Invalid value passed for `standardize`: each'
+                                 f' element must be shaped ({n},).')
             standardize = True
+
+            # Make sure we have Pandas if endog is Pandas
             if endog_is_pandas:
                 endog_mean = pd.Series(endog_mean, index=endog_names)
                 endog_std = pd.Series(endog_std, index=endog_names)
+
         elif standardize in [1, True]:
             endog_mean = endog.mean(axis=0)
             endog_std = endog.std(axis=0)
@@ -1185,70 +1392,105 @@ class DynamicFactorMQ(mlemodel.MLEModel):
         if np.any(self._endog_std < 1e-10):
             ix = np.where(self._endog_std < 1e-10)
             names = np.array(endog_names)[ix[0]].tolist()
-            raise ValueError(
-                f'Constant variable(s) found in observed variables, but constants cannot be included in this model. These variables are: {names}.'
-                )
+            raise ValueError('Constant variable(s) found in observed'
+                             ' variables, but constants cannot be included'
+                             f' in this model. These variables are: {names}.')
+
         if self.standardize:
             endog = (endog - self._endog_mean) / self._endog_std
-        o = self._o = {'M': np.s_[:self.k_endog_M], 'Q': np.s_[self.k_endog_M:]
-            }
+
+        # Observation / states slices
+        o = self._o = {
+            'M': np.s_[:self.k_endog_M],
+            'Q': np.s_[self.k_endog_M:]}
+
+        # Construct the basic state space representation
         super().__init__(endog, k_states=s.k_states, k_posdef=s.k_posdef,
-            **kwargs)
+                         **kwargs)
+
+        # Revert the standardization for orig_endog
         if self.standardize:
-            self.data.orig_endog = (self.data.orig_endog * self._endog_std +
-                self._endog_mean)
+            self.data.orig_endog = (
+                self.data.orig_endog * self._endog_std + self._endog_mean)
+
+        # State initialization
+        # Note: we could just initialize the entire thing as stationary, but
+        # doing each block separately should be faster and avoid numerical
+        # issues
         if 'initialization' not in kwargs:
             self.ssm.initialize(self._default_initialization())
+
+        # Fixed components of the state space representation
+
+        # > design
         if self.idiosyncratic_ar1:
             self['design', o['M'], s['idio_ar_M']] = np.eye(self.k_endog_M)
         multipliers = [1, 2, 3, 2, 1]
         for i in range(len(multipliers)):
             m = multipliers[i]
-            self['design', o['Q'], s['idio_ar_Q_ix'][:, i]] = m * np.eye(self
-                .k_endog_Q)
+            self['design', o['Q'], s['idio_ar_Q_ix'][:, i]] = (
+                m * np.eye(self.k_endog_Q))
+
+        # > obs cov
         if self.obs_cov_diag:
-            self['obs_cov'] = np.eye(self.k_endog) * 0.0001
+            self['obs_cov'] = np.eye(self.k_endog) * 1e-4
+
+        # > transition
         for block in s.factor_blocks:
             if block.k_factors == 1:
                 tmp = 0
             else:
                 tmp = np.zeros((block.k_factors, block.k_factors))
-            self['transition', block['factors'], block['factors']
-                ] = companion_matrix([1] + [tmp] * block._factor_order).T
+            self['transition', block['factors'], block['factors']] = (
+                companion_matrix([1] + [tmp] * block._factor_order).T)
         if self.k_endog_Q == 1:
             tmp = 0
         else:
             tmp = np.zeros((self.k_endog_Q, self.k_endog_Q))
-        self['transition', s['idio_ar_Q'], s['idio_ar_Q']] = companion_matrix(
-            [1] + [tmp] * 5).T
+        self['transition', s['idio_ar_Q'], s['idio_ar_Q']] = (
+            companion_matrix([1] + [tmp] * 5).T)
+
+        # > selection
         ix1 = ix2 = 0
         for block in s.factor_blocks:
             ix2 += block.k_factors
-            self['selection', block['factors_ix'][:, 0], ix1:ix2] = np.eye(
-                block.k_factors)
+            self['selection', block['factors_ix'][:, 0], ix1:ix2] = (
+                np.eye(block.k_factors))
             ix1 = ix2
         if self.idiosyncratic_ar1:
             ix2 = ix1 + self.k_endog_M
             self['selection', s['idio_ar_M'], ix1:ix2] = np.eye(self.k_endog_M)
             ix1 = ix2
+
         ix2 = ix1 + self.k_endog_Q
-        self['selection', s['idio_ar_Q_ix'][:, 0], ix1:ix2] = np.eye(self.
-            k_endog_Q)
-        self.params = OrderedDict([('loadings', np.sum(self.
-            endog_factor_map.values)), ('factor_ar', np.sum([(block.
-            k_factors ** 2 * block.factor_order) for block in s.
-            factor_blocks])), ('factor_cov', np.sum([(block.k_factors * (
-            block.k_factors + 1) // 2) for block in s.factor_blocks])), (
-            'idiosyncratic_ar1', self.k_endog if self.idiosyncratic_ar1 else
-            0), ('idiosyncratic_var', self.k_endog)])
+        self['selection', s['idio_ar_Q_ix'][:, 0], ix1:ix2] = (
+            np.eye(self.k_endog_Q))
+
+        # Parameters
+        self.params = OrderedDict([
+            ('loadings', np.sum(self.endog_factor_map.values)),
+            ('factor_ar', np.sum([block.k_factors**2 * block.factor_order
+                                  for block in s.factor_blocks])),
+            ('factor_cov', np.sum([block.k_factors * (block.k_factors + 1) // 2
+                                   for block in s.factor_blocks])),
+            ('idiosyncratic_ar1',
+                self.k_endog if self.idiosyncratic_ar1 else 0),
+            ('idiosyncratic_var', self.k_endog)])
         self.k_params = np.sum(list(self.params.values()))
-        ix = np.split(np.arange(self.k_params), np.cumsum(list(self.params.
-            values()))[:-1])
+
+        # Parameter slices
+        ix = np.split(np.arange(self.k_params),
+                      np.cumsum(list(self.params.values()))[:-1])
         self._p = dict(zip(self.params.keys(), ix))
+
+        # Cache
         self._loading_constraints = {}
-        self._init_keys += ['factors', 'factor_orders',
-            'factor_multiplicities', 'idiosyncratic_ar1', 'standardize',
-            'init_t0', 'obs_cov_diag'] + list(kwargs.keys())
+
+        # Initialization kwarg keys, e.g. for cloning
+        self._init_keys += [
+            'factors', 'factor_orders', 'factor_multiplicities',
+            'idiosyncratic_ar1', 'standardize', 'init_t0',
+            'obs_cov_diag'] + list(kwargs.keys())
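
A minimal way to exercise this constructor on purely monthly data (the synthetic data, seed, and column names below are illustrative only):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.dynamic_factor_mq import DynamicFactorMQ

    # Toy monthly dataset with three observed series
    ix = pd.period_range('2000-01', periods=120, freq='M')
    rng = np.random.default_rng(0)
    endog = pd.DataFrame(rng.standard_normal((120, 3)),
                         index=ix, columns=['y1', 'y2', 'y3'])

    # One global factor with a VAR(1) transition and AR(1) idiosyncratic terms
    mod = DynamicFactorMQ(endog, factors=1, factor_orders=1,
                          idiosyncratic_ar1=True, standardize=True)
    print(mod.summary())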

     @classmethod
     def construct_endog(cls, endog_monthly, endog_quarterly):
@@ -1277,10 +1519,80 @@ class DynamicFactorMQ(mlemodel.MLEModel):
             The number of monthly variables (which are ordered first) in the
             returned `endog` dataset.
         """
-        pass
+        # Create combined dataset
+        if endog_quarterly is not None:
+            # Validate endog_monthly
+            base_msg = ('If given both monthly and quarterly data'
+                        ' then the monthly dataset must be a Pandas'
+                        ' object with a date index at a monthly frequency.')
+            if not isinstance(endog_monthly, (pd.Series, pd.DataFrame)):
+                raise ValueError('Given monthly dataset is not a'
+                                 ' Pandas object. ' + base_msg)
+            elif endog_monthly.index.inferred_type not in ("datetime64",
+                                                           "period"):
+                raise ValueError('Given monthly dataset has an'
+                                 ' index with non-date values. ' + base_msg)
+            elif not getattr(endog_monthly.index, 'freqstr', 'N')[0] == 'M':
+                freqstr = getattr(endog_monthly.index, 'freqstr', 'None')
+                raise ValueError('Index of given monthly dataset has a'
+                                 ' non-monthly frequency (to check this,'
+                                 ' examine the `freqstr` attribute of the'
+                                 ' index of the dataset - it should start with'
+                                 ' M if it is monthly).'
+                                 f' Got {freqstr}. ' + base_msg)
+
+            # Validate endog_quarterly
+            base_msg = ('If a quarterly dataset is given, then it must be a'
+                        ' Pandas object with a date index at a quarterly'
+                        ' frequency.')
+            if not isinstance(endog_quarterly, (pd.Series, pd.DataFrame)):
+                raise ValueError('Given quarterly dataset is not a'
+                                 ' Pandas object. ' + base_msg)
+            elif endog_quarterly.index.inferred_type not in ("datetime64",
+                                                             "period"):
+                raise ValueError('Given quarterly dataset has an'
+                                 ' index with non-date values. ' + base_msg)
+            elif not getattr(endog_quarterly.index, 'freqstr', 'N')[0] == 'Q':
+                freqstr = getattr(endog_quarterly.index, 'freqstr', 'None')
+                raise ValueError('Index of given quarterly dataset'
+                                 ' has a non-quarterly frequency (to check'
+                                 ' this, examine the `freqstr` attribute of'
+                                 ' the index of the dataset - it should start'
+                                 ' with Q if it is quarterly).'
+                                 f' Got {freqstr}. ' + base_msg)
+
+            # Convert to PeriodIndex, if applicable
+            if hasattr(endog_monthly.index, 'to_period'):
+                endog_monthly = endog_monthly.to_period('M')
+            if hasattr(endog_quarterly.index, 'to_period'):
+                endog_quarterly = endog_quarterly.to_period('Q')
+
+            # Combine the datasets
+            endog = pd.concat([
+                endog_monthly,
+                endog_quarterly.resample('M', convention='end').first()],
+                axis=1)
+
+            # Make sure we didn't accidentally get duplicate column names
+            column_counts = endog.columns.value_counts()
+            if column_counts.max() > 1:
+                columns = endog.columns.values.astype(object)
+                for name in column_counts.index:
+                    count = column_counts.loc[name]
+                    if count == 1:
+                        continue
+                    mask = columns == name
+                    columns[mask] = [f'{name}{i + 1}' for i in range(count)]
+                endog.columns = columns
+        else:
+            endog = endog_monthly.copy()
+        shape = endog_monthly.shape
+        k_endog_monthly = shape[1] if len(shape) == 2 else 1
+
+        return endog, k_endog_monthly
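
A sketch of how this classmethod can be used directly to build a mixed-frequency dataset (series names and values are illustrative):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.dynamic_factor_mq import DynamicFactorMQ

    rng = np.random.default_rng(0)
    endog_m = pd.DataFrame(
        rng.standard_normal((24, 2)), columns=['m1', 'm2'],
        index=pd.period_range('2000-01', periods=24, freq='M'))
    endog_q = pd.Series(
        rng.standard_normal(8), name='q1',
        index=pd.period_range('2000Q1', periods=8, freq='Q'))

    endog, k_endog_monthly = DynamicFactorMQ.construct_endog(endog_m, endog_q)
    # `endog` is monthly with k_endog_monthly == 2; the quarterly series is
    # aligned to the final month of each quarter and NaN in other months.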

     def clone(self, endog, k_endog_monthly=None, endog_quarterly=None,
-        retain_standardization=False, **kwargs):
+              retain_standardization=False, **kwargs):
         """
         Clone state space model with new data and optionally new specification.

@@ -1305,7 +1617,65 @@ class DynamicFactorMQ(mlemodel.MLEModel):
         -------
         model : DynamicFactorMQ instance
         """
-        pass
+        if retain_standardization and self.standardize:
+            kwargs['standardize'] = (self._endog_mean, self._endog_std)
+        mod = self._clone_from_init_kwds(
+            endog, k_endog_monthly=k_endog_monthly,
+            endog_quarterly=endog_quarterly, **kwargs)
+        return mod
+
+    @property
+    def _res_classes(self):
+        return {'fit': (DynamicFactorMQResults, mlemodel.MLEResultsWrapper)}
+
+    def _default_initialization(self):
+        s = self._s
+        init = initialization.Initialization(self.k_states)
+        for block in s.factor_blocks:
+            init.set(block['factors'], 'stationary')
+        if self.idiosyncratic_ar1:
+            for i in range(s['idio_ar_M'].start, s['idio_ar_M'].stop):
+                init.set(i, 'stationary')
+        init.set(s['idio_ar_Q'], 'stationary')
+        return init
+
+    def _get_endog_names(self, truncate=None, as_string=None):
+        if truncate is None:
+            truncate = False if as_string is False or self.k_endog == 1 else 24
+        if as_string is False and truncate is not False:
+            raise ValueError('Can only truncate endog names if they'
+                             ' are returned as a string.')
+        if as_string is None:
+            as_string = truncate is not False
+
+        # The base `endog_names` property is only a list if there are at least
+        # two variables; often, we need it to be a list
+        endog_names = self.endog_names
+        if not isinstance(endog_names, list):
+            endog_names = [endog_names]
+
+        if as_string:
+            endog_names = [str(name) for name in endog_names]
+
+        if truncate is not False:
+            n = truncate
+            endog_names = [name if len(name) <= n else name[:n] + '...'
+                           for name in endog_names]
+
+        return endog_names
+
+    @property
+    def _model_name(self):
+        model_name = [
+            'Dynamic Factor Model',
+            f'{self.k_factors} factors in {self.k_factor_blocks} blocks']
+        if self.k_endog_Q > 0:
+            model_name.append('Mixed frequency (M/Q)')
+
+        error_type = 'AR(1)' if self.idiosyncratic_ar1 else 'iid'
+        model_name.append(f'{error_type} idiosyncratic')
+
+        return model_name

     def summary(self, truncate_endog_names=None):
         """
@@ -1318,7 +1688,102 @@ class DynamicFactorMQ(mlemodel.MLEModel):
             Default is 24 if there is more than one observed variable, or
             an unlimited number if there is only one.
         """
-        pass
+        # Get endog names
+        endog_names = self._get_endog_names(truncate=truncate_endog_names,
+                                            as_string=True)
+
+        title = 'Model Specification: Dynamic Factor Model'
+
+        if self._index_dates:
+            ix = self._index
+            d = ix[0]
+            sample = ['%s' % d]
+            d = ix[-1]
+            sample += ['- ' + '%s' % d]
+        else:
+            sample = [str(0), ' - ' + str(self.nobs)]
+
+        # Standardize the model name as a list of str
+        model_name = self._model_name
+
+        # - Top summary table ------------------------------------------------
+        top_left = []
+        top_left.append(('Model:', [model_name[0]]))
+        for i in range(1, len(model_name)):
+            top_left.append(('', ['+ ' + model_name[i]]))
+        top_left += [
+            ('Sample:', [sample[0]]),
+            ('', [sample[1]])]
+
+        top_right = []
+        if self.k_endog_Q > 0:
+            top_right += [
+                ('# of monthly variables:', [self.k_endog_M]),
+                ('# of quarterly variables:', [self.k_endog_Q])]
+        else:
+            top_right += [('# of observed variables:', [self.k_endog])]
+        if self.k_factor_blocks == 1:
+            top_right += [('# of factors:', [self.k_factors])]
+        else:
+            top_right += [('# of factor blocks:', [self.k_factor_blocks])]
+        top_right += [('Idiosyncratic disturbances:',
+                       ['AR(1)' if self.idiosyncratic_ar1 else 'iid']),
+                      ('Standardize variables:', [self.standardize])]
+
+        summary = Summary()
+        self.model = self
+        summary.add_table_2cols(self, gleft=top_left, gright=top_right,
+                                title=title)
+        table_ix = 1
+        del self.model
+
+        # - Endog / factor map -----------------------------------------------
+        data = self.endog_factor_map.replace({True: 'X', False: ''})
+        data.index = endog_names
+        try:
+            items = data.items()
+        except AttributeError:
+            # Remove after pandas 1.5 is minimum
+            items = data.iteritems()
+        for name, col in items:
+            data[name] = data[name] + (' ' * (len(name) // 2))
+        data.index.name = 'Dep. variable'
+        data = data.reset_index()
+
+        params_data = data.values
+        params_header = data.columns.map(str).tolist()
+        params_stubs = None
+
+        title = 'Observed variables / factor loadings'
+        table = SimpleTable(
+            params_data, params_header, params_stubs,
+            txt_fmt=fmt_params, title=title)
+
+        summary.tables.insert(table_ix, table)
+        table_ix += 1
+
+        # - Factor blocks summary table --------------------------------------
+        data = self.factor_block_orders.reset_index()
+        data['block'] = data['block'].map(
+            lambda factor_names: ', '.join(factor_names))
+        try:
+            data[['order']] = data[['order']].map(str)
+        except AttributeError:
+            data[['order']] = data[['order']].applymap(str)
+
+        params_data = data.values
+        params_header = data.columns.map(str).tolist()
+        params_stubs = None
+
+        title = 'Factor blocks:'
+        table = SimpleTable(
+            params_data, params_header, params_stubs,
+            txt_fmt=fmt_params, title=title)
+
+        summary.tables.insert(table_ix, table)
+        table_ix += 1
+
+        return summary

     def __str__(self):
         """Summary tables showing model specification."""
@@ -1327,17 +1792,208 @@ class DynamicFactorMQ(mlemodel.MLEModel):
     @property
     def state_names(self):
         """(list of str) List of human readable names for unobserved states."""
-        pass
+        # Factors
+        state_names = []
+        for block in self._s.factor_blocks:
+            state_names += [f'{name}' for name in block.factor_names[:]]
+            for s in range(1, block._factor_order):
+                state_names += [f'L{s}.{name}'
+                                for name in block.factor_names]
+
+        # Monthly error
+        endog_names = self._get_endog_names()
+        if self.idiosyncratic_ar1:
+            endog_names_M = endog_names[self._o['M']]
+            state_names += [f'eps_M.{name}' for name in endog_names_M]
+        endog_names_Q = endog_names[self._o['Q']]
+
+        # Quarterly error
+        state_names += [f'eps_Q.{name}' for name in endog_names_Q]
+        for s in range(1, 5):
+            state_names += [f'L{s}.eps_Q.{name}' for name in endog_names_Q]
+        return state_names

     @property
     def param_names(self):
         """(list of str) List of human readable parameter names."""
-        pass
+        param_names = []
+        # Loadings
+        # So that Lambda = params[ix].reshape(self.k_endog, self.k_factors)
+        # (where Lambda stacks Lambda_M and Lambda_Q)
+        endog_names = self._get_endog_names(as_string=False)
+        for endog_name in endog_names:
+            for block in self._s.factor_blocks:
+                for factor_name in block.factor_names:
+                    if self.endog_factor_map.loc[endog_name, factor_name]:
+                        param_names.append(
+                            f'loading.{factor_name}->{endog_name}')
+
+        # Factor VAR
+        for block in self._s.factor_blocks:
+            for to_factor in block.factor_names:
+                param_names += [f'L{i}.{from_factor}->{to_factor}'
+                                for i in range(1, block.factor_order + 1)
+                                for from_factor in block.factor_names]
+
+        # Factor covariance
+        for i in range(len(self._s.factor_blocks)):
+            block = self._s.factor_blocks[i]
+            param_names += [f'fb({i}).cov.chol[{j + 1},{k + 1}]'
+                            for j in range(block.k_factors)
+                            for k in range(j + 1)]
+
+        # Error AR(1)
+        if self.idiosyncratic_ar1:
+            endog_names_M = endog_names[self._o['M']]
+            param_names += [f'L1.eps_M.{name}' for name in endog_names_M]
+
+            endog_names_Q = endog_names[self._o['Q']]
+            param_names += [f'L1.eps_Q.{name}' for name in endog_names_Q]
+
+        # Error innovation variances
+        param_names += [f'sigma2.{name}' for name in endog_names]
+
+        return param_names

     @property
     def start_params(self):
         """(array) Starting parameters for maximum likelihood estimation."""
-        pass
+        params = np.zeros(self.k_params, dtype=np.float64)
+
+        # (1) estimate factors one at a time, where the first step uses
+        # PCA on all `endog` variables that load on the first factor, and
+        # subsequent steps use residuals from the previous steps.
+        # TODO: what about factors that only load on quarterly variables?
+        endog_factor_map_M = self.endog_factor_map.iloc[:self.k_endog_M]
+        factors = []
+        endog = np.require(
+            pd.DataFrame(self.endog).interpolate().bfill(),
+            requirements="W"
+        )
+        for name in self.factor_names:
+            # Try to retrieve this from monthly variables, which is most
+            # consistent
+            endog_ix = np.where(endog_factor_map_M.loc[:, name])[0]
+            # But fall back to quarterly if necessary
+            if len(endog_ix) == 0:
+                endog_ix = np.where(self.endog_factor_map.loc[:, name])[0]
+            factor_endog = endog[:, endog_ix]
+
+            res_pca = PCA(factor_endog, ncomp=1, method='eig', normalize=False)
+            factors.append(res_pca.factors)
+            endog[:, endog_ix] -= res_pca.projection
+        factors = np.concatenate(factors, axis=1)
+
+        # (2) Estimate coefficients for each endog, one at a time (OLS for
+        # monthly variables, restricted OLS for quarterly). Also, compute
+        # residuals.
+        loadings = []
+        resid = []
+        for i in range(self.k_endog_M):
+            factor_ix = self._s.endog_factor_iloc[i]
+            factor_exog = factors[:, factor_ix]
+            mod_ols = OLS(self.endog[:, i], exog=factor_exog, missing='drop')
+            res_ols = mod_ols.fit()
+            loadings += res_ols.params.tolist()
+            resid.append(res_ols.resid)
+        for i in range(self.k_endog_M, self.k_endog):
+            factor_ix = self._s.endog_factor_iloc[i]
+            factor_exog = lagmat(factors[:, factor_ix], 4, original='in')
+            mod_glm = GLM(self.endog[:, i], factor_exog, missing='drop')
+            res_glm = mod_glm.fit_constrained(self.loading_constraints(i))
+            loadings += res_glm.params[:len(factor_ix)].tolist()
+            resid.append(res_glm.resid_response)
+        params[self._p['loadings']] = loadings
+
+        # (3) For each factor block, use an AR or VAR model to get coefficients
+        # and covariance estimate
+        # Factor transitions
+        stationary = True
+
+        factor_ar = []
+        factor_cov = []
+        i = 0
+        for block in self._s.factor_blocks:
+            factors_endog = factors[:, i:i + block.k_factors]
+            i += block.k_factors
+
+            if block.factor_order == 0:
+                continue
+
+            if block.k_factors == 1:
+                mod_factors = SARIMAX(factors_endog,
+                                      order=(block.factor_order, 0, 0))
+                sp = mod_factors.start_params
+                block_factor_ar = sp[:-1]
+                block_factor_cov = sp[-1:]
+
+                coefficient_matrices = mod_factors.start_params[:-1]
+            elif block.k_factors > 1:
+                mod_factors = VAR(factors_endog)
+                res_factors = mod_factors.fit(
+                    maxlags=block.factor_order, ic=None, trend='n')
+
+                block_factor_ar = res_factors.params.T.ravel()
+                L = np.linalg.cholesky(res_factors.sigma_u)
+                block_factor_cov = L[np.tril_indices_from(L)]
+
+                coefficient_matrices = np.transpose(
+                    np.reshape(block_factor_ar,
+                               (block.k_factors, block.k_factors,
+                                block.factor_order)), (2, 0, 1))
+
+            # Test for stationarity
+            stationary = is_invertible([1] + list(-coefficient_matrices))
+
+            # Check for stationarity
+            if not stationary:
+                warn('Non-stationary starting factor autoregressive'
+                     ' parameters found for factor block'
+                     f' {block.factor_names}. Using zeros as starting'
+                     ' parameters.')
+                block_factor_ar[:] = 0
+                cov_factor = np.diag(factors_endog.std(axis=0))
+                block_factor_cov = (
+                    cov_factor[np.tril_indices(block.k_factors)])
+            factor_ar += block_factor_ar.tolist()
+            factor_cov += block_factor_cov.tolist()
+        params[self._p['factor_ar']] = factor_ar
+        params[self._p['factor_cov']] = factor_cov
+
+        # (4) Use residuals from step (2) to estimate the idiosyncratic
+        # component
+        # Idiosyncratic component
+        if self.idiosyncratic_ar1:
+            idio_ar1 = []
+            idio_var = []
+            for i in range(self.k_endog_M):
+                mod_idio = SARIMAX(resid[i], order=(1, 0, 0), trend='c')
+                sp = mod_idio.start_params
+                idio_ar1.append(np.clip(sp[1], -0.99, 0.99))
+                idio_var.append(np.clip(sp[-1], 1e-5, np.inf))
+            for i in range(self.k_endog_M, self.k_endog):
+                y = self.endog[:, i].copy()
+                y[~np.isnan(y)] = resid[i]
+                mod_idio = QuarterlyAR1(y)
+                res_idio = mod_idio.fit(maxiter=10, return_params=True,
+                                        disp=False)
+                res_idio = mod_idio.fit_em(res_idio, maxiter=5,
+                                           return_params=True)
+                idio_ar1.append(np.clip(res_idio[0], -0.99, 0.99))
+                idio_var.append(np.clip(res_idio[1], 1e-5, np.inf))
+            params[self._p['idiosyncratic_ar1']] = idio_ar1
+            params[self._p['idiosyncratic_var']] = idio_var
+        else:
+            idio_var = [np.var(resid[i]) for i in range(self.k_endog_M)]
+            for i in range(self.k_endog_M, self.k_endog):
+                y = self.endog[:, i].copy()
+                y[~np.isnan(y)] = resid[i]
+                mod_idio = QuarterlyAR1(y)
+                res_idio = mod_idio.fit(return_params=True, disp=False)
+                idio_var.append(np.clip(res_idio[1], 1e-5, np.inf))
+            params[self._p['idiosyncratic_var']] = idio_var
+
+        return params

     def transform_params(self, unconstrained):
         """
@@ -1358,7 +2014,36 @@ class DynamicFactorMQ(mlemodel.MLEModel):
             Array of constrained parameters which may be used in likelihood
             evaluation.
         """
-        pass
+        constrained = unconstrained.copy()
+
+        # Stationary factor VAR
+        unconstrained_factor_ar = unconstrained[self._p['factor_ar']]
+        constrained_factor_ar = []
+        i = 0
+        for block in self._s.factor_blocks:
+            length = block.k_factors**2 * block.factor_order
+            tmp_coeff = np.reshape(
+                unconstrained_factor_ar[i:i + length],
+                (block.k_factors, block.k_factors * block.factor_order))
+            tmp_cov = np.eye(block.k_factors)
+            tmp_coeff, _ = constrain_stationary_multivariate(tmp_coeff,
+                                                             tmp_cov)
+            constrained_factor_ar += tmp_coeff.ravel().tolist()
+            i += length
+        constrained[self._p['factor_ar']] = constrained_factor_ar
+
+        # Stationary idiosyncratic AR(1)
+        if self.idiosyncratic_ar1:
+            idio_ar1 = unconstrained[self._p['idiosyncratic_ar1']]
+            constrained[self._p['idiosyncratic_ar1']] = [
+                constrain_stationary_univariate(idio_ar1[i:i + 1])[0]
+                for i in range(self.k_endog)]
+
+        # Positive idiosyncratic variances
+        constrained[self._p['idiosyncratic_var']] = (
+            constrained[self._p['idiosyncratic_var']]**2)
+
+        return constrained

     def untransform_params(self, constrained):
         """
@@ -1378,7 +2063,36 @@ class DynamicFactorMQ(mlemodel.MLEModel):
         unconstrained : array_like
             Array of unconstrained parameters used by the optimizer.
         """
-        pass
+        unconstrained = constrained.copy()
+
+        # Stationary factor VAR
+        constrained_factor_ar = constrained[self._p['factor_ar']]
+        unconstrained_factor_ar = []
+        i = 0
+        for block in self._s.factor_blocks:
+            length = block.k_factors**2 * block.factor_order
+            tmp_coeff = np.reshape(
+                constrained_factor_ar[i:i + length],
+                (block.k_factors, block.k_factors * block.factor_order))
+            tmp_cov = np.eye(block.k_factors)
+            tmp_coeff, _ = unconstrain_stationary_multivariate(tmp_coeff,
+                                                               tmp_cov)
+            unconstrained_factor_ar += tmp_coeff.ravel().tolist()
+            i += length
+        unconstrained[self._p['factor_ar']] = unconstrained_factor_ar
+
+        # Stationary idiosyncratic AR(1)
+        if self.idiosyncratic_ar1:
+            idio_ar1 = constrained[self._p['idiosyncratic_ar1']]
+            unconstrained[self._p['idiosyncratic_ar1']] = [
+                unconstrain_stationary_univariate(idio_ar1[i:i + 1])[0]
+                for i in range(self.k_endog)]
+
+        # Positive idiosyncratic variances
+        unconstrained[self._p['idiosyncratic_var']] = (
+            unconstrained[self._p['idiosyncratic_var']]**0.5)
+
+        return unconstrained
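
Since the two transformations are inverses of each other on the stationary region, a quick consistency check is to round-trip the starting values (continuing the illustrative `mod` from the constructor sketch above):

    import numpy as np

    params = mod.start_params
    roundtrip = mod.transform_params(mod.untransform_params(params))
    # Should hold up to numerical tolerance for stationary starting values
    assert np.allclose(params, roundtrip)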

     def update(self, params, **kwargs):
         """
@@ -1393,7 +2107,71 @@ class DynamicFactorMQ(mlemodel.MLEModel):
             `transform_params` is called. Default is True.

         """
-        pass
+        params = super().update(params, **kwargs)
+
+        # Local copies
+        o = self._o
+        s = self._s
+        p = self._p
+
+        # Loadings
+        loadings = params[p['loadings']]
+        start = 0
+        for i in range(self.k_endog_M):
+            iloc = self._s.endog_factor_iloc[i]
+            k_factors = len(iloc)
+            factor_ix = s['factors_L1'][iloc]
+            self['design', i, factor_ix] = loadings[start:start + k_factors]
+            start += k_factors
+        multipliers = np.array([1, 2, 3, 2, 1])[:, None]
+        for i in range(self.k_endog_M, self.k_endog):
+            iloc = self._s.endog_factor_iloc[i]
+            k_factors = len(iloc)
+            factor_ix = s['factors_L1_5_ix'][:, iloc]
+            self['design', i, factor_ix.ravel()] = np.ravel(
+                loadings[start:start + k_factors] * multipliers)
+            start += k_factors
+
+        # Factor VAR
+        factor_ar = params[p['factor_ar']]
+        start = 0
+        for block in s.factor_blocks:
+            k_params = block.k_factors**2 * block.factor_order
+            A = np.reshape(
+                factor_ar[start:start + k_params],
+                (block.k_factors, block.k_factors * block.factor_order))
+            start += k_params
+            self['transition', block['factors_L1'], block['factors_ar']] = A
+
+        # Factor covariance
+        factor_cov = params[p['factor_cov']]
+        start = 0
+        ix1 = 0
+        for block in s.factor_blocks:
+            k_params = block.k_factors * (block.k_factors + 1) // 2
+            L = np.zeros((block.k_factors, block.k_factors),
+                         dtype=params.dtype)
+            L[np.tril_indices_from(L)] = factor_cov[start:start + k_params]
+            start += k_params
+            Q = L @ L.T
+            ix2 = ix1 + block.k_factors
+            self['state_cov', ix1:ix2, ix1:ix2] = Q
+            ix1 = ix2
+
+        # Error AR(1)
+        if self.idiosyncratic_ar1:
+            alpha = np.diag(params[p['idiosyncratic_ar1']])
+            self['transition', s['idio_ar_L1'], s['idio_ar_L1']] = alpha
+
+        # Error variances
+        if self.idiosyncratic_ar1:
+            self['state_cov', self.k_factors:, self.k_factors:] = (
+                np.diag(params[p['idiosyncratic_var']]))
+        else:
+            idio_var = params[p['idiosyncratic_var']]
+            self['obs_cov', o['M'], o['M']] = np.diag(idio_var[o['M']])
+            self['state_cov', self.k_factors:, self.k_factors:] = (
+                np.diag(idio_var[o['Q']]))
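A minimal sketch, with hypothetical packed values, of the Cholesky parameterization used for the factor covariance block above: the k(k+1)/2 parameters fill the lower triangle of L, and the state covariance block is Q = L L', which is symmetric positive semidefinite by construction.

    import numpy as np

    k_factors = 2
    factor_cov_params = np.array([0.9, 0.2, 0.5])   # hypothetical packed values
    L = np.zeros((k_factors, k_factors))
    L[np.tril_indices_from(L)] = factor_cov_params
    Q = L @ L.T
    assert np.allclose(Q, Q.T)
    assert np.all(np.linalg.eigvalsh(Q) >= 0)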

     @property
     def loglike_constant(self):
@@ -1403,10 +2181,10 @@ class DynamicFactorMQ(mlemodel.MLEModel):
         Useful in facilitating comparisons to other packages that exclude the
         constant from the log-likelihood computation.
         """
-        pass
+        return -0.5 * (1 - np.isnan(self.endog)).sum() * np.log(2 * np.pi)
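A quick numeric illustration of this constant, using a hypothetical 3 x 2 panel with one missing value:

    import numpy as np

    endog = np.array([[1.0, 2.0],
                      [np.nan, 0.5],
                      [0.3, -0.1]])
    n_nonmissing = (1 - np.isnan(endog)).sum()          # 5 observed values
    constant = -0.5 * n_nonmissing * np.log(2 * np.pi)  # approximately -4.595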

     def loading_constraints(self, i):
-        """
+        r"""
         Matrix formulation of quarterly variables' factor loading constraints.

         Parameters
@@ -1437,7 +2215,7 @@ class DynamicFactorMQ(mlemodel.MLEModel):

         y_i = A_i f + 2 A_i L1.f + 3 A_i L2.f + 2 A_i L3.f + A_i L4.f

-        Stack the unconstrained coefficients: \\Lambda_i = [A_i' B_i' ... E_i']'
+        Stack the unconstrained coefficients: \Lambda_i = [A_i' B_i' ... E_i']'

         Then the constraints can be written as follows, for l = 1, ..., k_i

@@ -1450,9 +2228,9 @@ class DynamicFactorMQ(mlemodel.MLEModel):

         .. math::

-            R \\Lambda_i = q
+            R \Lambda_i = q

-        where :math:`\\Lambda_i` is shaped `(k_i * 5,)`, :math:`R` is shaped
+        where :math:`\Lambda_i` is shaped `(k_i * 5,)`, :math:`R` is shaped
         `(k_constraints, k_i * 5)`, and :math:`q` is shaped `(k_constraints,)`.


@@ -1470,15 +2248,34 @@ class DynamicFactorMQ(mlemodel.MLEModel):
                                                     | E_{i,2} |     | 0 |

         """
-        pass
+        if i < self.k_endog_M:
+            raise ValueError('No constraints for monthly variables.')
+        if i not in self._loading_constraints:
+            k_factors = self.endog_factor_map.iloc[i].sum()
+
+            R = np.zeros((k_factors * 4, k_factors * 5))
+            q = np.zeros(R.shape[0])
+
+            # Let R = [R_1 R_2]
+            # Then R_1 is multiples of the identity matrix
+            multipliers = np.array([1, 2, 3, 2, 1])
+            R[:, :k_factors] = np.reshape(
+                (multipliers[1:] * np.eye(k_factors)[..., None]).T,
+                (k_factors * 4, k_factors))
+
+            # And R_2 is minus the identity
+            R[:, k_factors:] = np.diag([-1] * (k_factors * 4))
+
+            self._loading_constraints[i] = (R, q)
+        return self._loading_constraints[i]
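To make the construction above concrete, here is a minimal check (single loaded factor, hypothetical loading value) that R annihilates any loading vector following the [1, 2, 3, 2, 1] pattern:

    import numpy as np

    k_factors = 1
    multipliers = np.array([1, 2, 3, 2, 1])
    R = np.zeros((k_factors * 4, k_factors * 5))
    R[:, :k_factors] = np.reshape(
        (multipliers[1:] * np.eye(k_factors)[..., None]).T,
        (k_factors * 4, k_factors))
    R[:, k_factors:] = np.diag([-1] * (k_factors * 4))

    A_i = 0.7                      # hypothetical unconstrained loading
    Lambda_i = A_i * multipliers   # [A_i, 2 A_i, 3 A_i, 2 A_i, A_i]
    assert np.allclose(R @ Lambda_i, 0)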

     def fit(self, start_params=None, transformed=True, includes_fixed=False,
-        cov_type='none', cov_kwds=None, method='em', maxiter=500, tolerance
-        =1e-06, em_initialization=True, mstep_method=None, full_output=1,
-        disp=False, callback=None, return_params=False, optim_score=None,
-        optim_complex_step=None, optim_hessian=None, flags=None, low_memory
-        =False, llf_decrease_action='revert', llf_decrease_tolerance=0.0001,
-        **kwargs):
+            cov_type='none', cov_kwds=None, method='em', maxiter=500,
+            tolerance=1e-6, em_initialization=True, mstep_method=None,
+            full_output=1, disp=False, callback=None, return_params=False,
+            optim_score=None, optim_complex_step=None, optim_hessian=None,
+            flags=None, low_memory=False, llf_decrease_action='revert',
+            llf_decrease_tolerance=1e-4, **kwargs):
         """
         Fits the model by maximum likelihood via Kalman filter.

@@ -1612,13 +2409,32 @@ class DynamicFactorMQ(mlemodel.MLEModel):
         statsmodels.base.model.LikelihoodModel.fit
         statsmodels.tsa.statespace.mlemodel.MLEResults
         """
-        pass
+        if method == 'em':
+            return self.fit_em(
+                start_params=start_params, transformed=transformed,
+                cov_type=cov_type, cov_kwds=cov_kwds, maxiter=maxiter,
+                tolerance=tolerance, em_initialization=em_initialization,
+                mstep_method=mstep_method, full_output=full_output, disp=disp,
+                return_params=return_params, low_memory=low_memory,
+                llf_decrease_action=llf_decrease_action,
+                llf_decrease_tolerance=llf_decrease_tolerance, **kwargs)
+        else:
+            return super().fit(
+                start_params=start_params, transformed=transformed,
+                includes_fixed=includes_fixed, cov_type=cov_type,
+                cov_kwds=cov_kwds, method=method, maxiter=maxiter,
+                full_output=full_output, disp=disp,
+                callback=callback, return_params=return_params,
+                optim_score=optim_score,
+                optim_complex_step=optim_complex_step,
+                optim_hessian=optim_hessian, flags=flags,
+                low_memory=low_memory, **kwargs)

     def fit_em(self, start_params=None, transformed=True, cov_type='none',
-        cov_kwds=None, maxiter=500, tolerance=1e-06, disp=False,
-        em_initialization=True, mstep_method=None, full_output=True,
-        return_params=False, low_memory=False, llf_decrease_action='revert',
-        llf_decrease_tolerance=0.0001):
+               cov_kwds=None, maxiter=500, tolerance=1e-6, disp=False,
+               em_initialization=True, mstep_method=None, full_output=True,
+               return_params=False, low_memory=False,
+               llf_decrease_action='revert', llf_decrease_tolerance=1e-4):
         """
         Fits the model by maximum likelihood via the EM algorithm.

@@ -1712,31 +2528,509 @@ class DynamicFactorMQ(mlemodel.MLEModel):
         statsmodels.tsa.statespace.mlemodel.MLEModel.fit
         statsmodels.tsa.statespace.mlemodel.MLEResults
         """
-        pass
+        if self._has_fixed_params:
+            raise NotImplementedError('Cannot fit using the EM algorithm while'
+                                      ' holding some parameters fixed.')
+        if low_memory:
+            raise ValueError('Cannot fit using the EM algorithm when using'
+                             ' low_memory option.')
+
+        if start_params is None:
+            start_params = self.start_params
+            transformed = True
+        else:
+            start_params = np.array(start_params, ndmin=1)
+
+        if not transformed:
+            start_params = self.transform_params(start_params)
+
+        llf_decrease_action = string_like(
+            llf_decrease_action, 'llf_decrease_action',
+            options=['ignore', 'warn', 'revert'])
+
+        disp = int(disp)
+
+        # Perform expectation-maximization
+        s = self._s
+        llf = []
+        params = [start_params]
+        init = None
+        inits = [self.ssm.initialization]
+        i = 0
+        delta = 0
+        terminate = False
+        # init_stationary = None if em_initialization else True
+        while i < maxiter and not terminate and (i < 1 or (delta > tolerance)):
+            out = self._em_iteration(params[-1], init=init,
+                                     mstep_method=mstep_method)
+            new_llf = out[0].llf_obs.sum()
+
+            # If we are not using EM initialization, then we need to check for
+            # non-stationary parameters
+            if not em_initialization:
+                self.update(out[1])
+                switch_init = []
+                T = self['transition']
+                init = self.ssm.initialization
+                iloc = np.arange(self.k_states)
+
+                # We may only have global initialization if we have no
+                # quarterly variables and idiosyncratic_ar1=False
+                if self.k_endog_Q == 0 and not self.idiosyncratic_ar1:
+                    block = s.factor_blocks[0]
+                    if init.initialization_type == 'stationary':
+                        Tb = T[block['factors'], block['factors']]
+                        if not np.all(np.linalg.eigvals(Tb) < (1 - 1e-10)):
+                            init.set(block['factors'], 'diffuse')
+                            switch_init.append(
+                                'factor block:'
+                                f' {tuple(block.factor_names)}')
+                else:
+                    # Factor blocks
+                    for block in s.factor_blocks:
+                        b = tuple(iloc[block['factors']])
+                        init_type = init.blocks[b].initialization_type
+                        if init_type == 'stationary':
+                            Tb = T[block['factors'], block['factors']]
+                            if not np.all(np.linalg.eigvals(Tb) < (1 - 1e-10)):
+                                init.set(block['factors'], 'diffuse')
+                                switch_init.append(
+                                    'factor block:'
+                                    f' {tuple(block.factor_names)}')
+
+                if self.idiosyncratic_ar1:
+                    endog_names = self._get_endog_names(as_string=True)
+                    # Monthly variables
+                    for j in range(s['idio_ar_M'].start, s['idio_ar_M'].stop):
+                        init_type = init.blocks[(j,)].initialization_type
+                        if init_type == 'stationary':
+                            if not np.abs(T[j, j]) < (1 - 1e-10):
+                                init.set(j, 'diffuse')
+                                name = endog_names[j - s['idio_ar_M'].start]
+                                switch_init.append(
+                                    'idiosyncratic AR(1) for monthly'
+                                    f' variable: {name}')
+
+                    # Quarterly variables
+                    if self.k_endog_Q > 0:
+                        b = tuple(iloc[s['idio_ar_Q']])
+                        init_type = init.blocks[b].initialization_type
+                        if init_type == 'stationary':
+                            Tb = T[s['idio_ar_Q'], s['idio_ar_Q']]
+                            if not np.all(np.linalg.eigvals(Tb) < (1 - 1e-10)):
+                                init.set(s['idio_ar_Q'], 'diffuse')
+                                switch_init.append(
+                                    'idiosyncratic AR(1) for the'
+                                    ' block of quarterly variables')
+
+                if len(switch_init) > 0:
+                    warn('Non-stationary parameters found at EM iteration'
+                         f' {i + 1}, which is not compatible with'
+                         ' stationary initialization. Initialization was'
+                         ' switched to diffuse for the following: '
+                         f' {switch_init}, and fitting was restarted.')
+                    results = self.fit_em(
+                        start_params=params[-1], transformed=transformed,
+                        cov_type=cov_type, cov_kwds=cov_kwds,
+                        maxiter=maxiter, tolerance=tolerance,
+                        em_initialization=em_initialization,
+                        mstep_method=mstep_method, full_output=full_output,
+                        disp=disp, return_params=return_params,
+                        low_memory=low_memory,
+                        llf_decrease_action=llf_decrease_action,
+                        llf_decrease_tolerance=llf_decrease_tolerance)
+                    self.ssm.initialize(self._default_initialization())
+                    return results
+
+            # Check for decrease in the log-likelihood
+            # Note: allow a little numerical error before declaring a decrease
+            llf_decrease = (
+                i > 0 and (new_llf - llf[-1]) < -llf_decrease_tolerance)
+
+            if llf_decrease_action == 'revert' and llf_decrease:
+                warn(f'Log-likelihood decreased at EM iteration {i + 1}.'
+                     f' Reverting to the results from EM iteration {i}'
+                     ' (prior to the decrease) and returning the solution.')
+                # Terminated iteration
+                i -= 1
+                terminate = True
+            else:
+                if llf_decrease_action == 'warn' and llf_decrease:
+                    warn(f'Log-likelihood decreased at EM iteration {i + 1},'
+                         ' which can indicate numerical issues.')
+                llf.append(new_llf)
+                params.append(out[1])
+                if em_initialization:
+                    init = initialization.Initialization(
+                        self.k_states, 'known',
+                        constant=out[0].smoothed_state[..., 0],
+                        stationary_cov=out[0].smoothed_state_cov[..., 0])
+                    inits.append(init)
+                if i > 0:
+                    delta = (2 * np.abs(llf[-1] - llf[-2]) /
+                             (np.abs(llf[-1]) + np.abs(llf[-2])))
+                else:
+                    delta = np.inf
+
+                # If `disp` is not False, display the first iteration
+                if disp and i == 0:
+                    print(f'EM start iterations, llf={llf[-1]:.5g}')
+                # Print output every `disp` iterations
+                elif disp and ((i + 1) % disp) == 0:
+                    print(f'EM iteration {i + 1}, llf={llf[-1]:.5g},'
+                          f' convergence criterion={delta:.5g}')
+
+            # Advance the iteration counter
+            i += 1
+
+        # Check for convergence
+        not_converged = (i == maxiter and delta > tolerance)
+
+        # If the EM loop reached `maxiter` without converging, warn users
+        if not_converged:
+            warn(f'EM reached maximum number of iterations ({maxiter}),'
+                 f' without achieving convergence: llf={llf[-1]:.5g},'
+                 f' convergence criterion={delta:.5g}'
+                 f' (while specified tolerance was {tolerance:.5g})')
+
+        # If `disp` is not False, display the final iteration
+        if disp:
+            if terminate:
+                print(f'EM terminated at iteration {i}, llf={llf[-1]:.5g},'
+                      f' convergence criterion={delta:.5g}'
+                      f' (while specified tolerance was {tolerance:.5g})')
+            elif not_converged:
+                print(f'EM reached maximum number of iterations ({maxiter}),'
+                      f' without achieving convergence: llf={llf[-1]:.5g},'
+                      f' convergence criterion={delta:.5g}'
+                      f' (while specified tolerance was {tolerance:.5g})')
+            else:
+                print(f'EM converged at iteration {i}, llf={llf[-1]:.5g},'
+                      f' convergence criterion={delta:.5g}'
+                      f' < tolerance={tolerance:.5g}')
+
+        # Just return the fitted parameters if requested
+        if return_params:
+            result = params[-1]
+        # Otherwise construct the results class if desired
+        else:
+            if em_initialization:
+                base_init = self.ssm.initialization
+                self.ssm.initialization = init
+            # Note that because we are using params[-1], we are actually using
+            # the results from one additional iteration compared to the
+            # iteration at which we declared convergence.
+            result = self.smooth(params[-1], transformed=True,
+                                 cov_type=cov_type, cov_kwds=cov_kwds)
+            if em_initialization:
+                self.ssm.initialization = base_init
+
+            # Save the output
+            if full_output:
+                llf.append(result.llf)
+                em_retvals = Bunch(**{'params': np.array(params),
+                                      'llf': np.array(llf),
+                                      'iter': i,
+                                      'inits': inits})
+                em_settings = Bunch(**{'method': 'em',
+                                       'tolerance': tolerance,
+                                       'maxiter': maxiter})
+            else:
+                em_retvals = None
+                em_settings = None
+
+            result._results.mle_retvals = em_retvals
+            result._results.mle_settings = em_settings
+
+        return result
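A one-line illustration, with hypothetical log-likelihood values, of the relative-change convergence criterion used in the loop above:

    llf_prev, llf_new = -1203.4, -1203.1
    delta = 2 * abs(llf_new - llf_prev) / (abs(llf_new) + abs(llf_prev))
    converged = delta < 1e-6    # delta is about 2.5e-4 here, so not yet converged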

     def _em_iteration(self, params0, init=None, mstep_method=None):
         """EM iteration."""
-        pass
+        # (E)xpectation step
+        res = self._em_expectation_step(params0, init=init)
+
+        # (M)aximization step
+        params1 = self._em_maximization_step(res, params0,
+                                             mstep_method=mstep_method)
+
+        return res, params1

     def _em_expectation_step(self, params0, init=None):
         """EM expectation step."""
-        pass
+        # (E)xpectation step
+        self.update(params0)
+        # Re-initialize state, if new initialization is given
+        if init is not None:
+            base_init = self.ssm.initialization
+            self.ssm.initialization = init
+        # Perform smoothing, only saving what is required
+        res = self.ssm.smooth(
+            SMOOTHER_STATE | SMOOTHER_STATE_COV | SMOOTHER_STATE_AUTOCOV,
+            update_filter=False)
+        res.llf_obs = np.array(
+            self.ssm._kalman_filter.loglikelihood, copy=True)
+        # Reset initialization
+        if init is not None:
+            self.ssm.initialization = base_init
+
+        return res

     def _em_maximization_step(self, res, params0, mstep_method=None):
         """EM maximization step."""
-        pass
+        s = self._s
+
+        a = res.smoothed_state.T[..., None]
+        cov_a = res.smoothed_state_cov.transpose(2, 0, 1)
+        acov_a = res.smoothed_state_autocov.transpose(2, 0, 1)
+
+        # E[a_t a_t'], t = 0, ..., T
+        Eaa = cov_a.copy() + np.matmul(a, a.transpose(0, 2, 1))
+        # E[a_t a_{t-1}'], t = 1, ..., T
+        Eaa1 = acov_a[:-1] + np.matmul(a[1:], a[:-1].transpose(0, 2, 1))
+
+        # Observation equation
+        has_missing = np.any(res.nmissing)
+        if mstep_method is None:
+            mstep_method = 'missing' if has_missing else 'nonmissing'
+        mstep_method = mstep_method.lower()
+        if mstep_method == 'nonmissing' and has_missing:
+            raise ValueError('Cannot use EM algorithm option'
+                             ' `mstep_method="nonmissing"` with missing data.')
+
+        if mstep_method == 'nonmissing':
+            func = self._em_maximization_obs_nonmissing
+        elif mstep_method == 'missing':
+            func = self._em_maximization_obs_missing
+        else:
+            raise ValueError('Invalid maximization step method: "%s".'
+                             % mstep_method)
+        # TODO: computing H is pretty slow
+        Lambda, H = func(res, Eaa, a, compute_H=(not self.idiosyncratic_ar1))
+
+        # Factor VAR and covariance
+        factor_ar = []
+        factor_cov = []
+        for b in s.factor_blocks:
+            A = Eaa[:-1, b['factors_ar'], b['factors_ar']].sum(axis=0)
+            B = Eaa1[:, b['factors_L1'], b['factors_ar']].sum(axis=0)
+            C = Eaa[1:, b['factors_L1'], b['factors_L1']].sum(axis=0)
+            nobs = Eaa.shape[0] - 1
+
+            # want: x = B A^{-1}, so solve: x A = B or solve: A' x' = B'
+            try:
+                f_A = cho_solve(cho_factor(A), B.T).T
+            except LinAlgError:
+                # Fall back to general solver if there are problems with
+                # positive-definiteness
+                f_A = np.linalg.solve(A, B.T).T
+
+            f_Q = (C - f_A @ B.T) / nobs
+            factor_ar += f_A.ravel().tolist()
+            factor_cov += (
+                np.linalg.cholesky(f_Q)[np.tril_indices_from(f_Q)].tolist())
+
+        # Idiosyncratic AR(1) and variances
+        if self.idiosyncratic_ar1:
+            ix = s['idio_ar_L1']
+
+            Ad = Eaa[:-1, ix, ix].sum(axis=0).diagonal()
+            Bd = Eaa1[:, ix, ix].sum(axis=0).diagonal()
+            Cd = Eaa[1:, ix, ix].sum(axis=0).diagonal()
+            nobs = Eaa.shape[0] - 1
+
+            alpha = Bd / Ad
+            sigma2 = (Cd - alpha * Bd) / nobs
+        else:
+            ix = s['idio_ar_L1']
+            C = Eaa[:, ix, ix].sum(axis=0)
+            sigma2 = np.r_[H.diagonal()[self._o['M']],
+                           C.diagonal() / Eaa.shape[0]]
+
+        # Save parameters
+        params1 = np.zeros_like(params0)
+        loadings = []
+        for i in range(self.k_endog):
+            iloc = self._s.endog_factor_iloc[i]
+            factor_ix = s['factors_L1'][iloc]
+            loadings += Lambda[i, factor_ix].tolist()
+        params1[self._p['loadings']] = loadings
+        params1[self._p['factor_ar']] = factor_ar
+        params1[self._p['factor_cov']] = factor_cov
+        if self.idiosyncratic_ar1:
+            params1[self._p['idiosyncratic_ar1']] = alpha
+        params1[self._p['idiosyncratic_var']] = sigma2
+
+        return params1
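A minimal sketch of the factor VAR update used above, f_A = B A^{-1}, solved via a Cholesky factorization with a general-solver fallback (the moment matrices here are simulated stand-ins, not smoother output):

    import numpy as np
    from numpy.linalg import LinAlgError
    from scipy.linalg import cho_factor, cho_solve

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 2))
    A = X.T @ X                            # stand-in for sum of E[a_{t-1} a_{t-1}']
    B = rng.standard_normal((2, 2)) @ A    # stand-in for sum of E[a_t a_{t-1}']
    try:
        f_A = cho_solve(cho_factor(A), B.T).T
    except LinAlgError:
        f_A = np.linalg.solve(A, B.T).T
    assert np.allclose(f_A @ A, B)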

     def _em_maximization_obs_nonmissing(self, res, Eaa, a, compute_H=False):
         """EM maximization step, observation equation without missing data."""
-        pass
+        s = self._s
+        dtype = Eaa.dtype
+
+        # Observation equation (non-missing)
+        # Note: we only compute loadings for monthly variables because
+        # quarterly variables will always have missing entries, so we would
+        # never choose this method in that case
+        k = s.k_states_factors
+        Lambda = np.zeros((self.k_endog, k), dtype=dtype)
+        for i in range(self.k_endog):
+            y = self.endog[:, i:i + 1]
+            iloc = self._s.endog_factor_iloc[i]
+            factor_ix = s['factors_L1'][iloc]
+
+            ix = (np.s_[:],) + np.ix_(factor_ix, factor_ix)
+            A = Eaa[ix].sum(axis=0)
+            B = y.T @ a[:, factor_ix, 0]
+            if self.idiosyncratic_ar1:
+                ix1 = s.k_states_factors + i
+                ix2 = ix1 + 1
+                B -= Eaa[:, ix1:ix2, factor_ix].sum(axis=0)
+
+            # want: x = B A^{-1}, so solve: x A = B or solve: A' x' = B'
+            try:
+                Lambda[i, factor_ix] = cho_solve(cho_factor(A), B.T).T
+            except LinAlgError:
+                # Fall back to general solver if there are problems with
+                # positive-definiteness
+                Lambda[i, factor_ix] = np.linalg.solve(A, B.T).T
+
+        # Compute new obs cov
+        # Note: this is unnecessary if `idiosyncratic_ar1=True`.
+        # This is written in a slightly more general way than
+        # Banbura and Modugno (2014), equation (7); see instead equation (13)
+        # of Wu et al. (1996)
+        # "An algorithm for estimating parameters of state-space models"
+        if compute_H:
+            Z = self['design'].copy()
+            Z[:, :k] = Lambda
+            BL = self.endog.T @ a[..., 0] @ Z.T
+            C = self.endog.T @ self.endog
+
+            H = (C + -BL - BL.T + Z @ Eaa.sum(axis=0) @ Z.T) / self.nobs
+        else:
+            H = np.zeros((self.k_endog, self.k_endog), dtype=dtype) * np.nan
+
+        return Lambda, H

     def _em_maximization_obs_missing(self, res, Eaa, a, compute_H=False):
         """EM maximization step, observation equation with missing data."""
-        pass
+        s = self._s
+        dtype = Eaa.dtype
+
+        # Observation equation (missing)
+        k = s.k_states_factors
+        Lambda = np.zeros((self.k_endog, k), dtype=dtype)
+
+        W = (1 - res.missing.T)
+        mask = W.astype(bool)
+
+        # Compute design for monthly
+        # Note: the relevant A changes for each i
+        for i in range(self.k_endog_M):
+            iloc = self._s.endog_factor_iloc[i]
+            factor_ix = s['factors_L1'][iloc]
+
+            m = mask[:, i]
+            yt = self.endog[m, i:i + 1]
+
+            ix = np.ix_(m, factor_ix, factor_ix)
+            Ai = Eaa[ix].sum(axis=0)
+            Bi = yt.T @ a[np.ix_(m, factor_ix)][..., 0]
+            if self.idiosyncratic_ar1:
+                ix1 = s.k_states_factors + i
+                ix2 = ix1 + 1
+                Bi -= Eaa[m, ix1:ix2][..., factor_ix].sum(axis=0)
+            # want: x = B A^{-1}, so solve: x A = B or solve: A' x' = B'
+            try:
+                Lambda[i, factor_ix] = cho_solve(cho_factor(Ai), Bi.T).T
+            except LinAlgError:
+                # Fall back to general solver if there are problems with
+                # positive-definiteness
+                Lambda[i, factor_ix] = np.linalg.solve(Ai, Bi.T).T
+
+        # Compute unrestricted design for quarterly
+        # See Banbura et al. (2011), where this is described in Appendix C,
+        # between equations (13) and (14).
+        if self.k_endog_Q > 0:
+            # Note: the relevant A changes for each i
+            multipliers = np.array([1, 2, 3, 2, 1])[:, None]
+            for i in range(self.k_endog_M, self.k_endog):
+                iloc = self._s.endog_factor_iloc[i]
+                factor_ix = s['factors_L1_5_ix'][:, iloc].ravel().tolist()
+
+                R, _ = self.loading_constraints(i)
+
+                iQ = i - self.k_endog_M
+                m = mask[:, i]
+                yt = self.endog[m, i:i + 1]
+                ix = np.ix_(m, factor_ix, factor_ix)
+                Ai = Eaa[ix].sum(axis=0)
+                BiQ = yt.T @ a[np.ix_(m, factor_ix)][..., 0]
+                if self.idiosyncratic_ar1:
+                    ix = (np.s_[:],) + np.ix_(s['idio_ar_Q_ix'][iQ], factor_ix)
+                    Eepsf = Eaa[ix]
+                    BiQ -= (multipliers * Eepsf[m].sum(axis=0)).sum(axis=0)
+
+                # Note that there was a typo in Banbura et al. (2011) for
+                # the formula applying the restrictions. In their notation,
+                # they show (C D C')^{-1} while it should be (C D^{-1} C')^{-1}
+                # Note: in reality, this is:
+                # unrestricted - Aii @ R.T @ RARi @ (R @ unrestricted - q)
+                # where the restrictions are defined as: R @ unrestricted = q
+                # However, here q = 0, so we can simplify.
+                try:
+                    L_and_lower = cho_factor(Ai)
+                    # x = BQ A^{-1}, or x A = BQ, so solve A' x' = (BQ)'
+                    unrestricted = cho_solve(L_and_lower, BiQ.T).T[0]
+                    AiiRT = cho_solve(L_and_lower, R.T)
+
+                    L_and_lower = cho_factor(R @ AiiRT)
+                    RAiiRTiR = cho_solve(L_and_lower, R)
+                    restricted = unrestricted - AiiRT @ RAiiRTiR @ unrestricted
+                except LinAlgError:
+                    # Fall back to slower method if there are problems with
+                    # positive-definiteness
+                    Aii = np.linalg.inv(Ai)
+                    unrestricted = (BiQ @ Aii)[0]
+                    RARi = np.linalg.inv(R @ Aii @ R.T)
+                    restricted = (unrestricted -
+                                  Aii @ R.T @ RARi @ R @ unrestricted)
+                Lambda[i, factor_ix] = restricted
+
+        # Compute new obs cov
+        # Note: this is unnecessary if `idiosyncratic_ar1=True`.
+        # See Banbura and Modugno (2014), equation (12)
+        # This does not literally follow their formula, e.g. multiplying by the
+        # W_t selection matrices, because those formulas require loops that are
+        # relatively slow. The formulation here is vectorized.
+        if compute_H:
+            Z = self['design'].copy()
+            Z[:, :Lambda.shape[1]] = Lambda
+
+            y = np.nan_to_num(self.endog)
+            C = y.T @ y
+            W = W[..., None]
+            IW = 1 - W
+
+            WL = W * Z
+            WLT = WL.transpose(0, 2, 1)
+            BL = y[..., None] @ a.transpose(0, 2, 1) @ WLT
+            A = Eaa
+
+            BLT = BL.transpose(0, 2, 1)
+            IWT = IW.transpose(0, 2, 1)
+
+            H = (C + (-BL - BLT + WL @ A @ WLT +
+                      IW * self['obs_cov'] * IWT).sum(axis=0)) / self.nobs
+        else:
+            H = np.zeros((self.k_endog, self.k_endog), dtype=dtype) * np.nan
+
+        return Lambda, H
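A minimal sketch, with a hypothetical moment matrix and estimate, of the restriction step above: the unrestricted quarterly loading estimate is projected onto the constraint set R Lambda = 0.

    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.standard_normal((50, 5))
    A = Z.T @ Z                       # stand-in for the summed E[a_t a_t'] block
    R = np.array([[2., -1., 0., 0., 0.],
                  [3., 0., -1., 0., 0.],
                  [2., 0., 0., -1., 0.],
                  [1., 0., 0., 0., -1.]])
    unrestricted = rng.standard_normal(5)

    Aii = np.linalg.inv(A)
    RARi = np.linalg.inv(R @ Aii @ R.T)
    restricted = unrestricted - Aii @ R.T @ RARi @ R @ unrestricted
    assert np.allclose(R @ restricted, 0)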

     def smooth(self, params, transformed=True, includes_fixed=False,
-        complex_step=False, cov_type='none', cov_kwds=None, return_ssm=
-        False, results_class=None, results_wrapper_class=None, **kwargs):
+               complex_step=False, cov_type='none', cov_kwds=None,
+               return_ssm=False, results_class=None,
+               results_wrapper_class=None, **kwargs):
         """
         Kalman smoothing.

@@ -1760,12 +3054,16 @@ class DynamicFactorMQ(mlemodel.MLEModel):
             Additional keyword arguments to pass to the Kalman filter. See
             `KalmanFilter.filter` for more details.
         """
-        pass
+        return super().smooth(
+            params, transformed=transformed, includes_fixed=includes_fixed,
+            complex_step=complex_step, cov_type=cov_type, cov_kwds=cov_kwds,
+            return_ssm=return_ssm, results_class=results_class,
+            results_wrapper_class=results_wrapper_class, **kwargs)

     def filter(self, params, transformed=True, includes_fixed=False,
-        complex_step=False, cov_type='none', cov_kwds=None, return_ssm=
-        False, results_class=None, results_wrapper_class=None, low_memory=
-        False, **kwargs):
+               complex_step=False, cov_type='none', cov_kwds=None,
+               return_ssm=False, results_class=None,
+               results_wrapper_class=None, low_memory=False, **kwargs):
         """
         Kalman filtering.

@@ -1794,13 +3092,18 @@ class DynamicFactorMQ(mlemodel.MLEModel):
             Additional keyword arguments to pass to the Kalman filter. See
             `KalmanFilter.filter` for more details.
         """
-        pass
+        return super().filter(
+            params, transformed=transformed, includes_fixed=includes_fixed,
+            complex_step=complex_step, cov_type=cov_type, cov_kwds=cov_kwds,
+            return_ssm=return_ssm, results_class=results_class,
+            results_wrapper_class=results_wrapper_class, **kwargs)

     def simulate(self, params, nsimulations, measurement_shocks=None,
-        state_shocks=None, initial_state=None, anchor=None, repetitions=
-        None, exog=None, extend_model=None, extend_kwargs=None, transformed
-        =True, includes_fixed=False, original_scale=True, **kwargs):
-        """
+                 state_shocks=None, initial_state=None, anchor=None,
+                 repetitions=None, exog=None, extend_model=None,
+                 extend_kwargs=None, transformed=True, includes_fixed=False,
+                 original_scale=True, **kwargs):
+        r"""
         Simulate a new time series following the state space model.

         Parameters
@@ -1815,13 +3118,13 @@ class DynamicFactorMQ(mlemodel.MLEModel):
             number of observations.
         measurement_shocks : array_like, optional
             If specified, these are the shocks to the measurement equation,
-            :math:`\\varepsilon_t`. If unspecified, these are automatically
+            :math:`\varepsilon_t`. If unspecified, these are automatically
             generated using a pseudo-random number generator. If specified,
             must be shaped `nsimulations` x `k_endog`, where `k_endog` is the
             same as in the state space model.
         state_shocks : array_like, optional
             If specified, these are the shocks to the state equation,
-            :math:`\\eta_t`. If unspecified, these are automatically
+            :math:`\eta_t`. If unspecified, these are automatically
             generated using a pseudo-random number generator. If specified,
             must be shaped `nsimulations` x `k_posdef` where `k_posdef` is the
             same as in the state space model.
@@ -1874,12 +3177,53 @@ class DynamicFactorMQ(mlemodel.MLEModel):
             the first level containing the names of the `endog` variables and
             the second level containing the repetition number.
         """
-        pass
-
-    def impulse_responses(self, params, steps=1, impulse=0, orthogonalized=
-        False, cumulative=False, anchor=None, exog=None, extend_model=None,
-        extend_kwargs=None, transformed=True, includes_fixed=False,
-        original_scale=True, **kwargs):
+        # Get usual simulations (in the possibly-standardized scale)
+        sim = super().simulate(
+            params, nsimulations, measurement_shocks=measurement_shocks,
+            state_shocks=state_shocks, initial_state=initial_state,
+            anchor=anchor, repetitions=repetitions, exog=exog,
+            extend_model=extend_model, extend_kwargs=extend_kwargs,
+            transformed=transformed, includes_fixed=includes_fixed, **kwargs)
+
+        # If applicable, convert predictions back to original space
+        if self.standardize and original_scale:
+            use_pandas = isinstance(self.data, PandasData)
+            shape = sim.shape
+
+            if use_pandas:
+                # pd.Series (k_endog=1, replications=None)
+                if len(shape) == 1:
+                    std = self._endog_std.iloc[0]
+                    mean = self._endog_mean.iloc[0]
+                    sim = sim * std + mean
+                # pd.DataFrame (k_endog > 1, replications=None)
+                # [or]
+                # pd.DataFrame with MultiIndex (replications > 0)
+                elif len(shape) == 2:
+                    sim = (sim.multiply(self._endog_std, axis=1, level=0)
+                              .add(self._endog_mean, axis=1, level=0))
+            else:
+                # 1-dim array (k_endog=1, replications=None)
+                if len(shape) == 1:
+                    sim = sim * self._endog_std + self._endog_mean
+                # 2-dim array (k_endog > 1, replications=None)
+                elif len(shape) == 2:
+                    sim = sim * self._endog_std + self._endog_mean
+                # 3-dim array with MultiIndex (replications > 0)
+                else:
+                    # Get arrays into the form that can be used for
+                    # broadcasting
+                    std = np.atleast_2d(self._endog_std)[..., None]
+                    mean = np.atleast_2d(self._endog_mean)[..., None]
+                    sim = sim * std + mean
+
+        return sim
+
+    def impulse_responses(self, params, steps=1, impulse=0,
+                          orthogonalized=False, cumulative=False, anchor=None,
+                          exog=None, extend_model=None, extend_kwargs=None,
+                          transformed=True, includes_fixed=False,
+                          original_scale=True, **kwargs):
         """
         Impulse response function.

@@ -1949,17 +3293,46 @@ class DynamicFactorMQ(mlemodel.MLEModel):
             matrices).

         """
-        pass
+        # Get usual simulations (in the possibly-standardized scale)
+        irfs = super().impulse_responses(
+            params, steps=steps, impulse=impulse,
+            orthogonalized=orthogonalized, cumulative=cumulative,
+            anchor=anchor, exog=exog, extend_model=extend_model,
+            extend_kwargs=extend_kwargs, transformed=transformed,
+            includes_fixed=includes_fixed, **kwargs)
+
+        # If applicable, convert predictions back to original space
+        if self.standardize and original_scale:
+            use_pandas = isinstance(self.data, PandasData)
+            shape = irfs.shape
+
+            if use_pandas:
+                # pd.Series (k_endog=1, replications=None)
+                if len(shape) == 1:
+                    irfs = irfs * self._endog_std.iloc[0]
+                # pd.DataFrame (k_endog > 1)
+                # [or]
+                # pd.DataFrame with MultiIndex (replications > 0)
+                elif len(shape) == 2:
+                    irfs = irfs.multiply(self._endog_std, axis=1, level=0)
+            else:
+                # 1-dim array (k_endog=1)
+                if len(shape) == 1:
+                    irfs = irfs * self._endog_std
+                # 2-dim array (k_endog > 1)
+                elif len(shape) == 2:
+                    irfs = irfs * self._endog_std
+
+        return irfs


 class DynamicFactorMQResults(mlemodel.MLEResults):
     """
     Results from fitting a dynamic factor model
     """
-
     def __init__(self, model, params, filter_results, cov_type=None, **kwargs):
-        super(DynamicFactorMQResults, self).__init__(model, params,
-            filter_results, cov_type, **kwargs)
+        super(DynamicFactorMQResults, self).__init__(
+            model, params, filter_results, cov_type, **kwargs)

     @property
     def factors(self):
@@ -1986,10 +3359,23 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
         - `offset`: an integer giving the offset in the state vector where
           this component begins
         """
-        pass
-
-    def get_coefficients_of_determination(self, method='individual', which=None
-        ):
+        out = None
+        if self.model.k_factors > 0:
+            iloc = self.model._s.factors_L1
+            ix = np.array(self.model.state_names)[iloc].tolist()
+            out = Bunch(
+                filtered=self.states.filtered.loc[:, ix],
+                filtered_cov=self.states.filtered_cov.loc[np.s_[ix, :], ix],
+                smoothed=None, smoothed_cov=None)
+            if self.smoothed_state is not None:
+                out.smoothed = self.states.smoothed.loc[:, ix]
+            if self.smoothed_state_cov is not None:
+                out.smoothed_cov = (
+                    self.states.smoothed_cov.loc[np.s_[ix, :], ix])
+        return out
+
+    def get_coefficients_of_determination(self, method='individual',
+                                          which=None):
         """
         Get coefficients of determination (R-squared) for variables / factors.

@@ -2022,7 +3408,63 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
         plot_coefficients_of_determination
         coefficients_of_determination
         """
-        pass
+        from statsmodels.tools import add_constant
+
+        method = string_like(method, 'method', options=['individual', 'joint',
+                                                        'cumulative'])
+        if which is None:
+            which = 'filtered' if self.smoothed_state is None else 'smoothed'
+
+        k_endog = self.model.k_endog
+        k_factors = self.model.k_factors
+        ef_map = self.model._s.endog_factor_map
+        endog_names = self.model.endog_names
+        factor_names = self.model.factor_names
+
+        if method == 'individual':
+            coefficients = np.zeros((k_endog, k_factors))
+
+            for i in range(k_factors):
+                exog = add_constant(self.factors[which].iloc[:, i])
+                for j in range(k_endog):
+                    if ef_map.iloc[j, i]:
+                        endog = self.filter_results.endog[j]
+                        coefficients[j, i] = (
+                            OLS(endog, exog, missing='drop').fit().rsquared)
+                    else:
+                        coefficients[j, i] = np.nan
+
+            coefficients = pd.DataFrame(coefficients, index=endog_names,
+                                        columns=factor_names)
+        elif method == 'joint':
+            coefficients = np.zeros((k_endog,))
+            exog = add_constant(self.factors[which])
+            for j in range(k_endog):
+                endog = self.filter_results.endog[j]
+                ix = np.r_[True, ef_map.iloc[j]].tolist()
+                X = exog.loc[:, ix]
+                coefficients[j] = (
+                    OLS(endog, X, missing='drop').fit().rsquared)
+            coefficients = pd.Series(coefficients, index=endog_names)
+        elif method == 'cumulative':
+            coefficients = np.zeros((k_endog, k_factors))
+            exog = add_constant(self.factors[which])
+            for j in range(k_endog):
+                endog = self.filter_results.endog[j]
+
+                for i in range(k_factors):
+                    if self.model._s.endog_factor_map.iloc[j, i]:
+                        ix = np.r_[True, ef_map.iloc[j, :i + 1],
+                                   [False] * (k_factors - i - 1)]
+                        X = exog.loc[:, ix.astype(bool).tolist()]
+                        coefficients[j, i] = (
+                            OLS(endog, X, missing='drop').fit().rsquared)
+                    else:
+                        coefficients[j, i] = np.nan
+            coefficients = pd.DataFrame(coefficients, index=endog_names,
+                                        columns=factor_names)
+
+        return coefficients

     @cache_readonly
     def coefficients_of_determination(self):
@@ -2056,10 +3498,11 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
         get_coefficients_of_determination
         plot_coefficients_of_determination
         """
-        pass
+        return self.get_coefficients_of_determination(method='individual')

-    def plot_coefficients_of_determination(self, method='individual', which
-        =None, endog_labels=None, fig=None, figsize=None):
+    def plot_coefficients_of_determination(self, method='individual',
+                                           which=None, endog_labels=None,
+                                           fig=None, figsize=None):
         """
         Plot coefficients of determination (R-squared) for variables / factors.

@@ -2097,13 +3540,50 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
         --------
         get_coefficients_of_determination
         """
-        pass
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+        _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+
+        method = string_like(method, 'method', options=['individual', 'joint',
+                                                        'cumulative'])
+
+        # Should we label endogenous variables?
+        if endog_labels is None:
+            endog_labels = self.model.k_endog <= 5
+
+        # Plot the coefficients of determination
+        rsquared = self.get_coefficients_of_determination(method=method,
+                                                          which=which)
+
+        if method in ['individual', 'cumulative']:
+            plot_idx = 1
+            for factor_name, coeffs in rsquared.T.iterrows():
+                # Create the new axis
+                ax = fig.add_subplot(self.model.k_factors, 1, plot_idx)
+                ax.set_ylim((0, 1))
+                ax.set(title=f'{factor_name}', ylabel=r'$R^2$')
+
+                coeffs.plot(ax=ax, kind='bar')
+                if plot_idx < len(rsquared.columns) or not endog_labels:
+                    ax.xaxis.set_ticklabels([])
+
+                plot_idx += 1
+        elif method == 'joint':
+            ax = fig.add_subplot(1, 1, 1)
+            ax.set_ylim((0, 1))
+            ax.set(title=r'$R^2$ - regression on all loaded factors',
+                   ylabel=r'$R^2$')
+            rsquared.plot(ax=ax, kind='bar')
+            if not endog_labels:
+                ax.xaxis.set_ticklabels([])
+
+        return fig

     def get_prediction(self, start=None, end=None, dynamic=False,
-        information_set='predicted', signal_only=False, original_scale=True,
-        index=None, exog=None, extend_model=None, extend_kwargs=None, **kwargs
-        ):
-        """
+                       information_set='predicted', signal_only=False,
+                       original_scale=True, index=None, exog=None,
+                       extend_model=None, extend_kwargs=None, **kwargs):
+        r"""
         In-sample prediction and out-of-sample forecasting.

         Parameters
@@ -2141,9 +3621,9 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
             Whether to compute forecasts of only the "signal" component of
             the observation equation. Default is False. For example, the
             observation equation of a time-invariant model is
-            :math:`y_t = d + Z \\alpha_t + \\varepsilon_t`, and the "signal"
-            component is then :math:`Z \\alpha_t`. If this argument is set to
-            True, then forecasts of the "signal" :math:`Z \\alpha_t` will be
+            :math:`y_t = d + Z \alpha_t + \varepsilon_t`, and the "signal"
+            component is then :math:`Z \alpha_t`. If this argument is set to
+            True, then forecasts of the "signal" :math:`Z \alpha_t` will be
             returned. Otherwise, the default is for forecasts of :math:`y_t`
             to be returned.
         original_scale : bool, optional
@@ -2160,12 +3640,42 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
             Array of in-sample predictions and / or out-of-sample
             forecasts. An (npredict x k_endog) array.
         """
-        pass
+        # Get usual predictions (in the possibly-standardized scale)
+        res = super().get_prediction(start=start, end=end, dynamic=dynamic,
+                                     information_set=information_set,
+                                     signal_only=signal_only,
+                                     index=index, exog=exog,
+                                     extend_model=extend_model,
+                                     extend_kwargs=extend_kwargs, **kwargs)
+
+        # If applicable, convert predictions back to original space
+        if self.model.standardize and original_scale:
+            prediction_results = res.prediction_results
+            k_endog, _ = prediction_results.endog.shape
+
+            mean = np.array(self.model._endog_mean)
+            std = np.array(self.model._endog_std)
+
+            if self.model.k_endog > 1:
+                mean = mean[None, :]
+                std = std[None, :]
+
+            res._results._predicted_mean = (
+                res._results._predicted_mean * std + mean)
+
+            if k_endog == 1:
+                res._results._var_pred_mean *= std**2
+            else:
+                res._results._var_pred_mean = (
+                    std * res._results._var_pred_mean * std.T)
+
+        return res
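A small sketch, with hypothetical numbers, of the rescaling applied above when the model standardizes the data: the predicted mean is scaled and shifted back to the original scale, and the prediction variance is scaled by the squared standard deviation.

    import numpy as np

    mean, std = 1.3, 2.0                        # hypothetical _endog_mean, _endog_std
    predicted_mean_standardized = np.array([0.10, -0.40, 0.25])
    var_pred_mean_standardized = np.array([0.50, 0.60, 0.70])

    predicted_mean = predicted_mean_standardized * std + mean
    var_pred_mean = var_pred_mean_standardized * std**2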

     def news(self, comparison, impact_date=None, impacted_variable=None,
-        start=None, end=None, periods=None, exog=None, comparison_type=None,
-        revisions_details_start=False, state_index=None, return_raw=False,
-        tolerance=1e-10, endog_quarterly=None, original_scale=True, **kwargs):
+             start=None, end=None, periods=None, exog=None,
+             comparison_type=None, revisions_details_start=False,
+             state_index=None, return_raw=False, tolerance=1e-10,
+             endog_quarterly=None, original_scale=True, **kwargs):
         """
         Compute impacts from updated data (news and revisions).

@@ -2240,11 +3750,89 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
                In Handbook of economic forecasting, vol. 2, pp. 195-237.
                Elsevier, 2013.
         """
-        pass
+        if state_index == 'common':
+            state_index = (
+                np.arange(self.model.k_states - self.model.k_endog))
+
+        news_results = super().news(
+            comparison, impact_date=impact_date,
+            impacted_variable=impacted_variable, start=start, end=end,
+            periods=periods, exog=exog, comparison_type=comparison_type,
+            revisions_details_start=revisions_details_start,
+            state_index=state_index, return_raw=return_raw,
+            tolerance=tolerance, endog_quarterly=endog_quarterly, **kwargs)
+
+        # If we have standardized the data, we may want to report the news in
+        # the original scale. If so, we need to modify the data to "undo" the
+        # standardization.
+        if not return_raw and self.model.standardize and original_scale:
+            endog_mean = self.model._endog_mean
+            endog_std = self.model._endog_std
+
+            # Don't need to add in the mean for the impacts, since they are
+            # the difference of two forecasts
+            news_results.total_impacts = (
+                news_results.total_impacts * endog_std)
+            news_results.update_impacts = (
+                news_results.update_impacts * endog_std)
+            if news_results.revision_impacts is not None:
+                news_results.revision_impacts = (
+                    news_results.revision_impacts * endog_std)
+            if news_results.revision_detailed_impacts is not None:
+                news_results.revision_detailed_impacts = (
+                    news_results.revision_detailed_impacts * endog_std)
+            if news_results.revision_grouped_impacts is not None:
+                news_results.revision_grouped_impacts = (
+                    news_results.revision_grouped_impacts * endog_std)
+
+            # Update forecasts
+            for name in ['prev_impacted_forecasts', 'news', 'revisions',
+                         'update_realized', 'update_forecasts',
+                         'revised', 'revised_prev', 'post_impacted_forecasts',
+                         'revisions_all', 'revised_all', 'revised_prev_all']:
+                dta = getattr(news_results, name)
+
+                # for pd.Series, dta.multiply(...) and (sometimes) dta.add(...)
+                # remove the name attribute; save it now so that we can add it
+                # back in
+                orig_name = None
+                if hasattr(dta, 'name'):
+                    orig_name = dta.name
+
+                dta = dta.multiply(endog_std, level=1)
+
+                if name not in ['news', 'revisions']:
+                    dta = dta.add(endog_mean, level=1)
+
+                # add back in the name attribute if it was removed
+                if orig_name is not None:
+                    dta.name = orig_name
+
+                setattr(news_results, name, dta)
+
+            # For the weights: rows correspond to update (date, variable) and
+            # columns correspond to the impacted variable.
+            # 1. Because we have modified the updates (realized, forecasts, and
+            #    forecast errors) to be in the scale of the original updated
+            #    variable, we need to essentially reverse that change for each
+            #    row of the weights by dividing by the standard deviation of
+            #    that row's updated variable
+            # 2. Because we want the impacts to be in the scale of the original
+            #    impacted variable, we need to multiply each column by the
+            #    standard deviation of that column's impacted variable
+            news_results.weights = (
+                news_results.weights.divide(endog_std, axis=0, level=1)
+                                    .multiply(endog_std, axis=1, level=1))
+            news_results.revision_weights = (
+                news_results.revision_weights
+                            .divide(endog_std, axis=0, level=1)
+                            .multiply(endog_std, axis=1, level=1))
+
+        return news_results
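A minimal numeric sketch, with hypothetical values, of the two weight rescalings described in the comments above: each row is divided by the updated variable's standard deviation and each column is multiplied by the impacted variable's standard deviation.

    import numpy as np

    endog_std = np.array([2.0, 0.5])            # hypothetical per-variable std. dev.
    weights_standardized = np.array([[0.30, 0.10],
                                     [0.20, 0.40]])
    weights_original_scale = (weights_standardized
                              / endog_std[:, None]     # divide along rows (updates)
                              * endog_std[None, :])    # multiply along columns (impacts)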

     def get_smoothed_decomposition(self, decomposition_of='smoothed_state',
-        state_index=None, original_scale=True):
-        """
+                                   state_index=None, original_scale=True):
+        r"""
         Decompose smoothed output into contributions from observations

         Parameters
@@ -2319,14 +3907,50 @@ class DynamicFactorMQResults(mlemodel.MLEResults):

         Notes
         -----
-        Denote the smoothed state at time :math:`t` by :math:`\\alpha_t`. Then
-        the smoothed signal is :math:`Z_t \\alpha_t`, where :math:`Z_t` is the
+        Denote the smoothed state at time :math:`t` by :math:`\alpha_t`. Then
+        the smoothed signal is :math:`Z_t \alpha_t`, where :math:`Z_t` is the
         design matrix operative at time :math:`t`.
         """
-        pass
-
-    def append(self, endog, endog_quarterly=None, refit=False, fit_kwargs=
-        None, copy_initialization=True, retain_standardization=True, **kwargs):
+        # De-meaning the data is like putting the mean into the observation
+        # intercept. To compute the decomposition correctly in the original
+        # scale, we need to account for this, so we fill in the observation
+        # intercept temporarily
+        if self.model.standardize and original_scale:
+            cache_obs_intercept = self.model['obs_intercept']
+            self.model['obs_intercept'] = self.model._endog_mean
+
+        # Compute the contributions
+        (data_contributions, obs_intercept_contributions,
+         state_intercept_contributions, prior_contributions) = (
+            super().get_smoothed_decomposition(
+                decomposition_of=decomposition_of, state_index=state_index))
+
+        # Replace the original observation intercept
+        if self.model.standardize and original_scale:
+            self.model['obs_intercept'] = cache_obs_intercept
+
+        # Reverse the effect of dividing by the standard deviation
+        if (decomposition_of == 'smoothed_signal'
+                and self.model.standardize and original_scale):
+            endog_std = self.model._endog_std
+
+            data_contributions = (
+                data_contributions.multiply(endog_std, axis=0, level=0))
+            obs_intercept_contributions = (
+                obs_intercept_contributions.multiply(
+                    endog_std, axis=0, level=0))
+            state_intercept_contributions = (
+                state_intercept_contributions.multiply(
+                    endog_std, axis=0, level=0))
+            prior_contributions = (
+                prior_contributions.multiply(endog_std, axis=0, level=0))
+
+        return (data_contributions, obs_intercept_contributions,
+                state_intercept_contributions, prior_contributions)
+
+    def append(self, endog, endog_quarterly=None, refit=False, fit_kwargs=None,
+               copy_initialization=True, retain_standardization=True,
+               **kwargs):
         """
         Recreate the results object with new data appended to original data.

@@ -2387,10 +4011,26 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
         extend
         apply
         """
-        pass
+        # Construct the combined dataset, if necessary
+        endog, k_endog_monthly = DynamicFactorMQ.construct_endog(
+            endog, endog_quarterly)
+
+        # Check for compatible dimensions
+        k_endog = endog.shape[1] if len(endog.shape) == 2 else 1
+        if (k_endog_monthly != self.model.k_endog_M or
+                k_endog != self.model.k_endog):
+            raise ValueError('Cannot append data of a different dimension to'
+                             ' a model.')
+
+        kwargs['k_endog_monthly'] = k_endog_monthly
+
+        return super().append(
+            endog, refit=refit, fit_kwargs=fit_kwargs,
+            copy_initialization=copy_initialization,
+            retain_standardization=retain_standardization, **kwargs)

     def extend(self, endog, endog_quarterly=None, fit_kwargs=None,
-        retain_standardization=True, **kwargs):
+               retain_standardization=True, **kwargs):
         """
         Recreate the results object for new data that extends original data.

@@ -2439,11 +4079,25 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
         for the new data. To retrieve results for both the new data and the
         original data, see the `append` method.
         """
-        pass
+        # Construct the combined dataset, if necessary
+        endog, k_endog_monthly = DynamicFactorMQ.construct_endog(
+            endog, endog_quarterly)
+
+        # Check for compatible dimensions
+        k_endog = endog.shape[1] if len(endog.shape) == 2 else 1
+        if (k_endog_monthly != self.model.k_endog_M or
+                k_endog != self.model.k_endog):
+            raise ValueError('Cannot append data of a different dimension to'
+                             ' a model.')
+
+        kwargs['k_endog_monthly'] = k_endog_monthly
+        return super().extend(
+            endog, fit_kwargs=fit_kwargs,
+            retain_standardization=retain_standardization, **kwargs)

     def apply(self, endog, k_endog_monthly=None, endog_quarterly=None,
-        refit=False, fit_kwargs=None, copy_initialization=False,
-        retain_standardization=True, **kwargs):
+              refit=False, fit_kwargs=None, copy_initialization=False,
+              retain_standardization=True, **kwargs):
         """
         Apply the fitted parameters to new data unrelated to the original data.

@@ -2501,12 +4155,23 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
         dataset. For observations that continue that original dataset and follow
         directly after its last element, see the `append` and `extend` methods.
         """
-        pass
-
-    def summary(self, alpha=0.05, start=None, title=None, model_name=None,
-        display_params=True, display_diagnostics=False,
-        display_params_as_list=False, truncate_endog_names=None,
-        display_max_endog=3):
+        mod = self.model.clone(endog, k_endog_monthly=k_endog_monthly,
+                               endog_quarterly=endog_quarterly,
+                               retain_standardization=retain_standardization,
+                               **kwargs)
+        if copy_initialization:
+            init = initialization.Initialization.from_results(
+                self.filter_results)
+            mod.ssm.initialization = init
+
+        res = self._apply(mod, refit=refit, fit_kwargs=fit_kwargs)
+
+        return res
+
+    def summary(self, alpha=.05, start=None, title=None, model_name=None,
+                display_params=True, display_diagnostics=False,
+                display_params_as_list=False, truncate_endog_names=None,
+                display_max_endog=3):
         """
         Summarize the Model.

@@ -2531,4 +4196,138 @@ class DynamicFactorMQResults(mlemodel.MLEResults):
         --------
         statsmodels.iolib.summary.Summary
         """
-        pass
+        mod = self.model
+
+        # Default title / model name
+        if title is None:
+            title = 'Dynamic Factor Results'
+        if model_name is None:
+            model_name = self.model._model_name
+
+        # Get endog names
+        endog_names = self.model._get_endog_names(
+            truncate=truncate_endog_names)
+
+        # Get extra elements for top summary table
+        extra_top_left = None
+        extra_top_right = []
+        mle_retvals = getattr(self, 'mle_retvals', None)
+        mle_settings = getattr(self, 'mle_settings', None)
+        if mle_settings is not None and mle_settings.method == 'em':
+            extra_top_right += [('EM Iterations', [f'{mle_retvals.iter}'])]
+
+        # Get the basic summary tables
+        summary = super().summary(
+            alpha=alpha, start=start, title=title, model_name=model_name,
+            display_params=(display_params and display_params_as_list),
+            display_diagnostics=display_diagnostics,
+            truncate_endog_names=truncate_endog_names,
+            display_max_endog=display_max_endog,
+            extra_top_left=extra_top_left, extra_top_right=extra_top_right)
+
+        # Get tables of parameters
+        table_ix = 1
+        if not display_params_as_list:
+
+            # Observation equation table
+            data = pd.DataFrame(
+                self.filter_results.design[:, mod._s['factors_L1'], 0],
+                index=endog_names, columns=mod.factor_names)
+            try:
+                data = data.map(lambda s: '%.2f' % s)
+            except AttributeError:
+                data = data.applymap(lambda s: '%.2f' % s)
+
+            # Idiosyncratic terms
+            # data['   '] = '   '
+            k_idio = 1
+            if mod.idiosyncratic_ar1:
+                data['   idiosyncratic: AR(1)'] = (
+                    self.params[mod._p['idiosyncratic_ar1']])
+                k_idio += 1
+            data['var.'] = self.params[mod._p['idiosyncratic_var']]
+            try:
+                data.iloc[:, -k_idio:] = data.iloc[:, -k_idio:].map(
+                    lambda s: f'{s:.2f}')
+            except AttributeError:
+                data.iloc[:, -k_idio:] = data.iloc[:, -k_idio:].applymap(
+                    lambda s: f'{s:.2f}')
+
+            data.index.name = 'Factor loadings:'
+
+            # Clear entries for non-loading factors
+            base_iloc = np.arange(mod.k_factors)
+            for i in range(mod.k_endog):
+                iloc = [j for j in base_iloc
+                        if j not in mod._s.endog_factor_iloc[i]]
+                data.iloc[i, iloc] = '.'
+
+            data = data.reset_index()
+
+            # Build the table
+            params_data = data.values
+            params_header = data.columns.tolist()
+            params_stubs = None
+
+            title = 'Observation equation:'
+            table = SimpleTable(
+                params_data, params_header, params_stubs,
+                txt_fmt=fmt_params, title=title)
+            summary.tables.insert(table_ix, table)
+            table_ix += 1
+
+            # Factor transitions
+            ix1 = 0
+            ix2 = 0
+            for i in range(len(mod._s.factor_blocks)):
+                block = mod._s.factor_blocks[i]
+                ix2 += block.k_factors
+
+                T = self.filter_results.transition
+                lag_names = []
+                for j in range(block.factor_order):
+                    lag_names += [f'L{j + 1}.{name}'
+                                  for name in block.factor_names]
+                data = pd.DataFrame(T[block.factors_L1, block.factors_ar, 0],
+                                    index=block.factor_names,
+                                    columns=lag_names)
+                data.index.name = ''
+                try:
+                    data = data.map(lambda s: '%.2f' % s)
+                except AttributeError:
+                    data = data.applymap(lambda s: '%.2f' % s)
+
+                Q = self.filter_results.state_cov
+                # data[' '] = ''
+                if block.k_factors == 1:
+                    data['   error variance'] = Q[ix1, ix1]
+                else:
+                    data['   error covariance'] = block.factor_names
+                    for j in range(block.k_factors):
+                        data[block.factor_names[j]] = Q[ix1:ix2, ix1 + j]
+                try:
+                    formatted_vals = data.iloc[:, -block.k_factors:].map(
+                        lambda s: f'{s:.2f}'
+                    )
+                except AttributeError:
+                    formatted_vals = data.iloc[:, -block.k_factors:].applymap(
+                        lambda s: f'{s:.2f}'
+                    )
+                data.iloc[:, -block.k_factors:] = formatted_vals
+
+                data = data.reset_index()
+
+                params_data = data.values
+                params_header = data.columns.tolist()
+                params_stubs = None
+
+                title = f'Transition: Factor block {i}'
+                table = SimpleTable(
+                    params_data, params_header, params_stubs,
+                    txt_fmt=fmt_params, title=title)
+                summary.tables.insert(table_ix, table)
+                table_ix += 1
+
+                ix1 = ix2
+
+        return summary
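
The `append`, `extend`, and `apply` methods added above all follow the same pattern:
rebuild the combined monthly/quarterly dataset, check its dimensions against the
original model, and defer to the generic MLEResults machinery. The following is a
minimal usage sketch, not part of the patch itself; the synthetic frames and names
(`endog_m`, `endog_q`, `new_m`, `new_q`) are purely illustrative.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.dynamic_factor_mq import DynamicFactorMQ

    rng = np.random.default_rng(0)
    # Hypothetical monthly and quarterly panels with PeriodIndex indexes
    endog_m = pd.DataFrame(rng.normal(size=(120, 2)),
                           index=pd.period_range('2000-01', periods=120, freq='M'),
                           columns=['m0', 'm1'])
    endog_q = pd.DataFrame(rng.normal(size=(40, 1)),
                           index=pd.period_range('2000Q1', periods=40, freq='Q'),
                           columns=['q0'])

    mod = DynamicFactorMQ(endog_m, endog_quarterly=endog_q, factors=1)
    res = mod.fit(disp=False, maxiter=50)

    # New observations that continue the original sample directly
    new_m = pd.DataFrame(rng.normal(size=(12, 2)),
                         index=pd.period_range('2010-01', periods=12, freq='M'),
                         columns=['m0', 'm1'])
    new_q = pd.DataFrame(rng.normal(size=(4, 1)),
                         index=pd.period_range('2010Q1', periods=4, freq='Q'),
                         columns=['q0'])

    res_appended = res.append(new_m, endog_quarterly=new_q)  # original + new data
    res_applied = res.apply(new_m, endog_quarterly=new_q)    # new data only
    print(res_appended.summary())
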
diff --git a/statsmodels/tsa/statespace/exponential_smoothing.py b/statsmodels/tsa/statespace/exponential_smoothing.py
index d28ffea3e..f1c84472f 100644
--- a/statsmodels/tsa/statespace/exponential_smoothing.py
+++ b/statsmodels/tsa/statespace/exponential_smoothing.py
@@ -7,16 +7,23 @@ License: BSD-3
 import numpy as np
 import pandas as pd
 from statsmodels.base.data import PandasData
+
 from statsmodels.genmod.generalized_linear_model import GLM
-from statsmodels.tools.validation import array_like, bool_like, float_like, string_like, int_like
+from statsmodels.tools.validation import (array_like, bool_like, float_like,
+                                          string_like, int_like)
+
 from statsmodels.tsa.exponential_smoothing import initialization as es_init
 from statsmodels.tsa.statespace import initialization as ss_init
-from statsmodels.tsa.statespace.kalman_filter import MEMORY_CONSERVE, MEMORY_NO_FORECAST
+from statsmodels.tsa.statespace.kalman_filter import (
+    MEMORY_CONSERVE, MEMORY_NO_FORECAST)
+
 from statsmodels.compat.pandas import Appender
 import statsmodels.base.wrapper as wrap
+
 from statsmodels.iolib.summary import forg
 from statsmodels.iolib.table import SimpleTable
 from statsmodels.iolib.tableformatting import fmt_params
+
 from .mlemodel import MLEModel, MLEResults, MLEResultsWrapper


@@ -138,142 +145,592 @@ class ExponentialSmoothing(MLEModel):
         Forecasting with exponential smoothing: the state space approach.
         Springer Science & Business Media, 2008.
     """
-
-    def __init__(self, endog, trend=False, damped_trend=False, seasonal=
-        None, initialization_method='estimated', initial_level=None,
-        initial_trend=None, initial_seasonal=None, bounds=None,
-        concentrate_scale=True, dates=None, freq=None):
+    def __init__(self, endog, trend=False, damped_trend=False, seasonal=None,
+                 initialization_method='estimated', initial_level=None,
+                 initial_trend=None, initial_seasonal=None, bounds=None,
+                 concentrate_scale=True, dates=None, freq=None):
+        # Model definition
         self.trend = bool_like(trend, 'trend')
         self.damped_trend = bool_like(damped_trend, 'damped_trend')
         self.seasonal_periods = int_like(seasonal, 'seasonal', optional=True)
         self.seasonal = self.seasonal_periods is not None
-        self.initialization_method = string_like(initialization_method,
-            'initialization_method').lower()
+        self.initialization_method = string_like(
+            initialization_method, 'initialization_method').lower()
         self.concentrate_scale = bool_like(concentrate_scale,
-            'concentrate_scale')
+                                           'concentrate_scale')
+
+        # TODO: add validation for bounds (e.g. have all bounds, upper > lower)
+        # TODO: add `bounds_method` argument to choose between "usual" and
+        # "admissible" as in Hyndman et al. (2008)
         self.bounds = bounds
         if self.bounds is None:
-            self.bounds = [(0.0001, 1 - 0.0001)] * 3 + [(0.8, 0.98)]
+            self.bounds = [(1e-4, 1-1e-4)] * 3 + [(0.8, 0.98)]
+
+        # Validation
         if self.seasonal_periods == 1:
             raise ValueError('Cannot have a seasonal period of 1.')
+
         if self.seasonal and self.seasonal_periods is None:
-            raise NotImplementedError(
-                'Unable to detect season automatically; please specify `seasonal_periods`.'
-                )
+            raise NotImplementedError('Unable to detect season automatically;'
+                                      ' please specify `seasonal_periods`.')
+
         if self.initialization_method not in ['concentrated', 'estimated',
-            'simple', 'heuristic', 'known']:
-            raise ValueError('Invalid initialization method "%s".' %
-                initialization_method)
+                                              'simple', 'heuristic', 'known']:
+            raise ValueError('Invalid initialization method "%s".'
+                             % initialization_method)
+
         if self.initialization_method == 'known':
             if initial_level is None:
-                raise ValueError(
-                    '`initial_level` argument must be provided when initialization method is set to "known".'
-                    )
+                raise ValueError('`initial_level` argument must be provided'
+                                 ' when initialization method is set to'
+                                 ' "known".')
             if initial_trend is None and self.trend:
-                raise ValueError(
-                    '`initial_trend` argument must be provided for models with a trend component when initialization method is set to "known".'
-                    )
+                raise ValueError('`initial_trend` argument must be provided'
+                                 ' for models with a trend component when'
+                                 ' initialization method is set to "known".')
             if initial_seasonal is None and self.seasonal:
-                raise ValueError(
-                    '`initial_seasonal` argument must be provided for models with a seasonal component when initialization method is set to "known".'
-                    )
+                raise ValueError('`initial_seasonal` argument must be provided'
+                                 ' for models with a seasonal component when'
+                                 ' initialization method is set to "known".')
+
+        # Initialize the state space model
         if not self.seasonal or self.seasonal_periods is None:
             self._seasonal_periods = 0
         else:
             self._seasonal_periods = self.seasonal_periods
+
         k_states = 2 + int(self.trend) + self._seasonal_periods
         k_posdef = 1
-        init = ss_init.Initialization(k_states, 'known', constant=[0] *
-            k_states)
-        super(ExponentialSmoothing, self).__init__(endog, k_states=k_states,
-            k_posdef=k_posdef, initialization=init, dates=dates, freq=freq)
+
+        init = ss_init.Initialization(k_states, 'known',
+                                      constant=[0] * k_states)
+        super(ExponentialSmoothing, self).__init__(
+            endog, k_states=k_states, k_posdef=k_posdef,
+            initialization=init, dates=dates, freq=freq)
+
+        # Concentrate the scale out of the likelihood function
         if self.concentrate_scale:
             self.ssm.filter_concentrated = True
-        self.ssm['design', 0, 0] = 1.0
-        self.ssm['selection', 0, 0] = 1.0
-        self.ssm['state_cov', 0, 0] = 1.0
-        self.ssm['design', 0, 1] = 1.0
-        self.ssm['transition', 1, 1] = 1.0
+
+        # Setup fixed elements of the system matrices
+        # Observation error
+        self.ssm['design', 0, 0] = 1.
+        self.ssm['selection', 0, 0] = 1.
+        self.ssm['state_cov', 0, 0] = 1.
+
+        # Level
+        self.ssm['design', 0, 1] = 1.
+        self.ssm['transition', 1, 1] = 1.
+
+        # Trend
         if self.trend:
-            self.ssm['transition', 1:3, 2] = 1.0
+            self.ssm['transition', 1:3, 2] = 1.
+
+        # Seasonal
         if self.seasonal:
             k = 2 + int(self.trend)
-            self.ssm['design', 0, k] = 1.0
-            self.ssm['transition', k, -1] = 1.0
-            self.ssm['transition', k + 1:k_states, k:k_states - 1] = np.eye(
-                self.seasonal_periods - 1)
+            self.ssm['design', 0, k] = 1.
+            self.ssm['transition', k, -1] = 1.
+            self.ssm['transition', k + 1:k_states, k:k_states - 1] = (
+                np.eye(self.seasonal_periods - 1))
+
+        # Initialization of the states
         if self.initialization_method != 'known':
-            msg = ('Cannot give `%%s` argument when initialization is "%s"' %
-                initialization_method)
+            msg = ('Cannot give `%%s` argument when initialization is "%s"'
+                   % initialization_method)
             if initial_level is not None:
                 raise ValueError(msg % 'initial_level')
             if initial_trend is not None:
                 raise ValueError(msg % 'initial_trend')
             if initial_seasonal is not None:
                 raise ValueError(msg % 'initial_seasonal')
+
         if self.initialization_method == 'simple':
-            initial_level, initial_trend, initial_seasonal = (es_init.
-                _initialization_simple(self.endog[:, 0], trend='add' if
-                self.trend else None, seasonal='add' if self.seasonal else
-                None, seasonal_periods=self.seasonal_periods))
+            initial_level, initial_trend, initial_seasonal = (
+                es_init._initialization_simple(
+                    self.endog[:, 0], trend='add' if self.trend else None,
+                    seasonal='add' if self.seasonal else None,
+                    seasonal_periods=self.seasonal_periods))
         elif self.initialization_method == 'heuristic':
-            initial_level, initial_trend, initial_seasonal = (es_init.
-                _initialization_heuristic(self.endog[:, 0], trend='add' if
-                self.trend else None, seasonal='add' if self.seasonal else
-                None, seasonal_periods=self.seasonal_periods))
+            initial_level, initial_trend, initial_seasonal = (
+                es_init._initialization_heuristic(
+                    self.endog[:, 0], trend='add' if self.trend else None,
+                    seasonal='add' if self.seasonal else None,
+                    seasonal_periods=self.seasonal_periods))
         elif self.initialization_method == 'known':
             initial_level = float_like(initial_level, 'initial_level')
             if self.trend:
                 initial_trend = float_like(initial_trend, 'initial_trend')
             if self.seasonal:
                 initial_seasonal = array_like(initial_seasonal,
-                    'initial_seasonal')
+                                              'initial_seasonal')
+
                 if len(initial_seasonal) == self.seasonal_periods - 1:
-                    initial_seasonal = np.r_[initial_seasonal, 0 - np.sum(
-                        initial_seasonal)]
+                    initial_seasonal = np.r_[initial_seasonal,
+                                             0 - np.sum(initial_seasonal)]
+
                 if len(initial_seasonal) != self.seasonal_periods:
                     raise ValueError(
-                        'Invalid length of initial seasonal values. Must be one of s or s-1, where s is the number of seasonal periods.'
-                        )
+                        'Invalid length of initial seasonal values. Must be'
+                        ' one of s or s-1, where s is the number of seasonal'
+                        ' periods.')
+
+        # Note that the simple and heuristic methods of computing initial
+        # seasonal factors return estimated seasonal factors associated with
+        # the first t = 1, 2, ..., `n_seasons` observations. To use these as
+        # the initial state, we lag them by `n_seasons`. This yields, for
+        # example for `n_seasons = 4`, the seasons lagged L3, L2, L1, L0.
+        # As described above, the state vector in this model should have
+        # seasonal factors ordered L0, L1, L2, L3, and as a result we need to
+        # reverse the order of the computed initial seasonal factors from
+        # these methods.
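+        # Concretely, for `n_seasons = 4` the estimates arrive in time order
+        # as [s_1, s_2, s_3, s_4] (i.e. lags L3, L2, L1, L0), and the
+        # reversal below stores them as [s_4, s_3, s_2, s_1], matching the
+        # required L0, L1, L2, L3 ordering of the state vector.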
         methods = ['simple', 'heuristic']
-        if (self.initialization_method in methods and initial_seasonal is not
-            None):
+        if (self.initialization_method in methods
+                and initial_seasonal is not None):
             initial_seasonal = initial_seasonal[::-1]
+
         self._initial_level = initial_level
         self._initial_trend = initial_trend
         self._initial_seasonal = initial_seasonal
         self._initial_state = None
+
+        # Initialize now if possible (if we have a damped trend, then
+        # initialization will depend on the phi parameter, and so has to be
+        # done at each `update`)
         methods = ['simple', 'heuristic', 'known']
         if not self.damped_trend and self.initialization_method in methods:
-            self._initialize_constant_statespace(initial_level,
-                initial_trend, initial_seasonal)
+            self._initialize_constant_statespace(initial_level, initial_trend,
+                                                 initial_seasonal)
+
+        # Save keys for kwarg initialization
         self._init_keys += ['trend', 'damped_trend', 'seasonal',
-            'initialization_method', 'initial_level', 'initial_trend',
-            'initial_seasonal', 'bounds', 'concentrate_scale', 'dates', 'freq']
+                            'initialization_method', 'initial_level',
+                            'initial_trend', 'initial_seasonal', 'bounds',
+                            'concentrate_scale', 'dates', 'freq']
+
+    def _get_init_kwds(self):
+        kwds = super()._get_init_kwds()
+        kwds['seasonal'] = self.seasonal_periods
+        return kwds
+
+    @property
+    def _res_classes(self):
+        return {'fit': (ExponentialSmoothingResults,
+                        ExponentialSmoothingResultsWrapper)}
+
+    def clone(self, endog, exog=None, **kwargs):
+        if exog is not None:
+            raise NotImplementedError(
+                'ExponentialSmoothing does not support `exog`.')
+        return self._clone_from_init_kwds(endog, **kwargs)
+
+    @property
+    def state_names(self):
+        state_names = ['error', 'level']
+        if self.trend:
+            state_names += ['trend']
+        if self.seasonal:
+            state_names += (
+                ['seasonal'] + ['seasonal.L%d' % i
+                                for i in range(1, self.seasonal_periods)])
+
+        return state_names
+
+    @property
+    def param_names(self):
+        param_names = ['smoothing_level']
+        if self.trend:
+            param_names += ['smoothing_trend']
+        if self.seasonal:
+            param_names += ['smoothing_seasonal']
+        if self.damped_trend:
+            param_names += ['damping_trend']
+        if not self.concentrate_scale:
+            param_names += ['sigma2']
+
+        # Initialization
+        if self.initialization_method == 'estimated':
+            param_names += ['initial_level']
+            if self.trend:
+                param_names += ['initial_trend']
+            if self.seasonal:
+                param_names += (
+                    ['initial_seasonal']
+                    + ['initial_seasonal.L%d' % i
+                       for i in range(1, self.seasonal_periods - 1)])
+
+        return param_names
+
+    @property
+    def start_params(self):
+        # Make sure starting parameters aren't beyond or right on the bounds
+        bounds = [(x[0] + 1e-3, x[1] - 1e-3) for x in self.bounds]
+
+        # See Hyndman p.24
+        start_params = [np.clip(0.1, *bounds[0])]
+        if self.trend:
+            start_params += [np.clip(0.01, *bounds[1])]
+        if self.seasonal:
+            start_params += [np.clip(0.01, *bounds[2])]
+        if self.damped_trend:
+            start_params += [np.clip(0.98, *bounds[3])]
+        if not self.concentrate_scale:
+            start_params += [np.var(self.endog)]
+
+        # Initialization
+        if self.initialization_method == 'estimated':
+            initial_level, initial_trend, initial_seasonal = (
+                es_init._initialization_simple(
+                    self.endog[:, 0],
+                    trend='add' if self.trend else None,
+                    seasonal='add' if self.seasonal else None,
+                    seasonal_periods=self.seasonal_periods))
+            start_params += [initial_level]
+            if self.trend:
+                start_params += [initial_trend]
+            if self.seasonal:
+                start_params += initial_seasonal.tolist()[::-1][:-1]
+
+        return np.array(start_params)
+
+    @property
+    def k_params(self):
+        k_params = (
+            1 + int(self.trend) + int(self.seasonal) +
+            int(not self.concentrate_scale) + int(self.damped_trend))
+        if self.initialization_method == 'estimated':
+            k_params += (
+                1 + int(self.trend) +
+                int(self.seasonal) * (self._seasonal_periods - 1))
+        return k_params
+
+    def transform_params(self, unconstrained):
+        unconstrained = np.array(unconstrained, ndmin=1)
+        constrained = np.zeros_like(unconstrained)
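+
+        # Each bounded parameter below is mapped through a scaled logistic,
+        #     constrained = low + (high - low) / (1 + exp(-unconstrained)),
+        # so that any real-valued optimizer input lands strictly inside
+        # (low, high); `untransform_params` applies the inverse (logit) map.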
+
+        # Alpha in (0, 1)
+        low, high = self.bounds[0]
+        constrained[0] = (
+            1 / (1 + np.exp(-unconstrained[0])) * (high - low) + low)
+        i = 1
+
+        # Beta in (0, alpha)
+        if self.trend:
+            low, high = self.bounds[1]
+            high = min(high, constrained[0])
+            constrained[i] = (
+                1 / (1 + np.exp(-unconstrained[i])) * (high - low) + low)
+            i += 1
+
+        # Gamma in (0, 1 - alpha)
+        if self.seasonal:
+            low, high = self.bounds[2]
+            high = min(high, 1 - constrained[0])
+            constrained[i] = (
+                1 / (1 + np.exp(-unconstrained[i])) * (high - low) + low)
+            i += 1
+
+        # Phi in bounds (e.g. default is [0.8, 0.98])
+        if self.damped_trend:
+            low, high = self.bounds[3]
+            constrained[i] = (
+                1 / (1 + np.exp(-unconstrained[i])) * (high - low) + low)
+            i += 1
+
+        # sigma^2 positive
+        if not self.concentrate_scale:
+            constrained[i] = unconstrained[i]**2
+            i += 1
+
+        # Initial parameters are as-is
+        if self.initialization_method == 'estimated':
+            constrained[i:] = unconstrained[i:]
+
+        return constrained
+
+    def untransform_params(self, constrained):
+        constrained = np.array(constrained, ndmin=1)
+        unconstrained = np.zeros_like(constrained)
+
+        # Alpha in (0, 1)
+        low, high = self.bounds[0]
+        tmp = (constrained[0] - low) / (high - low)
+        unconstrained[0] = np.log(tmp / (1 - tmp))
+        i = 1
+
+        # Beta in (0, alpha)
+        if self.trend:
+            low, high = self.bounds[1]
+            high = min(high, constrained[0])
+            tmp = (constrained[i] - low) / (high - low)
+            unconstrained[i] = np.log(tmp / (1 - tmp))
+            i += 1
+
+        # Gamma in (0, 1 - alpha)
+        if self.seasonal:
+            low, high = self.bounds[2]
+            high = min(high, 1 - constrained[0])
+            tmp = (constrained[i] - low) / (high - low)
+            unconstrained[i] = np.log(tmp / (1 - tmp))
+            i += 1
+
+        # Phi in bounds (e.g. default is [0.8, 0.98])
+        if self.damped_trend:
+            low, high = self.bounds[3]
+            tmp = (constrained[i] - low) / (high - low)
+            unconstrained[i] = np.log(tmp / (1 - tmp))
+            i += 1
+
+        # sigma^2 positive
+        if not self.concentrate_scale:
+            unconstrained[i] = constrained[i]**0.5
+            i += 1
+
+        # Initial parameters are as-is
+        if self.initialization_method == 'estimated':
+            unconstrained[i:] = constrained[i:]
+
+        return unconstrained
+
+    def _initialize_constant_statespace(self, initial_level,
+                                        initial_trend=None,
+                                        initial_seasonal=None):
+        # Note: this should be run after `update` has already put any new
+        # parameters into the transition matrix, since it uses the transition
+        # matrix explicitly.
+
+        # Due to timing differences, the state space representation integrates
+        # the trend into the level in the "predicted_state" (only the
+        # "filtered_state" corresponds to the timing of the exponential
+        # smoothing models)
+
+        # Initial values are interpreted as "filtered" values
+        constant = np.array([0., initial_level])
+        if self.trend and initial_trend is not None:
+            constant = np.r_[constant, initial_trend]
+        if self.seasonal and initial_seasonal is not None:
+            constant = np.r_[constant, initial_seasonal]
+        self._initial_state = constant[1:]
+
+        # Apply the prediction step to get to what we need for our Kalman
+        # filter implementation
+        constant = np.dot(self.ssm['transition'], constant)
+
+        self.initialization.constant = constant
+
+    def _initialize_stationary_cov_statespace(self):
+        R = self.ssm['selection']
+        Q = self.ssm['state_cov']
+        self.initialization.stationary_cov = R.dot(Q).dot(R.T)
+
+    def update(self, params, transformed=True, includes_fixed=False,
+               complex_step=False):
+        params = self.handle_params(params, transformed=transformed,
+                                    includes_fixed=includes_fixed)
+
+        # State space system matrices
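+        # A single innovation drives every state: the error state receives
+        # weight 1 - alpha (less gamma when seasonal), the level alpha, the
+        # trend beta and the seasonal gamma, i.e. the single-source-of-error
+        # (innovations) form of the model.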
+        self.ssm['selection', 0, 0] = 1 - params[0]
+        self.ssm['selection', 1, 0] = params[0]
+        i = 1
+        if self.trend:
+            self.ssm['selection', 2, 0] = params[i]
+            i += 1
+        if self.seasonal:
+            self.ssm['selection', 0, 0] -= params[i]
+            self.ssm['selection', i + 1, 0] = params[i]
+            i += 1
+        if self.damped_trend:
+            self.ssm['transition', 1:3, 2] = params[i]
+            i += 1
+        if not self.concentrate_scale:
+            self.ssm['state_cov', 0, 0] = params[i]
+            i += 1
+
+        # State initialization
+        if self.initialization_method == 'estimated':
+            initial_level = params[i]
+            i += 1
+            initial_trend = None
+            initial_seasonal = None
+
+            if self.trend:
+                initial_trend = params[i]
+                i += 1
+            if self.seasonal:
+                initial_seasonal = params[i: i + self.seasonal_periods - 1]
+                initial_seasonal = np.r_[initial_seasonal,
+                                         0 - np.sum(initial_seasonal)]
+            self._initialize_constant_statespace(initial_level, initial_trend,
+                                                 initial_seasonal)
+
+        methods = ['simple', 'heuristic', 'known']
+        if self.damped_trend and self.initialization_method in methods:
+            self._initialize_constant_statespace(
+                self._initial_level, self._initial_trend,
+                self._initial_seasonal)
+
+        self._initialize_stationary_cov_statespace()
+
+    def _compute_concentrated_states(self, params, *args, **kwargs):
+        # Apply the usual filter, but keep forecasts
+        kwargs['conserve_memory'] = MEMORY_CONSERVE & ~MEMORY_NO_FORECAST
+        super().loglike(params, *args, **kwargs)
+
+        # Compute the initial state vector
+        y_tilde = np.array(self.ssm._kalman_filter.forecast_error[0],
+                           copy=True)
+
+        # Need to modify our state space system matrices slightly to get them
+        # back into the form of the innovations framework of
+        # De Livera et al. (2011)
+        T = self['transition', 1:, 1:]
+        R = self['selection', 1:]
+        Z = self['design', :, 1:].copy()
+        i = 1
+        if self.trend:
+            Z[0, i] = 1.
+            i += 1
+        if self.seasonal:
+            Z[0, i] = 0.
+            Z[0, -1] = 1.
+
+        # Now compute the regression components as described in
+        # De Livera et al. (2011), equation (10).
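+        # In that form the one-step-ahead errors satisfy (approximately)
+        # y_tilde_t = w_t x_0 + e_t, with w_1 = Z and w_{t+1} = w_t (T - R Z),
+        # so regressing y_tilde on the rows of w recovers the initial state.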
+        D = T - R.dot(Z)
+        w = np.zeros((self.nobs, self.k_states - 1), dtype=D.dtype)
+        w[0] = Z
+        for i in range(self.nobs - 1):
+            w[i + 1] = w[i].dot(D)
+        mod_ols = GLM(y_tilde, w)
+
+        # If we have seasonal parameters, constrain them to sum to zero
+        # (otherwise the initial level gets confounded with the sum of the
+        # seasonals).
+        if self.seasonal:
+            R = np.zeros_like(Z)
+            R[0, -self.seasonal_periods:] = 1.
+            q = np.zeros((1, 1))
+            res_ols = mod_ols.fit_constrained((R, q))
+        else:
+            res_ols = mod_ols.fit()
+
+        # Separate into individual components
+        initial_level = res_ols.params[0]
+        initial_trend = res_ols.params[1] if self.trend else None
+        initial_seasonal = (
+            res_ols.params[-self.seasonal_periods:] if self.seasonal else None)
+
+        return initial_level, initial_trend, initial_seasonal
+
+    @Appender(MLEModel.loglike.__doc__)
+    def loglike(self, params, *args, **kwargs):
+        if self.initialization_method == 'concentrated':
+            self._initialize_constant_statespace(
+                *self._compute_concentrated_states(params, *args, **kwargs))
+            llf = self.ssm.loglike()
+            self.ssm.initialization.constant = np.zeros(self.k_states)
+        else:
+            llf = super().loglike(params, *args, **kwargs)
+        return llf
+
+    @Appender(MLEModel.filter.__doc__)
+    def filter(self, params, cov_type=None, cov_kwds=None,
+               return_ssm=False, results_class=None,
+               results_wrapper_class=None, *args, **kwargs):
+        if self.initialization_method == 'concentrated':
+            self._initialize_constant_statespace(
+                *self._compute_concentrated_states(params, *args, **kwargs))
+
+        results = super().filter(
+            params, cov_type=cov_type, cov_kwds=cov_kwds,
+            return_ssm=return_ssm, results_class=results_class,
+            results_wrapper_class=results_wrapper_class, *args, **kwargs)
+
+        if self.initialization_method == 'concentrated':
+            self.ssm.initialization.constant = np.zeros(self.k_states)
+        return results
+
+    @Appender(MLEModel.smooth.__doc__)
+    def smooth(self, params, cov_type=None, cov_kwds=None,
+               return_ssm=False, results_class=None,
+               results_wrapper_class=None, *args, **kwargs):
+        if self.initialization_method == 'concentrated':
+            self._initialize_constant_statespace(
+                *self._compute_concentrated_states(params, *args, **kwargs))
+
+        results = super().smooth(
+            params, cov_type=cov_type, cov_kwds=cov_kwds,
+            return_ssm=return_ssm, results_class=results_class,
+            results_wrapper_class=results_wrapper_class, *args, **kwargs)
+
+        if self.initialization_method == 'concentrated':
+            self.ssm.initialization.constant = np.zeros(self.k_states)
+        return results


 class ExponentialSmoothingResults(MLEResults):
     """
     Results from fitting a linear exponential smoothing model
     """
-
-    def __init__(self, model, params, filter_results, cov_type=None, **kwargs):
+    def __init__(self, model, params, filter_results, cov_type=None,
+                 **kwargs):
         super().__init__(model, params, filter_results, cov_type, **kwargs)
+
+        # Save the states
         self.initial_state = model._initial_state
         if isinstance(self.data, PandasData):
             index = self.data.row_labels
-            self.initial_state = pd.DataFrame([model._initial_state],
-                columns=model.state_names[1:])
+            self.initial_state = pd.DataFrame(
+                [model._initial_state], columns=model.state_names[1:])
             if model._index_dates and model._index_freq is not None:
                 self.initial_state.index = index.shift(-1)[:1]

+    @Appender(MLEResults.summary.__doc__)
+    def summary(self, alpha=.05, start=None):
+        specification = ['A']
+        if self.model.trend and self.model.damped_trend:
+            specification.append('Ad')
+        elif self.model.trend:
+            specification.append('A')
+        else:
+            specification.append('N')
+        if self.model.seasonal:
+            specification.append('A')
+        else:
+            specification.append('N')
+
+        model_name = 'ETS(' + ', '.join(specification) + ')'
+
+        summary = super(ExponentialSmoothingResults, self).summary(
+            alpha=alpha, start=start, title='Exponential Smoothing Results',
+            model_name=model_name)
+
+        if self.model.initialization_method != 'estimated':
+            params = np.array(self.initial_state)
+            if params.ndim > 1:
+                params = params[0]
+            names = self.model.state_names[1:]
+            param_header = ['initialization method: %s'
+                            % self.model.initialization_method]
+            params_stubs = names
+            params_data = [[forg(params[i], prec=4)]
+                           for i in range(len(params))]
+
+            initial_state_table = SimpleTable(params_data,
+                                              param_header,
+                                              params_stubs,
+                                              txt_fmt=fmt_params)
+            summary.tables.insert(-1, initial_state_table)
+
+        return summary
+

 class ExponentialSmoothingResultsWrapper(MLEResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs, _attrs)
+    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs,
+                                   _attrs)
     _methods = {}
-    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods, _methods)
-
-
-wrap.populate_wrapper(ExponentialSmoothingResultsWrapper,
-    ExponentialSmoothingResults)
+    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods,
+                                     _methods)
+wrap.populate_wrapper(ExponentialSmoothingResultsWrapper,  # noqa:E305
+                      ExponentialSmoothingResults)
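
For orientation, here is a short usage sketch of the state space `ExponentialSmoothing`
class touched above; it is not part of the patch, and `y` is a purely synthetic series.
With a non-'estimated' initialization method, the summary produced by
`ExponentialSmoothingResults.summary` includes the extra initial-state table added here.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.exponential_smoothing import (
        ExponentialSmoothing)

    y = pd.Series(np.random.default_rng(0).normal(10., 1., 120),
                  index=pd.period_range('2000-01', periods=120, freq='M'))

    mod = ExponentialSmoothing(y, trend=True, damped_trend=True,
                               initialization_method='heuristic')
    res = mod.fit(disp=False)
    print(res.summary())   # ETS(A, Ad, N), with the initial-state table
    fcast = res.forecast(12)
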
diff --git a/statsmodels/tsa/statespace/initialization.py b/statsmodels/tsa/statespace/initialization.py
index e6713b6b6..f4a2e1094 100644
--- a/statsmodels/tsa/statespace/initialization.py
+++ b/statsmodels/tsa/statespace/initialization.py
@@ -5,12 +5,14 @@ Author: Chad Fulton
 License: Simplified-BSD
 """
 import warnings
+
 import numpy as np
+
 from . import tools


 class Initialization:
-    """
+    r"""
     State space initialization

     Parameters
@@ -31,14 +33,14 @@ class Initialization:

     .. math::

-        \\alpha_1 & = a + A \\delta + R_0 \\eta_0 \\\\
-        \\delta & \\sim N(0, \\kappa I), \\kappa \\to \\infty \\\\
-        \\eta_0 & \\sim N(0, Q_0)
+        \alpha_1 & = a + A \delta + R_0 \eta_0 \\
+        \delta & \sim N(0, \kappa I), \kappa \to \infty \\
+        \eta_0 & \sim N(0, Q_0)

     Thus the state vector can be initialized with a known constant part
     (elements of :math:`a`), with part modeled as a diffuse initial
-    distribution (as a part of :math:`\\delta`), and with a part modeled as a
-    known (proper) initial distribution (as a part of :math:`\\eta_0`).
+    distribution (as a part of :math:`\delta`), and with a part modeled as a
+    known (proper) initial distribution (as a part of :math:`\eta_0`).

     There are two important restrictions:

@@ -52,13 +54,13 @@ class Initialization:
        warning to be given, since it is not technically invalid but may
        indicate user error.

-    The :math:`\\eta_0` compoenent is also referred to as the stationary part
+    The :math:`\eta_0` component is also referred to as the stationary part
     because it is often set to the unconditional distribution of a stationary
     process.

     Initialization is specified for blocks (consecutive only, for now) of the
     state vector, with the entire state vector and individual elements as
-    special cases. Denote the block in question as :math:`\\alpha_1^{(i)}`. It
+    special cases. Denote the block in question as :math:`\alpha_1^{(i)}`. It
     can be initialized in the following ways:

     - 'known'
@@ -83,12 +85,12 @@ class Initialization:
     If a block is initialized as known, then a known (possibly degenerate)
     distribution is used; in particular, the block of states is understood to
     be distributed
-    :math:`\\alpha_1^{(i)} \\sim N(a^{(i)}, Q_0^{(i)})`. Here, is is possible to
+    :math:`\alpha_1^{(i)} \sim N(a^{(i)}, Q_0^{(i)})`. Here, it is possible to
     set :math:`a^{(i)} = 0`, and it is also possible that
     :math:`Q_0^{(i)}` is only positive-semidefinite; i.e.
-    :math:`\\alpha_1^{(i)}` may be degenerate. One particular example is
+    :math:`\alpha_1^{(i)}` may be degenerate. One particular example is
     that if the entire block's initial values are known, then
-    :math:`R_0^{(i)} = 0`, and so `Var(\\alpha_1^{(i)}) = 0`.
+    :math:`R_0^{(i)} = 0`, and so `Var(\alpha_1^{(i)}) = 0`.

     Here, `constant` must be provided (although it can be zeros), and
     `stationary_cov` is optional (by default it is a matrix of zeros).
@@ -96,17 +98,17 @@ class Initialization:
     **Diffuse**

     If a block is initialized as diffuse, then set
-    :math:`\\alpha_1^{(i)} \\sim N(a^{(i)}, \\kappa^{(i)} I)`. If the block is
+    :math:`\alpha_1^{(i)} \sim N(a^{(i)}, \kappa^{(i)} I)`. If the block is
     initialized using the exact diffuse initialization procedure, then it is
-    understood that :math:`\\kappa^{(i)} \\to \\infty`.
+    understood that :math:`\kappa^{(i)} \to \infty`.

     If the block is initialized using the approximate diffuse initialization
-    procedure, then `\\kappa^{(i)}` is set to some large value rather than
+    procedure, then `\kappa^{(i)}` is set to some large value rather than
     driven to infinity.

     In the approximate diffuse initialization case, it is possible, although
     unlikely, that a known constant value may have some effect on
-    initialization if :math:`\\kappa^{(i)}` is not set large enough.
+    initialization if :math:`\kappa^{(i)}` is not set large enough.

     Here, `constant` may be provided, and `approximate_diffuse_variance` may be
     provided.
@@ -115,7 +117,7 @@ class Initialization:

     If a block is initialized as stationary, then the block of states is
     understood to have the distribution
-    :math:`\\alpha_1^{(i)} \\sim N(a^{(i)}, Q_0^{(i)})`. :math:`a^{(i)}` is
+    :math:`\alpha_1^{(i)} \sim N(a^{(i)}, Q_0^{(i)})`. :math:`a^{(i)}` is
     the unconditional mean of the block, computed as
     :math:`(I - T^{(i)})^{-1} c_t`. :math:`Q_0^{(i)}` is the unconditional
     variance of the block, computed as the solution to the discrete Lyapunov
@@ -139,8 +141,8 @@ class Initialization:
     'known', 'diffuse', or 'stationary'.

     For a block of type mixed, suppose that it has `J` sub-blocks,
-    :math:`\\alpha_1^{(i,j)}`. Then
-    :math:`\\alpha_1^{(i)} = a^{(i)} + A^{(i)} \\delta + R_0^{(i)} \\eta_0^{(i)}`.
+    :math:`\alpha_1^{(i,j)}`. Then
+    :math:`\alpha_1^{(i)} = a^{(i)} + A^{(i)} \delta + R_0^{(i)} \eta_0^{(i)}`.

     Examples
     --------
@@ -177,29 +179,40 @@ class Initialization:
     """

     def __init__(self, k_states, initialization_type=None,
-        initialization_classes=None, approximate_diffuse_variance=1000000.0,
-        constant=None, stationary_cov=None):
+                 initialization_classes=None, approximate_diffuse_variance=1e6,
+                 constant=None, stationary_cov=None):
+        # Parameters
         self.k_states = k_states
+
+        # Attributes handling blocks of states with different initializations
         self._states = tuple(np.arange(k_states))
         self._initialization = np.array([None] * k_states)
         self.blocks = {}
+
+        # Attributes handling initialization of the entire set of states
+        # `constant` is a vector of constant values (i.e. it is the vector
+        # a from DK)
         self.initialization_type = None
         self.constant = np.zeros(self.k_states)
         self.stationary_cov = np.zeros((self.k_states, self.k_states))
         self.approximate_diffuse_variance = approximate_diffuse_variance
-        self.prefix_initialization_map = (initialization_classes if 
-            initialization_classes is not None else tools.
-            prefix_initialization_map.copy())
+
+        # Cython interface attributes
+        self.prefix_initialization_map = (
+            initialization_classes if initialization_classes is not None
+            else tools.prefix_initialization_map.copy())
         self._representations = {}
         self._initializations = {}
+
+        # If given a global initialization, use it now
         if initialization_type is not None:
             self.set(None, initialization_type, constant=constant,
-                stationary_cov=stationary_cov)
+                     stationary_cov=stationary_cov)

     @classmethod
-    def from_components(cls, k_states, a=None, Pstar=None, Pinf=None, A=
-        None, R0=None, Q0=None):
-        """
+    def from_components(cls, k_states, a=None, Pstar=None, Pinf=None, A=None,
+                        R0=None, Q0=None):
+        r"""
         Construct initialization object from component matrices

         Parameters
@@ -260,14 +273,125 @@ class Initialization:
            Time Series Analysis by State Space Methods: Second Edition.
            Oxford University Press.
         """
-        pass
+
+        # Standardize the input
+        a = tools._atleast_1d(a)
+        Pstar, Pinf, A, R0, Q0 = tools._atleast_2d(Pstar, Pinf, A, R0, Q0)
+
+        # Validate the diffuse component
+        if Pstar is not None and (R0 is not None or Q0 is not None):
+            raise ValueError('Cannot specify the initial state covariance both'
+                             ' as `Pstar` and as the components R0 and Q0'
+                             ' (because `Pstar` is defined such that'
+                             " `Pstar=R0 Q0 R0'`).")
+        if Pinf is not None and A is not None:
+            raise ValueError('Cannot specify both the diffuse covariance'
+                             ' matrix `Pinf` and the selection matrix for'
+                             ' diffuse elements, A, (because Pinf is defined'
+                             " such that `Pinf=A A'`).")
+        elif A is not None:
+            Pinf = np.dot(A, A.T)
+
+        # Validate the non-diffuse component
+        if a is None:
+            a = np.zeros(k_states)
+        if len(a) != k_states:
+            raise ValueError('Must provide constant initialization vector for'
+                             ' the entire state vector.')
+        if R0 is not None or Q0 is not None:
+            if R0 is None or Q0 is None:
+                raise ValueError('If specifying either of R0 or Q0 then you'
+                                 ' must specify both R0 and Q0.')
+            Pstar = R0.dot(Q0).dot(R0.T)
+
+        # Handle the diffuse component
+        diffuse_ix = []
+        if Pinf is not None:
+            diffuse_ix = np.where(np.diagonal(Pinf))[0].tolist()
+
+            if Pstar is not None:
+                for i in diffuse_ix:
+                    if not (np.all(Pstar[i] == 0) and
+                            np.all(Pstar[:, i] == 0)):
+                        raise ValueError(f'The state at position {i} was'
+                                         ' specified as diffuse in Pinf, but'
+                                         ' also contains a non-diffuse'
+                                         ' diagonal or off-diagonal in Pstar.')
+        k_diffuse_states = len(diffuse_ix)
+
+        nondiffuse_ix = [i for i in np.arange(k_states) if i not in diffuse_ix]
+        k_nondiffuse_states = k_states - k_diffuse_states
+
+        # If there are non-diffuse states, require Pstar
+        if Pstar is None and k_nondiffuse_states > 0:
+            raise ValueError('Must provide initial covariance matrix for'
+                             ' non-diffuse states.')
+
+        # Construct the initialization
+        init = cls(k_states)
+        if nondiffuse_ix:
+            nondiffuse_groups = np.split(
+                nondiffuse_ix, np.where(np.diff(nondiffuse_ix) != 1)[0] + 1)
+        else:
+            nondiffuse_groups = []
+        for group in nondiffuse_groups:
+            s = slice(group[0], group[-1] + 1)
+            init.set(s, 'known', constant=a[s], stationary_cov=Pstar[s, s])
+        for i in diffuse_ix:
+            init.set(i, 'diffuse')
+
+        return init
+
+    @classmethod
+    def from_results(cls, filter_results):
+        a = filter_results.initial_state
+        Pstar = filter_results.initial_state_cov
+        Pinf = filter_results.initial_diffuse_state_cov
+
+        return cls.from_components(filter_results.model.k_states,
+                                   a=a, Pstar=Pstar, Pinf=Pinf)

     def __setitem__(self, index, initialization_type):
         self.set(index, initialization_type)

-    def set(self, index, initialization_type, constant=None, stationary_cov
-        =None, approximate_diffuse_variance=None):
-        """
+    def _initialize_initialization(self, prefix):
+        dtype = tools.prefix_dtype_map[prefix]
+
+        # If the dtype-specific representation matrices do not exist, create
+        # them
+        if prefix not in self._representations:
+            # Copy the statespace representation matrices
+            self._representations[prefix] = {
+                'constant': self.constant.astype(dtype),
+                'stationary_cov': np.asfortranarray(
+                    self.stationary_cov.astype(dtype)),
+            }
+        # If they do exist, update them
+        else:
+            self._representations[prefix]['constant'][:] = (
+                self.constant.astype(dtype)[:])
+            self._representations[prefix]['stationary_cov'][:] = (
+                self.stationary_cov.astype(dtype)[:])
+
+        # Create if necessary
+        if prefix not in self._initializations:
+            # Setup the base statespace object
+            cls = self.prefix_initialization_map[prefix]
+            self._initializations[prefix] = cls(
+                self.k_states, self._representations[prefix]['constant'],
+                self._representations[prefix]['stationary_cov'],
+                self.approximate_diffuse_variance)
+        # Otherwise update
+        else:
+            self._initializations[prefix].approximate_diffuse_variance = (
+                self.approximate_diffuse_variance)
+
+        return prefix, dtype
+
+    def set(self, index, initialization_type, constant=None,
+            stationary_cov=None, approximate_diffuse_variance=None):
+        r"""
         Set initialization for states, either globally or for a block

         Parameters
@@ -290,11 +414,136 @@ class Initialization:
             The covariance matrix of the stationary part, denoted :math:`Q_0`.
             Only used with 'known' initialization.
         approximate_diffuse_variance : float, optional
-            The approximate diffuse variance, denoted :math:`\\kappa`. Only
+            The approximate diffuse variance, denoted :math:`\kappa`. Only
             applicable with 'approximate_diffuse' initialization. Default is
             1e6.
         """
-        pass
+        # Construct the index, using a slice object as an intermediate step
+        # to enforce regularity
+        if not isinstance(index, slice):
+            if isinstance(index, (int, np.integer)):
+                index = int(index)
+                if index < 0 or index >= self.k_states:
+                    raise ValueError('Invalid index.')
+                index = (index, index + 1)
+            elif index is None:
+                index = (index,)
+            elif not isinstance(index, tuple):
+                raise ValueError('Invalid index.')
+            if len(index) > 2:
+                raise ValueError('Cannot include a slice step in `index`.')
+            index = slice(*index)
+        index = self._states[index]
+
+        # Compatibility with zero-length slices (can make it easier to set up
+        # initialization without lots of if statements)
+        if len(index) == 0:
+            return
+
+        # Make sure that we are not setting a block when global initialization
+        # was previously set
+        if self.initialization_type is not None and not index == self._states:
+            raise ValueError('Cannot set initialization for the block of'
+                             ' states %s because initialization was'
+                             ' previously performed globally. You must either'
+                             ' re-initialize globally or'
+                             ' else unset the global initialization before'
+                             ' initializing specific blocks of states.'
+                             % str(index))
+        # Make sure that we are not setting a block that *overlaps* with
+        # another block (although we are free to *replace* an entire block)
+        uninitialized = np.equal(self._initialization[index, ], None)
+        if index not in self.blocks and not np.all(uninitialized):
+            raise ValueError('Cannot set initialization for the state(s) %s'
+                             ' because they are a subset of a previously'
+                             ' initialized block. You must either'
+                             ' re-initialize the entire block as a whole or'
+                             ' else unset the entire block before'
+                             ' re-initializing the subset.'
+                             % str(np.array(index)[~uninitialized]))
+
+        # If setting for all states, set this object's initialization
+        # attributes
+        k_states = len(index)
+        if k_states == self.k_states:
+            self.initialization_type = initialization_type
+
+            # General validation
+            if (approximate_diffuse_variance is not None and
+                    not initialization_type == 'approximate_diffuse'):
+                raise ValueError('`approximate_diffuse_variance` can only be'
+                                 ' provided when using approximate diffuse'
+                                 ' initialization.')
+            if (stationary_cov is not None and
+                    not initialization_type == 'known'):
+                raise ValueError('`stationary_cov` can only be provided when'
+                                 ' using known initialization.')
+
+            # Specific initialization handling
+            if initialization_type == 'known':
+                # Make sure we were given some known initialization
+                if constant is None and stationary_cov is None:
+                    raise ValueError('Must specify either the constant vector'
+                                     ' or the stationary covariance matrix'
+                                     ' (or both) if using known'
+                                     ' initialization.')
+                # Defaults
+                if stationary_cov is None:
+                    stationary_cov = np.zeros((k_states, k_states))
+                else:
+                    stationary_cov = np.array(stationary_cov)
+
+                # Validate
+                if not stationary_cov.shape == (k_states, k_states):
+                    raise ValueError('Invalid stationary covariance matrix;'
+                                     ' given shape %s but require shape %s.'
+                                     % (str(stationary_cov.shape),
+                                        str((k_states, k_states))))
+
+                # Set values
+                self.stationary_cov = stationary_cov
+            elif initialization_type == 'diffuse':
+                if constant is not None:
+                    warnings.warn('Constant values provided, but they are'
+                                  ' ignored in exact diffuse initialization.')
+            elif initialization_type == 'approximate_diffuse':
+                if approximate_diffuse_variance is not None:
+                    self.approximate_diffuse_variance = (
+                        approximate_diffuse_variance)
+            elif initialization_type == 'stationary':
+                if constant is not None:
+                    raise ValueError('Constant values cannot be provided for'
+                                     ' stationary initialization.')
+            else:
+                raise ValueError('Invalid initialization type.')
+
+            # Handle constant
+            if constant is None:
+                constant = np.zeros(k_states)
+            else:
+                constant = np.array(constant)
+            if not constant.shape == (k_states,):
+                raise ValueError('Invalid constant vector; given shape %s'
+                                 ' but require shape %s.'
+                                 % (str(constant.shape), str((k_states,))))
+            self.constant = constant
+        # Otherwise, if setting a sub-block, construct the new initialization
+        # object
+        else:
+            if isinstance(initialization_type, Initialization):
+                init = initialization_type
+            else:
+                if approximate_diffuse_variance is None:
+                    approximate_diffuse_variance = (
+                        self.approximate_diffuse_variance)
+                init = Initialization(
+                    k_states, initialization_type, constant=constant,
+                    stationary_cov=stationary_cov,
+                    approximate_diffuse_variance=approximate_diffuse_variance)
+
+            self.blocks[index] = init
+            for i in index:
+                self._initialization[i] = index

     def unset(self, index):
         """
@@ -316,18 +565,65 @@ class Initialization:
         initialization. To unset all initializations (including both global and
         block level), use the `clear` method.
         """
-        pass
+        if isinstance(index, (int, np.integer)):
+            index = int(index)
+            if index < 0 or index > self.k_states:
+                raise ValueError('Invalid index.')
+            index = (index, index + 1)
+        elif index is None:
+            index = (index,)
+        elif not isinstance(index, tuple):
+            raise ValueError('Invalid index.')
+        if len(index) > 2:
+            raise ValueError('Cannot include a slice step in `index`.')
+        index = self._states[slice(*index)]
+
+        # Compatibility with zero-length slices (can make it easier to set up
+        # initialization without lots of if statements)
+        if len(index) == 0:
+            return
+
+        # Unset the values
+        k_states = len(index)
+        if k_states == self.k_states and self.initialization_type is not None:
+            self.initialization_type = None
+            self.constant[:] = 0
+            self.stationary_cov[:] = 0
+        elif index in self.blocks:
+            for i in index:
+                self._initialization[i] = None
+            del self.blocks[index]
+        else:
+            raise ValueError('The given index does not correspond to a'
+                             ' previously initialized block.')

     def clear(self):
         """
         Clear all previously set initializations, either global or block level
         """
-        pass
+        # Clear initializations
+        for i in self._states:
+            self._initialization[i] = None
+
+        # Delete block initializations
+        keys = list(self.blocks.keys())
+        for key in keys:
+            del self.blocks[key]
+
+        # Clear global attributes
+        self.initialization_type = None
+        self.constant[:] = 0
+        self.stationary_cov[:] = 0
+
+    @property
+    def initialized(self):
+        return not (self.initialization_type is None and
+                    np.any(np.equal(self._initialization, None)))

     def __call__(self, index=None, model=None, initial_state_mean=None,
-        initial_diffuse_state_cov=None, initial_stationary_state_cov=None,
-        complex_step=False):
-        """
+                 initial_diffuse_state_cov=None,
+                 initial_stationary_state_cov=None, complex_step=False):
+        r"""
         Construct initialization representation

         Parameters
@@ -355,7 +651,7 @@ class Initialization:
             Initial state mean, :math:`a_1^{(0)} = a`
         initial_diffuse_state_cov : ndarray
             Diffuse component of initial state covariance matrix,
-            :math:`P_\\infty = A A'`
+            :math:`P_\infty = A A'`
         initial_stationary_state_cov : ndarray
             Stationary component of initial state covariance matrix,
             :math:`P_* = R_0 Q_0 R_0'`
@@ -366,11 +662,13 @@ class Initialization:
         of states, then either `model` or all of `state_intercept`,
         `transition`, `selection`, and `state_cov` must be provided.
         """
-        if self.initialization_type is None and np.any(np.equal(self.
-            _initialization, None)):
-            raise ValueError(
-                'Cannot construct initialization representation because not all states have been initialized.'
-                )
+        # Check that all states are initialized somehow
+        if (self.initialization_type is None and
+                np.any(np.equal(self._initialization, None))):
+            raise ValueError('Cannot construct initialization representation'
+                             ' because not all states have been initialized.')
+
+        # Setup indexes
         if index is None:
             index = self._states
             ix1 = np.s_[:]
@@ -378,58 +676,80 @@ class Initialization:
         else:
             ix1 = np.s_[index[0]:index[-1] + 1]
             ix2 = np.ix_(index, index)
+
+        # Retrieve state_intercept, etc. if `model` was given
         if model is not None:
             state_intercept = model['state_intercept', ix1, 0]
             transition = model[('transition',) + ix2 + (0,)]
             selection = model['selection', ix1, :, 0]
             state_cov = model['state_cov', :, :, 0]
             selected_state_cov = np.dot(selection, state_cov).dot(selection.T)
+
+        # Create output arrays if not given
         if initial_state_mean is None:
             initial_state_mean = np.zeros(self.k_states)
-        cov_shape = self.k_states, self.k_states
+        cov_shape = (self.k_states, self.k_states)
         if initial_diffuse_state_cov is None:
             initial_diffuse_state_cov = np.zeros(cov_shape)
         if initial_stationary_state_cov is None:
             initial_stationary_state_cov = np.zeros(cov_shape)
+
+        # If using global initialization, compute the actual elements and
+        # return them
         if self.initialization_type is not None:
             eye = np.eye(self.k_states)
             zeros = np.zeros((self.k_states, self.k_states))
+
+            # General validation
             if self.initialization_type == 'stationary' and model is None:
-                raise ValueError(
-                    'Stationary initialization requires passing either the `model` argument or all of the individual transition equation arguments.'
-                    )
+                raise ValueError('Stationary initialization requires passing'
+                                 ' either the `model` argument or all of the'
+                                 ' individual transition equation arguments.')
             if self.initialization_type == 'stationary':
+                # TODO performance
                 eigvals = np.linalg.eigvals(transition)
-                threshold = 1.0 - 1e-10
+                threshold = 1. - 1e-10
                 if not np.max(np.abs(eigvals)) < threshold:
-                    raise ValueError(
-                        'Transition equation is not stationary, and so stationary initialization cannot be used.'
-                        )
+                    raise ValueError('Transition equation is not stationary,'
+                                     ' and so stationary initialization cannot'
+                                     ' be used.')
+
+            # Set the initial state mean
             if self.initialization_type == 'stationary':
+                # TODO performance
                 initial_state_mean[ix1] = np.linalg.solve(eye - transition,
-                    state_intercept)
+                                                          state_intercept)
             else:
                 initial_state_mean[ix1] = self.constant
+
+            # Set the diffuse component
             if self.initialization_type == 'diffuse':
                 initial_diffuse_state_cov[ix2] = np.eye(self.k_states)
             else:
                 initial_diffuse_state_cov[ix2] = zeros
+
+            # Set the stationary component
             if self.initialization_type == 'known':
                 initial_stationary_state_cov[ix2] = self.stationary_cov
             elif self.initialization_type == 'diffuse':
                 initial_stationary_state_cov[ix2] = zeros
             elif self.initialization_type == 'approximate_diffuse':
-                initial_stationary_state_cov[ix2
-                    ] = eye * self.approximate_diffuse_variance
+                initial_stationary_state_cov[ix2] = (
+                    eye * self.approximate_diffuse_variance)
             elif self.initialization_type == 'stationary':
-                initial_stationary_state_cov[ix2
-                    ] = tools.solve_discrete_lyapunov(transition,
-                    selected_state_cov, complex_step=complex_step)
+                # TODO performance
+                initial_stationary_state_cov[ix2] = (
+                    tools.solve_discrete_lyapunov(transition,
+                                                  selected_state_cov,
+                                                  complex_step=complex_step))
         else:
+            # Otherwise, if using blocks, recursively initialize
+            # them (values will be set in-place)
             for block_index, init in self.blocks.items():
-                init(index=tuple(np.array(index)[block_index,]), model=
-                    model, initial_state_mean=initial_state_mean,
-                    initial_diffuse_state_cov=initial_diffuse_state_cov,
-                    initial_stationary_state_cov=initial_stationary_state_cov)
+                init(index=tuple(np.array(index)[block_index, ]),
+                     model=model, initial_state_mean=initial_state_mean,
+                     initial_diffuse_state_cov=initial_diffuse_state_cov,
+                     initial_stationary_state_cov=initial_stationary_state_cov)
+
         return (initial_state_mean, initial_diffuse_state_cov,
-            initial_stationary_state_cov)
+                initial_stationary_state_cov)
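
For orientation, here is a minimal usage sketch of the `Initialization` API completed
above (the block sizes and any model variable such as `ssm` are illustrative
assumptions, not taken from the patch):

    import numpy as np
    from statsmodels.tsa.statespace.initialization import Initialization

    init = Initialization(3)                  # three states, nothing set yet
    init.set((0, 1), 'diffuse')               # first state: exact diffuse block
    init.set((1, 3), 'known', constant=np.zeros(2),
             stationary_cov=np.eye(2))        # remaining block: known
    a1, Pinf, Pstar = init()                  # a_1, P_inf, P_star arrays
    # A 'stationary' block would also require the bound model: init(model=ssm)
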
diff --git a/statsmodels/tsa/statespace/kalman_filter.py b/statsmodels/tsa/statespace/kalman_filter.py
index 7d3f4ac89..c9981bab9 100644
--- a/statsmodels/tsa/statespace/kalman_filter.py
+++ b/statsmodels/tsa/statespace/kalman_filter.py
@@ -4,52 +4,61 @@ State Space Representation and Kalman Filter
 Author: Chad Fulton
 License: Simplified-BSD
 """
+
 import contextlib
 from warnings import warn
+
 import numpy as np
 from .representation import OptionWrapper, Representation, FrozenRepresentation
 from .tools import reorder_missing_matrix, reorder_missing_vector
 from . import tools
 from statsmodels.tools.sm_exceptions import ValueWarning
-FILTER_CONVENTIONAL = 1
-FILTER_EXACT_INITIAL = 2
-FILTER_AUGMENTED = 4
-FILTER_SQUARE_ROOT = 8
-FILTER_UNIVARIATE = 16
-FILTER_COLLAPSED = 32
-FILTER_EXTENDED = 64
-FILTER_UNSCENTED = 128
-FILTER_CONCENTRATED = 256
-FILTER_CHANDRASEKHAR = 512
-INVERT_UNIVARIATE = 1
-SOLVE_LU = 2
-INVERT_LU = 4
-SOLVE_CHOLESKY = 8
-INVERT_CHOLESKY = 16
-STABILITY_FORCE_SYMMETRY = 1
+
+# Define constants
+FILTER_CONVENTIONAL = 0x01     # Durbin and Koopman (2012), Chapter 4
+FILTER_EXACT_INITIAL = 0x02    # ibid., Chapter 5.6
+FILTER_AUGMENTED = 0x04        # ibid., Chapter 5.7
+FILTER_SQUARE_ROOT = 0x08      # ibid., Chapter 6.3
+FILTER_UNIVARIATE = 0x10       # ibid., Chapter 6.4
+FILTER_COLLAPSED = 0x20        # ibid., Chapter 6.5
+FILTER_EXTENDED = 0x40         # ibid., Chapter 10.2
+FILTER_UNSCENTED = 0x80        # ibid., Chapter 10.3
+FILTER_CONCENTRATED = 0x100    # Harvey (1989), Chapter 3.4
+FILTER_CHANDRASEKHAR = 0x200   # Herbst (2015)
+
+INVERT_UNIVARIATE = 0x01
+SOLVE_LU = 0x02
+INVERT_LU = 0x04
+SOLVE_CHOLESKY = 0x08
+INVERT_CHOLESKY = 0x10
+
+STABILITY_FORCE_SYMMETRY = 0x01
+
 MEMORY_STORE_ALL = 0
-MEMORY_NO_FORECAST_MEAN = 1
-MEMORY_NO_FORECAST_COV = 2
+MEMORY_NO_FORECAST_MEAN = 0x01
+MEMORY_NO_FORECAST_COV = 0x02
 MEMORY_NO_FORECAST = MEMORY_NO_FORECAST_MEAN | MEMORY_NO_FORECAST_COV
-MEMORY_NO_PREDICTED_MEAN = 4
-MEMORY_NO_PREDICTED_COV = 8
+MEMORY_NO_PREDICTED_MEAN = 0x04
+MEMORY_NO_PREDICTED_COV = 0x08
 MEMORY_NO_PREDICTED = MEMORY_NO_PREDICTED_MEAN | MEMORY_NO_PREDICTED_COV
-MEMORY_NO_FILTERED_MEAN = 16
-MEMORY_NO_FILTERED_COV = 32
+MEMORY_NO_FILTERED_MEAN = 0x10
+MEMORY_NO_FILTERED_COV = 0x20
 MEMORY_NO_FILTERED = MEMORY_NO_FILTERED_MEAN | MEMORY_NO_FILTERED_COV
-MEMORY_NO_LIKELIHOOD = 64
-MEMORY_NO_GAIN = 128
-MEMORY_NO_SMOOTHING = 256
-MEMORY_NO_STD_FORECAST = 512
-MEMORY_CONSERVE = (MEMORY_NO_FORECAST_COV | MEMORY_NO_PREDICTED |
-    MEMORY_NO_FILTERED | MEMORY_NO_LIKELIHOOD | MEMORY_NO_GAIN |
-    MEMORY_NO_SMOOTHING)
+MEMORY_NO_LIKELIHOOD = 0x40
+MEMORY_NO_GAIN = 0x80
+MEMORY_NO_SMOOTHING = 0x100
+MEMORY_NO_STD_FORECAST = 0x200
+MEMORY_CONSERVE = (
+    MEMORY_NO_FORECAST_COV | MEMORY_NO_PREDICTED | MEMORY_NO_FILTERED |
+    MEMORY_NO_LIKELIHOOD | MEMORY_NO_GAIN | MEMORY_NO_SMOOTHING
+)
+
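+# Note (illustrative): the FILTER_*, INVERT_*, STABILITY_* and MEMORY_*
+# constants above are bit flags, so options combine with bitwise OR, e.g.
+#     FILTER_CONVENTIONAL | FILTER_UNIVARIATE == 0x11 == 17
+# which is the `filter_method` value shown in the doctests further below;
+# MEMORY_CONSERVE above is itself such a combination of MEMORY_NO_* flags.
+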
 TIMING_INIT_PREDICTED = 0
 TIMING_INIT_FILTERED = 1


 class KalmanFilter(Representation):
-    """
+    r"""
     State space representation of a time series process, with Kalman filter

     Parameters
@@ -127,10 +136,14 @@ class KalmanFilter(Representation):
     set, then the Cholesky decomposition method would *always* be used, even in
     the case of 1-dimensional data.
     """
-    filter_methods = ['filter_conventional', 'filter_exact_initial',
-        'filter_augmented', 'filter_square_root', 'filter_univariate',
-        'filter_collapsed', 'filter_extended', 'filter_unscented',
-        'filter_concentrated', 'filter_chandrasekhar']
+
+    filter_methods = [
+        'filter_conventional', 'filter_exact_initial', 'filter_augmented',
+        'filter_square_root', 'filter_univariate', 'filter_collapsed',
+        'filter_extended', 'filter_unscented', 'filter_concentrated',
+        'filter_chandrasekhar'
+    ]
+
     filter_conventional = OptionWrapper('filter_method', FILTER_CONVENTIONAL)
     """
     (bool) Flag for conventional Kalman filtering.
@@ -171,8 +184,12 @@ class KalmanFilter(Representation):
     """
     (bool) Flag for filtering with Chandrasekhar recursions.
     """
-    inversion_methods = ['invert_univariate', 'solve_lu', 'invert_lu',
-        'solve_cholesky', 'invert_cholesky']
+
+    inversion_methods = [
+        'invert_univariate', 'solve_lu', 'invert_lu', 'solve_cholesky',
+        'invert_cholesky'
+    ]
+
     invert_univariate = OptionWrapper('inversion_method', INVERT_UNIVARIATE)
     """
     (bool) Flag for univariate inversion method (recommended).
@@ -193,76 +210,111 @@ class KalmanFilter(Representation):
     """
     (bool) Flag for Cholesky inversion method.
     """
+
     stability_methods = ['stability_force_symmetry']
-    stability_force_symmetry = OptionWrapper('stability_method',
-        STABILITY_FORCE_SYMMETRY)
+
+    stability_force_symmetry = (
+        OptionWrapper('stability_method', STABILITY_FORCE_SYMMETRY)
+    )
     """
     (bool) Flag for enforcing covariance matrix symmetry
     """
-    memory_options = ['memory_store_all', 'memory_no_forecast_mean',
+
+    memory_options = [
+        'memory_store_all', 'memory_no_forecast_mean',
         'memory_no_forecast_cov', 'memory_no_forecast',
         'memory_no_predicted_mean', 'memory_no_predicted_cov',
         'memory_no_predicted', 'memory_no_filtered_mean',
         'memory_no_filtered_cov', 'memory_no_filtered',
-        'memory_no_likelihood', 'memory_no_gain', 'memory_no_smoothing',
-        'memory_no_std_forecast', 'memory_conserve']
+        'memory_no_likelihood', 'memory_no_gain',
+        'memory_no_smoothing', 'memory_no_std_forecast', 'memory_conserve'
+    ]
+
     memory_store_all = OptionWrapper('conserve_memory', MEMORY_STORE_ALL)
     """
     (bool) Flag for storing all intermediate results in memory (default).
     """
-    memory_no_forecast_mean = OptionWrapper('conserve_memory',
-        MEMORY_NO_FORECAST_MEAN)
+    memory_no_forecast_mean = OptionWrapper(
+        'conserve_memory', MEMORY_NO_FORECAST_MEAN)
     """
     (bool) Flag to prevent storing forecasts and forecast errors.
     """
-    memory_no_forecast_cov = OptionWrapper('conserve_memory',
-        MEMORY_NO_FORECAST_COV)
+    memory_no_forecast_cov = OptionWrapper(
+        'conserve_memory', MEMORY_NO_FORECAST_COV)
     """
     (bool) Flag to prevent storing forecast error covariance matrices.
     """
-
     @property
     def memory_no_forecast(self):
         """
         (bool) Flag to prevent storing all forecast-related output.
         """
-        pass
-    memory_no_predicted_mean = OptionWrapper('conserve_memory',
-        MEMORY_NO_PREDICTED_MEAN)
+        return self.memory_no_forecast_mean or self.memory_no_forecast_cov
+
+    @memory_no_forecast.setter
+    def memory_no_forecast(self, value):
+        if bool(value):
+            self.memory_no_forecast_mean = True
+            self.memory_no_forecast_cov = True
+        else:
+            self.memory_no_forecast_mean = False
+            self.memory_no_forecast_cov = False
+
+    memory_no_predicted_mean = OptionWrapper(
+        'conserve_memory', MEMORY_NO_PREDICTED_MEAN)
     """
     (bool) Flag to prevent storing predicted states.
     """
-    memory_no_predicted_cov = OptionWrapper('conserve_memory',
-        MEMORY_NO_PREDICTED_COV)
+    memory_no_predicted_cov = OptionWrapper(
+        'conserve_memory', MEMORY_NO_PREDICTED_COV)
     """
     (bool) Flag to prevent storing predicted state covariance matrices.
     """
-
     @property
     def memory_no_predicted(self):
         """
         (bool) Flag to prevent storing predicted state and covariance matrices.
         """
-        pass
-    memory_no_filtered_mean = OptionWrapper('conserve_memory',
-        MEMORY_NO_FILTERED_MEAN)
+        return self.memory_no_predicted_mean or self.memory_no_predicted_cov
+
+    @memory_no_predicted.setter
+    def memory_no_predicted(self, value):
+        if bool(value):
+            self.memory_no_predicted_mean = True
+            self.memory_no_predicted_cov = True
+        else:
+            self.memory_no_predicted_mean = False
+            self.memory_no_predicted_cov = False
+
+    memory_no_filtered_mean = OptionWrapper(
+        'conserve_memory', MEMORY_NO_FILTERED_MEAN)
     """
     (bool) Flag to prevent storing filtered states.
     """
-    memory_no_filtered_cov = OptionWrapper('conserve_memory',
-        MEMORY_NO_FILTERED_COV)
+    memory_no_filtered_cov = OptionWrapper(
+        'conserve_memory', MEMORY_NO_FILTERED_COV)
     """
     (bool) Flag to prevent storing filtered state covariance matrices.
     """
-
     @property
     def memory_no_filtered(self):
         """
         (bool) Flag to prevent storing filtered state and covariance matrices.
         """
-        pass
-    memory_no_likelihood = OptionWrapper('conserve_memory',
-        MEMORY_NO_LIKELIHOOD)
+        return self.memory_no_filtered_mean or self.memory_no_filtered_cov
+
+    @memory_no_filtered.setter
+    def memory_no_filtered(self, value):
+        if bool(value):
+            self.memory_no_filtered_mean = True
+            self.memory_no_filtered_cov = True
+        else:
+            self.memory_no_filtered_mean = False
+            self.memory_no_filtered_cov = False
+
+    memory_no_likelihood = (
+        OptionWrapper('conserve_memory', MEMORY_NO_LIKELIHOOD)
+    )
     """
     (bool) Flag to prevent storing likelihood values for each observation.
     """
@@ -274,8 +326,8 @@ class KalmanFilter(Representation):
     """
     (bool) Flag to prevent storing likelihood values for each observation.
     """
-    memory_no_std_forecast = OptionWrapper('conserve_memory',
-        MEMORY_NO_STD_FORECAST)
+    memory_no_std_forecast = (
+        OptionWrapper('conserve_memory', MEMORY_NO_STD_FORECAST))
     """
     (bool) Flag to prevent storing standardized forecast errors.
     """
@@ -283,9 +335,12 @@ class KalmanFilter(Representation):
     """
     (bool) Flag to conserve the maximum amount of memory.
     """
-    timing_options = ['timing_init_predicted', 'timing_init_filtered']
+
+    timing_options = [
+        'timing_init_predicted', 'timing_init_filtered'
+    ]
     timing_init_predicted = OptionWrapper('filter_timing',
-        TIMING_INIT_PREDICTED)
+                                          TIMING_INIT_PREDICTED)
     """
     (bool) Flag for the default timing convention (Durbin and Koopman, 2012).
     """
@@ -293,6 +348,8 @@ class KalmanFilter(Representation):
     """
     (bool) Flag for the alternate timing convention (Kim and Nelson, 2012).
     """
+
+    # Default filter options
     filter_method = FILTER_CONVENTIONAL
     """
     (int) Filtering method bitmask.
@@ -314,43 +371,153 @@ class KalmanFilter(Representation):
     (int) Filter timing.
     """

-    def __init__(self, k_endog, k_states, k_posdef=None, loglikelihood_burn
-        =0, tolerance=1e-19, results_class=None, kalman_filter_classes=None,
-        **kwargs):
+    def __init__(self, k_endog, k_states, k_posdef=None,
+                 loglikelihood_burn=0, tolerance=1e-19, results_class=None,
+                 kalman_filter_classes=None, **kwargs):
+        # Extract keyword arguments to-be-used later
         keys = ['filter_method'] + KalmanFilter.filter_methods
-        filter_method_kwargs = {key: kwargs.pop(key) for key in keys if key in
-            kwargs}
+        filter_method_kwargs = {key: kwargs.pop(key) for key in keys
+                                if key in kwargs}
         keys = ['inversion_method'] + KalmanFilter.inversion_methods
-        inversion_method_kwargs = {key: kwargs.pop(key) for key in keys if 
-            key in kwargs}
+        inversion_method_kwargs = {key: kwargs.pop(key) for key in keys
+                                   if key in kwargs}
         keys = ['stability_method'] + KalmanFilter.stability_methods
-        stability_method_kwargs = {key: kwargs.pop(key) for key in keys if 
-            key in kwargs}
+        stability_method_kwargs = {key: kwargs.pop(key) for key in keys
+                                   if key in kwargs}
         keys = ['conserve_memory'] + KalmanFilter.memory_options
-        conserve_memory_kwargs = {key: kwargs.pop(key) for key in keys if 
-            key in kwargs}
+        conserve_memory_kwargs = {key: kwargs.pop(key) for key in keys
+                                  if key in kwargs}
         keys = ['alternate_timing'] + KalmanFilter.timing_options
-        filter_timing_kwargs = {key: kwargs.pop(key) for key in keys if key in
-            kwargs}
-        super(KalmanFilter, self).__init__(k_endog, k_states, k_posdef, **
-            kwargs)
+        filter_timing_kwargs = {key: kwargs.pop(key) for key in keys
+                                if key in kwargs}
+
+        # Initialize the base class
+        super(KalmanFilter, self).__init__(
+            k_endog, k_states, k_posdef, **kwargs
+        )
+
+        # Setup the underlying Kalman filter storage
         self._kalman_filters = {}
+
+        # Filter options
         self.loglikelihood_burn = loglikelihood_burn
-        self.results_class = (results_class if results_class is not None else
-            FilterResults)
-        self.prefix_kalman_filter_map = (kalman_filter_classes if 
-            kalman_filter_classes is not None else tools.
-            prefix_kalman_filter_map.copy())
+        self.results_class = (
+            results_class if results_class is not None else FilterResults
+        )
+        # Options
+        self.prefix_kalman_filter_map = (
+            kalman_filter_classes
+            if kalman_filter_classes is not None
+            else tools.prefix_kalman_filter_map.copy())
+
         self.set_filter_method(**filter_method_kwargs)
         self.set_inversion_method(**inversion_method_kwargs)
         self.set_stability_method(**stability_method_kwargs)
         self.set_conserve_memory(**conserve_memory_kwargs)
         self.set_filter_timing(**filter_timing_kwargs)
+
         self.tolerance = tolerance
+
+        # Internal flags
+        # The _scale internal flag is used because we may want to
+        # use a fixed scale, in which case we want the flag to the Cython
+        # Kalman filter to indicate that the scale should not be concentrated
+        # out, so that self.filter_concentrated = False, but we still want to
+        # alert the results object that we are viewing the model as one in
+        # which the scale had been concentrated out for e.g. degree of freedom
+        # computations.
+        # This value should always be None, except within the fixed_scale
+        # context, and should not be modified by users or anywhere else.
         self._scale = None

+    def _clone_kwargs(self, endog, **kwargs):
+        # See Representation._clone_kwargs for docstring
+        kwargs = super(KalmanFilter, self)._clone_kwargs(endog, **kwargs)
+
+        # Get defaults for options
+        kwargs.setdefault('filter_method', self.filter_method)
+        kwargs.setdefault('inversion_method', self.inversion_method)
+        kwargs.setdefault('stability_method', self.stability_method)
+        kwargs.setdefault('conserve_memory', self.conserve_memory)
+        kwargs.setdefault('alternate_timing', bool(self.filter_timing))
+        kwargs.setdefault('tolerance', self.tolerance)
+        kwargs.setdefault('loglikelihood_burn', self.loglikelihood_burn)
+
+        return kwargs
+
+    @property
+    def _kalman_filter(self):
+        prefix = self.prefix
+        if prefix in self._kalman_filters:
+            return self._kalman_filters[prefix]
+        return None
+
+    def _initialize_filter(self, filter_method=None, inversion_method=None,
+                           stability_method=None, conserve_memory=None,
+                           tolerance=None, filter_timing=None,
+                           loglikelihood_burn=None):
+        if filter_method is None:
+            filter_method = self.filter_method
+        if inversion_method is None:
+            inversion_method = self.inversion_method
+        if stability_method is None:
+            stability_method = self.stability_method
+        if conserve_memory is None:
+            conserve_memory = self.conserve_memory
+        if loglikelihood_burn is None:
+            loglikelihood_burn = self.loglikelihood_burn
+        if filter_timing is None:
+            filter_timing = self.filter_timing
+        if tolerance is None:
+            tolerance = self.tolerance
+
+        # Make sure we have endog
+        if self.endog is None:
+            raise RuntimeError('Must bind a dataset to the model before'
+                               ' filtering or smoothing.')
+
+        # Initialize the representation matrices
+        prefix, dtype, create_statespace = self._initialize_representation()
+
+        # Determine if we need to (re-)create the filter
+        # (definitely need to recreate if we recreated the _statespace object)
+        create_filter = create_statespace or prefix not in self._kalman_filters
+        if not create_filter:
+            kalman_filter = self._kalman_filters[prefix]
+
+            create_filter = (
+                not kalman_filter.conserve_memory == conserve_memory or
+                not kalman_filter.loglikelihood_burn == loglikelihood_burn
+            )
+
+        # If the dtype-specific _kalman_filter does not exist (or if we need
+        # to re-create it), create it
+        if create_filter:
+            if prefix in self._kalman_filters:
+                # Delete the old filter
+                del self._kalman_filters[prefix]
+            # Setup the filter
+            cls = self.prefix_kalman_filter_map[prefix]
+            self._kalman_filters[prefix] = cls(
+                self._statespaces[prefix], filter_method, inversion_method,
+                stability_method, conserve_memory, filter_timing, tolerance,
+                loglikelihood_burn
+            )
+        # Otherwise, update the filter parameters
+        else:
+            kalman_filter = self._kalman_filters[prefix]
+            kalman_filter.set_filter_method(filter_method, False)
+            kalman_filter.inversion_method = inversion_method
+            kalman_filter.stability_method = stability_method
+            kalman_filter.filter_timing = filter_timing
+            kalman_filter.tolerance = tolerance
+            # conserve_memory and loglikelihood_burn changes always lead to
+            # re-created filters
+
+        return prefix, dtype, create_filter, create_statespace
+
     def set_filter_method(self, filter_method=None, **kwargs):
-        """
+        r"""
         Set the filtering method

         The filtering method controls aspects of which Kalman filtering
@@ -422,10 +589,14 @@ class KalmanFilter(Representation):
         >>> mod.ssm.filter_method
         17
         """
-        pass
+        if filter_method is not None:
+            self.filter_method = filter_method
+        for name in KalmanFilter.filter_methods:
+            if name in kwargs:
+                setattr(self, name, kwargs[name])

     def set_inversion_method(self, inversion_method=None, **kwargs):
-        """
+        r"""
         Set the inversion method

         The Kalman filter may contain one matrix inversion: that of the
@@ -504,10 +675,14 @@ class KalmanFilter(Representation):
         >>> mod.ssm.inversion_method
         16
         """
-        pass
+        if inversion_method is not None:
+            self.inversion_method = inversion_method
+        for name in KalmanFilter.inversion_methods:
+            if name in kwargs:
+                setattr(self, name, kwargs[name])

     def set_stability_method(self, stability_method=None, **kwargs):
-        """
+        r"""
         Set the numerical stability method

         The Kalman filter is a recursive algorithm that may in some cases
@@ -558,10 +733,14 @@ class KalmanFilter(Representation):
         >>> mod.ssm.stability_method
         0
         """
-        pass
+        if stability_method is not None:
+            self.stability_method = stability_method
+        for name in KalmanFilter.stability_methods:
+            if name in kwargs:
+                setattr(self, name, kwargs[name])

     def set_conserve_memory(self, conserve_memory=None, **kwargs):
-        """
+        r"""
         Set the memory conservation method

         By default, the Kalman filter computes a number of intermediate
@@ -652,10 +831,14 @@ class KalmanFilter(Representation):
         >>> mod.ssm.conserve_memory
         7
         """
-        pass
+        if conserve_memory is not None:
+            self.conserve_memory = conserve_memory
+        for name in KalmanFilter.memory_options:
+            if name in kwargs:
+                setattr(self, name, kwargs[name])

     def set_filter_timing(self, alternate_timing=None, **kwargs):
-        """
+        r"""
         Set the filter timing convention

         By default, the Kalman filter follows Durbin and Koopman, 2012, in
@@ -672,7 +855,12 @@ class KalmanFilter(Representation):
             Keyword arguments may be used to influence the filter timing
             convention by setting individual boolean flags. See notes for
             details.
         """
-        pass
+        if alternate_timing is not None:
+            self.filter_timing = int(alternate_timing)
+        if 'timing_init_predicted' in kwargs:
+            self.filter_timing = int(not kwargs['timing_init_predicted'])
+        if 'timing_init_filtered' in kwargs:
+            self.filter_timing = int(kwargs['timing_init_filtered'])

     @contextlib.contextmanager
     def fixed_scale(self, scale):
@@ -694,12 +882,53 @@ class KalmanFilter(Representation):
         concentrating out the scale, so that the set of parameters they are
         estimating does not include the scale.
         """
-        pass
+        # If a scale was provided, use it and do not concentrate it out of the
+        # loglikelihood
+        if scale is not None and scale != 1:
+            if not self.filter_concentrated:
+                raise ValueError('Cannot provide scale if filter method does'
+                                 ' not include FILTER_CONCENTRATED.')
+            self.filter_concentrated = False
+            self._scale = scale
+            obs_cov = self['obs_cov']
+            state_cov = self['state_cov']
+            self['obs_cov'] = scale * obs_cov
+            self['state_cov'] = scale * state_cov
+        try:
+            yield
+        finally:
+            # If a scale was provided, reset the model
+            if scale is not None and scale != 1:
+                self['state_cov'] = state_cov
+                self['obs_cov'] = obs_cov
+                self.filter_concentrated = True
+                self._scale = None
+
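+
+    # Illustrative usage of `fixed_scale` (hypothetical `mod`; the filter must
+    # have FILTER_CONCENTRATED set, per the check above):
+    #     with mod.ssm.fixed_scale(scale):
+    #         res = mod.ssm.filter()
+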
+    def _filter(self, filter_method=None, inversion_method=None,
+                stability_method=None, conserve_memory=None,
+                filter_timing=None, tolerance=None, loglikelihood_burn=None,
+                complex_step=False):
+        # Initialize the filter
+        prefix, dtype, create_filter, create_statespace = (
+            self._initialize_filter(
+                filter_method, inversion_method, stability_method,
+                conserve_memory, filter_timing=filter_timing,
+                tolerance=tolerance, loglikelihood_burn=loglikelihood_burn
+            )
+        )
+        kfilter = self._kalman_filters[prefix]
+
+        # Initialize the state
+        self._initialize_state(prefix=prefix, complex_step=complex_step)
+
+        # Run the filter
+        kfilter()
+
+        return kfilter

     def filter(self, filter_method=None, inversion_method=None,
-        stability_method=None, conserve_memory=None, filter_timing=None,
-        tolerance=None, loglikelihood_burn=None, complex_step=False):
-        """
+               stability_method=None, conserve_memory=None, filter_timing=None,
+               tolerance=None, loglikelihood_burn=None, complex_step=False):
+        r"""
         Apply the Kalman filter to the statespace model.

         Parameters
@@ -731,10 +960,29 @@ class KalmanFilter(Representation):
         This function by default does not compute variables required for
         smoothing.
         """
-        pass
+        # Handle memory conservation
+        if conserve_memory is None:
+            conserve_memory = self.conserve_memory | MEMORY_NO_SMOOTHING
+        conserve_memory_cache = self.conserve_memory
+        self.set_conserve_memory(conserve_memory)
+
+        # Run the filter
+        kfilter = self._filter(
+            filter_method, inversion_method, stability_method, conserve_memory,
+            filter_timing, tolerance, loglikelihood_burn, complex_step)
+
+        # Create the results object
+        results = self.results_class(self)
+        results.update_representation(self)
+        results.update_filter(kfilter)
+
+        # Reset memory conservation
+        self.set_conserve_memory(conserve_memory_cache)
+
+        return results

     def loglike(self, **kwargs):
-        """
+        r"""
         Calculate the loglikelihood associated with the statespace model.

         Parameters
@@ -748,10 +996,46 @@ class KalmanFilter(Representation):
         loglike : float
             The joint loglikelihood.
         """
-        pass
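+        # Note: MEMORY_CONSERVE includes MEMORY_NO_LIKELIHOOD, so XOR-ing that
+        # flag back out requests "conserve everything except the per-period
+        # loglikelihoods", which is the minimum needed to sum the joint llf.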
+        kwargs.setdefault('conserve_memory',
+                          MEMORY_CONSERVE ^ MEMORY_NO_LIKELIHOOD)
+        kfilter = self._filter(**kwargs)
+        loglikelihood_burn = kwargs.get('loglikelihood_burn',
+                                        self.loglikelihood_burn)
+        if not (kwargs['conserve_memory'] & MEMORY_NO_LIKELIHOOD):
+            loglike = np.sum(kfilter.loglikelihood[loglikelihood_burn:])
+        else:
+            loglike = np.sum(kfilter.loglikelihood)
+
+        # Need to modify the computed log-likelihood to incorporate the
+        # MLE scale.
+        if self.filter_method & FILTER_CONCENTRATED:
+            d = max(loglikelihood_burn, kfilter.nobs_diffuse)
+            nobs_k_endog = np.sum(
+                self.k_endog -
+                np.array(self._statespace.nmissing)[d:])
+
+            # In the univariate case, we need to subtract observations
+            # associated with a singular forecast error covariance matrix
+            nobs_k_endog -= kfilter.nobs_kendog_univariate_singular
+
+            if not (kwargs['conserve_memory'] & MEMORY_NO_LIKELIHOOD):
+                scale = np.sum(kfilter.scale[d:]) / nobs_k_endog
+            else:
+                scale = kfilter.scale[0] / nobs_k_endog
+
+            loglike += -0.5 * nobs_k_endog
+
+            # Now need to modify this for diffuse initialization, since for
+            # diffuse periods we only need to add in the scale value part if
+            # the diffuse forecast error covariance matrix element was singular
+            if kfilter.nobs_diffuse > 0:
+                nobs_k_endog -= kfilter.nobs_kendog_diffuse_nonsingular
+
+            loglike += -0.5 * nobs_k_endog * np.log(scale)
+        return loglike
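+
+    # For reference, the concentrated-scale adjustment above implements the
+    # standard profile loglikelihood: with
+    #     sigma2_hat = sum_t v_t' F_t^{-1} v_t / (n * k),
+    # the profiled value is
+    #     llf = -0.5 * n * k * (log(2 * pi) + 1) - 0.5 * sum_t log|F_t|
+    #           - 0.5 * n * k * log(sigma2_hat),
+    # where n * k counts only non-missing observations (`nobs_k_endog` above).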

     def loglikeobs(self, **kwargs):
-        """
+        r"""
         Calculate the loglikelihood for each observation associated with the
         statespace model.

@@ -771,13 +1055,69 @@ class KalmanFilter(Representation):
         loglike : array of float
             Array of loglikelihood values for each observation.
         """
-        pass
-
-    def simulate(self, nsimulations, measurement_shocks=None, state_shocks=
-        None, initial_state=None, pretransformed_measurement_shocks=True,
-        pretransformed_state_shocks=True, pretransformed_initial_state=True,
-        simulator=None, return_simulator=False, random_state=None):
-        """
+        if self.memory_no_likelihood:
+            raise RuntimeError('Cannot compute loglikelihood if'
+                               ' MEMORY_NO_LIKELIHOOD option is selected.')
+        if not self.filter_method & FILTER_CONCENTRATED:
+            kwargs.setdefault('conserve_memory',
+                              MEMORY_CONSERVE ^ MEMORY_NO_LIKELIHOOD)
+        else:
+            kwargs.setdefault(
+                'conserve_memory',
+                MEMORY_CONSERVE ^ (MEMORY_NO_FORECAST | MEMORY_NO_LIKELIHOOD))
+        kfilter = self._filter(**kwargs)
+        llf_obs = np.array(kfilter.loglikelihood, copy=True)
+        loglikelihood_burn = kwargs.get('loglikelihood_burn',
+                                        self.loglikelihood_burn)
+
+        # If the scale was concentrated out of the log-likelihood function,
+        # then the llf_obs above is:
+        # -0.5 * k_endog * log 2 * pi - 0.5 * log |F_t|
+        # and we need to add in the effect of the scale:
+        # -0.5 * k_endog * log scale - 0.5 * v' F_t^{-1} v / scale
+        # and note that v' F_t^{-1} v is in the _kalman_filter.scale array
+        # Also note that we need to adjust the nobs and k_endog in both the
+        # denominator of the scale computation and in the llf_obs adjustment
+        # to take into account missing values.
+        if self.filter_method & FILTER_CONCENTRATED:
+            d = max(loglikelihood_burn, kfilter.nobs_diffuse)
+            nmissing = np.array(self._statespace.nmissing)
+            nobs_k_endog = np.sum(self.k_endog - nmissing[d:])
+
+            # In the univariate case, we need to subtract observations
+            # associated with a singular forecast error covariance matrix
+            nobs_k_endog -= kfilter.nobs_kendog_univariate_singular
+
+            scale = np.sum(kfilter.scale[d:]) / nobs_k_endog
+
+            # Need to modify this for diffuse initialization, since for
+            # diffuse periods we only need to add in the scale value if the
+            # diffuse forecast error covariance matrix element was singular
+            nsingular = 0
+            if kfilter.nobs_diffuse > 0:
+                d = kfilter.nobs_diffuse
+                Finf = kfilter.forecast_error_diffuse_cov
+                singular = np.diagonal(Finf).real <= kfilter.tolerance_diffuse
+                nsingular = np.sum(~singular, axis=1)
+
+            scale_obs = np.array(kfilter.scale, copy=True)
+            llf_obs += -0.5 * (
+                (self.k_endog - nmissing - nsingular) * np.log(scale) +
+                scale_obs / scale)
+
+        # Set any burned observations to have zero likelihood
+        llf_obs[:loglikelihood_burn] = 0
+
+        return llf_obs
+
+    def simulate(self, nsimulations, measurement_shocks=None,
+                 state_shocks=None, initial_state=None,
+                 pretransformed_measurement_shocks=True,
+                 pretransformed_state_shocks=True,
+                 pretransformed_initial_state=True,
+                 simulator=None, return_simulator=False,
+                 random_state=None):
+        r"""
         Simulate a new time series following the state space model

         Parameters
@@ -789,13 +1129,13 @@ class KalmanFilter(Representation):
             number
         measurement_shocks : array_like, optional
             If specified, these are the shocks to the measurement equation,
-            :math:`\\varepsilon_t`. If unspecified, these are automatically
+            :math:`\varepsilon_t`. If unspecified, these are automatically
             generated using a pseudo-random number generator. If specified,
             must be shaped `nsimulations` x `k_endog`, where `k_endog` is the
             same as in the state space model.
         state_shocks : array_like, optional
             If specified, these are the shocks to the state equation,
-            :math:`\\eta_t`. If unspecified, these are automatically
+            :math:`\eta_t`. If unspecified, these are automatically
             generated using a pseudo-random number generator. If specified,
             must be shaped `nsimulations` x `k_posdef` where `k_posdef` is the
             same as in the state space model.
@@ -847,11 +1187,34 @@ class KalmanFilter(Representation):
             returned, which can be reused for additional simulations of the
             same size.
         """
-        pass
+        time_invariant = self.time_invariant
+        # Check for valid number of simulations
+        if not time_invariant and nsimulations > self.nobs:
+            raise ValueError('In a time-varying model, cannot create more'
+                             ' simulations than there are observations.')
+
+        return self._simulate(
+            nsimulations,
+            measurement_disturbance_variates=measurement_shocks,
+            state_disturbance_variates=state_shocks,
+            initial_state_variates=initial_state,
+            pretransformed_measurement_disturbance_variates=(
+                pretransformed_measurement_shocks),
+            pretransformed_state_disturbance_variates=(
+                pretransformed_state_shocks),
+            pretransformed_initial_state_variates=(
+                pretransformed_initial_state),
+            simulator=simulator, return_simulator=return_simulator,
+            random_state=random_state)
+
+    def _simulate(self, nsimulations, simulator=None, random_state=None,
+                  **kwargs):
+        raise NotImplementedError('Simulation only available through'
+                                  ' the simulation smoother.')

     def impulse_responses(self, steps=10, impulse=0, orthogonalized=False,
-        cumulative=False, direct=False):
-        """
+                          cumulative=False, direct=False):
+        r"""
         Impulse response function

         Parameters
@@ -891,7 +1254,89 @@ class KalmanFilter(Representation):
         responses from arbitrary time points, it is necessary to clone a new
         model with the appropriate system matrices.
         """
-        pass
+        # We need to add an additional step, since the first simulated value
+        # will always be zeros (note that we take this value out at the end).
+        steps += 1
+
+        # For time-invariant models, add an additional `step`. This is the
+        # default for time-invariant models based on the expected behavior for
+        # ARIMA and VAR models: we want to record the initial impulse and also
+        # `steps` values of the responses afterwards.
+        if (self._design.shape[2] == 1 and self._transition.shape[2] == 1 and
+                self._selection.shape[2] == 1):
+            steps += 1
+
+        # Check for what kind of impulse we want
+        if type(impulse) is int:
+            if impulse >= self.k_posdef or impulse < 0:
+                raise ValueError('Invalid value for `impulse`. Must be the'
+                                 ' index of one of the state innovations.')
+
+            # Create the (non-orthogonalized) impulse vector
+            idx = impulse
+            impulse = np.zeros(self.k_posdef)
+            impulse[idx] = 1
+        else:
+            impulse = np.array(impulse)
+            if impulse.ndim > 1:
+                impulse = np.squeeze(impulse)
+            if not impulse.shape == (self.k_posdef,):
+                raise ValueError('Invalid impulse vector. Must be shaped'
+                                 ' (%d,)' % self.k_posdef)
+
+        # Orthogonalize the impulses, if requested, using Cholesky on the
+        # first state covariance matrix
+        if orthogonalized:
+            state_chol = np.linalg.cholesky(self.state_cov[:, :, 0])
+            impulse = np.dot(state_chol, impulse)
+
+        # If we have time-varying design, transition, or selection matrices,
+        # then we can't produce more IRFs than we have time points
+        time_invariant_irf = (
+            self._design.shape[2] == self._transition.shape[2] ==
+            self._selection.shape[2] == 1)
+
+        # Note: to generate impulse responses following the end of a
+        # time-varying model, one should `clone` the state space model with the
+        # new time-varying model, and then compute the IRFs using the cloned
+        # model
+        if not time_invariant_irf and steps > self.nobs:
+            raise ValueError('In a time-varying model, cannot create more'
+                             ' impulse responses than there are'
+                             ' observations')
+
+        # Impulse responses only depend on the design, transition, and
+        # selection matrices. We set the others to zeros because they must be
+        # set in the call to `clone`.
+        # Note: we don't even need selection after the first point, because
+        # the state shocks will be zeros in every period except the first.
+        sim_model = self.clone(
+            endog=np.zeros((steps, self.k_endog), dtype=self.dtype),
+            obs_intercept=np.zeros(self.k_endog),
+            design=self['design', :, :, :steps],
+            obs_cov=np.zeros((self.k_endog, self.k_endog)),
+            state_intercept=np.zeros(self.k_states),
+            transition=self['transition', :, :, :steps],
+            selection=self['selection', :, :, :steps],
+            state_cov=np.zeros((self.k_posdef, self.k_posdef)))
+
+        # Get the impulse response function via simulation of the state
+        # space model, but with other shocks set to zero
+        measurement_shocks = np.zeros((steps, self.k_endog))
+        state_shocks = np.zeros((steps, self.k_posdef))
+        state_shocks[0] = impulse
+        initial_state = np.zeros((self.k_states,))
+        irf, _ = sim_model.simulate(
+            steps, measurement_shocks=measurement_shocks,
+            state_shocks=state_shocks, initial_state=initial_state)
+
+        # Get the cumulative response if requested
+        if cumulative:
+            irf = np.cumsum(irf, axis=0)
+
+        # Here we ignore the first value, because it is always zeros (we added
+        # an additional `step` at the top to account for this).
+        return irf[1:]
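+
+    # Illustrative usage (hypothetical `mod` wrapping this state space
+    # representation): responses of the observed series to a shock in the
+    # first state innovation, orthogonalized via the Cholesky factor of
+    # state_cov[:, :, 0]:
+    #     irfs = mod.ssm.impulse_responses(steps=10, impulse=0,
+    #                                      orthogonalized=True)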


 class FilterResults(FrozenRepresentation):
@@ -1005,22 +1450,29 @@ class FilterResults(FrozenRepresentation):
     llf_obs : ndarray
         The loglikelihood values at each time period.
     """
-    _filter_attributes = ['filter_method', 'inversion_method',
-        'stability_method', 'conserve_memory', 'filter_timing', 'tolerance',
-        'loglikelihood_burn', 'converged', 'period_converged',
-        'filtered_state', 'filtered_state_cov', 'predicted_state',
-        'predicted_state_cov', 'forecasts_error_diffuse_cov',
-        'predicted_diffuse_state_cov', 'tmp1', 'tmp2', 'tmp3', 'tmp4',
-        'forecasts', 'forecasts_error', 'forecasts_error_cov', 'llf',
-        'llf_obs', 'collapsed_forecasts', 'collapsed_forecasts_error',
-        'collapsed_forecasts_error_cov', 'scale']
-    _filter_options = (KalmanFilter.filter_methods + KalmanFilter.
-        stability_methods + KalmanFilter.inversion_methods + KalmanFilter.
-        memory_options)
+    _filter_attributes = [
+        'filter_method', 'inversion_method', 'stability_method',
+        'conserve_memory', 'filter_timing', 'tolerance', 'loglikelihood_burn',
+        'converged', 'period_converged', 'filtered_state',
+        'filtered_state_cov', 'predicted_state', 'predicted_state_cov',
+        'forecasts_error_diffuse_cov', 'predicted_diffuse_state_cov',
+        'tmp1', 'tmp2', 'tmp3', 'tmp4', 'forecasts',
+        'forecasts_error', 'forecasts_error_cov', 'llf', 'llf_obs',
+        'collapsed_forecasts', 'collapsed_forecasts_error',
+        'collapsed_forecasts_error_cov', 'scale'
+    ]
+
+    _filter_options = (
+        KalmanFilter.filter_methods + KalmanFilter.stability_methods +
+        KalmanFilter.inversion_methods + KalmanFilter.memory_options
+    )
+
     _attributes = FrozenRepresentation._model_attributes + _filter_attributes

     def __init__(self, model):
         super(FilterResults, self).__init__(model)
+
+        # Setup caches for uninitialized objects
         self._kalman_gain = None
         self._standardized_forecasts_error = None

@@ -1040,7 +1492,12 @@ class FilterResults(FrozenRepresentation):
         -----
         This method is rarely required except for internal usage.
         """
-        pass
+        if not only_options:
+            super(FilterResults, self).update_representation(model)
+
+        # Save the options as boolean variables
+        for name in self._filter_options:
+            setattr(self, name, getattr(model, name, None))

     def update_filter(self, kalman_filter):
         """
@@ -1055,18 +1512,367 @@ class FilterResults(FrozenRepresentation):
         -----
         This method is rarely required except for internal usage.
         """
-        pass
+        # State initialization
+        self.initial_state = np.array(
+            kalman_filter.model.initial_state, copy=True
+        )
+        self.initial_state_cov = np.array(
+            kalman_filter.model.initial_state_cov, copy=True
+        )
+
+        # Save Kalman filter parameters
+        self.filter_method = kalman_filter.filter_method
+        self.inversion_method = kalman_filter.inversion_method
+        self.stability_method = kalman_filter.stability_method
+        self.conserve_memory = kalman_filter.conserve_memory
+        self.filter_timing = kalman_filter.filter_timing
+        self.tolerance = kalman_filter.tolerance
+        self.loglikelihood_burn = kalman_filter.loglikelihood_burn
+
+        # Save Kalman filter output
+        self.converged = bool(kalman_filter.converged)
+        self.period_converged = kalman_filter.period_converged
+        self.univariate_filter = np.array(kalman_filter.univariate_filter,
+                                          copy=True)
+
+        self.filtered_state = np.array(kalman_filter.filtered_state, copy=True)
+        self.filtered_state_cov = np.array(
+            kalman_filter.filtered_state_cov, copy=True
+        )
+        self.predicted_state = np.array(
+            kalman_filter.predicted_state, copy=True
+        )
+        self.predicted_state_cov = np.array(
+            kalman_filter.predicted_state_cov, copy=True
+        )
+
+        # Reset caches
+        has_missing = np.sum(self.nmissing) > 0
+        if not (self.memory_no_std_forecast or self.invert_lu or
+                self.solve_lu or self.filter_collapsed):
+            if has_missing:
+                self._standardized_forecasts_error = np.array(
+                    reorder_missing_vector(
+                        kalman_filter.standardized_forecast_error,
+                        self.missing, prefix=self.prefix))
+            else:
+                self._standardized_forecasts_error = np.array(
+                    kalman_filter.standardized_forecast_error, copy=True)
+        else:
+            self._standardized_forecasts_error = None
+
+        # In the partially missing data case, all entries will
+        # be in the upper left submatrix rather than the correct placement
+        # Re-ordering does not make sense in the collapsed case.
+        if has_missing and (not self.memory_no_gain and
+                            not self.filter_collapsed):
+            self._kalman_gain = np.array(reorder_missing_matrix(
+                kalman_filter.kalman_gain, self.missing, reorder_cols=True,
+                prefix=self.prefix))
+            self.tmp1 = np.array(reorder_missing_matrix(
+                kalman_filter.tmp1, self.missing, reorder_cols=True,
+                prefix=self.prefix))
+            self.tmp2 = np.array(reorder_missing_vector(
+                kalman_filter.tmp2, self.missing, prefix=self.prefix))
+            self.tmp3 = np.array(reorder_missing_matrix(
+                kalman_filter.tmp3, self.missing, reorder_rows=True,
+                prefix=self.prefix))
+            self.tmp4 = np.array(reorder_missing_matrix(
+                kalman_filter.tmp4, self.missing, reorder_cols=True,
+                reorder_rows=True, prefix=self.prefix))
+        else:
+            if not self.memory_no_gain:
+                self._kalman_gain = np.array(
+                    kalman_filter.kalman_gain, copy=True)
+            self.tmp1 = np.array(kalman_filter.tmp1, copy=True)
+            self.tmp2 = np.array(kalman_filter.tmp2, copy=True)
+            self.tmp3 = np.array(kalman_filter.tmp3, copy=True)
+            self.tmp4 = np.array(kalman_filter.tmp4, copy=True)
+            self.M = np.array(kalman_filter.M, copy=True)
+            self.M_diffuse = np.array(kalman_filter.M_inf, copy=True)
+
+        # Note: use forecasts rather than forecast, so as not to interfere
+        # with the `forecast` methods in subclasses
+        self.forecasts = np.array(kalman_filter.forecast, copy=True)
+        self.forecasts_error = np.array(
+            kalman_filter.forecast_error, copy=True
+        )
+        self.forecasts_error_cov = np.array(
+            kalman_filter.forecast_error_cov, copy=True
+        )
+        # Note: below we will set self.llf, and in the memory_no_likelihood
+        # case we will replace self.llf_obs = None at that time.
+        self.llf_obs = np.array(kalman_filter.loglikelihood, copy=True)
+
+        # Diffuse objects
+        self.nobs_diffuse = kalman_filter.nobs_diffuse
+        self.initial_diffuse_state_cov = None
+        self.forecasts_error_diffuse_cov = None
+        self.predicted_diffuse_state_cov = None
+        if self.nobs_diffuse > 0:
+            self.initial_diffuse_state_cov = np.array(
+                kalman_filter.model.initial_diffuse_state_cov, copy=True)
+            self.predicted_diffuse_state_cov = np.array(
+                    kalman_filter.predicted_diffuse_state_cov, copy=True)
+            if has_missing and not self.filter_collapsed:
+                self.forecasts_error_diffuse_cov = np.array(
+                    reorder_missing_matrix(
+                        kalman_filter.forecast_error_diffuse_cov,
+                        self.missing, reorder_cols=True, reorder_rows=True,
+                        prefix=self.prefix))
+            else:
+                self.forecasts_error_diffuse_cov = np.array(
+                    kalman_filter.forecast_error_diffuse_cov, copy=True)
+
+        # If there was missing data, save the original values from the Kalman
+        # filter output, since below will set the values corresponding to
+        # the missing observations to nans.
+        self.missing_forecasts = None
+        self.missing_forecasts_error = None
+        self.missing_forecasts_error_cov = None
+        if np.sum(self.nmissing) > 0:
+            # Copy the provided arrays (which are from the Kalman filter dataset)
+            # into new variables
+            self.missing_forecasts = np.copy(self.forecasts)
+            self.missing_forecasts_error = np.copy(self.forecasts_error)
+            self.missing_forecasts_error_cov = (
+                np.copy(self.forecasts_error_cov)
+            )
+
+        # Save the collapsed values
+        self.collapsed_forecasts = None
+        self.collapsed_forecasts_error = None
+        self.collapsed_forecasts_error_cov = None
+        if self.filter_collapsed:
+            # Copy the provided arrays (which are from the collapsed dataset)
+            # into new variables
+            self.collapsed_forecasts = self.forecasts[:self.k_states, :]
+            self.collapsed_forecasts_error = (
+                self.forecasts_error[:self.k_states, :]
+            )
+            self.collapsed_forecasts_error_cov = (
+                self.forecasts_error_cov[:self.k_states, :self.k_states, :]
+            )
+            # Recreate the original arrays (which should be from the original
+            # dataset) in the appropriate dimension
+            dtype = self.collapsed_forecasts.dtype
+            self.forecasts = np.zeros((self.k_endog, self.nobs), dtype=dtype)
+            self.forecasts_error = np.zeros((self.k_endog, self.nobs),
+                                            dtype=dtype)
+            self.forecasts_error_cov = (
+                np.zeros((self.k_endog, self.k_endog, self.nobs), dtype=dtype)
+            )
+
+        # Fill in missing values in the forecast, forecast error, and
+        # forecast error covariance matrix (this is required due to how the
+        # Kalman filter implements observations that are either partly or
+        # completely missing)
+        # Construct the predictions, forecasts
+        can_compute_mean = not (self.memory_no_forecast_mean or
+                                self.memory_no_predicted_mean)
+        can_compute_cov = not (self.memory_no_forecast_cov or
+                               self.memory_no_predicted_cov)
+        if can_compute_mean or can_compute_cov:
+            for t in range(self.nobs):
+                design_t = 0 if self.design.shape[2] == 1 else t
+                obs_cov_t = 0 if self.obs_cov.shape[2] == 1 else t
+                obs_intercept_t = 0 if self.obs_intercept.shape[1] == 1 else t
+
+                # For completely missing observations, the Kalman filter will
+                # produce forecasts, but forecast errors and the forecast
+                # error covariance matrix will be zeros - make them nan to
+                # improve clarity of results.
+                if self.nmissing[t] > 0:
+                    mask = ~self.missing[:, t].astype(bool)
+                    # We can recover forecasts
+                    # For partially missing observations, the Kalman filter
+                    # will produce all elements (forecasts, forecast errors,
+                    # forecast error covariance matrices) as usual, but their
+                    # dimension will only be equal to the number of non-missing
+                    # elements, and their location in memory will be in the
+                    # first blocks (e.g. for the forecasts_error, the first
+                    # k_endog - nmissing[t] columns will be filled in),
+                    # regardless of which endogenous variables they refer to
+                    # (i.e. the non-missing endogenous variables for that
+                    # observation). Furthermore, the forecast error covariance
+                    # matrix is only valid for those elements. What is done is
+                    # to set all elements to nan for these observations so that
+                    # they are flagged as missing. The variables
+                    # missing_forecasts, etc. then provide the forecasts, etc.
+                    # provided by the Kalman filter, from which the data can be
+                    # retrieved if desired.
+                    if can_compute_mean:
+                        self.forecasts[:, t] = np.dot(
+                            self.design[:, :, design_t],
+                            self.predicted_state[:, t]
+                        ) + self.obs_intercept[:, obs_intercept_t]
+                        self.forecasts_error[:, t] = np.nan
+                        self.forecasts_error[mask, t] = (
+                            self.endog[mask, t] - self.forecasts[mask, t])
+                    # TODO: We should only fill in the non-masked elements of
+                    # this array. Also, this will give the multivariate version
+                    # even if univariate filtering was selected. Instead, we
+                    # should use the reordering methods and then replace the
+                    # masked values with NaNs
+                    if can_compute_cov:
+                        self.forecasts_error_cov[:, :, t] = np.dot(
+                            np.dot(self.design[:, :, design_t],
+                                   self.predicted_state_cov[:, :, t]),
+                            self.design[:, :, design_t].T
+                        ) + self.obs_cov[:, :, obs_cov_t]
+                # In the collapsed case, everything just needs to be rebuilt
+                # for the original observed data, since the Kalman filter
+                # produced these values for the collapsed data.
+                elif self.filter_collapsed:
+                    if can_compute_mean:
+                        self.forecasts[:, t] = np.dot(
+                            self.design[:, :, design_t],
+                            self.predicted_state[:, t]
+                        ) + self.obs_intercept[:, obs_intercept_t]
+
+                        self.forecasts_error[:, t] = (
+                            self.endog[:, t] - self.forecasts[:, t]
+                        )
+
+                    if can_compute_cov:
+                        self.forecasts_error_cov[:, :, t] = np.dot(
+                            np.dot(self.design[:, :, design_t],
+                                   self.predicted_state_cov[:, :, t]),
+                            self.design[:, :, design_t].T
+                        ) + self.obs_cov[:, :, obs_cov_t]
+
+        # Note: if we concentrated out the scale, need to adjust the
+        # loglikelihood values and all of the covariance matrices and the
+        # values that depend on the covariance matrices
+        # Note: concentrated computation is not permitted with collapsed
+        # version, so we do not need to modify collapsed arrays.
+        self.scale = 1.
+        if self.filter_concentrated and self.model._scale is None:
+            d = max(self.loglikelihood_burn, self.nobs_diffuse)
+            # Compute the scale
+            nmissing = np.array(kalman_filter.model.nmissing)
+            nobs_k_endog = np.sum(self.k_endog - nmissing[d:])
+
+            # In the univariate case, we need to subtract observations
+            # associated with a singular forecast error covariance matrix
+            nobs_k_endog -= kalman_filter.nobs_kendog_univariate_singular
+
+            scale_obs = np.array(kalman_filter.scale, copy=True)
+            if not self.memory_no_likelihood:
+                self.scale = np.sum(scale_obs[d:]) / nobs_k_endog
+            else:
+                self.scale = scale_obs[0] / nobs_k_endog
+
+            # Need to modify this for diffuse initialization, since for
+            # diffuse periods we only need to add in the scale value if the
+            # diffuse forecast error covariance matrix element was singular
+            nsingular = 0
+            if kalman_filter.nobs_diffuse > 0:
+                Finf = kalman_filter.forecast_error_diffuse_cov
+                singular = (np.diagonal(Finf).real <=
+                            kalman_filter.tolerance_diffuse)
+                nsingular = np.sum(~singular, axis=1)
+
+            # Adjust the loglikelihood obs (see `KalmanFilter.loglikeobs` for
+            # defaults on the adjustment)
+            if not self.memory_no_likelihood:
+                self.llf_obs += -0.5 * (
+                    (self.k_endog - nmissing - nsingular) * np.log(self.scale)
+                    + scale_obs / self.scale)
+            else:
+                self.llf_obs[0] += -0.5 * np.squeeze(
+                    np.sum(
+                        (self.k_endog - nmissing - nsingular)
+                        * np.log(self.scale)
+                    )
+                    + scale_obs / self.scale
+                )
+
+            # Scale the filter output
+            self.obs_cov = self.obs_cov * self.scale
+            self.state_cov = self.state_cov * self.scale
+
+            self.initial_state_cov = self.initial_state_cov * self.scale
+            self.predicted_state_cov = self.predicted_state_cov * self.scale
+            self.filtered_state_cov = self.filtered_state_cov * self.scale
+            self.forecasts_error_cov = self.forecasts_error_cov * self.scale
+            if self.missing_forecasts_error_cov is not None:
+                self.missing_forecasts_error_cov = (
+                    self.missing_forecasts_error_cov * self.scale)
+
+            # Note: do not have to adjust the Kalman gain or tmp4
+            self.tmp1 = self.tmp1 * self.scale
+            self.tmp2 = self.tmp2 / self.scale
+            self.tmp3 = self.tmp3 / self.scale
+            if not (self.memory_no_std_forecast or
+                    self.invert_lu or
+                    self.solve_lu or
+                    self.filter_collapsed):
+                self._standardized_forecasts_error = (
+                    self._standardized_forecasts_error / self.scale**0.5)
+        # The self.model._scale value is only not None within a fixed_scale
+        # context, in which case it is set and indicates that we should
+        # generally view this results object as using a concentrated scale
+        # (e.g. for d.o.f. computations), but because the fixed scale was
+        # actually applied to the model prior to filtering, we do not need to
+        # make any adjustments to the filter output, etc.
+        elif self.model._scale is not None:
+            self.filter_concentrated = True
+            self.scale = self.model._scale
+
+        # Now, save self.llf, and handle the memory_no_likelihood case
+        if not self.memory_no_likelihood:
+            self.llf = np.sum(self.llf_obs[self.loglikelihood_burn:])
+        else:
+            self.llf = self.llf_obs[0]
+            self.llf_obs = None
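
The concentrated-scale branch above first estimates the scale as the average of the per-period `scale_obs` contributions over the non-burn-in sample, and then applies the usual adjustment to each loglikelihood term. A minimal sketch of that arithmetic, with hypothetical numbers for a univariate model with no missing data and no diffuse periods:

import numpy as np

# Hypothetical per-period contributions v_t' F_t^{-1} v_t reported by a
# concentrated filter for a univariate model (k_endog = 1, burn-in d = 0)
k_endog = 1
scale_obs = np.array([0.8, 1.3, 0.9, 1.0])
nobs_k_endog = scale_obs.shape[0] * k_endog

# Scale estimate, as in the block above
scale = scale_obs.sum() / nobs_k_endog

# Adjust the pre-concentration loglikelihood contributions the same way
llf_obs = np.full(scale_obs.shape, -0.5 * np.log(2 * np.pi))  # hypothetical base terms
llf_obs = llf_obs - 0.5 * (k_endog * np.log(scale) + scale_obs / scale)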

     @property
     def kalman_gain(self):
         """
         Kalman gain matrices
         """
-        pass
+        if self._kalman_gain is None:
+            # k x n
+            self._kalman_gain = np.zeros(
+                (self.k_states, self.k_endog, self.nobs), dtype=self.dtype)
+            for t in range(self.nobs):
+                # In the case of entirely missing observations, let the Kalman
+                # gain be zeros.
+                if self.nmissing[t] == self.k_endog:
+                    continue
+
+                design_t = 0 if self.design.shape[2] == 1 else t
+                transition_t = 0 if self.transition.shape[2] == 1 else t
+                if self.nmissing[t] == 0:
+                    self._kalman_gain[:, :, t] = np.dot(
+                        np.dot(
+                            self.transition[:, :, transition_t],
+                            self.predicted_state_cov[:, :, t]
+                        ),
+                        np.dot(
+                            np.transpose(self.design[:, :, design_t]),
+                            np.linalg.inv(self.forecasts_error_cov[:, :, t])
+                        )
+                    )
+                else:
+                    mask = ~self.missing[:, t].astype(bool)
+                    F = self.forecasts_error_cov[np.ix_(mask, mask, [t])]
+                    self._kalman_gain[:, mask, t] = np.dot(
+                        np.dot(
+                            self.transition[:, :, transition_t],
+                            self.predicted_state_cov[:, :, t]
+                        ),
+                        np.dot(
+                            np.transpose(self.design[mask, :, design_t]),
+                            np.linalg.inv(F[:, :, 0])
+                        )
+                    )
+        return self._kalman_gain
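
The gain above is built as :math:`K_t = T_t P_{t|t-1} Z_t' F_t^{-1}`, forming an explicit inverse of the forecast error covariance each period. A minimal sketch (hypothetical 1x1 matrices) of the same product written with a linear solve, a common alternative when :math:`F_t` is poorly conditioned:

import numpy as np

# Hypothetical 1x1 system matrices for a single time point t
T = np.array([[0.9]])                 # transition
P = np.array([[2.0]])                 # predicted state covariance
Z = np.array([[1.0]])                 # design
F = Z @ P @ Z.T + np.array([[0.5]])   # forecast error covariance

# Same quantity as the property above, K_t = T_t P_t Z_t' F_t^{-1}, but
# using a linear solve instead of an explicit inverse
K = T @ P @ np.linalg.solve(F.T, Z).T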

     @property
     def standardized_forecasts_error(self):
-        """
+        r"""
         Standardized forecast errors

         Notes
@@ -1075,18 +1881,18 @@ class FilterResults(FrozenRepresentation):

         .. math::

-            v_t \\sim N(0, F_t)
+            v_t \sim N(0, F_t)

         Hypothesis tests are usually applied to the standardized residuals

         .. math::

-            v_t^s = B_t v_t \\sim N(0, I)
+            v_t^s = B_t v_t \sim N(0, I)

         where :math:`B_t = L_t^{-1}` and :math:`F_t = L_t L_t'`; then
         :math:`F_t^{-1} = (L_t')^{-1} L_t^{-1} = B_t' B_t`; :math:`B_t`
         and :math:`L_t` are lower triangular. Finally,
-        :math:`B_t v_t \\sim N(0, B_t F_t B_t')` and
+        :math:`B_t v_t \sim N(0, B_t F_t B_t')` and
         :math:`B_t F_t B_t' = L_t^{-1} L_t L_t' (L_t')^{-1} = I`.

         Thus we can rewrite :math:`v_t^s = L_t^{-1} v_t` or
@@ -1094,10 +1900,36 @@ class FilterResults(FrozenRepresentation):
         use a linear solver to recover :math:`v_t^s`. Since :math:`L_t` is
         lower triangular, we can use a triangular solver (?TRTRS).
         """
-        pass
+        if (self._standardized_forecasts_error is None
+                and not self.memory_no_forecast):
+            if self.k_endog == 1:
+                self._standardized_forecasts_error = (
+                    self.forecasts_error /
+                    self.forecasts_error_cov[0, 0, :]**0.5)
+            else:
+                from scipy import linalg
+                self._standardized_forecasts_error = np.zeros(
+                    self.forecasts_error.shape, dtype=self.dtype)
+                for t in range(self.forecasts_error_cov.shape[2]):
+                    if self.nmissing[t] > 0:
+                        self._standardized_forecasts_error[:, t] = np.nan
+                    if self.nmissing[t] < self.k_endog:
+                        mask = ~self.missing[:, t].astype(bool)
+                        F = self.forecasts_error_cov[np.ix_(mask, mask, [t])]
+                        try:
+                            upper, _ = linalg.cho_factor(F[:, :, 0])
+                            self._standardized_forecasts_error[mask, t] = (
+                                linalg.solve_triangular(
+                                    upper, self.forecasts_error[mask, t],
+                                    trans=1))
+                        except linalg.LinAlgError:
+                            self._standardized_forecasts_error[mask, t] = (
+                                np.nan)
+
+        return self._standardized_forecasts_error
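
The docstring above describes the standardization as a Cholesky factorization followed by a triangular solve; a minimal sketch of that computation for a single time point, with made-up values:

import numpy as np
from scipy import linalg

# Made-up 2x1 forecast error and its covariance for a single time point
v = np.array([0.3, -0.1])
F = np.array([[1.0, 0.4],
              [0.4, 2.0]])

# F = L L' with L lower triangular; the standardized error solves L v_s = v,
# mirroring the Cholesky factor + triangular solve used in the property above
L = np.linalg.cholesky(F)
v_s = linalg.solve_triangular(L, v, lower=True)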

     def predict(self, start=None, end=None, dynamic=None, **kwargs):
-        """
+        r"""
         In-sample and out-of-sample prediction for state space models generally

         Parameters
@@ -1136,11 +1968,141 @@ class FilterResults(FrozenRepresentation):
         Out-of-sample prediction first applies the Kalman filter to missing
         data for the number of periods desired to obtain the predicted states.
         """
-        pass
+        # Get the start and the end of the entire prediction range
+        if start is None:
+            start = 0
+        elif start < 0:
+            raise ValueError('Cannot predict values previous to the sample.')
+        if end is None:
+            end = self.nobs
+
+        # Prediction and forecasting are performed by iterating the Kalman
+        # filter through the entire range [0, end].
+        # Then, everything is returned corresponding to the range [start, end].
+        # In order to perform the calculations, the range is separately split
+        # up into the following categories:
+        # - static:   (in-sample) the Kalman filter is run as usual
+        # - dynamic:  (in-sample) the Kalman filter is run, but on missing data
+        # - forecast: (out-of-sample) the Kalman filter is run, but on missing
+        #             data
+
+        # Short-circuit if end is before start
+        if end <= start:
+            raise ValueError('End of prediction must be after start.')
+
+        # Get the number of forecasts to make after the end of the sample
+        nforecast = max(0, end - self.nobs)
+
+        # Get the number of dynamic prediction periods
+
+        # If `dynamic=True`, then assume that we want to begin dynamic
+        # prediction at the start of the sample prediction.
+        if dynamic is True:
+            dynamic = 0
+        # If `dynamic=False`, then assume we want no dynamic prediction
+        if dynamic is False:
+            dynamic = None
+
+        # Check validity of dynamic and warn or error if issues
+        dynamic, ndynamic = _check_dynamic(dynamic, start, end, self.nobs)
+
+        # Get the number of in-sample static predictions
+        if dynamic is None:
+            nstatic = min(end, self.nobs) - min(start, self.nobs)
+        else:
+            # (use max(., 0), since dynamic can be prior to start)
+            nstatic = max(dynamic - start, 0)
+
+        # Cannot do in-sample prediction if we do not have appropriate
+        # arrays (we can do out-of-sample forecasting, however)
+        if nstatic > 0 and self.memory_no_forecast_mean:
+            raise ValueError('In-sample prediction is not available if memory'
+                             ' conservation has been used to avoid storing'
+                             ' forecast means.')
+        # Cannot do dynamic in-sample prediction if we do not have appropriate
+        # arrays (we can do out-of-sample forecasting, however)
+        if ndynamic > 0 and self.memory_no_predicted:
+            raise ValueError('In-sample dynamic prediction is not available if'
+                             ' memory conservation has been used to avoid'
+                             ' storing forecasted or predicted state means'
+                             ' or covariances.')
+
+        # Construct the predicted state and covariance matrix for each time
+        # period depending on whether that time period corresponds to
+        # one-step-ahead prediction, dynamic prediction, or out-of-sample
+        # forecasting.
+
+        # If we only have simple prediction, then we can use the already saved
+        # Kalman filter output
+        if ndynamic == 0 and nforecast == 0:
+            results = self
+            oos_results = None
+        # If we have dynamic prediction or forecasting, then we need to
+        # re-apply the Kalman filter
+        else:
+            # Figure out the period for which we need to run the Kalman filter
+            if dynamic is not None:
+                kf_start = min(dynamic, self.nobs)
+            else:
+                kf_start = self.nobs
+            kf_end = end
+
+            # Make start, end consistent with the results that we're generating
+            # start = max(start - kf_start, 0)
+            # end = kf_end - kf_start
+
+            # We must at least store forecasts and predictions
+            kwargs['conserve_memory'] = (
+                self.conserve_memory & ~MEMORY_NO_FORECAST &
+                ~MEMORY_NO_PREDICTED)
+
+            # Can't use Chandrasekhar recursions for prediction
+            kwargs['filter_method'] = (
+                self.model.filter_method & ~FILTER_CHANDRASEKHAR)
+
+            # TODO: there is a corner case here when the filter has not
+            #       exited the diffuse filter, in which case this known
+            #       initialization is not correct.
+            # Even if we have not stored all predicted values (means and covs),
+            # we can still do pure out-of-sample forecasting because we will
+            # always have stored the last predicted values. In this case, we
+            # will initialize the forecasting filter with these values
+            if self.memory_no_predicted:
+                constant = self.predicted_state[..., -1]
+                stationary_cov = self.predicted_state_cov[..., -1]
+            # Otherwise initialize with the predicted state / cov from the
+            # existing results, at index kf_start (note that the time
+            # dimension of predicted_state and predicted_state_cov is
+            # self.nobs + 1; so e.g. in the case of pure forecasting we should
+            # be using the very last predicted state and predicted state cov
+            # elements, and kf_start will equal self.nobs which is correct)
+            else:
+                constant = self.predicted_state[..., kf_start]
+                stationary_cov = self.predicted_state_cov[..., kf_start]
+
+            kwargs.update({'initialization': 'known',
+                           'constant': constant,
+                           'stationary_cov': stationary_cov})
+
+            # Construct the new endogenous array.
+            endog = np.zeros((nforecast, self.k_endog)) * np.nan
+            model = self.model.extend(
+                endog, start=kf_start, end=kf_end - nforecast, **kwargs)
+            # Have to retroactively modify the model's endog
+            if ndynamic > 0:
+                model.endog[:, -(ndynamic + nforecast):] = np.nan
+
+            with model.fixed_scale(self.scale):
+                oos_results = model.filter()
+
+            results = self
+
+        return PredictionResults(results, start, end, nstatic, ndynamic,
+                                 nforecast, oos_results=oos_results)
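
A minimal usage sketch of `predict` on a hand-built local-level model; the setup below uses the low-level `KalmanFilter` representation, and all numeric values are illustrative assumptions rather than anything taken from the patch:

import numpy as np
from statsmodels.tsa.statespace.kalman_filter import KalmanFilter

# Local level model: y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t
endog = np.sin(np.arange(50) / 5.0)
mod = KalmanFilter(k_endog=1, k_states=1)
mod.bind(endog)
mod['design', 0, 0] = 1.0
mod['transition', 0, 0] = 1.0
mod['selection', 0, 0] = 1.0
mod['obs_cov', 0, 0] = 1.0
mod['state_cov', 0, 0] = 0.5
mod.initialize_known(np.zeros(1), 10.0 * np.eye(1))

res = mod.filter()
# In-sample prediction from t=10, dynamic prediction starting 20 periods
# after `start` (i.e. at t=30), plus 5 out-of-sample forecasts (end=55 > nobs=50)
pred = res.predict(start=10, end=55, dynamic=20)
print(pred.forecasts.shape)  # expected (1, 45)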


 class PredictionResults(FilterResults):
-    """
+    r"""
     Results of in-sample and out-of-sample prediction for state space models
     generally

@@ -1223,23 +2185,34 @@ class PredictionResults(FilterResults):
     This class is essentially a view to the FilterResults object, but
     returning the appropriate ranges for everything.
     """
-    representation_attributes = ['endog', 'design', 'obs_intercept',
-        'obs_cov', 'transition', 'state_intercept', 'selection', 'state_cov']
-    filter_attributes = ['filtered_state', 'filtered_state_cov',
-        'predicted_state', 'predicted_state_cov', 'forecasts',
-        'forecasts_error', 'forecasts_error_cov']
-    smoother_attributes = ['smoothed_state', 'smoothed_state_cov']
+    representation_attributes = [
+        'endog', 'design', 'obs_intercept',
+        'obs_cov', 'transition', 'state_intercept', 'selection',
+        'state_cov'
+    ]
+    filter_attributes = [
+        'filtered_state', 'filtered_state_cov',
+        'predicted_state', 'predicted_state_cov',
+        'forecasts', 'forecasts_error', 'forecasts_error_cov'
+    ]
+    smoother_attributes = [
+        'smoothed_state', 'smoothed_state_cov',
+    ]

     def __init__(self, results, start, end, nstatic, ndynamic, nforecast,
-        oos_results=None):
+                 oos_results=None):
+        # Save the filter results object
         self.results = results
         self.oos_results = oos_results
+
+        # Save prediction ranges
         self.npredictions = start - end
         self.start = start
         self.end = end
         self.nstatic = nstatic
         self.ndynamic = ndynamic
         self.nforecast = nforecast
+
         self._predicted_signal = None
         self._predicted_signal_cov = None
         self._filtered_signal = None
@@ -1251,56 +2224,212 @@ class PredictionResults(FilterResults):
         self._smoothed_forecasts = None
         self._smoothed_forecasts_error_cov = None

+    def clear(self):
+        attributes = (['endog'] + self.representation_attributes
+                      + self.filter_attributes)
+        for attr in attributes:
+            _attr = '_' + attr
+            if hasattr(self, _attr):
+                delattr(self, _attr)
+
     def __getattr__(self, attr):
         """
         Provide access to the representation and filtered output in the
         appropriate range (`start` - `end`).
         """
+        # Prevent infinite recursive lookups
         if attr[0] == '_':
-            raise AttributeError("'%s' object has no attribute '%s'" % (
-                self.__class__.__name__, attr))
+            raise AttributeError("'%s' object has no attribute '%s'" %
+                                 (self.__class__.__name__, attr))
+
         _attr = '_' + attr
+
+        # Cache the attribute
         if not hasattr(self, _attr):
             if attr == 'endog' or attr in self.filter_attributes:
+                # Get a copy
                 value = getattr(self.results, attr).copy()
                 if self.ndynamic > 0:
                     end = self.end - self.ndynamic - self.nforecast
                     value = value[..., :end]
                 if self.oos_results is not None:
                     oos_value = getattr(self.oos_results, attr).copy()
+
+                    # Note that the last element of the results predicted state
+                    # and state cov will overlap with the first element of the
+                    # oos predicted state and state cov, so eliminate the
+                    # last element of the results versions
+                    # But if we have dynamic prediction, then we have already
+                    # eliminated the last element of the predicted state, so
+                    # we do not need to do it here.
                     if self.ndynamic == 0 and attr[:9] == 'predicted':
                         value = value[..., :-1]
+
                     value = np.concatenate([value, oos_value], axis=-1)
+
+                # Subset to the correct time frame
                 value = value[..., self.start:self.end]
             elif attr in self.smoother_attributes:
                 if self.ndynamic > 0:
                     raise NotImplementedError(
-                        'Cannot retrieve smoothed attributes when using dynamic prediction, since the information set used to compute the smoothed results differs from the information set implied by the dynamic prediction.'
-                        )
+                        'Cannot retrieve smoothed attributes when using'
+                        ' dynamic prediction, since the information set used'
+                        ' to compute the smoothed results differs from the'
+                        ' information set implied by the dynamic prediction.')
+                # Get a copy
                 value = getattr(self.results, attr).copy()
+
+                # The oos_results object is only dynamic or out-of-sample,
+                # so filtered == smoothed
                 if self.oos_results is not None:
                     filtered_attr = 'filtered' + attr[8:]
                     oos_value = getattr(self.oos_results, filtered_attr).copy()
                     value = np.concatenate([value, oos_value], axis=-1)
+
+                # Subset to the correct time frame
                 value = value[..., self.start:self.end]
             elif attr in self.representation_attributes:
                 value = getattr(self.results, attr).copy()
+                # If a time-invariant matrix, return it. Otherwise, subset to
+                # the correct period.
                 if value.shape[-1] == 1:
                     value = value[..., 0]
                 else:
                     if self.ndynamic > 0:
                         end = self.end - self.ndynamic - self.nforecast
                         value = value[..., :end]
+
                     if self.oos_results is not None:
                         oos_value = getattr(self.oos_results, attr).copy()
                         value = np.concatenate([value, oos_value], axis=-1)
                     value = value[..., self.start:self.end]
             else:
                 raise AttributeError("'%s' object has no attribute '%s'" %
-                    (self.__class__.__name__, attr))
+                                     (self.__class__.__name__, attr))
+
             setattr(self, _attr, value)
+
         return getattr(self, _attr)

+    def _compute_forecasts(self, states, states_cov, signal_only=False):
+        d = self.obs_intercept
+        Z = self.design
+        H = self.obs_cov
+
+        if d.ndim == 1:
+            d = d[:, None]
+
+        if Z.ndim == 2:
+            if not signal_only:
+                forecasts = d + Z @ states
+                forecasts_error_cov = (
+                    Z[None, ...] @ states_cov.T @ Z.T[None, ...] + H.T).T
+            else:
+                forecasts = Z @ states
+                forecasts_error_cov = (
+                    Z[None, ...] @ states_cov.T @ Z.T[None, ...]).T
+        else:
+            if not signal_only:
+                forecasts = d + (Z * states[None, :, :]).sum(axis=1)
+                tmp = Z[:, None, ...] * states_cov[None, ...]
+                tmp = (tmp[:, :, :, None, :]
+                       * Z.transpose(1, 0, 2)[None, :, None, ...])
+                forecasts_error_cov = (tmp.sum(axis=1).sum(axis=1).T + H.T).T
+            else:
+                forecasts = (Z * states[None, :, :]).sum(axis=1)
+                tmp = Z[:, None, ...] * states_cov[None, ...]
+                tmp = (tmp[:, :, :, None, :]
+                       * Z.transpose(1, 0, 2)[None, :, None, ...])
+                forecasts_error_cov = tmp.sum(axis=1).sum(axis=1)
+
+        return forecasts, forecasts_error_cov
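
For reference, the first (time-invariant design) branch above computes the standard observation-equation moments,

.. math::

    \hat y_t = d + Z a_t, \qquad
    F_t = Z P_t Z' + H

with :math:`d` and :math:`H` dropped when ``signal_only=True``; the second branch performs the same computation broadcast over a time-varying :math:`Z_t`.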
+
+    @property
+    def predicted_signal(self):
+        if self._predicted_signal is None:
+            self._predicted_signal, self._predicted_signal_cov = (
+                self._compute_forecasts(self.predicted_state,
+                                        self.predicted_state_cov,
+                                        signal_only=True))
+        return self._predicted_signal
+
+    @property
+    def predicted_signal_cov(self):
+        if self._predicted_signal_cov is None:
+            self._predicted_signal, self._predicted_signal_cov = (
+                self._compute_forecasts(self.predicted_state,
+                                        self.predicted_state_cov,
+                                        signal_only=True))
+        return self._predicted_signal_cov
+
+    @property
+    def filtered_signal(self):
+        if self._filtered_signal is None:
+            self._filtered_signal, self._filtered_signal_cov = (
+                self._compute_forecasts(self.filtered_state,
+                                        self.filtered_state_cov,
+                                        signal_only=True))
+        return self._filtered_signal
+
+    @property
+    def filtered_signal_cov(self):
+        if self._filtered_signal_cov is None:
+            self._filtered_signal, self._filtered_signal_cov = (
+                self._compute_forecasts(self.filtered_state,
+                                        self.filtered_state_cov,
+                                        signal_only=True))
+        return self._filtered_signal_cov
+
+    @property
+    def smoothed_signal(self):
+        if self._smoothed_signal is None:
+            self._smoothed_signal, self._smoothed_signal_cov = (
+                self._compute_forecasts(self.smoothed_state,
+                                        self.smoothed_state_cov,
+                                        signal_only=True))
+        return self._smoothed_signal
+
+    @property
+    def smoothed_signal_cov(self):
+        if self._smoothed_signal_cov is None:
+            self._smoothed_signal, self._smoothed_signal_cov = (
+                self._compute_forecasts(self.smoothed_state,
+                                        self.smoothed_state_cov,
+                                        signal_only=True))
+        return self._smoothed_signal_cov
+
+    @property
+    def filtered_forecasts(self):
+        if self._filtered_forecasts is None:
+            self._filtered_forecasts, self._filtered_forecasts_cov = (
+                self._compute_forecasts(self.filtered_state,
+                                        self.filtered_state_cov))
+        return self._filtered_forecasts
+
+    @property
+    def filtered_forecasts_error_cov(self):
+        if self._filtered_forecasts_cov is None:
+            self._filtered_forecasts, self._filtered_forecasts_cov = (
+                self._compute_forecasts(self.filtered_state,
+                                        self.filtered_state_cov))
+        return self._filtered_forecasts_cov
+
+    @property
+    def smoothed_forecasts(self):
+        if self._smoothed_forecasts is None:
+            self._smoothed_forecasts, self._smoothed_forecasts_cov = (
+                self._compute_forecasts(self.smoothed_state,
+                                        self.smoothed_state_cov))
+        return self._smoothed_forecasts
+
+    @property
+    def smoothed_forecasts_error_cov(self):
+        if self._smoothed_forecasts_cov is None:
+            self._smoothed_forecasts, self._smoothed_forecasts_cov = (
+                self._compute_forecasts(self.smoothed_state,
+                                        self.smoothed_state_cov))
+        return self._smoothed_forecasts_cov
+

 def _check_dynamic(dynamic, start, end, nobs):
     """
@@ -1326,4 +2455,29 @@ def _check_dynamic(dynamic, start, end, nobs):
     ndynamic : int
         The number of dynamic forecasts
     """
-    pass
+    if dynamic is None:
+        return dynamic, 0
+
+    # Replace the relative dynamic offset with an absolute offset
+    dynamic = start + dynamic
+
+    # Validate the `dynamic` parameter
+    if dynamic < 0:
+        raise ValueError('Dynamic prediction cannot begin prior to the'
+                         ' first observation in the sample.')
+    elif dynamic > end:
+        warn('Dynamic prediction specified to begin after the end of'
+             ' prediction, and so has no effect.', ValueWarning)
+        return None, 0
+    elif dynamic > nobs:
+        warn('Dynamic prediction specified to begin during'
+             ' out-of-sample forecasting period, and so has no'
+             ' effect.', ValueWarning)
+        return None, 0
+
+    # Get the total size of the desired dynamic forecasting component
+    # Note: the first `dynamic` periods of prediction are actually
+    # *not* dynamic, because dynamic prediction begins at observation
+    # `dynamic`.
+    ndynamic = max(0, min(end, nobs) - dynamic)
+    return dynamic, ndynamic
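
A small worked example of the arithmetic above, assuming `_check_dynamic` is available in scope and using hypothetical values:

# Prediction over [2, 10) for a sample of nobs=8, with dynamic prediction
# requested 3 periods after `start`
dynamic, ndynamic = _check_dynamic(3, start=2, end=10, nobs=8)
# The relative offset becomes absolute (2 + 3 = 5); the number of dynamic
# periods is max(0, min(end, nobs) - dynamic) = min(10, 8) - 5 = 3
assert (dynamic, ndynamic) == (5, 3)
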
diff --git a/statsmodels/tsa/statespace/kalman_smoother.py b/statsmodels/tsa/statespace/kalman_smoother.py
index fb614f0c6..134662d2a 100644
--- a/statsmodels/tsa/statespace/kalman_smoother.py
+++ b/statsmodels/tsa/statespace/kalman_smoother.py
@@ -4,27 +4,35 @@ State Space Representation and Kalman Filter, Smoother
 Author: Chad Fulton
 License: Simplified-BSD
 """
+
 import numpy as np
 from types import SimpleNamespace
+
 from statsmodels.tsa.statespace.representation import OptionWrapper
-from statsmodels.tsa.statespace.kalman_filter import KalmanFilter, FilterResults
-from statsmodels.tsa.statespace.tools import reorder_missing_matrix, reorder_missing_vector, copy_index_matrix
+from statsmodels.tsa.statespace.kalman_filter import (KalmanFilter,
+                                                      FilterResults)
+from statsmodels.tsa.statespace.tools import (
+    reorder_missing_matrix, reorder_missing_vector, copy_index_matrix)
 from statsmodels.tsa.statespace import tools, initialization
-SMOOTHER_STATE = 1
-SMOOTHER_STATE_COV = 2
-SMOOTHER_DISTURBANCE = 4
-SMOOTHER_DISTURBANCE_COV = 8
-SMOOTHER_STATE_AUTOCOV = 16
-SMOOTHER_ALL = (SMOOTHER_STATE | SMOOTHER_STATE_COV | SMOOTHER_DISTURBANCE |
-    SMOOTHER_DISTURBANCE_COV | SMOOTHER_STATE_AUTOCOV)
-SMOOTH_CONVENTIONAL = 1
-SMOOTH_CLASSICAL = 2
-SMOOTH_ALTERNATIVE = 4
-SMOOTH_UNIVARIATE = 8
+
+SMOOTHER_STATE = 0x01              # Durbin and Koopman (2012), Chapter 4.4.2
+SMOOTHER_STATE_COV = 0x02          # ibid., Chapter 4.4.3
+SMOOTHER_DISTURBANCE = 0x04        # ibid., Chapter 4.5
+SMOOTHER_DISTURBANCE_COV = 0x08    # ibid., Chapter 4.5
+SMOOTHER_STATE_AUTOCOV = 0x10      # ibid., Chapter 4.7
+SMOOTHER_ALL = (
+    SMOOTHER_STATE | SMOOTHER_STATE_COV | SMOOTHER_DISTURBANCE |
+    SMOOTHER_DISTURBANCE_COV | SMOOTHER_STATE_AUTOCOV
+)
+
+SMOOTH_CONVENTIONAL = 0x01
+SMOOTH_CLASSICAL = 0x02
+SMOOTH_ALTERNATIVE = 0x04
+SMOOTH_UNIVARIATE = 0x08
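
These constants are bit flags: an output configuration is formed by OR-ing flags together, and individual outputs are tested with a bitwise AND (which is how the `OptionWrapper` attributes defined below expose them). A minimal sketch using the constants defined directly above:

smoother_output = SMOOTHER_STATE | SMOOTHER_STATE_COV           # 0x01 | 0x02 == 3
has_state_cov = bool(smoother_output & SMOOTHER_STATE_COV)       # True
has_disturbance = bool(smoother_output & SMOOTHER_DISTURBANCE)   # False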


 class KalmanSmoother(KalmanFilter):
-    """
+    r"""
     State space representation of a time series process, with Kalman filter
     and smoother.

@@ -48,20 +56,29 @@ class KalmanSmoother(KalmanFilter):
         matrices, for Kalman filtering options, or for Kalman smoothing
         options. See `Representation` for more details.
     """
-    smoother_outputs = ['smoother_state', 'smoother_state_cov',
-        'smoother_state_autocov', 'smoother_disturbance',
-        'smoother_disturbance_cov', 'smoother_all']
+
+    smoother_outputs = [
+        'smoother_state', 'smoother_state_cov', 'smoother_state_autocov',
+        'smoother_disturbance', 'smoother_disturbance_cov', 'smoother_all',
+    ]
+
     smoother_state = OptionWrapper('smoother_output', SMOOTHER_STATE)
     smoother_state_cov = OptionWrapper('smoother_output', SMOOTHER_STATE_COV)
-    smoother_disturbance = OptionWrapper('smoother_output',
-        SMOOTHER_DISTURBANCE)
-    smoother_disturbance_cov = OptionWrapper('smoother_output',
-        SMOOTHER_DISTURBANCE_COV)
-    smoother_state_autocov = OptionWrapper('smoother_output',
-        SMOOTHER_STATE_AUTOCOV)
+    smoother_disturbance = (
+        OptionWrapper('smoother_output', SMOOTHER_DISTURBANCE)
+    )
+    smoother_disturbance_cov = (
+        OptionWrapper('smoother_output', SMOOTHER_DISTURBANCE_COV)
+    )
+    smoother_state_autocov = (
+        OptionWrapper('smoother_output', SMOOTHER_STATE_AUTOCOV)
+    )
     smoother_all = OptionWrapper('smoother_output', SMOOTHER_ALL)
-    smooth_methods = ['smooth_conventional', 'smooth_alternative',
-        'smooth_classical']
+
+    smooth_methods = [
+        'smooth_conventional', 'smooth_alternative', 'smooth_classical'
+    ]
+
     smooth_conventional = OptionWrapper('smooth_method', SMOOTH_CONVENTIONAL)
     """
     (bool) Flag for conventional (Durbin and Koopman, 2012) Kalman smoothing.
@@ -78,28 +95,99 @@ class KalmanSmoother(KalmanFilter):
     """
     (bool) Flag for univariate smoothing (uses modified Bryson-Frazier timing).
     """
+
+    # Default smoother options
     smoother_output = SMOOTHER_ALL
     smooth_method = 0

     def __init__(self, k_endog, k_states, k_posdef=None, results_class=None,
-        kalman_smoother_classes=None, **kwargs):
+                 kalman_smoother_classes=None, **kwargs):
+        # Set the default results class
         if results_class is None:
             results_class = SmootherResults
+
+        # Extract keyword arguments to-be-used later
         keys = ['smoother_output'] + KalmanSmoother.smoother_outputs
-        smoother_output_kwargs = {key: kwargs.pop(key) for key in keys if 
-            key in kwargs}
+        smoother_output_kwargs = {key: kwargs.pop(key) for key in keys
+                                  if key in kwargs}
         keys = ['smooth_method'] + KalmanSmoother.smooth_methods
-        smooth_method_kwargs = {key: kwargs.pop(key) for key in keys if key in
-            kwargs}
-        super(KalmanSmoother, self).__init__(k_endog, k_states, k_posdef,
-            results_class=results_class, **kwargs)
-        self.prefix_kalman_smoother_map = (kalman_smoother_classes if 
-            kalman_smoother_classes is not None else tools.
-            prefix_kalman_smoother_map.copy())
+        smooth_method_kwargs = {key: kwargs.pop(key) for key in keys
+                                if key in kwargs}
+
+        # Initialize the base class
+        super(KalmanSmoother, self).__init__(
+            k_endog, k_states, k_posdef, results_class=results_class, **kwargs
+        )
+
+        # Options
+        self.prefix_kalman_smoother_map = (
+            kalman_smoother_classes
+            if kalman_smoother_classes is not None
+            else tools.prefix_kalman_smoother_map.copy())
+
+        # Setup the underlying Kalman smoother storage
         self._kalman_smoothers = {}
+
+        # Set the smoother options
         self.set_smoother_output(**smoother_output_kwargs)
         self.set_smooth_method(**smooth_method_kwargs)

+    def _clone_kwargs(self, endog, **kwargs):
+        # See Representation._clone_kwargs for docstring
+        kwargs = super(KalmanSmoother, self)._clone_kwargs(endog, **kwargs)
+
+        # Get defaults for options
+        kwargs.setdefault('smoother_output', self.smoother_output)
+        kwargs.setdefault('smooth_method', self.smooth_method)
+
+        return kwargs
+
+    @property
+    def _kalman_smoother(self):
+        prefix = self.prefix
+        if prefix in self._kalman_smoothers:
+            return self._kalman_smoothers[prefix]
+        return None
+
+    def _initialize_smoother(self, smoother_output=None, smooth_method=None,
+                             prefix=None, **kwargs):
+        if smoother_output is None:
+            smoother_output = self.smoother_output
+        if smooth_method is None:
+            smooth_method = self.smooth_method
+
+        # Make sure we have the required Kalman filter
+        prefix, dtype, create_filter, create_statespace = (
+            self._initialize_filter(prefix, **kwargs)
+        )
+
+        # Determine if we need to (re-)create the smoother
+        # (definitely need to recreate if we recreated the filter)
+        create_smoother = (create_filter or
+                           prefix not in self._kalman_smoothers)
+        if not create_smoother:
+            kalman_smoother = self._kalman_smoothers[prefix]
+
+            create_smoother = (kalman_smoother.kfilter is not
+                               self._kalman_filters[prefix])
+
+        # If the dtype-specific _kalman_smoother does not exist (or if we
+        # need to re-create it), create it
+        if create_smoother:
+            # Setup the smoother
+            cls = self.prefix_kalman_smoother_map[prefix]
+            self._kalman_smoothers[prefix] = cls(
+                self._statespaces[prefix], self._kalman_filters[prefix],
+                smoother_output, smooth_method
+            )
+        # Otherwise, update the smoother parameters
+        else:
+            self._kalman_smoothers[prefix].set_smoother_output(
+                smoother_output, False)
+            self._kalman_smoothers[prefix].set_smooth_method(smooth_method)
+
+        return prefix, dtype, create_smoother, create_filter, create_statespace
+
     def set_smoother_output(self, smoother_output=None, **kwargs):
         """
         Set the smoother output
@@ -168,10 +256,14 @@ class KalmanSmoother(KalmanFilter):
         >>> mod.smoother_state
         True
         """
-        pass
+        if smoother_output is not None:
+            self.smoother_output = smoother_output
+        for name in KalmanSmoother.smoother_outputs:
+            if name in kwargs:
+                setattr(self, name, kwargs[name])

     def set_smooth_method(self, smooth_method=None, **kwargs):
-        """
+        r"""
         Set the smoothing method

         The smoothing method can be used to override the Kalman smoother
@@ -250,12 +342,38 @@ class KalmanSmoother(KalmanFilter):
         >>> mod.smooth_method
         17
         """
-        pass
+        if smooth_method is not None:
+            self.smooth_method = smooth_method
+        for name in KalmanSmoother.smooth_methods:
+            if name in kwargs:
+                setattr(self, name, kwargs[name])
+
+    def _smooth(self, smoother_output=None, smooth_method=None, prefix=None,
+                complex_step=False, results=None, **kwargs):
+        # Initialize the smoother
+        prefix, dtype, create_smoother, create_filter, create_statespace = (
+            self._initialize_smoother(
+                smoother_output, smooth_method, prefix=prefix, **kwargs
+            ))
+
+        # Check that the filter and statespace weren't just recreated
+        if create_filter or create_statespace:
+            raise ValueError('Passed settings forced re-creation of the'
+                             ' Kalman filter. Please run `_filter` before'
+                             ' running `_smooth`.')
+
+        # Get the appropriate smoother
+        smoother = self._kalman_smoothers[prefix]
+
+        # Run the smoother
+        smoother()
+
+        return smoother

     def smooth(self, smoother_output=None, smooth_method=None, results=None,
-        run_filter=True, prefix=None, complex_step=False,
-        update_representation=True, update_filter=True, update_smoother=
-        True, **kwargs):
+               run_filter=True, prefix=None, complex_step=False,
+               update_representation=True, update_filter=True,
+               update_smoother=True, **kwargs):
         """
         Apply the Kalman smoother to the statespace model.

@@ -280,11 +398,35 @@ class KalmanSmoother(KalmanFilter):
         -------
         SmootherResults object
         """
-        pass
+
+        # Run the filter
+        kfilter = self._filter(**kwargs)
+
+        # Create the results object
+        results = self.results_class(self)
+        if update_representation:
+            results.update_representation(self)
+        if update_filter:
+            results.update_filter(kfilter)
+        else:
+            # (even if we don't update all filter results, still need to
+            # update this)
+            results.nobs_diffuse = kfilter.nobs_diffuse
+
+        # Run the smoother
+        if smoother_output is None:
+            smoother_output = self.smoother_output
+        smoother = self._smooth(smoother_output, results=results, **kwargs)
+
+        # Update the results
+        if update_smoother:
+            results.update_smoother(smoother)
+
+        return results


 class SmootherResults(FilterResults):
-    """
+    r"""
     Results from applying the Kalman smoother and/or filter to a state space
     model.

@@ -406,7 +548,7 @@ class SmootherResults(FilterResults):
         The smoothed state covariance matrices at each time period.
     smoothed_state_autocov : ndarray
         The smoothed state lag-one autocovariance matrices at each time
-        period: :math:`Cov(\\alpha_{t+1}, \\alpha_t)`.
+        period: :math:`Cov(\alpha_{t+1}, \alpha_t)`.
     smoothed_measurement_disturbance : ndarray
         The smoothed measurement at each time period.
     smoothed_state_disturbance : ndarray
@@ -417,13 +559,18 @@ class SmootherResults(FilterResults):
     smoothed_state_disturbance_cov : ndarray
         The smoothed state disturbance covariance matrices at each time period.
     """
-    _smoother_attributes = ['smoother_output', 'scaled_smoothed_estimator',
+
+    _smoother_attributes = [
+        'smoother_output', 'scaled_smoothed_estimator',
         'scaled_smoothed_estimator_cov', 'smoothing_error',
         'smoothed_state', 'smoothed_state_cov', 'smoothed_state_autocov',
         'smoothed_measurement_disturbance', 'smoothed_state_disturbance',
         'smoothed_measurement_disturbance_cov',
-        'smoothed_state_disturbance_cov', 'innovations_transition']
+        'smoothed_state_disturbance_cov', 'innovations_transition'
+    ]
+
     _smoother_options = KalmanSmoother.smoother_outputs
+
     _attributes = FilterResults._model_attributes + _smoother_attributes

     def update_representation(self, model, only_options=False):
@@ -443,7 +590,16 @@ class SmootherResults(FilterResults):
         -----
         This method is rarely required except for internal usage.
         """
-        pass
+        super(SmootherResults, self).update_representation(model, only_options)
+
+        # Save the options as boolean variables
+        for name in self._smoother_options:
+            setattr(self, name, getattr(model, name, None))
+
+        # Initialize holders for smoothed forecasts
+        self._smoothed_forecasts = None
+        self._smoothed_forecasts_error = None
+        self._smoothed_forecasts_error_cov = None

     def update_smoother(self, smoother):
         """
@@ -458,10 +614,130 @@ class SmootherResults(FilterResults):
         -----
         This method is rarely required except for internal usage.
         """
-        pass
+        # Copy the appropriate output
+        attributes = []
+
+        # Since update_representation will already have been called, we can
+        # use the boolean options smoother_* and know they match the smoother
+        # itself
+        if self.smoother_state or self.smoother_disturbance:
+            attributes.append('scaled_smoothed_estimator')
+        if self.smoother_state_cov or self.smoother_disturbance_cov:
+            attributes.append('scaled_smoothed_estimator_cov')
+        if self.smoother_state:
+            attributes.append('smoothed_state')
+        if self.smoother_state_cov:
+            attributes.append('smoothed_state_cov')
+        if self.smoother_state_autocov:
+            attributes.append('smoothed_state_autocov')
+        if self.smoother_disturbance:
+            attributes += [
+                'smoothing_error',
+                'smoothed_measurement_disturbance',
+                'smoothed_state_disturbance'
+            ]
+        if self.smoother_disturbance_cov:
+            attributes += [
+                'smoothed_measurement_disturbance_cov',
+                'smoothed_state_disturbance_cov'
+            ]
+
+        has_missing = np.sum(self.nmissing) > 0
+        for name in self._smoother_attributes:
+            if name == 'smoother_output':
+                pass
+            elif name in attributes:
+                if name in ['smoothing_error',
+                            'smoothed_measurement_disturbance']:
+                    vector = getattr(smoother, name, None)
+                    if vector is not None and has_missing:
+                        vector = np.array(reorder_missing_vector(
+                            vector, self.missing, prefix=self.prefix))
+                    else:
+                        vector = np.array(vector, copy=True)
+                    setattr(self, name, vector)
+                elif name == 'smoothed_measurement_disturbance_cov':
+                    matrix = getattr(smoother, name, None)
+                    if matrix is not None and has_missing:
+                        matrix = reorder_missing_matrix(
+                            matrix, self.missing, reorder_rows=True,
+                            reorder_cols=True, prefix=self.prefix)
+                        # In the missing data case, we want to set the missing
+                        # components equal to their unconditional distribution
+                        copy_index_matrix(
+                            self.obs_cov, matrix, self.missing,
+                            index_rows=True, index_cols=True, inplace=True,
+                            prefix=self.prefix)
+                    else:
+                        matrix = np.array(matrix, copy=True)
+                    setattr(self, name, matrix)
+                else:
+                    setattr(self, name,
+                            np.array(getattr(smoother, name, None), copy=True))
+            else:
+                setattr(self, name, None)
+
+        self.innovations_transition = (
+            np.array(smoother.innovations_transition, copy=True))
+
+        # Diffuse objects
+        self.scaled_smoothed_diffuse_estimator = None
+        self.scaled_smoothed_diffuse1_estimator_cov = None
+        self.scaled_smoothed_diffuse2_estimator_cov = None
+        if self.nobs_diffuse > 0:
+            self.scaled_smoothed_diffuse_estimator = np.array(
+                smoother.scaled_smoothed_diffuse_estimator, copy=True)
+            self.scaled_smoothed_diffuse1_estimator_cov = np.array(
+                smoother.scaled_smoothed_diffuse1_estimator_cov, copy=True)
+            self.scaled_smoothed_diffuse2_estimator_cov = np.array(
+                smoother.scaled_smoothed_diffuse2_estimator_cov, copy=True)
+
+        # Adjustments
+
+        # For r_t (and similarly for N_t), what was calculated was
+        # r_T, ..., r_{-1}. We only want r_0, ..., r_T
+        # so exclude the appropriate element so that the time index is
+        # consistent with the other returned output
+        # r_t stored such that scaled_smoothed_estimator[0] == r_{-1}
+        start = 1
+        end = None
+        if 'scaled_smoothed_estimator' in attributes:
+            self.scaled_smoothed_estimator_presample = (
+                self.scaled_smoothed_estimator[:, 0])
+            self.scaled_smoothed_estimator = (
+                self.scaled_smoothed_estimator[:, start:end]
+            )
+        if 'scaled_smoothed_estimator_cov' in attributes:
+            self.scaled_smoothed_estimator_cov_presample = (
+                self.scaled_smoothed_estimator_cov[:, :, 0])
+            self.scaled_smoothed_estimator_cov = (
+                self.scaled_smoothed_estimator_cov[:, :, start:end]
+            )
+
+        # Clear the smoothed forecasts
+        self._smoothed_forecasts = None
+        self._smoothed_forecasts_error = None
+        self._smoothed_forecasts_error_cov = None
+
+        # Note: if we concentrated out the scale, need to adjust the
+        # loglikelihood values and all of the covariance matrices and the
+        # values that depend on the covariance matrices
+        if self.filter_concentrated and self.model._scale is None:
+            self.smoothed_state_cov *= self.scale
+            self.smoothed_state_autocov *= self.scale
+            self.smoothed_state_disturbance_cov *= self.scale
+            self.smoothed_measurement_disturbance_cov *= self.scale
+            self.scaled_smoothed_estimator_presample /= self.scale
+            self.scaled_smoothed_estimator /= self.scale
+            self.scaled_smoothed_estimator_cov_presample /= self.scale
+            self.scaled_smoothed_estimator_cov /= self.scale
+            self.smoothing_error /= self.scale
+
+        # Cache
+        self.__smoothed_state_autocovariance = {}

     def _smoothed_state_autocovariance(self, shift, start, end,
-        extend_kwargs=None):
+                                       extend_kwargs=None):
         """
         Compute "forward" autocovariances, Cov(t, t+j)

@@ -484,18 +760,86 @@ class SmootherResults(FilterResults):
             time-varying state space models.

         """
-        pass
-
-    def smoothed_state_autocovariance(self, lag=1, t=None, start=None, end=
-        None, extend_kwargs=None):
-        """
+        if extend_kwargs is None:
+            extend_kwargs = {}
+
+        # Size of returned array in the time dimension
+        n = end - start
+
+        # Get number of post-sample periods we need to create an extended
+        # model to compute
+        if shift == 0:
+            max_insample = self.nobs - shift
+        else:
+            max_insample = self.nobs - shift + 1
+        n_postsample = max(0, end - max_insample)
+
+        # Get full in-sample arrays
+        if shift != 0:
+            L = self.innovations_transition
+            P = self.predicted_state_cov
+            N = self.scaled_smoothed_estimator_cov
+        else:
+            acov = self.smoothed_state_cov
+
+        # If applicable, append out-of-sample arrays
+        if n_postsample > 0:
+            # Note: we need 1 less than the number of post
+            endog = np.zeros((n_postsample, self.k_endog)) * np.nan
+            mod = self.model.extend(endog, start=self.nobs, **extend_kwargs)
+            mod.initialize_known(self.predicted_state[..., self.nobs],
+                                 self.predicted_state_cov[..., self.nobs])
+            res = mod.smooth()
+
+            if shift != 0:
+                start_insample = max(0, start)
+                L = np.concatenate((L[..., start_insample:],
+                                    res.innovations_transition), axis=2)
+                P = np.concatenate((P[..., start_insample:],
+                                    res.predicted_state_cov[..., 1:]),
+                                   axis=2)
+                N = np.concatenate((N[..., start_insample:],
+                                    res.scaled_smoothed_estimator_cov),
+                                   axis=2)
+                end -= start_insample
+                start -= start_insample
+            else:
+                acov = np.concatenate((acov, res.predicted_state_cov), axis=2)
+
+        if shift != 0:
+            # Subset to appropriate start, end
+            start_insample = max(0, start)
+            LT = L[..., start_insample:end + shift - 1].T
+            P = P[..., start_insample:end + shift].T
+            N = N[..., start_insample:end + shift - 1].T
+
+            # Intermediate computations
+            tmpLT = np.eye(self.k_states)[None, :, :]
+            length = P.shape[0] - shift  # this is the required length of LT
+            for i in range(1, shift + 1):
+                tmpLT = LT[shift - i:length + shift - i] @ tmpLT
+            eye = np.eye(self.k_states)[None, ...]
+
+            # Compute the autocovariance
+            acov = np.zeros((n, self.k_states, self.k_states))
+            acov[:start_insample - start] = np.nan
+            acov[start_insample - start:] = (
+                P[:-shift] @ tmpLT @ (eye - N[shift - 1:] @ P[shift:]))
+        else:
+            acov = acov.T[start:end]
+
+        return acov
+
+    def smoothed_state_autocovariance(self, lag=1, t=None, start=None,
+                                      end=None, extend_kwargs=None):
+        r"""
         Compute state vector autocovariances, conditional on the full dataset

         Computes:

         .. math::

-            Cov(\\alpha_t - \\hat \\alpha_t, \\alpha_{t - j} - \\hat \\alpha_{t - j})
+            Cov(\alpha_t - \hat \alpha_t, \alpha_{t - j} - \hat \alpha_{t - j})

         where the `lag` argument gives the value for :math:`j`. Thus when
         the `lag` argument is positive, the autocovariance is between the
@@ -573,7 +917,7 @@ class SmootherResults(FilterResults):

         .. math::

-            Cov(\\alpha_t - \\hat \\alpha_t, \\alpha_{t - j} - \\hat \\alpha_{t - j})
+            Cov(\alpha_t - \hat \alpha_t, \alpha_{t - j} - \hat \alpha_{t - j})

         where the `lag` argument determines the autocovariance order :math:`j`,
         and `lag` is an integer (positive, zero, or negative). This method
@@ -608,11 +952,113 @@ class SmootherResults(FilterResults):
                Time Series Analysis by State Space Methods: Second Edition.
                Oxford University Press.
         """
-        pass
+        # We can cache the results for time-invariant models
+        cache_key = None
+        if extend_kwargs is None or len(extend_kwargs) == 0:
+            cache_key = (lag, t, start, end)
+
+        # Short-circuit for a cache-hit
+        if (cache_key is not None and
+                cache_key in self.__smoothed_state_autocovariance):
+            return self.__smoothed_state_autocovariance[cache_key]
+
+        # Switch to only positive values for `lag`
+        forward_autocovariances = False
+        if lag < 0:
+            lag = -lag
+            forward_autocovariances = True
+
+        # Handle `t`
+        if t is not None and (start is not None or end is not None):
+            raise ValueError('Cannot specify both `t` and `start` or `end`.')
+        if t is not None:
+            start = t
+            end = t + 1
+
+        # Defaults
+        if start is None:
+            start = 0
+        if end is None:
+            if forward_autocovariances and lag > 1 and extend_kwargs is None:
+                end = self.nobs - lag + 1
+            else:
+                end = self.nobs
+        if extend_kwargs is None:
+            extend_kwargs = {}
+
+        # Sanity checks
+        if start < 0 or end < 0:
+            raise ValueError('Negative `t`, `start`, or `end` is not allowed.')
+        if end < start:
+            raise ValueError('`end` must be after `start`')
+        if lag == 0 and self.smoothed_state_cov is None:
+            raise RuntimeError('Cannot return smoothed state covariances'
+                               ' if those values have not been computed by'
+                               ' Kalman smoothing.')
+
+        # We already have in-sample (+1 out-of-sample) smoothed covariances
+        if lag == 0 and end <= self.nobs + 1:
+            acov = self.smoothed_state_cov
+            if end == self.nobs + 1:
+                acov = np.concatenate(
+                    (acov[..., start:], self.predicted_state_cov[..., -1:]),
+                    axis=2).T
+            else:
+                acov = acov.T[start:end]
+        # In-sample, we can compute up to Cov(T, T+1) or Cov(T+1, T) and down
+        # to Cov(1, 2) or Cov(2, 1). So:
+        # - For lag=1 we set Cov(1, 0) = np.nan and then can compute up to T-1
+        #   in-sample values Cov(2, 1), ..., Cov(T, T-1) and the first
+        #   out-of-sample value Cov(T+1, T)
+        elif (lag == 1 and self.smoothed_state_autocov is not None and
+                not forward_autocovariances and end <= self.nobs + 1):
+            # nans = np.zeros((self.k_states, self.k_states, lag)) * np.nan
+            # acov = np.concatenate((nans, self.smoothed_state_autocov),
+            #                       axis=2).transpose(2, 0, 1)[start:end]
+            if start == 0:
+                nans = np.zeros((self.k_states, self.k_states, lag)) * np.nan
+                acov = np.concatenate(
+                    (nans, self.smoothed_state_autocov[..., :end - 1]),
+                    axis=2)
+            else:
+                acov = self.smoothed_state_autocov[..., start - 1:end - 1]
+            acov = acov.transpose(2, 0, 1)
+        # - For lag=-1 we can compute T in-sample values, Cov(1, 2), ...,
+        #   Cov(T, T+1) but we cannot compute the first out-of-sample value
+        #   Cov(T+1, T+2).
+        elif (lag == 1 and self.smoothed_state_autocov is not None and
+                forward_autocovariances and end < self.nobs + 1):
+            acov = self.smoothed_state_autocov.T[start:end]
+        # Otherwise, we need to compute additional values at the end of the
+        # sample
+        else:
+            if forward_autocovariances:
+                # Cov(t, t + lag), t = start, ..., end
+                acov = self._smoothed_state_autocovariance(
+                    lag, start, end, extend_kwargs=extend_kwargs)
+            else:
+                # Cov(t, t + lag)' = Cov(t + lag, t),
+                # with t = start - lag, ..., end - lag
+                out = self._smoothed_state_autocovariance(
+                    lag, start - lag, end - lag, extend_kwargs=extend_kwargs)
+                acov = out.transpose(0, 2, 1)
+
+        # Squeeze the last axis or else reshape to have the same axis
+        # definitions as e.g. smoothed_state_cov
+        if t is not None:
+            acov = acov[0]
+        else:
+            acov = acov.transpose(1, 2, 0)
+
+        # Fill in the cache, if applicable
+        if cache_key is not None:
+            self.__smoothed_state_autocovariance[cache_key] = acov
+
+        return acov
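A minimal usage sketch of `smoothed_state_autocovariance` (illustrative only; it assumes a fitted results object exposes the underlying `SmootherResults` instance as `res.smoother_results`, and uses a hypothetical AR(1) model):

    >>> import numpy as np
    >>> import statsmodels.api as sm
    >>> mod = sm.tsa.SARIMAX(np.random.randn(100), order=(1, 0, 0))
    >>> res = mod.smooth([0.5, 1.0])
    >>> sr = res.smoother_results
    >>> # Cov(alpha_t - alpha-hat_t, alpha_{t-1} - alpha-hat_{t-1}) for all t;
    >>> # shape is (k_states, k_states, nobs) and the t=0 slice is NaN, since
    >>> # Cov(1, 0) is not available in-sample
    >>> acov = sr.smoothed_state_autocovariance(lag=1)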

     def news(self, previous, t=None, start=None, end=None,
-        revisions_details_start=True, design=None, state_index=None):
-        """
+             revisions_details_start=True, design=None, state_index=None):
+        r"""
         Compute the news and impacts associated with a data release

         Parameters
@@ -680,7 +1126,7 @@ class SmootherResults(FilterResults):
             - `revisions_all` : y^r(updated) - y^r(previous) for all revisions
             - `gain`: the gain matrix associated with the "Kalman-like" update
               from the news, E[y I'] E[I I']^{-1}. In [1]_, this can be found
-              in the equation For E[y_{k,t_k} \\mid I_{v+1}] in the middle of
+              in the equation for E[y_{k,t_k} \mid I_{v+1}] in the middle of
               page 17.
             - `revision_weights` weights on observations for the smoothed
               signal
@@ -735,12 +1181,413 @@ class SmootherResults(FilterResults):
                Journal of Applied Econometrics 29, no. 1 (2014): 133-160.

         """
-        pass
-
-    def smoothed_state_gain(self, updates_ix, t=None, start=None, end=None,
-        extend_kwargs=None):
-        """
-        Cov(\\tilde \\alpha_{t}, I) Var(I, I)^{-1}
+        # Handle `t`
+        if t is not None and (start is not None or end is not None):
+            raise ValueError('Cannot specify both `t` and `start` or `end`.')
+        if t is not None:
+            start = t
+            end = t + 1
+
+        # Defaults
+        if start is None:
+            start = self.nobs - 1
+        if end is None:
+            end = self.nobs
+
+        # Sanity checks
+        if start < 0 or end < 0:
+            raise ValueError('Negative `t`, `start`, or `end` is not allowed.')
+        if end <= start:
+            raise ValueError('`end` must be after `start`')
+
+        if self.smoothed_state_cov is None:
+            raise ValueError('Cannot compute news without having applied the'
+                             ' Kalman smoother first.')
+
+        error_ss = ('This results object has %s and so it does not appear to'
+                    ' be an extension of `previous`. Can only compute the'
+                    ' news by comparing this results set to previous results'
+                    ' objects.')
+        if self.nobs < previous.nobs:
+            raise ValueError(error_ss % 'fewer observations than'
+                             ' `previous`')
+
+        if not (self.k_endog == previous.k_endog and
+                self.k_states == previous.k_states and
+                self.k_posdef == previous.k_posdef):
+            raise ValueError(error_ss % 'different state space dimensions than'
+                             ' `previous`')
+
+        for key in self.model.shapes.keys():
+            if key == 'obs':
+                continue
+            tv = getattr(self, key).shape[-1] > 1
+            tv_prev = getattr(previous, key).shape[-1] > 1
+            if tv and not tv_prev:
+                raise ValueError(error_ss % f'time-varying {key} while'
+                                 ' `previous` does not')
+            if not tv and tv_prev:
+                raise ValueError(error_ss % f'time-invariant {key} while'
+                                 ' `previous` does not')
+
+        # Standardize
+        if state_index is not None:
+            state_index = np.atleast_1d(
+                np.sort(np.array(state_index, dtype=int)))
+
+        # We cannot forecast out-of-sample periods in a time-varying model
+        if end > self.nobs and not self.model.time_invariant:
+            raise RuntimeError('Cannot compute the impacts of news on periods'
+                               ' outside of the sample in time-varying'
+                               ' models.')
+
+        # For time-varying case, figure out extension kwargs
+        extend_kwargs = {}
+        for key in self.model.shapes.keys():
+            if key == 'obs':
+                continue
+            mat = getattr(self, key)
+            prev_mat = getattr(previous, key)
+            if mat.shape[-1] > prev_mat.shape[-1]:
+                extend_kwargs[key] = mat[..., prev_mat.shape[-1]:]
+
+        # Figure out which indices have changed
+        revisions_ix, updates_ix = previous.model.diff_endog(self.endog.T)
+
+        # Compute prev / post impact forecasts
+        prev_impacted_forecasts = previous.predict(
+            start=start, end=end, **extend_kwargs).smoothed_forecasts
+        post_impacted_forecasts = self.predict(
+            start=start, end=end).smoothed_forecasts
+
+        # Separate revisions into those with detailed impacts and those where
+        # impacts are grouped together
+        if revisions_details_start is True:
+            revisions_details_start = 0
+        elif revisions_details_start is False:
+            revisions_details_start = previous.nobs
+        elif revisions_details_start < 0:
+            revisions_details_start = previous.nobs + revisions_details_start
+
+        revisions_grouped = []
+        revisions_details = []
+        if revisions_details_start > 0:
+            for s, i in revisions_ix:
+                if s < revisions_details_start:
+                    revisions_grouped.append((s, i))
+                else:
+                    revisions_details.append((s, i))
+        else:
+            revisions_details = revisions_ix
+
+        # Practically, don't compute impacts of revisions prior to first
+        # point that was actually revised
+        if len(revisions_ix) > 0:
+            revisions_details_start = max(revisions_ix[0][0],
+                                          revisions_details_start)
+
+        # Setup default (empty) output for revisions
+        revised_endog = None
+        revised_all = None
+        revised_prev_all = None
+        revisions_all = None
+
+        revised = None
+        revised_prev = None
+        revisions = None
+        revision_weights = None
+        revision_detailed_impacts = None
+        revision_results = None
+        revision_impacts = None
+
+        # Get revision datapoints for all revisions (regardless of whether
+        # or not we are computing detailed impacts)
+        if len(revisions_ix) > 0:
+            # Indexes
+            revised_j, revised_p = zip(*revisions_ix)
+            compute_j = np.arange(revised_j[0], revised_j[-1] + 1)
+
+            # Data from updated model
+            revised_endog = self.endog[:, :previous.nobs].copy()
+            # ("revisions" are points where data was previously published and
+            # then changed, so we need to ignore "updates", which are points
+            # that were not previously published)
+            revised_endog[previous.missing.astype(bool)] = np.nan
+            # subset to revision periods
+            revised_all = revised_endog.T[compute_j]
+
+            # Data from original model
+            revised_prev_all = previous.endog.T[compute_j]
+
+            # revision = updated - original
+            revisions_all = (revised_all - revised_prev_all)
+
+            # Construct a model from which we can create weights for impacts
+            # through `end`
+            # Construct endog for the new model
+            tmp_endog = revised_endog.T.copy()
+            tmp_nobs = max(end, previous.nobs)
+            oos_nobs = tmp_nobs - previous.nobs
+            if oos_nobs > 0:
+                tmp_endog = np.concatenate([
+                    tmp_endog, np.zeros((oos_nobs, self.k_endog)) * np.nan
+                ], axis=0)
+
+            # Copy time-varying matrices (required by clone)
+            clone_kwargs = {}
+            for key in self.model.shapes.keys():
+                if key == 'obs':
+                    continue
+                mat = getattr(self, key)
+                if mat.shape[-1] > 1:
+                    clone_kwargs[key] = mat[..., :tmp_nobs]
+
+            rev_mod = previous.model.clone(tmp_endog, **clone_kwargs)
+            init = initialization.Initialization.from_results(self)
+            rev_mod.initialize(init)
+            revision_results = rev_mod.smooth()
+
+            # Get detailed revision weights, impacts, and forecasts
+            if len(revisions_details) > 0:
+                # Indexes for the subset of revisions for which we are
+                # computing detailed impacts
+                compute_j = np.arange(revisions_details_start,
+                                      revised_j[-1] + 1)
+                # Offset describing revisions for which we are not computing
+                # detailed impacts
+                offset = revisions_details_start - revised_j[0]
+                revised = revised_all[offset:]
+                revised_prev = revised_prev_all[offset:]
+                revisions = revisions_all[offset:]
+
+                # Compute the weights of the smoothed state vector
+                compute_t = np.arange(start, end)
+
+                smoothed_state_weights, _, _ = (
+                    tools._compute_smoothed_state_weights(
+                        rev_mod, compute_t=compute_t, compute_j=compute_j,
+                        compute_prior_weights=False, scale=previous.scale))
+
+                # Convert the weights in terms of smoothed forecasts
+                # t, j, m, p, i
+                ZT = rev_mod.design.T
+                if ZT.shape[0] > 1:
+                    ZT = ZT[compute_t]
+
+                # Subset the states used for the impacts if applicable
+                if state_index is not None:
+                    ZT = ZT[:, state_index, :]
+                    smoothed_state_weights = (
+                        smoothed_state_weights[:, :, state_index])
+
+                # Multiplication gives: t, j, m, p * t, j, m, p, k
+                # Sum along axis=2 gives: t, j, p, k
+                # Transpose to: t, j, k, p (i.e. like t, j, m, p but with k
+                # instead of m)
+                revision_weights = np.nansum(
+                    smoothed_state_weights[..., None]
+                    * ZT[:, None, :, None, :], axis=2).transpose(0, 1, 3, 2)
+
+                # Multiplication gives: t, j, k, p * t, j, k, p
+                # Sum along axes 1, 3 gives: t, k
+                # This is also a valid way to compute impacts, but it employs
+                # unnecessary multiplications with zeros; it is better to use
+                # the below method that flattens the revision indices before
+                # computing the impacts
+                # revision_detailed_impacts = np.nansum(
+                #     revision_weights * revisions[None, :, None, :],
+                #     axis=(1, 3))
+
+                # Flatten the weights and revisions along the revised j, k
+                # dimensions so that we only retain the actual revision
+                # elements
+                revised_j, revised_p = zip(*[
+                    s for s in revisions_ix
+                    if s[0] >= revisions_details_start])
+                ix_j = revised_j - revised_j[0]
+                # Shape is: t, k, j * p
+                # Note: have to transpose first so that the two advanced
+                # indexes are next to each other, so that "the dimensions from
+                # the advanced indexing operations are inserted into the result
+                # array at the same spot as they were in the initial array"
+                # (see https://numpy.org/doc/stable/user/basics.indexing.html,
+                # "Combining advanced and basic indexing")
+                revision_weights = (
+                    revision_weights.transpose(0, 2, 1, 3)[:, :,
+                                                           ix_j, revised_p])
+                # Shape is j * k
+                revisions = revisions[ix_j, revised_p]
+                # Shape is t, k
+                revision_detailed_impacts = revision_weights @ revisions
+
+                # Similarly, flatten the revised and revised_prev series
+                revised = revised[ix_j, revised_p]
+                revised_prev = revised_prev[ix_j, revised_p]
+
+                # Squeeze if `t` argument used
+                if t is not None:
+                    revision_weights = revision_weights[0]
+                    revision_detailed_impacts = revision_detailed_impacts[0]
+
+            # Get total revision impacts
+            revised_impact_forecasts = (
+                revision_results.smoothed_forecasts[..., start:end])
+            if end > revision_results.nobs:
+                predict_start = max(start, revision_results.nobs)
+                p = revision_results.predict(
+                    start=predict_start, end=end, **extend_kwargs)
+                revised_impact_forecasts = np.concatenate(
+                    (revised_impact_forecasts, p.forecasts), axis=1)
+
+            revision_impacts = (revised_impact_forecasts -
+                                prev_impacted_forecasts).T
+            if t is not None:
+                revision_impacts = revision_impacts[0]
+
+        # Need to also flatten the revisions items that contain all revisions
+        if len(revisions_ix) > 0:
+            revised_j, revised_p = zip(*revisions_ix)
+            ix_j = revised_j - revised_j[0]
+
+            revisions_all = revisions_all[ix_j, revised_p]
+            revised_all = revised_all[ix_j, revised_p]
+            revised_prev_all = revised_prev_all[ix_j, revised_p]
+
+        # Now handle updates
+        if len(updates_ix) > 0:
+            # Figure out which time points we need forecast errors for
+            update_t, update_k = zip(*updates_ix)
+            update_start_t = np.min(update_t)
+            update_end_t = np.max(update_t)
+
+            if revision_results is None:
+                forecasts = previous.predict(
+                    start=update_start_t, end=update_end_t + 1,
+                    **extend_kwargs).smoothed_forecasts.T
+            else:
+                forecasts = revision_results.predict(
+                    start=update_start_t,
+                    end=update_end_t + 1).smoothed_forecasts.T
+            realized = self.endog.T[update_start_t:update_end_t + 1]
+            forecasts_error = realized - forecasts
+
+            # Now subset forecast errors to only the (time, endog) elements
+            # that are updates
+            ix_t = update_t - update_start_t
+            update_realized = realized[ix_t, update_k]
+            update_forecasts = forecasts[ix_t, update_k]
+            update_forecasts_error = forecasts_error[ix_t, update_k]
+
+            # Get the gains associated with each of the periods
+            if self.design.shape[2] == 1:
+                design = self.design[..., 0][None, ...]
+            elif end <= self.nobs:
+                design = self.design[..., start:end].transpose(2, 0, 1)
+            else:
+                # Note: this case is no longer possible, since above we raise
+                # RuntimeError for the time-varying case with end > self.nobs
+                if design is None:
+                    raise ValueError('Model has time-varying design matrix, so'
+                                     ' an updated time-varying matrix for'
+                                     ' period `t` is required.')
+                elif design.ndim == 2:
+                    design = design[None, ...]
+                else:
+                    design = design.transpose(2, 0, 1)
+
+            state_gain = previous.smoothed_state_gain(
+                updates_ix, start=start, end=end, extend_kwargs=extend_kwargs)
+
+            # Subset the states used for the impacts if applicable
+            if state_index is not None:
+                design = design[:, :, state_index]
+                state_gain = state_gain[:, state_index]
+
+            # Compute the gain in terms of observed variables
+            obs_gain = design @ state_gain
+
+            # Get the news
+            update_impacts = obs_gain @ update_forecasts_error
+
+            # Squeeze if `t` argument used
+            if t is not None:
+                obs_gain = obs_gain[0]
+                update_impacts = update_impacts[0]
+        else:
+            update_impacts = None
+            update_forecasts = None
+            update_realized = None
+            update_forecasts_error = None
+            obs_gain = None
+
+        # Results
+        out = SimpleNamespace(
+            # update to forecast of impacted variables from news
+            # = E[y^i | post] - E[y^i | revision] = weight @ news
+            update_impacts=update_impacts,
+            # update to forecast of variables of interest from revisions
+            # = E[y^i | revision] - E[y^i | previous]
+            revision_detailed_impacts=revision_detailed_impacts,
+            # news = A = y^u - E[y^u | previous]
+            news=update_forecasts_error,
+            # revisions y^r(updated) - y^r(previous) for periods in which
+            # detailed impacts were computed
+            revisions=revisions,
+            # revisions y^r(updated) - y^r(previous)
+            revisions_all=revisions_all,
+            # gain matrix = E[y A'] E[A A']^{-1}
+            gain=obs_gain,
+            # weights on observations for the smoothed signal
+            revision_weights=revision_weights,
+            # forecasts of the updated periods used to construct the news
+            # = E[y^u | revised]
+            update_forecasts=update_forecasts,
+            # realizations of the updated periods used to construct the news
+            # = y^u
+            update_realized=update_realized,
+            # revised observations of the periods that were revised and for
+            # which detailed impacts were computed
+            # = y^r_{revised}
+            revised=revised,
+            # revised observations of the periods that were revised
+            # = y^r_{revised}
+            revised_all=revised_all,
+            # previous observations of the periods that were revised and for
+            # which detailed impacts were computed
+            # = y^r_{previous}
+            revised_prev=revised_prev,
+            # previous observations of the periods that were revised
+            # = y^r_{previous}
+            revised_prev_all=revised_prev_all,
+            # previous forecast of the periods of interest, E[y^i | previous]
+            prev_impacted_forecasts=prev_impacted_forecasts,
+            # post. forecast of the periods of interest, E[y^i | post]
+            post_impacted_forecasts=post_impacted_forecasts,
+            # results object associated with the revision
+            revision_results=revision_results,
+            # total impacts from all revisions (both grouped and detailed)
+            revision_impacts=revision_impacts,
+            # list of (x, y) positions of revisions to endog
+            revisions_ix=revisions_ix,
+            # list of (x, y) positions of revisions to endog for which details
+            # of impacts were computed
+            revisions_details=revisions_details,
+            # list of (x, y) positions of revisions to endog for which impacts
+            # were grouped
+            revisions_grouped=revisions_grouped,
+            # period in which revision details start to be computed
+            revisions_details_start=revisions_details_start,
+            # list of (x, y) positions of updates to endog
+            updates_ix=updates_ix,
+            # index of state variables used to compute impacts
+            state_index=state_index)
+
+        return out
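A rough sketch of calling `news` directly at the `SmootherResults` level, where `previous` comes from a shorter sample of the same time-invariant specification (hypothetical data; the attribute names on the returned object follow the `SimpleNamespace` constructed above):

    >>> y = np.random.randn(100)
    >>> res_prev = sm.tsa.SARIMAX(y[:99], order=(1, 0, 0)).smooth([0.5, 1.0])
    >>> res_post = sm.tsa.SARIMAX(y, order=(1, 0, 0)).smooth([0.5, 1.0])
    >>> out = res_post.smoother_results.news(res_prev.smoother_results, t=99)
    >>> out.news            # forecast error for the newly observed period
    >>> out.update_impacts  # impact of that news on E[y_99 | full sample]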
+
+    def smoothed_state_gain(self, updates_ix, t=None, start=None,
+                            end=None, extend_kwargs=None):
+        r"""
+        Cov(\tilde \alpha_{t}, I) Var(I, I)^{-1}

         where I is a vector of forecast errors associated with
         `update_indices`.
@@ -751,11 +1598,152 @@ class SmootherResults(FilterResults):
             List of indices `(t, i)`, where `t` denotes a zero-indexed time
             location and `i` denotes a zero-indexed endog variable.
         """
-        pass
+        # Handle `t`
+        if t is not None and (start is not None or end is not None):
+            raise ValueError('Cannot specify both `t` and `start` or `end`.')
+        if t is not None:
+            start = t
+            end = t + 1
+
+        # Defaults
+        if start is None:
+            start = self.nobs - 1
+        if end is None:
+            end = self.nobs
+        if extend_kwargs is None:
+            extend_kwargs = {}
+
+        # Sanity checks
+        if start < 0 or end < 0:
+            raise ValueError('Negative `t`, `start`, or `end` is not allowed.')
+        if end <= start:
+            raise ValueError('`end` must be after `start`')
+
+        # Dimensions
+        n_periods = end - start
+        n_updates = len(updates_ix)
+
+        # Helper to get a matrix that is possibly time-varying
+        def get_mat(which, t):
+            mat = getattr(self, which)
+            if mat.shape[-1] > 1:
+                if t < self.nobs:
+                    out = mat[..., t]
+                else:
+                    if (which not in extend_kwargs or
+                            extend_kwargs[which].shape[-1] <= t - self.nobs):
+                        raise ValueError(f'Model has time-varying {which}'
+                                         ' matrix, so an updated time-varying'
+                                         ' matrix for the extension period is'
+                                         ' required.')
+                    out = extend_kwargs[which][..., t - self.nobs]
+            else:
+                out = mat[..., 0]
+            return out
+
+        # Helper to get Cov(\tilde \alpha_{t}, I)
+        def get_cov_state_revision(t):
+            tmp1 = np.zeros((self.k_states, n_updates))
+            for i in range(n_updates):
+                t_i, k_i = updates_ix[i]
+                acov = self.smoothed_state_autocovariance(
+                    lag=t - t_i, t=t, extend_kwargs=extend_kwargs)
+                Z_i = get_mat('design', t_i)
+                tmp1[:, i:i + 1] = acov @ Z_i[k_i:k_i + 1].T
+            return tmp1
+
+        # Compute Cov(\tilde \alpha_{t}, I)
+        tmp1 = np.zeros((n_periods, self.k_states, n_updates))
+        for s in range(start, end):
+            tmp1[s - start] = get_cov_state_revision(s)
+
+        # Compute Var(I)
+        tmp2 = np.zeros((n_updates, n_updates))
+        for i in range(n_updates):
+            t_i, k_i = updates_ix[i]
+            for j in range(i + 1):
+                t_j, k_j = updates_ix[j]
+
+                Z_i = get_mat('design', t_i)
+                Z_j = get_mat('design', t_j)
+
+                acov = self.smoothed_state_autocovariance(
+                    lag=t_i - t_j, t=t_i, extend_kwargs=extend_kwargs)
+                tmp2[i, j] = tmp2[j, i] = np.squeeze(
+                    Z_i[k_i:k_i + 1] @ acov @ Z_j[k_j:k_j + 1].T
+                )
+
+                if t_i == t_j:
+                    H = get_mat('obs_cov', t_i)
+
+                    if i == j:
+                        tmp2[i, j] += H[k_i, k_j]
+                    else:
+                        tmp2[i, j] += H[k_i, k_j]
+                        tmp2[j, i] += H[k_i, k_j]
+
+        # Gain
+        gain = tmp1 @ np.linalg.inv(tmp2)
+
+        if t is not None:
+            gain = gain[0]
+
+        return gain
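Reusing the `sr` object from the autocovariance sketch above, the gain can be requested for a hypothetical set of updates; each entry of `updates_ix` is a `(t, i)` pair as described in the docstring:

    >>> # Gain of the final smoothed state with respect to the forecast error
    >>> # of endog variable 0 in the final period; the result has shape
    >>> # (k_states, n_updates) because `t` was given
    >>> gain = sr.smoothed_state_gain([(sr.nobs - 1, 0)], t=sr.nobs - 1)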
+
+    def _get_smoothed_forecasts(self):
+        if self._smoothed_forecasts is None:
+            # Initialize empty arrays
+            self._smoothed_forecasts = np.zeros(self.forecasts.shape,
+                                                dtype=self.dtype)
+            self._smoothed_forecasts_error = (
+                np.zeros(self.forecasts_error.shape, dtype=self.dtype)
+            )
+            self._smoothed_forecasts_error_cov = (
+                np.zeros(self.forecasts_error_cov.shape, dtype=self.dtype)
+            )
+
+            for t in range(self.nobs):
+                design_t = 0 if self.design.shape[2] == 1 else t
+                obs_cov_t = 0 if self.obs_cov.shape[2] == 1 else t
+                obs_intercept_t = 0 if self.obs_intercept.shape[1] == 1 else t
+
+                mask = ~self.missing[:, t].astype(bool)
+                # We can recover forecasts
+                self._smoothed_forecasts[:, t] = np.dot(
+                    self.design[:, :, design_t], self.smoothed_state[:, t]
+                ) + self.obs_intercept[:, obs_intercept_t]
+                if self.nmissing[t] > 0:
+                    self._smoothed_forecasts_error[:, t] = np.nan
+                self._smoothed_forecasts_error[mask, t] = (
+                    self.endog[mask, t] - self._smoothed_forecasts[mask, t]
+                )
+                self._smoothed_forecasts_error_cov[:, :, t] = np.dot(
+                    np.dot(self.design[:, :, design_t],
+                           self.smoothed_state_cov[:, :, t]),
+                    self.design[:, :, design_t].T
+                ) + self.obs_cov[:, :, obs_cov_t]
+
+        return (
+            self._smoothed_forecasts,
+            self._smoothed_forecasts_error,
+            self._smoothed_forecasts_error_cov
+        )
+
+    @property
+    def smoothed_forecasts(self):
+        return self._get_smoothed_forecasts()[0]
+
+    @property
+    def smoothed_forecasts_error(self):
+        return self._get_smoothed_forecasts()[1]
+
+    @property
+    def smoothed_forecasts_error_cov(self):
+        return self._get_smoothed_forecasts()[2]
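For reference, the loop in `_get_smoothed_forecasts` caches the usual smoothed-observation quantities; in terms of the design, observation-intercept and observation-covariance matrices used above,

    smoothed_forecasts[:, t]             = Z_t \hat \alpha_t + d_t
    smoothed_forecasts_error[:, t]       = y_t - (Z_t \hat \alpha_t + d_t)   (NaN where y_t is missing)
    smoothed_forecasts_error_cov[..., t] = Z_t V_t Z_t' + H_t

where \hat \alpha_t is `smoothed_state[:, t]` and V_t is `smoothed_state_cov[:, :, t]`.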

     def get_smoothed_decomposition(self, decomposition_of='smoothed_state',
-        state_index=None):
-        """
+                                   state_index=None):
+        r"""
         Decompose smoothed output into contributions from observations

         Parameters
@@ -825,8 +1813,80 @@ class SmootherResults(FilterResults):

         Notes
         -----
-        Denote the smoothed state at time :math:`t` by :math:`\\alpha_t`. Then
-        the smoothed signal is :math:`Z_t \\alpha_t`, where :math:`Z_t` is the
+        Denote the smoothed state at time :math:`t` by :math:`\alpha_t`. Then
+        the smoothed signal is :math:`Z_t \alpha_t`, where :math:`Z_t` is the
         design matrix operative at time :math:`t`.
         """
-        pass
+        if decomposition_of not in ['smoothed_state', 'smoothed_signal']:
+            raise ValueError('Invalid value for `decomposition_of`. Must be'
+                             ' one of "smoothed_state" or "smoothed_signal".')
+
+        weights, state_intercept_weights, prior_weights = (
+            tools._compute_smoothed_state_weights(
+                self.model, compute_prior_weights=True, scale=self.scale))
+
+        # Get state space objects
+        ZT = self.model.design.T           # t, m, p
+        dT = self.model.obs_intercept.T    # t, p
+        cT = self.model.state_intercept.T  # t, m
+
+        # Subset the states used for the impacts if applicable
+        if decomposition_of == 'smoothed_signal' and state_index is not None:
+            ZT = ZT[:, state_index, :]
+            weights = weights[:, :, state_index]
+            prior_weights = prior_weights[:, state_index, :]
+
+        # Convert the weights in terms of smoothed signal
+        # t, j, m, p, i
+        if decomposition_of == 'smoothed_signal':
+            # Multiplication gives: t, j, m, p * t, j, m, p, k
+            # Sum along axis=2 gives: t, j, p, k
+            # Transpose to: t, j, k, p (i.e. like t, j, m, p but with k instead
+            # of m)
+            weights = np.nansum(weights[..., None] * ZT[:, None, :, None, :],
+                                axis=2).transpose(0, 1, 3, 2)
+
+            # Multiplication gives: t, j, m, l * t, j, m, l, k
+            # Sum along axis=2 gives: t, j, l, k
+            # Transpose to: t, j, k, l (i.e. like t, j, m, p but with k instead
+            # of m and l instead of p)
+            state_intercept_weights = np.nansum(
+                state_intercept_weights[..., None] * ZT[:, None, :, None, :],
+                axis=2).transpose(0, 1, 3, 2)
+
+            # Multiplication gives: t, m, l * t, m, l, k = t, m, l, k
+            # Sum along axis=1 gives: t, l, k
+            # Transpose to: t, k, l (i.e. like t, m, l but with k instead of m)
+            prior_weights = np.nansum(
+                prior_weights[..., None] * ZT[:, :, None, :],
+                axis=1).transpose(0, 2, 1)
+
+        # Contributions of observations: multiply weights by observations
+        # Multiplication gives t, j, {m,k}, p
+        data_contributions = weights * self.model.endog.T[None, :, None, :]
+        # Transpose to: t, {m,k}, j, p
+        data_contributions = data_contributions.transpose(0, 2, 1, 3)
+
+        # Contributions of obs intercept: multiply data weights by obs
+        # intercept
+        # Multiplication gives t, j, {m,k}, p
+        obs_intercept_contributions = -weights * dT[None, :, None, :]
+        # Transpose to: t, {m,k}, j, p
+        obs_intercept_contributions = (
+            obs_intercept_contributions.transpose(0, 2, 1, 3))
+
+        # Contributions of state intercept: multiply state intercept weights
+        # by state intercept
+        # Multiplication gives t, j, {m,k}, l
+        state_intercept_contributions = (
+            state_intercept_weights * cT[None, :, None, :])
+        # Transpose to: t, {m,k}, j, l
+        state_intercept_contributions = (
+            state_intercept_contributions.transpose(0, 2, 1, 3))
+
+        # Contributions of prior: multiply weights by prior
+        # Multiplication gives t, {m, k}, l
+        prior_contributions = prior_weights * self.initial_state[None, None, :]
+
+        return (data_contributions, obs_intercept_contributions,
+                state_intercept_contributions, prior_contributions)
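As a hedged illustration of the shapes documented in the comments above (reusing `sr` from the earlier autocovariance sketch): for a signal decomposition, `data_contributions` has dimensions (t, k, j, p), and summing the four returned pieces over their observation/state axes should reconstruct the smoothed signal Z_t \hat \alpha_t:

    >>> (data_c, obs_intercept_c, state_intercept_c, prior_c) = (
    ...     sr.get_smoothed_decomposition(decomposition_of='smoothed_signal'))
    >>> signal = (data_c.sum(axis=(2, 3)) + obs_intercept_c.sum(axis=(2, 3))
    ...           + state_intercept_c.sum(axis=(2, 3)) + prior_c.sum(axis=2))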
diff --git a/statsmodels/tsa/statespace/mlemodel.py b/statsmodels/tsa/statespace/mlemodel.py
index 6ab9acf6e..3910a5533 100644
--- a/statsmodels/tsa/statespace/mlemodel.py
+++ b/statsmodels/tsa/statespace/mlemodel.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 State Space Model

@@ -5,22 +6,30 @@ Author: Chad Fulton
 License: Simplified-BSD
 """
 from statsmodels.compat.pandas import is_int_index
+
 import contextlib
 import warnings
+
 import datetime as dt
 from types import SimpleNamespace
 import numpy as np
 import pandas as pd
 from scipy.stats import norm
+
 from statsmodels.tools.tools import pinv_extended, Bunch
 from statsmodels.tools.sm_exceptions import PrecisionWarning, ValueWarning
-from statsmodels.tools.numdiff import _get_epsilon, approx_hess_cs, approx_fprime_cs, approx_fprime
+from statsmodels.tools.numdiff import (_get_epsilon, approx_hess_cs,
+                                       approx_fprime_cs, approx_fprime)
 from statsmodels.tools.decorators import cache_readonly
 from statsmodels.tools.eval_measures import aic, aicc, bic, hqic
+
 import statsmodels.base.wrapper as wrap
+
 import statsmodels.tsa.base.prediction as pred
+
 from statsmodels.base.data import PandasData
 import statsmodels.tsa.base.tsa_model as tsbase
+
 from .news import NewsResults
 from .simulation_smoother import SimulationSmoother
 from .kalman_smoother import SmootherResults
@@ -29,8 +38,53 @@ from .initialization import Initialization
 from .tools import prepare_exog, concat, _safe_cond, get_impact_dates


+def _handle_args(names, defaults, *args, **kwargs):
+    output_args = []
+    # We need to handle positional arguments in two ways, in case this was
+    # called by a Scipy optimization routine
+    if len(args) > 0:
+        # the fit() method will pass a dictionary
+        if isinstance(args[0], dict):
+            flags = args[0]
+        # otherwise, a user may have just used positional arguments...
+        else:
+            flags = dict(zip(names, args))
+        for i in range(len(names)):
+            output_args.append(flags.get(names[i], defaults[i]))
+
+        for name, value in flags.items():
+            if name in kwargs:
+                raise TypeError("loglike() got multiple values for keyword"
+                                " argument '%s'" % name)
+    else:
+        for i in range(len(names)):
+            output_args.append(kwargs.pop(names[i], defaults[i]))
+
+    return tuple(output_args) + (kwargs,)
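Because `_handle_args` must support two calling conventions (the flags dictionary forwarded by `fit()`, or plain keyword arguments from a user), its behaviour with the `loglike` names and defaults defined further below is:

    >>> names = MLEModel._loglike_param_names        # ['transformed', 'includes_fixed', 'complex_step']
    >>> defaults = MLEModel._loglike_param_defaults  # [True, False, False]
    >>> _handle_args(names, defaults, {'transformed': False})
    (False, False, False, {})
    >>> _handle_args(names, defaults, complex_step=True, extra_kwarg=1)
    (True, False, True, {'extra_kwarg': 1})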
+
+
+def _check_index(desired_index, dta, title='data'):
+    given_index = None
+    if isinstance(dta, (pd.Series, pd.DataFrame)):
+        given_index = dta.index
+    if given_index is not None and not desired_index.equals(given_index):
+        desired_freq = getattr(desired_index, 'freq', None)
+        given_freq = getattr(given_index, 'freq', None)
+        if ((desired_freq is not None or given_freq is not None) and
+                desired_freq != given_freq):
+            raise ValueError('Given %s does not have an index'
+                             ' that extends the index of the'
+                             ' model. Expected index frequency is'
+                             ' "%s", but got "%s".'
+                             % (title, desired_freq, given_freq))
+        else:
+            raise ValueError('Given %s does not have an index'
+                             ' that extends the index of the'
+                             ' model.' % title)
+
+
 class MLEModel(tsbase.TimeSeriesModel):
-    """
+    r"""
     State space model for maximum likelihood estimation

     Parameters
@@ -79,14 +133,26 @@ class MLEModel(tsbase.TimeSeriesModel):
     """

     def __init__(self, endog, k_states, exog=None, dates=None, freq=None,
-        **kwargs):
-        super(MLEModel, self).__init__(endog=endog, exog=exog, dates=dates,
-            freq=freq, missing='none')
+                 **kwargs):
+        # Initialize the model base
+        super(MLEModel, self).__init__(endog=endog, exog=exog,
+                                       dates=dates, freq=freq,
+                                       missing='none')
+
+        # Store kwargs to recreate model
         self._init_kwargs = kwargs
+
+        # Prepare the endog array: C-ordered, shape=(nobs x k_endog)
         self.endog, self.exog = self.prepare_data()
+
+        # Dimensions
         self.nobs = self.endog.shape[0]
         self.k_states = k_states
+
+        # Initialize the state-space representation
         self.initialize_statespace(**kwargs)
+
+        # Setup holder for fixed parameters
         self._has_fixed_params = False
         self._fixed_params = None
         self._params_index = None
@@ -97,7 +163,16 @@ class MLEModel(tsbase.TimeSeriesModel):
         """
         Prepare data for use in the state space representation
         """
-        pass
+        endog = np.array(self.data.orig_endog, order='C')
+        exog = self.data.orig_exog
+        if exog is not None:
+            exog = np.array(exog)
+
+        # Base class may allow 1-dim data, whereas we need 2-dim
+        if endog.ndim == 1:
+            endog.shape = (endog.shape[0], 1)  # this will be C-contiguous
+
+        return endog, exog

     def initialize_statespace(self, **kwargs):
         """
@@ -109,7 +184,56 @@ class MLEModel(tsbase.TimeSeriesModel):
             Additional keyword arguments to pass to the state space class
             constructor.
         """
-        pass
+        # (Now self.endog is C-ordered and in long format (nobs x k_endog). To
+        # get F-ordered and in wide format just need to transpose)
+        endog = self.endog.T
+
+        # Instantiate the state space object
+        self.ssm = SimulationSmoother(endog.shape[0], self.k_states,
+                                      nobs=endog.shape[1], **kwargs)
+        # Bind the data to the model
+        self.ssm.bind(endog)
+
+        # Other dimensions, now that `ssm` is available
+        self.k_endog = self.ssm.k_endog
+
+    def _get_index_with_final_state(self):
+        # The index we inherit from `TimeSeriesModel` will only cover the
+        # data sample itself, but we will also need an index value for the
+        # final state, which is the next time step after the last datapoint.
+        # This method figures out an appropriate value for the three types of
+        # supported indexes: date-based, Int64Index, or RangeIndex
+        if self._index_dates:
+            if isinstance(self._index, pd.DatetimeIndex):
+                index = pd.date_range(
+                    start=self._index[0], periods=len(self._index) + 1,
+                    freq=self._index.freq)
+            elif isinstance(self._index, pd.PeriodIndex):
+                index = pd.period_range(
+                    start=self._index[0], periods=len(self._index) + 1,
+                    freq=self._index.freq)
+            else:
+                raise NotImplementedError
+        elif isinstance(self._index, pd.RangeIndex):
+            # COMPAT: pd.RangeIndex does not have start, stop, step prior to
+            #         pandas 0.25
+            try:
+                start = self._index.start
+                stop = self._index.stop
+                step = self._index.step
+            except AttributeError:
+                start = self._index._start
+                stop = self._index._stop
+                step = self._index._step
+            index = pd.RangeIndex(start, stop + step, step)
+        elif is_int_index(self._index):
+            # The only valid Int64Index is a full, incrementing index, so this
+            # is general
+            value = self._index[-1] + 1
+            index = pd.Index(self._index.tolist() + [value])
+        else:
+            raise NotImplementedError
+        return index

     def __setitem__(self, key, value):
         return self.ssm.__setitem__(key, value)
@@ -117,6 +241,16 @@ class MLEModel(tsbase.TimeSeriesModel):
     def __getitem__(self, key):
         return self.ssm.__getitem__(key)

+    def _get_init_kwds(self):
+        # Get keywords based on model attributes
+        kwds = super(MLEModel, self)._get_init_kwds()
+
+        for key, value in kwds.items():
+            if value is None and hasattr(self.ssm, key):
+                kwds[key] = getattr(self.ssm, key)
+
+        return kwds
+
     def clone(self, endog, exog=None, **kwargs):
         """
         Clone state space model with new data and optionally new specification
@@ -142,7 +276,24 @@ class MLEModel(tsbase.TimeSeriesModel):
         -----
         This method must be implemented
         """
-        pass
+        raise NotImplementedError('This method is not implemented in the base'
+                                  ' class and must be set up by each specific'
+                                  ' model.')
+
+    def _clone_from_init_kwds(self, endog, **kwargs):
+        # Cannot make this the default, because there is extra work required
+        # for subclasses to make _get_init_kwds useful.
+        use_kwargs = self._get_init_kwds()
+        use_kwargs.update(kwargs)
+
+        # Check for `exog`
+        if getattr(self, 'k_exog', 0) > 0 and kwargs.get('exog', None) is None:
+            raise ValueError('Cloning a model with an exogenous component'
+                             ' requires specifying a new exogenous array using'
+                             ' the `exog` argument.')
+
+        mod = self.__class__(endog, **use_kwargs)
+        return mod

     def set_filter_method(self, filter_method=None, **kwargs):
         """
@@ -164,7 +315,7 @@ class MLEModel(tsbase.TimeSeriesModel):
         This method is rarely used. See the corresponding function in the
         `KalmanFilter` class for details.
         """
-        pass
+        self.ssm.set_filter_method(filter_method, **kwargs)

     def set_inversion_method(self, inversion_method=None, **kwargs):
         """
@@ -188,7 +339,7 @@ class MLEModel(tsbase.TimeSeriesModel):
         This method is rarely used. See the corresponding function in the
         `KalmanFilter` class for details.
         """
-        pass
+        self.ssm.set_inversion_method(inversion_method, **kwargs)

     def set_stability_method(self, stability_method=None, **kwargs):
         """
@@ -212,7 +363,7 @@ class MLEModel(tsbase.TimeSeriesModel):
         This method is rarely used. See the corresponding function in the
         `KalmanFilter` class for details.
         """
-        pass
+        self.ssm.set_stability_method(stability_method, **kwargs)

     def set_conserve_memory(self, conserve_memory=None, **kwargs):
         """
@@ -236,7 +387,7 @@ class MLEModel(tsbase.TimeSeriesModel):
         This method is rarely used. See the corresponding function in the
         `KalmanFilter` class for details.
         """
-        pass
+        self.ssm.set_conserve_memory(conserve_memory, **kwargs)

     def set_smoother_output(self, smoother_output=None, **kwargs):
         """
@@ -258,19 +409,57 @@ class MLEModel(tsbase.TimeSeriesModel):
         This method is rarely used. See the corresponding function in the
         `KalmanSmoother` class for details.
         """
-        pass
+        self.ssm.set_smoother_output(smoother_output, **kwargs)

     def initialize_known(self, initial_state, initial_state_cov):
         """Initialize known"""
-        pass
+        self.ssm.initialize_known(initial_state, initial_state_cov)

     def initialize_approximate_diffuse(self, variance=None):
         """Initialize approximate diffuse"""
-        pass
+        self.ssm.initialize_approximate_diffuse(variance)

     def initialize_stationary(self):
         """Initialize stationary"""
-        pass
+        self.ssm.initialize_stationary()
+
+    @property
+    def initialization(self):
+        return self.ssm.initialization
+
+    @initialization.setter
+    def initialization(self, value):
+        self.ssm.initialization = value
+
+    @property
+    def initial_variance(self):
+        return self.ssm.initial_variance
+
+    @initial_variance.setter
+    def initial_variance(self, value):
+        self.ssm.initial_variance = value
+
+    @property
+    def loglikelihood_burn(self):
+        return self.ssm.loglikelihood_burn
+
+    @loglikelihood_burn.setter
+    def loglikelihood_burn(self, value):
+        self.ssm.loglikelihood_burn = value
+
+    @property
+    def tolerance(self):
+        return self.ssm.tolerance
+
+    @tolerance.setter
+    def tolerance(self, value):
+        self.ssm.tolerance = value
+
+    def _validate_can_fix_params(self, param_names):
+        for param_name in param_names:
+            if param_name not in self.param_names:
+                raise ValueError('Invalid parameter name passed: "%s".'
+                                 % param_name)

     @contextlib.contextmanager
     def fix_params(self, params):
@@ -290,13 +479,54 @@ class MLEModel(tsbase.TimeSeriesModel):
         >>> with mod.fix_params({'ar.L1': 0.5}):
                 res = mod.fit()
         """
-        pass
+        k_params = len(self.param_names)
+        # Initialization (this is done here rather than in the constructor
+        # because param_names may not be available at that point)
+        if self._fixed_params is None:
+            self._fixed_params = {}
+            self._params_index = dict(
+                zip(self.param_names, np.arange(k_params)))
+
+        # Cache the current fixed parameters
+        cache_fixed_params = self._fixed_params.copy()
+        cache_has_fixed_params = self._has_fixed_params
+        cache_fixed_params_index = self._fixed_params_index
+        cache_free_params_index = self._free_params_index
+
+        # Validate parameter names and values
+        all_fixed_param_names = (
+            set(params.keys()) | set(self._fixed_params.keys())
+        )
+        self._validate_can_fix_params(all_fixed_param_names)
+
+        # Set the new fixed parameters, keeping the order as given by
+        # param_names
+        self._fixed_params.update(params)
+        self._fixed_params = dict([
+            (name, self._fixed_params[name]) for name in self.param_names
+            if name in self._fixed_params])
+
+        # Update associated values
+        self._has_fixed_params = True
+        self._fixed_params_index = [self._params_index[key]
+                                    for key in self._fixed_params.keys()]
+        self._free_params_index = list(
+            set(np.arange(k_params)).difference(self._fixed_params_index))
+
+        try:
+            yield
+        finally:
+            # Reset the fixed parameters
+            self._has_fixed_params = cache_has_fixed_params
+            self._fixed_params = cache_fixed_params
+            self._fixed_params_index = cache_fixed_params_index
+            self._free_params_index = cache_free_params_index

     def fit(self, start_params=None, transformed=True, includes_fixed=False,
-        cov_type=None, cov_kwds=None, method='lbfgs', maxiter=50,
-        full_output=1, disp=5, callback=None, return_params=False,
-        optim_score=None, optim_complex_step=None, optim_hessian=None,
-        flags=None, low_memory=False, **kwargs):
+            cov_type=None, cov_kwds=None, method='lbfgs', maxiter=50,
+            full_output=1, disp=5, callback=None, return_params=False,
+            optim_score=None, optim_complex_step=None, optim_hessian=None,
+            flags=None, low_memory=False, **kwargs):
         """
         Fits the model by maximum likelihood via Kalman filter.

@@ -416,7 +646,97 @@ class MLEModel(tsbase.TimeSeriesModel):
         statsmodels.tsa.statespace.mlemodel.MLEResults
         statsmodels.tsa.statespace.structural.UnobservedComponentsResults
         """
-        pass
+        if start_params is None:
+            start_params = self.start_params
+            transformed = True
+            includes_fixed = True
+
+        # Update the score method
+        if optim_score is None and method == 'lbfgs':
+            kwargs.setdefault('approx_grad', True)
+            kwargs.setdefault('epsilon', 1e-5)
+        elif optim_score is None:
+            optim_score = 'approx'
+
+        # Check for complex step differentiation
+        if optim_complex_step is None:
+            optim_complex_step = not self.ssm._complex_endog
+        elif optim_complex_step and self.ssm._complex_endog:
+            raise ValueError('Cannot use complex step derivatives when data'
+                             ' or parameters are complex.')
+
+        # Standardize starting parameters
+        start_params = self.handle_params(start_params, transformed=True,
+                                          includes_fixed=includes_fixed)
+
+        # Unconstrain the starting parameters
+        if transformed:
+            start_params = self.untransform_params(start_params)
+
+        # Remove any fixed parameters
+        if self._has_fixed_params:
+            start_params = start_params[self._free_params_index]
+
+        # If all parameters are fixed, we are done
+        if self._has_fixed_params and len(start_params) == 0:
+            mlefit = Bunch(params=[], mle_retvals=None,
+                           mle_settings=None)
+        else:
+            # Remove disallowed kwargs
+            disallow = (
+                "concentrate_scale",
+                "enforce_stationarity",
+                "enforce_invertibility"
+            )
+            kwargs = {k: v for k, v in kwargs.items() if k not in disallow}
+            # Maximum likelihood estimation
+            if flags is None:
+                flags = {}
+            flags.update({
+                'transformed': False,
+                'includes_fixed': False,
+                'score_method': optim_score,
+                'approx_complex_step': optim_complex_step
+            })
+            if optim_hessian is not None:
+                flags['hessian_method'] = optim_hessian
+            fargs = (flags,)
+            mlefit = super(MLEModel, self).fit(start_params, method=method,
+                                               fargs=fargs,
+                                               maxiter=maxiter,
+                                               full_output=full_output,
+                                               disp=disp, callback=callback,
+                                               skip_hessian=True, **kwargs)
+
+        # Just return the fitted parameters if requested
+        if return_params:
+            return self.handle_params(mlefit.params, transformed=False,
+                                      includes_fixed=False)
+        # Otherwise construct the results class if desired
+        else:
+            # Handle memory conservation option
+            if low_memory:
+                conserve_memory = self.ssm.conserve_memory
+                self.ssm.set_conserve_memory(MEMORY_CONSERVE)
+
+            # Perform filtering / smoothing
+            if (self.ssm.memory_no_predicted or self.ssm.memory_no_gain
+                    or self.ssm.memory_no_smoothing):
+                func = self.filter
+            else:
+                func = self.smooth
+            res = func(mlefit.params, transformed=False, includes_fixed=False,
+                       cov_type=cov_type, cov_kwds=cov_kwds)
+
+            res.mlefit = mlefit
+            res.mle_retvals = mlefit.mle_retvals
+            res.mle_settings = mlefit.mle_settings
+
+            # Reset memory conservation
+            if low_memory:
+                self.ssm.set_conserve_memory(conserve_memory)
+
+            return res

     def fit_constrained(self, constraints, start_params=None, **fit_kwds):
         """
@@ -442,12 +762,37 @@ class MLEModel(tsbase.TimeSeriesModel):
         >>> mod = sm.tsa.SARIMAX(endog, order=(1, 0, 1))
         >>> res = mod.fit_constrained({'ar.L1': 0.5})
         """
-        pass
+        with self.fix_params(constraints):
+            res = self.fit(start_params, **fit_kwds)
+        return res
+
+    @property
+    def _res_classes(self):
+        return {'fit': (MLEResults, MLEResultsWrapper)}
+
+    def _wrap_results(self, params, result, return_raw, cov_type=None,
+                      cov_kwds=None, results_class=None, wrapper_class=None):
+        if not return_raw:
+            # Wrap in a results object
+            result_kwargs = {}
+            if cov_type is not None:
+                result_kwargs['cov_type'] = cov_type
+            if cov_kwds is not None:
+                result_kwargs['cov_kwds'] = cov_kwds
+
+            if results_class is None:
+                results_class = self._res_classes['fit'][0]
+            if wrapper_class is None:
+                wrapper_class = self._res_classes['fit'][1]
+
+            res = results_class(self, params, result, **result_kwargs)
+            result = wrapper_class(res)
+        return result

     def filter(self, params, transformed=True, includes_fixed=False,
-        complex_step=False, cov_type=None, cov_kwds=None, return_ssm=False,
-        results_class=None, results_wrapper_class=None, low_memory=False,
-        **kwargs):
+               complex_step=False, cov_type=None, cov_kwds=None,
+               return_ssm=False, results_class=None,
+               results_wrapper_class=None, low_memory=False, **kwargs):
         """
         Kalman filtering

@@ -476,11 +821,33 @@ class MLEModel(tsbase.TimeSeriesModel):
             Additional keyword arguments to pass to the Kalman filter. See
             `KalmanFilter.filter` for more details.
         """
-        pass
+        params = self.handle_params(params, transformed=transformed,
+                                    includes_fixed=includes_fixed)
+        self.update(params, transformed=True, includes_fixed=True,
+                    complex_step=complex_step)
+
+        # Save the parameter names
+        self.data.param_names = self.param_names
+
+        if complex_step:
+            kwargs['inversion_method'] = INVERT_UNIVARIATE | SOLVE_LU
+
+        # Handle memory conservation
+        if low_memory:
+            kwargs['conserve_memory'] = MEMORY_CONSERVE
+
+        # Get the state space output
+        result = self.ssm.filter(complex_step=complex_step, **kwargs)
+
+        # Wrap in a results object
+        return self._wrap_results(params, result, return_ssm, cov_type,
+                                  cov_kwds, results_class,
+                                  results_wrapper_class)

     def smooth(self, params, transformed=True, includes_fixed=False,
-        complex_step=False, cov_type=None, cov_kwds=None, return_ssm=False,
-        results_class=None, results_wrapper_class=None, **kwargs):
+               complex_step=False, cov_type=None, cov_kwds=None,
+               return_ssm=False, results_class=None,
+               results_wrapper_class=None, **kwargs):
         """
         Kalman smoothing

@@ -504,7 +871,25 @@ class MLEModel(tsbase.TimeSeriesModel):
             Additional keyword arguments to pass to the Kalman filter. See
             `KalmanFilter.filter` for more details.
         """
-        pass
+        params = self.handle_params(params, transformed=transformed,
+                                    includes_fixed=includes_fixed)
+        self.update(params, transformed=True, includes_fixed=True,
+                    complex_step=complex_step)
+
+        # Save the parameter names
+        self.data.param_names = self.param_names
+
+        if complex_step:
+            kwargs['inversion_method'] = INVERT_UNIVARIATE | SOLVE_LU
+
+        # Get the state space output
+        result = self.ssm.smooth(complex_step=complex_step, **kwargs)
+
+        # Wrap in a results object
+        return self._wrap_results(params, result, return_ssm, cov_type,
+                                  cov_kwds, results_class,
+                                  results_wrapper_class)
+
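Given a parameter vector, the `filter` and `smooth` methods above re-run the Kalman filter (and smoother) at fixed parameters and wrap the state space output in a results object. A short sketch, reusing `mod` and `res` from the previous sketch:

    filtered = mod.filter(res.params)   # Kalman filter only
    smoothed = mod.smooth(res.params)   # Kalman filter + smoother

    # Smoother output is exposed on the results object, e.g. the smoothed state
    print(smoothed.smoothed_state.shape)   # (k_states, nobs)
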
     _loglike_param_names = ['transformed', 'includes_fixed', 'complex_step']
     _loglike_param_defaults = [True, False, False]

@@ -539,10 +924,27 @@ class MLEModel(tsbase.TimeSeriesModel):
            Statistical Algorithms for Models in State Space Using SsfPack 2.2.
            Econometrics Journal 2 (1): 107-60. doi:10.1111/1368-423X.00023.
         """
-        pass
+        transformed, includes_fixed, complex_step, kwargs = _handle_args(
+            MLEModel._loglike_param_names, MLEModel._loglike_param_defaults,
+            *args, **kwargs)
+
+        params = self.handle_params(params, transformed=transformed,
+                                    includes_fixed=includes_fixed)
+        self.update(params, transformed=True, includes_fixed=True,
+                    complex_step=complex_step)
+
+        if complex_step:
+            kwargs['inversion_method'] = INVERT_UNIVARIATE | SOLVE_LU
+
+        loglike = self.ssm.loglike(complex_step=complex_step, **kwargs)
+
+        # Koopman, Shephard, and Doornik recommend maximizing the average
+        # likelihood to avoid scale issues, but the averaging is done
+        # automatically in the base model `fit` method
+        return loglike

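The relationship between `loglike` and `loglikeobs` can be checked directly; when `loglikelihood_burn` is zero, the joint value is simply the sum of the per-observation contributions (sketch, reusing `mod` and `res` and assuming numpy as `np`):

    import numpy as np

    llf = mod.loglike(res.params)          # scalar joint log-likelihood
    llf_obs = mod.loglikeobs(res.params)   # per-observation terms, shape (nobs,)
    np.allclose(llf, llf_obs.sum())        # True when loglikelihood_burn is zero
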
     def loglikeobs(self, params, transformed=True, includes_fixed=False,
-        complex_step=False, **kwargs):
+                   complex_step=False, **kwargs):
         """
         Loglikelihood evaluation

@@ -572,10 +974,21 @@ class MLEModel(tsbase.TimeSeriesModel):
            Statistical Algorithms for Models in State Space Using SsfPack 2.2.
            Econometrics Journal 2 (1): 107-60. doi:10.1111/1368-423X.00023.
         """
-        pass
+        params = self.handle_params(params, transformed=transformed,
+                                    includes_fixed=includes_fixed)
+
+        # If we're using complex-step differentiation, then we cannot use
+        # Cholesky factorization
+        if complex_step:
+            kwargs['inversion_method'] = INVERT_UNIVARIATE | SOLVE_LU
+
+        self.update(params, transformed=True, includes_fixed=True,
+                    complex_step=complex_step)
+
+        return self.ssm.loglikeobs(complex_step=complex_step, **kwargs)

     def simulation_smoother(self, simulation_output=None, **kwargs):
-        """
+        r"""
         Retrieve a simulation smoother for the state space model.

         Parameters
@@ -591,11 +1004,111 @@ class MLEModel(tsbase.TimeSeriesModel):
         -------
         SimulationSmoothResults
         """
-        pass
+        return self.ssm.simulation_smoother(
+            simulation_output=simulation_output, **kwargs)
+
+    def _forecasts_error_partial_derivatives(self, params, transformed=True,
+                                             includes_fixed=False,
+                                             approx_complex_step=None,
+                                             approx_centered=False,
+                                             res=None, **kwargs):
+        params = np.array(params, ndmin=1)
+
+        # We cannot use complex-step differentiation with non-transformed
+        # parameters
+        if approx_complex_step is None:
+            approx_complex_step = transformed
+        if not transformed and approx_complex_step:
+            raise ValueError("Cannot use complex-step approximations to"
+                             " calculate the observed_information_matrix"
+                             " with untransformed parameters.")
+
+        # If we're using complex-step differentiation, then we cannot use
+        # Cholesky factorization
+        if approx_complex_step:
+            kwargs['inversion_method'] = INVERT_UNIVARIATE | SOLVE_LU
+
+        # Get values at the params themselves
+        if res is None:
+            self.update(params, transformed=transformed,
+                        includes_fixed=includes_fixed,
+                        complex_step=approx_complex_step)
+            res = self.ssm.filter(complex_step=approx_complex_step, **kwargs)
+
+        # Setup
+        n = len(params)
+
+        # Compute partial derivatives w.r.t. forecast error and forecast
+        # error covariance
+        partials_forecasts_error = (
+            np.zeros((self.k_endog, self.nobs, n))
+        )
+        partials_forecasts_error_cov = (
+            np.zeros((self.k_endog, self.k_endog, self.nobs, n))
+        )
+        if approx_complex_step:
+            epsilon = _get_epsilon(params, 2, None, n)
+            increments = np.identity(n) * 1j * epsilon
+
+            for i, ih in enumerate(increments):
+                self.update(params + ih, transformed=transformed,
+                            includes_fixed=includes_fixed,
+                            complex_step=True)
+                _res = self.ssm.filter(complex_step=True, **kwargs)
+
+                partials_forecasts_error[:, :, i] = (
+                    _res.forecasts_error.imag / epsilon[i]
+                )
+
+                partials_forecasts_error_cov[:, :, :, i] = (
+                    _res.forecasts_error_cov.imag / epsilon[i]
+                )
+        elif not approx_centered:
+            epsilon = _get_epsilon(params, 2, None, n)
+            ei = np.zeros((n,), float)
+            for i in range(n):
+                ei[i] = epsilon[i]
+                self.update(params + ei, transformed=transformed,
+                            includes_fixed=includes_fixed, complex_step=False)
+                _res = self.ssm.filter(complex_step=False, **kwargs)
+
+                partials_forecasts_error[:, :, i] = (
+                    _res.forecasts_error - res.forecasts_error) / epsilon[i]
+
+                partials_forecasts_error_cov[:, :, :, i] = (
+                    _res.forecasts_error_cov -
+                    res.forecasts_error_cov) / epsilon[i]
+                ei[i] = 0.0
+        else:
+            epsilon = _get_epsilon(params, 3, None, n) / 2.
+            ei = np.zeros((n,), float)
+            for i in range(n):
+                ei[i] = epsilon[i]
+
+                self.update(params + ei, transformed=transformed,
+                            includes_fixed=includes_fixed, complex_step=False)
+                _res1 = self.ssm.filter(complex_step=False, **kwargs)
+
+                self.update(params - ei, transformed=transformed,
+                            includes_fixed=includes_fixed, complex_step=False)
+                _res2 = self.ssm.filter(complex_step=False, **kwargs)
+
+                partials_forecasts_error[:, :, i] = (
+                    (_res1.forecasts_error - _res2.forecasts_error) /
+                    (2 * epsilon[i]))
+
+                partials_forecasts_error_cov[:, :, :, i] = (
+                    (_res1.forecasts_error_cov - _res2.forecasts_error_cov) /
+                    (2 * epsilon[i]))
+
+                ei[i] = 0.0
+
+        return partials_forecasts_error, partials_forecasts_error_cov

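The complex-step branch above uses the identity f'(x) ≈ Im f(x + ih) / h, which avoids the subtractive cancellation that limits ordinary finite differences. A self-contained illustration on a scalar function (purely for intuition; not part of the model code):

    import numpy as np

    def f(x):
        return np.exp(x) / np.sqrt(x)

    x0, h = 1.5, 1e-20
    d_cs = np.imag(f(x0 + 1j * h)) / h       # complex step: accurate even for tiny h
    d_fd = (f(x0 + 1e-7) - f(x0)) / 1e-7     # forward difference, for comparison
    d_exact = np.exp(x0) / np.sqrt(x0) - 0.5 * np.exp(x0) * x0 ** -1.5
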
     def observed_information_matrix(self, params, transformed=True,
-        includes_fixed=False, approx_complex_step=None, approx_centered=
-        False, **kwargs):
+                                    includes_fixed=False,
+                                    approx_complex_step=None,
+                                    approx_centered=False, **kwargs):
         """
         Observed information matrix

@@ -622,10 +1135,71 @@ class MLEModel(tsbase.TimeSeriesModel):
         Forecasting, Structural Time Series Models and the Kalman Filter.
         Cambridge University Press.
         """
-        pass
+        params = np.array(params, ndmin=1)
+
+        # Setup
+        n = len(params)
+
+        # We cannot use complex-step differentiation with non-transformed
+        # parameters
+        if approx_complex_step is None:
+            approx_complex_step = transformed
+        if not transformed and approx_complex_step:
+            raise ValueError("Cannot use complex-step approximations to"
+                             " calculate the observed_information_matrix"
+                             " with untransformed parameters.")
+
+        # Get values at the params themselves
+        params = self.handle_params(params, transformed=transformed,
+                                    includes_fixed=includes_fixed)
+        self.update(params, transformed=True, includes_fixed=True,
+                    complex_step=approx_complex_step)
+        # If we're using complex-step differentiation, then we cannot use
+        # Cholesky factorization
+        if approx_complex_step:
+            kwargs['inversion_method'] = INVERT_UNIVARIATE | SOLVE_LU
+        res = self.ssm.filter(complex_step=approx_complex_step, **kwargs)
+        dtype = self.ssm.dtype
+
+        # Save this for inversion later
+        inv_forecasts_error_cov = res.forecasts_error_cov.copy()
+
+        partials_forecasts_error, partials_forecasts_error_cov = (
+            self._forecasts_error_partial_derivatives(
+                params, transformed=transformed, includes_fixed=includes_fixed,
+                approx_complex_step=approx_complex_step,
+                approx_centered=approx_centered, res=res, **kwargs))
+
+        # Compute the information matrix
+        tmp = np.zeros((self.k_endog, self.k_endog, self.nobs, n), dtype=dtype)
+
+        information_matrix = np.zeros((n, n), dtype=dtype)
+        d = np.maximum(self.ssm.loglikelihood_burn, res.nobs_diffuse)
+        for t in range(d, self.nobs):
+            inv_forecasts_error_cov[:, :, t] = (
+                np.linalg.inv(res.forecasts_error_cov[:, :, t])
+            )
+            for i in range(n):
+                tmp[:, :, t, i] = np.dot(
+                    inv_forecasts_error_cov[:, :, t],
+                    partials_forecasts_error_cov[:, :, t, i]
+                )
+            for i in range(n):
+                for j in range(n):
+                    information_matrix[i, j] += (
+                        0.5 * np.trace(np.dot(tmp[:, :, t, i],
+                                              tmp[:, :, t, j]))
+                    )
+                    information_matrix[i, j] += np.inner(
+                        partials_forecasts_error[:, t, i],
+                        np.dot(inv_forecasts_error_cov[:, :, t],
+                               partials_forecasts_error[:, t, j])
+                    )
+        return information_matrix / (self.nobs - self.ssm.loglikelihood_burn)

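Because `observed_information_matrix` returns the average information per effective observation, an approximate parameter covariance matrix is obtained by scaling back up before inverting, which is roughly what the results class does for the 'oim' covariance type (sketch, reusing `mod` and `res`):

    nobs_eff = mod.nobs - mod.ssm.loglikelihood_burn
    info = mod.observed_information_matrix(res.params)
    cov_oim = np.linalg.inv(nobs_eff * info)   # comparable to the 'oim' covariance estimator
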
     def opg_information_matrix(self, params, transformed=True,
-        includes_fixed=False, approx_complex_step=None, **kwargs):
+                               includes_fixed=False, approx_complex_step=None,
+                               **kwargs):
         """
         Outer product of gradients information matrix

@@ -643,10 +1217,47 @@ class MLEModel(tsbase.TimeSeriesModel):
         Estimation and Inference in Nonlinear Structural Models.
         NBER Chapters. National Bureau of Economic Research, Inc.
         """
-        pass
+        # We cannot use complex-step differentiation with non-transformed
+        # parameters
+        if approx_complex_step is None:
+            approx_complex_step = transformed
+        if not transformed and approx_complex_step:
+            raise ValueError("Cannot use complex-step approximations to"
+                             " calculate the opg_information_matrix"
+                             " with untransformed parameters.")
+
+        score_obs = self.score_obs(params, transformed=transformed,
+                                   includes_fixed=includes_fixed,
+                                   approx_complex_step=approx_complex_step,
+                                   **kwargs).transpose()
+        return (
+            np.inner(score_obs, score_obs) /
+            (self.nobs - self.ssm.loglikelihood_burn)
+        )
+
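Equivalently, the OPG estimator above is the average outer product of the per-observation scores, so it can also be formed by hand from `score_obs` (sketch, same assumptions as above):

    s = mod.score_obs(res.params)              # shape (nobs, k_params)
    opg = s.T @ s / (mod.nobs - mod.ssm.loglikelihood_burn)
    # opg should agree with mod.opg_information_matrix(res.params)
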
+    def _score_complex_step(self, params, **kwargs):
+        # the default epsilon can be too small
+        # inversion_method = INVERT_UNIVARIATE | SOLVE_LU
+        epsilon = _get_epsilon(params, 2., None, len(params))
+        kwargs['transformed'] = True
+        kwargs['complex_step'] = True
+        return approx_fprime_cs(params, self.loglike, epsilon=epsilon,
+                                kwargs=kwargs)
+
+    def _score_finite_difference(self, params, approx_centered=False,
+                                 **kwargs):
+        kwargs['transformed'] = True
+        return approx_fprime(params, self.loglike, kwargs=kwargs,
+                             centered=approx_centered)
+
+    def _score_harvey(self, params, approx_complex_step=True, **kwargs):
+        score_obs = self._score_obs_harvey(
+            params, approx_complex_step=approx_complex_step, **kwargs)
+        return np.sum(score_obs, axis=0)

     def _score_obs_harvey(self, params, approx_complex_step=True,
-        approx_centered=False, includes_fixed=False, **kwargs):
+                          approx_centered=False, includes_fixed=False,
+                          **kwargs):
         """
         Score

@@ -669,9 +1280,50 @@ class MLEModel(tsbase.TimeSeriesModel):
         Forecasting, Structural Time Series Models and the Kalman Filter.
         Cambridge University Press.
         """
-        pass
+        params = np.array(params, ndmin=1)
+        n = len(params)
+
+        # Get values at the params themselves
+        self.update(params, transformed=True, includes_fixed=includes_fixed,
+                    complex_step=approx_complex_step)
+        if approx_complex_step:
+            kwargs['inversion_method'] = INVERT_UNIVARIATE | SOLVE_LU
+        if 'transformed' in kwargs:
+            del kwargs['transformed']
+        res = self.ssm.filter(complex_step=approx_complex_step, **kwargs)
+
+        # Get forecasts error partials
+        partials_forecasts_error, partials_forecasts_error_cov = (
+            self._forecasts_error_partial_derivatives(
+                params, transformed=True, includes_fixed=includes_fixed,
+                approx_complex_step=approx_complex_step,
+                approx_centered=approx_centered, res=res, **kwargs))
+
+        # Compute partial derivatives w.r.t. likelihood function
+        partials = np.zeros((self.nobs, n))
+        k_endog = self.k_endog
+        for t in range(self.nobs):
+            inv_forecasts_error_cov = np.linalg.inv(
+                    res.forecasts_error_cov[:, :, t])
+
+            for i in range(n):
+                partials[t, i] += np.trace(np.dot(
+                    np.dot(inv_forecasts_error_cov,
+                           partials_forecasts_error_cov[:, :, t, i]),
+                    (np.eye(k_endog) -
+                     np.dot(inv_forecasts_error_cov,
+                            np.outer(res.forecasts_error[:, t],
+                                     res.forecasts_error[:, t])))))
+                # 2 * dv / di * F^{-1} v_t
+                # where x = F^{-1} v_t or F x = v
+                partials[t, i] += 2 * np.dot(
+                    partials_forecasts_error[:, t, i],
+                    np.dot(inv_forecasts_error_cov, res.forecasts_error[:, t]))
+
+        return -partials / 2.
+
     _score_param_names = ['transformed', 'includes_fixed', 'score_method',
-        'approx_complex_step', 'approx_centered']
+                          'approx_complex_step', 'approx_centered']
     _score_param_defaults = [True, False, 'approx', None, False]

     def score(self, params, *args, **kwargs):
@@ -701,11 +1353,55 @@ class MLEModel(tsbase.TimeSeriesModel):
         `fit` must call this function and only supports passing arguments via
         args (for example `scipy.optimize.fmin_l_bfgs_b`).
         """
-        pass
+        (transformed, includes_fixed, method, approx_complex_step,
+         approx_centered, kwargs) = (
+            _handle_args(MLEModel._score_param_names,
+                         MLEModel._score_param_defaults, *args, **kwargs))
+        # For fit() calls, the method is called 'score_method' (to distinguish
+        # it from the method used for fit) but generally in kwargs the method
+        # will just be called 'method'
+        if 'method' in kwargs:
+            method = kwargs.pop('method')
+
+        if approx_complex_step is None:
+            approx_complex_step = not self.ssm._complex_endog
+        if approx_complex_step and self.ssm._complex_endog:
+            raise ValueError('Cannot use complex step derivatives when data'
+                             ' or parameters are complex.')
+
+        out = self.handle_params(
+            params, transformed=transformed, includes_fixed=includes_fixed,
+            return_jacobian=not transformed)
+        if transformed:
+            params = out
+        else:
+            params, transform_score = out
+
+        if method == 'harvey':
+            kwargs['includes_fixed'] = True
+            score = self._score_harvey(
+                params, approx_complex_step=approx_complex_step, **kwargs)
+        elif method == 'approx' and approx_complex_step:
+            kwargs['includes_fixed'] = True
+            score = self._score_complex_step(params, **kwargs)
+        elif method == 'approx':
+            kwargs['includes_fixed'] = True
+            score = self._score_finite_difference(
+                params, approx_centered=approx_centered, **kwargs)
+        else:
+            raise NotImplementedError('Invalid score method.')
+
+        if not transformed:
+            score = np.dot(transform_score, score)
+
+        if self._has_fixed_params and not includes_fixed:
+            score = score[self._free_params_index]
+
+        return score

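At the maximum likelihood estimates the score should be approximately zero whichever method is selected, which makes for a quick sanity check (sketch, reusing `mod` and `res`):

    g_approx = mod.score(res.params)                    # numerical approximation (default)
    g_harvey = mod.score(res.params, method='harvey')   # Harvey (1989) recursion
    # Both should be close to zero at the MLE, up to optimizer tolerance.
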
     def score_obs(self, params, method='approx', transformed=True,
-        includes_fixed=False, approx_complex_step=None, approx_centered=
-        False, **kwargs):
+                  includes_fixed=False, approx_complex_step=None,
+                  approx_centered=False, **kwargs):
         """
         Compute the score per observation, evaluated at params

@@ -726,13 +1422,45 @@ class MLEModel(tsbase.TimeSeriesModel):
         This is a numerical approximation, calculated using first-order complex
         step differentiation on the `loglikeobs` method.
         """
-        pass
+        if not transformed and approx_complex_step:
+            raise ValueError("Cannot use complex-step approximations to"
+                             " calculate the score at each observation"
+                             " with untransformed parameters.")
+
+        if approx_complex_step is None:
+            approx_complex_step = not self.ssm._complex_endog
+        if approx_complex_step and self.ssm._complex_endog:
+            raise ValueError('Cannot use complex step derivatives when data'
+                             ' or parameters are complex.')
+
+        params = self.handle_params(params, transformed=True,
+                                    includes_fixed=includes_fixed)
+        kwargs['transformed'] = transformed
+        kwargs['includes_fixed'] = True
+
+        if method == 'harvey':
+            score = self._score_obs_harvey(
+                params, approx_complex_step=approx_complex_step, **kwargs)
+        elif method == 'approx' and approx_complex_step:
+            # the default epsilon can be too small
+            epsilon = _get_epsilon(params, 2., None, len(params))
+            kwargs['complex_step'] = True
+            score = approx_fprime_cs(params, self.loglikeobs, epsilon=epsilon,
+                                     kwargs=kwargs)
+        elif method == 'approx':
+            score = approx_fprime(params, self.loglikeobs, kwargs=kwargs,
+                                  centered=approx_centered)
+        else:
+            raise NotImplementedError('Invalid scoreobs method.')
+
+        return score
+
     _hessian_param_names = ['transformed', 'hessian_method',
-        'approx_complex_step', 'approx_centered']
+                            'approx_complex_step', 'approx_centered']
     _hessian_param_defaults = [True, 'approx', None, False]

     def hessian(self, params, *args, **kwargs):
-        """
+        r"""
         Hessian matrix of the likelihood function, evaluated at the given
         parameters

@@ -758,34 +1486,102 @@ class MLEModel(tsbase.TimeSeriesModel):
         `fit` must call this function and only supports passing arguments via
         args (for example `scipy.optimize.fmin_l_bfgs_b`).
         """
-        pass
+        transformed, method, approx_complex_step, approx_centered, kwargs = (
+            _handle_args(MLEModel._hessian_param_names,
+                         MLEModel._hessian_param_defaults,
+                         *args, **kwargs))
+        # For fit() calls, the method is called 'hessian_method' (to
+        # distinguish it from the method used for fit) but generally in kwargs
+        # the method will just be called 'method'
+        if 'method' in kwargs:
+            method = kwargs.pop('method')
+
+        if not transformed and approx_complex_step:
+            raise ValueError("Cannot use complex-step approximations to"
+                             " calculate the hessian with untransformed"
+                             " parameters.")
+
+        if approx_complex_step is None:
+            approx_complex_step = not self.ssm._complex_endog
+        if approx_complex_step and self.ssm._complex_endog:
+            raise ValueError('Cannot use complex step derivatives when data'
+                             ' or parameters are complex.')
+
+        if method == 'oim':
+            hessian = self._hessian_oim(
+                params, transformed=transformed,
+                approx_complex_step=approx_complex_step,
+                approx_centered=approx_centered, **kwargs)
+        elif method == 'opg':
+            hessian = self._hessian_opg(
+                params, transformed=transformed,
+                approx_complex_step=approx_complex_step,
+                approx_centered=approx_centered, **kwargs)
+        elif method == 'approx' and approx_complex_step:
+            hessian = self._hessian_complex_step(
+                params, transformed=transformed, **kwargs)
+        elif method == 'approx':
+            hessian = self._hessian_finite_difference(
+                params, transformed=transformed,
+                approx_centered=approx_centered, **kwargs)
+        else:
+            raise NotImplementedError('Invalid Hessian calculation method.')
+        return hessian

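As with the information matrices, `hessian` returns an average per effective observation, so an approximate parameter covariance matrix follows by rescaling and negating before inversion (sketch of what the results class does for the 'approx' covariance type; assumptions as above):

    nobs_eff = mod.nobs - mod.ssm.loglikelihood_burn
    H = mod.hessian(res.params)                   # averaged Hessian, method='approx' by default
    cov_approx = np.linalg.inv(-nobs_eff * H)     # comparable to the 'approx' covariance estimator
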
     def _hessian_oim(self, params, **kwargs):
         """
         Hessian matrix computed using the Harvey (1989) information matrix
         """
-        pass
+        return -self.observed_information_matrix(params, **kwargs)

     def _hessian_opg(self, params, **kwargs):
         """
         Hessian matrix computed using the outer product of gradients
         information matrix
         """
-        pass
+        return -self.opg_information_matrix(params, **kwargs)
+
+    def _hessian_finite_difference(self, params, approx_centered=False,
+                                   **kwargs):
+        params = np.array(params, ndmin=1)
+
+        warnings.warn('Calculation of the Hessian using finite differences'
+                      ' is usually subject to substantial approximation'
+                      ' errors.', PrecisionWarning)
+
+        if not approx_centered:
+            epsilon = _get_epsilon(params, 3, None, len(params))
+        else:
+            epsilon = _get_epsilon(params, 4, None, len(params)) / 2
+        hessian = approx_fprime(params, self._score_finite_difference,
+                                epsilon=epsilon, kwargs=kwargs,
+                                centered=approx_centered)
+
+        return hessian / (self.nobs - self.ssm.loglikelihood_burn)

     def _hessian_complex_step(self, params, **kwargs):
         """
         Hessian matrix computed by second-order complex-step differentiation
         on the `loglike` function.
         """
-        pass
+        # the default epsilon can be too small
+        epsilon = _get_epsilon(params, 3., None, len(params))
+        kwargs['transformed'] = True
+        kwargs['complex_step'] = True
+        hessian = approx_hess_cs(
+            params, self.loglike, epsilon=epsilon, kwargs=kwargs)
+
+        return hessian / (self.nobs - self.ssm.loglikelihood_burn)

     @property
     def start_params(self):
         """
         (array) Starting parameters for maximum likelihood estimation.
         """
-        pass
+        if hasattr(self, '_start_params'):
+            return self._start_params
+        else:
+            raise NotImplementedError

     @property
     def param_names(self):
@@ -793,14 +1589,25 @@ class MLEModel(tsbase.TimeSeriesModel):
         (list of str) List of human readable parameter names (for parameters
         actually included in the model).
         """
-        pass
+        if hasattr(self, '_param_names'):
+            return self._param_names
+        else:
+            try:
+                names = ['param.%d' % i for i in range(len(self.start_params))]
+            except NotImplementedError:
+                names = []
+            return names

     @property
     def state_names(self):
         """
         (list of str) List of human readable names for unobserved states.
         """
-        pass
+        if hasattr(self, '_state_names'):
+            return self._state_names
+        else:
+            names = ['state.%d' % i for i in range(self.k_states)]
+        return names

     def transform_jacobian(self, unconstrained, approx_centered=False):
         """
@@ -827,7 +1634,8 @@ class MLEModel(tsbase.TimeSeriesModel):
         guaranteed that the `transform_params` method is a real function (e.g.
         if Cholesky decomposition is used).
         """
-        pass
+        return approx_fprime(unconstrained, self.transform_params,
+                             centered=approx_centered)

     def transform_params(self, unconstrained):
         """
@@ -851,7 +1659,7 @@ class MLEModel(tsbase.TimeSeriesModel):
         This is a noop in the base class; subclasses should override where
         appropriate.
         """
-        pass
+        return np.array(unconstrained, ndmin=1)

     def untransform_params(self, constrained):
         """
@@ -874,17 +1682,46 @@ class MLEModel(tsbase.TimeSeriesModel):
         This is a noop in the base class; subclasses should override where
         appropriate.
         """
-        pass
+        return np.array(constrained, ndmin=1)

     def handle_params(self, params, transformed=True, includes_fixed=False,
-        return_jacobian=False):
+                      return_jacobian=False):
         """
         Ensure model parameters satisfy shape and other requirements
         """
-        pass
+        params = np.array(params, ndmin=1)
+
+        # Never want integer dtype, so convert to floats
+        if np.issubdtype(params.dtype, np.integer):
+            params = params.astype(np.float64)
+
+        if not includes_fixed and self._has_fixed_params:
+            k_params = len(self.param_names)
+            new_params = np.zeros(k_params, dtype=params.dtype) * np.nan
+            new_params[self._free_params_index] = params
+            params = new_params
+
+        if not transformed:
+            # It may be the case that the transformation relies on having
+            # "some" (non-NaN) values for the fixed parameters, even if we will
+            # not actually be transforming the fixed parameters (as they will
+            # be set below regardless)
+            if not includes_fixed and self._has_fixed_params:
+                params[self._fixed_params_index] = (
+                    list(self._fixed_params.values()))
+
+            if return_jacobian:
+                transform_score = self.transform_jacobian(params)
+            params = self.transform_params(params)
+
+        if not includes_fixed and self._has_fixed_params:
+            params[self._fixed_params_index] = (
+                list(self._fixed_params.values()))
+
+        return (params, transform_score) if return_jacobian else params

     def update(self, params, transformed=True, includes_fixed=False,
-        complex_step=False):
+               complex_step=False):
         """
         Update the parameters of the model

@@ -906,7 +1743,8 @@ class MLEModel(tsbase.TimeSeriesModel):
         Since Model is a base class, this method should be overridden by
         subclasses to perform actual updating steps.
         """
-        pass
+        return self.handle_params(params=params, transformed=transformed,
+                                  includes_fixed=includes_fixed)

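The `transform_params`, `untransform_params`, and `update` hooks above are the main extension points for custom models. A minimal local-level model sketch showing how a subclass typically overrides them; the class name, parameter labels, and initialization choices are illustrative, not part of statsmodels:

    import statsmodels.api as sm

    class LocalLevel(sm.tsa.statespace.MLEModel):
        def __init__(self, endog):
            super().__init__(endog, k_states=1, k_posdef=1)
            self.ssm.initialize_approximate_diffuse()
            self.ssm.loglikelihood_burn = 1
            self._start_params = [1.0, 1.0]
            self._param_names = ['sigma2.measurement', 'sigma2.level']

            # Time-invariant system matrices
            self['design', 0, 0] = 1.0
            self['transition', 0, 0] = 1.0
            self['selection', 0, 0] = 1.0

        def transform_params(self, unconstrained):
            return unconstrained ** 2        # keep both variances positive

        def untransform_params(self, constrained):
            return constrained ** 0.5

        def update(self, params, **kwargs):
            params = super().update(params, **kwargs)
            self['obs_cov', 0, 0] = params[0]
            self['state_cov', 0, 0] = params[1]

    ll_mod = LocalLevel(endog)        # `endog` as assumed earlier
    ll_res = ll_mod.fit(disp=False)
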
     def _validate_out_of_sample_exog(self, exog, out_of_sample):
         """
@@ -925,11 +1763,33 @@ class MLEModel(tsbase.TimeSeriesModel):
             A numpy array of shape (out_of_sample, k_exog) if the model
             contains an `exog` component, or None if it does not.
         """
-        pass
-
-    def _get_extension_time_varying_matrices(self, params, exog,
-        out_of_sample, extend_kwargs=None, transformed=True, includes_fixed
-        =False, **kwargs):
+        k_exog = getattr(self, 'k_exog', 0)
+        if out_of_sample and k_exog > 0:
+            if exog is None:
+                raise ValueError('Out-of-sample operations in a model'
+                                 ' with a regression component require'
+                                 ' additional exogenous values via the'
+                                 ' `exog` argument.')
+            exog = np.array(exog)
+            required_exog_shape = (out_of_sample, self.k_exog)
+            try:
+                exog = exog.reshape(required_exog_shape)
+            except ValueError:
+                raise ValueError('Provided exogenous values are not of the'
+                                 ' appropriate shape. Required %s, got %s.'
+                                 % (str(required_exog_shape),
+                                    str(exog.shape)))
+        elif k_exog > 0 and exog is not None:
+            exog = None
+            warnings.warn('Exogenous array provided, but additional data'
+                          ' is not required. `exog` argument ignored.',
+                          ValueWarning)
+
+        return exog
+
+    def _get_extension_time_varying_matrices(
+            self, params, exog, out_of_sample, extend_kwargs=None,
+            transformed=True, includes_fixed=False, **kwargs):
         """
         Get updated time-varying state space system matrices

@@ -957,15 +1817,49 @@ class MLEModel(tsbase.TimeSeriesModel):
             the fixed parameters, in addition to the free parameters. Default
             is False.
         """
-        pass
+        # Get the appropriate exog for the extended sample
+        exog = self._validate_out_of_sample_exog(exog, out_of_sample)
+
+        # Create extended model
+        if extend_kwargs is None:
+            extend_kwargs = {}
+
+        # Handle trend offset for extended model
+        if getattr(self, 'k_trend', 0) > 0 and hasattr(self, 'trend_offset'):
+            extend_kwargs.setdefault(
+                'trend_offset', self.trend_offset + self.nobs)
+
+        mod_extend = self.clone(
+            endog=np.zeros((out_of_sample, self.k_endog)), exog=exog,
+            **extend_kwargs)
+        mod_extend.update(params, transformed=transformed,
+                          includes_fixed=includes_fixed)
+
+        # Retrieve the extensions to the time-varying system matrices and
+        # put them in kwargs
+        for name in self.ssm.shapes.keys():
+            if name == 'obs' or name in kwargs:
+                continue
+            original = getattr(self.ssm, name)
+            extended = getattr(mod_extend.ssm, name)
+            so = original.shape[-1]
+            se = extended.shape[-1]
+            if ((so > 1 or se > 1) or (
+                    so == 1 and self.nobs == 1 and
+                    np.any(original[..., 0] != extended[..., 0]))):
+                kwargs[name] = extended[..., -out_of_sample:]
+
+        return kwargs

     def simulate(self, params, nsimulations, measurement_shocks=None,
-        state_shocks=None, initial_state=None, anchor=None, repetitions=
-        None, exog=None, extend_model=None, extend_kwargs=None, transformed
-        =True, includes_fixed=False, pretransformed_measurement_shocks=True,
-        pretransformed_state_shocks=True, pretransformed_initial_state=True,
-        random_state=None, **kwargs):
-        """
+                 state_shocks=None, initial_state=None, anchor=None,
+                 repetitions=None, exog=None, extend_model=None,
+                 extend_kwargs=None, transformed=True, includes_fixed=False,
+                 pretransformed_measurement_shocks=True,
+                 pretransformed_state_shocks=True,
+                 pretransformed_initial_state=True, random_state=None,
+                 **kwargs):
+        r"""
         Simulate a new time series following the state space model

         Parameters
@@ -980,13 +1874,13 @@ class MLEModel(tsbase.TimeSeriesModel):
             number of observations.
         measurement_shocks : array_like, optional
             If specified, these are the shocks to the measurement equation,
-            :math:`\\varepsilon_t`. If unspecified, these are automatically
+            :math:`\varepsilon_t`. If unspecified, these are automatically
             generated using a pseudo-random number generator. If specified,
             must be shaped `nsimulations` x `k_endog`, where `k_endog` is the
             same as in the state space model.
         state_shocks : array_like, optional
             If specified, these are the shocks to the state equation,
-            :math:`\\eta_t`. If unspecified, these are automatically
+            :math:`\eta_t`. If unspecified, these are automatically
             generated using a pseudo-random number generator. If specified,
             must be shaped `nsimulations` x `k_posdef` where `k_posdef` is the
             same as in the state space model.
@@ -1065,11 +1959,112 @@ class MLEModel(tsbase.TimeSeriesModel):
         impulse_responses
             Impulse response functions
         """
-        pass
+        # Make sure the model class has the current parameters
+        self.update(params, transformed=transformed,
+                    includes_fixed=includes_fixed)

-    def impulse_responses(self, params, steps=1, impulse=0, orthogonalized=
-        False, cumulative=False, anchor=None, exog=None, extend_model=None,
-        extend_kwargs=None, transformed=True, includes_fixed=False, **kwargs):
+        # Get the starting location
+        if anchor is None or anchor == 'start':
+            iloc = 0
+        elif anchor == 'end':
+            iloc = self.nobs
+        else:
+            iloc, _, _ = self._get_index_loc(anchor)
+            if isinstance(iloc, slice):
+                iloc = iloc.start
+
+        if iloc < 0:
+            iloc = self.nobs + iloc
+        if iloc > self.nobs:
+            raise ValueError('Cannot anchor simulation outside of the sample.')
+
+        if iloc > 0 and initial_state is None:
+            raise ValueError('If `anchor` is after the start of the sample,'
+                             ' must provide a value for `initial_state`.')
+
+        # Get updated time-varying system matrices in **kwargs, if necessary
+        out_of_sample = max(iloc + nsimulations - self.nobs, 0)
+        if extend_model is None:
+            extend_model = self.exog is not None or not self.ssm.time_invariant
+        if out_of_sample and extend_model:
+            kwargs = self._get_extension_time_varying_matrices(
+                params, exog, out_of_sample, extend_kwargs,
+                transformed=transformed, includes_fixed=includes_fixed,
+                **kwargs)
+
+        # Standardize the dimensions of the initial state
+        if initial_state is not None:
+            initial_state = np.array(initial_state)
+            if initial_state.ndim < 2:
+                initial_state = np.atleast_2d(initial_state).T
+
+        # Construct a model that represents the simulation period
+        end = min(self.nobs, iloc + nsimulations)
+        nextend = iloc + nsimulations - end
+        sim_model = self.ssm.extend(np.zeros((nextend, self.k_endog)),
+                                    start=iloc, end=end, **kwargs)
+
+        # Simulate the data
+        _repetitions = 1 if repetitions is None else repetitions
+        sim = np.zeros((nsimulations, self.k_endog, _repetitions))
+        simulator = None
+
+        for i in range(_repetitions):
+            initial_state_variates = None
+            if initial_state is not None:
+                if initial_state.shape[1] == 1:
+                    initial_state_variates = initial_state[:, 0]
+                else:
+                    initial_state_variates = initial_state[:, i]
+
+            # TODO: allow specifying measurement / state shocks for each
+            # repetition?
+
+            out, _, simulator = sim_model.simulate(
+                nsimulations, measurement_shocks, state_shocks,
+                initial_state_variates,
+                pretransformed_measurement_shocks=(
+                    pretransformed_measurement_shocks),
+                pretransformed_state_shocks=pretransformed_state_shocks,
+                pretransformed_initial_state=pretransformed_initial_state,
+                simulator=simulator, return_simulator=True,
+                random_state=random_state)
+
+            sim[:, :, i] = out
+
+        # Wrap data / squeeze where appropriate
+        use_pandas = isinstance(self.data, PandasData)
+        index = None
+        if use_pandas:
+            _, _, _, index = self._get_prediction_index(
+                iloc, iloc + nsimulations - 1)
+        # If `repetitions` isn't set, we squeeze the last dimension(s)
+        if repetitions is None:
+            if self.k_endog == 1:
+                sim = sim[:, 0, 0]
+                if use_pandas:
+                    sim = pd.Series(sim, index=index, name=self.endog_names)
+            else:
+                sim = sim[:, :, 0]
+                if use_pandas:
+                    sim = pd.DataFrame(sim, index=index,
+                                       columns=self.endog_names)
+        elif use_pandas:
+            shape = sim.shape
+            endog_names = self.endog_names
+            if not isinstance(endog_names, list):
+                endog_names = [endog_names]
+            columns = pd.MultiIndex.from_product([endog_names,
+                                                  np.arange(shape[2])])
+            sim = pd.DataFrame(sim.reshape(shape[0], shape[1] * shape[2]),
+                               index=index, columns=columns)
+
+        return sim
+
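A short sketch of the `simulate` behavior implemented above for a univariate pandas-based model: without `repetitions` a Series is returned, with `repetitions` a DataFrame carrying one column per draw (reusing `mod` and `res`):

    sim = mod.simulate(res.params, nsimulations=100)      # pd.Series of length 100
    sims = mod.simulate(res.params, nsimulations=100,
                        repetitions=3)                     # DataFrame with MultiIndex columns
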
+    def impulse_responses(self, params, steps=1, impulse=0,
+                          orthogonalized=False, cumulative=False, anchor=None,
+                          exog=None, extend_model=None, extend_kwargs=None,
+                          transformed=True, includes_fixed=False, **kwargs):
         """
         Impulse response function

@@ -1151,18 +2146,121 @@ class MLEModel(tsbase.TimeSeriesModel):
               orthogonalized option. Will require permuting matrices when
               constructing the extended model.
         """
-        pass
+        # Make sure the model class has the current parameters
+        self.update(params, transformed=transformed,
+                    includes_fixed=includes_fixed)
+
+        # For time-invariant models, add an additional `step`. This is the
+        # default for time-invariant models based on the expected behavior for
+        # ARIMA and VAR models: we want to record the initial impulse and also
+        # `steps` values of the responses afterwards.
+        # Note: we don't modify `steps` itself, because
+        # `KalmanFilter.impulse_responses` also adds an additional step in this
+        # case (this is so that there isn't different behavior when calling
+        # this method versus that method). We just need to also keep track of
+        # this here because we need to generate the correct extended model.
+        additional_steps = 0
+        if (self.ssm._design.shape[2] == 1 and
+                self.ssm._transition.shape[2] == 1 and
+                self.ssm._selection.shape[2] == 1):
+            additional_steps = 1
+
+        # Get the starting location
+        if anchor is None or anchor == 'start':
+            iloc = 0
+        elif anchor == 'end':
+            iloc = self.nobs - 1
+        else:
+            iloc, _, _ = self._get_index_loc(anchor)
+            if isinstance(iloc, slice):
+                iloc = iloc.start
+
+        if iloc < 0:
+            iloc = self.nobs + iloc
+        if iloc >= self.nobs:
+            raise ValueError('Cannot anchor impulse responses outside of the'
+                             ' sample.')
+
+        time_invariant = (
+            self.ssm._design.shape[2] == self.ssm._obs_cov.shape[2] ==
+            self.ssm._transition.shape[2] == self.ssm._selection.shape[2] ==
+            self.ssm._state_cov.shape[2] == 1)
+
+        # Get updated time-varying system matrices in **kwargs, if necessary
+        # (Note: KalmanFilter adds 1 to steps to account for the first impulse)
+        out_of_sample = max(
+            iloc + (steps + additional_steps + 1) - self.nobs, 0)
+        if extend_model is None:
+            extend_model = self.exog is not None and not time_invariant
+        if out_of_sample and extend_model:
+            kwargs = self._get_extension_time_varying_matrices(
+                params, exog, out_of_sample, extend_kwargs,
+                transformed=transformed, includes_fixed=includes_fixed,
+                **kwargs)
+
+        # Special handling for matrix terms that are time-varying but
+        # irrelevant for impulse response functions. Must be set since
+        # ssm.extend() requires that we pass new matrices for these, but they
+        # are ignored for IRF purposes.
+        end = min(self.nobs, iloc + steps + additional_steps)
+        nextend = iloc + (steps + additional_steps + 1) - end
+        if ('obs_intercept' not in kwargs and
+                self.ssm._obs_intercept.shape[1] > 1):
+            kwargs['obs_intercept'] = np.zeros((self.k_endog, nextend))
+        if ('state_intercept' not in kwargs and
+                self.ssm._state_intercept.shape[1] > 1):
+            kwargs['state_intercept'] = np.zeros((self.k_states, nextend))
+        if 'obs_cov' not in kwargs and self.ssm._obs_cov.shape[2] > 1:
+            kwargs['obs_cov'] = np.zeros((self.k_endog, self.k_endog, nextend))
+        # Special handling for matrix terms that are time-varying but
+        # only the value at the anchor matters for IRF purposes.
+        if 'state_cov' not in kwargs and self.ssm._state_cov.shape[2] > 1:
+            tmp = np.zeros((self.ssm.k_posdef, self.ssm.k_posdef, nextend))
+            tmp[:] = self['state_cov', :, :, iloc:iloc + 1]
+            kwargs['state_cov'] = tmp
+        if 'selection' not in kwargs and self.ssm._selection.shape[2] > 1:
+            tmp = np.zeros((self.k_states, self.ssm.k_posdef, nextend))
+            tmp[:] = self['selection', :, :, iloc:iloc + 1]
+            kwargs['selection'] = tmp
+
+        # Construct a model that represents the simulation period
+        sim_model = self.ssm.extend(np.empty((nextend, self.k_endog)),
+                                    start=iloc, end=end, **kwargs)
+
+        # Compute the impulse responses
+
+        # Convert endog name to index
+        use_pandas = isinstance(self.data, PandasData)
+        if type(impulse) is str:
+            if not use_pandas:
+                raise ValueError('Endog must be pd.DataFrame.')
+            impulse = self.endog_names.index(impulse)
+
+        irfs = sim_model.impulse_responses(
+            steps, impulse, orthogonalized, cumulative)
+
+        # IRF is (nobs x k_endog); do not want to squeeze in case of steps = 1
+        if irfs.shape[1] == 1:
+            irfs = irfs[:, 0]
+
+        # Wrap data / squeeze where appropriate
+        if use_pandas:
+            if self.k_endog == 1:
+                irfs = pd.Series(irfs, name=self.endog_names)
+            else:
+                irfs = pd.DataFrame(irfs, columns=self.endog_names)
+        return irfs

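A corresponding sketch for `impulse_responses`: for a time-invariant model the extra step added above means the output covers the initial impulse plus `steps` subsequent periods (reusing `mod` and `res`):

    irf = mod.impulse_responses(res.params, steps=10)                       # horizons 0 through 10
    irf_cum = mod.impulse_responses(res.params, steps=10, cumulative=True)  # cumulative responses
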
     @classmethod
     def from_formula(cls, formula, data, subset=None):
         """
         Not implemented for state space models
         """
-        pass
+        raise NotImplementedError


 class MLEResults(tsbase.TimeSeriesModelResults):
-    """
+    r"""
     Class to hold results from fitting a state space model.

     Parameters
@@ -1194,78 +2292,113 @@ class MLEResults(tsbase.TimeSeriesModelResults):
     statsmodels.tsa.statespace.kalman_filter.FilterResults
     statsmodels.tsa.statespace.representation.FrozenRepresentation
     """
-
     def __init__(self, model, params, results, cov_type=None, cov_kwds=None,
-        **kwargs):
+                 **kwargs):
         self.data = model.data
         scale = results.scale
+
         tsbase.TimeSeriesModelResults.__init__(self, model, params,
-            normalized_cov_params=None, scale=scale)
+                                               normalized_cov_params=None,
+                                               scale=scale)
+
+        # Save the fixed parameters
         self._has_fixed_params = self.model._has_fixed_params
         self._fixed_params_index = self.model._fixed_params_index
         self._free_params_index = self.model._free_params_index
+        # TODO: seems like maybe self.fixed_params should be the dictionary
+        # itself, not just the keys?
         if self._has_fixed_params:
             self._fixed_params = self.model._fixed_params.copy()
             self.fixed_params = list(self._fixed_params.keys())
         else:
             self._fixed_params = None
             self.fixed_params = []
-        self.param_names = [('%s (fixed)' % name if name in self.
-            fixed_params else name) for name in self.data.param_names or []]
+        self.param_names = [
+            '%s (fixed)' % name if name in self.fixed_params else name
+            for name in (self.data.param_names or [])]
+
+        # Save the state space representation output
         self.filter_results = results
         if isinstance(results, SmootherResults):
             self.smoother_results = results
         else:
             self.smoother_results = None
+
+        # Dimensions
         self.nobs = self.filter_results.nobs
         self.nobs_diffuse = self.filter_results.nobs_diffuse
         if self.nobs_diffuse > 0 and self.loglikelihood_burn > 0:
-            warnings.warn(
-                'Care should be used when applying a loglikelihood burn to a model with exact diffuse initialization. Some results objects, e.g. degrees of freedom, expect only one of the two to be set.'
-                )
+            warnings.warn('Care should be used when applying a loglikelihood'
+                          ' burn to a model with exact diffuse initialization.'
+                          ' Some results objects, e.g. degrees of freedom,'
+                          ' expect only one of the two to be set.')
+        # This only excludes explicitly burned (usually approximate diffuse)
+        # periods but does not exclude exact diffuse periods. This is
+        # because the loglikelihood remains valid for the initial periods in
+        # the exact diffuse case (see DK, 2012, section 7.2) and so also do
+        # e.g. information criteria (see DK, 2012, section 7.4) and the score
+        # vector (see DK, 2012, section 7.3.3, equation 7.15).
+        # However, other objects should be excluded in the diffuse periods
+        # (e.g. the diffuse forecast errors, so in some cases a different
+        # nobs_effective will have to be computed and used)
         self.nobs_effective = self.nobs - self.loglikelihood_burn
+
         P = self.filter_results.initial_diffuse_state_cov
         self.k_diffuse_states = 0 if P is None else np.sum(np.diagonal(P) == 1)
+
+        # Degrees of freedom (see DK 2012, section 7.4)
         k_free_params = self.params.size - len(self.fixed_params)
-        self.df_model = (k_free_params + self.k_diffuse_states + self.
-            filter_results.filter_concentrated)
+        self.df_model = (k_free_params + self.k_diffuse_states
+                         + self.filter_results.filter_concentrated)
         self.df_resid = self.nobs_effective - self.df_model
+
+        # Setup covariance matrix notes dictionary
         if not hasattr(self, 'cov_kwds'):
             self.cov_kwds = {}
         if cov_type is None:
             cov_type = 'approx' if results.memory_no_likelihood else 'opg'
         self.cov_type = cov_type
+
+        # Setup the cache
         self._cache = {}
+
+        # Handle covariance matrix calculation
         if cov_kwds is None:
             cov_kwds = {}
-        self._cov_approx_complex_step = cov_kwds.pop('approx_complex_step',
-            True)
+        self._cov_approx_complex_step = (
+            cov_kwds.pop('approx_complex_step', True))
         self._cov_approx_centered = cov_kwds.pop('approx_centered', False)
         try:
             self._rank = None
             self._get_robustcov_results(cov_type=cov_type, use_self=True,
-                **cov_kwds)
+                                        **cov_kwds)
         except np.linalg.LinAlgError:
             self._rank = 0
             k_params = len(self.params)
             self.cov_params_default = np.zeros((k_params, k_params)) * np.nan
             self.cov_kwds['cov_type'] = (
-                'Covariance matrix could not be calculated: singular. information matrix.'
-                )
+                'Covariance matrix could not be calculated: singular'
+                ' information matrix.')
         self.model.update(self.params, transformed=True, includes_fixed=True)
-        extra_arrays = ['filtered_state', 'filtered_state_cov',
-            'predicted_state', 'predicted_state_cov', 'forecasts',
-            'forecasts_error', 'forecasts_error_cov',
-            'standardized_forecasts_error', 'forecasts_error_diffuse_cov',
-            'predicted_diffuse_state_cov', 'scaled_smoothed_estimator',
+
+        # References of filter and smoother output
+        extra_arrays = [
+            'filtered_state', 'filtered_state_cov', 'predicted_state',
+            'predicted_state_cov', 'forecasts', 'forecasts_error',
+            'forecasts_error_cov', 'standardized_forecasts_error',
+            'forecasts_error_diffuse_cov', 'predicted_diffuse_state_cov',
+            'scaled_smoothed_estimator',
             'scaled_smoothed_estimator_cov', 'smoothing_error',
-            'smoothed_state', 'smoothed_state_cov',
-            'smoothed_state_autocov', 'smoothed_measurement_disturbance',
+            'smoothed_state',
+            'smoothed_state_cov', 'smoothed_state_autocov',
+            'smoothed_measurement_disturbance',
             'smoothed_state_disturbance',
             'smoothed_measurement_disturbance_cov',
             'smoothed_state_disturbance_cov']
         for name in extra_arrays:
             setattr(self, name, getattr(self.filter_results, name, None))
+
+        # Remove too-short results when memory conservation was used
         if self.filter_results.memory_no_forecast_mean:
             self.forecasts = None
             self.forecasts_error = None
@@ -1285,70 +2418,87 @@ class MLEResults(tsbase.TimeSeriesModelResults):
             pass
         if self.filter_results.memory_no_std_forecast:
             self.standardized_forecasts_error = None
+
+        # Save more convenient access to states
+        # (will create a private attribute _states here and provide actual
+        # access via a getter, so that we can e.g. issue a warning in the case
+        # that a useless Pandas index was given in the model specification)
         self._states = SimpleNamespace()
+
         use_pandas = isinstance(self.data, PandasData)
         index = self.model._index
         columns = self.model.state_names
-        if (self.predicted_state is None or self.filter_results.
-            memory_no_predicted_mean):
+
+        # Predicted states
+        # Note: a complication here is that we also include the initial values
+        # here, so that we need an extended index in the Pandas case
+        if (self.predicted_state is None or
+                self.filter_results.memory_no_predicted_mean):
             self._states.predicted = None
         elif use_pandas:
             extended_index = self.model._get_index_with_final_state()
-            self._states.predicted = pd.DataFrame(self.predicted_state.T,
-                index=extended_index, columns=columns)
+            self._states.predicted = pd.DataFrame(
+                self.predicted_state.T, index=extended_index, columns=columns)
         else:
             self._states.predicted = self.predicted_state.T
-        if (self.predicted_state_cov is None or self.filter_results.
-            memory_no_predicted_cov):
+        if (self.predicted_state_cov is None or
+                self.filter_results.memory_no_predicted_cov):
             self._states.predicted_cov = None
         elif use_pandas:
             extended_index = self.model._get_index_with_final_state()
             tmp = np.transpose(self.predicted_state_cov, (2, 0, 1))
-            self._states.predicted_cov = pd.DataFrame(np.reshape(tmp, (tmp.
-                shape[0] * tmp.shape[1], tmp.shape[2])), index=pd.
-                MultiIndex.from_product([extended_index, columns]).
-                swaplevel(), columns=columns)
+            self._states.predicted_cov = pd.DataFrame(
+                np.reshape(tmp, (tmp.shape[0] * tmp.shape[1], tmp.shape[2])),
+                index=pd.MultiIndex.from_product(
+                    [extended_index, columns]).swaplevel(),
+                columns=columns)
         else:
-            self._states.predicted_cov = np.transpose(self.
-                predicted_state_cov, (2, 0, 1))
-        if (self.filtered_state is None or self.filter_results.
-            memory_no_filtered_mean):
+            self._states.predicted_cov = np.transpose(
+                self.predicted_state_cov, (2, 0, 1))
+
+        # Filtered states
+        if (self.filtered_state is None or
+                self.filter_results.memory_no_filtered_mean):
             self._states.filtered = None
         elif use_pandas:
-            self._states.filtered = pd.DataFrame(self.filtered_state.T,
-                index=index, columns=columns)
+            self._states.filtered = pd.DataFrame(
+                self.filtered_state.T, index=index, columns=columns)
         else:
             self._states.filtered = self.filtered_state.T
-        if (self.filtered_state_cov is None or self.filter_results.
-            memory_no_filtered_cov):
+        if (self.filtered_state_cov is None or
+                self.filter_results.memory_no_filtered_cov):
             self._states.filtered_cov = None
         elif use_pandas:
             tmp = np.transpose(self.filtered_state_cov, (2, 0, 1))
-            self._states.filtered_cov = pd.DataFrame(np.reshape(tmp, (tmp.
-                shape[0] * tmp.shape[1], tmp.shape[2])), index=pd.
-                MultiIndex.from_product([index, columns]).swaplevel(),
+            self._states.filtered_cov = pd.DataFrame(
+                np.reshape(tmp, (tmp.shape[0] * tmp.shape[1], tmp.shape[2])),
+                index=pd.MultiIndex.from_product([index, columns]).swaplevel(),
                 columns=columns)
         else:
-            self._states.filtered_cov = np.transpose(self.
-                filtered_state_cov, (2, 0, 1))
+            self._states.filtered_cov = np.transpose(
+                self.filtered_state_cov, (2, 0, 1))
+
+        # Smoothed states
         if self.smoothed_state is None:
             self._states.smoothed = None
         elif use_pandas:
-            self._states.smoothed = pd.DataFrame(self.smoothed_state.T,
-                index=index, columns=columns)
+            self._states.smoothed = pd.DataFrame(
+                self.smoothed_state.T, index=index, columns=columns)
         else:
             self._states.smoothed = self.smoothed_state.T
         if self.smoothed_state_cov is None:
             self._states.smoothed_cov = None
         elif use_pandas:
             tmp = np.transpose(self.smoothed_state_cov, (2, 0, 1))
-            self._states.smoothed_cov = pd.DataFrame(np.reshape(tmp, (tmp.
-                shape[0] * tmp.shape[1], tmp.shape[2])), index=pd.
-                MultiIndex.from_product([index, columns]).swaplevel(),
+            self._states.smoothed_cov = pd.DataFrame(
+                np.reshape(tmp, (tmp.shape[0] * tmp.shape[1], tmp.shape[2])),
+                index=pd.MultiIndex.from_product([index, columns]).swaplevel(),
                 columns=columns)
         else:
-            self._states.smoothed_cov = np.transpose(self.
-                smoothed_state_cov, (2, 0, 1))
+            self._states.smoothed_cov = np.transpose(
+                self.smoothed_state_cov, (2, 0, 1))
+
+        # Handle removing data
         self._data_attr_model = getattr(self, '_data_attr_model', [])
         self._data_attr_model.extend(['ssm'])
         self._data_attr.extend(extra_arrays)
@@ -1398,28 +2548,116 @@ class MLEResults(tsbase.TimeSeriesModelResults):
           intermediate calculations use the 'approx' method.
         - 'none' for no covariance matrix calculation.
         """
-        pass
+        from statsmodels.base.covtype import descriptions
+
+        use_self = kwargs.pop('use_self', False)
+        if use_self:
+            res = self
+        else:
+            raise NotImplementedError
+            res = self.__class__(
+                self.model, self.params,
+                normalized_cov_params=self.normalized_cov_params,
+                scale=self.scale)
+
+        # Set the new covariance type
+        res.cov_type = cov_type
+        res.cov_kwds = {}
+
+        # Calculate the new covariance matrix
+        approx_complex_step = self._cov_approx_complex_step
+        if approx_complex_step:
+            approx_type_str = 'complex-step'
+        elif self._cov_approx_centered:
+            approx_type_str = 'centered finite differences'
+        else:
+            approx_type_str = 'finite differences'
+
+        k_params = len(self.params)
+        if k_params == 0:
+            res.cov_params_default = np.zeros((0, 0))
+            res._rank = 0
+            res.cov_kwds['description'] = 'No parameters estimated.'
+        elif cov_type == 'custom':
+            res.cov_type = kwargs['custom_cov_type']
+            res.cov_params_default = kwargs['custom_cov_params']
+            res.cov_kwds['description'] = kwargs['custom_description']
+            if len(self.fixed_params) > 0:
+                mask = np.ix_(self._free_params_index, self._free_params_index)
+            else:
+                mask = np.s_[...]
+            res._rank = np.linalg.matrix_rank(res.cov_params_default[mask])
+        elif cov_type == 'none':
+            res.cov_params_default = np.zeros((k_params, k_params)) * np.nan
+            res._rank = np.nan
+            res.cov_kwds['description'] = descriptions['none']
+        elif self.cov_type == 'approx':
+            res.cov_params_default = res.cov_params_approx
+            res.cov_kwds['description'] = descriptions['approx'].format(
+                                                approx_type=approx_type_str)
+        elif self.cov_type == 'oim':
+            res.cov_params_default = res.cov_params_oim
+            res.cov_kwds['description'] = descriptions['OIM'].format(
+                                                approx_type=approx_type_str)
+        elif self.cov_type == 'opg':
+            res.cov_params_default = res.cov_params_opg
+            res.cov_kwds['description'] = descriptions['OPG'].format(
+                                                approx_type=approx_type_str)
+        elif self.cov_type == 'robust' or self.cov_type == 'robust_oim':
+            res.cov_params_default = res.cov_params_robust_oim
+            res.cov_kwds['description'] = descriptions['robust-OIM'].format(
+                                                approx_type=approx_type_str)
+        elif self.cov_type == 'robust_approx':
+            res.cov_params_default = res.cov_params_robust_approx
+            res.cov_kwds['description'] = descriptions['robust-approx'].format(
+                                                approx_type=approx_type_str)
+        else:
+            raise NotImplementedError('Invalid covariance matrix type.')
+
+        return res
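
# Minimal usage sketch (editor's illustration; the data and model below are
# hypothetical, not taken from this patch): the covariance estimator selected
# at fit time is routed through the `_cov_params` dispatcher above.
import numpy as np
import statsmodels.api as sm

y = np.random.default_rng(0).standard_normal(200).cumsum()
res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False, cov_type='oim')
print(res.cov_params())  # OIM-based variance / covariance matrix
print(res.bse)           # corresponding standard errors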

     @cache_readonly
     def aic(self):
         """
         (float) Akaike Information Criterion
         """
-        pass
+        return aic(self.llf, self.nobs_effective, self.df_model)

     @cache_readonly
     def aicc(self):
         """
         (float) Akaike Information Criterion with small sample correction
         """
-        pass
+        return aicc(self.llf, self.nobs_effective, self.df_model)

     @cache_readonly
     def bic(self):
         """
         (float) Bayes Information Criterion
         """
-        pass
+        return bic(self.llf, self.nobs_effective, self.df_model)
+
+    def _cov_params_approx(self, approx_complex_step=True,
+                           approx_centered=False):
+        evaluated_hessian = self.nobs_effective * self.model.hessian(
+            params=self.params, transformed=True, includes_fixed=True,
+            method='approx', approx_complex_step=approx_complex_step,
+            approx_centered=approx_centered)
+        # TODO: Case with "not approx_complex_step" is not hit in
+        # tests as of 2017-05-19
+
+        if len(self.fixed_params) > 0:
+            mask = np.ix_(self._free_params_index, self._free_params_index)
+            (tmp, singular_values) = pinv_extended(evaluated_hessian[mask])
+            neg_cov = np.zeros_like(evaluated_hessian) * np.nan
+            neg_cov[mask] = tmp
+        else:
+            (neg_cov, singular_values) = pinv_extended(evaluated_hessian)
+
+        self.model.update(self.params, transformed=True, includes_fixed=True)
+        if self._rank is None:
+            self._rank = np.linalg.matrix_rank(np.diag(singular_values))
+        return -neg_cov

     @cache_readonly
     def cov_params_approx(self):
@@ -1427,7 +2665,27 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         (array) The variance / covariance matrix. Computed using the numerical
         Hessian approximated by complex step or finite differences methods.
         """
-        pass
+        return self._cov_params_approx(self._cov_approx_complex_step,
+                                       self._cov_approx_centered)
+
+    def _cov_params_oim(self, approx_complex_step=True, approx_centered=False):
+        evaluated_hessian = self.nobs_effective * self.model.hessian(
+            self.params, hessian_method='oim', transformed=True,
+            includes_fixed=True, approx_complex_step=approx_complex_step,
+            approx_centered=approx_centered)
+
+        if len(self.fixed_params) > 0:
+            mask = np.ix_(self._free_params_index, self._free_params_index)
+            (tmp, singular_values) = pinv_extended(evaluated_hessian[mask])
+            neg_cov = np.zeros_like(evaluated_hessian) * np.nan
+            neg_cov[mask] = tmp
+        else:
+            (neg_cov, singular_values) = pinv_extended(evaluated_hessian)
+
+        self.model.update(self.params, transformed=True, includes_fixed=True)
+        if self._rank is None:
+            self._rank = np.linalg.matrix_rank(np.diag(singular_values))
+        return -neg_cov

     @cache_readonly
     def cov_params_oim(self):
@@ -1435,7 +2693,36 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         (array) The variance / covariance matrix. Computed using the method
         from Harvey (1989).
         """
-        pass
+        return self._cov_params_oim(self._cov_approx_complex_step,
+                                    self._cov_approx_centered)
+
+    def _cov_params_opg(self, approx_complex_step=True, approx_centered=False):
+        evaluated_hessian = self.nobs_effective * self.model._hessian_opg(
+            self.params, transformed=True, includes_fixed=True,
+            approx_complex_step=approx_complex_step,
+            approx_centered=approx_centered)
+
+        no_free_params = (self._free_params_index is not None and
+                          len(self._free_params_index) == 0)
+
+        if no_free_params:
+            neg_cov = np.zeros_like(evaluated_hessian) * np.nan
+            singular_values = np.empty(0)
+        elif len(self.fixed_params) > 0:
+            mask = np.ix_(self._free_params_index, self._free_params_index)
+            (tmp, singular_values) = pinv_extended(evaluated_hessian[mask])
+            neg_cov = np.zeros_like(evaluated_hessian) * np.nan
+            neg_cov[mask] = tmp
+        else:
+            (neg_cov, singular_values) = pinv_extended(evaluated_hessian)
+
+        self.model.update(self.params, transformed=True, includes_fixed=True)
+        if self._rank is None:
+            if no_free_params:
+                self._rank = 0
+            else:
+                self._rank = np.linalg.matrix_rank(np.diag(singular_values))
+        return -neg_cov

     @cache_readonly
     def cov_params_opg(self):
@@ -1443,7 +2730,8 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         (array) The variance / covariance matrix. Computed using the outer
         product of gradients method.
         """
-        pass
+        return self._cov_params_opg(self._cov_approx_complex_step,
+                                    self._cov_approx_centered)

     @cache_readonly
     def cov_params_robust(self):
@@ -1451,7 +2739,37 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         (array) The QMLE variance / covariance matrix. Alias for
         `cov_params_robust_oim`
         """
-        pass
+        return self.cov_params_robust_oim
+
+    def _cov_params_robust_oim(self, approx_complex_step=True,
+                               approx_centered=False):
+        cov_opg = self._cov_params_opg(approx_complex_step=approx_complex_step,
+                                       approx_centered=approx_centered)
+
+        evaluated_hessian = self.nobs_effective * self.model.hessian(
+            self.params, hessian_method='oim', transformed=True,
+            includes_fixed=True, approx_complex_step=approx_complex_step,
+            approx_centered=approx_centered)
+
+        if len(self.fixed_params) > 0:
+            mask = np.ix_(self._free_params_index, self._free_params_index)
+            cov_params = np.zeros_like(evaluated_hessian) * np.nan
+
+            cov_opg = cov_opg[mask]
+            evaluated_hessian = evaluated_hessian[mask]
+
+            tmp, singular_values = pinv_extended(
+                np.dot(np.dot(evaluated_hessian, cov_opg), evaluated_hessian))
+
+            cov_params[mask] = tmp
+        else:
+            (cov_params, singular_values) = pinv_extended(
+                np.dot(np.dot(evaluated_hessian, cov_opg), evaluated_hessian))
+
+        self.model.update(self.params, transformed=True, includes_fixed=True)
+        if self._rank is None:
+            self._rank = np.linalg.matrix_rank(np.diag(singular_values))
+        return cov_params

     @cache_readonly
     def cov_params_robust_oim(self):
@@ -1459,7 +2777,39 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         (array) The QMLE variance / covariance matrix. Computed using the
         method from Harvey (1989) as the evaluated hessian.
         """
-        pass
+        return self._cov_params_robust_oim(self._cov_approx_complex_step,
+                                           self._cov_approx_centered)
+
+    def _cov_params_robust_approx(self, approx_complex_step=True,
+                                  approx_centered=False):
+        cov_opg = self._cov_params_opg(approx_complex_step=approx_complex_step,
+                                       approx_centered=approx_centered)
+
+        evaluated_hessian = self.nobs_effective * self.model.hessian(
+            self.params, transformed=True, includes_fixed=True,
+            method='approx', approx_complex_step=approx_complex_step)
+        # TODO: Case with "not approx_complex_step" is not
+        # hit in tests as of 2017-05-19
+
+        if len(self.fixed_params) > 0:
+            mask = np.ix_(self._free_params_index, self._free_params_index)
+            cov_params = np.zeros_like(evaluated_hessian) * np.nan
+
+            cov_opg = cov_opg[mask]
+            evaluated_hessian = evaluated_hessian[mask]
+
+            tmp, singular_values = pinv_extended(
+                np.dot(np.dot(evaluated_hessian, cov_opg), evaluated_hessian))
+
+            cov_params[mask] = tmp
+        else:
+            (cov_params, singular_values) = pinv_extended(
+                np.dot(np.dot(evaluated_hessian, cov_opg), evaluated_hessian))
+
+        self.model.update(self.params, transformed=True, includes_fixed=True)
+        if self._rank is None:
+            self._rank = np.linalg.matrix_rank(np.diag(singular_values))
+        return cov_params

     @cache_readonly
     def cov_params_robust_approx(self):
@@ -1467,10 +2817,11 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         (array) The QMLE variance / covariance matrix. Computed using the
         numerical Hessian as the evaluated hessian.
         """
-        pass
+        return self._cov_params_robust_approx(self._cov_approx_complex_step,
+                                              self._cov_approx_centered)

     def info_criteria(self, criteria, method='standard'):
-        """
+        r"""
         Information criteria

         Parameters
@@ -1488,11 +2839,11 @@ class MLEResults(tsbase.TimeSeriesModelResults):

         .. math::

-            AIC & = -2 \\log L(Y_n | \\hat \\psi) + 2 k \\\\
-            BIC & = -2 \\log L(Y_n | \\hat \\psi) + k \\log n \\\\
-            HQIC & = -2 \\log L(Y_n | \\hat \\psi) + 2 k \\log \\log n \\\\
+            AIC & = -2 \log L(Y_n | \hat \psi) + 2 k \\
+            BIC & = -2 \log L(Y_n | \hat \psi) + k \log n \\
+            HQIC & = -2 \log L(Y_n | \hat \psi) + 2 k \log \log n \\

-        where :math:`\\hat \\psi` are the maximum likelihood estimates of the
+        where :math:`\hat \psi` are the maximum likelihood estimates of the
         parameters, :math:`n` is the number of observations, and `k` is the
         number of estimated parameters.

@@ -1503,9 +2854,9 @@ class MLEResults(tsbase.TimeSeriesModelResults):

         .. math::

-            AIC_L & = \\log | Q | + \\frac{2 k}{n} \\\\
-            BIC_L & = \\log | Q | + \\frac{k \\log n}{n} \\\\
-            HQIC_L & = \\log | Q | + \\frac{2 k \\log \\log n}{n} \\\\
+            AIC_L & = \log | Q | + \frac{2 k}{n} \\
+            BIC_L & = \log | Q | + \frac{k \log n}{n} \\
+            HQIC_L & = \log | Q | + \frac{2 k \log \log n}{n} \\

         where :math:`Q` is the state covariance matrix. Note that the Lütkepohl
         definitions do not apply to all state space models, and should be used
@@ -1516,35 +2867,77 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         .. [*] Lütkepohl, Helmut. 2007. *New Introduction to Multiple Time*
            *Series Analysis.* Berlin: Springer.
         """
-        pass
+        criteria = criteria.lower()
+        method = method.lower()
+
+        if method == 'standard':
+            out = getattr(self, criteria)
+        elif method == 'lutkepohl':
+            if self.filter_results.state_cov.shape[-1] > 1:
+                raise ValueError('Cannot compute Lütkepohl statistics for'
+                                 ' models with time-varying state covariance'
+                                 ' matrix.')
+
+            cov = self.filter_results.state_cov[:, :, 0]
+            if criteria == 'aic':
+                out = np.squeeze(np.linalg.slogdet(cov)[1] +
+                                 2 * self.df_model / self.nobs_effective)
+            elif criteria == 'bic':
+                out = np.squeeze(np.linalg.slogdet(cov)[1] +
+                                 self.df_model * np.log(self.nobs_effective) /
+                                 self.nobs_effective)
+            elif criteria == 'hqic':
+                out = np.squeeze(np.linalg.slogdet(cov)[1] +
+                                 2 * self.df_model *
+                                 np.log(np.log(self.nobs_effective)) /
+                                 self.nobs_effective)
+            else:
+                raise ValueError('Invalid information criteria')
+
+        else:
+            raise ValueError('Invalid information criteria computation method')
+
+        return out
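
# Minimal usage sketch (hypothetical data): both the standard and the
# Lütkepohl information criteria described in the docstring are available
# through the `method` argument of `info_criteria`.
import numpy as np
import statsmodels.api as sm

y = np.random.default_rng(0).standard_normal(200).cumsum()
res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
print(res.info_criteria('aic'))                      # same as res.aic
print(res.info_criteria('bic', method='lutkepohl'))  # Lütkepohl variant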

     @cache_readonly
     def fittedvalues(self):
         """
         (array) The predicted values of the model. An (nobs x k_endog) array.
         """
-        pass
+        # This is a (k_endog x nobs) array; do not want to squeeze in case of
+        # the corner case where nobs = 1 (mostly a concern in the predict or
+        # forecast functions, but here also to maintain consistency)
+        fittedvalues = self.forecasts
+        if fittedvalues is None:
+            pass
+        elif fittedvalues.shape[0] == 1:
+            fittedvalues = fittedvalues[0, :]
+        else:
+            fittedvalues = fittedvalues.T
+        return fittedvalues

     @cache_readonly
     def hqic(self):
         """
         (float) Hannan-Quinn Information Criterion
         """
-        pass
+        # return (-2 * self.llf +
+        #         2 * np.log(np.log(self.nobs_effective)) * self.df_model)
+        return hqic(self.llf, self.nobs_effective, self.df_model)

     @cache_readonly
     def llf_obs(self):
         """
         (float) The value of the log-likelihood function evaluated at `params`.
         """
-        pass
+        return self.filter_results.llf_obs

     @cache_readonly
     def llf(self):
         """
         (float) The value of the log-likelihood function evaluated at `params`.
         """
-        pass
+        return self.filter_results.llf

     @cache_readonly
     def loglikelihood_burn(self):
@@ -1552,21 +2945,21 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         (float) The number of observations during which the likelihood is not
         evaluated.
         """
-        pass
+        return self.filter_results.loglikelihood_burn

     @cache_readonly
     def mae(self):
         """
         (float) Mean absolute error
         """
-        pass
+        return np.mean(np.abs(self.resid))

     @cache_readonly
     def mse(self):
         """
         (float) Mean squared error
         """
-        pass
+        return self.sse / self.nobs

     @cache_readonly
     def pvalues(self):
@@ -1575,28 +2968,51 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         coefficients. Note that the coefficients are assumed to have a Normal
         distribution.
         """
-        pass
+        pvalues = np.zeros_like(self.zvalues) * np.nan
+        mask = np.ones_like(pvalues, dtype=bool)
+        mask[self._free_params_index] = True
+        mask &= ~np.isnan(self.zvalues)
+        pvalues[mask] = norm.sf(np.abs(self.zvalues[mask])) * 2
+        return pvalues

     @cache_readonly
     def resid(self):
         """
         (array) The model residuals. An (nobs x k_endog) array.
         """
-        pass
+        # This is a (k_endog x nobs) array; do not want to squeeze in case of
+        # the corner case where nobs = 1 (mostly a concern in the predict or
+        # forecast functions, but here also to maintain consistency)
+        resid = self.forecasts_error
+        if resid is None:
+            pass
+        elif resid.shape[0] == 1:
+            resid = resid[0, :]
+        else:
+            resid = resid.T
+        return resid
+
+    @property
+    def states(self):
+        if self.model._index_generated and not self.model._index_none:
+            warnings.warn('No supported index is available. The `states`'
+                          ' DataFrame uses a generated integer index',
+                          ValueWarning)
+        return self._states
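
# Minimal usage sketch (hypothetical data): the `states` accessor exposes the
# filtered / smoothed state arrays constructed above; with a supported Pandas
# index the means are DataFrames and the covariances carry a MultiIndex with
# one block of states per date.
import numpy as np
import pandas as pd
import statsmodels.api as sm

index = pd.date_range('2000-01-01', periods=100, freq='MS')
y = pd.Series(np.random.default_rng(0).standard_normal(100).cumsum(),
              index=index)
res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
print(res.states.smoothed.head())      # smoothed state means by date
print(res.states.smoothed_cov.head())  # smoothed state covariances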

     @cache_readonly
     def sse(self):
         """
         (float) Sum of squared errors
         """
-        pass
+        return np.sum(self.resid**2)

     @cache_readonly
     def zvalues(self):
         """
         (array) The z-statistics for the coefficients.
         """
-        pass
+        return self.params / self.bse

     def test_normality(self, method):
         """
@@ -1626,11 +3042,29 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         standardized residuals excluding those corresponding to missing
         observations.
         """
-        pass
+        if method is None:
+            method = 'jarquebera'
+
+        if self.standardized_forecasts_error is None:
+            raise ValueError('Cannot compute test statistic when standardized'
+                             ' forecast errors have not been computed.')
+
+        if method == 'jarquebera':
+            from statsmodels.stats.stattools import jarque_bera
+            d = np.maximum(self.loglikelihood_burn, self.nobs_diffuse)
+            output = []
+            for i in range(self.model.k_endog):
+                resid = self.filter_results.standardized_forecasts_error[i, d:]
+                mask = ~np.isnan(resid)
+                output.append(jarque_bera(resid[mask]))
+        else:
+            raise NotImplementedError('Invalid normality test method.')
+
+        return np.array(output)
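
# Minimal usage sketch (hypothetical data): the Jarque-Bera test is applied to
# the standardized one-step-ahead forecast errors, one row per endogenous
# variable, each row holding (statistic, p-value, skew, kurtosis).
import numpy as np
import statsmodels.api as sm

y = np.random.default_rng(0).standard_normal(200).cumsum()
res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
stat, pvalue, skew, kurtosis = res.test_normality(method='jarquebera')[0]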

     def test_heteroskedasticity(self, method, alternative='two-sided',
-        use_f=True):
-        """
+                                use_f=True):
+        r"""
         Test for heteroskedasticity of standardized residuals

         Tests whether the sum-of-squares in the first third of the sample is
@@ -1675,8 +3109,8 @@ class MLEResults(tsbase.TimeSeriesModelResults):

         .. math::

-            H(h) = \\sum_{t=T-h+1}^T  \\tilde v_t^2
-            \\Bigg / \\sum_{t=d+1}^{d+1+h} \\tilde v_t^2
+            H(h) = \sum_{t=T-h+1}^T  \tilde v_t^2
+            \Bigg / \sum_{t=d+1}^{d+1+h} \tilde v_t^2

         where :math:`d` = max(loglikelihood_burn, nobs_diffuse) (usually
         corresponding to diffuse initialization under either the approximate
@@ -1684,7 +3118,7 @@ class MLEResults(tsbase.TimeSeriesModelResults):

         This statistic can be tested against an :math:`F(h,h)` distribution.
         Alternatively, :math:`h H(h)` is asymptotically distributed according
-        to :math:`\\chi_h^2`; this second test can be applied by passing
+        to :math:`\chi_h^2`; this second test can be applied by passing
         `use_f=False` as an argument.

         See section 5.4 of [1]_ for the above formula and discussion, as well
@@ -1699,7 +3133,44 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         .. [1] Harvey, Andrew C. 1990. *Forecasting, Structural Time Series*
                *Models and the Kalman Filter.* Cambridge University Press.
         """
-        pass
+        if method is None:
+            method = 'breakvar'
+
+        if self.standardized_forecasts_error is None:
+            raise ValueError('Cannot compute test statistic when standardized'
+                             ' forecast errors have not been computed.')
+
+        if method == 'breakvar':
+            from statsmodels.tsa.stattools import (
+                breakvar_heteroskedasticity_test
+                )
+            # Store some values
+            resid = self.filter_results.standardized_forecasts_error
+            d = np.maximum(self.loglikelihood_burn, self.nobs_diffuse)
+            # This differs from self.nobs_effective because here we want to
+            # exclude exact diffuse periods, whereas self.nobs_effective only
+            # excludes explicitly burned (usually approximate diffuse) periods.
+            nobs_effective = self.nobs - d
+            h = int(np.round(nobs_effective / 3))
+
+            test_statistics = []
+            p_values = []
+            for i in range(self.model.k_endog):
+                test_statistic, p_value = breakvar_heteroskedasticity_test(
+                    resid[i, d:],
+                    subset_length=h,
+                    alternative=alternative,
+                    use_f=use_f
+                    )
+                test_statistics.append(test_statistic)
+                p_values.append(p_value)
+
+            output = np.c_[test_statistics, p_values]
+        else:
+            raise NotImplementedError('Invalid heteroskedasticity test'
+                                      ' method.')
+
+        return output
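
# Minimal usage sketch (hypothetical data): the H(h) variance-ratio test from
# the docstring; each row holds (statistic, p-value), and passing use_f=False
# switches from the F(h, h) reference distribution to chi-squared.
import numpy as np
import statsmodels.api as sm

y = np.random.default_rng(0).standard_normal(200).cumsum()
res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
stat_f, pvalue_f = res.test_heteroskedasticity(method='breakvar')[0]
stat_c, pvalue_c = res.test_heteroskedasticity(method='breakvar',
                                               use_f=False)[0]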

     def test_serial_correlation(self, method, df_adjust=False, lags=None):
         """
@@ -1752,12 +3223,54 @@ class MLEResults(tsbase.TimeSeriesModelResults):

         Output is nan for any endogenous variable which has missing values.
         """
-        pass
+        if method is None:
+            method = 'ljungbox'
+
+        if self.standardized_forecasts_error is None:
+            raise ValueError('Cannot compute test statistic when standardized'
+                             ' forecast errors have not been computed.')
+
+        if method == 'ljungbox' or method == 'boxpierce':
+            from statsmodels.stats.diagnostic import acorr_ljungbox
+            d = np.maximum(self.loglikelihood_burn, self.nobs_diffuse)
+            # This differs from self.nobs_effective because here we want to
+            # exclude exact diffuse periods, whereas self.nobs_effective only
+            # excludes explicitly burned (usually approximate diffuse) periods.
+            nobs_effective = self.nobs - d
+            output = []
+
+            # Default lags for acorr_ljungbox is 40, but may not always have
+            # that many observations
+            if lags is None:
+                seasonal_periods = getattr(self.model, "seasonal_periods", 0)
+                if seasonal_periods:
+                    lags = min(2 * seasonal_periods, nobs_effective // 5)
+                else:
+                    lags = min(10, nobs_effective // 5)
+
+            model_df = 0
+            if df_adjust:
+                model_df = max(0, self.df_model - self.k_diffuse_states - 1)
+
+            cols = [2, 3] if method == 'boxpierce' else [0, 1]
+            for i in range(self.model.k_endog):
+                results = acorr_ljungbox(
+                    self.filter_results.standardized_forecasts_error[i][d:],
+                    lags=lags, boxpierce=(method == 'boxpierce'),
+                    model_df=model_df)
+                output.append(np.asarray(results)[:, cols].T)
+
+            output = np.c_[output]
+        else:
+            raise NotImplementedError('Invalid serial correlation test'
+                                      ' method.')
+        return output
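
# Minimal usage sketch (hypothetical data): Ljung-Box statistics on the
# standardized forecast errors; the result stacks (statistics, p-values) by
# lag for each endogenous variable.
import numpy as np
import statsmodels.api as sm

y = np.random.default_rng(0).standard_normal(200).cumsum()
res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
lb = res.test_serial_correlation(method='ljungbox', lags=5)
statistics, p_values = lb[0]  # first (and here only) endogenous variable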

     def get_prediction(self, start=None, end=None, dynamic=False,
-        information_set='predicted', signal_only=False, index=None, exog=
-        None, extend_model=None, extend_kwargs=None, **kwargs):
-        """
+                       information_set='predicted', signal_only=False,
+                       index=None, exog=None, extend_model=None,
+                       extend_kwargs=None, **kwargs):
+        r"""
         In-sample prediction and out-of-sample forecasting

         Parameters
@@ -1795,9 +3308,9 @@ class MLEResults(tsbase.TimeSeriesModelResults):
             Whether to compute predictions of only the "signal" component of
             the observation equation. Default is False. For example, the
             observation equation of a time-invariant model is
-            :math:`y_t = d + Z \\alpha_t + \\varepsilon_t`, and the "signal"
-            component is then :math:`Z \\alpha_t`. If this argument is set to
-            True, then predictions of the "signal" :math:`Z \\alpha_t` will be
+            :math:`y_t = d + Z \alpha_t + \varepsilon_t`, and the "signal"
+            component is then :math:`Z \alpha_t`. If this argument is set to
+            True, then predictions of the "signal" :math:`Z \alpha_t` will be
             returned. Otherwise, the default is for predictions of :math:`y_t`
             to be returned.
         **kwargs
@@ -1819,10 +3332,46 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         get_forecast
             Out-of-sample forecasts and results including confidence intervals.
         """
-        pass
+        if start is None:
+            start = 0
+
+        # Handle start, end, dynamic
+        start, end, out_of_sample, prediction_index = (
+            self.model._get_prediction_index(start, end, index))
+
+        # Handle `dynamic`
+        if isinstance(dynamic, (str, dt.datetime, pd.Timestamp)):
+            dynamic, _, _ = self.model._get_index_loc(dynamic)
+            # Convert to offset relative to start
+            dynamic = dynamic - start
+
+        # If we have out-of-sample forecasting and `exog` or in general any
+        # kind of time-varying state space model, then we need to create an
+        # extended model to get updated state space system matrices
+        if extend_model is None:
+            extend_model = (self.model.exog is not None or
+                            not self.filter_results.time_invariant)
+        if out_of_sample and extend_model:
+            kwargs = self.model._get_extension_time_varying_matrices(
+                self.params, exog, out_of_sample, extend_kwargs,
+                transformed=True, includes_fixed=True, **kwargs)
+
+        # Make sure the model class has the current parameters
+        self.model.update(self.params, transformed=True, includes_fixed=True)
+
+        # Perform the prediction
+        # This is a (k_endog x npredictions) array; do not want to squeeze in
+        # case of npredictions = 1
+        prediction_results = self.filter_results.predict(
+            start, end + out_of_sample + 1, dynamic, **kwargs)
+
+        # Return a new mlemodel.PredictionResults object
+        return PredictionResultsWrapper(PredictionResults(
+            self, prediction_results, information_set=information_set,
+            signal_only=signal_only, row_labels=prediction_index))
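
# Minimal usage sketch (hypothetical data): in-sample, dynamic, and
# out-of-sample prediction through the PredictionResults wrapper returned
# above, including prediction intervals.
import numpy as np
import statsmodels.api as sm

y = np.random.default_rng(0).standard_normal(120).cumsum()
res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
pred = res.get_prediction(start=100, end=129, dynamic=10)
print(pred.predicted_mean)        # point predictions / forecasts
print(pred.conf_int(alpha=0.05))  # prediction intervals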

     def get_forecast(self, steps=1, signal_only=False, **kwargs):
-        """
+        r"""
         Out-of-sample forecasts and prediction intervals

         Parameters
@@ -1836,9 +3385,9 @@ class MLEResults(tsbase.TimeSeriesModelResults):
             Whether to compute forecasts of only the "signal" component of
             the observation equation. Default is False. For example, the
             observation equation of a time-invariant model is
-            :math:`y_t = d + Z \\alpha_t + \\varepsilon_t`, and the "signal"
-            component is then :math:`Z \\alpha_t`. If this argument is set to
-            True, then forecasts of the "signal" :math:`Z \\alpha_t` will be
+            :math:`y_t = d + Z \alpha_t + \varepsilon_t`, and the "signal"
+            component is then :math:`Z \alpha_t`. If this argument is set to
+            True, then forecasts of the "signal" :math:`Z \alpha_t` will be
             returned. Otherwise, the default is for forecasts of :math:`y_t`
             to be returned.
         **kwargs
@@ -1861,11 +3410,16 @@ class MLEResults(tsbase.TimeSeriesModelResults):
             In-sample predictions / out-of-sample forecasts and results
             including confidence intervals.
         """
-        pass
+        if isinstance(steps, int):
+            end = self.nobs + steps - 1
+        else:
+            end = steps
+        return self.get_prediction(start=self.nobs, end=end,
+                                   signal_only=signal_only, **kwargs)
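
# Minimal usage sketch (hypothetical data): `get_forecast` simply anchors
# `get_prediction` at the end of the sample.
import numpy as np
import statsmodels.api as sm

y = np.random.default_rng(0).standard_normal(120).cumsum()
res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
fcast = res.get_forecast(steps=12)
print(fcast.predicted_mean)  # equivalent to res.forecast(steps=12)
print(fcast.conf_int())      # forecast intervals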

-    def predict(self, start=None, end=None, dynamic=False, information_set=
-        'predicted', signal_only=False, **kwargs):
-        """
+    def predict(self, start=None, end=None, dynamic=False,
+                information_set='predicted', signal_only=False, **kwargs):
+        r"""
         In-sample prediction and out-of-sample forecasting

         Parameters
@@ -1903,9 +3457,9 @@ class MLEResults(tsbase.TimeSeriesModelResults):
             Whether to compute predictions of only the "signal" component of
             the observation equation. Default is False. For example, the
             observation equation of a time-invariant model is
-            :math:`y_t = d + Z \\alpha_t + \\varepsilon_t`, and the "signal"
-            component is then :math:`Z \\alpha_t`. If this argument is set to
-            True, then predictions of the "signal" :math:`Z \\alpha_t` will be
+            :math:`y_t = d + Z \alpha_t + \varepsilon_t`, and the "signal"
+            component is then :math:`Z \alpha_t`. If this argument is set to
+            True, then predictions of the "signal" :math:`Z \alpha_t` will be
             returned. Otherwise, the default is for predictions of :math:`y_t`
             to be returned.
         **kwargs
@@ -1929,10 +3483,14 @@ class MLEResults(tsbase.TimeSeriesModelResults):
             In-sample predictions / out-of-sample forecasts and results
             including confidence intervals.
         """
-        pass
+        # Perform the prediction
+        prediction_results = self.get_prediction(
+            start, end, dynamic, information_set=information_set,
+            signal_only=signal_only, **kwargs)
+        return prediction_results.predicted_mean

     def forecast(self, steps=1, signal_only=False, **kwargs):
-        """
+        r"""
         Out-of-sample forecasts

         Parameters
@@ -1946,9 +3504,9 @@ class MLEResults(tsbase.TimeSeriesModelResults):
             Whether to compute forecasts of only the "signal" component of
             the observation equation. Default is False. For example, the
             observation equation of a time-invariant model is
-            :math:`y_t = d + Z \\alpha_t + \\varepsilon_t`, and the "signal"
-            component is then :math:`Z \\alpha_t`. If this argument is set to
-            True, then forecasts of the "signal" :math:`Z \\alpha_t` will be
+            :math:`y_t = d + Z \alpha_t + \varepsilon_t`, and the "signal"
+            component is then :math:`Z \alpha_t`. If this argument is set to
+            True, then forecasts of the "signal" :math:`Z \alpha_t` will be
             returned. Otherwise, the default is for forecasts of :math:`y_t`
             to be returned.
         **kwargs
@@ -1972,14 +3530,22 @@ class MLEResults(tsbase.TimeSeriesModelResults):
             In-sample predictions / out-of-sample forecasts and results
             including confidence intervals.
         """
-        pass
-
-    def simulate(self, nsimulations, measurement_shocks=None, state_shocks=
-        None, initial_state=None, anchor=None, repetitions=None, exog=None,
-        extend_model=None, extend_kwargs=None,
-        pretransformed_measurement_shocks=True, pretransformed_state_shocks
-        =True, pretransformed_initial_state=True, random_state=None, **kwargs):
-        """
+        if isinstance(steps, int):
+            end = self.nobs + steps - 1
+        else:
+            end = steps
+        return self.predict(start=self.nobs, end=end, signal_only=signal_only,
+                            **kwargs)
+
+    def simulate(self, nsimulations, measurement_shocks=None,
+                 state_shocks=None, initial_state=None, anchor=None,
+                 repetitions=None, exog=None, extend_model=None,
+                 extend_kwargs=None,
+                 pretransformed_measurement_shocks=True,
+                 pretransformed_state_shocks=True,
+                 pretransformed_initial_state=True,
+                 random_state=None, **kwargs):
+        r"""
         Simulate a new time series following the state space model

         Parameters
@@ -1991,13 +3557,13 @@ class MLEResults(tsbase.TimeSeriesModelResults):
             number
         measurement_shocks : array_like, optional
             If specified, these are the shocks to the measurement equation,
-            :math:`\\varepsilon_t`. If unspecified, these are automatically
+            :math:`\varepsilon_t`. If unspecified, these are automatically
             generated using a pseudo-random number generator. If specified,
             must be shaped `nsimulations` x `k_endog`, where `k_endog` is the
             same as in the state space model.
         state_shocks : array_like, optional
             If specified, these are the shocks to the state equation,
-            :math:`\\eta_t`. If unspecified, these are automatically
+            :math:`\eta_t`. If unspecified, these are automatically
             generated using a pseudo-random number generator. If specified,
             must be shaped `nsimulations` x `k_posdef` where `k_posdef` is the
             same as in the state space model.
@@ -2064,10 +3630,51 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         impulse_responses
             Impulse response functions
         """
-        pass
+        # Get the starting location
+        if anchor is None or anchor == 'start':
+            iloc = 0
+        elif anchor == 'end':
+            iloc = self.nobs
+        else:
+            iloc, _, _ = self.model._get_index_loc(anchor)
+            if isinstance(iloc, slice):
+                iloc = iloc.start
+
+        if iloc < 0:
+            iloc = self.nobs + iloc
+        if iloc > self.nobs:
+            raise ValueError('Cannot anchor simulation outside of the sample.')
+
+        # Setup the initial state
+        if initial_state is None:
+            initial_state_moments = (
+                self.predicted_state[:, iloc],
+                self.predicted_state_cov[:, :, iloc])
+
+            _repetitions = 1 if repetitions is None else repetitions
+
+            initial_state = np.random.multivariate_normal(
+                *initial_state_moments, size=_repetitions).T
+
+        scale = self.scale if self.filter_results.filter_concentrated else None
+        with self.model.ssm.fixed_scale(scale):
+            sim = self.model.simulate(
+                self.params, nsimulations,
+                measurement_shocks=measurement_shocks,
+                state_shocks=state_shocks, initial_state=initial_state,
+                anchor=anchor, repetitions=repetitions, exog=exog,
+                transformed=True, includes_fixed=True,
+                extend_model=extend_model, extend_kwargs=extend_kwargs,
+                pretransformed_measurement_shocks=(
+                    pretransformed_measurement_shocks),
+                pretransformed_state_shocks=pretransformed_state_shocks,
+                pretransformed_initial_state=pretransformed_initial_state,
+                random_state=random_state, **kwargs)
+
+        return sim
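
# Minimal usage sketch (hypothetical data): simulating continuations of the
# observed sample from the fitted model, anchored at the end of the data.
import numpy as np
import statsmodels.api as sm

y = np.random.default_rng(0).standard_normal(120).cumsum()
res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
sims = res.simulate(nsimulations=24, anchor='end', repetitions=100,
                    random_state=0)
# `sims` stacks 24 simulated out-of-sample periods for each of 100 repetitions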

     def impulse_responses(self, steps=1, impulse=0, orthogonalized=False,
-        cumulative=False, **kwargs):
+                          cumulative=False, **kwargs):
         """
         Impulse response function

@@ -2134,12 +3741,170 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         Intercepts in the measurement and state equation are ignored when
         calculating impulse responses.
         """
-        pass
+        scale = self.scale if self.filter_results.filter_concentrated else None
+        with self.model.ssm.fixed_scale(scale):
+            irfs = self.model.impulse_responses(self.params, steps, impulse,
+                                                orthogonalized, cumulative,
+                                                **kwargs)
+            # These are wrapped automatically, so just return the array
+            if isinstance(irfs, (pd.Series, pd.DataFrame)):
+                irfs = irfs.values
+        return irfs
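
# Minimal usage sketch (hypothetical data): orthogonalized, cumulative impulse
# responses to the first state innovation over ten periods.
import numpy as np
import statsmodels.api as sm

y = np.random.default_rng(0).standard_normal(200).cumsum()
res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)
irfs = res.impulse_responses(steps=10, impulse=0, orthogonalized=True,
                             cumulative=True)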
+
+    def _apply(self, mod, refit=False, fit_kwargs=None):
+        if fit_kwargs is None:
+            fit_kwargs = {}
+
+        if refit:
+            fit_kwargs.setdefault('start_params', self.params)
+            if self._has_fixed_params:
+                fit_kwargs.setdefault('includes_fixed', True)
+                res = mod.fit_constrained(self._fixed_params, **fit_kwargs)
+            else:
+                res = mod.fit(**fit_kwargs)
+        else:
+            if 'cov_type' in fit_kwargs:
+                raise ValueError('Cannot specify covariance type in'
+                                 ' `fit_kwargs` unless refitting'
+                                 ' parameters (not available in extend).')
+            if 'cov_kwds' in fit_kwargs:
+                raise ValueError('Cannot specify covariance keyword arguments'
+                                 ' in `fit_kwargs` unless refitting'
+                                 ' parameters (not available in extend).')
+
+            if self.cov_type == 'none':
+                fit_kwargs['cov_type'] = 'none'
+            else:
+                fit_kwargs['cov_type'] = 'custom'
+                fit_kwargs['cov_kwds'] = {
+                    'custom_cov_type': self.cov_type,
+                    'custom_cov_params': self.cov_params_default,
+                    'custom_description': (
+                        'Parameters and standard errors were estimated using a'
+                        ' different dataset and were then applied to this'
+                        ' dataset. %s'
+                        % self.cov_kwds.get('description', 'Unknown.'))}
+
+            if self.smoother_results is not None:
+                func = mod.smooth
+            else:
+                func = mod.filter
+
+            if self._has_fixed_params:
+                with mod.fix_params(self._fixed_params):
+                    fit_kwargs.setdefault('includes_fixed', True)
+                    res = func(self.params, **fit_kwargs)
+            else:
+                res = func(self.params, **fit_kwargs)
+
+        return res
+
+    def _get_previous_updated(self, comparison, exog=None,
+                              comparison_type=None, **kwargs):
+        # If we were given data, create a new results object
+        comparison_dataset = not isinstance(
+            comparison, (MLEResults, MLEResultsWrapper))
+        if comparison_dataset:
+            # If `exog` is longer than `comparison`, then we extend it to match
+            nobs_endog = len(comparison)
+            nobs_exog = len(exog) if exog is not None else nobs_endog
+
+            if nobs_exog > nobs_endog:
+                _, _, _, ix = self.model._get_prediction_index(
+                    start=0, end=nobs_exog - 1)
+                # TODO: check that the index of `comparison` matches the model
+                comparison = np.asarray(comparison)
+                if comparison.ndim < 2:
+                    comparison = np.atleast_2d(comparison).T
+                if (comparison.ndim != 2 or
+                        comparison.shape[1] != self.model.k_endog):
+                    raise ValueError('Invalid shape for `comparison`. Must'
+                                     f' contain {self.model.k_endog} columns.')
+                extra = np.zeros((nobs_exog - nobs_endog,
+                                  self.model.k_endog)) * np.nan
+                comparison = pd.DataFrame(
+                    np.concatenate([comparison, extra], axis=0), index=ix,
+                    columns=self.model.endog_names)
+
+            # Get the results object
+            comparison = self.apply(comparison, exog=exog,
+                                    copy_initialization=True, **kwargs)
+
+        # Now, figure out the `updated` versus `previous` results objects
+        nmissing = self.filter_results.missing.sum()
+        nmissing_comparison = comparison.filter_results.missing.sum()
+        if (comparison_type == 'updated' or (comparison_type is None and (
+                comparison.nobs > self.nobs or
+                (comparison.nobs == self.nobs and
+                 nmissing > nmissing_comparison)))):
+            updated = comparison
+            previous = self
+        elif (comparison_type == 'previous' or (comparison_type is None and (
+                comparison.nobs < self.nobs or
+                (comparison.nobs == self.nobs and
+                 nmissing < nmissing_comparison)))):
+            updated = self
+            previous = comparison
+        else:
+            raise ValueError('Could not automatically determine the type'
+                             ' of comparison requested to compute the'
+                             ' News, so it must be specified as "updated"'
+                             ' or "previous", using the `comparison_type`'
+                             ' keyword argument')
+
+        # Check that the index of `updated` is a superset of the
+        # index of `previous`
+        # Note: for Pandas < 0.25, `PeriodIndex.difference` raises a
+        # ValueError if the argument is not also a `PeriodIndex`.
+        diff = previous.model._index.difference(updated.model._index)
+        if len(diff) > 0:
+            raise ValueError('The index associated with the updated results is'
+                             ' not a superset of the index associated with the'
+                             ' previous results, and so these datasets do not'
+                             ' appear to be related. Can only compute the'
+                             ' news by comparing this results set to previous'
+                             ' results objects.')
+
+        return previous, updated, comparison_dataset
+
+    def _news_previous_results(self, previous, start, end, periods,
+                               revisions_details_start=False,
+                               state_index=None):
+        # Compute the news
+        out = self.smoother_results.news(
+            previous.smoother_results, start=start, end=end,
+            revisions_details_start=revisions_details_start,
+            state_index=state_index)
+        return out
+
+    def _news_updated_results(self, updated, start, end, periods,
+                              revisions_details_start=False, state_index=None):
+        return updated._news_previous_results(
+            self, start, end, periods,
+            revisions_details_start=revisions_details_start,
+            state_index=state_index)
+
+    def _news_previous_data(self, endog, start, end, periods, exog,
+                            revisions_details_start=False, state_index=None):
+        previous = self.apply(endog, exog=exog, copy_initialization=True)
+        return self._news_previous_results(
+            previous, start, end, periods,
+            revisions_details_start=revisions_details_start,
+            state_index=state_index)
+
+    def _news_updated_data(self, endog, start, end, periods, exog,
+                           revisions_details_start=False, state_index=None):
+        updated = self.apply(endog, exog=exog, copy_initialization=True)
+        return self._news_updated_results(
+            updated, start, end, periods,
+            revisions_details_start=revisions_details_start,
+            state_index=state_index)

     def news(self, comparison, impact_date=None, impacted_variable=None,
-        start=None, end=None, periods=None, exog=None, comparison_type=None,
-        revisions_details_start=False, state_index=None, return_raw=False,
-        tolerance=1e-10, **kwargs):
+             start=None, end=None, periods=None, exog=None,
+             comparison_type=None, revisions_details_start=False,
+             state_index=None, return_raw=False, tolerance=1e-10, **kwargs):
         """
         Compute impacts from updated data (news and revisions)

@@ -2221,11 +3986,101 @@ class MLEResults(tsbase.TimeSeriesModelResults):
                In Handbook of economic forecasting, vol. 2, pp. 195-237.
                Elsevier, 2013.
         """
-        pass
+        # Validate input
+        if self.smoother_results is None:
+            raise ValueError('Cannot compute news without Kalman smoother'
+                             ' results.')
+
+        if state_index is not None:
+            state_index = np.sort(np.array(state_index, dtype=int))
+            if state_index[0] < 0:
+                raise ValueError('Cannot include negative indexes in'
+                                 ' `state_index`.')
+            if state_index[-1] >= self.model.k_states:
+                raise ValueError(f'Given state index {state_index[-1]} is too'
+                                 ' large for the number of states in the model'
+                                 f' ({self.model.k_states}).')
+
+        if not isinstance(revisions_details_start, (int, bool)):
+            revisions_details_start, _, _, _ = (
+                self.model._get_prediction_index(
+                    revisions_details_start, revisions_details_start))
+
+        # Get the previous and updated results objects from `self` and
+        # `comparison`:
+        previous, updated, comparison_dataset = self._get_previous_updated(
+            comparison, exog=exog, comparison_type=comparison_type, **kwargs)
+
+        # Handle start, end, periods
+        start, end, prediction_index = get_impact_dates(
+            previous_model=previous.model, updated_model=updated.model,
+            impact_date=impact_date, start=start, end=end, periods=periods)
+
+        # News results will always use Pandas, so if the model's data was not
+        # from Pandas, we'll create an index, as if the model's data had been
+        # given a default Pandas index.
+        if prediction_index is None:
+            prediction_index = pd.RangeIndex(start=start, stop=end + 1)
+
+        # For time-varying models try to create an appended `updated` model
+        # with NaN values. Do not extend the model if this was already done
+        # above (i.e. the case that `comparison` was a new dataset), because
+        # in that case `exog` and `kwargs` should have
+        # been set with the input `comparison` dataset in mind, and so would be
+        # useless here. Ultimately, we've already extended `updated` as far
+        # as we can. So raise an exception in that case with a useful message.
+        # However, we still want to try to accommodate extending the model here
+        # if it is possible.
+        # Note that we do not need to extend time-invariant models, because
+        # `KalmanSmoother.news` can itself handle any impact dates for
+        # time-invariant models.
+        time_varying = not (previous.filter_results.time_invariant or
+                            updated.filter_results.time_invariant)
+        if time_varying and end >= updated.nobs:
+            # If the given `comparison` was a dataset and either `exog` or
+            # `kwargs` was set, then we assume that we cannot create an updated
+            # time-varying model (because then we can't tell if `kwargs` and
+            # `exog` arguments are meant to apply to the `comparison` dataset
+            # or to this extension)
+            if comparison_dataset and (exog is not None or len(kwargs) > 0):
+                if comparison is updated:
+                    raise ValueError('If providing an updated dataset as the'
+                                     ' `comparison` with a time-varying model,'
+                                     ' then the `end` period cannot be beyond'
+                                     ' the end of that updated dataset.')
+                else:
+                    raise ValueError('If providing a previous dataset as the'
+                                     ' `comparison` with a time-varying model,'
+                                     ' then the `end` period cannot be beyond'
+                                     ' the end of the (updated) results'
+                                     ' object.')
+
+            # Try to extend `updated`
+            updated_orig = updated
+            # TODO: `append` should fix this k_endog=1 issue for us
+            # TODO: is the + 1 necessary?
+            if self.model.k_endog > 1:
+                extra = np.zeros((end - updated.nobs + 1,
+                                  self.model.k_endog)) * np.nan
+            else:
+                extra = np.zeros((end - updated.nobs + 1,)) * np.nan
+            updated = updated_orig.append(extra, exog=exog, **kwargs)
+
+        # Compute the news
+        news_results = updated._news_previous_results(
+            previous, start, end + 1, periods,
+            revisions_details_start=revisions_details_start,
+            state_index=state_index)
+
+        if not return_raw:
+            news_results = NewsResults(
+                news_results, self, updated, previous, impacted_variable,
+                tolerance, row_labels=prediction_index)
+        return news_results
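
# Minimal usage sketch (hypothetical data): comparing a results object with an
# extension of itself to measure the impact ("news") of the newly observed
# data points on in-sample impact dates.
import numpy as np
import statsmodels.api as sm

y = np.random.default_rng(0).standard_normal(100).cumsum()
res_previous = sm.tsa.SARIMAX(y[:90], order=(1, 0, 0)).fit(disp=False)
res_updated = res_previous.append(y[90:])
news = res_updated.news(res_previous, start=90, end=99)
print(news.summary())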

     def get_smoothed_decomposition(self, decomposition_of='smoothed_state',
-        state_index=None):
-        """
+                                   state_index=None):
+        r"""
         Decompose smoothed output into contributions from observations

         Parameters
@@ -2296,14 +4151,70 @@ class MLEResults(tsbase.TimeSeriesModelResults):

         Notes
         -----
-        Denote the smoothed state at time :math:`t` by :math:`\\alpha_t`. Then
-        the smoothed signal is :math:`Z_t \\alpha_t`, where :math:`Z_t` is the
+        Denote the smoothed state at time :math:`t` by :math:`\alpha_t`. Then
+        the smoothed signal is :math:`Z_t \alpha_t`, where :math:`Z_t` is the
         design matrix operative at time :math:`t`.
         """
-        pass
+        (data_contributions, obs_intercept_contributions,
+         state_intercept_contributions, prior_contributions) = (
+            self.smoother_results.get_smoothed_decomposition(
+                decomposition_of=decomposition_of, state_index=state_index))
+
+        # Construct indexes
+        endog_names = self.model.endog_names
+        if self.model.k_endog == 1:
+            endog_names = [endog_names]
+
+        if decomposition_of == 'smoothed_state':
+            contributions_to = pd.MultiIndex.from_product(
+                [self.model.state_names, self.model._index],
+                names=['state_to', 'date_to'])
+        else:
+            contributions_to = pd.MultiIndex.from_product(
+                [endog_names, self.model._index],
+                names=['variable_to', 'date_to'])
+        contributions_from = pd.MultiIndex.from_product(
+            [endog_names, self.model._index],
+            names=['variable_from', 'date_from'])
+        obs_intercept_contributions_from = pd.MultiIndex.from_product(
+            [endog_names, self.model._index],
+            names=['obs_intercept_from', 'date_from'])
+        state_intercept_contributions_from = pd.MultiIndex.from_product(
+            [self.model.state_names, self.model._index],
+            names=['state_intercept_from', 'date_from'])
+        prior_contributions_from = pd.Index(self.model.state_names,
+                                            name='initial_state_from')
+
+        # Construct DataFrames
+        shape = data_contributions.shape
+        data_contributions = pd.DataFrame(
+            data_contributions.reshape(
+                shape[0] * shape[1], shape[2] * shape[3], order='F'),
+            index=contributions_to, columns=contributions_from)
+
+        shape = obs_intercept_contributions.shape
+        obs_intercept_contributions = pd.DataFrame(
+            obs_intercept_contributions.reshape(
+                shape[0] * shape[1], shape[2] * shape[3], order='F'),
+            index=contributions_to, columns=obs_intercept_contributions_from)
+
+        shape = state_intercept_contributions.shape
+        state_intercept_contributions = pd.DataFrame(
+            state_intercept_contributions.reshape(
+                shape[0] * shape[1], shape[2] * shape[3], order='F'),
+            index=contributions_to, columns=state_intercept_contributions_from)
+
+        shape = prior_contributions.shape
+        prior_contributions = pd.DataFrame(
+            prior_contributions.reshape(shape[0] * shape[1], shape[2],
+                                        order='F'),
+            index=contributions_to, columns=prior_contributions_from)
+
+        return (data_contributions, obs_intercept_contributions,
+                state_intercept_contributions, prior_contributions)
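
A sketch of how the returned decomposition can be used (hypothetical data; the
property that the pieces sum back to the smoothed state, up to numerical error,
is the point of the decomposition):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    y = pd.Series(np.random.normal(size=30),
                  index=pd.period_range('2000Q1', periods=30, freq='Q'))
    res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)

    (data_c, obs_intercept_c,
     state_intercept_c, prior_c) = res.get_smoothed_decomposition(
        decomposition_of='smoothed_state')

    # Rows are (state_to, date_to) pairs; summing the contributions from the
    # data, both intercepts, and the prior across all "from" columns should
    # reproduce the smoothed state estimate for each row
    total = (data_c.sum(axis=1) + obs_intercept_c.sum(axis=1)
             + state_intercept_c.sum(axis=1) + prior_c.sum(axis=1))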

     def append(self, endog, exog=None, refit=False, fit_kwargs=None,
-        copy_initialization=False, **kwargs):
+               copy_initialization=False, **kwargs):
         """
         Recreate the results object with new data appended to the original data

@@ -2392,7 +4303,54 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         2003    0.878
         Freq: A-DEC, dtype: float64
         """
-        pass
+        start = self.nobs
+        end = self.nobs + len(endog) - 1
+        _, _, _, append_ix = self.model._get_prediction_index(start, end)
+
+        # Check the index of the new data
+        if isinstance(self.model.data, PandasData):
+            _check_index(append_ix, endog, '`endog`')
+
+        # Concatenate the new data to original data
+        new_endog = concat([self.model.data.orig_endog, endog], axis=0,
+                           allow_mix=True)
+
+        # Handle `exog`
+        if exog is not None:
+            _, exog = prepare_exog(exog)
+            _check_index(append_ix, exog, '`exog`')
+
+            new_exog = concat([self.model.data.orig_exog, exog], axis=0,
+                              allow_mix=True)
+        else:
+            new_exog = None
+
+        # Create a continuous index for the combined data
+        if isinstance(self.model.data, PandasData):
+            start = 0
+            end = len(new_endog) - 1
+            _, _, _, new_index = self.model._get_prediction_index(start, end)
+
+            # Standardize `endog` to have the right index and columns
+            columns = self.model.endog_names
+            if not isinstance(columns, list):
+                columns = [columns]
+            new_endog = pd.DataFrame(new_endog, index=new_index,
+                                     columns=columns)
+
+            # Standardize `exog` to have the right index
+            if new_exog is not None:
+                new_exog = pd.DataFrame(new_exog, index=new_index,
+                                        columns=self.model.exog_names)
+
+        if copy_initialization:
+            init = Initialization.from_results(self.filter_results)
+            kwargs.setdefault('initialization', init)
+
+        mod = self.model.clone(new_endog, exog=new_exog, **kwargs)
+        res = self._apply(mod, refit=refit, fit_kwargs=fit_kwargs)
+
+        return res

     def extend(self, endog, exog=None, fit_kwargs=None, **kwargs):
         """
@@ -2469,10 +4427,29 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         2003    0.878
         Freq: A-DEC, dtype: float64
         """
-        pass
+        start = self.nobs
+        end = self.nobs + len(endog) - 1
+        _, _, _, extend_ix = self.model._get_prediction_index(start, end)
+
+        if isinstance(self.model.data, PandasData):
+            _check_index(extend_ix, endog, '`endog`')
+
+            # Standardize `endog` to have the right index and columns
+            columns = self.model.endog_names
+            if not isinstance(columns, list):
+                columns = [columns]
+            endog = pd.DataFrame(endog, index=extend_ix, columns=columns)
+        # Extend the current fit result to additional data
+        mod = self.model.clone(endog, exog=exog, **kwargs)
+        mod.ssm.initialization = Initialization(
+            mod.k_states, 'known', constant=self.predicted_state[..., -1],
+            stationary_cov=self.predicted_state_cov[..., -1])
+        res = self._apply(mod, refit=False, fit_kwargs=fit_kwargs)
+
+        return res

     def apply(self, endog, exog=None, refit=False, fit_kwargs=None,
-        copy_initialization=False, **kwargs):
+              copy_initialization=False, **kwargs):
         """
         Apply the fitted parameters to new data unrelated to the original data

@@ -2554,11 +4531,19 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         1983    1.1707
         Freq: A-DEC, dtype: float64
         """
-        pass
+        mod = self.model.clone(endog, exog=exog, **kwargs)
+
+        if copy_initialization:
+            init = Initialization.from_results(self.filter_results)
+            mod.ssm.initialization = init
+
+        res = self._apply(mod, refit=refit, fit_kwargs=fit_kwargs)
+
+        return res
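
A sketch contrasting the three re-use methods (`append`, `extend`, `apply`) on
hypothetical data; the key difference is which observations the returned
results object covers:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    index = pd.period_range('2000', periods=12, freq='Y')
    y = pd.Series(np.random.normal(size=12), index=index)
    res = sm.tsa.SARIMAX(y.iloc[:10], order=(1, 0, 0)).fit(disp=False)

    # `append`: original sample plus the new observations (nobs == 12)
    res_appended = res.append(y.iloc[10:])
    # `extend`: only the new observations, with the state initialized from
    # the end of the original sample (nobs == 2)
    res_extended = res.extend(y.iloc[10:])
    # `apply`: the fitted parameters applied to an unrelated dataset
    y_other = pd.Series(np.random.normal(size=12), index=index)
    res_applied = res.apply(y_other)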

     def plot_diagnostics(self, variable=0, lags=10, fig=None, figsize=None,
-        truncate_endog_names=24, auto_ylims=False, bartlett_confint=False,
-        acf_kwargs=None):
+                         truncate_endog_names=24, auto_ylims=False,
+                         bartlett_confint=False, acf_kwargs=None):
         """
         Diagnostic plots for standardized residuals of one endogenous variable

@@ -2628,12 +4613,83 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         [2] Brockwell and Davis, 2010. Introduction to Time Series and
         Forecasting, 2nd edition.
         """
-        pass
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+        _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+        # Eliminate residuals associated with burned or diffuse likelihoods
+        d = np.maximum(self.loglikelihood_burn, self.nobs_diffuse)

-    def summary(self, alpha=0.05, start=None, title=None, model_name=None,
-        display_params=True, display_diagnostics=True, truncate_endog_names
-        =None, display_max_endog=None, extra_top_left=None, extra_top_right
-        =None):
+        # If given a variable name, find the index
+        if isinstance(variable, str):
+            variable = self.model.endog_names.index(variable)
+
+        # Get residuals
+        if hasattr(self.data, 'dates') and self.data.dates is not None:
+            ix = self.data.dates[d:]
+        else:
+            ix = np.arange(self.nobs - d)
+        resid = pd.Series(
+            self.filter_results.standardized_forecasts_error[variable, d:],
+            index=ix)
+
+        if resid.shape[0] < max(d, lags):
+            raise ValueError(
+                "Length of endogenous variable must be larger the the number "
+                "of lags used in the model and the number of observations "
+                "burned in the log-likelihood calculation."
+            )
+
+        # Top-left: residuals vs time
+        ax = fig.add_subplot(221)
+        resid.dropna().plot(ax=ax)
+        ax.hlines(0, ix[0], ix[-1], alpha=0.5)
+        ax.set_xlim(ix[0], ix[-1])
+        name = self.model.endog_names[variable]
+        if len(name) > truncate_endog_names:
+            name = name[:truncate_endog_names - 3] + '...'
+        ax.set_title(f'Standardized residual for "{name}"')
+
+        # Top-right: histogram, Gaussian kernel density, Normal density
+        # Can only do histogram and Gaussian kernel density on the non-null
+        # elements
+        resid_nonmissing = resid.dropna()
+        ax = fig.add_subplot(222)
+
+        ax.hist(resid_nonmissing, density=True, label='Hist',
+                edgecolor='#FFFFFF')
+
+        from scipy.stats import gaussian_kde, norm
+        kde = gaussian_kde(resid_nonmissing)
+        xlim = (-1.96*2, 1.96*2)
+        x = np.linspace(xlim[0], xlim[1])
+        ax.plot(x, kde(x), label='KDE')
+        ax.plot(x, norm.pdf(x), label='N(0,1)')
+        ax.set_xlim(xlim)
+        ax.legend()
+        ax.set_title('Histogram plus estimated density')
+
+        # Bottom-left: QQ plot
+        ax = fig.add_subplot(223)
+        from statsmodels.graphics.gofplots import qqplot
+        qqplot(resid_nonmissing, line='s', ax=ax)
+        ax.set_title('Normal Q-Q')
+
+        # Bottom-right: Correlogram
+        ax = fig.add_subplot(224)
+        from statsmodels.graphics.tsaplots import plot_acf
+
+        if acf_kwargs is None:
+            acf_kwargs = {}
+        plot_acf(resid, ax=ax, lags=lags, auto_ylims=auto_ylims,
+                 bartlett_confint=bartlett_confint, **acf_kwargs)
+        ax.set_title('Correlogram')
+
+        return fig
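
A short usage sketch for the diagnostic figure (hypothetical data; requires
matplotlib to be installed):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    y = pd.Series(np.random.normal(size=120),
                  index=pd.period_range('2000-01', periods=120, freq='M'))
    res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)

    # Four panels: standardized residuals over time, histogram with KDE and
    # N(0, 1) densities, normal Q-Q plot, and residual correlogram
    fig = res.plot_diagnostics(variable=0, lags=24, figsize=(10, 8))
    fig.savefig('diagnostics.png')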
+
+    def summary(self, alpha=.05, start=None, title=None, model_name=None,
+                display_params=True, display_diagnostics=True,
+                truncate_endog_names=None, display_max_endog=None,
+                extra_top_left=None, extra_top_right=None):
         """
         Summarize the Model

@@ -2656,22 +4712,195 @@ class MLEResults(tsbase.TimeSeriesModelResults):
         --------
         statsmodels.iolib.summary.Summary
         """
-        pass
+        from statsmodels.iolib.summary import Summary
+        from statsmodels.iolib.table import SimpleTable
+        from statsmodels.iolib.tableformatting import fmt_params
+
+        # Model specification results
+        model = self.model
+        if title is None:
+            title = 'Statespace Model Results'
+
+        if start is None:
+            start = 0
+        if self.model._index_dates:
+            ix = self.model._index
+            d = ix[start]
+            sample = ['%02d-%02d-%02d' % (d.month, d.day, d.year)]
+            d = ix[-1]
+            sample += ['- ' + '%02d-%02d-%02d' % (d.month, d.day, d.year)]
+        else:
+            sample = [str(start), ' - ' + str(self.nobs)]
+
+        # Standardize the model name as a list of str
+        if model_name is None:
+            model_name = model.__class__.__name__
+
+        # Truncate endog names
+        if truncate_endog_names is None:
+            truncate_endog_names = False if self.model.k_endog == 1 else 24
+        endog_names = self.model.endog_names
+        if not isinstance(endog_names, list):
+            endog_names = [endog_names]
+        endog_names = [str(name) for name in endog_names]
+        if truncate_endog_names is not False:
+            n = truncate_endog_names
+            endog_names = [name if len(name) <= n else name[:n] + '...'
+                           for name in endog_names]
+
+        # Shorten the endog name list if applicable
+        if display_max_endog is None:
+            display_max_endog = np.inf
+        yname = None
+        if self.model.k_endog > display_max_endog:
+            k = self.model.k_endog - 1
+            yname = '"' + endog_names[0] + f'", and {k} more'
+
+        # Create the tables
+        if not isinstance(model_name, list):
+            model_name = [model_name]
+
+        top_left = [('Dep. Variable:', None)]
+        top_left.append(('Model:', [model_name[0]]))
+        for i in range(1, len(model_name)):
+            top_left.append(('', ['+ ' + model_name[i]]))
+        top_left += [
+            ('Date:', None),
+            ('Time:', None),
+            ('Sample:', [sample[0]]),
+            ('', [sample[1]])
+        ]
+
+        top_right = [
+            ('No. Observations:', [self.nobs]),
+            ('Log Likelihood', ["%#5.3f" % self.llf]),
+        ]
+        if hasattr(self, 'rsquared'):
+            top_right.append(('R-squared:', ["%#8.3f" % self.rsquared]))
+        top_right += [
+            ('AIC', ["%#5.3f" % self.aic]),
+            ('BIC', ["%#5.3f" % self.bic]),
+            ('HQIC', ["%#5.3f" % self.hqic])]
+        if (self.filter_results is not None and
+                self.filter_results.filter_concentrated):
+            top_right.append(('Scale', ["%#5.3f" % self.scale]))
+
+        if hasattr(self, 'cov_type'):
+            cov_type = self.cov_type
+            if cov_type == 'none':
+                cov_type = 'Not computed'
+            top_left.append(('Covariance Type:', [cov_type]))
+
+        if extra_top_left is not None:
+            top_left += extra_top_left
+        if extra_top_right is not None:
+            top_right += extra_top_right
+
+        summary = Summary()
+        summary.add_table_2cols(self, gleft=top_left, gright=top_right,
+                                title=title, yname=yname)
+        table_ix = 1
+        if len(self.params) > 0 and display_params:
+            summary.add_table_params(self, alpha=alpha,
+                                     xname=self.param_names, use_t=False)
+            table_ix += 1
+
+        # Diagnostic tests results
+        if display_diagnostics:
+            try:
+                het = self.test_heteroskedasticity(method='breakvar')
+            except Exception:  # FIXME: catch something specific
+                het = np.zeros((self.model.k_endog, 2)) * np.nan
+            try:
+                lb = self.test_serial_correlation(method='ljungbox', lags=[1])
+            except Exception:  # FIXME: catch something specific
+                lb = np.zeros((self.model.k_endog, 2, 1)) * np.nan
+            try:
+                jb = self.test_normality(method='jarquebera')
+            except Exception:  # FIXME: catch something specific
+                jb = np.zeros((self.model.k_endog, 4)) * np.nan
+
+            if self.model.k_endog <= display_max_endog:
+                format_str = lambda array: [  # noqa:E731
+                    ', '.join(['{0:.2f}'.format(i) for i in array])
+                ]
+                diagn_left = [
+                    ('Ljung-Box (L1) (Q):', format_str(lb[:, 0, -1])),
+                    ('Prob(Q):', format_str(lb[:, 1, -1])),
+                    ('Heteroskedasticity (H):', format_str(het[:, 0])),
+                    ('Prob(H) (two-sided):', format_str(het[:, 1]))]
+
+                diagn_right = [('Jarque-Bera (JB):', format_str(jb[:, 0])),
+                               ('Prob(JB):', format_str(jb[:, 1])),
+                               ('Skew:', format_str(jb[:, 2])),
+                               ('Kurtosis:', format_str(jb[:, 3]))
+                               ]
+
+                summary.add_table_2cols(self, gleft=diagn_left,
+                                        gright=diagn_right, title="")
+            else:
+                columns = ['LjungBox\n(L1) (Q)', 'Prob(Q)',
+                           'Het.(H)', 'Prob(H)',
+                           'Jarque\nBera(JB)', 'Prob(JB)', 'Skew', 'Kurtosis']
+                data = pd.DataFrame(
+                    np.c_[lb[:, :2, -1], het[:, :2], jb[:, :4]],
+                    index=endog_names, columns=columns).applymap(
+                        lambda num: '' if pd.isnull(num) else '%.2f' % num)
+                data.index.name = 'Residual of\nDep. variable'
+                data = data.reset_index()
+
+                params_data = data.values
+                params_header = data.columns.tolist()
+                params_stubs = None
+
+                title = 'Residual diagnostics:'
+                table = SimpleTable(
+                    params_data, params_header, params_stubs,
+                    txt_fmt=fmt_params, title=title)
+                summary.tables.insert(table_ix, table)
+
+        # Add warnings/notes, added to text format only
+        etext = []
+        if hasattr(self, 'cov_type') and 'description' in self.cov_kwds:
+            etext.append(self.cov_kwds['description'])
+        if self._rank < (len(self.params) - len(self.fixed_params)):
+            cov_params = self.cov_params()
+            if len(self.fixed_params) > 0:
+                mask = np.ix_(self._free_params_index, self._free_params_index)
+                cov_params = cov_params[mask]
+            etext.append("Covariance matrix is singular or near-singular,"
+                         " with condition number %6.3g. Standard errors may be"
+                         " unstable." % _safe_cond(cov_params))
+
+        if etext:
+            etext = ["[{0}] {1}".format(i + 1, text)
+                     for i, text in enumerate(etext)]
+            etext.insert(0, "Warnings:")
+            summary.add_extra_txt(etext)
+
+        return summary
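
A brief sketch of calling `summary` on a fitted results object (hypothetical
data; `alpha` controls the confidence level of the parameter table):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    y = pd.Series(np.random.normal(size=40),
                  index=pd.period_range('2000Q1', periods=40, freq='Q'))
    res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)

    # 90% confidence intervals in the parameter table; omit the residual
    # diagnostics block at the bottom
    print(res.summary(alpha=0.10, display_diagnostics=False))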


 class MLEResultsWrapper(wrap.ResultsWrapper):
-    _attrs = {'zvalues': 'columns', 'cov_params_approx': 'cov',
-        'cov_params_default': 'cov', 'cov_params_oim': 'cov',
-        'cov_params_opg': 'cov', 'cov_params_robust': 'cov',
-        'cov_params_robust_approx': 'cov', 'cov_params_robust_oim': 'cov'}
-    _wrap_attrs = wrap.union_dicts(tsbase.TimeSeriesResultsWrapper.
-        _wrap_attrs, _attrs)
-    _methods = {'forecast': 'dates', 'impulse_responses': 'ynames'}
-    _wrap_methods = wrap.union_dicts(tsbase.TimeSeriesResultsWrapper.
-        _wrap_methods, _methods)
-
-
-wrap.populate_wrapper(MLEResultsWrapper, MLEResults)
+    _attrs = {
+        'zvalues': 'columns',
+        'cov_params_approx': 'cov',
+        'cov_params_default': 'cov',
+        'cov_params_oim': 'cov',
+        'cov_params_opg': 'cov',
+        'cov_params_robust': 'cov',
+        'cov_params_robust_approx': 'cov',
+        'cov_params_robust_oim': 'cov',
+    }
+    _wrap_attrs = wrap.union_dicts(tsbase.TimeSeriesResultsWrapper._wrap_attrs,
+                                   _attrs)
+    _methods = {
+        'forecast': 'dates',
+        'impulse_responses': 'ynames'
+    }
+    _wrap_methods = wrap.union_dicts(
+        tsbase.TimeSeriesResultsWrapper._wrap_methods, _methods)
+wrap.populate_wrapper(MLEResultsWrapper, MLEResults)  # noqa:E305


 class PredictionResults(pred.PredictionResults):
@@ -2704,20 +4933,22 @@ class PredictionResults(pred.PredictionResults):
     signal_only : bool
         Whether the prediction is for the signal only
     """
-
     def __init__(self, model, prediction_results, row_labels=None,
-        information_set='predicted', signal_only=False):
+                 information_set='predicted', signal_only=False):
         if model.model.k_endog == 1:
-            endog = pd.Series(prediction_results.endog[0], name=model.model
-                .endog_names)
+            endog = pd.Series(prediction_results.endog[0],
+                              name=model.model.endog_names)
         else:
-            endog = pd.DataFrame(prediction_results.endog.T, columns=model.
-                model.endog_names)
-        self.model = Bunch(data=model.data.__class__(endog=endog,
-            predict_dates=row_labels))
+            endog = pd.DataFrame(prediction_results.endog.T,
+                                 columns=model.model.endog_names)
+        self.model = Bunch(data=model.data.__class__(
+            endog=endog, predict_dates=row_labels))
         self.prediction_results = prediction_results
+
         self.information_set = information_set
         self.signal_only = signal_only
+
+        # Get required values
         k_endog, nobs = prediction_results.endog.shape
         res = self.prediction_results.results
         if information_set == 'predicted' and not res.memory_no_forecast_mean:
@@ -2737,10 +4968,12 @@ class PredictionResults(pred.PredictionResults):
                 predicted_mean = self.prediction_results.smoothed_signal
         else:
             predicted_mean = np.zeros((k_endog, nobs)) * np.nan
+
         if predicted_mean.shape[0] == 1:
             predicted_mean = predicted_mean[0, :]
         else:
             predicted_mean = predicted_mean.transpose()
+
         if information_set == 'predicted' and not res.memory_no_forecast_cov:
             if not signal_only:
                 var_pred_mean = self.prediction_results.forecasts_error_cov
@@ -2748,32 +4981,102 @@ class PredictionResults(pred.PredictionResults):
                 var_pred_mean = self.prediction_results.predicted_signal_cov
         elif information_set == 'filtered' and not res.memory_no_filtered_mean:
             if not signal_only:
-                var_pred_mean = (self.prediction_results.
-                    filtered_forecasts_error_cov)
+                var_pred_mean = (
+                    self.prediction_results.filtered_forecasts_error_cov)
             else:
                 var_pred_mean = self.prediction_results.filtered_signal_cov
         elif information_set == 'smoothed':
             if not signal_only:
-                var_pred_mean = (self.prediction_results.
-                    smoothed_forecasts_error_cov)
+                var_pred_mean = (
+                    self.prediction_results.smoothed_forecasts_error_cov)
             else:
                 var_pred_mean = self.prediction_results.smoothed_signal_cov
         else:
             var_pred_mean = np.zeros((k_endog, k_endog, nobs)) * np.nan
+
         if var_pred_mean.shape[0] == 1:
             var_pred_mean = var_pred_mean[0, 0, :]
         else:
             var_pred_mean = var_pred_mean.transpose()
-        super(PredictionResults, self).__init__(predicted_mean,
-            var_pred_mean, dist='norm', row_labels=row_labels)
+
+        # Initialize
+        super(PredictionResults, self).__init__(predicted_mean, var_pred_mean,
+                                                dist='norm',
+                                                row_labels=row_labels)
+
+    @property
+    def se_mean(self):
+        # Replace negative values with np.nan to avoid a RuntimeWarning
+        var_pred_mean = self.var_pred_mean.copy()
+        var_pred_mean[var_pred_mean < 0] = np.nan
+        if var_pred_mean.ndim == 1:
+            se_mean = np.sqrt(var_pred_mean)
+        else:
+            se_mean = np.sqrt(var_pred_mean.T.diagonal())
+        return se_mean
+
+    def conf_int(self, method='endpoint', alpha=0.05, **kwds):
+        # TODO: this performs metadata wrapping, and that should be handled
+        #       by attach_* methods. However, they do not currently support
+        #       this use case.
+        _use_pandas = self._use_pandas
+        self._use_pandas = False
+        conf_int = super(PredictionResults, self).conf_int(alpha, **kwds)
+        self._use_pandas = _use_pandas
+
+        # Create a dataframe
+        if self._row_labels is not None:
+            conf_int = pd.DataFrame(conf_int, index=self.row_labels)
+
+            # Attach the endog names
+            ynames = self.model.data.ynames
+            if type(ynames) is not list:
+                ynames = [ynames]
+            names = (['lower {0}'.format(name) for name in ynames] +
+                     ['upper {0}'.format(name) for name in ynames])
+            conf_int.columns = names
+
+        return conf_int
+
+    def summary_frame(self, endog=0, alpha=0.05):
+        # TODO: finish and cleanup
+        # import pandas as pd
+        # ci_obs = self.conf_int(alpha=alpha, obs=True) # need to split
+        ci_mean = np.asarray(self.conf_int(alpha=alpha))
+        _use_pandas = self._use_pandas
+        self._use_pandas = False
+        to_include = {}
+        if self.predicted_mean.ndim == 1:
+            yname = self.model.data.ynames
+            to_include['mean'] = self.predicted_mean
+            to_include['mean_se'] = self.se_mean
+            k_endog = 1
+        else:
+            yname = self.model.data.ynames[endog]
+            to_include['mean'] = self.predicted_mean[:, endog]
+            to_include['mean_se'] = self.se_mean[:, endog]
+            k_endog = self.predicted_mean.shape[1]
+        self._use_pandas = _use_pandas
+        to_include['mean_ci_lower'] = ci_mean[:, endog]
+        to_include['mean_ci_upper'] = ci_mean[:, k_endog + endog]
+
+        # pandas dict does not handle 2d_array
+        # data = np.column_stack(list(to_include.values()))
+        # names = ....
+        res = pd.DataFrame(to_include, index=self._row_labels,
+                           columns=list(to_include.keys()))
+        res.columns.name = yname
+        return res
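
A sketch showing how these prediction results are typically produced and
summarized (hypothetical data; `get_forecast` returns a wrapped
PredictionResults object):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    y = pd.Series(np.random.normal(size=60),
                  index=pd.period_range('2000-01', periods=60, freq='M'))
    res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)

    pred = res.get_forecast(steps=12)
    frame = pred.summary_frame(alpha=0.05)  # mean, mean_se, mean_ci_lower/upper
    ci = pred.conf_int(alpha=0.05)          # lower/upper bounds per variable
    print(frame.head())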


 class PredictionResultsWrapper(wrap.ResultsWrapper):
-    _attrs = {'predicted_mean': 'dates', 'se_mean': 'dates', 't_values':
-        'dates'}
+    _attrs = {
+        'predicted_mean': 'dates',
+        'se_mean': 'dates',
+        't_values': 'dates',
+    }
     _wrap_attrs = wrap.union_dicts(_attrs)
+
     _methods = {}
     _wrap_methods = wrap.union_dicts(_methods)
-
-
-wrap.populate_wrapper(PredictionResultsWrapper, PredictionResults)
+wrap.populate_wrapper(PredictionResultsWrapper, PredictionResults)  # noqa:E305
diff --git a/statsmodels/tsa/statespace/news.py b/statsmodels/tsa/statespace/news.py
index 7d11de94f..3f52ec348 100644
--- a/statsmodels/tsa/statespace/news.py
+++ b/statsmodels/tsa/statespace/news.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 News for state space models

@@ -5,8 +6,10 @@ Author: Chad Fulton
 License: BSD-3
 """
 from statsmodels.compat.pandas import FUTURE_STACK
+
 import numpy as np
 import pandas as pd
+
 from statsmodels.iolib.summary import Summary
 from statsmodels.iolib.table import SimpleTable
 from statsmodels.iolib.tableformatting import fmt_params
@@ -135,9 +138,12 @@ class NewsResults:
            In Handbook of economic forecasting, vol. 2, pp. 195-237.
            Elsevier, 2013.
     """
-
     def __init__(self, news_results, model, updated, previous,
-        impacted_variable=None, tolerance=1e-10, row_labels=None):
+                 impacted_variable=None, tolerance=1e-10, row_labels=None):
+        # Note: `model` will be the same as one of `updated` or `previous`, but
+        # we need to save it as self.model so that the `predict_dates`, which
+        # were generated by the `_get_prediction_index` call, will be available
+        # for use by the base wrapping code.
         self.model = model
         self.updated = updated
         self.previous = previous
@@ -145,136 +151,204 @@ class NewsResults:
         self._impacted_variable = impacted_variable
         self._tolerance = tolerance
         self.row_labels = row_labels
-        self.params = []
+        self.params = []  # required for `summary` to work
+
         self.endog_names = self.updated.model.endog_names
         self.k_endog = len(self.endog_names)
+
         self.n_revisions = len(self.news_results.revisions_ix)
         self.n_revisions_detailed = len(self.news_results.revisions_details)
         self.n_revisions_grouped = len(self.news_results.revisions_grouped)
+
         index = self.updated.model._index
         columns = np.atleast_1d(self.endog_names)
-        self.post_impacted_forecasts = pd.DataFrame(news_results.
-            post_impacted_forecasts.T, index=self.row_labels, columns=columns
-            ).rename_axis(index='impact date', columns='impacted variable')
-        self.prev_impacted_forecasts = pd.DataFrame(news_results.
-            prev_impacted_forecasts.T, index=self.row_labels, columns=columns
-            ).rename_axis(index='impact date', columns='impacted variable')
-        self.update_impacts = pd.DataFrame(news_results.update_impacts,
-            index=self.row_labels, columns=columns).rename_axis(index=
-            'impact date', columns='impacted variable')
-        self.revision_detailed_impacts = pd.DataFrame(news_results.
-            revision_detailed_impacts, index=self.row_labels, columns=
-            columns, dtype=float).rename_axis(index='impact date', columns=
-            'impacted variable')
-        self.revision_impacts = pd.DataFrame(news_results.revision_impacts,
-            index=self.row_labels, columns=columns, dtype=float).rename_axis(
-            index='impact date', columns='impacted variable')
-        self.revision_grouped_impacts = (self.revision_impacts - self.
-            revision_detailed_impacts.fillna(0))
+
+        # E[y^i | post]
+        self.post_impacted_forecasts = pd.DataFrame(
+            news_results.post_impacted_forecasts.T,
+            index=self.row_labels, columns=columns).rename_axis(
+                index='impact date', columns='impacted variable')
+        # E[y^i | previous]
+        self.prev_impacted_forecasts = pd.DataFrame(
+            news_results.prev_impacted_forecasts.T,
+            index=self.row_labels, columns=columns).rename_axis(
+                index='impact date', columns='impacted variable')
+        # E[y^i | post] - E[y^i | revisions]
+        self.update_impacts = pd.DataFrame(
+            news_results.update_impacts,
+            index=self.row_labels, columns=columns).rename_axis(
+                index='impact date', columns='impacted variable')
+        # E[y^i | revisions] - E[y^i | grouped revisions]
+        self.revision_detailed_impacts = pd.DataFrame(
+            news_results.revision_detailed_impacts,
+            index=self.row_labels,
+            columns=columns,
+            dtype=float,
+        ).rename_axis(index="impact date", columns="impacted variable")
+        # E[y^i | revisions] - E[y^i | previous]
+        self.revision_impacts = pd.DataFrame(
+            news_results.revision_impacts,
+            index=self.row_labels,
+            columns=columns,
+            dtype=float,
+        ).rename_axis(index="impact date", columns="impacted variable")
+        # E[y^i | grouped revisions] - E[y^i | previous]
+        self.revision_grouped_impacts = (
+            self.revision_impacts
+            - self.revision_detailed_impacts.fillna(0))
         if self.n_revisions_grouped == 0:
             self.revision_grouped_impacts.loc[:] = 0
-        self.total_impacts = (self.post_impacted_forecasts - self.
-            prev_impacted_forecasts)
+
+        # E[y^i | post] - E[y^i | previous]
+        self.total_impacts = (self.post_impacted_forecasts -
+                              self.prev_impacted_forecasts)
+
+        # Indices of revisions and updates
         self.revisions_details_start = news_results.revisions_details_start
-        self.revisions_iloc = pd.DataFrame(list(zip(*news_results.
-            revisions_ix)), index=['revision date', 'revised variable']).T
+
+        self.revisions_iloc = pd.DataFrame(
+            list(zip(*news_results.revisions_ix)),
+            index=['revision date', 'revised variable']).T
         iloc = self.revisions_iloc
         if len(iloc) > 0:
-            self.revisions_ix = pd.DataFrame({'revision date': index[iloc[
-                'revision date']], 'revised variable': columns[iloc[
-                'revised variable']]})
+            self.revisions_ix = pd.DataFrame({
+                'revision date': index[iloc['revision date']],
+                'revised variable': columns[iloc['revised variable']]})
         else:
             self.revisions_ix = iloc.copy()
+
         mask = iloc['revision date'] >= self.revisions_details_start
         self.revisions_iloc_detailed = self.revisions_iloc[mask]
         self.revisions_ix_detailed = self.revisions_ix[mask]
-        self.updates_iloc = pd.DataFrame(list(zip(*news_results.updates_ix)
-            ), index=['update date', 'updated variable']).T
+
+        self.updates_iloc = pd.DataFrame(
+            list(zip(*news_results.updates_ix)),
+            index=['update date', 'updated variable']).T
         iloc = self.updates_iloc
         if len(iloc) > 0:
-            self.updates_ix = pd.DataFrame({'update date': index[iloc[
-                'update date']], 'updated variable': columns[iloc[
-                'updated variable']]})
+            self.updates_ix = pd.DataFrame({
+                'update date': index[iloc['update date']],
+                'updated variable': columns[iloc['updated variable']]})
         else:
             self.updates_ix = iloc.copy()
+
+        # Index of the state variables used
         self.state_index = news_results.state_index
-        r_ix_all = pd.MultiIndex.from_arrays([self.revisions_ix[
-            'revision date'], self.revisions_ix['revised variable']])
-        r_ix = pd.MultiIndex.from_arrays([self.revisions_ix_detailed[
-            'revision date'], self.revisions_ix_detailed['revised variable']])
-        u_ix = pd.MultiIndex.from_arrays([self.updates_ix['update date'],
+
+        # Wrap forecasts and forecasts errors
+        r_ix_all = pd.MultiIndex.from_arrays([
+            self.revisions_ix['revision date'],
+            self.revisions_ix['revised variable']])
+        r_ix = pd.MultiIndex.from_arrays([
+            self.revisions_ix_detailed['revision date'],
+            self.revisions_ix_detailed['revised variable']])
+        u_ix = pd.MultiIndex.from_arrays([
+            self.updates_ix['update date'],
             self.updates_ix['updated variable']])
+
+        # E[y^u | post] - E[y^u | revisions]
         if news_results.news is None:
-            self.news = pd.Series([], index=u_ix, name='news', dtype=model.
-                params.dtype)
+            self.news = pd.Series([], index=u_ix, name='news',
+                                  dtype=model.params.dtype)
         else:
             self.news = pd.Series(news_results.news, index=u_ix, name='news')
+        # Revisions to data (y^r_{revised} - y^r_{previous})
         if news_results.revisions_all is None:
-            self.revisions_all = pd.Series([], index=r_ix_all, name=
-                'revision', dtype=model.params.dtype)
+            self.revisions_all = pd.Series([], index=r_ix_all, name='revision',
+                                           dtype=model.params.dtype)
         else:
             self.revisions_all = pd.Series(news_results.revisions_all,
-                index=r_ix_all, name='revision')
+                                           index=r_ix_all, name='revision')
+        # Revisions to data (y^r_{revised} - y^r_{previous}) for which detailed
+        # impacts were computed
         if news_results.revisions is None:
             self.revisions = pd.Series([], index=r_ix, name='revision',
-                dtype=model.params.dtype)
+                                       dtype=model.params.dtype)
         else:
-            self.revisions = pd.Series(news_results.revisions, index=r_ix,
-                name='revision')
+            self.revisions = pd.Series(news_results.revisions,
+                                       index=r_ix, name='revision')
+        # E[y^u | revised]
         if news_results.update_forecasts is None:
-            self.update_forecasts = pd.Series([], index=u_ix, dtype=model.
-                params.dtype)
+            self.update_forecasts = pd.Series([], index=u_ix,
+                                              dtype=model.params.dtype)
         else:
-            self.update_forecasts = pd.Series(news_results.update_forecasts,
-                index=u_ix)
+            self.update_forecasts = pd.Series(
+                news_results.update_forecasts, index=u_ix)
+        # y^r_{revised}
         if news_results.revised_all is None:
-            self.revised_all = pd.Series([], index=r_ix_all, dtype=model.
-                params.dtype, name='revised')
+            self.revised_all = pd.Series([], index=r_ix_all,
+                                         dtype=model.params.dtype,
+                                         name='revised')
         else:
-            self.revised_all = pd.Series(news_results.revised_all, index=
-                r_ix_all, name='revised')
+            self.revised_all = pd.Series(news_results.revised_all,
+                                         index=r_ix_all, name='revised')
+        # y^r_{revised} for which detailed impacts were computed
         if news_results.revised is None:
-            self.revised = pd.Series([], index=r_ix, dtype=model.params.
-                dtype, name='revised')
+            self.revised = pd.Series([], index=r_ix, dtype=model.params.dtype,
+                                     name='revised')
         else:
-            self.revised = pd.Series(news_results.revised, index=r_ix, name
-                ='revised')
+            self.revised = pd.Series(news_results.revised, index=r_ix,
+                                     name='revised')
+        # y^r_{previous}
         if news_results.revised_prev_all is None:
-            self.revised_prev_all = pd.Series([], index=r_ix_all, dtype=
-                model.params.dtype)
+            self.revised_prev_all = pd.Series([], index=r_ix_all,
+                                              dtype=model.params.dtype)
         else:
-            self.revised_prev_all = pd.Series(news_results.revised_prev_all,
-                index=r_ix_all)
+            self.revised_prev_all = pd.Series(
+                news_results.revised_prev_all, index=r_ix_all)
+        # y^r_{previous} for which detailed impacts were computed
         if news_results.revised_prev is None:
-            self.revised_prev = pd.Series([], index=r_ix, dtype=model.
-                params.dtype)
+            self.revised_prev = pd.Series([], index=r_ix,
+                                          dtype=model.params.dtype)
         else:
-            self.revised_prev = pd.Series(news_results.revised_prev, index=r_ix
-                )
+            self.revised_prev = pd.Series(
+                news_results.revised_prev, index=r_ix)
+        # y^u
         if news_results.update_realized is None:
-            self.update_realized = pd.Series([], index=u_ix, dtype=model.
-                params.dtype)
+            self.update_realized = pd.Series([], index=u_ix,
+                                             dtype=model.params.dtype)
         else:
-            self.update_realized = pd.Series(news_results.update_realized,
-                index=u_ix)
+            self.update_realized = pd.Series(
+                news_results.update_realized, index=u_ix)
         cols = pd.MultiIndex.from_product([self.row_labels, columns])
+        # reshaped version of gain matrix E[y A'] E[A A']^{-1}
         if len(self.updates_iloc):
-            weights = news_results.gain.reshape(len(cols), len(u_ix))
+            weights = news_results.gain.reshape(
+                len(cols), len(u_ix))
         else:
             weights = np.zeros((len(cols), len(u_ix)))
         self.weights = pd.DataFrame(weights, index=cols, columns=u_ix).T
         self.weights.columns.names = ['impact date', 'impacted variable']
+
+        # reshaped version of revision_weights
         if self.n_revisions_detailed > 0:
-            revision_weights = news_results.revision_weights.reshape(len(
-                cols), len(r_ix))
+            revision_weights = news_results.revision_weights.reshape(
+                len(cols), len(r_ix))
         else:
             revision_weights = np.zeros((len(cols), len(r_ix)))
-        self.revision_weights = pd.DataFrame(revision_weights, index=cols,
-            columns=r_ix).T
-        self.revision_weights.columns.names = ['impact date',
-            'impacted variable']
-        self.revision_weights_all = self.revision_weights.reindex(self.
-            revised_all.index)
+        self.revision_weights = pd.DataFrame(
+            revision_weights, index=cols, columns=r_ix).T
+        self.revision_weights.columns.names = [
+            'impact date', 'impacted variable']
+
+        self.revision_weights_all = self.revision_weights.reindex(
+            self.revised_all.index)
+
+    @property
+    def impacted_variable(self):
+        return self._impacted_variable
+
+    @impacted_variable.setter
+    def impacted_variable(self, value):
+        self._impacted_variable = value
+
+    @property
+    def tolerance(self):
+        return self._tolerance
+
+    @tolerance.setter
+    def tolerance(self, value):
+        self._tolerance = value

     @property
     def data_revisions(self):
@@ -298,7 +372,14 @@ class NewsResults:
         --------
         data_updates
         """
-        pass
+        # Save revisions data
+        data = pd.concat([
+            self.revised_all.rename('revised'),
+            self.revised_prev_all.rename('observed (prev)')
+        ], axis=1).sort_index()
+        data['detailed impacts computed'] = (
+            self.revised_all.index.isin(self.revised.index))
+        return data

     @property
     def data_updates(self):
@@ -322,7 +403,11 @@ class NewsResults:
         --------
         data_revisions
         """
-        pass
+        data = pd.concat([
+            self.update_realized.rename('observed'),
+            self.update_forecasts.rename('forecast (prev)')
+        ], axis=1).sort_index()
+        return data
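
A sketch of inspecting the data underlying a news decomposition (hypothetical
data; in this example there are no revisions, so `data_revisions` is empty):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    index = pd.period_range('2000Q1', periods=40, freq='Q')
    y = pd.Series(np.random.normal(size=40), index=index)
    res_prev = sm.tsa.SARIMAX(y.iloc[:36], order=(1, 0, 0)).fit(disp=False)
    res_upd = sm.tsa.SARIMAX(y, order=(1, 0, 0)).smooth(res_prev.params)
    news = res_upd.news(res_prev, start=index[36], end=index[39])

    # Newly observed data points and the forecasts the previous results had
    # implied for them
    print(news.data_updates)
    # Revised data points and their previously observed values
    print(news.data_revisions)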

     @property
     def details_by_impact(self):
@@ -383,7 +468,40 @@ class NewsResults:
         revision_details_by_update
         impacts
         """
-        pass
+        s = self.weights.stack(level=[0, 1], **FUTURE_STACK)
+        df = s.rename('weight').to_frame()
+        if len(self.updates_iloc):
+            df['forecast (prev)'] = self.update_forecasts
+            df['observed'] = self.update_realized
+            df['news'] = self.news
+            df['impact'] = df['news'] * df['weight']
+        else:
+            df['forecast (prev)'] = []
+            df['observed'] = []
+            df['news'] = []
+            df['impact'] = []
+        df = df[['observed', 'forecast (prev)', 'news', 'weight', 'impact']]
+        df = df.reorder_levels([2, 3, 0, 1]).sort_index()
+
+        if self.impacted_variable is not None and len(df) > 0:
+            df = df.loc[np.s_[:, self.impacted_variable], :]
+
+        mask = np.abs(df['impact']) > self.tolerance
+        return df[mask]
+
+    @property
+    def _revision_grouped_impacts(self):
+        s = self.revision_grouped_impacts.stack(**FUTURE_STACK)
+        df = s.rename('impact').to_frame()
+        df = df.reindex(['revision date', 'revised variable', 'impact'],
+                        axis=1)
+        if self.revisions_details_start > 0:
+            df['revision date'] = (
+                self.updated.model._index[self.revisions_details_start - 1])
+            df['revised variable'] = 'all prior revisions'
+        df = (df.set_index(['revision date', 'revised variable'], append=True)
+                .reorder_levels([2, 3, 0, 1]))
+        return df

     @property
     def revision_details_by_impact(self):
@@ -448,7 +566,29 @@ class NewsResults:
         details_by_impact
         impacts
         """
-        pass
+        weights = self.revision_weights.stack(level=[0, 1], **FUTURE_STACK)
+        df = pd.concat([
+            self.revised.reindex(weights.index),
+            self.revised_prev.rename('observed (prev)').reindex(weights.index),
+            self.revisions.reindex(weights.index),
+            weights.rename('weight'),
+            (self.revisions.reindex(weights.index) * weights).rename('impact'),
+        ], axis=1)
+
+        if self.n_revisions_grouped > 0:
+            df = pd.concat([df, self._revision_grouped_impacts])
+            # Explicitly set names for compatibility with pandas=1.2.5
+            df.index = df.index.set_names(
+                ['revision date', 'revised variable',
+                 'impact date', 'impacted variable'])
+
+        df = df.reorder_levels([2, 3, 0, 1]).sort_index()
+
+        if self.impacted_variable is not None and len(df) > 0:
+            df = df.loc[np.s_[:, self.impacted_variable], :]
+
+        mask = np.abs(df['impact']) > self.tolerance
+        return df[mask]

     @property
     def details_by_update(self):
@@ -507,7 +647,32 @@ class NewsResults:
         details_by_impact
         impacts
         """
-        pass
+        s = self.weights.stack(level=[0, 1], **FUTURE_STACK)
+        df = s.rename('weight').to_frame()
+        if len(self.updates_iloc):
+            df['forecast (prev)'] = self.update_forecasts
+            df['observed'] = self.update_realized
+            df['news'] = self.news
+            df['impact'] = df['news'] * df['weight']
+        else:
+            df['forecast (prev)'] = []
+            df['observed'] = []
+            df['news'] = []
+            df['impact'] = []
+        df = df[['forecast (prev)', 'observed', 'news',
+                 'weight', 'impact']]
+        df = df.reset_index()
+        keys = ['update date', 'updated variable', 'observed',
+                'forecast (prev)', 'impact date', 'impacted variable']
+        df.index = pd.MultiIndex.from_arrays([df[key] for key in keys])
+        details = df.drop(keys, axis=1).sort_index()
+
+        if self.impacted_variable is not None and len(df) > 0:
+            details = details.loc[
+                np.s_[:, :, :, :, :, self.impacted_variable], :]
+
+        mask = np.abs(details['impact']) > self.tolerance
+        return details[mask]

     @property
     def revision_details_by_update(self):
@@ -570,7 +735,36 @@ class NewsResults:
         details_by_impact
         impacts
         """
-        pass
+        weights = self.revision_weights.stack(level=[0, 1], **FUTURE_STACK)
+
+        df = pd.concat([
+            self.revised_prev.rename('observed (prev)').reindex(weights.index),
+            self.revised.reindex(weights.index),
+            self.revisions.reindex(weights.index),
+            weights.rename('weight'),
+            (self.revisions.reindex(weights.index) * weights).rename('impact'),
+        ], axis=1)
+
+        if self.n_revisions_grouped > 0:
+            df = pd.concat([df, self._revision_grouped_impacts])
+            # Explicitly set names for compatibility with pandas=1.2.5
+            df.index = df.index.set_names(
+                ['revision date', 'revised variable',
+                 'impact date', 'impacted variable'])
+
+        details = (df.set_index(['observed (prev)', 'revised'], append=True)
+                     .reorder_levels([
+                         'revision date', 'revised variable', 'revised',
+                         'observed (prev)', 'impact date',
+                         'impacted variable'])
+                     .sort_index())
+
+        if self.impacted_variable is not None and len(df) > 0:
+            details = details.loc[
+                np.s_[:, :, :, :, :, self.impacted_variable], :]
+
+        mask = np.abs(details['impact']) > self.tolerance
+        return details[mask]

     @property
     def impacts(self):
@@ -613,11 +807,35 @@ class NewsResults:
         details_by_impact
         details_by_update
         """
-        pass
+        # Summary of impacts
+        impacts = pd.concat([
+            self.prev_impacted_forecasts.unstack().rename('estimate (prev)'),
+            self.revision_impacts.unstack().rename('impact of revisions'),
+            self.update_impacts.unstack().rename('impact of news'),
+            self.post_impacted_forecasts.unstack().rename('estimate (new)')],
+            axis=1)
+        impacts['impact of revisions'] = (
+            impacts['impact of revisions'].astype(float).fillna(0))
+        impacts['impact of news'] = (
+            impacts['impact of news'].astype(float).fillna(0))
+        impacts['total impact'] = (impacts['impact of revisions'] +
+                                   impacts['impact of news'])
+        impacts = impacts.reorder_levels([1, 0]).sort_index()
+        impacts.index.names = ['impact date', 'impacted variable']
+        impacts = impacts[['estimate (prev)', 'impact of revisions',
+                           'impact of news', 'total impact', 'estimate (new)']]
+
+        if self.impacted_variable is not None:
+            impacts = impacts.loc[np.s_[:, self.impacted_variable], :]
+
+        tmp = np.abs(impacts[['impact of revisions', 'impact of news']])
+        mask = (tmp > self.tolerance).any(axis=1)
+
+        return impacts[mask]

     def summary_impacts(self, impact_date=None, impacted_variable=None,
-        groupby='impact date', show_revisions_columns=None, sparsify=True,
-        float_format='%.2f'):
+                        groupby='impact date', show_revisions_columns=None,
+                        sparsify=True, float_format='%.2f'):
         """
         Create summary table with detailed impacts from news; by date, variable

@@ -664,12 +882,84 @@ class NewsResults:
         --------
         impacts
         """
-        pass
+        # Squeeze for univariate models
+        if impacted_variable is None and self.k_endog == 1:
+            impacted_variable = self.endog_names[0]
+
+        # Default is to only show the revisions columns if there were any
+        # revisions (otherwise it would just be a column of zeros)
+        if show_revisions_columns is None:
+            show_revisions_columns = self.n_revisions > 0
+
+        # Select only the variables / dates of interest
+        s = list(np.s_[:, :])
+        if impact_date is not None:
+            s[0] = np.s_[impact_date]
+        if impacted_variable is not None:
+            s[1] = np.s_[impacted_variable]
+        s = tuple(s)
+        impacts = self.impacts.loc[s, :]
+
+        # Make the first index level the groupby level
+        groupby = groupby.lower()
+        if groupby in ['impacted variable', 'impacted_variable']:
+            impacts.index = impacts.index.swaplevel(1, 0)
+        elif groupby not in ['impact date', 'impact_date']:
+            raise ValueError('Invalid groupby for impacts table. Valid options'
+                             ' are "impact date" or "impacted variable".'
+                             f'Got "{groupby}".')
+        impacts = impacts.sort_index()
+
+        # Drop the non-groupby level if there's only one value
+        tmp_index = impacts.index.remove_unused_levels()
+        k_vars = len(tmp_index.levels[1])
+        removed_level = None
+        if sparsify and k_vars == 1:
+            name = tmp_index.names[1]
+            value = tmp_index.levels[1][0]
+            removed_level = f'{name} = {value}'
+            impacts.index = tmp_index.droplevel(1)
+            impacts = impacts.applymap(
+                lambda num: '' if pd.isnull(num) else float_format % num)
+            impacts = impacts.reset_index()
+            impacts.iloc[:, 0] = impacts.iloc[:, 0].map(str)
+        else:
+            impacts = impacts.reset_index()
+            impacts.iloc[:, :2] = impacts.iloc[:, :2].applymap(str)
+            impacts.iloc[:, 2:] = impacts.iloc[:, 2:].applymap(
+                lambda num: '' if pd.isnull(num) else float_format % num)
+
+        # Sparsify the groupby column
+        if sparsify and groupby in impacts:
+            mask = impacts[groupby] == impacts[groupby].shift(1)
+            tmp = impacts.loc[mask, groupby]
+            if len(tmp) > 0:
+                impacts.loc[mask, groupby] = ''
+
+        # Drop revisions and totals columns if applicable
+        if not show_revisions_columns:
+            impacts.drop(['impact of revisions', 'total impact'], axis=1,
+                         inplace=True)
+
+        params_data = impacts.values
+        params_header = impacts.columns.tolist()
+        params_stubs = None
+
+        title = 'Impacts'
+        if removed_level is not None:
+            join = 'on' if groupby == 'date' else 'for'
+            title += f' {join} [{removed_level}]'
+        impacts_table = SimpleTable(
+            params_data, params_header, params_stubs,
+            txt_fmt=fmt_params, title=title)
+
+        return impacts_table
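
A sketch of the impact tables (hypothetical data, using the same kind of
vintage comparison as above):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    index = pd.period_range('2000Q1', periods=40, freq='Q')
    y = pd.Series(np.random.normal(size=40), index=index)
    res_prev = sm.tsa.SARIMAX(y.iloc[:36], order=(1, 0, 0)).fit(disp=False)
    res_upd = sm.tsa.SARIMAX(y, order=(1, 0, 0)).smooth(res_prev.params)
    news = res_upd.news(res_prev, start=index[36], end=index[39])

    # DataFrame decomposing each impacted forecast's change into the impact
    # of revisions and the impact of news
    print(news.impacts)
    # The same information formatted as a SimpleTable, grouped by date
    print(news.summary_impacts(groupby='impact date'))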

     def summary_details(self, source='news', impact_date=None,
-        impacted_variable=None, update_date=None, updated_variable=None,
-        groupby='update date', sparsify=True, float_format='%.2f',
-        multiple_tables=False):
+                        impacted_variable=None, update_date=None,
+                        updated_variable=None, groupby='update date',
+                        sparsify=True, float_format='%.2f',
+                        multiple_tables=False):
         """
         Create summary table with detailed impacts; by date, variable

@@ -763,7 +1053,178 @@ class NewsResults:
         details_by_impact
         details_by_update
         """
-        pass
+        # Squeeze for univariate models
+        if self.k_endog == 1:
+            if impacted_variable is None:
+                impacted_variable = self.endog_names[0]
+            if updated_variable is None:
+                updated_variable = self.endog_names[0]
+
+        # Select only the variables / dates of interest
+        s = list(np.s_[:, :, :, :, :, :])
+        if impact_date is not None:
+            s[0] = np.s_[impact_date]
+        if impacted_variable is not None:
+            s[1] = np.s_[impacted_variable]
+        if update_date is not None:
+            s[2] = np.s_[update_date]
+        if updated_variable is not None:
+            s[3] = np.s_[updated_variable]
+        s = tuple(s)
+
+        if source == 'news':
+            details = self.details_by_impact.loc[s, :]
+            columns = {
+                'current': 'observed',
+                'prev': 'forecast (prev)',
+                'update date': 'update date',
+                'updated variable': 'updated variable',
+                'news': 'news',
+            }
+        elif source == 'revisions':
+            details = self.revision_details_by_impact.loc[s, :]
+            columns = {
+                'current': 'revised',
+                'prev': 'observed (prev)',
+                'update date': 'revision date',
+                'updated variable': 'revised variable',
+                'news': 'revision',
+            }
+        else:
+            raise ValueError(f'Invalid `source`: {source}. Must be "news" or'
+                             ' "revisions".')
+
+        # Make the first index level the groupby level
+        groupby = groupby.lower().replace('_', ' ')
+        groupby_overall = 'impact'
+        levels_order = [0, 1, 2, 3]
+        if groupby == 'update date':
+            levels_order = [2, 3, 0, 1]
+            groupby_overall = 'update'
+        elif groupby == 'updated variable':
+            levels_order = [3, 2, 1, 0]
+            groupby_overall = 'update'
+        elif groupby == 'impacted variable':
+            levels_order = [1, 0, 3, 2]
+        elif groupby != 'impact date':
+            raise ValueError('Invalid groupby for details table. Valid options'
+                             ' are "update date", "updated variable",'
+                             ' "impact date",or "impacted variable".'
+                             f' Got "{groupby}".')
+        details.index = (details.index.reorder_levels(levels_order)
+                                      .remove_unused_levels())
+        details = details.sort_index()
+
+        # If our overall group-by is `update`, move forecast (prev) and
+        # observed into the index
+        base_levels = [0, 1, 2, 3]
+        if groupby_overall == 'update':
+            details.set_index([columns['current'], columns['prev']],
+                              append=True, inplace=True)
+            details.index = details.index.reorder_levels([0, 1, 4, 5, 2, 3])
+            base_levels = [0, 1, 4, 5]
+
+        # Drop the non-groupby levels if there's only one value
+        tmp_index = details.index.remove_unused_levels()
+        n_levels = len(tmp_index.levels)
+        k_level_values = [len(tmp_index.levels[i]) for i in range(n_levels)]
+        removed_levels = []
+        if sparsify:
+            for i in sorted(base_levels)[::-1][:-1]:
+                if k_level_values[i] == 1:
+                    name = tmp_index.names[i]
+                    value = tmp_index.levels[i][0]
+                    can_drop = (
+                        (name == columns['update date']
+                            and update_date is not None) or
+                        (name == columns['updated variable']
+                            and updated_variable is not None) or
+                        (name == 'impact date'
+                            and impact_date is not None) or
+                        (name == 'impacted variable'
+                            and (impacted_variable is not None or
+                                 self.impacted_variable is not None)))
+                    if can_drop or not multiple_tables:
+                        removed_levels.insert(0, f'{name} = {value}')
+                        details.index = tmp_index = tmp_index.droplevel(i)
+
+        # Move everything to columns
+        details = details.reset_index()
+
+        # Function for formatting numbers
+        def str_format(num, mark_ones=False, mark_zeroes=False):
+            if pd.isnull(num):
+                out = ''
+            elif mark_ones and np.abs(1 - num) < self.tolerance:
+                out = '1.0'
+            elif mark_zeroes and np.abs(num) < self.tolerance:
+                out = '0'
+            else:
+                out = float_format % num
+            return out
+
+        # Function to create the table
+        def create_table(details, removed_levels):
+            # Convert everything to strings
+            for key in [columns['current'], columns['prev'], columns['news'],
+                        'weight', 'impact']:
+                if key in details:
+                    args = (
+                        # mark_ones
+                        True if key in ['weight'] else False,
+                        # mark_zeroes
+                        True if key in ['weight', 'impact'] else False)
+                    details[key] = details[key].apply(str_format, args=args)
+            for key in [columns['update date'], 'impact date']:
+                if key in details:
+                    details[key] = details[key].apply(str)
+
+            # Sparsify index columns
+            if sparsify:
+                sparsify_cols = [columns['update date'],
+                                 columns['updated variable'], 'impact date',
+                                 'impacted variable']
+                data_cols = [columns['current'], columns['prev']]
+                if groupby_overall == 'update':
+                    # Put data columns first, since we need to do an additional
+                    # check based on the other columns before sparsifying
+                    sparsify_cols = data_cols + sparsify_cols
+
+                for key in sparsify_cols:
+                    if key in details:
+                        mask = details[key] == details[key].shift(1)
+                        if key in data_cols:
+                            if columns['update date'] in details:
+                                tmp = details[columns['update date']]
+                                mask &= tmp == tmp.shift(1)
+                            if columns['updated variable'] in details:
+                                tmp = details[columns['updated variable']]
+                                mask &= tmp == tmp.shift(1)
+                        details.loc[mask, key] = ''
+
+            params_data = details.values
+            params_header = [str(x) for x in details.columns.tolist()]
+            params_stubs = None
+
+            title = f"Details of {source}"
+            if len(removed_levels):
+                title += ' for [' + ', '.join(removed_levels) + ']'
+            return SimpleTable(params_data, params_header, params_stubs,
+                               txt_fmt=fmt_params, title=title)
+
+        if multiple_tables:
+            details_table = []
+            for item in details[columns[groupby]].unique():
+                mask = details[columns[groupby]] == item
+                item_details = details[mask].drop(columns[groupby], axis=1)
+                item_removed_levels = (
+                    [f'{columns[groupby]} = {item}'] + removed_levels)
+                details_table.append(create_table(item_details,
+                                                  item_removed_levels))
+        else:
+            details_table = create_table(details, removed_levels)
+
+        return details_table

     def summary_revisions(self, sparsify=True):
         """
@@ -788,7 +1249,31 @@ class NewsResults:
             - `detailed impacts computed` : whether detailed impacts were
               computed for this revision
         """
-        pass
+        data = pd.merge(
+            self.data_revisions, self.revisions_all, left_index=True,
+            right_index=True).sort_index().reset_index()
+        data = data[['revision date', 'revised variable', 'observed (prev)',
+                     'revision', 'detailed impacts computed']]
+        data[['revision date', 'revised variable']] = (
+            data[['revision date', 'revised variable']].applymap(str))
+        data.iloc[:, 2:-1] = data.iloc[:, 2:-1].applymap(
+            lambda num: '' if pd.isnull(num) else '%.2f' % num)
+
+        # Sparsify the date column
+        if sparsify:
+            mask = data['revision date'] == data['revision date'].shift(1)
+            data.loc[mask, 'revision date'] = ''
+
+        params_data = data.values
+        params_header = data.columns.tolist()
+        params_stubs = None
+
+        title = 'Revisions to dataset:'
+        revisions_table = SimpleTable(
+            params_data, params_header, params_stubs,
+            txt_fmt=fmt_params, title=title)
+
+        return revisions_table

     def summary_news(self, sparsify=True):
         """
@@ -818,13 +1303,37 @@ class NewsResults:
         --------
         data_updates
         """
-        pass
-
-    def summary(self, impact_date=None, impacted_variable=None, update_date
-        =None, updated_variable=None, revision_date=None, revised_variable=
-        None, impacts_groupby='impact date', details_groupby='update date',
-        show_revisions_columns=None, sparsify=True, include_details_tables=
-        None, include_revisions_tables=False, float_format='%.2f'):
+        data = pd.merge(
+            self.data_updates, self.news, left_index=True,
+            right_index=True).sort_index().reset_index()
+        data[['update date', 'updated variable']] = (
+            data[['update date', 'updated variable']].applymap(str))
+        data.iloc[:, 2:] = data.iloc[:, 2:].applymap(
+            lambda num: '' if pd.isnull(num) else '%.2f' % num)
+
+        # Sparsify the date column
+        if sparsify:
+            mask = data['update date'] == data['update date'].shift(1)
+            data.loc[mask, 'update date'] = ''
+
+        params_data = data.values
+        params_header = data.columns.tolist()
+        params_stubs = None
+
+        title = 'News from updated observations:'
+        updates_table = SimpleTable(
+            params_data, params_header, params_stubs,
+            txt_fmt=fmt_params, title=title)
+
+        return updates_table
+
+    def summary(self, impact_date=None, impacted_variable=None,
+                update_date=None, updated_variable=None,
+                revision_date=None, revised_variable=None,
+                impacts_groupby='impact date', details_groupby='update date',
+                show_revisions_columns=None, sparsify=True,
+                include_details_tables=None, include_revisions_tables=False,
+                float_format='%.2f'):
         """
         Create summary tables describing news and impacts

@@ -920,4 +1429,139 @@ class NewsResults:
         summary_revisions
         summary_updates
         """
-        pass
+        # Default for include_details_tables
+        if include_details_tables is None:
+            include_details_tables = (self.k_endog == 1)
+
+        # Model specification results
+        model = self.model.model
+        title = 'News'
+
+        def get_sample(model):
+            if model._index_dates:
+                mask = ~np.isnan(model.endog).all(axis=1)
+                ix = model._index[mask]
+                d = ix[0]
+                sample = ['%s' % d]
+                d = ix[-1]
+                sample += ['- ' + '%s' % d]
+            else:
+                sample = [str(0), ' - ' + str(model.nobs)]
+
+            return sample
+        previous_sample = get_sample(self.previous.model)
+        revised_sample = get_sample(self.updated.model)
+
+        # Standardize the model name as a list of str
+        model_name = model.__class__.__name__
+
+        # Top summary table
+        top_left = [('Model:', [model_name]),
+                    ('Date:', None),
+                    ('Time:', None)]
+        if self.state_index is not None:
+            k_states_used = len(self.state_index)
+            if k_states_used != self.model.model.k_states:
+                top_left.append(('# of included states:', [k_states_used]))
+
+        top_right = [
+            ('Original sample:', [previous_sample[0]]),
+            ('', [previous_sample[1]]),
+            ('Update through:', [revised_sample[1][2:]]),
+            ('# of revisions:', [len(self.revisions_ix)]),
+            ('# of new datapoints:', [len(self.updates_ix)])]
+
+        summary = Summary()
+        self.model.endog_names = self.model.model.endog_names
+        summary.add_table_2cols(self, gleft=top_left, gright=top_right,
+                                title=title)
+        table_ix = 1
+
+        # Impact table
+        summary.tables.insert(table_ix, self.summary_impacts(
+            impact_date=impact_date, impacted_variable=impacted_variable,
+            groupby=impacts_groupby,
+            show_revisions_columns=show_revisions_columns, sparsify=sparsify,
+            float_format=float_format))
+        table_ix += 1
+
+        # News table
+        if len(self.updates_iloc) > 0:
+            summary.tables.insert(
+                table_ix, self.summary_news(sparsify=sparsify))
+            table_ix += 1
+
+        # Detail tables
+        multiple_tables = (self.k_endog > 1)
+        details_tables = self.summary_details(
+            source='news',
+            impact_date=impact_date, impacted_variable=impacted_variable,
+            update_date=update_date, updated_variable=updated_variable,
+            groupby=details_groupby, sparsify=sparsify,
+            float_format=float_format, multiple_tables=multiple_tables)
+        if not multiple_tables:
+            details_tables = [details_tables]
+
+        if include_details_tables:
+            for table in details_tables:
+                summary.tables.insert(table_ix, table)
+                table_ix += 1
+
+        # Revisions
+        if include_revisions_tables and self.n_revisions > 0:
+            summary.tables.insert(
+                table_ix, self.summary_revisions(sparsify=sparsify))
+            table_ix += 1
+
+            # Revision detail tables
+            revision_details_tables = self.summary_details(
+                source='revisions',
+                impact_date=impact_date, impacted_variable=impacted_variable,
+                update_date=revision_date, updated_variable=revised_variable,
+                groupby=details_groupby, sparsify=sparsify,
+                float_format=float_format, multiple_tables=multiple_tables)
+            if not multiple_tables:
+                revision_details_tables = [revision_details_tables]
+
+            if include_details_tables:
+                for table in revision_details_tables:
+                    summary.tables.insert(table_ix, table)
+                    table_ix += 1
+
+        return summary
+
+    def get_details(self, include_revisions=True, include_updates=True):
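+        # Stack the news details and (optionally) the revision details into a
+        # single long-format DataFrame with a common column naming scheme
+        # ('observed', 'previous', 'news', 'weight', 'impact').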
+        details = []
+        if include_updates:
+            details.append(self.details_by_impact.rename(
+                columns={'forecast (prev)': 'previous'}))
+        if include_revisions:
+            tmp = self.revision_details_by_impact.rename_axis(
+                index={'revision date': 'update date',
+                       'revised variable': 'updated variable'})
+            tmp = tmp.rename(columns={'revised': 'observed',
+                                      'observed (prev)': 'previous',
+                                      'revision': 'news'})
+            details.append(tmp)
+        if not (include_updates or include_revisions):
+            details.append(self.details_by_impact.rename(
+                columns={'forecast (prev)': 'previous'}).iloc[:0])
+
+        return pd.concat(details)
+
+    def get_impacts(self, groupby=None, include_revisions=True,
+                    include_updates=True):
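+        # Pivot the stacked details so that rows are indexed by the update
+        # (update date, updated variable) and columns by the impact
+        # (impact date, impacted variable); if `groupby` is given, the
+        # update index is aggregated by group before returning.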
+        details = self.get_details(include_revisions=include_revisions,
+                                   include_updates=include_updates)
+
+        impacts = details['impact'].unstack(['impact date',
+                                             'impacted variable'])
+
+        if groupby is not None:
+            impacts = (impacts.unstack('update date')
+                              .groupby(groupby).sum(min_count=1)
+                              .stack('update date')
+                              .swaplevel()
+                              .sort_index())
+
+        return impacts
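
For orientation, the following is a minimal usage sketch of the NewsResults accessors restored above. It assumes a NewsResults instance named `news_results` is already available (for example, obtained by comparing two fitted state space results); the variable name is illustrative only.

    # `news_results` is assumed to be an existing NewsResults instance.

    # Text summary: header, impacts table, news table and (optionally)
    # detail and revision tables.
    print(news_results.summary(impacts_groupby='impact date',
                               include_revisions_tables=True))

    # Long-format details: one row per (impact, update) combination,
    # combining news and (optionally) revisions.
    details = news_results.get_details(include_revisions=True,
                                       include_updates=True)

    # Impact of each individual update on each
    # (impact date, impacted variable) pair.
    impacts = news_results.get_impacts()
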
diff --git a/statsmodels/tsa/statespace/representation.py b/statsmodels/tsa/statespace/representation.py
index b4895b2d2..e65d9ebd0 100644
--- a/statsmodels/tsa/statespace/representation.py
+++ b/statsmodels/tsa/statespace/representation.py
@@ -4,20 +4,25 @@ State Space Representation
 Author: Chad Fulton
 License: Simplified-BSD
 """
+
 import warnings
 import numpy as np
-from .tools import find_best_blas_type, validate_matrix_shape, validate_vector_shape
+from .tools import (
+    find_best_blas_type, validate_matrix_shape, validate_vector_shape
+)
 from .initialization import Initialization
 from . import tools


 class OptionWrapper:
-
     def __init__(self, mask_attribute, mask_value):
+        # Name of the class-level bitmask attribute
         self.mask_attribute = mask_attribute
+        # Value of this option
         self.mask_value = mask_value

     def __get__(self, obj, objtype):
+        # Return True / False based on whether the bit is set in the bitmask
         return bool(getattr(obj, self.mask_attribute, 0) & self.mask_value)

     def __set__(self, obj, value):
@@ -30,7 +35,6 @@ class OptionWrapper:


 class MatrixWrapper:
-
     def __init__(self, name, attribute):
         self.name = name
         self.attribute = attribute
@@ -38,21 +42,55 @@ class MatrixWrapper:

     def __get__(self, obj, objtype):
         matrix = getattr(obj, self._attribute, None)
+        # # Remove last dimension if the array is not actually time-varying
+        # if matrix is not None and matrix.shape[-1] == 1:
+        #     return np.squeeze(matrix, -1)
         return matrix

     def __set__(self, obj, value):
-        value = np.asarray(value, order='F')
+        value = np.asarray(value, order="F")
         shape = obj.shapes[self.attribute]
+
         if len(shape) == 3:
             value = self._set_matrix(obj, value, shape)
         else:
             value = self._set_vector(obj, value, shape)
+
         setattr(obj, self._attribute, value)
         obj.shapes[self.attribute] = value.shape

+    def _set_matrix(self, obj, value, shape):
+        # Expand 1-dimensional array if possible
+        if (value.ndim == 1 and shape[0] == 1 and
+                value.shape[0] == shape[1]):
+            value = value[None, :]
+
+        # Enforce that the matrix is appropriate size
+        validate_matrix_shape(
+            self.name, value.shape, shape[0], shape[1], obj.nobs
+        )
+
+        # Expand time-invariant matrix
+        if value.ndim == 2:
+            value = np.array(value[:, :, None], order="F")
+
+        return value
+
+    def _set_vector(self, obj, value, shape):
+        # Enforce that the vector has appropriate length
+        validate_vector_shape(
+            self.name, value.shape, shape[0], obj.nobs
+        )
+
+        # Expand the time-invariant vector
+        if value.ndim == 1:
+            value = np.array(value[:, None], order="F")
+
+        return value
+

 class Representation:
-    """
+    r"""
     State space representation of a time series process

     Parameters
@@ -139,38 +177,38 @@ class Representation:

     .. math::

-        y_t & = Z_t \\alpha_t + d_t + \\varepsilon_t \\\\
-        \\alpha_t & = T_t \\alpha_{t-1} + c_t + R_t \\eta_t \\\\
+        y_t & = Z_t \alpha_t + d_t + \varepsilon_t \\
+        \alpha_t & = T_t \alpha_{t-1} + c_t + R_t \eta_t \\

     where :math:`y_t` refers to the observation vector at time :math:`t`,
-    :math:`\\alpha_t` refers to the (unobserved) state vector at time
+    :math:`\alpha_t` refers to the (unobserved) state vector at time
     :math:`t`, and where the irregular components are defined as

     .. math::

-        \\varepsilon_t \\sim N(0, H_t) \\\\
-        \\eta_t \\sim N(0, Q_t) \\\\
+        \varepsilon_t \sim N(0, H_t) \\
+        \eta_t \sim N(0, Q_t) \\

     The remaining variables (:math:`Z_t, d_t, H_t, T_t, c_t, R_t, Q_t`) in the
     equations are matrices describing the process. Their variable names and
     dimensions are as follows

-    Z : `design`          :math:`(k\\_endog \\times k\\_states \\times nobs)`
+    Z : `design`          :math:`(k\_endog \times k\_states \times nobs)`

-    d : `obs_intercept`   :math:`(k\\_endog \\times nobs)`
+    d : `obs_intercept`   :math:`(k\_endog \times nobs)`

-    H : `obs_cov`         :math:`(k\\_endog \\times k\\_endog \\times nobs)`
+    H : `obs_cov`         :math:`(k\_endog \times k\_endog \times nobs)`

-    T : `transition`      :math:`(k\\_states \\times k\\_states \\times nobs)`
+    T : `transition`      :math:`(k\_states \times k\_states \times nobs)`

-    c : `state_intercept` :math:`(k\\_states \\times nobs)`
+    c : `state_intercept` :math:`(k\_states \times nobs)`

-    R : `selection`       :math:`(k\\_states \\times k\\_posdef \\times nobs)`
+    R : `selection`       :math:`(k\_states \times k\_posdef \times nobs)`

-    Q : `state_cov`       :math:`(k\\_posdef \\times k\\_posdef \\times nobs)`
+    Q : `state_cov`       :math:`(k\_posdef \times k\_posdef \times nobs)`

     In the case that one of the matrices is time-invariant (so that, for
-    example, :math:`Z_t = Z_{t+1} ~ \\forall ~ t`), its last dimension may
+    example, :math:`Z_t = Z_{t+1} ~ \forall ~ t`), its last dimension may
     be of size :math:`1` rather than size `nobs`.

     References
@@ -179,173 +217,249 @@ class Representation:
        Time Series Analysis by State Space Methods: Second Edition.
        Oxford University Press.
     """
+
     endog = None
-    """
+    r"""
     (array) The observation vector, alias for `obs`.
     """
     design = MatrixWrapper('design', 'design')
-    """
-    (array) Design matrix: :math:`Z~(k\\_endog \\times k\\_states \\times nobs)`
+    r"""
+    (array) Design matrix: :math:`Z~(k\_endog \times k\_states \times nobs)`
     """
     obs_intercept = MatrixWrapper('observation intercept', 'obs_intercept')
-    """
-    (array) Observation intercept: :math:`d~(k\\_endog \\times nobs)`
+    r"""
+    (array) Observation intercept: :math:`d~(k\_endog \times nobs)`
     """
     obs_cov = MatrixWrapper('observation covariance matrix', 'obs_cov')
-    """
+    r"""
     (array) Observation covariance matrix:
-    :math:`H~(k\\_endog \\times k\\_endog \\times nobs)`
+    :math:`H~(k\_endog \times k\_endog \times nobs)`
     """
     transition = MatrixWrapper('transition', 'transition')
-    """
+    r"""
     (array) Transition matrix:
-    :math:`T~(k\\_states \\times k\\_states \\times nobs)`
+    :math:`T~(k\_states \times k\_states \times nobs)`
     """
     state_intercept = MatrixWrapper('state intercept', 'state_intercept')
-    """
-    (array) State intercept: :math:`c~(k\\_states \\times nobs)`
+    r"""
+    (array) State intercept: :math:`c~(k\_states \times nobs)`
     """
     selection = MatrixWrapper('selection', 'selection')
-    """
+    r"""
     (array) Selection matrix:
-    :math:`R~(k\\_states \\times k\\_posdef \\times nobs)`
+    :math:`R~(k\_states \times k\_posdef \times nobs)`
     """
     state_cov = MatrixWrapper('state covariance matrix', 'state_cov')
-    """
+    r"""
     (array) State covariance matrix:
-    :math:`Q~(k\\_posdef \\times k\\_posdef \\times nobs)`
+    :math:`Q~(k\_posdef \times k\_posdef \times nobs)`
     """

-    def __init__(self, k_endog, k_states, k_posdef=None, initial_variance=
-        1000000.0, nobs=0, dtype=np.float64, design=None, obs_intercept=
-        None, obs_cov=None, transition=None, state_intercept=None,
-        selection=None, state_cov=None, statespace_classes=None, **kwargs):
+    def __init__(self, k_endog, k_states, k_posdef=None,
+                 initial_variance=1e6, nobs=0, dtype=np.float64,
+                 design=None, obs_intercept=None, obs_cov=None,
+                 transition=None, state_intercept=None, selection=None,
+                 state_cov=None, statespace_classes=None, **kwargs):
         self.shapes = {}
+
+        # Check if k_endog is actually the endog array
         endog = None
         if isinstance(k_endog, np.ndarray):
             endog = k_endog
-            if endog.flags['C_CONTIGUOUS'] and (endog.shape[0] > 1 or nobs == 1
-                ):
+            # If so, assume that it is either column-ordered and in wide format
+            # or row-ordered and in long format
+            if (endog.flags['C_CONTIGUOUS'] and
+                    (endog.shape[0] > 1 or nobs == 1)):
                 endog = endog.T
             k_endog = endog.shape[0]
+
+        # Endogenous array, dimensions, dtype
         self.k_endog = k_endog
         if k_endog < 1:
-            raise ValueError(
-                'Number of endogenous variables in statespace model must be a positive number.'
-                )
+            raise ValueError('Number of endogenous variables in statespace'
+                             ' model must be a positive number.')
         self.nobs = nobs
+
+        # Get dimensions from transition equation
         if k_states < 1:
-            raise ValueError(
-                'Number of states in statespace model must be a positive number.'
-                )
+            raise ValueError('Number of states in statespace model must be a'
+                             ' positive number.')
         self.k_states = k_states
         self.k_posdef = k_posdef if k_posdef is not None else k_states
+
+        # Make sure k_posdef <= k_states
+        # TODO: we could technically allow k_posdef > k_states, but the Cython
+        # code needs to be more thoroughly checked to avoid seg faults.
         if self.k_posdef > self.k_states:
-            raise ValueError(
-                'Dimension of state innovation `k_posdef` cannot be larger than the dimension of the state.'
-                )
+            raise ValueError('Dimension of state innovation `k_posdef` cannot'
+                             ' be larger than the dimension of the state.')
+
+        # Bind endog, if it was given
         if endog is not None:
             self.bind(endog)
-        self.shapes = {'obs': (self.k_endog, self.nobs), 'design': (self.
-            k_endog, self.k_states, 1), 'obs_intercept': (self.k_endog, 1),
-            'obs_cov': (self.k_endog, self.k_endog, 1), 'transition': (self
-            .k_states, self.k_states, 1), 'state_intercept': (self.k_states,
-            1), 'selection': (self.k_states, self.k_posdef, 1), 'state_cov':
-            (self.k_posdef, self.k_posdef, 1)}
+
+        # Record the shapes of all of our matrices
+        # Note: these are time-invariant shapes; in practice the last dimension
+        # may also be `self.nobs` for any or all of these.
+        self.shapes = {
+            'obs': (self.k_endog, self.nobs),
+            'design': (self.k_endog, self.k_states, 1),
+            'obs_intercept': (self.k_endog, 1),
+            'obs_cov': (self.k_endog, self.k_endog, 1),
+            'transition': (self.k_states, self.k_states, 1),
+            'state_intercept': (self.k_states, 1),
+            'selection': (self.k_states, self.k_posdef, 1),
+            'state_cov': (self.k_posdef, self.k_posdef, 1),
+        }
+
+        # Representation matrices
+        # These matrices are only used in the Python object as containers,
+        # which will be copied to the appropriate _statespace object if a
+        # filter is called.
         scope = locals()
         for name, shape in self.shapes.items():
             if name == 'obs':
                 continue
-            setattr(self, '_' + name, np.zeros(shape, dtype=dtype, order='F'))
+            # Create the initial storage array for each matrix
+            setattr(self, '_' + name, np.zeros(shape, dtype=dtype, order="F"))
+
+            # If we were given an initial value for the matrix, set it
+            # (notice it is being set via the descriptor)
             if scope[name] is not None:
                 setattr(self, name, scope[name])
+
+        # Options
         self.initial_variance = initial_variance
-        self.prefix_statespace_map = (statespace_classes if 
-            statespace_classes is not None else tools.prefix_statespace_map
-            .copy())
+        self.prefix_statespace_map = (statespace_classes
+                                      if statespace_classes is not None
+                                      else tools.prefix_statespace_map.copy())
+
+        # State-space initialization data
         self.initialization = kwargs.pop('initialization', None)
         basic_inits = ['diffuse', 'approximate_diffuse', 'stationary']
+
         if self.initialization in basic_inits:
             self.initialize(self.initialization)
         elif self.initialization == 'known':
             if 'constant' in kwargs:
                 constant = kwargs.pop('constant')
             elif 'initial_state' in kwargs:
+                # TODO deprecation warning
                 constant = kwargs.pop('initial_state')
             else:
-                raise ValueError(
-                    'Initial state must be provided when "known" is the specified initialization method.'
-                    )
+                raise ValueError('Initial state must be provided when "known"'
+                                 ' is the specified initialization method.')
             if 'stationary_cov' in kwargs:
                 stationary_cov = kwargs.pop('stationary_cov')
             elif 'initial_state_cov' in kwargs:
+                # TODO deprecation warning
                 stationary_cov = kwargs.pop('initial_state_cov')
             else:
-                raise ValueError(
-                    'Initial state covariance matrix must be provided when "known" is the specified initialization method.'
-                    )
-            self.initialize('known', constant=constant, stationary_cov=
-                stationary_cov)
-        elif not isinstance(self.initialization, Initialization
-            ) and self.initialization is not None:
-            raise ValueError('Invalid state space initialization method.')
+                raise ValueError('Initial state covariance matrix must be'
+                                 ' provided when "known" is the specified'
+                                 ' initialization method.')
+            self.initialize('known', constant=constant,
+                            stationary_cov=stationary_cov)
+        elif (not isinstance(self.initialization, Initialization) and
+                self.initialization is not None):
+            raise ValueError("Invalid state space initialization method.")
+
+        # Check for unused kwargs
         if len(kwargs):
-            msg = (
-                f'Unknown keyword arguments: {kwargs.keys()}.Passing unknown keyword arguments will raise a TypeError beginning in version 0.15.'
-                )
+            # raise TypeError(f'{__class__} constructor got unexpected keyword'
+            #                 f' argument(s): {kwargs}.')
+            msg = (f'Unknown keyword arguments: {kwargs.keys()}.'
+                   ' Passing unknown keyword arguments will raise a TypeError'
+                   ' beginning in version 0.15.')
             warnings.warn(msg, FutureWarning)
+
+        # Matrix representations storage
         self._representations = {}
+
+        # Setup the underlying statespace object storage
         self._statespaces = {}
+
+        # Caches
         self._time_invariant = None

     def __getitem__(self, key):
         _type = type(key)
+        # If only a string is given then we must be getting an entire matrix
         if _type is str:
             if key not in self.shapes:
-                raise IndexError(
-                    '"%s" is an invalid state space matrix name' % key)
+                raise IndexError('"%s" is an invalid state space matrix name'
+                                 % key)
             matrix = getattr(self, '_' + key)
+
+            # See note on time-varying arrays, below
             if matrix.shape[-1] == 1:
-                return matrix[(slice(None),) * (matrix.ndim - 1) + (0,)]
+                return matrix[(slice(None),)*(matrix.ndim-1) + (0,)]
             else:
                 return matrix
+        # Otherwise if we have a tuple, we want a slice of a matrix
         elif _type is tuple:
             name, slice_ = key[0], key[1:]
             if name not in self.shapes:
-                raise IndexError(
-                    '"%s" is an invalid state space matrix name' % name)
+                raise IndexError('"%s" is an invalid state space matrix name'
+                                 % name)
+
             matrix = getattr(self, '_' + name)
-            if matrix.shape[-1] == 1 and len(slice_) <= matrix.ndim - 1:
+
+            # Since the model can support time-varying arrays, but often we
+            # will instead have time-invariant arrays, we want to allow setting
+            # a matrix slice like mod['transition',0,:] even though technically
+            # it should be mod['transition',0,:,0]. Thus if the array in
+            # question is time-invariant but the last slice was excluded,
+            # add it in as a zero.
+            if matrix.shape[-1] == 1 and len(slice_) <= matrix.ndim-1:
                 slice_ = slice_ + (0,)
+
             return matrix[slice_]
+        # Otherwise, we have only a single slice index, but it is not a string
         else:
-            raise IndexError(
-                'First index must the name of a valid state space matrix.')
+            raise IndexError('First index must be the name of a valid state'
+                             ' space matrix.')

     def __setitem__(self, key, value):
         _type = type(key)
+        # If only a string is given then we must be setting an entire matrix
         if _type is str:
             if key not in self.shapes:
-                raise IndexError(
-                    '"%s" is an invalid state space matrix name' % key)
+                raise IndexError('"%s" is an invalid state space matrix name'
+                                 % key)
             setattr(self, key, value)
+        # If it's a tuple (with a string as the first element) then we must be
+        # setting a slice of a matrix
         elif _type is tuple:
             name, slice_ = key[0], key[1:]
             if name not in self.shapes:
-                raise IndexError(
-                    '"%s" is an invalid state space matrix name' % key[0])
+                raise IndexError('"%s" is an invalid state space matrix name'
+                                 % key[0])
+
+            # Change the dtype of the corresponding matrix
             dtype = np.array(value).dtype
             matrix = getattr(self, '_' + name)
             valid_types = ['f', 'd', 'F', 'D']
             if not matrix.dtype == dtype and dtype.char in valid_types:
                 matrix = getattr(self, '_' + name).real.astype(dtype)
-            if matrix.shape[-1] == 1 and len(slice_) == matrix.ndim - 1:
+
+            # Since the model can support time-varying arrays, but often we
+            # will instead have time-invariant arrays, we want to allow setting
+            # a matrix slice like mod['transition',0,:] even though technically
+            # it should be mod['transition',0,:,0]. Thus if the array in
+            # question is time-invariant but the last slice was excluded,
+            # add it in as a zero.
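+            # For example (illustrative): with a time-invariant transition
+            # matrix stored with shape (k_states, k_states, 1),
+            #     mod['transition', 0, 0] = 0.5
+            # is handled as mod['transition', 0, 0, 0] = 0.5.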
+            if matrix.shape[-1] == 1 and len(slice_) == matrix.ndim-1:
                 slice_ = slice_ + (0,)
+
+            # Set the new value
             matrix[slice_] = value
             setattr(self, name, matrix)
+        # Otherwise we got a single non-string key, (e.g. mod[:]), which is
+        # invalid
         else:
-            raise IndexError(
-                'First index must the name of a valid state space matrix.')
+            raise IndexError('First index must be the name of a valid state'
+                             ' space matrix.')

     def _clone_kwargs(self, endog, **kwargs):
         """
@@ -360,7 +474,39 @@ class Representation:
             model constructor. Those that are not specified are copied from
             the specification of the current state space model.
         """
-        pass
+
+        # We always need the base dimensions, but they cannot change from
+        # the base model when cloning (the idea is: if these need to change,
+        # need to make a new instance manually, since it's not really cloning).
+        kwargs['nobs'] = len(endog)
+        kwargs['k_endog'] = self.k_endog
+        for key in ['k_states', 'k_posdef']:
+            val = getattr(self, key)
+            if key not in kwargs or kwargs[key] is None:
+                kwargs[key] = val
+            if kwargs[key] != val:
+                raise ValueError('Cannot change the dimension of %s when'
+                                 ' cloning.' % key)
+
+        # Get defaults for time-invariant system matrices, if not otherwise
+        # provided
+        # Time-varying matrices must be replaced.
+        for name in self.shapes.keys():
+            if name == 'obs':
+                continue
+
+            if name not in kwargs:
+                mat = getattr(self, name)
+                if mat.shape[-1] != 1:
+                    raise ValueError('The `%s` matrix is time-varying. Cloning'
+                                     ' this model requires specifying an'
+                                     ' updated matrix.' % name)
+                kwargs[name] = mat
+
+        # Default is to use the same initialization
+        kwargs.setdefault('initialization', self.initialization)
+
+        return kwargs

     def clone(self, endog, **kwargs):
         """
@@ -384,7 +530,10 @@ class Representation:
         If some system matrices are time-varying, then new time-varying
         matrices *must* be provided.
         """
-        pass
+        kwargs = self._clone_kwargs(endog, **kwargs)
+        mod = self.__class__(**kwargs)
+        mod.bind(endog)
+        return mod

     def extend(self, endog, start=None, end=None, **kwargs):
         """
@@ -416,21 +565,148 @@ class Representation:
         This method does not allow replacing a time-varying system matrix with
         a time-invariant one (or vice-versa). If that is required, use `clone`.
         """
-        pass
+        endog = np.atleast_1d(endog)
+        if endog.ndim == 1:
+            endog = endog[:, np.newaxis]
+        nobs = len(endog)
+
+        if start is None:
+            start = 0
+        if end is None:
+            end = self.nobs
+
+        if start < 0:
+            start = self.nobs + start
+        if end < 0:
+            end = self.nobs + end
+        if start > self.nobs:
+            raise ValueError('The `start` argument of the extension within the'
+                             ' base model cannot be after the end of the'
+                             ' base model.')
+        if end > self.nobs:
+            raise ValueError('The `end` argument of the extension within the'
+                             ' base model cannot be after the end of the'
+                             ' base model.')
+        if start > end:
+            raise ValueError('The `start` argument of the extension within the'
+                             ' base model cannot be after the `end` argument.')
+
+        # Note: if start == end or if end < self.nobs, then we're just cloning
+        # (no extension)
+        endog = tools.concat([self.endog[:, start:end].T, endog])
+
+        # Extend any time-varying arrays
+        error_ti = ('Model has time-invariant %s matrix, so cannot provide'
+                    ' an extended matrix.')
+        error_tv = ('Model has time-varying %s matrix, so an updated'
+                    ' time-varying matrix for the extension period'
+                    ' is required.')
+        for name, shape in self.shapes.items():
+            if name == 'obs':
+                continue
+
+            mat = getattr(self, name)
+
+            # If we were *not* given an extended value for this matrix...
+            if name not in kwargs:
+                # If this is a time-varying matrix in the existing model
+                if mat.shape[-1] > 1:
+                    # If we have an extension period, then raise an error
+                    # because we should have been given an extended value
+                    if end + nobs > self.nobs:
+                        raise ValueError(error_tv % name)
+                    # If we do not have an extension period, then set the new
+                    # time-varying matrix to be the portion of the existing
+                    # time-varying matrix that corresponds to the period of
+                    # interest
+                    else:
+                        kwargs[name] = mat[..., start:end + nobs]
+            elif nobs == 0:
+                raise ValueError('Extension is being performed within-sample,'
+                                 ' so an extended matrix cannot be provided.')
+            # If we were given an extended value for this matrix
+            else:
+                # TODO: Need to add a check for ndim, and if the matrix has
+                # one fewer dimensions than the existing matrix, add a new axis
+
+                # If this is a time-invariant matrix in the existing model,
+                # raise an error
+                if mat.shape[-1] == 1 and self.nobs > 1:
+                    raise ValueError(error_ti % name)
+
+                # Otherwise, validate the shape of the given extended value
+                # Note: we do not validate the number of observations here
+                # (so we pass in updated_mat.shape[-1] as the nobs argument
+                # in the validate_* calls); instead, we check below that we
+                # at least `nobs` values were passed in and then only take the
+                # first of them as required. This can be useful when e.g. the
+                # end user knows the extension values up to some maximum
+                # endpoint, but does not know what the calling methods may
+                # specifically require.
+                updated_mat = np.asarray(kwargs[name])
+                if len(shape) == 2:
+                    validate_vector_shape(name, updated_mat.shape, shape[0],
+                                          updated_mat.shape[-1])
+                else:
+                    validate_matrix_shape(name, updated_mat.shape, shape[0],
+                                          shape[1], updated_mat.shape[-1])
+
+                if updated_mat.shape[-1] < nobs:
+                    raise ValueError(error_tv % name)
+                else:
+                    updated_mat = updated_mat[..., :nobs]
+
+                # Concatenate to get the new time-varying matrix
+                kwargs[name] = np.c_[mat[..., start:end], updated_mat]
+
+        return self.clone(endog, **kwargs)
+
+    def diff_endog(self, new_endog, tolerance=1e-10):
+        # TODO: move this function to tools?
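+        # Compare `new_endog` against the data currently bound to the model
+        # and return two lists of (time, variable) index pairs: `revision_ix`
+        # for previously observed entries whose values changed by more than
+        # `tolerance` (or became missing), and `new_ix` for entries that were
+        # previously missing (NaN) but are observed in `new_endog`.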
+        endog = self.endog.T
+        if len(new_endog) < len(endog):
+            raise ValueError('Given data (length %d) is too short to diff'
+                             ' against model data (length %d).'
+                             % (len(new_endog), len(endog)))
+        if len(new_endog) > len(endog):
+            nobs_append = len(new_endog) - len(endog)
+            endog = np.c_[endog.T, new_endog[-nobs_append:].T * np.nan].T
+
+        new_nan = np.isnan(new_endog)
+        existing_nan = np.isnan(endog)
+        diff = np.abs(new_endog - endog)
+        diff[new_nan ^ existing_nan] = np.inf
+        diff[new_nan & existing_nan] = 0.
+
+        is_revision = (diff > tolerance)
+        is_new = existing_nan & ~new_nan
+        is_revision[is_new] = False
+
+        revision_ix = list(zip(*np.where(is_revision)))
+        new_ix = list(zip(*np.where(is_new)))
+
+        return revision_ix, new_ix

     @property
     def prefix(self):
         """
         (str) BLAS prefix of currently active representation matrices
         """
-        pass
+        arrays = (
+            self._design, self._obs_intercept, self._obs_cov,
+            self._transition, self._state_intercept, self._selection,
+            self._state_cov
+        )
+        if self.endog is not None:
+            arrays = (self.endog,) + arrays
+        return find_best_blas_type(arrays)[0]

     @property
     def dtype(self):
         """
         (dtype) Datatype of currently active representation matrices
         """
-        pass
+        return tools.prefix_dtype_map[self.prefix]

     @property
     def time_invariant(self):
@@ -438,14 +714,29 @@ class Representation:
         (bool) Whether or not currently active representation matrices are
         time-invariant
         """
-        pass
+        if self._time_invariant is None:
+            return (
+                self._design.shape[2] == self._obs_intercept.shape[1] ==
+                self._obs_cov.shape[2] == self._transition.shape[2] ==
+                self._state_intercept.shape[1] == self._selection.shape[2] ==
+                self._state_cov.shape[2]
+            )
+        else:
+            return self._time_invariant
+
+    @property
+    def _statespace(self):
+        prefix = self.prefix
+        if prefix in self._statespaces:
+            return self._statespaces[prefix]
+        return None

     @property
     def obs(self):
+        r"""
+        (array) Observation vector: :math:`y~(k\_endog \times nobs)`
         """
-        (array) Observation vector: :math:`y~(k\\_endog \\times nobs)`
-        """
-        pass
+        return self.endog

     def bind(self, endog):
         """
@@ -473,13 +764,88 @@ class Representation:
         Although this class (Representation) has stringent `bind` requirements,
         it is assumed that it will rarely be used directly.
         """
-        pass
+        if not isinstance(endog, np.ndarray):
+            raise ValueError("Invalid endogenous array; must be an ndarray.")
+
+        # Make sure we have a 2-dimensional array
+        # Note: reshaping a 1-dim array into a 2-dim array by changing the
+        #       shape tuple always results in a row (C)-ordered array, so it
+        #       must be shaped (nobs, k_endog)
+        if endog.ndim == 1:
+            # 1-dim arrays of length nobs (univariate data)
+            if self.k_endog == 1:
+                endog.shape = (endog.shape[0], 1)
+            # 1-dim arrays of length k_endog (a single observation)
+            else:
+                endog.shape = (1, endog.shape[0])
+        if not endog.ndim == 2:
+            raise ValueError('Invalid endogenous array provided; must be'
+                             ' 2-dimensional.')
+
+        # Check for valid column-ordered arrays
+        if endog.flags['F_CONTIGUOUS'] and endog.shape[0] == self.k_endog:
+            pass
+        # Check for valid row-ordered arrays, and transpose them to be the
+        # correct column-ordered array
+        elif endog.flags['C_CONTIGUOUS'] and endog.shape[1] == self.k_endog:
+            endog = endog.T
+        # Invalid column-ordered arrays
+        elif endog.flags['F_CONTIGUOUS']:
+            raise ValueError('Invalid endogenous array; column-ordered'
+                             ' arrays must have first axis shape of'
+                             ' `k_endog`.')
+        # Invalid row-ordered arrays
+        elif endog.flags['C_CONTIGUOUS']:
+            raise ValueError('Invalid endogenous array; row-ordered'
+                             ' arrays must have last axis shape of'
+                             ' `k_endog`.')
+        # Non-contiguous arrays
+        else:
+            raise ValueError('Invalid endogenous array; must be ordered in'
+                             ' contiguous memory.')
+
+        # We may still have a non-fortran contiguous array, so double-check
+        if not endog.flags['F_CONTIGUOUS']:
+            endog = np.asfortranarray(endog)
+
+        # Set a flag for complex data
+        self._complex_endog = np.iscomplexobj(endog)
+
+        # Set the data
+        self.endog = endog
+        self.nobs = self.endog.shape[1]
+
+        # Reset shapes
+        if hasattr(self, 'shapes'):
+            self.shapes['obs'] = self.endog.shape

     def initialize(self, initialization, approximate_diffuse_variance=None,
-        constant=None, stationary_cov=None, a=None, Pstar=None, Pinf=None,
-        A=None, R0=None, Q0=None):
+                   constant=None, stationary_cov=None, a=None, Pstar=None,
+                   Pinf=None, A=None, R0=None, Q0=None):
         """Create an Initialization object if necessary"""
-        pass
+        if initialization == 'known':
+            initialization = Initialization(self.k_states, 'known',
+                                            constant=constant,
+                                            stationary_cov=stationary_cov)
+        elif initialization == 'components':
+            initialization = Initialization.from_components(
+                a=a, Pstar=Pstar, Pinf=Pinf, A=A, R0=R0, Q0=Q0)
+        elif initialization == 'approximate_diffuse':
+            if approximate_diffuse_variance is None:
+                approximate_diffuse_variance = self.initial_variance
+            initialization = Initialization(
+                self.k_states, 'approximate_diffuse',
+                approximate_diffuse_variance=approximate_diffuse_variance)
+        elif initialization == 'stationary':
+            initialization = Initialization(self.k_states, 'stationary')
+        elif initialization == 'diffuse':
+            initialization = Initialization(self.k_states, 'diffuse')
+
+        # We must have an initialization object at this point
+        if not isinstance(initialization, Initialization):
+            raise ValueError("Invalid state space initialization method.")
+
+        self.initialization = initialization

     def initialize_known(self, constant, stationary_cov):
         """
@@ -497,7 +863,21 @@ class Representation:
         stationary_cov : array_like
             Known covariance matrix of the initial state vector.
         """
-        pass
+        constant = np.asarray(constant, order="F")
+        stationary_cov = np.asarray(stationary_cov, order="F")
+
+        if not constant.shape == (self.k_states,):
+            raise ValueError('Invalid dimensions for constant state vector.'
+                             ' Requires shape (%d,), got %s' %
+                             (self.k_states, str(constant.shape)))
+        if not stationary_cov.shape == (self.k_states, self.k_states):
+            raise ValueError('Invalid dimensions for stationary covariance'
+                             ' matrix. Requires shape (%d,%d), got %s' %
+                             (self.k_states, self.k_states,
+                              str(stationary_cov.shape)))
+
+        self.initialize('known', constant=constant,
+                        stationary_cov=stationary_cov)

     def initialize_approximate_diffuse(self, variance=None):
         """
@@ -513,10 +893,14 @@ class Representation:
             The variance for approximating diffuse initial conditions. Default
             is 1e6.
         """
-        pass
+        if variance is None:
+            variance = self.initial_variance
+
+        self.initialize('approximate_diffuse',
+                        approximate_diffuse_variance=variance)

     def initialize_components(self, a=None, Pstar=None, Pinf=None, A=None,
-        R0=None, Q0=None):
+                              R0=None, Q0=None):
         """
         Initialize the statespace model with component matrices

@@ -573,19 +957,108 @@ class Representation:
            Time Series Analysis by State Space Methods: Second Edition.
            Oxford University Press.
         """
-        pass
+        self.initialize('components', a=a, Pstar=Pstar, Pinf=Pinf, A=A, R0=R0,
+                        Q0=Q0)

     def initialize_stationary(self):
         """
         Initialize the statespace model as stationary.
         """
-        pass
+        self.initialize('stationary')

     def initialize_diffuse(self):
         """
         Initialize the statespace model as diffuse.
         """
-        pass
+        self.initialize('diffuse')
+
+    def _initialize_representation(self, prefix=None):
+        if prefix is None:
+            prefix = self.prefix
+        dtype = tools.prefix_dtype_map[prefix]
+
+        # If the dtype-specific representation matrices do not exist, create
+        # them
+        if prefix not in self._representations:
+            # Copy the statespace representation matrices
+            self._representations[prefix] = {}
+            for matrix in self.shapes.keys():
+                if matrix == 'obs':
+                    self._representations[prefix][matrix] = (
+                        self.obs.astype(dtype)
+                    )
+                else:
+                    # Note: this always makes a copy
+                    self._representations[prefix][matrix] = (
+                        getattr(self, '_' + matrix).astype(dtype)
+                    )
+        # If they do exist, update them
+        else:
+            for matrix in self.shapes.keys():
+                existing = self._representations[prefix][matrix]
+                if matrix == 'obs':
+                    # existing[:] = self.obs.astype(dtype)
+                    pass
+                else:
+                    new = getattr(self, '_' + matrix).astype(dtype)
+                    if existing.shape == new.shape:
+                        existing[:] = new[:]
+                    else:
+                        self._representations[prefix][matrix] = new
+
+        # Determine if we need to (re-)create the _statespace models
+        # (if time-varying matrices changed)
+        if prefix in self._statespaces:
+            ss = self._statespaces[prefix]
+            create = (
+                not ss.obs.shape[1] == self.endog.shape[1] or
+                not ss.design.shape[2] == self.design.shape[2] or
+                not ss.obs_intercept.shape[1] == self.obs_intercept.shape[1] or
+                not ss.obs_cov.shape[2] == self.obs_cov.shape[2] or
+                not ss.transition.shape[2] == self.transition.shape[2] or
+                not (ss.state_intercept.shape[1] ==
+                     self.state_intercept.shape[1]) or
+                not ss.selection.shape[2] == self.selection.shape[2] or
+                not ss.state_cov.shape[2] == self.state_cov.shape[2]
+            )
+        else:
+            create = True
+
+        # (re-)create if necessary
+        if create:
+            if prefix in self._statespaces:
+                del self._statespaces[prefix]
+
+            # Setup the base statespace object
+            cls = self.prefix_statespace_map[prefix]
+            self._statespaces[prefix] = cls(
+                self._representations[prefix]['obs'],
+                self._representations[prefix]['design'],
+                self._representations[prefix]['obs_intercept'],
+                self._representations[prefix]['obs_cov'],
+                self._representations[prefix]['transition'],
+                self._representations[prefix]['state_intercept'],
+                self._representations[prefix]['selection'],
+                self._representations[prefix]['state_cov']
+            )
+
+        return prefix, dtype, create
+
+    def _initialize_state(self, prefix=None, complex_step=False):
+        # TODO once the transition to using the Initialization objects is
+        # complete, this should be moved entirely to the _{{prefix}}Statespace
+        # object.
+        if prefix is None:
+            prefix = self.prefix
+
+        # (Re-)initialize the statespace model
+        if isinstance(self.initialization, Initialization):
+            if not self.initialization.initialized:
+                raise RuntimeError('Initialization is incomplete.')
+            self._statespaces[prefix].initialize(self.initialization,
+                                                 complex_step=complex_step)
+        else:
+            raise RuntimeError('Statespace model not initialized.')


 class FrozenRepresentation:
@@ -652,19 +1125,71 @@ class FrozenRepresentation:
     initial_state_cov : array_like
         The state covariance matrix used to initialize the Kalman filter.
     """
-    _model_attributes = ['model', 'prefix', 'dtype', 'nobs', 'k_endog',
-        'k_states', 'k_posdef', 'time_invariant', 'endog', 'design',
-        'obs_intercept', 'obs_cov', 'transition', 'state_intercept',
-        'selection', 'state_cov', 'missing', 'nmissing', 'shapes',
-        'initialization', 'initial_state', 'initial_state_cov',
-        'initial_variance']
+    _model_attributes = [
+        'model', 'prefix', 'dtype', 'nobs', 'k_endog', 'k_states',
+        'k_posdef', 'time_invariant', 'endog', 'design', 'obs_intercept',
+        'obs_cov', 'transition', 'state_intercept', 'selection',
+        'state_cov', 'missing', 'nmissing', 'shapes', 'initialization',
+        'initial_state', 'initial_state_cov', 'initial_variance'
+    ]
     _attributes = _model_attributes

     def __init__(self, model):
+        # Initialize all attributes to None
         for name in self._attributes:
             setattr(self, name, None)
+
+        # Update the representation attributes
         self.update_representation(model)

     def update_representation(self, model):
         """Update model Representation"""
-        pass
+        # Model
+        self.model = model
+
+        # Data type
+        self.prefix = model.prefix
+        self.dtype = model.dtype
+
+        # Copy the model dimensions
+        self.nobs = model.nobs
+        self.k_endog = model.k_endog
+        self.k_states = model.k_states
+        self.k_posdef = model.k_posdef
+        self.time_invariant = model.time_invariant
+
+        # Save the state space representation at the time
+        self.endog = model.endog
+        self.design = model._design.copy()
+        self.obs_intercept = model._obs_intercept.copy()
+        self.obs_cov = model._obs_cov.copy()
+        self.transition = model._transition.copy()
+        self.state_intercept = model._state_intercept.copy()
+        self.selection = model._selection.copy()
+        self.state_cov = model._state_cov.copy()
+
+        self.missing = np.array(model._statespaces[self.prefix].missing,
+                                copy=True)
+        self.nmissing = np.array(model._statespaces[self.prefix].nmissing,
+                                 copy=True)
+
+        # Save the final shapes of the matrices
+        self.shapes = dict(model.shapes)
+        for name in self.shapes.keys():
+            if name == 'obs':
+                continue
+            self.shapes[name] = getattr(self, name).shape
+        self.shapes['obs'] = self.endog.shape
+
+        # Save the state space initialization
+        self.initialization = model.initialization
+
+        if model.initialization is not None:
+            model._initialize_state()
+            self.initial_state = np.array(
+                model._statespaces[self.prefix].initial_state, copy=True)
+            self.initial_state_cov = np.array(
+                model._statespaces[self.prefix].initial_state_cov, copy=True)
+            self.initial_diffuse_state_cov = np.array(
+                model._statespaces[self.prefix].initial_diffuse_state_cov,
+                copy=True)
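
The copies above (np.array(..., copy=True)) are what make the frozen
representation a true snapshot: later in-place changes to the model's system
matrices do not leak into it. A minimal sketch of the same freeze-by-copy
pattern, using toy classes rather than the statsmodels API:

    import numpy as np

    class ToyModel:
        def __init__(self):
            self.transition = np.eye(2)

    class ToyFrozen:
        def __init__(self, model):
            # copy=True mirrors the np.array(..., copy=True) calls above
            self.transition = np.array(model.transition, copy=True)

    model = ToyModel()
    frozen = ToyFrozen(model)
    model.transition[0, 0] = 5.0
    assert frozen.transition[0, 0] == 1.0   # the snapshot is unaffected
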
diff --git a/statsmodels/tsa/statespace/sarimax.py b/statsmodels/tsa/statespace/sarimax.py
index ed5a0beda..c6063585f 100644
--- a/statsmodels/tsa/statespace/sarimax.py
+++ b/statsmodels/tsa/statespace/sarimax.py
@@ -5,23 +5,31 @@ Author: Chad Fulton
 License: Simplified-BSD
 """
 from warnings import warn
+
 import numpy as np
 import pandas as pd
+
 from statsmodels.compat.pandas import Appender
+
 from statsmodels.tools.tools import Bunch
 from statsmodels.tools.data import _is_using_pandas
 from statsmodels.tools.decorators import cache_readonly
 import statsmodels.base.wrapper as wrap
+
 from statsmodels.tsa.arima.specification import SARIMAXSpecification
 from statsmodels.tsa.arima.params import SARIMAXParams
 from statsmodels.tsa.tsatools import lagmat
+
 from .initialization import Initialization
 from .mlemodel import MLEModel, MLEResults, MLEResultsWrapper
-from .tools import companion_matrix, diff, is_invertible, constrain_stationary_univariate, unconstrain_stationary_univariate, prepare_exog, prepare_trend_spec, prepare_trend_data
+from .tools import (
+    companion_matrix, diff, is_invertible, constrain_stationary_univariate,
+    unconstrain_stationary_univariate,
+    prepare_exog, prepare_trend_spec, prepare_trend_data)


 class SARIMAX(MLEModel):
-    """
+    r"""
     Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors
     model

@@ -197,22 +205,22 @@ class SARIMAX(MLEModel):

     Notes
     -----
-    The SARIMA model is specified :math:`(p, d, q) \\times (P, D, Q)_s`.
+    The SARIMA model is specified :math:`(p, d, q) \times (P, D, Q)_s`.

     .. math::

-        \\phi_p (L) \\tilde \\phi_P (L^s) \\Delta^d \\Delta_s^D y_t = A(t) +
-            \\theta_q (L) \\tilde \\theta_Q (L^s) \\zeta_t
+        \phi_p (L) \tilde \phi_P (L^s) \Delta^d \Delta_s^D y_t = A(t) +
+            \theta_q (L) \tilde \theta_Q (L^s) \zeta_t

     In terms of a univariate structural model, this can be represented as

     .. math::

-        y_t & = u_t + \\eta_t \\\\
-        \\phi_p (L) \\tilde \\phi_P (L^s) \\Delta^d \\Delta_s^D u_t & = A(t) +
-            \\theta_q (L) \\tilde \\theta_Q (L^s) \\zeta_t
+        y_t & = u_t + \eta_t \\
+        \phi_p (L) \tilde \phi_P (L^s) \Delta^d \Delta_s^D u_t & = A(t) +
+            \theta_q (L) \tilde \theta_Q (L^s) \zeta_t

-    where :math:`\\eta_t` is only applicable in the case of measurement error
+    where :math:`\eta_t` is only applicable in the case of measurement error
     (although it is also used in the case of a pure regression model, i.e. if
     p=q=0).

@@ -221,9 +229,9 @@ class SARIMAX(MLEModel):

     .. math::

-        y_t & = \\beta_t x_t + u_t \\\\
-        \\phi_p (L) \\tilde \\phi_P (L^s) \\Delta^d \\Delta_s^D u_t & = A(t) +
-            \\theta_q (L) \\tilde \\theta_Q (L^s) \\zeta_t
+        y_t & = \beta_t x_t + u_t \\
+        \phi_p (L) \tilde \phi_P (L^s) \Delta^d \Delta_s^D u_t & = A(t) +
+            \theta_q (L) \tilde \theta_Q (L^s) \zeta_t

     this model is the one used when exogenous regressors are provided.

@@ -231,8 +239,8 @@ class SARIMAX(MLEModel):

     .. math::

-        \\Phi (L) \\equiv \\phi_p (L) \\tilde \\phi_P (L^s) \\\\
-        \\Theta (L) \\equiv \\theta_q (L) \\tilde \\theta_Q (L^s)
+        \Phi (L) \equiv \phi_p (L) \tilde \phi_P (L^s) \\
+        \Theta (L) \equiv \theta_q (L) \tilde \theta_Q (L^s)

     If `mle_regression` is True, regression coefficients are treated as
     additional parameters to be estimated via maximum likelihood. Otherwise
@@ -307,24 +315,30 @@ class SARIMAX(MLEModel):
        Oxford University Press.
     """

-    def __init__(self, endog, exog=None, order=(1, 0, 0), seasonal_order=(0,
-        0, 0, 0), trend=None, measurement_error=False,
-        time_varying_regression=False, mle_regression=True,
-        simple_differencing=False, enforce_stationarity=True,
-        enforce_invertibility=True, hamilton_representation=False,
-        concentrate_scale=False, trend_offset=1, use_exact_diffuse=False,
-        dates=None, freq=None, missing='none', validate_specification=True,
-        **kwargs):
-        self._spec = SARIMAXSpecification(endog, exog=exog, order=order,
-            seasonal_order=seasonal_order, trend=trend,
-            enforce_stationarity=None, enforce_invertibility=None,
+    def __init__(self, endog, exog=None, order=(1, 0, 0),
+                 seasonal_order=(0, 0, 0, 0), trend=None,
+                 measurement_error=False, time_varying_regression=False,
+                 mle_regression=True, simple_differencing=False,
+                 enforce_stationarity=True, enforce_invertibility=True,
+                 hamilton_representation=False, concentrate_scale=False,
+                 trend_offset=1, use_exact_diffuse=False, dates=None,
+                 freq=None, missing='none', validate_specification=True,
+                 **kwargs):
+
+        self._spec = SARIMAXSpecification(
+            endog, exog=exog, order=order, seasonal_order=seasonal_order,
+            trend=trend, enforce_stationarity=None, enforce_invertibility=None,
             concentrate_scale=concentrate_scale, dates=dates, freq=freq,
             missing=missing, validate_specification=validate_specification)
         self._params = SARIMAXParams(self._spec)
+
+        # Save given orders
         order = self._spec.order
         seasonal_order = self._spec.seasonal_order
         self.order = order
         self.seasonal_order = seasonal_order
+
+        # Model parameters
         self.seasonal_periods = seasonal_order[3]
         self.measurement_error = measurement_error
         self.time_varying_regression = time_varying_regression
@@ -335,113 +349,243 @@ class SARIMAX(MLEModel):
         self.hamilton_representation = hamilton_representation
         self.concentrate_scale = concentrate_scale
         self.use_exact_diffuse = use_exact_diffuse
+
+        # Enforce non-MLE coefficients if time varying coefficients is
+        # specified
         if self.time_varying_regression and self.mle_regression:
-            raise ValueError(
-                'Models with time-varying regression coefficients must integrate the coefficients as part of the state vector, so that `mle_regression` must be set to False.'
-                )
+            raise ValueError('Models with time-varying regression coefficients'
+                             ' must integrate the coefficients as part of the'
+                             ' state vector, so that `mle_regression` must'
+                             ' be set to False.')
+
+        # Lag polynomials
         self._params.ar_params = -1
         self.polynomial_ar = self._params.ar_poly.coef
         self._polynomial_ar = self.polynomial_ar.copy()
+
         self._params.ma_params = 1
         self.polynomial_ma = self._params.ma_poly.coef
         self._polynomial_ma = self.polynomial_ma.copy()
+
         self._params.seasonal_ar_params = -1
         self.polynomial_seasonal_ar = self._params.seasonal_ar_poly.coef
         self._polynomial_seasonal_ar = self.polynomial_seasonal_ar.copy()
+
         self._params.seasonal_ma_params = 1
         self.polynomial_seasonal_ma = self._params.seasonal_ma_poly.coef
         self._polynomial_seasonal_ma = self.polynomial_seasonal_ma.copy()
+
+        # Deterministic trend polynomial
         self.trend = trend
         self.trend_offset = trend_offset
         self.polynomial_trend, self.k_trend = prepare_trend_spec(self.trend)
         self._polynomial_trend = self.polynomial_trend.copy()
         self._k_trend = self.k_trend
+        # (we internally use _k_trend for mechanics so that the public
+        # attribute can be overridden by subclasses)
+
+        # Model orders
+        # Note: k_ar, k_ma, k_seasonal_ar, k_seasonal_ma do not include the
+        # constant term, so they may be zero.
+        # Note: for a typical ARMA(p,q) model, p = k_ar_params = k_ar and
+        # q = k_ma_params = k_ma, although this may not be true for models
+        # with arbitrary lag polynomials.
         self.k_ar = self._spec.max_ar_order
         self.k_ar_params = self._spec.k_ar_params
         self.k_diff = int(order[1])
         self.k_ma = self._spec.max_ma_order
         self.k_ma_params = self._spec.k_ma_params
-        self.k_seasonal_ar = (self._spec.max_seasonal_ar_order * self._spec
-            .seasonal_periods)
+
+        self.k_seasonal_ar = (self._spec.max_seasonal_ar_order *
+                              self._spec.seasonal_periods)
         self.k_seasonal_ar_params = self._spec.k_seasonal_ar_params
         self.k_seasonal_diff = int(seasonal_order[1])
-        self.k_seasonal_ma = (self._spec.max_seasonal_ma_order * self._spec
-            .seasonal_periods)
+        self.k_seasonal_ma = (self._spec.max_seasonal_ma_order *
+                              self._spec.seasonal_periods)
         self.k_seasonal_ma_params = self._spec.k_seasonal_ma_params
+
+        # Make internal copies of the differencing orders because if we use
+        # simple differencing, then we will need to internally use zeros after
+        # the simple differencing has been performed
         self._k_diff = self.k_diff
         self._k_seasonal_diff = self.k_seasonal_diff
-        if self.hamilton_representation and not (self.simple_differencing or
-            self._k_diff == self._k_seasonal_diff == 0):
-            raise ValueError(
-                'The Hamilton representation is only available for models in which there is no differencing integrated into the state vector. Set `simple_differencing` to True or set `hamilton_representation` to False'
-                )
-        self._k_order = max(self.k_ar + self.k_seasonal_ar, self.k_ma +
-            self.k_seasonal_ma + 1)
+
+        # We can only use the Hamilton representation if differencing is not
+        # performed as a part of the state space
+        if (self.hamilton_representation and not (self.simple_differencing or
+           self._k_diff == self._k_seasonal_diff == 0)):
+            raise ValueError('The Hamilton representation is only available'
+                             ' for models in which there is no differencing'
+                             ' integrated into the state vector. Set'
+                             ' `simple_differencing` to True or set'
+                             ' `hamilton_representation` to False')
+
+        # Model order
+        # (this is used internally in a number of locations)
+        self._k_order = max(self.k_ar + self.k_seasonal_ar,
+                            self.k_ma + self.k_seasonal_ma + 1)
         if self._k_order == 1 and self.k_ar + self.k_seasonal_ar == 0:
+            # Handle time-varying regression
             if self.time_varying_regression:
                 self._k_order = 0
-        self._k_exog, exog = prepare_exog(exog)
+
+        # Exogenous data
+        (self._k_exog, exog) = prepare_exog(exog)
+        # (we internally use _k_exog for mechanics so that the public attribute
+        # can be overridden by subclasses)
         self.k_exog = self._k_exog
-        self.mle_regression = (self.mle_regression and exog is not None and
-            self._k_exog > 0)
-        self.state_regression = (not self.mle_regression and exog is not
-            None and self._k_exog > 0)
+
+        # Redefine mle_regression to be true only if it was previously set to
+        # true and there are exogenous regressors
+        self.mle_regression = (
+            self.mle_regression and exog is not None and self._k_exog > 0
+        )
+        # State regression is regression with coefficients estimated within
+        # the state vector
+        self.state_regression = (
+            not self.mle_regression and exog is not None and self._k_exog > 0
+        )
+        # If all we have is a regression (so k_ar = k_ma = 0), then put the
+        # error term as measurement error
         if self.state_regression and self._k_order == 0:
             self.measurement_error = True
+
+        # Number of states
         k_states = self._k_order
         if not self.simple_differencing:
             k_states += (self.seasonal_periods * self._k_seasonal_diff +
-                self._k_diff)
+                         self._k_diff)
         if self.state_regression:
             k_states += self._k_exog
+
+        # Number of positive definite elements of the state covariance matrix
         k_posdef = int(self._k_order > 0)
+        # Only have an error component to the states if k_posdef > 0
         self.state_error = k_posdef > 0
         if self.state_regression and self.time_varying_regression:
             k_posdef += self._k_exog
+
+        # Diffuse initialization can be more sensitive to the variance value
+        # in the case of state regression, so set a higher than usual default
+        # variance
         if self.state_regression:
-            kwargs.setdefault('initial_variance', 10000000000.0)
+            kwargs.setdefault('initial_variance', 1e10)
+
+        # Handle non-default loglikelihood burn
         self._loglikelihood_burn = kwargs.get('loglikelihood_burn', None)
-        self.k_params = (self.k_ar_params + self.k_ma_params + self.
-            k_seasonal_ar_params + self.k_seasonal_ma_params + self.
-            _k_trend + self.measurement_error + int(not self.concentrate_scale)
-            )
+
+        # Number of parameters
+        self.k_params = (
+            self.k_ar_params + self.k_ma_params +
+            self.k_seasonal_ar_params + self.k_seasonal_ma_params +
+            self._k_trend +
+            self.measurement_error +
+            int(not self.concentrate_scale)
+        )
         if self.mle_regression:
             self.k_params += self._k_exog
+
+        # We need to have an array or pandas at this point
         self.orig_endog = endog
         self.orig_exog = exog
         if not _is_using_pandas(endog, None):
             endog = np.asanyarray(endog)
+
+        # Update the differencing dimensions if simple differencing is applied
         self.orig_k_diff = self._k_diff
         self.orig_k_seasonal_diff = self._k_seasonal_diff
-        if self.simple_differencing and (self._k_diff > 0 or self.
-            _k_seasonal_diff > 0):
+        if (self.simple_differencing and
+           (self._k_diff > 0 or self._k_seasonal_diff > 0)):
             self._k_diff = 0
             self._k_seasonal_diff = 0
-        self._k_states_diff = (self._k_diff + self.seasonal_periods * self.
-            _k_seasonal_diff)
+
+        # Internally used in several locations
+        self._k_states_diff = (
+            self._k_diff + self.seasonal_periods * self._k_seasonal_diff
+        )
+
+        # Set some model variables now so they will be available for the
+        # initialize() method, below
         self.nobs = len(endog)
         self.k_states = k_states
         self.k_posdef = k_posdef
-        super(SARIMAX, self).__init__(endog, exog=exog, k_states=k_states,
-            k_posdef=k_posdef, **kwargs)
+
+        # Initialize the statespace
+        super(SARIMAX, self).__init__(
+            endog, exog=exog, k_states=k_states, k_posdef=k_posdef, **kwargs
+        )
+
+        # Set the filter to concentrate out the scale if requested
         if self.concentrate_scale:
             self.ssm.filter_concentrated = True
+
+        # Set as time-varying model if we have time-trend or exog
         if self._k_exog > 0 or len(self.polynomial_trend) > 1:
             self.ssm._time_invariant = False
+
+        # Initialize the fixed components of the statespace model
         self.ssm['design'] = self.initial_design
         self.ssm['state_intercept'] = self.initial_state_intercept
         self.ssm['transition'] = self.initial_transition
         self.ssm['selection'] = self.initial_selection
         if self.concentrate_scale:
-            self.ssm['state_cov', 0, 0] = 1.0
+            self.ssm['state_cov', 0, 0] = 1.
+
+        # update _init_keys attached by super
         self._init_keys += ['order', 'seasonal_order', 'trend',
-            'measurement_error', 'time_varying_regression',
-            'mle_regression', 'simple_differencing', 'enforce_stationarity',
-            'enforce_invertibility', 'hamilton_representation',
-            'concentrate_scale', 'trend_offset'] + list(kwargs.keys())
+                            'measurement_error', 'time_varying_regression',
+                            'mle_regression', 'simple_differencing',
+                            'enforce_stationarity', 'enforce_invertibility',
+                            'hamilton_representation', 'concentrate_scale',
+                            'trend_offset'] + list(kwargs.keys())
+        # TODO: I think the kwargs are not attached; need to recover from ???
+
+        # Initialize the state
         if self.ssm.initialization is None:
             self.initialize_default()
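
For orientation, a minimal usage sketch of the constructor arguments handled
above, on simulated data (the variable names and values here are illustrative
only, not part of the patch):

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(0)
    y = rng.standard_normal(200).cumsum()     # an integrated series
    x = rng.standard_normal((200, 1))         # one exogenous regressor

    # (p, d, q) x (P, D, Q)_s with a constant trend and MLE regression
    mod = SARIMAX(y, exog=x, order=(1, 1, 1), seasonal_order=(1, 0, 0, 12),
                  trend='c')
    res = mod.fit(disp=False)
    print(mod.param_names)
    print(res.params)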

+    def prepare_data(self):
+        endog, exog = super(SARIMAX, self).prepare_data()
+
+        # Perform simple differencing if requested
+        if (self.simple_differencing and
+           (self.orig_k_diff > 0 or self.orig_k_seasonal_diff > 0)):
+            # Save the original length
+            orig_length = endog.shape[0]
+            # Perform simple differencing
+            endog = diff(endog.copy(), self.orig_k_diff,
+                         self.orig_k_seasonal_diff, self.seasonal_periods)
+            if exog is not None:
+                exog = diff(exog.copy(), self.orig_k_diff,
+                            self.orig_k_seasonal_diff, self.seasonal_periods)
+
+            # Reset the ModelData datasets and cache
+            self.data.endog, self.data.exog = (
+                self.data._convert_endog_exog(endog, exog))
+
+            # Reset indexes, if provided
+            new_length = self.data.endog.shape[0]
+            if self.data.row_labels is not None:
+                self.data._cache['row_labels'] = (
+                    self.data.row_labels[orig_length - new_length:])
+            if self._index is not None:
+                if self._index_int64:
+                    self._index = pd.RangeIndex(start=1, stop=new_length + 1)
+                elif self._index_generated:
+                    self._index = self._index[:-(orig_length - new_length)]
+                else:
+                    self._index = self._index[orig_length - new_length:]
+
+        # Reset the nobs
+        self.nobs = endog.shape[0]
+
+        # Cache the arrays for calculating the intercept from the trend
+        # components
+        self._trend_data = prepare_trend_data(
+            self.polynomial_trend, self._k_trend, self.nobs, self.trend_offset)
+
+        return endog, exog
+
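
The simple differencing performed above can be reproduced directly with the
diff helper imported at the top of this file; a small sketch on synthetic
data (d=1, D=1, s=4 are arbitrary choices for illustration):

    import numpy as np
    from statsmodels.tsa.statespace.tools import diff

    y = np.arange(24, dtype=float) ** 2
    dy = diff(y, 1, 1, 4)                 # one regular, one seasonal difference
    manual = np.diff(y[4:] - y[:-4])      # Delta Delta_4 y_t "by hand"
    assert np.allclose(np.squeeze(dy), manual)
    print(dy.shape)                       # 24 - 1 - 4 = 19 observations remain
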
     def initialize(self):
         """
         Initialize the SARIMAX model.
@@ -451,45 +595,492 @@ class SARIMAX(MLEModel):
         These initialization steps must occur following the parent class
         __init__ function calls.
         """
-        pass
+        super(SARIMAX, self).initialize()
+
+        # Cache the indexes of included polynomial orders (for update below)
+        # (but we do not want the index of the constant term, so exclude the
+        # first index)
+        self._polynomial_ar_idx = np.nonzero(self.polynomial_ar)[0][1:]
+        self._polynomial_ma_idx = np.nonzero(self.polynomial_ma)[0][1:]
+        self._polynomial_seasonal_ar_idx = np.nonzero(
+            self.polynomial_seasonal_ar
+        )[0][1:]
+        self._polynomial_seasonal_ma_idx = np.nonzero(
+            self.polynomial_seasonal_ma
+        )[0][1:]
+
+        # Save the indices corresponding to the reduced form lag polynomial
+        # parameters in the transition and selection matrices so that they
+        # do not have to be recalculated for each update()
+        start_row = self._k_states_diff
+        end_row = start_row + self.k_ar + self.k_seasonal_ar
+        col = self._k_states_diff
+        if not self.hamilton_representation:
+            self.transition_ar_params_idx = (
+                np.s_['transition', start_row:end_row, col]
+            )
+        else:
+            self.transition_ar_params_idx = (
+                np.s_['transition', col, start_row:end_row]
+            )
+
+        start_row += 1
+        end_row = start_row + self.k_ma + self.k_seasonal_ma
+        col = 0
+        if not self.hamilton_representation:
+            self.selection_ma_params_idx = (
+                np.s_['selection', start_row:end_row, col]
+            )
+        else:
+            self.design_ma_params_idx = (
+                np.s_['design', col, start_row:end_row]
+            )
+
+        # Cache indices for exog variances in the state covariance matrix
+        if self.state_regression and self.time_varying_regression:
+            idx = np.diag_indices(self.k_posdef)
+            self._exog_variance_idx = ('state_cov', idx[0][-self._k_exog:],
+                                       idx[1][-self._k_exog:])

     def initialize_default(self, approximate_diffuse_variance=None):
         """Initialize default"""
-        pass
+        if approximate_diffuse_variance is None:
+            approximate_diffuse_variance = self.ssm.initial_variance
+        if self.use_exact_diffuse:
+            diffuse_type = 'diffuse'
+        else:
+            diffuse_type = 'approximate_diffuse'
+
+            # Set the loglikelihood burn parameter, if not given in constructor
+            if self._loglikelihood_burn is None:
+                k_diffuse_states = self.k_states
+                if self.enforce_stationarity:
+                    k_diffuse_states -= self._k_order
+                self.loglikelihood_burn = k_diffuse_states
+
+        init = Initialization(
+            self.k_states,
+            approximate_diffuse_variance=approximate_diffuse_variance)
+
+        if self.enforce_stationarity:
+            # Differencing operators are at the beginning
+            init.set((0, self._k_states_diff), diffuse_type)
+            # Stationary component in the middle
+            init.set((self._k_states_diff,
+                      self._k_states_diff + self._k_order),
+                     'stationary')
+            # Regression components at the end
+            init.set((self._k_states_diff + self._k_order,
+                      self._k_states_diff + self._k_order + self._k_exog),
+                     diffuse_type)
+        # If we're not enforcing a stationarity, then we cannot initialize a
+        # stationary component
+        else:
+            init.set(None, diffuse_type)
+
+        self.ssm.initialization = init
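
A minimal sketch of the block-wise Initialization object built above, with
hypothetical block sizes (one differencing state, two ARMA states, one
regression coefficient):

    from statsmodels.tsa.statespace.initialization import Initialization

    k_states = 4
    init = Initialization(k_states, approximate_diffuse_variance=1e6)
    init.set((0, 1), 'approximate_diffuse')   # differencing state
    init.set((1, 3), 'stationary')            # ARMA block
    init.set((3, 4), 'approximate_diffuse')   # regression coefficient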

     @property
     def initial_design(self):
         """Initial design matrix"""
-        pass
+        # Basic design matrix
+        design = np.r_[
+            [1] * self._k_diff,
+            ([0] * (self.seasonal_periods - 1) + [1]) * self._k_seasonal_diff,
+            [1] * self.state_error, [0] * (self._k_order - 1)
+        ]
+
+        if len(design) == 0:
+            design = np.r_[0]
+
+        # If we have exogenous regressors included as part of the state vector
+        # then the exogenous data is incorporated as a time-varying component
+        # of the design matrix
+        if self.state_regression:
+            if self._k_order > 0:
+                design = np.c_[
+                    np.reshape(
+                        np.repeat(design, self.nobs),
+                        (design.shape[0], self.nobs)
+                    ).T,
+                    self.exog
+                ].T[None, :, :]
+            else:
+                design = self.exog.T[None, :, :]
+        return design
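
A small numeric illustration of the np.r_ construction above, assuming one
regular difference, no seasonal differencing, a state error, and an ARMA
block of order three:

    import numpy as np

    k_diff, k_seasonal_diff, seasonal_periods = 1, 0, 0
    k_order, state_error = 3, 1
    design = np.r_[
        [1] * k_diff,
        ([0] * (seasonal_periods - 1) + [1]) * k_seasonal_diff,
        [1] * state_error, [0] * (k_order - 1)
    ]
    print(design)   # [1. 1. 0. 0.]: differencing and first ARMA states are observed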

     @property
     def initial_state_intercept(self):
         """Initial state intercept vector"""
-        pass
+        # TODO make this self._k_trend > 1 and adjust the update to take
+        # into account that if the trend is a constant, it is not time-varying
+        if self._k_trend > 0:
+            state_intercept = np.zeros((self.k_states, self.nobs))
+        else:
+            state_intercept = np.zeros((self.k_states,))
+        return state_intercept

     @property
     def initial_transition(self):
         """Initial transition matrix"""
-        pass
+        transition = np.zeros((self.k_states, self.k_states))
+
+        # Exogenous regressors component
+        if self.state_regression:
+            start = -self._k_exog
+            # T_\beta
+            transition[start:, start:] = np.eye(self._k_exog)
+
+            # Autoregressive component
+            start = -(self._k_exog + self._k_order)
+            end = -self._k_exog if self._k_exog > 0 else None
+        else:
+            # Autoregressive component
+            start = -self._k_order
+            end = None
+
+        # T_c
+        if self._k_order > 0:
+            transition[start:end, start:end] = companion_matrix(self._k_order)
+            if self.hamilton_representation:
+                transition[start:end, start:end] = np.transpose(
+                    companion_matrix(self._k_order)
+                )
+
+        # Seasonal differencing component
+        # T^*
+        if self._k_seasonal_diff > 0:
+            seasonal_companion = companion_matrix(self.seasonal_periods).T
+            seasonal_companion[0, -1] = 1
+            for d in range(self._k_seasonal_diff):
+                start = self._k_diff + d * self.seasonal_periods
+                end = self._k_diff + (d + 1) * self.seasonal_periods
+
+                # T_c^*
+                transition[start:end, start:end] = seasonal_companion
+
+                # i
+                if d < self._k_seasonal_diff - 1:
+                    transition[start, end + self.seasonal_periods - 1] = 1
+
+                # \iota
+                transition[start, self._k_states_diff] = 1
+
+        # Differencing component
+        if self._k_diff > 0:
+            idx = np.triu_indices(self._k_diff)
+            # T^**
+            transition[idx] = 1
+            # [0 1]
+            if self.seasonal_periods > 0:
+                start = self._k_diff
+                end = self._k_states_diff
+                transition[:self._k_diff, start:end] = (
+                    ([0] * (self.seasonal_periods - 1) + [1]) *
+                    self._k_seasonal_diff)
+            # [1 0]
+            column = self._k_states_diff
+            transition[:self._k_diff, column] = 1
+
+        return transition

     @property
     def initial_selection(self):
         """Initial selection matrix"""
-        pass
+        if not (self.state_regression and self.time_varying_regression):
+            if self.k_posdef > 0:
+                selection = np.r_[
+                    [0] * (self._k_states_diff),
+                    [1] * (self._k_order > 0), [0] * (self._k_order - 1),
+                    [0] * ((1 - self.mle_regression) * self._k_exog)
+                ][:, None]
+
+                if len(selection) == 0:
+                    selection = np.zeros((self.k_states, self.k_posdef))
+            else:
+                selection = np.zeros((self.k_states, 0))
+        else:
+            selection = np.zeros((self.k_states, self.k_posdef))
+            # Typical state variance
+            if self._k_order > 0:
+                selection[0, 0] = 1
+            # Time-varying regression coefficient variances
+            for i in range(self._k_exog, 0, -1):
+                selection[-i, -i] = 1
+        return selection
+
+    def clone(self, endog, exog=None, **kwargs):
+        return self._clone_from_init_kwds(endog, exog=exog, **kwargs)
+
+    @property
+    def _res_classes(self):
+        return {'fit': (SARIMAXResults, SARIMAXResultsWrapper)}
+
+    @staticmethod
+    def _conditional_sum_squares(endog, k_ar, polynomial_ar, k_ma,
+                                 polynomial_ma, k_trend=0, trend_data=None,
+                                 warning_description=None):
+        k = 2 * k_ma
+        r = max(k + k_ma, k_ar)
+
+        k_params_ar = 0 if k_ar == 0 else len(polynomial_ar.nonzero()[0]) - 1
+        k_params_ma = 0 if k_ma == 0 else len(polynomial_ma.nonzero()[0]) - 1
+
+        residuals = None
+        if k_ar + k_ma + k_trend > 0:
+            try:
+                # If we have MA terms, get residuals from an AR(k) model to use
+                # as data for conditional sum of squares estimates of the MA
+                # parameters
+                if k_ma > 0:
+                    Y = endog[k:]
+                    X = lagmat(endog, k, trim='both')
+                    params_ar = np.linalg.pinv(X).dot(Y)
+                    residuals = Y - np.dot(X, params_ar)
+
+                # Run an ARMA(p,q) model using the just computed residuals as
+                # data
+                Y = endog[r:]
+
+                X = np.empty((Y.shape[0], 0))
+                if k_trend > 0:
+                    if trend_data is None:
+                        raise ValueError('Trend data must be provided if'
+                                         ' `k_trend` > 0.')
+                    X = np.c_[X, trend_data[:(-r if r > 0 else None), :]]
+                if k_ar > 0:
+                    cols = polynomial_ar.nonzero()[0][1:] - 1
+                    X = np.c_[X, lagmat(endog, k_ar)[r:, cols]]
+                if k_ma > 0:
+                    cols = polynomial_ma.nonzero()[0][1:] - 1
+                    X = np.c_[X, lagmat(residuals, k_ma)[r-k:, cols]]
+
+                # Get the array of [ar_params, ma_params]
+                params = np.linalg.pinv(X).dot(Y)
+                residuals = Y - np.dot(X, params)
+            except ValueError:
+                if warning_description is not None:
+                    warning_description = ' for %s' % warning_description
+                else:
+                    warning_description = ''
+                warn('Too few observations to estimate starting parameters%s.'
+                     ' All parameters except for variances will be set to'
+                     ' zeros.' % warning_description)
+                # Typically this will be raised if there are not enough
+                # observations for the `lagmat` calls.
+                params = np.zeros(k_trend + k_ar + k_ma, dtype=endog.dtype)
+                if len(endog) == 0:
+                    # This case usually happens when there are not even enough
+                    # observations for a complete set of differencing
+                    # operations (no hope of fitting, just set starting
+                    # variance to 1)
+                    residuals = np.ones(k_params_ma * 2 + 1, dtype=endog.dtype)
+                else:
+                    residuals = np.r_[
+                        np.zeros(k_params_ma * 2, dtype=endog.dtype),
+                        endog - np.mean(endog)]
+
+        # Default output
+        params_trend = []
+        params_ar = []
+        params_ma = []
+        params_variance = []
+
+        # Get the params
+        offset = 0
+        if k_trend > 0:
+            params_trend = params[offset:k_trend + offset]
+            offset += k_trend
+        if k_ar > 0:
+            params_ar = params[offset:k_params_ar + offset]
+            offset += k_params_ar
+        if k_ma > 0:
+            params_ma = params[offset:k_params_ma + offset]
+            offset += k_params_ma
+        if residuals is not None:
+            if len(residuals) > max(1, k_params_ma):
+                params_variance = (residuals[k_params_ma:] ** 2).mean()
+            else:
+                params_variance = np.var(endog)
+
+        return (params_trend, params_ar, params_ma,
+                params_variance)
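
At its core, the conditional-sum-of-squares step above is an OLS regression on
lagged values; a short sketch of the AR-only case using lagmat (simulated
AR(2) data with coefficients 0.6 and -0.2):

    import numpy as np
    from statsmodels.tsa.tsatools import lagmat

    rng = np.random.default_rng(0)
    y = np.zeros(300)
    for t in range(2, 300):
        y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.standard_normal()

    X = lagmat(y, 2, trim='both')        # columns: y_{t-1}, y_{t-2}
    Y = y[2:]
    params_ar = np.linalg.pinv(X).dot(Y)
    print(params_ar)                     # roughly [0.6, -0.2]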

     @property
     def start_params(self):
         """
         Starting parameters for maximum likelihood estimation
         """
-        pass
+
+        # Perform differencing if necessary (i.e. if simple differencing is
+        # false so that the state-space model will use the entire dataset)
+        trend_data = self._trend_data
+        if not self.simple_differencing and (
+           self._k_diff > 0 or self._k_seasonal_diff > 0):
+            endog = diff(self.endog, self._k_diff,
+                         self._k_seasonal_diff, self.seasonal_periods)
+            if self.exog is not None:
+                exog = diff(self.exog, self._k_diff,
+                            self._k_seasonal_diff, self.seasonal_periods)
+            else:
+                exog = None
+            trend_data = trend_data[:endog.shape[0], :]
+        else:
+            endog = self.endog.copy()
+            exog = self.exog.copy() if self.exog is not None else None
+        endog = endog.squeeze()
+
+        # Although the Kalman filter can deal with missing values in endog,
+        # conditional sum of squares cannot
+        if np.any(np.isnan(endog)):
+            mask = ~np.isnan(endog).squeeze()
+            endog = endog[mask]
+            if exog is not None:
+                exog = exog[mask]
+            if trend_data is not None:
+                trend_data = trend_data[mask]
+
+        # Regression effects via OLS
+        params_exog = []
+        if self._k_exog > 0:
+            params_exog = np.linalg.pinv(exog).dot(endog)
+            endog = endog - np.dot(exog, params_exog)
+        if self.state_regression:
+            params_exog = []
+
+        # Non-seasonal ARMA component and trend
+        (params_trend, params_ar, params_ma,
+         params_variance) = self._conditional_sum_squares(
+            endog, self.k_ar, self.polynomial_ar, self.k_ma,
+            self.polynomial_ma, self._k_trend, trend_data,
+            warning_description='ARMA and trend')
+
+        # If we have estimated non-stationary start parameters but enforce
+        # stationarity is on, start with 0 parameters and warn
+        invalid_ar = (
+            self.k_ar > 0 and
+            self.enforce_stationarity and
+            not is_invertible(np.r_[1, -params_ar])
+        )
+        if invalid_ar:
+            warn('Non-stationary starting autoregressive parameters'
+                 ' found. Using zeros as starting parameters.')
+            params_ar *= 0
+
+        # If we have estimated non-invertible start parameters but enforce
+        # invertibility is on, warn and set start params to 0
+        invalid_ma = (
+            self.k_ma > 0 and
+            self.enforce_invertibility and
+            not is_invertible(np.r_[1, params_ma])
+        )
+        if invalid_ma:
+            warn('Non-invertible starting MA parameters found.'
+                 ' Using zeros as starting parameters.')
+            params_ma *= 0
+
+        # Seasonal Parameters
+        _, params_seasonal_ar, params_seasonal_ma, params_seasonal_variance = (
+            self._conditional_sum_squares(
+                endog, self.k_seasonal_ar, self.polynomial_seasonal_ar,
+                self.k_seasonal_ma, self.polynomial_seasonal_ma,
+                warning_description='seasonal ARMA'))
+
+        # If we have estimated non-stationary start parameters but enforce
+        # stationarity is on, warn and set start params to 0
+        invalid_seasonal_ar = (
+            self.k_seasonal_ar > 0 and
+            self.enforce_stationarity and
+            not is_invertible(np.r_[1, -params_seasonal_ar])
+        )
+        if invalid_seasonal_ar:
+            warn('Non-stationary starting seasonal autoregressive parameters'
+                 ' found. Using zeros as starting parameters.')
+            params_seasonal_ar *= 0
+
+        # If we have estimated non-invertible start parameters but enforce
+        # invertibility is on, warn and set start params to 0
+        invalid_seasonal_ma = (
+            self.k_seasonal_ma > 0 and
+            self.enforce_invertibility and
+            not is_invertible(np.r_[1, params_seasonal_ma])
+        )
+        if invalid_seasonal_ma:
+            warn('Non-invertible starting seasonal moving average parameters'
+                 ' found. Using zeros as starting parameters.')
+            params_seasonal_ma *= 0
+
+        # Variances
+        params_exog_variance = []
+        if self.state_regression and self.time_varying_regression:
+            # TODO how to set the initial variance parameters?
+            params_exog_variance = [1] * self._k_exog
+        if (self.state_error and type(params_variance) is list and
+                len(params_variance) == 0):
+            if not (type(params_seasonal_variance) is list and
+                    len(params_seasonal_variance) == 0):
+                params_variance = params_seasonal_variance
+            elif self._k_exog > 0:
+                params_variance = np.inner(endog, endog)
+            else:
+                params_variance = np.inner(endog, endog) / self.nobs
+        params_measurement_variance = 1 if self.measurement_error else []
+
+        # We want to bound the starting variance away from zero
+        params_variance = np.atleast_1d(max(np.array(params_variance), 1e-10))
+
+        # Remove state variance as parameter if scale is concentrated out
+        if self.concentrate_scale:
+            params_variance = []
+
+        # Combine all parameters
+        return np.r_[
+            params_trend,
+            params_exog,
+            params_ar,
+            params_ma,
+            params_seasonal_ar,
+            params_seasonal_ma,
+            params_exog_variance,
+            params_measurement_variance,
+            params_variance
+        ]
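
A small sketch of the stationarity screen applied above via is_invertible
(coefficients are ordered from lowest degree upward, as in the code):

    import numpy as np
    from statsmodels.tsa.statespace.tools import is_invertible

    print(is_invertible(np.r_[1, -0.5]))   # True: the implied AR(1) is stationary
    print(is_invertible(np.r_[1, -1.1]))   # False: this start value would be zeroed out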

     @property
     def endog_names(self, latex=False):
         """Names of endogenous variables"""
-        pass
-    params_complete = ['trend', 'exog', 'ar', 'ma', 'seasonal_ar',
-        'seasonal_ma', 'exog_variance', 'measurement_variance', 'variance']
+        diff = ''
+        if self.k_diff > 0:
+            if self.k_diff == 1:
+                diff = r'\Delta' if latex else 'D'
+            else:
+                diff = (r'\Delta^%d' if latex else 'D%d') % self.k_diff
+
+        seasonal_diff = ''
+        if self.k_seasonal_diff > 0:
+            if self.k_seasonal_diff == 1:
+                seasonal_diff = ((r'\Delta_%d' if latex else 'DS%d') %
+                                 (self.seasonal_periods))
+            else:
+                seasonal_diff = ((r'\Delta_%d^%d' if latex else 'D%dS%d') %
+                                 (self.k_seasonal_diff, self.seasonal_periods))
+        endog_diff = self.simple_differencing
+        if endog_diff and self.k_diff > 0 and self.k_seasonal_diff > 0:
+            return (('%s%s %s' if latex else '%s.%s.%s') %
+                    (diff, seasonal_diff, self.data.ynames))
+        elif endog_diff and self.k_diff > 0:
+            return (('%s %s' if latex else '%s.%s') %
+                    (diff, self.data.ynames))
+        elif endog_diff and self.k_seasonal_diff > 0:
+            return (('%s %s' if latex else '%s.%s') %
+                    (seasonal_diff, self.data.ynames))
+        else:
+            return self.data.ynames
+
+    params_complete = [
+        'trend', 'exog', 'ar', 'ma', 'seasonal_ar', 'seasonal_ma',
+        'exog_variance', 'measurement_variance', 'variance'
+    ]

     @property
     def param_terms(self):
@@ -498,7 +1089,18 @@ class SARIMAX(MLEModel):

         TODO Make this a dict with slice or indices as the values.
         """
-        pass
+        model_orders = self.model_orders
+        # Get basic list from model orders
+        params = [
+            order for order in self.params_complete
+            if model_orders[order] > 0
+        ]
+        # k_exog may be positive without associated parameters if it is in the
+        # state vector
+        if 'exog' in params and not self.mle_regression:
+            params.remove('exog')
+
+        return params

     @property
     def param_names(self):
@@ -506,28 +1108,177 @@ class SARIMAX(MLEModel):
         List of human readable parameter names (for parameters actually
         included in the model).
         """
-        pass
+        params_sort_order = self.param_terms
+        model_names = self.model_names
+        return [
+            name for param in params_sort_order for name in model_names[param]
+        ]
+
+    @property
+    def state_names(self):
+        # TODO: we may be able to revisit these states to get somewhat more
+        # informative names, but ultimately probably not much better.
+        # TODO: alternatively, we may be able to get better for certain models,
+        # like pure AR models.
+        k_ar_states = self._k_order
+        if not self.simple_differencing:
+            k_ar_states += (self.seasonal_periods * self._k_seasonal_diff +
+                            self._k_diff)
+        names = ['state.%d' % i for i in range(k_ar_states)]
+
+        if self._k_exog > 0 and self.state_regression:
+            names += ['beta.%s' % self.exog_names[i]
+                      for i in range(self._k_exog)]
+
+        return names

     @property
     def model_orders(self):
         """
         The orders of each of the polynomials in the model.
         """
-        pass
+        return {
+            'trend': self._k_trend,
+            'exog': self._k_exog,
+            'ar': self.k_ar,
+            'ma': self.k_ma,
+            'seasonal_ar': self.k_seasonal_ar,
+            'seasonal_ma': self.k_seasonal_ma,
+            'reduced_ar': self.k_ar + self.k_seasonal_ar,
+            'reduced_ma': self.k_ma + self.k_seasonal_ma,
+            'exog_variance': self._k_exog if (
+                self.state_regression and self.time_varying_regression) else 0,
+            'measurement_variance': int(self.measurement_error),
+            'variance': int(self.state_error and not self.concentrate_scale),
+        }

     @property
     def model_names(self):
         """
         The plain text names of all possible model parameters.
         """
-        pass
+        return self._get_model_names(latex=False)

     @property
     def model_latex_names(self):
         """
         The latex names of all possible model parameters.
         """
-        pass
+        return self._get_model_names(latex=True)
+
+    def _get_model_names(self, latex=False):
+        names = {
+            'trend': None,
+            'exog': None,
+            'ar': None,
+            'ma': None,
+            'seasonal_ar': None,
+            'seasonal_ma': None,
+            'reduced_ar': None,
+            'reduced_ma': None,
+            'exog_variance': None,
+            'measurement_variance': None,
+            'variance': None,
+        }
+
+        # Trend
+        if self._k_trend > 0:
+            trend_template = 't_%d' if latex else 'trend.%d'
+            names['trend'] = []
+            for i in self.polynomial_trend.nonzero()[0]:
+                if i == 0:
+                    names['trend'].append('intercept')
+                elif i == 1:
+                    names['trend'].append('drift')
+                else:
+                    names['trend'].append(trend_template % i)
+
+        # Exogenous coefficients
+        if self._k_exog > 0:
+            names['exog'] = self.exog_names
+
+        # Autoregressive
+        if self.k_ar > 0:
+            ar_template = '$\\phi_%d$' if latex else 'ar.L%d'
+            names['ar'] = []
+            for i in self.polynomial_ar.nonzero()[0][1:]:
+                names['ar'].append(ar_template % i)
+
+        # Moving Average
+        if self.k_ma > 0:
+            ma_template = '$\\theta_%d$' if latex else 'ma.L%d'
+            names['ma'] = []
+            for i in self.polynomial_ma.nonzero()[0][1:]:
+                names['ma'].append(ma_template % i)
+
+        # Seasonal Autoregressive
+        if self.k_seasonal_ar > 0:
+            seasonal_ar_template = (
+                '$\\tilde \\phi_%d$' if latex else 'ar.S.L%d'
+            )
+            names['seasonal_ar'] = []
+            for i in self.polynomial_seasonal_ar.nonzero()[0][1:]:
+                names['seasonal_ar'].append(seasonal_ar_template % i)
+
+        # Seasonal Moving Average
+        if self.k_seasonal_ma > 0:
+            seasonal_ma_template = (
+                '$\\tilde \\theta_%d$' if latex else 'ma.S.L%d'
+            )
+            names['seasonal_ma'] = []
+            for i in self.polynomial_seasonal_ma.nonzero()[0][1:]:
+                names['seasonal_ma'].append(seasonal_ma_template % i)
+
+        # Reduced Form Autoregressive
+        if self.k_ar > 0 or self.k_seasonal_ar > 0:
+            reduced_polynomial_ar = -np.polymul(
+                self.polynomial_ar, self.polynomial_seasonal_ar
+            )
+            ar_template = '$\\Phi_%d$' if latex else 'ar.R.L%d'
+            names['reduced_ar'] = []
+            for i in reduced_polynomial_ar.nonzero()[0][1:]:
+                names['reduced_ar'].append(ar_template % i)
+
+        # Reduced Form Moving Average
+        if self.k_ma > 0 or self.k_seasonal_ma > 0:
+            reduced_polynomial_ma = np.polymul(
+                self.polynomial_ma, self.polynomial_seasonal_ma
+            )
+            ma_template = '$\\Theta_%d$' if latex else 'ma.R.L%d'
+            names['reduced_ma'] = []
+            for i in reduced_polynomial_ma.nonzero()[0][1:]:
+                names['reduced_ma'].append(ma_template % i)
+
+        # Exogenous variances
+        if self.state_regression and self.time_varying_regression:
+            if not self.concentrate_scale:
+                exog_var_template = ('$\\sigma_\\text{%s}^2$' if latex
+                                     else 'var.%s')
+            else:
+                exog_var_template = (
+                    '$\\sigma_\\text{%s}^2 / \\sigma_\\zeta^2$' if latex
+                    else 'snr.%s')
+            names['exog_variance'] = [
+                exog_var_template % exog_name for exog_name in self.exog_names
+            ]
+
+        # Measurement error variance
+        if self.measurement_error:
+            if not self.concentrate_scale:
+                meas_var_tpl = (
+                    '$\\sigma_\\eta^2$' if latex else 'var.measurement_error')
+            else:
+                meas_var_tpl = (
+                    '$\\sigma_\\eta^2 / \\sigma_\\zeta^2$' if latex
+                    else 'snr.measurement_error')
+            names['measurement_variance'] = [meas_var_tpl]
+
+        # State variance
+        if self.state_error and not self.concentrate_scale:
+            var_tpl = '$\\sigma_\\zeta^2$' if latex else 'sigma2'
+            names['variance'] = [var_tpl]
+
+        return names

     def transform_params(self, unconstrained):
         """
@@ -556,7 +1307,82 @@ class SARIMAX(MLEModel):
         polynomials, although it only excludes a very small portion very close
         to the invertibility boundary.
         """
-        pass
+        unconstrained = np.array(unconstrained, ndmin=1)
+        constrained = np.zeros(unconstrained.shape, unconstrained.dtype)
+
+        start = end = 0
+
+        # Retain the trend parameters
+        if self._k_trend > 0:
+            end += self._k_trend
+            constrained[start:end] = unconstrained[start:end]
+            start += self._k_trend
+
+        # Retain any MLE regression coefficients
+        if self.mle_regression:
+            end += self._k_exog
+            constrained[start:end] = unconstrained[start:end]
+            start += self._k_exog
+
+        # Transform the AR parameters (phi) to be stationary
+        if self.k_ar_params > 0:
+            end += self.k_ar_params
+            if self.enforce_stationarity:
+                constrained[start:end] = (
+                    constrain_stationary_univariate(unconstrained[start:end])
+                )
+            else:
+                constrained[start:end] = unconstrained[start:end]
+            start += self.k_ar_params
+
+        # Transform the MA parameters (theta) to be invertible
+        if self.k_ma_params > 0:
+            end += self.k_ma_params
+            if self.enforce_invertibility:
+                constrained[start:end] = (
+                    -constrain_stationary_univariate(unconstrained[start:end])
+                )
+            else:
+                constrained[start:end] = unconstrained[start:end]
+            start += self.k_ma_params
+
+        # Transform the seasonal AR parameters (\tilde phi) to be stationary
+        if self.k_seasonal_ar > 0:
+            end += self.k_seasonal_ar_params
+            if self.enforce_stationarity:
+                constrained[start:end] = (
+                    constrain_stationary_univariate(unconstrained[start:end])
+                )
+            else:
+                constrained[start:end] = unconstrained[start:end]
+            start += self.k_seasonal_ar_params
+
+        # Transform the seasonal MA parameters (\tilde theta) to be invertible
+        if self.k_seasonal_ma_params > 0:
+            end += self.k_seasonal_ma_params
+            if self.enforce_invertibility:
+                constrained[start:end] = (
+                    -constrain_stationary_univariate(unconstrained[start:end])
+                )
+            else:
+                constrained[start:end] = unconstrained[start:end]
+            start += self.k_seasonal_ma_params
+
+        # Transform the standard deviation parameters to be positive
+        if self.state_regression and self.time_varying_regression:
+            end += self._k_exog
+            constrained[start:end] = unconstrained[start:end]**2
+            start += self._k_exog
+        if self.measurement_error:
+            constrained[start] = unconstrained[start]**2
+            start += 1
+            end += 1
+        if self.state_error and not self.concentrate_scale:
+            constrained[start] = unconstrained[start]**2
+            # start += 1
+            # end += 1
+
+        return constrained
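
A short sketch of the reparameterization used above: any unconstrained vector
maps to the coefficients of a stationary AR polynomial, and
untransform_params applies the inverse map:

    import numpy as np
    from statsmodels.tsa.statespace.tools import (
        constrain_stationary_univariate, unconstrain_stationary_univariate)

    unconstrained = np.array([2.0, -1.5])
    constrained = constrain_stationary_univariate(unconstrained)
    print(constrained)
    roundtrip = unconstrain_stationary_univariate(constrained)
    print(np.allclose(roundtrip, unconstrained))   # True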

     def untransform_params(self, constrained):
         """
@@ -585,10 +1411,108 @@ class SARIMAX(MLEModel):
         polynomials, although it only excludes a very small portion very close
         to the invertibility boundary.
         """
-        pass
+        constrained = np.array(constrained, ndmin=1)
+        unconstrained = np.zeros(constrained.shape, constrained.dtype)
+
+        start = end = 0
+
+        # Retain the trend parameters
+        if self._k_trend > 0:
+            end += self._k_trend
+            unconstrained[start:end] = constrained[start:end]
+            start += self._k_trend
+
+        # Retain any MLE regression coefficients
+        if self.mle_regression:
+            end += self._k_exog
+            unconstrained[start:end] = constrained[start:end]
+            start += self._k_exog
+
+        # Transform the AR parameters (phi) to be stationary
+        if self.k_ar_params > 0:
+            end += self.k_ar_params
+            if self.enforce_stationarity:
+                unconstrained[start:end] = (
+                    unconstrain_stationary_univariate(constrained[start:end])
+                )
+            else:
+                unconstrained[start:end] = constrained[start:end]
+            start += self.k_ar_params
+
+        # Transform the MA parameters (theta) to be invertible
+        if self.k_ma_params > 0:
+            end += self.k_ma_params
+            if self.enforce_invertibility:
+                unconstrained[start:end] = (
+                    unconstrain_stationary_univariate(-constrained[start:end])
+                )
+            else:
+                unconstrained[start:end] = constrained[start:end]
+            start += self.k_ma_params
+
+        # Transform the seasonal AR parameters (\tilde phi) to be stationary
+        if self.k_seasonal_ar > 0:
+            end += self.k_seasonal_ar_params
+            if self.enforce_stationarity:
+                unconstrained[start:end] = (
+                    unconstrain_stationary_univariate(constrained[start:end])
+                )
+            else:
+                unconstrained[start:end] = constrained[start:end]
+            start += self.k_seasonal_ar_params
+
+        # Transform the seasonal MA parameters (\tilde theta) to be invertible
+        if self.k_seasonal_ma_params > 0:
+            end += self.k_seasonal_ma_params
+            if self.enforce_invertibility:
+                unconstrained[start:end] = (
+                    unconstrain_stationary_univariate(-constrained[start:end])
+                )
+            else:
+                unconstrained[start:end] = constrained[start:end]
+            start += self.k_seasonal_ma_params
+
+        # Untransform the standard deviation
+        if self.state_regression and self.time_varying_regression:
+            end += self._k_exog
+            unconstrained[start:end] = constrained[start:end]**0.5
+            start += self._k_exog
+        if self.measurement_error:
+            unconstrained[start] = constrained[start]**0.5
+            start += 1
+            end += 1
+        if self.state_error and not self.concentrate_scale:
+            unconstrained[start] = constrained[start]**0.5
+            # start += 1
+            # end += 1
+
+        return unconstrained
+
+    def _validate_can_fix_params(self, param_names):
+        super(SARIMAX, self)._validate_can_fix_params(param_names)
+        model_names = self.model_names
+
+        items = [
+            ('ar', 'autoregressive', self.enforce_stationarity,
+                '`enforce_stationarity=True`'),
+            ('seasonal_ar', 'seasonal autoregressive',
+                self.enforce_stationarity, '`enforce_stationarity=True`'),
+            ('ma', 'moving average', self.enforce_invertibility,
+                '`enforce_invertibility=True`'),
+            ('seasonal_ma', 'seasonal moving average',
+                self.enforce_invertibility, '`enforce_invertibility=True`')]
+
+        for name, title, condition, condition_desc in items:
+            names = set(model_names[name] or [])
+            fix_all = param_names.issuperset(names)
+            fix_any = len(param_names.intersection(names)) > 0
+            if condition and fix_any and not fix_all:
+                raise ValueError('Cannot fix individual %s parameters when'
+                                 ' %s. Must either fix all %s parameters or'
+                                 ' none.' % (title, condition_desc, title))

     def update(self, params, transformed=True, includes_fixed=False,
-        complex_step=False):
+               complex_step=False):
         """
         Update the parameters of the model

@@ -608,11 +1532,179 @@ class SARIMAX(MLEModel):
         params : array_like
             Array of parameters.
         """
-        pass
+        params = self.handle_params(params, transformed=transformed,
+                                    includes_fixed=includes_fixed)
+
+        params_trend = None
+        params_exog = None
+        params_ar = None
+        params_ma = None
+        params_seasonal_ar = None
+        params_seasonal_ma = None
+        params_exog_variance = None
+        params_measurement_variance = None
+        params_variance = None
+
+        # Extract the parameters
+        start = end = 0
+        end += self._k_trend
+        params_trend = params[start:end]
+        start += self._k_trend
+        if self.mle_regression:
+            end += self._k_exog
+            params_exog = params[start:end]
+            start += self._k_exog
+        end += self.k_ar_params
+        params_ar = params[start:end]
+        start += self.k_ar_params
+        end += self.k_ma_params
+        params_ma = params[start:end]
+        start += self.k_ma_params
+        end += self.k_seasonal_ar_params
+        params_seasonal_ar = params[start:end]
+        start += self.k_seasonal_ar_params
+        end += self.k_seasonal_ma_params
+        params_seasonal_ma = params[start:end]
+        start += self.k_seasonal_ma_params
+        if self.state_regression and self.time_varying_regression:
+            end += self._k_exog
+            params_exog_variance = params[start:end]
+            start += self._k_exog
+        if self.measurement_error:
+            params_measurement_variance = params[start]
+            start += 1
+            end += 1
+        if self.state_error and not self.concentrate_scale:
+            params_variance = params[start]
+        # start += 1
+        # end += 1
+
+        # Update lag polynomials
+        if self.k_ar > 0:
+            if self._polynomial_ar.dtype == params.dtype:
+                self._polynomial_ar[self._polynomial_ar_idx] = -params_ar
+            else:
+                polynomial_ar = self._polynomial_ar.real.astype(params.dtype)
+                polynomial_ar[self._polynomial_ar_idx] = -params_ar
+                self._polynomial_ar = polynomial_ar
+
+        if self.k_ma > 0:
+            if self._polynomial_ma.dtype == params.dtype:
+                self._polynomial_ma[self._polynomial_ma_idx] = params_ma
+            else:
+                polynomial_ma = self._polynomial_ma.real.astype(params.dtype)
+                polynomial_ma[self._polynomial_ma_idx] = params_ma
+                self._polynomial_ma = polynomial_ma
+
+        if self.k_seasonal_ar > 0:
+            idx = self._polynomial_seasonal_ar_idx
+            if self._polynomial_seasonal_ar.dtype == params.dtype:
+                self._polynomial_seasonal_ar[idx] = -params_seasonal_ar
+            else:
+                polynomial_seasonal_ar = (
+                    self._polynomial_seasonal_ar.real.astype(params.dtype)
+                )
+                polynomial_seasonal_ar[idx] = -params_seasonal_ar
+                self._polynomial_seasonal_ar = polynomial_seasonal_ar

-    def _get_extension_time_varying_matrices(self, params, exog,
-        out_of_sample, extend_kwargs=None, transformed=True, includes_fixed
-        =False, **kwargs):
+        if self.k_seasonal_ma > 0:
+            idx = self._polynomial_seasonal_ma_idx
+            if self._polynomial_seasonal_ma.dtype == params.dtype:
+                self._polynomial_seasonal_ma[idx] = params_seasonal_ma
+            else:
+                polynomial_seasonal_ma = (
+                    self._polynomial_seasonal_ma.real.astype(params.dtype)
+                )
+                polynomial_seasonal_ma[idx] = params_seasonal_ma
+                self._polynomial_seasonal_ma = polynomial_seasonal_ma
+
+        # Get the reduced form lag polynomial terms by multiplying the regular
+        # and seasonal lag polynomials
+        # Note: although the numpy np.polymul examples assume coefficients are
+        # ordered from highest degree to lowest, whereas ours are ordered from
+        # lowest to highest, the result is the same either way.
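+        # For instance, np.polymul([1., -0.5], [1., -0.3]) and
+        # np.polymul([1., -0.3], [1., -0.5]) both give [1., -0.8, 0.15],
+        # whichever ordering convention is assumed.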
+        if self.k_seasonal_ar > 0:
+            reduced_polynomial_ar = -np.polymul(
+                self._polynomial_ar, self._polynomial_seasonal_ar
+            )
+        else:
+            reduced_polynomial_ar = -self._polynomial_ar
+        if self.k_seasonal_ma > 0:
+            reduced_polynomial_ma = np.polymul(
+                self._polynomial_ma, self._polynomial_seasonal_ma
+            )
+        else:
+            reduced_polynomial_ma = self._polynomial_ma
+
+        # Observation intercept
+        # Exogenous data with MLE estimation of parameters enters through a
+        # time-varying observation intercept (this is equivalent to simply
+        # subtracting it out of the endogenous variable first)
+        if self.mle_regression:
+            self.ssm['obs_intercept'] = np.dot(self.exog, params_exog)[None, :]
+
+        # State intercept (Harvey) or additional observation intercept
+        # (Hamilton)
+        # SARIMA trend enters through a time-varying state intercept,
+        # associated with the first row of the stationary component of the
+        # state vector (i.e. the first element of the state vector following
+        # any differencing elements)
+        if self._k_trend > 0:
+            data = np.dot(self._trend_data, params_trend).astype(params.dtype)
+            if not self.hamilton_representation:
+                self.ssm['state_intercept', self._k_states_diff, :] = data
+            else:
+                # The way the trend enters in the Hamilton representation means
+                # that the parameter is not an ``intercept'' but instead the
+                # mean of the process. The trend values in `data` are meant for
+                # an intercept, and so must be transformed to represent the
+                # mean instead
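+                # For example, in the AR(1) case reduced_polynomial_ar is
+                # [-1, phi], so np.sum(-reduced_polynomial_ar) equals (1 - phi)
+                # and an intercept c becomes the mean c / (1 - phi).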
+                data /= np.sum(-reduced_polynomial_ar)
+
+                # If we already set the observation intercept for MLE
+                # regression, just add to it
+                if self.mle_regression:
+                    self.ssm.obs_intercept += data[None, :]
+                # Otherwise set it directly
+                else:
+                    self.ssm['obs_intercept'] = data[None, :]
+
+        # Observation covariance matrix
+        if self.measurement_error:
+            self.ssm['obs_cov', 0, 0] = params_measurement_variance
+
+        # Transition matrix
+        if self.k_ar > 0 or self.k_seasonal_ar > 0:
+            self.ssm[self.transition_ar_params_idx] = reduced_polynomial_ar[1:]
+        elif not self.ssm.transition.dtype == params.dtype:
+            # This is required if the transition matrix is not really in use
+            # (e.g. for an MA(q) process): in that case its dtype would never
+            # change as the parameters' dtype changes, so it is updated
+            # manually here.
+            self.ssm['transition'] = self.ssm['transition'].real.astype(
+                params.dtype)
+
+        # Selection matrix (Harvey) or Design matrix (Hamilton)
+        if self.k_ma > 0 or self.k_seasonal_ma > 0:
+            if not self.hamilton_representation:
+                self.ssm[self.selection_ma_params_idx] = (
+                    reduced_polynomial_ma[1:]
+                )
+            else:
+                self.ssm[self.design_ma_params_idx] = reduced_polynomial_ma[1:]
+
+        # State covariance matrix
+        if self.k_posdef > 0:
+            if not self.concentrate_scale:
+                self['state_cov', 0, 0] = params_variance
+            if self.state_regression and self.time_varying_regression:
+                self.ssm[self._exog_variance_idx] = params_exog_variance
+
+        return params
+
+    def _get_extension_time_varying_matrices(
+            self, params, exog, out_of_sample, extend_kwargs=None,
+            transformed=True, includes_fixed=False, **kwargs):
         """
         Get time-varying state space system matrices for extended model

@@ -621,7 +1713,49 @@ class SARIMAX(MLEModel):
         We need to override this method for SARIMAX because we need some
         special handling in the `simple_differencing=True` case.
         """
-        pass
+
+        # Get the appropriate exog for the extended sample
+        exog = self._validate_out_of_sample_exog(exog, out_of_sample)
+
+        # Get the tmp endog, exog
+        if self.simple_differencing:
+            nobs = self.data.orig_endog.shape[0] + out_of_sample
+            tmp_endog = np.zeros((nobs, self.k_endog))
+            if exog is not None:
+                tmp_exog = np.c_[self.data.orig_exog.T, exog.T].T
+            else:
+                tmp_exog = None
+        else:
+            tmp_endog = np.zeros((out_of_sample, self.k_endog))
+            tmp_exog = exog
+
+        # Create extended model
+        if extend_kwargs is None:
+            extend_kwargs = {}
+        if not self.simple_differencing and self.k_trend > 0:
+            extend_kwargs.setdefault(
+                'trend_offset', self.trend_offset + self.nobs)
+        extend_kwargs.setdefault('validate_specification', False)
+        mod_extend = self.clone(
+            endog=tmp_endog, exog=tmp_exog, **extend_kwargs)
+        mod_extend.update(params, transformed=transformed,
+                          includes_fixed=includes_fixed,)
+
+        # Retrieve the extensions to the time-varying system matrices and
+        # put them in kwargs
+        for name in self.ssm.shapes.keys():
+            if name == 'obs' or name in kwargs:
+                continue
+            original = getattr(self.ssm, name)
+            extended = getattr(mod_extend.ssm, name)
+            so = original.shape[-1]
+            se = extended.shape[-1]
+            if ((so > 1 or se > 1) or (
+                    so == 1 and self.nobs == 1 and
+                    np.any(original[..., 0] != extended[..., 0]))):
+                kwargs[name] = extended[..., -out_of_sample:]
+
+        return kwargs


 class SARIMAXResults(MLEResults):
@@ -667,39 +1801,67 @@ class SARIMAXResults(MLEResults):
     statsmodels.tsa.statespace.kalman_filter.FilterResults
     statsmodels.tsa.statespace.mlemodel.MLEResults
     """
-
-    def __init__(self, model, params, filter_results, cov_type=None, **kwargs):
+    def __init__(self, model, params, filter_results, cov_type=None,
+                 **kwargs):
         super(SARIMAXResults, self).__init__(model, params, filter_results,
-            cov_type, **kwargs)
-        self.df_resid = np.inf
+                                             cov_type, **kwargs)
+
+        self.df_resid = np.inf  # attribute required for wald tests
+
+        # Save _init_kwds
         self._init_kwds = self.model._get_init_kwds()
-        self.specification = Bunch(**{'seasonal_periods': self.model.
-            seasonal_periods, 'measurement_error': self.model.
-            measurement_error, 'time_varying_regression': self.model.
-            time_varying_regression, 'simple_differencing': self.model.
-            simple_differencing, 'enforce_stationarity': self.model.
-            enforce_stationarity, 'enforce_invertibility': self.model.
-            enforce_invertibility, 'hamilton_representation': self.model.
-            hamilton_representation, 'concentrate_scale': self.model.
-            concentrate_scale, 'trend_offset': self.model.trend_offset,
-            'order': self.model.order, 'seasonal_order': self.model.
-            seasonal_order, 'k_diff': self.model.k_diff, 'k_seasonal_diff':
-            self.model.k_seasonal_diff, 'k_ar': self.model.k_ar, 'k_ma':
-            self.model.k_ma, 'k_seasonal_ar': self.model.k_seasonal_ar,
-            'k_seasonal_ma': self.model.k_seasonal_ma, 'k_ar_params': self.
-            model.k_ar_params, 'k_ma_params': self.model.k_ma_params,
-            'trend': self.model.trend, 'k_trend': self.model.k_trend,
-            'k_exog': self.model.k_exog, 'mle_regression': self.model.
-            mle_regression, 'state_regression': self.model.state_regression})
+
+        # Save model specification
+        self.specification = Bunch(**{
+            # Set additional model parameters
+            'seasonal_periods': self.model.seasonal_periods,
+            'measurement_error': self.model.measurement_error,
+            'time_varying_regression': self.model.time_varying_regression,
+            'simple_differencing': self.model.simple_differencing,
+            'enforce_stationarity': self.model.enforce_stationarity,
+            'enforce_invertibility': self.model.enforce_invertibility,
+            'hamilton_representation': self.model.hamilton_representation,
+            'concentrate_scale': self.model.concentrate_scale,
+            'trend_offset': self.model.trend_offset,
+
+            'order': self.model.order,
+            'seasonal_order': self.model.seasonal_order,
+
+            # Model order
+            'k_diff': self.model.k_diff,
+            'k_seasonal_diff': self.model.k_seasonal_diff,
+            'k_ar': self.model.k_ar,
+            'k_ma': self.model.k_ma,
+            'k_seasonal_ar': self.model.k_seasonal_ar,
+            'k_seasonal_ma': self.model.k_seasonal_ma,
+
+            # Param Numbers
+            'k_ar_params': self.model.k_ar_params,
+            'k_ma_params': self.model.k_ma_params,
+
+            # Trend / Regression
+            'trend': self.model.trend,
+            'k_trend': self.model.k_trend,
+            'k_exog': self.model.k_exog,
+
+            'mle_regression': self.model.mle_regression,
+            'state_regression': self.model.state_regression,
+        })
+
+        # Polynomials
         self.polynomial_trend = self.model._polynomial_trend
         self.polynomial_ar = self.model._polynomial_ar
         self.polynomial_ma = self.model._polynomial_ma
         self.polynomial_seasonal_ar = self.model._polynomial_seasonal_ar
         self.polynomial_seasonal_ma = self.model._polynomial_seasonal_ma
-        self.polynomial_reduced_ar = np.polymul(self.polynomial_ar, self.
-            polynomial_seasonal_ar)
-        self.polynomial_reduced_ma = np.polymul(self.polynomial_ma, self.
-            polynomial_seasonal_ma)
+        self.polynomial_reduced_ar = np.polymul(
+            self.polynomial_ar, self.polynomial_seasonal_ar
+        )
+        self.polynomial_reduced_ma = np.polymul(
+            self.polynomial_ma, self.polynomial_seasonal_ma
+        )
+
+        # Distinguish parameters
         self.model_orders = self.model.model_orders
         self.param_terms = self.model.param_terms
         start = end = 0
@@ -717,24 +1879,31 @@ class SARIMAXResults(MLEResults):
             end += k
             setattr(self, '_params_%s' % name, self.params[start:end])
             start += k
+        # GH7527, all terms must be defined
         all_terms = ['ar', 'ma', 'seasonal_ar', 'seasonal_ma', 'variance']
         for name in set(all_terms).difference(self.param_terms):
             setattr(self, '_params_%s' % name, np.empty(0))
+
+        # Handle removing data
         self._data_attr_model.extend(['orig_endog', 'orig_exog'])

+    def extend(self, endog, exog=None, **kwargs):
+        kwargs.setdefault('trend_offset', self.nobs + 1)
+        return super(SARIMAXResults, self).extend(endog, exog=exog, **kwargs)
+
     @cache_readonly
     def arroots(self):
         """
         (array) Roots of the reduced form autoregressive lag polynomial
         """
-        pass
+        return np.roots(self.polynomial_reduced_ar)**-1
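+        # e.g. for a single AR coefficient of 0.5, polynomial_reduced_ar is
+        # [1, -0.5] and the root returned here is 2.0; stationarity corresponds
+        # to all such roots lying outside the unit circle.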

     @cache_readonly
     def maroots(self):
         """
         (array) Roots of the reduced form moving average lag polynomial
         """
-        pass
+        return np.roots(self.polynomial_reduced_ma)**-1

     @cache_readonly
     def arfreq(self):
@@ -742,7 +1911,10 @@ class SARIMAXResults(MLEResults):
         (array) Frequency of the roots of the reduced form autoregressive
         lag polynomial
         """
-        pass
+        z = self.arroots
+        if not z.size:
+            return
+        return np.arctan2(z.imag, z.real) / (2 * np.pi)
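+        # e.g. a negative real root has frequency 0.5 (a two-period
+        # oscillation), while a positive real root has frequency 0.0.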

     @cache_readonly
     def mafreq(self):
@@ -750,7 +1922,10 @@ class SARIMAXResults(MLEResults):
         (array) Frequency of the roots of the reduced form moving average
         lag polynomial
         """
-        pass
+        z = self.maroots
+        if not z.size:
+            return
+        return np.arctan2(z.imag, z.real) / (2 * np.pi)

     @cache_readonly
     def arparams(self):
@@ -760,7 +1935,7 @@ class SARIMAXResults(MLEResults):
         `seasonalarparams`) or parameters whose values are constrained to be
         zero.
         """
-        pass
+        return self._params_ar

     @cache_readonly
     def seasonalarparams(self):
@@ -769,7 +1944,7 @@ class SARIMAXResults(MLEResults):
         model. Does not include nonseasonal autoregressive parameters (see
         `arparams`) or parameters whose values are constrained to be zero.
         """
-        pass
+        return self._params_seasonal_ar

     @cache_readonly
     def maparams(self):
@@ -779,7 +1954,7 @@ class SARIMAXResults(MLEResults):
         `seasonalmaparams`) or parameters whose values are constrained to be
         zero.
         """
-        pass
+        return self._params_ma

     @cache_readonly
     def seasonalmaparams(self):
@@ -788,14 +1963,72 @@ class SARIMAXResults(MLEResults):
         model. Does not include nonseasonal moving average parameters (see
         `maparams`) or parameters whose values are constrained to be zero.
         """
-        pass
+        return self._params_seasonal_ma
+
+    @Appender(MLEResults.summary.__doc__)
+    def summary(self, alpha=.05, start=None):
+        # Create the model name
+
+        # See if we have an ARIMA component
+        order = ''
+        if self.model.k_ar + self.model.k_diff + self.model.k_ma > 0:
+            if self.model.k_ar == self.model.k_ar_params:
+                order_ar = self.model.k_ar
+            else:
+                order_ar = list(self.model._spec.ar_lags)
+            if self.model.k_ma == self.model.k_ma_params:
+                order_ma = self.model.k_ma
+            else:
+                order_ma = list(self.model._spec.ma_lags)
+            # If there is simple differencing, then that is reflected in the
+            # dependent variable name
+            k_diff = 0 if self.model.simple_differencing else self.model.k_diff
+            order = '(%s, %d, %s)' % (order_ar, k_diff, order_ma)
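+            # e.g. this yields order='(1, 1, 1)' for an ARIMA(1, 1, 1)
+            # component; any seasonal part below is appended as e.g.
+            # 'x(1, 0, 1, 12)' to form the reported model name.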
+        # See if we have an SARIMA component
+        seasonal_order = ''
+        has_seasonal = (
+            self.model.k_seasonal_ar +
+            self.model.k_seasonal_diff +
+            self.model.k_seasonal_ma
+        ) > 0
+        if has_seasonal:
+            tmp = int(self.model.k_seasonal_ar / self.model.seasonal_periods)
+            if tmp == self.model.k_seasonal_ar_params:
+                order_seasonal_ar = tmp
+            else:
+                order_seasonal_ar = list(self.model._spec.seasonal_ar_lags)
+            tmp = int(self.model.k_seasonal_ma / self.model.seasonal_periods)
+            if tmp == self.model.k_seasonal_ma_params:
+                order_seasonal_ma = tmp
+            else:
+                order_seasonal_ma = list(self.model._spec.seasonal_ma_lags)
+            # If there is simple differencing, then that is reflected in the
+            # dependent variable name
+            k_seasonal_diff = self.model.k_seasonal_diff
+            if self.model.simple_differencing:
+                k_seasonal_diff = 0
+            seasonal_order = ('(%s, %d, %s, %d)' %
+                              (str(order_seasonal_ar), k_seasonal_diff,
+                               str(order_seasonal_ma),
+                               self.model.seasonal_periods))
+            if not order == '':
+                order += 'x'
+        model_name = (
+            '%s%s%s' % (self.model.__class__.__name__, order, seasonal_order)
+            )
+        return super(SARIMAXResults, self).summary(
+            alpha=alpha, start=start, title='SARIMAX Results',
+            model_name=model_name
+        )


 class SARIMAXResultsWrapper(MLEResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs, _attrs)
+    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs,
+                                   _attrs)
     _methods = {}
-    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods, _methods)
-
-
-wrap.populate_wrapper(SARIMAXResultsWrapper, SARIMAXResults)
+    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods,
+                                     _methods)
+wrap.populate_wrapper(SARIMAXResultsWrapper, SARIMAXResults)  # noqa:E305
diff --git a/statsmodels/tsa/statespace/simulation_smoother.py b/statsmodels/tsa/statespace/simulation_smoother.py
index a39d80bb9..27c8e5853 100644
--- a/statsmodels/tsa/statespace/simulation_smoother.py
+++ b/statsmodels/tsa/statespace/simulation_smoother.py
@@ -4,17 +4,22 @@ State Space Representation, Kalman Filter, Smoother, and Simulation Smoother
 Author: Chad Fulton
 License: Simplified-BSD
 """
+
 import numbers
 import warnings
 import numpy as np
 from .kalman_smoother import KalmanSmoother
 from .cfa_simulation_smoother import CFASimulationSmoother
 from . import tools
-SIMULATION_STATE = 1
-SIMULATION_DISTURBANCE = 4
-SIMULATION_ALL = SIMULATION_STATE | SIMULATION_DISTURBANCE
+
+SIMULATION_STATE = 0x01
+SIMULATION_DISTURBANCE = 0x04
+SIMULATION_ALL = (
+    SIMULATION_STATE | SIMULATION_DISTURBANCE
+)
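+
+# For example, SIMULATION_ALL == 0x05, so a simulation_output value of 5
+# requests both the simulated state and the simulated disturbances.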


+# Based on scipy.stats._qmc.check_random_state
 def check_random_state(seed=None):
     """Turn `seed` into a `numpy.random.Generator` instance.
     Parameters
@@ -31,11 +36,17 @@ def check_random_state(seed=None):
     seed : {`numpy.random.Generator`, `numpy.random.RandomState`}
         Random number generator.
     """
-    pass
+    if seed is None or isinstance(seed, (numbers.Integral, np.integer)):
+        return np.random.default_rng(seed)
+    elif isinstance(seed, (np.random.RandomState, np.random.Generator)):
+        return seed
+    else:
+        raise ValueError(f'{seed!r} cannot be used to seed a'
+                         ' numpy.random.Generator instance')
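+
+# For example, check_random_state(0) and check_random_state(None) both return
+# a new np.random.Generator, while an existing Generator or RandomState
+# instance is returned unchanged.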


 class SimulationSmoother(KalmanSmoother):
-    """
+    r"""
     State space representation of a time series process, with Kalman filter
     and smoother, and with simulation smoother.

@@ -63,25 +74,35 @@ class SimulationSmoother(KalmanSmoother):
         See `Representation`, `KalmanFilter`, and `KalmanSmoother` for more
         details.
     """
-    simulation_outputs = ['simulate_state', 'simulate_disturbance',
-        'simulate_all']
+
+    simulation_outputs = [
+        'simulate_state', 'simulate_disturbance', 'simulate_all'
+    ]

     def __init__(self, k_endog, k_states, k_posdef=None,
-        simulation_smooth_results_class=None, simulation_smoother_classes=
-        None, **kwargs):
-        super(SimulationSmoother, self).__init__(k_endog, k_states,
-            k_posdef, **kwargs)
+                 simulation_smooth_results_class=None,
+                 simulation_smoother_classes=None, **kwargs):
+        super(SimulationSmoother, self).__init__(
+            k_endog, k_states, k_posdef, **kwargs
+        )
+
         if simulation_smooth_results_class is None:
             simulation_smooth_results_class = SimulationSmoothResults
         self.simulation_smooth_results_class = simulation_smooth_results_class
-        self.prefix_simulation_smoother_map = (simulation_smoother_classes if
-            simulation_smoother_classes is not None else tools.
-            prefix_simulation_smoother_map.copy())
+
+        self.prefix_simulation_smoother_map = (
+            simulation_smoother_classes
+            if simulation_smoother_classes is not None
+            else tools.prefix_simulation_smoother_map.copy())
+
+        # Holder for model-level simulation smoother objects, to use in
+        # simulating new time series.
         self._simulators = {}

-    def get_simulation_output(self, simulation_output=None, simulate_state=
-        None, simulate_disturbance=None, simulate_all=None, **kwargs):
-        """
+    def get_simulation_output(self, simulation_output=None,
+                              simulate_state=None, simulate_disturbance=None,
+                              simulate_all=None, **kwargs):
+        r"""
         Get simulation output bitmask

         Helper method to get final simulation output bitmask from a set of
@@ -100,15 +121,70 @@ class SimulationSmoother(KalmanSmoother):
             in the simulation output.
         simulate_all : bool, optional
             Whether or not to include all simulation output.
-        \\*\\*kwargs
+        \*\*kwargs
             Additional keyword arguments. Present so that calls to this method
-            can use \\*\\*kwargs without clearing out additional arguments.
+            can use \*\*kwargs without clearing out additional arguments.
         """
-        pass
+        # If we do not explicitly have simulation_output, try to get it from
+        # kwargs
+        if simulation_output is None:
+            simulation_output = 0
+
+            if simulate_state:
+                simulation_output |= SIMULATION_STATE
+            if simulate_disturbance:
+                simulation_output |= SIMULATION_DISTURBANCE
+            if simulate_all:
+                simulation_output |= SIMULATION_ALL
+
+            # Handle case of no information in kwargs
+            if simulation_output == 0:
+
+                # If some arguments were passed, but we still do not have any
+                # simulation output, raise an exception
+                argument_set = not all([
+                    simulate_state is None, simulate_disturbance is None,
+                    simulate_all is None
+                ])
+                if argument_set:
+                    raise ValueError("Invalid simulation output options:"
+                                     " given options would result in no"
+                                     " output.")
+
+                # Otherwise set simulation output to be the same as smoother
+                # output
+                simulation_output = self.smoother_output
+
+        return simulation_output
+
+    def _simulate(self, nsimulations, simulator=None, random_state=None,
+                  return_simulator=False, **kwargs):
+        # Create the simulator, if necessary
+        if simulator is None:
+            simulator = self.simulator(nsimulations, random_state=random_state)
+
+        # Perform simulation smoothing
+        simulator.simulate(**kwargs)
+
+        # Retrieve and return the objects of interest
+        simulated_obs = np.array(simulator.generated_obs, copy=True)
+        simulated_state = np.array(simulator.generated_state, copy=True)
+
+        out = (simulated_obs.T[:nsimulations],
+               simulated_state.T[:nsimulations])
+        if return_simulator:
+            out = out + (simulator,)
+        return out
+
+    def simulator(self, nsimulations, random_state=None):
+        return self.simulation_smoother(simulation_output=0, method='kfs',
+                                        nobs=nsimulations,
+                                        random_state=random_state)

     def simulation_smoother(self, simulation_output=None, method='kfs',
-        results_class=None, prefix=None, nobs=-1, random_state=None, **kwargs):
-        """
+                            results_class=None, prefix=None, nobs=-1,
+                            random_state=None, **kwargs):
+        r"""
         Retrieve a simulation smoother for the statespace model.

         Parameters
@@ -149,11 +225,69 @@ class SimulationSmoother(KalmanSmoother):
         -------
         SimulationSmoothResults
         """
-        pass
+        method = method.lower()
+
+        # Short-circuit for CFA
+        if method == 'cfa':
+            if simulation_output not in [None, 1, -1]:
+                raise ValueError('Can only retrieve simulations of the state'
+                                 ' vector using the CFA simulation smoother.')
+            return CFASimulationSmoother(self)
+        elif method != 'kfs':
+            raise ValueError('Invalid simulation smoother method "%s". Valid'
+                             ' methods are "kfs" or "cfa".' % method)
+
+        # Set the class to be the default results class, if None provided
+        if results_class is None:
+            results_class = self.simulation_smooth_results_class
+
+        # Validate the provided results class
+        if not issubclass(results_class, SimulationSmoothResults):
+            raise ValueError('Invalid results class provided.')
+
+        # Make sure we have the required Statespace representation
+        prefix, dtype, create_smoother, create_filter, create_statespace = (
+            self._initialize_smoother())
+
+        # Simulation smoother parameters
+        simulation_output = self.get_simulation_output(simulation_output,
+                                                       **kwargs)
+
+        # Kalman smoother parameters
+        smoother_output = kwargs.get('smoother_output', simulation_output)
+
+        # Kalman filter parameters
+        filter_method = kwargs.get('filter_method', self.filter_method)
+        inversion_method = kwargs.get('inversion_method',
+                                      self.inversion_method)
+        stability_method = kwargs.get('stability_method',
+                                      self.stability_method)
+        conserve_memory = kwargs.get('conserve_memory',
+                                     self.conserve_memory)
+        filter_timing = kwargs.get('filter_timing',
+                                   self.filter_timing)
+        loglikelihood_burn = kwargs.get('loglikelihood_burn',
+                                        self.loglikelihood_burn)
+        tolerance = kwargs.get('tolerance', self.tolerance)
+
+        # Create a new simulation smoother object
+        cls = self.prefix_simulation_smoother_map[prefix]
+        simulation_smoother = cls(
+            self._statespaces[prefix],
+            filter_method, inversion_method, stability_method, conserve_memory,
+            filter_timing, tolerance, loglikelihood_burn, smoother_output,
+            simulation_output, nobs
+        )
+
+        # Create results object
+        results = results_class(self, simulation_smoother,
+                                random_state=random_state)
+
+        return results
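+        # Typical usage (sketch): retrieve `sim = model.simulation_smoother()`,
+        # call `sim.simulate()`, then read draws from attributes such as
+        # `sim.simulated_state` (see SimulationSmoothResults below).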


 class SimulationSmoothResults:
-    """
+    r"""
     Results from applying the Kalman smoother and/or filter to a state space
     model.

@@ -213,6 +347,8 @@ class SimulationSmoothResults:
         self.dtype = model.dtype
         self._simulation_smoother = simulation_smoother
         self.random_state = check_random_state(random_state)
+
+        # Output
         self._generated_measurement_disturbance = None
         self._generated_state_disturbance = None
         self._generated_obs = None
@@ -221,9 +357,52 @@ class SimulationSmoothResults:
         self._simulated_measurement_disturbance = None
         self._simulated_state_disturbance = None

+    @property
+    def simulation_output(self):
+        return self._simulation_smoother.simulation_output
+
+    @simulation_output.setter
+    def simulation_output(self, value):
+        self._simulation_smoother.simulation_output = value
+
+    @property
+    def simulate_state(self):
+        return bool(self.simulation_output & SIMULATION_STATE)
+
+    @simulate_state.setter
+    def simulate_state(self, value):
+        if bool(value):
+            self.simulation_output = self.simulation_output | SIMULATION_STATE
+        else:
+            self.simulation_output = self.simulation_output & ~SIMULATION_STATE
+
+    @property
+    def simulate_disturbance(self):
+        return bool(self.simulation_output & SIMULATION_DISTURBANCE)
+
+    @simulate_disturbance.setter
+    def simulate_disturbance(self, value):
+        if bool(value):
+            self.simulation_output = (
+                self.simulation_output | SIMULATION_DISTURBANCE)
+        else:
+            self.simulation_output = (
+                self.simulation_output & ~SIMULATION_DISTURBANCE)
+
+    @property
+    def simulate_all(self):
+        return bool(self.simulation_output & SIMULATION_ALL)
+
+    @simulate_all.setter
+    def simulate_all(self, value):
+        if bool(value):
+            self.simulation_output = self.simulation_output | SIMULATION_ALL
+        else:
+            self.simulation_output = self.simulation_output & ~SIMULATION_ALL
+
     @property
     def generated_measurement_disturbance(self):
-        """
+        r"""
         Randomly drawn measurement disturbance variates

         Used to construct `generated_obs`.
@@ -233,17 +412,21 @@ class SimulationSmoothResults:

         .. math::

-           \\varepsilon_t^+ ~ N(0, H_t)
+           \varepsilon_t^+ \sim N(0, H_t)

         If `disturbance_variates` were provided to the `simulate()` method,
         then this returns those variates (which were N(0,1)) transformed to the
         distribution above.
         """
-        pass
+        if self._generated_measurement_disturbance is None:
+            self._generated_measurement_disturbance = np.array(
+                self._simulation_smoother.measurement_disturbance_variates,
+                copy=True).reshape(self.model.nobs, self.model.k_endog)
+        return self._generated_measurement_disturbance

     @property
     def generated_state_disturbance(self):
-        """
+        r"""
         Randomly drawn state disturbance variates, used to construct
         `generated_state` and `generated_obs`.

@@ -252,17 +435,21 @@ class SimulationSmoothResults:

         .. math::

-            \\eta_t^+ ~ N(0, Q_t)
+            \eta_t^+ \sim N(0, Q_t)

         If `disturbance_variates` were provided to the `simulate()` method,
         then this returns those variates (which were N(0,1)) transformed to the
         distribution above.
         """
-        pass
+        if self._generated_state_disturbance is None:
+            self._generated_state_disturbance = np.array(
+                self._simulation_smoother.state_disturbance_variates,
+                copy=True).reshape(self.model.nobs, self.model.k_posdef)
+        return self._generated_state_disturbance

     @property
     def generated_obs(self):
-        """
+        r"""
         Generated vector of observations by iterating on the observation and
         transition equations, given a random initial state draw and random
         disturbance draws.
@@ -272,13 +459,17 @@ class SimulationSmoothResults:

         .. math::

-            y_t^+ = d_t + Z_t \\alpha_t^+ + \\varepsilon_t^+
+            y_t^+ = d_t + Z_t \alpha_t^+ + \varepsilon_t^+
         """
-        pass
+        if self._generated_obs is None:
+            self._generated_obs = np.array(
+                self._simulation_smoother.generated_obs, copy=True
+            )
+        return self._generated_obs

     @property
     def generated_state(self):
-        """
+        r"""
         Generated vector of states by iterating on the transition equation,
         given a random initial state draw and random disturbance draws.

@@ -287,13 +478,17 @@ class SimulationSmoothResults:

         .. math::

-            \\alpha_{t+1}^+ = c_t + T_t \\alpha_t^+ + \\eta_t^+
+            \alpha_{t+1}^+ = c_t + T_t \alpha_t^+ + \eta_t^+
         """
-        pass
+        if self._generated_state is None:
+            self._generated_state = np.array(
+                self._simulation_smoother.generated_state, copy=True
+            )
+        return self._generated_state

     @property
     def simulated_state(self):
-        """
+        r"""
         Random draw of the state vector from its conditional distribution.

         Notes
@@ -301,13 +496,17 @@ class SimulationSmoothResults:

         .. math::

-            \\alpha ~ p(\\alpha \\mid Y_n)
+            \alpha \sim p(\alpha \mid Y_n)
         """
-        pass
+        if self._simulated_state is None:
+            self._simulated_state = np.array(
+                self._simulation_smoother.simulated_state, copy=True
+            )
+        return self._simulated_state

     @property
     def simulated_measurement_disturbance(self):
-        """
+        r"""
         Random draw of the measurement disturbance vector from its conditional
         distribution.

@@ -316,13 +515,18 @@ class SimulationSmoothResults:

         .. math::

-            \\varepsilon ~ N(\\hat \\varepsilon, Var(\\hat \\varepsilon \\mid Y_n))
+            \varepsilon \sim N(\hat \varepsilon, Var(\hat \varepsilon \mid Y_n))
         """
-        pass
+        if self._simulated_measurement_disturbance is None:
+            self._simulated_measurement_disturbance = np.array(
+                self._simulation_smoother.simulated_measurement_disturbance,
+                copy=True
+            )
+        return self._simulated_measurement_disturbance

     @property
     def simulated_state_disturbance(self):
-        """
+        r"""
+        Random draw of the state disturbance vector from its conditional
         distribution.

@@ -331,17 +535,26 @@ class SimulationSmoothResults:

         .. math::

-            \\eta ~ N(\\hat \\eta, Var(\\hat \\eta \\mid Y_n))
-        """
-        pass
-
-    def simulate(self, simulation_output=-1, disturbance_variates=None,
-        measurement_disturbance_variates=None, state_disturbance_variates=
-        None, initial_state_variates=None, pretransformed=None,
-        pretransformed_measurement_disturbance_variates=None,
-        pretransformed_state_disturbance_variates=None,
-        pretransformed_initial_state_variates=False, random_state=None):
+            \eta \sim N(\hat \eta, Var(\hat \eta \mid Y_n))
         """
+        if self._simulated_state_disturbance is None:
+            self._simulated_state_disturbance = np.array(
+                self._simulation_smoother.simulated_state_disturbance,
+                copy=True
+            )
+        return self._simulated_state_disturbance
+
+    def simulate(self, simulation_output=-1,
+                 disturbance_variates=None,
+                 measurement_disturbance_variates=None,
+                 state_disturbance_variates=None,
+                 initial_state_variates=None,
+                 pretransformed=None,
+                 pretransformed_measurement_disturbance_variates=None,
+                 pretransformed_state_disturbance_variates=None,
+                 pretransformed_initial_state_variates=False,
+                 random_state=None):
+        r"""
         Perform simulation smoothing

         Does not return anything, but populates the object's `simulated_*`
@@ -354,13 +567,13 @@ class SimulationSmoothResults:
             simulation output defined in object initialization.
         measurement_disturbance_variates : array_like, optional
             If specified, these are the shocks to the measurement equation,
-            :math:`\\varepsilon_t`. If unspecified, these are automatically
+            :math:`\varepsilon_t`. If unspecified, these are automatically
             generated using a pseudo-random number generator. If specified,
             must be shaped `nsimulations` x `k_endog`, where `k_endog` is the
             same as in the state space model.
         state_disturbance_variates : array_like, optional
             If specified, these are the shocks to the state equation,
-            :math:`\\eta_t`. If unspecified, these are automatically
+            :math:`\eta_t`. If unspecified, these are automatically
             generated using a pseudo-random number generator. If specified,
             must be shaped `nsimulations` x `k_posdef` where `k_posdef` is the
             same as in the state space model.
@@ -416,4 +629,115 @@ class SimulationSmoothResults:
                Use ``pretransformed_measurement_disturbance_variates`` and
                ``pretransformed_state_disturbance_variates`` as replacements.
         """
-        pass
+        # Handle deprecated arguments
+        if disturbance_variates is not None:
+            msg = ('`disturbance_variates` keyword is deprecated, use'
+                   ' `measurement_disturbance_variates` and'
+                   ' `state_disturbance_variates` instead.')
+            warnings.warn(msg, FutureWarning)
+            if (measurement_disturbance_variates is not None
+                    or state_disturbance_variates is not None):
+                raise ValueError('Cannot use `disturbance_variates` in'
+                                 ' combination with '
+                                 ' `measurement_disturbance_variates` or'
+                                 ' `state_disturbance_variates`.')
+            if disturbance_variates is not None:
+                disturbance_variates = disturbance_variates.ravel()
+                n_mds = self.model.nobs * self.model.k_endog
+                measurement_disturbance_variates = disturbance_variates[:n_mds]
+                state_disturbance_variates = disturbance_variates[n_mds:]
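+                # e.g. with nobs=100 and k_endog=2, the first 200 elements of
+                # the flattened array are treated as measurement disturbance
+                # variates and the remainder as state disturbance variates.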
+        if pretransformed is not None:
+            msg = ('`pretransformed` keyword is deprecated, use'
+                   ' `pretransformed_measurement_disturbance_variates` and'
+                   ' `pretransformed_state_disturbance_variates` instead.')
+            warnings.warn(msg, FutureWarning)
+            if (pretransformed_measurement_disturbance_variates is not None
+                    or pretransformed_state_disturbance_variates is not None):
+                raise ValueError(
+                    'Cannot use `pretransformed` in combination with'
+                    ' `pretransformed_measurement_disturbance_variates` or'
+                    ' `pretransformed_state_disturbance_variates`.')
+            if pretransformed is not None:
+                pretransformed_measurement_disturbance_variates = (
+                    pretransformed)
+                pretransformed_state_disturbance_variates = pretransformed
+
+        if pretransformed_measurement_disturbance_variates is None:
+            pretransformed_measurement_disturbance_variates = False
+        if pretransformed_state_disturbance_variates is None:
+            pretransformed_state_disturbance_variates = False
+
+        # Clear any previous output
+        self._generated_measurement_disturbance = None
+        self._generated_state_disturbance = None
+        self._generated_obs = None
+        self._generated_state = None
+        self._simulated_state = None
+        self._simulated_measurement_disturbance = None
+        self._simulated_state_disturbance = None
+
+        # Handle the random state
+        if random_state is None:
+            random_state = self.random_state
+        else:
+            random_state = check_random_state(random_state)
+
+        # Re-initialize the _statespace representation
+        prefix, dtype, create_smoother, create_filter, create_statespace = (
+            self.model._initialize_smoother())
+        if create_statespace:
+            raise ValueError('The simulation smoother currently cannot replace'
+                             ' the underlying _{{prefix}}Representation model'
+                             ' object if it changes (which happens e.g. if the'
+                             ' dimensions of some system matrices change).')
+
+        # Initialize the state
+        self.model._initialize_state(prefix=prefix)
+
+        # Draw the (independent) random variates for disturbances in the
+        # simulation
+        if measurement_disturbance_variates is not None:
+            self._simulation_smoother.set_measurement_disturbance_variates(
+                np.array(measurement_disturbance_variates,
+                         dtype=self.dtype).ravel(),
+                pretransformed=pretransformed_measurement_disturbance_variates
+            )
+        else:
+            self._simulation_smoother.draw_measurement_disturbance_variates(
+                random_state)
+
+        # Draw the (independent) random variates for disturbances in the
+        # simulation
+        if state_disturbance_variates is not None:
+            self._simulation_smoother.set_state_disturbance_variates(
+                np.array(state_disturbance_variates, dtype=self.dtype).ravel(),
+                pretransformed=pretransformed_state_disturbance_variates
+            )
+        else:
+            self._simulation_smoother.draw_state_disturbance_variates(
+                random_state)
+
+        # Draw the (independent) random variates for the initial states in the
+        # simulation
+        if initial_state_variates is not None:
+            if pretransformed_initial_state_variates:
+                self._simulation_smoother.set_initial_state(
+                    np.array(initial_state_variates, dtype=self.dtype)
+                )
+            else:
+                self._simulation_smoother.set_initial_state_variates(
+                    np.array(initial_state_variates, dtype=self.dtype),
+                    pretransformed=False
+                )
+            # Note: there is a third option, which is to set the initial state
+            # variates with pretransformed = True. However, this option simply
+            # eliminates the multiplication by the Cholesky factor of the
+            # initial state cov, but still adds the initial state mean. It's
+            # not clear when this would be useful...
+        else:
+            self._simulation_smoother.draw_initial_state_variates(
+                random_state)
+
+        # Perform simulation smoothing
+        self._simulation_smoother.simulate(simulation_output)
diff --git a/statsmodels/tsa/statespace/structural.py b/statsmodels/tsa/statespace/structural.py
index 9db920ace..76b6597c9 100644
--- a/statsmodels/tsa/statespace/structural.py
+++ b/statsmodels/tsa/statespace/structural.py
@@ -6,26 +6,43 @@ TODO: tests: "** On entry to DLASCL, parameter number  4 had an illegal value"
 Author: Chad Fulton
 License: Simplified-BSD
 """
+
 from warnings import warn
+
 import numpy as np
+
 from statsmodels.compat.pandas import Appender
 from statsmodels.tools.tools import Bunch
 from statsmodels.tools.sm_exceptions import OutputWarning, SpecificationWarning
 import statsmodels.base.wrapper as wrap
+
 from statsmodels.tsa.filters.hp_filter import hpfilter
 from statsmodels.tsa.tsatools import lagmat
+
 from .mlemodel import MLEModel, MLEResults, MLEResultsWrapper
 from .initialization import Initialization
-from .tools import companion_matrix, constrain_stationary_univariate, unconstrain_stationary_univariate, prepare_exog
-_mask_map = {(1): 'irregular', (2): 'fixed intercept', (3):
-    'deterministic constant', (6): 'random walk', (7): 'local level', (8):
-    'fixed slope', (11): 'deterministic trend', (14):
-    'random walk with drift', (15): 'local linear deterministic trend', (31
-    ): 'local linear trend', (27): 'smooth trend', (26): 'random trend'}
+from .tools import (
+    companion_matrix, constrain_stationary_univariate,
+    unconstrain_stationary_univariate, prepare_exog)
+
+_mask_map = {
+    1: 'irregular',
+    2: 'fixed intercept',
+    3: 'deterministic constant',
+    6: 'random walk',
+    7: 'local level',
+    8: 'fixed slope',
+    11: 'deterministic trend',
+    14: 'random walk with drift',
+    15: 'local linear deterministic trend',
+    31: 'local linear trend',
+    27: 'smooth trend',
+    26: 'random trend'
+}
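+
+# Note: the integer keys above appear to encode the component flags as a
+# bitmask (roughly irregular=0x01, level=0x02, stochastic level=0x04,
+# trend=0x08, stochastic trend=0x10); e.g. 'local level' corresponds to
+# 0x01 | 0x02 | 0x04 == 7.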


 class UnobservedComponents(MLEModel):
-    """
+    r"""
     Univariate unobserved components time series model

     These are also known as structural time series models, and decompose a
@@ -79,7 +96,7 @@ class UnobservedComponents(MLEModel):
         A tuple with lower and upper allowed bounds for the period of the
         cycle. If not provided, the following default bounds are used:
         (1) if no date / time information is provided, the frequency is
-        constrained to be between zero and :math:`\\pi`, so the period is
+        constrained to be between zero and :math:`\pi`, so the period is
         constrained to be in [0.5, infinity].
         (2) If the date / time information is provided, the default bounds
         allow the cyclical component to be between 1.5 and 12 years; depending
@@ -107,12 +124,12 @@ class UnobservedComponents(MLEModel):

     .. math::

-        y_t = \\mu_t + \\gamma_t + c_t + \\varepsilon_t
+        y_t = \mu_t + \gamma_t + c_t + \varepsilon_t

     where :math:`y_t` refers to the observation vector at time :math:`t`,
-    :math:`\\mu_t` refers to the trend component, :math:`\\gamma_t` refers to the
+    :math:`\mu_t` refers to the trend component, :math:`\gamma_t` refers to the
     seasonal component, :math:`c_t` refers to the cycle, and
-    :math:`\\varepsilon_t` is the irregular. The modeling details of these
+    :math:`\varepsilon_t` is the irregular. The modeling details of these
     components are given below.

     **Trend**
@@ -122,15 +139,15 @@ class UnobservedComponents(MLEModel):

     .. math::

-        \\mu_t = \\mu_{t-1} + \\beta_{t-1} + \\eta_{t-1} \\\\
-        \\beta_t = \\beta_{t-1} + \\zeta_{t-1}
+        \mu_t = \mu_{t-1} + \beta_{t-1} + \eta_{t-1} \\
+        \beta_t = \beta_{t-1} + \zeta_{t-1}

     where the level is a generalization of the intercept term that can
     dynamically vary across time, and the trend is a generalization of the
     time-trend such that the slope can dynamically vary across time.

-    Here :math:`\\eta_t \\sim N(0, \\sigma_\\eta^2)` and
-    :math:`\\zeta_t \\sim N(0, \\sigma_\\zeta^2)`.
+    Here :math:`\eta_t \sim N(0, \sigma_\eta^2)` and
+    :math:`\zeta_t \sim N(0, \sigma_\zeta^2)`.

     For both elements (level and trend), we can consider models in which:

@@ -150,41 +167,41 @@ class UnobservedComponents(MLEModel):
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
     | Model name                       | Full string syntax                   | Abbreviated syntax | Model                                            |
     +==================================+======================================+====================+==================================================+
-    | No trend                         | `'irregular'`                        | `'ntrend'`         | .. math:: y_t = \\varepsilon_t                    |
+    | No trend                         | `'irregular'`                        | `'ntrend'`         | .. math:: y_t = \varepsilon_t                    |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Fixed intercept                  | `'fixed intercept'`                  |                    | .. math:: y_t = \\mu                              |
+    | Fixed intercept                  | `'fixed intercept'`                  |                    | .. math:: y_t = \mu                              |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Deterministic constant           | `'deterministic constant'`           | `'dconstant'`      | .. math:: y_t = \\mu + \\varepsilon_t              |
+    | Deterministic constant           | `'deterministic constant'`           | `'dconstant'`      | .. math:: y_t = \mu + \varepsilon_t              |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Local level                      | `'local level'`                      | `'llevel'`         | .. math:: y_t &= \\mu_t + \\varepsilon_t \\\\        |
-    |                                  |                                      |                    |     \\mu_t &= \\mu_{t-1} + \\eta_t                  |
+    | Local level                      | `'local level'`                      | `'llevel'`         | .. math:: y_t &= \mu_t + \varepsilon_t \\        |
+    |                                  |                                      |                    |     \mu_t &= \mu_{t-1} + \eta_t                  |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Random walk                      | `'random walk'`                      | `'rwalk'`          | .. math:: y_t &= \\mu_t \\\\                        |
-    |                                  |                                      |                    |     \\mu_t &= \\mu_{t-1} + \\eta_t                  |
+    | Random walk                      | `'random walk'`                      | `'rwalk'`          | .. math:: y_t &= \mu_t \\                        |
+    |                                  |                                      |                    |     \mu_t &= \mu_{t-1} + \eta_t                  |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Fixed slope                      | `'fixed slope'`                      |                    | .. math:: y_t &= \\mu_t \\\\                        |
-    |                                  |                                      |                    |     \\mu_t &= \\mu_{t-1} + \\beta                   |
+    | Fixed slope                      | `'fixed slope'`                      |                    | .. math:: y_t &= \mu_t \\                        |
+    |                                  |                                      |                    |     \mu_t &= \mu_{t-1} + \beta                   |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Deterministic trend              | `'deterministic trend'`              | `'dtrend'`         | .. math:: y_t &= \\mu_t + \\varepsilon_t \\\\        |
-    |                                  |                                      |                    |     \\mu_t &= \\mu_{t-1} + \\beta                   |
+    | Deterministic trend              | `'deterministic trend'`              | `'dtrend'`         | .. math:: y_t &= \mu_t + \varepsilon_t \\        |
+    |                                  |                                      |                    |     \mu_t &= \mu_{t-1} + \beta                   |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Local linear deterministic trend | `'local linear deterministic trend'` | `'lldtrend'`       | .. math:: y_t &= \\mu_t + \\varepsilon_t \\\\        |
-    |                                  |                                      |                    |     \\mu_t &= \\mu_{t-1} + \\beta + \\eta_t          |
+    | Local linear deterministic trend | `'local linear deterministic trend'` | `'lldtrend'`       | .. math:: y_t &= \mu_t + \varepsilon_t \\        |
+    |                                  |                                      |                    |     \mu_t &= \mu_{t-1} + \beta + \eta_t          |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Random walk with drift           | `'random walk with drift'`           | `'rwdrift'`        | .. math:: y_t &= \\mu_t \\\\                        |
-    |                                  |                                      |                    |     \\mu_t &= \\mu_{t-1} + \\beta + \\eta_t          |
+    | Random walk with drift           | `'random walk with drift'`           | `'rwdrift'`        | .. math:: y_t &= \mu_t \\                        |
+    |                                  |                                      |                    |     \mu_t &= \mu_{t-1} + \beta + \eta_t          |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Local linear trend               | `'local linear trend'`               | `'lltrend'`        | .. math:: y_t &= \\mu_t + \\varepsilon_t \\\\        |
-    |                                  |                                      |                    |     \\mu_t &= \\mu_{t-1} + \\beta_{t-1} + \\eta_t \\\\ |
-    |                                  |                                      |                    |     \\beta_t &= \\beta_{t-1} + \\zeta_t             |
+    | Local linear trend               | `'local linear trend'`               | `'lltrend'`        | .. math:: y_t &= \mu_t + \varepsilon_t \\        |
+    |                                  |                                      |                    |     \mu_t &= \mu_{t-1} + \beta_{t-1} + \eta_t \\ |
+    |                                  |                                      |                    |     \beta_t &= \beta_{t-1} + \zeta_t             |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Smooth trend                     | `'smooth trend'`                     | `'strend'`         | .. math:: y_t &= \\mu_t + \\varepsilon_t \\\\        |
-    |                                  |                                      |                    |     \\mu_t &= \\mu_{t-1} + \\beta_{t-1} \\\\          |
-    |                                  |                                      |                    |     \\beta_t &= \\beta_{t-1} + \\zeta_t             |
+    | Smooth trend                     | `'smooth trend'`                     | `'strend'`         | .. math:: y_t &= \mu_t + \varepsilon_t \\        |
+    |                                  |                                      |                    |     \mu_t &= \mu_{t-1} + \beta_{t-1} \\          |
+    |                                  |                                      |                    |     \beta_t &= \beta_{t-1} + \zeta_t             |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
-    | Random trend                     | `'random trend'`                     | `'rtrend'`         | .. math:: y_t &= \\mu_t \\\\                        |
-    |                                  |                                      |                    |     \\mu_t &= \\mu_{t-1} + \\beta_{t-1} \\\\          |
-    |                                  |                                      |                    |     \\beta_t &= \\beta_{t-1} + \\zeta_t             |
+    | Random trend                     | `'random trend'`                     | `'rtrend'`         | .. math:: y_t &= \mu_t \\                        |
+    |                                  |                                      |                    |     \mu_t &= \mu_{t-1} + \beta_{t-1} \\          |
+    |                                  |                                      |                    |     \beta_t &= \beta_{t-1} + \zeta_t             |
     +----------------------------------+--------------------------------------+--------------------+--------------------------------------------------+
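
For reference, a minimal sketch of selecting one of the specifications in the table by passing the model string as the `level` argument (the series `y` below is hypothetical, generated only for illustration):

    # Sketch: specify the trend model via a string from the table above.
    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(0).standard_normal(200).cumsum()
    mod = sm.tsa.UnobservedComponents(y, level='local linear trend')  # or 'lltrend'
    res = mod.fit(disp=False)
    print(res.summary())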

     Following the fitting of the model, the unobserved level and trend
@@ -197,17 +214,17 @@ class UnobservedComponents(MLEModel):

     .. math::

-        \\gamma_t = - \\sum_{j=1}^{s-1} \\gamma_{t+1-j} + \\omega_t \\\\
-        \\omega_t \\sim N(0, \\sigma_\\omega^2)
+        \gamma_t = - \sum_{j=1}^{s-1} \gamma_{t+1-j} + \omega_t \\
+        \omega_t \sim N(0, \sigma_\omega^2)

     The periodicity (number of seasons) is s, and the defining characteristic is
     that (without the error term), the seasonal components sum to zero across
     one complete cycle. The inclusion of an error term allows the seasonal
-    effects to vary over time (if this is not desired, :math:`\\sigma_\\omega^2`
+    effects to vary over time (if this is not desired, :math:`\sigma_\omega^2`
     can be set to zero using the `stochastic_seasonal=False` keyword argument).

     This component results in one parameter to be selected via maximum
-    likelihood: :math:`\\sigma_\\omega^2`, and one parameter to be chosen, the
+    likelihood: :math:`\sigma_\omega^2`, and one parameter to be chosen, the
     number of seasons `s`.
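
For example, a local level with a stochastic monthly seasonal could be specified as in this sketch (hypothetical data; `stochastic_seasonal=False` would instead hold the seasonal pattern fixed over time):

    # Sketch: local level plus a stochastic time-domain seasonal with s = 12.
    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(1).standard_normal(240).cumsum()
    mod = sm.tsa.UnobservedComponents(y, level='local level', seasonal=12)
    res = mod.fit(disp=False)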

     Following the fitting of the model, the unobserved seasonal component
@@ -220,14 +237,14 @@ class UnobservedComponents(MLEModel):

     .. math::

-        \\gamma_t & =  \\sum_{j=1}^h \\gamma_{j, t} \\\\
-        \\gamma_{j, t+1} & = \\gamma_{j, t}\\cos(\\lambda_j)
-                        + \\gamma^{*}_{j, t}\\sin(\\lambda_j) + \\omega_{j,t} \\\\
-        \\gamma^{*}_{j, t+1} & = -\\gamma^{(1)}_{j, t}\\sin(\\lambda_j)
-                            + \\gamma^{*}_{j, t}\\cos(\\lambda_j)
-                            + \\omega^{*}_{j, t}, \\\\
-        \\omega^{*}_{j, t}, \\omega_{j, t} & \\sim N(0, \\sigma_{\\omega^2}) \\\\
-        \\lambda_j & = \\frac{2 \\pi j}{s}
+        \gamma_t & =  \sum_{j=1}^h \gamma_{j, t} \\
+        \gamma_{j, t+1} & = \gamma_{j, t}\cos(\lambda_j)
+                        + \gamma^{*}_{j, t}\sin(\lambda_j) + \omega_{j,t} \\
+        \gamma^{*}_{j, t+1} & = -\gamma^{(1)}_{j, t}\sin(\lambda_j)
+                            + \gamma^{*}_{j, t}\cos(\lambda_j)
+                            + \omega^{*}_{j, t}, \\
+        \omega^{*}_{j, t}, \omega_{j, t} & \sim N(0, \sigma_{\omega^2}) \\
+        \lambda_j & = \frac{2 \pi j}{s}

     where j ranges from 1 to h.

@@ -241,9 +258,9 @@ class UnobservedComponents(MLEModel):
     meaning they will not vary over time.

     This component results in one parameter to be fitted using maximum
-    likelihood: :math:`\\sigma_{\\omega^2}`, and up to two parameters to be
+    likelihood: :math:`\sigma_{\omega^2}`, and up to two parameters to be
     chosen, the number of seasons s and optionally the number of harmonics
-    h, with :math:`1 \\leq h \\leq \\lfloor s/2 \\rfloor`.
+    h, with :math:`1 \leq h \leq \lfloor s/2 \rfloor`.
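
For example, the frequency-domain form is requested through the `freq_seasonal` argument, a list of dictionaries each holding a `period` entry and, optionally, a `harmonics` entry (sketch with hypothetical data; harmonics defaults to floor(period / 2) when omitted):

    # Sketch: period-12 seasonality modeled in the frequency domain with 4 harmonics.
    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(2).standard_normal(240).cumsum()
    mod = sm.tsa.UnobservedComponents(
        y, level='local level',
        freq_seasonal=[{'period': 12, 'harmonics': 4}])
    res = mod.fit(disp=False)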

     After fitting the model, each unobserved seasonal component modeled in the
     frequency domain is available in the results class in the `freq_seasonal`
@@ -259,16 +276,16 @@ class UnobservedComponents(MLEModel):

     .. math::

-        c_{t+1} & = \\rho_c (\\tilde c_t \\cos \\lambda_c t
-                + \\tilde c_t^* \\sin \\lambda_c) +
-                \\tilde \\omega_t \\\\
-        c_{t+1}^* & = \\rho_c (- \\tilde c_t \\sin \\lambda_c  t +
-                \\tilde c_t^* \\cos \\lambda_c) +
-                \\tilde \\omega_t^* \\\\
+        c_{t+1} & = \rho_c (\tilde c_t \cos \lambda_c
+                + \tilde c_t^* \sin \lambda_c) +
+                \tilde \omega_t \\
+        c_{t+1}^* & = \rho_c (- \tilde c_t \sin \lambda_c +
+                \tilde c_t^* \cos \lambda_c) +
+                \tilde \omega_t^* \\

-    where :math:`\\omega_t, \\tilde \\omega_t iid N(0, \\sigma_{\\tilde \\omega}^2)`
+    where :math:`\omega_t, \tilde \omega_t \sim N(0, \sigma_{\tilde \omega}^2)` are i.i.d.

-    The parameter :math:`\\lambda_c` (the frequency of the cycle) is an
+    The parameter :math:`\lambda_c` (the frequency of the cycle) is an
     additional parameter to be estimated by MLE.

     If the cyclical effect is stochastic (`stochastic_cycle=True`), then there
@@ -277,10 +294,10 @@ class UnobservedComponents(MLEModel):
     to have independent draws).

     If the cycle is damped (`damped_cycle=True`), then there is a third
-    parameter to estimate, :math:`\\rho_c`.
+    parameter to estimate, :math:`\rho_c`.

     In order to achieve cycles with the appropriate frequencies, bounds are
-    imposed on the parameter :math:`\\lambda_c` in estimation. These can be
+    imposed on the parameter :math:`\lambda_c` in estimation. These can be
     controlled via the keyword argument `cycle_period_bounds`, which, if
     specified, must be a tuple of bounds on the **period** `(lower, upper)`.
     The bounds on the frequency are then calculated from those bounds.
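
For example, the following sketch restricts the cycle period to a chosen range of observations (the bounds 6 and 48 are arbitrary, for illustration only):

    # Sketch: stochastic, damped cycle with the period restricted to 6-48 observations.
    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(3).standard_normal(300).cumsum()
    mod = sm.tsa.UnobservedComponents(
        y, level='local level', cycle=True, stochastic_cycle=True,
        damped_cycle=True, cycle_period_bounds=(6, 48))
    res = mod.fit(disp=False)
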
@@ -289,8 +306,8 @@ class UnobservedComponents(MLEModel):
     way:

     1. If no date / time information is provided, the frequency is
-       constrained to be between zero and :math:`\\pi`, so the period is
-       constrained to be in :math:`[0.5, \\infty]`.
+       constrained to be between zero and :math:`\pi`, so the period is
+       constrained to be in :math:`[2, \infty)`.
     2. If the date / time information is provided, the default bounds
        allow the cyclical component to be between 1.5 and 12 years; depending
        on the frequency of the endogenous variable, this will imply different
@@ -306,7 +323,7 @@ class UnobservedComponents(MLEModel):

     .. math::

-        \\varepsilon_t \\sim N(0, \\sigma_\\varepsilon^2)
+        \varepsilon_t \sim N(0, \sigma_\varepsilon^2)

     **Autoregressive Irregular**

@@ -315,8 +332,8 @@ class UnobservedComponents(MLEModel):

     .. math::

-        \\varepsilon_t = \\rho(L) \\varepsilon_{t-1} + \\epsilon_t \\\\
-        \\epsilon_t \\sim N(0, \\sigma_\\epsilon^2)
+        \varepsilon_t = \rho(L) \varepsilon_{t-1} + \epsilon_t \\
+        \epsilon_t \sim N(0, \sigma_\epsilon^2)

     In this case, the AR order is specified via the `autoregressive` keyword,
     and the autoregressive coefficients are estimated.
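
For example (sketch, hypothetical data):

    # Sketch: random-walk level with an AR(1) term used in place of the
    # white-noise irregular component.
    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(4).standard_normal(200).cumsum()
    mod = sm.tsa.UnobservedComponents(y, level='random walk', autoregressive=1)
    res = mod.fit(disp=False)
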
@@ -342,22 +359,30 @@ class UnobservedComponents(MLEModel):
     .. [1] Durbin, James, and Siem Jan Koopman. 2012.
        Time Series Analysis by State Space Methods: Second Edition.
        Oxford University Press.
-    """
+    """  # noqa:E501

     def __init__(self, endog, level=False, trend=False, seasonal=None,
-        freq_seasonal=None, cycle=False, autoregressive=None, exog=None,
-        irregular=False, stochastic_level=False, stochastic_trend=False,
-        stochastic_seasonal=True, stochastic_freq_seasonal=None,
-        stochastic_cycle=False, damped_cycle=False, cycle_period_bounds=
-        None, mle_regression=True, use_exact_diffuse=False, **kwargs):
+                 freq_seasonal=None, cycle=False, autoregressive=None,
+                 exog=None, irregular=False,
+                 stochastic_level=False,
+                 stochastic_trend=False,
+                 stochastic_seasonal=True,
+                 stochastic_freq_seasonal=None,
+                 stochastic_cycle=False,
+                 damped_cycle=False, cycle_period_bounds=None,
+                 mle_regression=True, use_exact_diffuse=False,
+                 **kwargs):
+
+        # Model options
         self.level = level
         self.trend = trend
         self.seasonal_periods = seasonal if seasonal is not None else 0
         self.seasonal = self.seasonal_periods > 0
         if freq_seasonal:
             self.freq_seasonal_periods = [d['period'] for d in freq_seasonal]
-            self.freq_seasonal_harmonics = [d.get('harmonics', int(np.floor
-                (d['period'] / 2))) for d in freq_seasonal]
+            self.freq_seasonal_harmonics = [d.get(
+                'harmonics', int(np.floor(d['period'] / 2))) for
+                d in freq_seasonal]
         else:
             self.freq_seasonal_periods = []
             self.freq_seasonal_harmonics = []
@@ -366,34 +391,44 @@ class UnobservedComponents(MLEModel):
         self.ar_order = autoregressive if autoregressive is not None else 0
         self.autoregressive = self.ar_order > 0
         self.irregular = irregular
+
         self.stochastic_level = stochastic_level
         self.stochastic_trend = stochastic_trend
         self.stochastic_seasonal = stochastic_seasonal
         if stochastic_freq_seasonal is None:
-            self.stochastic_freq_seasonal = [True] * len(self.
-                freq_seasonal_periods)
+            self.stochastic_freq_seasonal = [True] * len(
+                self.freq_seasonal_periods)
         else:
             if len(stochastic_freq_seasonal) != len(freq_seasonal):
                 raise ValueError(
-                    'Length of stochastic_freq_seasonal must equal length of freq_seasonal: {!r} vs {!r}'
-                    .format(len(stochastic_freq_seasonal), len(freq_seasonal)))
+                    "Length of stochastic_freq_seasonal must equal length"
+                    " of freq_seasonal: {!r} vs {!r}".format(
+                        len(stochastic_freq_seasonal), len(freq_seasonal)))
             self.stochastic_freq_seasonal = stochastic_freq_seasonal
         self.stochastic_cycle = stochastic_cycle
+
         self.damped_cycle = damped_cycle
         self.mle_regression = mle_regression
         self.use_exact_diffuse = use_exact_diffuse
+
+        # Check for string trend/level specification
         self.trend_specification = None
         if isinstance(self.level, str):
             self.trend_specification = level
             self.level = False
+
+            # Check if any of the trend/level components have been set, and
+            # reset everything to False
             trend_attributes = ['irregular', 'level', 'trend',
-                'stochastic_level', 'stochastic_trend']
+                                'stochastic_level', 'stochastic_trend']
             for attribute in trend_attributes:
                 if not getattr(self, attribute) is False:
-                    warn(
-                        'Value of `%s` may be overridden when the trend component is specified using a model string.'
+                    warn("Value of `%s` may be overridden when the trend"
+                         " component is specified using a model string."
                          % attribute, SpecificationWarning)
                     setattr(self, attribute, False)
+
+            # Now set the correct specification
             spec = self.trend_specification
             if spec == 'irregular' or spec == 'ntrend':
                 self.irregular = True
@@ -421,7 +456,8 @@ class UnobservedComponents(MLEModel):
                 self.level = True
                 self.trend = True
                 self.trend_specification = 'deterministic trend'
-            elif spec == 'local linear deterministic trend' or spec == 'lldtrend':
+            elif (spec == 'local linear deterministic trend' or
+                    spec == 'lldtrend'):
                 self.irregular = True
                 self.level = True
                 self.stochastic_level = True
@@ -451,54 +487,89 @@ class UnobservedComponents(MLEModel):
                 self.stochastic_trend = True
                 self.trend_specification = 'random trend'
             else:
-                raise ValueError("Invalid level/trend specification: '%s'" %
-                    spec)
+                raise ValueError("Invalid level/trend specification: '%s'"
+                                 % spec)
+
+        # Check for a model that makes sense
         if trend and not level:
-            warn(
-                'Trend component specified without level component; deterministic level component added.'
-                , SpecificationWarning)
+            warn("Trend component specified without level component;"
+                 " deterministic level component added.", SpecificationWarning)
             self.level = True
             self.stochastic_level = False
-        if not (self.irregular or self.level and self.stochastic_level or 
-            self.trend and self.stochastic_trend or self.seasonal and self.
-            stochastic_seasonal or self.freq_seasonal and any(self.
-            stochastic_freq_seasonal) or self.cycle and self.
-            stochastic_cycle or self.autoregressive):
-            warn(
-                'Specified model does not contain a stochastic element; irregular component added.'
-                , SpecificationWarning)
+
+        if not (self.irregular or
+                (self.level and self.stochastic_level) or
+                (self.trend and self.stochastic_trend) or
+                (self.seasonal and self.stochastic_seasonal) or
+                (self.freq_seasonal and any(
+                    self.stochastic_freq_seasonal)) or
+                (self.cycle and self.stochastic_cycle) or
+                self.autoregressive):
+            warn("Specified model does not contain a stochastic element;"
+                 " irregular component added.", SpecificationWarning)
             self.irregular = True
+
         if self.seasonal and self.seasonal_periods < 2:
-            raise ValueError(
-                'Seasonal component must have a seasonal period of at least 2.'
-                )
+            raise ValueError('Seasonal component must have a seasonal period'
+                             ' of at least 2.')
+
         if self.freq_seasonal:
             for p in self.freq_seasonal_periods:
                 if p < 2:
                     raise ValueError(
-                        'Frequency Domain seasonal component must have a seasonal period of at least 2.'
-                        )
-        self.trend_mask = (self.irregular * 1 | self.level * 2 | self.level *
-            self.stochastic_level * 4 | self.trend * 8 | self.trend * self.
-            stochastic_trend * 16)
+                        'Frequency Domain seasonal component must have a '
+                        'seasonal period of at least 2.')
+
+        # Create a bitmask holding the level/trend specification
+        self.trend_mask = (
+            self.irregular * 0x01 |
+            self.level * 0x02 |
+            self.level * self.stochastic_level * 0x04 |
+            self.trend * 0x08 |
+            self.trend * self.stochastic_trend * 0x10
+        )
+
+        # Create the trend specification, if it was not given
         if self.trend_specification is None:
+            # trend specification may be none, e.g. if the model is only
+            # a stochastic cycle, etc.
             self.trend_specification = _mask_map.get(self.trend_mask, None)
-        self.k_exog, exog = prepare_exog(exog)
+
+        # Exogenous component
+        (self.k_exog, exog) = prepare_exog(exog)
+
         self.regression = self.k_exog > 0
+
+        # Model parameters
         self._k_seasonal_states = (self.seasonal_periods - 1) * self.seasonal
-        self._k_freq_seas_states = sum(2 * h for h in self.
-            freq_seasonal_harmonics) * self.freq_seasonal
+        self._k_freq_seas_states = (
+            sum(2 * h for h in self.freq_seasonal_harmonics)
+            * self.freq_seasonal)
         self._k_cycle_states = self.cycle * 2
-        k_states = (self.level + self.trend + self._k_seasonal_states +
-            self._k_freq_seas_states + self._k_cycle_states + self.ar_order +
-            (not self.mle_regression) * self.k_exog)
-        k_posdef = (self.stochastic_level * self.level + self.
-            stochastic_trend * self.trend + self.stochastic_seasonal * self
-            .seasonal + sum(2 * h if self.stochastic_freq_seasonal[ix] else
-            0 for ix, h in enumerate(self.freq_seasonal_harmonics)) * self.
-            freq_seasonal + self.stochastic_cycle * self._k_cycle_states +
-            self.autoregressive)
+        k_states = (
+            self.level + self.trend +
+            self._k_seasonal_states +
+            self._k_freq_seas_states +
+            self._k_cycle_states +
+            self.ar_order +
+            (not self.mle_regression) * self.k_exog
+        )
+        k_posdef = (
+            self.stochastic_level * self.level +
+            self.stochastic_trend * self.trend +
+            self.stochastic_seasonal * self.seasonal +
+            ((sum(2 * h if self.stochastic_freq_seasonal[ix] else 0 for
+                  ix, h in enumerate(self.freq_seasonal_harmonics))) *
+             self.freq_seasonal) +
+            self.stochastic_cycle * (self._k_cycle_states) +
+            self.autoregressive
+        )
+
+        # Handle non-default loglikelihood burn
         self._loglikelihood_burn = kwargs.get('loglikelihood_burn', None)
+
+        # We can still estimate the model with just the irregular component,
+        # just need to have one state that does nothing.
         self._unused_state = False
         if k_states == 0:
             if not self.irregular:
@@ -507,49 +578,587 @@ class UnobservedComponents(MLEModel):
             self._unused_state = True
         if k_posdef == 0:
             k_posdef = 1
-        super(UnobservedComponents, self).__init__(endog, k_states,
-            k_posdef=k_posdef, exog=exog, **kwargs)
+
+        # Setup the representation
+        super(UnobservedComponents, self).__init__(
+            endog, k_states, k_posdef=k_posdef, exog=exog, **kwargs
+        )
         self.setup()
+
+        # Set as time-varying model if we have exog
         if self.k_exog > 0:
             self.ssm._time_invariant = False
+
+        # Need to reset the MLE names (since when they were first set, `setup`
+        # had not been run (and could not have been at that point))
         self.data.param_names = self.param_names
+
+        # Get bounds for the frequency of the cycle, if we know the frequency
+        # of the data.
         if cycle_period_bounds is None:
             freq = self.data.freq[0] if self.data.freq is not None else ''
             if freq in ('A', 'Y'):
-                cycle_period_bounds = 1.5, 12
+                cycle_period_bounds = (1.5, 12)
             elif freq == 'Q':
-                cycle_period_bounds = 1.5 * 4, 12 * 4
+                cycle_period_bounds = (1.5*4, 12*4)
             elif freq == 'M':
-                cycle_period_bounds = 1.5 * 12, 12 * 12
+                cycle_period_bounds = (1.5*12, 12*12)
             else:
-                cycle_period_bounds = 2, np.inf
-        self.cycle_frequency_bound = 2 * np.pi / cycle_period_bounds[1
-            ], 2 * np.pi / cycle_period_bounds[0]
+                # If we have no information on data frequency, require the
+                # cycle frequency to be between 0 and pi
+                cycle_period_bounds = (2, np.inf)
+
+        self.cycle_frequency_bound = (
+            2*np.pi / cycle_period_bounds[1], 2*np.pi / cycle_period_bounds[0]
+        )
+
+        # Update _init_keys attached by super
         self._init_keys += ['level', 'trend', 'seasonal', 'freq_seasonal',
-            'cycle', 'autoregressive', 'irregular', 'stochastic_level',
-            'stochastic_trend', 'stochastic_seasonal',
-            'stochastic_freq_seasonal', 'stochastic_cycle', 'damped_cycle',
-            'cycle_period_bounds', 'mle_regression'] + list(kwargs.keys())
+                            'cycle', 'autoregressive', 'irregular',
+                            'stochastic_level', 'stochastic_trend',
+                            'stochastic_seasonal', 'stochastic_freq_seasonal',
+                            'stochastic_cycle',
+                            'damped_cycle', 'cycle_period_bounds',
+                            'mle_regression'] + list(kwargs.keys())
+
+        # Initialize the state
         self.initialize_default()

+    def _get_init_kwds(self):
+        # Get keywords based on model attributes
+        kwds = super(UnobservedComponents, self)._get_init_kwds()
+
+        # Modifications
+        if self.trend_specification is not None:
+            kwds['level'] = self.trend_specification
+
+            for attr in ['irregular', 'trend', 'stochastic_level',
+                         'stochastic_trend']:
+                kwds[attr] = False
+
+        kwds['seasonal'] = self.seasonal_periods
+        kwds['freq_seasonal'] = [
+            {'period': p,
+             'harmonics': self.freq_seasonal_harmonics[ix]} for
+            ix, p in enumerate(self.freq_seasonal_periods)]
+        kwds['autoregressive'] = self.ar_order
+
+        return kwds
+
     def setup(self):
         """
         Setup the structural time series representation
         """
-        pass
+        # Initialize the ordered sets of parameters
+        self.parameters = {}
+        self.parameters_obs_intercept = {}
+        self.parameters_obs_cov = {}
+        self.parameters_transition = {}
+        self.parameters_state_cov = {}
+
+        # Initialize the fixed components of the state space matrices,
+        i = 0  # state offset
+        j = 0  # state covariance offset
+
+        if self.irregular:
+            self.parameters_obs_cov['irregular_var'] = 1
+        if self.level:
+            self.ssm['design', 0, i] = 1.
+            self.ssm['transition', i, i] = 1.
+            if self.trend:
+                self.ssm['transition', i, i+1] = 1.
+            if self.stochastic_level:
+                self.ssm['selection', i, j] = 1.
+                self.parameters_state_cov['level_var'] = 1
+                j += 1
+            i += 1
+        if self.trend:
+            self.ssm['transition', i, i] = 1.
+            if self.stochastic_trend:
+                self.ssm['selection', i, j] = 1.
+                self.parameters_state_cov['trend_var'] = 1
+                j += 1
+            i += 1
+        if self.seasonal:
+            n = self.seasonal_periods - 1
+            self.ssm['design', 0, i] = 1.
+            self.ssm['transition', i:i + n, i:i + n] = (
+                companion_matrix(np.r_[1, [1] * n]).transpose()
+            )
+            if self.stochastic_seasonal:
+                self.ssm['selection', i, j] = 1.
+                self.parameters_state_cov['seasonal_var'] = 1
+                j += 1
+            i += n
+        if self.freq_seasonal:
+            for ix, h in enumerate(self.freq_seasonal_harmonics):
+                # These are the \gamma_jt and \gamma^*_jt terms in D&K (3.8)
+                n = 2 * h
+                p = self.freq_seasonal_periods[ix]
+                lambda_p = 2 * np.pi / float(p)
+
+                t = 0  # frequency transition matrix offset
+                for block in range(1, h + 1):
+                    # ibid. eqn (3.7)
+                    self.ssm['design', 0, i+t] = 1.
+
+                    # ibid. eqn (3.8)
+                    cos_lambda_block = np.cos(lambda_p * block)
+                    sin_lambda_block = np.sin(lambda_p * block)
+                    trans = np.array([[cos_lambda_block, sin_lambda_block],
+                                      [-sin_lambda_block, cos_lambda_block]])
+                    trans_s = np.s_[i + t:i + t + 2]
+                    self.ssm['transition', trans_s, trans_s] = trans
+                    t += 2
+
+                if self.stochastic_freq_seasonal[ix]:
+                    self.ssm['selection', i:i + n, j:j + n] = np.eye(n)
+                    cov_key = 'freq_seasonal_var_{!r}'.format(ix)
+                    self.parameters_state_cov[cov_key] = 1
+                    j += n
+                i += n
+        if self.cycle:
+            self.ssm['design', 0, i] = 1.
+            self.parameters_transition['cycle_freq'] = 1
+            if self.damped_cycle:
+                self.parameters_transition['cycle_damp'] = 1
+            if self.stochastic_cycle:
+                self.ssm['selection', i:i+2, j:j+2] = np.eye(2)
+                self.parameters_state_cov['cycle_var'] = 1
+                j += 2
+            self._idx_cycle_transition = np.s_['transition', i:i+2, i:i+2]
+            i += 2
+        if self.autoregressive:
+            self.ssm['design', 0, i] = 1.
+            self.parameters_transition['ar_coeff'] = self.ar_order
+            self.parameters_state_cov['ar_var'] = 1
+            self.ssm['selection', i, j] = 1
+            self.ssm['transition', i:i+self.ar_order, i:i+self.ar_order] = (
+                companion_matrix(self.ar_order).T
+            )
+            self._idx_ar_transition = (
+                np.s_['transition', i, i:i+self.ar_order]
+            )
+            j += 1
+            i += self.ar_order
+        if self.regression:
+            if self.mle_regression:
+                self.parameters_obs_intercept['reg_coeff'] = self.k_exog
+            else:
+                design = np.repeat(self.ssm['design', :, :, 0], self.nobs,
+                                   axis=0)
+                self.ssm['design'] = design.transpose()[np.newaxis, :, :]
+                self.ssm['design', 0, i:i+self.k_exog, :] = (
+                    self.exog.transpose())
+                self.ssm['transition', i:i+self.k_exog, i:i+self.k_exog] = (
+                    np.eye(self.k_exog)
+                )
+
+                i += self.k_exog
+
+        # Update to get the actual parameter set
+        self.parameters.update(self.parameters_obs_cov)
+        self.parameters.update(self.parameters_state_cov)
+        self.parameters.update(self.parameters_transition)  # ordered last
+        self.parameters.update(self.parameters_obs_intercept)
+
+        self.k_obs_intercept = sum(self.parameters_obs_intercept.values())
+        self.k_obs_cov = sum(self.parameters_obs_cov.values())
+        self.k_transition = sum(self.parameters_transition.values())
+        self.k_state_cov = sum(self.parameters_state_cov.values())
+        self.k_params = sum(self.parameters.values())
+
+        # Other indices
+        idx = np.diag_indices(self.ssm.k_posdef)
+        self._idx_state_cov = ('state_cov', idx[0], idx[1])
+
+        # Some of the variances may be tied together (repeated parameter usage)
+        # Use list() for compatibility with python 3.5
+        param_keys = list(self.parameters_state_cov.keys())
+        self._var_repetitions = np.ones(self.k_state_cov, dtype=int)
+        if self.freq_seasonal:
+            for ix, is_stochastic in enumerate(self.stochastic_freq_seasonal):
+                if is_stochastic:
+                    num_harmonics = self.freq_seasonal_harmonics[ix]
+                    repeat_times = 2 * num_harmonics
+                    cov_key = 'freq_seasonal_var_{!r}'.format(ix)
+                    cov_ix = param_keys.index(cov_key)
+                    self._var_repetitions[cov_ix] = repeat_times
+
+        if self.stochastic_cycle and self.cycle:
+            cov_ix = param_keys.index('cycle_var')
+            self._var_repetitions[cov_ix] = 2
+        self._repeat_any_var = any(self._var_repetitions > 1)
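
The ordered parameter blocks assembled in `setup` are what drive `start_params` and `param_names` below; a sketch of inspecting them (hypothetical model, printed contents indicative only):

    # Sketch: parameters are ordered obs_cov, state_cov, transition, obs_intercept.
    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(5).standard_normal(100).cumsum()
    mod = sm.tsa.UnobservedComponents(y, level='local level', seasonal=4)
    print(mod.parameters)   # e.g. {'irregular_var': 1, 'level_var': 1, 'seasonal_var': 1}
    print(mod.param_names)  # e.g. ['sigma2.irregular', 'sigma2.level', 'sigma2.seasonal']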
+
+    def initialize_default(self, approximate_diffuse_variance=None):
+        if approximate_diffuse_variance is None:
+            approximate_diffuse_variance = self.ssm.initial_variance
+        if self.use_exact_diffuse:
+            diffuse_type = 'diffuse'
+        else:
+            diffuse_type = 'approximate_diffuse'
+
+            # Set the loglikelihood burn parameter, if not given in constructor
+            if self._loglikelihood_burn is None:
+                k_diffuse_states = (
+                    self.k_states - int(self._unused_state) - self.ar_order)
+                self.loglikelihood_burn = k_diffuse_states
+
+        init = Initialization(
+            self.k_states,
+            approximate_diffuse_variance=approximate_diffuse_variance)
+
+        if self._unused_state:
+            # If this flag is set, it means we have a model with just an
+            # irregular component and nothing else. The state is then
+            # irrelevant and we can't put it as diffuse, since then the filter
+            # will never leave the diffuse state.
+            init.set(0, 'known', constant=[0])
+        elif self.autoregressive:
+            offset = (self.level + self.trend +
+                      self._k_seasonal_states +
+                      self._k_freq_seas_states +
+                      self._k_cycle_states)
+            length = self.ar_order
+            init.set((0, offset), diffuse_type)
+            init.set((offset, offset + length), 'stationary')
+            init.set((offset + length, self.k_states), diffuse_type)
+        # If we do not have an autoregressive component, then everything has
+        # a diffuse initialization
+        else:
+            init.set(None, diffuse_type)
+
+        self.ssm.initialization = init
+
+    def clone(self, endog, exog=None, **kwargs):
+        return self._clone_from_init_kwds(endog, exog=exog, **kwargs)
+
+    @property
+    def _res_classes(self):
+        return {'fit': (UnobservedComponentsResults,
+                        UnobservedComponentsResultsWrapper)}
+
+    @property
+    def start_params(self):
+        if not hasattr(self, 'parameters'):
+            return []
+
+        # Eliminate missing data to estimate starting parameters
+        endog = self.endog
+        exog = self.exog
+        if np.any(np.isnan(endog)):
+            mask = ~np.isnan(endog).squeeze()
+            endog = endog[mask]
+            if exog is not None:
+                exog = exog[mask]
+
+        # Level / trend variances
+        # (Use the HP filter to get initial estimates of variances)
+        _start_params = {}
+        if self.level:
+            resid, trend1 = hpfilter(endog)
+
+            if self.stochastic_trend:
+                cycle2, trend2 = hpfilter(trend1)
+                _start_params['trend_var'] = np.std(trend2)**2
+                if self.stochastic_level:
+                    _start_params['level_var'] = np.std(cycle2)**2
+            elif self.stochastic_level:
+                _start_params['level_var'] = np.std(trend1)**2
+        else:
+            resid = self.ssm.endog[0]
+
+        # Regression
+        if self.regression and self.mle_regression:
+            _start_params['reg_coeff'] = (
+                np.linalg.pinv(exog).dot(resid).tolist()
+            )
+            resid = np.squeeze(
+                resid - np.dot(exog, _start_params['reg_coeff'])
+            )
+
+        # Autoregressive
+        if self.autoregressive:
+            Y = resid[self.ar_order:]
+            X = lagmat(resid, self.ar_order, trim='both')
+            _start_params['ar_coeff'] = np.linalg.pinv(X).dot(Y).tolist()
+            resid = np.squeeze(Y - np.dot(X, _start_params['ar_coeff']))
+            _start_params['ar_var'] = np.var(resid)
+
+        # The variance of the residual term can be used for all variances,
+        # just to get something in the right order of magnitude.
+        var_resid = np.var(resid)
+
+        # Seasonal
+        if self.stochastic_seasonal:
+            _start_params['seasonal_var'] = var_resid
+
+        # Frequency domain seasonal
+        for ix, is_stochastic in enumerate(self.stochastic_freq_seasonal):
+            cov_key = 'freq_seasonal_var_{!r}'.format(ix)
+            _start_params[cov_key] = var_resid
+
+        # Cyclical
+        if self.cycle:
+            _start_params['cycle_var'] = var_resid
+            # Clip this to make sure it is positive and strictly stationary
+            # (i.e. do not want negative or 1)
+            _start_params['cycle_damp'] = np.clip(
+                np.linalg.pinv(resid[:-1, None]).dot(resid[1:])[0], 0, 0.99
+            )
+
+            # Set initial period estimate to 3 years, if we know the frequency
+            # of the data observations
+            freq = self.data.freq[0] if self.data.freq is not None else ''
+            if freq == 'A':
+                _start_params['cycle_freq'] = 2 * np.pi / 3
+            elif freq == 'Q':
+                _start_params['cycle_freq'] = 2 * np.pi / 12
+            elif freq == 'M':
+                _start_params['cycle_freq'] = 2 * np.pi / 36
+            else:
+                if not np.any(np.isinf(self.cycle_frequency_bound)):
+                    _start_params['cycle_freq'] = (
+                        np.mean(self.cycle_frequency_bound))
+                elif np.isinf(self.cycle_frequency_bound[1]):
+                    _start_params['cycle_freq'] = self.cycle_frequency_bound[0]
+                else:
+                    _start_params['cycle_freq'] = self.cycle_frequency_bound[1]
+
+        # Irregular
+        if self.irregular:
+            _start_params['irregular_var'] = var_resid
+
+        # Create the starting parameter list
+        start_params = []
+        for key in self.parameters.keys():
+            if np.isscalar(_start_params[key]):
+                start_params.append(_start_params[key])
+            else:
+                start_params += _start_params[key]
+        return start_params
+
+    @property
+    def param_names(self):
+        if not hasattr(self, 'parameters'):
+            return []
+        param_names = []
+        for key in self.parameters.keys():
+            if key == 'irregular_var':
+                param_names.append('sigma2.irregular')
+            elif key == 'level_var':
+                param_names.append('sigma2.level')
+            elif key == 'trend_var':
+                param_names.append('sigma2.trend')
+            elif key == 'seasonal_var':
+                param_names.append('sigma2.seasonal')
+            elif key.startswith('freq_seasonal_var_'):
+                # There are potentially multiple frequency domain
+                # seasonal terms
+                idx_fseas_comp = int(key[-1])
+                periodicity = self.freq_seasonal_periods[idx_fseas_comp]
+                harmonics = self.freq_seasonal_harmonics[idx_fseas_comp]
+                freq_seasonal_name = "{p}({h})".format(
+                    p=repr(periodicity),
+                    h=repr(harmonics))
+                param_names.append(
+                    'sigma2.' + 'freq_seasonal_' + freq_seasonal_name)
+            elif key == 'cycle_var':
+                param_names.append('sigma2.cycle')
+            elif key == 'cycle_freq':
+                param_names.append('frequency.cycle')
+            elif key == 'cycle_damp':
+                param_names.append('damping.cycle')
+            elif key == 'ar_coeff':
+                for i in range(self.ar_order):
+                    param_names.append('ar.L%d' % (i+1))
+            elif key == 'ar_var':
+                param_names.append('sigma2.ar')
+            elif key == 'reg_coeff':
+                param_names += [
+                    'beta.%s' % self.exog_names[i]
+                    for i in range(self.k_exog)
+                ]
+            else:
+                param_names.append(key)
+        return param_names
+
+    @property
+    def state_names(self):
+        names = []
+        if self.level:
+            names.append('level')
+        if self.trend:
+            names.append('trend')
+        if self.seasonal:
+            names.append('seasonal')
+            names += ['seasonal.L%d' % i
+                      for i in range(1, self._k_seasonal_states)]
+        if self.freq_seasonal:
+            names += ['freq_seasonal.%d' % i
+                      for i in range(self._k_freq_seas_states)]
+        if self.cycle:
+            names += ['cycle', 'cycle.auxilliary']
+        if self.ar_order > 0:
+            names += ['ar.L%d' % i
+                      for i in range(1, self.ar_order + 1)]
+        if self.k_exog > 0 and not self.mle_regression:
+            names += ['beta.%s' % self.exog_names[i]
+                      for i in range(self.k_exog)]
+        if self._unused_state:
+            names += ['dummy']
+
+        return names

     def transform_params(self, unconstrained):
         """
         Transform unconstrained parameters used by the optimizer to constrained
         parameters used in likelihood evaluation
         """
-        pass
+        unconstrained = np.array(unconstrained, ndmin=1)
+        constrained = np.zeros(unconstrained.shape, dtype=unconstrained.dtype)
+
+        # Positive parameters: obs_cov, state_cov
+        offset = self.k_obs_cov + self.k_state_cov
+        constrained[:offset] = unconstrained[:offset]**2
+
+        # Cycle parameters
+        if self.cycle:
+            # Cycle frequency must be between our bounds
+            low, high = self.cycle_frequency_bound
+            constrained[offset] = (
+                1 / (1 + np.exp(-unconstrained[offset]))
+            ) * (high - low) + low
+            offset += 1
+
+            # Cycle damping (if present) must be between 0 and 1
+            if self.damped_cycle:
+                constrained[offset] = (
+                    1 / (1 + np.exp(-unconstrained[offset]))
+                )
+                offset += 1
+
+        # Autoregressive coefficients must be stationary
+        if self.autoregressive:
+            constrained[offset:offset + self.ar_order] = (
+                constrain_stationary_univariate(
+                    unconstrained[offset:offset + self.ar_order]
+                )
+            )
+            offset += self.ar_order
+
+        # Nothing to do with betas
+        constrained[offset:offset + self.k_exog] = (
+            unconstrained[offset:offset + self.k_exog]
+        )
+
+        return constrained

     def untransform_params(self, constrained):
         """
         Reverse the transformation
         """
-        pass
+        constrained = np.array(constrained, ndmin=1)
+        unconstrained = np.zeros(constrained.shape, dtype=constrained.dtype)
+
+        # Positive parameters: obs_cov, state_cov
+        offset = self.k_obs_cov + self.k_state_cov
+        unconstrained[:offset] = constrained[:offset]**0.5
+
+        # Cycle parameters
+        if self.cycle:
+            # Cycle frequency must be between our bounds
+            low, high = self.cycle_frequency_bound
+            x = (constrained[offset] - low) / (high - low)
+            unconstrained[offset] = np.log(
+                x / (1 - x)
+            )
+            offset += 1
+
+            # Cycle damping (if present) must be between 0 and 1
+            if self.damped_cycle:
+                unconstrained[offset] = np.log(
+                    constrained[offset] / (1 - constrained[offset])
+                )
+                offset += 1
+
+        # Autoregressive coefficients must be stationary
+        if self.autoregressive:
+            unconstrained[offset:offset + self.ar_order] = (
+                unconstrain_stationary_univariate(
+                    constrained[offset:offset + self.ar_order]
+                )
+            )
+            offset += self.ar_order
+
+        # Nothing to do with betas
+        unconstrained[offset:offset + self.k_exog] = (
+            constrained[offset:offset + self.k_exog]
+        )
+
+        return unconstrained
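
A quick sanity check on the two mappings above is that they invert one another for admissible values; a sketch (exact round-tripping assumes the values lie strictly inside the parameter bounds):

    # Sketch: untransform then transform should reproduce the constrained
    # parameters up to floating-point error.
    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(6).standard_normal(200).cumsum()
    mod = sm.tsa.UnobservedComponents(y, level='local level', seasonal=4)
    params = np.asarray(mod.start_params)
    roundtrip = mod.transform_params(mod.untransform_params(params))
    print(np.allclose(roundtrip, params))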
+
+    def _validate_can_fix_params(self, param_names):
+        super(UnobservedComponents, self)._validate_can_fix_params(param_names)
+
+        if 'ar_coeff' in self.parameters:
+            ar_names = ['ar.L%d' % (i+1) for i in range(self.ar_order)]
+            fix_all_ar = param_names.issuperset(ar_names)
+            fix_any_ar = len(param_names.intersection(ar_names)) > 0
+            if fix_any_ar and not fix_all_ar:
+                raise ValueError('Cannot fix individual autoregressive'
+                                 ' parameters. Must either fix all'
+                                 ' autoregressive parameters or none.')
+
+    def update(self, params, transformed=True, includes_fixed=False,
+               complex_step=False):
+        params = self.handle_params(params, transformed=transformed,
+                                    includes_fixed=includes_fixed)
+
+        offset = 0
+
+        # Observation covariance
+        if self.irregular:
+            self.ssm['obs_cov', 0, 0] = params[offset]
+            offset += 1
+
+        # State covariance
+        if self.k_state_cov > 0:
+            variances = params[offset:offset+self.k_state_cov]
+            if self._repeat_any_var:
+                variances = np.repeat(variances, self._var_repetitions)
+            self.ssm[self._idx_state_cov] = variances
+            offset += self.k_state_cov
+
+        # Cycle transition
+        if self.cycle:
+            cos_freq = np.cos(params[offset])
+            sin_freq = np.sin(params[offset])
+            cycle_transition = np.array(
+                [[cos_freq, sin_freq],
+                 [-sin_freq, cos_freq]]
+            )
+            if self.damped_cycle:
+                offset += 1
+                cycle_transition *= params[offset]
+            self.ssm[self._idx_cycle_transition] = cycle_transition
+            offset += 1
+
+        # AR transition
+        if self.autoregressive:
+            self.ssm[self._idx_ar_transition] = (
+                params[offset:offset+self.ar_order]
+            )
+            offset += self.ar_order
+
+        # Beta observation intercept
+        if self.regression:
+            if self.mle_regression:
+                self.ssm['obs_intercept'] = np.dot(
+                    self.exog,
+                    params[offset:offset+self.k_exog]
+                )[None, :]
+            offset += self.k_exog
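
Once the model is fitted, the estimated parameters line up with the names built above; a sketch (hypothetical data):

    # Sketch: print each estimated parameter next to its name.
    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(7).standard_normal(200).cumsum()
    res = sm.tsa.UnobservedComponents(y, level='local level').fit(disp=False)
    for name, value in zip(res.model.param_names, res.params):
        print(name, value)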


 class UnobservedComponentsResults(MLEResults):
@@ -573,31 +1182,50 @@ class UnobservedComponentsResults(MLEResults):
     statsmodels.tsa.statespace.mlemodel.MLEResults
     """

-    def __init__(self, model, params, filter_results, cov_type=None, **kwargs):
-        super(UnobservedComponentsResults, self).__init__(model, params,
-            filter_results, cov_type, **kwargs)
-        self.df_resid = np.inf
+    def __init__(self, model, params, filter_results, cov_type=None,
+                 **kwargs):
+        super(UnobservedComponentsResults, self).__init__(
+            model, params, filter_results, cov_type, **kwargs)
+
+        self.df_resid = np.inf  # attribute required for wald tests
+
+        # Save _init_kwds
         self._init_kwds = self.model._get_init_kwds()
-        self._k_states_by_type = {'seasonal': self.model._k_seasonal_states,
-            'freq_seasonal': self.model._k_freq_seas_states, 'cycle': self.
-            model._k_cycle_states}
-        self.specification = Bunch(**{'level': self.model.level, 'trend':
-            self.model.trend, 'seasonal_periods': self.model.
-            seasonal_periods, 'seasonal': self.model.seasonal,
+
+        # Save number of states by type
+        self._k_states_by_type = {
+            'seasonal': self.model._k_seasonal_states,
+            'freq_seasonal': self.model._k_freq_seas_states,
+            'cycle': self.model._k_cycle_states}
+
+        # Save the model specification
+        self.specification = Bunch(**{
+            # Model options
+            'level': self.model.level,
+            'trend': self.model.trend,
+            'seasonal_periods': self.model.seasonal_periods,
+            'seasonal': self.model.seasonal,
             'freq_seasonal': self.model.freq_seasonal,
             'freq_seasonal_periods': self.model.freq_seasonal_periods,
             'freq_seasonal_harmonics': self.model.freq_seasonal_harmonics,
-            'cycle': self.model.cycle, 'ar_order': self.model.ar_order,
-            'autoregressive': self.model.autoregressive, 'irregular': self.
-            model.irregular, 'stochastic_level': self.model.
-            stochastic_level, 'stochastic_trend': self.model.
-            stochastic_trend, 'stochastic_seasonal': self.model.
-            stochastic_seasonal, 'stochastic_freq_seasonal': self.model.
-            stochastic_freq_seasonal, 'stochastic_cycle': self.model.
-            stochastic_cycle, 'damped_cycle': self.model.damped_cycle,
-            'regression': self.model.regression, 'mle_regression': self.
-            model.mle_regression, 'k_exog': self.model.k_exog,
-            'trend_specification': self.model.trend_specification})
+            'cycle': self.model.cycle,
+            'ar_order': self.model.ar_order,
+            'autoregressive': self.model.autoregressive,
+            'irregular': self.model.irregular,
+            'stochastic_level': self.model.stochastic_level,
+            'stochastic_trend': self.model.stochastic_trend,
+            'stochastic_seasonal': self.model.stochastic_seasonal,
+            'stochastic_freq_seasonal': self.model.stochastic_freq_seasonal,
+            'stochastic_cycle': self.model.stochastic_cycle,
+
+            'damped_cycle': self.model.damped_cycle,
+            'regression': self.model.regression,
+            'mle_regression': self.model.mle_regression,
+            'k_exog': self.model.k_exog,
+
+            # Check for string trend/level specification
+            'trend_specification': self.model.trend_specification
+        })

     @property
     def level(self):
@@ -620,7 +1248,20 @@ class UnobservedComponentsResults(MLEResults):
             - `offset`: an integer giving the offset in the state vector where
                         this component begins
         """
-        pass
+        # If present, level is always the first component of the state vector
+        out = None
+        spec = self.specification
+        if spec.level:
+            offset = 0
+            out = Bunch(filtered=self.filtered_state[offset],
+                        filtered_cov=self.filtered_state_cov[offset, offset],
+                        smoothed=None, smoothed_cov=None,
+                        offset=offset)
+            if self.smoothed_state is not None:
+                out.smoothed = self.smoothed_state[offset]
+            if self.smoothed_state_cov is not None:
+                out.smoothed_cov = self.smoothed_state_cov[offset, offset]
+        return out
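
A sketch of pulling the smoothed level and an approximate 95% band out of a fitted results object (the model must include a level component, as here):

    # Sketch: extract the smoothed level and an approximate 95% interval.
    import numpy as np
    import statsmodels.api as sm

    y = np.random.default_rng(8).standard_normal(200).cumsum()
    res = sm.tsa.UnobservedComponents(y, level='local linear trend').fit(disp=False)

    level = res.level                     # the Bunch described above
    se = np.sqrt(level.smoothed_cov)
    lower = level.smoothed - 1.96 * se
    upper = level.smoothed + 1.96 * se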

     @property
     def trend(self):
@@ -643,7 +1284,21 @@ class UnobservedComponentsResults(MLEResults):
             - `offset`: an integer giving the offset in the state vector where
                         this component begins
         """
-        pass
+        # If present, trend is always the second component of the state vector
+        # (because level is always present if trend is present)
+        out = None
+        spec = self.specification
+        if spec.trend:
+            offset = int(spec.level)
+            out = Bunch(filtered=self.filtered_state[offset],
+                        filtered_cov=self.filtered_state_cov[offset, offset],
+                        smoothed=None, smoothed_cov=None,
+                        offset=offset)
+            if self.smoothed_state is not None:
+                out.smoothed = self.smoothed_state[offset]
+            if self.smoothed_state_cov is not None:
+                out.smoothed_cov = self.smoothed_state_cov[offset, offset]
+        return out

     @property
     def seasonal(self):
@@ -666,7 +1321,23 @@ class UnobservedComponentsResults(MLEResults):
             - `offset`: an integer giving the offset in the state vector where
                         this component begins
         """
-        pass
+        # If present, seasonal always follows level/trend (if they are present)
+        # Note that we return only the first seasonal state, but there are
+        # in fact seasonal_periods-1 seasonal states, however latter states
+        # are just lagged versions of the first seasonal state.
+        out = None
+        spec = self.specification
+        if spec.seasonal:
+            offset = int(spec.trend + spec.level)
+            out = Bunch(filtered=self.filtered_state[offset],
+                        filtered_cov=self.filtered_state_cov[offset, offset],
+                        smoothed=None, smoothed_cov=None,
+                        offset=offset)
+            if self.smoothed_state is not None:
+                out.smoothed = self.smoothed_state[offset]
+            if self.smoothed_state_cov is not None:
+                out.smoothed_cov = self.smoothed_state_cov[offset, offset]
+        return out

     @property
     def freq_seasonal(self):
@@ -689,7 +1360,59 @@ class UnobservedComponentsResults(MLEResults):
             - `offset`: an integer giving the offset in the state vector where
                         this component begins
         """
-        pass
+        # If present, freq_seasonal components always follows level/trend
+        #  and seasonal.
+
+        # There are 2 * (harmonics) seasonal states per freq_seasonal
+        # component.
+        # The sum of every other state enters the measurement equation.
+        # Additionally, there can be multiple components of this type.
+        # These facts make this property messier in implementation than the
+        # others.
+        # Fortunately, the states are conditionally mutually independent
+        # (conditional on previous timestep's states), so that the calculations
+        # of the variances are simple summations of individual variances and
+        # the calculation of the returned state is likewise a summation.
+        out = []
+        spec = self.specification
+        if spec.freq_seasonal:
+            previous_states_offset = int(spec.trend + spec.level
+                                         + self._k_states_by_type['seasonal'])
+            previous_f_seas_offset = 0
+            for ix, h in enumerate(spec.freq_seasonal_harmonics):
+                offset = previous_states_offset + previous_f_seas_offset
+
+                period = spec.freq_seasonal_periods[ix]
+
+                # Only the gamma_jt terms enter the measurement equation (cf.
+                # D&K 2012 (3.7))
+                states_in_sum = np.arange(0, 2 * h, 2)
+
+                filtered_state = np.sum(
+                    [self.filtered_state[offset + j] for j in states_in_sum],
+                    axis=0)
+                filtered_cov = np.sum(
+                    [self.filtered_state_cov[offset + j, offset + j] for j in
+                     states_in_sum], axis=0)
+
+                item = Bunch(
+                    filtered=filtered_state,
+                    filtered_cov=filtered_cov,
+                    smoothed=None, smoothed_cov=None,
+                    offset=offset,
+                    pretty_name='seasonal {p}({h})'.format(p=repr(period),
+                                                           h=repr(h)))
+                if self.smoothed_state is not None:
+                    item.smoothed = np.sum(
+                        [self.smoothed_state[offset+j] for j in states_in_sum],
+                        axis=0)
+                if self.smoothed_state_cov is not None:
+                    item.smoothed_cov = np.sum(
+                        [self.smoothed_state_cov[offset+j, offset+j]
+                         for j in states_in_sum], axis=0)
+                out.append(item)
+                previous_f_seas_offset += 2 * h
+        return out

     @property
     def cycle(self):
@@ -712,7 +1435,27 @@ class UnobservedComponentsResults(MLEResults):
             - `offset`: an integer giving the offset in the state vector where
                         this component begins
         """
-        pass
+        # If present, cycle always follows level/trend, seasonal, and freq
+        #  seasonal.
+        # Note that we return only the first cyclical state, but there are
+        # in fact 2 cyclical states. The second cyclical state is not simply
+        # a lag of the first cyclical state, but the first cyclical state is
+        # the one that enters the measurement equation.
+        out = None
+        spec = self.specification
+        if spec.cycle:
+            offset = int(spec.trend + spec.level
+                         + self._k_states_by_type['seasonal']
+                         + self._k_states_by_type['freq_seasonal'])
+            out = Bunch(filtered=self.filtered_state[offset],
+                        filtered_cov=self.filtered_state_cov[offset, offset],
+                        smoothed=None, smoothed_cov=None,
+                        offset=offset)
+            if self.smoothed_state is not None:
+                out.smoothed = self.smoothed_state[offset]
+            if self.smoothed_state_cov is not None:
+                out.smoothed_cov = self.smoothed_state_cov[offset, offset]
+        return out

     @property
     def autoregressive(self):
@@ -735,7 +1478,26 @@ class UnobservedComponentsResults(MLEResults):
             - `offset`: an integer giving the offset in the state vector where
                         this component begins
         """
-        pass
+        # If present, autoregressive always follows level/trend, seasonal,
+        # freq seasonal, and cyclical.
+        # If it is an AR(p) model, then there are p associated
+        # states, but the second - pth states are just lags of the first state.
+        out = None
+        spec = self.specification
+        if spec.autoregressive:
+            offset = int(spec.trend + spec.level
+                         + self._k_states_by_type['seasonal']
+                         + self._k_states_by_type['freq_seasonal']
+                         + self._k_states_by_type['cycle'])
+            out = Bunch(filtered=self.filtered_state[offset],
+                        filtered_cov=self.filtered_state_cov[offset, offset],
+                        smoothed=None, smoothed_cov=None,
+                        offset=offset)
+            if self.smoothed_state is not None:
+                out.smoothed = self.smoothed_state[offset]
+            if self.smoothed_state_cov is not None:
+                out.smoothed_cov = self.smoothed_state_cov[offset, offset]
+        return out

     @property
     def regression_coefficients(self):
@@ -758,11 +1520,45 @@ class UnobservedComponentsResults(MLEResults):
             - `offset`: an integer giving the offset in the state vector where
                         this component begins
         """
-        pass
-
-    def plot_components(self, which=None, alpha=0.05, observed=True, level=
-        True, trend=True, seasonal=True, freq_seasonal=True, cycle=True,
-        autoregressive=True, legend_loc='upper right', fig=None, figsize=None):
+        # If present, state-vector regression coefficients always are last
+        # (i.e. they follow level/trend, seasonal, freq seasonal, cyclical, and
+        # autoregressive states). There is one state associated with each
+        # regressor, and all are returned here.
+        out = None
+        spec = self.specification
+        if spec.regression:
+            if spec.mle_regression:
+                import warnings
+                warnings.warn('Regression coefficients estimated via maximum'
+                              ' likelihood. Estimated coefficients are'
+                              ' available in the parameters list, not as part'
+                              ' of the state vector.', OutputWarning)
+            else:
+                offset = int(spec.trend + spec.level
+                             + self._k_states_by_type['seasonal']
+                             + self._k_states_by_type['freq_seasonal']
+                             + self._k_states_by_type['cycle']
+                             + spec.ar_order)
+                start = offset
+                end = offset + spec.k_exog
+                out = Bunch(
+                    filtered=self.filtered_state[start:end],
+                    filtered_cov=self.filtered_state_cov[start:end, start:end],
+                    smoothed=None, smoothed_cov=None,
+                    offset=offset
+                )
+                if self.smoothed_state is not None:
+                    out.smoothed = self.smoothed_state[start:end]
+                if self.smoothed_state_cov is not None:
+                    out.smoothed_cov = (
+                        self.smoothed_state_cov[start:end, start:end])
+        return out
+
+    def plot_components(self, which=None, alpha=0.05,
+                        observed=True, level=True, trend=True,
+                        seasonal=True, freq_seasonal=True,
+                        cycle=True, autoregressive=True,
+                        legend_loc='upper right', fig=None, figsize=None):
         """
         Plot the estimated components of the model.

@@ -822,15 +1618,194 @@ class UnobservedComponentsResults(MLEResults):

         All plots contain (1 - `alpha`) %  confidence intervals.
         """
-        pass
+        from scipy.stats import norm
+        from statsmodels.graphics.utils import _import_mpl, create_mpl_fig
+        plt = _import_mpl()
+        fig = create_mpl_fig(fig, figsize)
+
+        # Determine which results we have
+        if which is None:
+            which = 'filtered' if self.smoothed_state is None else 'smoothed'
+
+        # Determine which plots we have
+        spec = self.specification
+
+        comp = [
+            ('level', level and spec.level),
+            ('trend', trend and spec.trend),
+            ('seasonal', seasonal and spec.seasonal),
+        ]
+
+        if freq_seasonal and spec.freq_seasonal:
+            for ix, _ in enumerate(spec.freq_seasonal_periods):
+                key = 'freq_seasonal_{!r}'.format(ix)
+                comp.append((key, True))
+
+        comp.extend(
+            [('cycle', cycle and spec.cycle),
+             ('autoregressive', autoregressive and spec.autoregressive)])
+
+        components = dict(comp)
+
+        llb = self.filter_results.loglikelihood_burn
+
+        # Number of plots
+        k_plots = observed + np.sum(list(components.values()))
+
+        # Get dates, if applicable
+        if hasattr(self.data, 'dates') and self.data.dates is not None:
+            dates = self.data.dates._mpl_repr()
+        else:
+            dates = np.arange(len(self.data.endog))
+
+        # Get the critical value for confidence intervals
+        critical_value = norm.ppf(1 - alpha / 2.)
+
+        plot_idx = 1
+
+        # Observed, predicted, confidence intervals
+        if observed:
+            ax = fig.add_subplot(k_plots, 1, plot_idx)
+            plot_idx += 1
+
+            # Plot the observed dataset
+            ax.plot(dates[llb:], self.model.endog[llb:], color='k',
+                    label='Observed')
+
+            # Get the predicted values and confidence intervals
+            predict = self.filter_results.forecasts[0]
+            std_errors = np.sqrt(self.filter_results.forecasts_error_cov[0, 0])
+            ci_lower = predict - critical_value * std_errors
+            ci_upper = predict + critical_value * std_errors
+
+            # Plot
+            ax.plot(dates[llb:], predict[llb:],
+                    label='One-step-ahead predictions')
+            ci_poly = ax.fill_between(
+                dates[llb:], ci_lower[llb:], ci_upper[llb:], alpha=0.2
+            )
+            ci_label = '$%.3g \\%%$ confidence interval' % ((1 - alpha) * 100)
+
+            # Proxy artist for fill_between legend entry
+            # See e.g. https://matplotlib.org/1.3.1/users/legend_guide.html
+            p = plt.Rectangle((0, 0), 1, 1, fc=ci_poly.get_facecolor()[0])
+
+            # Legend
+            handles, labels = ax.get_legend_handles_labels()
+            handles.append(p)
+            labels.append(ci_label)
+            ax.legend(handles, labels, loc=legend_loc)
+
+            ax.set_title('Predicted vs observed')
+
+        # Plot each component
+        for component, is_plotted in components.items():
+            if not is_plotted:
+                continue
+
+            ax = fig.add_subplot(k_plots, 1, plot_idx)
+            plot_idx += 1
+
+            try:
+                component_bunch = getattr(self, component)
+                title = component.title()
+            except AttributeError:
+                # This might be a freq_seasonal component, of which there are
+                #  possibly multiple bagged up in property freq_seasonal
+                if component.startswith('freq_seasonal_'):
+                    ix = int(component.replace('freq_seasonal_', ''))
+                    big_bunch = getattr(self, 'freq_seasonal')
+                    component_bunch = big_bunch[ix]
+                    title = component_bunch.pretty_name
+                else:
+                    raise
+
+            # Check for a valid estimation type
+            if which not in component_bunch:
+                raise ValueError('Invalid type of state estimate.')
+
+            which_cov = '%s_cov' % which
+
+            # Get the predicted values
+            value = component_bunch[which]
+
+            # Plot
+            state_label = '%s (%s)' % (title, which)
+            ax.plot(dates[llb:], value[llb:], label=state_label)
+
+            # Get confidence intervals
+            if which_cov in component_bunch:
+                std_errors = np.sqrt(component_bunch['%s_cov' % which])
+                ci_lower = value - critical_value * std_errors
+                ci_upper = value + critical_value * std_errors
+                ci_poly = ax.fill_between(
+                    dates[llb:], ci_lower[llb:], ci_upper[llb:], alpha=0.2
+                )
+                ci_label = ('$%.3g \\%%$ confidence interval'
+                            % ((1 - alpha) * 100))
+
+            # Legend
+            ax.legend(loc=legend_loc)
+
+            ax.set_title('%s component' % title)
+
+        # Add a note if first observations excluded
+        if llb > 0:
+            text = ('Note: The first %d observations are not shown, due to'
+                    ' approximate diffuse initialization.')
+            fig.text(0.1, 0.01, text % llb, fontsize='large')
+
+        return fig
+
+    @Appender(MLEResults.summary.__doc__)
+    def summary(self, alpha=.05, start=None):
+        # Create the model name
+
+        model_name = [self.specification.trend_specification]
+
+        if self.specification.seasonal:
+            seasonal_name = ('seasonal(%d)'
+                             % self.specification.seasonal_periods)
+            if self.specification.stochastic_seasonal:
+                seasonal_name = 'stochastic ' + seasonal_name
+            model_name.append(seasonal_name)
+
+        if self.specification.freq_seasonal:
+            for ix, is_stochastic in enumerate(
+                    self.specification.stochastic_freq_seasonal):
+                periodicity = self.specification.freq_seasonal_periods[ix]
+                harmonics = self.specification.freq_seasonal_harmonics[ix]
+                freq_seasonal_name = "freq_seasonal({p}({h}))".format(
+                    p=repr(periodicity),
+                    h=repr(harmonics))
+                if is_stochastic:
+                    freq_seasonal_name = 'stochastic ' + freq_seasonal_name
+                model_name.append(freq_seasonal_name)
+
+        if self.specification.cycle:
+            cycle_name = 'cycle'
+            if self.specification.stochastic_cycle:
+                cycle_name = 'stochastic ' + cycle_name
+            if self.specification.damped_cycle:
+                cycle_name = 'damped ' + cycle_name
+            model_name.append(cycle_name)
+
+        if self.specification.autoregressive:
+            autoregressive_name = 'AR(%d)' % self.specification.ar_order
+            model_name.append(autoregressive_name)
+
+        return super(UnobservedComponentsResults, self).summary(
+            alpha=alpha, start=start, title='Unobserved Components Results',
+            model_name=model_name
+        )
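A minimal usage sketch of the results methods restored above (illustration only, not part of the patch; `y` below is hypothetical synthetic data and matplotlib is assumed to be available for `plot_components`):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.structural import UnobservedComponents

    y = pd.Series(np.random.randn(120).cumsum(),
                  index=pd.date_range('2000-01-01', periods=120, freq='M'))
    mod = UnobservedComponents(y, level='local linear trend', cycle=True)
    res = mod.fit(disp=False)
    print(res.summary())           # model name assembled from the specification
    fig = res.plot_components()    # smoothed level, trend and cycle with CIs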


 class UnobservedComponentsResultsWrapper(MLEResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs, _attrs)
+    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs,
+                                   _attrs)
     _methods = {}
-    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods, _methods)
-
-
-wrap.populate_wrapper(UnobservedComponentsResultsWrapper,
-    UnobservedComponentsResults)
+    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods,
+                                     _methods)
+wrap.populate_wrapper(UnobservedComponentsResultsWrapper,  # noqa:E305
+                      UnobservedComponentsResults)
diff --git a/statsmodels/tsa/statespace/tools.py b/statsmodels/tsa/statespace/tools.py
index 039057016..43c708fe4 100644
--- a/statsmodels/tsa/statespace/tools.py
+++ b/statsmodels/tsa/statespace/tools.py
@@ -7,70 +7,118 @@ License: Simplified-BSD
 import numpy as np
 from scipy.linalg import solve_sylvester
 import pandas as pd
+
 from statsmodels.compat.pandas import Appender
 from statsmodels.tools.data import _is_using_pandas
 from scipy.linalg.blas import find_best_blas_type
-from . import _initialization, _representation, _kalman_filter, _kalman_smoother, _simulation_smoother, _cfa_simulation_smoother, _tools
+from . import (_initialization, _representation, _kalman_filter,
+               _kalman_smoother, _simulation_smoother,
+               _cfa_simulation_smoother, _tools)
+
+
 compatibility_mode = False
 has_trmm = True
-prefix_dtype_map = {'s': np.float32, 'd': np.float64, 'c': np.complex64,
-    'z': np.complex128}
-prefix_initialization_map = {'s': _initialization.sInitialization, 'd':
-    _initialization.dInitialization, 'c': _initialization.cInitialization,
-    'z': _initialization.zInitialization}
-prefix_statespace_map = {'s': _representation.sStatespace, 'd':
-    _representation.dStatespace, 'c': _representation.cStatespace, 'z':
-    _representation.zStatespace}
-prefix_kalman_filter_map = {'s': _kalman_filter.sKalmanFilter, 'd':
-    _kalman_filter.dKalmanFilter, 'c': _kalman_filter.cKalmanFilter, 'z':
-    _kalman_filter.zKalmanFilter}
-prefix_kalman_smoother_map = {'s': _kalman_smoother.sKalmanSmoother, 'd':
-    _kalman_smoother.dKalmanSmoother, 'c': _kalman_smoother.cKalmanSmoother,
-    'z': _kalman_smoother.zKalmanSmoother}
-prefix_simulation_smoother_map = {'s': _simulation_smoother.
-    sSimulationSmoother, 'd': _simulation_smoother.dSimulationSmoother, 'c':
-    _simulation_smoother.cSimulationSmoother, 'z': _simulation_smoother.
-    zSimulationSmoother}
-prefix_cfa_simulation_smoother_map = {'s': _cfa_simulation_smoother.
-    sCFASimulationSmoother, 'd': _cfa_simulation_smoother.
-    dCFASimulationSmoother, 'c': _cfa_simulation_smoother.
-    cCFASimulationSmoother, 'z': _cfa_simulation_smoother.
-    zCFASimulationSmoother}
-prefix_pacf_map = {'s': _tools.
-    _scompute_coefficients_from_multivariate_pacf, 'd': _tools.
-    _dcompute_coefficients_from_multivariate_pacf, 'c': _tools.
-    _ccompute_coefficients_from_multivariate_pacf, 'z': _tools.
-    _zcompute_coefficients_from_multivariate_pacf}
-prefix_sv_map = {'s': _tools._sconstrain_sv_less_than_one, 'd': _tools.
-    _dconstrain_sv_less_than_one, 'c': _tools._cconstrain_sv_less_than_one,
-    'z': _tools._zconstrain_sv_less_than_one}
-prefix_reorder_missing_matrix_map = {'s': _tools.sreorder_missing_matrix,
-    'd': _tools.dreorder_missing_matrix, 'c': _tools.
-    creorder_missing_matrix, 'z': _tools.zreorder_missing_matrix}
-prefix_reorder_missing_vector_map = {'s': _tools.sreorder_missing_vector,
-    'd': _tools.dreorder_missing_vector, 'c': _tools.
-    creorder_missing_vector, 'z': _tools.zreorder_missing_vector}
-prefix_copy_missing_matrix_map = {'s': _tools.scopy_missing_matrix, 'd':
-    _tools.dcopy_missing_matrix, 'c': _tools.ccopy_missing_matrix, 'z':
-    _tools.zcopy_missing_matrix}
-prefix_copy_missing_vector_map = {'s': _tools.scopy_missing_vector, 'd':
-    _tools.dcopy_missing_vector, 'c': _tools.ccopy_missing_vector, 'z':
-    _tools.zcopy_missing_vector}
-prefix_copy_index_matrix_map = {'s': _tools.scopy_index_matrix, 'd': _tools
-    .dcopy_index_matrix, 'c': _tools.ccopy_index_matrix, 'z': _tools.
-    zcopy_index_matrix}
-prefix_copy_index_vector_map = {'s': _tools.scopy_index_vector, 'd': _tools
-    .dcopy_index_vector, 'c': _tools.ccopy_index_vector, 'z': _tools.
-    zcopy_index_vector}
-prefix_compute_smoothed_state_weights_map = {'s': _tools.
-    _scompute_smoothed_state_weights, 'd': _tools.
-    _dcompute_smoothed_state_weights, 'c': _tools.
-    _ccompute_smoothed_state_weights, 'z': _tools.
-    _zcompute_smoothed_state_weights}
+prefix_dtype_map = {
+    's': np.float32, 'd': np.float64, 'c': np.complex64, 'z': np.complex128
+}
+prefix_initialization_map = {
+    's': _initialization.sInitialization,
+    'd': _initialization.dInitialization,
+    'c': _initialization.cInitialization,
+    'z': _initialization.zInitialization
+}
+prefix_statespace_map = {
+    's': _representation.sStatespace, 'd': _representation.dStatespace,
+    'c': _representation.cStatespace, 'z': _representation.zStatespace
+}
+prefix_kalman_filter_map = {
+    's': _kalman_filter.sKalmanFilter,
+    'd': _kalman_filter.dKalmanFilter,
+    'c': _kalman_filter.cKalmanFilter,
+    'z': _kalman_filter.zKalmanFilter
+}
+prefix_kalman_smoother_map = {
+    's': _kalman_smoother.sKalmanSmoother,
+    'd': _kalman_smoother.dKalmanSmoother,
+    'c': _kalman_smoother.cKalmanSmoother,
+    'z': _kalman_smoother.zKalmanSmoother
+}
+prefix_simulation_smoother_map = {
+    's': _simulation_smoother.sSimulationSmoother,
+    'd': _simulation_smoother.dSimulationSmoother,
+    'c': _simulation_smoother.cSimulationSmoother,
+    'z': _simulation_smoother.zSimulationSmoother
+}
+prefix_cfa_simulation_smoother_map = {
+    's': _cfa_simulation_smoother.sCFASimulationSmoother,
+    'd': _cfa_simulation_smoother.dCFASimulationSmoother,
+    'c': _cfa_simulation_smoother.cCFASimulationSmoother,
+    'z': _cfa_simulation_smoother.zCFASimulationSmoother
+}
+prefix_pacf_map = {
+    's': _tools._scompute_coefficients_from_multivariate_pacf,
+    'd': _tools._dcompute_coefficients_from_multivariate_pacf,
+    'c': _tools._ccompute_coefficients_from_multivariate_pacf,
+    'z': _tools._zcompute_coefficients_from_multivariate_pacf
+}
+prefix_sv_map = {
+    's': _tools._sconstrain_sv_less_than_one,
+    'd': _tools._dconstrain_sv_less_than_one,
+    'c': _tools._cconstrain_sv_less_than_one,
+    'z': _tools._zconstrain_sv_less_than_one
+}
+prefix_reorder_missing_matrix_map = {
+    's': _tools.sreorder_missing_matrix,
+    'd': _tools.dreorder_missing_matrix,
+    'c': _tools.creorder_missing_matrix,
+    'z': _tools.zreorder_missing_matrix
+}
+prefix_reorder_missing_vector_map = {
+    's': _tools.sreorder_missing_vector,
+    'd': _tools.dreorder_missing_vector,
+    'c': _tools.creorder_missing_vector,
+    'z': _tools.zreorder_missing_vector
+}
+prefix_copy_missing_matrix_map = {
+    's': _tools.scopy_missing_matrix,
+    'd': _tools.dcopy_missing_matrix,
+    'c': _tools.ccopy_missing_matrix,
+    'z': _tools.zcopy_missing_matrix
+}
+prefix_copy_missing_vector_map = {
+    's': _tools.scopy_missing_vector,
+    'd': _tools.dcopy_missing_vector,
+    'c': _tools.ccopy_missing_vector,
+    'z': _tools.zcopy_missing_vector
+}
+prefix_copy_index_matrix_map = {
+    's': _tools.scopy_index_matrix,
+    'd': _tools.dcopy_index_matrix,
+    'c': _tools.ccopy_index_matrix,
+    'z': _tools.zcopy_index_matrix
+}
+prefix_copy_index_vector_map = {
+    's': _tools.scopy_index_vector,
+    'd': _tools.dcopy_index_vector,
+    'c': _tools.ccopy_index_vector,
+    'z': _tools.zcopy_index_vector
+}
+prefix_compute_smoothed_state_weights_map = {
+    's': _tools._scompute_smoothed_state_weights,
+    'd': _tools._dcompute_smoothed_state_weights,
+    'c': _tools._ccompute_smoothed_state_weights,
+    'z': _tools._zcompute_smoothed_state_weights
+}
+
+
+def set_mode(compatibility=None):
+    if compatibility:
+        raise NotImplementedError('Compatibility mode is only available in'
+                                  ' statsmodels <= 0.9')


 def companion_matrix(polynomial):
-    """
+    r"""
     Create a companion matrix

     Parameters
@@ -96,64 +144,112 @@ def companion_matrix(polynomial):

     .. math::

-        c(L) = c_0 + c_1 L + \\dots + c_p L^p
+        c(L) = c_0 + c_1 L + \dots + c_p L^p

     returns a matrix of the form

     .. math::
-        \\begin{bmatrix}
-            \\phi_1 & 1      & 0 & \\cdots & 0 \\\\
-            \\phi_2 & 0      & 1 &        & 0 \\\\
-            \\vdots &        &   & \\ddots & 0 \\\\
-                   &        &   &        & 1 \\\\
-            \\phi_n & 0      & 0 & \\cdots & 0 \\\\
-        \\end{bmatrix}
-
-    where some or all of the :math:`\\phi_i` may be non-zero (if `polynomial` is
+        \begin{bmatrix}
+            \phi_1 & 1      & 0 & \cdots & 0 \\
+            \phi_2 & 0      & 1 &        & 0 \\
+            \vdots &        &   & \ddots & 0 \\
+                   &        &   &        & 1 \\
+            \phi_n & 0      & 0 & \cdots & 0 \\
+        \end{bmatrix}
+
+    where some or all of the :math:`\phi_i` may be non-zero (if `polynomial` is
     None, then all are equal to zero).

-    If the coefficients provided are scalars :math:`(c_0, c_1, \\dots, c_p)`,
-    then the companion matrix is an :math:`n \\times n` matrix formed with the
+    If the coefficients provided are scalars :math:`(c_0, c_1, \dots, c_p)`,
+    then the companion matrix is an :math:`n \times n` matrix formed with the
     elements in the first column defined as
-    :math:`\\phi_i = -\\frac{c_i}{c_0}, i \\in 1, \\dots, p`.
+    :math:`\phi_i = -\frac{c_i}{c_0}, i \in 1, \dots, p`.

-    If the coefficients provided are matrices :math:`(C_0, C_1, \\dots, C_p)`,
+    If the coefficients provided are matrices :math:`(C_0, C_1, \dots, C_p)`,
     each of shape :math:`(m, m)`, then the companion matrix is an
-    :math:`nm \\times nm` matrix formed with the elements in the first column
-    defined as :math:`\\phi_i = -C_0^{-1} C_i', i \\in 1, \\dots, p`.
+    :math:`nm \times nm` matrix formed with the elements in the first column
+    defined as :math:`\phi_i = -C_0^{-1} C_i', i \in 1, \dots, p`.

     It is important to understand the expected signs of the coefficients. A
     typical AR(p) model is written as:

     .. math::
-        y_t = a_1 y_{t-1} + \\dots + a_p y_{t-p} + \\varepsilon_t
+        y_t = a_1 y_{t-1} + \dots + a_p y_{t-p} + \varepsilon_t

     This can be rewritten as:

     .. math::
-        (1 - a_1 L - \\dots - a_p L^p )y_t = \\varepsilon_t \\\\
-        (1 + c_1 L + \\dots + c_p L^p )y_t = \\varepsilon_t \\\\
-        c(L) y_t = \\varepsilon_t
+        (1 - a_1 L - \dots - a_p L^p )y_t = \varepsilon_t \\
+        (1 + c_1 L + \dots + c_p L^p )y_t = \varepsilon_t \\
+        c(L) y_t = \varepsilon_t

     The coefficients from this form are defined to be :math:`c_i = - a_i`, and
     it is the :math:`c_i` coefficients that this function expects to be
     provided.
     """
-    pass
+    identity_matrix = False
+    if isinstance(polynomial, (int, np.integer)):
+        # GH 5570, allow numpy integer types, but coerce to python int
+        n = int(polynomial)
+        m = 1
+        polynomial = None
+    else:
+        n = len(polynomial) - 1
+
+        if n < 1:
+            raise ValueError("Companion matrix polynomials must include at"
+                             " least two terms.")
+
+        if isinstance(polynomial, (list, tuple)):
+            try:
+                # Note: cannot use polynomial[0] because of the special
+                # behavior associated with matrix polynomials and the constant
+                # 1, see below.
+                m = len(polynomial[1])
+            except TypeError:
+                m = 1
+
+            # Check if we just have a scalar polynomial
+            if m == 1:
+                polynomial = np.asanyarray(polynomial)
+            # Check if 1 was passed as the first argument (indicating an
+            # identity matrix)
+            elif polynomial[0] == 1:
+                polynomial[0] = np.eye(m)
+                identity_matrix = True
+        else:
+            m = 1
+            polynomial = np.asanyarray(polynomial)
+
+    matrix = np.zeros((n * m, n * m), dtype=np.asanyarray(polynomial).dtype)
+    idx = np.diag_indices((n - 1) * m)
+    idx = (idx[0], idx[1] + m)
+    matrix[idx] = 1
+    if polynomial is not None and n > 0:
+        if m == 1:
+            matrix[:, 0] = -polynomial[1:] / polynomial[0]
+        elif identity_matrix:
+            for i in range(n):
+                matrix[i * m:(i + 1) * m, :m] = -polynomial[i+1].T
+        else:
+            inv = np.linalg.inv(polynomial[0])
+            for i in range(n):
+                matrix[i * m:(i + 1) * m, :m] = -np.dot(inv, polynomial[i+1]).T
+    return matrix
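A quick sketch of the sign convention described in the docstring (illustration only): for the AR(2) model y_t = 0.5 y_{t-1} - 0.2 y_{t-2}, the lag polynomial is c(L) = 1 - 0.5 L + 0.2 L^2, so the coefficients passed in are (1, -0.5, 0.2):

    from statsmodels.tsa.statespace.tools import companion_matrix

    companion_matrix([1, -0.5, 0.2])
    # array([[ 0.5,  1. ],
    #        [-0.2,  0. ]])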


 def diff(series, k_diff=1, k_seasonal_diff=None, seasonal_periods=1):
-    """
+    r"""
     Difference a series simply and/or seasonally along the zero-th axis.

     Given a series (denoted :math:`y_t`), performs the differencing operation

     .. math::

-        \\Delta^d \\Delta_s^D y_t
+        \Delta^d \Delta_s^D y_t

     where :math:`d =` `diff`, :math:`s =` `seasonal_periods`,
-    :math:`D =` `seasonal\\_diff`, and :math:`\\Delta` is the difference
+    :math:`D =` `seasonal\_diff`, and :math:`\Delta` is the difference
     operator.

     Parameters
@@ -174,7 +270,28 @@ def diff(series, k_diff=1, k_seasonal_diff=None, seasonal_periods=1):
     differenced : ndarray
         The differenced array.
     """
-    pass
+    pandas = _is_using_pandas(series, None)
+    differenced = np.asanyarray(series) if not pandas else series
+
+    # Seasonal differencing
+    if k_seasonal_diff is not None:
+        while k_seasonal_diff > 0:
+            if not pandas:
+                differenced = (differenced[seasonal_periods:] -
+                               differenced[:-seasonal_periods])
+            else:
+                sdiffed = differenced.diff(seasonal_periods)
+                differenced = sdiffed[seasonal_periods:]
+            k_seasonal_diff -= 1
+
+    # Simple differencing
+    if not pandas:
+        differenced = np.diff(differenced, k_diff, axis=0)
+    else:
+        while k_diff > 0:
+            differenced = differenced.diff()[1:]
+            k_diff -= 1
+    return differenced
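A small sketch of the differencing helper above (illustration only); the result matches composing one seasonal difference with one simple difference by hand:

    import numpy as np
    from statsmodels.tsa.statespace.tools import diff

    y = np.arange(12, dtype=float) ** 2
    dy = diff(y, k_diff=1, k_seasonal_diff=1, seasonal_periods=4)
    np.allclose(dy, np.diff(y[4:] - y[:-4]))   # expected True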


 def concat(series, axis=0, allow_mix=False):
@@ -198,11 +315,66 @@ def concat(series, axis=0, allow_mix=False):
         The concatenated array. Will be a DataFrame if series are pandas
         objects.
     """
-    pass
+    is_pandas = np.r_[[_is_using_pandas(s, None) for s in series]]
+    ndim = np.r_[[np.ndim(s) for s in series]]
+    max_ndim = np.max(ndim)
+
+    if max_ndim > 2:
+        raise ValueError('`tools.concat` does not support arrays with 3 or'
+                         ' more dimensions.')
+
+    # Make sure the iterable is mutable
+    if isinstance(series, tuple):
+        series = list(series)
+
+    # Standardize ndim
+    for i in range(len(series)):
+        if ndim[i] == 0 and max_ndim == 1:
+            series[i] = np.atleast_1d(series[i])
+        elif ndim[i] == 0 and max_ndim == 2:
+            series[i] = np.atleast_2d(series[i])
+        elif ndim[i] == 1 and max_ndim == 2 and is_pandas[i]:
+            name = series[i].name
+            series[i] = series[i].to_frame()
+            series[i].columns = [name]
+        elif ndim[i] == 1 and max_ndim == 2 and not is_pandas[i]:
+            series[i] = np.atleast_2d(series[i]).T
+
+    if np.all(is_pandas):
+        if isinstance(series[0], pd.DataFrame):
+            base_columns = series[0].columns
+        else:
+            base_columns = pd.Index([series[0].name])
+        for i in range(1, len(series)):
+            s = series[i]
+
+            if isinstance(s, pd.DataFrame):
+                # Handle case where we were passed a dataframe and a series
+                # to concatenate, and the series did not have a name.
+                if s.columns.equals(pd.Index([None])):
+                    s.columns = base_columns[:1]
+                s_columns = s.columns
+            else:
+                s_columns = pd.Index([s.name])
+
+            if axis == 0 and not base_columns.equals(s_columns):
+                raise ValueError('Columns must match to concatenate along'
+                                 ' rows.')
+            elif axis == 1 and not series[0].index.equals(s.index):
+                raise ValueError('Index must match to concatenate along'
+                                 ' columns.')
+        concatenated = pd.concat(series, axis=axis)
+    elif np.all(~is_pandas) or allow_mix:
+        concatenated = np.concatenate(series, axis=axis)
+    else:
+        raise ValueError('Attempted to concatenate Pandas objects with'
+                         ' non-Pandas objects with `allow_mix=False`.')
+
+    return concatenated
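A brief sketch of `concat` with pandas inputs (illustration only); mixing pandas objects and plain arrays requires `allow_mix=True`:

    import pandas as pd
    from statsmodels.tsa.statespace.tools import concat

    s1 = pd.Series([1., 2., 3.], name='a')
    s2 = pd.Series([4., 5., 6.], name='b')
    df = concat([s1, s2], axis=1)   # DataFrame with columns ['a', 'b']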


 def is_invertible(polynomial, threshold=1 - 1e-10):
-    """
+    r"""
     Determine if a polynomial is invertible.

     Requires all roots of the polynomial lie inside the unit circle.
@@ -225,12 +397,12 @@ def is_invertible(polynomial, threshold=1 - 1e-10):
     Notes
     -----

-    If the coefficients provided are scalars :math:`(c_0, c_1, \\dots, c_n)`,
-    then the corresponding polynomial is :math:`c_0 + c_1 L + \\dots + c_n L^n`.
+    If the coefficients provided are scalars :math:`(c_0, c_1, \dots, c_n)`,
+    then the corresponding polynomial is :math:`c_0 + c_1 L + \dots + c_n L^n`.


-    If the coefficients provided are matrices :math:`(C_0, C_1, \\dots, C_n)`,
-    then the corresponding polynomial is :math:`C_0 + C_1 L + \\dots + C_n L^n`.
+    If the coefficients provided are matrices :math:`(C_0, C_1, \dots, C_n)`,
+    then the corresponding polynomial is :math:`C_0 + C_1 L + \dots + C_n L^n`.

     There are three equivalent methods of determining if the polynomial
     represented by the coefficients is invertible:
@@ -239,35 +411,41 @@ def is_invertible(polynomial, threshold=1 - 1e-10):

     .. math::

-        C(L) & = c_0 + c_1 L + \\dots + c_n L^n \\\\
-             & = constant (1 - \\lambda_1 L)
-                 (1 - \\lambda_2 L) \\dots (1 - \\lambda_n L)
+        C(L) & = c_0 + c_1 L + \dots + c_n L^n \\
+             & = constant (1 - \lambda_1 L)
+                 (1 - \lambda_2 L) \dots (1 - \lambda_n L)

     In order for :math:`C(L)` to be invertible, it must be that each factor
-    :math:`(1 - \\lambda_i L)` is invertible; the condition is then that
-    :math:`|\\lambda_i| < 1`, where :math:`\\lambda_i` is a root of the
+    :math:`(1 - \lambda_i L)` is invertible; the condition is then that
+    :math:`|\lambda_i| < 1`, where :math:`\lambda_i` is a root of the
     polynomial.

     The second method factorizes the polynomial into:

     .. math::

-        C(L) & = c_0 + c_1 L + \\dots + c_n L^n \\\\
-             & = constant (L - \\zeta_1) (L - \\zeta_2) \\dots (L - \\zeta_3)
+        C(L) & = c_0 + c_1 L + \dots + c_n L^n \\
+             & = constant (L - \zeta_1) (L - \zeta_2) \dots (L - \zeta_3)

-    The condition is now :math:`|\\zeta_i| > 1`, where :math:`\\zeta_i` is a root
+    The condition is now :math:`|\zeta_i| > 1`, where :math:`\zeta_i` is a root
     of the polynomial with reversed coefficients and
-    :math:`\\lambda_i = \\frac{1}{\\zeta_i}`.
+    :math:`\lambda_i = \frac{1}{\zeta_i}`.

     Finally, a companion matrix can be formed using the coefficients of the
     polynomial. Then the eigenvalues of that matrix give the roots of the
     polynomial. This last method is the one actually used.
     """
-    pass
+    # First method:
+    # np.all(np.abs(np.roots(np.r_[1, params])) < 1)
+    # Second method:
+    # np.all(np.abs(np.roots(np.r_[1, params][::-1])) > 1)
+    # Final method:
+    eigvals = np.linalg.eigvals(companion_matrix(polynomial))
+    return np.all(np.abs(eigvals) < threshold)
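For a quick illustration of the condition (not part of the patch): an MA(1) lag polynomial 1 + theta L is invertible exactly when |theta| < 1:

    from statsmodels.tsa.statespace.tools import is_invertible

    is_invertible([1, 0.5])   # True
    is_invertible([1, 1.5])   # False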


 def solve_discrete_lyapunov(a, q, complex_step=False):
-    """
+    r"""
     Solves the discrete Lyapunov equation using a bilinear transformation.

     Notes
@@ -278,7 +456,19 @@ def solve_discrete_lyapunov(a, q, complex_step=False):
     (usually the transition matrix) in order to allow complex step
     differentiation.
     """
-    pass
+    eye = np.eye(a.shape[0], dtype=a.dtype)
+    if not complex_step:
+        aH = a.conj().transpose()
+        aHI_inv = np.linalg.inv(aH + eye)
+        b = np.dot(aH - eye, aHI_inv)
+        c = 2*np.dot(np.dot(np.linalg.inv(a + eye), q), aHI_inv)
+        return solve_sylvester(b.conj().transpose(), b, -c)
+    else:
+        aH = a.transpose()
+        aHI_inv = np.linalg.inv(aH + eye)
+        b = np.dot(aH - eye, aHI_inv)
+        c = 2*np.dot(np.dot(np.linalg.inv(a + eye), q), aHI_inv)
+        return solve_sylvester(b.transpose(), b, -c)
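A small correctness sketch (illustration only, assuming the scipy-style convention A X A' - X + Q = 0): for a stable transition matrix the returned X should satisfy the discrete Lyapunov equation:

    import numpy as np
    from statsmodels.tsa.statespace.tools import solve_discrete_lyapunov

    A = np.array([[0.5, 0.1],
                  [0.0, 0.3]])
    Q = np.eye(2)
    X = solve_discrete_lyapunov(A, Q)
    np.allclose(A @ X @ A.T - X + Q, 0)   # expected True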


 def constrain_stationary_univariate(unconstrained):
@@ -307,7 +497,15 @@ def constrain_stationary_univariate(unconstrained):
        Autoregressive-moving Average Models."
        Biometrika 71 (2) (August 1): 403-404.
     """
-    pass
+
+    n = unconstrained.shape[0]
+    y = np.zeros((n, n), dtype=unconstrained.dtype)
+    r = unconstrained/((1 + unconstrained**2)**0.5)
+    for k in range(n):
+        for i in range(k):
+            y[k, i] = y[k - 1, i] + r[k] * y[k - 1, k - i - 1]
+        y[k, k] = r[k]
+    return -y[n - 1, :]


 def unconstrain_stationary_univariate(constrained):
@@ -336,11 +534,19 @@ def unconstrain_stationary_univariate(constrained):
        Autoregressive-moving Average Models."
        Biometrika 71 (2) (August 1): 403-404.
     """
-    pass
-
-
-def _constrain_sv_less_than_one_python(unconstrained, order=None, k_endog=None
-    ):
+    n = constrained.shape[0]
+    y = np.zeros((n, n), dtype=constrained.dtype)
+    y[n-1:] = -constrained
+    for k in range(n-1, 0, -1):
+        for i in range(k):
+            y[k-1, i] = (y[k, i] - y[k, k]*y[k, k-i-1]) / (1 - y[k, k]**2)
+    r = y.diagonal()
+    x = r / ((1 - r**2)**0.5)
+    return x
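The two univariate transforms above are inverses of each other, so a round trip should recover the original vector (a sketch, not part of the patch):

    import numpy as np
    from statsmodels.tsa.statespace.tools import (
        constrain_stationary_univariate, unconstrain_stationary_univariate)

    x = np.array([2.0, -1.0, 0.5])
    phi = constrain_stationary_univariate(x)                 # stationary AR(3) parameters
    np.allclose(unconstrain_stationary_univariate(phi), x)   # expected True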
+
+
+def _constrain_sv_less_than_one_python(unconstrained, order=None,
+                                       k_endog=None):
     """
     Transform arbitrary matrices to matrices with singular values less than
     one.
@@ -370,12 +576,26 @@ def _constrain_sv_less_than_one_python(unconstrained, order=None, k_endog=None
     Corresponds to Lemma 2.2 in Ansley and Kohn (1986). See
     `constrain_stationary_multivariate` for more details.
     """
-    pass
+
+    from scipy import linalg
+
+    constrained = []  # P_s,  s = 1, ..., p
+    if order is None:
+        order = len(unconstrained)
+    if k_endog is None:
+        k_endog = unconstrained[0].shape[0]
+
+    eye = np.eye(k_endog)
+    for i in range(order):
+        A = unconstrained[i]
+        B, lower = linalg.cho_factor(eye + np.dot(A, A.T), lower=True)
+        constrained.append(linalg.solve_triangular(B, A, lower=lower))
+    return constrained


 def _compute_coefficients_from_multivariate_pacf_python(
-    partial_autocorrelations, error_variance, transform_variance=False,
-    order=None, k_endog=None):
+        partial_autocorrelations, error_variance, transform_variance=False,
+        order=None, k_endog=None):
     """
     Transform matrices with singular values less than one to matrices
     corresponding to a stationary (or invertible) process.
@@ -415,12 +635,141 @@ def _compute_coefficients_from_multivariate_pacf_python(
     Corresponds to Lemma 2.1 in Ansley and Kohn (1986). See
     `constrain_stationary_multivariate` for more details.
     """
-    pass
+    from scipy import linalg
+
+    if order is None:
+        order = len(partial_autocorrelations)
+    if k_endog is None:
+        k_endog = partial_autocorrelations[0].shape[0]
+
+    # If we want to keep the provided variance but with the constrained
+    # coefficient matrices, we need to make a copy here, and then after the
+    # main loop we will transform the coefficients to match the passed variance
+    if not transform_variance:
+        initial_variance = error_variance
+        # Need to make the input variance large enough that the recursions
+        # do not lead to zero-matrices due to roundoff error, which would cause
+        # exceptions from the Cholesky decompositions.
+        # Note that this will still not always ensure positive definiteness,
+        # and for k_endog, order large enough an exception may still be raised
+        error_variance = np.eye(k_endog) * (order + k_endog)**10
+
+    forward_variances = [error_variance]   # \Sigma_s
+    backward_variances = [error_variance]  # \Sigma_s^*,  s = 0, ..., p
+    autocovariances = [error_variance]     # \Gamma_s
+    # \phi_{s,k}, s = 1, ..., p
+    #             k = 1, ..., s+1
+    forwards = []
+    # \phi_{s,k}^*
+    backwards = []
+
+    error_variance_factor = linalg.cholesky(error_variance, lower=True)
+
+    forward_factors = [error_variance_factor]
+    backward_factors = [error_variance_factor]
+
+    # We fill in the entries as follows:
+    # [1,1]
+    # [2,2], [2,1]
+    # [3,3], [3,1], [3,2]
+    # ...
+    # [p,p], [p,1], ..., [p,p-1]
+    # the last row, correctly ordered, is then used as the coefficients
+    for s in range(order):  # s = 0, ..., p-1
+        prev_forwards = forwards
+        prev_backwards = backwards
+        forwards = []
+        backwards = []
+
+        # Create the "last" (k = s+1) matrix
+        # Note: this is for k = s+1. However, below we then have to fill
+        # in for k = 1, ..., s in order.
+        # P L*^{-1} = x
+        # x L* = P
+        # L*' x' = P'
+        forwards.append(
+            linalg.solve_triangular(
+                backward_factors[s], partial_autocorrelations[s].T,
+                lower=True, trans='T'))
+        forwards[0] = np.dot(forward_factors[s], forwards[0].T)
+
+        # P' L^{-1} = x
+        # x L = P'
+        # L' x' = P
+        backwards.append(
+            linalg.solve_triangular(
+                forward_factors[s], partial_autocorrelations[s],
+                lower=True, trans='T'))
+        backwards[0] = np.dot(backward_factors[s], backwards[0].T)
+
+        # Update the variance
+        # Note: if s >= 1, this will be further updated in the for loop
+        # below
+        # Also, this calculation will be re-used in the forward variance
+        tmp = np.dot(forwards[0], backward_variances[s])
+        autocovariances.append(tmp.copy().T)
+
+        # Create the remaining k = 1, ..., s matrices,
+        # only has an effect if s >= 1
+        for k in range(s):
+            forwards.insert(k, prev_forwards[k] - np.dot(
+                forwards[-1], prev_backwards[s-(k+1)]))
+
+            backwards.insert(k, prev_backwards[k] - np.dot(
+                backwards[-1], prev_forwards[s-(k+1)]))
+
+            autocovariances[s+1] += np.dot(autocovariances[k+1],
+                                           prev_forwards[s-(k+1)].T)
+
+        # Create forward and backwards variances
+        forward_variances.append(
+            forward_variances[s] - np.dot(tmp, forwards[s].T)
+        )
+        backward_variances.append(
+            backward_variances[s] -
+            np.dot(
+                np.dot(backwards[s], forward_variances[s]),
+                backwards[s].T
+            )
+        )
+
+        # Cholesky factors
+        forward_factors.append(
+            linalg.cholesky(forward_variances[s+1], lower=True)
+        )
+        backward_factors.append(
+            linalg.cholesky(backward_variances[s+1], lower=True)
+        )
+
+    # If we do not want to use the transformed variance, we need to
+    # adjust the constrained matrices, as presented in Lemma 2.3, see above
+    variance = forward_variances[-1]
+    if not transform_variance:
+        # Here, we need to construct T such that:
+        # variance = T * initial_variance * T'
+        # To do that, consider the Cholesky of variance (L) and
+        # input_variance (M) to get:
+        # L L' = T M M' T' = (TM) (TM)'
+        # => L = T M
+        # => L M^{-1} = T
+        initial_variance_factor = np.linalg.cholesky(initial_variance)
+        transformed_variance_factor = np.linalg.cholesky(variance)
+        transform = np.dot(initial_variance_factor,
+                           np.linalg.inv(transformed_variance_factor))
+        inv_transform = np.linalg.inv(transform)
+
+        for i in range(order):
+            forwards[i] = (
+                np.dot(np.dot(transform, forwards[i]), inv_transform)
+            )
+
+    return forwards, variance


 def constrain_stationary_multivariate_python(unconstrained, error_variance,
-    transform_variance=False, prefix=None):
-    """
+                                             transform_variance=False,
+                                             prefix=None):
+    r"""
     Transform unconstrained parameters used by the optimizer to constrained
     parameters used in likelihood evaluation for a vector autoregression.

@@ -457,17 +806,17 @@ def constrain_stationary_multivariate_python(unconstrained, error_variance,
     Notes
     -----
     In the notation of [1]_, the arguments `(variance, unconstrained)` are
-    written as :math:`(\\Sigma, A_1, \\dots, A_p)`, where :math:`p` is the order
+    written as :math:`(\Sigma, A_1, \dots, A_p)`, where :math:`p` is the order
     of the vector autoregression, and is here determined by the length of
     the `unconstrained` argument.

     There are two steps in the constraining algorithm.

-    First, :math:`(A_1, \\dots, A_p)` are transformed into
-    :math:`(P_1, \\dots, P_p)` via Lemma 2.2 of [1]_.
+    First, :math:`(A_1, \dots, A_p)` are transformed into
+    :math:`(P_1, \dots, P_p)` via Lemma 2.2 of [1]_.

-    Second, :math:`(\\Sigma, P_1, \\dots, P_p)` are transformed into
-    :math:`(\\Sigma, \\phi_1, \\dots, \\phi_p)` via Lemmas 2.1 and 2.3 of [1]_.
+    Second, :math:`(\Sigma, P_1, \dots, P_p)` are transformed into
+    :math:`(\Sigma, \phi_1, \dots, \phi_p)` via Lemmas 2.1 and 2.3 of [1]_.

     If `transform_variance=True`, then only Lemma 2.1 is applied in the second
     step.
@@ -486,7 +835,85 @@ def constrain_stationary_multivariate_python(unconstrained, error_variance,
        In Proceedings of the Business and Economic Statistics Section, 349-53.
        American Statistical Association
     """
-    pass
+
+    use_list = type(unconstrained) is list
+    if not use_list:
+        k_endog, order = unconstrained.shape
+        order //= k_endog
+
+        unconstrained = [
+            unconstrained[:k_endog, i*k_endog:(i+1)*k_endog]
+            for i in range(order)
+        ]
+
+    order = len(unconstrained)
+    k_endog = unconstrained[0].shape[0]
+
+    # Step 1: convert from arbitrary matrices to those with singular values
+    # less than one.
+    sv_constrained = _constrain_sv_less_than_one_python(
+        unconstrained, order, k_endog)
+
+    # Step 2: convert matrices from our "partial autocorrelation matrix" space
+    # (matrices with singular values less than one) to the space of stationary
+    # coefficient matrices
+    constrained, var = _compute_coefficients_from_multivariate_pacf_python(
+        sv_constrained, error_variance, transform_variance, order, k_endog)
+
+    if not use_list:
+        constrained = np.concatenate(constrained, axis=1).reshape(
+            k_endog, k_endog * order)
+
+    return constrained, var
+
+
+@Appender(constrain_stationary_multivariate_python.__doc__)
+def constrain_stationary_multivariate(unconstrained, variance,
+                                      transform_variance=False,
+                                      prefix=None):
+
+    use_list = type(unconstrained) is list
+    if use_list:
+        unconstrained = np.concatenate(unconstrained, axis=1)
+
+    k_endog, order = unconstrained.shape
+    order //= k_endog
+
+    if order < 1:
+        raise ValueError('Must have order at least 1')
+    if k_endog < 1:
+        raise ValueError('Must have at least 1 endogenous variable')
+
+    if prefix is None:
+        prefix, dtype, _ = find_best_blas_type(
+            [unconstrained, variance])
+    dtype = prefix_dtype_map[prefix]
+
+    unconstrained = np.asfortranarray(unconstrained, dtype=dtype)
+    variance = np.asfortranarray(variance, dtype=dtype)
+
+    # Step 1: convert from arbitrary matrices to those with singular values
+    # less than one.
+    # sv_constrained = _constrain_sv_less_than_one(unconstrained, order,
+    #                                              k_endog, prefix)
+    sv_constrained = prefix_sv_map[prefix](unconstrained, order, k_endog)
+
+    # Step 2: convert matrices from our "partial autocorrelation matrix"
+    # space (matrices with singular values less than one) to the space of
+    # stationary coefficient matrices
+    constrained, variance = prefix_pacf_map[prefix](
+        sv_constrained, variance, transform_variance, order, k_endog)
+
+    constrained = np.array(constrained, dtype=dtype)
+    variance = np.array(variance, dtype=dtype)
+
+    if use_list:
+        constrained = [
+            constrained[:k_endog, i*k_endog:(i+1)*k_endog]
+            for i in range(order)
+        ]
+
+    return constrained, variance


 def _unconstrain_sv_less_than_one(constrained, order=None, k_endog=None):
@@ -519,11 +946,27 @@ def _unconstrain_sv_less_than_one(constrained, order=None, k_endog=None):
     Corresponds to the inverse of Lemma 2.2 in Ansley and Kohn (1986). See
     `unconstrain_stationary_multivariate` for more details.
     """
-    pass
+    from scipy import linalg
+
+    unconstrained = []  # A_s,  s = 1, ..., p
+    if order is None:
+        order = len(constrained)
+    if k_endog is None:
+        k_endog = constrained[0].shape[0]
+
+    eye = np.eye(k_endog)
+    for i in range(order):
+        P = constrained[i]
+        # B^{-1} B^{-1}' = I - P P'
+        B_inv, lower = linalg.cho_factor(eye - np.dot(P, P.T), lower=True)
+        # A = BP
+        # B^{-1} A = P
+        unconstrained.append(linalg.solve_triangular(B_inv, P, lower=lower))
+    return unconstrained


 def _compute_multivariate_sample_acovf(endog, maxlag):
-    """
+    r"""
+    Compute multivariate sample autocovariances

     Parameters
@@ -546,8 +989,8 @@ def _compute_multivariate_sample_acovf(endog, maxlag):

     .. math::

-        \\hat \\Gamma(s) = \\frac{1}{n} \\sum_{t=1}^{n-s}
-        (Z_t - \\bar Z) (Z_{t+s} - \\bar Z)'
+        \hat \Gamma(s) = \frac{1}{n} \sum_{t=1}^{n-s}
+        (Z_t - \bar Z) (Z_{t+s} - \bar Z)'

     See page 353 of Wei (1990). This function is primarily implemented for
     checking the partial autocorrelation functions below, and so is quite slow.
@@ -558,12 +1001,29 @@ def _compute_multivariate_sample_acovf(endog, maxlag):
        Time Series Analysis : Univariate and Multivariate Methods. Boston:
        Pearson.
     """
-    pass
+    # Get the (demeaned) data as an array
+    endog = np.array(endog)
+    if endog.ndim == 1:
+        endog = endog[:, np.newaxis]
+    endog -= np.mean(endog, axis=0)

+    # Dimensions
+    nobs, k_endog = endog.shape

-def _compute_multivariate_acovf_from_coefficients(coefficients,
-    error_variance, maxlag=None, forward_autocovariances=False):
-    """
+    sample_autocovariances = []
+    for s in range(maxlag + 1):
+        sample_autocovariances.append(np.zeros((k_endog, k_endog)))
+        for t in range(nobs - s):
+            sample_autocovariances[s] += np.outer(endog[t], endog[t+s])
+        sample_autocovariances[s] /= nobs
+
+    return sample_autocovariances
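As a sanity check on this private helper (illustration only), the lag-0 element coincides with the biased sample covariance matrix:

    import numpy as np
    from statsmodels.tsa.statespace.tools import _compute_multivariate_sample_acovf

    z = np.random.randn(200, 2)
    acovf = _compute_multivariate_sample_acovf(z, maxlag=2)
    np.allclose(acovf[0], np.cov(z.T, bias=True))   # expected True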
+
+
+def _compute_multivariate_acovf_from_coefficients(
+        coefficients, error_variance, maxlag=None,
+        forward_autocovariances=False):
+    r"""
     Compute multivariate autocovariances from vector autoregression coefficient
     matrices

@@ -597,24 +1057,75 @@ def _compute_multivariate_acovf_from_coefficients(coefficients,

     .. math::

-        \\Gamma(j) = E(y_t y_{t-j}')
+        \Gamma(j) = E(y_t y_{t-j}')

     for j = 1, ..., `maxlag`, unless `forward_autocovariances` is specified,
     in which case it computes:

     .. math::

-        E(y_t y_{t+j}') = \\Gamma(j)'
+        E(y_t y_{t+j}') = \Gamma(j)'

     Coefficients are assumed to be provided from the VAR model:

     .. math::
-        y_t = A_1 y_{t-1} + \\dots + A_p y_{t-p} + \\varepsilon_t
+        y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + \varepsilon_t

     Autocovariances are calculated by solving the associated discrete Lyapunov
     equation of the state space representation of the VAR process.
     """
-    pass
+    from scipy import linalg
+
+    # Convert coefficients to a list of matrices, for use in
+    # `companion_matrix`; get dimensions
+    if type(coefficients) is list:
+        order = len(coefficients)
+        k_endog = coefficients[0].shape[0]
+    else:
+        k_endog, order = coefficients.shape
+        order //= k_endog
+
+        coefficients = [
+            coefficients[:k_endog, i*k_endog:(i+1)*k_endog]
+            for i in range(order)
+        ]
+
+    if maxlag is None:
+        maxlag = order-1
+
+    # Start with VAR(p): w_{t+1} = phi_1 w_t + ... + phi_p w_{t-p+1} + u_{t+1}
+    # Then stack the VAR(p) into a VAR(1) in companion matrix form:
+    # z_{t+1} = F z_t + v_t
+    companion = companion_matrix(
+        [1] + [-np.squeeze(coefficients[i]) for i in range(order)]
+    ).T
+
+    # Compute the error variance matrix for the stacked form: E v_t v_t'
+    selected_variance = np.zeros(companion.shape)
+    selected_variance[:k_endog, :k_endog] = error_variance
+
+    # Compute the unconditional variance of z_t: E z_t z_t'
+    stacked_cov = linalg.solve_discrete_lyapunov(companion, selected_variance)
+
+    # The first (block) row of the variance of z_t gives the first p-1
+    # autocovariances of w_t: \Gamma_i = E w_t w_t+i with \Gamma_0 = Var(w_t)
+    # Note: these are okay, checked against ArmaProcess
+    autocovariances = [
+        stacked_cov[:k_endog, i*k_endog:(i+1)*k_endog]
+        for i in range(min(order, maxlag+1))
+    ]
+
+    for i in range(maxlag - (order-1)):
+        stacked_cov = np.dot(companion, stacked_cov)
+        autocovariances += [
+            stacked_cov[:k_endog, -k_endog:]
+        ]
+
+    if forward_autocovariances:
+        for i in range(len(autocovariances)):
+            autocovariances[i] = autocovariances[i].T
+
+    return autocovariances


 def _compute_multivariate_sample_pacf(endog, maxlag):
@@ -635,11 +1146,14 @@ def _compute_multivariate_sample_pacf(endog, maxlag):
         A list of the first `maxlag` sample partial autocorrelation matrices.
         Each matrix is shaped `k_endog` x `k_endog`.
     """
-    pass
+    sample_autocovariances = _compute_multivariate_sample_acovf(endog, maxlag)

+    return _compute_multivariate_pacf_from_autocovariances(
+        sample_autocovariances)

-def _compute_multivariate_pacf_from_autocovariances(autocovariances, order=
-    None, k_endog=None):
+
+def _compute_multivariate_pacf_from_autocovariances(autocovariances,
+                                                    order=None, k_endog=None):
     """
     Compute multivariate partial autocorrelations from autocovariances.

@@ -672,12 +1186,135 @@ def _compute_multivariate_pacf_from_autocovariances(autocovariances, order=
     Computes sample partial autocorrelations if sample autocovariances are
     given.
     """
-    pass
-
-
-def _compute_multivariate_pacf_from_coefficients(constrained,
-    error_variance, order=None, k_endog=None):
-    """
+    from scipy import linalg
+
+    if order is None:
+        order = len(autocovariances)-1
+    if k_endog is None:
+        k_endog = autocovariances[0].shape[0]
+
+    # Now apply the Ansley and Kohn (1986) algorithm, except that instead of
+    # calculating phi_{s+1, s+1} = L_s P_{s+1} {L_s^*}^{-1} (which would require
+    # the partial autocorrelation P_{s+1}, the very quantity we are trying to
+    # calculate here), we calculate it as in Ansley and Newbold (1979), using
+    # the autocovariances \Gamma_s and the forwards and backwards residual
+    # variances \Sigma_s, \Sigma_s^*:
+    # phi_{s+1, s+1} = [ \Gamma_{s+1}' - \phi_{s,1} \Gamma_s' - ... -
+    #                    \phi_{s,s} \Gamma_1' ] {\Sigma_s^*}^{-1}
+
+    # Forward and backward variances
+    forward_variances = []   # \Sigma_s
+    backward_variances = []  # \Sigma_s^*,  s = 0, ..., p
+    # \phi_{s,k}, s = 1, ..., p
+    #             k = 1, ..., s+1
+    forwards = []
+    # \phi_{s,k}^*
+    backwards = []
+
+    forward_factors = []   # L_s
+    backward_factors = []  # L_s^*,  s = 0, ..., p
+
+    # Ultimately we want to construct the partial autocorrelation matrices
+    # Note that this is "1-indexed" in the sense that it stores P_1, ... P_p
+    # rather than starting with P_0.
+    partial_autocorrelations = []
+
+    # We fill in the entries of phi_{s,k} as follows:
+    # [1,1]
+    # [2,2], [2,1]
+    # [3,3], [3,1], [3,2]
+    # ...
+    # [p,p], [p,1], ..., [p,p-1]
+    # the last row, correctly ordered, should be the same as the coefficient
+    # matrices provided in the argument `constrained`
+    for s in range(order):  # s = 0, ..., p-1
+        prev_forwards = list(forwards)
+        prev_backwards = list(backwards)
+        forwards = []
+        backwards = []
+
+        # Create forward and backwards variances Sigma_s, Sigma*_s
+        forward_variance = autocovariances[0].copy()
+        backward_variance = autocovariances[0].T.copy()
+
+        for k in range(s):
+            forward_variance -= np.dot(prev_forwards[k],
+                                       autocovariances[k+1])
+            backward_variance -= np.dot(prev_backwards[k],
+                                        autocovariances[k+1].T)
+
+        forward_variances.append(forward_variance)
+        backward_variances.append(backward_variance)
+
+        # Cholesky factors
+        forward_factors.append(
+            linalg.cholesky(forward_variances[s], lower=True)
+        )
+        backward_factors.append(
+            linalg.cholesky(backward_variances[s], lower=True)
+        )
+
+        # Create the intermediate sum term
+        if s == 0:
+            # phi_11 = \Gamma_1' \Gamma_0^{-1}
+            # phi_11 \Gamma_0 = \Gamma_1'
+            # \Gamma_0 phi_11' = \Gamma_1
+            forwards.append(linalg.cho_solve(
+                (forward_factors[0], True), autocovariances[1]).T)
+            # backwards.append(forwards[-1])
+            # phi_11_star = \Gamma_1 \Gamma_0^{-1}
+            # phi_11_star \Gamma_0 = \Gamma_1
+            # \Gamma_0 phi_11_star' = \Gamma_1'
+            backwards.append(linalg.cho_solve(
+                (backward_factors[0], True), autocovariances[1].T).T)
+        else:
+            # G := \Gamma_{s+1}' -
+            #      \phi_{s,1} \Gamma_s' - .. - \phi_{s,s} \Gamma_1'
+            tmp_sum = autocovariances[s+1].T.copy()
+
+            for k in range(s):
+                tmp_sum -= np.dot(prev_forwards[k], autocovariances[s-k].T)
+
+            # Create the "last" (k = s+1) matrix
+            # Note: this is for k = s+1. However, below we then have to
+            # fill in for k = 1, ..., s in order.
+            # phi = G Sigma*^{-1}
+            # phi Sigma* = G
+            # Sigma*' phi' = G'
+            # Sigma* phi' = G'
+            # (because Sigma* is symmetric)
+            forwards.append(linalg.cho_solve(
+                (backward_factors[s], True), tmp_sum.T).T)
+
+            # phi = G' Sigma^{-1}
+            # phi Sigma = G'
+            # Sigma' phi' = G
+            # Sigma phi' = G
+            # (because Sigma is symmetric)
+            backwards.append(linalg.cho_solve(
+                (forward_factors[s], True), tmp_sum).T)
+
+        # Create the remaining k = 1, ..., s matrices,
+        # only has an effect if s >= 1
+        for k in range(s):
+            forwards.insert(k, prev_forwards[k] - np.dot(
+                forwards[-1], prev_backwards[s-(k+1)]))
+            backwards.insert(k, prev_backwards[k] - np.dot(
+                backwards[-1], prev_forwards[s-(k+1)]))
+
+        # Partial autocorrelation matrix: P_{s+1}
+        # P = L^{-1} phi L*
+        # L P = (phi L*)
+        partial_autocorrelations.append(linalg.solve_triangular(
+            forward_factors[s], np.dot(forwards[s], backward_factors[s]),
+            lower=True))
+
+    return partial_autocorrelations
+
+
+def _compute_multivariate_pacf_from_coefficients(constrained, error_variance,
+                                                 order=None, k_endog=None):
+    r"""
     Transform matrices corresponding to a stationary (or invertible) process
     to matrices with singular values less than one.

@@ -717,9 +1354,26 @@ def _compute_multivariate_pacf_from_coefficients(constrained,
     Coefficients are assumed to be provided from the VAR model:

     .. math::
-        y_t = A_1 y_{t-1} + \\dots + A_p y_{t-p} + \\varepsilon_t
+        y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + \varepsilon_t
     """
-    pass
+
+    if type(constrained) is list:
+        order = len(constrained)
+        k_endog = constrained[0].shape[0]
+    else:
+        k_endog, order = constrained.shape
+        order //= k_endog
+
+    # Get autocovariances for the process; these are defined to be
+    # E z_t z_{t-j}'
+    # However, we want E z_t z_{t+j}' = (E z_t z_{t-j}')'
+    _acovf = _compute_multivariate_acovf_from_coefficients
+
+    autocovariances = [
+        autocovariance.T for autocovariance in
+        _acovf(constrained, error_variance, maxlag=order)]
+
+    return _compute_multivariate_pacf_from_autocovariances(autocovariances)


 def unconstrain_stationary_multivariate(constrained, error_variance):
@@ -760,7 +1414,34 @@ def unconstrain_stationary_multivariate(constrained, error_variance):
        to Enforce Stationarity."
        Journal of Statistical Computation and Simulation 24 (2): 99-106.
     """
-    pass
+    use_list = type(constrained) is list
+    if not use_list:
+        k_endog, order = constrained.shape
+        order //= k_endog
+
+        constrained = [
+            constrained[:k_endog, i*k_endog:(i+1)*k_endog]
+            for i in range(order)
+        ]
+    else:
+        order = len(constrained)
+        k_endog = constrained[0].shape[0]
+
+    # Step 1: convert matrices from the space of stationary
+    # coefficient matrices to our "partial autocorrelation matrix" space
+    # (matrices with singular values less than one)
+    partial_autocorrelations = _compute_multivariate_pacf_from_coefficients(
+        constrained, error_variance, order, k_endog)
+
+    # Step 2: convert from arbitrary matrices to those with singular values
+    # less than one.
+    unconstrained = _unconstrain_sv_less_than_one(
+        partial_autocorrelations, order, k_endog)
+
+    if not use_list:
+        unconstrained = np.concatenate(unconstrained, axis=1)
+
+    return unconstrained, error_variance
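A round-trip sketch through the pure-Python constraining transform defined earlier in this patch and the unconstraining transform above (illustration only; the compiled `constrain_stationary_multivariate` wrapper is expected to behave the same way):

    import numpy as np
    from statsmodels.tsa.statespace.tools import (
        constrain_stationary_multivariate_python,
        unconstrain_stationary_multivariate)

    np.random.seed(0)
    k_endog, order = 2, 2
    A = np.random.randn(k_endog, k_endog * order)   # arbitrary unconstrained matrices
    sigma = np.eye(k_endog)
    phi, _ = constrain_stationary_multivariate_python(A, sigma)
    A2, _ = unconstrain_stationary_multivariate(phi, sigma)
    np.allclose(A, A2)   # expected True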


 def validate_matrix_shape(name, shape, nrows, ncols, nobs):
@@ -787,7 +1468,33 @@ def validate_matrix_shape(name, shape, nrows, ncols, nobs):
     ValueError
         If the matrix is not of the desired shape.
     """
-    pass
+    ndim = len(shape)
+
+    # Enforce dimension
+    if ndim not in [2, 3]:
+        raise ValueError('Invalid value for %s matrix. Requires a'
+                         ' 2- or 3-dimensional array, got %d dimensions' %
+                         (name, ndim))
+    # Enforce the shape of the matrix
+    if not shape[0] == nrows:
+        raise ValueError('Invalid dimensions for %s matrix: requires %d'
+                         ' rows, got %d' % (name, nrows, shape[0]))
+    if not shape[1] == ncols:
+        raise ValueError('Invalid dimensions for %s matrix: requires %d'
+                         ' columns, got %d' % (name, ncols, shape[1]))
+
+    # If we do not yet know `nobs`, do not allow time-varying arrays
+    if nobs is None and not (ndim == 2 or shape[-1] == 1):
+        raise ValueError('Invalid dimensions for %s matrix: time-varying'
+                         ' matrices cannot be given unless `nobs` is specified'
+                         ' (implicitly when a dataset is bound or else set'
+                         ' explicitly)' % name)
+
+    # Enforce time-varying array size
+    if ndim == 3 and nobs is not None and not shape[-1] in [1, nobs]:
+        raise ValueError('Invalid dimensions for time-varying %s'
+                         ' matrix. Requires shape (*,*,%d), got %s' %
+                         (name, nobs, str(shape)))
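
A quick, hedged sketch of the validation behavior implemented above (the 'design' name and the shapes are arbitrary examples):

    from statsmodels.tsa.statespace.tools import validate_matrix_shape

    # A time-invariant 2 x 3 matrix of the expected shape passes silently
    validate_matrix_shape('design', (2, 3), nrows=2, ncols=3, nobs=10)

    # A time-varying matrix whose last dimension is neither 1 nor nobs raises
    try:
        validate_matrix_shape('design', (2, 3, 7), nrows=2, ncols=3, nobs=10)
    except ValueError as exc:
        print(exc)  # Invalid dimensions for time-varying design matrix ...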


 def validate_vector_shape(name, shape, nrows, nobs):
@@ -812,11 +1519,34 @@ def validate_vector_shape(name, shape, nrows, nobs):
     ValueError
         If the vector is not of the desired shape.
     """
-    pass
+    ndim = len(shape)
+    # Enforce dimension
+    if ndim not in [1, 2]:
+        raise ValueError('Invalid value for %s vector. Requires a'
+                         ' 1- or 2-dimensional array, got %d dimensions' %
+                         (name, ndim))
+    # Enforce the shape of the vector
+    if not shape[0] == nrows:
+        raise ValueError('Invalid dimensions for %s vector: requires %d'
+                         ' rows, got %d' % (name, nrows, shape[0]))
+
+    # If we do not yet know `nobs`, do not allow time-varying arrays
+    if nobs is None and not (ndim == 1 or shape[-1] == 1):
+        raise ValueError('Invalid dimensions for %s vector: time-varying'
+                         ' vectors cannot be given unless `nobs` is specified'
+                         ' (implicitly when a dataset is bound or else set'
+                         ' explicitly)' % name)
+
+    # Enforce time-varying array size
+    if ndim == 2 and not shape[1] in [1, nobs]:
+        raise ValueError('Invalid dimensions for time-varying %s'
+                         ' vector. Requires shape (*,%d), got %s' %
+                         (name, nobs, str(shape)))


 def reorder_missing_matrix(matrix, missing, reorder_rows=False,
-    reorder_cols=False, is_diagonal=False, inplace=False, prefix=None):
+                           reorder_cols=False, is_diagonal=False,
+                           inplace=False, prefix=None):
     """
     Reorder the rows or columns of a time-varying matrix where all non-missing
     values are in the upper left corner of the matrix.
@@ -848,7 +1578,17 @@ def reorder_missing_matrix(matrix, missing, reorder_rows=False,
     reordered_matrix : array_like
         The reordered matrix.
     """
-    pass
+    if prefix is None:
+        prefix = find_best_blas_type((matrix,))[0]
+    reorder = prefix_reorder_missing_matrix_map[prefix]
+
+    if not inplace:
+        matrix = np.copy(matrix, order='F')
+
+    reorder(matrix, np.asfortranarray(missing), reorder_rows, reorder_cols,
+            is_diagonal)
+
+    return matrix


 def reorder_missing_vector(vector, missing, inplace=False, prefix=None):
@@ -873,11 +1613,20 @@ def reorder_missing_vector(vector, missing, inplace=False, prefix=None):
     reordered_vector : array_like
         The reordered vector.
     """
-    pass
+    if prefix is None:
+        prefix = find_best_blas_type((vector,))[0]
+    reorder = prefix_reorder_missing_vector_map[prefix]

+    if not inplace:
+        vector = np.copy(vector, order='F')

-def copy_missing_matrix(A, B, missing, missing_rows=False, missing_cols=
-    False, is_diagonal=False, inplace=False, prefix=None):
+    reorder(vector, np.asfortranarray(missing))
+
+    return vector
+
+
+def copy_missing_matrix(A, B, missing, missing_rows=False, missing_cols=False,
+                        is_diagonal=False, inplace=False, prefix=None):
     """
     Copy the rows or columns of a time-varying matrix where all non-missing
     values are in the upper left corner of the matrix.
@@ -912,7 +1661,25 @@ def copy_missing_matrix(A, B, missing, missing_rows=False, missing_cols=
     copied_matrix : array_like
         The matrix B with the non-missing submatrix of A copied onto it.
     """
-    pass
+    if prefix is None:
+        prefix = find_best_blas_type((A, B))[0]
+    copy = prefix_copy_missing_matrix_map[prefix]
+
+    if not inplace:
+        B = np.copy(B, order='F')
+
+    # We may have been given an F-contiguous memoryview; in that case, we do
+    # not want to alter it or convert it to a numpy array
+    try:
+        if not A.is_f_contig():
+            raise ValueError()
+    except (AttributeError, ValueError):
+        A = np.asfortranarray(A)
+
+    copy(A, B, np.asfortranarray(missing), missing_rows, missing_cols,
+         is_diagonal)
+
+    return B


 def copy_missing_vector(a, b, missing, inplace=False, prefix=None):
@@ -939,11 +1706,28 @@ def copy_missing_vector(a, b, missing, inplace=False, prefix=None):
     copied_vector : array_like
         The vector b with the non-missing subvector of a copied onto it.
     """
-    pass
+    if prefix is None:
+        prefix = find_best_blas_type((a, b))[0]
+    copy = prefix_copy_missing_vector_map[prefix]
+
+    if not inplace:
+        b = np.copy(b, order='F')
+
+    # We may have been given an F-contiguous memoryview; in that case, we do
+    # not want to alter it or convert it to a numpy array
+    try:
+        if not a.is_f_contig():
+            raise ValueError()
+    except (AttributeError, ValueError):
+        a = np.asfortranarray(a)
+
+    copy(a, b, np.asfortranarray(missing))
+
+    return b


 def copy_index_matrix(A, B, index, index_rows=False, index_cols=False,
-    is_diagonal=False, inplace=False, prefix=None):
+                      is_diagonal=False, inplace=False, prefix=None):
     """
     Copy the rows or columns of a time-varying matrix where all non-index
     values are in the upper left corner of the matrix.
@@ -978,7 +1762,25 @@ def copy_index_matrix(A, B, index, index_rows=False, index_cols=False,
     copied_matrix : array_like
         The matrix B with the non-index submatrix of A copied onto it.
     """
-    pass
+    if prefix is None:
+        prefix = find_best_blas_type((A, B))[0]
+    copy = prefix_copy_index_matrix_map[prefix]
+
+    if not inplace:
+        B = np.copy(B, order='F')
+
+    # We may have been given an F-contiguous memoryview; in that case, we do
+    # not want to alter it or convert it to a numpy array
+    try:
+        if not A.is_f_contig():
+            raise ValueError()
+    except (AttributeError, ValueError):
+        A = np.asfortranarray(A)
+
+    copy(A, B, np.asfortranarray(index), index_rows, index_cols,
+         is_diagonal)
+
+    return B


 def copy_index_vector(a, b, index, inplace=False, prefix=None):
@@ -1005,17 +1807,176 @@ def copy_index_vector(a, b, index, inplace=False, prefix=None):
     copied_vector : array_like
         The vector b with the non-index subvector of a copied onto it.
     """
-    pass
+    if prefix is None:
+        prefix = find_best_blas_type((a, b))[0]
+    copy = prefix_copy_index_vector_map[prefix]
+
+    if not inplace:
+        b = np.copy(b, order='F')
+
+    # We may have been given an F-contiguous memoryview; in that case, we do
+    # not want to alter it or convert it to a numpy array
+    try:
+        if not a.is_f_contig():
+            raise ValueError()
+    except (AttributeError, ValueError):
+        a = np.asfortranarray(a)
+
+    copy(a, b, np.asfortranarray(index))
+
+    return b
+
+
+def prepare_exog(exog):
+    k_exog = 0
+    if exog is not None:
+        exog_is_using_pandas = _is_using_pandas(exog, None)
+        if not exog_is_using_pandas:
+            exog = np.asarray(exog)
+
+        # Make sure we have 2-dimensional array
+        if exog.ndim == 1:
+            if not exog_is_using_pandas:
+                exog = exog[:, None]
+            else:
+                exog = pd.DataFrame(exog)
+
+        k_exog = exog.shape[1]
+    return (k_exog, exog)
+
+
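
For reference, a minimal sketch of what `prepare_exog` returns (the data are arbitrary):

    import numpy as np
    from statsmodels.tsa.statespace.tools import prepare_exog

    k_exog, exog = prepare_exog(np.arange(5.0))
    # k_exog == 1 and exog has shape (5, 1): a single regressor column
    k_exog, exog = prepare_exog(None)
    # with no regressors, the count is 0 and None is passed through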
+def prepare_trend_spec(trend):
+    # Trend
+    if trend is None or trend == 'n':
+        polynomial_trend = np.ones(0)
+    elif trend == 'c':
+        polynomial_trend = np.r_[1]
+    elif trend == 't':
+        polynomial_trend = np.r_[0, 1]
+    elif trend == 'ct':
+        polynomial_trend = np.r_[1, 1]
+    elif trend == 'ctt':
+        # TODO deprecate ctt?
+        polynomial_trend = np.r_[1, 1, 1]
+    else:
+        trend = np.array(trend)
+        if trend.ndim > 0:
+            polynomial_trend = (trend > 0).astype(int)
+        else:
+            raise ValueError(
+                "Valid trend inputs are 'c' (constant), 't' (linear trend in "
+                "time), 'ct' (both), 'ctt' (both with trend squared) or an "
+                "iterable defining a polynomial, e.g., [1, 1, 0, 1] is `a + "
+                f"b*t + c*t**3`. Received {trend}"
+            )
+
+    # Note: k_trend is not the degree of the trend polynomial, because e.g.
+    # k_trend = 1 corresponds to the degree zero polynomial (with only a
+    # constant term).
+    k_trend = int(np.sum(polynomial_trend))
+
+    return polynomial_trend, k_trend
+
+
+def prepare_trend_data(polynomial_trend, k_trend, nobs, offset=1):
+    # Cache the arrays for calculating the intercept from the trend
+    # components
+    time_trend = np.arange(offset, nobs + offset)
+    trend_data = np.zeros((nobs, k_trend))
+    i = 0
+    for k in polynomial_trend.nonzero()[0]:
+        if k == 0:
+            trend_data[:, i] = np.ones(nobs,)
+        else:
+            trend_data[:, i] = time_trend**k
+        i += 1
+
+    return trend_data


 def _safe_cond(a):
     """Compute condition while protecting from LinAlgError"""
-    pass
+    try:
+        return np.linalg.cond(a)
+    except np.linalg.LinAlgError:
+        if np.any(np.isnan(a)):
+            return np.nan
+        else:
+            return np.inf
+
+
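
As a hedged sanity check of the fallback above (importing the private helper directly, purely for illustration):

    import numpy as np
    from statsmodels.tsa.statespace.tools import _safe_cond

    _safe_cond(np.eye(2))                # ordinary condition number, 1.0
    # If np.linalg.cond raises LinAlgError on NaN input (as recent numpy
    # versions do), the helper returns nan instead of propagating the error.
    _safe_cond(np.full((2, 2), np.nan))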
+def _compute_smoothed_state_weights(ssm, compute_t=None, compute_j=None,
+                                    compute_prior_weights=None, scale=1.0):
+    # Get references to the Cython objects
+    _model = ssm._statespace
+    _kfilter = ssm._kalman_filter
+    _smoother = ssm._kalman_smoother
+
+    # Determine the appropriate function for the dtype
+    func = prefix_compute_smoothed_state_weights_map[ssm.prefix]
+
+    # Handle compute_t and compute_j indexes
+    if compute_t is None:
+        compute_t = np.arange(ssm.nobs)
+    if compute_j is None:
+        compute_j = np.arange(ssm.nobs)
+    compute_t = np.unique(np.atleast_1d(compute_t).astype(np.int32))
+    compute_t.sort()
+    compute_j = np.unique(np.atleast_1d(compute_j).astype(np.int32))
+    compute_j.sort()
+
+    # Default setting for computing the prior weights
+    if compute_prior_weights is None:
+        compute_prior_weights = compute_j[0] == 0
+    # Validate that compute_prior_weights is valid
+    if compute_prior_weights and compute_j[0] != 0:
+        raise ValueError('If `compute_prior_weights` is set to True, then'
+                         ' `compute_j` must include the time period 0.')
+
+    # Compute the weights
+    weights, state_intercept_weights, prior_weights, _ = func(
+        _smoother, _kfilter, _model, compute_t, compute_j, scale,
+        bool(compute_prior_weights))
+
+    # Re-order missing entries correctly and transpose to the appropriate
+    # shape
+    t0 = min(compute_t[0], compute_j[0])
+    missing = np.isnan(ssm.endog[:, t0:])
+    if np.any(missing):
+        shape = weights.shape
+        # Transpose m, p, t, j -> t, m, p, j so that we can use the
+        # `reorder_missing_matrix` function
+        weights = np.asfortranarray(weights.transpose(2, 0, 1, 3).reshape(
+            shape[2] * shape[0], shape[1], shape[3], order='C'))
+        missing = np.asfortranarray(missing.astype(np.int32))
+        reorder_missing_matrix(weights, missing, reorder_cols=True,
+                               inplace=True)
+        # Transpose t, m, p, j -> t, j, m, p
+        weights = (weights.reshape(shape[2], shape[0], shape[1], shape[3])
+                          .transpose(0, 3, 1, 2))
+    else:
+        # Transpose m, p, t, j -> t, j, m, p
+        weights = weights.transpose(2, 3, 0, 1)
+
+    # Transpose m, l, t, j -> t, j, m, l
+    state_intercept_weights = state_intercept_weights.transpose(2, 3, 0, 1)
+
+    # Transpose m, l, t -> t, m, l
+    prior_weights = prior_weights.transpose(2, 0, 1)
+
+    # Subset to the actual computed t, j elements
+    ix_tj = np.ix_(compute_t - t0, compute_j - t0)
+    weights = weights[ix_tj]
+    state_intercept_weights = state_intercept_weights[ix_tj]
+    if compute_prior_weights:
+        prior_weights = prior_weights[compute_t - t0]
+
+    return weights, state_intercept_weights, prior_weights


 def compute_smoothed_state_weights(results, compute_t=None, compute_j=None,
-    compute_prior_weights=None, resmooth=None):
-    """
+                                   compute_prior_weights=None, resmooth=None):
+    r"""
     Construct the weights of observations and the prior on the smoothed state

     Parameters
@@ -1076,10 +2037,10 @@ def compute_smoothed_state_weights(results, compute_t=None, compute_j=None,

     .. math::

-        \\hat \\alpha_t = \\sum_{j=1}^n \\omega_{jt}^{\\hat \\alpha} y_j
+        \hat \alpha_t = \sum_{j=1}^n \omega_{jt}^{\hat \alpha} y_j

     One output of this function is the weights
-    :math:`\\omega_{jt}^{\\hat \\alpha}`. Note that the description in [1]_
+    :math:`\omega_{jt}^{\hat \alpha}`. Note that the description in [1]_
     assumes that the prior mean (or "initial state") is fixed to be zero. More
     generally, the smoothed state vector will also depend partly on the prior.
     The second output of this function are the weights of the prior mean.
@@ -1117,11 +2078,30 @@ def compute_smoothed_state_weights(results, compute_t=None, compute_j=None,
             Time Series Analysis by State Space Methods: Second Edition.
             Oxford University Press.
     """
-    pass
-
-
-def get_impact_dates(previous_model, updated_model, impact_date=None, start
-    =None, end=None, periods=None):
+    # Get the python model object
+    mod = results.model
+    # Always update the parameters to be consistent with `res`
+    mod.update(results.params)
+    # By default, resmooth if it appears the results have changed; check is
+    # based on the smoothed state vector
+    if resmooth is None:
+        resmooth = np.any(results.smoothed_state !=
+                          mod.ssm._kalman_smoother.smoothed_state)
+    # Resmooth if necessary, otherwise at least update the Cython model
+    if resmooth:
+        mod.ssm.smooth(conserve_memory=0, update_representation=False,
+                       update_filter=False, update_smoother=False)
+    else:
+        mod.ssm._initialize_representation()
+
+    return _compute_smoothed_state_weights(
+        mod.ssm, compute_t=compute_t, compute_j=compute_j,
+        compute_prior_weights=compute_prior_weights,
+        scale=results.filter_results.scale)
+
+
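
For orientation, a hedged usage sketch of the public wrapper above; the SARIMAX toy model and simulated data are illustrative only, while the wrapper and its return values come from this patch.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.tsa.statespace.tools import compute_smoothed_state_weights

    y = np.random.default_rng(0).standard_normal(100)
    res = sm.tsa.SARIMAX(y, order=(1, 0, 0)).fit(disp=False)

    # weights[t, j, m, p] is the weight of observed variable p at time j on
    # smoothed state element m at time t
    weights, state_intercept_weights, prior_weights = (
        compute_smoothed_state_weights(res))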
+def get_impact_dates(previous_model, updated_model, impact_date=None,
+                     start=None, end=None, periods=None):
     """
     Compute start/end periods and an index, often for impacts of data updates

@@ -1174,7 +2154,41 @@ def get_impact_dates(previous_model, updated_model, impact_date=None, start
     (here contained in the `updated_model`).

     """
-    pass
+    # There doesn't seem to be any universal default that both (a) makes
+    # sense for all data update combinations, and (b) works with both
+    # time-invariant and time-varying models. So we require that the user
+    # specify exactly two of start, end, periods.
+    if impact_date is not None:
+        if not (start is None and end is None and periods is None):
+            raise ValueError('Cannot use the `impact_date` argument in'
+                             ' combination with `start`, `end`, or'
+                             ' `periods`.')
+        start = impact_date
+        periods = 1
+    if start is None and end is None and periods is None:
+        start = previous_model.nobs - 1
+        end = previous_model.nobs - 1
+    if int(start is None) + int(end is None) + int(periods is None) != 1:
+        raise ValueError('Of the three parameters: start, end, and'
+                         ' periods, exactly two must be specified')
+    # If we have the `periods` object, we need to convert `start`/`end` to
+    # integers so that we can compute the other one. That's because
+    # _get_prediction_index doesn't support a `periods` argument
+    elif start is not None and periods is not None:
+        start, _, _, _ = updated_model._get_prediction_index(start, start)
+        end = start + (periods - 1)
+    elif end is not None and periods is not None:
+        _, end, _, _ = updated_model._get_prediction_index(end, end)
+        start = end - (periods - 1)
+    elif start is not None and end is not None:
+        pass
+
+    # Get the integer-based start, end and the prediction index
+    start, end, out_of_sample, prediction_index = (
+        updated_model._get_prediction_index(start, end))
+    end = end + out_of_sample
+
+    return start, end, prediction_index


 def _atleast_1d(*arys):
@@ -1185,7 +2199,21 @@ def _atleast_1d(*arys):

     1. It allows for `None` arguments, and passes them directly through
     """
-    pass
+    res = []
+    for ary in arys:
+        if ary is None:
+            result = None
+        else:
+            ary = np.asanyarray(ary)
+            if ary.ndim == 0:
+                result = ary.reshape(1)
+            else:
+                result = ary
+        res.append(result)
+    if len(res) == 1:
+        return res[0]
+    else:
+        return res


 def _atleast_2d(*arys):
@@ -1197,4 +2225,20 @@ def _atleast_2d(*arys):
     1. It allows for `None` arguments, and passes them directly through
     2. Instead of creating new axis at the beginning, it creates it at the end
     """
-    pass
+    res = []
+    for ary in arys:
+        if ary is None:
+            result = None
+        else:
+            ary = np.asanyarray(ary)
+            if ary.ndim == 0:
+                result = ary.reshape(1, 1)
+            elif ary.ndim == 1:
+                result = ary[:, np.newaxis]
+            else:
+                result = ary
+        res.append(result)
+    if len(res) == 1:
+        return res[0]
+    else:
+        return res
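
A short sketch of the two private helpers above (imported directly for illustration), highlighting how they differ from the numpy versions:

    import numpy as np
    from statsmodels.tsa.statespace.tools import _atleast_1d, _atleast_2d

    _atleast_1d(None, 5.0)            # [None, array([5.])] - None passes through
    _atleast_2d(np.arange(3)).shape   # (3, 1): the new axis is appended at the
                                      # end, unlike np.atleast_2d's (1, 3)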
diff --git a/statsmodels/tsa/statespace/varmax.py b/statsmodels/tsa/statespace/varmax.py
index a73904b39..303044d26 100644
--- a/statsmodels/tsa/statespace/varmax.py
+++ b/statsmodels/tsa/statespace/varmax.py
@@ -1,27 +1,36 @@
+# -*- coding: utf-8 -*-
 """
 Vector Autoregressive Moving Average with eXogenous regressors model

 Author: Chad Fulton
 License: Simplified-BSD
 """
+
 import contextlib
 from warnings import warn
+
 import pandas as pd
 import numpy as np
+
 from statsmodels.compat.pandas import Appender
 from statsmodels.tools.tools import Bunch
 from statsmodels.tools.data import _is_using_pandas
 from statsmodels.tsa.vector_ar import var_model
 import statsmodels.base.wrapper as wrap
 from statsmodels.tools.sm_exceptions import EstimationWarning
+
 from .kalman_filter import INVERT_UNIVARIATE, SOLVE_LU
 from .mlemodel import MLEModel, MLEResults, MLEResultsWrapper
 from .initialization import Initialization
-from .tools import is_invertible, concat, prepare_exog, constrain_stationary_multivariate, unconstrain_stationary_multivariate, prepare_trend_spec, prepare_trend_data
+from .tools import (
+    is_invertible, concat, prepare_exog,
+    constrain_stationary_multivariate, unconstrain_stationary_multivariate,
+    prepare_trend_spec, prepare_trend_data
+)


 class VARMAX(MLEModel):
-    """
+    r"""
     Vector Autoregressive Moving Average with eXogenous regressors model

     Parameters
@@ -99,15 +108,15 @@ class VARMAX(MLEModel):

     .. math::

-        y_t = A(t) + A_1 y_{t-1} + \\dots + A_p y_{t-p} + B x_t + \\epsilon_t +
-        M_1 \\epsilon_{t-1} + \\dots M_q \\epsilon_{t-q}
+        y_t = A(t) + A_1 y_{t-1} + \dots + A_p y_{t-p} + B x_t + \epsilon_t +
+        M_1 \epsilon_{t-1} + \dots M_q \epsilon_{t-q}

-    where :math:`\\epsilon_t \\sim N(0, \\Omega)`, and where :math:`y_t` is a
+    where :math:`\epsilon_t \sim N(0, \Omega)`, and where :math:`y_t` is a
     `k_endog x 1` vector. Additionally, this model allows considering the case
     where the variables are measured with error.

     Note that in the full VARMA(p,q) case there is a fundamental identification
-    problem in that the coefficient matrices :math:`\\{A_i, M_j\\}` are not
+    problem in that the coefficient matrices :math:`\{A_i, M_j\}` are not
     generally unique, meaning that for a given time series process there may
     be multiple sets of matrices that equivalently represent it. See Chapter 12
     of [1]_ for more information. Although this class can be used to estimate
@@ -122,80 +131,144 @@ class VARMAX(MLEModel):
     """

     def __init__(self, endog, exog=None, order=(1, 0), trend='c',
-        error_cov_type='unstructured', measurement_error=False,
-        enforce_stationarity=True, enforce_invertibility=True, trend_offset
-        =1, **kwargs):
+                 error_cov_type='unstructured', measurement_error=False,
+                 enforce_stationarity=True, enforce_invertibility=True,
+                 trend_offset=1, **kwargs):
+
+        # Model parameters
         self.error_cov_type = error_cov_type
         self.measurement_error = measurement_error
         self.enforce_stationarity = enforce_stationarity
         self.enforce_invertibility = enforce_invertibility
+
+        # Save the given orders
         self.order = order
+
+        # Model orders
         self.k_ar = int(order[0])
         self.k_ma = int(order[1])
+
+        # Check for valid model
         if error_cov_type not in ['diagonal', 'unstructured']:
-            raise ValueError(
-                'Invalid error covariance matrix type specification.')
+            raise ValueError('Invalid error covariance matrix type'
+                             ' specification.')
         if self.k_ar == 0 and self.k_ma == 0:
-            raise ValueError(
-                'Invalid VARMAX(p,q) specification; at least one p,q must be greater than zero.'
-                )
+            raise ValueError('Invalid VARMAX(p,q) specification; at least one'
+                             ' p,q must be greater than zero.')
+
+        # Warn for VARMA model
         if self.k_ar > 0 and self.k_ma > 0:
-            warn(
-                'Estimation of VARMA(p,q) models is not generically robust, due especially to identification issues.'
-                , EstimationWarning)
+            warn('Estimation of VARMA(p,q) models is not generically robust,'
+                 ' due especially to identification issues.',
+                 EstimationWarning)
+
+        # Trend
         self.trend = trend
         self.trend_offset = trend_offset
         self.polynomial_trend, self.k_trend = prepare_trend_spec(self.trend)
-        self._trend_is_const = (self.polynomial_trend.size == 1 and self.
-            polynomial_trend[0] == 1)
-        self.k_exog, exog = prepare_exog(exog)
+        self._trend_is_const = (self.polynomial_trend.size == 1 and
+                                self.polynomial_trend[0] == 1)
+
+        # Exogenous data
+        (self.k_exog, exog) = prepare_exog(exog)
+
+        # Note: at some point in the future might add state regression, as in
+        # SARIMAX.
         self.mle_regression = self.k_exog > 0
+
+        # We need to have an array or pandas at this point
         if not _is_using_pandas(endog, None):
             endog = np.asanyarray(endog)
+
+        # Model order
+        # Used internally in various places
         _min_k_ar = max(self.k_ar, 1)
         self._k_order = _min_k_ar + self.k_ma
+
+        # Number of states
         k_endog = endog.shape[1]
         k_posdef = k_endog
         k_states = k_endog * self._k_order
+
+        # By default, initialize as stationary
         kwargs.setdefault('initialization', 'stationary')
+
+        # By default, use LU decomposition
         kwargs.setdefault('inversion_method', INVERT_UNIVARIATE | SOLVE_LU)
-        super(VARMAX, self).__init__(endog, exog=exog, k_states=k_states,
-            k_posdef=k_posdef, **kwargs)
-        if self.k_exog > 0 or self.k_trend > 0 and not self._trend_is_const:
+
+        # Initialize the state space model
+        super(VARMAX, self).__init__(
+            endog, exog=exog, k_states=k_states, k_posdef=k_posdef, **kwargs
+        )
+
+        # Set as time-varying model if we have time-trend or exog
+        if self.k_exog > 0 or (self.k_trend > 0 and not self._trend_is_const):
             self.ssm._time_invariant = False
+
+        # Initialize the parameters
         self.parameters = {}
         self.parameters['trend'] = self.k_endog * self.k_trend
-        self.parameters['ar'] = self.k_endog ** 2 * self.k_ar
-        self.parameters['ma'] = self.k_endog ** 2 * self.k_ma
+        self.parameters['ar'] = self.k_endog**2 * self.k_ar
+        self.parameters['ma'] = self.k_endog**2 * self.k_ma
         self.parameters['regression'] = self.k_endog * self.k_exog
         if self.error_cov_type == 'diagonal':
             self.parameters['state_cov'] = self.k_endog
+        # These parameters fill in a lower-triangular matrix which is then
+        # dotted with itself to get a positive definite matrix.
         elif self.error_cov_type == 'unstructured':
-            self.parameters['state_cov'] = int(self.k_endog * (self.k_endog +
-                1) / 2)
+            self.parameters['state_cov'] = (
+                int(self.k_endog * (self.k_endog + 1) / 2)
+            )
         self.parameters['obs_cov'] = self.k_endog * self.measurement_error
         self.k_params = sum(self.parameters.values())
-        trend_data = prepare_trend_data(self.polynomial_trend, self.k_trend,
-            self.nobs + 1, offset=self.trend_offset)
+
+        # Initialize trend data: we create trend data with one more observation
+        # than we actually have, to make it easier to insert the appropriate
+        # trend component into the final state intercept.
+        trend_data = prepare_trend_data(
+            self.polynomial_trend, self.k_trend, self.nobs + 1,
+            offset=self.trend_offset)
         self._trend_data = trend_data[:-1]
         self._final_trend = trend_data[-1:]
-        if self.k_trend > 0 and not self._trend_is_const or self.k_exog > 0:
+
+        # Initialize known elements of the state space matrices
+
+        # If we have exog effects, then the state intercept needs to be
+        # time-varying
+        if (self.k_trend > 0 and not self._trend_is_const) or self.k_exog > 0:
             self.ssm['state_intercept'] = np.zeros((self.k_states, self.nobs))
+            # self.ssm['obs_intercept'] = np.zeros((self.k_endog, self.nobs))
+
+        # The design matrix is just an identity for the first k_endog states
         idx = np.diag_indices(self.k_endog)
         self.ssm[('design',) + idx] = 1
+
+        # The transition matrix is described in four blocks, where the upper
+        # left block is in companion form with the autoregressive coefficient
+        # matrices (so it is shaped k_endog * k_ar x k_endog * k_ar) ...
         if self.k_ar > 0:
             idx = np.diag_indices((self.k_ar - 1) * self.k_endog)
             idx = idx[0] + self.k_endog, idx[1]
             self.ssm[('transition',) + idx] = 1
+        # ... and the lower right block is in companion form with zeros as the
+        # coefficient matrices (it is shaped k_endog * k_ma x k_endog * k_ma).
         idx = np.diag_indices((self.k_ma - 1) * self.k_endog)
-        idx = idx[0] + (_min_k_ar + 1) * self.k_endog, idx[1
-            ] + _min_k_ar * self.k_endog
+        idx = (idx[0] + (_min_k_ar + 1) * self.k_endog,
+               idx[1] + _min_k_ar * self.k_endog)
         self.ssm[('transition',) + idx] = 1
+
+        # The selection matrix is described in two blocks, where the upper
+        # block selects all k_posdef errors in the first k_endog rows
+        # (the upper block is shaped k_endog * k_ar x k) and the lower block
+        # also selects all k_posdef errors in its first k_endog rows (the lower
+        # block is shaped k_endog * k_ma x k).
         idx = np.diag_indices(self.k_endog)
         self.ssm[('selection',) + idx] = 1
         idx = idx[0] + _min_k_ar * self.k_endog, idx[1]
         if self.k_ma > 0:
             self.ssm[('selection',) + idx] = 1
+
+        # Cache some indices
         if self._trend_is_const and self.k_exog == 0:
             self._idx_state_intercept = np.s_['state_intercept', :k_endog, :]
         elif self.k_trend > 0 or self.k_exog > 0:
@@ -205,18 +278,20 @@ class VARMAX(MLEModel):
         else:
             self._idx_transition = np.s_['transition', :k_endog, k_endog:]
         if self.error_cov_type == 'diagonal':
-            self._idx_state_cov = ('state_cov',) + np.diag_indices(self.k_endog
-                )
+            self._idx_state_cov = (
+                ('state_cov',) + np.diag_indices(self.k_endog))
         elif self.error_cov_type == 'unstructured':
             self._idx_lower_state_cov = np.tril_indices(self.k_endog)
         if self.measurement_error:
             self._idx_obs_cov = ('obs_cov',) + np.diag_indices(self.k_endog)

+        # Cache some slices
         def _slice(key, offset):
             length = self.parameters[key]
             param_slice = np.s_[offset:offset + length]
             offset += length
             return param_slice, offset
+
         offset = 0
         self._params_trend, offset = _slice('trend', offset)
         self._params_ar, offset = _slice('ar', offset)
@@ -224,10 +299,215 @@ class VARMAX(MLEModel):
         self._params_regression, offset = _slice('regression', offset)
         self._params_state_cov, offset = _slice('state_cov', offset)
         self._params_obs_cov, offset = _slice('obs_cov', offset)
+
+        # Variable holding optional final `exog`
+        # (note: self._final_trend was set earlier)
         self._final_exog = None
+
+        # Update _init_keys attached by super
         self._init_keys += ['order', 'trend', 'error_cov_type',
-            'measurement_error', 'enforce_stationarity',
-            'enforce_invertibility'] + list(kwargs.keys())
+                            'measurement_error', 'enforce_stationarity',
+                            'enforce_invertibility'] + list(kwargs.keys())
+
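
As a hedged end-to-end sketch of the estimator set up by this constructor (the simulated white-noise data and the VAR(1)-with-constant specification are arbitrary choices, not part of the patch):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    y = rng.standard_normal((200, 2))

    mod = sm.tsa.VARMAX(y, order=(1, 0), trend='c')
    res = mod.fit(disp=False)
    print(res.summary())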
+    def clone(self, endog, exog=None, **kwargs):
+        return self._clone_from_init_kwds(endog, exog=exog, **kwargs)
+
+    @property
+    def _res_classes(self):
+        return {'fit': (VARMAXResults, VARMAXResultsWrapper)}
+
+    @property
+    def start_params(self):
+        params = np.zeros(self.k_params, dtype=np.float64)
+
+        # A. Run a multivariate regression to get beta estimates
+        endog = pd.DataFrame(self.endog.copy())
+        endog = endog.interpolate()
+        endog = np.require(endog.bfill(), requirements="W")
+        exog = None
+        if self.k_trend > 0 and self.k_exog > 0:
+            exog = np.c_[self._trend_data, self.exog]
+        elif self.k_trend > 0:
+            exog = self._trend_data
+        elif self.k_exog > 0:
+            exog = self.exog
+
+        # Although the Kalman filter can deal with missing values in endog,
+        # conditional sum of squares cannot
+        if np.any(np.isnan(endog)):
+            mask = ~np.any(np.isnan(endog), axis=1)
+            endog = endog[mask]
+            if exog is not None:
+                exog = exog[mask]
+
+        # Regression and trend effects via OLS
+        trend_params = np.zeros(0)
+        exog_params = np.zeros(0)
+        if self.k_trend > 0 or self.k_exog > 0:
+            trendexog_params = np.linalg.pinv(exog).dot(endog)
+            endog -= np.dot(exog, trendexog_params)
+            if self.k_trend > 0:
+                trend_params = trendexog_params[:self.k_trend].T
+            if self.k_exog > 0:
+                exog_params = trendexog_params[self.k_trend:].T
+
+        # B. Run a VAR model on endog to get trend, AR parameters
+        ar_params = []
+        k_ar = self.k_ar if self.k_ar > 0 else 1
+        mod_ar = var_model.VAR(endog)
+        res_ar = mod_ar.fit(maxlags=k_ar, ic=None, trend='n')
+        if self.k_ar > 0:
+            ar_params = np.array(res_ar.params).T.ravel()
+        endog = res_ar.resid
+
+        # Test for stationarity
+        if self.k_ar > 0 and self.enforce_stationarity:
+            coefficient_matrices = (
+                ar_params.reshape(
+                    self.k_endog * self.k_ar, self.k_endog
+                ).T
+            ).reshape(self.k_endog, self.k_endog, self.k_ar).T
+
+            stationary = is_invertible([1] + list(-coefficient_matrices))
+
+            if not stationary:
+                warn('Non-stationary starting autoregressive parameters'
+                     ' found. Using zeros as starting parameters.')
+                ar_params *= 0
+
+        # C. Run a VAR model on the residuals to get MA parameters
+        ma_params = []
+        if self.k_ma > 0:
+            mod_ma = var_model.VAR(endog)
+            res_ma = mod_ma.fit(maxlags=self.k_ma, ic=None, trend='n')
+            ma_params = np.array(res_ma.params.T).ravel()
+
+            # Test for invertibility
+            if self.enforce_invertibility:
+                coefficient_matrices = (
+                    ma_params.reshape(
+                        self.k_endog * self.k_ma, self.k_endog
+                    ).T
+                ).reshape(self.k_endog, self.k_endog, self.k_ma).T
+
+                invertible = is_invertible([1] + list(-coefficient_matrices))
+
+                if not invertible:
+                    warn('Non-invertible starting moving-average parameters'
+                         ' found. Using zeros as starting parameters.')
+                    ma_params *= 0
+
+        # Transform trend / exog params from mean form to intercept form
+        if self.k_ar > 0 and (self.k_trend > 0 or self.mle_regression):
+            coefficient_matrices = (
+                ar_params.reshape(
+                    self.k_endog * self.k_ar, self.k_endog
+                ).T
+            ).reshape(self.k_endog, self.k_endog, self.k_ar).T
+
+            tmp = np.eye(self.k_endog) - np.sum(coefficient_matrices, axis=0)
+
+            if self.k_trend > 0:
+                trend_params = np.dot(tmp, trend_params)
+            if self.mle_regression > 0:
+                exog_params = np.dot(tmp, exog_params)
+
+        # 1. Intercept terms
+        if self.k_trend > 0:
+            params[self._params_trend] = trend_params.ravel()
+
+        # 2. AR terms
+        if self.k_ar > 0:
+            params[self._params_ar] = ar_params
+
+        # 3. MA terms
+        if self.k_ma > 0:
+            params[self._params_ma] = ma_params
+
+        # 4. Regression terms
+        if self.mle_regression:
+            params[self._params_regression] = exog_params.ravel()
+
+        # 5. State covariance terms
+        if self.error_cov_type == 'diagonal':
+            params[self._params_state_cov] = res_ar.sigma_u.diagonal()
+        elif self.error_cov_type == 'unstructured':
+            cov_factor = np.linalg.cholesky(res_ar.sigma_u)
+            params[self._params_state_cov] = (
+                cov_factor[self._idx_lower_state_cov].ravel())
+
+        # 6. Measurement error variance terms
+        if self.measurement_error:
+            if self.k_ma > 0:
+                params[self._params_obs_cov] = res_ma.sigma_u.diagonal()
+            else:
+                params[self._params_obs_cov] = res_ar.sigma_u.diagonal()
+
+        return params
+
+    @property
+    def param_names(self):
+        param_names = []
+        endog_names = self.endog_names
+        if not isinstance(self.endog_names, list):
+            endog_names = [endog_names]
+
+        # 1. Intercept terms
+        if self.k_trend > 0:
+            for j in range(self.k_endog):
+                for i in self.polynomial_trend.nonzero()[0]:
+                    if i == 0:
+                        param_names += ['intercept.%s' % endog_names[j]]
+                    elif i == 1:
+                        param_names += ['drift.%s' % endog_names[j]]
+                    else:
+                        param_names += ['trend.%d.%s' % (i, endog_names[j])]
+
+        # 2. AR terms
+        param_names += [
+            'L%d.%s.%s' % (i+1, endog_names[k], endog_names[j])
+            for j in range(self.k_endog)
+            for i in range(self.k_ar)
+            for k in range(self.k_endog)
+        ]
+
+        # 3. MA terms
+        param_names += [
+            'L%d.e(%s).%s' % (i+1, endog_names[k], endog_names[j])
+            for j in range(self.k_endog)
+            for i in range(self.k_ma)
+            for k in range(self.k_endog)
+        ]
+
+        # 4. Regression terms
+        param_names += [
+            'beta.%s.%s' % (self.exog_names[j], endog_names[i])
+            for i in range(self.k_endog)
+            for j in range(self.k_exog)
+        ]
+
+        # 5. State covariance terms
+        if self.error_cov_type == 'diagonal':
+            param_names += [
+                'sigma2.%s' % endog_names[i]
+                for i in range(self.k_endog)
+            ]
+        elif self.error_cov_type == 'unstructured':
+            param_names += [
+                ('sqrt.var.%s' % endog_names[i] if i == j else
+                 'sqrt.cov.%s.%s' % (endog_names[j], endog_names[i]))
+                for i in range(self.k_endog)
+                for j in range(i+1)
+            ]
+
+        # 6. Measurement error variance terms
+        if self.measurement_error:
+            param_names += [
+                'measurement_variance.%s' % endog_names[i]
+                for i in range(self.k_endog)
+            ]
+
+        return param_names

     def transform_params(self, unconstrained):
         """
@@ -251,7 +531,66 @@ class VARMAX(MLEModel):
         Constrains the factor transition to be stationary and variances to be
         positive.
         """
-        pass
+        unconstrained = np.array(unconstrained, ndmin=1)
+        constrained = np.zeros(unconstrained.shape, dtype=unconstrained.dtype)
+
+        # 1. Intercept terms: nothing to do
+        constrained[self._params_trend] = unconstrained[self._params_trend]
+
+        # 2. AR terms: optionally force to be stationary
+        if self.k_ar > 0 and self.enforce_stationarity:
+            # Create the state covariance matrix
+            if self.error_cov_type == 'diagonal':
+                state_cov = np.diag(unconstrained[self._params_state_cov]**2)
+            elif self.error_cov_type == 'unstructured':
+                state_cov_lower = np.zeros(self.ssm['state_cov'].shape,
+                                           dtype=unconstrained.dtype)
+                state_cov_lower[self._idx_lower_state_cov] = (
+                    unconstrained[self._params_state_cov])
+                state_cov = np.dot(state_cov_lower, state_cov_lower.T)
+
+            # Transform the parameters
+            coefficients = unconstrained[self._params_ar].reshape(
+                self.k_endog, self.k_endog * self.k_ar)
+            coefficient_matrices, variance = (
+                constrain_stationary_multivariate(coefficients, state_cov))
+            constrained[self._params_ar] = coefficient_matrices.ravel()
+        else:
+            constrained[self._params_ar] = unconstrained[self._params_ar]
+
+        # 3. MA terms: optionally force to be invertible
+        if self.k_ma > 0 and self.enforce_invertibility:
+            # Transform the parameters, using an identity variance matrix
+            state_cov = np.eye(self.k_endog, dtype=unconstrained.dtype)
+            coefficients = unconstrained[self._params_ma].reshape(
+                self.k_endog, self.k_endog * self.k_ma)
+            coefficient_matrices, variance = (
+                constrain_stationary_multivariate(coefficients, state_cov))
+            constrained[self._params_ma] = coefficient_matrices.ravel()
+        else:
+            constrained[self._params_ma] = unconstrained[self._params_ma]
+
+        # 4. Regression terms: nothing to do
+        constrained[self._params_regression] = (
+            unconstrained[self._params_regression])
+
+        # 5. State covariance terms
+        # If we have variances, force them to be positive
+        if self.error_cov_type == 'diagonal':
+            constrained[self._params_state_cov] = (
+                unconstrained[self._params_state_cov]**2)
+        # Otherwise, nothing needs to be done
+        elif self.error_cov_type == 'unstructured':
+            constrained[self._params_state_cov] = (
+                unconstrained[self._params_state_cov])
+
+        # 6. Measurement error variance terms
+        if self.measurement_error:
+            # Force these to be positive
+            constrained[self._params_obs_cov] = (
+                unconstrained[self._params_obs_cov]**2)
+
+        return constrained

     def untransform_params(self, constrained):
         """
@@ -269,7 +608,163 @@ class VARMAX(MLEModel):
         unconstrained : array_like
             Array of unconstrained parameters used by the optimizer.
         """
-        pass
+        constrained = np.array(constrained, ndmin=1)
+        unconstrained = np.zeros(constrained.shape, dtype=constrained.dtype)
+
+        # 1. Intercept terms: nothing to do
+        unconstrained[self._params_trend] = constrained[self._params_trend]
+
+        # 2. AR terms: optionally were forced to be stationary
+        if self.k_ar > 0 and self.enforce_stationarity:
+            # Create the state covariance matrix
+            if self.error_cov_type == 'diagonal':
+                state_cov = np.diag(constrained[self._params_state_cov])
+            elif self.error_cov_type == 'unstructured':
+                state_cov_lower = np.zeros(self.ssm['state_cov'].shape,
+                                           dtype=constrained.dtype)
+                state_cov_lower[self._idx_lower_state_cov] = (
+                    constrained[self._params_state_cov])
+                state_cov = np.dot(state_cov_lower, state_cov_lower.T)
+
+            # Transform the parameters
+            coefficients = constrained[self._params_ar].reshape(
+                self.k_endog, self.k_endog * self.k_ar)
+            unconstrained_matrices, variance = (
+                unconstrain_stationary_multivariate(coefficients, state_cov))
+            unconstrained[self._params_ar] = unconstrained_matrices.ravel()
+        else:
+            unconstrained[self._params_ar] = constrained[self._params_ar]
+
+        # 3. MA terms: optionally were forced to be invertible
+        if self.k_ma > 0 and self.enforce_invertibility:
+            # Transform the parameters, using an identity variance matrix
+            state_cov = np.eye(self.k_endog, dtype=constrained.dtype)
+            coefficients = constrained[self._params_ma].reshape(
+                self.k_endog, self.k_endog * self.k_ma)
+            unconstrained_matrices, variance = (
+                unconstrain_stationary_multivariate(coefficients, state_cov))
+            unconstrained[self._params_ma] = unconstrained_matrices.ravel()
+        else:
+            unconstrained[self._params_ma] = constrained[self._params_ma]
+
+        # 4. Regression terms: nothing to do
+        unconstrained[self._params_regression] = (
+            constrained[self._params_regression])
+
+        # 5. State covariance terms
+        # If we have variances, then these were forced to be positive
+        if self.error_cov_type == 'diagonal':
+            unconstrained[self._params_state_cov] = (
+                constrained[self._params_state_cov]**0.5)
+        # Otherwise, nothing needs to be done
+        elif self.error_cov_type == 'unstructured':
+            unconstrained[self._params_state_cov] = (
+                constrained[self._params_state_cov])
+
+        # 6. Measurement error variance terms
+        if self.measurement_error:
+            # These were forced to be positive
+            unconstrained[self._params_obs_cov] = (
+                constrained[self._params_obs_cov]**0.5)
+
+        return unconstrained
+
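
A hedged check of the transform/untransform pair above; the toy VARMAX model is illustrative only, and the round trip is expected to hold only up to numerical precision.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    mod = sm.tsa.VARMAX(rng.standard_normal((200, 2)), order=(1, 0), trend='c')

    params = mod.start_params
    unconstrained = mod.untransform_params(params)
    assert np.allclose(mod.transform_params(unconstrained), params)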
+    def _validate_can_fix_params(self, param_names):
+        super(VARMAX, self)._validate_can_fix_params(param_names)
+
+        ix = np.cumsum(list(self.parameters.values()))[:-1]
+        (_, ar_names, ma_names, _, _, _) = [
+            arr.tolist() for arr in np.array_split(self.param_names, ix)]
+
+        if self.enforce_stationarity and self.k_ar > 0:
+            if self.k_endog > 1 or self.k_ar > 1:
+                fix_all = param_names.issuperset(ar_names)
+                fix_any = (
+                    len(param_names.intersection(ar_names)) > 0)
+                if fix_any and not fix_all:
+                    raise ValueError(
+                        'Cannot fix individual autoregressive parameters'
+                        ' when `enforce_stationarity=True`. In this case,'
+                        ' must either fix all autoregressive parameters or'
+                        ' none.')
+        if self.enforce_invertibility and self.k_ma > 0:
+            if self.k_endog > 1 or self.k_ma > 1:
+                fix_all = param_names.issuperset(ma_names)
+                fix_any = (
+                    len(param_names.intersection(ma_names)) > 0)
+                if fix_any and not fix_all:
+                    raise ValueError(
+                        'Cannot fix individual moving average parameters'
+                        ' when `enforce_invertibility=True`. In this case,'
+                        ' must either fix all moving average parameters or'
+                        ' none.')
+
+    def update(self, params, transformed=True, includes_fixed=False,
+               complex_step=False):
+        params = self.handle_params(params, transformed=transformed,
+                                    includes_fixed=includes_fixed)
+
+        # 1. State intercept
+        # - Exog
+        if self.mle_regression:
+            exog_params = params[self._params_regression].reshape(
+                self.k_endog, self.k_exog).T
+            intercept = np.dot(self.exog[1:], exog_params)
+            self.ssm[self._idx_state_intercept] = intercept.T
+
+            if self._final_exog is not None:
+                self.ssm['state_intercept', :self.k_endog, -1] = np.dot(
+                    self._final_exog, exog_params)
+
+        # - Trend
+        if self.k_trend > 0:
+            # If we did not set the intercept above, zero it out so we can
+            # just += later
+            if not self.mle_regression:
+                zero = np.array(0, dtype=params.dtype)
+                self.ssm['state_intercept', :] = zero
+
+            trend_params = params[self._params_trend].reshape(
+                self.k_endog, self.k_trend).T
+            if self._trend_is_const:
+                intercept = trend_params
+            else:
+                intercept = np.dot(self._trend_data[1:], trend_params)
+            self.ssm[self._idx_state_intercept] += intercept.T
+
+            if (self._final_trend is not None
+                    and self._idx_state_intercept[-1].stop == -1):
+                self.ssm['state_intercept', :self.k_endog, -1:] += np.dot(
+                    self._final_trend, trend_params).T
+
+        # Need to set the last state intercept to np.nan (with appropriate
+        # dtype) if we don't have the final exog
+        if self.mle_regression and self._final_exog is None:
+            nan = np.array(np.nan, dtype=params.dtype)
+            self.ssm['state_intercept', :self.k_endog, -1] = nan
+
+        # 2. Transition
+        ar = params[self._params_ar].reshape(
+            self.k_endog, self.k_endog * self.k_ar)
+        ma = params[self._params_ma].reshape(
+            self.k_endog, self.k_endog * self.k_ma)
+        self.ssm[self._idx_transition] = np.c_[ar, ma]
+
+        # 3. State covariance
+        if self.error_cov_type == 'diagonal':
+            self.ssm[self._idx_state_cov] = (
+                params[self._params_state_cov]
+            )
+        elif self.error_cov_type == 'unstructured':
+            state_cov_lower = np.zeros(self.ssm['state_cov'].shape,
+                                       dtype=params.dtype)
+            state_cov_lower[self._idx_lower_state_cov] = (
+                params[self._params_state_cov])
+            self.ssm['state_cov'] = np.dot(state_cov_lower, state_cov_lower.T)
+
+        # 4. Observation covariance
+        if self.measurement_error:
+            self.ssm[self._idx_obs_cov] = params[self._params_obs_cov]

     @contextlib.contextmanager
     def _set_final_exog(self, exog):
@@ -293,7 +788,40 @@ class VARMAX(MLEModel):
         Since we handle trend in the same way as `exog`, we still have this
         issue when only trend is used without `exog`.
         """
-        pass
+        cache_value = self._final_exog
+        if self.k_exog > 0:
+            if exog is not None:
+                exog = np.atleast_1d(exog)
+                if exog.ndim == 2:
+                    exog = exog[:1]
+                try:
+                    exog = np.reshape(exog[:1], (self.k_exog,))
+                except ValueError:
+                    raise ValueError('Provided exogenous values are not of the'
+                                     ' appropriate shape. Required %s, got %s.'
+                                     % (str((self.k_exog,)),
+                                        str(exog.shape)))
+            self._final_exog = exog
+        try:
+            yield
+        finally:
+            self._final_exog = cache_value
+
+    @Appender(MLEModel.simulate.__doc__)
+    def simulate(self, params, nsimulations, measurement_shocks=None,
+                 state_shocks=None, initial_state=None, anchor=None,
+                 repetitions=None, exog=None, extend_model=None,
+                 extend_kwargs=None, transformed=True, includes_fixed=False,
+                 **kwargs):
+        with self._set_final_exog(exog):
+            out = super(VARMAX, self).simulate(
+                params, nsimulations, measurement_shocks=measurement_shocks,
+                state_shocks=state_shocks, initial_state=initial_state,
+                anchor=anchor, repetitions=repetitions, exog=exog,
+                extend_model=extend_model, extend_kwargs=extend_kwargs,
+                transformed=transformed, includes_fixed=includes_fixed,
+                **kwargs)
+        return out


 class VARMAXResults(MLEResults):
@@ -321,33 +849,75 @@ class VARMAXResults(MLEResults):
     statsmodels.tsa.statespace.kalman_filter.FilterResults
     statsmodels.tsa.statespace.mlemodel.MLEResults
     """
-
     def __init__(self, model, params, filter_results, cov_type=None,
-        cov_kwds=None, **kwargs):
+                 cov_kwds=None, **kwargs):
         super(VARMAXResults, self).__init__(model, params, filter_results,
-            cov_type, cov_kwds, **kwargs)
-        self.specification = Bunch(**{'error_cov_type': self.model.
-            error_cov_type, 'measurement_error': self.model.
-            measurement_error, 'enforce_stationarity': self.model.
-            enforce_stationarity, 'enforce_invertibility': self.model.
-            enforce_invertibility, 'trend_offset': self.model.trend_offset,
-            'order': self.model.order, 'k_ar': self.model.k_ar, 'k_ma':
-            self.model.k_ma, 'trend': self.model.trend, 'k_trend': self.
-            model.k_trend, 'k_exog': self.model.k_exog})
+                                            cov_type, cov_kwds, **kwargs)
+
+        self.specification = Bunch(**{
+            # Set additional model parameters
+            'error_cov_type': self.model.error_cov_type,
+            'measurement_error': self.model.measurement_error,
+            'enforce_stationarity': self.model.enforce_stationarity,
+            'enforce_invertibility': self.model.enforce_invertibility,
+            'trend_offset': self.model.trend_offset,
+
+            'order': self.model.order,
+
+            # Model order
+            'k_ar': self.model.k_ar,
+            'k_ma': self.model.k_ma,
+
+            # Trend / Regression
+            'trend': self.model.trend,
+            'k_trend': self.model.k_trend,
+            'k_exog': self.model.k_exog,
+        })
+
+        # Polynomials / coefficient matrices
         self.coefficient_matrices_var = None
         self.coefficient_matrices_vma = None
         if self.model.k_ar > 0:
             ar_params = np.array(self.params[self.model._params_ar])
             k_endog = self.model.k_endog
             k_ar = self.model.k_ar
-            self.coefficient_matrices_var = ar_params.reshape(k_endog *
-                k_ar, k_endog).T.reshape(k_endog, k_endog, k_ar).T
+            self.coefficient_matrices_var = (
+                ar_params.reshape(k_endog * k_ar, k_endog).T
+            ).reshape(k_endog, k_endog, k_ar).T
         if self.model.k_ma > 0:
             ma_params = np.array(self.params[self.model._params_ma])
             k_endog = self.model.k_endog
             k_ma = self.model.k_ma
-            self.coefficient_matrices_vma = ma_params.reshape(k_endog *
-                k_ma, k_endog).T.reshape(k_endog, k_endog, k_ma).T
+            self.coefficient_matrices_vma = (
+                ma_params.reshape(k_endog * k_ma, k_endog).T
+            ).reshape(k_endog, k_endog, k_ma).T
+
+    def extend(self, endog, exog=None, **kwargs):
+        # If we have exog, then the last elements of predicted_state and
+        # predicted_state_cov are nan (since they depend on the exog associated
+        # with the first out-of-sample point), so we need to compute them here
+        if exog is not None:
+            fcast = self.get_prediction(self.nobs, self.nobs, exog=exog[:1])
+            fcast_results = fcast.prediction_results
+            initial_state = fcast_results.predicted_state[..., 0]
+            initial_state_cov = fcast_results.predicted_state_cov[..., 0]
+        else:
+            initial_state = self.predicted_state[..., -1]
+            initial_state_cov = self.predicted_state_cov[..., -1]
+
+        kwargs.setdefault('trend_offset', self.nobs + self.model.trend_offset)
+        mod = self.model.clone(endog, exog=exog, **kwargs)
+
+        mod.ssm.initialization = Initialization(
+            mod.k_states, 'known', constant=initial_state,
+            stationary_cov=initial_state_cov)
+
+        if self.smoother_results is not None:
+            res = mod.smooth(self.params)
+        else:
+            res = mod.filter(self.params)
+
+        return res

     @contextlib.contextmanager
     def _set_final_exog(self, exog):
@@ -369,7 +939,16 @@ class VARMAXResults(MLEResults):
         additionally updates the last element of filter_results.state_intercept
         appropriately.
         """
-        pass
+        mod = self.model
+        with mod._set_final_exog(exog):
+            cache_value = self.filter_results.state_intercept[:, -1]
+            mod.update(self.params)
+            self.filter_results.state_intercept[:mod.k_endog, -1] = (
+                mod['state_intercept', :mod.k_endog, -1])
+            try:
+                yield
+            finally:
+                self.filter_results.state_intercept[:, -1] = cache_value

     @contextlib.contextmanager
     def _set_final_predicted_state(self, exog, out_of_sample):
@@ -391,14 +970,259 @@ class VARMAXResults(MLEResults):
         if we had these then the last predicted_state has been set to NaN since
         we did not have the appropriate `exog` to create it.
         """
-        pass
+        flag = out_of_sample and self.model.k_exog > 0
+
+        if flag:
+            tmp_endog = concat([
+                self.model.endog[-1:], np.zeros((1, self.model.k_endog))])
+            if self.model.k_exog > 0:
+                tmp_exog = concat([self.model.exog[-1:], exog[:1]])
+            else:
+                tmp_exog = None
+
+            tmp_trend_offset = self.model.trend_offset + self.nobs - 1
+            tmp_mod = self.model.clone(tmp_endog, exog=tmp_exog,
+                                       trend_offset=tmp_trend_offset)
+            constant = self.filter_results.predicted_state[:, -2]
+            stationary_cov = self.filter_results.predicted_state_cov[:, :, -2]
+            tmp_mod.ssm.initialize_known(constant=constant,
+                                         stationary_cov=stationary_cov)
+            tmp_res = tmp_mod.filter(self.params, transformed=True,
+                                     includes_fixed=True, return_ssm=True)
+
+            # Patch up `predicted_state`
+            self.filter_results.predicted_state[:, -1] = (
+                tmp_res.predicted_state[:, -2])
+        try:
+            yield
+        finally:
+            if flag:
+                self.filter_results.predicted_state[:, -1] = np.nan
+
+    @Appender(MLEResults.get_prediction.__doc__)
+    def get_prediction(self, start=None, end=None, dynamic=False,
+                       information_set='predicted', index=None, exog=None,
+                       **kwargs):
+        if start is None:
+            start = 0
+
+        # Handle end (e.g. date)
+        _start, _end, out_of_sample, _ = (
+            self.model._get_prediction_index(start, end, index, silent=True))
+
+        # Normalize `exog`
+        exog = self.model._validate_out_of_sample_exog(exog, out_of_sample)
+
+        # Handle trend offset for extended model
+        extend_kwargs = {}
+        if self.model.k_trend > 0:
+            extend_kwargs['trend_offset'] = (
+                self.model.trend_offset + self.nobs)
+
+        # Get the prediction
+        with self._set_final_exog(exog):
+            with self._set_final_predicted_state(exog, out_of_sample):
+                out = super(VARMAXResults, self).get_prediction(
+                    start=start, end=end, dynamic=dynamic,
+                    information_set=information_set, index=index, exog=exog,
+                    extend_kwargs=extend_kwargs, **kwargs)
+        return out
+
+    @Appender(MLEResults.simulate.__doc__)
+    def simulate(self, nsimulations, measurement_shocks=None,
+                 state_shocks=None, initial_state=None, anchor=None,
+                 repetitions=None, exog=None, extend_model=None,
+                 extend_kwargs=None, **kwargs):
+        if anchor is None or anchor == 'start':
+            iloc = 0
+        elif anchor == 'end':
+            iloc = self.nobs
+        else:
+            iloc, _, _ = self.model._get_index_loc(anchor)
+
+        if iloc < 0:
+            iloc = self.nobs + iloc
+        if iloc > self.nobs:
+            raise ValueError('Cannot anchor simulation after the estimated'
+                             ' sample.')
+
+        out_of_sample = max(iloc + nsimulations - self.nobs, 0)
+
+        # Normalize `exog`
+        exog = self.model._validate_out_of_sample_exog(exog, out_of_sample)
+
+        with self._set_final_predicted_state(exog, out_of_sample):
+            out = super(VARMAXResults, self).simulate(
+                nsimulations, measurement_shocks=measurement_shocks,
+                state_shocks=state_shocks, initial_state=initial_state,
+                anchor=anchor, repetitions=repetitions, exog=exog,
+                extend_model=extend_model, extend_kwargs=extend_kwargs,
+                **kwargs)
+
+        return out
+
+    def _news_previous_results(self, previous, start, end, periods,
+                               revisions_details_start=False,
+                               state_index=None):
+        # TODO: tests for:
+        # - the model cloning used in `kalman_smoother.news` works when we
+        #   have time-varying exog (i.e. or do we need to somehow explicitly
+        #   call the _set_final_exog and _set_final_predicted_state methods
+        #   on the rev_mod / revision_results)
+        # - in the case of revisions to `endog`, should the revised model use
+        #   the `previous` exog? or the `revised` exog?
+        # We need to figure out the out-of-sample exog, so that we can add back
+        # in the last exog, predicted state
+        exog = None
+        out_of_sample = self.nobs - previous.nobs
+        if self.model.k_exog > 0 and out_of_sample > 0:
+            exog = self.model.exog[-out_of_sample:]
+
+        # Compute the news
+        with contextlib.ExitStack() as stack:
+            stack.enter_context(previous.model._set_final_exog(exog))
+            stack.enter_context(previous._set_final_predicted_state(
+                exog, out_of_sample))
+
+            out = self.smoother_results.news(
+                previous.smoother_results, start=start, end=end,
+                revisions_details_start=revisions_details_start,
+                state_index=state_index)
+        return out
+
+    @Appender(MLEResults.summary.__doc__)
+    def summary(self, alpha=.05, start=None, separate_params=True):
+        from statsmodels.iolib.summary import summary_params
+
+        # Create the model name
+        spec = self.specification
+        if spec.k_ar > 0 and spec.k_ma > 0:
+            model_name = 'VARMA'
+            order = '(%s,%s)' % (spec.k_ar, spec.k_ma)
+        elif spec.k_ar > 0:
+            model_name = 'VAR'
+            order = '(%s)' % (spec.k_ar)
+        else:
+            model_name = 'VMA'
+            order = '(%s)' % (spec.k_ma)
+        if spec.k_exog > 0:
+            model_name += 'X'
+        model_name = [model_name + order]
+
+        if spec.k_trend > 0:
+            model_name.append('intercept')
+
+        if spec.measurement_error:
+            model_name.append('measurement error')
+
+        summary = super(VARMAXResults, self).summary(
+            alpha=alpha, start=start, model_name=model_name,
+            display_params=not separate_params
+        )
+
+        if separate_params:
+            indices = np.arange(len(self.params))
+
+            def make_table(self, mask, title, strip_end=True):
+                res = (self, self.params[mask], self.bse[mask],
+                       self.zvalues[mask], self.pvalues[mask],
+                       self.conf_int(alpha)[mask])
+
+                param_names = []
+                for name in np.array(self.data.param_names)[mask].tolist():
+                    if strip_end:
+                        param_name = '.'.join(name.split('.')[:-1])
+                    else:
+                        param_name = name
+                    if name in self.fixed_params:
+                        param_name = '%s (fixed)' % param_name
+                    param_names.append(param_name)
+
+                return summary_params(res, yname=None, xname=param_names,
+                                      alpha=alpha, use_t=False, title=title)
+
+            # Add parameter tables for each endogenous variable
+            k_endog = self.model.k_endog
+            k_ar = self.model.k_ar
+            k_ma = self.model.k_ma
+            k_trend = self.model.k_trend
+            k_exog = self.model.k_exog
+            endog_masks = []
+            for i in range(k_endog):
+                masks = []
+                offset = 0
+
+                # 1. Intercept terms
+                if k_trend > 0:
+                    masks.append(np.arange(i, i + k_endog * k_trend, k_endog))
+                    offset += k_endog * k_trend
+
+                # 2. AR terms
+                if k_ar > 0:
+                    start = i * k_endog * k_ar
+                    end = (i + 1) * k_endog * k_ar
+                    masks.append(
+                        offset + np.arange(start, end))
+                    offset += k_ar * k_endog**2
+
+                # 3. MA terms
+                if k_ma > 0:
+                    start = i * k_endog * k_ma
+                    end = (i + 1) * k_endog * k_ma
+                    masks.append(
+                        offset + np.arange(start, end))
+                    offset += k_ma * k_endog**2
+
+                # 4. Regression terms
+                if k_exog > 0:
+                    masks.append(
+                        offset + np.arange(i * k_exog, (i + 1) * k_exog))
+                    offset += k_endog * k_exog
+
+                # 5. Measurement error variance terms
+                if self.model.measurement_error:
+                    masks.append(
+                        np.array(self.model.k_params - i - 1, ndmin=1))
+
+                # Create the table
+                mask = np.concatenate(masks)
+                endog_masks.append(mask)
+
+                endog_names = self.model.endog_names
+                if not isinstance(endog_names, list):
+                    endog_names = [endog_names]
+                title = "Results for equation %s" % endog_names[i]
+                table = make_table(self, mask, title)
+                summary.tables.append(table)
+
+            # State covariance terms
+            state_cov_mask = (
+                np.arange(len(self.params))[self.model._params_state_cov])
+            table = make_table(self, state_cov_mask, "Error covariance matrix",
+                               strip_end=False)
+            summary.tables.append(table)
+
+            # Add a table for all other parameters
+            masks = []
+            for m in (endog_masks, [state_cov_mask]):
+                m = np.array(m).flatten()
+                if len(m) > 0:
+                    masks.append(m)
+            masks = np.concatenate(masks)
+            inverse_mask = np.array(list(set(indices).difference(set(masks))))
+            if len(inverse_mask) > 0:
+                table = make_table(self, inverse_mask, "Other parameters",
+                                   strip_end=False)
+                summary.tables.append(table)
+
+        return summary
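
A minimal usage sketch of the results methods added above (the data, VAR order, and fit options are arbitrary placeholders, not part of the patch):

>>> import numpy as np
>>> from statsmodels.tsa.statespace.varmax import VARMAX
>>> y = np.random.default_rng(0).standard_normal((120, 2))
>>> res = VARMAX(y[:100], order=(1, 0)).fit(disp=False)
>>> extended = res.extend(y[100:])                 # re-filter the 20 new observations
>>> fcast = res.get_prediction(start=95, end=105)  # spans the in-/out-of-sample boundary
>>> print(res.summary())                           # per-equation tables plus error covariance
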


 class VARMAXResultsWrapper(MLEResultsWrapper):
     _attrs = {}
-    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs, _attrs)
+    _wrap_attrs = wrap.union_dicts(MLEResultsWrapper._wrap_attrs,
+                                   _attrs)
     _methods = {}
-    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods, _methods)
-
-
-wrap.populate_wrapper(VARMAXResultsWrapper, VARMAXResults)
+    _wrap_methods = wrap.union_dicts(MLEResultsWrapper._wrap_methods,
+                                     _methods)
+wrap.populate_wrapper(VARMAXResultsWrapper, VARMAXResults)  # noqa:E305
diff --git a/statsmodels/tsa/stattools.py b/statsmodels/tsa/stattools.py
index 53eae0b88..891c77634 100644
--- a/statsmodels/tsa/stattools.py
+++ b/statsmodels/tsa/stattools.py
@@ -2,37 +2,83 @@
 Statistical tools for time series analysis
 """
 from __future__ import annotations
+
 from statsmodels.compat.numpy import lstsq
 from statsmodels.compat.pandas import deprecate_kwarg
 from statsmodels.compat.python import Literal, lzip
 from statsmodels.compat.scipy import _next_regular
+
 from typing import Union, List
 import warnings
+
 import numpy as np
 from numpy.linalg import LinAlgError
 import pandas as pd
 from scipy import stats
 from scipy.interpolate import interp1d
 from scipy.signal import correlate
+
 from statsmodels.regression.linear_model import OLS, yule_walker
-from statsmodels.tools.sm_exceptions import CollinearityWarning, InfeasibleTestError, InterpolationWarning, MissingDataError, ValueWarning
+from statsmodels.tools.sm_exceptions import (
+    CollinearityWarning,
+    InfeasibleTestError,
+    InterpolationWarning,
+    MissingDataError,
+    ValueWarning,
+)
 from statsmodels.tools.tools import Bunch, add_constant
-from statsmodels.tools.validation import array_like, bool_like, dict_like, float_like, int_like, string_like
+from statsmodels.tools.validation import (
+    array_like,
+    bool_like,
+    dict_like,
+    float_like,
+    int_like,
+    string_like,
+)
 from statsmodels.tsa._bds import bds
 from statsmodels.tsa._innovations import innovations_algo, innovations_filter
 from statsmodels.tsa.adfvalues import mackinnoncrit, mackinnonp
 from statsmodels.tsa.tsatools import add_trend, lagmat, lagmat2ds
+
 ArrayLike1D = Union[np.ndarray, pd.Series, List[float]]
-__all__ = ['acovf', 'acf', 'pacf', 'pacf_yw', 'pacf_ols', 'ccovf', 'ccf',
-    'q_stat', 'coint', 'arma_order_select_ic', 'adfuller', 'kpss', 'bds',
-    'pacf_burg', 'innovations_algo', 'innovations_filter',
-    'levinson_durbin_pacf', 'levinson_durbin', 'zivot_andrews',
-    'range_unit_root_test']
+
+__all__ = [
+    "acovf",
+    "acf",
+    "pacf",
+    "pacf_yw",
+    "pacf_ols",
+    "ccovf",
+    "ccf",
+    "q_stat",
+    "coint",
+    "arma_order_select_ic",
+    "adfuller",
+    "kpss",
+    "bds",
+    "pacf_burg",
+    "innovations_algo",
+    "innovations_filter",
+    "levinson_durbin_pacf",
+    "levinson_durbin",
+    "zivot_andrews",
+    "range_unit_root_test",
+]
+
 SQRTEPS = np.sqrt(np.finfo(np.double).eps)


-def _autolag(mod, endog, exog, startlag, maxlag, method, modargs=(),
-    fitargs=(), regresults=False):
+def _autolag(
+    mod,
+    endog,
+    exog,
+    startlag,
+    maxlag,
+    method,
+    modargs=(),
+    fitargs=(),
+    regresults=False,
+):
     """
     Returns the results for the lag length that maximizes the info criterion.

@@ -76,11 +122,57 @@ def _autolag(mod, endog, exog, startlag, maxlag, method, modargs=(),
     assumed to be in contiguous columns from low to high lag length with
     the highest lag in the last column.
     """
-    pass
-
-
-def adfuller(x, maxlag: (int | None)=None, regression='c', autolag='AIC',
-    store=False, regresults=False):
+    # TODO: can tcol be replaced by maxlag + 2?
+    # TODO: This could be changed to laggedRHS and exog keyword arguments if
+    #    this will be more general.
+
+    results = {}
+    method = method.lower()
+    for lag in range(startlag, startlag + maxlag + 1):
+        mod_instance = mod(endog, exog[:, :lag], *modargs)
+        results[lag] = mod_instance.fit()
+
+    if method == "aic":
+        icbest, bestlag = min((v.aic, k) for k, v in results.items())
+    elif method == "bic":
+        icbest, bestlag = min((v.bic, k) for k, v in results.items())
+    elif method == "t-stat":
+        # stop = stats.norm.ppf(.95)
+        stop = 1.6448536269514722
+        # Default values to ensure that always set
+        bestlag = startlag + maxlag
+        icbest = 0.0
+        for lag in range(startlag + maxlag, startlag - 1, -1):
+            icbest = np.abs(results[lag].tvalues[-1])
+            bestlag = lag
+            if np.abs(icbest) >= stop:
+                # Break for first lag with a significant t-stat
+                break
+    else:
+        raise ValueError(f"Information Criterion {method} not understood.")
+
+    if not regresults:
+        return icbest, bestlag
+    else:
+        return icbest, bestlag, results
+
+
+# this needs to be converted to a class like HetGoldfeldQuandt,
+# 3 different returns are a mess
+# See:
+# Ng and Perron(2001), Lag length selection and the construction of unit root
+# tests with good size and power, Econometrica, Vol 69 (6) pp 1519-1554
+# TODO: include drift keyword, only valid with regression == "c"
+# just changes the distribution of the test statistic to a t distribution
+# TODO: autolag is untested
+def adfuller(
+    x,
+    maxlag: int | None = None,
+    regression="c",
+    autolag="AIC",
+    store=False,
+    regresults=False,
+):
     """
     Augmented Dickey-Fuller unit root test.

@@ -168,11 +260,140 @@ def adfuller(x, maxlag: (int | None)=None, regression='c', autolag='AIC',
         University, Dept of Economics, Working Papers.  Available at
         http://ideas.repec.org/p/qed/wpaper/1227.html
     """
-    pass
-
-
-@deprecate_kwarg('unbiased', 'adjusted')
-def acovf(x, adjusted=False, demean=True, fft=True, missing='none', nlag=None):
+    x = array_like(x, "x")
+    maxlag = int_like(maxlag, "maxlag", optional=True)
+    regression = string_like(
+        regression, "regression", options=("c", "ct", "ctt", "n")
+    )
+    autolag = string_like(
+        autolag, "autolag", optional=True, options=("aic", "bic", "t-stat")
+    )
+    store = bool_like(store, "store")
+    regresults = bool_like(regresults, "regresults")
+
+    if x.max() == x.min():
+        raise ValueError("Invalid input, x is constant")
+
+    if regresults:
+        store = True
+
+    trenddict = {None: "n", 0: "c", 1: "ct", 2: "ctt"}
+    if regression is None or isinstance(regression, int):
+        regression = trenddict[regression]
+    regression = regression.lower()
+    nobs = x.shape[0]
+
+    ntrend = len(regression) if regression != "n" else 0
+    if maxlag is None:
+        # from Greene referencing Schwert 1989
+        maxlag = int(np.ceil(12.0 * np.power(nobs / 100.0, 1 / 4.0)))
+        # -1 for the diff
+        maxlag = min(nobs // 2 - ntrend - 1, maxlag)
+        if maxlag < 0:
+            raise ValueError(
+                "sample size is too short to use selected "
+                "regression component"
+            )
+    elif maxlag > nobs // 2 - ntrend - 1:
+        raise ValueError(
+            "maxlag must be less than (nobs/2 - 1 - ntrend) "
+            "where n trend is the number of included "
+            "deterministic regressors"
+        )
+    xdiff = np.diff(x)
+    xdall = lagmat(xdiff[:, None], maxlag, trim="both", original="in")
+    nobs = xdall.shape[0]
+
+    xdall[:, 0] = x[-nobs - 1 : -1]  # replace 0 xdiff with level of x
+    xdshort = xdiff[-nobs:]
+
+    if store:
+        from statsmodels.stats.diagnostic import ResultsStore
+
+        resstore = ResultsStore()
+    if autolag:
+        if regression != "n":
+            fullRHS = add_trend(xdall, regression, prepend=True)
+        else:
+            fullRHS = xdall
+        startlag = fullRHS.shape[1] - xdall.shape[1] + 1
+        # 1 for level
+        # search for lag length with smallest information criteria
+        # Note: use the same number of observations to have comparable IC
+        # aic and bic: smaller is better
+
+        if not regresults:
+            icbest, bestlag = _autolag(
+                OLS, xdshort, fullRHS, startlag, maxlag, autolag
+            )
+        else:
+            icbest, bestlag, alres = _autolag(
+                OLS,
+                xdshort,
+                fullRHS,
+                startlag,
+                maxlag,
+                autolag,
+                regresults=regresults,
+            )
+            resstore.autolag_results = alres
+
+        bestlag -= startlag  # convert to lag not column index
+
+        # rerun ols with best autolag
+        xdall = lagmat(xdiff[:, None], bestlag, trim="both", original="in")
+        nobs = xdall.shape[0]
+        xdall[:, 0] = x[-nobs - 1 : -1]  # replace 0 xdiff with level of x
+        xdshort = xdiff[-nobs:]
+        usedlag = bestlag
+    else:
+        usedlag = maxlag
+        icbest = None
+    if regression != "n":
+        resols = OLS(
+            xdshort, add_trend(xdall[:, : usedlag + 1], regression)
+        ).fit()
+    else:
+        resols = OLS(xdshort, xdall[:, : usedlag + 1]).fit()
+
+    adfstat = resols.tvalues[0]
+    #    adfstat = (resols.params[0]-1.0)/resols.bse[0]
+    # the "asymptotically correct" z statistic is obtained as
+    # nobs/(1-np.sum(resols.params[1:-(trendorder+1)])) (resols.params[0] - 1)
+    # I think this is the statistic that is used for series that are integrated
+    # for orders higher than I(1), i.e., not ADF but cointegration tests.
+
+    # Get approx p-value and critical values
+    pvalue = mackinnonp(adfstat, regression=regression, N=1)
+    critvalues = mackinnoncrit(N=1, regression=regression, nobs=nobs)
+    critvalues = {
+        "1%": critvalues[0],
+        "5%": critvalues[1],
+        "10%": critvalues[2],
+    }
+    if store:
+        resstore.resols = resols
+        resstore.maxlag = maxlag
+        resstore.usedlag = usedlag
+        resstore.adfstat = adfstat
+        resstore.critvalues = critvalues
+        resstore.nobs = nobs
+        resstore.H0 = (
+            "The coefficient on the lagged level equals 1 - " "unit root"
+        )
+        resstore.HA = "The coefficient on the lagged level < 1 - stationary"
+        resstore.icbest = icbest
+        resstore._str = "Augmented Dickey-Fuller Test Results"
+        return adfstat, pvalue, critvalues, resstore
+    else:
+        if not autolag:
+            return adfstat, pvalue, usedlag, nobs, critvalues
+        else:
+            return adfstat, pvalue, usedlag, nobs, critvalues, icbest
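
A short usage sketch of adfuller as implemented above (random-walk input chosen only to illustrate the two return shapes):

>>> import numpy as np
>>> from statsmodels.tsa.stattools import adfuller
>>> y = np.random.default_rng(0).standard_normal(250).cumsum()
>>> stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, autolag="aic")   # 6-tuple with IC search
>>> stat, pvalue, usedlag, nobs, crit = adfuller(y, maxlag=4, autolag=None)  # 5-tuple, fixed lag
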
+
+
+@deprecate_kwarg("unbiased", "adjusted")
+def acovf(x, adjusted=False, demean=True, fft=True, missing="none", nlag=None):
     """
     Estimate autocovariances.

@@ -215,7 +436,99 @@ def acovf(x, adjusted=False, demean=True, fft=True, missing='none', nlag=None):
            and amplitude modulation. Sankhya: The Indian Journal of
            Statistics, Series A, pp.383-392.
     """
-    pass
+    adjusted = bool_like(adjusted, "adjusted")
+    demean = bool_like(demean, "demean")
+    fft = bool_like(fft, "fft", optional=False)
+    missing = string_like(
+        missing, "missing", options=("none", "raise", "conservative", "drop")
+    )
+    nlag = int_like(nlag, "nlag", optional=True)
+
+    x = array_like(x, "x", ndim=1)
+
+    missing = missing.lower()
+    if missing == "none":
+        deal_with_masked = False
+    else:
+        deal_with_masked = has_missing(x)
+    if deal_with_masked:
+        if missing == "raise":
+            raise MissingDataError("NaNs were encountered in the data")
+        notmask_bool = ~np.isnan(x)  # bool
+        if missing == "conservative":
+            # Must copy for thread safety
+            x = x.copy()
+            x[~notmask_bool] = 0
+        else:  # "drop"
+            x = x[notmask_bool]  # copies non-missing
+        notmask_int = notmask_bool.astype(int)  # int
+
+    if demean and deal_with_masked:
+        # whether "drop" or "conservative":
+        xo = x - x.sum() / notmask_int.sum()
+        if missing == "conservative":
+            xo[~notmask_bool] = 0
+    elif demean:
+        xo = x - x.mean()
+    else:
+        xo = x
+
+    n = len(x)
+    lag_len = nlag
+    if nlag is None:
+        lag_len = n - 1
+    elif nlag > n - 1:
+        raise ValueError("nlag must be smaller than nobs - 1")
+
+    if not fft and nlag is not None:
+        acov = np.empty(lag_len + 1)
+        acov[0] = xo.dot(xo)
+        for i in range(lag_len):
+            acov[i + 1] = xo[i + 1 :].dot(xo[: -(i + 1)])
+        if not deal_with_masked or missing == "drop":
+            if adjusted:
+                acov /= n - np.arange(lag_len + 1)
+            else:
+                acov /= n
+        else:
+            if adjusted:
+                divisor = np.empty(lag_len + 1, dtype=np.int64)
+                divisor[0] = notmask_int.sum()
+                for i in range(lag_len):
+                    divisor[i + 1] = notmask_int[i + 1 :].dot(
+                        notmask_int[: -(i + 1)]
+                    )
+                divisor[divisor == 0] = 1
+                acov /= divisor
+            else:  # biased, missing data but not "drop"
+                acov /= notmask_int.sum()
+        return acov
+
+    if adjusted and deal_with_masked and missing == "conservative":
+        d = np.correlate(notmask_int, notmask_int, "full")
+        d[d == 0] = 1
+    elif adjusted:
+        xi = np.arange(1, n + 1)
+        d = np.hstack((xi, xi[:-1][::-1]))
+    elif deal_with_masked:
+        # biased and NaNs given and ("drop" or "conservative")
+        d = notmask_int.sum() * np.ones(2 * n - 1)
+    else:  # biased and no NaNs or missing=="none"
+        d = n * np.ones(2 * n - 1)
+
+    if fft:
+        nobs = len(xo)
+        n = _next_regular(2 * nobs + 1)
+        Frf = np.fft.fft(xo, n=n)
+        acov = np.fft.ifft(Frf * np.conjugate(Frf))[:nobs] / d[nobs - 1 :]
+        acov = acov.real
+    else:
+        acov = np.correlate(xo, xo, "full")[n - 1 :] / d[n - 1 :]
+
+    if nlag is not None:
+        # Copy to allow gc of full array rather than view
+        return acov[: lag_len + 1].copy()
+    return acov
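
For illustration, a small sketch of the estimator above (series and lag count are arbitrary):

>>> import numpy as np
>>> from statsmodels.tsa.stattools import acovf
>>> x = np.random.default_rng(1).standard_normal(500)
>>> gamma = acovf(x, nlag=20)                                # biased, FFT-based (defaults)
>>> gamma_adj = acovf(x, adjusted=True, fft=False, nlag=20)  # divide by n - k instead of n
>>> gamma.shape
(21,)
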


 def q_stat(x, nobs):
@@ -248,11 +561,31 @@ def q_stat(x, nobs):
     -----
     Designed to be used with acf.
     """
-    pass
-
-
-def acf(x, adjusted=False, nlags=None, qstat=False, fft=True, alpha=None,
-    bartlett_confint=True, missing='none'):
+    x = array_like(x, "x")
+    nobs = int_like(nobs, "nobs")
+
+    ret = (
+        nobs
+        * (nobs + 2)
+        * np.cumsum((1.0 / (nobs - np.arange(1, len(x) + 1))) * x ** 2)
+    )
+    chi2 = stats.chi2.sf(ret, np.arange(1, len(x) + 1))
+    return ret, chi2
+
+
+# NOTE: Changed unbiased to False
+# see for example
+# http://www.itl.nist.gov/div898/handbook/eda/section3/autocopl.htm
+def acf(
+    x,
+    adjusted=False,
+    nlags=None,
+    qstat=False,
+    fft=True,
+    alpha=None,
+    bartlett_confint=True,
+    missing="none",
+):
     """
     Calculate the autocorrelation function.

@@ -342,11 +675,47 @@ def acf(x, adjusted=False, nlags=None, qstat=False, fft=True, alpha=None,
     .. [3] Brockwell and Davis, 2010. Introduction to Time Series and
        Forecasting, 2nd edition.
     """
-    pass
-
-
-def pacf_yw(x: ArrayLike1D, nlags: (int | None)=None, method: Literal[
-    'adjusted', 'mle']='adjusted') ->np.ndarray:
+    adjusted = bool_like(adjusted, "adjusted")
+    nlags = int_like(nlags, "nlags", optional=True)
+    qstat = bool_like(qstat, "qstat")
+    fft = bool_like(fft, "fft", optional=False)
+    alpha = float_like(alpha, "alpha", optional=True)
+    missing = string_like(
+        missing, "missing", options=("none", "raise", "conservative", "drop")
+    )
+    x = array_like(x, "x")
+    # TODO: should this shrink for missing="drop" and NaNs in x?
+    nobs = x.shape[0]
+    if nlags is None:
+        nlags = min(int(10 * np.log10(nobs)), nobs - 1)
+
+    avf = acovf(x, adjusted=adjusted, demean=True, fft=fft, missing=missing)
+    acf = avf[: nlags + 1] / avf[0]
+    if not (qstat or alpha):
+        return acf
+    _alpha = alpha if alpha is not None else 0.05
+    if bartlett_confint:
+        varacf = np.ones_like(acf) / nobs
+        varacf[0] = 0
+        varacf[1] = 1.0 / nobs
+        varacf[2:] *= 1 + 2 * np.cumsum(acf[1:-1] ** 2)
+    else:
+        varacf = 1.0 / len(x)
+    interval = stats.norm.ppf(1 - _alpha / 2.0) * np.sqrt(varacf)
+    confint = np.array(lzip(acf - interval, acf + interval))
+    if not qstat:
+        return acf, confint
+    qstat, pvalue = q_stat(acf[1:], nobs=nobs)  # drop lag 0
+    if alpha is not None:
+        return acf, confint, qstat, pvalue
+    else:
+        return acf, qstat, pvalue
+
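
A small sketch of the interface above, including the Ljung-Box output supplied via q_stat (placeholder data):

>>> import numpy as np
>>> from statsmodels.tsa.stattools import acf
>>> x = np.random.default_rng(2).standard_normal(500)
>>> r = acf(x, nlags=20)                                          # autocorrelations, lags 0..20
>>> r, confint, qstat, pvalues = acf(x, nlags=20, qstat=True, alpha=0.05)
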
+def pacf_yw(
+    x: ArrayLike1D,
+    nlags: int | None = None,
+    method: Literal["adjusted", "mle"] = "adjusted",
+) -> np.ndarray:
     """
     Partial autocorrelation estimated with non-recursive yule_walker.

@@ -379,11 +748,23 @@ def pacf_yw(x: ArrayLike1D, nlags: (int | None)=None, method: Literal[
     This solves yule_walker for each desired lag and contains
     currently duplicate calculations.
     """
-    pass
-
-
-def pacf_burg(x: ArrayLike1D, nlags: (int | None)=None, demean: bool=True
-    ) ->tuple[np.ndarray, np.ndarray]:
+    x = array_like(x, "x")
+    nlags = int_like(nlags, "nlags", optional=True)
+    nobs = x.shape[0]
+    if nlags is None:
+        nlags = max(min(int(10 * np.log10(nobs)), nobs - 1), 1)
+    method = string_like(method, "method", options=("adjusted", "mle"))
+    pacf = [1.0]
+    with warnings.catch_warnings():
+        warnings.simplefilter("once", ValueWarning)
+        for k in range(1, nlags + 1):
+            pacf.append(yule_walker(x, k, method=method)[0][-1])
+    return np.array(pacf)
+
+
+def pacf_burg(
+    x: ArrayLike1D, nlags: int | None = None, demean: bool = True
+) -> tuple[np.ndarray, np.ndarray]:
     """
     Calculate Burg"s partial autocorrelation estimator.

@@ -420,12 +801,43 @@ def pacf_burg(x: ArrayLike1D, nlags: (int | None)=None, demean: bool=True
     .. [1] Brockwell, P.J. and Davis, R.A., 2016. Introduction to time series
         and forecasting. Springer.
     """
-    pass
-
-
-@deprecate_kwarg('unbiased', 'adjusted')
-def pacf_ols(x: ArrayLike1D, nlags: (int | None)=None, efficient: bool=True,
-    adjusted: bool=False) ->np.ndarray:
+    x = array_like(x, "x")
+    if demean:
+        x = x - x.mean()
+    nobs = x.shape[0]
+    p = nlags if nlags is not None else min(int(10 * np.log10(nobs)), nobs - 1)
+    p = max(p, 1)
+    if p > nobs - 1:
+        raise ValueError("nlags must be smaller than nobs - 1")
+    d = np.zeros(p + 1)
+    d[0] = 2 * x.dot(x)
+    pacf = np.zeros(p + 1)
+    u = x[::-1].copy()
+    v = x[::-1].copy()
+    d[1] = u[:-1].dot(u[:-1]) + v[1:].dot(v[1:])
+    pacf[1] = 2 / d[1] * v[1:].dot(u[:-1])
+    last_u = np.empty_like(u)
+    last_v = np.empty_like(v)
+    for i in range(1, p):
+        last_u[:] = u
+        last_v[:] = v
+        u[1:] = last_u[:-1] - pacf[i] * last_v[1:]
+        v[1:] = last_v[1:] - pacf[i] * last_u[:-1]
+        d[i + 1] = (1 - pacf[i] ** 2) * d[i] - v[i] ** 2 - u[-1] ** 2
+        pacf[i + 1] = 2 / d[i + 1] * v[i + 1 :].dot(u[i:-1])
+    sigma2 = (1 - pacf**2) * d / (2.0 * (nobs - np.arange(0, p + 1)))
+    pacf[0] = 1  # Insert the 0 lag partial autocorrel
+
+    return pacf, sigma2
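
A brief sketch of the Burg estimator above (arbitrary white-noise input):

>>> import numpy as np
>>> from statsmodels.tsa.stattools import pacf_burg
>>> x = np.random.default_rng(3).standard_normal(400)
>>> p, sigma2 = pacf_burg(x, nlags=10)   # partial autocorrelations and innovation variances
>>> p.shape, sigma2.shape
((11,), (11,))
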
+
+
+@deprecate_kwarg("unbiased", "adjusted")
+def pacf_ols(
+    x: ArrayLike1D,
+    nlags: int | None = None,
+    efficient: bool = True,
+    adjusted: bool = False,
+) -> np.ndarray:
     """
     Calculate partial autocorrelations via OLS.

@@ -479,13 +891,55 @@ def pacf_ols(x: ArrayLike1D, nlags: (int | None)=None, efficient: bool=True,
     .. [1] Box, G. E., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015).
        Time series analysis: forecasting and control. John Wiley & Sons, p. 66
     """
-    pass
-
-
-def pacf(x: ArrayLike1D, nlags: (int | None)=None, method: Literal['yw',
-    'ywadjusted', 'ols', 'ols-inefficient', 'ols-adjusted', 'ywm', 'ywmle',
-    'ld', 'ldadjusted', 'ldb', 'ldbiased', 'burg']='ywadjusted', alpha: (
-    float | None)=None) ->(np.ndarray | tuple[np.ndarray, np.ndarray]):
+    x = array_like(x, "x")
+    nlags = int_like(nlags, "nlags", optional=True)
+    efficient = bool_like(efficient, "efficient")
+    adjusted = bool_like(adjusted, "adjusted")
+    nobs = x.shape[0]
+    if nlags is None:
+        nlags = max(min(int(10 * np.log10(nobs)), nobs // 2), 1)
+    if nlags > nobs // 2:
+        raise ValueError(f"nlags must be smaller than nobs // 2 ({nobs//2})")
+    pacf = np.empty(nlags + 1)
+    pacf[0] = 1.0
+    if efficient:
+        xlags, x0 = lagmat(x, nlags, original="sep")
+        xlags = add_constant(xlags)
+        for k in range(1, nlags + 1):
+            params = lstsq(xlags[k:, : k + 1], x0[k:], rcond=None)[0]
+            pacf[k] = np.squeeze(params[-1])
+    else:
+        x = x - np.mean(x)
+        # Create a single set of lags for multivariate OLS
+        xlags, x0 = lagmat(x, nlags, original="sep", trim="both")
+        for k in range(1, nlags + 1):
+            params = lstsq(xlags[:, :k], x0, rcond=None)[0]
+            # Last coefficient corresponds to PACF value (see [1])
+            pacf[k] = np.squeeze(params[-1])
+    if adjusted:
+        pacf *= nobs / (nobs - np.arange(nlags + 1))
+    return pacf
+
+
+def pacf(
+    x: ArrayLike1D,
+    nlags: int | None = None,
+    method: Literal[
+        "yw",
+        "ywadjusted",
+        "ols",
+        "ols-inefficient",
+        "ols-adjusted",
+        "ywm",
+        "ywmle",
+        "ld",
+        "ldadjusted",
+        "ldb",
+        "ldbiased",
+        "burg",
+    ] = "ywadjusted",
+    alpha: float | None = None,
+) -> np.ndarray | tuple[np.ndarray, np.ndarray]:
     """
     Partial autocorrelation estimate.

@@ -552,10 +1006,71 @@ def pacf(x: ArrayLike1D, nlags: (int | None)=None, method: Literal['yw',
     Yule-Walker (adjusted) and Levinson-Durbin (adjusted) performed
     consistently worse than the other options.
     """
-    pass
-
-
-@deprecate_kwarg('unbiased', 'adjusted')
+    nlags = int_like(nlags, "nlags", optional=True)
+    methods = (
+        "ols",
+        "ols-inefficient",
+        "ols-adjusted",
+        "yw",
+        "ywa",
+        "ld",
+        "ywadjusted",
+        "yw_adjusted",
+        "ywm",
+        "ywmle",
+        "yw_mle",
+        "lda",
+        "ldadjusted",
+        "ld_adjusted",
+        "ldb",
+        "ldbiased",
+        "ld_biased",
+        "burg",
+    )
+    x = array_like(x, "x", maxdim=2)
+    method = string_like(method, "method", options=methods)
+    alpha = float_like(alpha, "alpha", optional=True)
+
+    nobs = x.shape[0]
+    if nlags is None:
+        nlags = min(int(10 * np.log10(nobs)), nobs // 2 - 1)
+    nlags = max(nlags, 1)
+    if nlags > x.shape[0] // 2:
+        raise ValueError(
+            "Can only compute partial correlations for lags up to 50% of the "
+            f"sample size. The requested nlags {nlags} must be < "
+            f"{x.shape[0] // 2}."
+        )
+    if method in ("ols", "ols-inefficient", "ols-adjusted"):
+        efficient = "inefficient" not in method
+        adjusted = "adjusted" in method
+        ret = pacf_ols(x, nlags=nlags, efficient=efficient, adjusted=adjusted)
+    elif method in ("yw", "ywa", "ywadjusted", "yw_adjusted"):
+        ret = pacf_yw(x, nlags=nlags, method="adjusted")
+    elif method in ("ywm", "ywmle", "yw_mle"):
+        ret = pacf_yw(x, nlags=nlags, method="mle")
+    elif method in ("ld", "lda", "ldadjusted", "ld_adjusted"):
+        acv = acovf(x, adjusted=True, fft=False)
+        ld_ = levinson_durbin(acv, nlags=nlags, isacov=True)
+        ret = ld_[2]
+    elif method == "burg":
+        ret, _ = pacf_burg(x, nlags=nlags, demean=True)
+    # inconsistent naming with ywmle
+    else:  # method in ("ldb", "ldbiased", "ld_biased")
+        acv = acovf(x, adjusted=False, fft=False)
+        ld_ = levinson_durbin(acv, nlags=nlags, isacov=True)
+        ret = ld_[2]
+    if alpha is not None:
+        varacf = 1.0 / len(x)  # for all lags >=1
+        interval = stats.norm.ppf(1.0 - alpha / 2.0) * np.sqrt(varacf)
+        confint = np.array(lzip(ret - interval, ret + interval))
+        confint[0] = ret[0]  # fix confidence interval for lag 0 to varpacf=0
+        return ret, confint
+    else:
+        return ret
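
A usage sketch of the dispatching pacf function above (placeholder data; the methods shown are arbitrary choices):

>>> import numpy as np
>>> from statsmodels.tsa.stattools import pacf
>>> x = np.random.default_rng(4).standard_normal(500)
>>> p_yw = pacf(x, nlags=20)                                     # default "ywadjusted"
>>> p_ols, confint = pacf(x, nlags=20, method="ols", alpha=0.05)
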
+
+
+@deprecate_kwarg("unbiased", "adjusted")
 def ccovf(x, y, adjusted=True, demean=True, fft=True):
     """
     Calculate the cross-covariance between two series.
@@ -579,10 +1094,29 @@ def ccovf(x, y, adjusted=True, demean=True, fft=True):
         is the covariance between {x[k], x[k+1], ..., x[n]} and {y[0], y[1], ..., y[m-k]},
         where n and m are the lengths of x and y, respectively.
     """
-    pass
-
-
-@deprecate_kwarg('unbiased', 'adjusted')
+    x = array_like(x, "x")
+    y = array_like(y, "y")
+    adjusted = bool_like(adjusted, "adjusted")
+    demean = bool_like(demean, "demean")
+    fft = bool_like(fft, "fft", optional=False)
+
+    n = len(x)
+    if demean:
+        xo = x - x.mean()
+        yo = y - y.mean()
+    else:
+        xo = x
+        yo = y
+    if adjusted:
+        d = np.arange(n, 0, -1)
+    else:
+        d = n
+
+    method = "fft" if fft else "direct"
+    return correlate(xo, yo, "full", method=method)[n - 1:] / d
+
+
+@deprecate_kwarg("unbiased", "adjusted")
 def ccf(x, y, adjusted=True, fft=True, *, nlags=None, alpha=None):
     """
     The cross-correlation function.
@@ -625,9 +1159,25 @@ def ccf(x, y, adjusted=True, fft=True, *, nlags=None, alpha=None):
     .. [1] Brockwell and Davis, 2016. Introduction to Time Series and
        Forecasting, 3rd edition, p. 242.
     """
-    pass
+    x = array_like(x, "x")
+    y = array_like(y, "y")
+    adjusted = bool_like(adjusted, "adjusted")
+    fft = bool_like(fft, "fft", optional=False)
+
+    cvf = ccovf(x, y, adjusted=adjusted, demean=True, fft=fft)
+    ret = cvf / (np.std(x) * np.std(y))
+    ret = ret[:nlags]

+    if alpha is not None:
+        interval = stats.norm.ppf(1.0 - alpha / 2.0) / np.sqrt(len(x))
+        confint = ret.reshape(-1, 1) + interval * np.array([-1, 1])
+        return ret, confint
+    else:
+        return ret
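
A small sketch of ccf/ccovf as implemented above (the relationship between x and y is constructed only for illustration):

>>> import numpy as np
>>> from statsmodels.tsa.stattools import ccf, ccovf
>>> rng = np.random.default_rng(5)
>>> x = rng.standard_normal(300)
>>> y = 0.5 * x + rng.standard_normal(300)
>>> r = ccf(x, y, adjusted=False, nlags=10)     # first 10 cross-correlations
>>> cov = ccovf(x, y, adjusted=False)           # full-length cross-covariances
>>> r.shape
(10,)
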

+
+# moved from sandbox.tsa.examples.try_ld_nitime, via nitime
+# TODO: check what to return, for testing and trying out returns everything
 def levinson_durbin(s, nlags=10, isacov=False):
     """
     Levinson-Durbin recursion for autoregressive processes.
@@ -668,7 +1218,35 @@ def levinson_durbin(s, nlags=10, isacov=False):
     sample autocovariance function is calculated with the default options
     (biased, no fft).
     """
-    pass
+    s = array_like(s, "s")
+    nlags = int_like(nlags, "nlags")
+    isacov = bool_like(isacov, "isacov")
+
+    order = nlags
+
+    if isacov:
+        sxx_m = s
+    else:
+        sxx_m = acovf(s, fft=False)[: order + 1]  # not tested
+
+    phi = np.zeros((order + 1, order + 1), "d")
+    sig = np.zeros(order + 1)
+    # initial points for the recursion
+    phi[1, 1] = sxx_m[1] / sxx_m[0]
+    sig[1] = sxx_m[0] - phi[1, 1] * sxx_m[1]
+    for k in range(2, order + 1):
+        phi[k, k] = (
+            sxx_m[k] - np.dot(phi[1:k, k - 1], sxx_m[1:k][::-1])
+        ) / sig[k - 1]
+        for j in range(1, k):
+            phi[j, k] = phi[j, k - 1] - phi[k, k] * phi[k - j, k - 1]
+        sig[k] = sig[k - 1] * (1 - phi[k, k] ** 2)
+
+    sigma_v = sig[-1]
+    arcoefs = phi[1:, -1]
+    pacf_ = np.diag(phi).copy()
+    pacf_[0] = 1.0
+    return sigma_v, arcoefs, pacf_, sig, phi  # return everything
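
A quick sketch of the recursion above run on raw data, so the biased, non-FFT autocovariances are computed internally (placeholder input):

>>> import numpy as np
>>> from statsmodels.tsa.stattools import levinson_durbin
>>> x = np.random.default_rng(6).standard_normal(400)
>>> sigma_v, arcoefs, pacf_, sigma, phi = levinson_durbin(x, nlags=5, isacov=False)
>>> arcoefs.shape
(5,)
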


 def levinson_durbin_pacf(pacf, nlags=None):
@@ -696,12 +1274,42 @@ def levinson_durbin_pacf(pacf, nlags=None):
     .. [1] Brockwell, P.J. and Davis, R.A., 2016. Introduction to time series
         and forecasting. Springer.
     """
-    pass
-
-
-def breakvar_heteroskedasticity_test(resid, subset_length=1 / 3,
-    alternative='two-sided', use_f=True):
-    """
+    pacf = array_like(pacf, "pacf")
+    nlags = int_like(nlags, "nlags", optional=True)
+    pacf = np.squeeze(np.asarray(pacf))
+
+    if pacf[0] != 1:
+        raise ValueError(
+            "The first entry of the pacf corresponds to lags 0 "
+            "and so must be 1."
+        )
+    pacf = pacf[1:]
+    n = pacf.shape[0]
+    if nlags is not None:
+        if nlags > n:
+            raise ValueError(
+                "Must provide at least as many values from the "
+                "pacf as the number of lags."
+            )
+        pacf = pacf[:nlags]
+        n = pacf.shape[0]
+
+    acf = np.zeros(n + 1)
+    acf[1] = pacf[0]
+    nu = np.cumprod(1 - pacf ** 2)
+    arcoefs = pacf.copy()
+    for i in range(1, n):
+        prev = arcoefs[: -(n - i)].copy()
+        arcoefs[: -(n - i)] = prev - arcoefs[i] * prev[::-1]
+        acf[i + 1] = arcoefs[i] * nu[i - 1] + prev.dot(acf[1 : -(n - i)][::-1])
+    acf[0] = 1
+    return arcoefs, acf
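
A minimal sketch of the inverse mapping above, from a hypothetical partial autocorrelation sequence to AR coefficients and autocorrelations:

>>> import numpy as np
>>> from statsmodels.tsa.stattools import levinson_durbin_pacf
>>> pacf_vals = np.array([1.0, 0.5, -0.2, 0.1])   # the lag-0 entry must be 1
>>> arcoefs, acf_vals = levinson_durbin_pacf(pacf_vals)
>>> arcoefs.shape, acf_vals.shape
((3,), (4,))
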
+
+
+def breakvar_heteroskedasticity_test(
+    resid, subset_length=1 / 3, alternative="two-sided", use_f=True
+):
+    r"""
     Test for heteroskedasticity of residuals

     Tests whether the sum-of-squares in the first subset of the sample is
@@ -757,12 +1365,12 @@ def breakvar_heteroskedasticity_test(resid, subset_length=1 / 3,

     .. math::

-        H(h) = \\sum_{t=T-h+1}^T  \\tilde v_t^2
-        \\Bigg / \\sum_{t=1}^{h} \\tilde v_t^2
+        H(h) = \sum_{t=T-h+1}^T  \tilde v_t^2
+        \Bigg / \sum_{t=1}^{h} \tilde v_t^2

     This statistic can be tested against an :math:`F(h,h)` distribution.
     Alternatively, :math:`h H(h)` is asymptotically distributed according
-    to :math:`\\chi_h^2`; this second test can be applied by passing
+    to :math:`\chi_h^2`; this second test can be applied by passing
     `use_f=False` as an argument.

     See section 5.4 of [1]_ for the above formula and discussion, as well
@@ -773,7 +1381,82 @@ def breakvar_heteroskedasticity_test(resid, subset_length=1 / 3,
     .. [1] Harvey, Andrew C. 1990. *Forecasting, Structural Time Series*
             *Models and the Kalman Filter.* Cambridge University Press.
     """
-    pass
+    squared_resid = np.asarray(resid, dtype=float) ** 2
+    if squared_resid.ndim == 1:
+        squared_resid = squared_resid.reshape(-1, 1)
+    nobs = len(resid)
+
+    if 0 < subset_length < 1:
+        h = int(np.round(nobs * subset_length))
+    elif type(subset_length) is int and subset_length >= 1:
+        h = subset_length
+
+    numer_resid = squared_resid[-h:]
+    numer_dof = (~np.isnan(numer_resid)).sum(axis=0)
+    numer_squared_sum = np.nansum(numer_resid, axis=0)
+    for i, dof in enumerate(numer_dof):
+        if dof < 2:
+            warnings.warn(
+                "Early subset of data for variable %d"
+                " has too few non-missing observations to"
+                " calculate test statistic." % i,
+                stacklevel=2,
+            )
+            numer_squared_sum[i] = np.nan
+
+    denom_resid = squared_resid[:h]
+    denom_dof = (~np.isnan(denom_resid)).sum(axis=0)
+    denom_squared_sum = np.nansum(denom_resid, axis=0)
+    for i, dof in enumerate(denom_dof):
+        if dof < 2:
+            warnings.warn(
+                "Later subset of data for variable %d"
+                " has too few non-missing observations to"
+                " calculate test statistic." % i,
+                stacklevel=2,
+            )
+            denom_squared_sum[i] = np.nan
+
+    test_statistic = numer_squared_sum / denom_squared_sum
+
+    # Setup functions to calculate the p-values
+    if use_f:
+        from scipy.stats import f
+
+        pval_lower = lambda test_statistics: f.cdf(  # noqa:E731
+            test_statistics, numer_dof, denom_dof
+        )
+        pval_upper = lambda test_statistics: f.sf(  # noqa:E731
+            test_statistics, numer_dof, denom_dof
+        )
+    else:
+        from scipy.stats import chi2
+
+        pval_lower = lambda test_statistics: chi2.cdf(  # noqa:E731
+            numer_dof * test_statistics, denom_dof
+        )
+        pval_upper = lambda test_statistics: chi2.sf(  # noqa:E731
+            numer_dof * test_statistics, denom_dof
+        )
+
+    # Calculate the one- or two-sided p-values
+    alternative = alternative.lower()
+    if alternative in ["i", "inc", "increasing"]:
+        p_value = pval_upper(test_statistic)
+    elif alternative in ["d", "dec", "decreasing"]:
+        test_statistic = 1.0 / test_statistic
+        p_value = pval_upper(test_statistic)
+    elif alternative in ["2", "2-sided", "two-sided"]:
+        p_value = 2 * np.minimum(
+            pval_lower(test_statistic), pval_upper(test_statistic)
+        )
+    else:
+        raise ValueError("Invalid alternative.")
+
+    if len(test_statistic) == 1:
+        return test_statistic[0], p_value[0]
+
+    return test_statistic, p_value
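
A sketch of the H(h) statistic above applied to residuals whose variance increases by construction (the data and variance ratio are placeholders):

>>> import numpy as np
>>> from statsmodels.tsa.stattools import breakvar_heteroskedasticity_test
>>> rng = np.random.default_rng(7)
>>> resid = np.concatenate([rng.standard_normal(100), 3 * rng.standard_normal(100)])
>>> stat, pval = breakvar_heteroskedasticity_test(resid)  # two-sided, F(h, h) reference
>>> stat_inc, pval_inc = breakvar_heteroskedasticity_test(
...     resid, alternative="increasing", use_f=False)     # chi2 variant
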


 def grangercausalitytests(x, maxlag, addconst=True, verbose=None):
@@ -853,11 +1536,167 @@ def grangercausalitytests(x, maxlag, addconst=True, verbose=None):

     >>> gc_res = grangercausalitytests(data, [4])
     """
-    pass
-
-
-def coint(y0, y1, trend='c', method='aeg', maxlag=None, autolag: (str |
-    None)='aic', return_results=None):
+    x = array_like(x, "x", ndim=2)
+    if not np.isfinite(x).all():
+        raise ValueError("x contains NaN or inf values.")
+    addconst = bool_like(addconst, "addconst")
+    if verbose is not None:
+        verbose = bool_like(verbose, "verbose")
+        warnings.warn(
+            "verbose is deprecated since functions should not print results",
+            FutureWarning,
+        )
+    else:
+        verbose = True  # old default
+
+    try:
+        maxlag = int_like(maxlag, "maxlag")
+        if maxlag <= 0:
+            raise ValueError("maxlag must be a positive integer")
+        lags = np.arange(1, maxlag + 1)
+    except TypeError:
+        lags = np.array([int(lag) for lag in maxlag])
+        maxlag = lags.max()
+        if lags.min() <= 0 or lags.size == 0:
+            raise ValueError(
+                "maxlag must be a non-empty list containing only "
+                "positive integers"
+            )
+
+    if x.shape[0] <= 3 * maxlag + int(addconst):
+        raise ValueError(
+            "Insufficient observations. Maximum allowable "
+            "lag is {}".format(int((x.shape[0] - int(addconst)) / 3) - 1)
+        )
+
+    resli = {}
+
+    for mlg in lags:
+        result = {}
+        if verbose:
+            print("\nGranger Causality")
+            print("number of lags (no zero)", mlg)
+        mxlg = mlg
+
+        # create lagmat of both time series
+        dta = lagmat2ds(x, mxlg, trim="both", dropex=1)
+
+        # add constant
+        if addconst:
+            dtaown = add_constant(dta[:, 1 : (mxlg + 1)], prepend=False)
+            dtajoint = add_constant(dta[:, 1:], prepend=False)
+            if (
+                dtajoint.shape[1] == (dta.shape[1] - 1)
+                or (dtajoint.max(0) == dtajoint.min(0)).sum() != 1
+            ):
+                raise InfeasibleTestError(
+                    "The x values include a column with constant values and so"
+                    " the test statistic cannot be computed."
+                )
+        else:
+            raise NotImplementedError("Not Implemented")
+            # dtaown = dta[:, 1:mxlg]
+            # dtajoint = dta[:, 1:]
+
+        # Run ols on both models without and with lags of second variable
+        res2down = OLS(dta[:, 0], dtaown).fit()
+        res2djoint = OLS(dta[:, 0], dtajoint).fit()
+
+        # print results
+        # for ssr based tests see:
+        # http://support.sas.com/rnd/app/examples/ets/granger/index.htm
+        # the other tests are made-up
+
+        # Granger Causality test using ssr (F statistic)
+        if res2djoint.model.k_constant:
+            tss = res2djoint.centered_tss
+        else:
+            tss = res2djoint.uncentered_tss
+        if (
+            tss == 0
+            or res2djoint.ssr == 0
+            or np.isnan(res2djoint.rsquared)
+            or (res2djoint.ssr / tss) < np.finfo(float).eps
+            or res2djoint.params.shape[0] != dtajoint.shape[1]
+        ):
+            raise InfeasibleTestError(
+                "The Granger causality test statistic cannot be compute "
+                "because the VAR has a perfect fit of the data."
+            )
+        fgc1 = (
+            (res2down.ssr - res2djoint.ssr)
+            / res2djoint.ssr
+            / mxlg
+            * res2djoint.df_resid
+        )
+        if verbose:
+            print(
+                "ssr based F test:         F=%-8.4f, p=%-8.4f, df_denom=%d,"
+                " df_num=%d"
+                % (
+                    fgc1,
+                    stats.f.sf(fgc1, mxlg, res2djoint.df_resid),
+                    res2djoint.df_resid,
+                    mxlg,
+                )
+            )
+        result["ssr_ftest"] = (
+            fgc1,
+            stats.f.sf(fgc1, mxlg, res2djoint.df_resid),
+            res2djoint.df_resid,
+            mxlg,
+        )
+
+        # Granger Causality test using ssr (ch2 statistic)
+        fgc2 = res2down.nobs * (res2down.ssr - res2djoint.ssr) / res2djoint.ssr
+        if verbose:
+            print(
+                "ssr based chi2 test:   chi2=%-8.4f, p=%-8.4f, "
+                "df=%d" % (fgc2, stats.chi2.sf(fgc2, mxlg), mxlg)
+            )
+        result["ssr_chi2test"] = (fgc2, stats.chi2.sf(fgc2, mxlg), mxlg)
+
+        # likelihood ratio test pvalue:
+        lr = -2 * (res2down.llf - res2djoint.llf)
+        if verbose:
+            print(
+                "likelihood ratio test: chi2=%-8.4f, p=%-8.4f, df=%d"
+                % (lr, stats.chi2.sf(lr, mxlg), mxlg)
+            )
+        result["lrtest"] = (lr, stats.chi2.sf(lr, mxlg), mxlg)
+
+        # F test that all lag coefficients of exog are zero
+        rconstr = np.column_stack(
+            (np.zeros((mxlg, mxlg)), np.eye(mxlg, mxlg), np.zeros((mxlg, 1)))
+        )
+        ftres = res2djoint.f_test(rconstr)
+        if verbose:
+            print(
+                "parameter F test:         F=%-8.4f, p=%-8.4f, df_denom=%d,"
+                " df_num=%d"
+                % (ftres.fvalue, ftres.pvalue, ftres.df_denom, ftres.df_num)
+            )
+        result["params_ftest"] = (
+            np.squeeze(ftres.fvalue)[()],
+            np.squeeze(ftres.pvalue)[()],
+            ftres.df_denom,
+            ftres.df_num,
+        )
+
+        resli[mxlg] = (result, [res2down, res2djoint, rconstr])
+
+    return resli
+
+
+def coint(
+    y0,
+    y1,
+    trend="c",
+    method="aeg",
+    maxlag=None,
+    autolag: str | None = "aic",
+    return_results=None,
+):
     """
     Test for no-cointegration of a univariate equation.

@@ -943,11 +1782,84 @@ def coint(y0, y1, trend='c', method='aeg', maxlag=None, autolag: (str |
        Queen"s University, Dept of Economics Working Papers 1227.
        http://ideas.repec.org/p/qed/wpaper/1227.html
     """
-    pass
-
-
-def arma_order_select_ic(y, max_ar=4, max_ma=2, ic='bic', trend='c',
-    model_kw=None, fit_kw=None):
+    y0 = array_like(y0, "y0")
+    y1 = array_like(y1, "y1", ndim=2)
+    trend = string_like(trend, "trend", options=("c", "n", "ct", "ctt"))
+    string_like(method, "method", options=("aeg",))
+    maxlag = int_like(maxlag, "maxlag", optional=True)
+    autolag = string_like(
+        autolag, "autolag", optional=True, options=("aic", "bic", "t-stat")
+    )
+    return_results = bool_like(return_results, "return_results", optional=True)
+
+    nobs, k_vars = y1.shape
+    k_vars += 1  # add 1 for y0
+
+    if trend == "n":
+        xx = y1
+    else:
+        xx = add_trend(y1, trend=trend, prepend=False)
+
+    res_co = OLS(y0, xx).fit()
+
+    if res_co.rsquared < 1 - 100 * SQRTEPS:
+        res_adf = adfuller(
+            res_co.resid, maxlag=maxlag, autolag=autolag, regression="n"
+        )
+    else:
+        warnings.warn(
+            "y0 and y1 are (almost) perfectly colinear."
+            "Cointegration test is not reliable in this case.",
+            CollinearityWarning,
+            stacklevel=2,
+        )
+        # Edge case where series are too similar
+        res_adf = (-np.inf,)
+
+    # no constant or trend, see egranger in Stata and MacKinnon
+    if trend == "n":
+        crit = [np.nan] * 3  # 2010 critical values not available
+    else:
+        crit = mackinnoncrit(N=k_vars, regression=trend, nobs=nobs - 1)
+        #  nobs - 1, the -1 is to match egranger in Stata, I do not know why.
+        #  TODO: check nobs or df = nobs - k
+
+    pval_asy = mackinnonp(res_adf[0], regression=trend, N=k_vars)
+    return res_adf[0], pval_asy, crit
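
A usage sketch of the Engle-Granger style test above (the two series are constructed to share a stochastic trend; all choices are placeholders):

>>> import numpy as np
>>> from statsmodels.tsa.stattools import coint
>>> rng = np.random.default_rng(8)
>>> y1 = rng.standard_normal(250).cumsum()      # common stochastic trend
>>> y0 = 0.8 * y1 + rng.standard_normal(250)    # cointegrated with y1 by construction
>>> tstat, pvalue, crit = coint(y0, y1)
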
+
+
+def _safe_arma_fit(y, order, model_kw, trend, fit_kw, start_params=None):
+    from statsmodels.tsa.arima.model import ARIMA
+
+    try:
+        return ARIMA(y, order=order, **model_kw, trend=trend).fit(
+            start_params=start_params, **fit_kw
+        )
+    except LinAlgError:
+        # SVD convergence failure on badly misspecified models
+        return
+
+    except ValueError as error:
+        if start_params is not None:  # do not recurse again
+            # user supplied start_params only get one chance
+            return
+        # try a little harder, should be handled in fit really
+        elif "initial" not in error.args[0] or "initial" in str(error):
+            start_params = [0.1] * sum(order)
+            if trend == "c":
+                start_params = [0.1] + start_params
+            return _safe_arma_fit(
+                y, order, model_kw, trend, fit_kw, start_params
+            )
+        else:
+            return
+    except:  # no idea what happened
+        return
+
+
+def arma_order_select_ic(
+    y, max_ar=4, max_ma=2, ic="bic", trend="c", model_kw=None, fit_kw=None
+):
     """
     Compute information criteria for many ARMA models.

@@ -1004,19 +1916,64 @@ def arma_order_select_ic(y, max_ar=4, max_ma=2, ic='bic', trend='c',
     >>> res.aic_min_order
     >>> res.bic_min_order
     """
-    pass
+    max_ar = int_like(max_ar, "max_ar")
+    max_ma = int_like(max_ma, "max_ma")
+    trend = string_like(trend, "trend", options=("n", "c"))
+    model_kw = dict_like(model_kw, "model_kw", optional=True)
+    fit_kw = dict_like(fit_kw, "fit_kw", optional=True)
+
+    ar_range = [i for i in range(max_ar + 1)]
+    ma_range = [i for i in range(max_ma + 1)]
+    if isinstance(ic, str):
+        ic = [ic]
+    elif not isinstance(ic, (list, tuple)):
+        raise ValueError("Need a list or a tuple for ic if not a string.")
+
+    results = np.zeros((len(ic), max_ar + 1, max_ma + 1))
+    model_kw = {} if model_kw is None else model_kw
+    fit_kw = {} if fit_kw is None else fit_kw
+    y_arr = array_like(y, "y", contiguous=True)
+    for ar in ar_range:
+        for ma in ma_range:
+            mod = _safe_arma_fit(y_arr, (ar, 0, ma), model_kw, trend, fit_kw)
+            if mod is None:
+                results[:, ar, ma] = np.nan
+                continue
+
+            for i, criteria in enumerate(ic):
+                results[i, ar, ma] = getattr(mod, criteria)
+
+    dfs = [
+        pd.DataFrame(res, columns=ma_range, index=ar_range) for res in results
+    ]
+
+    res = dict(zip(ic, dfs))
+
+    # add the minimums to the results dict
+    min_res = {}
+    for i, result in res.items():
+        delta = np.ascontiguousarray(np.abs(result.min().min() - result))
+        ncols = delta.shape[1]
+        loc = np.argmin(delta)
+        min_res.update({i + "_min_order": (loc // ncols, loc % ncols)})
+    res.update(min_res)
+
+    return Bunch(**res)


 def has_missing(data):
     """
     Returns True if "data" contains missing entries, otherwise False
     """
-    pass
+    return np.isnan(np.sum(data))


-def kpss(x, regression: Literal['c', 'ct']='c', nlags: (Literal['auto',
-    'legacy'] | int)='auto', store: bool=False) ->tuple[float, float, int,
-    dict[str, float]]:
+def kpss(
+    x,
+    regression: Literal["c", "ct"] = "c",
+    nlags: Literal["auto", "legacy"] | int = "auto",
+    store: bool = False,
+) -> tuple[float, float, int, dict[str, float]]:
     """
     Kwiatkowski-Phillips-Schmidt-Shin test for stationarity.

@@ -1094,7 +2051,95 @@ def kpss(x, regression: Literal['c', 'ct']='c', nlags: (Literal['auto',
        investigation. Journal of Business and Economic Statistics, 7 (2):
        147-159.
     """
-    pass
+    x = array_like(x, "x")
+    regression = string_like(regression, "regression", options=("c", "ct"))
+    store = bool_like(store, "store")
+
+    nobs = x.shape[0]
+    hypo = regression
+
+    # if m is not one, n != m * n
+    if nobs != x.size:
+        raise ValueError(f"x of shape {x.shape} not understood")
+
+    if hypo == "ct":
+        # p. 162 Kwiatkowski et al. (1992): y_t = beta * t + r_t + e_t,
+        # where beta is the trend, r_t a random walk and e_t a stationary
+        # error term.
+        resids = OLS(x, add_constant(np.arange(1, nobs + 1))).fit().resid
+        crit = [0.119, 0.146, 0.176, 0.216]
+    else:  # hypo == "c"
+        # special case of the model above, where beta = 0 (so the null
+        # hypothesis is that the data is stationary around r_0).
+        resids = x - x.mean()
+        crit = [0.347, 0.463, 0.574, 0.739]
+
+    if nlags == "legacy":
+        nlags = int(np.ceil(12.0 * np.power(nobs / 100.0, 1 / 4.0)))
+        nlags = min(nlags, nobs - 1)
+    elif nlags == "auto" or nlags is None:
+        if nlags is None:
+            # TODO: Remove before 0.14 is released
+            warnings.warn(
+                "None is not a valid value for nlags. It must be an integer, "
+                "'auto' or 'legacy'. None will raise starting in 0.14",
+                FutureWarning,
+                stacklevel=2,
+            )
+        # autolag method of Hobijn et al. (1998)
+        nlags = _kpss_autolag(resids, nobs)
+        nlags = min(nlags, nobs - 1)
+    elif isinstance(nlags, str):
+        raise ValueError("nvals must be 'auto' or 'legacy' when not an int")
+    else:
+        nlags = int_like(nlags, "nlags", optional=False)
+
+        if nlags >= nobs:
+            raise ValueError(
+                f"lags ({nlags}) must be < number of observations ({nobs})"
+            )
+
+    pvals = [0.10, 0.05, 0.025, 0.01]
+
+    eta = np.sum(resids.cumsum() ** 2) / (nobs ** 2)  # eq. 11, p. 165
+    s_hat = _sigma_est_kpss(resids, nobs, nlags)
+
+    kpss_stat = eta / s_hat
+    p_value = np.interp(kpss_stat, crit, pvals)
+
+    warn_msg = """\
+The test statistic is outside of the range of p-values available in the
+look-up table. The actual p-value is {direction} than the p-value returned.
+"""
+    if p_value == pvals[-1]:
+        warnings.warn(
+            warn_msg.format(direction="smaller"),
+            InterpolationWarning,
+            stacklevel=2,
+        )
+    elif p_value == pvals[0]:
+        warnings.warn(
+            warn_msg.format(direction="greater"),
+            InterpolationWarning,
+            stacklevel=2,
+        )
+
+    crit_dict = {"10%": crit[0], "5%": crit[1], "2.5%": crit[2], "1%": crit[3]}
+
+    if store:
+        from statsmodels.stats.diagnostic import ResultsStore
+
+        rstore = ResultsStore()
+        rstore.lags = nlags
+        rstore.nobs = nobs
+
+        stationary_type = "level" if hypo == "c" else "trend"
+        rstore.H0 = f"The series is {stationary_type} stationary"
+        rstore.HA = f"The series is not {stationary_type} stationary"
+
+        return kpss_stat, p_value, crit_dict, rstore
+    else:
+        return kpss_stat, p_value, nlags, crit_dict
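+# Illustrative usage sketch (the random seed and sample size are assumptions,
+# not part of the original code): on stationary white noise the level-
+# stationarity null is typically not rejected, e.g.
+#     >>> rng = np.random.default_rng(0)
+#     >>> stat, pval, used_lags, crit = kpss(rng.standard_normal(500))
+#     >>> pval >= 0.05
+#     True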


 def _sigma_est_kpss(resids, nobs, lags):
@@ -1102,7 +2147,11 @@ def _sigma_est_kpss(resids, nobs, lags):
     Computes equation 10, p. 164 of Kwiatkowski et al. (1992). This is the
     consistent estimator for the variance.
     """
-    pass
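+    # Bartlett/Newey-West long-run variance (eq. 10 of Kwiatkowski et al.,
+    # 1992): s^2 = (1/n) * [sum_t e_t^2
+    #     + 2 * sum_{i=1}^{lags} (1 - i/(lags+1)) * sum_t e_t e_{t-i}]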
+    s_hat = np.sum(resids ** 2)
+    for i in range(1, lags + 1):
+        resids_prod = np.dot(resids[i:], resids[: nobs - i])
+        s_hat += 2 * resids_prod * (1.0 - (i / (lags + 1.0)))
+    return s_hat / nobs


 def _kpss_autolag(resids, nobs):
@@ -1111,7 +2160,19 @@ def _kpss_autolag(resids, nobs):
     using method of Hobijn et al (1998). See also Andrews (1991), Newey & West
     (1994), and Schwert (1989). Assumes Bartlett / Newey-West kernel.
     """
-    pass
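+    # Data-dependent bandwidth of Hobijn et al. (1998): using the first
+    # n**(2/9) sample autocovariances, gamma_hat = 1.1447 * (s1/s0)**(2/3)
+    # and the selected lag is int(gamma_hat * n**(1/3)).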
+    covlags = int(np.power(nobs, 2.0 / 9.0))
+    s0 = np.sum(resids ** 2) / nobs
+    s1 = 0
+    for i in range(1, covlags + 1):
+        resids_prod = np.dot(resids[i:], resids[: nobs - i])
+        resids_prod /= nobs / 2.0
+        s0 += resids_prod
+        s1 += i * resids_prod
+    s_hat = s1 / s0
+    pwr = 1.0 / 3.0
+    gamma_hat = 1.1447 * np.power(s_hat * s_hat, pwr)
+    autolags = int(gamma_hat * np.power(nobs, pwr))
+    return autolags


 def range_unit_root_test(x, store=False):
@@ -1160,7 +2221,97 @@ def range_unit_root_test(x, store=False):
         tests: robust against nonlinearities, error distributions, structural breaks
         and outliers. Journal of Time Series Analysis, 27 (4): 545-576.
     """
-    pass
+    x = array_like(x, "x")
+    store = bool_like(store, "store")
+
+    nobs = x.shape[0]
+
+    # a 2-d x with more than one column makes nobs != x.size, so reject it
+    if nobs != x.size:
+        raise ValueError(f"x of shape {x.shape} not understood")
+
+    # Table from [1] has been replicated using 200,000 samples
+    # Critical values for new n_obs values have been identified
+    pvals = [0.01, 0.025, 0.05, 0.10, 0.90, 0.95]
+    n = np.array(
+        [25, 50, 100, 150, 200, 250, 500, 1000, 2000, 3000, 4000, 5000]
+    )
+    crit = np.array(
+        [
+            [0.6626, 0.8126, 0.9192, 1.0712, 2.4863, 2.7312],
+            [0.7977, 0.9274, 1.0478, 1.1964, 2.6821, 2.9613],
+            [0.9070, 1.0243, 1.1412, 1.2888, 2.8317, 3.1393],
+            [0.9543, 1.0768, 1.1869, 1.3294, 2.8915, 3.2049],
+            [0.9833, 1.0984, 1.2101, 1.3494, 2.9308, 3.2482],
+            [0.9982, 1.1137, 1.2242, 1.3632, 2.9571, 3.2842],
+            [1.0494, 1.1643, 1.2712, 1.4076, 3.0207, 3.3584],
+            [1.0846, 1.1959, 1.2988, 1.4344, 3.0653, 3.4073],
+            [1.1121, 1.2200, 1.3230, 1.4556, 3.0948, 3.4439],
+            [1.1204, 1.2295, 1.3303, 1.4656, 3.1054, 3.4632],
+            [1.1309, 1.2347, 1.3378, 1.4693, 3.1165, 3.4717],
+            [1.1377, 1.2402, 1.3408, 1.4729, 3.1252, 3.4807],
+        ]
+    )
+
+    # Interpolation for nobs
+    inter_crit = np.zeros((1, crit.shape[1]))
+    for i in range(crit.shape[1]):
+        f = interp1d(n, crit[:, i])
+        inter_crit[0, i] = f(nobs)
+
+    # Calculate RUR stat
+    xs = pd.Series(x)
+    exp_max = xs.expanding(1).max().shift(1)
+    exp_min = xs.expanding(1).min().shift(1)
+    count = (xs > exp_max).sum() + (xs < exp_min).sum()
+
+    rur_stat = count / np.sqrt(len(x))
+
+    k = len(pvals) - 1
+    for i in range(len(pvals) - 1, -1, -1):
+        if rur_stat < inter_crit[0, i]:
+            k = i
+        else:
+            break
+
+    p_value = pvals[k]
+
+    warn_msg = """\
+The test statistic is outside of the range of p-values available in the
+look-up table. The actual p-value is {direction} than the p-value returned.
+"""
+    direction = ""
+    if p_value == pvals[-1]:
+        direction = "smaller"
+    elif p_value == pvals[0]:
+        direction = "larger"
+
+    if direction:
+        warnings.warn(
+            warn_msg.format(direction=direction),
+            InterpolationWarning,
+            stacklevel=2,
+        )
+
+    crit_dict = {
+        "10%": inter_crit[0, 3],
+        "5%": inter_crit[0, 2],
+        "2.5%": inter_crit[0, 1],
+        "1%": inter_crit[0, 0],
+    }
+
+    if store:
+        from statsmodels.stats.diagnostic import ResultsStore
+
+        rstore = ResultsStore()
+        rstore.nobs = nobs
+
+        rstore.H0 = "The series is not stationary"
+        rstore.HA = "The series is stationary"
+
+        return rur_stat, p_value, crit_dict, rstore
+    else:
+        return rur_stat, p_value, crit_dict
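+# Illustrative usage sketch (the simulated random walk is an assumption, not
+# part of the original code); the null hypothesis is "the series is not
+# stationary":
+#     >>> rng = np.random.default_rng(0)
+#     >>> stat, pval, crit = range_unit_root_test(rng.standard_normal(500).cumsum())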


 class ZivotAndrewsUnitRoot:
@@ -1179,53 +2330,164 @@ class ZivotAndrewsUnitRoot:
         100,000 replications and 2000 data points.
         """
         self._za_critical_values = {}
-        self._c = (0.001, -6.78442), (0.1, -5.83192), (0.2, -5.68139), (0.3,
-            -5.58461), (0.4, -5.51308), (0.5, -5.45043), (0.6, -5.39924), (
-            0.7, -5.36023), (0.8, -5.33219), (0.9, -5.30294), (1.0, -5.27644
-            ), (2.5, -5.0334), (5.0, -4.81067), (7.5, -4.67636), (10.0, -
-            4.56618), (12.5, -4.4813), (15.0, -4.40507), (17.5, -4.33947), (
-            20.0, -4.28155), (22.5, -4.22683), (25.0, -4.1783), (27.5, -4.13101
-            ), (30.0, -4.08586), (32.5, -4.04455), (35.0, -4.0038), (37.5, 
-            -3.96144), (40.0, -3.92078), (42.5, -3.88178), (45.0, -3.84503), (
-            47.5, -3.80549), (50.0, -3.77031), (52.5, -3.73209), (55.0, -3.696
-            ), (57.5, -3.65985), (60.0, -3.62126), (65.0, -3.5458), (70.0, 
-            -3.46848), (75.0, -3.38533), (80.0, -3.29112), (85.0, -3.17832), (
-            90.0, -3.04165), (92.5, -2.95146), (95.0, -2.83179), (96.0, -
-            2.76465), (97.0, -2.68624), (98.0, -2.57884), (99.0, -2.40044), (
-            99.9, -1.88932)
-        self._za_critical_values['c'] = np.asarray(self._c)
-        self._t = (0.001, -83.9094), (0.1, -13.8837), (0.2, -9.13205), (0.3,
-            -6.32564), (0.4, -5.60803), (0.5, -5.38794), (0.6, -5.26585), (
-            0.7, -5.18734), (0.8, -5.12756), (0.9, -5.07984), (1.0, -5.03421
-            ), (2.5, -4.65634), (5.0, -4.4058), (7.5, -4.25214), (10.0, -
-            4.13678), (12.5, -4.03765), (15.0, -3.95185), (17.5, -3.87945), (
-            20.0, -3.81295), (22.5, -3.75273), (25.0, -3.69836), (27.5, -
-            3.64785), (30.0, -3.59819), (32.5, -3.55146), (35.0, -3.50522), (
-            37.5, -3.45987), (40.0, -3.41672), (42.5, -3.37465), (45.0, -
-            3.33394), (47.5, -3.29393), (50.0, -3.25316), (52.5, -3.21244), (
-            55.0, -3.17124), (57.5, -3.13211), (60.0, -3.09204), (65.0, -
-            3.01135), (70.0, -2.92897), (75.0, -2.83614), (80.0, -2.73893), (
-            85.0, -2.6284), (90.0, -2.49611), (92.5, -2.41337), (95.0, -2.3082
-            ), (96.0, -2.25797), (97.0, -2.19648), (98.0, -2.1132), (99.0, 
-            -1.99138), (99.9, -1.67466)
-        self._za_critical_values['t'] = np.asarray(self._t)
-        self._ct = (0.001, -38.178), (0.1, -6.43107), (0.2, -6.07279), (0.3,
-            -5.95496), (0.4, -5.86254), (0.5, -5.77081), (0.6, -5.72541), (
-            0.7, -5.68406), (0.8, -5.65163), (0.9, -5.60419), (1.0, -5.57556
-            ), (2.5, -5.29704), (5.0, -5.07332), (7.5, -4.93003), (10.0, -
-            4.82668), (12.5, -4.73711), (15.0, -4.6602), (17.5, -4.5897), (
-            20.0, -4.52855), (22.5, -4.471), (25.0, -4.42011), (27.5, -4.37387
-            ), (30.0, -4.32705), (32.5, -4.28126), (35.0, -4.23793), (37.5,
-            -4.19822), (40.0, -4.158), (42.5, -4.11946), (45.0, -4.08064), (
-            47.5, -4.04286), (50.0, -4.00489), (52.5, -3.96837), (55.0, -3.932
-            ), (57.5, -3.89496), (60.0, -3.85577), (65.0, -3.77795), (70.0,
-            -3.69794), (75.0, -3.61852), (80.0, -3.52485), (85.0, -3.41665), (
-            90.0, -3.28527), (92.5, -3.19724), (95.0, -3.08769), (96.0, -
-            3.03088), (97.0, -2.96091), (98.0, -2.85581), (99.0, -2.71015), (
-            99.9, -2.28767)
-        self._za_critical_values['ct'] = np.asarray(self._ct)
-
-    def _za_crit(self, stat, model='c'):
+        # constant-only model
+        self._c = (
+            (0.001, -6.78442),
+            (0.100, -5.83192),
+            (0.200, -5.68139),
+            (0.300, -5.58461),
+            (0.400, -5.51308),
+            (0.500, -5.45043),
+            (0.600, -5.39924),
+            (0.700, -5.36023),
+            (0.800, -5.33219),
+            (0.900, -5.30294),
+            (1.000, -5.27644),
+            (2.500, -5.03340),
+            (5.000, -4.81067),
+            (7.500, -4.67636),
+            (10.000, -4.56618),
+            (12.500, -4.48130),
+            (15.000, -4.40507),
+            (17.500, -4.33947),
+            (20.000, -4.28155),
+            (22.500, -4.22683),
+            (25.000, -4.17830),
+            (27.500, -4.13101),
+            (30.000, -4.08586),
+            (32.500, -4.04455),
+            (35.000, -4.00380),
+            (37.500, -3.96144),
+            (40.000, -3.92078),
+            (42.500, -3.88178),
+            (45.000, -3.84503),
+            (47.500, -3.80549),
+            (50.000, -3.77031),
+            (52.500, -3.73209),
+            (55.000, -3.69600),
+            (57.500, -3.65985),
+            (60.000, -3.62126),
+            (65.000, -3.54580),
+            (70.000, -3.46848),
+            (75.000, -3.38533),
+            (80.000, -3.29112),
+            (85.000, -3.17832),
+            (90.000, -3.04165),
+            (92.500, -2.95146),
+            (95.000, -2.83179),
+            (96.000, -2.76465),
+            (97.000, -2.68624),
+            (98.000, -2.57884),
+            (99.000, -2.40044),
+            (99.900, -1.88932),
+        )
+        self._za_critical_values["c"] = np.asarray(self._c)
+        # trend-only model
+        self._t = (
+            (0.001, -83.9094),
+            (0.100, -13.8837),
+            (0.200, -9.13205),
+            (0.300, -6.32564),
+            (0.400, -5.60803),
+            (0.500, -5.38794),
+            (0.600, -5.26585),
+            (0.700, -5.18734),
+            (0.800, -5.12756),
+            (0.900, -5.07984),
+            (1.000, -5.03421),
+            (2.500, -4.65634),
+            (5.000, -4.40580),
+            (7.500, -4.25214),
+            (10.000, -4.13678),
+            (12.500, -4.03765),
+            (15.000, -3.95185),
+            (17.500, -3.87945),
+            (20.000, -3.81295),
+            (22.500, -3.75273),
+            (25.000, -3.69836),
+            (27.500, -3.64785),
+            (30.000, -3.59819),
+            (32.500, -3.55146),
+            (35.000, -3.50522),
+            (37.500, -3.45987),
+            (40.000, -3.41672),
+            (42.500, -3.37465),
+            (45.000, -3.33394),
+            (47.500, -3.29393),
+            (50.000, -3.25316),
+            (52.500, -3.21244),
+            (55.000, -3.17124),
+            (57.500, -3.13211),
+            (60.000, -3.09204),
+            (65.000, -3.01135),
+            (70.000, -2.92897),
+            (75.000, -2.83614),
+            (80.000, -2.73893),
+            (85.000, -2.62840),
+            (90.000, -2.49611),
+            (92.500, -2.41337),
+            (95.000, -2.30820),
+            (96.000, -2.25797),
+            (97.000, -2.19648),
+            (98.000, -2.11320),
+            (99.000, -1.99138),
+            (99.900, -1.67466),
+        )
+        self._za_critical_values["t"] = np.asarray(self._t)
+        # constant + trend model
+        self._ct = (
+            (0.001, -38.17800),
+            (0.100, -6.43107),
+            (0.200, -6.07279),
+            (0.300, -5.95496),
+            (0.400, -5.86254),
+            (0.500, -5.77081),
+            (0.600, -5.72541),
+            (0.700, -5.68406),
+            (0.800, -5.65163),
+            (0.900, -5.60419),
+            (1.000, -5.57556),
+            (2.500, -5.29704),
+            (5.000, -5.07332),
+            (7.500, -4.93003),
+            (10.000, -4.82668),
+            (12.500, -4.73711),
+            (15.000, -4.66020),
+            (17.500, -4.58970),
+            (20.000, -4.52855),
+            (22.500, -4.47100),
+            (25.000, -4.42011),
+            (27.500, -4.37387),
+            (30.000, -4.32705),
+            (32.500, -4.28126),
+            (35.000, -4.23793),
+            (37.500, -4.19822),
+            (40.000, -4.15800),
+            (42.500, -4.11946),
+            (45.000, -4.08064),
+            (47.500, -4.04286),
+            (50.000, -4.00489),
+            (52.500, -3.96837),
+            (55.000, -3.93200),
+            (57.500, -3.89496),
+            (60.000, -3.85577),
+            (65.000, -3.77795),
+            (70.000, -3.69794),
+            (75.000, -3.61852),
+            (80.000, -3.52485),
+            (85.000, -3.41665),
+            (90.000, -3.28527),
+            (92.500, -3.19724),
+            (95.000, -3.08769),
+            (96.000, -3.03088),
+            (97.000, -2.96091),
+            (98.000, -2.85581),
+            (99.000, -2.71015),
+            (99.900, -2.28767),
+        )
+        self._za_critical_values["ct"] = np.asarray(self._ct)
+
+    def _za_crit(self, stat, model="c"):
         """
         Linear interpolation for Zivot-Andrews p-values and critical values

@@ -1249,29 +2511,72 @@ class ZivotAndrewsUnitRoot:
         The p-values are linear interpolated from the quantiles of the
         simulated ZA test statistic distribution
         """
-        pass
+        table = self._za_critical_values[model]
+        pcnts = table[:, 0]
+        stats = table[:, 1]
+        # ZA cv table contains quantiles multiplied by 100
+        pvalue = np.interp(stat, stats, pcnts) / 100.0
+        cv = [1.0, 5.0, 10.0]
+        crit_value = np.interp(cv, pcnts, stats)
+        cvdict = {
+            "1%": crit_value[0],
+            "5%": crit_value[1],
+            "10%": crit_value[2],
+        }
+        return pvalue, cvdict

     def _quick_ols(self, endog, exog):
         """
         Minimal implementation of LS estimator for internal use
         """
-        pass
+        xpxi = np.linalg.inv(exog.T.dot(exog))
+        xpy = exog.T.dot(endog)
+        nobs, k_exog = exog.shape
+        b = xpxi.dot(xpy)
+        e = endog - exog.dot(b)
+        sigma2 = e.T.dot(e) / (nobs - k_exog)
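+        # b / se(b): this helper returns t-statistics, not the coefficients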
+        return b / np.sqrt(np.diag(sigma2 * xpxi))

     def _format_regression_data(self, series, nobs, const, trend, cols, lags):
         """
         Create the endog/exog data for the auxiliary regressions
         from the original (standardized) series under test.
         """
-        pass
-
-    def _update_regression_exog(self, exog, regression, period, nobs, const,
-        trend, cols, lags):
+        # first-diff y and standardize for numerical stability
+        endog = np.diff(series, axis=0)
+        endog /= np.sqrt(endog.T.dot(endog))
+        series /= np.sqrt(series.T.dot(series))
+        # reserve exog space
+        exog = np.zeros((endog[lags:].shape[0], cols + lags))
+        exog[:, 0] = const
+        # lagged y and dy
+        exog[:, cols - 1] = series[lags : (nobs - 1)]
+        exog[:, cols:] = lagmat(endog, lags, trim="none")[
+            lags : exog.shape[0] + lags
+        ]
+        return endog, exog
+
+    def _update_regression_exog(
+        self, exog, regression, period, nobs, const, trend, cols, lags
+    ):
         """
         Update the exog array for the next regression.
         """
-        pass
-
-    def run(self, x, trim=0.15, maxlag=None, regression='c', autolag='AIC'):
+        cutoff = period - (lags + 1)
+        if regression != "t":
+            exog[:cutoff, 1] = 0
+            exog[cutoff:, 1] = const
+            exog[:, 2] = trend[(lags + 2) : (nobs + 1)]
+            if regression == "ct":
+                exog[:cutoff, 3] = 0
+                exog[cutoff:, 3] = trend[1 : (nobs - period + 1)]
+        else:
+            exog[:, 1] = trend[(lags + 2) : (nobs + 1)]
+            exog[: (cutoff - 1), 2] = 0
+            exog[(cutoff - 1) :, 2] = trend[0 : (nobs - period + 1)]
+        return exog
+
+    def run(self, x, trim=0.15, maxlag=None, regression="c", autolag="AIC"):
         """
         Zivot-Andrews structural-break unit-root test.

@@ -1348,12 +2653,85 @@ class ZivotAndrewsUnitRoot:
            great crash, the oil-price shock, and the unit-root hypothesis.
            Journal of Business & Economic Studies, 10: 251-270.
         """
-        pass
-
-    def __call__(self, x, trim=0.15, maxlag=None, regression='c', autolag='AIC'
-        ):
-        return self.run(x, trim=trim, maxlag=maxlag, regression=regression,
-            autolag=autolag)
+        x = array_like(x, "x")
+        trim = float_like(trim, "trim")
+        maxlag = int_like(maxlag, "maxlag", optional=True)
+        regression = string_like(
+            regression, "regression", options=("c", "t", "ct")
+        )
+        autolag = string_like(
+            autolag, "autolag", options=("aic", "bic", "t-stat"), optional=True
+        )
+        if trim < 0 or trim > (1.0 / 3.0):
+            raise ValueError("trim value must be a float in range [0, 1/3)")
+        nobs = x.shape[0]
+        if autolag:
+            adf_res = adfuller(
+                x, maxlag=maxlag, regression="ct", autolag=autolag
+            )
+            baselags = adf_res[2]
+        elif maxlag:
+            baselags = maxlag
+        else:
+            baselags = int(12.0 * np.power(nobs / 100.0, 1 / 4.0))
+        trimcnt = int(nobs * trim)
+        start_period = trimcnt
+        end_period = nobs - trimcnt
+        if regression == "ct":
+            basecols = 5
+        else:
+            basecols = 4
+        # normalize constant and trend terms for stability
+        c_const = 1 / np.sqrt(nobs)
+        t_const = np.arange(1.0, nobs + 2)
+        t_const *= np.sqrt(3) / nobs ** (3 / 2)
+        # format the auxiliary regression data
+        endog, exog = self._format_regression_data(
+            x, nobs, c_const, t_const, basecols, baselags
+        )
+        # iterate through the time periods
+        stats = np.full(end_period + 1, np.inf)
+        for bp in range(start_period + 1, end_period + 1):
+            # update intercept dummy / trend / trend dummy
+            exog = self._update_regression_exog(
+                exog,
+                regression,
+                bp,
+                nobs,
+                c_const,
+                t_const,
+                basecols,
+                baselags,
+            )
+            # check exog rank on first iteration
+            if bp == start_period + 1:
+                o = OLS(endog[baselags:], exog, hasconst=1).fit()
+                if o.df_model < exog.shape[1] - 1:
+                    raise ValueError(
+                        "ZA: auxiliary exog matrix is not full rank.\n"
+                        "  cols (exc intercept) = {}  rank = {}".format(
+                            exog.shape[1] - 1, o.df_model
+                        )
+                    )
+                stats[bp] = o.tvalues[basecols - 1]
+            else:
+                stats[bp] = self._quick_ols(endog[baselags:], exog)[
+                    basecols - 1
+                ]
+        # return best seen
+        zastat = np.min(stats)
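+        # argmin yields the 1-based break period; subtract 1 for a 0-based
+        # index into x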
+        bpidx = np.argmin(stats) - 1
+        crit = self._za_crit(zastat, regression)
+        pval = crit[0]
+        cvdict = crit[1]
+        return zastat, pval, cvdict, baselags, bpidx
+
+    def __call__(
+        self, x, trim=0.15, maxlag=None, regression="c", autolag="AIC"
+    ):
+        return self.run(
+            x, trim=trim, maxlag=maxlag, regression=regression, autolag=autolag
+        )


 zivot_andrews = ZivotAndrewsUnitRoot()
diff --git a/statsmodels/tsa/stl/mstl.py b/statsmodels/tsa/stl/mstl.py
index 34e66800b..0cbf774f1 100644
--- a/statsmodels/tsa/stl/mstl.py
+++ b/statsmodels/tsa/stl/mstl.py
@@ -18,9 +18,11 @@ https://arxiv.org/pdf/2107.13462.pdf
 """
 from typing import Dict, Optional, Sequence, Tuple, Union
 import warnings
+
 import numpy as np
 import pandas as pd
 from scipy.stats import boxcox
+
 from statsmodels.tools.typing import ArrayLike1D
 from statsmodels.tsa.stl._stl import STL
 from statsmodels.tsa.tsatools import freq_to_period
@@ -98,19 +100,27 @@ class MSTL:
     .. plot:: plots/mstl_plot.py
     """

-    def __init__(self, endog: ArrayLike1D, *, periods: Optional[Union[int,
-        Sequence[int]]]=None, windows: Optional[Union[int, Sequence[int]]]=
-        None, lmbda: Optional[Union[float, str]]=None, iterate: int=2,
-        stl_kwargs: Optional[Dict[str, Union[int, bool, None]]]=None):
+    def __init__(
+        self,
+        endog: ArrayLike1D,
+        *,
+        periods: Optional[Union[int, Sequence[int]]] = None,
+        windows: Optional[Union[int, Sequence[int]]] = None,
+        lmbda: Optional[Union[float, str]] = None,
+        iterate: int = 2,
+        stl_kwargs: Optional[Dict[str, Union[int, bool, None]]] = None,
+    ):
         self.endog = endog
         self._y = self._to_1d_array(endog)
         self.nobs = self._y.shape[0]
         self.lmbda = lmbda
-        self.periods, self.windows = self._process_periods_and_windows(periods,
-            windows)
+        self.periods, self.windows = self._process_periods_and_windows(
+            periods, windows
+        )
         self.iterate = iterate
-        self._stl_kwargs = self._remove_overloaded_stl_kwargs(stl_kwargs if
-            stl_kwargs else {})
+        self._stl_kwargs = self._remove_overloaded_stl_kwargs(
+            stl_kwargs if stl_kwargs else {}
+        )

     def fit(self):
         """
@@ -122,9 +132,155 @@ class MSTL:
         DecomposeResult
             Estimation results.
         """
-        pass
+        num_seasons = len(self.periods)
+        iterate = 1 if num_seasons == 1 else self.iterate
+
+        # Box Cox
+        if self.lmbda == "auto":
+            y, lmbda = boxcox(self._y, lmbda=None)
+            self.est_lmbda = lmbda
+        elif self.lmbda:
+            y = boxcox(self._y, lmbda=self.lmbda)
+        else:
+            y = self._y
+
+        # Get STL fit params
+        stl_inner_iter = self._stl_kwargs.pop("inner_iter", None)
+        stl_outer_iter = self._stl_kwargs.pop("outer_iter", None)
+
+        # Iterate over each seasonal component to extract seasonalities
+        seasonal = np.zeros(shape=(num_seasons, self.nobs))
+        deseas = y
+        for _ in range(iterate):
+            for i in range(num_seasons):
+                deseas = deseas + seasonal[i]
+                res = STL(
+                    endog=deseas,
+                    period=self.periods[i],
+                    seasonal=self.windows[i],
+                    **self._stl_kwargs,
+                ).fit(inner_iter=stl_inner_iter, outer_iter=stl_outer_iter)
+                seasonal[i] = res.seasonal
+                deseas = deseas - seasonal[i]
+
+        seasonal = np.squeeze(seasonal.T)
+        trend = res.trend
+        rw = res.weights
+        resid = deseas - trend
+
+        # Return pandas if endog is pandas
+        if isinstance(self.endog, (pd.Series, pd.DataFrame)):
+            index = self.endog.index
+            y = pd.Series(y, index=index, name="observed")
+            trend = pd.Series(trend, index=index, name="trend")
+            resid = pd.Series(resid, index=index, name="resid")
+            rw = pd.Series(rw, index=index, name="robust_weight")
+            cols = [f"seasonal_{period}" for period in self.periods]
+            if seasonal.ndim == 1:
+                seasonal = pd.Series(seasonal, index=index, name="seasonal")
+            else:
+                seasonal = pd.DataFrame(seasonal, index=index, columns=cols)
+
+        # Avoid circular imports
+        from statsmodels.tsa.seasonal import DecomposeResult
+
+        return DecomposeResult(y, seasonal, trend, resid, rw)
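+    # Illustrative usage sketch (index, seed and periods are assumptions, not
+    # part of the original code):
+    #     >>> idx = pd.date_range("2000-01-01", periods=1000, freq="D")
+    #     >>> obs = np.random.default_rng(0).standard_normal(1000)
+    #     >>> res = MSTL(pd.Series(obs, index=idx), periods=(7, 30)).fit()
+    #     >>> res.seasonal.shape
+    #     (1000, 2)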

     def __str__(self):
         return (
-            f'MSTL(endog, periods={self.periods}, windows={self.windows}, lmbda={self.lmbda}, iterate={self.iterate})'
+            "MSTL(endog,"
+            f" periods={self.periods},"
+            f" windows={self.windows},"
+            f" lmbda={self.lmbda},"
+            f" iterate={self.iterate})"
+        )
+
+    def _process_periods_and_windows(
+        self,
+        periods: Union[int, Sequence[int], None],
+        windows: Union[int, Sequence[int], None],
+    ) -> Tuple[Sequence[int], Sequence[int]]:
+        periods = self._process_periods(periods)
+
+        if windows:
+            windows = self._process_windows(windows, num_seasons=len(periods))
+            periods, windows = self._sort_periods_and_windows(periods, windows)
+        else:
+            windows = self._process_windows(windows, num_seasons=len(periods))
+            periods = sorted(periods)
+
+        if len(periods) != len(windows):
+            raise ValueError("Periods and windows must have same length")
+
+        # Remove long periods from decomposition
+        if any(period >= self.nobs / 2 for period in periods):
+            warnings.warn(
+                "A period(s) is larger than half the length of time series."
+                " Removing these period(s)."
+            )
+            periods = tuple(
+                period for period in periods if period < self.nobs / 2
             )
+            windows = windows[: len(periods)]
+
+        return periods, windows
+
+    def _process_periods(
+        self, periods: Union[int, Sequence[int], None]
+    ) -> Sequence[int]:
+        if periods is None:
+            periods = (self._infer_period(),)
+        elif isinstance(periods, int):
+            periods = (periods,)
+        else:
+            pass
+        return periods
+
+    def _process_windows(
+        self,
+        windows: Union[int, Sequence[int], None],
+        num_seasons: int,
+    ) -> Sequence[int]:
+        if windows is None:
+            windows = self._default_seasonal_windows(num_seasons)
+        elif isinstance(windows, int):
+            windows = (windows,)
+        else:
+            pass
+        return windows
+
+    def _infer_period(self) -> int:
+        freq = None
+        if isinstance(self.endog, (pd.Series, pd.DataFrame)):
+            freq = getattr(self.endog.index, "inferred_freq", None)
+        if freq is None:
+            raise ValueError("Unable to determine period from endog")
+        period = freq_to_period(freq)
+        return period
+
+    @staticmethod
+    def _sort_periods_and_windows(
+        periods, windows
+    ) -> Tuple[Sequence[int], Sequence[int]]:
+        if len(periods) != len(windows):
+            raise ValueError("Periods and windows must have same length")
+        periods, windows = zip(*sorted(zip(periods, windows)))
+        return periods, windows
+
+    @staticmethod
+    def _remove_overloaded_stl_kwargs(stl_kwargs: Dict) -> Dict:
+        args = ["endog", "period", "seasonal"]
+        for arg in args:
+            stl_kwargs.pop(arg, None)
+        return stl_kwargs
+
+    @staticmethod
+    def _default_seasonal_windows(n: int) -> Sequence[int]:
+        return tuple(7 + 4 * i for i in range(1, n + 1))  # See [1]
+
+    @staticmethod
+    def _to_1d_array(x):
+        y = np.ascontiguousarray(np.squeeze(np.asarray(x)), dtype=np.double)
+        if y.ndim != 1:
+            raise ValueError("y must be a 1d array")
+        return y
diff --git a/statsmodels/tsa/tsatools.py b/statsmodels/tsa/tsatools.py
index d3c8e847e..048b56a2e 100644
--- a/statsmodels/tsa/tsatools.py
+++ b/statsmodels/tsa/tsatools.py
@@ -1,21 +1,41 @@
 from __future__ import annotations
+
 from statsmodels.compat.python import Literal, lrange
+
 import warnings
+
 import numpy as np
 import pandas as pd
 from pandas import DataFrame
 from pandas.tseries import offsets
 from pandas.tseries.frequencies import to_offset
+
 from statsmodels.tools.data import _is_recarray, _is_using_pandas
 from statsmodels.tools.sm_exceptions import ValueWarning
 from statsmodels.tools.typing import NDArray
-from statsmodels.tools.validation import array_like, bool_like, int_like, string_like
-__all__ = ['lagmat', 'lagmat2ds', 'add_trend', 'duplication_matrix',
-    'elimination_matrix', 'commutation_matrix', 'vec', 'vech', 'unvec',
-    'unvech', 'freq_to_period']
-
-
-def add_trend(x, trend='c', prepend=False, has_constant='skip'):
+from statsmodels.tools.validation import (
+    array_like,
+    bool_like,
+    int_like,
+    string_like,
+)
+
+__all__ = [
+    "lagmat",
+    "lagmat2ds",
+    "add_trend",
+    "duplication_matrix",
+    "elimination_matrix",
+    "commutation_matrix",
+    "vec",
+    "vech",
+    "unvec",
+    "unvech",
+    "freq_to_period",
+]
+
+
+def add_trend(x, trend="c", prepend=False, has_constant="skip"):
     """
     Add a trend and/or constant to an array.

@@ -55,7 +75,95 @@ def add_trend(x, trend='c', prepend=False, has_constant='skip'):
     Returns columns as ['ctt','ct','c'] whenever applicable. There is currently
     no checking for an existing trend.
     """
-    pass
+    prepend = bool_like(prepend, "prepend")
+    trend = string_like(trend, "trend", options=("n", "c", "t", "ct", "ctt"))
+    has_constant = string_like(
+        has_constant, "has_constant", options=("raise", "add", "skip")
+    )
+
+    # TODO: could be generalized for a trend of arbitrary order
+    columns = ["const", "trend", "trend_squared"]
+    if trend == "n":
+        return x.copy()
+    elif trend == "c":  # handles structured arrays
+        columns = columns[:1]
+        trendorder = 0
+    elif trend == "ct" or trend == "t":
+        columns = columns[:2]
+        if trend == "t":
+            columns = columns[1:2]
+        trendorder = 1
+    elif trend == "ctt":
+        trendorder = 2
+
+    if _is_recarray(x):
+        from statsmodels.tools.sm_exceptions import recarray_exception
+
+        raise NotImplementedError(recarray_exception)
+
+    is_pandas = _is_using_pandas(x, None)
+    if is_pandas:
+        if isinstance(x, pd.Series):
+            x = pd.DataFrame(x)
+        else:
+            x = x.copy()
+    else:
+        x = np.asanyarray(x)
+
+    nobs = len(x)
+    trendarr = np.vander(
+        np.arange(1, nobs + 1, dtype=np.float64), trendorder + 1
+    )
+    # put in order ctt
+    trendarr = np.fliplr(trendarr)
+    if trend == "t":
+        trendarr = trendarr[:, 1]
+
+    if "c" in trend:
+        if is_pandas:
+            # Mixed type protection
+            def safe_is_const(s):
+                try:
+                    return np.ptp(s) == 0.0 and np.any(s != 0.0)
+                except:
+                    return False
+
+            col_const = x.apply(safe_is_const, 0)
+        else:
+            ptp0 = np.ptp(np.asanyarray(x), axis=0)
+            col_is_const = ptp0 == 0
+            nz_const = col_is_const & (x[0] != 0)
+            col_const = nz_const
+
+        if np.any(col_const):
+            if has_constant == "raise":
+                if x.ndim == 1:
+                    base_err = "x is constant."
+                else:
+                    columns = np.arange(x.shape[1])[col_const]
+                    if isinstance(x, pd.DataFrame):
+                        columns = x.columns
+                    const_cols = ", ".join([str(c) for c in columns])
+                    base_err = (
+                        "x contains one or more constant columns. Column(s) "
+                        f"{const_cols} are constant."
+                    )
+                msg = f"{base_err} Adding a constant with trend='{trend}' is not allowed."
+                raise ValueError(msg)
+            elif has_constant == "skip":
+                columns = columns[1:]
+                trendarr = trendarr[:, 1:]
+
+    order = 1 if prepend else -1
+    if is_pandas:
+        trendarr = pd.DataFrame(trendarr, index=x.index, columns=columns)
+        x = [trendarr, x]
+        x = pd.concat(x[::order], axis=1)
+    else:
+        x = [trendarr, x]
+        x = np.column_stack(x[::order])
+
+    return x
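+# Illustrative usage sketch (the toy input is an assumption, not part of the
+# original code): a linear trend appended to a constant column, e.g.
+#     >>> add_trend(np.ones((4, 1)), trend="t")
+#     array([[1., 1.],
+#            [1., 2.],
+#            [1., 3.],
+#            [1., 4.]])
+# With trend="c" and has_constant="skip" the same input is returned unchanged,
+# because a non-zero constant column is already present.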


 def add_lag(x, col=None, lags=1, drop=False, insert=True):
@@ -97,7 +205,45 @@ def add_lag(x, col=None, lags=1, drop=False, insert=True):
     so that the length of the returned array is len(`X`) - lags. The lags are
     returned in increasing order, ie., t-1,t-2,...,t-lags
     """
-    pass
+    lags = int_like(lags, "lags")
+    drop = bool_like(drop, "drop")
+    x = array_like(x, "x", ndim=2)
+    if col is None:
+        col = 0
+
+    # handle negative index
+    if col < 0:
+        col = x.shape[1] + col
+    if x.ndim == 1:
+        x = x[:, None]
+    contemp = x[:, col]
+
+    if insert is True:
+        ins_idx = col + 1
+    elif insert is False:
+        ins_idx = x.shape[1]
+    else:
+        if insert < 0:  # handle negative index
+            insert = x.shape[1] + insert + 1
+        if insert > x.shape[1]:
+            insert = x.shape[1]
+
+            warnings.warn(
+                "insert > number of variables, inserting at the"
+                " last position",
+                ValueWarning,
+            )
+        ins_idx = insert
+
+    ndlags = lagmat(contemp, lags, trim="Both")
+    first_cols = lrange(ins_idx)
+    last_cols = lrange(ins_idx, x.shape[1])
+    if drop:
+        if col in first_cols:
+            first_cols.pop(first_cols.index(col))
+        else:
+            last_cols.pop(last_cols.index(col))
+    return np.column_stack((x[lags:, first_cols], ndlags, x[lags:, last_cols]))


 def detrend(x, order=1, axis=0):
@@ -122,13 +268,37 @@ def detrend(x, order=1, axis=0):
         The detrended series is the residual of the linear regression of the
         data on the trend of given order.
     """
-    pass
-
-
-def lagmat(x, maxlag: int, trim: Literal['forward', 'backward', 'both',
-    'none']='forward', original: Literal['ex', 'sep', 'in']='ex',
-    use_pandas: bool=False) ->(NDArray | DataFrame | tuple[NDArray, NDArray
-    ] | tuple[DataFrame, DataFrame]):
+    order = int_like(order, "order")
+    axis = int_like(axis, "axis")
+
+    if x.ndim == 2 and int(axis) == 1:
+        x = x.T
+    elif x.ndim > 2:
+        raise NotImplementedError(
+            "x.ndim > 2 is not implemented until it is needed"
+        )
+
+    nobs = x.shape[0]
+    if order == 0:
+        # Special case demean
+        resid = x - x.mean(axis=0)
+    else:
+        trends = np.vander(np.arange(float(nobs)), N=order + 1)
+        beta = np.linalg.pinv(trends).dot(x)
+        resid = x - np.dot(trends, beta)
+
+    if x.ndim == 2 and int(axis) == 1:
+        resid = resid.T
+
+    return resid
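+# Illustrative usage sketch (the toy input is an assumption, not part of the
+# original code): removing a linear trend from an exact line leaves residuals
+# that are numerically zero, e.g.
+#     >>> np.allclose(detrend(np.arange(10.0), order=1), 0)
+#     True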
+
+
+def lagmat(
+    x,
+    maxlag: int,
+    trim: Literal["forward", "backward", "both", "none"] = "forward",
+    original: Literal["ex", "sep", "in"] = "ex",
+    use_pandas: bool = False,
+) -> NDArray | DataFrame | tuple[NDArray, NDArray] | tuple[DataFrame, DataFrame]:
     """
     Create 2d array of lags.

@@ -196,11 +366,87 @@ def lagmat(x, maxlag: int, trim: Literal['forward', 'backward', 'both',
        [ 0.,  0.,  5.,  6.,  3.,  4.],
        [ 0.,  0.,  0.,  0.,  5.,  6.]])
     """
-    pass
-
-
-def lagmat2ds(x, maxlag0, maxlagex=None, dropex=0, trim='forward',
-    use_pandas=False):
+    maxlag = int_like(maxlag, "maxlag")
+    use_pandas = bool_like(use_pandas, "use_pandas")
+    trim = string_like(
+        trim,
+        "trim",
+        optional=True,
+        options=("forward", "backward", "both", "none"),
+    )
+    original = string_like(original, "original", options=("ex", "sep", "in"))
+
+    # TODO:  allow list of lags additional to maxlag
+    orig = x
+    x = array_like(x, "x", ndim=2, dtype=None)
+    is_pandas = _is_using_pandas(orig, None) and use_pandas
+    trim = "none" if trim is None else trim
+    trim = trim.lower()
+    if is_pandas and trim in ("none", "backward"):
+        raise ValueError(
+            "trim cannot be 'none' or 'backward' when used on "
+            "Series or DataFrames"
+        )
+
+    dropidx = 0
+    nobs, nvar = x.shape
+    if original in ["ex", "sep"]:
+        dropidx = nvar
+    if maxlag >= nobs:
+        raise ValueError("maxlag should be < nobs")
+    lm = np.zeros((nobs + maxlag, nvar * (maxlag + 1)))
+    for k in range(0, int(maxlag + 1)):
+        lm[
+            maxlag - k : nobs + maxlag - k,
+            nvar * (maxlag - k) : nvar * (maxlag - k + 1),
+        ] = x
+
+    if trim in ("none", "forward"):
+        startobs = 0
+    elif trim in ("backward", "both"):
+        startobs = maxlag
+    else:
+        raise ValueError("trim option not valid")
+
+    if trim in ("none", "backward"):
+        stopobs = len(lm)
+    else:
+        stopobs = nobs
+
+    if is_pandas:
+        x = orig
+        if isinstance(x, DataFrame):
+            x_columns = [str(c) for c in x.columns]
+            if len(set(x_columns)) != x.shape[1]:
+                raise ValueError(
+                    "Columns names must be distinct after conversion to string "
+                    "(if not already strings)."
+                )
+        else:
+            x_columns = [str(x.name)]
+        columns = [str(col) for col in x_columns]
+        for lag in range(maxlag):
+            lag_str = str(lag + 1)
+            columns.extend([str(col) + ".L." + lag_str for col in x_columns])
+        lm = DataFrame(lm[:stopobs], index=x.index, columns=columns)
+        lags = lm.iloc[startobs:]
+        if original in ("sep", "ex"):
+            leads = lags[x_columns]
+            lags = lags.drop(x_columns, axis=1)
+    else:
+        lags = lm[startobs:stopobs, dropidx:]
+        if original == "sep":
+            leads = lm[startobs:stopobs, :dropidx]
+
+    if original == "sep":
+        return lags, leads
+    else:
+        return lags
+
+
+def lagmat2ds(
+    x, maxlag0, maxlagex=None, dropex=0, trim="forward", use_pandas=False
+):
     """
     Generate lagmatrix for 2d array, columns arranged by variables.

@@ -236,7 +482,101 @@ def lagmat2ds(x, maxlag0, maxlagex=None, dropex=0, trim='forward',
     -----
     Inefficient implementation for unequal lags, implemented for convenience.
     """
-    pass
+    maxlag0 = int_like(maxlag0, "maxlag0")
+    maxlagex = int_like(maxlagex, "maxlagex", optional=True)
+    trim = string_like(
+        trim,
+        "trim",
+        optional=True,
+        options=("forward", "backward", "both", "none"),
+    )
+    if maxlagex is None:
+        maxlagex = maxlag0
+    maxlag = max(maxlag0, maxlagex)
+    is_pandas = _is_using_pandas(x, None)
+
+    if x.ndim == 1:
+        if is_pandas:
+            x = pd.DataFrame(x)
+        else:
+            x = x[:, None]
+    elif x.ndim == 0 or x.ndim > 2:
+        raise ValueError("Only supports 1 and 2-dimensional data.")
+
+    nobs, nvar = x.shape
+
+    if is_pandas and use_pandas:
+        lags = lagmat(
+            x.iloc[:, 0], maxlag, trim=trim, original="in", use_pandas=True
+        )
+        lagsli = [lags.iloc[:, : maxlag0 + 1]]
+        for k in range(1, nvar):
+            lags = lagmat(
+                x.iloc[:, k], maxlag, trim=trim, original="in", use_pandas=True
+            )
+            lagsli.append(lags.iloc[:, dropex : maxlagex + 1])
+        return pd.concat(lagsli, axis=1)
+    elif is_pandas:
+        x = np.asanyarray(x)
+
+    lagsli = [
+        lagmat(x[:, 0], maxlag, trim=trim, original="in")[:, : maxlag0 + 1]
+    ]
+    for k in range(1, nvar):
+        lagsli.append(
+            lagmat(x[:, k], maxlag, trim=trim, original="in")[
+                :, dropex : maxlagex + 1
+            ]
+        )
+    return np.column_stack(lagsli)
+
+
+def vec(mat):
+    return mat.ravel("F")
+
+
+def vech(mat):
+    # Gets Fortran-order
+    return mat.T.take(_triu_indices(len(mat)))
+
+
+# tril/triu/diag, suitable for ndarray.take
+
+
+def _tril_indices(n):
+    rows, cols = np.tril_indices(n)
+    return rows * n + cols
+
+
+def _triu_indices(n):
+    rows, cols = np.triu_indices(n)
+    return rows * n + cols
+
+
+def _diag_indices(n):
+    rows, cols = np.diag_indices(n)
+    return rows * n + cols
+
+
+def unvec(v):
+    k = int(np.sqrt(len(v)))
+    assert k * k == len(v)
+    return v.reshape((k, k), order="F")
+
+
+def unvech(v):
+    # quadratic formula, correct fp error
+    rows = 0.5 * (-1 + np.sqrt(1 + 8 * len(v)))
+    rows = int(np.round(rows))
+
+    result = np.zeros((rows, rows))
+    result[np.triu_indices(rows)] = v
+    result = result + result.T
+
+    # divide diagonal elements by 2
+    result[np.diag_indices(rows)] /= 2
+
+    return result
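+# Illustrative usage sketch (the example matrix is an assumption, not part of
+# the original code): vech and unvech are inverses on symmetric matrices, e.g.
+#     >>> a = np.array([[1.0, 2.0], [2.0, 5.0]])
+#     >>> np.allclose(unvech(vech(a)), a)
+#     True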


 def duplication_matrix(n):
@@ -248,7 +588,9 @@ def duplication_matrix(n):
     -------
     D_n : ndarray
     """
-    pass
+    n = int_like(n, "n")
+    tmp = np.eye(n * (n + 1) // 2)
+    return np.array([unvech(x).ravel() for x in tmp]).T


 def elimination_matrix(n):
@@ -262,7 +604,9 @@ def elimination_matrix(n):
     Returns
     -------
     """
-    pass
+    n = int_like(n, "n")
+    vech_indices = vec(np.tril(np.ones((n, n))))
+    return np.eye(n * n)[vech_indices != 0]


 def commutation_matrix(p, q):
@@ -278,7 +622,12 @@ def commutation_matrix(p, q):
     -------
     K : ndarray (pq x pq)
     """
-    pass
+    p = int_like(p, "p")
+    q = int_like(q, "q")
+
+    K = np.eye(p * q)
+    indices = np.arange(p * q).reshape((p, q), order="F")
+    return K.take(indices.ravel(), axis=0)
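+# Illustrative identity checks (the example matrices are assumptions, not part
+# of the original code): D_n vech(A) == vec(A) for symmetric A, and K_{p,q}
+# maps vec(A) to vec(A'), e.g.
+#     >>> a = np.array([[1.0, 2.0], [2.0, 5.0]])
+#     >>> np.allclose(duplication_matrix(2) @ vech(a), vec(a))
+#     True
+#     >>> b = np.arange(6.0).reshape(2, 3)
+#     >>> np.allclose(commutation_matrix(2, 3) @ vec(b), vec(b.T))
+#     True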


 def _ar_transparams(params):
@@ -294,7 +643,14 @@ def _ar_transparams(params):
     ---------
     Jones(1980)
     """
-    pass
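+    # map unconstrained params to partial autocorrelations via tanh, then run
+    # the Durbin-Levinson-type recursion to obtain stationary AR coefficients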
+    newparams = np.tanh(params / 2)
+    tmp = np.tanh(params / 2)
+    for j in range(1, len(params)):
+        a = newparams[j]
+        for kiter in range(j):
+            tmp[kiter] -= a * newparams[j - kiter - 1]
+        newparams[:j] = tmp[:j]
+    return newparams


 def _ar_invtransparams(params):
@@ -306,7 +662,17 @@ def _ar_invtransparams(params):
     params : array_like
         The transformed AR coefficients
     """
-    pass
+    params = params.copy()
+    tmp = params.copy()
+    for j in range(len(params) - 1, 0, -1):
+        a = params[j]
+        for kiter in range(j):
+            tmp[kiter] = (params[kiter] + a * params[j - kiter - 1]) / (
+                1 - a ** 2
+            )
+        params[:j] = tmp[:j]
+    invarcoefs = 2 * np.arctanh(params)
+    return invarcoefs
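+# Illustrative round-trip sketch (the example vector is an assumption, not
+# part of the original code): the transform and its inverse compose to the
+# identity, e.g.
+#     >>> p = np.array([0.3, -0.2])
+#     >>> np.allclose(_ar_invtransparams(_ar_transparams(p)), p)
+#     True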


 def _ma_transparams(params):
@@ -322,7 +688,16 @@ def _ma_transparams(params):
     ---------
     Jones(1980)
     """
-    pass
+    newparams = ((1 - np.exp(-params)) / (1 + np.exp(-params))).copy()
+    tmp = ((1 - np.exp(-params)) / (1 + np.exp(-params))).copy()
+
+    # levinson-durbin to get macf
+    for j in range(1, len(params)):
+        b = newparams[j]
+        for kiter in range(j):
+            tmp[kiter] += b * newparams[j - kiter - 1]
+        newparams[:j] = tmp[:j]
+    return newparams


 def _ma_invtransparams(macoefs):
@@ -334,7 +709,16 @@ def _ma_invtransparams(macoefs):
     params : ndarray
         The transformed MA coefficients
     """
-    pass
+    tmp = macoefs.copy()
+    for j in range(len(macoefs) - 1, 0, -1):
+        b = macoefs[j]
+        for kiter in range(j):
+            tmp[kiter] = (macoefs[kiter] - b * macoefs[j - kiter - 1]) / (
+                1 - b ** 2
+            )
+        macoefs[:j] = tmp[:j]
+    invmacoefs = -np.log((1 - macoefs) / (1 + macoefs))
+    return invmacoefs


 def unintegrate_levels(x, d):
@@ -358,7 +742,9 @@ def unintegrate_levels(x, d):
     --------
     unintegrate
     """
-    pass
+    d = int_like(d, "d")
+    x = x[:d]
+    return np.asarray([np.diff(x, d - i)[0] for i in range(d, 0, -1)])


 def unintegrate(x, levels):
@@ -387,10 +773,15 @@ def unintegrate(x, levels):
     >>> unintegrate(np.diff(x, 2), levels)
     array([  1.,   3.,   9.,  19.,   8.])
     """
-    pass
+    levels = list(levels)[:]  # copy
+    if len(levels) > 1:
+        x0 = levels.pop(-1)
+        return unintegrate(np.cumsum(np.r_[x0, x]), levels)
+    x0 = levels[0]
+    return np.cumsum(np.r_[x0, x])


-def freq_to_period(freq: (str | offsets.DateOffset)) ->int:
+def freq_to_period(freq: str | offsets.DateOffset) -> int:
     """
     Convert a pandas frequency to a periodicity

@@ -408,4 +799,28 @@ def freq_to_period(freq: (str | offsets.DateOffset)) ->int:
     -----
     Annual maps to 1, quarterly maps to 4, monthly to 12, weekly to 52.
     """
-    pass
+    if not isinstance(freq, offsets.DateOffset):
+        freq = to_offset(freq)  # go ahead and standardize
+    assert isinstance(freq, offsets.DateOffset)
+    freq = freq.rule_code.upper()
+
+    yearly_freqs = ("A-", "AS-", "Y-", "YS-", "YE-")
+    if freq in ("A", "Y") or freq.startswith(yearly_freqs):
+        return 1
+    elif freq == "Q" or freq.startswith(("Q-", "QS", "QE")):
+        return 4
+    elif freq == "M" or freq.startswith(("M-", "MS", "ME")):
+        return 12
+    elif freq == "W" or freq.startswith("W-"):
+        return 52
+    elif freq == "D":
+        return 7
+    elif freq == "B":
+        return 5
+    elif freq == "H":
+        return 24
+    else:  # pragma: no cover
+        raise ValueError(
+            "freq {} not understood. Please report if you "
+            "think this is in error.".format(freq)
+        )
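+# Illustrative usage sketch (the frequency strings are assumptions, not part
+# of the original code):
+#     >>> freq_to_period("QS"), freq_to_period("W-SUN")
+#     (4, 52)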
diff --git a/statsmodels/tsa/varma_process.py b/statsmodels/tsa/varma_process.py
index ae8033e1d..404f42ea8 100644
--- a/statsmodels/tsa/varma_process.py
+++ b/statsmodels/tsa/varma_process.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """ Helper and filter functions for VAR and VARMA, and basic VAR class

 Created on Mon Jan 11 11:04:23 2010
@@ -29,11 +30,12 @@ see also VAR section in Notes.txt
 """
 import numpy as np
 from scipy import signal
+
 from statsmodels.tsa.tsatools import lagmat


 def varfilter(x, a):
-    """apply an autoregressive filter to a series x
+    '''apply an autoregressive filter to a series x

     Warning: I just found out that convolve does not work as I
        thought, this likely does not work correctly for
@@ -88,12 +90,47 @@ def varfilter(x, a):

     TODO: initial conditions

-    """
-    pass
+    '''
+    x = np.asarray(x)
+    a = np.asarray(a)
+    if x.ndim == 1:
+        x = x[:,None]
+    if x.ndim > 2:
+        raise ValueError('x array has to be 1d or 2d')
+    nvar = x.shape[1]
+    nlags = a.shape[0]
+    ntrim = nlags//2
+    # for x is 2d with ncols >1
+
+    if a.ndim == 1:
+        # case: identical ar filter (lag polynomial)
+        return signal.convolve(x, a[:,None], mode='valid')
+        # alternative:
+        #return signal.lfilter(a,[1],x.astype(float),axis=0)
+    elif a.ndim == 2:
+        if min(a.shape) == 1:
+            # case: identical ar filter (lag polynomial)
+            return signal.convolve(x, a, mode='valid')
+
+        # case: independent ar
+        # (a bit like recserar in gauss, but no x yet)
+        # (no, recserar is the inverse filter)
+        result = np.zeros((x.shape[0]-nlags+1, nvar))
+        for i in range(nvar):
+            # could also use np.convolve, but easier for switching to fft
+            result[:,i] = signal.convolve(x[:,i], a[:,i], mode='valid')
+        return result
+
+    elif a.ndim == 3:
+        # case: vector autoregressive with lag matrices
+        # Note: we must have shape[1] == shape[2] == nvar
+        yf = signal.convolve(x[:,:,None], a)
+        yvalid = yf[ntrim:-ntrim, yf.shape[1]//2,:]
+        return yvalid


 def varinversefilter(ar, nobs, version=1):
-    """creates inverse ar filter (MA representation) recursively
+    '''creates inverse ar filter (MA representation) recursively

     The VAR lag polynomial is defined by ::

@@ -120,12 +157,32 @@ def varinversefilter(ar, nobs, version=1):
     Notes
     -----

-    """
-    pass
+    '''
+    nlags, nvars, nvarsex = ar.shape
+    if nvars != nvarsex:
+        print('exogenous variables not implemented, not tested')
+    arinv = np.zeros((nobs+1, nvarsex, nvars))
+    arinv[0,:,:] = ar[0]
+    arinv[1:nlags,:,:] = -ar[1:]
+    if version == 1:
+        for i in range(2,nobs+1):
+            tmp = np.zeros((nvars,nvars))
+            for p in range(1,nlags):
+                tmp += np.dot(-ar[p],arinv[i-p,:,:])
+            arinv[i,:,:] = tmp
+    if version == 0:
+        for i in range(nlags+1,nobs+1):
+            print(ar[1:].shape, arinv[i-1:i-nlags:-1,:,:].shape)
+            #arinv[i,:,:] = np.dot(-ar[1:],arinv[i-1:i-nlags:-1,:,:])
+            #print(np.tensordot(-ar[1:],arinv[i-1:i-nlags:-1,:,:],axes=([2],[1])).shape
+            #arinv[i,:,:] = np.tensordot(-ar[1:],arinv[i-1:i-nlags:-1,:,:],axes=([2],[1]))
+            raise NotImplementedError('waiting for generalized ufuncs or something')
+
+    return arinv


 def vargenerate(ar, u, initvalues=None):
-    """generate an VAR process with errors u
+    '''generate an VAR process with errors u

     similar to gauss
     uses loop
@@ -160,12 +217,33 @@ def vargenerate(ar, u, initvalues=None):
     imp[0,0] = 1
     vargenerate(a21,imp)

-    """
-    pass
+    '''
+    nlags, nvars, nvarsex = ar.shape
+    nlagsm1 = nlags - 1
+    nobs = u.shape[0]
+    if nvars != nvarsex:
+        print('exogenous variables not implemented, not tested')
+    if u.shape[1] != nvars:
+        raise ValueError('u needs to have nvars columns')
+    if initvalues is None:
+        sar = np.zeros((nobs+nlagsm1, nvars))
+        start = nlagsm1
+    else:
+        start = max(nlagsm1, initvalues.shape[0])
+        sar = np.zeros((nobs+start, nvars))
+        sar[start-initvalues.shape[0]:start] = initvalues
+    #sar[nlagsm1:] = u
+    sar[start:] = u
+    #if version == 1:
+    for i in range(start,start+nobs):
+        for p in range(1,nlags):
+            sar[i] += np.dot(sar[i-p,:],-ar[p])
+
+    return sar


 def padone(x, front=0, back=0, axis=0, fillvalue=0):
-    """pad with zeros along one axis, currently only axis=0
+    '''pad with zeros along one axis, currently only axis=0


     can be used sequentially to pad several axis
@@ -181,12 +259,26 @@ def padone(x, front=0, back=0, axis=0, fillvalue=0):
            [  1.,   1.,   1.],
            [  1.,   1.,   1.],
            [ NaN,  NaN,  NaN]])
-    """
-    pass
+    '''
+    #primitive version
+    shape = np.array(x.shape)
+    shape[axis] += (front + back)
+    shapearr = np.array(x.shape)
+    out = np.empty(shape)
+    out.fill(fillvalue)
+    startind = np.zeros(x.ndim, dtype=int)
+    startind[axis] = front
+    endind = startind + shapearr
+    myslice = [slice(startind[k], endind[k]) for k in range(len(endind))]
+    #print(myslice
+    #print(out.shape
+    #print(out[tuple(myslice)].shape
+    out[tuple(myslice)] = x
+    return out


 def trimone(x, front=0, back=0, axis=0):
-    """trim number of array elements along one axis
+    '''trim number of array elements along one axis


     Examples
@@ -198,26 +290,38 @@ def trimone(x, front=0, back=0, axis=0):
     >>> trimone(xp,1,3,1)
     array([[ 1.,  1.,  1.],
            [ 1.,  1.,  1.]])
-    """
-    pass
+    '''
+    shape = np.array(x.shape)
+    shape[axis] -= (front + back)
+    #print(shape, front, back
+    shapearr = np.array(x.shape)
+    startind = np.zeros(x.ndim, dtype=int)
+    startind[axis] = front
+    endind = startind + shape
+    myslice = [slice(startind[k], endind[k]) for k in range(len(endind))]
+    #print(myslice
+    #print(shape, endind
+    #print(x[tuple(myslice)].shape
+    return x[tuple(myslice)]


 def ar2full(ar):
-    """make reduced lagpolynomial into a right side lagpoly array
-    """
-    pass
+    '''make reduced lagpolynomial into a right side lagpoly array
+    '''
+    nlags, nvar,nvarex = ar.shape
+    return np.r_[np.eye(nvar,nvarex)[None,:,:],-ar]


 def ar2lhs(ar):
-    """convert full (rhs) lagpolynomial into a reduced, left side lagpoly array
+    '''convert full (rhs) lagpolynomial into a reduced, left side lagpoly array

     this is mainly a reminder about the definition
-    """
-    pass
+    '''
+    return -ar[1:]


 class _Var:
-    """obsolete VAR class, use tsa.VAR instead, for internal use only
+    '''obsolete VAR class, use tsa.VAR instead, for internal use only


     Examples
@@ -232,14 +336,14 @@ class _Var:
            [[-0.77784898,  0.01726193],
             [ 0.10733009, -0.78665335]]])

-    """
+    '''

     def __init__(self, y):
         self.y = y
         self.nobs, self.nvars = y.shape

     def fit(self, nlags):
-        """estimate parameters using ols
+        '''estimate parameters using ols

         Parameters
         ----------
@@ -264,17 +368,32 @@ class _Var:
         estimation results are attached to the class instance


-        """
-        pass
+        '''
+        self.nlags = nlags # without current period
+        nvars = self.nvars
+        #TODO: ar2s looks like a module variable, bug?
+        #lmat = lagmat(ar2s, nlags, trim='both', original='in')
+        lmat = lagmat(self.y, nlags, trim='both', original='in')
+        self.yred = lmat[:,:nvars]
+        self.xred = lmat[:,nvars:]
+        res = np.linalg.lstsq(self.xred, self.yred, rcond=-1)
+        self.estresults = res
+        self.arlhs = res[0].reshape(nlags, nvars, nvars)
+        self.arhat = ar2full(self.arlhs)
+        self.rss = res[1]
+        self.xredrank = res[2]

     def predict(self):
-        """calculate estimated timeseries (yhat) for sample
+        '''calculate estimated timeseries (yhat) for sample
+
+        '''

-        """
-        pass
+        if not hasattr(self, 'yhat'):
+            self.yhat = varfilter(self.y, self.arhat)
+        return self.yhat

     def covmat(self):
-        """ covariance matrix of estimate
+        ''' covariance matrix of estimate
         # not sure it's correct, need to check orientation everywhere
         # looks ok, display needs getting used to
         >>> v.rss[None,None,:]*np.linalg.inv(np.dot(v.xred.T,v.xred))[:,:,None]
@@ -290,11 +409,14 @@ class _Var:
         >>> v.rss[1]*np.linalg.inv(np.dot(v.xred.T,v.xred))
         array([[ 0.32210609,  0.08670584],
                [ 0.08670584,  0.39696255]])
-       """
-        pass
+       '''
+
+        #check if orientation is same as self.arhat
+        self.paramcov = (self.rss[None,None,:] *
+            np.linalg.inv(np.dot(self.xred.T, self.xred))[:,:,None])

     def forecast(self, horiz=1, u=None):
-        """calculates forcast for horiz number of periods at end of sample
+        '''calculates forcast for horiz number of periods at end of sample

         Parameters
         ----------
@@ -307,12 +429,14 @@ class _Var:
         -------
         yforecast : array (nobs+horiz, nvars)
             this includes the sample and the forecasts
-        """
-        pass
+        '''
+        if u is None:
+            u = np.zeros((horiz, self.nvars))
+        return vargenerate(self.arhat, u, initvalues=self.y)


 class VarmaPoly:
-    """class to keep track of Varma polynomial format
+    '''class to keep track of Varma polynomial format


     Examples
@@ -334,16 +458,15 @@ class VarmaPoly:
                      [ 0.2, 0.3]]])


-    """
-
+    '''
     def __init__(self, ar, ma=None):
         self.ar = ar
         self.ma = ma
         nlags, nvarall, nvars = ar.shape
         self.nlags, self.nvarall, self.nvars = nlags, nvarall, nvars
-        self.isstructured = not (ar[0, :nvars] == np.eye(nvars)).all()
+        self.isstructured = not (ar[0,:nvars] == np.eye(nvars)).all()
         if self.ma is None:
-            self.ma = np.eye(nvars)[None, ...]
+            self.ma = np.eye(nvars)[None,...]
             self.isindependent = True
         else:
             self.isindependent = not (ma[0] == np.eye(nvars)).all()
@@ -351,39 +474,74 @@ class VarmaPoly:
         self.hasexog = nvarall > nvars
         self.arm1 = -ar[1:]

+    #@property
     def vstack(self, a=None, name='ar'):
-        """stack lagpolynomial vertically in 2d array
-
-        """
-        pass
+        '''stack lagpolynomial vertically in 2d array
+
+        '''
+        if a is not None:
+            a = a
+        elif name == 'ar':
+            a = self.ar
+        elif name == 'ma':
+            a = self.ma
+        else:
+            raise ValueError('no array or name given')
+        return a.reshape(-1, self.nvarall)

+    #@property
     def hstack(self, a=None, name='ar'):
-        """stack lagpolynomial horizontally in 2d array
-
-        """
-        pass
+        '''stack lagpolynomial horizontally in 2d array
+
+        '''
+        if a is None:
+            if name == 'ar':
+                a = self.ar
+            elif name == 'ma':
+                a = self.ma
+            else:
+                raise ValueError('no array or name given')
+        return a.swapaxes(1,2).reshape(-1, self.nvarall).T

+    #@property
     def stacksquare(self, a=None, name='ar', orientation='vertical'):
-        """stack lagpolynomial vertically in 2d square array with eye
-
-        """
-        pass
-
+        '''stack lagpolynomial vertically in 2d square array with eye
+
+        '''
+        if a is None:
+            if name == 'ar':
+                a = self.ar
+            elif name == 'ma':
+                a = self.ma
+            else:
+                raise ValueError('no array or name given')
+        astacked = a.reshape(-1, self.nvarall)
+        lenpk, nvars = astacked.shape #[0]
+        amat = np.eye(lenpk, k=nvars)
+        amat[:,:nvars] = astacked
+        return amat
+
+    #@property
     def vstackarma_minus1(self):
-        """stack ar and lagpolynomial vertically in 2d array
+        '''stack ar and lagpolynomial vertically in 2d array

-        """
-        pass
+        '''
+        a = np.concatenate((self.ar[1:], self.ma[1:]),0)
+        return a.reshape(-1, self.nvarall)

+    #@property
     def hstackarma_minus1(self):
-        """stack ar and lagpolynomial vertically in 2d array
+        '''stack ar and lagpolynomial vertically in 2d array

         this is the Kalman Filter representation, I think
-        """
-        pass
+        '''
+        a = np.concatenate((self.ar[1:], self.ma[1:]),0)
+        return a.swapaxes(1,2).reshape(-1, self.nvarall)

     def getisstationary(self, a=None):
-        """check whether the auto-regressive lag-polynomial is stationary
+        '''check whether the auto-regressive lag-polynomial is stationary

         Returns
         -------
@@ -398,11 +556,21 @@ class VarmaPoly:
         ----------
         formula taken from NAG manual

-        """
-        pass
+        '''
+        if a is None:
+            if self.isstructured:
+                a = -self.reduceform(self.ar)[1:]
+            else:
+                a = -self.ar[1:]
+        amat = self.stacksquare(a)
+        ev = np.sort(np.linalg.eigvals(amat))[::-1]
+        self.areigenvalues = ev
+        return (np.abs(ev) < 1).all()

     def getisinvertible(self, a=None):
-        """check whether the auto-regressive lag-polynomial is stationary
+        '''check whether the auto-regressive lag-polynomial is stationary

         Returns
         -------
@@ -417,41 +585,123 @@ class VarmaPoly:
         ----------
         formula taken from NAG manual

-        """
-        pass
+        '''
+        if a is None:
+            if self.isindependent:
+                a = self.reduceform(self.ma)[1:]
+            else:
+                a = self.ma[1:]
+        if a.shape[0] == 0:
+            # no ma lags
+            self.maeigenvalues = np.array([], dtype=complex)
+            return True
+
+        amat = self.stacksquare(a)
+        ev = np.sort(np.linalg.eigvals(amat))[::-1]
+        self.maeigenvalues = ev
+        return (np.abs(ev) < 1).all()

     def reduceform(self, apoly):
-        """
+        '''

         this assumes no exog, todo

-        """
-        pass
-
-
-if __name__ == '__main__':
-    a21 = np.array([[[1.0, 0.0], [0.0, 1.0]], [[-0.8, 0.0], [0.0, -0.6]]])
-    a22 = np.array([[[1.0, 0.0], [0.0, 1.0]], [[-0.8, 0.0], [0.1, -0.8]]])
-    a23 = np.array([[[1.0, 0.0], [0.0, 1.0]], [[-0.8, 0.2], [0.1, -0.6]]])
-    a24 = np.array([[[1.0, 0.0], [0.0, 1.0]], [[-0.6, 0.0], [0.2, -0.6]], [
-        [-0.1, 0.0], [0.1, -0.1]]])
-    a31 = np.r_[np.eye(3)[None, :, :], 0.8 * np.eye(3)[None, :, :]]
-    a32 = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [[
-        0.8, 0.0, 0.0], [0.1, 0.6, 0.0], [0.0, 0.0, 0.9]]])
-    ut = np.random.randn(1000, 2)
-    ar2s = vargenerate(a22, ut)
-    res = np.linalg.lstsq(lagmat(ar2s, 1), ar2s, rcond=-1)
-    bhat = res[0].reshape(1, 2, 2)
+        '''
+        if apoly.ndim != 3:
+            raise ValueError('apoly needs to be 3d')
+        nlags, nvarsex, nvars = apoly.shape
+
+        a = np.empty_like(apoly)
+        try:
+            # invert the contemporaneous (lag-0) coefficient block of apoly
+            a0inv = np.linalg.inv(apoly[0, :nvars, :])
+        except np.linalg.LinAlgError:
+            raise ValueError('matrix not invertible',
+                             'ask for implementation of pinv')
+
+        for lag in range(nlags):
+            a[lag] = np.dot(a0inv, apoly[lag])
+
+        return a
+
+
+if __name__ == "__main__":
+    # some example lag polynomials
+    a21 = np.array([[[ 1. ,  0. ],
+                     [ 0. ,  1. ]],
+
+                    [[-0.8,  0. ],
+                     [ 0.,  -0.6]]])
+
+    a22 = np.array([[[ 1. ,  0. ],
+                     [ 0. ,  1. ]],
+
+                    [[-0.8,  0. ],
+                     [ 0.1, -0.8]]])
+
+    a23 = np.array([[[ 1. ,  0. ],
+                     [ 0. ,  1. ]],
+
+                    [[-0.8,  0.2],
+                     [ 0.1, -0.6]]])
+
+    a24 = np.array([[[ 1. ,  0. ],
+                     [ 0. ,  1. ]],
+
+                    [[-0.6,  0. ],
+                     [ 0.2, -0.6]],
+
+                    [[-0.1,  0. ],
+                     [ 0.1, -0.1]]])
+
+    a31 = np.r_[np.eye(3)[None,:,:], 0.8*np.eye(3)[None,:,:]]
+    a32 = np.array([[[ 1. ,  0. ,  0. ],
+                     [ 0. ,  1. ,  0. ],
+                     [ 0. ,  0. ,  1. ]],
+
+                    [[ 0.8,  0. ,  0. ],
+                     [ 0.1,  0.6,  0. ],
+                     [ 0. ,  0. ,  0.9]]])
+
+    ########
+    ut = np.random.randn(1000,2)
+    ar2s = vargenerate(a22,ut)
+    #res = np.linalg.lstsq(lagmat(ar2s,1)[:,1:], ar2s)
+    res = np.linalg.lstsq(lagmat(ar2s,1), ar2s, rcond=-1)
+    bhat = res[0].reshape(1,2,2)
     arhat = ar2full(bhat)
+    #print(maxabs(arhat - a22))
+
     v = _Var(ar2s)
     v.fit(1)
     v.forecast()
     v.forecast(25)[-30:]
-    ar23 = np.array([[[1.0, 0.0], [0.0, 1.0]], [[-0.6, 0.0], [0.2, -0.6]],
-        [[-0.1, 0.0], [0.1, -0.1]]])
-    ma22 = np.array([[[1.0, 0.0], [0.0, 1.0]], [[0.4, 0.0], [0.2, 0.3]]])
-    ar23ns = np.array([[[1.0, 0.0], [0.0, 1.0]], [[-1.9, 0.0], [0.4, -0.6]],
-        [[0.3, 0.0], [0.1, -0.1]]])
+
+    ar23 = np.array([[[ 1. ,  0. ],
+                     [ 0. ,  1. ]],
+
+                    [[-0.6,  0. ],
+                     [ 0.2, -0.6]],
+
+                    [[-0.1,  0. ],
+                     [ 0.1, -0.1]]])
+
+    ma22 = np.array([[[ 1. ,  0. ],
+                     [ 0. ,  1. ]],
+
+                    [[ 0.4,  0. ],
+                     [ 0.2, 0.3]]])
+
+    ar23ns = np.array([[[ 1. ,  0. ],
+                     [ 0. ,  1. ]],
+
+                    [[-1.9,  0. ],
+                     [ 0.4, -0.6]],
+
+                    [[ 0.3,  0. ],
+                     [ 0.1, -0.1]]])
+
     vp = VarmaPoly(ar23, ma22)
     print(vars(vp))
     print(vp.vstack())
@@ -459,6 +709,7 @@ if __name__ == '__main__':
     print(vp.hstackarma_minus1())
     print(vp.getisstationary())
     print(vp.getisinvertible())
+
     vp2 = VarmaPoly(ar23ns)
     print(vp2.getisstationary())
-    print(vp2.getisinvertible())
+    print(vp2.getisinvertible()) # no ma lags
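
For reference, here is a minimal standalone sketch of the stationarity check that VarmaPoly.getisstationary performs above: stack the negated AR lag coefficients into the companion-form matrix (as stacksquare does) and verify that every eigenvalue lies inside the unit circle. The lag polynomial mirrors a24 from the __main__ block; only numpy is assumed, and the snippet is illustrative rather than part of the patched module.

    import numpy as np

    # VAR(2) lag polynomial in the [I, -A1, -A2] layout used above (cf. a24)
    a24 = np.array([[[ 1.0,  0.0], [ 0.0,  1.0]],
                    [[-0.6,  0.0], [ 0.2, -0.6]],
                    [[-0.1,  0.0], [ 0.1, -0.1]]])

    nlags, nvars = a24.shape[0] - 1, a24.shape[2]
    # reduced-form coefficients A1, A2 stacked vertically, as in stacksquare
    stacked = (-a24[1:]).reshape(-1, nvars)
    comp = np.eye(nlags * nvars, k=nvars)   # identity shifted right by nvars
    comp[:, :nvars] = stacked
    eigvals = np.linalg.eigvals(comp)
    print((np.abs(eigvals) < 1).all())      # True -> the VAR is stationary
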
diff --git a/statsmodels/tsa/vector_ar/api.py b/statsmodels/tsa/vector_ar/api.py
index cede75ec8..a4f269994 100644
--- a/statsmodels/tsa/vector_ar/api.py
+++ b/statsmodels/tsa/vector_ar/api.py
@@ -1,3 +1,4 @@
-__all__ = ['VAR', 'SVAR']
+__all__ = ["VAR", "SVAR"]
+
 from .svar_model import SVAR
 from .var_model import VAR
diff --git a/statsmodels/tsa/vector_ar/hypothesis_test_results.py b/statsmodels/tsa/vector_ar/hypothesis_test_results.py
index e64ae4369..b252edde9 100644
--- a/statsmodels/tsa/vector_ar/hypothesis_test_results.py
+++ b/statsmodels/tsa/vector_ar/hypothesis_test_results.py
@@ -1,4 +1,5 @@
 import numpy as np
+
 from statsmodels.iolib.table import SimpleTable


@@ -23,9 +24,8 @@ class HypothesisTestResults:
         A string describing the null hypothesis. It will be used in the
         summary.
     """
-
-    def __init__(self, test_statistic, crit_value, pvalue, df, signif,
-        method, title, h0):
+    def __init__(self, test_statistic, crit_value, pvalue, df,
+                 signif, method, title, h0):
         self.test_statistic = test_statistic
         self.crit_value = crit_value
         self.pvalue = pvalue
@@ -33,32 +33,46 @@ class HypothesisTestResults:
         self.signif = signif
         self.method = method.capitalize()
         if test_statistic < crit_value:
-            self.conclusion = 'fail to reject'
+            self.conclusion = "fail to reject"
         else:
-            self.conclusion = 'reject'
+            self.conclusion = "reject"
         self.title = title
         self.h0 = h0
-        self.conclusion_str = 'Conclusion: %s H_0' % self.conclusion
-        self.signif_str = ' at {:.0%} significance level'.format(self.signif)
+        self.conclusion_str = "Conclusion: %s H_0" % self.conclusion
+        self.signif_str = " at {:.0%} significance level".format(self.signif)

     def summary(self):
         """Return summary"""
-        pass
+        title = self.title + ". " + self.h0 + ". " \
+                                  + self.conclusion_str + self.signif_str + "."
+        data_fmt = {"data_fmts": ["%#0.4g", "%#0.4g", "%#0.3F", "%s"]}
+        html_data_fmt = dict(data_fmt)
+        html_data_fmt["data_fmts"] = ["<td>" + i + "</td>"
+                                      for i in html_data_fmt["data_fmts"]]
+        return SimpleTable(data=[[self.test_statistic, self.crit_value,
+                                  self.pvalue, str(self.df)]],
+                           headers=['Test statistic', 'Critical value',
+                                    'p-value', 'df'],
+                           title=title,
+                           txt_fmt=data_fmt,
+                           html_fmt=html_data_fmt,
+                           ltx_fmt=data_fmt)

     def __str__(self):
-        return ('<' + self.__module__ + '.' + self.__class__.__name__ +
-            ' object. ' + self.h0 + ': ' + self.conclusion + self.
-            signif_str + '. Test statistic: {:.3f}'.format(self.
-            test_statistic) + ', critical value: {:.3f}>'.format(self.
-            crit_value) + ', p-value: {:.3f}>'.format(self.pvalue))
+        return "<" + self.__module__ + "." + self.__class__.__name__ \
+                   + " object. " + self.h0 + ": " + self.conclusion \
+                   + self.signif_str \
+                   + ". Test statistic: {:.3f}".format(self.test_statistic) \
+                   + ", critical value: {:.3f}>".format(self.crit_value) \
+                   + ", p-value: {:.3f}>".format(self.pvalue)

     def __eq__(self, other):
         if not isinstance(other, self.__class__):
             return False
-        return np.allclose(self.test_statistic, other.test_statistic
-            ) and np.allclose(self.crit_value, other.crit_value
-            ) and np.allclose(self.pvalue, other.pvalue) and np.allclose(self
-            .signif, other.signif)
+        return np.allclose(self.test_statistic, other.test_statistic) \
+            and np.allclose(self.crit_value, other.crit_value) \
+            and np.allclose(self.pvalue, other.pvalue) \
+            and np.allclose(self.signif, other.signif)


 class CausalityTestResults(HypothesisTestResults):
@@ -85,43 +99,46 @@ class CausalityTestResults(HypothesisTestResults):
         The kind of test. ``"f"`` indicates an F-test, ``"wald"`` indicates a
         Wald-test.
     """
-
-    def __init__(self, causing, caused, test_statistic, crit_value, pvalue,
-        df, signif, test='granger', method=None):
+    def __init__(self, causing, caused, test_statistic, crit_value, pvalue, df,
+                 signif, test="granger", method=None):
         self.causing = causing
         self.caused = caused
         self.test = test
-        if method is None or method.lower() not in ['f', 'wald']:
-            raise ValueError(
-                'The method ("f" for F-test, "wald" for Wald-test) must not be None.'
-                )
+        if method is None or method.lower() not in ["f", "wald"]:
+            raise ValueError('The method ("f" for F-test, "wald" for '
+                             "Wald-test) must not be None.")
         method = method.capitalize()
-        title = 'Granger' if self.test == 'granger' else 'Instantaneous'
-        title += ' causality %s-test' % method
-        h0 = 'H_0: '
+        # attributes used in summary and string representation:
+        title = "Granger" if self.test == "granger" else "Instantaneous"
+        title += " causality %s-test" % method
+        h0 = "H_0: "
         if len(self.causing) == 1:
-            h0 += '{} does not '.format(self.causing[0])
+            h0 += "{} does not ".format(self.causing[0])
         else:
-            h0 += '{} do not '.format(self.causing)
-        h0 += 'Granger-' if self.test == 'granger' else 'instantaneously '
-        h0 += 'cause '
+            h0 += "{} do not ".format(self.causing)
+        h0 += "Granger-" if self.test == "granger" else "instantaneously "
+        h0 += "cause "
         if len(self.caused) == 1:
             h0 += self.caused[0]
         else:
-            h0 += '[' + ', '.join(caused) + ']'
-        super().__init__(test_statistic, crit_value, pvalue, df, signif,
-            method, title, h0)
+            h0 += "[" + ", ".join(caused) + "]"
+
+        super().__init__(test_statistic, crit_value,
+                         pvalue, df, signif, method,
+                         title, h0)

     def __eq__(self, other):
         basic_test = super().__eq__(other)
         if not basic_test:
             return False
         test = self.test == other.test
-        variables = (self.causing == other.causing and self.caused == other
-            .caused)
-        if not variables and self.test == 'inst':
-            variables = (self.causing == other.caused and self.caused ==
-                other.causing)
+        variables = (self.causing == other.causing and
+                     self.caused == other.caused)
+        # instantaneous causality is a symmetric relation ==> causing and
+        # caused may be swapped
+        if not variables and self.test == "inst":
+            variables = (self.causing == other.caused and
+                         self.caused == other.causing)
         return test and variables


@@ -142,13 +159,13 @@ class NormalityTestResults(HypothesisTestResults):
     signif : float
         Significance level.
     """
-
     def __init__(self, test_statistic, crit_value, pvalue, df, signif):
-        method = 'Jarque-Bera'
-        title = 'normality (skew and kurtosis) test'
+        method = "Jarque-Bera"
+        title = "normality (skew and kurtosis) test"
         h0 = 'H_0: data generated by normally-distributed process'
-        super().__init__(test_statistic, crit_value, pvalue, df, signif,
-            method, title, h0)
+        super().__init__(test_statistic, crit_value,
+                         pvalue, df, signif,
+                         method, title, h0)


 class WhitenessTestResults(HypothesisTestResults):
@@ -170,15 +187,22 @@ class WhitenessTestResults(HypothesisTestResults):
     nlags : int
         Number of lags tested.
     """
-
-    def __init__(self, test_statistic, crit_value, pvalue, df, signif,
-        nlags, adjusted):
+    def __init__(self, test_statistic, crit_value, pvalue, df, signif, nlags,
+                 adjusted):
         self.lags = nlags
         self.adjusted = adjusted
-        method = 'Portmanteau'
-        title = '{}-test for residual autocorrelation'.format(method)
+        method = "Portmanteau"
+        title = "{}-test for residual autocorrelation".format(method)
         if adjusted:
-            title = 'Adjusted ' + title
-        h0 = 'H_0: residual autocorrelation up to lag {} is zero'.format(nlags)
-        super().__init__(test_statistic, crit_value, pvalue, df, signif,
-            method, title, h0)
+            title = "Adjusted " + title
+        h0 = "H_0: residual autocorrelation up to lag {} is zero".format(nlags)
+        super().__init__(
+            test_statistic,
+            crit_value,
+            pvalue,
+            df,
+            signif,
+            method,
+            title,
+            h0
+        )
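
As a quick illustration of how these result objects are used, here is a minimal sketch that constructs a WhitenessTestResults directly with made-up numbers (in practice the object is produced by VARResults.test_whiteness); summary() returns the SimpleTable built above and str() gives the one-line description.

    from statsmodels.tsa.vector_ar.hypothesis_test_results import (
        WhitenessTestResults)

    # illustrative values only -- not taken from any real model fit
    res = WhitenessTestResults(test_statistic=30.1, crit_value=36.4,
                               pvalue=0.18, df=25, signif=0.05,
                               nlags=5, adjusted=False)
    print(res.summary())   # table: test statistic, critical value, p-value, df
    print(res)             # "... fail to reject H_0 at 5% significance level ..."
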
diff --git a/statsmodels/tsa/vector_ar/irf.py b/statsmodels/tsa/vector_ar/irf.py
index 235dd8f0a..989ae3dad 100644
--- a/statsmodels/tsa/vector_ar/irf.py
+++ b/statsmodels/tsa/vector_ar/irf.py
@@ -1,13 +1,17 @@
+# -*- coding: utf-8 -*-
 """
 Impulse response-related code
 """
+
 import numpy as np
 import numpy.linalg as la
 import scipy.linalg as L
+
 from statsmodels.tools.decorators import cache_readonly
 import statsmodels.tsa.tsatools as tsa
 import statsmodels.tsa.vector_ar.plotting as plotting
 import statsmodels.tsa.vector_ar.util as util
+
 mat = np.array


@@ -18,41 +22,74 @@ class BaseIRAnalysis:
     """

     def __init__(self, model, P=None, periods=10, order=None, svar=False,
-        vecm=False):
+                 vecm=False):
         self.model = model
         self.periods = periods
         self.neqs, self.lags, self.T = model.neqs, model.k_ar, model.nobs
+
         self.order = order
+
         if P is None:
             sigma = model.sigma_u
+
+            # TODO, may be difficult at the moment
+            # if order is not None:
+            #     indexer = [model.get_eq_index(name) for name in order]
+            #     sigma = sigma[:, indexer][indexer, :]
+
+            #     if sigma.shape != model.sigma_u.shape:
+            #         raise ValueError('variable order is wrong length')
+
             P = la.cholesky(sigma)
+
         self.P = P
+
         self.svar = svar
+
         self.irfs = model.ma_rep(periods)
         if svar:
             self.svar_irfs = model.svar_ma_rep(periods, P=P)
         else:
             self.orth_irfs = model.orth_ma_rep(periods, P=P)
+
         self.cum_effects = self.irfs.cumsum(axis=0)
         if svar:
             self.svar_cum_effects = self.svar_irfs.cumsum(axis=0)
         else:
             self.orth_cum_effects = self.orth_irfs.cumsum(axis=0)
+
+        # long-run effects may be infinite for VECMs.
         if not vecm:
             self.lr_effects = model.long_run_effects()
             if svar:
                 self.svar_lr_effects = np.dot(model.long_run_effects(), P)
             else:
                 self.orth_lr_effects = np.dot(model.long_run_effects(), P)
+
+        # auxiliary stuff
         if vecm:
             self._A = util.comp_matrix(model.var_rep)
         else:
             self._A = util.comp_matrix(model.coefs)

-    def plot(self, orth=False, *, impulse=None, response=None, signif=0.05,
-        plot_params=None, figsize=(10, 10), subplot_params=None,
-        plot_stderr=True, stderr_type='asym', repl=1000, seed=None,
-        component=None):
+    def _choose_irfs(self, orth=False, svar=False):
+        if orth:
+            return self.orth_irfs
+        elif svar:
+            return self.svar_irfs
+        else:
+            return self.irfs
+
+    def cov(self, *args, **kwargs):
+        raise NotImplementedError
+
+    def cum_effect_cov(self, *args, **kwargs):
+        raise NotImplementedError
+
+    def plot(self, orth=False, *, impulse=None, response=None,
+             signif=0.05, plot_params=None, figsize=(10, 10),
+             subplot_params=None, plot_stderr=True, stderr_type='asym',
+             repl=1000, seed=None, component=None):
         """
         Plot impulse responses

@@ -84,11 +121,61 @@ class BaseIRAnalysis:
             np.random.seed for Monte Carlo replications
         component: array or vector of principal component indices
         """
-        pass
+        periods = self.periods
+        model = self.model
+        svar = self.svar
+
+        if orth and svar:
+            raise ValueError("For SVAR system, set orth=False")
+
+        irfs = self._choose_irfs(orth, svar)
+        if orth:
+            title = 'Impulse responses (orthogonalized)'
+        elif svar:
+            title = 'Impulse responses (structural)'
+        else:
+            title = 'Impulse responses'
+
+        if plot_stderr is False:
+            stderr = None
+
+        elif stderr_type not in ['asym', 'mc', 'sz1', 'sz2', 'sz3']:
+            raise ValueError("Error type must be either 'asym', 'mc', "
+                             "'sz1', 'sz2', or 'sz3'")
+        else:
+            if stderr_type == 'asym':
+                stderr = self.cov(orth=orth)
+            if stderr_type == 'mc':
+                stderr = self.errband_mc(orth=orth, svar=svar,
+                                         repl=repl, signif=signif,
+                                         seed=seed)
+            if stderr_type == 'sz1':
+                stderr = self.err_band_sz1(orth=orth, svar=svar,
+                                           repl=repl, signif=signif,
+                                           seed=seed,
+                                           component=component)
+            if stderr_type == 'sz2':
+                stderr = self.err_band_sz2(orth=orth, svar=svar,
+                                           repl=repl, signif=signif,
+                                           seed=seed,
+                                           component=component)
+            if stderr_type == 'sz3':
+                stderr = self.err_band_sz3(orth=orth, svar=svar,
+                                           repl=repl, signif=signif,
+                                           seed=seed,
+                                           component=component)
+
+        fig = plotting.irf_grid_plot(irfs, stderr, impulse, response,
+                                     self.model.names, title, signif=signif,
+                                     subplot_params=subplot_params,
+                                     plot_params=plot_params,
+                                     figsize=figsize,
+                                     stderr_type=stderr_type)
+        return fig

     def plot_cum_effects(self, orth=False, *, impulse=None, response=None,
-        signif=0.05, plot_params=None, figsize=(10, 10), subplot_params=
-        None, plot_stderr=True, stderr_type='asym', repl=1000, seed=None):
+                         signif=0.05, plot_params=None, figsize=(10, 10),
+                         subplot_params=None, plot_stderr=True,
+                         stderr_type='asym', repl=1000, seed=None):
         """
         Plot cumulative impulse response functions

@@ -119,7 +206,35 @@ class BaseIRAnalysis:
         seed : int
             np.random.seed for Monte Carlo replications
         """
-        pass
+
+        if orth:
+            title = 'Cumulative responses (orthogonalized)'
+            cum_effects = self.orth_cum_effects
+            lr_effects = self.orth_lr_effects
+        else:
+            title = 'Cumulative responses'
+            cum_effects = self.cum_effects
+            lr_effects = self.lr_effects
+
+        if stderr_type not in ['asym', 'mc']:
+            raise ValueError("`stderr_type` must be one of 'asym', 'mc'")
+        else:
+            if stderr_type == 'asym':
+                stderr = self.cum_effect_cov(orth=orth)
+            if stderr_type == 'mc':
+                stderr = self.cum_errband_mc(orth=orth, repl=repl,
+                                             signif=signif, seed=seed)
+        if not plot_stderr:
+            stderr = None
+
+        fig = plotting.irf_grid_plot(cum_effects, stderr, impulse, response,
+                                     self.model.names, title, signif=signif,
+                                     hlines=lr_effects,
+                                     subplot_params=subplot_params,
+                                     plot_params=plot_params,
+                                     figsize=figsize,
+                                     stderr_type=stderr_type)
+        return fig


 class IRAnalysis(BaseIRAnalysis):
@@ -135,16 +250,18 @@ class IRAnalysis(BaseIRAnalysis):
     -----
     Using Lütkepohl (2005) notation
     """
-
     def __init__(self, model, P=None, periods=10, order=None, svar=False,
-        vecm=False):
-        BaseIRAnalysis.__init__(self, model, P=P, periods=periods, order=
-            order, svar=svar, vecm=vecm)
+                 vecm=False):
+        BaseIRAnalysis.__init__(self, model, P=P, periods=periods,
+                                order=order, svar=svar, vecm=vecm)
+
         if vecm:
             self.cov_a = model.cov_var_repr
         else:
             self.cov_a = model._cov_alpha
         self.cov_sig = model._cov_sigma
+
+        # memoize dict for G matrix function
         self._g_memo = {}

     def cov(self, orth=False):
@@ -158,17 +275,35 @@ class IRAnalysis(BaseIRAnalysis):
         Returns
         -------
         """
-        pass
+        if orth:
+            return self._orth_cov()
+
+        covs = self._empty_covm(self.periods + 1)
+        covs[0] = np.zeros((self.neqs ** 2, self.neqs ** 2))
+        for i in range(1, self.periods + 1):
+            Gi = self.G[i - 1]
+            covs[i] = Gi @ self.cov_a @ Gi.T

-    def errband_mc(self, orth=False, svar=False, repl=1000, signif=0.05,
-        seed=None, burn=100):
+        return covs
+
+    def errband_mc(self, orth=False, svar=False, repl=1000,
+                   signif=0.05, seed=None, burn=100):
         """
         IRF Monte Carlo integrated error bands
         """
-        pass
+        model = self.model
+        periods = self.periods
+        if svar:
+            return model.sirf_errband_mc(orth=orth, repl=repl, steps=periods,
+                                         signif=signif, seed=seed,
+                                         burn=burn, cum=False)
+        else:
+            return model.irf_errband_mc(orth=orth, repl=repl, steps=periods,
+                                        signif=signif, seed=seed,
+                                        burn=burn, cum=False)

-    def err_band_sz1(self, orth=False, svar=False, repl=1000, signif=0.05,
-        seed=None, burn=100, component=None):
+    def err_band_sz1(self, orth=False, svar=False, repl=1000,
+                     signif=0.05, seed=None, burn=100, component=None):
         """
         IRF Sims-Zha error band method 1. Assumes symmetric error bands around
         mean.
@@ -195,10 +330,37 @@ class IRAnalysis(BaseIRAnalysis):
         Sims, Christopher A., and Tao Zha. 1999. "Error Bands for Impulse
         Response". Econometrica 67: 1113-1155.
         """
-        pass
+
+        model = self.model
+        periods = self.periods
+        irfs = self._choose_irfs(orth, svar)
+        neqs = self.neqs
+        irf_resim = model.irf_resim(orth=orth, repl=repl, steps=periods,
+                                    seed=seed, burn=burn)
+        q = util.norm_signif_level(signif)
+
+        W, eigva, k =self._eigval_decomp_SZ(irf_resim)
+
+        if component is not None:
+            if np.shape(component) != (neqs,neqs):
+                raise ValueError("Component array must be " + str(neqs) + " x " + str(neqs))
+            if np.argmax(component) >= neqs*periods:
+                raise ValueError("Atleast one of the components does not exist")
+            else:
+                k = component
+
+        # here take the kth column of W, which we determine by finding the
+        # largest eigenvalue of the covariance matrix
+        lower = np.copy(irfs)
+        upper = np.copy(irfs)
+        for i in range(neqs):
+            for j in range(neqs):
+                lower[1:,i,j] = irfs[1:,i,j] + W[i,j,:,k[i,j]]*q*np.sqrt(eigva[i,j,k[i,j]])
+                upper[1:,i,j] = irfs[1:,i,j] - W[i,j,:,k[i,j]]*q*np.sqrt(eigva[i,j,k[i,j]])
+
+        return lower, upper

     def err_band_sz2(self, orth=False, svar=False, repl=1000, signif=0.05,
-        seed=None, burn=100, component=None):
+                     seed=None, burn=100, component=None):
         """
         IRF Sims-Zha error band method 2.

@@ -226,10 +388,43 @@ class IRAnalysis(BaseIRAnalysis):
         Sims, Christopher A., and Tao Zha. 1999. "Error Bands for Impulse
         Response". Econometrica 67: 1113-1155.
         """
-        pass
+        model = self.model
+        periods = self.periods
+        irfs = self._choose_irfs(orth, svar)
+        neqs = self.neqs
+        irf_resim = model.irf_resim(orth=orth, repl=repl, steps=periods,
+                                    seed=seed, burn=burn)
+
+        W, eigva, k = self._eigval_decomp_SZ(irf_resim)
+
+        if component is not None:
+            if np.shape(component) != (neqs,neqs):
+                raise ValueError("Component array must be " + str(neqs) + " x " + str(neqs))
+            if np.argmax(component) >= neqs*periods:
+                raise ValueError("Atleast one of the components does not exist")
+            else:
+                k = component
+
+        gamma = np.zeros((repl, periods+1, neqs, neqs))
+        for p in range(repl):
+            for i in range(neqs):
+                for j in range(neqs):
+                    gamma[p,1:,i,j] = W[i,j,k[i,j],:] * irf_resim[p,1:,i,j]
+
+        gamma_sort = np.sort(gamma, axis=0) #sort to get quantiles
+        indx = round(signif/2*repl)-1,round((1-signif/2)*repl)-1
+
+        lower = np.copy(irfs)
+        upper = np.copy(irfs)
+        for i in range(neqs):
+            for j in range(neqs):
+                lower[:,i,j] = irfs[:,i,j] + gamma_sort[indx[0],:,i,j]
+                upper[:,i,j] = irfs[:,i,j] + gamma_sort[indx[1],:,i,j]
+
+        return lower, upper

     def err_band_sz3(self, orth=False, svar=False, repl=1000, signif=0.05,
-        seed=None, burn=100, component=None):
+                     seed=None, burn=100, component=None):
         """
         IRF Sims-Zha error band method 3. Does not assume symmetric error bands around mean.

@@ -255,7 +450,59 @@ class IRAnalysis(BaseIRAnalysis):
         Sims, Christopher A., and Tao Zha. 1999. "Error Bands for Impulse
         Response". Econometrica 67: 1113-1155.
         """
-        pass
+
+        model = self.model
+        periods = self.periods
+        irfs = self._choose_irfs(orth, svar)
+        neqs = self.neqs
+        irf_resim = model.irf_resim(orth=orth, repl=repl, steps=periods,
+                                    seed=seed, burn=burn)
+        stack = np.zeros((neqs, repl, periods*neqs))
+
+        #stack left to right, up and down
+
+        for p in range(repl):
+            for i in range(neqs):
+                stack[i, p,:] = np.ravel(irf_resim[p,1:,:,i].T)
+
+        stack_cov=np.zeros((neqs, periods*neqs, periods*neqs))
+        W = np.zeros((neqs, periods*neqs, periods*neqs))
+        eigva = np.zeros((neqs, periods*neqs))
+        k = np.zeros(neqs, dtype=int)
+
+        if component is not None:
+            if np.size(component) != (neqs):
+                raise ValueError("Component array must be of length " + str(neqs))
+            if np.argmax(component) >= neqs*periods:
+                raise ValueError("Atleast one of the components does not exist")
+            else:
+                k = component
+
+        #compute for eigen decomp for each stack
+        for i in range(neqs):
+            stack_cov[i] = np.cov(stack[i],rowvar=0)
+            W[i], eigva[i], k[i] = util.eigval_decomp(stack_cov[i])
+
+        gamma = np.zeros((repl, periods+1, neqs, neqs))
+        for p in range(repl):
+            c = 0
+            for j in range(neqs):
+                for i in range(neqs):
+                    gamma[p,1:,i,j] = W[j,k[j],i*periods:(i+1)*periods] * irf_resim[p,1:,i,j]
+                    if i == neqs-1:
+                        gamma[p,1:,i,j] = W[j,k[j],i*periods:] * irf_resim[p,1:,i,j]
+
+        gamma_sort = np.sort(gamma, axis=0) #sort to get quantiles
+        indx = round(signif/2*repl)-1,round((1-signif/2)*repl)-1
+
+        lower = np.copy(irfs)
+        upper = np.copy(irfs)
+        for i in range(neqs):
+            for j in range(neqs):
+                lower[:,i,j] = irfs[:,i,j] + gamma_sort[indx[0],:,i,j]
+                upper[:,i,j] = irfs[:,i,j] + gamma_sort[indx[1],:,i,j]
+
+        return lower, upper

     def _eigval_decomp_SZ(self, irf_resim):
         """
@@ -265,7 +512,76 @@ class IRAnalysis(BaseIRAnalysis):
         eigva: list of eigenvalues
         k: matrix indicating column # of largest eigenvalue for each c_i,j
         """
-        pass
+        neqs = self.neqs
+        periods = self.periods
+
+        cov_hold = np.zeros((neqs, neqs, periods, periods))
+        for i in range(neqs):
+            for j in range(neqs):
+                cov_hold[i,j,:,:] = np.cov(irf_resim[:,1:,i,j],rowvar=0)
+
+        W = np.zeros((neqs, neqs, periods, periods))
+        eigva = np.zeros((neqs, neqs, periods, 1))
+        k = np.zeros((neqs, neqs), dtype=int)
+
+        for i in range(neqs):
+            for j in range(neqs):
+                W[i,j,:,:], eigva[i,j,:,0], k[i,j] = util.eigval_decomp(cov_hold[i,j,:,:])
+        return W, eigva, k
+
+    @cache_readonly
+    def G(self):
+        # Gi matrices as defined on p. 111
+
+        K = self.neqs
+
+        # nlags = self.model.p
+        # J = np.hstack((np.eye(K),) + (np.zeros((K, K)),) * (nlags - 1))
+
+        def _make_g(i):
+            # p. 111 Lutkepohl
+            G = 0.
+            for m in range(i):
+                # be a bit cute to go faster
+                idx = i - 1 - m
+                if idx in self._g_memo:
+                    apow = self._g_memo[idx]
+                else:
+                    apow = la.matrix_power(self._A.T, idx)
+                    # apow = np.dot(J, apow)
+                    apow = apow[:K]
+                    self._g_memo[idx] = apow
+
+                # take first K rows
+                piece = np.kron(apow, self.irfs[m])
+                G = G + piece
+
+            return G
+
+        return [_make_g(i) for i in range(1, self.periods + 1)]
+
+    def _orth_cov(self):
+        # Lutkepohl 3.7.8
+
+        Ik = np.eye(self.neqs)
+        PIk = np.kron(self.P.T, Ik)
+        H = self.H
+
+        covs = self._empty_covm(self.periods + 1)
+        for i in range(self.periods + 1):
+            if i == 0:
+                apiece = 0
+            else:
+                Ci = np.dot(PIk, self.G[i-1])
+                apiece = Ci @ self.cov_a @ Ci.T
+
+            Cibar = np.dot(np.kron(Ik, self.irfs[i]), H)
+            bpiece = (Cibar @ self.cov_sig @ Cibar.T) / self.T
+
+            # Lutkepohl typo, cov_sig correct
+            covs[i] = apiece + bpiece
+
+        return covs

     def cum_effect_cov(self, orth=False):
         """
@@ -283,18 +599,94 @@ class IRAnalysis(BaseIRAnalysis):
         Returns
         -------
         """
-        pass
+        Ik = np.eye(self.neqs)
+        PIk = np.kron(self.P.T, Ik)
+
+        F = 0.
+        covs = self._empty_covm(self.periods + 1)
+        for i in range(self.periods + 1):
+            if i > 0:
+                F = F + self.G[i - 1]
+
+            if orth:
+                if i == 0:
+                    apiece = 0
+                else:
+                    Bn = np.dot(PIk, F)
+                    apiece = Bn @ self.cov_a @ Bn.T
+
+                Bnbar = np.dot(np.kron(Ik, self.cum_effects[i]), self.H)
+                bpiece = (Bnbar @ self.cov_sig @ Bnbar.T) / self.T
+
+                covs[i] = apiece + bpiece
+            else:
+                if i == 0:
+                    covs[i] = np.zeros((self.neqs**2, self.neqs**2))
+                    continue
+
+                covs[i] = F @ self.cov_a @ F.T

-    def cum_errband_mc(self, orth=False, repl=1000, signif=0.05, seed=None,
-        burn=100):
+        return covs
+
+    def cum_errband_mc(self, orth=False, repl=1000,
+                       signif=0.05, seed=None, burn=100):
         """
         IRF Monte Carlo integrated error bands of cumulative effect
         """
-        pass
+        model = self.model
+        periods = self.periods
+        return model.irf_errband_mc(orth=orth, repl=repl,
+                                    steps=periods, signif=signif,
+                                    seed=seed, burn=burn, cum=True)

     def lr_effect_cov(self, orth=False):
         """
         Returns
         -------
         """
-        pass
+        lre = self.lr_effects
+        Finfty = np.kron(np.tile(lre.T, self.lags), lre)
+        Ik = np.eye(self.neqs)
+
+        if orth:
+            Binf = np.dot(np.kron(self.P.T, np.eye(self.neqs)), Finfty)
+            Binfbar = np.dot(np.kron(Ik, lre), self.H)
+
+            return (Binf @ self.cov_a @ Binf.T +
+                    Binfbar @ self.cov_sig @ Binfbar.T)
+        else:
+            return Finfty @ self.cov_a @ Finfty.T
+
+    def stderr(self, orth=False):
+        return np.array([tsa.unvec(np.sqrt(np.diag(c)))
+                         for c in self.cov(orth=orth)])
+
+    def cum_effect_stderr(self, orth=False):
+        return np.array([tsa.unvec(np.sqrt(np.diag(c)))
+                         for c in self.cum_effect_cov(orth=orth)])
+
+    def lr_effect_stderr(self, orth=False):
+        cov = self.lr_effect_cov(orth=orth)
+        return tsa.unvec(np.sqrt(np.diag(cov)))
+
+    def _empty_covm(self, periods):
+        return np.zeros((periods, self.neqs ** 2, self.neqs ** 2),
+                        dtype=float)
+
+    @cache_readonly
+    def H(self):
+        k = self.neqs
+        Lk = tsa.elimination_matrix(k)
+        Kkk = tsa.commutation_matrix(k, k)
+        Ik = np.eye(k)
+
+        # B = Lk @ (np.eye(k**2) + commutation_matrix(k, k)) @ \
+        #     np.kron(self.P, np.eye(k)) @ Lk.T
+        # return Lk.T @ L.inv(B)
+
+        B = Lk @ (np.kron(Ik, self.P) @ Kkk + np.kron(self.P, Ik)) @ Lk.T
+
+        return np.dot(Lk.T, L.inv(B))
+
+    def fevd_table(self):
+        raise NotImplementedError
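
A minimal end-to-end sketch of how the IRAnalysis machinery above is typically reached through the public API, assuming matplotlib is installed; the data are simulated and purely illustrative.

    import numpy as np
    from statsmodels.tsa.api import VAR

    rng = np.random.default_rng(12345)
    data = rng.standard_normal((200, 2))          # toy bivariate sample

    res = VAR(data).fit(maxlags=2)
    irf = res.irf(periods=10)                     # an IRAnalysis instance

    fig = irf.plot(orth=True)                     # asymptotic bands via cov()
    fig = irf.plot(stderr_type='mc', repl=250, seed=0)    # Monte Carlo bands
    lower, upper = irf.err_band_sz1(repl=250, seed=0)     # Sims-Zha method 1
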
diff --git a/statsmodels/tsa/vector_ar/output.py b/statsmodels/tsa/vector_ar/output.py
index 24b341203..9ddc71281 100644
--- a/statsmodels/tsa/vector_ar/output.py
+++ b/statsmodels/tsa/vector_ar/output.py
@@ -1,25 +1,70 @@
 from statsmodels.compat.python import lzip
+
 from io import StringIO
+
 import numpy as np
+
 from statsmodels.iolib import SimpleTable
+
 mat = np.array
-_default_table_fmt = dict(empty_cell='', colsep='  ', row_pre='', row_post=
-    '', table_dec_above='=', table_dec_below='=', header_dec_below='-',
-    header_fmt='%s', stub_fmt='%s', title_align='c', header_align='r',
-    data_aligns='r', stubs_align='l', fmt='txt')
+
+_default_table_fmt = dict(
+    empty_cell = '',
+    colsep='  ',
+    row_pre = '',
+    row_post = '',
+    table_dec_above='=',
+    table_dec_below='=',
+    header_dec_below='-',
+    header_fmt = '%s',
+    stub_fmt = '%s',
+    title_align='c',
+    header_align = 'r',
+    data_aligns = 'r',
+    stubs_align = 'l',
+    fmt = 'txt'
+)


 class VARSummary:
-    default_fmt = dict(data_fmts=['%#15.6F', '%#15.6F', '%#15.3F',
-        '%#14.3F'], empty_cell='', colsep='  ', row_pre='', row_post='',
-        table_dec_above='=', table_dec_below='=', header_dec_below='-',
-        header_fmt='%s', stub_fmt='%s', title_align='c', header_align='r',
-        data_aligns='r', stubs_align='l', fmt='txt')
-    part1_fmt = dict(default_fmt, data_fmts=['%s'], colwidths=15, colsep=
-        ' ', table_dec_below='', header_dec_below=None)
-    part2_fmt = dict(default_fmt, data_fmts=['%#12.6g', '%#12.6g',
-        '%#10.4g', '%#5.4g'], colwidths=None, colsep='    ',
-        table_dec_above='-', table_dec_below='-', header_dec_below=None)
+    default_fmt = dict(
+        #data_fmts = ["%#12.6g","%#12.6g","%#10.4g","%#5.4g"],
+        #data_fmts = ["%#10.4g","%#10.4g","%#10.4g","%#6.4g"],
+        data_fmts = ["%#15.6F","%#15.6F","%#15.3F","%#14.3F"],
+        empty_cell = '',
+        #colwidths = 10,
+        colsep='  ',
+        row_pre = '',
+        row_post = '',
+        table_dec_above='=',
+        table_dec_below='=',
+        header_dec_below='-',
+        header_fmt = '%s',
+        stub_fmt = '%s',
+        title_align='c',
+        header_align = 'r',
+        data_aligns = 'r',
+        stubs_align = 'l',
+        fmt = 'txt'
+    )
+
+    part1_fmt = dict(
+        default_fmt,
+        data_fmts = ["%s"],
+        colwidths = 15,
+        colsep=' ',
+        table_dec_below='',
+        header_dec_below=None,
+    )
+    part2_fmt = dict(
+        default_fmt,
+        data_fmts = ["%#12.6g","%#12.6g","%#10.4g","%#5.4g"],
+        colwidths = None,
+        colsep='    ',
+        table_dec_above='-',
+        table_dec_below='-',
+        header_dec_below=None,
+    )

     def __init__(self, estimator):
         self.model = estimator
@@ -32,4 +77,172 @@ class VARSummary:
         """
         Summary of VAR model
         """
-        pass
+        buf = StringIO()
+
+        buf.write(self._header_table() + '\n')
+        buf.write(self._stats_table() + '\n')
+        buf.write(self._coef_table() + '\n')
+        buf.write(self._resid_info() + '\n')
+
+        return buf.getvalue()
+
+    def _header_table(self):
+        import time
+
+        model = self.model
+
+        t = time.localtime()
+
+        # TODO: change when we allow coef restrictions
+        # ncoefs = len(model.beta)
+
+        # Header information
+        part1title = "Summary of Regression Results"
+        part1data = [[model._model_type],
+                     ["OLS"], #TODO: change when fit methods change
+                     [time.strftime("%a, %d, %b, %Y", t)],
+                     [time.strftime("%H:%M:%S", t)]]
+        part1header = None
+        part1stubs = ('Model:',
+                      'Method:',
+                      'Date:',
+                      'Time:')
+        part1 = SimpleTable(part1data, part1header, part1stubs,
+                            title=part1title, txt_fmt=self.part1_fmt)
+
+        return str(part1)
+
+    def _stats_table(self):
+        # TODO: do we want individual statistics or should users just
+        # use results if wanted?
+        # Handle overall fit statistics
+
+        model = self.model
+
+        part2Lstubs = ('No. of Equations:',
+                       'Nobs:',
+                       'Log likelihood:',
+                       'AIC:')
+        part2Rstubs = ('BIC:',
+                       'HQIC:',
+                       'FPE:',
+                       'Det(Omega_mle):')
+        part2Ldata = [[model.neqs], [model.nobs], [model.llf], [model.aic]]
+        part2Rdata = [[model.bic], [model.hqic], [model.fpe], [model.detomega]]
+        part2Lheader = None
+        part2L = SimpleTable(part2Ldata, part2Lheader, part2Lstubs,
+                             txt_fmt = self.part2_fmt)
+        part2R = SimpleTable(part2Rdata, part2Lheader, part2Rstubs,
+                             txt_fmt = self.part2_fmt)
+        part2L.extend_right(part2R)
+
+        return str(part2L)
+
+    def _coef_table(self):
+        model = self.model
+        k = model.neqs
+
+        Xnames = self.model.exog_names
+
+        data = lzip(model.params.T.ravel(),
+                    model.stderr.T.ravel(),
+                    model.tvalues.T.ravel(),
+                    model.pvalues.T.ravel())
+
+        header = ('coefficient','std. error','t-stat','prob')
+
+        buf = StringIO()
+        dim = k * model.k_ar + model.k_trend + model.k_exog_user
+        for i in range(k):
+            section = "Results for equation %s" % model.names[i]
+            buf.write(section + '\n')
+
+            table = SimpleTable(data[dim * i : dim * (i + 1)], header,
+                                Xnames, title=None, txt_fmt = self.default_fmt)
+            buf.write(str(table) + '\n')
+
+            if i < k - 1:
+                buf.write('\n')
+
+        return buf.getvalue()
+
+    def _resid_info(self):
+        buf = StringIO()
+        names = self.model.names
+
+        buf.write("Correlation matrix of residuals" + '\n')
+        buf.write(pprint_matrix(self.model.resid_corr, names, names) + '\n')
+
+        return buf.getvalue()
+
+
+def normality_summary(results):
+    title = "Normality skew/kurtosis Chi^2-test"
+    null_hyp = 'H_0: data generated by normally-distributed process'
+    return hypothesis_test_table(results, title, null_hyp)
+
+
+def hypothesis_test_table(results, title, null_hyp):
+    fmt = dict(_default_table_fmt,
+               data_fmts=["%#15.6F","%#15.6F","%#15.3F", "%s"])
+
+    buf = StringIO()
+    table = SimpleTable([[results['statistic'],
+                          results['crit_value'],
+                          results['pvalue'],
+                          str(results['df'])]],
+                        ['Test statistic', 'Critical Value', 'p-value',
+                         'df'], [''], title=None, txt_fmt=fmt)
+
+    buf.write(title + '\n')
+    buf.write(str(table) + '\n')
+
+    buf.write(null_hyp + '\n')
+
+    buf.write("Conclusion: %s H_0" % results['conclusion'])
+    buf.write(" at %.2f%% significance level" % (results['signif'] * 100))
+
+    return buf.getvalue()
+
+
+def pprint_matrix(values, rlabels, clabels, col_space=None):
+    buf = StringIO()
+
+    T, K = len(rlabels), len(clabels)
+
+    if col_space is None:
+        min_space = 10
+        col_space = [max(len(str(c)) + 2, min_space) for c in clabels]
+    else:
+        col_space = (col_space,) * K
+
+    row_space = max([len(str(x)) for x in rlabels]) + 2
+
+    head = _pfixed('', row_space)
+
+    for j, h in enumerate(clabels):
+        head += _pfixed(h, col_space[j])
+
+    buf.write(head + '\n')
+
+    for i, rlab in enumerate(rlabels):
+        line = ('%s' % rlab).ljust(row_space)
+
+        for j in range(K):
+            line += _pfixed(values[i,j], col_space[j])
+
+        buf.write(line + '\n')
+
+    return buf.getvalue()
+
+
+def _pfixed(s, space, nanRep=None, float_format=None):
+    if isinstance(s, float):
+        if float_format:
+            formatted = float_format(s)
+        else:
+            formatted = "%#8.6F" % s
+
+        return formatted.rjust(space)
+    else:
+        return ('%s' % s)[:space].rjust(space)
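
A small sketch of the fixed-width matrix printer added above, using a made-up correlation matrix; pprint_matrix is the helper that _resid_info relies on.

    import numpy as np
    from statsmodels.tsa.vector_ar.output import pprint_matrix

    corr = np.array([[1.00, 0.42],
                     [0.42, 1.00]])               # illustrative values
    print(pprint_matrix(corr, ['y1', 'y2'], ['y1', 'y2']))
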
diff --git a/statsmodels/tsa/vector_ar/plotting.py b/statsmodels/tsa/vector_ar/plotting.py
index f6fe62f8a..bdf8201bc 100644
--- a/statsmodels/tsa/vector_ar/plotting.py
+++ b/statsmodels/tsa/vector_ar/plotting.py
@@ -1,5 +1,7 @@
 from statsmodels.compat.python import lrange
+
 import numpy as np
+
 import statsmodels.tsa.vector_ar.util as util


@@ -8,16 +10,87 @@ class MPLConfigurator:
     def __init__(self):
         self._inverse_actions = []

+    def revert(self):
+        for action in self._inverse_actions:
+            action()
+
+    def set_fontsize(self, size):
+        import matplotlib as mpl
+        old_size = mpl.rcParams['font.size']
+        mpl.rcParams['font.size'] = size
+
+        def revert():
+            mpl.rcParams['font.size'] = old_size
+
+        self._inverse_actions.append(revert)
+
+
+#-------------------------------------------------------------------------------
+# Plotting functions

 def plot_mts(Y, names=None, index=None):
     """
     Plot multiple time series
     """
-    pass
+    import matplotlib.pyplot as plt
+
+    k = Y.shape[1]
+    rows, cols = k, 1
+
+    fig = plt.figure(figsize=(10, 10))
+
+    for j in range(k):
+        ts = Y[:, j]
+
+        ax = fig.add_subplot(rows, cols, j+1)
+        if index is not None:
+            ax.plot(index, ts)
+        else:
+            ax.plot(ts)
+
+        if names is not None:
+            ax.set_title(names[j])
+
+    return fig
+
+
+def plot_var_forc(prior, forc, err_upper, err_lower,
+                  index=None, names=None, plot_stderr=True,
+                  legend_options=None):
+    import matplotlib.pyplot as plt
+
+    n, k = prior.shape
+    rows, cols = k, 1

+    fig = plt.figure(figsize=(10, 10))

-def plot_with_error(y, error, x=None, axes=None, value_fmt='k', error_fmt=
-    'k--', alpha=0.05, stderr_type='asym'):
+    prange = np.arange(n)
+    rng_f = np.arange(n - 1, n + len(forc))
+    rng_err = np.arange(n, n + len(forc))
+
+    for j in range(k):
+        ax = plt.subplot(rows, cols, j+1)
+
+        p1 = ax.plot(prange, prior[:, j], 'k', label='Observed')
+        p2 = ax.plot(rng_f, np.r_[prior[-1:, j], forc[:, j]], 'k--',
+                     label='Forecast')
+
+        if plot_stderr:
+            p3 = ax.plot(rng_err, err_upper[:, j], 'k-.',
+                         label='Forc 2 STD err')
+            ax.plot(rng_err, err_lower[:, j], 'k-.')
+
+        if names is not None:
+            ax.set_title(names[j])
+
+        if legend_options is None:
+            legend_options = {"loc": "upper right"}
+        ax.legend(**legend_options)
+    return fig
+
+
+def plot_with_error(y, error, x=None, axes=None, value_fmt='k',
+                    error_fmt='k--', alpha=0.05, stderr_type = 'asym'):
     """
     Make plot with optional error bars

@@ -26,22 +99,95 @@ def plot_with_error(y, error, x=None, axes=None, value_fmt='k', error_fmt=
     y :
     error : array or None
     """
-    pass
+    import matplotlib.pyplot as plt
+
+    if axes is None:
+        axes = plt.gca()

+    x = x if x is not None else lrange(len(y))
+    plot_action = lambda y, fmt: axes.plot(x, y, fmt)
+    plot_action(y, value_fmt)

-def plot_full_acorr(acorr, fontsize=8, linewidth=8, xlabel=None, err_bound=None
-    ):
+    #changed this
+    if error is not None:
+        if stderr_type == 'asym':
+            q = util.norm_signif_level(alpha)
+            plot_action(y - q * error, error_fmt)
+            plot_action(y + q * error, error_fmt)
+        if stderr_type in ('mc','sz1','sz2','sz3'):
+            plot_action(error[0], error_fmt)
+            plot_action(error[1], error_fmt)
+
+
+def plot_full_acorr(acorr, fontsize=8, linewidth=8, xlabel=None,
+                    err_bound=None):
     """

     Parameters
     ----------
     """
-    pass
+    import matplotlib.pyplot as plt
+
+    config = MPLConfigurator()
+    config.set_fontsize(fontsize)
+
+    k = acorr.shape[1]
+    fig, axes = plt.subplots(k, k, figsize=(10, 10), squeeze=False)
+
+    for i in range(k):
+        for j in range(k):
+            ax = axes[i][j]
+            acorr_plot(acorr[:, i, j], linewidth=linewidth,
+                       xlabel=xlabel, ax=ax)
+
+            if err_bound is not None:
+                ax.axhline(err_bound, color='k', linestyle='--')
+                ax.axhline(-err_bound, color='k', linestyle='--')
+
+    adjust_subplots()
+    config.revert()

+    return fig

-def irf_grid_plot(values, stderr, impcol, rescol, names, title, signif=0.05,
-    hlines=None, subplot_params=None, plot_params=None, figsize=(10, 10),
-    stderr_type='asym'):
+
+def acorr_plot(acorr, linewidth=8, xlabel=None, ax=None):
+    import matplotlib.pyplot as plt
+
+    if ax is None:
+        ax = plt.gca()
+
+    if xlabel is None:
+        xlabel = np.arange(len(acorr))
+
+    ax.vlines(xlabel, [0], acorr, lw=linewidth)
+
+    ax.axhline(0, color='k')
+    ax.set_ylim([-1, 1])
+
+    # hack?
+    ax.set_xlim([-1, xlabel[-1] + 1])
+
+
+def plot_acorr_with_error():
+    raise NotImplementedError
+
+
+def adjust_subplots(**kwds):
+    import matplotlib.pyplot as plt
+
+    passed_kwds = dict(bottom=0.05, top=0.925,
+                       left=0.05, right=0.95,
+                       hspace=0.2)
+    passed_kwds.update(kwds)
+    plt.subplots_adjust(**passed_kwds)
+
+
+#-------------------------------------------------------------------------------
+# Multiple impulse response (cum_effects, etc.) plots
+
+def irf_grid_plot(values, stderr, impcol, rescol, names, title,
+                  signif=0.05, hlines=None, subplot_params=None,
+                  plot_params=None, figsize=(10,10), stderr_type='asym'):
     """
     Reusable function to make flexible grid plots of impulse responses and
     cumulative effects
@@ -50,4 +196,79 @@ def irf_grid_plot(values, stderr, impcol, rescol, names, title, signif=0.05,
     stderr : T x k x k
     hlines : k x k
     """
-    pass
+    import matplotlib.pyplot as plt
+
+    if subplot_params is None:
+        subplot_params = {}
+    if plot_params is None:
+        plot_params = {}
+
+    nrows, ncols, to_plot = _get_irf_plot_config(names, impcol, rescol)
+
+    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, sharex=True,
+                             squeeze=False, figsize=figsize)
+
+    # fill out space
+    adjust_subplots()
+
+    fig.suptitle(title, fontsize=14)
+
+    subtitle_temp = r'%s$\rightarrow$%s'
+
+    k = len(names)
+
+    rng = lrange(len(values))
+    for (j, i, ai, aj) in to_plot:
+        ax = axes[ai][aj]
+
+        # HACK?
+        if stderr is not None:
+            if stderr_type == 'asym':
+                sig = np.sqrt(stderr[:, j * k + i, j * k + i])
+                plot_with_error(values[:, i, j], sig, x=rng, axes=ax,
+                                alpha=signif, value_fmt='b', stderr_type=stderr_type)
+            if stderr_type in ('mc','sz1','sz2','sz3'):
+                errs = stderr[0][:, i, j], stderr[1][:, i, j]
+                plot_with_error(values[:, i, j], errs, x=rng, axes=ax,
+                                alpha=signif, value_fmt='b', stderr_type=stderr_type)
+        else:
+            plot_with_error(values[:, i, j], None, x=rng, axes=ax,
+                            value_fmt='b')
+
+        ax.axhline(0, color='k')
+
+        if hlines is not None:
+            ax.axhline(hlines[i,j], color='k')
+
+        sz = subplot_params.get('fontsize', 12)
+        ax.set_title(subtitle_temp % (names[j], names[i]), fontsize=sz)
+
+    return fig
+
+
+def _get_irf_plot_config(names, impcol, rescol):
+    nrows = ncols = k = len(names)
+    if impcol is not None and rescol is not None:
+        # plot one impulse-response pair
+        nrows = ncols = 1
+        j = util.get_index(names, impcol)
+        i = util.get_index(names, rescol)
+        to_plot = [(j, i, 0, 0)]
+    elif impcol is not None:
+        # plot impacts of impulse in one variable
+        ncols = 1
+        j = util.get_index(names, impcol)
+        to_plot = [(j, i, i, 0) for i in range(k)]
+    elif rescol is not None:
+        # plot only things having impact on particular variable
+        ncols = 1
+        i = util.get_index(names, rescol)
+        to_plot = [(j, i, j, 0) for j in range(k)]
+    else:
+        # plot everything
+        to_plot = [(j, i, i, j) for i in range(k) for j in range(k)]
+
+    return nrows, ncols, to_plot
+
+#-------------------------------------------------------------------------------
+# Forecast error variance decomposition
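
A short sketch of the plotting helpers defined above, again assuming matplotlib is available; the series are random and only meant to exercise plot_mts.

    import numpy as np
    from statsmodels.tsa.vector_ar.plotting import plot_mts

    y = np.random.randn(100, 3)                   # toy data
    fig = plot_mts(y, names=['y1', 'y2', 'y3'])
    fig.savefig('mts_example.png')                # or plt.show() interactively
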
diff --git a/statsmodels/tsa/vector_ar/svar_model.py b/statsmodels/tsa/vector_ar/svar_model.py
index 776602b27..6c43b9d6a 100644
--- a/statsmodels/tsa/vector_ar/svar_model.py
+++ b/statsmodels/tsa/vector_ar/svar_model.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Vector Autoregression (VAR) processes

@@ -8,6 +9,7 @@ Lütkepohl (2005) New Introduction to Multiple Time Series Analysis
 import numpy as np
 import numpy.linalg as npl
 from numpy.linalg import slogdet
+
 from statsmodels.tools.decorators import deprecated_alias
 from statsmodels.tools.numdiff import approx_fprime, approx_hess
 import statsmodels.tsa.base.tsa_model as tsbase
@@ -16,11 +18,19 @@ import statsmodels.tsa.vector_ar.util as util
 from statsmodels.tsa.vector_ar.var_model import VARProcess, VARResults


+def svar_ckerr(svar_type, A, B):
+    if A is None and (svar_type == 'A' or svar_type == 'AB'):
+        raise ValueError('SVAR of type A or AB but A array not given.')
+    if B is None and (svar_type == 'B' or svar_type == 'AB'):
+        raise ValueError('SVAR of type B or AB but B array not given.')
+
+
 class SVAR(tsbase.TimeSeriesModel):
-    """
+    r"""
     Fit VAR and then estimate structural components of A and B, defined:

-    .. math:: Ay_t = A_1 y_{t-1} + \\ldots + A_p y_{t-p} + B\\var(\\epsilon_t)
+    .. math:: Ay_t = A_1 y_{t-1} + \ldots + A_p y_{t-p} + B\varepsilon_t

     Parameters
     ----------
@@ -41,20 +51,30 @@ class SVAR(tsbase.TimeSeriesModel):
     ----------
     Hamilton (1994) Time Series Analysis
     """
-    y = deprecated_alias('y', 'endog', remove_version='0.11.0')

-    def __init__(self, endog, svar_type, dates=None, freq=None, A=None, B=
-        None, missing='none'):
+    y = deprecated_alias("y", "endog", remove_version="0.11.0")
+
+    def __init__(self, endog, svar_type, dates=None,
+                 freq=None, A=None, B=None, missing='none'):
         super().__init__(endog, None, dates, freq, missing=missing)
+        #(self.endog, self.names,
+        # self.dates) = data_util.interpret_data(endog, names, dates)
+
         self.neqs = self.endog.shape[1]
+
         types = ['A', 'B', 'AB']
         if svar_type not in types:
-            raise ValueError('SVAR type not recognized, must be in ' + str(
-                types))
+            raise ValueError('SVAR type not recognized, must be in '
+                             + str(types))
         self.svar_type = svar_type
+
         svar_ckerr(svar_type, A, B)
+
         self.A_original = A
         self.B_original = B
+
+        # initialize A, B as I if not given
+        # Initialize SVAR masks
         if A is None:
             A = np.identity(self.neqs)
             self.A_mask = A_mask = np.zeros(A.shape, dtype=bool)
@@ -67,18 +87,27 @@ class SVAR(tsbase.TimeSeriesModel):
         else:
             B_mask = np.logical_or(B == 'E', B == 'e')
             self.B_mask = B_mask
+
+        # convert A and B to numeric
+        #TODO: change this when masked support is better or with formula
+        #integration
         Anum = np.zeros(A.shape, dtype=float)
         Anum[~A_mask] = A[~A_mask]
         Anum[A_mask] = np.nan
         self.A = Anum
+
         Bnum = np.zeros(B.shape, dtype=float)
         Bnum[~B_mask] = B[~B_mask]
         Bnum[B_mask] = np.nan
         self.B = Bnum

+        #LikelihoodModel.__init__(self, endog)
+
+        #super().__init__(endog)
+
     def fit(self, A_guess=None, B_guess=None, maxlags=None, method='ols',
-        ic=None, trend='c', verbose=False, s_method='mle', solver='bfgs',
-        override=False, maxiter=500, maxfun=500):
+            ic=None, trend='c', verbose=False, s_method='mle',
+            solver="bfgs", override=False, maxiter=500, maxfun=500):
         """
         Fit the SVAR model and solve for structural parameters

@@ -131,22 +160,102 @@ class SVAR(tsbase.TimeSeriesModel):
         -------
         est : SVARResults
         """
-        pass
+        lags = maxlags
+        if ic is not None:
+            selections = self.select_order(maxlags=maxlags, verbose=verbose)
+            if ic not in selections:
+                raise ValueError("%s not recognized, must be among %s"
+                                 % (ic, sorted(selections)))
+            lags = selections[ic]
+            if verbose:
+                print('Using %d based on %s criterion' % (lags, ic))
+        else:
+            if lags is None:
+                lags = 1
+
+        self.nobs = len(self.endog) - lags
+
+        # initialize starting parameters
+        start_params = self._get_init_params(A_guess, B_guess)
+
+        return self._estimate_svar(start_params, lags, trend=trend,
+                                   solver=solver, override=override,
+                                   maxiter=maxiter, maxfun=maxfun)

     def _get_init_params(self, A_guess, B_guess):
         """
         Returns either the given starting values or .1 if none are given.
         """
-        pass

-    def _estimate_svar(self, start_params, lags, maxiter, maxfun, trend='c',
-        solver='nm', override=False):
+        var_type = self.svar_type.lower()
+
+        n_masked_a = self.A_mask.sum()
+        if var_type in ['ab', 'a']:
+            if A_guess is None:
+                A_guess = np.array([.1]*n_masked_a)
+            else:
+                if len(A_guess) != n_masked_a:
+                    msg = 'len(A_guess) = %s, there are %s parameters in A'
+                    raise ValueError(msg % (len(A_guess), n_masked_a))
+        else:
+            A_guess = []
+
+        n_masked_b = self.B_mask.sum()
+        if var_type in ['ab', 'b']:
+            if B_guess is None:
+                B_guess = np.array([.1]*n_masked_b)
+            else:
+                if len(B_guess) != n_masked_b:
+                    msg = 'len(B_guess) = %s, there are %s parameters in B'
+                    raise ValueError(msg % (len(B_guess), n_masked_b))
+        else:
+            B_guess = []
+
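+        # Layout sketch: free A-entries come first, then free B-entries, so
+        # for an 'AB' model with two free entries in A and one in B (and no
+        # guesses supplied) this returns array([0.1, 0.1, 0.1]).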
+        return np.r_[A_guess, B_guess]
+
+    def _estimate_svar(self, start_params, lags, maxiter, maxfun,
+                       trend='c', solver="nm", override=False):
         """
         lags : int
         trend : {str, None}
             As per above
         """
-        pass
+        k_trend = util.get_trendorder(trend)
+        y = self.endog
+        z = util.get_var_endog(y, lags, trend=trend, has_constant='raise')
+        y_sample = y[lags:]
+
+        # Lutkepohl p75, about 5x faster than stated formula
+        var_params = np.linalg.lstsq(z, y_sample, rcond=-1)[0]
+        resid = y_sample - np.dot(z, var_params)
+
+        # Unbiased estimate of covariance matrix $\Sigma_u$ of the white noise
+        # process $u$
+        # equivalent definition
+        # .. math:: \frac{1}{T - Kp - 1} Y^\prime (I_T - Z (Z^\prime Z)^{-1}
+        # Z^\prime) Y
+        # Ref: Lutkepohl p.75
+        # df_resid right now is T - Kp - 1, which is a suggested correction
+
+        avobs = len(y_sample)
+
+        df_resid = avobs - (self.neqs * lags + k_trend)
+
+        sse = np.dot(resid.T, resid)
+        #TODO: should give users the option to use a dof correction or not
+        omega = sse / df_resid
+        self.sigma_u = omega
+
+        A, B = self._solve_AB(start_params, override=override,
+                              solver=solver,
+                              maxiter=maxiter)
+        A_mask = self.A_mask
+        B_mask = self.B_mask
+
+        return SVARResults(y, z, var_params, omega, lags,
+                           names=self.endog_names, trend=trend,
+                           dates=self.data.dates, model=self,
+                           A=A, B=B, A_mask=A_mask, B_mask=B_mask)

     def loglike(self, params):
         """
@@ -158,7 +267,34 @@ class SVAR(tsbase.TimeSeriesModel):
         first estimated, then likelihood with structural parameters
         is estimated
         """
-        pass
+
+        #TODO: this does not look robust if A or B is None
+        A = self.A
+        B = self.B
+        A_mask = self.A_mask
+        B_mask = self.B_mask
+        A_len = len(A[A_mask])
+        B_len = len(B[B_mask])
+
+        if A is not None:
+            A[A_mask] = params[:A_len]
+        if B is not None:
+            B[B_mask] = params[A_len:A_len+B_len]
+
+        nobs = self.nobs
+        neqs = self.neqs
+        sigma_u = self.sigma_u
+
+        W = np.dot(npl.inv(B), A)
+        trc_in = np.dot(np.dot(W.T, W), sigma_u)
+        sign, b_logdet = slogdet(B**2) #numpy 1.4 compat
+        b_slogdet = sign * b_logdet
+
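+        # Sketch of the expression below: with W = B^{-1} A this is
+        # -T/2 * (K*log(2*pi) - log|A|^2 + log|B|^2 + tr(W'W Sigma_u)),
+        # the Gaussian log-likelihood evaluated at the estimated residual
+        # covariance sigma_u.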
+        likl = -nobs/2. * (neqs * np.log(2 * np.pi) -
+                           np.log(npl.det(A)**2) + b_slogdet +
+                           np.trace(trc_in))
+
+        return likl

     def score(self, AB_mask):
         """
@@ -172,13 +308,15 @@ class SVAR(tsbase.TimeSeriesModel):
         -----
         Return numerical gradient
         """
-        pass
+        loglike = self.loglike
+        return approx_fprime(AB_mask, loglike, epsilon=1e-8)

     def hessian(self, AB_mask):
         """
         Returns numerical hessian.
         """
-        pass
+        loglike = self.loglike
+        return approx_hess(AB_mask, loglike)

     def _solve_AB(self, start_params, maxiter, override=False, solver='bfgs'):
         """
@@ -201,7 +339,102 @@ class SVAR(tsbase.TimeSeriesModel):
         -------
         A_solve, B_solve: ML solutions for A, B matrices
         """
-        pass
+        #TODO: this could stand a refactor
+        A_mask = self.A_mask
+        B_mask = self.B_mask
+        A = self.A
+        B = self.B
+        A_len = len(A[A_mask])
+
+        A[A_mask] = start_params[:A_len]
+        B[B_mask] = start_params[A_len:]
+
+        if not override:
+            J = self._compute_J(A, B)
+            self.check_order(J)
+            self.check_rank(J)
+        else: #TODO: change to a warning?
+            print("Order/rank conditions have not been checked")
+
+        retvals = super().fit(start_params=start_params,
+                              method=solver, maxiter=maxiter,
+                              gtol=1e-20, disp=False).params
+
+        A[A_mask] = retvals[:A_len]
+        B[B_mask] = retvals[A_len:]
+
+        return A, B
+
+    def _compute_J(self, A_solve, B_solve):
+
+        #first compute appropriate duplication matrix
+        # taken from Magnus and Neudecker (1980),
+        #"The Elimination Matrix: Some Lemmas and Applications
+        # the creation of the D_n matrix follows MN (1980) directly,
+        #while the rest follows Hamilton (1994)
+
+        neqs = self.neqs
+        sigma_u = self.sigma_u
+        A_mask = self.A_mask
+        B_mask = self.B_mask
+
+        #first generate duplication matrix, see MN (1980) for notation
+
+        D_nT = np.zeros([int((1.0 / 2) * (neqs) * (neqs + 1)), neqs**2])
+
+        for j in range(neqs):
+            i = j
+            while j <= i < neqs:
+                u = np.zeros([int((1.0 / 2) * neqs * (neqs + 1)), 1])
+                u[int(j * neqs + (i + 1) - (1.0 / 2) * (j + 1) * j - 1)] = 1
+                Tij = np.zeros([neqs, neqs])
+                Tij[i, j] = 1
+                Tij[j, i] = 1
+                D_nT = D_nT + np.dot(u, (Tij.ravel('F')[:, None]).T)
+                i = i + 1
+
+        D_n = D_nT.T
+        D_pl = npl.pinv(D_n)
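+        # For neqs=2 (a sketch): vech([[a, b], [b, c]]) = [a, b, c] while
+        # vec (column-major) = [a, b, b, c], so D_2 is the 4 x 3 matrix with
+        # rows [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], and D_pl = pinv(D_2)
+        # maps vec back to vech for symmetric input.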
+
+        #generate S_B
+        S_B = np.zeros((neqs**2, len(A_solve[A_mask])))
+        S_D = np.zeros((neqs**2, len(B_solve[B_mask])))
+
+        j = 0
+        j_d = 0
+        if len(A_solve[A_mask]) != 0:
+            A_vec = np.ravel(A_mask, order='F')
+            for k in range(neqs**2):
+                if A_vec[k]:
+                    S_B[k,j] = -1
+                    j += 1
+        if len(B_solve[B_mask]) != 0:
+            B_vec = np.ravel(B_mask, order='F')
+            for k in range(neqs**2):
+                if B_vec[k]:
+                    S_D[k,j_d] = 1
+                    j_d +=1
+
+        #now compute J
+        invA = npl.inv(A_solve)
+        J_p1i = np.dot(np.dot(D_pl, np.kron(sigma_u, invA)), S_B)
+        J_p1 = -2.0 * J_p1i
+        J_p2 = np.dot(np.dot(D_pl, np.kron(invA, invA)), S_D)
+
+        J = np.append(J_p1, J_p2, axis=1)
+
+        return J
+
+    def check_order(self, J):
+        if np.size(J, axis=0) < np.size(J, axis=1):
+            raise ValueError("Order condition not met: "
+                             "solution may not be unique")
+
+    def check_rank(self, J):
+        rank = np.linalg.matrix_rank(J)
+        if rank < np.size(J, axis=1):
+            raise ValueError("Rank condition not met: "
+                             "solution may not be unique.")


 class SVARProcess(VARProcess):
@@ -219,9 +452,8 @@ class SVARProcess(VARProcess):
     B : neqs x neqs np.ndarry with unknown parameters marked with 'E'
     B_mask : neqs x neqs mask array with known parameters masked
     """
-
-    def __init__(self, coefs, intercept, sigma_u, A_solve, B_solve, names=None
-        ):
+    def __init__(self, coefs, intercept, sigma_u, A_solve, B_solve,
+                 names=None):
         self.k_ar = len(coefs)
         self.neqs = coefs.shape[1]
         self.coefs = coefs
@@ -236,7 +468,7 @@ class SVARProcess(VARProcess):

         Unavailable for SVAR
         """
-        pass
+        raise NotImplementedError

     def svar_ma_rep(self, maxn=10, P=None):
         """
@@ -244,7 +476,13 @@ class SVARProcess(VARProcess):
         Compute Structural MA coefficient matrices using MLE
         of A, B
         """
-        pass
+        if P is None:
+            A_solve = self.A_solve
+            B_solve = self.B_solve
+            P = np.dot(npl.inv(A_solve), B_solve)
+
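+        # Structural responses (sketch): Theta_i = Phi_i P with P = A^{-1} B
+        # by default, so the impulses are the structural innovations rather
+        # than the reduced-form errors u_t.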
+        ma_mats = self.ma_rep(maxn=maxn)
+        return np.array([np.dot(coefs, P) for coefs in ma_mats])


 class SVARResults(SVARProcess, VARResults):
@@ -307,37 +545,51 @@ class SVARResults(SVARProcess, VARResults):
     trenorder
     tvalues
     """
+
     _model_type = 'SVAR'

-    def __init__(self, endog, endog_lagged, params, sigma_u, lag_order, A=
-        None, B=None, A_mask=None, B_mask=None, model=None, trend='c',
-        names=None, dates=None):
+    def __init__(self, endog, endog_lagged, params, sigma_u, lag_order,
+                 A=None, B=None, A_mask=None, B_mask=None, model=None,
+                 trend='c', names=None, dates=None):
+
         self.model = model
         self.endog = endog
         self.endog_lagged = endog_lagged
         self.dates = dates
+
         self.n_totobs, self.neqs = self.endog.shape
         self.nobs = self.n_totobs - lag_order
         k_trend = util.get_trendorder(trend)
-        if k_trend > 0:
+        if k_trend > 0: # make this the polynomial trend order
             trendorder = k_trend - 1
         else:
             trendorder = None
         self.k_trend = k_trend
-        self.k_exog = k_trend
+        self.k_exog = k_trend  # now (0.9) required by VARProcess
         self.trendorder = trendorder
+
         self.exog_names = util.make_lag_names(names, lag_order, k_trend)
         self.params = params
         self.sigma_u = sigma_u
+
+        # Each matrix needs to be transposed
         reshaped = self.params[self.k_trend:]
         reshaped = reshaped.reshape((lag_order, self.neqs, self.neqs))
+
+        # Need to transpose each coefficient matrix
         intercept = self.params[0]
         coefs = reshaped.swapaxes(1, 2).copy()
+
+        #SVAR components
+        #TODO: if you define these here, you do not also have to define
+        #them in SVAR process, but I left them for now -ss
         self.A = A
         self.B = B
         self.A_mask = A_mask
         self.B_mask = B_mask
-        super().__init__(coefs, intercept, sigma_u, A, B, names=names)
+
+        super().__init__(coefs, intercept, sigma_u, A, B,
+                         names=names)

     def irf(self, periods=10, var_order=None):
         """
@@ -351,10 +603,14 @@ class SVARResults(SVARProcess, VARResults):
         -------
         irf : IRAnalysis
         """
-        pass
+        A = self.A
+        B = self.B
+        P = np.dot(npl.inv(A), B)
+
+        return IRAnalysis(self, P=P, periods=periods, svar=True)

-    def sirf_errband_mc(self, orth=False, repl=1000, steps=10, signif=0.05,
-        seed=None, burn=100, cum=False):
+    def sirf_errband_mc(self, orth=False, repl=1000, steps=10,
+                        signif=0.05, seed=None, burn=100, cum=False):
         """
         Compute Monte Carlo integrated error bands assuming normally
         distributed for impulse response functions
@@ -384,4 +640,58 @@ class SVARResults(SVARProcess, VARResults):
         -------
         Tuple of lower and upper arrays of ma_rep monte carlo standard errors
         """
-        pass
+        neqs = self.neqs
+        mean = self.mean()
+        k_ar = self.k_ar
+        coefs = self.coefs
+        sigma_u = self.sigma_u
+        intercept = self.intercept
+        df_model = self.df_model
+        nobs = self.nobs
+
+        ma_coll = np.zeros((repl, steps + 1, neqs, neqs))
+        A = self.A
+        B = self.B
+        A_mask = self.A_mask
+        B_mask = self.B_mask
+        A_pass = self.model.A_original
+        B_pass = self.model.B_original
+        s_type = self.model.svar_type
+
+        g_list = []
+
+        def agg(impulses):
+            if cum:
+                return impulses.cumsum(axis=0)
+            return impulses
+
+        opt_A = A[A_mask]
+        opt_B = B[B_mask]
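+        # Monte Carlo scheme (sketch): simulate `repl` samples from the fitted
+        # reduced-form VAR, re-estimate the SVAR on each draw, store the
+        # (optionally cumulated) structural MA coefficients, and take the
+        # signif/2 and 1 - signif/2 empirical quantiles as the error bands.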
+        for i in range(repl):
+            # discard first hundred to correct for starting bias
+            sim = util.varsim(coefs, intercept, sigma_u, seed=seed,
+                              steps=nobs + burn)
+            sim = sim[burn:]
+
+            smod = SVAR(sim, svar_type=s_type, A=A_pass, B=B_pass)
+            if i == 10:
+                # Use first 10 to update starting val for remainder of fits
+                mean_AB = np.mean(g_list, axis=0)
+                split = len(A[A_mask])
+                opt_A = mean_AB[:split]
+                opt_B = mean_AB[split:]
+
+            sres = smod.fit(maxlags=k_ar, A_guess=opt_A, B_guess=opt_B)
+
+            if i < 10:
+                # save estimates for starting val if in first 10
+                g_list.append(np.append(sres.A[A_mask].tolist(),
+                                        sres.B[B_mask].tolist()))
+            ma_coll[i] = agg(sres.svar_ma_rep(maxn=steps))
+
+        ma_sort = np.sort(ma_coll, axis=0)  # sort to get quantiles
+        index = (int(round(signif / 2 * repl) - 1),
+                 int(round((1 - signif / 2) * repl) - 1))
+        lower = ma_sort[index[0], :, :, :]
+        upper = ma_sort[index[1], :, :, :]
+        return lower, upper
diff --git a/statsmodels/tsa/vector_ar/util.py b/statsmodels/tsa/vector_ar/util.py
index 50d670ae3..0f0a2c4c6 100644
--- a/statsmodels/tsa/vector_ar/util.py
+++ b/statsmodels/tsa/vector_ar/util.py
@@ -1,14 +1,19 @@
+# -*- coding: utf-8 -*-
 """
 Miscellaneous utility code for VAR estimation
 """
 from statsmodels.compat.pandas import frequencies
 from statsmodels.compat.python import asbytes
 from statsmodels.tools.validation import array_like, int_like
+
 import numpy as np
 import pandas as pd
 from scipy import stats, linalg
+
 import statsmodels.tsa.tsatools as tsa

+#-------------------------------------------------------------------------------
+# Auxiliary functions for estimation

 def get_var_endog(y, lags, trend='c', has_constant='skip'):
     """
@@ -21,7 +26,31 @@ def get_var_endog(y, lags, trend='c', has_constant='skip'):

     has_constant can be 'raise', 'add', or 'skip'. See add_constant.
     """
-    pass
+    nobs = len(y)
+    # Ravel C order, need to put in descending order
+    Z = np.array([y[t-lags : t][::-1].ravel() for t in range(lags, nobs)])
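+    # Each row of Z stacks the lags in descending order, i.e.
+    # [y_{t-1}', y_{t-2}', ..., y_{t-lags}'] for t = lags, ..., nobs - 1.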
+
+    # Add constant, trend, etc.
+    if trend != 'n':
+        Z = tsa.add_trend(Z, prepend=True, trend=trend,
+                          has_constant=has_constant)
+
+    return Z
+
+
+def get_trendorder(trend='c'):
+    # Handle constant, etc.
+    if trend == 'c':
+        trendorder = 1
+    elif trend in ('n', 'nc'):
+        trendorder = 0
+    elif trend == 'ct':
+        trendorder = 2
+    elif trend == 'ctt':
+        trendorder = 3
+    else:
+        raise ValueError(f"Unkown trend: {trend}")
+    return trendorder


 def make_lag_names(names, lag_order, trendorder=1, exog=None):
@@ -33,7 +62,38 @@ def make_lag_names(names, lag_order, trendorder=1, exog=None):
     >>> make_lag_names(['foo', 'bar'], 2, 1)
     ['const', 'L1.foo', 'L1.bar', 'L2.foo', 'L2.bar']
     """
-    pass
+    lag_names = []
+    if isinstance(names, str):
+        names = [names]
+
+    # take care of lagged endogenous names
+    for i in range(1, lag_order + 1):
+        for name in names:
+            if not isinstance(name, str):
+                name = str(name) # will need consistent unicode handling
+            lag_names.append('L'+str(i)+'.'+name)
+
+    # handle the constant name
+    if trendorder != 0:
+        lag_names.insert(0, 'const')
+    if trendorder > 1:
+        lag_names.insert(1, 'trend')
+    if trendorder > 2:
+        lag_names.insert(2, 'trend**2')
+    if exog is not None:
+        if isinstance(exog, pd.Series):
+            exog = pd.DataFrame(exog)
+        elif not hasattr(exog, 'ndim'):
+            exog = np.asarray(exog)
+        if exog.ndim == 1:
+            exog = exog[:, None]
+        for i in range(exog.shape[1]):
+            if isinstance(exog, pd.DataFrame):
+                exog_name = str(exog.columns[i])
+            else:
+                exog_name = "exog" + str(i)
+            lag_names.insert(trendorder + i, exog_name)
+    return lag_names


 def comp_matrix(coefs):
@@ -46,20 +106,91 @@ def comp_matrix(coefs):
          0   I_K ... 0     0
          0 ...       I_K   0]
     """
-    pass
+    p, k1, k2 = coefs.shape
+    if k1 != k2:
+        raise ValueError('coefs must be 3-d with shape (p, k, k).')
+
+    kp = k1 * p

+    result = np.zeros((kp, kp))
+    result[:k1] = np.concatenate(coefs, axis=1)

-def parse_lutkepohl_data(path):
+    # Set I_K matrices
+    if p > 1:
+        result[np.arange(k1, kp), np.arange(kp-k1)] = 1
+
+    return result
+
+#-------------------------------------------------------------------------------
+# Miscellaneous stuff
+
+
+def parse_lutkepohl_data(path): # pragma: no cover
     """
     Parse data files from Lütkepohl (2005) book

     Source for data files: www.jmulti.de
     """
-    pass

+    from collections import deque
+    from datetime import datetime
+    import re
+
+    regex = re.compile(asbytes(r'<(.*) (\w)([\d]+)>.*'))
+    with open(path, 'rb') as f:
+        lines = deque(f)
+
+    to_skip = 0
+    while asbytes('*/') not in lines.popleft():
+        #while '*/' not in lines.popleft():
+        to_skip += 1
+
+    while True:
+        to_skip += 1
+        line = lines.popleft()
+        m = regex.match(line)
+        if m:
+            year, freq, start_point = m.groups()
+            break
+
+    data = (pd.read_csv(path, delimiter=r"\s+", header=to_skip+1)
+            .to_records(index=False))
+
+    n = len(data)

-def varsim(coefs, intercept, sig_u, steps=100, initial_values=None, seed=
-    None, nsimulations=None):
+    # generate the corresponding date range (using pandas for now)
+    start_point = int(start_point)
+    year = int(year)
+
+    offsets = {
+        asbytes('Q'): frequencies.BQuarterEnd(),
+        asbytes('M'): frequencies.BMonthEnd(),
+        asbytes('A'): frequencies.BYearEnd()
+    }
+
+    # create an offset instance for the sampling frequency
+    offset = offsets[freq]
+
+    inc = offset * (start_point - 1)
+    start_date = offset.rollforward(datetime(year, 1, 1)) + inc
+
+    date_range = pd.date_range(start=start_date, freq=offset, periods=n)
+
+    return data, date_range
+
+
+def norm_signif_level(alpha=0.05):
+    return stats.norm.ppf(1 - alpha / 2)
+
+
+def acf_to_acorr(acf):
+    diag = np.diag(acf[0])
+    # numpy broadcasting sufficient
+    return acf / np.sqrt(np.outer(diag, diag))
+
+
+def varsim(coefs, intercept, sig_u, steps=100, initial_values=None,
+           seed=None, nsimulations=None):
     """
     Simulate VAR(p) process, given coefficients and assuming Gaussian noise

@@ -103,9 +234,58 @@ def varsim(coefs, intercept, sig_u, steps=100, initial_values=None, seed=
         Endog of the simulated VAR process. Shape will be (nsimulations, steps, neqs)
         or (steps, neqs) if `nsimulations` is None.
     """
-    pass
+    rs = np.random.RandomState(seed=seed)
+    rmvnorm = rs.multivariate_normal
+    p, k, k = coefs.shape
+    nsimulations = int_like(nsimulations, "nsimulations", optional=True)
+    if isinstance(nsimulations, int) and nsimulations <= 0:
+        raise ValueError("nsimulations must be a positive integer if provided")
+    if nsimulations is None:
+        result_shape = (steps, k)
+        nsimulations = 1
+    else:
+        result_shape = (nsimulations, steps, k)
+    if sig_u is None:
+        sig_u = np.eye(k)
+    ugen = rmvnorm(np.zeros(len(sig_u)), sig_u,
+                   steps * nsimulations).reshape(nsimulations, steps, k)
+    result = np.zeros((nsimulations, steps, k))
+    if intercept is not None:
+        # intercept can be 2-D like an offset variable
+        if np.ndim(intercept) > 1:
+            if not len(intercept) == ugen.shape[1]:
+                raise ValueError('2-D intercept needs to have length `steps`')
+        # add intercept/offset also to initial values
+        result += intercept
+        result[:,p:] += ugen[:,p:]
+    else:
+        result[:,p:] = ugen[:,p:]
+
+    initial_values = array_like(initial_values, "initial_values",
+                                optional=True, maxdim=2)
+    if initial_values is not None:
+        if initial_values.shape not in [(p, k), (k,)]:
+            raise ValueError(
+                "initial_values should have shape (p, k) or (k,) where p is "
+                "the number of lags and k is the number of equations."
+            )
+        result[:, :p] = initial_values
+
+    # add in AR terms
+    for t in range(p, steps):
+        ygen = result[:,t]
+        for j in range(p):
+            ygen += np.dot(coefs[j], result[:,t-j-1].T).T
+
+    return result.reshape(result_shape)


+def get_index(lst, name):
+    try:
+        result = lst.index(name)
+    except Exception:
+        if not isinstance(name, int):
+            raise
+        result = name
+    return result
+
+
+# method used repeatedly in Sims-Zha error bands
 def eigval_decomp(sym_array):
     """
     Returns
@@ -114,7 +294,10 @@ def eigval_decomp(sym_array):
     eigva: list of eigenvalues
     k: largest eigenvector
     """
-    pass
+    #check if symmetric, do not include shock period
+    eigva, W = linalg.eig(sym_array, left=True, right=False)
+    k = np.argmax(eigva)
+    return W, eigva, k


 def vech(A):
@@ -124,7 +307,16 @@ def vech(A):
     -------
     vechvec: vector of all elements on and below diagonal
     """
-    pass
+
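+    # Example (sketch): vech(np.array([[1., 2.], [2., 3.]])) returns
+    # array([1., 2., 3.]), the lower triangle stacked column by column.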
+    length = A.shape[1]
+    vechvec = []
+    for i in range(length):
+        b = i
+        while b < length:
+            vechvec.append(A[b, i])
+            b = b + 1
+    vechvec = np.asarray(vechvec)
+    return vechvec


 def seasonal_dummies(n_seasons, len_endog, first_period=0, centered=False):
@@ -152,4 +344,13 @@ def seasonal_dummies(n_seasons, len_endog, first_period=0, centered=False):
     -------
     seasonal_dummies : ndarray (len_endog x n_seasons-1)
     """
-    pass
+    if n_seasons == 0:
+        return np.empty((len_endog, 0))
+    if n_seasons > 0:
+        season_exog = np.zeros((len_endog, n_seasons - 1))
+        for i in range(n_seasons - 1):
+            season_exog[(i-first_period) % n_seasons::n_seasons, i] = 1
+
+        if centered:
+            season_exog -= 1 / n_seasons
+        return season_exog
diff --git a/statsmodels/tsa/vector_ar/var_model.py b/statsmodels/tsa/vector_ar/var_model.py
index a2ac33e8f..2b510a1a8 100644
--- a/statsmodels/tsa/vector_ar/var_model.py
+++ b/statsmodels/tsa/vector_ar/var_model.py
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 """
 Vector Autoregression (VAR) processes

@@ -6,30 +7,44 @@ References
 Lütkepohl (2005) New Introduction to Multiple Time Series Analysis
 """
 from __future__ import annotations
+
 from statsmodels.compat.python import lrange
+
 from collections import defaultdict
 from io import StringIO
+
 import numpy as np
 import pandas as pd
 import scipy.stats as stats
+
 import statsmodels.base.wrapper as wrap
 from statsmodels.iolib.table import SimpleTable
 from statsmodels.tools.decorators import cache_readonly, deprecated_alias
 from statsmodels.tools.linalg import logdet_symm
 from statsmodels.tools.sm_exceptions import OutputWarning
 from statsmodels.tools.validation import array_like
-from statsmodels.tsa.base.tsa_model import TimeSeriesModel, TimeSeriesResultsWrapper
+from statsmodels.tsa.base.tsa_model import (
+    TimeSeriesModel,
+    TimeSeriesResultsWrapper,
+)
 import statsmodels.tsa.tsatools as tsa
 from statsmodels.tsa.tsatools import duplication_matrix, unvec, vec
 from statsmodels.tsa.vector_ar import output, plotting, util
-from statsmodels.tsa.vector_ar.hypothesis_test_results import CausalityTestResults, NormalityTestResults, WhitenessTestResults
+from statsmodels.tsa.vector_ar.hypothesis_test_results import (
+    CausalityTestResults,
+    NormalityTestResults,
+    WhitenessTestResults,
+)
 from statsmodels.tsa.vector_ar.irf import IRAnalysis
 from statsmodels.tsa.vector_ar.output import VARSummary

+# -------------------------------------------------------------------------------
+# VAR process routines
+

 def ma_rep(coefs, maxn=10):
-    """
-    MA(\\infty) representation of VAR(p) process
+    r"""
+    MA(\infty) representation of VAR(p) process

     Parameters
     ----------
@@ -41,19 +56,31 @@ def ma_rep(coefs, maxn=10):
     -----
     VAR(p) process as

-    .. math:: y_t = A_1 y_{t-1} + \\ldots + A_p y_{t-p} + u_t
+    .. math:: y_t = A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t

     can be equivalently represented as

-    .. math:: y_t = \\mu + \\sum_{i=0}^\\infty \\Phi_i u_{t-i}
+    .. math:: y_t = \mu + \sum_{i=0}^\infty \Phi_i u_{t-i}

-    e.g. can recursively compute the \\Phi_i matrices with \\Phi_0 = I_k
+    e.g. can recursively compute the \Phi_i matrices with \Phi_0 = I_k

     Returns
     -------
     phis : ndarray (maxn + 1 x k x k)
     """
-    pass
+    p, k, k = coefs.shape
+    phis = np.zeros((maxn + 1, k, k))
+    phis[0] = np.eye(k)
+
+    # recursively compute Phi matrices
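+    # The loop below implements (a sketch of the recursion in the docstring):
+    # Phi_i = sum_{j=1}^{min(i, p)} Phi_{i-j} A_j  with  Phi_0 = I_k.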
+    for i in range(1, maxn + 1):
+        for j in range(1, i + 1):
+            if j > p:
+                break
+
+            phis[i] += np.dot(phis[i - j], coefs[j - 1])
+
+    return phis


 def is_stable(coefs, verbose=False):
@@ -69,7 +96,15 @@ def is_stable(coefs, verbose=False):
     -------
     is_stable : bool
     """
-    pass
+    A_var1 = util.comp_matrix(coefs)
+    eigs = np.linalg.eigvals(A_var1)
+
+    if verbose:
+        print("Eigenvalues of VAR(1) rep")
+        for val in np.abs(eigs):
+            print(val)
+
+    return (np.abs(eigs) <= 1).all()


 def var_acf(coefs, sig_u, nlags=None):
@@ -94,7 +129,23 @@ def var_acf(coefs, sig_u, nlags=None):
     -------
     acf : ndarray, (p, k, k)
     """
-    pass
+    p, k, _ = coefs.shape
+    if nlags is None:
+        nlags = p
+
+    # p x k x k, ACF for lags 0, ..., p-1
+    result = np.zeros((nlags + 1, k, k))
+    result[:p] = _var_acf(coefs, sig_u)
+
+    # yule-walker equations
+    for h in range(p, nlags + 1):
+        # compute ACF for lag=h
+        # G(h) = A_1 G(h-1) + ... + A_p G(h-p)
+
+        for j in range(p):
+            result[h] += np.dot(coefs[j], result[h - j - 1])
+
+    return result


 def _var_acf(coefs, sig_u):
@@ -105,11 +156,26 @@ def _var_acf(coefs, sig_u):
     -----
     Lütkepohl (2005) p.29
     """
-    pass
+    p, k, k2 = coefs.shape
+    assert k == k2
+
+    A = util.comp_matrix(coefs)
+    # construct VAR(1) noise covariance
+    SigU = np.zeros((k * p, k * p))
+    SigU[:k, :k] = sig_u
+
+    # vec(ACF) = (I_(kp)^2 - kron(A, A))^-1 vec(Sigma_U)
+    vecACF = np.linalg.solve(np.eye((k * p) ** 2) - np.kron(A, A), vec(SigU))
+
+    acf = unvec(vecACF)
+    acf = [acf[:k, k * i : k * (i + 1)] for i in range(p)]
+    acf = np.array(acf)
+
+    return acf


 def forecast_cov(ma_coefs, sigma_u, steps):
-    """
+    r"""
     Compute theoretical forecast error variance matrices

     Parameters
@@ -119,13 +185,23 @@ def forecast_cov(ma_coefs, sigma_u, steps):

     Notes
     -----
-    .. math:: \\mathrm{MSE}(h) = \\sum_{i=0}^{h-1} \\Phi \\Sigma_u \\Phi^T
+    .. math:: \mathrm{MSE}(h) = \sum_{i=0}^{h-1} \Phi \Sigma_u \Phi^T

     Returns
     -------
     forc_covs : ndarray (steps x neqs x neqs)
     """
-    pass
+    neqs = len(sigma_u)
+    forc_covs = np.zeros((steps, neqs, neqs))
+
+    prior = np.zeros((neqs, neqs))
+    for h in range(steps):
+        # Sigma(h) = Sigma(h-1) + Phi Sig_u Phi'
+        phi = ma_coefs[h]
+        var = phi @ sigma_u @ phi.T
+        forc_covs[h] = prior = prior + var
+
+    return forc_covs


 mse = forecast_cov
@@ -151,7 +227,43 @@ def forecast(y, coefs, trend_coefs, steps, exog=None):
     -----
     Lütkepohl p. 37
     """
-    pass
+    p = len(coefs)
+    k = len(coefs[0])
+    if y.shape[0] < p:
+        raise ValueError(
+            f"y must by have at least order ({p}) observations. "
+            f"Got {y.shape[0]}."
+        )
+    # initial value
+    forcs = np.zeros((steps, k))
+    if exog is not None and trend_coefs is not None:
+        forcs += np.dot(exog, trend_coefs)
+    # to make existing code (with trend_coefs=intercept and without exog) work:
+    elif exog is None and trend_coefs is not None:
+        forcs += trend_coefs
+
+    # h=0 forecast should be latest observation
+    # forcs[0] = y[-1]
+
+    # make indices easier to think about
+    for h in range(1, steps + 1):
+        # y_t(h) = intercept + sum_1^p A_i y_t_(h-i)
+        f = forcs[h - 1]
+        for i in range(1, p + 1):
+            # slightly hackish
+            if h - i <= 0:
+                # e.g. when h=1, h-1 = 0, which is y[-1]
+                prior_y = y[h - i - 1]
+            else:
+                # e.g. when h=2, h-1=1, which is forcs[0]
+                prior_y = forcs[h - i - 1]
+
+            # i=1 is coefs[0]
+            f = f + np.dot(coefs[i - 1], prior_y)
+
+        forcs[h - 1] = f
+
+    return forcs


 def _forecast_vars(steps, ma_coefs, sig_u):
@@ -168,11 +280,31 @@ def _forecast_vars(steps, ma_coefs, sig_u):
     Returns
     -------
     """
-    pass
+    covs = mse(ma_coefs, sig_u, steps)
+    # Take diagonal for each cov
+    neqs = len(sig_u)
+    inds = np.arange(neqs)
+    return covs[:, inds, inds]
+
+
+def forecast_interval(
+    y, coefs, trend_coefs, sig_u, steps=5, alpha=0.05, exog=1
+):
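+    # Sketch of what this helper returns: point forecasts together with
+    # symmetric normal-approximation bands, point +/- q * sqrt(diag(MSE(h)))
+    # where q = norm.ppf(1 - alpha / 2).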
+    assert 0 < alpha < 1
+    q = util.norm_signif_level(alpha)
+
+    point_forecast = forecast(y, coefs, trend_coefs, steps, exog)
+    ma_coefs = ma_rep(coefs, steps)
+    sigma = np.sqrt(_forecast_vars(steps, ma_coefs, sig_u))
+
+    forc_lower = point_forecast - q * sigma
+    forc_upper = point_forecast + q * sigma
+
+    return point_forecast, forc_lower, forc_upper


 def var_loglike(resid, omega, nobs):
-    """
+    r"""
     Returns the value of the VAR(p) log-likelihood.

     Parameters
@@ -196,16 +328,70 @@ def var_loglike(resid, omega, nobs):

     .. math::

-        -\\left(\\frac{T}{2}\\right)
-        \\left(\\ln\\left|\\Omega\\right|-K\\ln\\left(2\\pi\\right)-K\\right)
+        -\left(\frac{T}{2}\right)
+        \left(\ln\left|\Omega\right|-K\ln\left(2\pi\right)-K\right)
     """
-    pass
+    logdet = logdet_symm(np.asarray(omega))
+    neqs = len(omega)
+    part1 = -(nobs * neqs / 2) * np.log(2 * np.pi)
+    part2 = -(nobs / 2) * (logdet + neqs)
+    return part1 + part2
+
+
+def _reordered(self, order):
+    # Create new arrays to hold rearranged results from .fit()
+    endog = self.endog
+    endog_lagged = self.endog_lagged
+    params = self.params
+    sigma_u = self.sigma_u
+    names = self.names
+    k_ar = self.k_ar
+    endog_new = np.zeros_like(endog)
+    endog_lagged_new = np.zeros_like(endog_lagged)
+    params_new_inc = np.zeros_like(params)
+    params_new = np.zeros_like(params)
+    sigma_u_new_inc = np.zeros_like(sigma_u)
+    sigma_u_new = np.zeros_like(sigma_u)
+    num_end = len(self.params[0])
+    names_new = []
+
+    # Rearrange elements and fill in new arrays
+    k = self.k_trend
+    for i, c in enumerate(order):
+        endog_new[:, i] = self.endog[:, c]
+        if k > 0:
+            params_new_inc[0, i] = params[0, i]
+            endog_lagged_new[:, 0] = endog_lagged[:, 0]
+        for j in range(k_ar):
+            params_new_inc[i + j * num_end + k, :] = self.params[
+                c + j * num_end + k, :
+            ]
+            endog_lagged_new[:, i + j * num_end + k] = endog_lagged[
+                :, c + j * num_end + k
+            ]
+        sigma_u_new_inc[i, :] = sigma_u[c, :]
+        names_new.append(names[c])
+    for i, c in enumerate(order):
+        params_new[:, i] = params_new_inc[:, c]
+        sigma_u_new[:, i] = sigma_u_new_inc[:, c]
+
+    return VARResults(
+        endog=endog_new,
+        endog_lagged=endog_lagged_new,
+        params=params_new,
+        sigma_u=sigma_u_new,
+        lag_order=self.k_ar,
+        model=self.model,
+        trend="c",
+        names=names_new,
+        dates=self.dates,
+    )


 def orth_ma_rep(results, maxn=10, P=None):
-    """Compute Orthogonalized MA coefficient matrices using P matrix such
-    that :math:`\\Sigma_u = PP^\\prime`. P defaults to the Cholesky
-    decomposition of :math:`\\Sigma_u`
+    r"""Compute Orthogonalized MA coefficient matrices using P matrix such
+    that :math:`\Sigma_u = PP^\prime`. P defaults to the Cholesky
+    decomposition of :math:`\Sigma_u`

     Parameters
     ----------
@@ -219,7 +405,11 @@ def orth_ma_rep(results, maxn=10, P=None):
     -------
     coefs : ndarray (maxn x neqs x neqs)
     """
-    pass
+    if P is None:
+        P = results._chol_sigma_u
+
+    ma_mats = results.ma_rep(maxn=maxn)
+    return np.array([np.dot(coefs, P) for coefs in ma_mats])


 def test_normality(results, signif=0.05):
@@ -250,7 +440,25 @@ def test_normality(results, signif=0.05):
        Normality in Autoregressions: Asymptotic Theory and Simulation
        Evidence." Journal of Business & Economic Statistics
     """
-    pass
+    resid_c = results.resid - results.resid.mean(0)
+    sig = np.dot(resid_c.T, resid_c) / results.nobs
+    Pinv = np.linalg.inv(np.linalg.cholesky(sig))
+
+    w = np.dot(Pinv, resid_c.T)
+    b1 = (w ** 3).sum(1)[:, None] / results.nobs
+    b2 = (w ** 4).sum(1)[:, None] / results.nobs - 3
+
+    lam_skew = results.nobs * np.dot(b1.T, b1) / 6
+    lam_kurt = results.nobs * np.dot(b2.T, b2) / 24
+
+    lam_omni = float(np.squeeze(lam_skew + lam_kurt))
+    omni_dist = stats.chi2(results.neqs * 2)
+    omni_pvalue = float(omni_dist.sf(lam_omni))
+    crit_omni = float(omni_dist.ppf(1 - signif))
+
+    return NormalityTestResults(
+        lam_omni, crit_omni, omni_pvalue, results.neqs * 2, signif
+    )


 class LagOrderResults:
@@ -277,27 +485,49 @@ class LagOrderResults:
     """

     def __init__(self, ics, selected_orders, vecm=False):
-        self.title = ('VECM' if vecm else 'VAR') + ' Order Selection'
-        self.title += ' (* highlights the minimums)'
+        self.title = ("VECM" if vecm else "VAR") + " Order Selection"
+        self.title += " (* highlights the minimums)"
         self.ics = ics
         self.selected_orders = selected_orders
         self.vecm = vecm
-        self.aic = selected_orders['aic']
-        self.bic = selected_orders['bic']
-        self.hqic = selected_orders['hqic']
-        self.fpe = selected_orders['fpe']
+        self.aic = selected_orders["aic"]
+        self.bic = selected_orders["bic"]
+        self.hqic = selected_orders["hqic"]
+        self.fpe = selected_orders["fpe"]
+
+    def summary(self):  # basically copied from (now deleted) print_ic_table()
+        cols = sorted(self.ics)  # ["aic", "bic", "hqic", "fpe"]
+        str_data = np.array(
+            [["%#10.4g" % v for v in self.ics[c]] for c in cols], dtype=object
+        ).T
+        # mark minimum with an asterisk
+        for i, col in enumerate(cols):
+            idx = int(self.selected_orders[col]), i
+            str_data[idx] += "*"
+        return SimpleTable(
+            str_data,
+            [col.upper() for col in cols],
+            lrange(len(str_data)),
+            title=self.title,
+        )

     def __str__(self):
         return (
-            f'<{self.__module__}.{self.__class__.__name__} object. Selected orders are: AIC -> {str(self.aic)}, BIC -> {str(self.bic)}, FPE -> {str(self.fpe)}, HQIC ->  {str(self.hqic)}>'
-            )
+            f"<{self.__module__}.{self.__class__.__name__} object. Selected "
+            f"orders are: AIC -> {str(self.aic)}, BIC -> {str(self.bic)}, "
+            f"FPE -> {str(self.fpe)}, HQIC ->  {str(self.hqic)}>"
+        )
+
+
+# -------------------------------------------------------------------------------
+# VARProcess class: for known or unknown VAR process


 class VAR(TimeSeriesModel):
-    """
+    r"""
     Fit VAR(p) process and do lag order selection

-    .. math:: y_t = A_1 y_{t-1} + \\ldots + A_p y_{t-p} + u_t
+    .. math:: y_t = A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t

     Parameters
     ----------
@@ -312,24 +542,81 @@ class VAR(TimeSeriesModel):
     ----------
     Lütkepohl (2005) New Introduction to Multiple Time Series Analysis
     """
-    y = deprecated_alias('y', 'endog', remove_version='0.11.0')

-    def __init__(self, endog, exog=None, dates=None, freq=None, missing='none'
-        ):
+    y = deprecated_alias("y", "endog", remove_version="0.11.0")
+
+    def __init__(
+        self, endog, exog=None, dates=None, freq=None, missing="none"
+    ):
         super().__init__(endog, exog, dates, freq, missing=missing)
         if self.endog.ndim == 1:
-            raise ValueError('Only gave one variable to VAR')
+            raise ValueError("Only gave one variable to VAR")
         self.neqs = self.endog.shape[1]
         self.n_totobs = len(endog)

-    def predict(self, params, start=None, end=None, lags=1, trend='c'):
+    def predict(self, params, start=None, end=None, lags=1, trend="c"):
         """
         Returns in-sample predictions or forecasts
         """
-        pass
+        params = np.array(params)
+
+        if start is None:
+            start = lags

-    def fit(self, maxlags: (int | None)=None, method='ols', ic=None, trend=
-        'c', verbose=False):
+        # Handle start, end
+        (
+            start,
+            end,
+            out_of_sample,
+            prediction_index,
+        ) = self._get_prediction_index(start, end)
+
+        if end < start:
+            raise ValueError("end is before start")
+        if end == start + out_of_sample:
+            return np.array([])
+
+        k_trend = util.get_trendorder(trend)
+        k = self.neqs
+        k_ar = lags
+
+        predictedvalues = np.zeros((end + 1 - start + out_of_sample, k))
+        if k_trend != 0:
+            intercept = params[:k_trend]
+            predictedvalues += intercept
+
+        y = self.endog
+        x = util.get_var_endog(y, lags, trend=trend, has_constant="raise")
+        fittedvalues = np.dot(x, params)
+
+        fv_start = start - k_ar
+        pv_end = min(len(predictedvalues), len(fittedvalues) - fv_start)
+        fv_end = min(len(fittedvalues), end - k_ar + 1)
+        predictedvalues[:pv_end] = fittedvalues[fv_start:fv_end]
+
+        if not out_of_sample:
+            return predictedvalues
+
+        # fit out of sample
+        y = y[-k_ar:]
+        coefs = params[k_trend:].reshape((k_ar, k, k)).swapaxes(1, 2)
+        predictedvalues[pv_end:] = forecast(y, coefs, intercept, out_of_sample)
+        return predictedvalues
+
+    def fit(
+        self,
+        maxlags: int | None = None,
+        method="ols",
+        ic=None,
+        trend="c",
+        verbose=False,
+    ):
+        # todo: this code is only supporting deterministic terms as exog.
+        # This means that all exog-variables have lag 0. If dealing with
+        # different exogs is necessary, a `lags_exog`-parameter might make
+        # sense (e.g. a sequence of ints specifying lags).
+        # Alternatively, leading zeros for exog-variables with smaller number
+        # of lags than the maximum number of exog-lags might work.
         """
         Fit the VAR model

@@ -364,9 +651,50 @@ class VAR(TimeSeriesModel):
         -----
         See Lütkepohl pp. 146-153 for implementation details.
         """
-        pass
+        lags = maxlags
+        if trend not in ["c", "ct", "ctt", "n"]:
+            raise ValueError("trend '{}' not supported for VAR".format(trend))
+
+        if ic is not None:
+            selections = self.select_order(maxlags=maxlags)
+            if not hasattr(selections, ic):
+                raise ValueError(
+                    "%s not recognized, must be among %s"
+                    % (ic, sorted(selections))
+                )
+            lags = getattr(selections, ic)
+            if verbose:
+                print(selections)
+                print("Using %d based on %s criterion" % (lags, ic))
+        else:
+            if lags is None:
+                lags = 1
+
+        k_trend = util.get_trendorder(trend)
+        orig_exog_names = self.exog_names
+        self.exog_names = util.make_lag_names(self.endog_names, lags, k_trend)
+        self.nobs = self.n_totobs - lags
+
+        # add exog to data.xnames (necessary because the length of xnames also
+        # determines the allowed size of VARResults.params)
+        if self.exog is not None:
+            if orig_exog_names:
+                x_names_to_add = orig_exog_names
+            else:
+                x_names_to_add = [
+                    ("exog%d" % i) for i in range(self.exog.shape[1])
+                ]
+            self.data.xnames = (
+                self.data.xnames[:k_trend]
+                + x_names_to_add
+                + self.data.xnames[k_trend:]
+            )
+        self.data.cov_names = pd.MultiIndex.from_product(
+            (self.data.xnames, self.data.ynames)
+        )
+        return self._estimate_var(lags, trend=trend)

-    def _estimate_var(self, lags, offset=0, trend='c'):
+    def _estimate_var(self, lags, offset=0, trend="c"):
         """
         lags : int
             Lags of the endogenous variable.
@@ -376,9 +704,79 @@ class VAR(TimeSeriesModel):
         trend : {str, None}
             As per above
         """
-        pass
+        # have to do this again because select_order does not call fit
+        self.k_trend = k_trend = util.get_trendorder(trend)
+
+        if offset < 0:  # pragma: no cover
+            raise ValueError("offset must be >= 0")
+
+        nobs = self.n_totobs - lags - offset
+        endog = self.endog[offset:]
+        exog = None if self.exog is None else self.exog[offset:]
+        z = util.get_var_endog(endog, lags, trend=trend, has_constant="raise")
+        if exog is not None:
+            # TODO: currently only deterministic terms supported (exoglags==0)
+            # and since exoglags==0, x will be an array of size 0.
+            x = util.get_var_endog(
+                exog[-nobs:], 0, trend="n", has_constant="raise"
+            )
+            x_inst = exog[-nobs:]
+            x = np.column_stack((x, x_inst))
+            del x_inst  # free memory
+            temp_z = z
+            z = np.empty((x.shape[0], x.shape[1] + z.shape[1]))
+            z[:, : self.k_trend] = temp_z[:, : self.k_trend]
+            z[:, self.k_trend : self.k_trend + x.shape[1]] = x
+            z[:, self.k_trend + x.shape[1] :] = temp_z[:, self.k_trend :]
+            del temp_z, x  # free memory
+        # the following modification of z is necessary to get the same results
+        # as JMulTi for the constant-term-parameter...
+        for i in range(self.k_trend):
+            if (np.diff(z[:, i]) == 1).all():  # modify the trend-column
+                z[:, i] += lags
+            # make the same adjustment for the quadratic term
+            if (np.diff(np.sqrt(z[:, i])) == 1).all():
+                z[:, i] = (np.sqrt(z[:, i]) + lags) ** 2
+
+        y_sample = endog[lags:]
+        # Lütkepohl p75, about 5x faster than stated formula
+        params = np.linalg.lstsq(z, y_sample, rcond=1e-15)[0]
+        resid = y_sample - np.dot(z, params)
+
+        # Unbiased estimate of covariance matrix $\Sigma_u$ of the white noise
+        # process $u$
+        # equivalent definition
+        # .. math:: \frac{1}{T - Kp - 1} Y^\prime (I_T - Z (Z^\prime Z)^{-1}
+        # Z^\prime) Y
+        # Ref: Lütkepohl p.75
+        # df_resid right now is T - Kp - 1, which is a suggested correction
+
+        avobs = len(y_sample)
+        if exog is not None:
+            k_trend += exog.shape[1]
+        df_resid = avobs - (self.neqs * lags + k_trend)
+
+        sse = np.dot(resid.T, resid)
+        if df_resid:
+            omega = sse / df_resid
+        else:
+            omega = np.full_like(sse, np.nan)
+
+        varfit = VARResults(
+            endog,
+            z,
+            params,
+            omega,
+            lags,
+            names=self.endog_names,
+            trend=trend,
+            dates=self.data.dates,
+            model=self,
+            exog=self.exog,
+        )
+        return VARResultsWrapper(varfit)

-    def select_order(self, maxlags=None, trend='c'):
+    def select_order(self, maxlags=None, trend="c"):
         """
         Compute lag order selections based on each of the available information
         criteria
@@ -397,15 +795,49 @@ class VAR(TimeSeriesModel):
         -------
         selections : LagOrderResults
         """
-        pass
+        ntrend = len(trend) if trend.startswith("c") else 0
+        max_estimable = (self.n_totobs - self.neqs - ntrend) // (1 + self.neqs)
+        if maxlags is None:
+            maxlags = int(round(12 * (len(self.endog) / 100.0) ** (1 / 4.0)))
+            # TODO: This expression shows up in a bunch of places, but
+            #  in some it is `int` and in others `np.ceil`.  Also in some
+            #  it multiplies by 4 instead of 12.  Let's put these all in
+            #  one place and document when to use which variant.
+
+            # Ensure enough obs to estimate model with maxlags
+            maxlags = min(maxlags, max_estimable)
+        else:
+            if maxlags > max_estimable:
+                raise ValueError(
+                    "maxlags is too large for the number of observations and "
+                    "the number of equations. The largest model cannot be "
+                    "estimated."
+                )
+
+        ics = defaultdict(list)
+        p_min = 0 if self.exog is not None or trend != "n" else 1
+        for p in range(p_min, maxlags + 1):
+            # exclude some initial periods so that the same amount of data is
+            # used to estimate each lag order
+            result = self._estimate_var(p, offset=maxlags - p, trend=trend)
+
+            for k, v in result.info_criteria.items():
+                ics[k].append(v)
+
+        selected_orders = dict(
+            (k, np.array(v).argmin() + p_min) for k, v in ics.items()
+        )
+
+        return LagOrderResults(ics, selected_orders, vecm=False)

     @classmethod
-    def from_formula(cls, formula, data, subset=None, drop_cols=None, *args,
-        **kwargs):
+    def from_formula(
+        cls, formula, data, subset=None, drop_cols=None, *args, **kwargs
+    ):
         """
         Not implemented. Formulas are not supported for VAR models.
         """
-        pass
+        raise NotImplementedError("formulas are not supported for VAR models.")


 class VARProcess:
@@ -429,24 +861,29 @@ class VARProcess:
         trend.
     """

-    def __init__(self, coefs, coefs_exog, sigma_u, names=None, _params_info
-        =None):
+    def __init__(
+        self, coefs, coefs_exog, sigma_u, names=None, _params_info=None
+    ):
         self.k_ar = len(coefs)
         self.neqs = coefs.shape[1]
         self.coefs = coefs
         self.coefs_exog = coefs_exog
+        # Note reshaping 1-D coefs_exog to 2_D makes unit tests fail
         self.sigma_u = sigma_u
         self.names = names
+
         if _params_info is None:
             _params_info = {}
-        self.k_exog_user = _params_info.get('k_exog_user', 0)
+        self.k_exog_user = _params_info.get("k_exog_user", 0)
         if self.coefs_exog is not None:
             k_ex = self.coefs_exog.shape[0] if self.coefs_exog.ndim != 1 else 1
             k_c = k_ex - self.k_exog_user
         else:
             k_c = 0
-        self.k_trend = _params_info.get('k_trend', k_c)
+        self.k_trend = _params_info.get("k_trend", k_c)
+        # TODO: we need to distinguish exog including trend and exog_user
         self.k_exog = self.k_trend + self.k_exog_user
+
         if self.k_trend > 0:
             if coefs_exog.ndim == 2:
                 self.intercept = coefs_exog[:, 0]
@@ -457,13 +894,16 @@ class VARProcess:

     def get_eq_index(self, name):
         """Return integer position of requested equation name"""
-        pass
+        return util.get_index(self.names, name)

     def __str__(self):
-        output = 'VAR(%d) process for %d-dimensional response y_t' % (self.
-            k_ar, self.neqs)
-        output += '\nstable: %s' % self.is_stable()
-        output += '\nmean: %s' % self.mean()
+        output = "VAR(%d) process for %d-dimensional response y_t" % (
+            self.k_ar,
+            self.neqs,
+        )
+        output += "\nstable: %s" % self.is_stable()
+        output += "\nmean: %s" % self.mean()
+
         return output

     def is_stable(self, verbose=False):
@@ -479,10 +919,9 @@ class VARProcess:
         Checks if det(I - Az) = 0 for any mod(z) <= 1, so all the eigenvalues of
         the companion matrix must lie outside the unit circle
         """
-        pass
+        return is_stable(self.coefs, verbose=verbose)

-    def simulate_var(self, steps=None, offset=None, seed=None,
-        initial_values=None, nsimulations=None):
+    def simulate_var(self, steps=None, offset=None, seed=None,
+                     initial_values=None, nsimulations=None):
         """
         simulate the VAR(p) process for the desired number of steps

@@ -519,29 +958,67 @@ class VARProcess:
             Endog of the simulated VAR process. Shape will be (nsimulations, steps, neqs)
             or (steps, neqs) if `nsimulations` is None.
         """
-        pass
+        steps_ = None
+        if offset is None:
+            if self.k_exog_user > 0 or self.k_trend > 1:
+                # if more than intercept
+                # endog_lagged contains all regressors, trend, exog_user
+                # and lagged endog, trimmed initial observations
+                offset = self.endog_lagged[:, : self.k_exog].dot(
+                    self.coefs_exog.T
+                )
+                steps_ = self.endog_lagged.shape[0]
+            else:
+                offset = self.intercept
+        else:
+            steps_ = offset.shape[0]
+
+        # default, but over written if exog or offset are used
+        if steps is None:
+            if steps_ is None:
+                steps = 1000
+            else:
+                steps = steps_
+        else:
+            if steps_ is not None and steps != steps_:
+                raise ValueError(
+                    "if exog or offset are used, then steps must"
+                    "be equal to their length or None"
+                )
+
+        y = util.varsim(
+            self.coefs,
+            offset,
+            self.sigma_u,
+            steps=steps,
+            seed=seed,
+            initial_values=initial_values,
+            nsimulations=nsimulations
+        )
+        return y

     def plotsim(self, steps=None, offset=None, seed=None):
         """
         Plot a simulation from the VAR(p) process for the desired number of
         steps
         """
-        pass
+        y = self.simulate_var(steps=steps, offset=offset, seed=seed)
+        return plotting.plot_mts(y)

     def intercept_longrun(self):
-        """
+        r"""
         Long run intercept of stable VAR process

         Lütkepohl eq. 2.1.23

-        .. math:: \\mu = (I - A_1 - \\dots - A_p)^{-1} \\alpha
+        .. math:: \mu = (I - A_1 - \dots - A_p)^{-1} \alpha

-        where \\alpha is the intercept (parameter of the constant)
+        where \alpha is the intercept (parameter of the constant)
         """
-        pass
+        return np.linalg.solve(self._char_mat, self.intercept)

     def mean(self):
-        """
+        r"""
         Long run intercept of stable VAR process

         Warning: trend and exog except for intercept are ignored for this.
@@ -549,15 +1026,15 @@ class VARProcess:

         Lütkepohl eq. 2.1.23

-        .. math:: \\mu = (I - A_1 - \\dots - A_p)^{-1} \\alpha
+        .. math:: \mu = (I - A_1 - \dots - A_p)^{-1} \alpha

-        where \\alpha is the intercept (parameter of the constant)
+        where \alpha is the intercept (parameter of the constant)
         """
-        pass
+        return self.intercept_longrun()

     def ma_rep(self, maxn=10):
-        """
-        Compute MA(:math:`\\infty`) coefficient matrices
+        r"""
+        Compute MA(:math:`\infty`) coefficient matrices

         Parameters
         ----------
@@ -568,13 +1045,13 @@ class VARProcess:
         -------
         coefs : ndarray (maxn x k x k)
         """
-        pass
+        return ma_rep(self.coefs, maxn=maxn)

     def orth_ma_rep(self, maxn=10, P=None):
-        """
+        r"""
         Compute orthogonalized MA coefficient matrices using P matrix such
-        that :math:`\\Sigma_u = PP^\\prime`. P defaults to the Cholesky
-        decomposition of :math:`\\Sigma_u`
+        that :math:`\Sigma_u = PP^\prime`. P defaults to the Cholesky
+        decomposition of :math:`\Sigma_u`

         Parameters
         ----------
@@ -587,21 +1064,25 @@ class VARProcess:
         -------
         coefs : ndarray (maxn x k x k)
         """
-        pass
+        return orth_ma_rep(self, maxn, P)

     def long_run_effects(self):
-        """Compute long-run effect of unit impulse
+        r"""Compute long-run effect of unit impulse

         .. math::

-            \\Psi_\\infty = \\sum_{i=0}^\\infty \\Phi_i
+            \Psi_\infty = \sum_{i=0}^\infty \Phi_i
         """
-        pass
+        return np.linalg.inv(self._char_mat)
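+        # Note (editorial): for a stable VAR the MA(infinity) coefficients sum
+        # to (I - A_1 - ... - A_p)^{-1}, i.e. the inverse of the characteristic
+        # matrix `_char_mat` defined below, which is what is returned here.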
+
+    @cache_readonly
+    def _chol_sigma_u(self):
+        return np.linalg.cholesky(self.sigma_u)

     @cache_readonly
     def _char_mat(self):
         """Characteristic matrix of the VAR"""
-        pass
+        return np.eye(self.neqs) - self.coefs.sum(0)

     def acf(self, nlags=None):
         """Compute theoretical autocovariance function
@@ -610,7 +1091,7 @@ class VARProcess:
         -------
         acf : ndarray (p x k x k)
         """
-        pass
+        return var_acf(self.coefs, self.sigma_u, nlags=nlags)

     def acorr(self, nlags=None):
         """
@@ -627,11 +1108,14 @@ class VARProcess:
         acorr : ndarray
             Autocorrelation and cross correlations (nlags, neqs, neqs)
         """
-        pass
+        return util.acf_to_acorr(self.acf(nlags=nlags))

     def plot_acorr(self, nlags=10, linewidth=8):
         """Plot theoretical autocorrelation function"""
-        pass
+        fig = plotting.plot_full_acorr(
+            self.acorr(nlags=nlags), linewidth=linewidth
+        )
+        return fig

     def forecast(self, y, steps, exog_future=None):
         """Produce linear minimum MSE forecasts for desired number of steps
@@ -650,10 +1134,51 @@ class VARProcess:
         -----
         Lütkepohl pp 37-38
         """
-        pass
+        if self.exog is None and exog_future is not None:
+            raise ValueError(
+                "No exog in model, so no exog_future supported "
+                "in forecast method."
+            )
+        if self.exog is not None and exog_future is None:
+            raise ValueError(
+                "Please provide an exog_future argument to "
+                "the forecast method."
+            )

+        exog_future = array_like(
+            exog_future, "exog_future", optional=True, ndim=2
+        )
+        if exog_future is not None:
+            if exog_future.shape[0] != steps:
+                err_msg = f"""\
+exog_future only has {exog_future.shape[0]} observations. It must have \
+steps ({steps}) observations.
+"""
+                raise ValueError(err_msg)
+        trend_coefs = None if self.coefs_exog.size == 0 else self.coefs_exog.T
+
+        exogs = []
+        if self.trend.startswith("c"):  # constant term
+            exogs.append(np.ones(steps))
+        exog_lin_trend = np.arange(
+            self.n_totobs + 1, self.n_totobs + 1 + steps
+        )
+        if "t" in self.trend:
+            exogs.append(exog_lin_trend)
+        if "tt" in self.trend:
+            exogs.append(exog_lin_trend ** 2)
+        if exog_future is not None:
+            exogs.append(exog_future)
+
+        if not exogs:
+            exog_future = None
+        else:
+            exog_future = np.column_stack(exogs)
+        return forecast(y, self.coefs, trend_coefs, steps, exog_future)
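+        # Example (illustrative sketch; `res` is a fitted VARResults and the
+        # last k_ar observations seed the forecast; exog_future is required
+        # only when the model was fit with exog):
+        #
+        #     fc = res.forecast(res.endog[-res.k_ar:], steps=5)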
+
+    # TODO: use `mse` module-level function?
     def mse(self, steps):
-        """
+        r"""
         Compute theoretical forecast error variance matrices

         Parameters
@@ -663,15 +1188,35 @@ class VARProcess:

         Notes
         -----
-        .. math:: \\mathrm{MSE}(h) = \\sum_{i=0}^{h-1} \\Phi \\Sigma_u \\Phi^T
+        .. math:: \mathrm{MSE}(h) = \sum_{i=0}^{h-1} \Phi_i \Sigma_u \Phi_i^T

         Returns
         -------
         forc_covs : ndarray (steps x neqs x neqs)
         """
-        pass
+        ma_coefs = self.ma_rep(steps)
+
+        k = len(self.sigma_u)
+        forc_covs = np.zeros((steps, k, k))
+
+        prior = np.zeros((k, k))
+        for h in range(steps):
+            # Sigma(h) = Sigma(h-1) + Phi Sig_u Phi'
+            phi = ma_coefs[h]
+            var = phi @ self.sigma_u @ phi.T
+            forc_covs[h] = prior = prior + var
+
+        return forc_covs
+
     forecast_cov = mse

+    def _forecast_vars(self, steps):
+        covs = self.forecast_cov(steps)
+
+        # Take diagonal for each cov
+        inds = np.arange(self.neqs)
+        return covs[:, inds, inds]
+
     def forecast_interval(self, y, steps, alpha=0.05, exog_future=None):
         """
         Construct forecast interval estimates assuming the y are Gaussian
@@ -703,11 +1248,33 @@ class VARProcess:
         -----
         Lütkepohl pp. 39-40
         """
-        pass
+        if not 0 < alpha < 1:
+            raise ValueError("alpha must be between 0 and 1")
+        q = util.norm_signif_level(alpha)
+
+        point_forecast = self.forecast(y, steps, exog_future=exog_future)
+        sigma = np.sqrt(self._forecast_vars(steps))
+
+        forc_lower = point_forecast - q * sigma
+        forc_upper = point_forecast + q * sigma
+
+        return point_forecast, forc_lower, forc_upper
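+        # Example (illustrative sketch, same hypothetical `res` as above):
+        #
+        #     mid, lower, upper = res.forecast_interval(
+        #         res.endog[-res.k_ar:], steps=10, alpha=0.05
+        #     )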

     def to_vecm(self):
         """to_vecm"""
-        pass
+        k = self.coefs.shape[1]
+        p = self.coefs.shape[0]
+        A = self.coefs
+        pi = -(np.identity(k) - np.sum(A, 0))
+        gamma = np.zeros((p - 1, k, k))
+        for i in range(p - 1):
+            gamma[i] = -(np.sum(A[i + 1 :], 0))
+        gamma = np.concatenate(gamma, 1)
+        return {"Gamma": gamma, "Pi": pi}
+
+
+# ----------------------------------------------------------------------------
+# VARResults class


 class VARResults(VARProcess):
@@ -751,22 +1318,41 @@ class VARResults(VARProcess):
     sigma_u : ndarray (K x K)
         Estimate of white noise process variance Var[u_t]
     """
-    _model_type = 'VAR'

-    def __init__(self, endog, endog_lagged, params, sigma_u, lag_order,
-        model=None, trend='c', names=None, dates=None, exog=None):
+    _model_type = "VAR"
+
+    def __init__(
+        self,
+        endog,
+        endog_lagged,
+        params,
+        sigma_u,
+        lag_order,
+        model=None,
+        trend="c",
+        names=None,
+        dates=None,
+        exog=None,
+    ):
+
         self.model = model
         self.endog = endog
         self.endog_lagged = endog_lagged
         self.dates = dates
+
         self.n_totobs, neqs = self.endog.shape
         self.nobs = self.n_totobs - lag_order
         self.trend = trend
         k_trend = util.get_trendorder(trend)
-        self.exog_names = util.make_lag_names(names, lag_order, k_trend,
-            model.data.orig_exog)
+        self.exog_names = util.make_lag_names(
+            names, lag_order, k_trend, model.data.orig_exog
+        )
         self.params = params
         self.exog = exog
+
+        # Initialize VARProcess parent class
+        # construct coefficient matrices
+        # Each matrix needs to be transposed
         endog_start = k_trend
         if exog is not None:
             k_exog_user = exog.shape[1]
@@ -775,52 +1361,68 @@ class VARResults(VARProcess):
             k_exog_user = 0
         reshaped = self.params[endog_start:]
         reshaped = reshaped.reshape((lag_order, neqs, neqs))
+        # Need to transpose each coefficient matrix
         coefs = reshaped.swapaxes(1, 2).copy()
+
         self.coefs_exog = params[:endog_start].T
         self.k_exog = self.coefs_exog.shape[1]
         self.k_exog_user = k_exog_user
-        _params_info = {'k_trend': k_trend, 'k_exog_user': k_exog_user,
-            'k_ar': lag_order}
-        super().__init__(coefs, self.coefs_exog, sigma_u, names=names,
-            _params_info=_params_info)
+
+        # maybe change to params class, distinguish exog_all versus exog_user
+        # see issue #4535
+        _params_info = {
+            "k_trend": k_trend,
+            "k_exog_user": k_exog_user,
+            "k_ar": lag_order,
+        }
+        super().__init__(
+            coefs,
+            self.coefs_exog,
+            sigma_u,
+            names=names,
+            _params_info=_params_info,
+        )

     def plot(self):
         """Plot input time series"""
-        pass
+        return plotting.plot_mts(
+            self.endog, names=self.names, index=self.dates
+        )

     @property
     def df_model(self):
         """
         Number of estimated parameters per variable, including the intercept / trends
         """
-        pass
+        return self.neqs * self.k_ar + self.k_exog

     @property
     def df_resid(self):
         """Number of observations minus number of estimated parameters"""
-        pass
+        return self.nobs - self.df_model

     @cache_readonly
     def fittedvalues(self):
         """
         The predicted insample values of the response variables of the model.
         """
-        pass
+        return np.dot(self.endog_lagged, self.params)

     @cache_readonly
     def resid(self):
         """
         Residuals of response variable resulting from estimated coefficients
         """
-        pass
+        return self.endog[self.k_ar :] - self.fittedvalues

     def sample_acov(self, nlags=1):
         """Sample acov"""
-        pass
+        return _compute_acov(self.endog[self.k_ar :], nlags=nlags)

     def sample_acorr(self, nlags=1):
         """Sample acorr"""
-        pass
+        acovs = self.sample_acov(nlags=nlags)
+        return _acovs_to_acorrs(acovs)

     def plot_sample_acorr(self, nlags=10, linewidth=8):
         """
@@ -839,7 +1441,10 @@ class VARResults(VARProcess):
         Figure
             The figure that contains the plot axes.
         """
-        pass
+        fig = plotting.plot_full_acorr(
+            self.sample_acorr(nlags=nlags), linewidth=linewidth
+        )
+        return fig

     def resid_acov(self, nlags=1):
         """
@@ -852,7 +1457,7 @@ class VARResults(VARProcess):
         Returns
         -------
         """
-        pass
+        return _compute_acov(self.resid, nlags=nlags)

     def resid_acorr(self, nlags=1):
         """
@@ -865,19 +1470,22 @@ class VARResults(VARProcess):
         Returns
         -------
         """
-        pass
+        acovs = self.resid_acov(nlags=nlags)
+        return _acovs_to_acorrs(acovs)

     @cache_readonly
     def resid_corr(self):
         """
         Centered residual correlation matrix
         """
-        pass
+        return self.resid_acorr(0)[0]

     @cache_readonly
     def sigma_u_mle(self):
         """(Biased) maximum likelihood estimate of noise process covariance"""
-        pass
+        if not self.df_resid:
+            return np.zeros_like(self.sigma_u)
+        return self.sigma_u * self.df_resid / self.nobs

     def cov_params(self):
         """Estimated variance-covariance of model coefficients
@@ -890,59 +1498,80 @@ class VARResults(VARProcess):
         Adjusted to be an unbiased estimator
         Ref: Lütkepohl p.74-75
         """
-        pass
+        z = self.endog_lagged
+        return np.kron(np.linalg.inv(z.T @ z), self.sigma_u)

     def cov_ybar(self):
-        """Asymptotically consistent estimate of covariance of the sample mean
+        r"""Asymptotically consistent estimate of covariance of the sample mean

         .. math::

-            \\sqrt(T) (\\bar{y} - \\mu) \\rightarrow
-                  {\\cal N}(0, \\Sigma_{\\bar{y}}) \\\\
+            \sqrt{T} (\bar{y} - \mu) \rightarrow
+                  {\cal N}(0, \Sigma_{\bar{y}}) \\

-            \\Sigma_{\\bar{y}} = B \\Sigma_u B^\\prime, \\text{where }
-                  B = (I_K - A_1 - \\cdots - A_p)^{-1}
+            \Sigma_{\bar{y}} = B \Sigma_u B^\prime, \text{where }
+                  B = (I_K - A_1 - \cdots - A_p)^{-1}

         Notes
         -----
         Lütkepohl Proposition 3.3
         """
-        pass
+
+        Ainv = np.linalg.inv(np.eye(self.neqs) - self.coefs.sum(0))
+        return Ainv @ self.sigma_u @ Ainv.T
+
+    # ------------------------------------------------------------
+    # Estimation-related things
+
+    @cache_readonly
+    def _zz(self):
+        # Z'Z
+        return np.dot(self.endog_lagged.T, self.endog_lagged)

     @property
     def _cov_alpha(self):
         """
         Estimated covariance matrix of model coefficients w/o exog
         """
-        pass
+        # drop exog
+        kn = self.k_exog * self.neqs
+        return self.cov_params()[kn:, kn:]

     @cache_readonly
     def _cov_sigma(self):
         """
         Estimated covariance matrix of vech(sigma_u)
         """
-        pass
+        D_K = tsa.duplication_matrix(self.neqs)
+        D_Kinv = np.linalg.pinv(D_K)
+
+        sigxsig = np.kron(self.sigma_u, self.sigma_u)
+        return 2 * D_Kinv @ sigxsig @ D_Kinv.T

     @cache_readonly
     def llf(self):
-        """Compute VAR(p) loglikelihood"""
-        pass
+        "Compute VAR(p) loglikelihood"
+        return var_loglike(self.resid, self.sigma_u_mle, self.nobs)

     @cache_readonly
     def stderr(self):
         """Standard errors of coefficients, reshaped to match in size"""
-        pass
-    bse = stderr
+        stderr = np.sqrt(np.diag(self.cov_params()))
+        return stderr.reshape((self.df_model, self.neqs), order="C")
+
+    bse = stderr  # statsmodels interface?

     @cache_readonly
     def stderr_endog_lagged(self):
         """Stderr_endog_lagged"""
-        pass
+        start = self.k_exog
+        return self.stderr[start:]

     @cache_readonly
     def stderr_dt(self):
         """Stderr_dt"""
-        pass
+        end = self.k_exog
+        return self.stderr[:end]

     @cache_readonly
     def tvalues(self):
@@ -950,43 +1579,63 @@ class VARResults(VARProcess):
         Compute t-statistics. Use Student-t(T - Kp - 1) = t(df_resid) to
         test significance.
         """
-        pass
+        return self.params / self.stderr

     @cache_readonly
     def tvalues_endog_lagged(self):
         """tvalues_endog_lagged"""
-        pass
+        start = self.k_exog
+        return self.tvalues[start:]

     @cache_readonly
     def tvalues_dt(self):
         """tvalues_dt"""
-        pass
+        end = self.k_exog
+        return self.tvalues[:end]

     @cache_readonly
     def pvalues(self):
         """
         Two-sided p-values for model coefficients from Student t-distribution
         """
-        pass
+        # return stats.t.sf(np.abs(self.tvalues), self.df_resid)*2
+        return 2 * stats.norm.sf(np.abs(self.tvalues))

     @cache_readonly
     def pvalues_endog_lagged(self):
         """pvalues_endog_laggd"""
-        pass
+        start = self.k_exog
+        return self.pvalues[start:]

     @cache_readonly
     def pvalues_dt(self):
         """pvalues_dt"""
-        pass
+        end = self.k_exog
+        return self.pvalues[:end]
+
+    # todo: ------------------------------------------------------------------

     def plot_forecast(self, steps, alpha=0.05, plot_stderr=True):
         """
         Plot forecast
         """
-        pass
+        mid, lower, upper = self.forecast_interval(
+            self.endog[-self.k_ar :], steps, alpha=alpha
+        )
+        fig = plotting.plot_var_forc(
+            self.endog,
+            mid,
+            lower,
+            upper,
+            names=self.names,
+            plot_stderr=plot_stderr,
+        )
+        return fig
+
+    # Forecast error covariance functions

-    def forecast_cov(self, steps=1, method='mse'):
-        """Compute forecast covariance matrices for desired number of steps
+    def forecast_cov(self, steps=1, method="mse"):
+        r"""Compute forecast covariance matrices for desired number of steps

         Parameters
         ----------
@@ -994,7 +1643,7 @@ class VARResults(VARProcess):

         Notes
         -----
-        .. math:: \\Sigma_{\\hat y}(h) = \\Sigma_y(h) + \\Omega(h) / T
+        .. math:: \Sigma_{\hat y}(h) = \Sigma_y(h) + \Omega(h) / T

         Ref: Lütkepohl pp. 96-97

@@ -1002,10 +1651,36 @@ class VARResults(VARProcess):
         -------
         covs : ndarray (steps x k x k)
         """
-        pass
-
-    def irf_errband_mc(self, orth=False, repl=1000, steps=10, signif=0.05,
-        seed=None, burn=100, cum=False):
+        fc_cov = self.mse(steps)
+        if method == "mse":
+            pass
+        elif method == "auto":
+            if self.k_exog == 1 and self.k_trend < 2:
+                # currently only supported if no exog and trend in ['n', 'c']
+                fc_cov += self._omega_forc_cov(steps) / self.nobs
+                import warnings
+
+                warnings.warn(
+                    "forecast cov takes parameter uncertainty into" "account",
+                    OutputWarning,
+                    stacklevel = 2,
+                )
+        else:
+            raise ValueError("method has to be either 'mse' or 'auto'")
+
+        return fc_cov
+
+    # Monte Carlo irf standard errors
+    def irf_errband_mc(
+        self,
+        orth=False,
+        repl=1000,
+        steps=10,
+        signif=0.05,
+        seed=None,
+        burn=100,
+        cum=False,
+    ):
         """
         Compute Monte Carlo integrated error bands assuming normally
         distributed for impulse response functions
@@ -1035,10 +1710,21 @@ class VARResults(VARProcess):
         -------
         Tuple of lower and upper arrays of ma_rep monte carlo standard errors
         """
-        pass
+        ma_coll = self.irf_resim(
+            orth=orth, repl=repl, steps=steps, seed=seed, burn=burn, cum=cum
+        )
+
+        ma_sort = np.sort(ma_coll, axis=0)  # sort to get quantiles
+        # python 2: round returns float
+        low_idx = int(round(signif / 2 * repl) - 1)
+        upp_idx = int(round((1 - signif / 2) * repl) - 1)
+        lower = ma_sort[low_idx, :, :, :]
+        upper = ma_sort[upp_idx, :, :, :]
+        return lower, upper

-    def irf_resim(self, orth=False, repl=1000, steps=10, seed=None, burn=
-        100, cum=False):
+    def irf_resim(
+        self, orth=False, repl=1000, steps=10, seed=None, burn=100, cum=False
+    ):
         """
         Simulates impulse response function, returning an array of simulations.
         Used for Sims-Zha error band calculation.
@@ -1069,7 +1755,90 @@ class VARResults(VARProcess):
         -------
         Array of simulated impulse response functions
         """
-        pass
+        neqs = self.neqs
+        k_ar = self.k_ar
+        coefs = self.coefs
+        sigma_u = self.sigma_u
+        intercept = self.intercept
+        nobs = self.nobs
+        nobs_original = nobs + k_ar
+
+        ma_coll = np.zeros((repl, steps + 1, neqs, neqs))
+
+        def fill_coll(sim):
+            ret = VAR(sim, exog=self.exog).fit(maxlags=k_ar, trend=self.trend)
+            ret = (
+                ret.orth_ma_rep(maxn=steps) if orth else ret.ma_rep(maxn=steps)
+            )
+            return ret.cumsum(axis=0) if cum else ret
+
+        for i in range(repl):
+            # discard the first `burn` observations to correct for starting bias
+            sim = util.varsim(
+                coefs,
+                intercept,
+                sigma_u,
+                seed=seed,
+                steps=nobs_original + burn,
+            )
+            sim = sim[burn:]
+            ma_coll[i, :, :, :] = fill_coll(sim)
+
+        return ma_coll
+
+    def _omega_forc_cov(self, steps):
+        # Approximate MSE matrix \Omega(h) as defined in Lut p97
+        G = self._zz
+        Ginv = np.linalg.inv(G)
+
+        # memoize powers of B for speedup
+        # TODO: see if can memoize better
+        # TODO: much lower-hanging fruit in caching `np.trace` below.
+        B = self._bmat_forc_cov()
+        _B = {}
+
+        def bpow(i):
+            if i not in _B:
+                _B[i] = np.linalg.matrix_power(B, i)
+
+            return _B[i]
+
+        phis = self.ma_rep(steps)
+        sig_u = self.sigma_u
+
+        omegas = np.zeros((steps, self.neqs, self.neqs))
+        for h in range(1, steps + 1):
+            if h == 1:
+                omegas[h - 1] = self.df_model * self.sigma_u
+                continue
+
+            om = omegas[h - 1]
+            for i in range(h):
+                for j in range(h):
+                    Bi = bpow(h - 1 - i)
+                    Bj = bpow(h - 1 - j)
+                    mult = np.trace(Bi.T @ Ginv @ Bj @ G)
+                    om += mult * phis[i] @ sig_u @ phis[j].T
+            omegas[h - 1] = om
+
+        return omegas
+
+    def _bmat_forc_cov(self):
+        # B as defined on p. 96 of Lut
+        upper = np.zeros((self.k_exog, self.df_model))
+        upper[:, : self.k_exog] = np.eye(self.k_exog)
+
+        lower_dim = self.neqs * (self.k_ar - 1)
+        eye = np.eye(lower_dim)
+        lower = np.column_stack(
+            (
+                np.zeros((lower_dim, self.k_exog)),
+                eye,
+                np.zeros((lower_dim, self.neqs)),
+            )
+        )
+
+        return np.vstack((upper, self.params.T, lower))

     def summary(self):
         """Compute console output summary of estimates
@@ -1078,7 +1847,7 @@ class VARResults(VARProcess):
         -------
         summary : VARSummary
         """
-        pass
+        return VARSummary(self)

     def irf(self, periods=10, var_decomp=None, var_order=None):
         """Analyze impulse responses to shocks in system
@@ -1096,7 +1865,12 @@ class VARResults(VARProcess):
         -------
         irf : IRAnalysis
         """
-        pass
+        if var_order is not None:
+            raise NotImplementedError(
+                "alternate variable order not implemented" " (yet)"
+            )
+
+        return IRAnalysis(self, P=var_decomp, periods=periods)

     def fevd(self, periods=10, var_decomp=None):
         """
@@ -1106,13 +1880,28 @@ class VARResults(VARProcess):
         -------
         fevd : FEVD instance
         """
-        pass
+        return FEVD(self, P=var_decomp, periods=periods)

     def reorder(self, order):
         """Reorder variables for structural specification"""
-        pass
+        if len(order) != len(self.params[0, :]):
+            raise ValueError(
+                "Reorder specification length should match "
+                "number of endogenous variables"
+            )
+        # This converts order to list of integers if given as strings
+        if isinstance(order[0], str):
+            order_new = []
+            for i, nam in enumerate(order):
+                order_new.append(self.names.index(order[i]))
+            order = order_new
+        return _reordered(self, order)
+
+    # -----------------------------------------------------------
+    # VAR Diagnostics: Granger-causality, whiteness of residuals,
+    #                  normality, etc

-    def test_causality(self, caused, causing=None, kind='f', signif=0.05):
+    def test_causality(self, caused, causing=None, kind="f", signif=0.05):
         """
         Test Granger causality

@@ -1161,7 +1950,86 @@ class VARResults(VARProcess):
         .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series*
            *Analysis*. Springer.
         """
-        pass
+        if not (0 < signif < 1):
+            raise ValueError("signif has to be between 0 and 1")
+
+        allowed_types = (str, int)
+
+        if isinstance(caused, allowed_types):
+            caused = [caused]
+        if not all(isinstance(c, allowed_types) for c in caused):
+            raise TypeError(
+                "caused has to be of type string or int (or a "
+                "sequence of these types)."
+            )
+        caused = [self.names[c] if type(c) is int else c for c in caused]
+        caused_ind = [util.get_index(self.names, c) for c in caused]
+
+        if causing is not None:
+
+            if isinstance(causing, allowed_types):
+                causing = [causing]
+            if not all(isinstance(c, allowed_types) for c in causing):
+                raise TypeError(
+                    "causing has to be of type string or int (or "
+                    "a sequence of these types) or None."
+                )
+            causing = [self.names[c] if type(c) is int else c for c in causing]
+            causing_ind = [util.get_index(self.names, c) for c in causing]
+        else:
+            causing_ind = [i for i in range(self.neqs) if i not in caused_ind]
+            causing = [self.names[c] for c in causing_ind]
+
+        k, p = self.neqs, self.k_ar
+        if p == 0:
+            err = "Cannot test Granger Causality in a model with 0 lags."
+            raise RuntimeError(err)
+
+        # number of restrictions
+        num_restr = len(causing) * len(caused) * p
+        num_det_terms = self.k_exog
+
+        # Make restriction matrix
+        C = np.zeros((num_restr, k * num_det_terms + k ** 2 * p), dtype=float)
+        cols_det = k * num_det_terms
+        row = 0
+        for j in range(p):
+            for ing_ind in causing_ind:
+                for ed_ind in caused_ind:
+                    C[row, cols_det + ed_ind + k * ing_ind + k ** 2 * j] = 1
+                    row += 1
+
+        # Lütkepohl 3.6.5
+        Cb = np.dot(C, vec(self.params.T))
+        middle = np.linalg.inv(C @ self.cov_params() @ C.T)
+
+        # wald statistic
+        lam_wald = statistic = Cb @ middle @ Cb
+
+        if kind.lower() == "wald":
+            df = num_restr
+            dist = stats.chi2(df)
+        elif kind.lower() == "f":
+            statistic = lam_wald / num_restr
+            df = (num_restr, k * self.df_resid)
+            dist = stats.f(*df)
+        else:
+            raise ValueError("kind %s not recognized" % kind)
+
+        pvalue = dist.sf(statistic)
+        crit_value = dist.ppf(1 - signif)
+
+        return CausalityTestResults(
+            causing,
+            caused,
+            statistic,
+            crit_value,
+            pvalue,
+            df,
+            signif,
+            test="granger",
+            method=kind,
+        )
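+        # Example (illustrative sketch; variable names are placeholders and
+        # `res` is a fitted VARResults). The returned CausalityTestResults
+        # should be summarizable via its summary() method:
+        #
+        #     gc = res.test_causality("y1", causing=["y2"], kind="f")
+        #     gc.summary()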

     def test_inst_causality(self, causing, signif=0.05):
         """
@@ -1226,7 +2094,65 @@ class VARResults(VARProcess):
         .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series*
            *Analysis*. Springer.
         """
-        pass
+        if not (0 < signif < 1):
+            raise ValueError("signif has to be between 0 and 1")
+
+        allowed_types = (str, int)
+        if isinstance(causing, allowed_types):
+            causing = [causing]
+        if not all(isinstance(c, allowed_types) for c in causing):
+            raise TypeError(
+                "causing has to be of type string or int (or a "
+                + "a sequence of these types)."
+            )
+        causing = [self.names[c] if type(c) is int else c for c in causing]
+        causing_ind = [util.get_index(self.names, c) for c in causing]
+
+        caused_ind = [i for i in range(self.neqs) if i not in causing_ind]
+        caused = [self.names[c] for c in caused_ind]
+
+        # Note: JMulTi seems to be using k_ar+1 instead of k_ar
+        k, t, p = self.neqs, self.nobs, self.k_ar
+
+        num_restr = len(causing) * len(caused)  # called N in Lütkepohl
+
+        sigma_u = self.sigma_u
+        vech_sigma_u = util.vech(sigma_u)
+        sig_mask = np.zeros(sigma_u.shape)
+        # set =1 twice to ensure that all the ones needed are below the main
+        # diagonal:
+        sig_mask[causing_ind, caused_ind] = 1
+        sig_mask[caused_ind, causing_ind] = 1
+        vech_sig_mask = util.vech(sig_mask)
+        inds = np.nonzero(vech_sig_mask)[0]
+
+        # Make restriction matrix
+        C = np.zeros((num_restr, len(vech_sigma_u)), dtype=float)
+        for row in range(num_restr):
+            C[row, inds[row]] = 1
+        Cs = np.dot(C, vech_sigma_u)
+        d = np.linalg.pinv(duplication_matrix(k))
+        Cd = np.dot(C, d)
+        middle = np.linalg.inv(Cd @ np.kron(sigma_u, sigma_u) @ Cd.T) / 2
+
+        wald_statistic = t * (Cs.T @ middle @ Cs)
+        df = num_restr
+        dist = stats.chi2(df)
+
+        pvalue = dist.sf(wald_statistic)
+        crit_value = dist.ppf(1 - signif)
+
+        return CausalityTestResults(
+            causing,
+            caused,
+            wald_statistic,
+            crit_value,
+            pvalue,
+            df,
+            signif,
+            test="inst",
+            method="wald",
+        )

     def test_whiteness(self, nlags=10, signif=0.05, adjusted=False):
         """
@@ -1257,14 +2183,38 @@ class VARResults(VARProcess):
         .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series*
            *Analysis*. Springer.
         """
-        pass
+        if nlags - self.k_ar <= 0:
+            raise ValueError(
+                "The whiteness test can only be used when nlags "
+                "is larger than the number of lags included in "
+                f"the model ({self.k_ar})."
+            )
+        statistic = 0
+        u = np.asarray(self.resid)
+        acov_list = _compute_acov(u, nlags)
+        cov0_inv = np.linalg.inv(acov_list[0])
+        for t in range(1, nlags + 1):
+            ct = acov_list[t]
+            to_add = np.trace(ct.T @ cov0_inv @ ct @ cov0_inv)
+            if adjusted:
+                to_add /= self.nobs - t
+            statistic += to_add
+        statistic *= self.nobs ** 2 if adjusted else self.nobs
+        df = self.neqs ** 2 * (nlags - self.k_ar)
+        dist = stats.chi2(df)
+        pvalue = dist.sf(statistic)
+        crit_value = dist.ppf(1 - signif)
+
+        return WhitenessTestResults(
+            statistic, crit_value, pvalue, df, signif, nlags, adjusted
+        )
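+        # Example (illustrative sketch; nlags must exceed the model's k_ar,
+        # as enforced above):
+        #
+        #     wh = res.test_whiteness(nlags=12, adjusted=True)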

     def plot_acorr(self, nlags=10, resid=True, linewidth=8):
-        """
+        r"""
         Plot autocorrelation of sample (endog) or residuals

         Sample (Y) or Residual autocorrelations are plotted together with the
-        standard :math:`2 / \\sqrt{T}` bounds.
+        standard :math:`2 / \sqrt{T}` bounds.

         Parameters
         ----------
@@ -1281,7 +2231,21 @@ class VARResults(VARProcess):
         Figure
             Figure instance containing the plot.
         """
-        pass
+        if resid:
+            acorrs = self.resid_acorr(nlags)
+        else:
+            acorrs = self.sample_acorr(nlags)
+
+        bound = 2 / np.sqrt(self.nobs)
+
+        fig = plotting.plot_full_acorr(
+            acorrs[1:],
+            xlabel=np.arange(1, nlags + 1),
+            err_bound=bound,
+            linewidth=linewidth,
+        )
+        fig.suptitle(r"ACF plots for residuals with $2 / \sqrt{T}$ bounds ")
+        return fig

     def test_normality(self, signif=0.05):
         """
@@ -1301,29 +2265,48 @@ class VARResults(VARProcess):
         -----
         H0 (null) : data are generated by a Gaussian-distributed process
         """
-        pass
+        return test_normality(self, signif=signif)

     @cache_readonly
     def detomega(self):
-        """
+        r"""
         Return determinant of white noise covariance with degrees of freedom
         correction:

         .. math::

-            \\hat \\Omega = \\frac{T}{T - Kp - 1} \\hat \\Omega_{\\mathrm{MLE}}
+            \hat \Omega = \frac{T}{T - Kp - 1} \hat \Omega_{\mathrm{MLE}}
         """
-        pass
+        return np.linalg.det(self.sigma_u)

     @cache_readonly
     def info_criteria(self):
-        """information criteria for lagorder selection"""
-        pass
+        "information criteria for lagorder selection"
+        nobs = self.nobs
+        neqs = self.neqs
+        lag_order = self.k_ar
+        free_params = lag_order * neqs ** 2 + neqs * self.k_exog
+        if self.df_resid:
+            ld = logdet_symm(self.sigma_u_mle)
+        else:
+            ld = -np.inf
+
+        # See Lütkepohl pp. 146-150
+
+        aic = ld + (2.0 / nobs) * free_params
+        bic = ld + (np.log(nobs) / nobs) * free_params
+        hqic = ld + (2.0 * np.log(np.log(nobs)) / nobs) * free_params
+        if self.df_resid:
+            fpe = ((nobs + self.df_model) / self.df_resid) ** neqs * np.exp(ld)
+        else:
+            fpe = np.inf
+
+        return {"aic": aic, "bic": bic, "hqic": hqic, "fpe": fpe}

     @property
     def aic(self):
         """Akaike information criterion"""
-        pass
+        return self.info_criteria["aic"]

     @property
     def fpe(self):
@@ -1331,17 +2314,17 @@ class VARResults(VARProcess):

         Lütkepohl p. 147, see info_criteria
         """
-        pass
+        return self.info_criteria["fpe"]

     @property
     def hqic(self):
         """Hannan-Quinn criterion"""
-        pass
+        return self.info_criteria["hqic"]

     @property
     def bic(self):
         """Bayesian a.k.a. Schwarz info criterion"""
-        pass
+        return self.info_criteria["bic"]

     @cache_readonly
     def roots(self):
@@ -1351,21 +2334,38 @@ class VARResults(VARProcess):
         Note that the inverse roots are returned, and stability requires that
         the roots lie outside the unit circle.
         """
-        pass
+        neqs = self.neqs
+        k_ar = self.k_ar
+        p = neqs * k_ar
+        arr = np.zeros((p, p))
+        arr[:neqs, :] = np.column_stack(self.coefs)
+        arr[neqs:, :-neqs] = np.eye(p - neqs)
+        roots = np.linalg.eig(arr)[0] ** -1
+        idx = np.argsort(np.abs(roots))[::-1]  # sort by reverse modulus
+        return roots[idx]


 class VARResultsWrapper(wrap.ResultsWrapper):
-    _attrs = {'bse': 'columns_eq', 'cov_params': 'cov', 'params':
-        'columns_eq', 'pvalues': 'columns_eq', 'tvalues': 'columns_eq',
-        'sigma_u': 'cov_eq', 'sigma_u_mle': 'cov_eq', 'stderr': 'columns_eq'}
-    _wrap_attrs = wrap.union_dicts(TimeSeriesResultsWrapper._wrap_attrs, _attrs
-        )
-    _methods = {'conf_int': 'multivariate_confint'}
-    _wrap_methods = wrap.union_dicts(TimeSeriesResultsWrapper._wrap_methods,
-        _methods)
-
-
-wrap.populate_wrapper(VARResultsWrapper, VARResults)
+    _attrs = {
+        "bse": "columns_eq",
+        "cov_params": "cov",
+        "params": "columns_eq",
+        "pvalues": "columns_eq",
+        "tvalues": "columns_eq",
+        "sigma_u": "cov_eq",
+        "sigma_u_mle": "cov_eq",
+        "stderr": "columns_eq",
+    }
+    _wrap_attrs = wrap.union_dicts(
+        TimeSeriesResultsWrapper._wrap_attrs, _attrs
+    )
+    _methods = {"conf_int": "multivariate_confint"}
+    _wrap_methods = wrap.union_dicts(
+        TimeSeriesResultsWrapper._wrap_methods, _methods
+    )
+
+
+wrap.populate_wrapper(VARResultsWrapper, VARResults)  # noqa:E305


 class FEVD:
@@ -1376,26 +2376,48 @@ class FEVD:

     def __init__(self, model, P=None, periods=None):
         self.periods = periods
+
         self.model = model
         self.neqs = model.neqs
         self.names = model.model.endog_names
+
         self.irfobj = model.irf(var_decomp=P, periods=periods)
         self.orth_irfs = self.irfobj.orth_irfs
+
+        # cumulative impulse responses
         irfs = (self.orth_irfs[:periods] ** 2).cumsum(axis=0)
+
         rng = lrange(self.neqs)
         mse = self.model.mse(periods)[:, rng, rng]
+
+        # lag x equation x component
         fevd = np.empty_like(irfs)
+
         for i in range(periods):
             fevd[i] = (irfs[i].T / mse[i]).T
+
+        # switch to equation x lag x component
         self.decomp = fevd.swapaxes(0, 1)

+    def summary(self):
+        buf = StringIO()
+
+        rng = lrange(self.periods)
+        for i in range(self.neqs):
+            ppm = output.pprint_matrix(self.decomp[i], rng, self.names)
+
+            buf.write("FEVD for %s\n" % self.names[i])
+            buf.write(ppm + "\n")
+
+        print(buf.getvalue())
+
     def cov(self):
         """Compute asymptotic standard errors

         Returns
         -------
         """
-        pass
+        raise NotImplementedError

     def plot(self, periods=None, figsize=(10, 10), **plot_kwds):
         """Plot graphical display of FEVD
@@ -1405,4 +2427,68 @@ class FEVD:
         periods : int, default None
             Defaults to number originally specified. Can be at most that number
         """
-        pass
+        import matplotlib.pyplot as plt
+
+        k = self.neqs
+        periods = periods or self.periods
+
+        fig, axes = plt.subplots(nrows=k, figsize=figsize)
+
+        fig.suptitle("Forecast error variance decomposition (FEVD)")
+
+        colors = [str(c) for c in np.arange(k, dtype=float) / k]
+        ticks = np.arange(periods)
+
+        limits = self.decomp.cumsum(2)
+        ax = axes[0]
+        for i in range(k):
+            ax = axes[i]
+
+            this_limits = limits[i].T
+
+            handles = []
+
+            for j in range(k):
+                lower = this_limits[j - 1] if j > 0 else 0
+                upper = this_limits[j]
+                handle = ax.bar(
+                    ticks,
+                    upper - lower,
+                    bottom=lower,
+                    color=colors[j],
+                    label=self.names[j],
+                    **plot_kwds,
+                )
+
+                handles.append(handle)
+
+            ax.set_title(self.names[i])
+
+        # just use the last axis to get handles for plotting
+        handles, labels = ax.get_legend_handles_labels()
+        fig.legend(handles, labels, loc="upper right")
+        plotting.adjust_subplots(right=0.85)
+        return fig
+
+
+# -------------------------------------------------------------------------------
+
+
+def _compute_acov(x, nlags=1):
+    x = x - x.mean(0)
+
+    result = []
+    for lag in range(nlags + 1):
+        if lag > 0:
+            r = np.dot(x[lag:].T, x[:-lag])
+        else:
+            r = np.dot(x.T, x)
+
+        result.append(r)
+
+    return np.array(result) / len(x)
+
+
+def _acovs_to_acorrs(acovs):
+    sd = np.sqrt(np.diag(acovs[0]))
+    return acovs / np.outer(sd, sd)
diff --git a/statsmodels/tsa/vector_ar/vecm.py b/statsmodels/tsa/vector_ar/vecm.py
index 48e6a39e2..99b75fa28 100644
--- a/statsmodels/tsa/vector_ar/vecm.py
+++ b/statsmodels/tsa/vector_ar/vecm.py
@@ -1,9 +1,12 @@
+# -*- coding: utf-8 -*-
 from collections import defaultdict
+
 import numpy as np
 from numpy import hstack, vstack
 from numpy.linalg import inv, svd
 import scipy
 import scipy.stats
+
 from statsmodels.iolib.summary import Summary
 from statsmodels.iolib.table import SimpleTable
 from statsmodels.tools.decorators import cache_readonly
@@ -12,15 +15,33 @@ from statsmodels.tools.validation import string_like
 import statsmodels.tsa.base.tsa_model as tsbase
 from statsmodels.tsa.coint_tables import c_sja, c_sjt
 from statsmodels.tsa.tsatools import duplication_matrix, lagmat, vec
-from statsmodels.tsa.vector_ar.hypothesis_test_results import CausalityTestResults, WhitenessTestResults
+from statsmodels.tsa.vector_ar.hypothesis_test_results import (
+    CausalityTestResults,
+    WhitenessTestResults,
+)
 import statsmodels.tsa.vector_ar.irf as irf
 import statsmodels.tsa.vector_ar.plotting as plot
 from statsmodels.tsa.vector_ar.util import get_index, seasonal_dummies
-from statsmodels.tsa.vector_ar.var_model import VAR, LagOrderResults, _compute_acov, forecast, forecast_interval, ma_rep, orth_ma_rep, test_normality
-
-
-def select_order(data, maxlags: int, deterministic: str='n', seasons: int=0,
-    exog=None, exog_coint=None):
+from statsmodels.tsa.vector_ar.var_model import (
+    VAR,
+    LagOrderResults,
+    _compute_acov,
+    forecast,
+    forecast_interval,
+    ma_rep,
+    orth_ma_rep,
+    test_normality,
+)
+
+
+def select_order(
+    data,
+    maxlags: int,
+    deterministic: str = "n",
+    seasons: int = 0,
+    exog=None,
+    exog_coint=None,
+):
     """
     Compute lag order selections based on each of the available information
     criteria.
@@ -53,7 +74,38 @@ def select_order(data, maxlags: int, deterministic: str='n', seasons: int=0,
     -------
     selected_orders : :class:`statsmodels.tsa.vector_ar.var_model.LagOrderResults`
     """
-    pass
+    ic = defaultdict(list)
+    deterministic = string_like(deterministic, "deterministic")
+    for p in range(1, maxlags + 2):  # +2 because k_ar_VECM == k_ar_VAR - 1
+        exogs = []
+        if "co" in deterministic or "ci" in deterministic:
+            exogs.append(np.ones(len(data)).reshape(-1, 1))
+        if "lo" in deterministic or "li" in deterministic:
+            exogs.append(1 + np.arange(len(data)).reshape(-1, 1))
+        if exog_coint is not None:
+            exogs.append(exog_coint)
+        if seasons > 0:
+            exogs.append(
+                seasonal_dummies(seasons, len(data)).reshape(-1, seasons - 1)
+            )
+        if exog is not None:
+            exogs.append(exog)
+        exogs = hstack(exogs) if exogs else None
+        var_model = VAR(data, exogs)
+        # exclude some periods ==> same amount of data used for each lag order
+        var_result = var_model._estimate_var(lags=p, offset=maxlags + 1 - p)
+
+        for k, v in var_result.info_criteria.items():
+            ic[k].append(v)
+    # -1+1 in the following line is only here for clarification.
+    # -1 because k_ar_VECM == k_ar_VAR - 1
+    # +1 because p == index +1 (we start with p=1, not p=0)
+    selected_orders = dict(
+        (ic_name, np.array(ic_value).argmin() - 1 + 1)
+        for ic_name, ic_value in ic.items()
+    )
+
+    return LagOrderResults(ic, selected_orders, True)
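+    # Example (illustrative sketch; `data` is a hypothetical (nobs x neqs)
+    # array). The returned LagOrderResults collects the information criteria
+    # and the order selected by each criterion:
+    #
+    #     lag_res = select_order(data, maxlags=8, deterministic="ci", seasons=4)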


 def _linear_trend(nobs, k_ar, coint=False):
@@ -80,7 +132,10 @@ def _linear_trend(nobs, k_ar, coint=False):
     The returned array's size is nobs and not nobs_tot so it cannot be used to
     construct the exog-argument of VECM's __init__ method.
     """
-    pass
+    ret = np.arange(nobs) + k_ar
+    if not coint:
+        ret += 1
+    return ret


 def _num_det_vars(det_string, seasons=0):
@@ -108,11 +163,26 @@ def _num_det_vars(det_string, seasons=0):
         Number of deterministic terms and number dummy variables for seasonal
         terms.
     """
-    pass
-
-
-def _deterministic_to_exog(deterministic, seasons, nobs_tot, first_season=0,
-    seasons_centered=False, exog=None, exog_coint=None):
+    num = 0
+    det_string = string_like(det_string, "det_string")
+    if "ci" in det_string or "co" in det_string:
+        num += 1
+    if "li" in det_string or "lo" in det_string:
+        num += 1
+    if seasons > 0:
+        num += seasons - 1
+    return num
+
+
+def _deterministic_to_exog(
+    deterministic,
+    seasons,
+    nobs_tot,
+    first_season=0,
+    seasons_centered=False,
+    exog=None,
+    exog_coint=None,
+):
     """
     Translate all information about deterministic terms into a single array.

@@ -147,7 +217,26 @@ def _deterministic_to_exog(deterministic, seasons, nobs_tot, first_season=0,
         None, if the function's arguments do not contain deterministic terms.
         Otherwise, an ndarray representing these deterministic terms.
     """
-    pass
+    exogs = []
+    deterministic = string_like(deterministic, "deterministic")
+    if "co" in deterministic or "ci" in deterministic:
+        exogs.append(np.ones(nobs_tot))
+    if exog_coint is not None:
+        exogs.append(exog_coint)
+    if "lo" in deterministic or "li" in deterministic:
+        exogs.append(np.arange(nobs_tot))
+    if seasons > 0:
+        exogs.append(
+            seasonal_dummies(
+                seasons,
+                nobs_tot,
+                first_period=first_season,
+                centered=seasons_centered,
+            )
+        )
+    if exog is not None:
+        exogs.append(exog)
+    return np.column_stack(exogs) if exogs else None


 def _mat_sqrt(_2darray):
@@ -163,11 +252,20 @@ def _mat_sqrt(_2darray):
     result : ndarray
         Square root of the matrix given as function argument.
     """
-    pass
-
-
-def _endog_matrices(endog, exog, exog_coint, diff_lags, deterministic,
-    seasons=0, first_season=0):
+    u_, s_, v_ = svd(_2darray, full_matrices=False)
+    s_ = np.sqrt(s_)
+    return u_.dot(s_[:, None] * v_)
+
+
+def _endog_matrices(
+    endog,
+    exog,
+    exog_coint,
+    diff_lags,
+    deterministic,
+    seasons=0,
+    first_season=0,
+):
     """
     Returns different matrices needed for parameter estimation.

@@ -222,7 +320,58 @@ def _endog_matrices(endog, exog, exog_coint, diff_lags, deterministic,
     ----------
     .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
     """
-    pass
+    deterministic = string_like(deterministic, "deterministic")
+    # p. 286:
+    p = diff_lags + 1
+    y = endog
+    K = y.shape[0]
+    y_1_T = y[:, p:]
+    T = y_1_T.shape[1]
+    delta_y = np.diff(y)
+    delta_y_1_T = delta_y[:, p - 1 :]
+
+    y_lag1 = y[:, p - 1 : -1]
+    if "co" in deterministic and "ci" in deterministic:
+        raise ValueError(
+            "Both 'co' and 'ci' as deterministic terms given. "
+            + "Please choose one of the two."
+        )
+    y_lag1_stack = [y_lag1]
+    if "ci" in deterministic:  # pp. 257, 299, 306, 307
+        y_lag1_stack.append(np.ones(T))
+    if "li" in deterministic:  # p. 299
+        y_lag1_stack.append(_linear_trend(T, p, coint=True))
+    if exog_coint is not None:
+        y_lag1_stack.append(exog_coint[-T - 1 : -1].T)
+    y_lag1 = np.row_stack(y_lag1_stack)
+
+    # p. 286:
+    delta_x = np.zeros((diff_lags * K, T))
+    if diff_lags > 0:
+        for j in range(delta_x.shape[1]):
+            delta_x[:, j] = delta_y[
+                :, j + p - 2 : None if j - 1 < 0 else j - 1 : -1
+            ].T.reshape(K * (p - 1))
+    delta_x_stack = [delta_x]
+    # p. 299, p. 303:
+    if "co" in deterministic:
+        delta_x_stack.append(np.ones(T))
+    if seasons > 0:
+        delta_x_stack.append(
+            seasonal_dummies(
+                seasons,
+                delta_x.shape[1],
+                first_period=first_season + diff_lags + 1,
+                centered=True,
+            ).T
+        )
+    if "lo" in deterministic:
+        delta_x_stack.append(_linear_trend(T, p))
+    if exog is not None:
+        delta_x_stack.append(exog[-T:].T)
+    delta_x = np.row_stack(delta_x_stack)
+
+    return y_1_T, delta_y_1_T, y_lag1, delta_x


 def _r_matrices(delta_y_1_T, y_lag1, delta_x):
@@ -253,7 +402,15 @@ def _r_matrices(delta_y_1_T, y_lag1, delta_x):
     ----------
     .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
     """
-    pass
+
+    # todo: rewrite m such that a big (TxT) matrix is avoided
+    nobs = y_lag1.shape[1]
+    m = np.identity(nobs) - (
+        delta_x.T.dot(inv(delta_x.dot(delta_x.T))).dot(delta_x)
+    )  # p. 291
+    r0 = delta_y_1_T.dot(m)  # p. 292
+    r1 = y_lag1.dot(m)
+    return r0, r1


 def _sij(delta_x, delta_y_1_T, y_lag1):
@@ -283,7 +440,23 @@ def _sij(delta_x, delta_y_1_T, y_lag1):
     ----------
     .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
     """
-    pass
+    nobs = y_lag1.shape[1]
+    r0, r1 = _r_matrices(delta_y_1_T, y_lag1, delta_x)
+    s00 = np.dot(r0, r0.T) / nobs
+    s01 = np.dot(r0, r1.T) / nobs
+    s10 = s01.T
+    s11 = np.dot(r1, r1.T) / nobs
+    s11_ = inv(_mat_sqrt(s11))
+    # p. 295:
+    s01_s11_ = np.dot(s01, s11_)
+    eig = np.linalg.eig(s01_s11_.T @ inv(s00) @ s01_s11_)
+    lambd = eig[0]
+    v = eig[1]
+    # reorder eig_vals to make them decreasing (and order eig_vecs accordingly)
+    lambd_order = np.argsort(lambd)[::-1]
+    lambd = lambd[lambd_order]
+    v = v[:, lambd_order]
+    return s00, s01, s10, s11, s11_, lambd, v


 class CointRankResults:
@@ -309,23 +482,57 @@ class CointRankResults:
         The test's significance level.
     """

-    def __init__(self, rank, neqs, test_stats, crit_vals, method='trace',
-        signif=0.05):
+    def __init__(
+        self, rank, neqs, test_stats, crit_vals, method="trace", signif=0.05
+    ):
         self.rank = rank
         self.neqs = neqs
-        self.r_1 = [(neqs if method == 'trace' else i + 1) for i in range(
-            min(rank + 1, neqs))]
+        self.r_1 = [
+            neqs if method == "trace" else i + 1
+            for i in range(min(rank + 1, neqs))
+        ]
         self.test_stats = test_stats
         self.crit_vals = crit_vals
         self.method = method
         self.signif = signif

+    def summary(self):
+        headers = ["r_0", "r_1", "test statistic", "critical value"]
+        title = (
+            "Johansen cointegration test using "
+            + ("trace" if self.method == "trace" else "maximum eigenvalue")
+            + " test statistic with {:.0%}".format(self.signif)
+            + " significance level"
+        )
+        num_tests = min(self.rank, self.neqs - 1)
+        data = [
+            [i, self.r_1[i], self.test_stats[i], self.crit_vals[i]]
+            for i in range(num_tests + 1)
+        ]
+        data_fmt = {
+            "data_fmts": ["%s", "%s", "%#0.4g", "%#0.4g"],
+            "data_aligns": "r",
+        }
+        html_data_fmt = dict(data_fmt)
+        html_data_fmt["data_fmts"] = [
+            "<td>" + i + "</td>" for i in html_data_fmt["data_fmts"]
+        ]
+        return SimpleTable(
+            data=data,
+            headers=headers,
+            title=title,
+            txt_fmt=data_fmt,
+            html_fmt=html_data_fmt,
+            ltx_fmt=data_fmt,
+        )
+
     def __str__(self):
         return self.summary().as_text()


-def select_coint_rank(endog, det_order, k_ar_diff, method='trace', signif=0.05
-    ):
+def select_coint_rank(
+    endog, det_order, k_ar_diff, method="trace", signif=0.05
+):
     """Calculate the cointegration rank of a VECM.

     Parameters
@@ -350,7 +557,48 @@ def select_coint_rank(endog, det_order, k_ar_diff, method='trace', signif=0.05
         A :class:`CointRankResults` object containing the cointegration rank suggested
         by the test and allowing a summary to be printed.
     """
-    pass
+    if method not in ["trace", "maxeig"]:
+        raise ValueError(
+            "The method argument has to be either 'trace' or"
+            "'maximum eigenvalue'."
+        )
+
+    if det_order not in [-1, 0, 1]:
+        if type(det_order) is int and det_order > 1:
+            raise ValueError(
+                "A det_order greather than 1 is not supported."
+                "Use a value of -1, 0, or 1."
+            )
+        else:
+            raise ValueError("det_order must be -1, 0, or 1.")
+
+    possible_signif_values = [0.1, 0.05, 0.01]
+    if signif not in possible_signif_values:
+        raise ValueError(
+            "Please choose a significance level from {0.1, 0.05," "0.01}"
+        )
+
+    coint_result = coint_johansen(endog, det_order, k_ar_diff)
+    test_stat = coint_result.lr1 if method == "trace" else coint_result.lr2
+    crit_vals = coint_result.cvt if method == "trace" else coint_result.cvm
+    signif_index = possible_signif_values.index(signif)
+
+    neqs = endog.shape[1]
+    r_0 = 0  # rank in null hypothesis
+    while r_0 < neqs:
+        if test_stat[r_0] < crit_vals[r_0, signif_index]:
+            break  # we accept current rank
+        else:
+            r_0 += 1  # we reject current rank and test next possible rank
+
+    return CointRankResults(
+        r_0,
+        neqs,
+        test_stat[: r_0 + 1],
+        crit_vals[: r_0 + 1, signif_index],
+        method,
+        signif,
+    )
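+    # Example (illustrative sketch; `data` as above). The CointRankResults
+    # object prints its summary table via __str__:
+    #
+    #     rank_res = select_coint_rank(data, det_order=0, k_ar_diff=1, method="trace")
+    #     print(rank_res)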


 def coint_johansen(endog, det_order, k_ar_diff):
@@ -391,7 +639,103 @@ def coint_johansen(endog, det_order, k_ar_diff):
     .. [1] Lütkepohl, H. 2005. New Introduction to Multiple Time Series
         Analysis. Springer.
     """
-    pass
+    import warnings
+
+    if det_order not in [-1, 0, 1]:
+        warnings.warn(
+            "Critical values are only available for a det_order of "
+            "-1, 0, or 1.",
+            category=HypothesisTestWarning,
+            stacklevel=2,
+        )
+    if endog.shape[1] > 12:  # todo: test with a time series of 13 variables
+        warnings.warn(
+            "Critical values are only available for time series "
+            "with 12 variables at most.",
+            category=HypothesisTestWarning,
+            stacklevel=2,
+        )
+
+    from statsmodels.regression.linear_model import OLS
+
+    def detrend(y, order):
+        if order == -1:
+            return y
+        return (
+            OLS(y, np.vander(np.linspace(-1, 1, len(y)), order + 1))
+            .fit()
+            .resid
+        )
+
+    def resid(y, x):
+        if x.size == 0:
+            return y
+        r = y - np.dot(x, np.dot(np.linalg.pinv(x), y))
+        return r
+
+    endog = np.asarray(endog)
+    nobs, neqs = endog.shape
+
+    # det_order is used to detrend the levels (endog); f is the detrend order
+    # applied to the differenced and lagged series below
+    if det_order > -1:
+        f = 0
+    else:
+        f = det_order
+
+    endog = detrend(endog, det_order)
+    dx = np.diff(endog, 1, axis=0)
+    z = lagmat(dx, k_ar_diff)
+    z = z[k_ar_diff:]
+    z = detrend(z, f)
+
+    dx = dx[k_ar_diff:]
+
+    dx = detrend(dx, f)
+    r0t = resid(dx, z)
+    # GH 5731, [:-0] does not work, need [:t-0]
+    lx = endog[: (endog.shape[0] - k_ar_diff)]
+    lx = lx[1:]
+    dx = detrend(lx, f)
+    rkt = resid(dx, z)  # level on lagged diffs
+    # Level covariance after filtering k_ar_diff
+    skk = np.dot(rkt.T, rkt) / rkt.shape[0]
+    # Covariance between filtered and unfiltered
+    sk0 = np.dot(rkt.T, r0t) / rkt.shape[0]
+    s00 = np.dot(r0t.T, r0t) / r0t.shape[0]
+    sig = np.dot(sk0, np.dot(inv(s00), sk0.T))
+    tmp = inv(skk)
+    au, du = np.linalg.eig(np.dot(tmp, sig))  # au is eval, du is evec
+
+    temp = inv(np.linalg.cholesky(np.dot(du.T, np.dot(skk, du))))
+    dt = np.dot(du, temp)
+
+    # JP: the next part can be done much more easily
+    auind = np.argsort(au)
+    aind = np.flipud(auind)
+    a = au[aind]
+    d = dt[:, aind]
+    # Normalize by first non-zero element of d, usually [0, 0]
+    # GH 5517
+    non_zero_d = d.flat != 0
+    if np.any(non_zero_d):
+        d *= np.sign(d.flat[non_zero_d][0])
+
+    #  Compute the trace and max eigenvalue statistics
+    lr1 = np.zeros(neqs)
+    lr2 = np.zeros(neqs)
+    cvm = np.zeros((neqs, 3))
+    cvt = np.zeros((neqs, 3))
+    iota = np.ones(neqs)
+    t, junk = rkt.shape
+    for i in range(0, neqs):
+        tmp = np.log(iota - a)[i:]
+        lr1[i] = -t * np.sum(tmp, 0)
+        lr2[i] = -t * np.log(1 - a[i])
+        cvm[i, :] = c_sja(neqs - i, det_order)
+        cvt[i, :] = c_sjt(neqs - i, det_order)
+        aind[i] = i
+
+    return JohansenTestResult(rkt, r0t, a, d, lr1, lr2, cvt, cvm, aind)
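+    # Example (illustrative sketch): the returned JohansenTestResult exposes
+    # trace_stat / trace_stat_crit_vals and max_eig_stat / max_eig_stat_crit_vals
+    # (defined below), e.g.
+    #
+    #     jres = coint_johansen(data, det_order=0, k_ar_diff=1)
+    #     jres.trace_stat, jres.trace_stat_crit_vals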


 class JohansenTestResult:
@@ -409,7 +753,7 @@ class JohansenTestResult:
     """

     def __init__(self, rkt, r0t, eig, evec, lr1, lr2, cvt, cvm, ind):
-        self._meth = 'johansen'
+        self._meth = "johansen"
         self._rkt = rkt
         self._r0t = r0t
         self._eig = eig
@@ -423,72 +767,72 @@ class JohansenTestResult:
     @property
     def rkt(self):
         """Residuals for :math:`Y_{-1}`"""
-        pass
+        return self._rkt

     @property
     def r0t(self):
         """Residuals for :math:`\\Delta Y`."""
-        pass
+        return self._r0t

     @property
     def eig(self):
         """Eigenvalues of VECM coefficient matrix"""
-        pass
+        return self._eig

     @property
     def evec(self):
         """Eigenvectors of VECM coefficient matrix"""
-        pass
+        return self._evec

     @property
     def trace_stat(self):
         """Trace statistic"""
-        pass
+        return self._lr1

     @property
     def lr1(self):
         """Trace statistic"""
-        pass
+        return self._lr1

     @property
     def max_eig_stat(self):
         """Maximum eigenvalue statistic"""
-        pass
+        return self._lr2

     @property
     def lr2(self):
         """Maximum eigenvalue statistic"""
-        pass
+        return self._lr2

     @property
     def trace_stat_crit_vals(self):
         """Critical values (90%, 95%, 99%) of trace statistic"""
-        pass
+        return self._cvt

     @property
     def cvt(self):
         """Critical values (90%, 95%, 99%) of trace statistic"""
-        pass
+        return self._cvt

     @property
     def cvm(self):
         """Critical values (90%, 95%, 99%) of maximum eigenvalue statistic."""
-        pass
+        return self._cvm

     @property
     def max_eig_stat_crit_vals(self):
         """Critical values (90%, 95%, 99%) of maximum eigenvalue statistic."""
-        pass
+        return self._cvm

     @property
     def ind(self):
         """Order of eigenvalues"""
-        pass
+        return self._ind

     @property
     def meth(self):
         """Test method"""
-        pass
+        return self._meth


 class VECM(tsbase.TimeSeriesModel):
@@ -595,15 +939,28 @@ class VECM(tsbase.TimeSeriesModel):
            *Vector Autoregressive Models*. Oxford University Press.
     """

-    def __init__(self, endog, exog=None, exog_coint=None, dates=None, freq=
-        None, missing='none', k_ar_diff=1, coint_rank=1, deterministic='n',
-        seasons=0, first_season=0):
+    def __init__(
+        self,
+        endog,
+        exog=None,
+        exog_coint=None,
+        dates=None,
+        freq=None,
+        missing="none",
+        k_ar_diff=1,
+        coint_rank=1,
+        deterministic="n",
+        seasons=0,
+        first_season=0,
+    ):
         super().__init__(endog, exog, dates, freq, missing=missing)
-        if exog_coint is not None and not exog_coint.shape[0] == endog.shape[0
-            ]:
-            raise ValueError('exog_coint must have as many rows as enodg_tot!')
+        if (
+            exog_coint is not None
+            and not exog_coint.shape[0] == endog.shape[0]
+        ):
+            raise ValueError("exog_coint must have as many rows as enodg_tot!")
         if self.endog.ndim == 1:
-            raise ValueError('Only gave one variable to VECM')
+            raise ValueError("Only gave one variable to VECM")
         self.y = self.endog.T
         self.exog_coint = exog_coint
         self.neqs = self.endog.shape[1]
@@ -613,9 +970,9 @@ class VECM(tsbase.TimeSeriesModel):
         self.deterministic = deterministic
         self.seasons = seasons
         self.first_season = first_season
-        self.load_coef_repr = 'ec'
+        self.load_coef_repr = "ec"  # name for loading coef. (alpha) in summary

-    def fit(self, method='ml'):
+    def fit(self, method="ml"):
         """
         Estimates the parameters of a VECM.

@@ -634,7 +991,66 @@ class VECM(tsbase.TimeSeriesModel):
         ----------
         .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
         """
-        pass
+        if method == "ml":
+            return self._estimate_vecm_ml()
+        else:
+            raise ValueError(
+                "%s not recognized, must be among %s" % (method, "ml")
+            )
+
+    def _estimate_vecm_ml(self):
+        y_1_T, delta_y_1_T, y_lag1, delta_x = _endog_matrices(
+            self.y,
+            self.exog,
+            self.exog_coint,
+            self.k_ar_diff,
+            self.deterministic,
+            self.seasons,
+            self.first_season,
+        )
+        T = y_1_T.shape[1]
+
+        s00, s01, s10, s11, s11_, _, v = _sij(delta_x, delta_y_1_T, y_lag1)
+
+        beta_tilde = (v[:, : self.coint_rank].T.dot(s11_)).T
+        beta_tilde = np.real_if_close(beta_tilde)
+        # normalize beta tilde such that eye(r) forms the first r rows of it:
+        beta_tilde = np.dot(beta_tilde, inv(beta_tilde[: self.coint_rank]))
+        alpha_tilde = s01.dot(beta_tilde).dot(
+            inv(beta_tilde.T.dot(s11).dot(beta_tilde))
+        )
+        gamma_tilde = (
+            (delta_y_1_T - alpha_tilde.dot(beta_tilde.T).dot(y_lag1))
+            .dot(delta_x.T)
+            .dot(inv(np.dot(delta_x, delta_x.T)))
+        )
+        temp = (
+            delta_y_1_T
+            - alpha_tilde.dot(beta_tilde.T).dot(y_lag1)
+            - gamma_tilde.dot(delta_x)
+        )
+        sigma_u_tilde = temp.dot(temp.T) / T
+
+        return VECMResults(
+            self.y,
+            self.exog,
+            self.exog_coint,
+            self.k_ar,
+            self.coint_rank,
+            alpha_tilde,
+            beta_tilde,
+            gamma_tilde,
+            sigma_u_tilde,
+            deterministic=self.deterministic,
+            seasons=self.seasons,
+            delta_y_1_T=delta_y_1_T,
+            y_lag1=y_lag1,
+            delta_x=delta_x,
+            model=self,
+            names=self.endog_names,
+            dates=self.data.dates,
+            first_season=self.first_season,
+        )
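
# Illustrative fit of the ML estimator above (not part of the patch); the data
# and options are arbitrary.  ``alpha``/``beta`` are the loading and
# cointegration matrices computed in _estimate_vecm_ml.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import VECM

rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal((250, 2)), axis=0)

res = VECM(y, k_ar_diff=1, coint_rank=1, deterministic="ci").fit()
print(res.alpha.shape, res.beta.shape)  # (2, 1) (2, 1)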

     @property
     def _lagged_param_names(self):
@@ -655,7 +1071,38 @@ class VECM(tsbase.TimeSeriesModel):
         ----------
         .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
         """
-        pass
+        param_names = []
+
+        # 1. Deterministic terms outside cointegration relation
+        if "co" in self.deterministic:
+            param_names += ["const.%s" % n for n in self.endog_names]
+
+        if self.seasons > 0:
+            param_names += [
+                "season%d.%s" % (s, n)
+                for s in range(1, self.seasons)
+                for n in self.endog_names
+            ]
+
+        if "lo" in self.deterministic:
+            param_names += ["lin_trend.%s" % n for n in self.endog_names]
+
+        if self.exog is not None:
+            param_names += [
+                "exog%d.%s" % (exog_no, n)
+                for exog_no in range(1, self.exog.shape[1] + 1)
+                for n in self.endog_names
+            ]
+
+        # 2. lagged endogenous terms
+        param_names += [
+            "L%d.%s.%s" % (i + 1, n1, n2)
+            for n2 in self.endog_names
+            for i in range(self.k_ar_diff)
+            for n1 in self.endog_names
+        ]
+
+        return param_names
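
# Standalone sketch of the naming pattern built above (not part of the patch),
# assuming two endogenous variables, one lagged difference and a constant
# outside the cointegration relation.
endog_names, k_ar_diff = ["y1", "y2"], 1
names = ["const.%s" % n for n in endog_names]
names += [
    "L%d.%s.%s" % (i + 1, n1, n2)
    for n2 in endog_names
    for i in range(k_ar_diff)
    for n1 in endog_names
]
print(names)
# ['const.y1', 'const.y2', 'L1.y1.y1', 'L1.y2.y1', 'L1.y1.y2', 'L1.y2.y2']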

     @property
     def _load_coef_param_names(self):
@@ -672,7 +1119,19 @@ class VECM(tsbase.TimeSeriesModel):
         ----------
         .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
         """
-        pass
+        param_names = []
+
+        if self.coint_rank == 0:
+            return None
+
+        # loading coefficients (alpha); called "ec" in JMulTi, "ECT" in tsDyn,
+        # and "_ce" in Stata
+        param_names += [
+            self.load_coef_repr + "%d.%s" % (i + 1, self.endog_names[j])
+            for j in range(self.neqs)
+            for i in range(self.coint_rank)
+        ]
+
+        return param_names

     @property
     def _coint_param_names(self):
@@ -686,7 +1145,36 @@ class VECM(tsbase.TimeSeriesModel):
             as well as deterministic terms inside the cointegration relation
             (if present in the model).
         """
-        pass
+        # 1. cointegration matrix/vector
+        param_names = []
+
+        param_names += [
+            ("beta.%d." + self.load_coef_repr + "%d") % (j + 1, i + 1)
+            for i in range(self.coint_rank)
+            for j in range(self.neqs)
+        ]
+
+        # 2. deterministic terms inside cointegration relation
+        if "ci" in self.deterministic:
+            param_names += [
+                "const." + self.load_coef_repr + "%d" % (i + 1)
+                for i in range(self.coint_rank)
+            ]
+
+        if "li" in self.deterministic:
+            param_names += [
+                "lin_trend." + self.load_coef_repr + "%d" % (i + 1)
+                for i in range(self.coint_rank)
+            ]
+
+        if self.exog_coint is not None:
+            param_names += [
+                "exog_coint%d.%s" % (n + 1, exog_no)
+                for exog_no in range(1, self.exog_coint.shape[1] + 1)
+                for n in range(self.neqs)
+            ]
+
+        return param_names


 class VECMResults:
@@ -885,10 +1373,27 @@ class VECMResults:
     .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
     """

-    def __init__(self, endog, exog, exog_coint, k_ar, coint_rank, alpha,
-        beta, gamma, sigma_u, deterministic='n', seasons=0, first_season=0,
-        delta_y_1_T=None, y_lag1=None, delta_x=None, model=None, names=None,
-        dates=None):
+    def __init__(
+        self,
+        endog,
+        exog,
+        exog_coint,
+        k_ar,
+        coint_rank,
+        alpha,
+        beta,
+        gamma,
+        sigma_u,
+        deterministic="n",
+        seasons=0,
+        first_season=0,
+        delta_y_1_T=None,
+        y_lag1=None,
+        delta_x=None,
+        model=None,
+        names=None,
+        dates=None,
+    ):
         self.model = model
         self.y_all = endog
         self.exog = exog
@@ -897,10 +1402,11 @@ class VECMResults:
         self.dates = dates
         self.neqs = endog.shape[0]
         self.k_ar = k_ar
-        deterministic = string_like(deterministic, 'deterministic')
+        deterministic = string_like(deterministic, "deterministic")
         self.deterministic = deterministic
         self.seasons = seasons
         self.first_season = first_season
+
         self.coint_rank = coint_rank
         if alpha.dtype == np.complex128 and np.all(np.imag(alpha) == 0):
             alpha = np.real_if_close(alpha)
@@ -908,51 +1414,137 @@ class VECMResults:
             beta = np.real_if_close(beta)
         if gamma.dtype == np.complex128 and np.all(np.imag(gamma) == 0):
             gamma = np.real_if_close(gamma)
+
         self.alpha = alpha
         self.beta, self.det_coef_coint = np.vsplit(beta, [self.neqs])
-        self.gamma, self.det_coef = np.hsplit(gamma, [self.neqs * (self.
-            k_ar - 1)])
-        if 'ci' in deterministic:
+        self.gamma, self.det_coef = np.hsplit(
+            gamma, [self.neqs * (self.k_ar - 1)]
+        )
+
+        if "ci" in deterministic:
             self.const_coint = self.det_coef_coint[:1, :]
         else:
             self.const_coint = np.zeros(coint_rank).reshape((1, -1))
-        if 'li' in deterministic:
-            start = 1 if 'ci' in deterministic else 0
-            self.lin_trend_coint = self.det_coef_coint[start:start + 1, :]
+        if "li" in deterministic:
+            start = 1 if "ci" in deterministic else 0
+            self.lin_trend_coint = self.det_coef_coint[start : start + 1, :]
         else:
             self.lin_trend_coint = np.zeros(coint_rank).reshape(1, -1)
         if self.exog_coint is not None:
-            start = ('ci' in deterministic) + ('li' in deterministic)
+            start = ("ci" in deterministic) + ("li" in deterministic)
             self.exog_coint_coefs = self.det_coef_coint[start:, :]
         else:
             self.exog_coint_coefs = None
-        split_const_season = 1 if 'co' in deterministic else 0
-        split_season_lin = split_const_season + (seasons - 1 if seasons else 0)
-        if 'lo' in deterministic:
+
+        split_const_season = 1 if "co" in deterministic else 0
+        split_season_lin = split_const_season + (
+            (seasons - 1) if seasons else 0
+        )
+        if "lo" in deterministic:
             split_lin_exog = split_season_lin + 1
         else:
             split_lin_exog = split_season_lin
         self.const, self.seasonal, self.lin_trend, self.exog_coefs = np.hsplit(
-            self.det_coef, [split_const_season, split_season_lin,
-            split_lin_exog])
+            self.det_coef,
+            [split_const_season, split_season_lin, split_lin_exog],
+        )
+
         self.sigma_u = sigma_u
-        if (y_lag1 is not None and delta_x is not None and delta_y_1_T is not
-            None):
+
+        if (
+            y_lag1 is not None
+            and delta_x is not None
+            and delta_y_1_T is not None
+        ):
             self._delta_y_1_T = delta_y_1_T
             self._y_lag1 = y_lag1
             self._delta_x = delta_x
         else:
-            _y_1_T, self._delta_y_1_T, self._y_lag1, self._delta_x = (
-                _endog_matrices(endog, self.exog, k_ar, deterministic, seasons)
-                )
+            (
+                _y_1_T,
+                self._delta_y_1_T,
+                self._y_lag1,
+                self._delta_x,
+            ) = _endog_matrices(endog, self.exog, k_ar, deterministic, seasons)
         self.nobs = self._y_lag1.shape[1]

     @cache_readonly
-    def llf(self):
+    def llf(self):  # Lutkepohl p. 295 (7.2.20)
         """
         Compute the VECM's loglikelihood.
         """
-        pass
+        K = self.neqs
+        T = self.nobs
+        r = self.coint_rank
+        s00, _, _, _, _, lambd, _ = _sij(
+            self._delta_x, self._delta_y_1_T, self._y_lag1
+        )
+        return (
+            -K * T * np.log(2 * np.pi) / 2
+            - T * (np.log(np.linalg.det(s00)) + sum(np.log(1 - lambd)[:r])) / 2
+            - K * T / 2
+        )
+
+    @cache_readonly
+    def _cov_sigma(self):
+        sigma_u = self.sigma_u
+        d = duplication_matrix(self.neqs)
+        d_K_plus = np.linalg.pinv(d)
+        # compare p. 93, 297 Lutkepohl (2005)
+        return 2 * (d_K_plus @ np.kron(sigma_u, sigma_u) @ d_K_plus.T)
+
+    @cache_readonly
+    def cov_params_default(self):  # p.296 (7.2.21)
+        # Sigma_co described on p. 287
+        beta = self.beta
+        if self.det_coef_coint.size > 0:
+            beta = vstack((beta, self.det_coef_coint))
+        dt = self.deterministic
+        num_det = ("co" in dt) + ("lo" in dt)
+        num_det += (self.seasons - 1) if self.seasons else 0
+        if self.exog is not None:
+            num_det += self.exog.shape[1]
+        b_id = scipy.linalg.block_diag(
+            beta, np.identity(self.neqs * (self.k_ar - 1) + num_det)
+        )
+
+        y_lag1 = self._y_lag1
+        b_y = beta.T.dot(y_lag1)
+        omega11 = b_y.dot(b_y.T)
+        omega12 = b_y.dot(self._delta_x.T)
+        omega21 = omega12.T
+        omega22 = self._delta_x.dot(self._delta_x.T)
+        omega = np.bmat([[omega11, omega12], [omega21, omega22]]).A
+
+        mat1 = b_id.dot(inv(omega)).dot(b_id.T)
+        return np.kron(mat1, self.sigma_u)
+
+    @cache_readonly
+    def cov_params_wo_det(self):
+        # rows & cols to be dropped (related to deterministic terms inside the
+        # cointegration relation)
+        start_i = self.neqs ** 2  # first elements belong to alpha @ beta.T
+        end_i = start_i + self.neqs * self.det_coef_coint.shape[0]
+        to_drop_i = np.arange(start_i, end_i)
+
+        # rows & cols to be dropped (related to deterministic terms outside of
+        # the cointegration relation)
+        cov = self.cov_params_default
+        cov_size = len(cov)
+        to_drop_o = np.arange(cov_size - self.det_coef.size, cov_size)
+
+        to_drop = np.union1d(to_drop_i, to_drop_o)
+
+        mask = np.ones(cov.shape, dtype=bool)
+        mask[to_drop] = False
+        mask[:, to_drop] = False
+        cov_size_new = mask.sum(axis=0)[0]
+        return cov[mask].reshape((cov_size_new, cov_size_new))
+
+    # standard errors:
+    @cache_readonly
+    def stderr_params(self):
+        return np.sqrt(np.diag(self.cov_params_default))

     @cache_readonly
     def stderr_coint(self):
@@ -975,7 +1567,157 @@ class VECMResults:
         ----------
         .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
         """
-        pass
+        r = self.coint_rank
+        _, r1 = _r_matrices(self._delta_y_1_T, self._y_lag1, self._delta_x)
+        r12 = r1[r:]
+        if r12.size == 0:
+            return np.zeros((r, r))
+        mat1 = inv(r12.dot(r12.T))
+        mat1 = np.kron(mat1.T, np.identity(r))
+        det = self.det_coef_coint.shape[0]
+        mat2 = np.kron(
+            np.identity(self.neqs - r + det),
+            inv(self.alpha.T @ inv(self.sigma_u) @ self.alpha),
+        )
+        first_rows = np.zeros((r, r))
+        last_rows_1d = np.sqrt(np.diag(mat1.dot(mat2)))
+        last_rows = last_rows_1d.reshape((self.neqs - r + det, r), order="F")
+        return vstack((first_rows, last_rows))
+
+    @cache_readonly
+    def stderr_alpha(self):
+        ret_1dim = self.stderr_params[: self.alpha.size]
+        return ret_1dim.reshape(self.alpha.shape, order="F")
+
+    @cache_readonly
+    def stderr_beta(self):
+        ret_1dim = self.stderr_coint[: self.beta.shape[0]]
+        return ret_1dim.reshape(self.beta.shape, order="F")
+
+    @cache_readonly
+    def stderr_det_coef_coint(self):
+        if self.det_coef_coint.size == 0:
+            return self.det_coef_coint  # 0-size array
+        ret_1dim = self.stderr_coint[self.beta.shape[0] :]
+        return ret_1dim.reshape(self.det_coef_coint.shape, order="F")
+
+    @cache_readonly
+    def stderr_gamma(self):
+        start = self.alpha.shape[0] * (
+            self.beta.shape[0] + self.det_coef_coint.shape[0]
+        )
+        ret_1dim = self.stderr_params[start : start + self.gamma.size]
+        return ret_1dim.reshape(self.gamma.shape, order="F")
+
+    @cache_readonly
+    def stderr_det_coef(self):
+        if self.det_coef.size == 0:
+            return self.det_coef  # 0-size array
+        ret1_1dim = self.stderr_params[-self.det_coef.size :]
+        return ret1_1dim.reshape(self.det_coef.shape, order="F")
+
+    # t-values:
+    @cache_readonly
+    def tvalues_alpha(self):
+        return self.alpha / self.stderr_alpha
+
+    @cache_readonly
+    def tvalues_beta(self):
+        r = self.coint_rank
+        first_rows = np.zeros((r, r))
+        last_rows = self.beta[r:] / self.stderr_beta[r:]
+        return vstack((first_rows, last_rows))
+
+    @cache_readonly
+    def tvalues_det_coef_coint(self):
+        if self.det_coef_coint.size == 0:
+            return self.det_coef_coint  # 0-size array
+        return self.det_coef_coint / self.stderr_det_coef_coint
+
+    @cache_readonly
+    def tvalues_gamma(self):
+        return self.gamma / self.stderr_gamma
+
+    @cache_readonly
+    def tvalues_det_coef(self):
+        if self.det_coef.size == 0:
+            return self.det_coef  # 0-size array
+        return self.det_coef / self.stderr_det_coef
+
+    # p-values:
+    @cache_readonly
+    def pvalues_alpha(self):
+        return (1 - scipy.stats.norm.cdf(abs(self.tvalues_alpha))) * 2
+
+    @cache_readonly
+    def pvalues_beta(self):
+        first_rows = np.zeros((self.coint_rank, self.coint_rank))
+        tval_last = self.tvalues_beta[self.coint_rank :]
+        last_rows = (1 - scipy.stats.norm.cdf(abs(tval_last))) * 2  # normal approx.
+        return vstack((first_rows, last_rows))
+
+    @cache_readonly
+    def pvalues_det_coef_coint(self):
+        if self.det_coef_coint.size == 0:
+            return self.det_coef_coint  # 0-size array
+        return (1 - scipy.stats.norm.cdf(abs(self.tvalues_det_coef_coint))) * 2
+
+    @cache_readonly
+    def pvalues_gamma(self):
+        return (1 - scipy.stats.norm.cdf(abs(self.tvalues_gamma))) * 2
+
+    @cache_readonly
+    def pvalues_det_coef(self):
+        if self.det_coef.size == 0:
+            return self.det_coef  # 0-size array
+        return (1 - scipy.stats.norm.cdf(abs(self.tvalues_det_coef))) * 2
+
+    # confidence intervals
+    def _make_conf_int(self, est, stderr, alpha):
+        struct_arr = np.zeros(
+            est.shape, dtype=[("lower", float), ("upper", float)]
+        )
+        struct_arr["lower"] = (
+            est - scipy.stats.norm.ppf(1 - alpha / 2) * stderr
+        )
+        struct_arr["upper"] = (
+            est + scipy.stats.norm.ppf(1 - alpha / 2) * stderr
+        )
+        return struct_arr
+
+    def conf_int_alpha(self, alpha=0.05):
+        return self._make_conf_int(self.alpha, self.stderr_alpha, alpha)
+
+    def conf_int_beta(self, alpha=0.05):
+        return self._make_conf_int(self.beta, self.stderr_beta, alpha)
+
+    def conf_int_det_coef_coint(self, alpha=0.05):
+        return self._make_conf_int(
+            self.det_coef_coint, self.stderr_det_coef_coint, alpha
+        )
+
+    def conf_int_gamma(self, alpha=0.05):
+        return self._make_conf_int(self.gamma, self.stderr_gamma, alpha)
+
+    def conf_int_det_coef(self, alpha=0.05):
+        return self._make_conf_int(self.det_coef, self.stderr_det_coef, alpha)
+
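
# Illustrative access pattern (not part of the patch): each conf_int_* method
# above returns a structured array shaped like the parameter, with "lower" and
# "upper" fields.  ``res`` is assumed to be a fitted VECMResults instance,
# e.g. VECM(y).fit().
ci = res.conf_int_alpha(alpha=0.05)
lower, upper = ci["lower"], ci["upper"]  # each has the same shape as res.alpha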
+    @cache_readonly
+    def var_rep(self):
+        pi = self.alpha.dot(self.beta.T)
+        gamma = self.gamma
+        K = self.neqs
+        A = np.zeros((self.k_ar, K, K))
+        A[0] = pi + np.identity(K)
+        if self.gamma.size > 0:
+            A[0] += gamma[:, :K]
+            A[self.k_ar - 1] = -gamma[:, K * (self.k_ar - 2) :]
+            for i in range(1, self.k_ar - 1):
+                A[i] = (
+                    gamma[:, K * i : K * (i + 1)]
+                    - gamma[:, K * (i - 1) : K * i]
+                )
+        return A
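
# Numeric sketch of the levels-VAR mapping coded above (not part of the patch):
#   A_1 = Pi + I_K + Gamma_1,  A_i = Gamma_i - Gamma_{i-1},  A_p = -Gamma_{p-1}
# with Pi = alpha @ beta.T.  Values are made up; K = 2 equations, k_ar = 2, so
# only Gamma_1 exists.
import numpy as np

K, p = 2, 2
alpha = np.array([[-0.2], [0.1]])
beta = np.array([[1.0], [-0.8]])
gamma = np.array([[0.3, 0.0],
                  [0.1, 0.2]])        # Gamma_1
pi = alpha @ beta.T

A = np.zeros((p, K, K))
A[0] = pi + np.eye(K) + gamma[:, :K]  # A_1
A[p - 1] = -gamma[:, K * (p - 2):]    # A_p, here simply -Gamma_1
print(A)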

     @cache_readonly
     def cov_var_repr(self):
@@ -990,7 +1732,54 @@ class VECMResults:
         -------
         cov : array (neqs**2 * k_ar x neqs**2 * k_ar)
         """
-        pass
+        # This implementation is using the fact that for a random variable x
+        # with covariance matrix Sigma_x the following holds:
+        # B @ x with B being a suitably sized matrix has the covariance matrix
+        # B @ Sigma_x @ B.T. The arrays called vecm_var_transformation and
+        # self.cov_params_wo_det in the code play the roles of B and Sigma_x
+        # respectively. The elements of the random variable x are the elements
+        # of the estimated matrices Pi (alpha @ beta.T) and Gamma.
+        # Alternatively the following code (commented out) would yield the same
+        # result (following p. 289 in Lutkepohl):
+        # K, p = self.neqs, self.k_ar
+        # w = np.identity(K * p)
+        # w[np.arange(K, len(w)), np.arange(K, len(w))] *= (-1)
+        # w[np.arange(K, len(w)), np.arange(len(w)-K)] = 1
+        #
+        # w_eye = np.kron(w, np.identity(K))
+        #
+        # return w_eye.T @ self.cov_params_default @ w_eye
+
+        if self.k_ar - 1 == 0:
+            return self.cov_params_wo_det
+
+        vecm_var_transformation = np.zeros(
+            (self.neqs ** 2 * self.k_ar, self.neqs ** 2 * self.k_ar)
+        )
+        eye = np.identity(self.neqs ** 2)
+        # for A_1:
+        vecm_var_transformation[
+            : self.neqs ** 2, : 2 * self.neqs ** 2
+        ] = hstack((eye, eye))
+        # for A_i, where i = 2, ..., k_ar-1
+        for i in range(2, self.k_ar):
+            start_row = self.neqs ** 2 + (i - 2) * self.neqs ** 2
+            start_col = self.neqs ** 2 + (i - 2) * self.neqs ** 2
+            vecm_var_transformation[
+                start_row : start_row + self.neqs ** 2,
+                start_col : start_col + 2 * self.neqs ** 2,
+            ] = hstack((-eye, eye))
+        # for A_p:
+        vecm_var_transformation[-self.neqs ** 2 :, -self.neqs ** 2 :] = -eye
+        vvt = vecm_var_transformation
+        return vvt @ self.cov_params_wo_det @ vvt.T
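
# Quick standalone check of the rule used above, cov(B @ x) = B @ Sigma_x @ B.T
# (not part of the patch); the matrices are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
sigma_x = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, 1.0], [0.0, -1.0]])

x = rng.multivariate_normal(np.zeros(2), sigma_x, size=200_000)
print(np.round(np.cov((x @ B.T).T), 2))  # close to the analytic result below
print(B @ sigma_x @ B.T)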
+
+    def ma_rep(self, maxn=10):
+        return ma_rep(self.var_rep, maxn)
+
+    @cache_readonly
+    def _chol_sigma_u(self):
+        return np.linalg.cholesky(self.sigma_u)

     def orth_ma_rep(self, maxn=10, P=None):
         """Compute orthogonalized MA coefficient matrices.
@@ -1011,7 +1800,7 @@ class VECMResults:
         -------
         coefs : ndarray (maxn x neqs x neqs)
         """
-        pass
+        return orth_ma_rep(self, maxn, P)

     def predict(self, steps=5, alpha=None, exog_fc=None, exog_coint_fc=None):
         """
@@ -1040,10 +1829,116 @@ class VECMResults:
             period, the last row (index [steps-1]) is the steps-periods-ahead-
             forecast.
         """
-        pass
+        if self.exog is not None and exog_fc is None:
+            raise ValueError(
+                "exog_fc is None: Please pass the future values "
+                "of the VECM's exog terms via the exog_fc "
+                "argument!"
+            )
+        if self.exog is None and exog_fc is not None:
+            raise ValueError(
+                "This VECMResult-instance's exog attribute is "
+                "None. Please do not pass a non-None value as the "
+                "method's exog_fc-argument."
+            )
+        if exog_fc is not None and exog_fc.shape[0] < steps:
+            raise ValueError(
+                "The argument exog_fc must have at least steps "
+                "elements in its first dimension"
+            )
+
+        if self.exog_coint is not None and exog_coint_fc is None:
+            raise ValueError(
+                "exog_coint_fc is None: Please pass the future "
+                "values of the VECM's exog_coint terms via the "
+                "exog_coint_fc argument!"
+            )
+        if self.exog_coint is None and exog_coint_fc is not None:
+            raise ValueError(
+                "This VECMResult-instance's exog_coint attribute "
+                "is None. Please do not pass a non-None value as "
+                "the method's exog_coint_fc-argument."
+            )
+        if exog_coint_fc is not None and exog_coint_fc.shape[0] < steps - 1:
+            raise ValueError(
+                "The argument exog_coint_fc must have at least "
+                "steps elements in its first dimension"
+            )
+
+        last_observations = self.y_all.T[-self.k_ar :]
+        exog = []
+        trend_coefs = []
+
+        # adding deterministic terms outside cointegration relation
+        exog_const = np.ones(steps)
+        nobs_tot = self.nobs + self.k_ar
+        if self.const.size > 0:
+            exog.append(exog_const)
+            trend_coefs.append(self.const.T)
+
+        if self.seasons > 0:
+            first_future_season = (self.first_season + nobs_tot) % self.seasons
+            exog_seasonal = seasonal_dummies(
+                self.seasons, steps, first_future_season, True
+            )
+            exog.append(exog_seasonal)
+            trend_coefs.append(self.seasonal.T)
+
+        exog_lin_trend = _linear_trend(self.nobs, self.k_ar)
+        exog_lin_trend = exog_lin_trend[-1] + 1 + np.arange(steps)
+        if self.lin_trend.size > 0:
+            exog.append(exog_lin_trend)
+            trend_coefs.append(self.lin_trend.T)
+
+        if exog_fc is not None:
+            exog.append(exog_fc[:steps])
+            trend_coefs.append(self.exog_coefs.T)
+
+        # adding deterministic terms inside cointegration relation
+        if "ci" in self.deterministic:
+            exog.append(exog_const)
+            trend_coefs.append(self.alpha.dot(self.const_coint.T).T)
+        exog_lin_trend_coint = _linear_trend(self.nobs, self.k_ar, coint=True)
+        exog_lin_trend_coint = exog_lin_trend_coint[-1] + 1 + np.arange(steps)
+        if "li" in self.deterministic:
+            exog.append(exog_lin_trend_coint)
+            trend_coefs.append(self.alpha.dot(self.lin_trend_coint.T).T)
+
+        if exog_coint_fc is not None:
+            if exog_coint_fc.ndim == 1:
+                exog_coint_fc = exog_coint_fc[:, None]  # make 2-D
+            exog_coint_fc = np.vstack(
+                (self.exog_coint[-1:], exog_coint_fc[: steps - 1])
+            )
+            exog.append(exog_coint_fc)
+            trend_coefs.append(self.alpha.dot(self.exog_coint_coefs.T).T)
+
+        # gluing all deterministic terms together
+        exog = np.column_stack(exog) if exog != [] else None
+        if trend_coefs != []:
+            trend_coefs = np.row_stack(trend_coefs)
+        else:
+            trend_coefs = None
+
+        # call the forecasting function of the VAR-module
+        if alpha is not None:
+            return forecast_interval(
+                last_observations,
+                self.var_rep,
+                trend_coefs,
+                self.sigma_u,
+                steps,
+                alpha=alpha,
+                exog=exog,
+            )
+        else:
+            return forecast(
+                last_observations, self.var_rep, trend_coefs, steps, exog
+            )
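
# Illustrative forecasting with the method above (not part of the patch).
# Passing ``alpha`` routes through forecast_interval and returns point
# forecasts plus lower/upper bounds; omitting it returns the point forecast.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import VECM

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal((250, 2)), axis=0)
res = VECM(y, k_ar_diff=1, coint_rank=1).fit()

point = res.predict(steps=5)                          # shape (5, 2)
mid, lower, upper = res.predict(steps=5, alpha=0.05)  # 95% forecast bands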

-    def plot_forecast(self, steps, alpha=0.05, plot_conf_int=True,
-        n_last_obs=None):
+    def plot_forecast(
+        self, steps, alpha=0.05, plot_conf_int=True, n_last_obs=None
+    ):
         """
         Plot the forecast.

@@ -1059,10 +1954,22 @@ class VECMResults:
             If int, restrict plotted history to n_last_obs observations.
             If None, include the whole history in the plot.
         """
-        pass
+        mid, lower, upper = self.predict(steps, alpha=alpha)
+
+        y = self.y_all.T
+        y = y[self.k_ar :] if n_last_obs is None else y[-n_last_obs:]
+        plot.plot_var_forc(
+            y,
+            mid,
+            lower,
+            upper,
+            names=self.names,
+            plot_stderr=plot_conf_int,
+            legend_options={"loc": "lower left"},
+        )

     def test_granger_causality(self, caused, causing=None, signif=0.05):
-        """
+        r"""
         Test for Granger-causality.

         The concept of Granger-causality is described in chapter 7.6.3 of [1]_.
@@ -1100,14 +2007,124 @@ class VECMResults:
         ----------
         .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.

-        .. |H0| replace:: H\\ :sub:`0`
+        .. |H0| replace:: H\ :sub:`0`

-        .. |H1| replace:: H\\ :sub:`1`
+        .. |H1| replace:: H\ :sub:`1`
         """
-        pass
+        if not (0 < signif < 1):
+            raise ValueError("signif has to be between 0 and 1")
+
+        allowed_types = (str, int)
+
+        if isinstance(caused, allowed_types):
+            caused = [caused]
+        if not all(isinstance(c, allowed_types) for c in caused):
+            raise TypeError(
+                "caused has to be of type string or int (or a "
+                "sequence of these types)."
+            )
+        caused = [self.names[c] if type(c) is int else c for c in caused]
+        caused_ind = [get_index(self.names, c) for c in caused]
+
+        if causing is not None:
+
+            if isinstance(causing, allowed_types):
+                causing = [causing]
+            if not all(isinstance(c, allowed_types) for c in causing):
+                raise TypeError(
+                    "causing has to be of type string or int (or "
+                    "a sequence of these types) or None."
+                )
+            causing = [self.names[c] if type(c) is int else c for c in causing]
+            causing_ind = [get_index(self.names, c) for c in causing]
+
+        if causing is None:
+            causing_ind = [i for i in range(self.neqs) if i not in caused_ind]
+            causing = [self.names[c] for c in causing_ind]
+
+        y, k, t, p = self.y_all, self.neqs, self.nobs - 1, self.k_ar + 1
+        exog = _deterministic_to_exog(
+            self.deterministic,
+            self.seasons,
+            nobs_tot=self.nobs + self.k_ar,
+            first_season=self.first_season,
+            seasons_centered=True,
+            exog=self.exog,
+            exog_coint=self.exog_coint,
+        )
+        var_results = VAR(y.T, exog).fit(maxlags=p, trend="n")
+
+        # num_restr is called N in Lutkepohl
+        num_restr = len(causing) * len(caused) * (p - 1)
+        num_det_terms = _num_det_vars(self.deterministic, self.seasons)
+        if self.exog is not None:
+            num_det_terms += self.exog.shape[1]
+        if self.exog_coint is not None:
+            num_det_terms += self.exog_coint.shape[1]
+
+        # Make restriction matrix
+        C = np.zeros(
+            (num_restr, k * num_det_terms + k ** 2 * (p - 1)), dtype=float
+        )
+        cols_det = k * num_det_terms
+        row = 0
+        for j in range(p - 1):
+            for ing_ind in causing_ind:
+                for ed_ind in caused_ind:
+                    C[row, cols_det + ed_ind + k * ing_ind + k ** 2 * j] = 1
+                    row += 1
+        Ca = np.dot(C, vec(var_results.params[:-k].T))
+
+        x_min_p_components = []
+        if exog is not None:
+            x_min_p_components.append(exog[-t:].T)
+
+        x_min_p = np.zeros((k * p, t))
+        for i in range(p - 1):  # fill first k * k_ar rows of x_min_p
+            x_min_p[i * k : (i + 1) * k, :] = (
+                y[:, p - 1 - i : -1 - i] - y[:, :-p]
+            )
+        x_min_p[-k:, :] = y[:, :-p]  # fill last rows of x_min_p
+        x_min_p_components.append(x_min_p)
+
+        x_min_p = np.row_stack(x_min_p_components)
+        x_x = np.dot(x_min_p, x_min_p.T)  # k*k_ar x k*k_ar
+        x_x_11 = inv(x_x)[
+            : k * (p - 1) + num_det_terms, : k * (p - 1) + num_det_terms
+        ]  # k*(k_ar-1) x k*(k_ar-1)
+        # For VAR-models with parameter restrictions the denominator in the
+        # calculation of sigma_u is nobs and not (nobs-k*k_ar-num_det_terms).
+        # Testing for Granger-causality means testing for restricted
+        # parameters, thus the former of the two denominators is used. As
+        # Lutkepohl states, both variants of the estimated sigma_u are
+        # possible. (see Lutkepohl, p.198)
+        # The choice of the denominator T also has the advantage of producing
+        # the same results as the reference software JMulTi.
+        sigma_u = var_results.sigma_u * (t - k * p - num_det_terms) / t
+        sig_alpha_min_p = t * np.kron(x_x_11, sigma_u)  # k**2*(p-1)xk**2*(p-1)
+        middle = inv(C @ sig_alpha_min_p @ C.T)
+
+        wald_statistic = t * (Ca.T @ middle @ Ca)
+        f_statistic = wald_statistic / num_restr
+        df = (num_restr, k * var_results.df_resid)
+        f_distribution = scipy.stats.f(*df)
+
+        pvalue = f_distribution.sf(f_statistic)
+        crit_value = f_distribution.ppf(1 - signif)
+        return CausalityTestResults(
+            causing,
+            caused,
+            f_statistic,
+            crit_value,
+            pvalue,
+            df,
+            signif,
+            test="granger",
+            method="f",
+        )
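
# Illustrative call of the Granger-causality test above (not part of the
# patch), using integer positions; variable names work as well.  ``res`` is
# assumed to be a fitted VECMResults instance, e.g. VECM(y).fit().
gc = res.test_granger_causality(caused=0, causing=1, signif=0.05)
print(gc.summary())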

     def test_inst_causality(self, causing, signif=0.05):
-        """
+        r"""
         Test for instantaneous causality.

         The concept of instantaneous causality is described in chapters 3.6.3
@@ -1147,11 +2164,29 @@ class VECMResults:
         ----------
         .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.

-        .. |H0| replace:: H\\ :sub:`0`
+        .. |H0| replace:: H\ :sub:`0`

-        .. |H1| replace:: H\\ :sub:`1`
+        .. |H1| replace:: H\ :sub:`1`
         """
-        pass
+        exog = _deterministic_to_exog(
+            self.deterministic,
+            self.seasons,
+            nobs_tot=self.nobs + self.k_ar,
+            first_season=self.first_season,
+            seasons_centered=True,
+            exog=self.exog,
+            exog_coint=self.exog_coint,
+        )
+
+        # Note: JMulTi seems to be using k_ar+1 instead of k_ar
+        k, t, p = self.neqs, self.nobs, self.k_ar
+        # fit with trend "n" because all trend information is already in exog
+        var_results = VAR(self.y_all.T, exog).fit(maxlags=p, trend="n")
+        var_results._results.names = self.names
+        return var_results.test_inst_causality(causing=causing, signif=signif)
+
+    def irf(self, periods=10):
+        return irf.IRAnalysis(self, periods=periods, vecm=True)

     @cache_readonly
     def fittedvalues(self):
@@ -1163,7 +2198,16 @@ class VECMResults:
         fitted : array (nobs x neqs)
              The predicted in-sample values of the model's endogenous variables.
         """
-        pass
+        beta = self.beta
+        if self.det_coef_coint.size > 0:
+            beta = vstack((beta, self.det_coef_coint))
+        pi = np.dot(self.alpha, beta.T)
+
+        gamma = self.gamma
+        if self.det_coef.size > 0:
+            gamma = hstack((gamma, self.det_coef))
+        delta_y = np.dot(pi, self._y_lag1) + np.dot(gamma, self._delta_x)
+        return (delta_y + self._y_lag1[: self.neqs]).T

     @cache_readonly
     def resid(self):
@@ -1175,12 +2219,12 @@ class VECMResults:
         resid : array (nobs x neqs)
             The residuals.
         """
-        pass
+        return self.y_all.T[self.k_ar :] - self.fittedvalues

     def test_normality(self, signif=0.05):
-        """
+        r"""
         Test assumption of normal-distributed errors using Jarque-Bera-style
-        omnibus :math:`\\\\chi^2` test.
+        omnibus :math:`\\chi^2` test.

         Parameters
         ----------
@@ -1195,9 +2239,9 @@ class VECMResults:
         -----
         |H0| : data are generated by a Gaussian-distributed process

-        .. |H0| replace:: H\\ :sub:`0`
+        .. |H0| replace:: H\ :sub:`0`
         """
-        pass
+        return test_normality(self, signif=signif)

     def test_whiteness(self, nlags=10, signif=0.05, adjusted=False):
         """
@@ -1219,7 +2263,39 @@ class VECMResults:
         ----------
         .. [1] Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
         """
-        pass
+
+        statistic = 0
+        u = np.asarray(self.resid)
+        acov_list = _compute_acov(u, nlags)
+        # self.sigma_u instead of cov(0) is necessary to get the same
+        # result as JMulTi. The difference between the two is that sigma_u is
+        # calculated with the usual residuals while in cov(0) the
+        # residuals are demeaned. To me JMulTi's behaviour seems a bit strange
+        # because it uses the usual residuals here but demeaned residuals in
+        # the calculation of autocovariances with lag > 0. (used in the
+        # argument of trace() four rows below this comment.)
+        c0_inv = inv(self.sigma_u)  # instead of inv(cov(0))
+        if c0_inv.dtype == np.complex128 and np.all(np.imag(c0_inv) == 0):
+            c0_inv = np.real(c0_inv)
+        for t in range(1, nlags + 1):
+            ct = acov_list[t]
+            to_add = np.trace(ct.T @ c0_inv @ ct @ c0_inv)
+            if adjusted:
+                to_add /= self.nobs - t
+            statistic += to_add
+        statistic *= self.nobs ** 2 if adjusted else self.nobs
+
+        df = (
+            self.neqs ** 2 * (nlags - self.k_ar + 1)
+            - self.neqs * self.coint_rank
+        )
+        dist = scipy.stats.chi2(df)
+        pvalue = dist.sf(statistic)
+        crit_value = dist.ppf(1 - signif)
+
+        return WhitenessTestResults(
+            statistic, crit_value, pvalue, df, signif, nlags, adjusted
+        )
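
# Illustrative residual diagnostics with the methods above (not part of the
# patch).  ``res`` is assumed to be a fitted VECMResults instance; nlags
# should exceed k_ar so the degrees of freedom stay positive.
white = res.test_whiteness(nlags=12, adjusted=True)
normal = res.test_normality(signif=0.05)
print(white.pvalue, normal.pvalue)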

     def plot_data(self, with_presample=False):
         """
@@ -1231,7 +2307,10 @@ class VECMResults:
             If `False`, the pre-sample data (the first `k_ar` values) will
             not be plotted.
         """
-        pass
+        y = self.y_all if with_presample else self.y_all[:, self.k_ar :]
+        names = self.names
+        dates = self.dates if with_presample else self.dates[self.k_ar :]
+        plot.plot_mts(y.T, names=names, index=dates)

     def summary(self, alpha=0.05):
         """
@@ -1247,4 +2326,242 @@ class VECMResults:
         summary : :class:`statsmodels.iolib.summary.Summary`
             A summary containing information about estimated parameters.
         """
-        pass
+        from statsmodels.iolib.summary import summary_params
+
+        summary = Summary()
+
+        def make_table(
+            self,
+            params,
+            std_err,
+            t_values,
+            p_values,
+            conf_int,
+            mask,
+            names,
+            title,
+            strip_end=True,
+        ):
+            res = (
+                self,
+                params[mask],
+                std_err[mask],
+                t_values[mask],
+                p_values[mask],
+                conf_int[mask],
+            )
+            param_names = [
+                ".".join(name.split(".")[:-1]) if strip_end else name
+                for name in np.array(names)[mask].tolist()
+            ]
+            return summary_params(
+                res,
+                yname=None,
+                xname=param_names,
+                alpha=alpha,
+                use_t=False,
+                title=title,
+            )
+
+        # ---------------------------------------------------------------------
+        # Add tables with gamma and det_coef for each endogenous variable:
+        lagged_params_components = []
+        stderr_lagged_params_components = []
+        tvalues_lagged_params_components = []
+        pvalues_lagged_params_components = []
+        conf_int_lagged_params_components = []
+        if self.det_coef.size > 0:
+            lagged_params_components.append(self.det_coef.flatten(order="F"))
+            stderr_lagged_params_components.append(
+                self.stderr_det_coef.flatten(order="F")
+            )
+            tvalues_lagged_params_components.append(
+                self.tvalues_det_coef.flatten(order="F")
+            )
+            pvalues_lagged_params_components.append(
+                self.pvalues_det_coef.flatten(order="F")
+            )
+            conf_int = self.conf_int_det_coef(alpha=alpha)
+            lower = conf_int["lower"].flatten(order="F")
+            upper = conf_int["upper"].flatten(order="F")
+            conf_int_lagged_params_components.append(
+                np.column_stack((lower, upper))
+            )
+        if self.k_ar - 1 > 0:
+            lagged_params_components.append(self.gamma.flatten())
+            stderr_lagged_params_components.append(self.stderr_gamma.flatten())
+            tvalues_lagged_params_components.append(
+                self.tvalues_gamma.flatten()
+            )
+            pvalues_lagged_params_components.append(
+                self.pvalues_gamma.flatten()
+            )
+            conf_int = self.conf_int_gamma(alpha=alpha)
+            lower = conf_int["lower"].flatten()
+            upper = conf_int["upper"].flatten()
+            conf_int_lagged_params_components.append(
+                np.column_stack((lower, upper))
+            )
+
+        # if gamma or det_coef exists, then make a summary-table for them:
+        if len(lagged_params_components) != 0:
+            lagged_params = hstack(lagged_params_components)
+            stderr_lagged_params = hstack(stderr_lagged_params_components)
+            tvalues_lagged_params = hstack(tvalues_lagged_params_components)
+            pvalues_lagged_params = hstack(pvalues_lagged_params_components)
+            conf_int_lagged_params = vstack(conf_int_lagged_params_components)
+
+            for i in range(self.neqs):
+                masks = []
+                offset = 0
+                # 1. Deterministic terms outside cointegration relation
+                if "co" in self.deterministic:
+                    masks.append(offset + np.array(i, ndmin=1))
+                    offset += self.neqs
+                if self.seasons > 0:
+                    for _ in range(self.seasons - 1):
+                        masks.append(offset + np.array(i, ndmin=1))
+                        offset += self.neqs
+                if "lo" in self.deterministic:
+                    masks.append(offset + np.array(i, ndmin=1))
+                    offset += self.neqs
+                if self.exog is not None:
+                    for _ in range(self.exog.shape[1]):
+                        masks.append(offset + np.array(i, ndmin=1))
+                        offset += self.neqs
+                # 2. Lagged endogenous terms
+                if self.k_ar - 1 > 0:
+                    start = i * self.neqs * (self.k_ar - 1)
+                    end = (i + 1) * self.neqs * (self.k_ar - 1)
+                    masks.append(offset + np.arange(start, end))
+                    # offset += self.neqs**2 * (self.k_ar-1)
+
+                # Create the table
+                mask = np.concatenate(masks)
+                eq_name = self.model.endog_names[i]
+                title = (
+                    "Det. terms outside the coint. relation "
+                    + "& lagged endog. parameters for equation %s" % eq_name
+                )
+                table = make_table(
+                    self,
+                    lagged_params,
+                    stderr_lagged_params,
+                    tvalues_lagged_params,
+                    pvalues_lagged_params,
+                    conf_int_lagged_params,
+                    mask,
+                    self.model._lagged_param_names,
+                    title,
+                )
+                summary.tables.append(table)
+
+        # ---------------------------------------------------------------------
+        # Loading coefficients (alpha):
+        a = self.alpha.flatten()
+        se_a = self.stderr_alpha.flatten()
+        t_a = self.tvalues_alpha.flatten()
+        p_a = self.pvalues_alpha.flatten()
+        ci_a = self.conf_int_alpha(alpha=alpha)
+        lower = ci_a["lower"].flatten()
+        upper = ci_a["upper"].flatten()
+        ci_a = np.column_stack((lower, upper))
+        a_names = self.model._load_coef_param_names
+        alpha_masks = []
+        for i in range(self.neqs):
+            if self.coint_rank > 0:
+                start = i * self.coint_rank
+                end = start + self.coint_rank
+                mask = np.arange(start, end)
+
+            # Create the table
+            alpha_masks.append(mask)
+
+            eq_name = self.model.endog_names[i]
+            title = "Loading coefficients (alpha) for equation %s" % eq_name
+            table = make_table(
+                self, a, se_a, t_a, p_a, ci_a, mask, a_names, title
+            )
+            summary.tables.append(table)
+
+        # ---------------------------------------------------------------------
+        # Cointegration matrix/vector (beta) and det. terms inside coint. rel.:
+        coint_components = []
+        stderr_coint_components = []
+        tvalues_coint_components = []
+        pvalues_coint_components = []
+        conf_int_coint_components = []
+        if self.coint_rank > 0:
+            coint_components.append(self.beta.T.flatten())
+            stderr_coint_components.append(self.stderr_beta.T.flatten())
+            tvalues_coint_components.append(self.tvalues_beta.T.flatten())
+            pvalues_coint_components.append(self.pvalues_beta.T.flatten())
+            conf_int = self.conf_int_beta(alpha=alpha)
+            lower = conf_int["lower"].T.flatten()
+            upper = conf_int["upper"].T.flatten()
+            conf_int_coint_components.append(np.column_stack((lower, upper)))
+        if self.det_coef_coint.size > 0:
+            coint_components.append(self.det_coef_coint.flatten())
+            stderr_coint_components.append(
+                self.stderr_det_coef_coint.flatten()
+            )
+            tvalues_coint_components.append(
+                self.tvalues_det_coef_coint.flatten()
+            )
+            pvalues_coint_components.append(
+                self.pvalues_det_coef_coint.flatten()
+            )
+            conf_int = self.conf_int_det_coef_coint(alpha=alpha)
+            lower = conf_int["lower"].flatten()
+            upper = conf_int["upper"].flatten()
+            conf_int_coint_components.append(np.column_stack((lower, upper)))
+        coint = hstack(coint_components)
+        stderr_coint = hstack(stderr_coint_components)
+        tvalues_coint = hstack(tvalues_coint_components)
+        pvalues_coint = hstack(pvalues_coint_components)
+        conf_int_coint = vstack(conf_int_coint_components)
+        coint_names = self.model._coint_param_names
+
+        for i in range(self.coint_rank):
+            masks = []
+            offset = 0
+
+            # 1. Cointegration matrix (beta)
+            if self.coint_rank > 0:
+                start = i * self.neqs
+                end = start + self.neqs
+                masks.append(offset + np.arange(start, end))
+                offset += self.neqs * self.coint_rank
+
+            # 2. Deterministic terms inside cointegration relation
+            if "ci" in self.deterministic:
+                masks.append(offset + np.array(i, ndmin=1))
+                offset += self.coint_rank
+            if "li" in self.deterministic:
+                masks.append(offset + np.array(i, ndmin=1))
+                offset += self.coint_rank
+            if self.exog_coint is not None:
+                for _ in range(self.exog_coint.shape[1]):
+                    masks.append(offset + np.array(i, ndmin=1))
+                    offset += self.coint_rank
+
+            # Create the table
+            mask = np.concatenate(masks)
+            title = (
+                "Cointegration relations for "
+                + "loading-coefficients-column %d" % (i + 1)
+            )
+            table = make_table(
+                self,
+                coint,
+                stderr_coint,
+                tvalues_coint,
+                pvalues_coint,
+                conf_int_coint,
+                mask,
+                coint_names,
+                title,
+            )
+            summary.tables.append(table)
+
+        return summary
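
# Illustrative use of the summary builder above (not part of the patch);
# ``res`` is assumed to be a fitted VECMResults instance, e.g. VECM(y).fit().
print(res.summary(alpha=0.05))
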
diff --git a/statsmodels/tsa/x13.py b/statsmodels/tsa/x13.py
index 7f9b15cce..07a98a7d3 100644
--- a/statsmodels/tsa/x13.py
+++ b/statsmodels/tsa/x13.py
@@ -8,21 +8,27 @@ Many of the functions are called x12. However, they are also intended to work
 for x13. If this is not the case, it's a bug.
 """
 from statsmodels.compat.pandas import deprecate_kwarg
+
 import os
 import subprocess
 import tempfile
 import re
 from warnings import warn
+
 import pandas as pd
+
 from statsmodels.tools.tools import Bunch
-from statsmodels.tools.sm_exceptions import X13NotFoundError, IOWarning, X13Error, X13Warning
-__all__ = ['x13_arima_select_order', 'x13_arima_analysis']
-_binary_names = ('x13as.exe', 'x13as', 'x12a.exe', 'x12a', 'x13as_ascii',
-    'x13as_html')
+from statsmodels.tools.sm_exceptions import (X13NotFoundError,
+                                             IOWarning, X13Error,
+                                             X13Warning)

+__all__ = ["x13_arima_select_order", "x13_arima_analysis"]

-class _freq_to_period:
+_binary_names = ('x13as.exe', 'x13as', 'x12a.exe', 'x12a',
+                 'x13as_ascii', 'x13as_html')

+
+class _freq_to_period:
     def __getitem__(self, key):
         if key.startswith('M'):
             return 12
@@ -33,9 +39,10 @@ class _freq_to_period:


 _freq_to_period = _freq_to_period()
-_period_to_freq = {(12): 'M', (4): 'Q'}
-_log_to_x12 = {(True): 'log', (False): 'none', None: 'auto'}
-_bool_to_yes_no = lambda x: 'yes' if x else 'no'
+
+_period_to_freq = {12: 'M', 4: 'Q'}
+_log_to_x12 = {True: 'log', False: 'none', None: 'auto'}
+_bool_to_yes_no = lambda x: 'yes' if x else 'no'  # noqa:E731


 def _find_x12(x12path=None, prefer_x13=True):
@@ -45,7 +52,43 @@ def _find_x12(x12path=None, prefer_x13=True):
     X13PATH must be defined. If prefer_x13 is True, only X13PATH is searched
     for. If it is false, only X12PATH is searched for.
     """
-    pass
+    global _binary_names
+    if x12path is not None and x12path.endswith(_binary_names):
+        # remove binary from path if path is not a directory
+        if not os.path.isdir(x12path):
+            x12path = os.path.dirname(x12path)
+
+    if not prefer_x13:  # search for x12 first
+        _binary_names = _binary_names[::-1]
+        if x12path is None:
+            x12path = os.getenv("X12PATH", "")
+        if not x12path:
+            x12path = os.getenv("X13PATH", "")
+    elif x12path is None:
+        x12path = os.getenv("X13PATH", "")
+        if not x12path:
+            x12path = os.getenv("X12PATH", "")
+
+    for binary in _binary_names:
+        x12 = os.path.join(x12path, binary)
+        try:
+            subprocess.check_call(x12, stdout=subprocess.PIPE,
+                                  stderr=subprocess.PIPE)
+            return x12
+        except OSError:
+            pass
+
+    else:
+        return False
+
+
+def _check_x12(x12path=None):
+    x12path = _find_x12(x12path)
+    if not x12path:
+        raise X13NotFoundError("x12a and x13as not found on path. Give the "
+                               "path, put them on PATH, or set the "
+                               "X12PATH or X13PATH environmental variable.")
+    return x12path
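
# Illustrative setup (not part of the patch): the lookup above falls back to
# the X13PATH / X12PATH environment variables, so pointing X13PATH at the
# directory holding the x13as binary is usually enough.  The path below is a
# placeholder.
import os
os.environ["X13PATH"] = "/opt/x13as"  # hypothetical install directory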


 def _clean_order(order):
@@ -53,7 +96,97 @@ def _clean_order(order):
     Takes something like (1 1 0)(0 1 1) and returns an arma order, sarma
     order tuple. Also accepts (1 1 0) and returns an arma order and (0, 0, 0)
     """
-    pass
+    order = re.findall(r"\([0-9 ]*?\)", order)
+
+    def clean(x):
+        return tuple(map(int, re.sub("[()]", "", x).split(" ")))
+
+    if len(order) > 1:
+        order, sorder = map(clean, order)
+    else:
+        order = clean(order[0])
+        sorder = (0, 0, 0)
+
+    return order, sorder
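
# Quick behaviour check for the helper above (not part of the patch); the
# import path matches the file being patched.
from statsmodels.tsa.x13 import _clean_order

assert _clean_order("(1 1 0)(0 1 1)") == ((1, 1, 0), (0, 1, 1))
assert _clean_order("(2 1 0)") == ((2, 1, 0), (0, 0, 0))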
+
+
+def run_spec(x12path, specpath, outname=None, meta=False, datameta=False):
+
+    if meta and datameta:
+        raise ValueError("Cannot specify both meta and datameta.")
+    if meta:
+        args = [x12path, "-m " + specpath]
+    elif datameta:
+        args = [x12path, "-d " + specpath]
+    else:
+        args = [x12path, specpath]
+
+    if outname:
+        args += [outname]
+
+    return subprocess.Popen(args, stdout=subprocess.PIPE,
+                            stderr=subprocess.STDOUT)
+
+
+def _make_automdl_options(maxorder, maxdiff, diff):
+    options = "\n"
+    options += "maxorder = ({0} {1})\n".format(maxorder[0], maxorder[1])
+    if maxdiff is not None:  # maxdiff always takes precedence
+        options += "maxdiff = ({0} {1})\n".format(maxdiff[0], maxdiff[1])
+    else:
+        options += "diff = ({0} {1})\n".format(diff[0], diff[1])
+    return options
+
+
+def _make_var_names(exog):
+    if hasattr(exog, "name"):
+        var_names = [exog.name]
+    elif hasattr(exog, "columns"):
+        var_names = exog.columns
+    else:
+        raise ValueError("exog is not a Series or DataFrame or is unnamed.")
+    try:
+        var_names = " ".join(var_names)
+    except TypeError:  # cannot have names that are numbers, pandas default
+        from statsmodels.base.data import _make_exog_names
+        if exog.ndim == 1:
+            var_names = "x1"
+        else:
+            var_names = " ".join(_make_exog_names(exog))
+    return var_names
+
+
+def _make_regression_options(trading, exog):
+    if not trading and exog is None:  # start regression spec
+        return ""
+
+    reg_spec = "regression{\n"
+    if trading:
+        reg_spec += "    variables = (td)\n"
+    if exog is not None:
+        var_names = _make_var_names(exog)
+        reg_spec += "    user = ({0})\n".format(var_names)
+        reg_spec += "    data = ({0})\n".format("\n".join(map(str,
+                                                exog.values.ravel().tolist())))
+
+    reg_spec += "}\n"  # close out regression spec
+    return reg_spec
+
+
+def _make_forecast_options(forecast_periods):
+    if forecast_periods is None:
+        return ""
+    forecast_spec = "forecast{\n"
+    forecast_spec += "maxlead = ({0})\n}}\n".format(forecast_periods)
+    return forecast_spec
+
+
+def _check_errors(errors):
+    errors = errors[errors.find("spc:")+4:].strip()
+    if errors and 'ERROR' in errors:
+        raise X13Error(errors)
+    elif errors and 'WARNING' in errors:
+        warn(errors, X13Warning)


 def _convert_out_to_series(x, dates, name):
@@ -61,11 +194,39 @@ def _convert_out_to_series(x, dates, name):
     Convert x to a DataFrame where x is a string in the format given by
     x-13arima-seats output.
     """
-    pass
+    from io import StringIO
+    from pandas import read_csv
+    out = read_csv(StringIO(x), skiprows=2,
+                   header=None, sep='\t', engine='python')
+    return out.set_index(dates).rename(columns={1: name})[name]
+
+
+def _open_and_read(fname):
+    # opens a file, reads it, and make sure it's closed
+    with open(fname, 'r', encoding="utf-8") as fin:
+        fout = fin.read()
+    return fout


 class Spec:
-    pass
+    @property
+    def spec_name(self):
+        return self.__class__.__name__.replace("Spec", "")
+
+    def create_spec(self, **kwargs):
+        spec = """{name} {{
+        {options}
+        }}
+        """
+        return spec.format(name=self.spec_name,
+                           options=self.options)
+
+    def set_options(self, **kwargs):
+        options = ""
+        for key, value in kwargs.items():
+            options += "{0}={1}\n".format(key, value)
+            self.__dict__.update({key: value})
+        self.options = options


 class SeriesSpec(Spec):
@@ -99,25 +260,71 @@ class SeriesSpec(Spec):
     saveprecision
     trimzero
     """
-
     def __init__(self, data, name='Unnamed Series', appendbcst=False,
-        appendfcst=False, comptype=None, compwt=1, decimals=0, modelspan=(),
-        period=12, precision=0, to_print=[], to_save=[], span=(), start=(1,
-        1), title='', series_type=None, divpower=None, missingcode=-99999,
-        missingval=1000000000):
-        appendbcst, appendfcst = map(_bool_to_yes_no, [appendbcst, appendfcst])
-        series_name = '"{0}"'.format(name[:64])
-        title = '"{0}"'.format(title[:79])
-        self.set_options(data=data, appendbcst=appendbcst, appendfcst=
-            appendfcst, period=period, start=start, title=title, name=
-            series_name)
+                 appendfcst=False,
+                 comptype=None, compwt=1, decimals=0, modelspan=(),
+                 period=12, precision=0, to_print=[], to_save=[], span=(),
+                 start=(1, 1), title='', series_type=None, divpower=None,
+                 missingcode=-99999, missingval=1000000000):
+
+        appendbcst, appendfcst = map(_bool_to_yes_no, [appendbcst,
+                                                       appendfcst,
+                                                       ])
+
+        series_name = "\"{0}\"".format(name[:64])  # trim to 64 characters
+        title = "\"{0}\"".format(title[:79])  # trim to 79 characters
+        self.set_options(data=data, appendbcst=appendbcst,
+                         appendfcst=appendfcst, period=period, start=start,
+                         title=title, name=series_name,
+                         )
+
+
+def pandas_to_series_spec(x):
+    # from statsmodels.tools.data import _check_period_index
+    # check_period_index(x)
+    if hasattr(x, 'columns'):  # convert to series
+        if len(x.columns) > 1:
+            raise ValueError("Does not handle DataFrame with more than one "
+                             "column")
+        x = x[x.columns[0]]
+
+    data = "({0})".format("\n".join(map(str, x.values.tolist())))
+
+    # get periodicity
+    # get start / first data
+    # give it a title
+    try:
+        period = _freq_to_period[x.index.freqstr]
+    except (AttributeError, ValueError):
+        from pandas.tseries.api import infer_freq
+        period = _freq_to_period[infer_freq(x.index)]
+    start_date = x.index[0]
+    if period == 12:
+        year, stperiod = start_date.year, start_date.month
+    elif period == 4:
+        year, stperiod = start_date.year, start_date.quarter
+    else:  # pragma: no cover
+        raise ValueError("Only monthly and quarterly periods are supported."
+                         " Please report or send a pull request if you want "
+                         "this extended.")
+
+    if hasattr(x, 'name'):
+        name = x.name or "Unnamed Series"
+    else:
+        name = 'Unnamed Series'
+    series_spec = SeriesSpec(data=data, name=name, period=period,
+                             title=name, start="{0}.{1}".format(year,
+                                                                stperiod))
+    return series_spec
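As a quick check of the conversion: a monthly PeriodIndex keeps freqstr == 'M', so the lookup above resolves without falling back to infer_freq, and the start date is encoded as "year.period". Illustrative only, and it assumes the module-level _freq_to_period table maps 'M' to 12, as the period == 12 branch implies:

import pandas as pd
from statsmodels.tsa.x13 import pandas_to_series_spec

y = pd.Series([112.0, 118.0, 132.0],
              index=pd.period_range("1949-01", periods=3, freq="M"),
              name="airline")
spec = pandas_to_series_spec(y)
print(spec.period)  # 12
print(spec.start)   # "1949.1"
print(spec.name)    # '"airline"' (quoted and trimmed to 64 characters)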


 @deprecate_kwarg('forecast_years', 'forecast_periods')
 def x13_arima_analysis(endog, maxorder=(2, 1), maxdiff=(2, 1), diff=None,
-    exog=None, log=None, outlier=True, trading=False, forecast_periods=None,
-    retspec=False, speconly=False, start=None, freq=None, print_stdout=
-    False, x12path=None, prefer_x13=True, tempdir=None):
+                       exog=None, log=None, outlier=True, trading=False,
+                       forecast_periods=None, retspec=False,
+                       speconly=False, start=None, freq=None,
+                       print_stdout=False, x12path=None, prefer_x13=True,
+                       tempdir=None):
     """
     Perform x13-arima analysis for monthly or quarterly data.

@@ -209,14 +416,89 @@ def x13_arima_analysis(endog, maxorder=(2, 1), maxdiff=(2, 1), diff=None,
     directory, invoking X12/X13 in a subprocess, and reading the output
     back in.
     """
-    pass
+    x12path = _check_x12(x12path)
+
+    if not isinstance(endog, (pd.DataFrame, pd.Series)):
+        if start is None or freq is None:
+            raise ValueError("start and freq cannot be none if endog is not "
+                             "a pandas object")
+        endog = pd.Series(endog, index=pd.date_range(start=start,
+                                                     periods=len(endog),
+                                                     freq=freq))
+    spec_obj = pandas_to_series_spec(endog)
+    spec = spec_obj.create_spec()
+    spec += "transform{{function={0}}}\n".format(_log_to_x12[log])
+    if outlier:
+        spec += "outlier{}\n"
+    options = _make_automdl_options(maxorder, maxdiff, diff)
+    spec += "automdl{{{0}}}\n".format(options)
+    spec += _make_regression_options(trading, exog)
+    spec += _make_forecast_options(forecast_periods)
+    spec += "x11{ save=(d11 d12 d13) }"
+    if speconly:
+        return spec
+    # write it to a tempfile
+    # TODO: make this more robust - give the user some control?
+    ftempin = tempfile.NamedTemporaryFile(delete=False,
+                                          suffix='.spc',
+                                          dir=tempdir)
+    ftempout = tempfile.NamedTemporaryFile(delete=False, dir=tempdir)
+    try:
+        ftempin.write(spec.encode('utf8'))
+        ftempin.close()
+        ftempout.close()
+        # call x12 arima
+        p = run_spec(x12path, ftempin.name[:-4], ftempout.name)
+        p.wait()
+        stdout = p.stdout.read()
+        if print_stdout:
+            print(stdout)  # the pipe was already drained above
+        # check for errors
+        errors = _open_and_read(ftempout.name + '.err')
+        _check_errors(errors)
+
+        # read in results
+        results = _open_and_read(ftempout.name + '.out')
+        seasadj = _open_and_read(ftempout.name + '.d11')
+        trend = _open_and_read(ftempout.name + '.d12')
+        irregular = _open_and_read(ftempout.name + '.d13')
+    finally:
+        try:  # sometimes this gives a permission denied error?
+            #   not sure why. no process should have these open
+            os.remove(ftempin.name)
+            os.remove(ftempout.name)
+        except OSError:
+            if os.path.exists(ftempin.name):
+                warn("Failed to delete resource {0}".format(ftempin.name),
+                     IOWarning)
+            if os.path.exists(ftempout.name):
+                warn("Failed to delete resource {0}".format(ftempout.name),
+                     IOWarning)
+
+    seasadj = _convert_out_to_series(seasadj, endog.index, 'seasadj')
+    trend = _convert_out_to_series(trend, endog.index, 'trend')
+    irregular = _convert_out_to_series(irregular, endog.index, 'irregular')
+
+    # NOTE: there is not likely anything in stdout that's not in results
+    #       so may be safe to just suppress and remove it
+    if not retspec:
+        res = X13ArimaAnalysisResult(observed=endog, results=results,
+                                     seasadj=seasadj, trend=trend,
+                                     irregular=irregular, stdout=stdout)
+    else:
+        res = X13ArimaAnalysisResult(observed=endog, results=results,
+                                     seasadj=seasadj, trend=trend,
+                                     irregular=irregular, stdout=stdout,
+                                     spec=spec)
+    return res
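End-to-end usage sketch. It assumes the x13as (or x12a) binary is installed and discoverable, e.g. via the X13PATH environment variable or the x12path argument; without it the path check fails before any spec file is written:

import numpy as np
import pandas as pd
from statsmodels.tsa.x13 import x13_arima_analysis

# synthetic monthly data: a linear trend plus an annual seasonal cycle
idx = pd.period_range("2010-01", periods=48, freq="M")
y = pd.Series(100 + np.arange(48) + 10 * np.sin(2 * np.pi * np.arange(48) / 12),
              index=idx, name="demand")

res = x13_arima_analysis(y, outlier=True, forecast_periods=12)
print(res.seasadj.head())    # seasonally adjusted series (X-11 table d11)
print(res.trend.head())      # trend-cycle component (d12)
print(res.irregular.head())  # irregular component (d13)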


 @deprecate_kwarg('forecast_years', 'forecast_periods')
-def x13_arima_select_order(endog, maxorder=(2, 1), maxdiff=(2, 1), diff=
-    None, exog=None, log=None, outlier=True, trading=False,
-    forecast_periods=None, start=None, freq=None, print_stdout=False,
-    x12path=None, prefer_x13=True, tempdir=None):
+def x13_arima_select_order(endog, maxorder=(2, 1), maxdiff=(2, 1), diff=None,
+                           exog=None, log=None, outlier=True, trading=False,
+                           forecast_periods=None,
+                           start=None, freq=None, print_stdout=False,
+                           x12path=None, prefer_x13=True, tempdir=None):
     """
     Perform automatic seasonal ARIMA order identification using x12/x13 ARIMA.

@@ -298,11 +580,44 @@ def x13_arima_select_order(endog, maxorder=(2, 1), maxdiff=(2, 1), diff=
     directory, invoking X12/X13 in a subprocess, and reading the output back
     in.
     """
-    pass
+    results = x13_arima_analysis(endog, x12path=x12path, exog=exog, log=log,
+                                 outlier=outlier, trading=trading,
+                                 forecast_periods=forecast_periods,
+                                 maxorder=maxorder, maxdiff=maxdiff, diff=diff,
+                                 start=start, freq=freq, prefer_x13=prefer_x13,
+                                 tempdir=tempdir)
+    model = re.search("(?<=Final automatic model choice : ).*",
+                      results.results)
+    order = model.group()
+    if re.search("Mean is not significant", results.results):
+        include_mean = False
+    elif re.search("Constant", results.results):
+        include_mean = True
+    else:
+        include_mean = False
+    order, sorder = _clean_order(order)
+    res = Bunch(order=order, sorder=sorder, include_mean=include_mean,
+                results=results.results, stdout=results.stdout)
+    return res
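Usage sketch for the order selector, reusing y from the sketch above and with the same external-binary caveat. The returned Bunch carries the parsed orders, whether X-13 kept a constant term, and the raw report for inspection:

from statsmodels.tsa.x13 import x13_arima_select_order

sel = x13_arima_select_order(y, maxorder=(2, 1), maxdiff=(2, 1))
print(sel.order)          # non-seasonal order, e.g. (p, d, q)
print(sel.sorder)         # seasonal order, e.g. (P, D, Q)
print(sel.include_mean)   # parsed from the 'Constant' / 'Mean is not significant' lines
print(sel.results[:200])  # start of the raw X-13 report that was searched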


 class X13ArimaAnalysisResult:
-
     def __init__(self, **kwargs):
         for key, value in kwargs.items():
             setattr(self, key, value)
+
+    def plot(self):
+        from statsmodels.graphics.utils import _import_mpl
+        plt = _import_mpl()
+        fig, axes = plt.subplots(4, 1, sharex=True)
+        self.observed.plot(ax=axes[0], legend=False)
+        axes[0].set_ylabel('Observed')
+        self.seasadj.plot(ax=axes[1], legend=False)
+        axes[1].set_ylabel('Seas. Adjusted')
+        self.trend.plot(ax=axes[2], legend=False)
+        axes[2].set_ylabel('Trend')
+        self.irregular.plot(ax=axes[3], legend=False)
+        axes[3].set_ylabel('Irregular')
+
+        fig.tight_layout()
+        return fig
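And the companion plot, a quick visual check of the decomposition (requires matplotlib; `res` is the result object from the x13_arima_analysis sketch above):

fig = res.plot()  # observed, seasonally adjusted, trend, and irregular panels
fig.savefig("x13_decomposition.png")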