Bug 1621350 - land NSS 710d10a72934 UPGRADE_NSS_RELEASE, r=jcj
author     Kevin Jacobs <kjacobs@mozilla.com>
date       Tue, 10 Mar 2020 21:35:56 +0000
changeset  518044  3d799e2b89192d74a19961f4211051130ae4be77
parent     518043  84c74c8221024fbd312b08f495e6ecb2fb407ac5
child      518045  597d165cd5d3f66bb9b68d9adc7b8f4d6777d3de
push id    37204
push user  rmaries@mozilla.com
push date  Wed, 11 Mar 2020 15:45:15 +0000
reviewers  jcj
bugs       1621350, 1618915, 1618739, 1619056, 1619102, 1612493, 1614183, 1618400, 1617533
milestone  76.0a1
Bug 1621350 - land NSS 710d10a72934 UPGRADE_NSS_RELEASE, r=jcj

2020-03-10  Kevin Jacobs  <kjacobs@mozilla.com>

    * lib/ssl/ssl3exthandle.c:
    Bug 1618915 - Fix UBSAN issue in ssl_ParseSessionTicket r=jcj,bbeurdouche
    [710d10a72934] [tip]

2020-03-09  Kevin Jacobs  <kjacobs@mozilla.com>

    * lib/ssl/ssl3exthandle.c:
    Bug 1618739 - Don't assert fuzzer behavior in SSL_ParseSessionTicket r=jcj
    [12fc91fad84a]

2020-03-03  Benjamin Beurdouche  <bbeurdouche@mozilla.com>

    * readme.md:
    Bug 1619056 - Update README: TLS 1.3 is not experimental anymore. r=jcj
    [08944e50dce0]

2020-03-09  Kevin Jacobs  <kjacobs@mozilla.com>

    * gtests/ssl_gtest/ssl_version_unittest.cc, lib/ssl/sslexp.h,
    lib/ssl/sslimpl.h, lib/ssl/sslsock.c, lib/ssl/tls13exthandle.c:
    Bug 1619102 - Add workaround option to include both DTLS and TLS versions
    in DTLS supported_versions. r=mt

    Add an experimental function for enabling a DTLS 1.3 supported_versions
    compatibility workaround.
    [53803dc4628f]

2020-03-09  Benjamin Beurdouche  <bbeurdouche@mozilla.com>

    * automation/taskcluster/scripts/run_hacl.sh,
    lib/freebl/verified/Hacl_Poly1305_128.c,
    lib/freebl/verified/Hacl_Poly1305_256.c:
    Bug 1612493 - Fix Firefox build for Windows 2012 x64. r=kjacobs
    [7e09cdab32d0]

2020-03-02  Kevin Jacobs  <kjacobs@mozilla.com>

    * lib/freebl/blinit.c:
    Bug 1614183 - Fixup, clang-format. r=me
    [b17a367b83de] [NSS_3_51_BETA1]

2020-03-02  Giulio Benetti  <giulio.benetti@benettiengineering.com>

    * lib/freebl/blinit.c:
    Bug 1614183 - Check if PPC __has_include(<sys/auxv.h>). r=kjacobs

    Some build environments don't provide <sys/auxv.h>, which causes a build
    failure, so check whether that header exists by using the __has_include()
    helper.

    Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
    [bb7c46049f26]

2020-03-02  Kurt Miller  <kurt@intricatesoftware.com>

    * lib/freebl/blinit.c:
    Bug 1618400 - Fix unused variable 'getauxval' on OpenBSD/arm64 r=jcj

    https://bugzilla.mozilla.org/show_bug.cgi?id=1618400
    [2c989888dee7]

2020-02-28  Benjamin Beurdouche  <bbeurdouche@mozilla.com>

    * automation/taskcluster/graph/src/extend.js, coreconf/arch.mk,
    coreconf/config.mk, lib/freebl/Makefile, lib/freebl/blapii.h,
    lib/freebl/blinit.c, lib/freebl/chacha20poly1305.c, lib/freebl/freebl.gyp,
    lib/freebl/verified/Hacl_Chacha20Poly1305_256.c,
    lib/freebl/verified/Hacl_Chacha20Poly1305_256.h,
    lib/freebl/verified/Hacl_Chacha20_Vec256.c,
    lib/freebl/verified/Hacl_Chacha20_Vec256.h,
    lib/freebl/verified/Hacl_Poly1305_256.c,
    lib/freebl/verified/Hacl_Poly1305_256.h, nss-tool/hw-support.c:
    Bug 1612493 - Support for HACL* AVX2 code for Chacha20, Poly1305 and
    Chacha20Poly1305. r=kjacobs

    *** Bug 1612493 - Import AVX2 code from HACL*
    *** Bug 1612493 - Add CPU detection for AVX2, BMI1, BMI2, FMA, MOVBE
    *** Bug 1612493 - New flag NSS_DISABLE_AVX2 for freebl/Makefile and freebl.gyp
    *** Bug 1612493 - Disable use of AVX2 on GCC 4.4 which doesn't support -mavx2
    *** Bug 1612493 - Disable tests when the platform doesn't have support for AVX2
    [d5deac55f543]

    * automation/taskcluster/scripts/run_hacl.sh,
    lib/freebl/verified/Hacl_Chacha20.c,
    lib/freebl/verified/Hacl_Chacha20Poly1305_128.c,
    lib/freebl/verified/Hacl_Chacha20Poly1305_32.c,
    lib/freebl/verified/Hacl_Chacha20_Vec128.c,
    lib/freebl/verified/Hacl_Curve25519_51.c,
    lib/freebl/verified/Hacl_Kremlib.h,
    lib/freebl/verified/Hacl_Poly1305_128.c,
    lib/freebl/verified/Hacl_Poly1305_32.c,
    lib/freebl/verified/kremlin/include/kremlin/internal/types.h,
    lib/freebl/verified/kremlin/kremlib/dist/minimal/FStar_UInt128.h,
    lib/freebl/verified/kremlin/kremlib/dist/minimal/FStar_UInt128_Verified.h,
    lib/freebl/verified/kremlin/kremlib/dist/minimal/FStar_UInt_8_16_32_64.h,
    lib/freebl/verified/kremlin/kremlib/dist/minimal/LowStar_Endianness.h,
    lib/freebl/verified/kremlin/kremlib/dist/minimal/fstar_uint128_gcc64.h,
    lib/freebl/verified/libintvector.h:
    Bug 1617533 - Update of HACL* after libintvector.h and coding style changes.
    r=kjacobs

    *** Bug 1617533 - Clang format
    *** Bug 1617533 - Update HACL* commit for job in Taskcluster
    *** Bug 1617533 - Update HACL* Kremlin code
    [b6677ae9067e]

Differential Revision: https://phabricator.services.mozilla.com/D66264
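For orientation, the bug 1619102 change adds an experimental per-socket option for the DTLS 1.3 supported_versions compatibility workaround. The following minimal sketch, which is not part of this changeset, shows how a DTLS client might opt in; it assumes the SSL_SetDtls13VersionWorkaround wrapper declared in lib/ssl/sslexp.h, uses a hypothetical helper name, and elides socket setup and most error handling:

/* Sketch only: enable the DTLS 1.3 supported_versions compatibility
 * workaround on a DTLS client socket, as exercised by the new
 * Dtls13VersionWorkaround gtest. `fd` is assumed to be an
 * already-imported DTLS PRFileDesc. */
#include "ssl.h"
#include "sslexp.h"
#include "sslproto.h"

static SECStatus
enable_dtls13_version_workaround(PRFileDesc *fd)
{
    /* Offer DTLS 1.0 through DTLS 1.3 (draft) on this socket; DTLS
     * ranges are expressed with the TLS_1_x constants. */
    SSLVersionRange vrange = { SSL_LIBRARY_VERSION_TLS_1_1,
                               SSL_LIBRARY_VERSION_TLS_1_3 };
    if (SSL_VersionRangeSet(fd, &vrange) != SECSuccess) {
        return SECFailure;
    }
    /* Experimental API from sslexp.h: also advertise the TLS codepoints
     * in the DTLS supported_versions extension for interop with peers
     * that expect them. */
    return SSL_SetDtls13VersionWorkaround(fd, PR_TRUE);
}

With the workaround enabled, the client's supported_versions extension carries both the DTLS and TLS codepoints, which is what the new Dtls13VersionWorkaround gtest further down in this patch verifies.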
old-configure.in
security/nss/TAG-INFO
security/nss/automation/abi-check/previous-nss-release
security/nss/automation/taskcluster/graph/src/extend.js
security/nss/automation/taskcluster/scripts/run_hacl.sh
security/nss/coreconf/arch.mk
security/nss/coreconf/config.mk
security/nss/coreconf/coreconf.dep
security/nss/gtests/ssl_gtest/ssl_version_unittest.cc
security/nss/lib/freebl/Makefile
security/nss/lib/freebl/blapii.h
security/nss/lib/freebl/blinit.c
security/nss/lib/freebl/chacha20poly1305.c
security/nss/lib/freebl/freebl.gyp
security/nss/lib/freebl/verified/Hacl_Chacha20.c
security/nss/lib/freebl/verified/Hacl_Chacha20Poly1305_128.c
security/nss/lib/freebl/verified/Hacl_Chacha20Poly1305_256.c
security/nss/lib/freebl/verified/Hacl_Chacha20Poly1305_256.h
security/nss/lib/freebl/verified/Hacl_Chacha20Poly1305_32.c
security/nss/lib/freebl/verified/Hacl_Chacha20_Vec128.c
security/nss/lib/freebl/verified/Hacl_Chacha20_Vec256.c
security/nss/lib/freebl/verified/Hacl_Chacha20_Vec256.h
security/nss/lib/freebl/verified/Hacl_Curve25519_51.c
security/nss/lib/freebl/verified/Hacl_Kremlib.h
security/nss/lib/freebl/verified/Hacl_Poly1305_128.c
security/nss/lib/freebl/verified/Hacl_Poly1305_256.c
security/nss/lib/freebl/verified/Hacl_Poly1305_256.h
security/nss/lib/freebl/verified/Hacl_Poly1305_32.c
security/nss/lib/freebl/verified/kremlin/include/kremlin/internal/types.h
security/nss/lib/freebl/verified/kremlin/kremlib/dist/minimal/FStar_UInt128.h
security/nss/lib/freebl/verified/kremlin/kremlib/dist/minimal/FStar_UInt128_Verified.h
security/nss/lib/freebl/verified/kremlin/kremlib/dist/minimal/FStar_UInt_8_16_32_64.h
security/nss/lib/freebl/verified/kremlin/kremlib/dist/minimal/LowStar_Endianness.h
security/nss/lib/freebl/verified/kremlin/kremlib/dist/minimal/fstar_uint128_gcc64.h
security/nss/lib/freebl/verified/libintvector.h
security/nss/lib/nss/nss.h
security/nss/lib/softoken/softkver.h
security/nss/lib/ssl/ssl3exthandle.c
security/nss/lib/ssl/sslexp.h
security/nss/lib/ssl/sslimpl.h
security/nss/lib/ssl/sslsock.c
security/nss/lib/ssl/tls13exthandle.c
security/nss/lib/util/nssutil.h
security/nss/nss-tool/hw-support.c
security/nss/readme.md
--- a/old-configure.in
+++ b/old-configure.in
@@ -1482,17 +1482,17 @@ dnl = If NSS was not detected in the sys
 dnl = use the one in the source tree (mozilla/security/nss)
 dnl ========================================================
 
 MOZ_ARG_WITH_BOOL(system-nss,
 [  --with-system-nss       Use system installed NSS],
     _USE_SYSTEM_NSS=1 )
 
 if test -n "$_USE_SYSTEM_NSS"; then
-    AM_PATH_NSS(3.51, [MOZ_SYSTEM_NSS=1], [AC_MSG_ERROR([you don't have NSS installed or your version is too old])])
+    AM_PATH_NSS(3.52, [MOZ_SYSTEM_NSS=1], [AC_MSG_ERROR([you don't have NSS installed or your version is too old])])
 fi
 
 NSS_CFLAGS="$NSS_CFLAGS -I${DIST}/include/nss"
 if test -z "$MOZ_SYSTEM_NSS"; then
    case "${OS_ARCH}" in
         # Only few platforms have been tested with GYP
         WINNT|Darwin|Linux|DragonFly|FreeBSD|NetBSD|OpenBSD|SunOS)
             ;;
--- a/security/nss/TAG-INFO
+++ b/security/nss/TAG-INFO
@@ -1,1 +1,1 @@
-NSS_3_51_RTM
\ No newline at end of file
+710d10a72934
\ No newline at end of file
--- a/security/nss/automation/abi-check/previous-nss-release
+++ b/security/nss/automation/abi-check/previous-nss-release
@@ -1,1 +1,1 @@
-NSS_3_50_BRANCH
+NSS_3_51_BRANCH
--- a/security/nss/automation/taskcluster/graph/src/extend.js
+++ b/security/nss/automation/taskcluster/graph/src/extend.js
@@ -96,17 +96,17 @@ queue.filter(task => {
   // Only old make builds have -Ddisable_libpkix=0 and can run chain tests.
   if (task.tests == "chains" && task.collection != "make") {
     return false;
   }
 
   // Don't run all additional hardware tests on ARM.
   if (task.group == "Cipher" && task.platform == "aarch64" && task.env &&
       (task.env.NSS_DISABLE_PCLMUL == "1" || task.env.NSS_DISABLE_HW_AES == "1"
-       || task.env.NSS_DISABLE_AVX == "1")) {
+       || task.env.NSS_DISABLE_AVX == "1" || task.env.NSS_DISABLE_AVX2 == "1")) {
     return false;
   }
 
   // Don't run DBM builds on aarch64.
   if (task.group == "DBM" && task.platform == "aarch64") {
     return false;
   }
 
@@ -1010,16 +1010,20 @@ function scheduleTests(task_build, task_
     name: "Cipher tests", symbol: "NoPCLMUL", tests: "cipher",
     env: {NSS_DISABLE_PCLMUL: "1"}, group: "Cipher"
   }));
   queue.scheduleTask(merge(cert_base_long, {
     name: "Cipher tests", symbol: "NoAVX", tests: "cipher",
     env: {NSS_DISABLE_AVX: "1"}, group: "Cipher"
   }));
   queue.scheduleTask(merge(cert_base_long, {
+    name: "Cipher tests", symbol: "NoAVX2", tests: "cipher",
+    env: {NSS_DISABLE_AVX2: "1"}, group: "Cipher"
+  }));
+  queue.scheduleTask(merge(cert_base_long, {
     name: "Cipher tests", symbol: "NoSSSE3|NEON", tests: "cipher",
     env: {
       NSS_DISABLE_ARM_NEON: "1",
       NSS_DISABLE_SSSE3: "1"
     }, group: "Cipher"
   }));
   queue.scheduleTask(merge(cert_base_long, {
     name: "Cipher tests", symbol: "NoSSE4.1", tests: "cipher",
--- a/security/nss/automation/taskcluster/scripts/run_hacl.sh
+++ b/security/nss/automation/taskcluster/scripts/run_hacl.sh
@@ -8,17 +8,17 @@ fi
 
 set -e -x -v
 
 # The docker image this is running in has NSS sources.
 # Get the HACL* source, containing a snapshot of the C code, extracted on the
 # HACL CI.
 # When bug 1593647 is resolved, extract the code on CI again.
 git clone -q "https://github.com/project-everest/hacl-star" ~/hacl-star
-git -C ~/hacl-star checkout -q 186a985597d57e3b587ceb0ef6deb0b5de706ae2
+git -C ~/hacl-star checkout -q 079854e0072041d60859b6d8af2743bc6a37dc05
 
 # Format the C snapshot.
 cd ~/hacl-star/dist/mozilla
 cp ~/nss/.clang-format .
 find . -type f -name '*.[ch]' -exec clang-format -i {} \+
 cd ~/hacl-star/dist/kremlin
 cp ~/nss/.clang-format .
 find . -type f -name '*.[ch]' -exec clang-format -i {} \+
--- a/security/nss/coreconf/arch.mk
+++ b/security/nss/coreconf/arch.mk
@@ -135,16 +135,45 @@ endif
 # For OS/2
 #
 ifeq ($(OS_ARCH),OS_2)
     OS_ARCH = OS2
     OS_RELEASE := $(shell uname -v)
 endif
 
 #######################################################################
+# Master "Core Components" macros for Hardware features               #
+#######################################################################
+
+ifndef NSS_DISABLE_AVX2
+    ifneq ($(CPU_ARCH),x86_64)
+        # Disable AVX2 entirely on non-Intel platforms
+        NSS_DISABLE_AVX2 = 1
+        $(warning CPU_ARCH is not x86_64, disabling -mavx2)
+    else
+        ifdef CC_IS_CLANG
+            # Clang reports its version as an older gcc, but it's OK
+            NSS_DISABLE_AVX2 = 0
+        else
+            ifneq (,$(filter 4.8 4.9,$(word 1,$(GCC_VERSION)).$(word 2,$(GCC_VERSION))))
+                NSS_DISABLE_AVX2 = 0
+            endif
+            ifeq (,$(filter 0 1 2 3 4,$(word 1,$(GCC_VERSION))))
+                NSS_DISABLE_AVX2 = 0
+            endif
+        endif
+        ifndef NSS_DISABLE_AVX2
+            $(warning Unable to find gcc 4.8 or greater, disabling -Werror)
+            NSS_DISABLE_AVX2 = 1
+        endif
+    endif
+    export NSS_DISABLE_AVX2
+endif #ndef NSS_DISABLE_AVX2
+
+#######################################################################
 # Master "Core Components" macros for getting the OS target           #
 #######################################################################
 
 #
 # Note: OS_TARGET should be specified on the command line for gmake.
 # When OS_TARGET=WIN95 is specified, then a Windows 95 target is built.
 # The difference between the Win95 target and the WinNT target is that
 # the WinNT target uses Windows NT specific features not available
--- a/security/nss/coreconf/config.mk
+++ b/security/nss/coreconf/config.mk
@@ -157,16 +157,20 @@ endif
 ifdef NSS_DISABLE_LIBPKIX
 DEFINES += -DNSS_DISABLE_LIBPKIX
 endif
 
 ifdef NSS_DISABLE_DBM
 DEFINES += -DNSS_DISABLE_DBM
 endif
 
+ifdef NSS_DISABLE_AVX2
+DEFINES += -DNSS_DISABLE_AVX2
+endif
+
 ifdef NSS_DISABLE_CHACHAPOLY
 DEFINES += -DNSS_DISABLE_CHACHAPOLY
 endif
 
 ifdef NSS_PKIX_NO_LDAP
 DEFINES += -DNSS_PKIX_NO_LDAP
 endif
 
--- a/security/nss/coreconf/coreconf.dep
+++ b/security/nss/coreconf/coreconf.dep
@@ -5,8 +5,9 @@
 
 /*
  * A dummy header file that is a dependency for all the object files.
  * Used to force a full recompilation of NSS in Mozilla's Tinderbox
  * depend builds.  See comments in rules.mk.
  */
 
 #error "Do not include this header file."
+
--- a/security/nss/gtests/ssl_gtest/ssl_version_unittest.cc
+++ b/security/nss/gtests/ssl_gtest/ssl_version_unittest.cc
@@ -350,16 +350,46 @@ TEST_F(DtlsConnectTest, DtlsSupportedVer
   ASSERT_TRUE(capture->extension().Read(1, 2, &version));
   EXPECT_EQ(0x7f00 | DTLS_1_3_DRAFT_VERSION, static_cast<int>(version));
   ASSERT_TRUE(capture->extension().Read(3, 2, &version));
   EXPECT_EQ(SSL_LIBRARY_VERSION_DTLS_1_2_WIRE, static_cast<int>(version));
   ASSERT_TRUE(capture->extension().Read(5, 2, &version));
   EXPECT_EQ(SSL_LIBRARY_VERSION_DTLS_1_0_WIRE, static_cast<int>(version));
 }
 
+// Verify the DTLS 1.3 supported_versions interop workaround.
+TEST_F(DtlsConnectTest, Dtls13VersionWorkaround) {
+  static const uint16_t kExpectVersionsWorkaround[] = {
+      0x7f00 | DTLS_1_3_DRAFT_VERSION, SSL_LIBRARY_VERSION_DTLS_1_2_WIRE,
+      SSL_LIBRARY_VERSION_TLS_1_2, SSL_LIBRARY_VERSION_DTLS_1_0_WIRE,
+      SSL_LIBRARY_VERSION_TLS_1_1};
+  const int min_ver = SSL_LIBRARY_VERSION_TLS_1_1,
+            max_ver = SSL_LIBRARY_VERSION_TLS_1_3;
+
+  // Toggle the workaround, then verify both encodings are present.
+  EnsureTlsSetup();
+  SSL_SetDtls13VersionWorkaround(client_->ssl_fd(), PR_TRUE);
+  SSL_SetDtls13VersionWorkaround(client_->ssl_fd(), PR_FALSE);
+  SSL_SetDtls13VersionWorkaround(client_->ssl_fd(), PR_TRUE);
+  client_->SetVersionRange(min_ver, max_ver);
+  server_->SetVersionRange(min_ver, max_ver);
+  auto capture = MakeTlsFilter<TlsExtensionCapture>(
+      client_, ssl_tls13_supported_versions_xtn);
+  Connect();
+
+  uint32_t version = 0;
+  size_t off = 1;
+  ASSERT_EQ(1 + sizeof(kExpectVersionsWorkaround), capture->extension().len());
+  for (unsigned int i = 0; i < PR_ARRAY_SIZE(kExpectVersionsWorkaround); i++) {
+    ASSERT_TRUE(capture->extension().Read(off, 2, &version));
+    EXPECT_EQ(kExpectVersionsWorkaround[i], static_cast<uint16_t>(version));
+    off += 2;
+  }
+}
+
 // Verify the client sends only TLS versions in supported_versions
 TEST_F(TlsConnectTest, TlsSupportedVersionsEncoding) {
   client_->SetVersionRange(SSL_LIBRARY_VERSION_TLS_1_0,
                            SSL_LIBRARY_VERSION_TLS_1_3);
   server_->SetVersionRange(SSL_LIBRARY_VERSION_TLS_1_0,
                            SSL_LIBRARY_VERSION_TLS_1_3);
   auto capture = MakeTlsFilter<TlsExtensionCapture>(
       client_, ssl_tls13_supported_versions_xtn);
--- a/security/nss/lib/freebl/Makefile
+++ b/security/nss/lib/freebl/Makefile
@@ -80,21 +80,21 @@ endif
 # The prelink command itself can reverse the process of modification and output
 # the prestine shared library as it was before prelink made it's changes.
 # This option tells Freebl could use prelink to output the original copy of
 # the shared library before prelink modified it.
 #
 # FREEBL_PRELINK_COMMAND
 #
 # This is an optional environment variable which can override the default
-# prelink command. It could be used on systems that did something similiar to 
-# prelink but used a different command and syntax. The only requirement is the 
-# program must take the library as the last argument, the program must output 
-# the original library to standard out, and the program does not need to take 
-# any quoted or imbedded spaces in its arguments (except the path to the 
+# prelink command. It could be used on systems that did something similiar to
+# prelink but used a different command and syntax. The only requirement is the
+# program must take the library as the last argument, the program must output
+# the original library to standard out, and the program does not need to take
+# any quoted or imbedded spaces in its arguments (except the path to the
 # library itself, which can have imbedded spaces or special characters).
 #
 ifdef FREEBL_USE_PRELINK
 	DEFINES += -DFREEBL_USE_PRELINK
 ifdef LINUX
 	DEFINES += -D__GNU_SOURCE=1
 endif
 endif
@@ -143,17 +143,17 @@ endif
 
 ifeq ($(OS_TARGET),OSF1)
     DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_NO_MP_WORD
     MPI_SRCS += mpvalpha.c
 endif
 
 ifeq (OS2,$(OS_TARGET))
     ASFILES  = mpi_x86_os2.s
-    DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE 
+    DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
     DEFINES += -DMP_ASSEMBLY_DIV_2DX1D
     DEFINES += -DMP_USE_UINT_DIGIT -DMP_NO_MP_WORD
     DEFINES += -DMP_IS_LITTLE_ENDIAN
 endif
 
 ifeq (,$(filter-out WINNT WIN95,$(OS_TARGET)))
 ifndef USE_64
 # 32-bit Windows
@@ -164,17 +164,17 @@ ifdef NS_USE_GCC
 #                -DMP_ASSEMBLY_DIV_2DX1D
 # but we haven't figured out how to make it work, so we are not
 # using assembler right now.
     ASFILES  =
     DEFINES += -DMP_NO_MP_WORD -DMP_USE_UINT_DIGIT
 else
 # MSVC
     MPI_SRCS += mpi_x86_asm.c
-    DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE 
+    DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
     DEFINES += -DMP_ASSEMBLY_DIV_2DX1D -DMP_USE_UINT_DIGIT -DMP_NO_MP_WORD
     ifdef BUILD_OPT
 	OPTIMIZER += -Ox  # maximum optimization for freebl
     endif
     # The Intel AES assembly code requires Visual C++ 2010.
     # if $(_MSC_VER) >= 1600 (Visual C++ 2010)
     ifeq ($(firstword $(sort $(_MSC_VER) 1600)),1600)
 	DEFINES += -DUSE_HW_AES -DINTEL_GCM
@@ -215,17 +215,17 @@ endif
 endif
 
 ifeq ($(OS_TARGET),IRIX)
 ifeq ($(USE_N32),1)
     ASFILES  = mpi_mips.s
     ifeq ($(NS_USE_GCC),1)
 	ASFLAGS = -Wp,-P -Wp,-traditional -O -mips3
     else
-	ASFLAGS = -O -OPT:Olimit=4000 -dollar -fullwarn -xansi -n32 -mips3 
+	ASFLAGS = -O -OPT:Olimit=4000 -dollar -fullwarn -xansi -n32 -mips3
     endif
     DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
     DEFINES += -DMP_USE_UINT_DIGIT
 endif
 endif
 
 ifeq ($(OS_TARGET),Darwin)
 ifeq ($(CPU_ARCH),x86)
@@ -248,22 +248,22 @@ ifeq ($(CPU_ARCH),x86_64)
     DEFINES += -DUSE_HW_AES -DINTEL_GCM
     ASFILES += intel-aes.s intel-gcm.s
     EXTRA_SRCS += intel-gcm-wrap.c
     INTEL_GCM = 1
     MPI_SRCS += mpi_amd64.c mp_comba.c
 endif
 ifeq ($(CPU_ARCH),x86)
     ASFILES  = mpi_x86.s
-    DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE 
+    DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
     DEFINES += -DMP_ASSEMBLY_DIV_2DX1D -DMP_USE_UINT_DIGIT
     DEFINES += -DMP_IS_LITTLE_ENDIAN
 endif
 ifeq ($(CPU_ARCH),arm)
-    DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE 
+    DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
     DEFINES += -DMP_USE_UINT_DIGIT
     DEFINES += -DSHA_NO_LONG_LONG # avoid 64-bit arithmetic in SHA512
     MPI_SRCS += mpi_arm.c
 endif
 ifeq ($(CPU_ARCH),ppc)
     EXTRA_SRCS += gcm-ppc.c
 ifdef USE_64
     DEFINES += -DNSS_NO_INIT_SUPPORT
@@ -278,36 +278,36 @@ ifeq ($(OS_TARGET),AIX)
     endif
 endif # AIX
 
 ifeq ($(OS_TARGET), HP-UX)
 ifneq ($(OS_TEST), ia64)
 # PA-RISC
 ASFILES += ret_cr16.s
 ifndef USE_64
-    FREEBL_BUILD_SINGLE_SHLIB = 
+    FREEBL_BUILD_SINGLE_SHLIB =
     HAVE_ABI32_INT32 = 1
     HAVE_ABI32_FPU = 1
 endif
 ifdef FREEBL_CHILD_BUILD
 ifdef USE_ABI32_INT32
 # build for DA1.1 (HP PA 1.1) 32-bit ABI build with 32-bit arithmetic
     DEFINES  += -DMP_USE_UINT_DIGIT -DMP_NO_MP_WORD
     DEFINES += -DSHA_NO_LONG_LONG # avoid 64-bit arithmetic in SHA512
 else
 ifdef USE_64
-# this builds for DA2.0W (HP PA 2.0 Wide), the LP64 ABI, using 64-bit digits 
-    MPI_SRCS += mpi_hp.c 
-    ASFILES  += hpma512.s hppa20.s 
+# this builds for DA2.0W (HP PA 2.0 Wide), the LP64 ABI, using 64-bit digits
+    MPI_SRCS += mpi_hp.c
+    ASFILES  += hpma512.s hppa20.s
     DEFINES  += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
 else
-# this builds for DA2.0 (HP PA 2.0 Narrow) ABI32_FPU model 
+# this builds for DA2.0 (HP PA 2.0 Narrow) ABI32_FPU model
 # (the 32-bit ABI with 64-bit registers) using 64-bit digits
-    MPI_SRCS += mpi_hp.c 
-    ASFILES  += hpma512.s hppa20.s 
+    MPI_SRCS += mpi_hp.c
+    ASFILES  += hpma512.s hppa20.s
     DEFINES  += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
 ifndef NS_USE_GCC
     ARCHFLAG = -Aa +e +DA2.0 +DS2.0
 endif
 endif
 endif
 endif
 endif
@@ -332,17 +332,17 @@ ifdef NS_USE_GCC
     else
 	MKSHLIB += -Wl,-B,symbolic,-z,now,-z,text
     endif # GCC_USE_GNU_LD
 else
     MKSHLIB += -B symbolic -z now -z text
 endif # NS_USE_GCC
 
 # Sun's WorkShop defines v8, v8plus and v9 architectures.
-# gcc on Solaris defines v8 and v9 "cpus".  
+# gcc on Solaris defines v8 and v9 "cpus".
 # gcc's v9 is equivalent to Workshop's v8plus.
 # gcc's -m64 is equivalent to Workshop's v9
 # We always use Sun's assembler, which uses Sun's naming convention.
 ifeq ($(CPU_ARCH),sparc)
     FREEBL_BUILD_SINGLE_SHLIB=
     ifdef USE_64
         HAVE_ABI64_INT = 1
         HAVE_ABI64_FPU = 1
@@ -382,30 +382,30 @@ ifeq ($(CPU_ARCH),sparc)
 	    FPU_TARGET_OPTIMIZER = -xcache=64/32/4:1024/64/4 -xchip=ultra3
 	else
 	    # Forte 6 C compiler generates incorrect code for rijndael.c
 	    # if -xchip=ultra3 is used (Bugzilla bug 333925).  So we revert
 	    # to what we used in NSS 3.10.
 	    FPU_TARGET_OPTIMIZER = -xchip=ultra2
 	endif
 	ifdef USE_ABI32_INT64
-	    # this builds for Sparc v8+a ABI32_FPU architecture, 64-bit registers, 
+	    # this builds for Sparc v8+a ABI32_FPU architecture, 64-bit registers,
 	    # 32-bit ABI, it uses 64-bit words, integer arithmetic,
 	    # no FPU (non-VIS cpus).
 	    # These flags were suggested by the compiler group for building
 	    # with SunStudio 10.
 	    ifdef BUILD_OPT
                 SOL_CFLAGS += -xO4
 	    endif
  	    SOL_CFLAGS += -xtarget=generic
 	    ARCHFLAG = -xarch=v8plus
 	    SOLARIS_AS_FLAGS = -xarch=v8plus -K PIC
 	endif
 	ifdef USE_ABI32_FPU
-	    # this builds for Sparc v8+a ABI32_FPU architecture, 64-bit registers, 
+	    # this builds for Sparc v8+a ABI32_FPU architecture, 64-bit registers,
 	    # 32-bit ABI, it uses FPU code, and 32-bit word size.
 	    # these flags were determined by running cc -### -fast and copying
 	    # the generated flag settings
 	    SOL_CFLAGS += -fsingle -xmemalign=8s
 	    ifdef BUILD_OPT
                 SOL_CFLAGS += -D__MATHERR_ERRNO_DONTCARE -fsimple=1
                 SOL_CFLAGS += -xalias_level=basic -xbuiltin=%all
                 SOL_CFLAGS += $(FPU_TARGET_OPTIMIZER) -xdepend
@@ -437,22 +437,22 @@ ifeq ($(CPU_ARCH),sparc)
 	    endif
 	    ARCHFLAG = -xarch=v9a
 	    SOLARIS_AS_FLAGS = -xarch=v9a -K PIC
 	endif
     endif # NS_USE_GCC
 
     ### set flags for both GCC and Sun cc
     ifdef USE_ABI32_INT64
-	# this builds for Sparc v8+a ABI32_FPU architecture, 64-bit registers, 
+	# this builds for Sparc v8+a ABI32_FPU architecture, 64-bit registers,
 	# 32-bit ABI, it uses 64-bit words, integer arithmetic, no FPU
 	# best times are with no MP_ flags specified
     endif
     ifdef USE_ABI32_FPU
-	# this builds for Sparc v8+a ABI32_FPU architecture, 64-bit registers, 
+	# this builds for Sparc v8+a ABI32_FPU architecture, 64-bit registers,
 	# 32-bit ABI, it uses FPU code, and 32-bit word size
 	MPI_SRCS += mpi_sparc.c
 	ASFILES  = mpv_sparcv8.s montmulfv8.s
 	DEFINES  += -DMP_NO_MP_WORD -DMP_USE_UINT_DIGIT -DMP_ASSEMBLY_MULTIPLY
 	DEFINES  += -DMP_USING_MONT_MULF -DMP_MONT_USE_MP_MUL
     endif
     ifdef USE_ABI64_INT
 	# this builds for Sparc v9a pure 64-bit architecture
@@ -498,17 +498,17 @@ else
 	DEFINES += -DNSS_USE_COMBA -DMP_IS_LITTLE_ENDIAN
 	# comment the next two lines to turn off Intel HW acceleration
 	DEFINES += -DUSE_HW_AES
 	ASFILES += intel-aes.s
 	MPI_SRCS += mpi_amd64.c
     else
 	# Solaris x86
 	DEFINES += -DMP_USE_UINT_DIGIT
-	DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE 
+	DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
 	DEFINES += -DMP_ASSEMBLY_DIV_2DX1D
 	ASFILES  = mpi_i86pc.s
  	ifndef NS_USE_GCC
  	   MPCPU_SRCS =
  	   ASFILES += mpcpucache_x86.s
  	endif
     endif
 endif # Solaris for non-sparc family CPUs
@@ -521,31 +521,39 @@ ifneq ($(shell $(CC) -? 2>&1 >/dev/null 
     ifdef CC_IS_CLANG
             HAVE_INT128_SUPPORT = 1
             DEFINES += -DHAVE_INT128_SUPPORT
     else ifeq (1,$(CC_IS_GCC))
         ifneq (,$(filter 4.6 4.7 4.8 4.9,$(word 1,$(GCC_VERSION)).$(word 2,$(GCC_VERSION))))
             HAVE_INT128_SUPPORT = 1
             DEFINES += -DHAVE_INT128_SUPPORT
         endif
+        ifneq (,$(filter 4.8 4.9,$(word 1,$(GCC_VERSION)).$(word 2,$(GCC_VERSION))))
+            NSS_DISABLE_AVX2 = 1
+        endif
         ifeq (,$(filter 0 1 2 3 4,$(word 1,$(GCC_VERSION))))
             HAVE_INT128_SUPPORT = 1
+            NSS_DISABLE_AVX2 = 0
             DEFINES += -DHAVE_INT128_SUPPORT
         endif
     endif
 endif # lcc
 endif # USE_64
 
 ifndef HAVE_INT128_SUPPORT
     DEFINES += -DKRML_VERIFIED_UINT128
 endif
 
 ifndef NSS_DISABLE_CHACHAPOLY
     ifeq ($(CPU_ARCH),x86_64)
-        EXTRA_SRCS += Hacl_Poly1305_128.c Hacl_Chacha20_Vec128.c Hacl_Chacha20Poly1305_128.c
+        ifndef NSS_DISABLE_AVX2
+            EXTRA_SRCS += Hacl_Poly1305_256.c Hacl_Chacha20_Vec256.c Hacl_Chacha20Poly1305_256.c
+        else
+            EXTRA_SRCS += Hacl_Poly1305_128.c Hacl_Chacha20_Vec128.c Hacl_Chacha20Poly1305_128.c
+        endif # NSS_DISABLE_AVX2
     endif # x86_64
 
     VERIFIED_SRCS += Hacl_Poly1305_32.c Hacl_Chacha20.c Hacl_Chacha20Poly1305_32.c
 endif # NSS_DISABLE_CHACHAPOLY
 
 ifeq (,$(filter-out x86_64 aarch64,$(CPU_ARCH)))
     # All 64-bit architectures get the 64 bit version.
     ECL_SRCS += curve25519_64.c
@@ -625,101 +633,101 @@ endif
 ifndef FREEBL_CHILD_BUILD
 
 # Parent build. This is where we decide which shared libraries to build
 
 ifdef FREEBL_BUILD_SINGLE_SHLIB
 
 ################### Single shared lib stuff #########################
 SINGLE_SHLIB_DIR = $(OBJDIR)/$(OS_TARGET)_SINGLE_SHLIB
-ALL_TRASH += $(SINGLE_SHLIB_DIR) 
+ALL_TRASH += $(SINGLE_SHLIB_DIR)
 
 $(SINGLE_SHLIB_DIR):
 	-mkdir -p $(SINGLE_SHLIB_DIR)
 
 release_md libs:: $(SINGLE_SHLIB_DIR)
 	$(MAKE) FREEBL_CHILD_BUILD=1 \
  OBJDIR=$(SINGLE_SHLIB_DIR) $@
 ######################## common stuff #########################
 
 endif
 
 ifdef NEED_STUB_BUILD
 SINGLE_SHLIB_DIR = $(OBJDIR)/$(OS_TARGET)_SINGLE_SHLIB
-ALL_TRASH += $(SINGLE_SHLIB_DIR) 
+ALL_TRASH += $(SINGLE_SHLIB_DIR)
 $(SINGLE_SHLIB_DIR):
 	-mkdir $(SINGLE_SHLIB_DIR)
 
 release_md libs:: $(SINGLE_SHLIB_DIR)
 	$(MAKE) FREEBL_CHILD_BUILD=1 USE_STUB_BUILD=1 \
  OBJDIR=$(SINGLE_SHLIB_DIR) $@
 endif
 
 # multiple shared libraries
 
 ######################## ABI32_FPU stuff #########################
 ifdef HAVE_ABI32_FPU
 ABI32_FPU_DIR = $(OBJDIR)/$(OS_TARGET)_ABI32_FPU
-ALL_TRASH += $(ABI32_FPU_DIR) 
+ALL_TRASH += $(ABI32_FPU_DIR)
 
 $(ABI32_FPU_DIR):
 	-mkdir $(ABI32_FPU_DIR)
 
 release_md libs:: $(ABI32_FPU_DIR)
 	$(MAKE) FREEBL_CHILD_BUILD=1 USE_ABI32_FPU=1 \
  OBJDIR=$(ABI32_FPU_DIR) $@
 endif
 
 ######################## ABI32_INT32 stuff #########################
 ifdef HAVE_ABI32_INT32
 ABI32_INT32_DIR = $(OBJDIR)/$(OS_TARGET)_ABI32_INT32
-ALL_TRASH += $(ABI32_INT32_DIR) 
+ALL_TRASH += $(ABI32_INT32_DIR)
 
 $(ABI32_INT32_DIR):
 	-mkdir $(ABI32_INT32_DIR)
 
 release_md libs:: $(ABI32_INT32_DIR)
 	$(MAKE) FREEBL_CHILD_BUILD=1 USE_ABI32_INT32=1 \
  OBJDIR=$(ABI32_INT32_DIR) $@
 endif
 
 ######################## ABI32_INT64 stuff #########################
 ifdef HAVE_ABI32_INT64
 ABI32_INT64_DIR = $(OBJDIR)/$(OS_TARGET)_ABI32_INT64
-ALL_TRASH += $(ABI32_INT64_DIR) 
+ALL_TRASH += $(ABI32_INT64_DIR)
 
 $(ABI32_INT64_DIR):
 	-mkdir $(ABI32_INT64_DIR)
 
 release_md libs:: $(ABI32_INT64_DIR)
 	$(MAKE) FREEBL_CHILD_BUILD=1 USE_ABI32_INT64=1\
  OBJDIR=$(ABI32_INT64_DIR) $@
 endif
 
 ######################## END of 32-bit stuff #########################
 
 # above is 32-bit builds, below is 64-bit builds
 
 ######################## ABI64_FPU stuff #########################
 ifdef HAVE_ABI64_FPU
 ABI64_FPU_DIR = $(OBJDIR)/$(OS_TARGET)_ABI64_FPU
-ALL_TRASH += $(ABI64_FPU_DIR) 
+ALL_TRASH += $(ABI64_FPU_DIR)
 
 $(ABI64_FPU_DIR):
 	-mkdir $(ABI64_FPU_DIR)
 
 release_md libs:: $(ABI64_FPU_DIR)
 	$(MAKE) FREEBL_CHILD_BUILD=1 USE_ABI64_FPU=1 \
  OBJDIR=$(ABI64_FPU_DIR) $@
 endif
 
 ######################## ABI64_INT stuff #########################
 ifdef HAVE_ABI64_INT
 ABI64_INT_DIR = $(OBJDIR)/$(OS_TARGET)_ABI64_INT
-ALL_TRASH += $(ABI64_INT_DIR) 
+ALL_TRASH += $(ABI64_INT_DIR)
 
 $(ABI64_INT_DIR):
 	-mkdir $(ABI64_INT_DIR)
 
 release_md libs:: $(ABI64_INT_DIR)
 	$(MAKE) FREEBL_CHILD_BUILD=1 USE_ABI64_INT=1 \
  OBJDIR=$(ABI64_INT_DIR) $@
 endif
@@ -780,11 +788,17 @@ endif
 ifeq ($(CPU_ARCH),ppc)
 ifndef NSS_DISABLE_ALTIVEC
 $(OBJDIR)/$(PROG_PREFIX)gcm-ppc$(OBJ_SUFFIX): CFLAGS += -mcrypto -maltivec -mvsx
 $(OBJDIR)/$(PROG_PREFIX)gcm$(OBJ_SUFFIX): CFLAGS += -mcrypto -maltivec -mvsx
 $(OBJDIR)/$(PROG_PREFIX)rijndael$(OBJ_SUFFIX): CFLAGS += -mcrypto -maltivec -mvsx
 endif
 endif
 
-$(OBJDIR)/$(PROG_PREFIX)Hacl_Chacha20_Vec128$(OBJ_SUFFIX): CFLAGS += -mssse3 -msse4 -mavx -maes 
-$(OBJDIR)/$(PROG_PREFIX)Hacl_Chacha20Poly1305_128$(OBJ_SUFFIX): CFLAGS += -mssse3 -msse4 -mavx -maes 
+$(OBJDIR)/$(PROG_PREFIX)Hacl_Chacha20_Vec128$(OBJ_SUFFIX): CFLAGS += -mssse3 -msse4 -mavx -maes
+$(OBJDIR)/$(PROG_PREFIX)Hacl_Chacha20Poly1305_128$(OBJ_SUFFIX): CFLAGS += -mssse3 -msse4 -mavx -maes
 $(OBJDIR)/$(PROG_PREFIX)Hacl_Poly1305_128$(OBJ_SUFFIX): CFLAGS += -mssse3 -msse4 -mavx -maes -mpclmul
+
+ifndef NSS_DISABLE_AVX2
+$(OBJDIR)/$(PROG_PREFIX)Hacl_Chacha20Poly1305_256$(OBJ_SUFFIX): CFLAGS += -mssse3 -msse4 -mavx2 -maes
+$(OBJDIR)/$(PROG_PREFIX)Hacl_Chacha20_Vec256$(OBJ_SUFFIX): CFLAGS += -mssse3 -msse4 -mavx -mavx2 -maes
+$(OBJDIR)/$(PROG_PREFIX)Hacl_Poly1305_256$(OBJ_SUFFIX): CFLAGS += -mssse3 -msse4 -mavx -mavx2 -maes -mpclmul
+endif
--- a/security/nss/lib/freebl/blapii.h
+++ b/security/nss/lib/freebl/blapii.h
@@ -75,16 +75,17 @@ SEC_END_PROTOS
 
 SECStatus RSA_Init();
 SECStatus generate_prime(mp_int *prime, int primeLen);
 
 /* Freebl state. */
 PRBool aesni_support();
 PRBool clmul_support();
 PRBool avx_support();
+PRBool avx2_support();
 PRBool ssse3_support();
 PRBool sse4_1_support();
 PRBool sse4_2_support();
 PRBool arm_neon_support();
 PRBool arm_aes_support();
 PRBool arm_pmull_support();
 PRBool arm_sha1_support();
 PRBool arm_sha2_support();
--- a/security/nss/lib/freebl/blinit.c
+++ b/security/nss/lib/freebl/blinit.c
@@ -22,16 +22,17 @@
 #endif
 
 static PRCallOnceType coFreeblInit;
 
 /* State variables. */
 static PRBool aesni_support_ = PR_FALSE;
 static PRBool clmul_support_ = PR_FALSE;
 static PRBool avx_support_ = PR_FALSE;
+static PRBool avx2_support_ = PR_FALSE;
 static PRBool ssse3_support_ = PR_FALSE;
 static PRBool sse4_1_support_ = PR_FALSE;
 static PRBool sse4_2_support_ = PR_FALSE;
 static PRBool arm_neon_support_ = PR_FALSE;
 static PRBool arm_aes_support_ = PR_FALSE;
 static PRBool arm_sha1_support_ = PR_FALSE;
 static PRBool arm_sha2_support_ = PR_FALSE;
 static PRBool arm_pmull_support_ = PR_FALSE;
@@ -70,38 +71,53 @@ check_xcr0_ymm()
     return (xcr0 & 6) == 6;
 }
 
 #define ECX_AESNI (1 << 25)
 #define ECX_CLMUL (1 << 1)
 #define ECX_XSAVE (1 << 26)
 #define ECX_OSXSAVE (1 << 27)
 #define ECX_AVX (1 << 28)
+#define EBX_AVX2 (1 << 5)
+#define EBX_BMI1 (1 << 3)
+#define EBX_BMI2 (1 << 8)
+#define ECX_FMA (1 << 12)
+#define ECX_MOVBE (1 << 22)
 #define ECX_SSSE3 (1 << 9)
 #define ECX_SSE4_1 (1 << 19)
 #define ECX_SSE4_2 (1 << 20)
 #define AVX_BITS (ECX_XSAVE | ECX_OSXSAVE | ECX_AVX)
+#define AVX2_EBX_BITS (EBX_AVX2 | EBX_BMI1 | EBX_BMI2)
+#define AVX2_ECX_BITS (ECX_FMA | ECX_MOVBE)
 
 void
 CheckX86CPUSupport()
 {
     unsigned long eax, ebx, ecx, edx;
+    unsigned long eax7, ebx7, ecx7, edx7;
     char *disable_hw_aes = PR_GetEnvSecure("NSS_DISABLE_HW_AES");
     char *disable_pclmul = PR_GetEnvSecure("NSS_DISABLE_PCLMUL");
     char *disable_avx = PR_GetEnvSecure("NSS_DISABLE_AVX");
+    char *disable_avx2 = PR_GetEnvSecure("NSS_DISABLE_AVX2");
     char *disable_ssse3 = PR_GetEnvSecure("NSS_DISABLE_SSSE3");
     char *disable_sse4_1 = PR_GetEnvSecure("NSS_DISABLE_SSE4_1");
     char *disable_sse4_2 = PR_GetEnvSecure("NSS_DISABLE_SSE4_2");
     freebl_cpuid(1, &eax, &ebx, &ecx, &edx);
+    freebl_cpuid(7, &eax7, &ebx7, &ecx7, &edx7);
     aesni_support_ = (PRBool)((ecx & ECX_AESNI) != 0 && disable_hw_aes == NULL);
     clmul_support_ = (PRBool)((ecx & ECX_CLMUL) != 0 && disable_pclmul == NULL);
     /* For AVX we check AVX, OSXSAVE, and XSAVE
      * as well as XMM and YMM state. */
     avx_support_ = (PRBool)((ecx & AVX_BITS) == AVX_BITS) && check_xcr0_ymm() &&
                    disable_avx == NULL;
+    /* For AVX2 we check AVX2, BMI1, BMI2, FMA, MOVBE.
+     * We do not check for AVX above. */
+    avx2_support_ = (PRBool)((ebx7 & AVX2_EBX_BITS) == AVX2_EBX_BITS &&
+                             (ecx & AVX2_ECX_BITS) == AVX2_ECX_BITS &&
+                             disable_avx2 == NULL);
     ssse3_support_ = (PRBool)((ecx & ECX_SSSE3) != 0 &&
                               disable_ssse3 == NULL);
     sse4_1_support_ = (PRBool)((ecx & ECX_SSE4_1) != 0 &&
                                disable_sse4_1 == NULL);
     sse4_2_support_ = (PRBool)((ecx & ECX_SSE4_2) != 0 &&
                                disable_sse4_2 == NULL);
 }
 #endif /* NSS_X86_OR_X64 */
@@ -379,16 +395,21 @@ clmul_support()
     return clmul_support_;
 }
 PRBool
 avx_support()
 {
     return avx_support_;
 }
 PRBool
+avx2_support()
+{
+    return avx2_support_;
+}
+PRBool
 ssse3_support()
 {
     return ssse3_support_;
 }
 PRBool
 sse4_1_support()
 {
     return sse4_1_support_;
--- a/security/nss/lib/freebl/chacha20poly1305.c
+++ b/security/nss/lib/freebl/chacha20poly1305.c
@@ -10,19 +10,27 @@
 #include <stdio.h>
 
 #include "seccomon.h"
 #include "secerr.h"
 #include "blapit.h"
 #include "blapii.h"
 #include "chacha20poly1305.h"
 
-// There are two implementations of ChaCha20Poly1305:
-// 1) 128-bit with hardware acceleration used on x64
-// 2) 32-bit used on all other platforms
+// There are three implementations of ChaCha20Poly1305:
+// 1) 128-bit with AVX hardware acceleration used on x64
+// 2) 256-bit with AVX2 hardware acceleration used on x64
+// 3) 32-bit used on all other platforms
+
+// On x64 when AVX2 and other necessary registers are available,
+// the 256bit-verctorized version will be used. When AVX2 features
+// are unavailable or disabled but AVX registers are available, the
+// 128bit-vectorized version will be used. In all other cases the
+// scalar version of the HACL* code will be used.
+
 // Instead of including the headers (they bring other things we don't want),
 // we declare the functions here.
 // Usage is guarded by runtime checks of required hardware features.
 
 // Forward declaration from Hacl_Chacha20_Vec128.h and Hacl_Chacha20Poly1305_128.h.
 extern void Hacl_Chacha20_Vec128_chacha20_encrypt_128(uint32_t len, uint8_t *out,
                                                       uint8_t *text, uint8_t *key,
                                                       uint8_t *n1, uint32_t ctr);
@@ -30,16 +38,29 @@ extern void
 Hacl_Chacha20Poly1305_128_aead_encrypt(uint8_t *k, uint8_t *n1, uint32_t aadlen,
                                        uint8_t *aad, uint32_t mlen, uint8_t *m,
                                        uint8_t *cipher, uint8_t *mac);
 extern uint32_t
 Hacl_Chacha20Poly1305_128_aead_decrypt(uint8_t *k, uint8_t *n1, uint32_t aadlen,
                                        uint8_t *aad, uint32_t mlen, uint8_t *m,
                                        uint8_t *cipher, uint8_t *mac);
 
+// Forward declaration from Hacl_Chacha20_Vec256.h and Hacl_Chacha20Poly1305_256.h.
+extern void Hacl_Chacha20_Vec256_chacha20_encrypt_256(uint32_t len, uint8_t *out,
+                                                      uint8_t *text, uint8_t *key,
+                                                      uint8_t *n1, uint32_t ctr);
+extern void
+Hacl_Chacha20Poly1305_256_aead_encrypt(uint8_t *k, uint8_t *n1, uint32_t aadlen,
+                                       uint8_t *aad, uint32_t mlen, uint8_t *m,
+                                       uint8_t *cipher, uint8_t *mac);
+extern uint32_t
+Hacl_Chacha20Poly1305_256_aead_decrypt(uint8_t *k, uint8_t *n1, uint32_t aadlen,
+                                       uint8_t *aad, uint32_t mlen, uint8_t *m,
+                                       uint8_t *cipher, uint8_t *mac);
+
 // Forward declaration from Hacl_Chacha20.h and Hacl_Chacha20Poly1305_32.h.
 extern void Hacl_Chacha20_chacha20_encrypt(uint32_t len, uint8_t *out,
                                            uint8_t *text, uint8_t *key,
                                            uint8_t *n1, uint32_t ctr);
 extern void
 Hacl_Chacha20Poly1305_32_aead_encrypt(uint8_t *k, uint8_t *n1, uint32_t aadlen,
                                       uint8_t *aad, uint32_t mlen, uint8_t *m,
                                       uint8_t *cipher, uint8_t *mac);
@@ -108,17 +129,25 @@ ChaCha20Poly1305_DestroyContext(ChaCha20
 
 #ifndef NSS_DISABLE_CHACHAPOLY
 void
 ChaCha20Xor(uint8_t *output, uint8_t *block, uint32_t len, uint8_t *k,
             uint8_t *nonce, uint32_t ctr)
 {
 #ifdef NSS_X64
     if (ssse3_support() && sse4_1_support() && avx_support()) {
+#ifdef NSS_DISABLE_AVX2
         Hacl_Chacha20_Vec128_chacha20_encrypt_128(len, output, block, k, nonce, ctr);
+#else
+        if (avx2_support()) {
+            Hacl_Chacha20_Vec256_chacha20_encrypt_256(len, output, block, k, nonce, ctr);
+        } else {
+            Hacl_Chacha20_Vec128_chacha20_encrypt_128(len, output, block, k, nonce, ctr);
+        }
+#endif
     } else
 #endif
     {
         Hacl_Chacha20_chacha20_encrypt(len, output, block, k, nonce, ctr);
     }
 }
 #endif /* NSS_DISABLE_CHACHAPOLY */
 
@@ -162,19 +191,31 @@ ChaCha20Poly1305_Seal(const ChaCha20Poly
     }
     if (maxOutputLen < inputLen + ctx->tagLen) {
         PORT_SetError(SEC_ERROR_OUTPUT_LEN);
         return SECFailure;
     }
 
 #ifdef NSS_X64
     if (ssse3_support() && sse4_1_support() && avx_support()) {
+#ifdef NSS_DISABLE_AVX2
         Hacl_Chacha20Poly1305_128_aead_encrypt(
             (uint8_t *)ctx->key, (uint8_t *)nonce, adLen, (uint8_t *)ad, inputLen,
             (uint8_t *)input, output, output + inputLen);
+#else
+        if (avx2_support()) {
+            Hacl_Chacha20Poly1305_256_aead_encrypt(
+                (uint8_t *)ctx->key, (uint8_t *)nonce, adLen, (uint8_t *)ad, inputLen,
+                (uint8_t *)input, output, output + inputLen);
+        } else {
+            Hacl_Chacha20Poly1305_128_aead_encrypt(
+                (uint8_t *)ctx->key, (uint8_t *)nonce, adLen, (uint8_t *)ad, inputLen,
+                (uint8_t *)input, output, output + inputLen);
+        }
+#endif
     } else
 #endif
     {
         Hacl_Chacha20Poly1305_32_aead_encrypt(
             (uint8_t *)ctx->key, (uint8_t *)nonce, adLen, (uint8_t *)ad, inputLen,
             (uint8_t *)input, output, output + inputLen);
     }
 
@@ -212,19 +253,31 @@ ChaCha20Poly1305_Open(const ChaCha20Poly
     if (inputLen >= (1ULL << (6 + 32)) + ctx->tagLen) {
         PORT_SetError(SEC_ERROR_INPUT_LEN);
         return SECFailure;
     }
 
     uint32_t res = 1;
 #ifdef NSS_X64
     if (ssse3_support() && sse4_1_support() && avx_support()) {
+#ifdef NSS_DISABLE_AVX2
         res = Hacl_Chacha20Poly1305_128_aead_decrypt(
             (uint8_t *)ctx->key, (uint8_t *)nonce, adLen, (uint8_t *)ad, ciphertextLen,
             (uint8_t *)output, (uint8_t *)input, (uint8_t *)input + ciphertextLen);
+#else
+        if (avx2_support()) {
+            res = Hacl_Chacha20Poly1305_256_aead_decrypt(
+                (uint8_t *)ctx->key, (uint8_t *)nonce, adLen, (uint8_t *)ad, ciphertextLen,
+                (uint8_t *)output, (uint8_t *)input, (uint8_t *)input + ciphertextLen);
+        } else {
+            res = Hacl_Chacha20Poly1305_128_aead_decrypt(
+                (uint8_t *)ctx->key, (uint8_t *)nonce, adLen, (uint8_t *)ad, ciphertextLen,
+                (uint8_t *)output, (uint8_t *)input, (uint8_t *)input + ciphertextLen);
+        }
+#endif
     } else
 #endif
     {
         res = Hacl_Chacha20Poly1305_32_aead_decrypt(
             (uint8_t *)ctx->key, (uint8_t *)nonce, adLen, (uint8_t *)ad, ciphertextLen,
             (uint8_t *)output, (uint8_t *)input, (uint8_t *)input + ciphertextLen);
     }
 
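The comment block added to chacha20poly1305.c above describes a three-way dispatch. As a simplified sketch (not the actual freebl code, which appears in the hunks above with its context casts and length handling), the selection for the Seal path reduces to roughly the following, assuming the feature probes from blapii.h and the HACL* entry points declared earlier in the file:

/* Simplified sketch of the implementation selection in
 * ChaCha20Poly1305_Seal: prefer AVX2 (256-bit vectors), then AVX
 * (128-bit vectors), then the portable 32-bit scalar code. */
static void
seal_dispatch_sketch(uint8_t *k, uint8_t *nonce, uint32_t adLen, uint8_t *ad,
                     uint32_t inputLen, uint8_t *input, uint8_t *output)
{
#ifdef NSS_X64
    if (ssse3_support() && sse4_1_support() && avx_support()) {
#ifndef NSS_DISABLE_AVX2
        if (avx2_support()) {
            /* 256-bit vectorized HACL* code, gated at run time on the
             * CPUID AVX2/BMI1/BMI2/FMA/MOVBE checks in blinit.c. */
            Hacl_Chacha20Poly1305_256_aead_encrypt(k, nonce, adLen, ad, inputLen,
                                                   input, output, output + inputLen);
            return;
        }
#endif
        /* 128-bit vectorized HACL* code (AVX). */
        Hacl_Chacha20Poly1305_128_aead_encrypt(k, nonce, adLen, ad, inputLen,
                                               input, output, output + inputLen);
        return;
    }
#endif
    /* Portable scalar fallback. */
    Hacl_Chacha20Poly1305_32_aead_encrypt(k, nonce, adLen, ad, inputLen,
                                          input, output, output + inputLen);
}

The same pattern is repeated in ChaCha20Xor and ChaCha20Poly1305_Open: NSS_DISABLE_AVX2 removes the 256-bit path at build time, and avx2_support() selects between the 256-bit and 128-bit paths at run time.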
--- a/security/nss/lib/freebl/freebl.gyp
+++ b/security/nss/lib/freebl/freebl.gyp
@@ -49,21 +49,20 @@
       'cflags': [
         '-mssse3',
       ],
       'cflags_mozilla': [
         '-mssse3'
       ],
     },
     {
-      # TODO: make this so that all hardware accelerated code is in here.
-      'target_name': 'hw-acc-crypto',
+      'target_name': 'hw-acc-crypto-avx',
       'type': 'static_library',
       # 'sources': [
-      #   All hardware accelerated crypto currently requires x64
+      #   All AVX hardware accelerated crypto currently requires x64
       # ],
       'dependencies': [
         '<(DEPTH)/exports.gyp:nss_exports'
       ],
       'conditions': [
         [ 'target_arch=="x64"', {
           'cflags': [
             '-mssse3',
@@ -113,16 +112,82 @@
             'verified/Hacl_Poly1305_128.c',
             'verified/Hacl_Chacha20_Vec128.c',
             'verified/Hacl_Chacha20Poly1305_128.c',
           ],
         }],
       ],
     },
     {
+      'target_name': 'hw-acc-crypto-avx2',
+      'type': 'static_library',
+      # 'sources': [
+      #   All AVX2 hardware accelerated crypto currently requires x64
+      # ],
+      'dependencies': [
+        '<(DEPTH)/exports.gyp:nss_exports'
+      ],
+      'conditions': [
+        [ 'target_arch=="x64"', {
+          'cflags': [
+            '-mssse3',
+            '-msse4'
+          ],
+          'cflags_mozilla': [
+            '-mssse3',
+            '-msse4',
+            '-mpclmul',
+            '-maes',
+            '-mavx',
+            '-mavx2',
+          ],
+          # GCC doesn't define this.
+          'defines': [
+            '__SSSE3__',
+          ],
+        }],
+        [ 'OS=="linux" or OS=="android" or OS=="dragonfly" or OS=="freebsd" or \
+           OS=="netbsd" or OS=="openbsd"', {
+          'cflags': [
+            '-mpclmul',
+            '-maes',
+            '-mavx',
+            '-mavx2',
+          ],
+        }],
+        # macOS build doesn't use cflags.
+        [ 'OS=="mac" or OS=="ios"', {
+          'xcode_settings': {
+            'OTHER_CFLAGS': [
+              '-mssse3',
+              '-msse4',
+              '-mpclmul',
+              '-maes',
+              '-mavx',
+              '-mavx2',
+            ],
+          },
+        }],
+        [ 'target_arch=="arm"', {
+          # Gecko doesn't support non-NEON platform on Android, but tier-3
+          # platform such as Linux/arm will need it
+          'cflags_mozilla': [
+            '-mfpu=neon'
+          ],
+        }],
+        [ 'target_arch=="x64"', {
+          'sources': [
+            'verified/Hacl_Poly1305_256.c',
+            'verified/Hacl_Chacha20_Vec256.c',
+            'verified/Hacl_Chacha20Poly1305_256.c',
+          ],
+        }],
+      ],
+    },
+    {
       'target_name': 'gcm-aes-x86_c_lib',
       'type': 'static_library',
       'sources': [
         'gcm-x86.c', 'aes-x86.c'
       ],
       'dependencies': [
         '<(DEPTH)/exports.gyp:nss_exports'
       ],
@@ -248,17 +313,18 @@
     {
       'target_name': 'freebl_static',
       'type': 'static_library',
       'includes': [
         'freebl_base.gypi',
       ],
       'dependencies': [
         '<(DEPTH)/exports.gyp:nss_exports',
-        'hw-acc-crypto',
+        'hw-acc-crypto-avx',
+        'hw-acc-crypto-avx2',
       ],
       'conditions': [
         [ 'target_arch=="ia32" or target_arch=="x64"', {
           'dependencies': [
             'gcm-aes-x86_c_lib',
           ],
         }, 'disable_arm_hw_aes==0 and (target_arch=="arm" or target_arch=="arm64" or target_arch=="aarch64")', {
           'dependencies': [
@@ -309,17 +375,18 @@
     {
       'target_name': '<(freebl_name)',
       'type': 'shared_library',
       'includes': [
         'freebl_base.gypi',
       ],
       'dependencies': [
         '<(DEPTH)/exports.gyp:nss_exports',
-        'hw-acc-crypto',
+        'hw-acc-crypto-avx',
+        'hw-acc-crypto-avx2',
       ],
       'conditions': [
         [ 'target_arch=="ia32" or target_arch=="x64"', {
           'dependencies': [
             'gcm-aes-x86_c_lib',
           ]
         }, 'target_arch=="arm" or target_arch=="arm64" or target_arch=="aarch64"', {
           'dependencies': [
@@ -389,33 +456,35 @@
     {
       'target_name': 'freebl_64int_3',
       'includes': [
         'freebl_base.gypi',
       ],
       'type': 'shared_library',
       'dependencies': [
         '<(DEPTH)/exports.gyp:nss_exports',
-        'hw-acc-crypto',
+        'hw-acc-crypto-avx',
+        'hw-acc-crypto-avx2',
       ],
     },
     {
       'target_name': 'freebl_64fpu_3',
       'includes': [
         'freebl_base.gypi',
       ],
       'type': 'shared_library',
       'sources': [
         'mpi/mpi_sparc.c',
         'mpi/mpv_sparcv9.s',
         'mpi/montmulfv9.s',
       ],
       'dependencies': [
         '<(DEPTH)/exports.gyp:nss_exports',
-        'hw-acc-crypto',
+        'hw-acc-crypto-avx',
+        'hw-acc-crypto-avx2',
       ],
       'asflags_mozilla': [
         '-mcpu=v9', '-Wa,-xarch=v9a'
       ],
       'defines': [
         'MP_NO_MP_WORD',
         'MP_USE_UINT_DIGIT',
         'MP_ASSEMBLY_MULTIPLY',
--- a/security/nss/lib/freebl/verified/Hacl_Chacha20.c
+++ b/security/nss/lib/freebl/verified/Hacl_Chacha20.c
@@ -22,23 +22,18 @@
  */
 
 #include "Hacl_Chacha20.h"
 
 uint32_t
     Hacl_Impl_Chacha20_Vec_chacha20_constants[4U] =
         { (uint32_t)0x61707865U, (uint32_t)0x3320646eU, (uint32_t)0x79622d32U, (uint32_t)0x6b206574U };
 
-inline static void
-Hacl_Impl_Chacha20_Core32_quarter_round(
-    uint32_t *st,
-    uint32_t a,
-    uint32_t b,
-    uint32_t c,
-    uint32_t d)
+static inline void
+quarter_round(uint32_t *st, uint32_t a, uint32_t b, uint32_t c, uint32_t d)
 {
     uint32_t sta = st[a];
     uint32_t stb0 = st[b];
     uint32_t std0 = st[d];
     uint32_t sta10 = sta + stb0;
     uint32_t std10 = std0 ^ sta10;
     uint32_t std2 = std10 << (uint32_t)16U | std10 >> (uint32_t)16U;
     st[a] = sta10;
@@ -64,207 +59,159 @@ Hacl_Impl_Chacha20_Core32_quarter_round(
     uint32_t std = st[b];
     uint32_t sta1 = sta3 + stb;
     uint32_t std1 = std ^ sta1;
     uint32_t std22 = std1 << (uint32_t)7U | std1 >> (uint32_t)25U;
     st[c] = sta1;
     st[b] = std22;
 }
 
-inline static void
-Hacl_Impl_Chacha20_Core32_double_round(uint32_t *st)
+static inline void
+double_round(uint32_t *st)
 {
-    Hacl_Impl_Chacha20_Core32_quarter_round(st,
-                                            (uint32_t)0U,
-                                            (uint32_t)4U,
-                                            (uint32_t)8U,
-                                            (uint32_t)12U);
-    Hacl_Impl_Chacha20_Core32_quarter_round(st,
-                                            (uint32_t)1U,
-                                            (uint32_t)5U,
-                                            (uint32_t)9U,
-                                            (uint32_t)13U);
-    Hacl_Impl_Chacha20_Core32_quarter_round(st,
-                                            (uint32_t)2U,
-                                            (uint32_t)6U,
-                                            (uint32_t)10U,
-                                            (uint32_t)14U);
-    Hacl_Impl_Chacha20_Core32_quarter_round(st,
-                                            (uint32_t)3U,
-                                            (uint32_t)7U,
-                                            (uint32_t)11U,
-                                            (uint32_t)15U);
-    Hacl_Impl_Chacha20_Core32_quarter_round(st,
-                                            (uint32_t)0U,
-                                            (uint32_t)5U,
-                                            (uint32_t)10U,
-                                            (uint32_t)15U);
-    Hacl_Impl_Chacha20_Core32_quarter_round(st,
-                                            (uint32_t)1U,
-                                            (uint32_t)6U,
-                                            (uint32_t)11U,
-                                            (uint32_t)12U);
-    Hacl_Impl_Chacha20_Core32_quarter_round(st,
-                                            (uint32_t)2U,
-                                            (uint32_t)7U,
-                                            (uint32_t)8U,
-                                            (uint32_t)13U);
-    Hacl_Impl_Chacha20_Core32_quarter_round(st,
-                                            (uint32_t)3U,
-                                            (uint32_t)4U,
-                                            (uint32_t)9U,
-                                            (uint32_t)14U);
+    quarter_round(st, (uint32_t)0U, (uint32_t)4U, (uint32_t)8U, (uint32_t)12U);
+    quarter_round(st, (uint32_t)1U, (uint32_t)5U, (uint32_t)9U, (uint32_t)13U);
+    quarter_round(st, (uint32_t)2U, (uint32_t)6U, (uint32_t)10U, (uint32_t)14U);
+    quarter_round(st, (uint32_t)3U, (uint32_t)7U, (uint32_t)11U, (uint32_t)15U);
+    quarter_round(st, (uint32_t)0U, (uint32_t)5U, (uint32_t)10U, (uint32_t)15U);
+    quarter_round(st, (uint32_t)1U, (uint32_t)6U, (uint32_t)11U, (uint32_t)12U);
+    quarter_round(st, (uint32_t)2U, (uint32_t)7U, (uint32_t)8U, (uint32_t)13U);
+    quarter_round(st, (uint32_t)3U, (uint32_t)4U, (uint32_t)9U, (uint32_t)14U);
 }
 
-inline static void
-Hacl_Impl_Chacha20_rounds(uint32_t *st)
+static inline void
+rounds(uint32_t *st)
 {
-    Hacl_Impl_Chacha20_Core32_double_round(st);
-    Hacl_Impl_Chacha20_Core32_double_round(st);
-    Hacl_Impl_Chacha20_Core32_double_round(st);
-    Hacl_Impl_Chacha20_Core32_double_round(st);
-    Hacl_Impl_Chacha20_Core32_double_round(st);
-    Hacl_Impl_Chacha20_Core32_double_round(st);
-    Hacl_Impl_Chacha20_Core32_double_round(st);
-    Hacl_Impl_Chacha20_Core32_double_round(st);
-    Hacl_Impl_Chacha20_Core32_double_round(st);
-    Hacl_Impl_Chacha20_Core32_double_round(st);
+    double_round(st);
+    double_round(st);
+    double_round(st);
+    double_round(st);
+    double_round(st);
+    double_round(st);
+    double_round(st);
+    double_round(st);
+    double_round(st);
+    double_round(st);
 }
 
-inline static void
-Hacl_Impl_Chacha20_chacha20_core(uint32_t *k, uint32_t *ctx, uint32_t ctr)
+static inline void
+chacha20_core(uint32_t *k, uint32_t *ctx, uint32_t ctr)
 {
-    memcpy(k, ctx, (uint32_t)16U * sizeof ctx[0U]);
+    memcpy(k, ctx, (uint32_t)16U * sizeof(ctx[0U]));
     uint32_t ctr_u32 = ctr;
     k[12U] = k[12U] + ctr_u32;
-    Hacl_Impl_Chacha20_rounds(k);
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
+    rounds(k);
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
         uint32_t *os = k;
         uint32_t x = k[i] + ctx[i];
         os[i] = x;
     }
     k[12U] = k[12U] + ctr_u32;
 }
 
 static uint32_t
-    Hacl_Impl_Chacha20_chacha20_constants[4U] =
+    chacha20_constants[4U] =
         { (uint32_t)0x61707865U, (uint32_t)0x3320646eU, (uint32_t)0x79622d32U, (uint32_t)0x6b206574U };
 
-inline static void
-Hacl_Impl_Chacha20_chacha20_init(uint32_t *ctx, uint8_t *k, uint8_t *n1, uint32_t ctr)
+static inline void
+chacha20_init(uint32_t *ctx, uint8_t *k, uint8_t *n1, uint32_t ctr)
 {
     uint32_t *uu____0 = ctx;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)4U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)4U; i++) {
         uint32_t *os = uu____0;
-        uint32_t x = Hacl_Impl_Chacha20_chacha20_constants[i];
+        uint32_t x = chacha20_constants[i];
         os[i] = x;
     }
     uint32_t *uu____1 = ctx + (uint32_t)4U;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)8U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)8U; i++) {
         uint32_t *os = uu____1;
         uint8_t *bj = k + i * (uint32_t)4U;
         uint32_t u = load32_le(bj);
         uint32_t r = u;
         uint32_t x = r;
         os[i] = x;
     }
     ctx[12U] = ctr;
     uint32_t *uu____2 = ctx + (uint32_t)13U;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)3U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)3U; i++) {
         uint32_t *os = uu____2;
         uint8_t *bj = n1 + i * (uint32_t)4U;
         uint32_t u = load32_le(bj);
         uint32_t r = u;
         uint32_t x = r;
         os[i] = x;
     }
 }
 
-inline static void
-Hacl_Impl_Chacha20_chacha20_encrypt_block(
-    uint32_t *ctx,
-    uint8_t *out,
-    uint32_t incr1,
-    uint8_t *text)
+static inline void
+chacha20_encrypt_block(uint32_t *ctx, uint8_t *out, uint32_t incr1, uint8_t *text)
 {
     uint32_t k[16U] = { 0U };
-    Hacl_Impl_Chacha20_chacha20_core(k, ctx, incr1);
+    chacha20_core(k, ctx, incr1);
     uint32_t bl[16U] = { 0U };
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
         uint32_t *os = bl;
         uint8_t *bj = text + i * (uint32_t)4U;
         uint32_t u = load32_le(bj);
         uint32_t r = u;
         uint32_t x = r;
         os[i] = x;
     }
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
         uint32_t *os = bl;
         uint32_t x = bl[i] ^ k[i];
         os[i] = x;
     }
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
         store32_le(out + i * (uint32_t)4U, bl[i]);
     }
 }
 
-inline static void
-Hacl_Impl_Chacha20_chacha20_encrypt_last(
-    uint32_t *ctx,
-    uint32_t len,
-    uint8_t *out,
-    uint32_t incr1,
-    uint8_t *text)
+static inline void
+chacha20_encrypt_last(uint32_t *ctx, uint32_t len, uint8_t *out, uint32_t incr1, uint8_t *text)
 {
     uint8_t plain[64U] = { 0U };
-    memcpy(plain, text, len * sizeof text[0U]);
-    Hacl_Impl_Chacha20_chacha20_encrypt_block(ctx, plain, incr1, plain);
-    memcpy(out, plain, len * sizeof plain[0U]);
+    memcpy(plain, text, len * sizeof(text[0U]));
+    chacha20_encrypt_block(ctx, plain, incr1, plain);
+    memcpy(out, plain, len * sizeof(plain[0U]));
 }
 
-inline static void
-Hacl_Impl_Chacha20_chacha20_update(uint32_t *ctx, uint32_t len, uint8_t *out, uint8_t *text)
+static inline void
+chacha20_update(uint32_t *ctx, uint32_t len, uint8_t *out, uint8_t *text)
 {
     uint32_t rem1 = len % (uint32_t)64U;
     uint32_t nb = len / (uint32_t)64U;
     uint32_t rem2 = len % (uint32_t)64U;
-    for (uint32_t i = (uint32_t)0U; i < nb; i = i + (uint32_t)1U) {
-        Hacl_Impl_Chacha20_chacha20_encrypt_block(ctx,
-                                                  out + i * (uint32_t)64U,
-                                                  i,
-                                                  text + i * (uint32_t)64U);
+    for (uint32_t i = (uint32_t)0U; i < nb; i++) {
+        chacha20_encrypt_block(ctx, out + i * (uint32_t)64U, i, text + i * (uint32_t)64U);
     }
     if (rem2 > (uint32_t)0U) {
-        Hacl_Impl_Chacha20_chacha20_encrypt_last(ctx,
-                                                 rem1,
-                                                 out + nb * (uint32_t)64U,
-                                                 nb,
-                                                 text + nb * (uint32_t)64U);
+        chacha20_encrypt_last(ctx, rem1, out + nb * (uint32_t)64U, nb, text + nb * (uint32_t)64U);
     }
 }
 
 void
 Hacl_Chacha20_chacha20_encrypt(
     uint32_t len,
     uint8_t *out,
     uint8_t *text,
     uint8_t *key,
     uint8_t *n1,
     uint32_t ctr)
 {
     uint32_t ctx[16U] = { 0U };
-    Hacl_Impl_Chacha20_chacha20_init(ctx, key, n1, ctr);
-    Hacl_Impl_Chacha20_chacha20_update(ctx, len, out, text);
+    chacha20_init(ctx, key, n1, ctr);
+    chacha20_update(ctx, len, out, text);
 }
 
 void
 Hacl_Chacha20_chacha20_decrypt(
     uint32_t len,
     uint8_t *out,
     uint8_t *cipher,
     uint8_t *key,
     uint8_t *n1,
     uint32_t ctr)
 {
     uint32_t ctx[16U] = { 0U };
-    Hacl_Impl_Chacha20_chacha20_init(ctx, key, n1, ctr);
-    Hacl_Impl_Chacha20_chacha20_update(ctx, len, out, cipher);
+    chacha20_init(ctx, key, n1, ctr);
+    chacha20_update(ctx, len, out, cipher);
 }
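(Editorial note, not part of the patch: the hunk above leaves the public entry points Hacl_Chacha20_chacha20_encrypt/decrypt unchanged and only de-namespaces the internal helpers. Below is a minimal caller-side sketch of that public API, assuming the Hacl_Chacha20.h header from this import, a 32-byte key and a 12-byte nonce as consumed by chacha20_init, and counter 1 as the AEAD construction uses for the message stream.)

#include <stdint.h>
#include "Hacl_Chacha20.h"

static void
example_chacha20_roundtrip(void)
{
    uint8_t key[32] = { 0 };   /* replace with real key material in practice */
    uint8_t nonce[12] = { 0 }; /* 96-bit nonce, matching the 3-word load in chacha20_init */
    uint8_t msg[64] = "illustrative plaintext";
    uint8_t ct[64];
    uint8_t pt[64];
    /* Encrypt then decrypt with the same key/nonce/counter; ChaCha20 is a
     * stream cipher, so the round trip recovers the original bytes. */
    Hacl_Chacha20_chacha20_encrypt((uint32_t)sizeof(msg), ct, msg, key, nonce, (uint32_t)1U);
    Hacl_Chacha20_chacha20_decrypt((uint32_t)sizeof(ct), pt, ct, key, nonce, (uint32_t)1U);
}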
--- a/security/nss/lib/freebl/verified/Hacl_Chacha20Poly1305_128.c
+++ b/security/nss/lib/freebl/verified/Hacl_Chacha20Poly1305_128.c
@@ -18,21 +18,18 @@
  * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  */
 
 #include "Hacl_Chacha20Poly1305_128.h"
 
-inline static void
-Hacl_Chacha20Poly1305_128_poly1305_padded_128(
-    Lib_IntVector_Intrinsics_vec128 *ctx,
-    uint32_t len,
-    uint8_t *text)
+static inline void
+poly1305_padded_128(Lib_IntVector_Intrinsics_vec128 *ctx, uint32_t len, uint8_t *text)
 {
     uint32_t n1 = len / (uint32_t)16U;
     uint32_t r = len % (uint32_t)16U;
     uint8_t *blocks = text;
     uint8_t *rem1 = text + n1 * (uint32_t)16U;
     Lib_IntVector_Intrinsics_vec128 *pre0 = ctx + (uint32_t)5U;
     Lib_IntVector_Intrinsics_vec128 *acc0 = ctx;
     uint32_t sz_block = (uint32_t)32U;
@@ -40,17 +37,17 @@ Hacl_Chacha20Poly1305_128_poly1305_padde
     uint8_t *t00 = blocks;
     if (len0 > (uint32_t)0U) {
         uint32_t bs = (uint32_t)32U;
         uint8_t *text0 = t00;
         Hacl_Impl_Poly1305_Field32xN_128_load_acc2(acc0, text0);
         uint32_t len1 = len0 - bs;
         uint8_t *text1 = t00 + bs;
         uint32_t nb = len1 / bs;
-        for (uint32_t i = (uint32_t)0U; i < nb; i = i + (uint32_t)1U) {
+        for (uint32_t i = (uint32_t)0U; i < nb; i++) {
             uint8_t *block = text1 + i * bs;
             Lib_IntVector_Intrinsics_vec128 e[5U];
             for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
                 e[_i] = Lib_IntVector_Intrinsics_vec128_zero;
             Lib_IntVector_Intrinsics_vec128 b1 = Lib_IntVector_Intrinsics_vec128_load_le(block);
             Lib_IntVector_Intrinsics_vec128
                 b2 = Lib_IntVector_Intrinsics_vec128_load_le(block + (uint32_t)16U);
             Lib_IntVector_Intrinsics_vec128 lo = Lib_IntVector_Intrinsics_vec128_interleave_low64(b1, b2);
@@ -195,67 +192,53 @@ Hacl_Chacha20Poly1305_128_poly1305_padde
                     Lib_IntVector_Intrinsics_vec128_add64(a43,
                                                           Lib_IntVector_Intrinsics_vec128_mul64(r0, f140));
             Lib_IntVector_Intrinsics_vec128 t01 = a04;
             Lib_IntVector_Intrinsics_vec128 t1 = a14;
             Lib_IntVector_Intrinsics_vec128 t2 = a24;
             Lib_IntVector_Intrinsics_vec128 t3 = a34;
             Lib_IntVector_Intrinsics_vec128 t4 = a44;
             Lib_IntVector_Intrinsics_vec128
-                l = Lib_IntVector_Intrinsics_vec128_add64(t01, Lib_IntVector_Intrinsics_vec128_zero);
+                mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
             Lib_IntVector_Intrinsics_vec128
-                tmp0 =
-                    Lib_IntVector_Intrinsics_vec128_and(l,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-            Lib_IntVector_Intrinsics_vec128
-                c01 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-            Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(t1, c01);
+                z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t01, (uint32_t)26U);
             Lib_IntVector_Intrinsics_vec128
-                tmp1 =
-                    Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-            Lib_IntVector_Intrinsics_vec128
-                c11 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-            Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t2, c11);
+                z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t01, mask261);
+            Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+            Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t1, z0);
+            Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
             Lib_IntVector_Intrinsics_vec128
-                tmp2 =
-                    Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+                z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
             Lib_IntVector_Intrinsics_vec128
-                c21 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-            Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t3, c21);
+                z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
             Lib_IntVector_Intrinsics_vec128
-                tmp3 =
-                    Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+                t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+            Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+            Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+            Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+            Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+            Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
             Lib_IntVector_Intrinsics_vec128
-                c31 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-            Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t4, c31);
-            Lib_IntVector_Intrinsics_vec128
-                tmp4 =
-                    Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-            Lib_IntVector_Intrinsics_vec128
-                c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+                z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
             Lib_IntVector_Intrinsics_vec128
-                l4 =
-                    Lib_IntVector_Intrinsics_vec128_add64(tmp0,
-                                                          Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-            Lib_IntVector_Intrinsics_vec128
-                tmp01 =
-                    Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+                z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+            Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+            Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+            Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
             Lib_IntVector_Intrinsics_vec128
-                c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-            Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-            Lib_IntVector_Intrinsics_vec128 o00 = tmp01;
-            Lib_IntVector_Intrinsics_vec128 o10 = tmp11;
-            Lib_IntVector_Intrinsics_vec128 o20 = tmp2;
-            Lib_IntVector_Intrinsics_vec128 o30 = tmp3;
-            Lib_IntVector_Intrinsics_vec128 o40 = tmp4;
+                z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+            Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+            Lib_IntVector_Intrinsics_vec128 o00 = x02;
+            Lib_IntVector_Intrinsics_vec128 o10 = x12;
+            Lib_IntVector_Intrinsics_vec128 o20 = x21;
+            Lib_IntVector_Intrinsics_vec128 o30 = x32;
+            Lib_IntVector_Intrinsics_vec128 o40 = x42;
             acc0[0U] = o00;
             acc0[1U] = o10;
             acc0[2U] = o20;
             acc0[3U] = o30;
             acc0[4U] = o40;
             Lib_IntVector_Intrinsics_vec128 f100 = acc0[0U];
             Lib_IntVector_Intrinsics_vec128 f11 = acc0[1U];
             Lib_IntVector_Intrinsics_vec128 f12 = acc0[2U];
@@ -278,17 +261,17 @@ Hacl_Chacha20Poly1305_128_poly1305_padde
             acc0[4U] = o4;
         }
         Hacl_Impl_Poly1305_Field32xN_128_fmul_r2_normalize(acc0, pre0);
     }
     uint32_t len1 = n1 * (uint32_t)16U - len0;
     uint8_t *t10 = blocks + len0;
     uint32_t nb = len1 / (uint32_t)16U;
     uint32_t rem2 = len1 % (uint32_t)16U;
-    for (uint32_t i = (uint32_t)0U; i < nb; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < nb; i++) {
         uint8_t *block = t10 + i * (uint32_t)16U;
         Lib_IntVector_Intrinsics_vec128 e[5U];
         for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
             e[_i] = Lib_IntVector_Intrinsics_vec128_zero;
         uint64_t u0 = load64_le(block);
         uint64_t lo = u0;
         uint64_t u = load64_le(block + (uint32_t)8U);
         uint64_t hi = u;
@@ -443,80 +426,66 @@ Hacl_Chacha20Poly1305_128_poly1305_padde
                 Lib_IntVector_Intrinsics_vec128_add64(a45,
                                                       Lib_IntVector_Intrinsics_vec128_mul64(r0, a41));
         Lib_IntVector_Intrinsics_vec128 t01 = a06;
         Lib_IntVector_Intrinsics_vec128 t11 = a16;
         Lib_IntVector_Intrinsics_vec128 t2 = a26;
         Lib_IntVector_Intrinsics_vec128 t3 = a36;
         Lib_IntVector_Intrinsics_vec128 t4 = a46;
         Lib_IntVector_Intrinsics_vec128
-            l = Lib_IntVector_Intrinsics_vec128_add64(t01, Lib_IntVector_Intrinsics_vec128_zero);
+            mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
         Lib_IntVector_Intrinsics_vec128
-            tmp0 =
-                Lib_IntVector_Intrinsics_vec128_and(l,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c01 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(t11, c01);
+            z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t01, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            tmp1 =
-                Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c11 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t2, c11);
+            z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t01, mask261);
+        Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+        Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t11, z0);
+        Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
         Lib_IntVector_Intrinsics_vec128
-            tmp2 =
-                Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            c21 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t3, c21);
+            z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            tmp3 =
-                Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+        Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+        Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+        Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+        Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+        Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
         Lib_IntVector_Intrinsics_vec128
-            c31 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t4, c31);
-        Lib_IntVector_Intrinsics_vec128
-            tmp4 =
-                Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+            z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            l4 =
-                Lib_IntVector_Intrinsics_vec128_add64(tmp0,
-                                                      Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-        Lib_IntVector_Intrinsics_vec128
-            tmp01 =
-                Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+        Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+        Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+        Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
         Lib_IntVector_Intrinsics_vec128
-            c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-        Lib_IntVector_Intrinsics_vec128 o0 = tmp01;
-        Lib_IntVector_Intrinsics_vec128 o1 = tmp11;
-        Lib_IntVector_Intrinsics_vec128 o2 = tmp2;
-        Lib_IntVector_Intrinsics_vec128 o3 = tmp3;
-        Lib_IntVector_Intrinsics_vec128 o4 = tmp4;
+            z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+        Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+        Lib_IntVector_Intrinsics_vec128 o0 = x02;
+        Lib_IntVector_Intrinsics_vec128 o1 = x12;
+        Lib_IntVector_Intrinsics_vec128 o2 = x21;
+        Lib_IntVector_Intrinsics_vec128 o3 = x32;
+        Lib_IntVector_Intrinsics_vec128 o4 = x42;
         acc0[0U] = o0;
         acc0[1U] = o1;
         acc0[2U] = o2;
         acc0[3U] = o3;
         acc0[4U] = o4;
     }
     if (rem2 > (uint32_t)0U) {
         uint8_t *last1 = t10 + nb * (uint32_t)16U;
         Lib_IntVector_Intrinsics_vec128 e[5U];
         for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
             e[_i] = Lib_IntVector_Intrinsics_vec128_zero;
         uint8_t tmp[16U] = { 0U };
-        memcpy(tmp, last1, rem2 * sizeof last1[0U]);
+        memcpy(tmp, last1, rem2 * sizeof(last1[0U]));
         uint64_t u0 = load64_le(tmp);
         uint64_t lo = u0;
         uint64_t u = load64_le(tmp + (uint32_t)8U);
         uint64_t hi = u;
         Lib_IntVector_Intrinsics_vec128 f0 = Lib_IntVector_Intrinsics_vec128_load64(lo);
         Lib_IntVector_Intrinsics_vec128 f1 = Lib_IntVector_Intrinsics_vec128_load64(hi);
         Lib_IntVector_Intrinsics_vec128
             f010 =
@@ -667,75 +636,61 @@ Hacl_Chacha20Poly1305_128_poly1305_padde
                 Lib_IntVector_Intrinsics_vec128_add64(a45,
                                                       Lib_IntVector_Intrinsics_vec128_mul64(r0, a41));
         Lib_IntVector_Intrinsics_vec128 t01 = a06;
         Lib_IntVector_Intrinsics_vec128 t11 = a16;
         Lib_IntVector_Intrinsics_vec128 t2 = a26;
         Lib_IntVector_Intrinsics_vec128 t3 = a36;
         Lib_IntVector_Intrinsics_vec128 t4 = a46;
         Lib_IntVector_Intrinsics_vec128
-            l = Lib_IntVector_Intrinsics_vec128_add64(t01, Lib_IntVector_Intrinsics_vec128_zero);
+            mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
         Lib_IntVector_Intrinsics_vec128
-            tmp0 =
-                Lib_IntVector_Intrinsics_vec128_and(l,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c01 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(t11, c01);
+            z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t01, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            tmp1 =
-                Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c11 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t2, c11);
+            z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t01, mask261);
+        Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+        Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t11, z0);
+        Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
         Lib_IntVector_Intrinsics_vec128
-            tmp2 =
-                Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            c21 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t3, c21);
+            z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            tmp3 =
-                Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+        Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+        Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+        Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+        Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+        Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
         Lib_IntVector_Intrinsics_vec128
-            c31 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t4, c31);
-        Lib_IntVector_Intrinsics_vec128
-            tmp4 =
-                Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+            z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            l4 =
-                Lib_IntVector_Intrinsics_vec128_add64(tmp0,
-                                                      Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-        Lib_IntVector_Intrinsics_vec128
-            tmp01 =
-                Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+        Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+        Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+        Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
         Lib_IntVector_Intrinsics_vec128
-            c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-        Lib_IntVector_Intrinsics_vec128 o0 = tmp01;
-        Lib_IntVector_Intrinsics_vec128 o1 = tmp11;
-        Lib_IntVector_Intrinsics_vec128 o2 = tmp2;
-        Lib_IntVector_Intrinsics_vec128 o3 = tmp3;
-        Lib_IntVector_Intrinsics_vec128 o4 = tmp4;
+            z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+        Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+        Lib_IntVector_Intrinsics_vec128 o0 = x02;
+        Lib_IntVector_Intrinsics_vec128 o1 = x12;
+        Lib_IntVector_Intrinsics_vec128 o2 = x21;
+        Lib_IntVector_Intrinsics_vec128 o3 = x32;
+        Lib_IntVector_Intrinsics_vec128 o4 = x42;
         acc0[0U] = o0;
         acc0[1U] = o1;
         acc0[2U] = o2;
         acc0[3U] = o3;
         acc0[4U] = o4;
     }
     uint8_t tmp[16U] = { 0U };
-    memcpy(tmp, rem1, r * sizeof rem1[0U]);
+    memcpy(tmp, rem1, r * sizeof(rem1[0U]));
     if (r > (uint32_t)0U) {
         Lib_IntVector_Intrinsics_vec128 *pre = ctx + (uint32_t)5U;
         Lib_IntVector_Intrinsics_vec128 *acc = ctx;
         Lib_IntVector_Intrinsics_vec128 e[5U];
         for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
             e[_i] = Lib_IntVector_Intrinsics_vec128_zero;
         uint64_t u0 = load64_le(tmp);
         uint64_t lo = u0;
@@ -892,92 +847,78 @@ Hacl_Chacha20Poly1305_128_poly1305_padde
                 Lib_IntVector_Intrinsics_vec128_add64(a45,
                                                       Lib_IntVector_Intrinsics_vec128_mul64(r0, a41));
         Lib_IntVector_Intrinsics_vec128 t0 = a06;
         Lib_IntVector_Intrinsics_vec128 t1 = a16;
         Lib_IntVector_Intrinsics_vec128 t2 = a26;
         Lib_IntVector_Intrinsics_vec128 t3 = a36;
         Lib_IntVector_Intrinsics_vec128 t4 = a46;
         Lib_IntVector_Intrinsics_vec128
-            l = Lib_IntVector_Intrinsics_vec128_add64(t0, Lib_IntVector_Intrinsics_vec128_zero);
+            mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
         Lib_IntVector_Intrinsics_vec128
-            tmp0 =
-                Lib_IntVector_Intrinsics_vec128_and(l,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c01 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(t1, c01);
+            z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t0, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            tmp1 =
-                Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c11 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t2, c11);
+            z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t0, mask261);
+        Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+        Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t1, z0);
+        Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
         Lib_IntVector_Intrinsics_vec128
-            tmp2 =
-                Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            c21 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t3, c21);
+            z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            tmp3 =
-                Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+        Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+        Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+        Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+        Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+        Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
         Lib_IntVector_Intrinsics_vec128
-            c31 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t4, c31);
-        Lib_IntVector_Intrinsics_vec128
-            tmp4 =
-                Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+            z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            l4 =
-                Lib_IntVector_Intrinsics_vec128_add64(tmp0,
-                                                      Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-        Lib_IntVector_Intrinsics_vec128
-            tmp01 =
-                Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+        Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+        Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+        Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
         Lib_IntVector_Intrinsics_vec128
-            c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-        Lib_IntVector_Intrinsics_vec128 o0 = tmp01;
-        Lib_IntVector_Intrinsics_vec128 o1 = tmp11;
-        Lib_IntVector_Intrinsics_vec128 o2 = tmp2;
-        Lib_IntVector_Intrinsics_vec128 o3 = tmp3;
-        Lib_IntVector_Intrinsics_vec128 o4 = tmp4;
+            z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+        Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+        Lib_IntVector_Intrinsics_vec128 o0 = x02;
+        Lib_IntVector_Intrinsics_vec128 o1 = x12;
+        Lib_IntVector_Intrinsics_vec128 o2 = x21;
+        Lib_IntVector_Intrinsics_vec128 o3 = x32;
+        Lib_IntVector_Intrinsics_vec128 o4 = x42;
         acc[0U] = o0;
         acc[1U] = o1;
         acc[2U] = o2;
         acc[3U] = o3;
         acc[4U] = o4;
         return;
     }
 }
 
-inline static void
-Hacl_Chacha20Poly1305_128_poly1305_do_128(
+static inline void
+poly1305_do_128(
     uint8_t *k,
     uint32_t aadlen,
     uint8_t *aad,
     uint32_t mlen,
     uint8_t *m,
     uint8_t *out)
 {
     Lib_IntVector_Intrinsics_vec128 ctx[25U];
     for (uint32_t _i = 0U; _i < (uint32_t)25U; ++_i)
         ctx[_i] = Lib_IntVector_Intrinsics_vec128_zero;
     uint8_t block[16U] = { 0U };
     Hacl_Poly1305_128_poly1305_init(ctx, k);
-    Hacl_Chacha20Poly1305_128_poly1305_padded_128(ctx, aadlen, aad);
-    Hacl_Chacha20Poly1305_128_poly1305_padded_128(ctx, mlen, m);
+    poly1305_padded_128(ctx, aadlen, aad);
+    poly1305_padded_128(ctx, mlen, m);
     store64_le(block, (uint64_t)aadlen);
     store64_le(block + (uint32_t)8U, (uint64_t)mlen);
     Lib_IntVector_Intrinsics_vec128 *pre = ctx + (uint32_t)5U;
     Lib_IntVector_Intrinsics_vec128 *acc = ctx;
     Lib_IntVector_Intrinsics_vec128 e[5U];
     for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
         e[_i] = Lib_IntVector_Intrinsics_vec128_zero;
     uint64_t u0 = load64_le(block);
@@ -1135,67 +1076,53 @@ Hacl_Chacha20Poly1305_128_poly1305_do_12
             Lib_IntVector_Intrinsics_vec128_add64(a45,
                                                   Lib_IntVector_Intrinsics_vec128_mul64(r0, a41));
     Lib_IntVector_Intrinsics_vec128 t0 = a06;
     Lib_IntVector_Intrinsics_vec128 t1 = a16;
     Lib_IntVector_Intrinsics_vec128 t2 = a26;
     Lib_IntVector_Intrinsics_vec128 t3 = a36;
     Lib_IntVector_Intrinsics_vec128 t4 = a46;
     Lib_IntVector_Intrinsics_vec128
-        l = Lib_IntVector_Intrinsics_vec128_add64(t0, Lib_IntVector_Intrinsics_vec128_zero);
+        mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
     Lib_IntVector_Intrinsics_vec128
-        tmp0 =
-            Lib_IntVector_Intrinsics_vec128_and(l,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c01 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(t1, c01);
+        z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t0, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        tmp1 =
-            Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c11 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t2, c11);
+        z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t0, mask261);
+    Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+    Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t1, z0);
+    Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
     Lib_IntVector_Intrinsics_vec128
-        tmp2 =
-            Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        c21 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t3, c21);
+        z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        tmp3 =
-            Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+    Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+    Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+    Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+    Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
     Lib_IntVector_Intrinsics_vec128
-        c31 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t4, c31);
-    Lib_IntVector_Intrinsics_vec128
-        tmp4 =
-            Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+        z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        l4 =
-            Lib_IntVector_Intrinsics_vec128_add64(tmp0,
-                                                  Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-    Lib_IntVector_Intrinsics_vec128
-        tmp01 =
-            Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+    Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+    Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+    Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
     Lib_IntVector_Intrinsics_vec128
-        c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-    Lib_IntVector_Intrinsics_vec128 o0 = tmp01;
-    Lib_IntVector_Intrinsics_vec128 o1 = tmp11;
-    Lib_IntVector_Intrinsics_vec128 o2 = tmp2;
-    Lib_IntVector_Intrinsics_vec128 o3 = tmp3;
-    Lib_IntVector_Intrinsics_vec128 o4 = tmp4;
+        z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+    Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+    Lib_IntVector_Intrinsics_vec128 o0 = x02;
+    Lib_IntVector_Intrinsics_vec128 o1 = x12;
+    Lib_IntVector_Intrinsics_vec128 o2 = x21;
+    Lib_IntVector_Intrinsics_vec128 o3 = x32;
+    Lib_IntVector_Intrinsics_vec128 o4 = x42;
     acc[0U] = o0;
     acc[1U] = o1;
     acc[2U] = o2;
     acc[3U] = o3;
     acc[4U] = o4;
     Hacl_Poly1305_128_poly1305_finish(out, k, ctx);
 }
 
@@ -1209,17 +1136,17 @@ Hacl_Chacha20Poly1305_128_aead_encrypt(
     uint8_t *m,
     uint8_t *cipher,
     uint8_t *mac)
 {
     Hacl_Chacha20_Vec128_chacha20_encrypt_128(mlen, cipher, m, k, n1, (uint32_t)1U);
     uint8_t tmp[64U] = { 0U };
     Hacl_Chacha20_Vec128_chacha20_encrypt_128((uint32_t)64U, tmp, tmp, k, n1, (uint32_t)0U);
     uint8_t *key = tmp;
-    Hacl_Chacha20Poly1305_128_poly1305_do_128(key, aadlen, aad, mlen, cipher, mac);
+    poly1305_do_128(key, aadlen, aad, mlen, cipher, mac);
 }
 
 uint32_t
 Hacl_Chacha20Poly1305_128_aead_decrypt(
     uint8_t *k,
     uint8_t *n1,
     uint32_t aadlen,
     uint8_t *aad,
@@ -1227,19 +1154,19 @@ Hacl_Chacha20Poly1305_128_aead_decrypt(
     uint8_t *m,
     uint8_t *cipher,
     uint8_t *mac)
 {
     uint8_t computed_mac[16U] = { 0U };
     uint8_t tmp[64U] = { 0U };
     Hacl_Chacha20_Vec128_chacha20_encrypt_128((uint32_t)64U, tmp, tmp, k, n1, (uint32_t)0U);
     uint8_t *key = tmp;
-    Hacl_Chacha20Poly1305_128_poly1305_do_128(key, aadlen, aad, mlen, cipher, computed_mac);
+    poly1305_do_128(key, aadlen, aad, mlen, cipher, computed_mac);
     uint8_t res = (uint8_t)255U;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
         uint8_t uu____0 = FStar_UInt8_eq_mask(computed_mac[i], mac[i]);
         res = uu____0 & res;
     }
     uint8_t z = res;
     if (z == (uint8_t)255U) {
         Hacl_Chacha20_Vec128_chacha20_encrypt_128(mlen, m, cipher, k, n1, (uint32_t)1U);
         return (uint32_t)0U;
     }
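(Editorial note, not part of the patch: the decrypt path above compares the computed and supplied tags by AND-ing per-byte masks rather than branching per byte. The sketch below restates that check in standalone form, assuming the usual contract for FStar_UInt8_eq_mask, namely 0xFF when the bytes match and 0x00 otherwise; eq_mask_u8 is a hypothetical local stand-in, not an NSS or HACL* function.)

#include <stdint.h>
#include <stddef.h>

static uint8_t
eq_mask_u8(uint8_t a, uint8_t b)
{
    /* x == 0 exactly when a == b; (x | -x) has its top bit set iff x != 0. */
    uint8_t x = (uint8_t)(a ^ b);
    uint8_t minus_x = (uint8_t)(0U - x);
    uint8_t nonzero = (uint8_t)((x | minus_x) >> 7); /* 1 if different, 0 if equal */
    return (uint8_t)(nonzero - 1U);                  /* 0xFF if equal, 0x00 if different */
}

static int
tags_equal_ct(const uint8_t computed_mac[16], const uint8_t mac[16])
{
    uint8_t res = (uint8_t)255U;
    for (size_t i = 0; i < 16U; i++) {
        /* Fold every byte's mask into res; a single mismatch clears it, and
         * there is no early exit that would leak the mismatch position. */
        res = (uint8_t)(eq_mask_u8(computed_mac[i], mac[i]) & res);
    }
    return res == (uint8_t)255U;
}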
new file mode 100644
--- /dev/null
+++ b/security/nss/lib/freebl/verified/Hacl_Chacha20Poly1305_256.c
@@ -0,0 +1,1176 @@
+/* MIT License
+ *
+ * Copyright (c) 2016-2020 INRIA, CMU and Microsoft Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "Hacl_Chacha20Poly1305_256.h"
+
+static inline void
+poly1305_padded_256(Lib_IntVector_Intrinsics_vec256 *ctx, uint32_t len, uint8_t *text)
+{
+    uint32_t n1 = len / (uint32_t)16U;
+    uint32_t r = len % (uint32_t)16U;
+    uint8_t *blocks = text;
+    uint8_t *rem1 = text + n1 * (uint32_t)16U;
+    Lib_IntVector_Intrinsics_vec256 *pre0 = ctx + (uint32_t)5U;
+    Lib_IntVector_Intrinsics_vec256 *acc0 = ctx;
+    uint32_t sz_block = (uint32_t)64U;
+    uint32_t len0 = n1 * (uint32_t)16U / sz_block * sz_block;
+    uint8_t *t00 = blocks;
+    if (len0 > (uint32_t)0U) {
+        uint32_t bs = (uint32_t)64U;
+        uint8_t *text0 = t00;
+        Hacl_Impl_Poly1305_Field32xN_256_load_acc4(acc0, text0);
+        uint32_t len1 = len0 - bs;
+        uint8_t *text1 = t00 + bs;
+        uint32_t nb = len1 / bs;
+        for (uint32_t i = (uint32_t)0U; i < nb; i++) {
+            uint8_t *block = text1 + i * bs;
+            Lib_IntVector_Intrinsics_vec256 e[5U];
+            for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
+                e[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+            Lib_IntVector_Intrinsics_vec256 lo = Lib_IntVector_Intrinsics_vec256_load_le(block);
+            Lib_IntVector_Intrinsics_vec256
+                hi = Lib_IntVector_Intrinsics_vec256_load_le(block + (uint32_t)32U);
+            Lib_IntVector_Intrinsics_vec256
+                mask2610 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+            Lib_IntVector_Intrinsics_vec256
+                m0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(lo, hi);
+            Lib_IntVector_Intrinsics_vec256
+                m1 = Lib_IntVector_Intrinsics_vec256_interleave_high128(lo, hi);
+            Lib_IntVector_Intrinsics_vec256
+                m2 = Lib_IntVector_Intrinsics_vec256_shift_right(m0, (uint32_t)48U);
+            Lib_IntVector_Intrinsics_vec256
+                m3 = Lib_IntVector_Intrinsics_vec256_shift_right(m1, (uint32_t)48U);
+            Lib_IntVector_Intrinsics_vec256
+                m4 = Lib_IntVector_Intrinsics_vec256_interleave_high64(m0, m1);
+            Lib_IntVector_Intrinsics_vec256
+                t010 = Lib_IntVector_Intrinsics_vec256_interleave_low64(m0, m1);
+            Lib_IntVector_Intrinsics_vec256
+                t30 = Lib_IntVector_Intrinsics_vec256_interleave_low64(m2, m3);
+            Lib_IntVector_Intrinsics_vec256
+                t20 = Lib_IntVector_Intrinsics_vec256_shift_right64(t30, (uint32_t)4U);
+            Lib_IntVector_Intrinsics_vec256 o20 = Lib_IntVector_Intrinsics_vec256_and(t20, mask2610);
+            Lib_IntVector_Intrinsics_vec256
+                t10 = Lib_IntVector_Intrinsics_vec256_shift_right64(t010, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256 o10 = Lib_IntVector_Intrinsics_vec256_and(t10, mask2610);
+            Lib_IntVector_Intrinsics_vec256 o5 = Lib_IntVector_Intrinsics_vec256_and(t010, mask2610);
+            Lib_IntVector_Intrinsics_vec256
+                t31 = Lib_IntVector_Intrinsics_vec256_shift_right64(t30, (uint32_t)30U);
+            Lib_IntVector_Intrinsics_vec256 o30 = Lib_IntVector_Intrinsics_vec256_and(t31, mask2610);
+            Lib_IntVector_Intrinsics_vec256
+                o40 = Lib_IntVector_Intrinsics_vec256_shift_right64(m4, (uint32_t)40U);
+            Lib_IntVector_Intrinsics_vec256 o00 = o5;
+            Lib_IntVector_Intrinsics_vec256 o11 = o10;
+            Lib_IntVector_Intrinsics_vec256 o21 = o20;
+            Lib_IntVector_Intrinsics_vec256 o31 = o30;
+            Lib_IntVector_Intrinsics_vec256 o41 = o40;
+            e[0U] = o00;
+            e[1U] = o11;
+            e[2U] = o21;
+            e[3U] = o31;
+            e[4U] = o41;
+            uint64_t b = (uint64_t)0x1000000U;
+            Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_load64(b);
+            Lib_IntVector_Intrinsics_vec256 f4 = e[4U];
+            e[4U] = Lib_IntVector_Intrinsics_vec256_or(f4, mask);
+            Lib_IntVector_Intrinsics_vec256 *rn = pre0 + (uint32_t)10U;
+            Lib_IntVector_Intrinsics_vec256 *rn5 = pre0 + (uint32_t)15U;
+            Lib_IntVector_Intrinsics_vec256 r0 = rn[0U];
+            Lib_IntVector_Intrinsics_vec256 r1 = rn[1U];
+            Lib_IntVector_Intrinsics_vec256 r2 = rn[2U];
+            Lib_IntVector_Intrinsics_vec256 r3 = rn[3U];
+            Lib_IntVector_Intrinsics_vec256 r4 = rn[4U];
+            Lib_IntVector_Intrinsics_vec256 r51 = rn5[1U];
+            Lib_IntVector_Intrinsics_vec256 r52 = rn5[2U];
+            Lib_IntVector_Intrinsics_vec256 r53 = rn5[3U];
+            Lib_IntVector_Intrinsics_vec256 r54 = rn5[4U];
+            Lib_IntVector_Intrinsics_vec256 f10 = acc0[0U];
+            Lib_IntVector_Intrinsics_vec256 f110 = acc0[1U];
+            Lib_IntVector_Intrinsics_vec256 f120 = acc0[2U];
+            Lib_IntVector_Intrinsics_vec256 f130 = acc0[3U];
+            Lib_IntVector_Intrinsics_vec256 f140 = acc0[4U];
+            Lib_IntVector_Intrinsics_vec256 a0 = Lib_IntVector_Intrinsics_vec256_mul64(r0, f10);
+            Lib_IntVector_Intrinsics_vec256 a1 = Lib_IntVector_Intrinsics_vec256_mul64(r1, f10);
+            Lib_IntVector_Intrinsics_vec256 a2 = Lib_IntVector_Intrinsics_vec256_mul64(r2, f10);
+            Lib_IntVector_Intrinsics_vec256 a3 = Lib_IntVector_Intrinsics_vec256_mul64(r3, f10);
+            Lib_IntVector_Intrinsics_vec256 a4 = Lib_IntVector_Intrinsics_vec256_mul64(r4, f10);
+            Lib_IntVector_Intrinsics_vec256
+                a01 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a0,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r54, f110));
+            Lib_IntVector_Intrinsics_vec256
+                a11 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a1,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r0, f110));
+            Lib_IntVector_Intrinsics_vec256
+                a21 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a2,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r1, f110));
+            Lib_IntVector_Intrinsics_vec256
+                a31 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a3,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r2, f110));
+            Lib_IntVector_Intrinsics_vec256
+                a41 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a4,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r3, f110));
+            Lib_IntVector_Intrinsics_vec256
+                a02 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a01,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r53, f120));
+            Lib_IntVector_Intrinsics_vec256
+                a12 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a11,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r54, f120));
+            Lib_IntVector_Intrinsics_vec256
+                a22 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a21,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r0, f120));
+            Lib_IntVector_Intrinsics_vec256
+                a32 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a31,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r1, f120));
+            Lib_IntVector_Intrinsics_vec256
+                a42 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a41,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r2, f120));
+            Lib_IntVector_Intrinsics_vec256
+                a03 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r52, f130));
+            Lib_IntVector_Intrinsics_vec256
+                a13 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r53, f130));
+            Lib_IntVector_Intrinsics_vec256
+                a23 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r54, f130));
+            Lib_IntVector_Intrinsics_vec256
+                a33 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r0, f130));
+            Lib_IntVector_Intrinsics_vec256
+                a43 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r1, f130));
+            Lib_IntVector_Intrinsics_vec256
+                a04 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r51, f140));
+            Lib_IntVector_Intrinsics_vec256
+                a14 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r52, f140));
+            Lib_IntVector_Intrinsics_vec256
+                a24 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r53, f140));
+            Lib_IntVector_Intrinsics_vec256
+                a34 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r54, f140));
+            Lib_IntVector_Intrinsics_vec256
+                a44 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r0, f140));
+            Lib_IntVector_Intrinsics_vec256 t01 = a04;
+            Lib_IntVector_Intrinsics_vec256 t1 = a14;
+            Lib_IntVector_Intrinsics_vec256 t2 = a24;
+            Lib_IntVector_Intrinsics_vec256 t3 = a34;
+            Lib_IntVector_Intrinsics_vec256 t4 = a44;
+            Lib_IntVector_Intrinsics_vec256
+                mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+            Lib_IntVector_Intrinsics_vec256
+                z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t01, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256
+                z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t01, mask261);
+            Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+            Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t1, z0);
+            Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+            Lib_IntVector_Intrinsics_vec256
+                z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256
+                z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256
+                t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+            Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+            Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+            Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+            Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+            Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z12);
+            Lib_IntVector_Intrinsics_vec256
+                z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256
+                z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+            Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+            Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+            Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+            Lib_IntVector_Intrinsics_vec256
+                z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+            Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+            Lib_IntVector_Intrinsics_vec256 o01 = x02;
+            Lib_IntVector_Intrinsics_vec256 o12 = x12;
+            Lib_IntVector_Intrinsics_vec256 o22 = x21;
+            Lib_IntVector_Intrinsics_vec256 o32 = x32;
+            Lib_IntVector_Intrinsics_vec256 o42 = x42;
+            acc0[0U] = o01;
+            acc0[1U] = o12;
+            acc0[2U] = o22;
+            acc0[3U] = o32;
+            acc0[4U] = o42;
+            Lib_IntVector_Intrinsics_vec256 f100 = acc0[0U];
+            Lib_IntVector_Intrinsics_vec256 f11 = acc0[1U];
+            Lib_IntVector_Intrinsics_vec256 f12 = acc0[2U];
+            Lib_IntVector_Intrinsics_vec256 f13 = acc0[3U];
+            Lib_IntVector_Intrinsics_vec256 f14 = acc0[4U];
+            Lib_IntVector_Intrinsics_vec256 f20 = e[0U];
+            Lib_IntVector_Intrinsics_vec256 f21 = e[1U];
+            Lib_IntVector_Intrinsics_vec256 f22 = e[2U];
+            Lib_IntVector_Intrinsics_vec256 f23 = e[3U];
+            Lib_IntVector_Intrinsics_vec256 f24 = e[4U];
+            Lib_IntVector_Intrinsics_vec256 o0 = Lib_IntVector_Intrinsics_vec256_add64(f100, f20);
+            Lib_IntVector_Intrinsics_vec256 o1 = Lib_IntVector_Intrinsics_vec256_add64(f11, f21);
+            Lib_IntVector_Intrinsics_vec256 o2 = Lib_IntVector_Intrinsics_vec256_add64(f12, f22);
+            Lib_IntVector_Intrinsics_vec256 o3 = Lib_IntVector_Intrinsics_vec256_add64(f13, f23);
+            Lib_IntVector_Intrinsics_vec256 o4 = Lib_IntVector_Intrinsics_vec256_add64(f14, f24);
+            acc0[0U] = o0;
+            acc0[1U] = o1;
+            acc0[2U] = o2;
+            acc0[3U] = o3;
+            acc0[4U] = o4;
+        }
+        Hacl_Impl_Poly1305_Field32xN_256_fmul_r4_normalize(acc0, pre0);
+    }
+    uint32_t len1 = n1 * (uint32_t)16U - len0;
+    uint8_t *t10 = blocks + len0;
+    uint32_t nb = len1 / (uint32_t)16U;
+    uint32_t rem2 = len1 % (uint32_t)16U;
+    for (uint32_t i = (uint32_t)0U; i < nb; i++) {
+        uint8_t *block = t10 + i * (uint32_t)16U;
+        Lib_IntVector_Intrinsics_vec256 e[5U];
+        for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
+            e[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+        uint64_t u0 = load64_le(block);
+        uint64_t lo = u0;
+        uint64_t u = load64_le(block + (uint32_t)8U);
+        uint64_t hi = u;
+        Lib_IntVector_Intrinsics_vec256 f0 = Lib_IntVector_Intrinsics_vec256_load64(lo);
+        Lib_IntVector_Intrinsics_vec256 f1 = Lib_IntVector_Intrinsics_vec256_load64(hi);
+        Lib_IntVector_Intrinsics_vec256
+            f010 =
+                Lib_IntVector_Intrinsics_vec256_and(f0,
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f110 =
+                Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                                  (uint32_t)26U),
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f20 =
+                Lib_IntVector_Intrinsics_vec256_or(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                                 (uint32_t)52U),
+                                                   Lib_IntVector_Intrinsics_vec256_shift_left64(Lib_IntVector_Intrinsics_vec256_and(f1,
+                                                                                                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3fffU)),
+                                                                                                (uint32_t)12U));
+        Lib_IntVector_Intrinsics_vec256
+            f30 =
+                Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f1,
+                                                                                                  (uint32_t)14U),
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f40 = Lib_IntVector_Intrinsics_vec256_shift_right64(f1, (uint32_t)40U);
+        Lib_IntVector_Intrinsics_vec256 f01 = f010;
+        Lib_IntVector_Intrinsics_vec256 f111 = f110;
+        Lib_IntVector_Intrinsics_vec256 f2 = f20;
+        Lib_IntVector_Intrinsics_vec256 f3 = f30;
+        Lib_IntVector_Intrinsics_vec256 f41 = f40;
+        e[0U] = f01;
+        e[1U] = f111;
+        e[2U] = f2;
+        e[3U] = f3;
+        e[4U] = f41;
+        uint64_t b = (uint64_t)0x1000000U;
+        Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_load64(b);
+        Lib_IntVector_Intrinsics_vec256 f4 = e[4U];
+        e[4U] = Lib_IntVector_Intrinsics_vec256_or(f4, mask);
+        Lib_IntVector_Intrinsics_vec256 *r1 = pre0;
+        Lib_IntVector_Intrinsics_vec256 *r5 = pre0 + (uint32_t)5U;
+        Lib_IntVector_Intrinsics_vec256 r0 = r1[0U];
+        Lib_IntVector_Intrinsics_vec256 r11 = r1[1U];
+        Lib_IntVector_Intrinsics_vec256 r2 = r1[2U];
+        Lib_IntVector_Intrinsics_vec256 r3 = r1[3U];
+        Lib_IntVector_Intrinsics_vec256 r4 = r1[4U];
+        Lib_IntVector_Intrinsics_vec256 r51 = r5[1U];
+        Lib_IntVector_Intrinsics_vec256 r52 = r5[2U];
+        Lib_IntVector_Intrinsics_vec256 r53 = r5[3U];
+        Lib_IntVector_Intrinsics_vec256 r54 = r5[4U];
+        Lib_IntVector_Intrinsics_vec256 f10 = e[0U];
+        Lib_IntVector_Intrinsics_vec256 f11 = e[1U];
+        Lib_IntVector_Intrinsics_vec256 f12 = e[2U];
+        Lib_IntVector_Intrinsics_vec256 f13 = e[3U];
+        Lib_IntVector_Intrinsics_vec256 f14 = e[4U];
+        Lib_IntVector_Intrinsics_vec256 a0 = acc0[0U];
+        Lib_IntVector_Intrinsics_vec256 a1 = acc0[1U];
+        Lib_IntVector_Intrinsics_vec256 a2 = acc0[2U];
+        Lib_IntVector_Intrinsics_vec256 a3 = acc0[3U];
+        Lib_IntVector_Intrinsics_vec256 a4 = acc0[4U];
+        Lib_IntVector_Intrinsics_vec256 a01 = Lib_IntVector_Intrinsics_vec256_add64(a0, f10);
+        Lib_IntVector_Intrinsics_vec256 a11 = Lib_IntVector_Intrinsics_vec256_add64(a1, f11);
+        Lib_IntVector_Intrinsics_vec256 a21 = Lib_IntVector_Intrinsics_vec256_add64(a2, f12);
+        Lib_IntVector_Intrinsics_vec256 a31 = Lib_IntVector_Intrinsics_vec256_add64(a3, f13);
+        Lib_IntVector_Intrinsics_vec256 a41 = Lib_IntVector_Intrinsics_vec256_add64(a4, f14);
+        Lib_IntVector_Intrinsics_vec256 a02 = Lib_IntVector_Intrinsics_vec256_mul64(r0, a01);
+        Lib_IntVector_Intrinsics_vec256 a12 = Lib_IntVector_Intrinsics_vec256_mul64(r11, a01);
+        Lib_IntVector_Intrinsics_vec256 a22 = Lib_IntVector_Intrinsics_vec256_mul64(r2, a01);
+        Lib_IntVector_Intrinsics_vec256 a32 = Lib_IntVector_Intrinsics_vec256_mul64(r3, a01);
+        Lib_IntVector_Intrinsics_vec256 a42 = Lib_IntVector_Intrinsics_vec256_mul64(r4, a01);
+        Lib_IntVector_Intrinsics_vec256
+            a03 =
+                Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a13 =
+                Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a23 =
+                Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r11, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a33 =
+                Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r2, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a43 =
+                Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r3, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a04 =
+                Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a14 =
+                Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a24 =
+                Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a34 =
+                Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r11, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a44 =
+                Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r2, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a05 =
+                Lib_IntVector_Intrinsics_vec256_add64(a04,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r52, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a15 =
+                Lib_IntVector_Intrinsics_vec256_add64(a14,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a25 =
+                Lib_IntVector_Intrinsics_vec256_add64(a24,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a35 =
+                Lib_IntVector_Intrinsics_vec256_add64(a34,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a45 =
+                Lib_IntVector_Intrinsics_vec256_add64(a44,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r11, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a06 =
+                Lib_IntVector_Intrinsics_vec256_add64(a05,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r51, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a16 =
+                Lib_IntVector_Intrinsics_vec256_add64(a15,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r52, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a26 =
+                Lib_IntVector_Intrinsics_vec256_add64(a25,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a36 =
+                Lib_IntVector_Intrinsics_vec256_add64(a35,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a46 =
+                Lib_IntVector_Intrinsics_vec256_add64(a45,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a41));
+        Lib_IntVector_Intrinsics_vec256 t01 = a06;
+        Lib_IntVector_Intrinsics_vec256 t11 = a16;
+        Lib_IntVector_Intrinsics_vec256 t2 = a26;
+        Lib_IntVector_Intrinsics_vec256 t3 = a36;
+        Lib_IntVector_Intrinsics_vec256 t4 = a46;
+        Lib_IntVector_Intrinsics_vec256
+            mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+        Lib_IntVector_Intrinsics_vec256
+            z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t01, mask261);
+        Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+        Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t11, z0);
+        Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+        Lib_IntVector_Intrinsics_vec256
+            z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+        Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+        Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+        Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+        Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+        Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z12);
+        Lib_IntVector_Intrinsics_vec256
+            z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+        Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+        Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+        Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+        Lib_IntVector_Intrinsics_vec256
+            z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+        Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+        Lib_IntVector_Intrinsics_vec256 o0 = x02;
+        Lib_IntVector_Intrinsics_vec256 o1 = x12;
+        Lib_IntVector_Intrinsics_vec256 o2 = x21;
+        Lib_IntVector_Intrinsics_vec256 o3 = x32;
+        Lib_IntVector_Intrinsics_vec256 o4 = x42;
+        acc0[0U] = o0;
+        acc0[1U] = o1;
+        acc0[2U] = o2;
+        acc0[3U] = o3;
+        acc0[4U] = o4;
+    }
+    if (rem2 > (uint32_t)0U) {
+        uint8_t *last1 = t10 + nb * (uint32_t)16U;
+        Lib_IntVector_Intrinsics_vec256 e[5U];
+        for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
+            e[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+        uint8_t tmp[16U] = { 0U };
+        memcpy(tmp, last1, rem2 * sizeof(last1[0U]));
+        uint64_t u0 = load64_le(tmp);
+        uint64_t lo = u0;
+        uint64_t u = load64_le(tmp + (uint32_t)8U);
+        uint64_t hi = u;
+        Lib_IntVector_Intrinsics_vec256 f0 = Lib_IntVector_Intrinsics_vec256_load64(lo);
+        Lib_IntVector_Intrinsics_vec256 f1 = Lib_IntVector_Intrinsics_vec256_load64(hi);
+        Lib_IntVector_Intrinsics_vec256
+            f010 =
+                Lib_IntVector_Intrinsics_vec256_and(f0,
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f110 =
+                Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                                  (uint32_t)26U),
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f20 =
+                Lib_IntVector_Intrinsics_vec256_or(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                                 (uint32_t)52U),
+                                                   Lib_IntVector_Intrinsics_vec256_shift_left64(Lib_IntVector_Intrinsics_vec256_and(f1,
+                                                                                                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3fffU)),
+                                                                                                (uint32_t)12U));
+        Lib_IntVector_Intrinsics_vec256
+            f30 =
+                Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f1,
+                                                                                                  (uint32_t)14U),
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f40 = Lib_IntVector_Intrinsics_vec256_shift_right64(f1, (uint32_t)40U);
+        Lib_IntVector_Intrinsics_vec256 f01 = f010;
+        Lib_IntVector_Intrinsics_vec256 f111 = f110;
+        Lib_IntVector_Intrinsics_vec256 f2 = f20;
+        Lib_IntVector_Intrinsics_vec256 f3 = f30;
+        Lib_IntVector_Intrinsics_vec256 f4 = f40;
+        e[0U] = f01;
+        e[1U] = f111;
+        e[2U] = f2;
+        e[3U] = f3;
+        e[4U] = f4;
+        uint64_t b = (uint64_t)1U << rem2 * (uint32_t)8U % (uint32_t)26U;
+        Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_load64(b);
+        Lib_IntVector_Intrinsics_vec256 fi = e[rem2 * (uint32_t)8U / (uint32_t)26U];
+        e[rem2 * (uint32_t)8U / (uint32_t)26U] = Lib_IntVector_Intrinsics_vec256_or(fi, mask);
+        Lib_IntVector_Intrinsics_vec256 *r1 = pre0;
+        Lib_IntVector_Intrinsics_vec256 *r5 = pre0 + (uint32_t)5U;
+        Lib_IntVector_Intrinsics_vec256 r0 = r1[0U];
+        Lib_IntVector_Intrinsics_vec256 r11 = r1[1U];
+        Lib_IntVector_Intrinsics_vec256 r2 = r1[2U];
+        Lib_IntVector_Intrinsics_vec256 r3 = r1[3U];
+        Lib_IntVector_Intrinsics_vec256 r4 = r1[4U];
+        Lib_IntVector_Intrinsics_vec256 r51 = r5[1U];
+        Lib_IntVector_Intrinsics_vec256 r52 = r5[2U];
+        Lib_IntVector_Intrinsics_vec256 r53 = r5[3U];
+        Lib_IntVector_Intrinsics_vec256 r54 = r5[4U];
+        Lib_IntVector_Intrinsics_vec256 f10 = e[0U];
+        Lib_IntVector_Intrinsics_vec256 f11 = e[1U];
+        Lib_IntVector_Intrinsics_vec256 f12 = e[2U];
+        Lib_IntVector_Intrinsics_vec256 f13 = e[3U];
+        Lib_IntVector_Intrinsics_vec256 f14 = e[4U];
+        Lib_IntVector_Intrinsics_vec256 a0 = acc0[0U];
+        Lib_IntVector_Intrinsics_vec256 a1 = acc0[1U];
+        Lib_IntVector_Intrinsics_vec256 a2 = acc0[2U];
+        Lib_IntVector_Intrinsics_vec256 a3 = acc0[3U];
+        Lib_IntVector_Intrinsics_vec256 a4 = acc0[4U];
+        Lib_IntVector_Intrinsics_vec256 a01 = Lib_IntVector_Intrinsics_vec256_add64(a0, f10);
+        Lib_IntVector_Intrinsics_vec256 a11 = Lib_IntVector_Intrinsics_vec256_add64(a1, f11);
+        Lib_IntVector_Intrinsics_vec256 a21 = Lib_IntVector_Intrinsics_vec256_add64(a2, f12);
+        Lib_IntVector_Intrinsics_vec256 a31 = Lib_IntVector_Intrinsics_vec256_add64(a3, f13);
+        Lib_IntVector_Intrinsics_vec256 a41 = Lib_IntVector_Intrinsics_vec256_add64(a4, f14);
+        Lib_IntVector_Intrinsics_vec256 a02 = Lib_IntVector_Intrinsics_vec256_mul64(r0, a01);
+        Lib_IntVector_Intrinsics_vec256 a12 = Lib_IntVector_Intrinsics_vec256_mul64(r11, a01);
+        Lib_IntVector_Intrinsics_vec256 a22 = Lib_IntVector_Intrinsics_vec256_mul64(r2, a01);
+        Lib_IntVector_Intrinsics_vec256 a32 = Lib_IntVector_Intrinsics_vec256_mul64(r3, a01);
+        Lib_IntVector_Intrinsics_vec256 a42 = Lib_IntVector_Intrinsics_vec256_mul64(r4, a01);
+        Lib_IntVector_Intrinsics_vec256
+            a03 =
+                Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a13 =
+                Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a23 =
+                Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r11, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a33 =
+                Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r2, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a43 =
+                Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r3, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a04 =
+                Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a14 =
+                Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a24 =
+                Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a34 =
+                Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r11, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a44 =
+                Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r2, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a05 =
+                Lib_IntVector_Intrinsics_vec256_add64(a04,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r52, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a15 =
+                Lib_IntVector_Intrinsics_vec256_add64(a14,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a25 =
+                Lib_IntVector_Intrinsics_vec256_add64(a24,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a35 =
+                Lib_IntVector_Intrinsics_vec256_add64(a34,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a45 =
+                Lib_IntVector_Intrinsics_vec256_add64(a44,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r11, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a06 =
+                Lib_IntVector_Intrinsics_vec256_add64(a05,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r51, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a16 =
+                Lib_IntVector_Intrinsics_vec256_add64(a15,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r52, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a26 =
+                Lib_IntVector_Intrinsics_vec256_add64(a25,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a36 =
+                Lib_IntVector_Intrinsics_vec256_add64(a35,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a46 =
+                Lib_IntVector_Intrinsics_vec256_add64(a45,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a41));
+        Lib_IntVector_Intrinsics_vec256 t01 = a06;
+        Lib_IntVector_Intrinsics_vec256 t11 = a16;
+        Lib_IntVector_Intrinsics_vec256 t2 = a26;
+        Lib_IntVector_Intrinsics_vec256 t3 = a36;
+        Lib_IntVector_Intrinsics_vec256 t4 = a46;
+        Lib_IntVector_Intrinsics_vec256
+            mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+        Lib_IntVector_Intrinsics_vec256
+            z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t01, mask261);
+        Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+        Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t11, z0);
+        Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+        Lib_IntVector_Intrinsics_vec256
+            z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+        Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+        Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+        Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+        Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+        Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z12);
+        Lib_IntVector_Intrinsics_vec256
+            z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+        Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+        Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+        Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+        Lib_IntVector_Intrinsics_vec256
+            z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+        Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+        Lib_IntVector_Intrinsics_vec256 o0 = x02;
+        Lib_IntVector_Intrinsics_vec256 o1 = x12;
+        Lib_IntVector_Intrinsics_vec256 o2 = x21;
+        Lib_IntVector_Intrinsics_vec256 o3 = x32;
+        Lib_IntVector_Intrinsics_vec256 o4 = x42;
+        acc0[0U] = o0;
+        acc0[1U] = o1;
+        acc0[2U] = o2;
+        acc0[3U] = o3;
+        acc0[4U] = o4;
+    }
+    uint8_t tmp[16U] = { 0U };
+    memcpy(tmp, rem1, r * sizeof(rem1[0U]));
+    if (r > (uint32_t)0U) {
+        Lib_IntVector_Intrinsics_vec256 *pre = ctx + (uint32_t)5U;
+        Lib_IntVector_Intrinsics_vec256 *acc = ctx;
+        Lib_IntVector_Intrinsics_vec256 e[5U];
+        for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
+            e[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+        uint64_t u0 = load64_le(tmp);
+        uint64_t lo = u0;
+        uint64_t u = load64_le(tmp + (uint32_t)8U);
+        uint64_t hi = u;
+        Lib_IntVector_Intrinsics_vec256 f0 = Lib_IntVector_Intrinsics_vec256_load64(lo);
+        Lib_IntVector_Intrinsics_vec256 f1 = Lib_IntVector_Intrinsics_vec256_load64(hi);
+        Lib_IntVector_Intrinsics_vec256
+            f010 =
+                Lib_IntVector_Intrinsics_vec256_and(f0,
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f110 =
+                Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                                  (uint32_t)26U),
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f20 =
+                Lib_IntVector_Intrinsics_vec256_or(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                                 (uint32_t)52U),
+                                                   Lib_IntVector_Intrinsics_vec256_shift_left64(Lib_IntVector_Intrinsics_vec256_and(f1,
+                                                                                                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3fffU)),
+                                                                                                (uint32_t)12U));
+        Lib_IntVector_Intrinsics_vec256
+            f30 =
+                Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f1,
+                                                                                                  (uint32_t)14U),
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f40 = Lib_IntVector_Intrinsics_vec256_shift_right64(f1, (uint32_t)40U);
+        Lib_IntVector_Intrinsics_vec256 f01 = f010;
+        Lib_IntVector_Intrinsics_vec256 f111 = f110;
+        Lib_IntVector_Intrinsics_vec256 f2 = f20;
+        Lib_IntVector_Intrinsics_vec256 f3 = f30;
+        Lib_IntVector_Intrinsics_vec256 f41 = f40;
+        e[0U] = f01;
+        e[1U] = f111;
+        e[2U] = f2;
+        e[3U] = f3;
+        e[4U] = f41;
+        uint64_t b = (uint64_t)0x1000000U;
+        Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_load64(b);
+        Lib_IntVector_Intrinsics_vec256 f4 = e[4U];
+        e[4U] = Lib_IntVector_Intrinsics_vec256_or(f4, mask);
+        Lib_IntVector_Intrinsics_vec256 *r1 = pre;
+        Lib_IntVector_Intrinsics_vec256 *r5 = pre + (uint32_t)5U;
+        Lib_IntVector_Intrinsics_vec256 r0 = r1[0U];
+        Lib_IntVector_Intrinsics_vec256 r11 = r1[1U];
+        Lib_IntVector_Intrinsics_vec256 r2 = r1[2U];
+        Lib_IntVector_Intrinsics_vec256 r3 = r1[3U];
+        Lib_IntVector_Intrinsics_vec256 r4 = r1[4U];
+        Lib_IntVector_Intrinsics_vec256 r51 = r5[1U];
+        Lib_IntVector_Intrinsics_vec256 r52 = r5[2U];
+        Lib_IntVector_Intrinsics_vec256 r53 = r5[3U];
+        Lib_IntVector_Intrinsics_vec256 r54 = r5[4U];
+        Lib_IntVector_Intrinsics_vec256 f10 = e[0U];
+        Lib_IntVector_Intrinsics_vec256 f11 = e[1U];
+        Lib_IntVector_Intrinsics_vec256 f12 = e[2U];
+        Lib_IntVector_Intrinsics_vec256 f13 = e[3U];
+        Lib_IntVector_Intrinsics_vec256 f14 = e[4U];
+        Lib_IntVector_Intrinsics_vec256 a0 = acc[0U];
+        Lib_IntVector_Intrinsics_vec256 a1 = acc[1U];
+        Lib_IntVector_Intrinsics_vec256 a2 = acc[2U];
+        Lib_IntVector_Intrinsics_vec256 a3 = acc[3U];
+        Lib_IntVector_Intrinsics_vec256 a4 = acc[4U];
+        Lib_IntVector_Intrinsics_vec256 a01 = Lib_IntVector_Intrinsics_vec256_add64(a0, f10);
+        Lib_IntVector_Intrinsics_vec256 a11 = Lib_IntVector_Intrinsics_vec256_add64(a1, f11);
+        Lib_IntVector_Intrinsics_vec256 a21 = Lib_IntVector_Intrinsics_vec256_add64(a2, f12);
+        Lib_IntVector_Intrinsics_vec256 a31 = Lib_IntVector_Intrinsics_vec256_add64(a3, f13);
+        Lib_IntVector_Intrinsics_vec256 a41 = Lib_IntVector_Intrinsics_vec256_add64(a4, f14);
+        Lib_IntVector_Intrinsics_vec256 a02 = Lib_IntVector_Intrinsics_vec256_mul64(r0, a01);
+        Lib_IntVector_Intrinsics_vec256 a12 = Lib_IntVector_Intrinsics_vec256_mul64(r11, a01);
+        Lib_IntVector_Intrinsics_vec256 a22 = Lib_IntVector_Intrinsics_vec256_mul64(r2, a01);
+        Lib_IntVector_Intrinsics_vec256 a32 = Lib_IntVector_Intrinsics_vec256_mul64(r3, a01);
+        Lib_IntVector_Intrinsics_vec256 a42 = Lib_IntVector_Intrinsics_vec256_mul64(r4, a01);
+        Lib_IntVector_Intrinsics_vec256
+            a03 =
+                Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a13 =
+                Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a23 =
+                Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r11, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a33 =
+                Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r2, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a43 =
+                Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r3, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a04 =
+                Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a14 =
+                Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a24 =
+                Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a34 =
+                Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r11, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a44 =
+                Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r2, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a05 =
+                Lib_IntVector_Intrinsics_vec256_add64(a04,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r52, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a15 =
+                Lib_IntVector_Intrinsics_vec256_add64(a14,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a25 =
+                Lib_IntVector_Intrinsics_vec256_add64(a24,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a35 =
+                Lib_IntVector_Intrinsics_vec256_add64(a34,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a45 =
+                Lib_IntVector_Intrinsics_vec256_add64(a44,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r11, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a06 =
+                Lib_IntVector_Intrinsics_vec256_add64(a05,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r51, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a16 =
+                Lib_IntVector_Intrinsics_vec256_add64(a15,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r52, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a26 =
+                Lib_IntVector_Intrinsics_vec256_add64(a25,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a36 =
+                Lib_IntVector_Intrinsics_vec256_add64(a35,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a46 =
+                Lib_IntVector_Intrinsics_vec256_add64(a45,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a41));
+        Lib_IntVector_Intrinsics_vec256 t0 = a06;
+        Lib_IntVector_Intrinsics_vec256 t1 = a16;
+        Lib_IntVector_Intrinsics_vec256 t2 = a26;
+        Lib_IntVector_Intrinsics_vec256 t3 = a36;
+        Lib_IntVector_Intrinsics_vec256 t4 = a46;
+        Lib_IntVector_Intrinsics_vec256
+            mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+        Lib_IntVector_Intrinsics_vec256
+            z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t0, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t0, mask261);
+        Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+        Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t1, z0);
+        Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+        Lib_IntVector_Intrinsics_vec256
+            z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+        Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+        Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+        Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+        Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+        Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z12);
+        Lib_IntVector_Intrinsics_vec256
+            z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+        Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+        Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+        Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+        Lib_IntVector_Intrinsics_vec256
+            z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+        Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+        Lib_IntVector_Intrinsics_vec256 o0 = x02;
+        Lib_IntVector_Intrinsics_vec256 o1 = x12;
+        Lib_IntVector_Intrinsics_vec256 o2 = x21;
+        Lib_IntVector_Intrinsics_vec256 o3 = x32;
+        Lib_IntVector_Intrinsics_vec256 o4 = x42;
+        acc[0U] = o0;
+        acc[1U] = o1;
+        acc[2U] = o2;
+        acc[3U] = o3;
+        acc[4U] = o4;
+        return;
+    }
+}
+
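+/* poly1305_do_256 computes the ChaCha20-Poly1305 tag as in RFC 8439: the AAD
+ * and the ciphertext are each absorbed via poly1305_padded_256 (zero-padded to
+ * a 16-byte boundary), followed by one final block carrying the little-endian
+ * 64-bit AAD and message lengths, before Hacl_Poly1305_256_poly1305_finish
+ * produces the 16-byte tag. */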
+static inline void
+poly1305_do_256(
+    uint8_t *k,
+    uint32_t aadlen,
+    uint8_t *aad,
+    uint32_t mlen,
+    uint8_t *m,
+    uint8_t *out)
+{
+    Lib_IntVector_Intrinsics_vec256 ctx[25U];
+    for (uint32_t _i = 0U; _i < (uint32_t)25U; ++_i)
+        ctx[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+    uint8_t block[16U] = { 0U };
+    Hacl_Poly1305_256_poly1305_init(ctx, k);
+    poly1305_padded_256(ctx, aadlen, aad);
+    poly1305_padded_256(ctx, mlen, m);
+    store64_le(block, (uint64_t)aadlen);
+    store64_le(block + (uint32_t)8U, (uint64_t)mlen);
+    Lib_IntVector_Intrinsics_vec256 *pre = ctx + (uint32_t)5U;
+    Lib_IntVector_Intrinsics_vec256 *acc = ctx;
+    Lib_IntVector_Intrinsics_vec256 e[5U];
+    for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
+        e[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+    uint64_t u0 = load64_le(block);
+    uint64_t lo = u0;
+    uint64_t u = load64_le(block + (uint32_t)8U);
+    uint64_t hi = u;
+    Lib_IntVector_Intrinsics_vec256 f0 = Lib_IntVector_Intrinsics_vec256_load64(lo);
+    Lib_IntVector_Intrinsics_vec256 f1 = Lib_IntVector_Intrinsics_vec256_load64(hi);
+    Lib_IntVector_Intrinsics_vec256
+        f010 =
+            Lib_IntVector_Intrinsics_vec256_and(f0,
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        f110 =
+            Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                              (uint32_t)26U),
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        f20 =
+            Lib_IntVector_Intrinsics_vec256_or(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                             (uint32_t)52U),
+                                               Lib_IntVector_Intrinsics_vec256_shift_left64(Lib_IntVector_Intrinsics_vec256_and(f1,
+                                                                                                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3fffU)),
+                                                                                            (uint32_t)12U));
+    Lib_IntVector_Intrinsics_vec256
+        f30 =
+            Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f1,
+                                                                                              (uint32_t)14U),
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        f40 = Lib_IntVector_Intrinsics_vec256_shift_right64(f1, (uint32_t)40U);
+    Lib_IntVector_Intrinsics_vec256 f01 = f010;
+    Lib_IntVector_Intrinsics_vec256 f111 = f110;
+    Lib_IntVector_Intrinsics_vec256 f2 = f20;
+    Lib_IntVector_Intrinsics_vec256 f3 = f30;
+    Lib_IntVector_Intrinsics_vec256 f41 = f40;
+    e[0U] = f01;
+    e[1U] = f111;
+    e[2U] = f2;
+    e[3U] = f3;
+    e[4U] = f41;
+    uint64_t b = (uint64_t)0x1000000U;
+    Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_load64(b);
+    Lib_IntVector_Intrinsics_vec256 f4 = e[4U];
+    e[4U] = Lib_IntVector_Intrinsics_vec256_or(f4, mask);
+    Lib_IntVector_Intrinsics_vec256 *r = pre;
+    Lib_IntVector_Intrinsics_vec256 *r5 = pre + (uint32_t)5U;
+    Lib_IntVector_Intrinsics_vec256 r0 = r[0U];
+    Lib_IntVector_Intrinsics_vec256 r1 = r[1U];
+    Lib_IntVector_Intrinsics_vec256 r2 = r[2U];
+    Lib_IntVector_Intrinsics_vec256 r3 = r[3U];
+    Lib_IntVector_Intrinsics_vec256 r4 = r[4U];
+    Lib_IntVector_Intrinsics_vec256 r51 = r5[1U];
+    Lib_IntVector_Intrinsics_vec256 r52 = r5[2U];
+    Lib_IntVector_Intrinsics_vec256 r53 = r5[3U];
+    Lib_IntVector_Intrinsics_vec256 r54 = r5[4U];
+    Lib_IntVector_Intrinsics_vec256 f10 = e[0U];
+    Lib_IntVector_Intrinsics_vec256 f11 = e[1U];
+    Lib_IntVector_Intrinsics_vec256 f12 = e[2U];
+    Lib_IntVector_Intrinsics_vec256 f13 = e[3U];
+    Lib_IntVector_Intrinsics_vec256 f14 = e[4U];
+    Lib_IntVector_Intrinsics_vec256 a0 = acc[0U];
+    Lib_IntVector_Intrinsics_vec256 a1 = acc[1U];
+    Lib_IntVector_Intrinsics_vec256 a2 = acc[2U];
+    Lib_IntVector_Intrinsics_vec256 a3 = acc[3U];
+    Lib_IntVector_Intrinsics_vec256 a4 = acc[4U];
+    Lib_IntVector_Intrinsics_vec256 a01 = Lib_IntVector_Intrinsics_vec256_add64(a0, f10);
+    Lib_IntVector_Intrinsics_vec256 a11 = Lib_IntVector_Intrinsics_vec256_add64(a1, f11);
+    Lib_IntVector_Intrinsics_vec256 a21 = Lib_IntVector_Intrinsics_vec256_add64(a2, f12);
+    Lib_IntVector_Intrinsics_vec256 a31 = Lib_IntVector_Intrinsics_vec256_add64(a3, f13);
+    Lib_IntVector_Intrinsics_vec256 a41 = Lib_IntVector_Intrinsics_vec256_add64(a4, f14);
+    Lib_IntVector_Intrinsics_vec256 a02 = Lib_IntVector_Intrinsics_vec256_mul64(r0, a01);
+    Lib_IntVector_Intrinsics_vec256 a12 = Lib_IntVector_Intrinsics_vec256_mul64(r1, a01);
+    Lib_IntVector_Intrinsics_vec256 a22 = Lib_IntVector_Intrinsics_vec256_mul64(r2, a01);
+    Lib_IntVector_Intrinsics_vec256 a32 = Lib_IntVector_Intrinsics_vec256_mul64(r3, a01);
+    Lib_IntVector_Intrinsics_vec256 a42 = Lib_IntVector_Intrinsics_vec256_mul64(r4, a01);
+    Lib_IntVector_Intrinsics_vec256
+        a03 =
+            Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, a11));
+    Lib_IntVector_Intrinsics_vec256
+        a13 =
+            Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, a11));
+    Lib_IntVector_Intrinsics_vec256
+        a23 =
+            Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r1, a11));
+    Lib_IntVector_Intrinsics_vec256
+        a33 =
+            Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r2, a11));
+    Lib_IntVector_Intrinsics_vec256
+        a43 =
+            Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r3, a11));
+    Lib_IntVector_Intrinsics_vec256
+        a04 =
+            Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r53, a21));
+    Lib_IntVector_Intrinsics_vec256
+        a14 =
+            Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, a21));
+    Lib_IntVector_Intrinsics_vec256
+        a24 =
+            Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, a21));
+    Lib_IntVector_Intrinsics_vec256
+        a34 =
+            Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r1, a21));
+    Lib_IntVector_Intrinsics_vec256
+        a44 =
+            Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r2, a21));
+    Lib_IntVector_Intrinsics_vec256
+        a05 =
+            Lib_IntVector_Intrinsics_vec256_add64(a04,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r52, a31));
+    Lib_IntVector_Intrinsics_vec256
+        a15 =
+            Lib_IntVector_Intrinsics_vec256_add64(a14,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r53, a31));
+    Lib_IntVector_Intrinsics_vec256
+        a25 =
+            Lib_IntVector_Intrinsics_vec256_add64(a24,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, a31));
+    Lib_IntVector_Intrinsics_vec256
+        a35 =
+            Lib_IntVector_Intrinsics_vec256_add64(a34,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, a31));
+    Lib_IntVector_Intrinsics_vec256
+        a45 =
+            Lib_IntVector_Intrinsics_vec256_add64(a44,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r1, a31));
+    Lib_IntVector_Intrinsics_vec256
+        a06 =
+            Lib_IntVector_Intrinsics_vec256_add64(a05,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r51, a41));
+    Lib_IntVector_Intrinsics_vec256
+        a16 =
+            Lib_IntVector_Intrinsics_vec256_add64(a15,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r52, a41));
+    Lib_IntVector_Intrinsics_vec256
+        a26 =
+            Lib_IntVector_Intrinsics_vec256_add64(a25,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r53, a41));
+    Lib_IntVector_Intrinsics_vec256
+        a36 =
+            Lib_IntVector_Intrinsics_vec256_add64(a35,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, a41));
+    Lib_IntVector_Intrinsics_vec256
+        a46 =
+            Lib_IntVector_Intrinsics_vec256_add64(a45,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, a41));
+    Lib_IntVector_Intrinsics_vec256 t0 = a06;
+    Lib_IntVector_Intrinsics_vec256 t1 = a16;
+    Lib_IntVector_Intrinsics_vec256 t2 = a26;
+    Lib_IntVector_Intrinsics_vec256 t3 = a36;
+    Lib_IntVector_Intrinsics_vec256 t4 = a46;
+    Lib_IntVector_Intrinsics_vec256
+        mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+    Lib_IntVector_Intrinsics_vec256
+        z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t0, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t0, mask261);
+    Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+    Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t1, z0);
+    Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+    Lib_IntVector_Intrinsics_vec256
+        z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+    Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+    Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+    Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+    Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z12);
+    Lib_IntVector_Intrinsics_vec256
+        z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+    Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+    Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+    Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+    Lib_IntVector_Intrinsics_vec256
+        z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+    Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+    Lib_IntVector_Intrinsics_vec256 o0 = x02;
+    Lib_IntVector_Intrinsics_vec256 o1 = x12;
+    Lib_IntVector_Intrinsics_vec256 o2 = x21;
+    Lib_IntVector_Intrinsics_vec256 o3 = x32;
+    Lib_IntVector_Intrinsics_vec256 o4 = x42;
+    acc[0U] = o0;
+    acc[1U] = o1;
+    acc[2U] = o2;
+    acc[3U] = o3;
+    acc[4U] = o4;
+    Hacl_Poly1305_256_poly1305_finish(out, k, ctx);
+}
+
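+/* AEAD encryption: the message is encrypted with ChaCha20 (vec256) at block
+ * counter 1, the one-time Poly1305 key is taken from encrypting an all-zero
+ * 64-byte block at counter 0, and the tag is computed over the AAD and the
+ * resulting ciphertext by poly1305_do_256. */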
+void
+Hacl_Chacha20Poly1305_256_aead_encrypt(
+    uint8_t *k,
+    uint8_t *n1,
+    uint32_t aadlen,
+    uint8_t *aad,
+    uint32_t mlen,
+    uint8_t *m,
+    uint8_t *cipher,
+    uint8_t *mac)
+{
+    Hacl_Chacha20_Vec256_chacha20_encrypt_256(mlen, cipher, m, k, n1, (uint32_t)1U);
+    uint8_t tmp[64U] = { 0U };
+    Hacl_Chacha20_Vec256_chacha20_encrypt_256((uint32_t)64U, tmp, tmp, k, n1, (uint32_t)0U);
+    uint8_t *key = tmp;
+    poly1305_do_256(key, aadlen, aad, mlen, cipher, mac);
+}
+
+uint32_t
+Hacl_Chacha20Poly1305_256_aead_decrypt(
+    uint8_t *k,
+    uint8_t *n1,
+    uint32_t aadlen,
+    uint8_t *aad,
+    uint32_t mlen,
+    uint8_t *m,
+    uint8_t *cipher,
+    uint8_t *mac)
+{
+    uint8_t computed_mac[16U] = { 0U };
+    uint8_t tmp[64U] = { 0U };
+    Hacl_Chacha20_Vec256_chacha20_encrypt_256((uint32_t)64U, tmp, tmp, k, n1, (uint32_t)0U);
+    uint8_t *key = tmp;
+    poly1305_do_256(key, aadlen, aad, mlen, cipher, computed_mac);
+    uint8_t res = (uint8_t)255U;
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
+        uint8_t uu____0 = FStar_UInt8_eq_mask(computed_mac[i], mac[i]);
+        res = uu____0 & res;
+    }
+    uint8_t z = res;
+    if (z == (uint8_t)255U) {
+        Hacl_Chacha20_Vec256_chacha20_encrypt_256(mlen, m, cipher, k, n1, (uint32_t)1U);
+        return (uint32_t)0U;
+    }
+    return (uint32_t)1U;
+}
new file mode 100644
--- /dev/null
+++ b/security/nss/lib/freebl/verified/Hacl_Chacha20Poly1305_256.h
@@ -0,0 +1,60 @@
+/* MIT License
+ *
+ * Copyright (c) 2016-2020 INRIA, CMU and Microsoft Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "libintvector.h"
+#include "kremlin/internal/types.h"
+#include "kremlin/lowstar_endianness.h"
+#include <string.h>
+#include <stdbool.h>
+
+#ifndef __Hacl_Chacha20Poly1305_256_H
+#define __Hacl_Chacha20Poly1305_256_H
+
+#include "Hacl_Kremlib.h"
+#include "Hacl_Chacha20_Vec256.h"
+#include "Hacl_Poly1305_256.h"
+
+void
+Hacl_Chacha20Poly1305_256_aead_encrypt(
+    uint8_t *k,
+    uint8_t *n1,
+    uint32_t aadlen,
+    uint8_t *aad,
+    uint32_t mlen,
+    uint8_t *m,
+    uint8_t *cipher,
+    uint8_t *mac);
+
+uint32_t
+Hacl_Chacha20Poly1305_256_aead_decrypt(
+    uint8_t *k,
+    uint8_t *n1,
+    uint32_t aadlen,
+    uint8_t *aad,
+    uint32_t mlen,
+    uint8_t *m,
+    uint8_t *cipher,
+    uint8_t *mac);
+
+#define __Hacl_Chacha20Poly1305_256_H_DEFINED
+#endif
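A minimal caller-side sketch of the API declared above, assuming the usual IETF
ChaCha20-Poly1305 parameter sizes (32-byte key, 12-byte nonce, 16-byte tag) and
a translation unit built with AVX2 enabled; the buffer names are illustrative
only:

    #include "Hacl_Chacha20Poly1305_256.h"

    int seal_then_open(uint8_t key[32], uint8_t nonce[12],
                       uint8_t *aad, uint32_t aadlen,
                       uint8_t *msg, uint32_t mlen,
                       uint8_t *cipher /* mlen bytes */, uint8_t tag[16])
    {
        /* Encrypt msg into cipher and write the authentication tag. */
        Hacl_Chacha20Poly1305_256_aead_encrypt(key, nonce, aadlen, aad,
                                               mlen, msg, cipher, tag);
        /* Decrypt cipher back into msg; returns 0 on success, 1 if the
           tag does not verify (in which case msg is left untouched). */
        return Hacl_Chacha20Poly1305_256_aead_decrypt(key, nonce, aadlen, aad,
                                                      mlen, msg, cipher, tag);
    }
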
--- a/security/nss/lib/freebl/verified/Hacl_Chacha20Poly1305_32.c
+++ b/security/nss/lib/freebl/verified/Hacl_Chacha20Poly1305_32.c
@@ -18,28 +18,28 @@
  * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  */
 
 #include "Hacl_Chacha20Poly1305_32.h"
 
-static void
-Hacl_Chacha20Poly1305_32_poly1305_padded_32(uint64_t *ctx, uint32_t len, uint8_t *text)
+static inline void
+poly1305_padded_32(uint64_t *ctx, uint32_t len, uint8_t *text)
 {
     uint32_t n1 = len / (uint32_t)16U;
     uint32_t r = len % (uint32_t)16U;
     uint8_t *blocks = text;
     uint8_t *rem1 = text + n1 * (uint32_t)16U;
     uint64_t *pre0 = ctx + (uint32_t)5U;
     uint64_t *acc0 = ctx;
     uint32_t nb = n1 * (uint32_t)16U / (uint32_t)16U;
     uint32_t rem2 = n1 * (uint32_t)16U % (uint32_t)16U;
-    for (uint32_t i = (uint32_t)0U; i < nb; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < nb; i++) {
         uint8_t *block = blocks + i * (uint32_t)16U;
         uint64_t e[5U] = { 0U };
         uint64_t u0 = load64_le(block);
         uint64_t lo = u0;
         uint64_t u = load64_le(block + (uint32_t)8U);
         uint64_t hi = u;
         uint64_t f0 = lo;
         uint64_t f1 = hi;
@@ -113,51 +113,56 @@ Hacl_Chacha20Poly1305_32_poly1305_padded
         uint64_t a26 = a25 + r53 * a41;
         uint64_t a36 = a35 + r54 * a41;
         uint64_t a46 = a45 + r0 * a41;
         uint64_t t0 = a06;
         uint64_t t1 = a16;
         uint64_t t2 = a26;
         uint64_t t3 = a36;
         uint64_t t4 = a46;
-        uint64_t l = t0 + (uint64_t)0U;
-        uint64_t tmp0 = l & (uint64_t)0x3ffffffU;
-        uint64_t c01 = l >> (uint32_t)26U;
-        uint64_t l0 = t1 + c01;
-        uint64_t tmp1 = l0 & (uint64_t)0x3ffffffU;
-        uint64_t c11 = l0 >> (uint32_t)26U;
-        uint64_t l1 = t2 + c11;
-        uint64_t tmp2 = l1 & (uint64_t)0x3ffffffU;
-        uint64_t c21 = l1 >> (uint32_t)26U;
-        uint64_t l2 = t3 + c21;
-        uint64_t tmp3 = l2 & (uint64_t)0x3ffffffU;
-        uint64_t c31 = l2 >> (uint32_t)26U;
-        uint64_t l3 = t4 + c31;
-        uint64_t tmp4 = l3 & (uint64_t)0x3ffffffU;
-        uint64_t c4 = l3 >> (uint32_t)26U;
-        uint64_t l4 = tmp0 + c4 * (uint64_t)5U;
-        uint64_t tmp01 = l4 & (uint64_t)0x3ffffffU;
-        uint64_t c5 = l4 >> (uint32_t)26U;
-        uint64_t tmp11 = tmp1 + c5;
-        uint64_t o0 = tmp01;
-        uint64_t o1 = tmp11;
-        uint64_t o2 = tmp2;
-        uint64_t o3 = tmp3;
-        uint64_t o4 = tmp4;
+        uint64_t mask261 = (uint64_t)0x3ffffffU;
+        uint64_t z0 = t0 >> (uint32_t)26U;
+        uint64_t z1 = t3 >> (uint32_t)26U;
+        uint64_t x0 = t0 & mask261;
+        uint64_t x3 = t3 & mask261;
+        uint64_t x1 = t1 + z0;
+        uint64_t x4 = t4 + z1;
+        uint64_t z01 = x1 >> (uint32_t)26U;
+        uint64_t z11 = x4 >> (uint32_t)26U;
+        uint64_t t = z11 << (uint32_t)2U;
+        uint64_t z12 = z11 + t;
+        uint64_t x11 = x1 & mask261;
+        uint64_t x41 = x4 & mask261;
+        uint64_t x2 = t2 + z01;
+        uint64_t x01 = x0 + z12;
+        uint64_t z02 = x2 >> (uint32_t)26U;
+        uint64_t z13 = x01 >> (uint32_t)26U;
+        uint64_t x21 = x2 & mask261;
+        uint64_t x02 = x01 & mask261;
+        uint64_t x31 = x3 + z02;
+        uint64_t x12 = x11 + z13;
+        uint64_t z03 = x31 >> (uint32_t)26U;
+        uint64_t x32 = x31 & mask261;
+        uint64_t x42 = x41 + z03;
+        uint64_t o0 = x02;
+        uint64_t o1 = x12;
+        uint64_t o2 = x21;
+        uint64_t o3 = x32;
+        uint64_t o4 = x42;
         acc0[0U] = o0;
         acc0[1U] = o1;
         acc0[2U] = o2;
         acc0[3U] = o3;
         acc0[4U] = o4;
     }
     if (rem2 > (uint32_t)0U) {
         uint8_t *last1 = blocks + nb * (uint32_t)16U;
         uint64_t e[5U] = { 0U };
         uint8_t tmp[16U] = { 0U };
-        memcpy(tmp, last1, rem2 * sizeof last1[0U]);
+        memcpy(tmp, last1, rem2 * sizeof(last1[0U]));
         uint64_t u0 = load64_le(tmp);
         uint64_t lo = u0;
         uint64_t u = load64_le(tmp + (uint32_t)8U);
         uint64_t hi = u;
         uint64_t f0 = lo;
         uint64_t f1 = hi;
         uint64_t f010 = f0 & (uint64_t)0x3ffffffU;
         uint64_t f110 = f0 >> (uint32_t)26U & (uint64_t)0x3ffffffU;
@@ -229,48 +234,53 @@ Hacl_Chacha20Poly1305_32_poly1305_padded
         uint64_t a26 = a25 + r53 * a41;
         uint64_t a36 = a35 + r54 * a41;
         uint64_t a46 = a45 + r0 * a41;
         uint64_t t0 = a06;
         uint64_t t1 = a16;
         uint64_t t2 = a26;
         uint64_t t3 = a36;
         uint64_t t4 = a46;
-        uint64_t l = t0 + (uint64_t)0U;
-        uint64_t tmp0 = l & (uint64_t)0x3ffffffU;
-        uint64_t c01 = l >> (uint32_t)26U;
-        uint64_t l0 = t1 + c01;
-        uint64_t tmp1 = l0 & (uint64_t)0x3ffffffU;
-        uint64_t c11 = l0 >> (uint32_t)26U;
-        uint64_t l1 = t2 + c11;
-        uint64_t tmp2 = l1 & (uint64_t)0x3ffffffU;
-        uint64_t c21 = l1 >> (uint32_t)26U;
-        uint64_t l2 = t3 + c21;
-        uint64_t tmp3 = l2 & (uint64_t)0x3ffffffU;
-        uint64_t c31 = l2 >> (uint32_t)26U;
-        uint64_t l3 = t4 + c31;
-        uint64_t tmp4 = l3 & (uint64_t)0x3ffffffU;
-        uint64_t c4 = l3 >> (uint32_t)26U;
-        uint64_t l4 = tmp0 + c4 * (uint64_t)5U;
-        uint64_t tmp01 = l4 & (uint64_t)0x3ffffffU;
-        uint64_t c5 = l4 >> (uint32_t)26U;
-        uint64_t tmp11 = tmp1 + c5;
-        uint64_t o0 = tmp01;
-        uint64_t o1 = tmp11;
-        uint64_t o2 = tmp2;
-        uint64_t o3 = tmp3;
-        uint64_t o4 = tmp4;
+        uint64_t mask261 = (uint64_t)0x3ffffffU;
+        uint64_t z0 = t0 >> (uint32_t)26U;
+        uint64_t z1 = t3 >> (uint32_t)26U;
+        uint64_t x0 = t0 & mask261;
+        uint64_t x3 = t3 & mask261;
+        uint64_t x1 = t1 + z0;
+        uint64_t x4 = t4 + z1;
+        uint64_t z01 = x1 >> (uint32_t)26U;
+        uint64_t z11 = x4 >> (uint32_t)26U;
+        uint64_t t = z11 << (uint32_t)2U;
+        uint64_t z12 = z11 + t;
+        uint64_t x11 = x1 & mask261;
+        uint64_t x41 = x4 & mask261;
+        uint64_t x2 = t2 + z01;
+        uint64_t x01 = x0 + z12;
+        uint64_t z02 = x2 >> (uint32_t)26U;
+        uint64_t z13 = x01 >> (uint32_t)26U;
+        uint64_t x21 = x2 & mask261;
+        uint64_t x02 = x01 & mask261;
+        uint64_t x31 = x3 + z02;
+        uint64_t x12 = x11 + z13;
+        uint64_t z03 = x31 >> (uint32_t)26U;
+        uint64_t x32 = x31 & mask261;
+        uint64_t x42 = x41 + z03;
+        uint64_t o0 = x02;
+        uint64_t o1 = x12;
+        uint64_t o2 = x21;
+        uint64_t o3 = x32;
+        uint64_t o4 = x42;
         acc0[0U] = o0;
         acc0[1U] = o1;
         acc0[2U] = o2;
         acc0[3U] = o3;
         acc0[4U] = o4;
     }
     uint8_t tmp[16U] = { 0U };
-    memcpy(tmp, rem1, r * sizeof rem1[0U]);
+    memcpy(tmp, rem1, r * sizeof(rem1[0U]));
     if (r > (uint32_t)0U) {
         uint64_t *pre = ctx + (uint32_t)5U;
         uint64_t *acc = ctx;
         uint64_t e[5U] = { 0U };
         uint64_t u0 = load64_le(tmp);
         uint64_t lo = u0;
         uint64_t u = load64_le(tmp + (uint32_t)8U);
         uint64_t hi = u;
@@ -346,63 +356,68 @@ Hacl_Chacha20Poly1305_32_poly1305_padded
         uint64_t a26 = a25 + r53 * a41;
         uint64_t a36 = a35 + r54 * a41;
         uint64_t a46 = a45 + r0 * a41;
         uint64_t t0 = a06;
         uint64_t t1 = a16;
         uint64_t t2 = a26;
         uint64_t t3 = a36;
         uint64_t t4 = a46;
-        uint64_t l = t0 + (uint64_t)0U;
-        uint64_t tmp0 = l & (uint64_t)0x3ffffffU;
-        uint64_t c01 = l >> (uint32_t)26U;
-        uint64_t l0 = t1 + c01;
-        uint64_t tmp1 = l0 & (uint64_t)0x3ffffffU;
-        uint64_t c11 = l0 >> (uint32_t)26U;
-        uint64_t l1 = t2 + c11;
-        uint64_t tmp2 = l1 & (uint64_t)0x3ffffffU;
-        uint64_t c21 = l1 >> (uint32_t)26U;
-        uint64_t l2 = t3 + c21;
-        uint64_t tmp3 = l2 & (uint64_t)0x3ffffffU;
-        uint64_t c31 = l2 >> (uint32_t)26U;
-        uint64_t l3 = t4 + c31;
-        uint64_t tmp4 = l3 & (uint64_t)0x3ffffffU;
-        uint64_t c4 = l3 >> (uint32_t)26U;
-        uint64_t l4 = tmp0 + c4 * (uint64_t)5U;
-        uint64_t tmp01 = l4 & (uint64_t)0x3ffffffU;
-        uint64_t c5 = l4 >> (uint32_t)26U;
-        uint64_t tmp11 = tmp1 + c5;
-        uint64_t o0 = tmp01;
-        uint64_t o1 = tmp11;
-        uint64_t o2 = tmp2;
-        uint64_t o3 = tmp3;
-        uint64_t o4 = tmp4;
+        uint64_t mask261 = (uint64_t)0x3ffffffU;
+        uint64_t z0 = t0 >> (uint32_t)26U;
+        uint64_t z1 = t3 >> (uint32_t)26U;
+        uint64_t x0 = t0 & mask261;
+        uint64_t x3 = t3 & mask261;
+        uint64_t x1 = t1 + z0;
+        uint64_t x4 = t4 + z1;
+        uint64_t z01 = x1 >> (uint32_t)26U;
+        uint64_t z11 = x4 >> (uint32_t)26U;
+        uint64_t t = z11 << (uint32_t)2U;
+        uint64_t z12 = z11 + t;
+        uint64_t x11 = x1 & mask261;
+        uint64_t x41 = x4 & mask261;
+        uint64_t x2 = t2 + z01;
+        uint64_t x01 = x0 + z12;
+        uint64_t z02 = x2 >> (uint32_t)26U;
+        uint64_t z13 = x01 >> (uint32_t)26U;
+        uint64_t x21 = x2 & mask261;
+        uint64_t x02 = x01 & mask261;
+        uint64_t x31 = x3 + z02;
+        uint64_t x12 = x11 + z13;
+        uint64_t z03 = x31 >> (uint32_t)26U;
+        uint64_t x32 = x31 & mask261;
+        uint64_t x42 = x41 + z03;
+        uint64_t o0 = x02;
+        uint64_t o1 = x12;
+        uint64_t o2 = x21;
+        uint64_t o3 = x32;
+        uint64_t o4 = x42;
         acc[0U] = o0;
         acc[1U] = o1;
         acc[2U] = o2;
         acc[3U] = o3;
         acc[4U] = o4;
         return;
     }
 }
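The reworked reduction above (mask261 and the z*/x* temporaries) is the same radix-2^26 carry propagation as the code it replaces, interleaved so that two carry chains run at once; the carry out of the top limb is folded back into limb 0 multiplied by 5, since 2^130 ≡ 5 (mod 2^130 - 5). A compact sequential sketch of that reduction, using a hypothetical helper name:

/* Sketch only: propagate carries across five 26-bit limbs of a Poly1305
 * accumulator. The final carry wraps around multiplied by 5 because the
 * field modulus is 2^130 - 5. */
static void
carry_reduce_26(uint64_t a[5])
{
    const uint64_t mask26 = (uint64_t)0x3ffffffU;
    uint64_t c = (uint64_t)0U;
    for (int i = 0; i < 5; i++) {
        a[i] += c;
        c = a[i] >> 26;
        a[i] &= mask26;
    }
    a[0] += c * (uint64_t)5U; /* 2^130 ≡ 5 (mod 2^130 - 5) */
    a[1] += a[0] >> 26;
    a[0] &= mask26;
}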
 
-static void
-Hacl_Chacha20Poly1305_32_poly1305_do_32(
+static inline void
+poly1305_do_32(
     uint8_t *k,
     uint32_t aadlen,
     uint8_t *aad,
     uint32_t mlen,
     uint8_t *m,
     uint8_t *out)
 {
     uint64_t ctx[25U] = { 0U };
     uint8_t block[16U] = { 0U };
     Hacl_Poly1305_32_poly1305_init(ctx, k);
-    Hacl_Chacha20Poly1305_32_poly1305_padded_32(ctx, aadlen, aad);
-    Hacl_Chacha20Poly1305_32_poly1305_padded_32(ctx, mlen, m);
+    poly1305_padded_32(ctx, aadlen, aad);
+    poly1305_padded_32(ctx, mlen, m);
     store64_le(block, (uint64_t)aadlen);
     store64_le(block + (uint32_t)8U, (uint64_t)mlen);
     uint64_t *pre = ctx + (uint32_t)5U;
     uint64_t *acc = ctx;
     uint64_t e[5U] = { 0U };
     uint64_t u0 = load64_le(block);
     uint64_t lo = u0;
     uint64_t u = load64_le(block + (uint32_t)8U);
@@ -479,40 +494,45 @@ Hacl_Chacha20Poly1305_32_poly1305_do_32(
     uint64_t a26 = a25 + r53 * a41;
     uint64_t a36 = a35 + r54 * a41;
     uint64_t a46 = a45 + r0 * a41;
     uint64_t t0 = a06;
     uint64_t t1 = a16;
     uint64_t t2 = a26;
     uint64_t t3 = a36;
     uint64_t t4 = a46;
-    uint64_t l = t0 + (uint64_t)0U;
-    uint64_t tmp0 = l & (uint64_t)0x3ffffffU;
-    uint64_t c01 = l >> (uint32_t)26U;
-    uint64_t l0 = t1 + c01;
-    uint64_t tmp1 = l0 & (uint64_t)0x3ffffffU;
-    uint64_t c11 = l0 >> (uint32_t)26U;
-    uint64_t l1 = t2 + c11;
-    uint64_t tmp2 = l1 & (uint64_t)0x3ffffffU;
-    uint64_t c21 = l1 >> (uint32_t)26U;
-    uint64_t l2 = t3 + c21;
-    uint64_t tmp3 = l2 & (uint64_t)0x3ffffffU;
-    uint64_t c31 = l2 >> (uint32_t)26U;
-    uint64_t l3 = t4 + c31;
-    uint64_t tmp4 = l3 & (uint64_t)0x3ffffffU;
-    uint64_t c4 = l3 >> (uint32_t)26U;
-    uint64_t l4 = tmp0 + c4 * (uint64_t)5U;
-    uint64_t tmp01 = l4 & (uint64_t)0x3ffffffU;
-    uint64_t c5 = l4 >> (uint32_t)26U;
-    uint64_t tmp11 = tmp1 + c5;
-    uint64_t o0 = tmp01;
-    uint64_t o1 = tmp11;
-    uint64_t o2 = tmp2;
-    uint64_t o3 = tmp3;
-    uint64_t o4 = tmp4;
+    uint64_t mask261 = (uint64_t)0x3ffffffU;
+    uint64_t z0 = t0 >> (uint32_t)26U;
+    uint64_t z1 = t3 >> (uint32_t)26U;
+    uint64_t x0 = t0 & mask261;
+    uint64_t x3 = t3 & mask261;
+    uint64_t x1 = t1 + z0;
+    uint64_t x4 = t4 + z1;
+    uint64_t z01 = x1 >> (uint32_t)26U;
+    uint64_t z11 = x4 >> (uint32_t)26U;
+    uint64_t t = z11 << (uint32_t)2U;
+    uint64_t z12 = z11 + t;
+    uint64_t x11 = x1 & mask261;
+    uint64_t x41 = x4 & mask261;
+    uint64_t x2 = t2 + z01;
+    uint64_t x01 = x0 + z12;
+    uint64_t z02 = x2 >> (uint32_t)26U;
+    uint64_t z13 = x01 >> (uint32_t)26U;
+    uint64_t x21 = x2 & mask261;
+    uint64_t x02 = x01 & mask261;
+    uint64_t x31 = x3 + z02;
+    uint64_t x12 = x11 + z13;
+    uint64_t z03 = x31 >> (uint32_t)26U;
+    uint64_t x32 = x31 & mask261;
+    uint64_t x42 = x41 + z03;
+    uint64_t o0 = x02;
+    uint64_t o1 = x12;
+    uint64_t o2 = x21;
+    uint64_t o3 = x32;
+    uint64_t o4 = x42;
     acc[0U] = o0;
     acc[1U] = o1;
     acc[2U] = o2;
     acc[3U] = o3;
     acc[4U] = o4;
     Hacl_Poly1305_32_poly1305_finish(out, k, ctx);
 }
 
@@ -526,17 +546,17 @@ Hacl_Chacha20Poly1305_32_aead_encrypt(
     uint8_t *m,
     uint8_t *cipher,
     uint8_t *mac)
 {
     Hacl_Chacha20_chacha20_encrypt(mlen, cipher, m, k, n1, (uint32_t)1U);
     uint8_t tmp[64U] = { 0U };
     Hacl_Chacha20_chacha20_encrypt((uint32_t)64U, tmp, tmp, k, n1, (uint32_t)0U);
     uint8_t *key = tmp;
-    Hacl_Chacha20Poly1305_32_poly1305_do_32(key, aadlen, aad, mlen, cipher, mac);
+    poly1305_do_32(key, aadlen, aad, mlen, cipher, mac);
 }
 
 uint32_t
 Hacl_Chacha20Poly1305_32_aead_decrypt(
     uint8_t *k,
     uint8_t *n1,
     uint32_t aadlen,
     uint8_t *aad,
@@ -544,19 +564,19 @@ Hacl_Chacha20Poly1305_32_aead_decrypt(
     uint8_t *m,
     uint8_t *cipher,
     uint8_t *mac)
 {
     uint8_t computed_mac[16U] = { 0U };
     uint8_t tmp[64U] = { 0U };
     Hacl_Chacha20_chacha20_encrypt((uint32_t)64U, tmp, tmp, k, n1, (uint32_t)0U);
     uint8_t *key = tmp;
-    Hacl_Chacha20Poly1305_32_poly1305_do_32(key, aadlen, aad, mlen, cipher, computed_mac);
+    poly1305_do_32(key, aadlen, aad, mlen, cipher, computed_mac);
     uint8_t res = (uint8_t)255U;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
         uint8_t uu____0 = FStar_UInt8_eq_mask(computed_mac[i], mac[i]);
         res = uu____0 & res;
     }
     uint8_t z = res;
     if (z == (uint8_t)255U) {
         Hacl_Chacha20_chacha20_encrypt(mlen, m, cipher, k, n1, (uint32_t)1U);
         return (uint32_t)0U;
     }
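The decrypt path above compares the computed and received tags by folding FStar_UInt8_eq_mask over all 16 bytes rather than calling memcmp, so the check takes the same time whether the tags differ in the first byte or the last. A standalone sketch of that constant-time comparison, with a hypothetical helper name:

/* Returns 0xFF when the two 16-byte tags match, 0x00 otherwise, without an
 * early exit that would leak the position of the first mismatch. */
static uint8_t
ct_tag_eq(const uint8_t a[16], const uint8_t b[16])
{
    uint8_t acc = (uint8_t)255U;
    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
        uint32_t d = (uint32_t)(a[i] ^ b[i]);
        uint8_t eq = (uint8_t)(((d - (uint32_t)1U) >> 8) & (uint32_t)0xFFU); /* 0xFF iff a[i] == b[i] */
        acc = (uint8_t)(acc & eq);
    }
    return acc; /* 0xFF only if every byte matched */
}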
--- a/security/nss/lib/freebl/verified/Hacl_Chacha20_Vec128.c
+++ b/security/nss/lib/freebl/verified/Hacl_Chacha20_Vec128.c
@@ -18,18 +18,18 @@
  * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  */
 
 #include "Hacl_Chacha20_Vec128.h"
 
-static void
-Hacl_Chacha20_Vec128_double_round_128(Lib_IntVector_Intrinsics_vec128 *st)
+static inline void
+double_round_128(Lib_IntVector_Intrinsics_vec128 *st)
 {
     st[0U] = Lib_IntVector_Intrinsics_vec128_add32(st[0U], st[4U]);
     Lib_IntVector_Intrinsics_vec128 std = Lib_IntVector_Intrinsics_vec128_xor(st[12U], st[0U]);
     st[12U] = Lib_IntVector_Intrinsics_vec128_rotate_left32(std, (uint32_t)16U);
     st[8U] = Lib_IntVector_Intrinsics_vec128_add32(st[8U], st[12U]);
     Lib_IntVector_Intrinsics_vec128 std0 = Lib_IntVector_Intrinsics_vec128_xor(st[4U], st[8U]);
     st[4U] = Lib_IntVector_Intrinsics_vec128_rotate_left32(std0, (uint32_t)12U);
     st[0U] = Lib_IntVector_Intrinsics_vec128_add32(st[0U], st[4U]);
@@ -119,125 +119,112 @@ Hacl_Chacha20_Vec128_double_round_128(Li
     st[3U] = Lib_IntVector_Intrinsics_vec128_add32(st[3U], st[4U]);
     Lib_IntVector_Intrinsics_vec128 std29 = Lib_IntVector_Intrinsics_vec128_xor(st[14U], st[3U]);
     st[14U] = Lib_IntVector_Intrinsics_vec128_rotate_left32(std29, (uint32_t)8U);
     st[9U] = Lib_IntVector_Intrinsics_vec128_add32(st[9U], st[14U]);
     Lib_IntVector_Intrinsics_vec128 std30 = Lib_IntVector_Intrinsics_vec128_xor(st[4U], st[9U]);
     st[4U] = Lib_IntVector_Intrinsics_vec128_rotate_left32(std30, (uint32_t)7U);
 }
 
-static void
-Hacl_Chacha20_Vec128_chacha20_core_128(
+static inline void
+chacha20_core_128(
     Lib_IntVector_Intrinsics_vec128 *k,
     Lib_IntVector_Intrinsics_vec128 *ctx,
     uint32_t ctr)
 {
-    memcpy(k, ctx, (uint32_t)16U * sizeof ctx[0U]);
+    memcpy(k, ctx, (uint32_t)16U * sizeof(ctx[0U]));
     uint32_t ctr_u32 = (uint32_t)4U * ctr;
     Lib_IntVector_Intrinsics_vec128 cv = Lib_IntVector_Intrinsics_vec128_load32(ctr_u32);
     k[12U] = Lib_IntVector_Intrinsics_vec128_add32(k[12U], cv);
-    Hacl_Chacha20_Vec128_double_round_128(k);
-    Hacl_Chacha20_Vec128_double_round_128(k);
-    Hacl_Chacha20_Vec128_double_round_128(k);
-    Hacl_Chacha20_Vec128_double_round_128(k);
-    Hacl_Chacha20_Vec128_double_round_128(k);
-    Hacl_Chacha20_Vec128_double_round_128(k);
-    Hacl_Chacha20_Vec128_double_round_128(k);
-    Hacl_Chacha20_Vec128_double_round_128(k);
-    Hacl_Chacha20_Vec128_double_round_128(k);
-    Hacl_Chacha20_Vec128_double_round_128(k);
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
+    double_round_128(k);
+    double_round_128(k);
+    double_round_128(k);
+    double_round_128(k);
+    double_round_128(k);
+    double_round_128(k);
+    double_round_128(k);
+    double_round_128(k);
+    double_round_128(k);
+    double_round_128(k);
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
         Lib_IntVector_Intrinsics_vec128 *os = k;
         Lib_IntVector_Intrinsics_vec128 x = Lib_IntVector_Intrinsics_vec128_add32(k[i], ctx[i]);
         os[i] = x;
     }
     k[12U] = Lib_IntVector_Intrinsics_vec128_add32(k[12U], cv);
 }
 
-static void
-Hacl_Chacha20_Vec128_chacha20_init_128(
-    Lib_IntVector_Intrinsics_vec128 *ctx,
-    uint8_t *k,
-    uint8_t *n1,
-    uint32_t ctr)
+static inline void
+chacha20_init_128(Lib_IntVector_Intrinsics_vec128 *ctx, uint8_t *k, uint8_t *n1, uint32_t ctr)
 {
     uint32_t ctx1[16U] = { 0U };
     uint32_t *uu____0 = ctx1;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)4U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)4U; i++) {
         uint32_t *os = uu____0;
         uint32_t x = Hacl_Impl_Chacha20_Vec_chacha20_constants[i];
         os[i] = x;
     }
     uint32_t *uu____1 = ctx1 + (uint32_t)4U;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)8U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)8U; i++) {
         uint32_t *os = uu____1;
         uint8_t *bj = k + i * (uint32_t)4U;
         uint32_t u = load32_le(bj);
         uint32_t r = u;
         uint32_t x = r;
         os[i] = x;
     }
     ctx1[12U] = ctr;
     uint32_t *uu____2 = ctx1 + (uint32_t)13U;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)3U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)3U; i++) {
         uint32_t *os = uu____2;
         uint8_t *bj = n1 + i * (uint32_t)4U;
         uint32_t u = load32_le(bj);
         uint32_t r = u;
         uint32_t x = r;
         os[i] = x;
     }
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
         Lib_IntVector_Intrinsics_vec128 *os = ctx;
         uint32_t x = ctx1[i];
         Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_load32(x);
         os[i] = x0;
     }
     Lib_IntVector_Intrinsics_vec128
         ctr1 =
-            Lib_IntVector_Intrinsics_vec128_load32s((uint32_t)3U,
+            Lib_IntVector_Intrinsics_vec128_load32s((uint32_t)0U,
+                                                    (uint32_t)1U,
                                                     (uint32_t)2U,
-                                                    (uint32_t)1U,
-                                                    (uint32_t)0U);
+                                                    (uint32_t)3U);
     Lib_IntVector_Intrinsics_vec128 c12 = ctx[12U];
     ctx[12U] = Lib_IntVector_Intrinsics_vec128_add32(c12, ctr1);
 }
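After chacha20_init_128 runs, word 12 of the vectorized state carries ctr, ctr+1, ctr+2 and ctr+3 in its four lanes (the load32s call above), and chacha20_core_128 adds 4*i on each use, so every iteration of the encrypt/decrypt loops below produces keystream for four consecutive 64-byte blocks. A scalar illustration with a hypothetical helper:

/* Illustration only: the ChaCha20 block number represented by lane j of
 * state word 12 during iteration i of the 4-way loop, given the caller's
 * starting counter ctr. */
static uint32_t
vec128_block_counter(uint32_t ctr, uint32_t i, uint32_t j)
{
    return ctr + (uint32_t)4U * i + j; /* init adds j; the core adds 4*i */
}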
 
 void
 Hacl_Chacha20_Vec128_chacha20_encrypt_128(
     uint32_t len,
     uint8_t *out,
     uint8_t *text,
     uint8_t *key,
     uint8_t *n1,
     uint32_t ctr)
 {
     Lib_IntVector_Intrinsics_vec128 ctx[16U];
     for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
         ctx[_i] = Lib_IntVector_Intrinsics_vec128_zero;
-    Hacl_Chacha20_Vec128_chacha20_init_128(ctx, key, n1, ctr);
-    uint32_t rem1 = len % ((uint32_t)4U * (uint32_t)64U);
-    uint32_t nb = len / ((uint32_t)4U * (uint32_t)64U);
-    uint32_t rem2 = len % ((uint32_t)4U * (uint32_t)64U);
-    for (uint32_t i0 = (uint32_t)0U; i0 < nb; i0 = i0 + (uint32_t)1U) {
-        uint8_t *uu____0 = out + i0 * (uint32_t)4U * (uint32_t)64U;
-        uint8_t *uu____1 = text + i0 * (uint32_t)256U;
+    chacha20_init_128(ctx, key, n1, ctr);
+    uint32_t rem1 = len % (uint32_t)256U;
+    uint32_t nb = len / (uint32_t)256U;
+    uint32_t rem2 = len % (uint32_t)256U;
+    for (uint32_t i = (uint32_t)0U; i < nb; i++) {
+        uint8_t *uu____0 = out + i * (uint32_t)256U;
+        uint8_t *uu____1 = text + i * (uint32_t)256U;
         Lib_IntVector_Intrinsics_vec128 k[16U];
         for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
             k[_i] = Lib_IntVector_Intrinsics_vec128_zero;
-        Hacl_Chacha20_Vec128_chacha20_core_128(k, ctx, i0);
-        Lib_IntVector_Intrinsics_vec128 bl[16U];
-        for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
-            bl[_i] = Lib_IntVector_Intrinsics_vec128_zero;
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128 *os = bl;
-            Lib_IntVector_Intrinsics_vec128
-                x = Lib_IntVector_Intrinsics_vec128_load_le(uu____1 + i * (uint32_t)4U * (uint32_t)4U);
-            os[i] = x;
-        }
+        chacha20_core_128(k, ctx, i);
         Lib_IntVector_Intrinsics_vec128 v00 = k[0U];
         Lib_IntVector_Intrinsics_vec128 v16 = k[1U];
         Lib_IntVector_Intrinsics_vec128 v20 = k[2U];
         Lib_IntVector_Intrinsics_vec128 v30 = k[3U];
         Lib_IntVector_Intrinsics_vec128
             v0_ = Lib_IntVector_Intrinsics_vec128_interleave_low32(v00, v16);
         Lib_IntVector_Intrinsics_vec128
             v1_ = Lib_IntVector_Intrinsics_vec128_interleave_high32(v00, v16);
@@ -340,43 +327,32 @@ Hacl_Chacha20_Vec128_chacha20_encrypt_12
         k[8U] = v2;
         k[9U] = v6;
         k[10U] = v10;
         k[11U] = v14;
         k[12U] = v3;
         k[13U] = v7;
         k[14U] = v11;
         k[15U] = v15;
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128 *os = bl;
-            Lib_IntVector_Intrinsics_vec128 x = Lib_IntVector_Intrinsics_vec128_xor(bl[i], k[i]);
-            os[i] = x;
-        }
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128_store_le(uu____0 + i * (uint32_t)16U, bl[i]);
+        for (uint32_t i0 = (uint32_t)0U; i0 < (uint32_t)16U; i0++) {
+            Lib_IntVector_Intrinsics_vec128
+                x = Lib_IntVector_Intrinsics_vec128_load_le(uu____1 + i0 * (uint32_t)16U);
+            Lib_IntVector_Intrinsics_vec128 y = Lib_IntVector_Intrinsics_vec128_xor(x, k[i0]);
+            Lib_IntVector_Intrinsics_vec128_store_le(uu____0 + i0 * (uint32_t)16U, y);
         }
     }
     if (rem2 > (uint32_t)0U) {
-        uint8_t *uu____2 = out + nb * (uint32_t)4U * (uint32_t)64U;
-        uint8_t *uu____3 = text + nb * (uint32_t)4U * (uint32_t)64U;
+        uint8_t *uu____2 = out + nb * (uint32_t)256U;
+        uint8_t *uu____3 = text + nb * (uint32_t)256U;
         uint8_t plain[256U] = { 0U };
-        memcpy(plain, uu____3, rem1 * sizeof uu____3[0U]);
+        memcpy(plain, uu____3, rem1 * sizeof(uu____3[0U]));
         Lib_IntVector_Intrinsics_vec128 k[16U];
         for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
             k[_i] = Lib_IntVector_Intrinsics_vec128_zero;
-        Hacl_Chacha20_Vec128_chacha20_core_128(k, ctx, nb);
-        Lib_IntVector_Intrinsics_vec128 bl[16U];
-        for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
-            bl[_i] = Lib_IntVector_Intrinsics_vec128_zero;
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128 *os = bl;
-            Lib_IntVector_Intrinsics_vec128
-                x = Lib_IntVector_Intrinsics_vec128_load_le(plain + i * (uint32_t)4U * (uint32_t)4U);
-            os[i] = x;
-        }
+        chacha20_core_128(k, ctx, nb);
         Lib_IntVector_Intrinsics_vec128 v00 = k[0U];
         Lib_IntVector_Intrinsics_vec128 v16 = k[1U];
         Lib_IntVector_Intrinsics_vec128 v20 = k[2U];
         Lib_IntVector_Intrinsics_vec128 v30 = k[3U];
         Lib_IntVector_Intrinsics_vec128
             v0_ = Lib_IntVector_Intrinsics_vec128_interleave_low32(v00, v16);
         Lib_IntVector_Intrinsics_vec128
             v1_ = Lib_IntVector_Intrinsics_vec128_interleave_high32(v00, v16);
@@ -479,60 +455,49 @@ Hacl_Chacha20_Vec128_chacha20_encrypt_12
         k[8U] = v2;
         k[9U] = v6;
         k[10U] = v10;
         k[11U] = v14;
         k[12U] = v3;
         k[13U] = v7;
         k[14U] = v11;
         k[15U] = v15;
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128 *os = bl;
-            Lib_IntVector_Intrinsics_vec128 x = Lib_IntVector_Intrinsics_vec128_xor(bl[i], k[i]);
-            os[i] = x;
+        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
+            Lib_IntVector_Intrinsics_vec128
+                x = Lib_IntVector_Intrinsics_vec128_load_le(plain + i * (uint32_t)16U);
+            Lib_IntVector_Intrinsics_vec128 y = Lib_IntVector_Intrinsics_vec128_xor(x, k[i]);
+            Lib_IntVector_Intrinsics_vec128_store_le(plain + i * (uint32_t)16U, y);
         }
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128_store_le(plain + i * (uint32_t)16U, bl[i]);
-        }
-        memcpy(uu____2, plain, rem1 * sizeof plain[0U]);
+        memcpy(uu____2, plain, rem1 * sizeof(plain[0U]));
     }
 }
 
 void
 Hacl_Chacha20_Vec128_chacha20_decrypt_128(
     uint32_t len,
     uint8_t *out,
     uint8_t *cipher,
     uint8_t *key,
     uint8_t *n1,
     uint32_t ctr)
 {
     Lib_IntVector_Intrinsics_vec128 ctx[16U];
     for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
         ctx[_i] = Lib_IntVector_Intrinsics_vec128_zero;
-    Hacl_Chacha20_Vec128_chacha20_init_128(ctx, key, n1, ctr);
-    uint32_t rem1 = len % ((uint32_t)4U * (uint32_t)64U);
-    uint32_t nb = len / ((uint32_t)4U * (uint32_t)64U);
-    uint32_t rem2 = len % ((uint32_t)4U * (uint32_t)64U);
-    for (uint32_t i0 = (uint32_t)0U; i0 < nb; i0 = i0 + (uint32_t)1U) {
-        uint8_t *uu____0 = out + i0 * (uint32_t)4U * (uint32_t)64U;
-        uint8_t *uu____1 = cipher + i0 * (uint32_t)256U;
+    chacha20_init_128(ctx, key, n1, ctr);
+    uint32_t rem1 = len % (uint32_t)256U;
+    uint32_t nb = len / (uint32_t)256U;
+    uint32_t rem2 = len % (uint32_t)256U;
+    for (uint32_t i = (uint32_t)0U; i < nb; i++) {
+        uint8_t *uu____0 = out + i * (uint32_t)256U;
+        uint8_t *uu____1 = cipher + i * (uint32_t)256U;
         Lib_IntVector_Intrinsics_vec128 k[16U];
         for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
             k[_i] = Lib_IntVector_Intrinsics_vec128_zero;
-        Hacl_Chacha20_Vec128_chacha20_core_128(k, ctx, i0);
-        Lib_IntVector_Intrinsics_vec128 bl[16U];
-        for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
-            bl[_i] = Lib_IntVector_Intrinsics_vec128_zero;
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128 *os = bl;
-            Lib_IntVector_Intrinsics_vec128
-                x = Lib_IntVector_Intrinsics_vec128_load_le(uu____1 + i * (uint32_t)4U * (uint32_t)4U);
-            os[i] = x;
-        }
+        chacha20_core_128(k, ctx, i);
         Lib_IntVector_Intrinsics_vec128 v00 = k[0U];
         Lib_IntVector_Intrinsics_vec128 v16 = k[1U];
         Lib_IntVector_Intrinsics_vec128 v20 = k[2U];
         Lib_IntVector_Intrinsics_vec128 v30 = k[3U];
         Lib_IntVector_Intrinsics_vec128
             v0_ = Lib_IntVector_Intrinsics_vec128_interleave_low32(v00, v16);
         Lib_IntVector_Intrinsics_vec128
             v1_ = Lib_IntVector_Intrinsics_vec128_interleave_high32(v00, v16);
@@ -635,43 +600,32 @@ Hacl_Chacha20_Vec128_chacha20_decrypt_12
         k[8U] = v2;
         k[9U] = v6;
         k[10U] = v10;
         k[11U] = v14;
         k[12U] = v3;
         k[13U] = v7;
         k[14U] = v11;
         k[15U] = v15;
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128 *os = bl;
-            Lib_IntVector_Intrinsics_vec128 x = Lib_IntVector_Intrinsics_vec128_xor(bl[i], k[i]);
-            os[i] = x;
-        }
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128_store_le(uu____0 + i * (uint32_t)16U, bl[i]);
+        for (uint32_t i0 = (uint32_t)0U; i0 < (uint32_t)16U; i0++) {
+            Lib_IntVector_Intrinsics_vec128
+                x = Lib_IntVector_Intrinsics_vec128_load_le(uu____1 + i0 * (uint32_t)16U);
+            Lib_IntVector_Intrinsics_vec128 y = Lib_IntVector_Intrinsics_vec128_xor(x, k[i0]);
+            Lib_IntVector_Intrinsics_vec128_store_le(uu____0 + i0 * (uint32_t)16U, y);
         }
     }
     if (rem2 > (uint32_t)0U) {
-        uint8_t *uu____2 = out + nb * (uint32_t)4U * (uint32_t)64U;
-        uint8_t *uu____3 = cipher + nb * (uint32_t)4U * (uint32_t)64U;
+        uint8_t *uu____2 = out + nb * (uint32_t)256U;
+        uint8_t *uu____3 = cipher + nb * (uint32_t)256U;
         uint8_t plain[256U] = { 0U };
-        memcpy(plain, uu____3, rem1 * sizeof uu____3[0U]);
+        memcpy(plain, uu____3, rem1 * sizeof(uu____3[0U]));
         Lib_IntVector_Intrinsics_vec128 k[16U];
         for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
             k[_i] = Lib_IntVector_Intrinsics_vec128_zero;
-        Hacl_Chacha20_Vec128_chacha20_core_128(k, ctx, nb);
-        Lib_IntVector_Intrinsics_vec128 bl[16U];
-        for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
-            bl[_i] = Lib_IntVector_Intrinsics_vec128_zero;
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128 *os = bl;
-            Lib_IntVector_Intrinsics_vec128
-                x = Lib_IntVector_Intrinsics_vec128_load_le(plain + i * (uint32_t)4U * (uint32_t)4U);
-            os[i] = x;
-        }
+        chacha20_core_128(k, ctx, nb);
         Lib_IntVector_Intrinsics_vec128 v00 = k[0U];
         Lib_IntVector_Intrinsics_vec128 v16 = k[1U];
         Lib_IntVector_Intrinsics_vec128 v20 = k[2U];
         Lib_IntVector_Intrinsics_vec128 v30 = k[3U];
         Lib_IntVector_Intrinsics_vec128
             v0_ = Lib_IntVector_Intrinsics_vec128_interleave_low32(v00, v16);
         Lib_IntVector_Intrinsics_vec128
             v1_ = Lib_IntVector_Intrinsics_vec128_interleave_high32(v00, v16);
@@ -774,19 +728,17 @@ Hacl_Chacha20_Vec128_chacha20_decrypt_12
         k[8U] = v2;
         k[9U] = v6;
         k[10U] = v10;
         k[11U] = v14;
         k[12U] = v3;
         k[13U] = v7;
         k[14U] = v11;
         k[15U] = v15;
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128 *os = bl;
-            Lib_IntVector_Intrinsics_vec128 x = Lib_IntVector_Intrinsics_vec128_xor(bl[i], k[i]);
-            os[i] = x;
+        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
+            Lib_IntVector_Intrinsics_vec128
+                x = Lib_IntVector_Intrinsics_vec128_load_le(plain + i * (uint32_t)16U);
+            Lib_IntVector_Intrinsics_vec128 y = Lib_IntVector_Intrinsics_vec128_xor(x, k[i]);
+            Lib_IntVector_Intrinsics_vec128_store_le(plain + i * (uint32_t)16U, y);
         }
-        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i = i + (uint32_t)1U) {
-            Lib_IntVector_Intrinsics_vec128_store_le(plain + i * (uint32_t)16U, bl[i]);
-        }
-        memcpy(uu____2, plain, rem1 * sizeof plain[0U]);
+        memcpy(uu____2, plain, rem1 * sizeof(plain[0U]));
     }
 }
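The Vec128 encrypt and decrypt loops above now load each 16-byte chunk, XOR it with the corresponding transposed keystream word, and store the result in a single pass, instead of staging the block through the old bl[] temporary. A portable sketch of that fused pattern, with a hypothetical helper name:

/* Sketch only: XOR one 256-byte input block against 256 bytes of keystream,
 * writing straight to the output buffer with no intermediate copy. */
static void
xor_block_fused(uint8_t *out, const uint8_t *in, const uint8_t ks[256])
{
    for (uint32_t i = (uint32_t)0U; i < (uint32_t)256U; i++) {
        out[i] = (uint8_t)(in[i] ^ ks[i]);
    }
}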
new file mode 100644
--- /dev/null
+++ b/security/nss/lib/freebl/verified/Hacl_Chacha20_Vec256.c
@@ -0,0 +1,876 @@
+/* MIT License
+ *
+ * Copyright (c) 2016-2020 INRIA, CMU and Microsoft Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "Hacl_Chacha20_Vec256.h"
+
+static inline void
+double_round_256(Lib_IntVector_Intrinsics_vec256 *st)
+{
+    st[0U] = Lib_IntVector_Intrinsics_vec256_add32(st[0U], st[4U]);
+    Lib_IntVector_Intrinsics_vec256 std = Lib_IntVector_Intrinsics_vec256_xor(st[12U], st[0U]);
+    st[12U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std, (uint32_t)16U);
+    st[8U] = Lib_IntVector_Intrinsics_vec256_add32(st[8U], st[12U]);
+    Lib_IntVector_Intrinsics_vec256 std0 = Lib_IntVector_Intrinsics_vec256_xor(st[4U], st[8U]);
+    st[4U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std0, (uint32_t)12U);
+    st[0U] = Lib_IntVector_Intrinsics_vec256_add32(st[0U], st[4U]);
+    Lib_IntVector_Intrinsics_vec256 std1 = Lib_IntVector_Intrinsics_vec256_xor(st[12U], st[0U]);
+    st[12U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std1, (uint32_t)8U);
+    st[8U] = Lib_IntVector_Intrinsics_vec256_add32(st[8U], st[12U]);
+    Lib_IntVector_Intrinsics_vec256 std2 = Lib_IntVector_Intrinsics_vec256_xor(st[4U], st[8U]);
+    st[4U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std2, (uint32_t)7U);
+    st[1U] = Lib_IntVector_Intrinsics_vec256_add32(st[1U], st[5U]);
+    Lib_IntVector_Intrinsics_vec256 std3 = Lib_IntVector_Intrinsics_vec256_xor(st[13U], st[1U]);
+    st[13U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std3, (uint32_t)16U);
+    st[9U] = Lib_IntVector_Intrinsics_vec256_add32(st[9U], st[13U]);
+    Lib_IntVector_Intrinsics_vec256 std4 = Lib_IntVector_Intrinsics_vec256_xor(st[5U], st[9U]);
+    st[5U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std4, (uint32_t)12U);
+    st[1U] = Lib_IntVector_Intrinsics_vec256_add32(st[1U], st[5U]);
+    Lib_IntVector_Intrinsics_vec256 std5 = Lib_IntVector_Intrinsics_vec256_xor(st[13U], st[1U]);
+    st[13U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std5, (uint32_t)8U);
+    st[9U] = Lib_IntVector_Intrinsics_vec256_add32(st[9U], st[13U]);
+    Lib_IntVector_Intrinsics_vec256 std6 = Lib_IntVector_Intrinsics_vec256_xor(st[5U], st[9U]);
+    st[5U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std6, (uint32_t)7U);
+    st[2U] = Lib_IntVector_Intrinsics_vec256_add32(st[2U], st[6U]);
+    Lib_IntVector_Intrinsics_vec256 std7 = Lib_IntVector_Intrinsics_vec256_xor(st[14U], st[2U]);
+    st[14U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std7, (uint32_t)16U);
+    st[10U] = Lib_IntVector_Intrinsics_vec256_add32(st[10U], st[14U]);
+    Lib_IntVector_Intrinsics_vec256 std8 = Lib_IntVector_Intrinsics_vec256_xor(st[6U], st[10U]);
+    st[6U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std8, (uint32_t)12U);
+    st[2U] = Lib_IntVector_Intrinsics_vec256_add32(st[2U], st[6U]);
+    Lib_IntVector_Intrinsics_vec256 std9 = Lib_IntVector_Intrinsics_vec256_xor(st[14U], st[2U]);
+    st[14U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std9, (uint32_t)8U);
+    st[10U] = Lib_IntVector_Intrinsics_vec256_add32(st[10U], st[14U]);
+    Lib_IntVector_Intrinsics_vec256 std10 = Lib_IntVector_Intrinsics_vec256_xor(st[6U], st[10U]);
+    st[6U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std10, (uint32_t)7U);
+    st[3U] = Lib_IntVector_Intrinsics_vec256_add32(st[3U], st[7U]);
+    Lib_IntVector_Intrinsics_vec256 std11 = Lib_IntVector_Intrinsics_vec256_xor(st[15U], st[3U]);
+    st[15U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std11, (uint32_t)16U);
+    st[11U] = Lib_IntVector_Intrinsics_vec256_add32(st[11U], st[15U]);
+    Lib_IntVector_Intrinsics_vec256 std12 = Lib_IntVector_Intrinsics_vec256_xor(st[7U], st[11U]);
+    st[7U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std12, (uint32_t)12U);
+    st[3U] = Lib_IntVector_Intrinsics_vec256_add32(st[3U], st[7U]);
+    Lib_IntVector_Intrinsics_vec256 std13 = Lib_IntVector_Intrinsics_vec256_xor(st[15U], st[3U]);
+    st[15U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std13, (uint32_t)8U);
+    st[11U] = Lib_IntVector_Intrinsics_vec256_add32(st[11U], st[15U]);
+    Lib_IntVector_Intrinsics_vec256 std14 = Lib_IntVector_Intrinsics_vec256_xor(st[7U], st[11U]);
+    st[7U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std14, (uint32_t)7U);
+    st[0U] = Lib_IntVector_Intrinsics_vec256_add32(st[0U], st[5U]);
+    Lib_IntVector_Intrinsics_vec256 std15 = Lib_IntVector_Intrinsics_vec256_xor(st[15U], st[0U]);
+    st[15U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std15, (uint32_t)16U);
+    st[10U] = Lib_IntVector_Intrinsics_vec256_add32(st[10U], st[15U]);
+    Lib_IntVector_Intrinsics_vec256 std16 = Lib_IntVector_Intrinsics_vec256_xor(st[5U], st[10U]);
+    st[5U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std16, (uint32_t)12U);
+    st[0U] = Lib_IntVector_Intrinsics_vec256_add32(st[0U], st[5U]);
+    Lib_IntVector_Intrinsics_vec256 std17 = Lib_IntVector_Intrinsics_vec256_xor(st[15U], st[0U]);
+    st[15U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std17, (uint32_t)8U);
+    st[10U] = Lib_IntVector_Intrinsics_vec256_add32(st[10U], st[15U]);
+    Lib_IntVector_Intrinsics_vec256 std18 = Lib_IntVector_Intrinsics_vec256_xor(st[5U], st[10U]);
+    st[5U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std18, (uint32_t)7U);
+    st[1U] = Lib_IntVector_Intrinsics_vec256_add32(st[1U], st[6U]);
+    Lib_IntVector_Intrinsics_vec256 std19 = Lib_IntVector_Intrinsics_vec256_xor(st[12U], st[1U]);
+    st[12U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std19, (uint32_t)16U);
+    st[11U] = Lib_IntVector_Intrinsics_vec256_add32(st[11U], st[12U]);
+    Lib_IntVector_Intrinsics_vec256 std20 = Lib_IntVector_Intrinsics_vec256_xor(st[6U], st[11U]);
+    st[6U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std20, (uint32_t)12U);
+    st[1U] = Lib_IntVector_Intrinsics_vec256_add32(st[1U], st[6U]);
+    Lib_IntVector_Intrinsics_vec256 std21 = Lib_IntVector_Intrinsics_vec256_xor(st[12U], st[1U]);
+    st[12U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std21, (uint32_t)8U);
+    st[11U] = Lib_IntVector_Intrinsics_vec256_add32(st[11U], st[12U]);
+    Lib_IntVector_Intrinsics_vec256 std22 = Lib_IntVector_Intrinsics_vec256_xor(st[6U], st[11U]);
+    st[6U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std22, (uint32_t)7U);
+    st[2U] = Lib_IntVector_Intrinsics_vec256_add32(st[2U], st[7U]);
+    Lib_IntVector_Intrinsics_vec256 std23 = Lib_IntVector_Intrinsics_vec256_xor(st[13U], st[2U]);
+    st[13U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std23, (uint32_t)16U);
+    st[8U] = Lib_IntVector_Intrinsics_vec256_add32(st[8U], st[13U]);
+    Lib_IntVector_Intrinsics_vec256 std24 = Lib_IntVector_Intrinsics_vec256_xor(st[7U], st[8U]);
+    st[7U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std24, (uint32_t)12U);
+    st[2U] = Lib_IntVector_Intrinsics_vec256_add32(st[2U], st[7U]);
+    Lib_IntVector_Intrinsics_vec256 std25 = Lib_IntVector_Intrinsics_vec256_xor(st[13U], st[2U]);
+    st[13U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std25, (uint32_t)8U);
+    st[8U] = Lib_IntVector_Intrinsics_vec256_add32(st[8U], st[13U]);
+    Lib_IntVector_Intrinsics_vec256 std26 = Lib_IntVector_Intrinsics_vec256_xor(st[7U], st[8U]);
+    st[7U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std26, (uint32_t)7U);
+    st[3U] = Lib_IntVector_Intrinsics_vec256_add32(st[3U], st[4U]);
+    Lib_IntVector_Intrinsics_vec256 std27 = Lib_IntVector_Intrinsics_vec256_xor(st[14U], st[3U]);
+    st[14U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std27, (uint32_t)16U);
+    st[9U] = Lib_IntVector_Intrinsics_vec256_add32(st[9U], st[14U]);
+    Lib_IntVector_Intrinsics_vec256 std28 = Lib_IntVector_Intrinsics_vec256_xor(st[4U], st[9U]);
+    st[4U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std28, (uint32_t)12U);
+    st[3U] = Lib_IntVector_Intrinsics_vec256_add32(st[3U], st[4U]);
+    Lib_IntVector_Intrinsics_vec256 std29 = Lib_IntVector_Intrinsics_vec256_xor(st[14U], st[3U]);
+    st[14U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std29, (uint32_t)8U);
+    st[9U] = Lib_IntVector_Intrinsics_vec256_add32(st[9U], st[14U]);
+    Lib_IntVector_Intrinsics_vec256 std30 = Lib_IntVector_Intrinsics_vec256_xor(st[4U], st[9U]);
+    st[4U] = Lib_IntVector_Intrinsics_vec256_rotate_left32(std30, (uint32_t)7U);
+}
+
+static inline void
+chacha20_core_256(
+    Lib_IntVector_Intrinsics_vec256 *k,
+    Lib_IntVector_Intrinsics_vec256 *ctx,
+    uint32_t ctr)
+{
+    memcpy(k, ctx, (uint32_t)16U * sizeof(ctx[0U]));
+    uint32_t ctr_u32 = (uint32_t)8U * ctr;
+    Lib_IntVector_Intrinsics_vec256 cv = Lib_IntVector_Intrinsics_vec256_load32(ctr_u32);
+    k[12U] = Lib_IntVector_Intrinsics_vec256_add32(k[12U], cv);
+    double_round_256(k);
+    double_round_256(k);
+    double_round_256(k);
+    double_round_256(k);
+    double_round_256(k);
+    double_round_256(k);
+    double_round_256(k);
+    double_round_256(k);
+    double_round_256(k);
+    double_round_256(k);
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
+        Lib_IntVector_Intrinsics_vec256 *os = k;
+        Lib_IntVector_Intrinsics_vec256 x = Lib_IntVector_Intrinsics_vec256_add32(k[i], ctx[i]);
+        os[i] = x;
+    }
+    k[12U] = Lib_IntVector_Intrinsics_vec256_add32(k[12U], cv);
+}
+
+static inline void
+chacha20_init_256(Lib_IntVector_Intrinsics_vec256 *ctx, uint8_t *k, uint8_t *n1, uint32_t ctr)
+{
+    uint32_t ctx1[16U] = { 0U };
+    uint32_t *uu____0 = ctx1;
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)4U; i++) {
+        uint32_t *os = uu____0;
+        uint32_t x = Hacl_Impl_Chacha20_Vec_chacha20_constants[i];
+        os[i] = x;
+    }
+    uint32_t *uu____1 = ctx1 + (uint32_t)4U;
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)8U; i++) {
+        uint32_t *os = uu____1;
+        uint8_t *bj = k + i * (uint32_t)4U;
+        uint32_t u = load32_le(bj);
+        uint32_t r = u;
+        uint32_t x = r;
+        os[i] = x;
+    }
+    ctx1[12U] = ctr;
+    uint32_t *uu____2 = ctx1 + (uint32_t)13U;
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)3U; i++) {
+        uint32_t *os = uu____2;
+        uint8_t *bj = n1 + i * (uint32_t)4U;
+        uint32_t u = load32_le(bj);
+        uint32_t r = u;
+        uint32_t x = r;
+        os[i] = x;
+    }
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
+        Lib_IntVector_Intrinsics_vec256 *os = ctx;
+        uint32_t x = ctx1[i];
+        Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_load32(x);
+        os[i] = x0;
+    }
+    Lib_IntVector_Intrinsics_vec256
+        ctr1 =
+            Lib_IntVector_Intrinsics_vec256_load32s((uint32_t)0U,
+                                                    (uint32_t)1U,
+                                                    (uint32_t)2U,
+                                                    (uint32_t)3U,
+                                                    (uint32_t)4U,
+                                                    (uint32_t)5U,
+                                                    (uint32_t)6U,
+                                                    (uint32_t)7U);
+    Lib_IntVector_Intrinsics_vec256 c12 = ctx[12U];
+    ctx[12U] = Lib_IntVector_Intrinsics_vec256_add32(c12, ctr1);
+}
+
+void
+Hacl_Chacha20_Vec256_chacha20_encrypt_256(
+    uint32_t len,
+    uint8_t *out,
+    uint8_t *text,
+    uint8_t *key,
+    uint8_t *n1,
+    uint32_t ctr)
+{
+    Lib_IntVector_Intrinsics_vec256 ctx[16U];
+    for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
+        ctx[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+    chacha20_init_256(ctx, key, n1, ctr);
+    uint32_t rem1 = len % (uint32_t)512U;
+    uint32_t nb = len / (uint32_t)512U;
+    uint32_t rem2 = len % (uint32_t)512U;
+    for (uint32_t i = (uint32_t)0U; i < nb; i++) {
+        uint8_t *uu____0 = out + i * (uint32_t)512U;
+        uint8_t *uu____1 = text + i * (uint32_t)512U;
+        Lib_IntVector_Intrinsics_vec256 k[16U];
+        for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
+            k[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+        chacha20_core_256(k, ctx, i);
+        Lib_IntVector_Intrinsics_vec256 v00 = k[0U];
+        Lib_IntVector_Intrinsics_vec256 v16 = k[1U];
+        Lib_IntVector_Intrinsics_vec256 v20 = k[2U];
+        Lib_IntVector_Intrinsics_vec256 v30 = k[3U];
+        Lib_IntVector_Intrinsics_vec256 v40 = k[4U];
+        Lib_IntVector_Intrinsics_vec256 v50 = k[5U];
+        Lib_IntVector_Intrinsics_vec256 v60 = k[6U];
+        Lib_IntVector_Intrinsics_vec256 v70 = k[7U];
+        Lib_IntVector_Intrinsics_vec256
+            v0_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v00, v16);
+        Lib_IntVector_Intrinsics_vec256
+            v1_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v00, v16);
+        Lib_IntVector_Intrinsics_vec256
+            v2_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v20, v30);
+        Lib_IntVector_Intrinsics_vec256
+            v3_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v20, v30);
+        Lib_IntVector_Intrinsics_vec256
+            v4_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v40, v50);
+        Lib_IntVector_Intrinsics_vec256
+            v5_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v40, v50);
+        Lib_IntVector_Intrinsics_vec256
+            v6_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v60, v70);
+        Lib_IntVector_Intrinsics_vec256
+            v7_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v60, v70);
+        Lib_IntVector_Intrinsics_vec256
+            v0__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v0_, v2_);
+        Lib_IntVector_Intrinsics_vec256
+            v1__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v0_, v2_);
+        Lib_IntVector_Intrinsics_vec256
+            v2__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v1_, v3_);
+        Lib_IntVector_Intrinsics_vec256
+            v3__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v1_, v3_);
+        Lib_IntVector_Intrinsics_vec256
+            v4__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v4_, v6_);
+        Lib_IntVector_Intrinsics_vec256
+            v5__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v4_, v6_);
+        Lib_IntVector_Intrinsics_vec256
+            v6__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v5_, v7_);
+        Lib_IntVector_Intrinsics_vec256
+            v7__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v5_, v7_);
+        Lib_IntVector_Intrinsics_vec256
+            v0___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v0__, v4__);
+        Lib_IntVector_Intrinsics_vec256
+            v1___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v0__, v4__);
+        Lib_IntVector_Intrinsics_vec256
+            v2___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v1__, v5__);
+        Lib_IntVector_Intrinsics_vec256
+            v3___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v1__, v5__);
+        Lib_IntVector_Intrinsics_vec256
+            v4___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v2__, v6__);
+        Lib_IntVector_Intrinsics_vec256
+            v5___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v2__, v6__);
+        Lib_IntVector_Intrinsics_vec256
+            v6___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v3__, v7__);
+        Lib_IntVector_Intrinsics_vec256
+            v7___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v3__, v7__);
+        Lib_IntVector_Intrinsics_vec256 v0 = v0___;
+        Lib_IntVector_Intrinsics_vec256 v1 = v2___;
+        Lib_IntVector_Intrinsics_vec256 v2 = v4___;
+        Lib_IntVector_Intrinsics_vec256 v3 = v6___;
+        Lib_IntVector_Intrinsics_vec256 v4 = v1___;
+        Lib_IntVector_Intrinsics_vec256 v5 = v3___;
+        Lib_IntVector_Intrinsics_vec256 v6 = v5___;
+        Lib_IntVector_Intrinsics_vec256 v7 = v7___;
+        Lib_IntVector_Intrinsics_vec256 v01 = k[8U];
+        Lib_IntVector_Intrinsics_vec256 v110 = k[9U];
+        Lib_IntVector_Intrinsics_vec256 v21 = k[10U];
+        Lib_IntVector_Intrinsics_vec256 v31 = k[11U];
+        Lib_IntVector_Intrinsics_vec256 v41 = k[12U];
+        Lib_IntVector_Intrinsics_vec256 v51 = k[13U];
+        Lib_IntVector_Intrinsics_vec256 v61 = k[14U];
+        Lib_IntVector_Intrinsics_vec256 v71 = k[15U];
+        Lib_IntVector_Intrinsics_vec256
+            v0_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v01, v110);
+        Lib_IntVector_Intrinsics_vec256
+            v1_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v01, v110);
+        Lib_IntVector_Intrinsics_vec256
+            v2_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v21, v31);
+        Lib_IntVector_Intrinsics_vec256
+            v3_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v21, v31);
+        Lib_IntVector_Intrinsics_vec256
+            v4_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v41, v51);
+        Lib_IntVector_Intrinsics_vec256
+            v5_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v41, v51);
+        Lib_IntVector_Intrinsics_vec256
+            v6_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v61, v71);
+        Lib_IntVector_Intrinsics_vec256
+            v7_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v61, v71);
+        Lib_IntVector_Intrinsics_vec256
+            v0__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v0_0, v2_0);
+        Lib_IntVector_Intrinsics_vec256
+            v1__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v0_0, v2_0);
+        Lib_IntVector_Intrinsics_vec256
+            v2__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v1_0, v3_0);
+        Lib_IntVector_Intrinsics_vec256
+            v3__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v1_0, v3_0);
+        Lib_IntVector_Intrinsics_vec256
+            v4__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v4_0, v6_0);
+        Lib_IntVector_Intrinsics_vec256
+            v5__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v4_0, v6_0);
+        Lib_IntVector_Intrinsics_vec256
+            v6__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v5_0, v7_0);
+        Lib_IntVector_Intrinsics_vec256
+            v7__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v5_0, v7_0);
+        Lib_IntVector_Intrinsics_vec256
+            v0___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v0__0, v4__0);
+        Lib_IntVector_Intrinsics_vec256
+            v1___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v0__0, v4__0);
+        Lib_IntVector_Intrinsics_vec256
+            v2___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v1__0, v5__0);
+        Lib_IntVector_Intrinsics_vec256
+            v3___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v1__0, v5__0);
+        Lib_IntVector_Intrinsics_vec256
+            v4___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v2__0, v6__0);
+        Lib_IntVector_Intrinsics_vec256
+            v5___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v2__0, v6__0);
+        Lib_IntVector_Intrinsics_vec256
+            v6___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v3__0, v7__0);
+        Lib_IntVector_Intrinsics_vec256
+            v7___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v3__0, v7__0);
+        Lib_IntVector_Intrinsics_vec256 v8 = v0___0;
+        Lib_IntVector_Intrinsics_vec256 v9 = v2___0;
+        Lib_IntVector_Intrinsics_vec256 v10 = v4___0;
+        Lib_IntVector_Intrinsics_vec256 v11 = v6___0;
+        Lib_IntVector_Intrinsics_vec256 v12 = v1___0;
+        Lib_IntVector_Intrinsics_vec256 v13 = v3___0;
+        Lib_IntVector_Intrinsics_vec256 v14 = v5___0;
+        Lib_IntVector_Intrinsics_vec256 v15 = v7___0;
+        k[0U] = v0;
+        k[1U] = v8;
+        k[2U] = v1;
+        k[3U] = v9;
+        k[4U] = v2;
+        k[5U] = v10;
+        k[6U] = v3;
+        k[7U] = v11;
+        k[8U] = v4;
+        k[9U] = v12;
+        k[10U] = v5;
+        k[11U] = v13;
+        k[12U] = v6;
+        k[13U] = v14;
+        k[14U] = v7;
+        k[15U] = v15;
+        for (uint32_t i0 = (uint32_t)0U; i0 < (uint32_t)16U; i0++) {
+            Lib_IntVector_Intrinsics_vec256
+                x = Lib_IntVector_Intrinsics_vec256_load_le(uu____1 + i0 * (uint32_t)32U);
+            Lib_IntVector_Intrinsics_vec256 y = Lib_IntVector_Intrinsics_vec256_xor(x, k[i0]);
+            Lib_IntVector_Intrinsics_vec256_store_le(uu____0 + i0 * (uint32_t)32U, y);
+        }
+    }
+    if (rem2 > (uint32_t)0U) {
+        uint8_t *uu____2 = out + nb * (uint32_t)512U;
+        uint8_t *uu____3 = text + nb * (uint32_t)512U;
+        uint8_t plain[512U] = { 0U };
+        memcpy(plain, uu____3, rem1 * sizeof(uu____3[0U]));
+        Lib_IntVector_Intrinsics_vec256 k[16U];
+        for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
+            k[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+        chacha20_core_256(k, ctx, nb);
+        Lib_IntVector_Intrinsics_vec256 v00 = k[0U];
+        Lib_IntVector_Intrinsics_vec256 v16 = k[1U];
+        Lib_IntVector_Intrinsics_vec256 v20 = k[2U];
+        Lib_IntVector_Intrinsics_vec256 v30 = k[3U];
+        Lib_IntVector_Intrinsics_vec256 v40 = k[4U];
+        Lib_IntVector_Intrinsics_vec256 v50 = k[5U];
+        Lib_IntVector_Intrinsics_vec256 v60 = k[6U];
+        Lib_IntVector_Intrinsics_vec256 v70 = k[7U];
+        Lib_IntVector_Intrinsics_vec256
+            v0_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v00, v16);
+        Lib_IntVector_Intrinsics_vec256
+            v1_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v00, v16);
+        Lib_IntVector_Intrinsics_vec256
+            v2_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v20, v30);
+        Lib_IntVector_Intrinsics_vec256
+            v3_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v20, v30);
+        Lib_IntVector_Intrinsics_vec256
+            v4_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v40, v50);
+        Lib_IntVector_Intrinsics_vec256
+            v5_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v40, v50);
+        Lib_IntVector_Intrinsics_vec256
+            v6_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v60, v70);
+        Lib_IntVector_Intrinsics_vec256
+            v7_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v60, v70);
+        Lib_IntVector_Intrinsics_vec256
+            v0__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v0_, v2_);
+        Lib_IntVector_Intrinsics_vec256
+            v1__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v0_, v2_);
+        Lib_IntVector_Intrinsics_vec256
+            v2__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v1_, v3_);
+        Lib_IntVector_Intrinsics_vec256
+            v3__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v1_, v3_);
+        Lib_IntVector_Intrinsics_vec256
+            v4__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v4_, v6_);
+        Lib_IntVector_Intrinsics_vec256
+            v5__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v4_, v6_);
+        Lib_IntVector_Intrinsics_vec256
+            v6__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v5_, v7_);
+        Lib_IntVector_Intrinsics_vec256
+            v7__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v5_, v7_);
+        Lib_IntVector_Intrinsics_vec256
+            v0___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v0__, v4__);
+        Lib_IntVector_Intrinsics_vec256
+            v1___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v0__, v4__);
+        Lib_IntVector_Intrinsics_vec256
+            v2___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v1__, v5__);
+        Lib_IntVector_Intrinsics_vec256
+            v3___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v1__, v5__);
+        Lib_IntVector_Intrinsics_vec256
+            v4___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v2__, v6__);
+        Lib_IntVector_Intrinsics_vec256
+            v5___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v2__, v6__);
+        Lib_IntVector_Intrinsics_vec256
+            v6___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v3__, v7__);
+        Lib_IntVector_Intrinsics_vec256
+            v7___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v3__, v7__);
+        Lib_IntVector_Intrinsics_vec256 v0 = v0___;
+        Lib_IntVector_Intrinsics_vec256 v1 = v2___;
+        Lib_IntVector_Intrinsics_vec256 v2 = v4___;
+        Lib_IntVector_Intrinsics_vec256 v3 = v6___;
+        Lib_IntVector_Intrinsics_vec256 v4 = v1___;
+        Lib_IntVector_Intrinsics_vec256 v5 = v3___;
+        Lib_IntVector_Intrinsics_vec256 v6 = v5___;
+        Lib_IntVector_Intrinsics_vec256 v7 = v7___;
+        Lib_IntVector_Intrinsics_vec256 v01 = k[8U];
+        Lib_IntVector_Intrinsics_vec256 v110 = k[9U];
+        Lib_IntVector_Intrinsics_vec256 v21 = k[10U];
+        Lib_IntVector_Intrinsics_vec256 v31 = k[11U];
+        Lib_IntVector_Intrinsics_vec256 v41 = k[12U];
+        Lib_IntVector_Intrinsics_vec256 v51 = k[13U];
+        Lib_IntVector_Intrinsics_vec256 v61 = k[14U];
+        Lib_IntVector_Intrinsics_vec256 v71 = k[15U];
+        Lib_IntVector_Intrinsics_vec256
+            v0_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v01, v110);
+        Lib_IntVector_Intrinsics_vec256
+            v1_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v01, v110);
+        Lib_IntVector_Intrinsics_vec256
+            v2_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v21, v31);
+        Lib_IntVector_Intrinsics_vec256
+            v3_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v21, v31);
+        Lib_IntVector_Intrinsics_vec256
+            v4_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v41, v51);
+        Lib_IntVector_Intrinsics_vec256
+            v5_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v41, v51);
+        Lib_IntVector_Intrinsics_vec256
+            v6_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v61, v71);
+        Lib_IntVector_Intrinsics_vec256
+            v7_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v61, v71);
+        Lib_IntVector_Intrinsics_vec256
+            v0__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v0_0, v2_0);
+        Lib_IntVector_Intrinsics_vec256
+            v1__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v0_0, v2_0);
+        Lib_IntVector_Intrinsics_vec256
+            v2__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v1_0, v3_0);
+        Lib_IntVector_Intrinsics_vec256
+            v3__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v1_0, v3_0);
+        Lib_IntVector_Intrinsics_vec256
+            v4__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v4_0, v6_0);
+        Lib_IntVector_Intrinsics_vec256
+            v5__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v4_0, v6_0);
+        Lib_IntVector_Intrinsics_vec256
+            v6__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v5_0, v7_0);
+        Lib_IntVector_Intrinsics_vec256
+            v7__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v5_0, v7_0);
+        Lib_IntVector_Intrinsics_vec256
+            v0___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v0__0, v4__0);
+        Lib_IntVector_Intrinsics_vec256
+            v1___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v0__0, v4__0);
+        Lib_IntVector_Intrinsics_vec256
+            v2___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v1__0, v5__0);
+        Lib_IntVector_Intrinsics_vec256
+            v3___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v1__0, v5__0);
+        Lib_IntVector_Intrinsics_vec256
+            v4___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v2__0, v6__0);
+        Lib_IntVector_Intrinsics_vec256
+            v5___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v2__0, v6__0);
+        Lib_IntVector_Intrinsics_vec256
+            v6___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v3__0, v7__0);
+        Lib_IntVector_Intrinsics_vec256
+            v7___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v3__0, v7__0);
+        Lib_IntVector_Intrinsics_vec256 v8 = v0___0;
+        Lib_IntVector_Intrinsics_vec256 v9 = v2___0;
+        Lib_IntVector_Intrinsics_vec256 v10 = v4___0;
+        Lib_IntVector_Intrinsics_vec256 v11 = v6___0;
+        Lib_IntVector_Intrinsics_vec256 v12 = v1___0;
+        Lib_IntVector_Intrinsics_vec256 v13 = v3___0;
+        Lib_IntVector_Intrinsics_vec256 v14 = v5___0;
+        Lib_IntVector_Intrinsics_vec256 v15 = v7___0;
+        k[0U] = v0;
+        k[1U] = v8;
+        k[2U] = v1;
+        k[3U] = v9;
+        k[4U] = v2;
+        k[5U] = v10;
+        k[6U] = v3;
+        k[7U] = v11;
+        k[8U] = v4;
+        k[9U] = v12;
+        k[10U] = v5;
+        k[11U] = v13;
+        k[12U] = v6;
+        k[13U] = v14;
+        k[14U] = v7;
+        k[15U] = v15;
+        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
+            Lib_IntVector_Intrinsics_vec256
+                x = Lib_IntVector_Intrinsics_vec256_load_le(plain + i * (uint32_t)32U);
+            Lib_IntVector_Intrinsics_vec256 y = Lib_IntVector_Intrinsics_vec256_xor(x, k[i]);
+            Lib_IntVector_Intrinsics_vec256_store_le(plain + i * (uint32_t)32U, y);
+        }
+        memcpy(uu____2, plain, rem1 * sizeof(plain[0U]));
+    }
+}
+
+void
+Hacl_Chacha20_Vec256_chacha20_decrypt_256(
+    uint32_t len,
+    uint8_t *out,
+    uint8_t *cipher,
+    uint8_t *key,
+    uint8_t *n1,
+    uint32_t ctr)
+{
+    Lib_IntVector_Intrinsics_vec256 ctx[16U];
+    for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
+        ctx[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+    chacha20_init_256(ctx, key, n1, ctr);
+    uint32_t rem1 = len % (uint32_t)512U;
+    uint32_t nb = len / (uint32_t)512U;
+    uint32_t rem2 = len % (uint32_t)512U;
+    for (uint32_t i = (uint32_t)0U; i < nb; i++) {
+        uint8_t *uu____0 = out + i * (uint32_t)512U;
+        uint8_t *uu____1 = cipher + i * (uint32_t)512U;
+        Lib_IntVector_Intrinsics_vec256 k[16U];
+        for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
+            k[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+        chacha20_core_256(k, ctx, i);
+        Lib_IntVector_Intrinsics_vec256 v00 = k[0U];
+        Lib_IntVector_Intrinsics_vec256 v16 = k[1U];
+        Lib_IntVector_Intrinsics_vec256 v20 = k[2U];
+        Lib_IntVector_Intrinsics_vec256 v30 = k[3U];
+        Lib_IntVector_Intrinsics_vec256 v40 = k[4U];
+        Lib_IntVector_Intrinsics_vec256 v50 = k[5U];
+        Lib_IntVector_Intrinsics_vec256 v60 = k[6U];
+        Lib_IntVector_Intrinsics_vec256 v70 = k[7U];
+        Lib_IntVector_Intrinsics_vec256
+            v0_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v00, v16);
+        Lib_IntVector_Intrinsics_vec256
+            v1_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v00, v16);
+        Lib_IntVector_Intrinsics_vec256
+            v2_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v20, v30);
+        Lib_IntVector_Intrinsics_vec256
+            v3_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v20, v30);
+        Lib_IntVector_Intrinsics_vec256
+            v4_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v40, v50);
+        Lib_IntVector_Intrinsics_vec256
+            v5_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v40, v50);
+        Lib_IntVector_Intrinsics_vec256
+            v6_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v60, v70);
+        Lib_IntVector_Intrinsics_vec256
+            v7_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v60, v70);
+        Lib_IntVector_Intrinsics_vec256
+            v0__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v0_, v2_);
+        Lib_IntVector_Intrinsics_vec256
+            v1__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v0_, v2_);
+        Lib_IntVector_Intrinsics_vec256
+            v2__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v1_, v3_);
+        Lib_IntVector_Intrinsics_vec256
+            v3__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v1_, v3_);
+        Lib_IntVector_Intrinsics_vec256
+            v4__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v4_, v6_);
+        Lib_IntVector_Intrinsics_vec256
+            v5__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v4_, v6_);
+        Lib_IntVector_Intrinsics_vec256
+            v6__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v5_, v7_);
+        Lib_IntVector_Intrinsics_vec256
+            v7__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v5_, v7_);
+        Lib_IntVector_Intrinsics_vec256
+            v0___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v0__, v4__);
+        Lib_IntVector_Intrinsics_vec256
+            v1___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v0__, v4__);
+        Lib_IntVector_Intrinsics_vec256
+            v2___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v1__, v5__);
+        Lib_IntVector_Intrinsics_vec256
+            v3___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v1__, v5__);
+        Lib_IntVector_Intrinsics_vec256
+            v4___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v2__, v6__);
+        Lib_IntVector_Intrinsics_vec256
+            v5___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v2__, v6__);
+        Lib_IntVector_Intrinsics_vec256
+            v6___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v3__, v7__);
+        Lib_IntVector_Intrinsics_vec256
+            v7___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v3__, v7__);
+        Lib_IntVector_Intrinsics_vec256 v0 = v0___;
+        Lib_IntVector_Intrinsics_vec256 v1 = v2___;
+        Lib_IntVector_Intrinsics_vec256 v2 = v4___;
+        Lib_IntVector_Intrinsics_vec256 v3 = v6___;
+        Lib_IntVector_Intrinsics_vec256 v4 = v1___;
+        Lib_IntVector_Intrinsics_vec256 v5 = v3___;
+        Lib_IntVector_Intrinsics_vec256 v6 = v5___;
+        Lib_IntVector_Intrinsics_vec256 v7 = v7___;
+        Lib_IntVector_Intrinsics_vec256 v01 = k[8U];
+        Lib_IntVector_Intrinsics_vec256 v110 = k[9U];
+        Lib_IntVector_Intrinsics_vec256 v21 = k[10U];
+        Lib_IntVector_Intrinsics_vec256 v31 = k[11U];
+        Lib_IntVector_Intrinsics_vec256 v41 = k[12U];
+        Lib_IntVector_Intrinsics_vec256 v51 = k[13U];
+        Lib_IntVector_Intrinsics_vec256 v61 = k[14U];
+        Lib_IntVector_Intrinsics_vec256 v71 = k[15U];
+        Lib_IntVector_Intrinsics_vec256
+            v0_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v01, v110);
+        Lib_IntVector_Intrinsics_vec256
+            v1_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v01, v110);
+        Lib_IntVector_Intrinsics_vec256
+            v2_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v21, v31);
+        Lib_IntVector_Intrinsics_vec256
+            v3_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v21, v31);
+        Lib_IntVector_Intrinsics_vec256
+            v4_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v41, v51);
+        Lib_IntVector_Intrinsics_vec256
+            v5_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v41, v51);
+        Lib_IntVector_Intrinsics_vec256
+            v6_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v61, v71);
+        Lib_IntVector_Intrinsics_vec256
+            v7_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v61, v71);
+        Lib_IntVector_Intrinsics_vec256
+            v0__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v0_0, v2_0);
+        Lib_IntVector_Intrinsics_vec256
+            v1__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v0_0, v2_0);
+        Lib_IntVector_Intrinsics_vec256
+            v2__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v1_0, v3_0);
+        Lib_IntVector_Intrinsics_vec256
+            v3__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v1_0, v3_0);
+        Lib_IntVector_Intrinsics_vec256
+            v4__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v4_0, v6_0);
+        Lib_IntVector_Intrinsics_vec256
+            v5__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v4_0, v6_0);
+        Lib_IntVector_Intrinsics_vec256
+            v6__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v5_0, v7_0);
+        Lib_IntVector_Intrinsics_vec256
+            v7__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v5_0, v7_0);
+        Lib_IntVector_Intrinsics_vec256
+            v0___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v0__0, v4__0);
+        Lib_IntVector_Intrinsics_vec256
+            v1___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v0__0, v4__0);
+        Lib_IntVector_Intrinsics_vec256
+            v2___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v1__0, v5__0);
+        Lib_IntVector_Intrinsics_vec256
+            v3___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v1__0, v5__0);
+        Lib_IntVector_Intrinsics_vec256
+            v4___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v2__0, v6__0);
+        Lib_IntVector_Intrinsics_vec256
+            v5___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v2__0, v6__0);
+        Lib_IntVector_Intrinsics_vec256
+            v6___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v3__0, v7__0);
+        Lib_IntVector_Intrinsics_vec256
+            v7___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v3__0, v7__0);
+        Lib_IntVector_Intrinsics_vec256 v8 = v0___0;
+        Lib_IntVector_Intrinsics_vec256 v9 = v2___0;
+        Lib_IntVector_Intrinsics_vec256 v10 = v4___0;
+        Lib_IntVector_Intrinsics_vec256 v11 = v6___0;
+        Lib_IntVector_Intrinsics_vec256 v12 = v1___0;
+        Lib_IntVector_Intrinsics_vec256 v13 = v3___0;
+        Lib_IntVector_Intrinsics_vec256 v14 = v5___0;
+        Lib_IntVector_Intrinsics_vec256 v15 = v7___0;
+        k[0U] = v0;
+        k[1U] = v8;
+        k[2U] = v1;
+        k[3U] = v9;
+        k[4U] = v2;
+        k[5U] = v10;
+        k[6U] = v3;
+        k[7U] = v11;
+        k[8U] = v4;
+        k[9U] = v12;
+        k[10U] = v5;
+        k[11U] = v13;
+        k[12U] = v6;
+        k[13U] = v14;
+        k[14U] = v7;
+        k[15U] = v15;
+        for (uint32_t i0 = (uint32_t)0U; i0 < (uint32_t)16U; i0++) {
+            Lib_IntVector_Intrinsics_vec256
+                x = Lib_IntVector_Intrinsics_vec256_load_le(uu____1 + i0 * (uint32_t)32U);
+            Lib_IntVector_Intrinsics_vec256 y = Lib_IntVector_Intrinsics_vec256_xor(x, k[i0]);
+            Lib_IntVector_Intrinsics_vec256_store_le(uu____0 + i0 * (uint32_t)32U, y);
+        }
+    }
+    if (rem2 > (uint32_t)0U) {
+        uint8_t *uu____2 = out + nb * (uint32_t)512U;
+        uint8_t *uu____3 = cipher + nb * (uint32_t)512U;
+        uint8_t plain[512U] = { 0U };
+        memcpy(plain, uu____3, rem1 * sizeof(uu____3[0U]));
+        Lib_IntVector_Intrinsics_vec256 k[16U];
+        for (uint32_t _i = 0U; _i < (uint32_t)16U; ++_i)
+            k[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+        chacha20_core_256(k, ctx, nb);
+        Lib_IntVector_Intrinsics_vec256 v00 = k[0U];
+        Lib_IntVector_Intrinsics_vec256 v16 = k[1U];
+        Lib_IntVector_Intrinsics_vec256 v20 = k[2U];
+        Lib_IntVector_Intrinsics_vec256 v30 = k[3U];
+        Lib_IntVector_Intrinsics_vec256 v40 = k[4U];
+        Lib_IntVector_Intrinsics_vec256 v50 = k[5U];
+        Lib_IntVector_Intrinsics_vec256 v60 = k[6U];
+        Lib_IntVector_Intrinsics_vec256 v70 = k[7U];
+        Lib_IntVector_Intrinsics_vec256
+            v0_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v00, v16);
+        Lib_IntVector_Intrinsics_vec256
+            v1_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v00, v16);
+        Lib_IntVector_Intrinsics_vec256
+            v2_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v20, v30);
+        Lib_IntVector_Intrinsics_vec256
+            v3_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v20, v30);
+        Lib_IntVector_Intrinsics_vec256
+            v4_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v40, v50);
+        Lib_IntVector_Intrinsics_vec256
+            v5_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v40, v50);
+        Lib_IntVector_Intrinsics_vec256
+            v6_ = Lib_IntVector_Intrinsics_vec256_interleave_low32(v60, v70);
+        Lib_IntVector_Intrinsics_vec256
+            v7_ = Lib_IntVector_Intrinsics_vec256_interleave_high32(v60, v70);
+        Lib_IntVector_Intrinsics_vec256
+            v0__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v0_, v2_);
+        Lib_IntVector_Intrinsics_vec256
+            v1__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v0_, v2_);
+        Lib_IntVector_Intrinsics_vec256
+            v2__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v1_, v3_);
+        Lib_IntVector_Intrinsics_vec256
+            v3__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v1_, v3_);
+        Lib_IntVector_Intrinsics_vec256
+            v4__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v4_, v6_);
+        Lib_IntVector_Intrinsics_vec256
+            v5__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v4_, v6_);
+        Lib_IntVector_Intrinsics_vec256
+            v6__ = Lib_IntVector_Intrinsics_vec256_interleave_low64(v5_, v7_);
+        Lib_IntVector_Intrinsics_vec256
+            v7__ = Lib_IntVector_Intrinsics_vec256_interleave_high64(v5_, v7_);
+        Lib_IntVector_Intrinsics_vec256
+            v0___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v0__, v4__);
+        Lib_IntVector_Intrinsics_vec256
+            v1___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v0__, v4__);
+        Lib_IntVector_Intrinsics_vec256
+            v2___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v1__, v5__);
+        Lib_IntVector_Intrinsics_vec256
+            v3___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v1__, v5__);
+        Lib_IntVector_Intrinsics_vec256
+            v4___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v2__, v6__);
+        Lib_IntVector_Intrinsics_vec256
+            v5___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v2__, v6__);
+        Lib_IntVector_Intrinsics_vec256
+            v6___ = Lib_IntVector_Intrinsics_vec256_interleave_low128(v3__, v7__);
+        Lib_IntVector_Intrinsics_vec256
+            v7___ = Lib_IntVector_Intrinsics_vec256_interleave_high128(v3__, v7__);
+        Lib_IntVector_Intrinsics_vec256 v0 = v0___;
+        Lib_IntVector_Intrinsics_vec256 v1 = v2___;
+        Lib_IntVector_Intrinsics_vec256 v2 = v4___;
+        Lib_IntVector_Intrinsics_vec256 v3 = v6___;
+        Lib_IntVector_Intrinsics_vec256 v4 = v1___;
+        Lib_IntVector_Intrinsics_vec256 v5 = v3___;
+        Lib_IntVector_Intrinsics_vec256 v6 = v5___;
+        Lib_IntVector_Intrinsics_vec256 v7 = v7___;
+        Lib_IntVector_Intrinsics_vec256 v01 = k[8U];
+        Lib_IntVector_Intrinsics_vec256 v110 = k[9U];
+        Lib_IntVector_Intrinsics_vec256 v21 = k[10U];
+        Lib_IntVector_Intrinsics_vec256 v31 = k[11U];
+        Lib_IntVector_Intrinsics_vec256 v41 = k[12U];
+        Lib_IntVector_Intrinsics_vec256 v51 = k[13U];
+        Lib_IntVector_Intrinsics_vec256 v61 = k[14U];
+        Lib_IntVector_Intrinsics_vec256 v71 = k[15U];
+        Lib_IntVector_Intrinsics_vec256
+            v0_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v01, v110);
+        Lib_IntVector_Intrinsics_vec256
+            v1_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v01, v110);
+        Lib_IntVector_Intrinsics_vec256
+            v2_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v21, v31);
+        Lib_IntVector_Intrinsics_vec256
+            v3_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v21, v31);
+        Lib_IntVector_Intrinsics_vec256
+            v4_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v41, v51);
+        Lib_IntVector_Intrinsics_vec256
+            v5_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v41, v51);
+        Lib_IntVector_Intrinsics_vec256
+            v6_0 = Lib_IntVector_Intrinsics_vec256_interleave_low32(v61, v71);
+        Lib_IntVector_Intrinsics_vec256
+            v7_0 = Lib_IntVector_Intrinsics_vec256_interleave_high32(v61, v71);
+        Lib_IntVector_Intrinsics_vec256
+            v0__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v0_0, v2_0);
+        Lib_IntVector_Intrinsics_vec256
+            v1__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v0_0, v2_0);
+        Lib_IntVector_Intrinsics_vec256
+            v2__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v1_0, v3_0);
+        Lib_IntVector_Intrinsics_vec256
+            v3__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v1_0, v3_0);
+        Lib_IntVector_Intrinsics_vec256
+            v4__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v4_0, v6_0);
+        Lib_IntVector_Intrinsics_vec256
+            v5__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v4_0, v6_0);
+        Lib_IntVector_Intrinsics_vec256
+            v6__0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(v5_0, v7_0);
+        Lib_IntVector_Intrinsics_vec256
+            v7__0 = Lib_IntVector_Intrinsics_vec256_interleave_high64(v5_0, v7_0);
+        Lib_IntVector_Intrinsics_vec256
+            v0___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v0__0, v4__0);
+        Lib_IntVector_Intrinsics_vec256
+            v1___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v0__0, v4__0);
+        Lib_IntVector_Intrinsics_vec256
+            v2___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v1__0, v5__0);
+        Lib_IntVector_Intrinsics_vec256
+            v3___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v1__0, v5__0);
+        Lib_IntVector_Intrinsics_vec256
+            v4___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v2__0, v6__0);
+        Lib_IntVector_Intrinsics_vec256
+            v5___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v2__0, v6__0);
+        Lib_IntVector_Intrinsics_vec256
+            v6___0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v3__0, v7__0);
+        Lib_IntVector_Intrinsics_vec256
+            v7___0 = Lib_IntVector_Intrinsics_vec256_interleave_high128(v3__0, v7__0);
+        Lib_IntVector_Intrinsics_vec256 v8 = v0___0;
+        Lib_IntVector_Intrinsics_vec256 v9 = v2___0;
+        Lib_IntVector_Intrinsics_vec256 v10 = v4___0;
+        Lib_IntVector_Intrinsics_vec256 v11 = v6___0;
+        Lib_IntVector_Intrinsics_vec256 v12 = v1___0;
+        Lib_IntVector_Intrinsics_vec256 v13 = v3___0;
+        Lib_IntVector_Intrinsics_vec256 v14 = v5___0;
+        Lib_IntVector_Intrinsics_vec256 v15 = v7___0;
+        k[0U] = v0;
+        k[1U] = v8;
+        k[2U] = v1;
+        k[3U] = v9;
+        k[4U] = v2;
+        k[5U] = v10;
+        k[6U] = v3;
+        k[7U] = v11;
+        k[8U] = v4;
+        k[9U] = v12;
+        k[10U] = v5;
+        k[11U] = v13;
+        k[12U] = v6;
+        k[13U] = v14;
+        k[14U] = v7;
+        k[15U] = v15;
+        for (uint32_t i = (uint32_t)0U; i < (uint32_t)16U; i++) {
+            Lib_IntVector_Intrinsics_vec256
+                x = Lib_IntVector_Intrinsics_vec256_load_le(plain + i * (uint32_t)32U);
+            Lib_IntVector_Intrinsics_vec256 y = Lib_IntVector_Intrinsics_vec256_xor(x, k[i]);
+            Lib_IntVector_Intrinsics_vec256_store_le(plain + i * (uint32_t)32U, y);
+        }
+        memcpy(uu____2, plain, rem1 * sizeof(plain[0U]));
+    }
+}
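
The final partial block in both chacha20_encrypt_256 and chacha20_decrypt_256 above is handled by staging the remaining len % 512 bytes in a zero-padded 512-byte scratch buffer, XOR-ing a full keystream batch over it, and copying back only the valid bytes. A minimal sketch of that tail pattern follows; it is illustrative only (not part of the imported sources), and the hypothetical xor_tail/ks names stand in for the chacha20_core_256 keystream generation plus the vector transpose done inline above.

#include <stdint.h>
#include <string.h>

static void
xor_tail(uint8_t *out, const uint8_t *in, uint32_t rem,
         const uint8_t ks[512]) /* keystream bytes for the final batch */
{
    uint8_t block[512] = { 0U };
    memcpy(block, in, rem);    /* stage the partial block, zero-padded */
    for (uint32_t i = 0U; i < (uint32_t)512U; i++)
        block[i] ^= ks[i];     /* XOR the whole padded block */
    memcpy(out, block, rem);   /* emit only the rem valid bytes */
}
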
new file mode 100644
--- /dev/null
+++ b/security/nss/lib/freebl/verified/Hacl_Chacha20_Vec256.h
@@ -0,0 +1,55 @@
+/* MIT License
+ *
+ * Copyright (c) 2016-2020 INRIA, CMU and Microsoft Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "libintvector.h"
+#include "kremlin/internal/types.h"
+#include "kremlin/lowstar_endianness.h"
+#include <string.h>
+#include <stdbool.h>
+
+#ifndef __Hacl_Chacha20_Vec256_H
+#define __Hacl_Chacha20_Vec256_H
+
+#include "Hacl_Chacha20.h"
+#include "Hacl_Kremlib.h"
+
+void
+Hacl_Chacha20_Vec256_chacha20_encrypt_256(
+    uint32_t len,
+    uint8_t *out,
+    uint8_t *text,
+    uint8_t *key,
+    uint8_t *n1,
+    uint32_t ctr);
+
+void
+Hacl_Chacha20_Vec256_chacha20_decrypt_256(
+    uint32_t len,
+    uint8_t *out,
+    uint8_t *cipher,
+    uint8_t *key,
+    uint8_t *n1,
+    uint32_t ctr);
+
+#define __Hacl_Chacha20_Vec256_H_DEFINED
+#endif
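
For reference, a minimal round-trip using the two prototypes declared in this header; this is an illustrative sketch only (not part of the imported sources). It assumes the usual 96-bit IETF ChaCha20 nonce, an initial block counter of 0, and that the caller has already confirmed AVX2 support; the example_roundtrip name and all buffer contents are placeholders.

#include <stdint.h>
#include "Hacl_Chacha20_Vec256.h"

static void
example_roundtrip(void)
{
    uint8_t key[32U] = { 0U };   /* placeholder key */
    uint8_t nonce[12U] = { 0U }; /* placeholder 96-bit nonce */
    uint8_t msg[64U] = { 1U, 2U, 3U }; /* placeholder message, rest zero */
    uint8_t ct[64U];
    uint8_t pt[64U];
    Hacl_Chacha20_Vec256_chacha20_encrypt_256((uint32_t)64U, ct, msg, key, nonce, (uint32_t)0U);
    Hacl_Chacha20_Vec256_chacha20_decrypt_256((uint32_t)64U, pt, ct, key, nonce, (uint32_t)0U);
    /* pt now matches msg */
}
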
--- a/security/nss/lib/freebl/verified/Hacl_Curve25519_51.c
+++ b/security/nss/lib/freebl/verified/Hacl_Curve25519_51.c
@@ -18,18 +18,18 @@
  * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  */
 
 #include "Hacl_Curve25519_51.h"
 
-inline static void
-Hacl_Impl_Curve25519_Field51_fadd(uint64_t *out, uint64_t *f1, uint64_t *f2)
+static inline void
+fadd0(uint64_t *out, uint64_t *f1, uint64_t *f2)
 {
     uint64_t f10 = f1[0U];
     uint64_t f20 = f2[0U];
     uint64_t f11 = f1[1U];
     uint64_t f21 = f2[1U];
     uint64_t f12 = f1[2U];
     uint64_t f22 = f2[2U];
     uint64_t f13 = f1[3U];
@@ -38,18 +38,18 @@ Hacl_Impl_Curve25519_Field51_fadd(uint64
     uint64_t f24 = f2[4U];
     out[0U] = f10 + f20;
     out[1U] = f11 + f21;
     out[2U] = f12 + f22;
     out[3U] = f13 + f23;
     out[4U] = f14 + f24;
 }
 
-inline static void
-Hacl_Impl_Curve25519_Field51_fsub(uint64_t *out, uint64_t *f1, uint64_t *f2)
+static inline void
+fsub0(uint64_t *out, uint64_t *f1, uint64_t *f2)
 {
     uint64_t f10 = f1[0U];
     uint64_t f20 = f2[0U];
     uint64_t f11 = f1[1U];
     uint64_t f21 = f2[1U];
     uint64_t f12 = f1[2U];
     uint64_t f22 = f2[2U];
     uint64_t f13 = f1[3U];
@@ -58,22 +58,18 @@ Hacl_Impl_Curve25519_Field51_fsub(uint64
     uint64_t f24 = f2[4U];
     out[0U] = f10 + (uint64_t)0x3fffffffffff68U - f20;
     out[1U] = f11 + (uint64_t)0x3ffffffffffff8U - f21;
     out[2U] = f12 + (uint64_t)0x3ffffffffffff8U - f22;
     out[3U] = f13 + (uint64_t)0x3ffffffffffff8U - f23;
     out[4U] = f14 + (uint64_t)0x3ffffffffffff8U - f24;
 }
 
-inline static void
-Hacl_Impl_Curve25519_Field51_fmul(
-    uint64_t *out,
-    uint64_t *f1,
-    uint64_t *f2,
-    FStar_UInt128_uint128 *uu____2959)
+static inline void
+fmul0(uint64_t *out, uint64_t *f1, uint64_t *f2)
 {
     uint64_t f10 = f1[0U];
     uint64_t f11 = f1[1U];
     uint64_t f12 = f1[2U];
     uint64_t f13 = f1[3U];
     uint64_t f14 = f1[4U];
     uint64_t f20 = f2[0U];
     uint64_t f21 = f2[1U];
@@ -140,22 +136,18 @@ Hacl_Impl_Curve25519_Field51_fmul(
     uint64_t o4 = tmp41;
     out[0U] = o0;
     out[1U] = o1;
     out[2U] = o2;
     out[3U] = o3;
     out[4U] = o4;
 }
 
-inline static void
-Hacl_Impl_Curve25519_Field51_fmul2(
-    uint64_t *out,
-    uint64_t *f1,
-    uint64_t *f2,
-    FStar_UInt128_uint128 *uu____4281)
+static inline void
+fmul20(uint64_t *out, uint64_t *f1, uint64_t *f2)
 {
     uint64_t f10 = f1[0U];
     uint64_t f11 = f1[1U];
     uint64_t f12 = f1[2U];
     uint64_t f13 = f1[3U];
     uint64_t f14 = f1[4U];
     uint64_t f20 = f2[0U];
     uint64_t f21 = f2[1U];
@@ -305,18 +297,18 @@ Hacl_Impl_Curve25519_Field51_fmul2(
     out[4U] = o14;
     out[5U] = o20;
     out[6U] = o21;
     out[7U] = o22;
     out[8U] = o23;
     out[9U] = o24;
 }
 
-inline static void
-Hacl_Impl_Curve25519_Field51_fmul1(uint64_t *out, uint64_t *f1, uint64_t f2)
+static inline void
+fmul1(uint64_t *out, uint64_t *f1, uint64_t f2)
 {
     uint64_t f10 = f1[0U];
     uint64_t f11 = f1[1U];
     uint64_t f12 = f1[2U];
     uint64_t f13 = f1[3U];
     uint64_t f14 = f1[4U];
     FStar_UInt128_uint128 tmp_w0 = FStar_UInt128_mul_wide(f2, f10);
     FStar_UInt128_uint128 tmp_w1 = FStar_UInt128_mul_wide(f2, f11);
@@ -349,21 +341,18 @@ Hacl_Impl_Curve25519_Field51_fmul1(uint6
     uint64_t o4 = tmp4;
     out[0U] = o0;
     out[1U] = o1;
     out[2U] = o2;
     out[3U] = o3;
     out[4U] = o4;
 }
 
-inline static void
-Hacl_Impl_Curve25519_Field51_fsqr(
-    uint64_t *out,
-    uint64_t *f,
-    FStar_UInt128_uint128 *uu____6941)
+static inline void
+fsqr0(uint64_t *out, uint64_t *f)
 {
     uint64_t f0 = f[0U];
     uint64_t f1 = f[1U];
     uint64_t f2 = f[2U];
     uint64_t f3 = f[3U];
     uint64_t f4 = f[4U];
     uint64_t d0 = (uint64_t)2U * f0;
     uint64_t d1 = (uint64_t)2U * f1;
@@ -427,21 +416,18 @@ Hacl_Impl_Curve25519_Field51_fsqr(
     uint64_t o4 = tmp4;
     out[0U] = o0;
     out[1U] = o1;
     out[2U] = o2;
     out[3U] = o3;
     out[4U] = o4;
 }
 
-inline static void
-Hacl_Impl_Curve25519_Field51_fsqr2(
-    uint64_t *out,
-    uint64_t *f,
-    FStar_UInt128_uint128 *uu____7692)
+static inline void
+fsqr20(uint64_t *out, uint64_t *f)
 {
     uint64_t f10 = f[0U];
     uint64_t f11 = f[1U];
     uint64_t f12 = f[2U];
     uint64_t f13 = f[3U];
     uint64_t f14 = f[4U];
     uint64_t f20 = f[5U];
     uint64_t f21 = f[6U];
@@ -586,17 +572,17 @@ Hacl_Impl_Curve25519_Field51_fsqr2(
     out[5U] = o20;
     out[6U] = o21;
     out[7U] = o22;
     out[8U] = o23;
     out[9U] = o24;
 }
 
 static void
-Hacl_Impl_Curve25519_Field51_store_felem(uint64_t *u64s, uint64_t *f)
+store_felem(uint64_t *u64s, uint64_t *f)
 {
     uint64_t f0 = f[0U];
     uint64_t f1 = f[1U];
     uint64_t f2 = f[2U];
     uint64_t f3 = f[3U];
     uint64_t f4 = f[4U];
     uint64_t l_ = f0 + (uint64_t)0U;
     uint64_t tmp0 = l_ & (uint64_t)0x7ffffffffffffU;
@@ -646,122 +632,111 @@ Hacl_Impl_Curve25519_Field51_store_felem
     uint64_t o2 = o20;
     uint64_t o3 = o30;
     u64s[0U] = o0;
     u64s[1U] = o1;
     u64s[2U] = o2;
     u64s[3U] = o3;
 }
 
-inline static void
-Hacl_Impl_Curve25519_Field51_cswap2(uint64_t bit, uint64_t *p1, uint64_t *p2)
+static inline void
+cswap20(uint64_t bit, uint64_t *p1, uint64_t *p2)
 {
     uint64_t mask = (uint64_t)0U - bit;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)10U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)10U; i++) {
         uint64_t dummy = mask & (p1[i] ^ p2[i]);
         p1[i] = p1[i] ^ dummy;
         p2[i] = p2[i] ^ dummy;
     }
 }
 
-static uint8_t
-    Hacl_Curve25519_51_g25519[32U] =
-        {
-          (uint8_t)9U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U,
-          (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U,
-          (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U,
-          (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U,
-          (uint8_t)0U, (uint8_t)0U, (uint8_t)0U, (uint8_t)0U
-        };
+static uint8_t g25519[32U] = { (uint8_t)9U };
 
 static void
-Hacl_Curve25519_51_point_add_and_double(
-    uint64_t *q,
-    uint64_t *p01_tmp1,
-    FStar_UInt128_uint128 *tmp2)
+point_add_and_double(uint64_t *q, uint64_t *p01_tmp1, FStar_UInt128_uint128 *tmp2)
 {
     uint64_t *nq = p01_tmp1;
     uint64_t *nq_p1 = p01_tmp1 + (uint32_t)10U;
     uint64_t *tmp1 = p01_tmp1 + (uint32_t)20U;
     uint64_t *x1 = q;
     uint64_t *x2 = nq;
     uint64_t *z2 = nq + (uint32_t)5U;
     uint64_t *z3 = nq_p1 + (uint32_t)5U;
     uint64_t *a = tmp1;
     uint64_t *b = tmp1 + (uint32_t)5U;
     uint64_t *ab = tmp1;
     uint64_t *dc = tmp1 + (uint32_t)10U;
-    Hacl_Impl_Curve25519_Field51_fadd(a, x2, z2);
-    Hacl_Impl_Curve25519_Field51_fsub(b, x2, z2);
+    fadd0(a, x2, z2);
+    fsub0(b, x2, z2);
     uint64_t *x3 = nq_p1;
     uint64_t *z31 = nq_p1 + (uint32_t)5U;
     uint64_t *d0 = dc;
     uint64_t *c0 = dc + (uint32_t)5U;
-    Hacl_Impl_Curve25519_Field51_fadd(c0, x3, z31);
-    Hacl_Impl_Curve25519_Field51_fsub(d0, x3, z31);
-    Hacl_Impl_Curve25519_Field51_fmul2(dc, dc, ab, tmp2);
-    Hacl_Impl_Curve25519_Field51_fadd(x3, d0, c0);
-    Hacl_Impl_Curve25519_Field51_fsub(z31, d0, c0);
+    fadd0(c0, x3, z31);
+    fsub0(d0, x3, z31);
+    fmul20(dc, dc, ab);
+    fadd0(x3, d0, c0);
+    fsub0(z31, d0, c0);
     uint64_t *a1 = tmp1;
     uint64_t *b1 = tmp1 + (uint32_t)5U;
     uint64_t *d = tmp1 + (uint32_t)10U;
     uint64_t *c = tmp1 + (uint32_t)15U;
     uint64_t *ab1 = tmp1;
     uint64_t *dc1 = tmp1 + (uint32_t)10U;
-    Hacl_Impl_Curve25519_Field51_fsqr2(dc1, ab1, tmp2);
-    Hacl_Impl_Curve25519_Field51_fsqr2(nq_p1, nq_p1, tmp2);
+    fsqr20(dc1, ab1);
+    fsqr20(nq_p1, nq_p1);
     a1[0U] = c[0U];
     a1[1U] = c[1U];
     a1[2U] = c[2U];
     a1[3U] = c[3U];
     a1[4U] = c[4U];
-    Hacl_Impl_Curve25519_Field51_fsub(c, d, c);
-    Hacl_Impl_Curve25519_Field51_fmul1(b1, c, (uint64_t)121665U);
-    Hacl_Impl_Curve25519_Field51_fadd(b1, b1, d);
-    Hacl_Impl_Curve25519_Field51_fmul2(nq, dc1, ab1, tmp2);
-    Hacl_Impl_Curve25519_Field51_fmul(z3, z3, x1, tmp2);
+    fsub0(c, d, c);
+    fmul1(b1, c, (uint64_t)121665U);
+    fadd0(b1, b1, d);
+    fmul20(nq, dc1, ab1);
+    fmul0(z3, z3, x1);
 }
 
 static void
-Hacl_Curve25519_51_point_double(uint64_t *nq, uint64_t *tmp1, FStar_UInt128_uint128 *tmp2)
+point_double(uint64_t *nq, uint64_t *tmp1, FStar_UInt128_uint128 *tmp2)
 {
     uint64_t *x2 = nq;
     uint64_t *z2 = nq + (uint32_t)5U;
     uint64_t *a = tmp1;
     uint64_t *b = tmp1 + (uint32_t)5U;
     uint64_t *d = tmp1 + (uint32_t)10U;
     uint64_t *c = tmp1 + (uint32_t)15U;
     uint64_t *ab = tmp1;
     uint64_t *dc = tmp1 + (uint32_t)10U;
-    Hacl_Impl_Curve25519_Field51_fadd(a, x2, z2);
-    Hacl_Impl_Curve25519_Field51_fsub(b, x2, z2);
-    Hacl_Impl_Curve25519_Field51_fsqr2(dc, ab, tmp2);
+    fadd0(a, x2, z2);
+    fsub0(b, x2, z2);
+    fsqr20(dc, ab);
     a[0U] = c[0U];
     a[1U] = c[1U];
     a[2U] = c[2U];
     a[3U] = c[3U];
     a[4U] = c[4U];
-    Hacl_Impl_Curve25519_Field51_fsub(c, d, c);
-    Hacl_Impl_Curve25519_Field51_fmul1(b, c, (uint64_t)121665U);
-    Hacl_Impl_Curve25519_Field51_fadd(b, b, d);
-    Hacl_Impl_Curve25519_Field51_fmul2(nq, dc, ab, tmp2);
+    fsub0(c, d, c);
+    fmul1(b, c, (uint64_t)121665U);
+    fadd0(b, b, d);
+    fmul20(nq, dc, ab);
 }
 
 static void
-Hacl_Curve25519_51_montgomery_ladder(uint64_t *out, uint8_t *key, uint64_t *init1)
+montgomery_ladder(uint64_t *out, uint8_t *key, uint64_t *init1)
 {
     FStar_UInt128_uint128 tmp2[10U];
     for (uint32_t _i = 0U; _i < (uint32_t)10U; ++_i)
         tmp2[_i] = FStar_UInt128_uint64_to_uint128((uint64_t)0U);
     uint64_t p01_tmp1_swap[41U] = { 0U };
     uint64_t *p0 = p01_tmp1_swap;
     uint64_t *p01 = p01_tmp1_swap;
     uint64_t *p03 = p01;
     uint64_t *p11 = p01 + (uint32_t)10U;
-    memcpy(p11, init1, (uint32_t)10U * sizeof init1[0U]);
+    memcpy(p11, init1, (uint32_t)10U * sizeof(init1[0U]));
     uint64_t *x0 = p03;
     uint64_t *z0 = p03 + (uint32_t)5U;
     x0[0U] = (uint64_t)1U;
     x0[1U] = (uint64_t)0U;
     x0[2U] = (uint64_t)0U;
     x0[3U] = (uint64_t)0U;
     x0[4U] = (uint64_t)0U;
     z0[0U] = (uint64_t)0U;
@@ -769,114 +744,110 @@ Hacl_Curve25519_51_montgomery_ladder(uin
     z0[2U] = (uint64_t)0U;
     z0[3U] = (uint64_t)0U;
     z0[4U] = (uint64_t)0U;
     uint64_t *p01_tmp1 = p01_tmp1_swap;
     uint64_t *p01_tmp11 = p01_tmp1_swap;
     uint64_t *nq1 = p01_tmp1_swap;
     uint64_t *nq_p11 = p01_tmp1_swap + (uint32_t)10U;
     uint64_t *swap1 = p01_tmp1_swap + (uint32_t)40U;
-    Hacl_Impl_Curve25519_Field51_cswap2((uint64_t)1U, nq1, nq_p11);
-    Hacl_Curve25519_51_point_add_and_double(init1, p01_tmp11, tmp2);
+    cswap20((uint64_t)1U, nq1, nq_p11);
+    point_add_and_double(init1, p01_tmp11, tmp2);
     swap1[0U] = (uint64_t)1U;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)251U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)251U; i++) {
         uint64_t *p01_tmp12 = p01_tmp1_swap;
         uint64_t *swap2 = p01_tmp1_swap + (uint32_t)40U;
         uint64_t *nq2 = p01_tmp12;
         uint64_t *nq_p12 = p01_tmp12 + (uint32_t)10U;
         uint64_t
             bit =
                 (uint64_t)(key[((uint32_t)253U - i) / (uint32_t)8U] >> ((uint32_t)253U - i) % (uint32_t)8U & (uint8_t)1U);
         uint64_t sw = swap2[0U] ^ bit;
-        Hacl_Impl_Curve25519_Field51_cswap2(sw, nq2, nq_p12);
-        Hacl_Curve25519_51_point_add_and_double(init1, p01_tmp12, tmp2);
+        cswap20(sw, nq2, nq_p12);
+        point_add_and_double(init1, p01_tmp12, tmp2);
         swap2[0U] = bit;
     }
     uint64_t sw = swap1[0U];
-    Hacl_Impl_Curve25519_Field51_cswap2(sw, nq1, nq_p11);
+    cswap20(sw, nq1, nq_p11);
     uint64_t *nq10 = p01_tmp1;
     uint64_t *tmp1 = p01_tmp1 + (uint32_t)20U;
-    Hacl_Curve25519_51_point_double(nq10, tmp1, tmp2);
-    Hacl_Curve25519_51_point_double(nq10, tmp1, tmp2);
-    Hacl_Curve25519_51_point_double(nq10, tmp1, tmp2);
-    memcpy(out, p0, (uint32_t)10U * sizeof p0[0U]);
+    point_double(nq10, tmp1, tmp2);
+    point_double(nq10, tmp1, tmp2);
+    point_double(nq10, tmp1, tmp2);
+    memcpy(out, p0, (uint32_t)10U * sizeof(p0[0U]));
 }
 
 static void
-Hacl_Curve25519_51_fsquare_times(
-    uint64_t *o,
-    uint64_t *inp,
-    FStar_UInt128_uint128 *tmp,
-    uint32_t n1)
+fsquare_times(uint64_t *o, uint64_t *inp, FStar_UInt128_uint128 *tmp, uint32_t n1)
 {
-    Hacl_Impl_Curve25519_Field51_fsqr(o, inp, tmp);
-    for (uint32_t i = (uint32_t)0U; i < n1 - (uint32_t)1U; i = i + (uint32_t)1U) {
-        Hacl_Impl_Curve25519_Field51_fsqr(o, o, tmp);
+    fsqr0(o, inp);
+    for (uint32_t i = (uint32_t)0U; i < n1 - (uint32_t)1U; i++) {
+        fsqr0(o, o);
     }
 }
 
 static void
-Hacl_Curve25519_51_finv(uint64_t *o, uint64_t *i, FStar_UInt128_uint128 *tmp)
+finv(uint64_t *o, uint64_t *i, FStar_UInt128_uint128 *tmp)
 {
     uint64_t t1[20U] = { 0U };
     uint64_t *a = t1;
     uint64_t *b = t1 + (uint32_t)5U;
     uint64_t *c = t1 + (uint32_t)10U;
     uint64_t *t00 = t1 + (uint32_t)15U;
     FStar_UInt128_uint128 *tmp1 = tmp;
-    Hacl_Curve25519_51_fsquare_times(a, i, tmp1, (uint32_t)1U);
-    Hacl_Curve25519_51_fsquare_times(t00, a, tmp1, (uint32_t)2U);
-    Hacl_Impl_Curve25519_Field51_fmul(b, t00, i, tmp);
-    Hacl_Impl_Curve25519_Field51_fmul(a, b, a, tmp);
-    Hacl_Curve25519_51_fsquare_times(t00, a, tmp1, (uint32_t)1U);
-    Hacl_Impl_Curve25519_Field51_fmul(b, t00, b, tmp);
-    Hacl_Curve25519_51_fsquare_times(t00, b, tmp1, (uint32_t)5U);
-    Hacl_Impl_Curve25519_Field51_fmul(b, t00, b, tmp);
-    Hacl_Curve25519_51_fsquare_times(t00, b, tmp1, (uint32_t)10U);
-    Hacl_Impl_Curve25519_Field51_fmul(c, t00, b, tmp);
-    Hacl_Curve25519_51_fsquare_times(t00, c, tmp1, (uint32_t)20U);
-    Hacl_Impl_Curve25519_Field51_fmul(t00, t00, c, tmp);
-    Hacl_Curve25519_51_fsquare_times(t00, t00, tmp1, (uint32_t)10U);
-    Hacl_Impl_Curve25519_Field51_fmul(b, t00, b, tmp);
-    Hacl_Curve25519_51_fsquare_times(t00, b, tmp1, (uint32_t)50U);
-    Hacl_Impl_Curve25519_Field51_fmul(c, t00, b, tmp);
-    Hacl_Curve25519_51_fsquare_times(t00, c, tmp1, (uint32_t)100U);
-    Hacl_Impl_Curve25519_Field51_fmul(t00, t00, c, tmp);
-    Hacl_Curve25519_51_fsquare_times(t00, t00, tmp1, (uint32_t)50U);
-    Hacl_Impl_Curve25519_Field51_fmul(t00, t00, b, tmp);
-    Hacl_Curve25519_51_fsquare_times(t00, t00, tmp1, (uint32_t)5U);
+    fsquare_times(a, i, tmp1, (uint32_t)1U);
+    fsquare_times(t00, a, tmp1, (uint32_t)2U);
+    fmul0(b, t00, i);
+    fmul0(a, b, a);
+    fsquare_times(t00, a, tmp1, (uint32_t)1U);
+    fmul0(b, t00, b);
+    fsquare_times(t00, b, tmp1, (uint32_t)5U);
+    fmul0(b, t00, b);
+    fsquare_times(t00, b, tmp1, (uint32_t)10U);
+    fmul0(c, t00, b);
+    fsquare_times(t00, c, tmp1, (uint32_t)20U);
+    fmul0(t00, t00, c);
+    fsquare_times(t00, t00, tmp1, (uint32_t)10U);
+    fmul0(b, t00, b);
+    fsquare_times(t00, b, tmp1, (uint32_t)50U);
+    fmul0(c, t00, b);
+    fsquare_times(t00, c, tmp1, (uint32_t)100U);
+    fmul0(t00, t00, c);
+    fsquare_times(t00, t00, tmp1, (uint32_t)50U);
+    fmul0(t00, t00, b);
+    fsquare_times(t00, t00, tmp1, (uint32_t)5U);
     uint64_t *a0 = t1;
     uint64_t *t0 = t1 + (uint32_t)15U;
-    Hacl_Impl_Curve25519_Field51_fmul(o, t0, a0, tmp);
+    fmul0(o, t0, a0);
 }
 
 static void
-Hacl_Curve25519_51_encode_point(uint8_t *o, uint64_t *i)
+encode_point(uint8_t *o, uint64_t *i)
 {
     uint64_t *x = i;
     uint64_t *z = i + (uint32_t)5U;
     uint64_t tmp[5U] = { 0U };
     uint64_t u64s[4U] = { 0U };
     FStar_UInt128_uint128 tmp_w[10U];
     for (uint32_t _i = 0U; _i < (uint32_t)10U; ++_i)
         tmp_w[_i] = FStar_UInt128_uint64_to_uint128((uint64_t)0U);
-    Hacl_Curve25519_51_finv(tmp, z, tmp_w);
-    Hacl_Impl_Curve25519_Field51_fmul(tmp, tmp, x, tmp_w);
-    Hacl_Impl_Curve25519_Field51_store_felem(u64s, tmp);
-    for (uint32_t i0 = (uint32_t)0U; i0 < (uint32_t)4U; i0 = i0 + (uint32_t)1U) {
+    finv(tmp, z, tmp_w);
+    fmul0(tmp, tmp, x);
+    store_felem(u64s, tmp);
+    for (uint32_t i0 = (uint32_t)0U; i0 < (uint32_t)4U; i0++) {
         store64_le(o + i0 * (uint32_t)8U, u64s[i0]);
     }
 }
 
 void
 Hacl_Curve25519_51_scalarmult(uint8_t *out, uint8_t *priv, uint8_t *pub)
 {
     uint64_t init1[10U] = { 0U };
     uint64_t tmp[4U] = { 0U };
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)4U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)4U; i++) {
         uint64_t *os = tmp;
         uint8_t *bj = pub + i * (uint32_t)8U;
         uint64_t u = load64_le(bj);
         uint64_t r = u;
         uint64_t x = r;
         os[i] = x;
     }
     uint64_t tmp3 = tmp[3U];
@@ -896,38 +867,38 @@ Hacl_Curve25519_51_scalarmult(uint8_t *o
     uint64_t f2h = tmp[2U] >> (uint32_t)25U;
     uint64_t f3l = (tmp[3U] & (uint64_t)0xfffU) << (uint32_t)39U;
     uint64_t f3h = tmp[3U] >> (uint32_t)12U;
     x[0U] = f0l;
     x[1U] = f0h | f1l;
     x[2U] = f1h | f2l;
     x[3U] = f2h | f3l;
     x[4U] = f3h;
-    Hacl_Curve25519_51_montgomery_ladder(init1, priv, init1);
-    Hacl_Curve25519_51_encode_point(out, init1);
+    montgomery_ladder(init1, priv, init1);
+    encode_point(out, init1);
 }
 
 void
 Hacl_Curve25519_51_secret_to_public(uint8_t *pub, uint8_t *priv)
 {
     uint8_t basepoint[32U] = { 0U };
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)32U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)32U; i++) {
         uint8_t *os = basepoint;
-        uint8_t x = Hacl_Curve25519_51_g25519[i];
+        uint8_t x = g25519[i];
         os[i] = x;
     }
     Hacl_Curve25519_51_scalarmult(pub, priv, basepoint);
 }
 
 bool
 Hacl_Curve25519_51_ecdh(uint8_t *out, uint8_t *priv, uint8_t *pub)
 {
     uint8_t zeros1[32U] = { 0U };
     Hacl_Curve25519_51_scalarmult(out, priv, pub);
     uint8_t res = (uint8_t)255U;
-    for (uint32_t i = (uint32_t)0U; i < (uint32_t)32U; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < (uint32_t)32U; i++) {
         uint8_t uu____0 = FStar_UInt8_eq_mask(out[i], zeros1[i]);
         res = uu____0 & res;
     }
     uint8_t z = res;
     bool r = z == (uint8_t)255U;
     return !r;
 }
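
A minimal X25519 key-agreement sketch using the public entry points defined in this file (scalarmult, secret_to_public, ecdh); illustrative only, not part of the imported sources. The private keys are placeholders (real keys must be 32 random bytes), and the example_x25519 name is hypothetical. As the code above shows, Hacl_Curve25519_51_ecdh returns false when the computed shared secret is all zeroes.

#include <stdbool.h>
#include <stdint.h>
#include "Hacl_Curve25519_51.h"

static bool
example_x25519(void)
{
    uint8_t a_priv[32U] = { 1U }; /* placeholder secret key A */
    uint8_t b_priv[32U] = { 2U }; /* placeholder secret key B */
    uint8_t a_pub[32U], b_pub[32U];
    uint8_t a_shared[32U], b_shared[32U];
    Hacl_Curve25519_51_secret_to_public(a_pub, a_priv);
    Hacl_Curve25519_51_secret_to_public(b_pub, b_priv);
    bool ok1 = Hacl_Curve25519_51_ecdh(a_shared, a_priv, b_pub);
    bool ok2 = Hacl_Curve25519_51_ecdh(b_shared, b_priv, a_pub);
    /* when both calls succeed, a_shared and b_shared are equal */
    return ok1 && ok2;
}
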
--- a/security/nss/lib/freebl/verified/Hacl_Kremlib.h
+++ b/security/nss/lib/freebl/verified/Hacl_Kremlib.h
@@ -24,28 +24,28 @@
 #include "kremlin/internal/types.h"
 #include "kremlin/lowstar_endianness.h"
 #include <string.h>
 #include <stdbool.h>
 
 #ifndef __Hacl_Kremlib_H
 #define __Hacl_Kremlib_H
 
-inline static uint8_t FStar_UInt8_eq_mask(uint8_t a, uint8_t b);
+static inline uint8_t FStar_UInt8_eq_mask(uint8_t a, uint8_t b);
 
-inline static uint64_t FStar_UInt64_eq_mask(uint64_t a, uint64_t b);
+static inline uint64_t FStar_UInt64_eq_mask(uint64_t a, uint64_t b);
 
-inline static uint64_t FStar_UInt64_gte_mask(uint64_t a, uint64_t b);
+static inline uint64_t FStar_UInt64_gte_mask(uint64_t a, uint64_t b);
 
-inline static FStar_UInt128_uint128
+static inline FStar_UInt128_uint128
 FStar_UInt128_add(FStar_UInt128_uint128 a, FStar_UInt128_uint128 b);
 
-inline static FStar_UInt128_uint128
+static inline FStar_UInt128_uint128
 FStar_UInt128_shift_right(FStar_UInt128_uint128 a, uint32_t s);
 
-inline static FStar_UInt128_uint128 FStar_UInt128_uint64_to_uint128(uint64_t a);
+static inline FStar_UInt128_uint128 FStar_UInt128_uint64_to_uint128(uint64_t a);
 
-inline static uint64_t FStar_UInt128_uint128_to_uint64(FStar_UInt128_uint128 a);
+static inline uint64_t FStar_UInt128_uint128_to_uint64(FStar_UInt128_uint128 a);
 
-inline static FStar_UInt128_uint128 FStar_UInt128_mul_wide(uint64_t x, uint64_t y);
+static inline FStar_UInt128_uint128 FStar_UInt128_mul_wide(uint64_t x, uint64_t y);
 
 #define __Hacl_Kremlib_H_DEFINED
 #endif
--- a/security/nss/lib/freebl/verified/Hacl_Poly1305_128.c
+++ b/security/nss/lib/freebl/verified/Hacl_Poly1305_128.c
@@ -234,71 +234,57 @@ Hacl_Impl_Poly1305_Field32xN_128_fmul_r2
             Lib_IntVector_Intrinsics_vec128_add64(a44,
                                                   Lib_IntVector_Intrinsics_vec128_mul64(r201, a4));
     Lib_IntVector_Intrinsics_vec128 t0 = a05;
     Lib_IntVector_Intrinsics_vec128 t1 = a15;
     Lib_IntVector_Intrinsics_vec128 t2 = a25;
     Lib_IntVector_Intrinsics_vec128 t3 = a35;
     Lib_IntVector_Intrinsics_vec128 t4 = a45;
     Lib_IntVector_Intrinsics_vec128
-        l0 = Lib_IntVector_Intrinsics_vec128_add64(t0, Lib_IntVector_Intrinsics_vec128_zero);
+        mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
     Lib_IntVector_Intrinsics_vec128
-        tmp00 =
-            Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c00 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t1, c00);
+        z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t0, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        tmp10 =
-            Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c10 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t2, c10);
+        z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t0, mask261);
+    Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+    Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t1, z0);
+    Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
     Lib_IntVector_Intrinsics_vec128
-        tmp20 =
-            Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        c20 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t3, c20);
+        z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        tmp30 =
-            Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+    Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+    Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+    Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+    Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
     Lib_IntVector_Intrinsics_vec128
-        c30 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l4 = Lib_IntVector_Intrinsics_vec128_add64(t4, c30);
-    Lib_IntVector_Intrinsics_vec128
-        tmp40 =
-            Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c40 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
+        z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        l5 =
-            Lib_IntVector_Intrinsics_vec128_add64(tmp00,
-                                                  Lib_IntVector_Intrinsics_vec128_smul64(c40, (uint64_t)5U));
-    Lib_IntVector_Intrinsics_vec128
-        tmp01 =
-            Lib_IntVector_Intrinsics_vec128_and(l5,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+    Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+    Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+    Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
     Lib_IntVector_Intrinsics_vec128
-        c50 = Lib_IntVector_Intrinsics_vec128_shift_right64(l5, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp10, c50);
-    Lib_IntVector_Intrinsics_vec128 o00 = tmp01;
-    Lib_IntVector_Intrinsics_vec128 o10 = tmp11;
-    Lib_IntVector_Intrinsics_vec128 o20 = tmp20;
-    Lib_IntVector_Intrinsics_vec128 o30 = tmp30;
-    Lib_IntVector_Intrinsics_vec128 o40 = tmp40;
+        z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+    Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+    Lib_IntVector_Intrinsics_vec128 o0 = x02;
+    Lib_IntVector_Intrinsics_vec128 o10 = x12;
+    Lib_IntVector_Intrinsics_vec128 o20 = x21;
+    Lib_IntVector_Intrinsics_vec128 o30 = x32;
+    Lib_IntVector_Intrinsics_vec128 o40 = x42;
     Lib_IntVector_Intrinsics_vec128
         o01 =
-            Lib_IntVector_Intrinsics_vec128_add64(o00,
-                                                  Lib_IntVector_Intrinsics_vec128_interleave_high64(o00, o00));
+            Lib_IntVector_Intrinsics_vec128_add64(o0,
+                                                  Lib_IntVector_Intrinsics_vec128_interleave_high64(o0, o0));
     Lib_IntVector_Intrinsics_vec128
         o11 =
             Lib_IntVector_Intrinsics_vec128_add64(o10,
                                                   Lib_IntVector_Intrinsics_vec128_interleave_high64(o10, o10));
     Lib_IntVector_Intrinsics_vec128
         o21 =
             Lib_IntVector_Intrinsics_vec128_add64(o20,
                                                   Lib_IntVector_Intrinsics_vec128_interleave_high64(o20, o20));
@@ -313,60 +299,53 @@ Hacl_Impl_Poly1305_Field32xN_128_fmul_r2
     Lib_IntVector_Intrinsics_vec128
         l = Lib_IntVector_Intrinsics_vec128_add64(o01, Lib_IntVector_Intrinsics_vec128_zero);
     Lib_IntVector_Intrinsics_vec128
         tmp0 =
             Lib_IntVector_Intrinsics_vec128_and(l,
                                                 Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
     Lib_IntVector_Intrinsics_vec128
         c0 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l6 = Lib_IntVector_Intrinsics_vec128_add64(o11, c0);
+    Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(o11, c0);
     Lib_IntVector_Intrinsics_vec128
         tmp1 =
-            Lib_IntVector_Intrinsics_vec128_and(l6,
+            Lib_IntVector_Intrinsics_vec128_and(l0,
                                                 Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
     Lib_IntVector_Intrinsics_vec128
-        c1 = Lib_IntVector_Intrinsics_vec128_shift_right64(l6, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l7 = Lib_IntVector_Intrinsics_vec128_add64(o21, c1);
+        c1 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(o21, c1);
     Lib_IntVector_Intrinsics_vec128
         tmp2 =
-            Lib_IntVector_Intrinsics_vec128_and(l7,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c2 = Lib_IntVector_Intrinsics_vec128_shift_right64(l7, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l8 = Lib_IntVector_Intrinsics_vec128_add64(o31, c2);
-    Lib_IntVector_Intrinsics_vec128
-        tmp3 =
-            Lib_IntVector_Intrinsics_vec128_and(l8,
+            Lib_IntVector_Intrinsics_vec128_and(l1,
                                                 Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
     Lib_IntVector_Intrinsics_vec128
-        c3 = Lib_IntVector_Intrinsics_vec128_shift_right64(l8, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l9 = Lib_IntVector_Intrinsics_vec128_add64(o41, c3);
+        c2 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(o31, c2);
     Lib_IntVector_Intrinsics_vec128
-        tmp4 =
-            Lib_IntVector_Intrinsics_vec128_and(l9,
+        tmp3 =
+            Lib_IntVector_Intrinsics_vec128_and(l2,
                                                 Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
     Lib_IntVector_Intrinsics_vec128
-        c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l9, (uint32_t)26U);
+        c3 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(o41, c3);
     Lib_IntVector_Intrinsics_vec128
-        l10 =
+        tmp4 =
+            Lib_IntVector_Intrinsics_vec128_and(l3,
+                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec128
+        c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128
+        o00 =
             Lib_IntVector_Intrinsics_vec128_add64(tmp0,
                                                   Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-    Lib_IntVector_Intrinsics_vec128
-        tmp0_ =
-            Lib_IntVector_Intrinsics_vec128_and(l10,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l10, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 o0 = tmp0_;
-    Lib_IntVector_Intrinsics_vec128 o1 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
+    Lib_IntVector_Intrinsics_vec128 o1 = tmp1;
     Lib_IntVector_Intrinsics_vec128 o2 = tmp2;
     Lib_IntVector_Intrinsics_vec128 o3 = tmp3;
     Lib_IntVector_Intrinsics_vec128 o4 = tmp4;
-    out[0U] = o0;
+    out[0U] = o00;
     out[1U] = o1;
     out[2U] = o2;
     out[3U] = o3;
     out[4U] = o4;
 }
 
 uint32_t Hacl_Poly1305_128_blocklen = (uint32_t)16U;
 
@@ -530,67 +509,53 @@ Hacl_Poly1305_128_poly1305_init(Lib_IntV
             Lib_IntVector_Intrinsics_vec128_add64(a43,
                                                   Lib_IntVector_Intrinsics_vec128_mul64(r0, f14));
     Lib_IntVector_Intrinsics_vec128 t0 = a04;
     Lib_IntVector_Intrinsics_vec128 t1 = a14;
     Lib_IntVector_Intrinsics_vec128 t2 = a24;
     Lib_IntVector_Intrinsics_vec128 t3 = a34;
     Lib_IntVector_Intrinsics_vec128 t4 = a44;
     Lib_IntVector_Intrinsics_vec128
-        l = Lib_IntVector_Intrinsics_vec128_add64(t0, Lib_IntVector_Intrinsics_vec128_zero);
+        mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
     Lib_IntVector_Intrinsics_vec128
-        tmp0 =
-            Lib_IntVector_Intrinsics_vec128_and(l,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c0 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(t1, c0);
+        z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t0, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        tmp1 =
-            Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c1 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t2, c1);
+        z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t0, mask261);
+    Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+    Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t1, z0);
+    Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
     Lib_IntVector_Intrinsics_vec128
-        tmp2 =
-            Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        c2 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t3, c2);
+        z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        tmp3 =
-            Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+    Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+    Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+    Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+    Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
     Lib_IntVector_Intrinsics_vec128
-        c3 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t4, c3);
-    Lib_IntVector_Intrinsics_vec128
-        tmp4 =
-            Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+        z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        l4 =
-            Lib_IntVector_Intrinsics_vec128_add64(tmp0,
-                                                  Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-    Lib_IntVector_Intrinsics_vec128
-        tmp01 =
-            Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+    Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+    Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+    Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
     Lib_IntVector_Intrinsics_vec128
-        c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-    Lib_IntVector_Intrinsics_vec128 o0 = tmp01;
-    Lib_IntVector_Intrinsics_vec128 o1 = tmp11;
-    Lib_IntVector_Intrinsics_vec128 o2 = tmp2;
-    Lib_IntVector_Intrinsics_vec128 o3 = tmp3;
-    Lib_IntVector_Intrinsics_vec128 o4 = tmp4;
+        z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+    Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+    Lib_IntVector_Intrinsics_vec128 o0 = x02;
+    Lib_IntVector_Intrinsics_vec128 o1 = x12;
+    Lib_IntVector_Intrinsics_vec128 o2 = x21;
+    Lib_IntVector_Intrinsics_vec128 o3 = x32;
+    Lib_IntVector_Intrinsics_vec128 o4 = x42;
     rn[0U] = o0;
     rn[1U] = o1;
     rn[2U] = o2;
     rn[3U] = o3;
     rn[4U] = o4;
     Lib_IntVector_Intrinsics_vec128 f201 = rn[0U];
     Lib_IntVector_Intrinsics_vec128 f21 = rn[1U];
     Lib_IntVector_Intrinsics_vec128 f22 = rn[2U];
@@ -766,67 +731,53 @@ Hacl_Poly1305_128_poly1305_update1(Lib_I
             Lib_IntVector_Intrinsics_vec128_add64(a45,
                                                   Lib_IntVector_Intrinsics_vec128_mul64(r0, a41));
     Lib_IntVector_Intrinsics_vec128 t0 = a06;
     Lib_IntVector_Intrinsics_vec128 t1 = a16;
     Lib_IntVector_Intrinsics_vec128 t2 = a26;
     Lib_IntVector_Intrinsics_vec128 t3 = a36;
     Lib_IntVector_Intrinsics_vec128 t4 = a46;
     Lib_IntVector_Intrinsics_vec128
-        l = Lib_IntVector_Intrinsics_vec128_add64(t0, Lib_IntVector_Intrinsics_vec128_zero);
+        mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
     Lib_IntVector_Intrinsics_vec128
-        tmp0 =
-            Lib_IntVector_Intrinsics_vec128_and(l,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c0 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(t1, c0);
+        z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t0, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        tmp1 =
-            Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c1 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t2, c1);
+        z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t0, mask261);
+    Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+    Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t1, z0);
+    Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
     Lib_IntVector_Intrinsics_vec128
-        tmp2 =
-            Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        c2 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t3, c2);
+        z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        tmp3 =
-            Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+    Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+    Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+    Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+    Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
     Lib_IntVector_Intrinsics_vec128
-        c3 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t4, c3);
-    Lib_IntVector_Intrinsics_vec128
-        tmp4 =
-            Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+        z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        l4 =
-            Lib_IntVector_Intrinsics_vec128_add64(tmp0,
-                                                  Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-    Lib_IntVector_Intrinsics_vec128
-        tmp01 =
-            Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+        z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+    Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+    Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+    Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
     Lib_IntVector_Intrinsics_vec128
-        c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-    Lib_IntVector_Intrinsics_vec128 o0 = tmp01;
-    Lib_IntVector_Intrinsics_vec128 o1 = tmp11;
-    Lib_IntVector_Intrinsics_vec128 o2 = tmp2;
-    Lib_IntVector_Intrinsics_vec128 o3 = tmp3;
-    Lib_IntVector_Intrinsics_vec128 o4 = tmp4;
+        z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+    Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+    Lib_IntVector_Intrinsics_vec128 o0 = x02;
+    Lib_IntVector_Intrinsics_vec128 o1 = x12;
+    Lib_IntVector_Intrinsics_vec128 o2 = x21;
+    Lib_IntVector_Intrinsics_vec128 o3 = x32;
+    Lib_IntVector_Intrinsics_vec128 o4 = x42;
     acc[0U] = o0;
     acc[1U] = o1;
     acc[2U] = o2;
     acc[3U] = o3;
     acc[4U] = o4;
 }
 
 void
@@ -842,17 +793,17 @@ Hacl_Poly1305_128_poly1305_update(
     uint8_t *t0 = text;
     if (len0 > (uint32_t)0U) {
         uint32_t bs = (uint32_t)32U;
         uint8_t *text0 = t0;
         Hacl_Impl_Poly1305_Field32xN_128_load_acc2(acc, text0);
         uint32_t len1 = len0 - bs;
         uint8_t *text1 = t0 + bs;
         uint32_t nb = len1 / bs;
-        for (uint32_t i = (uint32_t)0U; i < nb; i = i + (uint32_t)1U) {
+        for (uint32_t i = (uint32_t)0U; i < nb; i++) {
             uint8_t *block = text1 + i * bs;
             Lib_IntVector_Intrinsics_vec128 e[5U];
             for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
                 e[_i] = Lib_IntVector_Intrinsics_vec128_zero;
             Lib_IntVector_Intrinsics_vec128 b1 = Lib_IntVector_Intrinsics_vec128_load_le(block);
             Lib_IntVector_Intrinsics_vec128
                 b2 = Lib_IntVector_Intrinsics_vec128_load_le(block + (uint32_t)16U);
             Lib_IntVector_Intrinsics_vec128 lo = Lib_IntVector_Intrinsics_vec128_interleave_low64(b1, b2);
@@ -997,67 +948,53 @@ Hacl_Poly1305_128_poly1305_update(
                     Lib_IntVector_Intrinsics_vec128_add64(a43,
                                                           Lib_IntVector_Intrinsics_vec128_mul64(r0, f140));
             Lib_IntVector_Intrinsics_vec128 t01 = a04;
             Lib_IntVector_Intrinsics_vec128 t1 = a14;
             Lib_IntVector_Intrinsics_vec128 t2 = a24;
             Lib_IntVector_Intrinsics_vec128 t3 = a34;
             Lib_IntVector_Intrinsics_vec128 t4 = a44;
             Lib_IntVector_Intrinsics_vec128
-                l = Lib_IntVector_Intrinsics_vec128_add64(t01, Lib_IntVector_Intrinsics_vec128_zero);
+                mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
             Lib_IntVector_Intrinsics_vec128
-                tmp0 =
-                    Lib_IntVector_Intrinsics_vec128_and(l,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-            Lib_IntVector_Intrinsics_vec128
-                c0 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-            Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(t1, c0);
+                z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t01, (uint32_t)26U);
             Lib_IntVector_Intrinsics_vec128
-                tmp1 =
-                    Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-            Lib_IntVector_Intrinsics_vec128
-                c1 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-            Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t2, c1);
+                z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t01, mask261);
+            Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+            Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t1, z0);
+            Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
             Lib_IntVector_Intrinsics_vec128
-                tmp2 =
-                    Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+                z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
             Lib_IntVector_Intrinsics_vec128
-                c2 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-            Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t3, c2);
+                z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
             Lib_IntVector_Intrinsics_vec128
-                tmp3 =
-                    Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+                t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+            Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+            Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+            Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+            Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+            Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
             Lib_IntVector_Intrinsics_vec128
-                c3 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-            Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t4, c3);
-            Lib_IntVector_Intrinsics_vec128
-                tmp4 =
-                    Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-            Lib_IntVector_Intrinsics_vec128
-                c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+                z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
             Lib_IntVector_Intrinsics_vec128
-                l4 =
-                    Lib_IntVector_Intrinsics_vec128_add64(tmp0,
-                                                          Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-            Lib_IntVector_Intrinsics_vec128
-                tmp01 =
-                    Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                        Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+                z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+            Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+            Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+            Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
             Lib_IntVector_Intrinsics_vec128
-                c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-            Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-            Lib_IntVector_Intrinsics_vec128 o00 = tmp01;
-            Lib_IntVector_Intrinsics_vec128 o10 = tmp11;
-            Lib_IntVector_Intrinsics_vec128 o20 = tmp2;
-            Lib_IntVector_Intrinsics_vec128 o30 = tmp3;
-            Lib_IntVector_Intrinsics_vec128 o40 = tmp4;
+                z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+            Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+            Lib_IntVector_Intrinsics_vec128 o00 = x02;
+            Lib_IntVector_Intrinsics_vec128 o10 = x12;
+            Lib_IntVector_Intrinsics_vec128 o20 = x21;
+            Lib_IntVector_Intrinsics_vec128 o30 = x32;
+            Lib_IntVector_Intrinsics_vec128 o40 = x42;
             acc[0U] = o00;
             acc[1U] = o10;
             acc[2U] = o20;
             acc[3U] = o30;
             acc[4U] = o40;
             Lib_IntVector_Intrinsics_vec128 f100 = acc[0U];
             Lib_IntVector_Intrinsics_vec128 f11 = acc[1U];
             Lib_IntVector_Intrinsics_vec128 f12 = acc[2U];
@@ -1080,17 +1017,17 @@ Hacl_Poly1305_128_poly1305_update(
             acc[4U] = o4;
         }
         Hacl_Impl_Poly1305_Field32xN_128_fmul_r2_normalize(acc, pre);
     }
     uint32_t len1 = len - len0;
     uint8_t *t1 = text + len0;
     uint32_t nb = len1 / (uint32_t)16U;
     uint32_t rem1 = len1 % (uint32_t)16U;
-    for (uint32_t i = (uint32_t)0U; i < nb; i = i + (uint32_t)1U) {
+    for (uint32_t i = (uint32_t)0U; i < nb; i++) {
         uint8_t *block = t1 + i * (uint32_t)16U;
         Lib_IntVector_Intrinsics_vec128 e[5U];
         for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
             e[_i] = Lib_IntVector_Intrinsics_vec128_zero;
         uint64_t u0 = load64_le(block);
         uint64_t lo = u0;
         uint64_t u = load64_le(block + (uint32_t)8U);
         uint64_t hi = u;
@@ -1245,80 +1182,66 @@ Hacl_Poly1305_128_poly1305_update(
                 Lib_IntVector_Intrinsics_vec128_add64(a45,
                                                       Lib_IntVector_Intrinsics_vec128_mul64(r0, a41));
         Lib_IntVector_Intrinsics_vec128 t01 = a06;
         Lib_IntVector_Intrinsics_vec128 t11 = a16;
         Lib_IntVector_Intrinsics_vec128 t2 = a26;
         Lib_IntVector_Intrinsics_vec128 t3 = a36;
         Lib_IntVector_Intrinsics_vec128 t4 = a46;
         Lib_IntVector_Intrinsics_vec128
-            l = Lib_IntVector_Intrinsics_vec128_add64(t01, Lib_IntVector_Intrinsics_vec128_zero);
+            mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
         Lib_IntVector_Intrinsics_vec128
-            tmp0 =
-                Lib_IntVector_Intrinsics_vec128_and(l,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c0 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(t11, c0);
+            z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t01, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            tmp1 =
-                Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c1 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t2, c1);
+            z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t01, mask261);
+        Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+        Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t11, z0);
+        Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
         Lib_IntVector_Intrinsics_vec128
-            tmp2 =
-                Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            c2 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t3, c2);
+            z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            tmp3 =
-                Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+        Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+        Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+        Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+        Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+        Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
         Lib_IntVector_Intrinsics_vec128
-            c3 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t4, c3);
-        Lib_IntVector_Intrinsics_vec128
-            tmp4 =
-                Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+            z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            l4 =
-                Lib_IntVector_Intrinsics_vec128_add64(tmp0,
-                                                      Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-        Lib_IntVector_Intrinsics_vec128
-            tmp01 =
-                Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+        Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+        Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+        Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
         Lib_IntVector_Intrinsics_vec128
-            c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-        Lib_IntVector_Intrinsics_vec128 o0 = tmp01;
-        Lib_IntVector_Intrinsics_vec128 o1 = tmp11;
-        Lib_IntVector_Intrinsics_vec128 o2 = tmp2;
-        Lib_IntVector_Intrinsics_vec128 o3 = tmp3;
-        Lib_IntVector_Intrinsics_vec128 o4 = tmp4;
+            z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+        Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+        Lib_IntVector_Intrinsics_vec128 o0 = x02;
+        Lib_IntVector_Intrinsics_vec128 o1 = x12;
+        Lib_IntVector_Intrinsics_vec128 o2 = x21;
+        Lib_IntVector_Intrinsics_vec128 o3 = x32;
+        Lib_IntVector_Intrinsics_vec128 o4 = x42;
         acc[0U] = o0;
         acc[1U] = o1;
         acc[2U] = o2;
         acc[3U] = o3;
         acc[4U] = o4;
     }
     if (rem1 > (uint32_t)0U) {
         uint8_t *last1 = t1 + nb * (uint32_t)16U;
         Lib_IntVector_Intrinsics_vec128 e[5U];
         for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
             e[_i] = Lib_IntVector_Intrinsics_vec128_zero;
         uint8_t tmp[16U] = { 0U };
-        memcpy(tmp, last1, rem1 * sizeof last1[0U]);
+        memcpy(tmp, last1, rem1 * sizeof(last1[0U]));
         uint64_t u0 = load64_le(tmp);
         uint64_t lo = u0;
         uint64_t u = load64_le(tmp + (uint32_t)8U);
         uint64_t hi = u;
         Lib_IntVector_Intrinsics_vec128 f0 = Lib_IntVector_Intrinsics_vec128_load64(lo);
         Lib_IntVector_Intrinsics_vec128 f1 = Lib_IntVector_Intrinsics_vec128_load64(hi);
         Lib_IntVector_Intrinsics_vec128
             f010 =
@@ -1469,67 +1392,53 @@ Hacl_Poly1305_128_poly1305_update(
                 Lib_IntVector_Intrinsics_vec128_add64(a45,
                                                       Lib_IntVector_Intrinsics_vec128_mul64(r0, a41));
         Lib_IntVector_Intrinsics_vec128 t01 = a06;
         Lib_IntVector_Intrinsics_vec128 t11 = a16;
         Lib_IntVector_Intrinsics_vec128 t2 = a26;
         Lib_IntVector_Intrinsics_vec128 t3 = a36;
         Lib_IntVector_Intrinsics_vec128 t4 = a46;
         Lib_IntVector_Intrinsics_vec128
-            l = Lib_IntVector_Intrinsics_vec128_add64(t01, Lib_IntVector_Intrinsics_vec128_zero);
+            mask261 = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
         Lib_IntVector_Intrinsics_vec128
-            tmp0 =
-                Lib_IntVector_Intrinsics_vec128_and(l,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c0 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(t11, c0);
+            z0 = Lib_IntVector_Intrinsics_vec128_shift_right64(t01, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            tmp1 =
-                Lib_IntVector_Intrinsics_vec128_and(l0,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c1 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(t2, c1);
+            z1 = Lib_IntVector_Intrinsics_vec128_shift_right64(t3, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x0 = Lib_IntVector_Intrinsics_vec128_and(t01, mask261);
+        Lib_IntVector_Intrinsics_vec128 x3 = Lib_IntVector_Intrinsics_vec128_and(t3, mask261);
+        Lib_IntVector_Intrinsics_vec128 x1 = Lib_IntVector_Intrinsics_vec128_add64(t11, z0);
+        Lib_IntVector_Intrinsics_vec128 x4 = Lib_IntVector_Intrinsics_vec128_add64(t4, z1);
         Lib_IntVector_Intrinsics_vec128
-            tmp2 =
-                Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            z01 = Lib_IntVector_Intrinsics_vec128_shift_right64(x1, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            c2 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(t3, c2);
+            z11 = Lib_IntVector_Intrinsics_vec128_shift_right64(x4, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            tmp3 =
-                Lib_IntVector_Intrinsics_vec128_and(l2,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            t = Lib_IntVector_Intrinsics_vec128_shift_left64(z11, (uint32_t)2U);
+        Lib_IntVector_Intrinsics_vec128 z12 = Lib_IntVector_Intrinsics_vec128_add64(z11, t);
+        Lib_IntVector_Intrinsics_vec128 x11 = Lib_IntVector_Intrinsics_vec128_and(x1, mask261);
+        Lib_IntVector_Intrinsics_vec128 x41 = Lib_IntVector_Intrinsics_vec128_and(x4, mask261);
+        Lib_IntVector_Intrinsics_vec128 x2 = Lib_IntVector_Intrinsics_vec128_add64(t2, z01);
+        Lib_IntVector_Intrinsics_vec128 x01 = Lib_IntVector_Intrinsics_vec128_add64(x0, z12);
         Lib_IntVector_Intrinsics_vec128
-            c3 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(t4, c3);
-        Lib_IntVector_Intrinsics_vec128
-            tmp4 =
-                Lib_IntVector_Intrinsics_vec128_and(l3,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-        Lib_IntVector_Intrinsics_vec128
-            c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+            z02 = Lib_IntVector_Intrinsics_vec128_shift_right64(x2, (uint32_t)26U);
         Lib_IntVector_Intrinsics_vec128
-            l4 =
-                Lib_IntVector_Intrinsics_vec128_add64(tmp0,
-                                                      Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-        Lib_IntVector_Intrinsics_vec128
-            tmp01 =
-                Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                    Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+            z13 = Lib_IntVector_Intrinsics_vec128_shift_right64(x01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x21 = Lib_IntVector_Intrinsics_vec128_and(x2, mask261);
+        Lib_IntVector_Intrinsics_vec128 x02 = Lib_IntVector_Intrinsics_vec128_and(x01, mask261);
+        Lib_IntVector_Intrinsics_vec128 x31 = Lib_IntVector_Intrinsics_vec128_add64(x3, z02);
+        Lib_IntVector_Intrinsics_vec128 x12 = Lib_IntVector_Intrinsics_vec128_add64(x11, z13);
         Lib_IntVector_Intrinsics_vec128
-            c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-        Lib_IntVector_Intrinsics_vec128 tmp11 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-        Lib_IntVector_Intrinsics_vec128 o0 = tmp01;
-        Lib_IntVector_Intrinsics_vec128 o1 = tmp11;
-        Lib_IntVector_Intrinsics_vec128 o2 = tmp2;
-        Lib_IntVector_Intrinsics_vec128 o3 = tmp3;
-        Lib_IntVector_Intrinsics_vec128 o4 = tmp4;
+            z03 = Lib_IntVector_Intrinsics_vec128_shift_right64(x31, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec128 x32 = Lib_IntVector_Intrinsics_vec128_and(x31, mask261);
+        Lib_IntVector_Intrinsics_vec128 x42 = Lib_IntVector_Intrinsics_vec128_add64(x41, z03);
+        Lib_IntVector_Intrinsics_vec128 o0 = x02;
+        Lib_IntVector_Intrinsics_vec128 o1 = x12;
+        Lib_IntVector_Intrinsics_vec128 o2 = x21;
+        Lib_IntVector_Intrinsics_vec128 o3 = x32;
+        Lib_IntVector_Intrinsics_vec128 o4 = x42;
         acc[0U] = o0;
         acc[1U] = o1;
         acc[2U] = o2;
         acc[3U] = o3;
         acc[4U] = o4;
         return;
     }
 }
@@ -1538,99 +1447,136 @@ void
 Hacl_Poly1305_128_poly1305_finish(
     uint8_t *tag,
     uint8_t *key,
     Lib_IntVector_Intrinsics_vec128 *ctx)
 {
     Lib_IntVector_Intrinsics_vec128 *acc = ctx;
     uint8_t *ks = key + (uint32_t)16U;
     Lib_IntVector_Intrinsics_vec128 f0 = acc[0U];
-    Lib_IntVector_Intrinsics_vec128 f12 = acc[1U];
-    Lib_IntVector_Intrinsics_vec128 f22 = acc[2U];
-    Lib_IntVector_Intrinsics_vec128 f32 = acc[3U];
+    Lib_IntVector_Intrinsics_vec128 f13 = acc[1U];
+    Lib_IntVector_Intrinsics_vec128 f23 = acc[2U];
+    Lib_IntVector_Intrinsics_vec128 f33 = acc[3U];
     Lib_IntVector_Intrinsics_vec128 f40 = acc[4U];
     Lib_IntVector_Intrinsics_vec128
-        l = Lib_IntVector_Intrinsics_vec128_add64(f0, Lib_IntVector_Intrinsics_vec128_zero);
+        l0 = Lib_IntVector_Intrinsics_vec128_add64(f0, Lib_IntVector_Intrinsics_vec128_zero);
+    Lib_IntVector_Intrinsics_vec128
+        tmp00 =
+            Lib_IntVector_Intrinsics_vec128_and(l0,
+                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec128
+        c00 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(f13, c00);
+    Lib_IntVector_Intrinsics_vec128
+        tmp10 =
+            Lib_IntVector_Intrinsics_vec128_and(l1,
+                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec128
+        c10 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(f23, c10);
+    Lib_IntVector_Intrinsics_vec128
+        tmp20 =
+            Lib_IntVector_Intrinsics_vec128_and(l2,
+                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec128
+        c20 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(f33, c20);
+    Lib_IntVector_Intrinsics_vec128
+        tmp30 =
+            Lib_IntVector_Intrinsics_vec128_and(l3,
+                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec128
+        c30 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 l4 = Lib_IntVector_Intrinsics_vec128_add64(f40, c30);
+    Lib_IntVector_Intrinsics_vec128
+        tmp40 =
+            Lib_IntVector_Intrinsics_vec128_and(l4,
+                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec128
+        c40 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128
+        f010 =
+            Lib_IntVector_Intrinsics_vec128_add64(tmp00,
+                                                  Lib_IntVector_Intrinsics_vec128_smul64(c40, (uint64_t)5U));
+    Lib_IntVector_Intrinsics_vec128 f110 = tmp10;
+    Lib_IntVector_Intrinsics_vec128 f210 = tmp20;
+    Lib_IntVector_Intrinsics_vec128 f310 = tmp30;
+    Lib_IntVector_Intrinsics_vec128 f410 = tmp40;
+    Lib_IntVector_Intrinsics_vec128
+        l = Lib_IntVector_Intrinsics_vec128_add64(f010, Lib_IntVector_Intrinsics_vec128_zero);
     Lib_IntVector_Intrinsics_vec128
         tmp0 =
             Lib_IntVector_Intrinsics_vec128_and(l,
                                                 Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
     Lib_IntVector_Intrinsics_vec128
         c0 = Lib_IntVector_Intrinsics_vec128_shift_right64(l, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l0 = Lib_IntVector_Intrinsics_vec128_add64(f12, c0);
+    Lib_IntVector_Intrinsics_vec128 l5 = Lib_IntVector_Intrinsics_vec128_add64(f110, c0);
     Lib_IntVector_Intrinsics_vec128
         tmp1 =
-            Lib_IntVector_Intrinsics_vec128_and(l0,
+            Lib_IntVector_Intrinsics_vec128_and(l5,
                                                 Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
     Lib_IntVector_Intrinsics_vec128
-        c1 = Lib_IntVector_Intrinsics_vec128_shift_right64(l0, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l1 = Lib_IntVector_Intrinsics_vec128_add64(f22, c1);
+        c1 = Lib_IntVector_Intrinsics_vec128_shift_right64(l5, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 l6 = Lib_IntVector_Intrinsics_vec128_add64(f210, c1);
     Lib_IntVector_Intrinsics_vec128
         tmp2 =
-            Lib_IntVector_Intrinsics_vec128_and(l1,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c2 = Lib_IntVector_Intrinsics_vec128_shift_right64(l1, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l2 = Lib_IntVector_Intrinsics_vec128_add64(f32, c2);
-    Lib_IntVector_Intrinsics_vec128
-        tmp3 =
-            Lib_IntVector_Intrinsics_vec128_and(l2,
+            Lib_IntVector_Intrinsics_vec128_and(l6,
                                                 Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
     Lib_IntVector_Intrinsics_vec128
-        c3 = Lib_IntVector_Intrinsics_vec128_shift_right64(l2, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 l3 = Lib_IntVector_Intrinsics_vec128_add64(f40, c3);
+        c2 = Lib_IntVector_Intrinsics_vec128_shift_right64(l6, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 l7 = Lib_IntVector_Intrinsics_vec128_add64(f310, c2);
+    Lib_IntVector_Intrinsics_vec128
+        tmp3 =
+            Lib_IntVector_Intrinsics_vec128_and(l7,
+                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec128
+        c3 = Lib_IntVector_Intrinsics_vec128_shift_right64(l7, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec128 l8 = Lib_IntVector_Intrinsics_vec128_add64(f410, c3);
     Lib_IntVector_Intrinsics_vec128
         tmp4 =
-            Lib_IntVector_Intrinsics_vec128_and(l3,
+            Lib_IntVector_Intrinsics_vec128_and(l8,
                                                 Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
     Lib_IntVector_Intrinsics_vec128
-        c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l3, (uint32_t)26U);
+        c4 = Lib_IntVector_Intrinsics_vec128_shift_right64(l8, (uint32_t)26U);
     Lib_IntVector_Intrinsics_vec128
-        l4 =
+        f02 =
             Lib_IntVector_Intrinsics_vec128_add64(tmp0,
                                                   Lib_IntVector_Intrinsics_vec128_smul64(c4, (uint64_t)5U));
-    Lib_IntVector_Intrinsics_vec128
-        tmp0_ =
-            Lib_IntVector_Intrinsics_vec128_and(l4,
-                                                Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU));
-    Lib_IntVector_Intrinsics_vec128
-        c5 = Lib_IntVector_Intrinsics_vec128_shift_right64(l4, (uint32_t)26U);
-    Lib_IntVector_Intrinsics_vec128 f010 = tmp0_;
-    Lib_IntVector_Intrinsics_vec128 f110 = Lib_IntVector_Intrinsics_vec128_add64(tmp1, c5);
-    Lib_IntVector_Intrinsics_vec128 f210 = tmp2;
-    Lib_IntVector_Intrinsics_vec128 f310 = tmp3;
-    Lib_IntVector_Intrinsics_vec128 f410 = tmp4;
+    Lib_IntVector_Intrinsics_vec128 f12 = tmp1;
+    Lib_IntVector_Intrinsics_vec128 f22 = tmp2;
+    Lib_IntVector_Intrinsics_vec128 f32 = tmp3;
+    Lib_IntVector_Intrinsics_vec128 f42 = tmp4;
     Lib_IntVector_Intrinsics_vec128
         mh = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3ffffffU);
     Lib_IntVector_Intrinsics_vec128
         ml = Lib_IntVector_Intrinsics_vec128_load64((uint64_t)0x3fffffbU);
-    Lib_IntVector_Intrinsics_vec128 mask = Lib_IntVector_Intrinsics_vec128_eq64(f410, mh);
+    Lib_IntVector_Intrinsics_vec128 mask = Lib_IntVector_Intrinsics_vec128_eq64(f42, mh);
     Lib_IntVector_Intrinsics_vec128
         mask1 =
             Lib_IntVector_Intrinsics_vec128_and(mask,
-                                                Lib_IntVector_Intrinsics_vec128_eq64(f310, mh));
+                                                Lib_IntVector_Intrinsics_vec128_eq64(f32, mh));
     Lib_IntVector_Intrinsics_vec128
         mask2 =
             Lib_IntVector_Intrinsics_vec128_and(mask1,
-                                                Lib_IntVector_Intrinsics_vec128_eq64(f210, mh));
+                                                Lib_IntVector_Intrinsics_vec128_eq64(f22, mh));
     Lib_IntVector_Intrinsics_vec128
         mask3 =
             Lib_IntVector_Intrinsics_vec128_and(mask2,
-                                                Lib_IntVector_Intrinsics_vec128_eq64(f110, mh));
+                                                Lib_IntVector_Intrinsics_vec128_eq64(f12, mh));
     Lib_IntVector_Intrinsics_vec128
         mask4 =
             Lib_IntVector_Intrinsics_vec128_and(mask3,
-                                                Lib_IntVector_Intrinsics_vec128_lognot(Lib_IntVector_Intrinsics_vec128_gt64(ml, f010)));
+                                                Lib_IntVector_Intrinsics_vec128_lognot(Lib_IntVector_Intrinsics_vec128_gt64(ml, f02)));
     Lib_IntVector_Intrinsics_vec128 ph = Lib_IntVector_Intrinsics_vec128_and(mask4, mh);
     Lib_IntVector_Intrinsics_vec128 pl = Lib_IntVector_Intrinsics_vec128_and(mask4, ml);
-    Lib_IntVector_Intrinsics_vec128 o0 = Lib_IntVector_Intrinsics_vec128_sub64(f010, pl);
-    Lib_IntVector_Intrinsics_vec128 o1 = Lib_IntVector_Intrinsics_vec128_sub64(f110, ph);
-    Lib_IntVector_Intrinsics_vec128 o2 = Lib_IntVector_Intrinsics_vec128_sub64(f210, ph);
-    Lib_IntVector_Intrinsics_vec128 o3 = Lib_IntVector_Intrinsics_vec128_sub64(f310, ph);
-    Lib_IntVector_Intrinsics_vec128 o4 = Lib_IntVector_Intrinsics_vec128_sub64(f410, ph);
+    Lib_IntVector_Intrinsics_vec128 o0 = Lib_IntVector_Intrinsics_vec128_sub64(f02, pl);
+    Lib_IntVector_Intrinsics_vec128 o1 = Lib_IntVector_Intrinsics_vec128_sub64(f12, ph);
+    Lib_IntVector_Intrinsics_vec128 o2 = Lib_IntVector_Intrinsics_vec128_sub64(f22, ph);
+    Lib_IntVector_Intrinsics_vec128 o3 = Lib_IntVector_Intrinsics_vec128_sub64(f32, ph);
+    Lib_IntVector_Intrinsics_vec128 o4 = Lib_IntVector_Intrinsics_vec128_sub64(f42, ph);
     Lib_IntVector_Intrinsics_vec128 f011 = o0;
     Lib_IntVector_Intrinsics_vec128 f111 = o1;
     Lib_IntVector_Intrinsics_vec128 f211 = o2;
     Lib_IntVector_Intrinsics_vec128 f311 = o3;
     Lib_IntVector_Intrinsics_vec128 f411 = o4;
     acc[0U] = f011;
     acc[1U] = f111;
     acc[2U] = f211;
new file mode 100644
--- /dev/null
+++ b/security/nss/lib/freebl/verified/Hacl_Poly1305_256.c
@@ -0,0 +1,2120 @@
+/* MIT License
+ *
+ * Copyright (c) 2016-2020 INRIA, CMU and Microsoft Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in all
+ * copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "Hacl_Poly1305_256.h"
+
+void
+Hacl_Impl_Poly1305_Field32xN_256_load_acc4(Lib_IntVector_Intrinsics_vec256 *acc, uint8_t *b)
+{
+    Lib_IntVector_Intrinsics_vec256 e[5U];
+    for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
+        e[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+    Lib_IntVector_Intrinsics_vec256 lo = Lib_IntVector_Intrinsics_vec256_load_le(b);
+    Lib_IntVector_Intrinsics_vec256
+        hi = Lib_IntVector_Intrinsics_vec256_load_le(b + (uint32_t)32U);
+    Lib_IntVector_Intrinsics_vec256
+        mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+    Lib_IntVector_Intrinsics_vec256 m0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(lo, hi);
+    Lib_IntVector_Intrinsics_vec256
+        m1 = Lib_IntVector_Intrinsics_vec256_interleave_high128(lo, hi);
+    Lib_IntVector_Intrinsics_vec256
+        m2 = Lib_IntVector_Intrinsics_vec256_shift_right(m0, (uint32_t)48U);
+    Lib_IntVector_Intrinsics_vec256
+        m3 = Lib_IntVector_Intrinsics_vec256_shift_right(m1, (uint32_t)48U);
+    Lib_IntVector_Intrinsics_vec256 m4 = Lib_IntVector_Intrinsics_vec256_interleave_high64(m0, m1);
+    Lib_IntVector_Intrinsics_vec256 t0 = Lib_IntVector_Intrinsics_vec256_interleave_low64(m0, m1);
+    Lib_IntVector_Intrinsics_vec256 t3 = Lib_IntVector_Intrinsics_vec256_interleave_low64(m2, m3);
+    Lib_IntVector_Intrinsics_vec256
+        t2 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)4U);
+    Lib_IntVector_Intrinsics_vec256 o20 = Lib_IntVector_Intrinsics_vec256_and(t2, mask261);
+    Lib_IntVector_Intrinsics_vec256
+        t1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t0, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 o10 = Lib_IntVector_Intrinsics_vec256_and(t1, mask261);
+    Lib_IntVector_Intrinsics_vec256 o5 = Lib_IntVector_Intrinsics_vec256_and(t0, mask261);
+    Lib_IntVector_Intrinsics_vec256
+        t31 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)30U);
+    Lib_IntVector_Intrinsics_vec256 o30 = Lib_IntVector_Intrinsics_vec256_and(t31, mask261);
+    Lib_IntVector_Intrinsics_vec256
+        o40 = Lib_IntVector_Intrinsics_vec256_shift_right64(m4, (uint32_t)40U);
+    Lib_IntVector_Intrinsics_vec256 o0 = o5;
+    Lib_IntVector_Intrinsics_vec256 o1 = o10;
+    Lib_IntVector_Intrinsics_vec256 o2 = o20;
+    Lib_IntVector_Intrinsics_vec256 o3 = o30;
+    Lib_IntVector_Intrinsics_vec256 o4 = o40;
+    e[0U] = o0;
+    e[1U] = o1;
+    e[2U] = o2;
+    e[3U] = o3;
+    e[4U] = o4;
+    uint64_t b1 = (uint64_t)0x1000000U;
+    Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_load64(b1);
+    Lib_IntVector_Intrinsics_vec256 f40 = e[4U];
+    e[4U] = Lib_IntVector_Intrinsics_vec256_or(f40, mask);
+    Lib_IntVector_Intrinsics_vec256 acc0 = acc[0U];
+    Lib_IntVector_Intrinsics_vec256 acc1 = acc[1U];
+    Lib_IntVector_Intrinsics_vec256 acc2 = acc[2U];
+    Lib_IntVector_Intrinsics_vec256 acc3 = acc[3U];
+    Lib_IntVector_Intrinsics_vec256 acc4 = acc[4U];
+    Lib_IntVector_Intrinsics_vec256 e0 = e[0U];
+    Lib_IntVector_Intrinsics_vec256 e1 = e[1U];
+    Lib_IntVector_Intrinsics_vec256 e2 = e[2U];
+    Lib_IntVector_Intrinsics_vec256 e3 = e[3U];
+    Lib_IntVector_Intrinsics_vec256 e4 = e[4U];
+    Lib_IntVector_Intrinsics_vec256 r0 = Lib_IntVector_Intrinsics_vec256_zero;
+    Lib_IntVector_Intrinsics_vec256 r1 = Lib_IntVector_Intrinsics_vec256_zero;
+    Lib_IntVector_Intrinsics_vec256 r2 = Lib_IntVector_Intrinsics_vec256_zero;
+    Lib_IntVector_Intrinsics_vec256 r3 = Lib_IntVector_Intrinsics_vec256_zero;
+    Lib_IntVector_Intrinsics_vec256 r4 = Lib_IntVector_Intrinsics_vec256_zero;
+    Lib_IntVector_Intrinsics_vec256
+        r01 =
+            Lib_IntVector_Intrinsics_vec256_insert64(r0,
+                                                     Lib_IntVector_Intrinsics_vec256_extract64(acc0, (uint32_t)0U),
+                                                     (uint32_t)0U);
+    Lib_IntVector_Intrinsics_vec256
+        r11 =
+            Lib_IntVector_Intrinsics_vec256_insert64(r1,
+                                                     Lib_IntVector_Intrinsics_vec256_extract64(acc1, (uint32_t)0U),
+                                                     (uint32_t)0U);
+    Lib_IntVector_Intrinsics_vec256
+        r21 =
+            Lib_IntVector_Intrinsics_vec256_insert64(r2,
+                                                     Lib_IntVector_Intrinsics_vec256_extract64(acc2, (uint32_t)0U),
+                                                     (uint32_t)0U);
+    Lib_IntVector_Intrinsics_vec256
+        r31 =
+            Lib_IntVector_Intrinsics_vec256_insert64(r3,
+                                                     Lib_IntVector_Intrinsics_vec256_extract64(acc3, (uint32_t)0U),
+                                                     (uint32_t)0U);
+    Lib_IntVector_Intrinsics_vec256
+        r41 =
+            Lib_IntVector_Intrinsics_vec256_insert64(r4,
+                                                     Lib_IntVector_Intrinsics_vec256_extract64(acc4, (uint32_t)0U),
+                                                     (uint32_t)0U);
+    Lib_IntVector_Intrinsics_vec256 f0 = Lib_IntVector_Intrinsics_vec256_add64(r01, e0);
+    Lib_IntVector_Intrinsics_vec256 f1 = Lib_IntVector_Intrinsics_vec256_add64(r11, e1);
+    Lib_IntVector_Intrinsics_vec256 f2 = Lib_IntVector_Intrinsics_vec256_add64(r21, e2);
+    Lib_IntVector_Intrinsics_vec256 f3 = Lib_IntVector_Intrinsics_vec256_add64(r31, e3);
+    Lib_IntVector_Intrinsics_vec256 f4 = Lib_IntVector_Intrinsics_vec256_add64(r41, e4);
+    Lib_IntVector_Intrinsics_vec256 acc01 = f0;
+    Lib_IntVector_Intrinsics_vec256 acc11 = f1;
+    Lib_IntVector_Intrinsics_vec256 acc21 = f2;
+    Lib_IntVector_Intrinsics_vec256 acc31 = f3;
+    Lib_IntVector_Intrinsics_vec256 acc41 = f4;
+    acc[0U] = acc01;
+    acc[1U] = acc11;
+    acc[2U] = acc21;
+    acc[3U] = acc31;
+    acc[4U] = acc41;
+}
+
+void
+Hacl_Impl_Poly1305_Field32xN_256_fmul_r4_normalize(
+    Lib_IntVector_Intrinsics_vec256 *out,
+    Lib_IntVector_Intrinsics_vec256 *p)
+{
+    Lib_IntVector_Intrinsics_vec256 *r = p;
+    Lib_IntVector_Intrinsics_vec256 *r_5 = p + (uint32_t)5U;
+    Lib_IntVector_Intrinsics_vec256 *r4 = p + (uint32_t)10U;
+    Lib_IntVector_Intrinsics_vec256 a0 = out[0U];
+    Lib_IntVector_Intrinsics_vec256 a1 = out[1U];
+    Lib_IntVector_Intrinsics_vec256 a2 = out[2U];
+    Lib_IntVector_Intrinsics_vec256 a3 = out[3U];
+    Lib_IntVector_Intrinsics_vec256 a4 = out[4U];
+    Lib_IntVector_Intrinsics_vec256 r10 = r[0U];
+    Lib_IntVector_Intrinsics_vec256 r11 = r[1U];
+    Lib_IntVector_Intrinsics_vec256 r12 = r[2U];
+    Lib_IntVector_Intrinsics_vec256 r13 = r[3U];
+    Lib_IntVector_Intrinsics_vec256 r14 = r[4U];
+    Lib_IntVector_Intrinsics_vec256 r151 = r_5[1U];
+    Lib_IntVector_Intrinsics_vec256 r152 = r_5[2U];
+    Lib_IntVector_Intrinsics_vec256 r153 = r_5[3U];
+    Lib_IntVector_Intrinsics_vec256 r154 = r_5[4U];
+    Lib_IntVector_Intrinsics_vec256 r40 = r4[0U];
+    Lib_IntVector_Intrinsics_vec256 r41 = r4[1U];
+    Lib_IntVector_Intrinsics_vec256 r42 = r4[2U];
+    Lib_IntVector_Intrinsics_vec256 r43 = r4[3U];
+    Lib_IntVector_Intrinsics_vec256 r44 = r4[4U];
+    Lib_IntVector_Intrinsics_vec256 a010 = Lib_IntVector_Intrinsics_vec256_mul64(r10, r10);
+    Lib_IntVector_Intrinsics_vec256 a110 = Lib_IntVector_Intrinsics_vec256_mul64(r11, r10);
+    Lib_IntVector_Intrinsics_vec256 a210 = Lib_IntVector_Intrinsics_vec256_mul64(r12, r10);
+    Lib_IntVector_Intrinsics_vec256 a310 = Lib_IntVector_Intrinsics_vec256_mul64(r13, r10);
+    Lib_IntVector_Intrinsics_vec256 a410 = Lib_IntVector_Intrinsics_vec256_mul64(r14, r10);
+    Lib_IntVector_Intrinsics_vec256
+        a020 =
+            Lib_IntVector_Intrinsics_vec256_add64(a010,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r154, r11));
+    Lib_IntVector_Intrinsics_vec256
+        a120 =
+            Lib_IntVector_Intrinsics_vec256_add64(a110,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, r11));
+    Lib_IntVector_Intrinsics_vec256
+        a220 =
+            Lib_IntVector_Intrinsics_vec256_add64(a210,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r11, r11));
+    Lib_IntVector_Intrinsics_vec256
+        a320 =
+            Lib_IntVector_Intrinsics_vec256_add64(a310,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12, r11));
+    Lib_IntVector_Intrinsics_vec256
+        a420 =
+            Lib_IntVector_Intrinsics_vec256_add64(a410,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r13, r11));
+    Lib_IntVector_Intrinsics_vec256
+        a030 =
+            Lib_IntVector_Intrinsics_vec256_add64(a020,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r153, r12));
+    Lib_IntVector_Intrinsics_vec256
+        a130 =
+            Lib_IntVector_Intrinsics_vec256_add64(a120,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r154, r12));
+    Lib_IntVector_Intrinsics_vec256
+        a230 =
+            Lib_IntVector_Intrinsics_vec256_add64(a220,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, r12));
+    Lib_IntVector_Intrinsics_vec256
+        a330 =
+            Lib_IntVector_Intrinsics_vec256_add64(a320,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r11, r12));
+    Lib_IntVector_Intrinsics_vec256
+        a430 =
+            Lib_IntVector_Intrinsics_vec256_add64(a420,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12, r12));
+    Lib_IntVector_Intrinsics_vec256
+        a040 =
+            Lib_IntVector_Intrinsics_vec256_add64(a030,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r152, r13));
+    Lib_IntVector_Intrinsics_vec256
+        a140 =
+            Lib_IntVector_Intrinsics_vec256_add64(a130,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r153, r13));
+    Lib_IntVector_Intrinsics_vec256
+        a240 =
+            Lib_IntVector_Intrinsics_vec256_add64(a230,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r154, r13));
+    Lib_IntVector_Intrinsics_vec256
+        a340 =
+            Lib_IntVector_Intrinsics_vec256_add64(a330,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, r13));
+    Lib_IntVector_Intrinsics_vec256
+        a440 =
+            Lib_IntVector_Intrinsics_vec256_add64(a430,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r11, r13));
+    Lib_IntVector_Intrinsics_vec256
+        a050 =
+            Lib_IntVector_Intrinsics_vec256_add64(a040,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r151, r14));
+    Lib_IntVector_Intrinsics_vec256
+        a150 =
+            Lib_IntVector_Intrinsics_vec256_add64(a140,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r152, r14));
+    Lib_IntVector_Intrinsics_vec256
+        a250 =
+            Lib_IntVector_Intrinsics_vec256_add64(a240,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r153, r14));
+    Lib_IntVector_Intrinsics_vec256
+        a350 =
+            Lib_IntVector_Intrinsics_vec256_add64(a340,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r154, r14));
+    Lib_IntVector_Intrinsics_vec256
+        a450 =
+            Lib_IntVector_Intrinsics_vec256_add64(a440,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, r14));
+    Lib_IntVector_Intrinsics_vec256 t00 = a050;
+    Lib_IntVector_Intrinsics_vec256 t10 = a150;
+    Lib_IntVector_Intrinsics_vec256 t20 = a250;
+    Lib_IntVector_Intrinsics_vec256 t30 = a350;
+    Lib_IntVector_Intrinsics_vec256 t40 = a450;
+    Lib_IntVector_Intrinsics_vec256
+        mask2610 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+    Lib_IntVector_Intrinsics_vec256
+        z00 = Lib_IntVector_Intrinsics_vec256_shift_right64(t00, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z10 = Lib_IntVector_Intrinsics_vec256_shift_right64(t30, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x00 = Lib_IntVector_Intrinsics_vec256_and(t00, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x30 = Lib_IntVector_Intrinsics_vec256_and(t30, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x10 = Lib_IntVector_Intrinsics_vec256_add64(t10, z00);
+    Lib_IntVector_Intrinsics_vec256 x40 = Lib_IntVector_Intrinsics_vec256_add64(t40, z10);
+    Lib_IntVector_Intrinsics_vec256
+        z010 = Lib_IntVector_Intrinsics_vec256_shift_right64(x10, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z110 = Lib_IntVector_Intrinsics_vec256_shift_right64(x40, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        t5 = Lib_IntVector_Intrinsics_vec256_shift_left64(z110, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z110, t5);
+    Lib_IntVector_Intrinsics_vec256 x110 = Lib_IntVector_Intrinsics_vec256_and(x10, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x410 = Lib_IntVector_Intrinsics_vec256_and(x40, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x20 = Lib_IntVector_Intrinsics_vec256_add64(t20, z010);
+    Lib_IntVector_Intrinsics_vec256 x010 = Lib_IntVector_Intrinsics_vec256_add64(x00, z12);
+    Lib_IntVector_Intrinsics_vec256
+        z020 = Lib_IntVector_Intrinsics_vec256_shift_right64(x20, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z130 = Lib_IntVector_Intrinsics_vec256_shift_right64(x010, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x210 = Lib_IntVector_Intrinsics_vec256_and(x20, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x020 = Lib_IntVector_Intrinsics_vec256_and(x010, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x310 = Lib_IntVector_Intrinsics_vec256_add64(x30, z020);
+    Lib_IntVector_Intrinsics_vec256 x120 = Lib_IntVector_Intrinsics_vec256_add64(x110, z130);
+    Lib_IntVector_Intrinsics_vec256
+        z030 = Lib_IntVector_Intrinsics_vec256_shift_right64(x310, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x320 = Lib_IntVector_Intrinsics_vec256_and(x310, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x420 = Lib_IntVector_Intrinsics_vec256_add64(x410, z030);
+    Lib_IntVector_Intrinsics_vec256 r20 = x020;
+    Lib_IntVector_Intrinsics_vec256 r21 = x120;
+    Lib_IntVector_Intrinsics_vec256 r22 = x210;
+    Lib_IntVector_Intrinsics_vec256 r23 = x320;
+    Lib_IntVector_Intrinsics_vec256 r24 = x420;
+    Lib_IntVector_Intrinsics_vec256 a011 = Lib_IntVector_Intrinsics_vec256_mul64(r10, r20);
+    Lib_IntVector_Intrinsics_vec256 a111 = Lib_IntVector_Intrinsics_vec256_mul64(r11, r20);
+    Lib_IntVector_Intrinsics_vec256 a211 = Lib_IntVector_Intrinsics_vec256_mul64(r12, r20);
+    Lib_IntVector_Intrinsics_vec256 a311 = Lib_IntVector_Intrinsics_vec256_mul64(r13, r20);
+    Lib_IntVector_Intrinsics_vec256 a411 = Lib_IntVector_Intrinsics_vec256_mul64(r14, r20);
+    Lib_IntVector_Intrinsics_vec256
+        a021 =
+            Lib_IntVector_Intrinsics_vec256_add64(a011,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r154, r21));
+    Lib_IntVector_Intrinsics_vec256
+        a121 =
+            Lib_IntVector_Intrinsics_vec256_add64(a111,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, r21));
+    Lib_IntVector_Intrinsics_vec256
+        a221 =
+            Lib_IntVector_Intrinsics_vec256_add64(a211,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r11, r21));
+    Lib_IntVector_Intrinsics_vec256
+        a321 =
+            Lib_IntVector_Intrinsics_vec256_add64(a311,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12, r21));
+    Lib_IntVector_Intrinsics_vec256
+        a421 =
+            Lib_IntVector_Intrinsics_vec256_add64(a411,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r13, r21));
+    Lib_IntVector_Intrinsics_vec256
+        a031 =
+            Lib_IntVector_Intrinsics_vec256_add64(a021,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r153, r22));
+    Lib_IntVector_Intrinsics_vec256
+        a131 =
+            Lib_IntVector_Intrinsics_vec256_add64(a121,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r154, r22));
+    Lib_IntVector_Intrinsics_vec256
+        a231 =
+            Lib_IntVector_Intrinsics_vec256_add64(a221,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, r22));
+    Lib_IntVector_Intrinsics_vec256
+        a331 =
+            Lib_IntVector_Intrinsics_vec256_add64(a321,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r11, r22));
+    Lib_IntVector_Intrinsics_vec256
+        a431 =
+            Lib_IntVector_Intrinsics_vec256_add64(a421,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12, r22));
+    Lib_IntVector_Intrinsics_vec256
+        a041 =
+            Lib_IntVector_Intrinsics_vec256_add64(a031,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r152, r23));
+    Lib_IntVector_Intrinsics_vec256
+        a141 =
+            Lib_IntVector_Intrinsics_vec256_add64(a131,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r153, r23));
+    Lib_IntVector_Intrinsics_vec256
+        a241 =
+            Lib_IntVector_Intrinsics_vec256_add64(a231,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r154, r23));
+    Lib_IntVector_Intrinsics_vec256
+        a341 =
+            Lib_IntVector_Intrinsics_vec256_add64(a331,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, r23));
+    Lib_IntVector_Intrinsics_vec256
+        a441 =
+            Lib_IntVector_Intrinsics_vec256_add64(a431,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r11, r23));
+    Lib_IntVector_Intrinsics_vec256
+        a051 =
+            Lib_IntVector_Intrinsics_vec256_add64(a041,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r151, r24));
+    Lib_IntVector_Intrinsics_vec256
+        a151 =
+            Lib_IntVector_Intrinsics_vec256_add64(a141,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r152, r24));
+    Lib_IntVector_Intrinsics_vec256
+        a251 =
+            Lib_IntVector_Intrinsics_vec256_add64(a241,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r153, r24));
+    Lib_IntVector_Intrinsics_vec256
+        a351 =
+            Lib_IntVector_Intrinsics_vec256_add64(a341,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r154, r24));
+    Lib_IntVector_Intrinsics_vec256
+        a451 =
+            Lib_IntVector_Intrinsics_vec256_add64(a441,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, r24));
+    Lib_IntVector_Intrinsics_vec256 t01 = a051;
+    Lib_IntVector_Intrinsics_vec256 t11 = a151;
+    Lib_IntVector_Intrinsics_vec256 t21 = a251;
+    Lib_IntVector_Intrinsics_vec256 t31 = a351;
+    Lib_IntVector_Intrinsics_vec256 t41 = a451;
+    Lib_IntVector_Intrinsics_vec256
+        mask2611 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+    Lib_IntVector_Intrinsics_vec256
+        z04 = Lib_IntVector_Intrinsics_vec256_shift_right64(t01, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z14 = Lib_IntVector_Intrinsics_vec256_shift_right64(t31, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x03 = Lib_IntVector_Intrinsics_vec256_and(t01, mask2611);
+    Lib_IntVector_Intrinsics_vec256 x33 = Lib_IntVector_Intrinsics_vec256_and(t31, mask2611);
+    Lib_IntVector_Intrinsics_vec256 x13 = Lib_IntVector_Intrinsics_vec256_add64(t11, z04);
+    Lib_IntVector_Intrinsics_vec256 x43 = Lib_IntVector_Intrinsics_vec256_add64(t41, z14);
+    Lib_IntVector_Intrinsics_vec256
+        z011 = Lib_IntVector_Intrinsics_vec256_shift_right64(x13, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z111 = Lib_IntVector_Intrinsics_vec256_shift_right64(x43, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        t6 = Lib_IntVector_Intrinsics_vec256_shift_left64(z111, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec256 z120 = Lib_IntVector_Intrinsics_vec256_add64(z111, t6);
+    Lib_IntVector_Intrinsics_vec256 x111 = Lib_IntVector_Intrinsics_vec256_and(x13, mask2611);
+    Lib_IntVector_Intrinsics_vec256 x411 = Lib_IntVector_Intrinsics_vec256_and(x43, mask2611);
+    Lib_IntVector_Intrinsics_vec256 x22 = Lib_IntVector_Intrinsics_vec256_add64(t21, z011);
+    Lib_IntVector_Intrinsics_vec256 x011 = Lib_IntVector_Intrinsics_vec256_add64(x03, z120);
+    Lib_IntVector_Intrinsics_vec256
+        z021 = Lib_IntVector_Intrinsics_vec256_shift_right64(x22, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z131 = Lib_IntVector_Intrinsics_vec256_shift_right64(x011, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x211 = Lib_IntVector_Intrinsics_vec256_and(x22, mask2611);
+    Lib_IntVector_Intrinsics_vec256 x021 = Lib_IntVector_Intrinsics_vec256_and(x011, mask2611);
+    Lib_IntVector_Intrinsics_vec256 x311 = Lib_IntVector_Intrinsics_vec256_add64(x33, z021);
+    Lib_IntVector_Intrinsics_vec256 x121 = Lib_IntVector_Intrinsics_vec256_add64(x111, z131);
+    Lib_IntVector_Intrinsics_vec256
+        z031 = Lib_IntVector_Intrinsics_vec256_shift_right64(x311, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x321 = Lib_IntVector_Intrinsics_vec256_and(x311, mask2611);
+    Lib_IntVector_Intrinsics_vec256 x421 = Lib_IntVector_Intrinsics_vec256_add64(x411, z031);
+    Lib_IntVector_Intrinsics_vec256 r30 = x021;
+    Lib_IntVector_Intrinsics_vec256 r31 = x121;
+    Lib_IntVector_Intrinsics_vec256 r32 = x211;
+    Lib_IntVector_Intrinsics_vec256 r33 = x321;
+    Lib_IntVector_Intrinsics_vec256 r34 = x421;
+    Lib_IntVector_Intrinsics_vec256
+        v12120 = Lib_IntVector_Intrinsics_vec256_interleave_low64(r20, r10);
+    Lib_IntVector_Intrinsics_vec256
+        v34340 = Lib_IntVector_Intrinsics_vec256_interleave_low64(r40, r30);
+    Lib_IntVector_Intrinsics_vec256
+        r12340 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v34340, v12120);
+    Lib_IntVector_Intrinsics_vec256
+        v12121 = Lib_IntVector_Intrinsics_vec256_interleave_low64(r21, r11);
+    Lib_IntVector_Intrinsics_vec256
+        v34341 = Lib_IntVector_Intrinsics_vec256_interleave_low64(r41, r31);
+    Lib_IntVector_Intrinsics_vec256
+        r12341 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v34341, v12121);
+    Lib_IntVector_Intrinsics_vec256
+        v12122 = Lib_IntVector_Intrinsics_vec256_interleave_low64(r22, r12);
+    Lib_IntVector_Intrinsics_vec256
+        v34342 = Lib_IntVector_Intrinsics_vec256_interleave_low64(r42, r32);
+    Lib_IntVector_Intrinsics_vec256
+        r12342 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v34342, v12122);
+    Lib_IntVector_Intrinsics_vec256
+        v12123 = Lib_IntVector_Intrinsics_vec256_interleave_low64(r23, r13);
+    Lib_IntVector_Intrinsics_vec256
+        v34343 = Lib_IntVector_Intrinsics_vec256_interleave_low64(r43, r33);
+    Lib_IntVector_Intrinsics_vec256
+        r12343 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v34343, v12123);
+    Lib_IntVector_Intrinsics_vec256
+        v12124 = Lib_IntVector_Intrinsics_vec256_interleave_low64(r24, r14);
+    Lib_IntVector_Intrinsics_vec256
+        v34344 = Lib_IntVector_Intrinsics_vec256_interleave_low64(r44, r34);
+    Lib_IntVector_Intrinsics_vec256
+        r12344 = Lib_IntVector_Intrinsics_vec256_interleave_low128(v34344, v12124);
+    Lib_IntVector_Intrinsics_vec256
+        r123451 = Lib_IntVector_Intrinsics_vec256_smul64(r12341, (uint64_t)5U);
+    Lib_IntVector_Intrinsics_vec256
+        r123452 = Lib_IntVector_Intrinsics_vec256_smul64(r12342, (uint64_t)5U);
+    Lib_IntVector_Intrinsics_vec256
+        r123453 = Lib_IntVector_Intrinsics_vec256_smul64(r12343, (uint64_t)5U);
+    Lib_IntVector_Intrinsics_vec256
+        r123454 = Lib_IntVector_Intrinsics_vec256_smul64(r12344, (uint64_t)5U);
+    Lib_IntVector_Intrinsics_vec256 a01 = Lib_IntVector_Intrinsics_vec256_mul64(r12340, a0);
+    Lib_IntVector_Intrinsics_vec256 a11 = Lib_IntVector_Intrinsics_vec256_mul64(r12341, a0);
+    Lib_IntVector_Intrinsics_vec256 a21 = Lib_IntVector_Intrinsics_vec256_mul64(r12342, a0);
+    Lib_IntVector_Intrinsics_vec256 a31 = Lib_IntVector_Intrinsics_vec256_mul64(r12343, a0);
+    Lib_IntVector_Intrinsics_vec256 a41 = Lib_IntVector_Intrinsics_vec256_mul64(r12344, a0);
+    Lib_IntVector_Intrinsics_vec256
+        a02 =
+            Lib_IntVector_Intrinsics_vec256_add64(a01,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r123454, a1));
+    Lib_IntVector_Intrinsics_vec256
+        a12 =
+            Lib_IntVector_Intrinsics_vec256_add64(a11,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12340, a1));
+    Lib_IntVector_Intrinsics_vec256
+        a22 =
+            Lib_IntVector_Intrinsics_vec256_add64(a21,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12341, a1));
+    Lib_IntVector_Intrinsics_vec256
+        a32 =
+            Lib_IntVector_Intrinsics_vec256_add64(a31,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12342, a1));
+    Lib_IntVector_Intrinsics_vec256
+        a42 =
+            Lib_IntVector_Intrinsics_vec256_add64(a41,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12343, a1));
+    Lib_IntVector_Intrinsics_vec256
+        a03 =
+            Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r123453, a2));
+    Lib_IntVector_Intrinsics_vec256
+        a13 =
+            Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r123454, a2));
+    Lib_IntVector_Intrinsics_vec256
+        a23 =
+            Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12340, a2));
+    Lib_IntVector_Intrinsics_vec256
+        a33 =
+            Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12341, a2));
+    Lib_IntVector_Intrinsics_vec256
+        a43 =
+            Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12342, a2));
+    Lib_IntVector_Intrinsics_vec256
+        a04 =
+            Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r123452, a3));
+    Lib_IntVector_Intrinsics_vec256
+        a14 =
+            Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r123453, a3));
+    Lib_IntVector_Intrinsics_vec256
+        a24 =
+            Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r123454, a3));
+    Lib_IntVector_Intrinsics_vec256
+        a34 =
+            Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12340, a3));
+    Lib_IntVector_Intrinsics_vec256
+        a44 =
+            Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12341, a3));
+    Lib_IntVector_Intrinsics_vec256
+        a05 =
+            Lib_IntVector_Intrinsics_vec256_add64(a04,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r123451, a4));
+    Lib_IntVector_Intrinsics_vec256
+        a15 =
+            Lib_IntVector_Intrinsics_vec256_add64(a14,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r123452, a4));
+    Lib_IntVector_Intrinsics_vec256
+        a25 =
+            Lib_IntVector_Intrinsics_vec256_add64(a24,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r123453, a4));
+    Lib_IntVector_Intrinsics_vec256
+        a35 =
+            Lib_IntVector_Intrinsics_vec256_add64(a34,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r123454, a4));
+    Lib_IntVector_Intrinsics_vec256
+        a45 =
+            Lib_IntVector_Intrinsics_vec256_add64(a44,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r12340, a4));
+    Lib_IntVector_Intrinsics_vec256 t0 = a05;
+    Lib_IntVector_Intrinsics_vec256 t1 = a15;
+    Lib_IntVector_Intrinsics_vec256 t2 = a25;
+    Lib_IntVector_Intrinsics_vec256 t3 = a35;
+    Lib_IntVector_Intrinsics_vec256 t4 = a45;
+    Lib_IntVector_Intrinsics_vec256
+        mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+    Lib_IntVector_Intrinsics_vec256
+        z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t0, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t0, mask261);
+    Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+    Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t1, z0);
+    Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+    Lib_IntVector_Intrinsics_vec256
+        z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec256 z121 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+    Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+    Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+    Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+    Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z121);
+    Lib_IntVector_Intrinsics_vec256
+        z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+    Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+    Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+    Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+    Lib_IntVector_Intrinsics_vec256
+        z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+    Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+    Lib_IntVector_Intrinsics_vec256 o0 = x02;
+    Lib_IntVector_Intrinsics_vec256 o10 = x12;
+    Lib_IntVector_Intrinsics_vec256 o20 = x21;
+    Lib_IntVector_Intrinsics_vec256 o30 = x32;
+    Lib_IntVector_Intrinsics_vec256 o40 = x42;
+    Lib_IntVector_Intrinsics_vec256
+        v00 = Lib_IntVector_Intrinsics_vec256_interleave_high128(o0, o0);
+    Lib_IntVector_Intrinsics_vec256 v10 = Lib_IntVector_Intrinsics_vec256_add64(o0, v00);
+    Lib_IntVector_Intrinsics_vec256
+        v20 =
+            Lib_IntVector_Intrinsics_vec256_add64(v10,
+                                                  Lib_IntVector_Intrinsics_vec256_shuffle64(v10,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U));
+    Lib_IntVector_Intrinsics_vec256
+        v01 = Lib_IntVector_Intrinsics_vec256_interleave_high128(o10, o10);
+    Lib_IntVector_Intrinsics_vec256 v11 = Lib_IntVector_Intrinsics_vec256_add64(o10, v01);
+    Lib_IntVector_Intrinsics_vec256
+        v21 =
+            Lib_IntVector_Intrinsics_vec256_add64(v11,
+                                                  Lib_IntVector_Intrinsics_vec256_shuffle64(v11,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U));
+    Lib_IntVector_Intrinsics_vec256
+        v02 = Lib_IntVector_Intrinsics_vec256_interleave_high128(o20, o20);
+    Lib_IntVector_Intrinsics_vec256 v12 = Lib_IntVector_Intrinsics_vec256_add64(o20, v02);
+    Lib_IntVector_Intrinsics_vec256
+        v22 =
+            Lib_IntVector_Intrinsics_vec256_add64(v12,
+                                                  Lib_IntVector_Intrinsics_vec256_shuffle64(v12,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U));
+    Lib_IntVector_Intrinsics_vec256
+        v03 = Lib_IntVector_Intrinsics_vec256_interleave_high128(o30, o30);
+    Lib_IntVector_Intrinsics_vec256 v13 = Lib_IntVector_Intrinsics_vec256_add64(o30, v03);
+    Lib_IntVector_Intrinsics_vec256
+        v23 =
+            Lib_IntVector_Intrinsics_vec256_add64(v13,
+                                                  Lib_IntVector_Intrinsics_vec256_shuffle64(v13,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U));
+    Lib_IntVector_Intrinsics_vec256
+        v04 = Lib_IntVector_Intrinsics_vec256_interleave_high128(o40, o40);
+    Lib_IntVector_Intrinsics_vec256 v14 = Lib_IntVector_Intrinsics_vec256_add64(o40, v04);
+    Lib_IntVector_Intrinsics_vec256
+        v24 =
+            Lib_IntVector_Intrinsics_vec256_add64(v14,
+                                                  Lib_IntVector_Intrinsics_vec256_shuffle64(v14,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U,
+                                                                                            (uint32_t)1U));
+    Lib_IntVector_Intrinsics_vec256
+        l = Lib_IntVector_Intrinsics_vec256_add64(v20, Lib_IntVector_Intrinsics_vec256_zero);
+    Lib_IntVector_Intrinsics_vec256
+        tmp0 =
+            Lib_IntVector_Intrinsics_vec256_and(l,
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        c0 = Lib_IntVector_Intrinsics_vec256_shift_right64(l, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 l0 = Lib_IntVector_Intrinsics_vec256_add64(v21, c0);
+    Lib_IntVector_Intrinsics_vec256
+        tmp1 =
+            Lib_IntVector_Intrinsics_vec256_and(l0,
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        c1 = Lib_IntVector_Intrinsics_vec256_shift_right64(l0, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 l1 = Lib_IntVector_Intrinsics_vec256_add64(v22, c1);
+    Lib_IntVector_Intrinsics_vec256
+        tmp2 =
+            Lib_IntVector_Intrinsics_vec256_and(l1,
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        c2 = Lib_IntVector_Intrinsics_vec256_shift_right64(l1, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 l2 = Lib_IntVector_Intrinsics_vec256_add64(v23, c2);
+    Lib_IntVector_Intrinsics_vec256
+        tmp3 =
+            Lib_IntVector_Intrinsics_vec256_and(l2,
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        c3 = Lib_IntVector_Intrinsics_vec256_shift_right64(l2, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 l3 = Lib_IntVector_Intrinsics_vec256_add64(v24, c3);
+    Lib_IntVector_Intrinsics_vec256
+        tmp4 =
+            Lib_IntVector_Intrinsics_vec256_and(l3,
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        c4 = Lib_IntVector_Intrinsics_vec256_shift_right64(l3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        o00 =
+            Lib_IntVector_Intrinsics_vec256_add64(tmp0,
+                                                  Lib_IntVector_Intrinsics_vec256_smul64(c4, (uint64_t)5U));
+    Lib_IntVector_Intrinsics_vec256 o1 = tmp1;
+    Lib_IntVector_Intrinsics_vec256 o2 = tmp2;
+    Lib_IntVector_Intrinsics_vec256 o3 = tmp3;
+    Lib_IntVector_Intrinsics_vec256 o4 = tmp4;
+    out[0U] = o00;
+    out[1U] = o1;
+    out[2U] = o2;
+    out[3U] = o3;
+    out[4U] = o4;
+}
+
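+/* Poly1305 operates on 16-byte message blocks. */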
+uint32_t Hacl_Poly1305_256_blocklen = (uint32_t)16U;
+
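+/*
+ * Initialize the Poly1305 state: zero the five-limb accumulator, clamp the
+ * first 16 bytes of the key (the `r` value) as required by RFC 8439, split
+ * it into 26-bit limbs, and precompute r, 5*r, r^4 and 5*r^4. The r^4
+ * powers let the vectorized code absorb four message blocks per iteration.
+ */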
+void
+Hacl_Poly1305_256_poly1305_init(Lib_IntVector_Intrinsics_vec256 *ctx, uint8_t *key)
+{
+    Lib_IntVector_Intrinsics_vec256 *acc = ctx;
+    Lib_IntVector_Intrinsics_vec256 *pre = ctx + (uint32_t)5U;
+    uint8_t *kr = key;
+    acc[0U] = Lib_IntVector_Intrinsics_vec256_zero;
+    acc[1U] = Lib_IntVector_Intrinsics_vec256_zero;
+    acc[2U] = Lib_IntVector_Intrinsics_vec256_zero;
+    acc[3U] = Lib_IntVector_Intrinsics_vec256_zero;
+    acc[4U] = Lib_IntVector_Intrinsics_vec256_zero;
+    uint64_t u0 = load64_le(kr);
+    uint64_t lo = u0;
+    uint64_t u = load64_le(kr + (uint32_t)8U);
+    uint64_t hi = u;
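+    /* Clamp r as specified in RFC 8439 (clear the top four bits of
+       r[3,7,11,15] and the low two bits of r[4,8,12]). */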
+    uint64_t mask0 = (uint64_t)0x0ffffffc0fffffffU;
+    uint64_t mask1 = (uint64_t)0x0ffffffc0ffffffcU;
+    uint64_t lo1 = lo & mask0;
+    uint64_t hi1 = hi & mask1;
+    Lib_IntVector_Intrinsics_vec256 *r = pre;
+    Lib_IntVector_Intrinsics_vec256 *r5 = pre + (uint32_t)5U;
+    Lib_IntVector_Intrinsics_vec256 *rn = pre + (uint32_t)10U;
+    Lib_IntVector_Intrinsics_vec256 *rn_5 = pre + (uint32_t)15U;
+    Lib_IntVector_Intrinsics_vec256 r_vec0 = Lib_IntVector_Intrinsics_vec256_load64(lo1);
+    Lib_IntVector_Intrinsics_vec256 r_vec1 = Lib_IntVector_Intrinsics_vec256_load64(hi1);
+    Lib_IntVector_Intrinsics_vec256
+        f00 =
+            Lib_IntVector_Intrinsics_vec256_and(r_vec0,
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        f15 =
+            Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(r_vec0,
+                                                                                              (uint32_t)26U),
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        f20 =
+            Lib_IntVector_Intrinsics_vec256_or(Lib_IntVector_Intrinsics_vec256_shift_right64(r_vec0,
+                                                                                             (uint32_t)52U),
+                                               Lib_IntVector_Intrinsics_vec256_shift_left64(Lib_IntVector_Intrinsics_vec256_and(r_vec1,
+                                                                                                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3fffU)),
+                                                                                            (uint32_t)12U));
+    Lib_IntVector_Intrinsics_vec256
+        f30 =
+            Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(r_vec1,
+                                                                                              (uint32_t)14U),
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        f40 = Lib_IntVector_Intrinsics_vec256_shift_right64(r_vec1, (uint32_t)40U);
+    Lib_IntVector_Intrinsics_vec256 f0 = f00;
+    Lib_IntVector_Intrinsics_vec256 f1 = f15;
+    Lib_IntVector_Intrinsics_vec256 f2 = f20;
+    Lib_IntVector_Intrinsics_vec256 f3 = f30;
+    Lib_IntVector_Intrinsics_vec256 f4 = f40;
+    r[0U] = f0;
+    r[1U] = f1;
+    r[2U] = f2;
+    r[3U] = f3;
+    r[4U] = f4;
+    Lib_IntVector_Intrinsics_vec256 f200 = r[0U];
+    Lib_IntVector_Intrinsics_vec256 f210 = r[1U];
+    Lib_IntVector_Intrinsics_vec256 f220 = r[2U];
+    Lib_IntVector_Intrinsics_vec256 f230 = r[3U];
+    Lib_IntVector_Intrinsics_vec256 f240 = r[4U];
+    r5[0U] = Lib_IntVector_Intrinsics_vec256_smul64(f200, (uint64_t)5U);
+    r5[1U] = Lib_IntVector_Intrinsics_vec256_smul64(f210, (uint64_t)5U);
+    r5[2U] = Lib_IntVector_Intrinsics_vec256_smul64(f220, (uint64_t)5U);
+    r5[3U] = Lib_IntVector_Intrinsics_vec256_smul64(f230, (uint64_t)5U);
+    r5[4U] = Lib_IntVector_Intrinsics_vec256_smul64(f240, (uint64_t)5U);
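+    /* First multiplication: rn = r * r = r^2, a radix-2^26 schoolbook
+       product using the 5*limb folding, since 2^130 is congruent to 5
+       modulo 2^130 - 5. */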
+    Lib_IntVector_Intrinsics_vec256 r0 = r[0U];
+    Lib_IntVector_Intrinsics_vec256 r10 = r[1U];
+    Lib_IntVector_Intrinsics_vec256 r20 = r[2U];
+    Lib_IntVector_Intrinsics_vec256 r30 = r[3U];
+    Lib_IntVector_Intrinsics_vec256 r40 = r[4U];
+    Lib_IntVector_Intrinsics_vec256 r510 = r5[1U];
+    Lib_IntVector_Intrinsics_vec256 r520 = r5[2U];
+    Lib_IntVector_Intrinsics_vec256 r530 = r5[3U];
+    Lib_IntVector_Intrinsics_vec256 r540 = r5[4U];
+    Lib_IntVector_Intrinsics_vec256 f100 = r[0U];
+    Lib_IntVector_Intrinsics_vec256 f110 = r[1U];
+    Lib_IntVector_Intrinsics_vec256 f120 = r[2U];
+    Lib_IntVector_Intrinsics_vec256 f130 = r[3U];
+    Lib_IntVector_Intrinsics_vec256 f140 = r[4U];
+    Lib_IntVector_Intrinsics_vec256 a00 = Lib_IntVector_Intrinsics_vec256_mul64(r0, f100);
+    Lib_IntVector_Intrinsics_vec256 a10 = Lib_IntVector_Intrinsics_vec256_mul64(r10, f100);
+    Lib_IntVector_Intrinsics_vec256 a20 = Lib_IntVector_Intrinsics_vec256_mul64(r20, f100);
+    Lib_IntVector_Intrinsics_vec256 a30 = Lib_IntVector_Intrinsics_vec256_mul64(r30, f100);
+    Lib_IntVector_Intrinsics_vec256 a40 = Lib_IntVector_Intrinsics_vec256_mul64(r40, f100);
+    Lib_IntVector_Intrinsics_vec256
+        a010 =
+            Lib_IntVector_Intrinsics_vec256_add64(a00,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r540, f110));
+    Lib_IntVector_Intrinsics_vec256
+        a110 =
+            Lib_IntVector_Intrinsics_vec256_add64(a10,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, f110));
+    Lib_IntVector_Intrinsics_vec256
+        a210 =
+            Lib_IntVector_Intrinsics_vec256_add64(a20,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, f110));
+    Lib_IntVector_Intrinsics_vec256
+        a310 =
+            Lib_IntVector_Intrinsics_vec256_add64(a30,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r20, f110));
+    Lib_IntVector_Intrinsics_vec256
+        a410 =
+            Lib_IntVector_Intrinsics_vec256_add64(a40,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r30, f110));
+    Lib_IntVector_Intrinsics_vec256
+        a020 =
+            Lib_IntVector_Intrinsics_vec256_add64(a010,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r530, f120));
+    Lib_IntVector_Intrinsics_vec256
+        a120 =
+            Lib_IntVector_Intrinsics_vec256_add64(a110,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r540, f120));
+    Lib_IntVector_Intrinsics_vec256
+        a220 =
+            Lib_IntVector_Intrinsics_vec256_add64(a210,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, f120));
+    Lib_IntVector_Intrinsics_vec256
+        a320 =
+            Lib_IntVector_Intrinsics_vec256_add64(a310,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, f120));
+    Lib_IntVector_Intrinsics_vec256
+        a420 =
+            Lib_IntVector_Intrinsics_vec256_add64(a410,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r20, f120));
+    Lib_IntVector_Intrinsics_vec256
+        a030 =
+            Lib_IntVector_Intrinsics_vec256_add64(a020,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r520, f130));
+    Lib_IntVector_Intrinsics_vec256
+        a130 =
+            Lib_IntVector_Intrinsics_vec256_add64(a120,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r530, f130));
+    Lib_IntVector_Intrinsics_vec256
+        a230 =
+            Lib_IntVector_Intrinsics_vec256_add64(a220,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r540, f130));
+    Lib_IntVector_Intrinsics_vec256
+        a330 =
+            Lib_IntVector_Intrinsics_vec256_add64(a320,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, f130));
+    Lib_IntVector_Intrinsics_vec256
+        a430 =
+            Lib_IntVector_Intrinsics_vec256_add64(a420,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r10, f130));
+    Lib_IntVector_Intrinsics_vec256
+        a040 =
+            Lib_IntVector_Intrinsics_vec256_add64(a030,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r510, f140));
+    Lib_IntVector_Intrinsics_vec256
+        a140 =
+            Lib_IntVector_Intrinsics_vec256_add64(a130,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r520, f140));
+    Lib_IntVector_Intrinsics_vec256
+        a240 =
+            Lib_IntVector_Intrinsics_vec256_add64(a230,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r530, f140));
+    Lib_IntVector_Intrinsics_vec256
+        a340 =
+            Lib_IntVector_Intrinsics_vec256_add64(a330,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r540, f140));
+    Lib_IntVector_Intrinsics_vec256
+        a440 =
+            Lib_IntVector_Intrinsics_vec256_add64(a430,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, f140));
+    Lib_IntVector_Intrinsics_vec256 t00 = a040;
+    Lib_IntVector_Intrinsics_vec256 t10 = a140;
+    Lib_IntVector_Intrinsics_vec256 t20 = a240;
+    Lib_IntVector_Intrinsics_vec256 t30 = a340;
+    Lib_IntVector_Intrinsics_vec256 t40 = a440;
+    Lib_IntVector_Intrinsics_vec256
+        mask2610 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+    Lib_IntVector_Intrinsics_vec256
+        z00 = Lib_IntVector_Intrinsics_vec256_shift_right64(t00, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z10 = Lib_IntVector_Intrinsics_vec256_shift_right64(t30, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x00 = Lib_IntVector_Intrinsics_vec256_and(t00, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x30 = Lib_IntVector_Intrinsics_vec256_and(t30, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x10 = Lib_IntVector_Intrinsics_vec256_add64(t10, z00);
+    Lib_IntVector_Intrinsics_vec256 x40 = Lib_IntVector_Intrinsics_vec256_add64(t40, z10);
+    Lib_IntVector_Intrinsics_vec256
+        z010 = Lib_IntVector_Intrinsics_vec256_shift_right64(x10, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z110 = Lib_IntVector_Intrinsics_vec256_shift_right64(x40, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        t5 = Lib_IntVector_Intrinsics_vec256_shift_left64(z110, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z110, t5);
+    Lib_IntVector_Intrinsics_vec256 x110 = Lib_IntVector_Intrinsics_vec256_and(x10, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x410 = Lib_IntVector_Intrinsics_vec256_and(x40, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x20 = Lib_IntVector_Intrinsics_vec256_add64(t20, z010);
+    Lib_IntVector_Intrinsics_vec256 x010 = Lib_IntVector_Intrinsics_vec256_add64(x00, z12);
+    Lib_IntVector_Intrinsics_vec256
+        z020 = Lib_IntVector_Intrinsics_vec256_shift_right64(x20, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z130 = Lib_IntVector_Intrinsics_vec256_shift_right64(x010, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x210 = Lib_IntVector_Intrinsics_vec256_and(x20, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x020 = Lib_IntVector_Intrinsics_vec256_and(x010, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x310 = Lib_IntVector_Intrinsics_vec256_add64(x30, z020);
+    Lib_IntVector_Intrinsics_vec256 x120 = Lib_IntVector_Intrinsics_vec256_add64(x110, z130);
+    Lib_IntVector_Intrinsics_vec256
+        z030 = Lib_IntVector_Intrinsics_vec256_shift_right64(x310, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x320 = Lib_IntVector_Intrinsics_vec256_and(x310, mask2610);
+    Lib_IntVector_Intrinsics_vec256 x420 = Lib_IntVector_Intrinsics_vec256_add64(x410, z030);
+    Lib_IntVector_Intrinsics_vec256 o00 = x020;
+    Lib_IntVector_Intrinsics_vec256 o10 = x120;
+    Lib_IntVector_Intrinsics_vec256 o20 = x210;
+    Lib_IntVector_Intrinsics_vec256 o30 = x320;
+    Lib_IntVector_Intrinsics_vec256 o40 = x420;
+    rn[0U] = o00;
+    rn[1U] = o10;
+    rn[2U] = o20;
+    rn[3U] = o30;
+    rn[4U] = o40;
+    Lib_IntVector_Intrinsics_vec256 f201 = rn[0U];
+    Lib_IntVector_Intrinsics_vec256 f211 = rn[1U];
+    Lib_IntVector_Intrinsics_vec256 f221 = rn[2U];
+    Lib_IntVector_Intrinsics_vec256 f231 = rn[3U];
+    Lib_IntVector_Intrinsics_vec256 f241 = rn[4U];
+    rn_5[0U] = Lib_IntVector_Intrinsics_vec256_smul64(f201, (uint64_t)5U);
+    rn_5[1U] = Lib_IntVector_Intrinsics_vec256_smul64(f211, (uint64_t)5U);
+    rn_5[2U] = Lib_IntVector_Intrinsics_vec256_smul64(f221, (uint64_t)5U);
+    rn_5[3U] = Lib_IntVector_Intrinsics_vec256_smul64(f231, (uint64_t)5U);
+    rn_5[4U] = Lib_IntVector_Intrinsics_vec256_smul64(f241, (uint64_t)5U);
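+    /* Square once more: rn = r^4 and rn_5 = 5 * r^4, used when four blocks
+       are absorbed in parallel. */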
+    Lib_IntVector_Intrinsics_vec256 r00 = rn[0U];
+    Lib_IntVector_Intrinsics_vec256 r1 = rn[1U];
+    Lib_IntVector_Intrinsics_vec256 r2 = rn[2U];
+    Lib_IntVector_Intrinsics_vec256 r3 = rn[3U];
+    Lib_IntVector_Intrinsics_vec256 r4 = rn[4U];
+    Lib_IntVector_Intrinsics_vec256 r51 = rn_5[1U];
+    Lib_IntVector_Intrinsics_vec256 r52 = rn_5[2U];
+    Lib_IntVector_Intrinsics_vec256 r53 = rn_5[3U];
+    Lib_IntVector_Intrinsics_vec256 r54 = rn_5[4U];
+    Lib_IntVector_Intrinsics_vec256 f10 = rn[0U];
+    Lib_IntVector_Intrinsics_vec256 f11 = rn[1U];
+    Lib_IntVector_Intrinsics_vec256 f12 = rn[2U];
+    Lib_IntVector_Intrinsics_vec256 f13 = rn[3U];
+    Lib_IntVector_Intrinsics_vec256 f14 = rn[4U];
+    Lib_IntVector_Intrinsics_vec256 a0 = Lib_IntVector_Intrinsics_vec256_mul64(r00, f10);
+    Lib_IntVector_Intrinsics_vec256 a1 = Lib_IntVector_Intrinsics_vec256_mul64(r1, f10);
+    Lib_IntVector_Intrinsics_vec256 a2 = Lib_IntVector_Intrinsics_vec256_mul64(r2, f10);
+    Lib_IntVector_Intrinsics_vec256 a3 = Lib_IntVector_Intrinsics_vec256_mul64(r3, f10);
+    Lib_IntVector_Intrinsics_vec256 a4 = Lib_IntVector_Intrinsics_vec256_mul64(r4, f10);
+    Lib_IntVector_Intrinsics_vec256
+        a01 =
+            Lib_IntVector_Intrinsics_vec256_add64(a0,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, f11));
+    Lib_IntVector_Intrinsics_vec256
+        a11 =
+            Lib_IntVector_Intrinsics_vec256_add64(a1,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r00, f11));
+    Lib_IntVector_Intrinsics_vec256
+        a21 = Lib_IntVector_Intrinsics_vec256_add64(a2, Lib_IntVector_Intrinsics_vec256_mul64(r1, f11));
+    Lib_IntVector_Intrinsics_vec256
+        a31 = Lib_IntVector_Intrinsics_vec256_add64(a3, Lib_IntVector_Intrinsics_vec256_mul64(r2, f11));
+    Lib_IntVector_Intrinsics_vec256
+        a41 = Lib_IntVector_Intrinsics_vec256_add64(a4, Lib_IntVector_Intrinsics_vec256_mul64(r3, f11));
+    Lib_IntVector_Intrinsics_vec256
+        a02 =
+            Lib_IntVector_Intrinsics_vec256_add64(a01,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r53, f12));
+    Lib_IntVector_Intrinsics_vec256
+        a12 =
+            Lib_IntVector_Intrinsics_vec256_add64(a11,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, f12));
+    Lib_IntVector_Intrinsics_vec256
+        a22 =
+            Lib_IntVector_Intrinsics_vec256_add64(a21,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r00, f12));
+    Lib_IntVector_Intrinsics_vec256
+        a32 =
+            Lib_IntVector_Intrinsics_vec256_add64(a31,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r1, f12));
+    Lib_IntVector_Intrinsics_vec256
+        a42 =
+            Lib_IntVector_Intrinsics_vec256_add64(a41,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r2, f12));
+    Lib_IntVector_Intrinsics_vec256
+        a03 =
+            Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r52, f13));
+    Lib_IntVector_Intrinsics_vec256
+        a13 =
+            Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r53, f13));
+    Lib_IntVector_Intrinsics_vec256
+        a23 =
+            Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, f13));
+    Lib_IntVector_Intrinsics_vec256
+        a33 =
+            Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r00, f13));
+    Lib_IntVector_Intrinsics_vec256
+        a43 =
+            Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r1, f13));
+    Lib_IntVector_Intrinsics_vec256
+        a04 =
+            Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r51, f14));
+    Lib_IntVector_Intrinsics_vec256
+        a14 =
+            Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r52, f14));
+    Lib_IntVector_Intrinsics_vec256
+        a24 =
+            Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r53, f14));
+    Lib_IntVector_Intrinsics_vec256
+        a34 =
+            Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, f14));
+    Lib_IntVector_Intrinsics_vec256
+        a44 =
+            Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r00, f14));
+    Lib_IntVector_Intrinsics_vec256 t0 = a04;
+    Lib_IntVector_Intrinsics_vec256 t1 = a14;
+    Lib_IntVector_Intrinsics_vec256 t2 = a24;
+    Lib_IntVector_Intrinsics_vec256 t3 = a34;
+    Lib_IntVector_Intrinsics_vec256 t4 = a44;
+    Lib_IntVector_Intrinsics_vec256
+        mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+    Lib_IntVector_Intrinsics_vec256
+        z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t0, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t0, mask261);
+    Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+    Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t1, z0);
+    Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+    Lib_IntVector_Intrinsics_vec256
+        z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec256 z120 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+    Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+    Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+    Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+    Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z120);
+    Lib_IntVector_Intrinsics_vec256
+        z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+    Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+    Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+    Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+    Lib_IntVector_Intrinsics_vec256
+        z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+    Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+    Lib_IntVector_Intrinsics_vec256 o0 = x02;
+    Lib_IntVector_Intrinsics_vec256 o1 = x12;
+    Lib_IntVector_Intrinsics_vec256 o2 = x21;
+    Lib_IntVector_Intrinsics_vec256 o3 = x32;
+    Lib_IntVector_Intrinsics_vec256 o4 = x42;
+    rn[0U] = o0;
+    rn[1U] = o1;
+    rn[2U] = o2;
+    rn[3U] = o3;
+    rn[4U] = o4;
+    Lib_IntVector_Intrinsics_vec256 f202 = rn[0U];
+    Lib_IntVector_Intrinsics_vec256 f21 = rn[1U];
+    Lib_IntVector_Intrinsics_vec256 f22 = rn[2U];
+    Lib_IntVector_Intrinsics_vec256 f23 = rn[3U];
+    Lib_IntVector_Intrinsics_vec256 f24 = rn[4U];
+    rn_5[0U] = Lib_IntVector_Intrinsics_vec256_smul64(f202, (uint64_t)5U);
+    rn_5[1U] = Lib_IntVector_Intrinsics_vec256_smul64(f21, (uint64_t)5U);
+    rn_5[2U] = Lib_IntVector_Intrinsics_vec256_smul64(f22, (uint64_t)5U);
+    rn_5[3U] = Lib_IntVector_Intrinsics_vec256_smul64(f23, (uint64_t)5U);
+    rn_5[4U] = Lib_IntVector_Intrinsics_vec256_smul64(f24, (uint64_t)5U);
+}
+
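+/*
+ * Absorb a single 16-byte block: encode it into five 26-bit limbs, set the
+ * 2^128 padding bit, add it to the accumulator and multiply the result by r
+ * modulo 2^130 - 5.
+ */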
+void
+Hacl_Poly1305_256_poly1305_update1(Lib_IntVector_Intrinsics_vec256 *ctx, uint8_t *text)
+{
+    Lib_IntVector_Intrinsics_vec256 *pre = ctx + (uint32_t)5U;
+    Lib_IntVector_Intrinsics_vec256 *acc = ctx;
+    Lib_IntVector_Intrinsics_vec256 e[5U];
+    for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
+        e[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+    uint64_t u0 = load64_le(text);
+    uint64_t lo = u0;
+    uint64_t u = load64_le(text + (uint32_t)8U);
+    uint64_t hi = u;
+    Lib_IntVector_Intrinsics_vec256 f0 = Lib_IntVector_Intrinsics_vec256_load64(lo);
+    Lib_IntVector_Intrinsics_vec256 f1 = Lib_IntVector_Intrinsics_vec256_load64(hi);
+    Lib_IntVector_Intrinsics_vec256
+        f010 =
+            Lib_IntVector_Intrinsics_vec256_and(f0,
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        f110 =
+            Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                              (uint32_t)26U),
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        f20 =
+            Lib_IntVector_Intrinsics_vec256_or(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                             (uint32_t)52U),
+                                               Lib_IntVector_Intrinsics_vec256_shift_left64(Lib_IntVector_Intrinsics_vec256_and(f1,
+                                                                                                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3fffU)),
+                                                                                            (uint32_t)12U));
+    Lib_IntVector_Intrinsics_vec256
+        f30 =
+            Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f1,
+                                                                                              (uint32_t)14U),
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256
+        f40 = Lib_IntVector_Intrinsics_vec256_shift_right64(f1, (uint32_t)40U);
+    Lib_IntVector_Intrinsics_vec256 f01 = f010;
+    Lib_IntVector_Intrinsics_vec256 f111 = f110;
+    Lib_IntVector_Intrinsics_vec256 f2 = f20;
+    Lib_IntVector_Intrinsics_vec256 f3 = f30;
+    Lib_IntVector_Intrinsics_vec256 f41 = f40;
+    e[0U] = f01;
+    e[1U] = f111;
+    e[2U] = f2;
+    e[3U] = f3;
+    e[4U] = f41;
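+    /* Set the 2^128 padding bit of the encoded block (bit 24 of limb 4). */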
+    uint64_t b = (uint64_t)0x1000000U;
+    Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_load64(b);
+    Lib_IntVector_Intrinsics_vec256 f4 = e[4U];
+    e[4U] = Lib_IntVector_Intrinsics_vec256_or(f4, mask);
+    Lib_IntVector_Intrinsics_vec256 *r = pre;
+    Lib_IntVector_Intrinsics_vec256 *r5 = pre + (uint32_t)5U;
+    Lib_IntVector_Intrinsics_vec256 r0 = r[0U];
+    Lib_IntVector_Intrinsics_vec256 r1 = r[1U];
+    Lib_IntVector_Intrinsics_vec256 r2 = r[2U];
+    Lib_IntVector_Intrinsics_vec256 r3 = r[3U];
+    Lib_IntVector_Intrinsics_vec256 r4 = r[4U];
+    Lib_IntVector_Intrinsics_vec256 r51 = r5[1U];
+    Lib_IntVector_Intrinsics_vec256 r52 = r5[2U];
+    Lib_IntVector_Intrinsics_vec256 r53 = r5[3U];
+    Lib_IntVector_Intrinsics_vec256 r54 = r5[4U];
+    Lib_IntVector_Intrinsics_vec256 f10 = e[0U];
+    Lib_IntVector_Intrinsics_vec256 f11 = e[1U];
+    Lib_IntVector_Intrinsics_vec256 f12 = e[2U];
+    Lib_IntVector_Intrinsics_vec256 f13 = e[3U];
+    Lib_IntVector_Intrinsics_vec256 f14 = e[4U];
+    Lib_IntVector_Intrinsics_vec256 a0 = acc[0U];
+    Lib_IntVector_Intrinsics_vec256 a1 = acc[1U];
+    Lib_IntVector_Intrinsics_vec256 a2 = acc[2U];
+    Lib_IntVector_Intrinsics_vec256 a3 = acc[3U];
+    Lib_IntVector_Intrinsics_vec256 a4 = acc[4U];
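+    /* acc = (acc + block) * r, computed limb by limb below. */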
+    Lib_IntVector_Intrinsics_vec256 a01 = Lib_IntVector_Intrinsics_vec256_add64(a0, f10);
+    Lib_IntVector_Intrinsics_vec256 a11 = Lib_IntVector_Intrinsics_vec256_add64(a1, f11);
+    Lib_IntVector_Intrinsics_vec256 a21 = Lib_IntVector_Intrinsics_vec256_add64(a2, f12);
+    Lib_IntVector_Intrinsics_vec256 a31 = Lib_IntVector_Intrinsics_vec256_add64(a3, f13);
+    Lib_IntVector_Intrinsics_vec256 a41 = Lib_IntVector_Intrinsics_vec256_add64(a4, f14);
+    Lib_IntVector_Intrinsics_vec256 a02 = Lib_IntVector_Intrinsics_vec256_mul64(r0, a01);
+    Lib_IntVector_Intrinsics_vec256 a12 = Lib_IntVector_Intrinsics_vec256_mul64(r1, a01);
+    Lib_IntVector_Intrinsics_vec256 a22 = Lib_IntVector_Intrinsics_vec256_mul64(r2, a01);
+    Lib_IntVector_Intrinsics_vec256 a32 = Lib_IntVector_Intrinsics_vec256_mul64(r3, a01);
+    Lib_IntVector_Intrinsics_vec256 a42 = Lib_IntVector_Intrinsics_vec256_mul64(r4, a01);
+    Lib_IntVector_Intrinsics_vec256
+        a03 =
+            Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, a11));
+    Lib_IntVector_Intrinsics_vec256
+        a13 =
+            Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, a11));
+    Lib_IntVector_Intrinsics_vec256
+        a23 =
+            Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r1, a11));
+    Lib_IntVector_Intrinsics_vec256
+        a33 =
+            Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r2, a11));
+    Lib_IntVector_Intrinsics_vec256
+        a43 =
+            Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r3, a11));
+    Lib_IntVector_Intrinsics_vec256
+        a04 =
+            Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r53, a21));
+    Lib_IntVector_Intrinsics_vec256
+        a14 =
+            Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, a21));
+    Lib_IntVector_Intrinsics_vec256
+        a24 =
+            Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, a21));
+    Lib_IntVector_Intrinsics_vec256
+        a34 =
+            Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r1, a21));
+    Lib_IntVector_Intrinsics_vec256
+        a44 =
+            Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r2, a21));
+    Lib_IntVector_Intrinsics_vec256
+        a05 =
+            Lib_IntVector_Intrinsics_vec256_add64(a04,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r52, a31));
+    Lib_IntVector_Intrinsics_vec256
+        a15 =
+            Lib_IntVector_Intrinsics_vec256_add64(a14,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r53, a31));
+    Lib_IntVector_Intrinsics_vec256
+        a25 =
+            Lib_IntVector_Intrinsics_vec256_add64(a24,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, a31));
+    Lib_IntVector_Intrinsics_vec256
+        a35 =
+            Lib_IntVector_Intrinsics_vec256_add64(a34,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, a31));
+    Lib_IntVector_Intrinsics_vec256
+        a45 =
+            Lib_IntVector_Intrinsics_vec256_add64(a44,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r1, a31));
+    Lib_IntVector_Intrinsics_vec256
+        a06 =
+            Lib_IntVector_Intrinsics_vec256_add64(a05,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r51, a41));
+    Lib_IntVector_Intrinsics_vec256
+        a16 =
+            Lib_IntVector_Intrinsics_vec256_add64(a15,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r52, a41));
+    Lib_IntVector_Intrinsics_vec256
+        a26 =
+            Lib_IntVector_Intrinsics_vec256_add64(a25,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r53, a41));
+    Lib_IntVector_Intrinsics_vec256
+        a36 =
+            Lib_IntVector_Intrinsics_vec256_add64(a35,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r54, a41));
+    Lib_IntVector_Intrinsics_vec256
+        a46 =
+            Lib_IntVector_Intrinsics_vec256_add64(a45,
+                                                  Lib_IntVector_Intrinsics_vec256_mul64(r0, a41));
+    Lib_IntVector_Intrinsics_vec256 t0 = a06;
+    Lib_IntVector_Intrinsics_vec256 t1 = a16;
+    Lib_IntVector_Intrinsics_vec256 t2 = a26;
+    Lib_IntVector_Intrinsics_vec256 t3 = a36;
+    Lib_IntVector_Intrinsics_vec256 t4 = a46;
+    Lib_IntVector_Intrinsics_vec256
+        mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+    Lib_IntVector_Intrinsics_vec256
+        z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t0, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t0, mask261);
+    Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+    Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t1, z0);
+    Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+    Lib_IntVector_Intrinsics_vec256
+        z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+    Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+    Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+    Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+    Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+    Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z12);
+    Lib_IntVector_Intrinsics_vec256
+        z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256
+        z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+    Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+    Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+    Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+    Lib_IntVector_Intrinsics_vec256
+        z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+    Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+    Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+    Lib_IntVector_Intrinsics_vec256 o0 = x02;
+    Lib_IntVector_Intrinsics_vec256 o1 = x12;
+    Lib_IntVector_Intrinsics_vec256 o2 = x21;
+    Lib_IntVector_Intrinsics_vec256 o3 = x32;
+    Lib_IntVector_Intrinsics_vec256 o4 = x42;
+    acc[0U] = o0;
+    acc[1U] = o1;
+    acc[2U] = o2;
+    acc[3U] = o3;
+    acc[4U] = o4;
+}
+
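[Editorial sketch, not part of the landed patch.] The vector arithmetic above, like the per-block steps inside Hacl_Poly1305_256_poly1305_update below, works in an unsaturated radix-2^26 representation: each 130-bit accumulator lane is held as five limbs of at most 26 bits in 64-bit slots, products against the precomputed r limbs (r0..r4, with the 5*r limbs r51..r54) are accumulated with vec256_mul64/vec256_add64, and the result is partially reduced by the carry chain at the end (mask261 = 2^26 - 1, shift_right64 by 26, and shift_left64(z, 2) plus z to form 5*z for the wrap-around). A minimal scalar sketch of that carry chain, with an illustrative helper name that is not part of HACL*, is:

    #include <stdint.h>

    /* Illustrative scalar analogue of the radix-2^26 carry propagation above.
     * h[0..4] represent h[0] + h[1]*2^26 + ... + h[4]*2^104 modulo 2^130 - 5,
     * so a carry out of limb 4 re-enters limb 0 multiplied by 5, because
     * 2^130 = 5 (mod 2^130 - 5). The vector code performs the same reduction,
     * but starts the carries at limbs 0 and 3 in parallel. */
    static void poly1305_carry26_sketch(uint64_t h[5])
    {
        const uint64_t mask26 = 0x3ffffffULL; /* 2^26 - 1, same as mask261 */
        uint64_t c;
        c = h[0] >> 26; h[0] &= mask26; h[1] += c;
        c = h[1] >> 26; h[1] &= mask26; h[2] += c;
        c = h[2] >> 26; h[2] &= mask26; h[3] += c;
        c = h[3] >> 26; h[3] &= mask26; h[4] += c;
        c = h[4] >> 26; h[4] &= mask26;
        h[0] += c * 5U; /* fold the 2^130 carry back in as 5*c */
    }

This is a partial reduction: limbs may momentarily exceed 26 bits by a small carry, which is exactly what the generated code tolerates between blocks.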
+void
+Hacl_Poly1305_256_poly1305_update(
+    Lib_IntVector_Intrinsics_vec256 *ctx,
+    uint32_t len,
+    uint8_t *text)
+{
+    Lib_IntVector_Intrinsics_vec256 *pre = ctx + (uint32_t)5U;
+    Lib_IntVector_Intrinsics_vec256 *acc = ctx;
+    uint32_t sz_block = (uint32_t)64U;
+    uint32_t len0 = len / sz_block * sz_block;
+    uint8_t *t0 = text;
+    if (len0 > (uint32_t)0U) {
+        uint32_t bs = (uint32_t)64U;
+        uint8_t *text0 = t0;
+        Hacl_Impl_Poly1305_Field32xN_256_load_acc4(acc, text0);
+        uint32_t len1 = len0 - bs;
+        uint8_t *text1 = t0 + bs;
+        uint32_t nb = len1 / bs;
+        for (uint32_t i = (uint32_t)0U; i < nb; i++) {
+            uint8_t *block = text1 + i * bs;
+            Lib_IntVector_Intrinsics_vec256 e[5U];
+            for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
+                e[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+            Lib_IntVector_Intrinsics_vec256 lo = Lib_IntVector_Intrinsics_vec256_load_le(block);
+            Lib_IntVector_Intrinsics_vec256
+                hi = Lib_IntVector_Intrinsics_vec256_load_le(block + (uint32_t)32U);
+            Lib_IntVector_Intrinsics_vec256
+                mask2610 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+            Lib_IntVector_Intrinsics_vec256
+                m0 = Lib_IntVector_Intrinsics_vec256_interleave_low128(lo, hi);
+            Lib_IntVector_Intrinsics_vec256
+                m1 = Lib_IntVector_Intrinsics_vec256_interleave_high128(lo, hi);
+            Lib_IntVector_Intrinsics_vec256
+                m2 = Lib_IntVector_Intrinsics_vec256_shift_right(m0, (uint32_t)48U);
+            Lib_IntVector_Intrinsics_vec256
+                m3 = Lib_IntVector_Intrinsics_vec256_shift_right(m1, (uint32_t)48U);
+            Lib_IntVector_Intrinsics_vec256
+                m4 = Lib_IntVector_Intrinsics_vec256_interleave_high64(m0, m1);
+            Lib_IntVector_Intrinsics_vec256
+                t010 = Lib_IntVector_Intrinsics_vec256_interleave_low64(m0, m1);
+            Lib_IntVector_Intrinsics_vec256
+                t30 = Lib_IntVector_Intrinsics_vec256_interleave_low64(m2, m3);
+            Lib_IntVector_Intrinsics_vec256
+                t20 = Lib_IntVector_Intrinsics_vec256_shift_right64(t30, (uint32_t)4U);
+            Lib_IntVector_Intrinsics_vec256 o20 = Lib_IntVector_Intrinsics_vec256_and(t20, mask2610);
+            Lib_IntVector_Intrinsics_vec256
+                t10 = Lib_IntVector_Intrinsics_vec256_shift_right64(t010, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256 o10 = Lib_IntVector_Intrinsics_vec256_and(t10, mask2610);
+            Lib_IntVector_Intrinsics_vec256 o5 = Lib_IntVector_Intrinsics_vec256_and(t010, mask2610);
+            Lib_IntVector_Intrinsics_vec256
+                t31 = Lib_IntVector_Intrinsics_vec256_shift_right64(t30, (uint32_t)30U);
+            Lib_IntVector_Intrinsics_vec256 o30 = Lib_IntVector_Intrinsics_vec256_and(t31, mask2610);
+            Lib_IntVector_Intrinsics_vec256
+                o40 = Lib_IntVector_Intrinsics_vec256_shift_right64(m4, (uint32_t)40U);
+            Lib_IntVector_Intrinsics_vec256 o00 = o5;
+            Lib_IntVector_Intrinsics_vec256 o11 = o10;
+            Lib_IntVector_Intrinsics_vec256 o21 = o20;
+            Lib_IntVector_Intrinsics_vec256 o31 = o30;
+            Lib_IntVector_Intrinsics_vec256 o41 = o40;
+            e[0U] = o00;
+            e[1U] = o11;
+            e[2U] = o21;
+            e[3U] = o31;
+            e[4U] = o41;
+            uint64_t b = (uint64_t)0x1000000U;
+            Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_load64(b);
+            Lib_IntVector_Intrinsics_vec256 f4 = e[4U];
+            e[4U] = Lib_IntVector_Intrinsics_vec256_or(f4, mask);
+            Lib_IntVector_Intrinsics_vec256 *rn = pre + (uint32_t)10U;
+            Lib_IntVector_Intrinsics_vec256 *rn5 = pre + (uint32_t)15U;
+            Lib_IntVector_Intrinsics_vec256 r0 = rn[0U];
+            Lib_IntVector_Intrinsics_vec256 r1 = rn[1U];
+            Lib_IntVector_Intrinsics_vec256 r2 = rn[2U];
+            Lib_IntVector_Intrinsics_vec256 r3 = rn[3U];
+            Lib_IntVector_Intrinsics_vec256 r4 = rn[4U];
+            Lib_IntVector_Intrinsics_vec256 r51 = rn5[1U];
+            Lib_IntVector_Intrinsics_vec256 r52 = rn5[2U];
+            Lib_IntVector_Intrinsics_vec256 r53 = rn5[3U];
+            Lib_IntVector_Intrinsics_vec256 r54 = rn5[4U];
+            Lib_IntVector_Intrinsics_vec256 f10 = acc[0U];
+            Lib_IntVector_Intrinsics_vec256 f110 = acc[1U];
+            Lib_IntVector_Intrinsics_vec256 f120 = acc[2U];
+            Lib_IntVector_Intrinsics_vec256 f130 = acc[3U];
+            Lib_IntVector_Intrinsics_vec256 f140 = acc[4U];
+            Lib_IntVector_Intrinsics_vec256 a0 = Lib_IntVector_Intrinsics_vec256_mul64(r0, f10);
+            Lib_IntVector_Intrinsics_vec256 a1 = Lib_IntVector_Intrinsics_vec256_mul64(r1, f10);
+            Lib_IntVector_Intrinsics_vec256 a2 = Lib_IntVector_Intrinsics_vec256_mul64(r2, f10);
+            Lib_IntVector_Intrinsics_vec256 a3 = Lib_IntVector_Intrinsics_vec256_mul64(r3, f10);
+            Lib_IntVector_Intrinsics_vec256 a4 = Lib_IntVector_Intrinsics_vec256_mul64(r4, f10);
+            Lib_IntVector_Intrinsics_vec256
+                a01 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a0,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r54, f110));
+            Lib_IntVector_Intrinsics_vec256
+                a11 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a1,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r0, f110));
+            Lib_IntVector_Intrinsics_vec256
+                a21 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a2,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r1, f110));
+            Lib_IntVector_Intrinsics_vec256
+                a31 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a3,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r2, f110));
+            Lib_IntVector_Intrinsics_vec256
+                a41 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a4,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r3, f110));
+            Lib_IntVector_Intrinsics_vec256
+                a02 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a01,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r53, f120));
+            Lib_IntVector_Intrinsics_vec256
+                a12 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a11,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r54, f120));
+            Lib_IntVector_Intrinsics_vec256
+                a22 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a21,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r0, f120));
+            Lib_IntVector_Intrinsics_vec256
+                a32 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a31,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r1, f120));
+            Lib_IntVector_Intrinsics_vec256
+                a42 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a41,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r2, f120));
+            Lib_IntVector_Intrinsics_vec256
+                a03 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r52, f130));
+            Lib_IntVector_Intrinsics_vec256
+                a13 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r53, f130));
+            Lib_IntVector_Intrinsics_vec256
+                a23 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r54, f130));
+            Lib_IntVector_Intrinsics_vec256
+                a33 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r0, f130));
+            Lib_IntVector_Intrinsics_vec256
+                a43 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r1, f130));
+            Lib_IntVector_Intrinsics_vec256
+                a04 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r51, f140));
+            Lib_IntVector_Intrinsics_vec256
+                a14 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r52, f140));
+            Lib_IntVector_Intrinsics_vec256
+                a24 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r53, f140));
+            Lib_IntVector_Intrinsics_vec256
+                a34 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r54, f140));
+            Lib_IntVector_Intrinsics_vec256
+                a44 =
+                    Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                          Lib_IntVector_Intrinsics_vec256_mul64(r0, f140));
+            Lib_IntVector_Intrinsics_vec256 t01 = a04;
+            Lib_IntVector_Intrinsics_vec256 t1 = a14;
+            Lib_IntVector_Intrinsics_vec256 t2 = a24;
+            Lib_IntVector_Intrinsics_vec256 t3 = a34;
+            Lib_IntVector_Intrinsics_vec256 t4 = a44;
+            Lib_IntVector_Intrinsics_vec256
+                mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+            Lib_IntVector_Intrinsics_vec256
+                z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t01, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256
+                z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t01, mask261);
+            Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+            Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t1, z0);
+            Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+            Lib_IntVector_Intrinsics_vec256
+                z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256
+                z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256
+                t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+            Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+            Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+            Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+            Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+            Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z12);
+            Lib_IntVector_Intrinsics_vec256
+                z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256
+                z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+            Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+            Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+            Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+            Lib_IntVector_Intrinsics_vec256
+                z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+            Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+            Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+            Lib_IntVector_Intrinsics_vec256 o01 = x02;
+            Lib_IntVector_Intrinsics_vec256 o12 = x12;
+            Lib_IntVector_Intrinsics_vec256 o22 = x21;
+            Lib_IntVector_Intrinsics_vec256 o32 = x32;
+            Lib_IntVector_Intrinsics_vec256 o42 = x42;
+            acc[0U] = o01;
+            acc[1U] = o12;
+            acc[2U] = o22;
+            acc[3U] = o32;
+            acc[4U] = o42;
+            Lib_IntVector_Intrinsics_vec256 f100 = acc[0U];
+            Lib_IntVector_Intrinsics_vec256 f11 = acc[1U];
+            Lib_IntVector_Intrinsics_vec256 f12 = acc[2U];
+            Lib_IntVector_Intrinsics_vec256 f13 = acc[3U];
+            Lib_IntVector_Intrinsics_vec256 f14 = acc[4U];
+            Lib_IntVector_Intrinsics_vec256 f20 = e[0U];
+            Lib_IntVector_Intrinsics_vec256 f21 = e[1U];
+            Lib_IntVector_Intrinsics_vec256 f22 = e[2U];
+            Lib_IntVector_Intrinsics_vec256 f23 = e[3U];
+            Lib_IntVector_Intrinsics_vec256 f24 = e[4U];
+            Lib_IntVector_Intrinsics_vec256 o0 = Lib_IntVector_Intrinsics_vec256_add64(f100, f20);
+            Lib_IntVector_Intrinsics_vec256 o1 = Lib_IntVector_Intrinsics_vec256_add64(f11, f21);
+            Lib_IntVector_Intrinsics_vec256 o2 = Lib_IntVector_Intrinsics_vec256_add64(f12, f22);
+            Lib_IntVector_Intrinsics_vec256 o3 = Lib_IntVector_Intrinsics_vec256_add64(f13, f23);
+            Lib_IntVector_Intrinsics_vec256 o4 = Lib_IntVector_Intrinsics_vec256_add64(f14, f24);
+            acc[0U] = o0;
+            acc[1U] = o1;
+            acc[2U] = o2;
+            acc[3U] = o3;
+            acc[4U] = o4;
+        }
+        Hacl_Impl_Poly1305_Field32xN_256_fmul_r4_normalize(acc, pre);
+    }
+    uint32_t len1 = len - len0;
+    uint8_t *t1 = text + len0;
+    uint32_t nb = len1 / (uint32_t)16U;
+    uint32_t rem1 = len1 % (uint32_t)16U;
+    for (uint32_t i = (uint32_t)0U; i < nb; i++) {
+        uint8_t *block = t1 + i * (uint32_t)16U;
+        Lib_IntVector_Intrinsics_vec256 e[5U];
+        for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
+            e[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+        uint64_t u0 = load64_le(block);
+        uint64_t lo = u0;
+        uint64_t u = load64_le(block + (uint32_t)8U);
+        uint64_t hi = u;
+        Lib_IntVector_Intrinsics_vec256 f0 = Lib_IntVector_Intrinsics_vec256_load64(lo);
+        Lib_IntVector_Intrinsics_vec256 f1 = Lib_IntVector_Intrinsics_vec256_load64(hi);
+        Lib_IntVector_Intrinsics_vec256
+            f010 =
+                Lib_IntVector_Intrinsics_vec256_and(f0,
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f110 =
+                Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                                  (uint32_t)26U),
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f20 =
+                Lib_IntVector_Intrinsics_vec256_or(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                                 (uint32_t)52U),
+                                                   Lib_IntVector_Intrinsics_vec256_shift_left64(Lib_IntVector_Intrinsics_vec256_and(f1,
+                                                                                                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3fffU)),
+                                                                                                (uint32_t)12U));
+        Lib_IntVector_Intrinsics_vec256
+            f30 =
+                Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f1,
+                                                                                                  (uint32_t)14U),
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f40 = Lib_IntVector_Intrinsics_vec256_shift_right64(f1, (uint32_t)40U);
+        Lib_IntVector_Intrinsics_vec256 f01 = f010;
+        Lib_IntVector_Intrinsics_vec256 f111 = f110;
+        Lib_IntVector_Intrinsics_vec256 f2 = f20;
+        Lib_IntVector_Intrinsics_vec256 f3 = f30;
+        Lib_IntVector_Intrinsics_vec256 f41 = f40;
+        e[0U] = f01;
+        e[1U] = f111;
+        e[2U] = f2;
+        e[3U] = f3;
+        e[4U] = f41;
+        uint64_t b = (uint64_t)0x1000000U;
+        Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_load64(b);
+        Lib_IntVector_Intrinsics_vec256 f4 = e[4U];
+        e[4U] = Lib_IntVector_Intrinsics_vec256_or(f4, mask);
+        Lib_IntVector_Intrinsics_vec256 *r = pre;
+        Lib_IntVector_Intrinsics_vec256 *r5 = pre + (uint32_t)5U;
+        Lib_IntVector_Intrinsics_vec256 r0 = r[0U];
+        Lib_IntVector_Intrinsics_vec256 r1 = r[1U];
+        Lib_IntVector_Intrinsics_vec256 r2 = r[2U];
+        Lib_IntVector_Intrinsics_vec256 r3 = r[3U];
+        Lib_IntVector_Intrinsics_vec256 r4 = r[4U];
+        Lib_IntVector_Intrinsics_vec256 r51 = r5[1U];
+        Lib_IntVector_Intrinsics_vec256 r52 = r5[2U];
+        Lib_IntVector_Intrinsics_vec256 r53 = r5[3U];
+        Lib_IntVector_Intrinsics_vec256 r54 = r5[4U];
+        Lib_IntVector_Intrinsics_vec256 f10 = e[0U];
+        Lib_IntVector_Intrinsics_vec256 f11 = e[1U];
+        Lib_IntVector_Intrinsics_vec256 f12 = e[2U];
+        Lib_IntVector_Intrinsics_vec256 f13 = e[3U];
+        Lib_IntVector_Intrinsics_vec256 f14 = e[4U];
+        Lib_IntVector_Intrinsics_vec256 a0 = acc[0U];
+        Lib_IntVector_Intrinsics_vec256 a1 = acc[1U];
+        Lib_IntVector_Intrinsics_vec256 a2 = acc[2U];
+        Lib_IntVector_Intrinsics_vec256 a3 = acc[3U];
+        Lib_IntVector_Intrinsics_vec256 a4 = acc[4U];
+        Lib_IntVector_Intrinsics_vec256 a01 = Lib_IntVector_Intrinsics_vec256_add64(a0, f10);
+        Lib_IntVector_Intrinsics_vec256 a11 = Lib_IntVector_Intrinsics_vec256_add64(a1, f11);
+        Lib_IntVector_Intrinsics_vec256 a21 = Lib_IntVector_Intrinsics_vec256_add64(a2, f12);
+        Lib_IntVector_Intrinsics_vec256 a31 = Lib_IntVector_Intrinsics_vec256_add64(a3, f13);
+        Lib_IntVector_Intrinsics_vec256 a41 = Lib_IntVector_Intrinsics_vec256_add64(a4, f14);
+        Lib_IntVector_Intrinsics_vec256 a02 = Lib_IntVector_Intrinsics_vec256_mul64(r0, a01);
+        Lib_IntVector_Intrinsics_vec256 a12 = Lib_IntVector_Intrinsics_vec256_mul64(r1, a01);
+        Lib_IntVector_Intrinsics_vec256 a22 = Lib_IntVector_Intrinsics_vec256_mul64(r2, a01);
+        Lib_IntVector_Intrinsics_vec256 a32 = Lib_IntVector_Intrinsics_vec256_mul64(r3, a01);
+        Lib_IntVector_Intrinsics_vec256 a42 = Lib_IntVector_Intrinsics_vec256_mul64(r4, a01);
+        Lib_IntVector_Intrinsics_vec256
+            a03 =
+                Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a13 =
+                Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a23 =
+                Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r1, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a33 =
+                Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r2, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a43 =
+                Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r3, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a04 =
+                Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a14 =
+                Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a24 =
+                Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a34 =
+                Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r1, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a44 =
+                Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r2, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a05 =
+                Lib_IntVector_Intrinsics_vec256_add64(a04,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r52, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a15 =
+                Lib_IntVector_Intrinsics_vec256_add64(a14,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a25 =
+                Lib_IntVector_Intrinsics_vec256_add64(a24,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a35 =
+                Lib_IntVector_Intrinsics_vec256_add64(a34,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a45 =
+                Lib_IntVector_Intrinsics_vec256_add64(a44,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r1, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a06 =
+                Lib_IntVector_Intrinsics_vec256_add64(a05,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r51, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a16 =
+                Lib_IntVector_Intrinsics_vec256_add64(a15,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r52, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a26 =
+                Lib_IntVector_Intrinsics_vec256_add64(a25,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a36 =
+                Lib_IntVector_Intrinsics_vec256_add64(a35,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a46 =
+                Lib_IntVector_Intrinsics_vec256_add64(a45,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a41));
+        Lib_IntVector_Intrinsics_vec256 t01 = a06;
+        Lib_IntVector_Intrinsics_vec256 t11 = a16;
+        Lib_IntVector_Intrinsics_vec256 t2 = a26;
+        Lib_IntVector_Intrinsics_vec256 t3 = a36;
+        Lib_IntVector_Intrinsics_vec256 t4 = a46;
+        Lib_IntVector_Intrinsics_vec256
+            mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+        Lib_IntVector_Intrinsics_vec256
+            z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t01, mask261);
+        Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+        Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t11, z0);
+        Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+        Lib_IntVector_Intrinsics_vec256
+            z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+        Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+        Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+        Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+        Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+        Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z12);
+        Lib_IntVector_Intrinsics_vec256
+            z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+        Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+        Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+        Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+        Lib_IntVector_Intrinsics_vec256
+            z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+        Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+        Lib_IntVector_Intrinsics_vec256 o0 = x02;
+        Lib_IntVector_Intrinsics_vec256 o1 = x12;
+        Lib_IntVector_Intrinsics_vec256 o2 = x21;
+        Lib_IntVector_Intrinsics_vec256 o3 = x32;
+        Lib_IntVector_Intrinsics_vec256 o4 = x42;
+        acc[0U] = o0;
+        acc[1U] = o1;
+        acc[2U] = o2;
+        acc[3U] = o3;
+        acc[4U] = o4;
+    }
+    if (rem1 > (uint32_t)0U) {
+        uint8_t *last1 = t1 + nb * (uint32_t)16U;
+        Lib_IntVector_Intrinsics_vec256 e[5U];
+        for (uint32_t _i = 0U; _i < (uint32_t)5U; ++_i)
+            e[_i] = Lib_IntVector_Intrinsics_vec256_zero;
+        uint8_t tmp[16U] = { 0U };
+        memcpy(tmp, last1, rem1 * sizeof(last1[0U]));
+        uint64_t u0 = load64_le(tmp);
+        uint64_t lo = u0;
+        uint64_t u = load64_le(tmp + (uint32_t)8U);
+        uint64_t hi = u;
+        Lib_IntVector_Intrinsics_vec256 f0 = Lib_IntVector_Intrinsics_vec256_load64(lo);
+        Lib_IntVector_Intrinsics_vec256 f1 = Lib_IntVector_Intrinsics_vec256_load64(hi);
+        Lib_IntVector_Intrinsics_vec256
+            f010 =
+                Lib_IntVector_Intrinsics_vec256_and(f0,
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f110 =
+                Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                                  (uint32_t)26U),
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f20 =
+                Lib_IntVector_Intrinsics_vec256_or(Lib_IntVector_Intrinsics_vec256_shift_right64(f0,
+                                                                                                 (uint32_t)52U),
+                                                   Lib_IntVector_Intrinsics_vec256_shift_left64(Lib_IntVector_Intrinsics_vec256_and(f1,
+                                                                                                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3fffU)),
+                                                                                                (uint32_t)12U));
+        Lib_IntVector_Intrinsics_vec256
+            f30 =
+                Lib_IntVector_Intrinsics_vec256_and(Lib_IntVector_Intrinsics_vec256_shift_right64(f1,
+                                                                                                  (uint32_t)14U),
+                                                    Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+        Lib_IntVector_Intrinsics_vec256
+            f40 = Lib_IntVector_Intrinsics_vec256_shift_right64(f1, (uint32_t)40U);
+        Lib_IntVector_Intrinsics_vec256 f01 = f010;
+        Lib_IntVector_Intrinsics_vec256 f111 = f110;
+        Lib_IntVector_Intrinsics_vec256 f2 = f20;
+        Lib_IntVector_Intrinsics_vec256 f3 = f30;
+        Lib_IntVector_Intrinsics_vec256 f4 = f40;
+        e[0U] = f01;
+        e[1U] = f111;
+        e[2U] = f2;
+        e[3U] = f3;
+        e[4U] = f4;
+        uint64_t b = (uint64_t)1U << rem1 * (uint32_t)8U % (uint32_t)26U;
+        Lib_IntVector_Intrinsics_vec256 mask = Lib_IntVector_Intrinsics_vec256_load64(b);
+        Lib_IntVector_Intrinsics_vec256 fi = e[rem1 * (uint32_t)8U / (uint32_t)26U];
+        e[rem1 * (uint32_t)8U / (uint32_t)26U] = Lib_IntVector_Intrinsics_vec256_or(fi, mask);
+        Lib_IntVector_Intrinsics_vec256 *r = pre;
+        Lib_IntVector_Intrinsics_vec256 *r5 = pre + (uint32_t)5U;
+        Lib_IntVector_Intrinsics_vec256 r0 = r[0U];
+        Lib_IntVector_Intrinsics_vec256 r1 = r[1U];
+        Lib_IntVector_Intrinsics_vec256 r2 = r[2U];
+        Lib_IntVector_Intrinsics_vec256 r3 = r[3U];
+        Lib_IntVector_Intrinsics_vec256 r4 = r[4U];
+        Lib_IntVector_Intrinsics_vec256 r51 = r5[1U];
+        Lib_IntVector_Intrinsics_vec256 r52 = r5[2U];
+        Lib_IntVector_Intrinsics_vec256 r53 = r5[3U];
+        Lib_IntVector_Intrinsics_vec256 r54 = r5[4U];
+        Lib_IntVector_Intrinsics_vec256 f10 = e[0U];
+        Lib_IntVector_Intrinsics_vec256 f11 = e[1U];
+        Lib_IntVector_Intrinsics_vec256 f12 = e[2U];
+        Lib_IntVector_Intrinsics_vec256 f13 = e[3U];
+        Lib_IntVector_Intrinsics_vec256 f14 = e[4U];
+        Lib_IntVector_Intrinsics_vec256 a0 = acc[0U];
+        Lib_IntVector_Intrinsics_vec256 a1 = acc[1U];
+        Lib_IntVector_Intrinsics_vec256 a2 = acc[2U];
+        Lib_IntVector_Intrinsics_vec256 a3 = acc[3U];
+        Lib_IntVector_Intrinsics_vec256 a4 = acc[4U];
+        Lib_IntVector_Intrinsics_vec256 a01 = Lib_IntVector_Intrinsics_vec256_add64(a0, f10);
+        Lib_IntVector_Intrinsics_vec256 a11 = Lib_IntVector_Intrinsics_vec256_add64(a1, f11);
+        Lib_IntVector_Intrinsics_vec256 a21 = Lib_IntVector_Intrinsics_vec256_add64(a2, f12);
+        Lib_IntVector_Intrinsics_vec256 a31 = Lib_IntVector_Intrinsics_vec256_add64(a3, f13);
+        Lib_IntVector_Intrinsics_vec256 a41 = Lib_IntVector_Intrinsics_vec256_add64(a4, f14);
+        Lib_IntVector_Intrinsics_vec256 a02 = Lib_IntVector_Intrinsics_vec256_mul64(r0, a01);
+        Lib_IntVector_Intrinsics_vec256 a12 = Lib_IntVector_Intrinsics_vec256_mul64(r1, a01);
+        Lib_IntVector_Intrinsics_vec256 a22 = Lib_IntVector_Intrinsics_vec256_mul64(r2, a01);
+        Lib_IntVector_Intrinsics_vec256 a32 = Lib_IntVector_Intrinsics_vec256_mul64(r3, a01);
+        Lib_IntVector_Intrinsics_vec256 a42 = Lib_IntVector_Intrinsics_vec256_mul64(r4, a01);
+        Lib_IntVector_Intrinsics_vec256
+            a03 =
+                Lib_IntVector_Intrinsics_vec256_add64(a02,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a13 =
+                Lib_IntVector_Intrinsics_vec256_add64(a12,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a23 =
+                Lib_IntVector_Intrinsics_vec256_add64(a22,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r1, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a33 =
+                Lib_IntVector_Intrinsics_vec256_add64(a32,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r2, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a43 =
+                Lib_IntVector_Intrinsics_vec256_add64(a42,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r3, a11));
+        Lib_IntVector_Intrinsics_vec256
+            a04 =
+                Lib_IntVector_Intrinsics_vec256_add64(a03,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a14 =
+                Lib_IntVector_Intrinsics_vec256_add64(a13,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a24 =
+                Lib_IntVector_Intrinsics_vec256_add64(a23,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a34 =
+                Lib_IntVector_Intrinsics_vec256_add64(a33,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r1, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a44 =
+                Lib_IntVector_Intrinsics_vec256_add64(a43,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r2, a21));
+        Lib_IntVector_Intrinsics_vec256
+            a05 =
+                Lib_IntVector_Intrinsics_vec256_add64(a04,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r52, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a15 =
+                Lib_IntVector_Intrinsics_vec256_add64(a14,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a25 =
+                Lib_IntVector_Intrinsics_vec256_add64(a24,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a35 =
+                Lib_IntVector_Intrinsics_vec256_add64(a34,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a45 =
+                Lib_IntVector_Intrinsics_vec256_add64(a44,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r1, a31));
+        Lib_IntVector_Intrinsics_vec256
+            a06 =
+                Lib_IntVector_Intrinsics_vec256_add64(a05,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r51, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a16 =
+                Lib_IntVector_Intrinsics_vec256_add64(a15,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r52, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a26 =
+                Lib_IntVector_Intrinsics_vec256_add64(a25,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r53, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a36 =
+                Lib_IntVector_Intrinsics_vec256_add64(a35,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r54, a41));
+        Lib_IntVector_Intrinsics_vec256
+            a46 =
+                Lib_IntVector_Intrinsics_vec256_add64(a45,
+                                                      Lib_IntVector_Intrinsics_vec256_mul64(r0, a41));
+        Lib_IntVector_Intrinsics_vec256 t01 = a06;
+        Lib_IntVector_Intrinsics_vec256 t11 = a16;
+        Lib_IntVector_Intrinsics_vec256 t2 = a26;
+        Lib_IntVector_Intrinsics_vec256 t3 = a36;
+        Lib_IntVector_Intrinsics_vec256 t4 = a46;
+        Lib_IntVector_Intrinsics_vec256
+            mask261 = Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU);
+        Lib_IntVector_Intrinsics_vec256
+            z0 = Lib_IntVector_Intrinsics_vec256_shift_right64(t01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z1 = Lib_IntVector_Intrinsics_vec256_shift_right64(t3, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x0 = Lib_IntVector_Intrinsics_vec256_and(t01, mask261);
+        Lib_IntVector_Intrinsics_vec256 x3 = Lib_IntVector_Intrinsics_vec256_and(t3, mask261);
+        Lib_IntVector_Intrinsics_vec256 x1 = Lib_IntVector_Intrinsics_vec256_add64(t11, z0);
+        Lib_IntVector_Intrinsics_vec256 x4 = Lib_IntVector_Intrinsics_vec256_add64(t4, z1);
+        Lib_IntVector_Intrinsics_vec256
+            z01 = Lib_IntVector_Intrinsics_vec256_shift_right64(x1, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z11 = Lib_IntVector_Intrinsics_vec256_shift_right64(x4, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            t = Lib_IntVector_Intrinsics_vec256_shift_left64(z11, (uint32_t)2U);
+        Lib_IntVector_Intrinsics_vec256 z12 = Lib_IntVector_Intrinsics_vec256_add64(z11, t);
+        Lib_IntVector_Intrinsics_vec256 x11 = Lib_IntVector_Intrinsics_vec256_and(x1, mask261);
+        Lib_IntVector_Intrinsics_vec256 x41 = Lib_IntVector_Intrinsics_vec256_and(x4, mask261);
+        Lib_IntVector_Intrinsics_vec256 x2 = Lib_IntVector_Intrinsics_vec256_add64(t2, z01);
+        Lib_IntVector_Intrinsics_vec256 x01 = Lib_IntVector_Intrinsics_vec256_add64(x0, z12);
+        Lib_IntVector_Intrinsics_vec256
+            z02 = Lib_IntVector_Intrinsics_vec256_shift_right64(x2, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256
+            z13 = Lib_IntVector_Intrinsics_vec256_shift_right64(x01, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x21 = Lib_IntVector_Intrinsics_vec256_and(x2, mask261);
+        Lib_IntVector_Intrinsics_vec256 x02 = Lib_IntVector_Intrinsics_vec256_and(x01, mask261);
+        Lib_IntVector_Intrinsics_vec256 x31 = Lib_IntVector_Intrinsics_vec256_add64(x3, z02);
+        Lib_IntVector_Intrinsics_vec256 x12 = Lib_IntVector_Intrinsics_vec256_add64(x11, z13);
+        Lib_IntVector_Intrinsics_vec256
+            z03 = Lib_IntVector_Intrinsics_vec256_shift_right64(x31, (uint32_t)26U);
+        Lib_IntVector_Intrinsics_vec256 x32 = Lib_IntVector_Intrinsics_vec256_and(x31, mask261);
+        Lib_IntVector_Intrinsics_vec256 x42 = Lib_IntVector_Intrinsics_vec256_add64(x41, z03);
+        Lib_IntVector_Intrinsics_vec256 o0 = x02;
+        Lib_IntVector_Intrinsics_vec256 o1 = x12;
+        Lib_IntVector_Intrinsics_vec256 o2 = x21;
+        Lib_IntVector_Intrinsics_vec256 o3 = x32;
+        Lib_IntVector_Intrinsics_vec256 o4 = x42;
+        acc[0U] = o0;
+        acc[1U] = o1;
+        acc[2U] = o2;
+        acc[3U] = o3;
+        acc[4U] = o4;
+        return;
+    }
+}
+
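[Editorial sketch, not part of the landed patch.] Hacl_Poly1305_256_poly1305_update above first consumes len0 = len/64*64 bytes four 16-byte blocks at a time (load_acc4, the second set of precomputed r-power limbs at pre + 10 and pre + 15, and fmul_r4_normalize at the end of that path), then processes any remaining full 16-byte blocks with r itself, and finally copies a trailing partial block into a zeroed 16-byte buffer and sets the length-dependent padding bit 2^(8*rem1). The way a 16-byte block is packed into five 26-bit limbs with the 2^128 padding bit (the 0x1000000 constant, i.e. bit 24 of limb 4) can be summarized by the following scalar sketch; the helper name is illustrative, and the direct memcpy loads assume a little-endian host, which load64_le handles portably in the real code.

    #include <stdint.h>
    #include <string.h>

    /* Illustrative scalar analogue of the block encoding used above:
     * split a 128-bit little-endian block into five 26-bit limbs and set
     * the 2^128 pad bit, mirroring the f010/f110/f20/f30/f40 computation. */
    static void poly1305_encode_block_sketch(uint64_t e[5], const uint8_t block[16])
    {
        uint64_t lo, hi;
        memcpy(&lo, block, 8);      /* little-endian host assumed here */
        memcpy(&hi, block + 8, 8);
        const uint64_t mask26 = 0x3ffffffULL;
        e[0] = lo & mask26;                             /* bits   0.. 25 */
        e[1] = (lo >> 26) & mask26;                     /* bits  26.. 51 */
        e[2] = (lo >> 52) | ((hi & 0x3fffULL) << 12);   /* bits  52.. 77 */
        e[3] = (hi >> 14) & mask26;                     /* bits  78..103 */
        e[4] = (hi >> 40) | ((uint64_t)1 << 24);        /* bits 104..127 + pad 2^128 */
    }

Full blocks always carry the 2^128 pad; only the final partial block uses the length-dependent bit, which is how messages of different lengths remain distinguished in the MAC.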
+void
+Hacl_Poly1305_256_poly1305_finish(
+    uint8_t *tag,
+    uint8_t *key,
+    Lib_IntVector_Intrinsics_vec256 *ctx)
+{
+    Lib_IntVector_Intrinsics_vec256 *acc = ctx;
+    uint8_t *ks = key + (uint32_t)16U;
+    Lib_IntVector_Intrinsics_vec256 f0 = acc[0U];
+    Lib_IntVector_Intrinsics_vec256 f13 = acc[1U];
+    Lib_IntVector_Intrinsics_vec256 f23 = acc[2U];
+    Lib_IntVector_Intrinsics_vec256 f33 = acc[3U];
+    Lib_IntVector_Intrinsics_vec256 f40 = acc[4U];
+    Lib_IntVector_Intrinsics_vec256
+        l0 = Lib_IntVector_Intrinsics_vec256_add64(f0, Lib_IntVector_Intrinsics_vec256_zero);
+    Lib_IntVector_Intrinsics_vec256
+        tmp00 =
+            Lib_IntVector_Intrinsics_vec256_and(l0,
+                                                Lib_IntVector_Intrinsics_vec256_load64((uint64_t)0x3ffffffU));
+    Lib_IntVector_Intrinsics_vec256