!pip install tensorflow_transform==1.4.0
!pip install apache-beam==2.39.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow_transform==1.4.0
  Downloading tensorflow_transform-1.4.0-py3-none-any.whl (413 kB)
     |████████████████████████████████| 413 kB 15.7 MB/s 
Collecting apache-beam[gcp]<3,>=2.33
  Downloading apache_beam-2.41.0-cp37-cp37m-manylinux2010_x86_64.whl (10.9 MB)
     |████████████████████████████████| 10.9 MB 55.1 MB/s 
Collecting tfx-bsl<1.5.0,>=1.4.0
  Downloading tfx_bsl-1.4.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (19.1 MB)
     |████████████████████████████████| 19.1 MB 460 kB/s 
Collecting tensorflow-metadata<1.5.0,>=1.4.0
  Downloading tensorflow_metadata-1.4.0-py3-none-any.whl (48 kB)
     |████████████████████████████████| 48 kB 5.3 MB/s 
Requirement already satisfied: pydot<2,>=1.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow_transform==1.4.0) (1.3.0)
Collecting tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2
  Downloading https://us-python.pkg.dev/colab-wheels/public/tensorflow/tensorflow-2.6.5%2Bzzzcolab20220523104206-cp37-cp37m-linux_x86_64.whl (570.3 MB)
     |████████████████████████████████| 570.3 MB 291 bytes/s 
Requirement already satisfied: protobuf<4,>=3.13 in /usr/local/lib/python3.7/dist-packages (from tensorflow_transform==1.4.0) (3.17.3)
Collecting numpy<1.20,>=1.16
  Downloading numpy-1.19.5-cp37-cp37m-manylinux2010_x86_64.whl (14.8 MB)
     |████████████████████████████████| 14.8 MB 54.1 MB/s 
Collecting absl-py<0.13,>=0.9
  Downloading absl_py-0.12.0-py3-none-any.whl (129 kB)
     |████████████████████████████████| 129 kB 68.0 MB/s 
Collecting pyarrow<6,>=1
  Downloading pyarrow-5.0.0-cp37-cp37m-manylinux2014_x86_64.whl (23.6 MB)
     |████████████████████████████████| 23.6 MB 1.2 MB/s 
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from absl-py<0.13,>=0.9->tensorflow_transform==1.4.0) (1.15.0)
Collecting hdfs<3.0.0,>=2.1.0
  Downloading hdfs-2.7.0-py3-none-any.whl (34 kB)
Requirement already satisfied: grpcio<2,>=1.33.1 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (1.47.0)
Collecting cloudpickle<3,>=2.1.0
  Downloading cloudpickle-2.1.0-py3-none-any.whl (25 kB)
Requirement already satisfied: crcmod<2.0,>=1.7 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (1.7)
Collecting pymongo<4.0.0,>=3.8.0
  Downloading pymongo-3.12.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (508 kB)
     |████████████████████████████████| 508 kB 61.9 MB/s 
Collecting proto-plus<2,>=1.7.1
  Downloading proto_plus-1.22.1-py3-none-any.whl (47 kB)
     |████████████████████████████████| 47 kB 6.0 MB/s 
Collecting orjson<4.0
  Downloading orjson-3.8.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (270 kB)
     |████████████████████████████████| 270 kB 75.2 MB/s 
Requirement already satisfied: typing-extensions>=3.7.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (4.1.1)
Collecting dill<0.3.2,>=0.3.1.1
  Downloading dill-0.3.1.1.tar.gz (151 kB)
     |████████████████████████████████| 151 kB 66.3 MB/s 
Collecting fastavro<2,>=0.23.6
  Downloading fastavro-1.6.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB)
     |████████████████████████████████| 2.4 MB 60.1 MB/s 
Requirement already satisfied: httplib2<0.21.0,>=0.8 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (0.17.4)
Requirement already satisfied: python-dateutil<3,>=2.8.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (2.8.2)
Collecting requests<3.0.0,>=2.24.0
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
     |████████████████████████████████| 62 kB 1.4 MB/s 
Requirement already satisfied: pytz>=2018.3 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (2022.2.1)
Collecting google-auth-httplib2<0.2.0,>=0.1.0
  Downloading google_auth_httplib2-0.1.0-py2.py3-none-any.whl (9.3 kB)
Collecting google-apitools<0.5.32,>=0.5.31
  Downloading google-apitools-0.5.31.tar.gz (173 kB)
     |████████████████████████████████| 173 kB 78.2 MB/s 
Requirement already satisfied: google-api-core!=2.8.2,<3 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (1.31.6)
Collecting grpcio-gcp<1,>=0.2.2
  Downloading grpcio_gcp-0.2.2-py2.py3-none-any.whl (9.4 kB)
Collecting google-cloud-vision<2,>=0.38.0
  Downloading google_cloud_vision-1.0.2-py2.py3-none-any.whl (435 kB)
     |████████████████████████████████| 435 kB 68.8 MB/s 
Collecting google-cloud-spanner<2,>=1.13.0
  Downloading google_cloud_spanner-1.19.3-py2.py3-none-any.whl (255 kB)
     |████████████████████████████████| 255 kB 74.9 MB/s 
Collecting google-cloud-bigquery-storage<2.14,>=2.6.3
  Downloading google_cloud_bigquery_storage-2.13.2-py2.py3-none-any.whl (180 kB)
     |████████████████████████████████| 180 kB 78.0 MB/s 
Requirement already satisfied: cachetools<5,>=3.1.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (4.2.4)
Collecting google-cloud-videointelligence<2,>=1.8.0
  Downloading google_cloud_videointelligence-1.16.3-py2.py3-none-any.whl (183 kB)
     |████████████████████████████████| 183 kB 75.5 MB/s 
Collecting google-cloud-bigtable<2,>=0.31.1
  Downloading google_cloud_bigtable-1.7.2-py2.py3-none-any.whl (267 kB)
     |████████████████████████████████| 267 kB 74.2 MB/s 
Collecting google-cloud-language<2,>=1.3.0
  Downloading google_cloud_language-1.3.2-py2.py3-none-any.whl (83 kB)
     |████████████████████████████████| 83 kB 2.2 MB/s 
Requirement already satisfied: google-cloud-bigquery<3,>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (1.21.0)
Requirement already satisfied: google-cloud-core<3,>=0.28.1 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (1.0.3)
Collecting google-cloud-pubsub<3,>=2.1.0
  Downloading google_cloud_pubsub-2.13.6-py2.py3-none-any.whl (235 kB)
     |████████████████████████████████| 235 kB 71.9 MB/s 
Requirement already satisfied: google-cloud-datastore<2,>=1.8.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (1.8.0)
Collecting google-cloud-pubsublite<2,>=1.2.0
  Downloading google_cloud_pubsublite-1.4.3-py2.py3-none-any.whl (267 kB)
     |████████████████████████████████| 267 kB 77.9 MB/s 
Requirement already satisfied: google-auth<3,>=1.18.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (1.35.0)
Collecting google-cloud-dlp<4,>=3.0.0
  Downloading google_cloud_dlp-3.8.1-py2.py3-none-any.whl (119 kB)
     |████████████████████████████████| 119 kB 76.6 MB/s 
Collecting google-cloud-recommendations-ai<0.8.0,>=0.1.0
  Downloading google_cloud_recommendations_ai-0.7.1-py2.py3-none-any.whl (148 kB)
     |████████████████████████████████| 148 kB 71.2 MB/s 
Requirement already satisfied: setuptools>=40.3.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core!=2.8.2,<3->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (57.4.0)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core!=2.8.2,<3->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (1.56.4)
Requirement already satisfied: packaging>=14.3 in /usr/local/lib/python3.7/dist-packages (from google-api-core!=2.8.2,<3->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (21.3)
Collecting fasteners>=0.14
  Downloading fasteners-0.17.3-py3-none-any.whl (18 kB)
Requirement already satisfied: oauth2client>=1.4.12 in /usr/local/lib/python3.7/dist-packages (from google-apitools<0.5.32,>=0.5.31->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (4.1.3)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.18.0->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (0.2.8)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.18.0->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (4.9)
Requirement already satisfied: google-resumable-media!=0.4.0,<0.5.0dev,>=0.3.1 in /usr/local/lib/python3.7/dist-packages (from google-cloud-bigquery<3,>=1.6.0->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (0.4.1)
Collecting protobuf<4,>=3.13
  Downloading protobuf-3.20.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB)
     |████████████████████████████████| 1.0 MB 60.2 MB/s 
Collecting google-cloud-core<3,>=0.28.1
  Downloading google_cloud_core-1.7.3-py2.py3-none-any.whl (28 kB)
Collecting grpc-google-iam-v1<0.13dev,>=0.12.3
  Downloading grpc_google_iam_v1-0.12.4-py2.py3-none-any.whl (26 kB)
Collecting google-cloud-dlp<4,>=3.0.0
  Downloading google_cloud_dlp-3.8.0-py2.py3-none-any.whl (119 kB)
     |████████████████████████████████| 119 kB 72.0 MB/s 
  Downloading google_cloud_dlp-3.7.1-py2.py3-none-any.whl (118 kB)
     |████████████████████████████████| 118 kB 77.9 MB/s 
Collecting grpcio-status>=1.16.0
  Downloading grpcio_status-1.48.1-py3-none-any.whl (14 kB)
Collecting google-cloud-pubsub<3,>=2.1.0
  Downloading google_cloud_pubsub-2.13.5-py2.py3-none-any.whl (234 kB)
     |████████████████████████████████| 234 kB 69.0 MB/s 
  Downloading google_cloud_pubsub-2.13.4-py2.py3-none-any.whl (234 kB)
     |████████████████████████████████| 234 kB 67.7 MB/s 
  Downloading google_cloud_pubsub-2.13.3-py2.py3-none-any.whl (234 kB)
     |████████████████████████████████| 234 kB 78.7 MB/s 
  Downloading google_cloud_pubsub-2.13.2-py2.py3-none-any.whl (234 kB)
     |████████████████████████████████| 234 kB 75.8 MB/s 
  Downloading google_cloud_pubsub-2.13.1-py2.py3-none-any.whl (234 kB)
     |████████████████████████████████| 234 kB 67.8 MB/s 
Collecting overrides<7.0.0,>=6.0.1
  Downloading overrides-6.2.0-py3-none-any.whl (17 kB)
Collecting google-cloud-pubsublite<2,>=1.2.0
  Downloading google_cloud_pubsublite-1.4.2-py2.py3-none-any.whl (265 kB)
     |████████████████████████████████| 265 kB 73.0 MB/s 
Collecting google-cloud-recommendations-ai<0.8.0,>=0.1.0
  Downloading google_cloud_recommendations_ai-0.7.0-py2.py3-none-any.whl (148 kB)
     |████████████████████████████████| 148 kB 68.1 MB/s 
  Downloading google_cloud_recommendations_ai-0.6.2-py2.py3-none-any.whl (147 kB)
     |████████████████████████████████| 147 kB 71.7 MB/s 
Collecting grpcio<2,>=1.33.1
  Downloading grpcio-1.48.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB)
     |████████████████████████████████| 4.6 MB 55.4 MB/s 
Collecting docopt
  Downloading docopt-0.6.2.tar.gz (25 kB)
Requirement already satisfied: pyasn1>=0.1.7 in /usr/local/lib/python3.7/dist-packages (from oauth2client>=1.4.12->google-apitools<0.5.32,>=0.5.31->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (0.4.8)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=14.3->google-api-core!=2.8.2,<3->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (3.0.9)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (2022.6.15)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (2.10)
Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (2.1.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam[gcp]<3,>=2.33->tensorflow_transform==1.4.0) (1.24.3)
Collecting gast==0.4.0
  Downloading gast-0.4.0-py3-none-any.whl (9.8 kB)
Requirement already satisfied: astunparse~=1.6.3 in /usr/local/lib/python3.7/dist-packages (from tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (1.6.3)
Requirement already satisfied: h5py~=3.1.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (3.1.0)
Collecting clang~=5.0
  Downloading clang-5.0.tar.gz (30 kB)
Collecting tensorboard<2.7,>=2.6.0
  Downloading tensorboard-2.6.0-py3-none-any.whl (5.6 MB)
     |████████████████████████████████| 5.6 MB 28.4 MB/s 
Collecting wrapt~=1.12.1
  Downloading wrapt-1.12.1.tar.gz (27 kB)
Requirement already satisfied: opt-einsum~=3.3.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (3.3.0)
Collecting tensorflow-estimator<2.7,>=2.6.0
  Downloading tensorflow_estimator-2.6.0-py2.py3-none-any.whl (462 kB)
     |████████████████████████████████| 462 kB 63.6 MB/s 
Requirement already satisfied: google-pasta~=0.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (0.2.0)
Requirement already satisfied: keras-preprocessing~=1.1.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (1.1.2)
Collecting keras<2.7,>=2.6.0
  Downloading keras-2.6.0-py2.py3-none-any.whl (1.3 MB)
     |████████████████████████████████| 1.3 MB 60.4 MB/s 
Collecting typing-extensions>=3.7.0
  Downloading typing_extensions-3.10.0.2-py3-none-any.whl (26 kB)
Requirement already satisfied: termcolor~=1.1.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (1.1.0)
Collecting protobuf<4,>=3.13
  Downloading protobuf-3.19.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 62.7 MB/s 
Requirement already satisfied: wheel~=0.35 in /usr/local/lib/python3.7/dist-packages (from tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (0.37.1)
Collecting flatbuffers~=1.12.0
  Downloading flatbuffers-1.12-py2.py3-none-any.whl (15 kB)
Requirement already satisfied: cached-property in /usr/local/lib/python3.7/dist-packages (from h5py~=3.1.0->tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (1.5.2)
Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.7,>=2.6.0->tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (0.6.1)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.7,>=2.6.0->tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (0.4.6)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.7,>=2.6.0->tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (3.4.1)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.7,>=2.6.0->tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (1.8.1)
Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.7,>=2.6.0->tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (1.0.1)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.7,>=2.6.0->tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (1.3.1)
Requirement already satisfied: importlib-metadata>=4.4 in /usr/local/lib/python3.7/dist-packages (from markdown>=2.6.8->tensorboard<2.7,>=2.6.0->tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (4.12.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.7,>=2.6.0->tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (3.8.1)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.7,>=2.6.0->tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<2.7,>=1.15.2->tensorflow_transform==1.4.0) (3.2.0)
Collecting tensorflow-serving-api!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<3,>=1.15
  Downloading tensorflow_serving_api-2.9.1-py2.py3-none-any.whl (37 kB)
Requirement already satisfied: pandas<2,>=1.0 in /usr/local/lib/python3.7/dist-packages (from tfx-bsl<1.5.0,>=1.4.0->tensorflow_transform==1.4.0) (1.3.5)
Requirement already satisfied: google-api-python-client<2,>=1.7.11 in /usr/local/lib/python3.7/dist-packages (from tfx-bsl<1.5.0,>=1.4.0->tensorflow_transform==1.4.0) (1.12.11)
Requirement already satisfied: uritemplate<4dev,>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client<2,>=1.7.11->tfx-bsl<1.5.0,>=1.4.0->tensorflow_transform==1.4.0) (3.0.1)
  Downloading tensorflow_serving_api-2.9.0-py2.py3-none-any.whl (37 kB)
  Downloading tensorflow_serving_api-2.8.2-py2.py3-none-any.whl (37 kB)
  Downloading tensorflow_serving_api-2.8.0-py2.py3-none-any.whl (37 kB)
  Downloading tensorflow_serving_api-2.7.3-py2.py3-none-any.whl (37 kB)
  Downloading tensorflow_serving_api-2.7.0-py2.py3-none-any.whl (37 kB)
  Downloading tensorflow_serving_api-2.6.5-py2.py3-none-any.whl (37 kB)
Building wheels for collected packages: dill, google-apitools, clang, wrapt, docopt
  Building wheel for dill (setup.py) ... done
  Created wheel for dill: filename=dill-0.3.1.1-py3-none-any.whl size=78544 sha256=a8628e8e02ff369c897f050aa2f251d26a0341f9b68da31a497e4b386fe37c80
  Stored in directory: /root/.cache/pip/wheels/a4/61/fd/c57e374e580aa78a45ed78d5859b3a44436af17e22ca53284f
  Building wheel for google-apitools (setup.py) ... done
  Created wheel for google-apitools: filename=google_apitools-0.5.31-py3-none-any.whl size=131039 sha256=ba564574c87d171c595ecddc69de09bcc119d0445b98a6922bbbc36bbc1e8189
  Stored in directory: /root/.cache/pip/wheels/19/b5/2f/1cc3cf2b31e7a9cd1508731212526d9550271274d351c96f16
  Building wheel for clang (setup.py) ... done
  Created wheel for clang: filename=clang-5.0-py3-none-any.whl size=30694 sha256=fa3e6962c1abc57cf4211f26c27bd362f7b483e2b942a2eda131a3bc1512268f
  Stored in directory: /root/.cache/pip/wheels/98/91/04/971b4c587cf47ae952b108949b46926f426c02832d120a082a
  Building wheel for wrapt (setup.py) ... done
  Created wheel for wrapt: filename=wrapt-1.12.1-cp37-cp37m-linux_x86_64.whl size=68722 sha256=353d3c2f1dafc8dd36d22d5f87e48638a82ed14943b2399864d1ceee16755601
  Stored in directory: /root/.cache/pip/wheels/62/76/4c/aa25851149f3f6d9785f6c869387ad82b3fd37582fa8147ac6
  Building wheel for docopt (setup.py) ... done
  Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13723 sha256=ed83d225a89465c2c92bfa144901c8d043cc6101679f99ceb55fcea26958ec6d
  Stored in directory: /root/.cache/pip/wheels/72/b0/3f/1d95f96ff986c7dfffe46ce2be4062f38ebd04b506c77c81b9
Successfully built dill google-apitools clang wrapt docopt
Installing collected packages: protobuf, typing-extensions, requests, grpcio, proto-plus, numpy, grpcio-status, grpcio-gcp, grpc-google-iam-v1, docopt, absl-py, wrapt, tensorflow-estimator, tensorboard, pymongo, pyarrow, overrides, orjson, keras, hdfs, google-cloud-pubsub, google-cloud-core, gast, flatbuffers, fasteners, fastavro, dill, cloudpickle, clang, tensorflow, google-cloud-vision, google-cloud-videointelligence, google-cloud-spanner, google-cloud-recommendations-ai, google-cloud-pubsublite, google-cloud-language, google-cloud-dlp, google-cloud-bigtable, google-cloud-bigquery-storage, google-auth-httplib2, google-apitools, apache-beam, tensorflow-serving-api, tensorflow-metadata, tfx-bsl, tensorflow-transform
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.17.3
    Uninstalling protobuf-3.17.3:
      Successfully uninstalled protobuf-3.17.3
  Attempting uninstall: typing-extensions
    Found existing installation: typing-extensions 4.1.1
    Uninstalling typing-extensions-4.1.1:
      Successfully uninstalled typing-extensions-4.1.1
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
  Attempting uninstall: grpcio
    Found existing installation: grpcio 1.47.0
    Uninstalling grpcio-1.47.0:
      Successfully uninstalled grpcio-1.47.0
  Attempting uninstall: numpy
    Found existing installation: numpy 1.21.6
    Uninstalling numpy-1.21.6:
      Successfully uninstalled numpy-1.21.6
  Attempting uninstall: absl-py
    Found existing installation: absl-py 1.2.0
    Uninstalling absl-py-1.2.0:
      Successfully uninstalled absl-py-1.2.0
  Attempting uninstall: wrapt
    Found existing installation: wrapt 1.14.1
    Uninstalling wrapt-1.14.1:
      Successfully uninstalled wrapt-1.14.1
  Attempting uninstall: tensorflow-estimator
    Found existing installation: tensorflow-estimator 2.8.0
    Uninstalling tensorflow-estimator-2.8.0:
      Successfully uninstalled tensorflow-estimator-2.8.0
  Attempting uninstall: tensorboard
    Found existing installation: tensorboard 2.8.0
    Uninstalling tensorboard-2.8.0:
      Successfully uninstalled tensorboard-2.8.0
  Attempting uninstall: pymongo
    Found existing installation: pymongo 4.2.0
    Uninstalling pymongo-4.2.0:
      Successfully uninstalled pymongo-4.2.0
  Attempting uninstall: pyarrow
    Found existing installation: pyarrow 6.0.1
    Uninstalling pyarrow-6.0.1:
      Successfully uninstalled pyarrow-6.0.1
  Attempting uninstall: keras
    Found existing installation: keras 2.8.0
    Uninstalling keras-2.8.0:
      Successfully uninstalled keras-2.8.0
  Attempting uninstall: google-cloud-core
    Found existing installation: google-cloud-core 1.0.3
    Uninstalling google-cloud-core-1.0.3:
      Successfully uninstalled google-cloud-core-1.0.3
  Attempting uninstall: gast
    Found existing installation: gast 0.5.3
    Uninstalling gast-0.5.3:
      Successfully uninstalled gast-0.5.3
  Attempting uninstall: flatbuffers
    Found existing installation: flatbuffers 2.0.7
    Uninstalling flatbuffers-2.0.7:
      Successfully uninstalled flatbuffers-2.0.7
  Attempting uninstall: dill
    Found existing installation: dill 0.3.5.1
    Uninstalling dill-0.3.5.1:
      Successfully uninstalled dill-0.3.5.1
  Attempting uninstall: cloudpickle
    Found existing installation: cloudpickle 1.5.0
    Uninstalling cloudpickle-1.5.0:
      Successfully uninstalled cloudpickle-1.5.0
  Attempting uninstall: tensorflow
    Found existing installation: tensorflow 2.8.2+zzzcolab20220719082949
    Uninstalling tensorflow-2.8.2+zzzcolab20220719082949:
      Successfully uninstalled tensorflow-2.8.2+zzzcolab20220719082949
  Attempting uninstall: google-cloud-language
    Found existing installation: google-cloud-language 1.2.0
    Uninstalling google-cloud-language-1.2.0:
      Successfully uninstalled google-cloud-language-1.2.0
  Attempting uninstall: google-cloud-bigquery-storage
    Found existing installation: google-cloud-bigquery-storage 1.1.2
    Uninstalling google-cloud-bigquery-storage-1.1.2:
      Successfully uninstalled google-cloud-bigquery-storage-1.1.2
  Attempting uninstall: google-auth-httplib2
    Found existing installation: google-auth-httplib2 0.0.4
    Uninstalling google-auth-httplib2-0.0.4:
      Successfully uninstalled google-auth-httplib2-0.0.4
  Attempting uninstall: tensorflow-metadata
    Found existing installation: tensorflow-metadata 1.10.0
    Uninstalling tensorflow-metadata-1.10.0:
      Successfully uninstalled tensorflow-metadata-1.10.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
xarray-einstats 0.2.2 requires numpy>=1.21, but you have numpy 1.19.5 which is incompatible.
cmdstanpy 1.0.7 requires numpy>=1.21, but you have numpy 1.19.5 which is incompatible.
Successfully installed absl-py-0.12.0 apache-beam-2.41.0 clang-5.0 cloudpickle-2.1.0 dill-0.3.1.1 docopt-0.6.2 fastavro-1.6.0 fasteners-0.17.3 flatbuffers-1.12 gast-0.4.0 google-apitools-0.5.31 google-auth-httplib2-0.1.0 google-cloud-bigquery-storage-2.13.2 google-cloud-bigtable-1.7.2 google-cloud-core-1.7.3 google-cloud-dlp-3.7.1 google-cloud-language-1.3.2 google-cloud-pubsub-2.13.1 google-cloud-pubsublite-1.4.2 google-cloud-recommendations-ai-0.6.2 google-cloud-spanner-1.19.3 google-cloud-videointelligence-1.16.3 google-cloud-vision-1.0.2 grpc-google-iam-v1-0.12.4 grpcio-1.48.1 grpcio-gcp-0.2.2 grpcio-status-1.48.1 hdfs-2.7.0 keras-2.6.0 numpy-1.19.5 orjson-3.8.0 overrides-6.2.0 proto-plus-1.22.1 protobuf-3.19.4 pyarrow-5.0.0 pymongo-3.12.3 requests-2.28.1 tensorboard-2.6.0 tensorflow-2.6.5+zzzcolab20220523104206 tensorflow-estimator-2.6.0 tensorflow-metadata-1.4.0 tensorflow-serving-api-2.6.5 tensorflow-transform-1.4.0 tfx-bsl-1.4.0 typing-extensions-3.10.0.2 wrapt-1.12.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting apache-beam==2.39.0
  Downloading apache_beam-2.39.0-cp37-cp37m-manylinux2010_x86_64.whl (10.3 MB)
     |████████████████████████████████| 10.3 MB 15.1 MB/s 
Requirement already satisfied: grpcio<2,>=1.29.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (1.48.1)
Requirement already satisfied: numpy<1.23.0,>=1.14.3 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (1.19.5)
Requirement already satisfied: pyarrow<8.0.0,>=0.15.1 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (5.0.0)
Requirement already satisfied: cloudpickle<3,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (2.1.0)
Requirement already satisfied: proto-plus<2,>=1.7.1 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (1.22.1)
Requirement already satisfied: pymongo<4.0.0,>=3.8.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (3.12.3)
Requirement already satisfied: python-dateutil<3,>=2.8.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (2.8.2)
Requirement already satisfied: typing-extensions>=3.7.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (3.10.0.2)
Requirement already satisfied: protobuf<4,>=3.12.2 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (3.19.4)
Requirement already satisfied: crcmod<2.0,>=1.7 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (1.7)
Requirement already satisfied: requests<3.0.0,>=2.24.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (2.28.1)
Requirement already satisfied: orjson<4.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (3.8.0)
Requirement already satisfied: pydot<2,>=1.2.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (1.3.0)
Requirement already satisfied: fastavro<2,>=0.23.6 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (1.6.0)
Requirement already satisfied: httplib2<0.20.0,>=0.8 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (0.17.4)
Requirement already satisfied: hdfs<3.0.0,>=2.1.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (2.7.0)
Requirement already satisfied: dill<0.3.2,>=0.3.1.1 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (0.3.1.1)
Requirement already satisfied: pytz>=2018.3 in /usr/local/lib/python3.7/dist-packages (from apache-beam==2.39.0) (2022.2.1)
Requirement already satisfied: six>=1.5.2 in /usr/local/lib/python3.7/dist-packages (from grpcio<2,>=1.29.0->apache-beam==2.39.0) (1.15.0)
Requirement already satisfied: docopt in /usr/local/lib/python3.7/dist-packages (from hdfs<3.0.0,>=2.1.0->apache-beam==2.39.0) (0.6.2)
Requirement already satisfied: pyparsing>=2.1.4 in /usr/local/lib/python3.7/dist-packages (from pydot<2,>=1.2.0->apache-beam==2.39.0) (3.0.9)
Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam==2.39.0) (2.1.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam==2.39.0) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam==2.39.0) (2022.6.15)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.24.0->apache-beam==2.39.0) (1.24.3)
Installing collected packages: apache-beam
  Attempting uninstall: apache-beam
    Found existing installation: apache-beam 2.41.0
    Uninstalling apache-beam-2.41.0:
      Successfully uninstalled apache-beam-2.41.0
Successfully installed apache-beam-2.39.0


import apache_beam as beam
print('Apache Beam version: {}'.format(beam.__version__))

import tensorflow as tf
print('Tensorflow version: {}'.format(tf.__version__))

import tensorflow_transform as tft
from tensorflow_transform import beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import schema_utils
print('TensorFlow Transform version: {}'.format(tft.__version__))

Apache Beam version: 2.39.0
Tensorflow version: 2.6.5
TensorFlow Transform version: 1.4.0


import os

# Directory of the raw data files
DATA_DIR = '/content/data/'

# Download the dataset
!wget -nc https://raw.githubusercontent.com/https-deeplearning-ai/MLEP-public/main/course2/week4-ungraded-lab/data/jena_climate_2009_2016.csv -P {DATA_DIR}

# Assign data path to a variable for easy reference
INPUT_FILE = os.path.join(DATA_DIR, 'jena_climate_2009_2016.csv')

--2022-09-03 08:21:46--  https://raw.githubusercontent.com/https-deeplearning-ai/MLEP-public/main/course2/week4-ungraded-lab/data/jena_climate_2009_2016.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 43164220 (41M) [text/plain]
Saving to: ‘/content/data/jena_climate_2009_2016.csv’

jena_climate_2009_2 100%[===================>]  41.16M   198MB/s    in 0.2s    

2022-09-03 08:21:49 (198 MB/s) - ‘/content/data/jena_climate_2009_2016.csv’ saved [43164220/43164220]


import pandas as pd

# Put dataset in a dataframe
df = pd.read_csv(INPUT_FILE, header=0, index_col=0)

# Preview the last few rows
df.tail()


df.describe().transpose()


# Define feature keys

TIMESTAMP_FEATURES = ["Date Time"]
NUMERIC_FEATURES = [
    "p (mbar)",
    "T (degC)",
    "Tpot (K)",
    "Tdew (degC)", 
    "rh (%)", 
    "VPmax (mbar)", 
    "VPact (mbar)", 
    "VPdef (mbar)", 
    "sh (g/kg)",
    "H2OC (mmol/mol)",
    "rho (g/m**3)",
    "wv (m/s)",
    "max. wv (m/s)",
    "wd (deg)",
]


#@ title Visualization Utilities
import matplotlib.pyplot as plt

# Color Palette
colors = [
    "blue",
    "orange",
    "green",
    "red",
    "purple",
    "brown",
    "pink",
    "gray",
    "olive",
    "cyan",
]

# Plots each column as a time series
def visualize_plots(dataset, columns):
    features = dataset[columns]
    fig, axes = plt.subplots(
        nrows=len(columns)//2 + len(columns)%2, ncols=2, figsize=(15, 20), dpi=80, facecolor="w", edgecolor="k"
    )
    for i, col in enumerate(columns):
        c = colors[i % (len(colors))]
        t_data = dataset[col]
        t_data.index = dataset.index
        t_data.head()
        ax = t_data.plot(
            ax=axes[i // 2, i % 2],
            color=c,
            title="{}".format(col),
            rot=25,
        )
    ax.legend([col])
    plt.tight_layout()


# Visualize the dataset
visualize_plots(df, NUMERIC_FEATURES)


# Set the wind velocity outliers to 0
wv = df['wv (m/s)']
bad_wv = wv == -9999.0
wv[bad_wv] = 0.0

# Set the max wind velocity outliers to 0
max_wv = df['max. wv (m/s)']
bad_max_wv = max_wv == -9999.0
max_wv[bad_max_wv] = 0.0


visualize_plots(df, NUMERIC_FEATURES)


import seaborn as sns

def show_correlation_heatmap(dataframe):
    plt.figure(figsize=(20,20))
    cor = dataframe.corr()
    sns.heatmap(cor, annot=True, cmap=plt.cm.PuBu)
    plt.show()


show_correlation_heatmap(df)


# Features to filter out
FEATURES_TO_REMOVE = ["Tpot (K)", "Tdew (degC)","VPact (mbar)" , "H2OC (mmol/mol)", "max. wv (m/s)"]


from datetime import datetime

# combine features into one list
ordered_columns = TIMESTAMP_FEATURES + NUMERIC_FEATURES

# index of the date time string
date_time_idx = ordered_columns.index(TIMESTAMP_FEATURES[0])

# index of the 'wv (m/s)' feature
wv_idx = ordered_columns.index('wv (m/s)')

def clean_fn(line):
  '''
  Converts datetime strings in the CSV to Unix timestamps and removes outliers
  the wind velocity column. Used as part of
  the transform pipeline.

  Args:
    line (string) - one row of a CSV file
  
  Returns:

  '''

  # Split the CSV string to a list
  line_split = line.split(b',')

  # Decodes the timestamp string to utf-8
  date_time_string = line_split[date_time_idx].decode("utf-8")

  # Creates a datetime object from the timestamp string
  date_time = datetime.strptime(date_time_string, '%d.%m.%Y %H:%M:%S')

  # Generates a timestamp from the object
  timestamp = datetime.timestamp(date_time)

  # Overwrites the string timestamp in the row with the timestamp in seconds
  line_split[date_time_idx] = bytes(str(timestamp), 'utf-8')

  # Check if wind velocity is an outlier
  if line_split[wv_idx] == b'-9999.0':

    # Overwrite with default value of 0
    line_split[wv_idx] = b'0.0'

  # rejoin the list item into one string
  mod_line = b','.join(line_split)

  return mod_line


import numpy as np
import math as m

def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""
  
  outputs = inputs.copy()

  # Filter redundant features
  for key in FEATURES_TO_REMOVE:
    del outputs[key]

  # Convert degrees to radians
  pi = tf.constant(m.pi)
  wd_rad = inputs['wd (deg)'] * pi / 180.0

  # Calculate the wind x and y components.
  outputs['Wx'] = inputs['wv (m/s)'] * tf.math.cos(wd_rad)
  outputs['Wy'] = inputs['wv (m/s)'] * tf.math.sin(wd_rad)

  # Delete `wv (m/s)` and `wd (deg)` after getting the wind vector
  del outputs['wv (m/s)']
  del outputs['wd (deg)']

  # Get day and year in seconds
  day = tf.cast(24*60*60, tf.float32)
  year = tf.cast((365.2425)*day, tf.float32)

  # Get timestamp feature
  timestamp_s = outputs['Date Time']

  # Convert timestamps into periodic signals
  outputs['Day sin'] = tf.math.sin(timestamp_s * (2 * pi / day))
  outputs['Day cos'] = tf.math.cos(timestamp_s * (2 * pi / day))
  outputs['Year sin'] = tf.math.sin(timestamp_s * (2 * pi / year))
  outputs['Year cos'] = tf.math.cos(timestamp_s * (2 * pi / year))

  # Delete timestamp feature
  del outputs['Date Time']

  # Declare final list of features
  FINAL_FEATURE_LIST =  ["p (mbar)",
    "T (degC)",
    "rh (%)", 
    "VPmax (mbar)", 
    "VPdef (mbar)", 
    "sh (g/kg)",
    "rho (g/m**3)",
    "Wx",
    "Wy",
    "Day sin",
    'Day cos',
    'Year sin',
    'Year cos'
    ]

  # Scale all features
  for key in FINAL_FEATURE_LIST:
    outputs[key] = tft.scale_to_0_1(outputs[key])

  return outputs


# Number of records to include in the train split
TRAIN_SPLIT = 300000

# Get date time of the last element in the train split
date_time_train_boundary = df.iloc[TRAIN_SPLIT - 1].name

# Creates a datetime object from the timestamp string
date_time_train_boundary = datetime.strptime(date_time_train_boundary, '%d.%m.%Y %H:%M:%S')

# Convert date time string to Unix timestamp in seconds
date_time_train_boundary = bytes(str(datetime.timestamp(date_time_train_boundary)), 'utf-8')


def partition_fn(line, num_partitions):
  '''
  Partition function to work with Beam.partition

  Args:
    line (string) - One record in the CSV file.
    num_partition (integer) - Number of partitions. Required argument by Beam. Unused in this function.

  Returns:
    0 or 1 (integer) - 0 if line timestamp is below the date time boundary, 1 otherwise. 
  '''

  # Split the CSV string to a list
  line_split = line.split(b',')

  # Get the timestamp of the current line
  line_dt = line_split[date_time_idx]

  # Check if it is above or below the date time boundary
  partition_num = int(line_dt > date_time_train_boundary)

  return partition_num


# Declare feature spec
RAW_DATA_FEATURE_SPEC = dict(
    [(name, tf.io.FixedLenFeature([], tf.float32))
     for name in TIMESTAMP_FEATURES] +
    [(name, tf.io.FixedLenFeature([], tf.float32))
     for name in NUMERIC_FEATURES]
)

# Create schema from feature spec
RAW_DATA_SCHEMA = tft.tf_metadata.schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC)


import shutil
from tfx_bsl.coders.example_coder import RecordBatchToExamplesEncoder
from tfx_bsl.public import tfxio

# Directory names for the TF Transform outputs
WORKING_DIR = 'transform_dir'
TRANSFORM_TRAIN_FILENAME = 'transform_train'
TRANSFORM_TEST_FILENAME = 'transform_test'
TRANSFORM_TEMP_DIR = 'tft_temp'


# The "with" block will create a pipeline, and run that pipeline at the exit
#   of the block.
def read_and_transform_data(working_dir):
  '''
  Reads a CSV File and preprocesses the data using TF Transform

  Args:
    working_dir (string) - directory to place TF Transform outputs
  
  Returns:
    transform_fn - transformation graph
    transformed_train_data - transformed training examples
    transformed_test_data - transformed test examples
    transformed_metadata - transform output metadata
  '''

  # Delete TF Transform if it already exists
  if os.path.exists(working_dir):
    shutil.rmtree(working_dir)

  with beam.Pipeline() as pipeline:
      with tft_beam.Context(temp_dir=os.path.join(working_dir, TRANSFORM_TEMP_DIR)):
        
        # Read the input CSV and clean the data
        raw_data = (
              pipeline
              | 'ReadTrainData' >> beam.io.ReadFromText(INPUT_FILE, coder=beam.coders.BytesCoder(), skip_header_lines=1)
              | 'CleanLines' >> beam.Map(clean_fn))

        # Partition the dataset into train and test sets using the partition_fn defined earlier.    
        raw_train_data, raw_test_data = (raw_data
                                 | 'TrainTestSplit' >> beam.Partition(partition_fn, 2))
        
        # Create a TFXIO to read the data with the schema. You need
        # to list all columns in order since the schema doesn't specify the
        # order of columns in the csv.
        csv_tfxio = tfxio.BeamRecordCsvTFXIO(
              physical_format='text',
              column_names=ordered_columns,
              schema=RAW_DATA_SCHEMA)

        # Parse the raw train data into inputs for TF Transform
        raw_train_data = (raw_train_data 
                          | 'DecodeTrainData' >> csv_tfxio.BeamSource())
        
        # Get the raw data metadata
        RAW_DATA_METADATA = csv_tfxio.TensorAdapterConfig()
        
        # Pair the train data with the metadata into a tuple
        raw_train_dataset = (raw_train_data, RAW_DATA_METADATA)

        # Training data transformation. The TFXIO (RecordBatch) output format
        # is chosen for improved performance.
        (transformed_train_data,transformed_metadata) , transform_fn = (
        raw_train_dataset | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn, output_record_batches=True))


        # Parse the raw data into inputs for TF Transform
        raw_test_data = (raw_test_data
                         | 'DecodeTestData' >> csv_tfxio.BeamSource())
        
        # Pair the test data with the metadata into a tuple
        raw_test_dataset = (raw_test_data, RAW_DATA_METADATA)
        
        # Now apply the same transform function to the test data.
        # You don't need the transformed data schema. It's the same as before.
        transformed_test_data, _ = (
          (raw_test_dataset, transform_fn) | tft_beam.TransformDataset(output_record_batches=True))
        
        # Declare an encoder to convert output record batches to TF Examples 
        transformed_data_coder = RecordBatchToExamplesEncoder(transformed_metadata.schema)
        
        # Encode transformed train data and write to disk
        _ = (
            transformed_train_data
            | 'EncodeTrainData' >> beam.FlatMapTuple(lambda batch, _: transformed_data_coder.encode(batch))
            | 'WriteTrainData' >> beam.io.WriteToTFRecord(
                os.path.join(working_dir, TRANSFORM_TRAIN_FILENAME)))

        # Encode transformed test data and write to disk
        _ = (
            transformed_test_data
            | 'EncodeTestData' >> beam.FlatMapTuple(lambda batch, _: transformed_data_coder.encode(batch))
            | 'WriteTestData' >> beam.io.WriteToTFRecord(
                os.path.join(working_dir, TRANSFORM_TEST_FILENAME)))
        
        # Write transform function to disk
        _ = (
          transform_fn
          | 'WriteTransformFn' >>
          tft_beam.WriteTransformFn(os.path.join(working_dir)))

         
  return transform_fn, transformed_train_data, transformed_test_data, transformed_metadata


def main():
  return read_and_transform_data(WORKING_DIR)

if __name__ == '__main__':
  transform_fn, transformed_train_data, trainsformed_test_data, transformed_metadata = main()

WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.

WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_transform/tf_utils.py:289: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.


# Constants to prepare the transformed data for modeling

LABEL_KEY = 'T (degC)'
OBSERVATIONS_PER_HOUR = 6
HISTORY_SIZE = 120
FUTURE_TARGET = 12
BATCH_SIZE = 72
SHIFT = 1


# Get the output of the Transform component
tf_transform_output = tft.TFTransformOutput(os.path.join(WORKING_DIR))

# Get the index of the label key
index_of_label = list(tf_transform_output.transformed_feature_spec().keys()).index(LABEL_KEY)
print(index_of_label)

2


def parse_function(example_proto):
    
    feature_spec = tf_transform_output.transformed_feature_spec()
    
    # Define features with the example_proto (transformed data) and the feature_spec using tf.io.parse_single_example 
    features = tf.io.parse_single_example(example_proto, feature_spec)
    values = list(features.values())
    values[index_of_label], values[len(features) - 1] = values[len(features) - 1], values[index_of_label]
    
    # Stack the values along the first axis
    stacked_features = tf.stack(values, axis=0)

    return stacked_features


def map_features_target(elements):
    features = elements[:HISTORY_SIZE]
    target = elements[-1:,-1]
    return (features, target)


def get_windowed_dataset(path):
        
    # Instantiate a tf.data.TFRecordDataset passing in the appropiate path
    dataset = tf.data.TFRecordDataset(path)
    
    # Use the dataset's map method to map the parse_function
    dataset = dataset.map(parse_function)
    
    # Use the window method with expected total size. Define stride and set drop_remainder to True
    dataset = dataset.window(HISTORY_SIZE + FUTURE_TARGET, shift=SHIFT, stride=OBSERVATIONS_PER_HOUR, drop_remainder=True)
    
    # Use the flat_map method passing in an anonymous function that given a window returns window.batch(HISTORY_SIZE + FUTURE_TARGET)
    dataset = dataset.flat_map(lambda window: window.batch(HISTORY_SIZE + FUTURE_TARGET))
    
    # Use the map method passing in the previously defined map_features_target function
    dataset = dataset.map(map_features_target) 
    
    # Use the batch method and pass in the appropiate batch size
    dataset = dataset.batch(BATCH_SIZE)

    return dataset


# Get list of train and test data tfrecord filenames from the transform outputs
train_tfrecord_files = tf.io.gfile.glob(os.path.join(WORKING_DIR, TRANSFORM_TRAIN_FILENAME + '*'))
test_tfrecord_files = tf.io.gfile.glob(os.path.join(WORKING_DIR, TRANSFORM_TEST_FILENAME + '*'))

# Generate dataset windows
windowed_train_dataset = get_windowed_dataset(train_tfrecord_files[0])
windowed_test_dataset = get_windowed_dataset(test_tfrecord_files[0])


ordered_feature_spec_names = tf_transform_output.transformed_feature_spec().keys()

# Preview an example in the train dataset
for features, target  in windowed_train_dataset.take(1):
    print(f'Shape of input features for a batch: {features.shape}')
    print(f'Shape of targets for a batch: {target.shape}\n')

    print(f'INPUT FEATURES:')
    for value, name in zip(features[0][0].numpy(), ordered_feature_spec_names):
      print(f'{name} : {value}') 
  
    print(f'\nTARGET TEMPERATURE: {target[0][0]}')

Shape of input features for a batch: (72, 120, 13)
Shape of targets for a batch: (72, 1)

INPUT FEATURES:
Day cos : 0.9994063377380371
Day sin : 0.5243587493896484
T (degC) : 0.08167896419763565
VPdef (mbar) : 0.0052231717854738235
VPmax (mbar) : 0.04098501801490784
Wx : 0.5456575751304626
Wy : 0.5428817868232727
Year cos : 0.9999775886535645
Year sin : 0.5047342777252197
p (mbar) : 0.8355501890182495
rh (%) : 0.9230327606201172
rho (g/m**3) : 0.743212878704071
sh (g/kg) : 0.25462883710861206

TARGET TEMPERATURE: 0.14098861813545227


# Preview an example in the test dataset
for features, target in windowed_test_dataset.take(1):
    print(f'Shape of input features for a batch: {features.shape}')
    print(f'Shape of targets for a batch: {target.shape}\n')

    print(f'INPUT FEATURES:')
    for value, name in zip(features[0][0].numpy(), ordered_feature_spec_names):
      print(f'{name} : {value}') 
  
    print(f'\nTARGET TEMPERATURE: {target[0][0]}')

Shape of input features for a batch: (72, 120, 13)
Shape of targets for a batch: (72, 1)

INPUT FEATURES:
Day cos : 0.9083542823791504
Day sin : 0.7885253429412842
T (degC) : 0.5666477680206299
VPdef (mbar) : 0.004748338367789984
VPmax (mbar) : 0.27363526821136475
Wx : 0.6980631351470947
Wy : 0.5810309648513794
Year cos : 0.3435264229774475
Year sin : 0.025114715099334717
p (mbar) : 0.8016924262046814
rh (%) : 0.9862148761749268
rho (g/m**3) : 0.4026159942150116
sh (g/kg) : 0.6419228911399841

TARGET TEMPERATURE: 0.7635467648506165

Index	Features	Format	Description
1	Date Time	01.01.2009 00:10:00	Date-time reference
2	p (mbar)	996.52	The pascal SI derived unit of pressure used to quantify internal pressure. Meteorological reports typically state atmospheric pressure in millibars.
3	T (degC)	-8.02	Temperature in Celsius
4	Tpot (K)	265.4	Temperature in Kelvin
5	Tdew (degC)	-8.9	Temperature in Celsius relative to humidity. Dew Point is a measure of the absolute amount of water in the air, the DP is the temperature at which the air cannot hold all the moisture in it and water condenses.
6	rh (%)	93.3	Relative Humidity is a measure of how saturated the air is with water vapor, the %RH determines the amount of water contained within collection objects.
7	VPmax (mbar)	3.33	Saturation vapor pressure
8	VPact (mbar)	3.11	Vapor pressure
9	VPdef (mbar)	0.22	Vapor pressure deficit
10	sh (g/kg)	1.94	Specific humidity
11	H2OC (mmol/mol)	3.12	Water vapor concentration
12	rho (g/m ** 3)	1307.75	Airtight
13	wv (m/s)	1.03	Wind speed
14	max. wv (m/s)	1.75	Maximum wind speed
15	wd (deg)	152.3	Wind direction in degrees

	p (mbar)	T (degC)	Tpot (K)	Tdew (degC)	rh (%)	VPmax (mbar)	VPact (mbar)	VPdef (mbar)	sh (g/kg)	H2OC (mmol/mol)	rho (g/m**3)	wv (m/s)	max. wv (m/s)	wd (deg)
Date Time
31.12.2016 23:20:00	1000.07	-4.05	269.10	-8.13	73.10	4.52	3.30	1.22	2.06	3.30	1292.98	0.67	1.52	240.0
31.12.2016 23:30:00	999.93	-3.35	269.81	-8.06	69.71	4.77	3.32	1.44	2.07	3.32	1289.44	1.14	1.92	234.3
31.12.2016 23:40:00	999.82	-3.16	270.01	-8.21	67.91	4.84	3.28	1.55	2.05	3.28	1288.39	1.08	2.00	215.2
31.12.2016 23:50:00	999.81	-4.23	268.94	-8.53	71.80	4.46	3.20	1.26	1.99	3.20	1293.56	1.49	2.16	225.8
01.01.2017 00:00:00	999.82	-4.82	268.36	-8.42	75.70	4.27	3.23	1.04	2.01	3.23	1296.38	1.23	1.96	184.9

	count	mean	std	min	25%	50%	75%	max
p (mbar)	420551.0	989.212776	8.358481	913.60	984.20	989.58	994.72	1015.35
T (degC)	420551.0	9.450147	8.423365	-23.01	3.36	9.42	15.47	37.28
Tpot (K)	420551.0	283.492743	8.504471	250.60	277.43	283.47	289.53	311.34
Tdew (degC)	420551.0	4.955854	6.730674	-25.01	0.24	5.22	10.07	23.11
rh (%)	420551.0	76.008259	16.476175	12.95	65.21	79.30	89.40	100.00
VPmax (mbar)	420551.0	13.576251	7.739020	0.95	7.78	11.82	17.60	63.77
VPact (mbar)	420551.0	9.533756	4.184164	0.79	6.21	8.86	12.35	28.32
VPdef (mbar)	420551.0	4.042412	4.896851	0.00	0.87	2.19	5.30	46.01
sh (g/kg)	420551.0	6.022408	2.656139	0.50	3.92	5.59	7.80	18.13
H2OC (mmol/mol)	420551.0	9.640223	4.235395	0.80	6.29	8.96	12.49	28.82
rho (g/m**3)	420551.0	1216.062748	39.975208	1059.45	1187.49	1213.79	1242.77	1393.54
wv (m/s)	420551.0	1.702224	65.446714	-9999.00	0.99	1.76	2.86	28.49
max. wv (m/s)	420551.0	3.056555	69.016932	-9999.00	1.76	2.96	4.74	23.50
wd (deg)	420551.0	174.743738	86.681693	0.00	124.90	198.10	234.10	360.00

Ungraded Lab: Feature Engineering with Weather Data¶

Install Packages¶

Imports¶

Download the Data¶

Inspect the Data¶

Feature Engineering¶

Correlated features¶

Distribution of Wind Data¶

Date Time Feature¶

Create a `tf.Transform` preprocessing_fn¶

Transform the data¶

Train Test Split¶

Declare Schema for Cleaned Data¶

Create the `tf.Transform` pipeline¶

Prepare Training and Test Datasets from TFTransformOutput¶

Wrap Up¶

Ungraded Lab: Feature Engineering with Weather Data¶

Install Packages¶

Imports¶

Download the Data¶

Inspect the Data¶

Feature Engineering¶

Correlated features¶

Distribution of Wind Data¶

Date Time Feature¶

Create a tf.Transform preprocessing_fn¶

Transform the data¶

Train Test Split¶

Declare Schema for Cleaned Data¶

Create the tf.Transform pipeline¶

Prepare Training and Test Datasets from TFTransformOutput¶

Wrap Up¶

Create a `tf.Transform` preprocessing_fn¶

Create the `tf.Transform` pipeline¶