!pip install tensorflow_transform==1.4.0
!pip install apache-beam==2.39.0

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
xarray-einstats 0.2.2 requires numpy>=1.21, but you have numpy 1.19.5 which is incompatible.
cmdstanpy 1.0.7 requires numpy>=1.21, but you have numpy 1.19.5 which is incompatible.
Successfully installed absl-py-0.12.0 apache-beam-2.41.0 clang-5.0 cloudpickle-2.1.0 dill-0.3.1.1 docopt-0.6.2 fastavro-1.6.0 fasteners-0.17.3 flatbuffers-1.12 gast-0.4.0 google-apitools-0.5.31 google-auth-httplib2-0.1.0 google-cloud-bigquery-storage-2.13.2 google-cloud-bigtable-1.7.2 google-cloud-core-1.7.3 google-cloud-dlp-3.7.1 google-cloud-language-1.3.2 google-cloud-pubsub-2.13.1 google-cloud-pubsublite-1.4.2 google-cloud-recommendations-ai-0.6.2 google-cloud-spanner-1.19.3 google-cloud-videointelligence-1.16.3 google-cloud-vision-1.0.2 grpc-google-iam-v1-0.12.4 grpcio-1.48.1 grpcio-gcp-0.2.2 grpcio-status-1.48.1 hdfs-2.7.0 keras-2.6.0 numpy-1.19.5 orjson-3.8.0 overrides-6.2.0 proto-plus-1.22.1 protobuf-3.19.4 pyarrow-5.0.0 pymongo-3.12.3 requests-2.28.1 tensorboard-2.6.0 tensorflow-2.6.5+zzzcolab20220523104206 tensorflow-estimator-2.6.0 tensorflow-metadata-1.4.0 tensorflow-serving-api-2.6.5 tensorflow-transform-1.4.0 tfx-bsl-1.4.0 typing-extensions-3.10.0.2 wrapt-1.12.1

Successfully installed apache-beam-2.39.0


import apache_beam as beam
print('Apache Beam version: {}'.format(beam.__version__))

import tensorflow as tf
print('Tensorflow version: {}'.format(tf.__version__))

import tensorflow_transform as tft
from tensorflow_transform import beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import schema_utils
print('TensorFlow Transform version: {}'.format(tft.__version__))

Apache Beam version: 2.39.0
Tensorflow version: 2.6.5
TensorFlow Transform version: 1.4.0


import os

# Directory of the raw data files
DATA_DIR = '/content/data/'

# Download the dataset
!wget -nc https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/raw/main/course2/week4-ungraded-lab/data/WISDM_ar_latest.tar.gz -P {DATA_DIR}

# Extract the dataset
!tar -xvf {DATA_DIR}/WISDM_ar_latest.tar.gz -C {DATA_DIR}

# Assign data path to a variable for easy reference
INPUT_FILE = os.path.join(DATA_DIR, 'WISDM_ar_v1.1/WISDM_ar_v1.1_raw.txt')


WISDM_ar_v1.1/
WISDM_ar_v1.1/readme.txt
WISDM_ar_v1.1/WISDM_ar_v1.1_raw.txt
WISDM_ar_v1.1/WISDM_ar_v1.1_raw_about.txt
WISDM_ar_v1.1/WISDM_ar_v1.1_transformed.arff
WISDM_ar_v1.1/WISDM_ar_v1.1_trans_about.txt


import pandas as pd

# Put dataset in a dataframe
df = pd.read_csv(INPUT_FILE, header=None, names=['user_id', 'activity', 'timestamp', 'x-acc','y-acc', 'z-acc'])

# Preview the first few rows
df.head()


# Visualization Utilities
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

def visualize_value_plots_for_categorical_feature(feature, colors=['b']):
    '''Plots a bar graph for categorical features'''
    counts = feature.value_counts()
    plt.bar(counts.index, counts.values, color=colors)
    plt.show()


def visualize_plots(dataset, activity, columns):
    '''Visualizes the accelerometer data against time'''
    features = dataset[dataset['activity'] == activity][columns][:200]
    if 'z-acc' in columns:
        # remove semicolons in the z-acc column
        features['z-acc'] = features['z-acc'].replace(regex=True, to_replace=r';', value=r'')
        features['z-acc'] = features['z-acc'].astype(np.float64)
    axis = features.plot(subplots=True, figsize=(16, 12), 
                     title=activity)

    for ax in axis:
        ax.legend(loc='lower left', bbox_to_anchor=(1.0, 0.5))


# Plot the histogram of activities
visualize_value_plots_for_categorical_feature(df['activity'], colors=['r', 'g', 'b', 'y', 'm', 'c'])


# Plot the histogram for users
visualize_value_plots_for_categorical_feature(df['user_id'])


def partition_fn(line, num_partitions):
  '''
  Partition function to work with beam.Partition

  Args:
    line (bytes) - One record in the CSV file.
    num_partitions (integer) - Number of partitions. Required argument by Beam. Unused in this function.

  Returns:
    0 or 1 (integer) - 1 if the user id is 30 or below, 0 otherwise.
  '''
  
  # Get the 1st substring delimited by a comma. Cast to an int.
  user_id = int(line[:line.index(b',')])

  # Return 1 if the user id is 30 or below, 0 otherwise
  partition_num = int(user_id <= 30)

  return partition_num
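
To see which partition a record lands in, the function can be called directly on a few byte strings. This is a minimal sketch: the sample records are hypothetical and only mimic the layout of the raw file. Since `beam.Partition` returns its outputs in index order, partition 0 would presumably become the training set and partition 1 the test set.

```python
def partition_fn(line, num_partitions):
    # Same logic as the lab's partition function: the user id is the
    # first comma-delimited field of the raw byte string.
    user_id = int(line[:line.index(b',')])
    return int(user_id <= 30)

# Hypothetical sample records in the raw file's column layout.
sample_lines = [
    b'33,Jogging,49105962326000,-0.694638,12.680544,0.50395286',
    b'30,Walking,1000,0.1,0.2,0.3',
    b'5,Sitting,2000,0.4,0.5,0.6',
]

partitions = [partition_fn(line, 2) for line in sample_lines]
print(partitions)  # [0, 1, 1]
```

User 33 falls into partition 0, while users 30 and 5 fall into partition 1.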


# Plot the measurements for `Jogging`
visualize_plots(df, 'Jogging', columns=['x-acc', 'y-acc', 'z-acc'])


visualize_plots(df, 'Sitting', columns=['x-acc', 'y-acc', 'z-acc'])


STRING_FEATURES = ['activity']
INT_FEATURES = ['user_id', 'timestamp']
FLOAT_FEATURES = ['x-acc', 'y-acc', 'z-acc']

# Declare feature spec
RAW_DATA_FEATURE_SPEC = dict(
    [(name, tf.io.FixedLenFeature([], tf.string))
     for name in STRING_FEATURES] +
    [(name, tf.io.FixedLenFeature([], tf.int64))
     for name in INT_FEATURES] +
    [(name, tf.io.FixedLenFeature([], tf.float32))
     for name in FLOAT_FEATURES]
)

# Create schema from feature spec
RAW_DATA_SCHEMA = tft.tf_metadata.schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC)


LABEL_KEY = 'activity'

def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""

  # Copy inputs
  outputs = inputs.copy()

  # Delete features not to be included as inputs to the model
  del outputs["user_id"]
  del outputs["timestamp"]
  
  # Create a vocabulary for the string labels
  outputs[LABEL_KEY] = tft.compute_and_apply_vocabulary(inputs[LABEL_KEY],vocab_filename=LABEL_KEY)

  # Scale features by their min-max
  for key in FLOAT_FEATURES:
     outputs[key] = tft.scale_by_min_max(outputs[key])

  return outputs
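
The two transformations in `preprocessing_fn` can be illustrated in plain Python. This is only a sketch of the arithmetic, not the TF Transform implementation: `min_max_scale` and `vocabulary_ids` are hypothetical helper names, the sample values are made up, the scaler assumes the maximum is larger than the minimum, and the alphabetical tie-break in the vocabulary is an assumption made so the result is deterministic.

```python
from collections import Counter

def min_max_scale(values):
    """Maps values linearly onto [0, 1] using the min and max of the full list."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def vocabulary_ids(labels):
    """Assigns integer ids to labels, most frequent label first."""
    counts = Counter(labels)
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    vocab = {label: index for index, label in enumerate(ordered)}
    return [vocab[label] for label in labels], vocab

# Hypothetical sample values
x_acc = [-2.0, 0.0, 2.0, 6.0]
labels = ['Walking', 'Jogging', 'Walking', 'Sitting']

scaled = min_max_scale(x_acc)
ids, vocab = vocabulary_ids(labels)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
print(ids)     # [0, 1, 0, 2]
```

Both helpers need a full pass over the data before any single value can be transformed, which is why TF Transform separates an analyze phase from the transform phase.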


import shutil
from tfx_bsl.coders.example_coder import RecordBatchToExamplesEncoder
from tfx_bsl.public import tfxio

# Directory names for the TF Transform outputs
WORKING_DIR = 'transform_dir'
TRANSFORM_TRAIN_FILENAME = 'transform_train'
TRANSFORM_TEST_FILENAME = 'transform_test'
TRANSFORM_TEMP_DIR = 'tft_temp'

ordered_columns = ['user_id', 'activity', 'timestamp', 'x-acc','y-acc', 'z-acc']

def transform_data(working_dir):
    '''
    Reads a CSV File and preprocesses the data using TF Transform

    Args:
      working_dir (string) - directory to place TF Transform outputs
    
    Returns:
      transform_fn - transformation graph
      transformed_train_data - transformed training examples
      transformed_test_data - transformed test examples
      transformed_metadata - transform output metadata
    '''

    # Delete the TF Transform working directory if it already exists
    if os.path.exists(working_dir):
      shutil.rmtree(working_dir)

    with beam.Pipeline() as pipeline:
        with tft_beam.Context(temp_dir=os.path.join(working_dir, TRANSFORM_TEMP_DIR)):
  
          # Read the input CSV, clean and filter the data (replace semicolon and incomplete rows)
          raw_data = (
              pipeline
              | 'ReadTrainData' >> beam.io.ReadFromText(INPUT_FILE, coder=beam.coders.BytesCoder())
              | 'CleanLines' >> beam.Map(lambda line: line.replace(b',;', b'').replace(b';', b''))
              | 'FilterLines' >> beam.Filter(lambda line: line.count(b',') == len(ordered_columns) - 1 and line[-1:] != b','))

          # Partition the data into training and test data using beam.Partition
          raw_train_data, raw_test_data = (raw_data
                                  | 'TrainTestSplit' >> beam.Partition(partition_fn, 2))
                    
          # Create a TFXIO to read the data with the schema. 
          csv_tfxio = tfxio.BeamRecordCsvTFXIO(
              physical_format='text',
              column_names=ordered_columns,
              schema=RAW_DATA_SCHEMA)

          # Parse the raw train data into inputs for TF Transform
          raw_train_data = (raw_train_data 
                            | 'DecodeTrainData' >> csv_tfxio.BeamSource())

          # Get the raw data metadata
          RAW_DATA_METADATA = csv_tfxio.TensorAdapterConfig()
          
          # Pair the train data with the metadata into a tuple
          raw_train_dataset = (raw_train_data, RAW_DATA_METADATA)

          # Training data transformation. The TFXIO (RecordBatch) output format
          # is chosen for improved performance.
          (transformed_train_data,transformed_metadata) , transform_fn = (
              raw_train_dataset 
                | 'AnalyzeAndTransformTrainData' >> tft_beam.AnalyzeAndTransformDataset(preprocessing_fn, output_record_batches=True))
          
          # Parse the raw test data into inputs for TF Transform
          raw_test_data = (raw_test_data 
                            |'DecodeTestData' >> csv_tfxio.BeamSource())

          # Pair the test data with the metadata into a tuple
          raw_test_dataset = (raw_test_data, RAW_DATA_METADATA)

          # Now apply the same transform function to the test data.
          # You don't need the transformed data schema. It's the same as before.
          transformed_test_data, _ = ((raw_test_dataset, transform_fn) 
                                        | 'TransformTestData' >> tft_beam.TransformDataset(output_record_batches=True))
          
          # Declare an encoder to convert output record batches to TF Examples 
          transformed_data_coder = RecordBatchToExamplesEncoder(transformed_metadata.schema)

          
          # Encode transformed train data and write to disk
          _ = (
              transformed_train_data
              | 'EncodeTrainData' >> beam.FlatMapTuple(lambda batch, _: transformed_data_coder.encode(batch))
              | 'WriteTrainData' >> beam.io.WriteToTFRecord(
                  os.path.join(working_dir, TRANSFORM_TRAIN_FILENAME)))

          # Encode transformed test data and write to disk
          _ = (
              transformed_test_data
              | 'EncodeTestData' >> beam.FlatMapTuple(lambda batch, _: transformed_data_coder.encode(batch))
              | 'WriteTestData' >> beam.io.WriteToTFRecord(
                  os.path.join(working_dir, TRANSFORM_TEST_FILENAME)))
          
          # Write transform function to disk
          _ = (
            transform_fn
            | 'WriteTransformFn' >>
            tft_beam.WriteTransformFn(os.path.join(working_dir)))

    return transform_fn, transformed_train_data, transformed_test_data, transformed_metadata


def main():
  return transform_data(WORKING_DIR)

if __name__ == '__main__':
  transform_fn, transformed_train_data,transformed_test_data, transformed_metadata = main()

WARNING:apache_beam.runners.interactive.interactive_environment:Dependencies required for Interactive Beam PCollection visualization are not available, please use: `pip install apache-beam[interactive]` to install necessary dependencies to enable all data visualization features.

WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_transform/tf_utils.py:289: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
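
The `CleanLines` and `FilterLines` steps in `transform_data` can be tried outside Beam on a handful of byte strings. This is a minimal sketch: `clean_line` and `is_complete` are hypothetical names wrapping the same expressions as the two lambdas, and the sample lines are made up to show each case.

```python
ordered_columns = ['user_id', 'activity', 'timestamp', 'x-acc', 'y-acc', 'z-acc']

def clean_line(line):
    # Same expression as the 'CleanLines' lambda: strip ',;' and ';'
    return line.replace(b',;', b'').replace(b';', b'')

def is_complete(line):
    # Same expression as the 'FilterLines' lambda: exactly 5 commas
    # and no empty last field
    return line.count(b',') == len(ordered_columns) - 1 and line[-1:] != b','

# Hypothetical sample lines
samples = [
    b'33,Jogging,49105962326000,-0.694638,12.680544,0.50395286;',  # trailing ';'
    b'11,Walking,1000,0.1,0.2,0.3,;',  # trailing ',;'
    b'11,Walking,1000,0.1,0.2,;',      # missing z-acc value
    b'11,Walking,1000,0.1,0.2,',       # empty last field
    b'',                               # empty line
]

cleaned = [clean_line(line) for line in samples]
kept = [line for line in cleaned if is_complete(line)]
for line in kept:
    print(line)
```

Only the first two samples survive the filter; the other three lack a complete set of six fields after cleaning.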


# Get the output of the Transform component
tf_transform_output = tft.TFTransformOutput(os.path.join(WORKING_DIR))

# Parameters
HISTORY_SIZE = 80
BATCH_SIZE = 100
STEP_SIZE = 40

def parse_function(example_proto):
    '''Parse the values from tf examples'''
    feature_spec = tf_transform_output.transformed_feature_spec()
    features = tf.io.parse_single_example(example_proto, feature_spec)
    # Cast every feature to float32 so they can be stacked into one tensor
    values = [tf.cast(value, tf.float32) for value in features.values()]
    features = tf.stack(values, axis=0)
    return features

def add_mode(features):
    '''Calculate mode of activity for the current history size of elements'''
    unique, _, count = tf.unique_with_counts(features[:,0])
    max_occurrences = tf.reduce_max(count)
    max_cond = tf.equal(count, max_occurrences)
    max_numbers = tf.squeeze(tf.gather(unique, tf.where(max_cond)))

    #Features (X) are all features except activity (x-acc, y-acc, z-acc)
    #Target(Y) is the mode of activity values of all rows in this window
    return (features[:,1:], max_numbers)

def get_windowed_dataset(path):
  '''Get the dataset and group them into windows'''
  dataset = tf.data.TFRecordDataset(path)
  dataset = dataset.map(parse_function)
  dataset = dataset.window(HISTORY_SIZE, shift=STEP_SIZE, drop_remainder=True)
  dataset = dataset.flat_map(lambda window: window.batch(HISTORY_SIZE))
  dataset = dataset.map(add_mode)
  dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
  dataset = dataset.repeat()

  return dataset
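
The windowing and labelling logic above can be sketched in plain Python with small numbers. This is an illustration under stated assumptions, not the `tf.data` implementation: `sliding_windows` and `split_features_and_mode` are hypothetical helpers, the rows are made up, and ties for the most common label are resolved by first occurrence here (in `add_mode`, a tie would yield more than one value).

```python
from collections import Counter

def sliding_windows(rows, size, shift):
    """Yields windows of `size` rows, advancing by `shift`, dropping a short tail."""
    for start in range(0, len(rows) - size + 1, shift):
        yield rows[start:start + size]

def split_features_and_mode(window):
    """Returns the feature columns and the most common label in a window."""
    labels = [row[0] for row in window]
    features = [row[1:] for row in window]
    mode_label, _ = Counter(labels).most_common(1)[0]
    return features, mode_label

# Hypothetical rows: (activity id, x-acc, y-acc, z-acc)
rows = [(0, 0.1, 0.2, 0.3)] * 3 + [(1, 0.4, 0.5, 0.6)] * 5

windows = list(sliding_windows(rows, size=4, shift=2))
targets = [split_features_and_mode(window)[1] for window in windows]
print(len(windows))  # 3
print(targets)       # [0, 1, 1]
```

With `HISTORY_SIZE = 80` and `STEP_SIZE = 40`, the same scheme gives windows of 80 rows in which each window shares half of its rows with the previous one.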


# Get list of train and test data tfrecord filenames from the transform outputs
train_tfrecord_files = tf.io.gfile.glob(os.path.join(WORKING_DIR, TRANSFORM_TRAIN_FILENAME + '*'))
test_tfrecord_files = tf.io.gfile.glob(os.path.join(WORKING_DIR, TRANSFORM_TEST_FILENAME + '*'))

# Generate dataset windows
windowed_train_dataset = get_windowed_dataset(train_tfrecord_files[0])
windowed_test_dataset = get_windowed_dataset(test_tfrecord_files[0])


# Preview an example in the train dataset
for x, y in windowed_train_dataset.take(1):
  print("\nFeatures (x-acc, y-acc, z-acc):\n")
  print(x)
  print("\nTarget (activity):\n")
  print(y)

Features (x-acc, y-acc, z-acc):

tf.Tensor(
[[[0.47814363 0.82018137 0.5200351 ]
  [0.6224036  0.7842018  0.53206956]
  [0.61964923 0.77451503 0.5043538 ]
  ...
  [0.64202845 0.57974094 0.5371751 ]
  [0.44543552 0.8143001  0.4336057 ]
  [0.53150946 0.92431456 0.73337346]]

 [[0.63892984 0.5942712  0.52076447]
  [0.5015558  0.5174686  0.526964  ]
  [0.37382194 0.9952359  0.72498584]
  ...
  [0.41444892 0.57316774 0.526964  ]
  [0.42890933 0.9952359  0.7209743 ]
  [0.5101631  0.46003968 0.49633083]]

 [[0.5063759  0.28809875 0.30961415]
  [0.4054972  0.7762448  0.48502573]
  [0.37760922 0.79596436 0.516753  ]
  ...
  [0.63617545 0.8717291  0.7056577 ]
  [0.5084416  0.30055326 0.26767585]
  [0.37657633 0.67314947 0.2111503 ]]

 ...

 [[0.45025566 0.82398695 0.63782704]
  [0.41341603 0.8589287  0.55796194]
  [0.33319503 0.66450053 0.5444687 ]
  ...
  [0.44818988 0.62852097 0.5988062 ]
  [0.44061536 0.55552393 0.60208833]
  [0.41823614 0.66934395 0.76692414]]

 [[0.41444892 0.73542184 0.52295256]
  [0.4626503  0.7413031  0.5466568 ]
  [0.47917655 0.6859499  0.51894104]
  ...
  [0.49294835 0.6510082  0.59588873]
  [0.46953622 0.55448604 0.5466568 ]
  [0.49673563 0.7160482  0.6101113 ]]

 [[0.41823614 0.62748307 0.61521685]
  [0.47160202 0.6956367  0.5732785 ]
  [0.44233683 0.701518   0.54045725]
  ...
  [0.4984571  0.90494096 0.5484802 ]
  [0.5346081  0.7433788  0.58640707]
  [0.4726349  0.5672865  0.5433747 ]]], shape=(100, 80, 3), dtype=float32)

Target (activity):

tf.Tensor(
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 3. 3. 3. 3.
 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
 2. 2. 2. 3.], shape=(100,), dtype=float32)


# Preview an example in the test dataset
for x, y in windowed_test_dataset.take(1):
  print("\nFeatures (x-acc, y-acc, z-acc):\n")
  print(x)
  print("\nTarget (activity):\n")
  print(y)

Features (x-acc, y-acc, z-acc):

tf.Tensor(
[[[0.5101631  0.7471844  0.49231938]
  [0.4957027  0.75687116 0.49122533]
  [0.4898497  0.74822223 0.48794317]
  ...
  [0.59830296 0.6800686  0.5003423 ]
  [0.49949    0.5925414  0.5331636 ]
  [0.6168949  0.89421636 0.6400151 ]]

 [[0.53254235 0.71501034 0.5659849 ]
  [0.5346081  0.6984044  0.568173  ]
  [0.55664307 0.7160482  0.5856777 ]
  ...
  [0.5752351  0.7492601  0.4901313 ]
  [0.50913024 0.55932945 0.4759087 ]
  [0.58005524 0.7675959  0.65022624]]

 [[0.54700285 0.85719883 0.50362444]
  [0.55664307 0.6624248  0.43798187]
  [0.47538927 0.7063614  0.50763595]
  ...
  [0.5683492  0.59946054 0.5003423 ]
  [0.50431013 0.611915   0.57437253]
  [0.57213646 0.8454363  0.6943526 ]]

 ...

 [[0.54700285 0.9952359  0.45730996]
  [0.81142217 0.8433606  0.6050058 ]
  [0.42408916 0.4192167  0.49414277]
  ...
  [0.5549216  0.95752656 0.43068826]
  [0.68954134 0.9952359  0.5240466 ]
  [0.4209905  0.34518176 0.5200351 ]]

 [[0.701936   0.8755346  0.34535292]
  [0.7687294  0.85719883 0.57437253]
  [0.44164824 0.39188606 0.5466568 ]
  ...
  [0.6805897  0.8679235  0.4715325 ]
  [0.58384246 0.76552016 0.62032235]
  [0.5817767  0.47284007 0.48502573]]

 [[0.3951683  0.37355027 0.2859099 ]
  [0.7115763  0.89006484 0.50544786]
  [0.53529674 0.7444167  0.6028177 ]
  ...
  [0.58280957 0.4659209  0.5127415 ]
  [0.5401169  0.98347336 0.3916674 ]
  [0.6485701  0.9924683  0.49122533]]], shape=(100, 80, 3), dtype=float32)

Target (activity):

tf.Tensor(
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1.], shape=(100,), dtype=float32)

   user_id activity       timestamp     x-acc      y-acc        z-acc
0       33  Jogging  49105962326000 -0.694638  12.680544  0.50395286;
1       33  Jogging  49106062271000  5.012288  11.264028  0.95342433;
2       33  Jogging  49106112167000  4.903325  10.882658 -0.08172209;
3       33  Jogging  49106222305000 -0.612916  18.496431   3.0237172;
4       33  Jogging  49106332290000 -1.184970  12.108489    7.205164;

Ungraded Lab: Feature Engineering with Accelerometer Data

Install Packages

Imports

Download the Data

Inspect the Data

Histogram of Activities

Histogram of Measurements per User

Acceleration per Activity

Declare Schema for Cleaned Data

Create a `tf.Transform` preprocessing_fn

Transform the data

Prepare Training and Test Datasets from TFTransformOutput

Wrap Up
