Ganesh Bhat • over 1 year ago
Unable to ingest data from external sources and issues with Kubernetes
Hi,
We have deployed the application on OpenShift but are facing issues ingesting data from an external source, i.e. YouTube. The Streamlit app is exposed on port 8501, but ingesting from YouTube is failing. Do we need to modify any connections (inbound/outbound) on OpenShift to enable data ingestion? If yes, how and where?
We have referred to this (https://www.redhat.com/en/blog/run-elastic-cloud-on-kubernetes-on-red-hat-openshift), but we don't have access to OperatorHub to create Elasticsearch separately.
Also, we are receiving the error below and are unclear about the cause. Do we need to specify any hosts explicitly? It could be related to Elasticsearch, which we are deploying as the embedding store for the RAG, but we are not sure, as the error is not explicit about it. Elasticsearch is exposed on port 9200, but we could not specify multiple ports when deploying on OpenShift.
Connection failed with status 503, and response " Application is not available The application is currently not serving requests at this endpoint. It may not have been started or is still starting. Possible reasons you are seeing this page: The host doesn't exist. Make sure the hostname was typed correctly and that a route matching this hostname exists. The host exists, but doesn't have a matching path. Check if the URL path was typed correctly and that the route was created using the desired path. Route and path matches, but all pods are down. Make sure that the resources exposed by this route (pods, services, deployment configs, etc) have at least one pod running. ".
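For reference on the multiple-ports point: a single Kubernetes/OpenShift Service can expose several ports, so Streamlit (8501) and Elasticsearch (9200) need not be squeezed into one port entry. A minimal sketch, assuming an illustrative service name and pod label (`rag-app` is not from this thread):

```yaml
# Illustrative only: one Service listing two named ports.
# The name and the selector label (app: rag-app) are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: rag-app
spec:
  selector:
    app: rag-app
  ports:
    - name: streamlit
      port: 8501
      targetPort: 8501
    - name: elasticsearch
      port: 9200
      targetPort: 9200
```

Note that an OpenShift Route exposes one Service port per route, so external HTTP access to both ports would need two Routes; in-cluster clients can reach either port of the Service directly.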

10 comments
Michelle Brain Manager • over 1 year ago
I have shared this with the team and they responded that:
Please note that pods go down after 72 hours if they aren't used. Also, please be sure to bring up all the required pods for your online application.
Ganesh Bhat • over 1 year ago
How do we bring up all the required pods? The response is not helpful for resolving the issue, and we are not experts in Kubernetes or OpenShift infrastructure. We have deployed and tested this locally on Docker and on Hugging Face, and it works fine in both places.
Michelle Brain Manager • over 1 year ago
Sorry about this! Hope the following will help you out:
To restart the pods, please use one of the following (replace the placeholders with your own names):
oc rollout restart deployment/<deployment-name>
OR
oc delete pod <pod-name>
Please use the below to get the list of pods:
oc get pods -n <namespace>
If you want to do it using the GUI:
1. Log in to the OCP web console.
2. Click Workloads -> Deployment Configs.
3. Find the deployment whose pod you want to restart.
4. On the right side, click the 3 dots.
5. Click Start rollout. Alternatively, delete the pod, or scale the deployment to 0 and back to 1; a new pod will be created automatically.
Please let me know if you have any other questions. Best of luck!
Ganesh Bhat • over 1 year ago
Hi Michelle,
We cannot create an egress policy for the pod on which the RAG application is running, so we cannot fetch YouTube data into the RAG application.
An error occurred
networkpolicies.networking.k8s.io is forbidden: User "ganeshpbhat2" cannot create resource "networkpolicies" in API group "networking.k8s.io" in the namespace "youtube-rag"
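For context, the kind of egress policy being attempted looks roughly like the sketch below; the error above means the user lacks RBAC permission to create `networkpolicies` in the `youtube-rag` namespace, which a cluster or project admin would need to grant. The policy name and pod label here are assumptions:

```yaml
# Sketch of an egress NetworkPolicy letting the RAG pod reach external
# HTTPS endpoints, plus DNS so hostnames still resolve.
# The pod label (app: rag-app) is an assumption, not from the thread.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-external
  namespace: youtube-rag
spec:
  podSelector:
    matchLabels:
      app: rag-app
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: TCP
          port: 443
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```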
Ganesh Bhat • over 1 year ago
We have created a network policy successfully and the above error is resolved, but we are still facing issues connecting to YouTube.
We are unable to ingest data from YouTube even though we have modified the network policies and the deployment YAML. We can ingest the YouTube data locally on our laptop using Docker, but not on OpenShift. It seems to be blocked by OpenShift.
2024-11-13 22:52:46,893 - transcript_extractor - ERROR - Error extracting transcript for video zjkBMFhNj_g:
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=zjkBMFhNj_g! This is most likely caused by:
Subtitles are disabled for this video
Michelle Brain Manager • over 1 year ago
Hello, sorry for the slow response. It may be too difficult for us to troubleshoot the specifics of your project. Please be sure to submit the use case and whatever you were able to accomplish.
Best of luck!
Ganesh Bhat • over 1 year ago
Hi Michelle,
Thanks for your support. We have submitted, but with errors. We have recorded a working demo from our local desktop using Docker and wanted to update the YouTube video link. Is it possible to update the YouTube video link from the backend, as I can't do it now?
Warm Regards
Ganesh
Michelle Brain Manager • over 1 year ago
Hi Ganesh, can you share the link to the video with me so I can add it as a note for the review team? Thank you,
Ganesh Bhat • over 1 year ago
Thanks, Michelle.
Youtube link: https://www.youtube.com/watch?v=KaQ3IWzl6lo
0:00 to 2:03 shows the current deployment of the solution on OpenShift, and 2:04 to the end is the local demo of the same application.
We also understood the reason for our issue: we were using the youtube-transcript-api Python package, and YouTube has blocked cloud IP addresses, including OpenShift's, so we could not extract the transcripts. We also tried using the Google YouTube APIs, which did not work due to the same IP block. The only way to circumvent this is to route the requests through a proxy and then fetch the YouTube transcripts. More details about the issue at
https://github.com/jdepoix/youtube-transcript-api/issues/303
Please do convey it to the review team.
We also tried creating a proxy server on OpenShift, but that did not work due to permission issues.
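The proxy workaround described above can be sketched as follows. This is a sketch, not the project's actual code: the proxy URL is a placeholder, and it assumes a version of youtube-transcript-api that accepts a requests-style `proxies` mapping (as discussed in the linked GitHub issue):

```python
def build_proxy_config(proxy_url):
    """Return a requests-style proxies mapping that routes both HTTP and
    HTTPS traffic through the given proxy URL."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_transcript(video_id, proxy_url):
    """Fetch a YouTube transcript through a proxy.

    Assumption: youtube-transcript-api is installed and its
    get_transcript() accepts a `proxies` keyword (older API versions).
    Imported lazily so build_proxy_config stays usable without the package.
    """
    from youtube_transcript_api import YouTubeTranscriptApi

    return YouTubeTranscriptApi.get_transcript(
        video_id, proxies=build_proxy_config(proxy_url)
    )


# Example (placeholder proxy, video ID from the error log above):
# transcript = fetch_transcript("zjkBMFhNj_g", "http://user:pass@proxy.example:3128")
```

Whether this works still depends on the proxy's IP not itself being on YouTube's blocklist, which is why a residential or rotating proxy is usually suggested in that issue thread.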
Michelle Brain Manager • over 1 year ago
Thank you! I'll pass this along!