In one of our projects we use Talend to load a lot of data each night, and at random we get “Connection does not exists” error messages during the load. This can happen at any time, both during the connection phase and in the middle of the read, and so far we have been unable to see any real signs of why or when it happens. In addition, this connection reset often leaves our data corrupt and unusable, meaning we have to start all over. We have therefore set up error handling for reading from this data source.
Setting up a try/catch in Talend
Create a context variable called continueLooping. This boolean is used to end our loop when we reach our maximum number of attempts or the connection has been successful.
Add a tJava where you initialize the variable to true.
Then add a tLoop set to While, with context.continueLooping as the condition.
Now we add a tJavaFlex where our try/catch block will be. Put the try block in the start code and the catch block in the end code. Mine looks something like this. Feel free to add some logging here as well so you can keep track of where the error is happening. In the main code I make the job sleep for a little while in order to give the connection some time to come back up.
Add a tJava with an “On Component Ok” trigger from your database connection. Here, set continueLooping to false to stop the loop.
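To make the flow of the steps above easier to follow, here is a minimal sketch of the same logic as plain Java, outside of Talend. It is not the code from the actual job: the class name, the runWithRetry method, the maxAttempts and sleepMillis parameters, and the simulated failing read are all made up for illustration. The local continueLooping variable plays the role of context.continueLooping. Unlike the job, which sleeps in the tJavaFlex main code, this sketch sleeps only after a failed attempt.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryLoopSketch {

    /**
     * Runs readAttempt until it succeeds or maxAttempts is reached.
     * All names here are hypothetical; in the Talend job these pieces are
     * spread over a tJava, a tLoop, a tJavaFlex and a second tJava.
     */
    public static <T> T runWithRetry(Callable<T> readAttempt, int maxAttempts, long sleepMillis)
            throws Exception {
        boolean continueLooping = true;      // stands in for context.continueLooping
        int attempt = 0;
        Exception lastError = null;
        T result = null;

        while (continueLooping) {            // the tLoop running as a While loop
            attempt++;
            try {                            // try block: tJavaFlex start code
                result = readAttempt.call();
                lastError = null;
                continueLooping = false;     // the "On Component Ok" tJava stops the loop
            } catch (Exception e) {          // catch block: tJavaFlex end code
                lastError = e;
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                if (attempt >= maxAttempts) {
                    continueLooping = false; // maximum number of attempts reached
                } else {
                    Thread.sleep(sleepMillis); // give the connection time to come back
                }
            }
        }
        if (lastError != null) {
            throw lastError;                 // every attempt failed
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated read that fails twice before succeeding.
        String outcome = runWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Connection does not exists");
            }
            return "loaded";
        }, 5, 10L);
        System.out.println(outcome + " after " + calls[0] + " attempts");
        // prints "loaded after 3 attempts"
    }
}
```

Logging the attempt number in the catch block is one way to keep track of where the error happens, as suggested above.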
In the end it should look something like this:
Extending the error handling
In our case we are already looping our read by using a job above this one to read data one month at a time. The output of this job is large .csv files, which we then upload to Azure Blob Storage in order to use PolyBase to finally move the data into our SQL data warehouse. Since we know we can lose the connection in the middle of our read, we need to clean up our .csv files before starting the loop all over for the month that we are reading. This is done by adding an If trigger on the tJavaFlex, where the condition checks which iteration we are in. If we are not in the first iteration of our loop, something has gone wrong and we need to do some cleanup to make sure our data is correct in the end. We therefore remove all rows for the month we are supposed to read before we let the loop start over. The only way I have been able to do this is by creating a copy of our existing file, filtering out the rows for the current month, and then writing the result back as the original file. In the end it looks like this:
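The copy, filter and write-back cleanup described above can be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not the Talend components used in the job: the class and method names are made up, and the sketch assumes that each data row starts with a date in yyyy-MM-dd format, so that a prefix such as "2017-01" identifies the month. The real file layout is not shown in this post.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

public class MonthCleanupSketch {

    /**
     * Removes every line that starts with monthPrefix (for example "2017-01").
     * Assumption: the date is the first column of each row; adjust the filter
     * if the month is stored somewhere else.
     */
    public static void removeMonth(Path csvFile, String monthPrefix) throws IOException {
        // 1. Create a copy of the existing file.
        Path copy = csvFile.resolveSibling(csvFile.getFileName() + ".tmp");
        Files.copy(csvFile, copy, StandardCopyOption.REPLACE_EXISTING);

        // 2. Stream the copy line by line (the files are large) and write back
        //    only the rows that do not belong to the current month.
        try (BufferedReader in = Files.newBufferedReader(copy, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(csvFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.startsWith(monthPrefix)) {
                    out.write(line);
                    out.newLine();
                }
            }
        }

        // 3. Drop the temporary copy.
        Files.delete(copy);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("load", ".csv");
        Files.write(file, Arrays.asList(
                "date,amount",
                "2016-12-31,5",
                "2017-01-01,7",
                "2017-01-15,9"));
        removeMonth(file, "2017-01");
        // Only the header and the December row remain.
        System.out.println(Files.readAllLines(file));
        Files.delete(file);
    }
}
```

Reading and writing line by line keeps memory use flat, which matters when the .csv files are large.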
Overall this seems to work very nicely when we cannot trust our data source to keep the connection open for the whole duration of the load.
2017 is here, and once again my goal is to write here at least once a month. Last year I failed this goal, but this year? Let’s give it another go.
I have a list of projects I want to do, and I’ve finally started to track what I am doing every day. I’ve thought about this for a while, as I think it would be interesting to find out whether there are parameters in my life that affect things like my mood and energy level. To do this tracking I’ve made a very simple website hosted in Azure, with an Azure SQL Database behind it to store my data. This was needed to make the registration as easy as possible. I had an Excel sheet the first couple of days before I got this site up and running, and I would not recommend that approach at all if you’re going to do the same. It is too much effort to open an Excel sheet every time you drink something or do whatever else you want to register. So far I am very happy with how it is working out, but I have started to build up quite a backlog of ideas for improvements. I have told myself not to change anything during January and then do an evaluation of how I think it is working out and which changes are absolutely needed. I’ve also decided not to look at any of the data until January is over. I don’t want to be too aware of what I am logging in the beginning, but rather try to make logging into something I just do, and then when I have some months of data I can see if anything stands out.
So far I have split registration into two parts. The first part is activities during the day: things such as what I eat and drink, whether I work out and for how long, etc. The other part is a registration I do at the end of each day. Here I note down my mood, energy level, stress level and overall feeling, as well as some small notes about, for example, whether I hung out with friends during the day or did 100 math puzzles on an app I downloaded. As mentioned, I am hoping to see if there is something that affects, for example, my mood, so that I can approach these things more actively in the future. I’m not really sure if anything will pop up, but at least I’ll have some more data on myself and what I do during the day.
Ever done something like this, or have ideas for things that are worth noting down during the day? Let me know. I know some people take these kinds of things to the extreme and I probably won’t do that, but I’m always interested in new input or ideas.
Note at the end: I really like how easy it is to create a web app in Azure, and every time I push to the master branch in my Git repository the website updates within seconds. It was easy to set up a Visual Studio Team Services account and create the repository to start working right away.