
IDG Contributor Network: Just because something can be done …

One meme I repeat often is “there is always more than one solution.” I also often write about how enterprises should evaluate the available solutions and go with the best one for their needs. What I don’t mention as often is that in the real world there are many good (and not-so-good) reasons why the best solution just isn’t going to happen, or at least not in the necessary time frame (something I rant about too often). When I find myself in a place where the optimal route is blocked, I try to find a way around that is interesting, educational, and expedient, and generally in the opposite order.

A solutionist has to do what a solutionist has to do

I recently dealt with a specific case where the preferred option was off the table, and it is a situation I have seen multiple times, which prompted me to share the solution. The use case is when the files generated by daily operations of Informatica Cloud begin to fill up the disk space and cause issues. The optimal solution is a server archive process that is reusable across the enterprise, preferably designed as a generic service. The worst solution (aside from just ignoring the problem and dealing with the outages over and over) is to simply increase disk space. That approach is akin to playing Whack-a-Mole with the issue: it pushes the space problem out, a file index latency problem pops up in its place, and eventually the space problem comes back anyway.

Of course, file management scripts are not that hard to write. I’ve written a few for various purposes and chose Python as the language because it can be used anywhere a secure agent can be installed (i.e., Windows, Unix, and Linux). In this particular enterprise, getting scripts tested, installed, and managed is only slightly less challenging than getting an enterprise archive service in place, so that was not the route for this situation. Not to fear: the enterprise did have the advanced version of Informatica Cloud, meaning it had the Application Integration license (aka ICRT, for its former name, Informatica Cloud Real Time) in addition to the Data Integration license. So rather than go through the managed services group to manage the scripting, I used the Application Integration platform to do the job. Here’s how.

Make some room

The first iteration had to do with recovering space quickly while the retention policy was still unknown. One minor hurdle was that the data integrations run pretty much 24/7 (which is why we were running into the space issues in the first place). So, the solution started with creating a subfolder where the files were piling up and moving them into the new folder. Files in use wouldn’t move, so I didn’t have to worry much about touching files involved with an in-flight job, but I used a find based on the last modified date anyway to eliminate that concern completely. After the files were moved, it was trivial to add a command that compressed them. While I would have liked to use the tar command for this, I wanted to grow a single archive rather than create daily archives, and I did not want to have to explode and recompress each time, as that might push the disk space into the red. I named the temporary folders for the year and month and deleted each folder once it became two months old.
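
Boiled down to plain shell, the work the process needs to generate looks roughly like the sketch below. This is only a sketch: the folder names (202003, 202001) and the day threshold are placeholder values, and the real command is assembled dynamically from the process variables described next.

# Rough sketch of the generated shell work; 202003, 202001, and the 7-day
# threshold are placeholder values supplied by the process variables.
mkdir -p 202003
rm -rf 202001                                    # drop the folder that is now two months old
find . -maxdepth 1 -type f -mtime +7 -not -name "*.zip" -not -name "*.gz" \
  -print0 | xargs -0 -I{} mv {} 202003/          # move settled files into the dated folder
cd 202003
zip -q -r 202003.zip . -x "*.zip" -x "*.gz"      # grow a single archive for the month
find . -not -name "*.zip" -not -name "*.gz" -delete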

If you haven’t worked with ICRT, it is optimized for working with web services and treats everything as XML, using XQuery and XPath functions for managing data. So, this first pass consisted of defining variables (current_dated_folder, old_dated_folder, last_month_year, days_to_zip, and months_to_keep) and then putting them together into a command generated by XQuery functions:

let $startDate := if (fn:day-from-date(fn:current-date()) >= ($temp.days_to_zip + 1)) then
        fn:concat(fn:year-from-date(fn:current-date()), '-',
            fn:substring($temp.current_dated_folder, 5), '-01')
    else
        fn:concat($temp.last_month_year, '-01')
let $durationDays := fn:concat("P", $temp.days_to_zip, "D")
let $endDate := fn:substring(
        fn:string(fn:current-date() - xs:dayTimeDuration($durationDays)), 1, 10)
let $archive := fn:replace(fn:substring($startDate, 1, 7), '-', '')
return fn:concat('find . -not -name "*.gz" -not -name "*.7z" -mtime +', fn:string(31 * $input.months_to_keep), ' -delete;',
        'mkdir -p ', $temp.current_dated_folder,
        ';rm -rf ', $temp.old_dated_folder,
        ';find . -newermt "', $startDate, '" ! -newermt "', $endDate,
        '" -print0 | xargs -IF -0 mv F ', $archive,
        '/;cd ', $archive,
        ';if [ "$(ls --ignore="*.gz" --ignore="*.zip" | wc -l)" -ge "1" ]; then zip -q ', $archive,
        '.zip -r *.* -8 -x *.gz* -x *.zip* -x *.7z*;find . -not -name "*.zip" -not -name "*.gz" -delete; fi')
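
For illustration, if the process runs on March 15, 2020, with days_to_zip set to 7, months_to_keep set to 3, current_dated_folder set to 202003, and old_dated_folder set to 202001 (example values, nothing baked into the process), the expression resolves to a single command string along these lines (wrapped here for readability):

find . -not -name "*.gz" -not -name "*.7z" -mtime +93 -delete;
mkdir -p 202003;
rm -rf 202001;
find . -newermt "2020-03-01" ! -newermt "2020-03-08" -print0 | xargs -IF -0 mv F 202003/;
cd 202003;
if [ "$(ls --ignore="*.gz" --ignore="*.zip" | wc -l)" -ge "1" ]; then
  zip -q 202003.zip -r *.* -8 -x *.gz* -x *.zip* -x *.7z*;
  find . -not -name "*.zip" -not -name "*.gz" -delete;
fi

The leading find is the blanket cleanup for anything past the full retention window; the rest moves the settled files into the dated folder and folds them into that month’s single zip archive.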

Watch your step

The first time I deployed this to another org, I saw this ugly error:

{"error":{"code":500,"detail":{"reason":"Access denied.","code":"AeException"},"message":"Access denied."}}

I will save you the details of the torturous steps I went through to find the problem, but I will share where I found the answer, because this is not the first time it has happened to me. It was in a post I had made explaining to someone else how to fix it: go to the Admin console and enable the Shell Service.

Everything is a nail

Like all quick fixes, this one started with the immediate issue, which was fairly straightforward. The solution managed a log path that kept filling up because a file I/O connection was part of hourly data integrations. With that working, I began looking for other nails my new hammer could deal with. The next obvious use was the file archives used for reconciliation when issues were raised by the stakeholders of the target systems. For the most part, this was a simple enhancement: increasing the number of field variables used and adding input fields for the variables so the process could be called as a service (one of my favorite Informatica Cloud Application Integration features).
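
To give a feel for what that looks like from the caller’s side, here is a hypothetical invocation. The endpoint URL, credentials, and field names below are placeholders for illustration, not the actual API of this process; you get the real URL and payload shape from the process properties once it is published.

# Hypothetical call to the archive process exposed as a service.
# URL, credentials, and field names are placeholders, not the real API.
curl -u "$ICS_USER:$ICS_PASSWORD" \
  -H "Content-Type: application/json" \
  -d '{"archive_path": "/opt/infaagent/logs", "days_to_zip": "7", "months_to_keep": "3"}' \
  "https://<your-pod-host>/process/ArchiveFiles"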

Good development means good testing, so once things were working with log files as the default path, archive folders as an optional parameter, and the number of months retained configurable, I grabbed a bunch of files from test environments and began increasing their number and variety. That is when I found my first design flaw: the solution only dealt with files within the retention period and left the older files lingering around. While that was an improvement over the current situation of files piling up until the environment went down, it was not as good as it could be, so it was back to the process palette to figure out how to deal with the old files without making the process too difficult for junior maintenance team members to manage.

While I started by checking whether there were older files to be dealt with, I soon realized that was a superfluous check, because everything older than the retention period gets the same treatment, as do the files between the oldest retained month and the current month. I worked through getting the older files removed using the following:

fn:concat('find . -maxdepth 1 -not -name "*.gz" -not -name "*.7z" -mtime +',
    (fn:days-from-duration(fn:current-date() -
        (fn:current-date() - xs:yearMonthDuration(fn:concat('P', $input.months_to_keep, 'M'))))) - 1,
    ' -delete;')
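
For example, with months_to_keep set to 3 and the process running on April 15, 2020 (example values only), the date arithmetic inside days-from-duration works out to 91 days, so the expression resolves to roughly:

find . -maxdepth 1 -not -name "*.gz" -not -name "*.7z" -mtime +90 -delete;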

Once the files to be deleted were dealt with, I then went after those within a single month of the retention period.

let $startDate := xs:date(fn:concat(fn:substring($temp.current_dated_folder, 1, 4),
        '-', fn:substring($temp.current_dated_folder, 5), '-01'))
let $endDate := $startDate + xs:yearMonthDuration("P1M")
let $archive := fn:replace(fn:substring(fn:string($startDate), 1, 7), '-', '')
return fn:concat('mkdir -p ', $temp.current_dated_folder,
        ';find . -newermt "', $startDate, '" ! -newermt "', $endDate,
        '" -print0 | xargs -IF -0 mv F ', $archive,
        '/;cd ', $archive,
        ';if [ "$(ls --ignore="*.gz" --ignore="*.zip" | wc -l)" -ge "1" ]; then zip -q ', $archive,
        '.zip -r *.* -8 -x *.gz* -x *.zip* -x *.7z*;find . -not -name "*.zip" -not -name "*.gz" -delete; fi')

Once the monthly command was figured out, I stuck a decision step in to iterate as necessary and all was good.

But wait, there’s more … no, it was just gas

Of course, once I had the process for managing older files in place, I realized I could modify it to accommodate current files, too. However, the adage “if it works, don’t fix it” came to mind. If I were building the process as a classroom example, I definitely would have taken the time to refactor. But given that this was a real-world work situation and the resulting implementation was easy to follow as it was, I left it with the slightly redundant step. I also left in a test-only step used to speed up development, because it has no overhead and will make future enhancements easier.

If you would like a copy of the final process to learn from or reuse, DM me on Twitter @ssnsolutionist with your email address and a link to this article.

This article is published as part of the IDG Contributor Network.
