Amazon Simple Storage Service (Amazon S3) provides a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web.
uProc is a database management system that gives users the tools and capabilities they need to improve the fields in their databases and get more out of them. It helps businesses validate essential business data such as emails and phone numbers, and create new database categories for better data segmentation.
uProc Integrations

Amazon S3 + uProc: Select Tool in uProc when New or Updated File is created in Amazon S3
Amazon S3 + Gmail: Send Email in Gmail when New or Updated File is created in Amazon S3
Amazon S3 + Google Sheets: Create Spreadsheet Row in Google Sheets from New or Updated File in Amazon S3

It's easy to connect Amazon S3 + uProc without coding knowledge. Start creating your own business flow.
Triggers when you add or update a file in a specific bucket. (The bucket must contain fewer than 10,000 total files.)
Creates a new bucket.
Creates a brand new text file from plain text content you specify.
Copy an already-existing file or attachment from the trigger service.
Select a tool to perform verification or enrichment
Amazon Simple Storage Service (Amazon S3) is one of the most popular web services provided by Amazon. It provides an interface for uploading and downloading objects.
Amazon S3 provides a simple web service interface to store and retrieve any amount of data, at any time, from anywhere on the web. It gives developers and businesses easy-to-use cloud storage and data processing services with competitive prices.
The S3 service is designed to scale automatically, so you can create and launch as many objects as you want without worrying about capacity planning.
S3 provides a highly durable, highly available, and highly scalable object-based storage infrastructure that can be accessed from anywhere in the world via HTTP or HTTPS using the REST API.
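As a rough illustration of that REST addressing scheme, every object is reachable at a predictable HTTPS URL built from its bucket name and key. The sketch below just assembles those URLs as strings; the bucket, key, and region values are made-up examples:

```python
def s3_urls(bucket, key, region="us-east-1"):
    """Return the virtual-hosted-style and path-style REST URLs for an object."""
    virtual_hosted = "https://%s.s3.%s.amazonaws.com/%s" % (bucket, region, key)
    path_style = "https://s3.%s.amazonaws.com/%s/%s" % (region, bucket, key)
    return virtual_hosted, path_style

vh, ps = s3_urls("demo-bucket", "reports/2020/sales.csv")
print(vh)  # https://demo-bucket.s3.us-east-1.amazonaws.com/reports/2020/sales.csv
print(ps)  # https://s3.us-east-1.amazonaws.com/demo-bucket/reports/2020/sales.csv
```

Any HTTP client (or the boto library used later in this article) can issue GET and PUT requests against such URLs, subject to the bucket's access policy.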
Amazon S3 is designed for 99.999999999% (eleven nines) durability, achieved by redundantly storing objects across multiple facilities within a region. Note, however, that a deleted object is not recoverable unless versioning is enabled on the bucket.
uProc is a framework used to process data stored in files, HDFS, or other sources into tables that can be queried easily. The framework was developed by Netflix and is publicly available under the Apache 2.0 license.
Integrating uProc with Amazon S3 is pretty straightforward. To do this, we will perform the following steps:
1. Convert an object stored in Amazon S3 into a format that can be processed by uProc
2. Categorize the converted data
3. Process the data in a pipeline
4. Save the processed data back into Amazon S3
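The four steps can be sketched in plain Python. Everything in this sketch (the CSV input, the category rule, and the local output file standing in for an S3 PUT) is a hypothetical stand-in for the real uProc and boto calls:

```python
def convert(raw_csv):
    """Step 1: turn raw CSV text into row dictionaries (stand-in for S3 fetch + parse)."""
    lines = raw_csv.strip().splitlines()
    header = lines[0].split(",")
    return [dict(zip(header, line.split(","))) for line in lines[1:]]

def categorize(rows):
    """Step 2: tag each row with a category (hypothetical rule)."""
    for row in rows:
        row["category"] = "big" if int(row["amount"]) >= 100 else "small"
    return rows

def process(rows):
    """Step 3: run the pipeline; here we simply keep the 'big' rows."""
    return [r for r in rows if r["category"] == "big"]

def save(rows, path):
    """Step 4: persist the result (a local file standing in for an S3 upload)."""
    with open(path, "w") as f:
        for r in rows:
            f.write("%s,%s\n" % (r["name"], r["amount"]))

raw = "name,amount\nalice,150\nbob,40\ncarol,220"
save(process(categorize(convert(raw))), "output.csv")
```

In the real integration, `convert` would read the object via boto and `save` would write it back to a bucket; the pipeline shape stays the same.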
The software needed for doing this is as follows:
Python 2.7 or greater – the 2.x series of Python can access Amazon S3 via the boto library
uProc framework – https://github.com/Netflix/uProc
boto library – https://pypi.python.org/pypi/boto/1.4.6
ZSH – http://www.zsh.org/
aws-cli – http://awscli.readthedocs.io/en/latest/index.html#downloading-and-installing-aws-cli-as-a-local-module
unzip – http://www.info-zip.org/UnZip.html
bz2 – https://pypi.python.org/pypi/bzip2/1.0.6#downloads
First, we need to install the required software:
$ sudo yum install python27 python27-devel   # Python 2
$ sudo yum install python33 python33-devel   # Python 3
$ sudo pip install boto                      # Python
$ sudo pip install bz2                       # Python
$ sudo pip install uproc                     # Python
$ export PATH=$PATH:/opt/aws/bin
$ zsh                                        # Shell
Once we have installed all of these, we can now create the directory structure in which we will store everything related to Amazon S3:
$ mkdir /tmp/${USER}_demo_directory
$ touch /tmp/${USER}_demo_directory/.gitignore
$ touch /tmp/${USER}_demo_directory/README
$ touch /tmp/${USER}_demo_directory/requirements.txt
$ touch /tmp/${USER}_demo_directory/requirements_uProc.txt
$ touch /tmp/${USER}_demo_directory/_shell_history

Next, we copy the uProc code into the directory, then download the libraries it needs and build them. Now we need to build a script that will execute the entire processing flow. We save this file as 'processObjectWithUproc' and run it; the output of the script is stored in the 'output' folder, so we can use it later. Then we create a file 'requirements_uProc' containing the uProc libraries we need, in this case two: 'uwtable' and 'uwfile'. Finally, we create another file 'requirements_amazonS3' containing the requirements for Amazon S3; in this case just one library, 'bs4'. Now we can run the script again, and its output is again stored in the 'output' folder. The next step is to define a configuration file for running the script. In our case, it's simply a text file containing the name of our table; we call it 'configureTable'. After this, we run our script again with this configuration file. Now we can see that a new table was created by our script. We can query it with SQL, or with Python.
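To make that last step concrete, here is a minimal sketch of querying such a table from Python. It uses an in-memory SQLite table in place of the real uProc output, and the table and column names are invented:

```python
import sqlite3

# Hypothetical stand-in for the table produced by the processing script.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_table (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO demo_table VALUES (?, ?)",
                 [("alice", 150), ("bob", 40)])

# Query it with SQL, from Python.
rows = conn.execute(
    "SELECT name FROM demo_table WHERE amount > 100").fetchall()
print(rows)  # [('alice',)]
```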
But what if we want to save only some rows or columns? We can easily do this too: just use parameters in our configuration file. That's all!
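As a sketch of what such row and column parameters might look like, the snippet below filters rows with a predicate and then projects the requested columns; the `config` keys and the sample data are invented for illustration, not part of uProc's actual configuration format:

```python
# Hypothetical configuration: which columns to keep, and a row filter.
config = {"columns": ["name"], "where": lambda row: row["amount"] > 100}

def select(rows, config):
    """Apply the row filter, then project the requested columns."""
    kept = [r for r in rows if config["where"](r)]
    return [{c: r[c] for c in config["columns"]} for r in kept]

data = [{"name": "alice", "amount": 150}, {"name": "bob", "amount": 40}]
print(select(data, config))  # [{'name': 'alice'}]
```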
The process to integrate Amazon S3 and uProc may seem complicated and intimidating. This is why Appy Pie Connect has come up with a simple, affordable, and quick solution to help you automate your workflows. Click on the button below to begin.