Note
You are viewing the documentation for an older version of boto (boto2).
Boto3, the next version of Boto, is now stable and recommended for general use. It can be used side-by-side with Boto in the same project, so it is easy to start using Boto3 in your existing projects as well as new projects. Going forward, API updates and all new feature work will be focused on Boto3.
For more information, see the documentation for boto3.
EMR¶
boto.emr¶
This module provides an interface to the Elastic MapReduce (EMR) service from AWS.
-
boto.emr.
connect_to_region
(region_name, **kw_params)¶
-
boto.emr.
regions
()¶ Get all available regions for the Amazon Elastic MapReduce service.
Return type: list Returns: A list of boto.regioninfo.RegionInfo
boto.emr.connection¶
Represents a connection to the EMR service
-
class
boto.emr.connection.
EmrConnection
(aws_access_key_id=None, aws_secret_access_key=None, is_secure=True, port=None, proxy=None, proxy_port=None, proxy_user=None, proxy_pass=None, debug=0, https_connection_factory=None, region=None, path='/', security_token=None, validate_certs=True, profile_name=None)¶ -
APIVersion
= '2009-03-31'¶
-
DebuggingArgs
= 's3://{region_name}.elasticmapreduce/libs/state-pusher/0.1/fetch'¶
-
DebuggingJar
= 's3://{region_name}.elasticmapreduce/libs/script-runner/script-runner.jar'¶
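The `{region_name}` placeholder in DebuggingArgs and DebuggingJar suggests that both values are templates filled in per region. A minimal sketch of that substitution using plain `str.format`, with the template strings copied from above; the helper name `debugging_uris` and the example region are illustrative assumptions, not part of boto:

```python
# Template strings copied from the class attributes documented above.
DEBUGGING_ARGS = 's3://{region_name}.elasticmapreduce/libs/state-pusher/0.1/fetch'
DEBUGGING_JAR = 's3://{region_name}.elasticmapreduce/libs/script-runner/script-runner.jar'


def debugging_uris(region_name):
    """Return (jar_uri, args_uri) with the region placeholder filled in.

    Hypothetical helper for illustration; boto does not expose this function.
    """
    return (DEBUGGING_JAR.format(region_name=region_name),
            DEBUGGING_ARGS.format(region_name=region_name))


jar_uri, args_uri = debugging_uris('us-west-2')
print(jar_uri)
print(args_uri)
```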
-
DefaultRegionEndpoint
= 'elasticmapreduce.us-east-1.amazonaws.com'¶
-
DefaultRegionName
= 'us-east-1'¶
-
ResponseError
¶ alias of
boto.exception.EmrResponseError
-
add_instance_groups
(jobflow_id, instance_groups)¶ Adds instance groups to a running cluster.
Parameters: - jobflow_id (str) – The id of the jobflow which will take the new instance groups
- instance_groups (list(boto.emr.InstanceGroup)) – A list of instance groups to add to the job
-
add_jobflow_steps
(jobflow_id, steps)¶ Adds steps to a jobflow
Parameters: - jobflow_id (str) – The job flow id
- steps (list(boto.emr.Step)) – A list of steps to add to the job
-
add_tags
(resource_id, tags)¶ Create new metadata tags for the specified resource id.
Parameters: - resource_id (str) – The cluster id
- tags (dict) – A dictionary containing the name/value pairs. If you want to create only a tag name, the value for that tag should be the empty string (e.g. ‘’) or None.
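The tag dictionary described above mixes name/value pairs with name-only tags, which carry an empty string as their value. A small sketch of building such a dictionary; `build_tags` and the tag names are hypothetical and only illustrate the shape of the argument:

```python
def build_tags(valued=None, name_only=()):
    """Merge name/value tags with name-only tags (empty-string values).

    Hypothetical helper; only the resulting dict shape comes from the docs.
    """
    tags = dict(valued or {})
    for tag_name in name_only:
        # setdefault keeps an explicit value if the name appears in both inputs.
        tags.setdefault(tag_name, '')
    return tags


tags = build_tags({'team': 'analytics'}, name_only=['temporary'])
print(tags)
```

The resulting dictionary could then be passed as the `tags` argument of the tag-creation call (`add_tags` on an `EmrConnection`), together with the cluster id.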
-
describe_cluster
(cluster_id)¶ Describes an Elastic MapReduce cluster
Parameters: cluster_id (str) – The cluster id of interest
-
describe_jobflow
(jobflow_id)¶ This method is deprecated. We recommend you use list_clusters, describe_cluster, list_steps, list_instance_groups and list_bootstrap_actions instead.
Describes a single Elastic MapReduce job flow
Parameters: jobflow_id (str) – The job flow id of interest
-
describe_jobflows
(states=None, jobflow_ids=None, created_after=None, created_before=None)¶ This method is deprecated. We recommend you use list_clusters, describe_cluster, list_steps, list_instance_groups and list_bootstrap_actions instead.
Retrieve all the Elastic MapReduce job flows on your account
Parameters: - states (list) – A list of strings with job flow states wanted
- jobflow_ids (list) – A list of job flow IDs
- created_after (datetime) – Bound on job flow creation time
- created_before (datetime) – Bound on job flow creation time
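Since describe_jobflow and describe_jobflows are deprecated in favour of the finer-grained calls, one way to gather roughly the same information is to combine those calls for a single cluster. The sketch below assumes `conn` is an `EmrConnection` (or any object with the same methods); the function name and the returned dictionary layout are illustrative choices, and only the first page of each list call is fetched:

```python
def describe_cluster_details(conn, cluster_id):
    """Collect the pieces that replace a describe_jobflow call.

    Returns the raw response objects without interpreting them, so no
    assumption is made here about their attributes.
    """
    return {
        'cluster': conn.describe_cluster(cluster_id),
        'steps': conn.list_steps(cluster_id),
        'instance_groups': conn.list_instance_groups(cluster_id),
        'bootstrap_actions': conn.list_bootstrap_actions(cluster_id),
    }
```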
-
describe_step
(cluster_id, step_id)¶ Describe an Elastic MapReduce step
Parameters: - cluster_id (str) – The cluster id of interest
- step_id (str) – The step id of interest
-
list_bootstrap_actions
(cluster_id, marker=None)¶ Get a list of bootstrap actions for an Elastic MapReduce cluster
Parameters: - cluster_id (str) – The cluster id of interest
- marker (str) – Pagination marker
-
list_clusters
(created_after=None, created_before=None, cluster_states=None, marker=None)¶ List Elastic MapReduce clusters with optional filtering
Parameters: - created_after (datetime) – Bound on cluster creation time
- created_before (datetime) – Bound on cluster creation time
- cluster_states (list) – Bound on cluster states
- marker (str) – Pagination marker
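The `marker` parameter on the list calls implies that results come back one page at a time. A generic sketch of walking all pages follows; it assumes that each response exposes a `marker` attribute when more results are available and lacks it (or leaves it empty) on the last page, which should be checked against the response objects in boto.emr.emrobject:

```python
def iter_pages(list_call, *args, **kwargs):
    """Yield successive pages from a marker-paginated list call.

    `list_call` is any callable accepting a `marker` keyword argument,
    for example a bound method such as conn.list_clusters.
    """
    marker = None
    while True:
        page = list_call(*args, marker=marker, **kwargs)
        yield page
        # Assumption: the next-page token is exposed as `page.marker`.
        marker = getattr(page, 'marker', None)
        if not marker:
            break
```

With a real connection this might be used as `for page in iter_pages(conn.list_clusters, cluster_states=['RUNNING']): ...`, reading the cluster summaries from each page.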
-
list_instance_groups
(cluster_id, marker=None)¶ List EC2 instance groups in a cluster
Parameters: - cluster_id (str) – The cluster id of interest
- marker (str) – Pagination marker
-
list_instances
(cluster_id, instance_group_id=None, instance_group_types=None, marker=None)¶ List EC2 instances in a cluster
Parameters: - cluster_id (str) – The cluster id of interest
- instance_group_id (str) – The EC2 instance group id of interest
- instance_group_types (list) – Filter by EC2 instance group type
- marker (str) – Pagination marker
-
list_steps
(cluster_id, step_states=None, marker=None)¶ List cluster steps
Parameters: - cluster_id (str) – The cluster id of interest
- step_states (list) – Filter by step states
- marker (str) – Pagination marker
-
modify_instance_groups
(instance_group_ids, new_sizes)¶ Modify the number of nodes and configuration settings in an instance group.
Parameters: - instance_group_ids (list(str)) – A list of the IDs of the instance groups to be modified
- new_sizes (list(int)) – A list of the new sizes for each instance group
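The two arguments above are parallel lists, so each size presumably applies to the group ID at the same position. A short sketch of deriving both lists from a single mapping so they cannot drift out of alignment; the helper name and the instance group IDs are made up for illustration:

```python
def resize_args(target_sizes):
    """Split {instance_group_id: new_size} into two position-aligned lists."""
    group_ids = sorted(target_sizes)
    new_sizes = [int(target_sizes[group_id]) for group_id in group_ids]
    return group_ids, new_sizes


# Hypothetical instance group IDs.
ids, sizes = resize_args({'ig-EXAMPLE2': 4, 'ig-EXAMPLE1': 10})
print(ids, sizes)
```

The pair could then be passed on as `conn.modify_instance_groups(ids, sizes)`.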
-
remove_tags
(resource_id, tags)¶ Remove metadata tags for the specified resource id.
Parameters: - resource_id (str) – The cluster id
- tags (list) – A list of tag names to remove.
-
run_jobflow
(name, log_uri=None, ec2_keyname=None, availability_zone=None, master_instance_type='m1.small', slave_instance_type='m1.small', num_instances=1, action_on_failure='TERMINATE_JOB_FLOW', keep_alive=False, enable_debugging=False, hadoop_version=None, steps=None, bootstrap_actions=[], instance_groups=None, additional_info=None, ami_version=None, api_params=None, visible_to_all_users=None, job_flow_role=None, service_role=None)¶ Runs a job flow
Parameters: - name (str) – Name of the job flow
- log_uri (str) – URI of the S3 bucket to place logs
- ec2_keyname (str) – EC2 key used for the instances
- availability_zone (str) – EC2 availability zone of the cluster
- master_instance_type (str) – EC2 instance type of the master
- slave_instance_type (str) – EC2 instance type of the slave nodes
- num_instances (int) – Number of instances in the Hadoop cluster
- action_on_failure (str) – Action to take if a step terminates
- keep_alive (bool) – Denotes whether the cluster should stay alive upon completion
- enable_debugging (bool) – Denotes whether AWS console debugging should be enabled.
- hadoop_version (str) – Version of Hadoop to use. This no longer defaults to ‘0.20’ and now uses the AMI default.
- steps (list(boto.emr.Step)) – List of steps to add with the job
- bootstrap_actions (list(boto.emr.BootstrapAction)) – List of bootstrap actions that run before Hadoop starts.
- instance_groups (list(boto.emr.InstanceGroup)) – Optional list of instance groups to use when creating this job. NB: When provided, this argument supersedes num_instances and master/slave_instance_type.
- ami_version (str) – Amazon Machine Image (AMI) version to use for instances. Values accepted by EMR are ‘1.0’, ‘2.0’, and ‘latest’; EMR currently defaults to ‘1.0’ if you don’t set ‘ami_version’.
- additional_info (JSON str) – A JSON string for selecting additional features
- api_params (dict) – a dictionary of additional parameters to pass directly to the EMR API (so you don’t have to upgrade boto to use new EMR features). You can also delete an API parameter by setting it to None.
- visible_to_all_users (bool) – Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. If this value is set to True, all IAM users of that AWS account can view and (if they have the proper policy permissions set) manage the job flow. If it is set to False, only the IAM user that created the job flow can view and manage it.
- job_flow_role (str) – An IAM role for the job flow. The EC2 instances of the job flow assume this role. The default role is EMRJobflowDefault. In order to use the default role, you must have already created it using the CLI.
- service_role (str) – The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf.
Return type: str
Returns: The jobflow id
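A sketch of how the parameters above might be combined in one call. It assumes `conn` is an `EmrConnection` and `steps` is a list of step objects from boto.emr.step; the wrapper name `launch_jobflow`, the instance count, and the choice of which keywords to set are illustrative, while the keyword names themselves come from the signature above:

```python
def launch_jobflow(conn, name, steps, log_uri=None, api_params=None):
    """Start a small transient job flow and return its id.

    Hypothetical wrapper; it only forwards documented keyword arguments.
    """
    return conn.run_jobflow(
        name,
        log_uri=log_uri,
        master_instance_type='m1.small',
        slave_instance_type='m1.small',
        num_instances=3,
        keep_alive=False,        # let the cluster shut down after the steps
        steps=steps,
        api_params=api_params,   # extra raw EMR API parameters, if any
    )
```

Because `instance_groups` supersedes `num_instances` and the master/slave instance types, a variant that passes instance groups would drop those three keywords.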
-
set_termination_protection
(jobflow_id, termination_protection_status)¶ Set termination protection on specified Elastic MapReduce job flows
Parameters: - jobflow_id (list or str) – A job flow ID, or a list of job flow IDs
- termination_protection_status (bool) – Termination protection status
-
set_visible_to_all_users
(jobflow_id, visibility)¶ Set whether specified Elastic MapReduce job flows are visible to all IAM users
Parameters: - jobflow_id (list or str) – A job flow ID, or a list of job flow IDs
- visibility (bool) – Visibility
-
terminate_jobflow
(jobflow_id)¶ Terminate an Elastic MapReduce job flow
Parameters: jobflow_id (str) – A jobflow id
-
terminate_jobflows
(jobflow_ids)¶ Terminate one or more Elastic MapReduce job flows
Parameters: jobflow_ids (list) – A list of job flow IDs
-
boto.emr.step¶
-
class
boto.emr.step.
HiveBase
(name, **kw)¶ -
BaseArgs
= ['s3n://us-east-1.elasticmapreduce/libs/hive/hive-script', '--base-path', 's3n://us-east-1.elasticmapreduce/libs/hive/']¶
-
-
class
boto.emr.step.
HiveStep
(name, hive_file, hive_versions='latest', hive_args=None)¶ Hive script step
-
class
boto.emr.step.
InstallHiveStep
(hive_versions='latest', hive_site=None)¶ Install Hive on EMR step
-
InstallHiveName
= 'Install Hive'¶
-
-
class
boto.emr.step.
InstallPigStep
(pig_versions='latest')¶ Install Pig on EMR step
-
InstallPigName
= 'Install Pig'¶
-
-
class
boto.emr.step.
JarStep
(name, jar, main_class=None, action_on_failure='TERMINATE_JOB_FLOW', step_args=None)¶ Custom jar step
An Elastic MapReduce step that executes a jar
Parameters: - name (str) – The name of the step
- jar (str) – S3 URI to the Jar file
- main_class (str) – The class to execute in the jar
- action_on_failure (str) – An action, defined in the EMR docs, to take on failure.
- step_args (list(str)) – A list of arguments to pass to the step
-
args
()¶ Return type: list(str) Returns: List of arguments for the step
-
jar
()¶ Return type: str Returns: URI to the jar
-
main_class
()¶ Return type: str Returns: The main class name
-
class
boto.emr.step.
PigBase
(name, **kw)¶ -
BaseArgs
= ['s3n://us-east-1.elasticmapreduce/libs/pig/pig-script', '--base-path', 's3n://us-east-1.elasticmapreduce/libs/pig/']¶
-
-
class
boto.emr.step.
PigStep
(name, pig_file, pig_versions='latest', pig_args=[])¶ Pig script step
-
class
boto.emr.step.
ScriptRunnerStep
(name, **kw)¶ -
ScriptRunnerJar
= 's3n://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar'¶
-
-
class
boto.emr.step.
Step
¶ Jobflow Step base class
-
args
()¶ Return type: list(str) Returns: List of arguments for the step
-
jar
()¶ Return type: str Returns: URI to the jar
-
main_class
()¶ Return type: str Returns: The main class name
-
-
class
boto.emr.step.
StreamingStep
(name, mapper, reducer=None, combiner=None, action_on_failure='TERMINATE_JOB_FLOW', cache_files=None, cache_archives=None, step_args=None, input=None, output=None, jar='/home/hadoop/contrib/streaming/hadoop-streaming.jar')¶ Hadoop streaming step
A Hadoop streaming Elastic MapReduce step
Parameters: - name (str) – The name of the step
- mapper (str) – The mapper URI
- reducer (str) – The reducer URI
- combiner (str) – The combiner URI. Only works for Hadoop 0.20 and later!
- action_on_failure (str) – An action, defined in the EMR docs, to take on failure.
- cache_files (list(str)) – A list of cache files to be bundled with the job
- cache_archives (list(str)) – A list of jar archives to be bundled with the job
- step_args (list(str)) – A list of arguments to pass to the step
- input (str or a list of str) – The input uri
- output (str) – The output uri
- jar (str) – The hadoop streaming jar. This can be either a local path on the master node, or an s3:// URI.
-
args
()¶ Return type: list(str) Returns: List of arguments for the step
-
jar
()¶ Return type: str Returns: URI to the jar
-
main_class
()¶ Return type: str Returns: The main class name
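As an illustration of the StreamingStep parameters, the sketch below fills in the main keyword arguments for a word-count style job. The step class is passed in as `step_cls` so that the snippet does not itself require boto to be importable; the bucket layout, file names, and helper name are hypothetical:

```python
def make_streaming_step(step_cls, bucket):
    """Build a streaming step whose scripts and data live in one S3 bucket.

    `step_cls` is expected to accept the StreamingStep keyword arguments.
    """
    base = 's3://%s' % bucket
    return step_cls(
        name='Word count',
        mapper=base + '/code/mapper.py',
        reducer=base + '/code/reducer.py',
        input=base + '/input/',
        output=base + '/output/',
    )
```

With boto installed this might be called as `make_streaming_step(StreamingStep, 'my-bucket')` after `from boto.emr.step import StreamingStep`, and the result passed in the `steps` list of run_jobflow or add_jobflow_steps.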
boto.emr.emrobject¶
This module contains EMR response objects
-
class
boto.emr.emrobject.
AddInstanceGroupsResponse
(connection=None)¶ -
Fields
= set(['InstanceGroupIds', 'JobFlowId'])¶
-
-
class
boto.emr.emrobject.
Application
(connection=None)¶ -
Fields
= set(['AdditionalInfo', 'Args', 'Name', 'Version'])¶
-
-
class
boto.emr.emrobject.
BootstrapAction
(connection=None)¶ -
Fields
= set(['Args', 'Name', 'Path', 'ScriptPath'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
BootstrapActionList
(connection=None)¶ -
Fields
= set(['Marker'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
Cluster
(connection=None)¶ -
Fields
= set(['AutoTerminate', 'Id', 'LogUri', 'MasterPublicDnsName', 'Name', 'NormalizedInstanceHours', 'RequestedAmiVersion', 'RunningAmiVersion', 'ServiceRole', 'TerminationProtected', 'VisibleToAllUsers'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
ClusterStateChangeReason
(connection=None)¶ -
Fields
= set(['Code', 'Message'])¶
-
-
class
boto.emr.emrobject.
ClusterStatus
(connection=None)¶ -
Fields
= set(['State', 'StateChangeReason', 'Timeline'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
ClusterSummary
(connection)¶ -
Fields
= set(['Id', 'Name', 'NormalizedInstanceHours'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
ClusterSummaryList
(connection)¶ -
Fields
= set(['Marker'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
ClusterTimeline
(connection=None)¶ -
Fields
= set(['CreationDateTime', 'EndDateTime', 'ReadyDateTime'])¶
-
-
class
boto.emr.emrobject.
Ec2InstanceAttributes
(connection=None)¶ -
Fields
= set(['Ec2AvailabilityZone', 'Ec2KeyName', 'Ec2SubnetId', 'IamInstanceProfile'])¶
-
-
class
boto.emr.emrobject.
EmrObject
(connection=None)¶ -
Fields
= set([])¶
-
endElement
(name, value, connection)¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
HadoopStep
(connection=None)¶ -
Fields
= set(['ActionOnFailure', 'Id', 'Name'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
InstanceGroup
(connection=None)¶ -
Fields
= set(['BidPrice', 'CreationDateTime', 'EndDateTime', 'InstanceGroupId', 'InstanceRequestCount', 'InstanceRole', 'InstanceRunningCount', 'InstanceType', 'LastStateChangeReason', 'LaunchGroup', 'Market', 'Name', 'ReadyDateTime', 'StartDateTime', 'State'])¶
-
-
class
boto.emr.emrobject.
InstanceGroupInfo
(connection=None)¶ -
Fields
= set(['BidPrice', 'Id', 'InstanceGroupType', 'InstanceType', 'Market', 'Name', 'RequestedInstanceCount', 'RunningInstanceCount'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
InstanceGroupList
(connection=None)¶ -
Fields
= set(['Marker'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
InstanceInfo
(connection=None)¶ -
Fields
= set(['Ec2InstanceId', 'Id', 'PrivateDnsName', 'PrivateIpAddress', 'PublicDnsName', 'PublicIpAddress'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
InstanceList
(connection=None)¶ -
Fields
= set(['Marker'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
JobFlow
(connection=None)¶ -
Fields
= set(['AmiVersion', 'AvailabilityZone', 'CreationDateTime', 'Ec2KeyName', 'EndDateTime', 'HadoopVersion', 'Id', 'InstanceCount', 'JobFlowId', 'KeepJobFlowAliveWhenNoSteps', 'LastStateChangeReason', 'LogUri', 'MasterInstanceId', 'MasterInstanceType', 'MasterPublicDnsName', 'Name', 'NormalizedInstanceHours', 'ReadyDateTime', 'RequestId', 'SlaveInstanceType', 'StartDateTime', 'State', 'TerminationProtected', 'Type', 'Value', 'VisibleToAllUsers'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
ModifyInstanceGroupsResponse
(connection=None)¶ -
Fields
= set(['RequestId'])¶
-
-
class
boto.emr.emrobject.
Step
(connection=None)¶ -
Fields
= set(['ActionOnFailure', 'CreationDateTime', 'EndDateTime', 'Jar', 'LastStateChangeReason', 'MainClass', 'Name', 'StartDateTime', 'State'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
StepConfig
(connection=None)¶ -
Fields
= set(['Jar', 'MainClass'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
StepId
(connection=None)¶
-
class
boto.emr.emrobject.
StepSummary
(connection=None)¶ -
Fields
= set(['Id', 'Name'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
StepSummaryList
(connection=None)¶ -
Fields
= set(['Marker'])¶
-
startElement
(name, attrs, connection)¶
-
-
class
boto.emr.emrobject.
SupportedProduct
(connection=None)¶