eland.read_csv
Read a comma-separated values (csv) file into eland.DataFrame (i.e. an Elasticsearch index).
Modifies an Elasticsearch index.
Note: pandas iteration options are not supported.
es_dest_index: Name of the Elasticsearch index to be appended to.
es_if_exists: How to behave if the index already exists.
es_geo_points: List of columns to map to the geo_point data type.
chunksize: Number of CSV rows to read before bulk indexing into Elasticsearch.
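The chunking behaviour described above (reading a fixed number of CSV rows before each bulk index request) can be illustrated with a minimal standard-library sketch. This is not eland's implementation: `iter_csv_chunks`, `index_in_chunks` and the `bulk_index` callback are hypothetical names used only for illustration, and the callback stands in for whatever actually sends a batch to Elasticsearch.

```python
import csv
import io
from itertools import islice
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def iter_csv_chunks(handle, chunksize: int) -> Iterator[List[Row]]:
    """Yield lists of at most `chunksize` CSV rows (as dicts) from an open file."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            return
        yield chunk


def index_in_chunks(handle, chunksize: int,
                    bulk_index: Callable[[List[Row]], None]) -> int:
    """Pass each chunk to `bulk_index` (hypothetical) and return the row count."""
    total = 0
    for chunk in iter_csv_chunks(handle, chunksize):
        bulk_index(chunk)  # one bulk request per chunk
        total += len(chunk)
    return total


# Demo with an in-memory CSV; batches.append stands in for a real bulk call.
sample = io.StringIO("state,area code\nKS,415\nOH,415\nNJ,415\n")
batches: List[List[Row]] = []
indexed = index_in_chunks(sample, chunksize=2, bulk_index=batches.append)
print(indexed, [len(b) for b in batches])  # prints: 3 [2, 1]
```

A smaller chunk size keeps less of the file in memory at once but issues more bulk requests; a larger one does the opposite.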
See also
Notes
The iterator option is not supported.
Examples
See if ‘churn’ index exists in Elasticsearch
>>> from elasticsearch import Elasticsearch # doctest: +SKIP
>>> es = Elasticsearch() # doctest: +SKIP
>>> es.indices.exists(index="churn") # doctest: +SKIP
False
Read ‘churn.csv’ and use first column as _id (and eland.DataFrame index)
# churn.csv
,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,0
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,0
...
>>> import eland as ed # doctest: +SKIP
>>> ed.read_csv("churn.csv",
...             es_client='localhost',
...             es_dest_index='churn',
...             es_refresh=True,
...             index_col=0) # doctest: +SKIP
      account length  area code  churn  customer service calls  ...  total night calls  total night charge  total night minutes  voice mail plan
0                128        415      0                       1  ...                 91               11.01                244.7              yes
1                107        415      0                       1  ...                103               11.45                254.4              yes
2                137        415      0                       0  ...                104                7.32                162.6               no
3                 84        408      0                       2  ...                 89                8.86                196.9               no
4                 75        415      0                       3  ...                121                8.41                186.9               no
...              ...        ...    ...                     ...  ...                ...                 ...                  ...              ...
3328             192        415      0                       2  ...                 83               12.56                279.1              yes
3329              68        415      0                       3  ...                123                8.61                191.3               no
3330              28        510      0                       2  ...                 91                8.64                191.9               no
3331             184        510      0                       2  ...                137                6.26                139.2               no
3332              74        415      0                       0  ...                 77               10.86                241.4              yes
<BLANKLINE>
[3333 rows x 21 columns]
Validate data now exists in ‘churn’ index:
>>> es.search(index="churn", size=1) # doctest: +SKIP
{'took': 1, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 3333, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'churn', '_id': '0', '_score': 1.0, '_source': {'state': 'KS', 'account length': 128, 'area code': 415, 'phone number': '382-4657', 'international plan': 'no', 'voice mail plan': 'yes', 'number vmail messages': 25, 'total day minutes': 265.1, 'total day calls': 110, 'total day charge': 45.07, 'total eve minutes': 197.4, 'total eve calls': 99, 'total eve charge': 16.78, 'total night minutes': 244.7, 'total night calls': 91, 'total night charge': 11.01, 'total intl minutes': 10.0, 'total intl calls': 3, 'total intl charge': 2.7, 'customer service calls': 1, 'churn': 0}}]}}
TODO - currently the eland.DataFrame may not retain the order of the data in the csv.