eland.csv_to_eland
- eland.csv_to_eland(filepath_or_buffer, es_client: Union[str, List[str], Tuple[str, ...], Elasticsearch], es_dest_index: str, es_if_exists: str = 'fail', es_refresh: bool = False, es_dropna: bool = False, es_type_overrides: Optional[Mapping[str, str]] = None, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=None, prefix=None, mangle_dupe_cols=None, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, chunksize=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, warn_bad_lines: bool = True, error_bad_lines: bool = True, on_bad_lines: str = 'error', delim_whitespace=False, low_memory: bool = True, memory_map=False, float_precision=None, **extra_kwargs) → DataFrame
Read a comma-separated values (CSV) file into an eland.DataFrame (i.e. an Elasticsearch index).
This function modifies the destination Elasticsearch index.
Note: pandas iteration options are not supported.
Parameters
- es_client: Elasticsearch client argument(s)
Either elasticsearch-py connection parameters (a host URL, or a list or tuple of host URLs) or
an elasticsearch-py Elasticsearch instance.
- es_dest_index: str
Name of the Elasticsearch index to write to. Whether an existing index is appended to, replaced, or rejected depends on es_if_exists.
- es_if_exists: {‘fail’, ‘replace’, ‘append’}, default ‘fail’
How to behave if the index already exists.
fail: Raise a ValueError.
replace: Delete the index before inserting new values.
append: Insert new values into the existing index. Create the index if it does not exist.
- es_dropna: bool, default False
True: Remove missing values (see pandas.Series.dropna).
False: Include missing values; this may cause the bulk request to fail.
- es_type_overrides: dict, default None
Dict of columns: es_type to override default es datatype mappings
- chunksize
Number of CSV rows to read before each bulk index request to Elasticsearch.
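The parameters above might be combined in a single call roughly as follows. This is a sketch under stated assumptions: the `state` to `keyword` override, the batch size of 1000, and the helper name `load_churn` are illustrative choices, not recommendations from eland. The call needs the `eland` package and a reachable cluster, so it is wrapped in a function and not executed here.

```python
# Keyword arguments for eland.csv_to_eland, using only parameters documented above.
# The specific values are assumptions chosen for illustration.
CSV_TO_ELAND_KWARGS = {
    "es_dest_index": "churn",                   # destination index name
    "es_if_exists": "replace",                  # delete and recreate if it already exists
    "es_dropna": True,                          # drop missing values before bulk indexing
    "es_type_overrides": {"state": "keyword"},  # assumed override: map 'state' to keyword
    "chunksize": 1000,                          # CSV rows per bulk request (assumed size)
    "index_col": 0,                             # pandas.read_csv option: first column as index
}


def load_churn(csv_path="churn.csv", es_url="http://localhost:9200"):
    """Hypothetical helper: index a CSV file and return the resulting eland.DataFrame."""
    # Imported lazily so that defining this helper does not require eland to be installed.
    import eland as ed

    return ed.csv_to_eland(csv_path, es_client=es_url, **CSV_TO_ELAND_KWARGS)
```

Calling `load_churn()` against a local cluster would then write `churn.csv` into the `churn` index, replacing any existing index of that name.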
Other Parameters
Parameters derived from pandas.read_csv.
See Also
pandas.read_csv
Notes
The pandas.read_csv iterator option is not supported.
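Because the rows are not handed back through an iterator, chunksize only governs how many CSV rows are grouped before each bulk request. The standard-library sketch below illustrates that kind of batching in general terms; it is not eland's implementation, and `iter_row_batches` and the toy data are hypothetical.

```python
import csv
import io
from itertools import islice


def iter_row_batches(text_stream, chunksize):
    """Yield lists of at most `chunksize` CSV rows (as dicts) from a text stream."""
    reader = csv.DictReader(text_stream)
    while True:
        # Take the next `chunksize` rows; an empty batch means the file is exhausted.
        batch = list(islice(reader, chunksize))
        if not batch:
            return
        yield batch


# Toy data: three rows read in batches of two.
sample = io.StringIO("id,value\n0,a\n1,b\n2,c\n")
batch_sizes = [len(batch) for batch in iter_row_batches(sample, chunksize=2)]
```

In this sketch each yielded batch stands in for one bulk request, so a smaller chunksize means more, smaller requests.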
Examples
Check whether the ‘churn’ index exists in Elasticsearch:
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch("http://localhost:9200")
>>> es.indices.exists(index="churn")
False
Read ‘churn.csv’ and use the first column as _id (and as the eland.DataFrame index):
# churn.csv
,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,0
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,0
...
>>> import eland as ed
>>> ed.csv_to_eland(
...     "churn.csv",
...     es_client='http://localhost:9200',
...     es_dest_index='churn',
...     es_refresh=True,
...     index_col=0
... )
      account length  area code  churn  customer service calls  ...  total night calls  total night charge  total night minutes  voice mail plan
0                128        415      0                       1  ...                 91               11.01                244.7              yes
1                107        415      0                       1  ...                103               11.45                254.4              yes
2                137        415      0                       0  ...                104                7.32                162.6               no
3                 84        408      0                       2  ...                 89                8.86                196.9               no
4                 75        415      0                       3  ...                121                8.41                186.9               no
...              ...        ...    ...                     ...  ...                ...                 ...                  ...              ...
3328             192        415      0                       2  ...                 83               12.56                279.1              yes
3329              68        415      0                       3  ...                123                8.61                191.3               no
3330              28        510      0                       2  ...                 91                8.64                191.9               no
3331             184        510      0                       2  ...                137                6.26                139.2               no
3332              74        415      0                       0  ...                 77               10.86                241.4              yes

[3333 rows x 21 columns]
Validate data now exists in ‘churn’ index:
>>> es.search(index="churn", size=1)
{'took': 1, 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 3333, 'relation': 'eq'}, 'max_score': 1.0,
  'hits': [{'_index': 'churn', '_id': '0', '_score': 1.0,
    '_source': {'state': 'KS', 'account length': 128, 'area code': 415,
     'phone number': '382-4657', 'international plan': 'no', 'voice mail plan': 'yes',
     'number vmail messages': 25, 'total day minutes': 265.1, 'total day calls': 110,
     'total day charge': 45.07, 'total eve minutes': 197.4, 'total eve calls': 99,
     'total eve charge': 16.78, 'total night minutes': 244.7, 'total night calls': 91,
     'total night charge': 11.01, 'total intl minutes': 10.0, 'total intl calls': 3,
     'total intl charge': 2.7, 'customer service calls': 1, 'churn': 0}}]}}
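To check programmatically that every CSV row was indexed, the total hit count can be pulled out of the search response and compared with the expected row count. The following is a minimal sketch; `total_hits` is a hypothetical helper, and the response literal is an abbreviated copy of the output shown above, not a live result.

```python
def total_hits(search_response):
    """Return the total hit count from an Elasticsearch search response body."""
    total = search_response["hits"]["total"]
    # Elasticsearch 7 and later report an object such as {'value': 3333, 'relation': 'eq'};
    # earlier versions reported a bare integer.
    return total["value"] if isinstance(total, dict) else total


# Abbreviated copy of the response above (the individual hits are omitted).
response = {
    "took": 1,
    "timed_out": False,
    "hits": {"total": {"value": 3333, "relation": "eq"}, "max_score": 1.0, "hits": []},
}

expected_rows = 3333  # row count reported by the eland.DataFrame above
all_rows_indexed = total_hits(response) == expected_rows
```

With a live client, the same helper could be applied to the value returned by `es.search(index="churn", size=1)`.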
TODO - currently the eland.DataFrame may not retain the order of the data in the csv.