Data cleansing and validation

This section gives a practical overview of libraries and methods for data cleansing and validation with Python. Besides well-known libraries like NumPy and :doc:`pandas </workspace/pandas/index>`, we also use several small, specialised libraries like dedupe, fuzzywuzzy, voluptuous, tdda and hypothesis. We prefer these more lightweight solutions to large, universal systems like Great Expectations or MobyDQ.
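To make the two tasks concrete, here is a minimal sketch of rule-based validation and fuzzy deduplication. It uses only the standard library (``difflib``) as a stand-in and is not the API of dedupe, fuzzywuzzy or voluptuous. The sample records, the function names ``normalise``, ``validate`` and ``similarity``, the age range and the threshold of 0.9 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical sample records; names and fields are made up for illustration.
records = [
    {"name": "Jane Doe", "age": 34},
    {"name": "Jane  Doe ", "age": 34},
    {"name": "John Smith", "age": -5},
    {"name": "Ann Lee", "age": None},
]


def normalise(name):
    """Collapse whitespace and lower-case a name before comparison."""
    return " ".join(name.split()).lower()


def validate(record):
    """Return a list of rule violations for one record (empty if valid)."""
    errors = []
    age = record.get("age")
    if age is None:
        errors.append("age is missing")
    elif not 0 <= age <= 120:  # assumed plausible range
        errors.append("age out of range")
    return errors


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


# Validation: collect the violations per record index.
violations = {}
for index, record in enumerate(records):
    errors = validate(record)
    if errors:
        violations[index] = errors

# Deduplication: keep a record only if no kept record has a very similar name.
THRESHOLD = 0.9  # assumed cut-off, tune for real data
deduplicated = []
for record in records:
    if all(
        similarity(record["name"], kept["name"]) < THRESHOLD
        for kept in deduplicated
    ):
        deduplicated.append(record)
```

The specialised libraries go well beyond this sketch, for example with declarative schemas or learned matching rules, but the basic split into validating single records and comparing records with each other stays the same.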

Overview

GitHub insights

Name

Stars

Contributors

Commit activity

License

scikit-learn

https://raster.shields.io/github/stars/scikit-learn/scikit-learn https://raster.shields.io/github/contributors/scikit-learn/scikit-learn https://raster.shields.io/github/commit-activity/y/scikit-learn/scikit-learn https://raster.shields.io/github/license/scikit-learn/scikit-learn

`ydata-profiling <https://github.com/ydataai/ydata-profiling>`_

https://raster.shields.io/github/stars/ydataai/ydata-profiling https://raster.shields.io/github/contributors/ydataai/ydata-profiling https://raster.shields.io/github/commit-activity/y/ydataai/ydata-profiling https://raster.shields.io/github/license/ydataai/ydata-profiling

fuzzywuzzy

https://raster.shields.io/github/stars/seatgeek/fuzzywuzzy https://raster.shields.io/github/contributors/seatgeek/fuzzywuzzy https://raster.shields.io/github/commit-activity/y/seatgeek/fuzzywuzzy https://raster.shields.io/github/license/seatgeek/fuzzywuzzy

Hypothesis

https://raster.shields.io/github/stars/HypothesisWorks/hypothesis https://raster.shields.io/github/contributors/HypothesisWorks/hypothesis https://raster.shields.io/github/commit-activity/y/HypothesisWorks/hypothesis https://raster.shields.io/github/license/HypothesisWorks/hypothesis

marshmallow

https://raster.shields.io/github/stars/marshmallow-code/marshmallow https://raster.shields.io/github/contributors/marshmallow-code/marshmallow https://raster.shields.io/github/commit-activity/y/marshmallow-code/marshmallow https://raster.shields.io/github/license/marshmallow-code/marshmallow

dedupe

https://raster.shields.io/github/stars/dedupeio/dedupe https://raster.shields.io/github/contributors/dedupeio/dedupe https://raster.shields.io/github/commit-activity/y/dedupeio/dedupe https://raster.shields.io/github/license/dedupeio/dedupe

pandera

https://raster.shields.io/github/stars/unionai-oss/pandera https://raster.shields.io/github/contributors/unionai-oss/pandera https://raster.shields.io/github/commit-activity/y/unionai-oss/pandera https://raster.shields.io/github/license/unionai-oss/pandera

Voluptuous

https://raster.shields.io/github/stars/alecthomas/voluptuous https://raster.shields.io/github/contributors/alecthomas/voluptuous https://raster.shields.io/github/commit-activity/y/alecthomas/voluptuous https://raster.shields.io/github/license/alecthomas/voluptuous

datacleaner

https://raster.shields.io/github/stars/rhiever/datacleaner https://raster.shields.io/github/contributors/rhiever/datacleaner https://raster.shields.io/github/commit-activity/y/rhiever/datacleaner https://raster.shields.io/github/license/rhiever/datacleaner

popmon

https://raster.shields.io/github/stars/ing-bank/popmon https://raster.shields.io/github/contributors/ing-bank/popmon https://raster.shields.io/github/commit-activity/y/ing-bank/popmon https://raster.shields.io/github/license/ing-bank/popmon

TDDA

https://raster.shields.io/github/stars/tdda/tdda https://raster.shields.io/github/contributors/tdda/tdda https://raster.shields.io/github/commit-activity/y/tdda/tdda https://raster.shields.io/github/license/tdda/tdda

Validr

https://raster.shields.io/github/stars/guyskk/validr https://raster.shields.io/github/contributors/guyskk/validr https://raster.shields.io/github/commit-activity/y/guyskk/validr https://raster.shields.io/github/license/guyskk/validr

Probatus

https://raster.shields.io/github/stars/ing-bank/probatus https://raster.shields.io/github/contributors/ing-bank/probatus https://raster.shields.io/github/commit-activity/y/ing-bank/probatus https://raster.shields.io/github/license/ing-bank/probatus

Dormant projects

GitHub insights

Name

Stars

Contributors

Commit activity

License

Bulwark

https://raster.shields.io/github/stars/ZaxR/bulwark https://raster.shields.io/github/contributors/ZaxR/bulwark https://raster.shields.io/github/commit-activity/y/ZaxR/bulwark https://raster.shields.io/github/license/ZaxR/bulwark

PandasSchema

https://raster.shields.io/github/stars/multimeric/PandasSchema https://raster.shields.io/github/contributors/multimeric/PandasSchema https://raster.shields.io/github/commit-activity/y/multimeric/PandasSchema https://raster.shields.io/github/license/multimeric/PandasSchema

pandas-validation

https://raster.shields.io/github/stars/jmenglund/pandas-validation https://raster.shields.io/github/contributors/jmenglund/pandas-validation https://raster.shields.io/github/commit-activity/y/jmenglund/pandas-validation https://raster.shields.io/github/license/jmenglund/pandas-validation

Opulent-Pandas

https://raster.shields.io/github/stars/danielvdende/opulent-pandas https://raster.shields.io/github/contributors/danielvdende/opulent-pandas https://raster.shields.io/github/commit-activity/y/danielvdende/opulent-pandas https://raster.shields.io/github/license/danielvdende/opulent-pandas

signpost

https://raster.shields.io/github/stars/ilsedippenaar/signpost https://raster.shields.io/github/contributors/ilsedippenaar/signpost https://raster.shields.io/github/commit-activity/y/ilsedippenaar/signpost https://raster.shields.io/github/license/ilsedippenaar/signpost